That’s interesting - I think you’re right. We might move our staging cluster into our main production deployment.
More likely, though, AWS, or OpenShift running on bare metal on a beefy ATX tower in the office. We want production and staging as close to each other as possible, so that's one more reason (and a p0 flag) to reduce our dependency on Google-specific bits of Kubernetes as much as possible, which should also help our exit strategy.
Kubespray works well for me for setting up a bare-bones Kubernetes cluster in the lab.
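For reference, a minimal Kubespray run is roughly this (the inventory name and node IPs are placeholders, and the repo layout shifts a bit between releases, so check their quickstart):

    git clone https://github.com/kubernetes-sigs/kubespray.git
    cd kubespray
    pip install -r requirements.txt

    # copy the sample inventory and edit hosts.yaml to point at your lab nodes
    cp -r inventory/sample inventory/lab
    $EDITOR inventory/lab/hosts.yaml

    # run the main playbook against that inventory
    ansible-playbook -i inventory/lab/hosts.yaml --become cluster.yml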
I'll use helm to install MetalLB for the load balancer, which you can then tie into whatever ingress controller you like to use.
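A sketch of what that looks like with the upstream MetalLB chart (the address range below is made up for a lab LAN, and newer MetalLB versions use IPAddressPool/L2Advertisement CRDs instead of the old ConfigMap, so check which one your version expects):

    helm repo add metallb https://metallb.github.io/metallb
    helm install metallb metallb/metallb -n metallb-system --create-namespace

    # hand MetalLB a pool of LAN addresses it may assign to LoadBalancer services
    kubectl apply -f - <<EOF
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: lab-pool
      namespace: metallb-system
    spec:
      addresses:
      - 192.168.1.240-192.168.1.250
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: lab-l2
      namespace: metallb-system
    spec:
      ipAddressPools:
      - lab-pool
    EOF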
For persistent storage, a simple NFS server is the bee's knees. It works very well, and an NFS provisioner is a single helm install. Very nice, especially over 10GbE. Do NOT dismiss NFSv4; it's actually very good for this sort of thing. I just use a small separate Linux box with software RAID for that.
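If it helps, the provisioner install I have in mind is something like the nfs-subdir-external-provisioner chart (the server IP and export path below are placeholders for your NFS box):

    helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
    helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
      --set nfs.server=192.168.1.5 \
      --set nfs.path=/export/k8s \
      --set storageClass.defaultClass=true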
If you want the cluster to self-host storage, or you need high availability, then GlusterFS works great, but it's more overhead to manage.
Then you just use the normal helm install routine to install and set up logging, dashboards, and all that.
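For example, dashboards and metrics are basically one chart these days; a rough sketch using kube-prometheus-stack (which bundles Prometheus and Grafana; swap in whatever logging stack you prefer):

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

    # Grafana is a ClusterIP service by default; port-forward to poke at it
    kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80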
OpenShift is going to be a lot better for people who want to do multi-tenant stuff in a corporate enterprise environment, where you have different teams of people, each with their own realm of responsibility. OpenShift's UI and general approach are pretty good about letting groups self-manage without impacting one another. The additional security is a double-edged sword: fantastic if you need it, but an annoying barrier to entry for users if you don't.
As far as AWS goes... EKS recently lowered their price from 20 cents per hour to 10 cents, so the cost for the cluster is on par with what Google charges.
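At $0.10/hour that works out to roughly $73 a month per cluster control plane (0.10 × 24 × ~30.4 days), before you pay for any worker nodes.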
Azure doesn't charge for cluster management (yet), IIRC.
(replying to freedomben): NFS has worked fairly well for persistent file storage that doesn't require high performance for reads/writes (e.g. good for media storage for a website with a CDN fronting a lot of traffic, good for some kinds of other data storage). It would be a terrible solution for things like database storage or other high-performance needs (clustering and separate PVs with high IOPS storage would be better here).
It's good to have multiple options if you want to host databases in the cluster.
For example, you could use NFS for 90% of your storage needs, like logging and sharing files between pods, then use local storage, FCoE, or iSCSI-backed PVs for databases.
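As a rough sketch of the database side of that split, here's a local PV pinned to one node (the hostname, path, and sizes are placeholders; you'd pair this with a matching StorageClass and PVC):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: db-local-pv
    spec:
      capacity:
        storage: 100Gi
      accessModes: ["ReadWriteOnce"]
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-ssd
      local:
        path: /mnt/nvme0/postgres
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-3"]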
If you are doing bare hardware and your latency requirements are not too stringent, then not hosting databases in the cluster is also a good approach. Just use dedicated systems.
If you can get state out of the cluster then that makes things easier.
All of this depends on a huge number of other factors, of course.
> Have you used NFS for persistent storage in prod much?
I think NFS is heavily underrated. It's a good match for things like hosting VM images on a cluster and for Kubernetes.
In the past I really wanted to use things like iSCSI for hosting VM images, but I've found that NFS is actually a lot faster for many workloads. There are complications to NFS, of course, but they haven't caused me problems.
I would be happy to use it in production, and have recommended it, but it's not unconditional. It depends on a number of different factors.
The only real question with NFS is how you manage the actual NFS infrastructure. How much experience does your org have with NFS? Do you already have an existing file storage solution in production that you can expand and use with Kubernetes?
Like if your organization already has a lot of servers running ZFS, then that is a nice thing to leverage for NFS persistent storage. Since you already have expertise in-house it would be a mistake not to take advantage of it. I wouldn't recommend this approach for people not already doing it, though.
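If you're in the ZFS camp already, exporting a dataset for the provisioner is close to a one-liner (the pool and dataset names are made up, and the sharenfs option syntax differs a bit between illumos and Linux OpenZFS):

    # carve out a dataset for Kubernetes volumes and share it over NFS
    zfs create tank/k8s-volumes
    zfs set sharenfs=on tank/k8s-volumes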
If you can afford some sort of enterprise-grade storage appliance that takes care of dedupe, checksums, failovers, and all that happy stuff, then that's great. Use that and it'll solve your problems, especially if there is an NFS provisioner for it that Kubernetes supports.
The only place where I would say it's a hard no is if you have serious scalability requirements, like if you wanted to start a web hosting company or needed hundreds of nodes in a cluster. In that case a distributed file system is what you need: self-hosted storage, aka "hyper-converged infrastructure". The cost and overhead of managing those things is then small relative to the size of the cluster and what you are trying to do.
It's scary to me to have a cluster self-host storage because storage can use a huge amount of RAM and CPU at the worst times. You can go from a happy low-resource cluster, then a node fails or some other component takes a shit, and while everything is recovering and checksumming (and lord knows what) the resource usage goes through the roof right during a critical time. The 'perfect storm' scenarios.
My experience with NFS over the years has taught me to avoid it. Yes, it mostly works. And then every once in a while you have a client that either panics or hangs, no matter which versions of Linux, BSD, Solaris, or Windows have come and gone over the decades. The server end is usually a lot more stable, but that's little comfort when you're told that, yes, the other clients are fine.
However, if you can tolerate client-side failures, then go for it.
What? Shouldn't you try to make the creation and deletion of your staging cluster cheap instead of moving it somewhere else?
And if that is your central infrastructure, shouldn't it be worth the money?
I do get the appeal of having cheap and beefy hardware somewhere else; I do that as well, but only for private projects. My hourly salary spent (or wasted) on stuff like that costs the company more than just paying for an additional cluster with the same settings but perhaps far fewer nodes.
If more than one person is using it, the multiplied cost of suddenly unproductive people is much higher. A shared cluster also decreases the per-head cost.