Something that is pretty important, and is missing from this guide, is to make sure you add headers indicating what the original IP address for the request was (either in X-Forwarded-For or X-Real-IP or something else common).
On a security note - your application code should only trust those headers (X-Forwarded-For, X-Real-IP, etc.) for IP lookup if you control the load balancer and strip them from incoming requests.
There is nothing to stop a malicious client from adding the header themselves, and if you rely on IP lookup for access control (e.g. dev mode active for 127.0.0.1) you can leave yourself wide open. While I can't find the article at the moment, Stack Overflow accidentally gave admin-level access to the site because of this oversight.
In my experience, if a client adds their own X-Forwarded-For header trying to spoof their IP, nginx (via $proxy_add_x_forwarded_for) simply appends the real address to it, producing something like "1.2.3.4, 33.33.33.1", where 1.2.3.4 is the address the client supplied in their spoofing attempt and 33.33.33.1 is the actual client IP as seen by nginx.
So if there are several entries in the list, you can choose to trust only the rightmost one.
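If you'd rather not reason about the list at all, one option is to have the edge nginx overwrite the header rather than append to it; proxy_set_header replaces whatever value it sets, so client-supplied entries never reach the backend. A minimal sketch:

    # Overwrite rather than append: any client-supplied X-Forwarded-For
    # is discarded, and the backend only sees the IP nginx itself observed.
    proxy_set_header X-Forwarded-For $remote_addr;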
For backend nginx instances (we use nginx to balance application servers, with nginx right in front of Unicorn on those application servers), use the Real IP module so logs transparently show the original request IP rather than the load balancer's IP.
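For reference, a minimal sketch of that Real IP setup on the backend instances, assuming the balancer lives in 10.0.0.0/8 (substitute your own balancer addresses in set_real_ip_from):

    # Trust X-Forwarded-For only when the request comes from the balancer's
    # range, and rewrite $remote_addr so logs show the original client IP.
    set_real_ip_from  10.0.0.0/8;
    real_ip_header    X-Forwarded-For;
    real_ip_recursive on;   # skip trusted proxies right-to-left through the list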
I'm really on the fence between HAProxy and Nginx. I have used HAProxy successfully in the past, but I'm tempted by the simplicity of Nginx, especially now that it supports SPDY.
Would like to hear people's thoughts on using Nginx in "real life" for load balancing rather than HAProxy.
I can't compare it to HAProxy, but nginx load balancing was probably the simplest and most reliable part of our web infrastructure, and did exactly what we wanted and needed. Never played around with SPDY, but I liked the various options regarding server weighting, SSL termination, and, like you mention, the ease of configuration. It wasn't too fancy, but it solved a problem and solved it well.
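For anyone curious, a bare-bones sketch of what that weighting plus SSL termination looks like (the hostnames and certificate paths here are made up):

    upstream app_servers {
        server app1.internal:8080 weight=3;   # receives ~3x the requests
        server app2.internal:8080;            # default weight is 1
    }

    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/ssl/example.crt;
        ssl_certificate_key /etc/nginx/ssl/example.key;

        location / {
            proxy_pass http://app_servers;
        }
    }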
Unfortunately we had to switch off of it due to PCI compliance concerns[1], but I'd use it again in a heartbeat.
[1] not because there were actual issues, but because other solutions were fully audited out of the box. I'm hardly surprised that we've had more issues with those solutions than we ever had with nginx, including the time when we barely knew how to configure the thing. One of the unavoidable hazards of PCI Level 1 :( We still use it for the actual web requests quite happily.
I love nginx. You stole the words out of my mouth re: simplicity & stability of nginx for load balancing. If there's one thing you can really nail like a pro while still a rookie (like me) it's configuring nginx to load balance.
We didn't have enough servers behind it to really deal with dead servers, to be honest. It seemed to detect a failed server and route around it quickly enough, and we had per-server monitoring in place to go in and reboot the thing or whatever.
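For what it's worth, that failure detection is tuned per server in the upstream block; a sketch with illustrative values:

    upstream app_servers {
        # after 3 failed attempts, consider the server down for 30 seconds
        server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
    }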
The configs are pretty straightforward, but might get a little nuts if you're dealing with hundreds of servers behind the thing. I don't have to wear a sysadmin hat too frequently (thank god), but when I did, it was pretty easy to deal with.
Huge fan of the fact that reloading the config would perform a configtest automatically before trying to apply the new settings. I don't know why all software doesn't do this.
I don't know too many details as I wasn't on that side of the PCI audit (I was more on the side handling the software we write), but my impression was that off-the-shelf hardware was already certified, whereas nginx was not. It was also one less component for us to manage, as we opted for hosting where we manage our web stack and the hosting company deals with the hardware and network.
I've been using it heavily under production loads, and the balancing portion hasn't blinked.
I'm also doing SSL termination on it, so I don't really have any metrics on the balancing in isolation, but it's been moving 50-100 concurrent connections around without trouble.
I do really like HAProxy's more flexible up/down monitoring, though. In the past, we've done the trick with separate control connections that we can bring up & down with iptables to shuffle traffic around without any broken connections.
How much traffic are you pushing through it? I've been planning to set up stunnel + HAProxy on separate instances once we're comfortable going with 1.5, but am curious if we could get away with terminating SSL on the same instance that HAProxy runs on.
Nginx has also formed the basis of CloudFoundry's routing tier, and thus cloudfoundry.com and appfog.com. Nginx load balancing can be very simple, but you can customize it to your heart's content with Lua [1].
You can also use it for your application tier with Passenger, [U]WSGI, FPM, FCGI...
What does Nginx offer that HAProxy doesn't? We've been using HAProxy 1.4 for over a year now, up to 1500 req/s on a virtual machine. It's the most reliable piece of all of our infrastructure. The hardest part has been tuning the Linux instance for a lot of connections when encountering DDoS attacks and the like.
We also run another HAProxy instance for rate limiting for attacked sites that feeds back into the main load balancer. And this is Layer 7 load balancing including inspecting headers. Never breaks a sweat. 1.5 supports SPDY, which is the last big thing for us (though I need it in the opposite direction from other mentions, used alongside stunnel).
To add more, as well: we use HAProxy for deploying (tell it to take down nodes using the unix socket), we have it set up to meter requests to a machine just brought back online, and with one command we can route all traffic to a standby Apache instance that serves a maintenance page. On top of that it has the monitoring page, and using the socket we pull stats every minute and ship them off to Librato Metrics as well as watch for high sessions and the like. I (obviously) cannot sing its praises enough.
You can set up stunnel to terminate SSL and append the client's IP to the request that's sent to HAProxy, which will then add an X-Forwarded-For header from that info. This may be relevant to your interests, though: http://www.igvita.com/2012/10/31/simple-spdy-and-npn-negotia...
Can either of those solutions do dynamic secure websockets? I want to terminate SSL to various dynamic backend websocket servers. I'm spinning up additional websocket servers per user for privilege separation.
I'm terminating the SSL outside VMs, so the VMs can be compromised without giving up the certificate's private key.
The VMs each run a websocket server as the user that will be connecting. This makes the security aspects very easy to handle. Each user can only modify their own environment and write to their own files (backed by unix permissions). Even if they root the VM (excluding hypervisor vulnerabilities) they won't be able to access any private data.
If I want to be able to hot migrate VMs between physical machines, I need some way of dynamically proxying the connections. If I had lots of IPs, I could simply let each VM have an IP address and the SSL terminator would route properly no matter where I move the VM.
No, not really. Sounds like you want to update the backend servers that your load balancer is proxying to while the load balancer is up? Can't you just create an internal network if you need IPs? I think this hinges on what you mean by the phrase "dynamic proxy"?
Another awesome thing - as of 1.3.1/1.2.2, nginx can do least connections load balancing, which is better if your upstream response time isn't very consistent.
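Enabling it is a single directive in the upstream block; a minimal sketch:

    upstream app_servers {
        least_conn;   # send each request to the server with the fewest active connections
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }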
It's based on NodeJS, and it's really good. I've been using it in front of three web servers serving around 800 small-to-medium business websites for the last six months and it's been fantastic.
It pulls configuration data from Redis, so you can easily do things like automating deployments, etc.
If you really want to do it dynamically, you could read the upstreams from Redis with a combination of Nginx modules. Or use Puppet and restart, as the other poster suggested, as it won't break connections in progress.
Storing and reading the configuration from Redis will be slower at runtime, but should scale more easily for a large number of hosts and be much more responsive than using Puppet or Chef. I recommend choosing either solution based on the number of hosts and rate of change.
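As a rough sketch of the Lua route, here's roughly what the dynamic-upstream part looks like with OpenResty's balancer API. This assumes the backend host and port were already fetched from Redis in an earlier request phase (e.g. via lua-resty-redis) and stashed in ngx.ctx; that lookup is left out here:

    upstream dynamic_backend {
        server 0.0.0.1;   # placeholder; the real peer is picked at runtime below

        balancer_by_lua_block {
            local balancer = require "ngx.balancer"
            -- hypothetical: backend_host/backend_port were read from Redis
            -- earlier in the request and stored in ngx.ctx
            local ok, err = balancer.set_current_peer(ngx.ctx.backend_host,
                                                      ngx.ctx.backend_port)
            if not ok then
                ngx.log(ngx.ERR, "failed to set peer: ", err)
            end
        }
    }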
That is what Puppet or a similar configuration management solution is for (or doing it by hand, I suppose). I can't think of a realistic scenario where you'd want a built-in nginx function for this.
In Puppet it's pretty easy as well. The modification I made to puppet-nginx allows you to do resource collection for a group of upstream servers (and thus add to a group of upstream locations transparently).
That's really helpful & nicely written - shouldn't that be added to the NGINX Wiki in some way (either as a complete how-to-do-this or just as a link)?
Beyond that, it depends how you're using it (HTTP load balancing, TCP only, etc.). Got any specific questions? We've been running it in production for over a year.
Here or in an article? We use both in our infrastructure.
Generally I would say that if you're proxying web connections and need caching or the ability to do lots of complicated rewriting on the proxy side, use nginx. If you're proxying database, mail, or similar traffic, use HAProxy. If you don't need caching or anything similar, either nginx or HAProxy will do, depending on your application.
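To illustrate the caching side, a minimal proxy-cache sketch (the cache path, zone name, and upstream name are arbitrary):

    # in the http block
    proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g;

    server {
        location / {
            proxy_cache       app_cache;
            proxy_cache_valid 200 301 10m;   # cache these responses for 10 minutes
            proxy_pass        http://app_servers;
        }
    }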
I'm not terribly experienced with Nginx, but HAProxy was (and is) a load balancer first, whereas Nginx is a server with load balancing abilities (as Apache has too, though it gets less love these days). HAProxy has pretty powerful HTTP support and capabilities, however, so I'm not sure I buy the other argument in this thread.
HAProxy also allows you to modify balanced nodes while the server is running, and has fantastic logging once you get used to looking at it.
Can do this in the root location "/" with:

    location / {
        proxy_set_header X-Real-IP        $remote_addr;
        proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
    }
It's also good to remember to put another header in for the forwarded protocol (if you're terminating an SSL tunnel at the balancer).
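Something like this, using the conventional (though not standardized) header name:

    proxy_set_header X-Forwarded-Proto $scheme;   # "https" when SSL terminates here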