Lua support in nginx is phenomenal, especially when combined with LuaJIT. It essentially lets you transform nginx into an application server and run arbitrary code that can do pretty much anything.
I would like to mention agentzh and his team, who did an amazing job releasing OpenResty[1], which makes it easy to extend nginx with custom Lua functionality. OpenResty also happens to be the backbone of CloudFlare's architecture, and the core technology used by projects like Kong[2] for microservices management.
I built an nginx+LuaJIT RTB bidder that did 168k qps on 8 cores. It smashed the C eventlib version, and of course it crushed the Java version. Golang came in a close second, but at the time, with 8 cores, I had to do crazy things to pin it to CPUs, since Golang's thread scheduling didn't seem very scalable beyond 4 cores.
These bidding platforms typically don't provide you with JSON but with some headers or query parameters that are very quick to parse (especially for the native nginx code), so your Lua code can basically boil down to a few table lookups.
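For illustration, a minimal sketch of what such a handler could look like in OpenResty; the endpoint, the parameter name, and the `campaigns` table are hypothetical:

```nginx
location = /bid {
    content_by_lua_block {
        -- nginx has already parsed the request line; these are cheap lookups
        local args = ngx.req.get_uri_args()
        -- "campaigns" is assumed to be a plain Lua table loaded at startup
        -- (e.g. via init_by_lua_block); "campaign_id" is a hypothetical param
        local campaign = campaigns[args.campaign_id]
        if not campaign then
            return ngx.exit(ngx.HTTP_NO_CONTENT)  -- 204: no bid
        end
        ngx.header["Content-Type"] = "application/json"
        ngx.say('{"price": ' .. campaign.price .. '}')
    }
}
```

The hot path is just a hash lookup on an in-memory table, which is why this kind of workload flies under LuaJIT.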
Maybe I got it wrong. Is the time spent on the bid-processing portion (CPU-bound) or on the network messaging and connection-handling portion (IO-bound)? When talking about eventlib and nginx, I assume network messaging and connection handling are what's relevant. In that case, 168K requests/sec doesn't sound too great.
This benchmark shows nginx+Lua (OpenResty) is more than 5 times slower than the leading ones, which can do more than 6M requests/sec.
The problem with nginx's Lua support is that it doesn't go far enough. Apache's mod_lua allows you to create complete modules for Apache, with full access to Apache's API.
I really enjoy nginx, as it's more flexible to configure, but I've never understood why Apache got this "slow" label...
FTA:
> In February 2012, the Apache 2.4.x branch was released to the public. Although this latest release of Apache has added new multi-processing core modules and new proxy modules aimed at enhancing scalability and performance, it's too soon to tell if its performance, concurrency and resource utilization are now on par with, or better than, pure event-driven web servers. It would be very nice to see Apache application servers scale better with the new version, though, as it could potentially alleviate bottlenecks on the backend side which still often remain unsolved in typical nginx-plus-Apache web configurations.
I'm using Apache 2.4 with mpm_event + mod_proxy_fcgid and it's doing fine: 99% of the work and time is spent in the FastCGI application anyway, and for static content mpm_event is good enough. I wouldn't run a dedicated static CDN box on Apache, but for everything that can run on a single server, Apache can also do the job... even HTTP/2 with mod_h2 works fine as of 2.4.17.
A problem with nginx is figuring out what matches in a complex config... it's not straightforward. .htaccess is nice and simple for a shared server with lots of users.
I really like nginx but I guess most people just don't really need it. Migrating to 2.4 and mpm_event should be good enough.
Historical reasons... after nginx started blowing Apache's performance out of the water, Apache addressed some fundamental architectural issues and (mostly) caught up.
As an example, I would refer you to a discussion down this thread about Apache 1.x forking for concurrent connections.
Also, just for context, remember that on Linux forking and threads are basically the same thing, so you can't just switch one out for the other and expect performance to change (it won't).
"forking and threads are basically the same thing"
It's nowhere near that simple. You can fork lightweight, like threads, or heavy. One of the terrible things about Apache's performance was all the heavy forking on every single request for generated pages, back when forking was really expensive.
I've used Apache forever and have always been able to tune it to get the job done. The thing that made me switch to nginx was the ability to use more than one SSL cert on the same IP. Now that I was "forced" to try it, I'm pretty happy with it.
There was never a time when nginx supported SNI and Apache did not.
nginx requires an OpenSSL version that supports SNI; that was added in 0.9.8f in Oct 2007 (if you compiled OpenSSL manually to enable it), and was enabled by default in 0.9.8j in Jan 2009.
Apache has supported SNI via mod_gnutls since 2005, and since 2.2.12 (July 2009) via OpenSSL.
Technically correct, however for users that install from OS packages it's possible that they're getting a version of nginx that had SNI enabled and a version of Apache that did not.
No matter how many times I see “nginx”, and know that it’s supposed to be “engine X”, I always pronounce it as [ŋɪŋks] in my head!
The way nginx handles requests and responses in an implicit event loop reminds me of a recent talk by Brian Kernighan, in which he mentions the ubiquity of the “pattern–action” model in many domains. I think it’s a very useful architectural pattern to have in mind when you’re designing a configuration system or a DSL.
I also liked this quote:
> …it is worth avoiding the dilution of development efforts on something that is neither the developer’s core competence or the target application.
Until I learnt the correct way of pronouncing it, I always thought it was "en-ginks".
Sometimes there can be a long gap (even many years) between first reading about something and first using its name in conversation. A long time for mispronunciations to stew away in my brain (assuming the person I'm talking to even knows how to pronounce it themselves.)
I still thought that until just now: that there were two different words, "epitomy" and "epitome", with "epitome" being the stronger one, suggesting some kind of singular Platonic ideal, and the former meaning just a good example of its kind.
Just as I used to think misled (miss-led) and misled (my-zled) were two different words, the latter implying an element of malice.
Oh, don't feel too bad; a few weeks ago, my wife had lunch with the Ivy League-pedigreed nth-generation CEO of a reasonably large midsized company who used the exact same mispronunciation. Paris being worth a mass, she chose not to correct him.
I’m a native English speaker with a background in phonetics and a silly brain. I speak several languages very poorly. :)
Rather like Larry Wall:
> I started trying to teach myself Japanese about 10 years ago, and I could speak it quite well, because of my phonology and phonetics training–but it’s very hard for me to understand what anybody says. So I can go to Japan and ask for directions, but I can’t really understand the answers!
While I like nginx over Apache because I've been burned too many times by strange Apache configs, I have recently found, and am growing to love, Hiawatha. It's a GPL web server focused on security; it uses PolarSSL (which just got bought out and is now mbed), and it's pretty fast. All the benchmarks I've seen show it comparable to stock nginx and Apache, but once you tack on some of the optimizations, nginx will beat Hiawatha. It also has a very easy config syntax.
Well, I could have sworn I remembered seeing a benchmark comparing stock vs. basically tuned (config files) nginx, Hiawatha, and a few others, where stock Hiawatha won but tuned nginx beat it out. But I just spent ten minutes or so looking for it and couldn't find it, so I suppose I'll have to retract my earlier statement.
Either way, I think Hiawatha is a great web server that should get more attention, especially since I am a GPL proponent and Hiawatha is one of the only currently maintained GPL web servers.
As a bonus, here is a benchmark of nginx and Hiawatha under attack (notice the service drop on nginx).
Every "new" web server tries to fix Apache's configuration syntax mess, and IMHO they all fail. Yes, setting up a reverse proxy looks simpler with nginx, or HAProxy for that matter. But when it comes to complex configurations they all suck, and I'm not sure a JSON/YAML config format is going to fix that as long as web servers have such wide scope (from serving static pages to proxying traffic, authorizing, authenticating, encrypting...). At least Apache is very modular in this regard, and some credit should be given to it for having survived and evolved along with all the newer options.
The better choice is not a new web server program with a different, possibly better, configuration language; it's HTTP server libraries that can be used from actual programming languages to do whatever you want. Many languages ship with really good libraries these days.
They do indeed ship with really good libraries these days, but no matter how good they are, there is some level of doubt that they fail to alleviate, and they get deployed behind nginx or Apache httpd anyway.
It seems more like they are deployed behind load balancers, because load balancing is a specific job that it makes sense to do with a separate piece of software. Admittedly, that separate piece of software is often nginx, but there are other popular options, and I don't think very many people use Apache that way (though I could be wrong).
I wish there was more input as to why Apache 2.4 isn't suitable. It's been 3 years since its event driven model was released and it is a perfectly acceptable web server even for static content.
Apache 2.4 isn't event-driven. Apache's Event MPM only handles keep-alive connections asynchronously, while the whole of request processing is still synchronous. It solves only one of Apache's problems, but it still doesn't make it as scalable as nginx.
Hi, I'm one of the main authors (along with others, it is a community driven project) of the Event MPM for Apache.
On benchmarks and timelines: I agree that 2.2+ was not widely available for many years due to the update cycle of Linux distributions, but at the time nginx wasn't in the distros either... so it became a thing where people would yum install apache2, get a 2-5 year old version, and benchmark that against an nginx from its latest dev download.
The original work for the Event MPM started around 2004:
The version in 2.2 was mostly focused on Keep-Alive requests. Apache 2.2.0 was first kicked out on December 1, 2005.
Going beyond keep-alive requests required a set of features/patches called "Async Write Completion". Much of this work was done in 2006-2007 by Graham Leggett:
Timing wise, most of that work did not find its way into a stable release until 2.4, which came out February 17, 2012. This is the date the article references.
You're quoting from a paragraph that's summarizing the history of Apache, and the statement was true about Apache 1.x.
Back in the day, Apache 2.0 took a long time to gain substantial market share, for various reasons. The situation was not too dissimilar from the current Python 2/3 split.
Edit: But I guess Apache never forked for every connection, even in 1.3. It only forked if it needed a new child and the existing ones were busy.
Apache 1.3 did not fork a process per connection. It forked to handle additional concurrent connections, but that's very different than forking for each and every connection.
> In February 2012, the Apache 2.4.x branch was released to the public. Although this latest release of Apache has added new multi-processing core modules and new proxy modules aimed at enhancing scalability and performance, it's too soon to tell if its performance, concurrency and resource utilization are now on par with, or better than, pure event-driven web servers.
When was this written? Is it still too soon to tell? 2.5 years seems like enough time to tell?
More impressive, in the same book, is how another server (Warp), written in Haskell (a GC'd lazy functional language, supposedly way slower than C over epoll), achieves the same performance as nginx!
Back when I was tinkering with mod_python performance[0] there was this web server called nxweb[1], which out-performed nginx consistently by quite a bit.
Why do web servers always seem to invent their own config file format? Whilst nginx seems to do it slightly more sanely than Apache, it still doesn't use something like YAML or JSON; is there a good/obvious reason for this that I'm missing?
nginx configs can be difficult enough on their own without constraining them to formats that wouldn't allow directives that take arguments preceding a block, not to mention the extra escaping that would come with quoting already-quoted strings. This is a common formulation:
location ~* \.(jpe?g|png|gif|ico)$ {
...
}
That would be pretty messy inside of JSON or YAML. You couldn't have the location line be a key for a map/hash, because you can have multiple blocks with the same "key".
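For instance, a faithful JSON rendering would need an array-of-objects wrapper (to permit repeated blocks) plus doubled backslash escaping in the regex; a sketch:

```json
{
    "location": [
        {
            "match": "~* \\.(jpe?g|png|gif|ico)$",
            "directives": { }
        }
    ]
}
```

Two layers of structure and an escaping pass, just to express one line of the native syntax.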
Besides, as bad as nginx configs are, just trying to understand which block "traps" a request is where you can spend most of your time; see ifIsEvil [1].
Thanks for that comment. Running a complex nginx config with lots of special treatment for subfolders, it's really difficult to tell why and where a request ends up in such a configuration.
However, Apache just can't do some of the things that help you shoot yourself in the foot. Maybe the C++ vs. Java comparison is not too far-fetched.
nginx is one of those things that requires its own mentality to really get. Like SQL's set logic or CSS's forward-looking selectors, it's not quite like traditional imperative, control flow-based programming. But once you start thinking the way it wants you to think, there are benefits.
I'm in the middle of a project built entirely in nginx and it's astoundingly performant. The restrictions on what I can do in (mainline) nginx force me to think through how I structure the blocks and directives, with logic more representative of a web server than of an application server, which is what I'm used to.
DigitalOcean has a decent tutorial on how nginx decides on server and location blocks [1]. And, related to ifIsEvil, this blog post [2] goes a little into explaining how nginx "traps" a request. If someone has better resources, I would appreciate them.
I've written a tool to tell us exactly that, called nginspex...hoping to open source it shortly. We use generated configs with tens of thousands of lines, for me this was an exercise in learning what nginx is up to as well as being able to say how a request will be handled.
The tool interprets the nginx conf rather than compiling any of it (as nginx does with the rewrite rules), which makes it easy to log which lines are involved in the processing as it hits them.
Not really. XML is a document markup language, where order is important. JSON is a serialization format and is a bit more bare-bones. For example, how would you translate this XML to JSON, and back to XML?
There are many ways to do it, you just need to pick a convention. For example, one convention is that you use "#" as key for element name, "." as key for element children, and attribute name as keys for the attributes. Following those conventions, you can produce:
Or, for something more verbose (but maybe more intelligible), you could use "$element" as key for element names, and "$children" as key for child elements / text. (The point of choosing $ as a prefix, is it is not a valid character in attribute names, so cannot conflict with them.)
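Following the first convention above ("#" as the key for the element name, "." for the ordered list of children, attribute names as plain keys), a minimal Python sketch might look like this:

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    # "#" holds the element name, attributes map directly to keys, and
    # "." holds the ordered children: dicts for child elements, plain
    # strings for interleaved text.
    d = {"#": elem.tag}
    d.update(elem.attrib)
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(element_to_dict(child))
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    if children:
        d["."] = children
    return d

xml = '<A>first section <B>marked up section</B> second section</A>'
print(json.dumps(element_to_dict(ET.fromstring(xml))))
```

Because "." preserves child order, the same structure can be walked in reverse to re-emit equivalent XML.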
That is pretty hard for a human to parse and understand, though.
I think it could be reasonably easy to come up with a config file standard in XML or JSON, but that the format will have to rely on the strengths of each. Translating between the two just becomes an unreadable mess. If anything, if I were to write an application that allowed for either format, I would come up with a separate standard for each. More code/upkeep, but when the config files are intended for humans and to be hand-written, the focus should be on the user.
It's not just readability. XML has some serious issues when used as a configuration file format rather than as document markup. It has features oriented towards document markup that get in the way of writing data structures, and it lacks convenient features for writing data structures that yaml-like languages have.
First, a quick disclaimer: Apache conf format is not really XML. It leverages XML-like syntax but it's mostly not XML and avoids most of the serious problems that XML tends to bring, which I'll explain in more detail below.
XML was designed to provide structure to documents, it was not designed as a configuration syntax or a data serialization format. XML is meant for a document that already exists in its own right as a document, where the XML is added on as a layer to aid automated semantic understanding of that document's structure. It is not meant to directly represent programming data structures. As such, XML tags are designed to pop out and be visible from significant amounts of text that is not metadata. When there's more tags than text, as is usually the case when you try to use XML to write programming data structures, XML winds up being hopelessly verbose, and it's hard to avoid errors writing it (like misspelling end tags, forgetting a slash, putting end tags in the wrong order, etc.)
When not used for its intended purpose, XML winds up being hard for humans to write directly and hard(er than yaml and json) to write programs to parse it. In yaml and json, there's a standard, mostly direct mapping to common data structures in most high-level languages. With XML you have to make a lot of trivial decisions to make use of features that weren't designed for what you're trying to do. The most obvious examples are the distinction between attributes and tags: what does each one mean? What do tagnames represent? What do attribute values represent? What do attribute names represent? How do you handle CDATA that has more XML in it? XML is designed to elegantly handle something like this:
<A>first section <B>marked up section</B> second section</A>
But this kind of structure is horrible for a configuration file, unless the CDATA sections are a parsed language of their own and parsed externally, which is essentially what Apache does. If you're trying to use XML to specify data structures like lists and trees, it's messy. Consider this example:
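For instance, a hypothetical XML-ified virtual host (names and paths are illustrative):

```xml
<VirtualHost>
    <ServerName>www.example.com</ServerName>
    <DocumentRoot>/var/www/example</DocumentRoot>
</VirtualHost>
```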
You might envision "VirtualHost" to be an item in a list, where the value of that item is a dictionary with subkeys specifying "ServerName" and "DocumentRoot." But in fact, there's more to it than that. An XML parser also gives you all the whitespace in between those two tags. You can discard it, you can write checks to ensure that nothing ever ends up in that unused CDATA area by mistake, you can write tools to generate the XML-- but no matter what method you choose it's something you have to think about that just doesn't come up if you are using a language designed for writing programming data structures instead of abusing one designed for marking up text documents.
And that example highlights another problem with XML, which is a flat-out lack of support for common programming data types such as lists and integers. In the example above, how would you know that "&lt;VirtualHost&gt;" represents an item in a list, but "&lt;ServerName&gt;" should be a key in a dictionary? XML doesn't help you there; every parser decides for itself.
> There are many ways to do it, you just need to pick a convention.
The fact that you have to pick a convention is the crux of the issue. "Easy" is a subjective term, but the fact is there isn't a direct mapping between XML and JSON.
It means that if you're using XML and converting it into a programming data structure, you have to make a bunch of decisions about how to handle the XML. It means that if you're converting a programming data structure to XML, you have to have a bunch of specific rules for how to generate that XML.
With JSON, you only have to make those decisions if you need to use data structures that aren't supported by JSON.
XML is largely human-unreadable, ludicrously verbose, and has weird character escape requirements. Not a trifecta I want from my config files. I don't think anyone who's ever found themselves tampering with a complex XML file in vim ever wanted to repeat the experience.
I am highly disappointed by Nginx in three major areas:
1. They created a paid version that has some additional basic features such as cache purging, dynamic upstream name resolution, and a few others. Charge for support, charge for some fancy management interface or monitoring, but for basic features (most of them available in Tengine [0]): thanks, but no thanks. You lost me as an evangelist! In fact, in many aspects, they are now catching up with Tengine!
2. Instead of making LuaJIT integration standard and avoiding the need to escape Lua in the configuration files, they invented some subpar JavaScript. People already use Lua widely; it's fast, it's great. Don't you have anything better to do than invent yet another language!? I really can't believe pragmatic people would have done this, honestly! It speaks so badly of their thought process! Now I know I can expect anything stupid from them!
3. The configuration language is not very intuitive. If they embedded Lua, the whole configuration could be a Lua script that initializes some internal state. This would have been a dream come true!
The real question is: why didn't they use something like Lua for the config file format?
Web servers can have notoriously complex configs, to the point where designing a mini-language might be a worse idea than stripping down an embeddable language such as Lua.
It's not just that Lua makes a good config file format; it's that serving HTTP requests is a complicated enough business, with enough edge cases and corner cases, that it merits a Turing-complete language.
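Purely as a sketch (stock nginx supports nothing like this), a Lua-based config could use plain tables and long strings, which would also sidestep the regex-escaping problem:

```lua
-- hypothetical API: "server" would be a function provided by the host
server {
    listen = 80,
    locations = {
        -- Lua [[...]] long strings need no backslash doubling
        { match = [[~* \.(jpe?g|png|gif|ico)$]], root = "/var/www/static" },
        { match = "/api/", proxy_pass = "http://backend" },
    },
}
```

Since the config is a real program, conditionals, loops, and shared helper functions come for free.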
I actually have some special corner case API endpoints that nginx just simply can not handle in the manner I would prefer. Further, the regex based location syntax, and even the prefix based ones, are not really what you want; you want a "Path" object that's aware of what / in a URL means, and does the right thing if you do/don't add it to the URL. (And it's not as easy as "/foo/bar/baz/?")
And we do. We don't use openresty (at least at the moment), we use a custom-built nginx that includes — among other things — the lua module. It's a great boon, but some things are still hard or impossible[1], and we don't use it for every request, opting for the location syntax, and all its flaws, for most.
[1]. Conditionally streaming an upload to a backend (i.e., if auth fails, don't stream) is impossible; nginx will buffer the entire request body, either in memory or on disk, and there is no way to change this behavior.
This looks really interesting. Can't believe I haven't run across it before.
I've been toying with getting a couple home servers going (replaced home service w/ business service, just installed two router based DMZ, looking at lightweight hardware -- probably will be Fit-PC products). I was going to run a separate reverse proxy and Lighttpd or similar but a quick glance makes it seem like Caddy could be used for both and more easily. Thanks for the link.
JSON doesn't allow comments, so it's only an OK-ish config format. YAML would have been better. But they probably wrote everything from scratch to be optimized, and decided that if they were writing a parser for a config file, they might as well invent a config file format too :-)
Well, OK. But I'd still argue that nginx was invented before YAML became popular. I don't think I saw YAML until I came into contact with Rails in 2007, which used the format extensively. And even then, almost everyone I met didn't take it seriously, saying it was a joke until Microsoft, IBM, etc. supported it.
JSON is horrible for a configuration language. Needing to put quotation marks around every single literal is insanely irritating if you ever have to write a lot of configuration. There's also a lot of application-specific syntax sugar that you cannot do if you stick with strict JSON.
Note that it's not really fair to call Apache's configuration language "XML". Apache relies on XML for some structured data in its configuration file, but all the individual directives are parsed separately from the XML.
I agree. When people started to move towards the "better" JSON standard from XML, I was puzzled. The inability to have comments, the need to "escape" every key name, the lack (in the past) of a schema and query language standard, etc.
To be fair, while I agree that JSON is bad for config, I would argue that JSON's objects and arrays eliminate the need for DTDs and standard query languages altogether in many cases. Of course, with sufficiently large and complex data stores, you'll want a documented structure and method for accessing that data no matter what format you use, but unlike XML, JSON provides structure to get started on a small scale very easily.
Consider this python:
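A minimal example in the spirit described (the structure and names are illustrative, not the original snippet):

```python
import json

def flatten(obj, prefix=""):
    # Walk a JSON object and return "dotted.path = value" lines.
    # As noted below, this simple tool assumes the top level is an object.
    lines = []
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            lines.extend(flatten(value, path + "."))
        else:
            lines.append("%s = %r" % (path, value))
    return lines

config = json.loads('{"server": {"port": 8080, "name": "example"}}')
print("\n".join(flatten(config)))
```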
Such a program will look similar in any language with a json library that maps objects and arrays to native data structures. Granted, my simple tool will fail if the JSON isn't an object, but it's a very simple matter to extend it to handle lists and literals. For many applications this is a huge advantage over XML, especially if the point of the JSON isn't configuration but rather inter-application communication (aka data serialization).
Have you seen nginScript (1)? It's a syntax for embedding snippets of JavaScript in your NGINX configuration. This allows for scripting application logic into standard NGINX config files via a very common and simple language. There are some more details and simple examples at this blog post.
So I'd say Nginx is actually the least annoying web-server to config of those I've used:
1. Nginx
2. IIS 6 (strange metabase thing)
3. Apache (XML, mostly)
4. IIS 7, 7.5 (XML, but with some of the files scattered through your Windows directory, and also some values aren't valid in some of the files).
5. Tomcat (XML plus madness).
I'd say the thing that makes all web-servers a pain is debugging which rules are passing/failing, and where are they sending their results to.
I think the thing which makes Nginx easier than the others is probably that it doesn't try to support the 'shared hosting' scenario, which adds a lot of mess.
Is that a positive thing? Of course, as in many other cases, IIS configuration is supposed to be generated by tools, but it always needs some tweaking, and XML makes it a total pain to edit.
It's not just web servers. I would say that it is because they want it to be human readable/editable, but JSON would probably be just as good in that regard.
I would prefer it if it were more like openssh or supervisor.
Though I suspect those styles of configs would make some of the more advanced configurations a pain.
Does anyone know how this architecture compares to cowboy? I know erlang is known for concurrency, but I'm assuming erlang isn't as fast as the custom-tailored C here. OTOH, I feel slightly less concerned about security issues with erlang.
Well, cowboy is an application library, so it's in a bit of a different league. With cowboy you can start writing your business logic or application code directly and get to work. nginx can do some of that with its Lua integration, but it's nicer for serving static files and proxying to back-ends.
I could see, for example, nginx in front of cowboy: it would strip away SSL, serve static pages, maybe handle authorization/authentication, and then proxy connections to cowboy servers in the backend for the application logic.
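On the nginx side, that split might look roughly like this (ports, hostnames, and paths are illustrative):

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/ssl/example.crt;
    ssl_certificate_key /etc/ssl/example.key;

    # static pages served directly by nginx
    location /static/ {
        root /var/www;
    }

    # everything else goes to the cowboy application server
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```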
Nginx is a solid piece of software with a lot of people working on it.
I find it sad that people expect good services built on top of it to be free as well. Without an enterprise/paid offering how else do you suppose people fund nginx? Right now the state of open source funding is abysmal.
It's a common question. As you correctly noted in the changelog, the vast majority of features have been placed in the open source version. The commercial product has gained great visibility and adoption but open source NGINX is very core to our company. In fact, development and feature releases to open source NGINX have rapidly increased since the inception of NGINX, Inc (the company) because it can support the development efforts :)
>These days the Internet is so widespread and ubiquitous it's hard to imagine it wasn't exactly there, as we know it, a decade ago. It has greatly evolved, from simple HTML producing clickable text, based on NCSA and then on Apache web servers, to an always-on communication medium used by more than 2 billion users worldwide.
The Internet[0] has a much richer history and larger ecosystem than just the World Wide Web. The Internet started nearly six decades ago, the web has only been around for a bit more than two.
Yeah, I thought that was pretty damn breathless of the author too.
Architecturally and usability-wise, we're more or less at the same place as a decade ago:
In 2004, both nginx and Gmail were released, and people were going ape about exciting "Web 2.0" technologies like DHTML and AJAX, whose paradigms more or less still underpin all modern development. There have been a lot of additions and streamlinings, but "dynamic pages/apps in the browser without a pageload" were the modus operandi then, and are the MO now.
[1] http://openresty.org/
[2] https://github.com/Mashape/kong