Nginx design details (aosabook.org)
151 points by amnigos on July 7, 2012 | 29 comments


This is great! I didn't know that volume II has been published...

Also be sure to read the chapter about the LLVM compiler family (written by LLVM's creator, who is an Apple employee now): http://www.aosabook.org/en/llvm.html

It's great. I learnt a lot from that chapter.

This book (The Architecture of Open Source Applications) is a treasure trove. Just look at the index here: http://www.aosabook.org/en/index.html and tell me you don't want to skip work (or school, or whatever else you're doing) for a week to read it all :D


Thank you for pointing out that other chapters are available for free too. It's indeed an amazing read. (It contains chapters about internal organization of Mercurial, Puppet, Haskell compiler and so much more.)


My only quibble with Nginx so far is the magical order in which it evaluates location directives. http://wiki.nginx.org/HttpCoreModule#location

On more than one occasion, you have to unintuitively try to figure out whether a rule at the bottom is getting matched before a rule at the top. The matching algo is quite intricate and easy to trip up on. It tries to help novices by matching literal strings first for them rather than simply making a note of the performance benefits of putting literal locations at the top, but ends up failing to reveal the true matching order at a glance - you need to check the verbose logs, yuck.


I think the configuration language will eventually have to be reworked to address some of the issues with magical ordering and unintuitive behavior. Making the configuration language lua, for instance, would be a huge step forward, because many complex configs already use the lua 3rd party module.

In addition to "location" eval order, "If" statements are another common gotcha.


It's not magic. It's just an optimization. Prefix strings are organized into a radix tree for fast configuration lookup.

But the only way to deal with regexps is to execute them sequentially.

btw, the official doc is here: http://nginx.org/en/docs/http/ngx_http_core_module.html#loca...
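
For anyone who finds code easier to follow than prose, here is a rough Python sketch of the lookup order as the official docs describe it. It is only a model under stated assumptions: `match_location` and the `(modifier, pattern)` tuples are made up for illustration, the prefix pass is a plain linear scan rather than a tree, Python's `re` stands in for PCRE, and named or nested locations are ignored.

```python
import re


def match_location(uri, locations):
    """Pick a location for `uri`.

    `locations` is a list of (modifier, pattern) pairs in config order,
    where modifier is one of "=", "", "^~", "~", "~*".
    Returns the winning pair, or None if nothing matches.
    """
    # 1. An exact "=" match ends the search immediately.
    for mod, pat in locations:
        if mod == "=" and uri == pat:
            return (mod, pat)

    # 2. Among prefix locations ("" and "^~"), remember the longest match.
    #    Config order does not matter in this pass.
    best = None
    for mod, pat in locations:
        if mod in ("", "^~") and uri.startswith(pat):
            if best is None or len(pat) > len(best[1]):
                best = (mod, pat)

    # 3. If the longest prefix carries "^~", regexes are not checked at all.
    if best is not None and best[0] == "^~":
        return best

    # 4. Regexes are tried one by one in config order; the first hit wins.
    for mod, pat in locations:
        if mod in ("~", "~*"):
            flags = re.IGNORECASE if mod == "~*" else 0
            if re.search(pat, uri, flags):
                return (mod, pat)

    # 5. No regex matched: fall back to the remembered prefix (may be None).
    return best


if __name__ == "__main__":
    locs = [
        ("", "/"),
        ("", "/images/"),
        ("^~", "/static/"),
        ("~*", r"\.(gif|jpg)$"),
        ("=", "/"),
    ]
    for uri in ("/", "/images/a.jpg", "/static/a.jpg", "/images/a.txt"):
        print(uri, "->", match_location(uri, locs))
```

The point of the sketch is that only step 4 depends on the order in which locations are written, which is why a regex at the bottom of the file can beat a plain prefix at the top.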


I'll fully grant you that the learning curve for locations is quite steep and it takes some getting used to, but the order is always logical and it's entirely possible to reason your way through it without needing verbose logs.


Perhaps you have not been thrown someone else's list of 15-20 locations and tried to determine why one was not matching, and what was being matched before it based on specificity, whether or not it was a literal or regex, whether or not it halted further matching in the order that specific type was defined.

The complexity is entirely unneeded. mod_rewrite could def use some minor convenience tweaks, but it is far more intuitive and therefore more effective for both understanding and debugging. I assure you that "match in order defined, unless passed through explicitly by a sub-condition" is sufficient and simple for 99.5% of cases.

nginx's "match literals, then match everything, then choose most specific, in the order defined by its type" is craziness. "Getting used to it" does not make it good; it's simply the first unnecessary step in making it usable.


I've spent about 3 years doing support in #nginx. Perhaps I've just been thrown someone else's list of 15-20 locations too many times.

Locations definitely need reworking, I never said they didn't. I just took issue with having to check the verbose logs, as it's perfectly possible without.


You don't have to if you've done it for 3 years and are familiar with nginx's "rules". Otherwise if you want to avoid wasting 3 hours, you sometimes have to.

Those rules are not well defined either. What is a "literal string" / "conventional string"? Ok, "=" is clear, but what is /images/, how about /im[ag]es/, and /images/$? Since these literal strings are not quoted, what black magic is used to classify the location directive as regex or literal? I would actually love to know this, probably the biggest stumbling point for me. Especially since it uses decoded uris, because "$" could have been a "%24" in a literal string path, OR it could be a regex end of line assertion...wtf?

if you can somehow understand this without having to poke the logs or through vast experience of prior trial and error, my hat's off to you, sir.


As has been covered, the type of matching being done is easy to tell from the prefix, though until you get used to them you will probably be checking the documentation for just what each prefix means.

The real WTF is choosing which location is the most specific when both a string literal and a regex match.

Ignoring ^~ locations, which will always be preferred over regex, the only locations that can actually be chosen over a regex are an exact match "=" or a default string literal (no prefix) that matches exactly. This part is typically the real cause of "I don't know WTF is going on here."


The type of match is made explicit before the string/pattern.

"location = string" and "location string" and "location ^~ string" are non-regex matches.

"location ~ pattern" and "location ~* pattern" are regex matches.

Matching order is unintuitive and therefore bad, and there are plenty of other quirks with nginx configs that should be made more intuitive, but confusion between regex and non-regex is unusual.

It's all described right here: http://wiki.nginx.org/HttpCoreModule#location
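
To make the rule concrete, here is a small Python sketch that classifies a directive purely by its modifier. `classify_location` is a hypothetical helper, not anything nginx ships, and it assumes the modifier is separated from the pattern by whitespace; named locations (`@name`) are not handled.

```python
def classify_location(directive):
    """Return (kind, pattern) for a line such as 'location ^~ /images/ {'."""
    parts = directive.split()
    if len(parts) < 2 or parts[0] != "location":
        raise ValueError("not a location directive: %r" % directive)

    kinds = {
        "=": "exact string",
        "^~": "prefix string, skip regexes",
        "~": "regex, case-sensitive",
        "~*": "regex, case-insensitive",
    }
    # A recognised modifier followed by a pattern decides the kind.
    if len(parts) >= 3 and parts[1] in kinds:
        return kinds[parts[1]], parts[2]
    # No modifier: the argument is a literal prefix, whatever characters
    # it contains.
    return "prefix string", parts[1]


if __name__ == "__main__":
    for line in (
        "location /images/ {",
        "location /im[ag]es/ {",
        "location /images/$ {",
        "location ~ /im[ag]es/ {",
    ):
        print(line, "->", classify_location(line))
```

So `/im[ag]es/` and `/images/$` without a modifier are plain literal prefixes; the brackets and the dollar sign only mean something after `~` or `~*`.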


ok, read through it again. the regex distinction is clear, that was my bad. the choice of prefix symbols is pretty poor. ^~ looks related to ~ and ~*. in the docs, the regex prefixes are presented first, so i associated ~ as a regex indicator. how wrong i was.

i generally disagree with ^ being the universal "not" indicator out of context since it's a "beginning of line assertion" when not preceded by "[" within a regex. The fact that it indicates what follows is a regex-halting literal string prefix match (exactly what ^ would indicate in a normal regex) is plain confusing.

thanks


Another problem with the existing worker model is related to limited support for embedded scripting. For one, with the standard nginx distribution, only embedding Perl scripts is supported.

For those looking for full-featured and robust embedded scripting support in nginx, Yichun Zhang's lua-nginx-module is highly recommended. https://github.com/chaoslawful/lua-nginx-module


The slides from a talk about it at the London Lua group the other day are here http://www.londonlua.org/scripting_nginx_with_lua/slides.htm... video coming shortly.


My one experience with this server was quite bad. The web group we contracted used it, but would constantly run into issues us developers could have helped them with had they used Apache. Things like needing to return certain headers in the request, or return certain status codes, or do certain things handled easily by Apache's large ecosystem of modules became very grueling trials full of meetings with the contractors running the thing.

Just like Java is sometimes the best language to use because it has huge mind share despite the warts, sometimes Apache is the right server to use just because that's what almost everyone uses and it makes things so much easier than using something more rare.


You would run into the same situation with contractors who used apache, if they didn't know apache very well.

Returning status codes from nginx is easy. Adding headers is easy. Perhaps you could elaborate about "certain things handled easily by Apache's large ecosystem of modules".

Apache's large ecosystem of modules is built around doing everything in the webserver. More often these days, that stuff is handled by a dynamic application. Nginx talks to those via fastcgi, scgi, uwsgi, or plain http proxying. Nginx does not need to know anything about how the apps do whatever it is they do.
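
As a rough illustration of "handled by a dynamic application", here is a minimal WSGI sketch in Python where the app itself picks the status code and headers, so the web server in front only has to pass the response along. The `/old` path, the `X-App` header, and the `call` helper are all invented for the example.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI app that chooses its own status code and headers."""
    if environ.get("PATH_INFO") == "/old":
        # The app issues the redirect; no server-side rewrite module needed.
        start_response("301 Moved Permanently", [("Location", "/new")])
        return [b""]

    body = b"hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        ("X-App", "demo"),  # hypothetical custom header
    ])
    return [body]


def call(path):
    """Invoke the app directly, without a server, and capture the response."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call("/old"))
    print(call("/"))
```

To try it behind a real front end, the same `app` can be served with `wsgiref.simple_server.make_server` on a local port of your choosing and proxied to over plain HTTP.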


Can you give concrete examples of what you couldn't do?

Here's the wiki on how to set headers: http://wiki.nginx.org/HttpHeadersModule

Here's the wiki on returning a particular http code: http://wiki.nginx.org/HttpRewriteModule#return


I'm curious what issues you ran into as well. I've used it in production environments for the past few years after moving away from Apache. It was a near seamless transition for my team, myself included. We use it behind some heavy iron load balancers and in front of a dozen web servers and it's been a generally pleasant experience.


Why do you use a web server (Nginx) in front of other web servers? I'm pretty new to this. Thanks.


I've seen situations where nginx sits in front of Apache and serves static files directly, but proxies requests for PHP through to Apache for easier setup.


As described, using nginx or lighttpd for static files and as a reverse proxy for the domain achieves more than one goal:

* you can serve pages faster;
* you can mix more than one app from different servers in the DMZ on the same domain (to share domain-based mechanisms such as Flash or cross-site Ajax) by rewriting the URL;
* you can have one front end (nginx) and several servers, which can handle failover with a heartbeat mechanism;
* you could (when SSL certificates were only IP-based) share one SSL certificate across more than one back end (VIP).

It is a pretty low-cost, quite scalable architecture. I guess you could do it with apache, but I dropped apache since its licence is as understandable as its configurations.


"would constantly run into issues us developers could have helped them with had they used Apache"

I'm not sure that says what you think it does about Nginx and the developers you deal with...


Not in this book, but worth taking a look at if you're into reading source code for large open source projects, are Redis and Mongrel2. Both are really well written C codebases.


I've read several times about people using the Go language's (golang) web server in conjunction with Nginx. Apparently Nginx provides some benefit that Go's web server doesn't, but I haven't seen that thing called out. Can anyone tell me what it is? Thanks. (I'd also love to see a performance comparison between these two.)


I do not know about others, but I do it because not mixing web server responsibilities into the application server is a good idea. For example, I leave things like gzip compression and TLS handling to nginx without having to deal with them in my app.


Using HTTP in general as an internal transport protocol has always struck me as odd. But I come from a financial services perspective where latency is the key driver.


I use ZeroMQ or Go's RPC libraries for internal communication and mostly use nginx as reverse proxy and to serve static files on the edge.


I do something like this so that I can serve more than one standalone golang web application behind the same server. It allows me to multiplex several different go binaries and an apache server on the same box with the same address. These are all personal projects that aren't high traffic so one vps is fine and I don't have to remember which app is on which port.
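
Conceptually the front server is just keeping a table like the one in this Python sketch. It is a toy model of the routing decision, not nginx configuration, and the path prefixes and ports in `BACKENDS` are invented for illustration.

```python
# Hypothetical mapping from URL prefix to the local address of each app.
BACKENDS = {
    "/blog/": "127.0.0.1:8001",   # one standalone binary
    "/wiki/": "127.0.0.1:8002",   # another standalone binary
    "/": "127.0.0.1:8080",        # everything else, e.g. the apache server
}


def pick_backend(path, backends=BACKENDS):
    """Return the backend whose prefix is the longest match for `path`."""
    matches = [prefix for prefix in backends if path.startswith(prefix)]
    if not matches:
        return None
    return backends[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("/blog/2012/07/post", "/wiki/Home", "/index.php"):
        print(path, "->", pick_backend(path))
```

Clients only ever see one address and port; which process answers is decided by the prefix.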


I much appreciated the article, and even more so when I discovered from other commenters that http://www.aosabook.org is a collection of architectural descriptions of a lot of open-source projects. I immediately read about sendmail by Eric Allman (http://www.aosabook.org/en/sendmail.html) and I actually appreciated it more than the article about Nginx. There is a lot of wisdom in it, and now I understand the reasons for some of the architectural decisions that were taken for sendmail.



