
I would expect a lot more from Google. All of these ideas are way too elementary and some are just misguided. Advice like "crop your image"...come on. The misguided ones are things like "use single quotes instead of double quotes when printing in PHP, since PHP doesn't scan the string text for variables". That is just misguided and probably the last thing you should worry about; it's orders of magnitude away from any optimization that actually impacts your site's performance.
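To make the point concrete, this is the scale of the micro-optimization being discussed (a tiny sketch; $name is just a placeholder variable):

    // double quotes: PHP scans the literal for $variables and escape sequences
    echo "Hello, $name\n";

    // single quotes: no interpolation, concatenate instead
    echo 'Hello, ' . $name . "\n";

The difference in parsing cost is negligible next to a single slow query or an uncached asset.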


Look through the rest also - the advice really varies. I found the part about avoiding browser reflow [1] very useful. Just last night I was running into usability issues with a piece of JS that executed on each keystroke and (I now realize) sometimes caused a reflow. I fixed it, but now I know what was actually broken :)

1. http://code.google.com/speed/articles/reflow.html


I disagree; I thought the point about not copying variables just to make the code look "cleaner" was a good one. I had never considered that a malicious user might pass 512 KB worth of data in through a textarea.
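For what it's worth, a rough way to see the cost (a hedged sketch: the str_repeat line just simulates a ~512 KB textarea submission, and exact numbers depend on the PHP version):

    // simulate a user posting ~512 KB into a textarea
    $_POST['comment'] = str_repeat('x', 512 * 1024);

    $before  = memory_get_usage();
    $comment = $_POST['comment'];   // copy-on-write: no real duplication yet
    $comment .= "\n";               // writing to the copy forces a full ~512 KB duplicate
    printf("extra memory used: %d bytes\n", memory_get_usage() - $before);

The assignment itself is cheap thanks to copy-on-write; the memory hit comes as soon as the "cleaner" copy is modified.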


Very pretty interface and good features. We have our own debugger/profiler with a similar feature set (not as pretty though) that is integrated into our framework. One of the most important features we added to our profiler is the ability to log/save the profiler data when a page generates slowly: store your profiler output and all client info to a database whenever a page takes longer than some configurable threshold to generate. You'll be surprised at all the slow pages. A page may generate quickly while you are testing, but under real load you'll uncover a slew of information that will help you fix and improve your application.

On a heavily loaded site under real usage you'll see which queries are locking and which are interdependent on others. You'll also run into the issue of slow clients, where you'll see slow generation times on larger pages because Apache is flow-controlling PHP. Adding this facility to a tool like this should be easy, and it will yield a lot of useful additional data.
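The general shape of that logging hook, as a rough sketch (the 500 ms threshold and the save_slow_profile() helper are hypothetical; a real version would also persist the profiler's own timing data):

    // at the very top of the request
    $GLOBALS['request_start'] = microtime(true);

    // registered once during bootstrap
    register_shutdown_function(function () {
        $elapsed = microtime(true) - $GLOBALS['request_start'];
        if ($elapsed < 0.5) {           // only keep pages slower than 500 ms
            return;
        }
        // hypothetical helper: write the slow request to a database table
        save_slow_profile(array(
            'uri'     => $_SERVER['REQUEST_URI'],
            'elapsed' => $elapsed,
            'client'  => $_SERVER['REMOTE_ADDR'],
            'ua'      => isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
        ));
    });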

Nice work.


It would be their systems that are misconfigured.

Well, I'm asking for a few reasons.

1. To get advice like yours, which I followed in addition to sending emails to their main site.

2. This is a somewhat interesting issue in that more and more sites simply pass every request to their script engine and then process the URI to determine what to do with it. This seems to be true of HN as well, since you get a response to every request.

If you go to http://news.ycombinator.com/test.gif, that request is most likely being processed by the back end. Now multiply this by the 10 or 20 such requests that accompany every real page load from one of these misconfigured systems. This of course wastes a bunch of resources: at the least it may start sessions that don't need to be started; at worst, db connections are made before the module information is determined and the request can be rejected.

This is not just a single person on the net with a misconfigured cache/proxy, but an ISP. This potentially means that all users of that ISP are causing much more load on sites than they need to be.

Is this trivial? It may seem so. In the old days this was a non-issue: the server did not have the file, and minimal resources were used to send a 404 response. Today that's less likely to be the case, as your web app may be running bootstrap code, starting sessions, and opening db connections just to determine that this is a bad request (see the sketch after this list).

3. It's a warning to all of us developers that misconfigured systems like this do exist out there, and that you can take measures when designing your site to handle such requests with minimal resources.
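One cheap mitigation along those lines, as a hedged sketch (the extension list is just an example; a correctly configured server would never hand these requests to PHP in the first place):

    // index.php, before any bootstrap/session/db setup
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    // a static-looking URI that reached the front controller is almost certainly bogus
    if (preg_match('/\.(gif|jpe?g|png|ico|css|js)$/i', $path)) {
        header('HTTP/1.0 404 Not Found');
        exit;                           // bail before touching sessions or the db
    }

    // ...normal bootstrapping continues here...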


Apache w/mod_php has the best latency compared to any FastCGI setup. When it comes to high concurrency, latency and time-to-finish are your biggest issues. Slow clients are another killer (slow clients are users on a slow connection who take an order of magnitude longer, or more, to download the page data than it took to generate). If your application has fast page generation (less than 20ms), your best bet is the following setup...

nginx or Varnish as a reverse-proxy front end (depending on your load you can turn keepalives on here). The front end isolates your www/php/db from slow clients, making sure the request gets processed fast and resources are released, while a lightweight proxy process handles delivering the data to the client. On the back end use Apache/mod_php with a limit of only 50-100 clients.
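Roughly what that looks like, as a sketch rather than tuned configuration (the port, MaxClients value, and timeouts are placeholders):

    # nginx front end: hold the slow clients, proxy to a small mod_php pool
    http {
        keepalive_timeout 15;                    # keepalives to clients are cheap here

        server {
            listen 80;

            location / {
                proxy_pass http://127.0.0.1:8080;        # Apache/mod_php back end
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $remote_addr;
            }
        }
    }

    # Apache back end (prefork): keep the pool small so PHP workers free up quickly
    <IfModule mpm_prefork_module>
        MaxClients 75
    </IfModule>
    KeepAlive Off    # the proxy holds the client connection, not Apache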


Benchmarks that are not completely anecdotal are really hard to produce. For starters you need the following...

1. Don't run the client on the same server. If you do, you have no business trying to test for high concurrency. Isolate the variables.

2. Consider the size of the file you are serving. Are you close to saturating the connection between the client and server? Most of the time this is the case.

3. Concurrency is hard to test because most of the time the client is the bottleneck in the test. Don't use Apache Bench for anything like this, as its behavior at high concurrency leaves much to be desired.

4. A lot of other details need to be compared to make a benchmark useful. Are you using keepalives on both, or on neither? Nginx workers/processes vs. Apache threads/clients. Are you comparing apples to apples? How's your TCP backlog in a case like this? What kind of I/O model are you running on each? Are you using sendfile on both or only one?

Nginx is a great server, and probably a better choice for static files, but data like this is like saying "the other day I saw some kind of Honda pass some kind of Nissan." There's no useful information to infer about either.

