I'm part of the Minecraft forum/wiki team (I created them both originally) so if you guys have any questions, feel free to shoot them at me, or you can join #rsw on irc.esper.net.
(In advance: I'm not one of the sysadmins. I guess I'm a PHP developer, but I suck too much to work on the sites at these traffic levels; I just do the whole community stuff.)
Also to clarify: it's the forum and wiki combined (ignoring our downtime) that are pushing that much traffic, not just the wiki.
Edit: Also to hijack my own comment, if any of you guys with your fancy startups want to advertise to a community of indie gamers feel free to email sam@redstonewire.com :-)
You've got articles that have ratings, you've got comments on those articles that have ratings, articles and comments are sorted according to rating, people earn karma from posting good comments/articles, and everything has tags.
So personally I don't see a false dichotomy; even if Digg is more dynamic / complex ... WTF are they doing with those 500 servers?
Digg is more like Facebook and Twitter than StackOverflow. Each of Digg's logged in users gets their own "News Feed" based on the users they follow/friended/are most similar to.
StackOverflow on the other hand is much simpler - questions and responses, plus users and voting. AFAIK there's no collaborative filtering going on at SO, be it user or item based.
I don't know why Digg needs so many boxen, but I did find Spolsky's comparison disingenuous.
Whether it's overkill or not depends entirely on what the servers are being used for. Without actually working there it's hard for one to say for sure.
You don't know what each server is doing. They might have 10 servers being used by an internal marketing analytics team, with 40 support servers for development, QA, testing, and disaster recovery of those specific services.
What sort of computing resources do you think it took for Google to develop the autopilot car? Would you have been able to determine that by looking at their homepage and the services they provide? No. That's the point. I'm not suggesting that Digg is doing anything so interesting, but most of those 500 servers are almost certainly NOT being used to support their website directly, they're being used by the business for other things.
The title was supposed to be a serious representation of our traffic numbers (our number of servers wasn't supposed to be part of the comparison; the poster here is a slippery snake), but we can pretend it was if you like. Nobody likes Joel, so it fits in well.
Spolsky compared two rather more similar sites though. This one is a wiki where most of the content is static, served out of cache. Still, I agree that such comparisons are on a very weak footing. A single feature could easily kill any valid comparison.
MediaWiki is fantastic software; we'd be able to operate on a lot less hardware if it were just MediaWiki, but because of the forum we had to boost everything up. For anyone who ever considers using phpbb for anything serious: please don't.
We're looking at a couple of alternatives and I can report back with our findings when we know enough. Right now we're looking at vBulletin, which powers some of the larger forums; it is apparently very good if you're willing to strip out the weak parts (the search, by all accounts, is terrible).
It looks as if the best approach is to roll your own; phpbb seems to be designed with the smaller user in mind, so while routing every single file through file.php for easy processing might work well for Johnny and his friends' forum, once you hit a large scale it becomes rather a pain.
So yeah, no idea, we'll find out soon though, I'll report back when I know :-)
I think there is an important lesson in this even if it's apples to oranges. Often in my career I've been in a debate with a non-engineer (product person, CEO, etc.) about why a certain feature sounds good to them, but a variation of it which provides most of what they want is so much better because I can keep the page mostly static. Sometimes I've won that debate and sometimes I've lost. Being able to hold up examples helps: such-and-such wiki serves x million pages with 3 servers because it's mostly static vs ...
I think these types of stories are misleading for startups.
Many startups would do better to add server capacity in the short term rather than spend lots of time optimizing to cut costs, since that work is typically invisible to the user.
For example, a 4GB Linode VPS is $160/month, so you can have 34 of those ($5440/month) for the cost of one developer (salary of $67k based on: http://www.simplyhired.com/a/salary/search/q-php+developer/l...). Also, many startups struggle to recruit good developers, so does it make sense for them to spend all their time optimising code to perform on cheap hardware, rather than improving the product in a way that's visible to the user?
Possibly true, but your accounting ignores recurring vs. one-time costs. If you can pay some external person $10k to tune your setup and save $4000/month, and aren't already swimming in money, obviously that's a smart thing to do.
It also ignores all the very real other side effects of inefficient design (the most common cause of poor performance). For example, bad user experience (if you need to spread out your traffic onto many servers that usually implies a sizeable latency on each page view), engineering drag due to technical debt, and engineering drag due to cumbersome infrastructure and deployment overhead. All of these things matter.
In general a company with a more efficient solution will have an easier time with just about every aspect of development and deployment, which pays huge dividends. However, if you find that engineering such efficiency is too difficult due to fundamental design choices or legacy systems then sometimes it's not worth killing yourself to fix.
Definitely, nobody should take us as an example for their startup/business. We're literally a small forum that inherited huge success and had to rapidly deal with scaling up, we're not a business and money is not our goal, so if someone were to base their business off of what we've done it might not turn out too well.
The current servers we operate were paid for with donations from our users because our ad revenue has yet to arrive heh.
Distributing your app across 34 servers will require a non-trivial development effort itself.
Not to mention, there was a story here just this week about how communication between EC2 nodes may be a lot of reddit's performance trouble. Scaling horizontally is inevitable at a certain scale, but it's no panacea.
For $200 per month you can get a quad core X3220 with 8 GB RAM and 2x 500GB disk with a large amount of bandwidth included: http://www.100tb.com/ .
I don't fully understand the love for large VPSes (that aren't even all that large) compared to dedicated hardware, which has a better chance of offering higher memory bandwidth, more RAM, and faster disk access; though I do understand that many are very happy with Linode as a business.
I'll talk to the guy who actually tested them when he wakes up, but from what I understand network speeds were terrible. I'll get back to you when I know :-)
You really think for $200/month they're going to let you saturate the equivalent of a 300 megabit connection 24/7? You'll get shut off if you approach anything close to that, I'd bet, just like all the "unlimited bandwidth" hosts. That, or the transfer speeds your server gets will be nothing near the 300 megabits that'd be required to use 100 terabytes in a month.
More to the point, for $5K you get a 2x 32GiB RAM, 8-core, 4-disk server: capacity for 16 of your 4GiB VPSes. So with roughly two months' worth of that Linode spend up front, then another $200/month, you can get the same capacity as those 34 Linodes. Even if you have to pay $100/hr for your hardware guy (which is above market), you are saving a ridiculous amount of money.
It depends on the type of optimization. For page caching I definitely agree, not because it's a waste of effort, but for two other distinct reasons:
First, because you might be papering over more fundamental performance issues that will still hurt you in the long term, and will indeed be harder to spot once caching is in place.
Second, because cache invalidation is quite often non-trivial, and doing it right may require a somewhat thick layer of code, and a certain discipline moving forward. This will slow down development if you are in rapid pivot mode.
However if your performance fundamentals are already resolved, and the business model is in place, and you expect the code to be around for a while, then putting the effort into caching will be amortized over many years and pay many dividends not only in server costs, but also in user experience due to fast page loads, and also in correctness because you will have time to get the caching right rather than scrambling to add it at scale and potentially serving stale data to millions of people instead of thousands.
A decent PHP dev can figure out how to hook up a memcached server and install APC. Most of the work is already done; I think it would be worth the couple of hours to learn about caching. A little caching can go very far.
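For anyone curious what that looks like in practice, here's a minimal read-through-cache sketch using APC and the Memcached extension. The key names and the render_topic_from_db() helper are purely illustrative assumptions, not anything from these sites' actual code:

    <?php
    // Minimal read-through cache sketch (illustrative only):
    // APC is a per-server cache (no network hop); memcached is shared
    // across all web frontends.

    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211); // assumes a local memcached instance

    function get_topic_page(Memcached $mc, $topicId) {
        $key = "topic_page:$topicId";

        // 1. Cheapest: local APC user cache.
        $html = apc_fetch($key, $hit);
        if ($hit) {
            return $html;
        }

        // 2. Shared cache: memcached, visible to every frontend.
        $html = $mc->get($key);
        if ($html !== false) {
            apc_store($key, $html, 60);         // keep a short-lived local copy
            return $html;
        }

        // 3. Miss: do the expensive DB query and templating.
        $html = render_topic_from_db($topicId); // hypothetical helper

        $mc->set($key, $html, 300);             // share for 5 minutes
        apc_store($key, $html, 60);
        return $html;
    }

The hard part, as mentioned above, isn't this plumbing; it's deciding when to invalidate those keys.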
Which wasn't our point; the OP here misrepresented us. We wanted to explain how popular we were, and we realised we had as much traffic as StackOverflow, so we used that as a traffic comparison; our point was not to compare servers. As you can see if you read through the linked topic, at no point did we compare ourselves to SO beyond traffic; the OP here is at fault :)
Why was your posted title on reddit then: "Minecraftwiki.net and minecraftforum.net now serve more traffic than Slashdot and Stackoverflow!"? I just added the server count and PHP part, which were both mentioned in that thread, to point back at the earlier false-dichotomy post. You guys shouldn't have posted a title like that then; you compared yourself to SO and Slashdot, not me.
Yes, we compared our traffic levels, not our server count. The title of this post implies that we're saying "we have as much traffic as SO and they use more servers!", which wasn't the point. The point was that StackOverflow is a tangible comparison for traffic, not hardware. Anyway, I replied to you above explaining; I misunderstood your intentions.
Your title made our servers and setup the focus, that wasn't the point at all.
I agree. Over here we do 70m (high write ratio) pages per month on 1 server handling all apache/php/mysql. Hardware is really fast these days if you tune it to any degree.
That was a very interesting article, thanks. One question, if I may: When using a reverse proxy, it makes no sense to have keepalives on for Apache, correct? The proxy takes care of the keepalive and leaves Apache free for other requests?
Correct. The reverse proxy pulls from Apache over the fast local network, and then passes the data to the slow clients, so Apache is connected for a shorter time. Basically you're trying to minimize the time a "memory expensive" process like Apache is open per client.
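As a rough sketch of that split (the directives are real, but the values and port are illustrative assumptions, not this particular setup's config):

    # Backend Apache (httpd.conf): don't hold a worker open between requests
    KeepAlive Off

    # Front proxy (nginx.conf, http context): keep the client connection
    # alive at the cheap layer instead
    keepalive_timeout 30;

    server {
        listen 80;
        location / {
            proxy_pass http://127.0.0.1:8080;  # the local Apache backend
        }
    }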
No, I'm not referring to this parameter in the write up.
net.ipv4.tcp_slow_start_after_idle, which is on by default on most distros, applies to keepalive connections.
This causes your keepalive connection to return to slow start after TCP_TIMEOUT_INIT, which is 3 seconds. Probably not what you want or expect. For example, if you have keepalives of, say, 10s, you'd expect that a request after, say, 5s would have its congestion window fully open from previous requests, but the default behaviour is to go back to slow start and close your congestion window back down. So you want to turn this off on your image servers and other keepalive systems.
The tuning which I talk about is to actually increase the default initial congestion window size. The result is an advantage for both non-keepalive and keepalive connections. There is no sysctl parameter that will allow for this control; the behaviour is hardcoded into the TCP stack, and hence requires direct modification and a recompiled kernel.
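For what it's worth, the sysctl half of this (not the initial congestion window change, which as noted above needs a patched kernel) is just:

    # Stop keepalive connections from falling back to slow start after idle
    # (defaults to 1 on most distros)
    sysctl -w net.ipv4.tcp_slow_start_after_idle=0

    # or make it persistent in /etc/sysctl.conf:
    # net.ipv4.tcp_slow_start_after_idle = 0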
Also, a request if any of you have experience with this: we're interested in new ad strategies (that retain our "minimalist" approach, allowing expansion without upsetting users), so if anyone either works for an advert company or has experience at our volume (or similar), we'd love to hear from you. sam@redstonewire.com :-)
(Hopefully this is okay, I'm not a regular HN user, mainly lurk, it isn't mentioned in the guidelines but it might be one of those secret rules that are learned as you go along... be gentle!)
We're not connected with the people behind Minecraft; we're totally separate entities. The forums and wiki are community run, and while we've had brief discussions with the Minecraft company (Mojang), nothing has come of it. The general consensus is that we're best operating as separate entities: Mojang can focus on game development, we can focus on growing the community, and we can remain impartial (although whether or not that is an issue, I have no idea). We've never had a penny of Minecraft proceeds :-)
I'll send an email now, it'd be great to talk to some people over at cpmstar!
I ran a wiki once where I changed the search box to a Google search box. Made some good money even with very little traffic. Of course you're taking people away from your site, but my website was pretty rubbish so I wasn't too worried about that personally.
Just to clarify, the OP here titled it in a manner that misrepresents what we were saying. Also, we're far from serving static pages. Granted, the wiki (which is 50% of our traffic) is pretty static and we could easily run that from a single server; the reason we have such a high number of servers is the forum, which is the other 30m page views, and it's phpbb, it's... well, let's not go there.
This submission is poorly titled; our intention was never to claim we're better than SO (we're very different... just like SO is very different to Digg). It was just a good comparison to make, as in "Joel said they're serving 60m page views a month and SO is huge; well, we're doing the same, now you can see how big we are!", not "We serve the same as SO, therefore they suck!".
You really should be using Community Tracker, my other baby :) http://community.mediabrowser.tv/ I'm so happy I moved off phpbb; it was causing nothing but grief.
No hard feelings here, I think you are building an awesome business.
We're in the process of moving; it's just a lot of work at our size, we have to make sure everything works :) We actually started out with FluxBB (my choice), but users got tetchy, and as we moved from being "just a forum" to being a "community" we had to go forward with new features. This was back before we had adverts, though, and the $250+ for "proper" forum software wasn't something we wanted to spend. Here's an idea of how much we've grown: http://i.imgur.com/eenut.jpg
That software looks interesting although as I'm not a sys admin, all that matters to me is how pretty it is and that doesn't have enough rounded corners ;)
Also, it's important to note that while 2 million pages a day sounds like a lot, it's around 23 pages/s if distributed evenly. Even if traffic peaks at double or triple that, it's still very feasible for a single server running a dynamic app to serve that.
I don't get how Minecraftwiki is serving "more traffic than StackOverflow." I think we have at least twice their daily traffic. All our numbers are on Quantcast; feel free to check.
We weren't challenging you or anything, I just noticed that tweet (it was mentioned here) and I thought "hey we're doing the same, we can use them as an example of how big we are!". I'm just a dumb kid who has never had anything he created this successful before and being able to say "We're as big as stackoverflow" is crazy.
As I've said elsewhere, the person who titled this is a silly person, they misrepresented what we said.
Yes, you've said that several times. My intention was two-part: one, to show that the previous post by Spolsky was a false dichotomy, and two, to point out your success - which is how almost everyone here understood it.
However, your original post on reddit (well, your sysadmin's) was titled: "Minecraftwiki.net and minecraftforum.net now serve more traffic than Slashdot and Stackoverflow" - so you can't blame that part on me, just the server count and PHP part.
I'm not good at the whole English thing (even though it's my only language). I didn't mean to imply you were at fault, just that your title didn't represent what was actually happening. Also, I didn't realise that you'd posted a comment here pointing out it was supposed to be a joke; I'm used to reddit, where it's pointed out when a comment is by the submitter.
This is more a testament to HTTP caching and Varnish than to PHP, 4 servers, or MediaWiki. If you can cache the entire page and serve it out of cache for most of your requests, you're in a very good position.
No, Minecraft is seriously that big. If you trust Alexa much, you'll find that we (forum/wiki) are in the top ~5k for both sites; Minecraft is top 3k last I checked. It's been insane recently... what really hammers it home is that this is a product people have purchased, so it's going to be around for a long while. While we probably won't maintain the current traffic once the game settles down into a normal routine, we sure won't be dying for many years, which is what I love about this.
Minecraft is like Garry's Mod: the game is what you, the player, want to make it, and this will lead to a lot of future success alongside the current success.
If you're interested, here's a (not very accurate) list of where Minecraft has been featured: http://www.minecraftforum.net/viewtopic.php?f=3&t=2162 which includes Australian TV, physical magazines, huge tech blogs, gaming blogs, forums... everywhere! I don't think I'll ever see anything like this happen again in my life (and I'm only young) -- Minecraft is incredibly unique.
Interesting -- I know nothing about it, but it sounds like it might appeal to people who want to learn video game programming, at least perhaps the ones that don't want to go into hardcore engine programming.
I'm just wondering... why do they use Varnish and HAProxy and nginx? This is quite a redundant setup. It would be a lot more efficient to put nginx on lb01 and leave only PHP on the fe* nodes.