All this interest in how Twitter could scale really highlights the opportunity for someone who knows what they're talking about to write an intelligent analysis. After ten or more tries, it's clear TechCrunch doesn't employ that person.
Almost every analysis I read, including this one, points to Rails as the major bottleneck but then can't say anything concrete to justify it. This article says that Rails is no good for processing-intensive tasks and that you should use C. Rails is a framework for serving web pages; it isn't doing any of the processing or message queueing. I'm sure that Rails won't work out of the box with whatever backend you would need for a stable Twitter, but I haven't heard anyone address the actual technical issues. Is it just a matter of having to tweak Rails, or is there a reason that Rails is the wrong architecture?
Blaine released Starling, the queue system that Twitter uses. What do people think of that? Was it the right architecture? Is there some out-of-the-box proprietary solution that would have worked better? What were the tradeoffs of writing it in Ruby?
People seem to be catching on that one of the problems is a small operations team. Some people say there's just one person, but I'm pretty sure there are two. Either way, I can't imagine that with just two people they've had enough time to put in fully redundant systems, let alone a real staging environment. What should operations look like for something Twitter's size?
On the Gillmor Gang, Blaine mentioned that the reason Twitter's track feature only works on the phone and not the web is that it's much easier for them to do broadcast than to do the lookups necessary for historical display. Yet I use Summize and Tweetscan to get a web-based track feature. What are the real issues there?
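My guess at what Blaine meant: broadcast-style track can be handled entirely on the write path with an in-memory keyword index, while a web view implies querying stored history. Here's a rough Ruby sketch of the broadcast side; all the names are made up for illustration, not Twitter's actual code:

    # Hypothetical keyword tracker: fan out each incoming message to trackers.
    class Tracker
      def initialize
        @subscribers = {}                   # keyword => [user ids]
      end

      def track(user_id, keyword)
        (@subscribers[keyword.downcase] ||= []) << user_id
      end

      # Called once per incoming message; no historical lookup needed.
      def broadcast(msg_text)
        msg_text.downcase.split.each do |word|
          @subscribers.fetch(word, []).each { |uid| yield(uid, msg_text) }
        end
      end
    end

A web-based track, by contrast, means either storing the matches as they happen or searching stored messages after the fact, which is presumably what Summize and Tweetscan are doing.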
I'm sure there's lots to get into here if someone with the chops to get into it would speak up. But given the lack of in-depth commentary, maybe there really aren't very many qualified people.
I'm sure by now the Twitter team must be very aware of what their bottlenecks are. I don't know what the typical Twitter user does. I personally send 1-5 messages per day but refresh the web timeline all the time. If most people are like me, then I would want to keep the last 24-48 hours' worth of messages in memory. This shouldn't be a problem, as we are talking about 1M messages times 200 bytes or so uncompressed, roughly 200 MB.
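To make that concrete, here's a minimal Ruby sketch of such an in-memory recent-message table. Everything in it (the RecentMessages class, the Message struct, the 48-hour window) is hypothetical and only illustrates the sizing argument above; a real system would index by author rather than scan.

    # Hypothetical in-memory "recent messages" table.
    class RecentMessages
      Message = Struct.new(:id, :user_id, :text, :created_at)

      def initialize(window_seconds = 48 * 3600)   # keep ~48 hours
        @window = window_seconds
        @messages = []                             # oldest first
      end

      # Append a new message and expire anything older than the window.
      def push(msg)
        @messages << msg
        cutoff = Time.now - @window
        @messages.shift while @messages.first && @messages.first.created_at < cutoff
      end

      # Recent messages by one author, newest first (a scan here; fine for a sketch).
      def by_user(user_id, limit = 20)
        @messages.reverse_each.select { |m| m.user_id == user_id }.first(limit)
      end
    end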
In order to generate a page, you'd need a "follow" matrix. Given a user, you want to know who this person is following and pick up enough recent messages to generate a page view from the memory table above. This "follow" matrix would be relatively sparse (I imagine). It must be persisted often, but given that it's queried all the time, it would have to be in memory as well, at least partially. Assuming that Twitter has 2M users and the average user follows 40 people (wild guess), the entire matrix would take up 80M ids (a few hundred megabytes in a hashmap or something).
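Under those assumptions the follow matrix could be as simple as a hash of user id to followee ids, and a page view is just a merge of the followees' recent messages. The sketch below reuses the hypothetical RecentMessages store above; again, every name and number is illustrative.

    # Hypothetical sparse follow matrix: user_id => array of followee ids.
    class FollowGraph
      def initialize
        @following = Hash.new { |h, k| h[k] = [] }
      end

      def follow(user_id, other_id)
        @following[user_id] << other_id unless @following[user_id].include?(other_id)
      end

      def following(user_id)
        @following[user_id]
      end
    end

    # Build a home timeline: merge recent messages from everyone the user follows.
    def home_timeline(user_id, graph, recent, per_page = 20)
      graph.following(user_id)
           .flat_map { |fid| recent.by_user(fid, per_page) }
           .sort_by(&:created_at)
           .reverse
           .first(per_page)
    end

At ~40 followees per user that's ~40 small in-memory lookups per page view, which is cheap; the hard part is keeping all of this persisted and consistent, which is the point below.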
Also, I'd cache personal timeline requests for a minute or so, to avoid being killed by people hitting Ctrl-F5.
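A one-minute cache on top of that could look like the sketch below (purely illustrative; a real deployment would presumably use memcached rather than an in-process hash).

    # Hypothetical per-user timeline cache with a ~60 second TTL.
    class TimelineCache
      def initialize(ttl = 60)
        @ttl = ttl
        @entries = {}   # user_id => [rendered_page, cached_at]
      end

      def fetch(user_id)
        page, at = @entries[user_id]
        return page if page && Time.now - at < @ttl
        page = yield                        # regenerate, e.g. via home_timeline above
        @entries[user_id] = [page, Time.now]
        page
      end
    end

    # cache.fetch(user_id) { home_timeline(user_id, graph, recent) }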
Of course the devil is in the details. All that information needs to be persisted and crash-recoverable. For the numbers above, everything might work on a single beefy server, but probably not for long, so the system has to be distributable.
It's not an easy problem, but a competent team should be able to solve it (perhaps not in the most elegant and cleanly documented way) in a matter of weeks given the right motivation. Getting the system to be stable enough for production would be a different matter.
A few weeks? Really? I don't believe that, but it's really a moot point until they hire more people.
I thought of a couple of other technical points that I'd like an informed opinion on. What should their permanent data store look like? Nik, in the TC comments, says a custom-built BigTable or sets of SQLite. Those things are interchangeable now?
And what are best practices for upgrading their infrastructure? I was at MasterCard when they switched data centers, and the switch was seamless. Of course, that required a completely new set of hardware, not rushing servers to the new building in shopping carts.
Pretty much everyone I can think of who's seen these types of problems works in the financial services industry.
> Almost every analysis I read, including this one, points to Rails as the major bottleneck but then can't say anything concrete to justify it.
I agree, I see a lot of confusion about this. Some just seem to blame it on the language, others on the framework, many can't seem to tell the difference (FTA: "Rails would do itself no harm by conceding that it isn’t a platform that can compete with Java or C when it comes to intensive tasks.")
It doesn't make any kind of sense to blame the language. Within reasonable limits, the speed of the language is only relevant if you're trying to run everything off one box. If you're running more than one server, a slower language just means more servers. I can't think of any exception to this.
What about blaming the framework? I wouldn't expect it to scale out of the box, fair enough. But are there any frameworks out there which can scale perfectly, out of the box, without any a priori knowledge of what the system actually does or needs? If there is a specific crippling flaw in Rails which makes it unable to scale, it should be possible to pinpoint it.
In the meantime, very little is said about the architecture and the database. Is there really any problem Twitter needs to solve that is more difficult than those solved by Facebook (friend newsfeeds) or by any chat system (online status updates)?
Classic. The guy behind Omnidrive's inability to stay functional is claiming to have a clue about Twitter's scaling issues without interviewing anyone on the team? Isn't that like asking George W. Bush to write the handbook on Mideast diplomacy?
I always wondered if it would make sense to run a contest by the community, judged by the community, on how to make Twitter scale. There have been a lot of suggestions in informal blog posts and the like, but nothing official.
In one way it would be admitting defeat; in another, it would be a way to crowdsource a difficult problem, with solutions created and judged by the community. Judging from the interest this has already attracted, I imagine a few intelligent and interested people would submit ideas.
> run a contest by the community, judged by the community, on how to make Twitter scale.
There's already been more effort discussing the problem than fixing the problem. Most of the discussion has been based on very little factual data. For example, armchair architects can safely suggest Blub where Blub != Ruby because it is a theory which is unlikely to be disproved and extremely unlikely to be made in isolation.
They have $15mm in cash and an $80mm valuation; let's see if throwing money at the problem can solve it. I still have trouble seeing how something without a business model can have an $80mm valuation ;/ just me?
Is using Twitter really very different from the Gale messaging system? It seems like the user experience is fairly similar, but Gale's architecture is much more interesting to me (and it scales! wow!).
In some sense, isn't Twitter just email with message-length limitations and multiple interfaces? Is there really a need to store it in a single, highly relational store? I'm not enough of a power user to have seen the nuances around the more advanced features to know for sure. Can anyone enlighten me?