
100,000s of servers for 100,000,000 messages/day?

I understand that half the servers aren't even doing messages, but isn't WhatsApp doing two orders of magnitude more messages with three orders of magnitude (?) fewer servers?

Is that right? I'm curious how one would justify 10,000X worse?

So for each message, 10,000X more equipment is needed?


Also:

- WhatsApp doesn't have to allow browsing the entire history of its billions of messages.

- WhatsApp doesn't have tags. A message can go not only to 1,000,000 users, but also to any number of apps requesting updates for a single tag.

- Twitter allows advanced search, where you can browse, in real time (or across the entire history), a complex combination of people, tags and free text, with filters such as language or date.

- WhatsApp has a list of messages, but Twitter has a graph: a message can be retweeted again and again, replied to and liked.

- All of those features have one impact or another on the way tweets are displayed to the user.

- Twitter's API is much more heavily used than WhatsApp's.


WhatsApp also does not need ad-related analytics.


That's a big leap. They don't have to actually show ads for that analytics data to be extremely valuable.


WhatsApp went with Erlang from the very beginning, and it perfectly suits their needs: you can almost map WhatsApp messages 1:1 to Erlang messages. On top of that they optimized the hell out of their stack [1].

Twitter, on the other hand, is a very different problem: you need to broadcast messages in a 1:N fashion where N can be 100,000,000 (Katy Perry, @katyperry, has 95,366,810 followers). On top of that they need extensive analytics on users so they can target them in the ad system. I am pretty sure there is some room for optimisation in their stack, but I'm not sure what percentage of these servers could be saved.

http://www.erlang-factory.com/upload/presentations/558/efsf2...


Twitter's analytics are either lossy or eventually consistent [1]. I'm sure they're resource intensive, but they're taking shortcuts that make them very amenable to saving resources (unless it's just very buggy).

As for the broadcast problem, it's trivially handled by splitting large follower lists into trees and introducing message reflectors. Twitter's message count is high for a public IM system, but it's not that high compared to the overall messaging volume of private/internal message flows. More importantly, despite the issue of large follower counts, if you break large accounts into trees of reflectors the problem decomposes neatly, and federating large message flows like this is a well-understood problem.

I've half-jokingly said in the past that you could replace a lot of Twitter's core transmission of tweets with mail servers, off-the-shelf mailing-list reflectors, and some code to create mailboxes for accounts and reflectors to break up large follower lists (no, it wouldn't be efficient, but the point is that distributing message transfers, including reflecting messages to large lists, is a well-understood problem). Based on the mail volumes I've handled with off-the-shelf servers, I'll confidently say that hundreds of millions of messages a day handled that way is not all that hard with relatively modest server counts.

Fast delivery of tweets to extreme accounts via reflectors would be the one thing that could drive the server count up, but on the other hand there are plenty of far more efficient ways of handling it (e.g. extensive caching plus pulling rather than pushing for the most extreme accounts).

Note, I'm not saying Twitter doesn't have a legitimate need for the servers they use - their web app does a lot of expensive history/timeline generation on top of the core message exchange, for example. And the server count doesn't say much about their chosen tradeoffs in terms of server size/cost vs. number of servers. But the core message exchange should not be where the complexity is unless they're doing something very weird.

[1] Taking snapshots of their analytics and the API follower/following counts shows they don't agree, and the analytics numbers change after the fact on a regular basis.


Ha. Love the mail server idea.

It simply proves the point that this isn't such a large problem that it takes 10,000 times the equipment because of [search | many recipients | tags | etc.].

It reminds me of that Flickr architecture from back in the day: hopelessly complicated with tiers and shards and tiers and caching and tiers and tiers... to serve some images. But tagging!

Do people feel more important if they make a complicated solution? Where is Alan Kay?


Could you elaborate a bit on the message reflectors and on using follower trees instead of lists with regard to messaging like Twitter? I am genuinely interested in improving messaging patterns in Twitter-like scenarios (i.e. large fan-outs).


Let me start at the beginning: I have used mail servers as messaging middleware. Back around 2000 I ran an e-mail provider, and we jokingly started talking about taking our heavily customized qmail install and turning it into a queuing system for various backend services we were building. Then we decided to try it, and it worked great (we ended up using it in a reference registrar platform we built when we built the .name registry; and I've used a similar solution elsewhere since).

The point is that e-mail provides the federation, has a rich ecosystem of applications, and handles things that are easy to mess up, like reliable queueing and retries, as well as providing a rich system of aliasing and forwarding.

So let's consider Twitter: you have a list of followers, and a list of people you follow. That gives two obvious ways of knitting together a timeline: push and pull. In real life it's probably most efficient to mix them, but for the "Twitter by e-mail" architecture, let's consider push only.

In its simplest form you map Twitter IDs to internal but federated "email addresses" whose domain is a virtual bucket. Then you use MX records to map virtual buckets to servers. On each server you map the internal email address to a mailbox.

You also map each Twitter ID to an internal "email address" for reflecting tweets to that account's followers. It too maps to a virtual bucket, with MX records mapping it to a server. But instead of mapping this address to a mailbox, you map it to a mailing-list processor.

When user A follows user B, in this model that means user A subscribes to user B's reflector.

To handle fanout, you can use the aliasing supported by pretty much all mail servers to remap the reflector address to a second mailing list. This second mailing list is a list of lists. Here you need "non-email" logic to manage the mailing lists on the backend.

To outline this, for user A, the above might look like this:

- Twitter handle A maps to A@virtual-bucket-56.timeline.local ("56" is arbitrarily chosen - imagine hashing the twitter handle with a suitable hash)

- MX record mapping virtual-bucket-56.timeline.local to host-215.timeline.local ("215" is also just arbitrarily chosen in this example).

- On host-215.timeline.local there is an IMAP mailbox for tweets from people this user follows.

- Twitter handle A also maps to A@virtual-bucket-56.reflectors.local, with an MX record mapping that to host-561.reflector.local (the point being that MX records can be used to remap failing hosts etc.)

- On host-561.reflector.local "A" maps to a mailing-list package that accepts basic subscribe ("follow") and unsubscribe ("unfollow") options.
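To make that mapping concrete, here's a minimal Python sketch of the handle-to-address scheme outlined above. The bucket count, hash choice and domain names are assumptions for illustration; in the mail-based version the bucket-to-host step would live in MX records rather than code.

    # Hypothetical sketch of the handle -> virtual bucket -> internal address
    # mapping outlined above. Bucket count, hash and domains are made up;
    # the bucket -> host step would be MX records in the mail-based version.
    import hashlib

    NUM_BUCKETS = 1024  # assumed number of virtual buckets

    def virtual_bucket(handle: str) -> int:
        """Hash a Twitter handle to a stable virtual bucket number."""
        digest = hashlib.sha1(handle.lower().encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_BUCKETS

    def timeline_address(handle: str) -> str:
        """Internal address whose mailbox stores this user's timeline."""
        return f"{handle}@virtual-bucket-{virtual_bucket(handle)}.timeline.local"

    def reflector_address(handle: str) -> str:
        """Internal address that fans a tweet out to this user's followers."""
        return f"{handle}@virtual-bucket-{virtual_bucket(handle)}.reflectors.local"

    print(timeline_address("A"))   # e.g. A@virtual-bucket-<n>.timeline.local
    print(reflector_address("A"))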

Here you already have the basics. The "magic" would happen once the mailing list A@host-561.reflector.local reaches some threshold, say 10k. At this point you'll want to add a level of indirection: you rename A@host-561.reflector.local to A-sub1@host-561.reflector.local and create a new A@host-561.reflector.local with one subscriber: A-sub1@host-561.reflector.local. Then you create a new mailing list on a different server with sufficient capacity, let's say A-sub2@host-567.reflector.local, and subscribe that (you might want to indirect these two via virtual buckets) to the main list.

There's no magic here - mailing out to a list of 10k is trivial. A two-level tree with 10k at each level can have 10k leaf nodes with 10k users each, for 100M users.
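As a rough illustration of that two-level split (the 10k fanout and the addresses are just the example numbers from above, not any real mailing-list API):

    # Illustrative sketch of splitting a big follower list into a two-level
    # reflector tree: a top-level list whose subscribers are leaf lists,
    # each capped at FANOUT members. Addresses are made up for the example.
    FANOUT = 10_000

    def build_reflector_tree(handle, followers):
        """Return (top_level_subscribers, leaf_lists) for a follower list."""
        leaves = [followers[i:i + FANOUT]
                  for i in range(0, len(followers), FANOUT)]
        top_level = [f"{handle}-sub{n + 1}@reflector.local"
                     for n in range(len(leaves))]
        return top_level, leaves

    followers = [f"user{i}@timeline.local" for i in range(25_000)]
    top, leaves = build_reflector_tree("A", followers)
    print(top)                       # ['A-sub1@...', 'A-sub2@...', 'A-sub3@...']
    print([len(l) for l in leaves])  # [10000, 10000, 5000]
    # Two levels of FANOUT-wide lists reach FANOUT * FANOUT = 100M subscribers.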

In practice you'd likely "cheat" and mark the top users someone is following, and do pulls against cache servers for their tweets instead of pushing them, drastically reducing the need for big fanouts. Basically you'd need to spend lots of time testing to determine the right cutoffs for pull (which will potentially hit many servers on each page reload) and push (which hits many servers each time someone tweets to a large follower list).

Again, let me reiterate that while this type of setup works (I have tested it for millions of messages), it's by no means the most efficient way of handling it. The e-mail concept here is more of a way of illustrating that it's a "solved problem" and "just" an issue of optimization.

For starters, you'll want to consider whether it's easy enough to reconstruct data to drop syncing to disk, use a RAM-disk to speed up "deliveries", etc., and you may want to consider different types of storage backends. You may also want to consider other "tricks" like locating leaf-reflector nodes on the servers where the accounts they reflect to are located (at the cost of more complicated "mailing list" management).

The most worthwhile lesson is that if you hash the ID to a virtual bucket, and have a directory mapping virtual buckets to actual servers, you gain the flexibility to easily migrate users, etc. If in addition you provide a means of reflecting messages to a set of subscribers, you have pub-sub capability. If you need to handle big fanout, you'll want a way of "pushing down" the list and inserting a fan-out reflector "above" it.

Those patterns can be applied whether you use e-mail, ZeroMQ, or any low-level messaging fabric for the actual message delivery (in general the [entity] => [virtual bucket] => [server] indirection is a worthwhile pattern for almost anything where you may need large-scale sharding).
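For what it's worth, that indirection is small enough to sketch in a few lines. A plain dict stands in for the directory here; in the mail version it would be MX records, and elsewhere it might be a config service. Host names and the bucket count are illustrative.

    # Sketch of the [entity] => [virtual bucket] => [server] indirection.
    # The dict stands in for whatever directory you use (MX records, a
    # config service, etc.); host names and bucket count are made up.
    import zlib

    NUM_BUCKETS = 1024

    def bucket_for(entity_id: str) -> int:
        """Stable hash of an entity id to a virtual bucket."""
        return zlib.crc32(entity_id.encode("utf-8")) % NUM_BUCKETS

    # directory: virtual bucket -> physical server
    directory = {b: f"host-{b % 16}.timeline.local" for b in range(NUM_BUCKETS)}

    def server_for(entity_id: str) -> str:
        return directory[bucket_for(entity_id)]

    print(server_for("A"))

    # Migrating every entity in bucket 56 is one directory update; no
    # per-entity bookkeeping needs to change.
    directory[56] = "host-215.timeline.local"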


In WhatsApp, a typical message goes to 1 other person. On Twitter, it can go to millions of people.

When Twitter initially got their failwhaling under control, I recall reading they solved it by changing from a relational "join and merge the timelines of everyone you're following on each refresh" model to a messagebox model. If that's true, maybe that naive model is now showing its limitations (I doubt they stopped there, though; it seems like they have things under control).
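If it helps, the difference between the two models is easy to sketch. This is toy code, not Twitter's actual design; all the names are made up.

    # Toy contrast of the two timeline models: "pull" joins the timelines of
    # everyone you follow at read time; "push" (the messagebox model) writes
    # each tweet into every follower's inbox at write time.
    from collections import defaultdict

    tweets_by_author = defaultdict(list)   # author -> [tweet, ...] (pull model)
    inboxes = defaultdict(list)            # user -> [tweet, ...]   (push model)
    follows = {"alice": ["bob", "carol"]}  # user -> authors they follow
    followers = {"bob": ["alice"], "carol": ["alice"]}

    def post_pull(author, text):
        tweets_by_author[author].append(text)        # O(1) write

    def timeline_pull(user):                         # O(#following) read
        return [t for a in follows[user] for t in tweets_by_author[a]]

    def post_push(author, text):                     # O(#followers) write
        for f in followers.get(author, []):
            inboxes[f].append(text)

    def timeline_push(user):                         # O(1) read
        return list(inboxes[user])

    post_pull("bob", "hi"); post_push("bob", "hi")
    print(timeline_pull("alice"), timeline_push("alice"))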


I suspect the writer was using the phrase "hundreds of millions" figuratively. When I worked there years ago there were already 14 billion API requests a day, iirc. (That number was public at the time, for the record.)


I believe it's in the low hundreds of millions of tweets per day. I've seen that stat elsewhere.

> 14 billion API requests a day

Do you mean 14B internal, services-requesting-services API requests?

Surely you can't mean 14B API requests from the outside world, can you? I'm scratching my head over how their real user base could generate anywhere near that load.


As of several years ago the public HTTP endpoints would easily do 1M requests/sec at peak times. Not just the API, but web, images, et al.


You really need to read the article.

Those servers aren't just for managing the messages. They're also for their advertising and analytics platforms. And since over a third of their servers are for generic Mesos, they could be for anything, e.g. development containers.


That's only about 1,000 messages per second on average. A single database + app server could handle that load. Even assuming a bunch of other stuff is happening, 500 servers sounds generous.

WTF are they doing that each server only handles one tweet every two minutes?


Actually, it's 1000-9000 messages per server per day. Or about 1 message every 10-100 seconds.
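Back-of-envelope, with the rough inputs from this subthread (~100M tweets/day and a server count somewhere in the tens to hundreds of thousands - both assumptions, not official figures):

    # Back-of-envelope arithmetic behind the per-server figures above.
    MESSAGES_PER_DAY = 100_000_000
    SECONDS_PER_DAY = 24 * 60 * 60

    print(MESSAGES_PER_DAY / SECONDS_PER_DAY)        # ~1157 msgs/sec overall

    for servers in (10_000, 100_000):                # assumed server counts
        per_server_per_day = MESSAGES_PER_DAY / servers
        seconds_between = SECONDS_PER_DAY / per_server_per_day
        print(servers, per_server_per_day, seconds_between)
        # 10,000 servers  -> 10,000 msgs/server/day, one every ~8.6s
        # 100,000 servers ->  1,000 msgs/server/day, one every ~86s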

Of course, that's just the new messages inbound. They may need to distribute that single message to 100M people (who likely won't even see it, but still.)

Problems that are trivially solvable with one database don't simply scale by adding more DBs or machines. Scaling isn't easy, or they would have done it. I'm in no way disparaging their team, because I don't know what kind of constraints they had getting to this point.

Still, I'd bet it could be optimized by 2+ orders of magnitude if people sat down and re-evaluated the whole structure again at this point in time.

Regardless, is that really a priority?

They may have bigger issues on their plate now (growing revenue, growing users, making users happy). Assuming their business can generate the cash flow to overcome the inefficiencies, they may be better served to focus on growth.


The Israeli desalination plant "Sorek can produce a thousand liters (264 gal) of drinking water for 58 cents." It can produce 624,000 m³/day, or 150M m³ of water a year. That's 20% of Israeli domestic consumption.

That's $0.002 per gallon, about what CA water costs. All fresh from the sea.

That's 75X cheaper than the $0.15 per gallon that @mojomark, in the thread below, estimates it would cost in fuel alone for a tanker to ship water from Alaska.

https://www.scientificamerican.com/article/israel-proves-the...

I can't tell exactly, but the $0.58/1,000 liters figure seems to amortize the $400M investment cost, since the plant is on a 25-year build-operate-transfer contract.
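For the record, here's the arithmetic (inputs are the figures quoted above; the $0.15/gal shipping number is the estimate attributed to @mojomark):

    # Quick check of the cost figures in this comment.
    LITERS_PER_GALLON = 3.785

    cost_per_1000_liters = 0.58                # USD, Sorek figure quoted above
    gallons = 1000 / LITERS_PER_GALLON         # ~264 gal
    desal_per_gallon = cost_per_1000_liters / gallons
    print(desal_per_gallon)                    # ~$0.0022/gal, rounded to $0.002 above

    shipping_per_gallon = 0.15                 # USD, tanker-fuel estimate cited above
    print(shipping_per_gallon / desal_per_gallon)  # ~68x; ~75x using the rounded $0.002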


> it settled on one that has since become a byword for failure: Edsel

The Ford Edsel didn't fail because of the name. It was a hyped "advanced" product that also had build-quality issues and questionable styling. Mechanics didn't trust the combustion technology, so they couldn't recommend the "new" tech. There were a lot of simple reasons people didn't buy it.

The name, of course, was that of Henry Ford's son, Edsel Ford. The product was expected to be "the future", but unfortunately it didn't work well enough in the present.

It got a bad rep, like the Corvair, and consumers stayed away in droves.


How do you guys deal with refactoring code? Imagine a codebase that has been largely additive for 100 years.

While the "+1 minus two guideline" has plenty of shortcomings in the long term, there is lots of low hanging fruit now, and it's an important mindset shift.

Guys, don't just add LOC. Refactor, clean it up and make it better. Remove blocks we don't use anymore.


Government is strongly limited by time. There's a hard ceiling on the number of things that can be proposed, debated, and voted on within a term. Consequently, if there are things on the statute books that are no longer used, it's often better to simply ignore them than to spend precious time in the house arguing about removing them. It might only take 5 minutes to call the house to order, have someone stand up and say "We don't need a law banning witchcraft any more", and then have a vote where the result is a foregone conclusion, but that's 5 minutes that the government isn't doing something useful that will actually impact people's lives.

The repeal of pointless old laws comes up relatively often here in the UK. Some of our laws are really old - the government was recently talking about repealing some that were passed almost 750 years ago: http://www.bbc.co.uk/news/uk-politics-30334812


> Government is strongly limited by time. There's a hard ceiling on the number of things that can be proposed, debated, and voted on within a term. Consequently, if there are things on the statute books that are no longer used, it's often better to simply ignore them than to spend precious time in the house arguing about removing them.

That is exactly why all laws should have sunset rules built into them.

We should have listened to Thomas Jefferson, who wanted all laws, even the Constitution itself, to expire every 19 years.


It's a really terrible thing to have laws that aren't respected by the people, the government or the courts. It completely undermines the rule of law.


There's an argument to be made that there might be some benefit in government spending a little less time impacting people's lives.

But also, spending that time cleaning up old laws sounds an awful lot like "doing something useful that will actually impact people's lives".


Sounds like a garbage-collection problem. Some algorithms are more efficient than others.


> How do you guys deal with refactoring code?

As a very rough guide: replacing one big function with two smaller functions is often better than replacing two small functions with one big one.
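A contrived Python sketch of what that guideline looks like in practice (hypothetical names, just to illustrate the split):

    # Before: one big function doing two unrelated things.
    def process_order(order):
        total = sum(item["price"] * item["qty"] for item in order["items"])
        if order.get("coupon") == "SAVE10":
            total *= 0.9
        lines = [f'{item["qty"]}x {item["name"]}' for item in order["items"]]
        return total, "\n".join(lines)

    # After: two smaller functions, each with one meaningful purpose.
    def order_total(order):
        total = sum(item["price"] * item["qty"] for item in order["items"])
        return total * 0.9 if order.get("coupon") == "SAVE10" else total

    def order_summary(order):
        return "\n".join(f'{item["qty"]}x {item["name"]}' for item in order["items"])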


This is for functions that actually have a meaningful purpose and merit being better understood.


spot on


I think we programmers are ahead here - the general public needs some time to think this over. The "+1, minus 2" rule looks like a good start for opening the discussion.


People LOVE being sold things/ideas they agree with. For example, how many people tune in to Apple's product announcements?

People don't like rasping attempts to change their mind.

The key to selling is that it should never feel like "sales", as the listener understands it.

Many here will argue that Apple's announcements aren't really selling - they're informative, exciting, useful, keep one abreast of the latest in the space, etc. That's perfectly executed selling: I'm _not_ being pitched; I'm deciding on my own that this fits my wants and needs.


I would suggest the shared experiences and better relationships happen in real life when you commit 100%.

We know that IRL we can be in physical proximity with one another, but perhaps it's the joint commitment to the experience that builds the meaning.

Better relationships probably are built on more commitment, not less.


I agree, but there are different types of commitment. There's commitment in terms of time, energy, emotion, attention, etc.

While VR provides anonymity (taking away some of the emotional commitment of human expression), the actual shared experiences are more powerful than in any other medium and push on multiple other levers of commitment, such as time, energy, attention, etc.

So VR is creating better relationships through commitments of multiple types and strengths.


Agreed. Can anyone see how this is a technology play? I get that a16z is looking for returns, but is this really the space for them to "add value"? Or is it just "we haven't seen any good ideas lately, and we can park a lot of cash here"?


Anecdata here, but...I live on this planet, have an awesome spouse, great kids, etc. It's seriously wonderful to be monogamous.

For me, at least, marriage has _everything_ to do with it.


You're confusing pre-marital sex with monogamy.


Good for you! Self-control turns out to be an advantage for many things in life.

It's even better as you get older and begin to appreciate that enduring fidelity actually increases love. But I guess many here don't know that.

Statistics say most people are weak-willed. Shrug. That's obvious, and still not a reason to live that way.


I'm struggling to find your solutions appealing.

Your answer for addressing indoor drug dealing is to not arrest outdoor drug dealers when you see them?

Allow stealing meat because somebody else got away with scamming?

We disallow theft because a theft-free society is a better way to live. We have decided that scamming your investors is also punishable, as fraud.

Don't throw up your hands as if these are impossible issues. Live life and make your decisions to build your community into the place where you want to live.

