
A highly-scalable, multicast, real-time, distributed message passing system with full-text search, archiving, deletions, dynamic subscription changes, access restrictions, open APIs with quotas, abuse detection mechanisms, spam fighting algorithms, different authentication protocols, etc.

Still think it's so simple?




Okay, I'll take the bait...

> highly-scalable

Everything I take from our set of OTS components will have live, verifiable examples of running at scale.

> multicast, real-time, distributed message passing system

For this the choice really comes down to RabbitMQ or ejabberd. An XMPP solution is appealing for the obvious benefits of having a "presence" concept, but an AMQP solution keeps us closer to the current reality of Twitter.

So we start with a large distributed rabbit setup. A few clusters scattered across the world, connected via shovel pipes and using some nested queue/exchange plumbing to wire it all together.

> full-text search

Some queues dump to a full-text search (FTS) setup that keeps recent messages in RAM and migrates them to disk as they get older. Solr is probably a better solution here, but I know the Sphinx delta/full-index model better and would reach for that first.
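To make the delta/full-index idea concrete, here is a minimal in-memory sketch. It is not Sphinx's actual API; the class name `DeltaMainIndex`, the `delta_limit` threshold, and the use of a plain dict to stand in for the on-disk index are all assumptions made for illustration.

```python
from collections import defaultdict


class DeltaMainIndex:
    """Toy inverted index: a small in-RAM delta plus a larger 'main' index."""

    def __init__(self, delta_limit=3):
        self.delta_limit = delta_limit   # hypothetical merge threshold
        self.delta = defaultdict(set)    # term -> ids of recent messages
        self.main = defaultdict(set)     # stands in for the on-disk index
        self.delta_count = 0             # messages added since the last merge

    def add(self, msg_id, text):
        """Index a new message into the delta; merge once the delta is full."""
        for term in text.lower().split():
            self.delta[term].add(msg_id)
        self.delta_count += 1
        if self.delta_count >= self.delta_limit:
            self.merge()

    def merge(self):
        """Fold the delta into the main index and start a fresh delta."""
        for term, ids in self.delta.items():
            self.main[term] |= ids
        self.delta.clear()
        self.delta_count = 0

    def search(self, term):
        """Query both indexes so recent and older messages are both found."""
        term = term.lower()
        return self.delta.get(term, set()) | self.main.get(term, set())
```

Searches always consult both parts, so a message is findable the moment it is added, and the expensive merge only happens occasionally.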

> archiving

Dump a firehose queue to disk and send copies/updates out to various services that want it. Pushing it to HDFS would probably be the first choice I would look at, just because it is easy to go from there to various Hadoop analytics.

> deletions

Nothing is ever deleted; it just becomes invisible. This is just a flag to add to the FTS indexes and archives.
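A rough sketch of that "invisible, not deleted" flag, assuming a hypothetical `MessageStore` that holds both the archive and the search path in one place (a real system would carry the flag in the index and the archive separately):

```python
class MessageStore:
    """Toy store where deletion only hides a message from reads."""

    def __init__(self):
        self.messages = {}   # msg_id -> text; entries are never removed
        self.hidden = set()  # ids flagged as deleted

    def add(self, msg_id, text):
        self.messages[msg_id] = text

    def delete(self, msg_id):
        """Flag the message as hidden; the stored text stays in place."""
        if msg_id in self.messages:
            self.hidden.add(msg_id)

    def search(self, term):
        """Return ids of visible messages containing the term, in id order."""
        term = term.lower()
        return [
            msg_id
            for msg_id, text in sorted(self.messages.items())
            if msg_id not in self.hidden and term in text.lower().split()
        ]
```

The important part is that `search` filters on the flag at read time, so a deleted message disappears from results without any index rebuild.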

> dynamic subscription changes, access restrictions

Handled completely by the message queues.
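As a sketch of what "handled by the message queues" could look like, here is an in-memory stand-in for a broker. Following someone is a binding, unfollowing removes it, and a protected account refuses bindings from anyone not approved. The `Broker` class and its method names are hypothetical, not RabbitMQ's API.

```python
from collections import defaultdict, deque


class Broker:
    """Toy broker: one logical exchange per author, one inbox queue per follower."""

    def __init__(self):
        self.bindings = defaultdict(set)   # author -> followers bound to them
        self.inboxes = defaultdict(deque)  # follower -> queued (author, text)
        self.protected = {}                # author -> set of approved followers

    def protect(self, author, approved):
        """Restrict who may bind to this author's messages."""
        self.protected[author] = set(approved)

    def follow(self, follower, author):
        """Bind follower to author; refuse if the author is protected."""
        allowed = self.protected.get(author)
        if allowed is not None and follower not in allowed:
            return False
        self.bindings[author].add(follower)
        return True

    def unfollow(self, follower, author):
        """Remove the binding; already-delivered messages stay in the inbox."""
        self.bindings[author].discard(follower)

    def publish(self, author, text):
        """Multicast: copy the message into every bound follower's inbox."""
        for follower in self.bindings[author]:
            self.inboxes[follower].append((author, text))
```

Subscription changes take effect on the next publish, since routing is decided from the bindings at delivery time.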

> open APIs with quotas, abuse detection mechanisms, spam fighting algorithms, different authentication protocols, etc.

None of these is particularly difficult once you have the general framework set up, and the structure of the system actually makes it pretty easy to wire in things like real-time abuse and spam prevention once you get rolling.

There is only one hard thing about building a better Twitter: getting the userbase to make it worthwhile. If Twitter were in a different market, where the network effect was not so strongly self-reinforcing, it would have been cloned and re-implemented better back when the fail whale was our constant companion. That is not the case, so making a "better" Twitter is of little value.
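On the API quota point, one common way to do it (not necessarily what Twitter does) is a token bucket per API key. A minimal sketch, with the class name, parameters, and the caller-supplied clock all assumed for illustration:

```python
class TokenBucket:
    """Per-client quota: allows bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now                # time of the last refill

    def allow(self, now):
        """Return True and spend a token if the request fits the quota."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example: a burst of 2 requests, then 1 more request per second.
bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

Passing the time in explicitly keeps the sketch deterministic; a real service would feed it a monotonic clock and keep one bucket per key.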


highly-scalable - now: maybe; a year ago: no

multicast - are there still stupid limits on max number of subscriptions?

real-time - sort of. Have you measured the latency? It depends on what you mean by real-time.

distributed message passing system - distributed: yes; message passing: it depends on what you mean. Does polling count as messaging?

archiving - what?

deletions - deleting doesn't remove the items from search. I consider that not working.

dynamic subscription changes - yes.

access restrictions - sort of.

open API - yes

...



