Jeff Dean is well known as a superstar within Google, although not so much to the outside world. With the exception of PageRank, he created or had a hand in almost every technology you've heard of as a major Google innovation.
I have a Googler friend (genius coder in his own right) who sometimes wondered if he wouldn't be more productive by just devoting his workday to ensuring that Jeff Dean was properly caffeinated.
I find this kind of fascinating because Jeff Dean's academic background was in compiler optimization research. Not the obvious choice to build infrastructure for a large website. But perhaps compiler people know how every nanosecond counts, and can see the network as just another high latency part of a big computation.
Designing and implementing a compiler involves many aspects of computer science -- as does distributed systems development.
If you've never written a compiler from scratch, I highly suggest doing so(1): there are data structures (the symbol table -- a hash table, the parse tree, tries), algorithms (converting an NDFSM to a DFSM(2), parsing, register allocation and many others), and computer organization / architecture (emitting assembly, optimizing the code with pipelining and the CPU cache in mind). Not to mention it's also a substantial project which teaches you a lot about software engineering: it's not something you can write in a single session; it's something that has to be frequently extended and tested (I can't think of a cleaner candidate for test-driven development: you specify the source code you want to feed in and test whether the generated IR/assembly is what you're looking for). It's also really fun :-)
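Since the NDFSM-to-DFSM conversion gets mentioned specifically, here's a minimal sketch of the subset construction in Python; the dict-of-dicts NFA encoding (with '' as epsilon) is just an illustrative choice, not from any particular textbook:

```python
from collections import deque

def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` via epsilon ('') transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get('', set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure(nfa, {start})
    dfa, queue, seen = {}, deque([dfa_start]), {dfa_start}
    while queue:
        current = queue.popleft()
        dfa[current] = {}
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= nfa.get(s, {}).get(sym, set())
            target = epsilon_closure(nfa, moved)
            dfa[current][sym] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa

# NFA for strings over {a, b} ending in "ab"; state 2 is the accepting state.
nfa = {
    0: {'a': {0, 1}, 'b': {0}},
    1: {'b': {2}},
}
start_state, transitions = nfa_to_dfa(nfa, 0, {'a', 'b'})
```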
Not to mention custom compilers/languages are very useful in distributed systems: look at Thrift/Protocol Buffers (languages for describing the serialization of data structures for RPC). Google also has a custom language that is compiled into state machines, which they used to implement their Chubby service (and probably other distributed systems) -- see the "Paxos Made Live" paper. At an employer I've been at, which (when I was there) was deploying distributed systems with tens of thousands of nodes, we created a custom language for describing/querying groups of machines: this language and the systems built on top of it (for monitoring, configuration, provisioning and deployment) helped reduce the admin-to-machine ratio by orders of magnitude (think a team of three handling ~10,000 machines).
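To make the Thrift/Protocol Buffers point concrete: the value of such an IDL is that you describe a message once and the compiler emits the (de)serialization and RPC stub code in whatever languages you need. Here's a hand-rolled Python stand-in for what that generated code does; the Machine message and the JSON wire format are made up for illustration and are nothing like protobuf's actual binary encoding:

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Machine:
    hostname: str
    vlans: List[str]

    def serialize(self) -> bytes:
        # Real IDL compilers emit a compact binary wire format with field
        # tags and versioning rules; JSON stands in here just to show the
        # round trip an IDL-generated class gives you for free.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def parse(cls, data: bytes) -> "Machine":
        return cls(**json.loads(data.decode("utf-8")))

msg = Machine(hostname="box101.example", vlans=["vlan204"])
assert Machine.parse(msg.serialize()) == msg
```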
(1) I suggest picking up an older (not the newest) version of the Dragon Book and targeting x86/amd64/MIPS: while for a practical compiler I'd almost certainly suggest targeting LLVM or the JVM/CLR, you'll learn more targeting bare metal.
(2) Incidentally, Lamport's landmark paper suggested treating processes in distributed systems as state machines.
That is a fascinating account -- could you elaborate on how you leveraged a compiler/custom language for your 10K-machine management?
I am not able to wrap my head around what specifically you needed a language for -- rather than, say, building it on top of Perl/Python/Lua -- and whether you used the yacc/bison toolchain to build it.
First, I didn't build the language/compiler myself (it was written in my group before I arrived). It would be embedded in other languages, much as SQL is: you execute queries (first against data on a local disk distributed via version control, later against a web service).
The queries you'd make would be like:
"Give me difference of (union of ( all machines running 64 bit Linux , all machine in cluster "match", all machines in wast coast datacenter )) machines starting "box10" and then show me the vlans these machines are on". Of course this would be in a much more terse (but very easy to learn) language. Perl/Python/C code (and through a CLI tool, shell scripts) would then do operations on returned values (i.e. parallel ssh to execute a command, parallel TCP call).
He certainly is, but I suspect the YCombinator thread you linked to actually got the wrong person. All the facts the post cites are true, but the Steve Yegge blog he's responding to was talking about the engineering culture. While Jeff Dean's had a hand in basically every piece of engineering infrastructure at Google, another name comes to mind for someone who's had a hand in basically every piece of engineering culture.
"Map Reduce Usage at Google: 3.5m jobs/year averaging 488 machines each & taking ~8 min ...
Big Table Usage at Google: 500 clusters with largest having 70PB, 30+ GB/s I/O"
So to run 3.5 million jobs at 8 minutes each on 488 machines, they would need at least 26,069 machines to complete those MapReduce jobs in a year.
Similarly, if you assume that their largest storage cluster uses their previously described commodity hardware approach, and that the current sweet spot for drives is about 1TB with one or two drives per machine, that works out to between 36,700 and 73,400 machines in their largest storage cluster. That seems like a lot.
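For what it's worth, here's the back-of-the-envelope arithmetic behind both estimates; with round numbers (365-day year, 1 PB ~= 1,000 TB) it lands in the same ballpark as the figures above, and the exact values shift with TB-vs-TiB and usable-capacity assumptions:

```python
jobs_per_year    = 3_500_000
minutes_per_job  = 8
machines_per_job = 488
minutes_per_year = 365 * 24 * 60

machine_minutes = jobs_per_year * minutes_per_job * machines_per_job
print(round(machine_minutes / minutes_per_year))   # ~26,000 machines kept busy year-round

largest_cluster_tb = 70_000        # 70 PB, taking 1 PB ~= 1,000 TB
for drives_per_machine in (1, 2):
    print(round(largest_cluster_tb / drives_per_machine))   # ~70,000 and ~35,000 machines
```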
It's funny. All these abstractions build on top of each other to provide fault tolerant failover at google scale and we are coming full circle back to the file system metaphor.
If there is one thing I have seen come to pass time and time again in technology, it is that the old becomes new again. You can literally make a fortune by identifying the next big thing and then implementing an old idea on top of it. Look at mobile: we started out with simple systems that took assembly or C (Newton, Palm), graduated to Java, and now we are seeing the latest platforms move to web technologies. Take any facet of technology and you can draw a parallel to legacy technology.
Specialization would be my first guess. But then the problem is, how do you identify a niche that stays under Google's radar long enough to exploit any advantage? So, you just have to find a knowledge domain to specialize in that Google doesn't cover. And how hard is that?!?
Great Teams Can Do Anything versus Great Surfers Need Great Waves.
I always felt as though the discussion was missing something -- and I think that this is it. Competition is different from most startups' vision of success. Growing to compete with the titans of your industry is always harder, so maybe you do need to ride a wave to do it.
I'll bet the methods are the same, at least at the start:
make something people want
That's what Facebook did, and they've now got some pretty cool infrastructure running in-house.
I wouldn't say -- particularly after reading that article, yikes -- that anyone is really in a position to compete directly with Google via frontal assault.
But if you really want to take on Google instead of being bought by them, the starting point is probably the same as it would be for an upstart in manufacturing back in the day: make something people want -- a killer product -- become the leader in that thing, and you'll probably be more competitive than when you started, even if you don't have the economies of scale of your competition.
Google does move really fast, and their moat is massive.
I personally have neither the desire nor the ability to pole-vault that moat. Still, the trend against them is the iron-clad rule that small teams are always more productive than large teams (they manage it incredibly well, still -- somewhere, somehow, this is a drag on them), small organizations can move faster than big ones, etc.
Small organizations will be able to make things people want, which is the first step towards competitiveness. And as for the infrastructure advantage, the trend is in the small players' favor:
cloud-on-demand; or, DIY big storage (the Backblaze guys)
crawling-on-demand (80legs et al.)
pbx-on-demand (Twilio etc.)
... lots more ...
That ecosystem empowers tiny companies to do a lot more before they have to think about in-housing difficult infrastructure -- possibly, enough to fund it!
( This is all just a thought exercise; I don't know why someone would want to wade the moat and take down Google as opposed to selling to them. There are more pleasant things to do with your life. )
Who ignites Google's inspiration for infrastructure innovation? That man is surely Urs Hoelzle. All the star work done by Jeff Dean and others at Google flows from his thoughtful planning ...