Incidentally, if any hackers are looking for jobs working on an interesting problem, I know the Rethinks are hiring. It would be the perfect job for a lot of hackers. You'd get to solve big problems starting with a blank slate, and you'd get to work with smart, totally pragmatic people (Slava and Mike). Plus they now have enough to actually pay you.
Not to mention they are very transparent with their compensation and stock options (Scroll down - http://rethinkdb.com/jobs/). People who run their hiring process and business like that seem like very good people to work for indeed.
I would jump on this immediately if I were in the Bay Area and didn't already have my own startup. As it is, I'll have to satisfy my (data)base urges by writing my own high-performance SSD-optimized key-value store.
Tarsnap splits data into blocks of ~64 kB which are individually compressed and encrypted before being uploaded. I use S3 for back-end storage, but need to keep track of where I put each of the blocks.
It's not that the metadata is demanding; it's just that there's a lot of it. For each $/month Tarsnap takes in, I have about 200,000 table entries.
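To make the scale concrete, here's a rough sketch (Python, with made-up names; this is not Tarsnap's actual schema) of the kind of per-block metadata a store like this has to track:

    # Hypothetical per-block metadata for a Tarsnap-style store: each ~64 kB
    # block is compressed, encrypted, uploaded to S3, and the store has to
    # remember where it went. Field names are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class BlockRecord:
        block_id: bytes   # e.g. a hash identifying the block
        s3_key: str       # object key the encrypted block was stored under
        size: int         # compressed + encrypted size in bytes
        refcount: int     # how many archives still reference this block

    # Backing up 1 GB in ~64 kB blocks already means roughly 16,000 of these
    # records, so the workload is lots of tiny rows, not big values.
    metadata: dict[bytes, BlockRecord] = {}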
"There’s obviously risk involved with trying to redefine how people structure their databases"
TechCrunch misses the point that Rethink is explicitly not doing this. A MySQL storage engine plugs in below the SQL parsing layer, so existing MySQL apps should be able to run against it as-is.
I was looking at this the other day. How do they address (what I assume are) the larger space requirements of append-only databases, given the lower capacities of SSDs? Am I wrong about the space requirements, or is Moore's law going to fix it, or is there some kind of background compaction?
Is there really a need for segment cleaning as Rosenblum lays it out?
They clean segments to reduce fragmentation and allow for decent-sized extents to write new data into. Since your writes don't need to be long and contiguous, do you really need to empty the live data out of segments?
You DO need to identify which snapshots are stale, and consequently mark certain blocks as free. But I see no need for compaction.
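To illustrate the distinction (a toy sketch only, not anything RethinkDB actually does): with simple per-block reference counting you can free the blocks a stale snapshot was pinning without ever copying live data, i.e. without LFS-style segment cleaning.

    # Toy reference-counting sketch: dropping a stale snapshot decrements the
    # refcount on each block it pinned; blocks that hit zero go straight onto
    # a free list. No live data is relocated, so there's no compaction step.
    from collections import defaultdict

    refcount = defaultdict(int)   # block address -> number of snapshots using it
    free_list = []                # block addresses available for reuse

    def drop_snapshot(pinned_blocks):
        for addr in pinned_blocks:
            refcount[addr] -= 1
            if refcount[addr] == 0:
                free_list.append(addr)   # reusable in place, never moved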
Isolation levels are really a poor design decision, because they imply the use of locks. Serializable is great, but impossible to implement efficiently. Repeatable read, read committed, and read uncommitted can be implemented efficiently, but allow for various unpleasant isolation artifacts.
The one we're implementing is really the one everyone wants - snapshot isolation. It can be implemented very efficiently, and is stronger than repeatable read, read committed, and read uncommitted (so you should never want these three). It's not as strong as serializable, but nobody can give you a scalable serializable isolation level.
Snapshot isolation also guarantees consistency, but requires all transactions to be idempotent (so they can be rerun in case of a conflict). It's the best of both worlds; in practice, most other databases already behave this way anyway.
So for inserts, do you need some kind of uniqueness constraint that makes sure a repeated insert is rejected? (The first example that came to mind when I read "idempotent" was a simple insert, which isn't idempotent, as far as I can tell.)
[sorry if this seems like an interrogation - it's just interesting stuff you're doing...]
Essentially, it means that any transaction might potentially be rolled back and rerun. This isn't a problem for SQL, but suppose I select some stuff, get back into the host programming language, fire off some rockets into space from the Kennedy Space Center, and then insert some data about the launch into the database. This is a big problem, because if the insertion fails due to potential conflicts, the whole thing needs to be rolled back (including the rocket launch) and rerun. A lot of software is written to account for this (i.e. don't perform any external state modification you can't roll back until you've confirmed the transaction has committed), but a lot of software isn't. To really have great isolation and performance, you need to write software this way. For people who don't, we'll support the serializable level, but there are very strong limitations on how efficient that can be.
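A rough sketch of that pattern, in Python with a made-up client API (db.transaction() and ConflictError here are placeholders, not anything Rethink actually exposes): keep the retried body free of external side effects, and only do the irreversible work once the commit has been confirmed.

    class ConflictError(Exception):
        """Raised by the (hypothetical) client when a conflict forces a rerun."""

    def run_idempotent(db, body, max_retries=5):
        """Run body(txn), retrying on conflict. The body must only touch the
        database, so rerunning it is safe."""
        for _ in range(max_retries):
            try:
                with db.transaction() as txn:
                    result = body(txn)
                return result            # commit confirmed
            except ConflictError:
                continue                 # rerun the pure-database body
        raise RuntimeError("transaction kept conflicting")

    # Usage: the rocket launch stays *outside* the retried body.
    # run_idempotent(db, lambda txn: txn.insert("launch_prep", ...))
    # launch_rocket()   # irreversible work only after the commit succeeded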
This is a very interesting project. All current databases are optimized for conventional hard drives (and the hard drive is the slowest part of the PC). But as solid-state drives keep improving, we'll have faster and faster storage, so the databases that take advantage of these new drives will lead the way in database design. It's the right time to invest in this technology.
But really, rotating disks are not that bad for most database use cases. B-trees, the usual on-disk database structure, are designed to keep similar data on the same disk page, which means that if you request row 42, row 43 will be in memory by the time you need it. So the slowness of the disk is abstracted away; iterate over your data in index order, and it's always fast.
Hash tables have a theoretical advantage over balanced trees, and an SSD would make a naive hash table implementation more practical. But if you are smart (like, say, BerkeleyDB), hash tables and balanced trees have almost the same real-world performance.
RethinkDB might be better for write-heavy operations, but that's because SSDs are better for random writes.
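Back-of-the-envelope illustration of the locality point (assumed page and row sizes, not measurements of any particular engine): scanning in index order touches each leaf page once, while cold random lookups can touch a page per row.

    # Toy page-read count for 1,000,000 rows: index-order scan vs. cold-cache
    # random lookups. Page/row sizes are assumptions for illustration only.
    PAGE_SIZE = 8192                          # bytes per leaf page (assumed)
    ROW_SIZE = 100                            # bytes per row (assumed)
    ROWS = 1_000_000
    rows_per_page = PAGE_SIZE // ROW_SIZE     # ~81 neighbouring rows per page

    sequential_page_reads = -(-ROWS // rows_per_page)   # each page read once: ~12,346
    random_page_reads = ROWS                            # worst case: one page per lookup

    print(sequential_page_reads, random_page_reads)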
if you request row 42, row 43 will be in memory by the time you need it
True, but this is rarely the case for OLTP workloads. What happens when there's a credit card transaction for user ID 100731, followed by a credit card transaction for user ID 8762592? Even for range queries, what you're saying is true only if you're walking through the primary index. The second you start walking through the secondary indices, you're back in random-read land (my Facebook friends, for example, are extremely unlikely to be stored sequentially in the user table).
SSDs are better for random writes
Random writes are very tricky on SSDs because of the slow erase operation. The FTL controllers are getting much better at this in micro-benchmarks, but it's very difficult to measure the random-write performance profile over different timelines and different disk-space utilization scenarios.
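A deliberately simplified model of why utilization matters so much here (illustrative numbers, nothing like what a real controller does): if the erase block the FTL reclaims still holds mostly valid pages, those pages have to be copied out first, and write amplification blows up.

    # Simplified write-amplification model: reclaiming an erase block in which
    # a fraction `valid` of the pages is still live means copying those pages
    # elsewhere before erasing. Real FTLs are far more sophisticated.
    PAGES_PER_ERASE_BLOCK = 256   # assumed geometry

    def write_amplification(valid):
        freed = PAGES_PER_ERASE_BLOCK * (1 - valid)    # room gained for new host writes
        copied = PAGES_PER_ERASE_BLOCK * valid         # live pages relocated first
        return (freed + copied) / freed                # = 1 / (1 - valid)

    for v in (0.5, 0.8, 0.9, 0.95):
        print(f"{v:.0%} of victim block still valid -> WA ~{write_amplification(v):.0f}x")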
I'm not convinced. Modern OSes use locking extensively and do perfectly fine on multicore hardware (FreeBSD's pgsql performance scaled linearly up to 16 cores, at least the last time I saw graphs).
Obviously you need to be smart about how you do your locking (no giant lock!) but the mere fact of having locking is not automatically a problem.
I misspoke; I should have said 'many-core'.
Yes, you're probably right that no respectable database is going to have a problem with lock contention on 16 cores. But AMD released 12-core processors this week. It's likely we'll see the average DB server with 48 cores sometime in the next year or two, and who knows after that.
I quoted 16 cores because that's the biggest hardware the FreeBSD project had available when those benchmarks were being run -- I suspect that it scales linearly quite a bit further than that.
I read over their page more carefully, and I think I see what they mean by "no locks". It means that the database stays internally consistent regardless of read or write order. When you start a transaction, you see the data as it was in the log when you started, but you don't see any changes made after that point. Fine.
You can get more isolation than this, and you need it to really keep your data consistent, but all DBs except Berkeley seem to have this off by default. So I am not too bothered by this, but I would be interested in seeing how well Rethink handles concurrent OLTP applications that actually care about data integrity. Caring about data integrity is slow, and Rethink might not speed it up all that much. Or it might :)
Congratulations guys, you deserve it, and thank you to anybody else out there writing drivers or optimizing software for changes happening in hardware that we all take for granted.