
I feel like I need to note the pricing. $0.01 per 1,000 queries. That doesn't sound like much, but it adds up. Let's say you make 1,000/sec. $0.01 * 60 seconds in a minute * 60 minutes in an hour * 24 hours in a day * 30 days in a month = $25,920.
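For anyone who wants to rerun that arithmetic with their own query rate, here is a minimal sketch. It assumes a constant query rate, a 30-day month, and the $0.01 per 1,000 queries list price quoted above; the function name is made up for illustration and is not part of any Fauna tooling.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month, as in the comment above


def monthly_cost_usd(queries_per_second, usd_per_1000_queries=0.01):
    """Monthly bill for a constant query rate under per-query pricing.

    Hypothetical helper: assumes every query is billed at the same flat rate.
    """
    queries_per_month = queries_per_second * SECONDS_PER_MONTH
    # Round to cents to hide floating-point noise.
    return round(queries_per_month / 1000 * usd_per_1000_queries, 2)


print(monthly_cost_usd(1_000))  # 25920.0
```

Swapping in 10 or 10,000 queries per second gives the other monthly figures that come up further down.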

Is that a lot? I think it is. Google Cloud Spanner costs $0.90/hour per node or around $650/mo. Each Cloud Spanner node can do around 10,000 queries per second[1]. So, $650 to Google gets you 10x the queries that $25,920 to Fauna gets you. I mean, for $25,920, you could get a Spanner cluster with 40 servers. Each of those servers would only have to handle 25 queries per second to get you 1,000 queries per second.
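The Spanner side of the comparison can be sketched the same way. This assumes the figures quoted above ($0.90 per node-hour, roughly 10,000 queries per second per node, a 30-day month) and treats per-node throughput as a flat number, which is a simplification; the helper names are hypothetical.

```python
import math

SPANNER_CENTS_PER_NODE_HOUR = 90   # $0.90/hour per node, from the comment
SPANNER_QPS_PER_NODE = 10_000      # rough per-node throughput, from the comment
HOURS_PER_MONTH = 24 * 30          # 30-day month


def spanner_node_monthly_usd():
    """Monthly cost of a single node (computed in cents to avoid float drift)."""
    return SPANNER_CENTS_PER_NODE_HOUR * HOURS_PER_MONTH / 100


def spanner_monthly_usd(queries_per_second):
    """Monthly cost of the smallest node count that covers the given rate."""
    nodes = max(1, math.ceil(queries_per_second / SPANNER_QPS_PER_NODE))
    return nodes * spanner_node_monthly_usd()


def nodes_for_budget(budget_usd):
    """How many nodes a monthly budget buys."""
    return int(budget_usd // spanner_node_monthly_usd())


print(spanner_monthly_usd(1_000))        # 648.0
print(nodes_for_budget(25_920))          # 40
print(1_000 / nodes_for_budget(25_920))  # 25.0 queries/sec per node
```

The exact per-node figure comes out to $648/mo, which is where the "around $650" above comes from.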

I'm sure that people are going to question whether FaunaDB can actually do what it claims. At this pricing, I can't imagine someone actually seeing if they can live up to their claims. They have a graph showing linear scaling to 2M reads per second. Based on their pricing, that would be $630M per year. For comparison, Snapchat committed to spending $400M per year on Google Cloud and another $100M on AWS (and people thought the spend was outrageous even for a company valued at tens of billions of dollars). This is more money for the database alone.

Heck, it looks like one can get 5-20k queries per second out of Google's Cloud SQL MySQL on a highmem-16 costing $1k/mo[2]. That would cost $130k-$500k on FaunaDB. It seems like the pricing of FaunaDB is off by a couple orders of magnitude.

Ultimately, Spanner is something built by people that published a notable research paper and used by Google. Reading the paper, you can understand how Spanner works and be saddened that you don't have TrueTime servers powered by GPS and atomic clocks. FaunaDB has some marketing speak about how I'll never have to worry about things ever again - without telling me how it will achieve that.

It's also implemented in Scala. This isn't a dig on Scala or the JVM, but I use three datastores on the JVM and the only one that isn't sad for it is Kafka. But Kafka does very little in the JVM - it basically just leans on sendfile to handle stuff, which means you don't get bad GC cycles or lots of allocations and copying.

FaunaDB is a datastore without much information other than "it's great for everything and scales perfectly". Well, at their pricing, they might be able to make it happen. I mean, most customers would simply move to something cheaper as they got beyond small amounts of traffic due to the pricing. 60,000 queries per second? That'll be $18M per year from FaunaDB or $50k per year from Google. It's not even in the same ballpark. If you really need to scale to 2M reads per second, $630M seems like a lot more than $1.6M for Spanner.

Maybe it's an easy way to get some money off people that "need a web scale database", but are actually going to be serving like 10 queries per second and are willing to spend $260/mo to serve that. If they hit it big, it shouldn't be insane to scale it to 10,000 queries per second and milk $260k out of them each month for a workload that can be handled by a single machine. That money also pays for decent ops people to run a big box and consult with the customer if they're going towards 100k queries per second with a $2.6M monthly payment.

EDIT: looking over Fauna's blog and some of their comments here, they seem to understand more than their marketing lets on. Daniel Abadi is one of those people whose name carries weight in the databases world (having been involved with C-Store/Vertica, H-Store/VoltDB, and others). While I haven't read the Calvin paper, it looks like a good read. I can see that they are using logical clocks and I can't find it right now, but I thought I saw that they're not allowing one to keep transaction sessions checked out - that all the operations must be specified. So, it seems like there's some decent stuff in there that's currently being obscured by marketing-speak. Still, the pricing seems really curious.

[1] https://cloud.google.com/spanner/docs/instance-configuration

[2] https://www.pythian.com/blog/benchmarking-google-cloud-sql-i...




To add extra color, for about $3M/month @ list prices of Cloud Datastore [1], you can, in a Multi-Region active-active synchronous replication configuration, run a workload with the following profile:

Reads: >1.1M entities/second
Writes: >380K entities/second
Deletes: >190K entities/second
Storage: 100TB

And that's if you don't use any of the nearly free optimizations like Projection queries & keys-only queries, which any large scale customer does.

That's not pre-provisioned usage, it's actual pay-as-you-go usage - so if you have no traffic, you have no costs (except for what's already stored). It's been that way for 8 years too.

[1]: https://cloud.google.com/products/calculator/#id=e21b61d5-4a...

(PM for Cloud Datastore - if you're looking at 1M+ QPS workloads feel free to message me)


Huge scale is what FaunaDB On-Premises is for; the pricing model is different. That's what NVIDIA uses for example. Nevertheless, we will have volume discounts and reserved capacity in Cloud too.

I see where you're coming from. People make the same argument against using cloud services at all when you can buy hardware yourself and operate it. The lack of flexibility is the hidden cost.

Our cloud pricing is competitive with other vendors, most of which require you to massively over-provision in order to get high availability, especially global availability, as well as predictable performance. In traditional cloud databases, you have to provision for peak load. Usually this is an order of magnitude difference from average load. An order of magnitude difference happens to match your Spanner example exactly; however, with Spanner you still have to manage your capacity "by hand".

Architecture docs are on the way.


You're right that it was a bit unfair to compare a flexible FaunaDB to Spanner, which you'd need to provision for peak traffic. But even if it's an order of magnitude more, $16M vs $630M is still quite a gap. It really doesn't match the Spanner example. And if you're able to handle incredibly spiky loads, information on how is kinda important. If I go from a steady state of 100 QPS to 15,000 QPS for a 20 minute period, will that just be pain?
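To make that gap concrete, here is a rough sketch of the yearly numbers at an average of 2M reads per second. It assumes the list prices quoted earlier in the thread, a 365-day year, Spanner provisioned for a peak 10x the average, and per-query billing on the average rate only. The function names and the flat 10x peak factor are illustrative assumptions, not anyone's published model.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365
HOURS_PER_YEAR = 24 * 365


def per_query_yearly_usd(avg_qps, usd_per_1000_queries=0.01):
    """Yearly bill under pay-per-query pricing, billed on the average rate."""
    return avg_qps * SECONDS_PER_YEAR / 1000 * usd_per_1000_queries


def provisioned_yearly_usd(avg_qps, peak_factor=10,
                           qps_per_node=10_000, cents_per_node_hour=90):
    """Yearly bill for nodes sized to cover peak_factor times the average rate."""
    nodes = math.ceil(avg_qps * peak_factor / qps_per_node)
    return nodes * cents_per_node_hour * HOURS_PER_YEAR / 100


pay_per_query = per_query_yearly_usd(2_000_000)
provisioned = provisioned_yearly_usd(2_000_000)
print(f"per-query:   ${pay_per_query:,.0f}")
print(f"provisioned: ${provisioned:,.0f}")
print(f"ratio:       {pay_per_query / provisioned:.0f}x")
```

Even with the 10x over-provisioning baked in, the per-query bill comes out roughly 40 times larger under these assumptions.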

You've said that Spanner makes you manage capacity by hand, but the marketing copy says, "FaunaDB is adaptive, because it lets you change your infrastructure footprint on the fly. Dynamically shift resources to critical applications, elastically add capacity during peak events, and replicate data around the world—all in a unified data fabric." So, if I'm expecting a burst of traffic, do I have to "change my infrastructure footprint" manually? How quickly can one "elastically add capacity"? I mean, I've seen plenty of systems that one can add capacity to that, well, get humbled when copying data to new nodes. Like, you had 10 nodes and now you want 15 because you're being hammered. And wonderful, it's trying to copy data to the new nodes while it was already having capacity issues and only making response times worse and errors go up. I'm not saying that will happen to you, but there's no information to make me think that problem is addressed.

Honestly, people involved in FaunaDB seem to know enough about databases that I'd just expect more real information on the website. When Kudu came out, they published a paper that basically read like, "well, we created a column store kinda like one would if you'd read the C-Store paper and these are the trade-offs and we seem to have done reasonably" and I came away from reading it thinking, "ok, these people know the score. It may or may not be executed well enough, but there's an understanding." They led with a paper that might not have been revolutionary, but really showed that they understood the space and explained how it was designed such that someone with databases knowledge could see that it was reasonable.

Introducing your database with so much, well, non-information doesn't help you (in my opinion). Without digging, it looks like another DB vendor that promises everything will be perfect and that it's great for any workload.

The whole "About FaunaDB" page doesn't tell me much. Like, there's a comment in here that tells me you're using logical clocks, I can see from Daniel's Twitter that you're using some of his research, etc. I mean, you actually have cool technical details to highlight - details that make your DB seem a lot more real. But the page makes it feel like you don't have cool technical details - that you're trying to hide information because it's not good. I mean, adding in some details about how things are achieved makes a product seem a lot more real. I know what logical clocks are. Calvin is a research paper I can read. I mean, finding that makes FaunaDB seem way more real - there's something substantive. Like, I can read Calvin tomorrow and some of the ways you're achieving things will come to light and I might be impressed.

But right now, it's really hard to find the information that would impress technical readers.


I'm with you. That level of detail is coming soon.


If we're on the topic of comparing Spanner, here's the 15-second live demo of resizing Spanner from 70 to 99 nodes at [0]. The act itself is quite unremarkable, but the complexity abstracted away is awesome.

Both Spanner and Datastore do quite well in the cloud for "huge scale" as fully managed services. And with any deployment on-premise, one certainly must manage their own capacity "by hand".

(work at Google Cloud)

[0] https://youtu.be/kwnWfHq2EfQ?t=11m48s



