
If only there was some database that let you store flexibly structured documents but keep the data normalized. Perhaps you could even construct views and indexes to accelerate different access patterns.


If only. Can you imagine if we also had some form of normalization so complete that it could actually manage an arbitrary number of dimensions?

Can you tell me how many times that user's email address changed in that document over the last 3 years by executing a simple query? What was the email after the 2nd time it changed?

In 6th normal form, such a thing is trivial to manage. Discipline is the only price to pay for admission.
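To make that concrete, here is a minimal Postgres-flavored sketch of what a 6NF-style attribute table might look like; the table and column names are invented for illustration.

    -- One narrow table per attribute; each row is a fact with its own validity period.
    CREATE TABLE customer_email (
        customer_id  BIGINT      NOT NULL,
        email        TEXT        NOT NULL,
        valid_from   TIMESTAMPTZ NOT NULL,
        valid_to     TIMESTAMPTZ,          -- NULL = current value
        PRIMARY KEY (customer_id, valid_from)
    );

    -- How many times did customer 42's email change in the last 3 years?
    -- Each row whose valid_from falls inside the window is one change
    -- (assuming the customer already existed before the window started).
    SELECT count(*) AS changes_last_3_years
    FROM   customer_email
    WHERE  customer_id = 42
    AND    valid_from >= now() - INTERVAL '3 years';

    -- What was the email after the 2nd change?
    SELECT email
    FROM   customer_email
    WHERE  customer_id = 42
    ORDER  BY valid_from
    OFFSET 2 LIMIT 1;  -- row 0 is the original value, row 2 is the value after the 2nd change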


Fauna is temporal, so yes. Normal forms not required.


Does Fauna support multiple temporal dimensions in the same query?


Can you give me an example of the kind of query you have in mind?


I think your original claim is where I would like to focus my argument:

> Normal forms not required.

3NF (approximately where document databases live) struggles to decouple the time domain from individual facts. Let me give you an example.

Assume a Customer document has a LastModifiedUtc property. Does this tell you when their email specifically changed? No. It just says "something" in the customer document was modified.

Now, you could say "how about we just add a property per thing I want to track?" Ok - so we now have Customer.EmailLastModifiedUtc, Customer.NameLastModifiedUtc, etc. This is pretty good, but now assume we also need to know what the previous email addresses and names were. How do we go about this? Ok - no big deal, let's just add another column that is some JSON array or whatever. Customer.PreviousEmailAddresses. Cool, so now we know when the email was last modified AND we know what every last variant of it was.

What is missing? Oh right. What about when each of the previous email addresses actually changed? Who changed it? From what IP address? Certainly, we could nest a collection of documents within our document to keep track of all of this, but I hope the point is starting to come across that there may be some value in exploring higher forms of normalization. Imagine if I wanted to determine all of the email addresses that were modified by a specific IP address (across ALL customers), but only over the last 25 days. I feel like this is entirely out of the scope of a document database.
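To sketch what that last query looks like once the fact is properly normalized (same hypothetical customer_email table as above, extended with an audit column; Postgres flavor, names invented for illustration):

    CREATE TABLE customer_email (
        customer_id    BIGINT      NOT NULL,
        email          TEXT        NOT NULL,
        valid_from     TIMESTAMPTZ NOT NULL,
        valid_to       TIMESTAMPTZ,        -- NULL = current value
        changed_by_ip  INET,               -- which address wrote this version
        PRIMARY KEY (customer_id, valid_from)
    );

    CREATE INDEX customer_email_by_ip ON customer_email (changed_by_ip, valid_from);

    -- Every email address written from 203.0.113.7, across ALL customers,
    -- over the last 25 days.
    SELECT customer_id, email, valid_from
    FROM   customer_email
    WHERE  changed_by_ip = '203.0.113.7'
    AND    valid_from >= now() - INTERVAL '25 days'
    ORDER  BY valid_from;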

Don't get me wrong. 3NF is extremely powerful and handles many problem domains with total ease. But, once you start talking about historization of specific fields and rates of change, you may need to consider something higher order.


This is possible in Fauna. All documents are actually collections of document versions within the configurable retention period. If you ensure that every writer decorates the document with the facets you want to search by (ip address, etc.) then you can construct indexes on those facets and query them temporally. They will return event records that show when the document entered the index (when that ip updated it) and left the index (when a different ip updated it).

Map the index additions and their timestamps onto the documents themselves and you can retrieve the entire state of each record that the ip wrote at the time that it wrote it. If you want to know specifically what that ip changed, then diff it with the previous record, for example, to filter down to updates that only changed the email address.


You can never have a system that's capable of every kind of post hoc querying. If your model is general enough to handle all the things you want, you won't have a performant system, because it can't exploit any information about the problem. The only thing I can think of that's capable of everything you describe is a write-ahead log without any compaction.


> If your model is general enough to handle all the things you want you won’t have a performant system as it can’t exploit any information about the problem

Define "performant". Denormalizing your domain model because you feel like the database might get slow is a strong case of premature optimization, unless you have actually tried to model it this way and have measured.

You will find that most modern SQL database systems have no problem querying databases with thousands or tens of thousands of tables. In fact, having narrow tables can dramatically improve your utilization of memory bandwidth since you aren't scanning over a bunch of bytes you will never use in the result set.


Sounds a lot like SQL databases with a JSON extension to me.


I suspect that might indeed have been the joke.



