We have, so far, only made a GraphQL endpoint for internal use. It has been awesome for quickly finding things in a medium-complexity database schema (30 or so tables, a bunch of different relationships), and it has become my preferred way to explore the database.
That being said, there are cases where a join between ad hoc subqueries is the best way, and GraphQL doesn't really offer a way to do that (though I don't see why it wouldn't be possible). E.g. arbitrarily combining two GraphQL queries that return lists where some field in one is equal to some field in another.
But in terms of replacing REST, where you have to do all of that anyway, it is far and away the better option (for ad hoc querying, at least).
> a join between ad hoc subqueries is the best way, and GraphQL doesn't really offer a way to do that
I think you're supposed to create a "virtual" field on the left-hand object that represents a collection of the right-hand type of objects. The field can be parameterized if your join needs extra information. If you want pagination, the virtual field returns an intermediary object describing the cursor (sort order, offset).
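Roughly, something like this sketch (the Event/Photo types and field names are made up; the cursor bit follows the Relay-style connection convention):

```graphql
# Sketch only; none of these types come from a real schema.
type Event {
  id: ID!
  name: String!
  # "Virtual" join field: the Photos related to this Event.
  # Parameterized in case the join needs extra information.
  photos(takenBy: ID, first: Int, after: String): PhotoConnection!
}

# Intermediary object describing the cursor / page.
type PhotoConnection {
  edges: [PhotoEdge!]!
  pageInfo: PageInfo!
}

type PhotoEdge {
  cursor: String!
  node: Photo!
}

type PageInfo {
  endCursor: String
  hasNextPage: Boolean!
}

type Photo {
  id: ID!
  url: String!
}
```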
Ad Tech has no need for realtime data viewing or aggregation (in this manner), even for platform log data. Offline parallel processing is the standard. Redshift is a particularly efficient choice, while others use Spark or other ad hoc solutions.
For users, you always want mediation/adjustment steps that (can) modify realtime data to provide timesliced totals. For developers/administrators, you want to be able to persist data. Running totals in memory are too fragile to be reliable. There is an assumption of errors, misconfigurations, and bad actors at all times in AdTech.
We used MemSQL for real-time data for 2 years. All data is fully persistent, but rowstore tables are also held entirely in memory, whereas columnstore tables live mainly on disk. There's nothing fragile about it. SQL Server's Hekaton, SAP's HANA, Oracle's TimesTen, and several other databases do the same.
Timesliced totals are just a SQL query, and mediation or some other buffer between live numbers and what customers see is up to each business to decide, not some default proclamation for an entire industry.
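For example, per-minute running totals over live rows are just a GROUP BY (a sketch; the table and column names are made up, and date functions vary by dialect):

```sql
-- Per-minute totals over the last hour, straight off the live rowstore.
-- "impressions", "event_time", and "cost" are hypothetical names.
SELECT
  DATE_TRUNC('minute', event_time) AS minute_bucket,
  COUNT(*)                         AS impression_count,
  SUM(cost)                        AS spend
FROM impressions
WHERE event_time >= NOW() - INTERVAL 1 HOUR
GROUP BY 1
ORDER BY 1;
```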
If async/await worked for the continuation monad, the typical list monad, and the Maybe monad, then yes, async/await would be a good, if somewhat strangely named, construct.
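For reference, Haskell's do-notation is that generalized construct; a quick sketch of the same syntax driving Maybe and the list monad:

```haskell
-- Sketch: do-notation sequences any monad, so the same "await"-like syntax
-- works for Maybe and for lists (and for continuations), not just for one
-- blessed async type.
pairUp :: Maybe Int -> Maybe Int -> Maybe (Int, Int)
pairUp mx my = do
  x <- mx          -- "await" a possibly-missing value
  y <- my
  pure (x, y)

cartesian :: [a] -> [b] -> [(a, b)]
cartesian xs ys = do
  x <- xs          -- identical syntax, now "awaiting" every list element
  y <- ys
  pure (x, y)

main :: IO ()
main = do
  print (pairUp (Just 1) (Just 2))   -- Just (1,2)
  print (pairUp (Just 1) Nothing)    -- Nothing
  print (cartesian [1, 2] "ab")      -- [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
```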
To scale writes, you shard writes. This generally means running multiple databases (in any DBMS).
The key insight is that Datomic can cross-query N databases as a first class concept, like a triple store. You can write a sophisticated relational query against Events and Photos and Instagram, as if they are one database (when in fact they are not).
This works because Datomic reads are distributed. You can load some triples from Events, and some triples from Photos, and then once you've got all the triples together in the same place you can do your queries as if they are all the same database. (Except Datomic is a 5-store with a time dimension, not a triple store.)
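Concretely, a cross-database query looks something like this sketch (the attribute names and the two database values are hypothetical; the multi-source :in syntax is Datomic's):

```clojure
(require '[datomic.api :as d])

(defn events-with-photos
  "Joins an Events database value against a Photos database value.
   events-db and photos-db are assumed to come from (d/db conn) on two
   separate systems; the attribute names below are made up."
  [events-db photos-db]
  (d/q '[:find ?event-name ?url
         :in $events $photos
         :where
         [$events ?e :event/name     ?event-name]
         [$events ?e :event/photo-id ?pid]
         [$photos ?p :photo/id       ?pid]
         [$photos ?p :photo/url      ?url]]
       events-db
       photos-db))
```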
In this way, a Datomic system-of-systems can scale writes: you get regional ACID instead of global ACID, with little impact on the programming model of the read side, because reads were already distributed in the first place.
For an example of the kind of problem Datomic doesn't have, see the OP's paragraph "To sort, or not to sort?" - Datomic maintains multiple indexes sorted in multiple ways (e.g. a column index, a row index, a value index). You don't have to choose. Without immutability, you have to choose.
What is the primary reason people choose Datomic? From reading the website, I get the impression that the append-only nature and time-travel features are a major selling point, but in other places it's just the Datalog and Clojure interfaces. I'm sure it's a mix, but what brings people to the system to begin with?
Datomic is my default choice for any data processing software (which is almost everything).
Immutability in and of itself is not the selling point. Consider why everyone moved from CVS/SVN to Git/DVCS. The number of people who moved to Git and then said "man, I really wish I could go back to SVN" is approximately zero. Immutability isn't why. Git is just better at everything, that's why. Immutability is the "how".
I don't see why I would use an RDBMS to store data ever again. It's not like I woke up one day and started architecting all my applications around time travel†. It's that a lot of the accidental complexity inherent to RDBMSs - ORMs, N+1 problems (batching vs. caching), poorly scaling joins, pressure to denormalize to stay fast, eventual consistency at scale...
Datomic's pitch is it makes all these problems go away. Immutability is simply the how. Welcome to the 2020s.
Thanks for the explanation, and I definitely agree with all your points. The reason I ask is that over the past year or so, we have been hacking on a little side project that maps a copy-on-write tree to a distributed shared log (https://nwat.io/blog/2016/08/02/introduction-to-the-zlog-tra...). This design effectively produces an append-only database with transactions (similar to the RocksDB interface), which means you get full read scalability for any past database snapshot. We don't have a query language running on top, but we have been looking for interesting use cases.
A lot of stuff in those links that I'm not familiar with. I'd expect challenges around what happens when the log doesn't fit in memory, or when even the index you need doesn't fit in memory. If I understand correctly, zlog is key-value-time, so I'm not sure what type of queries are interesting on that. Datomic is essentially a triple store with a time dimension, so it can do what triple stores do. What do you think the use cases are for a key-value-time store?
The log is mapped onto a distributed storage system, so fitting it in memory isn't a concern, though effective caching is important. The index / database also doesn't need to fit into memory. We cap memory utilization and resolve pointers within the index down into the storage system. Again, caching is important.
If I understand Datomic correctly, I can think of the time dimension in the triple store as a sequence of transactions each of which produces a new state. How that maps onto real storage is flexible (Datomic supports a number of backends which don't explicitly support the notion of immutable database).
So what would a key-value-time store be useful for? Conceptually it seems that if Datomic triples are mapped onto the key-value database then time in Datomic becomes transactions over the log. So one area of interest is as a database backend for Datomic that is physically designed to be immutable. There is a lot of hand waving, and Datomic has a large number of optimizations. Thanks for answering some of those questions about Datomic. It's a really fascinating system.
My questions were in the context of effective caching in the query process, sorry that wasn't clear. A process on a single machine somewhere has to get enough of the indexes and data into memory to answer questions about it. Though I guess there are other types of queries you might want to do, like streamy mapreduce type stuff that doesn't need the whole log.
I will have to think about whether a key-value-time storage backend would have any implications for Datomic. One thing Datomic Pro doesn't have is low-cost database clones through structure sharing, and I don't understand why not.
> A process on a single machine somewhere has to get enough of the indexes and data into memory to answer questions about it.
This common challenge is perhaps magnified in systems that have deep storage indexes. For example, the link you posted seems to suggest that queries in Datomic may have cache misses that require pointer chasing down into storage, adding latency. Depending on where that latency cost is eaten, it could have a lot of side effects (e.g. a long-running query vs. reduced transaction throughput).
This problem is likely exacerbated in the key-value store running on zlog because the red-black tree can become quite tall. Aggressive, optimistic caching on the db nodes and server-side pointer chasing help, but it's definitely an issue.
Disclaimer: I don't know anything about Datomic internals, so I'm only speculating about why Datomic doesn't have low-cost clones. My guess is that the underlying storage solution isn't inherently copy-on-write; Datomic emulates it, so when a clone is made there is no cheap metadata change that creates the logical clone.
Despite the lack of features etc. in the kv-store running on zlog, database snapshots and clones both cost the same as updating the root pointer of the tree.
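Loosely, it's the same trick as any persistent data structure: updates path-copy, and a "clone" is just another reference to a root. A toy sketch in Clojure (an analogy, not our actual implementation):

```clojure
;; Toy sketch: persistent maps path-copy on update, so holding on to an old
;; root *is* the snapshot/clone.
(def snapshot-v1 (sorted-map :a 1 :b 2))

;; "Update" returns a new root; unchanged branches are shared, not copied.
(def snapshot-v2 (assoc snapshot-v1 :c 3))

(println snapshot-v1)  ; {:a 1, :b 2}        - old snapshot still intact
(println snapshot-v2)  ; {:a 1, :b 2, :c 3}  - clone cost: one new root
```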
> easily do that with a single aggregation REST endpoint too, provided you have resource links in your responses
I work with/on a "REST" API that has about 20 types of objects, 5-100 fields on each, with many different relationships between them. We've been looking at GraphQL to solve both the querying and mutation aspects, since doing either efficiently with the current API is extremely cumbersome.
GraphQL seems great for this case, but if there is some way just adding resource links could get us most of the benefits I'd love to hear it!
Unfortunately, the PokeAPI is not the best at showcasing this, with many levels of unneeded nesting and resources with nothing but a "url" property, but hopefully you get the idea :)
The fact that the query "language" is indentation-based rather than GraphQL's a-string-that-looks-a-bit-like-JSON-but-isn't was just an arbitrary choice; it's easy to change.
Personally, I think it would be morally reprehensible to use energy to keep BTC alive if there are vastly more efficient alternatives that achieve the same goal. Though I am interested in counter-arguments.
I don't think "reprehensible" applies here. We keep running old computers and old cars and we replace them as the old ones wear out. Bitcoin's ubiquity will be hard to unseat but all you can do is try to convince individual wallet-holders and merchants to use [the new more efficient coin].
Other coins have been proposed that either reduce the amount of computation (PPC) or increase the utility of the calculations (XPM). Note that those have existed for a long time and don't require an ICO. Using storage space as a (hopefully?) less energy-intensive contest to replace PoW is interesting although I suspect that Chia isn't truly novel here either.
Since I am calling out the responses to you, I feel like I should do the same here.
> All the other things that people use exist as (individually) extremely simple extensions that you can learn in isolation in a few minutes.
If by "learn" you mean you can type in something that is the correct syntax, sure... but in terms of actually learning how to use them to model problems? Come on...
> Haskell is possibly the least over-engineered language out there
JavaScript is pretty under-engineered. But more seriously, there are several real contenders here in Smalltalk, older Scheme standards, and Go (though again, that may be considered under-engineered).