Is this one of those "if you have to ask how much it costs, you can't afford it" situations? Because "Try it now Free!" is the only thing I can see on their site related to cost, and "First one's free!" rarely means it will always be free. :|
Their resource estimator starts off at 1640 cores [1], and even the bottommost tick on the scale represents 500 cores. For people who can dedicate that amount of hardware to their database, the licensing cost is probably not a major component.
Fair enough - I had considered that to be on the axis since it's the minimum value, and thus not a tick, but visually it certainly is represented as a tick.
At any rate, the point I was trying to make is that they clearly expect you to throw a lot of hardware at these systems.
Sure, but a lot of projects LIKE this have an open source project for the people NOT in the enterprise, who probably won't be running more than 8-24 cores or so.
On a related note, changing the amount of RAM seems to only change the amount of RAM, not any of the other numbers. I guess that's their point, but I would rather they just SAY that than make me try a bunch of options to show that RAM doesn't matter. :|
I'm surprised these guys haven't been sued by Microsoft yet. I think the founder himself worked on a similar project at MSFT called Hekaton and then left. At least that is what I heard from a couple of devs he tried to recruit out of SQL. I'd advise startups to be wary of jumping ship.
Quick question: since data is uniformly spread across n leaf nodes, do queries that require checking a number of rows >> n hit nearly every leaf? If so, does this create latency problems when n is large? (since it'd only take one slow request out of n to cause high latency)
Yes, we will fan out queries when necessary. In a hardcore OLTP workload this will affect latency across the system by flooding the network (if you send too many of these queries at once), but we expose knobs that let you limit these queries' parallelism to keep the rest of the system really fast.
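To make the tail-latency concern from the question concrete, here's a minimal simulation sketch (not MemSQL's implementation; the per-leaf latency numbers and function names are made-up assumptions). A fan-out query only completes when its slowest leaf responds, so the chance of hitting at least one straggler grows quickly with the number of leaves.

```python
import random

def leaf_latency_ms():
    """Hypothetical per-leaf latency: usually ~2 ms, with a 1% chance of a 50 ms straggler."""
    return 50.0 if random.random() < 0.01 else 2.0

def fanout_query_latency_ms(n_leaves):
    """A fan-out query finishes only when the slowest leaf responds."""
    return max(leaf_latency_ms() for _ in range(n_leaves))

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

if __name__ == "__main__":
    for n in (1, 8, 64, 256):
        samples = [fanout_query_latency_ms(n) for _ in range(10_000)]
        print(f"{n:4d} leaves -> p99 query latency ~ {p99(samples):.1f} ms")
```

With a 1% straggler rate per leaf, a 64-leaf query hits at least one straggler with probability 1 - 0.99^64 ≈ 47%, so the whole-query tail collapses toward the straggler latency. That's why capping fan-out parallelism helps keep the rest of the system fast.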
The problem with scaling out across many cores with a focus on RAM is that with larger datasets you end up trading disk latency for network and protocol latency. I'm not sure that is a great trade, even if we are talking about Fibre Channel as a medium.
I have to disagree; disk is ancient - it's mechanical, egads! - while 10GigE is pretty commonplace now and InfiniBand and Fibre Channel are even faster.
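Some rough, order-of-magnitude figures help weigh the disk-vs-network trade-off being debated here. These are assumptions drawn from common "latency numbers" rules of thumb, not measurements of any particular system:

```python
# Rough, order-of-magnitude latency figures (assumptions, not measurements)
# for comparing local storage against a hop across the network.

LATENCY_US = {
    "spinning disk seek":           10_000,  # ~10 ms
    "SSD random read":                 100,  # ~100 us
    "same-datacenter network RTT":     500,  # ~0.5 ms
    "send 1 MB over 10GigE":           800,  # 1 MB / ~1.25 GB/s
}

if __name__ == "__main__":
    for op, us in sorted(LATENCY_US.items(), key=lambda kv: kv[1]):
        print(f"{op:30s} ~{us / 1000:7.2f} ms")
```

By these rough numbers, a network hop beats a spinning-disk seek comfortably but loses to a local SSD read, which is essentially the disagreement in the two comments above.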
Going back to my CS 101 takeaways: there are only three bottlenecks in a computer system: CPU, network, and I/O.
Looks like MemSQL is fixing the CPU and I/O bottlenecks, but physics is physics, so the network one is purely a hardware problem, haha.
The problem is that you can end up with larger latency over the network because it still takes a fixed amount of time for nodes to communicate. Even with a 1TB/s link between nodes you can still have a good 30ms between them all, adding even more latency. That can be mitigated somewhat by a good protocol that manages the latency properly (e.g. not blocking while waiting on ACKs and such), but it can still end up with far more latency than a few large disks would have (even better now with SSDs). That said, I do imagine that some datasets will benefit from this kind of topology (I can imagine that geospatial stuff will do well, since you can locate physically close things on a single machine and reduce the amount of talking needed).
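A quick sketch of the "not blocking while waiting on ACKs" point, with hypothetical numbers and function names: a stop-and-wait protocol pays a full round trip per request, while a pipelined one pays roughly one round trip plus the serialized service time at the receiver.

```python
def blocking_total_ms(n_requests, rtt_ms):
    """Stop-and-wait: each request pays a full round trip before the next is sent."""
    return n_requests * rtt_ms

def pipelined_total_ms(n_requests, rtt_ms, per_request_service_ms):
    """Pipelined: requests are in flight concurrently, so we pay one round trip
    plus the serialized service time at the receiving node."""
    return rtt_ms + n_requests * per_request_service_ms

if __name__ == "__main__":
    # Assumed values: 100 requests, 0.5 ms round trip, 10 us of service time each.
    n, rtt, svc = 100, 0.5, 0.01
    print(f"blocking : {blocking_total_ms(n, rtt):.1f} ms")   # 50.0 ms
    print(f"pipelined: {pipelined_total_ms(n, rtt, svc):.1f} ms")  # 1.5 ms
```

Under those assumed numbers the pipelined version is over 30x faster for the same work, which is the kind of gap a good protocol can close even when the physical link latency is fixed.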
kdb+ is also very fast for real-time time series analysis and signal generation. Seeing Morgan Stanley and Credit Suisse in the customer list made me wonder if MemSQL could become a competitor in the niche that kdb+ currently dominates.
There are a lot of new features! Beyond distributing data across multiple machines in a cluster, there's more SQL surface area, multiple levels of redundancy for HA, and a distributed query optimizer. There's some cool stuff with bi-directional lock-free skiplists for indexes, too.
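For anyone unfamiliar with the data structure, here's a minimal textbook skiplist sketch in Python. It's single-threaded and singly linked, so it deliberately omits the lock-free and bi-directional parts that make the MemSQL variant interesting; it only shows why skiplists give expected O(log n) search and insert with a very simple layout.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node at level i

class SkipList:
    """Minimal single-threaded skiplist (not the lock-free, bi-directional kind)."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)  # sentinel with pointers at every level
        self.level = 1

    def _random_level(self):
        # Each node is promoted to the next level with probability 1/2.
        lvl = 1
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Record the rightmost node visited at each level; these get re-linked.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

The appeal for an in-memory index is that inserts only touch a handful of pointers at each level, which is what makes lock-free, CAS-based versions practical compared to rebalancing a tree.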