MemSQL ships 2.0, scales across hundreds of nodes, thousands of cores (memsql.com)
127 points by nikita on April 23, 2013 | 29 comments



Is this one of those, "If you have to ask how much it costs, you can't afford it?" situations? Because "Try it now Free!" is the only thing I can see on their site related to a cost, and "First one's free!" rarely means it will always be free. :|


Their resource estimator starts off at 1640 cores [1] and even the bottommost tick on the scale represents 500 cores. For people who can dedicate that amount of hardware to their database, the licensing cost is probably not a major component.

The numbers look impressive, though.

[1] http://www.memsql.com/why-memsql#scale-out


Try again - I found the bottom ticks were 8 cores and 256GB.


Fair enough - I had considered that to be on the axis since it's the minimum value, and thus not a tick, but visually it certainly is represented as a tick.

At any rate, the point I was trying to make is that they clearly expect you to throw a lot of hardware at these systems.


Sure, but a lot of projects LIKE this have an open source project for those people NOT in the enterprise, and who probably won't be running more than 8-24 cores or so.

On a related note, changing the amount of RAM seems to only change the amount of RAM, not any of the other numbers. I guess that's their point, but I would rather they just SAY that than make me try a bunch of options to show that RAM doesn't matter. :|


I'm surprised these guys haven't been sued by Microsoft yet. I think the founder himself worked on a similar project at MSFT called Hekaton and then left. At least that is what I heard from a couple of devs he tried to recruit out of SQL. I'd advise startups to be wary of jumping ship.


I blogged about this in some detail this morning.

http://www.dbms2.com/2013/04/23/memsql-scales-out/


Nice product. I'll try this out.

It seems like the isolation level is Read Committed. Are there plans to support higher isolation levels (Serialisability)?


This is on the books. It's tricky to do on a cluster and has perf implications, but it is doable and we have a design for it.


Looks great!

Quick question: since data is uniformly spread across n leaf nodes, do queries that require checking a number of rows >> n hit nearly every leaf? If so, does this create latency problems when n is large? (since it'd only take one slow request out of n to cause high latency)


Yes, we will fan out queries when necessary. In a hardcore OLTP workload this will affect latency across the system by flooding the network (if you send too many of these queries at once), but we expose knobs that let you limit these queries' parallelism to keep the rest of the system really fast.
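
To put a rough number on the "one slow request out of n" concern raised in the question, here is a minimal sketch. It assumes each leaf is slow independently with the same probability, which is a simplification and not something stated in this thread; `prob_any_slow` and the 1% figure are hypothetical.

```python
def prob_any_slow(n_leaves: int, p_slow: float) -> float:
    """Probability that at least one of n independent leaf requests is slow.

    Assumes every leaf is slow with the same probability p_slow and that
    leaves are independent (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_slow) ** n_leaves


if __name__ == "__main__":
    # Hypothetical: each leaf has a 1% chance of a slow response.
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} leaves -> P(at least one slow) = {prob_any_slow(n, 0.01):.3f}")
```

Under these assumptions, the chance that a fanned-out query waits on at least one slow leaf grows quickly with n, which is why capping the parallelism of such queries can matter.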


The problem with scaling out across many cores with a focus on RAM is that, with larger datasets, you end up trading disk latency for network and protocol latency. I am not sure that is a great trade, even if we are talking about Fibre Channel as a medium.


I have to disagree; disk is ancient (it's mechanical, egads!), while 10GigE is pretty commonplace now and InfiniBand and Fibre Channel are even faster.

Back to my CS 101 takeaways: there are only 3 bottlenecks in a computer system: CPU, network, and IO.

Looks like MemSQL is fixing the CPU and IO bottlenecks, but physics is physics, so the network is left to a pure hardware solution, haha.


The problem is that you can end up with higher latency over the network, because it still takes a fixed amount of time for nodes to communicate. Even with a 1TB/s link between nodes you can still have a good 30ms between them, all adding even more latency. That can be mitigated somewhat by a good protocol that manages the latency properly (e.g. not blocking while waiting on ACKs and such), but it can still end up with far more latency than a few large disks would have (even better now with SSDs). That said, I do imagine that some datasets will benefit from this kind of topology (I can imagine that geospatial stuff will do well with it, since you can locate physically close things on a single machine and reduce the amount of talking needed).


30ms? In anything resembling a modern datacenter? 0.3-0.5ms is more typical these days.


He was joking.
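
The latency-versus-bandwidth point in this sub-thread can be sketched with a simple model: time for one request is roughly a fixed latency plus payload size divided by bandwidth. This is a back-of-the-envelope sketch only; the 4 KiB payload and the function name `transfer_time_s` are assumptions, while the 30ms and 0.3-0.5ms figures are the ones quoted above.

```python
def transfer_time_s(payload_bytes: float, bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Rough time for one request: fixed latency plus serialization time."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    payload = 4 * 1024      # hypothetical 4 KiB result
    ten_gige = 10e9 / 8     # 10 Gbit/s expressed in bytes per second
    for label, latency in (("30 ms", 30e-3), ("0.5 ms", 0.5e-3), ("0.3 ms", 0.3e-3)):
        total_ms = transfer_time_s(payload, ten_gige, latency) * 1e3
        print(f"latency {label:>6}: about {total_ms:.3f} ms per request")
```

For small payloads the bandwidth term is tiny, so the fixed latency dominates, whichever of the quoted latency figures applies.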


How does this compare to kdb+? This seems like a much less arcane competitor.


kdb+ is compressed columnar in memory on a single box with a very exotic language called Q. MemSQL is row-based in memory across n machines using SQL.
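
For anyone unfamiliar with the row-based versus columnar distinction, here is a toy sketch of the two layouts. It says nothing about how kdb+ or MemSQL actually store data; the table contents and helper names are made up for illustration.

```python
# Row layout: each record is stored together, which suits fetching whole rows.
rows = [
    (1, "AAPL", 400.0),
    (2, "MSFT", 30.0),
    (3, "GOOG", 800.0),
]

# Column layout: each attribute is stored together, which suits scans and aggregates.
columns = {
    "id": [r[0] for r in rows],
    "symbol": [r[1] for r in rows],
    "price": [r[2] for r in rows],
}


def fetch_row(table, index):
    """Point lookup: one access returns the whole record in the row layout."""
    return table[index]


def sum_price_rowwise(table):
    """Aggregate in the row layout: every record is touched to read one field."""
    return sum(record[2] for record in table)


def sum_price_columnwise(cols):
    """Aggregate in the column layout: only the price column is read."""
    return sum(cols["price"])


if __name__ == "__main__":
    print(fetch_row(rows, 1))
    print(sum_price_rowwise(rows), sum_price_columnwise(columns))
```

Both aggregates return the same value; the difference is how much unrelated data each layout has to walk past to get there.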


kdb+ is also very fast for real-time time series analysis and signal generation. Seeing Morgan Stanley and Credit Suisse in the customer list made me wonder if memsql could become a competitor in the niche that kdb+ currently dominates?


Yes, if we note that there are core niches where nobody will replace kdb+ any time soon.


Great design on the site! Congrats to the memsql team


Why is it that every company that has a blog either a) doesn't have a link to their main product site or b) buries it on their blog?


Often because the blog is run on a separate software platform from the main site.


Still no excuse.

I've educated multiple vendors on the subject. E.g., see the last point on http://www.strategicmessaging.com/marketing-communications-t... :)


Where is the source code?


It's commercial.


What are the major feature updates in this release compared to the last one?


There are a lot of new features! Beyond distributing data across multiple machines in a cluster, there's more SQL surface area, multiple levels of redundancy for HA, and a distributed query optimizer. Some cool stuff with bi-directional lock-free skiplists too w.r.t. indexes.
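
For readers who have not met skiplists, below is a minimal sketch of the basic data structure. To be clear about the assumptions: this is a plain single-threaded, forward-only skiplist, so it is neither lock-free nor bi-directional, and it is not MemSQL's implementation. The class and method names are illustrative.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node at level i


class SkipList:
    MAX_LEVEL = 16

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self.level = 1  # number of levels currently in use

    def _random_level(self):
        # Each extra level is kept with probability p (geometric distribution).
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < self.p:
            lvl += 1
        return lvl

    def insert(self, key):
        # update[i] is the rightmost node at level i whose key is < key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        # Level 0 links every node in sorted order.
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


if __name__ == "__main__":
    index = SkipList(seed=42)
    for k in (5, 1, 9, 3):
        index.insert(k)
    print(list(index), index.contains(3), index.contains(4))
```

The appeal for an index is that ordered iteration and range scans fall out of the level-0 links, while the upper levels give expected logarithmic search without tree rebalancing.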




