Congratulations to the RethinkDB team on this! I'm sure it was no small feat to get RethinkDB running on Windows, and between this and the Jepsen results, they are clearly doing very well right now.
Particularly concerning is this combination which makes me think I'll have to shard like I would with MySQL rather than relying on RethinkDB in certain situations:
I mean, don't get me wrong, it's a hard problem to solve and overall I think the product will work out, but it does seem to have some limits on its horizontal scaling. It seems like you are better off stopping somewhere in the 5-9 node range, then scaling those nodes vertically, then sharding across clusters of nodes.
We're working hard on fixing every bug and performance issue in the system, so the explanation below isn't meant to shirk responsibility. That being said...
Every large system (including the Linux kernel, MySQL, Postgres, etc.) has a large number of bugs lurking under the hood. These bugs are often edge case scenarios that are specific to a given workload, hardware/software configuration, or some other aspect of a given user's system. I'd be careful about looking at the tracker and drawing conclusions -- in my experience bugs like these aren't representative of the overall experience of using the product. (That's not to say that RethinkDB is as stable as the Linux kernel, I'm just using that to illustrate that bugs are unavoidable and don't necessarily color the experience of the average user.)
Again, I wanted to give a different perspective from the POV of a maintainer of a large project. We're always working on fixing all important issues that prevent people from using Rethink, so I don't mean to imply in any way that we aren't responsible for these.
I don't mean to imply you weren't working hard or that these bugs were unusual for any complex project. I'm just getting the impression that with the current version of RethinkDB you end up scaling like this if you are attempting to be safe:
1) Get to 5-9 Nodes
2) Vertically scale this cluster.
3) Shard by adding additional clusters [e.g. 5 Clusters of 7 Nodes], since at ~30 nodes you run into performance issues for certain workloads.
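To make step 3 concrete, here is a minimal sketch of what the application-side routing might look like. This is an assumption about how you would do it yourself, not anything RethinkDB ships: the cluster names, the `cluster_for` helper and the 5x7 layout are all hypothetical.

```python
import hashlib

# Hypothetical layout taken from the example above: 5 clusters of 7 nodes.
CLUSTERS = [f"cluster-{i}" for i in range(5)]
NODES_PER_CLUSTER = 7


def cluster_for(key, clusters=CLUSTERS):
    """Pick a cluster for a shard key using a stable hash.

    hashlib is used instead of the built-in hash(), because hash() is
    randomized per process for strings and would break routing.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


total_nodes = len(CLUSTERS) * NODES_PER_CLUSTER
```

Plain modulo routing like this moves most keys when the number of clusters changes, which is part of why sharding MySQL-style is the painful option.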
Not quite true -- there are many deployments with >30 nodes in the wild. Some deployments with >30 nodes did run into issues, but I think a lot of that is extremely user-specific (and if you get to that level, RethinkDB engineers will help with all the custom issues you might run into).
I really don't like RQL. Why couldn't it just be a document flavored SQL? Having a DSL for writing in-database procedures is fine. But tying all external queries to a Builder is, from a purely aesthetic POV, pretty ugly IMO.
But that can be addressed down the line I'm sure. If nothing else by other people just building new DSLs on top of the Builder interfaces.
(Note: Yes, I realize the Builder supports map/reduce as well. In practice I'm skeptical how well that works in an on-demand fashion.)
Having only one fully-managed host available is a minor negative. Though we use Cloudant right now so that's a bit hypocritical perhaps.
Lack of integrated full-text search makes things (application-side) a lot more complex than Cloudant or a traditional RDBMS with integrated full-text search. Maybe not in a "look what I can do" prototype-ish phase, but definitely in a "what would Jepsen do to my system" sort of scenario (see the "Warning" here: https://www.rethinkdb.com/docs/elasticsearch/).
That means you're going to want to routinely compare and reindex. Which ugh.
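A rough sketch of what "compare and reindex" could look like, assuming you can list document ids with some revision marker from both the database and the search index. The `diff_index` helper and the revision maps are hypothetical, not part of RethinkDB or Elasticsearch.

```python
def diff_index(db_revs, index_revs):
    """Compare document revisions in the primary store and the search index.

    Both arguments map document id -> revision marker (hypothetical; any
    comparable version stamp would do).
    Returns (to_index, to_delete): ids that are missing or stale in the
    index, and ids that exist only in the index.
    """
    to_index = sorted(
        doc_id
        for doc_id, rev in db_revs.items()
        if doc_id not in index_revs or index_revs[doc_id] != rev
    )
    to_delete = sorted(doc_id for doc_id in index_revs if doc_id not in db_revs)
    return to_index, to_delete


# Made-up data: "b" is stale, "c" is missing, "d" is orphaned in the index.
db = {"a": 3, "b": 1, "c": 2}
index = {"a": 3, "b": 0, "d": 5}
to_index, to_delete = diff_index(db, index)
```

Even this toy version needs a full scan of both sides, which is the "ugh" part.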
That's just my 2c on why I'm not super excited about RethinkDB. Doesn't seem a whole lot to differentiate it from other NoSQL databases outside of being a better MongoDB. It doesn't integrate full-text-search like Cloudant. It doesn't have transactions. It's got this streaming thing. Which seems like a pretty niche feature. But maybe that's just me.
It is free though, so that's definitely something. If I were a cash-strapped startup that might be enough to move it near the top of my list. OTOH if I'm paying for a fully managed, hosted service that matters a whole lot less.
Before I reply let me clarify that I don't work there anymore and my thoughts could have diverged since leaving their corporate hive-mind collective of the sort that employees often claim not to represent.
> Why couldn't it just be a document flavored SQL? Having a DSL for writing in-database procedures is fine. But tying all external queries to a Builder is, from a purely aesthetic POV, pretty ugly IMO.
Having raw SQL in an application is insane. Being able to connect directly to a database and hand-roll a query in SQL to look at stuff directly is not. But building such a query in JavaScript like you do in the admin UI is fine too (actually, it's way more annoying than the Python or Ruby APIs), with the benefit that you have only one way to learn to write queries, instead of two, and with the downside of not having infix operators and having to close many more parentheses and such. There might also be training costs if you have a semi-technical role that would have involved making SQL queries.
I don't believe in an aesthetic POV. Unless ease-of-use is the aesthetic.
> Yes, I realize the Builder supports map/reduce as well. In practice I'm skeptical how well that works in an on-demand fashion.
SQL databases already have aggregation queries, so there's nothing going on that they wouldn't already do. Map/reduce isn't, like, a big deal, other than in the mundane way that a table scan is generally a big deal.
> SQL databases already have aggregation queries, so there's nothing going on that they wouldn't already do.
That's true. I guess it depends on implementation. I was imagining something like CouchDB, which has to parse each document and run your map and reduce functions over them to build an index. Not something you'd want to do on-demand.
> Having raw SQL in an application is insane.
That's just like, your opinion man. We could argue all day on that one, but suffice it to say it's definitely one of those Novice -> Intermediate -> Expert -> Master circle-of-life things IME.
You can syntax check SQL. It's easily understood by most developers regardless of background. You probably won't have to spend more than 5 minutes getting up to speed with whatever driver you're using. There's never been an app that didn't require major refactoring after swapping out the database layer. There's just very little point in abstracting it IME. I've written a number of O/R Mappers in my time, including creating the second most popular Ruby one.
I wouldn't do it again. ;-) I'd use something like http://slick.typesafe.com/doc/3.1.1/introduction.html#plain-... write a few type-conversion protocols and call it a day. It's safe, simple, productive and can be maintained by anyone. Any downsides are pretty minor IMO.
The comment on aesthetics is just that the primary interface to the database has a "JavaScript First" feel to it. It's an odd choice (to me) to use a context bound Builder as opposed to an AST Builder you just pass to connection.execute or something.
> It's an odd choice (to me) to use a context bound Builder as opposed to an AST Builder you just pass to connection.execute or something.
But that's what RethinkDB's query builder is. You construct a free-standing AST, let's call it x, then call x.run(conn) or (IIRC) conn.run(x) to run it.
Edit: > You can syntax check ... O/R Mappers ...
It's not a dichotomy between SQL and O/R mappers here, and sorry if you weren't trying to imply that. An SQL query building API can easily be understood by developers that know SQL (correct me if I'm wrong), you don't need to syntax check it, and it survives refactorings just as well. And it's hard not to need to dynamically construct queries anyway.
> But that's what RethinkDB's query builder is. You construct a free-standing AST, let's call it x, then call x.run(conn) or (IIRC) conn.run(x) to run it.
That's not what http://rethinkdb.com/docs/quickstart/ says. It shows a connection that exposes context-bound Builder methods with trigger (insert, changes, run, etc) methods.
Though that is a little more high-level than I thought at first. You're right that not everything is a `run`.
It feels like a stretch to call it an AST but maybe that's just a question of taking a closer look at the wire protocol.
> An SQL query building API can easily be understood by developers that know SQL
It can, but it's more complex with more mental overhead. It's the difference between Active Record scopes and HQL. Where HQL is a flavor of SQL with extensions for object notation that can feel pretty similar to qualified database.table.column syntax, AR scopes are (I feel) objectively more complex.
> you don't need to syntax check it
You actually do. It's just your interpreter or compiler that does so. A compiler plugin (for native support) or embedded DSL such as LINQ or the Slick DSL can accomplish the same for SQL though.
> it's hard not to need to dynamically construct queries anyway
True, but IME a Builder based API is a poor choice for that unless the scope of the Builder usage is private to a single class. Once you start passing around a mutable builder you can easily end up with unintentional side-effects.
In SQL builders it's easy for example to override a sort clause to be incompatible with a previous aggregation if the builder doesn't go to a lot of care to make that impossible. And then you might well also end up in a situation where you need to discard previously defined scopes (such as the problematic aggregation). How do you do that?
IME once you let a builder escape a very narrow scope (class-level at the most) you create a great maintenance burden and create a new category of very difficult to avoid bugs.
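To illustrate the side-effect concern, here is a toy sketch. Both classes are made up for this example and are not any real driver's API: a mutable builder that changes in place, next to an immutable variant that returns a new object per call.

```python
from dataclasses import dataclass, replace


def render(table, filters):
    """Render a toy SELECT statement from a table name and filter clauses."""
    sql = f"SELECT * FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql


class MutableQuery:
    """Toy mutable builder: every call changes the shared object."""

    def __init__(self, table):
        self.table = table
        self.filters = []

    def where(self, clause):
        self.filters.append(clause)
        return self

    def to_sql(self):
        return render(self.table, self.filters)


@dataclass(frozen=True)
class ImmutableQuery:
    """Toy immutable variant: every call returns a new object."""

    table: str
    filters: tuple = ()

    def where(self, clause):
        return replace(self, filters=self.filters + (clause,))

    def to_sql(self):
        return render(self.table, self.filters)


def only_active(query):
    # A helper elsewhere in the code base that "just adds a filter".
    return query.where("active = 1")


base = MutableQuery("users")
only_active(base)
leaked_sql = base.to_sql()  # the caller's base query was modified in place

frozen_base = ImmutableQuery("users")
active = only_active(frozen_base)
untouched_sql = frozen_base.to_sql()  # the original is unchanged
```

The immutable version avoids the accidental mutation, but it still doesn't answer how you discard a scope that was added earlier.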
Interpolation/materialization of terms into an AST is much simpler, and encourages simpler designs (IME). IOW: Writing builders should be up to the user. If the "driver" provides it, it's too easy to fall into anti-pattern traps with unintended consequences.
That's just my opinion. But bottom line, the API is much too high-level for my taste. I'd prefer something much closer to Cloudant, with a well defined syntax (if you imagined the HTTP end-points query-parameters folded into the request body instead; query vs update APIs aren't super consistent on which goes where, eg: _deleted or _rev vs keys or include_docs).
Such a high-level interface for building queries being provided for the driver is fairly unique among the databases I'm familiar with. And while it seems like a nice affordance for JavaScript users, it feels like an anti-pattern for other platforms.
I don't mean to trash RethinkDB though. On the other end of the spectrum you have DynamoDB's very cumbersome API. If I had to pick between the two I'd take RethinkDB's any day. I just think a more formal syntax for other platforms would make it feel less JavaScript centric and more at home on other platforms.
> That's not what http://rethinkdb.com/docs/quickstart/ says. It shows a connection that exposes context-bound Builder methods with trigger (insert, changes, run, etc) methods.
I think the confusion stems from the fact that the Quickstart guide assumes that you're running queries in the Data Explorer, a web frontend for prototyping queries. In the Data Explorer, clicking the "Run" button is what triggers the execution of the AST.
If you wrote something like `r.table("tv_shows").insert(...)` in your application code, it wouldn't do anything except for returning an AST object. You can store that object, or call the `run(conn)` method on it to send it over a RethinkDB connection and execute it.
Note that the `r` object in these queries has no state. You can think of it as a namespace that serves as a starting point for building queries.
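A toy model of the idea, for illustration only. This is not the actual driver code or wire format; `Term`, `R` and `FakeConnection` are hypothetical stand-ins. Building the query only creates a tree of terms, and nothing reaches the connection until `run(conn)` is called.

```python
class Term:
    """One node of a toy query tree: an operation, its arguments, its parent."""

    def __init__(self, op, args=(), parent=None):
        self.op = op
        self.args = tuple(args)
        self.parent = parent

    def insert(self, doc):
        # Returns a new term; nothing is sent anywhere.
        return Term("insert", (doc,), parent=self)

    def build(self):
        """Serialize the tree to nested lists (made-up format)."""
        node = [self.op, list(self.args)]
        if self.parent is not None:
            node.append(self.parent.build())
        return node

    def run(self, conn):
        # Only here does the query leave the process.
        return conn.execute(self.build())


class R:
    """Stateless namespace that serves as the starting point for queries."""

    @staticmethod
    def table(name):
        return Term("table", (name,))


class FakeConnection:
    """Stand-in for a real connection; it just records what it was sent."""

    def __init__(self):
        self.sent = []

    def execute(self, tree):
        self.sent.append(tree)
        return {"ok": True}


r = R
conn = FakeConnection()
query = r.table("tv_shows").insert({"name": "Star Trek TNG"})
sent_before_run = len(conn.sent)  # building the query sent nothing
result = query.run(conn)
```

Because `query` is just a value, you can store it, pass it around, or run it against a different connection later.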
> That's not what http://rethinkdb.com/docs/quickstart/ says. It shows a connection that exposes context-bound Builder methods with trigger (insert, changes, run, etc) methods.
There isn't any sort of builder or any sort of context that anything is bound to. insert and changes aren't anything like run -- they build an AST that describes an insert query, and then you have to run them with run.
I prefer simple rowmapping to ORM making your queries for you any day. On previous java project, all the SQL was actually in XML files that were loaded by the repositories by which they were used. Debugging that was much easier (especially when it came to performance issues) than having an ORM because there was no ORM to reason (and possibly be wrong) about.
As far as cutting down on repetition with queries, I found that using some higher order functions (essentially templating, at that point) would clear that right up.
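Roughly what I mean by higher-order functions, sketched in Python with `sqlite3` standing in for the Java setup. The `make_finder` helper and the table layout are invented for this example.

```python
import sqlite3


def make_finder(table, column, row_mapper):
    """Return a function that looks rows up by one column.

    table and column are trusted identifiers fixed at definition time
    (never user input); the value is always passed as a bound parameter.
    """
    sql = f"SELECT id, name FROM {table} WHERE {column} = ?"

    def finder(conn, value):
        return [row_mapper(row) for row in conn.execute(sql, (value,))]

    finder.sql = sql  # keep the raw SQL visible for debugging
    return finder


def to_dict(row):
    """Simple row mapper: tuple -> dict, no ORM involved."""
    return {"id": row[0], "name": row[1]}


find_show_by_name = make_finder("tv_shows", "name", to_dict)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tv_shows (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO tv_shows (name) VALUES (?)", ("Star Trek TNG",))
rows = find_show_by_name(conn, "Star Trek TNG")
```

The SQL stays in plain sight (`find_show_by_name.sql`), so when a query is slow you are debugging the exact statement that ran.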
The DSL problem is probably relatively straightforwardly fixed (just write a DSL that translates to builder calls), but I think that RQL sprung up from a disdain of DSLs that were built by mongo (and probably in some part SQL), that often become overly complicated, hard to use, and sometimes use obtuse/hard-to-remember terminology.
The other missing-add-on points are valid of course, but they're just about as equally valid for any document store you can find. I don't speak for RethinkDB, but as far as I can see they do not claim to offer those things, so I don't know if it should be held against them that they don't. Of course, it makes sense that you're not super excited for it because of those missing features you want.
However, I think you'd be hardpressed to find another document store that has gotten the rest of the things it does as right as RDB has gotten them. RDB has iterated their way to a really high quality document store, yet (I find) does not get the press to match how well it performs (and how many big pitfalls it avoided).
That sounds pretty fair. I think the RQL Builder issue is a bit bigger than that since it would make writing your own DSLs a bit more awkward, with more overhead. I think a simple published protocol would be the better alternative. But even so, it's not insurmountable and there are databases with much much more awkward APIs. RethinkDB's isn't all that compelling IMO, but at least it's not actively hostile.
Cloudant gets most of the rest correct. Their (dedicated) clustering has been largely flawless IME. Their availability probably the best of any system I've used (database or otherwise). Very very impressive. And their integrated Lucene takes a whole category of issues out of the equation.
They don't have transactions either though. Which is a bigger issue than in RethinkDB since they also don't have joins. If you need to dematerialize a navigable tree into your documents that's easy enough. But what happens when the source-of-truth for your tree (another document in my case) changes? You have to reprocess the entire live database. The fact that you can't do that atomically means you either: A: Hope for the best. Or B: Do a lot of extra work implementing processing that will (hopefully) converge on eventual consistency.
I feel like RethinkDB is moving up my list during this thread though. So there's that. :)
The lack of integrated full-text-search that provides decent guarantees is a real sticking point, but outside of Cloudant I can't think of another NoSQL solution that attempts it.
I'll say this though: If RethinkDB resolved the search issue, it would probably be THE DB to use IMO. RQL (despite the space I've given it) is pretty petty in comparison. And lack of multi-document atomic transactions is much less of an issue with the schema flexibility afforded by JOIN support.
Even without resolving it I think I'm convinced enough to give it a closer look.
> However, I think you'd be hardpressed to find another document store that [...]
CouchDB. Older by 6 yrs, so probably can't compete with RethinkDB on wow-yet-another-new-fresh-backed-tool-with-fancy-webpage-and-admin-ui scale ;)
In everything else CouchDB has more features than RethinkDB. And it is much more mature, but somehow this word is understood backwards these days. Technology ageism I suppose ;)
CouchDB doesn't do joins, which can impose serious schema design changes depending on your application. map/reduce is done at index time. You wouldn't ever want to do it on the fly.
The base CouchDB also wasn't clustered last I looked (a year ago maybe?); you had to use BigCouch, which was an OSS contribution by Cloudant that lagged both Cloudant's own offering and CouchDB releases. It was supposed to get merged into head though so there's a good chance that actually happened while I wasn't looking.
If you're just looking at CouchDB though and not Cloudant, it's not all that compelling (IMO). No clustering. No integrated search. No transactions. No joins. Pretty shallow server metrics in the dashboard. CouchDB doesn't even really have queries AFAIK. Just index scans.
You might want to look at https://github.com/mfenniak/rethinkdb-net (this is an alternative C# driver with a different API). We'd love to get an official C# driver in soon, so if the community has suggestions on how to improve the API (and which driver feels more idiomatic), we're all ears!
That would be amazing, but potentially also misleading.
My 2 cents for Slava & co: The idea of IQueryable<T> is that it implements all LINQ "operators" (a weird LINQ term; they're just methods on an interface) just like IEnumerable<T> does on in-memory data structures. But internally IQueryable<T> implementations typically build an SQL query instead.
So IQueryable<T> is designed for a use case somewhere halfway between an SQL query builder and an ORM.
I don't know ReQL well enough, so you'll have to assess this yourselves: If you can map all or most LINQ operators into ReQL, then you have to make the effort and implement IQueryable; it's too awesome an opportunity to let go.
But if the match is only, say, 50%, plus there are a couple of things Rethink can do that aren't easy to express with LINQ, I'd strongly recommend not implementing it at all, but designing your own C#-flavored query interface that just resembles the LINQ operators wherever possible (e.g. use "Where" instead of "filter").
Thing is, developers expect that anything they do on an IQueryable is transformed into a single gigantic query and optimized and executed once, on the database. Maybe Rethink can't guarantee the same performance characteristics, simply because it's no relational SQL-ish database. It would have to implement certain parts of IQueryable in client-side memory just to be compatible. Do that, and you'll end up fooling a lot of developers and they'll have a hard time finding out where their speed went. The other option is runtime NotImplementedExceptions for all unsupported methods, which really is just as unfriendly.
If you're in C#, the best NoSQL database for you is RavenDB[0].
First class LINQ support. Transactions. Full text search. Blazing fast. Open source. And it creates and maintains indexes for you based on your app's queries; database machine learning that keeps your app fast.
Disclaimer: I'm a bit biased, as I've contributed to RavenDB. But it really is a great document database. I use it on nearly every project, personal and professional.
From the little I know about C#, the LINQ (IQueryable?) interface is very well designed/used -- maybe some work could be done to look into exposing that sort of API?
I can at least say that the C# API looks extremely consistent with rethink in other languages.
>the LINQ (IQueryable?) interface is very well designed/used
It's very well designed/used if someone else does the heavy lifting of creating the data provider for you. Speaking as someone who has done it, writing your own IQueryable LINQ provider is a serious hassle, though fortunately it only has to be done once. If it's something you really wanted, I would try to lean on RethinkDB to do it rather than implementing it yourself.
Oh that's what I meant -- RethinkDB is thoroughly open source though, so I'm sure if someone in the community found the time to do it, the rdb devs would give it a look and improve the code, and possibly merge.
Well, they are looking for people to help out, if you have time and can invest in it you could be one of the contributors I'm sure. Otherwise I'm sure with Windows being a platform for it someone will come along and work on either a new driver (API) or improve the existing one.
Not all of them, but yeah, it's close. The main issue with 3rd party libraries is keeping them up to date with the official drivers. The way this driver is written, updates can be automated
This is one of those databases I wanted to eventually arrive on Windows. I really enjoyed the look of it and wanted to try it out, and although I have used Linux as my main OS from time to time, I just don't feel comfortable using a database I can't run on multiple platforms. Not being able to use it locally from Windows seems like a barrier. Vagrant and vagrant boxes like Scotch Box are making this "issue" a little less relevant these days, but it's still nice to be able to run a database without installing an entire OS. Thanks for the work and time invested in bringing Rethink to Windows, it is truly appreciated.
RethinkDB uses custom abstractions to hide the lower-level OS-specific code. We were able to port the low-level code to Windows with very little changes in these abstractions, and thus minimal changes to the rest of the code base.
The proper way to use libuv would be to replace those abstractions with libuv's own, which would affect a lot more code and possibly break a lot of implicit invariants.
In retrospect, perhaps we should have given alternatives like libuv, libevent or boost::asio more consideration.
I love how they're reaching out to C# users (and conversely I love how MS is reaching out to Linux and OS X users by open sourcing their CLR, giving away their IDE, integrating Node.js and JSX as first-class citizens, etc!) but Rethink-ers if you're reading this - I briefly perused your source code and to a C#'er (who has written his fair share of JS, both in the "Visual Studio ASP.NET and SharePoint" style, as well as the "node.js" style) -- this doesn't read like idiomatic C# at all. I'm sure it works well, but the API feels like someone who writes JS for a living was given a deadline to add this feature and tacked it on half-heartedly.
There's no reason in this day and age that you should have any deps on stuff like Cygwin, which you claim you don't:
>> You won’t find any POSIX compatibility layers or other similar hacks–RethinkDB uses native Windows APIs on the Windows platform.
It looks like you do[1] from the whole Cygwin section and the env vars you have to add to PATH (in a somewhat inelegant way). That dependency "tree" is... extensive to say the least and a lot of it can be eliminated or eased with a combination of NuGet and chocolatey. Having in your instruction notes to build Goog's RE2:
>> Build it somehow (I forgot to record how) [2]
doesn't really make me too confident that you're going to support Windows as a first party platform.
The API exposed to the end-user should be conventionally written in a more LINQ-y way - I.e. Fluent style code should be written with Object.Blah().Filter(x => x % 2).ToList() or whatever. Just little idiomatic things like that are pretty pervasive.
Regardless, thanks for investing the time on working for this. As a curmudgeon and someone who spends his spare time reading journal papers in the ACM about concurrency models from hardware transactional locks to ICFP papers which implement similar concepts on the exact opposite side of the spectrum, it's my duty to be wary of new technologies and slow to adopt them into production-- but RDB and VoltDB are two products I really think are technically sound. Contact me if you want a more thorough critique to make this more idiomatic, because your product really is great!
Edit: I was referring to this revision of Windows.md, which at the time was what was available on their master branch. My critique largely revolves around that version. Following my post, another branch's version was swapped in, a cleaner version in which the build phases of the dependencies are removed and replaced with pre-compiled binaries, as you now see:
https://github.com/rethinkdb/rethinkdb/blob/next/WINDOWS.md This solves the aesthetics issue but just pushes the problem into another domain. You're using binary deps (some of which are cygwin'd afaik, but I haven't taken IDA to it so I can't say definitively) but if you want to build from "source" a la the FreeBSD ports way -- where every single .c or .cc is available for one to see and modify as their environment demands -- the problem's still there just pushed under the rug.
We've since simplified the build process, and it can now build most dependencies automatically.
The only exception right now is the web UI assets, which still need to be downloaded separately or copied from Linux (building these on Windows will come later).
We don't link or compile in any Cygwin code, and use all Windows APIs directly. The build system uses some Cygwin tools though.
Incidentally we considered using Cygwin to achieve Windows compatibility at some point, but found that it didn't implement some of the lower-level APIs that RethinkDB uses on Linux.
"Developer preview" means we're still going through internal testing processes. The code review is still in progress (the windows port hasn't been merged into next yet), and the build hasn't been widely tested in the wild.
Everything seems stable and there aren't major known bugs, but there is still work to do before we can label the windows port production ready.
A developer preview is a prerelease of a user software package that's potentially a platform for other software, targeted at developers who would potentially build applications for the package's user base. An OS like Windows may have development previews, as there would be developers who would want to start early to develop apps on the platform. Sublime Text may release a development preview in order to allow developers of plugins to start early to update their plugins to the new version. But a database is not a user application; it is a developer's tool or a component of an application. Thus a developer preview of a thing that will never directly face a user is not logical. You want to say release candidate or simply beta instead. Developer preview implies an extensible user application.
haha, it's more that it's rare that I see such detail put into ui/ux/branding in many/most products, especially ones that aren't geared toward common end users. big kudos :)
Does it make any sense from a licensing point of view to host a database on Windows if there is a Linux version available? As far as I understand, each user hitting that database directly or indirectly via an app needs a CAL for that Windows Server. Outside of an enterprise (i.e. cloud) it gets very expensive quickly.
Enterprises embraced Linux long ago so not sure if rethinkdb will have much ROI on this effort.
The HN filter bubble sometimes forgets this, but there are a lot of developers out there who use Windows (even if they don't always deploy to Windows).
Not having to set up a vagrant box just to give Rethink a try may seriously influence developer adoption in this space.
Also while I might personally be able to get by running my DB on a linux VM some friends I'd like to share my apps to run for themselves aren't OK with linux at all.
The docker rethinkdb images made this a breeze but only helps my development.
Depends if your only options are SQL Server or Oracle (it happens a lot with enterprise software).
In that case, SQL Server is often the less costly option. I would do everything possible to avoid an Oracle product based on their licensing schemes. Architecture by licensing is not a fun exercise at all.
The front end servers used to host your website would generally be considered as running “web workloads” and CALs or External Connectors will not be required to access these servers. Once the customer adds a widget to their shopping cart, creates an account and enters their credit card and shipping information to complete the sale – they are now authenticated via your back end commerce servers/application (non-web workload). Since users are accessing the backend commerce servers which web workloads are not running – CALs or External Connectors will be required for users to access these back end servers.
I thought this would be obvious, since Server + CAL Licensing would make any public site unfeasible, and there are many large sites running on the windows stack.
Also, I read some MS documentation saying that a Windows server that runs the database for web workloads may actually come under web workload. I can't remember which document that was, however.
I wonder what in my post caused such a downgrade? I'm honestly thinking about ROI for RethinkDB, and I posted a link to the licensing FAQ from MS in one of my responses for clarification.