Announcing MoSQL (stripe.com)
390 points by nelhage on Feb 5, 2013 | 113 comments


FYI you can store unstructured data in PostgreSQL (and query it) with the introduction of hstore. So knock one more reason to use MongoDB instead of PostgreSQL off your list. (Disclaimer: the length of my list of reasons to use MongoDB has always been a constant less than one.)

http://www.postgresql.org/docs/9.1/static/hstore.html


Wow, hstore really isn't a great alternative to an actual document DB. The "better" Postgres option would be a JSON type and functional indexes.


There is a JSON type but it just validates content.

HSTORE can be fully indexed (GiST and GIN). You just have to roll your own object graphs for nesting if that's what you need to do.
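
For example, something like this works (a rough sketch, using a hypothetical events table):

  CREATE EXTENSION hstore;
  CREATE TABLE events (id serial PRIMARY KEY, properties hstore);
  -- GIN index over all keys and values in the hstore column
  CREATE INDEX events_properties_idx ON events USING gin (properties);
  -- containment queries like this one can use the index
  SELECT * FROM events WHERE properties @> 'browser=>firefox';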

I swear I have typed this exact same comment previously. Deja vu, maybe


JSON type gives you some typed values within the doc, multi-level nesting, etc. You can add functional indexes (http://www.postgresql.org/docs/9.1/static/indexes-expression...) to index specific attributes within the JSON, do legit sorts over values, reasonable array queries, etc. It seems much, much closer to what Mongo does than anything you can do with hstore.
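
Roughly something like this (a sketch with a made-up orders table; the ->> attribute extraction assumes the JSON dereferencing operators coming in 9.3, mentioned elsewhere in this thread):

  CREATE TABLE orders (id serial PRIMARY KEY, doc json);
  -- expression index on a single attribute inside the JSON document
  CREATE INDEX orders_customer_idx ON orders ((doc ->> 'customer_id'));
  -- lookups and sorts on that attribute can now use the index
  SELECT * FROM orders WHERE doc ->> 'customer_id' = 'cus_123' ORDER BY doc ->> 'created';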


I think you just restated my comment. Do you believe expression indexes do not apply to HSTORE?

I consider both HSTORE (key/value) and the current JSON type and record functions to be just intermediate steps toward a fuller API [0].

[0]: http://www.postgresql.org/message-id/50EC971C.3040003@dunsla...



> JSON type and functional indexes

Those "Indexes on Expressions" are really a great feature that can also be combined with XML (not just JSON) and any other types. I recommend everyone to have a look at those:

http://www.postgresql.org/docs/9.2/static/indexes-expression...
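
For instance, an expression index over an XML document might look roughly like this (a sketch; the table and XPath are made up, and on some setups you may need to wrap the expression in an IMMUTABLE helper function):

  CREATE TABLE docs (id serial PRIMARY KEY, body xml);
  -- index the text of the <title> element so lookups on it don't rescan every document
  CREATE INDEX docs_title_idx ON docs ((((xpath('/book/title/text()', body))[1])::text));
  SELECT id FROM docs WHERE ((xpath('/book/title/text()', body))[1])::text = 'Dune';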


Is there any way in those expressions to parse JSON and perform arbitrary calculations - i.e. like CouchDB views?


Sure, you can write server-side procedures in, for example, JavaScript that do arbitrary things with the JSON.


To be fair, one should note that "only" the languages C, Python, Perl and Tcl [1] are officially supported by PostgreSQL. There are also third-party bindings for other languages such as Java, PHP, R, Ruby, Scheme, and sh:

http://www.postgresql.org/docs/9.2/interactive/external-pl.h...

However, the PostgreSQL documentation doesn't mention JavaScript support anywhere. Are you sure a mature PL/JavaScript binding for PostgreSQL exists? If so, their docs should be updated.

[1] There's also PL/pgSQL, but that's a special-purpose language you won't find outside the database world. I don't recommend learning it unless you have strange requirements that make PL/pgSQL a perfect fit. For normal usage, use PL/Python or PL/Perl. In simple cases, use SQL directly.


Yes, #1 hit when you Google for postgresql JavaScript: https://code.google.com/p/plv8js/

I can't vouch for any particular maturity level but seems to have active users and it's been around a few years already.
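
For reference, using it looks roughly like this (a sketch; the helper function and table names are made up):

  CREATE EXTENSION plv8;  -- third-party extension, not shipped with PostgreSQL itself
  CREATE FUNCTION json_get(doc text, key text) RETURNS text AS $$
    return JSON.parse(doc)[key];
  $$ LANGUAGE plv8 IMMUTABLE STRICT;

  -- arbitrary JavaScript over the document, usable in queries and expression indexes
  SELECT json_get(body, 'status') FROM events;
  CREATE INDEX events_status_idx ON events (json_get(body, 'status'));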


> So knock one more reason to use MongoDB instead of PostgreSQL off your list.

One of the reasons MongoDB is so popular is that it is a fantastic database for developers. As a Java developer I can deal in my code with sets, hashmaps, and embedded structures and have them map effectively 1-1 in the database. It's akin to an object database, meaning you can focus higher up in the stack.

With the SQL ORMs you can't avoid having to deal with the ER model.


Of course, the problem with that approach is you don't have anything enforcing any sort of data integrity below the application. In my experience most of the time you actually can put down on paper a schema and a set of rules the data should obey without too much fear of it changing dramatically. The nice thing about hstore is it allows you the flexibility to introduce unstructured data in just the places where a schema is unknowable or not worth the complexity.

MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.


> MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.

Unless MongoDB et al. are saying "always use MongoDB et al. and never an RDBMS", I'm not sure how you arrived at the conclusion that "the schema is never worth the complexity."

If anything, the appropriate assumption is, "schemas aren't always worth the complexity." When they are, you use an RDBMS. When they aren't, you don't bother with the data integrity constraints.


The "right tool for the job" mantra often cited whereby you run N different data stores for different use cases heavily discounts the true implication of running multiple data stores: you have to run multiple data stores. You have more ways to get burned by your lack of expertise. You need more eyeballs for the same amount of confidence in your system since those will probably need to be different types of experts. You need to know how to monitor them and tune them. Discussion about which data store to use for a given use case becomes a constant drag on discussions. There is less consistency in modeling since you have to work with multiple paradigms. Your software needs to be built to be able to deal with multiple data stores. All your export/import/backup/etc software efforts that are 1-to-1 with each data store need to be multiplied.

The bottom line is that if you drop in a second data store because you have a few fields in your database that are a pain to model with a schema, you are doing yourself a disservice compared to just doing an ALTER TABLE ... ADD COLUMN foo hstore.
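
i.e. something like this (a sketch with a made-up table):

  ALTER TABLE users ADD COLUMN extra hstore;
  UPDATE users SET extra = 'referrer=>hn, beta_flags=>"a,b"' WHERE id = 42;
  SELECT id FROM users WHERE extra -> 'referrer' = 'hn';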

My colleague mcfunley wrote an article about this blind spot when people talk about these issues:

http://mcfunley.com/why-mongodb-never-worked-out-at-etsy


While I agree that is often a blind spot, it is a red herring with respect to this statement of yours:

> MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.

You may well argue that if you have N-1 applications using PostgreSQL, and the Nth application could, on its own, justifiably use MongoDB, then it is still appropriate to use PostgreSQL rather than add Yet Another DB Engine.

But that is nothing more than a specific case that is often ignored in the "best tool for the job mantra". It does not mean that schemas are never worth the complexity of an RDBMS.

All I'm saying is that you can't claim that a recommendation of MongoDB assumes schemas are never worth the complexity; you can only claim that the assumption is that they are sometimes not worth the complexity.

More generally, MongoDB makes no assumption that contradicts "years of DB research."


Please don't confuse problems with SQL ORMs with SQL itself. SQL stores are powerful, flexible, and quite easily queryable. MongoDB is only a good database for developers if it solves the problems that you need to solve in a way that causes no impedance mismatch.

And for the record, where I work we use a SQL store, Redis, and MongoDB, each where the use case suits it.


I am under no confusion. SQL ORMs all suffer from the same problems (which are forced by the underlying SQL model) that MongoDB does not.

And your whole "use the right tool for the right job" goes without saying. It's others who seem to be obsessed with this "SQL is perfect for everything" delusion.


I'm under the impression that Mongo is merely hiding away some complexities, instead of truly resolving them.


Can you be more specific about what you mean by that?


One of the many mistakes I have made is being "lazy": instead of creating some tables, some schema structure, and all the related code, I just serialized the stuff and stored it as a string in the database. A few months later (or less) a very simple new requirement arises and I have to filter by a thing inside the serialized string. Then I say ok, let's index the thing separately, or store it independently, or do two-pass filtering, but none of these beats the very simple query I could have done if I had done the right thing in the first place. I hid the complexity away, but it came back to me weeks later and then it really stunk. The choice is between many evils: migrate existing data, duplicate the thing, select by regexp, etc.


impression (Noun) 1. An idea, feeling, or opinion about something or someone, esp. one formed without conscious thought or on the basis of little evidence.


This is pretty cool, but I'm struggling to see what the use cases are, at least for analysis. There might be quite a few benefits for running application code that I'm not aware of. With regards to analysis though, their own example question is "what happened last night?" but then they go on to say that it is a near real-time data store. Does it matter that it is a real-time mirror then?

I've always liked the paradigm of doing analysis on "slower" data stores, such as Hadoop+Hive or Vertica if you have the money. Decoupling analysis tools from application tools is both convenient and necessary as your organization and data scales.


(I wrote MoSQL)

PostgreSQL scales surprisingly well for this purpose, and is much nicer for interactive queries than Hadoop/Hive. We use Impala[1] for some larger datasets, but Impala is comparatively new, and it's nice to have something as battle-tested as postgres here.

As for the "why do we need realtime?": In my mind the benefit of a near-realtime replica is not that you actually often need it, but that it means you never have to ask the question of "Was this snapshot refreshed recently enough?", and never end up having to wait several hours for an enormous dump/load operation, when you realize you did need newer data.

[1] http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...


Hey! Always cool when the author responds :)

I do agree that PostgreSQL would be nicer for interactive queries. Waiting for an M/R job to spin up is a bit of a buzzkill.

With regards to your use cases, what sort of questions have you found yourself answering the most? Do you have analytics applications running off of this?


I agree with your point that PostgreSQL (or an RDBMS in general) is really good for certain types of reporting / analytics use cases, while Hadoop/Hive is awesome for handling billions of rows + TBs of data.

How was your overall experience with Impala? Did you guys have a fairly new Hive cluster to try it out on, or did you just spin up a new one, since Impala can only read certain file formats (i.e. no custom SerDe)?

Also, for Hive/Hadoop datasets, is that more for just data exploration, while this PostgreSQL solution is for smaller datasets which return in a few seconds and would not perform well in Hive due to the cost of setting up a MapReduce job?


Nice work. I spent much time last year building a system that imported MongoDB data into Oracle, having to do everything to speed up bulk loading new data into Oracle (and archiving old data). Something like this would have worked much better, and I suspect it might not be that hard to make this tool work with Oracle.


Out of curiosity, did you guys consider running SQL on MongoDB using Postgres foreign tables? What were the pros and cons of that approach for your use-cases?

(In full disclosure, I wrote mongo_fdw for PostgreSQL.)


I actually prototyped our PostgreSQL solution using mongo_fdw (incidentally, I threw together Debian packaging here, if you're interested: https://github.com/nelhage/mongo_fdw).

Our experience was that mongo_fdw doesn't (yet?) give postgres enough information and knobs to plan JOINs efficiently, which is one of the things we wanted. I got a decent amount of leverage out of using mongo_fdw and then cloning to native tables using SELECT INTO, though :)
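
Roughly like this (a sketch; the foreign table name is made up):

  -- mongo_customers is a foreign table backed by mongo_fdw;
  -- cloning it into a native table gives the planner real statistics to work with
  SELECT * INTO customers_local FROM mongo_customers;
  ANALYZE customers_local;
  -- JOINs against customers_local now plan like any other native table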


That's neat. :) For mongo_fdw, we're expecting better join performance once we upgrade to PostgreSQL 9.2 FDW APIs and start collecting table statistics.

I'm sure there are other factors involved for MoSQL, but they are probably outside the scope of this post. I'd love to chat about them offline.


That's your preference, but I've not often found an occasion where someone (a client/stakeholder) said, "Yeah, go ahead and give that to me slower rather than faster." It just never seems to happen.

I'm thinking of this as something like polyglot memoization. Pretty cool when you think about it. Frequently need something that is slow in NoSQL, but fast in SQL? Memoize it to your SQL datastore. The alternative has always been to write it to two places. I kind of dig moving this out to the datastore to figure out.

I'm thinking that plenty of people will find this useful.


That's why I'm curious what sort of questions they are answering with this tool. If the bulk of their questions are a variant of SELECT COUNT(DISTINCT user_id) FROM table, then yes, this would be convenient to have. But if their questions start to revolve around transaction cohorting or path analysis where there are potentially hundreds of millions to billions of transaction_ids with some gnarly JOINs thrown in for good measure, I would be surprised to see this scale.


I totally agree with this. I don't understand why you would want to translate MongoDB to SQL. What are the advantages of doing a full data dump? There's overhead converting from MongoDB to SQL, as the two formats are completely different. What about the edge cases where one format doesn't support the other?


They each have different advantages and strengths.

MongoDB is great for failover and for rapid development or prototyping. SQL is great for reporting or analytics, since you can do all kinds of aggregates and JOINs right in the database.

The edge cases where you can't represent the data perfectly aren't a huge deal for this use case -- because it's a one-way export, you don't have to be able to round-trip the data, and as long as you can export the data you want to run analysis on, it doesn't matter if there's some you can't get.


Reading the headline I thought they were introducing a SQL like interface to their API, sort of like FQL for Facebook and I got a little excited. Something like this to get the email addresses of all your active trial subscribers:

SELECT c.email FROM customers c, subscriptions s WHERE c.subscription_id = s.id AND s.status = 'active' AND s.trial_start IS NOT NULL;

(where of course the customer and subscription tables would be a virtual view on your customers and subscriptions)


Hm, that could be pretty cool actually. Especially if we also added a REPL for interactive queries at manage.stripe.com.


You're welcome Patrick. I'd recommend looking at Antlr4 to parse the "StripeSQL" commands.


Thanks for the Antlr reference; I've been meaning to learn more about it and you've pushed me to finally start!


Isn't there a postgres foreign data wrapper that lets you do this?


At what point do you abandon mongodb and just use postgresql?


At no point. Stripe is doing it right. They are using the right tool for each job: Mongo for storage speed, etc., and then Postgres to analyze and query.

This kind of comment shows how little knowledge you have about NoSQL and SQL. It's not SQL vs. NoSQL; it's about using the right technology for the job.


> This kind of comment shows how little knowledge you have about NoSQL and SQL.

The question is perfectly valid. In many scenarios (not necessarily Stripe's), PostgreSQL is fast enough to do the job. Stop putting people down for legitimate engineering questions.


>This kind of comment shows how little knowledge you have about NoSQL and SQL.

Try not to be condescending and your point will be better received. "Right technology," as I'm sure you're aware, has as much to do with subjectivity as appropriateness. Familiarity, workflow, ease of use (and did I mention familiarity?) cannot be overstated, even when the perceived benefits are considered.

Read: religion.

Some of the people who rally against NoSQL may be deriding it from a knee jerk reaction, however others are simply frustrated with developers who, as Ted Dziuba would say, "value technological purity over gettin' shit done".


Are you kidding me? There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company. Postgres is more than capable of sustaining the necessary speeds for a startup.

Relational databases were created in the first place to solve these very problems around transactionality and analytics for finance.

This library is a beautiful example of reinventing the wheel, and otherwise creating a patchwork of unnecessary - and ultimately brittle - infrastructure.


(I work at Stripe.)

Where we use MongoDB, it's not because of speed. PostgreSQL is certainly capable of fast performance. MongoDB is useful for its ability to log freeform data as well as for its replication model. (We use sharded MongoDB in a few places, but mostly use straight replica sets.)

We use MySQL, MongoDB, PostgreSQL, and Impala. They're all useful in different places.


Mongo's probably still got the edge as a JSON store overall, but definitely check out the new JSON object dereferencing functionality coming in 9.3. There's a Russian indexing posse consisting of Oleg, Teodor, and Sasha who have been looking at doing proper indexes for JSON but haven't managed to secure funding. (Disclosure: I think they should get funded.)

These are the same guys who built hstore, full text search, and the GIN and GiST index types, and I think they are working on a generic regular expression index type right now.


> "We use MySQL, MongoDB, PostgreSQL, and Impala."

Thanks for the clarification, but this makes it even more obvious your engineering team is introducing needless complexity into your organization.

Postgres can store unstructured data just fine, so you have a 'solution' that uses 3 OLTP stores instead of one.


PostgreSQL is awful for storing unstructured data. It has the most cumbersome, clunky syntax I've seen in a while, and it lacks ORM support, meaning you are forced to write it manually.

Making developers productive is an important aspect for choosing a database.


Choosing a data store based upon syntax and slightly limited ORM support isn't exactly a great idea. Both of these things can be improved rapidly with a little code.

More important questions are how is the data stored, how is it accessible, how can you scale the system, what operational constraints are there, how fast is it, what types of data modeling can be done, what consistency/transaction guarantees does it provide, etc. These are the things that will make developers productive because they will not be putting out fires all the time.


well said!


Why do you use MySQL over Postgres and vice versa?


(Clouderan here)

How are you liking Impala? We just dropped 0.5 release yesterday which includes the JDBC driver :D!

Edit: Awesome job on the Ruby client, it's great!


It's been great -- setup was a bit of work (we're on Ubuntu, so had to build from source), but once up and running it's allowed us to do lots of ad-hoc analysis that would have been too hard otherwise.

I've been meaning to write a MoSQL equivalent for our Impala data, but at the moment we're doing a more traditional ETL.


gdb - If you have Impala, Hadoop, and Hive right now, why use MongoDB instead of HBase and make it all work in happy harmony?


Awesome! Great to hear it's working out for you guys, looking forward to MoSQL for Impala :-)


We've been pretty happy so far. There have been a few rough edges getting it up and keeping it running, but we've been very impressed with the performance so far.

I've passed your comment on to Colin, who wrote the Ruby client -- I'm sure he'll appreciate it!


I got myself a little Impala Herd server setup, pointed it at my Impala cluster and it's working great ;).


heh, I didn't think anyone would actually use that - I originally wrote it meaning to use it as a tutorial for the blog post, then scrapped that idea.

Thanks for the kind words!


Everywhere I've worked that did high volume transaction processing had an architecture that required a piece like this. Even if you use a relational database for intake, you still need to move the data to another database for analytics. Moving the data automatically via replication sounds a lot better than the typical batch process running at 4am.


Tell this to FIS Global.

There is absolutely no reason to build a banking system on GT.M, but they did.

Although: GT.M is the only(?) NoSQL that is ACID-compliant.


> There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company

Yes there is. PostgreSQL doesn't support multi-master replication, which makes it a terrible choice if you really want to make sure every transaction gets written. I really wonder at what point the people that keep recommending PostgreSQL are going to wake up and realise what is happening in the industry.

People are scaling OUT not UP. Especially startups.


I'm sorry, postgres-xc doesn't work for your needs? [0] It has worked for me in the past.

[0] http://postgres-xc.sourceforge.net/


I would imagine that for your average startup, using solutions that don't even support transactionality will cause greater complexity issues. Especially given the enormous window before db scale out/up becomes an issue on well-designed applications.


Enormous window?

Many startups would be using AWS and it is not inconceivable that you would have Multi-AZ/Multi-Region VPSs. Scaling out != Expensive.


> People are scaling OUT not UP. Especially startups.

Startups need to scale out because many of them like to deploy on mediocre EC2 instances with the slowest SAN storage ever.

People that keep recommending PostgreSQL are rightfully ignoring this industry.


> Startups need to scale out because many of them like to deploy on mediocre EC2 instances.

No. They need to scale out because providers like AWS have outages, and so startups et al. need to deploy in multiple AZs/regions in order to have as close to 100% uptime as possible. You can't do that without a well-considered multi-master replication strategy, which PostgreSQL frankly doesn't have.

>People that keep recommending PostgreSQL are rightfully ignoring this industry.

Sure. And soon enough they will be relegated to the dustbins of history. The trends don't lie.


"The trends don't lie"

Wah. And you do not even seem to be ironic. Trends always lie; there is always a next thing that will take the opposite direction, in philosophy, in science, and particularly in computing.


In all fairness, you could use something other than Postgres that's also ACID.


The only advantage MongoDB has over Postgres is built-in sharding, and even that is of dubious value.


To pick one, we like the fact that MongoDB lets you change your schema and add new fields to your documents without having to worry about migrations or keeping track of schema versions, or any of that.

You could build something like that on top of SQL, but it's nice to have a tool where you don't have to.


Serious question to you or anyone else who uses schemaless databases: why is the ability to change schemas on the fly a good thing? Having worked at two companies that used them, I found it was nothing but a recipe for disaster in large groups. Code that expected an integer or a string and not a collection would constantly break because a developer in some other group decided to store a collection instead of the original data type that was expected. Schemaless databases required more documentation to track changes made between groups and led to more bugs, because we could never be sure what kind of data we would be receiving. I've always thought of a database schema as a contract that makes guarantees to all applications. Why would you want to be able to break that contract?


There's no such thing as a "schemaless" database. There are, however, different ways of handling the storage and management of the schema.

In the situations you describe, and when using most NoSQL databases, there's still a schema. It's just stored in the minds of developers, in documentation that's correct and up-to-date, in documentation that is incorrect and outdated, throughout application code, and numerous other places.

Then there's the sensible approach taken by most relational database systems, where the schema is centralized, it is described with some degree of rigor, and it can be more safely modified and managed.


I've found that a good SQL library (like SQLAlchemy), a good migration library (like Alembic), and a DB with non-blocking migrations are much nicer to use, since they make data migrations very easy.


How does MoSQL handle schema changes and new fields in mongo?

I imagine that with this tool you start to need to be a bit more careful with the flexibility that initially drew you to Mongo.


MoSQL will just throw any fields it doesn't recognize into a JSON "extra_props" field (if you ask it to). So everything will work fine, and existing SQL code (which doesn't know about those fields) will continue to be fine.

If you need the data in SQL, you can either parse the JSON somehow, or rebuild the SQL table with a MoSQL schema that knows about the new fields.
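
For example, if a new field (say, a hypothetical coupon_code) ends up in extra_props, you could pull it out ad hoc with a JSON-aware function (the dereferencing operators coming in 9.3, or a plv8 helper) until you rebuild the table. A sketch, assuming extra_props lands as a text column:

  SELECT (extra_props::json) ->> 'coupon_code' FROM customers;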


Automatic failover is a pretty big feature though. I wish Postgres had a built-in solution. Sure, I could use Pacemaker, but it's nowhere near as painless.


You should be aware, however, that Mongo's failover incurs downtime.

A Postgres bouncer + WAL replication achieves a similar result: There is no downtime on failover, but there is a single slave.


10gen also has a nice Python app which syncs to an external source by tailing the MongoDB oplog. The most common target is Solr.

https://github.com/10gen-labs/mongo-connector/tree/master/mo...

Seems to be high quality, and supports replica sets.


Very neat project. I can see several use cases for this where I work. It'd be nice to have alternative means of searching through data.

I'd also like to mention a project I've been contributing to, Mongolike

[My fork is at https://github.com/e1ven/mongolike , once it's merged upstream, that version will be the preferred one ;) ]

It implements mongo-like structures on TOP of Postgres. This has allowed me to support both Mongo and Postgres for a project I'm working on.


Nice. Real businesses need a data warehouse and SQL is the right tool for that job.

I thank them for releasing this.


Maybe I'm misunderstanding your comment but... Real businesses need real solutions for their use cases. SQL is not necessarily the right tool for "that" job.


If such a "solution" involves safely querying, analyzing, storing and manipulating data in any way, SQL and relational databases are usually the best option in practice.

It's much more effective and efficient to use a SQL query than it is to throw together a huge amount of imperative JavaScript code (that's usually very specific to a single NoSQL database, as well) merely to perform the equivalent query.

It's much safer to use a database that offers true support for transactions and constraints, rather than trying to hack together that functionality in some Ruby or PHP data layer code, or relying on some vague promise of "eventual consistency", for instance.

It's much more maintainable, and leads to higher-quality data, to spend some time thinking about a schema, rather than just arbitrarily throwing data into a schema-less system, and then having to deal with the lack of a schema throughout any application code that's ever written.

Aside from an extremely small and limited handful of situations (Google and Facebook, for instance), relational databases are the best tool for the job.


> Real businesses need a data warehouse and SQL is the right tool for that job.

Honestly. I don't think you could be more misinformed if you tried.

Hint: Google "Big Data".


...data warehouses in general mostly use SQL, and lots of businesses use data warehouses successfully. Teradata, Netezza, Oracle, DB2, etc. I'm not sure why his statement was controversial - SQL's a great language for reporting and analytics.


I've had to deal with a lot of NoSQL advocates whose experience with SQL or relational databases doesn't extend beyond MySQL.

Of course, it's understandable why they have a bad impression of SQL; they've only ever used one of the most inept implementations around.

Those who are willing to try one of the more mature and sensible relational database systems usually see quite quickly the value that such systems provide.


Beyond a few terabytes of data, Postgres is just as worthless as MySQL, and every other non-experimental SQL option comes with a "call us" price tag.

If there are production-ready options for biggish data other than NoSQL or high priced commercial analytics dbs, please share...


Considering the space requirements of a DB with a properly designed schema, I have to wonder what on earth you can do that generates beyond a few terabytes of data, and how you can make any sense out of it.

Also, your comment is rather ambiguous; I certainly hope you're not calling PostgreSQL experimental, because that would be laughable, and there are several examples of multi-terabyte databases using it.


What do you suggest?


Not the OP but I'd say giving Postgres a try is a suggestion.


>Real businesses need a data warehouse and SQL is the right tool for that job.

SQL is NOT always the right tool for the job.

There are plenty of situations where a Hadoop or a Storm/S4 approach works better. Again, it's about picking the right technology for the task at hand.


SQL is an excellent tool for a data warehouse, or any situation where there is value in separating data design from application design. In such situations the infamous o/r impedance mismatch is arguably a good thing.



I thought that the "young" NoSQLs would get a SQL interface at some point.

Look at the old NoSQLs: InterSystems Caché got a SQL interface, and GT.M (in the PIP framework) also got SQL.

My impression is that MongoDB looks a lot like MUMPS storage, with globals in JSON.


Is there currently support for "unrolling" arrays or hashes into tables of their own? If not, I'd definitely be interested in helping to add that (we use arrays on documents quite a bit, but have run into a number of situations where a simple SQL query for analysis could have quickly replaced a bunch of mongo scripts).


I've added that capability to mongo_fdw, which I use for getmetrica.com. I'll be contributing it back soon (after that 9.2 API conversion). Would be happy to talk to you about the wrapper or Metrica. Email's in my profile.


FYI, the email field from the profile doesn't actually get displayed publicly. Mine is (username) @gmail.com


Whoops. Okay, emailing.


There isn't support. It's definitely something I've pondered. If you're interested in adding support, I'd be happy to hear from you at (my username) AT stripe.com.


Email sent!


If you need to make a tool (and use twice the amount of storage) to be able to "query your data" in a SQL manner while using NoSQL, it probably means you are using the wrong tool for the job.


Actually, it is pretty common to replicate transactional data into another data store for analytical purposes. However, using PostgreSQL as the OLAP data store may not be the wisest move.


Author of MoSQL, did you consider just using the MongoDB FDW instead? https://github.com/citusdata/mongo_fdw


(I wrote MoSQL)

I actually played with mongo_fdw. At this point, it's a really cute hack, and useful for some things, but it doesn't give Postgres enough information and knobs to really let the query planner work effectively, so it ends up being really slow for complex things. I do love the concept, though.


What were your thoughts on MongoConnector? (https://github.com/10gen-labs/mongo-connector/tree/master/mo...)


I love this idea. I can see myself using MoSQL pretty soon. Does it handle geospatial data? Can it replicate geospatial data from Mongo to a Geometry data type in Postgres?


Out of curiosity, what is the rest of Stripe's stack like? Ruby, apparently, but I'm assuming they don't use any kind of Mongo ORM at all.


Someone should write a client library so you can do ad hoc data aggregation queries without using SQL. You can call it NoMoSQL :)


Also useful when MongoDB blows chunks because it was a crap architectural decision and you quickly port your app to raw SQL...


Can't wait for NoMoSQL


Waiting for BroSQL.


how do you deal with sharded mongo clusters?


(disclosure: I'm one of the founders at Citus Data)

hey, one way to do that is to use the MongoDB foreign data wrapper - also mentioned in some of the earlier threads.

mongo_fdw (https://github.com/citusdata/mongo_fdw) allows you to run SQL on MongoDB on a single node. Citus Data allows you to parallelize your SQL queries across multiple nodes (in this case, multiple MongoDB instances) by just syncing shard metadata. So you would effectively run SQL on a sharded mongo cluster without moving the data anywhere else.

another idea could be to use MoSQL to neatly replicate each mongo instance to a separate PostgreSQL instance, and then use Citus Data to run distributed SQL queries across the resulting PostgreSQL cluster.


MongoDB is great for a lot of reasons - record-level locking? multiple concurrent writes? append-only journals?

I have read that in version 2.x they announced some of these features, so is it greatness now?



