Citus 6.1 Released – Horizontally scale your Postgres database (citusdata.com)
194 points by gdb on Feb 16, 2017 | hide | past | favorite | 38 comments



Are the commands in the examples the actual SQL commands for sharding, and is the default to have them in the public namespace?

Ex:

    CREATE TABLE states (...)
    -- distribute it to all workers
    SELECT create_reference_table('states');

    SELECT isolate_tenant_to_new_shard('table_name', tenant_id);
While isolate_tenant_to_new_shard() doesn't seem like it'd clash with anything, create_reference_table() seems common enough to exist in someone's code. Why not have this in a citus schema by default?

> In Citus 6.1 Vacuum is now distributed and run in parallel automatically across your cluster.

Did VACUUM prior to 6.1 require one node in the cluster to issue locks on other nodes? If so, what for? I'm not intimately familiar with how the nodes in the cluster communicate, but I would have figured each acts as a standalone DB, coordinating common updates via 2PC (which would mean VACUUM was already distributed).


Fair point on creating these functions in a Citus schema. The reason for this is mostly historical. We started with pg_catalog, haven't run into issues yet, and therefore didn't prioritize changing this [1].

On the VACUUM table side, when the user ran this command, we previously didn't propagate vacuum to the related shards. With this release, we run this command in parallel across worker nodes. (https://github.com/citusdata/citus/issues/719)
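Concretely (table name hypothetical), that means a plain command issued on the coordinator now fans out:

```sql
-- Issued on the coordinator; as of Citus 6.1 it is propagated to the
-- shard placements of the distributed table and run in parallel:
VACUUM ANALYZE events;
```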

[1] Citus' user-defined functions aren't in the public schema, but rather in pg_catalog. If a user defined their own create_reference_table(), that definition would override the Citus definition. You'd then need to call the Citus function by fully qualifying its name, or by changing your search_path.

As a footnote, if a user defines a create_reference_table() function, it would only override the Citus function if the two functions have the same signature: create_reference_table(table_name regclass).
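To illustrate the footnote (a sketch only): since the Citus functions live in pg_catalog, a shadowed call can be disambiguated by schema-qualifying it:

```sql
-- Explicitly call the Citus version even if a same-signature
-- create_reference_table() exists elsewhere on the search_path:
SELECT pg_catalog.create_reference_table('states');
```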

(edited)


I just started learning Citus, it is a very promising product.

A couple of things for improvement (some may already be supported):

1. New User records do not propagate to all the nodes - Manual Step.

2. New Database records do not propagate to all the nodes - Manual Step.

3. Materialized View with incremental refresh - Currently Postgres re-runs the full query every time there is a data change - for a table of a billion records this is very inefficient.


> 1. New User records do not propagate to all the nodes - Manual Step.

> 2. New Database records do not propagate to all the nodes - Manual Step.

The issue here is that databases, users, and some other objects aren't "database local" but "cluster wide" objects (i.e. visible in all the databases of a PostgreSQL installation). As the Citus extension isn't necessarily created in all of them, we can't reliably do anything about this...


Automatic failover for master nodes would be a really nice feature making citus a hassle-free solution.


Hi, Citus Cloud, our managed service, does have automatic failover available for both the master and the distributed nodes.


I know, but I was speaking of the community or enterprise edition. Managed services are sometimes not allowed for legal reasons (data privacy).


Can any users of Citus comment on the difficulty/ease of adoption, tradeoffs, and benefits?


Another way of asking: Citus seems amazing. What's the worst part of using it (other than cost)?


We've been running on Citus for about 2 months now, with a saas app that has lots of reporting functionality. It's no different than standard postgres in terms of configuration, other than that you have to define a distribution column for the tables you want to be distributed into tenants (e.g. accounts for saas). That's how you really see a significant benefit when working with larger datasets. That and the extra memory they provide over other managed postgres services. For us, "distributing our tables" meant just adding a rails migration using their gem.
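For anyone curious, "distributing a table" boils down to a call like the following (table and column names are hypothetical; create_distributed_table is the Citus UDF, presumably what the gem's migration generates):

```sql
-- Shard the table across workers by its tenant column:
SELECT create_distributed_table('reports', 'account_id');
```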

We were upgrading from heroku postgres, and chose this over RDS, because they take care of the distribution of data into tenants without us having to change our logic, since everything our users create was already getting associated to their organization / account.

The only issue we've had was when we hit our plan's database connection limit of 300 for a few minutes, because of a new long-running query that started locking up connections and causing timeouts as soon as it went to production. We reached out immediately and their team jumped all over it to help us free up those connections while we fixed the query.


Thanks, that's super helpful!


(Ozgun from Citus Data)

I think the answer to your question depends on the workload you're planning to use Citus for. We find that users rely on Postgres for different use cases and Citus' benefits (and worst parts :)) vary across them.

Common use cases, ordered by popularity & benefits, include:

1. Multi-tenant database: Most B2B applications already have the notion of a tenant / customer built into their data model. Citus allows you to scale your transactional relational database to 100K+ tenants with minimal changes to your application -- and with all the benefits of Postgres.

2. Operational analytics: Citus enables ingesting large volumes of data and running analytical queries on that data in human real-time. Example applications include analytic dashboards with sub-second response times.

3. NoSQL++: As we increased Citus' feature set, we found that users started relying on it for workloads that have high read / write throughputs. We have a few "hidden" features in v6.1 that make Citus even more powerful here. That said, sometimes the open source community is way ahead of us on this. :)

4. Other: These include anything from distributed transactions to real-time notifications. Since we extend Postgres, if you're a power user and understand Citus' underlying primitives, sky's the limit. (We debated adding a section to our documentation for other workloads. We just couldn't agree if we wanted to name it Top Gun or Highway to the Danger Zone.)

Our goal in each release is to add new features that make Citus stronger in use cases where it's already strong, as well as to listen to our users & customers and expand into areas where we already see good adoption.


Ozgun, any performance benefits for time-series data?

+1 for Top Gun


We've been using Citus for almost a year, 8 months in production. Currently running 22 workers [32 core, 3TB space, 120GB RAM] containing almost a trillion (yep) rows in one single table.

Citus has a lot of quirks and one could write a book about it. But for brevity, here are a couple of the top things you should know.

- Citus is not PG. It runs on PG, it looks like PG, but it doesn't behave like PG. By using Citus you lose the majority of advanced strategies, especially ones that manipulate data (UPDATE from SELECT, DELETE, etc.). In short, you have to forget much of what you know about PG.

This means that you have to rethink your table schemas, distribution strategies, etc. This is common in every distributed database (No/SQL).

- Citus deployed in the cloud is a bit sensitive. As the master node keeps track of each worker in an almost realtime fashion, any break in communication results in corrupted shards. If you don't use just one user (postgres) but have different rights for different users, fixing the shards can be a pain (unless you pay for the enterprise version, which provides you with helpful scripts).

- Adding a new worker is easy, but rebalancing the shards to spread them evenly is not (again, different for the enterprise version).

- PG's planner can be a bit stubborn when using Citus. It sometimes behaves differently from native PG. The result is that you have to rethink your strategies and rewrite queries until you get the desired result.

- Writing functions and automatic triggers can be challenging. Their usefulness is determined by how well you designed your original tables. Once the tables are colocated and sharded on the same column, you get impressive results. Writing the functions can be challenging though, as you have to deploy the function to each shard. They offer a script to help you with it: https://github.com/citusdata/citus/blob/a2f6b29a6d4537b626f8...

- You do want to consider how many shards you need. More shards automatically means that Citus gets slower, but you also want room to grow, so choose more shards than you have workers.
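As a sketch (names and values hypothetical), the shard count is typically chosen up front via the citus.shard_count setting, before the table is distributed:

```sql
-- More shards than workers leaves headroom to scale out later:
SET citus.shard_count = 64;
SELECT create_distributed_table('events', 'device_id');
```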

There is much more, but those are the things you should consider. As for us, we learned to live with the quirks and now we're able to run aggregation queries over billions of rows in under 600ms.

I would suggest considering Citus only if you're also considering something like Cassandra or HBase.


> Citus deployed in the cloud is a bit sensitive. As the master node keeps track of each worker in an almost realtime fashion, any break in communication results in corrupted shards. If you don't use just one user (postgres) but have different rights for different users, fixing the shards can be a pain (unless you pay for the enterprise version, which provides you with helpful scripts).

Citus supports two replication models: statement-based replication and streaming replication. [1]

In statement-based replication the master directly writes to several replicas. If writing to a particular replica fails due to a network issue, then to maintain both availability and read-your-writes consistency, the master marks that replica as inactive to be repaired at a later time.

In streaming replication there are a primary and a secondary worker. The master only writes to the primary, which replicates its changes to the secondary. If writing to the primary fails, the transaction rolls back and no repair is necessary. If the primary goes down, it can be replaced by the secondary. In this model the master doesn't have to keep track of any state or perform any special locking, which is also what enables Citus MX [2].

Citus Cloud exclusively uses streaming replication with automated failover, meaning shards never have to be repaired.

[1] https://www.citusdata.com/blog/2016/12/15/citus-replication-...

[2] https://www.citusdata.com/blog/2016/09/22/announcing-citus-m...


I'm aware of streaming replication. But the cost doubles immediately, which in our case represents tens of thousands of dollars every month.

Now this is a tradeoff: do I mind fixing some shards from time to time, or do I burn the salaries of 6 engineers for the extra convenience?


Are you using the commercial distribution or the OSS one?

If the OSS release, how have you implemented rebalancing? It seems to me the APIs are there in the extension; just the user-defined functions to actually do the rebalance are missing.

I have been looking at integrating Citus into Flynn as an option for more scale-out workloads but needing non-OSS features to recover from workers going missing or adding new workers put me off implementing it for now.


We're still stubbornly using the OSS version. I wrote my own scripts that help with the rebalancing but it's a pain to say the least.


Since the OSS version is AGPL, where is your code? Would be interesting to see the full use case.


I did not modify anything Citus related, just wrote scripts that are using their API.

Also the code is messy (as everything built for an in-house use).


It is AGPL, not GPL, so code using the API must be AGPL as well. That's the difference.


Oh God, please don't quote stackoverflow for legal opinion. An interesting confirmation bias problem. :/

But no copy/paste without attribution, which is the usual thing (and a reason why some companies block SO). So that's good! :)

Love what Citus is doing as well! And looking into the product now, talking with the team, and considering purchasing.


That is an interesting point. If it's AGPL and you are calling user-defined functions from your application that are implemented in the AGPL PostgreSQL extension, is your codebase affected by the virality clause?

Or would that only count when you write new UDFs that call into the extension? Or are neither affected?

Would really love to know because that might affect my willingness to spend time making automated Citus stuff a reality.


Disclaimer: I work for Citus, but am not a lawyer, nor an executive.

There's some additional information at https://www.citusdata.com/faq under "How do I license Citus Data Products?".

I hope that clarifies the intent of using AGPL somewhat?


I don't think it does.

> Require users who choose to modify Citus to fit their needs, to release the patches to the software development community.

Do you consider a script that uses your API a patch? In my opinion there is no modification to Citus itself; it's used as intended, and the only change is how it's used, not what it does. But it's not clear to me.


Based on this [0] the code doesn't have to be shared. Also would not like to publish code that goes against commercial interest of the Citus team, which I like quite a lot.

[0] http://softwareengineering.stackexchange.com/a/314908


If anyone is looking at the enterprise version then it's probably best to try memsql.com too, which is a much more polished product with a mysql dialect.


I only have about a month of experience, but the real issue is you have to do a lot of things manually that are done automatically in other systems like Cassandra.

Particularly rollups and stream-like things. That is, you have to do the rollups (as well as stream-like things) through Postgres rules, triggers, LISTEN/NOTIFY, cron, etc. With systems like Druid (Cassandra), Elasticsearch and other time-series databases this is not needed (either querying is efficient enough because of compression, or the data is automatically indexed, or both).

For the stream-like stuff we have been using PipelineDB. PipelineDB is annoying though in that it is a fork and not an extension. It has a similar problem: you have to create the streams a priori.

So for both Citus and PipelineDB you have to do some planning of your schema, whereas with the other guys you can do more exploratory/ad hoc analysis and queries. The disadvantage of the NoSQL guys (besides maturity and literally no SQL) is that you pay a lot more in memory, as most of them (IMO) cheat and rely heavily on available memory. Postgres is very memory friendly.
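For context, a minimal sketch of the cron-driven rollup pattern mentioned above, in plain Postgres (table and column names are made up; events_hourly needs a unique constraint on (hour, device_id) for the upsert to work):

```sql
-- Run periodically (e.g. from cron) to fold raw events into a rollup table:
INSERT INTO events_hourly (hour, device_id, event_count)
SELECT date_trunc('hour', created_at), device_id, count(*)
FROM events
WHERE created_at >= now() - interval '1 hour'
GROUP BY 1, 2
ON CONFLICT (hour, device_id)
DO UPDATE SET event_count = events_hourly.event_count + EXCLUDED.event_count;
```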


> Pipelinedb is annoying though in that it is a fork and not an extension.

Hi, I'm one of the PipelineDB co-founders--thanks for using our product! Making PipelineDB an extension is the most consistent piece of feedback we've received from our users, and I promise we're listening: PipelineDB 1.0 will be a standard PostgreSQL extension, incrementally rolled out via versions 0.9.8, 0.9.9, and 1.0.0.


Hmm, interesting. Sounds like a problem with Postgres, not Citus, but it still sounds like a disadvantage compared to competing products (RethinkDB also comes to mind).

I haven't implemented anything for that before personally; would a tool like Bottled Water [0] be helpful for this use case?

[0] https://github.com/confluentinc/bottledwater-pg


Yes, Postgres has disadvantages (I accidentally posted my comment too soon, so that might have caused some confusion), but like I said it is: 1. mature, 2. SQL is damn powerful, 3. memory friendly (aka cost).

RethinkDB is cool; I just never got into it enough to know how well it actually works. As for Bottled Water, we use Postgres + AMQP (aka Rabbit) + LISTEN/NOTIFY. I'm not really a big fan of Kafka, even though it does have amazing scaling ability.


Have you used PostgreSQL before using Citus? If yes, could you share any pain points compared to plain PG (joins, write performance, lack of full SQL support? anything that comes to mind).


Memory is there to be used. 99% CPU and RAM usage is perfect utilization.


Well, sadly LISTEN/NOTIFY on PG does not work with streaming replication. That would make it really, really great though.


They also have a community edition, which also seems pretty powerful. https://www.citusdata.com/product/community

So cost is maybe not a big drawback either.


> Microservices and NoSQL get a lot of hype, but in many cases what you really want is a relational database that simply works, and can easily scale as your application data grows.

I believe this product is very threatened right now by Google's newly announced Cloud Spanner database[1]. Even with services like this that make scaling "easy", it doesn't yet make it transparent like Google is striving to do.

[1] https://cloudplatform.googleblog.com/2017/02/introducing-Clo...


Craig here from Citus. Spanner is absolutely interesting technology, but it focuses on some pretty different use cases than the ones we're looking to solve. Spanner is globally distributed, but the cost of that is likely to be ~100ms per operation. Further, it uses SQL for reads, but inserts and updates go through RPC, so it's a bit of a different interface.

Citus is much more focused on interacting like a relational database that simply scales out as opposed to re-inventing a lot of the underlying tech and interfaces. You can find the comment further down from one of our founders, but we're very focused on a few use cases.

1. Scaling beyond single node Postgres (this primarily for B2B applications currently)

2. Operational analytics: because of the way we parallelize the workload, you can get sub-second query performance across terabytes of data.

3. NoSQL++: this is a bit more akin to where Spanner may fit, where you have exceptionally high read and write throughput requirements (exceptionally high meaning north of 100k writes per second).


Not really, for several reasons.

One is the fact that Spanner is a purely hosted service. That rules it out for a lot of folks, and for those shops where it's fine for production, it's inconvenient for developers. Your app is no longer something you can run completely on your laptop; and if you ever want to be able to develop offline, you have to architect your app to abstract the data layer so you can swap in something like Postgres.

Spanner, as a relational database, also has some important limitations. For example, while nominally ACID, the "C" in ACID has historically included referential integrity, aka foreign keys, which Spanner does not currently do (presumably because doing this in a transaction would require locking the referenced row). Citus can use foreign keys between colocated tables, which enables certain apps where referential integrity is an absolute must (e.g. anything in finance).
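As a sketch of what those colocated foreign keys look like (hypothetical schema; colocate_with tells Citus to place matching shards on the same workers):

```sql
SELECT create_distributed_table('accounts', 'id');
SELECT create_distributed_table('invoices', 'account_id',
                                colocate_with => 'accounts');
-- Both tables are sharded on the account id, so this FK check
-- never has to cross a shard boundary:
ALTER TABLE invoices
  ADD CONSTRAINT fk_account FOREIGN KEY (account_id) REFERENCES accounts (id);
```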

If you're used to functional indexes, partial indexes, PostGIS, fulltext, GIN (array indexing, very useful), the superbly rich SQL dialect (COPY FROM/TO, SELECT INTO, CTEs, window functions, LISTEN/NOTIFY, timezone-aware dates), triggers, etc., then migrating to Spanner potentially means giving a lot of that up. For example, the lack of fulltext support might mean resorting to something else, like Elasticsearch, to supplement the database; now you have a rather different system that requires some kind of stateful firehose for indexing, instead of a single transactional RDBMS.

There are pros and cons to everything. Spanner looks very impressive, but it's not for all apps/shops.



