Wait. Why? This sounds hard if you're used to the giant DBs of old, but you can probably run many, many instances of smaller databases without much trouble.
Would still be some maintenance, don't get me wrong. But far from impossible.
Having worked at shops that used this architecture it's really not that bad. Can you write the code to do one schema migration? Great, now you can do 1000. App server boots and runs the schema migrations, drops privs and launches the app. Now you've staved off your scaling issues from "how to have a db large enough to hold all our customer data" to "how to have a db large enough to hold our biggest customer's data." Much easier.
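Roughly what that boot-time loop looks like, if it helps - the tenant DSN list, the migrations directory, and the psycopg2 driver here are just stand-ins, not anyone's actual setup:

```python
import glob
import psycopg2  # assuming PostgreSQL; any driver with transactions works

def migrate_tenant(dsn: str) -> None:
    """Apply any migration files this tenant's database hasn't seen yet."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS schema_migrations (filename text PRIMARY KEY)")
        for path in sorted(glob.glob("migrations/*.sql")):
            cur.execute("SELECT 1 FROM schema_migrations WHERE filename = %s", (path,))
            if cur.fetchone():
                continue  # already applied on this tenant
            cur.execute(open(path).read())
            cur.execute("INSERT INTO schema_migrations (filename) VALUES (%s)", (path,))

def migrate_all(tenant_dsns: list[str]) -> None:
    # The same code runs whether there is 1 tenant or 1000; per-tenant failures
    # can be collected and retried instead of aborting the whole boot.
    for dsn in tenant_dsns:
        migrate_tenant(dsn)
```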
One of the many reasons to put good constraints on fields and use referential integrity! If you don't let the database enforce data validity you are gonna get fucked at some point!
source: every single place I've worked at that poo-poos referential integrity has a database that is full of bullshit that "the application code" never cleaned up
Always use referential integrity. The people who are against it almost always are against it for superstitious reasons (e.g. "it makes things slow" or "only one codebase calls it so the code can enforce the integrity"). All it takes is exactly one bug in the application code to corrupt the whole damn thing. And that bug will happen over the lifetime of the product regardless of how "good" or "awesome" the programmers think they are....
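A tiny self-contained illustration of the point (SQLite only because it runs anywhere; the tables are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        title TEXT NOT NULL CHECK (length(title) > 0)
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")

# The "exactly one bug in the application code": a ticket for a customer that
# was never created. With the constraint, the database rejects it instead of
# silently accumulating orphaned rows for someone to find years later.
try:
    conn.execute("INSERT INTO tickets (customer_id, title) VALUES (42, 'oops')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # FOREIGN KEY constraint failed
```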
That's one thing, yes. What if there's a transient network error, or the DB runs out of memory, and now you have some data in an old state and some in a new state?
You're lecturing about table design. I'm talking about more general transactionality over any errors.
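For the record, that partial-state failure mode is exactly what transactions exist for - a rough sketch (SQLite for brevity, with the mid-update failure simulated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    INSERT INTO accounts VALUES (1, 100), (2, 0);
""")

def transfer(conn, amount, fail_midway=False):
    # Both updates commit together or not at all: a crash, network blip, or
    # OOM between them rolls the whole thing back on the database side.
    with conn:  # sqlite3 connections commit on success, roll back on exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        if fail_midway:
            raise RuntimeError("simulated transient error")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))

try:
    transfer(conn, 50, fail_midway=True)
except RuntimeError:
    pass

print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(100,), (0,)] -- no money left half-moved
```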
You'll quickly run into limits on how many TCP connections you can hold open. Unless you also want to run separate app servers for each customer, which will cost a lot of $$$
Oh, and just forget about allowing your customers to share their data with each other, which most enterprises want in one way or another.
Wait. What? None of the enterprise customers want to share data with each other. And definitely not on a DB level. That should happen in the business logic.
Lots of companies have consultants, and want to be able to share their consulting-related tickets with their consultants. And the consultants want one system they can log into and see the tickets from all of the companies that are hiring them.
It would be a nightmarish scenario if you have thousands of customers. And completely unnecessary. You can create multiple databases and/or schemas in a single instance.
Don't do any of the above unless you understand the implications.
I worked at a company that architected their multi-tenancy in almost exactly this style. In their particular case, only a few of the very largest customers had their database set aside on their own dedicated instance, but every customer did have their own DB with their own set of tables. Having worked in that world (every customer had their own DB) and on a product where all customers had their data intermingled in one gigantic set of tables in one giant DB on one logical instance, I'd definitely encourage the "every customer gets their own DB".
Giving every customer their own database means you're going to need database administrators. These folks' dedicated job was maintaining, operating, and changing their fleet of databases, and they were very technical and amazing to work with.
This is the case. I won't comment on your "hundreds of thousands" figure because the number of Cloud customers was a closely guarded secret at least when I worked there, but yes one DB per tenant, dozens to hundreds of DBs per server, and some complicated shuffling of tenant DBs when you run into noisy neighbours.
To be honest I'm at a bit of a loss too. My speculation is that since they went all-in on microservices and utilizing various AWS services (something that was underway when I worked there) their data stores have become very much more disparate.
For example, they have the main PostgreSQL data store. Surely that's easy to restore. But the users in that DB have a "foreign key" (in a logical sense, not physical) to the Identity service. This is a real life example that occurred while I was there. So now we have a mixture of multi and single tenancy. So perhaps the identity records are also tied to this app ID and deletes were propagated to that service. And perhaps there is an SQS queue and a serverless function to handle, say, outgoing mail from Jira. Where does this data go? I dunno maybe some Go-powered microservice with its own DocumentDB store. Do deletes propagate here too? Who knows. You can see how this gets complicated and how issues multiply with more services.
Again, this is only speculation. But "decomposing the monolith" was a big deal and it was coming from the top.
If they had multi-tenant databases for SaaS it would mean either the self-hosted Jira instances also had the same multi-tenant database schema or they'd have to maintain two almost entirely different data access layers for cloud vs. on-prem. Since their cloud offering came from a historically on-prem codebase, I would expect the easiest way to offer cloud stuff is to do a DB per tenant. Otherwise there would be a shit-ton of new code that only applies for cloud stuff....
Not quite the same but at Fandom (Wikia), every wiki has its own DB (over 300,000 wikis), and they are clustered across a bunch of servers (usually balanced by traffic). It works well - but we don't ever really need to query across databases. There's a bunch of logic around instance/db selection but that's about as complex as it gets.
Interesting architecture. From a design point of view, I like the idea of full isolation. From an infrastructure point of view I'm a little scared. I'd assume it's actually not that bad and there's a good way to manage the individual DBs and scale them individually.
Really interested if you can share any details.
Edit: I know each wiki is on a subdomain. Does each wiki also have its own server?
There are _many_ databases on each server; last I checked there were around 8 servers (or: "clusters") - and we have it so the traffic is somewhat evenly distributed across each server. There are reasonable capacity limits, and when servers get full we spin up a new one and start accepting new wikis there. I am not in OPS, and they do a lot of work behind the scenes to make this all run smoothly - but from an eng perspective we rarely have issues with this at scale.
Some of this was open source before we unified all of our wiki products; a lot of the selection / db logic is in https://github.com/Wikia/app.
It doesn't change often, if we do we just have large automated rollout plans - but we've done mass changes enough times there are good procedures around large DB migrations.
There are basically two options for multi-tenancy with their own tradeoffs.
1. An account/tenant_id field for each table
2. A schema for each tenant wrapping all of the tables
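To make the two options concrete, something like this (PostgreSQL flavour; the table and column names are invented):

```python
import psycopg2
from psycopg2 import sql

# Option 1: shared tables, every row tagged with the tenant it belongs to.
OPTION_1_DDL = """
CREATE TABLE tenants (id bigint PRIMARY KEY, name text NOT NULL);
CREATE TABLE tickets (
    tenant_id bigint NOT NULL REFERENCES tenants(id),
    id        bigint NOT NULL,
    title     text   NOT NULL,
    PRIMARY KEY (tenant_id, id)  -- tenant is part of the key, so lookups stay scoped
);
"""

# Option 2: one schema per tenant, with an identical set of tables inside each.
def create_tenant_schema(conn, tenant: str) -> None:
    with conn, conn.cursor() as cur:
        cur.execute(sql.SQL("CREATE SCHEMA {}").format(sql.Identifier(tenant)))
        cur.execute(sql.SQL(
            "CREATE TABLE {}.tickets (id bigint PRIMARY KEY, title text NOT NULL)"
        ).format(sql.Identifier(tenant)))
```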
Option 2 gives you cleaner separation but complicates your deployment process, because now you have to run every database change across every schema every time you deploy. It gets messier while the code is actually deploying: the code can get out of sync with the schemas, and a rollback or an error mid-deploy (say, due to an issue with one tenant's data) can leave some schemas migrated and others not.
The benefit of the approach is that you can have different backup policies for different customers, moving specific customers to specific instances is easier, and you avoid the extra index on tenant_id in every table.
Option 1 is significantly easier to shard out horizontally and simplifies the database change process, but you lose space on the extra indexes. Plus in many databases you can partition on the tenant_id.
Most people typically end up with option 1 after dealing with or reading horror stories about the operational complexity of option 2.
The secret bomb in option 1 is that you generally need smarter primary keys that fully embrace multi-tenancy, and while Atlassian hires smart folks who I'm sure at some level know this, that's a relatively hard thing to retrofit into an existing system.
The second problem is mitigated by the fact that schemas are trivially migratable between database servers. Once you grow too big for one cluster just make another.
> Is it not a good idea to spin up separate db instances for each client/company?
It depends, really. There is a trade-off in terms of software and operational complexity vs scalability/perf and isolation. And probably a bunch of other factors.
If you have separate databases for each customer, schema migrations can be staged over time. But that means your software backend needs to be able to work with different schemas concurrently. You can also benefit from resilience and isolation guarantees provided by the dbms. On the other hand, having a dbms manage lots of databases can affect perf. Linking between databases can be a minefield, especially w/r/t foreign keys and distributed transactions.
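A rough sketch of what "work with different schemas concurrently" tends to mean in practice - the version numbers and column names here are invented:

```python
def load_user_display_name(conn, user_id):
    # Each tenant database records which migration it is on; code branches on
    # that until every tenant has been rolled forward, then the old path dies.
    version = conn.execute("SELECT max(version) FROM schema_migrations").fetchone()[0]
    if version >= 42:  # 42 = the (hypothetical) migration splitting name into first/last
        first, last = conn.execute(
            "SELECT first_name, last_name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return f"{first} {last}"
    return conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()[0]
```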
I have built multiple multi-tenancy platforms and I never create separate databases for each customer. If you have separate databases, it's almost impossible to run meaningful queries across all of them. That architectural choice creates far more headaches than it solves. Usually people end up with the split-database architecture when they want a quick retrofit for a system that wasn't designed with multiple tenants.
I've also had to restore partial data from backups on a few occasions when customers fat-fingered some data and asked pretty-please to undo. If someone on staff understands the system well, it's not hard. I suspect Atlassian suffers from a complicated schema and a post-IPO brain drain.
It's likely a mixture of all these factors, the brain drain could absolutely be responsible.
At least it would not be the first time in history that a company has lost its engineering spirit and the business people have taken over, so that details like disaster plans become less of a priority.
A business person and an engineer will always view risk differently; better disaster plans are a kind of insurance that is a lot harder to sell when too many business people run the company.
When all customer data lives in one unified database: just wait until a bug in a query exposes customers' data to each other, creating instant regulatory and privacy nightmares for everyone.
With an ORM and customer objects to create scoped queries, I haven't found this to be a problem. It's also very easy to check in code reviews. And it clearly isn't that painful an issue in practice, given how rarely it actually happens despite this being an extremely common app design.
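e.g. something like this SQLAlchemy-style sketch (the models are invented; the point is that every query path goes through one scoping helper, which is the thing you check in review):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Ticket(Base):
    __tablename__ = "tickets"
    id = Column(Integer, primary_key=True)
    tenant_id = Column(Integer, nullable=False, index=True)
    title = Column(String)

def tickets_for(session: Session, tenant_id: int):
    # The only sanctioned way to read tickets: the tenant filter lives here,
    # so "forgot the WHERE customer_id = ..." is a one-place review item.
    return session.query(Ticket).filter(Ticket.tenant_id == tenant_id)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([Ticket(tenant_id=1, title="a"), Ticket(tenant_id=2, title="b")])
    session.commit()
    print([t.title for t in tickets_for(session, 1)])  # only tenant 1's rows
```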
It is like any other architectural choice - there are pros and cons both directions. If you have separate db instances, you have to scale up the operations to manage each one - migrations, scripts, etc need to be either run against them all, or you need good tooling in place to automate it. A single instance avoids all that, but is more complex in the actual software and definitely more complex for security. A single DB also would let you share data amongst organizations fairly easily, but whether that is good or bad depends on your product. I've created and run products both ways, and I like separate DBs at small scales, single DBs at medium scale, but separate DBs again at huge scale if you also put management tooling in place.
I believe you can sign up for an account for free or incredibly cheap ($5/user), so you would potentially have tens of thousands of databases. Imagine trying to do something like a database migration to add a column. I believe the day-to-day operations would be a nightmare, as probably no RDBMS has had that kind of usage stress-tested.
Answer: it depends on the application. For example, a big social app is not going to provision a new db for every user, or for every customer that runs an ad. Likewise, a lot of enterprise software fits a model where each customer getting its own db makes sense. So, really, just a design decision.
Separate DB instances doesn't scale as well cost wise, and generally means onboarding takes a few minutes instead of being instant. It is very common though.
The solution that satisfies everyone is having a separate schema per customer and a number of database clusters. Then each customer is assigned to a particular cluster. Always make sure you have excess capacity on your pool of clusters and onboarding is still instant.
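Roughly (the cluster DSNs and the mapping are invented for illustration):

```python
import random

# Pool of database clusters with spare capacity; each tenant is pinned to one
# at signup and the mapping lives in a small control-plane store.
CLUSTERS = {
    "cluster-1": "postgresql://db1.internal/app",
    "cluster-2": "postgresql://db2.internal/app",
}
TENANT_TO_CLUSTER: dict[str, str] = {}  # in reality a table, not a dict

def onboard(tenant: str) -> None:
    # Instant onboarding: pick a cluster (here at random; in practice the one
    # with the most headroom) and create the tenant's schema there.
    TENANT_TO_CLUSTER[tenant] = random.choice(list(CLUSTERS))
    # ... then CREATE SCHEMA for the tenant on that cluster, as in the earlier sketch ...

def dsn_for(tenant: str) -> str:
    return CLUSTERS[TENANT_TO_CLUSTER[tenant]]
```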
By segregating as much as you can. Definitely not by putting everything in a single table. At the very least separate databases/schemas with proper permissions so there's no chance of data intermixing.
The best would be multiple separate database instances, which is not even hard to manage, especially for qualified engineers like Atlassian surely has plenty of. The problem is usually business decisions that ignore the tech debt...
Now every time you run a database migration, you have to run it against N databases - and in Atlassian's case, N is 200000. Is that better? It depends. There is no "best" way of doing multitenancy.
Multiple schemas? You don't need every tenant in the same schema. However I'm not a DBA by trade so there might be some issue with doing this at scale that I'm unaware of.