The one exception I'll make from the very start is "add tenant identifier to every row, yes, even if it's linked to another table that has tenant identifiers."
Sure, this means you will have some "unnecessary" `tenant_id` columns in some tables that you could get through a relation, but it saves you from _having_ to include that relation just to limit by tenant, which you will some day almost be guaranteed to want. (Especially if you want to enable row-level security¹ someday.)
We do a variant of this across all our tables. If we have a parent-child-child relationship, then all child tables, regardless of depth, will have the parent id.
This way we can load up all the data needed to process an order or invoice etc easily, without a ton of joins.
We don't do multi-tenant, instead separate databases per tenant so far.
How is this working for you so far? Do you ever need to report across the multiple tenants, and how do database migrations go? I'm starting to look into this, and purely for reporting and database migrations I'm leaning towards multi-tenant.
The product I'm working have not needed to report across multiple tenants.
We do have some customers which have separate daughter companies which might technically be individual tenants, and where we might need to report across those. But in all those cases so far we've been able to host the databases on the same server, so can easily join tables from the individual databases into the same query.
Database migrations are very smooth, given each tenant has their own database, so we can do it when they're ready instead of needing a service window that fits all our tenants. We have several that are essentially 24/7 operations, and while they do have periods of low activity that fits a service window, they usually don't line up well between them.
Likewise schema upgrades are trivial. We have written our own little tool that updates the database given a source schema description in XML (since we've had it for 15+ years now), and as it does not do destructive updates is is low risk. So schema upgrades is done automatically as part of our automated update deployment (ala Windows Update).
Of course this requires a bit more thought during schema design, and sometimes we do add some manual cleanup or similar of tables/columns to our tool that we know are safe to remove/change.
One upside is that performance is easy to manage. A single tenant will almost never cause performance issues for others than themselves. If they start doing that we can always just trivially move them to their own server.
A downside is that we have a lot of large code lists, and currently these are kept per-database as we've traditionally been deployed on-prem. So we're looking to consolidate those.
We do have another product that I haven't worked on, it's a more traditional SAAS web app thing, and that does have multi-tenant. It's not as business-critical, so the service window constraint isn't an issue there.
Given that it has an order of magnitude more tenants than the application I work on, I think that multi-tenant was a decent choice. However it has also had some complications. I do recall hearing some issues around it being a more critical point of failure, and also that certain larger customers have started to want their own separate database.
I think ideally we would have combined the two approaches. Allow for multi-tenant by having a tenant id everywhere, but also assume different tenants will run in different databases.
Thank you very much for your time, I appreciate it! It definitely seems automated schema updating is necessary if you're doing more than a couple of databases, and you raised many other good points that I hadn't fully considered. I can definitely appreciate larger clients wanting their own dedicated databases, so planning for that initially could be a wise choice.
Thank you again!
Nice one! I’m working on an app the is user/consumer facing but will eventually have a teams/bussiness offering. Everything has a uid with it cus users are the core of your app. But yea if your multitennant then tenants should be treated like that as well.
Then said row is actually owned by nobody, and the problem solves itself. Your many-to-many table will have a row for each connection, and those rows will have a tenant_id. But that's a normal database problem that will essentially always require joins, at that point -- it's not complicated by this approach.
(Alternatively, your row might have two tenant IDs embedded directly in it by the nature of the shared connection, like "owner_id" and "renter_id" for a property, for example, and again, you're fine because you can use those IDs for a query very easily.)
In a great number of (probably most) business-to-business software domains, most rows are only relevant to one tenant at a time. It’s a partition/shard key, basically: queries will only uncommonly retrieve rows for more than one tenant at a time.
Hassles emerge when the tenant that owns a row changes (client splits or mergers), but even then this is a common and robust architecture pattern.
Sure, this means you will have some "unnecessary" `tenant_id` columns in some tables that you could get through a relation, but it saves you from _having_ to include that relation just to limit by tenant, which you will some day almost be guaranteed to want. (Especially if you want to enable row-level security¹ someday.)
¹ - https://www.postgresql.org/docs/current/ddl-rowsecurity.html