The consistency problem is an open question in my mind. I definitely don't like the idea of having some data-synchronization tool to fix inconsistent data across services. I wonder what the best practice is for maintaining data consistency across services.
Ideally you don't have to sync the data because one service owns that data. Other services request that data via API. In a RESTful world, those API requests are cacheable.
But what about the situation where you have an entity service that owns the data for one piece of the domain, for example a People service, and then other services, like the Address service and the Billing service, reference a particular person. In that scenario, I can imagine the Address service and the Billing service would have a foreign key referencing a person in the People service. Then, what happens if the Person gets deleted? In that case, we've got a consistency problem, even though each service owned its data.
The People service can also store Addresses. Call it the Identity service. Include People, Businesses, Relationships, and Addresses.
The Billing service can then reference People or Businesses (and if Businesses, then sub-People), bill to an Address, etc.
No one's saying every object should be a service; you need to find the correct lines to divide across.
In our system (which has been service-oriented for five years), we don't do deletes. We do 'inactive' (UPDATE table SET ACTIVE=0…), but never deletes.
Especially in the case of your billing example, you never want to delete a person or address, because that's historical data you need to retain; we just keep everything. If it goes in the database, it's because we want to keep it forever.
You could have a service bus, where you publish a "PersonDeleted" message that the other services would subscribe to. It decouples the Person service from all the other related entity services.
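A minimal sketch of that idea, assuming an in-memory bus and hypothetical record stores (a real deployment would use something like RabbitMQ or Kafka, but the decoupling is the same):

```python
from collections import defaultdict

class ServiceBus:
    """Toy in-memory stand-in for a real message bus."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = ServiceBus()

# The Address and Billing services react to deletions without the
# People service knowing they exist.
address_records = {42: "1 Main St"}
billing_records = {42: "invoice-9"}

bus.subscribe("PersonDeleted", lambda pid: address_records.pop(pid, None))
bus.subscribe("PersonDeleted", lambda pid: billing_records.pop(pid, None))

# The People service only publishes; it holds no reference to subscribers.
bus.publish("PersonDeleted", 42)
```

The People service's only dependency is on the bus itself, which is the decoupling being described.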
You'd have to allow for propagation delay. Plus the possibility of a message storm if you delete something fairly fundamental.
> You could have a service bus, where you publish a "PersonDeleted" message that the other services would subscribe to. It decouples the Person service from all the other related entity services.
You're still screwed if you complete a transaction on the deleted person's still-existing account, now that your system is no longer transactional...
Your objection is hypothetical/abstract, and when you ground it in specific use cases, plenty of patterns emerge for dealing with the inconsistent state: for example, just-in-time/read reconciliation, batch remediation, actually making some subset of actions transactional/consistent and accepting lower availability there, and so on.
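To make the first of those concrete, here is a sketch of just-in-time (read-time) reconciliation, with hypothetical record stores and a stand-in for the cross-service lookup:

```python
# Owned by the People service (person 8 has already been deleted).
people = {7: {"name": "Ada"}}

# Owned by the Billing service; record 8 is now stale.
billing = {
    7: {"amount": 100, "orphaned": False},
    8: {"amount": 50, "orphaned": False},
}

def person_exists(person_id):
    # Stand-in for a REST call to the People service.
    return person_id in people

def read_billing(person_id):
    record = billing.get(person_id)
    if record and not person_exists(person_id):
        # Reconcile on read: flag (or clean up) the stale record
        # instead of relying on a synchronous cross-service delete.
        record["orphaned"] = True
    return record
```

The stale state is tolerated until someone actually reads it, at which point it is detected and repaired.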
I'm not saying there are no solutions, but saying "just add an event bus" is unlikely to be sufficient. Whatever you do, you're going to pay additional costs in terms of complexity.
Yeah, if you're an ACID person, this approach is going to present conceptual challenges. The propagation delay is a mostly-solved problem, which I know because lots of high-scale sites work. Getting a summary of their design decisions around this would be a huge time-saver, but I don't know of one.
So the problem you've identified is real. I used to have some bootleg footage of private Amazon tech talks where the speaker emphasized that in distributed systems it is generally a terrible idea to have transactions span entities.
I think you basically have to learn to live in an eventually consistent world. In the case of people being deleted I would imagine that the user service exposes a pub/sub interface where address and billing services subscribe to "delete" events.
Hardly need "private bootleg" footage to discover this reality. Pat Helland (at the time working at Amazon) wrote a paper about it maybe 10 years ago ("Life Beyond Distributed Transactions," CIDR 2007).
You don't HAVE to live in an eventually consistent world. If you use something like ZeroMQ or REST, you can "notify" other services of a "person deleted" event synchronously.
That has nothing to do with the fact that if your systems are distributed, you will have eventual consistency.
If System A needs to tell System B about an event in order for A and B to remain consistent, but B is down, you've got eventual consistency, because B can't become consistent with A until it's back up and has performed whatever recovery is necessary to process that event. Service discovery does nothing to solve that problem.
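That recovery step can be sketched with a retry queue (an outbox-style buffer; all names here are illustrative, not from any particular library):

```python
import collections

class DownstreamService:
    """System B: may be unreachable for a while."""
    def __init__(self):
        self.up = True
        self.deleted = set()

    def handle(self, event):
        if not self.up:
            raise ConnectionError("service unavailable")
        self.deleted.add(event)

class Publisher:
    """System A: retries undelivered events, so consistency is eventual."""
    def __init__(self, target):
        self.target = target
        self.outbox = collections.deque()

    def publish(self, event):
        self.outbox.append(event)
        self.flush()

    def flush(self):
        while self.outbox:
            try:
                self.target.handle(self.outbox[0])
            except ConnectionError:
                return  # B is down; keep the event and try again later
            self.outbox.popleft()

b = DownstreamService()
a = Publisher(b)
b.up = False
a.publish("PersonDeleted:42")  # B is down: the event waits in the outbox
b.up = True
a.flush()                      # on recovery, B catches up with A
```

While B is down, A and B disagree about person 42; only after the flush on recovery are they consistent again, which is exactly the window being described.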
In addition, the network isn't just up or down. It's varying shades (dare I say, 50 shades?) of down or broken. A single machine might not be accessible due to a switch issue. An entire rack or aisle might be compromised by a bad router or faulty routing table. A network cable might be flaky. The truth is you just don't know, and that's all inside a single LAN.
Your service discovery system might be able to see services {A,B,C}, while service A can't talk to B or C due to network issues. It happens.
The "each service owns its data" scenario shouldn't take precedence over cases like this, where consistency is aligned with obvious business rules. If Person gets deleted, then "on delete cascade" should take care of that Person's Address and Billing records.
For updates and maybe reads, it's a different story.
In my experience, you only expose APIs that are either standalone and transactionally independent (change address, delete address, etc. in your example) or composite services (say, a People service) that manage the distributed transaction. How transactions are managed under the hood varies by implementation.
One may argue that in this case the "People" service doesn't fit the description of a microservice given in the article. But we need to understand that services get called in some context, and there has to be someone there to do the plumbing. That someone can be a DB query, some code in the service, or the application calling the services. Generally you would prefer service code over the other two, hence a composite service.
IMO it may also be okay to have People, Address, and Billing under one schema if service granularity and context allow it.
cgh, I've hit the reply limit, but I wanted to ask how you'd implement the cascading deletes over services. Would the People service have to emit events indicating that a person was deleted, which the Address and Billing services would be expected to subscribe to and handle?
Sorry, I should have been more clear. I'm assuming a shared database, so cascading deletes would be defined in the table's schema. Let's pretend we're using PostgreSQL:
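A minimal sketch of schema-enforced cascading deletes, shown here via Python's sqlite3 so it runs standalone (the `ON DELETE CASCADE` DDL carries over to Postgres essentially unchanged; table and column names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite needs this; Postgres doesn't

db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE addresses (
                  id INTEGER PRIMARY KEY,
                  person_id INTEGER REFERENCES people(id) ON DELETE CASCADE,
                  street TEXT)""")

db.execute("INSERT INTO people VALUES (1, 'Ada')")
db.execute("INSERT INTO addresses VALUES (1, 1, '1 Main St')")

# Deleting the person cascades to the dependent address rows
# with no application code involved.
db.execute("DELETE FROM people WHERE id = 1")
remaining = db.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
```

The database, not the services, enforces the consistency rule, which is exactly why this only works when the tables share a database.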
The original position being argued, though, was: "... where all services talk to the same database... You need to split the database up and denormalize it."
So the basic premise is that there is no shared database, and thus having the database enforce cascading deletes is not an option.
You wouldn't so much delete them as deactivate them (mark them inactive but keep them around for retrieval). The consuming service would react differently to an inactive person than to an active one.
Does anyone know?