I've had success with a few strategies for introducing abstractions/patterns at my current place (doing this alone for an enterprise SaaS company with 200-ish devs). It's weird that we don't teach these or talk about them in software engineering (AFAIK); I see them being re-invented all the time.
To borrow from medicine: the first step is always to stop the hemorrhage, then clean the wound, and then protect the wound (or wounds).
- Add a deprecation marker. In Python this can be a decorator, context manager, or even a magic comment string. I ideally try to do this while first introducing the pattern; it makes searching easier next time. (First sketch after this list.)
- Create a linter, with an escape hatch. If you can statically analyse or type-hint your way there, great! In Python I will write AST-based, semgrep, or custom checks to catch these, but provide a magic string similar to `# noqa` / `# type: ignore` so existing code can be ignored. Then there's a way to track and fix the offending places, and you can make a metric out of it. (Second sketch after this list.)
- Everything in the system has to have an owner (person, squad, team, or dept). Endpoints have owners, async tasks have owners, Kafka consumers might have owners, test cases might have owners. Then if anything fails you can make it visible on the owner's SLO dashboard. (Third sketch after this list.)
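A rough sketch of the deprecation marker idea; the decorator name, the greppable string, and the legacy function are illustrations, not a fixed convention:

```python
import functools
import warnings

# "DEPRECATED-PATTERN" is the greppable magic string; the decorator name and
# message format are illustrations, not a fixed convention.
def deprecated_pattern(replacement: str):
    """Mark a callable as using a pattern we are migrating away from."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"DEPRECATED-PATTERN: {func.__qualname__}; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Hypothetical legacy function and replacement path, purely for illustration.
@deprecated_pattern(replacement="orders.service.create_order")
def create_order_legacy(payload):
    ...
```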
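And a rough sketch of the linter-with-escape-hatch idea, written as a plain AST check; the banned call and the ignore marker are hypothetical stand-ins for whatever pattern you are retiring:

```python
import ast
import sys

# Hypothetical names: the banned call and the ignore marker stand in for
# whatever pattern is being retired in your codebase.
BANNED_CALL = "legacy_client.request"
IGNORE_MARKER = "# legacy-client: ignore"

def find_offenders(path: str) -> list[int]:
    with open(path) as f:
        source = f.read()
    lines = source.splitlines()
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and ast.unparse(node.func) == BANNED_CALL:
            # Escape hatch: existing call sites carry the marker on the same line.
            if IGNORE_MARKER not in lines[node.lineno - 1]:
                offenders.append(node.lineno)
    return offenders

if __name__ == "__main__":
    total = 0
    for path in sys.argv[1:]:
        for lineno in find_offenders(path):
            print(f"{path}:{lineno}: use the shared client instead of {BANNED_CALL}")
            total += 1
    # The offender count is the metric you track towards zero.
    print(f"offenders={total}")
    sys.exit(1 if total else 0)
```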
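And one way the ownership tagging could look; the registry, decorator, and squad names are made up for illustration, and in practice the owner would feed whatever metrics/alerting stack you already have:

```python
# Minimal sketch of "everything has an owner". The registry, decorator, and
# squad name are hypothetical; in practice the owner would become a label on
# your error/latency metrics so reds land on the right SLO dashboard.
OWNERS: dict[str, str] = {}

def owned_by(squad: str):
    """Tag any callable (endpoint, async task, consumer) with its owning squad."""
    def decorator(func):
        OWNERS[f"{func.__module__}.{func.__qualname__}"] = squad
        return func
    return decorator

def owner_of(func) -> str:
    """Look up the owner, e.g. when emitting an error metric for a failure."""
    return OWNERS.get(f"{func.__module__}.{func.__qualname__}", "unowned")

@owned_by("payments-squad")
def handle_payment_webhook(event: dict) -> None:
    ...
```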
The other alternative to that last (ownership) step is, if possible, for a platform squad to take it over and do the refactor at zero cost to the product squads. Of course the product squads still have to help test/approve etc. It's an easier way to get people to adopt a pattern if you do it for them. But the ROI on the pattern has to be there, and the platform squad does get stuck doing thankless cruft work sometimes. If you do this judiciously the win might be thanks enough: more robust systems, better observability/traces, less flaky tests, etc.
Tests might cover more code than a single unit, with pieces owned by different teams, and thus end up with multiple owners. Prefer "squads" as the owners rather than individuals.
But just like documentation, the ownership info might go stale and out of sync. The idea would be to let some reds in the SLO dashboards correct it over time; it's not always possible to automatically link "tests" to the "code" they cover.
End-to-end tests might get tricky. But unit tests should be owned by the person/team/squad that owns the unit.
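One way to wire up test ownership, assuming pytest: a custom `owner` marker plus a small conftest hook that stamps the owning squad onto failure reports (the marker name and squad names are just examples, not an established convention):

```python
# conftest.py -- register an "owner" marker and stamp the owning squad onto
# each test report, so failures can be grouped per squad.
import pytest

def pytest_configure(config):
    config.addinivalue_line("markers", "owner(squad): squad that owns this test")

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    marker = item.get_closest_marker("owner")
    report.owner = marker.args[0] if marker else "unowned"

# test_orders.py -- tests declare their owning squad with the marker.
@pytest.mark.owner("orders-squad")
def test_create_order():
    assert True
```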
And unit tests should never break / be red. If the code needs to change, the test needs to be changed at the same time.
End-to-end tests can be flaky. Those probably shouldn't prevent deployments and can be red for a while. You should probably confirm manually whether the test is acting up, the behavior has changed, or something is legitimately broken before ignoring them, though.