
“if you’re not on-call for your code, who is?”

Someone whose job it is?

The people who designed your car don’t have to change the oil in it.

The more I think about it, the more I realise it’s just motivated reasoning because people like doing it. The logical conclusion to this line of thinking is that the CEO just does everything themselves. Otherwise, how do they really know if they’re effective or not?



It’s about aligning incentives. We don’t want to spend our time fixing prod, so we build robust systems. Configuration is hard, so we simplify that. Upgrades are error prone, so we make them more stable and update more frequently.


Where do things like pride, responsibility, and empathy fit in this model of incentives? Good people want to do a good job. An engineer designing a bridge doesn’t need to use it every day to be incentivised to do a good job; designing a bridge is an intellectual pursuit that is quite a bit beyond simple incentives.

The downside of this “skin in the game” approach is that you skip specifying things properly and end up with systems that only your devs can configure and upgrade.


I'm currently on-call, essentially, for code I didn't write. (Not by choice, by necessity.)

We have a subsystem that is currently emitting millions of logs per hour. It's eating up most of the available compute. In a separate incident, it racked up a bill of a few thousand dollars by making millions of API calls that all failed.

It clearly has issues. But I'm not the primary dev: I have no familiarity with the code base, I have little idea what it is doing (and yes, I've asked). As I'm not a dev of the code (and have no time to become one — our agile sprint planning will never allow time for that, and, since I'm not one of the devs) I'm not able to add the information I need to the code to get the insights into answering "why is it eating up most of the compute?".

> The people who designed your car don’t have to change the oil in it.

No, but when the car fails to operate as designed, those people need to figure out the why. Also, a mechanic has an understanding of how the car is built, and how it functions. In software, the only people that have that are the devs.


> But I'm not the primary dev: I have no familiarity with the code base, I have little idea what it is doing (and yes, I've asked). As I'm not a dev of the code (and have no time to become one — our agile sprint planning will never allow time for that, and, since I'm not one of the devs) I'm not able to add the information I need to the code to get the insights into answering "why is it eating up most of the compute?".

It looks like you identified the root causes of the problem here: the fact that you're not the dev doesn't have to be a problem. It's the fact that it's under-documented & has bad metrics.

While everyone understands that we will always have bugs and issues (at least while we keep working in the current paradigm for software development), having good designs, documentation and metrics is attainable. It just has to be prioritized by management.

> "our agile sprint planning will never allow time for that"

Sounds like those who call the shots either don't understand the cost of not doing these things, or believe that it's more cost effective not to do them.


> As I'm not a dev of the code (and have no time to become one — our agile sprint planning will never allow time for that, and, since I'm not one of the devs) I'm not able to add the information I need to the code to get the insights into answering "why is it eating up most of the compute?".

But as oncall, that's not your job, is it?

Being oncall means you are the primary point of contact for your team regarding any issue involving how your product reaches the public. You take the lead identifying issues and finding ways to mitigate how problems impact users. Yet, that doesn't mean you should be attaching debuggers to running processes and adding breakpoints here and there. You are expected to avoid downtime, meet service levels, and coordinate with all teams to fix operational issues and increase code quality.

If you're not the primary dev and you stumble on an issue, you are expected to file a ticket and bring it to the attention of anyone who is in a position to address the issue.


> I'm currently on-call[...]Not by choice, by necessity.

> I have no familiarity with the code base, I have little idea what it is doing (and yes, I've asked)

It sounds like you’re being asked to do a job without the tools you need to do it (i.e. supporting documentation, a runbook etc.). I obviously don’t know the circumstances, but the organisation needs to resolve those issues so you can do an effective job.

> when the car fails to operate as designed, those people need to figure out why

I agree completely. The team responsible for the codebase should be fixing the bugs.


Code doesn't wear. An "oil change" or "part replacement" workflow in a software system is a fixable bug. A healthy shop fixes those bugs, rather than hiring people to work around and clean up after them. The failures it experiences are novel, and people who are not deep experts in the code would not be able to fix them anyway.


>>The people who designed your car don’t have to change the oil in it.

If the people who designed the car don't care about oil changers' requirements because "it's not their job", then a 2am call is absolutely the best kind of feedback they should get, even if they are not on-call.

If they do care, that means they already got that feedback and/or they listen.

So the feedback from production should be there and it should be ongoing.


If you want the code to run 24/7, and you want engineers available to fix it, then you hire people to work off hour shifts and pay them more. You don't hire someone to work 9 am to 5 pm Monday to Friday and expect them to wake up at 2 am to fix something. On the other hand, an engineer scheduled to work from 9 pm to 5 am would have no issue doing so.


My team designs an airliner system. Effectively there is a company support team "answering the phone" 24/7. But when our system fails in a non-trivial way on an aircraft somewhere, one of the design engineers absolutely gets a call and will spend the next few hours supporting it. Even if it's 2am on a weekend.


An oil change is maintenance, not a fix. If there is a flaw in the design of the oil filter, it will definitely come back to the people who designed it.


Yes, exactly. The people responsible for that work will be the ones to fix it.

Note how the designers were notified of the flaw without having to be there when it was discovered, because that was someone else’s job.


Software engineers are neither CEOs nor car designers. They can afford to spend some time fixing their own code.


Sure, how about 40 hours a week?


I think you’ve misunderstood the point I was making; humans need to work in teams, and that means hand offs, communication, delegation etc.

If you can’t ever cooperate with anyone else because of incentive structures, then your organisation can only have one individual–the CEO–who has to do everything by themselves.



