It Takes 6 Days to Change 1 Line of Code (2015) (edw519.posthaven.com)
568 points by Mz on Dec 6, 2016 | 287 comments



I want to fault this but I can't. Code reviewer who's not afraid to reject code? Check. Proper testing of code changes? Check.

The process probably looks silly in this case because nothing went wrong. But when a hard-coded variable is changed, who is to say it won't break some other system? And if the variable is then moved to a config file, who's to say it won't break in some obscure way, such as when the config file is not found in the uat/test environment? On top of that, once it's in the config file the manager can change the variable as they please without having to go through this process again.
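
For what it's worth, the usual mitigation for the "config file not found" worry is to read the parameter with a compiled-in default. A minimal sketch - the parameter name, file name and values are all made up:

  import json

  DEFAULT_THRESHOLD = 40  # hypothetical default, standing in for the old hard-coded value

  def load_param(name, default, path="parameters.json"):
      # Fall back to the compiled-in default if the Parameters file is missing
      # (e.g. in the uat/test environment) or the key isn't present.
      try:
          with open(path) as f:
              return json.load(f).get(name, default)
      except (OSError, ValueError):
          return default

  threshold = load_param("threshold", DEFAULT_THRESHOLD)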


The process is fine, but sometimes shit is broken and you need to fix it now and clean up the existing mess later.

Why not instead of

> Shirley (Code Review): It is now against company policy to have any hard-coded variables. You will have to make this a record in the Parameters file. Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production.

Add at the end:

However, I'll allow the change for now because this is an important update that affects people's jobs. I want to see every issue fixed before the end of the week, and I will not approve any other changes until these issues are resolved.

There, win-win.


> The process is fine, but sometimes shit is broken and you need to fix it now and clean up the existing mess later.

Working in big enterprise shops, I've noticed that periodically that attitude does take hold, and the result is that the firefighting ends up producing more fires than it puts out.

If you could change most of the people, culture, and incentives in a non-software-centered enterprise all at once (and this typically goes far beyond the IT shop), sure, maybe more fluidity in software development and release process would work better. But that's usually not possible, nor is it usually possible to durably wall off a group within an enterprise to isolate it from the problems of the larger organization (though that does happen sometimes.)


Exactly this. I think dev shops are fairly stratified into quality levels. "fix it now and clean it up later" works in a certain class of shop, and that's great.

But there's a huge swath of dev shops that never pay any technical debt back, and the compound interest on that technical debt consumes all productivity. Past a certain point, drawing a line in the sand and forcing better practices no matter how easy it is to rationalize 'one more hack' is the only way to break that cycle.


> But there's a huge swath of dev shops that never pay any technical debt back

Engineers often talk about technical debt as if the codebase is the currency of the realm. It's not. Business Value is the currency of the realm. The code is something that facilitates business value, when (and only when) it does what the business needs. The people working on the code have a customer, the business. The business is locked in to a single provider for that service. And the service provider has grown complacent, prioritizing its own needs and processes ahead of the needs of the customer. They're behaving like other entrenched and untouchable service providers do, in monopoly industries like government and cable companies. Deal with my service levels, or suck it.

There are many customer-obsessed engineers who know how to strike this balance, but this is one of the many issues where there's selection bias in who will take the time to participate in the discussion. People who think this is a powerful story of IT done right need to broaden their minds, stop playing around on Hacker News, and start thinking harder about how software service providers can do a better job of learning how to say "yes" faster.


> Engineers often talk about technical debt as if the codebase is the currency of the realm. It's not. Business Value is the currency of the realm.

"Technical Debt" is a term that exists largely to communicate the idea that the thing it represents increases the dollar cost of both operating the software to deliver constant business value in a steady state, and of delivering changes to the software to continue to deliver the same value as business needs change or to realize additional opportunities to deliver new value.

It is, absolutely and centrally, about business value; ignoring it in decision-making is exactly like ignoring maintenance/operations costs in a capital purchase and considering only acquisition costs.


Not all tech debt needs to be paid back in full - thinking otherwise is indeed a fallacy, as you point out.

The catch, though, is that the rate of business value production is inversely proportional to the amount of tech debt, meaning a ton of yeses today will mean a grinding halt in a couple of years. I have worked on too many (very successful) teams now, and watched many others, that have gotten themselves into this situation.

After that point you end up in a place where tech debt repayment efforts can take years - which also tends to coincide with the 4-5 year mark when the original developers are tired and leave. The end result: a giant legacy system operated by a ton of new folks who have no clue what's up, which eventually gets 'deprecated' and replaced by a brand new one.

Ideally, I do think data is gold, but code needs to be incrementally ripped out and replaced every 3-5 years or so. Otherwise entropy cripples the system.


> start thinking a little broader about how software service providers can do a better job of learning how to say "yes" faster.

We don't need more "yes men", we need more people able and willing to say 'Here are the trade-offs', and work with the business from there.

Yes, it absolutely is about Business Value, and the best Business Value is not always "yes" (of course, sometimes it is).

... speaking as someone currently dealing with and seeing the Business Value impact of technical debt.


But for a lot of things, this is actually generally acceptable. Back office business apps, for example, where because of internal politics and attrition, they generally end up being replaced wholesale every 5-10 years, which is approximately how long it takes technical debt-ridden code to become so brittle it's unsupportable. A CIO can save millions by allowing this behavior. Thousands upon thousands of apathetic IT staff (including programmers) make decent livings in enterprises like this.


5-10 years? In a company of what size, in what industry? For most "back-office" apps you're talking 2 years for the rewrite just to reach (almost) feature parity, and another year for the users to be trained and feel comfortable with it. If you were ripping and replacing a non-technical user's go-to app every 5 years you'd have a revolt.


We have 2 apps that are used for the same thing: the original one and the rewritten one. The rewritten one was coded up in C# over 3 years and isn't anywhere near as robust as the original. The original was written in VB3 circa 1995 and dragged to VB6, where it's stayed to this day.

There's so much technical debt on the old app that the first change I made to it 5 years ago caused it to stop compiling: the single line I added to a function made it too big to compile.

If you work in a shop that has been around long enough to lose its training wheels and get a driving licence, then you have technical debt that's core functionality that everyone is scared of breaking.


Anecdotal example: The place I was working at had high software quality standards, but eventually created a separate team that wrote in a bizarre XML 'language' of sorts. This was, in effect, still part of the software, and the sins committed there were horrendous, but all was allowed in the name of "business value".

The end result of that being that you couldn't really change _anything_ without it breaking any of N client installs (each custom) - and you couldn't reason about which one touched a particular part of the code, or what assumptions it made.

That company was all but dead inside two years. There were other factors leading to its demise, but far and away the biggest issue was the crippling technical debt that was piled on with reckless abandon.

There are definitely trade-offs to be made, and sometimes it makes sense to take on some technical debt. Your example of back-office apps may be one of them, but there's certainly a subset of software development for which this is both a business and technical-level suicide.


At least in this example, the entirety of the project involves fixing that "technical debt" of hard coding a variable, and it's being held up for not fixing _enough_ technical debt.


> and the result is that the firefighting ends up producing more fires than it puts out.

I usually find that customers

- Prefer getting features early, even if unpolished, meaning ship something broken and fix it later rather than polish and ship something perfect.

- Prefer many, complex features over simplicity/performance in the product.

- Prefer having more bugs, but quick turnaround over having few bugs and a long turnaround.

In all the enterprise scenarios I have worked in, I have tried to stress the importance of simplicity, etc., but what customers have really stressed is that they don't mind bugs, crashes or poor performance as long as they can get their work done - and that might depend on that last feature or putting out that last fire.


They say that now. But what happens when they list all the bugs against you when it comes to paying the invoice?


They say, thanks for fixing all those bugs. They know the bugs exist because they encountered them. Fixing bugs actually looks like work is getting done.

Unlike if you came back to them with a polished product with no bugs, they would be like why am I spending all this money for this time?


That’s related to the concept expressed by the behavioural economist Dan Ariely: “consumers don’t value goods and services solely by their utility, … but also a sense of fairness relating to how much effort was exerted” [1]. In that quote, I’d probably qualify the word effort as “perceived effort”.

http://danariely.com/2010/12/15/locksmiths/


The customer doesn't know about or understand technical debt.

Any improvements you make down there just rack up time spent, which makes the bills higher - and that they will notice. They do not see your improvements, but they will notice if your much betterer code now introduces a new bug.

Higher bills than your predecessor, slower fixes, and new bugs: that is when you should worry about getting your bills paid, not when you fix things fast.


The thing is, the existing customers are locked in (or we failed).

The customers we want are those that aren't customers yet. Those customers require features that we don't have yet in order for them to become customers.

It might be the right decision as a business to screw the existing customers by accumulating tech debt in order to grab new market share before the competition does (again, this is in a context where a customer has no chance of leaving once on board - which is more or less the case in all enterprise scenarios I have experience with, both web and desktop).

I hate this mentality and as a developer it's terrible - but I can't say that it's the wrong decision to make short term, business wise.


Guess I simply do not share your view. My customers are not locked in; in fact, I will go out of my way to make it easy for them, their employees or even another consultant to continue any of the work I do and have done for them in the past.

In addition, I am not in favor of accumulating technical debt - I suppose I wasn't clear. The article talked about a 1-line fix, and that by itself should not trigger a rewrite of a feature, especially not when the fix has to be done now, not next week.


> But what happens when they list all the bugs against you when it comes to paying the invoice.

I have never seen that happen, but then I have never worked on invoiced/project work, only long term enterprise/product dev. Customers license the software, but I suppose there could be some SLA type agreement also for desktop software (which happens to be the case).


OT: I was about to post that women (Shirley) are bossy, but then I pretended that the named individual is Sam and the entire tone of the comment shifted immensely. It was as if the big boss was making this comment. And big bosses are never looking into things of this much detail, which gave more importance to the task.

There is something that isn't right here. I am not sure why I am feeling this.


Welcome to the scary world of recognizing your own biases!

It's a good thing.


It's not Shirley's job to make an executive decision, like whether to break policy or not.

If the company policy should be ignored, then the issue should have been escalated to someone with the authority to make that decision.

IMO the real problem is that the submitter didn't realize that was the case and so didn't escalate sooner. This could have all been avoided by asking for (and getting) carte blanche from Philip or David in the first place.


And yet, if everyone is aware not only of company policies but also of the underlying principles, everyone also has the authority to deny something. Hardcoded values, for example, would violate a principle about maintainability.

Talk about policy and it's politics and escalation and such. Talk about principles and it becomes a moral issue.

Alternatively, think of it as breaking a specific law (such as the speed limit) vs being immoral in a more general sense of the word.


I think making this a moral issue slightly misses the point. Even if Shirley understands the underlying principle of maintainability and is motivated to do the best thing for the company, she is still not the right decision-maker because the relevant risks and rewards are not fully visible to her.

Decisions should be made at the lowest level where all the relevant information is available. In this case, the executive understands the pressing reasons for making the change, so is properly able to weigh the incremental risk (with the technical advice of Shirley et al) vs. the business priority, whereas Shirley can only see half of the picture.

Imagine if Shirley had gone ahead and broken policy, then something went badly wrong. She likely would've been blamed for the whole fiasco.


> Imagine if Shirley had gone ahead and broken policy, then something went badly wrong. She likely would've been blamed for the whole fiasco.

This is what I was looking for in this entire thread. It's one thing to push through a small change in a small internal program for something that isn't critical like a batch job that clears old data from a server. That's not what this is.

It changes the potential output of the entire company. Yes, it's a single LOC totaling out to a single byte. Yes, it passed all of the tests. However, tests are written by people, and tests are only as good as the people who write them. This is partially why there are human eyes in the QA/IT/UAT phases. Shirley would have been viewed as the single point of failure had something gone wrong, because she made the decision to break policy.


Yeah, if I was Shirley, I'd just have suggested "why don't you ask David to overrule this one?", but definitely not overruled it myself.


If I was Ed, I would have grabbed David and told him to tell Shirley to allow the change as is. Probably before submitting it for review.


Because everything is urgent, all of the time... and so nothing gets improved.


This is what escalation is for - the job of a VP or Director of Engineering is to decide what is actually urgent and approve fast-tracking it, and what should actually follow process. Having an efficient escalation process is critical.


Very much this: there should be a process for "emergency change" and that process has to be used for every actual emergency and not for anything else.


I've found that those in charge also have a budget to run you on... how do they justify you spending x hours updating code to do exactly 0 as far as anyone is concerned? ..plus all the regression testing? They just don't... it gets left as it is


The word "should"


"I know this is priority 1, but is it priority 1A, 1B, or 1C?"


Every numeric priority scoring system at every company I've ever worked with has devolved into this: Start with P1 (high), P2 (medium), and P3 (low). Sooner or later, an exec will swoop in and declare something Even Higher Priority Than P1!!! So "P0" gets invented, and after some time you have tons of P0 issues, meaning P1 is the new "medium", P2 is the new "low" and P3 becomes "never". Then something comes in that is Even Higher Priority than P0!!! And now, you're stuck.

For this reason, I always advocate reversing the numbers. The higher the P number, the higher the priority. So you never run out of numbers to express how much more urgent and earth shattering your next bug is. Try it!
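
A toy sketch of the inverted scheme, using an ordinary min-heap (the issue names are made up; negate because heapq pops the smallest value first):

  import heapq

  # Higher number = higher priority, so an "even more urgent" level is just a
  # bigger integer - nothing existing has to be renamed.
  queue = []
  for prio, issue in [(1, "cosmetic typo"), (3, "prod outage"), (2, "customer escalation")]:
      heapq.heappush(queue, (-prio, issue))

  while queue:
      neg_prio, issue = heapq.heappop(queue)
      print(-neg_prio, issue)  # 3 prod outage, then 2 customer escalation, then 1 cosmetic typo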


I worked at a big company that had a big data warehouse that you could submit jobs to: these jobs could be given one of five priorities from "lowest" to "highest". Naturally, everyone set their jobs to "highest", so jobs that generated some weekly email report that nobody really read were competing at the same level as jobs that were actually business critical (e.g., necessary for ad auctions to work correctly or whatever).

So they introduced a new priority level: "Critical". The pattern repeated itself in short order.

It was a tragedy of the commons thing, and eventually it was literally impossible for a job to ever finish if it was marked anything other than "Critical".

So they introduced "Omega Critical". Now you needed to be on a permissions list maintained by the database team to raise something to "Omega Critical". I got on the list (because I was scheduling actually high-priority stuff). Then everyone in my org would come to me and ask me to set random bullshit to Omega Critical, because they knew I had permissions and "one more job won't hurt"...

I don't work there anymore, but I believe they have since developed even more weird categories like "Q4 Omega Critical" and who knows what else.


I've seen 2 pretty good solutions to this.

1. VP bangs heads with the equivalent VP in the other department, complaining there are too many requests. Can work; depends on the individual.

2. Actually have a service level agreement. Requests are limited to X per hour/day/week/whatever the process needs, and billed accordingly. Have a bit of buffer in this, a little creative billing. Requests within SLA are dealt with in agreed turnaround time. Have a dashboard that clearly shows this to customers. Alert customers if they're consistently submitting what's outside of SLA, and they will receive a degradation of service unless they provide further funding.

Everyone knows if you pour too much water into a cup it will overflow.


Having a budget for each department and a cost for running the queries is a great idea... I'll try to implement it the next time I'm in this situation. (I have also had a ton of VPs - plus the CEO and the owner - come to me with "very urgent now now now!" requests... it would have been great to tell them "the top priority for me is this query worth $1000, can you beat that?")


Sure. That's how external service providers work - being internal is what the buffer is for: you can 'do a favour' when needed. If it is the owner making an ASAP!OMG! request, perhaps take the chance for a chat, to demonstrate you have a system and that you're not only managing but taking a stance of leadership (and point out that, as it is internally charged, they're not losing any cash).


Data warehouse queries at Amazon worked like this (5 levels of priority). Everyone set the priority for all of their queries to the highest possible level, every time.


This is what I like about kanban: there is a queue, and priority is not a status.


Remove all priority levels.

Put a checkbox: "This is an emergency". People are suddenly less likely to click it.

And anyway, if it really were, they'd have called so it still doesn't matter but it makes them happy :D


I worked at a place with such a button. Neat caveat: only VPs could uncheck it...

(And I managed to check it within my first month there. My boss kindly let me know the difference between a client saying emergency and a VP saying so....)


I've seen where through general abuse and repeated, early dismissal of bug reports, people are extremely reluctant to call anything an emergency ... even actual emergencies.


I'll never forget the company I consulted at where the priorities were harder and harder to say with a straight face.

To get to the front of the queue, you had to mark your request "hot and urgent for the whole team"


I just got a sudden bitter taste in my mouth when reading "P0" and remembering the people who used to say it out loud; is that normal?

Another great thing about inverting the priority numbers is that nothing can easily go beneath a floor of 0, though ultimately all of this is about fixing the symptom and not the cause.


The numbering system possibly comes from ITIL, where priority is determined as the product of two factors: impact (scope) and urgency. Those definitions seem to be pretty consistent in most IT orgs I've seen, regardless of how poorly executed or immature the org is, and they often include gating factors like the ones other comments in this subtree have been discussing. If some of the most immature and byzantine IT processes can get this nonsense settled, engineers can avoid the bikeshedding and goalpost-moving too, if they actually put some effort into a sensible system and enforced it.
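
Roughly, the impact x urgency mapping looks like this - the bucket values below are illustrative, not quoted from ITIL:

  # ITIL-style priority: 1 = highest on every axis.
  PRIORITY = {
      # (impact, urgency): priority
      (1, 1): 1, (1, 2): 2, (1, 3): 3,
      (2, 1): 2, (2, 2): 3, (2, 3): 4,
      (3, 1): 3, (3, 2): 4, (3, 3): 5,
  }

  def priority(impact, urgency):
      # impact = how many people are affected, urgency = how soon it must be fixed
      return PRIORITY[(impact, urgency)]

  assert priority(1, 1) == 1  # org-wide and urgent: top priority
  assert priority(3, 3) == 5  # one user, can wait: lowest priority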


Order them, don't bucket them. That is, the highest priority is head of the queue, and the lowest is at the bottom.


I have a different issue: everything kind of always gets promoted to a must-have. So now I've made 2 priorities, need-to-have and nice-to-have. For any given collection of work that needs to be done, there can be no more than 60% need-to-have estimated work hours. When the deadline is accepted by my team, it means that all need-to-haves are done at that deadline, and the nice-to-haves probably are, but no guarantees.

Every single fucking time the PO thinks the deadline is too late, she tries to remove nice-to-have issues - sorry, but that doesn't change the deadline.


A former co-worker of mine told me about their priority tracking software, where you could enter a number from 0-5 for each bug, where 0 was the highest priority and only certain senior managers could set a priority of 0.

However, people soon discovered that priorities were stored as a float in the database and they could enter any number 0<n<=5 in the field. So bugs quickly started to get priorities like 0.5, 0.25, 0.1, 0.05 and so on...
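
The fix, of course, is to validate on the server rather than trusting the form. A minimal sketch - the bug object and the permission check are made up:

  ALLOWED = {0, 1, 2, 3, 4, 5}

  def set_priority(bug, value, user):
      # Reject 0.5, 0.25, ...: priorities must be one of the defined integers.
      if value not in ALLOWED:
          raise ValueError("priority must be an integer from 0 to 5")
      # Keep the "only senior managers may set 0" rule in the same place.
      if value == 0 and not user.is_senior_manager:
          raise PermissionError("priority 0 is reserved for senior managers")
      bug.priority = value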


This is inevitably what happens when the people reporting the issues get to decide their priority (I've seen this even in a two-person dynamic). You have to have discipline, and you usually have to have an impartial outsider (to the issue at hand) who doesn't care about other people's panic to actually set priority levels.


I've seen systems that have two fields, e.g. priority and severity. Where severity is "how important the client thinks it is" and priority is "how important we think it is".


Our system has another one: who it impacts. Does it impact a user, a department, or the entire organization? The difference between "my crap's broken," "our crap's broken," and "everybody's crap's broken".
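
Something like this, presumably - the field names are assumptions:

  from dataclasses import dataclass
  from enum import Enum

  class Impact(Enum):
      USER = 1          # "my crap's broken"
      DEPARTMENT = 2    # "our crap's broken"
      ORGANIZATION = 3  # "everybody's crap's broken"

  @dataclass
  class Ticket:
      title: str
      severity: int     # how important the client thinks it is
      priority: int     # how important we think it is
      impact: Impact    # who it actually affects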


Heh. I like that distinction.


No, sooner or later an exec swoops in with "this is our largest customer!!!!". It's still a P1, it just gets bumped to the front of the line - because revenue. Nothing wrong with that, and no need for a different numbering system. Their P1 is no different than a smaller customer's P1. They just get to be first in line because they pay to play.


Or what is even more likely, in my experience, "this could be our largest customer!!!!"

Sales-driven-development is lots and lots of fun.


I once had a priority list where a few items had a "low" priority, the rest being "high", "very high" and "extremely high". It gave me a depressing feeling because we had to delay stuff that was supposedly "high" priority for someone.


The problem with priorities is that multiple issues can have the same priority. If you give every issue its own priority number, then you no longer have this problem.


Hire P4 interns or contractors :)

If possible.


And let me guess... everything priority 2 or lower means 'never' :-)


  UPDATE jira SET priority=NULL WHERE 1=1;
Here's your bugfix.


Or priority zero.


The best solution against "priority inflation" that I ever saw: support request form of the DreamHost web hosting provider had priority levels with vivid descriptions (examples) of their meaning, to the extent that the top priority level was called "people are dying". I doubt anyone dared to select that priority unless people were indeed dying.


Nah. Nothing is ever urgent.

Just take all tickets that are "urgent", remove the urgent and put them at the back of the queue.


Story of my last few jobs.


This is quite literally how technical debt is created. The story said that there was already a set of bugs being worked on. Which means that by adding this, the technical debt backlog would only continue to grow.

I worked at a company where the technical debt was so deep that I really wanted to say it was technically bankrupt, meaning the software should be entirely rewritten. That goes against every grain of my experience, and we ended up not declaring technical bankruptcy, but if you want to understand where it starts, it starts with "we'll do it later."

Each of those decisions needs to be accounted for just like a credit card purchase.


I think that's the key - let's not forget what's at stake in this story: laying off people. What's the price of extending the tech debt of this core code - no 'proper' unit tests, hardcoded values, and employee IDs in code? Is that greater than the cost of laying people off?

It is easy to forget what the real world looks like when staring at a screen all day. We take out tech debt like a mortgage - sometimes that debt doesn't make sense and we end up having to foreclose on it, other times we're able to pay it off slowly over time and it all works out.

Where the analogy to debt falls down is that sometimes you luck out and don't have to repay tech debt. Maybe your legacy manufacturing system gets replaced with something modern and all of that horrible code magically vanishes - maybe your single threaded software from the 90s is finally outdated and you need something modern and would have had to rewrite it anyhow.


>Where the analogy to debt falls down is that sometimes you luck out and don't have to repay tech debt

Just call that "winning the tech debt lottery". Much like winning the real lottery it definitely happens but not enough to rely on it.


It seems like the code review policy should accommodate for scary legacy code.

What happens if the findings are fixed later in the week? Are they promoted, too? Are large swaths of the system regressed because of the changes? The process should allow the reviewer to say "oh, this is legacy code--the actual change looks good, and I won't worry about the code from ten years ago that doesn't conform to the standards we created a year ago."


Fixing code to meet more recent standards has a nontrivial chance of introducing a bug. If it is to be done at all, it should be done on its own account, and not piggybacked onto something with its own goals.

No matter how hard the process mavens try to reduce development to painting-by-numbers, good human judgement is the key to success.


Agreed, as that is the big issue. Anyone who requires that a patch to production include more than the absolute minimum code change has a broken process. You bring that code up to the new standards in the next release, when you can do a much fuller test.


Yes, totally agree. Far more risk is introduced by the cleanup than by the original change. The cleanup is still important, but shouldn't be fast-tracked.


I get the intention here, but sometimes code really needs to be touched. Otherwise, in the modern day, you end up with some-variant-of-Whitesmiths code that nobody can read, or ever wants to read. If you separate it out, business processes usually impede your ability to ever get to the spun-off task, and then it just gets forgotten.


With that attitude all code is legacy code the day it is created, because it will never be updated to conform to future standards.


That's a reasonable point, but how do you (as a product owner) justify the cost of making those fixes (and doing the regression testing necessary to ensure the system still works properly)?

My group supports several legacy applications (10+ years old). The product owners can barely be bothered to green-light bug fixes. I can't imagine they could find the justification to green-light tech debt fixes.


I think the language of technical debt can actually be very meaningful to business. It's fair to take on debt in the short term to make initial gains, but you need to have a reasonable perspective about the ongoing cost that debt has, and pay it off on an ongoing basis if you can't afford the debt service.

If it's a decade-old application that rarely even gets bug fixes, it's probably not worth doing major refactorings. On the other hand, if it's your core product, and your team is regularly making updates and fixes, it would be irresponsible to not incorporate your team's latest standards.

Also, it's worth investing in test automation, because the balance really shifts if you can minimize the cost of regression testing by automating it.


That's the worst thing you can do. The change will be forgotten and will never be properly worked out. Why should it? It just works.

I have seen a 270k loc codebase where every change has been made by this kind of thinking. Nobody ever went back to make the "temporary" changes properly.


It depends on the project. If this is critical code, 9/10 times it might not be an issue to let it slide in favor of getting the release out early, but that 10th time is a disaster. Weird things start happening and you don't know why and the team dealing with the fallout may not even realize which change caused the issue.

So now there's a lot more disruption and a ton of frustrated people throwing way too much energy into finding what ultimately ends up being a one line mistake.

After a few rounds of that you decide that if it's going to take 6 days, it's going to take 6 days.


Open a bug for the other changes and change only what's needed (unless they're literally in the same line and have no other secondary impacts)


Right - the idea that a QA person would happily jeopardize people's livelihoods over a box-ticking exercise is why that profession gets a bad reputation...


> However, I'll allow the change for now because this is an important update that affects peoples jobs. I want to see every issue fixed before the end of the week and I will not approve any other changes until these issues are resolved.

This goes in the backlog and gets pushed down further and further and further until it's backlog clutter that nobody even notices, until that particular chunk of code no longer exists.


>The process is fine, but sometimes shit is broken and you need to fix it now and clean up the existing mess later.

When did you ever get to this land of milk and honey called "later"? Whenever you have a JFDI because it's broken, the way it's coded always stays...

I've never seen a manager sign off on reworking code that does the job, just to make it "more manageable" or "standard" or "less of a bodge"


> The process is fine, but sometimes shit is broken and you need to fix it now and clean up the existing mess later.

A problem is that most minor changes in the big picture are thought of as being critical things that need to bypass the process in the short term.

It's not just this case that seems to make sense. It's that there's dozens or hundreds of other similar critical patches that need to bypass the process.


What is to keep Shirley from simply filing up to four new tickets? Isn't that better?


Shirley keeps finding problems everywhere, she's not a team player.


Can you explain how finding problems is the opposite of being a team player? How is a team supposed to get things done if there isn't someone to recognize problems?


No I can't. It's just an attitude that I often see from management.


Management is almost never incented to understand defects.


Not only is Shirley not a team player, she's also a hypocrite. I guarantee there is non-standards-based code touching the refactors she demanded, at some level.


Sigh


This smells. I've seen plenty of the first part (fix it quick), but rarely any of the latter (fix it properly later). And the former stacks up rapidly as weeks turn to months turn to years.


>The process is fine, but sometimes shit is broken and you need to fix it now and clean up the existing mess later.

Because "later" might end up actually being in the middle of the night, or later might not ever come. And then the mess might be magnitudes bigger than what you were facing before. Just wait, now. You have the time. Always. Unless your platform/software is continually crashing, your business always has time. Changing code like that without review is begging for you to "not have time" and cause an undue amount of stress for everyone involved.

Be an engineer, not a monkey. We're all only human and mistakes will be made.


I find this attitude on code reviews to be extremely demotivating, especially on code bases where I am not the primary committer or lack overall responsibility for the system being changed. It completely distracts from the code under review, and it only requires a small amount of social/organisational ill health to be abused.

Even if you believe that the next committer should clean up, it can require quite a big context switch - either the clean up explodes into quite a big task and you have to question the trade-offs being made in doing it now, or it doesn't have full focus and becomes risky and/or no real improvement.

I don't happen to believe that the next committer should clean up in most of these situations. Minor things, sure, you should do it (but don't side-track the review with it!). In my experience, people who already couldn't be bothered to do it or clean up later use a new commit as an excuse to get their work done for them. I've often seen people criticising tangential code issues in review who were responsible for the original commit. The question becomes - if it's such a minor issue for me to clean up, why wasn't it a minor issue for you to clean up? If it's bigger, how did it pass review in the first place?

My view is that extraneous cleanup should be proportional to the original change at most - and not discussed in code review of unrelated changes. If a particular area of the code sees sufficient churn to accumulate more and more debt, everybody will know and talk about it, it will be clear that some specific refactoring is necessary, and more importantly it will be evidence that small cleanup is worthless (high churn) and a better architecture is valuable (high churn). A given team may have high tolerance for technical debt and thus never do this work. But then they are screwed anyway, why create noise at every review just hoping that someone else solves it?

Remember, there's also a good chance that the mess stays the same as it always was, sitting there passively. And there's a chance that the cleanup introduces unanticipated problems, especially if it's legacy code with no/less testing.


Are you addressing my statement or someone else's? I don't understand how going through with code reviews for any change is somehow distracting from "code under review."

And surely addressing the code you're touching isn't "extraneous" cleanup. You're modifying the code, you can do it bit by bit.


I think he's addressing this line from the original article, which sparked this whole sub-thread way up-thread:

"Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production."

Basically, it's the idea that if you touch a piece of code, you have to fix everything else wrong with it, even if that involves touching other areas of the code that you weren't looking at initially and may not even have familiarity with.

This is "the perfect is the enemy of the good", applied to code refactoring. If your policies put people on the hook for fixing mistakes that were made by other people a long time ago every time they touch code, there is a very strong incentive to never touch code. You get a bunch of hyperactive software wizards that shit out a working product (with a bunch of crap code), put it on their resume, and then move on to another employer. The people hired after them have a strong incentive to never do anything (because touching code will require that they touch a bunch of other stuff that they don't understand and have no interest in dealing with), and so they browse Reddit and Hacker News all day long. The code remains crappy, but its crappiness can't be blamed on anyone except folks who are no longer with the company.

Actually, this is a pretty accurate description of the software industry.


Cause when you say, "You can fix it later," later never comes.


"The CTO mandated that we reformat the TPS reports by Friday. We'll add the change next week."


Yeah, this is the kind of "everybody's empowered to push back and insist on doing things correctly" environment that people normally praise. Or at least, they praise until they realize that people will push back and insist on quality, and then they decide it's stifling and prevents them from getting anything done.


The problem is when the push back is not standardized. I've seen bad code go through with comments like "we can deal with these things in a later PR or two" but then other code gets flagged for "it fixes one problem but it could fix three other things too" which increases scope. An inconsistent standard is better than no standard but it's just as frustrating.


They're not pushing for quality. They are plainly pushing against it and deliberately risking introducing more bugs.


It's all a balance though, right? And really, about the flexibility of the environment. You have to be able to recognize and deal with a broad spectrum of priorities - this, at least, was not prioritized highly enough across the environment and that meant it wasn't handled gracefully.


My reaction exactly.

And 2 days were lost to escalation while 2 were spent in test. Everything else didn't look crazy to me if the change is small but has a high system impact.

For an industry that is implicated in downed spacecraft, massive security vulnerabilities, and a reputation for unstable products, it's amazing how happy we are to optimize for developer satisfaction over all other metrics (I'm looking at you, Agile).

Edit:

And before anyone says it, I'm not claiming this is a good process. But coding velocity is just one metric by which we should be measuring our success.


>...it's amazing how happy we are to optimize for developer satisfaction over all other metrics (I'm looking at you, Agile).

Developer comfort is not a goal of agile; the goal of agile is always having the project in a known good state.

Any developer comfort is a side effect of the prerequisites for having the project always be in a known good state.


I don't really disagree with you, but it is not unreasonable to come away with the point of view that the parent had. The agile manifesto has caused as much harm as it has good, I think. People use the manifesto as a methodology, rather than using it as a way to reflect on a methodology. In the extreme, some people will use the manifesto as an excuse to avoid not only a methodology, but a method as well! "As long as my intentions are pure, I can do no wrong" kind of thinking.

Having said that, I think to say that "the goal of agile is always having the project in a known good state" is simplifying things a bit too much. A very important part of agile is preferring artefacts that are executable over artefacts that are not.

More relevant to this discussion is the idea that people should be making judgements rather than relying on hard coded rules of process. If someone has made the decision that the risks of doing something is not worth the benefit, then avoiding the problem is not a waste of time. Importantly, though, a process or rule can not make judgements -- only people can. Following a rule blindly certainly can waste a lot of time.

Setting up a team is not easy. If you put yourself in a situation where every single person on your team can and will press the big red "Stop The Presses!" button, then you only have yourself to blame. Every person on a team should have responsibility appropriate for their experience, skill and view of the situation. If you have disputes, then you don't want an arbitrary process determining the winner -- you want a real, thinking, feeling human being to make the judgement call.

I'm rambling on here, but most of the breakdowns I see in teams come from the idea that either employees are stupid and so we should shield ourselves from them with layers of process, OR the idea that my colleagues are stupid and so I should make sure to politically rig decisions so I don't have to talk to them. This is basically guaranteed to result in failure.


Agile isn't about developer comfort, it's about the fact that nobody knows what they want until they see it and decide that they hate everything about it. It's about the fact that requirements are made of mercury. E-signatures on specs aren't worth the e-paper they're written on, and the stakeholders want it yesterday with today's changes to the requirements.

Agile isn't a process so much as it is conceding defeat on the very concept of process.


I like the mercury metaphor. Here's another one for Agile: don't build a stone fortress for a nomadic tribe.

If someone wants to give you a full specification and stick to it, you can build something fully specified. If they give you a vague idea that will change a lot, build them as little as necessary and make it flexible enough to change with the idea. If your code's going along for the trip, it may as well be easy to pack.


We should measure value, that is all - and that's what agile is supposed to aim for, not developer comfort. These processes offer negative value (prohibitive costs of change and an inability to execute in time, which massively outweigh any perceived security). Even a vaguely leaner organisation will eat their lunch.

And that error-proofing is entirely imaginary - they have added a ton of moving parts, handoffs, and manual processes, and so will have way more screw-ups than a small, tight team practising continuous delivery, as anyone who has worked in such environments will know.


"value." I'm getting this BS from a PM right now over a refactor job that needs to happen. Sustainability and tech debt be damned. Every time they want to support a new file type, it's editing several files, most of which are source, recompile, put QA resources on it, get IT to sign the binaries ...

If you'll get off my back, let me refactor and get my "developer comfort" on, you'll never have to involve me or QA or IT in supporting Just One More File Type ever again - you can push that from policy on the server side. But "developer comfort" isn't "value for the customer."

_sigh_
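
For what it's worth, the shape of the refactor is roughly a handler registry keyed by file type, so that adding a type becomes a registration (or server-side policy) change rather than edits across several source files. A minimal sketch with made-up names:

  import os

  HANDLERS = {}  # extension -> handler function

  def register(extension):
      def wrap(fn):
          HANDLERS[extension.lower()] = fn
          return fn
      return wrap

  @register(".csv")
  def handle_csv(path):
      ...  # parse/import the file

  def process(path):
      ext = os.path.splitext(path)[1].lower()
      handler = HANDLERS.get(ext)
      if handler is None:
          raise ValueError("unsupported file type: " + ext)
      return handler(path)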


Why is it the PMs business if you refactor the code or not?

That's a technical decision that they shouldn't have to know about and are not qualified to make any decisions about.

Or put in other words: The PM should be your customer, not your boss.


Indeed. This is the actual problem that needs solving.


Just do it. Seriously. I no longer have a job where I have to put up with this bullshit, but I did at previous jobs, and eventually I just started doing the tasks that needed to be done to make code nicer. I'd spend roughly half my time doing necessary tackling of technical debt. I even got thanked in the end, when new features post-refactor ended up not being the shitshow that they always had been prior. People don't know what they want until they have it. Lead them to the promised land.


Yeah, I've had much more success "leading by example" than by trying to fight against the tide the "correct" way.


> Every time they want to support a new file type

Why can't you say "we never designed the system to support more than X file types, so if we want arbitrary additions we need to take Y time to change the system to support that. Thus far, there have been Q requests for changes, and in my professional opinion we could save time now, and in the future if we make adding arbitrary filetypes to the feature set this sprint".

This places the refactoring as something necessary to accomplish business goals, and save time/money in the future while providing better service to the internal or external clients using the software.


I'll give you a slightly different perspective than the other replies so far. The problem here is not a lack of communication with the product manager. The problem is actually that decisions are being made at an inappropriate level.

As a developer, your goal is not really to write working code as quickly as possible, your goal is to provide the highest possible throughput of working code.

Consider the case of delivering A as fast as possible, B as fast as possible and C as fast as possible. Then contrast that to delivering A,B, and C as fast as possible. These are different problems. Your product manager may or may not understand this point, so ensuring that they do would be my first step.

To do this, I usually use the metaphor of cooking. You can cook scrambled eggs very quickly, and can save time by not putting things away or washing up. You can then make a salad -- again you will save time by not cleaning or washing. Finally you can make soup. But by the time you get to soup, your board is full of unnecessary ingredients, your sink is full of dirty dishes, you have no room on your burner for the pot, etc. Professional chefs clean as they go. Each task takes a little bit longer, but their overall throughput is massively higher (the Ratatouille scene where she says, "Keep your station clean... or I will kill you!", is a good visual to keep in mind).

But this isn't enough. Your product manager, once they realise this, will try to "optimise your experience". They will try to group tasks that they think will make you go faster. They will try to drop functionality which they think will make you go slower, etc, etc. Above all, they will keep asking you, "Can we make this change without having to refactor the code?".

In this way, you are going to start restricting what you can do and you will make your code more rigid because you are always treading around the boundaries and never fixing fundamental problems in the center. This is a sure fire way of making you go much, much slower and making your customers very unhappy (since you never do anything of any substance).

So what you need to do to remedy this situation is to cooperate with your product manager and agree where the appropriate place to make decisions is. For example, your product manager is the only person able to make a good judgement about what the customers actually need/want. It is usually hard for programmers to trust their product managers in this regard (because we tend to think of them as morons who are only pushing a schedule so that they will look good and get promoted). Making sure that your product manager understands that you depend on them for this aspect can go a long way to building a bridge. You have to create the idea that you are a team, working together to fulfil the same goals, rather than adversaries trying to achieve different things.

In exchange, the product manager will have to trust you that technical decisions will have a good outcome. If you tell them, "I'm going to take a detour", they have to understand that's because it will make things go faster -- you can see a roadblock ahead and pushing ahead will kill your throughput. This decision is yours, and yours alone. It is a matter of trust, though, so it will take time for your product manager to build their trust in you. Feel free to have a frank discussion about this and explain that it's difficult for you to trust them too. But when both sides trust the other, then there will be a much better outcome.

Now there is one last thing: emergencies. Occasionally, it doesn't matter whether there is a road block ahead or not. This is not your call. And as much as there are asshole product managers who will pull the "this is an emergency" card every 5 seconds, that's their call.

If you think the emergency card is uncalled for, balance your feeling for fixing the situation against the similar issue where you want to delay production to refactor code. That's going to be your call. Your product manager is going to be thinking, "Is it really necessary????" and your goal is to engender trust. It is a give and take, but your only route to success is building that trust.

In the end, if you find that you just can't trust your product manager (or they just can't trust you), I would talk with your management. If you still can't find an acceptable solution, then looking for a better situation elsewhere is probably your best option. Some people want to work in a broken way. You don't have to though.


"...the product manager will have to trust you that technical decisions will have a good outcome."

This is precisely the problem. PM refuses to believe engineering when we say "You can have feature F on date D" rather than on date D-21 when PM thinks they 'must' have it. ("must" in this case is also not backed up with data, but random desires of potential customers that come up in sales presentations...)


Of course they don't believe you though. If you have any experience you cook those numbers to deal with people like them!

I find fixed-length sprints harm agile. You're tempted to suggest sprint-length goals, not the minimum useful product. This leads to distrust, because everything looks like estimate stuffing, yet you, being honest, intend to fill those days doing cleanup, automation, etc. - all real work. And you resent every effort to trim even a minute.

Usually even the surliest customer can be worked with by simply cutting them in on more of the planning (you do bill for meetings, right?) and letting them decide how far to push a "spike", etc. When they see their demands are heard and turned into work (and billed for!!) they often become much more open to suggestion, such as to stop beating an expensive dead horse.

The goal is to be able to break down a project into functional goals (index the data, query it, script that, schedule that, add a GUI, etc.) After the first step or two you're producing value, on the way to the goal.

One of the hardest problems in development is finding a competent customer for a PO, who can drive the project with requirements and political support. The more tangible you can make the benefits early on, the higher-level stakeholders you'll have access to and the less trouble you'll face getting things done.


As I said, there needs to be a mutual trust here. The "I must have this on date D" is the PM's call, not yours. You need to trust them. If you can't, then it's already game over. The problem is not so much that feature F needs to be done on date D (if, indeed, it turns out to be possible). The problem is when every request has a date attached to it. This is where you need to negotiate. Very probably they think that if they don't attach a deadline to the task, then it will never get done. Management training has historically taught just this. Similarly, they may think that if they don't assign a challenging deadline to a task, then you will sit around reading HN until the last minute and then furiously get the work done. In other words, they don't trust you to get the work done as fast as possible.

It's that trust issue that you need to address, not the date. If you focus on the date, then you are doing exactly the same thing that the PM is doing when they complain that you insist on refactoring code -- you are abandoning trust and trying to make a judgement call from the wrong place.

Unfortunately, in this situation there are probably a few things you need to do to solve the situation. As usual, there are many ways to skin a cat, but I will tell you the way that has worked for me in the past.

First, if you have deadlines on individual features, then you have a problem. If you have features that span large amounts of time (as in more than a day or two), then you have an even bigger problem. From what you are saying, these two things seem likely.

The strategy I would suggest is to temporarily acquiesce to the deadlines. Yes, technical debt, but it will pale in comparison to the debt you will acquire if you don't fix this problem.

Next, split up the features into smaller pieces. Each piece should be (on average) about 1 day of work. So if your feature is due in 10 days, then you should have 10 pieces to build. Do not organise these pieces by design structure. Instead organise them such that every day you accomplish something that you can literally show the PM (it won't necessarily be possible every time, but do your best). Very important: doing this will require a lot of rework because you will have to put temporary design scaffolding in to make something user-visible every day. Do not avoid this!!!!! I repeat: Do not avoid this!!!!! You will have to refactor as you go. This is not bad.

Depending on what kind of thing you are building and what your deployment procedure is, try to deploy not-finished, but still useful bits as often as possible. Every day would be best if you can manage it, but don't go more than 2 weeks without a customer visible deployment (2 weeks is a rule of thumb, but you should probably view it as a hard and fast rule at first and adjust later when you have a feel for what works best). This may require you to split up the feature into sub-features that are complete in and of themselves. This can be challenging from a design perspective.

Your goal here is twofold. First, you are establishing credibility that you are working and delivering every day. Not only that, but if you miss the deadline, the PM has a very good idea of where you are with the feature. As you surmise, they probably don't care at all when the feature is done. They just care that you are working on it flat out. By delivering every day, you establish this fact.

Second, by deploying at least every 2 weeks, you are significantly reducing the pressure that the PM feels from other places. If you have an excellent PM, they will be insulating you from the white hot pressure that other business units are putting on your team. But even the best PM cracks and puts you under the vice.

Corporate attention span varies in different companies, but my 2 week rule of thumb has worked well for me in many different environments. It answers the "What have you done for me recently" question nicely. A stress free PM results in significantly more elbow room for you. Never underestimate this.

Now, it may be that you are already delivering dribs and drabs every day or two (because your PM is already pre-optimising your experience). If so you can probably go to step 2 faster. In this step you start negotiating to remove deadlines. Since you are already delivering every day and you are deploying every 2 weeks, negotiate the amount of work that you are going to deploy in the two weeks, while continuing to deliver every day.

So, if you have 10 reports to do, don't put a deadline on each report. Say that you will deploy all 10 reports in 2 weeks and that you will deliver functionality every day, just as you always have. Also negotiate that prioritisation happens every 2 weeks. So whenever they want something, they will have to wait (on average) one week before it is prioritised into the active development backlog. This will be a hard sell, but offer the addendum, "Of course if there is an emergency, we'll have to make adjustments, and I will leave the definition of 'emergency' up to you." (The general rule is that you remove a like amount of work from the backlog and replace it with the 'emergency'. If you have to stop work in progress, then it is lost and you emphasise that 'emergencies' have a cost).

It won't be smooth going all at once, but over time you should be able to negotiate a way that will work for your group.

Hope this helps! It doesn't always work out, but like I said: you always have the option to vote with your feet if you think that your partners just don't want to dance.


Agile emphasizes value for the customer, but "the customer" has a very broad definition. The end-user is first and foremost, of course, but the operations group who has to keep the service running and the development team who has to maintain it (i.e. you) and so forth are also considered customers. If your PM doesn't understand that, it's a failure on his/her part, not an issue with agile itself.


So...learn to make your case more effectively. PMs are people, too.


Do I really have to explain the backstory here on HN? OK, the case has been made. Repeatedly. By me, my team lead, the research intern who wrote the original code ... Project Management is in agreement. This Product Manager refuses to accept that we need to refactor. Doesn't want to spend the time to do it. "I understand technical debt, but..." "I understand sustainability, but..."

Please allow me to disagree with you that this is somehow my fault (which can apparently be corrected by my "learning to make my case more effectively.")


I'm curious, have you put it into numbers? You can probably even make stuff up (AKA estimate), as long as it has a rational basis. Something like, "Over the next six months, not refactoring will cost us 450 hours of lost developer productivity, based on how much time was spent fighting tech debt over the last two weeks. The refactor will take six weeks and 300 hours of developer time. If we do not refactor, we will have 20% developer turnover within two years."

Edit: It's possible that your PM will then say, "This feature will earn $250k per week. Delaying six weeks will cost $1.5m. Over two years, 1800 hours of wasted developer time is ~$150k, and recruiting 20% new devs will cost $250k. Let's spend $400k in lost productivity to earn $1.5m in additional revenue."


> I'm curious, have you put it into numbers? You can probably even make stuff up (AKA estimate), as long as it has a rational basis.

Engineers and developers usually have a strong aversion to pulling numbers out of the ass, and you cannot give such an estimate any other way (barring the most trivial patches).


I disagree here. Engineers and developers are great at making up numbers. Consider how many times in this thread people refer to "9 times out of 10" or other such statements. (Oddly, I have noticed that most of the numbers engineers make up are of the "90% of the time" variety. As if spouting a loose statistic is fine, but a date?! How could we?)

I think the problem is that we often think giving a point estimate somehow has to be 100% accurate. Or, worse, we estimate only on the lines of code that will be created and forget that it is all of the lines of code discarded in the process that take most of the time.

So, no, refusing to give a date does this no service. If something is going to take a long time, speak to the reasons it will take time. But FFS, do not focus on some intrinsic quality of the existing code base that is meaningless. It is not a customer problem that there is cruft in the codebase that embarrasses you. It is a customer concern that the last X times a feature was added you spent Y hours dealing with high-priority bugs that resulted from compromises.

And if you don't know X and Y, then you are worse than making shit up: you are losing credibility.


> I think the problem is that we often think giving a point estimate somehow has to be 100% accurate.

That's a learned behavior from all the times that estimates become deadlines.


Then the learned behavior should be to remember the reason the estimates were wrong last time, and to include those this time.

And no, I do not think it is that easy. The point is that refusing to speak simply robs the process of valuable input.


Often, in these cases, what technically skilled people fail to do is make the case in sufficiently precise dollars-and-cents terms. That is often partly because they fail to realize how much business types (which Product Managers mostly are, in practice) respond to that kind of presentation, and partly because they are professionally inclined not to let the precision of their forecasts exceed their accuracy, whereas business types are often conditioned to both create and act on SWAGs with precise cost-benefit estimates that, while they often have a post hoc rationalization, are essentially pulled out of the air.

I'm not saying this is the issue in your case, but I've seen it a lot.


So there's no engineering manager empowered to override the Product Manager? That's a serious organizational problem.

The Product Manager and Director of Engineering should be peers, not one reporting to the other.


I don't think the example is about optimizing for developer satisfaction.

1) He changed a number. 2) There was (apparently) a very simple test for the effect of the change being correct. 3) This was high-priority.


I think the fault lies in review procedures that require bringing the rest of the code up to standards if you just touch a tiny part of it. In a shop like this, these rules mean that nobody will ever touch legacy code if they can possibly help it.


This has been the case at both the company I work for in the daytime and the company I moonlight for. The looming nature of legacy code, especially combined with prohibitively work-expensive policies, led to much more than poor programmer effectiveness. It led to lots of extra lead time on deployments and heavily decreased morale, and when we lost our two rockstar 20/30-year developers, turnover exploded.

More often than not there's more to it than just policies and the amount of legacy code, but even when we had the most supportive of managers, morale was very low over it. I definitely think companies, as they age from startup to firmly established company, should take stock of their code assets and policies yearly, make sure their code is kept up to standard, and look at every policy to make sure it's absolutely needed. That's a lot easier than taking a 1-line change to a 9-year-old piece of code and turning it from a half-day full review into a 6-day, haltingly slow process.


> this has been the case in both situations.

Can you please clarify what the case has been? Has it been the case that procedures require bringing all code up to standards if just a tiny bit is touched? Or has it been the case that nobody will ever touch legacy code if they can possibly help it?

Thanks.


It's a good policy in general, but it needs to be flexible.

"Fix this unrelated stuff or no approval" needs to be overridden by "This is critical". "Fix this before you add this new feature" is pretty reasonable.


But why was the policy implemented in the first place? Probably to counteract an earlier development error where managers insisted that every change was priority 1, hot fix now. So the process evolved to be resistant to that.


Exactly. I've worked at places where "I'm the super senior manager, and I don't care about the process! Fix it now!" was acceptable, and at that point you might as well have no process. It inevitably turns into a train wreck.


or worse, work around the process to avoid it


Couldn't changing a bunch of code to conform to new standards actually create far more bugs?


That is a valid criticism, but this is basically institutionalized boy scouting, which is a good practice.


"Leave it better than you found it" is one thing, but "perfect everything you touch or don't touch it at all" is another, vastly more expensive and high-friction thing.


And hard rules like config files vs hard-coded constants reek of fetishization of process and rules over judgement. If something has been modified once in 10 years, with no near-term modifications planned, hard-coding it seems pretty reasonable.


> institutionalized boy scouting, which is a good practice

You brought up the name for it. Okay. Now can we get back to the debate on whether it is a good practice?

These people are debating whether boy-scouting is 100% required every time, or whether there are critical issues that require making it sometimes-optional.


> whether there are critical issues that require making it sometimes-optional.

There are obviously issues where it would be better to skip it, but that's the wrong question. The question that needs to be asked is: can the organization identify, adopt, and properly execute a system which identifies those cases reliably enough that the benefit of the true positives is not outweighed by the costs of the false positives? And, IME, the places that tend to have inflexible policies like this are also places where the answer to that is "hell, no!". Which is demonstrated dramatically (and expensively) every time someone who fails to recognize the organizational capacity constraint gets into the right position of authority and tries to implement that kind of flexibility.


I agree. Requiring things like stakeholder signoffs, code review, good test processes, and respect for team resource management is annoying, but it can also be incredibly important when you're trying to keep a big ship afloat.

Without those steps there's another article waiting to be written: "Rogue developer brings down business for six days with a seemingly innocuous 1-line change"


I've worked in two kinds of companies: Places where it's kind of a hassle to get code past a code review, and places where everyone's always having to put out fires all the time.

Of the two, I've had to unexpectedly work nights, weekends or even vacations at only one kind.

Guess which one I decided I'd rather work at.


I don't get it. The two groups are not mutually exclusive.

It can be annoying to get code past code reviews and yet have fires everywhere all the time.


It's very unlikely. Peer reviews spread information across the team, so at least two team members understand every area of the code, and the project manager can always add more people to an urgent or problematic task quickly.


I have worked in one place where you couldn't get the code to pass without tests AND a code review.

One of the methods I found had a cyclomatic complexity of 200+ and a single test, verifying that the result was not null.

I was told "no, you can't fix that, the NY office wrote it and they'll get angry with us if we change their code and refuse to help".

Things are worse than you think :)


Depends on how much people care. The code review could be half assed and not detect issues.


IME, those were the same place.


Except the end result was the boss said DO IT NOW and it got done now.

This seems to be a pretty clear example of not making the process work for you. Unwritten policies. Having to escalate twice because of silly reasons. Hot fixes being miscategorized as features. Is a very urgent fix really the place to be fixing a bunch of other tech debt?


The cost to the business of an occasional goof is less here than the prohibitively high cost of making any change under these inappropriate and frankly insane processes. (I've held management positions at such companies and easily recognise these kinds of processes; in fact, the post may have triggered some mild PTSD.)


This is one of those "all of software is a tradeoff" kind of thing. Here, the company optimized for safety, with the tradeoff being that it takes much longer to ship. Startups optimize for shipping, and consequently ship broken code way more often.

OP demonstrates the fatal flaw in having process: poorly created process can be crippling. In this case, it may have led to significantly lower profits; if they had a way to fast-track something immediately, they could have avoided delaying a change this simple.

Another flaw is that process, if it's really bad, almost encourages people to circumvent it. At large, slow companies, the devs who push the most code are not necessarily the best and the brightest but rather those who can bend process in their favor.


If we're dealing with a company-wide hard-coded variable that only changes when the company is in danger of layoff, I'm pretty sure that:

(1) Making the variable a configuration variable will do nothing: it's not like you want two different versions of company policies at the same time.

(2) It won't help with testing, because any realistic test of the variable's change will involve changing the company policy and probably the way all systems behave.

In my opinion, making that into a config variable is a mistake. It's perfectly OK to keep as a hard-coded constant.


It is not a mistake. One literal 3 is not distinguishable from another literal 3. If you change it in one place but forget about another, that can lead to inconsistent behaviour. Configuration variables are much less error-prone.
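
For what it's worth, here is a minimal sketch of that hazard and of the fix that does not require a config file at all (the names and values are hypothetical, not taken from the article's codebase):

  // The policy lives in exactly one place; bump it to 4 here and every check agrees.
  const int kMaxBacklogMonths = 3;

  bool CanScheduleWorkOrder(int monthsOut) { return monthsOut <= kMaxBacklogMonths; }
  bool ShouldWarnPlanner(int monthsOut)    { return monthsOut == kMaxBacklogMonths; }

  // With a bare 3 written into both functions instead, changing one occurrence
  // and forgetting the other produces exactly the inconsistent behaviour described above.

Whether the constant then also belongs in a parameters file is the separate question the rest of this thread is arguing about.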


Configuration variables make the code ever-so-slightly longer, and introduce additional failure points (what if the config file is missing or corrupted?) and considerations for cases that will never arise with hard-coded values (should we check that the limit is always positive? what if it is 10000 months - should we do something?).

Yes, in many cases they are an improvement, but not always. Every "best practice" must be judged individually with the question "does it apply here?".
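
As a rough illustration of those extra failure points, here is what even a minimal, hand-rolled parameter lookup tends to grow into (the file format, key name, and bounds below are all invented for the sketch):

  #include <fstream>
  #include <stdexcept>
  #include <string>

  // Reads a single integer parameter from a whitespace-separated "key value" file.
  int LoadBacklogMonths(const std::string& path) {
      std::ifstream in(path);
      if (!in) {
          // Failure point 1: the file may be missing in some environment (e.g. UAT).
          throw std::runtime_error("parameters file not found: " + path);
      }
      std::string key;
      int months = 0;
      while (in >> key) {
          if (key != "backlog_months") continue;
          // Failure point 2: the value may be absent or non-numeric.
          if (!(in >> months)) {
              throw std::runtime_error("backlog_months has no numeric value");
          }
          // Failure point 3: the value may parse but still be nonsense.
          if (months <= 0 || months > 24) {
              throw std::runtime_error("backlog_months out of range");
          }
          return months;
      }
      throw std::runtime_error("backlog_months not set in " + path);
  }

None of that code exists, and none of it can fail, when the value is a compile-time constant.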

Years ago I was working with some very small piece of a web-crawling framework, and I had to do something special for www.naver.com, Korea's biggest portal. Because the string "www.naver.com" appeared multiple times in the same function, we put that into a string constant at the beginning: something like

  static const string kNaverUrl = "www.naver.com";
Because having "magic constant" sprinkled in the code is bad!

...it was only much later that I realized the stupidity. As if Naver, Korea's #1 portal, would change its domain name on a whim! Even in the unlikely case that happens (say, they get acquired), so many other things will also change that my code will surely be unusable after that.

kNaverUrl was an unnecessary detour that only served to make the code longer and (slightly) less obvious, guarding against a theoretical possibility that will never come true.

Code review has this nasty tendency of accumulating "best practices" beyond the breaking point, because nobody objects to best practices. Once in a while, these practices turn out not to be best after all.

Just my two cents.


I don't know how your editor works, but my Vim diligently helps me with identifier completion, so kNaverUrl wins over "www.naver.com" every time, even if it doesn't make the code more flexible.

It also defends against typos, as serge2k pointed out.


It's also easier to refactor code without magic constants: you get compile-time errors from misspelling kNaverUrl instead of runtime bugs because "ww.naver.com" crept into one of 10 places.


You have a point, but on the other hand, it is much easier to accidentally write (kNaverUrl + "index.html") than to write "www.naver.comindex.html". So it just opens the door to a different kind of bug.


Then it becomes a function:

  kNaverUrl('index.html')
Multiple functions:

  kNaverWwwUrl('index.html')
  kNaverApiUrl('resource')
Then it gets refactored:

  urlGenerator('www.naver.com')('index.html')
  urlGenerator('api.naver.com')('resource')
Refactored even more:

  urlFor('naver_www', 'index.html')   
  urlFor('naver_api', 'resource') 
Even more!

  urlGeneratorFactoryFactory(.......


> I had to do something special for www.naver.com, Korea's biggest portal. Because the string "www.naver.com" appeared multiple times in the same function, we put that into a string constant at the beginning: something like
>
>   static const string kNaverUrl = "www.naver.com";
>
> Because having "magic constant" sprinkled in the code is bad!

You have changed your magic string into a magic constant. It's a step towards better code, but the next step is to give the thing a proper name. In this case, something like DoSomethingSpecialUrl. Now you have a bit of self-documenting code.
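
If the parent's suggestion were applied, it would be a one-liner along these lines (the new name is invented here purely for illustration):

  #include <string>

  // The name now records why the URL is special-cased, not merely what it is.
  static const std::string kDoSomethingSpecialUrl = "www.naver.com";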


> As if Naver, Korea's #1 portal, will change its domain name at a whim

Maybe they won't change their domain name, but what if you end up needing to move from http to https? Or what if you want to crawl their staging server at staging.naver.com? Or if you wanted to bypass the DNS?

Avoiding the magic constant achieves more than just guarding against name changes. It also guards against typos: you reduce n potential failure points to 1.


static const string kNaverUrl = "www.naver.com";

That's such a common junior developer mistake. I see it a lot - constants called ONE_SECOND or COMMA.


It was a hard-coded value before. The OP just wanted to leave it as a hard-coded value. The code would have been just as hard-coded after his change as before, and it would have been improved just as little by the change. The policies in place seem to pit the perfect against the good.


My take on this is that modifying the hard-coded constant was not a functional change to the code base, while making it configurable was. You shouldn't make functional changes to old, battle-tested code unless you have to; doing so just adds risk.


Configuration variables are, in my experience, considerably more error prone. If something is in a configuration file, some day, somebody is going to want to configure it, not knowing how it works, what the valid values are, or what it even does. Moving something to a configuration variable means you have to do all the work to sanitize that input and defend against invalid values.

Hearing "Oh, make setting XYZ configurable" or "Stick a flag to turn FooBar'ing on or off" also fills me with worry, as all too often it has been the harbinger of terrible feature creep and piling on of technical debt.


Nobody said it needs to be a magic number. A descriptive constant name will address that. However, a setting in a config file invites tampering, which in turn means more complicated tests to account for that. It might make sense if frequent changes are anticipated.

However, not every hard coded constant should or needs to be in a config file.


My impression was that "hard coded variable" meant a named constant, i.e. something that was not run-time configurable.


So make it a CONST.


The pain for me here isn't that process can take some extra time, it is that the developer doesn't know what is expected and keeps getting surprised at every step. It can soak up a lot of calendar time and developer motivation to have a code change swatted back and forth as different people weigh in with various undocumented guidelines/policies/standards over a period of days or even weeks. This could have been a very different experience if developer had known what was expected from the start.

Ideally I'd like to see code reviews culturally in an organization be mostly about catching bugs and collaboratively learning the craft of coding better. It can be very frustrating and demotivating when it devolves into a series of undocumented style and naming requirements. I think it is good to have standards and process but seems like most places I've worked are woefully short on documenting these hidden requirements.


I think there are certain things here that should go and things that should stay.

Reviewers should not be afraid to reject code, changes should be tested.

However, there are both bureaucracy and code reviewing things that I would change here to not slow things down unnecessarily:

- If the code practices document hasn't been updated, pending future changes shouldn't block your PR

- You should never have any trouble finding someone to review your changes - the whole team should be empowered to review.

- Strictly, only the lines you touch should need to follow any existing code practices, not the whole module because you happened to change one line. Otherwise people will be scared to do small fixes.

- You should only be forced to write new tests for functionality you've actually added, for the same reason. As well as of course fixing tests you break with your code.

- Don't even get me started on Ed not having access to Marge...

- Why does this piece of work need to be done in the first place? Why would a backlog be limited to 3 months?

TL;DR: There should only be as much bureaucracy & strict rules as strictly necessary. Unnecessary bureaucracy is the death of companies, large and small.


>- Why does this piece of work need to be done in the first place? Why would a backlog be limited to 3 months?

If it's a literal factory, they may want to minimize storage costs. You have to rent space to keep the product until the customer expects it delivered. I'm a layman, so take my guess with a grain of salt.


It's nice to reject code, but code reviewers must also be mindful of opportunity costs when wasting developer time on trivial concerns. (Being the cynic that I am, I guess at some point Corporate America will introduce code review reviews to solve this problem.)


> Code review reviews

Good tech managers do this today. I do. My direct reports that give good code reviews are more valuable to me, so I need to know how well they are reviewing code.


Exactly. It's a metamod process.


Amen! For an enterprise system that doesn't have end-to-end automated tests, this sounds like a sane organization running as best it can. The process is optimized to not let incomplete code make it into production. This is a company doing it right given what they have to start with. Changing that ship won't happen overnight.


But will anyone change it again? How many times would they have to change it for this to be better? Maybe the developer should build a UI so the business owners can change it themselves. After all, business people editing config files directly on production machines is dangerous. Clearly the conf file should be in source control and subject to code review. And so on and so forth. Reasonable changes accrete.

The real question is: What is this policy and process protecting against? Does it work?

We're all risk averse. Bad things happen. We come up with processes and systems to prevent them. We write our postmortems and say "How could this have been prevented?" and we say "Heavyweight process X and automated, difficult to maintain, high overhead system Y" and everyone nods their head sagely and makes plans to implement X and Y. Then we make a new mistake and postmortem and build process Z and system W.

But you know that old saying "Lightning never strikes the same place twice"? Well, it's wrong. Sometimes it does. Especially over a long enough timeline. But it's pretty rare. We're the soldiers preparing to fight the last war and losing the next one.

Personally, when I see stuff like this I always think of https://medium.com/@webseanhickey/the-evolution-of-a-softwar...

It's a cartoon. But I'm getting old and angry. 15 years in, I think that most of this stuff that we do, these pedantic code reviews, processes, policies, best practices and test plans we're always beating each other over the head with, they don't always pay for themselves. We still write bad software. We still make stupid human mistakes.

There is no answer but good judgment, trust and respect. You need as much of it as possible. I think it's easier if you believe Clayton Christensen and build protected small teams that can operate in a limited bureaucracy and limit the connectivity between them and groups viewing them as "others". But again, trust, respect, autonomy and good judgment.

Or just make shittons of money. With lots of money, you can afford to hire lots of smart people to do stupid stuff and take forever doing it.


Here's some.

David/Julie: Having processes that allow critical issues to sit idle in a backlog for 2 days.

Shirley: Delaying a code change because of completely unrelated issues. Code changes that introduce bad code should be fixed first. But demanding that the person fix all unrelated issues in the file is not the way to go.

Tony: Not having tools in place for quick, completely automated testing. If a one line change requires 3 days of manual testing, something is wrong.


For established businesses, I wonder if you could come up with a useful metric for how much value you add by rejecting a commit and thus avoiding an operational incident.

Mature, high-profit systems warrant much greater scrutiny on incoming changes than businesses/systems which aren't spinning off money yet. You can err in both ways: too slow where you need to be fast, and too fast where you need to be more careful. And AFAICT companies/organizations build processes, habits, and platforms around one mode and have a hard time switching to the other.


Yes. Here, every second of downtime is over $10, if it affects all users.

Last time we had a partial outage that only affected a piece of the user base for a few hours, we got a first estimate of over 4 million dollars in direct revenue loss, and some percentage of that as guaranteed direct profit loss.


So when stackoverflow or github goes down, and they trace the bug to some legacy component, they should refactor the entire thing, taking a week, instead of increasing the memory on their VM, bypassing the release process, then doing full postmortem once the fire is out?

This isn't dev. Running a factory is ops. It can't take 6 days to renew your domain name when you realise it expires in 2. It can't take 6 days to patch your server when some script kiddie takes it down the first time. Not everyone has a job that gives them the luxury of making sure everything is perfect before they commit to anything.


There are two paths issues can take. One is the regular path, which follows procedures for good reason. The other is the fast path, for things where time is of the essence, e.g. a bug that takes the entire system down so no one can do anything - issues like this must be solved ASAP, and the code mess, edge-edge-case testing, and documentation will have to wait until later (but will be top priority). This change sounded like the latter, and putting it through a 6-day wait to get out something that needed to be changed yesterday is too slow.


The fault is that they're missing the "fuck the process, let's get this done by yesterday morning or we're all gonna die" button.

(Of course, the rule for this button, when you have one, is that you can only press it like once per month or some other arbitrarily long interval. Oh, and maybe having a physical button that actually sets off an alarm in the office could help too. Maximum annoyance to prevent abuse.)


I realized this as I was writing another reply, but it can be faulted for forcing a system upgrade onto an independent issue. Separation of concerns, people! Messing with legacy code is not a trivial matter, and if you are going to do it, make it its own project.


What about

> Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production.

F* that s* indeed.


This sounds like a place programmers go to die.


I'll go further and say that this is what people say they want: time to refactor out tech debt. The headline is a lie - there was tech debt that it was company policy to take time to fix (parameterising some hardcoded variables, amongst other things). "Mission critical LoC" is not the same as "LoC" in their little summary.

People beg for time to fix tech debt, and right here is someone complaining about having to do just that.


As I read it, he's complaining about (1) gating high-priority releases on tech debt (priority inversion); (2) the fetishization of rules over judgement (no hard-coded variables that have been changed once in 10 years with no near-term updates planned); and (3) the systemic lack of urgency.


Yeah, but this is "top of the queue" priority, not "must get done today" priority. The managers were talking about lead times and the potential of laying someone off, not "if this doesn't get fixed today, someone is going home hungry!". There's no mention anywhere in the story of someone actually at risk of being let go due to the delay.

I've also worked support at a startup that was all about the whole "we don't really need tests for trivial changes!" factor. Oh, what fun it was when the devs would push straight to production at 1am, with an "ah, it'll get tested in the morning sometime, maybe"... in the meantime our farmer clients were using the tool well before office hours to plan their days around irrigation requirements. Those changes that devs thought so trivial were usually trivial, but sometimes were quite catastrophic due to unintended interactions.


As per the article, there were both unit tests and a pre/post test run initially provided for this change.


This is why startups kill big companies.

Lots of stupid little processes that add up and make the managers feel powerful, but end up killing the performance of the business.

Brings up the interesting differences between leadership and management. Managing is what gets people into situations like this, whereas leadership (in late-stage companies) is what gets people questioning these processes instead of moving like cattle to the slaughter.

I saw this exact type of behavior at General Electric back in 2011. People wanted to argue about acronyms and process design all day, whilst Samsung ate up 20% market share in 1 year. That business of GE no longer exists and was sold to a foreign buyer. Move fast or die -- this rule does not only apply to startups!


It's easy for startups to be nimble because they have no legacy processes to maintain.

The startup may have 5 guys in a warehouse building their product, while the big company has 100 people on a $10M assembly line.

When the startup wants to scale by 20%, they work longer hours or add another worker. If they didn't plan properly and find that they don't have another workstation for that worker, they buy a table at OfficeDepot and there's their new workstation.

The assembly line might need an entire process flow redesign to add more workstations, plus a physical reconfiguration, software changes, etc.

If a developer just blindly adds a parameter that adds 10% to the workday without going through reviews and tests, he may find out from the physical plant engineers that they no longer have enough time to do preventative maintenance, so they need to schedule an outage to do deferred maintenance.

There's a reason why process exists in large companies, and it's not just for the sake of process.

That small startup might be able to usurp the large corporation, but the large company still has an advantage in throughput and/or efficiency. The corporation can promise Walmart that they'll have 100,000 units delivered by Dec 1st... to meet the same goal, the small startup might have to scale their workflow 100X, and they can't do that by scaling from 5 guys in a warehouse to 500 guys in a warehouse.


I don't know, for every justifiable decision I've seen in corporates there have been ten where it was all about politics, needless bureaucracy, incompetence and power and nothing to do with getting things done.


> whilst Samsung ate up 20% market share in 1 year

Right, a 78-year-old company with hundreds of billions of dollars of revenue is a startup killing big companies.

I'll just note that while Samsung is about half of GE's age, it makes twice as much in revenue.


The comment you are replying to included:

    Move fast or die -- this rule does not only apply to startups!
So I think they are aware that Samsung is not a startup.


Apparently "startup" doesn't mean "company which has just started running" these days, it means "company which is growing rapidly."


I don't know. It seems more like 99% of startups die, and 99% of large companies stick around. Not that I enjoy working for large companies, but just saying...


Large companies stick around because they've solved the one problem all startups have - finding and keeping customers and turning them into revenue through sales and product pipelines.

Code is pretty much a footnote to that process. It's exactly one of very many tools a business can use.

It's true that technical debt can cause business problems. But it doesn't cause nearly as many problems as having an ineffective sales team, getting killed by regulation, or being eaten alive by a competitor.


Right. For every startup that skips process and gets lucky there is a pile that rolled snake eyes and died.

Big companies aren't taking the chance.


Big companies tend not to take that chance even when it's worth taking, though. Not everything is mission-critical, but the procedures around it often pretend it is.


> 99% of startups die

And no one bothers suing into the ground a muppet with two packs of ramen to their name ...


Startups can afford to have their critical systems explode.


Considering how inefficient big companies are it's surprising that they don't get killed in the market immediately. You would expect that there are plenty of nimble newcomers who kick their asses but that's not really happening.


There are advantages to scale and disadvantages. Just looking at the disadvantages makes you wonder how giant companies can exist. Just looking at advantages makes you wonder how tiny companies can ever hope to compete.


In software it's often because they can afford expensive lunches with the people buying their software. The buyers are several steps away from the users of the software.


Some inefficiencies are more equal than others. Brand, the number of people in your network, reputation all add up.


Two words: Regulatory Capture.

See Also: Boeing, IBM, CGI Group, AT&T, NRC, etc.


Honestly, that's way too simplistic a point of view.

The biggest reason large companies that move slowly continue to succeed in their original business is that their original processes and flow date from when they were growing, when they were still learning and more agile in their space. As a result, they still generally provide much more value there than the average startup can match, in at least a few areas specific to each business.

That's why the blue ocean approach often succeeds much more than trying to take on a large company at their own game. It's hard, because it's a huge organization dedicated to being good at that one thing- and nothing more. So the better answer is to sidestep, not to take them on directly.


I don't read that as being the point of the article. Regulation is not even mentioned or alluded to. Also, my understanding of regulatory capture has little to do with regulation slowing down business. If anything, rather the opposite. But that's off-topic anyway, and definitely not appropriate for this week.

Edited to update comment on regulatory capture.


I don't know what you're talking about, it's in there plain as day. The company has a department called IT Security. Not IT. Not Security. IT Security. Companies that legitimately care about security integrate security personnel with all of their operations, into every department. Top-down security departments exist to check boxes on regulatory forms. So I will be shocked if "IT Security" is not a made-to-order department whose existence is solely to satisfy things like PCI, HIPAA, etc.


I understand your point if you're referring to government regulation. I don't think your argument makes sense if you're referring to "regulatory capture":

Regulatory capture is a form of government failure that occurs when a regulatory agency, created to act in the public interest, instead advances the commercial or political concerns of special interest groups that dominate the industry or sector it is charged with regulating.[0]

[0]: https://en.wikipedia.org/wiki/Regulatory_capture

In the case of regulatory capture, you'd think regulation would most likely be reduced rather than impeded.


> In the case of regulatory capture, you'd think regulation would most likely be reduced

No, in regulatory capture you expect regulation to be increased so as to form a barrier to entry to new competitors, while advantaging (comparatively) established incumbents that have the inside track with regulators and the regulatory process.


Makes sense. Thanks for the insight. I believe my main point stands: the term "regulatory capture" is not what the commenter intended, nor is it appropriate in this case (as it's regulation in general that would affect this rather than the tangential effects of regulatory capture).


True, but the greater danger of integrating security into every department is discounting security in favor of the main requirement of your department. Your department is judged on political and performance grounds, and budgeting time away from the main goal of your area towards security is, technically ... waste.

Additionally, you ignore the business logic cross-departmental issues when you do not have a centralized security department.


What? In any large company, you are going to have access controls - not everyone is going to be allowed access to every system. In the example given, the guy was requesting access to a new system he didn't previously have access to; having a security team that is in charge of that access doesn't seem to have anything to do with regulation.


Just sign the paper for PCI or HIPAA and here you go. Fully PCI and HIPAA compliant.

There is really no need for full IT Security PCPAA compliance department.


Reminds me of the bike shed effect: https://en.wikipedia.org/wiki/Law_of_triviality


It's easy to criticize this type of thing as wasteful and inefficient for the one time in a thousand where it really is just a single line code change that doesn't break anything, but the reason things like mandatory code review, required tech debt fixes, and all of the rest end up existing is because developers are notorious for saying, "Oh, it's just a one line change!" and then breaking everything when they slam it through to production without thinking about it.


So... one of the places I work is one of those 'nimble little startups', and it gets more and more bogged down by the day due to some poor technical decisions taken early. There are seven different kinds of timestamp in the database, for example. That's pure dev overhead that gets applied to everything. And because the nimble little startups are pressed to work on new features rather than refactoring, more tech debt just gets added to the pile...

Also, don't forget selection bias - most small companies, startups included, die in the first few years. Only the success stories make the papers.


Related anecdote: I interned for a medical device company over the summer of 2015 and worked on a team that was in charge of maintaining several implantable class 3 medical devices. The devices used a short-range proprietary wireless standard to talk with an intermediary device, which could be connected to via Bluetooth. One product had a pretty serious usability issue where connecting to the intermediary device could take upwards of several minutes. After the issue was discovered it took roughly an afternoon and several lines of code to fix. However, in order to provide software updates for a class 3 medical device, each update needs to be individually approved by the FDA. It took 12 months from the time the fix was implemented to when the FDA allowed it to be pushed out.


What is the approval process? Do the FDA manually check the device (similar to FCC certification for wireless things)?


I never had the opportunity to be involved in the FDA submission process, but from what I understand the FDA does not do any testing of its own. It's the company's responsibility to prove to the FDA that their device is safe and effective. All the testing is done by the device company, carefully documented, and then reviewed by the FDA. I've heard stories that the documentation for a class 3 medical device can be upwards of 20,000 pages, all printed and handed to the FDA in 3-ring binders.


I think the word you want is "anecdote".


thanks


If this had happened at any of the companies I started, I would have fired several people in the aftermath. The problem is that the policy is broken by design. Let's look at what is wrong here:

1.) Priority is not just high, it is critical, and communicating this is lost at each layer (executive, planning, execution, process control, quality control).
2.) Leadership is lax; the chain of command doesn't designate a clear single responsible individual.
3.) Policy enforcement in this example actually increases the risk of an unsatisfactory outcome, by increasing the complexity of the solution vs. what is in production.
4.) Quality control is adversarial and ass-backwards; code review is supposed to be a sanity check, "does this code do what the developer thinks it does", aka "can some other person understand it".
5.) Test planning should not be the developer's responsibility; quite frankly, if QA can't figure out whether it is working or not, you should fire your QA department.
6.) Ultimately, it is a total failure of policy and management, as it requires the President of the company to micromanage the situation.

If you think any of this is fine, I'm sorry but your company is doomed to fail (unless it is already so big it is too big to fail).


You would fire them, instead of trying to build up their skills? I'm not even sure who you would fire in this situation, the IT director for not shepherding the ticket all the way through, or Shirley for closely following company policy? Either would be overreacting.

Slow/inflexible company policy has been built up, presumably under the IT Director's watch. The president didn't communicate that it was extremely urgent (and maybe it wasn't - the article doesn't say when the change actually needed to be made by to make a difference). Ideally, management can recognize that this change took a week because of their accumulated decisions, and lack of urgency. They should figure out exactly how their company policy is slowing them down instead of letting heads roll.


Don't fire. Don't touch anything.

Just remember chapter 12 of Peopleware.

Between two organizations, productivity can vary by a factor of 10. It's not about you; it's a social characteristic of the organisation.

Just leave and find one whose speed suits you.


Specifically regarding your code review and QA comments, code quality is everyone's responsibility, not just management's or QA's. When development gets the idea that only QA is in charge of quality, development tends to write sloppier and sloppier code.

The real reason why this one-line change took so long is that the bureaucracy hasn't been automated away yet. A policy like not allowing magic numbers shouldn't be stuffed in a document; it should be enforced by a linting tool, and that tool ought to be runnable on the developer's workstation. If you have a QA department, QA should be spending its time writing automated tests, not maintaining and running manual tests. The developer ought to have been able to take 5 minutes to check out the codebase, make the quick change, run the tests close to the change, and then submit to CI, which would run through the entire test suite and then continuously deploy to production. QA's approval should be built into the codebase and CI. Even in big companies, the process should not take longer than an afternoon.
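
To make that concrete, here is a rough sketch of the kind of automated check that would let CI vouch for a parameter change instead of days of manual testing (every name and value here is invented, not taken from the article's system):

  #include <cassert>

  // Stand-in for the value that would really come from the Parameters file.
  constexpr int kMaxBacklogMonths = 4;

  // The scheduling rule under test: work orders inside the window are allowed.
  bool CanSchedule(int monthsOut) { return monthsOut <= kMaxBacklogMonths; }

  int main() {
      assert(CanSchedule(4));    // the new policy: four months of backlog is fine
      assert(!CanSchedule(5));   // anything beyond the window is still rejected
      return 0;
  }

Pair something like that with a lint rule that rejects new magic numbers, and the "no hard-coded variables" policy enforces itself on the developer's workstation and in CI rather than in a reviewer's queue.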


"I this had happened at any of the companies I started, I would have fired several people in the aftermath."

And then no one would ever try to block a code review again, and your code quality would go in the toilet. Then something like this would happen again, except with a much less benign change, and your servers would blow up (not literally).

"1.) priority is not just high, it is critical"

Priority is always critical for whatever little changes management wants.


> Priority is always critical for whatever little changes management wants.

Which is why in an actual emergency, management needs to actually communicate with the people involved in the development pipeline, not just hope the normal flow will magically work faster.

This used to happen constantly at my office:

Boss: I made a ticket to [change thing]. It's urgent, and please let [coworker I pass tickets to] know that it's urgent too.

...

Me: Coworker, I just sent you a ticket, [boss] says it's urgent.

Coworker: Everything is urgent. Shrugs and goes back to working from the top of his ticket list, all of which have the same high-priority flag as the one I just sent him.


> priority is not just high, it is critical, communicating this is lost at each layer

This was my exact thought. Why did a "fast tracked" ticket spend two days in a queue? Why did the requests to IT not also get set to urgent priority? (An IT request that blocks an urgent code change should itself be considered urgent.)

Most bizarrely to me, why the hell did they not defer updating the code to meet newer company policies? For the hardcoded variable thing, I would've said "ok, just change the value now, then create a separate, non-urgent ticket to bring that code up to our current standards."

It sounds like their bureaucracy isn't equipped to deal with fast-tracked changes. The worrying part is that management fed a critical issue into that bureaucracy and didn't prod it along until they noticed the delays. They may not fully appreciate how complex and full of roadblocks their process has become.


1, yes. 2, not really. 3, no. 4, not really. 5, yes, but does not affect this case.

If this company runs into this kind of problem often, then something can be done, but if this is a rare occurrence, it's not so bad.

Remember, the dev did not 'waste' 6 days of his time; rather, it took that time for the cogs to move together on it.

And even with 1 ... it's not super clear.


We're actually not sure about (1), as 'kevinstubbs points out.

(2) It does appear that David, the IT Director, was responsible, though he was out of town during part of the process. (What? In 2016 "out of town" means "not reachable"?)

It's on (3) that I totally agree with 'cthulhuology and disagree with you. Changing the value of an integer constant in code is far less risky than what the code reviewer demanded. As others have argued upthread, the original one-line change should have been put through to production as it was, and the additional changes should have followed later via the established process... if indeed they were truly necessary at all. (As someone else pointed out, a variable whose value changes once in 10 years probably doesn't really need to be a configuration parameter, though making it into one probably counts as a slight improvement.)

For (4), okay, here I don't entirely agree with 'cthulhuology. There is a place for enforcing standards; code review isn't just about comprehensibility.


> What? In 2016 "out of town" means "not reachable"?

The text explicitly says David's signature was needed. Unless they have an e-signature setup, or he can locate a fax machine, it doesn't matter how "reachable" he is.

(That's assuming that "out of town" doesn't mean "on vacation", in which case "not reachable" is in fact quite reasonable.)


> (As someone else pointed out, a variable whose value changes once in 10 years probably doesn't really need to be a configuration parameter, though making it into one probably counts as a slight improvement.)

I'd argue that adding a variable that changes once in 10 years is not an improvement. Config files should consist of things that actually need to be configured. Adding crap you'll never touch makes it harder to find the variables that are actually important, and also makes it harder to read and understand all available options.

Introducing configuration variables for the hell of it is analogous to writing every method to be as generic as possible; useful but not worth the risks and expenditure. Just like how we often draw the line and say "this method doesn't need to be any more generic", config variables similarly need an approach more nuanced than "move everything to the config file".


"Changing the value of an integer constant in code is far less risky than what the code reviewer demanded."

This depends on context.

An integer variable can kill humans if it's the wrong variable.

Was it related to safety?

Who knows?

I'm assuming the reviewer knew what they were doing ...

Maybe some common sense all around would have helped here.


It's the president's fault for letting this problem rise to 10% and risk layoffs before acting. Like most leaders, his psychopathy is visible by the wake it causes.


For me the absurd part was this one:

"Philip (President): Our factory is underutilized by 10%. Either we start building more of our backlog or we lay people off. I'd rather keep everyone busy, build inventory, and get ahead of the curve before the busy season. How can we do that?

Lee (Operations Manager): Company policy restricts us from building more than 3 months of backlog. If you just change that to 4 months, we'll have plenty of work."

You can't have one week of 10% less productivity without immediately firing everyone? And your company policies can only be changed by changing a line of code? And this line of code is in some old legacy code that no one touches, yet your whole process depends on it? No, 6 days of development is the least of the problems here. This company is absurd.

6 days for the line of code wasn't optimal, but it wasn't that bad. What this story shows me is how bad management pretends things are IT's fault.


It didn't mention firing anyone. When I see 'layoff' used in the context of manufacturing, it usually means something more like a furlough (i.e. the laid-off employees stay home and don't get paid until the company has enough work for them to do). In this case, it seems they'd rather not do it, but if the software change drags on long enough, they'd have to start layoffs.

As for requiring a code change to implement new policy; maybe not ideal, but it looks like they're trying to move away from that with the requirement that hard coded constants be moved to a parameters file.

And in general, using the company's production software to enforce business policies is probably an important management control. Often when you see rules like that, it's because the company got burned by something in the past and wants to enforce a process that prevents it from happening again. In this case, maybe in the past an operations manager was evaluated based on how busy the factory was, and so just kept building up more and more inventory to game the metrics he/she was being judged by.

I think it was a tad inaccurate of the manager to call it 'legacy' software. It might be an old system, but it appears to be critical to the company's operations, and it seems to be something that is worked on fairly regularly. Even if it's written in Cobol and JCL, I'm not sure it would be fair to call it legacy software unless the company is actively working on a replacement and making only necessary changes to the old system in the interim.

So, while I understand your concerns, I'm not sure it's fair to pin this on bad management. It sounds to me like a company that's probably been in operation for a long time, and has processes in place to make sure nobody pushes any changes to a production-critical system without proper approval and testing. They could possibly streamline the process, but it doesn't seem (necessarily) crazy based on the information we were given.


Ugh. that's sad. I'll be sure to avoid old manufacturing companies when looking for jobs then.


This even isn't the absurd part.

If you increase the backlog time you are building for, then it's a one-off increase, it doesn't increase the quantity of new work coming in, so once they've worked through this one-off extra with their 10% spare capacity, things will just go back to how they were before.


That can be okay if the same 6 days can be used in parallel to change 999 other lines of code, too.

I.e. "1 line in 6 days" is only a measure of latency, and not of throughput.


> a measure of latency, and not of throughput.

You could drive a station wagon full of backup tapes quite far in 6 days :)


It's not about how far, but how fast it is going. The bandwidth of that wagon is about how fast those tapes pass a given point of travel. How they are arranged inside the vehicle is critical. Tapes moving at 100 km/h side by side have more instantaneous or "burst" bandwidth than tapes moving at 100 km/h in a single file.


A risk of allowing too much bureaucracy in commit processes is that engineers try to avoid the slowdown as much as possible. For example, several unrelated changes may bleed together into “super commits” so as to clear bureaucratic hurdles only once. Next thing you know, it’s harder than ever to identify the exact cause of any one issue: you have more changes in each commit, fewer comments than ever, and a high probability that no thorough review was really conducted.

Large blobs make the review process completely fall apart. In the past I dug up too many cases of people allowing 50-file steamrollers into a repository when the entire “review” was literally two words (“looks good”). While this probably happened due to time constraints, it makes the entire thing pointless. With a simpler commit process, engineers might submit smaller change requests at a time and people asked to review “7 files” might do a thorough job instead of balking at requests to review “50 files”.


One of the things that happens as a result of anal code reviews is that at some point you learn to commit as little as you can get away with (and this includes things like comments and tests).


Avoiding comments or diagnostics/logging, because these two will be picked on every single time.

"Don't have it debug log 'Foo'd 5 bars', make it 'Foodulated five bar things'"


In Case of Emergency Break Glass

There really needs to be a procedure for violating procedure, so that when higher-ups say "do this immediately", those same higher-ups can own the consequences of violating procedure, and so that they know the business implications of the rollback plan, etc.

If the bug is "people are getting laid off" or "our customer database is being siphoned by an SQL injection", it probably should not wait.


The process sounds fine, no need to violate it. The problem was that the IT Admin said "I'll put this on the fast track" but didn't, and no one else saw the urgency. It was just dumped at the back of the queue behind a lot of other things and mistakenly labeled an 'Enhancement'.

Proper handling would have had a dev working on it within the hour. The response to the code review would have been "other changes are out of scope, open another ticket for enhancements". If the developer didn't already have access to the test environment, at least IT would know that they're authorized for it. (Critical fixes shouldn't be assigned to a new hire who doesn't know the code base and doesn't have access to commit code yet.) The ticket would be edited (Test Case: view WorkOrdersHours report, Expected Result: ~10% higher total) and tester would be told to reread it and go ahead. And of course documenting the test runs is the tester's responsibility (they're the one doing the test runs), not the developer's, as is notifying the IT Admin and Operations Manager when the testing is done and it's ready for sign off.

Exact same process, same quality checking along the way, timeframe reduced from 6 days to 4 hours or less; likely less than 1 if the team works together well and their tools are decent. A good process can handle emergencies without sacrificing quality, as long as the people work together and the priority is communicated.


I once waited 6 weeks for a backend team to fix a prematurely closed div tag that was screwing up the layout of a news site. The templates they were given were good; they screwed up the implementation. In desperation I had to hack the layout with JS (which I was able to update at will).


That kills me to hear that. I'm in charge of Web where I am and we make 95 percent of our changes instantly. There is no reason to make people wait; it honestly takes longer to project manage most changes than to just make them, so we just make them, but of course Web sites/front end is a different beast than application or other code.


I don't get this: whatever happened to a hot fix in minutes, and then a proper fix 6 days later? Have we forgotten how to do this? Like some other reasonable comments here, I can't fault the process in the story, but I just can't see why they didn't allow a hot fix.

Then it would be a different title: it takes 5 minutes to approve the change to 1 line of code and roll it into production, and then 6 days to get an official patch done in accordance with proper IT policy.


What makes you think they have CI and CD? It's entirely possible that just building and shipping is a full-day procedure.


I'm surprised no one has raised the problem of senior managers and VPs underestimating and oversimplifying the complexity and LOE of work. It causes unrealistic timelines for others who do have to adhere to a release process (which is never perfect but certainly necessary). "Yes CEO, of course we can get that in by tomorrow, no problem."

When the work misses the impossible deadline, the feedback usually points to an underperforming organization and rarely to unrealistic estimation.


Agree with some other comments here:

Good: Code review? Great. Boy Scout rule enforced in legacy code? Awesome. The QE analysis seems great!

Bad: Failure to expedite? Fail. And without that there's no story.

Also at a high level, the fact that it is only 1 line of code is irrelevant. It's not closely correlated to the riskiness of the change.


Favorite line:

"Ed: Fuck that shit."

"Shirley: That may very well be true."



I work at a consulting company. We have extremely aggressive deadlines and work with less-than-capable people on both sides of the table.

This means that sometimes it takes a typical programmer only a few minutes to change a line of code and promote it to UAT.

Forget the number of bugs that happen after this. Yeah, the number is VERY high, but that's subjective. Think about the overall state of the program. When all that matters is velocity, people take all kinds of shortcuts. OO goes out the window. Hard-coding? What about things that should've been parameters and are not only hard-coded, but spread across many files?

TL;DR: The OP's case seems contrived, or at least the other extreme. Processes should be followed. If the factory stopped because of that single line, or any other, someone would ask: all this to save 6 man-days?


This has been discussed on Hacker News before (https://news.ycombinator.com/item?id=3989136). The cited article was published in 2012 at a different domain that is no longer alive. Thanks to the Wayback Machine, we have copies. Here's one: https://web.archive.org/web/20120623032243/http://edweissman...


Change control needs to be structured around risk, i.e., how much money you're losing per second during an outage. Also, not all changes carry the same risk. Your change control process should be malleable to these realities. If a change is low risk and the reward is high, it shouldn't need the same controls as a high-risk change. Also, there are more ways to mitigate risk than adding steps to the release process.

TL;DR: Absolutists make poor release engineers.


This is why it's important to have a "hotfix" process where you can make a change fast and then still follow up with all the QA and coding standards.


The bad thing about bureaucracy and slow processes is that eventually you'll find yourself in a situation where you either 1) start working on something else while waiting for some process step to complete and your changes to get merged, and then gradually forget to play your turns in the process, delaying it even further, or 2) waste days sitting on the important change, pushing it forward, staying active and responsive, and basically avoiding any other non-trivial task because otherwise #1 happens.

So you need to decide whether you want to get real work done or to shepherd through the one task for which the coding took minutes or hours but the process eats up days.


To change the frame of reference a bit, how would this story work out at Google?

I just have an outsider perspective on this, but I really think that monorepos and the automation that comes with them will have an amplifying effect in the years to come.


Requiring all your constants to be configurable seems like a great way to bypass (potentially critical) code review and QA. I can totally see changing something like "number of months" causing extremely non-obvious issues (e.g. integer overflow in something 20 stack levels away). Once you make it a configurable constant, though, changing it won't be QA'd at all.
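
To make that concrete, here's a minimal, hypothetical sketch (Java; the parameter name, units, and numbers are my own invention, not from the article) of that failure mode: the reviewed value is fine, but an unreviewed edit to the parameter file silently overflows arithmetic far from where the value is read.

    import java.util.Properties;

    public class RetentionWindow {
        // Somewhere many stack frames away from where the config is read,
        // a utility converts the retention window to milliseconds as a 32-bit int.
        static int retentionMillis(int days) {
            // 24 * 3600 * 1000 = 86,400,000 ms per day; anything above 24 days
            // silently overflows Integer.MAX_VALUE and goes negative.
            return days * 24 * 3600 * 1000;
        }

        public static void main(String[] args) {
            Properties config = new Properties();
            // Originally shipped (and QA'd) with "7". Later someone edits the
            // parameter file to "30" -- no code review, no regression run.
            config.setProperty("retention.days", "30");

            int days = Integer.parseInt(config.getProperty("retention.days"));
            System.out.println(retentionMillis(days)); // prints a negative number
        }
    }

A value of 7 days works; the unreviewed bump to 30 wraps a 32-bit intermediate negative, which is exactly the "20 stack levels away" kind of breakage that a config-only change will never put in front of QA.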


If technical debt accumulates, you slow down until you need a lot of time just to understand the consequences of changing code.

Pretty much like the end game in Jenga.

One good approach is SOLID: https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)


It's hilarious to read, but part of what leads up to this is compliance, audit trails, and security: investors and customers want to know that some dev isn't jumping into vim every 5 minutes and saving new code straight to production, taking things down with a syntax error.


The only difference between this and a large modern tech company is that the latter has this process automated. If anything, large tech companies have more respect for the need for process, as they understand how critical it is for everything to run smoothly.


It seems like buried in all the other policies is one meta-policy: nothing gets done until someone with sufficient authority overrides all the rules.

How often does the president have to do this? Has anything ever been done the correct way?


The reality is that in a large org, process can usually be sidestepped with an emergency change request and an executive sponsor. Which is basically what happened here; it just took extra days because it wasn't started that way.


Are there really factories whose policies and parameters are encoded in software vs. under the guidance of humans?

That in itself is amazing to me, considering that at the tech company I work at, lots of things are just done by fiat.


Since this was a critical issue, I think "David" should've brought key stakeholders (or leads) together and set expectations for who was doing which parts and when it would be complete.


There's a saying: "if it hurts, do it more often" (https://www.martinfowler.com/bliki/FrequencyReducesDifficult...)

These guys should do all their PRs as one-line (or few-line) changes, put them through the system, and improve the system until they can get things deployed within an hour (or some reasonable time period).


They should also limit the number of lines that are allowed to be changed/added/removed every 6 days... and maybe it should be a law for popular products.


Why did they hard-code a variable in the first place? Hundreds of comments about technical debt when it's just shitty engineering.


This sounds like bureaucratic nonsense waiting for signatures and approvals. Maybe the task flow needs to improve.


These processes are beyond fixing. If you're absorbed into a corporate monolith that values 'best practice', you'll just have to put up with it. However, this really would not happen at any shop that is concerned with saving money.


I don't think I would eat 24 Excedrin over something like this.


This sounds like perfect is the enemy of good.


That's a dream process right there, right?

The solution, IMO, is to tell the higher-ups to fast-track it if people's jobs are at stake.


This is true beauty


Become a $$$/hour contractor and learn to love the bureaucracy.

Then wait for those all-too-often moments where a manager's ass is on the line if feature X doesn't ship by tomorrow and watch all the rules and regulations get chucked wholesale out the window. Then chuckle.


Exactly. The mantra I was taught when I first began consulting, for when faced with frustrations like these, was (adapted from the old-school Hawaii Five-O): "Bill 'em, Danno!" I've been at it for 20+ years and it truly works.


Surely there could be some optimizations, but there are no gross failures here.

At every step there were reasonable policies, it seems.

It's not wise to frame this issue as '1 line of code' because '1 line of code' could ruin a company.

It's frustrating ... but it's not that bad.


Well, I've seen a 5-line patch take 6 months (and counting), so...


I've had 6 months of work thrown out because management changed its mind over which approach to take. So 0 lines in 6 months.


Yeah, it's sad. I was thinking, '6 days? That's pretty quick.'


TRWTF is that it is considered a problem that it took 6 days. Sounds like there was no harm done and 6 days was OK.


[2015]



