Part of my job is to help the executives that I report to understand why things went wrong from the security perspective in our business unit. These are purely internal discussions, not even investigations. There are no penalties, though really there should be for things as egregious as hard-coded passwords. As will become clear in a moment, the fact that my executives care is quite unusual.
Culturally the result is coverups and lies.
Engineers lie, managers lie, test people lie, directors lie, senior directors lie, vice presidents lie, external investigation teams are negotiated into minimizing certain critical failures, and so on. Managers don't want to hear it so that they can't be accused of lying, vice presidents don't want to know, and SVPs just want green squares on the cross-BU PowerPoint.
These are internal discussions of revenue-impacting incidents. Do you know what executives do care about? Revenue. Lost deals. If the people who care about money, including the account teams, don't care about security and severe quality issues enough to be honest enough to drive improvement, how could an external board accomplish anything for the very few incidents that actually become publicly visible?
This isn't like the NTSB; I've spent my life reading NTSB accident reports. The NTSB has actual, real authority, and there are potential consequences that matter to people more than the risk of being caught distorting things.
As an engineer I found if I didn't lie and pretend to support the false reality of the managers above me, I wouldn't have that job long.
Once we had a very large effort: creating a new hardware and software platform for the company's products. The CEO demanded it be done in 9 months; our estimate was accurate almost to the day -- 18 months. Politically it was easier for the PMs to agree to 9 months and then take the 9 one-month delays that followed, instead of getting fired for insisting on a good-faith estimate.
What do you think the security in that product was like?
Security directly conflicts with the business model of cutting every corner possible and loading engineers up as much as possible. A real security program would have to start with reasonable hours and goals.
I got curious about what legal authority the NTSB has. Here is what I found:
>like a cop, the NTSB can secure an accident scene and keep others away. It can examine the aircraft wreckage. It can have aircraft parts tested to help determine why the accident happened. It can even subpoena evidence. . . . If there is a lawsuit concerning the crash, the NTSB will not get involved. Not only is the NTSB’s report of the accident’s probable cause inadmissible, but the NTSB investigators are prohibited by law from testifying in court, even if they are served with a subpoena.
>The NTSB may issue a subpoena, enforceable in Federal District Court . . . For purposes of the Health Insurance Portability and Accountability Act of 1996 (HIPAA), Public Law 104-191, and the regulations promulgated by the Department of Health and Human Services, 45 CFR 164.501 et seq., the NTSB is a “public health authority” to which protected health information may be disclosed by a HIPAA “covered entity” without the prior written authorization of the subject of the records.
I care about the issues (as an engineer), but my experience with raising any security issues is that it results in a lot of pain for me personally.
You report something to the 'security' team, and suddenly you're responsible for doing all the work, as well as being the prime suspect in an investigation: 'why didn't you fix this before, and since when did we know about this?!'
I'm absolutely incentivized to just let any issue lie until it is discovered, because then it actually is the security team's problem.
To summarize, engineers mostly don't push security issues, not because they don't care (on the contrary), but because doing so is the fastest way to sabotage yourself in the workplace and make enemies.
Two of my own examples
1)
A coworker had the task of updating some aged signal generators, because we had reported a bug which had since been fixed. The signal generators had no internal memory; a USB stick was used as their storage. When he plugged said USB stick into the work laptop, which he had never touched before, a virus warning popped up.
We decided to do the right thing, tell our IT department about it, and ask for further steps.
What followed were ridiculous, hostile accusations, threats and pressure from the IT people, instead of a sensible reaction.
2)
We had a safety instruction session at work, and it was pointed out that everyone should follow the safety rules and remind colleagues of them, to spread awareness.
A few days later I entered through a security door and two unknown colleagues followed right behind me. When I noticed, I politely asked them to show their access credentials.
I was met with hostility and disgust. Later I learned those two were higher-ranking managers, and any time I met one of them afterwards, they remembered and treated me badly, for doing exactly what had been asked.
I even went so far as to tell my supervisor about both instances to "raise awareness" of the reality, but only got shrugs and a blank face.
I was expelled from a university for reporting security flaws. The solution we all seek is the simplest one, and for administrators it's easiest to hurt the people making noise, which commonly results in the noise going away. Fight-or-flight response at its finest.
Edit: a reminder that for the "common folk" these "security issues" are not a 5-minute fix; they are a fundamentally different reality, one which requires every machine on the network to be re-checked before it can be used again. There is a clear communication failure between the ones who want security and the ones who want "security".
A high school friend (2 decades ago) told a teacher that the system keeping track of student grades was vulnerable to attack. The teacher asked the student to demonstrate by attacking the system and adjusting one of his test scores down by 1 point. My friend obliged, and the teacher reported the vulnerability to the administration. An administrator threatened my friend with expulsion, but when he proposed to go public with his story in response, they decided they wouldn't expel him. The resolution was "please don't tell anyone", and the vulnerability was never fixed.
It's common. A friend almost got expelled for reporting a flaw in the university's ID card system. That friend did not break anything; they did not sneak into any protected spaces. They just discovered and validated a flaw and then reported it.
Once in a past life I discovered that the asset tracking system I was maintaining had no actual security on any of the data it returned. Once you'd logged in as one user you could just change the URL to access whatever user's data you wanted.
I raised the issue with my manager, we had a meeting about security, I explained that I wasn't a security expert but that this was a big problem and we needed to do a full security audit. I guesstimated 3 months to go through the codebase looking for security holes (in hindsight, a huge underestimate) and they said it was too expensive. Talking to fellow ex-employees the issue was still there 10+ years later.
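For readers who haven't run into this class of bug, it is usually called an insecure direct object reference: the server trusts the identifier in the URL and never checks that the record belongs to the logged-in user. A minimal sketch of the missing check, with entirely hypothetical names:

    # Hypothetical data store; in the real system this would be a database.
    ASSETS = {
        101: {"owner": "alice", "serial": "SN-0001"},
        102: {"owner": "bob",   "serial": "SN-0002"},
    }

    def get_asset_vulnerable(asset_id):
        # No ownership check: any logged-in user can read any asset by changing the id.
        return ASSETS[asset_id]

    def get_asset_checked(current_user, asset_id):
        # The fix: verify the record actually belongs to the requesting user.
        asset = ASSETS.get(asset_id)
        if asset is None or asset["owner"] != current_user:
            raise PermissionError("not your asset")
        return asset

    if __name__ == "__main__":
        print(get_asset_vulnerable(102))      # alice can read bob's record
        try:
            get_asset_checked("alice", 102)   # rejected once the check exists
        except PermissionError as exc:
            print("denied:", exc)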
Not to scare you, but I know a power plant operator who has that issue on its plant operations system. It's pretty amazing.
I know another company where anyone in the company, anyone at all with an account in the Active Directory SSO, at any privilege level, even a janitor, can sign any code they want with the release keys.
I think you are overestimating the importance of "revenue impacting incidents" to company employees.
If the company makes a couple of million extra or less this year, it doesn't affect the majority of workers. Their bonus isn't going up or down, etc. And remember, this incident has already happened.
By contrast if a report comes out blaming the loss on a worker, department or division then that could have major consequences. No matter how "blameless" it is, come next round of bonuses, promotions or layoffs everybody knows it'll be factored into the decisions.
So people don't have an incentive to make themselves look bad, and unlike with the NTSB there are no legal powers or fear of causing deaths behind the investigation.
I understand, but when we do that it sounds like we are digging ourselves into the same hole as USSR workers who were not incentivized to deliver working products. It's a civilizational peril. How do we solve cooperation at large scale? Is the only way to watch large companies accumulate bored employees and constantly recreate "the small guy", the startup, which will finally make things right, until they become too big to be incentivized?
That comparison is apt. A big corporate entity is a mini command economy, and fails in similar ways.
There is an inflection point where the focus of the organization shifts from mission focus to control focus. Usually it’s when growth slows. A fast-mover industry like tech is a great example, but it happens almost everywhere.
That's exactly how it works, and why big companies do not grow until they consume the entire world. They become complacent and unable to change, and a startup eventually takes their business away.
It's funny that people expect management to function better because it's employee-owners of a capitalist corporation, instead of feudal lords or members of a communist central committee.
> Is the only way to watch large companies accumulate bored employees and constantly recreate “the small guy”, the startup, which will finally make things right, until they become too big to be incentivized?
Yes, and you have just illustrated why the USSR struggled economically in comparison to the United States.
I didn't say "employees" so much as "executives"; and the executives I'm referring to go beyond owning P&L. They actually do care about revenue, which is why everyone lies to them.
> Do you know what executives do care about? Revenue. Lost deals.
This is often why engineering needs aren't covered. Things are presented as risks and expense, instead of in terms of revenue. A $10,000 expense actually wipes out $10,000 in net profit, so you need to generate revenue sufficient to create $10,000 in net profit. Most companies have really rosy gross margins, but really tight net margins, so a $100K expense will take $2.8 Million in revenue to offset it. Finally, there is how risk frames what you present. If you come at me with "this might happen" the other side of the coin is "this might not happen", and most managers will avoid the certainty of expense for the possibility of an expense.
If you are working with a CEO, valuation is where it's at. Try to understand the swing in company valuation based on profit. Present to the CEO like this: "X is highly likely to happen within Y months, and it will impact $Z in net profits, leading to a change in company valuation of up to $N." Make sure $N is enough to matter. You just took 98% of the arguments against taking action off the table.
I'm pretty sure I'm not wrong. Source: I own three companies. The $100k and $2.8M were just examples (very thin net profit). It's equally likely you could have $100k in expense and need $200-$400K in sales to offset it if you are in a very high margin business. Regardless, what matters is that the unexpected expense will reduce profit, every time, and by framing risk in terms of profit you will help your CEO make better decisions.
You’re not “framing the risk in terms of profit.” You’re talking about a set of equations in your head, that only you know, that come up with weird results like “it takes way more revenue to ‘offset’ an expense” because you’re holding “net” or “gross” or whatever margin constant. Which is really confusing and surprising and not how anyone I’ve ever met talks about this stuff, and they’d be equally puzzled and I don’t think would experience an illuminating or aha moment or whatever.
Personally I found it a little interesting, I get what you are going for, but it’s so strange and not really useful or true.
I think you are looking for deep insight where the point being made was shallow. If I make $3000 in profit at a 30% margin, I have to have $10000 in sales. It follows that if I have an unexpected expense of $300, it will require $1000 in sales to generate enough profit to cover the expense.
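A minimal sketch of that back-of-the-envelope arithmetic, assuming the net margin stays fixed (only the numbers already used in the examples above):

    # Revenue needed so that (revenue * net_margin) covers an unexpected expense.
    # Assumes the net margin stays fixed, which is the simplification being debated here.
    def revenue_to_offset(expense, net_margin):
        return expense / net_margin

    print(revenue_to_offset(300, 0.30))       # 1000.0  -> $1,000 in sales at a 30% margin
    print(revenue_to_offset(100_000, 0.036))  # ~2.78M  -> the earlier $100K / $2.8M example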
I think that one way or another, most people actually try to evaluate things this way.
If they knew with 100% certainty that some breach would happen, they would probably invest the time and money to fix the situation. Just as if the probability for the breach to happen was known to be 0%, they would be justified in not fixing it.
You frame this as CEOs being pragmatic, and I agree. But then, the other side of the coin is that if you're regularly wrong (there's a high risk of a breach happening — then nothing happens) they'll probably stop listening to you after a few times.
Also don't forget the reverse-Tinkerbell effect: if you do things right and nothing bad happens for a while, they may be tempted to conclude that perhaps you're being overly cautious and that doing things a little bit faster/cheaper would be OK anyway.
I also wonder if "blameless postmortem" culture perhaps actively works against preventing these kinds of incidents. It doesn't seem that anyone in IT is ever held responsible for damage they cause.
But yes, lying, "not seeing" and covering up documentation is pretty much standard corporate behaviour I've seen around plenty of companies as well.
I no longer believe in blameless post mortem as a general rule. I have, through experience, come to believe that the contexts where blameless post mortems work are the contexts where literally anything works because they are organizations that have high hiring bars and high expectations. My current employer is not one of them; we are a mountain of mediocrity and all blameless post mortems do is act as an excuse to avoid raising the bar.
The principle of blameless postmortems is not supposed to absolve anyone of the responsibility to change anything, it’s supposed to foreground that serious failures are organizational failures first and foremost, because it’s the organization that has an obligation not to fail, not individuals, who fail all the time as a rule.
I spend a bunch of time reading accident reports from agencies like RAIB and MAIB, but my real jobs have been closer to the Web PKI and thus m.d.s.policy
Back in 2015, Symantec's CA issued some certificates that shouldn't have existed, including for names owned by Google. What's wrong there? Well, a blameless postmortem would probably tell you that your processes and procedures are bad, you are creating bogus certificates to "test" a real CA whose certificates are actually trusted in the real world. Need better processes, training, oversight to ensure things improve. What did Symantec do when they were caught? They fired the low-level employees who conducted the tests and wrote a blog post "A Tough Day as Leaders" which blamed the fired employees for getting it wrong. Some leadership (the blog post of course no longer exists although I assume it's archived somewhere).
Less than two years later, Symantec was back in trouble again because an RA they had worked with had been issuing certificates that should not exist, using Symantec's CA infrastructure (and thus, from our point of view, Symantec was issuing these certificates, even if they unaccountably believed this wasn't their fault). This time Symantec's bosses blamed not only low-level employees, but also auditors, bosses at the Korean RA, and anybody else they could think of... except themselves.
This is a gross failure of leadership. Once upon a time a US President said "The buck stops here", but Donald Trump was very clear, "The buck stops with everybody" and "I take no responsibility" and it seems Symantec's leadership were made in that image. They quit the CA business rather than do what it would take to fix the problem.
If you conduct a "postmortem" after an incident, then "Nobody was to blame and nothing needs to change" is almost certainly just as much the wrong outcome as "It's Jane's fault, fire her".
I mentioned I read MAIB reports. One MAIB report sticks out, after many years, in the following way:
Unlike every other MAIB report I've read, this one has no recommendations at all. Someone died, and yet there is nothing to recommend. Why not? Well, the cause is very simple: two men on a fishing boat took a lot of heroin, their boat crashed, it sank, and one of them died. No need to recommend that you shouldn't take heroin while operating a fishing boat, since heroin is an illegal drug already and operating a vessel under the influence of drugs or alcohol is already a crime too.
If your next "blameless postmortem" doesn't have any recommendations, ask yourself: was what happened already a crime? Are the people involved dead or in prison, and so either way beyond the value of recommending a different course of action next time? No? Then we need to recommend how to actually avoid it happening again.
> They fired the low-level employees who conducted the tests and wrote a blog post "A Tough Day as Leaders" which blamed the fired employees for getting it wrong. Some leadership (the blog post of course no longer exists although I assume it's archived somewhere).
> If your next "blameless postmortem" doesn't have any recommendations, ask yourself, was what happened [in IT services] already a crime?
Imagine the amount of training needed for a government worker to be able to decide that: to be familiar with the law AND not have a political/financial lean. The US government has already decided they're not going to train their population with relevant, globally marketable skills at competitive ages. They've turned education into a private/public dating program to preserve social class norms.
The US government is still too ill-equipped, corrupt, and disinterested in regional/global IT regulation. They seem more interested in weaponizing the IT realm to remain relevant across the globe. It seems to me like they're consistently deluded in thinking they can maintain systemic supremacy given movements in India, China, Russia, and Europe.
> and all blameless post mortems do is act as an excuse to avoid raising the bar
"Well, there's your problem, right there."
The entire point of doing blameless post-mortems is to correctly identify problems for resolution. If management doesn't drive changes in response (process, training, communication, whatever), you have a different problem to solve before they'll do any good.
I imagine blameless postmortems sometimes happen because people want to avoid blame.
So they argue that all the cool places do "blameless". We should try that too!
And then the organization doesn't understand the actual concept behind the idea (nor did the suggestor want that). Instead the organization learns "we have decided not to blame anyone". And then everyone involved is satisfied.
A post-mortem should not necessarily blame the individual, but blame the circumstances the individual finds themselves in.
Yes, a hard-coded password is bad practice. But does the company have a bad culture of keeping configs in repos? Maybe management thinks it's easier to commit configs with sensitive data than to set up proper deployment shit. And after all, the repos are private, so it should be fine, yeah?
Bad code ending up in production is something you'll see often. Does the company have nice test suites for everything? Continuous integration pipelines? E2E tests? Or is upper management pushing everyone to their limits, because "fuck it ship it"?
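On the hard-coded-credentials point, the usual low-friction alternative is to inject secrets at deploy time and read them from the environment (or a secret store), so the repository only ever contains a reference, never the value. A minimal sketch, with a hypothetical variable name:

    import os

    # Hypothetical example: refuse to start if the secret was not injected at
    # deploy time, instead of falling back to a value committed to the repo.
    DB_PASSWORD = os.environ.get("APP_DB_PASSWORD")
    if DB_PASSWORD is None:
        raise RuntimeError("APP_DB_PASSWORD is not set; refusing to start")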
In my negative experiences, "blameless" turned into "nobody did anything wrong", which, of course, undermines the whole point of finding out what actually happened so we can see if there is a thing we can do to reduce the likelihood of it happening again.
Sometimes, the root cause is indeed someone with the privilege but not the good sense ignoring warning signs. If we can't identify that problem, then we can't improve our odds for the next time.
That kind of result may mean you need a Molly guard. The original Molly was a toddler who reportedly pushed the Big Red Switch on an IBM 4341 twice in one day and so they put a cover over the switch to put an end to that sort of outage. Occasionally people need firing but even in that sort of circumstance there's an organization problem that needs addressing.
A valid blameless answer then is "remove the privilege" and yes, despite whatever objections you'll raise, this is possible. Difficult, but possible.
Like, even in the case of the extreme example of someone deciding to intentionally harm the company. You fire them, but then what, how do you prevent the next person to go rogue from causing equivalent harm?
> They accrete so many of these rules and restrictions that people can't do their work.
I hear this from software engineers, from time to time. What do you mean they aren't allowed to SSH into prod anymore? How will they debug, update, or maintain anything?
Sometimes this is your standard-issue hostile reaction to change. The old approach is what they are used to, and they don't understand the need to change it (and when they "ask to understand", it's mostly so they can try to negotiate). This new world just seems to get in the way for no clear reason. Management neither appreciates nor understands the reluctance and just pushes on.
Usually what needs to happen is a series of changes across the org. You roll out the change with references to policy to support it. Workflows get updated and reworked so that SSHing into prod is not, in fact, the way to update systems or view logs or whatever.
Most of all, educational materials are provided. Often I find that people object to changes because they don't know another way to work. If all you've ever known is SSHing into prod to read logs, you've probably never heard of Kibana or used OpenTelemetry.
Public policy is just waking up to this, and private administration is way too unprofessional as a group to formalize the problem (thus some people understand it, most don't), but rule creation requires a cost-benefit analysis.
Any new rule you create is a chance to analyze the entire set, and maybe redesign it.
Or people start ignoring the rules in order to do their work. Which has its own problems (particularly as rules normally aren't divided into "important" and "unimportant").
I refuse to ignore company rules, and I also comment on gross negligence, which more often than not means I am the quarreler, not the good engineer, in the eyes of coworkers and bosses.
Let's not conflate security and safety here. The boards hailed by the article are all about investigating safety failures, and so is the advocated investigation board. Security is a different beast. It's sub-par in many airplanes/power plants/... too.
That's very disappointing.
I've always brought everyone up to think that of all the professions, you can always trust an engineer. They deal in facts. They don't lie.
Perhaps it's as simple as this: in the real world, engineers can't lie, because the world wouldn't work if they did.
In Washington State, the state superior court ruled that the police department was not liable for the impound fee paid by somebody who had their car impounded for 90 days for driving on what the computer reported was a suspended license, because the department is exempt from mistakes that come from trusting its own computer system.
This was the second time the department had wrongfully impounded his car, and they had made no attempt to fix the mistake from the first time; this didn't impact the ruling.
It's gonna get much worse before it gets any better.
It will get much worse, I think. More and more companies are hiding behind algorithms and other computer systems while cutting support staff. If you are wronged, you have nobody to talk to and they make no effort to correct the situation. The only recourse is a lawsuit, which is way too expensive for most people. And even when they are caught, the fines are usually only nominal.
I think we are building up the ultimate faceless bureaucracies.
You're certainly right that lots of very large companies go to every conceivable length to keep you from ever finding a human being, even for an online chat, forget a live phone conversation. Their "contact us" page is nothing but an FAQ.
Is legislation or regulation the answer? That would be unfortunate. The government rarely makes things better, but to be honest, those accident investigation boards probably DO prevent management from sweeping things under the rug.
Some kind of consumer ratings for Quality of Service will have to spring up. You can now find ratings of airlines for their on-time record and likelihood of losing your baggage. We need something similar for web companies.
I would assume (taking the lazy way of not checking) that this is an issue of qualified immunity for law enforcement officers. Qualified immunity is a judge-created doctrine that allows police and other government officials to avoid any consequences for clearly bad acts. Any intelligent 10-year-old could see that it's unjust for a citizen to pay for the cop's mistake in this situation. Qualified immunity says, "if there is no specifically on-point existing legal ruling that states the behavior violates the constitution, then the government official is not responsible."
Everyone else is expected to use their brains and be responsible for the consequences of their actions. Government officials are not. This is not an issue of computer failures being treated differently than other high-consequence failures. This is an issue of government failure getting a pass.
Qualified immunity is a corrupt and depraved element of the US legal system.
No. Qualified immunity protects individuals from liability for the state's mistakes.
It's similar to how you are not liable for hurting someone while you are doing your job properly, and why you can't sue a Comcast customer service rep for stealing your money in a billing error.
Sovereign immunity (the principle that the state rules over all so no one can force the state to do anything) protects the state from liability from its own mistakes. You can only sue the state if the state decides to allow it.
That doesn't seem unreasonable in terms of the police department not being liable. What are they supposed to do? Maintain a parallel system so they can check everything the computer produces?
It seems like the vendor of the faulty system should be liable.
No. It's the police department computer. They should be liable for the harm caused by it. Maybe the police can recover damages from the software vendor, but that should be separate.
Because the citizen has no visibility into, nor control over, the relevant decisions: which software vendor to use; what the processes are for dealing with errors and issues; what terms are negotiated for licensed source access (for the police) versus managed closed source, where all changes, issues, and fixes have to go through long-form scoping processes with the vendor, delaying fixes; what the policies are for using and trusting these results; and what policies there are for reporting and acting on inaccuracies, as well as for informing the entire department/state about them.
The police department does, so they should be liable for those decisions.
Like, the moment there was a case of somebody being incorrectly listed as having a suspended license, the entire department should have been informed to not trust the results until the underlying issue was found and fixed.
In Washington State we have a system to track cannabis, and the enforcement officers are supposed to be able to get reports from this system. The system is super buggy and also doesn't have meaningful reports, so there is a secondary system for officers to export to Excel documents. In one of the trainings they've been instructed to look for anomalies -- not real analysis, not even a pivot table. One thing they find is "negative quantities" -- but how can that be? (Hint: it's bugs in the tracking software.) Then enforcement shows up at the cannabis business to audit these negative numbers (or demand the business try to correct the data, which they cannot do due to bugs).
So, crappy software gets law enforcement officers to basically review data "anomalies" created by bugs by visiting a business. The second most expensive method of data sanitization I can imagine. It's a poor use of their time and disruptive to the business.
The system in WA is so buggy that the agency has opted to freeze the software rather than try to fix the issues. The future of government software is bleak -- so long as they keep using closed source packages from low-cost bidders.
Why isn’t all software created for the government required to be open-source? Would that really drive the costs up, if the providers don’t have the choice?
The vendor claimed that if the code was out it would be a security risk. The agency claims the vendor needs to protect their intellectual property rights. We have (some) visibility into other things our taxes pay for -- the software should absolutely be one of them -- especially the regulatory compliance systems that drive enforcement action.
Edit: also, they were breached anyway shortly after launch (2018), and then an email went around offering to sell the code and data from their entire system.
> $Millions, and in some cases $billions, in tax money pour into projects that almost invariably run late, over budget, fail to deliver
In many, perhaps most, such cases, projects running late and over budget are performing exactly as intended by their sponsors.
All too often, the nominal purpose for a project is just cover for a totally legal conduit from the public purse to politically-selected private pockets. Thus, in the US, we get the F-35, the SLS, the California bullet train. Sometimes we get something out the other end, many years late. (The delay is to keep the gravy train running longer.) Sometimes, nothing.
In New York, we did end up with a 2nd Avenue Subway extension. There actually are F-35 planes at air bases, some of which can actually fly. They are stacking an SLS in Florida as I write this. They probably will launch at least the one, maybe a second, at $2B each.
In the Post Office case, in my view, the problem wasn't that the IT system was faulty; it was that Post Office management continued to prosecute long after (more than a decade after) any competent person would have known it was faulty. These people should be prosecuted.
Software engineering hasn’t had our Quebec Bridge collapse yet.
It will take an enormous visible disaster that does more than cost a company a few tens of millions, or exposes a few hundred million social security numbers before we start holding companies and engineers liable for security flaws.
The only way to create the software engineering reform you envision is to force a paradigm shift, and the only thing that could instigate one would be something like Skynet, with many dead. Though the unknown damage from the OPM breach could be quite severe.
There might be "security standards", but neither the government nor the private sector can guarantee anything to be safe. The civilization built on IT networks will need to be reworked.
How many were killed as a consequence of the Equifax hack?
Don't get me wrong, that hack was absolutely massive and terrible, but I think you missed the point of what the parent comment was saying.
Their point was that it must be something real and visceral, on a level that will make people think "something like this must never happen again at any cost". As far as I know, there weren't massive killings of people due to the Equifax hack, so that isn't quite on the same level.
We've flown planes into the ground, killed people with X-rays, rigged elections, and shut down oil delivery to sections of the east coast of the US.
If anything we have bridge collapses constantly and special news anchors that report on which routes you should take today as if it were just like any other traffic incident.
The industry fails to listen to the lessons written in "The Mythical Man-Month" almost 50 years ago -- half a century ago. Of course some reports on why systems are being designed and coded poorly won't change anything. We know why; we just ignored the knowledge to the point of absurdity.
Companies could be held liable for gross misconduct. Although GDPR is not exactly a shining example of IT regulation, I think it's a good example of liability.
Companies get fined for breaking GDPR.
Governmental projects should have similar requirements in place, and companies and people should be held accountable for breaking them.
According to that website, one of Google's 10 fines is the highest fine ever, at €50 million. Hilariously, in a dystopian kind of way, they were fined €28 last year. Not as in millions, but twenty-eight euros.
Facebook only appears on the list once, with a paltry €51,000.
But in any case, the point of GDPR is not to bankrupt companies.
> But in any case, the point of GDPR is not to bankrupt companies.
It _should_ be.
Well, okay, that's a bit hyperbolic, but what I mean by that is that the point should be to _change_ behavior, and if the fines necessary to do that, or the resulting change in business model, lead to bankruptcy, then that should be totally fine. Companies who cannot operate legally shouldn't continue operating.
Which would be a substantial (or even excessive) fine for a typical individual person or small business. For a multi-billion-dollar corporation, it's an operating expense.
> the point of GDPR is not to bankrupt companies.
The point of fines is to be too large to be operating expenses.
I've stopped caring about GDPR given its lax enforcement and the recent power grabs by the European Parliament. They had their chance to prove they were going to 'clean' their space and give people trust in their actions. If they truly believed in their model, I think they would have tried a bit harder. In my opinion they have already failed to gain the trust of their citizens.
I hate to say it but I think the Chinese model with 'nationalization' of the internet and IT services has already won and we're in denial. Regional governments, "federations" are likely to continue to reduce global trade through regional nationalization and IT programs.
The EU had decided to expand their mandate into the tech sphere with regulation already proven not to work as intended/stated.
The EU mandate should not include policing of private speech through tangential monitoring services. Some wealthy European countries themselves have rather restrictive laws and approaches to private thought, and through recent laws will have access to this information. The EU (a non-democratic European Commission with little legislative transparency) will likely continue to grasp for more power as their competition of thought becomes more mimicry of foreign models of power than actual ingenuity.
> Personal information is the helium of IT systems—it leaks out of every crack or imperfection faster than seems possible.
Might as well call it the hydrogen of IT systems—get too much of it concentrated in one place, and all it takes is one little spark for it all to go up in flames. Boom!
I agree with nearly everything in this article, but the following question stumped me: when exactly would a software disaster investigation board be employed?
A plane goes down, a train goes off the rails or passes a signal at danger: easy. But at exactly what point did the UK postmaster system "fail" enough for an investigation?
> But at exactly what point did the UK postmaster system "fail" enough for an investigation?
The moment it came to light that postmasters were being improperly convicted? The moment it came to light that some improperly convicted postmasters committed suicide?
This is all pretty new in the world. It took many years for aircraft investigations to get really professional at it.
How about we start with a simple algorithm like this:
1. Check if software is involved in the incident.
2. If it is, carry out an internal review to evaluate the possibility that the software may be a cause of the incident.
3. If it is, hand over to an independent review board.
4. Involve lawyers if, and only if, the independent review board recommends it.
That was an off-the-cuff attempt at a suggestion. There is probably a much better way. Feel free to suggest alternatives.
I have no objection to lawyers getting involved. I just object to them suppressing the facts to protect their clients and preventing an independent investigation. It's impossible to cover up that an aircraft has crashed. Lawyers involved in an aircraft accident on behalf of the manufacturer (or operator) are basically doing 'damage limitation'. Lawyers involved in a software incident on behalf of the manufacturer (or operator) are basically doing 'evidence suppression'. They should stick to 'damage limitation'.
Large amounts of money spent on government systems that never ship is a tragedy, but software projects like these tend to have a lot of open questions.
We understand software development often as a discovery process (evolving requirements), especially if they are large or disruptive. So one critical output of any such project has to be knowledge that can be built upon, as in open, clearly specified and written papers. This should be done regardless of whether it failed or didn't fail.
I can tell you now that nobody, including the current engineers, can work with the requirement documents that are being generated over the course of our multi-year system transformation project.
Of course, it’s still an improvement over the legacy system which has none.
> Compared to the 100 million euros Denmark spent on a new IT system for the police, a project that never delivered anything?
Governments need "sunken-cost fallacy" triggers that automatically halt projects when they pass thresholds, even when many worry about the effect of the halting. It's like a scene from a movie - "halt my project when I go over 100M EUR, even if I'm begging you not to".
The problem is that many initial estimates will be low-ball offers to try to win a contract. Perhaps combine the sunken-cost limit with delayed payments; otherwise companies might low-ball their offers knowing they can never finish, but not care because they'll be paid anyway.
Better oversight is clearly needed too. There won't be one tool that'll do the job on its own.
The funny thing is, many projects (and I should have omitted government-run, any organization can suffer from this problem) have reasonable cost estimates when they begin. Why not start with those? Perhaps this rule would incentivize more realistic estimates?
At issue is that software continues to approach an opaque “ball of mud” as size and complexity increase. Silver bullets of the day like “microservices”, event-driven design, and functional programming do nothing to improve that.
The prevalent analysis method of use-case driven procedural transactional scripts resulting in controllers or services [1] (typically just transforming a database) is problematic. Yet developers are taught or can use nothing else from computing’s rich history currently.
It would seem that any attempt to improve how we model “external” [2] complex systems in code has ceased in favor of pursuing the low hanging fruit of technical detail.
Why are we surprised that any software outside of computing and the data sciences, together with computing infrastructure, are challenged?
At issue is that outsiders have no idea of this industry incompetence, and incorrectly rely on software as being accurate, complete and foolproof.
——-
[1] Martin Fowler’s “controller-entity style” - P of EAA
[2] Where expertise in the complex problem domain lies outside of the development team.
I'm baffled that nobody thought there might be an issue with the system when it suddenly turned out that 700 postmasters were making funds disappear. That's way too high a number to be accidental.
Like most things, it's a matter of scale. If a train derails, we call in the NTSB, but they don't investigate car crashes.
The issue that I see, is that the software industry seems to be absolutely obsessed with scale. Small applications are actively sneered at. Go big, or go home.
So that means that every accident is a train wreck.
It's known what went wrong, computerphile has a video with some details: https://www.youtube.com/watch?v=hBJm9ZYqL10 but it doesn't address any of the judicial and cultural fails, that's what needs to be fixed.
Software bugs are a fact of life, people know this, except the judges in this case apparently.
Bugs are a fact of life because of sloppy practices. The experience from SQLite is instructive: after a test suite had been written, matters improved immensely.
Why was the test suite written? Because it was in the list of requirements from the client: aerospace standards demand that every possible branch is covered by a test.
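For anyone who hasn't worked to that standard: branch coverage means every decision in the code is exercised in both directions, not just every line executed once. A minimal, hypothetical sketch of what that implies for the tests (tools such as coverage.py can report which branches were missed):

    # Hypothetical example: a function with two decisions, so four branch outcomes.
    def clamp(value, low, high):
        if value < low:      # branch taken / not taken
            return low
        if value > high:     # branch taken / not taken
            return high
        return value

    # A single "happy path" test would leave half the branches unexercised;
    # full branch coverage needs inputs below, above, and inside the range.
    def test_clamp_branches():
        assert clamp(-5, 0, 10) == 0
        assert clamp(50, 0, 10) == 10
        assert clamp(7, 0, 10) == 7

    if __name__ == "__main__":
        test_clamp_branches()
        print("all branches exercised")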
One could argue that faults in engineering and construction are also a fact of life, yet that doesn't mean we excuse them, and it doesn't mean we assume that a failure is due to those faults. Investigations are performed in order to ascertain the truth.
I think the author's comparison to the historical development of trains is appropriate. Investigating IT failures wasn't as important 50 years ago because IT infrastructure was not as critical. Investigating IT failures today is critical because the functioning of society depends upon it.
I don't think you (or the sibling comment) got my point. Nowhere am I "excusing" bugs. Civil engineering bugs are indeed a fact of life too; just check the god damn news.
What I'm saying, is that I think where PHK says "nobody sat down and documented precisely what went wrong", I think he is just wrong.
The guy in the Computerphile video, Steven Murdoch, wrote this article:
"There is a window of time between a user printing and cutting-off a report. If another user was to perform a transaction during that window, that transaction may not show on the report."
The problem that needs investigating here is the miscarriage of justice, apparently the post office was aware of the bugs at the same time as prosecuting people for falling victim to the bugs, in order to save face about their IT decision.
A blog post and a judgement focused on the legal issues is nothing, not even close, not even in the same ballpark, as a thorough report from an investigatory board.
What PHK is calling for is a couple of orders of magnitude greater in detail and depth.
Moreover he specifically denounces any practice of treating technical inquiries as judicial processes. The incentives differ, making that a short path to bad outcomes as people run for cover. Dispassionately recorded findings that inform the whole profession may be useful, and restorative justice has already occurred; punitive justice will be best served as a separate and ultimate proceeding.
You're both right - inopinatus and ldarby. It was clearly a miscarriage of justice - hence the lawyers and court case and resignation of the Post Office CEO. But it absolutely needed to be investigated by a dedicated, professional Software Incident Investigation Board.
The miscarriage of justice is the truly awful, shameful part of this incident and it all came about because of the 'less than honest' stance of the post office leadership. To be fair to them, they were probably advised to act in this manner by their lawyers.
The difference between an aircraft/train/boat/car/oil rig accident and a software incident is that we immediately know that an aircraft/train/boat/car/oil rig accident has happened. Professional investigators are usually on the scene long before the lawyers have had their first cup of coffee. With a software incident, lawyers have wrapped their fingers around everything long before we even know that something has happened. For the incident investigators to do their job properly, it would probably be best to keep all lawyers out of the picture until the investigation is complete. But the chances of that happening are slim.
I also agree with PHK about the expense issue. There are so many examples of huge expense to discover the cause of an aircraft accident. Just think of MH370 or AF447 (military submarines were used to try to find both wrecks). Also oil rigs: how about Deepwater Horizon? Huge amounts of money were spent to figure out exactly what happened. Similar sorts of money should be made available to discover the causes of software incidents. It's far too easy for companies to hide a software incident and then hide behind lawyers while blaming others.
Bugs should be reviewed and dealt with by a professional, independent investigation board, not by lawyers.
Replying to myself to clarify a couple of things further. I'm not disagreeing with the idea of an IT accident review board, and I agree in general that we should focus on eliminating bugs. Maybe what PHK means is having even more detail on the bugs, i.e. the specific coding mistakes, the issues with internal processes at Fujitsu that led to the bugs being released, etc., and a database containing this and other companies' mistakes that others can learn from.
I just don't think the bugs here are especially interesting compared to bugs in other software of similar complexity. In any other situation it goes like this:
customer: hey vendor, this software has bugs
vendor: whoops, sorry, here's a fix
or even this:
customer: hey vendor, this software has bugs, and they're so bad we're suing you (which may have been appropriate in this case if the bugs caused financial loss)
not this:
customer: hey vendor, this software has bugs, and we've sent our employees to prison because we don't want to admit the bugs exist.
vendor: wtf?
That's a cultural / judicial problem and I just don't believe the reaction should be to focus on preventing other companies creating similar bugs.
I looked, and a fair proportion of your comments on HN are very similar complaints—arguably more irritating and content-free than what you complain against.
I complained about an opaque headline once, and dang told me about how making the reader work a little is part of the HN philosophy. Pretty sensible when you learn about it:
I agree with the article, though this doesn't quite sit right with me:
> And no, it is not "self-incrimination" unless you did something criminal.
People (in the US) are allowed to remain silent to prevent self-incrimination. While arresting someone, why don't we just say "talking right now can't be self-incriminating if you did nothing criminal?" These are in the same family as "why do you need privacy if you have nothing to hide?" and I just don't think it was well thought out.
> In 2017 the motor of an airplane exploded over the southern part of the Greenland icecap. Part of the engine landed on the ice while the plane continued to the first suitable airport way up north in Canada.
eh, Happy Valley-Goose Bay isn't that far north as far as Canada goes. 53 degrees north.
The actual drop locations in Greenland were around 61 degrees N.
Nuuk would have been ~60% closer, but not a chance it could handle an A380.
So I wonder, in this situation what changes so that the accident is averted? It's fine to demand accountability, but what specifically is being monitored? Because I don't think there's a real answer yet. They can say IT needs to do a better job. Obviously. But in what dimension? There's a lot of "you screwed up!" but very little "here's the fix".
This is very sensible. One challenge is to encourage cooperation. One interesting aspect of strategy is to prevent findings from being used in court. For example, findings of probable cause by the US NTSB cannot, by law, be admitted as evidence in a trial. That helps to make it more about preventing recurrence rather than finding culprits.