Part of my job is to help the executives that I report to understand why things went wrong from the security perspective in our business unit. These are purely internal discussions, not even investigations. There are no penalties, though really there should be for things as egregious as hard-coded passwords. As will become clear in a moment, the fact that my executives care is quite unusual.
Culturally the result is coverups and lies.
Engineers lie, managers lie, test people lie, directors lie, senior directors lie, vice presidents lie, external investigation teams are negotiated into minimizing certain critical failures, and so on. Managers don't want to hear it so that they can't be accused of lying, vice presidents don't want to know, and SVPs just want green squares on the cross-BU PowerPoint.
These are internal discussions of revenue-impacting incidents. Do you know what executives do care about? Revenue. Lost deals. If the people who care about money, including the account teams, don't care about security and severe quality issues enough to be honest enough to drive improvement, how could an external board accomplish anything for the very few incidents that actually become publicly visible?
This isn't like the NTSB; I've spent my life reading NTSB accident reports. The NTSB has actual, real authority, and there are potential consequences that matter to people more than the risk of being caught distorting things.
As an engineer I found if I didn't lie and pretend to support the false reality of the managers above me, I wouldn't have that job long.
Once we had a very large effort: creating a new hardware and software platform for the company's products. The CEO demanded it be done in 9 months; our estimate was accurate almost to the day -- 18 months. Politically it was easier for the PMs to agree to 9 months and then take the 9 one-month delays that followed, instead of getting fired for insisting on a good-faith estimate.
What do you think the security in that product was like?
Security directly conflicts with the business model of cutting every corner possible and loading engineers up as much as possible. A real security program would have to start with reasonable hours and goals.
I got curious about what legal authority the NTSB has. Here is what I found:
>like a cop, the NTSB can secure an accident scene and keep others away. It can examine the aircraft wreckage. It can have aircraft parts tested to help determine why the accident happened. It can even subpoena evidence. . . . If there is a lawsuit concerning the crash, the NTSB will not get involved. Not only is the NTSB’s report of the accident’s probable cause inadmissible, but the NTSB investigators are prohibited by law from testifying in court, even if they are served with a subpoena.
>The NTSB may issue a subpoena, enforceable in Federal District Court . . . For purposes of the Health Insurance Portability and Accountability Act of 1996 (HIPAA), Public Law 104-191, and the regulations promulgated by the Department of Health and Human Services, 45 CFR 164.501 et seq., the NTSB is a “public health authority” to which protected health information may be disclosed by a HIPAA “covered entity” without the prior written authorization of the subject of the records.
I care about the issues (as an engineer), but my experience with raising any security issues is that it results in a lot of pain for me personally.
You report something to the 'security' team, and suddenly you're responsible for doing all the work, as well as being the prime suspect in an investigation: 'why didn't you fix this before, and since when did we know about this?!'
I'm absolutely incentivized to just let any issue lie until it is discovered, because then it actually is the security team's problem.
To summarize, engineers mostly don't push security issues, not because they don't care (on the contrary), but because doing so is the fastest way to sabotage yourself in the workplace and make enemies.
Two of my own examples
1)
A coworker had the task of updating some aged signal generators, because we had reported a bug which had since been fixed. The signal generators had no internal memory; a USB stick was used as their storage. When he plugged said USB stick into the work laptop, which he had never touched before, a virus warning popped up.
We decided to do the right thing, tell our IT department about it, and ask for further steps.
What followed were ridiculous, hostile accusations, threats and pressure from the IT people, instead of a sensible reaction.
2)
We had a safety instruction session at work, and it was pointed out that everyone should follow the safety rules and remind colleagues of them, to spread awareness.
A few days later I entered through a security door and two unknown colleagues followed right behind me. When I noticed, I politely asked them to show their access credentials.
I was met with hostility and disgust. Later I learned those two were higher-ranking managers, and any time I met one of them afterwards, they remembered and treated me badly, for doing exactly what had been asked.
I even went so far as to tell my supervisor about both instances to "raise awareness" of the reality, but only got shrugs and a blank face.
I was expelled from a university for reporting security flaws. The solution we all seek is the simplest one, and for administrators it's easiest to hurt the people making noise, which commonly results in the noise going away. Fight-or-flight response at its finest.
Edit: a reminder that for the "common folk" these "security issues" are not a 5-minute fix; they are a fundamentally different reality, one which requires every machine on the network to be re-checked before it can be used again. There is a clear communication failure between the ones who want security and the ones who want "security".
A high school friend (2 decades ago) told a teacher that the system keeping track of student grades was vulnerable to attack. The teacher asked the student to demonstrate by attacking the system and adjusting one of his test scores down by 1 point. My friend obliged, and the teacher reported the vulnerability to the administration. An administrator threatened my friend with expulsion, but when he proposed to go public with his story in response, they decided they wouldn't expel him. The resolution was "please don't tell anyone", and the vulnerability was never fixed.
It's common. A friend almost got expelled for reporting a flaw in the university's ID card system. That friend did not break anything; they did not sneak into any protected spaces. They just discovered and validated a flaw and then reported it.
Once in a past life I discovered that the asset tracking system I was maintaining had no actual security on any of the data it returned. Once you'd logged in as one user you could just change the URL to access whatever user's data you wanted.
I raised the issue with my manager, we had a meeting about security, I explained that I wasn't a security expert but that this was a big problem and we needed to do a full security audit. I guesstimated 3 months to go through the codebase looking for security holes (in hindsight, a huge underestimate) and they said it was too expensive. Talking to fellow ex-employees the issue was still there 10+ years later.
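For readers who haven't run into this class of bug, it is usually called an insecure direct object reference: the server trusts the identifier in the URL and never checks that the record belongs to the logged-in user. A minimal sketch of the missing check, with entirely hypothetical names:

    # Hypothetical data store; in the real system this would be a database.
    ASSETS = {
        101: {"owner": "alice", "serial": "SN-0001"},
        102: {"owner": "bob",   "serial": "SN-0002"},
    }

    def get_asset_vulnerable(asset_id):
        # No ownership check: any logged-in user can read any asset by changing the id.
        return ASSETS[asset_id]

    def get_asset_checked(current_user, asset_id):
        # The fix: verify the record actually belongs to the requesting user.
        asset = ASSETS.get(asset_id)
        if asset is None or asset["owner"] != current_user:
            raise PermissionError("not your asset")
        return asset

    if __name__ == "__main__":
        print(get_asset_vulnerable(102))      # alice can read bob's record
        try:
            get_asset_checked("alice", 102)   # rejected once the check exists
        except PermissionError as exc:
            print("denied:", exc)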
Not to scare you, but I know a power plant operator who has that issue on its plant operations system. It's pretty amazing.
I know another company where anyone in the company, anyone at all with an account in the Active Directory SSO, at any privilege level, even a janitor, can sign any code they want with the release keys.
I think you are overestimating the importance of "revenue impacting incidents" to company employees.
If the company makes a couple of million extra or less this year, it doesn't affect the majority of workers. Their bonus isn't going up or down, etc. And remember, this incident has already happened.
By contrast if a report comes out blaming the loss on a worker, department or division then that could have major consequences. No matter how "blameless" it is, come next round of bonuses, promotions or layoffs everybody knows it'll be factored into the decisions.
So people don't have an incentive to make themselves look bad, and unlike with the NTSB there are no legal powers or fear of causing deaths behind the investigation.
I understand, but when we do that it sounds like we are digging ourselves into the same hole as USSR workers who were not incentivized to deliver working products. It's a civilizational peril. How do we solve cooperation at large scale? Is the only way to watch large companies accumulate bored employees and constantly recreate "the small guy", the startup, which will finally make things right, until they become too big to be incentivized?
That comparison is apt. A big corporate entity is a mini command economy, and fails in similar ways.
There is an inflection point where the focus of the organization shifts from mission focus to control focus. Usually it’s when growth slows. A fast-mover industry like tech is a great example, but it happens almost everywhere.
That's exactly how it works, and why big companies do not grow until they consume the entire world. They become complacent and unable to change, and a startup eventually takes their business away.
It's funny that people expect management to function better because it's employee-owners of a capitalist corporation, instead of feudal lords or members of a communist central committee.
> Is the only way to watch large companies accumulate bored employees and constantly recreate “the small guy”, the startup, which will finally make things right, until they become too big to be incentivized?
Yes, and you have just illustrated why the USSR struggled economically in comparison to the United States.
I didn't say "employees" so much as "executives"; and the executives I'm referring to go beyond owning P&L. They actually do care about revenue, which is why everyone lies to them.
> Do you know what executives do care about? Revenue. Lost deals.
This is often why engineering needs aren't covered. Things are presented as risks and expense, instead of in terms of revenue. A $10,000 expense actually wipes out $10,000 in net profit, so you need to generate revenue sufficient to create $10,000 in net profit. Most companies have really rosy gross margins, but really tight net margins, so a $100K expense will take $2.8 Million in revenue to offset it. Finally, there is how risk frames what you present. If you come at me with "this might happen" the other side of the coin is "this might not happen", and most managers will avoid the certainty of expense for the possibility of an expense.
If you are working with a CEO, valuation is where it's at. Try to understand the swing in company valuation based on profit. Present to the CEO like this: "X is highly likely to happen within Y months, and it will impact $Z in net profits, leading to a change in company valuation of up to $N." Make sure $N is enough to matter. You just took 98% of the arguments against taking action off the table.
I'm pretty sure I'm not wrong. Source: I own three companies. The $100k and $2.8M were just examples (very thin net profit). It's equally likely you could have $100k in expense and need $200-$400K in sales to offset it if you are in a very high margin business. Regardless, what matters is that the unexpected expense will reduce profit, every time, and by framing risk in terms of profit you will help your CEO make better decisions.
You’re not “framing the risk in terms of profit.” You’re talking about a set of equations in your head, that only you know, that come up with weird results like “it takes way more revenue to ‘offset’ an expense” because you’re holding “net” or “gross” or whatever margin constant. Which is really confusing and surprising and not how anyone I’ve ever met talks about this stuff, and they’d be equally puzzled and I don’t think would experience an illuminating or aha moment or whatever.
Personally I found it a little interesting, I get what you are going for, but it’s so strange and not really useful or true.
I think you are looking for deep insight where the point being made was shallow. If I make $3000 in profit at a 30% margin, I have to have $10000 in sales. It follows that if I have an unexpected expense of $300, it will require $1000 in sales to generate enough profit to cover the expense.
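A minimal sketch of that back-of-the-envelope arithmetic, assuming the net margin stays fixed (only the numbers already used in the examples above):

    # Revenue needed so that (revenue * net_margin) covers an unexpected expense.
    # Assumes the net margin stays fixed, which is the simplification being debated here.
    def revenue_to_offset(expense, net_margin):
        return expense / net_margin

    print(revenue_to_offset(300, 0.30))       # 1000.0  -> $1,000 in sales at a 30% margin
    print(revenue_to_offset(100_000, 0.036))  # ~2.78M  -> the earlier $100K / $2.8M example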
I think that one way or another, most people actually try to evaluate things this way.
If they knew with 100% certainty that some breach would happen, they would probably invest the time and money to fix the situation. Just as if the probability for the breach to happen was known to be 0%, they would be justified in not fixing it.
You frame this as CEOs being pragmatic, and I agree. But then, the other side of the coin is that if you're regularly wrong (there's a high risk of a breach happening — then nothing happens) they'll probably stop listening to you after a few times.
Also don't forget the reverse-Tinkerbell effect: if you do things right and nothing bad happens for a while, they may be tempted to conclude that perhaps you're being overly cautious and that doing things a little bit faster/cheaper would be OK anyway.
I also wonder if "blameless postmortem" culture perhaps actively works against preventing these kinds of incidents. It doesn't seem that anyone in IT is ever held responsible for damage they cause.
But yes, lying, "not seeing" and covering up documentation is pretty much standard corporate behaviour I've seen around plenty of companies as well.
I no longer believe in blameless post mortem as a general rule. I have, through experience, come to believe that the contexts where blameless post mortems work are the contexts where literally anything works because they are organizations that have high hiring bars and high expectations. My current employer is not one of them; we are a mountain of mediocrity and all blameless post mortems do is act as an excuse to avoid raising the bar.
The principle of blameless postmortems is not supposed to absolve anyone of the responsibility to change anything, it’s supposed to foreground that serious failures are organizational failures first and foremost, because it’s the organization that has an obligation not to fail, not individuals, who fail all the time as a rule.
I spend a bunch of time reading accident reports from agencies like RAIB and MAIB, but my real jobs have been closer to the Web PKI and thus m.d.s.policy
Back in 2015, Symantec's CA issued some certificates that shouldn't have existed, including for names owned by Google. What's wrong there? Well, a blameless postmortem would probably tell you that your processes and procedures are bad, you are creating bogus certificates to "test" a real CA whose certificates are actually trusted in the real world. Need better processes, training, oversight to ensure things improve. What did Symantec do when they were caught? They fired the low-level employees who conducted the tests and wrote a blog post "A Tough Day as Leaders" which blamed the fired employees for getting it wrong. Some leadership (the blog post of course no longer exists although I assume it's archived somewhere).
Less than two years later, Symantec was back in trouble again because an RA they had worked with had been issuing certificates that should not exist, using Symantec's CA infrastructure (and thus, from our point of view, Symantec was issuing these certificates, even if they unaccountably believed this wasn't their fault). This time Symantec's bosses blamed not only low-level employees, but also auditors, bosses at the Korean RA, and anybody else they could think of... except themselves.
This is a gross failure of leadership. Once upon a time a US President said "The buck stops here", but Donald Trump was very clear, "The buck stops with everybody" and "I take no responsibility" and it seems Symantec's leadership were made in that image. They quit the CA business rather than do what it would take to fix the problem.
If you conduct a "postmortem" after an incident, then "Nobody was to blame and nothing needs to change" is almost certainly just as much the wrong outcome as "It's Jane's fault, fire her".
I mentioned I read MAIB reports. One MAIB report sticks out, after many years, in the following way:
Unlike every other MAIB report I've read, this one has no recommendations at all. Someone died, and yet there is nothing to recommend. Why not? Well, the cause is very simple: two men on a fishing boat took a lot of heroin, their boat crashed, it sank, and one of them died. No need to recommend that you shouldn't take heroin while operating a fishing boat, since heroin is an illegal drug already and operating a vessel under the influence of drugs or alcohol is already a crime too.
If your next "blameless postmortem" doesn't have any recommendations, ask yourself: was what happened already a crime? Are the people involved dead or in prison, and so either way beyond the value of recommending a different course of action next time? No? Then we need to recommend how to actually avoid it happening again.
> They fired the low-level employees who conducted the tests and wrote a blog post "A Tough Day as Leaders" which blamed the fired employees for getting it wrong. Some leadership (the blog post of course no longer exists although I assume it's archived somewhere).
> If your next "blameless postmortem" doesn't have any recommendations, ask yourself, was what happened [in IT services] already a crime?
Imagine the amount of training needed for a government worker to be able to decide that: to be familiar with the law AND not have a political/financial lean. The US government has already decided they're not going to train their population with relevant, globally marketable skills at competitive ages. They've turned education into a private/public dating program to preserve social class norms.
The US government is still too ill-equipped, corrupt, and disinterested in regional/global IT regulation. They seem more interested in weaponizing the IT realm to remain relevant across the globe. It seems to me like they're consistently deluded in thinking they can maintain systemic supremacy given movements in India, China, Russia, and Europe.
> and all blameless post mortems do is act as an excuse to avoid raising the bar
"Well, there's your problem, right there."
The entire point of doing blameless post-mortems is to correctly identify problems for resolution. If management doesn't drive changes in response (process, training, communication, whatever), you have a different problem to solve before they'll do any good.
I imagine blameless postmortems sometimes happen because people want to avoid blame.
So they argue that all the cool places do "blameless". We should try that too!
And then the organization doesn't understand the actual concept behind the idea (nor did the suggestor want that). Instead the organization learns "we have decided not to blame anyone". And then everyone involved is satisfied.
A post-mortem should not necessarily blame the individual, but blame the circumstances the individual finds themselves in.
Yes, a hard-coded password is bad practice. But does the company have a bad culture of keeping configs in repos? Maybe management thinks it's easier to commit configs with sensitive data than to set up proper deployment shit. And after all, the repos are private, so it should be fine, yeah?
Bad code ending up in production is something you'll see often. Does the company have nice test suites for everything? Continuous integration pipelines? E2E tests? Or is upper management pushing everyone to their limits, because "fuck it ship it"?
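On the hard-coded-credentials point, the usual low-friction alternative is to inject secrets at deploy time and read them from the environment (or a secret store), so the repository only ever contains a reference, never the value. A minimal sketch, with a hypothetical variable name:

    import os

    # Hypothetical example: refuse to start if the secret was not injected at
    # deploy time, instead of falling back to a value committed to the repo.
    DB_PASSWORD = os.environ.get("APP_DB_PASSWORD")
    if DB_PASSWORD is None:
        raise RuntimeError("APP_DB_PASSWORD is not set; refusing to start")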
In my negative experiences, "blameless" turned into "nobody did anything wrong", which, of course, undermines the whole point of finding out what actually happened so we can see if there is a thing we can do to reduce the likelihood of it happening again.
Sometimes, the root cause is indeed someone with the privilege but not the good sense ignoring warning signs. If we can't identify that problem, then we can't improve our odds for the next time.
That kind of result may mean you need a Molly guard. The original Molly was a toddler who reportedly pushed the Big Red Switch on an IBM 4341 twice in one day and so they put a cover over the switch to put an end to that sort of outage. Occasionally people need firing but even in that sort of circumstance there's an organization problem that needs addressing.
A valid blameless answer then is "remove the privilege" and yes, despite whatever objections you'll raise, this is possible. Difficult, but possible.
Like, even in the case of the extreme example of someone deciding to intentionally harm the company. You fire them, but then what, how do you prevent the next person to go rogue from causing equivalent harm?
> They accrete so many of these rules and restrictions that people can't do their work.
I hear this from software engineers, from time to time. What do you mean they aren't allowed to SSH into prod anymore? How will they debug, update, or maintain anything?
Sometimes this is your standard-issue hostile reaction to change. The old approach is what they are used to, and they don't understand the need to change it (and when they "ask to understand", it's mostly so they can try to negotiate). This new world just seems to get in the way for no clear reason. Management neither appreciates nor understands the reluctance and just pushes on.
Usually what needs to happen is a series of changes across the org. You roll out the change with references to policy to support it. Workflows get updated and reworked so that SSHing into prod is not, in fact, the way to update systems or view logs or whatever.
Most of all, educational materials are provided. Often I find that people object to changes because they don't know another way to work. If all you've ever known is SSHing into prod to read logs, you've probably never heard of Kibana or used OpenTelemetry.
Public policy is just waking up to this, and private administration is way too unprofessional as a group to formalize the problem (thus some people understand it, most don't), but rule creation requires a cost-benefit analysis.
Any new rule you create is a chance to analyze the entire set, and maybe redesign it.
Or people start ignoring the rules in order to do their work. Which has its own problems (particularly as rules normally aren't divided into "important" and "unimportant").
I refuse to ignore company rules, and I also comment on gross negligence, which more often than not means I am the quarreler, not the good engineer, in the eyes of coworkers and bosses.
Let's not conflate security and safety here. The boards hailed by the article are all about investigating safety failures, and so is the advocated investigation board. Security is a different beast. It's sub-par in many airplanes/power plants/... too.
That's very disappointing.
I've always brought everyone up to think that of all the professions, you can always trust an engineer. They deal in facts. They don't lie.
Perhaps it's as simple as this: in the real world, engineers can't lie, because the world wouldn't work if they did.
In Washington State, the state superior court ruled that the police department was not liable for the impound fee paid by somebody who had their car impounded for 90 days for driving on what the computer reported was a suspended license, because the department is exempt from mistakes that come from trusting its own computer system.
This was the second time the department had wrongfully impounded his car, and they had made no attempt to fix the mistake from the first time; this didn't impact the ruling.
It's gonna get much worse before it gets any better.
It will get much worse, I think. More and more companies are hiding behind algorithms and other computer systems while cutting support staff. If you are wronged, you have nobody to talk to and they make no effort to correct the situation. The only recourse is a lawsuit, which is way too expensive for most people. And even when they are caught, the fines are usually only nominal.
I think we are building up the ultimate faceless bureaucracies.
You're certainly right that lots of very large companies go to every conceivable length to keep you from ever finding a human being, even for an online chat, forget a live phone conversation. Their "contact us" page is nothing but an FAQ.
Is legislation or regulation the answer? That would be unfortunate. The government rarely makes things better, but to be honest, those accident investigation boards probably DO prevent management from sweeping things under the rug.
Some kind of consumer ratings for Quality of Service will have to spring up. You can now find ratings of airlines for their on-time record and likelihood of losing your baggage. We need something similar for web companies.
I would assume (taking the lazy way of not checking) that this is an issue of qualified immunity for law enforcement officers. Qualified immunity is a judge-created doctrine that allows police and other government officials to avoid any consequences for clearly bad acts. Any intelligent 10-year-old could see that it's unjust for a citizen to pay for the cop's mistake in this situation. Qualified immunity says, "if there is no specifically on-point existing legal ruling that states the behavior violates the constitution, then the government official is not responsible."
Everyone else is expected to use their brains and be responsible for the consequences of their actions. Government officials are not. This is not an issue of computer failures being treated differently than other high-consequence failures. This is an issue of government failure getting a pass.
Qualified immunity is a corrupt and depraved element of the US legal system.
No. Qualified immunity protects individuals from liability for the state's mistakes.
It's similar to how you are not liable for hurting someone while you are doing your job properly, and why you can't sue a Comcast customer service rep for stealing your money in a billing error.
Sovereign immunity (the principle that the state rules over all so no one can force the state to do anything) protects the state from liability from its own mistakes. You can only sue the state if the state decides to allow it.
That doesn't seem unreasonable in terms of the police department not being liable. What are they supposed to do? Maintain a parallel system so they can check everything the computer produces?
It seems like the vendor of the faulty system should be liable.
No. It's the police department computer. They should be liable for the harm caused by it. Maybe the police can recover damages from the software vendor, but that should be separate.
Because the citizen has no visibility into, nor control over, the relevant decisions: which software vendor to use; what the processes are for dealing with errors and issues; what terms are negotiated for licensed source access (for the police) versus managed closed source, where all changes, issues, and fixes have to go through long-form scoping processes with the vendor, delaying fixes; what the policies are for using and trusting these results; and what policies there are for reporting and acting on inaccuracies, as well as for informing the entire department/state about them.
The police department does, so they should be liable for those decisions.
Like, the moment there was a case of somebody being incorrectly listed as having a suspended license, the entire department should have been informed to not trust the results until the underlying issue was found and fixed.
In Washington State we have a system to track cannabis, and the enforcement officers are supposed to be able to get reports from this system. The system is super buggy and also doesn't have meaningful reports, so there is a secondary system for officers to export to Excel documents. In one of the trainings they've been instructed to look for anomalies -- not real analysis, not even a pivot table. One thing they find is "negative quantities" -- but how can that be? (Hint: it's bugs in the tracking software.) Then enforcement shows up at the cannabis business to audit these negative numbers (or demand the business try to correct the data, which they cannot do due to bugs).
So, crappy software gets law enforcement officers to basically review data "anomalies" created by bugs by visiting a business. The second most expensive method of data sanitization I can imagine. It's a poor use of their time and disruptive to the business.
The system in WA is so buggy that the agency has opted to freeze the software rather than try to fix the issues. The future of government software is bleak -- so long as they keep using closed source packages from low-cost bidders.
Why isn’t all software created for the government required to be open-source? Would that really drive the costs up, if the providers don’t have the choice?
The vendor claimed that if the code was out it would be a security risk. The agency claims the vendor needs to protect their intellectual property rights. We have (some) visibility into other things our taxes pay for -- the software should absolutely be one of them -- especially the regulatory compliance systems that drive enforcement action.
Edit: also, they were breached anyway shortly after launch (2018), and then an email went around offering to sell the code and data from their entire system.
> $Millions, and in some cases $billions, in tax money pour into projects that almost invariably run late, over budget, fail to deliver
In many, perhaps most, such cases, projects running late and over budget are performing exactly as intended by their sponsors.
All too often, the nominal purpose for a project is just cover for a totally legal conduit from the public purse to politically-selected private pockets. Thus, in the US, we get the F-35, the SLS, the California bullet train. Sometimes we get something out the other end, many years late. (The delay is to keep the gravy train running longer.) Sometimes, nothing.
In New York, we did end up with a 2nd Avenue Subway extension. There actually are F-35 planes at air bases, some of which can actually fly. They are stacking an SLS in Florida as I write this. They probably will launch at least the one, maybe a second, at $2B each.
In the Post Office case, in my view, the problem wasn't that the IT system was faulty; it was that Post Office management continued to prosecute long after (more than a decade after) any competent person would have known it was faulty. These people should be prosecuted.
Software engineering hasn’t had our Quebec Bridge collapse yet.
It will take an enormous visible disaster that does more than cost a company a few tens of millions, or exposes a few hundred million social security numbers before we start holding companies and engineers liable for security flaws.
The only way to create the software engineering reform you envision is to force a paradigm shift, and the only thing that could instigate one would be something like Skynet, with many dead. Though the unknown damage from the OPM breach could be quite severe.
There might be "security standards", but neither the government nor the private sector can guarantee anything to be safe. The civilization built on IT networks will need to be reworked.
How many were killed as a consequence of the Equifax hack?
Don't get me wrong, that hack was absolutely massive and terrible, but I think you missed the point of what the parent comment was saying.
Their point was that it must be something real and visceral, on a level that will make people think "something like this must never happen again at any cost". As far as I know, there weren't massive killings of people due to the Equifax hack, so that isn't quite on the same level.
We've flown planes into the ground, killed people with X-rays, rigged elections, and shut down oil delivery to sections of the east coast of the US.
If anything we have bridge collapses constantly and special news anchors that report on which routes you should take today as if it were just like any other traffic incident.
The industry fails to listen to the lessons written in "The Mythical Man-Month" almost 50 years ago -- half a century ago. Of course some reports on why systems are being designed and coded poorly won't change anything. We know why; we just ignored the knowledge to the point of absurdity.
Companies could be held liable for gross misconduct. Although GDPR is not exactly a shining example of IT regulation, I think it's a good example of liability.
Companies get fined for breaking GDPR.
Governmental projects should have similar requirements in place, and companies and people should be held accountable for breaking them.
According to that website, one of Google's 10 fines is the highest fine ever, at €50 million. Hilariously, in a dystopian kind of way, they were fined €28 last year. Not as in millions, but twenty-eight euros.
Facebook only appears on the list once, with a paltry €51,000.
But in any case, the point of GDPR is not to bankrupt companies.
> But in any case, the point of GDPR is not to bankrupt companies.
It _should_ be.
Well, okay, that's a bit hyperbolic, but what I mean by that is that the point should be to _change_ behavior, and if the fines necessary to do that, or the resulting change in business model, lead to bankruptcy, then that should be totally fine. Companies who cannot operate legally shouldn't continue operating.
Which would be a substantial (or even excessive) fine for a typical individual person or small business. For a multi-billion-dollar corporation, it's an operating expense.
> the point of GDPR is not to bankrupt companies.
The point of fines is to be too large to be operating expenses.
I've stopped caring about GDPR given its lax enforcement and the recent power grabs by the European Parliament. They had their chance to prove they were going to 'clean' their space and give people trust in their actions. If they truly believed in their model, I think they would have tried a bit harder. In my opinion they have already failed to gain the trust of their citizens.
I hate to say it but I think the Chinese model with 'nationalization' of the internet and IT services has already won and we're in denial. Regional governments, "federations" are likely to continue to reduce global trade through regional nationalization and IT programs.
The EU had decided to expand their mandate into the tech sphere with regulation already proven not to work as intended/stated.
The EU mandate should not include policing of private speech through tangential monitoring services. Some wealthy European countries themselves have rather restrictive laws and approaches to private thought, and through recent laws will have access to this information. The EU (a non-democratic European Commission with little legislative transparency) will likely continue to grasp for more power as their competition of thought becomes more mimicry of foreign models of power than actual ingenuity.
> Personal information is the helium of IT systems—it leaks out of every crack or imperfection faster than seems possible.
Might as well call it the hydrogen of IT systems—get too much of it concentrated in one place, and all it takes is one little spark for it all to go up in flames. Boom!
I agree with nearly everything in this article, but the following question stumped me: when exactly would a software disaster investigation board be employed?
A plane goes down, a train goes off the rails or passes a signal at danger: easy. But at exactly what point did the UK postmaster system "fail" enough for an investigation?
> But at exactly what point did the UK postmaster system "fail" enough for an investigation?
The moment it came to light that postmasters were being improperly convicted? The moment it came to light that some improperly convicted postmasters committed suicide?
This is all pretty new in the world. It took many years for aircraft investigations to get really professional at it.
How about we start with a simple algorithm like this:
1. Check if software is involved in the incident.
2. If it is, carry out an internal review to evaluate the possibility that the software may be a cause of the incident.
3. If it is, hand over to an independent review board.
4. Involve lawyers if, and only if, the independent review board recommends it.
That was an off-the-cuff attempt at a suggestion. There is probably a much better way. Feel free to suggest alternatives.
I have no objection to lawyers getting involved. I just object to them suppressing the facts to protect their clients and preventing an independent investigation. It's impossible to cover up that an aircraft has crashed. Lawyers involved in an aircraft accident on behalf of the manufacturer (or operator) are basically doing 'damage limitation'. Lawyers involved in a software incident on behalf of the manufacturer (or operator) are basically doing 'evidence suppression'. They should stick to 'damage limitation'.
Large amounts of money spent on government systems that never ship is a tragedy, but software projects like these tend to have a lot of open questions.
We understand software development often as a discovery process (evolving requirements), especially if they are large or disruptive. So one critical output of any such project has to be knowledge that can be built upon, as in open, clearly specified and written papers. This should be done regardless of whether it failed or didn't fail.
I can tell you now that nobody, including the current engineers, can work with the requirement documents that are being generated over the course of our multi-year system transformation project.
Of course, it’s still an improvement over the legacy system which has none.
> Compared to the 100 million euros Denmark spent on a new IT system for the police, a project that never delivered anything?
Governments need "sunken-cost fallacy" triggers that automatically halt projects when they pass thresholds, even when many worry about the effect of the halting. It's like a scene from a movie - "halt my project when I go over 100M EUR, even if I'm begging you not to".
The problem is that many initial estimates will be low-ball offers to try to win a contract. Perhaps combine the sunken-cost limit with delayed payments; otherwise companies might low-ball their offers knowing they can never finish, but not care because they'll be paid anyway.
Better oversight is clearly needed too. There won't be one tool that'll do the job on its own.
The funny thing is, many projects (and I should have omitted government-run, any organization can suffer from this problem) have reasonable cost estimates when they begin. Why not start with those? Perhaps this rule would incentivize more realistic estimates?
At issue is that software continues to approach an opaque “ball of mud” as size and complexity increase. Silver bullets of the day like “microservices”, event-driven design, and functional programming do nothing to improve that.
The prevalent analysis method of use-case driven procedural transactional scripts resulting in controllers or services [1] (typically just transforming a database) is problematic. Yet developers are taught or can use nothing else from computing’s rich history currently.
It would seem that any attempt to improve how we model “external” [2] complex systems in code has ceased in favor of pursuing the low hanging fruit of technical detail.
Why are we surprised that any software outside of computing and the data sciences, together with computing infrastructure, are challenged?
At issue is that outsiders have no idea of this industry incompetence, and incorrectly rely on software as being accurate, complete and foolproof.
——-
[1] Martin Fowler’s “controller-entity style” - P of EAA
[2] Where expertise in the complex problem domain lies outside of the development team.
I'm baffled that nobody thought there might be an issue with the system when it suddenly turned out that 700 postmasters were making funds disappear. That's way too high a number to be accidental.
Like most things, it's a matter of scale. If a train derails, we call in the NTSB, but they don't investigate car crashes.
The issue that I see, is that the software industry seems to be absolutely obsessed with scale. Small applications are actively sneered at. Go big, or go home.
So that means that every accident is a train wreck.
It's known what went wrong, computerphile has a video with some details: https://www.youtube.com/watch?v=hBJm9ZYqL10 but it doesn't address any of the judicial and cultural fails, that's what needs to be fixed.
Software bugs are a fact of life, people know this, except the judges in this case apparently.
Bugs are a fact of life because of sloppy practices. The experience from SQLite is instructive: after a test suite had been written, matters improved immensely.
Why was the test suite written? Because it was in the list of requirements from the client: aerospace standards demand that every possible branch is covered by a test.
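For anyone who hasn't worked to that standard: branch coverage means every decision in the code is exercised in both directions, not just every line executed once. A minimal, hypothetical sketch of what that implies for the tests (tools such as coverage.py can report which branches were missed):

    # Hypothetical example: a function with two decisions, so four branch outcomes.
    def clamp(value, low, high):
        if value < low:      # branch taken / not taken
            return low
        if value > high:     # branch taken / not taken
            return high
        return value

    # A single "happy path" test would leave half the branches unexercised;
    # full branch coverage needs inputs below, above, and inside the range.
    def test_clamp_branches():
        assert clamp(-5, 0, 10) == 0
        assert clamp(50, 0, 10) == 10
        assert clamp(7, 0, 10) == 7

    if __name__ == "__main__":
        test_clamp_branches()
        print("all branches exercised")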
One could argue that faults in engineering and construction are also a fact of life, yet that doesn't mean we excuse them, and it doesn't mean we assume that a failure is due to those faults. Investigations are performed in order to ascertain the truth.
I think the author's comparison to the historical development of trains is appropriate. Investigating IT failures wasn't as important 50 years ago because IT infrastructure was not as critical. Investigating IT failures today is critical because the functioning of society depends upon it.
I don't think you (or the sibling comment) got my point. Nowhere am I "excusing" bugs. Civil engineering bugs are indeed a fact of life too; just check the god damn news.
What I'm saying, is that I think where PHK says "nobody sat down and documented precisely what went wrong", I think he is just wrong.
The guy in the Computerphile video, Steven Murdoch, wrote this article:
"There is a window of time between a user printing and cutting-off a report. If another user was to perform a transaction during that window, that transaction may not show on the report."
The problem that needs investigating here is the miscarriage of justice, apparently the post office was aware of the bugs at the same time as prosecuting people for falling victim to the bugs, in order to save face about their IT decision.
A blog post and a judgement focused on the legal issues is nothing, not even close, not even in the same ballpark, as a thorough report from an investigatory board.
What PHK is calling for is a couple of orders of magnitude greater in detail and depth.
Moreover he specifically denounces any practice of treating technical inquiries as judicial processes. The incentives differ, making that a short path to bad outcomes as people run for cover. Dispassionately recorded findings that inform the whole profession may be useful, and restorative justice has already occurred; punitive justice will be best served as a separate and ultimate proceeding.
You're both right - inopinatus and ldarby. It was clearly a miscarriage of justice - hence the lawyers and court case and resignation of the Post Office CEO. But it absolutely needed to be investigated by a dedicated, professional Software Incident Investigation Board.
The miscarriage of justice is the truly awful, shameful part of this incident and it all came about because of the 'less than honest' stance of the post office leadership. To be fair to them, they were probably advised to act in this manner by their lawyers.
The difference between an aircraft/train/boat/car/oil rig accident and a software incident is that we immediately know that an aircraft/train/boat/car/oil rig accident has happened. Professional investigators are usually on the scene long before the lawyers have had their first cup of coffee. With a software incident, lawyers have wrapped their fingers around everything long before we even know that something has happened. For the incident investigators to do their job properly, it would probably be best to keep all lawyers out of the picture until the investigation is complete. But the chances of that happening are slim.
I also agree with PHK about the expense issue. There are so many examples of huge expense to discover the cause of an aircraft accident. Just think of MH370 or AF447 (military submarines were used to try to find both wrecks). Also oil rigs: how about Deepwater Horizon? Huge amounts of money were spent to figure out exactly what happened. Similar sorts of money should be made available to discover the causes of software incidents. It's far too easy for companies to hide a software incident and then hide behind lawyers while blaming others.
Bugs should be reviewed and dealt with by a professional, independent investigation board, not by lawyers.
Replying to myself to clarify a couple of things further. I'm not disagreeing with the idea of an IT accident review board, and I agree in general that we should focus on eliminating bugs. Maybe what PHK means is having even more detail on the bugs, i.e. the specific coding mistakes, the issues with internal processes at Fujitsu that led to the bugs being released, etc., and a database containing this and other companies' mistakes that others can learn from.
I just don't think the bugs here are especially interesting compared to bugs in other software of similar complexity. In any other situation it goes like this:
customer: hey vendor, this software has bugs
vendor: whoops, sorry, here's a fix
or even this:
customer: hey vendor, this software has bugs, and they're so bad we're suing you (which may have been appropriate in this case if the bugs caused financial loss)
not this:
customer: hey vendor, this software has bugs, and we've sent our employees to prison because we don't want to admit the bugs exist.
vendor: wtf?
That's a cultural / judicial problem and I just don't believe the reaction should be to focus on preventing other companies creating similar bugs.
I looked, and a fair proportion of your comments on HN are very similar complaints—arguably more irritating and content-free than what you complain against.
I complained about an opaque headline once, and dang told me about how making the reader work a little is part of the HN philosophy. Pretty sensible when you learn about it:
I agree with the article, though this doesn't quite sit right with me:
> And no, it is not "self-incrimination" unless you did something criminal.
People (in the US) are allowed to remain silent to prevent self-incrimination. While arresting someone, why don't we just say "talking right now can't be self-incriminating if you did nothing criminal?" These are in the same family as "why do you need privacy if you have nothing to hide?" and I just don't think it was well thought out.
> In 2017 the motor of an airplane exploded over the southern part of the Greenland icecap. Part of the engine landed on the ice while the plane continued to the first suitable airport way up north in Canada.
eh, Happy Valley-Goose Bay isn't that far north as far as Canada goes. 53 degrees north.
The actual drop locations in Greenland were around 61 degrees N.
Nuuk would have been ~60% closer, but not a chance it could handle an A380.
So I wonder, in this situation what changes so that the accident is averted? It's fine to demand accountability, but what specifically is being monitored? Because I don't think there's a real answer yet. They can say IT needs to do a better job. Obviously. But in what dimension? There's a lot of "you screwed up!" but very little "here's the fix".
This is very sensible. One challenge is to encourage cooperation. One interesting aspect of strategy is to prevent findings from being used in court. For example, findings of probable cause by the US NTSB cannot, by law, be admitted as evidence in a trial. That helps to make it more about preventing recurrence rather than finding culprits.