Hacker News

>“Well what about the 737 MAX”, that was a system specification error, not due to “buggy” software failing to conform to its specification. The software did what it was supposed to do

Exactly: the system was designed to fly the plane into the ground if a single sensor was iced up, and that's exactly what the software did. Boeing really thought this system specification was a good idea.




That is a massive over-simplification, and it invites patently false characterizations: that it was a "stupid mistake" that would have been fixed if they were not stupid (i.e., if they had adopted an average development process). That is absolutely not the case. They were really capable, but aerospace problems are really, really hard, and their safety capability had regressed from being really, really capable.

They modified the flight characteristics of the aircraft, then tuned the control scheme to provide the "same" outputs as the old system. However, the tuning relied on a sensor that had not previously been safety-critical. Because it was not previously safety-critical, it was not subject to safety-critical requirements, such as having at least two redundant copies, as would normally be required. They failed to identify that the sensor had become safety-critical and should therefore be subject to those requirements. They sold configurations with redundant copies, which most high-end airlines purchased, but because of that oversight they failed to make redundancy mandatory, and some purchasers cheaped out on sensors, since those were characterized as non-safety-critical even though they were useful and valuable. The manual, which pilots actually read, had instructions on how to disable the automatic tuning and fall back to redundant control, and those procedures were correctly deployed at least once, if not multiple times, to avert crashes at premier airlines. Only the combination of all of those failures simultaneously caused fatalities, at a rate nearly comparable to driving the same distance. How horrifying!

An error in UX tuning, dependent on a sensor that was not made properly redundant, was the "cause". That is not a "stupid mistake". That is a really hard mistake, and downplaying it as stupid underestimates the challenges involved in designing these systems. That does not excuse their mistake: they used to do better, much better, like 1,000x better, and we know how to do better, and the better way is empirically economical. But it does the entire debacle a disservice to claim it was just "being stupid". It was not; it was qualifying for the Olympics when they needed to win the gold medal.


I really don't think it takes a mastermind of software design to think, "Okay, I've built a system that takes control of the plane's maneuverability; let's make sure we have redundant sensors on this." Furthermore, descriptions of MCAS and its role were dangerously underplayed so that Boeing didn't have to tell customers to retrain their pilots. An egregious breach of public trust in a company we put a whole lot of faith into.


>They failed to identify that the sensor became safety critical and should thus be subject to such requirements.

Whistleblower testimony indicated it wasn't a failure to identify the sensor as safety-critical, but a conscious decision not to describe it as such to the regulator, and not to implement it as a dual-sensor system, because doing so would have made the design require Level D simulator training; Boeing was relying on the absence of that training requirement as a selling point to keep existing airlines from defecting to Airbus.

>They sold configurations with redundant copies, which were purchased by most high-end airlines, but they failed to make it mandatory due to their oversight and purchasers decided to cheap out on sensors since they were characterized as non-safety-critical even if they were useful and valuable.

Incorrect. All MAXes have two AoA vanes, each paired to a single flight computer. The plane has two flight computers, one on each side of the cockpit, and the computer in command typically alternates between flights. One computer per flight is considered in-command (henceforth the "Main" FC, or MFC); the other operates as "auxiliary" (AFC). The configuration you're thinking of is an AoA Disagree light, implemented by enabling a codepath in software running on the Main FC: a cross-check against the value from the AoA vane networked to the auxiliary FC would light a warning lamp to inform pilots that system automation would be impacted, because the AoA values between the MFC and AFC differed. A pilot would be expected to recognize this and adapt behavior accordingly, or take measures to troubleshoot their instruments. Importantly, however, this feature had zero influence on MCAS. MCAS only took into account inputs from the vane directly wired to the Main FC. While a cross-check happened elsewhere for the sole purpose of illuminating a diagnostic lamp, there was no cross-check functionality implemented within the scope of the MCAS subsystem. The MCAS system was not thoroughly documented in any documentation delivered to pilots; the program test pilot got specific dispensation to leave it out of the flight manual. See the Congressional investigation, the final NTSB report, and the FAA report.
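The split described above can be sketched as two independent codepaths (a hypothetical illustration with invented names and threshold, not real avionics code): the cross-check feeds only a lamp, while the MCAS input path reads a single vane and never sees the comparison.

```python
# Illustrative only: the disagree lamp and the MCAS input were separate paths.


def aoa_disagree_lamp(main_vane_deg: float, aux_vane_deg: float) -> bool:
    """Optional codepath on the in-command FC: lights a lamp on disagreement."""
    return abs(main_vane_deg - aux_vane_deg) > 10.0  # invented threshold


def mcas_input(main_vane_deg: float, aux_vane_deg: float) -> float:
    """MCAS acted only on the vane wired to the in-command computer."""
    return main_vane_deg  # aux value ignored: no redundancy on this path
```

A failed main vane could thus light the lamp (if that option was even enabled) while MCAS kept acting on the bad value regardless.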

>The manual, which pilots actually read, has instructions on how to disable the automatic tuning and enable redundant control systems and such procedures were correctly deployed at least once if not multiple times to avert crashes in premier airlines.

The documentation, which included an Airworthiness Directive and a NOTAM, informed pilots that any malfunction should be treated in the same manner as a stabilizer trim runaway. That problem is characterized in aviation parlance as continual, uncommanded actuation of the trim motors. MCAS, notably, is not that. It is periodic, and in point of fact it ramps up in intensity over time until over 2° of travel are commanded by the computer per actuation event, with the timer between actuations being reset to 5 seconds by use of the on-yoke stab trim switches. This was not communicated to pilots. Furthermore, there were design changes to the stab-trim cutout switches between the 737NG (the MAX's predecessor) and the MAX. On the NG, the stab trim cutout could isolate the FC alone, or both the FC and the yoke switches, from the stab trim motor. On the MAX, however, the switches were changed so that the FC could never be isolated on its own from the stab trim motors, because an operational MCAS was required to check the box on FAR compliance for occupant-carrying aircraft. So when that cutout was used, all electrically assisted actuation of the horizontal stabilizer became unavailable. The manual trim wheel would be the only trim input, and in out-of-trim attitudes the loading on the control surface was so excessive that physical actuation without electrical assistance was not feasible on the timescale required to recover the plane. There was a maneuver known to assist in these conditions (when they occurred at high altitude) called "roller coastering", in which you dive further in the undesired direction to unload the control surface and render it actuable. This technique has not appeared in official documentation since the Dino 737 (pre-NG).
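The timing behavior described above can be sketched as a tiny state machine (a hypothetical illustration with invented names; only the roughly 5-second reset interval comes from the description above): pilot trim input does not stop MCAS, it merely delays the next actuation burst.

```python
# Illustrative only: MCAS trims in periodic bursts, and a yoke trim-switch
# press just resets a ~5 s timer before the next burst, rather than
# disabling the system.
RESET_DELAY_S = 5.0


class McasTimer:
    def __init__(self) -> None:
        self.next_activation_s = 0.0

    def yoke_trim_pressed(self, now_s: float) -> None:
        # Pilot trim input pauses MCAS only for RESET_DELAY_S seconds.
        self.next_activation_s = now_s + RESET_DELAY_S

    def should_command_trim(self, now_s: float) -> bool:
        return now_s >= self.next_activation_s
```

So a pilot who countered with trim at t = 10 s would face another, possibly stronger, burst at t = 15 s, which is why the continuous-motion "runaway trim" mental model never matched what the crews observed.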
The events you're referring to, when uncommanded actuations were recovered on other flights, happened at high altitude, and were recovered by countering with the electric stab trim switches, followed by stab trim cutout within the 5-second watchdog window before MCAS could reactivate after a yoke trim switch actuation. This procedure, and the implementation details needed to fully understand its significance, were undocumented prior to the two crashes. Furthermore, cutting MCAS/the MFC out of the stab trim loop and finishing the flight in a fully manual trim configuration meant that, taking the FARs prescriptively and uncompromisingly rules-as-written, with zero slack offered for convenience, you were technically flying an aircraft in a configuration that could not be certified to carry passengers: MCAS was necessary for grandfathering the MAX under the old type certificate, and without MCAS functional it is technically a new beast, one that is non-compliant with the control-stick force-feedback curves required when approaching a stall. Just to make it clear, a compliant curve has been a characteristic of every civil transport in every jurisdiction worldwide for well over 50 years. None of this was documented, and it only became apparent after investigation. Again, see the House findings, the FAA report, and the NTSB.

>Only a combination of all of those failures simultaneously caused fatalities to occur at a rate nearly comparable to driving the same distance, how horrifying!

Oh, the multi-billion-dollar aircraft maker built a machine that crashes itself; gaslit its regulators, pilots, airlines, and the flying public to juice the stock price so executives could meet their quarterly incentives; and diverted funds away from its QA and R&D functions to do stock buybacks, move HQ away from the factory floor, and try to union-bust. With over 300 direct, measurable deaths within a couple of months, multiple years of grounding and mandated redesigns to fix all the other cut corners we've been unearthing, and veritable billions of dollars lost to delays. Heavens, it could happen to anybody. How could you possibly see this as something to get upset about? /s


Thank you for providing a more thorough and complete technical explanation.

As you can see from my final statement, I made no argument that it was not a travesty. It was ABSOLUTELY UNACCEPTABLE. This is not a defense of their inadequacy.

I was pointing out how it is absolutely incorrect to claim that it was a "stupid mistake". That argument is used by people implicitly arguing that "if only Boeing used modern software development practices like Microsoft/Google/Crowdstrike/[insert big software company here], then they would have never introduced such problems". That is asinine. As can be seen from your explanation, the problem is multi-faceted, requiring numerous design failures across implementation, integration, and incentives. In fact, the problems are even more subtle and pernicious than in my original explanation, which was derived from high-level summaries rather than the investigation reports themselves.

I do not know if this has changed in the last few years, but at Microsoft you were required to have one whole randomly-selected person, with no required domain expertise, say they gave your code, in isolation, a spot check before it could be added. The same process applies regardless of code criticality, as they do not even have a process to classify code by criticality. This is viewed as an extraordinary level of process and quality control that most could only dream of achieving. Truly, if only Boeing had thrown out whatever they were doing and adopted such heavyweight process from "best-in-class" software development houses, they would have discovered and fixed the 737 MAX problems.

Boeing does not need to adopt modern software development "best practices", or whatever crap they use at Microsoft/[insert big software company here] that introduces bugs faster than an ant queen lays eggs. The processes that created the 737 MAX already make Microsoft and its peers look like children eating glue, yet they are still inadequate for the job of making safe aerospace software and systems. What Boeing needs to do is re-adopt its old practices, which make the 737 MAX development process look like a child eating glue. The 737 MAX was not stupid; it was inadequate. BOTH ARE UNACCEPTABLE, but the fix is different.


This is a totally bizarre strawman argument. Safety-critical software has almost nothing in common with Microsoft crapware, or indeed most typical desktop software. Even within the desktop software industry, MS has never been held up as "best-in-class"; rather, it is the butt of jokes.

As the other poster said, it doesn't take a genius to figure out that a new safety-critical system needs its sensors to be redundant. It wasn't stupid, though, it was malicious: Boeing wanted to hide the existence of MCAS so that pilot retraining wouldn't be required.





