I found it interesting that Boeing did proactively tell airlines to inspect 737 MAXs for a possible loose bolt in a different part of the plane (rudder section) at least 8 days before the January 5th event. Example story: https://www.reuters.com/business/aerospace-defense/boeing-ur...
Unfortunately, Boeing did not know they had other issues with the plug door bolts.
I work in the defense industry, which is very much like the aerospace industry in that we deal with human life as a consequence of our work. We have software QA departments that operate very much like manufacturing or aerospace QA.
Software QA provides nothing of value to software development; having it as a dedicated function works against the overtly stated goals of the function and counterintuitively acts to degrade quality within software by mandating strict top-down process and brittle end-to-end testing.
Although Software QA is intended to be an independent verification body that provides engineering organizations with tools and resources, in practice they function as a moral crumple zone [1] within the complex socio-technical defense industrial system: they are one of the groups the finger gets pointed at when something goes wrong, and they absorb the shock to the business in the event of a failure. As a result, they have a strong incentive to highly systematize their work into specific process steps that shield them from liability and can be applied generically to all projects.
Good software teams build quality into projects by introducing continuous integration, unit testing, creating feedback, and tightening these feedback loops. This finds problems quickly and resolves them quickly. Software QA's need for high-level, top-down, generic systematization requires them to work against these principles in practice. Bespoke, project-specific checks, such as unit tests, are not viewed as contributing to the final product and are discouraged by leadership, who see them as waste.
To give an example of how these dynamics destroy quality in software: I once found a bug in software on a piece of test equipment where a logarithmic (binary) search function was operating on a list that was not strictly sorted. When I pointed this out to my leadership, I was told that changing any part of the code would require a new FQT, which would be too expensive to conduct and was not in the budget. Although the bug would have been trivial to fix, was clearly wrong, and provided no benefit by remaining in the test equipment software, the process required for changes prevented us from fixing it.
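A "logarithmic" search is typically a binary search, and its one precondition is sorted input. Here is a minimal Python sketch of that failure mode, assuming a binary search built on the standard `bisect` module; the lookup table is hypothetical, since the actual test-equipment code isn't shown:

```python
from bisect import bisect_left


def is_strictly_sorted(values: list[int]) -> bool:
    """True if every element is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def contains(sorted_values: list[int], target: int) -> bool:
    """Binary search. Only correct if sorted_values is strictly ascending."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target


# Hypothetical lookup table that is NOT sorted, mimicking the defect.
table = [3, 9, 1, 7, 12]
print(contains(table, 1))  # False, even though 1 is in the list
print(1 in table)          # True: a linear scan finds it

# The "trivial" fix: establish the precondition once, before searching.
fixed_table = sorted(set(table))
assert is_strictly_sorted(fixed_table)
print(contains(fixed_table, 1))  # True
```

The `assert is_strictly_sorted(...)` line is the kind of cheap, project-specific check that would have caught this at development time.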
> We have software QA departments that operate very much like manufacturing or aerospace QA.
I don’t work in this industry, but this seems fairly ridiculous on its face: software is not at all like manufacturing.
In manufacturing, there’s a design and a manufacturing process, and a critical function of QA is ensuring that the manufactured product is made to spec.
With software, the software is written, compiled, and then repeatedly copied. And something should verify that it’s copied correctly, but this is straightforward and boring.
So software QA ought to be much more like the kind of validation that happens when designing hardware, not like the kind of testing and validation that happens as products are manufactured.
Ideally there should be a solid spec written, and then QA can test against the spec. Maybe there is somewhere that does write solid specs, including accounting for corner cases, but in my 25 years working professionally in the industry, I’ve never seen it.
I work in MedTech. We do this. A design has to be reviewed by QA, is then tested, and the test is reviewed again. So, to counter the narrative: there are companies that do this, and it works. In other jobs I have also seen the cargo cult of QA. But in some industries it is simply crucial; otherwise the pressure to cut corners to get something implemented is too high. It is a good mechanism to counter the urge to move fast and break things.
The only complete and precise specification of software is the code itself. If some other form of specification was complete, we would be able to auto-generate the code.
This is beside the point. The code specifies what the product _is_, not what it "should" be. If you ask for a word processor and I deliver a perfectly bug-free and feature-complete calculator would you really believe it lived up to spec?
This is also beside the point. I think both of you are warning against the dangers on either side of this coin: people can invest too heavily in a specification and waste an enormous amount of time, or they can jump straight into coding and build something that does not do what it was intended to do. As with all things in life, the right answer is a balance between these two extremes.
You need some level of specification so you know what you’re building, but you have to keep in mind that the final code defines what the behavior truly is. Sometimes, that behavior unintentionally becomes part of the specification because users begin to rely on it.
I do like the fact that you both used hyperbole to succinctly illustrate the dangers of veering too far in either direction though :)
A (human language) specification is simply _enough_ information about a system that a human can figure out the intention of the author. The smarter and more context-rich the human, the simpler the specification can be. The dumber and less context-rich the human, the closer the specification needs to be to code.
It's asymptotic. By the time you reach a human who is as dumb as an actual computer, the specification _is_ the code.
I work in software for CPU design/verification. Even here, where in theory there should be a rock-solid spec, there's not. There's a 12,000-page architectural specification, which is very helpful for specifying all the end-user visible state. But the microarchitectural specification is scattered across different PDFs, Visio docs, and Excel sheets, and sometimes the only spec is the RTL code itself.
I think that's why people always tell each other to not take things at face value.
Of course there is a big difference between SW and HW QA, both in what they test and in how they test it.
But they are also very similar. Any QA department has to think about the ways things can go wrong, what to test for, how to test, which testing methods to use, which standards to follow, how to keep certifications, etc. During testing you also need to keep re-evaluating whether you are actually catching each problem/bug, and how to implement changes in your company that decrease the number of problems or increase the number you catch.
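One common way to put a number on "are we actually catching each bug" is defect removal efficiency: the share of all known defects found before release. A minimal Python sketch with made-up counts; this is just one illustrative metric, not something the comment prescribes:

```python
def defect_removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """Share of all known defects that were caught before reaching customers."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded, so the ratio is undefined")
    return found_before_release / total


# Hypothetical counts for two releases.
print(defect_removal_efficiency(90, 10))  # 0.9
print(defect_removal_efficiency(45, 15))  # 0.75
```

The catch is that the after-release count keeps growing as field reports come in, so the number only becomes meaningful some time after shipping.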
I think in that way there's a lot of overlap in thinking about business processes and how to identify problems with them.
Of course, once a specific binary has been tested and approved by QA, it shouldn't matter whether it gets copied, as long as you make sure it's the same binary (by a checksum, for example).
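A minimal Python sketch of that check, using SHA-256 from the standard `hashlib` module; the file name and contents are hypothetical stand-ins for a real build artifact:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large binaries never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved_binary(path: Path, approved_sha256: str) -> bool:
    """True only if the file is byte-for-byte the binary whose hash QA recorded."""
    return sha256_of(path) == approved_sha256.lower()


with tempfile.TemporaryDirectory() as d:
    binary = Path(d) / "firmware.bin"             # hypothetical artifact name
    binary.write_bytes(b"approved build")
    approved = sha256_of(binary)                  # recorded at QA sign-off
    print(is_approved_binary(binary, approved))   # True
    binary.write_bytes(b"approved build!")        # a single extra byte
    print(is_approved_binary(binary, approved))   # False
```

In practice the approved hash would typically be stored with the QA sign-off record, so every copy can be checked against it independently.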
But still, making sure that errors don't reach the customer is vital in any QA. If errors do happen, QA is the department that can make sure they don't happen again, and, of course, that can prove in court that you did your due diligence.
It sounds like they are calling something QA but using it as a liability shield. It makes sense that you are upset about that, but naming something QA and having it do something else doesn't mean that QA as an effort is bad. It means that the people doing that are being deceptive.
Fair point; you are correct in your inference that there are some bad actors in my workplace. However, I’ll argue that the fundamental dynamics of separating responsibility for quality from software development lead to a steady state where, given enough time, every QA department ends up as a liability shield.
This is driven by Pournelle's iron law of bureaucracy [1], which says that people who promote the bureaucracy rather than the mission of the bureaucracy will get promoted within the organization and come to dominate its decision making.
For example, in schools, administrators make more money than teachers. This is despite both groups having similar levels of education and intelligence. The reason for this is that administrators know the laws and regulations of the environment they’re working in and ensure the continuity of the organization. Despite not directly contributing to the organization’s stated mission of education, they are in charge of the organization and take more benefits from it.
Software QA has similar dynamics. A QA department may start out making good-faith contributions to the organization. Eventually there are product failures, eventually leadership needs a scapegoat to show they’re doing something, and eventually QA takes the blame. People get moved, demoted, or fired. QA realizes its risk and takes steps to mitigate it. They create a highly systematized workflow and process, and adopt or introduce standards. They then assert that following process equates to good outcomes. When bad outcomes occur, they point to their strict adherence to process as evidence of their innocence.
If the process does not support the work or mission, that is a cost they are happy to impose on other functions to deal with. This is the final state until a system disruption happens.
I have seen a case of Software QA taking a very different shape, so I'd like to argue that the outcome you describe is not intrinsic to software QA, but rather to company culture.
The case I'm talking about does not have a separate QA department, but QA people as part of every software team. If a product fails, that team is responsible, so software devs are in the same boat as QA. They focus on learning from these failures, so no scapegoat is needed. Process does get followed, not as a defense mechanism but because not following it introduces noise that is an obstacle to improvement. When there are bad outcomes, people do point out that they followed process, because then it is clear that the process is involved in the failure and should be improved.
Unfortunately, companies with that kind of culture are rare.
I can see integrated QA working for teams because QA personnel would understand project specific constraints and degrees of freedom and tailor solutions in a way that top-down QA cannot.
However, there are situations where less QA may be needed for periods of time, such as when the volume of PRs is low. QA may be seen as overhead by management, and as something to reduce. This leads to QA being shared between teams, and to a push for standardization and top-down process deployment to minimize complexity for these personnel. The complexity of managing the QA personnel gets shifted onto the development teams.
This situation absolutely is controlled by company culture. A culture that values neither QA nor development will do this. So will a company under financial strain. Companies wax and wane constantly.
Manufacturing process builds identical widgets using standard equipment. Widgets are inspected to confirm they are within spec. Frequentist statistics are used to determine when widgets are consistently out of spec. When this happens, equipment is inspected and repaired as a corrective action. This process is well defined and linear.
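A minimal Python sketch of that loop, using a simplified Shewhart-style control chart: limits at the baseline mean plus or minus 3 standard deviations, plus one run rule (8 consecutive points on one side of the center line, as in the Western Electric rules). The measurements are made up for illustration:

```python
from statistics import mean, stdev


def control_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Lower limit, center line, and upper limit from an in-control baseline.

    Simplification: uses the sample standard deviation instead of the
    moving-range estimate a textbook individuals chart would use.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(samples: list[float], baseline: list[float], run_length: int = 8) -> list[int]:
    """Indices of samples that fall outside the 3-sigma limits, or that complete
    a run of `run_length` consecutive points on the same side of the center line."""
    lcl, center, ucl = control_limits(baseline)
    flagged = []
    run_side, run_count = 0, 0
    for i, x in enumerate(samples):
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 exactly on it
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, int(side != 0)
        if x < lcl or x > ucl or run_count >= run_length:
            flagged.append(i)
    return flagged


# Hypothetical widget lengths in mm, measured while the process was known to be good.
baseline = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.0]
# New production: one outlier, then a drift where every point is within limits but high.
samples = [9.95, 10.5, 9.9, 10.1, 10.2, 10.1, 10.15, 10.2, 10.1, 10.2, 10.15]
print(out_of_control(samples, baseline))  # [1, 10]
```

The second flag is the "consistently" case: no single measurement breaks a limit, but eight in a row on the high side is unlikely unless the process has shifted, which is the cue to go inspect the equipment.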
Software produces bespoke, non-standard widgets to address domain-specific needs. At the end of the day, software developers are defining a process for machines to follow. If you want to control quality in software development, then aside from having perfect domain knowledge for a particular project, the only way to do it is to define an arbitrary process for developers to follow and track adherence to it. This may have no impact, or it may be a hindrance. It will never add value, because it will never be abstract enough to be appropriate for every domain.
It's like saying communism isn't the problem, but that it's how every single group attempted to implement it that should be blamed.
Sure, maybe, but if nobody ever can implement the theoretical utopia, maybe we should talk of things humans can do instead and ditch the unimplementable idea.
QA cannot be done by a separate team the way you dream: it will always be a political buffer zone staffed by the cheapest half-competent people you can find, pushing good people out into dev or management. Either that, or you merge it into dev/solution design.
The reason is simple: just like with contract law, you only care about quality once you are in trouble and need to trace back the source of the issue to give the client a post-mortem. Otherwise, you care first about velocity, or $ input/hr of effort.
Around 2000, software QA (and almost all traditional QA activities) changed. The focus shifted to process over inspection.
"Design in quality, do not inspect it into the product"
Suppliers (to include software) were expected to manage the quality of the product they provided; the purchaser would focus on how they managed the process, not on the compliance of every part.
This had a chance until software process was tossed in the name of "agile".
I recall a bug I was involved with at a telecoms equipment maker in the early 2000s. The bug only showed up in our biggest base stations under high load. We diagnosed the bug, and there were a couple of parts to it. Sloppy software design in an optional hardware module (no state machine) was one part, and that was fixed. But there was another underlying issue in the way message queues were handled.
Anyhow, the fix for this was written. But we never got to put it into production. The reason: the company didn't have a lab test facility that could put sufficient load on the software to prove it. Even though field failures caused by this issue were giving us a bad rep, we couldn't fix it: the old code was known to be buggy, but we couldn't prove the new code. So the process said we couldn't ship it.
Another way of looking at that is that within the ability to test, the implementations were indistinguishable, so the process mandated that the older implementation must be used. I wonder if they would have explicitly specified age as a metric if this was considered when designing the process.
Here's a stupid question: How do you know your process is good unless you inspect it?
"Hey Bob I know you're a competent engineer, but don't worry about specifying a certain type of bolt or loctite, the untrained assembly personnel will figure it out. I'm sure they won't let 200 people die in a plane crash."
> Good software teams build quality into projects by introducing continuous integration, unit testing, creating feedback, and tightening these feedback loops.
Agreed, for good software teams.
I would contend that most software teams at most companies are not good.
Which is to ask, with an average to bad software team is it better to have integrated or separate QA?
If your devs aren't good, what are the chances of your QA team being good enough to make up for their shortcomings? The dynamics laid out by the parent comment will just hit even harder. Your best bet is to enforce basic practices like continuous integration, coverage goals, and maybe a coverage ratchet as a merge gate. Training and education in the areas where the team is weak is also a must.
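A coverage ratchet can be a very small script: compare the current coverage number with the best one recorded so far, fail the merge if it dropped, and raise the recorded bar if it improved. A minimal Python sketch, assuming the percentage has already been extracted from whatever coverage tool the team uses; the file name and JSON key are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def ratchet(current: float, baseline: float, tolerance: float = 0.0) -> tuple[bool, float]:
    """Return (passes_gate, new_baseline).

    The gate fails if coverage fell below the recorded baseline (minus an optional
    tolerance for noise). When coverage rises, the baseline moves up with it, so the
    bar can only ever go up.
    """
    passes = current + tolerance >= baseline
    new_baseline = max(baseline, current) if passes else baseline
    return passes, new_baseline


def gate(current: float, baseline_file: Path) -> bool:
    """Merge-gate wrapper: read the stored baseline, apply the ratchet, save it."""
    baseline = 0.0
    if baseline_file.exists():
        baseline = json.loads(baseline_file.read_text())["line_coverage_percent"]
    passes, new_baseline = ratchet(current, baseline)
    baseline_file.write_text(json.dumps({"line_coverage_percent": new_baseline}))
    return passes


with tempfile.TemporaryDirectory() as d:
    stored = Path(d) / "coverage_baseline.json"  # hypothetical file name
    print(gate(71.5, stored))  # True: first run records 71.5
    print(gate(74.0, stored))  # True: improvement, baseline ratchets to 74.0
    print(gate(72.0, stored))  # False: below 74.0, so the merge is blocked
```

Keeping the ratchet logic as a pure function separate from the file handling makes it easy to unit test; a small tolerance can absorb rounding noise between runs.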
Does it make sense to degrade the performance of good software teams because bad software teams exist?
Ideally we’d always have good software teams, but in the real world sometimes you have to build software with bad teams.
Leaders have options: they can do things like reduce scope, increase budget, extend the schedule, or abandon or cancel the project outright. These are all options available to leaders, but they require tradeoffs and decisions to be made on a project-by-project basis.
It is scalable to have a strict process that everyone has to follow, then impose a watchdog to enforce it on a wide scale. It may not be better to have separate QA, but it is easier for those in charge.
It makes the most sense to me to match the org structure to the teams you have.
If I'm trying to build something with undertrained, demoralized, underpaid engineers... it's not optimal to use methods intended for self-motivated, high-performance teams.
And nothing says there must be company-wide mandates. Maybe this area gets a formal, independent QA team, but this other area doesn't.
My experience just doesn't bear out that collapsing the QA function into development always leads to better outcomes.
I've seen the opposite happen too often, and QA be the sole bulwark between idiocy and customers.
> Does it make sense to degrade the performance of good software teams because bad software teams exist?
Consider the classic statistic "most drivers think they are above average".
I posit that the same is true of software teams, almost every team will self-assess as above average, i.e. good. Those teams will then imagine that, being good, they build quality into the process and very little verification QA is done.
I have worked as a software consultant for 15 years now. I've worked with at least 40 separate software teams in that time. Every single team manager would give a pep talk along the lines of "this is the best team I've ever seen". Some of this is obviously blowing smoke to get people to work harder and feel good. But over the years I've had candid conversations with managers and realized that most of the time they genuinely think their team is really good, truly top 10-20%.
Here's the rub. Being a consultant, I'm almost always brought in by higher level management because something is going horribly wrong. The team can't deliver quickly. The software they deliver is bug ridden. They routinely deliver the wrong software (i.e. incorrect interpretation of requirements.)
Oftentimes these problems are not only the fault of the development team; management has issues too. But in every single case, the development team is in dire straits. They have continuous integration, sure, and unit tests, and nightly builds, and lots of green check marks. But the unit tests test that the test works. The stress tests have no basis in reality for the expected load. The continuous integration system builds software, but it can't be deployed in that form for x, y & z reasons, so production has a special build system, etc...
In 15 years I have never once encountered a team that would not benefit from a QA team doing boring, old school, black box manual testing. And the teams that most adamantly refuse to accept that reality are precisely those that think they are really top tier because they have 90+% unit test coverage, use agile and do nightly builds.
So, my question is, do you (I don't mean the specific "you" here, rather everyone should ask themselves this, all the time) think that most bad software teams know they are bad? Including the one you are part of? Would it really hurt to have some ye olde QA, just in case, you know, you are actually just average? :)
I'm curious: in your many years of being a consultant to these bad teams, where the manager really thought they were top 20%, did you get a chance to talk to the rank-and-file team members, and did they paint a very different picture of the team health and software quality than their manager?
Also, did you run across any orgs where they basically refused to use a process like Agile, and instead just did ad-hoc coding, insisting that this was the best way since it worked just fine for them back when they were a 5-person startup?
Not parent, but in my experience as a consultant working with bad teams, the rank and file were 'doing the job.'
You usually had a few personality archetypes:
- The most technical dev on the team, always with a chip on their shoulder and serious personality issues, who had decided to settle for this job for (reasons)
- The vastly undertrained dev who was trying to keep up with the rest of the team, but would eventually be found out and tossed, usually to blame for a major issue
- The earnest and surprisingly competent meek dev, who presumably didn't have enough confidence to apply to a better job, but easily could have made it on merit, work ethic, and skill
- The over-confident dev who read a bit of SDLC practice, and could see every tree while missing the forest
The key is that, aside from the incompetent person, they had all been working there for a while. Consequently, there wasn't good or bad health and quality: there was just "the system" (at that company) and dealing with it.
And none of these folks ever worked at 5-person startups. ;) I think it was definitely more an issue of SDLC "unknown unknowns" they should be doing, than willful decisions not to.
> I'm curious: in your many years of being a consultant to these bad teams, where the manager really thought they were top 20%, did you get a chance to talk to the rank-and-file team members, and did they paint a very different picture of the team health and software quality than their manager?
Yes, generally I join teams and work as an engineer or sometimes as a team lead, so I'm talking to all the team members.
Most startup teams are composed of junior developers, often pretty smart people, usually with 5 or fewer years of experience. Many times these are people who have already accomplished things they didn't think they could do, so yes, they generally think pretty highly of themselves. To a degree it is quite justifiable: they tend to be very accomplished, but in a narrow domain. Unfortunately, they don't realize that their technical accomplishments in a specific field do not make them experts everywhere. Their managers understand that these are smart people and therefore assume that this is a good team.
Non-startups that I join are usually just plain dysfunctional.
> Also, did you run across any orgs where they basically refused to use a process like Agile, and instead just did ad-hoc coding, insisting that this was the best way since it worked just fine for them back when they were a 5-person startup?
Usually more the opposite. In my experience I come across teams that are sure they must not need any help because they follow all the rules in Scrum and have great code coverage metrics.
It is really common to see this kind of thing. I call it "the proxy endpoint fallacy". It can crop up anywhere that there is something that can be measured. In that example, it would be confusing adherence to Scrum with having a working SDLC or perhaps confusing code coverage metrics with the objective of having bug-free releases.
This isn't a software only fallacy. In politics, GDP is often confused with societal well-being. Always be wary of your metrics and change them as required to keep you tracking your actual goals.
Depending on the shape of the distribution, most drivers could be above average. Average doesn't imply 50th percentile, that's what the median is for. A minority of tremendously poor drivers could certainly mean that most drivers are in fact better than average, in the same way that my friends on average have more friends than I do.
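A toy illustration with made-up numbers, using accidents per driver as the measure (fewer is better): one very bad driver pulls the mean up, so nine of the ten drivers really are better than average.

```python
from statistics import mean, median

# Hypothetical accident counts: most drivers have none, one has many.
accidents = [0] * 8 + [1, 11]

avg = mean(accidents)    # 1.2
mid = median(accidents)  # 0.0
better_than_average = sum(1 for a in accidents if a < avg)

print(avg, mid, better_than_average)  # 1.2 0.0 9
```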
I'm not going to argue with the general thrust of your comment, which I think is insightful as to how incentives can compromise objectives. But...
> To give an example of how these dynamics destroy quality in software. I once found a bug in software on a piece of test equipment where a logarithmic search function was not operating on a strictly sorted list. When I pointed this out to my leadership I was told that if we changed any part of code, it would require a new FQT, which would be too expensive to conduct and was not in the budget. Although the bug would have been trivial to solve, and was clearly wrong and would not provide any benefits by remaining in the test equipment software, the process required for changes prevented solving the issue.
I've seen this happen where it was a bad thing, but also where it was a good thing.
It's all about risk.
What risk does the software defect pose to the mission? What risk is inherent in making any change to the software? Noting that even trivial changes can be fat-fingered and thus are a source of risk. I've seen it go wrong this way: a seemingly trivial change was made, but the developer accidentally checked an extra file into source control, causing a further defect.
And then: what is the cost of mitigating these risks? Maybe the software defect is as trivial as its fix. Maybe an acceptable fix would be to write up a workaround in the documentation.
I don't think it's always wrong to say no to fixing issues. I also don't think it's always right that a separate QA department contributes nothing to the organization, even if they act as a handbrake on the software developers (sometimes, precisely because they do that). Human factors are real.
> Good software teams build quality into projects by introducing continuous integration, unit testing, creating feedback, and tightening these feedback loops.
No. Good software teams are led by competent, technical management. Managers who aren't afraid to get down into the dirty details. Managers who aren't afraid to roll up their sleeves and write code if they need to.
The process doesn't matter. The management of what is or is not important does. Agile is just one process out of many.
Imagine an accounting team led by someone who never did accounting in their life: "Just make the numbers work out! I don't care how you do it! My bonus is at stake!"
Sigh... This myth that the only people who can competently manage developers are other developers has been floating round for decades.
For some reason, developers seem remarkably blind to the skills other roles and disciplines require. Only a developer can do that, everyone else is basically useless fluff. Maybe it's a form of arrogance or just deep unself-awareness.
Let's apply your reasoning to medicine. I'm sure you would be completely fine with managers telling your surgeon what parts of the surgery can be 'optimized away'.
Hahah, indeed. So have you seen a law department in a company headed by someone who doesn't come from law background? How about a finance department headed by some schmuck who doesn't know anything about finance?
I've seen plenty of departments managed by people who don't come from the background of the department. My current boss is extremely good and came from a different discipline.
Although I don't deny it can help to have the background, it is not necessary to be a good manager of something. Also seen plenty of good techies promoted to management and failing badly.
This is a lie, and you know it. Even a mere idea about lawyers being managed by a non-lawyer would be laughed at. Same with finance, nobody would be stupid enough to even try it
I disagree with you. You are stating it, but you are not giving reasons. Managers who weren’t developers tend to not be able to manage the team. They can’t help with or understand the technical decisions made. The non-technical managers tend to be project managers just focused on dates.
We may be talking about different levels of management here. A manager should not be making technical decisions, they should have team leads and architects who do that. It's their job to manage the team, interface with the business, prioritise work and give cover to the team so they can get on with it.
I guess if you have a manager who is making technical decisions, they are really a hybrid manager/contributor role. Maybe that works better in smaller organisations.
Then what exactly is the non technical manager's added value?
He has no experience to lead the team in high pressure situations. Like production being down.
He can't truly have a first person understanding of the work of the people who he manages. He has to rely upon others to tell him who's good and who's bad. That sets up a pecking order.
He can't help or mentor engineers with design decisions, or provide a historical context.
He doesn't understand the technology so there's an immediate communication and knowledge barrier that has to be overcome between him and his directs.
He doesn't feel the pain of a bad decision, because he's not coding it, and he can't empathize with the engineers since he doesn't code.
He tends to push feature development without fixing technical debt. Again that's pain he personally doesn't feel.
To me, the role you are describing is a principal engineer or team lead, not a manager.
Simply not true however that a good manager can't lead the team in a high pressure situation. I'd say that exactly what a good manager could do well. Obviously they won't be making overtly technical decisions, that's what you are for. They can make business decisions, provide cover, get resources, communicate to other stakeholders... All the bits that need doing but would be a huge hassle for the techies who are trying to fix the issue.
Not everything of value is technical. That you don't see the value is either because you have great managers shielding you from having to deal with all that, or because you have always worked in places that combine management and technical responsibilities (which I never have, except at very small companies).
I wouldn't call what I'm describing as product management, although it's possible they could do general management too.
> Good software teams are led by competent, technical management.
...or perhaps with no managers at all. I'm less and less convinced of the importance of management in engineering except to give investors an illusion of control.
I sort of agree, and I do think it’s possible depending on the team. But unfortunately developers can be too opinionated and get focused on low priority things.
Why not both? Am I missing something? You can have feedback loops and CI and all that, "good craftsmanship" or "good practices" (not "best" practices because those often suck hah), where of course opinions vary on the details of that -- and then someone who is also good at the craft who spends more or most time on helping the rest work together, i.e. manage/lead them.
I’d bet the children would come out better simply because they have parents who are likely multi-disciplined as a group. A disparate group will (almost) always come up with better results than a homogeneous one (at least in my experience).
I think you're sort of misunderstanding the role of QA.
You think that QA is a liability shield, but that is only a side effect of the work that they actually do.
The task of QA is exactly that: an entity that tries to assure that the quality is up to some standard.
Even in favourable conditions mistakes happen, so how do you, as a company, make sure that 1 in every 100 products isn't faulty and tarnishing the good reputation your company has spent so much time and money building? You hire QA to make sure problems get caught before delivery.
But if all humans make mistakes, and QA is human, how do you make sure that QA doesn't make a mistake? A never-ending chain of QAs inspecting each other?
No, of course not. One thing that helps reduce errors is a rigid protocol that is followed to the letter every time. Pilots, for example, have a preflight checklist that they have to run every time they operate the plane.
The rigid protocol of QA teams is therefore an essential part of their jobs.
Although from your standpoint as a developer it might seem strange that QA is 'preventing' you from fixing a bug, it is actually very reasonable.
Especially since you work in the defence industry, I hope you understand that it is very important that the software that operates radars, planes, missiles, bombs, etc is working exactly as expected. Understandably there is a great deal of effort made to assure that when those things are needed they work exactly to spec.
So in your example it is probably very reasonable that any change you make needs to go through some rigorous process. The fact that it 'only' concerned test equipment doesn't matter, because test equipment is just as important as, if not more important than, the stuff it tests.
The reason QA has the side effect of being a 'liability shield' is that it gives companies the ability to argue (and prove) after the fact that they did their due diligence in making sure the product was to spec.
Certification in particular is basically getting an external organisation to approve your QA. In that case, if you get sued, you can rightfully claim that you did everything that was legally asked of you, and if there is blame, it lies with the certifying company for using insufficient standards.
I’m regularly critical of Boeing Defense (particularly space contracts where I’m a huge Boeing skeptic), but I think people are pretty off base if they think Boeing is just completely incompetent.
Airliner safety is insanely good. Just vast seas of competence, but when there’s a super rare failure, people get the incorrect impression that Boeing (or Airbus) is just full of incompetence. Almost nothing that humans do is held to the same standard. Not spaceflight, not software, not healthcare, and certainly not automotive.
Flying a 737 Max with a bad door and without the fix to the angle of attack sensor is probably still better per mile than driving. In spite of going at 10 times the speed and miles above the Earth.
You can almost argue it’s held to a higher standard than it should, slowing development of cleaner aviation (and therefore killing more people in the future due to tertiary effects of climate change, etc).
It kind of annoys me when comment sections are filled with people talking about how incompetent Boeing is. It feels like out-of-shape slobs in their La-Z-Boy chairs talking about how incompetent or slow some professional sports players are. Like, airliner safety is just a totally different league than almost anyone else plays in. On their worst day, they're better than almost anyone else is on their best.
Because I dug it up for another comment, commercial carriers operating under Part 121 (roughly: scheduled passenger and cargo operation) had 4 fatal incidents in the last 10 years. [0]
Totalling 6 deaths.
In 10 years of US commercial carrier aviation.
One of those was literally 'the engine exploded and threw part of the turbine into the cabin (and also shredded some of the wing)'!!
Which resulted in 1 person dying and a successful landing.
Yeah, but your sample size is way too small to measure the death rate. Aircraft deaths are rare, but flying is rare too.
The two MAX 8s that fell from the sky were 100% Boeing's fault and could have happened in the US. If 5% of airline traffic is in the US, you can renormalize those hundreds of dead and you get dozens dead.
We know US pilots had been warning about the same issues that later led to the deadly crashes, but they were ignored. The thing is, part of why US commercial aviation is so safe is that a lot of the pilots responsible for jet airliners are ex-military. Someone mentioned Southwest Airlines Flight 1380: yup, Captain Tammie Jo Shults was one of the first female fighter pilots in the Navy. Miracle on the Hudson? Sully Sullenberger was an Air Force captain and training officer. Civilian training, no matter how good, is just no replacement for military training and experience.
I can't find specific numbers but estimates say about one in three has a military background. That's an awful lot.
Let's assume American pilots are gods. They were shouting that their crafts were unsafe.
No matter how good and how prescient they are, that doesn't help them if the aircraft computer decides it's stalling, forces a nose down and they can't fight the controls.
But, even if we assume omnipotence from these American pilot gods, and assume they can fly outside the bird and Superman-style catch it, they are still only 30% of American pilots. Just another population to normalize out.
> forces a nose down and they can't fight the controls.
But that's not what happened. According to every report, it is possible to take back control, it's just very much not intuitive and the situation was confusing.
According to the Seattle Times
> However on both accident flights, the angle-of-attack sensor failure set off multiple alerts causing distraction and confusion from the moment of takeoff, even before MCAS kicked in.
> On the Ethiopian Airlines flight, for example, a “stick shaker” noisily vibrated the pilot’s control column throughout the flight, warning the plane was in danger of a stall, which it wasn’t; a computerized voice repeating a loud “Don’t sink!” warned that the jet was too close to the ground; a “clacker” making a very loud clicking sound signaled the jet was going too fast; and multiple warning lights told the crew that the speed, altitude and other readings on their instruments were unreliable.
> I manually positioned the thrust levers ASAP. This resolved the threat
Then there's
> B737 MAX First Officer reported that the aircraft pitched nose down after engaging autopilot on departure. Autopilot was disconnected and flight continued to destination
> I called "descending" just prior to the GPWS sounding "don't sink, don't sink." The Captain immediately disconnected the autopilot and pitched into a climb
Another
> Takeoff and climb in light to moderate turbulence. After flaps 1 to "up" and above clean "MASI up speed" with LNAV engaged I looked at and engaged A Autopilot. As I was returning to my PFD (Primary Flight Display) PM (Pilot Monitoring) called "DESCENDING" followed by almost an immediate: "DONT SINK DONT SINK!"
> I immediately disconnected AP (Autopilot) (it WAS engaged as we got full horn etc.) and resumed climb
> I can't find specific numbers but estimates say about one in three has a military background. That's an awful lot.
Not surprising given that pilot training is really, really expensive. Airlines love former military pilots because they are a significantly lower financial risk. Put them through a type rating and off they go; it's rare that one ends up a dud.
> It kind of annoys me when comment sections are filled with people talking about how incompetent Boeing is. It feels like out of shape slobs on their La-Z-boy chairs talking about how incompetent or slow some professional sports players are.
People do this with everything, though, and air travel induces a large amount of fear in the populace. Not only are we generally uncomfortable flying through the air, for obvious reasons, but when we do, almost everyone has to cede control to a few people in the cockpit and on the ground. Driving, even if far more dangerous, affords the illusion of control over one's outcome: you are driving, or someone you know is, you control the vehicle's maintenance, etc., and you are familiar with how the vehicle is controlled and how it works. None of this exists with airplanes for the vast majority of people.
So, you can see why there is a need to find a human component to air travel problems, because that is something one can fix (fire the incompetent people, fine them, whatever), as opposed to all of the other things which must be accepted or rejected entirely.
It is entirely in line with human nature to do this, regardless of its accuracy or effectiveness.
> Airliner safety is insanely good. Just vast seas of competence, but when there’s a super rare failure, the incorrect impression people get is that Boeing (or Airbus) is just full of incompetency. Almost nothing that humans do is held to the same standard. Not spaceflight, not software, not healthcare, and certainly not automotive.
And there are good reasons for that. Spaceflight actually is regulated pretty strictly (partially because any spaceworthy rocket is effectively a missile), and space pilots and tourists alike sign up for such missions fully knowing that they have a very significant chance of dying one way or another. There simply hasn't been enough human spaceflight activity to work out and understand all the failure modes, unlike with other forms of transportation.
Humans, unlike birds, aren't naturally wired to travel by air, so they need to trust their lives to someone else behaving as they should to a significantly higher degree, because unlike in a car they have zero control (or even the illusion of control) in an aircraft.
Additionally, the inherent security risk of an airliner is very high: what is a widebody airplane at its core? Hundreds of tons of weight, a decent portion of which is fuel, propelled at near-supersonic speed, with only two people in control of it. If anything goes bonkers, you can get thousands of people killed and injured (see 9/11).
In contrast, cars, even trucks, have far less capability to cause damage simply because they weigh so much less. The only thing that comes close is railways, and hell, I don't get what the US is doing there; there's barely any regulation compared to European standards (see the videos I linked at https://news.ycombinator.com/item?id=38725988).
Being better than driving shouldn't be the standard, especially driving in the US.
I would assume flying isn't safer than trains.
Flying has the advantage of being separated from almost everything else. Most accidents happen where there is mixed traffic, especially with cars operated by people with minimal training.
Is it plausible that Boeing has "learned" from software/startup/venture-capital culture with regards to tolerating higher risk to minimize costs?
I suspect it's rather a case of parallel evolution between McDonnell Douglas brass and software startup culture, since cost-cutting culture goes back many decades (remember "Chainsaw" Al Dunlap[1]?), but I wonder if there's a more direct influence.
Here's a Netflix documentary (in the wake of the MCAS crashes) that alleges that after the merger with McDonnell Douglas, the culture of the firm changed. Previously dominated by engineers, it was now dominated by MBAs with a focus on profit and shareholder value.
"With impressive clarity, Downfall: The Case Against Boeing reveals corporate corruption that's enraging in its callousness and frightening in its scope."
Boeing airliners are much safer now than before the merger with McDonnell Douglas (because basically all airliners are). And I say that as a regular Boeing critic.
In lots of ways, the "learning" there would just be "capitalism".
It's inherently short-sighted unless forced to do otherwise by legislation. Cutting small corners pays off A LOT until the hammer falls, so there's a massive advantage to doing it, and you need to do it if your competition is doing it, or you eventually shut down as they take all your business.
It's inherently a race to the bottom. Sometimes that's a net gain, sometimes it isn't.
All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.
Other systems had incentives such as: get it running by such-and-such a date or have yourself and your relatives sent to an inhospitable place. So people rushed flawed designs into production.
That said, upper management at Boeing needs a shake-up. People need to get fired. They need to do what Intel is trying and that is to get more engineers in charge, or at least grant them veto power on designs.
> All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.
It should be a lesson against dogmatic pursuit of absolutes: capitalism comes in a wide range of flavors, and the worst is if it’s completely unrestrained. Communism produced worse and worse results the further it got from any sort of public accountability, etc.
The two problems I see are that the concept of nuance is somewhat at odds with having a simple concept to teach kids at school, and that there’s always a group which is more motivated to game the system than the average person, who really just wants to hang out with their friends, raise a family, etc. rather than play political games. Boeing didn’t start it by any means, but they’ve benefited enormously from decades of reduced oversight and elevated pay, driven by a sort of cartoon libertarianism where letting people get enormously rich will motivate them to build great things unfettered by “red tape”.
> All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.
They have, but post-Thatcher neoliberal capitalism has taken the existing perverse incentives and made them exponentially worse. We're on a course heading straight to feudalism, just with fancy titles and legal rights replaced by economic might.
Yep, Chernobyl being a prime example. Or Komarov's failed re-entry after he complained about the design faults of the vehicle long before launch. Then there was the more, uhm, run-of-the-mill backyard blast furnace campaign, which contributed to a misallocation of the workforce that then led to mass starvation.
There are many many more examples. I find it so tiresome to see young people just use capitalism as a catch all for the failure of something. It's such a lazy and uninformed argument.
I'm not carte blanche defending capitalism; it's a mixed bag, but it sure outpaces the competing systems put forward to date. It does need stronger safeguards against industry self-regulation, which has a bad track record.
I think we're on the same page. Economic systems need failsafes so that they don't suffer from positive feedback loops.
What anti-capitalist sympathizers, in my view, don't realize is that this is due to people being in the loop. These economic systems are merely vehicles, some better than others, but the conductors are people, be they communists or capitalists. At least with capitalism there is a delayed regulator (negative feedback); in communism it's up to the system to decide if it needs to modify itself.
That adage is okay, but for it to work, not everything can be forgiven; there actually has to be an expectation of being held responsible for acting in good faith.
Cockpit resource management is also something a lot of industries can learn from, as is human error analysis. How an error came to be is often much more interesting than the personal shortcomings of the person who caused it.
At its best, software QA and related methods should be on par with airliner manufacturing.
Think of railway signalling systems, the control-by-wire bits of modern cars, medical equipment, etc., where the design of the software is formally proven and the implementation is verified to ensure it matches the design.
Would you pay at least 2x for your software to have a couple more nines of reliability? I’m gonna guess “no”. Places where shipping a bug to the end customer costs $$$ (e.g. phones), or where there are regulatory requirements, still have dedicated QA.
Which is what I said in the second part of my comment. For most software businesses, the cost of shipping a bug is trivial and/or poorly measured, so, due to the McNamara fallacy, it is readily exchanged for the well-measured cost of having a functioning QA team.
Of course, but most of us aren't working on products where a quality problem would kill hundreds of people. Having aircraft-level QA would be plain silly; you don't expect that level of quality from most other industries, e.g. guitar manufacturing, do you?
The problem was not really the software in isolation, but that pilots expected the 737 MAX to behave exactly like the old version, because Boeing decided it was too expensive to retrain pilots.
The problem was software that prioritized input from a faulty external sensor over pilot control, and literally crashed planes directly into the ground. At a certain step in the sequence it was not physically possible for a pilot to pull hard enough on the control element to counteract the software. Could they have disabled the system? Only if they could figure out the specific software trying to crash the plane.
Is that what you meant by "the problem wasn't the software?" Because the pilots should have been trained to unplug the computer to stop it from crashing the plane?
Pilots should (are supposed to) disable the auto-trim if it's doing something uncommanded/unexpected. Runaway trim can happen for reasons other than faulty software. MCAS was a new factor and they should have been told about it, I don't dispute that at all.
Here we are again, this misconception just won't die.
In the 737 MAX, the only way to disable auto-trim also disables powered trim (the thumb buttons). As the grandparent says, at a certain step in the sequence it was not physically possible for a pilot to trim the plane back to stability manually. It simply can't be done.
In the 737 NG, there was a button to do just that. That would have been useful.
And that's even ignoring the fact that all symptoms were very different from those present in a runaway trim situation as described in the manual and learned by the pilots.
The thumb buttons would override MCAS. But then you'd have to disable the trim motors and trim manually (by hand-cranking a wheel). That part was not clearly understood by the pilots, because they were not told about MCAS.
How are pilots expected to disable a malfunctioning MCAS in an emergency and, in the middle of emergency procedures, manually balance by trial and error the aerodynamic quirks of such an unbalanced aircraft's angle of attack?
The parent commenter is making a point about time.
Can the aircraft be certified without MCAS?
From what I read, MCAS is there to avoid an aerodynamic stall when the aircraft approaches a high angle of attack, because it uses larger engines than the classic 737 was designed for. It's balancing an unbalanced aircraft by using software to repeatedly adjust the horizontal stabilizer.
It is not my field, but I'm not even sure it should be called trimming; it sounds like a euphemism for what's going on.
The manufacturer put in larger engines than the aircraft was designed for. They did it to avoid all the certification (homologation) approvals and design costs involved in bringing a new aircraft to market with the appropriate tolerances, and to compete in time with another company's aircraft (loss of sales).
They introduced MCAS to balance in software a hardware issue, a big, negligent design issue that can lead to stalling. It goes beyond trimming an aircraft, and because of this, the scale of the values the algorithm manages is very different from ordinary trimming.
It is not my field, but I think it is not a simple factor, and it should not have been put on the pilots as if it were a normal aircraft that received a simple update. Every pilot flying that plane should have been warned that it was not a classic plane with a classic update.
If this type of behaviour by aircraft manufacturers, putting costs over safety, becomes the norm, we as passengers will suffer for it, as other passengers unfortunately already have, while they blame the pilots. On top of that, China's aircraft manufacturing industry now wants to enter the global market. A few days ago I read that they want certification approval to enter the European Union.
PS: They also cut costs by retiring backup sensors, delegating responsibility for a system made vital by MCAS to the buyer as if it were an unimportant feature; disaster was the order of the day. And the spending cuts were not limited to that, as we have seen in recent days.
> They introduced MCAS to balance in software a hardware issue, a big, negligent design issue that can lead to stalling.
> Every pilot flying that plane should have been warned that it was not a classic plane with a classic update.
I meant:
> They introduced MCAS to use software to attempt to balance an aerodynamically unbalanced aircraft with a high stall tendency, in order to avoid designing a new aircraft.
> Any pilot flying that aircraft should have been warned that it was a plane that didn't want to fly aerodynamically, with software forcing it to fly without backup redundancy. It was not mere trimming.
Even more ridiculous, Boeing offered a second source of truth option, but marked it as an upcharge, which the airlines in question rejected. "No thanks, no need for a second AoA sensor, one is none is probably fine!"
Additionally, two feels like a really strange number. I would think three for a tiebreaker would be standard for any sensor with that much impact (no pun intended).
Yes, two would be very ill-advised. I think there was some incident where a plane had only two pitot tubes and of course this caused problems. Or... I may have simply misremembered pitot tubes instead of AOA sensors in the Boeing case.
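The "three for a tiebreaker" point can be made concrete. Below is a minimal Python sketch, assuming a simple mid-value-select voter and a hypothetical agreement tolerance; the function names and sensor readings are illustrative only, and this is not a description of how the 737's flight computers actually process angle-of-attack data. It shows that two sensors can detect a disagreement but cannot say which one is wrong, while three sensors let a median vote mask a single failed unit.

```python
from statistics import median
from typing import Optional


def vote_two(a: float, b: float, tolerance: float) -> Optional[float]:
    """Hypothetical two-sensor voter.

    If the readings agree within `tolerance`, return their average.
    If they disagree, there is no way to tell which one is wrong, so the
    best this voter can do is flag the disagreement by returning None.
    """
    if abs(a - b) > tolerance:
        return None  # disagreement detected, but not resolvable
    return (a + b) / 2


def vote_three(a: float, b: float, c: float) -> float:
    """Hypothetical three-sensor voter using mid-value select.

    The median ignores the highest and lowest readings, so a single
    sensor stuck at an extreme value is outvoted by the other two.
    """
    return median([a, b, c])


if __name__ == "__main__":
    # Illustrative angle-of-attack readings in degrees: two healthy
    # sensors near 5, one failed sensor stuck at 25.
    healthy_1, healthy_2, failed = 5.0, 5.3, 25.0

    print(vote_two(healthy_1, failed, tolerance=0.5))  # disagreement only
    print(vote_three(healthy_1, failed, healthy_2))    # healthy value wins
```

With these made-up numbers, the two-sensor voter can only report that something is wrong, while the three-sensor voter returns 5.3, one of the healthy readings. Mid-value select only masks a single fault: if two sensors fail the same way, they outvote the good one.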
This wasn’t related to autopilot and they removed mention of the MCAS system from the documentation to support the main selling point of the 737 MAX, which was that existing 737 pilots would be able to switch easily without recertification. They knew that they’d lose most sales to Airbus if the aircraft were compared on their merits so they were banking hard on their huge pool of certified pilots as the competitive edge.
If you listen to podcasts, these two episodes of Causality are excellent:
Ha, playing hardball! I wonder whether you’d find pilots who are Boeing loyalists who’d take offense, or if those guys are even madder at the current management for letting them down.
>> Because the pilots should have been trained to unplug the computer to stop it from crashing the plane?
Yes.
The fault lies with the airlines because I don't for a second believe they didn't put pressure on Boeing to get the MAX certified without mandating retraining.
And then once that was done, didn't dig into the details too hard about what changes were made.
I have a low tolerance for 'I set up all the conditions and incentives to encourage you to break the law... but you should take all the blame when it explodes.'
At some point, the customer has to take some responsibility for what they asked for.
It’s easier to blame Boeing because they made the damn thing and its documentation. We know for a fact they are at fault. Some or all of the airlines may or may not have put pressure on Boeing.
The expense for retraining pilots falls on the airline.
Retraining has its own problems. No matter how well retraining is done, pilots still make mistakes from doing the right thing for the previous plane that is the wrong thing for the one they are currently flying.
Adjusting airplanes to fly the same way is a major safety advantage.
Arguably, Boeing hit the uncanny safety valley -- similar enough so that pilots and airlines relaxed, but different enough so that relaxation ultimately killed people.
The emergency procedure for runaway trim was the same for both aircraft types, and was not followed. After the first crash, an Emergency Airworthiness Directive was issued to all MAX pilots reiterating the procedure, which was not followed in the second crash, as well as not reacting to an overspeed warning.
Unreported by the media, there was another MAX incident before the first crash. The crew had no knowledge of MCAS, but did follow the emergency runaway trim procedure, and continued the flight and landed safely.
"Runaway stab trim". It is a memory item, every pilot should be able to perform it from memory.
Turn off the motor, and the trim is manual. There is a crank right there in the cockpit. If it is too hard to turn, change the aircraft configuration to reduce the forces required. Pilots know how to do this. This is pilot stuff; they understand the forces on the flight controls and what impacts them.
Boeing made an engineering mistake. The pilots also made an operational mistake. Unfortunately, both mistakes at the same time were fatal.
I pray that pilot training has improved. And that Boeing has made systems level changes to the aircraft that will preclude it happening in the future.
And that is how aviation becomes safer every year, at a significant cost in customers' lives.
> The probable cause of this accident was the inappropriate response by the first officer as the pilot flying to an inadvertent activation of the go-around mode, which led to his spatial disorientation and nose-down control inputs that placed the airplane in a steep descent from which the crew did not recover.
The fatal accident count is higher for GA, but I didn't normalize against flight hours or flights, just glanced at it.
I'm sure there's been a study somewhere that attempts to untangle all the factors that differ between commercial carriers and GA, to see which safety is most sensitive to -- continuous highly professional maintenance, highly trained and experienced crew, rigorous airliner certification regime, etc.
The electric trim switches override MCAS. This was explained in the Emergency Airworthiness Directive sent to all MAX pilots after the first crash.
Also, overspeeding the airplane makes it much harder to turn the manual trim wheel. The cockpit voice recorder on the EA flight recorded the overspeed warning horn, which the crew did nothing about (they were at full power, should have pulled the throttles back).
The LA crew restored normal trim twenty-five times before crashing. What they never did was turn it off after restoring normal trim.
If a pilot can't be expected to maintain the pitch of a plane on takeoff, he has no business flying ANYTHING.
What Boeing did (and is STILL doing) is expect pilots to know or remember obscure NON-PILOTAGE (and in the case of MCAS, BURIED) trivia to prevent disaster.
Now... what's the more-responsible approach? Expect pilots to pilot, or expect them to recall an ever-growing list of workarounds to incompetent system design?
The whole MCAS was just an unnecessary feature (a bug fix).
Without it the plane would have worked just fine. The pilots would just have had to go through some training scenarios to get certified on how the MAX plane flies.
Exactly. Unless the "upward-pitching tendency" under high power is extreme, any competent pilot should be able to keep the plane's attitude as desired.
Nah, in the software world, the truth is QA is where the people who can't get jobs as programmers end up. I've seen testers go on to become programmers, but I've never seen a programmer become a tester. Maybe it's different for real-time or life-critical systems, sure, but I can confidently say this is how it is in web development.
> Software QA when actually practiced is more advanced now than airline QA.
...eh, I think "when actually practiced" is doing a lot of carrying there.
What do you mean by "actually practiced"?
Outside of the aerospace and healthcare industries, I'm not sure there are many software shops that are doing QA to a level I would trust anyone's life with.
What does "advanced" mean when comparing things so unlike each other?
Also, software is the least likely comparison I would have made; software quality is a shit-show on a general level, and the vast public is quite aware of this every time a subway timeboard blue-screens or gets frozen on an AMI screen, or the POS machine they're forced to interact with at work does something equally stupid.
Given everything I've seen so far, I'd bet good money that what happened here was miscommunication between Spirit and Boeing. Spirit started out locking down the plug, then Boeing asked them to just loosely attach it[1] so Boeing could yank the plug for interior/wiring/AC/paint, then someone at Boeing forgot about the "loosely". So now, they get in a hurry (maybe the AC/interior didn't need any access to work on, which makes sense for this MAX variant, it wouldn't need as many hatches to pull wire) and it went down the Renton line as if the plug was fully installed. It's enough to pass high blow inspection and other inspections, but then over time that "shipment config" attachment vibrated out, and pop goes the plug.
Almost certainly systemic issue though, so that sucks. Sucks real bad.
They need to get a Tiger Team or whatever together to look at everything with a shipment config, and make sure those "ship kits" don't leak into the real, actual airplane configuration. This is... OK, this is really manufacturing 101 stuff, but, well, things happen.
I'm in the industry, but haven't touched the MAX, so take this with a grain of salt.
Bolts are most likely tightened with a torque wrench or a gun that is set to a torque spec. Over tightening a bolt is as bad as a loose bolt.
I speculate these passed QA at Boeing because they might have been correctly torqued to spec. What happens in the field is hard to understand. One possibility is that proximity to the engine causes extreme vibrations, which can loosen them.
Other possibility is the maintenance side of things - maybe a badly calibrated torque wrench could be the reason.
Mechanical systems are not inherently immutable.
I would expect lock wire or some other method of ensuring the bolt does not un-torque itself. Especially for bolts that are not required to be removed past final assembly...
Nope. It is most probably caused by operational stress: the rudder assembly is moving, and the fuselage is also working (pressurization and depressurization cycles on takeoff and landing, thermal expansion and contraction). I bet they don't just put red Loctite on it to keep it from getting loose. My bet is design flaws, not manufacturing or QA.
EDIT: I saw the pictures of bolts with pins and bolts without pins. The ones with pins cannot get loose, the others can. Let's see what happened.
I don't know about you, but in my industry, "QA" also means extensive testing to ensure that part/assembly/etc doesn't break with expected operations. So, yeah, from where I'm standing, this was a QA problem. Something did not get checked or tested as it probably should have.
The 4 restraining nuts and bolts on the door have a cotter-pin-like mechanism to prevent them from loosening. If assembled correctly, they cannot loosen unless the pin fails.
They are not related. Probably different types of bolts, for sure different stress types. Rudder assembly is a moving part, these false door panels are not.
It's not a control surface, but it is a "moving part." That's what's baffling to me, that they spent a lot of effort building this hinge and pin roller system, and designed the door to hinge open up to 15 degrees.
It makes me wonder if there's maintenance procedures that at some point would require the operation of that door to successfully complete. Otherwise, the mechanism itself seems so incredibly overwrought, with lots of additional bolts, castle nuts, retaining pins, and even sprung hinges at the bottom.
Does anyone know why this "plug-type non-plug door" is built this way?
It needs to be usable depending on how many passengers the interior is configured for.
So it has all of the door bits there. Maybe some parts like the emergency escape slide are not installed.
Edit: I should be clear that it's not usable as an emergency exit, as configured by Alaska. However, the operator could choose to activate it later and install a usable exit.
If you are correct, then the implication is that the concern extends beyond door plugs for MAX-9 737s to all emergency exit doors on all models of aircraft sharing this design. This is somewhat reminiscent of the huge problem with the 688 (Los Angeles) class submarines, where the discovery of a faulty weld that had passed inspections raised doubts about all welds.
This is not correct. To the passengers, this just looks like another seat next to a window with a plug installed. It's not a door.
If there was a reconfiguration to a seating standard that required the extra exit, the plug would be removed and a proper door would be installed, with the associated interior pieces.
> To the passengers, this just looks like another seat next to a window with a plug installed. It's not a door.
This is true.
However there's still common hardware in there to allow the plug to be installed and maintained. This is why it's a complicated set of kit vs just bolting in a permanent fixture.
Where did you see that? My understanding is that it's an optional plug door that's used to assist with interior installation. Once the interior is done, it's bolted shut and interior paneling is installed over top. From the inside, you can't tell it's there.
Ahh. Well, in the case of the Alaska flight, it's a plug door and not used as an exit. It's pinned in place with a large pin that has a bolt, a castle nut and a cotter pin, which lock the pin.
This is not true. It's designed to be opened when inspecting the fuselage for corrosion or stress cracks at the opening. To open it you have to remove the interior plastic panels and undo the 4 bolts that this accident is about
I recall a running joke from my childhood, in a formerly communist Eastern European country, about a certain car: you should finish the assembly at home after purchase, tightening the screws before first use. Even for a famously poor-quality car, by sloppy Eastern European standards at that, that was supposed to be a joke, not something to follow!