Pilot Error: A How to Guide
Aviation safety professionals will tell you that pilot error is one of the most controllable factors leading to aircraft accidents. If you eliminate pilot errors, you can eliminate most aircraft accidents.
I think they have that all wrong. Pilot error is inevitable. The problem is that we as pilots don't know how to deal with the very concept of pilot error. We are doing it all wrong.
Photo: G450 Pilot Flight Director, 250 knots below 10,000 feet, after an ATC "Slam Dunk."
(PhotoShop from Eddie's aircraft)
So how should we approach the pilot error topic? I think we should refocus, thusly:
Dissect the "pilot error" term so it becomes more useful
As professional pilots we are all part mishap investigators when reading through aircraft accident reports; it is a natural and necessary part of our education as aviators. But we often fall prey to the general public’s perception of pilot error as the top cause of all aircraft accidents. Most Internet web searches for “most common cause of aircraft accidents” will yield pilot error as number one, followed distantly by mechanical, weather, and sabotage. In the public’s eye, we are guilty before we even turn a prop or spin a turbine. But this blame game hardly tells us why accidents happen and how to prevent them in the first place. For that, we need to turn to professional accident investigation techniques.
Are pilot errors the "root cause" of most aviation accidents?
Photo: Students sift through mock wreckage, USAF Aircraft Mishap Investigation Course.
The U.S. Air Force trains its accident investigators at the Aircraft Mishap Investigation Course (AMIC) at Kirtland Air Force Base, New Mexico. I graduated from AMIC many years ago, when it was in San Bernardino, California, and will always remember two fundamentals of investigation:
- You aren’t done until you’ve found the root cause.
- The root cause must include the word “fail” to truly address what needs to happen to prevent recurrence.
Dr. Antonio Cortés, the Associate Dean of the College of Aviation at Embry-Riddle Aeronautical University in Daytona Beach, Florida, explains with a metaphor about examining a plant’s root structure in order to detect signs of disease or other problems with the plant: [Cortés, p. 45]
Treating a sick plant by just examining the visible portion of a plant that lies above the ground can often lead to misdiagnoses, and thus, a faulty treatment plan. By carefully examining the root structure of a plant, gardeners get down to root causes of plant pathology, and as a result can produce accurate diagnoses of a plant’s ailment and initiate effective treatments to bring the plant back to health. The gardening process serves as a very apt metaphor for how safety investigators relying only on active causes of accidents will likely find their corrective efforts ineffective if the underlying causes are not addressed. They must seek out the root causes of accidents.
He goes on to say that investigators must move past apparent/obvious/active causes down to latent/organizational causes. We armchair investigators are fond of citing the old saying that an accident results from a chain of events and breaking any link in that chain can prevent the accident. But that ignores the fact that the “something” that started the entire sequence of events could very well be an inciting event that will strike again.
For more about this, see: Windshear.
Going beyond the pilot to find the "root cause"
To cite an extreme example, in 1975, Eastern Air Lines Flight 66, a Boeing 727, crashed at New York-John F. Kennedy International Airport, New York (KJFK) after encountering what was then called “adverse winds associated with a very strong thunderstorm,” something we now call windshear. The National Transportation Safety Board cited the crew’s “delayed recognition and correction of the high descent rate” as a contributing factor but in general found the cause to be those adverse winds. We cannot very well say the weather failed to be conducive to flying, and saying the pilots failed in this instance would do little to prevent recurrence. It is telling that only a year before the Eastern Air Lines crash, a Pan American World Airways Boeing 707 crashed in Pago Pago due to “destabilizing wind changes.”
As a trained accident investigator, I would have written the 1975 Eastern Air Lines probable cause as follows: We as an industry failed to fully understand previous accidents attributed to “destabilizing wind changes” and then we failed to train pilots to recognize and avoid conditions conducive to windshear.
Even when pilot error seems obviously to blame, citing it as the root cause can obscure what needs to be fixed. If you say a crash was caused by pilot error the solution seems very easy indeed: don’t make that error. And yet that error happens again and again.
For more about Six Sigma, see Chapter 16 of Faster, Better, Cheaper, by Christoph Roser.
Pilot errors are caused by non-perfect pilots
Errors in a dynamic environment like aviation are inevitable. Popular management programs that seek perfection in manufacturing processes can have adverse effects in a cockpit. We are not in the business of producing widgets, after all. The trendy “Six Sigma” program, for example, gets its name by trying to keep production within six standard deviations of the process mean. In statistical terms, one standard deviation is denoted by the Greek letter sigma, σ, and 6σ equates to an accuracy rate of 99.99966%. Expecting that from machines whose only task is to repeatedly produce hundreds of thousands of light bulbs, for example, might be possible.
But given the quantity and diversity of decision making expected of a professional pilot on even the most routine flights, perfection is simply not possible. Pilot qualifications and experience levels can help, but will not bridge the gap between good and perfect. Once you add in the complications of fatigue, weather, other airplanes, and aircraft malfunctions, even the most idealistic academic will have to agree that Six Sigma is better left in the classroom.
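As a rough check on the Six Sigma figure quoted above, the arithmetic can be sketched in a few lines of Python. This is only a sketch: it assumes the conventional Six Sigma allowance of a 1.5σ long-term drift in the process mean, which the text does not state, and the function names are illustrative.

```python
import math


def normal_tail(z: float) -> float:
    """Probability that a standard normal value exceeds z (one-sided tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def six_sigma_yield(sigma_level: float = 6.0, drift: float = 1.5) -> float:
    """Fraction of output inside the limit when the mean drifts toward it.

    ASSUMPTION: the 1.5 sigma drift is the customary Six Sigma convention,
    not a number taken from the article. Only the near-side tail is counted;
    the far-side tail (7.5 sigma away) is negligibly small.
    """
    return 1.0 - normal_tail(sigma_level - drift)


yield_fraction = six_sigma_yield()
defects_per_million = normal_tail(6.0 - 1.5) * 1_000_000

print(f"Yield: {yield_fraction * 100:.5f}%")
print(f"Defects per million opportunities: {defects_per_million:.1f}")
```

Under those assumptions the yield works out to about 99.99966%, or roughly 3.4 defects per million opportunities, which is the standard no pilot can realistically be held to across every decision on every flight.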
For more about this, see: Fault Tolerance.
Two types of pilot error
A better idea is to realize that not all errors have the same consequences and therefore can receive different emphasis. Some errors are so critical that we must avoid them without fail; we must develop systems and procedures to prevent them from ever happening. But other errors have wider margins or fail safe mechanisms; these errors can be survived or easily corrected. But how do we know the difference between these two categories of error? Perhaps we can borrow a page from hardware design to solve our procedural confusion. An engineer might address the two categories as fault intolerant versus fault tolerant. Pilots would do well to understand the difference and react accordingly.
When dealing with mechanical systems, a fault tolerant system is one that can be said to “fail safe,” that is, when it fails it does so to leave you in an acceptable condition. A triple navigation system, for example, can have a single long range navigation system go bad and still leave you with two others to meet most worldwide requirements for navigation system redundancy. A fault tolerant system can also be “fail passive” so long as it incorporates a level of redundancy and notifies the pilot there is a problem. Most transport category airplanes have fail passive pitot-static systems. You can lose a system so long as you have a backup and let the pilot know you are in a degraded condition.
A fault intolerant system, on the other hand, either leaves no redundancy or fails to disclose to the pilot there is a problem. Non-redundant systems are usually engineered to be “safe life” components; they are designed to outlast the aircraft. The pilots, airline, and manufacturer were surprised when the non-redundant stabilizer jack screw of Alaska Airlines Flight 261 failed on January 31, 2000. McDonnell Douglas required regular inspection and replacement of the MD-83’s stabilizer jack screw assembly but the airline, with the FAA’s approval, improperly lengthened inspection intervals, resulting in a failure of the aircraft’s horizontal stabilizer trim system. The aircraft was flyable, but the pilots and ground maintenance teams did not realize they were dealing with a fault intolerant system. Their combined efforts at troubleshooting caused the jammed stabilizer to break free from flyable limits and condemned the aircraft to plunge from the sky, killing all on board.
Just as we can classify various pieces of hardware as fault tolerant or intolerant, we can also define various pilot procedures as so critical that perfection is indeed necessary and others that are less so. This division of importance allows us to place focus where it is needed. In plain English, we are talking about errors that are either critical or non-critical. An academic will protest at this point: all errors are critical! While this view sounds good in the classroom, those of us who work in cockpits know it robs us of the focus needed to eliminate those errors that really can kill us in an instant.
Eliminate critical errors with relentless simulator practice
The list of fault intolerant processes required of a pilot will vary with aircraft but for large multi-engine aircraft the list includes the V1 takeoff decision, the land or go around decision on an instrument approach at minimums, and a high altitude rapid depressurization. In these cases, where perfection is mandatory, we are required to demonstrate proficiency in a simulator until our reactions become rote. There is a danger, however: if we train only to pass the check ride, we may not really develop the muscle and mental memory to master each maneuver. We also risk blurring the line between critical and non-critical when decision making times are lengthened.
Immediate Action Required (non-ambiguous) Procedures
When we practice procedures that require split-second decision making, the transference from simulator to airplane can (and should) be seamless. An engine failure at V1 may surprise you, but your reactions should be instantaneous.
More about this: V1.
I was once paired with a younger pilot in an initial qualification simulator course where the training vendor was essentially “teaching the type ride.” Half the simulator sessions were designed around the check ride profile and every engine failure was of the left engine, because they knew the check ride had to include a failure of the critical engine. In each case I allowed the nose to track into the dead engine before correcting with rudder and in each case we lived. My sim partner, on the other hand, applied right rudder so quickly that I could never tell which engine had failed without looking at the instruments first. The instructor said this wasn’t a problem. I bet the instructor my sim partner wouldn’t be able to handle a right engine failure and he agreed to try an experiment. The next day he failed the right engine and the three of us yawed our way into simulator hell. My ex-simulator partner passed his check ride and has never spoken to me since. I may have lost a friend for life, or maybe I saved his life should he ever experience an engine failure that isn’t listed as the critical one. Training matters.
More about this: Missed Approach.
Very few of us, even we in the “seasoned pro” category, have gone missed approach at minimums in an airplane more than a handful of times. We practice this often enough in the simulator that our reactions at decision altitude should be instantaneous. But quite often they are influenced by surprise and an overarching need to be smooth. Think back to your last missed approach at minimums with passengers on board. Would that reaction have passed a type rating check ride? If not, you have work to do.
More about this: Rapid Depressurization.
The same deliberate approach needs to be taken with a high-altitude decompression. In this case, however, training needs to go beyond the simulator. You should be as proficient with donning the oxygen mask in your airplane as in the simulator. The simulator’s mask is used so often that the entire assembly pulls from its container at the slightest tug. You should time yourself in your airplane to see if you can do it as quickly.
More about this accident: American Airlines Flight 1572.
Ambiguous Critical Procedures
We approach many fault tolerant procedures in the simulator as if they were fault intolerant and therefore strive for perfection all of the time. Exceeding 250 knots below 10,000 feet, for example, will earn you a stern rebuke from the simulator instructor. But in the airplane, it is only cause for making a pitch adjustment or throwing out some drag. The latitude we allow ourselves for fault tolerant errors that we think are non-critical may encourage us to do the same for fault intolerant ones that we seem to survive routinely. Case in point: the Minimum Descent Altitude (MDA).
Figure: Bradley Intl VOR Rwy 15, Jeppesen page KBDL 13-1, 13 Feb 95.
Perhaps the best example of this is the classic “dive and drive” accident. In the simulator dropping a few feet below a "hard" altitude will earn you a critique or even a failure. But in the airplane we know there is a margin of error in that altitude and a few feet now and then has never hurt us before. The line between critical and non-critical becomes blurred.
In 1995 the pilots of American Airlines Flight 1572 failed to level off at their Minimum Descent Altitude (MDA) and landed short of the runway at Windsor Locks-Bradley International Airport, Connecticut (KBDL). Their McDonnell Douglas MD-83 was substantially damaged, but no one was killed.
The captain used the vertical speed mode to descend to the MDA with the intention of pressing the altitude hold button at the MDA. But he didn’t do this until the first officer said, “you’re going below your . . .” They had the added complications of the company’s procedure at the time to fly with the altimeters set to read height above the field (QFE), having an inaccurate altimeter setting, and flying in turbulence. The first officer didn’t make the company’s mandatory callout 100 feet above the MDA and the captain didn’t change his programmed descent rate until the first officer’s “you’re going below” call. At that point, the aircraft was only 350 feet above the ground. The altitude hold button allowed the aircraft to dip further below MDA and the captain didn’t take any steps to manually fly back to the MDA in a more aggressive manner. Moments later the aircraft struck trees; the left engine ingested enough foreign matter to fail, and the right engine was degraded enough that it could no longer sustain flight. From that point on the crew exhibited exceptional flying skills and Crew Resource Management (CRM) that was credited with limiting the number of injured passengers to one individual. But why would two highly qualified and well trained pilots have flown in what appears to be a lackadaisical manner before that point?
Pilot error was “a” cause but not the root cause. The captain failed to level off at the MDA. The first officer noted the deviation but failed to forcefully correct the captain. But these pilots were flying as they were taught and checked. American Airlines clearly recognized this fact when they issued the following items in a bulletin distributed in January of the next year:
- Despite its name a non-precision approach must be executed with exacting precision.
- MDAs and step-down altitudes are limits. Any altitude level-off variation must be above the MDA or step-down altitude rather than below.
- The primary attention of both pilots will be directed to the level-off at the step-down altitude or MDA.
- After level-off at the MDA, the pilot-not-flying will direct primary attention outside the airplane and call out visual references in the sequence required.
In this case, the airline went above and beyond the formal NTSB report and wasn’t satisfied to say the crash was caused because the pilots failed to level off. (“So the rest of you pilots, don’t do that!”) They recognized that the root cause was a misunderstanding among pilots about the term “non-precision.” (The title simply means the approach does not include glide path guidance, not that it can be flown with less precision than one with an electronic glide slope.) Their approach to accident investigation is something we should try to emulate when evaluating pilot error.
I believe we tend to give ourselves much greater leeway in the airplane for those items we don’t internally recognize as fault intolerant. This leads us to think we have a greater margin than we do. So we don’t act with the necessary urgency when we hear “you’re going below your . . .” We are supposed to be perfect, so if we are going to fall short a few feet, how is a few hundred feet any worse? The same can be said of our CRM skills. If the captain doesn’t always “nail” the MDA, at what point do we become more forceful in our corrections? There must be a point where “you’re going below your . . .” becomes “Too low! Go around!”
Minimize non-critical errors with honest critiques and self disclosure
So there are times you want to strive for perfection because the cost of falling short is simply too high. But what about all those other rules, regulations, procedures, and techniques that don't have the same razor thin margins, but you still want to get right? For these, it may be okay to fall short of perfection provided you have a plan to recover.
Demonizing pilot error is counterproductive
The “don’t make any pilot errors” approach to flying:
- ignores the fact that perfection in aviation is impossible,
- discourages us from admitting our mistakes or accepting corrections,
- inhibits our learning process, and
- prevents us from coming up with the tools to mitigate the effects of these errors.
Perfection is not achievable in aviation; our environment is simply too dynamic. As pilots, we have an intuitive understanding of the impossibility of the task in front of us. The aircraft manufacturer and various governments have conspired against us by compiling an endless list of rules and regulations that are designed primarily to say, “I told you so,” in the event of an accident.
The best example may be the oft violated provisions of 14 CFR 91.211 covering the use of supplemental oxygen; everyone ignores that rule. So now we’ve broken one rule. The second rule becomes easier to break. As does the third, and so on. What about the many other rules we agree are important and should never be ignored? Altitude busts, for example, are not to be tolerated.
Photo: G450 guidance panel without a vertical mode, from Eddie's aircraft.
A few years ago, I was flying over Ireland in my high-tech Gulfstream with the flight level change mode of the autopilot doing a nice job of holding our Mach number and the engines giving us a steady 1,000 feet per minute rate of climb. As the autopilot captured our intermediate level off altitude, FL350, Shannon Center re-cleared us to FL400. The first officer spun the altitude selector to the new flight level, I acknowledged the setting, and got busy with verifying our oceanic clearance between our master document and the flight management system. Halfway through this task we got the dreaded, “say altitude?” request from center. We were at 40,500 feet and climbing an anemic 200 feet per minute. We had flown imperfectly.
Shannon Center was very understanding and simply requested we chaps correct to our assigned flight level. I thought about this for a very long time. When we got back I convened a flight department meeting and we dissected the events as best we could. Even our youngest pilot had decades of international experience and I had over twenty years in Gulfstreams. This should not have happened!
Even if Shannon Center was willing to downplay my transgression, we as a flight department were not so charitable. We eyed our procedures and our new aircraft with suspicion. Then, a few months later, it happened again. But this time we caught it. In what can only be described as a programming bug, our G450 has a “gotcha” that isn’t published and isn’t as widely known as it should be. We discovered that if the altitude selector is changed after the autopilot has captured the previous altitude, the autopilot freezes the pitch and continues without a vertical mode, without a target altitude, and without a warning. If you are in a climb, you continue that climb until you stall. If you are in a descent, you continue until the airplane overspeeds or hits something. I call this the “vertical mode trap,” and it is a pretty big trap, indeed.
So now we know and we tell everyone who flies the Gulfstream G450 or G550 to be on the lookout for the same bug and to always verify they have a vertical mode whenever the altitude selector is changed. A fellow Gulfstream pilot wrote to say it was remarkable that we were able to self-critique and discover the why of the incident, but even more remarkable that we didn’t already have in place a procedure that would have saved us. He recommended that we should not only use sterile cockpit procedures below 10,000 feet, but that we should also employ these restrictions whenever within 1,000 feet of a level off. Of course, we already had these restrictions but they didn’t preclude the pilot from engaging in other activities, like concentrating on an oceanic clearance. Now they do.
I think about this mistake of mine now and then, especially in the context of pilot errors. I’ve seen pilots react to being caught in an error by first denying and then excusing. “No I didn’t.” “Well, you distracted me.” “It is understandable, that’s when ATC called us.” I think these types of pilots have bought into the idea of perfection and cannot accept anything less. But if you realize you are not perfect, you can admit when you make mistakes, you can willingly accept critiques, and you can learn from your mistakes. But most importantly, you can develop the necessary strategies to prevent future errors or ways to minimize their impact.
The fix has more to do with our culture as pilots than it does with simulator training and check ride performance. Most operators have long ago abandoned the “captain is always right” mentality but still have further to go. We have to realize pilot error is a fact of life. Yes, we have to strive for perfection when the margin for error is razor thin. We must also learn to recover from those mistakes that start out as less than critical, but could end up being the root cause in a future accident report.
This change to our pilot culture represents a tectonic shift from the way we approach cockpit discipline and will require some soul searching for many pilots. I recommend every pilot rethink their ideas about perfection. Some procedures are fault intolerant and demand flawless performance; these should be studied and practiced until automatic. But others are fault tolerant; while we want to get them right too, we need to realize we will not be perfect and must rethink how we approach them:
- Realize that pilot error isn’t a sin to be avoided at all cost, denied, or vilified.
- Foster an environment where corrections are given freely and accepted without debate.
- When in doubt, err on the side of safety and save the debate for post flight.
- Talk freely of your pilot errors to help others to learn from your experiences and to telegraph that you are open to corrections.
More about this accident: Comair Flight 5191.
You can test yourself to see if you need this cultural wakeup call with a hypothetical situation. Let’s say you and your crew are at the Lexington-Blue Grass Airport, Kentucky (KLEX) but you haven’t been there for a few months. It is dark and the taxi route has changed slightly. You are cleared for takeoff on the longer runway, just over 7,000 feet long. It is the first officer’s takeoff, so you line the airplane up with the runway and say, “all yours.” After taking control of the aircraft the first officer says, “that’s weird with no lights.” For the first time you realize the normally lit runway is completely dark. What do you do? That’s precisely what happened to the crew of Comair Flight 5191 on August 27, 2006. They had lined up on the shorter of two runways and failed to lift off after attempting the takeoff, killing all but one of the persons on board.
Now for our hypothetical let’s say the first officer realizes you are on the wrong runway. Would the first officer feel at ease enough with you to speak up? Now let’s say you are rolling but the airspeed indicator is still on the peg. The first officer says, “we’re on the wrong runway.” Now what? Will your first thought be your aircraft’s abort procedures or will you be doing the mental math that says, “we’re probably okay” and that if you abort, the chief pilot won’t hesitate to fire you on the spot?
Or, on the other hand, will you have the character to realize that you can make mistakes and sometimes the safest and sanest thing to do is to bring everything to a stop and figure things out? It is all too easy to say, yes, of course I would. But take it from someone who has spent many years practicing the art of pilot error. Admitting to a pilot error in front of your peers and those that look up to you is hard work. But the more often you do that, the less often you will have to.
Cortés, Antonio; Cusick, Stephen; Rodrigues, Clarence, Commercial Aviation Safety, McGraw Hill Education, New York, NY, 2017.
Roser, Christoph, "Faster, Better, Cheaper" in the History of Manufacturing, CRC Press, Taylor & Francis Group, London, 2017.