Testing the Limits: The crash of Gulfstream Aerospace flight 153
On the second of April 2011, a group of test pilots and engineers boarded a prototype Gulfstream business jet for a routine test flight in Roswell, New Mexico. But as the plane lifted off the runway, it suddenly rolled to the right, struck the ground, and burst into flames, destroying the experimental aircraft and killing all four people on board. The cause of the crash would prove to be anything but simple. Somewhere during the complex design process, Gulfstream had made assumptions about the airplane’s performance which proved to be tragically incorrect, sending its test pilots out to achieve an impossible goal. The problem went to the core of the company, as corporate demands to meet a tight certification schedule pressured engineers to avoid following up on clues that their calculations were wrong. The crash would ultimately provide important lessons not just for Gulfstream, but for anyone handling large corporate projects, about the dangers of creating a work environment where safety testing becomes a formality.
When a new airplane carries passengers for the first time, the world sees it as a beginning: the start of a long service life, a new era for a company, or the latest trend in design. But for another group of people, one which is often paid little attention, the debut of a new plane is not a beginning but an end, the culmination of thousands of hard and occasionally dangerous working days spent turning an idea into a real aircraft that is both functional and safe. These are the flight test pilots and engineers — the ones who push prototype airplanes into the unknown, so that future pilots won’t have to.
For an established manufacturer like Gulfstream Aerospace, the well-known creator of high-end business jets, the process of designing, building, and testing a new airplane is a familiar (albeit lengthy) routine. There is a well-defined list of hundreds of tasks to be completed, from determining the basic aerodynamic properties of the plane to finalizing the procedures that regular pilots will one day use. But it might have been that very familiarity which started to lead Gulfstream astray.
In 2008, Gulfstream announced that it had begun work on the G650, a large, twin-engine business jet with room for up to 19 passengers. The jet was loosely based on the main Gulfstream product line dating back to the 1980s, but had undergone a major redesign to make it bigger, faster, more capable, and more expensive than previous models. Early advertising promised that the plane would be ready by September 2011 and would feature the ability to take off from runways as short as 6,000 feet, increasing the number of airports to which customers could fly in comparison to its competitors.
Determining an airplane’s minimum allowable takeoff distance requires complex calculations of its takeoff performance, not just in normal operations, but also in the event of an engine failure. But to understand what Gulfstream was trying to accomplish — and why it went wrong — an explanation of several critical speeds is required. Be warned: many numbers lie ahead.
For the average pilot, the most important takeoff parameter is V1, or decision speed, the highest speed at which a pilot can reject the takeoff and stop the plane on the runway under the specified conditions.
The second most important speed overall, and the most important in this particular case, is V2, or takeoff safety speed, the minimum speed which the plane must attain by a height of 35 feet above the ground in the event of an engine failure. This speed guarantees that the plane can climb safely on one engine while remaining easily controllable.
According to federal regulations, the minimum takeoff distance for an airplane is defined by the distance required to accelerate to V1 and then stop; 115% of the distance required to accelerate on all engines to V2 + 10 knots by a height of 35 feet; or the distance required to reach V2 by a height of 35 feet following an engine failure one second before V1, whichever is most limiting. In the case of the Gulfstream G650, it was the third scenario which proved to be the determining factor in its minimum takeoff distance. Therefore, Gulfstream wanted to make sure V2 was slow enough for the plane to legally take off on a 6,000-foot runway, as it had promised to prospective customers.
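The "whichever is most limiting" rule amounts to taking the maximum of three candidate distances. The following is a minimal Python sketch of that comparison. The function name and all of the distances are invented for illustration and are not figures from the G650 program.

```python
def required_takeoff_distance(accelerate_stop_ft, all_engine_ft, engine_out_ft):
    """Return (distance, scenario) for the most limiting of the three cases.

    all_engine_ft is the raw all-engines distance to 35 feet; the rule
    applies a 115% factor to it before the comparison is made.
    """
    candidates = {
        "accelerate-stop": accelerate_stop_ft,
        "all engines (115%)": 1.15 * all_engine_ft,
        "engine failure, V2 at 35 ft": engine_out_ft,
    }
    # The governing scenario is simply the one with the longest distance.
    scenario = max(candidates, key=candidates.get)
    return candidates[scenario], scenario


# Hypothetical numbers, chosen only so that the third case governs.
distance, scenario = required_takeoff_distance(5400.0, 4800.0, 5950.0)
print(f"{scenario}: {distance:.0f} ft")
```

With inputs like these, the engine-failure case sets the runway requirement, so any increase in V2 feeds directly into the advertised takeoff distance.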
Most of the time, the V2 speed for a new airplane is derived by means of a special mathematical formula applied to another speed, known as VMU. VMU, or “minimum unstick speed,” is the slowest speed at which it is possible for the aircraft to become airborne. Pull the nose back as far as it will go, and the speed at which the plane lifts off the runway is VMU, a parameter which is immutable for a given set of conditions and can be derived experimentally.
On previous Gulfstream business jets, engineers had discovered that their calculations produced a V2 speed which was lower than the minimum allowed by regulations. When those planes were designed, V2 was required to be no less than 1.2 times the stall speed, the speed below which the wings cannot generate enough lift to keep the plane in the air. Because these early Gulfstream models were capable of climbing safely on one engine at a lower speed than this regulatory minimum, the law itself became the limiting factor when determining the minimum allowable takeoff distance, rather than the performance of the airplane. V2 was therefore defined as being equal to the regulatory minimum, and other takeoff parameters, including the distance required, were reverse-engineered from V2.
However, during the 1990s, the Federal Aviation Administration changed the regulatory minimum for V2 to 1.13 times the lowest speed at which the wings were capable of supporting the weight of the airplane (also known as VSR). As it turned out, Gulfstream did not adequately appreciate how this change would affect its strategy for deriving takeoff speeds on its future aircraft.
When Gulfstream began working on the G650 in the late 2000s, it once again defined V2 as the regulatory minimum of 1.13 VSR. From this value, engineers derived VR, the speed at which the pilot will rotate the nose for liftoff, and VLOF, the speed at which the airplane will lift off the runway during a normal takeoff, using formulas developed for its predecessor airplane, the G550.
However, implicit in this procedure was a major assumption which they failed to scrutinize: that despite working with a new airplane and a new definition of the V2 rule, a V2 speed calculated directly from VMU would still be below the regulatory minimum. Had someone actually crunched the numbers, they would have discovered that the real V2 speed for the G650 was above the regulatory minimum, not below it. In fact, it was impossible to safely achieve a V2 speed of 1.13 VSR on this aircraft. This meant that if the pilot accelerated to rotation speed (VR), pulled the nose up, and lifted off the runway, by a height of 35 feet the plane would invariably be traveling faster than the V2 speed chosen by Gulfstream. A higher V2 speed would not have been a safety issue in and of itself, but the company couldn’t simply increase V2 without also increasing the legally required takeoff distance, thus breaking its promise to customers that the plane could take off from 6,000-foot runways. Gulfstream therefore tasked its test pilots with finding some way to achieve a V2 speed of 1.13 VSR.
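The unexamined assumption can be written as a single comparison: the V2 that governs is the larger of the regulatory minimum and the lowest speed the airplane can actually achieve. Here is a rough Python sketch of that check. The function name and every speed in it are hypothetical placeholders, not G650 data.

```python
def governing_v2(vsr_kt, achievable_v2_kt, factor=1.13):
    """Pick the V2 that can actually be used.

    vsr_kt: reference stall speed (VSR) in knots.
    achievable_v2_kt: lowest V2 the airplane can physically reach by 35 ft,
        for example as derived from VMU testing (a hypothetical input here).
    factor: regulatory multiplier on VSR (1.13 under the revised rule).
    """
    regulatory_min = factor * vsr_kt
    if achievable_v2_kt <= regulatory_min:
        # The airplane can fly slower than the rule allows: the rule governs.
        return regulatory_min, "regulation-limited"
    # The airplane cannot get down to the regulatory minimum: physics governs.
    return achievable_v2_kt, "performance-limited"


# Illustrative values only.
print(governing_v2(100.0, 110.0))  # regulation is the binding limit
print(governing_v2(100.0, 120.0))  # the airplane itself is the binding limit
```

In these terms, the procedure inherited from earlier models took the first branch for granted without ever evaluating the comparison.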
During the first round of takeoff tests, held in Roswell, New Mexico in late 2010, the Gulfstream flight test team collected data to determine VMU for the G650. Responsibility for analyzing this data and sending a report to the project management team fell to G650 First Flight Test Engineer Reece Ollenburg, who was also responsible for overseeing each testing session. But due to this high workload, Ollenburg had not finished the report on the VMU tests by the time the project moved to the next phase, in which test pilots would verify the calculated takeoff speeds and develop the techniques that pilots should use in order to achieve them.
As this next set of tests got underway in the first quarter of 2011, the flight test team was faced with the aforementioned problem: whenever a test required them to climb out at V2 speed, they would overshoot this speed by a large margin. Furthermore, scary things started to happen if they tried to decrease their speed by pulling up more steeply during rotation. Their job at Roswell throughout March and April would be to find a takeoff technique that would allow the pilot to consistently achieve V2 without endangering the safety of the airplane. Nobody had told them that this was impossible.
The first disturbing incident had happened in late 2010, during a test designed to determine VMU with the flaps set to 20 degrees. During takeoff, the pilot over-rotated, reaching a pitch angle closer to 13 degrees than to the 10 degrees which had been set as the target. Immediately, the right wing dropped, and the flight controls could not level the plane. The wing might have kept right on going until it hit the ground were it not for the prompt actions of the monitoring pilot, who intervened to pitch the nose down and increase thrust, apparently salvaging the flight.
Following this event, the flight test team met informally and determined that the wing had dipped because the pilot pulled up too steeply during rotation, causing some sort of lateral instability. The engineers agreed, and the test flights continued minutes later.
But the real cause of the uncommanded roll to the right actually went much deeper than that. The roll in fact occurred because the right wing had stalled: its angle of attack relative to the airflow became too great and it ceased to generate lift. The monitoring pilot was able to recover because his pitch down input decreased the angle of attack back below the critical point and reversed the stall.
The engineers didn’t realize that the roll was caused by a stall because the plane never reached the angle of attack at which they thought a stall would occur. The core of the problem was that the G650, like all other airplanes, stalls at a different angle of attack when it is influenced by ground effect than when it is in free air.
The ground effect is simply the change in airflow behavior over the wing when the plane is close to the ground. Ground effect typically increases lift and decreases drag for a particular angle of attack, but will also decrease the angle of attack at which the plane will stall. During low-speed wind tunnel tests early in the development process, Gulfstream had determined that the G650 in ground effect would stall at an angle of attack (henceforth, AOA) approximately two degrees lower than in free air. Following the VMU tests in 2010, First Flight Test Engineer Ollenburg ran some calculations and revised this estimated difference to 1.6 degrees. The plane’s stall protection systems were then programmed with this figure, including the stick shaker stall warning, and the pitch limit indicator displayed on the pilots’ attitude indicators, which delineates the highest allowable pitch in that phase of flight. The final version of the airplane would also include a computerized flight envelope protection system that would prevent the plane from stalling, but by the time of the 2010–2011 flight tests it was not yet ready to be installed.
However, Ollenburg’s calculations were wrong. The reason was simple: he was relying on academic, peer-reviewed sources which were also erroneous. Several publications had stated, and Ollenburg evidently believed, that the maximum amount of lift which could be generated by the wings (known as the max lift coefficient) was identical in ground effect and in free air, when in fact on many aircraft this value was lower in ground effect. This misunderstanding threw off his math and caused him to significantly overestimate the stall AOA in ground effect, which was in fact as much as 3.5 degrees lower than the stall AOA in free air.
Believing that the ground effect stall AOA was 13.1 degrees (1.6 degrees less than the free air stall AOA of 14.7 degrees), the engineers programmed the stall warnings to activate at an AOA of 12.3 degrees when the plane was close to the ground, in theory providing an adequate margin of warning. But in reality, the plane could stall at an AOA as low as 11.2 degrees while in ground effect, meaning that the stall could occur before the stall warnings activated. In the earlier flight test incident, the pilot over-rotated past the target angle and exceeded the stall AOA in ground effect, causing the right wing to stall with no warning whatsoever.
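The arithmetic behind the missing warning is easy to lay out. This short Python sketch uses only the angles quoted above; the variable names are my own.

```python
FREE_AIR_STALL_AOA = 14.7  # degrees, stall AOA in free air
ASSUMED_REDUCTION = 1.6    # degrees, the revised ground effect estimate
WARNING_AOA = 12.3         # degrees, stall warning threshold near the ground
ACTUAL_STALL_AOA = 11.2    # degrees, lowest stall AOA seen in ground effect

assumed_stall_aoa = FREE_AIR_STALL_AOA - ASSUMED_REDUCTION
assumed_margin = assumed_stall_aoa - WARNING_AOA  # what the engineers expected
actual_margin = ACTUAL_STALL_AOA - WARNING_AOA    # what really existed

print(f"assumed stall AOA in ground effect: {assumed_stall_aoa:.1f} deg")
print(f"expected warning margin: {assumed_margin:+.1f} deg")
print(f"actual warning margin:   {actual_margin:+.1f} deg")
```

The expected margin comes out at about 0.8 degrees, while the actual margin is negative: the wing could stall roughly 1.1 degrees before the warning threshold was reached.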
Although the pilots managed to recover from this incident, they never fully understood what had occurred, and a thorough investigation was not conducted. The problem was assumed to be related to the pilot over-controlling the plane on rotation and was not believed to be indicative of a fundamental problem with the performance calculations — a belief which persisted even after a second uncommanded roll incident during a takeoff test in March 2011.
These two separate miscalculations had now put the flight test crew on course for a near-inevitable disaster. The erroneously low V2 speed calculated by Gulfstream could only be achieved by pitching up on takeoff to an angle which exceeded the plane’s stall AOA in ground effect, and the pilots had no idea. The only question was when the other shoe would drop.
During the weeks and days preceding the fateful test flight on the 2nd of April, the flight test team repeatedly discussed their strategies for finding a takeoff technique that would allow V2 to be reached safely and reliably. In Birmingham, Alabama in February, the pilots had tried rotating at a higher speed than normal, holding the pitch to nine degrees, and then abruptly increasing pitch to 15 or 16 degrees immediately after liftoff.
The crux of the strategy was as follows. Because the AOA is equal to the pitch angle while the plane is rolling across the ground, keeping the pitch to nine degrees during this phase ensured that the AOA would not reach 11–12 degrees, the zone where the pilots knew roll control issues had appeared. Then, once the plane began climbing, they could increase the pitch substantially without increasing the AOA (since the plane’s vector of motion, and consequently the airflow, would start pointing upward as well). As the plane climbed steeply it would hopefully slow down enough to reach V2. Using this method the test pilots had managed to get within 4 knots of V2, but this was still outside the tolerance of ±2 knots required for the test to be successful. Furthermore, the pilots doubted that they could convince the FAA that this was a “normal technique” that regular pilots could be expected to perform.
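The geometry behind this strategy is the standard relation between pitch, flight path, and AOA: with wings level and no wind, AOA is approximately the pitch angle minus the angle of the flight path above the horizon. The Python sketch below is a simplification that ignores wing incidence and gusts, and the 6-degree climb angle is an invented figure used only for illustration.

```python
def angle_of_attack(pitch_deg, flight_path_deg):
    """Approximate AOA as pitch minus flight path angle (wings level, no wind)."""
    return pitch_deg - flight_path_deg


# On the runway the flight path is horizontal, so AOA equals pitch.
on_runway = angle_of_attack(9.0, 0.0)

# Once climbing, the same AOA is available at a much higher pitch.
# The 6-degree climb angle is hypothetical.
climbing = angle_of_attack(15.0, 6.0)

print(on_runway, climbing)
```

Both calls give a 9-degree AOA, which is why the pilots could pull to 15 or 16 degrees of pitch after liftoff without, in principle, entering the zone where roll problems had appeared.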
By April 2nd, the team had become convinced that the technique was too difficult, and a new approach would be needed. “I’m not going to do that jerk stuff, it just doesn’t work,” test pilot in command Kent Crenshaw told his colleagues earlier that morning. “That’s not the way they’re going to fly the airplane, and I don’t think the FAA is gonna like it either… it’s such a great flying airplane, you shouldn’t have to abuse it to get it flying.” The consensus was that they needed a continuous maneuver, not this “jerky” approach where they tried to stop the pitch at nine degrees to avoid the roll issues, only to pull back suddenly after liftoff.
Rethinking the strategy, Crenshaw said after another failed test, “We’re pausing because we’re trying to do this capture, and I think we’re getting too focused on that… because if you have a real engine failure, the guys aren’t gonna be looking at nine degrees, they’re gonna be looking at trying to get to V2.” Concurring with Crenshaw’s assessment, the flight test team decided that on each test they should hold at nine degrees for less time, until eventually they weren’t holding at all, in the process determining the sweet spot where, they believed, they could pull up in one continuous motion, avoid roll control issues, and decelerate to V2 all at the same time. Unfortunately, no such sweet spot actually existed.
Shortly after 9:00 a.m. that day, the crew taxied to the runway threshold at Roswell for a test christened 7A2. In the pilots’ seats were Pilot in Command Kent Crenshaw and Second in Command Vivan Ragusa. At their computer stations in the cabin were flight test supervisor Reece Ollenburg and flight test engineer David McCollum. Another five engineers monitored the flight parameters from a telemetry trailer parked next to the runway.
The plan for this test was to simulate an engine failure at V1 with the flaps set to 10 degrees. It was a maneuver they had executed many times before.
“Power set,” Ragusa said.
“Airspeed’s alive, I got the yoke,” Crenshaw replied as the plane rumbled off down the runway.
“Eighty knots,” Ragusa called out.
Five seconds later, at a speed of 105 knots, Ragusa announced, “Chop,” and reduced thrust in the right engine to idle. Another eight seconds passed as the plane continued to accelerate. “Standby… rotate!” he said.
Pulling back with about 50 pounds of force, Crenshaw let the pitch increase smoothly past nine degrees, hoping to lift off right as the pitch neared the danger zone beyond 11 degrees. But he didn’t quite make it: 4.4 seconds after rotation, with the plane still on the ground, the AOA reached 11 degrees. A split second later, the plane lifted off the ground, and two tenths of a second after that, with an AOA of 11.2 degrees, the right wing stalled.
The moment the plane became airborne, it started to roll to the right, despite Crenshaw’s attempts to steer left. Exclaiming that something was “going on,” he applied even more left aileron, to no effect: with the right wing stalled, the ailerons could not arrest the roll. As the AOA increased further, the stick shaker finally activated, prompting Crenshaw to pitch down, back below the pitch limit indicator on his display, believing that this would correct the stall. But it did not. With the plane banked about 13 degrees to the right and in a shallow descent, the right wing struck the ground, scraping along the runway in a shower of sparks.
“Oh, whoa, whoa, whoa, whoa!” Ragusa exclaimed.
“BANK ANGLE! BANK ANGLE!” blared an automated warning.
“Power, power, power!” Crenshaw shouted. Ragusa jammed the right engine back to full power, but the plane still didn’t recover from the right bank.
Unsure why his attempted stall recovery had not corrected the problem, and desperate to keep the plane in the air, Crenshaw hauled back on his controls to climb. The AOA shot up through 22 degrees, the bank angle increased dramatically, and the plane dropped like a rock toward the ground.
“No no no no!” Ragusa shouted.
“BANK ANGLE! BANK ANGLE!”
“Ah, sorry guys!” Crenshaw said. His words would be the last on the cockpit voice recorder. Fifteen seconds after the plane first became airborne, the right wing again struck the ground, sending the plane veering off the right side of the runway. The fuselage slammed to earth, destroying the landing gear, and the plane slid on its belly across the desert, throwing up a great cloud of dust and fire. The jet then careened across a taxiway and into a small concrete structure, rupturing the fuel tanks and triggering a massive explosion, before the plane finally ground to a halt, surrounded by flames.
The impact was not especially hard, and all the crewmembers initially survived. Although Crenshaw’s leg was pinned in the wreckage, Ragusa, McCollum, and Ollenburg were able to get out of their seats and head for the main cabin door. But before they could open it, the intense flames and smoke overcame them, and all four crewmembers perished in the inferno. The other team members ran to the plane from the telemetry trailer, but the heat prevented them from getting close to the door, and by the time fire trucks arrived on the scene four minutes later, it was clear that those on board could not have survived.
The deaths of four elite flight test team members shocked the entire flight testing community and demanded a full investigation. That responsibility would fall to the National Transportation Safety Board, but the agency had little experience with flight test accidents, and the learning curve for the investigators would be steep. Only through extensive coordination with surviving team members, who had a deeply vested interest in understanding why their colleagues died, would the NTSB be able to come to its remarkable conclusions.
By checking and rechecking the math and carrying out numerous simulations, the NTSB and its partners at Gulfstream eventually determined that the manufacturer had applied an outdated procedure for calculating the takeoff speeds which underestimated V2. Simultaneously, an incorrect assumption about the behavior of the plane led to an overestimation of the stall AOA in ground effect, causing the stall warnings to be set incorrectly. The flight test crew, tasked with finding a way to achieve this impossibly low V2 speed, eventually pitched up too much, reached the stall AOA without warning, and could not recover before the right wing struck the ground. Although Crenshaw did attempt a recovery, the incorrectly calibrated warnings led him to believe that he was pitching down enough to escape the stall, when in reality he needed to pitch down even more. With the information he had, there was no way he could have understood the situation in time to prevent the crash.
The NTSB was disturbed to discover that two previous incidents of this same phenomenon, either of which could have led to a similar accident, had not been properly investigated. Gulfstream had no protocol in place for stopping the testing in case of an unexpected event, such as the uncommanded roll. Instead, the team conferred informally and agreed on a mutually acceptable cause, one which was partly correct but did not capture the true scale of the problem.
In fact, the consistent difficulty in reaching V2 should have served as a warning sign that something was seriously wrong. If Gulfstream had directly calculated V2 from the experimentally proven VMU, they would have realized that the value they had chosen was too low, but nobody did this. From the beginning, Gulfstream assumed that the G650 would behave the same way as its predecessor, the G550, even though this was not the case. Basic calculations could have shown that this was wrong. For example, on the G650, the difference between VLOF (liftoff speed) and the real V2 speed was greater than on the G550. If Gulfstream had reversed the math and attempted to derive the G650’s VLOF from the desired V2 speed based on experimental data, instead of using the formulas from the G550, they would have arrived at a VLOF which was lower than the VMU, a physical impossibility. Obviously the plane cannot lift off at a speed lower than the minimum required to become airborne!
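The reversed calculation described here boils down to a two-line sanity check. The Python sketch below assumes, as a simplification, that VLOF can be backed out of a target V2 by subtracting the speed the airplane gains between liftoff and 35 feet. The function names and all of the speeds are hypothetical and are not taken from the G650 test data.

```python
def derive_vlof(target_v2_kt, liftoff_to_v2_gain_kt):
    """Back out VLOF from a target V2 and the measured speed gain up to 35 ft."""
    return target_v2_kt - liftoff_to_v2_gain_kt


def vlof_is_physically_possible(vlof_kt, vmu_kt):
    """An airplane cannot lift off below its minimum unstick speed."""
    return vlof_kt >= vmu_kt


# Hypothetical inputs: a target V2, a measured speed gain, and a tested VMU.
vlof = derive_vlof(target_v2_kt=135.0, liftoff_to_v2_gain_kt=12.0)
print(vlof, vlof_is_physically_possible(vlof, vmu_kt=126.0))
```

With these invented inputs the derived VLOF falls below VMU, so the check fails. That is the kind of contradiction which would have flagged the chosen V2 as unattainable before anyone tried to fly it.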
This series of erroneous assumptions was indicative of a testing environment in which safety and quality control were slipping. As the NTSB interviewed more and more Gulfstream employees, the reason started to become clear. From the very beginning, company management had pursued an aggressive testing schedule, which the engineers and pilots thought was unreasonable. Management had a good reason for this: if they couldn’t get the plane certified by September 28th, 2011, the five-year anniversary of the original application for the G650 type certificate, the plane would be legally required to meet any new certification requirements introduced during that five-year period. This would cause further delays, preventing the aircraft from being delivered to customers by the promised date. If the company couldn’t meet its promises, it would lose money, and at the end of the day, money is king.
Those who were responsible for testing the airplane were not happy with the schedule imposed upon them by management. The lead flight test engineer had told his supervisors that this tight schedule didn’t leave room for contingencies, but senior managers told him that this was a risk they were willing to take. The director of the G650 flight sciences department acknowledged that the schedule would probably slip, but said he didn’t want to extend the deadline because management felt people wouldn’t work as hard. “We like to keep a sense of urgency at Gulfstream to keep things moving,” he explained.
By March 2011, the FAA’s Atlanta regional Aircraft Certification Office had become concerned that the program would not meet the September deadline. On March 31st the ACO wrote to Gulfstream: “For some time now the FAA has expressed our concerns about the overly aggressive schedule, and for some time now you have acknowledged ‘unofficially’ that things are slipping; however, the company TIA schedule continues to reflect a pace that has proven to be unrealistic.” However, the FAA did not have the power to force Gulfstream to modify its flight test schedule, since the agency would only become directly involved once the plane graduated from flight testing to certification testing. And before Gulfstream could respond to the FAA’s letter, the accident occurred.
In its final report, the NTSB explicitly concluded that this schedule pressure was one of the major reasons for the company’s failure to adequately investigate the V2 overshoot problem, and subtly chastised Gulfstream’s attempts to argue otherwise. In order to save time and labor, formulas and methods from previous models were applied to the G650 without checking whether they were appropriate. Then, faced with a looming deadline, the entire project team became so focused on achieving a goal that they didn’t take a step back to consider whether the goal was achievable. The possibility that the selected V2 speed could not be reached was not something management wanted to think about: not only would it throw off the certification schedule, it would also jeopardize promises made to customers about the jet’s takeoff performance. As a result, even though increasing V2 was the most obvious solution to the V2 overshoot problem, this course of action was never seriously considered.
Contributing to these failures were a number of other organizational problems at Gulfstream. First of all, the duties of individual team members were poorly defined and often differed from the duties described in the flight test manual. Originally, different engineers were supposed to supervise the tests and analyze the data, but over time these two responsibilities had drifted together, and by 2011 First Flight Test Engineer Reece Ollenburg was in charge of both. This workload was too high for one person, and as a result he hadn’t finished his analysis of the data from the fall 2010 VMU tests by the time the next round of takeoff tests began in February 2011. If this data had been fully analyzed, the fundamental problems with the takeoff speeds might have been discovered. The NTSB felt that it was inappropriate to go ahead with takeoff performance testing before the speed schedules had been checked against the experimentally derived VMU data from 2010.
This decision underscored a dangerous lack of “control gates” in the Gulfstream G650 program. In the field of project management, a control gate is a step which must be completed before the project can proceed, ensuring that no stage of the project is attempted before its prerequisites are fulfilled. In this case, the absence of control gates enabled the project to go ahead despite a dangerous lack of empirical knowledge about the plane’s takeoff speed schedule.
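In software terms, a control gate is just a precondition check that refuses to start a phase until its prerequisites are recorded as complete. Here is a minimal Python sketch of the idea. The phase and step names are illustrative labels of my own and do not come from Gulfstream's actual test plan.

```python
class GateNotPassed(Exception):
    """Raised when a phase is started before its prerequisites are complete."""


def start_phase(phase, prerequisites, completed):
    """Allow a phase to begin only if every prerequisite has been completed."""
    missing = [step for step in prerequisites if step not in completed]
    if missing:
        raise GateNotPassed(
            f"cannot start {phase!r}; outstanding: {', '.join(missing)}"
        )
    return f"{phase} started"


# Hypothetical project state: the flights are done, the analysis is not.
completed = {"VMU flight tests"}
try:
    start_phase(
        "takeoff performance tests",
        prerequisites=["VMU flight tests", "VMU data analysis report"],
        completed=completed,
    )
except GateNotPassed as err:
    print(err)
```

The point of the gate is that the decision is mechanical: an unfinished report blocks the next phase regardless of how much schedule pressure there is to proceed.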
As a result of the tragedy in Roswell, Gulfstream made major changes to its organization and management style, as well as the G650 airplane itself. Field testing of the plane was halted until December 2011, during which time Gulfstream created a new computer model which would help develop an appropriate rotation technique and assess its margins versus the stall AOA, instead of making test pilots figure this out in the real aircraft. Gulfstream ultimately settled on a V2 speed 15 knots higher than the one originally proposed, but in the end they were still able to meet the 6,000-foot minimum runway length guarantee by increasing the maximum takeoff thrust produced by the engines. The company also developed better tools for detecting anomalies in the test data, and more robust procedures for handling these anomalies before testing could be resumed. Finally, the G650 was outfitted with more advanced fire suppression systems, more emergency exits were added, and Gulfstream introduced a policy of putting airport firefighters on standby near the runway whenever high-risk flight tests are being conducted.
On September 7th, 2012, the Gulfstream G650 finally received its type certification, nearly one year after the promised date. Despite the delay, the model was a success, selling over 400 examples by the end of 2020. But if Gulfstream had done its due diligence and abandoned its attempts to meet an unrealistic deadline, the delay surely would have been shorter, and four men would still be alive.
Flight testing has always carried an element of danger, and it likely always will. But it is still the responsibility of the manufacturer to reduce the risk as much as possible, a responsibility that should not be made subservient to corporate goals. Test pilots are some of the best of us, possessing great skill, excellent judgment, and exceptional courage. But ultimately they too are humans with families, and in this day and age, no test pilot’s spouse or children should have to hear that dreaded knock on the door and the words, “I’m sorry, but there’s been an accident.”
The tragedy of Gulfstream Aerospace flight 153 also holds lessons which extend far beyond the aviation industry. The accident was a classic case of poor project management and a detrimental workplace attitude. Forcing team members to overwork themselves in the interests of an impossible project schedule is not beneficial to the workers or the company, and usually results in unsafe corner-cutting, as Gulfstream discovered the hard way. A project can be done quickly or it can be done well; it is all but impossible to ask for both. Gulfstream has surely learned that lesson — but will managers at other companies and in other industries take it to heart? Or will they continue to push their teams to the brink of disaster in the pursuit of the almighty dollar? Although it may be hoped that this analysis will help open some eyes, in the end, only time will tell.