2001 Systems Engineering Capstone Conference • University of Virginia
NASA RISK ASSESSMENT AND MANAGEMENT ROADMAP

Student Team: Jacob Burns, Jeff Noonan, Laura Kichak, and Beth Van Doren
Faculty Advisor: Yacov Y. Haimes, Department of Systems Engineering
Client Advisors: Jose Caraballo, Langley Research Center
KEYWORDS: Risk Filtering, Ranking, and Management Framework (RFRM), Hierarchical Holographic Model (HHM), “Faster Better Cheaper” (FBC), Head-Topic.

ABSTRACT

Recent mission failures have raised doubts about the effectiveness of NASA’s current risk management procedures. As a result, NASA commissioned the University of Virginia's Center for Risk Management of Engineering Systems, directed by Dr. Yacov Y. Haimes, to develop a five-year roadmap that identifies the activities required to meet NASA’s long-term corporate goals. The purpose of the Capstone effort was to locate and analyze different methodologies that could be incorporated into this plan. As with a traditional roadmap, our risk-management roadmap stems from three pieces of information:

1) Where are we now?
- What risk management knowledge and practices are currently in place at NASA?
2) Where do we want to go?
- What level of risk management is required to reduce the likelihood of mission failures?
3) How do we get there?
- How do we enhance NASA’s knowledge of risk management, and how do we provide the means to implement this knowledge?

The methodology found to best fit NASA’s current needs is an eight-phase approach called the risk filtering, ranking, and management (RFRM) framework. RFRM systematically isolates all critical risks facing a NASA mission. The methodology begins with the identification of several hundred risk scenarios, which are then progressively filtered down to a smaller set of scenarios that are essential for success. These remaining risks generally combine a high probability of occurrence with very serious consequences. By generating multiple policy options for each of these scenarios, risk management plans can be developed to reduce each scenario’s likelihood of occurrence and minimize the severity of its effects. The optimal alternative in each case is recommended on the basis of trade-off analyses among associated costs, schedule delays, and effectiveness. Providing NASA with this scientific approach to managing risks will equip the agency with the tools necessary to safeguard its missions against failure.

INTRODUCTION

A “Faster, Better, Cheaper” (FBC) philosophy involves trying to launch more missions at a fraction of the cost. Under this approach, NASA’s recent missions have experienced many adverse effects. Failures in the past two missions to Mars caused NASA not only to lose millions of dollars and potential scientific return, but also to face the public humiliation associated with disaster (Dickey, 2000). Applying risk assessment and risk management procedures to future projects can help prevent further failures. These processes answer six questions to accomplish their objectives (Figure 1).

Risk Assessment:
  “What can go wrong?”
  “What is the likelihood that something will go wrong?”
  “What are the associated consequences?”
Risk Management:
  “What can be done?”
  “What are the available options and their associated tradeoffs?”
  “What are the impacts of current decisions on future options?”
Fig. 1. The six questions of Risk Assessment and Risk Management (Haimes, 2001)
NASA Risk Assessment and Management Roadmap
METHODOLOGY

The Risk Filtering, Ranking, and Management Framework (RFRM) was created by Yacov Haimes, Stan Kaplan, and James Lambert. When applied to a specific mission, the eight phases of the RFRM method guide an effective process that minimizes system-associated risks. Adopting these procedural guidelines will increase the reliability of a NASA space project.

In Phase I, all of the risk scenarios present in NASA’s organizational structure are identified (Haimes, 2001). These risks form the framework of the Hierarchical Holographic Model (HHM).

Fig. 2. Sample Hierarchical Holographic Model

In Phase II, the set of risk scenarios is reduced according to the interests of the current system user. Scope and temporal domain are two factors the user may consider while thinning the risk set. For example, each NASA center is concerned only with certain technological aspects and time periods of a mission’s implementation, so its time is better spent concentrating on the risks that fall within that scope.

Next, each risk in the narrowed set is qualitatively classified based on its probability of occurrence and associated consequences. This step (Phase III), called bi-criteria filtering, employs the ordinal version of the US Air Force Risk Matrix (Figure 3). The probability of risk occurrence is classified as frequent, likely, occasional, seldom, or unlikely, while the consequences range from loss of life to no effect. The combination of a scenario’s probability and consequences assigns it one of the following severities: extremely high, high, moderate, or low. At this point, it is at the user’s discretion to choose which categories should be discarded. For our analyses, we chose to eliminate the scenarios falling under the moderate and low categories.

Fig. 3. Ordinal US Air Force Risk Matrix

In Phase IV, each of the remaining scenarios is quantitatively rated on its defensive attributes, such as detectability. Each of numerous attributes is assigned a weight, and the level of its effect is rated as high, medium, or low, corresponding to scores of five, three, and one. The total scores of all of the scenarios are then calculated, and those with a score below a user-defined threshold are filtered out. One important aspect of this phase is that any scenario believed to be crucial to a mission’s success can have its categories and weightings altered to generate the necessary score (Haimes, 2001).

Phase V is similar to the bi-criteria filtering of Phase III in that it also uses the Air Force Risk Matrix, but here cardinal rating is used rather than ordinal classification. Numerical probability ranges minimize discrepancies in personal judgment among information sources (Haimes, 2001).

After Phase V, only a few scenarios remain. Phase VI asks, “What can be done to reduce these risks?” This phase consists of an enumeration and analysis of various courses of action, which are assessed for their cost-effectiveness (Haimes, 2001). This analysis is performed with the fractile method.

Next, the entire system is examined by taking the management policies identified in Phase VI into account. In Phase VII, the robustness of the plan is evaluated to determine whether any risk scenarios may have been missed. This may call for some of the earlier phases to be revisited.

Phase VIII is known as operational feedback: the methodology can always be improved upon, and the assessment of the cost and time of remedial measures will indicate its effectiveness (Haimes, 2001).
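Phases III and IV amount to two successive filters over the scenario set. The Python sketch below is illustrative only. The likelihood labels and the 5/3/1 rating scores come from the text, but the consequence labels, the rule that maps matrix cells to severities, every attribute name other than detectability, the weights, the ratings, and the threshold are hypothetical, and the Phase IV total is assumed to be a weighted sum. Scenario IDs H.4 and C.1 are borrowed from Fig. 4 only as names; X.1 and X.2 are invented.

```python
from dataclasses import dataclass, field

# Phase III ordinal likelihood scale, listed from most to least severe.
LIKELIHOODS = ["frequent", "likely", "occasional", "seldom", "unlikely"]
# HYPOTHETICAL consequence labels spanning "loss of life" to "no effect".
CONSEQUENCES = ["catastrophic", "critical", "moderate", "negligible"]

# Phase IV ratings: high = 5, medium = 3, low = 1.
RATING_SCORE = {"high": 5, "medium": 3, "low": 1}


@dataclass
class Scenario:
    scenario_id: str
    name: str
    likelihood: str
    consequence: str
    ratings: dict = field(default_factory=dict)  # attribute -> "high"/"medium"/"low"


def severity(likelihood: str, consequence: str) -> str:
    """Phase III: place a scenario in a cell of the ordinal risk matrix.

    HYPOTHETICAL cell mapping: the sum of the two ordinal ranks stands in
    for the actual Air Force matrix, which is not reproduced here.
    """
    rank = LIKELIHOODS.index(likelihood) + CONSEQUENCES.index(consequence)
    if rank <= 1:
        return "extremely high"
    if rank <= 3:
        return "high"
    if rank <= 5:
        return "moderate"
    return "low"


def bicriteria_filter(scenarios, discard=("moderate", "low")):
    """Phase III: drop scenarios whose severity the user chose to discard."""
    return [s for s in scenarios
            if severity(s.likelihood, s.consequence) not in discard]


def total_score(scenario: Scenario, weights: dict) -> int:
    """Phase IV: ASSUMED weighted sum of the 5/3/1 attribute ratings."""
    return sum(weights[attr] * RATING_SCORE[level]
               for attr, level in scenario.ratings.items())


def multicriteria_filter(scenarios, weights, threshold):
    """Phase IV: keep scenarios whose total score meets the user's threshold."""
    return [s for s in scenarios if total_score(s, weights) >= threshold]


# HYPOTHETICAL weights and threshold; only "detectability" comes from the text.
WEIGHTS = {"detectability": 5, "redundancy": 3, "recoverability": 2}
THRESHOLD = 30

# Every likelihood, consequence, and rating below is invented for illustration.
SAMPLE = [
    Scenario("H.4", "Elimination of Oversight Teams", "likely", "critical",
             {"detectability": "high", "redundancy": "high", "recoverability": "medium"}),
    Scenario("C.1", "Maintenance", "occasional", "catastrophic",
             {"detectability": "high", "redundancy": "medium", "recoverability": "medium"}),
    Scenario("X.1", "Hypothetical minor scenario", "seldom", "negligible",
             {"detectability": "low", "redundancy": "low", "recoverability": "low"}),
    Scenario("X.2", "Hypothetical well-defended scenario", "frequent", "moderate",
             {"detectability": "low", "redundancy": "low", "recoverability": "medium"}),
]

if __name__ == "__main__":
    after_phase3 = bicriteria_filter(SAMPLE)
    after_phase4 = multicriteria_filter(after_phase3, WEIGHTS, THRESHOLD)
    for s in after_phase3:
        print(s.scenario_id, severity(s.likelihood, s.consequence), total_score(s, WEIGHTS))
    print("Remaining after Phase IV:", [s.scenario_id for s in after_phase4])
```

With these invented inputs, X.1 lands in a low-severity cell and is discarded in Phase III, while X.2 survives the matrix but scores 14, below the threshold of 30, so only H.4 and C.1 remain. Raising a crucial scenario’s weights, as the text allows, is simply a change to `WEIGHTS` or its ratings.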
APPLICATIONS

After an HHM was generated for the NASA organization, mission-specific case studies were conducted to add robustness to the model. Each Capstone Team member chose an FBC-era mission conducted under NASA’s guidance and ran an RFRM risk assessment on it. These four missions were the Mars Polar Lander, the Mars Climate Orbiter, STS-93, and the Cassini Mission. Compiling the common risks remaining after Phase V in the four examinations revealed trends pertinent to an understanding of NASA’s weaknesses.

THE MARS CLIMATE ORBITER

The Mars Climate Orbiter, a Jet Propulsion Laboratory (JPL) mission, was intended to be the first Martian weather satellite. From orbit, the Orbiter’s main tasks were to perform global sounding of the atmosphere and imaging of the planet’s surface, and to provide relay assistance for the Mars Polar Lander. Unfortunately, rather than establishing itself in orbit, the spacecraft crashed into the surface of Mars. The root cause of the mishap was the failure to use metric units in the coding of the trajectory software file “Small Forces.” The output from this file, SM_Forces, was required by the Mars Surveyor Operations Project (MSOP) Software Interface Specification to be in newton-seconds (metric). Instead, the program returned data in pound-seconds (English), which introduced an error by a factor of 4.45 in the trajectory calculations (Mishap Investigation Board [MIB], 1999).

The identified contributing causes of the failure were: modeling of spacecraft velocity changes, knowledge of spacecraft characteristics, trajectory correction maneuver TCM-5, the systems engineering process, communications among project elements, operations navigation team staffing, training of personnel, and validation and verification processes (MIB, 1999).

THE CASSINI MISSION

Launched in October 1997, the Cassini Mission was an international cooperative space effort conducted by NASA, the European Space Agency (ESA), and the Italian Space Agency (ASI). Cassini’s objective was to conduct a four-year scientific exploration of the planet Saturn and its largest moon, Titan, in an attempt to gain insight into the birth and evolution of our solar system (Ulrich, 1995, p. v). Cassini’s controversial use of both plutonium fuel (PuO2) and planetary swingbys brought the craft negative attention from the American public: even though detrimental effects could occur only in the highly improbable event of an explosion inside Earth’s atmosphere, NASA’s recent track record did not inspire faith in imminent success. Cassini is currently en route to Saturn, having successfully completed its Earth swingby.

STS-93

STS-93’s primary objective was to deploy the Chandra X-Ray Observatory. The orbiter chosen for this task was Columbia, which had been used in twenty-six previous missions. During launch on July 23, 1999, an electrical short disabled the computers of two main engines. Examination of the orbiter upon its return revealed that the problem originated in a damaged wire that had been incorrectly handled during maintenance. The RFRM identified maintenance as a key issue for this mission. Several options for minimizing this risk were examined, and it was recommended that NASA conduct more tests and hire more maintenance supervisors.

THE MARS POLAR LANDER

The purpose of the Mars Polar Lander (MPL) was to explore previously unexplored regions of Mars, namely the South Pole. The mission had three primary goals: to see if there was evidence of life, past or present; to analyze weather processes and history; and to determine what resources, if any, exist on the Red Planet [Mars Polar Lander]. No space agency, American or foreign, had sent a probe to either the North or the South Pole; the MPL was supposed to be the first. The MPL was launched on January 3, 1999, and deemed lost 11 months later on December 3. The mission loss has been attributed primarily to a design flaw that caused a premature shutdown of the landing rockets before touchdown. While premature shutdown was most likely the technical cause of the mission loss, the real source of failure lies within the NASA organization and its management policies. As a “Faster, Better, Cheaper” baby, the MPL was nearly 30 percent underfunded. The scarcity of money led to many problems, including insufficient time to properly test a few essential components. After the MPL crashed into the Martian surface, NASA and JPL were left shaking their heads and questioning why. Using the risk filtering and ranking methodology, over 400 sources of potential error were identified, and the field was then narrowed to about twenty mission-specific and NASA-oriented problems. Some of the major areas of error included “inexperienced project managers,” “improper software verification and validation,” and “elimination of oversight teams.”
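The factor of 4.45 in the Mars Climate Orbiter account is the size of one pound-force second expressed in newton-seconds (1 lbf·s = 4.4482216152605 N·s, because 1 lbf is defined as 0.45359237 kg × 9.80665 m/s²). The Python sketch below is a hypothetical illustration, not code from the mission software: a consumer that treats a pound-force-second value as newton-seconds understates the impulse by that factor, whereas carrying the unit with the value lets the consumer convert explicitly. The `Impulse` type and `read_as_newton_seconds` function are invented for this example.

```python
from dataclasses import dataclass

# Exact by definition: 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N.
NEWTON_SECONDS_PER_LBF_SECOND = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical unit-tagged impulse value."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * NEWTON_SECONDS_PER_LBF_SECOND
        raise ValueError(f"unknown impulse unit: {self.unit}")


def read_as_newton_seconds(raw_value: float) -> float:
    """An untagged interface that simply assumes the number is already in N*s."""
    return raw_value


if __name__ == "__main__":
    produced = Impulse(1.0, "lbf*s")                   # what the producer wrote
    assumed = read_as_newton_seconds(produced.value)  # what the consumer assumed
    actual = produced.to_newton_seconds()             # what was physically meant
    print(f"assumed {assumed:.3f} N*s, actual {actual:.3f} N*s, "
          f"ratio {actual / assumed:.2f}")
```

Running it prints a ratio of 4.45, the same factor reported by the Mishap Investigation Board.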
The RFRM was applied to minimize general risks at NASA. In each of the mission case studies, several hundred risks were successfully identified in Phase I. These risks were then subjected to filtering and ranking in Phases II-V; after these assessments and the removal of mission-specific scenarios, the following risks remained:

Head-Topic / Scenario ID / Scenario Name
Organizational
  A.7  Culture
  A.8  Faster
Human
  B.1  Trust
  B.3  Stress
  B.5  Employee Indifference (Overlooking Problems)
Hardware
  C.1  Maintenance
Software
  D.1  Increased Use Without Increasing Verification and Validation
  D.2  Insufficient Testing
Communication
  E.1  Inadequate Error Tracking
  E.2  Communication between Sub-Teams
  E.3  Communication between NASA and Contractors
Management
  G.2.1  Inexperienced Project Managers
Resource Allocation
  H.2  Lack of Qualified Personnel
  H.4  Elimination of Oversight Teams
Systems Engineering
  I.2  Insufficient Supervision of Communication between Engineers and Teams
Fig. 4. Common risk scenarios from the four mission case studies after the removal of mission-specific hazards.

To test the effectiveness of the remaining phases of the RFRM methodology, one scenario was chosen to advance into Phase VI, Risk Management. The scenario “Elimination of Oversight Teams” was selected in consideration of the overall distribution of the remaining scenarios under the various Head-Topics, as well as their relevance to mission success.

[Chart “Distribution of Final Head Topics”: number of scenarios for each Head-Topic.]
Fig. 5. Distribution of Risks under Head Topics

Five policy options were identified for this risk scenario.

Risk Management Plan
  Option A  Do nothing
  Option B  Assign one employee within each sub-group the role of an independent oversight manager
  Option C  Re-assign current employees to oversight teams
  Option D  Hire new employees to staff internal oversight teams (one per project)
  Option E  Hire external consultants as oversight teams
Fig. 6. Policy options for the Elimination of Oversight Teams

Application of the fractile method provided the expected percentage of errors for each alternative. An error is defined as the improper reporting, tracking, or handling of a problem in the system due to the elimination of oversight teams. Plotting these expected values against each option’s associated monetary cost and time delay produced two Pareto frontier graphs, which represent the trade-offs graphically. For example, the trade-offs between two attributes, percentage of errors and cost, are shown in Figure 7.

[Chart “Multi-Objective Analysis (Cost)”: cost ($K) versus percentage of errors that are not tracked properly, with unconditional and conditional expected values plotted for Options A through E.]
Fig. 7. Pareto Frontier for Cost
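The Phase VI computations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the team’s actual model. It takes one common reading of the fractile method: an assessor supplies a few fractiles (values at chosen cumulative probabilities), probability is spread uniformly between adjacent fractiles, and the expected value follows from that piecewise-linear distribution. The conditional expected value plotted in Figure 7 is assumed here to be the mean of the worst tail beyond a partition probability `alpha`. The option names P1 to P5, their fractiles, and their costs are invented; they are not the data behind Figures 7 and 8.

```python
from dataclasses import dataclass

# A fractile assessment: (cumulative probability, value) pairs from p = 0 to p = 1.
Fractiles = list[tuple[float, float]]


def _segments(fractiles: Fractiles):
    """Pair up adjacent fractiles after checking the assessment is well formed."""
    ps = [p for p, _ in fractiles]
    xs = [x for _, x in fractiles]
    if ps[0] != 0.0 or ps[-1] != 1.0 or ps != sorted(ps) or xs != sorted(xs):
        raise ValueError("fractiles must run from p=0 to p=1 with nondecreasing values")
    return list(zip(fractiles, fractiles[1:]))


def expected_value(fractiles: Fractiles) -> float:
    """Mean of the piecewise-linear CDF through the assessed fractiles.

    Probability is uniform between adjacent fractiles, so each segment
    contributes its probability mass times its midpoint.
    """
    return sum((p1 - p0) * (x0 + x1) / 2
               for (p0, x0), (p1, x1) in _segments(fractiles))


def conditional_expected_value(fractiles: Fractiles, alpha: float) -> float:
    """ASSUMED tail measure: mean of the outcomes beyond cumulative probability alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    total = 0.0
    for (p0, x0), (p1, x1) in _segments(fractiles):
        lo = max(p0, alpha)
        if lo >= p1:
            continue  # this segment lies entirely below the partition point
        # Within a segment the value is linear in p, so a trapezoid is exact.
        x_lo = x0 + (x1 - x0) * (lo - p0) / (p1 - p0)
        total += (p1 - lo) * (x_lo + x1) / 2
    return total / (1.0 - alpha)


def pareto_frontier(points):
    """Labels of (label, errors, cost) points not dominated on both objectives (lower is better)."""
    return [label for label, err, cost in points
            if not any(e2 <= err and c2 <= cost and (e2 < err or c2 < cost)
                       for _, e2, c2 in points)]


@dataclass
class PolicyOption:
    label: str
    cost_k: float               # cost in $K (invented)
    error_fractiles: Fractiles  # % of errors not tracked properly (invented)


OPTIONS = [
    PolicyOption("P1", 0,   [(0.0, 40), (0.5, 60), (1.0, 80)]),
    PolicyOption("P2", 200, [(0.0, 20), (0.5, 30), (1.0, 50)]),
    PolicyOption("P3", 150, [(0.0, 10), (0.5, 20), (1.0, 40)]),
    PolicyOption("P4", 300, [(0.0, 5),  (0.5, 15), (1.0, 30)]),
    PolicyOption("P5", 500, [(0.0, 0),  (0.5, 5),  (1.0, 15)]),
]

if __name__ == "__main__":
    points = []
    for opt in OPTIONS:
        ev = expected_value(opt.error_fractiles)
        tail = conditional_expected_value(opt.error_fractiles, alpha=0.5)
        points.append((opt.label, ev, opt.cost_k))
        print(f"{opt.label}: expected {ev:.2f}%, worst-half {tail:.2f}%, cost {opt.cost_k}K")
    print("Pareto frontier:", pareto_frontier(points))
```

With these invented numbers, P2 is dominated by P3 (more expected errors at a higher cost), so the frontier consists of P1, P3, P4, and P5; the decision maker then chooses among frontier points by weighing cost against errors, and a third attribute such as time delay can be added as another coordinate in each point.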
A multi-objective trade-off analysis combining all three attributes, namely cost, time delay, and percentage of errors, was used to make recommendations to NASA in order to eliminate, if not minimize, the risk of not properly handling errors.

[Chart “Multi-Objective Tradeoff (Size Indicates Costs)”: time delay (months) versus percentage of errors that are not handled, reported, or tracked properly, for Options A through E, with marker size indicating cost.]
Fig. 8. Multi-Objective Trade-off Analysis of cost, time delay, and percentage of errors

RECOMMENDATIONS

An analysis of the scenarios remaining after ranking and filtering reveals that NASA is experiencing problems in nearly every organizational area. The five areas producing the most problems are organizational, human, software, communication, and resource allocation. This is not to say that the other areas are relatively problem-free; a different analysis may produce a different scenario distribution.

NASA is one of the most important government agencies in the United States, and it plays a central role in expanding our knowledge of the universe. As a result, it must lower the current failure rate of its missions. This project illustrated the effectiveness of the RFRM method for risk mitigation at NASA.

The management plans chosen to handle the risks surrounding the elimination of oversight teams were either to assign current employees to staff oversight teams or to hire new employees to occupy internal oversight teams. These solutions provided the best trade-off between risk and cost of implementation. Further risk management plans can be developed and evaluated in the same manner to address the other major issues facing NASA.

NASA employees represent some of the best and brightest scientists and engineers in the world. However, even the most dedicated and intelligent workers have their limits. Currently, NASA employees are overworked, underpaid, and faced with an environment that does not foster trust and open communication (MIB, 1999). Despite limited government funding, NASA officials must find a way to address these issues facing their employees. Using the RFRM method described in this project, NASA can better evaluate options such as reducing the number of concurrent missions at each center. We believe that with improved communication, improved wages, and the addition of workers to reduce stress levels, NASA can maintain its current ambitions and return to its lofty status as the world leader in space travel and technological innovation.

REFERENCES

Derby, Stephen L. & Ralph L. Keeney. (1981). Risk Analysis: Understanding “How Safe is Safe Enough?” In Theodore S. Glickman & Michael Gough (Eds.), Readings in Risk (pp. 43-52). Washington, D.C.: Resources for the Future.
Dickey, Beth. (2000, September). “Midcourse Correction: NASA discovers faster and cheaper don’t add up to better.” Government Executive, 29-38.
Haimes, Yacov Y. (1999). Development of a Risk Management Roadmap for NASA. Virginia.
Haimes, Yacov Y. (1998). Risk Modeling, Assessment, and Management. New York: Wiley-Interscience Publication.
Haimes, Yacov Y., James Lambert, & Stan Kaplan. (2001). Risk Filtering, Ranking, and Management Using Hierarchical Holographic Modeling Framework. Charlottesville: University of Virginia.
Hoffman, Edward J. (1996). “Issues in NASA Program and Project Management.” NASA Office of Management Systems and Facilities Scientific and Technical Information Programs. Washington: NASA.
Intellectual Capital. (1997). “NASA's Shrinking Budget.” Intellectual Capital. Date Accessed: October 29, 2000. Date Posted: August 7, 1997. <http://ic.voxcap.com/issues/issue100/item4461.asp>
Lawler, Andrew. (2000, April). “’Faster, Cheaper, Better’ on Trial.” Science, 32-34.
Mishap Investigation Board. (1999). Mars Climate Orbiter Mishap Investigation Board: Phase I Report. Pasadena, CA: JPL laboratories.
Molak, Vlasta (Ed.). (1997). Fundamentals of Risk Analysis and Risk Management. New York: Lewis Publishers.
National Aeronautics and Space Administration. (1999).
NASA FBC Task Final Report. Washington: NASA.
Shuttle Presskit. (1999, July). “STS-93: Shuttle
Presskit”. Date Accessed: April 14, 2001. Date Posted:
July 13, 1999. <http://www.shuttlepresskit.com/STS-
Ulrich, Dr. Peter B. (1995). Final Environmental Impact
Statement for the Cassini Mission (FEIS). Washington,
Jacob Burns is a fourth-year Systems Engineering major from McLean, VA. His concentration is in
management systems. Mr. Burns’s principal contribution
to the project was the analysis of the Mars Polar Lander
in relation to the NASA risk assessment. He has
accepted a position as a consultant for Anderson in
Laura Kichak is a fourth-year Systems Engineering major from Silver Spring, MD. She has a minor in
Economics and is concentrating in management
systems. Her principal contribution to the project was
the analysis of the Space Transportation System 93. Ms.
Kichak has accepted a position at SAIC in Arlington,
Jeff Noonan is a fourth-year Systems Engineering major from Fair Lawn, NJ. His concentration is in
Management and Computer Information Systems.
Jeff’s principal contribution to the project was the
analysis of the Mars Climate Orbiter. He has accepted
a position with UBS PaineWebber in New York City.
Beth Van Doren is a fourth-year Systems Engineering
major from Branchburg, NJ. Her concentration is in
History. Beth's principal contribution to the project was
the analysis of the Cassini Mission. Beth plans to study
for the LSAT this summer in hopes of attending law
school in the near future.