MGS in Trouble, Formerly: MGS in safe mode
PhilHorzempa
post Feb 16 2007, 05:46 PM
Post #241


Member
***

Group: Members
Posts: 167
Joined: 17-March 06
Member No.: 709



Lorne,

Please let us at UMSF know what happened to your
MGS software article on Geek Counterpoint. It was
an excellent presentation and helped a lot of us who
may not know software as well as you, but are technically
informed enough to comprehend the issues.

If there are questions as to whether this is what really
ended the MGS mission, then please consider re-posting
an edited version of the article that omits that conclusion.
It was fascinating to catch this glimpse into a crucial
aspect of unmanned exploration. As I believe someone
else has already said, our robot explorers do exactly what
we tell them. The unfortunate thing is that sometimes
we don't realize what we have told them.


Another Phil
elakdawalla
post Apr 13 2007, 03:44 PM
Post #242


Administrator
****

Group: Admin
Posts: 5166
Joined: 4-August 05
From: Pasadena, CA, USA, Earth
Member No.: 454



The preliminary report is out, and it sounds like what Lorne described.
Here's the report:
http://www.nasa.gov/pdf/174244main_mgs_whi...er_20070413.pdf

--Emily

QUOTE
MEDIA RELATIONS OFFICE
JET PROPULSION LABORATORY

NEWS RELEASE: 2007-040 April 13, 2007

REPORT REVEALS LIKELY CAUSES OF MARS SPACECRAFT LOSS

WASHINGTON - After studying Mars four times as long as originally planned, NASA's Mars Global Surveyor orbiter appears to have succumbed to battery failure caused by a complex sequence of events involving the onboard computer memory and ground commands.

The causes were released today in a preliminary report by an internal review board. The board was formed to look more in-depth into why NASA's Mars Global Surveyor went silent in November 2006 and recommend any processes or procedures that could increase safety for other spacecraft.

Mars Global Surveyor last communicated with Earth on Nov. 2, 2006. Within 11 hours, depleted batteries likely left the spacecraft unable to control its orientation.

"The loss of the spacecraft was the result of a series of events linked to a computer error made five months before the likely battery failure," said board Chairperson Dolly Perkins, deputy director-technical of NASA Goddard Space Flight Center, Greenbelt, Md.

On Nov. 2, after the spacecraft was ordered to perform a routine adjustment of its solar panels, the spacecraft reported a series of alarms, but indicated that it had stabilized. That was its final transmission. Subsequently, the spacecraft reoriented to an angle that exposed one of two batteries carried on the spacecraft to direct sunlight. This caused the battery to overheat and ultimately led to the depletion of both batteries. Incorrect antenna pointing prevented the orbiter from telling controllers its status, and its programmed safety response did not include making sure the spacecraft orientation was thermally safe.

The board also concluded that the Mars Global Surveyor team followed existing procedures, but that procedures were insufficient to catch the errors that occurred. The board is finalizing recommendations to apply to other missions, such as conducting more thorough reviews of all non-routine changes to stored data before they are uploaded and to evaluate spacecraft contingency modes for risks of overheating.

"We are making an end-to-end review of all our missions to be sure that we apply the lessons learned from Mars Global Surveyor to all our ongoing missions," said Fuk Li, Mars Exploration Program manager at NASA's Jet Propulsion Laboratory, Pasadena, Calif.

EDITORS NOTE:

NASA will hold a media teleconference today at noon PDT (3 p.m. EDT), to discuss the report.

Audio of the teleconference will stream live at: http://www.nasa.gov/newsaudio


--------------------
My blog - @elakdawalla on Twitter - Please support unmannedspaceflight.com by donating here.
djellison
post Apr 13 2007, 06:59 PM
Post #243


Administrator
****

Group: Chairman
Posts: 14201
Joined: 8-February 04
Member No.: 1



I hope someone asks what the projected remaining on-orbit lifespan of the spacecraft was before it went AWOL - that tells us the true value of the loss, really.

(And guess who got in with the first question - a great one about orientation... nice one, ESL :) - I hope you can manage a trademark timeline of events to break it all down.)

Damn - I missed the last 5 minutes.

Doug
elakdawalla
post Apr 13 2007, 09:57 PM
Post #244


Administrator
****

Group: Admin
Posts: 5166
Joined: 4-August 05
From: Pasadena, CA, USA, Earth
Member No.: 454



I've now posted a story on the review board report.

http://planetary.org/news/2007/0413_Human_...s_Together.html

--Emily


brellis
post Apr 14 2007, 03:26 AM
Post #245


Member
***

Group: Members
Posts: 747
Joined: 9-February 07
Member No.: 1700



QUOTE (elakdawalla @ Apr 13 2007, 02:57 PM) *
I've now posted a story on the review board report.

--Emily


Thanks for the thorough reporting. One of the questions lingering in my head about the more advanced computers onboard the unmanned orbiters launched in the last decade has been operating system maintenance. Most of us here on Earth now have to deal with OS updates, compatibility, etc., and most of us have experienced a fatal error on a home computer at some point. I have several Macs and PCs, each of which has a different combination of repair engines and potential OS crises waiting, to borrow your very appropriate analogy, like a hammer to fall.

In my experience troubleshooting my 'puters, I try to assume that by the time I'm in trouble with one of my machines, it's not the result of only one problem. Usually a few problems have coagulated into a destructive condition. Your article describes an unfortunate sequence of missteps that could have been avoided with a Disk Repair program of some kind.

--Brad
nprev
post Apr 14 2007, 03:43 AM
Post #246


Senior Member
****

Group: Admin
Posts: 8410
Joined: 8-December 05
From: Los Angeles
Member No.: 602



An absolutely classic 'chain of mistakes/events' scenario, all too familiar from aircraft accident accounts. Excellent reporting, Emily, and thanks!

There are indeed many lessons to be learned here. The main one is that configuration control is an imperative. Two different groups should never have been responsible for maintaining identical spacecraft software-driven bus functions; that's inviting disaster right there.


--------------------
A few will take this knowledge and use this power of a dream realized as a force for change, an impetus for further discovery to make less ancient dreams real.
helvick
post Apr 14 2007, 08:59 AM
Post #247


Dublin Correspondent
****

Group: Admin
Posts: 1799
Joined: 28-March 05
From: Celbridge, Ireland
Member No.: 220



I hope Lorne will reinstate his analysis now too - my recollection of the article is that it was fundamentally correct, and his explanation of the challenges involved in the "simple" day-to-day management of MGS systems was enlightening.
edstrick
post Apr 14 2007, 10:19 AM
Post #248


Senior Member
****

Group: Members
Posts: 1870
Joined: 20-February 05
Member No.: 174



There is a real need for a computer-controlled spacecraft to be able to declare "utter dire emergency" and nearly lobotomize itself: switch to a hopefully nearly bulletproof safety control system and safe itself. There's an increasingly long list of lost, nearly lost, and compromised missions where vehicles couldn't properly safemode (Magellan's computer system crashes, NEAR's pre-orbit-insertion burn screwup at Eros, etc.).

Pioneer Jupiter missions never had a computer crash and safemode emergency EVER... (no computer)... The missions were done entirely by direct ground command except for turn and burn stored commands in a sequencer for midcourse maneuvers.
MarsIsImportant
post Apr 14 2007, 12:29 PM
Post #249


Member
***

Group: Members
Posts: 258
Joined: 22-December 06
Member No.: 1503



It seems to me that part of the solution is they need to redefine what safe mode is. What they thought was safe mode was actually self-destruct mode. ...Of course, I understand that it's not quite that simple.
mcaplinger
post Apr 14 2007, 02:35 PM
Post #250


Senior Member
****

Group: Members
Posts: 1913
Joined: 13-September 05
Member No.: 497



QUOTE (edstrick @ Apr 14 2007, 03:19 AM) *
Pioneer Jupiter missions never had a computer crash and safemode emergency EVER...

The fact that those spacecraft had no need to maintain attitude to the Sun (RTG-powered) and had no articulation makes the problem a lot simpler, doesn't it? Given the complexities of having two separately articulated solar panels, need for battery charge management, an articulated HGA, being in a low orbit with no sun half the time, etc, MGS's safe mode design drivers were vastly more complicated. To think that the way out of these problems is to have a "simpler" safe mode is naive. MGS was lost via a long chain of unlikely errors, any subset of which would have left things OK. We just got unlucky. With 20-20 hindsight, the problems seem rather obvious, as such problems usually do.


--------------------
Disclaimer: This post is based on public information only. Any opinions are my own.
elakdawalla
post Apr 14 2007, 05:44 PM
Post #251


Administrator
****

Group: Admin
Posts: 5166
Joined: 4-August 05
From: Pasadena, CA, USA, Earth
Member No.: 454



I totally agree, Mike. One of the questions I wasn't able to get an answer to, which I would have liked to include in the article, was: how many times did MGS encounter a fault, enter safe mode, and recover successfully because its fault protection worked? Its 10 years were made possible by lots of "lessons learned" from previous missions, and its demise, though sad, does give designers insight into a whole 'nother set of potential faults that they can now plan for, and help make sure it never happens to another mission.

Until robots really do become intelligent, I fear it's much more likely for a long-lived mission to fail unexpectedly due to some bizarre chain of unforeseen events that human programmers just didn't plan for, than for the mission to fail for purely mechanical reasons. It seems to me that we now make plans to end missions before they fail for mechanical reasons, and deorbit them or take some other such protective action. But you just can't plan for every possible human error. You just have to try to plan for everything that's remotely likely. They just didn't plan for this particular bizarre string of events.

--Emily


mcaplinger
post Apr 14 2007, 08:52 PM
Post #252


Senior Member
****

Group: Members
Posts: 1913
Joined: 13-September 05
Member No.: 497



QUOTE (elakdawalla @ Apr 14 2007, 10:44 AM) *
One of the questions I wasn't able to get an answer to, which I would have liked to include in the article, was: how many times did MGS encounter a fault, enter safe mode, and recover successfully because its fault protection worked?

You can read through the status reports at http://mars.jpl.nasa.gov/mgs/status/reports/msop-mgs.html
looking for "safe mode", "contingency mode", and "c-mode".
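For anyone who would rather tally those mentions than skim for them, here is a minimal Python sketch. It assumes the status report text has already been saved locally; the function name `count_mentions` is made up for illustration, and nothing here reflects how the MGS team itself tracked safe-mode entries.

```python
import re
from collections import Counter

# Search terms taken from the post above; matching is case-insensitive.
TERMS = ("safe mode", "contingency mode", "c-mode")


def count_mentions(text, terms=TERMS):
    """Return a Counter mapping each term to its number of occurrences in text."""
    counts = Counter()
    for term in terms:
        # Let any run of whitespace (including line breaks) separate the words
        # of a multi-word term, and require word boundaries at both ends.
        words = (re.escape(word) for word in term.split())
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


if __name__ == "__main__":
    # Hypothetical sample text, not a real status report.
    sample = "The orbiter entered Safe Mode. After the c-mode entry it recovered."
    for term, n in count_mentions(sample).items():
        print(f"{term}: {n}")
```

Keep in mind that a count of mentions is not a count of events: a single safe-mode entry may be discussed in several consecutive reports.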


nprev
post Apr 15 2007, 12:36 AM
Post #253


Senior Member
****

Group: Admin
Posts: 8410
Joined: 8-December 05
From: Los Angeles
Member No.: 602



Emily, "bizarre strings of events" are almost always how mishaps occur in aviation & probably in every other field of endeavour as well. Good systems engineering strives to minimize design features that might induce single-point and at least some chained failures, but ultimately in the real world external systemic influences add many layers of complexity (and often thousands of variables) that can never be completely controlled. This is concisely and quite accurately summarized in pop culture as "**** happens", of course... :)

I am convinced that this is a fundamental heuristic of the Universe, and unfortunately probability implies that the most unlikely chain of events will someday occur to induce an uncontrollable amount of entropy into any given system, thus making its future behavior impossible to predict with accuracy. The MGS ground team did nothing fundamentally wrong; in fact, despite the prima facie tone of my previous post, I meant no criticism of them at all. Lessons learned to realize small single-point improvements is all we can do; entropy will always win in the end, despite our best efforts.
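As a rough illustration of the "will someday occur" point: if a particular chain of events has a small probability p of happening on any one opportunity, and the opportunities are independent (a simplifying assumption that real operations seldom meet), the chance of seeing it at least once in n opportunities is 1 - (1 - p)^n. The sketch below computes that. The function name is hypothetical, and any numbers fed to it are the reader's own guesses, not MGS data.

```python
import math


def prob_at_least_once(p, n):
    """Chance that an event with per-try probability p occurs at least once in n independent tries."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        # log1p(-1) is undefined, so handle the certain event directly.
        return 1.0 if n > 0 else 0.0
    # 1 - (1 - p)**n, computed via log1p/expm1 to stay accurate for very small p.
    return -math.expm1(n * math.log1p(-p))
```

For any p greater than zero the value climbs toward 1 as n grows, which is the sense in which an unlikely chain "will someday occur" given enough opportunities.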


brellis
post Apr 15 2007, 02:54 AM
Post #254


Member
***

Group: Members
Posts: 747
Joined: 9-February 07
Member No.: 1700



nprev said:

"I am convinced that this is a fundamental heuristic of the Universe, and unfortunately probability implies that the most unlikely chain of events will someday occur to induce an uncontrollable amount of entropy into any given system, thus making its future behavior impossible to predict with accuracy. The MGS ground team did nothing fundamentally wrong; in fact, despite the prima facie tone of my previous post, I meant no criticism of them at all. Lessons learned to realize small single-point improvements is all we can do; entropy will always win in the end, despite our best efforts."

I have an abiding and long-lived personal fascination with the concept of entropy, because I've been friends with the Los Angeles punk band by that name. Humans trying to define laws of nature become enforcers of those laws. "The system as defined is now perfect, and it will work indefinitely into the future" is a frustrating mental block in the human effort to define the universe and reform it in our own image.

If entropy killed MGS, it was just tiny particles of entropy. Entropy on a grander scale would turn MGS, Mars and the entire human endeavour into some kind of mush, would it not?
ElkGroveDan
post Apr 15 2007, 03:15 AM
Post #255


Senior Member
****

Group: Admin
Posts: 4750
Joined: 15-March 05
From: Sloughhouse, CA
Member No.: 197



QUOTE (brellis @ Apr 14 2007, 06:54 PM) *
Entropy on a grander scale would turn MGS, Mars and the entire human endeavour into some kind of mush, would it not?

Actually, a very cold and very diffuse gas consisting of dissociated ions would be a more accurate fate for the craft if entropy were taken to its extreme.

Mush has way too much energy and molecular organization.


--------------------
If Occam had heard my theory, things would be very different now.
