MGS in Trouble, Formerly: MGS in safe mode |
Jan 11 2007, 03:41 PM
Post
#226
|
|
Founder Group: Chairman Posts: 14434 Joined: 8-February 04 Member No.: 1 |
Some sort of software/commanding problem caused...
A bad attitude which heated up the battery radiator which caused... Battery failure which caused... Loss of vehicle, as I understand it so far. |
|
|
Jan 11 2007, 05:00 PM
Post
#227
|
|
Member Group: Members Posts: 153 Joined: 14-August 06 Member No.: 1041 |
The MCO loss was more a process problem, stimulated by a simple calculation error...
Another English to Metric conversion? Given the sheer number of human interactions with the MGS, it is an extraordinary accomplishment of the MGS team to have kept the ball in the air this long. In a way it is like playing Tetris: no matter how great you are, the result of every mission without a firm time-line will be a failure of some sort, usually human...no matter how superhuman the effort :) I hope they will be candid and timely in providing a detailed description of the failure and the lessons learned. Knowing the reason that an un-timed mission failed is one more mission success. |
|
|
Jan 11 2007, 05:14 PM
Post
#228
|
|
Newbie Group: Members Posts: 8 Joined: 7-January 07 Member No.: 1568 |
I feel for the MGS software team if it turns out to not be a direct software issue. It's understandable that NASA management and the public want to know as quickly as possible the root cause of a fault, but, having experienced similar situations, it's painful to see headlines like "Faulty Software May Have Doomed Mars Orbiter" before we have a definitive answer. Unfortunately, it may be too late to correct the impressions that have been made if it is a parameter issue or something of that nature.
|
|
|
Jan 11 2007, 07:25 PM
Post
#229
|
|
Senior Member Group: Members Posts: 2922 Joined: 14-February 06 From: Very close to the Pyrénées Mountains (France) Member No.: 682 |
It's a well-known fact that the majority of car accidents occur on the roads you know best. You can call that statistics or lack of concentration. The longer a mission goes, the more likely a human error will occur. I'm just amazed how long the Voyagers have flown; it'll be 30 years this year.
To the software people: we back you guys. Habit is a bad thing and people only remember your failures. We just CAN'T fly without you. -------------------- |
|
|
Jan 11 2007, 10:47 PM
Post
#230
|
|
Interplanetary Dumpster Diver Group: Admin Posts: 4404 Joined: 17-February 04 From: Powell, TN Member No.: 33 |
Another English to Metric conversion? Given the sheer number of human interactions with the MGS, it is an extraordinary accomplishment of the MGS team to have kept the ball in the air this long. In a way it is like playing Tetris: no matter how great you are, the result of every mission without a firm time-line will be a failure of some sort, usually human...no matter how superhuman the effort :) I hope they will be candid and timely in providing a detailed description of the failure and the lessons learned. Knowing the reason that an un-timed mission failed is one more mission success.
They may not. They will certainly look into scenarios of what might have happened, but since contact was lost and there was only limited contact between November 2 and November 6 (which, I believe, was the last day they picked any signal out), it may be hard to isolate the cause of failure. I will also say that human error can have a magnified effect on extended missions, which are usually funded at much lower levels than primary missions, stretching staffing to the bone. -------------------- |
|
|
Jan 12 2007, 12:30 AM
Post
#231
|
|
Newbie Group: Members Posts: 18 Joined: 17-September 06 From: USA Member No.: 1151 |
-------------------- Lorne Ipsum, Chief Geek
Geek Counterpoint blog & podcast |
|
|
Jan 12 2007, 12:52 AM
Post
#232
|
|
Senior Member Group: Members Posts: 2922 Joined: 14-February 06 From: Very close to the Pyrénées Mountains (France) Member No.: 682 |
Gang, This might help explain things a bit: Lorne
Lorne, you have a way of explaining rocket science that I've never seen before! I've learnt a lot of things...and that seems SO simple to understand. Thanks so much... -------------------- |
|
|
Jan 12 2007, 12:55 AM
Post
#233
|
|
Merciless Robot Group: Admin Posts: 8785 Joined: 8-December 05 From: Los Angeles Member No.: 602 |
Absolutely superb & highly educational analysis, Lorne; thank you VERY much!
The bottom line is that many parts of this read exactly like every aircraft accident report I've ever read: there is always a chain of events that increases unknowns and ultimately leads the entire system (including the human element) into an uncontrollable situation with basically unpredictable, often undesirable outcomes. I sure hope that the MGS software team member(s) involved near the end don't feel too bad; they shouldn't. Aside from the brilliant performance of the spacecraft that vastly exceeded all reasonable dreams before launch, complex systemic failures just plain happen. They seem to be an inevitable feature of the Universe, and I'm sure that the mathematics of chaos theory could easily prove this. -------------------- A few will take this knowledge and use this power of a dream realized as a force for change, an impetus for further discovery to make less ancient dreams real.
|
|
|
Jan 12 2007, 03:16 AM
Post
#234
|
|
Newbie Group: Members Posts: 18 Joined: 17-September 06 From: USA Member No.: 1151 |
nprev & climber,
Thanks -- glad you liked the writeup! I'm with you -- hopefully the poor guy at the bottom of the totem pole doesn't get beat up too severely over this (he'll be reliving it for the rest of his life anyway). I've worked mission ops for old spacecraft with static memory maps before, and I remember how we ALWAYS got paranoid whenever we did parameter updates. When push comes to shove, the fact that a mistake like this could go unnoticed for months says there's a bad process being followed (or a good one not being followed) somewhere. Hopefully the review board can come up with some lessons that can be applied to more modern architectures. Lorne -------------------- Lorne Ipsum, Chief Geek
Geek Counterpoint blog & podcast |
|
|
Jan 12 2007, 03:56 AM
Post
#235
|
|
Senior Member Group: Members Posts: 1592 Joined: 14-October 05 From: Vermont Member No.: 530 |
Thanks for the extremely informative blog!
I am unsure about one thing though. Towards the end, in "the spark that lit the fire," you do not mention when MGS was switched back to SCP-1. Was this part of the safe mode? Or had it already been transitioned back to SCP-1? And if the transition was an intentional switch back, I tend to agree that at least some process should have caught the bad parm upload (i.e., a comparison of the two memories). But if the switch back was a result of a safing event before the SCP-1 memory repair was fully verified, well, that just sucks but is less faultworthy. |
|
|
Jan 12 2007, 10:19 AM
Post
#236
|
|
Dublin Correspondent Group: Admin Posts: 1799 Joined: 28-March 05 From: Celbridge, Ireland Member No.: 220 |
Lorne - superb write-up, one of the best bits of reporting on spacecraft ops I've ever come across. Any chance you're available to help the BBC out, as they seem to be in need of a major quality control overhaul at the moment?
|
|
|
Jan 12 2007, 11:45 AM
Post
#237
|
|
Senior Member Group: Members Posts: 1870 Joined: 20-February 05 Member No.: 174 |
Viking's case involved a thrown-together set of people from the disbanded engineering and software team. VL1 was on an automatic "eternal" mission that was hopefully not going to require any further commanding ever. They were trying to salvage or extend the mission by uploading battery conditioning commands as the battery started to show similar problems to the VL2 batteries that killed that lander's operations.
Note that the Magellan Venus radar mapper mission was nearly lost early on due to a high-lethality interrupt handling error that could send the computer essentially into runaway crashes. They finally "trapped" the error when the ground duplicate test system did an interrupt fault and crashed while full diagnostic info was available. I'm deeply unhappy with trusting in software-driven "safe modes", preferring that the spacecraft be able to fall back into an ultimate nearly lobotomized mechanical safe mode. Remember, Pioneers 10 and 11 never had a software problem, never rebooted, never crashed. No computers. All the way beyond Pluto on direct commands (except for sequencer stored commands for midcourse maneuvers). I'm also deeply unhappy with spacecraft inside of Jupiter's orbit that do not have essentially 100% omnidirectional coverage with low data rate omni-antennas. We nearly lost the ability to command Mariner 10 when it was being stabilized in a drifting roll mode and it rolled into a null in the receiving antenna pattern shortly before the third Mercury encounter. We also had problems with Magellan getting into nearly communication-unable attitudes during one or more of its computer crash crises. You really want to get 8 bits/second telemetry as long as a spacecraft has power and live command decoding circuits, and the ability to send 1 bit/second commands. |
|
|
Feb 13 2007, 01:42 PM
Post
#238
|
|
Newbie Group: Members Posts: 11 Joined: 13-August 05 From: Belgium Member No.: 465 |
Link sends me to "Episode 52 -- The Antikythera Mechanism" How do I get to the right one? Joining that forum? |
|
|
Feb 15 2007, 10:39 PM
Post
#239
|
|
Member Group: Members Posts: 172 Joined: 17-March 06 Member No.: 709 |
Lorne, Have "they" gotten to you? It appears that the excellent post that you wrote concerning MGS and its software in January is now "disappeared." In fact, except for TPS' mention of it in their weblog, and this UMSF thread, there is no hint that that article ever existed. This is truly bizarre. What's up Lorne? Another Phil |
|
|
Feb 16 2007, 03:37 PM
Post
#240
|
|
Member Group: Members Posts: 153 Joined: 14-August 06 Member No.: 1041 |
Somewhere - but I cannot find where - I read Lorne's scenario was not likely to be correct. This may be why the article was pulled, which is too bad, because it was a very good description of MGS era computer systems.
In any case, it will be disappointing if yet another 'successful mission unplanned ending' investigation is kept under wraps. |
|
|