MGS in Trouble, Formerly: MGS in safe mode |
Apr 15 2007, 03:39 AM
Post
#256
Merciless Robot Group: Admin Posts: 8789 Joined: 8-December 05 From: Los Angeles Member No.: 602 |
To continue the nihilism: a cold vacuum of slowly decaying protons amidst a sea of barely energized leptons several trillion years from now...
No matter. We all do the best we can, and the MGS team performed WAY beyond any initial expectations. (How inspiring, how refreshing, to see such magnificent dedication, brilliance, and innovation to make this mission last so long, yes? This is the spirit of humanity at its very best.) Each and every one of them should get a medal as far as I'm concerned for making truly significant contributions to human knowledge and exploration. I envy them the private satisfaction they each must feel for doing something that really meant a great deal, not just now but a thousand years from now...
--------------------
A few will take this knowledge and use this power of a dream realized as a force for change, an impetus for further discovery to make less ancient dreams real.
Apr 15 2007, 05:43 AM
Post
#257
Member Group: Members Posts: 754 Joined: 9-February 07 Member No.: 1700 |
"We all do the best we can, and the MGS team performed WAY beyond any initial expectations. (How inspiring, how refreshing, to see such magnificent dedication, brilliance, and innovation to make this mission last so long, yes? This is the spirit of humanity at its very best.) Each and every one of them should get a medal as far as I'm concerned for making truly significant contributions to human knowledge and exploration. I envy them the private satisfaction they each must feel for doing something that really meant a great deal, not just now but a thousand years from now..."
I wholeheartedly agree. Think of how far humanity could go if we all achieved on this level? I still wonder about a simple Disk Repair program |
Apr 16 2007, 08:48 AM
Post
#258
Member Group: Members Posts: 247 Joined: 17-February 07 From: ESAC, cerca Madrid, Spain. Member No.: 1743 |
I read the anomaly report that NASA put out, and I can follow what happened based upon similar experience. I've screwed up enough while at the console to understand what happened.
From some of the comments here, people may have a very different picture of how the usual (old) spacecraft console software works as compared to the reality. I'll try to make a few points, and maybe people can tell me if I'm missing anything.
In my experience, the old console software is not very high tech. The projects are always run by hardware guys, who don't know much about software. And the pressures in the programs, before launch, are almost always hardware driven. So, you end up with software which is not exactly state of the art, being used to control hardware which often IS state of the art.
Most of the old ground software I've used is very manual. So for instance, I can understand exactly how the report's errors occurred. There is a command prompt that asks you for the value of the parameter you want to change, and the memory address in RAM. You type it in. You don't change the redundant side's values at the same time, until you know it worked on the primary. So later, another guy repeats it for the redundant side. He types it in. But he types it in differently than the first guy. Error 1.
Later, you do a memory dump. These were generally crude tools that spit out pages and pages of hard copy, in hex, with very little technology to help you make sense of it. It is fingers moving over the page, finding values in two places that match (or don't). But as usual, they found the problem. Good job, team.
Now they do it all again. Run the memory update program, enter the addresses of the parameters by hand, then enter the correct value by hand. This time the parameter value was entered correctly, but they typed in the wrong address. Error 2.
I've done that. And it isn't pretty. Generally, the old console software won't catch it. It will do whatever you tell it to do, and put whatever you want in any location. There are no limit checks, no graphical displays to show you in what location these parameters are actually going. There's nothing to back you up.
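To make the missing "limit checks" concrete, here is a minimal sketch of what a checked memory write could look like. It is not how the MGS ground software worked; the parameter names, addresses, and limits below are invented for illustration, and `build_write` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    address: int   # RAM address where the parameter is expected to live
    low: float     # smallest allowed value
    high: float    # largest allowed value


# Hypothetical parameter table: names, addresses and limits are made up.
PARAM_TABLE = {
    "GIMBAL_LIMIT_DEG": ParamSpec(address=0x1A40, low=0.0, high=180.0),
    "HEATER_SETPOINT_C": ParamSpec(address=0x1A44, low=-20.0, high=45.0),
}


def build_write(name, address, value):
    """Return an (address, value) pair only if both pass the table checks."""
    spec = PARAM_TABLE.get(name)
    if spec is None:
        raise ValueError(f"unknown parameter {name!r}")
    # Catches the "Error 2" case: right value, wrong address.
    if address != spec.address:
        raise ValueError(
            f"{name}: address {address:#06x} does not match "
            f"expected {spec.address:#06x}"
        )
    # Catches a value outside the allowed range, such as Z = 3Y.
    if not spec.low <= value <= spec.high:
        raise ValueError(
            f"{name}: value {value} outside [{spec.low}, {spec.high}]"
        )
    return (address, value)
```

A call such as `build_write("GIMBAL_LIMIT_DEG", 0x1A44, 90.0)` would be rejected before anything is radiated, because the typed address belongs to a different parameter.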
So you should have people double check what you're doing. But into your fourth mission extension, it may not seem that important. Your computers have been shoved into some corner to make way for more important things, you have people working part time whose real focus is on other things, none of your managers are paying attention to the mission anymore, so nothing you do on it seems like it is going to exactly help your career. The edge, you could say, is missing, and inevitably things happen. Usually recoverable, sometimes not.
You are not working with changes in operating systems. No one changes the operating system once it is launched, unless you do a patch to fix a serious problem. You do everything in your power to forget you have an operating system. You just work with parameters, whenever possible. And you change parameters by making direct writes of numbers to specified memory locations, all of which are entered manually at a prompt. Type in either a bad address, or a bad parameter value, and if it goes through unnoticed, you have a time bomb in your RAM. A memory location where the parameter should be between X and Y, and you just put in a value of Z = 3Y.
A lot of these problems with the ground systems are now fixed, with missions starting off with much better ground systems than the older missions had. MGS launched in 1996; the ground system software was locked down at least six months or a year before that. The software was probably based on designs from the early nineties. As with everything else, the ground software has changed a lot between 1992 and 2007. And it has changed because of exactly the kinds of errors that got made on MGS. But since no program ever spends budget to improve the working software of old missions, things like this can happen.
As for safe modes, keep in mind that any spacecraft safe mode is designed to handle a single fault. No one even attempts two-fault solutions, because anything beyond single-fault planning gives you an almost infinite number of possibilities to plan against, which cannot be done on the budget you have. And when you enter safe mode, the flight code uses defined parameters in RAM. You can have a perfectly lovely safe mode definition, but if the parameters have been corrupted, all bets are off; anything can happen.
If you think that using some sort of safe mode that is absolutely hard coded would be safer, I would disagree. Things are learned after launch, often very very disturbing things. Having the flexibility to alter the parameters is much safer than not. And this flexibility allows you to tailor the safe mode to things like failed hardware, which cannot be planned for in advance.
There is talk about how the lessons learned from this will include periodic end-to-end reviews, looking into how the manned program does things, and ways to keep the operators fresh and enthused. Well, end-to-end reviews that will actually be detailed enough to catch parameter discrepancies are long, detailed, require experts who are working on current programs with tight deadlines and budgets, and require money to fund them. The human spaceflight side has a lot more money for these things, because lives are at stake. Unmanned missions get their fourth extension based on the fact that they promise to spend almost no money at all, otherwise the spacecraft would have been shut down and hurtled into the planet. These are the sorts of things managers say at times like this, but when it comes down to funding them, count me as quite sceptical. New missions will take priority for the cash. And sometimes, that is the right decision.
There are people out there who know a lot more about the MGS specifics than I do. If I'm way off, let me know. But this was my take on the whole thing, for what it's worth.
--------------------
--
cndwrld@yahoo.com |
Apr 16 2007, 02:56 PM
Post
#259
Merciless Robot Group: Admin Posts: 8789 Joined: 8-December 05 From: Los Angeles Member No.: 602 |
Interesting and valuable insight from someone who's been there, cndwrld...thank you!
There's a lot of trade space between flexibility & foolproofing in human/machine interface, but in your general examples it sure sounds like the bias is sometimes set too far to the former. Setting up a table of parameters in MS Access or something for each of the redundant databases & then continuously comparing them for equality (and flagging fields that don't match) doesn't seem too hard or expensive to build. Foolproof? No, nothing really is. I'm sure that many if not most SV operators do something exactly like this, and bad things still happen.
--------------------
A few will take this knowledge and use this power of a dream realized as a force for change, an impetus for further discovery to make less ancient dreams real.
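The table comparison suggested in this post could be sketched roughly as follows, using plain Python dictionaries instead of an Access database. The function name `diff_tables` and the sample parameter names and values are assumptions made up for the example, not anything taken from the MGS ground system.

```python
def diff_tables(primary, redundant):
    """Return (name, primary_value, redundant_value) for every mismatch.

    A parameter present on only one side is reported with None for the
    side where it is missing.
    """
    mismatches = []
    for name in sorted(primary.keys() | redundant.keys()):
        a = primary.get(name)
        b = redundant.get(name)
        if a != b:
            mismatches.append((name, a, b))
    return mismatches


# Hypothetical parameter tables for the two redundant sides.
primary_side = {"GIMBAL_LIMIT_DEG": 90.0, "HEATER_SETPOINT_C": 10.0}
redundant_side = {"GIMBAL_LIMIT_DEG": 95.0, "HEATER_SETPOINT_C": 10.0}

for name, a, b in diff_tables(primary_side, redundant_side):
    print(f"MISMATCH {name}: primary={a} redundant={b}")
```

Running a check like this after every update would flag an "Error 1" style discrepancy between the two sides, though it cannot tell which side holds the intended value.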
Apr 16 2007, 03:37 PM
Post
#260
Senior Member Group: Members Posts: 3419 Joined: 9-February 04 From: Minneapolis, MN, USA Member No.: 15 |
That was an excellent summary, Don -- it demonstrates what I've been saying all along, that the limitations on most every human endeavor have more to do with financial and schedule pressures than they do with the limits of our technology or imagination.
Now, as we all know, there are a lot of ways to automate the processes you discuss. Heck, back in Gemini days, more than 40 years ago, command loads to the Agena target vehicles were sent up pretty much exactly as you describe here. But even back then, they had an automatic comparator that would check the command load as sent against the command load as received by the Agena. Only when that comparator failed did they end up digging through printouts of the command loads to verify that the load was properly received.
Now, that's not exactly the same as comparing an actual command load to a desired command load, but it's similar in process. And thus, the technology to error-check a lot of this stuff has been around for a long time.
As you have so effectively pointed out, the ground support stuff is usually designed (or used off-the-shelf) to do its job, bare-bones, no extras. Error trapping is almost non-existent. And lest anyone think that this is just an issue with ESA's efforts, recall that an average command load to the MERs requires most of an individual's workday to prepare -- seven or eight hours.
We all know it's *possible* to create error-trapping front-end software for such things that would allow a rover driver to tell the front-end interface: "We want to drive 20 meters in this specific direction, take the following image series, and then prepare for an overnight Odyssey pass." It's very possible to set it up so that creating and radiating the appropriate command series would take the rover driver 10 or 15 minutes, and the front-end would ensure that all commands sent to the spacecraft would be safe and properly executable.
Why isn't it done like that? Probably because it would have cost too much in time and money to develop such a front-end system in the first place...
-the other Doug
--------------------
“The trouble ain't that there is too many fools, but that the lightning ain't distributed right.” -Mark Twain
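For anyone curious what a sent-versus-received comparator amounts to in software terms, here is a small sketch. It only illustrates the idea of comparing a command load as sent against the load echoed back; the function name `compare_loads`, the two-byte word size, and the sample bytes are assumptions, not details of the Gemini/Agena or MGS systems.

```python
def compare_loads(sent: bytes, echoed: bytes, word_size: int = 2):
    """Return (offset, sent_word_hex, echoed_word_hex) for each differing word.

    If one load is shorter than the other, the missing words show up as
    empty strings on the shorter side.
    """
    mismatches = []
    length = max(len(sent), len(echoed))
    for offset in range(0, length, word_size):
        s = sent[offset:offset + word_size]
        e = echoed[offset:offset + word_size]
        if s != e:
            mismatches.append((offset, s.hex(), e.hex()))
    return mismatches


# Made-up example: the second word was altered somewhere along the way.
sent_load = bytes.fromhex("1a400005ffff")
echoed_load = bytes.fromhex("1a400007ffff")
print(compare_loads(sent_load, echoed_load))
```

An empty result means the load arrived as sent; anything else points straight at the offending offset instead of leaving someone to run a finger down pages of hex.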