Oppy Sw/hw Problems

Oct 17 2005, 10:30 PM, Post #31
Member Group: Members Posts: 252 Joined: 27-April 05 Member No.: 365
I'm pretty sure the software on the two rovers is -identical-. However, the various settings are very different for each.

Guest_Edward Schmitz_*
Oct 18 2005, 03:08 AM, Post #32
Guests
It is very unlikely that a bug in the software is causing the reboots. These reboots are coming late in the life of the rover and are not happening on Spirit. It is far more likely that an aging component is to blame, probably somewhere in the computer: something the computer can't recover from, like an error in a CPU register. There are a thousand components that, if flaky, would send Opportunity into a brain freeze. This would result in the watchdog circuit rebooting the rover.
We can't rule out a software bug that is only rearing its ugly head because of something new they are doing. But they've been using these things too long for that to be likely. Not to be a downer, but these reboots could start becoming more frequent, to the point of - dare I say it - End of Mission.
ed

Oct 18 2005, 04:41 AM, Post #33
Member Group: Members Posts: 350 Joined: 20-June 04 From: Portland, Oregon, U.S.A. Member No.: 86
I'll say that it's too early to say for sure whether it's a flaw in the hardware or the software. Opportunity and Spirit, while performing the same basic tasks, are not performing exactly the same tasks. Perhaps some unnoticed, seemingly unimportant flag is set on Opportunity that is not set on Spirit, or vice versa.
I do think it's unlikely to be a hardware problem: any sort of hardware problem would likely leave Opportunity utterly unusable. Unless you get really lucky, a bad gate on a chip will probably make the entire chip useless. At best, perhaps one byte of Opportunity's memory is bad and is only written or read very rarely, but I would think the engineers at NASA/JPL would be able to detect this easily. Didn't they already shut out a chunk of the flash memory on one of the rovers? If you forced me to take a bet, I'd wager on a bug in the software. Hardware can recover from bad software. Software can rarely recover from bad hardware, especially a bad CPU.
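The point that a single bad byte should be easy to find can be illustrated with a classic walking-ones pattern test. This is a hypothetical sketch, not JPL's actual diagnostic: `SimulatedMemory` models a small memory with one bit stuck at 0 at an assumed address, and `walking_ones_test` writes each single-bit pattern to every address and reads it back. A test like this only catches faults that are present while it runs; an intermittent fault can slip past it.

```python
class SimulatedMemory:
    """Byte-addressable memory model with an optional stuck-at-0 bit (hypothetical fault)."""

    def __init__(self, size, stuck_addr=None, stuck_bit=None):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr  # address of the faulty byte, or None for healthy memory
        self.stuck_bit = stuck_bit    # which bit (0-7) in that byte can never hold a 1

    def write(self, addr, value):
        if addr == self.stuck_addr:
            # The faulty bit never latches a 1, whatever the software writes.
            value &= ~(1 << self.stuck_bit) & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def walking_ones_test(mem, size):
    """Write each single-bit pattern (0x01, 0x02, ... 0x80) to every address,
    read everything back, and return a sorted list of (address, bit) failures."""
    failures = set()
    for bit in range(8):
        pattern = 1 << bit
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                failures.add((addr, bit))
    return sorted(failures)


if __name__ == "__main__":
    print(walking_ones_test(SimulatedMemory(256), 256))  # healthy memory
    print(walking_ones_test(SimulatedMemory(256, stuck_addr=42, stuck_bit=3), 256))
```

Run on a healthy 256-byte memory, the test reports an empty list; with bit 3 of address 42 stuck, it reports `[(42, 3)]`, pinpointing the bad cell so it could be mapped out.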

Oct 18 2005, 05:24 AM, Post #34
Member Group: Members Posts: 477 Joined: 2-March 05 Member No.: 180
Well, if it comes down to it, I'm sure the operators could do a "format and reinstall" of the rover's memory. I'd hope that they've got lots of possible diagnostic routines to run - and heck, look at how bad Spirit was back in the early days, and they got it going just fine.

Oct 18 2005, 07:41 AM, Post #35
Senior Member Group: Moderator Posts: 4279 Joined: 19-April 05 From: .br at .es Member No.: 253
Doug,
This thread is getting quite off-topic (OT). May I suggest starting a new one and moving all this discussion about Oppy sw/hw problems there?

Oct 18 2005, 07:43 AM, Post #36
Senior Member Group: Moderator Posts: 4279 Joined: 19-April 05 From: .br at .es Member No.: 253
Hi all,
I suggest continuing all discussions about current and/or potential sw/hw problems here.

Oct 18 2005, 02:32 PM, Post #37
Member Group: Members Posts: 510 Joined: 17-March 05 From: Southeast Michigan Member No.: 209
QUOTE (mike @ Oct 18 2005, 12:41 AM)
Amen. Unfortunately, not everyone up the food chain from the Poor Bloody Programmer understands this. Sometimes it is cheaper/easier to fix hardware problems with software, but often it's just exchanging one set of problems for another.
--------------------
--O'Dave

Guest_Edward Schmitz_*
Oct 19 2005, 02:56 AM, Post #38
Guests
This problem started recently and is recurring. They are not doing anything significantly different from before. The software has not been updated for a long time. There is a chance that it is a subtle bug that they never stepped on before, but that is unlikely. It is more likely that something has changed. If it were a bug, the software would have a good chance of catching it and recording it, even if it could not recover from it.
"Software can rarely recover from bad hardware": H/w is often intermittent, and a h/w reset will often get you back in action. We've all seen crazy PCs that have intermittent problems, and resetting them gets us going until the next failure. It keeps happening until that memory stick or offending PCI card is replaced. The rovers have a hard-wired reset that will automatically cycle the computer when the software stops pinging this watchdog circuit. This is common practice in a system that is unattended.
Here is what we know:
1) It is a recent development.
2) It is recurring.
3) The software is not recording the problem.
4) They have not been able to correlate it to any action being performed.
Computers run on zeros and ones, but the circuitry is analog. When a chip becomes marginal, it can randomly change the state of something it shouldn't. This kind of behavior is common in computers that have been subjected to radiation or thermal cycling. Radiation degrades electronics, and parts often don't fail outright when subjected to it. Thermal cycling damages solder joints, which can then open and close rapidly. It is a common mode of failure in a system that is thermally cycled.
My prediction is that the resets will continue and become more frequent. And oh, how I hope I am wrong!
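The watchdog arrangement described above can be sketched as a small simulation. Everything here is a hypothetical model, not the rovers' actual circuit: `Watchdog` counts time steps since the last ping and forces a reset once an assumed `timeout` is exceeded, and `run` marks a set of steps during which the software is frozen (for instance by a flaky component) and cannot ping.

```python
class Watchdog:
    """Hypothetical watchdog: forces a reset if not pinged within `timeout` steps."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.ticks_since_ping = 0
        self.resets = 0

    def ping(self):
        # Healthy software calls this regularly to show it is still alive.
        self.ticks_since_ping = 0

    def tick(self):
        """Advance one time step; return True if this step triggered a reset."""
        self.ticks_since_ping += 1
        if self.ticks_since_ping > self.timeout:
            self.resets += 1
            self.ticks_since_ping = 0  # the computer is cycled and the count starts over
            return True
        return False


def run(watchdog, hung_steps, total_steps):
    """Simulate `total_steps` steps; during `hung_steps` the software is frozen and cannot ping.
    Returns the list of steps at which the watchdog reset the computer."""
    reset_steps = []
    for step in range(total_steps):
        if step not in hung_steps:
            watchdog.ping()
        if watchdog.tick():
            reset_steps.append(step)
    return reset_steps


if __name__ == "__main__":
    print(run(Watchdog(timeout=5), set(range(10, 30)), 40))
    print(run(Watchdog(timeout=5), set(), 40))
```

With `timeout=5` and the software frozen for steps 10 through 29, the first call returns `[14, 20, 26]`: the watchdog keeps cycling the computer for as long as the fault persists, and normal operation resumes once pings return. A healthy run returns no resets at all.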

Oct 19 2005, 03:50 AM, Post #39
Member Group: Members Posts: 350 Joined: 20-June 04 From: Portland, Oregon, U.S.A. Member No.: 86
In my experience, an electronic device with any sort of damage will fail regularly and fail spectacularly. However, I agree that there's no way to say for sure, especially since we don't have much information about the exact nature of the failures, and what you say about an intermittently flawed device makes sense, particularly given the harsh environs of space.
I imagine that a fatal bug in the rover software would result in a core dump of the running program, and if there were no core dumps, that to me would be a sign of a (bad) hardware problem. If there were core dumps, but utterly useless ones (no stack trace, or a nonsensical one), that would be a sign of a possible hardware problem, though not necessarily. Sadly, however, I cannot say for certain that it's not a hardware problem. I hope it isn't.

Nov 1 2005, 10:56 AM, Post #40
Senior Member Group: Members Posts: 2998 Joined: 30-October 04 Member No.: 105
From the latest Mission Status at the NASA/JPL site:
http://marsrovers.jpl.nasa.gov/mission/sta...tml#opportunity
QUOTE
Sol 622: Untargeted observations included a panorama to examine the amount of light reflected from the surface and a ground survey. A software glitch resulted in losing the afternoon communication relay session with Mars Odyssey. The problem was a repeat of one experienced previously on Spirit's sols 131 and 209 and on Opportunity's sol 596. It occurs when a "write" command reaches an area of memory during a vulnerability period of a few microseconds when that memory location cannot accept a new write command. The rover team is investigating the problem.
And a battery "glitch":
QUOTE
Sol 623: This was a recovery sol. Opportunity returned data directly to Earth during an X-band communication window after calibration of the high-gain antenna. It also performed a calibration of the panoramic camera mast assembly (the rover's "head") to regain use of it and to stow the camera. One of the rover's two batteries would not recharge, which at first puzzled the team. A switch that allows battery 1 to recharge was not enabled, so the battery was temporarily unable to recharge. On the following morning (sol 624), the switch was enabled and the battery subsequently operated normally. Engineers' analysis indicates that recharging was not enabled on sol 623 because the rover did not use enough electricity from the battery during the previous sol (622) to draw the battery's charge below a level pre-set as a threshold for allowing a recharge.
This explains why Oppy's been erratic for the past few sols...
--Bill
--------------------
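The battery explanation in the second quote boils down to a threshold rule: recharging is only enabled once the charge has been drawn below a pre-set level. A minimal sketch follows; the threshold value, the charge fractions, and the assumption that the battery is fully topped up whenever recharging is allowed are all invented for illustration, since the mission update gives no numbers.

```python
# Hypothetical pre-set threshold, as a fraction of full charge (the real value is not given).
RECHARGE_THRESHOLD = 0.80


def recharge_enabled(charge, threshold=RECHARGE_THRESHOLD):
    """Recharging is allowed only if the charge has dropped below the threshold."""
    return charge < threshold


def end_of_sol_charge(start_charge, energy_used, threshold=RECHARGE_THRESHOLD):
    """Charge left at the end of a sol, as a fraction of full.

    Assumption: if recharging is enabled, the solar arrays refill the battery completely.
    """
    after_use = start_charge - energy_used
    if recharge_enabled(after_use, threshold):
        return 1.0
    # The recharge switch stays off, so the battery sits at whatever charge remains.
    return after_use


if __name__ == "__main__":
    print(end_of_sol_charge(1.0, 0.35))  # busy sol: drawn below threshold, recharged
    print(end_of_sol_charge(1.0, 0.10))  # light sol: stays above threshold, no recharge
```

A busy sol such as `end_of_sol_charge(1.0, 0.35)` ends at `1.0`, while a light sol such as `end_of_sol_charge(1.0, 0.10)` ends at about `0.9` with recharging never enabled, which mirrors the sol 622/623 situation described in the quote.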

Guest_Sunspot_*
Nov 1 2005, 11:06 AM, Post #41
Guests
Well... there doesn't appear to be any data from Sol 629 yet. Maybe another software glitch - or that dust storm!!

Nov 1 2005, 11:14 AM, Post #42
Member Group: Members Posts: 562 Joined: 29-March 05 Member No.: 221
re: glitches
wrt hardware problems: it's not rocket science... err, wait, yes it is
wrt software problems: it's not brain surgery... umm, hang on...
ah ha ha, new aphorism regarding problems with the rovers: "it's not rocket surgery"

Nov 1 2005, 01:26 PM, Post #43
Senior Member Group: Moderator Posts: 4279 Joined: 19-April 05 From: .br at .es Member No.: 253
Based on the currently planned seqs (sequences) for sols 628-630 (see below), and taking into account that some pancams previously planned for sol 628 just disappeared, I would say that Oppy had another hiccup.
---
628 p0767.03 14 0 0 14 0 28 navcam_7x1_az_288_3_bpp
628 p1663.01  6 0 0  6 0 12 navcam_3x1_az_108_1_bpp
630 p0767.03 14 0 0 14 0 28 navcam_7x1_az_288_3_bpp
630 p1663.01  6 0 0  6 0 12 navcam_3x1_az_108_1_bpp
----
...or it may be an energy (read: dust storm) issue.

Nov 1 2005, 03:11 PM, Post #44
Member Group: Members Posts: 153 Joined: 11-December 04 Member No.: 120
QUOTE (paxdan @ Nov 1 2005, 11:14 AM)
re: glitches wrt hardware problems: it's not rocket science... err wait yes it is wrt software problems: it's not brain surgery.. umm hang on.. ah ha ha new aphorism regarding problems with the rovers "it's not rocket surgery"
I wouldn't call something that travels at 5 mm per second a rocket...

Nov 1 2005, 08:18 PM, Post #45
Member Group: Members Posts: 350 Joined: 20-June 04 From: Portland, Oregon, U.S.A. Member No.: 86
Hey, it is a software problem and not a hardware problem. I now have further evidence for my "any real hardware problem causes an utter failure" theory, regardless of said hardware being in the presence of several intergalactic cosmic particles. Of course, if said intergalactic cosmic particle(s) somehow hit the gate(s) in the EPROM that control(s) the delay before writing to memory (and didn't break anything else), then I stand corrected.
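The Sol 622 explanation quoted earlier in the thread (a write arriving during a vulnerability period of a few microseconds) describes a timing race, and one common software-side defence is to wait out the busy window before writing. The sketch below is purely illustrative: `TimedCell` is a hypothetical memory location that refuses writes for an assumed `busy_us` microseconds after each accepted write, and `guarded_write` delays a write until that window closes. Nothing here reflects the rover's actual memory hardware or whatever fix JPL applied.

```python
class TimedCell:
    """Hypothetical memory location that ignores writes for `busy_us` microseconds
    after each accepted write (the 'vulnerability period')."""

    def __init__(self, busy_us):
        self.busy_us = busy_us
        self.value = None
        self.busy_until = float("-inf")

    def write(self, t_us, value):
        """Attempt a write at time t_us; return True if accepted, False if it
        landed inside the busy window and did not take effect."""
        if t_us < self.busy_until:
            return False
        self.value = value
        self.busy_until = t_us + self.busy_us
        return True


def guarded_write(cell, t_us, value):
    """Mitigation sketch: postpone the write until the busy window has closed.
    Returns the time at which the write was actually issued."""
    t = max(t_us, cell.busy_until)
    cell.write(t, value)
    return t


if __name__ == "__main__":
    cell = TimedCell(busy_us=5)
    print(cell.write(0, "A"))             # first write is accepted
    print(cell.write(3, "B"))             # arrives 3 us later, inside the window
    print(guarded_write(cell, 3, "B"), cell.value)
```

With `busy_us=5`, a write at t=0 is accepted, an unguarded write at t=3 is rejected, and `guarded_write(cell, 3, "B")` instead issues the write at t=5, where it succeeds. Because the failure needs two events to coincide within microseconds, it shows up rarely and looks random, which fits a glitch seen only a handful of times across both rovers.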