Oppy Sw/hw Problems
Burmese
post Oct 17 2005, 10:30 PM
Post #31


Member
***

Group: Members
Posts: 252
Joined: 27-April 05
Member No.: 365



I'm pretty sure the software on the two rovers is -identical-. However, the various settings are very different for each.
Guest_Edward Schmitz_*
post Oct 18 2005, 03:08 AM
Post #32





Guests






It is very unlikely that a software bug is causing the reboots. They are coming late in the rover's life and are not happening on Spirit. It is far more likely that an aging component is to blame, probably somewhere in the computer: something the computer can't recover from, like an error in a CPU register. There are a thousand components that, if flaky, would send Opportunity into a brain freeze, and that would result in the watchdog circuit rebooting the rover.

We can't rule out a software bug that is only rearing its ugly head because of something new they are doing, but they've been running these rovers too long for that to be likely.

Not to be a downer, but these reboots could start becoming more frequent to the point of - dare I say it - End of Mission.

ed
mike
post Oct 18 2005, 04:41 AM
Post #33


Member
***

Group: Members
Posts: 350
Joined: 20-June 04
From: Portland, Oregon, U.S.A.
Member No.: 86



I'd say it's too early to tell for sure whether it's a flaw in the hardware or the software. Opportunity and Spirit, while performing the same basic tasks, are not performing exactly the same tasks. Perhaps some unnoticed, barely important flag is set on Opportunity that is not set on Spirit, or vice versa.

I do think it's unlikely it's a hardware problem - any sort of hardware problem would likely lead to Opportunity being utterly unusable. Unless you get really lucky, a bad gate on a chip will probably make the entire chip useless. At best, perhaps one byte of Opportunity's memory is bad, and it is only being written/read very rarely, but I would think that the engineers at NASA/JPL would be able to easily detect this - didn't they already shut out a chunk of the flash RAM on one of the rovers?
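mike's point that a single bad byte should be easy to spot can be illustrated with a classic walking-ones memory test. This is a minimal Python sketch under loud assumptions: `FaultyMemory` is a hypothetical simulated RAM with one stuck-at-zero bit, and nothing here reflects how JPL actually tests or maps out rover memory.

```python
class FaultyMemory:
    """Hypothetical simulated RAM: one bit at one address is stuck at 0."""

    def __init__(self, size: int, stuck_addr=None, stuck_bit=None):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        # Mask selecting the stuck bit (0 means no fault injected).
        self.stuck_mask = 0 if stuck_bit is None else (1 << stuck_bit)

    def write(self, addr: int, value: int) -> None:
        if addr == self.stuck_addr:
            value &= ~self.stuck_mask & 0xFF  # the stuck bit never sets
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]


def walking_ones_test(mem: FaultyMemory, size: int) -> list:
    """Write a single 1 bit into each position of each byte and read it back.
    Returns the (address, bit) pairs that failed."""
    failures = []
    for addr in range(size):
        for bit in range(8):
            pattern = 1 << bit
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                failures.append((addr, bit))
    return failures


mem = FaultyMemory(64, stuck_addr=42, stuck_bit=3)
print(walking_ones_test(mem, 64))  # [(42, 3)]
```

The catch, as Ed argues later in the thread, is that a marginal part may only misbehave intermittently, and a test like this only sees faults that are present while it runs.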

If you forced me to take a bet, I'd wager on a bug in the software. Hardware can recover from bad software. Software can rarely recover from bad hardware, especially a bad CPU.
Jeff7
post Oct 18 2005, 05:24 AM
Post #34


Member
***

Group: Members
Posts: 477
Joined: 2-March 05
Member No.: 180



Well, if it comes down to it, I'm sure the operators could do a "format and reinstall" of the rover's memory. I'd hope that they've got lots of diagnostic routines to run, and heck, look at how bad Spirit was back in the early days, and they got it going just fine.
Tesheiner
post Oct 18 2005, 07:41 AM
Post #35


Senior Member
****

Group: Moderator
Posts: 4279
Joined: 19-April 05
From: .br at .es
Member No.: 253



Doug,

This thread is getting quite OT.
May I suggest starting a new one and moving all this discussion about Oppy sw/hw problems there?
Tesheiner
post Oct 18 2005, 07:43 AM
Post #36


Senior Member
****

Group: Moderator
Posts: 4279
Joined: 19-April 05
From: .br at .es
Member No.: 253



Hi all,

I suggest continuing all discussions about current and/or potential sw/hw problems here.
odave
post Oct 18 2005, 02:32 PM
Post #37


Member
***

Group: Members
Posts: 510
Joined: 17-March 05
From: Southeast Michigan
Member No.: 209



QUOTE (mike @ Oct 18 2005, 12:41 AM)
Software can rarely recover from bad hardware.


Amen. Unfortunately, not everyone up the food chain from the Poor Bloody Programmer understands this :)

Sometimes it is cheaper/easier to fix hardware problems with software, but often it's just exchanging one set of problems for another.


--------------------
--O'Dave
Guest_Edward Schmitz_*
post Oct 19 2005, 02:56 AM
Post #38





Guests






This problem started recently and is recurring. They are not doing anything significantly different from before, and the software has not been updated in a long time. There is a chance that it is a subtle bug they never stepped on before, but that is unlikely; it is more likely that something has changed. If it were a bug, the software would have a good chance of catching and recording it, even if it could not recover from it.

"Software can rarely recover from bad hardware"

H/w faults are often intermittent, and a h/w reset will often get you back in action. We've all seen crazy PCs with intermittent problems where a reset gets us going until the next failure. It keeps happening until the bad memory stick or offending PCI card is replaced.

The rovers have a hard-wired watchdog circuit that automatically cycles the computer when the software stops pinging it. This is common practice in systems that run unattended.
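The watchdog behavior described above can be modeled in a few lines. This is a toy Python simulation, not rover flight code: time is counted in abstract ticks, `timeout` and the ticks at which the software hangs are made-up parameters, and the real watchdog is a hardware circuit rather than a class.

```python
class Watchdog:
    """Toy model of a hardware watchdog timer counted in abstract ticks."""

    def __init__(self, timeout: int):
        self.timeout = timeout
        self.last_kick = 0

    def kick(self, now: int) -> None:
        # Healthy software "pings" the watchdog to prove it is still running.
        self.last_kick = now

    def expired(self, now: int) -> bool:
        # No ping for longer than the timeout means the software is hung.
        return now - self.last_kick > self.timeout


def simulate(total_ticks: int, timeout: int, hang_ticks: set) -> list:
    """Return the ticks at which the watchdog forces a reset.
    hang_ticks: hypothetical moments when the software freezes."""
    wd = Watchdog(timeout)
    frozen = False
    resets = []
    for t in range(total_ticks):
        if t in hang_ticks:
            frozen = True        # software stops pinging the watchdog
        if not frozen:
            wd.kick(t)
        if wd.expired(t):
            resets.append(t)     # watchdog cycles the computer
            wd.kick(t)           # the reset restarts the timer
            frozen = False       # after the reboot the software runs again
    return resets


print(simulate(total_ticks=30, timeout=5, hang_ticks={10, 20}))  # [15, 25]
```

The model only knows that the pings stopped, not why, which fits the observation that the software is not recording the cause of the resets.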

Here is what we know.
1) It is a recent development.
2) It is recurring.
3) The software is not recording the problem.
4) They have not been able to correlate it with any action being performed.

Computers run on zeros and ones, but the circuitry is analog. When a chip becomes marginal, it can randomly change the state of something it shouldn't. This kind of behavior is common in computers that have been subjected to radiation or thermal cycling. Radiation degrades electronics, and parts often don't fail outright when exposed to it. Thermal cycling damages solder joints, which can then open and close rapidly; this is a common failure mode in systems that are thermally cycled.
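To make "randomly change the state of something" concrete: a single flipped bit in a word can be harmless or catastrophic depending on which bit it hits. A minimal Python sketch; the 32-bit width and the value 1000 are arbitrary choices for illustration, not anything known about the rover's registers.

```python
def flip_bit(word: int, bit: int, width: int = 32) -> int:
    """Return `word` with one bit inverted, mimicking a single upset
    from a marginal gate or a radiation hit (illustrative only)."""
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in [0, {width})")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


value = 1000
print(flip_bit(value, 0))   # 1001: low-order upset, small error
print(flip_bit(value, 31))  # 2147484648: high-order upset, huge error
print(flip_bit(flip_bit(value, 31), 31) == value)  # True: same bit flipped back
```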

My prediction is that the resets will continue and become more frequent. And oh how I hope I am wrong!
mike
post Oct 19 2005, 03:50 AM
Post #39


Member
***

Group: Members
Posts: 350
Joined: 20-June 04
From: Portland, Oregon, U.S.A.
Member No.: 86



In my experience, an electronic device with any sort of damage will fail regularly and fail spectacularly. However, I agree that there's no way to say for sure - especially since we don't have much information as far as the exact nature of the failures - and what you say about an intermittently flawed device makes sense, particularly given the harsh environs of space.

I imagine that a fatal bug in the rover software would result in a core dump of the running program. If there were no core dumps, that to me would be a sign of a (bad) hardware problem. If there were core dumps, but utterly useless ones (no or nonsensical stack trace), that would be a sign of a possible hardware problem, though not necessarily.
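mike's rule of thumb can be written as a small decision function. This Python sketch only encodes the heuristic in the post; `classify_reset` and its inputs are hypothetical, and the last branch (a coherent dump points at software) is the natural complement of what mike says rather than something he states outright.

```python
from enum import Enum
from typing import Optional


class Suspect(Enum):
    HARDWARE_LIKELY = "no dump: suspect hardware"
    HARDWARE_POSSIBLE = "garbled dump: possibly hardware, not necessarily"
    SOFTWARE_LIKELY = "coherent dump: suspect software"


def classify_reset(dump_written: bool,
                   stack_trace_coherent: Optional[bool] = None) -> Suspect:
    """Apply the core-dump heuristic to one reset."""
    if not dump_written:
        # Nothing captured: presumably the machine died before software could react.
        return Suspect.HARDWARE_LIKELY
    if not stack_trace_coherent:
        # A dump exists, but the stack trace is missing or nonsensical.
        return Suspect.HARDWARE_POSSIBLE
    return Suspect.SOFTWARE_LIKELY


print(classify_reset(dump_written=False).value)
print(classify_reset(dump_written=True, stack_trace_coherent=True).value)
```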

Sadly, however, I cannot say for certain it's not a hardware problem. I hope it isn't. :)
Bill Harris
post Nov 1 2005, 10:56 AM
Post #40


Senior Member
****

Group: Members
Posts: 2998
Joined: 30-October 04
Member No.: 105



From the latest Mission Status at the NASA/JPL site:

http://marsrovers.jpl.nasa.gov/mission/sta...tml#opportunity
QUOTE
Sol 622: Untargeted observations included a panorama to examine the amount of light reflected from the surface and a ground survey. A software glitch resulted in losing the afternoon communication relay session with Mars Odyssey. The problem was a repeat of one experienced previously on Spirit's sols 131 and 209 and on Opportunity's sol 596. It occurs when a "write" command reaches an area of memory during a vulnerability period of a few microseconds when that memory location cannot accept a new write command. The rover team is investigating the problem.
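The failure mode in the sol 622 report (a write arriving during a few-microsecond window when the memory cannot accept it) can be illustrated with a toy model. A minimal Python sketch, assuming simulated microsecond timestamps and a made-up `busy_us` window; the report doesn't say how the real memory behaves in that window or how JPL fixed it, and "wait until ready" is simply one generic way to avoid such a race.

```python
class WriteWindowMemory:
    """Toy memory: after each accepted write it stays busy for `busy_us`
    microseconds, and a write arriving in that window is dropped.
    Hypothetical model; not the rovers' actual memory or timing."""

    def __init__(self, busy_us: int):
        self.busy_us = busy_us
        self.busy_until = 0
        self.data = {}
        self.dropped = 0

    def ready(self, now_us: int) -> bool:
        return now_us >= self.busy_until

    def write(self, now_us: int, addr: int, value: int) -> bool:
        if not self.ready(now_us):
            self.dropped += 1      # the write lands in the vulnerability window
            return False
        self.data[addr] = value
        self.busy_until = now_us + self.busy_us
        return True


def guarded_write(mem: WriteWindowMemory, now_us: int, addr: int, value: int) -> int:
    """Wait (in simulated time) until the memory is ready, then write.
    Returns the time at which the write was accepted."""
    while not mem.ready(now_us):
        now_us += 1
    mem.write(now_us, addr, value)
    return now_us


naive = WriteWindowMemory(busy_us=3)
naive.write(0, 0x10, 1)
naive.write(2, 0x14, 2)            # arrives 2 us later: dropped
print(naive.dropped)               # 1

safe = WriteWindowMemory(busy_us=3)
guarded_write(safe, 0, 0x10, 1)
print(guarded_write(safe, 2, 0x14, 2))  # 3: waited out the window
print(safe.dropped)                     # 0
```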


And a battery "glitch":

QUOTE
Sol 623: This was a recovery sol. Opportunity returned data directly to Earth during an X-band communication window after calibration of the high-gain antenna. It also performed a calibration of the panoramic camera mast assembly (the rover's "head") to regain use of it and to stow the camera. One of the rover's two batteries would not recharge, which at first puzzled the team. A switch that allows battery 1 to recharge was not enabled, so the battery was temporarily unable to recharge. On the following morning (sol 624), the switch was enabled and the battery subsequently operated normally. Engineers' analysis indicates that recharging was not enabled on sol 623 because the rover did not use enough electricity from the battery during the previous sol (622) to draw the battery's charge below a level pre-set as a threshold for allowing a recharge.
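The recharge rule described for sol 623 boils down to a threshold check. A minimal Python sketch, assuming integer state-of-charge percentages, a made-up 80% threshold, and a full recharge whenever charging is enabled; none of these figures come from the report.

```python
def recharge_plan(daily_draw, start_pct=100, threshold_pct=80):
    """For each sol, return whether recharging is enabled afterwards.
    Recharge is enabled only if the sol's usage pulled the state of
    charge below the threshold; otherwise the battery is left as is.
    All numbers are hypothetical percentages."""
    soc = start_pct
    plan = []
    for draw in daily_draw:
        soc = max(0, soc - draw)
        enabled = soc < threshold_pct
        plan.append(enabled)
        if enabled:
            soc = 100  # assume the enabled recharge tops the battery up
    return plan


# A light sol (draw of 10) leaves the battery above 80%, so no recharge follows it.
print(recharge_plan([30, 10, 30]))  # [True, False, True]
```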


This explains why Oppy's been erratic for the past few Sols...

--Bill


Guest_Sunspot_*
post Nov 1 2005, 11:06 AM
Post #41





Guests






Well... there doesn't appear to be any data from Sol 629 yet. Maybe another software glitch, or that dust storm!! :-/
paxdan
post Nov 1 2005, 11:14 AM
Post #42


Member
***

Group: Members
Posts: 562
Joined: 29-March 05
Member No.: 221



re: glitches

wrt hardware problems: it's not rocket science... err wait yes it is
wrt software problems: it's not brain surgery.. umm hang on..

ah ha ha new aphorism regarding problems with the rovers "it's not rocket surgery"
Tesheiner
post Nov 1 2005, 01:26 PM
Post #43


Senior Member
****

Group: Moderator
Posts: 4279
Joined: 19-April 05
From: .br at .es
Member No.: 253



Based on the currently planned seqs for sols 628-630 (see below), and given that some pancams previously planned for sol 628 have just disappeared, I would say that Oppy had another hiccup.

---

628 p0767.03 14 0 0 14 0 28 navcam_7x1_az_288_3_bpp
628 p1663.01 6 0 0 6 0 12 navcam_3x1_az_108_1_bpp
630 p0767.03 14 0 0 14 0 28 navcam_7x1_az_288_3_bpp
630 p1663.01 6 0 0 6 0 12 navcam_3x1_az_108_1_bpp

----

.. or it may be an energy (read dust storm) issue.
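Tesheiner's check (which instruments are still scheduled for sol 628) can be done mechanically on the listed lines. A minimal Python sketch; the column layout is an assumption: the first field is taken to be the sol, the second the sequence ID, and the last the sequence name whose prefix names the camera. The middle numeric columns aren't explained in the post, so the sketch ignores them.

```python
from collections import defaultdict

# The four sequence lines quoted above, verbatim.
PLAN_TEXT = """\
628 p0767.03 14 0 0 14 0 28 navcam_7x1_az_288_3_bpp
628 p1663.01 6 0 0 6 0 12 navcam_3x1_az_108_1_bpp
630 p0767.03 14 0 0 14 0 28 navcam_7x1_az_288_3_bpp
630 p1663.01 6 0 0 6 0 12 navcam_3x1_az_108_1_bpp
"""


def sequences_by_sol(text: str) -> dict:
    """Group (sequence id, sequence name) pairs by sol.
    Assumes field 0 is the sol, field 1 the sequence id and the last
    field the sequence name; the numeric middle columns are ignored."""
    plan = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        plan[int(fields[0])].append((fields[1], fields[-1]))
    return dict(plan)


def instruments(plan: dict, sol: int) -> set:
    """Instrument prefix of each sequence name, e.g. 'navcam'."""
    return {name.split("_")[0] for _, name in plan.get(sol, [])}


plan = sequences_by_sol(PLAN_TEXT)
print(sorted(plan))                          # [628, 630]
print(sorted(instruments(plan, 628)))        # ['navcam']
print("pancam" in instruments(plan, 628))    # False
```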
Cugel
post Nov 1 2005, 03:11 PM
Post #44


Member
***

Group: Members
Posts: 153
Joined: 11-December 04
Member No.: 120



QUOTE (paxdan @ Nov 1 2005, 11:14 AM)
re: glitches

wrt hardware problems: it's not rocket science... err wait yes it is
wrt software problems: it's not brain surgery.. umm hang on..

ah ha ha new aphorism regarding problems with the rovers "it's not rocket surgery"


I wouldn't call something that travels at 5 mm per second a rocket...
mike
post Nov 1 2005, 08:18 PM
Post #45


Member
***

Group: Members
Posts: 350
Joined: 20-June 04
From: Portland, Oregon, U.S.A.
Member No.: 86



Hey, it is a software problem and not a hardware problem. I now have further evidence for my 'any real hardware problem causes an utter failure' theory, regardless of said hardware being in the presence of several intergalactic cosmic particles. Of course, if said intergalactic cosmic particle(s) somehow hit the gate(s) in the EPROM that control(s) the delay before writing to memory (and didn't break anything else), then I stand corrected.


 



RULES AND GUIDELINES
Please read the Forum Rules and Guidelines before posting.

IMAGE COPYRIGHT
Images posted on UnmannedSpaceflight.com may be copyrighted. Do not reproduce without permission. Read here for further information on space images and copyright.

OPINIONS AND MODERATION
Opinions expressed on UnmannedSpaceflight.com are those of the individual posters and do not necessarily reflect the opinions of UnmannedSpaceflight.com or The Planetary Society. The all-volunteer UnmannedSpaceflight.com moderation team is wholly independent of The Planetary Society. The Planetary Society has no influence over decisions made by the UnmannedSpaceflight.com moderators.
SUPPORT THE FORUM
Unmannedspaceflight.com is funded by the Planetary Society. Please consider supporting our work and many other projects by donating to the Society or becoming a member.