Sol 22 anomaly, File system problem |
Jun 19 2008, 11:17 PM
Post
#16
|
|
|
Newbie ![]() Group: Members Posts: 8 Joined: 7-January 07 Member No.: 1568 |
Google is your friend. http://blogs.windriver.com/deliman/2008/05...ou-watch-i.html confirms that Phoenix uses VxWorks 5.2. As a side note, a good example of what can sometimes make spacecraft software difficult. 5.2 was released around '95, I think? The RAD6000 can go up to at least 5.3.1, but VxWorks is now up to 6.6 or so. Newer boards such as the RAD750, the LEON3, etc. reach into the 6.x range, but you're still usually a few revs (along with the corresponding features and bug fixes) behind. |
|
|
|
Jun 19 2008, 11:34 PM
Post
#17
|
|
![]() Dublin Correspondent ![]() ![]() ![]() ![]() Group: Admin Posts: 1772 Joined: 28-March 05 From: Celbridge, Ireland Member No.: 220 |
45,000 data items is a lot of data. You can generate that very quickly, even on a CPU as slow (in my terrestrial terms) as the one on Phoenix, if something goes very badly wrong. But let's assume for a moment that it is just something that is polled regularly and is responding every time with an interesting data item rather than the expected "nothing to see here, move along" message.
~45k = 1 item every 30 seconds for 15 sols. What happened ~15 sols prior to the event being discovered on Sol 22? Sol 7 was the first dig IIRC; could this be something to do with that? Alternatively, if it was 1 item every second or so, it would have started ~12 hours prior to the anomaly being properly noticed. That could mean that one of the atmosphere experiments carried out on Sol 21 triggered something. I'm reminded of the problem that led to Spirit getting stuck on the side of the Columbia Hills way back around Sol 300 or so: a navigation bug IIRC that had something to do with excessive tilt (or am I badly misremembering things?).
Glad to see that the recovery process has led to lots of science data being returned, nice silver cloud that.
And finally, let me add my name to the list of those whose debugging code has gone haywire: in my case I ended up sending myself about 22K e-mail messages in a few minutes when I failed to realise that the machine I installed the alert on did not support the "sleep" command by default. |
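For anyone who wants to check the back-of-envelope numbers in the post above, here is a minimal sketch. The 45,000 figure and the two polling intervals come from the post; the sol length (24 h 39 m 35.244 s) and all the names in the code are my own additions for illustration.

```python
# Back-of-envelope check of the two polling rates discussed above.
SOL_SECONDS = 88_775.244   # length of one Mars sol in seconds (24 h 39 m 35.244 s)
ITEMS = 45_000             # approximate number of data items reported


def duration_seconds(items: int, interval_s: float) -> float:
    """Time needed to accumulate `items` at one item every `interval_s` seconds."""
    return items * interval_s


# Slow case: one item every 30 seconds, expressed in sols.
slow_sols = duration_seconds(ITEMS, 30) / SOL_SECONDS

# Fast case: one item every second, expressed in hours.
fast_hours = duration_seconds(ITEMS, 1) / 3600

print(f"1 item every 30 s -> {slow_sols:.1f} sols")
print(f"1 item every 1 s  -> {fast_hours:.1f} hours")
```

Both results land close to the "15 sols" and "12 hours" estimates quoted in the post, so the arithmetic there holds up.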
|
|
|
Jun 20 2008, 03:25 AM
Post
#18
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 1061 Joined: 13-September 05 Member No.: 497 |
QUOTE ...you're still usually a few revs (along with the corresponding features and bug fixes) behind.
The core set of VxWorks functionality is so small that I don't know that we're missing that much. Sometimes I'd be happier if they didn't keep "upgrading" things.
-------------------- Disclaimer: This post is based on public information only. Any opinions are my own.
|
|
|
|
Jun 20 2008, 03:53 AM
Post
#19
|
|
|
Newbie ![]() Group: Members Posts: 8 Joined: 7-January 07 Member No.: 1568 |
It's mostly the bug fixes along with the fact that it's difficult to find support (or people that are still using) versions of VxWorks that are that old. Although I have to say I miss Tornado (but, then again, I still use vi so what do I know!).
EDITED: Removed what was upon reflection an unfair comment about DOS file system. |
|
|
|
Jun 20 2008, 03:28 PM
Post
#20
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 1008 Joined: 29-November 05 From: Seattle, WA, USA Member No.: 590 |
QUOTE It's mostly the bug fixes, particularly earlier versions of the DOS file system, along with the fact that it's difficult to find support (or people that are still using) versions of VxWorks that are that old.
Just so no one is confused, when you say "DOS" you don't mean the old Microsoft product (nor the even older IBM product). This is a product of Wind River, a well-known maker of "real-time" operating system software.
Roughly, real-time just means any OS operation has a guaranteed time to completion. Things like space probes need that, but consumer products don't. It's hard to design, hard to build, and hard to program. (A friend of mine using a different real-time product once told me he'd decided it was called "real-time" because "you have a real time getting it to do anything!")
Microsoft never made a real-time operating system -- not even Windows CE, which I helped build. So whatever is wrong on Mars, it wasn't my fault! :-)
Seriously, I doubt this is Wind River's fault either. The accidentally-turned-on-debug-statement theory sounds plausible to me.
--Greg |
|
|
|
Jun 20 2008, 05:07 PM
Post
#21
|
|
![]() Member ![]() ![]() ![]() Group: Members Posts: 375 Joined: 3-August 05 Member No.: 453 |
One model of SUV has a badge on the back that states "real-time 4WD"; that always makes me laugh...
Airbag |
|
|
|
Jun 20 2008, 05:59 PM
Post
#22
|
|
|
Newbie ![]() Group: Members Posts: 8 Joined: 7-January 07 Member No.: 1568 |
QUOTE Seriously, I doubt this is Wind River's fault either. The accidentally-turned-on-debug-statement theory sounds plausible to me. --Greg
Yup, I agree that it wasn't Wind River's fault, nor did I intend to imply that. My point was more for the benefit of readers who may not know that many of these spacecraft run with older operating systems and older hardware. Software has bugs, whether it is written by the application developer or by a vendor; it's a fact of life. Later versions of software tend to correct bugs in previous versions (with the hope of not introducing more bugs--which I've done more than once). |
|
|
|
Jun 20 2008, 08:42 PM
Post
#23
|
|
|
Junior Member ![]() ![]() Group: Members Posts: 73 Joined: 17-May 08 Member No.: 4114 |
QUOTE Just so no one is confused, when you say "DOS" you don't mean the old Microsoft product (nor the even older IBM product). This is a product of Wind River, a well-known maker of "real-time" operating system software.
Drifting into OT computer trivia, the filesystem in question was FAT, which originated with MS/PC DOS and is referred to in VxWorks as the "DOS filesystem". Various flavors of FAT are very popular for all kinds of embedded applications.
A description of the Spirit anomaly can be found here: http://trs-new.jpl.nasa.gov/dspace/bitstre...1/1/04-3354.pdf |
|
|
|
Jun 21 2008, 11:24 PM
Post
#24
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 1008 Joined: 29-November 05 From: Seattle, WA, USA Member No.: 590 |
QUOTE the filesystem in question was FAT, which originated with MS/PC DOS
Interesting -- I wouldn't have guessed that Wind River would reverse-engineer the old FAT filesystem, but it makes a lot of sense when you think about it. That means you could pop their disks into an old floppy-disk drive and read them with plain old MS-DOS. These days, I guess you'd want to use a USB drive or equivalent. Anyone know if that's actually the case?
--Greg (please tell me it doesn't just "text" you the data!) |
|
|
|
Jun 22 2008, 07:57 PM
Post
#25
|
|
![]() Member ![]() ![]() ![]() Group: Members Posts: 643 Joined: 23-December 05 From: Forest of Dean Member No.: 617 |
No need for mad RE skillz; just license it from MS.
-------------------- --
Viva software libre! |
|
|
|
Jun 23 2008, 11:45 AM
Post
#26
|
|
|
Member ![]() ![]() ![]() Group: Members Posts: 162 Joined: 15-August 07 From: Shrewsbury, Shropshire Member No.: 3233 |
QUOTE It's mostly the bug fixes along with the fact that it's difficult to find support (or people that are still using) versions of VxWorks that are that old. Although I have to say I miss Tornado (but, then again, I still use vi so what do I know!).
I talked to someone who used the VxWorks 5.2 flash-based (DOS-compatible) file system 10 years ago in a flight data recorder. They told me that they found a number of problems with that release of the file system and eventually obtained the source code to make some fixes themselves. They fixed the file defragmentation code and sped up the file deletion code.
One thing that they found, and this is the reason for my post, was that the file system slowed down considerably when there were lots of files in any one directory. I can therefore see why storing 45,000 files in one directory might make Phoenix's software run very slowly. I imagine that NASA must currently be deleting files in Phoenix's file system, much as they did in Spirit's flash in order to work around the Spirit sol 18 flash file system anomaly.
I can see an argument for moving on beyond VxWorks 5.2 to take advantage of a more robust flash file system in future NASA missions such as the MSL rover, although, as was suggested, software upgrades can bring their own problems. |
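To illustrate why a single directory with tens of thousands of entries could be slow, here is a toy model. It assumes the directory is an unsorted list that has to be scanned from the start on every lookup, which is how FAT-style directories are usually described; it is not VxWorks code, and `FlatDirectory`, `comparisons_to_create` and the file names are hypothetical.

```python
class FlatDirectory:
    """Toy directory that keeps its entries in one unsorted list (hypothetical model)."""

    def __init__(self):
        self.entries = []      # file names, in creation order
        self.comparisons = 0   # name comparisons performed so far

    def find(self, name):
        """Linear scan; returns the index of `name`, or -1 if it is absent."""
        for index, entry in enumerate(self.entries):
            self.comparisons += 1
            if entry == name:
                return index
        return -1

    def create(self, name):
        """Create a file, scanning first to make sure the name is not taken."""
        if self.find(name) != -1:
            raise FileExistsError(name)
        self.entries.append(name)


def comparisons_to_create(file_count):
    """Total name comparisons needed to create `file_count` distinct files."""
    directory = FlatDirectory()
    for i in range(file_count):
        directory.create(f"pkt{i:05d}.dat")
    return directory.comparisons


for n in (100, 1_000, 2_000):
    print(n, comparisons_to_create(n))
```

In this model, creating n files costs n(n-1)/2 comparisons, so the work grows with the square of the file count: 45,000 files would need roughly a billion comparisons. Whether the real file system behaves exactly this way is an assumption on my part, but it is consistent with the slowdown described above.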
|
|
|
Jun 24 2008, 05:08 PM
Post
#27
|
|
![]() Bloggette par Excellence ![]() ![]() ![]() ![]() Group: Admin Posts: 3982 Joined: 4-August 05 From: Pasadena, CA, USA, Earth Member No.: 454 |
An update from Barry Goldstein that I understand a little bit less than the first update. Discuss!
QUOTE ('Barry Goldstein') It was a problem we'd identified a while ago and we were starting to work a fix for it. It was associated with when we saved when we go to sleep at night, the way we save the packet sequence numbers in the file system and what's supposed to happen is we're supposed to mask off the lower 12 bits, and what happened was we had identified that and had started working a patch to fix this, we knew the symptom, when it happened it would generate duplicate packet sequence numbers. We knew the system could operate that way but we were worried about what would happen, all the permutations. So what happened on sol 22, we actually had one of those issues occur where we basically generated duplicate sequence numbers. It just so happened that morning when we uploaded the sequence for that morning we included those same packet deletes, we do that every morning. And we deleted just enough packets such that because of the other problem we ended up having the file system configured where there were two consecutive packets with the same ID. If we hadn't sent up that exact number of packet deletes this wouldn't have happened. When we did that, we had an unintended consequence. It normally shouldn't happen, if we had corrected the masking issue it would not have happened, but when we ended up with two packets with the same sequence number, our team went to work looking at it, we found a bug in the code that generates packets that if that happens, you end up getting into an infinite loop generating the same packet ID. So as you recall we generated over 45,000 packets with the same sequence number, so because of the first bug we generated a condition where the second bug was exposed. So the bottom line is, yesterday we completed the patch for the first problem and we uplinked (I believe) the patch to the system to get rid of the first bug. 
And we're going to have a discussion today to see if we're ready now to release the use of the flash back to the science team, because we've now eliminated the source of the problem. The consequence is still there until we finish the other patch, but it shouldn't happen now, so we'll have a discussion and make a decision on whether we want to release that or wait another couple of sols until we get the second patch uploaded. --Emily -------------------- |
|
|
|
Jun 24 2008, 06:29 PM
Post
#28
|
|
|
Member ![]() ![]() ![]() Group: Members Posts: 162 Joined: 15-August 07 From: Shrewsbury, Shropshire Member No.: 3233 |
QUOTE An update from Barry Goldstein that I understand a little bit less than the first update. Discuss! --Emily
Masking the lowest 12 bits of the packet sequence number would cause everything except the lowest 12 bits of the packet sequence number to be thrown away. The packet sequence numbers 0 and 4096 would both produce a new packet sequence number of 0. This is because decimal 4096 is a binary one followed by twelve binary zeros, and the masking operation throws that one away.
It might reasonably take 22 sols for Phoenix to transmit its first 4096 packets. Following the "masking" operation, Phoenix would then give packet 4096 a packet sequence number of 0, which would be the first duplicated packet sequence number.
I find fragments of information about space software problems both interesting and frustrating. It said on Twitter that Phoenix's software is not Open Source. From my point of view I would like lander software to be Open Source. I am sure that there would be benefits to both NASA and ESA if Mars rover software development was turned into an Open Source project. I think that EDL software might be the only software that needs to be classified. |
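The masking argument above can be sketched in a few lines. This follows the reading given in this post, that only the lowest 12 bits are kept; "mask off" could also be read as clearing those bits, and the flight code is not public, so the function and constant names here are purely illustrative.

```python
MASK_12_BITS = 0xFFF  # binary: twelve ones, decimal 4095


def keep_low_12_bits(sequence_number):
    """Keep only the lowest 12 bits, discarding everything above them."""
    return sequence_number & MASK_12_BITS


# Show the raw number, its binary form, and the masked result.
for n in (0, 1, 4095, 4096, 4097):
    print(n, format(n, "013b"), "->", keep_low_12_bits(n))

# Find the first pair of sequence numbers that collide after masking.
seen = {}
first_duplicate = None
for n in range(5000):
    reduced = keep_low_12_bits(n)
    if reduced in seen:
        first_duplicate = (seen[reduced], n)
        break
    seen[reduced] = n

print("first duplicate pair:", first_duplicate)
```

Under this reading, the first collision is between packets 0 and 4096, which is the duplicate described in the post.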
|
|
|
Jun 24 2008, 06:40 PM
Post
#29
|
|
|
Administrator ![]() ![]() ![]() ![]() Group: Chairman Posts: 13272 Joined: 8-February 04 Member No.: 1 |
QUOTE From my point of view I would like lander software to be Open Source.
http://phoenix.lpl.arizona.edu/blogsPost.php?bID=51
http://phoenix.lpl.arizona.edu/blogsPost.php?bID=42
Specifically: "Also, the MET team is not allowed access to commands that interface directly with the lander. This means they actually don't have access to the MET_ON and MET_OFF commands! Because those are the ones that interact directly with the lander!"
Not only is the lander software a highly lucrative commercial product of Wind River, much of the lander software falls under ITAR, to an obstructive degree.
Doug |
|
|
|
Jun 25 2008, 01:51 AM
Post
#30
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 1061 Joined: 13-September 05 Member No.: 497 |
QUOTE Not only is the lander software a highly lucrative commercial product of Wind River...
Strictly speaking, it isn't. The operating system is VxWorks. The stuff that runs the mission is essentially an application that runs on top of VxWorks and is not encumbered by Wind River (as far as I know -- IANAL.) If somebody wants to write an open-source version of VxWorks, that'd be swell, and not all that hard, since it's a very simple system that basically only provides basic interrupt handling, a task model and intertask communication primitives. But even if that happened, I wouldn't count on the spacecraft-specific code being made available.
I don't think we need to get into a debate about the virtues of open source versus the alternatives on this forum.
AFAIK, MSL isn't using the DOS filesystem. For the cameras, I wrote my own filesystem (the cameras don't use an OS, the software runs on the bare metal.)
-------------------- Disclaimer: This post is based on public information only. Any opinions are my own.
|
|
|
|