Unmanned Spaceflight.com _ Forum Maintenance _ Possible intermittent downtime

Posted by: helvick Jul 31 2012, 10:54 PM

All,

We have to implement an IP-address change on the server over the next 12 hours or so. The board may be unreachable briefly in an hour or so, and there may be some DNS issues over the next 24 hours. There should not be any outages of more than 5 minutes at any point, and I'm hopeful that there won't be any noticeable downtime at all. As an advance step I've set the DNS TTL to 15 minutes, and I will reduce it to 5 minutes for the hour or so around the change. This _should_ propagate out and make the switch fairly painless.
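The TTL staging described above comes down to simple arithmetic. Below is a minimal Python sketch, assuming that a resolver which fetched the record just before a TTL change keeps it for the full TTL it was given. The function names and the one-day "previous TTL" in the usage lines are hypothetical illustrations, not values from the post.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, previous_ttl_s: int) -> datetime:
    """Earliest time the IP can change without any cache still holding a
    record issued under the old, longer TTL.

    A resolver that fetched the record one moment before the TTL was lowered
    may keep serving it for up to previous_ttl_s seconds.
    """
    return ttl_lowered_at + timedelta(seconds=previous_ttl_s)


def worst_case_stale_window(ttl_at_cutover_s: int) -> timedelta:
    """Once every cache holds the short TTL, a resolver that fetched the old
    address just before the switch can keep serving it for at most one TTL."""
    return timedelta(seconds=ttl_at_cutover_s)


if __name__ == "__main__":
    # Hypothetical timeline: TTL cut to 15 minutes at 22:00 UTC, with the
    # previous TTL assumed to be one day (86400 s); a 5-minute TTL is in
    # force at the moment of the switch.
    lowered = datetime(2012, 7, 31, 22, 0)
    print("switch no earlier than", earliest_safe_cutover(lowered, 86_400))
    print("stale answers for at most", worst_case_stale_window(5 * 60))
```

The same function covers the second step of the plan: dropping from 15 minutes to 5 minutes means waiting `earliest_safe_cutover(t, 900)` before switching. The sketch assumes resolvers honour the TTL. Some clamp very short TTLs upward, so real-world staleness can be longer.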

I'll post an update once the change has been made, or let you know that we've postponed the change if there are any issues that crop up and prevent us switching over.

JoeM

Posted by: imipak Jul 31 2012, 11:50 PM

CODE
;; ANSWER SECTION:
www.unmannedspaceflight.com. 14126 IN CNAME unmannedspaceflight.com.
unmannedspaceflight.com. 32 IN A 67.201.14.106
...as of 2012-07-31 23:49:25 UTC.

Mumble mumble change freeze? mumble sure everything will be fine mumble mumble.

Posted by: helvick Aug 1 2012, 05:38 AM

QUOTE (imipak @ Aug 1 2012, 12:50 AM)
Mumble mumble change freeze? mumble sure everything will be fine mumble mumble.

If I had my way there would be no changes right now, but that's not an option unfortunately.

On the plus side, a lot of care is being taken: our hosting provider is double-checking things as I type. And while I can't be certain from a single example, the TTL reported by your dig query (32 seconds remaining on the A record) indicates that the preparatory TTL changes are working as they should.
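The check described here, reading the remaining TTL off the dig output, can be sketched in Python. This is a minimal sketch that assumes dig's default answer-line layout (name, TTL, class, type, data) and takes the 15-minute (900 s) TTL from the first post as the ceiling. The function and type names are hypothetical. Because a caching resolver reports the TTL counting down, a single value below the ceiling is consistent with the change but, as noted above, does not prove it.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int  # seconds remaining in the answering resolver's cache
    rclass: str
    rtype: str
    data: str


def parse_answer(text: str) -> list[Record]:
    """Parse answer lines in dig's default 'name ttl class type data' layout."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, dig comments (';;') and anything too short to be a record.
        if len(fields) < 5 or fields[0].startswith(";"):
            continue
        name, ttl, rclass, rtype = fields[:4]
        records.append(Record(name, int(ttl), rclass, rtype, " ".join(fields[4:])))
    return records


def a_records_within(records: list[Record], ceiling_s: int) -> bool:
    """True if there is at least one A record and every A record's
    remaining TTL is at or below ceiling_s."""
    a_records = [r for r in records if r.rtype == "A"]
    return bool(a_records) and all(r.ttl <= ceiling_s for r in a_records)


# The answer section quoted in the thread.
ANSWER = """\
;; ANSWER SECTION:
www.unmannedspaceflight.com. 14126 IN CNAME unmannedspaceflight.com.
unmannedspaceflight.com. 32 IN A 67.201.14.106
"""

if __name__ == "__main__":
    recs = parse_answer(ANSWER)
    print(a_records_within(recs, ceiling_s=15 * 60))
```

Only the A record is checked. The CNAME's much longer TTL does not matter for an IP change, because it merely points `www` at the bare name, and the address itself lives in the A record.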



Posted by: helvick Aug 2 2012, 01:40 AM

The IP-address change is still a work in progress. The new address has been assigned and is working, but we saw some unexpected behaviour when switching over one of the backroom sites, and we're tracking down the root cause before we move the serious stuff. On the plus side, the changeover was virtually transparent on two other test sites, so I'm confident that once we do cut over, the downtime will be no more than 5 minutes and may be unnoticeable.






Posted by: helvick Aug 2 2012, 06:47 PM

We went offline for about five hours between ~14:00 BST and 19:30 BST because of some communication challenges between me and our hosting provider.

Apologies to anyone who was impacted; this was entirely my fault. The changeover has now been completed and there should be no further outages.

I am assured by our hosting provider that we are now on a much better network link, so if we get a flood of new visitors sparked by Curiosity, we are now much better equipped to handle it.


Posted by: imipak Aug 2 2012, 06:56 PM

My dear helvick, the happiness from finding UMSF is back far outweighs any minor inconvenience from a couple of hours' downtime!

Many thanks to you and everyone who pulled this off. A minor timing glitch like that is small potatoes compared to some of the nightmares I've witnessed on systems serving rather more people than UMSF. That sort of operation is never trivial, especially when circumstances are forcing your hand and the clock's ticking. (Not that I know of any large web filtering service providers that had to make pre-Olympics capacity upgrades across a dozen data centres worldwide, at the same time as moving a bunch of critical centralised (non-redundant) systems 30 miles in the back of a truck without downtime, and finally completed the job last Thursday evening, or anything... :D)

Posted by: RoverDriver Aug 2 2012, 07:00 PM

QUOTE (helvick @ Aug 2 2012, 11:47 AM)
We went offline for about five hours between ~14:00 BST and 19:30 BST because of some communication challenges between me and our hosting provider.

Apologies to anyone who was impacted; this was entirely my fault. The changeover has now been completed and there should be no further outages.

I am assured by our hosting provider that we are now on a much better network link, so if we get a flood of new visitors sparked by Curiosity, we are now much better equipped to handle it.


Translated into Mars language: your reaction wheel had a glitch and went into safe mode, so no comm windows were honored?

Paolo

Posted by: helvick Aug 2 2012, 07:17 PM

Paolo - perfect analogy. :)

Technically there was no downtime on the _board_: she was chugging away, wondering where all her friends had gone. But we had a major fault on the comms end that needed a good solid kick to get it all lined up again.

In-band remote systems admin of this type is painful. I don't know how you rover-driver lunatics stay sane.





Posted by: RoverDriver Aug 2 2012, 10:32 PM

QUOTE (helvick @ Aug 2 2012, 12:17 PM)
Paolo - perfect analogy. :)

Technically there was no downtime on the _board_: she was chugging away, wondering where all her friends had gone. But we had a major fault on the comms end that needed a good solid kick to get it all lined up again.


It looked to me as if there were no DNS entries for UMSF on ANY DNS server. I even queried a few whois servers and you were GONE.

QUOTE
In-band remote systems admin of this type is painful. I don't know how you rover-driver lunatics stay sane.


Who told you we are all sane? :D For us the trick is: never, ever send a command you cannot recover from. For a surface mission it is much better to stop and wait for Earth to analyze telemetry than to try to do more stuff in one planning cycle. There is less time pressure than on an orbiter, or in your case a live system with users in panic mode trying to get to your site RIGHT NOW. But the remote thing is just the same (and you cannot google for an answer!).

Paolo

Posted by: Phil Stooke Aug 3 2012, 01:24 AM

Just before downtime: "UMSF? I can stop any time I want to... I'm not addicted"

Five hours later: "AAAAAAArrrrggghhhhhhh!!!!!!! blah blah blah goooo goooooo"

Three seconds after that: "AAAAhhhhh... that's better! But don't ever do that again...."




Phil (checking in from Tofino today)

Posted by: RoverDriver Aug 3 2012, 03:09 AM

QUOTE (Phil Stooke @ Aug 2 2012, 06:24 PM)
Just before downtime: "UMSF? I can stop any time I want to... I'm not addicted"

Five hours later: "AAAAAAArrrrggghhhhhhh!!!!!!! blah blah blah goooo goooooo"

Three seconds after that: "AAAAhhhhh... that's better! But don't ever do that again...."




Phil (checking in from Tofino today)


ditto!

Posted by: climber Aug 3 2012, 07:46 AM

QUOTE (Phil Stooke @ Aug 3 2012, 03:24 AM)
Just before downtime: "UMSF? I can stop any time I want to... I'm not addicted"
Five hours later: "AAAAAAArrrrggghhhhhhh!!!!!!! blah blah blah goooo goooooo"
Three seconds after that: "AAAAhhhhh... that's better! But don't ever do that again...."
Phil (checking in from Tofino today)


I guess old-timers remember this funny topic "You know you're an UNSFer": http://www.unmannedspaceflight.com/index.php?showtopic=5430
I'm sure you could have posted there, Phil.

Posted by: MarsCraft Aug 9 2012, 04:22 AM

Hello, this is my first post, and I'm glad to be able to participate. Maybe another swift kick was in order, or the system has been overworked by the recent MSL chatter, but when I was activating my account after a long delay and going through the "please help me remember my password" procedure (several times), it took two days to get the reply. I followed the instructions, and all appears to be well now.

Posted by: elakdawalla Aug 9 2012, 06:06 AM

Yep, there was a problem that we didn't discover until this morning (when someone else who had tried to register emailed me with the same complaint), but it's fixed now. Welcome!
