Possible intermittent downtime, Network Changes
helvick
post Jul 31 2012, 10:54 PM
Post #1


Dublin Correspondent
****

Group: Admin
Posts: 1799
Joined: 28-March 05
From: Celbridge, Ireland
Member No.: 220



All,

We have to implement an IP address change on the server over the next 12 hours or so. The board may be briefly unreachable in an hour or so, and there may be some DNS issues over the next 24 hours. There should be no outages longer than 5 minutes at any point, and I'm hopeful that there won't be any noticeable downtime at all. As an advance step I've set the DNS TTL to 15 minutes, and I will reduce it to 5 minutes for the hour or so around the change. Because resolvers only cache the record for that long, the new address _should_ propagate out quickly and make the change fairly painless.
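The reason for lowering the TTL well ahead of the change can be worked through with a little arithmetic. This is a minimal sketch, assuming resolvers honour the TTL they were given (some clamp or ignore it); the function names and the example "previous TTL" of 4 hours are hypothetical, not the board's actual configuration. A copy cached just before the TTL was lowered still carries the old TTL, so the switch is only safe once that old TTL has elapsed; after the switch, a well-behaved cache can serve the old address for at most one current TTL.

```python
from datetime import datetime, timedelta


def earliest_safe_switch(ttl_lowered_at: datetime, previous_ttl: timedelta) -> datetime:
    """Earliest time every TTL-honouring cache has picked up the lowered TTL.

    Copies fetched just before the TTL was lowered still carry the previous
    TTL, so they can live until ttl_lowered_at + previous_ttl.
    """
    return ttl_lowered_at + previous_ttl


def stale_until(switched_at: datetime, ttl_in_effect: timedelta) -> datetime:
    """Latest time a TTL-honouring cache may still return the old address.

    A resolver that fetched the old A record an instant before the switch
    may keep serving it for one full TTL.
    """
    return switched_at + ttl_in_effect


if __name__ == "__main__":
    # Hypothetical timeline: TTL cut from an assumed 4 hours to 15 minutes at
    # 22:00, then cut to 5 minutes at least 15 minutes before the switch.
    lowered = datetime(2012, 7, 31, 22, 0)
    print("switch no earlier than:", earliest_safe_switch(lowered, timedelta(hours=4)))
    switch = datetime(2012, 8, 1, 10, 0)
    print("old address may linger until:", stale_until(switch, timedelta(minutes=5)))
```

With these made-up numbers the switch would be safe from 02:00, and after a 10:00 switch any stragglers would clear by 10:05, in line with the "no more than 5 minutes" target for caches that behave.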

I'll post an update once the change has been made, or let you know that we've postponed the change if there are any issues that crop up and prevent us switching over.

JoeM
imipak
post Jul 31 2012, 11:50 PM
Post #2


Member
***

Group: Members
Posts: 644
Joined: 23-December 05
From: Forest of Dean
Member No.: 617



CODE
;; ANSWER SECTION:
www.unmannedspaceflight.com. 14126 IN CNAME unmannedspaceflight.com.
unmannedspaceflight.com. 32 IN A 67.201.14.106
...as of 2012-07-31 23:49:25 UTC.

Mumble mumble change freeze? mumble sure everything will be fine mumble mumble.


--------------------
--
Viva software libre!
helvick
post Aug 1 2012, 05:38 AM
Post #3


Dublin Correspondent
****

Group: Admin
Posts: 1799
Joined: 28-March 05
From: Celbridge, Ireland
Member No.: 220



QUOTE (imipak @ Aug 1 2012, 12:50 AM)
Mumble mumble change freeze? mumble sure everything will be fine mumble mumble.

If I had my way there would be no changes right now, but that's not an option unfortunately.

On the plus side, a lot of care is being taken: our hosting provider is double-checking things as I type. While I can't be certain from a single example, the TTL reported by your dig query indicates that the preparatory TTL changes are working as they should.
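For anyone wanting to repeat imipak's check, here is a minimal sketch that parses the ANSWER SECTION of `dig` output (the format quoted above) and tests whether every A record's TTL is at or below the new maximum. The helper names are hypothetical. A caching resolver reports the *remaining* TTL, so a low number is consistent with the change having worked, but as noted above, one sample is not proof. The CNAME's large TTL (14126 seconds) doesn't matter here: it only points at the bare domain name, which isn't changing, and the address itself lives in the A record.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int      # remaining TTL in seconds, as reported by the resolver
    rclass: str   # usually "IN"
    rtype: str    # e.g. "A", "CNAME"
    data: str


def parse_answer_section(text: str) -> List[Record]:
    """Parse resource-record lines from dig output, skipping comments and blanks."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = line.split(None, 4)  # name, ttl, class, type, rdata
        if len(fields) != 5 or not fields[1].isdigit():
            continue  # not a resource-record line (e.g. trailing commentary)
        name, ttl, rclass, rtype, data = fields
        records.append(Record(name, int(ttl), rclass, rtype, data))
    return records


def a_records_within(records: List[Record], max_ttl: int) -> bool:
    """True if there is at least one A record and all of them have TTL <= max_ttl."""
    a_records = [r for r in records if r.rtype == "A"]
    return bool(a_records) and all(r.ttl <= max_ttl for r in a_records)


if __name__ == "__main__":
    sample = """;; ANSWER SECTION:
www.unmannedspaceflight.com. 14126 IN CNAME unmannedspaceflight.com.
unmannedspaceflight.com. 32 IN A 67.201.14.106"""
    recs = parse_answer_section(sample)
    # 900 seconds = the 15-minute TTL mentioned in the first post.
    print(a_records_within(recs, 900))
```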


helvick
post Aug 2 2012, 01:40 AM
Post #4


Dublin Correspondent
****

Group: Admin
Posts: 1799
Joined: 28-March 05
From: Celbridge, Ireland
Member No.: 220



The IP address change is still a work in progress. The new address has been assigned and is working, but we saw some unexpected behaviour when switching over one of the backroom sites, and we're looking for the root cause before we move the serious stuff. On the plus side, the changeover was virtually transparent on two other test sites, so I'm confident that once we do cut over, the downtime will be no more than 5 minutes and may be unnoticeable.





helvick
post Aug 2 2012, 06:47 PM
Post #5


Dublin Correspondent
****

Group: Admin
Posts: 1799
Joined: 28-March 05
From: Celbridge, Ireland
Member No.: 220



We went offline for about five hours between ~14:00 BST and 19:30 BST because of some communication challenges between me and our hosting provider.

Apologies to anyone who was affected; this was entirely my fault. The changeover has now been completed and there should be no further outages.

I am assured by our hosting provider that we are now on a much better network link, so if we get a flood of new visitors sparked by Curiosity, we are much better equipped to handle it.

imipak
post Aug 2 2012, 06:56 PM
Post #6


Member
***

Group: Members
Posts: 644
Joined: 23-December 05
From: Forest of Dean
Member No.: 617



My dear helvick, the happiness of finding UMSF back far outweighs any minor inconvenience from a couple of hours' downtime!

Many thanks to you and everyone who pulled this off. A minor timing glitch like that is small potatoes compared to some of the nightmares I've witnessed on systems serving rather more people than UMSF. That sort of operation is never trivial, especially when circumstances are forcing your hand and the clock's ticking. (Not that I know of any large web filtering service providers that had to make pre-Olympics capacity upgrades across a dozen data centres worldwide, at the same time as moving a bunch of critical centralised (non-redundant) systems 30 miles in the back of a truck without downtime, and finally completed the job last Thursday evening, or anything... :D )


--------------------
--
Viva software libre!
RoverDriver
post Aug 2 2012, 07:00 PM
Post #7


Member
***

Group: Members
Posts: 845
Joined: 29-September 06
From: Pasadena, CA - USA
Member No.: 1200



QUOTE (helvick @ Aug 2 2012, 11:47 AM)
We went offline for about five hours between ~14:00 BST and 19:30 BST because of some communication challenges between me and our hosting provider.

Apologies to anyone who was affected; this was entirely my fault. The changeover has now been completed and there should be no further outages.

I am assured by our hosting provider that we are now on a much better network link, so if we get a flood of new visitors sparked by Curiosity, we are much better equipped to handle it.


Translated into Mars language: your reaction wheel had a glitch and went into safe mode, so no comm windows were honored?

Paolo


--------------------
Disclaimer: all opinions, ideas and information included here are my own, and should not be taken to represent the opinion or policy of my employer.
helvick
post Aug 2 2012, 07:17 PM
Post #8


Dublin Correspondent
****

Group: Admin
Posts: 1799
Joined: 28-March 05
From: Celbridge, Ireland
Member No.: 220



Paolo - perfect analogy. :)

Technically there was no downtime on the _board_: she was chugging away, wondering where all her friends had gone. But we had a major fault on the comms end that needed a good solid kick to get it all lined up again.

In-band remote systems admin of this type is painful - I don't know how you rover-driver lunatics stay sane.




RoverDriver
post Aug 2 2012, 10:32 PM
Post #9


Member
***

Group: Members
Posts: 845
Joined: 29-September 06
From: Pasadena, CA - USA
Member No.: 1200



QUOTE (helvick @ Aug 2 2012, 12:17 PM)
Paolo - perfect analogy. :)

Technically there was no downtime on the _board_: she was chugging away, wondering where all her friends had gone. But we had a major fault on the comms end that needed a good solid kick to get it all lined up again.


It looked to me as if there were no DNS entries for UMSF on ANY DNS server. I even queried a few whois servers and you were GONE.

QUOTE
In-band remote systems admin of this type is painful - I don't know how you rover-driver lunatics stay sane.


Who told you we are all sane? :D For us the trick is to never, ever send a command you cannot recover from. For a surface mission it is much better to stop and wait for Earth to analyze telemetry than to try to do more in one planning cycle. There is less time pressure than on an orbiter, or in your case a live system with users in panic mode trying to get to your site RIGHT NOW. But the remote aspect is just the same (and you cannot google for an answer!).

Paolo


--------------------
Disclaimer: all opinions, ideas and information included here are my own, and should not be taken to represent the opinion or policy of my employer.
Phil Stooke
post Aug 3 2012, 01:24 AM
Post #10


Senior Member
****

Group: Members
Posts: 6900
Joined: 5-April 05
From: Canada
Member No.: 227



Just before downtime: "UMSF? I can stop any time I want to... I'm not addicted"

Five hours later: "AAAAAAArrrrggghhhhhhh!!!!!!! blah blah blah goooo goooooo"

Three seconds after that: "AAAAhhhhh... that's better! But don't ever do that again...."




Phil (checking in from Tofino today)


--------------------
... because the Solar System ain't gonna map itself.
RoverDriver
post Aug 3 2012, 03:09 AM
Post #11


Member
***

Group: Members
Posts: 845
Joined: 29-September 06
From: Pasadena, CA - USA
Member No.: 1200



QUOTE (Phil Stooke @ Aug 2 2012, 06:24 PM)
Just before downtime: "UMSF? I can stop any time I want to... I'm not addicted"

Five hours later: "AAAAAAArrrrggghhhhhhh!!!!!!! blah blah blah goooo goooooo"

Three seconds after that: "AAAAhhhhh... that's better! But don't ever do that again...."




Phil (checking in from Tofino today)


ditto!


--------------------
Disclaimer: all opinions, ideas and information included here are my own, and should not be taken to represent the opinion or policy of my employer.
climber
post Aug 3 2012, 07:46 AM
Post #12


Senior Member
****

Group: Members
Posts: 2763
Joined: 14-February 06
From: Very close to the Pyrénées Mountains (France)
Member No.: 682



QUOTE (Phil Stooke @ Aug 3 2012, 03:24 AM)
Just before downtime: "UMSF? I can stop any time I want to... I'm not addicted"
Five hours later: "AAAAAAArrrrggghhhhhhh!!!!!!! blah blah blah goooo goooooo"
Three seconds after that: "AAAAhhhhh... that's better! But don't ever do that again...."
Phil (checking in from Tofino today)


I guess old-timers remember this funny topic "You know you're an UNSFer": http://www.unmannedspaceflight.com/index.php?showtopic=5430
I'm sure you could have posted there, Phil.


--------------------
MarsCraft
post Aug 9 2012, 04:22 AM
Post #13


Newbie
*

Group: Members
Posts: 6
Joined: 11-March 10
Member No.: 5260



Hello, this is my first post and I'm glad to be able to participate. Maybe another swift kick was in order, or the system has been overworked by the recent MSL chatter, but when I was activating my account after a long delay and going through the "please help me remember my password" procedure (several times), it took two days to get the reply. I followed the instructions, and all appears to be well now.
elakdawalla
post Aug 9 2012, 06:06 AM
Post #14


Administrator
****

Group: Admin
Posts: 5001
Joined: 4-August 05
From: Pasadena, CA, USA, Earth
Member No.: 454



Yep, there was a problem that we didn't discover until this morning (when someone else who had tried to register emailed me with the same complaint), but it's fixed now. Welcome!


--------------------
My blog - @elakdawalla on Twitter - Please support unmannedspaceflight.com by donating here.


 



Time is now: 17th January 2017 - 07:14 PM
RULES AND GUIDELINES
Please read the Forum Rules and Guidelines before posting.

IMAGE COPYRIGHT
Images posted on UnmannedSpaceflight.com may be copyrighted. Do not reproduce without permission. Read here for further information on space images and copyright.

OPINIONS AND MODERATION
Opinions expressed on UnmannedSpaceflight.com are those of the individual posters and do not necessarily reflect the opinions of UnmannedSpaceflight.com or The Planetary Society. The all-volunteer UnmannedSpaceflight.com moderation team is wholly independent of The Planetary Society. The Planetary Society has no influence over decisions made by the UnmannedSpaceflight.com moderators.
SUPPORT THE FORUM
Unmannedspaceflight.com is a project of the Planetary Society and is funded by donations from visitors and members. Help keep this forum up and running by contributing here.