Second Life Down
The Lindens had some excitement Tuesday. The rollout of new server software started as usual, but some time in the morning, about 8 AM, logins started to fail. The chart of concurrent users from etitsup.com was given to me by Shug Maitland.
You have to correct the time for SLT/PDT, as the chart shows local time for the UTC-4.5 zone (Caracas). PDT is UTC-7, so subtract 2.5 hours from the chart times to get Second Life or Pacific Time. Adding 4.5 hours gives you UTC instead.
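For anyone who wants to check the time zone arithmetic, here is a minimal Python sketch. It assumes the chart really is in a fixed UTC-4:30 zone and that Second Life Time was on PDT (UTC-7) that day. The function name `chart_to_slt` and the sample date and time are made up for illustration and are not read off the actual chart.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets: the chart's zone (UTC-4:30) and PDT (UTC-7).
CHART_TZ = timezone(timedelta(hours=-4, minutes=-30), "UTC-4:30")
PDT = timezone(timedelta(hours=-7), "PDT")


def chart_to_slt(chart_time: datetime) -> datetime:
    """Treat a naive chart timestamp as UTC-4:30 and convert it to PDT."""
    return chart_time.replace(tzinfo=CHART_TZ).astimezone(PDT)


# Hypothetical chart reading; the date is only a placeholder.
sample = datetime(2000, 6, 6, 11, 30)
print(chart_to_slt(sample).strftime("%H:%M %Z"))
```

With these assumptions an 11:30 reading on the chart comes out as 09:00 PDT, which is the 2.5 hour difference between the two offsets.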
In any event, that means things started to go bump at about 9± AM PDT. The Lindens are not talking about the cause of the problem. What they have said we sort of know. People could log in but not connect to a region server. In the early afternoon the problem was corrected and more users could log in.
The login process is handled by the login servers… no surprise. Those servers hand your viewer a token it can use to gain access to the region and backend servers. Something was happening to the tokens. For whatever reason they were not making it to the region servers. Thus no connection was allowed.
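To make the token idea concrete, here is a toy Python sketch of that kind of handoff. This is not how the Lindens actually do it. Every class and method name here (`SessionStore`, `LoginServer`, `RegionServer`, `publish`) is made up, and the shared store is only my assumption about how a token might get from one server to another.

```python
import secrets


class SessionStore:
    """Hypothetical shared place where issued tokens are recorded."""

    def __init__(self):
        self._tokens = {}

    def put(self, token, user):
        self._tokens[token] = user

    def get(self, token):
        return self._tokens.get(token)


class LoginServer:
    """Issues a token to the viewer and (normally) records it in the store."""

    def __init__(self, store, publish=True):
        self.store = store
        self.publish = publish  # False simulates tokens getting lost

    def login(self, user):
        token = secrets.token_hex(16)
        if self.publish:
            self.store.put(token, user)
        return token


class RegionServer:
    """Lets a viewer in only if it recognizes the token presented."""

    def __init__(self, store):
        self.store = store

    def connect(self, token):
        return self.store.get(token) is not None


store = SessionStore()
region = RegionServer(store)

healthy = LoginServer(store)
broken = LoginServer(store, publish=False)

print(region.connect(healthy.login("resident")))  # token known to the region
print(region.connect(broken.login("resident")))   # token never arrived
```

In the broken case the login itself succeeds and the viewer gets a token, but the region server has never heard of it, so the connection is refused. That matches the symptom described above, whatever the real cause was.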
The problem coincided with the server rollouts, but the two were unrelated. The first half, more or less, of the rollout completed. It was then stopped to allow people to work on the problem and to reduce load on the servers when logins were fixed.
It looks from the chart like it took about 6+ hours to find and fix the problem then another 2± hours for things to return to normal.
The Lindens aren’t willing to say exactly what was wrong so my assumption is it has something to do with security issues. That doesn’t mean SL was hacked, nor that it wasn’t, but my supposition is the Lindens feel the information could help hackers, so they aren’t talking.
Maestro Linden did give credit to Simon Linden for being the one to figure out what went wrong. So, YAY Simon and thanks.
All this has resulted in several postponements. Wednesday’s RC rollout was pushed to Thursday and Tuesday’s main channel rollout completed Wednesday.
The server maintenance planned for Thursday is postponed, with a date yet to be announced. That work has something to do with the databases, if I understand correctly. Maestro said we might see a bit better performance after the update. So, I’ll guess that means more than just shutting the server down to clean under it. A new OS for those servers? Some new hardware? We don’t know.
UPDATE: Of course, while I was writing this the Lindens were writing a post: The Recent Unpleasantness. Now we know far more about what happened.