This week has been a bad one for region restarts. There was a problem with the update rolled out to the main grid, and the update had to be rolled back later in the day. As a result, most of the release channels did not get their planned upgrades. So, what happened?
Since this release had been through QA and came from a release channel, how did it get past the testing and yet be so bad it had to be rolled back?
Oskar and Steven Linden explained what happened at the recent Beta Server User Group, and Maestro did the same in the Deploys thread in the forum. It seems a region would crash and begin its restart, which is what it is supposed to do… well… not the crash part. The new problem triggered the simulator to restart all the regions on the host. In the logs this looked like an estate manager restart of the non-crashed regions. Since no one reported problems during the test week on the release channel and the restarts didn’t register as a warning in the grid monitoring reports, no one noticed the problem. It was too small to see in the release channel.
When the update rolled to the main grid the problem scaled up the number of restarts. It became obvious that something was wrong.
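The pattern described above, one region crashing and every other region on the same host restarting shortly afterward, is the kind of thing a log check could look for. What follows is only a rough sketch in Python, not the Lab's monitoring code. The `RestartEvent` record, the `"crash"` and `"manager_restart"` labels, the `suspicious_hosts` function, and the five-minute window are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartEvent:
    host: str      # simulator host name (hypothetical field)
    region: str    # region name
    time: float    # seconds since an arbitrary start
    reason: str    # "crash" or "manager_restart" (hypothetical labels)


def suspicious_hosts(events, regions_per_host, window=300.0):
    """Return hosts where a crash was followed, within `window` seconds,
    by restarts of every other region on the same host."""
    by_host = defaultdict(list)
    for ev in events:
        by_host[ev.host].append(ev)

    flagged = set()
    for host, evs in by_host.items():
        for crash in (e for e in evs if e.reason == "crash"):
            # Regions that should have been left alone by the crash.
            others = set(regions_per_host[host]) - {crash.region}
            # Regions that restarted (without crashing) soon after it.
            followers = {
                e.region
                for e in evs
                if e.reason != "crash" and 0 <= e.time - crash.time <= window
            }
            if others and followers >= others:
                flagged.add(host)
    return sorted(flagged)
```

With only a handful of hosts in a release channel, a check like this would rarely fire, which fits the account of the problem being too small to see until it reached the main grid.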
The problem turned out to be a tiny bit of the region crossing improvements code that mis-set a flag when a region crashed and triggered a general sim restart. Steven Linden said, “…it’s been traced to checks we put in place to prevent the ‘Failed to grant capabilities’ errors when teleporting.” Within a few hours on the main channel the problem became apparent and the Lindens decided they had to roll the update back.
The initial rollout started at 5am, and the second, the rollback, finished at 8:30pm. So, it was not a good day for the grid.
Kernel Upgrades Start
This week a temporary channel was created. It is labeled RC KT… Release Channel Kernel Test? The channel is for rolling out a new kernel to the OS running the simulators and backend support services.
This upgrade has been running on ADITI. Now testing starts in a part of the more diverse main grid. The RC KT channel runs the same simulator code as the main channel. The only difference is the kernel change. Once this change passes testing and the Lindens can collect performance stats, they will start to roll it to the rest of the grid… provided it fixes the problems and improves performance as they anticipate it will.
Kernel upgrades will be occurring between now and 12/25. Yes, this will mean extra restarts. It also means faster hard drive performance on the hosts with the upgrade; at least it did on the ADITI grid. The Lindens expect the upgrade to have the most impact on homestead regions.
Plus it fixes problems introduced by the previous kernel upgrade. If you don’t know or have forgotten, the previous upgrade was made to fix the Time Warp issue. The evidence is that the problem has been fixed.
The Kernel Test channel was created because the grid monitoring tools cannot detect kernel versions, just as users cannot detect kernel versions. The channel facilitates collecting metrics on the kernel change to find improvements and regressions. Once the need to collect these metrics passes, the RC KT regions will be rolled back into the other channels.
With the coming holidays there will be some No Change Windows. These are periods when no new code will be rolled out.
Oskar Linden said that in the week of the 19th and the week of the 26th there will be no code shipped. So, from late December 16 to the end of the year we will see no new upgrades rolling out. This does not mean there will not be restarts. The Lindens do weekly restarts to clear minor memory leaks and other problems. Expect Tuesday restarts to continue through the NCW periods.
This update is the region crossing foundation changes from previous weeks, with maybe some new fixes included.
The upgrade in this channel remains the same.
This channel keeps running the release that was rolled back: server maintenance and region crossing stuff. More fixes have been added. One of them addresses a problem with Premium Members not being able to enter Premium Regions. There is a rumor that the problem is caused by the viewer version, with older viewers being unable to TP to premium regions. That is not true.
Kelly Linden said there are something like 50 fixes related to region crashes included in this test.
The new llTransferLindenDollars() function is active in Le Tigre.
Pose Ball Problem
On Le Tigre there is a known problem SVC-7499 – Pose ball rezzing items are either not giving pose balls or rezzing balls in wrong positions. So, if your sex bed is broken, blame Kelly… it wasn’t intentional. Simon Linden said, “The bug is pretty esoteric, and relies on some specific events and the order they happen.” Kelly Linden said, “Well, the problem is that for some scripts the extra timer event means they think they have timed out waiting for a response before they’ve had a rez event.”
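Kelly's description comes down to event ordering: a script that uses its timer as a timeout gives up if a timer event arrives before the rez event. The toy model below is written in Python, not LSL, and is only a sketch of that idea. The `PoseBallRezzer` class and the `run` helper are invented for illustration; `object_rez` and `timer` are the names of the LSL events involved, but nothing here is taken from an actual pose ball script.

```python
class PoseBallRezzer:
    """Toy model of a script that rezzes a ball and uses a timer as a timeout."""

    def __init__(self):
        self.waiting = True   # True until the rez is confirmed or we give up
        self.log = []

    def handle(self, event):
        if not self.waiting:
            return  # already finished; later events are ignored
        if event == "object_rez":
            self.waiting = False
            self.log.append("positioned ball")
        elif event == "timer":
            # The script assumes any timer event means the wait ran out.
            self.waiting = False
            self.log.append("gave up: timed out")


def run(events):
    """Feed a sequence of event names to a fresh script and return its log."""
    script = PoseBallRezzer()
    for event in events:
        script.handle(event)
    return script.log
```

In this model, `run(["object_rez", "timer"])` returns `["positioned ball"]`, while an extra early timer event, as in `run(["timer", "object_rez", "timer"])`, returns `["gave up: timed out"]`, so the ball is never positioned even though the rez went through.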
Temp Rezzers are seeing new problems from this bug. The Temp Rezzers that attempt to defeat a region’s prim limits are now also spamming on chat channel 0. So not only do they lag the servers, they annoy people. Right now the problem is seen in regions in the Le Tigre channel. The related JIRA item SVC-7500 – Scripts running Riot after latest LeTigre Deploy is drawing lots of attention.
Andrew Linden is looking at the problem and the issue of Temp Rezzers. The Rezzers are considered an abuse of the SL system. Since the Temp Rezzers use a legitimate rez function necessary for other uses in SL, think bullets, the blocking of Rezzers gets complex.
Kelly Linden said, “Oh, another note about the poseballs/timer events: scripts compiled to mono should not be effected.” I think that means one can avoid SVC-7499 by using Mono.
A resident was in the Friday meeting trying to find a solution for the problem of objects rotated 180 degrees changing rotation by ±0.5 degrees when linked. Andrew Linden will try to find time to look at the problem.
Group Liability Setting
Groups paying tier have a problem with a setting for new members. This affects all groups, but I suspect groups made for renters and others associated with land use suffer most from a problem in which new group members get L$1 charges to pay for group-related tier. See: SVC-378 – Role ‘Everyone’ in new groups should not have ability “Pay group liabilities and receive group dividends.”
The fix is to change the default behavior for those joining a group. This fix is in process somewhere within the Lab. If you have an interest in this fix, you can click WATCH on the JIRA item.