In the forum and at Friday’s Server & Scripting User Group meeting, the issue of sudden crippling lag came up. People are reporting regions with lots of free script time dropping to 0.3 Physics Frames per Second (FPS). Residents can neither move within the region nor TP out of it. After a couple of minutes, people are logged out of Second Life™.
Regions where events are occurring seem to get hit more often than other regions. That may be due to the high number of people, or those regions may simply get reported more often because more people are around to notice. There are a few reports here and there in the forum from people asking about the problem in regions that are not event regions and do not always have a large number of avatars.
Toysoldier Thor started a forum thread for reporting the problem. He attended Friday’s UG and said, “I posted a thread in the forums to report a disturbing event that seems to be happening frequently on sims with a large population of avatars on it…. All the details are in the forum. I have other venue owners that have seen the same thing – since Pathfinder was released.” (Thread)
Toy further describes it as, “The sim instantly LAGS right up and most avatars get booted out of SL…. and it seems to happen when there is a major crowd and freaky enough when the artists are switching streams.”
Simon Linden said, “We know there are some things that can push the regions into a half-dead state, where you can’t enter or leave. I’ve been working on those on and off for a few months — it’s because some messaging system gets overloaded and doesn’t recover. There are a few ways to make it happen.
We haven’t had restarts in a couple of weeks – things tend to get worse over time, so that may be a factor.”
Theresa Tennyson commented, “There are also a number of sims that have been bogging down with unusual-looking script activity. … There are sims running at half speed with 50 ms script time and less than 1% of scripts run per pass.”
Simon responded, “I’ve heard of reports like that about the script time, Theresa … we also saw one where the fps was low but the “spare time” was high — something that definitely should not happen, unless the OS kernel is not giving it time slices as expected.”
Theresa said, “There are some regions with high SLEEP time but not spare time.”
Simon responded, “I might be mixing it up … but that’s similar. Basically the OS is sleeping the simulator longer than it should.”
Basically, at this point no one knows what is going on. But region owners running events are feeling the pain first. I have been seeing odd region lag for some months. I first noticed it at the SL Birthday celebration. I was in regions with 3 or 4 avatars and the lag was horrible. The Viewer Stats made no sense to me. I could, however, usually TP out. Adjacent regions with more avatars had less lag. A few hours later, or the next day, the laggy region would be running fine, even with more avatars.
So, this may be an old problem that is getting worse. It may also be peaking because there have been no restarts for a couple of weeks. Whatever the case, providing more information about incidents of the problem should help the Lindens track it down.
UPDATE: There is also discussion of the problem in the Deploys thread.
Helping
This is one of those problems where the JIRA change (Residents can no longer view each other’s JIRAs) makes it hard to know how widespread a problem is. The forum thread helps, but not many people read the forum. Just notice the number of views a forum post gets, typically fewer than 200.
If you run into the problem, you can help by reporting it in the forum or JIRA. To tell whether you are seeing this problem, open the Viewer Stats (Ctrl-Shift-1) and check Sim/Physics FPS, Time Dilation, Sleep Time (inside Time Details), Scripts Run %, Script Time, and Spare Time. You are looking for stats that make no sense together.
Sim/Physics FPS and Time Dilation are normally 45 and 1.0, respectively. When both drop to low numbers, like 0.3, the region is lagging far beyond normal. For those values to indicate this particular problem, however, Sleep Time and Spare Time should both be high, in the area of 20 ms, and Scripts Run % should be low, 1 to 5%, with Spare Time above zero. That combination is abnormal: Scripts Run % should be near 100% when Script Time is low and Spare Time is high. When Scripts Run % is low despite plenty of Spare Time, there is a problem.
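The checklist above can be written down as a small Python sketch. It is not a Linden Lab tool: `SimStats` and `looks_like_stall_bug` are hypothetical names, and the cutoffs for “low” FPS and Time Dilation (below 5 and 0.5) and “high” Sleep/Spare Time (15 ms or more) are assumptions chosen around the numbers quoted in this post, not official thresholds. You would still read the values by eye from the Viewer Stats window and type them in.

```python
from dataclasses import dataclass


@dataclass
class SimStats:
    """Values read by hand from the Viewer Stats window (Ctrl-Shift-1)."""
    physics_fps: float      # Sim/Physics FPS, normally about 45
    time_dilation: float    # normally about 1.0
    sleep_time_ms: float    # Time Details > Sleep Time
    spare_time_ms: float    # Time Details > Spare Time
    scripts_run_pct: float  # Scripts Run %, normally near 100


# Assumed cutoffs (hypothetical, not published by Linden Lab).
LOW_FPS = 5.0
LOW_DILATION = 0.5
HIGH_IDLE_MS = 15.0      # "in the area of 20 ms"
LOW_SCRIPTS_PCT = 5.0    # "1 to 5%"


def looks_like_stall_bug(s: SimStats) -> bool:
    """Return True when the stats match the nonsensical pattern in this post."""
    # The region is crawling: FPS and Time Dilation have both collapsed.
    crippled = s.physics_fps < LOW_FPS and s.time_dilation < LOW_DILATION
    # ...yet the simulator reports lots of sleep and spare time.
    idle = s.sleep_time_ms >= HIGH_IDLE_MS and s.spare_time_ms >= HIGH_IDLE_MS
    # ...and almost no scripts run even though Spare Time is above zero.
    starved = s.scripts_run_pct <= LOW_SCRIPTS_PCT and s.spare_time_ms > 0
    return crippled and idle and starved


if __name__ == "__main__":
    sick = SimStats(0.3, 0.3, 20.0, 20.0, 2.0)
    busy = SimStats(12.0, 0.4, 0.0, 0.0, 60.0)  # ordinary overload: no spare time
    print(looks_like_stall_bug(sick))  # True
    print(looks_like_stall_bug(busy))  # False
```

A region that is merely overloaded usually shows low FPS with Spare Time near zero, so the sketch returns False for it, which keeps this bug separate from ordinary transient lag.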
Being kicked out of SL is also common with this problem, though it can take a couple of minutes.
So, if you are in a region and are suddenly crippled by lag, quickly pop open Viewer Stats and have a look. If the readings are nonsensical, as described above, note the readings and the time of day and report the problem. The report should include the server information, the time of day and time zone, and a short description of what was going on. An example of the server information, taken from Help -> About [viewer name], is:
You are at 464,297.0, 306,950.0, 38.9 in [region_name] located at
sim8665.agni.lindenlab.com (216.82.39.224:12035)
Second Life Server 12.09.07.264510
Providing the viewer stats will help too. But the most important items are the time of day the event occurred and the server info.
It is your choice whether you file a JIRA or add to the forum thread. The Lindens will handle duplicate JIRAs, so don’t worry about filing one that duplicates another.
If you are going to make a report, be sure this is not just the transient lag that hits as a region fills up. If you are running an event and getting hit by this, add every incident to the forum thread even if you file a JIRA.
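If you report often, a small script can turn the Help -> About text into a consistent report with the time of day and time zone attached. This is a hypothetical Python sketch: `parse_about` and `format_report` are made-up helper names, and the regular expressions assume the exact wording of the About example in this post, which may differ between viewers and viewer versions.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# The example Help -> About text from this post; other viewers may word it differently.
ABOUT_TEXT = """You are at 464,297.0, 306,950.0, 38.9 in [region_name] located at
sim8665.agni.lindenlab.com (216.82.39.224:12035)
Second Life Server 12.09.07.264510"""


def parse_about(text: str) -> dict:
    """Pull region, host, address, and server version out of the About text."""
    region = re.search(r" in (.+?) located at", text)
    host = re.search(r"^(\S+\.lindenlab\.com) \(([\d.]+:\d+)\)", text, re.MULTILINE)
    server = re.search(r"Second Life Server (\S+)", text)
    return {
        "region": region.group(1) if region else None,
        "host": host.group(1) if host else None,
        "address": host.group(2) if host else None,
        "server_version": server.group(1) if server else None,
    }


def format_report(about: dict, when: datetime, description: str,
                  stats: Optional[dict] = None) -> str:
    """Build a plain-text report: time with time zone, server info, description, stats."""
    lines = [
        f"Time: {when.strftime('%Y-%m-%d %H:%M')} {when.tzname()}",
        f"Region: {about['region']}",
        f"Host: {about['host']} ({about['address']})",
        f"Server: {about['server_version']}",
        f"What was happening: {description}",
    ]
    if stats:
        lines.append("Viewer stats: " + ", ".join(f"{k}={v}" for k, v in stats.items()))
    return "\n".join(lines)


if __name__ == "__main__":
    info = parse_about(ABOUT_TEXT)
    now = datetime.now(timezone.utc).astimezone()  # local time, with its time zone
    print(format_report(info, now, "Live music event, artist switching streams",
                        {"Physics FPS": 0.3, "Spare Time ms": 20.0}))
```

Stamping the time with an explicit time zone matters because the post names time of day and server info as the most useful parts of a report.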
Thanks for posting this, Nalates.
Saturday night was a horrible night because of what seems clearly to be a bug in the server code. Several sims were reporting problems, including the very popular INSPIRE Space Park, and yet another big live concert event crashed in exactly the same situation, when they were switching streams.
So – ppl – please post all the details about your crash….
Thanks for the heads-up! In the last few days I ran into this problem on a couple of sims. I’ll be back to check!
To add: I have not seen the sudden crippling lag but the last couple of months I have seen a lot of avis wind up walking on air over the ocean around my store sims, whereas that had become very rare.
Well, I must say I am puzzled by this post. I have had the problem you describe for about a year. All of a sudden my avatar becomes unresponsive. I can’t edit things, I can’t move or fly; the only thing I can do is turn the avatar. After a minute or two I get kicked out. Only recently did I notice that a sign of trouble was the bandwidth dropping to almost zero.
This problem usually happened a few minutes after I logged in, and repeatedly, so that I could hardly be inworld for more than 10 – 15 minutes straight. Recently I noticed that the problem doesn’t seem to occur, or doesn’t for a long time, if I drop my graphics settings from Ultra with shadows on to Medium.
I have always thought the problem had to do with a new router that was installed when I upgraded my connection from 10 Mbps to 100 Mbps, but now I wonder. I’ll keep an eye on the stats to see if I can replicate what you describe.
Nalates? At what time on Fridays and where does the Server & Scripting User Group meet? Is it invitation only? TY
It is a public meeting… well SL resident public.
All the meetings, times, and locations are listed here: User Groups.
Some possible good news regarding this increased rash of sims going “STALE” during larger events that has been noticed by a lot of large venue owners….
Over the past weekend I was able to witness and capture critical live data from a very sick sim during a larger music event. The next day I was at two far larger events (65 – 77 avatars) where both sims behaved very well and easily handled the loads with low, reasonably expected lag.
In the forum thread I reported it, and Qie and I noticed a very clear and specific distinction between the extremely sick sim and the two healthy sims…. a NETWORK-RELATED issue within LL’s Data Center. Qie and I are speculating that it’s a possible problem related to the sim instance, the kernel of the server the sim runs on, or network routing to and from this server. We cannot isolate it further, as we have neither access to nor knowledge of the exact design of LL’s servers.
I have created an official bug JIRA related to this problem and pointed the LL JIRA BUG to the community forum thread to get all the details:
https://jira.secondlife.com/browse/BUG-355
Since Residents can no longer look at each other’s JIRAs… you can participate in the following thread on this topic….
http://community.secondlife.com/t5/Second-Life-Server/Increase-in-Instant-SIM-LAG-amp-Crashes-During-Larger-Events/td-p/1683765
Thanks.
I have been trying to get the Linden’s attention in various User Group meetings. It’s just not clicking for them. You may want to try getting a group of region owners to show up at Oskar’s Thursday meeting.
Your specific information is likely going to do more than anything else to get their attention.
Further update – my JIRA has already been picked up, and on the thread Simon Linden has already responded to some of the initial findings….
From Simon Linden…
” Thanks for the reports, Toy … there’s some good information there that seems to shows the simulator getting into trouble when there are network problems. ”
I think now that we have a possible smoking gun… the LL staff can focus their investigation on the NETWORK.
Having more sim/region owners post the same performance statistics during one of these extreme lag events would help us further confirm the root cause.