#SL Server Updates Week 27

Since Monday was a holiday in the US, there was no Tuesday roll out to the main grid. However, we did see the Release Channels rolled back and the main grid software installed there. This is only until the Lab adds a new batch of fixes to the Mesh Prep update that was on Le Tigre and the Mesh Prep update that was on Magnum.

Slowing

Week 27 is the second week for the main grid running with the new Mono2 update in place. The main grid has many more strange conditions than can be found in the Release Channels. So, problems not revealed in testing show up. In the article Second Life Server Update Week 26 I explain one of the cases where Release Channel testing did not find a problem that quickly showed up when rolled to the main grid.

Services Failure

During week 27 there were some reports of scripts still slowing. There may be a problem when a region is using up all the available script time. If you’re seeing a slowdown, check the JIRAs.

Release Channel Roll Back Timing

It often takes time for a problem to appear. With the Wednesday roll out to the Release Channels (RC), it can be late Thursday or Friday before the Lab is sure what is going on. It is not always the roll out that is at fault, even if everyone in SL thinks that is what broke things. With week 27’s July 4th holiday, Lab staff were gone by the time it was known that the RC was definitely the problem and needed to be taken off the grid. Since only support staff were available, that was not going to happen until after the holiday.

This is the problem of running a 24×7 service with staff who work eight to five, five days a week. RL does impact SL.

Texture Slowing

I’ve been changing viewers and wondering why avatars are so slow rezzing. There is a JIRA, SVC-6760, that no one can see that deals with HTTP Texture Get (I think…). Monty Linden brought a chart to the last Server Group meeting that shows a chaotic failure happening in the servers. They recover, but it can take some time.

There are three parallel graphs. The middle one shows what is happening on the client/viewer side. Cap refers to the top limit set on the number of GetTexture requests a viewer can make. Minimum, mean and max GetTexture response times in 30-second intervals during ‘an event’ are shown. You should be able to see things start slowing and getting more chaotic, then spike up to long response times between 30 and 60 seconds. At some point things time out and the response time comes back under a second.
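To make the ‘min, mean and max in 30-second intervals’ idea concrete, here is a minimal Python sketch of how such a summary could be computed from raw samples. This is not Monty’s tooling. The function name, the sample format and the numbers are all made up for illustration.

```python
from collections import defaultdict

def summarize_intervals(samples, interval=30.0):
    """Group (timestamp, response_time) samples into fixed-width intervals.

    Both values are in seconds. Returns a dict mapping each interval's
    start time to a (min, mean, max) tuple of the response times in it.
    The sample format is an assumption made for this sketch.
    """
    buckets = defaultdict(list)
    for timestamp, response_time in samples:
        # Round the timestamp down to the start of its interval.
        start = int(timestamp // interval) * interval
        buckets[start].append(response_time)
    return {
        start: (min(times), sum(times) / len(times), max(times))
        for start, times in sorted(buckets.items())
    }

# Invented samples: quick responses at first, then an 'event'.
samples = [(0.0, 0.5), (10.0, 1.5), (31.0, 40.0), (45.0, 60.0)]
stats = summarize_intervals(samples)
for start, (lo, mean, hi) in stats.items():
    print(f"{start:>5.0f}s  min={lo:.1f}  mean={mean:.1f}  max={hi:.1f}")
```

A healthy interval shows min, mean and max close together. During an event the three pull apart, which is the chaotic look in the chart.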

The top one shows what is happening server side in Internal Services. This is showing the delay accumulating in response to viewer texture requests. Something is happening between the two services.

The bottom graph is showing Throttles, Tracebacks, and Proxy Delays. Throttles are the limits being hit on the viewer side. Once the viewer has made 16 requests (now 8 on Firestorm) it has to wait for one of those requests to complete before making another request.
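The throttle described above is basically a cap on concurrent requests. Here is a rough Python sketch of that idea using a semaphore. It is not viewer code. The function names are hypothetical and the sleep only stands in for a real texture fetch.

```python
import asyncio

async def fetch_all(n_requests, cap=16, delay=0.01):
    """Run n_requests simulated fetches with at most `cap` in flight.

    Returns the results plus the peak number of simultaneous requests,
    so the effect of the cap can be checked. Illustrative only.
    """
    slots = asyncio.Semaphore(cap)
    in_flight = 0
    peak = 0

    async def fetch_one(i):
        nonlocal in_flight, peak
        # Wait here until one of the `cap` slots is free.
        async with slots:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)  # stand-in for the HTTP texture request
            in_flight -= 1
        return i

    results = await asyncio.gather(*(fetch_one(i) for i in range(n_requests)))
    return results, peak

results, peak = asyncio.run(fetch_all(40, cap=16))
print(f"completed {len(results)} requests, peak in flight: {peak}")
```

With a cap of 16, request number 17 cannot start until one of the first 16 finishes. When responses take 30 to 60 seconds, every slot stays busy and the viewer sits waiting, which would show up as throttles in the bottom graph.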

Time outs happen when the internal services cannot be reached. If you look closely, they appear to trail the other events a bit, because a time out is only displayed after the request has actually timed out.

You can follow what is being done to solve the problem on the viewer side in JIRA VWR-25145.

On the server side Monty is changing the Stack Structure. I’m not sure how the ‘stack’ works in SL SIM software. Monty describes it as running 15 or so tiers of services in too few physical tiers of computers. He has a fix in the works and will roll it out on ADITI. His idea is to trade latency getting requests from computer to computer (more physical tiers) for less contention over processor time (more machines, more processors).

The problem with testing on ADITI is getting it load-tested before moving it over to the Release Channels. So, expect a call for testing help in week 28. The rest of us can expect some slow loading textures until the fix is in place on the main grid.

Monty is not sure this tier thing is the complete problem. But, it should be a step forward.

Summary

This coming week 28 we should see new roll outs to the Release Channels. If I’m tracking correctly we have all three channels open. Mesh Prep and Server-Maint. will be back and perhaps a new one.

 

 
