Physics causing incredibly poor performance in Agones PCVR instances

tl;dr: server performance has been extremely poor in my Agones instance (NA region) since I wiped it a week ago. I think it may be caused by a bug in rotation math and/or physics interactions, and after a bit of testing, I was able to reproduce the issue in a couple of instances.

Description:
I think one of the most recent server builds introduced a positional or rotational calculation bug (local rotations, maybe?) in PCVR that affects both teleports and collision properties. There is also a possibly related occlusion culling and LOD bug visible in the forest. Symptoms include, but are not limited to:

  • Logging out and back in results in being transported to a location several meters away from the logout position (one not previously visited), or in being teleported back to spawn unexpectedly (presumably because the previous position was invalid).

  • Excessively high server tick latency after loading certain overworld chunks, impacting the whole server. One confirmed source is out-of-position objects interacting with the physics engine, which I was able to reproduce with Prefabulator (see below).

  • Forest chunk levels of detail (LODs) layering incorrectly, with objects in incorrect positions and rotations. (See below.)

  • Certain attachments behaving in undefined ways (typically, jettisoning themselves away at high speed) after a weapon breaks or a grab is released through another object. In one instance, this crashed the entire server.

I’ve played ATT for a while and all of this seems new (even compared to the previous Mines and Forest lag).

Reproduction Steps:
I was able to verify this in a toy example, through the following steps:

  1. Craft a one-sided craft piece (Craft Piece Side Flat 1Way prefab) and attach it to a medium handle (Handle Medium Standard).

  2. Using Prefabulator, select the craft piece and move it a short distance (~0.1 m).

  3. After I did this, the server lagged excessively, much like the latency we had been experiencing, until I deleted the craft piece. (In other words, this narrowed the apparent culprit down to physics.)

I only tried this after noticing other, related physics anomalies that appeared on the server naturally after a save wipe, which I’m pretty sure aren’t related to my own tinkering:

  1. Teleports home and logout/login cycles having undefined effects. In particular, players are teleported home unexpectedly, or to unexpected locations, when they come back online.

  2. Certain attachments jettisoning themselves in unexpected directions at incredibly high speeds, especially when a handle is broken.

  3. Certain collidable objects rendering correctly but having unpredictable physics that severely lags the server.

  4. LOD layers and occlusion in the forest behaving in unpredictable, erratic, and flickering ways, with the rotations and orientations of objects at lower levels of detail visibly incorrect.

If I had to guess (and if these bugs are related at all), I think a recent patch may have introduced a rotation bug that affects objects' collision meshes or positions. It doesn’t seem to affect prefab part locations or rendering beyond occasional ghosting (those seem fine, as tested with Prefabulator). It also doesn’t seem to cause prefab placement issues (like the earlier bugs in the Mines), but it does seem to affect physics interactions and player teleportation.

Unfortunately, the result is server performance tanking to unusable levels even after I force a restart (e.g., by logging everyone off and letting the instance spin back up). The login bug and the collidable object bugs described above then make it worse.

Server: Goldkin’s World (my personal private server)

Time: Persistent over the past week from the timestamp of this post

Discord Username: Goldkin

Folks on meta Discord were interested in why I called this a possible rotation bug, so I wrote some additional notes about it over there in #server-owner-talk.

Copying over why I’m guessing this is a rotation vector bug:

Yeah, all of this is new. This is just my hypothesis, coming from the perspective of “this worked before, so how could it be going wrong?” I don’t actually know what’s going wrong.

But if I had to guess, based on my own noodling with game engines and rotation vector or quaternion maths in the past, it feels like some vector isn’t being translated into or out of a local rotational frame properly. That would explain the weird offsets and directional vectors coming out wrong after a recent update.

If these aren’t all independent bugs (I can’t see the code to verify), it feels like a single line of code was changed in one of the libraries that translates to or from a local rotational frame before additional vectors are applied, perhaps during a code cleanup pass. At least, that matches my experience writing similar game engine math in the past.
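To make that hypothesis a bit more concrete, here’s a minimal Python sketch of the kind of one-line regression I mean. It’s purely illustrative. I can’t see ATT’s code, so `local_to_world`, `local_to_world_buggy`, and the other helpers are hypothetical names. The bug shown, applying the inverse rotation when moving an attachment offset from a handle’s local frame into world space, is an assumed example, not a confirmed cause.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def quat_from_axis_angle(axis: Vec3, angle_rad: float) -> Quat:
    """Unit quaternion for a rotation of angle_rad about a unit-length axis."""
    half = angle_rad / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q: Quat) -> Quat:
    """Conjugate, which is also the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q using q * v * q^-1."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), quat_conj(q))
    return (x, y, z)


def local_to_world(parent_pos: Vec3, parent_rot: Quat, local_offset: Vec3) -> Vec3:
    """Correct: rotate the local offset into the world frame, then translate."""
    r = rotate(parent_rot, local_offset)
    return tuple(p + o for p, o in zip(parent_pos, r))


def local_to_world_buggy(parent_pos: Vec3, parent_rot: Quat, local_offset: Vec3) -> Vec3:
    """HYPOTHETICAL regression: one changed line applies the inverse rotation."""
    r = rotate(quat_conj(parent_rot), local_offset)
    return tuple(p + o for p, o in zip(parent_pos, r))


# A handle at (10, 0, 5), rotated 90 degrees about the vertical (y) axis,
# with an attachment 1 m along the handle's local +x axis.
handle_pos = (10.0, 0.0, 5.0)
handle_rot = quat_from_axis_angle((0.0, 1.0, 0.0), math.radians(90))
offset = (1.0, 0.0, 0.0)

good = local_to_world(handle_pos, handle_rot, offset)
bad = local_to_world_buggy(handle_pos, handle_rot, offset)
print("correct:", good, "buggy:", bad, "error (m):", math.dist(good, bad))
```

For a 1 m offset on a handle rotated 90°, the correct and buggy results land about 2 m apart. With an unrotated handle, both functions agree exactly. A bug like this could therefore slip through casual testing while still throwing attachments and saved positions off whenever the parent frame is rotated.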

If I’m not completely off base with these being related, maybe that’s worth checking?

I tried reproducing these tonight on my server. I was able to reproduce the teleport bug by logging out and back in at the Smithy. However, I could not reproduce the extreme performance degradation, nor the LOD rotation bug in the particular forest chunk where we saw it yesterday. In fact, things seemed pretty speedy this evening despite an extended Mines dive with many enemies clogging up cycles.

Another player on my server reported also having seen the LOD rotation issue by logging in and out of the quarry. They reported that leaving and reentering the chunk fixed it for them.

Given this, perhaps the latter issue is kicked off by client desync somehow. If we’re able to reproduce it again, I’ll attempt to capture it in OBS.