  • DEVORD #010


    Development Update - 24JAN2022

    It's been a while since our last update and we've certainly had our work cut out for us. We have been rather radio silent over the past few weeks due to obvious issues regarding our game servers and the massive amount of time that was sunk into investigating, analyzing, comparing numbers, stress testing, pretty much the entire A-Z. But we'll keep that for last.


    Our website has received some quality-of-life upgrades to the operations module, such as sending out notifications, some automation, and automated emails. On top of that, during exodus we also sent our first newsletter! This may seem like an easy task, but technically there's a lot that can go wrong when sending thousands of emails at once. And stuff definitely did go wrong. Our host limits the number of emails we're allowed to send and also blocks communication to other mail servers, so we couldn't use an external one. COE doesn't take no for an answer and wrote a custom module that integrates with an AWS email service to send bulk mails.


    Our modpack has also received huge updates. Most notably, we've made a giant cut in modpack size. Removing a total of 10 mods gave us some extra server headroom and resolved client FPS issues. Our custom mod has also received quite some love. The Yellow Carrot FTX required a custom faction to be created: the "Livonian Army". They are a mix between mercenaries and Russians. Basically some up-armored and better trained terrorists.



    Alright, on to the elephant in the room, and the thing causing the most damage to our day-to-day operations: our game server issues. We've been holding off on communicating to the rest of the unit until the issues were resolved, but the time has come.

    To find out the cause, we must first understand some basic concepts, like network packets. Network packets can be visualized the same way postal packets work. They have a destination address, some information about themselves, and some data. To simplify things, each action a player performs has to be sent to the server in the form of a packet. The server does some checks on it, processes it, creates new update packets and then sends them to all other players. There are 2 major factors in play as to how many packets can be processed. The CPU decides how fast they can be processed, and bandwidth decides how many there can be and how big they can be. Imagine it as a toll booth. The toll booth (CPU) can process a certain number of cars per hour, while the highway it's built on (bandwidth) can move a certain number of vehicles per hour.
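
    The toll booth picture can be put into numbers with a rough sketch. Everything here is illustrative: the function names and the figures are made up for the example and are not measurements from our server. The point is only that the slower of the two limits decides how many packets get through.

```python
def max_packets_per_second(cpu_packets_per_second, bandwidth_bytes_per_second, packet_size_bytes):
    """Return how many packets per second can actually get through.

    Hypothetical model: the CPU (toll booth) and the bandwidth (highway)
    each impose a limit, and the lower one wins.
    """
    bandwidth_limit = bandwidth_bytes_per_second // packet_size_bytes
    return min(cpu_packets_per_second, bandwidth_limit)


def bottleneck(cpu_packets_per_second, bandwidth_bytes_per_second, packet_size_bytes):
    """Name the resource that is holding things up in this simple model."""
    bandwidth_limit = bandwidth_bytes_per_second // packet_size_bytes
    return "cpu" if cpu_packets_per_second < bandwidth_limit else "bandwidth"


# Made-up numbers: a huge highway (125 MB/s) but a small toll booth.
throughput = max_packets_per_second(10_000, 125_000_000, 500)
limiting_factor = bottleneck(10_000, 125_000_000, 500)
```

    With these made-up numbers the highway could carry 250,000 packets per second, but the toll booth only clears 10,000, so adding more bandwidth changes nothing.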


    During our exodus, changes were made to benefit client performance: sending more network packets and trying to optimally balance our server resources. Long story short, we optimized for bandwidth usage instead of available CPU. The massive amount of bandwidth that our servers boast could simply not be used because the CPU was having issues preparing, scheduling and sending out network packets. This led to extreme slowdowns in server FPS, with some missions running entirely below 10 server FPS. At this point both the server and the clients need to predict what is going to happen.

    For example, if a player is running forward and then quickly turns left, the server and other clients will just continue to predict that the unit is moving forward. Once both catch up, the player will teleport on other people's screens to the actual location. Above 20 FPS the clients and server don't do much predicting, so everything runs quite smoothly. Sustained periods below 20 FPS result in this predicting behavior by the server. If the server never has a break and constantly needs to keep predicting, things are eventually going to get entangled and it hits a breaking point. Usually, with the duration of our missions, it's not a big deal. But there are some cases where this becomes a real problem and leads to a complete network standstill (imagine a big traffic jam but vehicles are trying to drive in both directions on the same lane).
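
    The running-forward example can be sketched in a few lines. This is a simplified straight-line prediction and not how the game engine actually does it; the function names and numbers are assumptions made for the illustration.

```python
import math


def predict_position(last_position, velocity, seconds_since_update):
    """Guess where a unit is now by assuming it kept moving in a straight line.

    Hypothetical helper: last_position and velocity are (x, y) tuples in
    meters and meters per second.
    """
    x, y = last_position
    vx, vy = velocity
    return (x + vx * seconds_since_update, y + vy * seconds_since_update)


def correction_distance(predicted, actual):
    """Distance the unit 'teleports' once the real position finally arrives."""
    return math.hypot(actual[0] - predicted[0], actual[1] - predicted[1])


# The player was last seen at the origin running forward at 4 m/s.
predicted = predict_position((0.0, 0.0), (0.0, 4.0), 1.0)

# In reality they turned left right away and ended up 3 m to the left.
actual = (-3.0, 0.0)
teleport = correction_distance(predicted, actual)
```

    The longer the gap between real updates, the bigger the correction becomes, which is why sustained low server FPS makes the teleporting so visible.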


    So, we opted to make our highway as big as possible, whilst the toll booth could only handle a few lanes at a time. When a lot of action was taking place in-game, the CPU simply couldn't process all the packets and everything else going on in-game (like physics calculations, positioning, hit detection, ...) at the same time. The big contributor to the number of packets being generated is explosions, especially big ones with ACE Fragmentation on them. This combination of factors is what ultimately led to a massive network jam. The server was still on and still processing things, but the delay created by these millions of packets meant that everyone got kicked off the server. But when investigating, the server was happily chugging along as if nothing had happened in the first place.

    We've now reverted and fine-tuned the server to be more balanced. We first had a maximum of 2048 messages being sent per frame, and now we've only got 96. This doesn't mean that the server is sending less data, it just means that all the packets that are going to the same destination are being grouped together more often. Imagine we were first using small postal vans, often having them go to the same location, and now they're grouped together in large cargo trains.
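
    The vans versus cargo trains idea looks roughly like this in code. It is a minimal sketch of grouping messages by destination, not the game's real networking code; the function name, the player names and the 1300 byte limit are all assumptions for the example.

```python
from collections import defaultdict


def group_by_destination(messages, max_batch_bytes):
    """Bundle messages for the same destination into as few batches as possible.

    messages is a list of (destination, size_in_bytes) tuples. Returns a list
    of (destination, [sizes]) batches, each at most max_batch_bytes in total
    (a single oversized message still gets a batch of its own).
    """
    pending = defaultdict(list)
    batches = []
    for destination, size in messages:
        current = pending[destination]
        # Ship the current batch if this message would overfill it.
        if current and sum(current) + size > max_batch_bytes:
            batches.append((destination, current))
            pending[destination] = current = []
        current.append(size)
    # Ship whatever is left over for each destination.
    for destination, sizes in pending.items():
        if sizes:
            batches.append((destination, sizes))
    return batches


# Five small messages for two players (hypothetical names and sizes).
messages = [("alice", 100), ("bob", 100), ("alice", 100), ("alice", 100), ("bob", 100)]
batches = group_by_destination(messages, max_batch_bytes=1300)
bytes_before = sum(size for _, size in messages)
bytes_after = sum(sum(sizes) for _, sizes in batches)
```

    Five separate sends become two, while the total amount of data stays exactly the same. That is the trade we made: fewer, fuller sends, so the CPU spends less time preparing and scheduling each one.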


    After this long investigation, which took multiple weeks of fine-tuning, revamping and retesting, we finally managed to find a good balance which gives us a major boost to server FPS. For the last 2 weeks we've been running in the 20-30 range, whereas before it was not uncommon to run an entire mission in the 5-15 range. Last week's crash is still a bit of a mystery, since the changes were already implemented, but the crash coincided with a large explosion going off, so we'll write it off as the same issue.



    With the server being more-or-less stable again, we hope to rebalance our focus and pick up the slack in the other sections of our DMOS. We've got some exciting stuff on the drawing board which I can hopefully reveal more of soon.



    Thanks for reading!
    1LT J.Drake, OIC
    SGT C.Winters,
    PV2 P.Liwa
