Monday, 27 November 2017

Software Optimisation

In recent years we've pulled back the curtain at Rare, giving you all kinds of insights into the development of current project Sea of Thieves and many of our older classics. As part of this candid new side to our culture, and to give potential new recruits a view of the way we currently do things, we decided there was scope to offer a closer look at some of our specific disciplines and working practices.
In this column, our second software-based Tech Blog strand continues to zero in on software optimisation for Sea of Thieves, picking up from where we left off in part 1!


•    PrePhysics. One of the areas where a lot of the game logic runs. Here actors (players, camera, AI enemies etc.) are updated via their 'Tick' functions. Actors can be composed of multiple components which may also tick.
•    Physics. Here actors publish their positions to PhysX. Simulation is then performed on worker threads.
•    During (Physics). This is the work performed while waiting for the physics update to complete on other threads. It includes audio and visual effects that are unaffected by physics. The short gap that follows shows that the Game thread finished all the work it could run in this block before the physics simulation on the other threads had completed.
•    End Physics. Once the physics simulation is completed, physically driven actors can be updated. In Sea of Thieves, these are primarily actors attached to ships. Each of the stacks in this category corresponds to one of the three ships.
•    PreCloth and PostPhysics. In Sea of Thieves we do some object updates after physics for characters that can potentially be attached to ships. These categories include the movement and animation logic (coloured yellow).
•    Redraw and Wwise. These handle publishing state changes and events to the rendering and audio devices.
Richard: In this next part of our series of articles on performance and optimisation in Sea of Thieves, we're going to be taking a look at measuring the performance of the game and some samples of profiling data we've collected in the past.

How We Measure the Performance of Sea of Thieves
As with most game engines, there are two general ways in which the performance of a game running on UE4 can be inspected: tooling from within the engine itself, and external tools that are applied to the game's process when it's running.
UE4 has the facility to gather timing information for many areas of the engine through its own stats system. This stats system is easy to use and simple to implement within our own UE4 code, but there is a significant overhead when the statistics are being gathered, and only CPU statistics are available.
For analysis of GPU costs or more in-depth analysis of CPU performance we use the PIX tool which is available for Xbox or PC – if you are using DirectX 12. If your game or engine is not running on DirectX 12, all is not lost! A great deal of the CPU profiling functionality within PIX is provided via Windows Event Tracing, and so the same information can be obtained using the free Windows Performance Analyzer (WPA) tool. For documentation and advice on WPA, 

Sea of Thieves in Profile
When it comes to assessing the performance of the game ('profiling' it), it's extremely useful to have a reproducible test case that will allow you to identify new issues that have been introduced and reason about performance improvements that have been made. An example of one of our test cases from early in development can be seen below:

Here we have three ships anchored up next to an island, with twelve pirates in the scene. All very nice, but what does it look like under the hood? We put markers around significant chunks of work that emit events when the work starts and ends, allowing PIX to build up a picture of how the frame's work is structured. Below is a performance capture taken using PIX, showing what the CPUs are up to when the scene above is running:
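The idea of wrapping chunks of work in markers that emit start and end events can be sketched in a few lines. This is only an illustration in Python, not the engine's or PIX's actual API: the `marker` helper, the in-memory `events` list and the `update_frame` function are all hypothetical names invented for the example.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory event log. A real profiler receives these events
# through its own API, which is not shown here.
events = []

@contextmanager
def marker(name, category="game"):
    """Emit a 'begin' event on entry and an 'end' event on exit."""
    events.append(("begin", name, category, time.perf_counter()))
    try:
        yield
    finally:
        # The 'end' event is emitted even if the marked-up work raises.
        events.append(("end", name, category, time.perf_counter()))

def update_frame():
    # Nested markers produce nested begin/end pairs in the log.
    with marker("PrePhysics"):
        with marker("TickActors"):
            pass  # game logic would run here
    with marker("Physics", category="physics"):
        pass  # physics work would run here

update_frame()
for kind, name, category, timestamp in events:
    print(kind, name, category)
```

Because every marker emits a matched pair of events, a tool reading the log can recover both how long each chunk took and which chunk was running inside which.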
So, what on earth does this multicoloured pile of upturned skyscrapers actually mean? Well, the main body of the capture shows the work that three CPUs are doing (0, 1 and 2) over time, which runs left to right. The highest level that is visible under each CPU heading is the highest level of code that we have marked up, and blocks under those are lower and lower levels of code that have been marked up and that are being called by the higher level. The colour of the blocks represents the type of work that is being done by the CPU in that block: orange is rendering work, purple is physics work, blue is 'game' code, and green is networking. Any grey areas are a default Unreal statistics marker where we have not applied our colour coding scheme.
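To make the layout of the capture concrete, here is a rough sketch of how a stream of begin and end events could be turned into the stacked, colour-coded blocks described above. The colour mapping follows the text, but the `build_blocks` function, the event tuple format and all of the timings are assumptions made up for the illustration; PIX's own data model is not shown.

```python
# Colour scheme as described in the text; anything else falls back to grey.
CATEGORY_COLOURS = {
    "rendering": "orange",
    "physics": "purple",
    "game": "blue",
    "networking": "green",
}

def build_blocks(events):
    """Turn (kind, name, category, time_ms) events into drawable blocks.

    Depth 0 is the highest level of marked-up code; deeper blocks are the
    marked-up code that it calls.
    """
    stack = []
    blocks = []
    for kind, name, category, time_ms in events:
        if kind == "begin":
            stack.append((name, category, time_ms))
            continue
        open_name, open_category, start_ms = stack.pop()
        if open_name != name:
            raise ValueError(f"unbalanced marker: {name!r}")
        blocks.append({
            "name": open_name,
            "depth": len(stack),
            "start_ms": start_ms,
            "duration_ms": time_ms - start_ms,
            "colour": CATEGORY_COLOURS.get(open_category, "grey"),
        })
    # Left to right by start time, then top to bottom by depth.
    return sorted(blocks, key=lambda b: (b["start_ms"], b["depth"]))

# Hypothetical events for one CPU, with invented timings in milliseconds.
sample_events = [
    ("begin", "NetTick", "networking", 0.0),
    ("begin", "HandleMessage", "game", 0.5),
    ("end", "HandleMessage", "game", 1.5),
    ("end", "NetTick", "networking", 2.0),
    ("begin", "UnmarkedStat", "stats", 2.0),
    ("end", "UnmarkedStat", "stats", 2.5),
]

blocks = build_blocks(sample_events)
for block in blocks:
    indent = "  " * block["depth"]
    print(f"{indent}{block['name']} ({block['colour']}, {block['duration_ms']} ms)")
```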
The observant reader will notice that there's a lot of orange-coloured rendering work happening on CPUs 1 and 2, and a riot of colour on CPU 0. CPU 0 hosts the Game thread (GT), and (unless explicitly changed) most other 'general' work that happens in Unreal Engine. CPU 1 runs the Render thread (RT) and CPU 2 runs what is called the RHI (Rendering Hardware Interface) thread. The RHI thread is used to abstract away aspects of the graphics device and offload some of the rendering work to other threads, and is a feature only found in console builds of Unreal. It runs more or less in lockstep with the RT, so does not introduce further latency – for the purposes of the diagrams in part 1 of this blog, it is in the same block as the Render thread.
At the moment this performance capture shows us being 'Game thread bound', so that's the place we want to get some performance improvements that will actually have an effect on the frame rate. So, time to have a look at the Game thread in a bit more detail:
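A small sketch shows why being bound by one thread matters. It assumes a simplified model in which the threads overlap, so the frame takes roughly as long as the slowest of them. Only the Game thread figure of 78.5ms comes from this capture; the Render and RHI thread costs below are invented purely for illustration, as are the function names.

```python
def frame_time_ms(thread_times_ms):
    """Simplified model: the frame takes as long as the slowest thread."""
    return max(thread_times_ms.values())

def bounding_thread(thread_times_ms):
    """Return the name of the thread that limits the frame rate."""
    return max(thread_times_ms, key=thread_times_ms.get)

# Only the Game thread number is taken from the capture. The other two are
# hypothetical placeholders.
frame = {"Game thread": 78.5, "Render thread": 40.0, "RHI thread": 35.0}

print(bounding_thread(frame), frame_time_ms(frame))

# Saving 10 ms on a thread that is not the bound does not change the frame.
faster_render = dict(frame, **{"Render thread": 30.0})
print(frame_time_ms(faster_render))

# Saving the same 10 ms on the bounding thread does.
faster_game = dict(frame, **{"Game thread": 68.5})
print(frame_time_ms(faster_game))
```

Under this model, work saved on the Render or RHI thread only starts to pay off once the Game thread is no longer the slowest of the three.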
The labels here are adapted from UE4 and mean the following:
•    NetTick. You can clearly see a green block of networking work where the game is receiving messages from the server, with blue blocks underneath it as we run game logic on receipt of those messages.
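The way the physics-related phases described in this post overlap with the simulation can be sketched as a toy model. This is not how UE4 schedules its work: the function names, the single worker thread and the 20ms sleep standing in for the simulation are all assumptions. NetTick, Redraw and Wwise are left out because their position in the frame is not stated in the text.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulate_physics(step_seconds):
    """Stand-in for the physics simulation running on a worker thread."""
    time.sleep(step_seconds)
    return "simulated"

def run_frame(pool, physics_seconds=0.02):
    phases = []
    phases.append("PrePhysics")       # tick actors and their components
    phases.append("Physics")          # publish positions, start the simulation
    pending = pool.submit(simulate_physics, physics_seconds)
    phases.append("During Physics")   # work that physics cannot affect
    wait_start = time.perf_counter()
    pending.result()                  # the gap: blocked until physics finishes
    gap_ms = (time.perf_counter() - wait_start) * 1000.0
    phases.append("End Physics")      # update physically driven actors
    phases.append("PostPhysics")      # late updates, e.g. characters on ships
    return phases, gap_ms

with ThreadPoolExecutor(max_workers=1) as pool:
    phases, gap_ms = run_frame(pool)

print(phases)
print(f"waited {gap_ms:.1f} ms for physics")
```

In this model the gap shrinks as more useful work is moved into the During Physics phase, which is why a visible gap in a capture is a hint that the Game thread is idling while it waits.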

A Game of Numbers
In the diagram above, you will notice that each high-level area has a number next to it, which is the approximate time taken to run that code in milliseconds. I'll save you the trouble of adding them up: the total is a whopping 78.5ms. Our target for Sea of Thieves is 33ms per frame, which would give us a framerate of 30fps. At 78.5ms per frame, our crews would be feeling more than a little seasick with a framerate of just under 13fps.
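The arithmetic behind those figures is easy to check. The snippet below is just a worked calculation using the two numbers quoted above; the function and variable names are made up for the example.

```python
def fps_from_frame_time(frame_ms):
    """Frames per second for a given frame time in milliseconds."""
    return 1000.0 / frame_ms

measured_ms = 78.5  # total Game thread time from the capture
target_ms = 33.0    # target frame time

measured_fps = fps_from_frame_time(measured_ms)
target_fps = fps_from_frame_time(target_ms)
over_budget_ms = measured_ms - target_ms
over_budget_ratio = measured_ms / target_ms

print(f"measured: {measured_fps:.1f} fps")           # measured: 12.7 fps
print(f"target:   {target_fps:.1f} fps")             # target:   30.3 fps
print(f"to remove: {over_budget_ms:.1f} ms")         # to remove: 45.5 ms
print(f"over budget by {over_budget_ratio:.2f}x")    # over budget by 2.38x
```

In other words, the frame needs to lose about 45.5ms, or well over half of its current cost, before it fits the target.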
Looking at the capture, the good news is that... nope. No good news. Well, one bit of good news: as everything is too expensive, we can tackle anything and see some improvement – but it makes sense to start with the largest block of work. In the next entry in the series we'll dive deeper into the End Physics section above and see what's going on inside a chunk of work that's taking nearly two thirds of our total target frame time.
