Background
New State Mobile is a battle royale game developed by Krafton. It was released in November 2021 and reached 45 million downloads in its first month. Krafton, Inc. is a collective of independent game development studios that create new and engaging experiences for players around the world.
PUBG Studios, Bluehole Studio, Striking Distance Studios, RisingWings, Dreamotion, and Unknown Worlds are all part of the company, each with its own specialties. New State Mobile was built with Unreal Engine 4, and the team made several attempts to limit the heat and battery drain caused by the high GPU load that comes with its distinctive gameplay features.
Because players can engage in long-range combat, the engine must render scenes out to a great distance. In addition, the battleground contains a large amount of vegetation, and overdraw from that vegetation has a significant impact on performance.
As a result, the team turned to Android GPU Inspector (AGI) to optimise the game’s GPU utilisation and eliminate bottlenecks.
What they did
The New State Mobile team used AGI to access a wide range of GPU counter data and improve the game’s GPU usage. AGI’s GPU activity profiling data helped them identify unnecessary render passes.
After determining which parts of the frame were consuming the most GPU time and memory bandwidth, they went back and forth between the GPU counters and the GPU activity view to verify that each optimization was moving in the right direction.
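The back-and-forth verification described above amounts to comparing counter values from a capture taken before a change with one taken after it. The snippet below is a minimal sketch of that comparison, assuming the counter averages have already been copied out of AGI by hand. The function name, the counter names, and the numbers are all hypothetical placeholders, not measurements from the game and not part of any AGI API.

```python
def counter_deltas(before, after):
    """Return {counter: (absolute_change, percent_change)} for counters present in both captures."""
    deltas = {}
    for name in before.keys() & after.keys():
        base = before[name]
        change = after[name] - base
        # Percent change is undefined when the baseline is zero.
        percent = (change / base * 100.0) if base else float("nan")
        deltas[name] = (change, percent)
    return deltas


# Placeholder values for illustration only (hypothetical counter names).
before = {"gpu_utilization_pct": 80.0, "read_total_mb_s": 2000.0}
after = {"gpu_utilization_pct": 76.0, "read_total_mb_s": 1800.0}

for name, (change, percent) in sorted(counter_deltas(before, after).items()):
    print(f"{name}: {change:+.1f} ({percent:+.1f}%)")
```

A negative change on utilisation and bandwidth counters suggests the optimization is heading in the right direction; a positive one is a cue to revisit the change.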
Here are a few things they learned about the game’s performance using AGI:
- Base pass optimization: A depth prepass, a technique that improves the use of Early-Z, reduced the amount of fragment shading. The depth prepass was applied mainly to LOD0, which takes up most of the screen space, to limit the load that the additional draw calls might cause. Using a 32-bit SceneColor format can also improve overall render pass speed. Unreal Engine 4’s default SceneColor format is FloatRGBA, which is a 64-bit format, so switching to a 32-bit format can cut the memory bandwidth used by scene color in half.
- Impact measured: GPU utilisation fell by 7.5 percent after enabling the depth prepass. With the depth prepass, more fragments can be handled by Early-Z, and the time spent shading fragments fell by 2%. The 32-bit SceneColor format lowered GPU utilisation by 5.3 percent. Shaders Busy fell by 2%, and total GPU reads from system memory dropped by 330 MB/s. GPU writes to system memory dropped by 78 MB/s, and reads from texture memory dropped by 43 MB/s.
- Shadow pass optimization: When meshes are used as shadow casters, a high-polygon LOD makes little difference to shadow quality, so a low-polygon LOD is preferred because it reduces the triangle count. In Unreal Engine 4, the console command ‘ForceLODShadow’ may be used to enable the low-polygon LOD for shadows.
- Impact measured: Around 120,000 fewer triangles were used for shadows. According to AGI’s GPU counters, GPU utilisation fell by around 2%, GPU reads from system memory fell by 130 MB/s, and GPU writes to system memory fell by roughly 23 MB/s.
- Auto-instancing: Auto-instancing merges identical draw commands at runtime and renders them all at once, which helps both the shadow pass and the base pass. As a result, New State Mobile was able to apply global lighting to specific objects without sacrificing performance. Unreal Engine 4 includes auto-instancing as a standard feature.
- Impact measured: The number of draw calls fell by 500, a reduction of around 48%, and GPU utilisation fell by around 3.5 percent. These measurements were taken using OpenGL.
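Two of the items above lend themselves to a quick back-of-the-envelope check: the halving of scene color bandwidth when moving from a 64-bit to a 32-bit format, and the way merging identical draws reduces the draw call count. The sketch below illustrates both under loud assumptions: the 1920x1080 resolution and the draw list are hypothetical, and grouping by a (mesh, material) pair is a simplification for illustration, not Unreal Engine 4's actual merge criteria.

```python
from collections import Counter


def scene_color_bytes(width, height, bits_per_pixel):
    """Bytes touched by one full-screen read or write of the scene color target."""
    return width * height * bits_per_pixel // 8


def instanced_draw_calls(draws):
    """Merge draws that share the same (mesh, material) key.

    Returns (merged_call_count, instances_per_key).
    """
    instances = Counter(draws)
    return len(instances), instances


# Hypothetical resolution; the article does not state one.
WIDTH, HEIGHT = 1920, 1080
bytes_64 = scene_color_bytes(WIDTH, HEIGHT, 64)  # FloatRGBA, 16 bits per channel
bytes_32 = scene_color_bytes(WIDTH, HEIGHT, 32)  # any 32-bit format
print(f"64-bit: {bytes_64} bytes per pass, 32-bit: {bytes_32} bytes per pass")
print(f"ratio: {bytes_32 / bytes_64}")

# Hypothetical draw list: each entry is a (mesh, material) pair.
draws = [("tree_a", "bark"), ("tree_a", "bark"), ("rock", "stone"), ("tree_a", "bark")]
calls, instances = instanced_draw_calls(draws)
print(f"{len(draws)} draws -> {calls} instanced calls")
```

The byte count per pass scales linearly with bits per pixel, which is why the format change halves scene color traffic regardless of the resolution chosen.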
Results
New State Mobile cut its GPU usage by 22% with the help of AGI. Depth prepass and shadow pass optimization reduced GPU usage by 19% and 3%, respectively. The number of draw calls and the total amount of system memory read and written by the GPU also decreased significantly.