In a river, when you remove a rock, the water flows in a new way, following the new path of least resistance. Computer programs are the same.
Removing a bottleneck or a hotspot causes cascades and ripples across the entire performance picture. In other words, every change affects the flow of the program. Your old measurements, and therefore your analysis, may no longer be valid.
After a successful frame rate increase, it’s time to begin the process again. You may need to create a new benchmark, but you can always repeat the detection process and begin your search for the new, slowest part of your game.
Hotspots and Bottlenecks
When you optimize systems made of multiple concurrent elements, you’re dealing with a digital assembly line. Data flows from CPU to GPU, and memory accesses are fulfilled while you process. When everything flows smoothly, the system is harmonious. But if one piece must work too hard and others have to wait, there’s a discord.
Hotspots, in the context of optimization, are places in the code that consume more than their fair share of execution time, memory bandwidth, or function calls. Typically, a hotspot is small in code footprint but large in performance impact.
When optimizing, you want to find the hottest parts of your program and change them so they run cooler. Then other parts of the code become the new hottest part, and you can optimize them in turn.
Any system with multiple elements, such as a computer with multiple cores and a GPU, along with deep pipelines and multiple subunits within those processors, has a bottleneck somewhere. As optimizers, our goal is to minimize its impact on performance.
There are three important trade-offs to consider when you are optimizing: performance versus storage, accuracy versus speed, and development time versus complexity.
The first and most important is performance versus storage. It’s often possible to trade calculation time against memory usage. A classic example is memoization, a term coined by Donald Michie in 1968 to describe a technique where you cache recently calculated results to avoid recomputing them. Memoization replaces an expensive calculation with a quick memory read. You use more memory, but save CPU time.
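The performance-versus-storage trade can be sketched in a few lines. This is a minimal illustration, not code from the text; the falloff function and its formula are invented for the example:

```cpp
#include <cmath>
#include <unordered_map>

// Hypothetical expensive calculation, stands in for any pure function
// that is costly enough to be worth caching.
float ExpensiveFalloff(int distance) {
    return std::exp(-0.1f * static_cast<float>(distance));
}

float MemoizedFalloff(int distance) {
    // The cache persists across calls: memory spent to save CPU time.
    static std::unordered_map<int, float> cache;
    auto it = cache.find(distance);
    if (it != cache.end()) {
        return it->second;  // cache hit: a quick memory read
    }
    float result = ExpensiveFalloff(distance);  // cache miss: compute once
    cache[distance] = result;
    return result;
}
```

Note that memoization only pays off when inputs repeat often and the cached function is pure; an unbounded cache like this one also needs an eviction policy in a long-running game.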
The second trade-off is accuracy versus speed. Sometimes you can reduce precision in order to increase performance. This trade-off plays a role in the data you render, too. You can use lower-polygon meshes for objects that consume less screen space. A lower-detail mesh isn’t as accurate as the highest-resolution LOD, but it consumes fewer resources. You can also reduce texture resolution.
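A small, classic instance of trading accuracy for speed is approximating a transcendental function with a lookup table. This sketch is illustrative only; the table size and names are assumptions, not from the text:

```cpp
#include <cmath>

// Approximate sine with a 256-entry lookup table. Less accurate than
// std::sin (error up to roughly one table step, ~0.025 here), but the
// runtime cost is a single table read.
constexpr int kTableSize = 256;
constexpr float kTwoPi = 6.2831853f;
static float g_sinTable[kTableSize];

void InitSinTable() {
    for (int i = 0; i < kTableSize; ++i) {
        g_sinTable[i] = std::sin(kTwoPi * i / kTableSize);
    }
}

float FastSin(float radians) {
    // Map the angle to a table index, wrapping into [0, kTableSize).
    int index = static_cast<int>(radians / kTwoPi * kTableSize);
    index = ((index % kTableSize) + kTableSize) % kTableSize;
    return g_sinTable[index];
}
```

Note that this also demonstrates the first trade-off: the table spends 1 KB of memory to buy that speed.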
The final trade-off is development time versus complexity. Frequently, there are changes that will yield a speedup but impact the maintainability of code. Adding a complex image-based caching system to your renderer might speed things up, but getting it to work properly in all cases might be a nightmare.
As you optimize, these trade-offs give you several important angles from which to attack the problem. The most elegant solutions will give you exactly what you want without having to make any trade-offs. But when performance is your goal, understanding your arsenal is the key to winning the war.
Levels of Optimization
When searching for opportunities to improve performance, the answer to what you optimize may lie in the level at which you optimize. Optimization can be categorized into three levels: system, algorithmic, and micro.
The system level of optimization focuses on the resources of your machine and their use in your game. At the system level, you’re looking at broad elements of the computer, and you’re concerned with three things: Utilization, Balancing, and Efficiency.
[fruitful_tabs type="default" width="100%" fit="false"]
[fruitful_tab title="Utilization"] Describes how much of a resource’s time is spent doing productive work. [/fruitful_tab]
[fruitful_tab title="Balancing"] Moving work from a resource that is overutilized to one that is idle. For instance, as GPUs grew in power, games had to switch from carefully considering whether each triangle would be visible to sending large batches of geometry. GPUs became so fast that it’s now often cheaper to let the GPU determine visibility by drawing geometry than it is to cull individual triangles. [/fruitful_tab]
[fruitful_tab title="Efficiency"] Each part of the computer excels at a specific task. Part of system-level optimization is making sure that each task is done with the right resource. Runtime data structures should be kept in RAM, not on mass storage. Massively parallel operations like rendering should be done on the GPU, not the CPU. Something that takes 50% of frame time on the CPU might take only 1% on the GPU. [/fruitful_tab]
[/fruitful_tabs]
It’s easy to write a program that will use 100% of a system’s resources, but it’s unlikely that this alone will translate into a compelling game. You have to put your money on the screen: if resources aren’t being spent on making a better game, they are wasted. The purpose of a game is to be fun and sell lots of units, not to maximize CPU usage.
The ideal scenario is a system that is completely balanced, fully utilized, and highly efficient.
Algorithmic-level optimizations are the choices you make in the algorithms and data structures that make up your application. They focus on removing work by using better, and typically more complex, algorithms.
One of the best ways to analyze a program at the algorithmic level is to use a tool that will show you the call hierarchy of your code. A call hierarchy will trace what lines of code call others, building a tree rooted at the main loop of your program. Additionally, it will tell you how much time was spent in each part of code, as well as how many times it was run. By looking at this information, you can quickly identify subroutines that are taking the most time and target them for optimization.
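The bookkeeping a call-hierarchy tool performs can be sketched with a scoped timer that records, per labeled scope, the time spent and the number of calls. This is a minimal sketch with invented names; real profilers build the full call tree rooted at your main loop for you:

```cpp
#include <chrono>
#include <map>
#include <string>

// Per-scope statistics: total time and how many times the scope ran.
struct ScopeStats { long long microseconds = 0; int calls = 0; };
static std::map<std::string, ScopeStats> g_stats;

class ScopedTimer {
public:
    explicit ScopedTimer(const std::string& label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        auto& s = g_stats[label_];
        s.microseconds += std::chrono::duration_cast<
            std::chrono::microseconds>(elapsed).count();
        s.calls += 1;  // one more invocation of this scope
    }
private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical subsystem function instrumented with a timer.
void UpdatePhysics() {
    ScopedTimer t("UpdatePhysics");
    // ... simulation work would go here ...
}
```

After a run, sorting `g_stats` by time gives exactly the ranking the text describes: the subroutines taking the most time, and how often each one ran.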
Algorithmic-level optimizations are crucial. A good algorithmic change can trump the best micro-level work and be more robust than system-level changes. Removing work is an optimization that benefits all hardware.
The stereotypical optimization work happens at the micro level: poring over assembly listings, checking instruction timings and pipeline stages, using obscure hardware features to gain small wins. Micro optimizations are line-by-line optimizations. Concepts such as branch prediction, loop unrolling, instruction throughput, and latency are all considerations when optimizing code line by line.
Micro optimizations can give big wins for inner loops, which are small sections of code that run many times. Suppose you ran a routine every frame that sets each pixel on the screen to a color. The line of code inside the loop accounts for 99% of execution time, so finding a way to speed up that one line could yield a big boost in performance.
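The pixel-fill scenario above can be sketched as follows. The routine names are invented for illustration; the point is that handing the hot line to `std::fill` lets the compiler and standard library vectorize it or lower it to a wide memory set:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Naive clear: the single store inside the loop dominates frame time.
void ClearNaive(std::vector<uint32_t>& pixels, uint32_t color) {
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        pixels[i] = color;  // the "99% of execution time" line
    }
}

// Micro-optimized clear: same result, but std::fill gives the
// implementation license to use SIMD stores or memset-style fills.
void ClearFast(std::vector<uint32_t>& pixels, uint32_t color) {
    std::fill(pixels.begin(), pixels.end(), color);
}
```

Both routines produce identical output; only the speed of the inner loop changes, which is exactly what makes this a micro optimization rather than an algorithmic one.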
Micro optimizations can be hard to maintain, because they often rely on obscure knowledge and specific compiler or hardware behavior. It isn’t uncommon for a game still under development to run flawlessly on one machine and crash on another. The critical difference may be drivers, hardware, or both.
Micro optimizations are more reliable on fixed platforms such as consoles. Be careful when working at the micro level on a PC game: taking advantage of the latest and greatest feature might result in a game that runs only on the latest and greatest hardware.