3DMark's Steel Nomad: The Ultimate Benchmarking Tool for Gamers and Overclockers


Summary

This article looks at the features and benefits of 3DMark's Steel Nomad as a benchmarking tool for gamers and overclockers, and at how its results can inform hardware choices.

Key Points:

  • Steel Nomad benchmarks provide cross-platform comparisons, helping gamers make informed hardware decisions across different operating systems and devices.
  • The AMD 7900 XTX is evaluated as a cost-effective option for 4K gaming, delivering high-quality experiences without breaking the bank.
  • High bandwidth requirements of Steel Nomad in 3DMark tests underscore the need for ample memory bandwidth to achieve optimal performance in advanced gaming scenarios.

Steel Nomad stands out by offering thorough cross-platform comparisons, showcasing budget-friendly options like the AMD 7900 XTX for 4K gaming, and emphasizing high memory bandwidth needs for peak performance.

Cross-Platform Gaming Benchmarks: Unlocking Informed Hardware Decisions

Steel Nomad offers a more realistic and precise set of performance metrics for contemporary rasterized games, enabling gamers and technology enthusiasts to make well-informed hardware decisions. This benchmarking tool's cross-platform compatibility facilitates direct comparisons across systems with varying operating systems and hardware configurations. Such comprehensive assessments are invaluable for evaluating the performance of diverse devices, including not only desktop environments but also mobile and embedded systems. By providing detailed insights into system capabilities, Steel Nomad empowers users to optimize their gaming experiences and ensure their equipment meets the demands of modern software applications.

AMD's 7900 XTX: A Viable Option for 4K Gaming at Budget-Friendly Prices

The 7900 XTX's remarkable performance, as observed in my testing, indicates it could be a more viable solution for achieving 60fps at 4K resolution compared to the 3080 Ti, despite being priced lower. This becomes even more compelling when factoring in the capabilities of modern upscalers that can enhance framerates at these higher resolutions.

Moreover, PCGH's extensive evaluation of 40 GPUs offers a broad perspective on the performance landscape of these graphics cards, including the 7900 XTX. The consistency between my findings and those of PCGH—despite different hardware setups—underscores the dependability and reproducibility of synthetic benchmarks in assessing GPU performance.

I primarily evaluated Steel Nomad Light on the same laptop I previously mentioned in this blog. I also extended my tests to various other GPUs, including the one in my smartphone.

The main focus of this evaluation is the laptop's discrete GPU, the entry-level RTX 3050 Mobile, which is a reasonably up-to-date option in today's market. This GPU has just 6GB of VRAM, exactly meeting the requirement set by Steel Nomad Light. Unfortunately, it struggles significantly in the standard 1440p test. At 1080p it barely surpasses the 60fps threshold and frequently dips into the low 50s. At 720p, however, it averages around 102fps. With DLSS in Quality mode, it could potentially upscale from there to 1080p with adequate performance. Despite its limitations, the RTX 3050 Mobile remains viable for a locked 60fps in rasterized AAA games through upscaling.
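The link between the 720p result and a 1080p target can be made concrete with a little arithmetic. The sketch below assumes the commonly documented Quality-mode scale factor of 1.5x per axis for DLSS and FSR; the function name is hypothetical, and the real cost of the upscaling pass itself is not modeled.

```python
def internal_resolution(out_w: int, out_h: int, scale_per_axis: float) -> tuple[int, int]:
    """Return the resolution actually rendered before upscaling to (out_w, out_h)."""
    return round(out_w / scale_per_axis), round(out_h / scale_per_axis)


# Assumption: Quality mode renders at 1/1.5 of the output size on each axis.
QUALITY_SCALE = 1.5

w, h = internal_resolution(1920, 1080, QUALITY_SCALE)
print(w, h)  # 1280 720

# Share of output pixels that are actually shaded each frame.
pixel_share = (w * h) / (1920 * 1080)
print(f"{pixel_share:.2f}")  # 0.44
```

Under that assumption, a 1080p Quality-mode output is rendered internally at 720p, which is why the 720p score is the relevant one for judging upscaled 1080p play on this GPU.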

Exceptional Graphics Performance

The integrated GPU of the 7840HS demonstrates a remarkable capability to run modern games at playable frame rates. Utilizing Quality-mode upscaling and maintaining a locked target of 30fps, it handles even resource-intensive AAA titles efficiently. This performance is particularly impressive when considering games that demand high frame rates, such as competitive shooters and graphically demanding titles.

Moreover, the scalability of Steel Nomad Light on higher-end PC GPUs shows that it can make effective use of additional graphics power. For instance, with an Nvidia 3080 Ti, the test reaches 224fps at 1080p. This indicates that Steel Nomad Light keeps scaling to very high frame rates on top-tier hardware rather than flattening out early.

I'm feeling quite frustrated because my Light test results fall significantly short of PCGH's results on similar GPUs. For instance, in the standard 1440p test, my 7900 XTX reached only 214fps, while PCGH's same GPU reached 317fps, a 48% difference. Even my 3080 Ti couldn't keep up, recording just 155fps against 191fps for PCGH's slightly less powerful 3080 10GB, a gap of about 23%. Part of this discrepancy can be attributed to platform differences: PCGH runs its tests on a system powered by an Intel i9-13900K CPU. Although CPU bottlenecking shouldn't be an issue (my Light test shows a mere 17% CPU usage), their Intel setup is equipped with DDR5 RAM clocked at 7600MT/s. In contrast, my own test systems are a Ryzen 9 7900X paired with DDR5 at 6000MT/s for the Radeon card and a Ryzen 9 5900X with DDR4 at 3600MT/s for the GeForce card. Evidently, memory bandwidth plays a crucial role when aiming for very high FPS in these benchmarks.
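For readers who want to check the numbers, here is a minimal sketch of the two calculations behind this paragraph: the relative gap between two frame rates, and the theoretical peak bandwidth of each memory configuration. The function names are hypothetical, and the bandwidth figures assume a dual-channel setup with a 64-bit (8-byte) bus per channel, which the test descriptions above do not state explicitly.

```python
def gap_percent(reference_fps: float, measured_fps: float) -> float:
    """How much faster the reference result is, as a percentage of the measured one."""
    return (reference_fps / measured_fps - 1.0) * 100.0


def peak_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak system memory bandwidth in GB/s (assumed dual-channel, 64-bit)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


print(f"{gap_percent(317, 214):.1f}")  # 48.1  (7900 XTX, PCGH vs. mine)
print(f"{gap_percent(191, 155):.1f}")  # 23.2  (3080 10GB vs. my 3080 Ti)

for label, rate in [("DDR5-7600", 7600), ("DDR5-6000", 6000), ("DDR4-3600", 3600)]:
    print(label, f"{peak_bandwidth_gbs(rate):.1f} GB/s")
# DDR5-7600 121.6 GB/s
# DDR5-6000 96.0 GB/s
# DDR4-3600 57.6 GB/s
```

These are upper bounds only; sustained bandwidth is lower and depends on timings and the memory controller, so the figures illustrate the size of the platform difference rather than prove it is the cause.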

Steel Nomad Light Demands High Bandwidth in 3DMark Tests

Steel Nomad Light, although it stays within a 6GB VRAM limit, demands significantly more bandwidth than previous 3DMark tests. This is evident from the drastic performance drop observed in dGPU Hybrid mode. The severe framerate limit in this mode shows how bandwidth-sensitive the test is, a consequence of its reliance on complex effects and computations. The same demand widens the performance gap between the iGPU and the dGPU in Hybrid mode, and illustrates how advanced graphical workloads can strain system resources in ways that older benchmarks did not.

Dedicated GPUs and the Evolution of Mobile Gaming: A Shift Towards High-Performance Experiences

The significant performance improvement of the 3050 GPU over the integrated graphics unit (iGPU) in gaming tests suggests that the iGPU is constrained by memory bandwidth or other factors that did not affect simpler benchmarks like Time Spy. This observation highlights the critical role of dedicated GPUs in delivering superior gaming experiences, especially when handling resource-intensive tasks.

Moreover, Steel Nomad's support for Vulkan, Metal, and ARM architectures, coupled with the increasing availability of full-featured GPUs, points to a growing trend toward high-performance gaming on mobile devices such as tablets, phones, and handheld consoles. This shift indicates a broader industry movement towards enhancing mobile gaming capabilities to meet rising consumer expectations for quality and performance.

In my household, I have several other devices: an iPhone 13 Pro Max with 6GB of RAM, an iPad Air 4th Gen equipped with 4GB of RAM, and a range of MacBooks from x86 models to the latest M2 versions. Unfortunately, the macOS Steel Nomad ports are not yet available. While the iOS ports are ready, even the lighter version of Steel Nomad couldn't run on any of my Apple devices due to their limited RAM capacity. Thanks a lot, Apple! The mobile Light test requires at least 8GB of unified RAM; for PCs, the requirement is just 6GB of VRAM.

On the other hand, my Pixel 6 Pro boasts a hefty 12GB of RAM, which is what you'd expect from a true flagship phone released in 2021. This gives it a significant performance edge over my iPhone when it comes to frame rates—a clear victory for my Android device!

Steel Nomad: Scalable Rendering for Impressive Visuals

Steel Nomad's rendering techniques achieve impressive visual quality, using a diverse array of effects such as post-processing, particle systems, and volumetrics. These features work across a wide range of hardware configurations. The engine's scalability is further demonstrated by its Light variant, which reduces the resolution and the bloom calculation and omits volumetric lights. The result is only minimal visual degradation, showing that the engine adapts well to different hardware capabilities.

The documentation cautions that the engine is highly multithreaded when issuing command lists, which could lead to CPU bottlenecks in unbalanced builds. However, with the latest CPUs, the load is minimal; my 12-core 7900X saw only about 8% total load. In comparison, on the same system, Time Spy Extreme pushes CPU usage to around 17%, and even DX11 Fire Strike Ultra reaches approximately 12%, despite significant differences in asset detail and rendering quality. So, are these docs misleading, or does this new test leverage modern low-level GPU APIs much more effectively?

To examine a worst-case scenario, I measured CPU usage during the Light test at 720p on my laptop with a 7840HS. This mobile processor, with a TDP ranging from 35 to 54 watts, has eight Zen 4 cores, so it is certainly not an entry-level part. CPU load hovered around 22%.

Similarly, when running the Light test on my 7900 XTX and pushing it past 200fps, the kind of frame rate targeted in competitive multiplayer, the CPU load reached about 17%. UL's current top result for Steel Nomad Light stands at 557fps using an RTX 4090. Meanwhile, PCGH shared a video showing their RTX 4090 reaching roughly 300fps in the same test, with their system at a 25% load on a Core i9-13900KS (using only its performance cores). The discrepancy might be due to monitoring and capture activities.

It's plausible that those top-tier scores of over 550fps would demand at least around 40% of the same CPU's resources.
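That estimate can be sanity-checked with a rough extrapolation. The sketch below assumes CPU load grows linearly with frame rate on the same CPU, which is a simplification (it ignores fixed overhead and the monitoring and capture load mentioned above); the function name is hypothetical.

```python
def extrapolate_cpu_load(load_pct: float, fps_measured: float, fps_target: float) -> float:
    """Scale an observed CPU load to a target frame rate, assuming linear growth."""
    return load_pct * fps_target / fps_measured


# PCGH's observation: about 25% load at roughly 300fps on the Core i9-13900KS.
estimate = extrapolate_cpu_load(25.0, 300.0, 557.0)
print(f"{estimate:.1f}")  # 46.4
```

Under that assumption the 557fps result would sit in the mid-40s percent range, which is consistent with the "at least around 40%" guess above.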

Asynchronous Compute and Compute Shaders: Paving the Way for Future Game Development

**Asynchronous Compute and Compute Shaders: A Crucial Element**

The extensive use of asynchronous compute and compute shaders in Steel Nomad is emblematic of a broader trend within modern game engines and advanced technology frameworks. These techniques facilitate enhanced parallelization and optimization of graphics workloads, leading to substantial performance improvements. As next-gen engines like UE5, Snowdrop, and Northlight increasingly incorporate asynchronous compute and compute shaders, their significance for the future of game development becomes even more evident.

**Work Graphs for DirectX 12 and Vulkan: The Next Frontier**

Work Graphs for DirectX 12 and Vulkan represent an evolutionary leap in graphics programming, offering the promise of superior multi-threading capabilities. Although their widespread implementation in commercial games might still be on the horizon, GPUs that perform well with Steel Nomad are likely to shine in future titles that leverage these advanced methodologies. By tackling the complexities inherent in multi-threaded architecture and utilizing cutting-edge technologies, game developers can elevate visual fidelity and performance, crafting immersive experiences that captivate players.

Steel Nomad's engine is equipped with a functional Temporal Anti-Aliasing (TAA) system, though it only handles anti-aliasing without any upscaling capabilities.

3DMark offers specific benchmarks for DLSS, XeSS, and FSR technologies, but these are separate tests that make cross-comparison of scores invalid. It would be ideal to conduct identical tests with any upscaling technology, including the Frame Generation features of both DLSS and FSR.

With DirectX 12's DirectSR on the horizon, there's hope that 3DMark will eventually be updated to support testing with all DirectSR-compatible technologies across modern benchmarks like Speed Way or Steel Nomad.

In most of my tests, Vulkan generally outperforms DX12. However, there are exceptions, such as the Light test on the 7840HS iGPU and several scores from the 3080 Ti. This aligns with findings from PCGH and other sources, which suggest that this benchmark tends to favor DX12 for NVIDIA GPUs and Vulkan for AMD GPUs. That said, the performance difference is usually minimal, typically between 0.5% and 2%.

Notably, there are some anomalies in Nvidia's results with the full Steel Nomad test. For instance, the RTX 3050 Mobile shows a significant +13% advantage using Vulkan, whereas the desktop 3080 Ti performs better with DX12 by a margin of +5%. These discrepancies are puzzling given that both GPUs share the same Ampere architecture and driver versions.

VRAM Limitations and Persistent SystemInfo Issues in GPU Performance Analysis

Looking at resource requirements and VRAM limits also yields some insight into what modern GPUs can handle. Notably, while the full test calls for 8GB of memory, it runs successfully on an RTX 3050 with only 6GB of VRAM. This observation underscores Vulkan's ability to mitigate overhead, potentially offsetting the reduced bandwidth in NVIDIA's drivers.

Furthermore, persistent issues with the SystemInfo component in 3DMark continue to affect its reliability. Despite numerous updates, problems such as initialization failures, incorrect metric collection, zombie processes, and omission of metrics remain prevalent. These flaws severely impact the efficiency of conducting extensive testing sequences by causing high failure rates and unnecessary time consumption.

I've voiced my frustrations numerous times, both here and on Twitter, about the issues with game-based benchmarks, particularly when there's no built-in test. These benchmarks are excellent for evaluating specific games (for example, determining which GPU performs best in a particular title), but they fall short when it comes to assessing overall hardware performance. For that purpose, synthetic benchmarks are more reliable, provided you have one that is (a) highly accurate and (b) reflective of the average performance across various games of interest.

For instance, when running pure-raster games at 4K Max settings on my stock 7900 XTX, I use a built-in benchmark for Cyberpunk 2077 and a low-effort method of "standing still in a busy scene" for other titles to gauge performance.

Comparing Performance Metrics for Accurate Assessment

When examining the performance of Steel Nomad, one can observe that its 68 fps aligns well with other mainstream games, suggesting a baseline consistency in optimization. In contrast, the notably lower performance of Immortals of Aveum points to poor optimization rather than hardware limitations. This comparison suggests that Steel Nomad's performance metrics reflect genuine capability rather than artificially enhanced results.

Additionally, it's important to consider the nuances in performance scaling indices derived from game-based benchmarks. For higher-end GPUs, these indices often fall short in accuracy when compared to Steel Nomad's scores. The discrepancy arises because as GPU strength increases and more sophisticated benchmarks are employed, bandwidth becomes a critical factor influencing overall performance. Hence, while traditional scaling indices provide a general sense of performance trends, they may not fully capture the capabilities unlocked by superior hardware configurations combined with optimized software environments.

Raster Rendering Performance Analysis for AAA Games

- Steel Nomad offers a detailed evaluation of raster rendering performance, covering an extensive array of graphical techniques utilized in modern AAA games. By concentrating exclusively on rasterization, it provides a more granular and dependable analysis compared to benchmarks that also include ray tracing.

- Despite the growing popularity of ray tracing over recent years, raster rendering continues to be prevalent in many current AAA titles. Steel Nomad's specific focus on rasterization enables a more accurate and illustrative assessment of graphics card performance within this particular area.

Game-Based Benchmarks: Enhancing Accuracy and Realism

Game-based benchmarks offer a unique advantage in identifying specific performance issues that synthetic benchmarks might miss. They can be particularly useful for gamers dealing with performance problems in certain games or genres. By focusing on real-world gameplay, these benchmarks take into account game-specific optimizations and other factors that significantly impact performance.

Moreover, game-based benchmarks provide valuable insights when comparing the performance of different GPUs for specific games. This can be incredibly helpful for gamers trying to choose the best GPU tailored to their needs. Unlike synthetic benchmarks, which often generalize hardware capabilities, game-based tests reflect actual gaming scenarios more accurately, guiding users towards more informed purchasing decisions.

In essence, incorporating game-based benchmarking methods not only enhances the assessment accuracy but also aligns better with practical gaming experiences. Such an approach ensures that both casual and competitive gamers get a clearer picture of how different hardware configurations will perform under real-world conditions.

J.D.
