CPU Tests: Simulation

Simulation and Science overlap a lot in the benchmarking world, however for this distinction we separate them into two segments based mostly on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output – the tests in our Simulation section act more like synthetics, but at some level are still trying to simulate a given environment.

DigiCortex v1.35: Link

DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.

Results are given as a multiple of real-time: anything above a value of one means the system can run the simulation in real time. The benchmark offers a 'no firing synapse' mode, which in essence stresses DRAM and bus speed, however we take the firing mode, which adds CPU work with every firing.

The software originally shipped with a benchmark that recorded the first few cycles and output a result. So while on fast multi-threaded processors the benchmark lasted less than a few seconds, slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in 'off mode', giving an artificially high result over the first few cycles because none of the nodes are yet active. We found that performance settles into a steady state after a while (when the model is actively in use), so we asked the author to allow for a 'warm-up' phase and for the benchmark to report the average over a subsequent sample period.

For our test, we give the benchmark 20,000 cycles to warm up and then take the data over the next 10,000 cycles for the test – on a modern processor this takes 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.
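For illustration, here is a minimal Python sketch of that measurement protocol. DigiCortex does not expose an API that we call here; run_sample is a hypothetical stand-in for launching the benchmark and reading back the real-time multiple for one measured window:

    # Minimal sketch of the measurement protocol above, not DigiCortex's
    # actual interface. run_sample() is a hypothetical stand-in.
    from statistics import mean

    WARMUP_CYCLES = 20_000   # cycles run unmeasured, to reach steady state
    SAMPLE_CYCLES = 10_000   # cycles measured per run
    MIN_RUNS = 10            # repeat at least this many times
    DISCARD = 3              # reject the first three results

    def run_sample(warmup: int, cycles: int) -> float:
        # Hypothetical: launch DigiCortex, skip `warmup` cycles, time the
        # next `cycles`, and return simulated_time / wall_clock_time
        # (a value above 1.0 means the CPU is real-time capable).
        return 1.23  # placeholder value for illustration

    results = [run_sample(WARMUP_CYCLES, SAMPLE_CYCLES) for _ in range(MIN_RUNS)]
    score = mean(results[DISCARD:])  # average of the kept runs only
    print(f"DigiCortex real-time multiple: {score:.2f}x")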

(3-1) DigiCortex 1.35 (32k Neuron, 1.8B Synapse)

DigiCortex seems to fall into layers of performance, and the Core i7-5775C, with DDR3-1600, comes very close to the Core i7-6700K with DDR4-2133.

Dwarf Fortress 0.44.12: Link

Another long-standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006, still regularly updated today, and aiming for a Steam launch sometime in the future.

Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.

For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:

  • Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
  • Medium, a 129x129 world with 550 years, 10 civilizations and 4 megabeasts
  • Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts

DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test as many times as possible in 10 minutes, the medium test as many times as possible in 30 minutes, and the large test as many times as possible in an hour, as sketched below.
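A minimal sketch of that fixed-time-budget loop, assuming a hypothetical dfmark_run() wrapper that launches one DFMark world-generation pass and returns the completion time it reports (this is not DFMark's actual interface):

    import time
    from statistics import mean

    BUDGET_S = {"small": 600, "medium": 1800, "large": 3600}  # 10/30/60 min

    def dfmark_run(size: str) -> float:
        # Hypothetical: run one world generation at the given size and
        # return DFMark's reported time for that pass, in seconds.
        time.sleep(0.01)   # placeholder for the real multi-minute run
        return 180.0       # placeholder reported time, in seconds

    def loop_for_budget(size: str) -> float:
        """Repeat the test until the wall-clock budget is spent, then
        report the mean per-pass time over all completed passes."""
        deadline = time.monotonic() + BUDGET_S[size]
        times = [dfmark_run(size)]            # always complete one pass
        while time.monotonic() < deadline:
            times.append(dfmark_run(size))
        return mean(times)

    # e.g. avg_small = loop_for_budget("small")   # ~10 minutes of passes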

(3-2a) Dwarf Fortress 0.44.12 World Gen 65x65, 250 Yr
(3-2b) Dwarf Fortress 0.44.12 World Gen 129x129, 550 Yr
(3-2c) Dwarf Fortress 0.44.12 World Gen 257x257, 550 Yr

Here's where we start to see some of the benefits of having lower-latency eDRAM out to 128 MB. That larger cache pushes both Broadwell parts very near to modern CPUs, putting all the older models down the list. This is something AMD's APUs aren't particularly good at, due to the very limited L3 cache in play.

Dolphin v5.0 Emulation: Link

Many emulators are often bound by single-thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy for the speed of Dolphin's CPU emulation, which is an intensive single-core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.
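Since the result is a time and the Wii's own 1051-second run provides a natural baseline, a lower score maps directly to a speedup over real hardware. A quick sketch, using a made-up 300-second example result:

    # Lower is better here, so a convenient way to read the chart is as a
    # speedup over the Wii's own baseline run of the same test.
    WII_BASELINE_S = 1051  # the Wii itself completes the test in 1051 s

    def speedup_vs_wii(cpu_seconds: float) -> float:
        """How many times faster than real Wii hardware a CPU emulates it."""
        return WII_BASELINE_S / cpu_seconds

    # A hypothetical 300 s result would be ~3.5x faster than the Wii:
    print(f"{speedup_vs_wii(300):.1f}x")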

(3-3) Dolphin 5.0 Render Test

Unfortunately Dolphin isn't a fan of the eDRAM versions.

Comments

  • krowes - Monday, November 2, 2020 - link

    CL22 memory for the Ryzen setup? Makes absolutely no sense.
  • Ian Cutress - Tuesday, November 3, 2020 - link

    That's JEDEC standard.
  • Khenglish - Monday, November 2, 2020 - link

    Was anyone else bothered by the fact that Intel's highest performing single thread CPU is the 1185G7, which is only accessible in 28W tiny BGA laptops?

    Also the 128MB eDRAM cache does seem to give on average a 10% improvement over the eDRAM-less 4790S at the same TDP. I would love to see eDRAM on more CPUs. It's so rare to need more than 8 cores. I'd rather have 8 cores with eDRAM than 16+ cores and no eDRAM.
  • ichaya - Monday, November 2, 2020 - link

    There's definitely a cost trade-off involved, but with an I/O die since Zen 2, it seems like AMD could just spin up a different I/O die, and justify the cost easily by selling to HEDT/Workstation/DC.
  • Notmyusualid - Wednesday, November 4, 2020 - link

    Chalk me up as 'bothered'.
  • zodiacfml - Monday, November 2, 2020 - link

    Yeah, but Intel has been about squeezing the last dollar out of its products for a couple of years now.
  • Endymio - Monday, November 2, 2020 - link

    CPU register -> 3 levels of cache -> eDRAM -> DRAM -> Optane -> SSD -> Hard Drive.

    The human brain gets by with 2 levels of storage. I really don't feel that computers should require 9. The entire approach needs rethinking.
  • Tomatotech - Tuesday, November 3, 2020 - link

    You remember everything without writing down anything? You remarkable person.

    The rest of us rely on written materials, textbooks, reference libraries, wikipedia, and the internet to remember stuff. If you jot down all the levels of hierarchical storage available to the average degree-educated person, it's probably somewhere around 9 too depending on how you count it.

    Not everything you need to find out is on the internet or in books either. Data storage and retrieval also includes things like having to ask your brother for Aunt Jenny's number so you can ring Aunt Jenny and ask her some detail about early family life, and of course Aunt Jenny will tell you to go and ring Uncle Jonny, but she doesn't have Jonny's number, wait a moment while she asks Max for it and so on.
  • eastcoast_pete - Tuesday, November 3, 2020 - link

    You realize that the closer the cache is to actual processor speed, the more demanding the manufacturing gets and the more die area it eats. That's why there aren't any (consumer) CPUs with 1 or more MB of L1 Cache. Also, as Tomatotech wrote, we humans use mnemonic assists all the time, so the analogy short-term/long-term memory is incomplete. Writing and even drawing was invented to allow for longer-term storage and easier distribution of information. Lastly, at least IMO, it boils down to cost vs. benefit/performance as to how many levels of memory storage are best, and depends on the usage scenario.
  • Oxford Guy - Monday, November 2, 2020 - link

    Peter Bright of Ars in 2015:

    "Intel’s Skylake lineup is robbing us of the performance king we deserve. The one Skylake processor I want is the one that Intel isn't selling.

    In games the performance was remarkable. The 65W 3.3-3.7GHz i7-5775C beat the 91W 4-4.2GHz Skylake i7-6700K. The Skylake processor has a higher clock speed, it has a higher power budget, and its improved core means that it executes more instructions per cycle, but that enormous L4 cache meant that the Broadwell could offset its disadvantages and then some. In CPU-bound games such as Project Cars and Civilization: Beyond Earth, the older chip managed to pull ahead of its newer successor.

    In memory-intensive workloads, such as some games and scientific applications, the cache is better than 21 percent more clock speed and 40 percent more power. That's the kind of gain that doesn't come along very often in our dismal post-Moore's law world.

    Those 5775C results tantalized us with the prospect of a comparable Skylake part. Pair that ginormous cache with Intel's latest-and-greatest core and raise the speed limit on the clock speed by giving it a 90-odd W power envelope, and one can't help but imagine that the result would be a fine processor for gaming and workstations alike. But imagine is all we can do because Intel isn't releasing such a chip. There won't be socketed, desktop-oriented eDRAM parts because, well, who knows why.

    Intel could have had a Skylake processor that was exciting to gamers and anyone else with performance-critical workloads. For the right task, that extra memory can do the work of a 20 percent overclock, without running anything out of spec. It would have been the must-have part for enthusiasts everywhere. And I'm tremendously disappointed that the company isn't going to make it."

    In addition to Bright's comments, I remember AnandTech's article that showed the 5675C beating or equalling the 5775C in one or more gaming tests, apparently largely due to throttling caused by Intel's decision to hobble Broadwell with such a low TDP.
