Trace-based evaluation of CPU cache usage in Renode

This blog post was originally published by Antmicro.

Although cache modeling is usually not part of ISS-level simulation, there are cases where it’s crucial to understand memory access patterns. Examples include building a new chip and deciding on cache size and layout, or working on low-level, time-critical firmware that requires precise cache management. Antmicro’s open source Renode simulation framework is often used for architectural exploration thanks to its broad ISA support, and it already includes advanced execution tracing options. Building on these, we’ve expanded its capabilities with trace-based cache usage evaluation. Renode’s execution tracing data gives detailed insights into cache behavior, such as cache hits, misses and the overall hit ratio. This in turn enables precise analysis of how different cache configurations impact system performance, and helps identify bottlenecks and opportunities for optimization.

In this article we introduce Antmicro’s solution for profiling CPU cache usage in Renode, a new addition to our portfolio of trace-based analysis features. It was developed partly within the scope of the European Union’s TRISTAN project, which focuses on open and reusable IP and tooling for RISC-V software and hardware development. Since the mechanism is generic, it can be used with any architecture, such as ARM or RISC-V, during the architectural exploration and early prototyping phase.


Cache usage evaluation in simulation

CPU cache requires careful management to correctly determine what data should be stored across its multiple levels. Its design is inherently a tradeoff between area and complexity on the one hand and performance on the other. Frequent cache misses can heavily impact overall system performance, depending on the hardware configuration, the size of the cache and the memory access patterns of the running software.
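A common textbook approximation, the average memory access time (AMAT), puts the effect of misses in numbers. It states that every access pays the hit time, and a missed access additionally pays the miss penalty. The minimal Python sketch below uses hypothetical cycle counts (a 1-cycle hit and a 50-cycle miss penalty) and is not output from Renode.

```python
def average_memory_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Single-level textbook model: AMAT = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit, 50-cycle penalty for going to main memory.
for miss_rate in (0.02, 0.10, 0.30):
    amat = average_memory_access_time(1.0, miss_rate, 50.0)
    print(f"miss rate {miss_rate:.0%}: {amat:.1f} cycles per access")
```

With these assumed numbers, raising the miss rate from 2% to 30% makes the average access eight times slower (2 versus 16 cycles). A real system with multiple cache levels and overlapping accesses behaves less simply.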

Cache operations are transparent from the perspective of software: the code stays the same regardless of cache usage, yet the effect of the cache can be very significant. This makes it difficult to reason about cache behavior without actually running the software. Running it on real hardware provides the best “model” of cache usage, because it’s the actual behavior. However, that requires the hardware to be fully developed, which is not the case in pre-silicon scenarios. It also relies on cache counters that must be supported by the hardware, and it may require you to modify your software, which in turn affects cache behavior.

In simulation you can analyze cache behavior regardless of the hardware platform, without changing your software or using a debugger. This approach is especially useful for smaller CPUs that have no hardware support for cache analysis. Using Renode in the pre-silicon stage also lets you perform this kind of analysis early and tweak your cache parameters to meet performance requirements. You can track the execution of the software to find exactly which parts are slowed down by uncached memory accesses, and how. This enables a hardware/software co-design paradigm: you can track the performance of the system across the entire product lifecycle and fine-tune both sides of the equation to squeeze the most out of the area you have to fit in.

When developing the cache analysis solution for Renode, we leveraged the extensive execution tracing features already present in our simulator, including:

  • Execution tracer: this subsystem monitors and saves traces of all major CPU operations, including program flow, memory accesses and I/O operations.
  • Execution metrics: a module that measures quantitative data about the simulation, including the number of accesses to peripherals, the number of exceptions and the number of executed instructions (including counts of specific opcodes).
  • Execution profiler: a call stack analysis tool intended for debugging and inspecting the software running on a simulated CPU.
  • Python hooks: the framework utilizes a built-in Python API to provide an easy entry point to automate testing, extend the simulator’s functionalities, and integrate seamlessly with other tools and workflows.

Implementation details

Renode’s cache analysis works post-mortem. The execution tracing subsystem records memory accesses to a trace.log file, and this file is then passed to the cache modeling analyzer, which simulates CPU cache behavior and generates usage statistics.
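Because the analysis is post-mortem, the cache model only needs a recorded sequence of accesses, split into instruction fetches and data accesses. The initial implementation models L1 instruction and data caches. The minimal sketch below assumes a hypothetical, simplified text format with one access per line: `F` for an instruction fetch, or `R`/`W` for a data read or write, followed by a hex address. This format is an assumption for illustration and is not meant to reproduce Renode’s actual trace.log format; refer to the project’s documentation for that. The function names are also hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record kinds for this sketch only (not Renode's trace.log syntax).
INSTRUCTION_KINDS = {"F"}   # instruction fetch
DATA_KINDS = {"R", "W"}     # data read / data write


def parse_trace(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield ("I" or "D", address) pairs from lines like 'R 0x80001000'."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kind, _, addr_text = line.partition(" ")
        address = int(addr_text.strip(), 16)  # base 16 accepts an optional 0x prefix
        if kind in INSTRUCTION_KINDS:
            yield "I", address
        elif kind in DATA_KINDS:
            yield "D", address
        else:
            raise ValueError(f"line {line_no}: unknown access kind {kind!r}")


def split_streams(lines: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Separate accesses for the L1 instruction and data cache models."""
    streams = {"I": [], "D": []}
    for kind, address in parse_trace(lines):
        streams[kind].append(address)
    return streams["I"], streams["D"]


recorded = """\
# kind address
F 0x80000000
R 0x80001000
F 0x80000004
W 0x80001004
"""
instr, data = split_streams(recorded.splitlines())
print([hex(a) for a in instr])  # ['0x80000000', '0x80000004']
print([hex(a) for a in data])   # ['0x80001000', '0x80001004']
```

Since the trace is recorded once, the same streams can be replayed offline against many cache configurations without rerunning the simulation. This is what makes the approach convenient for architectural exploration.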

Cache configuration in Renode is derived from the following inputs:

  • cache and memory size
  • cache block size
  • cache associativity (k-way associative cache, direct mapping, fully associative)
  • replacement policy: the line eviction policy that will be used by cache, for example: FIFO, LRU, LFU or Random.
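The minimal Python sketch below shows how these inputs fit together in a set-associative cache model. It is an illustration under our own assumptions, not the code of Renode’s analyzer. Each address is split into a block number, a set index and a tag by integer division. Setting `ways=1` gives a direct-mapped cache, and setting `ways` to the number of lines gives a fully associative one. Only the LRU, FIFO and Random policies are implemented (LFU is omitted for brevity). Memory size is not needed here because Python integers can hold any address. The class `CacheModel` and its parameters are hypothetical.

```python
import random
from collections import OrderedDict


class CacheModel:
    """Minimal set-associative cache model (illustrative sketch, hypothetical API)."""

    POLICIES = ("LRU", "FIFO", "RANDOM")

    def __init__(self, cache_size: int, block_size: int, ways: int,
                 policy: str = "LRU", seed: int = 0) -> None:
        if min(cache_size, block_size, ways) <= 0:
            raise ValueError("sizes and associativity must be positive")
        if cache_size % (block_size * ways):
            raise ValueError("cache_size must be a multiple of block_size * ways")
        self.policy = policy.upper()
        if self.policy not in self.POLICIES:
            raise ValueError(f"unsupported policy: {policy}")
        self.block_size = block_size
        self.ways = ways  # 1 = direct mapped; number of lines = fully associative
        self.num_sets = cache_size // (block_size * ways)
        self.rng = random.Random(seed)  # only used by the RANDOM policy
        # One OrderedDict per set, keyed by tag. The front entry is the least
        # recently used line (LRU) or the oldest inserted line (FIFO).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate one access; return True on a hit, False on a miss."""
        block = address // self.block_size   # drop the offset within the block
        index = block % self.num_sets        # which set the block maps to
        tag = block // self.num_sets         # identifies the block within that set
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            if self.policy == "LRU":
                lines.move_to_end(tag)       # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:          # set is full: evict a victim
            if self.policy == "RANDOM":
                del lines[self.rng.choice(list(lines))]
            else:
                lines.popitem(last=False)    # least recently used (LRU) or oldest (FIFO)
        lines[tag] = None
        return False


# 4 KiB cache with 64-byte blocks; addresses 0x0000 and 0x1000 are 4 KiB apart.
direct = CacheModel(cache_size=4096, block_size=64, ways=1)
two_way = CacheModel(cache_size=4096, block_size=64, ways=2)
for address in [0x0000, 0x1000] * 4:
    direct.access(address)
    two_way.access(address)
print(direct.hits, direct.misses)    # 0 8
print(two_way.hits, two_way.misses)  # 6 2
```

In the direct-mapped configuration, the two addresses map to the same set and keep evicting each other, so every access misses. With two ways, both blocks fit in that set and only the first two accesses miss. This is the kind of difference a trace-based comparison of cache configurations exposes.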

There are two ways to configure the cache model:

  • Command line interface arguments: this method allows users to specify detailed configuration options directly through the command line.
  • Presets: this method loads a predefined set of configuration parameters using a preset name.
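The two configuration styles can be sketched with Python’s standard `argparse` module. Everything specific here is hypothetical: the flag names, the preset names and their values are illustrative and are not the analyzer’s real arguments or built-in presets, which are listed in the project’s README. As a design choice of this sketch only, explicit arguments override values loaded from a preset. The source does not say whether Renode’s analyzer allows combining the two.

```python
import argparse

# Hypothetical presets: names and values are illustrative only.
PRESETS = {
    "small-direct": {"cache_size": 4096, "block_size": 32, "ways": 1, "policy": "LRU"},
    "l1-32k-8way": {"cache_size": 32768, "block_size": 64, "ways": 8, "policy": "LRU"},
}
FIELDS = ("cache_size", "block_size", "ways", "policy")


def parse_cache_config(argv):
    """Build a cache configuration from a preset and/or explicit arguments."""
    parser = argparse.ArgumentParser(description="Cache model configuration (sketch)")
    parser.add_argument("--preset", choices=sorted(PRESETS))
    parser.add_argument("--cache-size", type=int)   # bytes
    parser.add_argument("--block-size", type=int)   # bytes
    parser.add_argument("--ways", type=int)         # associativity
    parser.add_argument("--policy", choices=["FIFO", "LRU", "LFU", "RANDOM"])
    args = parser.parse_args(argv)

    # Start from the preset (if any), then let explicit arguments override it.
    config = dict(PRESETS[args.preset]) if args.preset else {}
    for key in FIELDS:
        value = getattr(args, key)
        if value is not None:
            config[key] = value
    missing = [key for key in FIELDS if key not in config]
    if missing:
        parser.error(f"missing configuration: {', '.join(missing)}")  # exits with status 2
    return config


print(parse_cache_config(["--preset", "l1-32k-8way", "--ways", "4"]))
print(parse_cache_config(["--cache-size", "1024", "--block-size", "16",
                          "--ways", "1", "--policy", "FIFO"]))
```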

The output of the cache modeling analyzer contains information about cache hits and misses. A high hit rate means that the cache serves most accesses effectively. A low hit rate means that many accesses miss the cache and must go to slower memory, which can impact performance.
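The hit ratio is hits divided by total accesses (hits plus misses), and the miss ratio is its complement. The small sketch below computes both for separate L1 instruction and data caches. The `CacheStats` class, the `report` helper and the counts are hypothetical; they do not reflect the analyzer’s actual output format or any measured results.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hit/miss counters for one cache (hypothetical helper)."""
    hits: int = 0
    misses: int = 0

    @property
    def accesses(self) -> int:
        return self.hits + self.misses

    @property
    def hit_ratio(self) -> float:
        # hits / total accesses; defined as 0.0 when there were no accesses
        return self.hits / self.accesses if self.accesses else 0.0

    @property
    def miss_ratio(self) -> float:
        return 1.0 - self.hit_ratio if self.accesses else 0.0


def report(name: str, stats: CacheStats) -> str:
    return (f"{name}: {stats.accesses} accesses, {stats.hits} hits, "
            f"{stats.misses} misses, hit ratio {stats.hit_ratio:.2%}")


# Hypothetical counts for separate L1 instruction and data caches.
icache = CacheStats(hits=9_500, misses=500)
dcache = CacheStats(hits=3_000, misses=1_000)
print(report("L1I", icache))
print(report("L1D", dcache))
```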

For more information about cache analysis in Renode, refer to the project’s README and documentation.

Architectural exploration and early prototyping with Renode

The initial implementation supports Level 1 instruction and data caches. Future plans include support for multi-level (L2, L3) and multi-core caches. Since the cache modeling analyzer is written in Python, it could also be easily extended to work with the pyrenode3 library.


Technical Conference 2024

The Event

The event took place over two days. The first was open and free for the public, while the second was planned to be reserved for project members.
It was a unique opportunity for students, professors and engineers from all R&D disciplines to get in touch with experts and representatives from the most important European universities and semiconductor companies engaged in the ever-growing adoption of RISC-V cores.

The Program

This schedule offered an overview of the sessions, keynotes, and workshops planned to provide valuable insights and networking opportunities.


TRISTAN Technical Conference 2024 Announced

The TRISTAN Project Board is excited to introduce the TRISTAN Project Technical Conference 2024, hosted by Graz University of Technology with the support of NXP Semiconductors Austria.

The conference will be held in Graz on September 11-12, 2024.

Day 1 of the conference is open and free.

Our ambition is to provide engineering students and RISC-V newbies with a good introduction to the growing technology ecosystem built on our beloved ISA (we like to call it the "Rising Tide"). All morning keynotes and RISC-V industrial application demos, as well as the whole afternoon training program, will be broadcast online.

Day 2 will be limited to TRISTAN Consortium members.

By registering online through this form, you'll get updates on the agenda as well as links to the online broadcast channel. Your participation in the conference, on site or online, will greatly contribute to the success of TRISTAN's dissemination mission!

We're looking forward to seeing you in Graz!

Tiberio Fanti


TRISTAN at RISC-V Summit Europe 2024


The TRISTAN Project will be present with its own booth at the upcoming RISC-V Summit Europe 2024 in Munich.

The booth is co-sponsored by semify, VLSI, MINRES, Siemens, Synthara and ST. There you can meet our representatives and learn about the project, as well as the contributions our sponsors are making to it.

Besides our sponsors, a few more TRISTAN partners will present posters during the event.



TRISTAN Initial requirements and feedback for processor and hardware IPs

The D1.1 deliverable presents the requirements for processors and hardware IPs and describes the activities carried out in Task 1.2 during the first three months. It defines the concept of a requirement, illustrates the process adopted for requirements elicitation, reports the set of requirements collected through this process and provides some statistics on them. The requirements will describe the relationships between the concrete results of the work packages and the objectives they satisfy. An update of this report will be available in mid-2024.



The TRISTAN project, nr. 101095947 is supported by Chips Joint Undertaking (CHIPS-JU) and its members Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Germany, Denmark, Estonia, Greece, Spain, Finland, France, Hungary, Ireland, Israel, Iceland, Italy, Lithuania, Luxembourg, Latvia, Malta, Netherlands, Norway, Poland, Portugal, Romania, Sweden, Slovenia, Slovakia, Turkey.

© TRISTAN. All rights reserved.