Cache Simulator Design Documentation

Introduction
Architecture Overview
Core Components
Utility Components
Algorithms & Policies
Implementation Details
Testing Strategy
Future Extensions

Introduction

The Cache Simulator is a comprehensive simulation tool for studying memory hierarchy behavior in computer systems. It models a two-level cache hierarchy with configurable parameters and advanced features like prefetching, cache coherence, and detailed statistics tracking. The simulator is designed to be highly customizable, allowing researchers and students to experiment with different cache configurations and analyze their performance.

The simulator takes as input a memory access trace that describes a sequence of read and write operations with their corresponding memory addresses. It then simulates how these accesses would flow through the cache hierarchy, tracking hits, misses, replacements, and other events. Various statistics are collected during the simulation, which can be used to evaluate the effectiveness of different cache designs and policies.

This document describes the design of the Cache Simulator, including its architecture, components, algorithms, and implementation details.

Architecture Overview

The Cache Simulator employs a modular, layered architecture that separates core cache functionality from utilities and user interface components:

┌──────────────────────────────────────────────────┐
│                   Applications                   │
│ (Main Program, Tests, Trace Generator, Scripts)  │
└───────────────────────┬──────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────┐
│               Memory Hierarchy API               │
│     (MemoryHierarchy, Configuration, Stats)      │
└───────────────────────┬──────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────┐
│                Core Components                   │
│  (Cache, StreamBuffer, Prefetchers, MESI, etc.)  │
└───────────────────────┬──────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────┐
│             Utility Components                   │
│   (Trace Parsing, Logging, Visualization, etc.)  │
└──────────────────────────────────────────────────┘

The design follows several key architectural principles:

Separation of Concerns: Each component has a focused responsibility.
Modularity: Components can be developed, tested, and replaced independently.
Extensibility: New policies, algorithms, and features can be easily added.
Configurability: Extensive runtime customization options are available.
Modern C++ Design: Leveraging C++20 features for safer, more expressive code.

Core Components

CacheBlock & CacheSet

The foundational data structures in the Cache Simulator are CacheBlock and CacheSet:

CacheBlock represents a single block in the cache, containing:
- Valid and dirty flags
- Tag bits for address matching
- MESI state for cache coherence
- Optional data storage (for functional simulation)
CacheSet represents a set of blocks in a set-associative cache:
- Vector of CacheBlocks
- Replacement policy managed via pluggable ReplacementPolicyBase instances

These structures are designed to be memory-efficient while still providing all necessary state information for accurate simulation.

Cache

The Cache class is a central component representing a single cache level (L1 or L2). It manages:

Cache organization (size, associativity, block size)
Block lookup and placement
Replacement policy implementation
Prefetching integration
Hit/miss statistics collection
MESI protocol state transitions

The Cache class provides a clean access() method that handles all the complex operations of a cache access while maintaining detailed statistics.

MemoryHierarchy

The MemoryHierarchy class orchestrates the entire cache hierarchy, managing:

Multiple cache levels (L1, L2, L3) - L3 support added in v1.4.0
Memory access flow through the hierarchy
Prefetching configuration and coordination
Statistics aggregation
Trace file processing
Performance metrics calculation

This class serves as the main API for applications interacting with the simulator, providing methods to access memory, process traces, and retrieve statistics.

L3 Cache Support

The MemoryHierarchy now supports an optional L3 cache with:

Configurable size, associativity, and block size
Inclusive or non-inclusive policy
Statistics tracking (hit rate, miss rate)
Proper access flow: L1 miss → L2 access → L2 miss → L3 access

Coherence Protocol Framework

Abstract CoherenceProtocolBase class with implementations:

MSIProtocol: 3-state (Modified, Shared, Invalid)
MESIProtocol: 4-state with Exclusive
MOESIProtocol: 5-state with Owned for dirty sharing

Interconnect Topologies

The InterconnectFactory supports:

Bus, Crossbar, Mesh (existing)
RingInterconnect: Bidirectional ring with wrap-around
TorusInterconnect: 2D torus with minimal path routing

StreamBuffer

The StreamBuffer class implements a simple hardware prefetching mechanism:

Prefetches sequential blocks following a cache miss
Maintains a buffer of prefetched addresses
Tracks hits in the buffer
Shifts out consumed entries
Provides statistics on buffer effectiveness

Stream buffers are a simple but effective prefetching technique that works well for sequential access patterns.

StridePredictor

The StridePredictor class implements a more sophisticated prefetching mechanism:

Detects constant stride patterns in memory accesses
Maintains a table of recent addresses and strides
Predicts future accesses based on detected strides
Provides confidence levels for predictions
Tracks prediction accuracy statistics

Stride prediction is effective for regular access patterns with non-unit strides, such as array traversals in multidimensional arrays.

AdaptivePrefetcher

The AdaptivePrefetcher class implements an advanced prefetching mechanism:

Dynamically selects between prefetching strategies
Adjusts prefetch distance based on effectiveness
Tracks different metrics to guide adaptation
Supports multiple concurrently active strategies
Maintains detailed effectiveness statistics

Adaptive prefetching can deliver good performance across a wider range of access patterns compared to fixed-strategy prefetchers.

MESI Protocol

The MESIProtocol class implements the Modified-Exclusive-Shared-Invalid cache coherence protocol:

Manages state transitions for cache blocks
Handles local reads and writes
Processes remote cache operations
Coordinates writebacks for modified blocks
Tracks protocol state transition statistics

The MESI protocol ensures coherent memory views in systems with multiple caches accessing shared memory.

Power Model

The PowerModel class provides CACTI-inspired power and energy analysis:

Dynamic energy calculation (read/write per access)
Static leakage power with temperature scaling
Technology node support (7nm, 14nm, 22nm, 32nm, 45nm)
Energy-Delay Product (EDP) computation
Detailed power breakdown reporting

The power model enables analysis of energy efficiency across different cache configurations.

Area Model

The AreaModel class provides silicon area estimation:

Component-level area breakdown (data array, tag array, decoders, sense amps)
Technology-specific scaling factors
Cell efficiency metrics
Aspect ratio estimation

Area modeling helps evaluate the physical implementation costs of cache designs.

Utility Components

TraceParser

The TraceParser class handles trace file input:

Parses different trace file formats
Validates memory access operations
Provides detailed error information
Offers both single-operation and bulk parsing
Uses efficient, streaming file reading

This component decouples trace file handling from the core simulation logic.

Statistics

The Statistics class manages simulation metrics:

Collects and organizes various numerical metrics
Calculates derived statistics
Provides data export capabilities
Supports metric comparison between runs
Offers customizable reporting formats

This utility enables detailed analysis of simulation results.

Visualization

The Visualization class provides text-based visualization:

Generates ASCII visualizations of cache state
Renders histograms and heatmaps of memory access patterns
Supports colorized output where available
Exports data for external visualization tools
Provides progress indicators for long simulations

These visualizations aid in understanding simulator behavior.

Configuration

The ConfigManager class manages configuration parameters:

Loads configurations from files (JSON, INI)
Parses command-line arguments
Validates configuration parameters
Generates parameter sweeps for benchmarking
Supports configuration comparison

This utility simplifies experimentation with different simulator settings.

Logging

The Logger class provides diagnostic information:

Supports multiple log levels
Offers both file and console output
Provides thread-safe logging
Allows component-specific named loggers
Enables selectively verbose output

Logging helps diagnose issues and understand simulator behavior.

Benchmarking

The Benchmark class supports performance measurement:

Times the execution of simulation functions
Compares multiple configurations
Calculates statistical measures of performance
Provides formatted benchmark reports
Supports repeated runs for variance analysis

This utility is useful for simulator development and optimization.

Memory Profiler

The MemoryProfiler class analyzes memory access patterns:

Tracks memory access frequencies
Detects common access patterns
Profiles locality characteristics
Identifies hot regions in memory
Provides insights for optimization

This tool helps understand workload behavior and aids in prefetcher design.

Algorithms & Policies

Replacement Policies

The simulator implements several cache block replacement policies:

LRU (Least Recently Used): Replaces the block that has not been accessed for the longest time. This is the default policy and is implemented using an ordering vector in each cache set.
Pseudo-LRU: A tree-based approximation of LRU that requires less state information but still provides good performance.
Random: Randomly selects a block for replacement, which can be effective for certain workloads and eliminates pathological cases that affect deterministic policies.
FIFO (First-In-First-Out): Replaces the oldest block regardless of access frequency.
NRU (Not Recently Used): A simple approximation of LRU using a single bit per block to indicate recent usage.

The policy can be configured on a per-cache basis.

Prefetching Algorithms

The simulator supports several prefetching algorithms:

Sequential Prefetching: Prefetches a configurable number of sequential blocks after a miss. Implemented by the StreamBuffer class.
Stride Prefetching: Detects regular strides in access patterns and prefetches accordingly. Implemented by the StridePredictor class.
Adaptive Prefetching: Dynamically adjusts prefetching strategy and aggressiveness based on effectiveness. Implemented by the AdaptivePrefetcher class.
Next-N-Line Prefetching: A simple variation that prefetches the next N blocks on any miss.
PC-based Prefetching: Uses program counter information (if available in the trace) to predict access patterns specific to different code regions.

These algorithms can be combined in various ways and customized with different parameters.

Cache Coherence

The MESI protocol implements cache coherence:

Modified (M): The block is present only in the current cache and has been modified. It must be written back to memory when evicted.
Exclusive (E): The block is present only in the current cache and matches memory.
Shared (S): The block may be present in other caches and is clean.
Invalid (I): The block is not present or valid in the cache.

The protocol handles transitions between these states based on local and remote operations, ensuring coherent memory views across caches.

Implementation Details

C++20 Features

The simulator leverages modern C++20 features:

std::optional: Used for representing potentially missing values, such as nullable parameters and results.
std::variant: Used for type-safe unions, particularly in return values that can represent different outcomes.
std::string_view: Used for non-owning references to strings, improving performance in text processing.
std::filesystem: Used for portable file operations in trace handling and result saving.
Structured Bindings: Used to simplify code that returns multiple values, like tag and set index extraction.
if constexpr: Used for compile-time conditional code, improving performance in template functions.
Concepts and Constraints: Used for clearer template interfaces and better error messages.
[[nodiscard]]: Used to ensure return values are properly handled for functions that shouldn't be called for side effects.
Ranges Library: Used for cleaner iteration and transformation of collections.
Three-way Comparison (spaceship operator): Simplifies comparison operations.

These features improve code safety, clarity, and performance.

Performance Optimizations

Several optimizations are employed:

Memory Pool Allocation: Cache blocks are allocated from a memory pool to reduce dynamic allocation overhead.
Bit Manipulation: Fast bit operations are used for address decomposition and tag comparison.
Vectorized Processing: Common operations are vectorized when processing multiple blocks.
Cache-Friendly Data Structures: Data is organized to maximize spatial locality.
Custom Hash Functions: Optimized hash functions for address lookup.
String Interning: Reuse of common strings to reduce memory usage.
Streaming I/O: Efficient trace file reading without loading entire files.
Compiler Hints: Strategic use of likely/unlikely hints for branch prediction.

These optimizations significantly improve simulation speed, especially for large traces.

Code Organization

The code is organized following modern C++ practices:

Namespace Usage: All components are within the cachesim namespace to avoid pollution.
Header-Only When Appropriate: Small utilities are implemented as header-only for simplicity.
Separation of Interface and Implementation: Larger components separate declaration and definition.
PIMPL Pattern: Used for components with complex internal details to improve compilation speed.
Consistent Naming: Camel case for class names, snake case for methods and variables.
Const Correctness: Rigorous use of const for parameters and methods.
Forward Declarations: Used to minimize include dependencies.
Explicit Memory Management: Clear ownership semantics with smart pointers.
Modular Design: Each component has a single responsibility.
Documentation: Comprehensive comments for all public interfaces.

This organization enhances maintainability, readability, and compile times.

Testing Strategy

The simulator employs a multi-level testing approach:

Unit Tests: Verify individual components in isolation.
- Test each cache operation separately
- Verify replacement policy behavior
- Validate prefetcher predictions
- Check MESI state transitions
Integration Tests: Verify interactions between components.
- Test L1-L2 interactions
- Verify prefetcher-cache coordination
- Check coherence protocol across caches
Validation Tests: Verify the simulator against known outcomes.
- Test with hand-crafted traces with predictable results
- Compare against simplified analytical models
- Validate against academic benchmark results
Property-Based Tests: Verify invariants and properties.
- Ensure cache coherence is maintained
- Verify inclusion properties when applicable
- Check conservation of blocks
Performance Tests: Measure and optimize performance.
- Benchmark with large traces
- Profile memory usage
- Analyze scalability with increasing parameters

This comprehensive testing strategy ensures correctness and robustness.

Future Extensions

Planned extensions include:

Non-Inclusive Policies: Support for non-inclusive and exclusive cache hierarchies.
Advanced Prefetching Algorithms: Implementation of more sophisticated prefetchers like GHB and Markov predictors.
GPU Cache Modeling: Extensions for modeling GPU-specific cache architectures.
NUMA Topology: Modeling non-uniform memory access patterns.
Graphical Interface: Development of a GUI for visualization and configuration.
Dynamic Trace Generation: Integration with instruction-level simulators.
Machine Learning Integration: Using ML techniques for prefetching and replacement.
Network-on-Chip Modeling: Detailed NoC simulation for many-core systems.

These extensions will enhance the simulator's capabilities and keep it relevant for future research.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache Simulator Design Documentation

Table of Contents

Introduction

Architecture Overview

Core Components

CacheBlock & CacheSet

Cache

MemoryHierarchy

L3 Cache Support

Coherence Protocol Framework

Interconnect Topologies

StreamBuffer

StridePredictor

AdaptivePrefetcher

MESI Protocol

Power Model

Area Model

Utility Components

TraceParser

Statistics

Visualization

Configuration

Logging

Benchmarking

Memory Profiler

Algorithms & Policies

Replacement Policies

Prefetching Algorithms

Cache Coherence

Implementation Details

C++20 Features

Performance Optimizations

Code Organization

Testing Strategy

Future Extensions

FilesExpand file tree

architecture.md

Latest commit

History

architecture.md

File metadata and controls

Cache Simulator Design Documentation

Table of Contents

Introduction

Architecture Overview

Core Components

CacheBlock & CacheSet

Cache

MemoryHierarchy

L3 Cache Support

Coherence Protocol Framework

Interconnect Topologies

StreamBuffer

StridePredictor

AdaptivePrefetcher

MESI Protocol

Power Model

Area Model

Utility Components

TraceParser

Statistics

Visualization

Configuration

Logging

Benchmarking

Memory Profiler

Algorithms & Policies

Replacement Policies

Prefetching Algorithms

Cache Coherence

Implementation Details

C++20 Features

Performance Optimizations

Code Organization

Testing Strategy

Future Extensions