128 changes: 128 additions & 0 deletions build_tools/github_actions/configs/README.md
@@ -0,0 +1,128 @@
# Test Framework

Automated benchmark testing framework for ROCm libraries with system detection, results collection, and performance tracking.

## Features

- Automated benchmark execution (rocFFT, rocRAND, rocSOLVER, hipBLASLt)
- Hardware, OS, GPU, and ROCm auto-detection
- Local storage (JSON) and API upload
- LKG (Last Known Good) comparison
- File rotation and configurable logging
- Modular, extensible architecture


## Configuration

```bash
# Required environment variables
export THEROCK_BIN_DIR=/path/to/rocm/bin
export ARTIFACT_RUN_ID=$WORKFLOW_RUN_ID
export AMDGPU_FAMILIES=gfx950-dcgpu
```
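A benchmark script can fail fast when these variables are missing instead of erroring mid-run. The helper below is illustrative only and not part of the framework:

```python
import os

# Illustrative helper (not part of the framework): report which required
# environment variables are unset or empty so a script can fail fast.
REQUIRED_VARS = ("THEROCK_BIN_DIR", "ARTIFACT_RUN_ID", "AMDGPU_FAMILIES")

def missing_env_vars(required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

A script could call `missing_env_vars()` at startup and exit with a clear message listing whatever is missing.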

## Run Benchmarks

```bash
python3 build_tools/github_actions/test_executable_scripts/test_rocfft_benchmark.py
```

## Project Structure

```
build_tools/github_actions/
├── configs/
│   ├── config.yml               # Main configuration
│   └── benchmarks/              # Benchmark configs
├── test_executable_scripts/     # Benchmark scripts
│   ├── test_rocfft_benchmark.py
│   ├── test_rocrand_benchmark.py
│   ├── test_rocsolver_benchmark.py
│   └── test_hipblaslt_benchmark.py
└── utils/                       # Framework utilities
    ├── test_client.py           # Main client API
    ├── config/                  # Configuration management
    ├── system/                  # System detection
    └── results/                 # Results handling
```

## Config setup

### Main Config: `configs/config.yml`

```yaml
Config:
  Core:
    LogLevel: INFO          # DEBUG, INFO, WARNING, ERROR
    LogToFile: true
    LogDirectory: "./logs"
    UploadTestResultsToAPI: true

  Results:
    OutputDirectory: "./results"
    SaveJSON: true
```
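Values in the YAML may reference environment variables using `${VAR}` or `${VAR:-default}` placeholders (see the full `config.yml` below in this PR). A minimal sketch of how such placeholders could be expanded, assuming a regex-based resolver; the framework's actual implementation may differ:

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; hypothetical resolver, for illustration.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def expand_env(text):
    """Replace ${VAR} / ${VAR:-default} with values from the environment."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        # Fall back to the default (or empty string) when the variable is unset.
        return os.environ.get(name, default if default is not None else "")
    return _PLACEHOLDER.sub(replace, text)
```

For example, `expand_env("${DEPLOYED_USER:-therockbot}")` yields the environment value when `DEPLOYED_USER` is set and `therockbot` otherwise.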
## Architecture

### Core Components

- **TestClient** - Main API for test execution and results management
- **System Detection** - Hardware, OS, and ROCm detection
- **Results Handling** - Local storage and API submission with retry
- **Configuration** - YAML-based config with environment variable expansion

### Test Flow

1. Initialize TestClient → Detect system and load config
2. Run Benchmarks → Execute binary and capture output
3. Parse Results → Extract metrics from log file
4. Upload Results → Submit to API and save locally
5. Compare with LKG → Fetch and compare scores
6. Report Results → Display table and return status

## Adding New Benchmarks

### 1. Create Script
```python
from utils import TestClient
from utils.logger import log

def run_benchmarks():
    """Run benchmarks and save output to a log file."""
    pass

def parse_results():
    """Parse benchmark results from the log file."""
    return [], None  # placeholder: (test_results, table)

def main():
    client = TestClient(auto_detect=True)
    client.print_system_summary()
    run_benchmarks()
    test_results, table = parse_results()
    client.upload_results(...)

if __name__ == '__main__':
    main()
```
### 2. Add Config (Optional)

Create `configs/benchmarks/your_benchmark.json`:

```json
{
  "test_cases": ["case1", "case2"]
}
```
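A script might then read its test cases from that file. This is a hypothetical helper; the real framework may expose this through its configuration module instead:

```python
import json
from pathlib import Path

def load_test_cases(name, root=Path("configs/benchmarks")):
    """Read configs/benchmarks/<name>.json and return its test case list."""
    config = json.loads((root / f"{name}.json").read_text())
    return config.get("test_cases", [])
```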

## Documentation

- [Utils Module](../utils/README.md) - Framework utilities
- [Configuration Guide](config.yml) - Configuration options

25 changes: 25 additions & 0 deletions build_tools/github_actions/configs/benchmarks/hipblaslt.json
@@ -0,0 +1,25 @@
{
  "input_shapes": [
    "8192 320 320 1",
    "2048 640 640 1",
    "512 1280 1280 1",
    "8192 320 1280 1",
    "512 10240 1280 1",
    "2048 5120 640 1",
    "8192 2560 320 1",
    "512 1280 5120 1",
    "2048 640 2560 1",
    "154 320 768 1",
    "154 1280 768 1",
    "4096 40 4096 16",
    "1024 80 1024 16",
    "1024 80 77 16"
  ],
  "ntinput_shapes": [
    "4096 4096 40 16",
    "1024 1024 80 16",
    "4096 77 40 16",
    "256 77 160 16",
    "1024 77 80 16"
  ]
}
28 changes: 28 additions & 0 deletions build_tools/github_actions/configs/benchmarks/rocfft.json
@@ -0,0 +1,28 @@
{
  "generic": [
    "16777216",
    "14348907",
    "9765625",
    "4096 4096",
    "6561 6561",
    "3125 3125",
    "256 256 256",
    "243 243 243",
    "125 125 125",
    "100 100 100 -t 2 -o",
    "100 100 100 -t 3 -o",
    "200 200 200 -t 2 -o",
    "200 200 200 -t 3 -o",
    "192 192 192 -t 2 -o",
    "192 192 192 -t 3 -o",
    "64 64 64 -t 2 -o",
    "60 -b 1024"
  ],
  "gfx94X-dcgpu": [
    "336 336 56 --double -o"
  ],
  "gfx950-dcgpu": [
    "336 336 56 --double -o"
  ],
  "gfx1151": []
}
33 changes: 33 additions & 0 deletions build_tools/github_actions/configs/config.yml
@@ -0,0 +1,33 @@
# Test Framework Configuration
# Configuration for logging, API integration, and test execution

Config:
  Core:
    # Logging Configuration
    LogLevel: INFO          # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
    LogToFile: true         # Enable file logging with rotation
    LogDirectory: "./logs"  # Directory for log files
    LogMaxSizeMB: 10        # Maximum log file size before rotation
    LogBackupCount: 5       # Number of backup log files to keep

    # Execution Metadata
    # Environment variables with defaults (format: ${VAR:-default})
    DeployedUser: "${DEPLOYED_USER:-therockbot}"
    ExecutionLabel: "${EXECUTION_LABEL:-therock_pr}"
    CIGroup: "${CI_GROUP:-therock_pr}"

    # Results API Configuration
    UploadTestResultsToAPI: true
    ResultsAPI:
      URL: "${API_URL}"                   # API URL from GitHub secrets
      FallbackURL: "${API_FALLBACK_URL}"  # Fallback API URL from GitHub secrets
      APIKey: "${API_KEY:-}"              # API key from environment (secure)
      Timeout: 30                         # Request timeout in seconds
      MaxRetries: 3                       # Maximum retry attempts
      RetryDelay: 5                       # Delay between retries in seconds

  Results:
    # Local Results Output
    OutputDirectory: "./results"
    SaveJSON: true

36 changes: 36 additions & 0 deletions build_tools/github_actions/fetch_test_configurations.py
@@ -62,6 +62,15 @@ def _get_script_path(script_name: str) -> str:
        "platform": ["linux", "windows"],
        "total_shards": 6,
    },
    "hipblaslt_bench": {
        "job_name": "hipblaslt_bench",
        "fetch_artifact_args": "--blas --tests",
        "timeout_minutes": 60,
        "test_script": f"python {_get_script_path('test_hipblaslt_benchmark.py')}",
        # TODO(lajagapp): Add windows test
Contributor comment: for the Windows tests, can we open a GitHub issue and link it here, so we can keep track?
        "platform": ["linux"],
        "total_shards": 1,
    },
Contributor comment (on lines +65 to +73): this may be an issue, as this will get run during each PR, push to main, and scheduled run, probably resulting in long queue times and machine shortages. I would imagine we want to add benchmark tests as a separate (perhaps nightly) run on separate machines. Is this the case? How frequently do we want to run these, and how long do they take?

    # SOLVER tests
    "hipsolver": {
        "job_name": "hipsolver",
@@ -80,6 +89,15 @@ def _get_script_path(script_name: str) -> str:
        "platform": ["linux"],
        "total_shards": 1,
    },
    "rocsolver_bench": {
        "job_name": "rocsolver_bench",
        "fetch_artifact_args": "--blas --tests",
        "timeout_minutes": 60,
        "test_script": f"python {_get_script_path('test_rocsolver_benchmark.py')}",
        # TODO(lajagapp): Add windows test
        "platform": ["linux"],
        "total_shards": 1,
    },
    # PRIM tests
    "rocprim": {
        "job_name": "rocprim",
@@ -142,6 +160,15 @@ def _get_script_path(script_name: str) -> str:
        "platform": ["linux", "windows"],
        "total_shards": 1,
    },
    "rocrand_bench": {
        "job_name": "rocrand_bench",
        "fetch_artifact_args": "--rand --tests",
        "timeout_minutes": 60,
        "test_script": f"python {_get_script_path('test_rocrand_benchmark.py')}",
        # TODO(lajagapp): Add windows test
        "platform": ["linux"],
        "total_shards": 1,
    },
    "hiprand": {
        "job_name": "hiprand",
        "fetch_artifact_args": "--rand --tests",
@@ -160,6 +187,15 @@ def _get_script_path(script_name: str) -> str:
        "platform": ["linux"],
        "total_shards": 1,
    },
    "rocfft_bench": {
        "job_name": "rocfft_bench",
        "fetch_artifact_args": "--fft --rand --tests",
        "timeout_minutes": 60,
        "test_script": f"python {_get_script_path('test_rocfft_benchmark.py')}",
        # TODO(lajagapp): Add windows test
        "platform": ["linux"],
        "total_shards": 1,
    },
    "hipfft": {
        "job_name": "hipfft",
        "fetch_artifact_args": "--fft --rand --tests",