Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 6 additions & 5 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,9 @@ set(CMAKE_CXX_COMPILER g++)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Kernel debug/timing instrumentation switch.
# OFF by default to keep production performance unchanged.
option(MUSA_KERNEL_DEBUG "Enable MUSA kernel compute timing instrumentation" OFF)
# Kernel debug metadata switch.
# OFF by default to keep production behavior unchanged.
option(MUSA_KERNEL_DEBUG "Enable MUSA kernel debug metadata logging" OFF)

# Set MUSA path
set(MUSA_PATH "/usr/local/musa" CACHE STRING "Path to MUSA installation")
Expand Down Expand Up @@ -63,9 +63,10 @@ set(TF_COMPILE_FLAGS "${TF_COMPILE_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
# refcount aborts during session/resource destruction.
#
# Keep NDEBUG enabled in every build to match the TensorFlow wheel, and use
# MUSA_KERNEL_DEBUG only for plugin-local timing/debug instrumentation.
# MUSA_KERNEL_DEBUG only for plugin-local kernel metadata logging.
if(MUSA_KERNEL_DEBUG)
set(TF_COMPILE_FLAGS "${TF_COMPILE_FLAGS} -DMUSA_KERNEL_DEBUG -DNDEBUG")
message(STATUS "MUSA kernel debug info enabled")
set(TF_COMPILE_FLAGS "${TF_COMPILE_FLAGS} -DMUSA_KERNEL_DEBUG=1 -DNDEBUG")
else()
set(TF_COMPILE_FLAGS "${TF_COMPILE_FLAGS} -DMUSA_DISABLE_DEBUG_LOGGING -DNDEBUG -DMUSA_DISABLE_TRACE_LOGGING -DMUSA_DISABLE_AUTO_TRACE")
endif()
Expand Down
64 changes: 54 additions & 10 deletions README.en.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ TensorFlow MUSA Extension is a high-performance TensorFlow plugin specifically d
- **Automatic Graph Optimization**: Supports automatic layout conversion, operator fusion, and Automatic Mixed Precision (AMP)
- **Seamless Integration**: Fully compatible with TensorFlow ecosystem without requiring code modifications
- **Device Management**: Complete MUSA device registration, memory management, and stream processing support
- **Kernel Debugging Support**: Built-in kernel execution time statistics for performance analysis
- **Kernel Debugging Support**: Debug builds print operator type, input types, and input shapes, with optional terminal color highlighting

## Quick Start

Expand Down Expand Up @@ -77,7 +77,7 @@ Both Release and Debug modes are supported:
| Mode | Command | Description |
|------|---------|-------------|
| **Release** | `./build.sh` or `./build.sh release` | Optimized for performance, no debug overhead |
| **Debug** | `./build.sh debug` | Enables `MUSA_KERNEL_DEBUG` and kernel timing macros |
| **Debug** | `./build.sh debug` | Enables `MUSA_KERNEL_DEBUG` and prints kernel debug logs |

### 2. Compilation Process

Expand All @@ -90,7 +90,7 @@ Execute the automated build script:
# Release (explicit)
./build.sh release

# Debug (timing instrumentation)
# Debug (kernel debug logs)
./build.sh debug
```

Expand All @@ -101,9 +101,9 @@ The build script automatically completes the following steps:

### 3. Debugging and Diagnostics

For detailed debugging guide, see [docs/DEBUG_GUIDE.md](docs/DEBUG_GUIDE.md), including:
For a more detailed debugging guide, see [docs/DEBUG_GUIDE.md](docs/DEBUG_GUIDE.md). The README has been updated to describe the current kernel debug logging flow; the old timing-macro path has been removed.

- **Kernel Timing**: Performance analysis in Debug mode
- **Kernel Debug Logs**: `MUSA_DEBUG_LOG_KERNEL(ctx)` prints `op_type`, `input_types`, and `input_shapes`
- **Telemetry System**: Full-stack tracing and dirty data diagnostics
- **Memory Diagnostics**: Use-After-Free detection and memory coloring
- **Environment Variables**: Complete environment variable configuration table
Expand All @@ -116,14 +116,58 @@ export MUSA_TELEMETRY_LOG_PATH=/tmp/telemetry.json
python test_runner.py
```

Quick kernel timing setup for performance analysis:
### 4. Kernel Debug Change Summary

The kernel debug flow was updated as follows:

- Added a unified debug macro, `MUSA_DEBUG_LOG_KERNEL(ctx)`, to print detailed debug metadata at the beginning of `Compute()` and an automatic short completion message when `Compute()` exits
- Centralized formatting and logging helpers in `musa_ext/kernels/utils_op.h` and `musa_ext/kernels/utils_op.cc`
- All kernel `Compute()` entry points currently wired with `MUSA_DEBUG_LOG_KERNEL(ctx)` under `musa_ext/kernels` use this logging flow
- Removed the old timing macros entirely: `MUSA_KERNEL_TIMING_GUARD`, `MUSA_KERNEL_TRACE_START`, `MUSA_KERNEL_TRACE_END`, `MUSA_KERNEL_TRACE`, and `MUSA_PROFILE_OP`

The new log format looks like this:

```txt
[MUSA_KERNEL_DEBUG] op_type=AddV2 input_types=[float, float] input_shapes=[[1024,1024], [1024,1024]]
[MUSA_KERNEL_DEBUG] END AddV2
```

Notes:

- The start log keeps `op_type`, `input_types`, and `input_shapes`
- The end log is intentionally minimal: it prints `END <OpName>` on success and `FAIL <OpName>` on failure
- `input_types` is highlighted in cyan by default
- `input_shapes` is highlighted in yellow by default
- `END` logs are highlighted in bright green by default, and `FAIL` logs are highlighted in bright red
- When output is redirected to a file, plain text is emitted by default to avoid ANSI escape codes in logs
- To force color even when using `tee` or redirection, set `MUSA_KERNEL_DEBUG_COLOR=1`
- To explicitly disable colors, set `NO_COLOR=1`

Quick setup for the new kernel debug logs:

Option 1: run from the repository root.

```bash
./build.sh debug
export MUSA_TIMING_KERNEL_LEVEL=2
export MUSA_TIMING_KERNEL_NAME=ALL
export MUSA_TIMING_KERNEL_STATS=1
python test_runner.py
export PYTHONPATH=$PWD/test
python3 test/ops/add_op_test.py 2>&1 | tee /tmp/tme_add.log
grep 'MUSA_KERNEL_DEBUG' /tmp/tme_add.log
```

Option 2: enter the `test/` directory directly. In this mode you do not need to set `PYTHONPATH`.

```bash
./build.sh debug
cd test
python3 -m ops.add_op_test 2>&1 | tee /tmp/tme_add.log
grep 'MUSA_KERNEL_DEBUG' /tmp/tme_add.log
```

To force colored terminal output:

```bash
cd test
MUSA_KERNEL_DEBUG_COLOR=1 python3 -m ops.add_op_test
```

## Testing
Expand Down
64 changes: 54 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ TensorFlow MUSA Extension 是一个高性能的 TensorFlow 插件,专为摩尔
- **自动图优化**:支持 Layout 自动转换、算子融合和自动混合精度(AMP)
- **无缝集成**:与 TensorFlow 生态系统完全兼容,无需修改现有代码
- **设备管理**:完整的 MUSA 设备注册、内存管理和流式处理支持
- **Kernel 调试支持**:内置 Kernel 执行时间统计功能,便于性能分析
- **Kernel 调试支持**:Debug 模式下输出算子类型、输入类型和输入 shape,并支持终端颜色高亮

## 快速开始

Expand Down Expand Up @@ -77,7 +77,7 @@ tf.load_library("./build/libmusa_plugin.so")
| 模式 | 命令 | 说明 |
|------|------|------|
| **Release** | `./build.sh` 或 `./build.sh release` | 优化性能,无调试开销 |
| **Debug** | `./build.sh debug` | 开启 `MUSA_KERNEL_DEBUG`,启用 kernel timing 宏 |
| **Debug** | `./build.sh debug` | 开启 `MUSA_KERNEL_DEBUG`,输出 Kernel 调试日志 |

### 2. 编译流程

Expand All @@ -90,7 +90,7 @@ tf.load_library("./build/libmusa_plugin.so")
# Release(显式)
./build.sh release

# Debug(计时调试
# Debug(Kernel 调试日志
./build.sh debug
```

Expand All @@ -101,9 +101,9 @@ tf.load_library("./build/libmusa_plugin.so")

### 3. 调试与诊断

详细的调试指南请参阅 [docs/DEBUG_GUIDE.md](docs/DEBUG_GUIDE.md),包含:
详细的调试指南请参阅 [docs/DEBUG_GUIDE.md](docs/DEBUG_GUIDE.md)。当前 README 中与 Kernel 调试相关的说明已经更新为新的 debug 日志方案;旧的 timing 宏路径已移除。

- **Kernel 计时**:Debug 模式下的性能分析方法
- **Kernel 调试日志**:通过 `MUSA_DEBUG_LOG_KERNEL(ctx)` 输出 `op_type`、`input_types`、`input_shapes`
- **遥测系统(Telemetry)**:全链路追踪和脏数据诊断
- **内存诊断**:Use-After-Free 检测和内存染色
- **环境变量**:完整的环境变量配置表
Expand All @@ -116,14 +116,58 @@ export MUSA_TELEMETRY_LOG_PATH=/tmp/telemetry.json
python test_runner.py
```

快速启用 Kernel 计时分析:
### 4. 本次 Kernel Debug 改动说明

本次调试链路做了如下调整:

- 新增统一调试宏 `MUSA_DEBUG_LOG_KERNEL(ctx)`,在算子 `Compute()` 开始时打印详细调试信息,并在 `Compute()` 退出时自动打印极简结束提示
- 公共格式化和输出逻辑集中在 `musa_ext/kernels/utils_op.h` 与 `musa_ext/kernels/utils_op.cc`
- 当前 `musa_ext/kernels` 下所有已接入 `MUSA_DEBUG_LOG_KERNEL(ctx)` 的算子 `Compute()` 入口都会输出这套日志
- 旧的 timing 宏已经删除,不再使用 `MUSA_KERNEL_TIMING_GUARD`、`MUSA_KERNEL_TRACE_START`、`MUSA_KERNEL_TRACE_END`、`MUSA_KERNEL_TRACE`、`MUSA_PROFILE_OP`

新的日志格式如下:

```txt
[MUSA_KERNEL_DEBUG] op_type=AddV2 input_types=[float, float] input_shapes=[[1024,1024], [1024,1024]]
[MUSA_KERNEL_DEBUG] END AddV2
```

其中:

- 开始日志保留 `op_type`、`input_types`、`input_shapes`
- 结束日志只保留极简提示,成功时打印 `END <OpName>`,失败时打印 `FAIL <OpName>`
- `input_types` 默认使用青色高亮
- `input_shapes` 默认使用黄色高亮
- `END` 日志默认使用亮绿色高亮,`FAIL` 日志默认使用亮红色高亮
- 当输出被重定向到文件时,默认保持纯文本,避免 ANSI 颜色码污染日志
- 如需在 `tee` 或重定向场景中强制保留颜色,可设置 `MUSA_KERNEL_DEBUG_COLOR=1`
- 如需显式关闭颜色,可设置 `NO_COLOR=1`

快速启用新的 Kernel 调试日志:

方式一:在仓库根目录运行。

```bash
./build.sh debug
export MUSA_TIMING_KERNEL_LEVEL=2
export MUSA_TIMING_KERNEL_NAME=ALL
export MUSA_TIMING_KERNEL_STATS=1
python test_runner.py
export PYTHONPATH=$PWD/test
python3 test/ops/add_op_test.py 2>&1 | tee /tmp/tme_add.log
grep 'MUSA_KERNEL_DEBUG' /tmp/tme_add.log
```

方式二:直接进入 `test/` 目录运行,这种方式不需要设置 `PYTHONPATH`。

```bash
./build.sh debug
cd test
python3 -m ops.add_op_test 2>&1 | tee /tmp/tme_add.log
grep 'MUSA_KERNEL_DEBUG' /tmp/tme_add.log
```

如需强制保留颜色输出:

```bash
cd test
MUSA_KERNEL_DEBUG_COLOR=1 python3 -m ops.add_op_test
```

## 测试
Expand Down
39 changes: 20 additions & 19 deletions build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,66 +4,67 @@ set -e
# ============================================================================
# MUSA Plugin Build Script
# Usage:
# ./build.sh [release|debug]
# ./build.sh [debug|release]
#
# Examples:
# ./build.sh # Default: release mode
# ./build.sh debug # Debug mode with automatic kernel debug info
# ./build.sh release # Release mode (optimized)
# ./build.sh debug # Debug mode (kernel timing enabled)
# ============================================================================

# Parse build type from command line argument
BUILD_TYPE="${1:-release}"
BUILD_TYPE=$(echo "$BUILD_TYPE" | tr '[:upper:]' '[:lower:]')

case "$BUILD_TYPE" in
release)
CMAKE_BUILD_TYPE="Release"
MUSA_KERNEL_DEBUG="OFF"
debug)
CMAKE_BUILD_TYPE="Debug"
MUSA_KERNEL_DEBUG="1"
echo "=========================================="
echo "Building MUSA Plugin - RELEASE Mode"
echo "Building MUSA Plugin - DEBUG Mode"
echo "=========================================="
echo "Features:"
echo " • Optimized for performance (-O3)"
echo " • No debug overhead"
echo " - Automatic kernel debug info enabled"
echo " - Debug symbols included"
echo " - TensorFlow ABI/DCHECK compatibility preserved (-DNDEBUG)"
echo ""
;;
debug)
CMAKE_BUILD_TYPE="Debug"
MUSA_KERNEL_DEBUG="ON"
release)
CMAKE_BUILD_TYPE="Release"
MUSA_KERNEL_DEBUG="0"
echo "=========================================="
echo "Building MUSA Plugin - DEBUG Mode"
echo "Building MUSA Plugin - RELEASE Mode"
echo "=========================================="
echo "Features:"
echo " • Kernel timing instrumentation enabled"
echo " • TensorFlow ABI/DCHECK compatibility preserved (-DNDEBUG)"
echo " • Use env vars MUSA_TIMING_KERNEL_* to control output"
echo " - Optimized for performance (-O3)"
echo " - No debug overhead"
echo ""
;;
*)
echo "Error: Unknown build type '$BUILD_TYPE'"
echo "Usage: ./build.sh [release|debug]"
echo "Usage: ./build.sh [debug|release]"
echo ""
echo "Options:"
echo " debug - Debug build with automatic kernel debug info enabled"
echo " release - Optimized release build (default)"
echo " debug - Enable MUSA kernel debug/timing macros"
exit 1
;;
esac

# Clean previous build if needed
rm -rf build
# rm -rf build

mkdir -p build
cd build

echo "Configuring with CMake..."
echo " CMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE"
echo " MUSA_KERNEL_DEBUG=$MUSA_KERNEL_DEBUG"
echo ""

cmake .. \
-DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE \
-DMUSA_KERNEL_DEBUG=$MUSA_KERNEL_DEBUG \
-DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE \
-DPYTHON_EXECUTABLE=$(which python3) 2>&1 | tee cmake_output.log

echo ""
Expand Down
Loading
Loading