zuochunwei

Co-authored-by: AscendTransport <[email protected]>

Heterogeneous Ascend Transport Feature Implementation

Overview

This PR introduces the Heterogeneous Ascend Transport, a high-performance data transmission library designed for heterogeneous inference scenarios. Key features include:

Collaborative Heterogeneous Computing

910B NPU: Executes PREFILL operations

H20 GPU: Handles DECODE operations

Cross-Device KVCACHE Transfer

Efficient data exchange between NPU (910B) and GPU (H20) memory

The current version supports only WRITE semantics; READ semantics will follow in a future update

Key Changes

Build System:

Added USE_ASCEND_HETEROGENEOUS compilation flag to toggle the feature

Separate build configurations for PREFILL (910B) and DECODE (H20) sides

Core Functionality:

Implemented RDMA-based heterogeneous memory transfer

Added GPU Direct support for VRAM access

Configuration parameters: source, target_offset, opcode
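
To make the three configuration parameters concrete, here is a minimal sketch of how a batch of WRITE requests might be laid out. The type `TransferRequestSketch`, its `length` field, and the helper `makeWriteBatch` are hypothetical names introduced for illustration; they are not taken from this PR, and the real request type in the library may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical request layout, modelled on the parameters named in the PR
// description (source, target_offset, opcode). Not the library's actual type.
enum class OpCode { Read, Write };

struct TransferRequestSketch {
    OpCode opcode;                // only Write is supported by this PR
    const void* source;           // local (NPU-side) buffer address
    std::uint64_t target_offset;  // byte offset inside the remote segment
    std::size_t length;           // bytes to transfer (assumed field)
};

// Build one batch of WRITE requests that tile a contiguous local buffer
// onto a contiguous remote range, one request per block.
std::vector<TransferRequestSketch> makeWriteBatch(const std::uint8_t* base,
                                                  std::uint64_t remote_base,
                                                  std::size_t block_size,
                                                  std::size_t batch_size) {
    std::vector<TransferRequestSketch> batch;
    batch.reserve(batch_size);
    for (std::size_t i = 0; i < batch_size; ++i) {
        batch.push_back({OpCode::Write, base + i * block_size,
                         remote_base + i * block_size, block_size});
    }
    return batch;
}
```

With the values used in the test commands of this description (block_size=65536, batch_size=128), one such batch would cover 8 MiB of KV cache.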

Testing Framework:

New initiator test: transfer_engine_heterogeneous_ascend_perf_initiator.cpp

Reused rdma_transport_test.cpp as the target-side test

P2P handshake protocol with auto-port selection

Usage

Compilation Notes

PREFILL side (910B): Enable USE_ASCEND_HETEROGENEOUS and rebuild

DECODE side (H20): Use existing RDMA Transport with GPU Direct

Test Commands

Target (H20):

./rdma_transport_test --mode=target --local_server_name=10.10.10.10 --metadata_server=P2PHANDSHAKE --operation=write --protocol=rdma --device_name=mlx5_1 --use_vram=true --gpu_id=0

Initiator (910B):

./transfer_engine_heterogeneous_ascend_perf_initiator --mode=initiator --local_server_name=10.10.10.10 --metadata_server=P2PHANDSHAKE --operation=write --npu_id=1 --segment_id=10.10.10.10:12345 --device_name=mlx5_1 --block_size=65536 --batch_size=128

Roadmap

Add READ semantics support

Optimize cross-device transfer performance

Extend to more heterogeneous computing scenarios

@@ -61,6 +61,7 @@ option(USE_NVMEOF "option for using NVMe over Fabric" OFF)
option(USE_TCP "option for using TCP transport" ON)
option(USE_ASCEND "option for using npu with HCCL" OFF)
option(USE_ASCEND_DIRECT "option for using ascend npu with adxl engine" OFF)
+option(USE_ASCEND_HETEROGENEOUS "option for using Heterogeneous npu" OFF)
Collaborator

This description seems inconsistent with the title. How about changing it to "option for transferring between ascend npu and gpu"?

Author

@zuochunwei, Aug 21, 2025

OK, Done

@ShangmingCai requested a review from alogfans on August 21, 2025, 08:54
@@ -130,7 +130,7 @@ int TransferEnginePy::initializeExt(const char *local_hostname,
}

free_list_.resize(kSlabSizeKBTabLen);
-#if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT)
+#if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT) && !defined(USE_HETEROGENEOUS)
Collaborator

Contributor

done

@@ -130,7 +130,8 @@ int TransferEnginePy::initializeExt(const char *local_hostname,
}

free_list_.resize(kSlabSizeKBTabLen);
-#if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT)
+#if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT) && \
+    !defined(USE_HETEROGENEOUS)
Collaborator

Should this be USE_ASCEND_HETEROGENEOUS?

Contributor

done
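
The naming mismatch raised in this thread is easy to miss because the preprocessor does not warn about it. A minimal sketch of the failure mode follows, assuming the CMake option is forwarded to the compiler as a macro of the same name (the forwarding itself is not shown in this conversation). The constants `kMisspelledGuardActive` and `kCorrectGuardActive` are illustrative names only.

```cpp
// Simulate a build configured with USE_ASCEND_HETEROGENEOUS enabled.
#define USE_ASCEND_HETEROGENEOUS 1

// Guard spelled with the wrong name: USE_HETEROGENEOUS is never defined,
// so !defined(USE_HETEROGENEOUS) is true and the guarded code is still
// compiled in, even though the heterogeneous feature is switched on.
#if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT) && \
    !defined(USE_HETEROGENEOUS)
constexpr bool kMisspelledGuardActive = true;
#else
constexpr bool kMisspelledGuardActive = false;
#endif

// Guard spelled to match the option name: the block is excluded as intended.
#if !defined(USE_ASCEND) && !defined(USE_ASCEND_DIRECT) && \
    !defined(USE_ASCEND_HETEROGENEOUS)
constexpr bool kCorrectGuardActive = true;
#else
constexpr bool kCorrectGuardActive = false;
#endif

static_assert(kMisspelledGuardActive,
              "the misspelled guard does not see the feature flag");
static_assert(!kCorrectGuardActive,
              "the correctly spelled guard excludes the block");
```

Both variants compile cleanly, which is why a mismatch like this tends to surface only in review or at run time.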

@@ -30,6 +30,14 @@ if (USE_ASCEND)
)
endif()

+if (USE_HETEROGENEOUS)
Collaborator

ditto

Contributor

done

Collaborator

@ShangmingCai left a comment

LGTM. I don't have an environment to verify this, and the CI cannot cover this part either, so please CC @alogfans to double-check the changes.

@zuochunwei
Author

@alogfans please double-check on the changes, thanks.

@@ -1,18 +0,0 @@
-[Replacement commands]
Collaborator

Why remove these files? Was it by accident?

Contributor

The removal of these files was not an accident. Here's the detailed explanation:

The files in the pkg directory were additional dependencies required for users of the CANN 8.1 version. However, the community has now updated and released the CANN 8.2 version, which natively includes all the content that was previously in the pkg directory. As a result, the contents under pkg are no longer necessary.

In the previous PR for AscendTransport titled [TransferEngine] Update to support CANN 8.2.RC1 #714, support for the CANN 8.2 version was already implemented. Therefore, during the current PR, we took the opportunity to remove the obsolete pkg files as part of the cleanup.

Users who need these dependencies can follow the official instructions to download CANN 8.2 from the community. The functionality remains fully consistent with what the pkg directory provided for CANN 8.1.

firstSubmit_ = false;
}

memcpy_mutex_.lock();
Collaborator

I'd advise using std::lock_guard<> here.

Contributor

done
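
For context on the suggestion above: a manual `lock()` needs a matching `unlock()` on every exit path, including exceptions, whereas `std::lock_guard` releases the mutex automatically when it goes out of scope. A minimal sketch, in which the free-standing `memcpy_mutex` and the helper `guardedCopy` are stand-ins invented for illustration rather than code from this PR:

```cpp
#include <cstddef>
#include <cstring>
#include <mutex>
#include <stdexcept>

std::mutex memcpy_mutex;  // stand-in for the memcpy_mutex_ member in the diff

// Copy len bytes under the mutex. The guard unlocks on every exit path,
// including the early throw, so no explicit unlock() call is needed.
std::size_t guardedCopy(void* dst, std::size_t capacity, const void* src,
                        std::size_t len) {
    std::lock_guard<std::mutex> lock(memcpy_mutex);
    if (len > capacity) {
        throw std::length_error("copy exceeds destination capacity");
    }
    std::memcpy(dst, src, len);
    return len;
}
```

If the throwing branch were written with a bare `memcpy_mutex.lock()`, the mutex would stay locked after the exception and the next caller would block forever.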

// - Target side directly reuses RDMA Transport
// - Initiator side uses heterogeneous_rdma_transport
if (target_segment_desc->protocol == "rdma") {
proto = "ascend";
Collaborator

I don't think this approach is a good idea. Our test program simply treats one node as the target and the other as the initiator. This is fine for P/D disaggregation. In broader scenarios, however, a node can act as both source and target. Thus we can simply assume that HeterogeneousRdmaTransport is loaded on both sides.

Contributor

The ADXL TRANSPORT team has identified this issue and will address it in subsequent iterations. For HeterogeneousRdmaTransport, this PR first enables the path in which the 910B actively writes to the H20, and the README explains this limitation. Read semantics are still under development and not yet supported, so for now we reuse the RDMA Transport on the target side. Once read semantics have been adapted, we will switch to a unified HeterogeneousRdmaTransport on both sides, as you requested.
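
The asymmetry discussed in this thread can be summarised in a small sketch. The function `selectInitiatorProtocol` and its `heterogeneous_build` parameter are hypothetical; only the mapping from a target that advertises "rdma" to the "ascend" protocol is taken from the diff excerpt above.

```cpp
#include <string>

// Hypothetical helper mirroring the reviewed branch: on an initiator built
// with the heterogeneous feature, a target advertising "rdma" is reached
// through the "ascend" (heterogeneous) transport. The target itself keeps
// using the plain RDMA transport, so the mapping is one-directional.
std::string selectInitiatorProtocol(const std::string& target_protocol,
                                    bool heterogeneous_build) {
    if (heterogeneous_build && target_protocol == "rdma") {
        return "ascend";
    }
    return target_protocol;  // every other case is left unchanged
}
```

Because the rewrite happens only on the initiator, a node that must serve as both source and target would need both transports available, which is the reviewer's concern.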


5 participants