[Graph][Optimization] Concat+cast fusion to improve performance #24

@shanzhou2186

concat+cast fusion optimization
Goal
Optimize performance through concat+cast fusion

Problem Description
In some recommendation models, for example DLRM, enabling bf16 in DeepRec exposes a potential performance gain from fusing the concat and cast operators.
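To make the pattern concrete, here is a minimal pure-Python sketch of what the fusion would save. It assumes the cast goes from float32 to bfloat16 (the issue does not state the direction) and models tensors as flat lists. The helper names are hypothetical and are not DeepRec or TensorFlow APIs.

```python
import struct

def f32_to_bf16_bits(x):
    """Round a float32 value to bfloat16 and return its 16-bit pattern (NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF

def concat_then_cast(inputs):
    """Unfused: materialize the float32 concat result, then cast it in a second pass."""
    concatenated = [v for tensor in inputs for v in tensor]  # intermediate buffer
    return [f32_to_bf16_bits(v) for v in concatenated]

def fused_concat_cast(inputs):
    """Fused: cast each element while copying it to the output, with no intermediate buffer."""
    out = []
    for tensor in inputs:
        for v in tensor:
            out.append(f32_to_bf16_bits(v))
    return out
```

Both functions return the same values; the fused version simply avoids writing and re-reading the intermediate float32 buffer, which is where the expected gain would come from.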

Steps to reproduce the performance issue:

  • Collect timeline information with DLRM from modelzoo by running "numactl -C 8-15 -l python train.py --steps 100 --timeline 49 --no_eval --interaction_op dot --bf16". The resulting timeline is shown below.
    [image: timeline]
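Once a timeline file has been collected, the time spent in the two ops can be totalled with a short script. This is only a sketch: it assumes the timeline is a Chrome trace JSON document whose complete events ("ph": "X") carry the op type in args["op"], and that the ops appear as "ConcatV2" and "Cast". Check both assumptions against the actual file.

```python
import json
from collections import defaultdict

def op_time_by_type(trace_json, op_types=("ConcatV2", "Cast")):
    """Sum the duration (microseconds) of complete trace events per op type."""
    totals = defaultdict(int)
    for event in json.loads(trace_json).get("traceEvents", []):
        if event.get("ph") != "X":  # only complete events carry "dur"
            continue
        op = event.get("args", {}).get("op")  # assumed location of the op type
        if op in op_types:
            totals[op] += event.get("dur", 0)
    return dict(totals)
```

Comparing these totals before and after the fusion gives a first, per-op view of the gain, separate from end-to-end step time.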

Requirement Details

  • Fuse the two operators concat and cast into one operator. Both the forward and backward operations must be covered, and the fusion must be applicable to real models, DLRM at least.
  • Follow the Grappler mechanism: https://www.tensorflow.org/guide/graph_optimization
  • Unit test code and benchmark code are needed.
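The rewrite requested above can be sketched on a toy graph. This is an illustration of the pattern match only, not Grappler code: a real pass would be written in C++ against GraphDef, and the fused op name "FusedConcatCast", the dict-based graph layout, and the placement of the concat axis in attrs are all assumptions made for the example.

```python
def fuse_concat_cast(nodes):
    """Rewrite ConcatV2 -> Cast pairs into one hypothetical FusedConcatCast node.

    `nodes` maps node name -> {"op": str, "inputs": [names], "attrs": dict}.
    A pair is fused only when the Cast is the sole consumer of the concat.
    """
    consumers = {}
    for name, node in nodes.items():
        for inp in node["inputs"]:
            consumers.setdefault(inp, []).append(name)

    fused = dict(nodes)  # shallow copy; the input graph is left untouched
    for name, node in nodes.items():
        if node["op"] != "Cast":
            continue
        src = node["inputs"][0]
        if nodes.get(src, {}).get("op") != "ConcatV2" or consumers[src] != [name]:
            continue
        # Reuse the Cast node's name so downstream consumers need no rewiring.
        fused[name] = {
            "op": "FusedConcatCast",
            "inputs": list(nodes[src]["inputs"]),
            "attrs": {**nodes[src]["attrs"], **node["attrs"]},
        }
        del fused[src]
    return fused
```

The sole-consumer check matters: if another node also reads the concat output, removing the concat would change the graph's results, so the pair is left unfused.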

Test

  • Use DLRM to validate the performance gain. The performance data and analysis must be documented and reproducible.
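One simple way to report the gain reproducibly is to compare mean step times from a baseline run and a fused run. The helper below is a hypothetical sketch; the per-step timings would have to be collected from the training script, and the warmup count of 5 is an arbitrary assumption.

```python
from statistics import mean

def speedup(baseline_step_secs, fused_step_secs, warmup=5):
    """Return baseline mean step time divided by fused mean step time, skipping warmup steps."""
    base = mean(baseline_step_secs[warmup:])
    fused = mean(fused_step_secs[warmup:])
    return base / fused
```

A value above 1.0 means the fused run is faster. Running both configurations with the same numactl binding as in the reproduction command keeps the comparison fair.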

Code Style and commit

  • C++ and Python: keep aligned with the DeepRec code style.

Maintain

  • All issues and bugs related to this op must be handled going forward.

Definition of Done

  • Runs successfully in DeepRec and delivers better performance.
  • Integrated into DeepRec, with the code committed following the DeepRec commit standard.

Labels: enhancement (New feature or request)
