concat+cast fusion optimization
Goal
Optimize performance through concat+cast fusion
Problem Description
In some recommendation models, for example DLRM, enabling bf16 in DeepRec exposes a potential performance gain from fusing the concat and cast operators.
Here are the steps to reproduce the performance issue.
- Collect timeline information with DLRM from modelzoo by running "numactl -C 8-15 -l python train.py --steps 100 --timeline 49 --no_eval --interaction_op dot --bf16". The resulting timeline shows the separate concat and cast operators.
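To quantify what the timeline shows, the time spent in the two operators can be summed from the collected trace. The following is a minimal sketch, assuming the timeline is saved as Chrome trace format JSON in which each complete event ("ph" equal to "X") carries the op type in "name" and its duration in microseconds in "dur". The helper names and the sample data are hypothetical and not part of DeepRec or modelzoo.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load a timeline file (hypothetical path) as a Chrome-trace dict."""
    with open(path) as f:
        return json.load(f)


def op_time_by_type(trace, op_types=("ConcatV2", "Cast")):
    """Sum the durations (microseconds) of complete events per op type."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        # Assumption: "X" marks a complete event that carries a "dur" field,
        # and "name" holds the op type.
        if event.get("ph") == "X" and event.get("name") in op_types:
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)


# Hand-made sample standing in for a real timeline; values are illustrative only.
sample_trace = {
    "traceEvents": [
        {"ph": "X", "name": "ConcatV2", "dur": 120, "args": {"name": "concat_1"}},
        {"ph": "X", "name": "Cast", "dur": 80, "args": {"name": "cast_1"}},
        {"ph": "X", "name": "Cast", "dur": 20},
        {"ph": "M", "name": "process_name"},
        {"ph": "X", "name": "MatMul", "dur": 300},
    ]
}

print(op_time_by_type(sample_trace))
```

Running the same function on the trace before and after the fusion gives a simple per-op comparison to include in the performance analysis.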

Requirement Details
- Fuse the two operators, concat and cast, into one operator. Both the forward and backward operations must be covered. Make sure the fusion can be applied to real models, DLRM at least.
- Follow the grappler mechanism: https://www.tensorflow.org/guide/graph_optimization
- Unit test code and benchmark code are needed.
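The forward rewrite described above can be sketched as a small pattern-matching pass. This is an illustration in plain Python, not the grappler C++ API: the `Node` class, the `fuse_concat_cast` function, and the fused op name `FusedConcatCast` are all hypothetical. It assumes the fusion is only safe when the Cast is the sole consumer of the concat output.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not a TensorFlow type)."""
    name: str
    op: str
    inputs: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


def fuse_concat_cast(nodes):
    """Replace each ConcatV2 -> Cast pair with a single fused node."""
    by_name = {n.name: n for n in nodes}
    consumers = {}
    for n in nodes:
        for inp in n.inputs:
            consumers.setdefault(inp, []).append(n.name)

    removed = set()
    result = []
    for n in nodes:
        if n.op == "Cast" and len(n.inputs) == 1:
            src = by_name.get(n.inputs[0])
            # Fuse only when the Cast is the sole consumer of the concat.
            if (src is not None and src.op == "ConcatV2"
                    and consumers.get(src.name) == [n.name]):
                # Reuse the Cast's name so downstream nodes need no rewiring.
                result.append(Node(
                    name=n.name,
                    op="FusedConcatCast",  # hypothetical op name
                    inputs=list(src.inputs),
                    attrs={**src.attrs, **n.attrs},
                ))
                removed.add(src.name)
                continue
        result.append(n)
    return [n for n in result if n.name not in removed]


graph = [
    Node("a", "Placeholder"),
    Node("b", "Placeholder"),
    Node("axis", "Const"),
    Node("concat", "ConcatV2", ["a", "b", "axis"], {"N": 2, "T": "float32"}),
    Node("cast", "Cast", ["concat"], {"SrcT": "float32", "DstT": "bfloat16"}),
    Node("mm", "MatMul", ["cast", "b"]),
]

fused_graph = fuse_concat_cast(graph)
for node in fused_graph:
    print(node.name, node.op, node.inputs)
```

A real implementation would also have to register the fused kernel and handle the backward pass, where the incoming gradient is cast back and split; neither is shown here.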
Test
- Use DLRM to validate the performance gain. The performance data and analysis results must be documented and reproducible.
Code Style and commit
- C++ and Python: keep aligned with the DeepRec code style.
Maintain
- All issues and bugs related to this op must be handled going forward.
Definition of Done
- The fused op runs successfully in DeepRec and delivers better performance.
- The code is integrated into DeepRec and committed following the DeepRec commit standard.