Description
I've spent several days trying to compile a ConvLSTM3D-based Keras model with AWS NeuronX (TF 2.10.1, latest Neuron SDK). In my use case I run inference every 15 minutes (total inference time under 15 seconds), so Inf2 would be a great fit instead of keeping a GPU-backed instance up 24/7 or waiting up to 10 minutes for it to stop between inference calls.
The model uses ConvLSTM3D, Conv3D, Dense, BatchNormalization, and Embedding layers, plus a simple custom temporal attention layer (representative sketch below).
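For reference, here is a minimal sketch that reproduces the op mix. Layer sizes, input shapes, and the second (embedding) input are placeholders, and `TemporalAttention` is a simplified stand-in for my actual layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Simplified stand-in for my attention layer: scores each timestep with a
# Dense projection, softmaxes over time, and returns the weighted sum.
class TemporalAttention(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.score = layers.Dense(1)

    def call(self, x):                                # x: (batch, time, feat)
        w = tf.nn.softmax(self.score(x), axis=1)      # (batch, time, 1)
        return tf.reduce_sum(w * x, axis=1)           # (batch, feat)

def build_model(time_steps=4, grid=(8, 8, 8), channels=2, vocab=32):
    frames = layers.Input(shape=(time_steps, *grid, channels))
    ids = layers.Input(shape=(1,), dtype="int32")

    x = layers.ConvLSTM3D(16, kernel_size=3, padding="same",
                          return_sequences=True)(frames)
    x = layers.BatchNormalization()(x)
    x = layers.TimeDistributed(
        layers.Conv3D(8, 3, padding="same", activation="relu"))(x)
    x = layers.TimeDistributed(layers.GlobalAveragePooling3D())(x)
    x = TemporalAttention()(x)

    e = layers.Flatten()(layers.Embedding(vocab, 8)(ids))
    out = layers.Dense(1)(layers.Concatenate()([x, e]))
    return tf.keras.Model([frames, ids], out)
```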
According to the official Neuron Ops List, these ops are either directly supported or composed of supported ops.
I validated the model with analyze_model() and it reported ~78% of ops supported.
However, when actually tracing with tfnx.trace(), Neuron compiles only ~11-13% of the graph, even with minimal control flow and no unnecessary ops (reproduction below). That compilation rate wouldn't work for a production deployment.
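These are the two calls I'm comparing, using the placeholder model above with dummy inputs; the percentages come from my logs, and I'm showing analyze_model() as I invoked it in my environment, so treat that namespace and signature as approximate:

```python
import tensorflow as tf
import tensorflow_neuronx as tfnx

model = build_model()  # placeholder model from the sketch above

# Dummy inputs matching the placeholder shapes
frames = tf.zeros((1, 4, 8, 8, 8, 2), dtype=tf.float32)
ids = tf.zeros((1, 1), dtype=tf.int32)

# Step 1: op-support analysis, which reported ~78% of ops supported
tfnx.analyze_model(model)

# Step 2: actual trace, where only ~11-13% of the graph lands on Neuron
neuron_model = tfnx.trace(model, (frames, ids))
```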
Key concerns:
Discrepancy between the analyze_model report (~78% supported) and the actual trace compile percentage (~11-13%)
Poor support for common ops like LSTM/ConvLSTM or MatMul inside attention mechanisms
Lack of transparency on how control flow ops block entire graph segments from tracing
No clear documentation on how to structure models for better compilation rates
I'm happy to provide the complete example code and the full trace logs if that helps.
Thanks