DGX Spark/GB10 - error sm_121a: dynamic shared memory size

Hi, 

I tried to port this to llama.cpp on a DGX Spark (GB10).

There seems to be a fundamental hardware limitation on this device.

I get an error in the prepare_h kernel:

tvm.error.InternalError: Failed to set the allowed dynamic shared memory size to 196608 [192 KB]

GB10 specs (sm_121a): maxSharedMemoryPerBlockOptin = 99 KB (101376 bytes)

The kernel seems to be sized for sm_90 (Hopper), which has 228 KB of shared memory per SM, so it cannot work on sm_121a with its 99 KB per-block limit.
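For reference, a quick check of the two numbers quoted above shows the request is almost twice the GB10 limit. This is plain Python with no CUDA dependency; the constant and function names are just illustrative, and the only inputs are the 196608-byte request from the error and the 101376-byte opt-in limit from the device query.

```python
# Per-block shared memory figures quoted in this issue, in bytes.
SM121A_OPTIN_BYTES = 101_376  # GB10 (sm_121a) maxSharedMemoryPerBlockOptin, 99 KB
REQUESTED_BYTES = 196_608     # dynamic shared memory requested by prepare_h, 192 KB


def fits(requested: int, limit: int) -> bool:
    """Return True if a per-block shared memory request is within the opt-in limit."""
    return requested <= limit


def required_reduction(requested: int, limit: int) -> float:
    """Fraction by which the request must shrink to fit (0.0 if it already fits)."""
    return max(0.0, 1.0 - limit / requested)


if __name__ == "__main__":
    over = REQUESTED_BYTES - SM121A_OPTIN_BYTES
    print(f"requested {REQUESTED_BYTES // 1024} KB, limit {SM121A_OPTIN_BYTES // 1024} KB")
    print(f"fits: {fits(REQUESTED_BYTES, SM121A_OPTIN_BYTES)}")
    shrink = required_reduction(REQUESTED_BYTES, SM121A_OPTIN_BYTES)
    print(f"over by {over // 1024} KB, must shrink by {shrink:.1%}")
```

In other words, the kernel's shared memory footprint would have to shrink by roughly half (about 48%) before it could launch on sm_121a at all.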

Is this a confirmed issue? Are you considering adding support?

I'm wondering whether aggressive re-tiling would still bring benefits on these devices.

Something like:
- num_stages = 2 → 1
- block_DV = 128 → 64 (-50% V tile)
- chunk_size = 64 → 32

could bring it down to roughly 80-85 KB per block, but this would require some real CUDA engineering, and the biggest question is whether the outcome would be worth the effort.
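To get a feel for how each of those knobs moves the footprint, here is a rough estimator. The buffer layout is an assumption and is NOT taken from the actual prepare_h kernel: it assumes bf16 K and V chunk tiles that are multi-buffered num_stages times, plus one fp32 state tile of size dk x block_DV, with a hypothetical dk = 128. The absolute numbers therefore do not match the 192 KB the real kernel requests; only the relative effect of each change is meant to be illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tiling:
    num_stages: int
    block_DV: int
    chunk_size: int
    dk: int = 128        # HYPOTHETICAL K head dimension
    in_bytes: int = 2    # HYPOTHETICAL bf16 K/V inputs
    acc_bytes: int = 4   # HYPOTHETICAL fp32 state tile


def smem_bytes(t: Tiling) -> int:
    """Toy shared memory estimate under the HYPOTHETICAL layout described above."""
    k_tile = t.chunk_size * t.dk * t.in_bytes        # streamed K chunk
    v_tile = t.chunk_size * t.block_DV * t.in_bytes  # streamed V chunk
    h_tile = t.dk * t.block_DV * t.acc_bytes         # running state, kept once
    # Only the streamed tiles are multi-buffered for the async-copy pipeline.
    return t.num_stages * (k_tile + v_tile) + h_tile


LIMIT = 101_376  # GB10 maxSharedMemoryPerBlockOptin, from the issue

if __name__ == "__main__":
    base = Tiling(num_stages=2, block_DV=128, chunk_size=64)
    steps = [
        ("baseline", base),
        ("num_stages 2 -> 1", replace(base, num_stages=1)),
        ("+ block_DV 128 -> 64", replace(base, num_stages=1, block_DV=64)),
        ("+ chunk_size 64 -> 32", replace(base, num_stages=1, block_DV=64, chunk_size=32)),
    ]
    for name, t in steps:
        b = smem_bytes(t)
        print(f"{name:24s} {b / 1024:6.1f} KB  fits={b <= LIMIT}")
```

In this toy model the state tile dominates once num_stages is 1, so block_DV is the knob that shrinks it, while chunk_size only reduces the streamed K/V tiles. Dropping num_stages to 1 also gives up overlapping the next tile's loads with the current tile's compute, which is probably where most of the performance cost would show up.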

Is anyone looking into this?

DGX Spark/GB10 - error sm_121a: dynamic shared memory size #4

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

DGX Spark/GB10 - error sm_121a: dynamic shared memory size #4

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions