
DGX Spark/GB10 - error sm_121a: dynamic shared memory size #4

@croll83


Hi,

I tried to port this to llama.cpp on a DGX Spark (GB10).

There seems to be a fundamental limitation on this hardware.

I get an error in the prepare_h kernel:

tvm.error.InternalError: Failed to set the allowed dynamic shared memory size to 196608 [192 KB]

GB10 specs (sm_121a): maxSharedMemoryPerBlockOptin = 99 KB (101376 bytes)

The kernel seems to be sized for sm_90 (Hopper), which has 228 KB of shared memory per SM, so its 192 KB request cannot be satisfied on sm_121a, which allows only 99 KB per block.
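The failure is plain byte arithmetic: the error presumably comes from TVM asking the CUDA driver to raise the kernel's dynamic shared memory limit, and that request cannot exceed the device's opt-in maximum. A minimal Python sketch using only the two numbers quoted above (`check_dynamic_smem` is a hypothetical helper, not part of TVM or CUDA):

```python
KIB = 1024

requested = 196_608    # dynamic smem requested by prepare_h (from the error message)
optin_limit = 101_376  # maxSharedMemoryPerBlockOptin on GB10 / sm_121a


def check_dynamic_smem(requested_bytes: int, limit_bytes: int) -> tuple:
    """Return (fits, overshoot_bytes); overshoot is 0 when the request fits."""
    overshoot = max(0, requested_bytes - limit_bytes)
    return overshoot == 0, overshoot


fits, overshoot = check_dynamic_smem(requested, optin_limit)
print(f"requested {requested // KIB} KB, limit {optin_limit // KIB} KB, "
      f"fits={fits}, over by {overshoot // KIB} KB")
# requested 192 KB, limit 99 KB, fits=False, over by 93 KB
```

In other words, the kernel asks for almost twice (about 1.94x) what a single block may use on this device, so no launch-configuration tweak short of shrinking the tiles can make it fit.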

Is this a confirmed issue? Are you considering adding support?

I'm wondering whether aggressive re-tiling could still bring a benefit on these devices.

Something like:

  • num_stages = 2 → 1
  • block_DV = 128 → 64 (-50% V tile)
  • chunk_size = 64 → 32

Such a configuration could eventually fit in about 80-85 KB/block, but it would require some real CUDA engineering, and the biggest question is whether the expected gain would be worth the effort.
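One way to sanity-check a re-tiling proposal is to sum every shared-memory buffer the kernel allocates and compare the total against the 101376-byte limit. The Python sketch below does that. The layout returned by `hypothetical_buffers` is invented purely for illustration: one K tile and one V tile per pipeline stage plus a single fp32 state tile, with an assumed key head dimension of 128 and 2-byte operands. It is not the actual prepare_h allocation, so its totals do not reproduce the 192 KB in the error or the 80-85 KB estimate above. Substituting the kernel's real buffers would turn it into a usable estimate.

```python
from dataclasses import dataclass
from math import prod

# Opt-in per-block shared memory limit quoted in the issue for GB10 / sm_121a (99 KB).
SMEM_OPTIN_LIMIT = 101_376


@dataclass(frozen=True)
class SmemBuffer:
    name: str
    shape: tuple
    elem_bytes: int
    stages: int = 1  # number of copies kept alive by the software pipeline

    def nbytes(self) -> int:
        return self.stages * self.elem_bytes * prod(self.shape)


def footprint(buffers) -> int:
    """Total shared memory, in bytes, for a list of SmemBuffer."""
    return sum(b.nbytes() for b in buffers)


def hypothetical_buffers(num_stages, block_DV, chunk_size, head_dim_k=128, elem_bytes=2):
    """HYPOTHETICAL layout for illustration only; not the real prepare_h allocation."""
    return [
        SmemBuffer("K", (chunk_size, head_dim_k), elem_bytes, num_stages),
        SmemBuffer("V", (chunk_size, block_DV), elem_bytes, num_stages),
        SmemBuffer("h", (head_dim_k, block_DV), 4),  # assumed fp32 state tile, single-buffered
    ]


# (num_stages, block_DV, chunk_size) before and after the proposed re-tiling
for label, cfg in [("current", (2, 128, 64)), ("proposed", (1, 64, 32))]:
    total = footprint(hypothetical_buffers(*cfg))
    print(f"{label}: {total // 1024} KB, fits={total <= SMEM_OPTIN_LIMIT}")
# current: 128 KB, fits=False
# proposed: 44 KB, fits=True
```

In this toy model, dropping num_stages from 2 to 1 halves every pipelined tile at once, while block_DV and chunk_size each shrink only the buffers that depend on them. The cost is less overlap between memory loads and compute, which is exactly the performance question raised above.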

Is anyone looking into this?
