sgl-project / mini-sglang Public

Fork 475
Star 3.6k

Pull requests: sgl-project/mini-sglang

20 Open 56 Closed

[Feature] Enable TP size of 8 for Qwen3 MOE models

#95 opened Mar 4, 2026 by jiahe7ay

refactor(tests): convert to pytest-style with integration markers

#94 opened Mar 4, 2026 by MisakaVan • Draft

[Fix] Fix OOM during weight loading with tensor parallelism

#93 opened Mar 4, 2026 by NikitosKh

Add Mistral model support

#92 opened Mar 4, 2026 by NikitosKh

[Chore] Make pre-commit happy

#91 opened Mar 4, 2026 by MisakaVan

Fix: torch.AcceleratorError: CUDA error: an illegal memory access was encountered

#89 opened Mar 1, 2026 by itechbear

[Fix] Fix TP sampler inconsistency bug

#85 opened Feb 26, 2026 by DarkSharpness

[Feature] Support hierarchical cache

#82 opened Feb 24, 2026 by DarkSharpness

Add graph replay dump tensor tool

#72 opened Jan 30, 2026 by wlc952

Adding non-streaming response (stream=False)

#69 opened Jan 20, 2026 by goswamig

feat: Add INT8 quantization support

#57 opened Dec 30, 2025 by louiswang524

perf: Optimize CUDA graph batch size selection and padding

#56 opened Dec 30, 2025 by louiswang524

feat: Implement batch tokenization for improved throughput

#55 opened Dec 30, 2025 by louiswang524

[Refactor] Restructure test suite to match source layout and isolate benchmarks

#53 opened Dec 29, 2025 by DhiraPT

[Feature] Add MLA configuration and KV cache storage kernel

#42 opened Dec 23, 2025 by DhiraPT

[Education] Offline benchmark performance of Qwen3-0.6B on MLX (CPU) and Modal (GPU)

#40 opened Dec 23, 2025 by lamng3

[Feature] Implement variable page size support

#33 opened Dec 22, 2025 by DhiraPT

docs: align README/features CLI examples with args.py

#29 opened Dec 21, 2025 by Taskrwu

[Improvement] Enhance engine error handling and documentation add more logging and doc

#23 opened Dec 20, 2025 by louiswang524

Request-scoped Torch profiler via profile flag.

#14 opened Dec 18, 2025 by AdamLouly
