Skip to content
This repository was archived by the owner on Jun 24, 2024. It is now read-only.
This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Paged Attention #333

@vikigenius

Description

@vikigenius

Just found a recent blog https://vllm.ai/ and repo https://github.com/vllm-project/vllm that implements paged attention. Tested this out and it provides massive throughput and memory efficiency improvements.

Can we implement something like this? The paper isn't out yet. But shouldn't Rust be very good at this in theory with it's memory safety guarantees.

Metadata

Metadata

Assignees

No one assigned

    Labels

    issue:enhancementNew feature or requesttopic:backend-supportSupport for alternate non-GGML backends, or for particular GGML backend features

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions