Security: EAddario/llama.cpp

Security Policy

Reporting a vulnerability

If you have discovered a security vulnerability in this project that falls inside the covered topics, please report it privately. Do not disclose it as a public issue. This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released.

Please disclose it as a private security advisory.

This project is maintained by a team of volunteers on a reasonable-effort basis. As such, please give us at least 90 days to work on a fix before public exposure.

Important

For collaborators: if you are interested in helping review private security disclosures, please see: ggml-org#18080

Requirements

Before submitting your report, ensure you meet the following requirements:

  • You have read this policy and fully understand it.
  • AI is only permitted in an assistive capacity as stated in AGENTS.md. We do not accept reports that are written exclusively by AI.
  • Your report must include a working Proof-of-Concept in the form of a script and/or attached files.

Maintainers reserve the right to close the report if these requirements are not fulfilled.

Covered Topics

Only vulnerabilities that fall within these parts of the project are considered valid. For problems falling outside of this list, please report them as issues.

  • src/**/*
  • ggml/**/*
  • gguf-py/**/*
  • tools/server/*, excluding the following topics:
    • Web UI
    • Features marked as experimental
    • Features not recommended for use in untrusted environments (e.g., router, MCP)
    • Bugs that can lead to Denial-of-Service attacks

Note that none of the topics under Using llama.cpp securely are considered vulnerabilities in llama.cpp.

For vulnerabilities that fall within the vendor directory, please report them directly to the third-party project.

Using llama.cpp securely

Untrusted models

Be careful when running untrusted models. This classification includes models created by unknown developers or utilizing data obtained from unknown sources.

Always execute untrusted models within a secure, isolated environment such as a sandbox (e.g., containers, virtual machines). This helps protect your system from potentially malicious code.
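
As a rough illustration of the sandboxing advice above, the sketch below builds a `docker run` command line that starts inference with no network access, a read-only filesystem, and dropped capabilities. The image name `example/llama-cpp:latest`, the resource limits, and the helper name `build_sandboxed_command` are placeholders chosen for this example, not part of llama.cpp; adapt them to your own setup.

```python
from pathlib import Path


def build_sandboxed_command(model_path: str, prompt: str,
                            image: str = "example/llama-cpp:latest") -> list[str]:
    """Return a `docker run` argv that runs inference with reduced privileges.

    The image name is a placeholder (assumption); substitute an image you trust.
    """
    model = Path(model_path).resolve()
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access inside the container
        "--read-only",                          # container root filesystem is read-only
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", "8g",                       # illustrative memory cap
        "--pids-limit", "256",                  # illustrative process cap
        "-v", f"{model.parent}:/models:ro",     # mount the model directory read-only
        image,
        "llama-cli", "-m", f"/models/{model.name}", "-p", prompt,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing an argument list rather than a shell string avoids shell interpretation of the prompt text.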

Note

The trustworthiness of a model is not binary. You must always determine the proper level of caution depending on the specific model and how it matches your use case and risk tolerance.

Untrusted inputs

Some models accept various input formats (text, images, audio, etc.). The libraries converting these inputs have varying security levels, so it's crucial to isolate the model and carefully pre-process inputs to mitigate script injection risks.

For maximum security when handling untrusted inputs, you may need to employ the following:

  • Sandboxing: Isolate the environment where the inference happens.
  • Pre-analysis: Check how the model performs by default when exposed to prompt injection (e.g. using fuzzing for prompt injection). The results indicate how much effort the following measures will require.
  • Updates: Keep both llama.cpp and your libraries updated with the latest security patches.
  • Input Sanitization: Before feeding data to the model, sanitize inputs rigorously. This involves techniques such as:
    • Validation: Enforce strict rules on allowed characters and data types.
    • Filtering: Remove potentially malicious scripts or code fragments.
    • Encoding: Convert special characters into safe representations.
    • Verification: Run tooling that identifies potential script injections (e.g. models that detect prompt injection attempts).
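
The validation, filtering, and encoding steps above might be combined along the following lines. This is a minimal sketch under stated assumptions: the length limit, the control-character rule, and the `<script>` filter are illustrative choices, not a policy prescribed by this project, and a pattern-based filter alone does not stop prompt injection.

```python
import html
import re
import unicodedata

MAX_LEN = 4096  # hypothetical limit; tune for your application
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_SCRIPT = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def sanitize_input(text: str) -> str:
    """Validate, filter, and encode untrusted text before it reaches the model."""
    # Validation: enforce the expected type and a maximum length.
    if not isinstance(text, str):
        raise TypeError("input must be a string")
    if len(text) > MAX_LEN:
        raise ValueError("input exceeds maximum length")
    # Normalize look-alike Unicode forms, then drop control characters.
    text = unicodedata.normalize("NFKC", text)
    text = _CONTROL.sub("", text)
    # Filtering: remove embedded script blocks.
    text = _SCRIPT.sub("", text)
    # Encoding: convert special characters into HTML-safe representations.
    return html.escape(text, quote=True)
```

HTML escaping is only appropriate when the text (or the model output derived from it) will later be rendered in a web context; other sinks need their own encoding.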

Data privacy

To protect sensitive data from potential leaks or unauthorized access, it is crucial to sandbox the model execution. This means running the model in a secure, isolated environment, which helps mitigate many attack vectors.

Untrusted environments or networks

If you can't run your models in a secure and isolated environment, or if they must be exposed to an untrusted network, make sure to take the following security precautions:

  • Do not use the RPC backend, rpc-server and llama-server functionality (see ggml-org#13061).
  • Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value.
  • Encrypt your data if sending it over the network.
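
Checking a downloaded artifact against a known-good hash can be sketched as follows. The example assumes the publisher provides a SHA-256 digest as a hex string; the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the known-good value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Obtain the expected digest from a channel you trust independently of the download itself; a hash served from the same compromised location offers no protection.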

Multi-Tenant environments

If you intend to run multiple models in parallel with shared memory, it is your responsibility to ensure the models do not interact or access each other's data. The primary areas of concern are tenant isolation, resource allocation, model sharing and hardware attacks.

  1. Tenant Isolation: Models should run separately with strong isolation methods to prevent unwanted data access. Separating networks is crucial for isolation, as it prevents unauthorized access to data or models and stops malicious users from sending graphs to execute under another tenant's identity.

  2. Resource Allocation: A denial of service caused by one model can impact the overall system health. Implement safeguards like rate limits, access controls, and health monitoring.

  3. Model Sharing: In a multitenant model sharing design, tenants and users must understand the security risks of running code provided by others. Since there are no reliable methods to detect malicious models, sandboxing the model execution is the recommended approach to mitigate the risk.

  4. Hardware Attacks: GPUs or TPUs can also be attacked. Research has shown that side channel attacks on GPUs are possible, which can make data leak from other models or processes running on the same system at the same time.
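
One of the safeguards named under Resource Allocation, rate limiting, could be approximated with a token bucket per tenant. The class below is a minimal single-threaded sketch; the name `TokenBucket` and its parameters are assumptions made for this example, and a production deployment would also need locking and per-tenant bookkeeping.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start with a full bucket
        self.clock = clock          # injectable clock, useful for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits the budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Keeping one bucket per tenant, for example in a dictionary keyed by tenant ID, means one tenant exhausting its budget does not consume another tenant's share.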
