Local BGE Embeddings API

Use the BAAI/bge-small-zh-v1.5 model locally through an OpenAI-compatible /v1/embeddings endpoint powered by FastAPI.

1. Environment Setup

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

2. Configure Hugging Face Mirror & Cache

The service defaults to the Hugging Face China mirror (https://hf-mirror.com). Override it before the first run if you prefer a different mirror.

export HF_ENDPOINT=https://hf-mirror.com         # or your preferred mirror URL
export EMBEDDING_CACHE_DIR=$(pwd)/model_cache    # persistent local cache

You can pre-download the model once (optional but recommended):

python - <<'PY'
import os
from sentence_transformers import SentenceTransformer

# Read the cache directory exported above (a quoted heredoc does not expand
# shell variables, so it must be read from the environment in Python).
cache_dir = os.environ.get("EMBEDDING_CACHE_DIR")
SentenceTransformer("BAAI/bge-small-zh-v1.5", cache_folder=cache_dir, device="cpu")
PY

3. Run the API Service

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

FastAPI will serve interactive docs at http://localhost:8000/docs.

4. OpenAI-Compatible Request Example

curl -X POST "http://localhost:8000/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "bge-small-zh-v1.5",
        "input": ["今天天气很好", "自然语言处理"],
        "user": "demo-user"
      }'

Response excerpt:

{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [...]},
    {"object": "embedding", "index": 1, "embedding": [...]} 
  ],
  "model": "bge-small-zh-v1.5",
  "usage": {"prompt_tokens": 9, "total_tokens": 9}
}
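
The embedding vectors come back ordered by `index`, so they can be matched to the inputs positionally. A minimal sketch of consuming such a response, using a mocked payload with toy 3-dimensional vectors (real embeddings from bge-small-zh-v1.5 are higher-dimensional):

```python
import math

# Mocked response in the shape shown above; toy 3-d vectors stand in for
# the model's real embedding vectors.
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.2, 0.1]},
    ],
    "model": "bge-small-zh-v1.5",
}

# Sort by index so vectors line up with the original input order.
vectors = [
    item["embedding"]
    for item in sorted(response["data"], key=lambda d: d["index"])
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

score = cosine_similarity(vectors[0], vectors[1])
print(round(score, 4))
```

Embeddings from bge models are typically compared with cosine similarity, as above.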

5. Configuration Reference

  • EMBEDDING_MODEL_NAME: switch to a different SentenceTransformer checkpoint.
  • EMBEDDING_DEVICE: set to cuda, mps, etc. Defaults to CPU.
  • EMBEDDING_BATCH_SIZE: control batch size for encode().
  • EMBEDDING_CACHE_DIR: persistent model/cache directory (also reused for Hugging Face cache when provided).
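
These are plain environment variables, so the fallback behaviour can be sketched as follows (illustrative only; the defaults shown here are assumptions, and the real ones live in the service's config code):

```python
import os

# Assumed defaults for illustration -- check the service's config module
# for the authoritative values.
MODEL_NAME = os.getenv("EMBEDDING_MODEL_NAME", "BAAI/bge-small-zh-v1.5")
DEVICE = os.getenv("EMBEDDING_DEVICE", "cpu")
BATCH_SIZE = int(os.getenv("EMBEDDING_BATCH_SIZE", "32"))
CACHE_DIR = os.getenv("EMBEDDING_CACHE_DIR")  # None -> default HF cache location

print(MODEL_NAME, DEVICE, BATCH_SIZE, CACHE_DIR)
```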

⚠️ Token usage in the response is a simple character-count heuristic, not real tokenization. Plug in your own tokenizer if you need exact counts.
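
One possible character-based heuristic looks like this (a sketch of the idea, not the service's exact code, so its numbers will not necessarily match the excerpt above):

```python
def approximate_token_count(inputs: list[str]) -> int:
    """Character-count stand-in for real tokenization; it over- or
    under-counts depending on the text, but is fine for rough accounting."""
    return sum(len(text) for text in inputs)

usage = approximate_token_count(["今天天气很好", "自然语言处理"])
print(usage)  # 12: six characters per input
```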
