[Bug]: /health endpoint blocked by FastAPI lifespan during service.initialize(), violating documented liveness semantics

### Bug Description

Per [`docs/en/guides/05-observability.md`](docs/en/guides/05-observability.md), `/health` is a "simple liveness check" that responds with `{"status": "ok"}`. In practice the endpoint is gated behind the ASGI lifespan protocol: `service.initialize()` in `app.py` runs before `yield`, so no HTTP request, `/health` included, is processed until initialization completes. When initialization triggers expensive collection recovery on existing workspace data, the server listens on port 1933 but does not answer HTTP requests for minutes. The Docker entrypoint's health-check loop then times out and kills the still-initializing server.

### Steps to Reproduce

1. Deploy OpenViking v0.3.10 via Docker with an existing workspace volume containing populated vector data
2. Start the container: `docker compose up -d`
3. While the server is initializing, run: `curl http://localhost:1933/health`
4. Observe: `curl` hangs (no response) until initialization completes or the entrypoint kills the server
5. The container restarts and the loop repeats
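
Step 3 can also be scripted. The sketch below polls `/health` with a short per-request timeout and records what it sees; on an affected build one would expect a run of `None` entries (no usable answer) until initialization finishes or the container restarts. The helper names (`fetch_status`, `wait_until_ok`) and the timeout values are illustrative assumptions; only the URL, port, and `"ok"` status come from this report.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, List, Optional

HEALTH_URL = "http://localhost:1933/health"  # host and port as used in this report


def _status_of(raw) -> Optional[str]:
    """Extract the "status" field from a file-like JSON response, if any."""
    try:
        data = json.load(raw)
    except (OSError, ValueError):
        return None  # read failure or non-JSON body
    return data.get("status") if isinstance(data, dict) else None


def fetch_status(url: str = HEALTH_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the reported status, or None if the server gave no usable answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return _status_of(resp)
    except urllib.error.HTTPError as err:
        # Non-2xx replies (for example a 503) can still carry a JSON body.
        return _status_of(err)
    except OSError:
        # Connection refused, reset, or timed out.
        return None


def wait_until_ok(
    fetch: Callable[[], Optional[str]] = fetch_status,
    deadline_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Optional[str]]:
    """Poll until the status is "ok" or the deadline passes; return every status seen."""
    seen: List[Optional[str]] = []
    start = time.monotonic()
    while True:
        status = fetch()
        seen.append(status)
        if status == "ok" or time.monotonic() - start >= deadline_s:
            return seen
        sleep(interval_s)


if __name__ == "__main__":
    print(wait_until_ok())
```

Matching on the body's `status` field rather than on the HTTP code alone keeps the same poller usable if `/health` later starts reporting an intermediate state.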

### Expected Behavior

`/health` should respond as soon as the HTTP server is accepting connections — matching the documented "simple liveness check" contract.

### Actual Behavior

`/health` does not respond during `service.initialize()`. When the entrypoint's health-check timeout fires, it sends SIGTERM to the still-initializing server:

```
OpenViking HTTP Server is running on 0.0.0.0:1933
  ...
  File "openviking/storage/vectordb/utils/data_processor.py", line 339
    json.dumps(converted, ensure_ascii=False)
  File "openviking/utils/process_lock.py", line 120
    signal.SIGTERM → KeyboardInterrupt
[openviking-console-entrypoint] openviking-server exited before becoming healthy
```

### Error Logs

```shell
2026-04-29 06:26:53,388 - uvicorn.error - ERROR - Traceback (most recent call last):
  File "/app/.venv/lib/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  ...
  File "openviking/server/app.py", line 78, in lifespan
    await service.initialize()
  File "openviking/service/core.py", line 258, in initialize
    await init_context_collection(self._vikingdb_manager)
  ...
  File "openviking/storage/vectordb/collection/local_collection.py", line 996, in _recover
    index.upsert_data(upsert_list)
  File "openviking/storage/vectordb/utils/data_processor.py", line 339, in convert_fields_for_index
    return json.dumps(converted, ensure_ascii=False)
  File "openviking/utils/process_lock.py", line 120, in <lambda>
    signal.SIGTERM, lambda sig, frame: (_cleanup(), signal.default_int_handler(sig, frame))
KeyboardInterrupt
[openviking-console-entrypoint] openviking-server exited before becoming healthy
```

### OpenViking Version

v0.3.10

### Python Version

3.13

### Operating System

Linux (Debian, Docker)

### Additional Context

**Root cause**: `openviking/server/app.py:146-153` places the entire `service.initialize()` call before `yield` in the ASGI lifespan. The slowest step is `init_context_collection()` at `core.py:265`, which triggers `PersistCollection._recover()`; its duration scales with the volume of existing workspace data.

**Proposed fix**: Separate liveness from readiness by splitting `initialize()` into two phases:

1. **`init_essentials()`** (awaited before `yield`, takes seconds): storage managers, embedder client, encryption setup
2. **`init_deferred()`** (background task, scheduled before `yield` but not awaited, so it runs while the server is already serving): collection recovery, VikingFS init, queue workers

With this split, `/health` would respond `{"status": "starting"}` (HTTP 503) as soon as the server starts, then `{"status": "ok"}` (HTTP 200) once deferred init completes. The Docker entrypoint would poll for `"status":"ok"` rather than just HTTP 200.
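
A framework-agnostic sketch of the proposed split, using only `asyncio`. `StartupState`, the `delay` parameter, and the stub bodies are illustrative assumptions; only the names `init_essentials()` and `init_deferred()` and the two `/health` payloads come from the proposal. In a real lifespan handler the task would have to be created before `yield` and left un-awaited, since code placed after `yield` only runs at shutdown.

```python
import asyncio
from typing import Dict, List, Tuple

HealthReply = Tuple[int, Dict[str, str]]


class StartupState:
    """Tracks whether the deferred initialization phase has finished."""

    def __init__(self) -> None:
        self.ready = asyncio.Event()

    def health(self) -> HealthReply:
        """Return (HTTP status, JSON body) in the shape proposed for /health."""
        if self.ready.is_set():
            return 200, {"status": "ok"}
        return 503, {"status": "starting"}


async def init_essentials() -> None:
    """Stub for the fast phase: storage managers, embedder client, encryption."""
    await asyncio.sleep(0)


async def init_deferred(state: StartupState, delay: float = 0.05) -> None:
    """Stub for the slow phase: collection recovery, VikingFS init, queue workers."""
    await asyncio.sleep(delay)  # stand-in for minutes of recovery work
    state.ready.set()


async def demo() -> List[HealthReply]:
    state = StartupState()
    await init_essentials()  # awaited: would run before `yield`
    # Scheduled but not awaited; keep a reference so the task is not collected.
    task = asyncio.create_task(init_deferred(state))
    observed = [state.health()]  # the server would already be answering here
    await task  # only so the demo can show the second state
    observed.append(state.health())
    return observed


if __name__ == "__main__":
    for status_code, body in asyncio.run(demo()):
        print(status_code, body)
```

A production version would also need to decide what `/health` reports if `init_deferred()` raises; this sketch leaves that case out.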

**Secondary issue**: `openviking/utils/process_lock.py:120` uses `signal.default_int_handler` (the default SIGINT handler) for SIGTERM, so the process raises `KeyboardInterrupt` instead of shutting down cleanly. A possible change, assuming `sys` is imported in that module:

```diff
- signal.signal(signal.SIGTERM, lambda sig, frame: (_cleanup(), signal.default_int_handler(sig, frame)))
+ signal.signal(signal.SIGTERM, lambda sig, frame: (_cleanup(), sys.exit(0)))
```
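Written out as a named function, the handler on the `+` line might look like the sketch below. `make_sigterm_handler` and `install_sigterm_handler` are hypothetical names, and the `cleanup` argument stands in for the module's `_cleanup`; the exit code of 0 simply follows the diff.

```python
import signal
import sys
from typing import Callable


def make_sigterm_handler(cleanup: Callable[[], None]):
    """Build a handler that runs cleanup, then exits by raising SystemExit."""

    def handler(signum, frame):
        cleanup()
        # sys.exit raises SystemExit, so the interpreter unwinds normally
        # instead of surfacing a KeyboardInterrupt traceback.
        sys.exit(0)

    return handler


def install_sigterm_handler(cleanup: Callable[[], None]) -> None:
    """Register the handler for SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, make_sigterm_handler(cleanup))
```

Keeping the handler as a named function rather than a tuple-in-a-lambda also makes it possible to call it directly in a unit test without sending a real signal.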

