fix(api): confine /tasks/file reads to OUTPUT_DIR (path traversal)#223
fix(api): confine /tasks/file reads to OUTPUT_DIR (path traversal)#223Osamaali313 wants to merge 2 commits into
Conversation
The GET /api/v1/tasks/file endpoint (read_file) returned the contents of any client-supplied path after only checking that it exists and is a file, with no confinement to an allowed directory. A request such as `/api/v1/tasks/file?path=/etc/passwd` (or a "../" traversal) would read and return any file readable by the service process -- an arbitrary file read. Resolve the requested path (collapsing "../" and symlinks) and require it to live under settings.OUTPUT_DIR -- the only location task outputs and uploads are written (uploads default to OUTPUT_DIR; task artifacts live in OUTPUT_DIR/<task_id>/). Legitimate files served by this endpoint (the OCR result images/markdown) are unaffected; out-of-tree paths now return 403.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR hardens the read_file API by preventing arbitrary file reads, confining accessible paths to the configured OUTPUT_DIR.
Changes:
- Resolve and validate requested file paths against
settings.OUTPUT_DIR - Return 400 for invalid paths and 403 for paths outside the allowed directory
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Confine reads to the configured output directory. Resolve the | ||
| # requested path (collapsing any "../" traversal and symlinks) and | ||
| # require it to live under OUTPUT_DIR; otherwise an arbitrary `path` | ||
| # (e.g. "/etc/passwd") would be read and returned, exposing any file | ||
| # readable by the service. | ||
| base_dir = Path(settings.OUTPUT_DIR).resolve() | ||
| try: | ||
| file_path = Path(path).resolve() | ||
| except (OSError, ValueError, RuntimeError): | ||
| raise HTTPException( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail="Invalid path", | ||
| ) | ||
| if not file_path.is_relative_to(base_dir): | ||
| raise HTTPException( | ||
| status_code=status.HTTP_403_FORBIDDEN, | ||
| detail="Access denied: path is outside the allowed directory", | ||
| ) |
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail="Invalid path", | ||
| ) | ||
| if not file_path.is_relative_to(base_dir): |
Per review: Path(path).resolve() resolved a relative `path` against the process CWD, so a legitimate OUTPUT_DIR-relative request (e.g. "task1/img.png") would be rejected. Resolve non-absolute paths under OUTPUT_DIR before the containment check -- preserving the intended semantics while still blocking traversal and absolute out-of-tree paths.
|
Thanks @copilot — addressed in 62ade70. Relative paths vs CWD: good catch. Non-absolute paths are now joined to
|
Problem (arbitrary file read / path traversal)
GET /api/v1/tasks/file(read_fileinapps/backend/app/api/tasks.py) returns the contents of any path the client supplies, after checking only that it exists and is a file:There is no confinement to an allowed directory, so the endpoint reads and returns any file the service process can access:
For a self-hostable service this is an arbitrary file read — config files, source, SQLite DB, keys, etc. become readable by anyone who can reach the API.
Fix
Confine reads to
settings.OUTPUT_DIR, which is the only location this app writes the files this endpoint is meant to serve:OUTPUT_DIR(upload_file_manager.save_to_path),OUTPUT_DIR/<task_id>/(pipeline_flow).The requested path is resolved (collapsing
../and symlinks) and required to be underOUTPUT_DIR; otherwise it returns403.Validation
Verified the confinement decision against the real serving logic:
exists && is_file)OUTPUT_DIR/<task_id>/img.png?path=<file outside OUTPUT_DIR>?path=<OUTPUT_DIR>/../../secret(traversal)?path=/etc/hostname(orC:\Windows\win.ini)Legitimate use (the markdown image URLs this endpoint backs) is unchanged; only out-of-tree paths are rejected.