feat: complete project migration to modern tech stack (Python 3.13, FastAPI, PostgreSQL 16) #12
Conversation
Pull request overview
Migrates the memU server project into a new Python 3.13/uv-managed FastAPI backend with PostgreSQL/pgvector, Alembic migrations, CI, and pre-commit tooling.
Changes:
- Introduces new FastAPI app initialization with env validation and basic `/`, `/memorize`, `/retrieve` endpoints.
- Adds PostgreSQL + pgvector + Temporal local infrastructure (docker-compose) and Alembic migration scaffolding.
- Adds CI workflow, pre-commit configuration, Makefile automation, and initial pytest coverage.
Reviewed changes
Copilot reviewed 21 out of 34 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_health.py | Adds a basic root endpoint health test. |
| tests/test_env_validation.py | Adds tests verifying startup env var validation behavior. |
| tests/__init__.py | Marks tests as a package. |
| storage/.gitkeep | Keeps storage directory tracked in git. |
| pyproject.toml | Defines Python 3.13 project deps and tool configuration (ruff/pytest/bandit/pylint/mypy). |
| docker-compose.yml | Adds local Postgres (pgvector) and Temporal services. |
| config/__init__.py | Initializes config package. |
| app/workers/__init__.py | Initializes workers package. |
| app/utils/__init__.py | Initializes utils package. |
| app/services/__init__.py | Initializes services package. |
| app/models/__init__.py | Initializes models package. |
| app/main.py | Implements FastAPI app, env validation, and endpoints wired to MemoryService. |
| app/database.py | Adds SQLAlchemy async engine/session setup from env. |
| app/api/v1/__init__.py | Initializes versioned API package. |
| app/api/__init__.py | Initializes API package. |
| alembic/script.py.mako | Adds Alembic revision template. |
| alembic/env.py | Adds Alembic environment configuration for migrations/autogenerate. |
| alembic.ini | Adds Alembic configuration file. |
| README.md | Rewrites documentation for the new architecture, setup, and workflows. |
| Makefile | Adds common developer commands (sync, lint, test, migrate, pre-commit). |
| .python-version | Pins local Python version to 3.13. |
| .pre-commit-hooks/check_test_coverage.py | Adds a custom pre-push coverage gate hook. |
| .pre-commit-hooks/check_python_credentials.py | Adds custom scanner for hardcoded credentials in Python sources. |
| .pre-commit-hooks/check_markdown_formatting.py | Adds Markdown formatting validation hook. |
| .pre-commit-hooks/check_makefile_commands.py | Adds Makefile command validation hook. |
| .pre-commit-hooks/check_config_credentials.py | Adds config credential scanning hook. |
| .pre-commit-hooks/check_alembic_migrations.py | Adds hook to prevent empty Alembic migrations. |
| .pre-commit-config.yaml | Adds comprehensive pre-commit configuration (ruff, bandit, local hooks, etc.). |
| .gitignore | Updates ignore rules for venvs, env files, logs, storage/data, etc. |
| .github/workflows/code-quality.yml | Adds GitHub Actions quality pipeline (format/lint/tests/coverage upload). |
| .github/ISSUE_TEMPLATE/improvement_suggestion.yml | Normalizes issue template formatting. |
| .github/ISSUE_TEMPLATE/feature_request.yml | Normalizes issue template formatting. |
| .github/ISSUE_TEMPLATE/bug_report.yml | Normalizes issue template formatting. |
| .env.example | Adds example env file for DB/Temporal/LLM/embedding/storage configuration. |
```yaml
- POSTGRES_USER=memu_user
- POSTGRES_PWD=memu_pass
- POSTGRES_SEEDS=postgres
- POSTGRES_DB=temporal
```
The Temporal container hardcodes PostgreSQL credentials/database name, while the postgres service allows overrides via environment variables. If POSTGRES_USER/POSTGRES_PASSWORD/POSTGRES_DB are overridden, Temporal will still try memu_user/memu_pass/temporal and fail to connect. Use the same variable references/defaults here (and consider aligning POSTGRES_DB if both services share the same cluster).
Suggested change:

```diff
-      - POSTGRES_USER=memu_user
-      - POSTGRES_PWD=memu_pass
-      - POSTGRES_SEEDS=postgres
-      - POSTGRES_DB=temporal
+      - POSTGRES_USER=${POSTGRES_USER:-memu_user}
+      - POSTGRES_PWD=${POSTGRES_PASSWORD:-memu_pass}
+      - POSTGRES_SEEDS=postgres
+      - POSTGRES_DB=${POSTGRES_DB:-temporal}
```
```makefile
test:
	@pytest -v || echo "⚠️ No tests found or tests failed"
```
This target swallows pytest failures (the `echo` makes the recipe succeed), so `make test`/`make check` can report success even when tests fail. Remove the `|| echo ...` fallback, or make the command return a non-zero exit status when pytest fails.
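The masking is easy to reproduce in plain shell: with `|| echo`, the recipe's exit status is echo's, not pytest's. A small demonstration, using `false` to stand in for a failing pytest run:

```shell
# `|| echo` swallows the non-zero status: the recipe is judged by echo's
# exit 0, so make would report success even though the command failed.
false || echo "tests failed"
echo "status after masked failure: $?"

# Keep the message but capture the real status first, so the recipe
# (or a trailing `exit $status`) can still fail the target.
status=0
false || status=$?
if [ "$status" -ne 0 ]; then
  echo "tests failed (status $status)"
fi
echo "preserved status: $status"
```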
```makefile
	ruff check .
	@echo "🔍 Running pylint..."
	pylint app/ --fail-under=8.0 || true
```
This also hides pylint failures (`|| true`), so `make lint`/`make check` can pass even when pylint is below the configured threshold. If pylint is meant to be advisory, it should be clearly separated; otherwise remove `|| true` so the target fails on lint errors.
```python
    raise RuntimeError(
        "DATABASE_URL environment variable is not set. "
        "Please set DATABASE_URL to your PostgreSQL connection string. "
        "Example: postgresql+psycopg://user:pass@localhost:54320/dbname"
    )
```
The app startup validation requires DATABASE_URL, but the repo also documents/supports configuring the DB via individual DATABASE_HOST/PORT/USER/PASSWORD/NAME variables (see README and app/database.py). Consider reusing the same DB URL resolution logic here (accept DATABASE_URL or construct from the individual vars) so configuration is consistent across the project.
Suggested change:

```diff
-    raise RuntimeError(
-        "DATABASE_URL environment variable is not set. "
-        "Please set DATABASE_URL to your PostgreSQL connection string. "
-        "Example: postgresql+psycopg://user:pass@localhost:54320/dbname"
-    )
+    db_host = os.getenv("DATABASE_HOST")
+    db_port = os.getenv("DATABASE_PORT", "5432")
+    db_user = os.getenv("DATABASE_USER")
+    db_password = os.getenv("DATABASE_PASSWORD")
+    db_name = os.getenv("DATABASE_NAME")
+    if all([db_host, db_user, db_password, db_name]):
+        database_url = (
+            f"postgresql+psycopg://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
+        )
+    else:
+        raise RuntimeError(
+            "Database configuration is not set correctly. "
+            "Either set DATABASE_URL to your PostgreSQL connection string "
+            "(e.g. postgresql+psycopg://user:pass@localhost:54320/dbname) "
+            "or set DATABASE_HOST, DATABASE_PORT, DATABASE_USER, "
+            "DATABASE_PASSWORD, and DATABASE_NAME environment variables."
+        )
```
Force-pushed 321d1b2 to 2ffe5d5, then 2ffe5d5 to 05a3c2b.
Pull request overview
Copilot reviewed 21 out of 34 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)

app/main.py:66
- This async endpoint performs synchronous disk I/O (`Path.open()` + `json.dump()`), which can block the event loop under load. Consider using async file I/O (e.g., via an async file library) or offloading the write to a threadpool to keep request handling responsive.
```python
@app.post("/memorize")
async def memorize(payload: dict[str, Any]):
    try:
        file_path = storage_dir / f"conversation-{uuid.uuid4().hex}.json"
        with file_path.open("w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False)
```
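One way to keep the event loop free, sketched with `asyncio.to_thread` (a hypothetical rewrite of the write path, not the PR's code; the function name is illustrative):

```python
import asyncio
import json
import uuid
from pathlib import Path
from typing import Any


async def persist_payload(storage_dir: Path, payload: dict[str, Any]) -> Path:
    """Write the payload on a worker thread so the event loop is not blocked."""
    file_path = storage_dir / f"conversation-{uuid.uuid4().hex}.json"

    def _write() -> None:
        # Plain synchronous I/O, but executed off the event loop
        with file_path.open("w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False)

    # Offload the blocking disk write to the default threadpool
    await asyncio.to_thread(_write)
    return file_path
```

The endpoint would then `await persist_payload(storage_dir, payload)` instead of writing inline.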
```python
DATABASE_URL = os.getenv("DATABASE_URL")
if not DATABASE_URL:
    db_host = os.getenv("DATABASE_HOST")
    db_port = os.getenv("DATABASE_PORT")
    db_user = os.getenv("DATABASE_USER")
    db_pass = os.getenv("DATABASE_PASSWORD")
    db_name = os.getenv("DATABASE_NAME")

    missing_vars = [
        name
        for name, value in [
            ("DATABASE_HOST", db_host),
            ("DATABASE_PORT", db_port),
            ("DATABASE_USER", db_user),
            ("DATABASE_PASSWORD", db_pass),
            ("DATABASE_NAME", db_name),
        ]
        if not value
    ]

    if missing_vars:
        raise RuntimeError(
            f"Database configuration is incomplete. Missing environment variables: {', '.join(missing_vars)}"
        )

    DATABASE_URL = f"postgresql+asyncpg://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"

# Create SQLAlchemy async engine
engine = create_async_engine(
    DATABASE_URL,
    pool_pre_ping=True,
    pool_size=10,
    max_overflow=20,
)
```
`DATABASE_URL` is constructed with the `postgresql+asyncpg://` dialect, but the project dependencies/lockfile don't include asyncpg, so `create_async_engine()` will fail at runtime. Either add asyncpg as a dependency (and lock it) or switch the constructed URL to use the already-declared psycopg driver (e.g., `postgresql+psycopg://`) to match the rest of the repo.
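Since SQLAlchemy 2.x can also drive its async engine through psycopg (v3), another option besides adding asyncpg is to normalize whatever URL gets resolved onto the one declared driver. A minimal stdlib sketch (the helper name is illustrative):

```python
def with_driver(url: str, driver: str) -> str:
    """Rewrite the driver part of a SQLAlchemy-style database URL.

    Illustrative helper: "postgresql+asyncpg://..." or a bare
    "postgresql://..." both become "postgresql+<driver>://...".
    """
    scheme, _, rest = url.partition("://")
    dialect = scheme.split("+", 1)[0]  # drop any existing driver suffix
    return f"{dialect}+{driver}://{rest}"
```

The constructed URL (or one coming from the environment) could then be passed through `with_driver(url, "psycopg")` before `create_async_engine()`.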
```python
# Import app models early for autogenerate support
from app.database import Base
```
Importing Base from app.database triggers DB env validation and creates an async engine at import time, which can break Alembic commands (especially offline mode) and makes migrations depend on runtime DB settings. Consider moving Base/metadata to a side-effect-free module (no engine creation) and have env.py import only metadata/models.
```python
# Set required environment variables for testing before importing app
os.environ["OPENAI_API_KEY"] = "test-key-for-testing"
os.environ["DATABASE_URL"] = "postgresql+psycopg://test_user:test_pass@localhost:54320/test_db"

from app.main import app  # noqa: E402  # pylint: disable=wrong-import-position

client = TestClient(app)
```
This test sets environment variables at import time and never restores them, which can leak state across tests and also overrides CI-provided values (e.g., the workflow sets DATABASE_URL=sqlite:///:memory:). Prefer using monkeypatch/fixtures to set env vars per test (or session) and cleanly restore them afterward.
```python
# Clean up individual vars
for var in ["DATABASE_HOST", "DATABASE_PORT", "DATABASE_USER", "DATABASE_PASSWORD", "DATABASE_NAME"]:
    if var in os.environ and not locals().get(f"original_{var.lower().replace('database_', '')}"):
        del os.environ[var]
```
The cleanup loop deletes DATABASE_* vars based on locals().get(...), but this test never saved the original values for DATABASE_HOST, DATABASE_PORT, etc. As a result, it can delete pre-existing env vars from the test runner and affect other tests. Save/restore originals for each var (or use monkeypatch.delenv/setenv) instead of relying on locals() here.
Force-pushed 05a3c2b to 14612c1, then 14612c1 to 3590b3d.
Pull request overview
Copilot reviewed 14 out of 28 changed files in this pull request and generated 4 comments.
```python
original_key = os.environ.get("OPENAI_API_KEY")
original_db = os.environ.get("DATABASE_URL")

try:
    # Set valid API key and individual DB variables
    os.environ["OPENAI_API_KEY"] = "test-key"
    if "DATABASE_URL" in os.environ:
        del os.environ["DATABASE_URL"]
    os.environ["DATABASE_HOST"] = "localhost"
    os.environ["DATABASE_PORT"] = "54320"
    os.environ["DATABASE_USER"] = "test_user"
    os.environ["DATABASE_PASSWORD"] = "test_pass"
    os.environ["DATABASE_NAME"] = "test_db"

    # Remove app.main from sys.modules to force reimport
    if "app.main" in sys.modules:
        del sys.modules["app.main"]

    # Should not raise
    from app.main import app

    assert app is not None
    assert app.title == "memU Server"

finally:
    # Restore original values
    if original_key:
        os.environ["OPENAI_API_KEY"] = original_key
    else:
        if "OPENAI_API_KEY" in os.environ:
            del os.environ["OPENAI_API_KEY"]
    if original_db:
        os.environ["DATABASE_URL"] = original_db
    else:
        if "DATABASE_URL" in os.environ:
            del os.environ["DATABASE_URL"]
    # Clean up individual vars
    for var in ["DATABASE_HOST", "DATABASE_PORT", "DATABASE_USER", "DATABASE_PASSWORD", "DATABASE_NAME"]:
        if var in os.environ and not locals().get(f"original_{var.lower().replace('database_', '')}"):
            del os.environ[var]
    # Clean up module cache
    if "app.main" in sys.modules:
        del sys.modules["app.main"]
```
This test sets DATABASE_HOST/PORT/USER/PASSWORD/NAME but does not save and restore any pre-existing values for those variables. The finally block then deletes them unconditionally (because the corresponding original_* locals don't exist), which can leak state across tests (and can break local runs where these env vars are set). Capture originals (including DATABASE_PORT) and restore them, or use monkeypatch to isolate env changes.
```python
from fastapi.testclient import TestClient

# Set required environment variables for testing before importing app
os.environ["OPENAI_API_KEY"] = "test-key-for-testing"
os.environ["DATABASE_URL"] = "postgresql+psycopg://test_user:test_pass@localhost:54320/test_db"

from app.main import app  # noqa: E402  # pylint: disable=wrong-import-position

client = TestClient(app)


def test_root_endpoint():
```
The test module mutates process environment variables at import time and keeps a global TestClient. This can create order-dependent behavior and leak env settings into other tests. Prefer using a fixture (e.g., monkeypatch) to set env vars per test and construct the TestClient inside the test/fixture so state is isolated.
Suggested change:

```diff
-from fastapi.testclient import TestClient
-
-# Set required environment variables for testing before importing app
-os.environ["OPENAI_API_KEY"] = "test-key-for-testing"
-os.environ["DATABASE_URL"] = "postgresql+psycopg://test_user:test_pass@localhost:54320/test_db"
-
-from app.main import app  # noqa: E402  # pylint: disable=wrong-import-position
-
-client = TestClient(app)
-
-
-def test_root_endpoint():
+import pytest
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture
+def client(monkeypatch) -> TestClient:
+    """Provide a TestClient with test-specific environment variables."""
+    monkeypatch.setenv("OPENAI_API_KEY", "test-key-for-testing")
+    monkeypatch.setenv(
+        "DATABASE_URL",
+        "postgresql+psycopg://test_user:test_pass@localhost:54320/test_db",
+    )
+    # Import the app after setting environment variables so it picks up test config
+    from app.main import app  # noqa: E402  # pylint: disable=wrong-import-position
+
+    return TestClient(app)
+
+
+def test_root_endpoint(client: TestClient):
```
Major changes:
- Complete project migration from original repository
- Python 3.13 environment with uv package manager
- FastAPI + SQLModel + PostgreSQL 16 + pgvector
- Temporal workflow engine integration
- Comprehensive pre-commit hooks (20 checks)
  * Code formatting: ruff format
  * Linting: ruff, pylint (9.84/10)
  * Security: bandit
  * Custom validators: credentials, migrations, makefile, markdown
- Test coverage: 41% with automated checks
- GitHub Actions CI/CD pipeline
- Environment variable validation at startup
- All security best practices enforced
- Complete English documentation
Force-pushed 3590b3d to 2766fb9.