A premium CSV operations workspace for cleaning, transforming, enriching, verifying, and recovering datasets at speed.
Built for ops, revops, lead-gen, research, and internal tooling teams who live in spreadsheets but need something far more powerful.
DataForge AI combines a polished Next.js frontend with a local verification sidecar so teams can:
- upload messy CSVs and inspect them instantly
- clean and normalize data inside a focused workspace
- enrich rows through AI-assisted flows
- verify email lists with a dedicated backend
- save encrypted projects to Firebase and reopen them later
This repo is best thought of as a product foundation for a modern data-ops console, not just a CSV parser.
Most CSV tools force you to choose between speed, control, and extensibility. DataForge AI is built to deliver all three:
- Fast UX: drag-and-drop upload, immediate parsing, focused workspace controls
- Serious persistence: Firebase auth plus encrypted Firestore project storage
- Real operations value: local email verification backend, queue controls, result downloads
- Product-ready structure: clear frontend/backend split with API surfaces already in place
| Area | What it does | Status |
|---|---|---|
| Workspace | Upload CSVs, inspect rows, track columns, edit dataset state | Stable |
| Cleaning | Trim whitespace, title case, deduplicate, sort, type conversion | Stable |
| Saved Projects | Google sign-in, AES encryption, Firestore storage, reload later | Stable |
| Verification | Upload verification CSVs, detect email column, run and manage jobs | Stable |
| AI Enrichment | Model selection, prompt modal, transform API surface | Partially wired |
- Drag-and-drop CSV ingestion
- Table-based data exploration
- Zustand-powered client state
- Responsive workspace layout with dedicated tool panels
- Trim leading and trailing whitespace
- Convert text to title case
- Remove duplicate rows
- Sort rows by selected columns
- Convert values between string and numeric types
- Sign in with Google via Firebase Auth
- Encrypt CSV payloads before persistence
- Chunk large encrypted payloads to respect Firestore size limits
- Recover projects from the dashboard
- Local Express sidecar for verification workflows
- CSV upload with email-column detection
- Queue controls for pause, resume, stop, and clear
- Pollable status and paginated results endpoints
- Result export via CSV download URLs
- Local browser storage for OpenAI API key usage
- Model discovery and selection
- Next.js API routes for transforms and model listing
Note: the AI UI is ahead of the current execution wiring. The modal and backend-facing surface exist, but some enrichment buttons still run as scaffolded placeholder flows rather than fully wired production jobs.
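One detail from the persistence features above worth illustrating is chunking encrypted payloads to stay under Firestore's per-document size limit. The sketch below shows the general idea; the chunk size, field names, and `PayloadChunk` shape are illustrative assumptions, not the repo's actual implementation:

```typescript
// Sketch: split an encrypted payload string into Firestore-sized chunks
// and reassemble them on load. CHUNK_SIZE and the chunk shape are
// illustrative assumptions, not taken from the repo.
const CHUNK_SIZE = 900_000; // stay safely under Firestore's ~1 MiB document limit

interface PayloadChunk {
  index: number; // position for ordered reassembly
  total: number; // total chunk count, sanity-checked on load
  data: string;  // slice of the encrypted payload
}

function chunkPayload(encrypted: string, chunkSize = CHUNK_SIZE): PayloadChunk[] {
  const total = Math.max(1, Math.ceil(encrypted.length / chunkSize));
  const chunks: PayloadChunk[] = [];
  for (let i = 0; i < total; i++) {
    chunks.push({
      index: i,
      total,
      data: encrypted.slice(i * chunkSize, (i + 1) * chunkSize),
    });
  }
  return chunks;
}

function reassemblePayload(chunks: PayloadChunk[]): string {
  const sorted = [...chunks].sort((a, b) => a.index - b.index);
  if (sorted.length === 0 || sorted.length !== sorted[0].total) {
    throw new Error("missing payload chunks");
  }
  return sorted.map((c) => c.data).join("");
}
```

Each chunk carries its own `index` and `total` so a partially written project can be detected on reload instead of silently decrypting truncated data.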
- Next.js 16
- React 19
- TypeScript
- Tailwind CSS 4
- Framer Motion
- Zustand
- Papa Parse
- Firebase Auth
- Firestore
- Firebase Storage
- OpenAI SDK
- crypto-js
- Express 5
- SQLite
- Multer
- Winston
- Jest
```
app/                  Next.js App Router pages and API routes
app/api/lead-ops/     Same-origin proxy to the Lead Ops FastAPI backend
app/lead-ops/         Lead Ops product route
components/           Product UI and interaction surfaces
context/              Auth provider and shared React context
lib/                  Firebase, crypto, store, CSV, and backend clients
brandnav_backend/     Local sidecar for email verification
lead_ops_backend/     FastAPI + Celery service for lead ingestion workflows
public/               Static assets and README visuals
samples/              Example CSV inputs for local testing and demos
```
```
npm install
npm run install:backend
```

Create `.env.local` in the repo root:

```
NEXT_PUBLIC_FIREBASE_API_KEY=...
NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN=...
NEXT_PUBLIC_FIREBASE_PROJECT_ID=...
NEXT_PUBLIC_FIREBASE_STORAGE_BUCKET=...
NEXT_PUBLIC_FIREBASE_MESSAGING_SENDER_ID=...
NEXT_PUBLIC_FIREBASE_APP_ID=...
NEXT_PUBLIC_ENCRYPTION_KEY=replace-this-in-real-environments
NEXT_PUBLIC_BRANDNAV_URL=http://localhost:5000
NEXT_PUBLIC_LEAD_OPS_API_BASE_URL=/api/lead-ops
LEAD_OPS_INTERNAL_URL=http://localhost:8000/api/v1
```

Create `brandnav_backend/.env`:
```
PORT=5000
CORS_ORIGIN=http://localhost:3000
DB_PATH=.sql/user_auth.db
SESSION_SECRET=replace-with-a-long-random-secret
SESSION_MAX_AGE=86400000
ADMIN_EMAIL=your-email@example.com
ADMIN_PASSWORD=change-me
MAX_CSV_ROWS=100000
MAX_CSV_SIZE_MB=100
MX_DOMAIN=your-mail-from-domain
EM_DOMAIN=your-envelope-domain
```

Then run:

```
npm run dev
```

This starts:

- frontend on http://localhost:3000
- verifier backend on http://localhost:5000
Use the sample imports in samples/lead-imports/ if you want quick demo data without cluttering the repo root.
Run the whole website stack, including the Next.js app, verifier backend, and Lead Ops services:
```
docker compose up --build
```

This brings up:

- website on http://localhost:3000
- verifier backend on http://localhost:5000
- Lead Ops API on http://localhost:8000
- Lead Ops Postgres on localhost:5432
- Lead Ops Redis on localhost:6379
The Docker setup defaults to the public no-login flow. Firebase values are optional, and Lead Ops basic auth is disabled by default so the catalog-to-product handoff works immediately.
If one of the default host ports is already taken on your machine, copy `.env.example` to `.env` and change `WEB_PORT`, `VERIFIER_PORT`, `LEAD_OPS_API_PORT`, `LEAD_OPS_POSTGRES_PORT`, or `LEAD_OPS_REDIS_PORT` before starting Compose.
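For example, a minimal `.env` that moves every service off its default host port might look like this (the specific port values are illustrative):

```
WEB_PORT=3100
VERIFIER_PORT=5100
LEAD_OPS_API_PORT=8100
LEAD_OPS_POSTGRES_PORT=5433
LEAD_OPS_REDIS_PORT=6380
```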
Frontend (repo root):

```
npm run dev
npm run dev:next
npm run dev:backend
npm run build
npm run start
npm run lint
```

Backend (from `brandnav_backend/`):

```
cd brandnav_backend
npm run dev
npm run test
npm run test:watch
npm run test:coverage
npm run type-check
```

- Sign in with Google.
- Upload a CSV into the workspace.
- Clean the dataset with built-in controls.
- Run verification if the file contains emails.
- Save the transformed result as an encrypted project.
- Reopen prior datasets from the dashboard whenever needed.
- `POST /api/transform`
- `GET /api/models`
- `GET /api/verifier/health`
- `POST /api/verifier/csv/upload`
- `POST /api/verifier/csv/verify`
- `GET /api/verifier/verification/:id/status`
- `GET /api/verifier/verification/:id/results`
- `POST /api/verifier/verification/:id/pause`
- `POST /api/verifier/verification/:id/resume`
- `POST /api/verifier/verification/:id/stop`
- `POST /api/verifier/verification/queue/clear`
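The status endpoint is designed for polling. A small client-side helper might look like the sketch below; the response shape (`{ status: ... }`) and the set of status values are assumptions about the API, and the fetcher is injected so the sketch stays testable without a running backend:

```typescript
// Sketch: poll the verifier status route until the job reaches a terminal
// state. The { status } response shape is an assumed contract, not taken
// from the repo.
type JobStatus = "queued" | "running" | "paused" | "completed" | "stopped";

type StatusFetcher = (url: string) => Promise<{ status: JobStatus }>;

async function waitForVerification(
  id: string,
  fetchStatus: StatusFetcher,
  { intervalMs = 1000, maxAttempts = 60 } = {}
): Promise<JobStatus> {
  const url = `/api/verifier/verification/${id}/status`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await fetchStatus(url);
    if (status === "completed" || status === "stopped") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`verification ${id} did not finish in time`);
}
```

In the real app the fetcher would wrap `fetch` against the verifier base URL; injecting it keeps the polling logic independent of transport details.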
- Project data is AES-encrypted before being written to Firestore.
- OpenAI API keys are stored in browser local storage, not on the server.
- The backend uses Helmet, CORS controls, and session protections.
- `NEXT_PUBLIC_ENCRYPTION_KEY` should always be set explicitly outside local experiments.
- The backend exposes a development shutdown route when not in production; keep it private to trusted environments.
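To illustrate the encrypt-before-persist idea, here is a round-trip sketch. The repo itself uses crypto-js; this equivalent uses Node's built-in `crypto` module with AES-256-GCM, and the key-derivation details (scrypt with a fixed salt) are assumptions for the example, not the repo's scheme:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "crypto";

// Sketch of encrypt-before-persist: derive a key from the configured secret,
// encrypt with AES-256-GCM, and pack IV + auth tag + ciphertext into a single
// base64 string suitable for a Firestore field. The repo uses crypto-js; this
// uses Node's built-in crypto module, and the fixed salt is illustrative only.
function encryptPayload(plaintext: string, secret: string): string {
  const key = scryptSync(secret, "dataforge-salt", 32);
  const iv = randomBytes(12); // 96-bit IV, recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptPayload(packed: string, secret: string): string {
  const buf = Buffer.from(packed, "base64");
  const key = scryptSync(secret, "dataforge-salt", 32);
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28)); // GCM tag is 16 bytes
  return Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]).toString("utf8");
}
```

GCM's auth tag means a tampered or truncated Firestore document fails decryption loudly instead of yielding garbage rows.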
- polished authenticated UI shell
- fast CSV ingestion and workspace transitions
- encrypted Firestore project persistence
- integrated verifier sidecar surface
- clean separation between frontend product code and backend processing engine
- fully wire the AI enrichment buttons to the transform pipeline
- add production deployment docs for frontend plus verifier sidecar
- add `.env.example` files for both app layers
- clean the repository’s macOS `._*` metadata artifacts
- add screenshots from a live seeded environment once dependencies are installed
This is a great foundation for teams building:
- lead list preparation tools
- internal CSV cleanup consoles
- outbound personalization pipelines
- email hygiene and verification products
- AI-assisted spreadsheet operations
If you extend this repo, keep the bar high:
- preserve the product-grade UI quality
- keep frontend and sidecar responsibilities cleanly separated
- document any new env vars and routes
- prefer shipping complete user-facing flows over partial abstractions
No license is currently defined at the root of the project. Add one before public distribution, client delivery, or commercial reuse.