
AutoFlux-lite

An end-to-end workflow that takes an ML model from data ingestion through pre-processing and training to deployment.

For more in-depth docs on particular services, see ML and Ingestion & Transformation below.

Download the repository

The easiest way to get the repository onto your local machine is to run:

curl https://www.arjunrao.space/templates > temp.sh && bash temp.sh AutoFlux && rm -rf temp.sh

This is the lite version of the setup and is in active development.

For the Spark version (not in active development), switch to the main branch.

• ML
• Ingestion & Transformation

A very basic architecture of the environment setup using AutoFlux-lite is shown in the AutoFlux Lite Architecture diagram.

Overview of the architecture

This is a lightweight version of a larger architecture that originally involved Spark, Hive, PostgreSQL, and Delta Lake. Instead of relying on these heavyweight components, this version leverages DuckDB—an embedded OLAP database—for efficient data storage and processing while keeping the system low on power consumption and compute requirements.
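For readers who have not used DuckDB before: it runs in-process (there is no server to manage), stores the whole database in a single file, and is queried with plain SQL. A tiny, self-contained illustration (the file and table names are arbitrary, not part of this project):

import duckdb

con = duckdb.connect("warehouse.duckdb")                        # one file, no server process
con.execute("CREATE TABLE IF NOT EXISTS events AS SELECT 1 AS id, 'demo' AS label")
print(con.execute("SELECT count(*) FROM events").fetchone())    # plain SQL, runs in-process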

How It Works (Refer to the Architecture Diagram)

1. Transformation & Ingestion:

• Raw data is ingested and processed inside the dbt container.
• After transformation, the cleaned data is stored in DuckDB, acting as the shared storage layer.

2. Machine Learning Pipeline:

• The ML container fetches the transformed data from DuckDB (sketched in code after this list).
• Data is further cleaned and preprocessed inside the ML container.
• MLflow is used to:
  • Track experiments.
  • Log metrics and artifacts.
  • Store model versions for reproducibility.

3. Outputs:

• A model accuracy and experiment dashboard for evaluation.
• A final trained model artifact ready for deployment.
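To make steps 2 and 3 concrete, here is a minimal sketch of what the ML container might do. The DuckDB path, table and column names, MLflow tracking URI, and the use of scikit-learn are illustrative assumptions, not the project's actual code.

# Hypothetical sketch: fetch the dbt-built table from the shared DuckDB file,
# train a simple model, and log everything to MLflow. All names are placeholders.
import duckdb
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

con = duckdb.connect("/data/autoflux.duckdb", read_only=True)   # shared storage layer
df = con.execute("SELECT * FROM transformed_features").df()     # table produced by dbt

X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_tracking_uri("http://mlflow:5000")                   # assumed service name and port
with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", acc)                          # shows up on the experiment dashboard
    mlflow.sklearn.log_model(model, "model")                    # versioned artifact for deployment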

This setup is ideal for environments with limited compute resources, making it accessible for local development, edge devices, and low-power machines while maintaining an efficient ML pipeline.

Step-by-Step Usage

Bring everything up

bash compose_build.sh

This will build and run every container.

Verify the Setup

Check that all containers are running:

docker ps

You should see all of the project's containers up and running.

Run transformation

docker exec -it transformation bash

You'll get a shell inside the transformation container where you can run dbt commands. By default, the seed command runs automatically when the container starts, and you can trigger the transformation process with dbt build, which executes everything (models, seeds, and tests).
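If you want to spot-check what dbt wrote, you can open the DuckDB file from a Python shell inside the container. The database path below is an assumption; use the path configured in the project's dbt profile.

import duckdb

# Path is a placeholder; point it at the DuckDB file from profiles.yml.
con = duckdb.connect("dev.duckdb", read_only=True)
print(con.execute("SHOW TABLES").fetchall())                    # seeds and models that dbt built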

Stopping the Containers

To stop the entire setup:

docker-compose down

To remove all volumes and networks:

docker-compose down -v
