This project implements a fraud detection system for credit card transactions. It combines machine learning, real-time data streaming, and cloud services to flag potentially fraudulent transactions before they are completed. The system is designed to scale and to operate in real time.
- Apache Kafka (Data Ingestion)
- Apache Spark (Real-Time Stream Processing)
- AWS Lambda (Fraud Alert Notifications)
- TensorFlow (Machine Learning Model for Fraud Detection)
- DBT (Data Transformation)
- AWS S3 (Data Storage)
- Docker (Containerization)
Clone the repository to your local machine:
git clone <repository-url>
cd <repository-directory>
Install the required dependencies using pip:
pip install -r requirements.txt
Ensure Apache Kafka is installed and running:
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties
Run the following components:
- Run Kafka Producer (Simulating Real-Time Transactions):
python kafka_producer.py
- Run Kafka Consumer (Processing Transactions):
python kafka_consumer.py
- Start Spark Processing for Fraud Detection:
python spark_processing.py
- Run Fraud Detection Model:
python fraud_detection_model.py
- Trigger AWS Lambda for Fraud Alerts:
python lambda_trigger.py
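For orientation, here is a minimal sketch of what the producer step might look like. It is not the contents of `kafka_producer.py`: the field names, the `make_transaction`, `serialize`, and `publish` helpers, the use of the `kafka-python` package, and the `localhost:9092` broker address are all assumptions made for illustration.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def make_transaction(rng: random.Random) -> dict:
    """Build one synthetic transaction (all field names are hypothetical)."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "card_id": f"card-{rng.randint(1, 500):04d}",
        "amount": round(rng.uniform(1.0, 2000.0), 2),
        "merchant_category": rng.choice(["grocery", "travel", "online", "fuel"]),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def serialize(transaction: dict) -> bytes:
    """Encode a transaction as UTF-8 JSON bytes for use as a Kafka message value."""
    return json.dumps(transaction).encode("utf-8")


def publish(topic: str, count: int, bootstrap_servers: str = "localhost:9092") -> None:
    """Send `count` synthetic transactions to Kafka (assumes kafka-python is installed)."""
    # Imported lazily so the helpers above can be used without a Kafka client.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=serialize,
    )
    rng = random.Random()
    for _ in range(count):
        producer.send(topic, make_transaction(rng))
    producer.flush()  # block until buffered messages are delivered
    producer.close()
```

Calling `publish("transactions", 100)` would send 100 messages to a topic named `transactions`; the topic name is likewise a placeholder, so use whatever the consumer in this repository subscribes to.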
To deploy the entire system using Docker:
docker-compose up --build
This starts the Kafka producer, consumer, and other system components.
- Reduced fraud losses by up to 40%.
- Improved customer trust and retention through proactive fraud detection.
- Reduced manual intervention in fraud detection processes.
To test the fraud detection pipeline:
- Run the Kafka producer to simulate transaction data.
- Start the Kafka consumer to consume the data.
- Observe Spark streaming console output for fraud detection logs.
- If a fraudulent transaction is detected, AWS Lambda triggers an alert via email or SMS.
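The alerting step at the end of the pipeline can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `lambda_trigger.py`: the event shape (`transaction`, `fraud_score`, `topic_arn`), the `FRAUD_THRESHOLD` value, and the choice of Amazon SNS as the email/SMS channel are all hypothetical. The `publish` parameter is injectable so the decision logic can be exercised locally without AWS credentials.

```python
import json

FRAUD_THRESHOLD = 0.9  # hypothetical cut-off; tune against the model's validation data


def build_alert(transaction: dict, score: float, threshold: float = FRAUD_THRESHOLD):
    """Return an alert message for a flagged transaction, or None if below threshold."""
    if score < threshold:
        return None
    return {
        "subject": f"Possible fraud on transaction {transaction['transaction_id']}",
        "body": json.dumps(
            {"transaction": transaction, "fraud_score": score}, sort_keys=True
        ),
    }


def lambda_handler(event, context, publish=None):
    """Lambda-style entry point; `publish(subject, body)` delivers the alert."""
    alert = build_alert(event["transaction"], event["fraud_score"])
    if alert is None:
        return {"alerted": False}
    if publish is None:
        # Assumption: alerts go through an SNS topic with email/SMS subscribers.
        import boto3

        sns = boto3.client("sns")

        def publish(subject, body):
            sns.publish(TopicArn=event["topic_arn"], Subject=subject, Message=body)

    publish(alert["subject"], alert["body"])
    return {"alerted": True}
```

To try it locally, pass a stub such as `publish=lambda subject, body: print(subject)` and call the handler with one event whose score is above the threshold and one below it.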
