Code Grader Feedback is an automated code evaluation system developed for Fusemachines projects. It provides detailed feedback on code written in various programming languages, including Python, C, C++, JavaScript, Rust, and SQL. The system leverages the Meta LLaMA model, fine-tuned with LoRA (Low-Rank Adaptation), for efficient code analysis and feedback generation.
Key components of the project:
- Frontend: Built with React.
- Backend: Powered by FastAPI (see the endpoint sketch after this list).
- Feedback focuses on improving code quality, readability, maintainability, and adherence to coding standards.
- Code Evaluation: Provides detailed feedback on structure, complexity, and performance.
- Readability Improvement: Suggestions to enhance code readability and maintainability.
- Refactoring Recommendations: Guidance on simplifying code and reducing complexity.
- Compliance with Coding Standards: Ensures code follows industry-standard conventions.
- Future Enhancements:
  - Grading system to evaluate code performance.
  - Expanded feedback support for additional programming languages.
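To make the architecture concrete, here is a minimal sketch of what a feedback endpoint on the FastAPI side could look like. The route name, request shape, and `generate_feedback` helper are hypothetical illustrations, not the project's actual API (which lives in `backend/app/main.py`):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CodeSubmission(BaseModel):
    language: str
    source_code: str

def generate_feedback(prompt: str) -> str:
    # Placeholder: the real backend calls the fine-tuned LLaMA model here.
    return f"Feedback for submission: {prompt[:60]}..."

@app.post("/feedback")  # hypothetical route name
def feedback(submission: CodeSubmission) -> dict:
    prompt = f"Review this {submission.language} code:\n{submission.source_code}"
    return {"feedback": generate_feedback(prompt)}
```

The React frontend would POST submissions to such an endpoint and render the returned feedback.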
The project uses The Stack dataset from the BigCode Project, containing over 6TB of source code files across 358 programming languages. This comprehensive dataset is ideal for multi-language code analysis.
- Languages: 358 programming languages.
- Data: Over 6TB of source code files.
- Metadata: Includes file content, size, language, and repository details.
- Quality: License-filtered and deduplicated for clean data.
For more details, see [The Stack Dataset](https://huggingface.co/datasets/bigcode/the-stack) on Hugging Face.
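To get a feel for the dataset before committing to a full download, you can stream a single sample and inspect its fields. The exact field names (`content`, `size`, `lang`, and so on) follow the dataset card and should be treated as assumptions here:

```python
from datasets import load_dataset

# Stream one sample to inspect its metadata without downloading the full 6TB.
dataset = load_dataset("bigcode/the-stack", data_dir="data/python", split="train", streaming=True)
sample = next(iter(dataset))

print(sample.keys())                   # file content, size, language, repository fields, ...
print(sample["lang"], sample["size"])
```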
In Code Grader Feedback, we fine-tuned the Meta LLaMA 2 model using LoRA (Low-Rank Adaptation) and Quantized LoRA (QLoRA) for efficient performance on code analysis tasks. The key steps in the fine-tuning process are summarized below.
1. **Dataset Preparation**

   - We use the `bigcode/the-stack` dataset, focusing on Python files.
   - The dataset is streamed using the `datasets` library to handle the large volume of data without overloading memory:

     ```python
     from datasets import load_dataset

     dataset = load_dataset("bigcode/the-stack", data_dir="data/python", split="train", streaming=True)

     file_count = 0
     max_files = 80000
     downloaded_samples = []

     for sample in dataset:
         if file_count >= max_files:
             break
         downloaded_samples.append(sample)
         file_count += 1
     ```

   - The processed samples are converted into a structured Hugging Face dataset:

     ```python
     from datasets import Dataset

     # Keep only the source text of each sample for the "content" column.
     data_dict = {"content": [sample["content"] for sample in downloaded_samples]}
     dataset = Dataset.from_dict(data_dict)
     ```
2. **Model and Tokenizer Configuration**

   - The model is loaded with QLoRA, using 4-bit quantization to reduce memory usage:

     ```python
     from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

     bnb_config = BitsAndBytesConfig(
         load_in_4bit=True,
         bnb_4bit_quant_type="nf4",
         bnb_4bit_use_double_quant=False,
     )

     model = AutoModelForCausalLM.from_pretrained(
         "meta-llama/llama-2",
         quantization_config=bnb_config,
         device_map="auto",
     )
     tokenizer = AutoTokenizer.from_pretrained("meta-llama/llama-2", padding_side="right")
     ```

   - The tokenizer is set up to handle padding appropriately, reusing the end-of-sequence token:

     ```python
     tokenizer.pad_token = tokenizer.eos_token
     ```
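   - The training step itself is covered in the notebook rather than here. As a rough sketch, LoRA adapters can be attached with `peft`; the hyperparameters below are illustrative assumptions, not the notebook's actual values:

     ```python
     from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

     # Prepare the 4-bit quantized model for training, then attach low-rank adapters.
     model = prepare_model_for_kbit_training(model)

     lora_config = LoraConfig(
         r=16,                                 # rank of the low-rank update matrices (assumed)
         lora_alpha=32,                        # scaling factor (assumed)
         target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
         lora_dropout=0.05,
         task_type="CAUSAL_LM",
     )
     peft_model = get_peft_model(model, lora_config)
     peft_model.print_trainable_parameters()   # only a small fraction of weights is trained
     ```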
3. **Merging LoRA Weights**

   - After training, the LoRA weights are merged into the base model using `PeftModel` (note that `merge_and_unload()` returns the merged model):

     ```python
     from peft import PeftModel

     peft_model = PeftModel.from_pretrained(model, "lora-finetuned-model")
     merged_model = peft_model.merge_and_unload()  # merge LoRA weights into the base model
     ```
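   - Presumably, the merged model is then saved to disk so it can be reloaded for inference; the path below is chosen to mirror the loading step that follows:

     ```python
     merged_model.save_pretrained("fine-tuned-llama-model")
     tokenizer.save_pretrained("fine-tuned-llama-model")
     ```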
4. **Final Model Loading**

   - The fine-tuned and merged model is loaded for inference in FP16 precision for memory efficiency:

     ```python
     import torch
     from transformers import AutoModelForCausalLM, AutoTokenizer

     model = AutoModelForCausalLM.from_pretrained(
         "fine-tuned-llama-model",
         torch_dtype=torch.float16,
         device_map="auto",
     )
     tokenizer = AutoTokenizer.from_pretrained("fine-tuned-llama-model")
     ```

   - The tokenizer is configured to align token sequences correctly:

     ```python
     tokenizer.pad_token = tokenizer.eos_token
     tokenizer.padding_side = "right"
     ```
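   - Once loaded, feedback can be generated with the standard `transformers` generation API. The prompt below is an illustrative assumption, not the project's actual prompt template:

     ```python
     # Ask the fine-tuned model to review a small code snippet.
     prompt = "Review the following Python code and suggest improvements:\n\ndef add(a,b): return a+b\n"
     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
     outputs = model.generate(**inputs, max_new_tokens=200)
     print(tokenizer.decode(outputs[0], skip_special_tokens=True))
     ```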
For more details on the full fine-tuning process, refer to the `notebook/` directory.
The project is organized as follows:

```
code-grader-feedback/
├── backend/
│   ├── app/
│   │   └── main.py                  # FastAPI entry point and API logic
│   └── requirements.txt             # Backend dependencies
│
├── docs/                            # Project documentation files
│
├── frontend/
│   ├── src/
│   │   ├── App.tsx                  # Main React component
│   │   ├── components/              # Reusable React components
│   │   ├── pages/                   # Different page views (Homepage, LandingPage)
│   │   ├── constants/               # Static constants (styles, options)
│   │   ├── routes/                  # Navigation route configurations
│   │   └── types/                   # TypeScript type definitions
│   ├── package.json                 # Frontend dependencies and scripts
│   ├── postcss.config.js            # PostCSS configuration for styles
│   ├── tailwind.config.js           # Tailwind CSS configuration
│   └── vite.config.ts               # Vite configuration for frontend bundling
│
├── notebook/
│   └── Code_Grader_Feedback_Fine_Tuning.ipynb  # Jupyter notebook for model fine-tuning
│
└── README.md                        # Main README file with project information
```

Follow the steps below to set up and run the Code Grader Feedback system on your local machine.
Start by cloning the project repository from GitHub:

```bash
git clone https://github.com/fuseai-fellowship/code-grader-feedback.git
cd code-grader-feedback
```

Navigate to the `frontend` directory and install the required Node.js dependencies:

```bash
cd frontend
npm install
```

Navigate to the `backend` directory, set up a Python virtual environment, and install the necessary dependencies:

```bash
cd backend
python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
```

To enable code compilation, set up the Judge0 API by following these steps:
- Go to Judge0 on RapidAPI.
- Subscribe to the Basic Plan to access the Judge0 API.
- Retrieve your API keys from the RapidAPI dashboard, which include:
  - `RAPIDAPI_HOST`
  - `RAPIDAPI_KEY`
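With those keys in hand, a submission to Judge0 looks roughly like the following. This is a sketch against Judge0's public CE endpoint (the project itself configures Judge0 from the frontend's `.env`); `language_id=71` is Python 3 in Judge0's language catalog, and `wait=true` asks for a synchronous run — check the Judge0 docs for the parameters your plan supports:

```python
import requests

# Hypothetical standalone example of submitting code to Judge0 via RapidAPI.
url = "https://judge0-ce.p.rapidapi.com/submissions?base64_encoded=false&wait=true"
headers = {
    "X-RapidAPI-Host": "judge0-ce.p.rapidapi.com",  # your RAPIDAPI_HOST
    "X-RapidAPI-Key": "<your-rapidapi-key>",        # your RAPIDAPI_KEY
    "Content-Type": "application/json",
}
payload = {
    "source_code": "print('hello from Judge0')",
    "language_id": 71,  # Python 3 in Judge0's language list
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()
print(result.get("stdout"), result.get("status"))
```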
You need to configure environment variables for both the frontend and the backend. Follow the steps below for each:

1. Navigate to the `frontend` directory and create a `.env` file.
2. Add the following variables:

   ```
   VITE_API_URL=http://localhost:8000
   VITE_APP_RAPID_API_HOST=<your-rapidapi-host>
   VITE_APP_RAPID_API_KEY=<your-rapidapi-key>
   VITE_APP_RAPID_API_URL=https://judge0-ce.p.rapidapi.com/submissions
   ```

   Replace `<your-rapidapi-host>` and `<your-rapidapi-key>` with the values from your Judge0 RapidAPI account.
1. Navigate to the `backend` directory and create a `.env` file.
2. Add the following variable:

   ```
   FRONTEND_ORIGIN=http://localhost:5173
   ```
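On the backend, `FRONTEND_ORIGIN` is presumably used to whitelist the React dev server for CORS. A minimal sketch of that wiring in FastAPI follows; this is an assumption about how the variable is consumed, not necessarily how `app/main.py` does it:

```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow cross-origin requests from the frontend origin configured in .env.
app.add_middleware(
    CORSMiddleware,
    allow_origins=[os.getenv("FRONTEND_ORIGIN", "http://localhost:5173")],
    allow_methods=["*"],
    allow_headers=["*"],
)
```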
For detailed instructions on setting up the environment variables, refer to the Setting Up Environmental Variables section.
Ensure your environment variables are configured properly and the dependencies are installed. To start the frontend development server:

```bash
npm run dev
```

The frontend will be available at http://localhost:5173.

Ensure your virtual environment is activated and the necessary environment variables are set up. To run the FastAPI backend server:

```bash
uvicorn app.main:app --reload
```

The backend server will run at http://localhost:8000.
We appreciate your contributions! Follow these steps to get started:

1. **Fork and Clone**

   Fork the repository and clone it locally:

   ```bash
   git clone https://github.com/<your-username>/code-grader-feedback.git
   cd code-grader-feedback
   ```

2. **Create a Branch**

   Create a new branch for your changes:

   ```bash
   git checkout -b feature/your-feature-name
   ```

3. **Make Changes and Commit**

   Make your changes and commit them:

   ```bash
   git commit -am "Describe your changes"
   ```

4. **Push and Submit a PR**

   Push your branch and submit a pull request:

   ```bash
   git push origin feature/your-feature-name
   ```

A project maintainer will review and merge your changes.