Dalelk: The Official AI Academic Assistant for the College of Computer Science and Engineering at Jeddah University
├── classifier_training/ # Trains the intent classifier that understands what the user is asking
│ ├── certifications/ # Training questions about certifications
│ ├── courses/ # Training questions about courses
│ ├── general/ # General training questions
│ ├── classifier_dataset.xlsx # Used to train the intent classifier
│ ├── create_dataset.py # A script that automatically generates dataset questions
│ └── Evaluation_100.xlsx # Dataset for training/evaluation
│
├── datasets/
│ ├── Certifications.xlsx # Dataset about professional certifications
│ ├── Courses_new.xlsx # Dataset for the new course plan
│ ├── QA.xlsx # General academic Q&A dataset
│ └── Students Survey.xlsx # Dataset built from students' questions
│
├── frontend/ # Frontend website implementation with responsive UI and API integration
│ ├── public/
│ ├── src/ # Main source code (components, pages, hooks)
│ ├── index.html
│ ├── package.json
│ ├── package-lock.json
│ ├── postcss.config.js
│ ├── tailwind.config.ts
│ ├── tsconfig.app.json
│ ├── tsconfig.json
│ ├── tsconfig.node.json
│ └── vite.config.ts
│
├── models/ # Trained AI models used by Dalelk
│ └── fine_tuned_marbert/ # MARBERT model fine-tuned on Dalelk's datasets
│
├── unit testing/ # Backend API endpoint tests and rate limiting validation
│ ├── main.py
│ ├── unit_test.py
│ └── unit_test_ratelimit.py
│
├── build_embeddings.py # Builds the embeddings for the datasets
├── data_loader.py # Loads and prepares all the datasets for the program
├── llm_inference.py # Communicates with the LLM to generate the final answer for the user
├── load_embeddings.py # Loads the pre-built embeddings
├── logger.py # Logs everything that happens in the program
├── main.py # The entry point of Dalelk, where everything starts and comes together
├── NOTICE.txt # Google policy notice
├── query_classifer.py # The query classifier that analyzes the user's intent
├── query_classifier_training.ipynb # Jupyter Notebook where the MARBert model was trained and fine-tuned.
├── requirements.txt # Lists all the Python libraries that Dalelk needs to run
├── retrieve.py # Finds the most relevant information to answer the user's question
└── setup.py # Packages and sets up Dalelk as a Python project
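The top-level modules above form a simple pipeline: classify the query's intent, retrieve the most relevant dataset rows, then ask the LLM for the final answer. A minimal, hypothetical sketch of that flow (all function bodies and the keyword-based classifier are illustrative stand-ins, not the project's actual API):

```python
# Hypothetical sketch of Dalelk's request flow; the real modules
# (query_classifer.py, retrieve.py, llm_inference.py) may differ.

def classify_intent(query: str) -> str:
    """Stand-in for the fine-tuned MARBERT classifier."""
    q = query.lower()
    if "certification" in q:
        return "certifications"
    if "course" in q:
        return "courses"
    return "general"

def retrieve_context(intent: str, query: str) -> str:
    """Stand-in for retrieve.py: look up relevant dataset rows."""
    return f"[rows from the '{intent}' dataset matching the query]"

def answer(query: str) -> str:
    """Stand-in for main.py: classify, retrieve, then ask the LLM."""
    intent = classify_intent(query)
    context = retrieve_context(intent, query)
    return f"LLM answer grounded in {context}"
```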
To obtain a Google AI Studio API key:

- Go to Google AI Studio
- Sign in with your Google account
- Click "Get API Key"
- Click "Create API Key"
- Choose your project
- Copy the generated API key
- Paste it into the API_KEY1 field of the project's .env file
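Once the key is in the .env file, the program can read it at runtime. A minimal sketch, assuming the .env file has been loaded into the process environment (for example by python-dotenv) and using the API_KEY1 name from the step above:

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable, failing loudly if absent."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing {name}; add it to the .env file")
    return value

# api_key = require_env("API_KEY1")  # raises if the .env step was skipped
```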
To obtain a Hugging Face access token:

- Go to Hugging Face
- Sign in with your account
- Go to Embedding Gemma and accept the usage policy
- Click on your profile picture → "Settings"
- Go to "Access Tokens"
- Click "New Token"
- Give it a name and set access to "Read"
- Click "Create Token"
- Copy the generated token
- Paste it into the hf_token field of the project's .env file
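After both steps, the project's .env file should contain entries like the following (placeholder values; the field names are the ones used in this README):

```shell
# .env — replace the placeholders with your real credentials
API_KEY1=your-google-ai-studio-api-key
hf_token=your-hugging-face-read-token
```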
To install the fine-tuned classifier model:

- Go to the Releases page of this repository
- Download the classifier model file
- Place it inside the models/ folder
- Extract the downloaded file
- Make sure the extracted content follows this exact path:

models/fine_tuned_marbert/ (model files directly here)

!!! Windows users !!! When extracting, make sure the final path matches the above and does not contain an extra nested folder.
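The extra-folder mistake can be caught automatically with a small sanity check that verifies model files sit directly under the required directory. A hypothetical helper (the path is the one required above; the check itself is not part of Dalelk):

```python
import os

def model_files_at_top_level(path: str) -> bool:
    """True when 'path' exists and contains files directly,
    i.e. extraction did not create an extra nested folder."""
    if not os.path.isdir(path):
        return False
    return any(
        os.path.isfile(os.path.join(path, entry))
        for entry in os.listdir(path)
    )

# if not model_files_at_top_level("models/fine_tuned_marbert"):
#     raise SystemExit("Model files not found; check the extraction path")
```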