Introduction

Uses a locally hosted LLM and ScrapeGraphAI to scrape the web with natural-language prompts.

Example

Given the prompt:

"List me all the projects with their descriptions at https://perinim.github.io/projects"

The system generates this output:

--- Executing Fetch Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateAnswer Node ---
Processing chunks: 100%|██████████| 1/1 [00:00<00:00, 7928.74it/s]
{
    "projects": [
        {
            "description": "Open Source project aimed at controlling a real life rotary pendulum using RL algorithms",
            "title": "Rotary Pendulum RL"
        },
        {
            "description": "Developed a Deep Q-Network algorithm to train a simple and double pendulum",
            "title": "DQN Implementation from scratch"
        },
        {
            "description": "University project which focuses on simulating a multi-agent system to perform environment mapping. Agents, equipped with sensors, explore and record their surroundings, considering uncertainties in their readings.",
            "title": "Multi Agents HAED"
        },
        {
            "description": "Modular drone architecture proposal and proof of concept. The project received maximum grade.",
            "title": "Wireless ESC for Modular Drones"
        }
    ]
}
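The result is plain JSON, so it can be consumed directly by other code. A minimal sketch, assuming the output keeps the `projects` / `title` / `description` shape shown above (the sample below is abbreviated from that output, and `project_titles` is a hypothetical helper, not part of this repository):

```python
import json

# Sample result, abbreviated from the output shown above.
raw_result = """
{
    "projects": [
        {
            "description": "Open Source project aimed at controlling a real life rotary pendulum using RL algorithms",
            "title": "Rotary Pendulum RL"
        },
        {
            "description": "Developed a Deep Q-Network algorithm to train a simple and double pendulum",
            "title": "DQN Implementation from scratch"
        }
    ]
}
"""


def project_titles(result_json: str) -> list[str]:
    """Return the title of every project in a result shaped like the one above."""
    data = json.loads(result_json)
    # .get() guards against missing keys: the LLM output is not schema-checked here.
    return [project.get("title", "") for project in data.get("projects", [])]


titles = project_titles(raw_result)
print(titles)
```

Because the keys are produced by the model from the prompt, they may differ between runs or prompts; the defensive `.get()` calls are there for that reason.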

Setup

Setup ollama

brew install ollama
ollama serve    # stays in the foreground; run the pull commands in a second terminal
ollama pull mistral
ollama pull nomic-embed-text

On a cold start, wait about 10 seconds before running the scraper; the model needs time to load.
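Instead of waiting a fixed time, a script could poll the server until it answers. The following is a rough sketch, not part of this repository: it assumes Ollama listens on its default address `http://localhost:11434`, and the helper names `ollama_is_up` and `wait_until_ready` are hypothetical. Note that it only confirms the server responds; the model itself may still take a moment to load on the first request.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed default Ollama address; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_up(base_url: str = OLLAMA_URL) -> bool:
    """Return True if the server answers an HTTP GET on its root URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False


def wait_until_ready(
    probe: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0
) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Typical use before starting the scraper:
#     if not wait_until_ready(ollama_is_up):
#         raise SystemExit("Ollama did not respond in time")
```

The probe is passed in as a function so the retry loop can be reused or tested without a running server.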

Container build

podman build -t scrapegraphai .

Run the scraper

podman run -v $(pwd)/src:/home/app/src --network=host -it scrapegraphai python src/main.py
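The contents of `src/main.py` are not shown in this README. As an illustration only, an entry point for this setup might look like the sketch below. It assumes ScrapeGraphAI's `SmartScraperGraph` class, the two models pulled during setup, and Ollama reachable at `http://localhost:11434` (which `--network=host` is presumably there to allow). The configuration keys follow the library's Ollama example from around the time of this project and may differ in other versions; `build_config` and `scrape` are hypothetical names.

```python
# Assumed default: Ollama on its standard local port. The real src/main.py may differ.
OLLAMA_BASE_URL = "http://localhost:11434"


def build_config(base_url: str = OLLAMA_BASE_URL) -> dict:
    """Build a graph configuration that points both models at a local Ollama server."""
    return {
        "llm": {
            "model": "ollama/mistral",
            "temperature": 0,
            "format": "json",  # ask Ollama for JSON output
            "base_url": base_url,
        },
        "embeddings": {
            "model": "ollama/nomic-embed-text",
            "base_url": base_url,
        },
    }


def scrape(prompt: str, source: str):
    """Run one scrape of `source` guided by a natural-language `prompt`."""
    # Imported lazily so build_config() works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=build_config())
    return graph.run()


# Example call, using the prompt from the Example section:
#     result = scrape(
#         "List me all the projects with their descriptions",
#         "https://perinim.github.io/projects",
#     )
```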

Development

Local dev

podman run -v $(pwd)/src:/home/app/src -it scrapegraphai /bin/bash

Installing new Python dependencies

podman run --user root -v $(pwd):/home/app -it scrapegraphai pipenv install requests

jonathar/llm-scraper
