RAGU provides a pipeline for building a Knowledge Graph and performing retrieval over the indexed data. It offers several approaches for extracting structured data from raw text, enabling efficient question answering over structured knowledge.
Partially based on nano-graphrag
Our Hugging Face community is here.
```bash
pip install graph_ragu
```

If you want to use local models (via transformers, etc.), run:

```bash
pip install graph_ragu[local]
```

A minimal end-to-end example:

```python
import asyncio
from ragu.chunker import SimpleChunker
from ragu.embedder import STEmbedder
from ragu.graph import KnowledgeGraph, InMemoryGraphBuilder
from ragu.llm import OpenAIClient
from ragu.storage import Index
from ragu.triplet import ArtifactsExtractorLLM
from ragu.utils.ragu_utils import read_text_from_files
LLM_MODEL_NAME = "..."
LLM_BASE_URL = "..."
LLM_API_KEY = "..."
async def main():
    # Load .txt documents from a folder
    docs = read_text_from_files("/path/to/files")

    # Choose a chunker
    chunker = SimpleChunker(max_chunk_size=2048, overlap=0)

    # Create an LLM client
    client = OpenAIClient(
        LLM_MODEL_NAME,
        LLM_BASE_URL,
        LLM_API_KEY,
        max_requests_per_second=1,
        max_requests_per_minute=60
    )

    # Set up the artifacts extractor
    artifact_extractor = ArtifactsExtractorLLM(
        client=client,
        do_validation=True
    )

    # Initialize your embedder
    embedder = STEmbedder(
        "Alibaba-NLP/gte-multilingual-base",
        trust_remote_code=True
    )

    # Set up graph storage and the graph-builder pipeline
    pipeline = InMemoryGraphBuilder(client, chunker, artifact_extractor)
    index = Index(
        embedder,
        graph_storage_kwargs={"clustering_params": {"max_cluster_size": 6}}
    )

    # Build the knowledge graph
    knowledge_graph = await KnowledgeGraph(
        extraction_pipeline=pipeline,    # pass the pipeline
        index=index,                     # pass the storage
        make_community_summary=True,     # generate community summaries if you want them
        language="russian",              # you can set a preferred language
    ).build_from_docs(docs)
if __name__ == "__main__":
    asyncio.run(main())
```

Once the graph is built, create a search engine over it:

```python
from ragu.search_engine import LocalSearchEngine
search_engine = LocalSearchEngine(
    client,
    knowledge_graph,
    embedder
)
# Find relevant local context for the query
print(await search_engine.a_search("Как переводится роман 'Ка́мо гряде́ши, Го́споди?'"))
# Or just pass the query and get the final answer
print(await search_engine.a_query("Как переводится роман 'Ка́мо гряде́ши, Го́споди?'"))
# Output:
# [DefaultResponseModel(response="Роман 'Ка́мо гряде́ши, Го́споди?' переводится как 'Куда Ты идёшь, Господи?'")]
# :)
```

Each text in the corpus is processed to extract structured information (illustrated below), consisting of:
- Entities — textual representation, entity type, and a contextual description.
- Relations — textual description of the link between two entities (or a relation class), as well as its confidence/strength.
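For illustration, here is roughly what one extracted record might look like, written as plain Python. The field names are hypothetical, chosen to mirror the bullets above; they are not RAGU's actual schema:

```python
# Illustrative only: field names are hypothetical, not RAGU's real data model.
entity = {
    "text": "Alexander Pushkin",        # textual representation
    "type": "PERSON",                   # one of the NEREL entity types listed below
    "description": "Russian poet, born in Moscow in 1799.",
}

relation = {
    "source": "Alexander Pushkin",
    "target": "Moscow",
    "type": "PLACE_OF_BIRTH",           # one of the NEREL relation types listed below
    "description": "Pushkin was born in Moscow.",
    "strength": 0.9,                    # confidence/strength of the link
}
```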
RAGU uses entity and relation classes from NEREL.
| No. | Entity type | No. | Entity type | No. | Entity type |
|---|---|---|---|---|---|
| 1. | AGE | 11. | FAMILY | 21. | PENALTY |
| 2. | AWARD | 12. | IDEOLOGY | 22. | PERCENT |
| 3. | CITY | 13. | LANGUAGE | 23. | PERSON |
| 4. | COUNTRY | 14. | LAW | 24. | PRODUCT |
| 5. | CRIME | 15. | LOCATION | 25. | PROFESSION |
| 6. | DATE | 16. | MONEY | 26. | RELIGION |
| 7. | DISEASE | 17. | NATIONALITY | 27. | STATE_OR_PROV |
| 8. | DISTRICT | 18. | NUMBER | 28. | TIME |
| 9. | EVENT | 19. | ORDINAL | 29. | WORK_OF_ART |
| 10. | FACILITY | 20. | ORGANIZATION | | |

| No. | Relation type | No. | Relation type | No. | Relation type |
|---|---|---|---|---|---|
| 1. | ABBREVIATION | 18. | HEADQUARTERED_IN | 35. | PLACE_RESIDES_IN |
| 2. | AGE_DIED_AT | 19. | IDEOLOGY_OF | 36. | POINT_IN_TIME |
| 3. | AGE_IS | 20. | INANIMATE_INVOLVED | 37. | PRICE_OF |
| 4. | AGENT | 21. | INCOME | 38. | PRODUCES |
| 5. | ALTERNATIVE_NAME | 22. | KNOWS | 39. | RELATIVE |
| 6. | AWARDED_WITH | 23. | LOCATED_IN | 40. | RELIGION_OF |
| 7. | CAUSE_OF_DEATH | 24. | MEDICAL_CONDITION | 41. | SCHOOLS_ATTENDED |
| 8. | CONVICTED_OF | 25. | MEMBER_OF | 42. | SIBLING |
| 9. | DATE_DEFUNCT_IN | 26. | ORGANIZES | 43. | SPOUSE |
| 10. | DATE_FOUNDED_IN | 27. | ORIGINS_FROM | 44. | START_TIME |
| 11. | DATE_OF_BIRTH | 28. | OWNER_OF | 45. | SUBEVENT_OF |
| 12. | DATE_OF_CREATION | 29. | PARENT_OF | 46. | SUBORDINATE_OF |
| 13. | DATE_OF_DEATH | 30. | PART_OF | 47. | TAKES_PLACE_IN |
| 14. | END_TIME | 31. | PARTICIPANT_IN | 48. | WORKPLACE |
| 15. | EXPENDITURE | 32. | PENALIZED_AS | 49. | WORKS_AS |
| 16. | FOUNDED_BY | 33. | PLACE_OF_BIRTH | | |
| 17. | HAS_CAUSE | 34. | PLACE_OF_DEATH | | |

RAGU offers several artifact-extraction pipelines:

1. LLM-based extractor (file: `ragu/triplet/llm_artifact_extractor.py`)
A baseline pipeline that uses an LLM to extract entities, relations, and their descriptions in a single step.
2. RAGU-lm (for the Russian language)
A compact model (Qwen-3-0.6B) fine-tuned on the NEREL dataset. The pipeline operates in several stages:
- Extract unnormalized entities from text.
- Normalize entities into canonical forms.
- Generate entity descriptions.
- Extract relations based on the inner product between entity representations (see the sketch after this list).
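A minimal sketch of the idea behind that last step, assuming entities have already been embedded into vectors. The function name, the use of raw NumPy, and the threshold are all illustrative, not RAGU's actual implementation:

```python
import numpy as np

def relation_candidates(entity_embeddings: dict, threshold: float = 0.5):
    """Propose entity pairs whose embedding inner product exceeds a threshold.

    Illustrative only: RAGU's real relation-extraction step may score and
    filter candidate pairs differently.
    """
    names = list(entity_embeddings)
    candidates = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = float(np.dot(entity_embeddings[a], entity_embeddings[b]))
            if score > threshold:
                candidates.append((a, b, score))
    return candidates

# Toy 2-D embeddings: only the related pair clears the threshold.
embs = {
    "Pushkin": np.array([0.9, 0.1]),
    "Moscow": np.array([0.8, 0.3]),
    "1799": np.array([-0.2, 1.0]),
}
print(relation_candidates(embs))  # -> [('Pushkin', 'Moscow', ~0.75)]
```

Candidate pairs that pass would then be typed and described by the downstream relation model.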
3. Small-model pipeline
A modular multi-model pipeline:
- runne_contrastive_ner — extracts entities (NER step).
- ragu_lm — performs entity normalization.
- ragu_lm — generates concise definitions and descriptions for entities.
- ragu_re — extracts relation candidates.
- ragu_lm — refines and summarizes relations with their textual descriptions.

Benchmark results:

| Model | Dataset | F1 (Entities) | F1 (Relations) |
|---|---|---|---|
| Qwen-2.5-14B-Instruct | NEREL | 0.32 | 0.69 |
| RAGU-lm (Qwen-3-0.6B) | NEREL | 0.60 | 0.71 |
| Small-model pipeline | NEREL | 0.74 | 0.75 |
To see which prompts a search engine uses:

```python
search_engine = LocalSearchEngine(
    client,
    knowledge_graph,
    embedder
)
print(search_engine.get_prompts())
#
# {'local_search': PromptTemplate(template='\n**Goal**\nAnswer the query by summarizing relevant information from the context and, if necessary, well-known facts.\n\n**Instructions**\n1. If you do not know the correct answer, explicitly state that.\n2. Do not include unsupported information.\n\nQuery: {{ query }}\nContext: {{ context }}\n\nProvide the answer in the following language: {{ language }}\nReturn the result as valid JSON matching the provided schema.\n', schema=<class 'ragu.common.prompts.default_models.DefaultResponseModel'>, description='Prompt for generating a local context-based search response.')}
#
# Or, if you know the prompt name:
print(search_engine.get_prompt("local_search"))
```

You can update a prompt using the `.update_prompt` method:

```python
from ragu.common.prompts import PromptTemplate
search_engine.update_prompt("prompt_name", PromptTemplate(template=..., schema=...))
```
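For a concrete illustration, here is a hypothetical override of the `local_search` prompt. The template text below is made up; the placeholder names and the `DefaultResponseModel` import path are taken from the printed prompt repr above, assuming `PromptTemplate` accepts the keyword arguments shown there:

```python
from ragu.common.prompts import PromptTemplate
from ragu.common.prompts.default_models import DefaultResponseModel

search_engine.update_prompt(
    "local_search",
    PromptTemplate(
        template=(
            "Answer the query using only the provided context.\n"
            "Query: {{ query }}\nContext: {{ context }}\n"
            "Provide the answer in the following language: {{ language }}\n"
            "Return the result as valid JSON matching the provided schema."
        ),
        schema=DefaultResponseModel,
    ),
)
```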
Contributors:

- Ivan Bondarenko - idea, smart_chunker, NER model, ragu-lm
- Mikhail Komarov
- Roman Shuvalov
- Yanya Dement'yeva
- Alexandr Kuleshevskiy
- Nikita Kukuzey
- Stanislav Shtuka
- Matvey Solovyev
- Ilya Myznikov