Advanced Data Analysis on Research Publications: Grouping Papers Using a Knowledge Graph
The objective of this project is advanced data analysis on research publications. Given a corpus of 30 papers, the papers have been grouped according to their topics and similarities, and a knowledge graph has been constructed from those relationships. Additionally, the knowledge graph has been enriched with metadata from sources such as OpenAlex and OpenAIRE.
You can find the documentation here.
To run this program you will need:
- Docker, which provides a convenient way to package, distribute, and run applications as containers, ensuring consistency across different environments.
- An operating system such as Windows or Linux.
Step 1: Clone the repository from GitHub to your local machine:
git clone https://github.com/RubenCid35/GraphPaperSim.git && cd GraphPaperSim
Step 2: Start the Docker server. On Windows, you can start it from Docker Desktop or from Services.
To execute the project, you only need to run the programs located in the app folder. Inside it, you will find the RDF generated for the 30 articles in Turtle format (output.ttl).
To execute, follow these steps:
Step 1: Start the docker server.
Step 2: Go to the "app" folder.
cd app
Step 3: Execute the following command.
docker-compose up
The program will be available at http://127.0.0.1:8050/.
To close the program, execute:
docker-compose down
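Optionally, you can check that the application is up before opening it in a browser. A minimal Python sketch (assumes the requests package is installed; port 8050 as stated above):

import requests

# Quick sanity check that the app started by docker-compose is reachable.
resp = requests.get("http://127.0.0.1:8050/", timeout=5)
print("App is up" if resp.ok else f"Unexpected status: {resp.status_code}")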
Below, we explain how the RDF of the 30 papers used to create the program was obtained. (Executing these steps is not necessary to run the previous program.)
Step 1: First, Grobid is used to obtain information (title, abstract, acknowledgements) from the 30 papers. To do this, Grobid needs to be running:
docker pull lfoppiano/grobid:0.7.2
docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.2
Then, the code/grobid.py file is executed to obtain the results in results/results.json.
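Presumably, code/grobid.py calls the Grobid REST service started above. A minimal sketch of such a call (the input path is hypothetical, and the actual script may parse the response differently):

import requests

# Send one PDF to the running Grobid service; the response is TEI XML
# containing the title, abstract and acknowledgements, among other fields.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"
with open("papers/example.pdf", "rb") as pdf:  # hypothetical input path
    response = requests.post(GROBID_URL, files={"input": pdf}, timeout=120)
response.raise_for_status()
tei_xml = response.text  # parse this TEI XML to extract the needed fields
print(tei_xml[:300])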
Step 2: Next, topic modeling, clustering, NER, and metadata retrieval (via OpenAlex and OpenAIRE) are performed. To do this, the following programs are executed:
- code/topic.py to obtain the existing topics (results/topic.json) and the probability of each paper belonging to each topic (results/topic_prob.json).
- code/similarity.py to obtain the similarity between papers (results/similarity_results.json).
- code/acknowledgment.py for the NER model of acknowledgments (results/acknowledgment.json).
- code/openalex_openaire.py for extracting external information about the papers (results/papers_info.json, results/authors_info.json and results/institutions_info.json); a minimal sketch of this kind of lookup is shown after this list.
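For illustration only, the OpenAlex lookup might look roughly like the following Python sketch (an assumption: code/openalex_openaire.py may use different filters and fields, and the title shown is just an example):

import requests

# Look up one paper's metadata on OpenAlex via a title search.
resp = requests.get(
    "https://api.openalex.org/works",
    params={"search": "Attention Is All You Need", "per-page": 1},
    timeout=30,
)
resp.raise_for_status()
results = resp.json()["results"]
if results:
    work = results[0]
    print(work["id"], work["display_name"], work["publication_year"])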
Step 3: With the aforementioned JSON files, the output.ttl file has been obtained with RML Mapper. This tool executes RML rules (stored in the files under the mappings folder) to generate Linked Data. To use the tool with the previous results, first download the tool's .jar from its releases section and execute the following command:
java -jar .\rmlmapper-6.5.1-r371-all.jar -m .\mappings\transformations.ttl -o app/output.ttl -s turtle

In the application, there are two tabs: "Consulta-Query" (for querying the Knowledge Graph) and "Sobre Nosotros" (About Us).
Below are two example queries. The first retrieves the names stored in the graph:
PREFIX onto: <http://upm.ontology.es/papers#>
SELECT ?sub ?obj WHERE {
?sub onto:hasName ?obj .
} LIMIT 10
The second query retrieves each paper's title, its topic assignment and score, and the topic's words:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX onto: <http://upm.ontology.es/papers#>
SELECT ?name ?uri ?topic ?score ?words
WHERE {
?uri onto:hasTitle ?name;
onto:hasTopic ?topicAssign.
?topicAssign onto:score ?score;
onto:assign ?topic.
?topic onto:hasWords ?words.
FILTER( ?score < 0.9)
} LIMIT 10
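These queries can also be run programmatically against the generated graph. A minimal sketch using rdflib (an assumption: the application itself may execute queries differently, e.g. through its own query tab):

from rdflib import Graph

# Load the generated Knowledge Graph and run the first sample query above.
g = Graph()
g.parse("app/output.ttl", format="turtle")
query = """
PREFIX onto: <http://upm.ontology.es/papers#>
SELECT ?sub ?obj WHERE { ?sub onto:hasName ?obj . } LIMIT 10
"""
for sub, obj in g.query(query):
    print(sub, obj)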
Authors:
- Yimin Zhou
- Rubén Cid Costa
- Rodrigo Durán Andrés

Contact:
- Rubén Cid Costa: [email protected]
- Rodrigo Durán Andrés: [email protected]
- Yimin Zhou: [email protected]
