Created by: Marcin Zub
This project is part of the AGH UST course Natural Language Processing in Artificial Intelligence Systems. It focuses on text analysis using Natural Language Processing (NLP) techniques.
The analyzed texts are:
- Manuskrypt Wojnicza (the Voynich Manuscript), by an unknown author
- Oszi Csillagog, by Knut Hamsun
For each of them, the program performs the following operations:
- Check whether or not it follows Zipf's law
- Create the N-gram table with occurrence counts
- Create the collocation table
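The three operations above can be sketched with the standard library alone. This is a minimal illustration, not the code from src/processing/: the function names (`tokenize`, `zipf_slope`, `ngram_counts`, `collocations`) are hypothetical, and scoring collocations by pointwise mutual information (PMI) is an assumption, since the project may use a different measure.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 suggests the text roughly follows Zipf's law.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def collocations(tokens, min_count=2):
    """Rank bigrams by PMI, highest first (assumed scoring, see above)."""
    unigrams = Counter(tokens)
    bigrams = ngram_counts(tokens, 2)
    total = len(tokens)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        # PMI = log2(P(w1, w2) / (P(w1) * P(w2))), using the token count
        # as an approximation of the number of bigrams.
        pmi = math.log2(count * total / (unigrams[w1] * unigrams[w2]))
        scored.append(((w1, w2), count, pmi))
    return sorted(scored, key=lambda row: row[2], reverse=True)
```

With this sketch, `ngram_counts(tokenize(text), 2).most_common(10)` would list the ten most frequent bigrams of a text, and `zipf_slope(tokenize(text))` would give a single number to compare against -1.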
The project has the following structure:
- src/: Contains the source code for the project.
  - analysis.ipynb: Jupyter notebook for data analysis.
  - data/: Contains the project data.
    - output/: Contains the output files.
    - raw/: Raw text files, if needed.
  - models/: Contains the text model.
    - text.py: Implements the text model with read functions.
  - processing/: Contains Python classes for data processing.
    - zipf.py: Implements the Zipf's law functionality.
- tests/: Contains unit tests for the project.
- requirements.txt: Lists the Python dependencies required by the project.
- Clone the repository:
git clone https://github.com/MarcinZ20/NLP-Text-Analysis.git
- Install the dependencies using pip:
pip install -r requirements.txt
- Run the Jupyter notebook:
jupyter notebook src/analysis.ipynb
This project is licensed under the terms of the LICENSE file.
For more information about the project, please refer to the src/analysis.ipynb and src/processing/zipf.py files.