A web application for basic qualitative socio-linguistic analysis (CS50's Introduction to Programming with Python final project). It's been a hard day's night, but here we are.
QualiLens is a basic web app built with Streamlit that lets users perform simple socio-linguistic data analysis. Having worked in academia, I wanted to make the most of Python's libraries and experiment with some straightforward data analysis tools. I realized that a CS50P project might not be the right place to implement complex features of Pandas, NumPy, or Matplotlib, since the effort should be proportional to the achievable outcome. For the same reason, I chose Streamlit over Flask or Django, and I came to appreciate its simplicity.
The project includes a project.py file where the core logic is implemented, along with three separate files, each serving a specific purpose. I deliberately applied the OOP paradigm (or at least attempted to) in these three files, while project.py contains procedural logic organized in functions.
QualiLens consists of a basic landing page where the user uploads a file and chooses among three functions: generating a word cloud, generating a word frequency graph, and analyzing sentiment.
Once the file is selected, the user can run any one of these functions or all of them. For demonstration purposes, the first two chapters of Wuthering Heights by Emily Brontë were used as sample text.
The project contains the following files:
- project.py
- word_cloud.py
- word_count.py
- sentiment.py
- requirements.txt
- README.md
The project relies on the following libraries:
- Streamlit: Used to create the interactive web app interface.
- Pandas: Utilized for data manipulation and analysis, including reading datasets, cleaning data, and performing basic statistical operations.
- Altair: Employed for creating declarative, interactive visualizations.
- Wordcloud: Generates visual word clouds from text data, helping to identify frequent words or topics in a dataset.
- re: Python’s regular expressions module.
- TextBlob: Provides basic natural language processing functionalities such as sentiment analysis, tokenization, and part-of-speech tagging.
To install the libraries, run:
pip install -r requirements.txt
And then (required for sentiment analysis):
python -m textblob.download_corpora
word_cloud.py contains two classes for loading text and generating word clouds: WCLTextLoader and GenerateWordCloud.
WCLTextLoader is responsible for loading text from a file or directly from a string. It takes a file path or a string as input and returns the full text content.
GenerateWordCloud provides functionality to create a word cloud from a given text. The class contains two methods:
- generate() generates the word cloud from the provided text and returns it as a NumPy array.
- to_array() returns the generated word cloud as a NumPy array. Raises an error if the word cloud has not been generated yet.
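The two methods described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the class name WordCloudSketch, the default width and height, the white background, and the choice of RuntimeError are all assumptions. Only WordCloud, its generate() method, and its to_array() method come from the wordcloud library.

```python
class WordCloudSketch:
    """Hypothetical stand-in for GenerateWordCloud (illustration only)."""

    def __init__(self, text, width=800, height=400):
        self.text = text
        self.width = width    # assumed default, not taken from the project
        self.height = height  # assumed default, not taken from the project
        self._cloud = None    # filled in by generate()

    def generate(self):
        """Build the word cloud and return it as a NumPy array."""
        # Imported here so the class can be defined even if the
        # wordcloud package is not installed yet.
        from wordcloud import WordCloud

        self._cloud = WordCloud(
            width=self.width,
            height=self.height,
            background_color="white",
        ).generate(self.text)
        return self._cloud.to_array()

    def to_array(self):
        """Return the generated cloud, or fail if generate() was never called."""
        if self._cloud is None:
            raise RuntimeError("Call generate() before to_array().")
        return self._cloud.to_array()
```

The array returned by generate() can be handed directly to an image display call, which is presumably why both methods return a NumPy array rather than the library object.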
word_count.py contains two classes for text processing and word frequency analysis: WCOTextLoader and CountWords.
WCOTextLoader is responsible for loading text from a file. It takes a file path as input and returns the full text content.
CountWords provides functionality to analyze word frequencies in a text. It ignores an arbitrarily chosen set of common stopwords (e.g., “the”, “and”, “is”) to focus on meaningful content. The class contains three methods:
- tokenize() splits the text into words, converting all to lowercase.
- frequencies() computes the occurrence of each word in the text.
- top_n_df() returns a Pandas DataFrame of the most frequent words, with a default of the top 30.
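The tokenize, count, and rank steps listed above might look something like the sketch below. It uses only the standard library and plain functions instead of the CountWords class. The stopword set, the regular expression, and the function name top_n are illustrative assumptions, since the project's own choices are not shown here.

```python
import re
from collections import Counter

# Illustrative stopword list; the project's actual list may differ.
STOPWORDS = {"the", "and", "is", "a", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into words (apostrophes allowed inside)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequencies(text, stopwords=STOPWORDS):
    """Count every word that is not a stopword."""
    return Counter(word for word in tokenize(text) if word not in stopwords)


def top_n(text, n=30):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    return frequencies(text).most_common(n)
```

To obtain a table like the one top_n_df() is described as returning, the pairs could be passed to pandas.DataFrame(rows, columns=["word", "count"]), where the column names are again an assumption.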
sentiment.py contains two classes for loading text and performing sentiment analysis: SentimentLoader and GenerateSentiment.
SentimentLoader is responsible for loading text from a file. It takes a file path as input and returns the full text content.
GenerateSentiment provides functionality to analyze the sentiment of a text. The class contains one main method:
- sentiment() analyzes the text sentence by sentence using TextBlob and returns a Pandas DataFrame with the first three words of each sentence, along with its Subjectivity and Polarity scores.
Note that the Subjectivity score is a float in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. The Polarity score is a float in the range [-1.0, 1.0], where -1.0 indicates a very negative sentiment, 0.0 is neutral, and 1.0 is very positive.
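As a rough sketch of the sentence-by-sentence analysis described above, the functions below collect one row per sentence with its first three words and its two scores. The names first_words and sentiment_rows and the dictionary keys are hypothetical. The TextBlob calls used (TextBlob(text).sentences and the sentiment.polarity and sentiment.subjectivity attributes) are part of the library, and sentence splitting requires the corpora downloaded during installation.

```python
def first_words(sentence, n=3):
    """Return the first n whitespace-separated words of a sentence."""
    return " ".join(sentence.split()[:n])


def sentiment_rows(text):
    """Return one dict per sentence with a short label and both scores."""
    # Imported here so first_words() stays usable without TextBlob installed.
    from textblob import TextBlob

    rows = []
    for sentence in TextBlob(text).sentences:
        rows.append(
            {
                "Sentence": first_words(str(sentence)),
                "Polarity": sentence.sentiment.polarity,          # -1.0 to 1.0
                "Subjectivity": sentence.sentiment.subjectivity,  # 0.0 to 1.0
            }
        )
    return rows
```

A list of dictionaries like this can be turned into a table with pandas.DataFrame(rows), which is one plausible way to arrive at the DataFrame that sentiment() returns.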
project.py is the main Streamlit application file that orchestrates the functionality of QualiLens. It provides the user interface, handles file uploads, and coordinates the three types of analysis: word cloud, word frequency, and sentiment analysis.
It uses Streamlit components such as title, header, tabs, file_uploader, and buttons to create an interactive experience. It allows users to upload a .txt file for analysis, displaying results either as images or interactive tables. Every element can be downloaded.
project.py contains six custom functions:
- handle_upload() manages file uploads.
- handle_text_upload_messages() displays messages depending on the upload status.
- handle_wordcloud() generates and displays the word cloud.
- handle_word_frequency() calculates and visualizes word frequencies.
- handle_sentiment() computes sentiment scores and generates an interactive chart.
- handle_sentiment_table() shows a detailed sentiment table.
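To give an idea of how the upload handling and page layout might fit together, here is a minimal sketch. It does not reproduce the six functions listed above: read_uploaded_text and main are hypothetical names, and the labels and messages are invented for illustration. The Streamlit calls used (st.title, st.file_uploader, st.info, st.tabs, st.write) exist in the library.

```python
def read_uploaded_text(uploaded, encoding="utf-8"):
    """Decode an uploaded file-like object into a string, or return None."""
    if uploaded is None:
        return None
    # Undecodable bytes are replaced rather than raising an error.
    return uploaded.read().decode(encoding, errors="replace")


def main():
    # Imported here so the helper above can be used without Streamlit installed.
    import streamlit as st

    st.title("QualiLens")
    uploaded = st.file_uploader("Upload a .txt file", type=["txt"])
    text = read_uploaded_text(uploaded)

    if text is None:
        st.info("Upload a text file to begin.")
        return

    cloud_tab, freq_tab, sentiment_tab = st.tabs(
        ["Word cloud", "Word frequency", "Sentiment"]
    )
    with cloud_tab:
        st.write("Word cloud output would go here.")
    with freq_tab:
        st.write("Word frequency chart would go here.")
    with sentiment_tab:
        st.write(f"Loaded {len(text.split())} words for sentiment analysis.")


# In the script passed to `streamlit run`, call main() at module level.
```

Keeping the decoding step in its own function makes it possible to check it with an in-memory file, without starting the Streamlit server.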
To run the project locally, execute:
streamlit run project.py
Loris Botto





