The Gibberish Detector is a simple Python script that evaluates whether a given text consists of meaningful words. It does this by comparing words in the input text against a predefined set of known words stored in words_set.pkl.
- The script loads a set of known words from words_set.pkl, which is included in this repository.
- It defines a function, words_check(text), that:
  - Converts the input text to lowercase.
  - Splits the text into individual words.
  - Checks how many words exist in the known words set.
  - Returns a score between 0 and 1, representing the proportion of recognized words.
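
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes words_set.pkl holds a pickled Python set, that words are split on whitespace, and that empty input scores 0.0. The `load_words` helper and the `known_words` parameter are hypothetical and exist only to keep the example self-contained; the real `words_check` takes just the text.

```python
import pickle
from pathlib import Path


def load_words(path="words_set.pkl"):
    """Load the known-words set (assumes the pickle contains a set of strings)."""
    with Path(path).open("rb") as fh:
        return set(pickle.load(fh))


def words_check(text, known_words):
    """Return the fraction of whitespace-separated words found in known_words."""
    words = text.lower().split()
    if not words:
        # Assumption: empty input scores 0.0 instead of dividing by zero.
        return 0.0
    recognized = sum(1 for word in words if word in known_words)
    return recognized / len(words)


# Tiny stand-in vocabulary so the sketch runs without words_set.pkl.
demo_words = {"hello", "world"}
print(words_check("Hello world", demo_words))
print(words_check("Hello qwxzt", demo_words))
```

In this sketch punctuation stays attached to words, so "hello," would not match "hello"; whether the actual script strips punctuation is not stated here.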
- Clone this repository:
git clone https://github.com/LMArantes/gibberish-detector.git
cd gibberish-detector
- Ensure you have Python 3 installed.
- Import and use the words_check function:
from detector import words_check
text = "Hello world"
score = words_check(text)
print(f"Score: {score}")

For users who prefer not to modify the code, the script can be run from the command line.
You can analyze a text string directly by running:
python detector_cli.py -t "Hello world"
To analyze a .txt file, provide its path:
python detector_cli.py -f path/to/file.txt
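
For readers curious how a command-line wrapper with these two flags might be wired up, here is a hedged sketch using `argparse`. Only the `-t` and `-f` flags come from the usage above; the long names `--text` and `--file`, the `score_text` stand-in, the demo vocabulary, and the UTF-8 file encoding are assumptions, and the actual detector_cli.py may be organized differently.

```python
import argparse
from pathlib import Path

# Stand-in vocabulary and scorer so the sketch runs on its own; the real
# CLI would presumably call words_check from detector instead.
DEMO_WORDS = {"hello", "world"}


def score_text(text):
    """Fraction of whitespace-separated words found in DEMO_WORDS."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in DEMO_WORDS for word in words) / len(words)


def build_parser():
    parser = argparse.ArgumentParser(description="Score text for gibberish.")
    # Exactly one of -t / -f must be supplied.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-t", "--text", help="text string to analyze")
    group.add_argument("-f", "--file", type=Path, help="path to a .txt file")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.text is not None:
        text = args.text
    else:
        # Assumption: input files are UTF-8 encoded.
        text = args.file.read_text(encoding="utf-8")
    score = score_text(text)
    print(f"Score: {score}")
    return score


# Equivalent to: python detector_cli.py -t "Hello world"
main(["-t", "Hello world"])
```

Making the two options mutually exclusive means that passing both, or neither, is rejected with a usage message instead of silently picking one.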
- 1.0 → All words are recognized.
- 0.0 → No words are recognized (likely gibberish).
- A score between 0.0 and 1.0 indicates partial recognition.
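
If a yes/no decision is needed rather than a raw score, one option is to apply a cutoff. The sketch below is illustrative only: the `label_score` helper and the 0.5 threshold are assumptions, not part of this project, and a suitable threshold depends on the text being checked.

```python
def label_score(score, threshold=0.5):
    """Map a 0.0-1.0 score to a coarse label; the threshold is an arbitrary example."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return "likely meaningful" if score >= threshold else "likely gibberish"


for score in (1.0, 0.5, 0.0):
    print(score, label_score(score))
```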
This project is licensed under the Modified Attribution License (MAL) v1.0.