Large training datasets are not included in this repository to keep cloning lightweight. If you use the News Category dataset, place it here:
trainingData/News/News_Category_Dataset_v3.txt
This is a Java-based capstone project that automatically extracts top keywords from .txt documents using TF-IDF (Term Frequency - Inverse Document Frequency). Users can select a file, specify the number of keywords (up to 20), and choose whether to apply stemming for better keyword generalization. The result is a ranked list of keywords that can be exported or used for tagging and indexing purposes.
Want to learn how to use this system?
Open the User Guide
- User Guide: Full walkthrough for GUI + CLI
- Screenshots: GUI preview (light/dark, stemming ON/OFF)
- JavaFX GUI interface for document selection and keyword display
- Toggle stemming and dark mode
- Export keywords to TXT or CSV
- Batch runner for automated testing (BatchTestRunner)
- Custom TF-IDF keyword extractor with fallback support
- IDF training model builder from any folder of .txt files
- Full test suite with JUnit 5 + Mockito for unit and integration tests
- Robust error handling for malformed inputs, empty files, and invalid paths
- Modular code structure and configurable input/output setup
- Java 21 / Gradle-based build system
- Java 21
- JavaFX 24.0.1
- Gradle
- Lombok
- JUnit 5
- Mockito
capstone.documenttaggingsystem/
├── TaggingApp.java          # Main GUI app
├── TaggingCli.java          # Console-based interface
├── FileParser.java          # Cleans and normalizes .txt files
├── IdfTrainer.java          # Builds IDF map from document folders
├── IdfLoader.java           # Loads IDF map from file
├── TfIdfCalculator.java     # Handles TF, IDF, and keyword ranking
├── WordStemmer.java         # Handles word stemming
├── FileUtils.java           # File walker and loader helpers
├── BatchTestRunner.java     # Batch testing class
├── test/                    # JUnit + Mockito test cases
│   └── testDocuments/       # Controlled test .txt inputs
└── trainingData/            # Your custom documents for IDF training
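The layout above describes FileParser as the component that cleans and normalizes .txt files. The project's actual implementation is not shown in this README, so the following is only a minimal sketch of what such a cleaning step might look like. The class name TokenizerSketch, the method tokenize, and the rules it applies (lowercasing, ASCII letters only, dropping one-letter tokens) are assumptions for illustration, not the project's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not the project's FileParser.
public class TokenizerSketch {

    // Lowercases the text, replaces every run of characters that are not
    // ASCII letters with a single space, and splits on spaces.
    // Tokens shorter than two characters are dropped (an assumed rule).
    public static List<String> tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z]+", " ")
                .trim();
        List<String> tokens = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return tokens; // nothing but digits, punctuation, or whitespace
        }
        for (String token : cleaned.split(" ")) {
            if (token.length() >= 2) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [machine, learning, models, trained, on, data]
        System.out.println(tokenize("Machine-learning models, trained on DATA!"));
    }
}
```

Restricting tokens to ASCII letters keeps the sketch short; text in other scripts would need a Unicode-aware pattern.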
Make sure you have Java 21+ and JavaFX SDK 24.0.1 configured.
- Clone this repository.
- Open the project in IntelliJ IDEA.
- Set JavaFX SDK under module dependencies.
- To launch the GUI: run TaggingApp.java or use: ./gradlew run
To use the console-based interface instead of GUI:
./gradlew run --args='cli'
To run the JUnit and Mockito test suite:
./gradlew test
- Training Phase: All .txt files in the trainingData/ folder are parsed to calculate IDF values and generate a map.
- Tagging Phase: When a new .txt document is selected, it's cleaned and tokenized (optionally stemmed) and compared against the IDF map.
- Keyword Ranking: A TF-IDF score is calculated for each word, and the top N keywords are returned.
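The three phases above can be sketched in a few lines of Java. This is an illustrative sketch under stated assumptions, not the project's TfIdfCalculator: the class name TfIdfSketch, the method names, the formula idf(t) = ln(N / df(t)), the length-normalized term frequency, and the fallbackIdf parameter for words missing from the IDF map are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and formulas are assumptions, not the project's code.
public class TfIdfSketch {

    // Training phase: idf(t) = ln(N / df(t)), where N is the number of
    // training documents and df(t) is how many of them contain term t.
    public static Map<String, Double> trainIdf(List<List<String>> corpus) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) { // count each term once per document
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log((double) corpus.size() / e.getValue()));
        }
        return idf;
    }

    // Tagging and ranking: tf(t) = count(t) / document length, score = tf * idf.
    // Terms absent from the IDF map use fallbackIdf (an assumed policy).
    public static List<Map.Entry<String, Double>> topKeywords(
            List<String> doc, Map<String, Double> idf, double fallbackIdf, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : doc) {
            counts.merge(term, 1, Integer::sum);
        }
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double tf = (double) e.getValue() / doc.size();
            scored.add(Map.entry(e.getKey(), tf * idf.getOrDefault(e.getKey(), fallbackIdf)));
        }
        // Highest score first; ties are broken alphabetically so output is stable.
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return new ArrayList<>(scored.subList(0, Math.min(n, scored.size())));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("data", "model"),
                List.of("data", "news"),
                List.of("data", "sports"));
        Map<String, Double> idf = trainIdf(corpus);
        List<String> doc = List.of("model", "model", "data", "training");
        for (Map.Entry<String, Double> e : topKeywords(doc, idf, Math.log(3.0), 2)) {
            System.out.printf("%s %.4f%n", e.getKey(), e.getValue());
        }
    }
}
```

In this toy corpus the word "data" appears in every training document, so its IDF is zero and it never outranks rarer words, which is the behavior TF-IDF is meant to produce.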
Top Keywords (with stemming enabled):
data → 1.2345
machin → 1.1004
model → 0.9981
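A ranked list like the sample above could be exported to CSV along the following lines. This is a sketch only: the class name KeywordCsvSketch, the header row keyword,score, the four-decimal formatting, and the output file name keywords.csv are assumptions, and the project's own export code may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical export helper; not the project's actual exporter.
public class KeywordCsvSketch {

    // Builds one header line plus one line per keyword, in map iteration order.
    // Keywords are quoted and embedded quotes are doubled, per common CSV practice.
    public static List<String> toCsvLines(Map<String, Double> ranked) {
        List<String> lines = new ArrayList<>();
        lines.add("keyword,score");
        for (Map.Entry<String, Double> e : ranked.entrySet()) {
            String keyword = "\"" + e.getKey().replace("\"", "\"\"") + "\"";
            lines.add(keyword + "," + String.format(Locale.ROOT, "%.4f", e.getValue()));
        }
        return lines;
    }

    // Writes the CSV lines to the target path as UTF-8, replacing any existing file.
    public static void write(Map<String, Double> ranked, Path target) throws IOException {
        Files.write(target, toCsvLines(ranked), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap preserves insertion order, so the ranking survives the export.
        Map<String, Double> ranked = new LinkedHashMap<>();
        ranked.put("data", 1.2345);
        ranked.put("machin", 1.1004);
        ranked.put("model", 0.9981);
        write(ranked, Path.of("keywords.csv"));
    }
}
```

Formatting with Locale.ROOT keeps the decimal separator a period regardless of the machine's locale, which avoids clashing with the comma delimiter.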
Use manual exploratory tests to validate:
- GUI responsiveness and layout
- Correct file selection behavior
- Stemming toggle functionality
- Dark mode visibility
- Export functionality (TXT and CSV)
- Ashley Caceres Pagan: Backend logic, GUI functionality, mock testing, visual design, final system integration
- Ryo Kilgannon: TF-IDF logic, stemming integration, Lucene enhancements
- Danton Robinson: Documentation lead, initial GUI submission (replaced), testing plan draft assistance
- Unicode stemming support is currently limited to basic ASCII characters.
- File input is restricted to .txt format only.
- Batch document tagging and export functionality
- GUI enhancements (toast feedback, font scaling, resizable layout)
- NLP improvements (stop words, lemmatization)
- Optional cloud integration for large-scale document management
- Unicode and internationalization improvements
This project was developed as part of a capstone project for the University of XYZ. Special thanks to our professors and mentors for their guidance and support throughout the development process.
This project is for educational purposes only. The authors are not responsible for any misuse or damages caused by the use of this software. Use at your own risk.
We expect all contributors to adhere to the following code of conduct:
- Be respectful and considerate to others.
- Communicate openly and constructively.
- Be inclusive and welcoming to all.
- Assume good intentions and be open to feedback.
- Avoid personal attacks and harassment.
- Respect the privacy and confidentiality of others.
- Be mindful of the impact of your words and actions.
- Report any violations of this code of conduct to the maintainers.
- Take responsibility for your actions and their consequences.
- Strive to create a positive and supportive environment for all contributors.
- Encourage collaboration and teamwork.
- Be open to new ideas and perspectives.
- Support and uplift others in the community.
- Be respectful of differing opinions and viewpoints.
- Be mindful of cultural differences and sensitivities.
- Avoid using offensive or derogatory language.
- Be respectful of others' time and contributions.
We welcome contributions to this project! If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with clear messages.
- Push your changes to your forked repository.
- Create a pull request to the main repository with a description of your changes.
- Wait for review and feedback from the maintainers.
- Make any necessary changes based on feedback.
- Once approved, your changes will be merged into the main repository.
- Celebrate your contribution!
- If you have any questions or need assistance, feel free to reach out to the maintainers via email or open an issue in the repository.
- For larger contributions, consider discussing your ideas with the maintainers before starting work to ensure alignment with the project's goals.
- For any documentation updates, please follow the existing format and structure to maintain consistency.
- For any new features, consider writing tests to ensure functionality and prevent regressions.
- For any bug fixes, please include a description of the issue and how it was resolved in your pull request.
- For any design changes, consider providing mockups or screenshots to illustrate your ideas.
- For any performance improvements, please include benchmarks or comparisons to demonstrate the impact of your changes.
- For any security-related changes, please follow best practices and consider potential vulnerabilities.
- For any accessibility improvements, please ensure compliance with WCAG guidelines and consider diverse user needs.
- For any localization or internationalization changes, please follow best practices and consider different languages and regions.
- For any community engagement, consider participating in discussions, answering questions, and providing support to other users.
- For any feedback or suggestions, please provide constructive comments and be open to discussions.
- For any issues or bugs, please provide detailed information to help with troubleshooting.
- For any feature requests, please provide a clear description of the desired functionality and its use case.
- For any documentation contributions, please follow the existing style and format for consistency.
- For any testing contributions, please follow the existing test structure and include relevant test cases.
- For any code style contributions, please follow the existing coding conventions and guidelines.
- For any dependency updates, please ensure compatibility with the existing codebase and test thoroughly.
- For any build or configuration changes, please ensure compatibility with the existing setup and document any changes made.
- For any deployment or distribution changes, please ensure compatibility with the existing setup and document any changes made.
- For any additional resources or references, please provide links and descriptions to help others understand the context and relevance of the materials.
- For any additional notes or comments, please provide context and explanations to help others understand the purpose and significance of the information.
- For any additional testing or quality assurance information, please provide details on the testing strategy and methodologies used in the project.
- For any additional deployment or distribution information, please provide details on how to deploy and distribute the application for use in different environments.
- For any additional configuration or setup information, please provide details on how to configure and set up the application for different use cases and environments.
- For any additional performance or optimization information, please provide details on how to optimize the application for better performance and efficiency.
- For any additional security or privacy information, please provide details on how to ensure the security and privacy of the application and its users.
- For any additional accessibility or usability information, please provide details on how to ensure the application is accessible and usable for all users.
- For any additional localization or internationalization information, please provide details on how to ensure the application is localized and internationalized for different languages and regions.
- For any additional community or support information, please provide details on how to engage with the community and seek support for the project.
- For any additional feedback or suggestions, please provide details on how to provide feedback and suggestions for the project.
- For any additional acknowledgments or credits, please provide recognition of any third-party libraries or resources used in the project.
- JavaFX Documentation
- Java 21 Documentation
- JUnit 5 Documentation
- Mockito Documentation
- Gradle Documentation
- TF-IDF Wikipedia
- Lucene Stemming
- JavaFX Dark Mode
- JavaFX GUI Design
- JavaFX File Chooser
- JavaFX TextArea
- JavaFX Button
- JavaFX Label
- JavaFX Scene
- JavaFX Stage
- JavaFX Application
- JavaFX Event Handling
- JavaFX CSS
- JavaFX Layouts
- JavaFX Controls
- JavaFX FXML
- JavaFX WebView
- JavaFX Media
- JavaFX Charts
- JavaFX 3D
- JavaFX Animation
- JavaFX Effects
- JavaFX Imaging
- JavaFX Media Playback
- JavaFX Accessibility
- JavaFX Printing
- JavaFX Web
- JavaFX Swing
- JavaFX Swing Interoperability
- JavaFX Swing Node
- JavaFX Swing Application
- JavaFX Swing Scene
For any questions or feedback, please reach out to the contributors via email:
- Ensure JavaFX SDK is correctly set up in your IDE.
- Check for any missing dependencies in the build.gradle file.
- If you encounter issues with the GUI, try running the CLI version for debugging.
- For any errors related to file paths, ensure that the files are accessible and correctly formatted as .txt.
- If you experience performance issues, consider optimizing the IDF training data size or the number of keywords requested.
- For stemming issues, ensure that the stemming library is correctly integrated and that the input text is in a supported format.
- Clone the repository to your local machine.
- Navigate to the project directory.
- Run ./gradlew build to install dependencies and build the project.
- Run ./gradlew run to start the application.
- Follow the on-screen instructions to select a document and extract keywords.
- Use the GUI to toggle stemming and dark mode as needed.
- Export the keywords to a TXT or CSV file for further use.
- For batch testing, use the BatchTestRunner class to run automated tests on multiple documents.
- For IDF training, place your .txt files in the trainingData/ folder and run the IdfTrainer class to build the IDF map.
- For CLI usage, run ./gradlew run --args='cli' to access the console-based interface.
- For testing, run ./gradlew test to execute the JUnit and Mockito tests.
- For any issues, refer to the troubleshooting section or contact the contributors for assistance.
- For further documentation, refer to the UserGuide.md file for a full walkthrough of the GUI and CLI features.
- For screenshots of the GUI in different modes, refer to the screenshots folder for a visual preview of the application.
- For any updates or changes, refer to the CHANGELOG.md file for a history of modifications and improvements made to the project.
- For any additional features or enhancements, refer to the FUTURE_PLANS.md file for a list of potential future developments and improvements to the application.
- For any contributions or suggestions, refer to the CONTRIBUTING.md file for guidelines on how to contribute to the project and submit pull requests.
- For any licensing information, refer to the LICENSE file for details on the project's license and usage rights.
- For any contact information, refer to the CONTACT.md file for details on how to reach the contributors for questions or feedback.
- For any additional resources or references, refer to the RESOURCES.md file for a list of helpful links and materials related to the project and its technologies.
- For any acknowledgments or credits, refer to the ACKNOWLEDGMENTS.md file for recognition of any third-party libraries or resources used in the project.
- For any additional notes or comments, refer to the NOTES.md file for any extra information or context related to the project and its development.
- For any additional documentation or resources, refer to the DOCUMENTATION.md file for a comprehensive overview of the project's structure, features, and usage.
- For any additional testing or quality assurance information, refer to the TESTING.md file for details on the testing strategy and methodologies used in the project.
- For any additional deployment or distribution information, refer to the DEPLOYMENT.md file for details on how to deploy and distribute the application for use in different environments.
- For any additional configuration or setup information, refer to the CONFIGURATION.md file for details on how to configure and set up the application for different use cases and environments.
- For any additional performance or optimization information, refer to the PERFORMANCE.md file for details on how to optimize the application for better performance and efficiency.
- For any additional security or privacy information, refer to the SECURITY.md file for details on how to ensure the security and privacy of the application and its users.
- For any additional accessibility or usability information, refer to the ACCESSIBILITY.md file for details on how to ensure the application is accessible and usable for all users.
- For any additional localization or internationalization information, refer to the LOCALIZATION.md file for details on how to ensure the application is localized and internationalized for different languages and regions.
- For any additional community or support information, refer to the COMMUNITY.md file for details on how to engage with the community and seek support for the project.
- For any additional feedback or suggestions, refer to the FEEDBACK.md file for details on how to provide feedback and suggestions for the project.
- For any additional resources or references, refer to the REFERENCES.md file for a list of helpful links and materials related to the project and its technologies.
January 6, 2026



