This is an LSTM-based recurrent network. A many-to-one approach is used to predict a single label for an input sentence, the label being either `question` or `not-a-question`.
- Instead of using simple one-hot encodings for the vocabulary, it uses word embeddings, which represent words as dense vectors and capture high-dimensional patterns.
- The network uses packed padded sequences because of the variability in input lengths. Packed padded inputs help speed up training, since no processing is wasted on the zero-padding.
- This is a binary classification architecture, in which an input sequence is classified as either a `question` or `not-a-question`.
- The final probabilities are calculated using softmax.
- All the params can be found in the `config.json` file.
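The packing step above can be sketched as follows. This is a minimal illustration, not the project's actual training code; the sizes and token IDs are made up for the example (the real values live in `config.json`):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical sizes for illustration only
vocab_size, embedding_dim, hidden_dim = 100, 8, 16

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

# A zero-padded batch of two sentences with true lengths 4 and 2
batch = torch.tensor([[5, 9, 2, 7],
                      [3, 1, 0, 0]])
lengths = torch.tensor([4, 2])

# Packing tells the LSTM the true lengths, so it skips the padding steps
packed = pack_padded_sequence(embedding(batch), lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# h_n[-1] holds the last *real* hidden state of each sequence,
# which is exactly what a many-to-one classifier consumes
print(h_n[-1].shape)  # torch.Size([2, 16])
```

Because the LSTM stops at each sequence's true length, `h_n` is not contaminated by the zero-padding, which matters for a many-to-one setup that reads only the final hidden state.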
```python
import torch.nn as nn

class QuestionDetector(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        super(QuestionDetector, self).__init__()
        self.hidden_dim = hidden_dim
        # Embedding layer: token indices -> dense vectors
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # 3-layer LSTM over the embedded sequence
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, 3)
        # Final linear layer producing one output per sequence
        self.fc1 = nn.Linear(hidden_dim, 1)
```

Run the following command to train the model. Note: all the params are loaded from `config.json`.
```shell
python train.py --config config.json
```

The input data is expected to be in the following format:
- Each line in the `data_file` is a `sample_input`.
- The last word of each line is either a `|` (not-a-question) or a `?` (question).
- The raw text data can be cleaned into a usable format using the script `conversion/text_to_data.py`, which uses multiprocessing to speed up the process.
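The labeling convention above can be illustrated with a small sketch. `label_line` is a hypothetical helper, not part of the repository; it only demonstrates how the trailing `?` / `|` token maps to a binary label:

```python
def label_line(line):
    """Map a sample line to 1 (question) or 0 (not-a-question).

    Hypothetical helper: per the data format, the last
    whitespace-separated token of each line is '?' or '|'.
    """
    last = line.strip().split()[-1]
    if last == "?":
        return 1
    if last == "|":
        return 0
    raise ValueError("line is missing a '?' or '|' terminator: %r" % line)

print(label_line("how are you ?"))  # 1
print(label_line("i am fine |"))    # 0
```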
All the logs are saved in the `saved` folder. The project uses TensorBoard to write the logs, so `tensorboardX` needs to be installed in your environment.