This is an LSTM-based recurrent network. A many-to-one approach is used to predict a single label for an input sentence, the label being either `question` or `not-a-question`.
- Instead of using simple one-hot encodings for the vocabulary, it uses word embeddings, which represent words as dense vectors and capture high-dimensional patterns.
- The network uses packed padded sequences because of the variability in input lengths. Packed padded inputs help speed up training, since no processing is wasted on the zero-padding.
- This is a binary classification architecture, in which an input sequence is classified as either a `question` or `not-a-question`.
- The final probabilities are calculated using softmax.
- All the params can be found in the `config.json` file.
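The packing step above can be sketched as follows. This is a minimal illustration, not the project's actual training code; the sizes and token IDs are made up for the example (the real values live in `config.json`):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical sizes for illustration only
vocab_size, embedding_dim, hidden_dim = 100, 8, 16

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

# A zero-padded batch of two sentences with true lengths 4 and 2
batch = torch.tensor([[5, 9, 2, 7],
                      [3, 1, 0, 0]])
lengths = torch.tensor([4, 2])

# Packing tells the LSTM the true lengths, so it skips the padding steps
packed = pack_padded_sequence(embedding(batch), lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# h_n[-1] holds the last *real* hidden state of each sequence,
# which is exactly what a many-to-one classifier consumes
print(h_n[-1].shape)  # torch.Size([2, 16])
```

Because the LSTM stops at each sequence's true length, `h_n` is not contaminated by the zero-padding, which matters for a many-to-one setup that reads only the final hidden state.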
```python
import torch.nn as nn

class QuestionDetector(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        super(QuestionDetector, self).__init__()
        self.hidden_dim = hidden_dim
        # Embedding layer: token indices -> dense vectors
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # 3-layer LSTM over the embedded sequence
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, 3)
        # Final linear layer producing one output per sequence
        self.fc1 = nn.Linear(hidden_dim, 1)
```

Run the following command to train the model. Note: all the params are loaded from `config.json`.
```shell
python train.py --config config.json
```

The input data is expected to be in the following format:
- Each line in the `data_file` is a `sample_input`.
- The last word of each line is either a `|` (not-a-question) or a `?` (question).
- The raw text data can be cleaned into a usable format using the script `conversion/text_to_data.py`, which uses multiprocessing to speed up the process.
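The labeling convention above can be illustrated with a small sketch. `label_line` is a hypothetical helper, not part of the repository; it only demonstrates how the trailing `?` / `|` token maps to a binary label:

```python
def label_line(line):
    """Map a sample line to 1 (question) or 0 (not-a-question).

    Hypothetical helper: per the data format, the last
    whitespace-separated token of each line is '?' or '|'.
    """
    last = line.strip().split()[-1]
    if last == "?":
        return 1
    if last == "|":
        return 0
    raise ValueError("line is missing a '?' or '|' terminator: %r" % line)

print(label_line("how are you ?"))  # 1
print(label_line("i am fine |"))    # 0
```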
All the logs are saved in the `saved` folder. The project uses TensorBoard to write the logs, so `tensorboardX` needs to be installed in your environment.