Encoding error on Windows

I cannot load a UTF-8 file while building my vocabulary or loading my dataset, because `gbk` is used as the default encoding on Windows. I added a new option that allows setting the encoding manually in `PairedTextData`. #269

```
$ python main.py 
Traceback (most recent call last):
  File "main.py", line 62, in <module>
    main()
  File "main.py", line 28, in main
    hparams=config_data.train, device=device)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\data\paired_text_data.py", line 140, in __init__
    eos_token=src_hparams.eos_token)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\vocabulary.py", line 103, in __init__
    = self.load(self._filename)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\vocabulary.py", line 119, in load
    vocab = list(line.strip() for line in vocab_file)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\vocabulary.py", line 119, in <genexpr>
    vocab = list(line.strip() for line in vocab_file)
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8c in position 2: illegal multibyte sequence

```
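For context, the traceback is consistent with the vocabulary file being opened without an explicit encoding, in which case Python falls back to the locale's preferred encoding (`gbk` on this machine). Below is a minimal sketch of the general workaround, which is to pass the encoding explicitly. `load_vocab` is a hypothetical helper written for illustration; it is not Texar's API, and the name of the option added in #269 is not shown here.

```python
import locale
import os
import tempfile


def load_vocab(path, encoding="utf-8"):
    """Hypothetical helper: read one token per line using an explicit encoding."""
    with open(path, "r", encoding=encoding) as vocab_file:
        return [line.strip() for line in vocab_file]


# The encoding open() uses when none is given; platform and locale dependent.
print("Default encoding:", locale.getpreferredencoding(False))

# Write a small UTF-8 vocabulary file containing non-ASCII tokens.
tmp_dir = tempfile.mkdtemp()
vocab_path = os.path.join(tmp_dir, "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("你好\n世界\n")

# Reading with an explicit encoding does not depend on the system locale.
tokens = load_vocab(vocab_path)
print(tokens)
```

With the encoding passed explicitly, the same file reads identically on Windows and on other platforms, regardless of the system locale.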

Encoding error on Windows #270
