diff --git a/docs/concepts.md b/docs/concepts.md
index 3d5ab1b3..5224cb2b 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -27,10 +27,9 @@ An encoder reads in "source data", e.g. a sequence of words or an image, and pro
 ## Decoder
 
-A decoder is a generative model that is conditioned on the representation created by the encoder. For example, a Recurrent Neural Network decoder may learn generate the translation for an encoded sentence in another language. For a list of available decoder, see the [Decoder Reference](decoders/).
+A decoder is a generative model that is conditioned on the representation created by the encoder. For example, a Recurrent Neural Network decoder may learn to generate the translation for an encoded sentence in another language. For a list of available decoders, see the [Decoder Reference](decoders/).
 
 ## Model
 
-A model defines how to put together an encoder and decoder, and how to calculate and minize the loss functions. It also handles the necessary preprocessing of data read from an input pipeline. Under the hood, each model is implemented as a [model_fn passed to a tf.contrib.learn Estimator](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Estimator). For a list of available models, see the [Models Reference](models/).
-
+A model defines how to put together an encoder and decoder, and how to calculate and minimize the loss functions. It also handles the necessary preprocessing of data read from an input pipeline. Under the hood, each model is implemented as a [model_fn passed to a tf.contrib.learn Estimator](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Estimator). For a list of available models, see the [Models Reference](models/).
diff --git a/docs/encoders.md b/docs/encoders.md
index ac8e57d9..38cc7c53 100644
--- a/docs/encoders.md
+++ b/docs/encoders.md
@@ -39,7 +39,7 @@ An encoder that pools over embeddings, as described in [https://arxiv.org/abs/16
 | --- | --- | --- |
 | `pooling_fn` | `tensorflow.layers.average_pooling1d` | The 1-d pooling function to use, e.g. `tensorflow.layers.average_pooling1d`. |
 | `pool_size` | `5` | The pooling window, passed as `pool_size` to the pooling function. |
-| `strides` | `1` | The stride during pooling, passed as `strides` the pooling function. |
+| `strides` | `1` | The stride during pooling, passed as `strides` to the pooling function. |
 | `position_embeddings.enable` | `True` | If true, add position embeddings to the inputs before pooling. |
 | `position_embeddings.combiner_fn` | `tensorflow.add` | Function used to combine the position embeddings with the inputs. For example, `tensorflow.add`. |
 | `position_embeddings.num_positions` | `100` | Size of the position embedding matrix. This should be set to the maximum sequence length of the inputs. |
@@ -56,5 +56,3 @@ hidden layer before the logits as the feature representation.
 | --- | --- | --- |
 | `resize_height` | `299` | Resize the image to this height before feeding it into the convolutional network. |
 | `resize_width` | `299` | Resize the image to this width before feeding it into the convolutional network. |
-
-
diff --git a/docs/index.md b/docs/index.md
index 24383af6..2c2e00c2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,9 +12,9 @@ We built tf-seq2seq with the following goals in mind:
 
 - **Usability**: You can train a model with a single command. Several types of input data are supported, including standard raw text.
 
-- **Reproducibility**: Training pipelines and models are configured using YAML files. 
This allows other to run your exact same model configurations.
+- **Reproducibility**: Training pipelines and models are configured using YAML files. This allows others to run your exact same model configurations.
 
-- **Extensibility**: Code is structured in a modular way and that easy to build upon. For example, adding a new type of attention mechanism or encoder architecture requires only minimal code changes.
+- **Extensibility**: Code is structured in a modular way that is easy to build upon. For example, adding a new type of attention mechanism or encoder architecture requires only minimal code changes.
 
 - **Documentation**: All code is documented using standard Python docstrings, and we have written guides to help you get started with common tasks.
diff --git a/docs/inference.md b/docs/inference.md
index ed4657b4..c8525a50 100644
--- a/docs/inference.md
+++ b/docs/inference.md
@@ -82,7 +82,7 @@ python -m bin.infer \
   ...
 ```
 
-By default, this script generates an `attention_score.npy` array file and one attention plot per example. The array file can be [loaded used numpy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) and will contain a list of arrays with shape `[target_length, source_length]`. If you only want the raw attention score data without the plots you can enable the `dump_atention_no_plot` parameter.
+By default, this script generates an `attention_score.npy` array file and one attention plot per example. The array file can be [loaded using numpy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) and will contain a list of arrays with shape `[target_length, source_length]`. If you only want the raw attention score data without the plots, you can enable the `dump_attention_no_plot` parameter.
diff --git a/docs/nmt.md b/docs/nmt.md
index ce6393b5..76afcc0e 100644
--- a/docs/nmt.md
+++ b/docs/nmt.md
@@ -91,7 +91,7 @@ export TRAIN_STEPS=1000000
 
 ## Alternative: Generate Toy Data
 
-Training on real-world translation data can take a very long time. If you do not have access to a machine with a GPU but would like to play around with a smaller dataset, we provide a way to generate toy data. The following command will generate a dataset where the target sequences are reversed source sequences. That is, the model needs to learn the reverse the inputs. While this task is not very useful in practice, we can train such a model quickly and use it as as sanity-check to make sure that the end-to-end pipeline is working as intended.
+Training on real-world translation data can take a very long time. If you do not have access to a machine with a GPU but would like to play around with a smaller dataset, we provide a way to generate toy data. The following command will generate a dataset where the target sequences are reversed source sequences. That is, the model needs to learn to reverse the inputs. While this task is not very useful in practice, we can train such a model quickly and use it as a sanity check to make sure that the end-to-end pipeline is working as intended.
 
 ```
 DATA_TYPE=reverse ./bin/data/toy.sh
diff --git a/docs/tools.md b/docs/tools.md
index 73fa76b4..3dd86fed 100644
--- a/docs/tools.md
+++ b/docs/tools.md
@@ -20,7 +20,7 @@ To run training on characters you must pass set `source_delimiter` and `target_d
 
 ## Visualizing Beam Search
 
-If you use the `DumpBeams` inference task (see [Inference](inference/) for more details) you can inspect the beam search data by loading the array using numpy, or generate beam search visualizations using the `generate_beam_viz.py` script. This required the `networkx` module to be installed.
+If you use the `DumpBeams` inference task (see [Inference](inference/) for more details) you can inspect the beam search data by loading the array using numpy, or generate beam search visualizations using the `generate_beam_viz.py` script. This requires the `networkx` module to be installed.
 
 ```
 python -m bin.tools.generate_beam_viz \
diff --git a/seq2seq/test/hooks_test.py b/seq2seq/test/hooks_test.py
index dedc6594..f943b0cf 100644
--- a/seq2seq/test/hooks_test.py
+++ b/seq2seq/test/hooks_test.py
@@ -39,16 +39,16 @@ class TestPrintModelAnalysisHook(tf.test.TestCase):
   def test_begin(self):
     model_dir = tempfile.mkdtemp()
     outfile = tempfile.NamedTemporaryFile()
-    tf.get_variable("weigths", [128, 128])
+    tf.get_variable("weights", [128, 128])
     hook = hooks.PrintModelAnalysisHook(
         params={}, model_dir=model_dir,
         run_config=tf.contrib.learn.RunConfig())
     hook.begin()
 
     with gfile.GFile(os.path.join(model_dir, "model_analysis.txt")) as file:
-      file_contents = file.read().strip()
+      file_contents = tf.compat.as_text(file.read()).strip()
 
-    self.assertEqual(file_contents.decode(), "_TFProfRoot (--/16.38k params)\n"
-                     "  weigths (128x128, 16.38k/16.38k params)")
+    self.assertEqual(file_contents, "_TFProfRoot (--/16.38k params)\n"
+                     "  weights (128x128, 16.38k/16.38k params)")
     outfile.close()
 
@@ -94,7 +94,7 @@ def test_sampling(self):
     outfile = os.path.join(self.sample_dir, "samples_000000.txt")
     with open(outfile, "rb") as readfile:
       self.assertIn("Prediction followed by Target @ Step 0",
-                    readfile.read().decode("utf-8"))
+                    tf.compat.as_text(readfile.read()))
 
     # Should not trigger for step 9
     sess.run(tf.assign(global_step, 9))
@@ -108,7 +108,7 @@ def test_sampling(self):
     outfile = os.path.join(self.sample_dir, "samples_000010.txt")
     with open(outfile, "rb") as readfile:
       self.assertIn("Prediction followed by Target @ Step 10",
-                    readfile.read().decode("utf-8"))
+                    tf.compat.as_text(readfile.read()))
 
 
 class TestMetadataCaptureHook(tf.test.TestCase):
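
Note: as a quick sanity check of the `attention_score.npy` format described in the `docs/inference.md` hunk above, the dumped scores can be loaded and inspected with a few lines of numpy. This is a minimal sketch, not part of the patch: the file name comes from the docs, while `allow_pickle=True` is an assumption needed only on newer numpy versions to load such object arrays.

```
# Sketch: load and inspect attention scores dumped during inference.
# Assumes attention_score.npy is in the current working directory.
import numpy as np

# The file holds a list of per-example score matrices; newer numpy
# versions require allow_pickle=True to load object arrays like this.
scores = np.load("attention_score.npy", allow_pickle=True)

for i, attention in enumerate(scores):
  # Each entry has shape [target_length, source_length].
  target_length, source_length = attention.shape
  print("example %d: target_length=%d, source_length=%d"
        % (i, target_length, source_length))
```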