
Commit 653458e

Formatting and typos
1 parent 60990ae commit 653458e

File tree

- html/decandia2007dynamo.html
- html/diaconu2013hekaton.html
- html/terry1995managing.html
- papers/decandia2007dynamo.md
- papers/terry1995managing.md

5 files changed: +38 -28 lines changed

html/decandia2007dynamo.html

Lines changed: 23 additions & 12 deletions

@@ -11,23 +11,34 @@
 <a href="../">Papers</a>
 </div>
 <div id="container">
-<h2 id="dynamo-amazons-highly-available-key-value-store-2007"><a href="https://scholar.google.com/scholar?cluster=5432858092023181552&amp;hl=en&amp;as_sdt=0,5">Dynamo: Amazon's Highly Available Key-value Store (2007)</a></h2>
-<p><strong>Overview.</strong> Amazon has a service-oriented infrastructure which consists of a large number of networked services, each with a strict <em>SLA</em>: a formal contract between the clients and server which guarantees the server meet certain performance benchmarks (e.g. 99.9% of responses are within 500 milliseconds). Amazon's user-facing business model makes it more important to meet the SLAs by providing availability, scalability, and low-latency than it is to provide strong consistency. Dynamo is Amazon's distributed higly-available eventually consistent zero-hop distributed hash table (a.k.a. key-value store) that uses consistent hashing, vector clocks, quorums, gossip, and more.</p>
-<p><strong>System Interface.</strong> Dynamo is a key-value store where the values are arbitrary blobs of data. Users can issue <code>get(key)</code> requests which returns either an object or a list of conflicting objects and a context. If multiple objects are returned, the user is responsible for merging them. Moreover, users can issue <code>put(key, context, value)</code> requests where <code>context</code> is used to maintain version clocks.</p>
-<p><strong>Partitioning Algorithm.</strong> Dynamo uses consistent hashing to partition data very similarly to Chord. Data is hashed into a circular space. Nodes are broken down into virtual nodes, each of which is randomly provided a point in the circular key space. Each node is responsible for all the keys between it and its predecessor. The number of virtual nodes at each physical node can be tuned according to the capacity of the node.</p>
-<p><strong>Replication.</strong> Data is sent to a <em>coordinator</em> which writes the data locally and also sends the data to N-1 other nodes. Moreover, each data item has a <em>preference list</em> of nodes where it should be written, and each node in the system knows the preference list for all data items.</p>
-<p><strong>Data Versioning.</strong> Data in Dynamo is timestamped with a vector clock. If a write <code>a</code> happens before a write <code>b</code>, then the two writes can be reconciled trivially; this is known as <em>syntactic reconciliation</em>. However, if <code>a</code> and <code>b</code> are concurrent, then the system or the user has to perform <em>semantic reconciliation</em>. To avoid vector clocks of unbounded size, vector clocks are given a maximum size, and each entry in a vector clock is timestamped with a physical time. When the vector clock exceeds its maximum size, the oldest entry is evicted.</p>
-<p><strong>Execution of <code>get()</code> and <code>put()</code>.</strong> To execute a <code>get()</code> or <code>put()</code>, a Dynamo client can</p>
+<h1 id="dynamo-amazons-highly-available-key-value-store-2007"><a href="https://scholar.google.com/scholar?cluster=5432858092023181552&amp;hl=en&amp;as_sdt=0,5">Dynamo: Amazon's Highly Available Key-value Store (2007)</a></h1>
+<h2 id="overview">Overview</h2>
+<p>Amazon has a service-oriented infrastructure which consists of a large number of networked services, each with a strict <em>SLA</em>: a formal contract between the clients and server which guarantees the server meet certain performance benchmarks (e.g. 99.9% of responses are within 500 milliseconds). Amazon's user-facing business model makes it more important to meet the SLAs by providing availability, scalability, and low-latency than it is to provide strong consistency. Dynamo is Amazon's distributed higly-available eventually consistent zero-hop distributed hash table (a.k.a. key-value store) that uses consistent hashing, vector clocks, quorums, gossip, and more.</p>
+<h2 id="system-interface">System Interface</h2>
+<p>Dynamo is a key-value store where the values are arbitrary blobs of data. Users can issue <code>get(key)</code> requests which returns either an object or a list of conflicting objects and a context. If multiple objects are returned, the user is responsible for merging them. Moreover, users can issue <code>put(key, context, value)</code> requests where <code>context</code> is used to maintain version clocks.</p>
+<h2 id="partitioning-algorithm">Partitioning Algorithm</h2>
+<p>Dynamo uses consistent hashing to partition data very similarly to Chord. Data is hashed into a circular space. Nodes are broken down into virtual nodes, each of which is randomly provided a point in the circular key space. Each node is responsible for all the keys between it and its predecessor. The number of virtual nodes at each physical node can be tuned according to the capacity of the node.</p>
+<h2 id="replication">Replication</h2>
+<p>Data is sent to a <em>coordinator</em> which writes the data locally and also sends the data to N-1 other nodes. Moreover, each data item has a <em>preference list</em> of nodes where it should be written, and each node in the system knows the preference list for all data items.</p>
+<h2 id="data-versioning">Data Versioning</h2>
+<p>Data in Dynamo is timestamped with a vector clock. If a write <code>a</code> happens before a write <code>b</code>, then the two writes can be reconciled trivially; this is known as <em>syntactic reconciliation</em>. However, if <code>a</code> and <code>b</code> are concurrent, then the system or the user has to perform <em>semantic reconciliation</em>. To avoid vector clocks of unbounded size, vector clocks are given a maximum size, and each entry in a vector clock is timestamped with a physical time. When the vector clock exceeds its maximum size, the oldest entry is evicted.</p>
+<h2 id="execution-of-get-and-put">Execution of <code>get()</code> and <code>put()</code></h2>
+<p>To execute a <code>get()</code> or <code>put()</code>, a Dynamo client can</p>
 <ol style="list-style-type: decimal">
 <li>Issue a request to a load balancer, or</li>
 <li>issue it itself if it is a partition aware client (more on this later).</li>
 </ol>
 <p>Dynamo uses quorums to write data. A read must be acknowledged by <code>R</code> servers, a write must be acknowledged by <code>W</code> servers, and <code>R + W &gt; N</code>.</p>
-<p><strong>Handling Failures.</strong> Dynamo uses a <em>sloppy quorum</em> where data can be stored at a node outside its preference list. The data is tagged with the node where the data should be, and the node transfers it there eventually. Moreover, preference lists span multiple data centers.</p>
-<p><strong>Handling Permanent Failures.</strong> Nodes user Merkle trees to determine what state has diverged from one another.</p>
-<p><strong>Membership and Failure Detection.</strong> Membership changes are initiated manually by a human. Nodes gossip membership information and use it transfer data to the newly joined and removed nodes. There are also seed nodes in the ring which nodes always gossip with to avoid a split ring.</p>
-<p><strong>Implementation.</strong> Dynamo is implemented with a pluggable storage engine and uses a SEDA architecture implemented in Java.</p>
-<p><strong>Experiences and Lessons Learned.</strong> Amazon has learned a lot from its experience with Dynamo:</p>
+<h2 id="handling-failures">Handling Failures</h2>
+<p>Dynamo uses a <em>sloppy quorum</em> where data can be stored at a node outside its preference list. The data is tagged with the node where the data should be, and the node transfers it there eventually. Moreover, preference lists span multiple data centers.</p>
+<h2 id="handling-permanent-failures">Handling Permanent Failures</h2>
+<p>Nodes user Merkle trees to determine what state has diverged from one another.</p>
+<h2 id="membership-and-failure-detection">Membership and Failure Detection</h2>
+<p>Membership changes are initiated manually by a human. Nodes gossip membership information and use it transfer data to the newly joined and removed nodes. There are also seed nodes in the ring which nodes always gossip with to avoid a split ring.</p>
+<h2 id="implementation">Implementation</h2>
+<p>Dynamo is implemented with a pluggable storage engine and uses a SEDA architecture implemented in Java.</p>
+<h2 id="experiences-and-lessons-learned">Experiences and Lessons Learned</h2>
+<p>Amazon has learned a lot from its experience with Dynamo:</p>
 <ul>
 <li><em>Balancing performance and durability.</em> Improving durability can decrease performance. For example, if we want to write to <code>W</code> nodes, then increasing <code>W</code> decreases the availability of the system. Dynamo allows writes to be buffered by nodes, rather than written to their disks to increase availability at the cost of durability.</li>
 <li><em>Ensuring uniform load.</em> Assuming there are enough hot keys, hashing data into a circular key space should ensure uniform load. However, the partitioning scheme described above where each node is divided into some number of virtual nodes, the virtual nodes are placed randomly on the ring, and the node placement determines data partitioning has some downsides. Two alternatives are to divide the key space into equal sized partitions and give each node a random number of virtual nodes. Or, to divide the key space into equal sized partitions and adjust the total number of tokens as nodes join and leave the system.</li>
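As an illustration of the partitioning scheme this file summarizes, here is a minimal Python sketch. All names (`Ring`, `ring_hash`, `num_vnodes`) are invented for this example and MD5 is an arbitrary choice; the paper describes the scheme but not a concrete implementation:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # Hash onto the circular key space [0, 2^128); MD5 is an arbitrary choice.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self) -> None:
        self.tokens: list[int] = []       # sorted virtual-node positions
        self.owner: dict[int, str] = {}   # token -> physical node

    def add_node(self, node: str, num_vnodes: int) -> None:
        # Each physical node gets several points on the ring; more virtual
        # nodes means a larger share of keys, so capacity is tunable per node.
        for i in range(num_vnodes):
            token = ring_hash(f"{node}#vnode{i}")
            bisect.insort(self.tokens, token)
            self.owner[token] = node

    def lookup(self, key: str) -> str:
        # A virtual node owns every key between its predecessor and itself,
        # i.e. a key belongs to the first token clockwise from its position.
        i = bisect.bisect_right(self.tokens, ring_hash(key)) % len(self.tokens)
        return self.owner[self.tokens[i]]

ring = Ring()
ring.add_node("node-a", 8)
ring.add_node("node-b", 16)   # twice the capacity of node-a
print(ring.lookup("shopping-cart:42"))
```

Lookup is a binary search over the sorted token list, and removing a physical node only reassigns the keys its virtual nodes owned, which is the point of consistent hashing.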

html/diaconu2013hekaton.html

Lines changed: 1 addition & 1 deletion

@@ -36,7 +36,7 @@ <h2 id="programmability-and-query-processing">Programmability and Query Processi
 <p>Hekaton does not compile query plans into a series of function calls. Instead, a query plan is compiled into a single function and operators are connected together via labels and gotos. This allows the code to bypass some otherwise unnecessary function calls. For example, when the query is initially executed, it jumps immediately to the leaves of the query plan rather than recursively calling down to them. Some code (e.g. sort and complicated arithmetic functions) is not generated.</p>
 <p>Hekaton stored procedures have some restrictions (e.g. the schema of the tables that a stored procedure reads must be fixed, the stored procedures must execute within a single transaction). To overcome some of these restrictions, SQL Server allows regular/unrestricted/interpreted stored procedures to read and write Hekaton tables.</p>
 <h2 id="transaction-management">Transaction Management</h2>
-<p>Hekaton supports snapshot isolation, repeatable read, and serializability all implemented with multiversion concurrency control. There are two conditions which can be checked during validation:</p>
+<p>Hekaton supports snapshot isolation, repeatable read, and serializability all implemented with optimistic multiversion concurrency control. There are two conditions which can be checked during validation:</p>
 <ol style="list-style-type: decimal">
 <li><strong>Read stability</strong>. All the versions that a transaction read must still be valid versions upon commit.</li>
 <li><strong>Phantom avoidance</strong>. All the scans a transaction made must be repeatable upon commit.</li>
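The two validation conditions in this hunk can be made concrete with a small sketch. This is a hypothetical Python illustration of optimistic multiversion validation, not Hekaton's code (Hekaton is a C engine inside SQL Server); every class and field name here is invented:

```python
# Hypothetical sketch of optimistic validation at commit time.
class Database:
    def __init__(self) -> None:
        self.versions: dict[str, int] = {}  # key -> version, bumped per write

    def write(self, key: str) -> None:
        self.versions[key] = self.versions.get(key, 0) + 1

class Transaction:
    def __init__(self, db: Database) -> None:
        self.db = db
        self.read_set: list[tuple[str, int]] = []  # (key, version) observed
        self.scan_set: list = []                   # (predicate, result set)

    def read(self, key: str) -> int:
        version = self.db.versions.get(key, 0)
        self.read_set.append((key, version))
        return version

    def scan(self, predicate) -> frozenset:
        rows = frozenset(k for k in self.db.versions if predicate(k))
        self.scan_set.append((predicate, rows))
        return rows

    def validate(self) -> bool:
        # Read stability: every version this transaction read must still be
        # the current version at commit time.
        reads_ok = all(self.db.versions.get(k, 0) == v
                       for k, v in self.read_set)
        # Phantom avoidance: repeating each scan must return the same rows,
        # so no row appeared or disappeared under the predicate.
        scans_ok = all(frozenset(k for k in self.db.versions if p(k)) == rows
                       for p, rows in self.scan_set)
        return reads_ok and scans_ok

db = Database()
db.write("x")
txn = Transaction(db)
txn.read("x")
txn.scan(lambda k: k.startswith("x"))
db.write("x")            # a concurrent writer updates x
print(txn.validate())    # False: read stability is violated
```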

html/terry1995managing.html

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@ <h3 id="merge-procedures">Merge Procedures</h3>
 <h2 id="replica-consistency">Replica Consistency</h2>
 <p>Every server maintains a logical timestamp that is roughly kept in correspondence with its physical time. Servers tag writes with an id of the form (timestamp, server id). These ids form a total order, and servers order writes with respect to it. Servers immediately apply writes whenever they are received, and these writes are <strong>tentative</strong>. Slowly, writes are deemed <strong>committed</strong> and ordered before the tentative writes. It's possible that a new write appears and is inserted in the middle of the sequence of writes. This forces a server to <em>undo</em> the affects of later writes. The undo process is described later.</p>
 <h2 id="write-stability-and-commitment">Write Stability and Commitment</h2>
-<p>When a write is applied by a server for the last time, it is considered <strong>stable</strong> (equivalently, committed). Clients can query servers to see which writes have been committed. How to servers commit writes? One approach is to commit a write whenever its timestamp is less than the current timestamp of all servers. Unfortunately, if any of the servers is disconnected, this strategy can delay commit. In Bayou, a single server is designated as the primary and determines the order in which writes are committed. If this primary becomes disconnected, other servers may not see committed data for a while.</p>
+<p>When a write is applied by a server for the last time, it is considered <strong>stable</strong> (equivalently, committed). Clients can query servers to see which writes have been committed. How do servers commit writes? One approach is to commit a write whenever its timestamp is less than the current timestamp of all servers. Unfortunately, if any of the servers is disconnected, this strategy can delay commit. In Bayou, a single server is designated as the primary and determines the order in which writes are committed. If this primary becomes disconnected, other servers may not see committed data for a while.</p>
 <h2 id="storage-system-implementation-issues">Storage System Implementation Issues</h2>
 <p>There are three main components to each server: a write log, a tuple store, and an undo log.</p>
 <ol style="list-style-type: decimal">
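The replica-consistency machinery this file describes (write ids of the form (timestamp, server id), tentative writes, undo and replay) can be sketched briefly. The following Python toy uses invented names and elides the tuple store and undo log; it only shows how the total order on ids drives log insertion and forces undo:

```python
import bisect

class Server:
    def __init__(self, server_id: int) -> None:
        self.server_id = server_id
        self.clock = 0   # logical clock, kept roughly in step with physical time
        # Tentative writes as ((timestamp, server_id), op); Python compares
        # these tuples lexicographically, so the ids form a total order.
        self.log: list[tuple[tuple[int, int], str]] = []

    def local_write(self, op: str) -> None:
        self.clock += 1
        self.receive(((self.clock, self.server_id), op))

    def receive(self, write: tuple[tuple[int, int], str]) -> None:
        (ts, _), _ = write
        self.clock = max(self.clock, ts)  # keep clocks loosely synchronized
        # Insert in id order. If the write sorts before already-applied
        # writes, a real server must undo those writes (via its undo log)
        # and replay them after the newcomer; here we just report it.
        pos = bisect.bisect_left(self.log, write)
        to_replay = self.log[pos:]
        self.log.insert(pos, write)
        if to_replay:
            print(f"undo and replay {len(to_replay)} later write(s)")

s = Server(server_id=1)
s.local_write("append A")        # id (1, 1)
s.local_write("append B")        # id (2, 1)
s.receive(((1, 0), "append C"))  # sorts before both: forces undo and replay
```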

papers/decandia2007dynamo.md

Lines changed: 12 additions & 13 deletions

@@ -1,5 +1,5 @@
-## [Dynamo: Amazon's Highly Available Key-value Store (2007)](https://scholar.google.com/scholar?cluster=5432858092023181552&hl=en&as_sdt=0,5)
-**Overview.**
+# [Dynamo: Amazon's Highly Available Key-value Store (2007)](https://scholar.google.com/scholar?cluster=5432858092023181552&hl=en&as_sdt=0,5)
+## Overview
 Amazon has a service-oriented infrastructure which consists of a large number
 of networked services, each with a strict *SLA*: a formal contract between the
 clients and server which guarantees the server meet certain performance
@@ -10,28 +10,28 @@ strong consistency. Dynamo is Amazon's distributed higly-available eventually
 consistent zero-hop distributed hash table (a.k.a. key-value store) that uses
 consistent hashing, vector clocks, quorums, gossip, and more.

-**System Interface.**
+## System Interface
 Dynamo is a key-value store where the values are arbitrary blobs of data. Users
 can issue `get(key)` requests which returns either an object or a list of
 conflicting objects and a context. If multiple objects are returned, the user
 is responsible for merging them. Moreover, users can issue `put(key, context,
 value)` requests where `context` is used to maintain version clocks.

-**Partitioning Algorithm.**
+## Partitioning Algorithm
 Dynamo uses consistent hashing to partition data very similarly to Chord. Data
 is hashed into a circular space. Nodes are broken down into virtual nodes, each
 of which is randomly provided a point in the circular key space. Each node is
 responsible for all the keys between it and its predecessor. The number of
 virtual nodes at each physical node can be tuned according to the capacity of
 the node.

-**Replication.**
+## Replication
 Data is sent to a *coordinator* which writes the data locally and also sends
 the data to N-1 other nodes. Moreover, each data item has a *preference list*
 of nodes where it should be written, and each node in the system knows the
 preference list for all data items.

-**Data Versioning.**
+## Data Versioning
 Data in Dynamo is timestamped with a vector clock. If a write `a` happens
 before a write `b`, then the two writes can be reconciled trivially; this is
 known as *syntactic reconciliation*. However, if `a` and `b` are concurrent,
@@ -40,7 +40,7 @@ vector clocks of unbounded size, vector clocks are given a maximum size, and
 each entry in a vector clock is timestamped with a physical time. When the
 vector clock exceeds its maximum size, the oldest entry is evicted.

-**Execution of `get()` and `put()`.**
+## Execution of `get()` and `put()`
 To execute a `get()` or `put()`, a Dynamo client can

 1. Issue a request to a load balancer, or
@@ -49,26 +49,26 @@ To execute a `get()` or `put()`, a Dynamo client can
 Dynamo uses quorums to write data. A read must be acknowledged by `R` servers,
 a write must be acknowledged by `W` servers, and `R + W > N`.

-**Handling Failures.**
+## Handling Failures
 Dynamo uses a *sloppy quorum* where data can be stored at a node outside its
 preference list. The data is tagged with the node where the data should be, and
 the node transfers it there eventually. Moreover, preference lists span
 multiple data centers.

-**Handling Permanent Failures.**
+## Handling Permanent Failures
 Nodes user Merkle trees to determine what state has diverged from one another.

-**Membership and Failure Detection.**
+## Membership and Failure Detection
 Membership changes are initiated manually by a human. Nodes gossip membership
 information and use it transfer data to the newly joined and removed nodes.
 There are also seed nodes in the ring which nodes always gossip with to avoid
 a split ring.

-**Implementation.**
+## Implementation
 Dynamo is implemented with a pluggable storage engine and uses a SEDA
 architecture implemented in Java.

-**Experiences and Lessons Learned.**
+## Experiences and Lessons Learned
 Amazon has learned a lot from its experience with Dynamo:

 - *Balancing performance and durability.* Improving durability can decrease
@@ -93,4 +93,3 @@ Amazon has learned a lot from its experience with Dynamo:
 - *Balancing foreground and background.* Dynamo uses a resource controller to
   implement admission control for background tasks, preventing them from
   interfering with important foreground tasks.
-
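As a companion to the data-versioning section of this file, here is a hedged Python sketch of vector-clock comparison and truncation. The function names are invented and Dynamo's internal representation is not public; the sketch only illustrates the rules the summary states:

```python
def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    # True if version a has seen every event in b, i.e. b happened before a
    # (or the two are equal).
    return all(a.get(node, 0) >= n for node, n in b.items())

def reconcile(a: dict[str, int], b: dict[str, int]):
    if descends(a, b):
        return a      # syntactic reconciliation: a supersedes b
    if descends(b, a):
        return b      # syntactic reconciliation: b supersedes a
    return None       # concurrent: semantic reconciliation is required

def truncate(clock: dict[str, int], stamps: dict[str, float],
             max_size: int) -> dict[str, int]:
    # Bound the clock's size: each entry carries a physical timestamp in
    # `stamps`, and the oldest entry is evicted first. This can discard
    # causality information, turning some ordered versions into apparently
    # concurrent ones.
    while len(clock) > max_size:
        oldest = min(clock, key=lambda node: stamps[node])
        del clock[oldest]
    return clock

v1 = {"sx": 2, "sy": 1}
v2 = {"sx": 1, "sy": 1}
v3 = {"sx": 1, "sy": 2}
assert reconcile(v1, v2) == v1    # v2 happened before v1
assert reconcile(v1, v3) is None  # concurrent writes: caller must merge
```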

papers/terry1995managing.md

Lines changed: 1 addition & 1 deletion

@@ -72,7 +72,7 @@ described later.
 ## Write Stability and Commitment
 When a write is applied by a server for the last time, it is considered
 **stable** (equivalently, committed). Clients can query servers to see which
-writes have been committed. How to servers commit writes? One approach is to
+writes have been committed. How do servers commit writes? One approach is to
 commit a write whenever its timestamp is less than the current timestamp of all
 servers. Unfortunately, if any of the servers is disconnected, this strategy
 can delay commit. In Bayou, a single server is designated as the primary and
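The non-primary commit rule in this hunk fits in a couple of lines. A minimal sketch (the function name is invented for illustration):

```python
def committable(write_ts: int, server_clocks: list[int]) -> bool:
    # A write can commit once its timestamp is below every server's logical
    # clock: no server can still introduce a write ordered before it. One
    # disconnected (stalled) server blocks all commits, which is why Bayou
    # uses a designated primary instead.
    return write_ts < min(server_clocks)

assert committable(8, [12, 15, 9])       # every clock is past 8: stable
assert not committable(10, [12, 15, 9])  # the server at 9 may still write 10
```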
