# [Towards a Non-2PC Transaction Management in Distributed Database Systems (2016)](https://scholar.google.com/scholar?cluster=9359440394568724083)
For a traditional single-node database, the data that a transaction reads and
writes is all on a single machine. For a distributed OLTP database, there are
two types of transactions:

1. **Local transactions** are transactions that read and write data on a single
   machine, much like traditional transactions. A distributed database can
   process a local transaction as efficiently as a traditional single-node
   database can.
2. **Distributed transactions** are transactions that read and write data
   that is spread across multiple machines. Typically, distributed databases
   use two-phase commit (2PC) to commit or abort distributed transactions.
   Because 2PC requires multiple rounds of communication, distributed
   databases process distributed transactions less efficiently than local
   transactions.

This paper presents an alternative to 2PC, dubbed **Localizing Executions via
Aggressive Placement of data (LEAP)**, which tries to avoid the communication
overheads of 2PC by aggressively moving all the data a distributed transaction
reads and writes onto a single machine, effectively turning the distributed
transaction into a local transaction.

## LEAP
LEAP is based on the following assumptions and observations:

- Transactions in an OLTP workload don't read or write many tuples.
- Tuples in an OLTP database are typically very small.
- Multiple transactions issued one after another may access the same data again
  and again.
- As more advanced network technology becomes available (e.g. RDMA), the cost
  of moving data becomes smaller and smaller.

With LEAP, tuples are horizontally partitioned across a set of nodes, and each
tuple is stored exactly once. Each node has two data structures:

- a **data table**, which stores tuples, and
- an **owner table**, a horizontally partitioned key-value store which stores
  ownership information.

Consider a tuple `d = (k, v)` with primary key `k` and value `v`. The owner
table contains an entry `(k, o)` indicating that node `o` owns the tuple with
key `k`, and node `o` stores a `(k, v)` entry in its data table. The owner
table is partitioned across nodes using an arbitrary partitioning scheme
(e.g. hash-based, range-based).
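
To make the layout concrete, here is a tiny illustration (not from the paper) of
what the two tables might hold in a three-node cluster, assuming tuple
`("k", "v")` lives on node 2 and node 1 happens to be the partitioner for key
`k`:

```python
# Hypothetical three-node cluster: node 2 owns tuple ("k", "v"), and node 1
# holds the owner-table entry for key "k" (all node assignments made up).
data_tables  = {0: {}, 1: {}, 2: {"k": "v"}}  # per-node data tables
owner_tables = {0: {}, 1: {"k": 2}, 2: {}}    # per-node owner-table partitions
```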

When a node initiates a transaction, it requests ownership of every tuple the
transaction reads and writes. This migrates the tuples to the initiating node
and updates the ownership information to reflect the transfer. Here's how the
ownership transfer protocol works (a code sketch of the message flow follows
the list). For a given tuple `d = (k, v)`, the **requester** is the node
requesting ownership of `d`, the **partitioner** is the node that stores the
ownership information `(k, o)`, and the **owner** is the node that stores `d`.
| 53 | + |
| 54 | +- First, the requester sends an **owner request** with key `k` to the |
| 55 | + partitioner. |
| 56 | +- Then, the partitioner looks up the owner of the tuple with `k` in its owner |
| 57 | + table and sends a **transfer request** to the owner. |
| 58 | +- The owner retrieves the value of the tuple and sends it in a **transfer |
| 59 | + response** back to the requester. It also deletes its copy of the tuple. |
| 60 | +- Finally, the requester sends an **inform** message to the partitioner |
| 61 | + informing it that the ownership transfer was complete. The partitioner |
| 62 | + updates its owner table to reflect the new owner. |
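
Here is a minimal sketch of the four-message flow, with nodes invoking one
another's handlers directly in place of real network messages and a hash-based
owner-table partitioning. The `Node` class, its method names, and the
three-node example are illustrative assumptions, not code from the paper.

```python
class Node:
    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster    # node_id -> Node, shared by all nodes
        self.data_table = {}      # key -> value, tuples this node owns
        self.owner_table = {}     # key -> owner node_id (partitioner state)

    def partitioner_of(self, key):
        # One possible scheme: hash-partition the owner table across nodes.
        return hash(key) % len(self.cluster)

    # Requester, message 1: owner request to the partitioner.
    def request_ownership(self, key):
        self.cluster[self.partitioner_of(key)].on_owner_request(key, self.node_id)

    # Partitioner, message 2: look up the current owner, forward a transfer request.
    def on_owner_request(self, key, requester_id):
        owner_id = self.owner_table[key]
        self.cluster[owner_id].on_transfer_request(key, requester_id)

    # Owner, message 3: send the tuple to the requester and delete the local copy.
    def on_transfer_request(self, key, requester_id):
        value = self.data_table.pop(key)
        self.cluster[requester_id].on_transfer_response(key, value)

    # Requester, message 4: install the tuple and inform the partitioner.
    def on_transfer_response(self, key, value):
        self.data_table[key] = value
        self.cluster[self.partitioner_of(key)].on_inform(key, new_owner=self.node_id)

    # Partitioner: record the new owner.
    def on_inform(self, key, new_owner):
        self.owner_table[key] = new_owner

# Example: node 0 pulls the tuple with key "k" away from node 2.
cluster = {}
for i in range(3):
    cluster[i] = Node(i, cluster)
p = cluster[0].partitioner_of("k")  # whichever node the hash picks
cluster[2].data_table["k"] = "v"    # node 2 initially owns the tuple
cluster[p].owner_table["k"] = 2     # its partitioner records that fact
cluster[0].request_ownership("k")
assert cluster[0].data_table["k"] == "v" and "k" not in cluster[2].data_table
assert cluster[p].owner_table["k"] == 0
```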

Also note that

- if the requester, partitioner, and owner are all different nodes, then this
  scheme requires **4 messages**,
- if the partitioner and owner are the same, then this scheme requires **3
  messages**, and
- if the requester and partitioner are the same, then this scheme requires **2
  messages**.
| 72 | + |
| 73 | +If the transfer request is dropped and the owner deletes the tuple, data is |
| 74 | +lost. See the appendix for information on how to make this ownership transfer |
| 75 | +fault tolerant. Also see the paper for a theoretical comparison of 2PC and |
| 76 | +LEAP. |

## LEAP-Based OLTP Engine
L-Store is a distributed OLTP database based on H-Store which uses LEAP to
manage transactions. Transactions acquire read/write locks on individual tuples
and use strict two-phase locking. Transactions are assigned globally unique
identifiers, and deadlock prevention is implemented with a wait-die scheme in
which lower timestamped transactions have higher priority. That is, higher
priority transactions wait on lower priority transactions, but lower priority
transactions abort rather than wait on higher priority transactions.
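
Here is a minimal sketch of the wait-die rule itself, assuming each transaction
carries a globally unique timestamp and that a smaller timestamp means higher
priority (an older transaction). This is the generic policy described above,
not L-Store's actual locking code.

```python
class Abort(Exception):
    """Raised when a lower-priority (younger) transaction must die, not wait."""

def wait_or_die(holder_ts: int, requester_ts: int) -> str:
    """Decide what a lock requester does when the lock is already held."""
    if requester_ts < holder_ts:
        # Older (higher-priority) requester is allowed to wait for the holder.
        return "wait"
    # Younger (lower-priority) requester aborts instead of waiting, which rules
    # out cycles of waiting transactions and hence deadlock.
    raise Abort(f"txn {requester_ts} aborts: lock held by older txn {holder_ts}")

assert wait_or_die(holder_ts=10, requester_ts=3) == "wait"  # older txn waits
try:
    wait_or_die(holder_ts=3, requester_ts=10)               # younger txn dies
except Abort:
    pass
```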

Concurrent local transactions are processed as usual; what's interesting is how
concurrent transfer requests are handled. Imagine a transaction is requesting
ownership of a tuple on another node.

- First, the requester creates a **request lock** locally indicating that it is
  currently trying to request ownership of the tuple. It then sends an owner
  request to the partitioner.
- The partitioner may receive multiple concurrent owner requests. It processes
  them serially using the wait-die scheme. As an optimization, it processes
  requests in decreasing timestamp order to avoid aborts whenever possible (see
  the sketch after this list). It then forwards a transfer request to the
  owner.
- If the owner is currently accessing the tuple being requested, it again uses
  the wait-die scheme to arbitrate access to the tuple before sending it to the
  requester.
- Finally, the requester changes the request lock into a normal data lock and
  continues processing.
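
Here is a rough sketch of how a partitioner might serialize concurrent owner
requests for a single key, serving the youngest (largest timestamp) request
first so that the older, higher-priority requests left behind wait rather than
die. The `PendingRequests` structure is an illustration of that ordering
optimization, not L-Store's implementation.

```python
import heapq

class PendingRequests:
    """Pending owner requests for one key, served in decreasing timestamp order."""

    def __init__(self):
        self._heap = []  # max-heap on timestamp, simulated by negating timestamps

    def add(self, timestamp: int, requester_id: int) -> None:
        heapq.heappush(self._heap, (-timestamp, requester_id))

    def next_to_serve(self):
        # Pop the youngest pending request first; under wait-die, the older
        # requests still queued are allowed to wait, so none of them abort.
        if not self._heap:
            return None
        neg_ts, requester_id = heapq.heappop(self._heap)
        return (-neg_ts, requester_id)

# Example: owner requests with timestamps 5, 9, and 7 arrive concurrently.
pending = PendingRequests()
for ts, rid in [(5, 0), (9, 1), (7, 2)]:
    pending.add(ts, rid)
assert pending.next_to_serve() == (9, 1)  # youngest request is served first
```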

If a transaction cannot successfully get ownership of a tuple, it aborts.
L-Store also uses logging and checkpointing for fault tolerance (see paper for
details).