diff --git a/README.md b/README.md
index 6e7a2c8..8da1024 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,292 @@
-# Accelerated Stochastic Block Partitioning
+# Accelerated Stochastic Block Partitioning (SBP)
 
-Despite the title of the repository, to date there is no GPU code in this repository (though it is planned for the future).
+This project provides an optimized implementation of Stochastic Block Partitioning (SBP) for community detection in graphs. It integrates 4 complementary acceleration heuristics into a unified, high-performance framework:
 
-Stochastic block partitioning (SBP) code based on the reference implementation in the [Graph Challenge](http://graphchallenge.org)
+1. **Sampling** via the SamBaS framework [[IEEE HPEC Paper @ IEEE Xplore](https://ieeexplore.ieee.org/document/9926331), [Preprint](https://arxiv.org/abs/2209.04905), [Journal Paper Preprint](https://arxiv.org/abs/2209.04905)]
+2. **Shared-memory parallelization** via Hybrid SBP [[ICPP Paper @ ACM](https://dl.acm.org/doi/10.1145/3545008.3545091), [Preprint](https://arxiv.org/abs/2206.14709)]
+3. **Distributed-memory parallelization** via EDiSt [[IEEE CLUSTER Paper Preprint](https://arxiv.org/abs/2209.04910)]
+4. **Top-Down Computation** via Top-Down SBP [[ACM HPDC Paper](https://dl.acm.org/doi/10.1145/3731545.3731589)]
 
-Sequential, shared memory parallel, and distributed memory, multi-node SBP code is in src/main.cpp.
+Our SBP implementation is based on the [Graph Challenge](http://graphchallenge.org) reference implementation and optimized for modern HPC systems with support for multi-threaded (OpenMP) and distributed (MPI) execution.
 
-A C++ translation of the divide-and-conquer approach based on [iHeartGraph's implementation](https://github.com/iHeartGraph/GraphChallenge) and [Scalable Stochastic Block Partition paper](https://ieeexplore.ieee.org/document/8091050) is found in src/DivideAndConquerSBP.cpp.
+## Installation
+
+### Requirements
+
+- C++17 compatible compiler (e.g., AMD Clang, GCC, Intel)
+- CMake 3.10 or higher
+- MPI implementation (OpenMPI, Cray MPICH, etc.)
+- OpenMP support
+
+### Build Instructions
+
+```bash
+git clone --recursive https://github.com/vtsynergy/IntegratedSBP.git
+cd IntegratedSBP
+mkdir build
+cd build
+cmake ..
+make
+```
+
+### Build Options
+
+For testing and debugging purposes:
+
+- `cmake -DSCOREP=ON ..` - Enable Score-P instrumentation
+- `cmake -DVALGRIND=ON ..` - Enable Valgrind instrumentation
+- `cmake -DDEBUG=ON ..` - Enable AddressSanitizer (note: will cause test failures due to memory leak detection)
+
+### Testing
+
+Run the test suite with:
+
+```bash
+make test
+```
+
+**Note**: Some build options (particularly `-DDEBUG=ON`) will cause test failures.
+
+## Usage
+
+### Basic Usage
+
+For a single-node run with OpenMP parallelization:
+
+```bash
+./SBP --filepath /path/to/graph --threads <num_threads>
+```
+
+### Recommended Usage (Integrated Approach)
+
+For best results with the integrated sampling, shared-memory, top-down computation and distributed-memory approach:
+
+```bash
+mpiexec -n <num_cluster_nodes> ./TopDownSBP \
+  --filepath <path> \
+  -a hybrid_mcmc \
+  --threads <num_threads> \
+  --batches 2 \
+  --distribute none-edge-balanced \
+  --nonparametric \
+  -m 0.1 \
+  --samplingalg expansion_snowball \
+  --samplesize 0.5 \
+  --cachesize 20000
+  --split connectivity-snowball \
+  --splitinit random
+```
+
+Since most of these options have been made the default in newer builds, and you should get similar results using:
+
+```bash
+mpiexec -n <num_cluster_nodes> ./TopDownSBP --filepath <path>
+```
+
+### Graph File Formats
+
+The code expects graph files in one of these formats:
+
+- **Matrix Market** (`.mtx`): Standard sparse matrix format
+- **Tab-Separated Values** (`.tsv`): Edge list format with tab delimiter
+
+**File naming convention**: If `--filepath=/path/to/graph`, the code will look for:
+- Graph: `/path/to/graph.mtx` or `/path/to/graph.tsv`
+- Ground truth (optional): `/path/to/graph_truePartition.tsv`
+
+### Key Parameters
+
+#### Algorithm Options
+
+- `-a, --algorithm <type>` - MCMC algorithm to use:
+  - `hybrid_mcmc` (default, recommended): Hybrid approach balancing parallelism and accuracy
+  - `metropolis_hastings`: Classic MH algorithm (no parallel implementation)
+  - `async_gibbs`: Fully asynchronous, faster but may reduce quality
+
+#### Performance Tuning
+
+- `--threads <N>` - Number of OpenMP threads (default: 1)
+- `--batches <N>` - Number of batches for parallel algorithms (default: 2, recommended ≥ 2)
+- `--cachesize <N>` - Size of logarithm cache (default: 20000)
+- `-m, --mh_percent <X>` - Fraction of high-influence vertices for sequential processing (default: 0.1). High-influence vertices are identified by edge degree product (or vertex degree if `--vertex-degree-sort` is used)
+
+#### Sampling
+
+- `--samplesize <X>` - Fraction of vertices to sample (0.0 to 1.0, default: 1.0 = no sampling)
+- `--samplingalg <type>` - Sampling algorithm:
+  - `random`: Uniform random sampling
+  - `max_degree`: Select highest degree vertices
+  - `expansion_snowball`: Graph expansion from high-degree seeds
+
+#### Entropy Calculation
+
+- `--parametric` - Use parametric entropy instead of nonparametric (not recommended; nonparametric is default and more accurate)
+- `--hastings-correction` - Compute Hastings correction (only useful in parametric mode; disabled by default)
+- `--approximate` - Use faster, less accurate block merge (not recommended with nonparametric mode)
+- `--degreecorrected` - Degree-corrected entropy (**WARNING**: not fully implemented, will produce poor results with nonparametric mode)
+
+#### Distribution (Multi-node)
+
+- `--distribute <scheme>` - Distribution scheme for MPI:
+  - `none` (default): No distribution
+  - `none-edge-balanced`: Balanced edge distribution (recommended for MPI)
+  - `2hop-round-robin`, `2hop-size-balanced`, `2hop-snowball`: Alternative schemes (less tested)
+
+#### Top-Down SBP
+
+- `--split <type>` - Split strategy for top-down SBP:
+  - `connectivity-snowball` (default): Connectivity-aware graph partitioning
+  - `random`: Random vertex assignment
+  - `snowball`, `single-snowball`: Snowball expansion variants
+- `--splitinit <type>` - Split initialization:
+  - `random` (default): Random seed selection
+  - `degree-weighted`: Weighted by vertex degree
+  - `high-degree`: Select highest degree vertex
+
+#### Matrix Storage
+
+- `--no_transpose` - Disable transpose matrix storage (saves memory, slower column access)
+
+#### Vertex Processing
+
+- `--vertex-degree-sort` - Use vertex degree instead of edge degree product to identify high-influence vertices (may reduce accuracy)
+- `--ordered` - Process vertices in order (disabled by default: vertices processed in a random order)
+- `--detach` - Detach degree-1 vertices before community detection
+
+#### Input/Output
+
+- `-f, --filepath <path>` - Path to graph file (without extension)
+- `-j, --json <dir>` - Output directory for JSON results (default: `output`)
+- `--output_file <name>` - Output filename (default: auto-generated from PID and timestamp)
+- `--evaluate` - Evaluate and report accuracy metrics immediately (requires ground truth)
+- `--tag <string>` - Custom tag for identifying runs
+
+#### Graph Properties
+
+- `--undirected` - Treat graph as undirected
+- `--noduplicates` - Check for and remove duplicate edges
+- `--delimiter <char>` - Delimiter for TSV files (default: tab)
+
+### Output
+
+Results are saved as JSON files in the specified output directory. Each file contains:
+
+- Detected community assignments
+- Runtime statistics
+- Algorithm parameters
+- Entropy/DL values
+- Evaluation metrics (if ground truth provided and `--evaluate` used)
+
+### Notes
+
+1. Nonparametric entropy (default) improves accuracy. Do not use `--parametric` unless you have a specific reason.
+2. Set `--batches` to at least 2 (default) for parallel/distributed runs to improve accuracy.
+3. Edge degree product sorting (default) with `-m 0.1` (default) improves accuracy in parallel/distributed runs. `--vertex-degree-sort` and/or lower values of `m` can lead to poor accuracy.
+4. For distributed runs, use `--distribute none-edge-balanced`.
+5. For sampling, `--samplingalg expansion_snowball` with `--samplesize 0.5` provides good speed/accuracy tradeoff. For very sparse datasets, use `--samplingalg max_degree` instead.
+6. Hastings correction is disabled by default. Avoid using `--hastings-correction` and `--approximate` together when in nonparametric mode.
+7. For top-down SBP, the recommended `--split connectivity-snowball` (default) and `--splitinit random` (default) provide best accuracy.
+
+## Code Structure
+
+```
+IntegratedSBP/
+├── include/          # Header files
+│   ├── args.hpp      # Command-line argument parsing
+│   ├── sbp.hpp       # Main SBP algorithm
+│   ├── blockmodel/   # Blockmodel data structures
+│   ├── distributed/  # EDiSt distributed components
+│   └── ...
+├── src/              # Implementation files
+│   ├── main.cpp      # Entry point for SBP
+│   ├── TopDownSBPMain.cpp  # Entry point for top-down SBP
+│   ├── sbp.cpp       # Core SBP logic
+│   ├── finetune.cpp  # MCMC algorithms
+│   ├── block_merge.cpp     # Block merge phase
+│   ├── entropy.cpp   # Entropy calculations
+│   ├── distributed/  # Distributed memory components
+│   └── ...
+├── test/             # Test suite
+├── extern/           # External dependencies
+├── scripts/          # Utility scripts
+└── CMakeLists.txt    # Build configuration
+```
+
+## Citation
+
+If you use this code, please cite our relevant papers:
+
+**Integrated SBP (IEEE HPEC 2023):**
+```bibtex
+@INPROCEEDINGS{Wanye2023IntegratedSBP,
+  author={Wanye, Frank and Gleyzer, Vitaliy and Kao, Edward and Feng, Wu-chun},
+  booktitle={2023 IEEE High Performance Extreme Computing Conference (HPEC)}, 
+  title={An Integrated Approach for Accelerating Stochastic Block Partitioning}, 
+  year={2023},
+  volume={},
+  number={},
+  pages={1-7},
+  month={9},
+  address={Waltham, MA},
+  doi={10.1109/HPEC58863.2023.10363599}
+}
+```
+
+**Top-Down SBP (ACM HPDC 2024):**
+```bibtex
+@INPROCEEDINGS{Wanye2024TopDownSBP,
+  author={Wanye, Frank and Gleyzer, Vitaliy and Kao, Edward and Feng, Wu-chun},
+  booktitle={Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '24)},
+  title={Top-Down Stochastic Block Partitioning for Scalable Community Detection},
+  year={2024},
+  publisher={Association for Computing Machinery},
+  address={New York, NY, USA},
+  doi={10.1145/3731545.3731589}
+}
+```
+
+**SamBaS (Sampling):**
+```bibtex
+@INPROCEEDINGS{Gleyzer2022SamBaS,
+  author={Gleyzer, Vitaliy and Kao, Edward and Feng, Wu-chun},
+  booktitle={2022 IEEE High Performance Extreme Computing Conference (HPEC)},
+  title={SamBaS: Sampling-Based Stochastic Block Partitioning},
+  year={2022}
+}
+```
+
+**Hybrid SBP (Shared-memory parallelization):**
+```bibtex
+@INPROCEEDINGS{Kao2022HybridSBP,
+  author={Kao, Edward and Gleyzer, Vitaliy and Feng, Wu-chun},
+  booktitle={2022 International Conference on Parallel Processing (ICPP)},
+  title={Hybrid Stochastic Block Partitioning for Large-Scale Graphs},
+  year={2022}
+}
+```
+
+**EDiSt (Distributed-memory parallelization):**
+```bibtex
+@INPROCEEDINGS{Gleyzer2022EDiSt,
+  author={Gleyzer, Vitaliy and Kao, Edward and Feng, Wu-chun},
+  booktitle={2022 IEEE International Conference on Cluster Computing (CLUSTER)},
+  title={EDiSt: Edge-Balanced Distribution for Stochastic Block Partitioning},
+  year={2022}
+}
+```
+
+## Known Issues
+
+- The `--degreecorrected` option is not fully implemented and will produce incorrect results in nonparametric mode (default)
+- The `--approximate` option may reduce quality in nonparametric mode (default)
+- Some distribution schemes other than `none-edge-balanced` are not fully tested
+
+## License
+
+© Virginia Polytechnic Institute and State University, 2023.
+
+Licensed under the GNU Lesser General Public License v2.1. See the [LICENSE](LICENSE) file for details.
+
+## Acknowledgments
+
+This work is based on the Graph Challenge Stochastic Block Partitioning reference implementation. We thank the Graph Challenge organizers and the original SBP authors for their foundational work.
diff --git a/include/args.hpp b/include/args.hpp
index 03c6ce9..fea02b3 100644
--- a/include/args.hpp
+++ b/include/args.hpp
@@ -22,14 +22,14 @@ class Args {
     size_t cachesize;
     std::string csv;  // TODO: get rid of this - results now saved to json
     bool degreecorrected;
-    bool degreeproductsort;
+    bool vertex_degree_sort;
     std::string delimiter;
     bool detach;
     std::string distribute;
     std::string directory;
     bool evaluate;
     std::string filepath;
-    bool greedy;
+    bool hastings_correction;
     std::string json;
     float mh_percent;
     bool mix;
@@ -37,7 +37,7 @@ class Args {
     bool nodelta;  // TODO: if delta is much faster, get rid of this and associated methods.
     bool noduplicates;
     bool nonblocking;
-    bool nonparametric;
+    bool parametric;
     int numvertices;
     bool ordered;
     std::string output_file;
@@ -74,11 +74,11 @@ class Args {
                                                    std::numeric_limits<int>::max(), "[1, infinity]", parser);
             TCLAP::ValueArg<int> _batches("", "batches", "The number of batches to use for the asynchronous_gibbs "
                                           "algorithm. Too many batches will lead to many updates and little parallelism,"
-                                          " but too few will lead to poor results or more iterations", false, 1, "int",
+                                          " but too few will lead to poor results or more iterations", false, 2, "int",
                                           parser);
             TCLAP::ValueArg<std::string> _blocksizevar("b", "blocksizevar", "The variation between the sizes of "
                                                        "communities", false, "low", "low|high|unk", parser);
-            TCLAP::ValueArg<size_t> _cachesize("", "cachesize", "The size of the log cache", false, 100, ">= 1", parser);
+            TCLAP::ValueArg<size_t> _cachesize("", "cachesize", "The size of the log cache", false, 20000, ">= 1", parser);
             TCLAP::ValueArg<std::string> _csv("c", "csv",
                                               "The path to the csv file in which the results will be stored, "
                                               "without the suffix, e.g.:\n"
@@ -86,8 +86,8 @@ class Args {
                                               false, "./eval/test", "path", parser);
             TCLAP::SwitchArg _degreecorrected("", "degreecorrected", "If set, will compute the degree-corrected description length.",
                                               parser, false);
-            TCLAP::SwitchArg _degreeproductsort("", "degreeproductsort", "If set, will use edge degree products to split vertices "
-                                                "into high and low influence sets.", parser, false);
+            TCLAP::SwitchArg _vertex_degree_sort("", "vertex-degree-sort", "If set, will use vertex degree instead of edge degree product "
+                                                "to identify high-influence vertices (less accurate but faster).", parser, false);
             TCLAP::ValueArg<std::string> _delimiter("", "delimiter", "The delimiter used in the file storing the graph",
                                                     false, "\t", "string, usually `\\t` or `,`", parser);
             TCLAP::SwitchArg _detach("", "detach", "If set, will detach 1-degree vertices before running"
@@ -108,12 +108,12 @@ class Args {
                                        parser, false);
             TCLAP::ValueArg<std::string> _filepath("f", "filepath", "The filepath for the graph, minus the extension.",
                                                    true, "./data/default_graph", "path", parser);
-            TCLAP::SwitchArg _greedy("", "greedy", "If set, will *not* use a greedy approach; hastings correction will not be computed",
-                                     parser, true);
+            TCLAP::SwitchArg _hastings_correction("", "hastings-correction", "If set, will compute Hastings correction for more accurate MCMC (slower, only useful in parametric mode)",
+                                     parser, false);
             TCLAP::ValueArg<std::string> _json("j", "json", "The path to the directory containing json output",
                                                false, "output", "path", parser);
             TCLAP::ValueArg<float> _mh_percent("m", "mh_percent", "The percentage of vertices to process sequentially if alg==hybrid_mcmc",
-                                               false, 0.075, "float", parser);
+                                               false, 0.1, "float", parser);
             TCLAP::SwitchArg _mix("", "mix", "If set, will run block merges after golden ratio is found", parser, false);
             TCLAP::SwitchArg _modularity("", "modularity", "If set, will compute modularity at the end of execution.",
                                          parser, false);
@@ -123,7 +123,7 @@ class Args {
                                            "and ensure that they're not inserted twice. Otherwise, the graph needs to "
                                            "be manually checked to ensure that it's not a multigraph.", parser, false);
             TCLAP::SwitchArg _nonblocking("", "nonblocking", "If set, will use MPI nonblocking single-sided communication in the MCMC phase.", parser, false);
-            TCLAP::SwitchArg _nonparametric("", "nonparametric", "If set, will use the nonparametric blockmodel entropy computations.",
+            TCLAP::SwitchArg _parametric("", "parametric", "If set, will use the parametric blockmodel entropy computations instead of nonparametric (disables default behavior).",
                                             parser, false);
             TCLAP::ValueArg<int> _numvertices("n", "numvertices", "The number of vertices in the graph", false, 1000,
                                               "int", parser);
@@ -136,8 +136,8 @@ class Args {
                                                false, 1.0, "0 < x <= 1.0", parser);
             TCLAP::ValueArg<std::string> _samplingalg("", "samplingalg", "The sampling algorithm to use, if --samplesize < 1.0",
                                                       false, "random", "random|max_degree|expansion_snowball", parser);
-            TCLAP::ValueArg<std::string> _split("", "split", "The type of split to use in TopDownSBP", false, "random",
-                                                "random|snowball|single-snowball", parser);
+            TCLAP::ValueArg<std::string> _split("", "split", "The type of split to use in TopDownSBP", false, "connectivity-snowball",
+                                                "random|snowball|single-snowball|connectivity-snowball", parser);
             TCLAP::ValueArg<std::string> _splitinit("", "splitinit", "The type of split initialization to use", false, "random",
                                                     "random|degree-weighted|high-degree", parser);
             TCLAP::ValueArg<int> _subgraphs("", "subgraphs", "If running divide and conquer SBP, the number of subgraphs"
@@ -165,14 +165,14 @@ class Args {
             this->cachesize = _cachesize.getValue();
             this->csv = _csv.getValue();
             this->degreecorrected = _degreecorrected.getValue();
-            this->degreeproductsort = _degreeproductsort.getValue();
+            this->vertex_degree_sort = _vertex_degree_sort.getValue();
             this->delimiter = _delimiter.getValue();
             this->detach = _detach.getValue();
             this->distribute = _distribute.getValue();
             this->directory = _directory.getValue();
             this->evaluate = _evaluate.getValue();
             this->filepath = _filepath.getValue();
-            this->greedy = _greedy.getValue();
+            this->hastings_correction = _hastings_correction.getValue();
             this->json = _json.getValue();
             this->mh_percent = _mh_percent.getValue();
             this->mix = _mix.getValue();
@@ -180,7 +180,7 @@ class Args {
             this->nodelta = _nodelta.getValue();
             this->noduplicates = _noduplicates.getValue();
             this->nonblocking = _nonblocking.getValue();
-            this->nonparametric = _nonparametric.getValue();
+            this->parametric = _parametric.getValue();
             this->numvertices = _numvertices.getValue();
             this->ordered = _ordered.getValue();
             this->output_file = _output_file.getValue();
@@ -201,9 +201,9 @@ class Args {
             this->no_transpose = _notranspose.getValue();
             this->type = _type.getValue();
             this->undirected = _undirected.getValue();
-            if (this->nonparametric) {
-                std::cout << "NOTE: using nonparametric entropy, setting greedy to false" << std::endl;
-                this->greedy = false;
+            if (!this->parametric) {
+                std::cout << "NOTE: using nonparametric entropy, Hastings correction disabled (greedy mode)" << std::endl;
+                this->hastings_correction = false;
             }
         } catch (TCLAP::ArgException &exception) {
             std::cerr << "ERROR " << "ERROR: " << exception.error() << " for argument " << exception.argId() << std::endl;
diff --git a/include/utils.hpp b/include/utils.hpp
index 5b3ec48..37f7b5d 100644
--- a/include/utils.hpp
+++ b/include/utils.hpp
@@ -8,7 +8,7 @@
 #include <fstream>
 #include <iostream>
 #include <cmath>
-#include <execution>
+// #include <execution>  // TBB dependency removed
 #include <mpi.h>
 #include <numeric>
 #include <sstream>
@@ -166,7 +166,7 @@ template <typename T> inline std::vector<long> argsort(const std::vector<T> &uns
     // initialize original index locations
     std::vector<long> indices = utils::range<long>(0, unsorted.size());
     // sort indexes based on comparing values in unsorted
-    std::stable_sort(std::execution::par_unseq, indices.data(), indices.data() + indices.size(),
+    std::stable_sort(indices.data(), indices.data() + indices.size(),
                      [&unsorted](size_t i1, size_t i2) { return unsorted[i1] > unsorted[i2]; });
     return indices;
 }
diff --git a/src/block_merge.cpp b/src/block_merge.cpp
index 035fd07..1796bcb 100644
--- a/src/block_merge.cpp
+++ b/src/block_merge.cpp
@@ -1,287 +1,287 @@
-#include "block_merge.hpp"
-
-#include <cmath>
-
-#include "args.hpp"
-#include "entropy.hpp"
-#include "globals.hpp"
-#include "mpi_data.hpp"
-#include "utils.hpp"
-#include "typedefs.hpp"
-
-namespace block_merge {
-
-Delta blockmodel_delta(long current_block, long proposed_block, const Blockmodel &blockmodel) {
-    Delta delta(current_block, proposed_block, blockmodel.degrees(current_block));
-    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getrow_sparse(current_block)) {
-        long col = entry.first;  // row = current_block
-        long value = entry.second;
-        if (col == current_block || col == proposed_block) {  // entry = current_block, current_block
-            delta.add(proposed_block, proposed_block, value);
-        } else {
-            delta.add(proposed_block, col, value);
-        }
-        delta.sub(current_block, col, value);
-    }
-    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getcol_sparse(current_block)) {
-        long row = entry.first;  // col = current_block
-        if (row == current_block) continue;  // already handled above
-        long value = entry.second;
-        if (row == proposed_block) {  // entry = current_block, current_block
-            delta.add(proposed_block, proposed_block, value);
-        } else {
-            delta.add(row, proposed_block, value);
-        }
-        delta.sub(row, current_block, value);
-    }
-    return delta;
-}
-
-void carry_out_best_merges_advanced(Blockmodel &blockmodel, const std::vector<double> &delta_entropy_for_each_block,
-                                    const std::vector<long> &best_merge_for_each_block, const Graph &graph) {
-    // The following code is modeled after the `merge_sweep` function in
-    // https://git.skewed.de/count0/graph-tool/-/blob/master/src/graph/inference/loops/merge_loop.hh
-    typedef std::tuple<long, long, double> merge_t;
-    auto cmp_fxn = [](merge_t left, merge_t right) { return std::get<2>(left) > std::get<2>(right); };
-    std::priority_queue<merge_t, std::vector<merge_t>, decltype(cmp_fxn)> queue(cmp_fxn);
-    for (long i = 0; i < (long) delta_entropy_for_each_block.size(); ++i)
-        queue.push(std::make_tuple(i, best_merge_for_each_block[i], delta_entropy_for_each_block[i]));
-//     double delta_entropy = 0.0;
-    long num_merged = 0;
-    // Block map is here so that, if you move merge block A to block C, and then block B to block A, all three blocks
-    // end up with the same block assignment (C)
-    std::vector<long> block_map = utils::range<long>(0, blockmodel.num_blocks());
-    while (num_merged < blockmodel.getNum_blocks_to_merge() && !queue.empty()) {
-        merge_t merge = queue.top();
-        queue.pop();
-        long merge_from = std::get<0>(merge);
-        long newblock = std::get<1>(merge);
-        if (newblock < 0 || size_t(newblock) >= block_map.size())
-            std::cout << mpi.rank << " | block_map.size = " << block_map.size() << " nB = " << blockmodel.num_blocks() << " block = " << merge_from << " newblock = " << newblock << std::endl;
-        long merge_to = block_map[std::get<1>(merge)];
-//         double delta_entropy_hlong = std::get<2>(merge);
-        if (merge_from != merge_to) {
-            // Calculate the delta entropy given the current block assignment
-            EdgeWeights out_blocks = blockmodel.blockmatrix()->outgoing_edges(merge_from);
-            EdgeWeights in_blocks = blockmodel.blockmatrix()->incoming_edges(merge_from);
-            long k_out = std::accumulate(out_blocks.values.begin(), out_blocks.values.end(), 0);
-            long k_in = std::accumulate(in_blocks.values.begin(), in_blocks.values.end(), 0);
-            long k = k_out + k_in;
-            utils::ProposalAndEdgeCounts proposal{merge_to, k_out, k_in, k};
-            Delta delta = blockmodel_delta(merge_from, proposal.proposal, blockmodel);
-//            long proposed_block_self_edges = blockmodel.blockmatrix()->get(merge_to, merge_to)
-//                                            + delta.get(merge_to, merge_to);
-//            double delta_entropy_actual = entropy::block_merge_delta_mdl(merge_from, proposal, blockmodel, delta);
-            double delta_entropy_actual = args.nonparametric ?
-                                          entropy::nonparametric::block_merge_delta_mdl(blockmodel, proposal, graph, delta) :
-                                          entropy::block_merge_delta_mdl(merge_from, proposal, blockmodel, delta);
-            // If the actual change in entropy is more positive (greater) than anticipated, put it back in queue
-            if (!queue.empty() && delta_entropy_actual > std::get<2>(queue.top())) {
-                std::get<2>(merge) = delta_entropy_actual;
-                queue.push(merge);
-                continue;
-            }
-            // Perform the merge
-            // 1. Update the assignment
-            for (size_t i = 0; i < block_map.size(); ++i) {
-                long block = block_map[i];
-                if (block == merge_from) {
-                    block_map[i] = merge_to;
-                }
-            }
-            blockmodel.merge_block(merge_from, merge_to, delta, proposal);
-//            blockmodel.update_block_assignment(merge_from, merge_to);
-//             2. Update the matrix
-//            blockmodel.blockmatrix()->update_edge_counts(delta);
-//            blockmodel.degrees_out(new_block_degrees._block_degrees_out);
-//            blockmodel.degrees_in(new_block_degrees._block_degrees_in);
-//            blockmodel.degrees(new_block_degrees._block_degrees);
-//            blockmodel.degrees_out(merge_from, 0);
-//            blockmodel.degrees_out(merge_to, blockmodel.degrees_out(merge_to) + proposal.num_out_neighbor_edges);
-//            blockmodel.degrees_in(merge_from, 0);
-//            blockmodel.degrees_in(merge_to, blockmodel.degrees_in(merge_to) + proposal.num_in_neighbor_edges);
-//            blockmodel.degrees(merge_from, 0);
-//            blockmodel.degrees(merge_to, blockmodel.degrees_out(merge_to) + blockmodel.degrees_in(merge_to)
-//                               - proposed_block_self_edges);
-
-            num_merged++;
-        }
-    }
-    std::vector<long> mapping = Blockmodel::build_mapping(blockmodel.block_assignment());
-    for (long i = 0; i < (long) blockmodel.block_assignment().size(); ++i) {
-        long block = blockmodel.block_assignment(i);
-        long new_block_index = mapping[block];
-        blockmodel.set_block_assignment(i, new_block_index);
-    }
-    blockmodel.num_blocks(blockmodel.num_blocks() - blockmodel.getNum_blocks_to_merge());
-}
-
-EdgeCountUpdates edge_count_updates(std::shared_ptr<ISparseMatrix> blockmodel, long current_block, long proposed_block,
-                                    EdgeWeights &out_blocks, EdgeWeights &in_blocks) {
-    // TODO: these are copy constructors, can we safely get rid of them?
-    std::vector<long> proposal_row = blockmodel->getrow(proposed_block);
-    std::vector<long> proposal_col = blockmodel->getcol(proposed_block);
-    long count_self = blockmodel->get(current_block, current_block);
-    long count_in = count_self, count_out = count_self;
-    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
-        long index = in_blocks.indices[i];
-        long value = in_blocks.values[i];
-        if (index == proposed_block) {
-            count_in += value;
-        }
-        proposal_col[index] += value;
-    }
-    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
-        long index = out_blocks.indices[i];
-        long value = out_blocks.values[i];
-        if (index == proposed_block) {
-            count_out += value;
-        }
-        proposal_row[index] += value;
-    }
-    proposal_row[current_block] -= count_in;
-    proposal_row[proposed_block] += count_in;
-    proposal_col[current_block] -= count_out;
-    proposal_col[proposed_block] += count_out;
-    return EdgeCountUpdates{std::vector<long>(), proposal_row, std::vector<long>(), proposal_col};
-}
-
-void edge_count_updates_sparse(ISparseMatrix *blockmodel, long current_block, long proposed_block,
-                               EdgeWeights &out_blocks, EdgeWeights &in_blocks, SparseEdgeCountUpdates &updates) {
-    // TODO: these are copy constructors, can we safely get rid of them?
-    updates.proposal_row = blockmodel->getrow_sparse(proposed_block);
-    updates.proposal_col = blockmodel->getcol_sparse(proposed_block);
-    long count_self = blockmodel->get(current_block, current_block);
-    long count_in = count_self, count_out = count_self;
-    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
-        long index = in_blocks.indices[i];
-        long value = in_blocks.values[i];
-        if (index == proposed_block) {
-            count_in += value;
-        }
-        updates.proposal_col[index] += value;
-    }
-    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
-        long index = out_blocks.indices[i];
-        long value = out_blocks.values[i];
-        if (index == proposed_block) {
-            count_out += value;
-        }
-        updates.proposal_row[index] += value;
-    }
-    updates.proposal_row[current_block] -= count_in;
-    updates.proposal_row[proposed_block] += count_in;
-    updates.proposal_col[current_block] -= count_out;
-    updates.proposal_col[proposed_block] += count_out;
-}
-
-Blockmodel &merge_blocks(Blockmodel &blockmodel, const Graph &graph, long num_edges) {
-    assert(blockmodel.blockmatrix()->edges() == graph.num_edges());
-    // TODO: add block merge timings to evaluation
-    long num_blocks = blockmodel.num_blocks();
-    std::vector<long> best_merge_for_each_block = utils::constant<long>(num_blocks, -1);
-    std::vector<double> delta_entropy_for_each_block =
-            utils::constant<double>(num_blocks, std::numeric_limits<double>::max());
-    std::vector<long> block_assignment = utils::range<long>(0, num_blocks);
-    // TODO: keep track of already proposed merges, do not re-process those
-    long num_avoided = 0;  // number of avoided/skipped calculations
-    double start_t = MPI_Wtime();
-    #pragma omp parallel for schedule(dynamic) reduction( + : num_avoided) default(none) \
-    shared(num_blocks, num_edges, blockmodel, delta_entropy_for_each_block, best_merge_for_each_block, std::cout, graph)
-    for (long current_block = 0; current_block < num_blocks; ++current_block) {
-        std::unordered_map<long, bool> past_proposals;
-        for (long i = 0; i < NUM_AGG_PROPOSALS_PER_BLOCK; ++i) {
-            ProposalEvaluation proposal = propose_merge_sparse(current_block, blockmodel, graph, past_proposals);
-            if (proposal.delta_entropy == 0.0) {
-//                std::cout << current_block << " --> " << proposal.proposed_block << " == " << proposal.delta_entropy << std::endl;
-                int numvertices = 0;
-                for (int block : blockmodel.block_assignment()) {
-                    if (block == current_block) numvertices++;
-                }
-//                std::cout << "num vertices with block = " << current_block << " = " << numvertices << std::endl;
-//                std::cout << "current block neighbors: ";
-//                utils::print<long>(blockmodel.blockmatrix()->neighbors_weights(current_block));
-//                std::cout << "current block row: ";
-//                utils::print<long>(blockmodel.blockmatrix()->getrow_sparse(current_block));
-//                std::cout << "current block col: ";
-//                utils::print<long>(blockmodel.blockmatrix()->getcol_sparse(current_block));
-//                std::cout << "proposed block neighbors: ";
-//                utils::print<long>(blockmodel.blockmatrix()->neighbors_weights(proposal.proposed_block));
-            }
-            if (proposal.delta_entropy == std::numeric_limits<double>::max()) num_avoided++;
-            if (proposal.delta_entropy < delta_entropy_for_each_block[current_block]) {
-                best_merge_for_each_block[current_block] = proposal.proposed_block;
-                delta_entropy_for_each_block[current_block] = proposal.delta_entropy;
-            }
-        }
-    }
-    timers::BlockMerge_loop_time += MPI_Wtime() - start_t;
-    std::cout << "Avoided " << num_avoided << " / " << NUM_AGG_PROPOSALS_PER_BLOCK * num_blocks << " comparisons."
-              << std::endl;
-    if (args.approximate)
-        blockmodel.carry_out_best_merges(delta_entropy_for_each_block, best_merge_for_each_block);
-    else
-        carry_out_best_merges_advanced(blockmodel, delta_entropy_for_each_block, best_merge_for_each_block,
-                                       graph);
-    blockmodel.initialize_edge_counts(graph);
-    return blockmodel;
-}
-
-// TODO: get rid of block_assignment (block_assignment), just use blockmodel
-//ProposalEvaluation propose_merge(long current_block, long num_edges, Blockmodel &blockmodel,
-//                                 std::vector<long> &block_assignment) {
-//    EdgeWeights out_blocks = blockmodel.blockmatrix()->outgoing_edges(current_block);
-//    EdgeWeights in_blocks = blockmodel.blockmatrix()->incoming_edges(current_block);
-//    utils::ProposalAndEdgeCounts proposal =
-//            common::propose_new_block(current_block, out_blocks, in_blocks, block_assignment, blockmodel, true);
-//    EdgeCountUpdates updates =
-//            edge_count_updates(blockmodel.blockmatrix(), current_block, proposal.proposal, out_blocks, in_blocks);
-//    long current_block_self_edges = blockmodel.blockmatrix()->get(current_block, current_block)
-//                                   + updates.block_row[current_block];
-//    long proposed_block_self_edges = blockmodel.blockmatrix()->get(proposal.proposal, proposal.proposal)
-//                                    + updates.proposal_row[proposal.proposal];
-//    common::NewBlockDegrees new_block_degrees = common::compute_new_block_degrees(
-//            current_block, blockmodel, current_block_self_edges, proposed_block_self_edges, proposal);
-//    double delta_entropy =
-//            entropy::block_merge_delta_mdl(current_block, proposal.proposal, num_edges, blockmodel, updates,
-//                                  new_block_degrees);
-//    return ProposalEvaluation{proposal.proposal, delta_entropy};
-//}
-
-ProposalEvaluation propose_merge_sparse(long current_block, const Blockmodel &blockmodel,
-                                        const Graph &graph, std::unordered_map<long, bool> &past_proposals) {
-    EdgeWeights out_blocks = blockmodel.blockmatrix()->outgoing_edges(current_block);
-    EdgeWeights in_blocks = blockmodel.blockmatrix()->incoming_edges(current_block);
-    utils::ProposalAndEdgeCounts proposal =
-            common::propose_new_block(current_block, out_blocks, in_blocks, blockmodel.block_assignment(), blockmodel, true);
-    if (past_proposals[proposal.proposal])
-        return ProposalEvaluation{proposal.proposal, std::numeric_limits<double>::max()};
-    Delta delta = blockmodel_delta(current_block, proposal.proposal, blockmodel);
-    //==========NEW==============
-    double delta_entropy = args.nonparametric ?
-            entropy::nonparametric::block_merge_delta_mdl(blockmodel, proposal, graph, delta) :
-            entropy::block_merge_delta_mdl(current_block, proposal, blockmodel, delta);
-    //==========OLD==============
-//     SparseEdgeCountUpdates updates;
-//        // edge_count_updates_sparse(blockmodel.blockmatrix(), current_block, proposal.proposal, out_blocks, in_blocks,
-//        //                           updates);
-//        // Note: if it becomes an issue, figuring out degrees on the fly could save some RAM. The
-//        // only ones that change are the degrees for current_block and proposal anyway...
-//        long current_block_self_edges = blockmodel.blockmatrix()->get(current_block, current_block)
-//                                       + delta.get(current_block, current_block);
-//        long proposed_block_self_edges = blockmodel.blockmatrix()->get(proposal.proposal, proposal.proposal)
-//                                        + delta.get(proposal.proposal, proposal.proposal);
-//    common::NewBlockDegrees new_block_degrees = common::compute_new_block_degrees(
-//            current_block, blockmodel, current_block_self_edges, proposed_block_self_edges, proposal);
-////    common::NewBlockDegrees new_block_degrees = common::compute_new_block_degrees(
-////            current_block, blockmodel, blockmodel.blockmatrix()->get(current_block, current_block), proposal);
-//    double delta_entropy = entropy::block_merge_delta_mdl(current_block, blockmodel, delta, new_block_degrees);
-//    // double delta_entropy =
-//    //     entropy::block_merge_delta_mdl(current_block, proposal.proposal, num_edges, blockmodel, updates,
-//    //                                  new_block_degrees);
-    //=========OLD==============
-    past_proposals[proposal.proposal] = true;
-    return ProposalEvaluation{proposal.proposal, delta_entropy};
-}
-
+#include "block_merge.hpp"
+
+#include <cmath>
+
+#include "args.hpp"
+#include "entropy.hpp"
+#include "globals.hpp"
+#include "mpi_data.hpp"
+#include "utils.hpp"
+#include "typedefs.hpp"
+
+namespace block_merge {
+
+Delta blockmodel_delta(long current_block, long proposed_block, const Blockmodel &blockmodel) {
+    Delta delta(current_block, proposed_block, blockmodel.degrees(current_block));
+    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getrow_sparse(current_block)) {
+        long col = entry.first;  // row = current_block
+        long value = entry.second;
+        if (col == current_block || col == proposed_block) {  // entry = current_block, current_block
+            delta.add(proposed_block, proposed_block, value);
+        } else {
+            delta.add(proposed_block, col, value);
+        }
+        delta.sub(current_block, col, value);
+    }
+    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getcol_sparse(current_block)) {
+        long row = entry.first;  // col = current_block
+        if (row == current_block) continue;  // already handled above
+        long value = entry.second;
+        if (row == proposed_block) {  // entry = current_block, current_block
+            delta.add(proposed_block, proposed_block, value);
+        } else {
+            delta.add(row, proposed_block, value);
+        }
+        delta.sub(row, current_block, value);
+    }
+    return delta;
+}
+
+void carry_out_best_merges_advanced(Blockmodel &blockmodel, const std::vector<double> &delta_entropy_for_each_block,
+                                    const std::vector<long> &best_merge_for_each_block, const Graph &graph) {
+    // The following code is modeled after the `merge_sweep` function in
+    // https://git.skewed.de/count0/graph-tool/-/blob/master/src/graph/inference/loops/merge_loop.hh
+    typedef std::tuple<long, long, double> merge_t;
+    auto cmp_fxn = [](merge_t left, merge_t right) { return std::get<2>(left) > std::get<2>(right); };
+    std::priority_queue<merge_t, std::vector<merge_t>, decltype(cmp_fxn)> queue(cmp_fxn);
+    for (long i = 0; i < (long) delta_entropy_for_each_block.size(); ++i)
+        queue.push(std::make_tuple(i, best_merge_for_each_block[i], delta_entropy_for_each_block[i]));
+//     double delta_entropy = 0.0;
+    long num_merged = 0;
+    // Block map is here so that, if you move merge block A to block C, and then block B to block A, all three blocks
+    // end up with the same block assignment (C)
+    std::vector<long> block_map = utils::range<long>(0, blockmodel.num_blocks());
+    while (num_merged < blockmodel.getNum_blocks_to_merge() && !queue.empty()) {
+        merge_t merge = queue.top();
+        queue.pop();
+        long merge_from = std::get<0>(merge);
+        long newblock = std::get<1>(merge);
+        if (newblock < 0 || size_t(newblock) >= block_map.size())
+            std::cout << mpi.rank << " | block_map.size = " << block_map.size() << " nB = " << blockmodel.num_blocks() << " block = " << merge_from << " newblock = " << newblock << std::endl;
+        long merge_to = block_map[std::get<1>(merge)];
+//         double delta_entropy_hlong = std::get<2>(merge);
+        if (merge_from != merge_to) {
+            // Calculate the delta entropy given the current block assignment
+            EdgeWeights out_blocks = blockmodel.blockmatrix()->outgoing_edges(merge_from);
+            EdgeWeights in_blocks = blockmodel.blockmatrix()->incoming_edges(merge_from);
+            long k_out = std::accumulate(out_blocks.values.begin(), out_blocks.values.end(), 0);
+            long k_in = std::accumulate(in_blocks.values.begin(), in_blocks.values.end(), 0);
+            long k = k_out + k_in;
+            utils::ProposalAndEdgeCounts proposal{merge_to, k_out, k_in, k};
+            Delta delta = blockmodel_delta(merge_from, proposal.proposal, blockmodel);
+//            long proposed_block_self_edges = blockmodel.blockmatrix()->get(merge_to, merge_to)
+//                                            + delta.get(merge_to, merge_to);
+//            double delta_entropy_actual = entropy::block_merge_delta_mdl(merge_from, proposal, blockmodel, delta);
+            double delta_entropy_actual = !args.parametric ?
+                                          entropy::nonparametric::block_merge_delta_mdl(blockmodel, proposal, graph, delta) :
+                                          entropy::block_merge_delta_mdl(merge_from, proposal, blockmodel, delta);
+            // If the actual change in entropy is more positive (greater) than anticipated, put it back in queue
+            if (!queue.empty() && delta_entropy_actual > std::get<2>(queue.top())) {
+                std::get<2>(merge) = delta_entropy_actual;
+                queue.push(merge);
+                continue;
+            }
+            // Perform the merge
+            // 1. Update the assignment
+            for (size_t i = 0; i < block_map.size(); ++i) {
+                long block = block_map[i];
+                if (block == merge_from) {
+                    block_map[i] = merge_to;
+                }
+            }
+            blockmodel.merge_block(merge_from, merge_to, delta, proposal);
+//            blockmodel.update_block_assignment(merge_from, merge_to);
+//             2. Update the matrix
+//            blockmodel.blockmatrix()->update_edge_counts(delta);
+//            blockmodel.degrees_out(new_block_degrees._block_degrees_out);
+//            blockmodel.degrees_in(new_block_degrees._block_degrees_in);
+//            blockmodel.degrees(new_block_degrees._block_degrees);
+//            blockmodel.degrees_out(merge_from, 0);
+//            blockmodel.degrees_out(merge_to, blockmodel.degrees_out(merge_to) + proposal.num_out_neighbor_edges);
+//            blockmodel.degrees_in(merge_from, 0);
+//            blockmodel.degrees_in(merge_to, blockmodel.degrees_in(merge_to) + proposal.num_in_neighbor_edges);
+//            blockmodel.degrees(merge_from, 0);
+//            blockmodel.degrees(merge_to, blockmodel.degrees_out(merge_to) + blockmodel.degrees_in(merge_to)
+//                               - proposed_block_self_edges);
+
+            num_merged++;
+        }
+    }
+    std::vector<long> mapping = Blockmodel::build_mapping(blockmodel.block_assignment());
+    for (long i = 0; i < (long) blockmodel.block_assignment().size(); ++i) {
+        long block = blockmodel.block_assignment(i);
+        long new_block_index = mapping[block];
+        blockmodel.set_block_assignment(i, new_block_index);
+    }
+    blockmodel.num_blocks(blockmodel.num_blocks() - blockmodel.getNum_blocks_to_merge());
+}
+
+EdgeCountUpdates edge_count_updates(std::shared_ptr<ISparseMatrix> blockmodel, long current_block, long proposed_block,
+                                    EdgeWeights &out_blocks, EdgeWeights &in_blocks) {
+    // TODO: these are copy constructors, can we safely get rid of them?
+    std::vector<long> proposal_row = blockmodel->getrow(proposed_block);
+    std::vector<long> proposal_col = blockmodel->getcol(proposed_block);
+    long count_self = blockmodel->get(current_block, current_block);
+    long count_in = count_self, count_out = count_self;
+    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
+        long index = in_blocks.indices[i];
+        long value = in_blocks.values[i];
+        if (index == proposed_block) {
+            count_in += value;
+        }
+        proposal_col[index] += value;
+    }
+    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
+        long index = out_blocks.indices[i];
+        long value = out_blocks.values[i];
+        if (index == proposed_block) {
+            count_out += value;
+        }
+        proposal_row[index] += value;
+    }
+    proposal_row[current_block] -= count_in;
+    proposal_row[proposed_block] += count_in;
+    proposal_col[current_block] -= count_out;
+    proposal_col[proposed_block] += count_out;
+    return EdgeCountUpdates{std::vector<long>(), proposal_row, std::vector<long>(), proposal_col};
+}
+
+void edge_count_updates_sparse(ISparseMatrix *blockmodel, long current_block, long proposed_block,
+                               EdgeWeights &out_blocks, EdgeWeights &in_blocks, SparseEdgeCountUpdates &updates) {
+    // TODO: these are copy constructors, can we safely get rid of them?
+    updates.proposal_row = blockmodel->getrow_sparse(proposed_block);
+    updates.proposal_col = blockmodel->getcol_sparse(proposed_block);
+    long count_self = blockmodel->get(current_block, current_block);
+    long count_in = count_self, count_out = count_self;
+    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
+        long index = in_blocks.indices[i];
+        long value = in_blocks.values[i];
+        if (index == proposed_block) {
+            count_in += value;
+        }
+        updates.proposal_col[index] += value;
+    }
+    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
+        long index = out_blocks.indices[i];
+        long value = out_blocks.values[i];
+        if (index == proposed_block) {
+            count_out += value;
+        }
+        updates.proposal_row[index] += value;
+    }
+    updates.proposal_row[current_block] -= count_in;
+    updates.proposal_row[proposed_block] += count_in;
+    updates.proposal_col[current_block] -= count_out;
+    updates.proposal_col[proposed_block] += count_out;
+}
+
+Blockmodel &merge_blocks(Blockmodel &blockmodel, const Graph &graph, long num_edges) {
+    assert(blockmodel.blockmatrix()->edges() == graph.num_edges());
+    // TODO: add block merge timings to evaluation
+    long num_blocks = blockmodel.num_blocks();
+    std::vector<long> best_merge_for_each_block = utils::constant<long>(num_blocks, -1);
+    std::vector<double> delta_entropy_for_each_block =
+            utils::constant<double>(num_blocks, std::numeric_limits<double>::max());
+    std::vector<long> block_assignment = utils::range<long>(0, num_blocks);
+    // TODO: keep track of already proposed merges, do not re-process those
+    long num_avoided = 0;  // number of avoided/skipped calculations
+    double start_t = MPI_Wtime();
+    #pragma omp parallel for schedule(dynamic) reduction( + : num_avoided) default(none) \
+    shared(num_blocks, num_edges, blockmodel, delta_entropy_for_each_block, best_merge_for_each_block, std::cout, graph)
+    for (long current_block = 0; current_block < num_blocks; ++current_block) {
+        std::unordered_map<long, bool> past_proposals;
+        for (long i = 0; i < NUM_AGG_PROPOSALS_PER_BLOCK; ++i) {
+            ProposalEvaluation proposal = propose_merge_sparse(current_block, blockmodel, graph, past_proposals);
+            if (proposal.delta_entropy == 0.0) {
+//                std::cout << current_block << " --> " << proposal.proposed_block << " == " << proposal.delta_entropy << std::endl;
+                int numvertices = 0;
+                for (int block : blockmodel.block_assignment()) {
+                    if (block == current_block) numvertices++;
+                }
+//                std::cout << "num vertices with block = " << current_block << " = " << numvertices << std::endl;
+//                std::cout << "current block neighbors: ";
+//                utils::print<long>(blockmodel.blockmatrix()->neighbors_weights(current_block));
+//                std::cout << "current block row: ";
+//                utils::print<long>(blockmodel.blockmatrix()->getrow_sparse(current_block));
+//                std::cout << "current block col: ";
+//                utils::print<long>(blockmodel.blockmatrix()->getcol_sparse(current_block));
+//                std::cout << "proposed block neighbors: ";
+//                utils::print<long>(blockmodel.blockmatrix()->neighbors_weights(proposal.proposed_block));
+            }
+            if (proposal.delta_entropy == std::numeric_limits<double>::max()) num_avoided++;
+            if (proposal.delta_entropy < delta_entropy_for_each_block[current_block]) {
+                best_merge_for_each_block[current_block] = proposal.proposed_block;
+                delta_entropy_for_each_block[current_block] = proposal.delta_entropy;
+            }
+        }
+    }
+    timers::BlockMerge_loop_time += MPI_Wtime() - start_t;
+    std::cout << "Avoided " << num_avoided << " / " << NUM_AGG_PROPOSALS_PER_BLOCK * num_blocks << " comparisons."
+              << std::endl;
+    if (args.approximate)
+        blockmodel.carry_out_best_merges(delta_entropy_for_each_block, best_merge_for_each_block);
+    else
+        carry_out_best_merges_advanced(blockmodel, delta_entropy_for_each_block, best_merge_for_each_block,
+                                       graph);
+    blockmodel.initialize_edge_counts(graph);
+    return blockmodel;
+}
+
+// TODO: get rid of block_assignment (block_assignment), just use blockmodel
+//ProposalEvaluation propose_merge(long current_block, long num_edges, Blockmodel &blockmodel,
+//                                 std::vector<long> &block_assignment) {
+//    EdgeWeights out_blocks = blockmodel.blockmatrix()->outgoing_edges(current_block);
+//    EdgeWeights in_blocks = blockmodel.blockmatrix()->incoming_edges(current_block);
+//    utils::ProposalAndEdgeCounts proposal =
+//            common::propose_new_block(current_block, out_blocks, in_blocks, block_assignment, blockmodel, true);
+//    EdgeCountUpdates updates =
+//            edge_count_updates(blockmodel.blockmatrix(), current_block, proposal.proposal, out_blocks, in_blocks);
+//    long current_block_self_edges = blockmodel.blockmatrix()->get(current_block, current_block)
+//                                   + updates.block_row[current_block];
+//    long proposed_block_self_edges = blockmodel.blockmatrix()->get(proposal.proposal, proposal.proposal)
+//                                    + updates.proposal_row[proposal.proposal];
+//    common::NewBlockDegrees new_block_degrees = common::compute_new_block_degrees(
+//            current_block, blockmodel, current_block_self_edges, proposed_block_self_edges, proposal);
+//    double delta_entropy =
+//            entropy::block_merge_delta_mdl(current_block, proposal.proposal, num_edges, blockmodel, updates,
+//                                  new_block_degrees);
+//    return ProposalEvaluation{proposal.proposal, delta_entropy};
+//}
+
+ProposalEvaluation propose_merge_sparse(long current_block, const Blockmodel &blockmodel,
+                                        const Graph &graph, std::unordered_map<long, bool> &past_proposals) {
+    EdgeWeights out_blocks = blockmodel.blockmatrix()->outgoing_edges(current_block);
+    EdgeWeights in_blocks = blockmodel.blockmatrix()->incoming_edges(current_block);
+    utils::ProposalAndEdgeCounts proposal =
+            common::propose_new_block(current_block, out_blocks, in_blocks, blockmodel.block_assignment(), blockmodel, true);
+    if (past_proposals[proposal.proposal])
+        return ProposalEvaluation{proposal.proposal, std::numeric_limits<double>::max()};
+    Delta delta = blockmodel_delta(current_block, proposal.proposal, blockmodel);
+    //==========NEW==============
+    double delta_entropy = !args.parametric ?
+            entropy::nonparametric::block_merge_delta_mdl(blockmodel, proposal, graph, delta) :
+            entropy::block_merge_delta_mdl(current_block, proposal, blockmodel, delta);
+    //==========OLD==============
+//     SparseEdgeCountUpdates updates;
+//        // edge_count_updates_sparse(blockmodel.blockmatrix(), current_block, proposal.proposal, out_blocks, in_blocks,
+//        //                           updates);
+//        // Note: if it becomes an issue, figuring out degrees on the fly could save some RAM. The
+//        // only ones that change are the degrees for current_block and proposal anyway...
+//        long current_block_self_edges = blockmodel.blockmatrix()->get(current_block, current_block)
+//                                       + delta.get(current_block, current_block);
+//        long proposed_block_self_edges = blockmodel.blockmatrix()->get(proposal.proposal, proposal.proposal)
+//                                        + delta.get(proposal.proposal, proposal.proposal);
+//    common::NewBlockDegrees new_block_degrees = common::compute_new_block_degrees(
+//            current_block, blockmodel, current_block_self_edges, proposed_block_self_edges, proposal);
+////    common::NewBlockDegrees new_block_degrees = common::compute_new_block_degrees(
+////            current_block, blockmodel, blockmodel.blockmatrix()->get(current_block, current_block), proposal);
+//    double delta_entropy = entropy::block_merge_delta_mdl(current_block, blockmodel, delta, new_block_degrees);
+//    // double delta_entropy =
+//    //     entropy::block_merge_delta_mdl(current_block, proposal.proposal, num_edges, blockmodel, updates,
+//    //                                  new_block_degrees);
+    //=========OLD==============
+    past_proposals[proposal.proposal] = true;
+    return ProposalEvaluation{proposal.proposal, delta_entropy};
+}
+
 }  // namespace block_merge
\ No newline at end of file
diff --git a/src/distributed/dist_finetune.cpp b/src/distributed/dist_finetune.cpp
index 463fc6a..45ec421 100755
--- a/src/distributed/dist_finetune.cpp
+++ b/src/distributed/dist_finetune.cpp
@@ -1,471 +1,471 @@
-/**
- * The distributed finetuning phase of the stochastic block blockmodeling algorithm.
- */
-#ifndef SBP_DIST_FINETUNE_HPP
-#define SBP_DIST_FINETUNE_HPP
-
-#include "distributed/dist_finetune.hpp"
-
-#include <mpi.h>
-
-#include "distributed/dist_common.hpp"
-#include "entropy.hpp"
-#include "finetune.hpp"
-
-namespace finetune::dist {
-
-std::vector<double> MCMC_RUNTIMES;
-std::vector<unsigned long> MCMC_VERTEX_EDGES;
-std::vector<long> MCMC_NUM_BLOCKS;
-std::vector<unsigned long> MCMC_BLOCK_DEGREES;
-std::vector<unsigned long long> MCMC_AGGREGATE_BLOCK_DEGREES;
-
-const int MEMBERSHIP_T_BLOCK_LENGTHS[2] = {1, 1};
-const MPI_Aint MEMBERSHIP_T_DISPLACEMENTS[2] = {0, sizeof(long)};
-const MPI_Datatype MEMBERSHIP_T_TYPES[2] = {MPI_LONG, MPI_LONG};
-
-MPI_Datatype Membership_t;
-
-std::vector<Membership> mpi_get_assignment_updates(const std::vector<Membership> &membership_updates) {
-//    for (const auto &m : membership_updates) {
-//        if (m.vertex < 0 || m.vertex >= 50000) {
-//            std::cout << mpi.rank << " ERROR: invalid vertex " << m.vertex << std::endl;
-//            exit(-111);
-//        }
-//    }
-    int num_moves = static_cast<int>(membership_updates.size());
-    std::vector<int> rank_moves(mpi.num_processes);
-    utils::MPI(MPI_Allgather(&num_moves, 1, MPI_INT, rank_moves.data(), 1, MPI_INT, mpi.comm));
-
-    std::vector<int> offsets(mpi.num_processes);
-    offsets[0] = 0;
-    for (int i = 1; i < mpi.num_processes; ++i) {
-        offsets[i] = offsets[i - 1] + rank_moves[i - 1];
-    }
-
-    int total_moves = offsets[mpi.num_processes - 1] + rank_moves[mpi.num_processes - 1];
-    std::vector<Membership> collected_membership_updates(total_moves, {-1, -1});
-
-//    std::cout << mpi.rank << " | size of cmu: " << collected_membership_updates.size() << std::endl;
-//    for (int i = 0; i < mpi.num_processes; ++i) {
-//        std::cout << "(offset: " << offsets[i] << ", rank_moves: " << rank_moves[i] << "), ";
-//    }
-//    std::cout << std::endl;
-
-    utils::MPI(MPI_Allgatherv(
-            membership_updates.data(), num_moves, Membership_t,collected_membership_updates.data(),
-            rank_moves.data(), offsets.data(), Membership_t, mpi.comm));
-
-    return collected_membership_updates;
-}
-
-
-bool async_move(const Membership &membership, const Graph &graph, TwoHopBlockmodel &blockmodel) {
-    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), membership.vertex, false);
-    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), membership.vertex, true);
-    Vertex v = { membership.vertex,
-                 (long) graph.out_neighbors(membership.vertex).size(),
-                 (long) graph.in_neighbors(membership.vertex).size() };
-    VertexMove_v3 move {
-            0.0, true, v, membership.block, out_edges, in_edges
-    };
-    return blockmodel.move_vertex(move);
-}
-
-std::vector<Membership> asynchronous_gibbs_iteration(TwoHopBlockmodel &blockmodel, const Graph &graph,
-                                                     std::vector<long> *next_assignment, MPI_Win mcmc_window,
-                                                     const std::vector<long> &active_set, int batch) {
-    std::vector<Membership> membership_updates;
-    std::vector<long> vertices;
-    if (active_set.empty())
-        vertices = utils::range<long>(0, graph.num_vertices());
-    else
-        vertices = active_set;
-    auto batch_size = size_t(ceil(double(vertices.size()) / args.batches));
-    size_t start = batch * batch_size;
-    size_t end = std::min(vertices.size(), (batch + 1) * batch_size);
-    #pragma omp parallel for schedule(dynamic) default(none) \
-    shared(args, start, end, vertices, blockmodel, graph, membership_updates, mcmc_window, next_assignment)
-    for (size_t index = start; index < end; ++index) {
-        long vertex = vertices[index];
-        if (!blockmodel.owns_vertex(vertex)) continue;
-        VertexMove proposal = dist::propose_gibbs_move(blockmodel, vertex, graph);
-        if (proposal.did_move) {
-            remote_update_membership(vertex, proposal.proposed_block, membership_updates, next_assignment, mcmc_window);
-        }
-    }
-    return membership_updates;
-}
-
-bool early_stop(long iteration, bool golden_ratio_not_reached, double initial_entropy,
-                std::vector<double> &delta_entropies) {
-    size_t last_index = delta_entropies.size() - 1;
-    if (delta_entropies[last_index] == 0.0) {
-        return true;
-    }
-    if (iteration < 3) {
-        return false;
-    }
-    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
-    average /= -3.0;
-    double threshold;
-    if (golden_ratio_not_reached) { // Golden ratio bracket not yet established
-        threshold = 5e-4 * initial_entropy;
-    } else {
-        threshold = 1e-4 * initial_entropy;
-    }
-    return average < threshold;
-}
-
-Blockmodel &finetune_assignment(TwoHopBlockmodel &blockmodel, Graph &graph) {
-    return mcmc(graph, blockmodel, false);
-    MPI_Win mcmc_window;
-    std::vector<long> next_assignment;
-    if (args.nonblocking) {
-//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-        std::cout << mpi.rank << " | initializing the MPI single-sided comm window" << std::endl;
-        next_assignment = blockmodel.block_assignment();
-        auto win_size = long(next_assignment.size() * sizeof(long));
-        MPI_Win_create(next_assignment.data(), win_size, sizeof(long), MPI_INFO_NULL,
-                       mpi.comm, &mcmc_window);
-        assert(next_assignment.size() == (size_t) graph.num_vertices());
-    } else {
-        MPI_Type_create_struct(2, MEMBERSHIP_T_BLOCK_LENGTHS, MEMBERSHIP_T_DISPLACEMENTS, MEMBERSHIP_T_TYPES,
-                               &Membership_t);
-        MPI_Type_commit(&Membership_t);
-    }
-    if (mpi.rank == 0)
-        std::cout << "Fine-tuning partition results after sample results have been extended to full graph" << std::endl;
-    std::vector<double> delta_entropies;
-    long total_vertex_moves = 0;
-    double old_entropy = args.nonparametric ?
-                  entropy::nonparametric::mdl(blockmodel, graph) :
-                  entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
-    blockmodel.setOverall_entropy(old_entropy);
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-        double start_t = MPI_Wtime();
-        if (args.nonblocking) {
-//            if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-            next_assignment.assign(blockmodel.block_assignment().begin(), blockmodel.block_assignment().end());
-        }
-//        for (const long &b: block_assignment) {
-//            assert(b < blockmodel.num_blocks());
-//        }
-        std::vector<long> vertices = utils::range<long>(0, graph.num_vertices());
-        shuffle_active_set(vertices);
-        size_t vertex_moves = 0;
-        for (int batch = 0; batch < args.batches; ++batch) {
-            std::vector<Membership> membership_updates = metropolis_hastings_iteration(
-                    blockmodel, graph, &next_assignment, mcmc_window, vertices, batch);
-            vertex_moves += update_blockmodel(graph, blockmodel, membership_updates, &next_assignment, mcmc_window);
-        }
-        MCMC_RUNTIMES.push_back(MPI_Wtime() - start_t);
-        double new_entropy = args.nonparametric ?
-                             entropy::nonparametric::mdl(blockmodel, graph) :
-                             entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
-        double delta_entropy = new_entropy - old_entropy;
-        old_entropy = new_entropy;
-        delta_entropies.push_back(delta_entropy);
-        if (mpi.rank == 0) {
-            std::cout << "Itr: " << iteration << " vertex moves: " << vertex_moves << " delta S: "
-                      << delta_entropy / new_entropy << std::endl;
-        }
-        total_vertex_moves += vertex_moves;
-        timers::MCMC_iterations++;
-        // Early stopping
-        if (finetune::early_stop(iteration, blockmodel.getOverall_entropy(), delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    if (mpi.rank == 0) std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-    if (mpi.rank == 0) std::cout << blockmodel.getOverall_entropy() << std::endl;
-    if (args.nonblocking) {
-//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-        utils::MPI(MPI_Win_free(&mcmc_window));
-    } else {
-        MPI_Type_free(&Membership_t);
-    }
-    return blockmodel;
-}
-
-void measure_imbalance_metrics(const TwoHopBlockmodel &blockmodel, const Graph &graph) {
-    std::vector<long> degrees = graph.degrees();
-    MapVector<bool> block_count;
-    unsigned long num_degrees = 0;
-    unsigned long long num_aggregate_block_degrees = 0;
-    for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
-        if (!blockmodel.owns_vertex(vertex)) continue;
-        num_degrees += degrees[vertex];
-        long block = blockmodel.block_assignment(vertex);
-        block_count[block] = true;
-        num_aggregate_block_degrees += blockmodel.degrees(block);
-    }
-    MCMC_VERTEX_EDGES.push_back(num_degrees);
-    MCMC_NUM_BLOCKS.push_back(block_count.size());
-    unsigned long block_degrees = 0;
-    for (const std::pair<long, bool> &entry : block_count) {
-        long block = entry.first;
-        block_degrees += blockmodel.degrees(block);
-    }
-    MCMC_BLOCK_DEGREES.push_back(block_degrees);
-    MCMC_AGGREGATE_BLOCK_DEGREES.push_back(num_aggregate_block_degrees);
-}
-
-TwoHopBlockmodel &mcmc(Graph &graph, TwoHopBlockmodel &blockmodel, bool golden_ratio_not_reached) {
-    if (blockmodel.num_blocks() == 1) {
-        std::cout << mpi.rank << " | number of blocks is 1 for some reason..." << std::endl;
-        return blockmodel;
-    }
-    common::candidates = std::uniform_int_distribution<long>(0, blockmodel.num_blocks() - 2);
-    if (mpi.rank == 0) std::cout << "Starting MCMC vertex moves using " << args.algorithm << std::endl;
-    // MPI Datatype init
-    MPI_Win mcmc_window;
-    std::vector<long> next_assignment;
-    if (args.nonblocking) {
-//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-//        std::cout << mpi.rank << " | initializing the MPI single-sided comm window" << std::endl;
-        next_assignment = blockmodel.block_assignment();
-        auto win_size = long(next_assignment.size() * sizeof(long));
-        MPI_Win_create(next_assignment.data(), win_size, sizeof(long), MPI_INFO_NULL,
-                       mpi.comm, &mcmc_window);
-        assert(next_assignment.size() == (size_t) graph.num_vertices());
-        utils::MPI(MPI_Win_fence(0, mcmc_window));
-    } else {
-        MPI_Type_create_struct(2, MEMBERSHIP_T_BLOCK_LENGTHS, MEMBERSHIP_T_DISPLACEMENTS, MEMBERSHIP_T_TYPES,
-                               &Membership_t);
-        MPI_Type_commit(&Membership_t);
-    }
-    std::vector<double> delta_entropies;
-//    size_t total_vertex_moves = 0;
-    double old_entropy = args.nonparametric ?
-                         entropy::nonparametric::mdl(blockmodel, graph) :
-                         entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
-    blockmodel.setOverall_entropy(old_entropy);
-    double new_entropy = 0;
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-        measure_imbalance_metrics(blockmodel, graph);
-        double start_t = MPI_Wtime();
-        // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
-        // asynchronous Gibbs sampling
-        if (args.nonblocking) {
-//            if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-            next_assignment.assign(blockmodel.block_assignment().begin(), blockmodel.block_assignment().end());
-        }
-        size_t vertex_moves = 0;
-        if (args.algorithm == "hybrid_mcmc") {
-            std::vector<long> active_set = graph.high_degree_vertices();
-            shuffle_active_set(active_set);
-//            std::shuffle(active_set.begin(), active_set.end(), rng::generator());
-            if (mpi.rank == 0) std::cout << mpi.rank << " | starting MH portion of hybrid MCMC =========" << std::endl;
-            std::vector<Membership> membership_updates = metropolis_hastings_iteration(blockmodel, graph, &next_assignment, mcmc_window, graph.high_degree_vertices(), -1);
-            vertex_moves = update_blockmodel(graph, blockmodel, membership_updates, &next_assignment, mcmc_window);
-            if (mpi.rank == 0) std::cout << mpi.rank << " | starting AG portion of hybrid MCMC =========" << std::endl;
-            active_set = graph.low_degree_vertices();
-            shuffle_active_set(active_set);
-//            std::shuffle(active_set.begin(), active_set.end(), rng::generator());
-            for (int batch = 0; batch < args.batches; ++batch) {
-                if (mpi.rank == 0) std::cout << mpi.rank << " | starting AG batch " << batch << " portion of hybrid MCMC =========" << std::endl;
-                std::vector<Membership> async_updates = asynchronous_gibbs_iteration(blockmodel, graph, &next_assignment, mcmc_window, active_set, batch);
-                vertex_moves += update_blockmodel(graph, blockmodel, async_updates, &next_assignment, mcmc_window);
-            }
-        } else {
-            std::vector<long> active_set = utils::range<long>(0, graph.num_vertices());
-            shuffle_active_set(active_set);
-//            std::shuffle(active_set.begin(), active_set.end(), rng::generator());
-            for (int batch = 0; batch < args.batches; ++batch) {
-//                if (mpi.rank == 0) std::cout << "processing batch = " << batch << std::endl;
-                std::vector<Membership> membership_updates;
-                if (args.algorithm == "async_gibbs")
-                    membership_updates = asynchronous_gibbs_iteration(blockmodel, graph, &next_assignment, mcmc_window,
-                                                                      active_set, batch);
-                else
-                    membership_updates = metropolis_hastings_iteration(blockmodel, graph, &next_assignment, mcmc_window, active_set, batch);
-                vertex_moves += update_blockmodel(graph, blockmodel, membership_updates, &next_assignment, mcmc_window);
-            }
-        }
-        MCMC_RUNTIMES.push_back(MPI_Wtime() - start_t);
-        new_entropy = args.nonparametric ?
-                      entropy::nonparametric::mdl(blockmodel, graph) :
-                      entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
-        double delta_entropy = new_entropy - old_entropy;
-        old_entropy = new_entropy;
-        delta_entropies.push_back(delta_entropy);
-        if (mpi.rank == 0) {
-            std::cout << "Itr: " << iteration << " vertex moves: " << vertex_moves << " delta S: "
-                      << delta_entropy / new_entropy << std::endl;
-        }
-//        total_vertex_moves += vertex_moves;
-        timers::MCMC_iterations++;
-        // Early stopping
-        if (early_stop(iteration, golden_ratio_not_reached, new_entropy, delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(new_entropy);
-    if (args.nonblocking) {
-//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-        utils::MPI(MPI_Win_free(&mcmc_window));
-    } else {
-        MPI_Type_free(&Membership_t);
-    }
-    return blockmodel;
-}
-
-std::vector<Membership> metropolis_hastings_iteration(TwoHopBlockmodel &blockmodel, Graph &graph,
-                                                      std::vector<long> *next_assignment, MPI_Win mcmc_window,
-                                                      const std::vector<long> &active_set, int batch) {
-    std::vector<Membership> membership_updates;
-    std::vector<long> vertices;
-    if (active_set.empty())
-        vertices = utils::range<long>(0, graph.num_vertices());
-    else
-        vertices = active_set;
-    long batch_size = long(ceil(double(vertices.size()) / args.batches));
-    long start = batch * batch_size;
-    size_t end = std::min(long(vertices.size()), (batch + 1) * batch_size);
-    // for hybrid_mcmc, we want to go through entire active_set in one go, regardless of number of batches
-    if (batch == -1) {
-        start = 0;
-        end = vertices.size();
-    }
-    for (size_t index = start; index < end; ++index) {  // long vertex : vertices) {
-        long vertex = vertices[index];
-        if (!blockmodel.owns_vertex(vertex)) continue;
-        VertexMove proposal = dist::propose_mh_move(blockmodel, vertex, graph);
-        if (proposal.did_move) {
-//            assert(proposal.proposed_block < blockmodel.num_blocks());
-            assert(blockmodel.stores(proposal.proposed_block));
-            remote_update_membership(vertex, proposal.proposed_block, membership_updates, next_assignment, mcmc_window);
-//            membership_updates.push_back(Membership{vertex, proposal.proposed_block});
-        }
-    }
-    return membership_updates;
-}
-
-VertexMove propose_gibbs_move(const TwoHopBlockmodel &blockmodel, long vertex, const Graph &graph) {
-    bool did_move = false;
-    long current_block = blockmodel.block_assignment(vertex);
-//    if (blockmodel.block_size(current_block) <= args.threads) {
-//        return VertexMove{0.0, did_move, -1, -1 };
-//    }
-
-    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
-    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
-
-    utils::ProposalAndEdgeCounts proposal = common::dist::propose_new_block(
-            current_block, out_edges, in_edges, blockmodel.block_assignment(), blockmodel, false);
-    if (!blockmodel.stores(proposal.proposal)) {
-        std::cerr << "ERROR " << "blockmodel doesn't own proposed block!!!!!" << std::endl;
-        exit(-1000000000);
-    }
-    if (proposal.proposal == current_block) {
-        return VertexMove{0.0, did_move, -1, -1 };
-    }
-
-    return eval_vertex_move(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
-}
-
-VertexMove propose_mh_move(TwoHopBlockmodel &blockmodel, long vertex, const Graph &graph) {
-    bool did_move = false;
-    long current_block = blockmodel.block_assignment(vertex);  // getBlock_assignment()[vertex];
-    if (blockmodel.block_size(current_block) == 1) {
-        return VertexMove{0.0, did_move, -1, -1 };
-    }
-    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
-    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
-
-    utils::ProposalAndEdgeCounts proposal = common::dist::propose_new_block(
-            current_block, out_edges, in_edges, blockmodel.block_assignment(), blockmodel, false);
-    if (!blockmodel.stores(proposal.proposal)) {
-        std::cerr << "ERROR " << "blockmodel doesn't own proposed block!!!!!" << std::endl;
-        exit(-1000000000);
-    }
-    if (proposal.proposal == current_block) {
-        return VertexMove{0.0, did_move, -1, -1 };
-    }
-
-    return move_vertex(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
-}
-
-void remote_update_membership(long vertex, long new_block, std::vector<Membership> &membership_updates,
-                              std::vector<long> *next_assignment, MPI_Win mcmc_window) {
-//    std::cout << mpi.rank << " | attempting to move " << vertex << " to " << new_block << std::endl;
-    if (args.nonblocking) {
-//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-        (*next_assignment)[vertex] = new_block;
-        for (int rank = 0; rank < mpi.num_processes; ++rank) {
-            if (rank == mpi.rank) continue;
-//            std::cout << mpi.rank << " | attempting to update index " << vertex << " to " << (*next_assignment)[vertex] << " on rank " << rank << std::endl;
-//            std::cout << "origin_addr = " << &new_block << std::endl;
-//            assert(mcmc_window != nullptr);
-//            std::cout << "window addr = " << &mcmc_window << std::endl;
-//            assert(next_assignment != nullptr);
-//            assert(next_assignment->size() > vertex);
-//            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, mcmc_window);
-            utils::MPI(MPI_Put(&new_block, 1, MPI_LONG, rank, vertex, 1, MPI_LONG, mcmc_window));
-//            MPI_Win_unlock(rank, mcmc_window);
-        }
-        return;
-    }
-    #pragma omp critical (updates)
-    {
-        membership_updates.push_back(Membership{vertex, new_block});
-    }
-}
-
-void shuffle_active_set(std::vector<long> &active_set) {
-    if (!args.ordered) {
-        unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
-        std::shuffle(active_set.begin(), active_set.end(), std::mt19937_64(seed));
-    }
-}
-
-size_t update_blockmodel(const Graph &graph, TwoHopBlockmodel &blockmodel,
-                         const std::vector<Membership> &membership_updates,
-                         std::vector<long> *next_assignment, MPI_Win mcmc_window) {
-    if (args.nonblocking) {
-//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
-//        std::cout << mpi.rank << " | updating blockmodel!" << std::endl;
-//        std::cout << mpi.rank << " | next_assignment_addr = " << next_assignment << std::endl;
-//        assert(next_assignment != nullptr);
-//        assert(next_assignment->size() == graph.num_vertices());
-//        assert(mcmc_window != nullptr);
-//        std::cout << mpi.rank << " | window_addr = " << mcmc_window << std::endl;
-        size_t vertex_moves = 0;
-//        std::cout << mpi.rank << " | arrived at win_fence" << std::endl;
-        utils::MPI(MPI_Win_fence(0, mcmc_window));
-//        std::cout << mpi.rank << " | got through win_fence" << std::endl;
-        for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
-            if ((*next_assignment)[vertex] == blockmodel.block_assignment(vertex)) continue;
-//            if ((*next_assignment)[vertex] >= blockmodel.num_blocks()) {
-//                std::cout << mpi.rank << " | updated assignment for " << vertex << " is " << (*next_assignment)[vertex]
-//                          << " when old assignment is " << blockmodel.block_assignment(vertex) << std::endl;
-//                assert((*next_assignment)[vertex] < blockmodel.num_blocks());
-//            }
-            if (async_move({vertex, (*next_assignment)[vertex]}, graph, blockmodel)) {
-                vertex_moves++;
-            }
-        }
-        return vertex_moves;
-    }
-    std::vector<Membership> collected_membership_updates = mpi_get_assignment_updates(membership_updates);
-    size_t vertex_moves = 0;
-    for (const Membership &membership: collected_membership_updates) {
-        if (membership.vertex < 0 || membership.vertex >= graph.num_vertices()) {
-            std::cout << mpi.rank << " | ERROR moving " << membership.vertex << " to block " << membership.block << std::endl;
-            exit(-114);
-        }
-        if (membership.block == blockmodel.block_assignment(membership.vertex)) continue;
-        if (async_move(membership, graph, blockmodel)) {
-            vertex_moves++;
-        }
-    }
-//    size_t vertex_moves = collected_membership_updates.size();
-    return vertex_moves;
-}
-
-}
-// namespace finetune::dist
-
-#endif // SBP_DIST_FINETUNE_HPP
+/**
+ * The distributed finetuning phase of the stochastic block blockmodeling algorithm.
+ */
+#ifndef SBP_DIST_FINETUNE_HPP
+#define SBP_DIST_FINETUNE_HPP
+
+#include "distributed/dist_finetune.hpp"
+
+#include <mpi.h>
+
+#include "distributed/dist_common.hpp"
+#include "entropy.hpp"
+#include "finetune.hpp"
+
+namespace finetune::dist {
+
+std::vector<double> MCMC_RUNTIMES;
+std::vector<unsigned long> MCMC_VERTEX_EDGES;
+std::vector<long> MCMC_NUM_BLOCKS;
+std::vector<unsigned long> MCMC_BLOCK_DEGREES;
+std::vector<unsigned long long> MCMC_AGGREGATE_BLOCK_DEGREES;
+
+const int MEMBERSHIP_T_BLOCK_LENGTHS[2] = {1, 1};
+const MPI_Aint MEMBERSHIP_T_DISPLACEMENTS[2] = {0, sizeof(long)};
+const MPI_Datatype MEMBERSHIP_T_TYPES[2] = {MPI_LONG, MPI_LONG};
+
+MPI_Datatype Membership_t;
+
+std::vector<Membership> mpi_get_assignment_updates(const std::vector<Membership> &membership_updates) {
+//    for (const auto &m : membership_updates) {
+//        if (m.vertex < 0 || m.vertex >= 50000) {
+//            std::cout << mpi.rank << " ERROR: invalid vertex " << m.vertex << std::endl;
+//            exit(-111);
+//        }
+//    }
+    int num_moves = static_cast<int>(membership_updates.size());
+    std::vector<int> rank_moves(mpi.num_processes);
+    utils::MPI(MPI_Allgather(&num_moves, 1, MPI_INT, rank_moves.data(), 1, MPI_INT, mpi.comm));
+
+    std::vector<int> offsets(mpi.num_processes);
+    offsets[0] = 0;
+    for (int i = 1; i < mpi.num_processes; ++i) {
+        offsets[i] = offsets[i - 1] + rank_moves[i - 1];
+    }
+
+    int total_moves = offsets[mpi.num_processes - 1] + rank_moves[mpi.num_processes - 1];
+    std::vector<Membership> collected_membership_updates(total_moves, {-1, -1});
+
+//    std::cout << mpi.rank << " | size of cmu: " << collected_membership_updates.size() << std::endl;
+//    for (int i = 0; i < mpi.num_processes; ++i) {
+//        std::cout << "(offset: " << offsets[i] << ", rank_moves: " << rank_moves[i] << "), ";
+//    }
+//    std::cout << std::endl;
+
+    utils::MPI(MPI_Allgatherv(
+            membership_updates.data(), num_moves, Membership_t,collected_membership_updates.data(),
+            rank_moves.data(), offsets.data(), Membership_t, mpi.comm));
+
+    return collected_membership_updates;
+}
+
+
+bool async_move(const Membership &membership, const Graph &graph, TwoHopBlockmodel &blockmodel) {
+    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), membership.vertex, false);
+    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), membership.vertex, true);
+    Vertex v = { membership.vertex,
+                 (long) graph.out_neighbors(membership.vertex).size(),
+                 (long) graph.in_neighbors(membership.vertex).size() };
+    VertexMove_v3 move {
+            0.0, true, v, membership.block, out_edges, in_edges
+    };
+    return blockmodel.move_vertex(move);
+}
+
+std::vector<Membership> asynchronous_gibbs_iteration(TwoHopBlockmodel &blockmodel, const Graph &graph,
+                                                     std::vector<long> *next_assignment, MPI_Win mcmc_window,
+                                                     const std::vector<long> &active_set, int batch) {
+    std::vector<Membership> membership_updates;
+    std::vector<long> vertices;
+    if (active_set.empty())
+        vertices = utils::range<long>(0, graph.num_vertices());
+    else
+        vertices = active_set;
+    auto batch_size = size_t(ceil(double(vertices.size()) / args.batches));
+    size_t start = batch * batch_size;
+    size_t end = std::min(vertices.size(), (batch + 1) * batch_size);
+    #pragma omp parallel for schedule(dynamic) default(none) \
+    shared(args, start, end, vertices, blockmodel, graph, membership_updates, mcmc_window, next_assignment)
+    for (size_t index = start; index < end; ++index) {
+        long vertex = vertices[index];
+        if (!blockmodel.owns_vertex(vertex)) continue;
+        VertexMove proposal = dist::propose_gibbs_move(blockmodel, vertex, graph);
+        if (proposal.did_move) {
+            remote_update_membership(vertex, proposal.proposed_block, membership_updates, next_assignment, mcmc_window);
+        }
+    }
+    return membership_updates;
+}
+
+bool early_stop(long iteration, bool golden_ratio_not_reached, double initial_entropy,
+                std::vector<double> &delta_entropies) {
+    size_t last_index = delta_entropies.size() - 1;
+    if (delta_entropies[last_index] == 0.0) {
+        return true;
+    }
+    if (iteration < 3) {
+        return false;
+    }
+    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
+    average /= -3.0;
+    double threshold;
+    if (golden_ratio_not_reached) { // Golden ratio bracket not yet established
+        threshold = 5e-4 * initial_entropy;
+    } else {
+        threshold = 1e-4 * initial_entropy;
+    }
+    return average < threshold;
+}
+
+Blockmodel &finetune_assignment(TwoHopBlockmodel &blockmodel, Graph &graph) {
+    return mcmc(graph, blockmodel, false);
+    MPI_Win mcmc_window;
+    std::vector<long> next_assignment;
+    if (args.nonblocking) {
+//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+        std::cout << mpi.rank << " | initializing the MPI single-sided comm window" << std::endl;
+        next_assignment = blockmodel.block_assignment();
+        auto win_size = long(next_assignment.size() * sizeof(long));
+        MPI_Win_create(next_assignment.data(), win_size, sizeof(long), MPI_INFO_NULL,
+                       mpi.comm, &mcmc_window);
+        assert(next_assignment.size() == (size_t) graph.num_vertices());
+    } else {
+        MPI_Type_create_struct(2, MEMBERSHIP_T_BLOCK_LENGTHS, MEMBERSHIP_T_DISPLACEMENTS, MEMBERSHIP_T_TYPES,
+                               &Membership_t);
+        MPI_Type_commit(&Membership_t);
+    }
+    if (mpi.rank == 0)
+        std::cout << "Fine-tuning partition results after sample results have been extended to full graph" << std::endl;
+    std::vector<double> delta_entropies;
+    long total_vertex_moves = 0;
+    double old_entropy = !args.parametric ?
+                  entropy::nonparametric::mdl(blockmodel, graph) :
+                  entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
+    blockmodel.setOverall_entropy(old_entropy);
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+        double start_t = MPI_Wtime();
+        if (args.nonblocking) {
+//            if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+            next_assignment.assign(blockmodel.block_assignment().begin(), blockmodel.block_assignment().end());
+        }
+//        for (const long &b: block_assignment) {
+//            assert(b < blockmodel.num_blocks());
+//        }
+        std::vector<long> vertices = utils::range<long>(0, graph.num_vertices());
+        shuffle_active_set(vertices);
+        size_t vertex_moves = 0;
+        for (int batch = 0; batch < args.batches; ++batch) {
+            std::vector<Membership> membership_updates = metropolis_hastings_iteration(
+                    blockmodel, graph, &next_assignment, mcmc_window, vertices, batch);
+            vertex_moves += update_blockmodel(graph, blockmodel, membership_updates, &next_assignment, mcmc_window);
+        }
+        MCMC_RUNTIMES.push_back(MPI_Wtime() - start_t);
+        double new_entropy = !args.parametric ?
+                             entropy::nonparametric::mdl(blockmodel, graph) :
+                             entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
+        double delta_entropy = new_entropy - old_entropy;
+        old_entropy = new_entropy;
+        delta_entropies.push_back(delta_entropy);
+        if (mpi.rank == 0) {
+            std::cout << "Itr: " << iteration << " vertex moves: " << vertex_moves << " delta S: "
+                      << delta_entropy / new_entropy << std::endl;
+        }
+        total_vertex_moves += vertex_moves;
+        timers::MCMC_iterations++;
+        // Early stopping
+        if (finetune::early_stop(iteration, blockmodel.getOverall_entropy(), delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    if (mpi.rank == 0) std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+    if (mpi.rank == 0) std::cout << blockmodel.getOverall_entropy() << std::endl;
+    if (args.nonblocking) {
+//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+        utils::MPI(MPI_Win_free(&mcmc_window));
+    } else {
+        MPI_Type_free(&Membership_t);
+    }
+    return blockmodel;
+}
+
+void measure_imbalance_metrics(const TwoHopBlockmodel &blockmodel, const Graph &graph) {
+    std::vector<long> degrees = graph.degrees();
+    MapVector<bool> block_count;
+    unsigned long num_degrees = 0;
+    unsigned long long num_aggregate_block_degrees = 0;
+    for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
+        if (!blockmodel.owns_vertex(vertex)) continue;
+        num_degrees += degrees[vertex];
+        long block = blockmodel.block_assignment(vertex);
+        block_count[block] = true;
+        num_aggregate_block_degrees += blockmodel.degrees(block);
+    }
+    MCMC_VERTEX_EDGES.push_back(num_degrees);
+    MCMC_NUM_BLOCKS.push_back(block_count.size());
+    unsigned long block_degrees = 0;
+    for (const std::pair<long, bool> &entry : block_count) {
+        long block = entry.first;
+        block_degrees += blockmodel.degrees(block);
+    }
+    MCMC_BLOCK_DEGREES.push_back(block_degrees);
+    MCMC_AGGREGATE_BLOCK_DEGREES.push_back(num_aggregate_block_degrees);
+}
+
+TwoHopBlockmodel &mcmc(Graph &graph, TwoHopBlockmodel &blockmodel, bool golden_ratio_not_reached) {
+    if (blockmodel.num_blocks() == 1) {
+        std::cout << mpi.rank << " | number of blocks is 1 for some reason..." << std::endl;
+        return blockmodel;
+    }
+    common::candidates = std::uniform_int_distribution<long>(0, blockmodel.num_blocks() - 2);
+    if (mpi.rank == 0) std::cout << "Starting MCMC vertex moves using " << args.algorithm << std::endl;
+    // MPI Datatype init
+    MPI_Win mcmc_window;
+    std::vector<long> next_assignment;
+    if (args.nonblocking) {
+//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+//        std::cout << mpi.rank << " | initializing the MPI single-sided comm window" << std::endl;
+        next_assignment = blockmodel.block_assignment();
+        auto win_size = long(next_assignment.size() * sizeof(long));
+        MPI_Win_create(next_assignment.data(), win_size, sizeof(long), MPI_INFO_NULL,
+                       mpi.comm, &mcmc_window);
+        assert(next_assignment.size() == (size_t) graph.num_vertices());
+        utils::MPI(MPI_Win_fence(0, mcmc_window));
+    } else {
+        MPI_Type_create_struct(2, MEMBERSHIP_T_BLOCK_LENGTHS, MEMBERSHIP_T_DISPLACEMENTS, MEMBERSHIP_T_TYPES,
+                               &Membership_t);
+        MPI_Type_commit(&Membership_t);
+    }
+    std::vector<double> delta_entropies;
+//    size_t total_vertex_moves = 0;
+    double old_entropy = !args.parametric ?
+                         entropy::nonparametric::mdl(blockmodel, graph) :
+                         entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
+    blockmodel.setOverall_entropy(old_entropy);
+    double new_entropy = 0;
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+        measure_imbalance_metrics(blockmodel, graph);
+        double start_t = MPI_Wtime();
+        // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
+        // asynchronous Gibbs sampling
+        if (args.nonblocking) {
+//            if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+            next_assignment.assign(blockmodel.block_assignment().begin(), blockmodel.block_assignment().end());
+        }
+        size_t vertex_moves = 0;
+        if (args.algorithm == "hybrid_mcmc") {
+            std::vector<long> active_set = graph.high_degree_vertices();
+            shuffle_active_set(active_set);
+//            std::shuffle(active_set.begin(), active_set.end(), rng::generator());
+            if (mpi.rank == 0) std::cout << mpi.rank << " | starting MH portion of hybrid MCMC =========" << std::endl;
+            std::vector<Membership> membership_updates = metropolis_hastings_iteration(blockmodel, graph, &next_assignment, mcmc_window, graph.high_degree_vertices(), -1);
+            vertex_moves = update_blockmodel(graph, blockmodel, membership_updates, &next_assignment, mcmc_window);
+            if (mpi.rank == 0) std::cout << mpi.rank << " | starting AG portion of hybrid MCMC =========" << std::endl;
+            active_set = graph.low_degree_vertices();
+            shuffle_active_set(active_set);
+//            std::shuffle(active_set.begin(), active_set.end(), rng::generator());
+            for (int batch = 0; batch < args.batches; ++batch) {
+                if (mpi.rank == 0) std::cout << mpi.rank << " | starting AG batch " << batch << " portion of hybrid MCMC =========" << std::endl;
+                std::vector<Membership> async_updates = asynchronous_gibbs_iteration(blockmodel, graph, &next_assignment, mcmc_window, active_set, batch);
+                vertex_moves += update_blockmodel(graph, blockmodel, async_updates, &next_assignment, mcmc_window);
+            }
+        } else {
+            std::vector<long> active_set = utils::range<long>(0, graph.num_vertices());
+            shuffle_active_set(active_set);
+//            std::shuffle(active_set.begin(), active_set.end(), rng::generator());
+            for (int batch = 0; batch < args.batches; ++batch) {
+//                if (mpi.rank == 0) std::cout << "processing batch = " << batch << std::endl;
+                std::vector<Membership> membership_updates;
+                if (args.algorithm == "async_gibbs")
+                    membership_updates = asynchronous_gibbs_iteration(blockmodel, graph, &next_assignment, mcmc_window,
+                                                                      active_set, batch);
+                else
+                    membership_updates = metropolis_hastings_iteration(blockmodel, graph, &next_assignment, mcmc_window, active_set, batch);
+                vertex_moves += update_blockmodel(graph, blockmodel, membership_updates, &next_assignment, mcmc_window);
+            }
+        }
+        MCMC_RUNTIMES.push_back(MPI_Wtime() - start_t);
+        new_entropy = !args.parametric ?
+                      entropy::nonparametric::mdl(blockmodel, graph) :
+                      entropy::dist::mdl(blockmodel, graph.num_vertices(), graph.num_edges());
+        double delta_entropy = new_entropy - old_entropy;
+        old_entropy = new_entropy;
+        delta_entropies.push_back(delta_entropy);
+        if (mpi.rank == 0) {
+            std::cout << "Itr: " << iteration << " vertex moves: " << vertex_moves << " delta S: "
+                      << delta_entropy / new_entropy << std::endl;
+        }
+//        total_vertex_moves += vertex_moves;
+        timers::MCMC_iterations++;
+        // Early stopping
+        if (early_stop(iteration, golden_ratio_not_reached, new_entropy, delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(new_entropy);
+    if (args.nonblocking) {
+//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+        utils::MPI(MPI_Win_free(&mcmc_window));
+    } else {
+        MPI_Type_free(&Membership_t);
+    }
+    return blockmodel;
+}
+
+std::vector<Membership> metropolis_hastings_iteration(TwoHopBlockmodel &blockmodel, Graph &graph,
+                                                      std::vector<long> *next_assignment, MPI_Win mcmc_window,
+                                                      const std::vector<long> &active_set, int batch) {
+    std::vector<Membership> membership_updates;
+    std::vector<long> vertices;
+    if (active_set.empty())
+        vertices = utils::range<long>(0, graph.num_vertices());
+    else
+        vertices = active_set;
+    long batch_size = long(ceil(double(vertices.size()) / args.batches));
+    long start = batch * batch_size;
+    size_t end = std::min(long(vertices.size()), (batch + 1) * batch_size);
+    // for hybrid_mcmc, we want to go through entire active_set in one go, regardless of number of batches
+    if (batch == -1) {
+        start = 0;
+        end = vertices.size();
+    }
+    for (size_t index = start; index < end; ++index) {  // long vertex : vertices) {
+        long vertex = vertices[index];
+        if (!blockmodel.owns_vertex(vertex)) continue;
+        VertexMove proposal = dist::propose_mh_move(blockmodel, vertex, graph);
+        if (proposal.did_move) {
+//            assert(proposal.proposed_block < blockmodel.num_blocks());
+            assert(blockmodel.stores(proposal.proposed_block));
+            remote_update_membership(vertex, proposal.proposed_block, membership_updates, next_assignment, mcmc_window);
+//            membership_updates.push_back(Membership{vertex, proposal.proposed_block});
+        }
+    }
+    return membership_updates;
+}
+
+VertexMove propose_gibbs_move(const TwoHopBlockmodel &blockmodel, long vertex, const Graph &graph) {
+    bool did_move = false;
+    long current_block = blockmodel.block_assignment(vertex);
+//    if (blockmodel.block_size(current_block) <= args.threads) {
+//        return VertexMove{0.0, did_move, -1, -1 };
+//    }
+
+    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
+    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
+
+    utils::ProposalAndEdgeCounts proposal = common::dist::propose_new_block(
+            current_block, out_edges, in_edges, blockmodel.block_assignment(), blockmodel, false);
+    if (!blockmodel.stores(proposal.proposal)) {
+        std::cerr << "ERROR " << "blockmodel doesn't own proposed block!!!!!" << std::endl;
+        exit(-1000000000);
+    }
+    if (proposal.proposal == current_block) {
+        return VertexMove{0.0, did_move, -1, -1 };
+    }
+
+    return eval_vertex_move(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
+}
+
+VertexMove propose_mh_move(TwoHopBlockmodel &blockmodel, long vertex, const Graph &graph) {
+    bool did_move = false;
+    long current_block = blockmodel.block_assignment(vertex);  // getBlock_assignment()[vertex];
+    if (blockmodel.block_size(current_block) == 1) {
+        return VertexMove{0.0, did_move, -1, -1 };
+    }
+    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
+    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
+
+    utils::ProposalAndEdgeCounts proposal = common::dist::propose_new_block(
+            current_block, out_edges, in_edges, blockmodel.block_assignment(), blockmodel, false);
+    if (!blockmodel.stores(proposal.proposal)) {
+        std::cerr << "ERROR " << "blockmodel doesn't own proposed block!!!!!" << std::endl;
+        exit(-1000000000);
+    }
+    if (proposal.proposal == current_block) {
+        return VertexMove{0.0, did_move, -1, -1 };
+    }
+
+    return move_vertex(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
+}
+
+void remote_update_membership(long vertex, long new_block, std::vector<Membership> &membership_updates,
+                              std::vector<long> *next_assignment, MPI_Win mcmc_window) {
+//    std::cout << mpi.rank << " | attempting to move " << vertex << " to " << new_block << std::endl;
+    if (args.nonblocking) {
+//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+        (*next_assignment)[vertex] = new_block;
+        for (int rank = 0; rank < mpi.num_processes; ++rank) {
+            if (rank == mpi.rank) continue;
+//            std::cout << mpi.rank << " | attempting to update index " << vertex << " to " << (*next_assignment)[vertex] << " on rank " << rank << std::endl;
+//            std::cout << "origin_addr = " << &new_block << std::endl;
+//            assert(mcmc_window != nullptr);
+//            std::cout << "window addr = " << &mcmc_window << std::endl;
+//            assert(next_assignment != nullptr);
+//            assert(next_assignment->size() > vertex);
+//            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, mcmc_window);
+            utils::MPI(MPI_Put(&new_block, 1, MPI_LONG, rank, vertex, 1, MPI_LONG, mcmc_window));
+//            MPI_Win_unlock(rank, mcmc_window);
+        }
+        return;
+    }
+    #pragma omp critical (updates)
+    {
+        membership_updates.push_back(Membership{vertex, new_block});
+    }
+}
+
+void shuffle_active_set(std::vector<long> &active_set) {
+    if (!args.ordered) {
+        unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
+        std::shuffle(active_set.begin(), active_set.end(), std::mt19937_64(seed));
+    }
+}
+
+size_t update_blockmodel(const Graph &graph, TwoHopBlockmodel &blockmodel,
+                         const std::vector<Membership> &membership_updates,
+                         std::vector<long> *next_assignment, MPI_Win mcmc_window) {
+    if (args.nonblocking) {
+//        if (mpi.rank == 0) std::cout << "nonblocking call" << std::endl;
+//        std::cout << mpi.rank << " | updating blockmodel!" << std::endl;
+//        std::cout << mpi.rank << " | next_assignment_addr = " << next_assignment << std::endl;
+//        assert(next_assignment != nullptr);
+//        assert(next_assignment->size() == graph.num_vertices());
+//        assert(mcmc_window != nullptr);
+//        std::cout << mpi.rank << " | window_addr = " << mcmc_window << std::endl;
+        size_t vertex_moves = 0;
+//        std::cout << mpi.rank << " | arrived at win_fence" << std::endl;
+        utils::MPI(MPI_Win_fence(0, mcmc_window));
+//        std::cout << mpi.rank << " | got through win_fence" << std::endl;
+        for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
+            if ((*next_assignment)[vertex] == blockmodel.block_assignment(vertex)) continue;
+//            if ((*next_assignment)[vertex] >= blockmodel.num_blocks()) {
+//                std::cout << mpi.rank << " | updated assignment for " << vertex << " is " << (*next_assignment)[vertex]
+//                          << " when old assignment is " << blockmodel.block_assignment(vertex) << std::endl;
+//                assert((*next_assignment)[vertex] < blockmodel.num_blocks());
+//            }
+            if (async_move({vertex, (*next_assignment)[vertex]}, graph, blockmodel)) {
+                vertex_moves++;
+            }
+        }
+        return vertex_moves;
+    }
+    std::vector<Membership> collected_membership_updates = mpi_get_assignment_updates(membership_updates);
+    size_t vertex_moves = 0;
+    for (const Membership &membership: collected_membership_updates) {
+        if (membership.vertex < 0 || membership.vertex >= graph.num_vertices()) {
+            std::cout << mpi.rank << " | ERROR moving " << membership.vertex << " to block " << membership.block << std::endl;
+            exit(-114);
+        }
+        if (membership.block == blockmodel.block_assignment(membership.vertex)) continue;
+        if (async_move(membership, graph, blockmodel)) {
+            vertex_moves++;
+        }
+    }
+//    size_t vertex_moves = collected_membership_updates.size();
+    return vertex_moves;
+}
+
+}
+// namespace finetune::dist
+
+#endif // SBP_DIST_FINETUNE_HPP
diff --git a/src/distributed/two_hop_blockmodel.cpp b/src/distributed/two_hop_blockmodel.cpp
index 33edc20..6b477e0 100644
--- a/src/distributed/two_hop_blockmodel.cpp
+++ b/src/distributed/two_hop_blockmodel.cpp
@@ -107,7 +107,7 @@ void TwoHopBlockmodel::distribute_none_edge_balanced(const Graph &graph) {
             vertex_info[i] = std::make_pair(i, vertex_degrees[i]);
         }
 //	    std::vector<long> sorted_indices = utils::argsort<long>(vertex_degrees);
-        std::stable_sort(std::execution::par_unseq, vertex_info.begin(), vertex_info.end(),
+        std::stable_sort(vertex_info.begin(), vertex_info.end(),
                          [](const auto &i1, const auto &i2) {
             return i1.second > i2.second;
         });
@@ -473,7 +473,7 @@ std::vector<std::pair<long,long>> TwoHopBlockmodel::sorted_block_sizes() const {
         block_sizes[block].second++;
     }
 //    utils::radix_sort(block_sizes);
-    std::stable_sort(std::execution::par_unseq, block_sizes.begin(), block_sizes.end(), [](const auto &i1, const auto &i2) {
+    std::stable_sort(block_sizes.begin(), block_sizes.end(), [](const auto &i1, const auto &i2) {
         return i1.second > i2.second;
     });
     return block_sizes;
diff --git a/src/entropy.cpp b/src/entropy.cpp
index b376698..4782053 100644
--- a/src/entropy.cpp
+++ b/src/entropy.cpp
@@ -1,1227 +1,1227 @@
-#include "entropy.hpp"
-#include "fastlgamma.hpp"
-#include "spence.hpp"
-
-#include "cmath"
-
-namespace entropy {
-
-double block_merge_delta_mdl(long current_block, long proposal, long num_edges, const Blockmodel &blockmodel,
-                             EdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
-    // Blockmodel indexing
-    std::vector<long> old_block_row = blockmodel.blockmatrix()->getrow(current_block); // M_r_t1
-    std::vector<long> old_proposal_row = blockmodel.blockmatrix()->getrow(proposal);   // M_s_t1
-    std::vector<long> old_block_col = blockmodel.blockmatrix()->getcol(current_block); // M_t2_r
-    std::vector<long> old_proposal_col = blockmodel.blockmatrix()->getcol(proposal);   // M_t2_s
-
-    // Exclude current_block, proposal to prevent double counting
-    std::vector<long> new_proposal_col = common::exclude_indices(updates.proposal_col, current_block, proposal);
-    old_block_col = common::exclude_indices(old_block_col, current_block, proposal);       // M_t2_r
-    old_proposal_col = common::exclude_indices(old_proposal_col, current_block, proposal); // M_t2_s
-    std::vector<long> new_block_degrees_out = common::exclude_indices(block_degrees.block_degrees_out, current_block,
-                                                                     proposal);
-    std::vector<long> old_block_degrees_out = common::exclude_indices(blockmodel.degrees_out(),
-                                                                     current_block, proposal);
-
-    // Remove 0 indices
-    std::vector<long> new_proposal_row_degrees_in = common::index_nonzero(block_degrees.block_degrees_in,
-                                                                         updates.proposal_row);
-    std::vector<long> new_proposal_row = common::nonzeros(updates.proposal_row);
-    std::vector<long> new_proposal_col_degrees_out = common::index_nonzero(new_block_degrees_out, new_proposal_col);
-    new_proposal_col = common::nonzeros(new_proposal_col);
-
-    std::vector<long> old_block_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(),
-                                                                      old_block_row);
-    std::vector<long> old_proposal_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(),
-                                                                         old_proposal_row);
-    old_block_row = common::nonzeros(old_block_row);
-    old_proposal_row = common::nonzeros(old_proposal_row);
-    std::vector<long> old_block_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_block_col);
-    std::vector<long> old_proposal_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_proposal_col);
-    old_block_col = common::nonzeros(old_block_col);
-    old_proposal_col = common::nonzeros(old_proposal_col);
-
-    double delta_entropy = 0.0;
-    delta_entropy -= common::delta_entropy_temp(new_proposal_row, new_proposal_row_degrees_in,
-                                                block_degrees.block_degrees_out[proposal], num_edges);
-    delta_entropy -= common::delta_entropy_temp(new_proposal_col, new_proposal_col_degrees_out,
-                                                block_degrees.block_degrees_in[proposal], num_edges);
-    delta_entropy += common::delta_entropy_temp(old_block_row, old_block_row_degrees_in,
-                                                blockmodel.degrees_out(current_block), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_proposal_row, old_proposal_row_degrees_in,
-                                                blockmodel.degrees_out(proposal), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_block_col, old_block_col_degrees_out,
-                                                blockmodel.degrees_in(current_block), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_proposal_col, old_proposal_col_degrees_out,
-                                                blockmodel.degrees_in(proposal), num_edges);
-    return delta_entropy;
-}
-
-double block_merge_delta_mdl(long current_block, long proposal, long num_edges, const Blockmodel &blockmodel,
-                             SparseEdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
-    // Blockmodel indexing
-    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
-    const MapVector<long> &old_block_row = matrix->getrow_sparse(current_block); // M_r_t1
-    const MapVector<long> &old_proposal_row = matrix->getrow_sparse(proposal);   // M_s_t1
-    const MapVector<long> &old_block_col = matrix->getcol_sparse(current_block); // M_t2_r
-    const MapVector<long> &old_proposal_col = matrix->getcol_sparse(proposal);   // M_t2_s
-
-    double delta_entropy = 0.0;
-    delta_entropy -= common::delta_entropy_temp(updates.proposal_row, block_degrees.block_degrees_in,
-                                                block_degrees.block_degrees_out[proposal], num_edges);
-    delta_entropy -= common::delta_entropy_temp(updates.proposal_col, block_degrees.block_degrees_out,
-                                                block_degrees.block_degrees_in[proposal], current_block, proposal,
-                                                num_edges);
-    delta_entropy += common::delta_entropy_temp(old_block_row, blockmodel.degrees_in(),
-                                                blockmodel.degrees_out(current_block), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_proposal_row, blockmodel.degrees_in(),
-                                                blockmodel.degrees_out(proposal), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_block_col, blockmodel.degrees_out(),
-                                                blockmodel.degrees_in(current_block), current_block,
-                                                proposal, num_edges);
-    delta_entropy += common::delta_entropy_temp(old_proposal_col, blockmodel.degrees_out(),
-                                                blockmodel.degrees_in(proposal), current_block, proposal,
-                                                num_edges);
-    return delta_entropy;
-}
-
-double block_merge_delta_mdl(long current_block, const Blockmodel &blockmodel, const Delta &delta,
-                             common::NewBlockDegrees &block_degrees) {
-    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
-    double delta_entropy = 0.0;
-    long proposed_block = delta.proposed_block();
-    for (const std::tuple<long, long, long> &entry: delta.entries()) {
-        long row = std::get<0>(entry);
-        long col = std::get<1>(entry);
-        long change = std::get<2>(entry);
-        // delta += + E(old) - E(new)
-        delta_entropy += common::cell_entropy(matrix->get(row, col), blockmodel.degrees_in(col),
-                                              blockmodel.degrees_out(row));
-        if (row == current_block || col == current_block) continue;  // the "new" cell entropy == 0;
-        delta_entropy -= common::cell_entropy(matrix->get(row, col) + change, block_degrees.block_degrees_in[col],
-                                              block_degrees.block_degrees_out[row]);
-    }
-    for (const LongEntry &entry: blockmodel.blockmatrix()->getrow_sparse(proposed_block)) {
-        long row = proposed_block;
-        long col = entry.first;
-        long value = entry.second;
-        if (delta.get(row, col) != 0) continue;
-        // Value has not changed
-        delta_entropy += common::cell_entropy((double) value, (double) blockmodel.degrees_in(col),
-                                              (double) blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy((double) value, (double) block_degrees.block_degrees_in[col],
-                                              (double) block_degrees.block_degrees_out[row]);
-    }
-    for (const LongEntry &entry: blockmodel.blockmatrix()->getcol_sparse(proposed_block)) {
-        long row = entry.first;
-        long col = proposed_block;
-        long value = entry.second;
-        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
-        // Value has not changed and we're not double counting
-        delta_entropy += common::cell_entropy((double) value, (double) blockmodel.degrees_in(col),
-                                              (double) blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy((double) value, (double) block_degrees.block_degrees_in[col],
-                                              (double) block_degrees.block_degrees_out[row]);
-    }
-    return delta_entropy;
-}
-
-double block_merge_delta_mdl(long current_block, utils::ProposalAndEdgeCounts proposal, const Blockmodel &blockmodel,
-                             const Delta &delta) {
-    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
-    double delta_entropy = 0.0;
-    long proposed_block = delta.proposed_block();
-    auto get_deg_in = [&blockmodel, &proposal, current_block, proposed_block](long index) -> double {
-        long value = blockmodel.degrees_in(index);
-        if (index == current_block)
-            value -= proposal.num_in_neighbor_edges;
-        else if (index == proposed_block)
-            value += proposal.num_in_neighbor_edges;
-        return double(value);
-    };
-    auto get_deg_out = [&blockmodel, &proposal, current_block, proposed_block](long index) -> double {
-        long value = blockmodel.degrees_out(index);
-        if (index == current_block)
-            value -= proposal.num_out_neighbor_edges;
-        else if (index == proposed_block)
-            value += proposal.num_out_neighbor_edges;
-        return double(value);
-    };
-    for (const std::tuple<long, long, long> &entry: delta.entries()) {
-        long row = std::get<0>(entry);
-        long col = std::get<1>(entry);
-        auto change = (double) std::get<2>(entry);
-        // delta += + E(old) - E(new)
-        auto value = (double) matrix->get(row, col);
-        delta_entropy += common::cell_entropy(value, (double) blockmodel.degrees_in(col),
-                                              (double) blockmodel.degrees_out(row));
-        if (row == current_block || col == current_block) continue;  // the "new" cell entropy == 0;
-        delta_entropy -= common::cell_entropy(value + change, get_deg_in(col), get_deg_out(row));
-    }
-    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getrow_sparse(proposed_block)) {
-        long row = proposed_block;
-        long col = entry.first;
-        auto value = (double) entry.second;
-        if (delta.get(row, col) != 0) continue;
-        // Value has not changed
-        delta_entropy += common::cell_entropy((double) value, (double) blockmodel.degrees_in(col),
-                                              (double) blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
-    }
-    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getcol_sparse(proposed_block)) {
-        long row = entry.first;
-        long col = proposed_block;
-        auto value = (double) entry.second;
-        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
-        // Value has not changed and we're not double counting
-        delta_entropy += common::cell_entropy(value, (double) blockmodel.degrees_in(col),
-                                              (double) blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
-    }
-    return delta_entropy;
-}
-
-double delta_mdl(long current_block, long proposal, const Blockmodel &blockmodel, long num_edges,
-                 EdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
-    // Blockmodel indexing
-    std::vector<long> old_block_row = blockmodel.blockmatrix()->getrow(current_block); // M_r_t1
-    std::vector<long> old_proposal_row = blockmodel.blockmatrix()->getrow(proposal);   // M_s_t1
-    std::vector<long> old_block_col = blockmodel.blockmatrix()->getcol(current_block); // M_t2_r
-    std::vector<long> old_proposal_col = blockmodel.blockmatrix()->getcol(proposal);   // M_t2_s
-
-    // Exclude current_block, proposal to prevent double counting
-    std::vector<long> new_block_col = common::exclude_indices(updates.block_col, current_block, proposal); // added
-    std::vector<long> new_proposal_col = common::exclude_indices(updates.proposal_col, current_block, proposal);
-    old_block_col = common::exclude_indices(old_block_col, current_block, proposal);       // M_t2_r
-    old_proposal_col = common::exclude_indices(old_proposal_col, current_block, proposal); // M_t2_s
-    std::vector<long> new_block_degrees_out = common::exclude_indices(block_degrees.block_degrees_out, current_block,
-                                                                     proposal);
-    std::vector<long> old_block_degrees_out = common::exclude_indices(blockmodel.degrees_out(), current_block, proposal);
-
-    // Remove 0 indices
-    std::vector<long> new_block_row_degrees_in = common::index_nonzero(block_degrees.block_degrees_in,
-                                                                      updates.block_row); // added
-    std::vector<long> new_proposal_row_degrees_in = common::index_nonzero(block_degrees.block_degrees_in,
-                                                                         updates.proposal_row);
-    std::vector<long> new_block_row = common::nonzeros(updates.block_row); // added
-    std::vector<long> new_proposal_row = common::nonzeros(updates.proposal_row);
-    std::vector<long> new_block_col_degrees_out = common::index_nonzero(new_block_degrees_out, new_block_col); // added
-    std::vector<long> new_proposal_col_degrees_out = common::index_nonzero(new_block_degrees_out, new_proposal_col);
-    new_block_col = common::nonzeros(new_block_col); // added
-    new_proposal_col = common::nonzeros(new_proposal_col);
-
-    std::vector<long> old_block_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(), old_block_row);
-    std::vector<long> old_proposal_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(), old_proposal_row);
-    old_block_row = common::nonzeros(old_block_row);
-    old_proposal_row = common::nonzeros(old_proposal_row);
-    std::vector<long> old_block_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_block_col);
-    std::vector<long> old_proposal_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_proposal_col);
-    old_block_col = common::nonzeros(old_block_col);
-    old_proposal_col = common::nonzeros(old_proposal_col);
-
-    double delta_entropy = 0.0;
-    delta_entropy -= common::delta_entropy_temp(new_block_row, new_block_row_degrees_in,
-                                                block_degrees.block_degrees_out[current_block], num_edges); // added
-    delta_entropy -= common::delta_entropy_temp(new_proposal_row, new_proposal_row_degrees_in,
-                                                block_degrees.block_degrees_out[proposal], num_edges);
-    delta_entropy -= common::delta_entropy_temp(new_block_col, new_block_col_degrees_out,
-                                                block_degrees.block_degrees_in[current_block], num_edges); // added
-    delta_entropy -= common::delta_entropy_temp(new_proposal_col, new_proposal_col_degrees_out,
-                                                block_degrees.block_degrees_in[proposal], num_edges);
-    delta_entropy += common::delta_entropy_temp(old_block_row, old_block_row_degrees_in,
-                                                blockmodel.degrees_out(current_block), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_proposal_row, old_proposal_row_degrees_in,
-                                                blockmodel.degrees_out(proposal), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_block_col, old_block_col_degrees_out,
-                                                blockmodel.degrees_in(current_block), num_edges);
-    delta_entropy += common::delta_entropy_temp(old_proposal_col, old_proposal_col_degrees_out,
-                                                blockmodel.degrees_in(proposal), num_edges);
-    if (std::isnan(delta_entropy)) {
-        std::cout << "Error: Dense delta entropy is NaN" << std::endl;
-        exit(-142321);
-    }
-    return delta_entropy;
-}
-
-double delta_mdl(long current_block, long proposal, const Blockmodel &blockmodel, long num_edges,
-                 SparseEdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
-    // Blockmodel indexing
-    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
-    const MapVector<long> &old_block_row = matrix->getrow_sparseref(current_block); // M_r_t1
-    const MapVector<long> &old_proposal_row = matrix->getrow_sparseref(proposal);   // M_s_t1
-    const MapVector<long> &old_block_col = matrix->getcol_sparseref(current_block); // M_t2_r
-    const MapVector<long> &old_proposal_col = matrix->getcol_sparseref(proposal);   // M_t2_s
-
-    double delta_entropy = 0.0;
-    delta_entropy -= common::delta_entropy_temp(updates.block_row, block_degrees.block_degrees_in,
-                                                block_degrees.block_degrees_out[current_block], num_edges);
-    assert(!std::isnan(delta_entropy));
-    delta_entropy -= common::delta_entropy_temp(updates.proposal_row, block_degrees.block_degrees_in,
-                                                block_degrees.block_degrees_out[proposal], num_edges);
-    assert(!std::isnan(delta_entropy));
-    delta_entropy -= common::delta_entropy_temp(updates.block_col, block_degrees.block_degrees_out,
-                                                block_degrees.block_degrees_in[current_block], current_block, proposal,
-                                                num_edges);
-    if (std::isnan(delta_entropy)) {
-        std::cout << "block_col: ";
-        utils::print<long>(updates.block_col);
-        std::cout << "_block_degrees_out: ";
-        utils::print<long>(block_degrees.block_degrees_out);
-        std::cout << "block_degree in: " << block_degrees.block_degrees_in[current_block] << std::endl;
-    }
-    assert(!std::isnan(delta_entropy));
-    delta_entropy -= common::delta_entropy_temp(updates.proposal_col, block_degrees.block_degrees_out,
-                                                block_degrees.block_degrees_in[proposal], current_block, proposal,
-                                                num_edges);
-    assert(!std::isnan(delta_entropy));
-    delta_entropy += common::delta_entropy_temp(old_block_row, blockmodel.degrees_in(),
-                                                blockmodel.degrees_out(current_block), num_edges);
-    assert(!std::isnan(delta_entropy));
-    delta_entropy += common::delta_entropy_temp(old_proposal_row, blockmodel.degrees_in(),
-                                                blockmodel.degrees_out(proposal), num_edges);
-    assert(!std::isnan(delta_entropy));
-    delta_entropy += common::delta_entropy_temp(old_block_col, blockmodel.degrees_out(),
-                                                blockmodel.degrees_in(current_block), current_block,
-                                                proposal, num_edges);
-    assert(!std::isnan(delta_entropy));
-    delta_entropy += common::delta_entropy_temp(old_proposal_col, blockmodel.degrees_out(),
-                                                blockmodel.degrees_in(proposal), current_block, proposal,
-                                                num_edges);
-    assert(!std::isnan(delta_entropy));
-    if (std::isnan(delta_entropy)) {
-        std::cerr << "ERROR " << "Error: Sparse delta entropy is NaN" << std::endl;
-        exit(-142321);
-    }
-    return delta_entropy;
-}
-
-double delta_mdl(const Blockmodel &blockmodel, const Delta &delta, const utils::ProposalAndEdgeCounts &proposal) {
-    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
-    double delta_entropy = 0.0;
-    long current_block = delta.current_block();
-    long proposed_block = delta.proposed_block();
-    auto get_deg_in = [&blockmodel, &proposal, &delta, current_block, proposed_block](long index) -> size_t {
-        long value = blockmodel.degrees_in(index);
-        if (index == current_block)
-            value -= (proposal.num_in_neighbor_edges + delta.self_edge_weight());
-        else if (index == proposed_block)
-            value += (proposal.num_in_neighbor_edges + delta.self_edge_weight());
-        return value;
-    };
-    auto get_deg_out = [&blockmodel, &proposal, current_block, proposed_block](long index) -> size_t {
-        long value = blockmodel.degrees_out(index);
-        if (index == current_block)
-            value -= proposal.num_out_neighbor_edges;
-        else if (index == proposed_block)
-            value += proposal.num_out_neighbor_edges;
-        return value;
-    };
-    for (const std::tuple<long, long, long> &entry: delta.entries()) {
-        long row = std::get<0>(entry);
-        long col = std::get<1>(entry);
-        long change = std::get<2>(entry);
-        delta_entropy += common::cell_entropy(matrix->get(row, col), blockmodel.degrees_in(col),
-                                              blockmodel.degrees_out(row));
-        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
-            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << matrix->get(row, col) << " delta: " << change << std::endl;
-            utils::print<long>(blockmodel.blockmatrix()->getrow_sparse(row));
-            utils::print<long>(blockmodel.blockmatrix()->getcol_sparse(col));
-            std::cout << "d_out[row]: " << blockmodel.degrees_out(row) << " d_in[col]: " << blockmodel.degrees_in(row) << std::endl;
-            throw std::invalid_argument("nan/inf in bm delta for old bm when delta != 0");
-        }
-        delta_entropy -= common::cell_entropy(matrix->get(row, col) + change, get_deg_in(col),
-                                              get_deg_out(row));
-        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
-            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << matrix->get(row, col) << " delta: " << change;
-            std::cout << " current: " << current_block << " proposed: " << proposed_block << std::endl;
-            utils::print<long>(blockmodel.blockmatrix()->getrow_sparse(row));
-            utils::print<long>(blockmodel.blockmatrix()->getcol_sparse(col));
-            std::cout << "d_out[row]: " << blockmodel.degrees_out(row) << " d_in[col]: " << blockmodel.degrees_in(col) << std::endl;
-            std::cout << "new d_out[row]: " << get_deg_out(row) << " d_in[col]: " << get_deg_in(col) << std::endl;
-            std::cout << "v_out: " << proposal.num_out_neighbor_edges << " v_in: " << proposal.num_in_neighbor_edges << " v_total: " << proposal.num_neighbor_edges << std::endl;
-            throw std::invalid_argument("nan/inf in bm delta for new bm when delta != 0");
-        }
-    }
-    // Compute change in entropy for cells with no delta
-    for (const auto &entry: blockmodel.blockmatrix()->getrow_sparseref(current_block)) {
-        long row = current_block;
-        long col = entry.first;
-        long value = entry.second;
-        if (delta.get(row, col) != 0) continue;
-        // Value has not changed
-        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
-                                              blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
-        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
-            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
-            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and row = current block");
-        }
-    }
-    for (const auto &entry: blockmodel.blockmatrix()->getrow_sparseref(proposed_block)) {
-        long row = proposed_block;
-        long col = entry.first;
-        long value = entry.second;
-        if (delta.get(row, col) != 0) continue;
-        // Value has not changed
-        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
-                                              blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
-        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
-            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
-            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and row = proposed block");
-        }
-    }
-    for (const auto &entry: blockmodel.blockmatrix()->getcol_sparseref(current_block)) {
-        long row = entry.first;
-        long col = current_block;
-        long value = entry.second;
-        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
-        // Value has not changed and we're not double counting
-        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
-                                              blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
-        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
-            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
-            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and col = current block");
-        }
-    }
-    for (const auto &entry: blockmodel.blockmatrix()->getcol_sparseref(proposed_block)) {
-        long row = entry.first;
-        long col = proposed_block;
-        long value = entry.second;
-        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
-        // Value has not changed and we're not double counting
-        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
-                                              blockmodel.degrees_out(row));
-        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
-        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
-            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
-            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and col = proposed block");
-        }
-    }
-    return delta_entropy;
-}
-
-double hastings_correction(const Blockmodel &blockmodel, EdgeWeights &out_blocks, EdgeWeights &in_blocks,
-                           utils::ProposalAndEdgeCounts &proposal, EdgeCountUpdates &updates,
-                           common::NewBlockDegrees &new_block_degrees) {
-    if (proposal.num_neighbor_edges == 0 || args.greedy) {
-        return 1.0;
-    }
-    // Compute block weights
-    std::map<long, long> block_counts;
-    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
-        long block = out_blocks.indices[i];
-        long weight = out_blocks.values[i];
-        block_counts[block] += weight; // block_count[new block] should initialize to 0
-    }
-    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
-        long block = in_blocks.indices[i];
-        long weight = in_blocks.values[i];
-        block_counts[block] += weight; // block_count[new block] should initialize to 0
-    }
-    // Create Arrays using unique blocks
-    size_t num_unique_blocks = block_counts.size();
-    std::vector<double> counts(num_unique_blocks, 0);
-    std::vector<double> proposal_weights(num_unique_blocks, 0);
-    std::vector<double> block_weights(num_unique_blocks, 0);
-    std::vector<double> block_degrees(num_unique_blocks, 0);
-    std::vector<double> proposal_degrees(num_unique_blocks, 0);
-    // Indexing
-    std::vector<long> proposal_row = blockmodel.blockmatrix()->getrow(proposal.proposal);
-    std::vector<long> proposal_col = blockmodel.blockmatrix()->getcol(proposal.proposal);
-    // Fill Arrays
-    long index = 0;
-    long num_blocks = blockmodel.num_blocks();
-    const std::vector<long> &current_block_degrees = blockmodel.degrees();
-    for (auto const &entry: block_counts) {
-        counts[index] = entry.second;
-        proposal_weights[index] = proposal_row[entry.first] + proposal_col[entry.first] + 1.0;
-        block_degrees[index] = current_block_degrees[entry.first] + num_blocks;
-        block_weights[index] = updates.block_row[entry.first] + updates.block_col[entry.first] + 1.0;
-        proposal_degrees[index] = new_block_degrees.block_degrees[entry.first] + num_blocks;
-        index++;
-    }
-    // Compute p_forward and p_backward
-    auto p_forward = utils::sum<double>(counts * proposal_weights / block_degrees);
-    auto p_backward = utils::sum<double>(counts * block_weights / proposal_degrees);
-    return p_backward / p_forward;
-}
-
-double hastings_correction(const Blockmodel &blockmodel, EdgeWeights &out_blocks, EdgeWeights &in_blocks,
-                           utils::ProposalAndEdgeCounts &proposal, SparseEdgeCountUpdates &updates,
-                           common::NewBlockDegrees &new_block_degrees) {
-    if (proposal.num_neighbor_edges == 0 || args.greedy) {
-        return 1.0;
-    }
-    // Compute block weights
-    std::map<long, long> block_counts;
-    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
-        long block = out_blocks.indices[i];
-        long weight = out_blocks.values[i];
-        block_counts[block] += weight; // block_count[new block] should initialize to 0
-    }
-    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
-        long block = in_blocks.indices[i];
-        long weight = in_blocks.values[i];
-        block_counts[block] += weight; // block_count[new block] should initialize to 0
-    }
-    // Create Arrays using unique blocks
-    size_t num_unique_blocks = block_counts.size();
-    std::vector<double> counts(num_unique_blocks, 0);
-    std::vector<double> proposal_weights(num_unique_blocks, 0);
-    std::vector<double> block_weights(num_unique_blocks, 0);
-    std::vector<double> block_degrees(num_unique_blocks, 0);
-    std::vector<double> proposal_degrees(num_unique_blocks, 0);
-    // Indexing
-    std::vector<long> proposal_row = blockmodel.blockmatrix()->getrow(proposal.proposal);
-    std::vector<long> proposal_col = blockmodel.blockmatrix()->getcol(proposal.proposal);
-    // Fill Arrays
-    long index = 0;
-    long num_blocks = blockmodel.num_blocks();
-    const std::vector<long> &current_block_degrees = blockmodel.degrees();
-    for (auto const &entry: block_counts) {
-        counts[index] = entry.second;
-        proposal_weights[index] = proposal_row[entry.first] + proposal_col[entry.first] + 1.0;
-        block_degrees[index] = current_block_degrees[entry.first] + num_blocks;
-        block_weights[index] = updates.block_row[entry.first] + updates.block_col[entry.first] + 1.0;
-        proposal_degrees[index] = new_block_degrees.block_degrees[entry.first] + num_blocks;
-        index++;
-    }
-    // Compute p_forward and p_backward
-    auto p_forward = utils::sum<double>(counts * proposal_weights / block_degrees);
-    auto p_backward = utils::sum<double>(counts * block_weights / proposal_degrees);
-    return p_backward / p_forward;
-}
-
-double hastings_correction(long vertex, const Graph &graph, const Blockmodel &blockmodel, const Delta &delta,
-                           long current_block, const utils::ProposalAndEdgeCounts &proposal) {
-    if (proposal.num_neighbor_edges == 0 || args.greedy || args.nonparametric) {  // No correction needed with greedy proposals
-        return 1.0;
-    }
-    // Compute block weights
-    MapVector<long> block_counts;
-    for (const long neighbor: graph.out_neighbors(vertex)) {
-        long neighbor_block = blockmodel.block_assignment(neighbor);
-        block_counts[neighbor_block] += 1;
-    }
-    for (const long neighbor: graph.in_neighbors(vertex)) {
-        if (neighbor == vertex) continue;
-        long neighbor_block = blockmodel.block_assignment(neighbor);
-        block_counts[neighbor_block] += 1;
-    }
-    // Create Arrays using unique blocks
-    size_t num_unique_blocks = block_counts.size();
-    std::vector<double> counts(num_unique_blocks, 0);
-    std::vector<double> proposal_weights(num_unique_blocks, 0);
-    std::vector<double> block_weights(num_unique_blocks, 0);
-    std::vector<double> block_degrees(num_unique_blocks, 0);
-    std::vector<double> proposal_degrees(num_unique_blocks, 0);
-    // Indexing
-//    std::vector<long> proposal_row = blockmodel.blockmatrix()->getrow(proposal.proposal);
-//    std::vector<long> proposal_col = blockmodel.blockmatrix()->getcol(proposal.proposal);
-    const MapVector<long> &proposal_row = blockmodel.blockmatrix()->getrow_sparseref(proposal.proposal);
-    const MapVector<long> &proposal_col = blockmodel.blockmatrix()->getcol_sparseref(proposal.proposal);
-    // Fill Arrays
-    long index = 0;
-    long num_blocks = blockmodel.num_blocks();
-    const std::vector<long> &current_block_degrees = blockmodel.degrees();
-    for (auto const &entry: block_counts) {
-        counts[index] = entry.second;
-        proposal_weights[index] = map_vector::get(proposal_row, entry.first) + map_vector::get(proposal_col, entry.first) + 1.0;
-        block_degrees[index] = current_block_degrees[entry.first] + num_blocks;
-        block_weights[index] = blockmodel.blockmatrix()->get(current_block, entry.first) +
-                               delta.get(current_block, entry.first) +
-                               //                get(delta, std::make_pair(current_block, entry.first)) +
-                               blockmodel.blockmatrix()->get(entry.first, current_block) +
-                               delta.get(entry.first, current_block) + 1.0;
-//                get(delta, std::make_pair(entry.first, current_block)) + 1.0;
-        long new_block_degree = blockmodel.degrees(entry.first);
-        if (entry.first == current_block) {
-            long current_block_self_edges = blockmodel.blockmatrix()->get(current_block, current_block)
-                                           + delta.get(current_block, current_block);
-            long degree_out = blockmodel.degrees_out(current_block) - proposal.num_out_neighbor_edges;
-            long degree_in = blockmodel.degrees_in(current_block) - proposal.num_in_neighbor_edges;
-            new_block_degree = degree_out + degree_in - current_block_self_edges;
-        } else if (entry.first == proposal.proposal) {
-            long proposed_block_self_edges = blockmodel.blockmatrix()->get(proposal.proposal, proposal.proposal)
-                                            + delta.get(proposal.proposal, proposal.proposal);
-            long degree_out = blockmodel.degrees_out(proposal.proposal) + proposal.num_out_neighbor_edges;
-            long degree_in = blockmodel.degrees_in(proposal.proposal) + proposal.num_in_neighbor_edges;
-            new_block_degree = degree_out + degree_in - proposed_block_self_edges;
-        }
-//        proposal_degrees[index] = new_block_degrees.block_degrees[entry.first] + _num_blocks;
-        proposal_degrees[index] = new_block_degree + num_blocks;
-        index++;
-    }
-    // Compute p_forward and p_backward
-    auto p_forward = utils::sum<double>(counts * proposal_weights / block_degrees);
-    auto p_backward = utils::sum<double>(counts * block_weights / proposal_degrees);
-    return p_backward / p_forward;
-}
-
-double normalize_mdl_v1(double mdl, const Graph &graph) {
-    return mdl / null_mdl_v1(graph);
-}
-
-//double normalize_mdl_v2(double mdl, long num_vertices, long num_edges) {
-//    return mdl / null_mdl_v2(num_vertices, num_edges);
-//}
-
-double null_mdl_v1(const Graph &graph) {
-    if (args.nonparametric) {
-//        std::cout << "why is this running nonparametric?" << std::endl;
-        std::vector<long> assignment = utils::constant<long>(graph.num_vertices(), 0);
-        Blockmodel null_model(1, graph, 0.5, assignment);
-        return mdl(null_model, graph);
-    }
-    std::cout << "running correct version at least..." << std::endl;
-    std::cout << graph.num_edges() << std::endl;
-    double log_posterior_p = graph.num_edges() * log(1.0 / graph.num_edges());
-    double x = 1.0 / graph.num_edges();
-    double h = ((1 + x) * log(1 + x)) - (x * log(x));
-    std::cout << "log posterior = " << log_posterior_p << " blockmodel = " << (graph.num_edges() * h) << std::endl;
-    return (graph.num_edges() * h) - log_posterior_p;
-}
-
-/** double null_mdl_v2(long num_vertices, long num_edges) {
-    // TODO: not sure how this works in nonparametric version
-    double log_posterior_p = num_edges * log(1.0 / num_edges);
-    // done calculating log_posterior_probability
-    double x = pow(num_vertices, 2) / num_edges;
-    double h = ((1 + x) * log(1 + x)) - (x * log(x));
-//    std::cout << "log posterior = " << log_posterior_p << " blockmodel = " << (num_edges * h) + (num_vertices * log(num_vertices)) << std::endl;
-    return (num_edges * h) + (num_vertices * log(num_vertices)) - log_posterior_p;
-} */
-
-double mdl(const Blockmodel &blockmodel, const Graph &graph) {
-    if (args.nonparametric)
-        return nonparametric::mdl(blockmodel, graph);
-    double log_posterior_p = blockmodel.log_posterior_probability();
-    double x = pow(blockmodel.num_blocks(), 2) / graph.num_edges();
-    double h = ((1 + x) * log(1 + x)) - (x * log(x));
-    return (graph.num_edges() * h) + (graph.num_vertices() * log(blockmodel.num_blocks())) - log_posterior_p;
-}
-
-namespace dist {
-
-double mdl(const TwoHopBlockmodel &blockmodel, long num_vertices, long num_edges) {
-    double log_posterior_p = blockmodel.log_posterior_probability();
-    double x = pow(blockmodel.num_blocks(), 2) / num_edges;
-    double h = ((1 + x) * log(1 + x)) - (x * log(x));
-    return (num_edges * h) + (num_vertices * log(blockmodel.num_blocks())) - log_posterior_p;
-}
-
-}  // namespace dist
-
-namespace nonparametric {
-
-double get_deg_entropy(const Graph &graph, long vertex) {  // , const simple_degs_t&) {
-    long k_in = (long) graph.in_neighbors(vertex).size();
-    long k_out = (long) graph.out_neighbors(vertex).size();
-    return -fastlgamma(k_in + 1) - fastlgamma(k_out + 1);
-}
-
-double sparse_entropy(const Blockmodel &blockmodel, const Graph &graph) {
-    double S = 0;
-
-    for (long source = 0; source < blockmodel.num_blocks(); ++source) {
-        const MapVector<long> &row = blockmodel.blockmatrix()->getrow_sparseref(source);
-        for (const std::pair<long, long> &entry : row) {
-            long destination = entry.first;
-            long weight = entry.second;
-            S += eterm_exact(source, destination, weight);
-            assert(!std::isinf(S));
-            assert(!std::isnan(S));        }
-    }
-
-    for (long block = 0; block < blockmodel.num_blocks(); ++block) {
-        S += vterm_exact(blockmodel.degrees_out(block), blockmodel.degrees_in(block), blockmodel.block_size(block));
-        assert(!std::isinf(S));
-        assert(!std::isnan(S));    }
-
-//    std::cout << "S before deg_ent?ropy: " << S << std::endl;
-    if (!args.degreecorrected) return S;
-//    double S2 = 0.0;
-    // In distributed case, we would only compute these for vertices we're responsible for. Since it's a simple addition, we can do an allreduce.
-    for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
-//        S += get_deg_entropy(graph, vertex);
-        double temp = get_deg_entropy(graph, vertex);
-        S += temp;
-        assert(!std::isinf(S));
-        assert(!std::isnan(S));
-//        S2 += temp;
-    }
-
-//    std::cout << "deg_entropy: " << S2 << std::endl;
-
-    return S;
-}
-
-double get_partition_dl(long N, const Blockmodel &blockmodel) { // _N = number of vertices, _actual_B = nonzero blocks, _total = vector of block sizes
-    double S = 0;
-    S += fastlbinom(N - 1, blockmodel.num_nonempty_blocks() - 1);
-    S += fastlgamma(N + 1);
-    for (const long &block_size : blockmodel.block_sizes())
-        S -= fastlgamma(block_size + 1);
-    S += fastlog(N);
-//    std::cout << "Partition dl: " << S << std::endl;
-    assert(!std::isinf(S));
-    assert(!std::isnan(S));
-    return S;
-}
-
-/// No idea what this function does. See int_part.cc in https://git.skewed.de/count0/graph-tool
-double get_v(double u, double epsilon) {
-    double v = u;
-    double delta = 1;
-    while (delta > epsilon) {
-        double n_v = u * sqrt(spence(exp(-v)));
-        delta = abs(n_v - v);
-        v = n_v;
-    }
-    return v;
-}
-
-double log_q_approx_small(size_t n, size_t k) {
-    return fastlbinom(n - 1, k - 1) - fastlgamma(k + 1);
-}
-
-/// Computes the number of restricted of integer n into at most m parts. This is part of teh prior for the
-/// degree-corrected SBM.
-/// TO-DO: the current function contains only the approximation of log_q. If it becomes a bottleneck, you'll want to
-/// compute a cache of log_q(n, m) for ~20k n and maybe a few hundred m? I feel like for larger graphs, the cache
-/// will be a waste of time.
-/// See int_part.cc in https://git.skewed.de/count0/graph-tool
-double log_q(size_t n, size_t k) {
-    if (n <= 0 || k < 1) return 0;
-    if (k > n) k = n;
-    if (k < pow(n, 1/4.))
-        return log_q_approx_small(n, k);
-    double u = k / sqrt(n);
-    double v = get_v(u);
-    double lf = log(v) - log1p(- exp(-v) * (1 + u * u/2)) / 2 - log(2) * 3 / 2.
-                - log(u) - log(M_PI);
-    double g = 2 * v / u - u * log1p(-exp(-v));
-    return lf - log(n) + sqrt(n) * g;
-}
-
-double get_deg_dl_dist(const Blockmodel &blockmodel) { // Rs&& rs, Ks&& ks) {  // RS: range from 0 to B, KS is an empty array of pairs?
-    if (!args.degreecorrected) return 0.0;
-
-    double S = 0;
-
-    for (int block = 0; block < blockmodel.num_blocks(); ++block) {
-        S += log_q(blockmodel.degrees_out(block), blockmodel.block_size(block));
-        S += log_q(blockmodel.degrees_in(block), blockmodel.block_size(block));
-        size_t total = 0;
-        if (!args.undirected) {
-            for (const std::pair<long, long> &entry : blockmodel.in_degree_histogram(block)) {
-                S -= fastlgamma(entry.second + 1);
-                assert(!std::isinf(S));
-                assert(!std::isnan(S));
-            }
-        }
-        for (const std::pair<long, long> &entry : blockmodel.out_degree_histogram(block)) {
-            S -= fastlgamma(entry.second + 1);
-            assert(!std::isinf(S));
-            assert(!std::isnan(S));
-            total += entry.second;
-        }
-
-        if (args.undirected) {
-            S += fastlgamma(total + 1);
-        } else {
-            S += 2 * fastlgamma(total + 1);
-        }
-        assert(!std::isinf(S));
-        assert(!std::isnan(S));
-    }
-//    std::cout << "degree_dl: " << S << std::endl;
-    return S;
-}
-
-double get_edges_dl(size_t B, size_t E) {
-    size_t NB = !args.undirected ? B * B : (B * (B + 1)) / 2;
-    double E_dl = fastlbinom(NB + E - 1, E);
-//    std::cout << "edges_dl: " << E_dl << std::endl;
-    return E_dl;
-}
-
-double mdl(const Blockmodel &blockmodel, const Graph &graph) {
-    double S = 0, S_dl = 0;
-
-    S = sparse_entropy(blockmodel, graph);
-    assert(!std::isinf(S));
-    assert(!std::isnan(S));
-//    std::cout << "sparse E: " << S << std::endl;
-
-    S_dl += get_partition_dl(graph.num_vertices(), blockmodel);
-    assert(!std::isinf(S_dl));
-    assert(!std::isnan(S_dl));
-//    std::cout << "partition_dl: " << S_dl << std::endl;
-
-    S_dl += get_deg_dl_dist(blockmodel);  // (ea.degree_dl_kind);
-    assert(!std::isinf(S_dl));
-    assert(!std::isnan(S_dl));
-//    std::cout << "after deg_dl: " << S_dl << std::endl;
-
-//    std::cout << "NB: " << blockmodel.num_nonempty_blocks() << " E: " << graph.num_edges() << std::endl;
-//    S_dl += get_edges_dl(blockmodel.num_nonempty_blocks(), graph.num_edges());
-    double E_dl = get_edges_dl(blockmodel.num_nonempty_blocks(), graph.num_edges());
-//    std::cout << "edges_dl: " << E_dl << std::endl;
-    S_dl += E_dl;
-    assert(!std::isinf(S_dl));
-    assert(!std::isnan(S_dl));
-//    std::cout << "after edge_dl: " << S_dl << std::endl;
-//    utils::print(blockmodel.block_sizes());
-    return S + S_dl * BETA_DL;
-}
-
-// obtain the entropy difference given a set of entries in the e_rs matrix
-//template <bool exact, class MEntries, class Eprop, class EMat, class BGraph>
-//[[gnu::always_inline]] [[gnu::flatten]] [[gnu::hot]] inline
-double entries_dS(const Blockmodel &blockmodel, const Delta &delta) {  // MEntries& m_entries, Eprop& mrs, EMat& emat, BGraph& bg) {
-    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
-    double dS = 0;
-    for (const std::tuple<long, long, long> &entry : delta.entries()) {
-        long row = std::get<0>(entry);
-        long col = std::get<1>(entry);
-        auto change = (long) std::get<2>(entry);
-        // delta += + E(old) - E(new)
-        auto value = (long) matrix->get(row, col);
-        dS += eterm_exact(row, col, value + change) - eterm_exact(row, col, value);
-        assert(!std::isinf(dS));
-        assert(!std::isnan(dS));
-    }
-//    entries_op(m_entries, emat,
-//               [&](auto r, auto s, auto& me, auto d)
-//               {
-//                   size_t ers = 0;
-//                   if (me != emat.get_null_edge())
-//                       ers = mrs[me];
-//                   assert(int(ers) + d >= 0);
-//                   if constexpr (exact)
-//                       dS += eterm_exact(r, s, ers + d, bg) - eterm_exact(r, s, ers, bg);
-//                   else
-//                       dS += eterm(r, s, ers + d, bg) - eterm(r, s, ers, bg);
-//               });
-    return dS;
-}
-
-// compute the entropy difference of a virtual move of vertex from block r
-// to nr
-//template <bool exact, class MEntries>
-double virtual_move_sparse(const Blockmodel &blockmodel, const Delta &delta, long weight,
-                           const utils::ProposalAndEdgeCounts &proposal) {  // size_t v, size_t r, size_t nr, MEntries& m_entries) {
-    if (delta.current_block() == delta.proposed_block()) return 0.;
-
-    double dS = entries_dS(blockmodel, delta);  // <exact>(m_entries, _mrs, _emat, _bg);
-
-    long kin = proposal.num_in_neighbor_edges;
-    long kout = proposal.num_out_neighbor_edges;
-//    auto [kin, kout] = get_deg(v, _eweight, _degs, _g);
-
-//    int dwr = _vweight[v];
-//    int dwnr = dwr;
-//    int dwr = 1, dwnr = 1;
-
-//    if (r == null_group && dwnr == 0)
-//        dwnr = 1;
-
-    auto vt = [&](auto out_degree, auto in_degree, auto w) { // , auto nr) {
-        assert(out_degree >= 0 && in_degree >=0);
-//        if constexpr (exact)
-        return vterm_exact(out_degree, in_degree, w);  // , nr, _deg_corr, _bg);
-//        else
-//            return vterm(mrp, mrm, nr, _deg_corr, _bg);
-    };
-
-//    if (r != null_group)
-//    {
-//    auto mrp_r = _mrp[r];
-//    auto mrm_r = _mrm[r];
-//    auto wr_r = _wr[r];
-    dS += vt(blockmodel.degrees_out(delta.current_block()) - kout, blockmodel.degrees_in(delta.current_block()) - kin, blockmodel.block_size(delta.current_block()) - weight);  // , wr_r - dwr);
-    dS -= vt(blockmodel.degrees_out(delta.current_block()), blockmodel.degrees_in(delta.current_block()), blockmodel.block_size(delta.current_block()));  //        , mrm_r      , wr_r      );
-    assert(!std::isinf(dS));
-    assert(!std::isnan(dS));
-//    }
-
-//    if (nr != null_group)
-//    {
-//        auto mrp_nr = _mrp[nr];
-//        auto mrm_nr = _mrm[nr];
-//        auto wr_nr = _wr[nr];
-    dS += vt(blockmodel.degrees_out(delta.proposed_block()) + kout, blockmodel.degrees_in(delta.proposed_block()) + kin, blockmodel.block_size(delta.proposed_block()) + weight);  // , wr_nr + dwnr);
-    dS -= vt(blockmodel.degrees_out(delta.proposed_block()), blockmodel.degrees_in(delta.proposed_block()), blockmodel.block_size(delta.proposed_block()));  //        , mrm_nr      , wr_nr       );
-    assert(!std::isinf(dS));
-    assert(!std::isnan(dS));
-//    }
-
-//    std::cout << "delta sparse entropy: " << dS << std::endl;
-    return dS;
-}
-
-double get_delta_partition_dl(long num_vertices, const Blockmodel &blockmodel, const Delta &delta, long weight) {  // size_t v, size_t r, size_t nr, const entropy_args_t& ea) {
-    if (delta.current_block() == delta.proposed_block()) return 0.;
-
-    double dS = 0;
-
-//    auto& f = _bfield[v];
-//    if (!f.empty())
-//    {
-//        if (nr != null_group)
-//            dS -= (nr < f.size()) ? f[nr] : f.back();
-//        if (r != null_group)
-//            dS += (r < f.size()) ? f[r] : f.back();
-//    }
-
-//    if (r == nr)
-//        return 0;
-
-//    if (r != null_group)
-//        r = get_r(r);
-
-//    if (nr != null_group)
-//        nr = get_r(nr);
-
-//    int n = 1;  // vweight[v]; for block_merge, change this to size of the blockmodel
-//    if (n == 0) {
-//        if (r == null_group)
-//            n = 1;
-//        else
-//            return 0;
-//    }
-
-    double S_b = 0;
-    double S_a = 0;
-
-//    if (r != null_group)
-//    {
-//    std::cout << "size of current: " << blockmodel.block_size(delta.current_block())
-    S_b += -fastlgamma(blockmodel.block_size(delta.current_block()) + 1);  // _total[r] + 1);
-    S_a += -fastlgamma(blockmodel.block_size(delta.current_block()) - weight + 1);  // _total[r] - n + 1);
-
-//    std::cout << "point A: S_b = " << S_b << " S_a = " << S_a << std::endl;
-//    }
-
-//    if (nr != null_group)
-//    {
-    S_b += -fastlgamma(blockmodel.block_size(delta.proposed_block()) + 1);  // _total[nr] + 1);
-    S_a += -fastlgamma(blockmodel.block_size(delta.proposed_block()) + weight + 1);  // _total[nr] + n + 1);
-
-//    std::cout << "point B: S_b = " << S_b << " S_a = " << S_a << std::endl;
-
-//    }
-
-//    int dN = 0;
-//    if (r == null_group)
-//        dN += n;
-//    if (nr == null_group)
-//        dN -= n;
-//
-//    S_b += lgamma_fast(_N + 1);
-//    S_a += lgamma_fast(_N + dN + 1);
-
-    int dB = 0;
-    if (blockmodel.block_size(delta.current_block()) == weight)
-        dB--;
-    if (blockmodel.block_size(delta.proposed_block()) == 0)
-        dB++;
-//    if (r != null_group && _total[r] == n)
-//        dB--;
-//    if (nr != null_group && _total[nr] == 0)
-//        dB++;
-
-    if (dB != 0) {
-        S_b += fastlbinom(num_vertices - 1, blockmodel.num_nonempty_blocks() - 1);
-        S_a += fastlbinom(num_vertices - 1, blockmodel.num_nonempty_blocks() + dB - 1);
-    }
-
-//    std::cout << "point C: S_b = " << S_b << " S_a = " << S_a << std::endl;
-
-//    if ((dN != 0 || dB != 0)) {
-//        S_b += lbinom_fast(_N - 1, _actual_B - 1);
-//        S_a += lbinom_fast(_N - 1 + dN, _actual_B + dB - 1);
-//    }
-
-//    if (dN != 0)
-//    {
-//        S_b += safelog_fast(_N);
-//        S_a += safelog_fast(_N + dN);
-//    }
-
-    dS += S_a - S_b;
-    assert(!std::isinf(dS));
-    assert(!std::isnan(dS));
-
-//    if (ea.partition_dl)
-//    {
-//        auto& ps = get_partition_stats(v);
-//        dS += ps.get_delta_partition_dl(v, r, nr, _vweight);
-//    }
-
-//    if (_coupled_state != nullptr)
-//    {
-//        bool r_vacate = (r != null_group && _wr[r] == _vweight[v]);
-//        bool nr_occupy = (nr != null_group && _wr[nr] == 0);
-//
-//        auto& bh = _coupled_state->get_b();
-//        if (r_vacate && nr_occupy)
-//        {
-//            dS += _coupled_state->get_delta_partition_dl(r, bh[r], bh[nr],
-//                                                         _coupled_entropy_args);
-//        }
-//        else
-//        {
-//            if (r_vacate)
-//                dS += _coupled_state->get_delta_partition_dl(r, bh[r], null_group,
-//                                                             _coupled_entropy_args);
-//            if (nr_occupy)
-//                dS += _coupled_state->get_delta_partition_dl(nr, null_group, bh[nr],
-//                                                             _coupled_entropy_args);
-//        }
-//    }
-//    std::cout << "delta partition dl: " << dS << std::endl;
-    return dS;
-}
-
-//template <class DegOP>
-//double get_delta_deg_dl_dist_change(size_t r, DegOP&& dop, int diff) {
-double get_delta_deg_dl_dist_change(const Blockmodel &blockmodel, long block, long vkin, long vkout, long vweight,
-                                    int diff) {
-//    if (!args.degreecorrected) return 0.0;
-    // vweight may be unnecessary. At least for the DOp portion
-    auto total_r = blockmodel.block_size(block);  // _total[r];
-    auto ep_r = blockmodel.degrees_out(block);  // _ep[r];
-    auto em_r = blockmodel.degrees_in(block);  // _em[r];
-
-    auto get_Se = [&](int delta, int kin, int kout) {
-        double S = 0;
-        assert(total_r + delta >= 0);
-        assert(em_r + kin >= 0);
-        assert(ep_r + kout >= 0);
-        S += log_q(em_r + kin, total_r + delta);
-        S += log_q(ep_r + kout, total_r + delta);
-        return S;
-    };
-
-    auto get_Sr = [&](int delta) {
-        assert(total_r + delta + 1 >= 0);
-        if (args.undirected)
-            return fastlgamma(total_r + delta + 1);
-        else
-            return 2 * fastlgamma(total_r + delta + 1);
-    };
-
-    auto get_Sk = [&](std::pair<long, long>& deg, int delta) {
-        double S = 0;
-        int nd = 0;
-        if (!args.undirected) {
-//            if (_hist_in[block] != nullptr) {
-//                auto& h = *_hist_in[block];
-//            if (blockmodel.in_degree_histogram(block) != nullptr) {
-            const MapVector<long> &histogram = blockmodel.in_degree_histogram(block);
-            auto iter = histogram.find(std::get<0>(deg));
-            if (iter != histogram.end())
-                nd = iter->second;
-//            }
-            S -= fastlgamma(nd + delta + 1);
-        }
-
-        nd = 0;
-//        if (_hist_out[block] != nullptr) {
-//        auto& h = *_hist_out[block];
-        const MapVector<long> &histogram = blockmodel.out_degree_histogram(block);
-        auto iter = histogram.find(std::get<1>(deg));
-        if (iter != histogram.end())
-            nd = iter->second;
-//        }
-
-        return S - fastlgamma(nd + delta + 1);
-    };
-
-    double S_b = 0, S_a = 0;
-    int tkin = 0, tkout = 0, n = 0;
-//    dop([&](size_t kin, size_t kout, int nk)
-//        {
-    tkin += vkin;  //  * vweight;
-    tkout += vkout;  //  * vweight;
-    n += vweight;
-
-    std::pair<long, long> deg = std::make_pair(vkin, vkout);
-    S_b += get_Sk(deg,         0);
-    S_a += get_Sk(deg, diff * vweight);
-//        });
-
-    S_b += get_Se(       0,           0,            0);
-    S_a += get_Se(diff * n, diff * tkin, diff * tkout);
-
-    S_b += get_Sr(       0);
-    S_a += get_Sr(diff * n);
-
-    return S_a - S_b;
-}
-
-
-//template <class Graph, class VProp, class EProp, class Degs>
-double get_delta_deg_dl(long vertex, const Blockmodel &blockmodel, const Delta &delta, const Graph &graph) {  // size_t r, size_t nr, VProp& vweight,
-//                        EProp& eweight, Degs& degs, Graph& g, int kind) {
-//    if (r == nr || vweight[v] == 0)
-//        return 0;
-    if (!args.degreecorrected) return 0.00;
-    if (delta.current_block() == delta.proposed_block()) return 0.;  // for block_merge, it's this || size(block) == 0
-//    if (r != null_group)
-//        r = get_r(r);
-//    if (nr != null_group)
-//        nr = get_r(nr);
-
-//    auto dop = [&](auto&& f) {
-//        long kin = graph.in_neighbors(vertex).size();
-//        long kout = graph.out_neighbors(vertex).size();
-////        auto [kin, kout] = get_deg(v, eweight, degs, g);
-//        f(kin, kout, 1);  // for block merge, it's size(block) instead of 1 | vweight[v]);
-//    };
-
-    long vkin = graph.in_neighbors(vertex).size();
-    long vkout = graph.out_neighbors(vertex).size();
-    double dS = 0;
-//    if (r != null_group)
-//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  dop, -1);
-    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  vkin, vkout, 1, -1);
-//    if (nr != null_group)
-//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), dop, +1);
-    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), vkin, vkout, 1, +1);
-    assert(!std::isinf(dS));
-    assert(!std::isnan(dS));
-
-//    }
-//    std::cout << "delta degree dl: " << dS << std::endl;
-    return dS;
-}
-
-//template <class VProp, class Graph>
-double get_delta_edges_dl(const Blockmodel &blockmodel, const Delta &delta, long weight, long num_edges) {
-    if (delta.current_block() == delta.proposed_block())
-        return 0;
-
-//    if (r != null_group)
-//        r = get_r(r);
-//    if (nr != null_group)
-//        nr = get_r(nr);
-
-    double S_b = 0, S_a = 0;
-
-//    int n = weight;  // vweight[v];
-
-//    if (n == 0)
-//    {
-//        if (r == null_group)
-//            n = 1;
-//        else
-//            return 0;
-//    }
-
-    int dB = 0;
-//    if (r != null_group && _total[r] == n)
-    if (blockmodel.block_size(delta.current_block()) == weight)
-        dB--;
-//    if (nr != null_group && _total[nr] == 0)
-    if (blockmodel.block_size(delta.proposed_block()) == 0)
-        dB++;
-
-    if (dB != 0) {
-        S_b += get_edges_dl(blockmodel.num_nonempty_blocks(), num_edges);
-        S_a += get_edges_dl(blockmodel.num_nonempty_blocks() + dB, num_edges);
-    }
-
-    double dS = S_a - S_b;
-    assert(!std::isinf(dS));
-    assert(!std::isnan(dS));
-//    std::cout << "delta edges dl: " << dS << std::endl;
-    return dS;
-}
-
-double delta_mdl(const Blockmodel &blockmodel, const Graph &graph, long vertex, const Delta &delta,
-                 const utils::ProposalAndEdgeCounts &proposal) {
-//    std::cout << blockmodel.block_assignment(vertex) << " != " << delta.current_block() << std::endl;
-    assert(blockmodel.block_assignment(vertex) == delta.current_block());
-
-//    get_move_entries(v, r, nr, m_entries, [](auto) constexpr { return false; });
-
-    if (delta.current_block() == delta.proposed_block()) return 0;
-//    if (r == nr || _vweight[v] == 0)
-//        return 0;
-
-    double dS = 0;
-    dS = virtual_move_sparse(blockmodel, delta, 1, proposal);  // <true>(v, r, nr, m_entries);
-
-    double dS_dl = 0;
-
-    dS_dl += get_delta_partition_dl(graph.num_vertices(), blockmodel, delta, 1);  // v, r, nr, ea);
-    assert(!std::isinf(dS_dl));
-    assert(!std::isnan(dS_dl));
-
-//    if (ea.degree_dl || ea.edges_dl) {
-//    auto& ps = get_partition_stats(v);
-//    if (_deg_corr && ea.degree_dl)
-    dS_dl += get_delta_deg_dl(vertex, blockmodel, delta, graph);  // v, r, nr, _vweight, _eweight, _degs, _g, ea.degree_dl_kind);
-//    if (ea.edges_dl)
-//    {
-//    size_t actual_B = 0;
-//    for (auto& ps : _partition_stats)
-//        actual_B += ps.get_actual_B();
-    dS_dl += get_delta_edges_dl(blockmodel, delta, 1, graph.num_edges());  // v, r, nr, _vweight, actual_B, _g);
-
-    return dS + BETA_DL * dS_dl;
-}
-
-double get_delta_deg_dl(const Blockmodel &blockmodel, const Delta &delta) {
-    if (!args.degreecorrected) return 0.0;
-
-    if (delta.current_block() == delta.proposed_block()) return 0.;  // for block_merge, it's this || size(block) == 0
-
-    long vkin = blockmodel.degrees_in(delta.current_block());  // graph.in_neighbors(vertex).size();
-    long vkout = blockmodel.degrees_out(delta.current_block());  // graph.out_neighbors(vertex).size();
-    long weight = blockmodel.block_size(delta.current_block());
-    double dS = 0;
-//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  dop, -1);
-    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  vkin, vkout, weight, -1);
-//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), dop, +1);
-    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), vkin, vkout, weight, +1);
-
-//    }
-    return dS;
-}
-
-double block_merge_delta_mdl(const Blockmodel &blockmodel, const utils::ProposalAndEdgeCounts &proposal,
-                             const Graph &graph, const Delta &delta) {
-//    get_move_entries(v, r, nr, m_entries, [](auto) constexpr { return false; });
-
-    if (delta.current_block() == delta.proposed_block()) return 0;
-
-    if (blockmodel.block_size(delta.current_block()) == 0 || blockmodel.block_size(delta.proposed_block()) == 0)
-        return 0;
-//    if (r == nr || _vweight[v] == 0)
-//        return 0;
-
-    double dS = 0;
-
-    dS = virtual_move_sparse(blockmodel, delta, blockmodel.block_size(delta.current_block()), proposal);  // <true>(v, r, nr, m_entries);
-
-    double dS_dl = 0;
-
-    dS_dl += get_delta_partition_dl(graph.num_vertices(), blockmodel, delta,
-                                    blockmodel.block_size(delta.current_block()));  // v, r, nr, ea);
-
-    dS_dl += get_delta_deg_dl(blockmodel, delta);  // v, r, nr, _vweight, _eweight, _degs, _g, ea.degree_dl_kind);
-
-    dS_dl += get_delta_edges_dl(blockmodel, delta, blockmodel.block_size(delta.current_block()), graph.num_edges());  // v, r, nr, _vweight, actual_B, _g);
-
-    return dS + BETA_DL * dS_dl;
-}
-
-}  // namespace nonparametric
-
-}  // namespace entropy
+#include "entropy.hpp"
+#include "fastlgamma.hpp"
+#include "spence.hpp"
+
+#include "cmath"
+
+namespace entropy {
+
+double block_merge_delta_mdl(long current_block, long proposal, long num_edges, const Blockmodel &blockmodel,
+                             EdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
+    // Blockmodel indexing
+    std::vector<long> old_block_row = blockmodel.blockmatrix()->getrow(current_block); // M_r_t1
+    std::vector<long> old_proposal_row = blockmodel.blockmatrix()->getrow(proposal);   // M_s_t1
+    std::vector<long> old_block_col = blockmodel.blockmatrix()->getcol(current_block); // M_t2_r
+    std::vector<long> old_proposal_col = blockmodel.blockmatrix()->getcol(proposal);   // M_t2_s
+
+    // Exclude current_block, proposal to prevent double counting
+    std::vector<long> new_proposal_col = common::exclude_indices(updates.proposal_col, current_block, proposal);
+    old_block_col = common::exclude_indices(old_block_col, current_block, proposal);       // M_t2_r
+    old_proposal_col = common::exclude_indices(old_proposal_col, current_block, proposal); // M_t2_s
+    std::vector<long> new_block_degrees_out = common::exclude_indices(block_degrees.block_degrees_out, current_block,
+                                                                     proposal);
+    std::vector<long> old_block_degrees_out = common::exclude_indices(blockmodel.degrees_out(),
+                                                                     current_block, proposal);
+
+    // Remove 0 indices
+    std::vector<long> new_proposal_row_degrees_in = common::index_nonzero(block_degrees.block_degrees_in,
+                                                                         updates.proposal_row);
+    std::vector<long> new_proposal_row = common::nonzeros(updates.proposal_row);
+    std::vector<long> new_proposal_col_degrees_out = common::index_nonzero(new_block_degrees_out, new_proposal_col);
+    new_proposal_col = common::nonzeros(new_proposal_col);
+
+    std::vector<long> old_block_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(),
+                                                                      old_block_row);
+    std::vector<long> old_proposal_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(),
+                                                                         old_proposal_row);
+    old_block_row = common::nonzeros(old_block_row);
+    old_proposal_row = common::nonzeros(old_proposal_row);
+    std::vector<long> old_block_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_block_col);
+    std::vector<long> old_proposal_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_proposal_col);
+    old_block_col = common::nonzeros(old_block_col);
+    old_proposal_col = common::nonzeros(old_proposal_col);
+
+    double delta_entropy = 0.0;
+    delta_entropy -= common::delta_entropy_temp(new_proposal_row, new_proposal_row_degrees_in,
+                                                block_degrees.block_degrees_out[proposal], num_edges);
+    delta_entropy -= common::delta_entropy_temp(new_proposal_col, new_proposal_col_degrees_out,
+                                                block_degrees.block_degrees_in[proposal], num_edges);
+    delta_entropy += common::delta_entropy_temp(old_block_row, old_block_row_degrees_in,
+                                                blockmodel.degrees_out(current_block), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_proposal_row, old_proposal_row_degrees_in,
+                                                blockmodel.degrees_out(proposal), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_block_col, old_block_col_degrees_out,
+                                                blockmodel.degrees_in(current_block), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_proposal_col, old_proposal_col_degrees_out,
+                                                blockmodel.degrees_in(proposal), num_edges);
+    return delta_entropy;
+}
+
+double block_merge_delta_mdl(long current_block, long proposal, long num_edges, const Blockmodel &blockmodel,
+                             SparseEdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
+    // Blockmodel indexing
+    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
+    const MapVector<long> &old_block_row = matrix->getrow_sparse(current_block); // M_r_t1
+    const MapVector<long> &old_proposal_row = matrix->getrow_sparse(proposal);   // M_s_t1
+    const MapVector<long> &old_block_col = matrix->getcol_sparse(current_block); // M_t2_r
+    const MapVector<long> &old_proposal_col = matrix->getcol_sparse(proposal);   // M_t2_s
+
+    double delta_entropy = 0.0;
+    delta_entropy -= common::delta_entropy_temp(updates.proposal_row, block_degrees.block_degrees_in,
+                                                block_degrees.block_degrees_out[proposal], num_edges);
+    delta_entropy -= common::delta_entropy_temp(updates.proposal_col, block_degrees.block_degrees_out,
+                                                block_degrees.block_degrees_in[proposal], current_block, proposal,
+                                                num_edges);
+    delta_entropy += common::delta_entropy_temp(old_block_row, blockmodel.degrees_in(),
+                                                blockmodel.degrees_out(current_block), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_proposal_row, blockmodel.degrees_in(),
+                                                blockmodel.degrees_out(proposal), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_block_col, blockmodel.degrees_out(),
+                                                blockmodel.degrees_in(current_block), current_block,
+                                                proposal, num_edges);
+    delta_entropy += common::delta_entropy_temp(old_proposal_col, blockmodel.degrees_out(),
+                                                blockmodel.degrees_in(proposal), current_block, proposal,
+                                                num_edges);
+    return delta_entropy;
+}
+
+double block_merge_delta_mdl(long current_block, const Blockmodel &blockmodel, const Delta &delta,
+                             common::NewBlockDegrees &block_degrees) {
+    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
+    double delta_entropy = 0.0;
+    long proposed_block = delta.proposed_block();
+    for (const std::tuple<long, long, long> &entry: delta.entries()) {
+        long row = std::get<0>(entry);
+        long col = std::get<1>(entry);
+        long change = std::get<2>(entry);
+        // delta += + E(old) - E(new)
+        delta_entropy += common::cell_entropy(matrix->get(row, col), blockmodel.degrees_in(col),
+                                              blockmodel.degrees_out(row));
+        if (row == current_block || col == current_block) continue;  // the "new" cell entropy == 0;
+        delta_entropy -= common::cell_entropy(matrix->get(row, col) + change, block_degrees.block_degrees_in[col],
+                                              block_degrees.block_degrees_out[row]);
+    }
+    for (const LongEntry &entry: blockmodel.blockmatrix()->getrow_sparse(proposed_block)) {
+        long row = proposed_block;
+        long col = entry.first;
+        long value = entry.second;
+        if (delta.get(row, col) != 0) continue;
+        // Value has not changed
+        delta_entropy += common::cell_entropy((double) value, (double) blockmodel.degrees_in(col),
+                                              (double) blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy((double) value, (double) block_degrees.block_degrees_in[col],
+                                              (double) block_degrees.block_degrees_out[row]);
+    }
+    for (const LongEntry &entry: blockmodel.blockmatrix()->getcol_sparse(proposed_block)) {
+        long row = entry.first;
+        long col = proposed_block;
+        long value = entry.second;
+        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
+        // Value has not changed and we're not double counting
+        delta_entropy += common::cell_entropy((double) value, (double) blockmodel.degrees_in(col),
+                                              (double) blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy((double) value, (double) block_degrees.block_degrees_in[col],
+                                              (double) block_degrees.block_degrees_out[row]);
+    }
+    return delta_entropy;
+}
+
+double block_merge_delta_mdl(long current_block, utils::ProposalAndEdgeCounts proposal, const Blockmodel &blockmodel,
+                             const Delta &delta) {
+    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
+    double delta_entropy = 0.0;
+    long proposed_block = delta.proposed_block();
+    auto get_deg_in = [&blockmodel, &proposal, current_block, proposed_block](long index) -> double {
+        long value = blockmodel.degrees_in(index);
+        if (index == current_block)
+            value -= proposal.num_in_neighbor_edges;
+        else if (index == proposed_block)
+            value += proposal.num_in_neighbor_edges;
+        return double(value);
+    };
+    auto get_deg_out = [&blockmodel, &proposal, current_block, proposed_block](long index) -> double {
+        long value = blockmodel.degrees_out(index);
+        if (index == current_block)
+            value -= proposal.num_out_neighbor_edges;
+        else if (index == proposed_block)
+            value += proposal.num_out_neighbor_edges;
+        return double(value);
+    };
+    for (const std::tuple<long, long, long> &entry: delta.entries()) {
+        long row = std::get<0>(entry);
+        long col = std::get<1>(entry);
+        auto change = (double) std::get<2>(entry);
+        // delta += + E(old) - E(new)
+        auto value = (double) matrix->get(row, col);
+        delta_entropy += common::cell_entropy(value, (double) blockmodel.degrees_in(col),
+                                              (double) blockmodel.degrees_out(row));
+        if (row == current_block || col == current_block) continue;  // the "new" cell entropy == 0;
+        delta_entropy -= common::cell_entropy(value + change, get_deg_in(col), get_deg_out(row));
+    }
+    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getrow_sparse(proposed_block)) {
+        long row = proposed_block;
+        long col = entry.first;
+        auto value = (double) entry.second;
+        if (delta.get(row, col) != 0) continue;
+        // Value has not changed
+        delta_entropy += common::cell_entropy((double) value, (double) blockmodel.degrees_in(col),
+                                              (double) blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
+    }
+    for (const std::pair<long, long> &entry: blockmodel.blockmatrix()->getcol_sparse(proposed_block)) {
+        long row = entry.first;
+        long col = proposed_block;
+        auto value = (double) entry.second;
+        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
+        // Value has not changed and we're not double counting
+        delta_entropy += common::cell_entropy(value, (double) blockmodel.degrees_in(col),
+                                              (double) blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
+    }
+    return delta_entropy;
+}
+
+double delta_mdl(long current_block, long proposal, const Blockmodel &blockmodel, long num_edges,
+                 EdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
+    // Blockmodel indexing
+    std::vector<long> old_block_row = blockmodel.blockmatrix()->getrow(current_block); // M_r_t1
+    std::vector<long> old_proposal_row = blockmodel.blockmatrix()->getrow(proposal);   // M_s_t1
+    std::vector<long> old_block_col = blockmodel.blockmatrix()->getcol(current_block); // M_t2_r
+    std::vector<long> old_proposal_col = blockmodel.blockmatrix()->getcol(proposal);   // M_t2_s
+
+    // Exclude current_block, proposal to prevent double counting
+    std::vector<long> new_block_col = common::exclude_indices(updates.block_col, current_block, proposal); // added
+    std::vector<long> new_proposal_col = common::exclude_indices(updates.proposal_col, current_block, proposal);
+    old_block_col = common::exclude_indices(old_block_col, current_block, proposal);       // M_t2_r
+    old_proposal_col = common::exclude_indices(old_proposal_col, current_block, proposal); // M_t2_s
+    std::vector<long> new_block_degrees_out = common::exclude_indices(block_degrees.block_degrees_out, current_block,
+                                                                     proposal);
+    std::vector<long> old_block_degrees_out = common::exclude_indices(blockmodel.degrees_out(), current_block, proposal);
+
+    // Remove 0 indices
+    std::vector<long> new_block_row_degrees_in = common::index_nonzero(block_degrees.block_degrees_in,
+                                                                      updates.block_row); // added
+    std::vector<long> new_proposal_row_degrees_in = common::index_nonzero(block_degrees.block_degrees_in,
+                                                                         updates.proposal_row);
+    std::vector<long> new_block_row = common::nonzeros(updates.block_row); // added
+    std::vector<long> new_proposal_row = common::nonzeros(updates.proposal_row);
+    std::vector<long> new_block_col_degrees_out = common::index_nonzero(new_block_degrees_out, new_block_col); // added
+    std::vector<long> new_proposal_col_degrees_out = common::index_nonzero(new_block_degrees_out, new_proposal_col);
+    new_block_col = common::nonzeros(new_block_col); // added
+    new_proposal_col = common::nonzeros(new_proposal_col);
+
+    std::vector<long> old_block_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(), old_block_row);
+    std::vector<long> old_proposal_row_degrees_in = common::index_nonzero(blockmodel.degrees_in(), old_proposal_row);
+    old_block_row = common::nonzeros(old_block_row);
+    old_proposal_row = common::nonzeros(old_proposal_row);
+    std::vector<long> old_block_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_block_col);
+    std::vector<long> old_proposal_col_degrees_out = common::index_nonzero(old_block_degrees_out, old_proposal_col);
+    old_block_col = common::nonzeros(old_block_col);
+    old_proposal_col = common::nonzeros(old_proposal_col);
+
+    double delta_entropy = 0.0;
+    delta_entropy -= common::delta_entropy_temp(new_block_row, new_block_row_degrees_in,
+                                                block_degrees.block_degrees_out[current_block], num_edges); // added
+    delta_entropy -= common::delta_entropy_temp(new_proposal_row, new_proposal_row_degrees_in,
+                                                block_degrees.block_degrees_out[proposal], num_edges);
+    delta_entropy -= common::delta_entropy_temp(new_block_col, new_block_col_degrees_out,
+                                                block_degrees.block_degrees_in[current_block], num_edges); // added
+    delta_entropy -= common::delta_entropy_temp(new_proposal_col, new_proposal_col_degrees_out,
+                                                block_degrees.block_degrees_in[proposal], num_edges);
+    delta_entropy += common::delta_entropy_temp(old_block_row, old_block_row_degrees_in,
+                                                blockmodel.degrees_out(current_block), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_proposal_row, old_proposal_row_degrees_in,
+                                                blockmodel.degrees_out(proposal), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_block_col, old_block_col_degrees_out,
+                                                blockmodel.degrees_in(current_block), num_edges);
+    delta_entropy += common::delta_entropy_temp(old_proposal_col, old_proposal_col_degrees_out,
+                                                blockmodel.degrees_in(proposal), num_edges);
+    if (std::isnan(delta_entropy)) {
+        std::cout << "Error: Dense delta entropy is NaN" << std::endl;
+        exit(-142321);
+    }
+    return delta_entropy;
+}
+
+double delta_mdl(long current_block, long proposal, const Blockmodel &blockmodel, long num_edges,
+                 SparseEdgeCountUpdates &updates, common::NewBlockDegrees &block_degrees) {
+    // Blockmodel indexing
+    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
+    const MapVector<long> &old_block_row = matrix->getrow_sparseref(current_block); // M_r_t1
+    const MapVector<long> &old_proposal_row = matrix->getrow_sparseref(proposal);   // M_s_t1
+    const MapVector<long> &old_block_col = matrix->getcol_sparseref(current_block); // M_t2_r
+    const MapVector<long> &old_proposal_col = matrix->getcol_sparseref(proposal);   // M_t2_s
+
+    double delta_entropy = 0.0;
+    delta_entropy -= common::delta_entropy_temp(updates.block_row, block_degrees.block_degrees_in,
+                                                block_degrees.block_degrees_out[current_block], num_edges);
+    assert(!std::isnan(delta_entropy));
+    delta_entropy -= common::delta_entropy_temp(updates.proposal_row, block_degrees.block_degrees_in,
+                                                block_degrees.block_degrees_out[proposal], num_edges);
+    assert(!std::isnan(delta_entropy));
+    delta_entropy -= common::delta_entropy_temp(updates.block_col, block_degrees.block_degrees_out,
+                                                block_degrees.block_degrees_in[current_block], current_block, proposal,
+                                                num_edges);
+    if (std::isnan(delta_entropy)) {
+        std::cout << "block_col: ";
+        utils::print<long>(updates.block_col);
+        std::cout << "_block_degrees_out: ";
+        utils::print<long>(block_degrees.block_degrees_out);
+        std::cout << "block_degree in: " << block_degrees.block_degrees_in[current_block] << std::endl;
+    }
+    assert(!std::isnan(delta_entropy));
+    delta_entropy -= common::delta_entropy_temp(updates.proposal_col, block_degrees.block_degrees_out,
+                                                block_degrees.block_degrees_in[proposal], current_block, proposal,
+                                                num_edges);
+    assert(!std::isnan(delta_entropy));
+    delta_entropy += common::delta_entropy_temp(old_block_row, blockmodel.degrees_in(),
+                                                blockmodel.degrees_out(current_block), num_edges);
+    assert(!std::isnan(delta_entropy));
+    delta_entropy += common::delta_entropy_temp(old_proposal_row, blockmodel.degrees_in(),
+                                                blockmodel.degrees_out(proposal), num_edges);
+    assert(!std::isnan(delta_entropy));
+    delta_entropy += common::delta_entropy_temp(old_block_col, blockmodel.degrees_out(),
+                                                blockmodel.degrees_in(current_block), current_block,
+                                                proposal, num_edges);
+    assert(!std::isnan(delta_entropy));
+    delta_entropy += common::delta_entropy_temp(old_proposal_col, blockmodel.degrees_out(),
+                                                blockmodel.degrees_in(proposal), current_block, proposal,
+                                                num_edges);
+    assert(!std::isnan(delta_entropy));
+    if (std::isnan(delta_entropy)) {
+        std::cerr << "ERROR " << "Error: Sparse delta entropy is NaN" << std::endl;
+        exit(-142321);
+    }
+    return delta_entropy;
+}
+
+double delta_mdl(const Blockmodel &blockmodel, const Delta &delta, const utils::ProposalAndEdgeCounts &proposal) {
+    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
+    double delta_entropy = 0.0;
+    long current_block = delta.current_block();
+    long proposed_block = delta.proposed_block();
+    auto get_deg_in = [&blockmodel, &proposal, &delta, current_block, proposed_block](long index) -> size_t {
+        long value = blockmodel.degrees_in(index);
+        if (index == current_block)
+            value -= (proposal.num_in_neighbor_edges + delta.self_edge_weight());
+        else if (index == proposed_block)
+            value += (proposal.num_in_neighbor_edges + delta.self_edge_weight());
+        return value;
+    };
+    auto get_deg_out = [&blockmodel, &proposal, current_block, proposed_block](long index) -> size_t {
+        long value = blockmodel.degrees_out(index);
+        if (index == current_block)
+            value -= proposal.num_out_neighbor_edges;
+        else if (index == proposed_block)
+            value += proposal.num_out_neighbor_edges;
+        return value;
+    };
+    for (const std::tuple<long, long, long> &entry: delta.entries()) {
+        long row = std::get<0>(entry);
+        long col = std::get<1>(entry);
+        long change = std::get<2>(entry);
+        delta_entropy += common::cell_entropy(matrix->get(row, col), blockmodel.degrees_in(col),
+                                              blockmodel.degrees_out(row));
+        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
+            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << matrix->get(row, col) << " delta: " << change << std::endl;
+            utils::print<long>(blockmodel.blockmatrix()->getrow_sparse(row));
+            utils::print<long>(blockmodel.blockmatrix()->getcol_sparse(col));
+            std::cout << "d_out[row]: " << blockmodel.degrees_out(row) << " d_in[col]: " << blockmodel.degrees_in(row) << std::endl;
+            throw std::invalid_argument("nan/inf in bm delta for old bm when delta != 0");
+        }
+        delta_entropy -= common::cell_entropy(matrix->get(row, col) + change, get_deg_in(col),
+                                              get_deg_out(row));
+        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
+            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << matrix->get(row, col) << " delta: " << change;
+            std::cout << " current: " << current_block << " proposed: " << proposed_block << std::endl;
+            utils::print<long>(blockmodel.blockmatrix()->getrow_sparse(row));
+            utils::print<long>(blockmodel.blockmatrix()->getcol_sparse(col));
+            std::cout << "d_out[row]: " << blockmodel.degrees_out(row) << " d_in[col]: " << blockmodel.degrees_in(col) << std::endl;
+            std::cout << "new d_out[row]: " << get_deg_out(row) << " d_in[col]: " << get_deg_in(col) << std::endl;
+            std::cout << "v_out: " << proposal.num_out_neighbor_edges << " v_in: " << proposal.num_in_neighbor_edges << " v_total: " << proposal.num_neighbor_edges << std::endl;
+            throw std::invalid_argument("nan/inf in bm delta for new bm when delta != 0");
+        }
+    }
+    // Compute change in entropy for cells with no delta
+    for (const auto &entry: blockmodel.blockmatrix()->getrow_sparseref(current_block)) {
+        long row = current_block;
+        long col = entry.first;
+        long value = entry.second;
+        if (delta.get(row, col) != 0) continue;
+        // Value has not changed
+        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
+                                              blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
+        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
+            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
+            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and row = current block");
+        }
+    }
+    for (const auto &entry: blockmodel.blockmatrix()->getrow_sparseref(proposed_block)) {
+        long row = proposed_block;
+        long col = entry.first;
+        long value = entry.second;
+        if (delta.get(row, col) != 0) continue;
+        // Value has not changed
+        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
+                                              blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
+        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
+            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
+            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and row = proposed block");
+        }
+    }
+    for (const auto &entry: blockmodel.blockmatrix()->getcol_sparseref(current_block)) {
+        long row = entry.first;
+        long col = current_block;
+        long value = entry.second;
+        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
+        // Value has not changed and we're not double counting
+        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
+                                              blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
+        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
+            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
+            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and col = current block");
+        }
+    }
+    for (const auto &entry: blockmodel.blockmatrix()->getcol_sparseref(proposed_block)) {
+        long row = entry.first;
+        long col = proposed_block;
+        long value = entry.second;
+        if (delta.get(row, col) != 0 || row == current_block || row == proposed_block) continue;
+        // Value has not changed and we're not double counting
+        delta_entropy += common::cell_entropy(value, blockmodel.degrees_in(col),
+                                              blockmodel.degrees_out(row));
+        delta_entropy -= common::cell_entropy(value, get_deg_in(col), get_deg_out(row));
+        if (std::isnan(delta_entropy) || std::isinf(delta_entropy)) {
+            std::cout << delta_entropy << " for row: " << row << " col: " << col << " val: " << value << " delta: 0" << std::endl;
+            throw std::invalid_argument("nan/inf in bm delta when delta = 0 and col = proposed block");
+        }
+    }
+    return delta_entropy;
+}
+
+double hastings_correction(const Blockmodel &blockmodel, EdgeWeights &out_blocks, EdgeWeights &in_blocks,
+                           utils::ProposalAndEdgeCounts &proposal, EdgeCountUpdates &updates,
+                           common::NewBlockDegrees &new_block_degrees) {
+    if (proposal.num_neighbor_edges == 0 || !args.hastings_correction) {
+        return 1.0;
+    }
+    // Compute block weights
+    std::map<long, long> block_counts;
+    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
+        long block = out_blocks.indices[i];
+        long weight = out_blocks.values[i];
+        block_counts[block] += weight; // block_count[new block] should initialize to 0
+    }
+    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
+        long block = in_blocks.indices[i];
+        long weight = in_blocks.values[i];
+        block_counts[block] += weight; // block_count[new block] should initialize to 0
+    }
+    // Create Arrays using unique blocks
+    size_t num_unique_blocks = block_counts.size();
+    std::vector<double> counts(num_unique_blocks, 0);
+    std::vector<double> proposal_weights(num_unique_blocks, 0);
+    std::vector<double> block_weights(num_unique_blocks, 0);
+    std::vector<double> block_degrees(num_unique_blocks, 0);
+    std::vector<double> proposal_degrees(num_unique_blocks, 0);
+    // Indexing
+    std::vector<long> proposal_row = blockmodel.blockmatrix()->getrow(proposal.proposal);
+    std::vector<long> proposal_col = blockmodel.blockmatrix()->getcol(proposal.proposal);
+    // Fill Arrays
+    long index = 0;
+    long num_blocks = blockmodel.num_blocks();
+    const std::vector<long> &current_block_degrees = blockmodel.degrees();
+    for (auto const &entry: block_counts) {
+        counts[index] = entry.second;
+        proposal_weights[index] = proposal_row[entry.first] + proposal_col[entry.first] + 1.0;
+        block_degrees[index] = current_block_degrees[entry.first] + num_blocks;
+        block_weights[index] = updates.block_row[entry.first] + updates.block_col[entry.first] + 1.0;
+        proposal_degrees[index] = new_block_degrees.block_degrees[entry.first] + num_blocks;
+        index++;
+    }
+    // Compute p_forward and p_backward
+    auto p_forward = utils::sum<double>(counts * proposal_weights / block_degrees);
+    auto p_backward = utils::sum<double>(counts * block_weights / proposal_degrees);
+    return p_backward / p_forward;
+}
+
+double hastings_correction(const Blockmodel &blockmodel, EdgeWeights &out_blocks, EdgeWeights &in_blocks,
+                           utils::ProposalAndEdgeCounts &proposal, SparseEdgeCountUpdates &updates,
+                           common::NewBlockDegrees &new_block_degrees) {
+    if (proposal.num_neighbor_edges == 0 || !args.hastings_correction) {
+        return 1.0;
+    }
+    // Compute block weights
+    std::map<long, long> block_counts;
+    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
+        long block = out_blocks.indices[i];
+        long weight = out_blocks.values[i];
+        block_counts[block] += weight; // block_count[new block] should initialize to 0
+    }
+    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
+        long block = in_blocks.indices[i];
+        long weight = in_blocks.values[i];
+        block_counts[block] += weight; // block_count[new block] should initialize to 0
+    }
+    // Create Arrays using unique blocks
+    size_t num_unique_blocks = block_counts.size();
+    std::vector<double> counts(num_unique_blocks, 0);
+    std::vector<double> proposal_weights(num_unique_blocks, 0);
+    std::vector<double> block_weights(num_unique_blocks, 0);
+    std::vector<double> block_degrees(num_unique_blocks, 0);
+    std::vector<double> proposal_degrees(num_unique_blocks, 0);
+    // Indexing
+    std::vector<long> proposal_row = blockmodel.blockmatrix()->getrow(proposal.proposal);
+    std::vector<long> proposal_col = blockmodel.blockmatrix()->getcol(proposal.proposal);
+    // Fill Arrays
+    long index = 0;
+    long num_blocks = blockmodel.num_blocks();
+    const std::vector<long> &current_block_degrees = blockmodel.degrees();
+    for (auto const &entry: block_counts) {
+        counts[index] = entry.second;
+        proposal_weights[index] = proposal_row[entry.first] + proposal_col[entry.first] + 1.0;
+        block_degrees[index] = current_block_degrees[entry.first] + num_blocks;
+        block_weights[index] = updates.block_row[entry.first] + updates.block_col[entry.first] + 1.0;
+        proposal_degrees[index] = new_block_degrees.block_degrees[entry.first] + num_blocks;
+        index++;
+    }
+    // Compute p_forward and p_backward
+    auto p_forward = utils::sum<double>(counts * proposal_weights / block_degrees);
+    auto p_backward = utils::sum<double>(counts * block_weights / proposal_degrees);
+    return p_backward / p_forward;
+}
+
+double hastings_correction(long vertex, const Graph &graph, const Blockmodel &blockmodel, const Delta &delta,
+                           long current_block, const utils::ProposalAndEdgeCounts &proposal) {
+    if (proposal.num_neighbor_edges == 0 || !args.hastings_correction || !args.parametric) {  // No correction if disabled or in nonparametric mode
+        return 1.0;
+    }
+    // Compute block weights
+    MapVector<long> block_counts;
+    for (const long neighbor: graph.out_neighbors(vertex)) {
+        long neighbor_block = blockmodel.block_assignment(neighbor);
+        block_counts[neighbor_block] += 1;
+    }
+    for (const long neighbor: graph.in_neighbors(vertex)) {
+        if (neighbor == vertex) continue;
+        long neighbor_block = blockmodel.block_assignment(neighbor);
+        block_counts[neighbor_block] += 1;
+    }
+    // Create Arrays using unique blocks
+    size_t num_unique_blocks = block_counts.size();
+    std::vector<double> counts(num_unique_blocks, 0);
+    std::vector<double> proposal_weights(num_unique_blocks, 0);
+    std::vector<double> block_weights(num_unique_blocks, 0);
+    std::vector<double> block_degrees(num_unique_blocks, 0);
+    std::vector<double> proposal_degrees(num_unique_blocks, 0);
+    // Indexing
+//    std::vector<long> proposal_row = blockmodel.blockmatrix()->getrow(proposal.proposal);
+//    std::vector<long> proposal_col = blockmodel.blockmatrix()->getcol(proposal.proposal);
+    const MapVector<long> &proposal_row = blockmodel.blockmatrix()->getrow_sparseref(proposal.proposal);
+    const MapVector<long> &proposal_col = blockmodel.blockmatrix()->getcol_sparseref(proposal.proposal);
+    // Fill Arrays
+    long index = 0;
+    long num_blocks = blockmodel.num_blocks();
+    const std::vector<long> &current_block_degrees = blockmodel.degrees();
+    for (auto const &entry: block_counts) {
+        counts[index] = entry.second;
+        proposal_weights[index] = map_vector::get(proposal_row, entry.first) + map_vector::get(proposal_col, entry.first) + 1.0;
+        block_degrees[index] = current_block_degrees[entry.first] + num_blocks;
+        block_weights[index] = blockmodel.blockmatrix()->get(current_block, entry.first) +
+                               delta.get(current_block, entry.first) +
+                               //                get(delta, std::make_pair(current_block, entry.first)) +
+                               blockmodel.blockmatrix()->get(entry.first, current_block) +
+                               delta.get(entry.first, current_block) + 1.0;
+//                get(delta, std::make_pair(entry.first, current_block)) + 1.0;
+        long new_block_degree = blockmodel.degrees(entry.first);
+        if (entry.first == current_block) {
+            long current_block_self_edges = blockmodel.blockmatrix()->get(current_block, current_block)
+                                           + delta.get(current_block, current_block);
+            long degree_out = blockmodel.degrees_out(current_block) - proposal.num_out_neighbor_edges;
+            long degree_in = blockmodel.degrees_in(current_block) - proposal.num_in_neighbor_edges;
+            new_block_degree = degree_out + degree_in - current_block_self_edges;
+        } else if (entry.first == proposal.proposal) {
+            long proposed_block_self_edges = blockmodel.blockmatrix()->get(proposal.proposal, proposal.proposal)
+                                            + delta.get(proposal.proposal, proposal.proposal);
+            long degree_out = blockmodel.degrees_out(proposal.proposal) + proposal.num_out_neighbor_edges;
+            long degree_in = blockmodel.degrees_in(proposal.proposal) + proposal.num_in_neighbor_edges;
+            new_block_degree = degree_out + degree_in - proposed_block_self_edges;
+        }
+//        proposal_degrees[index] = new_block_degrees.block_degrees[entry.first] + _num_blocks;
+        proposal_degrees[index] = new_block_degree + num_blocks;
+        index++;
+    }
+    // Compute p_forward and p_backward
+    auto p_forward = utils::sum<double>(counts * proposal_weights / block_degrees);
+    auto p_backward = utils::sum<double>(counts * block_weights / proposal_degrees);
+    return p_backward / p_forward;
+}
+
+double normalize_mdl_v1(double mdl, const Graph &graph) {
+    return mdl / null_mdl_v1(graph);
+}
+
+//double normalize_mdl_v2(double mdl, long num_vertices, long num_edges) {
+//    return mdl / null_mdl_v2(num_vertices, num_edges);
+//}
+
+double null_mdl_v1(const Graph &graph) {
+    if (!args.parametric) {
+//        std::cout << "why is this running nonparametric?" << std::endl;
+        std::vector<long> assignment = utils::constant<long>(graph.num_vertices(), 0);
+        Blockmodel null_model(1, graph, 0.5, assignment);
+        return mdl(null_model, graph);
+    }
+    std::cout << "running correct version at least..." << std::endl;
+    std::cout << graph.num_edges() << std::endl;
+    double log_posterior_p = graph.num_edges() * log(1.0 / graph.num_edges());
+    double x = 1.0 / graph.num_edges();
+    double h = ((1 + x) * log(1 + x)) - (x * log(x));
+    std::cout << "log posterior = " << log_posterior_p << " blockmodel = " << (graph.num_edges() * h) << std::endl;
+    return (graph.num_edges() * h) - log_posterior_p;
+}
+
+/** double null_mdl_v2(long num_vertices, long num_edges) {
+    // TODO: not sure how this works in nonparametric version
+    double log_posterior_p = num_edges * log(1.0 / num_edges);
+    // done calculating log_posterior_probability
+    double x = pow(num_vertices, 2) / num_edges;
+    double h = ((1 + x) * log(1 + x)) - (x * log(x));
+//    std::cout << "log posterior = " << log_posterior_p << " blockmodel = " << (num_edges * h) + (num_vertices * log(num_vertices)) << std::endl;
+    return (num_edges * h) + (num_vertices * log(num_vertices)) - log_posterior_p;
+} */
+
+double mdl(const Blockmodel &blockmodel, const Graph &graph) {
+    if (!args.parametric)
+        return nonparametric::mdl(blockmodel, graph);
+    double log_posterior_p = blockmodel.log_posterior_probability();
+    double x = pow(blockmodel.num_blocks(), 2) / graph.num_edges();
+    double h = ((1 + x) * log(1 + x)) - (x * log(x));
+    return (graph.num_edges() * h) + (graph.num_vertices() * log(blockmodel.num_blocks())) - log_posterior_p;
+}
+
+namespace dist {
+
+double mdl(const TwoHopBlockmodel &blockmodel, long num_vertices, long num_edges) {
+    double log_posterior_p = blockmodel.log_posterior_probability();
+    double x = pow(blockmodel.num_blocks(), 2) / num_edges;
+    double h = ((1 + x) * log(1 + x)) - (x * log(x));
+    return (num_edges * h) + (num_vertices * log(blockmodel.num_blocks())) - log_posterior_p;
+}
+
+}  // namespace dist
+
+namespace nonparametric {
+
+double get_deg_entropy(const Graph &graph, long vertex) {  // , const simple_degs_t&) {
+    long k_in = (long) graph.in_neighbors(vertex).size();
+    long k_out = (long) graph.out_neighbors(vertex).size();
+    return -fastlgamma(k_in + 1) - fastlgamma(k_out + 1);
+}
+
+double sparse_entropy(const Blockmodel &blockmodel, const Graph &graph) {
+    double S = 0;
+
+    for (long source = 0; source < blockmodel.num_blocks(); ++source) {
+        const MapVector<long> &row = blockmodel.blockmatrix()->getrow_sparseref(source);
+        for (const std::pair<long, long> &entry : row) {
+            long destination = entry.first;
+            long weight = entry.second;
+            S += eterm_exact(source, destination, weight);
+            assert(!std::isinf(S));
+            assert(!std::isnan(S));        }
+    }
+
+    for (long block = 0; block < blockmodel.num_blocks(); ++block) {
+        S += vterm_exact(blockmodel.degrees_out(block), blockmodel.degrees_in(block), blockmodel.block_size(block));
+        assert(!std::isinf(S));
+        assert(!std::isnan(S));    }
+
+//    std::cout << "S before deg_ent?ropy: " << S << std::endl;
+    if (!args.degreecorrected) return S;
+//    double S2 = 0.0;
+    // In distributed case, we would only compute these for vertices we're responsible for. Since it's a simple addition, we can do an allreduce.
+    for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
+//        S += get_deg_entropy(graph, vertex);
+        double temp = get_deg_entropy(graph, vertex);
+        S += temp;
+        assert(!std::isinf(S));
+        assert(!std::isnan(S));
+//        S2 += temp;
+    }
+
+//    std::cout << "deg_entropy: " << S2 << std::endl;
+
+    return S;
+}
+
+double get_partition_dl(long N, const Blockmodel &blockmodel) { // _N = number of vertices, _actual_B = nonzero blocks, _total = vector of block sizes
+    double S = 0;
+    S += fastlbinom(N - 1, blockmodel.num_nonempty_blocks() - 1);
+    S += fastlgamma(N + 1);
+    for (const long &block_size : blockmodel.block_sizes())
+        S -= fastlgamma(block_size + 1);
+    S += fastlog(N);
+//    std::cout << "Partition dl: " << S << std::endl;
+    assert(!std::isinf(S));
+    assert(!std::isnan(S));
+    return S;
+}
+
+/// No idea what this function does. See int_part.cc in https://git.skewed.de/count0/graph-tool
+double get_v(double u, double epsilon) {
+    double v = u;
+    double delta = 1;
+    while (delta > epsilon) {
+        double n_v = u * sqrt(spence(exp(-v)));
+        delta = abs(n_v - v);
+        v = n_v;
+    }
+    return v;
+}
+
+double log_q_approx_small(size_t n, size_t k) {
+    return fastlbinom(n - 1, k - 1) - fastlgamma(k + 1);
+}
+
+/// Computes the number of restricted of integer n into at most m parts. This is part of teh prior for the
+/// degree-corrected SBM.
+/// TO-DO: the current function contains only the approximation of log_q. If it becomes a bottleneck, you'll want to
+/// compute a cache of log_q(n, m) for ~20k n and maybe a few hundred m? I feel like for larger graphs, the cache
+/// will be a waste of time.
+/// See int_part.cc in https://git.skewed.de/count0/graph-tool
+double log_q(size_t n, size_t k) {
+    if (n <= 0 || k < 1) return 0;
+    if (k > n) k = n;
+    if (k < pow(n, 1/4.))
+        return log_q_approx_small(n, k);
+    double u = k / sqrt(n);
+    double v = get_v(u);
+    double lf = log(v) - log1p(- exp(-v) * (1 + u * u/2)) / 2 - log(2) * 3 / 2.
+                - log(u) - log(M_PI);
+    double g = 2 * v / u - u * log1p(-exp(-v));
+    return lf - log(n) + sqrt(n) * g;
+}
+
+double get_deg_dl_dist(const Blockmodel &blockmodel) { // Rs&& rs, Ks&& ks) {  // RS: range from 0 to B, KS is an empty array of pairs?
+    if (!args.degreecorrected) return 0.0;
+
+    double S = 0;
+
+    for (int block = 0; block < blockmodel.num_blocks(); ++block) {
+        S += log_q(blockmodel.degrees_out(block), blockmodel.block_size(block));
+        S += log_q(blockmodel.degrees_in(block), blockmodel.block_size(block));
+        size_t total = 0;
+        if (!args.undirected) {
+            for (const std::pair<long, long> &entry : blockmodel.in_degree_histogram(block)) {
+                S -= fastlgamma(entry.second + 1);
+                assert(!std::isinf(S));
+                assert(!std::isnan(S));
+            }
+        }
+        for (const std::pair<long, long> &entry : blockmodel.out_degree_histogram(block)) {
+            S -= fastlgamma(entry.second + 1);
+            assert(!std::isinf(S));
+            assert(!std::isnan(S));
+            total += entry.second;
+        }
+
+        if (args.undirected) {
+            S += fastlgamma(total + 1);
+        } else {
+            S += 2 * fastlgamma(total + 1);
+        }
+        assert(!std::isinf(S));
+        assert(!std::isnan(S));
+    }
+//    std::cout << "degree_dl: " << S << std::endl;
+    return S;
+}
+
+double get_edges_dl(size_t B, size_t E) {
+    size_t NB = !args.undirected ? B * B : (B * (B + 1)) / 2;
+    double E_dl = fastlbinom(NB + E - 1, E);
+//    std::cout << "edges_dl: " << E_dl << std::endl;
+    return E_dl;
+}
+
+double mdl(const Blockmodel &blockmodel, const Graph &graph) {
+    double S = 0, S_dl = 0;
+
+    S = sparse_entropy(blockmodel, graph);
+    assert(!std::isinf(S));
+    assert(!std::isnan(S));
+//    std::cout << "sparse E: " << S << std::endl;
+
+    S_dl += get_partition_dl(graph.num_vertices(), blockmodel);
+    assert(!std::isinf(S_dl));
+    assert(!std::isnan(S_dl));
+//    std::cout << "partition_dl: " << S_dl << std::endl;
+
+    S_dl += get_deg_dl_dist(blockmodel);  // (ea.degree_dl_kind);
+    assert(!std::isinf(S_dl));
+    assert(!std::isnan(S_dl));
+//    std::cout << "after deg_dl: " << S_dl << std::endl;
+
+//    std::cout << "NB: " << blockmodel.num_nonempty_blocks() << " E: " << graph.num_edges() << std::endl;
+//    S_dl += get_edges_dl(blockmodel.num_nonempty_blocks(), graph.num_edges());
+    double E_dl = get_edges_dl(blockmodel.num_nonempty_blocks(), graph.num_edges());
+//    std::cout << "edges_dl: " << E_dl << std::endl;
+    S_dl += E_dl;
+    assert(!std::isinf(S_dl));
+    assert(!std::isnan(S_dl));
+//    std::cout << "after edge_dl: " << S_dl << std::endl;
+//    utils::print(blockmodel.block_sizes());
+    return S + S_dl * BETA_DL;
+}
+
+// obtain the entropy difference given a set of entries in the e_rs matrix
+//template <bool exact, class MEntries, class Eprop, class EMat, class BGraph>
+//[[gnu::always_inline]] [[gnu::flatten]] [[gnu::hot]] inline
+double entries_dS(const Blockmodel &blockmodel, const Delta &delta) {  // MEntries& m_entries, Eprop& mrs, EMat& emat, BGraph& bg) {
+    const std::shared_ptr<ISparseMatrix> matrix = blockmodel.blockmatrix();
+    double dS = 0;
+    for (const std::tuple<long, long, long> &entry : delta.entries()) {
+        long row = std::get<0>(entry);
+        long col = std::get<1>(entry);
+        auto change = (long) std::get<2>(entry);
+        // delta += + E(old) - E(new)
+        auto value = (long) matrix->get(row, col);
+        dS += eterm_exact(row, col, value + change) - eterm_exact(row, col, value);
+        assert(!std::isinf(dS));
+        assert(!std::isnan(dS));
+    }
+//    entries_op(m_entries, emat,
+//               [&](auto r, auto s, auto& me, auto d)
+//               {
+//                   size_t ers = 0;
+//                   if (me != emat.get_null_edge())
+//                       ers = mrs[me];
+//                   assert(int(ers) + d >= 0);
+//                   if constexpr (exact)
+//                       dS += eterm_exact(r, s, ers + d, bg) - eterm_exact(r, s, ers, bg);
+//                   else
+//                       dS += eterm(r, s, ers + d, bg) - eterm(r, s, ers, bg);
+//               });
+    return dS;
+}
+
+// compute the entropy difference of a virtual move of vertex from block r
+// to nr
+//template <bool exact, class MEntries>
+double virtual_move_sparse(const Blockmodel &blockmodel, const Delta &delta, long weight,
+                           const utils::ProposalAndEdgeCounts &proposal) {  // size_t v, size_t r, size_t nr, MEntries& m_entries) {
+    if (delta.current_block() == delta.proposed_block()) return 0.;
+
+    double dS = entries_dS(blockmodel, delta);  // <exact>(m_entries, _mrs, _emat, _bg);
+
+    long kin = proposal.num_in_neighbor_edges;
+    long kout = proposal.num_out_neighbor_edges;
+//    auto [kin, kout] = get_deg(v, _eweight, _degs, _g);
+
+//    int dwr = _vweight[v];
+//    int dwnr = dwr;
+//    int dwr = 1, dwnr = 1;
+
+//    if (r == null_group && dwnr == 0)
+//        dwnr = 1;
+
+    auto vt = [&](auto out_degree, auto in_degree, auto w) { // , auto nr) {
+        assert(out_degree >= 0 && in_degree >=0);
+//        if constexpr (exact)
+        return vterm_exact(out_degree, in_degree, w);  // , nr, _deg_corr, _bg);
+//        else
+//            return vterm(mrp, mrm, nr, _deg_corr, _bg);
+    };
+
+//    if (r != null_group)
+//    {
+//    auto mrp_r = _mrp[r];
+//    auto mrm_r = _mrm[r];
+//    auto wr_r = _wr[r];
+    dS += vt(blockmodel.degrees_out(delta.current_block()) - kout, blockmodel.degrees_in(delta.current_block()) - kin, blockmodel.block_size(delta.current_block()) - weight);  // , wr_r - dwr);
+    dS -= vt(blockmodel.degrees_out(delta.current_block()), blockmodel.degrees_in(delta.current_block()), blockmodel.block_size(delta.current_block()));  //        , mrm_r      , wr_r      );
+    assert(!std::isinf(dS));
+    assert(!std::isnan(dS));
+//    }
+
+//    if (nr != null_group)
+//    {
+//        auto mrp_nr = _mrp[nr];
+//        auto mrm_nr = _mrm[nr];
+//        auto wr_nr = _wr[nr];
+    dS += vt(blockmodel.degrees_out(delta.proposed_block()) + kout, blockmodel.degrees_in(delta.proposed_block()) + kin, blockmodel.block_size(delta.proposed_block()) + weight);  // , wr_nr + dwnr);
+    dS -= vt(blockmodel.degrees_out(delta.proposed_block()), blockmodel.degrees_in(delta.proposed_block()), blockmodel.block_size(delta.proposed_block()));  //        , mrm_nr      , wr_nr       );
+    assert(!std::isinf(dS));
+    assert(!std::isnan(dS));
+//    }
+
+//    std::cout << "delta sparse entropy: " << dS << std::endl;
+    return dS;
+}
+
+double get_delta_partition_dl(long num_vertices, const Blockmodel &blockmodel, const Delta &delta, long weight) {  // size_t v, size_t r, size_t nr, const entropy_args_t& ea) {
+    if (delta.current_block() == delta.proposed_block()) return 0.;
+
+    double dS = 0;
+
+//    auto& f = _bfield[v];
+//    if (!f.empty())
+//    {
+//        if (nr != null_group)
+//            dS -= (nr < f.size()) ? f[nr] : f.back();
+//        if (r != null_group)
+//            dS += (r < f.size()) ? f[r] : f.back();
+//    }
+
+//    if (r == nr)
+//        return 0;
+
+//    if (r != null_group)
+//        r = get_r(r);
+
+//    if (nr != null_group)
+//        nr = get_r(nr);
+
+//    int n = 1;  // vweight[v]; for block_merge, change this to size of the blockmodel
+//    if (n == 0) {
+//        if (r == null_group)
+//            n = 1;
+//        else
+//            return 0;
+//    }
+
+    double S_b = 0;
+    double S_a = 0;
+
+//    if (r != null_group)
+//    {
+//    std::cout << "size of current: " << blockmodel.block_size(delta.current_block())
+    S_b += -fastlgamma(blockmodel.block_size(delta.current_block()) + 1);  // _total[r] + 1);
+    S_a += -fastlgamma(blockmodel.block_size(delta.current_block()) - weight + 1);  // _total[r] - n + 1);
+
+//    std::cout << "point A: S_b = " << S_b << " S_a = " << S_a << std::endl;
+//    }
+
+//    if (nr != null_group)
+//    {
+    S_b += -fastlgamma(blockmodel.block_size(delta.proposed_block()) + 1);  // _total[nr] + 1);
+    S_a += -fastlgamma(blockmodel.block_size(delta.proposed_block()) + weight + 1);  // _total[nr] + n + 1);
+
+//    std::cout << "point B: S_b = " << S_b << " S_a = " << S_a << std::endl;
+
+//    }
+
+//    int dN = 0;
+//    if (r == null_group)
+//        dN += n;
+//    if (nr == null_group)
+//        dN -= n;
+//
+//    S_b += lgamma_fast(_N + 1);
+//    S_a += lgamma_fast(_N + dN + 1);
+
+    int dB = 0;
+    if (blockmodel.block_size(delta.current_block()) == weight)
+        dB--;
+    if (blockmodel.block_size(delta.proposed_block()) == 0)
+        dB++;
+//    if (r != null_group && _total[r] == n)
+//        dB--;
+//    if (nr != null_group && _total[nr] == 0)
+//        dB++;
+
+    if (dB != 0) {
+        S_b += fastlbinom(num_vertices - 1, blockmodel.num_nonempty_blocks() - 1);
+        S_a += fastlbinom(num_vertices - 1, blockmodel.num_nonempty_blocks() + dB - 1);
+    }
+
+//    std::cout << "point C: S_b = " << S_b << " S_a = " << S_a << std::endl;
+
+//    if ((dN != 0 || dB != 0)) {
+//        S_b += lbinom_fast(_N - 1, _actual_B - 1);
+//        S_a += lbinom_fast(_N - 1 + dN, _actual_B + dB - 1);
+//    }
+
+//    if (dN != 0)
+//    {
+//        S_b += safelog_fast(_N);
+//        S_a += safelog_fast(_N + dN);
+//    }
+
+    dS += S_a - S_b;
+    assert(!std::isinf(dS));
+    assert(!std::isnan(dS));
+
+//    if (ea.partition_dl)
+//    {
+//        auto& ps = get_partition_stats(v);
+//        dS += ps.get_delta_partition_dl(v, r, nr, _vweight);
+//    }
+
+//    if (_coupled_state != nullptr)
+//    {
+//        bool r_vacate = (r != null_group && _wr[r] == _vweight[v]);
+//        bool nr_occupy = (nr != null_group && _wr[nr] == 0);
+//
+//        auto& bh = _coupled_state->get_b();
+//        if (r_vacate && nr_occupy)
+//        {
+//            dS += _coupled_state->get_delta_partition_dl(r, bh[r], bh[nr],
+//                                                         _coupled_entropy_args);
+//        }
+//        else
+//        {
+//            if (r_vacate)
+//                dS += _coupled_state->get_delta_partition_dl(r, bh[r], null_group,
+//                                                             _coupled_entropy_args);
+//            if (nr_occupy)
+//                dS += _coupled_state->get_delta_partition_dl(nr, null_group, bh[nr],
+//                                                             _coupled_entropy_args);
+//        }
+//    }
+//    std::cout << "delta partition dl: " << dS << std::endl;
+    return dS;
+}
+
+//template <class DegOP>
+//double get_delta_deg_dl_dist_change(size_t r, DegOP&& dop, int diff) {
+double get_delta_deg_dl_dist_change(const Blockmodel &blockmodel, long block, long vkin, long vkout, long vweight,
+                                    int diff) {
+//    if (!args.degreecorrected) return 0.0;
+    // vweight may be unnecessary. At least for the DOp portion
+    auto total_r = blockmodel.block_size(block);  // _total[r];
+    auto ep_r = blockmodel.degrees_out(block);  // _ep[r];
+    auto em_r = blockmodel.degrees_in(block);  // _em[r];
+
+    auto get_Se = [&](int delta, int kin, int kout) {
+        double S = 0;
+        assert(total_r + delta >= 0);
+        assert(em_r + kin >= 0);
+        assert(ep_r + kout >= 0);
+        S += log_q(em_r + kin, total_r + delta);
+        S += log_q(ep_r + kout, total_r + delta);
+        return S;
+    };
+
+    auto get_Sr = [&](int delta) {
+        assert(total_r + delta + 1 >= 0);
+        if (args.undirected)
+            return fastlgamma(total_r + delta + 1);
+        else
+            return 2 * fastlgamma(total_r + delta + 1);
+    };
+
+    auto get_Sk = [&](std::pair<long, long>& deg, int delta) {
+        double S = 0;
+        int nd = 0;
+        if (!args.undirected) {
+//            if (_hist_in[block] != nullptr) {
+//                auto& h = *_hist_in[block];
+//            if (blockmodel.in_degree_histogram(block) != nullptr) {
+            const MapVector<long> &histogram = blockmodel.in_degree_histogram(block);
+            auto iter = histogram.find(std::get<0>(deg));
+            if (iter != histogram.end())
+                nd = iter->second;
+//            }
+            S -= fastlgamma(nd + delta + 1);
+        }
+
+        nd = 0;
+//        if (_hist_out[block] != nullptr) {
+//        auto& h = *_hist_out[block];
+        const MapVector<long> &histogram = blockmodel.out_degree_histogram(block);
+        auto iter = histogram.find(std::get<1>(deg));
+        if (iter != histogram.end())
+            nd = iter->second;
+//        }
+
+        return S - fastlgamma(nd + delta + 1);
+    };
+
+    double S_b = 0, S_a = 0;
+    int tkin = 0, tkout = 0, n = 0;
+//    dop([&](size_t kin, size_t kout, int nk)
+//        {
+    tkin += vkin;  //  * vweight;
+    tkout += vkout;  //  * vweight;
+    n += vweight;
+
+    std::pair<long, long> deg = std::make_pair(vkin, vkout);
+    S_b += get_Sk(deg,         0);
+    S_a += get_Sk(deg, diff * vweight);
+//        });
+
+    S_b += get_Se(       0,           0,            0);
+    S_a += get_Se(diff * n, diff * tkin, diff * tkout);
+
+    S_b += get_Sr(       0);
+    S_a += get_Sr(diff * n);
+
+    return S_a - S_b;
+}
+
+
+//template <class Graph, class VProp, class EProp, class Degs>
+double get_delta_deg_dl(long vertex, const Blockmodel &blockmodel, const Delta &delta, const Graph &graph) {  // size_t r, size_t nr, VProp& vweight,
+//                        EProp& eweight, Degs& degs, Graph& g, int kind) {
+//    if (r == nr || vweight[v] == 0)
+//        return 0;
+    if (!args.degreecorrected) return 0.00;
+    if (delta.current_block() == delta.proposed_block()) return 0.;  // for block_merge, it's this || size(block) == 0
+//    if (r != null_group)
+//        r = get_r(r);
+//    if (nr != null_group)
+//        nr = get_r(nr);
+
+//    auto dop = [&](auto&& f) {
+//        long kin = graph.in_neighbors(vertex).size();
+//        long kout = graph.out_neighbors(vertex).size();
+////        auto [kin, kout] = get_deg(v, eweight, degs, g);
+//        f(kin, kout, 1);  // for block merge, it's size(block) instead of 1 | vweight[v]);
+//    };
+
+    long vkin = graph.in_neighbors(vertex).size();
+    long vkout = graph.out_neighbors(vertex).size();
+    double dS = 0;
+//    if (r != null_group)
+//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  dop, -1);
+    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  vkin, vkout, 1, -1);
+//    if (nr != null_group)
+//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), dop, +1);
+    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), vkin, vkout, 1, +1);
+    assert(!std::isinf(dS));
+    assert(!std::isnan(dS));
+
+//    }
+//    std::cout << "delta degree dl: " << dS << std::endl;
+    return dS;
+}
+
+//template <class VProp, class Graph>
+double get_delta_edges_dl(const Blockmodel &blockmodel, const Delta &delta, long weight, long num_edges) {
+    if (delta.current_block() == delta.proposed_block())
+        return 0;
+
+//    if (r != null_group)
+//        r = get_r(r);
+//    if (nr != null_group)
+//        nr = get_r(nr);
+
+    double S_b = 0, S_a = 0;
+
+//    int n = weight;  // vweight[v];
+
+//    if (n == 0)
+//    {
+//        if (r == null_group)
+//            n = 1;
+//        else
+//            return 0;
+//    }
+
+    int dB = 0;
+//    if (r != null_group && _total[r] == n)
+    if (blockmodel.block_size(delta.current_block()) == weight)
+        dB--;
+//    if (nr != null_group && _total[nr] == 0)
+    if (blockmodel.block_size(delta.proposed_block()) == 0)
+        dB++;
+
+    if (dB != 0) {
+        S_b += get_edges_dl(blockmodel.num_nonempty_blocks(), num_edges);
+        S_a += get_edges_dl(blockmodel.num_nonempty_blocks() + dB, num_edges);
+    }
+
+    double dS = S_a - S_b;
+    assert(!std::isinf(dS));
+    assert(!std::isnan(dS));
+//    std::cout << "delta edges dl: " << dS << std::endl;
+    return dS;
+}
+
+double delta_mdl(const Blockmodel &blockmodel, const Graph &graph, long vertex, const Delta &delta,
+                 const utils::ProposalAndEdgeCounts &proposal) {
+//    std::cout << blockmodel.block_assignment(vertex) << " != " << delta.current_block() << std::endl;
+    assert(blockmodel.block_assignment(vertex) == delta.current_block());
+
+//    get_move_entries(v, r, nr, m_entries, [](auto) constexpr { return false; });
+
+    if (delta.current_block() == delta.proposed_block()) return 0;
+//    if (r == nr || _vweight[v] == 0)
+//        return 0;
+
+    double dS = 0;
+    dS = virtual_move_sparse(blockmodel, delta, 1, proposal);  // <true>(v, r, nr, m_entries);
+
+    double dS_dl = 0;
+
+    dS_dl += get_delta_partition_dl(graph.num_vertices(), blockmodel, delta, 1);  // v, r, nr, ea);
+    assert(!std::isinf(dS_dl));
+    assert(!std::isnan(dS_dl));
+
+//    if (ea.degree_dl || ea.edges_dl) {
+//    auto& ps = get_partition_stats(v);
+//    if (_deg_corr && ea.degree_dl)
+    dS_dl += get_delta_deg_dl(vertex, blockmodel, delta, graph);  // v, r, nr, _vweight, _eweight, _degs, _g, ea.degree_dl_kind);
+//    if (ea.edges_dl)
+//    {
+//    size_t actual_B = 0;
+//    for (auto& ps : _partition_stats)
+//        actual_B += ps.get_actual_B();
+    dS_dl += get_delta_edges_dl(blockmodel, delta, 1, graph.num_edges());  // v, r, nr, _vweight, actual_B, _g);
+
+    return dS + BETA_DL * dS_dl;
+}
+
+double get_delta_deg_dl(const Blockmodel &blockmodel, const Delta &delta) {
+    if (!args.degreecorrected) return 0.0;
+
+    if (delta.current_block() == delta.proposed_block()) return 0.;  // for block_merge, it's this || size(block) == 0
+
+    long vkin = blockmodel.degrees_in(delta.current_block());  // graph.in_neighbors(vertex).size();
+    long vkout = blockmodel.degrees_out(delta.current_block());  // graph.out_neighbors(vertex).size();
+    long weight = blockmodel.block_size(delta.current_block());
+    double dS = 0;
+//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  dop, -1);
+    dS += get_delta_deg_dl_dist_change(blockmodel, delta.current_block(),  vkin, vkout, weight, -1);
+//    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), dop, +1);
+    dS += get_delta_deg_dl_dist_change(blockmodel, delta.proposed_block(), vkin, vkout, weight, +1);
+
+//    }
+    return dS;
+}
+
+double block_merge_delta_mdl(const Blockmodel &blockmodel, const utils::ProposalAndEdgeCounts &proposal,
+                             const Graph &graph, const Delta &delta) {
+//    get_move_entries(v, r, nr, m_entries, [](auto) constexpr { return false; });
+
+    if (delta.current_block() == delta.proposed_block()) return 0;
+
+    if (blockmodel.block_size(delta.current_block()) == 0 || blockmodel.block_size(delta.proposed_block()) == 0)
+        return 0;
+//    if (r == nr || _vweight[v] == 0)
+//        return 0;
+
+    double dS = 0;
+
+    dS = virtual_move_sparse(blockmodel, delta, blockmodel.block_size(delta.current_block()), proposal);  // <true>(v, r, nr, m_entries);
+
+    double dS_dl = 0;
+
+    dS_dl += get_delta_partition_dl(graph.num_vertices(), blockmodel, delta,
+                                    blockmodel.block_size(delta.current_block()));  // v, r, nr, ea);
+
+    dS_dl += get_delta_deg_dl(blockmodel, delta);  // v, r, nr, _vweight, _eweight, _degs, _g, ea.degree_dl_kind);
+
+    dS_dl += get_delta_edges_dl(blockmodel, delta, blockmodel.block_size(delta.current_block()), graph.num_edges());  // v, r, nr, _vweight, actual_B, _g);
+
+    return dS + BETA_DL * dS_dl;
+}
+
+}  // namespace nonparametric
+
+}  // namespace entropy
diff --git a/src/finetune.cpp b/src/finetune.cpp
index f51ddca..b37246b 100644
--- a/src/finetune.cpp
+++ b/src/finetune.cpp
@@ -1,1009 +1,1009 @@
-#include "finetune.hpp"
-
-#include "args.hpp"
-#include "entropy.hpp"
-#include "mpi_data.hpp"
-#include "rng.hpp"
-#include "utils.hpp"
-#include "typedefs.hpp"
-
-#include <algorithm>
-#include <cassert>
-#include <chrono>
-//#include <fenv.h>
-#include <iostream>
-#include <random>
-
-namespace finetune {
-
-long num_surrounded = 0;
-//std::ofstream my_file;
-
-bool accept(double delta_entropy, double hastings_correction) {
-    if (args.greedy) {
-        return delta_entropy < 0.0;
-    }
-    std::uniform_real_distribution<double> distribution(0.0, 1.0);
-    double random_probability = distribution(rng::generator());
-    // NOTE: 3.0 can be a user parameter (beta) -- higher value favors exploitation
-    double accept_probability = exp(-3.0 * delta_entropy) * hastings_correction;
-    accept_probability = (accept_probability >= 1.0) ? 1.0 : accept_probability;
-    return random_probability <= accept_probability;
-}
-
-Blockmodel &asynchronous_gibbs(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
-    std::cout << "Asynchronous Gibbs iteration" << std::endl;
-    if (blockmodel.num_blocks() == 1) {
-        return blockmodel;
-    }
-    std::vector<double> delta_entropies;
-    std::vector<long> vertex_moves;
-    std::vector<long> vertices = utils::range<long>(0, graph.num_vertices());
-    long total_vertex_moves = 0;
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    double initial_entropy = blockmodel.getOverall_entropy();
-    double last_entropy = initial_entropy;
-    std::vector<long> shuffled_vertices;
-    if (!args.ordered) {
-        shuffled_vertices = utils::range<long>(0, graph.num_vertices());
-    }
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-        if (!args.ordered) {
-            unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
-            std::shuffle(shuffled_vertices.begin(), shuffled_vertices.end(), std::mt19937_64(seed));
-        }
-        long _vertex_moves = 0;
-        double num_batches = args.batches;
-        long batch_size = long(ceil(double(graph.num_vertices()) / num_batches));
-        for (long batch = 0; batch < graph.num_vertices() / batch_size; ++batch) {
-            long start = batch * batch_size;
-            long end = std::min(graph.num_vertices(), (batch + 1) * batch_size);
-            // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
-            // asynchronous Gibbs sampling
-            std::vector<VertexMove_v3> moves(graph.num_vertices());
-            double start_t = MPI_Wtime();
-            #pragma omp parallel for schedule(dynamic) default(none) \
-            shared(start, end, blockmodel, graph, _vertex_moves, moves, args, shuffled_vertices)
-            for (long index = start; index < end; ++index) {
-                long vertex = index;
-                if (!args.ordered) {
-                    vertex = shuffled_vertices[index];
-                }
-                VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
-                if (proposal.did_move) {
-                    _vertex_moves++;
-                }
-                moves[vertex] = proposal;
-            }
-            double parallel_t = MPI_Wtime();
-            timers::MCMC_parallel_time += parallel_t - start_t;
-            for (const VertexMove_v3 &move : moves) {
-                if (!move.did_move) continue;
-                if (blockmodel.move_vertex(move)) {
-                    _vertex_moves++;
-                }
-            }
-            timers::MCMC_vertex_move_time += MPI_Wtime() - parallel_t;
-        }
-        double entropy = entropy::mdl(blockmodel, graph);
-        double delta_entropy = entropy - last_entropy;
-        delta_entropies.push_back(delta_entropy);
-        last_entropy = entropy;
-        vertex_moves.push_back(_vertex_moves);
-        timers::MCMC_moves += _vertex_moves;
-        std::cout << "Itr: " << iteration << ", number of vertex moves: " << _vertex_moves << ", delta S: ";
-        std::cout << delta_entropy / initial_entropy << std::endl;
-        total_vertex_moves += _vertex_moves;
-        timers::MCMC_iterations++;
-        // Early stopping
-        if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-    std::cout << blockmodel.getOverall_entropy() << std::endl;
-    return blockmodel;
-}
-
-Blockmodel &asynchronous_gibbs_load_balanced(Blockmodel &blockmodel, const Graph &graph,
-                                             bool golden_ratio_not_reached) {
-    std::cout << "Load Balanced Asynchronous Gibbs iteration" << std::endl;
-    if (blockmodel.num_blocks() == 1) {
-        return blockmodel;
-    }
-//    std::vector<std::vector<long>> thread_vertices = load_balance(graph);
-    std::vector<double> delta_entropies;
-    std::vector<long> vertex_moves;
-    std::vector<long> all_vertices = utils::range<long>(0, graph.num_vertices());
-    long total_vertex_moves = 0;
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    double initial_entropy = blockmodel.getOverall_entropy();
-    double last_entropy = initial_entropy;
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-        std::shuffle(all_vertices.begin(), all_vertices.end(), rng::generator());
-        long _vertex_moves = 0;
-        size_t num_batches = args.batches;
-        auto batch_size = size_t(ceil(double(graph.num_vertices()) / double(num_batches)));
-        long num_processed_vertices = 0;
-        for (size_t batch = 0; batch < num_batches; ++batch) {
-            size_t start = batch * batch_size;
-            size_t end = std::min(all_vertices.size(), (batch + 1) * batch_size);
-            std::vector<VertexMove_v3> moves(graph.num_vertices());
-            std::vector<std::vector<long>> thread_vertices = load_balance(graph, all_vertices, start, end);
-            double parallel_start_t = MPI_Wtime();
-            #pragma omp parallel default(none) shared(args, batch, blockmodel, std::cout, graph, moves, \
-            num_batches, thread_vertices, num_processed_vertices)
-            {
-                assert(omp_get_num_threads() == (int) thread_vertices.size());
-                std::vector<long> &vertices = thread_vertices[omp_get_thread_num()];
-//                auto batch_size = size_t(ceil(double(vertices.size()) / double(num_batches)));
-//                size_t start = batch * batch_size;
-//                size_t end = std::min(vertices.size(), (batch + 1) * batch_size);
-//                std::shuffle(vertices.begin() + long(start), vertices.begin() + long(end), rng::generator());
-                for (long vertex : vertices) { // size_t vertex_index = start; vertex_index < end; ++vertex_index) {
-//                    long vertex = vertices[vertex_index];
-                    VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
-                    moves[vertex] = proposal;
-                    #pragma omp atomic
-                    num_processed_vertices++;
-                }
-            }
-            double end_parallel_t = MPI_Wtime();
-            timers::MCMC_parallel_time += end_parallel_t - parallel_start_t;
-            for (const VertexMove_v3 &move: moves) {
-                if (!move.did_move) continue;
-                if (blockmodel.move_vertex(move)) {
-                    _vertex_moves++;
-                }
-            }
-            timers::MCMC_vertex_move_time += MPI_Wtime() - end_parallel_t;
-        }
-        assert(num_processed_vertices == graph.num_vertices());
-        double entropy = entropy::mdl(blockmodel, graph);
-        double delta_entropy = entropy - last_entropy;
-        delta_entropies.push_back(delta_entropy);
-        last_entropy = entropy;
-        vertex_moves.push_back(_vertex_moves);
-        timers::MCMC_moves += _vertex_moves;
-        std::cout << "Itr: " << iteration << ", number of vertex moves: " << _vertex_moves << ", delta S: ";
-        std::cout << delta_entropy / initial_entropy << std::endl;
-        total_vertex_moves += _vertex_moves;
-        timers::MCMC_iterations++;
-        // Early stopping
-        if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-    std::cout << blockmodel.getOverall_entropy() << std::endl;
-    return blockmodel;
-}
-
-EdgeWeights block_edge_weights(const std::vector<long> &block_assignment, const EdgeWeights &neighbor_weights) {
-    std::map<long, long> block_counts;
-    for (ulong i = 0; i < neighbor_weights.indices.size(); ++i) {
-        long neighbor = neighbor_weights.indices[i];
-        long block = block_assignment[neighbor];
-        long weight = neighbor_weights.values[i];
-        block_counts[block] += weight; // block_count[new block] should initialize to 0
-    }
-    std::vector<long> blocks;
-    std::vector<long> weights;
-    for (auto const &entry: block_counts) {
-        blocks.push_back(entry.first);
-        weights.push_back(entry.second);
-    }
-    return EdgeWeights{blocks, weights};
-}
-
-Delta blockmodel_delta(long vertex, long current_block, long proposed_block, const EdgeWeights &out_edges,
-                       const EdgeWeights &in_edges, const Blockmodel &blockmodel) {
-    Delta delta(current_block, proposed_block, long(std::max(out_edges.indices.size(), in_edges.indices.size())));
-
-    // current_block -> current_block == proposed_block --> proposed_block  (this includes self edges)
-    // current_block --> other_block == proposed_block --> other_block
-    // other_block --> current_block == other_block --> proposed_block
-    // current_block --> proposed_block == proposed_block --> proposed_block
-    // proposed_block --> current_block == proposed_block --> proposed_block
-    for (size_t i = 0; i < out_edges.indices.size(); ++i) {
-        long out_vertex = out_edges.indices[i];
-        long out_block = blockmodel.block_assignment(out_vertex);
-        long edge_weight = out_edges.values[i];
-        if (vertex == out_vertex) {
-            delta.add(proposed_block, proposed_block, edge_weight);
-            delta.self_edge_weight(1);
-        } else {
-            delta.add(proposed_block, out_block, edge_weight);
-        }
-        delta.sub(current_block, out_block, edge_weight);
-    }
-    for (size_t i = 0; i < in_edges.indices.size(); ++i) {
-        long in_vertex = in_edges.indices[i];
-        long in_block = blockmodel.block_assignment(in_vertex);
-        long edge_weight = in_edges.values[i];
-        if (vertex == in_vertex) {
-            delta.add(proposed_block, proposed_block, edge_weight);
-            delta.self_edge_weight(1);
-        } else {
-            delta.add(in_block, proposed_block, edge_weight);
-        }
-        delta.sub(in_block, current_block, edge_weight);
-    }
-    return delta;
-}
-
-std::pair<std::vector<long>, long> count_low_degree_block_neighbors(const Graph &graph, const Blockmodel &blockmodel) {
-    const std::vector<long> &low_degree_vertices = graph.low_degree_vertices();
-    std::vector<long> result = utils::constant<long>(blockmodel.num_blocks(), 0);
-    long total = 0;
-    for (long vertex : low_degree_vertices) {
-        long block = blockmodel.block_assignment(vertex);
-        long neighbors = blockmodel.blockmatrix()->distinct_edges(block);
-        result[block] = std::max(result[block], neighbors);
-        total += neighbors;
-    }
-    std::vector<long> ownership = utils::constant<long>((long) low_degree_vertices.size(), -1);
-    long thread = 0;
-    long current_neighbors = 0;
-    long average_neighbors = total / omp_get_max_threads();
-    for (long index = 0; index < (long) low_degree_vertices.size(); ++index) {
-        long vertex = low_degree_vertices[index];
-        long block = blockmodel.block_assignment(vertex);
-        long neighbors = result[block];
-        current_neighbors += neighbors;
-        ownership[index] = thread;
-        if (current_neighbors >= average_neighbors && thread < (omp_get_max_threads() - 1)) {
-            current_neighbors = 0;
-            thread++;
-        }
-    }
-    return std::make_pair(ownership, total);
-}
-
-bool early_stop(long iteration, bool golden_ratio_not_reached, double initial_entropy,
-                std::vector<double> &delta_entropies) {
-    size_t last_index = delta_entropies.size() - 1;
-    if (delta_entropies[last_index] == 0.0) {
-        return true;
-    }
-    if (iteration < 3) {
-        return false;
-    }
-    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
-    average /= -3.0;
-    double threshold;
-    if (golden_ratio_not_reached) { // Golden ratio bracket not yet established
-        threshold = 5e-4 * initial_entropy;
-    } else {
-        threshold = 1e-4 * initial_entropy;
-    }
-    return average < threshold;
-}
-
-bool early_stop(long iteration, double initial_entropy, std::vector<double> &delta_entropies) {
-    if (iteration < 3) {
-        return false;
-    }
-    size_t last_index = delta_entropies.size() - 1;
-    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
-    average /= -3.0;
-    double threshold = 1e-4 * initial_entropy;
-    return average < threshold;
-}
-
-bool early_stop_parallel(long iteration, bool golden_ratio_not_reached, double initial_entropy,
-                         std::vector<double> &delta_entropies, std::vector<long> &vertex_moves) {
-    size_t last_index = delta_entropies.size() - 1;
-    if (vertex_moves[last_index] == 0) {
-        return true;
-    }
-    if (iteration < 4) {
-        return false;
-    }
-    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
-    average /= -3.0;
-    double threshold;
-    if (golden_ratio_not_reached) { // Golden ratio bracket not yet established
-        threshold = 5e-4 * initial_entropy;
-    } else {
-        threshold = 1e-4 * initial_entropy;
-    }
-    if (average < threshold) return true;
-    long max = std::max(vertex_moves[last_index - 1],
-                       std::max(vertex_moves[last_index - 2], vertex_moves[last_index - 3]));
-    return vertex_moves[last_index] > max;
-}
-
-[[maybe_unused]] EdgeCountUpdates edge_count_updates(ISparseMatrix *blockmodel, long current_block, long proposed_block,
-                                                     EdgeWeights &out_blocks, EdgeWeights &in_blocks,
-                                                     long self_edge_weight) {
-    std::vector<long> block_row = blockmodel->getrow(current_block);
-    std::vector<long> block_col = blockmodel->getcol(current_block);
-    std::vector<long> proposal_row = blockmodel->getrow(proposed_block);
-    std::vector<long> proposal_col = blockmodel->getcol(proposed_block);
-
-    long count_in_block = 0, count_out_block = 0;
-    long count_in_proposal = self_edge_weight, count_out_proposal = self_edge_weight;
-
-    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
-        long index = in_blocks.indices[i];
-        long value = in_blocks.values[i];
-        if (index == current_block) {
-            count_in_block += value;
-        }
-        if (index == proposed_block) {
-            count_in_proposal += value;
-        }
-        block_col[index] -= value;
-        proposal_col[index] += value;
-    }
-    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
-        long index = out_blocks.indices[i];
-        long value = out_blocks.values[i];
-        if (index == current_block) {
-            count_out_block += value;
-        }
-        if (index == proposed_block) {
-            count_out_proposal += value;
-        }
-        block_row[index] -= value;
-        proposal_row[index] += value;
-    }
-
-    proposal_row[current_block] -= count_in_proposal;
-    proposal_row[proposed_block] += count_in_proposal;
-    proposal_col[current_block] -= count_out_proposal;
-    proposal_col[proposed_block] += count_out_proposal;
-
-    block_row[current_block] -= count_in_block;
-    block_row[proposed_block] += count_in_block;
-    block_col[current_block] -= count_out_block;
-    block_col[proposed_block] += count_out_block;
-
-    return EdgeCountUpdates{block_row, proposal_row, block_col, proposal_col};
-}
-
-// TODO: remove double counts from the edge count updates? But then we'll have to figure out how to correctly update
-// the blockmodel since we won't be able to do bm.column[block/proposal] = updates.block/proposal_col
-void edge_count_updates_sparse(const Blockmodel &blockmodel, long vertex, long current_block, long proposed_block,
-                               EdgeWeights &out_edges, EdgeWeights &in_edges, SparseEdgeCountUpdates &updates) {
-    updates.block_row = blockmodel.blockmatrix()->getrow_sparse(current_block);
-    updates.block_col = blockmodel.blockmatrix()->getcol_sparse(current_block);
-    updates.proposal_row = blockmodel.blockmatrix()->getrow_sparse(proposed_block);
-    updates.proposal_col = blockmodel.blockmatrix()->getcol_sparse(proposed_block);
-
-    for (size_t i = 0; i < out_edges.indices.size(); ++i) {
-        long out_vertex = out_edges.indices[i];
-        long out_block = blockmodel.block_assignment(out_vertex);
-        long edge_weight = out_edges.values[i];
-        if (vertex == out_vertex) {
-            updates.proposal_row[proposed_block] += edge_weight;
-            updates.proposal_col[proposed_block] += edge_weight;
-        } else {
-            updates.proposal_row[out_block] += edge_weight;
-            if (out_block == proposed_block)
-                updates.proposal_col[proposed_block] += edge_weight;
-            if (out_block == current_block)
-                updates.block_col[proposed_block] += edge_weight;
-        }
-        updates.block_row[out_block] -= edge_weight;
-        if (out_block == current_block)
-            updates.block_col[current_block] -= edge_weight;
-        if (out_block == proposed_block)
-            updates.proposal_col[current_block] -= edge_weight;
-    }
-    for (size_t i = 0; i < in_edges.indices.size(); ++i) {
-        long in_vertex = in_edges.indices[i];
-        long in_block = blockmodel.block_assignment(in_vertex);
-        long edge_weight = in_edges.values[i];
-        if (vertex == in_vertex) {
-            updates.proposal_col[proposed_block] += edge_weight;
-            updates.proposal_row[proposed_block] += edge_weight;
-        } else {
-            updates.proposal_col[in_block] += edge_weight;
-            if (in_block == proposed_block)
-                updates.proposal_row[proposed_block] += edge_weight;
-            if (in_block == current_block)
-                updates.block_row[proposed_block] += edge_weight;
-        }
-        updates.block_col[in_block] -= edge_weight;
-        if (in_block == current_block)
-            updates.block_row[current_block] -= edge_weight;
-        if (in_block == proposed_block)
-            updates.proposal_row[current_block] -= edge_weight;
-    }
-}
-
-EdgeWeights edge_weights(const NeighborList &neighbors, long vertex, bool ignore_self) {
-    std::vector<long> indices;
-    std::vector<long> values;
-    // Assumes graph is unweighted
-    const std::vector<long> &neighbor_vector = neighbors[vertex];
-    for (const long neighbor: neighbor_vector) {
-        if (ignore_self && neighbor == vertex) continue;
-        indices.push_back(neighbor);
-        values.push_back(1);
-    }
-    return EdgeWeights{indices, values};
-}
-
-VertexMove eval_vertex_move(long vertex, long current_block, utils::ProposalAndEdgeCounts proposal,
-                            const Blockmodel &blockmodel, const Graph &graph, EdgeWeights &out_edges,
-                            EdgeWeights &in_edges) {
-//    if (args.nodelta)
-//        return eval_vertex_move_nodelta(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
-    const Delta delta = blockmodel_delta(vertex, current_block, proposal.proposal, out_edges, in_edges, blockmodel);
-    double hastings = entropy::hastings_correction(vertex, graph, blockmodel, delta, current_block, proposal);
-    double delta_entropy = args.nonparametric ?
-            entropy::nonparametric::delta_mdl(blockmodel, graph, vertex, delta, proposal) :
-            entropy::delta_mdl(blockmodel, delta, proposal);
-    if (accept(delta_entropy, hastings))
-        return VertexMove{delta_entropy, true, vertex, proposal.proposal};
-    return VertexMove{delta_entropy, false, -1, -1};
-}
-
-VertexMove_v3 eval_vertex_move_v3(long vertex, long current_block, utils::ProposalAndEdgeCounts proposal,
-                                 const Blockmodel &blockmodel, const Graph &graph, EdgeWeights &out_edges,
-                                 EdgeWeights &in_edges) {
-    // TODO: things like size and such should be ulong/size_t, not long.
-    Vertex v = { vertex, long(graph.out_neighbors(vertex).size()), long(graph.in_neighbors(vertex).size()) };
-    const Delta delta = blockmodel_delta(vertex, current_block, proposal.proposal, out_edges, in_edges, blockmodel);
-    double hastings = entropy::hastings_correction(vertex, graph, blockmodel, delta, current_block, proposal);
-    double delta_entropy = args.nonparametric ?
-                           entropy::nonparametric::delta_mdl(blockmodel, graph, vertex, delta, proposal) :
-                           entropy::delta_mdl(blockmodel, delta, proposal);
-    if (accept(delta_entropy, hastings))
-        return VertexMove_v3{delta_entropy, true, v, proposal.proposal, out_edges, in_edges};
-//        return VertexMove_v2{delta_entropy, true, vertex, proposal.proposal, out_edges, in_edges};
-    return VertexMove_v3{delta_entropy, false, InvalidVertex, -1, out_edges, in_edges};
-//    return VertexMove_v2{delta_entropy, false, -1, -1, out_edges, in_edges};
-}
-
-Blockmodel &hybrid_mcmc_load_balanced(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
-        std::cout << "Hybrid MCMC iteration" << std::endl;
-        if (blockmodel.num_blocks() == 1) {
-            return blockmodel;
-        }
-        std::vector<double> delta_entropies;
-        long total_vertex_moves = 0;
-        blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-        double initial_entropy = blockmodel.getOverall_entropy();
-        double num_batches = args.batches;
-        long num_low_degree_vertices = long(graph.low_degree_vertices().size());
-        long batch_size = long(ceil(num_low_degree_vertices / num_batches));
-        std::vector<unsigned long> thread_degrees(omp_get_max_threads());
-//        std::vector<std::pair<long,long>> vertex_properties = sort_vertices_by_degree(graph);
-
-        for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-//            std::vector<std::pair<long,long>> block_neighbors = sort_vertices_by_degree(graph);
-            for (long i = 0; i < omp_get_max_threads(); ++i) {
-                thread_degrees[i] = 0;
-            }
-            num_surrounded = 0;
-            long vertex_moves = 0;
-            double delta_entropy = 0.0;
-            double start_t = MPI_Wtime();
-            for (long vertex : graph.high_degree_vertices()) {  // Only run Metropolis-Hastings on high-degree vertices
-                VertexMove proposal = propose_move(blockmodel, vertex, graph);
-                if (proposal.did_move) {
-                    vertex_moves++;
-                    delta_entropy += proposal.delta_entropy;
-                }
-            }
-            double sequential_t = MPI_Wtime();
-            timers::MCMC_sequential_time += sequential_t - start_t;
-            std::pair<std::vector<long>, long> block_neighbors = count_low_degree_block_neighbors(graph, blockmodel);
-            // TODO: make sure that with batches, we still go over every vertex in the graph
-            for (long batch = 0; batch < num_low_degree_vertices / batch_size; ++batch) {
-                long start = batch * batch_size;
-                long end = std::min(num_low_degree_vertices, (batch + 1) * batch_size);
-                // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
-                // asynchronous Gibbs sampling
-                std::vector<long> block_assignment(blockmodel.block_assignment());
-                std::vector<VertexMove_v3> moves(graph.num_vertices());
-//                omp_set_dynamic(0);
-                start_t = MPI_Wtime();
-                #pragma omp parallel default(none) shared(start, end, blockmodel, graph, vertex_moves, delta_entropy, block_assignment, moves, thread_degrees, block_neighbors, std::cout)
-                {
-                    long thread_id = omp_get_thread_num();
-                    if (thread_id == 0)
-                        std::cout << "Using " << omp_get_num_threads() << "/" << omp_get_max_threads() << " threads!" << std::endl;
-                    // TODO: figure out how to make this happen once per iteration
-//                    std::vector<bool> my_blocks = load_balance(blockmodel, block_neighbors);
-//                    std::vector<bool> my_vertices = load_balance_vertices(graph, vertex_properties);
-//                    std::vector<bool> my_vertices = load_balance_block_neighbors(graph, blockmodel, block_neighbors);
-//                    long num_processed = 0;
-                    for (long index = start; index < end; ++index) {
-                        if (block_neighbors.first[index] != thread_id) continue;
-                        long vertex = graph.low_degree_vertices()[index];
-//                        if (!my_vertices[vertex]) continue;
-                        long block = blockmodel.block_assignment(vertex);
-//                        if (!my_blocks[block]) continue;  // Only process the vertices this thread is responsible for
-                        unsigned long num_neighbors = blockmodel.blockmatrix()->distinct_edges(block);
-                        thread_degrees[thread_id] += num_neighbors;
-                        VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
-                        if (proposal.did_move) {
-                            #pragma omp atomic
-                            vertex_moves++;
-                            delta_entropy += proposal.delta_entropy;
-                            block_assignment[vertex] = proposal.proposed_block;
-                        }
-                        moves[vertex] = proposal;
-//                        num_processed++;
-                    }
-//                    std::cout << thread_id << ": " << num_processed << std::endl;
-                }
-                double parallel_t = MPI_Wtime();
-                timers::MCMC_parallel_time += parallel_t - start_t;
-                for (const VertexMove_v3 &move : moves) {
-                    if (!move.did_move) continue;
-                    blockmodel.move_vertex(move);
-                }
-                timers::MCMC_vertex_move_time += MPI_Wtime() - parallel_t;
-            }
-            delta_entropies.push_back(delta_entropy);
-            std::cout << "Itr: " << iteration << ", number of vertex moves: " << vertex_moves << ", delta S: ";
-            std::cout << delta_entropy / initial_entropy << ", num surrounded vertices: " << num_surrounded << std::endl;
-            total_vertex_moves += vertex_moves;
-            timers::MCMC_iterations++;
-            // Early stopping
-            if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
-                break;
-            }
-        }
-        blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-        std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-        std::cout << blockmodel.getOverall_entropy() << std::endl;
-        timers::MCMC_moves += total_vertex_moves;
-        return blockmodel;
-    }
-
-Blockmodel &hybrid_mcmc(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
-    std::cout << "Hybrid MCMC iteration" << std::endl;
-    if (blockmodel.num_blocks() == 1) {
-        return blockmodel;
-    }
-    std::vector<double> delta_entropies;
-    std::vector<long> vertex_moves;
-    long total_vertex_moves = 0;
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    double initial_entropy = blockmodel.getOverall_entropy();
-    double last_entropy = initial_entropy;
-    double num_batches = args.batches;
-    long num_low_degree_vertices = long(graph.low_degree_vertices().size());
-    long batch_size = long(ceil(num_low_degree_vertices / num_batches));
-    std::vector<long> hdv = graph.high_degree_vertices();
-    std::vector<long> ldv = graph.low_degree_vertices();
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-//        std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
-        num_surrounded = 0;
-        long _vertex_moves = 0;
-        double start_t = MPI_Wtime();
-        if (!args.ordered) {
-            unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
-            std::shuffle(hdv.begin(), hdv.end(), std::mt19937_64(seed));
-            std::shuffle(ldv.begin(), ldv.end(), std::mt19937_64(seed));
-        }
-        for (long vertex : hdv) {  // Only run Metropolis-Hastings on high-degree vertices
-            VertexMove proposal = propose_move(blockmodel, vertex, graph);
-            if (proposal.did_move) {
-                _vertex_moves++;
-            }
-        }
-        double sequential_t = MPI_Wtime();
-        timers::MCMC_sequential_time += sequential_t - start_t;
-//        assert(blockmodel.validate(graph));
-        for (long batch = 0; batch < num_low_degree_vertices / batch_size; ++batch) {
-            start_t = MPI_Wtime();
-            long start = batch * batch_size;
-            long end = std::min(num_low_degree_vertices, (batch + 1) * batch_size);
-            // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
-            // asynchronous Gibbs sampling
-            std::vector<VertexMove_v3> moves(graph.num_vertices());
-            #pragma omp parallel for schedule(dynamic) default(none) \
-            shared(start, end, blockmodel, graph, _vertex_moves, moves, ldv)
-            for (long index = start; index < end; ++index) {
-                long vertex = ldv[index];
-                VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
-                if (proposal.did_move) {
-                    #pragma omp atomic
-                    _vertex_moves++;
-                }
-                moves[vertex] = proposal;
-            }
-            double parallel_t = MPI_Wtime();
-            timers::MCMC_parallel_time += parallel_t - start_t;
-            for (const VertexMove_v3 &move : moves) {
-                if (!move.did_move) continue;
-                if (blockmodel.move_vertex(move)) {
-                    _vertex_moves++;
-                }
-            }
-            timers::MCMC_vertex_move_time += MPI_Wtime() - parallel_t;
-//            assert(blockmodel.validate(graph));
-        }
-        double entropy = entropy::mdl(blockmodel, graph);
-        double delta_entropy = entropy - last_entropy;
-        delta_entropies.push_back(delta_entropy);
-        last_entropy = entropy;
-        vertex_moves.push_back(_vertex_moves);
-        std::cout << "Itr: " << iteration << ", number of vertex moves: " << _vertex_moves << ", delta S: ";
-        std::cout << delta_entropy / initial_entropy << ", num surrounded vertices: " << num_surrounded << std::endl;
-        total_vertex_moves += _vertex_moves;
-        timers::MCMC_iterations++;
-        // Early stopping
-        if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    timers::MCMC_moves += total_vertex_moves;
-    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-    std::cout << blockmodel.getOverall_entropy() << std::endl;
-    return blockmodel;
-}
-
-std::vector<std::vector<long>> load_balance(const Graph &graph) {
-    std::vector<std::vector<long>> thread_vertices(args.threads);
-    std::vector<long> vertex_degrees = graph.degrees();
-    std::vector<long> sorted_indices = utils::argsort<long>(vertex_degrees);
-    for (size_t index = 0; index < size_t(graph.num_vertices()); ++index) {
-        long vertex = sorted_indices[index];
-        size_t thread_id = index % (2 * args.threads);
-        if (thread_id >= size_t(args.threads))
-            thread_id = ((2 * args.threads) - 1) - thread_id;
-        thread_vertices[thread_id].push_back(vertex);
-    }
-    return thread_vertices;
-}
-
-std::vector<std::vector<long>> load_balance(const Graph &graph, const std::vector<long> &all_vertices,
-                                            size_t start_index, size_t end_index) {
-    std::vector<std::vector<long>> thread_vertices(args.threads);
-    std::vector<long> vertex_degrees;
-    for (size_t i = start_index; i < end_index; ++i) {
-        long vertex = all_vertices[i];
-        vertex_degrees.push_back(graph.degree(vertex));
-    }
-    std::vector<long> sorted_indices = utils::argsort<long>(vertex_degrees);
-    for (size_t index = 0; index < vertex_degrees.size(); ++index) {
-        size_t sorted_index = sorted_indices[index];
-        size_t vertex_index = sorted_index + start_index;
-        long vertex = all_vertices[vertex_index];
-        size_t thread_id = sorted_index % (2 * args.threads);
-        if (thread_id >= size_t(args.threads))
-            thread_id = ((2 * args.threads) - 1) - thread_id;
-        thread_vertices[thread_id].push_back(vertex);
-    }
-    return thread_vertices;
-}
-
-
-std::vector<bool> load_balance(const Blockmodel &blockmodel, const std::vector<std::pair<long, long>> &block_neighbors) {
-    // Decide which blocks each thread is responsible for
-    long thread_id = omp_get_thread_num();
-    std::vector<bool> my_blocks = utils::constant<bool>(blockmodel.num_blocks(), false);
-    for (long i = thread_id; i < blockmodel.num_blocks(); i += 2 * omp_get_max_threads()) {
-        long block = block_neighbors[i].first;
-        my_blocks[block] = true;
-    }
-    for (long i = 2 * omp_get_max_threads() - 1 - thread_id; i < blockmodel.num_blocks(); i += 2 * omp_get_max_threads()) {
-        long block = block_neighbors[i].first;
-        my_blocks[block] = true;
-    }
-    return my_blocks;
-}
-
-std::vector<bool> load_balance_block_neighbors(const Graph &graph, const Blockmodel &blockmodel,
-                                               const std::pair<std::vector<long>, long> &block_neighbors) {
-    // Decide which blocks each thread is responsible for
-    long thread_id = omp_get_thread_num();
-    std::vector<bool> my_vertices = utils::constant<bool>(graph.num_vertices(), false);
-    long total_neighbors = block_neighbors.second;
-//    for (long num_neighbors : block_neighbors.first) {
-//        total_neighbors += num_neighbors;
-//    }
-    long average_neighbors = (long) total_neighbors / omp_get_max_threads();
-    if (thread_id == 4)
-        std::cout << "average neighbors: " << average_neighbors << " total neighbors: " << total_neighbors << std::endl << " threads: " << omp_get_max_threads();
-    long thread = 0;
-    long current_neighbors = 0;
-    for (long vertex : graph.low_degree_vertices()) {
-        long block = blockmodel.block_assignment(vertex);
-        current_neighbors += block_neighbors.first[block];
-    }
-    assert(current_neighbors == total_neighbors);
-    current_neighbors = 0;
-    for (long vertex : graph.low_degree_vertices()) {
-        long block = blockmodel.block_assignment(vertex);
-        long neighbors = block_neighbors.first[block];
-        if (thread == thread_id) {
-            my_vertices[vertex] = true;
-        }
-        current_neighbors += neighbors;
-        if (current_neighbors >= average_neighbors && thread < (omp_get_max_threads() - 1)) {
-            current_neighbors = 0;
-            thread++;
-        }
-        if (thread_id == 4) {
-            std::cout << "block: " << block << " neighbors: " << neighbors << " thread: " << thread << " current_neighbors: " << current_neighbors << std::endl;
-        }
-    }
-    if (thread_id == 4)
-        std::cout << "average neighbors: " << average_neighbors << " total neighbors: " << total_neighbors << std::endl << " threads: " << omp_get_max_threads();
-    return my_vertices;
-}
-
-std::vector<bool> load_balance_vertices(const Graph &graph, const std::vector<std::pair<long, long>> &vertex_properties) {
-    // Decide which blocks each thread is responsible for
-    long thread_id = omp_get_thread_num();
-    std::vector<bool> my_vertices = utils::constant<bool>(graph.num_vertices(), false);
-    for (long i = thread_id; i < graph.num_vertices(); i += 2 * omp_get_max_threads()) {
-        long vertex = vertex_properties[i].first;
-        my_vertices[vertex] = true;
-    }
-    for (long i = 2 * omp_get_max_threads() - 1 - thread_id; i < graph.num_vertices(); i += 2 * omp_get_max_threads()) {
-        long vertex = vertex_properties[i].first;
-        my_vertices[vertex] = true;
-    }
-    return my_vertices;
-}
-
-Blockmodel &mcmc(int iteration, const Graph &graph, Blockmodel &blockmodel, BlockmodelTriplet &blockmodel_triplet) {
-//    timers::MCMC_moves = 0;
-//    timers::MCMC_iterations = 0;
-//    timers::MCMC_vertex_move_time = 0;
-//    timers::MCMC_parallel_time = 0;
-//    timers::MCMC_sequential_time = 0;
-    common::candidates = std::uniform_int_distribution<long>(0, blockmodel.num_blocks() - 2);
-//    std::cout << "Starting MCMC vertex moves" << std::endl;
-    if (args.algorithm == "async_gibbs" && iteration < args.asynciterations)
-        blockmodel = finetune::asynchronous_gibbs(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
-    else if (args.algorithm == "async_gibbs_load_balanced" && iteration < args.asynciterations)
-        blockmodel = finetune::asynchronous_gibbs_load_balanced(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
-    else if (args.algorithm == "hybrid_mcmc")
-        blockmodel = finetune::hybrid_mcmc(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
-    else // args.algorithm == "metropolis_hastings"
-        blockmodel = finetune::metropolis_hastings(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
-    return blockmodel;
-}
-
-Blockmodel &metropolis_hastings(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
-    std::cout << "Metropolis hastings iteration" << std::endl;
-    if (blockmodel.num_blocks() == 1) {
-        return blockmodel;
-    }
-    std::vector<double> delta_entropies;
-    long total_vertex_moves = 0;
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    std::vector<long> shuffled_vertices;
-    if (!args.ordered) {
-        shuffled_vertices = utils::range<long>(0, graph.num_vertices());
-    }
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-        if (!args.ordered) {
-            unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
-            std::shuffle(shuffled_vertices.begin(), shuffled_vertices.end(), std::mt19937_64(seed));
-        }
-        long vertex_moves = 0;
-        double delta_entropy = 0.0;
-        double start_t = MPI_Wtime();
-        for (long index = 0; index < graph.num_vertices(); ++index) {
-            long vertex = index;
-            if (!args.ordered) {
-                vertex = shuffled_vertices[index];
-            }
-            VertexMove proposal = propose_move(blockmodel, vertex, graph);
-            if (proposal.did_move) {
-                vertex_moves++;
-                delta_entropy += proposal.delta_entropy;
-            }
-        }
-        timers::MCMC_sequential_time += MPI_Wtime() - start_t;
-        delta_entropies.push_back(delta_entropy);
-        std::cout << "Itr: " << iteration << ", number of vertex moves: " << vertex_moves << ", delta S: ";
-        std::cout << delta_entropy << std::endl;
-        total_vertex_moves += vertex_moves;
-        timers::MCMC_iterations++;
-        // Early stopping
-        if (early_stop(iteration, golden_ratio_not_reached, blockmodel.getOverall_entropy(), delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    timers::MCMC_moves += total_vertex_moves;
-    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-    std::cout << blockmodel.getOverall_entropy() << std::endl;
-    return blockmodel;
-}
-
-VertexMove move_vertex(long vertex, long current_block, utils::ProposalAndEdgeCounts proposal, Blockmodel &blockmodel,
-                       const Graph &graph, EdgeWeights &out_edges, EdgeWeights &in_edges) {
-//    if (args.nodelta)
-//        return move_vertex_nodelta(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
-    Delta delta = blockmodel_delta(vertex, current_block, proposal.proposal, out_edges, in_edges,
-                                   blockmodel);
-    double hastings = entropy::hastings_correction(vertex, graph, blockmodel, delta, current_block, proposal);
-    double delta_entropy = args.nonparametric ?
-                           entropy::nonparametric::delta_mdl(blockmodel, graph, vertex, delta, proposal) :
-                           entropy::delta_mdl(blockmodel, delta, proposal);
-//    double delta_entropy = entropy::delta_mdl(blockmodel, delta, proposal);
-
-    if (accept(delta_entropy, hastings)) {
-        Vertex v = { vertex, (long) graph.out_neighbors(vertex).size(), (long) graph.in_neighbors(vertex).size() };
-        blockmodel.move_vertex(v, delta, proposal);
-        return VertexMove{delta_entropy, true, vertex, proposal.proposal};
-    }
-    return VertexMove{delta_entropy, false, vertex, proposal.proposal};
-}
-
-VertexMove propose_move(Blockmodel &blockmodel, long vertex, const Graph &graph) {
-    bool did_move = false;
-    long current_block = blockmodel.block_assignment(vertex);
-    if (blockmodel.block_size(current_block) == 1) {
-        return VertexMove{std::numeric_limits<double>::max(), did_move, -1, -1 };
-    }
-    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
-    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
-
-    MapVector<long> neighbor_blocks;
-    for (long neighbor : out_edges.indices) {
-        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
-    }
-    for (long neighbor : in_edges.indices) {
-        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
-    }
-    if (neighbor_blocks.size() == 1) {
-        num_surrounded += 1;
-    }
-
-    utils::ProposalAndEdgeCounts proposal = common::propose_new_block(
-            current_block, out_edges, in_edges, blockmodel.block_assignment(), blockmodel, false);
-    if (proposal.proposal == current_block) {
-        return VertexMove{0.0, did_move, -1, -1 };
-    }
-
-    return move_vertex(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
-}
-
-VertexMove_v3 propose_gibbs_move_v3(const Blockmodel &blockmodel, long vertex, const Graph &graph) {
-    bool did_move = false;
-    long current_block = blockmodel.block_assignment(vertex);
-    // TODO: need to do this more intelligently. Instead of preventing moves here, prevent them in the code that
-    // actually does the moves.
-//    if (blockmodel.block_size(current_block) <= args.threads) {
-//        return VertexMove_v3{ 0.0, did_move, InvalidVertex, -1 };
-//    }
-
-    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
-    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
-
-    MapVector<long> neighbor_blocks;
-    for (long neighbor : out_edges.indices) {
-        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
-    }
-    for (long neighbor : in_edges.indices) {
-        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
-    }
-    if (neighbor_blocks.size() == 1) {
-        num_surrounded += 1;
-    }
-
-    utils::ProposalAndEdgeCounts proposal = common::propose_new_block(current_block, out_edges, in_edges,
-                                                                      blockmodel.block_assignment(), blockmodel,
-                                                                      false);
-    if (proposal.proposal == current_block) {
-        return VertexMove_v3{0.0, did_move, {-1, -1, -1 }, -1, out_edges, in_edges};
-    }
-    return eval_vertex_move_v3(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
-}
-
-[[maybe_unused]] Blockmodel &finetune_assignment(Blockmodel &blockmodel, Graph &graph) {
-    std::cout << "Fine-tuning partition results after sample results have been extended to full graph" << std::endl;
-    std::vector<double> delta_entropies;
-    // TODO: Add number of finetuning iterations to evaluation
-    long total_vertex_moves = 0;
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
-        long vertex_moves = 0;
-        double delta_entropy = 0.0;
-        for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
-            VertexMove proposal = propose_move(blockmodel, vertex, graph);
-            if (proposal.did_move) {
-                vertex_moves++;
-                delta_entropy += proposal.delta_entropy;
-            }
-        }
-        delta_entropies.push_back(delta_entropy);
-        std::cout << "Itr: " << iteration << ", number of finetuning moves: " << vertex_moves << ", delta S: ";
-        std::cout << delta_entropy / blockmodel.getOverall_entropy() << std::endl;
-        total_vertex_moves += vertex_moves;
-        // Early stopping
-        if (early_stop(iteration, blockmodel.getOverall_entropy(), delta_entropies)) {
-            break;
-        }
-    }
-    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
-    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
-    std::cout << blockmodel.getOverall_entropy() << std::endl;
-    return blockmodel;
-}
-
-std::vector<std::pair<long, long>> sort_blocks_by_neighbors(const Blockmodel &blockmodel) {
-    std::vector<std::pair<long, long>> block_neighbors;
-    for (long i = 0; i < blockmodel.num_blocks(); ++i) {
-        block_neighbors.emplace_back(std::make_pair(i, blockmodel.blockmatrix()->distinct_edges(i)));
-    }
-    utils::radix_sort(block_neighbors);
-//    std::sort(block_neighbors.begin(), block_neighbors.end(),
-//              [](const std::pair<long, long> &a, const std::pair<long, long> &b) { return a.second > b.second; });
-//    std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
-    return block_neighbors;
-}
-
-std::vector<std::pair<long, long>> sort_blocks_by_size(const Blockmodel &blockmodel) {
-    std::vector<std::pair<long,long>> block_sizes;
-    for (long i = 0; i < blockmodel.num_blocks(); ++i) {
-        block_sizes.emplace_back(std::make_pair(i, 0));
-    }
-    for (const long &block : blockmodel.block_assignment()) {
-        block_sizes[block].second++;
-    }
-    utils::radix_sort(block_sizes);
-//    std::sort(block_sizes.begin(), block_sizes.end(),
-//              [](const std::pair<long, long> &a, const std::pair<long, long> &b) { return a.second > b.second; });
-//    std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
-    return block_sizes;
-}
-
-std::vector<std::pair<long,long>> sort_vertices_by_degree(const Graph &graph) {
-    std::vector<std::pair<long,long>> vertex_degrees;
-    for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
-        long degree = (long)(graph.out_neighbors(vertex).size() + graph.in_neighbors(vertex).size());
-        vertex_degrees.emplace_back(std::make_pair(vertex, degree));
-    }
-    utils::radix_sort(vertex_degrees);
-//    std::sort(vertex_degrees.begin(), vertex_degrees.end(),
-//              [](const std::pair<long, long> &a, const std::pair<long, long> &b) { return a.second > b.second; });
-//    std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
-    return vertex_degrees;
-}
-
-//namespace undirected {
-//
-//double mdl(const Blockmodel &blockmodel, long num_vertices, long num_edges) {
-//    std::cout << "undirected!" << std::endl;
-//    double log_posterior_p = blockmodel.log_posterior_probability(num_edges);
-//    if (std::isnan(log_posterior_p)) {
-//        std::cout << "nan in log posterior" << std::endl;
-//        exit(-5000);
-//    }
-//    double x = blockmodel.getNum_blocks() * (blockmodel.num_blocks() + 1.0) / (2.0 * num_edges);
-//    if (std::isnan(x)) {
-//        std::cout << "nan in X" << std::endl;
-//        exit(-5000);
-//    }
-//    double h = ((1 + x) * log(1 + x)) - (x * log(x));
-//    if (std::isnan(h)) {
-//        std::cout << "nan in h()" << std::endl;
-//    }
-//        // std::cout << "X: " << x << std::endl;
-//        // std::cout << "log(X): " << log(x) << std::endl;
-//    if (std::isnan(h)) {
-//        exit(-5000);
-//    }
-//    double first = (num_edges * h) + (num_vertices * log(blockmodel.num_blocks()));
-//    std::cout << "first: " << first << " log_posterior: " << log_posterior_p << std::endl;
-//    double result = (num_edges * h) + (num_vertices * log(blockmodel.num_blocks())) - log_posterior_p;
-//    if (std::isnan(result)) {
-//        std::cout << "nan in result" << std::endl;
-//        exit(-5000);
-//    }
-//    return result;
-//}
-//
-//}  // namespace undirected
-
-}  // namespace finetune
+#include "finetune.hpp"
+
+#include "args.hpp"
+#include "entropy.hpp"
+#include "mpi_data.hpp"
+#include "rng.hpp"
+#include "utils.hpp"
+#include "typedefs.hpp"
+
+#include <algorithm>
+#include <cassert>
+#include <chrono>
+//#include <fenv.h>
+#include <iostream>
+#include <random>
+
+namespace finetune {
+
+long num_surrounded = 0;
+//std::ofstream my_file;
+
+bool accept(double delta_entropy, double hastings_correction) {
+    if (!args.hastings_correction) {
+        return delta_entropy < 0.0;
+    }
+    std::uniform_real_distribution<double> distribution(0.0, 1.0);
+    double random_probability = distribution(rng::generator());
+    // NOTE: 3.0 can be a user parameter (beta) -- higher value favors exploitation
+    double accept_probability = exp(-3.0 * delta_entropy) * hastings_correction;
+    accept_probability = (accept_probability >= 1.0) ? 1.0 : accept_probability;
+    return random_probability <= accept_probability;
+}
+
+Blockmodel &asynchronous_gibbs(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
+    std::cout << "Asynchronous Gibbs iteration" << std::endl;
+    if (blockmodel.num_blocks() == 1) {
+        return blockmodel;
+    }
+    std::vector<double> delta_entropies;
+    std::vector<long> vertex_moves;
+    std::vector<long> vertices = utils::range<long>(0, graph.num_vertices());
+    long total_vertex_moves = 0;
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    double initial_entropy = blockmodel.getOverall_entropy();
+    double last_entropy = initial_entropy;
+    std::vector<long> shuffled_vertices;
+    if (!args.ordered) {
+        shuffled_vertices = utils::range<long>(0, graph.num_vertices());
+    }
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+        if (!args.ordered) {
+            unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
+            std::shuffle(shuffled_vertices.begin(), shuffled_vertices.end(), std::mt19937_64(seed));
+        }
+        long _vertex_moves = 0;
+        double num_batches = args.batches;
+        long batch_size = long(ceil(double(graph.num_vertices()) / num_batches));
+        for (long batch = 0; batch < graph.num_vertices() / batch_size; ++batch) {
+            long start = batch * batch_size;
+            long end = std::min(graph.num_vertices(), (batch + 1) * batch_size);
+            // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
+            // asynchronous Gibbs sampling
+            std::vector<VertexMove_v3> moves(graph.num_vertices());
+            double start_t = MPI_Wtime();
+            #pragma omp parallel for schedule(dynamic) default(none) \
+            shared(start, end, blockmodel, graph, _vertex_moves, moves, args, shuffled_vertices)
+            for (long index = start; index < end; ++index) {
+                long vertex = index;
+                if (!args.ordered) {
+                    vertex = shuffled_vertices[index];
+                }
+                VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
+                if (proposal.did_move) {
+                    _vertex_moves++;
+                }
+                moves[vertex] = proposal;
+            }
+            double parallel_t = MPI_Wtime();
+            timers::MCMC_parallel_time += parallel_t - start_t;
+            for (const VertexMove_v3 &move : moves) {
+                if (!move.did_move) continue;
+                if (blockmodel.move_vertex(move)) {
+                    _vertex_moves++;
+                }
+            }
+            timers::MCMC_vertex_move_time += MPI_Wtime() - parallel_t;
+        }
+        double entropy = entropy::mdl(blockmodel, graph);
+        double delta_entropy = entropy - last_entropy;
+        delta_entropies.push_back(delta_entropy);
+        last_entropy = entropy;
+        vertex_moves.push_back(_vertex_moves);
+        timers::MCMC_moves += _vertex_moves;
+        std::cout << "Itr: " << iteration << ", number of vertex moves: " << _vertex_moves << ", delta S: ";
+        std::cout << delta_entropy / initial_entropy << std::endl;
+        total_vertex_moves += _vertex_moves;
+        timers::MCMC_iterations++;
+        // Early stopping
+        if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+    std::cout << blockmodel.getOverall_entropy() << std::endl;
+    return blockmodel;
+}
+
+Blockmodel &asynchronous_gibbs_load_balanced(Blockmodel &blockmodel, const Graph &graph,
+                                             bool golden_ratio_not_reached) {
+    std::cout << "Load Balanced Asynchronous Gibbs iteration" << std::endl;
+    if (blockmodel.num_blocks() == 1) {
+        return blockmodel;
+    }
+//    std::vector<std::vector<long>> thread_vertices = load_balance(graph);
+    std::vector<double> delta_entropies;
+    std::vector<long> vertex_moves;
+    std::vector<long> all_vertices = utils::range<long>(0, graph.num_vertices());
+    long total_vertex_moves = 0;
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    double initial_entropy = blockmodel.getOverall_entropy();
+    double last_entropy = initial_entropy;
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+        std::shuffle(all_vertices.begin(), all_vertices.end(), rng::generator());
+        long _vertex_moves = 0;
+        size_t num_batches = args.batches;
+        auto batch_size = size_t(ceil(double(graph.num_vertices()) / double(num_batches)));
+        long num_processed_vertices = 0;
+        for (size_t batch = 0; batch < num_batches; ++batch) {
+            size_t start = batch * batch_size;
+            size_t end = std::min(all_vertices.size(), (batch + 1) * batch_size);
+            std::vector<VertexMove_v3> moves(graph.num_vertices());
+            std::vector<std::vector<long>> thread_vertices = load_balance(graph, all_vertices, start, end);
+            double parallel_start_t = MPI_Wtime();
+            #pragma omp parallel default(none) shared(args, batch, blockmodel, std::cout, graph, moves, \
+            num_batches, thread_vertices, num_processed_vertices)
+            {
+                assert(omp_get_num_threads() == (int) thread_vertices.size());
+                std::vector<long> &vertices = thread_vertices[omp_get_thread_num()];
+//                auto batch_size = size_t(ceil(double(vertices.size()) / double(num_batches)));
+//                size_t start = batch * batch_size;
+//                size_t end = std::min(vertices.size(), (batch + 1) * batch_size);
+//                std::shuffle(vertices.begin() + long(start), vertices.begin() + long(end), rng::generator());
+                for (long vertex : vertices) { // size_t vertex_index = start; vertex_index < end; ++vertex_index) {
+//                    long vertex = vertices[vertex_index];
+                    VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
+                    moves[vertex] = proposal;
+                    #pragma omp atomic
+                    num_processed_vertices++;
+                }
+            }
+            double end_parallel_t = MPI_Wtime();
+            timers::MCMC_parallel_time += end_parallel_t - parallel_start_t;
+            for (const VertexMove_v3 &move: moves) {
+                if (!move.did_move) continue;
+                if (blockmodel.move_vertex(move)) {
+                    _vertex_moves++;
+                }
+            }
+            timers::MCMC_vertex_move_time += MPI_Wtime() - end_parallel_t;
+        }
+        assert(num_processed_vertices == graph.num_vertices());
+        double entropy = entropy::mdl(blockmodel, graph);
+        double delta_entropy = entropy - last_entropy;
+        delta_entropies.push_back(delta_entropy);
+        last_entropy = entropy;
+        vertex_moves.push_back(_vertex_moves);
+        timers::MCMC_moves += _vertex_moves;
+        std::cout << "Itr: " << iteration << ", number of vertex moves: " << _vertex_moves << ", delta S: ";
+        std::cout << delta_entropy / initial_entropy << std::endl;
+        total_vertex_moves += _vertex_moves;
+        timers::MCMC_iterations++;
+        // Early stopping
+        if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+    std::cout << blockmodel.getOverall_entropy() << std::endl;
+    return blockmodel;
+}
+
+EdgeWeights block_edge_weights(const std::vector<long> &block_assignment, const EdgeWeights &neighbor_weights) {
+    std::map<long, long> block_counts;
+    for (ulong i = 0; i < neighbor_weights.indices.size(); ++i) {
+        long neighbor = neighbor_weights.indices[i];
+        long block = block_assignment[neighbor];
+        long weight = neighbor_weights.values[i];
+        block_counts[block] += weight; // block_count[new block] should initialize to 0
+    }
+    std::vector<long> blocks;
+    std::vector<long> weights;
+    for (auto const &entry: block_counts) {
+        blocks.push_back(entry.first);
+        weights.push_back(entry.second);
+    }
+    return EdgeWeights{blocks, weights};
+}
+
+Delta blockmodel_delta(long vertex, long current_block, long proposed_block, const EdgeWeights &out_edges,
+                       const EdgeWeights &in_edges, const Blockmodel &blockmodel) {
+    Delta delta(current_block, proposed_block, long(std::max(out_edges.indices.size(), in_edges.indices.size())));
+
+    // current_block -> current_block == proposed_block --> proposed_block  (this includes self edges)
+    // current_block --> other_block == proposed_block --> other_block
+    // other_block --> current_block == other_block --> proposed_block
+    // current_block --> proposed_block == proposed_block --> proposed_block
+    // proposed_block --> current_block == proposed_block --> proposed_block
+    for (size_t i = 0; i < out_edges.indices.size(); ++i) {
+        long out_vertex = out_edges.indices[i];
+        long out_block = blockmodel.block_assignment(out_vertex);
+        long edge_weight = out_edges.values[i];
+        if (vertex == out_vertex) {
+            delta.add(proposed_block, proposed_block, edge_weight);
+            delta.self_edge_weight(1);
+        } else {
+            delta.add(proposed_block, out_block, edge_weight);
+        }
+        delta.sub(current_block, out_block, edge_weight);
+    }
+    for (size_t i = 0; i < in_edges.indices.size(); ++i) {
+        long in_vertex = in_edges.indices[i];
+        long in_block = blockmodel.block_assignment(in_vertex);
+        long edge_weight = in_edges.values[i];
+        if (vertex == in_vertex) {
+            delta.add(proposed_block, proposed_block, edge_weight);
+            delta.self_edge_weight(1);
+        } else {
+            delta.add(in_block, proposed_block, edge_weight);
+        }
+        delta.sub(in_block, current_block, edge_weight);
+    }
+    return delta;
+}
+
+std::pair<std::vector<long>, long> count_low_degree_block_neighbors(const Graph &graph, const Blockmodel &blockmodel) {
+    const std::vector<long> &low_degree_vertices = graph.low_degree_vertices();
+    std::vector<long> result = utils::constant<long>(blockmodel.num_blocks(), 0);
+    long total = 0;
+    for (long vertex : low_degree_vertices) {
+        long block = blockmodel.block_assignment(vertex);
+        long neighbors = blockmodel.blockmatrix()->distinct_edges(block);
+        result[block] = std::max(result[block], neighbors);
+        total += neighbors;
+    }
+    std::vector<long> ownership = utils::constant<long>((long) low_degree_vertices.size(), -1);
+    long thread = 0;
+    long current_neighbors = 0;
+    long average_neighbors = total / omp_get_max_threads();
+    for (long index = 0; index < (long) low_degree_vertices.size(); ++index) {
+        long vertex = low_degree_vertices[index];
+        long block = blockmodel.block_assignment(vertex);
+        long neighbors = result[block];
+        current_neighbors += neighbors;
+        ownership[index] = thread;
+        if (current_neighbors >= average_neighbors && thread < (omp_get_max_threads() - 1)) {
+            current_neighbors = 0;
+            thread++;
+        }
+    }
+    return std::make_pair(ownership, total);
+}
+
+bool early_stop(long iteration, bool golden_ratio_not_reached, double initial_entropy,
+                std::vector<double> &delta_entropies) {
+    size_t last_index = delta_entropies.size() - 1;
+    if (delta_entropies[last_index] == 0.0) {
+        return true;
+    }
+    if (iteration < 3) {
+        return false;
+    }
+    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
+    average /= -3.0;
+    double threshold;
+    if (golden_ratio_not_reached) { // Golden ratio bracket not yet established
+        threshold = 5e-4 * initial_entropy;
+    } else {
+        threshold = 1e-4 * initial_entropy;
+    }
+    return average < threshold;
+}
+
+bool early_stop(long iteration, double initial_entropy, std::vector<double> &delta_entropies) {
+    if (iteration < 3) {
+        return false;
+    }
+    size_t last_index = delta_entropies.size() - 1;
+    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
+    average /= -3.0;
+    double threshold = 1e-4 * initial_entropy;
+    return average < threshold;
+}
+
+bool early_stop_parallel(long iteration, bool golden_ratio_not_reached, double initial_entropy,
+                         std::vector<double> &delta_entropies, std::vector<long> &vertex_moves) {
+    size_t last_index = delta_entropies.size() - 1;
+    if (vertex_moves[last_index] == 0) {
+        return true;
+    }
+    if (iteration < 4) {
+        return false;
+    }
+    double average = delta_entropies[last_index] + delta_entropies[last_index - 1] + delta_entropies[last_index - 2];
+    average /= -3.0;
+    double threshold;
+    if (golden_ratio_not_reached) { // Golden ratio bracket not yet established
+        threshold = 5e-4 * initial_entropy;
+    } else {
+        threshold = 1e-4 * initial_entropy;
+    }
+    if (average < threshold) return true;
+    long max = std::max(vertex_moves[last_index - 1],
+                       std::max(vertex_moves[last_index - 2], vertex_moves[last_index - 3]));
+    return vertex_moves[last_index] > max;
+}
+
+[[maybe_unused]] EdgeCountUpdates edge_count_updates(ISparseMatrix *blockmodel, long current_block, long proposed_block,
+                                                     EdgeWeights &out_blocks, EdgeWeights &in_blocks,
+                                                     long self_edge_weight) {
+    std::vector<long> block_row = blockmodel->getrow(current_block);
+    std::vector<long> block_col = blockmodel->getcol(current_block);
+    std::vector<long> proposal_row = blockmodel->getrow(proposed_block);
+    std::vector<long> proposal_col = blockmodel->getcol(proposed_block);
+
+    long count_in_block = 0, count_out_block = 0;
+    long count_in_proposal = self_edge_weight, count_out_proposal = self_edge_weight;
+
+    for (ulong i = 0; i < in_blocks.indices.size(); ++i) {
+        long index = in_blocks.indices[i];
+        long value = in_blocks.values[i];
+        if (index == current_block) {
+            count_in_block += value;
+        }
+        if (index == proposed_block) {
+            count_in_proposal += value;
+        }
+        block_col[index] -= value;
+        proposal_col[index] += value;
+    }
+    for (ulong i = 0; i < out_blocks.indices.size(); ++i) {
+        long index = out_blocks.indices[i];
+        long value = out_blocks.values[i];
+        if (index == current_block) {
+            count_out_block += value;
+        }
+        if (index == proposed_block) {
+            count_out_proposal += value;
+        }
+        block_row[index] -= value;
+        proposal_row[index] += value;
+    }
+
+    proposal_row[current_block] -= count_in_proposal;
+    proposal_row[proposed_block] += count_in_proposal;
+    proposal_col[current_block] -= count_out_proposal;
+    proposal_col[proposed_block] += count_out_proposal;
+
+    block_row[current_block] -= count_in_block;
+    block_row[proposed_block] += count_in_block;
+    block_col[current_block] -= count_out_block;
+    block_col[proposed_block] += count_out_block;
+
+    return EdgeCountUpdates{block_row, proposal_row, block_col, proposal_col};
+}
+
+// TODO: remove double counts from the edge count updates? But then we'll have to figure out how to correctly update
+// the blockmodel since we won't be able to do bm.column[block/proposal] = updates.block/proposal_col
+void edge_count_updates_sparse(const Blockmodel &blockmodel, long vertex, long current_block, long proposed_block,
+                               EdgeWeights &out_edges, EdgeWeights &in_edges, SparseEdgeCountUpdates &updates) {
+    updates.block_row = blockmodel.blockmatrix()->getrow_sparse(current_block);
+    updates.block_col = blockmodel.blockmatrix()->getcol_sparse(current_block);
+    updates.proposal_row = blockmodel.blockmatrix()->getrow_sparse(proposed_block);
+    updates.proposal_col = blockmodel.blockmatrix()->getcol_sparse(proposed_block);
+
+    for (size_t i = 0; i < out_edges.indices.size(); ++i) {
+        long out_vertex = out_edges.indices[i];
+        long out_block = blockmodel.block_assignment(out_vertex);
+        long edge_weight = out_edges.values[i];
+        if (vertex == out_vertex) {
+            updates.proposal_row[proposed_block] += edge_weight;
+            updates.proposal_col[proposed_block] += edge_weight;
+        } else {
+            updates.proposal_row[out_block] += edge_weight;
+            if (out_block == proposed_block)
+                updates.proposal_col[proposed_block] += edge_weight;
+            if (out_block == current_block)
+                updates.block_col[proposed_block] += edge_weight;
+        }
+        updates.block_row[out_block] -= edge_weight;
+        if (out_block == current_block)
+            updates.block_col[current_block] -= edge_weight;
+        if (out_block == proposed_block)
+            updates.proposal_col[current_block] -= edge_weight;
+    }
+    for (size_t i = 0; i < in_edges.indices.size(); ++i) {
+        long in_vertex = in_edges.indices[i];
+        long in_block = blockmodel.block_assignment(in_vertex);
+        long edge_weight = in_edges.values[i];
+        if (vertex == in_vertex) {
+            updates.proposal_col[proposed_block] += edge_weight;
+            updates.proposal_row[proposed_block] += edge_weight;
+        } else {
+            updates.proposal_col[in_block] += edge_weight;
+            if (in_block == proposed_block)
+                updates.proposal_row[proposed_block] += edge_weight;
+            if (in_block == current_block)
+                updates.block_row[proposed_block] += edge_weight;
+        }
+        updates.block_col[in_block] -= edge_weight;
+        if (in_block == current_block)
+            updates.block_row[current_block] -= edge_weight;
+        if (in_block == proposed_block)
+            updates.proposal_row[current_block] -= edge_weight;
+    }
+}
+
+EdgeWeights edge_weights(const NeighborList &neighbors, long vertex, bool ignore_self) {
+    std::vector<long> indices;
+    std::vector<long> values;
+    // Assumes graph is unweighted
+    const std::vector<long> &neighbor_vector = neighbors[vertex];
+    for (const long neighbor: neighbor_vector) {
+        if (ignore_self && neighbor == vertex) continue;
+        indices.push_back(neighbor);
+        values.push_back(1);
+    }
+    return EdgeWeights{indices, values};
+}
+
+VertexMove eval_vertex_move(long vertex, long current_block, utils::ProposalAndEdgeCounts proposal,
+                            const Blockmodel &blockmodel, const Graph &graph, EdgeWeights &out_edges,
+                            EdgeWeights &in_edges) {
+//    if (args.nodelta)
+//        return eval_vertex_move_nodelta(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
+    const Delta delta = blockmodel_delta(vertex, current_block, proposal.proposal, out_edges, in_edges, blockmodel);
+    double hastings = entropy::hastings_correction(vertex, graph, blockmodel, delta, current_block, proposal);
+    double delta_entropy = !args.parametric ?
+            entropy::nonparametric::delta_mdl(blockmodel, graph, vertex, delta, proposal) :
+            entropy::delta_mdl(blockmodel, delta, proposal);
+    if (accept(delta_entropy, hastings))
+        return VertexMove{delta_entropy, true, vertex, proposal.proposal};
+    return VertexMove{delta_entropy, false, -1, -1};
+}
+
+VertexMove_v3 eval_vertex_move_v3(long vertex, long current_block, utils::ProposalAndEdgeCounts proposal,
+                                 const Blockmodel &blockmodel, const Graph &graph, EdgeWeights &out_edges,
+                                 EdgeWeights &in_edges) {
+    // TODO: things like size and such should be ulong/size_t, not long.
+    Vertex v = { vertex, long(graph.out_neighbors(vertex).size()), long(graph.in_neighbors(vertex).size()) };
+    const Delta delta = blockmodel_delta(vertex, current_block, proposal.proposal, out_edges, in_edges, blockmodel);
+    double hastings = entropy::hastings_correction(vertex, graph, blockmodel, delta, current_block, proposal);
+    double delta_entropy = !args.parametric ?
+                           entropy::nonparametric::delta_mdl(blockmodel, graph, vertex, delta, proposal) :
+                           entropy::delta_mdl(blockmodel, delta, proposal);
+    if (accept(delta_entropy, hastings))
+        return VertexMove_v3{delta_entropy, true, v, proposal.proposal, out_edges, in_edges};
+//        return VertexMove_v2{delta_entropy, true, vertex, proposal.proposal, out_edges, in_edges};
+    return VertexMove_v3{delta_entropy, false, InvalidVertex, -1, out_edges, in_edges};
+//    return VertexMove_v2{delta_entropy, false, -1, -1, out_edges, in_edges};
+}
+
+Blockmodel &hybrid_mcmc_load_balanced(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
+        std::cout << "Hybrid MCMC iteration" << std::endl;
+        if (blockmodel.num_blocks() == 1) {
+            return blockmodel;
+        }
+        std::vector<double> delta_entropies;
+        long total_vertex_moves = 0;
+        blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+        double initial_entropy = blockmodel.getOverall_entropy();
+        double num_batches = args.batches;
+        long num_low_degree_vertices = long(graph.low_degree_vertices().size());
+        long batch_size = long(ceil(num_low_degree_vertices / num_batches));
+        std::vector<unsigned long> thread_degrees(omp_get_max_threads());
+//        std::vector<std::pair<long,long>> vertex_properties = sort_vertices_by_degree(graph);
+
+        for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+//            std::vector<std::pair<long,long>> block_neighbors = sort_vertices_by_degree(graph);
+            for (long i = 0; i < omp_get_max_threads(); ++i) {
+                thread_degrees[i] = 0;
+            }
+            num_surrounded = 0;
+            long vertex_moves = 0;
+            double delta_entropy = 0.0;
+            double start_t = MPI_Wtime();
+            for (long vertex : graph.high_degree_vertices()) {  // Only run Metropolis-Hastings on high-degree vertices
+                VertexMove proposal = propose_move(blockmodel, vertex, graph);
+                if (proposal.did_move) {
+                    vertex_moves++;
+                    delta_entropy += proposal.delta_entropy;
+                }
+            }
+            double sequential_t = MPI_Wtime();
+            timers::MCMC_sequential_time += sequential_t - start_t;
+            std::pair<std::vector<long>, long> block_neighbors = count_low_degree_block_neighbors(graph, blockmodel);
+            // TODO: make sure that with batches, we still go over every vertex in the graph
+            for (long batch = 0; batch < num_low_degree_vertices / batch_size; ++batch) {
+                long start = batch * batch_size;
+                long end = std::min(num_low_degree_vertices, (batch + 1) * batch_size);
+                // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
+                // asynchronous Gibbs sampling
+                std::vector<long> block_assignment(blockmodel.block_assignment());
+                std::vector<VertexMove_v3> moves(graph.num_vertices());
+//                omp_set_dynamic(0);
+                start_t = MPI_Wtime();
+                #pragma omp parallel default(none) shared(start, end, blockmodel, graph, vertex_moves, delta_entropy, block_assignment, moves, thread_degrees, block_neighbors, std::cout)
+                {
+                    long thread_id = omp_get_thread_num();
+                    if (thread_id == 0)
+                        std::cout << "Using " << omp_get_num_threads() << "/" << omp_get_max_threads() << " threads!" << std::endl;
+                    // TODO: figure out how to make this happen once per iteration
+//                    std::vector<bool> my_blocks = load_balance(blockmodel, block_neighbors);
+//                    std::vector<bool> my_vertices = load_balance_vertices(graph, vertex_properties);
+//                    std::vector<bool> my_vertices = load_balance_block_neighbors(graph, blockmodel, block_neighbors);
+//                    long num_processed = 0;
+                    for (long index = start; index < end; ++index) {
+                        if (block_neighbors.first[index] != thread_id) continue;
+                        long vertex = graph.low_degree_vertices()[index];
+//                        if (!my_vertices[vertex]) continue;
+                        long block = blockmodel.block_assignment(vertex);
+//                        if (!my_blocks[block]) continue;  // Only process the vertices this thread is responsible for
+                        unsigned long num_neighbors = blockmodel.blockmatrix()->distinct_edges(block);
+                        thread_degrees[thread_id] += num_neighbors;
+                        VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
+                        if (proposal.did_move) {
+                            #pragma omp atomic
+                            vertex_moves++;
+                            delta_entropy += proposal.delta_entropy;
+                            block_assignment[vertex] = proposal.proposed_block;
+                        }
+                        moves[vertex] = proposal;
+//                        num_processed++;
+                    }
+//                    std::cout << thread_id << ": " << num_processed << std::endl;
+                }
+                double parallel_t = MPI_Wtime();
+                timers::MCMC_parallel_time += parallel_t - start_t;
+                for (const VertexMove_v3 &move : moves) {
+                    if (!move.did_move) continue;
+                    blockmodel.move_vertex(move);
+                }
+                timers::MCMC_vertex_move_time += MPI_Wtime() - parallel_t;
+            }
+            delta_entropies.push_back(delta_entropy);
+            std::cout << "Itr: " << iteration << ", number of vertex moves: " << vertex_moves << ", delta S: ";
+            std::cout << delta_entropy / initial_entropy << ", num surrounded vertices: " << num_surrounded << std::endl;
+            total_vertex_moves += vertex_moves;
+            timers::MCMC_iterations++;
+            // Early stopping
+            if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
+                break;
+            }
+        }
+        blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+        std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+        std::cout << blockmodel.getOverall_entropy() << std::endl;
+        timers::MCMC_moves += total_vertex_moves;
+        return blockmodel;
+    }
+
+Blockmodel &hybrid_mcmc(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
+    std::cout << "Hybrid MCMC iteration" << std::endl;
+    if (blockmodel.num_blocks() == 1) {
+        return blockmodel;
+    }
+    std::vector<double> delta_entropies;
+    std::vector<long> vertex_moves;
+    long total_vertex_moves = 0;
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    double initial_entropy = blockmodel.getOverall_entropy();
+    double last_entropy = initial_entropy;
+    double num_batches = args.batches;
+    long num_low_degree_vertices = long(graph.low_degree_vertices().size());
+    long batch_size = long(ceil(num_low_degree_vertices / num_batches));
+    std::vector<long> hdv = graph.high_degree_vertices();
+    std::vector<long> ldv = graph.low_degree_vertices();
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+//        std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
+        num_surrounded = 0;
+        long _vertex_moves = 0;
+        double start_t = MPI_Wtime();
+        if (!args.ordered) {
+            unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
+            std::shuffle(hdv.begin(), hdv.end(), std::mt19937_64(seed));
+            std::shuffle(ldv.begin(), ldv.end(), std::mt19937_64(seed));
+        }
+        for (long vertex : hdv) {  // Only run Metropolis-Hastings on high-degree vertices
+            VertexMove proposal = propose_move(blockmodel, vertex, graph);
+            if (proposal.did_move) {
+                _vertex_moves++;
+            }
+        }
+        double sequential_t = MPI_Wtime();
+        timers::MCMC_sequential_time += sequential_t - start_t;
+//        assert(blockmodel.validate(graph));
+        for (long batch = 0; batch < num_low_degree_vertices / batch_size; ++batch) {
+            start_t = MPI_Wtime();
+            long start = batch * batch_size;
+            long end = std::min(num_low_degree_vertices, (batch + 1) * batch_size);
+            // Block assignment used to re-create the Blockmodel after each batch to improve mixing time of
+            // asynchronous Gibbs sampling
+            std::vector<VertexMove_v3> moves(graph.num_vertices());
+            #pragma omp parallel for schedule(dynamic) default(none) \
+            shared(start, end, blockmodel, graph, _vertex_moves, moves, ldv)
+            for (long index = start; index < end; ++index) {
+                long vertex = ldv[index];
+                VertexMove_v3 proposal = propose_gibbs_move_v3(blockmodel, vertex, graph);
+                if (proposal.did_move) {
+                    #pragma omp atomic
+                    _vertex_moves++;
+                }
+                moves[vertex] = proposal;
+            }
+            double parallel_t = MPI_Wtime();
+            timers::MCMC_parallel_time += parallel_t - start_t;
+            for (const VertexMove_v3 &move : moves) {
+                if (!move.did_move) continue;
+                if (blockmodel.move_vertex(move)) {
+                    _vertex_moves++;
+                }
+            }
+            timers::MCMC_vertex_move_time += MPI_Wtime() - parallel_t;
+//            assert(blockmodel.validate(graph));
+        }
+        double entropy = entropy::mdl(blockmodel, graph);
+        double delta_entropy = entropy - last_entropy;
+        delta_entropies.push_back(delta_entropy);
+        last_entropy = entropy;
+        vertex_moves.push_back(_vertex_moves);
+        std::cout << "Itr: " << iteration << ", number of vertex moves: " << _vertex_moves << ", delta S: ";
+        std::cout << delta_entropy / initial_entropy << ", num surrounded vertices: " << num_surrounded << std::endl;
+        total_vertex_moves += _vertex_moves;
+        timers::MCMC_iterations++;
+        // Early stopping
+        if (early_stop(iteration, golden_ratio_not_reached, initial_entropy, delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    timers::MCMC_moves += total_vertex_moves;
+    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+    std::cout << blockmodel.getOverall_entropy() << std::endl;
+    return blockmodel;
+}
+
+std::vector<std::vector<long>> load_balance(const Graph &graph) {
+    std::vector<std::vector<long>> thread_vertices(args.threads);
+    std::vector<long> vertex_degrees = graph.degrees();
+    std::vector<long> sorted_indices = utils::argsort<long>(vertex_degrees);
+    for (size_t index = 0; index < size_t(graph.num_vertices()); ++index) {
+        long vertex = sorted_indices[index];
+        size_t thread_id = index % (2 * args.threads);
+        if (thread_id >= size_t(args.threads))
+            thread_id = ((2 * args.threads) - 1) - thread_id;
+        thread_vertices[thread_id].push_back(vertex);
+    }
+    return thread_vertices;
+}
+
+std::vector<std::vector<long>> load_balance(const Graph &graph, const std::vector<long> &all_vertices,
+                                            size_t start_index, size_t end_index) {
+    std::vector<std::vector<long>> thread_vertices(args.threads);
+    std::vector<long> vertex_degrees;
+    for (size_t i = start_index; i < end_index; ++i) {
+        long vertex = all_vertices[i];
+        vertex_degrees.push_back(graph.degree(vertex));
+    }
+    std::vector<long> sorted_indices = utils::argsort<long>(vertex_degrees);
+    for (size_t index = 0; index < vertex_degrees.size(); ++index) {
+        size_t sorted_index = sorted_indices[index];
+        size_t vertex_index = sorted_index + start_index;
+        long vertex = all_vertices[vertex_index];
+        size_t thread_id = sorted_index % (2 * args.threads);
+        if (thread_id >= size_t(args.threads))
+            thread_id = ((2 * args.threads) - 1) - thread_id;
+        thread_vertices[thread_id].push_back(vertex);
+    }
+    return thread_vertices;
+}
+
+
+std::vector<bool> load_balance(const Blockmodel &blockmodel, const std::vector<std::pair<long, long>> &block_neighbors) {
+    // Decide which blocks each thread is responsible for
+    long thread_id = omp_get_thread_num();
+    std::vector<bool> my_blocks = utils::constant<bool>(blockmodel.num_blocks(), false);
+    for (long i = thread_id; i < blockmodel.num_blocks(); i += 2 * omp_get_max_threads()) {
+        long block = block_neighbors[i].first;
+        my_blocks[block] = true;
+    }
+    for (long i = 2 * omp_get_max_threads() - 1 - thread_id; i < blockmodel.num_blocks(); i += 2 * omp_get_max_threads()) {
+        long block = block_neighbors[i].first;
+        my_blocks[block] = true;
+    }
+    return my_blocks;
+}
+
+std::vector<bool> load_balance_block_neighbors(const Graph &graph, const Blockmodel &blockmodel,
+                                               const std::pair<std::vector<long>, long> &block_neighbors) {
+    // Decide which blocks each thread is responsible for
+    long thread_id = omp_get_thread_num();
+    std::vector<bool> my_vertices = utils::constant<bool>(graph.num_vertices(), false);
+    long total_neighbors = block_neighbors.second;
+//    for (long num_neighbors : block_neighbors.first) {
+//        total_neighbors += num_neighbors;
+//    }
+    long average_neighbors = (long) total_neighbors / omp_get_max_threads();
+    if (thread_id == 4)
+        std::cout << "average neighbors: " << average_neighbors << " total neighbors: " << total_neighbors << std::endl << " threads: " << omp_get_max_threads();
+    long thread = 0;
+    long current_neighbors = 0;
+    for (long vertex : graph.low_degree_vertices()) {
+        long block = blockmodel.block_assignment(vertex);
+        current_neighbors += block_neighbors.first[block];
+    }
+    assert(current_neighbors == total_neighbors);
+    current_neighbors = 0;
+    for (long vertex : graph.low_degree_vertices()) {
+        long block = blockmodel.block_assignment(vertex);
+        long neighbors = block_neighbors.first[block];
+        if (thread == thread_id) {
+            my_vertices[vertex] = true;
+        }
+        current_neighbors += neighbors;
+        if (current_neighbors >= average_neighbors && thread < (omp_get_max_threads() - 1)) {
+            current_neighbors = 0;
+            thread++;
+        }
+        if (thread_id == 4) {
+            std::cout << "block: " << block << " neighbors: " << neighbors << " thread: " << thread << " current_neighbors: " << current_neighbors << std::endl;
+        }
+    }
+    if (thread_id == 4)
+        std::cout << "average neighbors: " << average_neighbors << " total neighbors: " << total_neighbors << std::endl << " threads: " << omp_get_max_threads();
+    return my_vertices;
+}
+
+std::vector<bool> load_balance_vertices(const Graph &graph, const std::vector<std::pair<long, long>> &vertex_properties) {
+    // Decide which blocks each thread is responsible for
+    long thread_id = omp_get_thread_num();
+    std::vector<bool> my_vertices = utils::constant<bool>(graph.num_vertices(), false);
+    for (long i = thread_id; i < graph.num_vertices(); i += 2 * omp_get_max_threads()) {
+        long vertex = vertex_properties[i].first;
+        my_vertices[vertex] = true;
+    }
+    for (long i = 2 * omp_get_max_threads() - 1 - thread_id; i < graph.num_vertices(); i += 2 * omp_get_max_threads()) {
+        long vertex = vertex_properties[i].first;
+        my_vertices[vertex] = true;
+    }
+    return my_vertices;
+}
+
+Blockmodel &mcmc(int iteration, const Graph &graph, Blockmodel &blockmodel, BlockmodelTriplet &blockmodel_triplet) {
+//    timers::MCMC_moves = 0;
+//    timers::MCMC_iterations = 0;
+//    timers::MCMC_vertex_move_time = 0;
+//    timers::MCMC_parallel_time = 0;
+//    timers::MCMC_sequential_time = 0;
+    common::candidates = std::uniform_int_distribution<long>(0, blockmodel.num_blocks() - 2);
+//    std::cout << "Starting MCMC vertex moves" << std::endl;
+    if (args.algorithm == "async_gibbs" && iteration < args.asynciterations)
+        blockmodel = finetune::asynchronous_gibbs(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
+    else if (args.algorithm == "async_gibbs_load_balanced" && iteration < args.asynciterations)
+        blockmodel = finetune::asynchronous_gibbs_load_balanced(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
+    else if (args.algorithm == "hybrid_mcmc")
+        blockmodel = finetune::hybrid_mcmc(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
+    else // args.algorithm == "metropolis_hastings"
+        blockmodel = finetune::metropolis_hastings(blockmodel, graph, blockmodel_triplet.golden_ratio_not_reached());
+    return blockmodel;
+}
+
+Blockmodel &metropolis_hastings(Blockmodel &blockmodel, const Graph &graph, bool golden_ratio_not_reached) {
+    std::cout << "Metropolis hastings iteration" << std::endl;
+    if (blockmodel.num_blocks() == 1) {
+        return blockmodel;
+    }
+    std::vector<double> delta_entropies;
+    long total_vertex_moves = 0;
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    std::vector<long> shuffled_vertices;
+    if (!args.ordered) {
+        shuffled_vertices = utils::range<long>(0, graph.num_vertices());
+    }
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+        if (!args.ordered) {
+            unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
+            std::shuffle(shuffled_vertices.begin(), shuffled_vertices.end(), std::mt19937_64(seed));
+        }
+        long vertex_moves = 0;
+        double delta_entropy = 0.0;
+        double start_t = MPI_Wtime();
+        for (long index = 0; index < graph.num_vertices(); ++index) {
+            long vertex = index;
+            if (!args.ordered) {
+                vertex = shuffled_vertices[index];
+            }
+            VertexMove proposal = propose_move(blockmodel, vertex, graph);
+            if (proposal.did_move) {
+                vertex_moves++;
+                delta_entropy += proposal.delta_entropy;
+            }
+        }
+        timers::MCMC_sequential_time += MPI_Wtime() - start_t;
+        delta_entropies.push_back(delta_entropy);
+        std::cout << "Itr: " << iteration << ", number of vertex moves: " << vertex_moves << ", delta S: ";
+        std::cout << delta_entropy << std::endl;
+        total_vertex_moves += vertex_moves;
+        timers::MCMC_iterations++;
+        // Early stopping
+        if (early_stop(iteration, golden_ratio_not_reached, blockmodel.getOverall_entropy(), delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    timers::MCMC_moves += total_vertex_moves;
+    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+    std::cout << blockmodel.getOverall_entropy() << std::endl;
+    return blockmodel;
+}
+
+VertexMove move_vertex(long vertex, long current_block, utils::ProposalAndEdgeCounts proposal, Blockmodel &blockmodel,
+                       const Graph &graph, EdgeWeights &out_edges, EdgeWeights &in_edges) {
+//    if (args.nodelta)
+//        return move_vertex_nodelta(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
+    Delta delta = blockmodel_delta(vertex, current_block, proposal.proposal, out_edges, in_edges,
+                                   blockmodel);
+    double hastings = entropy::hastings_correction(vertex, graph, blockmodel, delta, current_block, proposal);
+    double delta_entropy = !args.parametric ?
+                           entropy::nonparametric::delta_mdl(blockmodel, graph, vertex, delta, proposal) :
+                           entropy::delta_mdl(blockmodel, delta, proposal);
+//    double delta_entropy = entropy::delta_mdl(blockmodel, delta, proposal);
+
+    if (accept(delta_entropy, hastings)) {
+        Vertex v = { vertex, (long) graph.out_neighbors(vertex).size(), (long) graph.in_neighbors(vertex).size() };
+        blockmodel.move_vertex(v, delta, proposal);
+        return VertexMove{delta_entropy, true, vertex, proposal.proposal};
+    }
+    return VertexMove{delta_entropy, false, vertex, proposal.proposal};
+}
+
+VertexMove propose_move(Blockmodel &blockmodel, long vertex, const Graph &graph) {
+    bool did_move = false;
+    long current_block = blockmodel.block_assignment(vertex);
+    if (blockmodel.block_size(current_block) == 1) {
+        return VertexMove{std::numeric_limits<double>::max(), did_move, -1, -1 };
+    }
+    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
+    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
+
+    MapVector<long> neighbor_blocks;
+    for (long neighbor : out_edges.indices) {
+        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
+    }
+    for (long neighbor : in_edges.indices) {
+        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
+    }
+    if (neighbor_blocks.size() == 1) {
+        num_surrounded += 1;
+    }
+
+    utils::ProposalAndEdgeCounts proposal = common::propose_new_block(
+            current_block, out_edges, in_edges, blockmodel.block_assignment(), blockmodel, false);
+    if (proposal.proposal == current_block) {
+        return VertexMove{0.0, did_move, -1, -1 };
+    }
+
+    return move_vertex(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
+}
+
+VertexMove_v3 propose_gibbs_move_v3(const Blockmodel &blockmodel, long vertex, const Graph &graph) {
+    bool did_move = false;
+    long current_block = blockmodel.block_assignment(vertex);
+    // TODO: need to do this more intelligently. Instead of preventing moves here, prevent them in the code that
+    // actually does the moves.
+//    if (blockmodel.block_size(current_block) <= args.threads) {
+//        return VertexMove_v3{ 0.0, did_move, InvalidVertex, -1 };
+//    }
+
+    EdgeWeights out_edges = edge_weights(graph.out_neighbors(), vertex, false);
+    EdgeWeights in_edges = edge_weights(graph.in_neighbors(), vertex, true);
+
+    MapVector<long> neighbor_blocks;
+    for (long neighbor : out_edges.indices) {
+        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
+    }
+    for (long neighbor : in_edges.indices) {
+        neighbor_blocks[blockmodel.block_assignment(neighbor)] += 1;
+    }
+    if (neighbor_blocks.size() == 1) {
+        num_surrounded += 1;
+    }
+
+    utils::ProposalAndEdgeCounts proposal = common::propose_new_block(current_block, out_edges, in_edges,
+                                                                      blockmodel.block_assignment(), blockmodel,
+                                                                      false);
+    if (proposal.proposal == current_block) {
+        return VertexMove_v3{0.0, did_move, {-1, -1, -1 }, -1, out_edges, in_edges};
+    }
+    return eval_vertex_move_v3(vertex, current_block, proposal, blockmodel, graph, out_edges, in_edges);
+}
+
+[[maybe_unused]] Blockmodel &finetune_assignment(Blockmodel &blockmodel, Graph &graph) {
+    std::cout << "Fine-tuning partition results after sample results have been extended to full graph" << std::endl;
+    std::vector<double> delta_entropies;
+    // TODO: Add number of finetuning iterations to evaluation
+    long total_vertex_moves = 0;
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    for (long iteration = 0; iteration < MAX_NUM_ITERATIONS; ++iteration) {
+        long vertex_moves = 0;
+        double delta_entropy = 0.0;
+        for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
+            VertexMove proposal = propose_move(blockmodel, vertex, graph);
+            if (proposal.did_move) {
+                vertex_moves++;
+                delta_entropy += proposal.delta_entropy;
+            }
+        }
+        delta_entropies.push_back(delta_entropy);
+        std::cout << "Itr: " << iteration << ", number of finetuning moves: " << vertex_moves << ", delta S: ";
+        std::cout << delta_entropy / blockmodel.getOverall_entropy() << std::endl;
+        total_vertex_moves += vertex_moves;
+        // Early stopping
+        if (early_stop(iteration, blockmodel.getOverall_entropy(), delta_entropies)) {
+            break;
+        }
+    }
+    blockmodel.setOverall_entropy(entropy::mdl(blockmodel, graph));
+    std::cout << "Total number of vertex moves: " << total_vertex_moves << ", overall entropy: ";
+    std::cout << blockmodel.getOverall_entropy() << std::endl;
+    return blockmodel;
+}
+
+std::vector<std::pair<long, long>> sort_blocks_by_neighbors(const Blockmodel &blockmodel) {
+    std::vector<std::pair<long, long>> block_neighbors;
+    for (long i = 0; i < blockmodel.num_blocks(); ++i) {
+        block_neighbors.emplace_back(std::make_pair(i, blockmodel.blockmatrix()->distinct_edges(i)));
+    }
+    utils::radix_sort(block_neighbors);
+//    std::sort(block_neighbors.begin(), block_neighbors.end(),
+//              [](const std::pair<long, long> &a, const std::pair<long, long> &b) { return a.second > b.second; });
+//    std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
+    return block_neighbors;
+}
+
+std::vector<std::pair<long, long>> sort_blocks_by_size(const Blockmodel &blockmodel) {
+    std::vector<std::pair<long,long>> block_sizes;
+    for (long i = 0; i < blockmodel.num_blocks(); ++i) {
+        block_sizes.emplace_back(std::make_pair(i, 0));
+    }
+    for (const long &block : blockmodel.block_assignment()) {
+        block_sizes[block].second++;
+    }
+    utils::radix_sort(block_sizes);
+//    std::sort(block_sizes.begin(), block_sizes.end(),
+//              [](const std::pair<long, long> &a, const std::pair<long, long> &b) { return a.second > b.second; });
+//    std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
+    return block_sizes;
+}
+
+std::vector<std::pair<long,long>> sort_vertices_by_degree(const Graph &graph) {
+    std::vector<std::pair<long,long>> vertex_degrees;
+    for (long vertex = 0; vertex < graph.num_vertices(); ++vertex) {
+        long degree = (long)(graph.out_neighbors(vertex).size() + graph.in_neighbors(vertex).size());
+        vertex_degrees.emplace_back(std::make_pair(vertex, degree));
+    }
+    utils::radix_sort(vertex_degrees);
+//    std::sort(vertex_degrees.begin(), vertex_degrees.end(),
+//              [](const std::pair<long, long> &a, const std::pair<long, long> &b) { return a.second > b.second; });
+//    std::cout << "thread_limit: " << omp_get_max_threads() << std::endl;
+    return vertex_degrees;
+}
+
+//namespace undirected {
+//
+//double mdl(const Blockmodel &blockmodel, long num_vertices, long num_edges) {
+//    std::cout << "undirected!" << std::endl;
+//    double log_posterior_p = blockmodel.log_posterior_probability(num_edges);
+//    if (std::isnan(log_posterior_p)) {
+//        std::cout << "nan in log posterior" << std::endl;
+//        exit(-5000);
+//    }
+//    double x = blockmodel.getNum_blocks() * (blockmodel.num_blocks() + 1.0) / (2.0 * num_edges);
+//    if (std::isnan(x)) {
+//        std::cout << "nan in X" << std::endl;
+//        exit(-5000);
+//    }
+//    double h = ((1 + x) * log(1 + x)) - (x * log(x));
+//    if (std::isnan(h)) {
+//        std::cout << "nan in h()" << std::endl;
+//    }
+//        // std::cout << "X: " << x << std::endl;
+//        // std::cout << "log(X): " << log(x) << std::endl;
+//    if (std::isnan(h)) {
+//        exit(-5000);
+//    }
+//    double first = (num_edges * h) + (num_vertices * log(blockmodel.num_blocks()));
+//    std::cout << "first: " << first << " log_posterior: " << log_posterior_p << std::endl;
+//    double result = (num_edges * h) + (num_vertices * log(blockmodel.num_blocks())) - log_posterior_p;
+//    if (std::isnan(result)) {
+//        std::cout << "nan in result" << std::endl;
+//        exit(-5000);
+//    }
+//    return result;
+//}
+//
+//}  // namespace undirected
+
+}  // namespace finetune
diff --git a/src/graph.cpp b/src/graph.cpp
index 22d2c6f..1a5eb18 100644
--- a/src/graph.cpp
+++ b/src/graph.cpp
@@ -1,6 +1,6 @@
 #include "graph.hpp"
 
-#include <execution>
+// #include <execution>  // TBB dependency removed
 #include "mpi.h"
 
 #include "globals.hpp"
@@ -233,16 +233,18 @@ void Graph::parse_undirected(NeighborList &in_neighbors, NeighborList &out_neigh
 }
 
 void Graph::sort_vertices() {
-    if (args.degreeproductsort) {
+    if (!args.vertex_degree_sort) {
+        // Default: use edge degree product for more accurate high-influence vertex identification
         this->degree_product_sort();
         return;
     }
+    // Alternative: use vertex degree (faster but less accurate)
 //    std::cout << "Starting to sort vertices" << std::endl;
 //    double start_t = MPI_Wtime();
     std::vector<long> vertex_degrees = this->degrees();
     std::vector<int> indices = utils::range<int>(0, this->_num_vertices);
     // std::nth_element(std::execution::par_unseq, indices.data(), indices.data() + int(args.mh_percent * this->_num_vertices),
-    std::stable_sort(std::execution::par_unseq, indices.data(),
+    std::stable_sort(indices.data(),
               indices.data() + indices.size(), [&vertex_degrees](size_t i1, size_t i2) {
               return vertex_degrees[i1] > vertex_degrees[i2];
     });
@@ -309,7 +311,7 @@ std::vector<std::pair<std::pair<long, long>, long>> Graph::sorted_edge_list() co
             edge_info.emplace_back(std::make_pair(source, dest), information);
         }
     }
-    std::stable_sort(std::execution::par_unseq, edge_info.begin(), edge_info.end(), [](const auto &i1, const auto &i2) {
+    std::stable_sort(edge_info.begin(), edge_info.end(), [](const auto &i1, const auto &i2) {
         return i1.second > i2.second;
     });
     return edge_info;
diff --git a/src/sample.cpp b/src/sample.cpp
index 85fba03..5112ee7 100644
--- a/src/sample.cpp
+++ b/src/sample.cpp
@@ -275,7 +275,7 @@ Sample snowball(const Graph &graph, int subgraph_index, int num_subgraphs) {
         // I think the problem here is that the starting vertices aren't the same between ranks. Should broadcast the top n
         // vertices out to every rank, or have one rank do the sampling for all ranks and then broadcast results
 	// std::nth_element(std::execution::par_unseq, indices.data(), indices.data() + num_subgraphs,
-	std::stable_sort(std::execution::par_unseq, indices.data(),
+	std::stable_sort(indices.data(),
                          indices.data() + indices.size(), [&vertex_degrees](size_t i1, size_t i2) {
                     return vertex_degrees[i1] > vertex_degrees[i2];
                 });
diff --git a/src/top_down_sbp.cpp b/src/top_down_sbp.cpp
index 5529d6a..0be27fb 100644
--- a/src/top_down_sbp.cpp
+++ b/src/top_down_sbp.cpp
@@ -574,7 +574,7 @@ std::pair<long, long> split_init_random(const Graph &subgraph) {
 
 std::pair<long, long> split_init_degree_weighted(const Graph &subgraph, const std::vector<long> &vertex_degrees) {
     std::vector<int> indices = utils::range<int>(0, subgraph.num_vertices());
-    std::nth_element(std::execution::par_unseq, indices.data(), indices.data() + (subgraph.num_vertices() / 10),
+    std::nth_element(indices.data(), indices.data() + (subgraph.num_vertices() / 10),
                      indices.data() + indices.size(), [&vertex_degrees](size_t i1, size_t i2) {
                 return vertex_degrees[i1] > vertex_degrees[i2];
             });
@@ -591,7 +591,7 @@ std::pair<long, long> split_init_degree_weighted(const Graph &subgraph, const st
 
 std::pair<long, long> split_init_high_degree(const Graph &subgraph, const std::vector<long> &vertex_degrees) {
     std::vector<int> indices = utils::range<int>(0, subgraph.num_vertices());
-    std::nth_element(std::execution::par_unseq, indices.data(), indices.data() + 3,
+    std::nth_element(indices.data(), indices.data() + 3,
                      indices.data() + indices.size(), [&vertex_degrees](size_t i1, size_t i2) {
                 return vertex_degrees[i1] > vertex_degrees[i2];
             });
diff --git a/src/utils.cpp b/src/utils.cpp
index 0284772..475e85f 100644
--- a/src/utils.cpp
+++ b/src/utils.cpp
@@ -262,9 +262,9 @@ void write_json(const std::vector<long> &block_assignment, double description_le
     output["Filepath"] = args.filepath;
     output["Tag"] = args.tag;
     output["Algorithm"] = args.algorithm;
-    output["Degree Product Sort"] = args.degreeproductsort;
+    output["Vertex Sort Method"] = args.vertex_degree_sort ? "Vertex Degree" : "Edge Degree Product";
     output["Data Distribution"] = args.distribute;
-    output["Greedy"] = args.greedy;
+    output["Hastings Correction"] = args.hastings_correction;
     output["Metropolis-Hastings Ratio"] = args.mh_percent;
     output["Overlap"] = args.overlap;
     output["Block Size Variation"] = args.blocksizevar;
diff --git a/test/nonparametric_entropy_test.cpp b/test/nonparametric_entropy_test.cpp
index bef1362..d1810b7 100644
--- a/test/nonparametric_entropy_test.cpp
+++ b/test/nonparametric_entropy_test.cpp
@@ -190,7 +190,7 @@ TEST_F(NonparametricEntropyTest, SpecialCaseShouldGiveCorrectDeltaMDL) {
     utils::print<long>(B4.block_assignment());
     Blockmodel B5 = B3.copy();
     std::cout << "before move_vertex" << std::endl;
-    args.nonparametric = true;
+    args.parametric = false;  // Use nonparametric mode
     VertexMove result = finetune::move_vertex(6, 3, proposal, B4, graph, out_edges, in_edges);
     std::cout << "============ B5.move_vertex()" << std::endl;
     B5.move_vertex(V6, deltas, proposal);
@@ -210,7 +210,7 @@ TEST_F(NonparametricEntropyTest, SpecialCaseShouldGiveCorrectDeltaMDL) {
 //    std::cout << "======== After move =======" << std::endl;
 //    B5.print_blockmodel();
     EXPECT_FLOAT_EQ(dE, result.delta_entropy);
-    args.nonparametric = false;
+    args.parametric = true;  // Use parametric mode
 }
 
 //TEST_F(NonparametricEntropyTest, NullModelMDLv1ShouldGiveCorrectMDLForSmallGraph) {