The utils.diff.diff_collections helper function is not used directly in the hub, but it is still a useful tool for comparing two data collections and reporting their differences.
def diff_collections(b1, b2, use_parallel=True, step=10000):
The existing use_parallel option relied on IPython Parallel, which probably no longer works. We would like a new way to run diffs in parallel without depending on IPython Parallel. Typically there is no need to parallelize across multiple machines; parallelizing across the CPU cores of a single machine should be good enough.