diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 27252d8..ae433ab 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -75,8 +75,6 @@ git push origin - `get_chunks` is the additional backend-specific argument for almost all the algorithms in nx-parallel and it's tested for all algorithms together in `nx_parallel/tests/test_get_chunks.py` file. -- We use `@pytest.mark.order` for the tests that change the global configurations (i.e. `nx.config`) to make sure those tests run in the specified order and don't cause unexpected failures. - ## Documentation syntax For displaying a small note about nx-parallel's implementation at the end of the main NetworkX documentation, we use the `backend_info` [entry_point](https://packaging.python.org/en/latest/specifications/entry-points/#entry-points) (in the `pyproject.toml` file). The [`get_info` function](./_nx_parallel/__init__.py) is used to parse the docstrings of all the algorithms in nx-parallel and display the nx-parallel specific documentation on the NetworkX's main docs, in the "Additional Backend implementations" box, as shown in the screenshot below. @@ -110,7 +108,7 @@ def parallel_func(G, nx_arg, additional_backend_arg_1, additional_backend_arg_2= In parallel computing, "chunking" refers to dividing a large task into smaller, more manageable chunks that can be processed simultaneously by multiple computing units, such as CPU cores or distributed computing nodes. It's like breaking down a big task into smaller pieces so that multiple workers can work on different pieces at the same time, and in the case of nx-parallel, this usually speeds up the overall process. -The default chunking in nx-parallel is done by slicing the list of nodes (or edges or any other iterator) into `n_jobs` number of chunks. (ref. [chunk.py](./nx_parallel/utils/chunk.py)). By default, `n_jobs` is `None`. To learn about how you can modify the value of `n_jobs` and other config options refer [`Config.md`](./Config.md). 
The default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)). +Chunking in nx-parallel defaults to slicing the input into `n_jobs` chunks (`n_jobs=-1` means using all CPU cores; see [`chunk.py`](./nx_parallel/utils/chunk.py)). To learn how to change config options like `n_jobs`, see [`Config.md`](./Config.md). A user can override chunking by passing a custom function to the `get_chunks` kwarg. When adding a new algorithm, you may modify this default chunking behavior if needed (e.g. [PR#33](https://github.com/networkx/nx-parallel/pull/33)). ## General guidelines on adding a new algorithm diff --git a/Config.md b/Config.md index 4d3ed3a..22e59a7 100644 --- a/Config.md +++ b/Config.md @@ -1,34 +1,12 @@ # Configuring nx-parallel `nx-parallel` provides flexible parallel computing capabilities, allowing you to control settings like `backend`, `n_jobs`, `verbose`, and more. This can be done through two configuration systems: `joblib` and `NetworkX`. This guide explains how to configure `nx-parallel` using both systems. +Note that both NetworkX’s and Joblib’s config systems offer the same parameters and behave similarly; which one to use depends on your use case. See Section 3 below for more. +## 1. Setting configs using NetworkX (`nx.config`) -## 1. Setting configs using `joblib.parallel_config` +By default, `nx-parallel` uses NetworkX's configuration system. Please refer to [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on the configuration system. -`nx-parallel` relies on [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html) for parallel computing.
You can adjust its settings through the [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) class provided by `joblib`. For more details, check out the official [joblib documentation](https://joblib.readthedocs.io/en/latest/parallel.html). - -### 1.1 Usage - -```python -from joblib import parallel_config - -# Setting global configs -parallel_config(n_jobs=3, verbose=50) -nx.square_clustering(H) - -# Setting configs in a context -with parallel_config(n_jobs=7, verbose=0): - nx.square_clustering(H) -``` - -Please refer the [official joblib's documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand the config parameters. - -Note: Ensure that `nx.config.backends.parallel.active = False` when using `joblib` for configuration, as NetworkX configurations will override `joblib.parallel_config` settings if `active` is `True`. - -## 2. Setting configs using `networkx`'s configuration system for backends - -To use NetworkX’s configuration system in `nx-parallel`, you must set the `active` flag (in `nx.config.backends.parallel`) to `True`. - -### 2.1 Configs in NetworkX for backends +### 1.1 Configs in NetworkX for backends When you import NetworkX, it automatically sets default configurations for all installed backends, including `nx-parallel`. 
@@ -40,36 +18,36 @@ print(nx.config) Output: -``` +```sh NetworkXConfig( - backend_priority=[], + backend_priority=BackendPriorities( + algos=[], + generators=[] + ), backends=Config( parallel=ParallelConfig( - active=False, - backend="loky", - n_jobs=None, + active=True, + backend='loky', + n_jobs=-1, verbose=0, temp_folder=None, - max_nbytes="1M", - mmap_mode="r", + max_nbytes='1M', + mmap_mode='r', prefer=None, require=None, inner_max_num_threads=None, - backend_params={}, + backend_params={} ) ), cache_converted_graphs=True, + fallback_to_nx=False, + warnings_to_ignore=set() ) ``` -As you can see in the above output, by default, `active` is set to `False`. So, to enable NetworkX configurations for `nx-parallel`, set `active` to `True`. Please refer the [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on networkx configuration system. - -### 2.2 Usage - -```python -# enabling networkx's config for nx-parallel -nx.config.backends.parallel.active = True +### 1.2 Usage +```py # Setting global configs nxp_config = nx.config.backends.parallel nxp_config.n_jobs = 3 @@ -78,15 +56,36 @@ nxp_config.verbose = 50 nx.square_clustering(H) # Setting config in a context -with nxp_config(n_jobs=7, verbose=0): +with nxp_config(n_jobs=7, verbose=10): nx.square_clustering(H) ``` -The configuration parameters are the same as `joblib.parallel_config`, so you can refer to the [official joblib's documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand these config parameters. +### 1.3 How Does NetworkX's Configuration Work in nx-parallel? + +In `nx-parallel`, there's a `_configure_if_nx_active` decorator applied to all algorithms. This decorator checks the value of `active` (in `nx.config.backends.parallel`) and then accordingly uses the appropriate configuration system (`joblib` or `networkx`). 
Since `active=True` by default, it extracts the configs from `nx.config.backends.parallel` and passes them in a `joblib.parallel_config` context manager and calls the function within this context. If the `active` flag is set to `False`, it simply calls the function, assuming that you (the user) have set the desired configurations in `joblib.parallel_config`. + +## 2. Setting configs using `joblib.parallel_config` + +Another way to configure `nx-parallel` is by using the [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) class provided by `joblib`. Please refer to the [official joblib documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand the config parameters. -### 2.3 How Does NetworkX's Configuration Work in nx-parallel? +### 2.1 Usage -In `nx-parallel`, there's a `_configure_if_nx_active` decorator applied to all algorithms. This decorator checks the value of `active`(in `nx.config.backends.parallel`) and then accordingly uses the appropriate configuration system (`joblib` or `networkx`). If `active=True`, it extracts the configs from `nx.config.backends.parallel` and passes them in a `joblib.parallel_config` context manager and calls the function in this context. Otherwise, it simply calls the function. +To use `joblib.parallel_config` with `nx-parallel`, you need to disable NetworkX's config by setting `nx.config.backends.parallel.active = False`. There are two ways to do this: globally, or temporarily using a context manager. This ensures that Joblib’s settings take effect instead. + +```py +from joblib import parallel_config + +# Disable NetworkX configs +nx.config.backends.parallel.active = False + +# Setting global configs for Joblib +parallel_config(n_jobs=5, verbose=50) +nx.square_clustering(H) + +# Setting configs in Joblib's context +with parallel_config(n_jobs=7, verbose=10): + nx.square_clustering(H) +``` ## 3.
Comparing NetworkX and Joblib Configuration Systems @@ -97,8 +96,7 @@ You can use both NetworkX’s configuration system and `joblib.parallel_config` Example: ```py -# Enable NetworkX configuration -nx.config.backends.parallel.active = True +# Global NetworkX configuration nx.config.backends.parallel.n_jobs = 6 # Global Joblib configuration @@ -114,24 +112,35 @@ with joblib.parallel_config(n_jobs=4, verbose=55): joblib.Parallel()(joblib.delayed(sqrt)(i**2) for i in range(10)) ``` -- **NetworkX Configurations for nx-parallel**: When calling functions within `nx-parallel`, NetworkX’s configurations will override those specified by Joblib. For example, the `nx.square_clustering` function will use the `n_jobs=6` setting from `nx.config.backends.parallel`, regardless of any Joblib settings within the same context. +- **NetworkX Configurations for nx-parallel**: When calling functions within `nx-parallel`, NetworkX’s configurations will override those specified by Joblib because `active` is set to `True` by default. For example, the `nx.square_clustering` function will use the `n_jobs=6` setting from `nx.config.backends.parallel`, regardless of any Joblib settings within the same context. - **Joblib Configurations for Other Code**: For any other parallel code outside of `nx-parallel`, such as a direct call to `joblib.Parallel`, the configurations specified within the Joblib context will be applied. -This behavior ensures that `nx-parallel` functions consistently use NetworkX’s settings when enabled, while still allowing Joblib configurations to apply to non-NetworkX parallel tasks. +This behavior ensures that `nx-parallel` functions consistently use NetworkX’s settings, while still allowing Joblib configurations to apply to non-NetworkX parallel tasks. + +For using `joblib.parallel_config` with `nx-parallel`, see [Section 2.1](#21-usage).
**Key Takeaway**: When both systems are used together, NetworkX's configuration (`nx.config.backends.parallel`) takes precedence for `nx-parallel` functions. To avoid unexpected behavior, ensure that the `active` setting aligns with your intended configuration system. ### 3.2 Key Differences -- **Parameter Handling**: The main difference is how `backend_params` are passed. Since, in networkx configurations are stored as a [`@dataclass`](https://docs.python.org/3/library/dataclasses.html), we need to pass them as a dictionary, whereas in `joblib.parallel_config` you can just pass them along with the other configurations, as shown below: +- **Parameter Handling**: The main difference is how `backend_params` are passed. Since networkx configurations are stored as a [`@dataclass`](https://docs.python.org/3/library/dataclasses.html), we need to pass them as a dictionary, whereas in `joblib.parallel_config` you can just pass them along with the other configurations, as shown below: ```py nx.config.backends.parallel.backend_params = {"max_nbytes": None} joblib.parallel_config(backend="loky", max_nbytes=None) ``` -- **Default Behavior**: By default, `nx-parallel` looks for configs in `joblib.parallel_config` unless `nx.config.backends.parallel.active` is set to `True`. +- **Default `n_jobs`**: In the NetworkX configuration system, `n_jobs=-1` by default, i.e. it uses all available CPU cores, whereas `joblib.parallel_config` defaults to `n_jobs=None`. So, parallelism is enabled by default in NetworkX, but must be manually enabled when using `joblib.parallel_config`. + + ```py + # NetworkX + print(nx.config.backends.parallel.n_jobs) # Output: -1 + + # Joblib + with joblib.parallel_config() as cfg: + print(cfg["n_jobs"]) # Output: default (None) + ``` ### 3.3 When Should You Use Which System?
@@ -139,9 +148,10 @@ When the only networkx backend you're using is `nx-parallel`, then either of the But, when working with multiple NetworkX backends, it's crucial to ensure compatibility among the backends to avoid conflicts between different configurations. In such cases, using NetworkX's configuration system to configure `nx-parallel` is recommended. This approach helps maintain consistency across backends. For example: -```python +```py nx.config.backend_priority = ["another_nx_backend", "parallel"] nx.config.backends.another_nx_backend.config_1 = "xyz" +nx.config.backends.parallel.active = False joblib.parallel_config(n_jobs=7, verbose=50) nx.square_clustering(G) diff --git a/README.md b/README.md index fb2386c..cbbe151 100644 --- a/README.md +++ b/README.md @@ -5,25 +5,25 @@ nx-parallel is a NetworkX backend that uses joblib for parallelization. This pro ## Algorithms in nx-parallel - [all_pairs_all_shortest_paths](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/generic.py#L11) -- [all_pairs_bellman_ford_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L212) -- [all_pairs_bellman_ford_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L168) +- [all_pairs_bellman_ford_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L208) +- [all_pairs_bellman_ford_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L165) - [all_pairs_dijkstra](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L29) -- [all_pairs_dijkstra_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L124) -- 
[all_pairs_dijkstra_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L73) +- [all_pairs_dijkstra_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L122) +- [all_pairs_dijkstra_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L72) - [all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L18) -- [all_pairs_shortest_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L63) +- [all_pairs_shortest_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L62) - [all_pairs_shortest_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L19) -- [approximate_all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/approximation/connectivity.py#L13) +- [approximate_all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/approximation/connectivity.py#L14) - [betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L20) - [closeness_vitality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L10) -- [edge_betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L96) +- [edge_betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L103) - [is_reachable](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L13) -- 
[johnson](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L256) -- [local_efficiency](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L10) +- [johnson](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L251) +- [local_efficiency](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L11) - [node_redundancy](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/bipartite/redundancy.py#L12) - [number_of_isolates](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L9) - [square_clustering](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L11) -- [tournament_is_strongly_connected](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L59) +- [tournament_is_strongly_connected](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L58)
Script used to generate the above list @@ -49,7 +49,7 @@ conda install nx-parallel For more, see [INSTALL.md](./INSTALL.md). -## Backend usage +## Usage You can run your networkx code with nx-parallel backend by: @@ -57,18 +57,12 @@ You can run your networkx code with nx-parallel backend by: ```sh export NETWORKX_AUTOMATIC_BACKENDS="parallel" && python nx_code.py ``` -Note that for all functions inside `nx_code.py` that do not have an nx-parallel implementation their original networkx implementation will be executed. You can also use the nx-parallel backend in your code for only some specific function calls in the following ways: +Note that for all functions inside `nx_code.py` that do not have an nx-parallel implementation, their original networkx implementation will be executed. You can also use the nx-parallel backend in your code for only some specific function calls in the following ways: ```py import networkx as nx import nx_parallel as nxp -# enabling networkx's config for nx-parallel -nx.config.backends.parallel.active = True - -# setting `n_jobs` (by default, `n_jobs=None`) -nx.config.backends.parallel.n_jobs = 4 - G = nx.path_graph(4) H = nxp.ParallelGraph(G) @@ -85,7 +79,64 @@ nxp.betweenness_centrality(G) nxp.betweenness_centrality(H) ``` -For more on how to play with configurations in nx-parallel refer the [Config.md](./Config.md)! Additionally, refer the [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on functionalities provided by networkx for backends and configs like logging, `backend_priority`, etc. Another way to configure nx-parallel is by using [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html). +You can also measure the performance gains of parallel algorithms by comparing them to their sequential counterparts.
The following is a simple benchmarking setup: + +```py +import networkx as nx +import nx_parallel as nxp +from timeit import timeit + +G = nx.erdos_renyi_graph(1000, 0.01) +H = nxp.ParallelGraph(G) + +sequential = timeit(lambda: nx.betweenness_centrality(G), number=1) +print(f"Sequential: {sequential:.2f}s") + +# By default, nx-parallel uses all available CPU cores +parallel = timeit(lambda: nx.betweenness_centrality(H), number=1) +print(f"Parallel: {parallel:.2f}s") +``` + +Output: +```sh +Sequential: 1.29s +Parallel: 0.62s +``` + +### Setting Configurations + +You can modify the default NetworkX configuration and observe internal behavior by enabling logging. This is useful for understanding which backend is being used and how tasks are scheduled: + +```py +import networkx as nx +import nx_parallel as nxp +import logging + +G = nx.path_graph(4) +H = nxp.ParallelGraph(G) + +# setting up logging +nxl = logging.getLogger("networkx") +nxl.addHandler(logging.StreamHandler()) +nxl.setLevel(logging.DEBUG) + +# setting NetworkX configs +with nx.config.backends.parallel(n_jobs=2, verbose=10): + nx.betweenness_centrality(G, backend="parallel") +``` + +Output: +```sh +Converting input graphs from 'networkx' backend to 'parallel' backend for call to 'betweenness_centrality' +Using backend 'parallel' for call to 'betweenness_centrality' with arguments: (G=, k=None, normalized=True, weight=None, endpoints=False, seed=) +[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers. +[Parallel(n_jobs=2)]: Batch computation too fast (0.16860580444335938s.) Setting batch_size=2. +[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.2s finished +``` + +For more on how to play with configurations in nx-parallel, refer to [Config.md](./Config.md)! + +Additionally, refer to [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more.
### Notes diff --git a/_nx_parallel/config.py b/_nx_parallel/config.py index 4f7ff49..ff27bed 100644 --- a/_nx_parallel/config.py +++ b/_nx_parallel/config.py @@ -9,9 +9,9 @@ @dataclass class ParallelConfig(Config): - active: bool = False + active: bool = True backend: str = "loky" - n_jobs: int = None + n_jobs: int = -1 verbose: int = 0 temp_folder: str = None max_nbytes: Union[int, str] = "1M" diff --git a/nx_parallel/tests/test_entry_points.py b/nx_parallel/tests/test_entry_points.py index 6de0d03..514a40d 100644 --- a/nx_parallel/tests/test_entry_points.py +++ b/nx_parallel/tests/test_entry_points.py @@ -19,9 +19,9 @@ def test_config_init(): import networkx as nx assert dict(nx.config.backends.parallel) == { - "active": False, + "active": True, "backend": "loky", - "n_jobs": None, + "n_jobs": -1, "verbose": 0, "temp_folder": None, "max_nbytes": "1M", diff --git a/nx_parallel/utils/tests/test_chunk.py b/nx_parallel/utils/tests/test_chunk.py index f20d206..7e61152 100644 --- a/nx_parallel/utils/tests/test_chunk.py +++ b/nx_parallel/utils/tests/test_chunk.py @@ -9,7 +9,7 @@ def test_get_n_jobs(): # Test with no n_jobs (default) with pytest.MonkeyPatch().context() as mp: mp.delitem(os.environ, "PYTEST_CURRENT_TEST", raising=False) - assert nxp.get_n_jobs() == 1 + assert nxp.get_n_jobs() == os.cpu_count() # Test with n_jobs set to positive value assert nxp.get_n_jobs(4) == 4 @@ -20,11 +20,11 @@ def test_get_n_jobs(): # Test with joblib's context from joblib import parallel_config - with parallel_config(n_jobs=3): + with nx.config.backends.parallel(active=False), parallel_config(n_jobs=3): assert nxp.get_n_jobs() == 3 # Test with nx-parallel's context - with nx.config.backends.parallel(active=True, n_jobs=5): + with nx.config.backends.parallel(n_jobs=5): assert nxp.get_n_jobs() == 5 # Test with n_jobs = 0 to raise a ValueError