make n_jobs=-1 as default #122

Draft · wants to merge 9 commits into base: main

4 changes: 1 addition & 3 deletions CONTRIBUTING.md
@@ -75,8 +75,6 @@ git push origin <branch_name>

- `get_chunks` is the additional backend-specific argument for almost all the algorithms in nx-parallel, and it's tested for all algorithms together in the `nx_parallel/tests/test_get_chunks.py` file.

- We use `@pytest.mark.order` for the tests that change the global configurations (i.e. `nx.config`) to make sure those tests run in the specified order and don't cause unexpected failures.

## Documentation syntax

For displaying a small note about nx-parallel's implementation at the end of the main NetworkX documentation, we use the `backend_info` [entry_point](https://packaging.python.org/en/latest/specifications/entry-points/#entry-points) (in the `pyproject.toml` file). The [`get_info` function](./_nx_parallel/__init__.py) is used to parse the docstrings of all the algorithms in nx-parallel and display the nx-parallel-specific documentation on NetworkX's main docs, in the "Additional Backend implementations" box, as shown in the screenshot below.
@@ -110,7 +108,7 @@ def parallel_func(G, nx_arg, additional_backend_arg_1, additional_backend_arg_2=

In parallel computing, "chunking" refers to dividing a large task into smaller, more manageable chunks that can be processed simultaneously by multiple computing units, such as CPU cores or distributed computing nodes. It's like breaking down a big task into smaller pieces so that multiple workers can work on different pieces at the same time, and in the case of nx-parallel, this usually speeds up the overall process.

The default chunking in nx-parallel is done by slicing the list of nodes (or edges or any other iterator) into `n_jobs` number of chunks. (ref. [chunk.py](./nx_parallel/utils/chunk.py)). By default, `n_jobs` is `None`. To learn about how you can modify the value of `n_jobs` and other config options refer [`Config.md`](./Config.md). The default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)).
Chunking in nx-parallel defaults to slicing the input into `n_jobs` chunks (`n_jobs=-1` means using all CPU cores; see [`chunk.py`](./nx_parallel/utils/chunk.py)). To know how to change config options like `n_jobs`, see [`Config.md`](./Config.md). A user can override chunking by passing a custom function to the `get_chunks` kwarg. When adding a new algorithm, you may modify this default chunking behavior if needed (e.g. [PR#33](https://github.com/networkx/nx-parallel/pull/33)).
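
As a quick illustration, here is a minimal sketch of overriding the default chunking. The graph, the chosen algorithm, and the `custom_chunks` helper (with its fixed chunk size of 50) are made up for this example; only the `get_chunks` kwarg itself is part of nx-parallel's interface.

```python
import networkx as nx
import nx_parallel as nxp

G = nx.fast_gnp_random_graph(200, 0.1, seed=42)


def custom_chunks(nodes):
    """Hypothetical chunking: yield fixed-size chunks of 50 nodes
    instead of the default `n_jobs` equally sized chunks."""
    nodes = list(nodes)
    for i in range(0, len(nodes), 50):
        yield nodes[i : i + 50]


# `get_chunks` is the backend-specific kwarg; the default chunking is used otherwise
result = nxp.betweenness_centrality(G, get_chunks=custom_chunks)
```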

## General guidelines on adding a new algorithm

122 changes: 74 additions & 48 deletions Config.md
@@ -1,34 +1,12 @@
# Configuring nx-parallel

`nx-parallel` provides flexible parallel computing capabilities, allowing you to control settings like `backend`, `n_jobs`, `verbose`, and more. This can be done through two configuration systems: `joblib` and `NetworkX`. This guide explains how to configure `nx-parallel` using both systems.
Note that both NetworkX’s and Joblib’s config systems offer the same parameters and behave similarly; which one to use depends on your use case. See Section 3 below for more.
## 1. Setting configs using NetworkX (`nx.config`)

## 1. Setting configs using `joblib.parallel_config`
By default, `nx-parallel` uses NetworkX's configuration system. Please refer to [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on the configuration system.

`nx-parallel` relies on [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html) for parallel computing. You can adjust its settings through the [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) class provided by `joblib`. For more details, check out the official [joblib documentation](https://joblib.readthedocs.io/en/latest/parallel.html).

### 1.1 Usage

```python
from joblib import parallel_config

# Setting global configs
parallel_config(n_jobs=3, verbose=50)
nx.square_clustering(H)

# Setting configs in a context
with parallel_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

Please refer the [official joblib's documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand the config parameters.

Note: Ensure that `nx.config.backends.parallel.active = False` when using `joblib` for configuration, as NetworkX configurations will override `joblib.parallel_config` settings if `active` is `True`.

## 2. Setting configs using `networkx`'s configuration system for backends

To use NetworkX’s configuration system in `nx-parallel`, you must set the `active` flag (in `nx.config.backends.parallel`) to `True`.

### 2.1 Configs in NetworkX for backends
### 1.1 Configs in NetworkX for backends

When you import NetworkX, it automatically sets default configurations for all installed backends, including `nx-parallel`.

@@ -42,34 +20,34 @@ Output:

```
NetworkXConfig(
    backend_priority=[],
    backend_priority=BackendPriorities(
        algos=[],
        generators=[]
    ),
    backends=Config(
        parallel=ParallelConfig(
            active=False,
            backend="loky",
            n_jobs=None,
            active=True,
            backend='loky',
            n_jobs=-1,
            verbose=0,
            temp_folder=None,
            max_nbytes="1M",
            mmap_mode="r",
            max_nbytes='1M',
            mmap_mode='r',
            prefer=None,
            require=None,
            inner_max_num_threads=None,
            backend_params={},
            backend_params={}
        )
    ),
    cache_converted_graphs=True,
    fallback_to_nx=False,
    warnings_to_ignore=set()
)
```

As you can see in the above output, by default, `active` is set to `False`. So, to enable NetworkX configurations for `nx-parallel`, set `active` to `True`. Please refer the [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on networkx configuration system.

### 2.2 Usage
### 1.2 Usage

```python
# enabling networkx's config for nx-parallel
nx.config.backends.parallel.active = True

# Setting global configs
nxp_config = nx.config.backends.parallel
nxp_config.n_jobs = 3
@@ -82,11 +60,46 @@ with nxp_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

The configuration parameters are the same as `joblib.parallel_config`, so you can refer to the [official joblib's documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand these config parameters.
All configuration parameters are the same as `joblib.parallel_config`, so you can refer to the [official joblib's documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand these config parameters.

### 1.3 How Does NetworkX's Configuration Work in nx-parallel?

In `nx-parallel`, there's a `_configure_if_nx_active` decorator applied to all algorithms. This decorator checks the value of `active` (in `nx.config.backends.parallel`) and then uses the appropriate configuration system (`joblib` or `networkx`) accordingly. Since `active=True` by default, it extracts the configs from `nx.config.backends.parallel`, passes them to a `joblib.parallel_config` context manager, and calls the function within that context. If the `active` flag is set to `False`, it simply calls the function, assuming that you (the user) have set the desired configurations in `joblib.parallel_config`.
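
As a rough, hedged sketch of the behaviour described above (not the actual nx-parallel source; the exact keys handled and their order are assumptions):

```python
from functools import wraps

import networkx as nx
from joblib import parallel_config


def _configure_if_nx_active(func):
    """Sketch: run `func` inside a joblib context built from `nx.config.backends.parallel`."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        cfg = dict(nx.config.backends.parallel)
        backend_params = cfg.pop("backend_params")
        if cfg.pop("active"):
            # active=True: hand the NetworkX-level configs over to joblib
            with parallel_config(**cfg, **backend_params):
                return func(*args, **kwargs)
        # active=False: assume the user configured joblib.parallel_config themselves
        return func(*args, **kwargs)

    return wrapper
```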

## 2. Setting configs using `joblib.parallel_config`

Another way to configure `nx-parallel` is by using the [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) class provided by `joblib`. For more details, check out the official [joblib documentation](https://joblib.readthedocs.io/en/latest/parallel.html).

### 2.1 Usage

To use `joblib.parallel_config` with `nx-parallel`, set `nx.config.backends.parallel.active = False`. This disables the default NetworkX configuration so joblib settings can take effect. This can be done globally or within a context manager.

**2.1.1 Disable NetworkX config globally**

Review comment:
this section is showing how joblib configs can be set globally and in a context -- so the "Disable NetworkX config globally" heading seems out of place

Contributor Author:
The idea here is to explain to users that there are two ways to disable NetworkX’s config system so that joblib’s own configuration can take effect — either globally or within a context manager. The focus is on how to disable NetworkX’s config, not on how joblib can be set globally or in a context by itself. That said, the different ways of setting joblib configs have been implicitly conveyed with:

# Setting global configs for Joblib
parallel_config(n_jobs=5, verbose=50)
nx.square_clustering(H)

# Setting configs in Joblib's context
with parallel_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)

let me know if you think any tweaks are needed -- but this looks clear to me.

As mentioned at the start of this Config.md, "This guide explains how to configure nx-parallel using both systems." This second section discusses how to use joblib.parallel_config -- and to use that, the first step is to disable networkx's config (the default system) via the active parameter -- so it's just a step, not the whole process. HTH.

```python
from joblib import parallel_config

# Setting global configs for NetworkX
nx.config.backends.parallel.active = False

# Setting global configs for Joblib
parallel_config(n_jobs=5, verbose=50)
nx.square_clustering(H)

# Setting configs in Joblib's context
with parallel_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

**2.1.2 Disable NetworkX config in a context**

Review comment:
same here

```python
from joblib import parallel_config

### 2.3 How Does NetworkX's Configuration Work in nx-parallel?
# Setting configs in NetworkX's context
with nx.config.backends.parallel(active=False), parallel_config(n_jobs=7, verbose=50):
    nx.square_clustering(H)
```

In `nx-parallel`, there's a `_configure_if_nx_active` decorator applied to all algorithms. This decorator checks the value of `active`(in `nx.config.backends.parallel`) and then accordingly uses the appropriate configuration system (`joblib` or `networkx`). If `active=True`, it extracts the configs from `nx.config.backends.parallel` and passes them in a `joblib.parallel_config` context manager and calls the function in this context. Otherwise, it simply calls the function.
Please refer to the [official joblib's documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand the config parameters.

## 3. Comparing NetworkX and Joblib Configuration Systems

@@ -97,8 +110,7 @@ You can use both NetworkX’s configuration system and `joblib.parallel_config`
Example:

```py
# Enable NetworkX configuration
nx.config.backends.parallel.active = True
# Global NetworkX configuration
nx.config.backends.parallel.n_jobs = 6

# Global Joblib configuration
@@ -114,24 +126,37 @@ with joblib.parallel_config(n_jobs=4, verbose=55):
    joblib.Parallel()(joblib.delayed(sqrt)(i**2) for i in range(10))
```

- **NetworkX Configurations for nx-parallel**: When calling functions within `nx-parallel`, NetworkX’s configurations will override those specified by Joblib. For example, the `nx.square_clustering` function will use the `n_jobs=6` setting from `nx.config.backends.parallel`, regardless of any Joblib settings within the same context.
- **NetworkX Configurations for nx-parallel**: When calling functions within `nx-parallel`, NetworkX’s configurations will override those specified by Joblib because `active` is set to `True` by default. For example, the `nx.square_clustering` function will use the `n_jobs=6` setting from `nx.config.backends.parallel`, regardless of any Joblib settings within the same context.

- **Joblib Configurations for Other Code**: For any other parallel code outside of `nx-parallel`, such as a direct call to `joblib.Parallel`, the configurations specified within the Joblib context will be applied.

This behavior ensures that `nx-parallel` functions consistently use NetworkX’s settings when enabled, while still allowing Joblib configurations to apply to non-NetworkX parallel tasks.
This behavior ensures that `nx-parallel` functions consistently use NetworkX’s settings, while still allowing Joblib configurations to apply to non-NetworkX parallel tasks.

To understand how to use `joblib.parallel_config` within `nx-parallel`, see [the usage guide](#21-usage).

**Key Takeaway**: When both systems are used together, NetworkX's configuration (`nx.config.backends.parallel`) takes precedence for `nx-parallel` functions. To avoid unexpected behavior, ensure that the `active` setting aligns with your intended configuration system.

### 3.2 Key Differences

- **Parameter Handling**: The main difference is how `backend_params` are passed. Since, in networkx configurations are stored as a [`@dataclass`](https://docs.python.org/3/library/dataclasses.html), we need to pass them as a dictionary, whereas in `joblib.parallel_config` you can just pass them along with the other configurations, as shown below:
- **Parameter Handling**: The main difference is how `backend_params` are passed. Since networkx configurations are stored as a [`@dataclass`](https://docs.python.org/3/library/dataclasses.html), we need to pass them as a dictionary, whereas in `joblib.parallel_config` you can just pass them along with the other configurations, as shown below:

```py
nx.config.backends.parallel.backend_params = {"max_nbytes": None}
joblib.parallel_config(backend="loky", max_nbytes=None)
```

- **Default Behavior**: By default, `nx-parallel` looks for configs in `joblib.parallel_config` unless `nx.config.backends.parallel.active` is set to `True`.
- **Default `n_jobs`**: In the NetworkX configuration system, `n_jobs=-1` by default, i.e. all available CPU cores are used, whereas `joblib.parallel_config` defaults to `n_jobs=None`. So, parallelism is enabled by default with the NetworkX configs, but must be configured manually when using `joblib.parallel_config`.

```python
# NetworkX
print(nx.config.backends.parallel.n_jobs) # Output : -1

# Joblib
with joblib.parallel_config() as cfg:
    print(cfg["n_jobs"])  # Output : default(None)
```

- **Default Behavior**: By default, `nx-parallel` looks for configs in `nx.config.backends.parallel` unless the `active` flag is set to `False`.

### 3.3 When Should You Use Which System?

@@ -142,6 +167,7 @@ But, when working with multiple NetworkX backends, it's crucial to ensure compat
```python
nx.config.backend_priority = ["another_nx_backend", "parallel"]
nx.config.backends.another_nx_backend.config_1 = "xyz"
nx.config.backends.parallel.active = False
joblib.parallel_config(n_jobs=7, verbose=50)

nx.square_clustering(G)
15 changes: 8 additions & 7 deletions README.md
@@ -63,12 +63,6 @@ Note that for all functions inside `nx_code.py` that do not have an nx-parallel
import networkx as nx
import nx_parallel as nxp

# enabling networkx's config for nx-parallel
nx.config.backends.parallel.active = True

# setting `n_jobs` (by default, `n_jobs=None`)
nx.config.backends.parallel.n_jobs = 4

Review comment:
A comment here-- or below when we are calling the parallel backend-- about the fact that all cores are being used by default would be helpful-- and maybe we can also use timeit to show the performance improvement

G = nx.path_graph(4)
H = nxp.ParallelGraph(G)

@@ -85,7 +79,14 @@ nxp.betweenness_centrality(G)
nxp.betweenness_centrality(H)
```

For more on how to play with configurations in nx-parallel refer the [Config.md](./Config.md)! Additionally, refer the [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on functionalities provided by networkx for backends and configs like logging, `backend_priority`, etc. Another way to configure nx-parallel is by using [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html).
### Setting Configurations
```py
# modify number of jobs, if you want to limit the number of cores used
nx.config.backends.parallel.n_jobs = 4
```

Review comment (on lines +83 to +86):
this example could be a bit more detailed i think-- maybe switching logging on and including the output would be a good idea. lmkwyt.

Contributor Author (@akshitasure12, Jul 13, 2025):
I thought about this, but I feel adding logging setup directly in the README might make the example harder to follow for users. I'll make sure to show the expected output :)
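
For reference, here is a minimal sketch of what such an expanded example might look like; the printed values assume the defaults introduced in this PR, and the snippet is illustrative rather than the final README wording:

```py
import networkx as nx

# nx-parallel is active by default and uses all available CPU cores
print(nx.config.backends.parallel.active)  # True
print(nx.config.backends.parallel.n_jobs)  # -1  (i.e. all cores)

# limit the number of cores used, if desired
nx.config.backends.parallel.n_jobs = 4
```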

For more on how to play with configurations in nx-parallel, refer to [Config.md](./Config.md)!

Additionally, refer to [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more.

### Notes

4 changes: 2 additions & 2 deletions _nx_parallel/config.py
@@ -9,9 +9,9 @@

@dataclass
class ParallelConfig(Config):
    active: bool = False
    active: bool = True
    backend: str = "loky"
    n_jobs: int = None
    n_jobs: int = -1
    verbose: int = 0
    temp_folder: str = None
    max_nbytes: Union[int, str] = "1M"
4 changes: 2 additions & 2 deletions nx_parallel/tests/test_entry_points.py
@@ -19,9 +9,9 @@ def test_config_init():
    import networkx as nx

    assert dict(nx.config.backends.parallel) == {
        "active": False,
        "active": True,
        "backend": "loky",
        "n_jobs": None,
        "n_jobs": -1,
        "verbose": 0,
        "temp_folder": None,
        "max_nbytes": "1M",
6 changes: 3 additions & 3 deletions nx_parallel/utils/tests/test_chunk.py
@@ -9,7 +9,7 @@ def test_get_n_jobs():
    # Test with no n_jobs (default)
    with pytest.MonkeyPatch().context() as mp:
        mp.delitem(os.environ, "PYTEST_CURRENT_TEST", raising=False)
        assert nxp.get_n_jobs() == 1
        assert nxp.get_n_jobs() == os.cpu_count()

    # Test with n_jobs set to positive value
    assert nxp.get_n_jobs(4) == 4
@@ -20,11 +20,11 @@ def test_get_n_jobs():
    # Test with joblib's context
    from joblib import parallel_config

    with parallel_config(n_jobs=3):
    with nx.config.backends.parallel(active=False), parallel_config(n_jobs=3):
        assert nxp.get_n_jobs() == 3

    # Test with nx-parallel's context
    with nx.config.backends.parallel(active=True, n_jobs=5):
    with nx.config.backends.parallel(n_jobs=5):
        assert nxp.get_n_jobs() == 5

    # Test with n_jobs = 0 to raise a ValueError