
Conversation

@paul-tqh-nguyen
Contributor

This PR contains several translator optimizations.

Some timings are below.

CuDFEdgeSet -> ScipyEdgeSet (1e6 edges)

  • Original Time: 3.611 s
  • New Time: 0.317 s
  • ~11x speedup

CuDFEdgeMap -> ScipyEdgeMap (1e6 edges)

  • Original Time: 3.769 s
  • New Time: 0.326 s
  • ~12x speedup

CuDFNodeMap -> PythonNodeMap (1e4 nodes)

  • Original Time: 24.827 s
  • New Time: 0.0063 s
  • ~3960x speedup

ScipyEdgeMap -> CuDFEdgeMap (1e4 edges)

  • Original Time: 0.0377 s
  • New Time: 0.0032 s
  • ~12x speedup

ScipyEdgeSet -> CuDFEdgeSet (1e4 edges)

  • Original Time: 0.0332 s
  • New Time: 0.0028 s
  • ~12x speedup

ScipyEdgeSet -> CuGraphEdgeSet (1e4 edges)

  • Original Time: 0.0848 s
  • New Time: 0.0552 s
  • ~1.5x speedup

ScipyEdgeMap -> CuGraphEdgeMap (1e4 edges)

  • Original Time: 0.0901 s
  • New Time: 0.0572 s
  • ~1.6x speedup

CuGraphEdgeSet -> ScipyEdgeSet (1e4 edges)

  • Original Time: 0.0228 s
  • New Time: 0.0114 s
  • ~2x speedup

CuGraphEdgeMap -> ScipyEdgeMap (1e4 edges)

  • Original Time: 0.0348 s
  • New Time: 0.0186 s
  • ~1.9x speedup

CuGraph -> ScipyGraph (1e4 edges)

  • Original Time: 0.0357 s
  • New Time: 0.0184 s
  • ~1.9x speedup

ScipyGraph -> CuGraph (1e4 edges)

  • Original Time: 0.1069 s
  • New Time: 0.0734 s
  • ~1.5x speedup

…tor ; optimize CuDFEdgeSet->ScipyEdgeSet translator

The CuDFEdgeSet -> ScipyEdgeSet translator used to compute the unique node values on the CPU side, which is expensive. This is now done on the GPU side.

This shaves some time off the part of the translator that gathers the unique nodes: from 43.7 seconds to 41.0 seconds on a 1e7-edge graph, measured with the code below:

from typing import Callable, Generator, Optional
from contextlib import contextmanager
import time

@contextmanager
def timer(section_name: Optional[str] = None, exitCallback: Optional[Callable[[float], None]] = None) -> Generator:
    start_time = time.time()
    try:
        yield
    finally:
        elapsed_time = time.time() - start_time
        if exitCallback is not None:
            exitCallback(elapsed_time)
        elif section_name:
            print(f'{section_name} took {elapsed_time} seconds.')
        else:
            print(f'Execution took {elapsed_time} seconds.')

import cudf
import random
import metagraph as mg

r = mg.resolver

print("Generating graph.")
num_nodes = int(1e7)
source_nodes = list(range(num_nodes))
target_nodes = list(range(num_nodes))
random.shuffle(source_nodes)
random.shuffle(target_nodes)
df = cudf.DataFrame({"source": source_nodes, "target": target_nodes})

print("Loading into metagraph.")
edge_list = r.wrappers.EdgeSet.CuDFEdgeSet(df, src_label="source", dst_label="target", is_directed=False)

print("Timing translation.")
with timer(section_name="cupy"):
    r.translate(edge_list, r.types.EdgeSet.ScipyEdgeSetType)
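The unique-node computation this commit moves to the GPU is the standard unique/relabel pattern: numpy's `np.unique(..., return_inverse=True)` has a cupy counterpart with the same signature, so the step can run entirely on the device. Below is a minimal CPU-side sketch of that pattern with made-up edge data; it is illustrative only, not the translator's actual code.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up edge list with non-contiguous node ids.
src = np.array([10, 30, 20, 10])
dst = np.array([30, 20, 10, 20])

# unique(..., return_inverse=True) yields the sorted unique node ids plus
# each endpoint's position within them -- exactly the relabeling needed to
# build a compact sparse matrix. cupy exposes the same API, so with cupy
# arrays this step never leaves the GPU.
nodes, inverse = np.unique(np.concatenate([src, dst]), return_inverse=True)
row, col = inverse[: len(src)], inverse[len(src):]
matrix = coo_matrix((np.ones(len(src)), (row, col)), shape=(len(nodes), len(nodes)))

print(nodes.tolist())  # [10, 20, 30]
```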

This commit moves translation to the GPU for the CuDFEdgeSet -> ScipyEdgeSet translator.

We get a ~11x speedup, from 3.389 seconds to 0.317 seconds.

This commit moves translation to the GPU for the CuDFEdgeMap -> ScipyEdgeMap translator.

We get a ~12x speedup, from 3.769 seconds to 0.326 seconds, using the following code:

from typing import Callable, Generator, Optional
from contextlib import contextmanager
import time

@contextmanager
def timer(section_name: Optional[str] = None, exitCallback: Optional[Callable[[float], None]] = None) -> Generator:
    start_time = time.time()
    try:
        yield
    finally:
        elapsed_time = time.time() - start_time
        if exitCallback is not None:
            exitCallback(elapsed_time)
        elif section_name:
            print(f'{section_name} took {elapsed_time} seconds.')
        else:
            print(f'Execution took {elapsed_time} seconds.')

import cudf
import random
import metagraph as mg

r = mg.resolver

print("Generating graph.")
num_nodes = int(1e6)
source_nodes = list(range(num_nodes))
target_nodes = list(range(num_nodes))
weights = list(range(num_nodes))
random.shuffle(source_nodes)
random.shuffle(target_nodes)
df = cudf.DataFrame({"source": source_nodes, "target": target_nodes, "weight": weights})

print("Loading into metagraph.")
edge_list = r.wrappers.EdgeMap.CuDFEdgeMap(df, src_label="source", dst_label="target", is_directed=False)

print("Timing translation.")
with timer(section_name="cupy"):
    r.translate(edge_list, r.types.EdgeMap.ScipyEdgeMapType)

The CuDFNodeMap -> PythonNodeMap translator sped up from 24.827 seconds to 0.0063 seconds on a 1e4-node map.

The ScipyEdgeMap -> CuDFEdgeMap translator sped up from 0.0377 seconds to 0.0032 seconds on a 1e4-edge map.

The ScipyEdgeSet -> CuDFEdgeSet translator sped up from 0.0332 seconds to 0.0028 seconds on a 1e4-edge set.
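A speedup of this magnitude for CuDFNodeMap -> PythonNodeMap is the classic bulk-conversion win: one transfer instead of one lookup per node. The sketch below shows that general pattern using pandas as a stand-in for cudf; the data and exact calls are illustrative, not the translator's code (with cudf, every per-element lookup additionally pays a device-to-host copy).

```python
import pandas as pd

# Stand-in node map: one value per node id (illustrative data only).
node_values = pd.Series([0.5, 1.5, 2.5], index=[10, 20, 30])

# Slow pattern: one indexed lookup per node.
slow = {node: node_values[node] for node in node_values.index}

# Fast pattern: a single bulk conversion to a plain Python dict.
fast = node_values.to_dict()

assert slow == fast  # same mapping, far fewer lookups
```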

This commit optimizes the ScipyEdgeSet -> CuGraphEdgeSet translator.

Original Time: 0.0848 s
New Time: 0.0552 s

We used the following code to get those numbers:

from typing import Callable, Generator, Optional
from contextlib import contextmanager
import time

@contextmanager
def timer(section_name: Optional[str] = None, exitCallback: Optional[Callable[[float], None]] = None) -> Generator:
    start_time = time.time()
    try:
        yield
    finally:
        elapsed_time = time.time() - start_time
        if exitCallback is not None:
            exitCallback(elapsed_time)
        elif section_name:
            print(f'{section_name} took {elapsed_time} seconds.')
        else:
            print(f'Execution took {elapsed_time} seconds.')

from statistics import mean
import cudf
import random
import metagraph as mg

r = mg.resolver

print("Generating data.")
num_nodes = int(1e4)
source_nodes = list(range(num_nodes))
target_nodes = list(range(num_nodes))
random.shuffle(source_nodes)
random.shuffle(target_nodes)
df = cudf.DataFrame({"source": source_nodes, "target": target_nodes})

print("Loading into metagraph.")
edge_list = r.wrappers.EdgeSet.CuDFEdgeSet(df, src_label="source", dst_label="target", is_directed=False)
edge_list = r.translate(edge_list, r.types.EdgeSet.ScipyEdgeSetType)

print("Timing translation.")
times = []
for _ in range(int(1e2)):
    with timer(exitCallback=lambda new_time: times.append(new_time)):
        r.translate(edge_list, r.types.EdgeSet.CuGraphEdgeSetType)
mean_time = mean(times)
print(f"time {mean_time!r}")

cugraph.Graph data structures can store their data as adjacency lists (CSR) rather than edge lists (COO).

When a cugraph graph is stored this way and we're translating to a SciPy sparse matrix, we shouldn't go out of our way to convert the CSR data to COO (especially when the cugraph object doesn't already have the edge list / COO data computed).

This commit makes the CuGraph* -> Scipy* translators stop unconditionally converting to COO before creating the SciPy sparse matrix: they now use the CSR data directly when it's available and only fall back to COO otherwise, eliminating the CSR <-> COO round trips these translators used to perform.
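On the SciPy side the difference comes down to which constructor gets called: scipy.sparse.csr_matrix accepts the (data, indices, indptr) triple directly, so CSR data never needs to be expanded into COO first. Here is a sketch with synthetic arrays standing in for a cugraph adjacency list; the offsets/indices/weights names and values are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, coo_matrix

# Synthetic CSR adjacency list: node 0 -> {1, 2}, node 1 -> {2}, node 2 -> {0}.
offsets = np.array([0, 2, 3, 4])
indices = np.array([1, 2, 2, 0])
weights = np.array([1.0, 2.0, 3.0, 4.0])
n = len(offsets) - 1

# New path: build the CSR matrix directly from the CSR arrays.
m_csr = csr_matrix((weights, indices, offsets), shape=(n, n))

# Old path, for comparison: expand the offsets into explicit row ids
# (i.e. a CSR -> COO conversion) and build a COO matrix from those.
rows = np.repeat(np.arange(n), np.diff(offsets))
m_coo = coo_matrix((weights, (rows, indices)), shape=(n, n))

assert (m_csr.toarray() == m_coo.toarray()).all()  # identical results
```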

This commit includes two cugraph translator improvements:
* Removed calls to the tocsr method when converting to SciPy graphs, since the conversion isn't always necessary and doesn't reliably improve performance; there's no reason to pay that O(n) cost unless we have to.
* For the Scipy* -> CuGraph* translators, if the Scipy* matrix is already in CSR format, we now translate it directly into the CuGraph* object. Before, we converted the CSR data to COO first.

Tests were updated to reflect these changes.
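The second bullet amounts to a small format dispatch: if the SciPy matrix is already CSR, its internal indptr/indices/data arrays are the adjacency list, and only other formats pay a conversion. The sketch below is an illustrative reconstruction, not the PR's code; the hand-off to cugraph's adjacency-list loader needs a GPU and is only noted in the docstring.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_adjlist(matrix):
    """Return (offsets, indices, weights) for a scipy sparse matrix.

    A CSR matrix already holds these arrays, so it converts for free;
    other formats go through tocsr(). The real translator would then
    hand the arrays to cugraph (e.g. its adjacency-list loader), which
    is omitted here since it requires a GPU.
    """
    csr = matrix if isinstance(matrix, csr_matrix) else matrix.tocsr()
    return csr.indptr, csr.indices, csr.data

m = csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))
offsets, indices, weights = to_adjlist(m)
```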