Change `dask` and `distributed` deps to `==2021.11.2` #78

karlhigley · 2022-04-29T14:02:26Z

No description provided.

github-actions · 2022-04-29T14:05:04Z

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-78

nvidia-merlin-bot · 2022-04-29T14:05:27Z

Click to view CI Results

GitHub pull request #78 of commit 5b48974be883364539cd732493b3c657c06f8211, no merge conflicts.
Running as SYSTEM
Setting status of 5b48974be883364539cd732493b3c657c06f8211 to PENDING with url https://10.20.13.93:8080/job/merlin_core/38/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/78/*:refs/remotes/origin/pr/78/* # timeout=10
 > git rev-parse 5b48974be883364539cd732493b3c657c06f8211^{commit} # timeout=10
Checking out Revision 5b48974be883364539cd732493b3c657c06f8211 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5b48974be883364539cd732493b3c657c06f8211 # timeout=10
Commit message: "Change `dask` and `distributed` deps to `>=2021.11.2`"
 > git rev-list --no-walk ae8cd6f884df7b824b31295387e718e923f4a5fb # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins5319646179197397235.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 230 items / 2 skipped
tests/unit/core/test_dispatch.py .                                       [  0%]

tests/unit/dag/test_base_operator.py EEEE                                [  2%]

tests/unit/dag/test_column_selector.py ..........................        [ 13%]

tests/unit/dag/test_tags.py ......                                       [ 16%]

tests/unit/dag/ops/test_selection.py ...                                 [ 17%]

tests/unit/schema/test_column_schemas.py ............................... [ 30%]

........................................................................ [ 62%]

.......................................................................  [ 93%]

tests/unit/schema/test_schema.py ......                                  [ 95%]

tests/unit/schema/test_schema_io.py ..                                   [ 96%]

tests/unit/utils/test_utils.py ...s.s.s                                  [100%]
==================================== ERRORS ====================================

___________ ERROR at setup of test_graph_validates_schemas[parquet] ____________
request = <SubRequest 'dataset' for <Function test_graph_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

______ ERROR at setup of test_compute_selector_validates_schemas[parquet] ______
request = <SubRequest 'dataset' for <Function test_compute_selector_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

____ ERROR at setup of test_compute_input_schema_validates_schemas[parquet] ____
request = <SubRequest 'dataset' for <Function test_compute_input_schema_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

___ ERROR at setup of test_compute_output_schema_validates_schemas[parquet] ____
request = <SubRequest 'dataset' for <Function test_compute_output_schema_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

=============================== warnings summary ===============================

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]

/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 35919 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]

/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 35105 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]

/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 42055 instead

warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

=========================== short test summary info ============================

ERROR tests/unit/dag/test_base_operator.py::test_graph_validates_schemas[parquet]

ERROR tests/unit/dag/test_base_operator.py::test_compute_selector_validates_schemas[parquet]

ERROR tests/unit/dag/test_base_operator.py::test_compute_input_schema_validates_schemas[parquet]

ERROR tests/unit/dag/test_base_operator.py::test_compute_output_schema_validates_schemas[parquet]

============= 223 passed, 5 skipped, 3 warnings, 4 errors in 7.45s =============

Build step 'Execute shell' marked build as failure

Performing Post build task...

Match found for : : True

Logical operation result is TRUE

Running script  : #!/bin/bash

cd /var/jenkins_home/

CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"

[merlin_core] $ /bin/bash /tmp/jenkins2539145909978768347.sh

nvidia-merlin-bot · 2022-04-29T15:48:57Z

Click to view CI Results

GitHub pull request #78 of commit 6fae13032f558660dcefe3286c6b7ee0400205bf, no merge conflicts.
Running as SYSTEM
Setting status of 6fae13032f558660dcefe3286c6b7ee0400205bf to PENDING with url https://10.20.13.93:8080/job/merlin_core/39/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/78/*:refs/remotes/origin/pr/78/* # timeout=10
 > git rev-parse 6fae13032f558660dcefe3286c6b7ee0400205bf^{commit} # timeout=10
Checking out Revision 6fae13032f558660dcefe3286c6b7ee0400205bf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6fae13032f558660dcefe3286c6b7ee0400205bf # timeout=10
Commit message: "Update requirements.txt to use `==` for `dask` and `distributed`"
 > git rev-list --no-walk 5b48974be883364539cd732493b3c657c06f8211 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins3709684617353582845.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 230 items / 2 skipped
tests/unit/core/test_dispatch.py .                                       [  0%]

tests/unit/dag/test_base_operator.py EEEE                                [  2%]

tests/unit/dag/test_column_selector.py ..........................        [ 13%]

tests/unit/dag/test_tags.py ......                                       [ 16%]

tests/unit/dag/ops/test_selection.py ...                                 [ 17%]

tests/unit/schema/test_column_schemas.py ............................... [ 30%]

........................................................................ [ 62%]

.......................................................................  [ 93%]

tests/unit/schema/test_schema.py ......                                  [ 95%]

tests/unit/schema/test_schema_io.py ..                                   [ 96%]

tests/unit/utils/test_utils.py ...s.s.s                                  [100%]
==================================== ERRORS ====================================

___________ ERROR at setup of test_graph_validates_schemas[parquet] ____________
request = <SubRequest 'dataset' for <Function test_graph_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

______ ERROR at setup of test_compute_selector_validates_schemas[parquet] ______
request = <SubRequest 'dataset' for <Function test_compute_selector_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

____ ERROR at setup of test_compute_input_schema_validates_schemas[parquet] ____
request = <SubRequest 'dataset' for <Function test_compute_input_schema_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

___ ERROR at setup of test_compute_output_schema_validates_schemas[parquet] ____
request = <SubRequest 'dataset' for <Function test_compute_output_schema_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv


  return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)


tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__

self.engine = ParquetDatasetEngine(

merlin/io/parquet.py:311: in __init__

self._real_meta, rg_byte_size_0 = run_on_worker(

merlin/core/utils.py:488: in run_on_worker

return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:


      if cudf.utils.ioutils._is_local_filesystem(fs):


E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

=============================== warnings summary ===============================

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]

/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 35023 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]

/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 38899 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]

/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 34741 instead

warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

=========================== short test summary info ============================

ERROR tests/unit/dag/test_base_operator.py::test_graph_validates_schemas[parquet]

ERROR tests/unit/dag/test_base_operator.py::test_compute_selector_validates_schemas[parquet]

ERROR tests/unit/dag/test_base_operator.py::test_compute_input_schema_validates_schemas[parquet]

ERROR tests/unit/dag/test_base_operator.py::test_compute_output_schema_validates_schemas[parquet]

============= 223 passed, 5 skipped, 3 warnings, 4 errors in 7.37s =============

Build step 'Execute shell' marked build as failure

Performing Post build task...

Match found for : : True

Logical operation result is TRUE

Running script  : #!/bin/bash

cd /var/jenkins_home/

CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"

[merlin_core] $ /bin/bash /tmp/jenkins7261871937943938479.sh
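
Editorial note on the two failing runs above: all four setup errors end at `merlin/io/parquet.py:1207`, where the name `cudf` was `None` while `cpu = False`. That is the pattern one would expect if `cudf` is imported optionally and the import did not succeed on that worker; the log does not show the import itself, so this is an inference rather than a confirmed diagnosis. The sketch below illustrates one generic way to turn such a failure into a descriptive error. The helper names `optional_import`, `require_module`, and `pick_backend` are hypothetical and are not part of Merlin.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require_module(module, module_name, hint):
    """Fail early with a clear message instead of `None.<attr>` failing later."""
    if module is None:
        raise RuntimeError(
            f"{module_name} is required for this code path but could not be imported; {hint}"
        )
    return module


# Assumption: mirrors a module-level name that ends up as None without cuDF.
cudf_or_none = optional_import("cudf")


def pick_backend(cpu):
    """Choose a reader backend; only the GPU path needs cuDF."""
    if cpu:
        return "pyarrow"
    require_module(cudf_or_none, "cudf", "pass cpu=True or install cuDF")
    return "cudf"
```

With a guard like this, a GPU code path on a worker without cuDF would raise a `RuntimeError` that names the missing package, rather than the `AttributeError: 'NoneType' object has no attribute 'utils'` seen in the log.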

jperez999 · 2022-05-02T14:14:18Z

rerun tests

jperez999 · 2022-05-02T14:41:30Z

rerun tests

nvidia-merlin-bot · 2022-05-02T14:42:49Z

Click to view CI Results

GitHub pull request #78 of commit 6fae13032f558660dcefe3286c6b7ee0400205bf, no merge conflicts.
Running as SYSTEM
Setting status of 6fae13032f558660dcefe3286c6b7ee0400205bf to PENDING with url https://10.20.13.93:8080/job/merlin_core/41/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/78/*:refs/remotes/origin/pr/78/* # timeout=10
 > git rev-parse 6fae13032f558660dcefe3286c6b7ee0400205bf^{commit} # timeout=10
Checking out Revision 6fae13032f558660dcefe3286c6b7ee0400205bf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6fae13032f558660dcefe3286c6b7ee0400205bf # timeout=10
Commit message: "Update requirements.txt to use `==` for `dask` and `distributed`"
 > git rev-list --no-walk e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins3724537205971980355.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
  Downloading setuptools-62.1.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 38.0 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 61.0.0
    Uninstalling setuptools-61.0.0:
      Successfully uninstalled setuptools-61.0.0
Successfully installed setuptools-62.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 342 items / 1 skipped
tests/unit/core/test_dispatch.py ..                                      [  0%]

tests/unit/dag/test_base_operator.py ....                                [  1%]

tests/unit/dag/test_column_selector.py ..........................        [  9%]

tests/unit/dag/test_tags.py ......                                       [ 11%]

tests/unit/dag/ops/test_selection.py ...                                 [ 11%]

tests/unit/io/test_io.py ............................................... [ 25%]

................................................................         [ 44%]

tests/unit/schema/test_column_schemas.py ............................... [ 53%]

........................................................................ [ 74%]

.......................................................................  [ 95%]

tests/unit/schema/test_schema.py ......                                  [ 97%]

tests/unit/schema/test_schema_io.py ..                                   [ 97%]

tests/unit/utils/test_utils.py ........                                  [100%]
=============================== warnings summary ===============================

tests/unit/dag/test_base_operator.py: 4 warnings

tests/unit/io/test_io.py: 72 warnings

/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.

warnings.warn(
tests/unit/io/test_io.py::test_validate_and_regenerate_dataset

/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.

paths = [p.path for p in pa_dataset.pieces]
tests/unit/utils/test_utils.py::test_serial_context[True]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 41655 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 40185 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 39973 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 41093 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 40205 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 44509 instead

warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]

/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.

Perhaps you already have a cluster running?

Hosting the HTTP server on port 39971 instead

warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

================= 342 passed, 1 skipped, 84 warnings in 53.41s =================

Performing Post build task...

Match found for : : True

Logical operation result is TRUE

Running script  : #!/bin/bash

cd /var/jenkins_home/

CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"

[merlin_core] $ /bin/bash /tmp/jenkins5191163580780155278.sh

Change dask and distributed deps to >=2021.11.2

5b48974

karlhigley requested a review from jperez999 April 29, 2022 14:02

karlhigley self-assigned this Apr 29, 2022

karlhigley added this to the Merlin 22.05 milestone Apr 29, 2022

karlhigley added the clean up label Apr 29, 2022

karlhigley changed the title ~~Change dask and distributed deps to >=2021.11.2~~ Change dask and distributed deps to ==2021.11.2 Apr 29, 2022

Update requirements.txt to use == for dask and distributed

6fae130
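
Editorial note: the commit above switches the `dask` and `distributed` requirements to exact (`==`) pins. The following is a minimal sketch of how exact pins could be checked against an environment. The requirements text is an assumption reconstructed from the PR title, not a copy of the repository's `requirements.txt`, and `parse_exact_pins` and `find_mismatches` are hypothetical helper names.

```python
def parse_exact_pins(requirements_text):
    """Return {package: version} for lines of the form `name==version`."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and non-exact specifiers such as >=
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """List (name, wanted, found) for packages whose installed version differs."""
    return [
        (name, wanted, installed.get(name))
        for name, wanted in sorted(pins.items())
        if installed.get(name) != wanted
    ]


# Assumed content, based on the PR title only.
REQUIREMENTS = """\
dask==2021.11.2
distributed==2021.11.2
"""

pins = parse_exact_pins(REQUIREMENTS)
```

In a real environment, the `installed` mapping could be built with `importlib.metadata.version(name)` from the standard library (Python 3.8+). An exact pin trades flexibility for reproducibility: any installed version other than `2021.11.2` is reported as a mismatch.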

jperez999 approved these changes Apr 29, 2022

Merge branch 'main' into fix/dask-dist-deps

d8d2b77

jperez999 merged commit 39ca553 into NVIDIA-Merlin:main May 2, 2022
