@karlhigley
Contributor

No description provided.

@karlhigley karlhigley requested a review from jperez999 April 29, 2022 14:02
@karlhigley karlhigley self-assigned this Apr 29, 2022
@karlhigley karlhigley added this to the Merlin 22.05 milestone Apr 29, 2022
@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-78

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #78 of commit 5b48974be883364539cd732493b3c657c06f8211, no merge conflicts.
Running as SYSTEM
Setting status of 5b48974be883364539cd732493b3c657c06f8211 to PENDING with url https://10.20.13.93:8080/job/merlin_core/38/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/78/*:refs/remotes/origin/pr/78/* # timeout=10
 > git rev-parse 5b48974be883364539cd732493b3c657c06f8211^{commit} # timeout=10
Checking out Revision 5b48974be883364539cd732493b3c657c06f8211 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5b48974be883364539cd732493b3c657c06f8211 # timeout=10
Commit message: "Change `dask` and `distributed` deps to `>=2021.11.2`"
 > git rev-list --no-walk ae8cd6f884df7b824b31295387e718e923f4a5fb # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins5319646179197397235.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 230 items / 2 skipped

tests/unit/core/test_dispatch.py . [ 0%]
tests/unit/dag/test_base_operator.py EEEE [ 2%]
tests/unit/dag/test_column_selector.py .......................... [ 13%]
tests/unit/dag/test_tags.py ...... [ 16%]
tests/unit/dag/ops/test_selection.py ... [ 17%]
tests/unit/schema/test_column_schemas.py ............................... [ 30%]
........................................................................ [ 62%]
....................................................................... [ 93%]
tests/unit/schema/test_schema.py ...... [ 95%]
tests/unit/schema/test_schema_io.py .. [ 96%]
tests/unit/utils/test_utils.py ...s.s.s [100%]

==================================== ERRORS ====================================
___________ ERROR at setup of test_graph_validates_schemas[parquet] ____________

request = <SubRequest 'dataset' for <Function test_graph_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
______ ERROR at setup of test_compute_selector_validates_schemas[parquet] ______

request = <SubRequest 'dataset' for <Function test_compute_selector_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
____ ERROR at setup of test_compute_input_schema_validates_schemas[parquet] ____

request = <SubRequest 'dataset' for <Function test_compute_input_schema_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
___ ERROR at setup of test_compute_output_schema_validates_schemas[parquet] ____

request = <SubRequest 'dataset' for <Function test_compute_output_schema_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-12/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fb216e719a0>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
=============================== warnings summary ===============================
tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35919 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35105 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42055 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
ERROR tests/unit/dag/test_base_operator.py::test_graph_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_selector_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_input_schema_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_output_schema_validates_schemas[parquet]
============= 223 passed, 5 skipped, 3 warnings, 4 errors in 7.45s =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins2539145909978768347.sh

@karlhigley karlhigley changed the title from "Change dask and distributed deps to >=2021.11.2" to "Change dask and distributed deps to ==2021.11.2" Apr 29, 2022
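
The retitle reflects a switch from a minimum-version specifier (`>=2021.11.2`) to an exact pin (`==2021.11.2`). As a rough illustration of the difference, the sketch below compares dotted versions as integer tuples. It is a simplification, not how pip resolves requirements, and `parse_version`, `satisfies`, and the candidate version `2022.3.0` are illustrative assumptions rather than anything taken from this PR.

```python
def parse_version(text):
    """Turn a dotted version such as "2021.11.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, operator, required):
    """Check one installed version against a single ">=" or "==" specifier."""
    have, want = parse_version(installed), parse_version(required)
    if operator == ">=":
        return have >= want
    if operator == "==":
        return have == want
    raise ValueError(f"unsupported operator: {operator}")


# Compare the pinned version and an illustrative newer one against both specifiers.
for candidate in ("2021.11.2", "2022.3.0"):
    print(
        candidate,
        satisfies(candidate, ">=", "2021.11.2"),
        satisfies(candidate, "==", "2021.11.2"),
    )
# 2021.11.2 True True
# 2022.3.0 True False
```

With `>=`, any newer release is accepted; with `==`, only the named release is.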
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #78 of commit 6fae13032f558660dcefe3286c6b7ee0400205bf, no merge conflicts.
Running as SYSTEM
Setting status of 6fae13032f558660dcefe3286c6b7ee0400205bf to PENDING with url https://10.20.13.93:8080/job/merlin_core/39/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/78/*:refs/remotes/origin/pr/78/* # timeout=10
 > git rev-parse 6fae13032f558660dcefe3286c6b7ee0400205bf^{commit} # timeout=10
Checking out Revision 6fae13032f558660dcefe3286c6b7ee0400205bf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6fae13032f558660dcefe3286c6b7ee0400205bf # timeout=10
Commit message: "Update requirements.txt to use `==` for `dask` and `distributed`"
 > git rev-list --no-walk 5b48974be883364539cd732493b3c657c06f8211 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins3709684617353582845.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 230 items / 2 skipped

tests/unit/core/test_dispatch.py . [ 0%]
tests/unit/dag/test_base_operator.py EEEE [ 2%]
tests/unit/dag/test_column_selector.py .......................... [ 13%]
tests/unit/dag/test_tags.py ...... [ 16%]
tests/unit/dag/ops/test_selection.py ... [ 17%]
tests/unit/schema/test_column_schemas.py ............................... [ 30%]
........................................................................ [ 62%]
....................................................................... [ 93%]
tests/unit/schema/test_schema.py ...... [ 95%]
tests/unit/schema/test_schema_io.py .. [ 96%]
tests/unit/utils/test_utils.py ...s.s.s [100%]

==================================== ERRORS ====================================
___________ ERROR at setup of test_graph_validates_schemas[parquet] ____________

request = <SubRequest 'dataset' for <Function test_graph_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
______ ERROR at setup of test_compute_selector_validates_schemas[parquet] ______

request = <SubRequest 'dataset' for <Function test_compute_selector_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
____ ERROR at setup of test_compute_input_schema_validates_schemas[parquet] ____

request = <SubRequest 'dataset' for <Function test_compute_input_schema_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
___ ERROR at setup of test_compute_output_schema_validates_schemas[parquet] ____

request = <SubRequest 'dataset' for <Function test_compute_output_schema_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7fc32d879760>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
=============================== warnings summary ===============================
tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35023 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38899 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34741 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
ERROR tests/unit/dag/test_base_operator.py::test_graph_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_selector_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_input_schema_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_output_schema_validates_schemas[parquet]
============= 223 passed, 5 skipped, 3 warnings, 4 errors in 7.37s =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins7261871937943938479.sh
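
Both failing runs above (builds 38 and 39) stop at the same line, `merlin/io/parquet.py:1207`, where `cudf` is `None` while the `dataset` fixture falls back to `cpu=False`. One plausible reading is that `cudf` was not importable in that environment and an optional-import fallback left the name bound to `None`; the log does not show the import itself, so that part is an assumption. The sketch below illustrates the general pattern and a guard that fails with a clearer message. `optional_import`, `choose_backend`, and the module name are hypothetical and not part of Merlin.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def choose_backend(cpu, gpu_module):
    """Pick "cpu" or "gpu", refusing the GPU path when the module is missing."""
    if cpu:
        return "cpu"
    if gpu_module is None:
        # Without this check, the first attribute access on None raises
        # AttributeError, as in the tracebacks above.
        raise RuntimeError("cpu=False was requested but the GPU library is not importable")
    return "gpu"


# Hypothetical module name, chosen so the sketch behaves the same on any machine.
missing = optional_import("some_gpu_library_that_is_not_installed")
print(missing)                        # None
print(choose_backend(True, missing))  # cpu
```

Calling `choose_backend(False, missing)` raises `RuntimeError` instead of an `AttributeError` deep inside the reader.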

@jperez999
Collaborator

rerun tests

@jperez999
Collaborator

rerun tests

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #78 of commit 6fae13032f558660dcefe3286c6b7ee0400205bf, no merge conflicts.
Running as SYSTEM
Setting status of 6fae13032f558660dcefe3286c6b7ee0400205bf to PENDING with url https://10.20.13.93:8080/job/merlin_core/41/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/78/*:refs/remotes/origin/pr/78/* # timeout=10
 > git rev-parse 6fae13032f558660dcefe3286c6b7ee0400205bf^{commit} # timeout=10
Checking out Revision 6fae13032f558660dcefe3286c6b7ee0400205bf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6fae13032f558660dcefe3286c6b7ee0400205bf # timeout=10
Commit message: "Update requirements.txt to use `==` for `dask` and `distributed`"
 > git rev-list --no-walk e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins3724537205971980355.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
  Downloading setuptools-62.1.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 38.0 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 61.0.0
    Uninstalling setuptools-61.0.0:
      Successfully uninstalled setuptools-61.0.0
Successfully installed setuptools-62.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 342 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 11%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 72 warnings
/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41655 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40185 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39973 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41093 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40205 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44509 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39971 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 342 passed, 1 skipped, 84 warnings in 53.41s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins5191163580780155278.sh
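
The `Port 8787 is already in use` warnings appear in all three runs and did not fail any test: each time the HTTP server is hosted on a different port instead. A minimal sketch of that kind of fallback with plain sockets follows. It is not the `distributed` implementation, and `bind_with_fallback` is a hypothetical helper.

```python
import socket


def bind_with_fallback(preferred_port, host="127.0.0.1"):
    """Listen on preferred_port, or on an OS-chosen free port if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred_port))
    except OSError:
        # Preferred port is unavailable: port 0 asks the OS for any free port.
        sock.bind((host, 0))
    sock.listen()
    return sock


# The first socket stands in for a process that already holds the port.
first = bind_with_fallback(0)
taken_port = first.getsockname()[1]

# Requesting the same port again falls back to a different one.
second = bind_with_fallback(taken_port)
print(taken_port, second.getsockname()[1])

first.close()
second.close()
```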

@jperez999 jperez999 merged commit 39ca553 into NVIDIA-Merlin:main May 2, 2022