
@mikemckiernan

Revise to use ext-toc and MyST-NB.

@mikemckiernan added the documentation label (Improvements or additions to documentation) on May 2, 2022
@mikemckiernan added this to the Merlin 22.05 milestone on May 2, 2022
@nvidia-merlin-bot
GitHub pull request #79 of commit e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a, no merge conflicts.
Running as SYSTEM
Setting status of e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a to PENDING with url https://10.20.13.93:8080/job/merlin_core/40/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/79/*:refs/remotes/origin/pr/79/* # timeout=10
 > git rev-parse e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a^{commit} # timeout=10
Checking out Revision e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a # timeout=10
Commit message: "docs: Add ext-toc and copydirs"
 > git rev-list --no-walk 6fae13032f558660dcefe3286c6b7ee0400205bf # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins4852994142891201096.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 230 items / 2 skipped

tests/unit/core/test_dispatch.py . [ 0%]
tests/unit/dag/test_base_operator.py EEEE [ 2%]
tests/unit/dag/test_column_selector.py .......................... [ 13%]
tests/unit/dag/test_tags.py ...... [ 16%]
tests/unit/dag/ops/test_selection.py ... [ 17%]
tests/unit/schema/test_column_schemas.py ............................... [ 30%]
........................................................................ [ 62%]
....................................................................... [ 93%]
tests/unit/schema/test_schema.py ...... [ 95%]
tests/unit/schema/test_schema_io.py .. [ 96%]
tests/unit/utils/test_utils.py ...s.s.s [100%]

==================================== ERRORS ====================================
___________ ERROR at setup of test_graph_validates_schemas[parquet] ____________

request = <SubRequest 'dataset' for <Function test_graph_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
______ ERROR at setup of test_compute_selector_validates_schemas[parquet] ______

request = <SubRequest 'dataset' for <Function test_compute_selector_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
____ ERROR at setup of test_compute_input_schema_validates_schemas[parquet] ____

request = <SubRequest 'dataset' for <Function test_compute_input_schema_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
___ ERROR at setup of test_compute_output_schema_validates_schemas[parquet] ____

request = <SubRequest 'dataset' for <Function test_compute_output_schema_validates_schemas[parquet]>>
paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']
engine = 'parquet'

@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv
    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:


merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)


path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'
fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>
cpu = False, n = 1, memory_usage = True, kwargs = {}

def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):

E AttributeError: 'NoneType' object has no attribute 'utils'

merlin/io/parquet.py:1207: AttributeError
=============================== warnings summary ===============================
tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37945 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37279 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43653 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
ERROR tests/unit/dag/test_base_operator.py::test_graph_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_selector_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_input_schema_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_output_schema_validates_schemas[parquet]
============= 223 passed, 5 skipped, 3 warnings, 4 errors in 7.66s =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins4610322860672222656.sh
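All four setup errors share one cause. The `dataset` fixture defaults to `cpu=False`, so `_sample_row_group` takes the GPU branch, where the name `cudf` is bound to `None`; `cudf.utils` then raises `AttributeError` before any test body runs. A `None` binding like this usually comes from an optional import that failed on the build host. The sketch below assumes that pattern (the traceback does not show how `cudf` is imported), and `sample_backend` and `unguarded_access` are hypothetical helpers for illustration, not part of Merlin.

```python
# Assumption: cudf is imported optionally, so the name is bound to None
# when the package is not installed on the host.
try:
    import cudf  # GPU DataFrame library; absent on CPU-only hosts
except ImportError:
    cudf = None


def sample_backend(cpu: bool = False) -> str:
    """Hypothetical helper: choose a backend, falling back to CPU when
    cudf could not be imported."""
    if cpu or cudf is None:
        return "cpu"
    return "gpu"


def unguarded_access():
    """Hypothetical helper: touch cudf.utils without a guard. When cudf is
    None, this raises the same AttributeError seen in the traceback."""
    return cudf.utils
```

Checking `cudf is None` before the GPU branch turns a hard setup error into an explicit CPU fallback; a test suite could equally skip GPU-only parametrizations when `cudf` is unavailable.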

@github-actions

github-actions bot commented May 2, 2022

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-79

@mikemckiernan mikemckiernan merged commit 3f4ce43 into NVIDIA-Merlin:main May 2, 2022
