docs: Add ext-toc and copydirs #79

mikemckiernan · 2022-05-02T14:16:16Z

Revise to use ext-toc and MyST-NB.

nvidia-merlin-bot · 2022-05-02T14:16:44Z

CI Results

GitHub pull request #79 of commit e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a, no merge conflicts.
Running as SYSTEM
Setting status of e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a to PENDING with url https://10.20.13.93:8080/job/merlin_core/40/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/79/*:refs/remotes/origin/pr/79/* # timeout=10
 > git rev-parse e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a^{commit} # timeout=10
Checking out Revision e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e128d9be9ac516aed9a9e1ee78d5f895d53cdf9a # timeout=10
Commit message: "docs: Add ext-toc and copydirs"
 > git rev-list --no-walk 6fae13032f558660dcefe3286c6b7ee0400205bf # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins4852994142891201096.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 230 items / 2 skipped
tests/unit/core/test_dispatch.py .                                       [  0%]
tests/unit/dag/test_base_operator.py EEEE                                [  2%]
tests/unit/dag/test_column_selector.py ..........................        [ 13%]
tests/unit/dag/test_tags.py ......                                       [ 16%]
tests/unit/dag/ops/test_selection.py ...                                 [ 17%]
tests/unit/schema/test_column_schemas.py ............................... [ 30%]
........................................................................ [ 62%]
.......................................................................  [ 93%]
tests/unit/schema/test_schema.py ......                                  [ 95%]
tests/unit/schema/test_schema_io.py ..                                   [ 96%]
tests/unit/utils/test_utils.py ...s.s.s                                  [100%]
==================================== ERRORS ====================================

___________ ERROR at setup of test_graph_validates_schemas[parquet] ____________
request = <SubRequest 'dataset' for <Function test_graph_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv

    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):
E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

______ ERROR at setup of test_compute_selector_validates_schemas[parquet] ______
request = <SubRequest 'dataset' for <Function test_compute_selector_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv

    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):
E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

____ ERROR at setup of test_compute_input_schema_validates_schemas[parquet] ____
request = <SubRequest 'dataset' for <Function test_compute_input_schema_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv

    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):
E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

___ ERROR at setup of test_compute_output_schema_validates_schemas[parquet] ____
request = <SubRequest 'dataset' for <Function test_compute_output_schema_validates_schemas[parquet]>>

paths = ['/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-1.parquet']

engine = 'parquet'
@pytest.fixture(scope="function")
def dataset(request, paths, engine):
    try:
        gpu_memory_frac = request.getfixturevalue("gpu_memory_frac")
    except Exception:  # pylint: disable=broad-except
        gpu_memory_frac = 0.01

    try:
        cpu = request.getfixturevalue("cpu")
    except Exception:  # pylint: disable=broad-except
        cpu = False

    kwargs = {}
    if engine == "csv-no-header":
        kwargs["names"] = allcols_csv

    return merlin.io.Dataset(paths, part_mem_fraction=gpu_memory_frac, cpu=cpu, **kwargs)

tests/conftest.py:216:

merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
merlin/io/parquet.py:311: in __init__
    self._real_meta, rg_byte_size_0 = run_on_worker(
merlin/core/utils.py:488: in run_on_worker
    return func(*args, **kwargs)

path = '/tmp/pytest-of-jenkins/pytest-1/parquet0/dataset-0.parquet'

fs = <fsspec.implementations.local.LocalFileSystem object at 0x7f6476be9b80>

cpu = False, n = 1, memory_usage = True, kwargs = {}
def _sample_row_group(path, fs, cpu=False, n=1, memory_usage=False, **kwargs):
    """Return the first Parquet Row-Group for a given path

    The memory_usage of the row-group will also be returned
    if `memory_usage=True`.
    """
    if cpu:
        with fs.open(path, "rb") as f0:
            # Use pyarrow for CPU version.
            # Pandas does not enable single-row-group access.
            _df = pq.ParquetFile(f0).read_row_group(0).to_pandas()
    else:
        if cudf.utils.ioutils._is_local_filesystem(fs):
E           AttributeError: 'NoneType' object has no attribute 'utils'
merlin/io/parquet.py:1207: AttributeError

=============================== warnings summary ===============================

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 37945 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 37279 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 43653 instead
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

=========================== short test summary info ============================

ERROR tests/unit/dag/test_base_operator.py::test_graph_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_selector_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_input_schema_validates_schemas[parquet]
ERROR tests/unit/dag/test_base_operator.py::test_compute_output_schema_validates_schemas[parquet]
============= 223 passed, 5 skipped, 3 warnings, 4 errors in 7.66s =============
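The session summary reconciles with the collection line and the progress lines: pytest collected 230 items and skipped 2 more at collection time; `test_utils.py` shows three runtime skips (`s.s.s`) and `test_base_operator.py` shows four setup errors (`EEEE`). A short worked check, assuming each collected item has exactly one reported outcome (true for this run, since no teardown errors appear):

```python
# Counts copied from the pytest output above.
collected = 230
skipped_at_collection = 2   # "collected 230 items / 2 skipped"
passed = 223
errors = 4                  # "EEEE" in tests/unit/dag/test_base_operator.py
skipped_at_runtime = 3      # "s.s.s" in tests/unit/utils/test_utils.py

# Every collected item ends up passed, skipped, or errored.
assert passed + skipped_at_runtime + errors == collected

# The summary's "5 skipped" combines collection-time and runtime skips.
total_skipped = skipped_at_collection + skipped_at_runtime
print(total_skipped)  # 5
```

In other words, only the four `[parquet]` setup errors kept this build from passing.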

Build step 'Execute shell' marked build as failure

Performing Post build task...

Match found for : : True

Logical operation result is TRUE

Running script  : #!/bin/bash

cd /var/jenkins_home/

CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"

[merlin_core] $ /bin/bash /tmp/jenkins4610322860672222656.sh
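All four setup errors fail on the same line: evaluating `cudf.utils` raises `'NoneType' object has no attribute 'utils'`, so the name `cudf` was bound to `None` when `_sample_row_group` took the GPU branch (`cpu=False`). One common way that happens is an optional-import pattern that sets the module to `None` when the import fails, for example on a runner where `cudf` is missing or cannot load. The log does not show how `merlin/io/parquet.py` imports `cudf`, so the sketch below is an assumption: `optional_import`, `sample_row_group_unguarded`, and `sample_row_group_guarded` are hypothetical names, and the module name is deliberately nonexistent so the import fails.

```python
import importlib


def optional_import(name):
    """Hypothetical helper: return the module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Deliberately nonexistent module, standing in for a cudf import that
# fails on the CI runner (an assumption; the log does not show the import).
cudf = optional_import("cudf_unavailable_for_this_sketch")


def sample_row_group_unguarded(cpu=False):
    """Same shape as the failing code: the GPU branch touches cudf unconditionally."""
    if cpu:
        return "cpu"
    return cudf.utils  # AttributeError when cudf is None


def sample_row_group_guarded(cpu=False):
    """Hypothetical guard: fail with an explicit message instead of AttributeError."""
    if cpu:
        return "cpu"
    if cudf is None:
        raise RuntimeError("GPU code path requested, but cudf could not be imported")
    return cudf.utils


try:
    sample_row_group_unguarded()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'utils'
```

The guard only makes the failure easier to diagnose; the log alone does not say whether the fix belongs in the CI environment or in the code.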

github-actions · 2022-05-02T14:19:18Z

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-79

docs: Add ext-toc and copydirs

e128d9b

mikemckiernan added the documentation Improvements or additions to documentation label May 2, 2022

mikemckiernan added this to the Merlin 22.05 milestone May 2, 2022

mikemckiernan mentioned this pull request May 2, 2022

Simplify and enhance using examples in documentation NVIDIA-Merlin/Merlin#218 (Closed, 3 tasks)

karlhigley approved these changes May 2, 2022

mikemckiernan merged commit 3f4ce43 into NVIDIA-Merlin:main May 2, 2022
