Creating Docker artifacts for a Google Cloud Storage bucket #240

@bensoltoff

Describe the bug

This seems to be a repeat of #165. I want to create Docker artifacts for a model stored in a GCS bucket, but the gcsfs package is not listed in the generated vetiver_requirements.txt file, so the container fails when it starts.

To Reproduce

Create a model, pin it to a GCS bucket, and generate the Docker artifacts. I used the following script:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from pins import board_gcs
from vetiver import VetiverModel, vetiver_pin_write, prepare_docker
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Load Palmer Penguins dataset
url = "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv"
penguins = pd.read_csv(url)

# Drop rows with missing values
penguins = penguins.dropna()

# Select features and target (body_mass_g is the target, so it is not also used as a feature)
X = penguins[["bill_length_mm", "bill_depth_mm", "flipper_length_mm"]]
y = penguins["body_mass_g"]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Create Vetiver model
v = VetiverModel(model, "penguins-test", prototype_data=X_train)

# Store on board
board = board_gcs("info-4940-models/test/", cache=None, allow_pickle_read=True)
vetiver_pin_write(board, v)

# Create Docker artifacts
prepare_docker(
    board,
    "penguins-test",
    path="~/Desktop/penguins-test",
)

Note that prepare_docker() emitted this warning:

/Users/bcs88/Projects/info-4940/assessments/.venv/lib/python3.13/site-packages/vetiver/attach_pkgs.py:77: UserWarning: required packages unknown for board protocol: ('gs', 'gcs'), add to model's metadata to export
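Since this warning is the only signal at artifact-creation time that a dependency will be missing from the image, one option is to promote it to an exception so that prepare_docker() fails immediately instead of writing artifacts that crash at startup. This is a minimal sketch using only the standard warnings module; `fail_on_unknown_board_pkgs` is a hypothetical wrapper, and it assumes the warning text keeps the prefix shown above.

```python
import warnings
from typing import Any, Callable

# Prefix of the vetiver UserWarning quoted above (assumed stable across versions).
UNKNOWN_PKGS_PREFIX = "required packages unknown for board protocol"


def fail_on_unknown_board_pkgs(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func(*args, **kwargs) and raise instead of warn if the
    unknown-board-protocol UserWarning is emitted (hypothetical helper)."""
    with warnings.catch_warnings():
        # `message` is a regex matched case-insensitively against the start of
        # the warning text; the "error" action raises matching warnings.
        warnings.filterwarnings("error", message=UNKNOWN_PKGS_PREFIX, category=UserWarning)
        return func(*args, **kwargs)
```

With the script above, `fail_on_unknown_board_pkgs(prepare_docker, board, "penguins-test", path="~/Desktop/penguins-test")` would raise the UserWarning as an exception instead of printing it. The previous warning filters are restored when the `with` block exits.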

I then built and ran the Docker container from the artifacts directory:

docker build -t penguins .
docker run -p 8080:8080 penguins

which produced this output:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/fsspec/registry.py", line 261, in get_filesystem_class
    register_implementation(protocol, _import_class(bit["class"]))
                                      ~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/fsspec/registry.py", line 296, in _import_class
    mod = importlib.import_module(mod)
  File "/usr/local/lib/python3.13/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'gcsfs'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 1462, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 1383, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 1246, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 814, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/uvicorn/main.py", line 423, in main
    run(
    ~~~^
        app,
        ^^^^
    ...<46 lines>...
        h11_max_incomplete_event_size=h11_max_incomplete_event_size,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/uvicorn/main.py", line 593, in run
    server.run()
    ~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/uvicorn/server.py", line 67, in run
    return asyncio_run(self.serve(sockets=sockets), loop_factory=self.config.get_loop_factory())
  File "/usr/local/lib/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/usr/local/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/local/lib/python3.13/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/uvicorn/server.py", line 71, in serve
    await self._serve(sockets)
  File "/usr/local/lib/python3.13/site-packages/uvicorn/server.py", line 78, in _serve
    config.load()
    ~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/uvicorn/config.py", line 439, in load
    self.loaded_app = import_from_string(self.app)
                      ~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/local/lib/python3.13/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1027, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/vetiver/app/app.py", line 8, in <module>
    b = pins.board_gcs('info-4940-models/test/', allow_pickle_read=True)
  File "/usr/local/lib/python3.13/site-packages/pins/constructors.py", line 542, in board_gcs
    return board("gcs", path, versioned, cache, allow_pickle_read, storage_options=opts)
  File "/usr/local/lib/python3.13/site-packages/pins/constructors.py", line 97, in board
    fs = fsspec.filesystem(protocol, **storage_options)
  File "/usr/local/lib/python3.13/site-packages/fsspec/registry.py", line 321, in filesystem
    cls = get_filesystem_class(protocol)
  File "/usr/local/lib/python3.13/site-packages/fsspec/registry.py", line 263, in get_filesystem_class
    raise ImportError(bit.get("err")) from e
ImportError: Please install gcsfs to access Google Storage

This is expected, since gcsfs is not listed in vetiver_requirements.txt.

Expected behavior

I expected gcsfs to be included in the requirements file automatically. If I add it to the file manually, I can build and run the container successfully. But I thought #166 automated this step.
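Until the requirements file picks up gcsfs on its own, the manual edit can be scripted so it is repeated every time the artifacts are regenerated. This is a minimal sketch, assuming prepare_docker() wrote vetiver_requirements.txt into the directory passed as `path`; `with_requirement` and `ensure_requirement` are hypothetical helpers, not part of vetiver.

```python
import re
from pathlib import Path

# Matches the distribution name at the start of a requirement line,
# e.g. "gcsfs" in "gcsfs==2024.6.1" or "gcsfs>=2023.1".
_NAME = re.compile(r"^\s*([A-Za-z0-9._-]+)")


def with_requirement(lines: list[str], package: str) -> list[str]:
    """Return `lines` with `package` appended unless a requirement already names it."""
    listed = set()
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # skip comments such as pip-compile's "# via ..." notes
        match = _NAME.match(line)
        if match:
            listed.add(match.group(1).lower())  # case-insensitive comparison
    if package.lower() in listed:
        return list(lines)
    return [*lines, package]


def ensure_requirement(req_file: Path, package: str) -> None:
    """Rewrite `req_file` in place so that it lists `package` (hypothetical helper)."""
    lines = req_file.read_text().splitlines()
    req_file.write_text("\n".join(with_requirement(lines, package)) + "\n")
```

For the script above, that would be `ensure_requirement(Path("~/Desktop/penguins-test").expanduser() / "vetiver_requirements.txt", "gcsfs")`, run after prepare_docker() and before `docker build`. The package is appended without a version pin so pip can resolve a gcsfs release compatible with the other pinned packages; pin it explicitly if the build needs to be reproducible.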

Desktop:

Positron Version: 2025.10.1 build 4
Code - OSS Version: 1.103.0
Commit: b13dd1ca4803bc04a4a9165395b589b8caf4ab58
Date: 2025-10-14T21:43:42.876Z
Electron: 37.2.3
Chromium: 138.0.7204.100
Node.js: 22.17.0
V8: 13.8.500258-electron.0
OS: Darwin arm64 25.0.0

I confirmed that I am using vetiver 0.2.6, which should include #166.

pip show vetiver
Name: vetiver
Version: 0.2.6
Summary: Version, share, deploy, and monitor models.
Home-page: https://github.com/rstudio/vetiver-python
Author: 
Author-email: Isabel Zimmerman <[email protected]>
License: MIT
Location: /Users/bcs88/Projects/info-4940/assessments/.venv/lib/python3.13/site-packages
Requires: fastapi, httpx, joblib, nest-asyncio, numpy, pandas, pins, pip-tools, plotly, pydantic, python-dotenv, requests, rsconnect-python, scikit-learn, uvicorn
Required-by: 

Additional context

The only thing that stands out to me is this warning:

UserWarning: required packages unknown for board protocol: ('gs', 'gcs'), add to model's metadata to export

It lists the protocols in the reverse order from the check in attach_pkgs.py:

elif prot == ("gcs", "gs"):
    return ["gcsfs"]

However, I am primarily an R user, so I don't know whether the order matters.
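On the ordering question: Python compares tuples element by element, so `("gs", "gcs") == ("gcs", "gs")` evaluates to False, and an equality check written with one ordering will not match a protocol tuple reported in the other. The sketch below shows this, plus one order-insensitive alternative that looks up each alias separately; `PROTOCOL_PKGS` and `packages_for_protocol` are hypothetical and are not vetiver's actual code.

```python
from typing import Union

# Tuple equality is element-by-element, so the same two aliases in a
# different order do not compare equal.
reported = ("gs", "gcs")  # order printed in the UserWarning
checked = ("gcs", "gs")   # order used in the attach_pkgs.py condition
print(reported == checked)            # False
print(set(reported) == set(checked))  # True

# Hypothetical alias-to-package table (illustrative, not vetiver's code).
PROTOCOL_PKGS: dict[str, list[str]] = {
    "gs": ["gcsfs"],
    "gcs": ["gcsfs"],
}


def packages_for_protocol(protocol: Union[str, tuple[str, ...]]) -> list[str]:
    """Map an fsspec protocol (a single name or a tuple of aliases) to the
    extra packages it needs, independent of the alias order."""
    aliases = (protocol,) if isinstance(protocol, str) else tuple(protocol)
    pkgs: list[str] = []
    for alias in aliases:
        for pkg in PROTOCOL_PKGS.get(alias, []):
            if pkg not in pkgs:  # avoid duplicates when several aliases map to one package
                pkgs.append(pkg)
    return pkgs


print(packages_for_protocol(("gs", "gcs")))  # ['gcsfs']
print(packages_for_protocol(("gcs", "gs")))  # ['gcsfs']
print(packages_for_protocol("file"))         # []
```

Looking each alias up individually means the result no longer depends on the order in which the filesystem class lists its protocols.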

Labels: bug (Something isn't working), deploy, pins (related to pins package)
Assignees: none. Projects: none. Milestone: none. Linked branches or pull requests: none.