@thammegowda
Collaborator

Description

While the cmake build produces *.whl files, they are not distributable via PyPI.
PyPI enforces platform-tag rules (the manylinux policy) to improve compatibility across Linux distributions.
This PR adds scripts for producing pymarian wheel files that are distributable on PyPI.

List of changes:

  • Added src/python/build.sh and src/python/build-manylinux.sh scripts.
    The former invokes docker run, whereas the latter runs inside the Docker environment to create wheels for Python versions 3.8 to 3.12.
  • Fixed an issue with Python compatibility (previously, the wrong headers were included for some Python versions). Solution: set(PYBIND11_NOPYTHON On) before adding pybind11.
  • Loosened the huggingface-hub version constraint, as the strict version requirement conflicts with other libraries (such as transformers) and is also unavailable for some Python versions.
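
The split between the two scripts can be pictured with a small Python sketch. This is not the content of build.sh; the image name, mount point, and helper names are assumptions made for illustration. The only fixed facts used are the script paths, the Python version range listed above, and the /opt/python/cpXY-cpXY layout that the official manylinux images use for their interpreters (which versions a given image ships depends on the image).

```python
# Hypothetical sketch: image name and mount point are assumptions,
# not taken from src/python/build.sh.
from typing import List

PY_MINORS = range(8, 13)  # Python 3.8 .. 3.12, as listed in the PR description


def interpreter_paths() -> List[str]:
    """CPython interpreters as laid out under /opt/python in manylinux images."""
    return [f"/opt/python/cp3{m}-cp3{m}/bin/python" for m in PY_MINORS]


def docker_command(repo_dir: str,
                   image: str = "quay.io/pypa/manylinux_2_28_x86_64") -> List[str]:
    """Build (but do not run) a docker invocation for the in-container script."""
    return [
        "docker", "run", "--rm",
        "-v", f"{repo_dir}:/work",   # mount the checkout into the container
        "-w", "/work",               # run from the repository root
        image,
        "bash", "src/python/build-manylinux.sh",
    ]
```

Passing the result of docker_command(...) to subprocess.run would start the container; inside it, the in-container script would loop over interpreters like those returned by interpreter_paths() and build one wheel per version.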

Added dependencies: Docker is required.

How to test

Run src/python/build.sh to produce wheel files at build-python/manylinux/*.whl
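
One way to sanity-check the produced files is to look at the platform tag in each wheel filename. The sketch below is an illustration, not part of this PR; the function names are hypothetical. It relies on the standard wheel naming scheme, where the name ends in `-{python tag}-{abi tag}-{platform tag}.whl` and the platform tag may be a `.`-separated set, and on the fact that PyPI rejects plain `linux_*` platform tags.

```python
from typing import List


def platform_tags(wheel_name: str) -> List[str]:
    """Return the platform tags encoded at the end of a wheel filename."""
    stem = wheel_name[:-len(".whl")] if wheel_name.endswith(".whl") else wheel_name
    # The last '-'-separated field is the platform tag (possibly a '.'-joined set).
    return stem.split("-")[-1].split(".")


def looks_uploadable(wheel_name: str) -> bool:
    """False if any platform tag is a bare linux_* tag, which PyPI rejects."""
    return all(not tag.startswith("linux_") for tag in platform_tags(wheel_name))
```

For example, a file named like `pymarian-0.0.0-cp310-cp310-linux_x86_64.whl` (placeholder version) would fail this check, while the same name with a `manylinux_2_28_x86_64` tag would pass. Applying it to every file in build-python/manylinux/ is a quick pre-upload check.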


Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@thammegowda
Collaborator Author

thammegowda commented Aug 9, 2024

It turns out PyPI has a file size limit of 100MB by default.
Our statically linked, CUDA-enabled native extensions are ~600MB. CMAKE_BUILD_TYPE=Slim did not reduce the extension size when CUDA is enabled.
So I was unable to upload our packages today.

There is a process for requesting a file size limit increase. For instance, pytorch packages are ~800MB and have been uploaded to PyPI successfully. I have followed PyPI's suggested process for requesting a limit increase.
I am not sure how long it will take them to review and approve our request. Fingers crossed.

Reference to track progress: pypi/support#4520
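
For anyone building these wheels locally, a quick size check before attempting an upload might look like the sketch below. It is only an illustration: the function name is hypothetical, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption, so the limit is a parameter.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: the default limit of "100MB" is interpreted as 100 MiB.
DEFAULT_LIMIT = 100 * 1024 * 1024


def oversized_wheels(directory: str, limit: int = DEFAULT_LIMIT) -> List[Tuple[str, int]]:
    """Return (filename, size in bytes) for each wheel larger than `limit`."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(directory).glob("*.whl")
        if path.stat().st_size > limit
    )
```

Calling oversized_wheels("build-python/manylinux") lists the files that would need a raised limit; an empty list means everything fits under the chosen threshold.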

@thammegowda
Collaborator Author

Update: the limit was increased this morning, and I have uploaded packages to PyPI that are compatible with both the CUDA and Intel MKL backends.

# this wont work if pybind11 is git submodule
#find_package(pybind11 REQUIRED)

# NOTE: this property must be set before including pybind11
Member

This comment seems confusing given the commented-out line below. Please check whether it's okay, and either explain it in the comment or fix it.

"sacremoses",
"pyqt5",
"sentence-splitter@git+https://github.com/mediacloud/sentence-splitter",
# "sentence-splitter@git+https://github.com/mediacloud/sentence-splitter",
Member

Please explain in the comment why it's commented out (because it's not used yet but will be in pymarian-webapp?), or remove it.

@thammegowda
Collaborator Author

Moved this PR to an internal fork. Leaving it open here in case somebody wants to build manylinux wheels; we should close this PR once the code is synced.

emjotde pushed a commit that referenced this pull request Jul 9, 2025
* pymarian: manylinux whl builder;
* bind main() function.  Upon pip install, a "pymarian" is made available which has same functionality as "marian" CLI.
* fix github CI and devops CI with recent changes to cmake build for multiple python versions

This PR was originally on public fork
* #1029
* #1028

3 participants