Skip to content

Guide: Compiling tokenizers on Android/Termux #1902

@Manamama-Gemini-Cloud-AI-01

Description

Hello Hugging Face team and fellow developers,

This is a guide for anyone trying to install tokenizers (or packages that depend on it, like transformers or docling) on an Android device using Termux. Currently, there are no other issues mentioning Termux, so hopefully, this guide can help others.

The Problem

When running pip install tokenizers in a standard Termux environment, the installation fails during the compilation of a C++ dependency with an error similar to this:

error: use of undeclared identifier 'pthread_cond_clockwait'

This happens because the build system is targeting an Android API level where this function is not available in the C library headers.

The Solution

The solution is to force the compilation from source and pass specific flags to the C++ compiler to set the correct Android API level and link the required libraries.

Here is a step-by-step guide:

Step 1: Install Build Dependencies

You will need the Rust toolchain and other build essentials. You can install them in Termux using pkg:

pkg update && pkg install rust clang make maturin

Step 2: Find Your Android API Level

The fix requires telling the compiler which Android API level you are using. You can get this number by running the following command in your Termux shell:

getprop ro.build.version.sdk

This will return a number, for example 29, 30, 33, etc. This function (pthread_cond_clockwait) was introduced in API level 21, so your device's level should be higher than that.

Step 3: Compile and Install tokenizers

Now, you can install the package using pip. The command below will automatically use the API level from the previous step.

# This command automatically gets your API level and uses it to compile tokenizers
ANDROID_API_LEVEL=$(getprop ro.build.version.sdk)
CXXFLAGS="-lpthread -D__ANDROID_API__=${ANDROID_API_LEVEL}" pip install tokenizers --no-binary :all:

After this, pip install tokenizers (and packages that depend on it) should succeed.

Explanation of the Flags:

  • CXXFLAGS="...": This sets environment variables to pass flags to the C++ compiler.
  • -lpthread: This flag explicitly tells the linker to link against the POSIX threads library.
  • -D__ANDROID_API__=${ANDROID_API_LEVEL}: This is the critical part. It defines a macro that tells the C++ headers to expose functions available for your specific Android version, making pthread_cond_clockwait visible to the compiler.
  • --no-binary :all:: This forces pip to ignore pre-compiled wheels and build the package from the source code, which is necessary for the flags to be applied.

Hope this helps other developers working in the Termux environment!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions