Initial OpenFold container commit #35

tonykew · 2025-08-26T21:01:27Z

Use numpy 1.x
Add libaio
x86_64 and ARM64 build working
Ran Inference tests on x86_64 and ARM64
Ran Training on 8 GPU DGX (x86_64) node:
Completed epoch 0 training
Restarted training from epoch 0 checkpoint AOK
Slurm examples for x86_64 and ARM64

NOTE: The following files have "raw" github URLs that will have to be fixed for production:

BUILD-ARM64.md
README.md

Tony


Occasionally the following error appears on container startup:

/usr/bin/rm: cannot remove '/usr/local/cuda/compat/lib': Read-only file system
rm: cannot remove '/usr/local/cuda/compat/lib': Read-only file system

The NVIDIA containers add "--writable-tmpfs" to their containers, but I can't find a way to do this in the .def file, so I added it to the container startup. Tony
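For reference, a minimal sketch of what passing the flag at startup could look like. It assumes the Apptainer runtime (suggested by the .def file, but not stated above), and "openfold.sif" is a hypothetical image name. The helper only prints the command so it can be reviewed before use; the actual startup script in this PR may differ.

```shell
#!/usr/bin/env bash
# Sketch only: build the container launch command with a writable tmpfs overlay.
# Assumptions: Apptainer runtime; "openfold.sif" is a hypothetical image name.

# Print the launch command for a given image plus any extra arguments.
openfold_run_cmd() {
    local image="$1"
    shift
    # --nv exposes the host NVIDIA driver and GPUs inside the container;
    # --writable-tmpfs adds a temporary writable layer over the read-only image.
    printf 'apptainer run --nv --writable-tmpfs %s' "$image"
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

# Review the command before running it on a GPU node.
openfold_run_cmd openfold.sif
```

Changes made through the tmpfs layer are discarded when the container exits, so the image itself stays unmodified.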


tonykew added 22 commits August 26, 2025 16:49

Added missing Slurm directive to fix X86 bin issue.

be66b73

Prevent attempting to run X86 binaries on ARM64 Tony
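The commit itself fixes this with a Slurm directive. As a complementary illustration (not the change made here), a job script could also guard against an architecture mismatch before launching anything. The function name "check_arch" is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to continue when the node architecture is not the
# expected one. This is an illustrative guard, not the Slurm directive
# added by the commit. "check_arch" is a hypothetical helper name.

check_arch() {
    local expected="$1"
    # The second argument allows the detected value to be overridden for testing.
    local actual="${2:-$(uname -m)}"
    if [ "$actual" != "$expected" ]; then
        echo "Refusing to run: node is $actual, expected $expected" >&2
        return 1
    fi
    return 0
}

# Typical use near the top of an ARM64 job script:
#   check_arch aarch64 || exit 1
```

On Linux, "uname -m" reports "aarch64" on ARM64 nodes and "x86_64" on x86_64 nodes.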

Fix QOS for the DGX partition

76c41a0

Tony

Fixed logic for Slurm script end messages

d00cf6c

Tony

Merge branch 'main' into OpenFold

bd22ed1

Merge branch 'main' into OpenFold

837156c

Create an empty tuning dir for triton

377871e

Avoids a "df" error when running OpenFold Tony

Fix ARM64 build

709cf3d

A GPU has to be requested with "salloc"; otherwise the NVIDIA pieces ("nvcc", CUDA, etc.) don't get downloaded and installed. Even when "--exclusive" is used, "nvidia-smi -L" sees no GPU unless one is requested. Tony

Increased the salloc runtime to 5 hours

050ba12

Build takes about 4 hours Tony

Added missing "--account" option to salloc

8c04410

Tony
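Putting the salloc-related commits together, the build allocation could be requested roughly as sketched below. The account name "my_account" and partition name "dgx" are hypothetical placeholders, and "--gpus=1" is only one of the ways Slurm accepts a GPU request; the documented command in this PR may use different options. The helper prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: assemble an salloc command for the container build.
# Assumptions: "my_account" and "dgx" are hypothetical account and partition
# names; --gpus=1 is one possible way to request a GPU.

build_salloc_cmd() {
    local account="$1" partition="$2" hours="$3"
    # The GPU is requested explicitly because --exclusive alone did not
    # make a GPU visible to nvidia-smi.
    printf 'salloc --account=%s --partition=%s --exclusive --gpus=1 --time=%02d:00:00\n' \
        "$account" "$partition" "$hours"
}

# Five hours leaves headroom over the roughly four-hour build.
build_salloc_cmd my_account dgx 5

# Inside the allocation, confirm that a GPU is visible before building:
#   nvidia-smi -L
```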

Docs missed an "exit"

8913877

Tony

Merge branch 'main' into OpenFold

343a68c

Added OpenFold to the "menu" README.md

0c1b059

Tony

Merge branch 'main' into OpenFold

7b0175b

Merge branch 'main' into OpenFold

632d683

Fix x86_64 build

6b2d0b3

Note: ARM64 build still broken Tony

Fix ARM64 build & updates for x86_64

7bc9b7f

Tony

Merge branch 'main' into OpenFold

a3cda8a

Merge branch 'main' into OpenFold

3dbc94a

Fixed the x86_64 build - OpenFold test runs.

30b7f8d

Tony

Fix ARM64 build

678c47e

Unfortunately there are slews of deprecation and future warnings that cannot be easily suppressed; this needs code changes. Tony
