Docker shared memory issue and solution #369

@peteflorence

I am not sure whether this happens in our other configurations, but it happened in my spartan Docker container, in which I had installed PyTorch and was trying to run some training.

Symptom

I was getting an error along the lines of "Bus error (core dumped) model share memory". It is related to this issue: pytorch/pytorch#2244

Cause

The comments by apaszke (a PyTorch author) are helpful here (pytorch/pytorch#1355 (comment)). Running inside the Docker container, it appears that only 64 MB of shared memory is available:

peteflo@08482dc37efa:~$ df -h | grep shm
shm              64M     0   64M   0% /dev/shm

Temporary solution

As mentioned by apaszke,

sudo mount -o remount,size=8G /dev/shm

(choose more than 8G if you'd like)

This fixes it, as visible here:

peteflo@08482dc37efa:~$ df -h | grep shm
shm             8.0G     0  8.0G   0% /dev/shm

Other notes

In some places on the internet you will find that --ipc=host is supposed to avoid this issue, as are other flags to docker run, but those didn't work for me, and they require restarting the container. I suspect something about my configuration is wrong. The remount above fixes the problem from inside the running container.
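For reference, here is a rough sketch of what passing those flags at container start could look like. It only builds and prints the command line and does not run Docker. I am assuming that "other flags" refers to docker run's --shm-size option. The image name my-spartan-image and the helper docker_run_command are hypothetical, and -it is just a typical interactive invocation. As noted above, these flags did not fix it in my setup, so treat this as something to try rather than a confirmed fix.

```python
import shlex


def docker_run_command(image, shm_size=None, ipc_host=False, extra_args=()):
    """Build a `docker run` argument list with a shared-memory option.

    Hypothetical helper for illustration only; it does not invoke Docker.
    """
    args = ["docker", "run", "-it"]
    if ipc_host:
        # Share the host's IPC namespace, including its /dev/shm.
        args.append("--ipc=host")
    elif shm_size is not None:
        # Give the container its own, larger /dev/shm (for example "8g").
        args.append(f"--shm-size={shm_size}")
    args.extend(extra_args)
    args.append(image)  # the image name goes after all options
    return args


# "my-spartan-image" is a placeholder image name, not the real one.
print(shlex.join(docker_run_command("my-spartan-image", shm_size="8g")))
print(shlex.join(docker_run_command("my-spartan-image", ipc_host=True)))
```

The two options are treated as alternatives here, since a container that shares the host's IPC namespace uses the host's /dev/shm rather than one of its own.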

Long term solution

It would first be useful to identify whether anybody else's Docker containers have this issue, which can be checked simply by running df -h | grep shm inside the container. Then we could diagnose who it is happening to and why. It might just be me.
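To make it easier to collect reports, the same check could be done from Python, for example at the top of a training script. This is a minimal sketch using only the standard library. The 8 GiB threshold simply mirrors the remount size used above and is an assumption, and the names shm_report and meets_minimum are made up for this example.

```python
import os
import shutil
import socket

# Assumed threshold: matches the 8G remount size used above; adjust as needed.
MIN_SHM_BYTES = 8 * 1024**3


def meets_minimum(total_bytes, min_bytes=MIN_SHM_BYTES):
    """Return True if the shared-memory size is at least `min_bytes`."""
    return total_bytes >= min_bytes


def shm_report(path="/dev/shm", min_bytes=MIN_SHM_BYTES):
    """Report the size of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return {
        "host": socket.gethostname(),  # the container ID when run inside Docker
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "ok": meets_minimum(usage.total, min_bytes),
    }


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        print(shm_report())
    else:
        print("/dev/shm not found; this check only applies on Linux")
```

A container showing the 64M default would report "ok": False with this threshold.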
