[RFC] Implement basic on disk caching #336


Open · oulgen wants to merge 1 commit into oulgen/stack/25 from oulgen/stack/26
Conversation

@oulgen (Contributor) commented on Jul 19, 2025:

Stacked PRs:


[RFC] Implement basic on disk caching

oulgen added a commit that referenced this pull request on Jul 19, 2025:
stack-info: PR: #336, branch: oulgen/stack/26
@oulgen force-pushed the oulgen/stack/25 branch from 0c8b535 to bc3438b on July 19, 2025 at 23:40
@oulgen force-pushed the oulgen/stack/26 branch from 6895b25 to 544375a on July 19, 2025 at 23:40
@oulgen mentioned this pull request on Jul 19, 2025
@meta-cla bot added the "CLA Signed" label on Jul 19, 2025
@oulgen requested review from jansel, yf225 and drisspg on July 19, 2025 at 23:40
@oulgen changed the base branch from oulgen/stack/25 to main on July 19, 2025 at 23:50
oulgen added a commit that referenced this pull request on Jul 19, 2025:
stack-info: PR: #336, branch: oulgen/stack/26
@oulgen force-pushed the oulgen/stack/26 branch from 544375a to 37aad8b on July 19, 2025 at 23:50
@oulgen changed the base branch from main to oulgen/stack/25 on July 19, 2025 at 23:50
stack-info: PR: #336, branch: oulgen/stack/26
@oulgen changed the base branch from oulgen/stack/25 to main on July 20, 2025 at 00:08
@oulgen force-pushed the oulgen/stack/26 branch from 37aad8b to 7bd22d1 on July 20, 2025 at 00:08
@oulgen changed the base branch from main to oulgen/stack/25 on July 20, 2025 at 00:08
Comment on lines +34 to +37
helion_key: Hash of source code of Helion
torch_key: Hash of source code of PyTorch
system_hash: Hash of system information,
    including Triton, current device, cuda/rocm arch version
Contributor:

Do we want to include these by default?

I was thinking for this use case, where autotuning is very expensive and caching is off by default, it may make sense to default to a more minimal key.

oulgen (Author):

My take is that correctness is of utmost importance, and by default we should use as strict a key as possible: if we are writing to the file system and people update their Helion/Triton/PyTorch version, we shouldn't give them stale autotuning results.

Having said this, one of the TODOs (on L26) is giving users customization over the cache key. With that customization, we can let users choose to omit one or more of these strict, included-by-default requirements.

What do you think?
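
For illustration only, a strict system component of the key might fold in everything that can invalidate a tuned config. This sketch hashes version strings and device info as a simplified stand-in for the PR's source-code hashes (names and details are hypothetical, not the PR's code):

import hashlib
import platform

import torch

def system_hash() -> str:
    # Simplified stand-in: the PR hashes source code, not version strings.
    h = hashlib.sha256()
    h.update(torch.__version__.encode("utf-8"))
    h.update(platform.platform().encode("utf-8"))
    if torch.cuda.is_available():
        h.update(torch.cuda.get_device_name().encode("utf-8"))  # device identity
        h.update(str(torch.version.cuda).encode("utf-8"))       # cuda toolkit version
    return h.hexdigest()

Any change to one of these inputs changes the key, so a stale config is never served after an upgrade; it simply becomes a cache miss.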

Contributor:

I think the baseline here is users copying and pasting the config into their code. This makes the config a BC surface where we need to make sure old configs work with new versions of Helion/PyTorch.

I'm a lot more worried about surprise re-autotuning.

return hashlib.sha256(repr(self).encode("utf-8")).hexdigest()


class AutotuneCache:
Contributor:

docstring
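
A possible docstring, purely as a suggestion (wording assumed, not taken from the PR):

class AutotuneCache:
    """On-disk cache mapping a kernel's cache key to its best autotuned Config.

    The key combines the kernel's source, the input dtypes/shapes, and the
    Helion/PyTorch/system hashes, so a cached config is only served when all
    of them match the environment that produced it.
    """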

Comment on lines +69 to +72
for idx, a in enumerate(args):
    if isinstance(a, torch.Tensor):
        input_dtypes.append((idx, a.dtype))
        input_shapes.append((idx, a.shape))
Contributor:

Maybe we should mimic/reuse some of the key generation infra in kernel.py (for the in-memory cache). IMO the memory key should be a subset of the disk key. I think right now the memory key includes some extra stuff.
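
One way to share the logic, sketched with a hypothetical helper name (not code from kernel.py or this PR):

import torch

def input_signature(args: tuple) -> tuple:
    # Hypothetical shared helper: a hashable summary of tensor arguments that
    # both the in-memory key and the on-disk key could build on, keeping the
    # memory key a strict subset of the disk key.
    return tuple(
        (idx, str(a.dtype), tuple(a.shape))
        for idx, a in enumerate(args)
        if isinstance(a, torch.Tensor)
    )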

cache_dir, # pyright: ignore[reportPrivateImportUsage]
)

return Path(cache_dir()) / "helion" / f"{self._get_cache_key()}.best_config"
Contributor:

It might be useful to let users specify a filename for the cache so they can more easily distribute it to production environments.

oulgen (Author):

The TODO on L27 is about importing and exporting a file with all the autotune settings. Do you think that would be good enough, or should I add an env option to control the cache directory?

FWIW, we can just do both.
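
A sketch of the env option, assuming both the env var name and the cache_dir import path (hypothetical here, not part of the PR):

import os
from pathlib import Path

from torch._inductor.runtime.runtime_utils import cache_dir  # import path assumed

def cache_root() -> Path:
    # Hypothetical: HELION_CACHE_DIR overrides the default location, which
    # makes it easy to ship a pre-tuned cache to production environments.
    override = os.environ.get("HELION_CACHE_DIR")
    return Path(override) if override else Path(cache_dir()) / "helion"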


def put(self, config: Config) -> None:
    path = self._get_local_cache_path()
    config.save(path)
Contributor:

Should we do the save-to-a-temp-file-then-rename trick here so the change is atomic? I'm thinking about multi-process races.
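
For reference, the trick looks roughly like this (helper name hypothetical; assumes Config.save(path) as in the PR):

import os
import tempfile
from pathlib import Path

def atomic_put(config, path: Path) -> None:
    # Write to a temp file in the same directory, then os.replace() it into
    # place: concurrent readers see either the old file or the complete new
    # one, never a partially written config.
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    os.close(fd)
    try:
        config.save(Path(tmp))
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        os.unlink(tmp)
        raise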



class TestCache(TestCase):
    @mock.patch("helion.autotuner.DifferentialEvolutionSearch", new=BasicSearch)
Contributor:

Should we add a cleaner way for people to set the autotuner to use?

oulgen (Author):

Yeah, let's define an API that people can set. Out of scope for this PR, but I can look into it. Filed #337.

Labels: CLA Signed (managed by the Meta Open Source bot)
2 participants