
Refactor and generalize loss.py #635


Open · wants to merge 42 commits into main

Conversation

@ppegolo (Contributor) commented Jun 23, 2025

Fixes #629

Refactoring of the loss modules as discussed in #629 (with @SanggyuChong).

  • dynamic lookup and registration, as suggested by @HaoZeke
  • moved <target_name> to a top-level section within the loss field of the input file
  • grouped "torch-like" losses such as MSELoss, L1Loss, and HuberLoss under a common interface (sketched below)
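
For illustration, here is a minimal sketch of what such a common wrapper over torch losses could look like; the class name, constructor arguments, and compute signature are hypothetical, not the actual metatrain API:

import torch

class TorchLossWrapper:
    """Hypothetical common interface over torch losses (MSELoss, L1Loss, HuberLoss, ...)."""

    def __init__(self, target: str, torch_loss_cls, reduction: str = "mean", **kwargs):
        self.target = target
        self.reduction = reduction
        # e.g. torch.nn.MSELoss, torch.nn.L1Loss, torch.nn.HuberLoss
        self.loss_fn = torch_loss_cls(reduction=reduction, **kwargs)

    def compute(self, predictions: dict, targets: dict) -> torch.Tensor:
        # both dicts map target names to tensors
        return self.loss_fn(predictions[self.target], targets[self.target])

For example, TorchLossWrapper("energy", torch.nn.HuberLoss, delta=0.1) would behave like any other loss while delegating the actual computation to torch.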

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

Reviewer checklist

  • CHANGELOG updated with public API or any other important changes?

📚 Documentation preview 📚: https://metatrain--635.org.readthedocs.build/en/635/

@ppegolo ppegolo marked this pull request as ready for review July 2, 2025 15:48
@ppegolo ppegolo requested a review from frostedoyster as a code owner July 2, 2025 15:48
@ppegolo ppegolo requested a review from HaoZeke July 3, 2025 07:26
@HaoZeke left a comment

Thanks for the PR @ppegolo! I think it looks great. I have some comments for discussion inline; the one thing still missing for approval, though, is tests.

Comment on lines 214 to 216
# Initialize the base loss weight on the first call
if not self.scheduler.initialized:
    self.sliding_weight = self.scheduler.initialize(self.base, targets)

This could go in the constructor (__init__) so it isn't called every time compute is used.

@ppegolo (Contributor, Author)

It's called there because we only have access to the targets at compute time, not when the loss is initialized in the trainer. Maybe there's a way to redesign it, but we've implemented it this way to keep changes minimal with respect to the current implementation of Trainer.

Comment on lines 201 to 204
self.base = base_loss
self.scheduler = scheduler
self.target = base_loss.target
self.reduction = base_loss.reduction

If we're always composing / delegating, then it might be nicer to store the object and use it, e.g.

self._base_loss = base_loss


Just in case we forget to delegate anything else later.
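
For context, one hypothetical way to delegate everything that is not explicitly overridden to the stored object (class and attribute names are illustrative, not taken from the PR):

class ScheduledLoss:
    def __init__(self, base_loss, scheduler):
        self._base_loss = base_loss
        self._scheduler = scheduler

    def __getattr__(self, name):
        # only called when normal attribute lookup fails, so target,
        # reduction, etc. automatically fall through to the wrapped loss
        return getattr(self._base_loss, name)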

"""
Metaclass to auto-register :py:class:`LossInterface` subclasses.

Maintains a mapping from ``registry_name`` to the subclass type.

I think perhaps, if we're only discovering internal plugins, then an enum might be clearer (single source of truth). So something like:

class CoreLoss(Enum):
    MSE = TensorMapMSELoss
    ...

If/when we want to allow third-party loss functions (i.e. in a separate pip-installable package), then we'd add a Registry which discovers plugins off of entry_points(group='metatrain.losses') and provides a unified view of all the loss functions present?
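
As a rough sketch of that entry-point discovery (the 'metatrain.losses' group name and the helper function are assumptions, and entry_points(group=...) needs Python 3.10+ or the importlib_metadata backport):

from importlib.metadata import entry_points

def discover_external_losses() -> dict:
    # each entry point maps a loss name to a class exported by a third-party package
    return {ep.name: ep.load() for ep in entry_points(group="metatrain.losses")}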


Note that this is a minor design nit, I think we can easily do that later too... mostly I'm thinking an explicit interface for internal losses is clearer (and in keeping with the "Zen of Python") rather than auto-discovery here.

Member

enum or even just a (factory) function somewhere doing

def get_loss(name, hypers):
    if name == "mse":
        return TensorMapMSELoss(**hypers)
    ...

We can then extend the factory however we want down the line.

@ppegolo (Contributor, Author)

Thanks for the comments! If we'll need a registry at some point anyway, once we allow for external loss functions, is it so bad to have it already now, to avoid having to re-implement it later?

Member

I'm not sure we'll want to use a registry for external loss functions. The goal is not so much to have user A use the loss function provided by user B in some other Python package, but rather to allow user A to directly provide a Python script with a loss. This can be done without any kind of global registry.
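
A minimal sketch of that approach, assuming the user points the input file at a Python script containing a loss class (the function name and its arguments are illustrative):

import importlib.util

def load_loss_from_script(path: str, class_name: str):
    # import the user's script as an ad-hoc module; no global registry involved
    spec = importlib.util.spec_from_file_location("user_loss", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)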


FWIW, regarding external-package entry-point registration, there's also this (draft) spec: https://scientific-python.org/specs/spec-0002/

Comment on lines 25 to 32
# Use explicit registry_name if given, else snake_case from class name
key = getattr(cls, "registry_name", None)
if key is None:
    key = "".join(
        f"_{c.lower()}" if c.isupper() else c for c in name
    ).lstrip("_")
# only register the very first class under each key
mcs._registry.setdefault(key, cls)

Suggested change

-# Use explicit registry_name if given, else snake_case from class name
-key = getattr(cls, "registry_name", None)
-if key is None:
-    key = "".join(
-        f"_{c.lower()}" if c.isupper() else c for c in name
-    ).lstrip("_")
-# only register the very first class under each key
-mcs._registry.setdefault(key, cls)
+if name != "LossInterface" and issubclass(cls, LossInterface):
+    if "registry_name" not in attrs:
+        raise TypeError(
+            f"Class '{name}' must define a 'registry_name' class attribute"
+        )

Indentation might be off. It is good to have a fallback in general, but not if we only have a set number of loss functions in metatrain, since we can easily enforce a fixed name for each new loss (esp. with the enum suggestion).

If we end up with third party loss support, then this would make much more sense :)

Member

@Luthaf what's your opinion on this?

Member

I agree that names for the losses should be explicit, but I don't think we need to go through a registration mechanism. For now, something like this would do:

def get_loss(name: str):
    if name == "mse":
        return MSELoss(...)
    elif name == "mae":
        return MAELoss(...)
    elif name == "whatever":
        return WhateverLoss(...)
    else:
        raise ValueError(f"unknown loss function {name}")

The main advantage of a registry would be to allow losses defined outside of metatrain, but this is not something we need yet, so let's apply YAGNI.

@@ -211,5 +171,35 @@
     "uniqueItems": true
   }
 },
-"additionalProperties": false
+"additionalProperties": false,
Contributor

This has to go in any of the models that use the new loss function, right? Maybe it makes sense to write a general JSON schema for the loss and only "import" it in the architecture-specific ones.

I will figure out if there is a clever way to do it.
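
As a loose illustration of that idea (the property names and layout below are made up, and with actual JSON files this would more likely be done with a "$ref"), the architecture schemas could reuse one shared loss definition instead of repeating it:

LOSS_SCHEMA = {
    "type": "object",
    "properties": {
        "type": {"type": "string"},
        "weights": {"type": "object"},
    },
    "additionalProperties": False,
}

ARCHITECTURE_SCHEMA = {
    "type": "object",
    "properties": {
        # every architecture "imports" the same loss definition
        "loss": LOSS_SCHEMA,
    },
    "additionalProperties": False,
}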


@jwa7 jwa7 requested a review from abmazitov as a code owner July 8, 2025 09:13

Successfully merging this pull request may close these issues.

Customize the loss when training with gradient descent
6 participants