Replies: 6 comments 13 replies
-
This is great!! I think you can go ahead and make the PR so we can clearly see the few lines changed! I'm not sure whether this should be supported at the cuthbert PF level or at the cuthbertlib resampling level. Regarding 1: I think if it doesn't change the computed values and adds minimal overhead, I'd be OK having it as the default behaviour without an option (is there any downside to it?). I think @AdrienCorenflos might have some thoughts on 2.
-
I think we want the option to turn this off (i.e., no `stop_gradient`). When using the resampling scheme from Adrien's paper (which we should implement at some point), we do want the gradient to flow through the resampling step. Maybe it makes more sense to have this as a property of the resampling schemes.
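The suggestion above could be sketched roughly as follows, with each resampling scheme carrying a flag saying whether gradients should flow through it. All names here are hypothetical illustrations, not cuthbert's actual API:

```python
from typing import Callable, NamedTuple


class ResamplingScheme(NamedTuple):
    # Hypothetical sketch: each scheme bundles its sampling function with
    # a flag telling the filter whether gradients may flow through the
    # resampling step, so the PF can decide when to apply stop_gradient.
    sample: Callable
    differentiable: bool


def multinomial_sample(key, particles, log_weights):
    # Placeholder standing in for an actual multinomial resampler.
    raise NotImplementedError


# A standard multinomial scheme would cut gradients at resampling...
multinomial = ResamplingScheme(sample=multinomial_sample, differentiable=False)
# ...while an OT-based scheme would want them to flow through.
```

With this layout, the filter itself stays agnostic: it applies the stop-gradient only when the scheme declares `differentiable=False`.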
-
Within the current library framework, my OT-based resampling is a bit inconvenient because it modifies the locations of the particles, whereas the other resamplings don't. This can of course be changed, but it's currently incompatible.
-
Now that I think about it, it's probably something we'll want to change anyway, because there are a number of resamplings that need direct access to the particle locations!
-
Thanks for the lively discussion! I've made a version that uses the resampling refactor -- I haven't opened a PR yet since that branch may still change a bit, but I will rebase and open the PR after #207 lands. You can see it on my fork at https://github.com/DanWaxman/cuthbert/tree/dw-sg-dpf.
-
Closing this discussion since #202 landed! |
-
Currently, the PF defined in the SMC classes is not differentiable, in the sense that estimators of the score (i.e., the gradient of the marginal log-likelihood) are biased. In fact, automatic differentiation is currently broken entirely on CPU, but the bias persists even after fixing that. This is unfortunate because it limits what one can do for system identification: methods such as SGMCMC rely on these score estimates being unbiased.
There are several ways to ameliorate this (cf. the PyDPF paper), but one of the simplest is the stop-gradient trick, which recovers some classical gradient estimators by adding and subtracting the same term, with one copy wrapped in a stop-gradient.
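The trick can be sketched in JAX as follows. This is a minimal illustration of the add-and-subtract idea applied to multinomial resampling, assuming log-weight-carrying particles; function and argument names are my own, not cuthbert's actual API:

```python
import jax
import jax.numpy as jnp
from jax import lax


def resample_with_stop_gradient(key, particles, log_weights):
    """Multinomial resampling with an additive stop-gradient correction.

    Hypothetical sketch: after resampling, each particle is assigned the
    log-weight log_w[i] - stop_gradient(log_w[i]), which evaluates to
    exactly zero but carries the gradient of log_w[i], recovering a
    classical score estimator.
    """
    n = log_weights.shape[0]
    probs = jax.nn.softmax(log_weights)
    # The categorical draw itself is non-differentiable; cut gradients
    # through the probabilities used to draw the ancestor indices.
    idx = jax.random.choice(key, n, shape=(n,), p=lax.stop_gradient(probs))
    resampled = particles[idx]
    # Numerically zero, but its gradient equals grad(log_weights[idx]).
    new_log_weights = log_weights[idx] - lax.stop_gradient(log_weights[idx])
    return resampled, new_log_weights
```

Because the correction is zero in value, forward-pass quantities such as the MLL estimate are unchanged; only the backward pass differs.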
I implemented this version of the DPF and compared the resulting MLL and score estimates of a bootstrap PF against the analytically known ones in a linear-Gaussian SSM. These show the expected results: both the PF as-is and the DPF give good MLL estimates, but the score estimates of the PF as-is are highly biased:

This bias persists under many Monte Carlo draws, even at the true parameters:

The implementation adds minimal overhead (which is also in line with the paper):
This leaves two main design questions:
1. Should this be the default behaviour, given that `lax.stop_gradient` will be called every time anyways?
2. Should the stop-gradient live in `filter_combine`, or in the resampling schemes?

You can check out the implementation (which is only a few lines of code changed) on this branch of my forked copy.