Store terminal observations in AutoResetWrapper infos for value bootstrap #641
Hello,
Differentiating between truncation and termination of episodes is key to proper value estimation. Brax already exposes `truncation` and `episode_done` flags as part of `infos`, which makes it possible to draw this distinction.

When using the `AutoResetWrapper` (which is desirable) and encountering a terminal state, the returned state (and therefore the observation) is the first state of the new episode. However, value bootstrapping requires the value of the last state of the episode that just ended. For a terminated (not truncated) episode this poses no issue, since the value is 0 and the state can be ignored. For a truncated episode, however, one needs to predict the value of that final state, not of the first state of the new episode. This is not possible at the moment, as this state is never exposed.

I propose adding a simple `obs_st` field to the infos returned through the `AutoResetWrapper`, which exposes this observation to users for correct value bootstrapping.

I have added the corresponding test, and all tests pass.
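
To illustrate why the final observation matters, here is a minimal JAX sketch of one-step TD targets that treat termination and truncation differently. It is not code from this PR: the function `td_targets` and all array names are hypothetical, and `terminal_values` is assumed to be the critic's output on the final observation exposed through the new infos field.

```python
import jax.numpy as jnp


def td_targets(rewards, next_values, terminal_values, done, truncation, discount=0.99):
    """One-step TD targets that distinguish termination from truncation.

    All arguments are float arrays of shape (T,). Names are hypothetical.
    next_values:     critic value of the observation returned by the wrapper
                     (the first observation of the new episode on done steps).
    terminal_values: critic value of the true final observation of the episode
                     (assumed to come from the field added to infos).
    done:            1.0 where the episode ended, for either reason.
    truncation:      1.0 where the episode ended only because of a time limit.
    """
    # On done steps, bootstrap from the true final state, not the reset state.
    bootstrap = jnp.where(done > 0, terminal_values, next_values)
    # A real termination (done but not truncated) has zero future value.
    terminated = done * (1.0 - truncation)
    return rewards + discount * (1.0 - terminated) * bootstrap


rewards = jnp.array([1.0, 1.0, 1.0])
next_values = jnp.array([5.0, 7.0, 9.0])      # steps 1 and 2 hold reset-state values
terminal_values = jnp.array([5.0, 2.0, 4.0])  # values of the true final observations
done = jnp.array([0.0, 1.0, 1.0])
truncation = jnp.array([0.0, 1.0, 0.0])

targets = td_targets(rewards, next_values, terminal_values, done, truncation, discount=0.5)
print(targets)  # [3.5 2.  1. ]
```

Step 1 is truncated, so its target bootstraps from the value of the true final state (2.0) rather than the reset state (7.0). Step 2 is a real termination, so its target is the reward alone.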
I am new to open-source contributions and could not find a style guide or linter for Brax. Please let me know if any modifications are needed.