Conversation

@martintmk
Member

@martintmk martintmk commented Sep 29, 2025

Add a new recoverable crate that provides standardized types for classifying
error conditions as recoverable or non-recoverable, enabling consistent retry
behavior across different error types and resilience middleware.

Core features:

  • RecoveryInfo type for classifying errors with recovery metadata
  • Recoverable trait for types that can determine their recoverability
  • RecoveryKind enum distinguishing between retry, outage, never, and unknown
  • Support for explicit retry delays via delay() method
  • Service outage detection with optional recovery hints
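A minimal sketch of how the pieces listed above could fit together. Only the names RecoveryInfo, Recoverable, RecoveryKind, the four kinds, the builder-style delay() call, and the NetworkError::ServiceUnavailable variant come from this conversation; the field layout, the new() constructor, the get_delay() accessor, and the other NetworkError variants are assumptions made for illustration and may not match the crate.

```rust
use std::time::Duration;

/// Classification of a condition (variant names follow the PR description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind {
    Retry,
    Outage,
    Never,
    Unknown,
}

/// Bag of recovery properties. The field layout is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RecoveryInfo {
    kind: RecoveryKind,
    delay: Option<Duration>,
}

impl RecoveryInfo {
    /// Hypothetical constructor: a classification without a delay hint.
    pub fn new(kind: RecoveryKind) -> Self {
        Self { kind, delay: None }
    }

    /// Returns an updated value carrying an explicit retry delay.
    pub fn delay(mut self, delay: Duration) -> Self {
        self.delay = Some(delay);
        self
    }

    pub fn kind(&self) -> RecoveryKind {
        self.kind
    }

    /// Hypothetical accessor name, chosen only for this sketch.
    pub fn get_delay(&self) -> Option<Duration> {
        self.delay
    }
}

/// Implemented by types that can report their own recoverability.
pub trait Recoverable {
    fn recovery(&self) -> RecoveryInfo;
}

/// Example error type; only ServiceUnavailable appears in the PR.
#[derive(Debug)]
pub enum NetworkError {
    ConnectionReset,
    InvalidUrl,
    ServiceUnavailable { retry_after: Option<Duration> },
}

impl Recoverable for NetworkError {
    fn recovery(&self) -> RecoveryInfo {
        match self {
            NetworkError::ConnectionReset => RecoveryInfo::new(RecoveryKind::Retry),
            NetworkError::InvalidUrl => RecoveryInfo::new(RecoveryKind::Never),
            NetworkError::ServiceUnavailable { retry_after } => {
                let info = RecoveryInfo::new(RecoveryKind::Outage);
                // Attach the server's hint only when one was provided.
                match retry_after {
                    Some(delay) => info.delay(*delay),
                    None => info,
                }
            }
        }
    }
}
```

Mapping ConnectionReset to a retry and InvalidUrl to never is an illustrative choice, not something the PR states.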

@codecov

codecov bot commented Sep 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (ea33c90) to head (088ba4f).

Additional details and impacted files
@@           Coverage Diff            @@
##             main      #18    +/-   ##
========================================
  Coverage   100.0%   100.0%            
========================================
  Files          10       11     +1     
  Lines         992     1119   +127     
========================================
+ Hits          992     1119   +127     


@martintmk martintmk marked this pull request as ready for review October 1, 2025 15:51
Copilot AI review requested due to automatic review settings October 1, 2025 15:51

Copilot AI left a comment

Pull Request Overview

This PR introduces a new recoverable crate that provides standardized types for error classification and recovery behavior in resilience patterns. The crate enables consistent determination of whether conditions are recoverable (transient) or non-recoverable (permanent/successful).

  • Core recovery classification system with Recovery, RecoveryKind, and Recover trait
  • Support for retry timing hints through delay metadata
  • Service outage detection capabilities for widespread failures
  • Comprehensive documentation and examples for proper usage patterns

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| crates/recoverable/src/lib.rs | Main implementation with Recovery struct, RecoveryKind enum, Recover trait, and comprehensive test suite |
| crates/recoverable/Cargo.toml | Package configuration for the new recoverable crate |
| crates/recoverable/README.md | Documentation and usage examples for the crate |
| crates/recoverable/CHANGELOG.md | Empty changelog file for future version tracking |
| crates/recoverable/logo.png | Git LFS tracked logo image for the crate |
| Cargo.toml | Workspace updates to include recoverable crate and static_assertions dependency |
| README.md | Updated main README to reference the new recoverable crate |
| CHANGELOG.md | Updated main changelog to reference recoverable's changelog |

@martintmk martintmk marked this pull request as draft October 1, 2025 15:55
@geeknoid
Member

geeknoid commented Oct 1, 2025

I think the names of the types aren't quite right English-wise. I would recommend the following:

RecoveryMetadata - type for classifying errors with recovery metadata
RecoveryAware - trait for types that can determine their recoverability
RecoveryKind enum distinguishing between retry, outage, never, and unknown

As a first step. And then, I'm still not sure why the retry delay can't be incorporated directly into the RecoveryKind enum which would eliminate a separate type. You could add a delay function for this enum which would return an updated RecoveryKind with the given delay, so it would match the semantics that the current delay function has. So what scenario does this complicate?
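For comparison, the enum-only design proposed here might look roughly like the sketch below. It is an illustration under assumptions, not code from the PR: which variants carry a delay and how delay() treats the other variants are guesses.

```rust
use std::time::Duration;

/// Alternative design: the delay lives inside the enum itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind {
    Retry { delay: Option<Duration> },
    Outage { delay: Option<Duration> },
    Never,
    Unknown,
}

impl RecoveryKind {
    /// Returns an updated kind carrying the given delay.
    /// Variants that cannot hold a delay are returned unchanged (an assumption).
    pub fn delay(self, delay: Duration) -> Self {
        match self {
            RecoveryKind::Retry { .. } => RecoveryKind::Retry { delay: Some(delay) },
            RecoveryKind::Outage { .. } => RecoveryKind::Outage { delay: Some(delay) },
            other => other,
        }
    }
}
```

In this shape, every property that applies to more than one variant has to be repeated in each of them.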

@martintmk
Member Author

> I think the names of the types aren't quite right English-wise. I would recommend the following:
>
> RecoveryMetadata - type for classifying errors with recovery metadata
> RecoveryAware - trait for types that can determine their recoverability
> RecoveryKind enum distinguishing between retry, outage, never, and unknown

Regarding the type names, I brainstormed these a while back with @ralfbiedert. I feel that RecoveryMetadata and RecoveryAware break the M-CONCISE-NAMES guideline.

> As a first step. And then, I'm still not sure why the retry delay can't be incorporated directly into the RecoveryKind enum which would eliminate a separate type. You could add a delay function for this enum which would return an updated RecoveryKind with the given delay, so it would match the semantics that the current delay function has. So what scenario does this complicate?

This is what I had in the previous iteration, and it introduced friction. For example, one component evaluates the recovery kind, while another extracts the retry-after header and updates the existing metadata.

let mut recovery = detect_recovery(...);
// ...
// a little later
if let Some(delay) = extract_delay(...) {
    recovery = recovery.delay(delay);
}
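A self-contained sketch of the two-step flow in the snippet above. The arguments of detect_recovery and extract_delay are elided in the original, so the status-code and header parameters here, the classification rule, and the classify helper are all hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecoveryKind {
    Retry,
    Never,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RecoveryInfo {
    kind: RecoveryKind,
    delay: Option<Duration>,
}

impl RecoveryInfo {
    fn new(kind: RecoveryKind) -> Self {
        Self { kind, delay: None }
    }

    /// Returns updated metadata that carries the given delay.
    fn delay(mut self, delay: Duration) -> Self {
        self.delay = Some(delay);
        self
    }
}

/// First component: classifies by status code (hypothetical rule).
fn detect_recovery(status: u16) -> RecoveryInfo {
    match status {
        429 | 503 => RecoveryInfo::new(RecoveryKind::Retry),
        _ => RecoveryInfo::new(RecoveryKind::Never),
    }
}

/// Second component: reads a retry-after value given in whole seconds.
/// Date-formatted values are not handled in this sketch.
fn extract_delay(retry_after: Option<&str>) -> Option<Duration> {
    retry_after?.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Hypothetical glue: the second step only adds to what the first produced.
fn classify(status: u16, retry_after: Option<&str>) -> RecoveryInfo {
    let mut recovery = detect_recovery(status);
    if let Some(delay) = extract_delay(retry_after) {
        recovery = recovery.delay(delay);
    }
    recovery
}
```

Because delay() works on the whole metadata value, the second component does not need to know which kind the first one chose.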

In addition, I plan to introduce a reason property, so the recovery metadata cannot simply be expressed through an enum, because many additional properties can apply across all variants. That's why I started thinking of the metadata as a dumb bag of properties that a producer can fill in any combination.

The important part, which most consumers won't care about, is the evaluation of recovery metadata. This will mostly be done once, by the individual resilience middleware. There the inspection is flattened out, and the middleware looks at each property individually: it checks RecoveryKind first, then, if that evaluates to a retry, it looks at the delay property, then the reason if necessary, and so on.
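The flattened evaluation could be sketched as follows. The Decision type, the decide function, and the choice to give up on outage and unknown are assumptions for illustration; the planned reason property is not modeled.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecoveryKind {
    Retry,
    Outage,
    Never,
    Unknown,
}

#[derive(Debug, Clone, Copy)]
struct RecoveryInfo {
    kind: RecoveryKind,
    delay: Option<Duration>,
}

/// What a hypothetical retry middleware does next.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    RetryAfter(Duration),
    GiveUp,
}

/// Inspects the kind first and reads the delay only for retries.
fn decide(info: &RecoveryInfo, default_backoff: Duration) -> Decision {
    match info.kind {
        // An explicit delay wins over the middleware's own backoff.
        RecoveryKind::Retry => Decision::RetryAfter(info.delay.unwrap_or(default_backoff)),
        // Assumption: this particular middleware does not retry the other kinds.
        RecoveryKind::Outage | RecoveryKind::Never | RecoveryKind::Unknown => Decision::GiveUp,
    }
}
```

A different middleware, such as a circuit breaker, could read the same metadata and react to Outage instead.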

I tried the current simplified model in an internal project, and it does simplify how the recovery metadata is consumed. Of course, this can still change as we go through more usage patterns, but for now I would like to try this latest approach.

@geeknoid
Member

geeknoid commented Oct 2, 2025

Well, even if you think my suggested names are too long, the existing names aren't right.

"Recover" is a verb, it's not appropriate as a trait name.
"Recovery" is an event and is not meaningful as a "carrier of state".

@martintmk
Member Author

What about:

Recover -> Recoverable (declares capability)
Recovery -> RecoveryInfo (cleaner, as a bag of properties, feels nicer than RecoveryMetadata)

@geeknoid
Member

geeknoid commented Oct 3, 2025

> Recover -> Recoverable (declares capability)
> Recovery -> RecoveryInfo (cleaner, as a bag of properties, feels nicer than RecoveryMetadata)

Yeah, that works :-)

@ralfbiedert
Collaborator

> "Recover" is a verb, it's not appropriate as a trait name.

That's not true, there are several lenses to look at trait naming, compare API Guidelines:

Linguistically:

Imperatives like io::Write, fmt::Debug, clone::Clone
Agent nouns like iter::Iterator, hash::Hasher
Nouns like fmt::Binary, ops::Fn
Adjectives like marker::Sized, panic::UnwindSafe
Prepositions like convert::Into, borrow::ToOwned

Functionally:

  • If the trait has a single self-explanatory method (or a set of nearly identical methods), name it after the method: Clone, Hash, Default, Into, Write, ToOwned, AsRef, Extend.
  • If the trait has no methods, name it after what ability (Send, Sync, Copy) or property (UnwindSafe) its implementors have.
  • If the trait has a broader set of methods or a single method that is not self-explanatory, the name should describe either what its implementors are (Iterator, Hasher, Fn, Error, Future, Termination) or what ability/property they have (AsciiExt, fmt::Debug, fmt::Binary).

However, as a universal convention, trait names are generally short unless compound (IntoIter, UnwindSafe). Many of our trait names have unfortunately deviated from that and leaned more toward the C# interface naming style, and as we move here I'd like to revert that.

With that said, I agree that Recover, while imperative, does not match the "single method" convention. Given that the method is called recovery, I'd like the trait to be named Recovery as well, since that matches the "Noun + single method" lens (and removes the -able suffix, which doesn't feel idiomatic).

About RecoveryInfo I don't have strong feelings.

@@ -0,0 +1,19 @@
[package]
name = "recoverable"
description = "recovery metadata and classification for resilience patterns"
Collaborator

Can we capitalize this?

ServiceUnavailable { retry_after: Option<Duration> },
}

impl Recoverable for NetworkError {
Collaborator

Should be Recovery.

Member Author

@geeknoid Would you agree with the rename?

Member

No, I don't think it's right.

This doesn't do anything about recovery, it merely says that an error can be recovered from. When I see "recovery", I think "action", this IS the recovery to the problem.

Member Author

This trait only tells you that a particular error or result is recoverable; it does not perform the recovery action, nor is it able to.

It's up to the caller to do the action itself based on the conditions.

For example, the caller might decide ahead of time that a particular action can be retried, and must then set aside all the information and parameters required to perform that recovery action. This trait only conveys that such a recovery is possible.

Our names should reflect that. With this context in mind, I find Recoverable more appropriate than Recovery
(capability vs. action).
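To make the capability-versus-action split concrete, here is a rough sketch in which the trait only reports recoverability and the caller owns the retry. The call_with_retry helper, DemoError, and the simplified RecoveryInfo are hypothetical and exist only for this illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecoveryKind {
    Retry,
    Never,
}

/// Simplified metadata: only the kind is modeled here.
#[derive(Debug, Clone, Copy)]
struct RecoveryInfo {
    kind: RecoveryKind,
}

/// Capability: the error can say whether recovery is possible.
trait Recoverable {
    fn recovery(&self) -> RecoveryInfo;
}

#[derive(Debug)]
enum DemoError {
    Transient,
    Permanent,
}

impl Recoverable for DemoError {
    fn recovery(&self) -> RecoveryInfo {
        let kind = match self {
            DemoError::Transient => RecoveryKind::Retry,
            DemoError::Permanent => RecoveryKind::Never,
        };
        RecoveryInfo { kind }
    }
}

/// Action: the caller keeps everything needed to try again (here, the closure).
fn call_with_retry<T, E, F>(max_attempts: u32, mut op: F) -> Result<T, E>
where
    E: Recoverable,
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                let retryable = err.recovery().kind == RecoveryKind::Retry;
                // Stop on non-retryable errors or when attempts run out.
                if !retryable || attempt >= max_attempts {
                    return Err(err);
                }
                attempt += 1;
            }
        }
    }
}
```

The error type never sees the closure or the attempt counter; it only answers the question of whether trying again makes sense.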

@geeknoid geeknoid force-pushed the u/mtomka/recoverable branch from db34508 to 088ba4f Compare October 6, 2025 13:37
4 participants