While building distributed systems, I've repeatedly been frustrated by retry libraries that treat all failures the same, which left me rewriting the special handling myself. For example, timeouts get the same backoff as rate-limit errors, database lock conflicts, and so on. Observability around retries also tends to be either too noisy or too opaque.
I wrote reflexio so that retries behave the way we want real systems to behave:
- errors are classified (e.g., TRANSIENT, RATE_LIMIT, CONCURRENCY, PERMANENT)
- each class gets its own strategy (e.g., decorrelated jitter, equal jitter, fixed delay)
- everything emits clean, structured events for logging/metrics/tracing
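To make the three points above concrete, here is a minimal standalone sketch of the idea. It is not reflexio's API: `ErrorClass`, `RateLimitError`, `classify`, `STRATEGIES`, `call_with_retry`, and the event fields are all hypothetical names chosen for illustration, and the pairing of classes to strategies is an assumption.

```python
import enum
import random
import time


class ErrorClass(enum.Enum):
    TRANSIENT = "transient"
    RATE_LIMIT = "rate_limit"
    PERMANENT = "permanent"


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429-style failure."""


def classify(exc):
    # Assumed mapping, for illustration only.
    if isinstance(exc, RateLimitError):
        return ErrorClass.RATE_LIMIT
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorClass.TRANSIENT
    return ErrorClass.PERMANENT


def fixed_delay(attempt, prev, rng, delay=1.0):
    # Same wait every time, regardless of attempt number.
    return delay


def decorrelated_jitter(attempt, prev, rng, base=0.1, cap=10.0):
    # Next delay is drawn between base and 3x the previous delay, then capped.
    return min(cap, rng.uniform(base, max(base, prev) * 3))


# Assumed pairing; classes without an entry (PERMANENT) are never retried.
STRATEGIES = {
    ErrorClass.TRANSIENT: decorrelated_jitter,
    ErrorClass.RATE_LIMIT: fixed_delay,
}


def call_with_retry(fn, max_attempts=4, sleep=time.sleep, rng=None, on_event=None):
    if rng is None:
        rng = random.Random()
    emit = on_event or (lambda event: None)
    prev = 0.0
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            error_class = classify(exc)
            strategy = STRATEGIES.get(error_class)
            if strategy is None or attempt == max_attempts:
                emit({"event": "give_up", "error_class": error_class.name,
                      "attempt": attempt})
                raise
            prev = strategy(attempt, prev, rng)
            emit({"event": "retry", "error_class": error_class.name,
                  "attempt": attempt, "delay": prev})
            sleep(prev)
```

Passing `sleep` and `rng` in as parameters is just a way to keep the sketch testable without real waiting; `on_event` receives one plain dict per retry decision, which is roughly the shape you would forward to a logger or metrics client.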
The library supports sync and async code and provides a simple @retry decorator. Retry envelopes are deterministic, and an optional observability hook receives the structured events.
I've kept the API small and explicit, so the whole codebase is easy to read.
Docs: https://aponysus.github.io/reflexio/
GitHub: https://github.com/aponysus/reflexio
PyPI: https://pypi.org/project/reflexio/
I'd greatly appreciate any feedback, especially on patterns you rely on that I've failed to capture.