Return Errors for QueryAPI when Upstream is Down #8239
hanzhang911 started this conversation in General
Our Proposed Architecture
We are building a new monitoring architecture using Thanos. Because of the very large number of active series, we plan to host different categories of data in different shards.

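On each shard, we deploy a Thanos querier that talks to the upstream Mimir. A separate Thanos querier federates the results returned from these shards, so the setup looks like this:
[Architecture diagram not preserved: federation querier → querier on shard-1 and querier on shard-2 → Mimir on each shard]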
Returning partial result or failure when one upstream querier is down
We are exploring how the Thanos federation querier above behaves (whether it returns partial results or failures/errors) when one of the upstream queriers (the querier on shard-1 or on shard-2) is down. We have not decided yet what it should return, but we wanted to see which options Thanos supports natively.
We found that the flags `--query.partial-response` and `--no-query.partial-response` are meant to control this behavior. However, further tests showed that these two flags only take effect for StoreAPI; they do not apply to QueryAPI.
In our case, the federation querier talks to the two upstream queriers via QueryAPI, so even when we specify `--no-query.partial-response`, the federation querier still returns partial results instead of returning a failure.
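To make the two options concrete, here is a small simulation of the behavior we expected `--no-query.partial-response` to give us at the federation layer. It is only a model under our own assumptions, not Thanos's implementation: `ShardResult`, `federate` and `UpstreamUnavailable` are hypothetical names, and "series" are plain strings. With partial responses enabled, a failed shard is skipped and reported as a warning; with them disabled, any failed shard turns the whole query into an error.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


class UpstreamUnavailable(Exception):
    """Hypothetical error raised when a shard fails and partial responses are disabled."""


@dataclass
class ShardResult:
    # Hypothetical per-shard outcome: either some series or an error message.
    name: str
    series: List[str] = field(default_factory=list)
    error: Optional[str] = None


def federate(results: List[ShardResult], partial_response: bool) -> Tuple[List[str], List[str]]:
    """Merge per-shard results the way we expected the federation querier to.

    partial_response=True:  skip failed shards and report them as warnings.
    partial_response=False: fail the whole query if any shard failed.
    """
    merged: List[str] = []
    warnings: List[str] = []
    for r in results:
        if r.error is None:
            merged.extend(r.series)
            continue
        if not partial_response:
            raise UpstreamUnavailable(f"{r.name}: {r.error}")
        warnings.append(f"{r.name}: {r.error}")
    return merged, warnings


if __name__ == "__main__":
    shards = [
        ShardResult("shard-1", series=['up{shard="1"}']),
        ShardResult("shard-2", error="connection refused"),
    ]
    print(federate(shards, partial_response=True))
    # (['up{shard="1"}'], ['shard-2: connection refused'])
```

Calling `federate(shards, partial_response=False)` on the same input raises `UpstreamUnavailable`, which is the "return errors" behavior we are looking for when an upstream querier reached via QueryAPI is down.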
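We have also tried passing the HTTP parameter `partial_response=false` when querying the federation querier, but it still returns partial results instead of errors.
So my question is: is there a natively supported way to make the federation querier return an error, rather than partial results, when one of its upstream queriers (connected via QueryAPI) is down?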
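Until there is a native option, one possible client-side fallback is sketched below. It builds an instant query against the federation querier's Prometheus-compatible `/api/v1/query` endpoint with `partial_response=false`, and then refuses any response that is not fully successful. The key assumption, which should be verified against your Thanos version, is that when an upstream querier is unreachable the federation querier still answers with `"status": "success"` but lists the failure in the `warnings` array. The host name `federation-querier:10902` and the helper names `build_query_url` and `strict_result` are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL that asks the querier to disable partial responses."""
    params = {"query": promql, "partial_response": "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def strict_result(body: str) -> dict:
    """Return the "data" section only if the response is complete.

    Assumption: a down upstream shows up as a non-empty "warnings" list
    on an otherwise successful response, so warnings are treated as failure.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    warnings = payload.get("warnings") or []
    if warnings:
        raise RuntimeError("partial result: " + "; ".join(warnings))
    return payload["data"]


if __name__ == "__main__":
    # Hypothetical endpoint; the response body would normally come from an HTTP GET.
    print(build_query_url("http://federation-querier:10902/", "up"))
```

The trade-off of this approach is that it turns every warning into a failure, including warnings unrelated to an upstream being down, so it is stricter than a native "no partial response" mode would be.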
Thanks
Han