Conversation

@ArrisLee
Collaborator

Which issue this PR addresses:

https://issues.redhat.com/browse/ARO-7736

What this PR does / why we need it:

Adding a new doc for MIMO regarding infra, components, metrics and troubleshooting

Test plan for issue:

N/A

Is there any documentation that needs to be updated for this PR?

Yes, it is a doc

Contributor

@kimorris27 kimorris27 left a comment

I've pointed out one place where I think the TSG should be more specific, but LGTM otherwise 👍🏻

Comment on lines +340 to +342
- **Global queue queries** (e.g., viewing all queued manifests across all clusters) are NOT available via Admin API
- **Single manifest detail queries** (e.g., getting specific manifest by ID) are NOT available via Admin API
- For these operations, use internal tooling: **Geneva Actions**, **Admin Portal**, **Kusto/DGrep**, or **monitoring dashboards**
Contributor

For these two types of queries that the Admin API doesn't support, I don't see how Geneva Actions, the Admin Portal, or Kusto/DGrep would help. There are a few points in the doc where the TSG directs you to run these queries, so I think we should be more specific about how to do them (even if we need to do them in a suboptimal way right now since the ideal tooling isn't in place yet).

Member

@mociarain mociarain left a comment

This is fantastic work and I think the detail is excellent. I do think, though, that it's not laid out in a way that makes it easy for people to find what they want quickly.

Some ideas:

  • The Admin API stuff can be moved to the already existing AdminAPI file and referenced.
  • The entire first section belongs in the README. This is a really great overview and helped me a lot. I think it should go front and center and then be referenced as needed.
  • All the content on alerting rules belongs in its own file IMO

For the rest, I don't know if this belongs in the RP. AFAIK we keep our TSGs and this type of on-call information in the Wiki (which is migrating atm, but still). Would this stuff be better in there so someone on-call will find it?

@ArrisLee
Collaborator Author

thx for the feedback team, I think I will move this to eng.ms where all our TSGs and SOPs sit

4 participants