NameLookup can end up in a weird state with two pods, but where restarting the Solr pod causes it to download the database again #897

@gaurav

Description

When NameRes is run with LOAD_DATA=yes, we create three pods:

  • A web pod, which acts independently of the others
  • A restore job, which completes its work and goes to "Succeeded"
  • A Solr pod, which has:
    • An init-container that downloads the Solr database from RENCI, and then
    • A Solr database that the restore job will talk to in order to start the restoration process
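
The pod layout above can be sketched as a small lookup, which is handy when checking what a cluster should look like in each mode. This is only an illustration: the function name `expected_pods` and the role labels `web`, `solr`, and `restore` are hypothetical, not taken from the NameRes Helm chart, and the two-pod set for LOAD_DATA=no is as described later in this issue.

```python
def expected_pods(load_data: str) -> set[str]:
    """Return the pod roles this issue describes for a given LOAD_DATA value.

    The role labels are illustrative, not the real pod names.
    """
    # The web and Solr pods exist in both modes.
    pods = {"web", "solr"}
    # The restore job is only created when LOAD_DATA=yes.
    if load_data.strip().lower() == "yes":
        pods.add("restore")
    return pods
```

For example, `expected_pods("no")` gives the two-pod set, which is what was observed on May 3.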

Somehow, on ITRB CI on May 2-3, 2024, we ran into the following situation:

  • On May 2, 2024, NameRes was restarted with LOAD_DATA=yes
  • Some updates might have caused it to restart -- it's unclear whether or not it was restarted in LOAD_DATA=yes or no mode, but let's assume LOAD_DATA=no, as that's the default for
  • On May 3, 2024, @pabbathreddya2 and I found that there were two pods (solr and web). We observed that there was around 158G of data in the Solr pod. Updating it with LOAD_DATA=no did not change the pods, so @pabbathreddya2 did a helm uninstall and then restarted it with LOAD_DATA=yes, which resulted in the Solr pod becoming essentially empty of data. My theory is that the 158G of data was the download, which is deleted at the start of a new download (since I think "Rewrite NameRes script to delete the database later in the download process" #842 is fixed now?), so the database had in fact been wiped previously. But how? We would see this if the PVC was wiped, but it's unclear how that would happen.

Essentially, this boils down to: how can the pods be in the LOAD_DATA=no state (with two pods instead of three), while restarting the Solr pod causes it to start the download as if it were in the LOAD_DATA=yes state?
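
To make the contradiction concrete, here is a minimal sketch of the check we are effectively doing by hand. It assumes, as this issue does, that a missing restore job means LOAD_DATA=no and that a download on restart means LOAD_DATA=yes. The function name `is_contradictory` and the `restore` label are hypothetical.

```python
def is_contradictory(observed_pods: set[str], download_started_on_restart: bool) -> bool:
    """Flag the state described in this issue.

    True when the pod set looks like LOAD_DATA=no (no restore job)
    but restarting the Solr pod still triggers a database download.
    """
    looks_like_load_data_no = "restore" not in observed_pods
    return looks_like_load_data_no and download_started_on_restart
```

With the May 3 observations (pods `{"web", "solr"}` and a download starting on restart), this returns `True`.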
