BeeGFS becomes inconsistent when registering a storage node/target fails #70
I'm observing a problem with the BeeGFS 8 community version when adding a storage node/target fails. The list of targets in storage_pool_default becomes inconsistent with the actual state of the system. It looks like targets get added to the pool even though an error occurs during registration. I now have a BeeGFS with no storage targets, but the default pool shows 36 targets as members. How can I clean this up and recover from such a registration failure?

Here is some more info from logs and command output:

Pool list:

There seems to be no command to remove targets from a pool. I guess that should happen automatically, since the mgmt-daemon should keep everything consistent? The naive attempt fails:

Target list:

Node list:

Daemon status:

Each storage service has 8 storage targets as a member. The total count of storage targets is 32.

I had a bug in the script deploying the storage node. I don't recall exactly what it was: either I gave all targets of a service the same target ID, or I gave all services associated with a systemd unit a different ID. Either way, beegfs-setup-storage returned without error, but the unit couldn't start. I got this "

I managed to clear out the database error by deleting the storage node "

I interpret this as leftover information from the first failed attempt remaining in the database and now preventing registration of this target.

Before I scrap the entire system and start from scratch, I would really like to know whether there is a procedure to clean up something like this. For now, it's a test system with no data on it. In the future, if something like this happens, there should be a way to remove data left behind by an aborted transaction. My actual expectation is that every administrative operation is all-or-nothing, and that observations like the one I report here should be impossible.

The health status itself is reported as "healthy":

Thanks for any hints,
Replies: 3 comments 7 replies
Some information found on the storage targets indicating partial initialisation:
I wanted to look into the database (https://doc.beegfs.io/latest/advanced_topics/management_service.html) but got stuck on this error, which is thrown on every SQL command:
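For anyone attempting the same inspection, here is a minimal sketch of a first, read-only look at the schema. It assumes the management database is an ordinary SQLite file and that you are working on a copy of it. The function name `list_tables` and the path in the usage comment are hypothetical, and this sketch does not address the error above.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return (name, sql) for every table in a SQLite file, opened read-only.

    Assumption: db_path points to a COPY of the management database,
    not the live file used by the running service.
    """
    # mode=ro makes SQLite refuse any write, so the file cannot be altered.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return rows


# Usage (hypothetical path to a copy of the database):
# for name, ddl in list_tables("/tmp/mgmtd-copy.sqlite"):
#     print(name)
#     print(ddl)
```

Opening the file with `mode=ro` keeps the inspection from changing anything, which matters when the goal is only to understand what the failed registration left behind.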
My final report: working on a copy of the database, executing

Thanks for the help!
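Working on a copy, as described above, can be sketched as follows. This is a minimal sketch assuming the management database is a plain SQLite file. The function name `backup_database` and both paths in the usage comment are hypothetical. It uses SQLite's online backup API through Python's `sqlite3` module, so the copy is consistent even if another process has the source open.

```python
import sqlite3
from pathlib import Path


def backup_database(src_path, dst_path):
    """Copy a SQLite database to dst_path using SQLite's online backup API."""
    if not Path(src_path).is_file():
        # sqlite3.connect() would silently create an empty file otherwise.
        raise FileNotFoundError(src_path)
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Copies all pages of the source database into the destination.
        src.backup(dst)
    finally:
        dst.close()
        src.close()


# Usage (both paths are hypothetical):
# backup_database("/path/to/mgmtd.sqlite", "/tmp/mgmtd-copy.sqlite")
```

Any experimental SQL is then run against the copy only, and the original stays untouched until a fix has been verified.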
Hi Frank,
Thank you for the details.
I agree the mgmtd shouldn't let you get into this state. If it happens and there is no data or configuration you need to keep, the cleanest approach is to just delete the database, wip…