pacemaker-remoted load_env_var refactors, and assorted minor changes #3833

nrwahl2 · 2025-03-03T09:52:00Z

I'm trying to fix the Coverity issues that are popping up for #3832 and these are also appearing on my machine. There are some more I haven't reached yet.

daemons/execd/remoted_pidone.c

nrwahl2 · 2025-03-03T19:19:22Z

Used a define per review (https://github.com/ClusterLabs/pacemaker/compare/17bb059c39e3ee0b75a8fc26c259a26c5241c719..3ebf24e3100e21a2900009ebcb5c5c7edafa92a1)

clumens · 2025-03-03T19:47:45Z

Were you able to test the changes to load_env_vars? That, or some unit tests, would make me feel a whole lot better about all that small batch string processing code.

nrwahl2 · 2025-03-03T20:33:38Z

Were you able to test the changes to load_env_vars? That, or some unit tests, would make me feel a whole lot better about all that small batch string processing code.

Not really. That's gonna take me a little while to set up and figure out (won't be ready today during business hours). But it is a good idea.

Also interested in whether there's a better way than rolling our own parser, but haven't found anything yet.

nrwahl2 · 2025-03-05T07:25:02Z

Not really. That's gonna take me a little while to set up and figure out (won't be ready today during business hours). But it is a good idea.

Also interested in whether there's a better way than rolling our own parser, but haven't found anything yet.

I'm planning to use pcmk__scan_nvpair() here (now that I'm aware it exists) and then process the key and value separately. It will hopefully make things easier to reason about, even though there will still be corner cases involving escapes and nested quotes for now. See also #3834.

I think the authors made the load_env_vars() function way too complicated in our habitual obsession with "try to never allocate heap memory for a string." Which makes sense in the kernel, but IMO it's pretty silly in MOST of Pacemaker, where we're allocating large data structures frequently. Way too complicated without even achieving totally correct behavior, because it's so easy to get bogged down in the character-by-character parsing details.

clumens · 2025-03-05T17:54:31Z

I'm planning to use pcmk__scan_nvpair() here (now that I'm aware it exists) and then process the key and value separately. It will hopefully make things easier to reason about, even though there will still be corner cases involving escapes and nested quotes for now. See also #3834.

Okay, I'll wait to do final review on this until after you've moved to using pcmk__scan_nvpair().

I think the authors made the load_env_vars() function way too complicated in our habitual obsession with "try to never allocate heap memory for a string." Which makes sense in the kernel, but IMO it's pretty silly in MOST of Pacemaker, where we're allocating large data structures frequently. Way too complicated without even achieving totally correct behavior, because it's so easy to get bogged down in the character-by-character parsing details.

I absolutely agree. If I were writing this function from scratch, I would (1) have it take file contents as a parameter instead of a file name, (2) put it in a library somewhere as private API to allow us to unit test it, and (3) find something in glib that could be used to do the parsing instead of writing it by hand. Now we're stuck with having to support whatever bugs there might be in processing since otherwise we'll break people's remote nodes.

nrwahl2 · 2025-03-06T07:30:45Z

Now we're stuck with having to support whatever bugs there might be in processing since otherwise we'll break people's remote nodes.

Not necessarily. Obviously we've run into many situations where users are relying on behavior they shouldn't be... but these corner cases seem like pretty clear bugs that shouldn't be guaranteed to work, and that it's unlikely anyone is using. I'd be willing to gamble, if we actually feel like fixing them.

That's not something I want to spend time on right now though. If we find a good parser in a library, fine. But setting flags for quote nesting levels, whether we just saw a backslash and are escaping the next character, etc., sounds really unappealing.

nrwahl2 · 2025-03-06T10:39:20Z

Rebased on current main, which led to dropping the final commit. Otherwise no changes.

daemons/execd/remoted_schemas.c

nrwahl2 · 2025-03-08T01:02:12Z

I added a commit to fix the rand() suppression. I also split up the load_env_vars() stuff into several more incremental commits and then used pcmk__scan_nvpair() in the final one. It doesn't really make that function any shorter, but IMO it's a lot more readable now.

I have not tested yet.

nrwahl2 · 2025-03-08T07:53:04Z

Still need to test the load_env_vars() stuff and maybe functionize the value validation part, and run cts-lab for a bit to maybe find any obvious issues with the remote message stuff.

I can't easily test the localized_remote_header() involving byte swapping for endianness, because I don't have a big-endian machine. The last time I tried to set up a big-endian VM, if I ever succeeded at all, it was an enormous pain. I don't see much benefit from unit testing it either: we can trust that the GLib functions do what they're supposed to do, as long as the surrounding logic is correct.

But this actually resolves all the Coverity issues on my laptop. There is a suppression for rand(), and there are two suppressions in crm_resource.c that will go away as part of my crm_resource overhaul PRs. And that's it.

clumens · 2025-07-16T18:22:04Z

@nrwahl2 Is there anything else remaining to be done on this PR?

nrwahl2 · 2025-07-16T18:33:17Z

@nrwahl2 Is there anything else remaining to be done on this PR?

Only testing, hopefully.

There is no unit testing yet, which may be overkill now that we rely on a handful of GLib function calls... but more importantly I haven't tested this by creating a bundle and making sure it loads environment variables appropriately. I need to refresh on how to do that.

nrwahl2 · 2025-07-22T00:23:11Z

Updated the "Standardize remote environment variable parsing" commit substantially, to resolve an issue where leading whitespace in an unquoted value was allowed. Also added a commit to a document a couple of limitations that I don't feel like dealing with. No changes to earlier commits.

About to rebase on current main to resolve conflict.

Note that delay_max != delay_base implies that delay_max > delay_base. Signed-off-by: Reid Wahl <[email protected]>

Add 1 to the modulus so that we include the full range of (delay_max - delay_base), not that it really matters. For example, if delay_max == 5 and delay_base == 2, then we want to add values in {0, 1, 2, 3} to delay_base -- not to add values only in {0, 1, 2}. Signed-off-by: Reid Wahl <[email protected]>

The current callers won't have trailing newlines from the CLI or from reading from a file. Newlines would be present only if the user included them explicitly. Newline removal was (re-)added by b7fe638, to fix a regression in 28ca6aa. However, at the time of 28ca6aa, pcmk_scan_nvpair() was public API, so the change in behavior actually mattered. Now, it doesn't. It appears to be safe to stop stripping trailing newlines. Signed-off-by: Reid Wahl <[email protected]>

I don't see any reason not to. This does change behavior for crm_resource and stonith_admin if "NAME=" is given as a command line argument. However, this is such an edge case that I don't expect it to matter. It seems foolish to depend on the current behavior. Of course, that means nothing, so if anyone complains, we can revert. Signed-off-by: Reid Wahl <[email protected]>

Better than defining our own swab functions if they're not available on the system. GLib already did that work for us and uses arch-specific optimizations where possible. Signed-off-by: Reid Wahl <[email protected]>

Coverity has been complaining about the payload_offset and payload_uncompressed being tainted scalar values. There's only so much validation we can do when we're reading from a socket. But apparently this is enough, because it makes the Coverity errors go away. There's a lot more room for improvement in the remote message processing. I found a few bugs a while back that we need to fix involving multiple messages received in rapid succession. This is an improvement for now. Note that I got rid of the CRM_LOG_ASSERT() line that subtracts 1 from the index. As far as I can tell, that's an off-by-one error and we have no reason to expect that position to contain a null byte. The commit that added it doesn't have any information in the commit message or comments. Signed-off-by: Reid Wahl <[email protected]>

Signed-off-by: Reid Wahl <[email protected]>

Logic doesn't change otherwise. We just have to copy the "eat the rest of the line" bit. Signed-off-by: Reid Wahl <[email protected]>

getline() allocates a buffer large enough to hold the entire input line, avoiding the need to worry about LINE_MAX (which glibc doesn't enforce anyway). The man page says: "getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is null-terminated and includes the newline character, if one was found." So for each line that we read, either it contains a newline or we've reached EOF. This makes the line-eating and a bit of the error-checking unnecessary. Also define a constant for the file name instead of taking an argument. Signed-off-by: Reid Wahl <[email protected]>

Skipping (or now, stripping) leading whitespace is one fewer thing we have to do ourselves. g_strchug() is defined to operate on a (gchar *), but it modifies its argument in place without allocating/rellocating/freeing memory, so it's safe. Since find_env_var_name() only needs to return one pointer now, it doesn't need output arguments. Signed-off-by: Reid Wahl <[email protected]>

After recent refactors, this is no longer necessary. We can just continue to the next outer loop iteration in the one place where it would have mattered. Signed-off-by: Reid Wahl <[email protected]>

Signed-off-by: Reid Wahl <[email protected]>

We consider the value invalid if anything besides whitespace or a comment (beginning with '#') follows it on the line. Further, we ignore all trailing whitespace after a valid value. So it's fine to strip all trailing whitespace, including a newline if present, before processing. Clearly we must now look for '\0' where we previously looked for '\n'. Signed-off-by: Reid Wahl <[email protected]>

We don't want to do this until after parsing the value, because a '#' character inside a quoted value should be treated as part of the value. Signed-off-by: Reid Wahl <[email protected]>

By stripping the rest of the trailing whitespace after stripping the comment, we eliminate the need to null-terminate explictly. If the value is valid (not followed by any garbage), then end already points to a terminating null byte after stripping the comment and remaining whitespace. Signed-off-by: Reid Wahl <[email protected]>

It's no longer needed outside this block. Signed-off-by: Reid Wahl <[email protected]>

The changes in this commit involve using (gchar *) strings and some dynamic allocation to make load_env_vars() easier to reason about. Previously, we tried to be as efficient as possible by reading a line into a buffer, iterating over it with multiple pointers, dividing it into name and value by insertion of a null terminator, etc. This is premature optimization. This function is not called along a path of execution where performance is so critical as to require this. In my opinion, clarity is more important. * Use pcmk__scan_nvpair() to separate the line on the equal sign. * Simplify name validation since we now have name as a separate string (before the equal sign) and don't need to return first and last pointers. * Skip the leading quote by converting it to a space and calling g_strchug() to memmove() value up by one position. This ensures that value still points to the beginning of the buffer, which simplifies freeing it later. Signed-off-by: Reid Wahl <[email protected]>

Signed-off-by: Reid Wahl <[email protected]>

Our bespoke code has been fragile and hard to understand at a glance. There were certainly corner cases involving nested quotes that were not handled in the way the shell would handle them. Now we use g_shell_parse_argv() instead, to mimic how the shell would process the value in a "NAME=VALUE" assignment statement. Signed-off-by: Reid Wahl <[email protected]>

At least on my machine, this one is no longer needed after all the recent refactors. Signed-off-by: Reid Wahl <[email protected]>

We can resolve these limitations if there's ever a need. For now, it's not worth the trouble of implementing and testing these missing capabilities. Signed-off-by: Reid Wahl <[email protected]>

nrwahl2 · 2025-07-22T08:05:38Z

I tested successfully with the following environment file, making sure the values were as expected. It's not pretty, just a quick-and-dirty thing that hits all the important cases I could think of. The comment at the top is the variables that should get set.

# 1-3, 5, 6, 10-16, 21-24
#
VAR1=value1
   VAR2='value2'

VAR3='value3  3'
VAR4=value4 4
# starts with comment
VAR5=value5 #5
VAR6=value6 #6 also 6
1VAR7=value7
'VAR8'='value8'
#VAR9=value9
VAR10=
VAR11=''
VAR12=""
VAR13='""'
VAR14="''"
VAR15="  "
VAR16='  '
VAR17 =17
VAR18 = 18
VAR19= 19
VAR20= 20 # comment
VAR21="'nested single then double \"quotes\"' after-text"
VAR22="pre-text \"nested double then 'quotes'\""
VAR23='pre-text '\''nested single then double '\"quotes\"\'
VAR24='"pre-text nested double then single '\''quotes'\''" after-text'

nrwahl2 requested a review from clumens March 3, 2025 09:52

nrwahl2 force-pushed the nrwahl2-coverity branch from 883c023 to 8481902 Compare March 3, 2025 09:55

clumens reviewed Mar 3, 2025

View reviewed changes

daemons/execd/remoted_pidone.c Outdated Show resolved Hide resolved

clumens added the review: in progress PRs that are currently being reviewed label Mar 3, 2025

nrwahl2 force-pushed the nrwahl2-coverity branch 2 times, most recently from 3ebf24e to 6a42764 Compare March 3, 2025 19:18

clumens mentioned this pull request Mar 5, 2025

Fix systemd unit overrides #3832

Merged

nrwahl2 force-pushed the nrwahl2-coverity branch from 6a42764 to cfbf4a4 Compare March 6, 2025 10:39

clumens reviewed Mar 6, 2025

View reviewed changes

daemons/execd/remoted_schemas.c Outdated Show resolved Hide resolved

nrwahl2 force-pushed the nrwahl2-coverity branch from cfbf4a4 to c94342c Compare March 8, 2025 00:57

nrwahl2 force-pushed the nrwahl2-coverity branch 5 times, most recently from ec1ad9f to f71ef81 Compare March 8, 2025 01:26

nrwahl2 marked this pull request as draft March 8, 2025 01:28

nrwahl2 force-pushed the nrwahl2-coverity branch 3 times, most recently from 81b5104 to d561a8b Compare March 8, 2025 01:34

nrwahl2 marked this pull request as ready for review March 8, 2025 06:59

nrwahl2 force-pushed the nrwahl2-coverity branch from f1b55a0 to 7e60792 Compare March 8, 2025 07:02

nrwahl2 force-pushed the nrwahl2-coverity branch from cf12b21 to f94c34d Compare July 13, 2025 01:25

clumens approved these changes Jul 16, 2025

View reviewed changes

nrwahl2 force-pushed the nrwahl2-coverity branch from eebcadc to 8f73fbb Compare July 22, 2025 00:21

nrwahl2 added 21 commits July 21, 2025 17:23

Refactor: fencer: Simplify start_delay calculation for fence command

c5ffaa8

Note that delay_max != delay_base implies that delay_max > delay_base. Signed-off-by: Reid Wahl <[email protected]>

Refactor: libcrmcommon: Use GLib functions for swapping byte order

e3080fc

Better than defining our own swab functions if they're not available on the system. GLib already did that work for us and uses arch-specific optimizations where possible. Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Remove unnecessary nesting in load_env_vars()

b3b73fb

Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Remove more nesting from load_env_vars()

f1cfabd

Logic doesn't change otherwise. We just have to copy the "eat the rest of the line" bit. Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Skip setting value = NULL in load_env_vars()

5bd051a

After recent refactors, this is no longer necessary. We can just continue to the next outer loop iteration in the one place where it would have mattered. Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Remove some nesting in load_env_vars()

3f5334b

Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Strip trailing comment in load_env_vars()

e192036

We don't want to do this until after parsing the value, because a '#' character inside a quoted value should be treated as part of the value. Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Limit quote scope in load_env_vars()

89668b9

It's no longer needed outside this block. Signed-off-by: Reid Wahl <[email protected]>

Refactor: remoted: Functionize loading one environment variable

2387f7d

Signed-off-by: Reid Wahl <[email protected]>

Refactor: libcrmcommon: Drop unnecessary Coverity suppression

68d6718

At least on my machine, this one is no longer needed after all the recent refactors. Signed-off-by: Reid Wahl <[email protected]>

Doc: Pacemaker Explained: Document limitation of pcmk-init.env parsing

bb71085

We can resolve these limitations if there's ever a need. For now, it's not worth the trouble of implementing and testing these missing capabilities. Signed-off-by: Reid Wahl <[email protected]>

nrwahl2 force-pushed the nrwahl2-coverity branch from 8f73fbb to bb71085 Compare July 22, 2025 00:28

clumens merged commit a473d15 into ClusterLabs:main Jul 22, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

pacemaker-remoted load_env_var refactors, and assorted minor changes #3833

pacemaker-remoted load_env_var refactors, and assorted minor changes #3833

Uh oh!

nrwahl2 commented Mar 3, 2025

Uh oh!

Uh oh!

nrwahl2 commented Mar 3, 2025

Uh oh!

clumens commented Mar 3, 2025

Uh oh!

nrwahl2 commented Mar 3, 2025

Uh oh!

nrwahl2 commented Mar 5, 2025 •

edited

Loading

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

nrwahl2 commented Mar 6, 2025

Uh oh!

nrwahl2 commented Mar 6, 2025

Uh oh!

Uh oh!

nrwahl2 commented Mar 8, 2025

Uh oh!

nrwahl2 commented Mar 8, 2025 •

edited

Loading

Uh oh!

clumens commented Jul 16, 2025

Uh oh!

nrwahl2 commented Jul 16, 2025

Uh oh!

nrwahl2 commented Jul 22, 2025

Uh oh!

nrwahl2 commented Jul 22, 2025

Uh oh!

Uh oh!

Uh oh!

pacemaker-remoted load_env_var refactors, and assorted minor changes #3833

pacemaker-remoted load_env_var refactors, and assorted minor changes #3833

Uh oh!

Conversation

nrwahl2 commented Mar 3, 2025

Uh oh!

Uh oh!

nrwahl2 commented Mar 3, 2025

Uh oh!

clumens commented Mar 3, 2025

Uh oh!

nrwahl2 commented Mar 3, 2025

Uh oh!

nrwahl2 commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

nrwahl2 commented Mar 6, 2025

Uh oh!

nrwahl2 commented Mar 6, 2025

Uh oh!

Uh oh!

nrwahl2 commented Mar 8, 2025

Uh oh!

nrwahl2 commented Mar 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clumens commented Jul 16, 2025

Uh oh!

nrwahl2 commented Jul 16, 2025

Uh oh!

nrwahl2 commented Jul 22, 2025

Uh oh!

nrwahl2 commented Jul 22, 2025

Uh oh!

Uh oh!

Uh oh!

nrwahl2 commented Mar 5, 2025 •

edited

Loading

nrwahl2 commented Mar 8, 2025 •

edited

Loading