Skip to content

WIP no submodules#512

Draft
biojppm wants to merge 2 commits into
masterfrom
nosubmodules
Draft

WIP no submodules#512
biojppm wants to merge 2 commits into
masterfrom
nosubmodules

Conversation

@biojppm

@biojppm biojppm commented Mar 30, 2025

Copy link
Copy Markdown
Owner

From the comments below:

If I understand correctly, the c4core submodule is replaced by a copy of the c4core repository, correct ?

Yes, with some details to it:

  • this PR removes the c4core submodule, and copies c4core code into the rapidyaml tree, to two different dirs:
    • ext/c4core.src: c4core code used in the rapidyaml library
    • ext/c4core.dev: c4core code used by tests/benchmarks
      Doing this makes the standalone library marginally smaller, and enforces the standalone default of rapidyaml bypassing the submodule. So most users (which don't care about c4core itself) will now see rapidyaml as a plain project without extra dependencies or submodules.
  • this PR retains the existing option to compile against a full instance of the c4core library, thus ignoring the in-source copy under ext/c4core.src and ext/c4core.dev
    • the default is still to compile in standalone mode (ie c4core part of rapidyaml)
    • the pre-existing cmake option RYML_STANDALONE still enables users to build rapidyaml against a previously build c4core library (like in the ubuntu packages where rapidyaml depends on c4core).
  • the amalgamation tool should work just as before

Why is this desirable, and what issue does this resolve ?

I have had repeated feedback that the c4core submodule makes it impossible for some people to use rapidyaml, either because it contains submodules, or because c4core is a dependency boogeyman.

While I disagree with both of these reasonings, it is not for me to dispute their choice.

So I am removing these reasons by making the c4core code really standalone in rapidyaml, while striving to retain all current existing workflows.


Do you plan to remove the c4core repository and only maintain it within rapidyaml

No, not at all. c4core will remain just as it is, in https://github.com/biojppm/c4core.

and if not, how will the fixes from c4core be applied to rapidyaml (beside manually) ?

This PR adds a helper makefile to simplify the workflow, with two main approaches:

  • create an in-source clone of c4core (parallel to the existing copies, under the old location of the submodule ext/c4core) and...
  • importing changes from c4core clone to rapidyaml copies
  • exporting changes from rapidyaml copies to c4core clone
  • check sync status.

The CI enforces that the in-source c4core copy is in sync with the target commit hash (just like a submodule dirty status).

The makefile helper to sync with c4core is at ext/Makefile, and the README with instructions will be at ext/README.md.

The file enforcing the c4core version is at ext/c4core.mk.


Please clarify the intent here, I am really worried about this change.

Thanks for looking into this, I truly appreciate.

I am still working in this PR, so this is truly the proper time to start offering feedback. I have made a conscious effort to retain all existing usage scenarios, but if there is something I'm overlooking, I'd like to hear about it.

And FWIW, after getting this PR ready and green, I will let it linger for some weeks before merging it.

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@coveralls

coveralls commented Apr 17, 2025

Copy link
Copy Markdown

Pull Request Test Coverage Report for Build 14553906133

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.578%

Totals Coverage Status
Change from base Build 14417824626: 0.0%
Covered Lines: 11564
Relevant Lines: 11851

💛 - Coveralls

@codecov

codecov Bot commented Apr 17, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.66%. Comparing base (119b604) to head (938663d).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #512   +/-   ##
=======================================
  Coverage   97.65%   97.66%           
=======================================
  Files          46       46           
  Lines       13750    13762   +12     
=======================================
+ Hits        13428    13440   +12     
  Misses        322      322           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@biojppm biojppm force-pushed the master branch 2 times, most recently from fbee0da to e4aa0b3 Compare August 24, 2025 18:35
@biojppm biojppm marked this pull request as draft September 24, 2025 23:04
@biojppm biojppm changed the base branch from master to add_visit_error October 1, 2025 23:33
@biojppm biojppm force-pushed the add_visit_error branch 2 times, most recently from 2d3a5b5 to d6b80e0 Compare October 8, 2025 09:28
@biojppm biojppm force-pushed the add_visit_error branch 2 times, most recently from 779b983 to 3c045c9 Compare October 20, 2025 14:44
@biojppm biojppm force-pushed the add_visit_error branch 2 times, most recently from 9092456 to 62f058c Compare December 28, 2025 19:18
Base automatically changed from add_visit_error to master December 30, 2025 12:53
@biojppm biojppm force-pushed the nosubmodules branch 3 times, most recently from 72ed803 to b121877 Compare January 7, 2026 15:53
/* 44 , */ __, /* 45 - */ __, /* 46 . */ __, /* 47 / */ 63,
/* 48 0 */ 52, /* 49 1 */ 53, /* 50 2 */ 54, /* 51 3 */ 55,
/* 52 4 */ 56, /* 53 5 */ 57, /* 54 6 */ 58, /* 55 7 */ 59,
/* 56 8 */ 60, /* 57 9 */ 61, /* 58 : */ __, /* 59 ; */ __,

Check notice

Code scanning / CodeQL

Commented-out code

This comment appears to contain commented-out code.

Copilot Autofix

AI 5 months ago

To fix the problem, we should keep the informational content of the comment (that this entry corresponds to ASCII 59, ;) but alter the syntax enough so that the semicolon no longer looks like a standalone code token to the analyzer. The rest of the table uses the same format for punctuation, so the minimal change is to wrap the character in quotes or otherwise mark it as descriptive text, as suggested in the recommendation.

The best low-impact fix is to update just that single comment from /* 59 ; */ to something like /* 59 ';' */ (or " ; "), which still clearly documents the character but makes it obvious that the semicolon is part of a string literal-style annotation rather than code. This does not change any compiled code or data; it only affects the comment, so functionality is unchanged. The edit is localized to ext/c4core.src/c4/base64.cpp, on the line with /* 59 ; */, inside the base64_char_to_sextet_ table. No new methods, imports, or definitions are needed.

Suggested changeset 1
ext/c4core.src/c4/base64.cpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/base64.cpp b/ext/c4core.src/c4/base64.cpp
--- a/ext/c4core.src/c4/base64.cpp
+++ b/ext/c4core.src/c4/base64.cpp
@@ -54,7 +54,7 @@
     /* 44 ,  */ __, /* 45 -  */ __, /* 46 .  */ __, /* 47 /  */ 63,
     /* 48 0  */ 52, /* 49 1  */ 53, /* 50 2  */ 54, /* 51 3  */ 55,
     /* 52 4  */ 56, /* 53 5  */ 57, /* 54 6  */ 58, /* 55 7  */ 59,
-    /* 56 8  */ 60, /* 57 9  */ 61, /* 58 :  */ __, /* 59 ;  */ __,
+    /* 56 8  */ 60, /* 57 9  */ 61, /* 58 :  */ __, /* 59 ';' */ __,
     /* 60 <  */ __, /* 61 =  */ __, /* 62 >  */ __, /* 63 ?  */ __,
     /* 64 @  */ __, /* 65 A  */  0, /* 66 B  */  1, /* 67 C  */  2,
     /* 68 D  */  3, /* 69 E  */  4, /* 70 F  */  5, /* 71 G  */  6,
EOF
@@ -54,7 +54,7 @@
/* 44 , */ __, /* 45 - */ __, /* 46 . */ __, /* 47 / */ 63,
/* 48 0 */ 52, /* 49 1 */ 53, /* 50 2 */ 54, /* 51 3 */ 55,
/* 52 4 */ 56, /* 53 5 */ 57, /* 54 6 */ 58, /* 55 7 */ 59,
/* 56 8 */ 60, /* 57 9 */ 61, /* 58 : */ __, /* 59 ; */ __,
/* 56 8 */ 60, /* 57 9 */ 61, /* 58 : */ __, /* 59 ';' */ __,
/* 60 < */ __, /* 61 = */ __, /* 62 > */ __, /* 63 ? */ __,
/* 64 @ */ __, /* 65 A */ 0, /* 66 B */ 1, /* 67 C */ 2,
/* 68 D */ 3, /* 69 E */ 4, /* 70 F */ 5, /* 71 G */ 6,
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
/*108 l */ 37, /*109 m */ 38, /*110 n */ 39, /*111 o */ 40,
/*112 p */ 41, /*113 q */ 42, /*114 r */ 43, /*115 s */ 44,
/*116 t */ 45, /*117 u */ 46, /*118 v */ 47, /*119 w */ 48,
/*120 x */ 49, /*121 y */ 50, /*122 z */ 51, /*123 { */ __,

Check notice

Code scanning / CodeQL

Commented-out code

This comment appears to contain commented-out code.

Copilot Autofix

AI 5 months ago

In general, to fix this kind of issue you either (a) remove the suspect comment if it’s not needed, or (b) reformat it so that it no longer resembles code (for example, by quoting it or adding explanatory text), while keeping the informational content.

Here, the comment documents that index 123 corresponds to the { character. We should preserve that information but make the comment clearly non‑code. The simplest non‑disruptive fix is to tweak the comment text so the { is not a standalone C++ token, e.g. change /*123 { */ __, to /*123 '{'*/ __, or /*123 char '{' */ __,. This keeps the content, avoids affecting the initializer values, and satisfies the recommendation to enclose code‑like snippets in quotes.

Concretely, in ext/c4core.src/c4/base64.cpp, on the line that currently reads:

/*120 x  */ 49, /*121 y  */ 50, /*122 z  */ 51, /*123 {  */ __,

we will change only the last inline comment to something like:

/*120 x  */ 49, /*121 y  */ 50, /*122 z  */ 51, /*123 '{'*/ __,

No additional includes or definitions are required.

Suggested changeset 1
ext/c4core.src/c4/base64.cpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/base64.cpp b/ext/c4core.src/c4/base64.cpp
--- a/ext/c4core.src/c4/base64.cpp
+++ b/ext/c4core.src/c4/base64.cpp
@@ -70,7 +70,7 @@
     /*108 l  */ 37, /*109 m  */ 38, /*110 n  */ 39, /*111 o  */ 40,
     /*112 p  */ 41, /*113 q  */ 42, /*114 r  */ 43, /*115 s  */ 44,
     /*116 t  */ 45, /*117 u  */ 46, /*118 v  */ 47, /*119 w  */ 48,
-    /*120 x  */ 49, /*121 y  */ 50, /*122 z  */ 51, /*123 {  */ __,
+    /*120 x  */ 49, /*121 y  */ 50, /*122 z  */ 51, /*123 '{'*/ __,
     /*124 |  */ __, /*125 }  */ __, /*126 ~  */ __, /*127 DEL*/ __,
     #undef __
 };
EOF
@@ -70,7 +70,7 @@
/*108 l */ 37, /*109 m */ 38, /*110 n */ 39, /*111 o */ 40,
/*112 p */ 41, /*113 q */ 42, /*114 r */ 43, /*115 s */ 44,
/*116 t */ 45, /*117 u */ 46, /*118 v */ 47, /*119 w */ 48,
/*120 x */ 49, /*121 y */ 50, /*122 z */ 51, /*123 { */ __,
/*120 x */ 49, /*121 y */ 50, /*122 z */ 51, /*123 '{'*/ __,
/*124 | */ __, /*125 } */ __, /*126 ~ */ __, /*127 DEL*/ __,
#undef __
};
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Comment on lines +188 to +190
// MSVC cant handle the C4_FOR_EACH macro... need to fix this
//#define C4_BEGIN_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_BEGIN_NAMESPACE, , __VA_ARGS__)
//#define C4_END_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_END_NAMESPACE, , __VA_ARGS__)

Check notice

Code scanning / CodeQL

Commented-out code

This comment appears to contain commented-out code.

Copilot Autofix

AI 5 months ago

In general, to fix commented-out code you either (1) delete it if it is no longer needed, or (2) reinstate it as active code if it is still required. If it is only needed as an example, rewrite it as non-compilable pseudocode or document it in a clearer comment format.

Here, the two #define lines using C4_FOR_EACH_SEP are an unused alternative implementation of C4_BEGIN_NAMESPACE / C4_END_NAMESPACE. The active implementation is already provided immediately afterward, and the preceding comment about MSVC’s limitation is still useful. The minimal, non-functional-change fix is therefore to keep the comment about MSVC but remove the two commented-out macro lines.

Concretely:

  • In ext/c4core.src/c4/language.hpp, in the region around lines 188–193, remove:
    • //#define C4_BEGIN_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_BEGIN_NAMESPACE, , __VA_ARGS__)
    • //#define C4_END_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_END_NAMESPACE, , __VA_ARGS__)
  • Leave the surrounding comment and the active #define C4_BEGIN_NAMESPACE(ns) and #define C4_END_NAMESPACE(ns) unchanged.
  • No new imports, methods, or definitions are required.
Suggested changeset 1
ext/c4core.src/c4/language.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/language.hpp b/ext/c4core.src/c4/language.hpp
--- a/ext/c4core.src/c4/language.hpp
+++ b/ext/c4core.src/c4/language.hpp
@@ -186,8 +186,6 @@
 #define _C4_END_NAMESPACE(ns)   }
 
 // MSVC cant handle the C4_FOR_EACH macro... need to fix this
-//#define C4_BEGIN_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_BEGIN_NAMESPACE, , __VA_ARGS__)
-//#define C4_END_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_END_NAMESPACE, , __VA_ARGS__)
 #define C4_BEGIN_NAMESPACE(ns) namespace ns {
 #define C4_END_NAMESPACE(ns) }
 
EOF
@@ -186,8 +186,6 @@
#define _C4_END_NAMESPACE(ns) }

// MSVC cant handle the C4_FOR_EACH macro... need to fix this
//#define C4_BEGIN_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_BEGIN_NAMESPACE, , __VA_ARGS__)
//#define C4_END_NAMESPACE(...) C4_FOR_EACH_SEP(_C4_END_NAMESPACE, , __VA_ARGS__)
#define C4_BEGIN_NAMESPACE(ns) namespace ns {
#define C4_END_NAMESPACE(ns) }

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == '.')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix “for loop variable changed in body,” move any complex or nonstandard manipulation of the loop variable out of a for loop and into a while (or equivalent) structure where manual updates are explicit and centralized. The loop control expression should be the only place that regularly changes the counter; exceptions like skipping ahead should be structurally obvious.

For this specific case in _first_real_span_dec, the best fix is to convert the for( ; pos < len; ++pos) loop into a while(pos < len) loop and replace the implicit ++pos in the for header with explicit increments at the end of each path that continues scanning. The existing ++pos; before the goto statements should remain, because they correctly advance past the current character before jumping into the fractional or power parsing sections. For branches that return from the function, we do not need to increment. For the main “continue scanning integral part” case (digits and delimiters not causing early return), we add ++pos; explicitly at the bottom of the loop body.

Concretely, in ext/c4core.src/c4/substr.hpp around line 1310, replace:

  • for( ; pos < len; ++pos) with while(pos < len),
  • keep the inner logic unchanged except for adding a single ++pos; just before the closing brace of the loop body, so that each iteration that doesn’t return or goto advances pos once. No new includes or helper methods are needed.
Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1307,7 +1307,7 @@
         bool fracchars = false;
         bool powchars;
         // integral part
-        for( ; pos < len; ++pos)
+        while(pos < len)
         {
             const char c = str[pos];
             if(c >= '0' && c <= '9')
@@ -1332,6 +1332,8 @@
             {
                 return first(0);
             }
+
+            ++pos;
         }
         // no . or p were found; this is either an integral number
         // or not a number at all
EOF
@@ -1307,7 +1307,7 @@
bool fracchars = false;
bool powchars;
// integral part
for( ; pos < len; ++pos)
while(pos < len)
{
const char c = str[pos];
if(c >= '0' && c <= '9')
@@ -1332,6 +1332,8 @@
{
return first(0);
}

++pos;
}
// no . or p were found; this is either an integral number
// or not a number at all
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == 'e' || c == 'E')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix this kind of issue you either (1) stop modifying the for loop counter inside the body and rely solely on the loop’s increment expression, or (2) change the construct to a while loop (or similar) where manual manipulation of the index is expected and clear. Here, because pos needs to be incremented conditionally (and is immediately followed by goto statements), the most straightforward fix that preserves behavior is to replace the for loop with a while loop and take full manual control of pos increments.

Concretely, in _first_real_span_dec, replace:

  • The header for( ; pos < len; ++pos) with while(pos < len).
  • The unconditional increment ++pos in the for header is no longer present, so we must add explicit ++pos; where the loop previously would have incremented at the bottom of each iteration. The safest and clearest approach is to increment pos at the end of the main scanning iteration for the “normal” control‑flow path (after the if/else if chain), while leaving the existing ++pos; lines before the goto statements intact, since those branches previously benefited from both the manual increment and the for loop’s implicit increment (net effect was pos += 2 before the target label executed). To replicate that behavior exactly, we keep the ++pos; before each goto and do not add an extra increment after the goto labels; the labels and their logic (not shown) will continue from the correct index. No new headers, methods, or imports are needed; the change is purely a restructuring of the loop in ext/c4core.src/c4/substr.hpp around lines 1310–1335.
Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1307,12 +1307,13 @@
         bool fracchars = false;
         bool powchars;
         // integral part
-        for( ; pos < len; ++pos)
+        while(pos < len)
         {
             const char c = str[pos];
             if(c >= '0' && c <= '9')
             {
                 intchars = true;
+                ++pos;
             }
             else if(c == '.')
             {
EOF
@@ -1307,12 +1307,13 @@
bool fracchars = false;
bool powchars;
// integral part
for( ; pos < len; ++pos)
while(pos < len)
{
const char c = str[pos];
if(c >= '0' && c <= '9')
{
intchars = true;
++pos;
}
else if(c == '.')
{
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == 'p' || c == 'P')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix “for loop variable changed in body” issues, you either (1) remove in-body modifications and rely solely on the for header increment, or (2) convert the loop to a while/manual-loop form where explicit manipulation of the index variable is expected and clear. Here the function implements a state machine over pos with goto labels; pos is intentionally incremented in some branches before jumping to another labeled block. This matches pattern (2): the cleanest fix is to replace the for loops in _first_real_span_bin with while loops over pos, keeping the explicit ++pos operations in the body as they are. That preserves behavior and removes the confusing mixture of automatic and manual increments.

Concretely, in _first_real_span_bin, there is one visible for loop (and almost certainly a second one in the fractional_part_bin section analogous to _first_real_span_hex). CodeQL’s alert refers to the loop header at line 1488 and the in-body ++pos at line 1501. We will change that for( ; pos < len; ++pos) to a while(pos < len) and move the ++pos to appropriate places in the loop body, ensuring we still advance pos for normal iteration. Since every path that uses the in-body ++pos then leaves the loop via goto or return, we can simply add a ++pos; at the end of the loop body to mimic the for loop’s increment for the non-goto/non-return paths. This way, explicit increments in the body (like the one before the goto power_part_bin) will continue to work identically, and the rule’s requirement—that the loop counter not be modified in the body of a for loop—is satisfied by no longer using a for loop at all.

No new methods or imports are needed; all changes are local to _first_real_span_bin in ext/c4core.src/c4/substr.hpp, around the integral-part scanning loop (and, if present in the unseen part, its sibling loop for the binary exponent).

Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1484,7 +1484,7 @@
         bool fracchars = false;
         bool powchars;
         // integral part
-        for( ; pos < len; ++pos)
+        while(pos < len)
         {
             const char c = str[pos];
             if(c == '0' || c == '1')
@@ -1509,6 +1509,7 @@
             {
                 return first(0);
             }
+            ++pos;
         }
         // no . or p were found; this is either an integral number
         // or not a number at all
EOF
@@ -1484,7 +1484,7 @@
bool fracchars = false;
bool powchars;
// integral part
for( ; pos < len; ++pos)
while(pos < len)
{
const char c = str[pos];
if(c == '0' || c == '1')
@@ -1509,6 +1509,7 @@
{
return first(0);
}
++pos;
}
// no . or p were found; this is either an integral number
// or not a number at all
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == 'p' || c == 'P')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix “for loop variable changed in body” issues you either (1) stop modifying the loop variable inside the body and rely solely on the loop’s increment expression, or (2) change the loop to a while or do loop where manual control of the index is expected. Here, the cleanest fix that preserves semantics is to rearrange the control flow so that pos is not incremented in the body of any for loop, but rather after the loop(s), before the power_part_bin section.

Concretely for _first_real_span_bin, pos is manually incremented in two places:

  • In the integral part loop when encountering 'p' or 'P' (++pos; goto power_part_bin;).
  • In the fractional part loop when encountering 'p' or 'P' (++pos; goto power_part_bin; on line 1530).

Both increments exist so that, when we jump to power_part_bin, pos points at the sign character ('+' or '-'), not at the 'p'/'P'. But the power_part_bin label is always reached by goto or by falling through the end of the function; no active for loop continues after the goto. We can therefore:

  • Remove the ++pos; before each goto power_part_bin;.
  • At the power_part_bin label, increment pos once at the top so that its value matches the previous code’s expectations.

This way, the loop variable pos is not mutated in the body of any for loop; its value is only advanced either by the loop header (++pos in for) or in straight‑line code outside the loops. The only code changes are within _first_real_span_bin in ext/c4core.src/c4/substr.hpp; no new headers or helpers are needed.

Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1527,7 +1527,6 @@
             }
             else if(c == 'p' || c == 'P')
             {
-                ++pos;
                 goto power_part_bin; // NOLINT
             }
             else if(_is_delim_char(c))
@@ -1545,6 +1544,7 @@
     power_part_bin:
         C4_ASSERT(pos > 0);
         C4_ASSERT(str[pos - 1] == 'p' || str[pos - 1] == 'P');
+        ++pos; // move past 'p'/'P' to the expected sign
         // either a + or a - is expected here, followed by more chars.
         // also, using (pos+1) in this check will cause an early
         // return when no more chars follow the sign.
EOF
@@ -1527,7 +1527,6 @@
}
else if(c == 'p' || c == 'P')
{
++pos;
goto power_part_bin; // NOLINT
}
else if(_is_delim_char(c))
@@ -1545,6 +1544,7 @@
power_part_bin:
C4_ASSERT(pos > 0);
C4_ASSERT(str[pos - 1] == 'p' || str[pos - 1] == 'P');
++pos; // move past 'p'/'P' to the expected sign
// either a + or a - is expected here, followed by more chars.
// also, using (pos+1) in this check will cause an early
// return when no more chars follow the sign.
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == '.')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix “for loop variable changed in body” issues, avoid explicit assignments or increments to the loop variable inside the loop body. Instead, either (a) move that logic into the for loop’s increment expression, or (b) switch the loop to a while and manage the increment explicitly in one place, or (c) use a separate index variable if multiple step patterns are required.

In this function, pos is incremented in the loop header (++pos) and again inside the loop body when a . is encountered (line 1586) before jumping to the fractional_part_oct label. That causes the loop counter to be modified in the body. We can preserve the original semantics by moving the pos increment that conceptually “consumes” the . into the transition logic after the loop, where parsing of the fractional part begins, and by removing the ++pos from the loop body. Concretely:

  • At line 1586, remove the ++pos; and leave just the goto fractional_part_oct;.
  • Just before the fractional_part_oct: label (after the return that ends the integral-only path), insert ++pos; with a clarifying comment, so that when control jumps to the label, it first advances past the . character, matching the original behavior.
  • Leave the for header for( ; pos < len; ++pos) unchanged; pos will still advance on each loop iteration, but no longer be modified inside the loop body.

This keeps all observable behavior the same: when c == '.', previously pos was incremented in the body and then incremented again by the loop header on the next iteration. After the change, we still increment pos once in the loop increment, and when we later jump into the fractional_part_oct block, we increment pos once more at the top of that section, which is exactly what used to happen before entering the fractional part. The fix is localized to ext/c4core.src/c4/substr.hpp inside the _first_real_span_oct method.

Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1583,7 +1583,7 @@
             }
             else if(c == '.')
             {
-                ++pos;
+                // consume '.' and continue parsing in the fractional part
                 goto fractional_part_oct; // NOLINT
             }
             else if(c == 'p' || c == 'P')
@@ -1605,6 +1605,8 @@
         return intchars ?
             *this :
             first(0);
+    // we jumped here after seeing '.', so advance past it now
+    ++pos;
     fractional_part_oct:
         C4_ASSERT(pos > 0);
         C4_ASSERT(str[pos - 1] == '.');
EOF
@@ -1583,7 +1583,7 @@
}
else if(c == '.')
{
++pos;
// consume '.' and continue parsing in the fractional part
goto fractional_part_oct; // NOLINT
}
else if(c == 'p' || c == 'P')
@@ -1605,6 +1605,8 @@
return intchars ?
*this :
first(0);
// we jumped here after seeing '.', so advance past it now
++pos;
fractional_part_oct:
C4_ASSERT(pos > 0);
C4_ASSERT(str[pos - 1] == '.');
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == 'p' || c == 'P')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix “for loop variable changed in body” problems, you either (1) remove any mutation of the loop counter from the loop body and express all advancement in the loop header, or (2) replace the for loop with a while loop where more complex control of the index is explicit and expected. The goal is that the loop’s controlling expression is the only place where the loop variable is incremented or otherwise changed.

In this case, we must keep the existing parsing behavior: when the integral‑part loop sees a 'p' or 'P', it should leave the loop, and power_part_oct should see pos pointing to the character after that 'p'/'P'. Currently this is achieved via ++pos; goto power_part_oct; inside the loop. To avoid changing the loop variable in the body, we can instead (a) record that we need to transition to power_part_oct, (b) break out of the loop, and (c) after the loop, when we still know we just left via a 'p'/'P', increment pos once and jump to power_part_oct. Concretely:

  • Replace the else if(c == 'p' || c == 'P') arm inside the loop with a break; that exits the loop after noting that we found a power marker.
  • Introduce a small state flag variable (e.g., bool has_power = false;) initialized before the loop, set to true when we see 'p'/'P'.
  • After the loop (before the comment “// no . or p were found; this is either an integral number …”), add a if(has_power) { ++pos; goto power_part_oct; } block. At this point, pos is still the index of the 'p'/'P' character, so incrementing it moves to the correct starting point for power_part_oct.
  • Leave the rest of the code (including the power_part_oct label and logic) unchanged.

All changes are confined to _first_real_span_oct in ext/c4core.src/c4/substr.hpp. No new headers or external dependencies are required, since we only add a bool local and adjust control flow.

Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1573,6 +1573,7 @@
         bool intchars = false;
         bool fracchars = false;
         bool powchars;
+        bool has_power = false;
         // integral part
         for( ; pos < len; ++pos)
         {
@@ -1588,8 +1589,8 @@
             }
             else if(c == 'p' || c == 'P')
             {
-                ++pos;
-                goto power_part_oct; // NOLINT
+                has_power = true;
+                break;
             }
             else if(_is_delim_char(c))
             {
@@ -1600,6 +1601,13 @@
                 return first(0);
             }
         }
+        if(has_power)
+        {
+            // pos currently indexes the 'p' or 'P' character; advance
+            // to the following character to start parsing the power part.
+            ++pos;
+            goto power_part_oct; // NOLINT
+        }
         // no . or p were found; this is either an integral number
         // or not a number at all
         return intchars ?
EOF
@@ -1573,6 +1573,7 @@
bool intchars = false;
bool fracchars = false;
bool powchars;
bool has_power = false;
// integral part
for( ; pos < len; ++pos)
{
@@ -1588,8 +1589,8 @@
}
else if(c == 'p' || c == 'P')
{
++pos;
goto power_part_oct; // NOLINT
has_power = true;
break;
}
else if(_is_delim_char(c))
{
@@ -1600,6 +1601,13 @@
return first(0);
}
}
if(has_power)
{
// pos currently indexes the 'p' or 'P' character; advance
// to the following character to start parsing the power part.
++pos;
goto power_part_oct; // NOLINT
}
// no . or p were found; this is either an integral number
// or not a number at all
return intchars ?
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
}
else if(c == 'p' || c == 'P')
{
++pos;

Check notice

Code scanning / CodeQL

For loop variable changed in body

Loop counters should not be modified in the body of the [loop](1).

Copilot Autofix

AI 5 months ago

In general, to fix this kind of issue you should ensure that the for loop’s iteration variable is only changed in one place: either in the loop’s increment expression or explicitly in the body, but not both. When more complex movement of the index is needed, a while loop (with all increments handled explicitly in the body) is often clearer.

Here, the problematic construct is:

for( ; pos < len; ++pos)
{
    ...
    else if(c == 'p' || c == 'P')
    {
        ++pos;
        goto power_part_oct; // NOLINT
    }
    ...
}

At the power_part_oct label, the code assumes pos refers to the sign character (+ or -), then increments once more to move past it:

power_part_oct:
    C4_ASSERT(pos > 0);
    C4_ASSERT(str[pos - 1] == 'p' || str[pos - 1] == 'P');
    ...
    ++pos; // this was the sign.
    ...
    for( ; pos < len; ++pos)
    {
        ...
    }

So from fractional_part_oct, the intent on encountering 'p'/'P' is:

  • Increment pos once in the loop body so that pos points to the sign.
  • Jump to power_part_oct, which validates and then increments pos once more so that it points at the first power digit.

The for loop’s ++pos in the header is not relied upon when we go via goto (the loop condition is not evaluated again after the goto), so the only problematic double‑increment is within the fractional_part_oct loop itself: ++pos in the body plus ++pos in the header on the same iteration.

We can maintain the existing behavior while fixing the CodeQL finding by:

  • Changing the fractional_part_oct loop to a while loop that does not automatically increment pos.
  • Explicitly incrementing pos at the bottom of the loop body when we continue scanning fractional digits.
  • Retaining the ++pos; before goto power_part_oct; so that we still land on the sign character as before.

This way, pos is only modified explicitly in the body; there is no hidden increment in the loop construct, and the observable behavior of the parser remains the same.

Concretely, in ext/c4core.src/c4/substr.hpp:

  • Replace the for( ; pos < len; ++pos) loop under the fractional_part_oct: label (around lines 1611–1631) with a while(pos < len) loop.
  • Inside the loop:
    • At the beginning of each iteration, use const char c = str[pos]; as before.
    • For the 'p'/'P' case, keep ++pos; goto power_part_oct;.
    • For delimiter and invalid cases, return exactly as before.
    • At the end of the loop body (after the if/else chain), add ++pos; to advance to the next character when we continue scanning the fractional part.
  • Leave the power_part_oct section and its for loop unchanged.

No additional methods or imports are needed; this is a local control‑flow refactor.

Suggested changeset 1
ext/c4core.src/c4/substr.hpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ext/c4core.src/c4/substr.hpp b/ext/c4core.src/c4/substr.hpp
--- a/ext/c4core.src/c4/substr.hpp
+++ b/ext/c4core.src/c4/substr.hpp
@@ -1608,7 +1608,7 @@
     fractional_part_oct:
         C4_ASSERT(pos > 0);
         C4_ASSERT(str[pos - 1] == '.');
-        for( ; pos < len; ++pos)
+        while(pos < len)
         {
             const char c = str[pos];
             if(c >= '0' && c <= '7')
@@ -1628,6 +1628,7 @@
             {
                 return first(0);
             }
+            ++pos;
         }
         return intchars || fracchars ?
             *this :
EOF
@@ -1608,7 +1608,7 @@
fractional_part_oct:
C4_ASSERT(pos > 0);
C4_ASSERT(str[pos - 1] == '.');
for( ; pos < len; ++pos)
while(pos < len)
{
const char c = str[pos];
if(c >= '0' && c <= '7')
@@ -1628,6 +1628,7 @@
{
return first(0);
}
++pos;
}
return intchars || fracchars ?
*this :
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
@biojppm biojppm force-pushed the nosubmodules branch 2 times, most recently from 3869565 to 91c88fe Compare January 14, 2026 18:43
@biojppm biojppm force-pushed the nosubmodules branch 4 times, most recently from 1ff5a23 to df6a8d4 Compare January 23, 2026 11:35
@marcalff

marcalff commented Feb 7, 2026

Copy link
Copy Markdown
Contributor

@biojppm

If I understand correctly, the c4core submodule is replaced by a copy of the c4core repository, correct ?

Why is this desirable, and what issue does this resolve ?

Do you plan to remove the c4core repository and only maintain it within rapidyaml, and if not, how will the fixes from c4core be applied to rapidyaml (beside manually) ?

Please clarify the intent here, I am really worried about this change.

@biojppm

biojppm commented Feb 7, 2026

Copy link
Copy Markdown
Owner Author

If I understand correctly, the c4core submodule is replaced by a copy of the c4core repository, correct ?

Yes, with some details to it:

  • this PR removes the c4core submodule, and copies c4core code into the rapidyaml tree, to two different dirs:
    • ext/c4core.src: c4core code used in the rapidyaml library
    • ext/c4core.dev: c4core code used used by tests/benchmarks
      Doing this makes the standalone library marginally smaller, and enforces the standalone default of rapidyaml bypassing the submodule. So most users (which don't care about c4core itself) will now see rapidyaml as a plain project without extra dependencies or submodules.
  • this PR retains the existing option to compile against a full instance of the c4core library, thus ignoring the in-source copy under ext/c4core.src and ext/c4core.dev
    • the default is still to compile in standalone mode (ie c4core part of rapidyaml)
    • the pre-existing cmake option RYML_STANDALONE still enables users to build rapidyaml against a previously build c4core library (like in the ubuntu packages where rapidyaml depends on c4core).
  • the amalgamation tool should work just as before

Why is this desirable, and what issue does this resolve ?

I have had repeated feedback that the c4core submodule makes it impossible for some people to use rapidyaml, either because it contains submodules, or because c4core is a dependency boogeyman.

While I disagree with both of these reasonings, it is not for me to dispute their choice.

So I am removing these reasons by making the c4core code really standalone in rapidyaml, while striving to retain all current existing workflows.


Do you plan to remove the c4core repository and only maintain it within rapidyaml

No, not at all. c4core will remain just as it is, in https://github.com/biojppm/c4core.

and if not, how will the fixes from c4core be applied to rapidyaml (beside manually) ?

This PR adds a helper makefile to simplify the workflow, with two main approaches:

  • create an in-source clone of c4core (parallel to the existing copies, under the old location of the submodule ext/c4core) and...
  • importing changes from c4core clone to rapidyaml copies
  • exporting changes from rapidyaml copies to c4core clone
  • check sync status.

The CI will enforce that the in-source c4core copy is in sync with the target commit hash (just like a submodule dirty status).

While this PR is still a WIP, it will eventually contain instructions on how to use the makefile to accomplish these steps.


Please clarify the intent here, I am really worried about this change.

Thanks for looking into this, I truly appreciate.

I am still working in this PR, so this is truly the proper time to start offering feedback. I have made a conscious effort to retain all existing usage scenarios, but if there is something I'm overlooking, I'd like to hear about it.

And FWIW, after getting this PR ready and green, I will let it linger for some weeks before merging it.

@biojppm

biojppm commented Feb 7, 2026

Copy link
Copy Markdown
Owner Author

@marcalff

The makefile helper to sync with c4core is at ext/Makefile, and the README with instructions will be at ext/README.md.

The file enforcing the c4core version is at ext/c4core.mk.

@marcalff

marcalff commented Feb 7, 2026

Copy link
Copy Markdown
Contributor

@biojppm

Thanks for the context and clarifications, it helps.

The reason I care in particular is because I now have a dependency on rapidyaml, as you may have noticed already.

If not, it comes from:

@BwL1289

BwL1289 commented Mar 2, 2026

Copy link
Copy Markdown

Looking forward to this PR. Prefer the standalone version rather than pulling in purpose-built utils.

@biojppm biojppm force-pushed the nosubmodules branch 5 times, most recently from d2374e3 to 0768a11 Compare March 23, 2026 00:00
@biojppm

biojppm commented Mar 26, 2026

Copy link
Copy Markdown
Owner Author

BTW, @musicinmybrain please take note of this (eventually) incoming PR.

@musicinmybrain

Copy link
Copy Markdown
Contributor

BTW, @musicinmybrain please take note of this (eventually) incoming PR.

Thanks! I’ll keep an eye on this, although I probably won’t test packaging it until it lands. I can’t tell at a glance what I will need to change to adapt to this, but I will be surprised if it makes things harder given that we already build rapidyaml against a separately-packaged c4core shared library in Fedora.

@biojppm biojppm force-pushed the nosubmodules branch 2 times, most recently from 3af727d to 2dae956 Compare March 27, 2026 00:09
@biojppm

biojppm commented Apr 7, 2026

Copy link
Copy Markdown
Owner Author

will the removal of the submodule also affect any future updates to c4core, or will there be a plan to keep it in sync?

c4core will continue to be updated just as it has been so far. The two repos will be in sync (enforced by the CI), and there will be tools and instructions to simplify that process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants