Skip to content

Decompiler: Add builtin_memset optimizations#9232

Open
EmirX3D wants to merge 2 commits into
NationalSecurityAgency:masterfrom
EmirX3D:decompiler-stack-fill-builtins
Open

Decompiler: Add builtin_memset optimizations#9232
EmirX3D wants to merge 2 commits into
NationalSecurityAgency:masterfrom
EmirX3D:decompiler-stack-fill-builtins

Conversation

@EmirX3D

@EmirX3D EmirX3D commented May 29, 2026

Copy link
Copy Markdown

Add decompiler built-ins for builtin_bzero and builtin_memset, which extends the constant sequence/string copy optimization pass, detecting repeated byte writes and collapse zero-filled or constant-filled memory regions (including patterns like zeroed buffers and non-printable fill bytes such as 0xAA) into the appropriate calls for improved output.

Closes #9230.

@LukeSerne

Copy link
Copy Markdown
Contributor

Interesting PR!

I tried this on a debug xml generated from the sample provided in issue #9230: issue_9230_inlined_memcpy.xml
I applied this PR, built and ran decomp_dbg, loaded the xml and decompiled main. However, the output only included the builtin_memset, not the builtin_bzero:

undefined8 main(void)                                     
                                                          
{                                                         
  char local_68 [12];
  undefined4 uStack_5c;
  undefined8 local_58;
  undefined8 uStack_50;
  undefined8 local_48;
  undefined8 uStack_40;
  undefined8 local_38;
  undefined8 uStack_30;
  undefined8 local_28;
  undefined8 uStack_20;
  undefined8 local_18;
  undefined8 uStack_10;
  undefined4 local_8;
  
  local_18 = 0;
  uStack_10 = 0;
  local_28 = 0;
  uStack_20 = 0;
  local_38 = 0;
  uStack_30 = 0;
  local_48 = 0;
  uStack_40 = 0;
  local_58 = 0;
  uStack_50 = 0;
  local_68[0] = '\0';
  local_68[1] = '\0';
  local_68[2] = '\0';
  local_68[3] = '\0';
  local_68[4] = '\0';
  local_68[5] = '\0';
  local_68[6] = '\0';
  local_68[7] = '\0';
  local_68[8] = '\0';
  local_68[9] = '\0';
  local_68[10] = '\0';
  local_68[0xb] = '\0';
  uStack_5c = 0;
  local_8 = 0;
  printf(local_68);
  local_18 = 0xaaaaaaaaaaaaaaaa;
  uStack_10 = 0xaaaaaaaaaaaaaaaa;
  local_28 = 0xaaaaaaaaaaaaaaaa;
  uStack_20 = 0xaaaaaaaaaaaaaaaa;
  local_38 = 0xaaaaaaaaaaaaaaaa;
  uStack_30 = 0xaaaaaaaaaaaaaaaa;
  local_48 = 0xaaaaaaaaaaaaaaaa;
  uStack_40 = 0xaaaaaaaaaaaaaaaa;
  local_58 = 0xaaaaaaaaaaaaaaaa;
  uStack_50 = 0xaaaaaaaaaaaaaaaa;
  builtin_memset(local_68,0xaa,0xc);
  uStack_5c = 0xaaaaaaaa;
  local_8 = 0xaaaaaaaa;
  printf(local_68);
  builtin_strncpy(local_68,"Test123Test1",0xc);
  uStack_5c = CONCAT13(uStack_5c._3_1_,0x3332);
  printf(local_68);
  return 0;
}

Even if I change the datatype of the local_68 array to be the correct length (100, issue_9230_inlined_memcpy_correct_array.xml ), the builtin_memset does increase its size correctly and the assignments of 0xAAAAAAAA disappear, but there still is no builtin_bzero, and all the byte assignments are listed individually.

Regardless of this issue, I think it's better to only introduce builtin_memset, and just let builtin_bzero(x, len) be builtin_memset(x, 0, len). This keeps the number of userops low, and IMO the behaviour of memset(x, 0, len) is more obvious than the behaviour of builtin_bzero(x, len).

@EmirX3D

EmirX3D commented May 30, 2026

Copy link
Copy Markdown
Author

@LukeSerne, Thanks for testing this and for the detailed repro. I agree with your point. I’ve made a commit to drop builtin_bzero and emit builtin_memset(x, 0, len) for zero fills instead. That keeps the userop surface smaller and makes the output clearer. I also removed the builtin_bzero registration path, so both zero-filled and constant-filled regions now go through builtin_memset. If we later find a strong need for a separate builtin_bzero for some architecture-specific reason, we can revisit it, but the simpler model seems preferable here.

@LukeSerne

LukeSerne commented May 30, 2026

Copy link
Copy Markdown
Contributor

As for why the zero-case was not detected, I think I found the root cause:

ArraySequence::formByteArray does not form a byte array for the zero case, since that function (implicitly) computes the string length (including a single null terminator) and rejects the byte array if it's shorter than MINIMUM_SEQUENCE_LENGTH (4). Since the zero case only contains zero bytes, it detects a string length of 0, which is too short.

With this issue fixed, the example does decompile correctly:

undefined8 main(void)

{
  undefined4 uVar1;
  char local_68 [100];
  
  builtin_memset(local_68,0,100);
  printf(local_68);
  builtin_memset(local_68,0xaa,100);
  printf(local_68);
  uVar1 = local_68._12_4_;
  builtin_strncpy(local_68,"Test123Test123",0xf);
  local_68[0xf] = SUB41(uVar1,3);
  printf(local_68);
  return 0;
}

The change I made to ArraySequence::formByteArray is somewhat nontrivial. I ended up modifying the loop to also allow a buffer being filled with zero-bytes. I'm not sure if this is the best way to solve this problem, and it might cause side-effects elsewhere.

@@ -132,11 +132,18 @@ int4 ArraySequence::formByteArray(int4 sz,int4 slot,uint8 rootOff,bool bigEndian
   int4 bigElSize = charType->getAlignSize();
   int4 maxEl = used.size() / bigElSize;
   int4 count;
+  bool is_all_zero = false;
   for(count=0;count<maxEl;count += 1) {
     uint1 val = used[ count * bigElSize ];
     if (val != 1) {            // Count number of characters not including null terminator
-      if (val == 2)
+      if (val == 2) {
+	if (is_all_zero) continue;
+	if (count == 0) {		// First byte is a null byte -> check for all-zeroes
+	  is_all_zero = true;
+	  continue;
+	}
	count += 1;             // Allow a single null terminator
+      }
       break;
     }
   }

@LukeSerne

Copy link
Copy Markdown
Contributor

Oh, I noticed this PR also contains commits 5700770 and ddc41db, which seem unrelated. Also, since you now removed builtin_bzero, the PR title should be adjusted to reflect this.

@EmirX3D

EmirX3D commented May 31, 2026

Copy link
Copy Markdown
Author

@LukeSerne Thanks for reminding me. I'll remove the commits and update the PR title (I made a mistake when switching branches).

@EmirX3D EmirX3D changed the title Decompiler: Add builtin_bzero and builtin_memset optimizations Decompiler: Add builtin_memset optimizations May 31, 2026
@EmirX3D EmirX3D force-pushed the decompiler-stack-fill-builtins branch from 61f9180 to 94da96d Compare May 31, 2026 10:28
@LukeSerne

Copy link
Copy Markdown
Contributor

Yeah, no problem 🙂. You should also change ArraySequence::formByteArray to ensure builtin_memset is actually used for zero fills as explained in one of my previous comments.

Also, it might be better to call ::buildFill from ::transform instead of from ::buildStringCopy. That way, the documentation of buildStringCopy continues to be correct, and since memset is not really a "string copy", it will also make the function name make more sense. Maybe renaming buildFill to buildMemset would be even better. Either way, the comments in this paragraph are more code style related and don't really affect how the feature works, so feel free to ignore them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature: Decompiler Status: Triage Information is being gathered

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expand builtin function recovery to memset / bzero

5 participants