fix: respect VirtualSize when building the memory-mapped image (#392) #542

Open: gaoflow wants to merge 2 commits into erocarrera:master from gaoflow:master

gaoflow commented Jun 25, 2026

Summary

Fixes #392.

get_memory_mapped_image() returned raw section data up to SizeOfRawData
bytes for every section, ignoring Misc_VirtualSize. The PE/COFF spec
(and the Windows loader) define two distinct behaviors:

| Condition | Expected mapped content |
| --- | --- |
| VirtualSize < SizeOfRawData | Only VirtualSize bytes are mapped; the remaining disk bytes are file-alignment padding and must not appear in the memory view |
| VirtualSize > SizeOfRawData | SizeOfRawData bytes are mapped, then the section is zero-padded to VirtualSize (BSS region) |

Root cause

The single line responsible:

mapped_data += section.get_data()   # always returns SizeOfRawData bytes

section.get_data() (without ignore_padding=True) returns exactly SizeOfRawData
bytes, so file-alignment padding silently bleeds into the mapped view whenever
VirtualSize < SizeOfRawData, and the BSS tail is truncated whenever
VirtualSize > SizeOfRawData.

Fix

After fetching the raw section data, clip it to VirtualSize (drops disk
padding) and zero-pad it to VirtualSize if shorter (fills BSS):

section_data = section.get_data()
if section.Misc_VirtualSize:
    if len(section_data) > section.Misc_VirtualSize:
        section_data = section_data[: section.Misc_VirtualSize]
    elif len(section_data) < section.Misc_VirtualSize:
        section_data += b"\0" * (section.Misc_VirtualSize - len(section_data))
mapped_data += section_data

Two regression tests covering both directions of the inequality are added to
tests/export_test.py using a synthetic, self-contained PE binary so they
require no external test files.
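
As a self-contained illustration of the clip-and-pad rule described above, the sketch below applies it to plain bytes objects. `fit_to_virtual_size` is a hypothetical helper name, not part of pefile; it mirrors the branch logic from the Fix section and, like the PR's guard, leaves the raw data untouched when VirtualSize is zero.

```python
def fit_to_virtual_size(section_data: bytes, virtual_size: int) -> bytes:
    """Clip or zero-pad raw section bytes to virtual_size (hypothetical helper)."""
    if not virtual_size:
        # A VirtualSize of 0 leaves the raw data as-is, matching the PR's guard.
        return section_data
    if len(section_data) > virtual_size:
        # Raw data is longer than VirtualSize: drop the trailing bytes.
        return section_data[:virtual_size]
    # Raw data is shorter than (or equal to) VirtualSize: zero-fill the tail.
    return section_data + b"\x00" * (virtual_size - len(section_data))


raw = b"\xCC" * 0x200
clipped = fit_to_virtual_size(raw, 0x100)  # VirtualSize < SizeOfRawData
padded = fit_to_virtual_size(raw, 0x300)   # VirtualSize > SizeOfRawData

assert clipped == b"\xCC" * 0x100
assert padded == b"\xCC" * 0x200 + b"\x00" * 0x100
```

Both assertions correspond to the two directions of the inequality that the regression tests are said to cover.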


This pull request was prepared with the assistance of AI, under my direction and review.

j-t-1 (Collaborator) commented Jun 26, 2026

Thanks @gaoflow.

Reading this I was thinking section.get_data() could be updated so it does this, but I think the way you have done it may be simpler and clearer.

Above you mention BSS, which would be the obvious example section where VirtualSize > SizeOfRawData, so although the fix works for any section that satisfies this inequality, mentioning BSS clarifies when this would mainly be used.

j-t-1 (Collaborator) left a comment

Like the approach.

Outdated comment threads: tests/export_test.py (4), pefile.py (2)

j-t-1 (Collaborator) commented Jun 27, 2026

@nightlark the test @gaoflow has written uses a function to create a non-functional PE file. What do you think of making it more generic so that it can be used for other tests? That would be outside of this PR (though this PR could perhaps be changed so that the arguments are keyword-only).

gaoflow (Author) commented Jun 27, 2026

Thanks @j-t-1 — all addressed in e4b5ed2:

  • Inequalities in pefile.py: rewrote the guard so section.Misc_VirtualSize is bound once to virtual_size and the comparisons read len > virtual_size / len < virtual_size, matching the comment.
  • Format string: fixed to <IIHHI — the section-header tail is PointerToRelocations(I) PointerToLinenumbers(I) NumberOfRelocations(H) NumberOfLinenumbers(H) Characteristics(I). The earlier <IHHHI mislaid the NumberOfLinenumbers half-word.
  • raw_ptr naming: renamed to pointer_to_raw_data throughout the builder.
  • Tests moved: the two new regression tests now live in pefile_test.py (Test_memory_mapped_image), alongside test_relocated_memory_mapped_image; the helper is renamed to _create_pe so it can be reused by other tests. export_test.py is back to upstream.
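
The size difference between the two format strings can be checked with the standard `struct` module. The sketch below assumes the usual 40-byte IMAGE_SECTION_HEADER layout (an 8-byte name, four 32-bit size and address fields, then the tail listed above); the variable names and field values are illustrative and are not taken from the PR's `_create_pe` helper.

```python
import struct

# Fields before the tail: Name(8s) VirtualSize(I) VirtualAddress(I)
# SizeOfRawData(I) PointerToRawData(I)
HEAD = "<8sIIII"

correct = struct.calcsize(HEAD + "IIHHI")  # tail as described in the fix
wrong = struct.calcsize(HEAD + "IHHHI")    # earlier tail, two bytes short

print(correct, wrong)  # 40 38

# Packing one illustrative section header with the corrected layout.
header = struct.pack(
    HEAD + "IIHHI",
    b".text\x00\x00\x00",        # Name, NUL-padded to 8 bytes
    0x100, 0x1000, 0x200, 0x200,  # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
    0, 0, 0, 0,                   # relocation and line-number fields
    0x60000020,                   # Characteristics: code | execute | read
)
assert len(header) == 40
```

With the shorter format every field after PointerToRelocations shifts by two bytes, so Characteristics would be read from the wrong offset.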

Both new tests pass locally (python -m unittest pefile_test.Test_memory_mapped_image). Happy to adjust anything else.

This pull request was prepared with the assistance of AI, under my direction and review.

…rrera#392)

get_memory_mapped_image() previously appended the full SizeOfRawData
bytes for each section regardless of Misc_VirtualSize.

According to the PE/COFF spec the Windows loader only maps
Misc_VirtualSize bytes into memory:
- if VirtualSize < SizeOfRawData, the extra disk bytes are file-alignment
  padding and must not appear in the mapped view;
- if VirtualSize > SizeOfRawData, the gap is BSS and must be zero-padded
  in the mapped view.

Clip the section data to VirtualSize and zero-pad when needed. Fixes erocarrera#392.

Regression tests live in pefile_test.py (Test_memory_mapped_image) and
build a minimal PE32 via a reusable _create_pe helper whose section
header is packed with the correct tail layout (<IIHHI rather than the
subtly-wrong <IHHHI).

nightlark (Collaborator) commented Jun 27, 2026

> @nightlark the test @gaoflow has written uses a function to create a non-functional PE file. What do you think of making it more generic so that it can be used for other tests? That would be outside of this PR (though this PR could perhaps be changed so that the arguments are keyword-only).

Yea, this is an interesting approach that could give some tests tailored to checking specific cases without resorting to the regression tests.

> Above you mention BSS, which would be the obvious example section where VirtualSize > SizeOfRawData, so although the fix works for any section that satisfies this inequality, mentioning BSS clarifies when this would mainly be used.

I'm reading up on what bytes should be file-backed vs zero-filled, and there's some evidence that the Windows loader rounds up to the page size, which would require a cap more along the lines of min(SizeOfRawData, roundup(VirtualSize, SectionAlignment)) -- for "correct" binaries the decision might not matter, but there are some corkami binaries that intentionally use the space to hide data.

Still trying to wrap my head around it all, but we want to avoid making a change that will overcorrect and hide bytes that should be visible. (This is an interesting adventure that has led me to the Windows Research Kernel; working on getting a test binary that I can open up and inspect in a debugger on Windows to verify that I understand what is going on).
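
The cap floated above, min(SizeOfRawData, roundup(VirtualSize, SectionAlignment)), can be sketched as follows. This only illustrates the arithmetic under discussion and is not verified loader behaviour; `round_up` and `file_backed_size` are hypothetical names, and the sketch assumes SectionAlignment is non-zero.

```python
def round_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (alignment > 0)."""
    return -(-value // alignment) * alignment


def file_backed_size(size_of_raw_data: int, virtual_size: int,
                     section_alignment: int) -> int:
    """Number of raw bytes that stay visible under the proposed cap."""
    return min(size_of_raw_data, round_up(virtual_size, section_alignment))


# Raw data smaller than one aligned page: every raw byte stays visible,
# whereas clipping to VirtualSize alone would keep only 0x100 bytes.
small = file_backed_size(0x200, 0x100, 0x1000)

# Raw data larger than the rounded-up VirtualSize: capped at 0x1000.
large = file_backed_size(0x2000, 0x100, 0x1000)

assert small == 0x200
assert large == 0x1000
```

The first case is the one where the two approaches differ: bytes between VirtualSize and the end of the raw data would remain in the mapped view.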

> Reading this I was thinking section.get_data() could be updated so it does this, but I think the way you have done it may be simpler and clearer.

We might want to be careful about pre-initializing the data up to VirtualSize; this could result in a large allocation when looking at binaries that intentionally have a bad VirtualSize.
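
One way to picture that concern is a guard that refuses implausibly large zero-fills. This is a sketch under stated assumptions, not something the PR implements: `checked_zero_pad` and its `max_pad` limit (16 MiB here) are hypothetical, and the right policy (raise, clamp, or warn) is left open in the discussion.

```python
def checked_zero_pad(section_data: bytes, virtual_size: int,
                     max_pad: int = 0x1000000) -> bytes:
    """Zero-pad to virtual_size, refusing pads larger than max_pad (hypothetical)."""
    pad = virtual_size - len(section_data)
    if pad <= 0:
        # Nothing to pad; clipping is handled elsewhere.
        return section_data
    if pad > max_pad:
        # A bogus VirtualSize would otherwise force a huge allocation.
        raise ValueError(f"refusing to zero-pad {pad:#x} bytes")
    return section_data + b"\x00" * pad


assert checked_zero_pad(b"\xCC" * 4, 8) == b"\xCC" * 4 + b"\x00" * 4

try:
    checked_zero_pad(b"\xCC" * 4, 0xFFFFFFFF)
except ValueError:
    rejected = True
else:
    rejected = False
assert rejected
```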

j-t-1 (Collaborator) left a comment

Small non-functional suggestions.

Also move _create_pe to the end of pefile_test.py.

I think we should move test_virtual_size_less_than_raw_size and test_virtual_size_greater_than_raw_size between test_empty_file_exception and test_relocated_memory_mapped_image, so we have just one class (removing Test_memory_mapped_image). I am unsure whether having just one class is best practice, but it keeps with the existing layout.

Comment thread tests/pefile_test.py Outdated
Comment thread pefile.py Outdated
Comment thread tests/pefile_test.py Outdated
# Section content up to VirtualSize must be present.
self.assertEqual(
    image[va : va + vsize],
    b"\xcc" * vsize,

Suggested change (Collaborator):
- b"\xcc" * vsize,
+ b"\xCC" * vsize,

Outdated comment threads: tests/pefile_test.py (2), pefile.py (2)

j-t-1 (Collaborator) commented Jun 27, 2026

> Yea, this is an interesting approach that could give some tests tailored to checking specific cases without resorting to the regression tests.

Agree. Thanks @gaoflow.

> I'm reading up on what bytes should be file-backed vs zero-filled, and there's some evidence that the Windows loader rounds up to the page size, which would require a cap more along the lines of min(SizeOfRawData, roundup(VirtualSize, SectionAlignment)) -- for "correct" binaries the decision might not matter, but there are some corkami binaries that intentionally use the space to hide data.

I did not think of this; you are likely correct that the memory-mapped section is a multiple of the page size.

Reorder the section-size comparison to read virtual_size < len(...) /
elif virtual_size > len(...) so each branch matches the VirtualSize-first
wording of its comment (behaviour unchanged: the outer guard already rules
out equality). Use an explicit \x00 and an uppercase \xCC sentinel per the
review suggestions.

gaoflow (Author) commented Jun 29, 2026

Thanks for the suggestions — applied in 243ab13:

  • Reordered the mapped-image branch to read if virtual_size < len(section_data) / elif virtual_size > len(section_data), so each condition matches the VirtualSize-first wording of its comment (behaviour is unchanged — the outer len != virtual_size guard already rules out equality, so the elif covers the same case the else did).
  • Switched the zero-pad to an explicit b"\x00" and the test sentinel to uppercase b"\xCC".
  • Tweaked the BSS test docstring to "SizeOfRawData < VirtualSize".

The earlier round (test moved into pefile_test.py as _create_pe, pointer_to_raw_data rename, <IIHHI format) is already in the branch. test_virtual_size_less_than_raw_size and test_virtual_size_greater_than_raw_size both pass locally.

j-t-1 (Collaborator) commented Jul 1, 2026

Thanks @gaoflow.

You can now change this line:
    if virtual_size and len(section_data) != virtual_size:
to
    if virtual_size:
I do not know whether the guard can be removed completely, but it is safer to keep it for now.
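
The equivalence of the two guards can be checked exhaustively on small inputs. The functions below are reconstructed from the branch wording quoted in this thread rather than copied from the PR diff, so the names `with_outer_guard` and `with_simple_guard` are hypothetical.

```python
def with_outer_guard(section_data: bytes, virtual_size: int) -> bytes:
    # Current form: the outer guard also rules out equality.
    if virtual_size and len(section_data) != virtual_size:
        if virtual_size < len(section_data):
            section_data = section_data[:virtual_size]
        elif virtual_size > len(section_data):
            section_data += b"\x00" * (virtual_size - len(section_data))
    return section_data


def with_simple_guard(section_data: bytes, virtual_size: int) -> bytes:
    # Suggested form: the strict < and > branches already skip equality.
    if virtual_size:
        if virtual_size < len(section_data):
            section_data = section_data[:virtual_size]
        elif virtual_size > len(section_data):
            section_data += b"\x00" * (virtual_size - len(section_data))
    return section_data


# Compare both forms for every small combination of lengths.
for raw_len in range(8):
    for virtual_size in range(8):
        data = b"\xCC" * raw_len
        assert with_outer_guard(data, virtual_size) == with_simple_guard(data, virtual_size)
```

Because both inner branches use strict comparisons, the equal-length case falls through unchanged in either form.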

Amending _create_pe:

section_data = b"\xB8\x00\x01\x00\x00"  #  mov eax, 0x100
section_data += b"\xC3"  # ret
# Append section data with NOPs so leakage is obvious
section_data += b"\x90" * (size_of_raw_data - len(section_data))

Saving the generated program and running it fails.

I will try to get it working, then capture the memory-mapped image when it runs (@nightlark, is there an easy way?) so we can check the PR's calculation.

Development

Successfully merging this pull request may close these issues.

get_memory_mapped_image leaves trash data in alignment regions
