fix: respect VirtualSize when building the memory-mapped image (#392)#542
fix: respect VirtualSize when building the memory-mapped image (#392)#542gaoflow wants to merge 2 commits into
Conversation
|
Thanks @gaoflow. Reading this I was thinking Above you mention BSS, which would be the obvious example section where |
|
@nightlark the test @gaoflow has done uses a function to create a non-functional PE file. What do you think of making it more generic and thus able to be used for other tests? Outside of this PR (but maybe just changing this PR so it the arguments are keyword only). |
|
Thanks @j-t-1 — all addressed in
Both new tests pass locally ( This pull request was prepared with the assistance of AI, under my direction and review. |
…rrera#392) get_memory_mapped_image() previously appended the full SizeOfRawData bytes for each section regardless of Misc_VirtualSize. According to the PE/COFF spec the Windows loader only maps Misc_VirtualSize bytes into memory: - if VirtualSize < SizeOfRawData, the extra disk bytes are file-alignment padding and must not appear in the mapped view; - if VirtualSize > SizeOfRawData, the gap is BSS and must be zero-padded in the mapped view. Clip the section data to VirtualSize and zero-pad when needed. Fixes erocarrera#392. Regression tests live in pefile_test.py (Test_memory_mapped_image) and build a minimal PE32 via a reusable _create_pe helper whose section header is packed with the correct tail layout (<IIHHI rather than the subtly-wrong <IHHHI).
Yea, this is an interesting approach that could give some tests tailored to checking specific cases without resorting to the regression tests.
I'm reading up on what bytes should be file-backed vs zero-filled, and there's some evidence that the Windows loader rounds up to the page size, which would require a cap more along the lines of Still trying to wrap my head around it all, but we want to avoid making a change that will overcorrect and hide bytes that should be visible. (This is an interesting adventure that has led me to the Windows Research Kernel; working on getting a test binary that I can open up and inspect in a debugger on Windows to verify that I understand what is going on).
We might want to be careful of pre-initializing the data up to VirtualSize, this could result in a large data allocation when looking at binaries that intentionally have a bad VirtualSize. |
j-t-1
left a comment
There was a problem hiding this comment.
Small non-functional suggestions.
Also move _create_pe to the end of pefile_test.py.
I think move test_virtual_size_less_than_raw_size and test_virtual_size_greater_than_raw_size between test_empty_file_exception and test_relocated_memory_mapped_image, so we just have one class (removing Test_memory_mapped_image). Unsure if having just one class is best practice, but keeps with existing way.
| # Section content up to VirtualSize must be present. | ||
| self.assertEqual( | ||
| image[va : va + vsize], | ||
| b"\xcc" * vsize, |
There was a problem hiding this comment.
| b"\xcc" * vsize, | |
| b"\xCC" * vsize, |
Agree. Thanks @gaoflow.
I did not think of this, you are likely correct that the memory mapped section is a multiple of page size. |
Reorder the section-size comparison to read virtual_size < len(...) / elif virtual_size > len(...) so each branch matches the VirtualSize-first wording of its comment (behaviour unchanged: the outer guard already rules out equality). Use an explicit \x00 and an uppercase \xCC sentinel per the review suggestions.
|
Thanks for the suggestions — applied in 243ab13:
The earlier round (test moved into |
|
Thanks @gaoflow. Can now change this line: Amending Saving the program created and running it fails. I will try and get it working. |
Summary
Fixes #392.
get_memory_mapped_image()returned raw section data up toSizeOfRawDatabytes for every section, ignoring
Misc_VirtualSize. The PE/COFF spec(and the Windows loader) define two distinct behaviors:
VirtualSize < SizeOfRawDataVirtualSizebytes are mapped; the remaining disk bytes are file-alignment padding and must not appear in the memory viewVirtualSize > SizeOfRawDataSizeOfRawDatabytes are mapped, then the section is zero-padded toVirtualSize(BSS region)Root cause
The single line responsible:
section.get_data()(withoutignore_padding=True) returns exactlySizeOfRawDatabytes, so file-alignment padding silently bleeds into the mapped view whenever
VirtualSize < SizeOfRawData, and the BSS tail is truncated wheneverVirtualSize > SizeOfRawData.Fix
After fetching the raw section data, clip it to
VirtualSize(drops diskpadding) and zero-pad it to
VirtualSizeif shorter (fills BSS):Two regression tests covering both directions of the inequality are added to
tests/export_test.pyusing a synthetic, self-contained PE binary so theyrequire no external test files.
This pull request was prepared with the assistance of AI, under my direction and review.