This document outlines best practices for debugging complex cross-platform build failures and other intricate issues in SkiaSharp. These guidelines help avoid common pitfalls that lead to wasted time and compounding errors.
Problem: Jumping to fixes when errors appear rather than understanding the system first.
Solution: Before any fix, answer these questions:
- What is the expected behavior?
- What is the actual behavior?
- When did this start? What changed?
- Why does it happen on platform X but not platform Y?
Before any investigation, determine:
- What's the known-good state/commit?
- What's currently working vs failing?
- What changed since the last known-good state?
Maintain a running log during debugging sessions:
| Time/Commit | Change Made | Result | Notes |
|---|---|---|---|
| baseline | none | macOS ✓, Windows ✗ | |
| commit abc | added define X | macOS ✗, Windows ✗ | X broke macOS! |
Never lose track of cause and effect.
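One way to keep the log current is to append a row from the command line right after each change. This is a minimal sketch: the helper name `log_change` and the default file name `debug-log.md` are assumptions made for illustration and are not part of SkiaSharp.

```shell
# Hypothetical helper: append one row to a Markdown tracking table.
# LOG_FILE is an assumed variable; override it to change the log location.
LOG_FILE="${LOG_FILE:-debug-log.md}"

log_change() {
  # $1 = time or commit, $2 = change made, $3 = result, $4 = notes (optional)
  if [ ! -f "$LOG_FILE" ]; then
    # Write the table header once, when the log is first created.
    printf '| Time/Commit | Change Made | Result | Notes |\n|---|---|---|---|\n' > "$LOG_FILE"
  fi
  printf '| %s | %s | %s | %s |\n' "$1" "$2" "$3" "${4:-}" >> "$LOG_FILE"
}

# Example usage:
#   log_change baseline none "macOS ok, Windows fail"
#   log_change "commit abc" "added define X" "macOS fail, Windows fail" "X broke macOS"
```

Logging before moving on to the next change keeps each row tied to exactly one change.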
When platform X works and platform Y fails, the difference between X and Y IS the answer. Focus investigation there.
Example: If ARM64 builds pass but x86/x64 fail, immediately ask "what's x86-specific?" (e.g., AVX2, SSE instructions, different compiler flags).
For cross-platform issues, understand exactly which code paths are active on each platform:
- Draw the decision tree for preprocessor directives
- Evaluate each branch for EACH platform/configuration
- Don't proceed until you can explain what code path each platform takes
Example of tracing preprocessor logic:
// cpu.h - trace this for each platform:
#if defined(_MSC_VER) && _MSC_VER >= 1700 // Windows MSVC/clang-cl: YES
#define WEBP_MSC_AVX2 // macOS clang: NO
#endif
#if defined(__AVX2__) || defined(WEBP_MSC_AVX2) // Evaluate for each platform
#define WEBP_USE_AVX2
#endif

Different compilers define different macros:
- clang (macOS/Linux): Does NOT define `_MSC_VER` or `__AVX2__` by default
- clang-cl (Windows): Defines `_MSC_VER` for MSVC compatibility
- MSVC: Defines `_MSC_VER`, may define `__AVX2__` with `/arch:AVX2`
Don't assume - verify with a minimal test or documentation if uncertain.
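A quick way to verify a macro claim is to ask the compiler to dump its predefined macros. The sketch below assumes a gcc-style driver that accepts `-dM -E -x c` (gcc and clang do). The function name `check_macros` is hypothetical, and clang-cl and MSVC use different option spellings that this sketch does not cover.

```shell
# Hypothetical helper: report whether selected macros are predefined
# by the compiler command given as arguments.
check_macros() {
  # -dM -E dumps all predefined macros instead of preprocessed source.
  macro_dump="$("$@" -dM -E -x c /dev/null 2>/dev/null)" || {
    echo "compiler invocation failed: $*" >&2
    return 1
  }
  for macro in _MSC_VER __AVX2__; do
    if printf '%s\n' "$macro_dump" | grep -q "^#define $macro "; then
      echo "$macro: defined"
    else
      echo "$macro: not defined"
    fi
  done
}

# Example usage (flags after the compiler name are passed through):
#   check_macros clang
#   check_macros clang -mavx2
#   check_macros clang --target=x86_64-pc-windows-msvc   # assumed approximation of clang-cl
```

Comparing the output for each compiler and flag set in use gives direct evidence for which branch of an `#if` is taken.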
#define FEATURE 0
#if defined(FEATURE) // TRUE - macro EXISTS
// This code IS compiled!
#endif
#if FEATURE // FALSE - value is 0
// This code is NOT compiled
#endif

Setting `FEATURE=0` does NOT disable code guarded by `defined(FEATURE)`.
If you need to apply a fix to "all platforms just to be safe," you probably don't understand the problem yet. Broad fixes:
- Obscure the root cause
- May have unintended side effects
- Make future debugging harder
Prefer surgical fixes that target exactly the affected platforms.
Running parallel builds that write to the same output file causes race conditions. When building multiple architectures:
- Build sequentially if they share output paths, OR
- Ensure output paths are distinct
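The second option can be sketched as follows. `build_arch` is a hypothetical stand-in for the real per-architecture build step, and the file name `libexample.so` is invented for illustration. The point is only that each parallel job writes under its own directory.

```shell
# Hypothetical stand-in for a real build step; writes only under its own directory.
build_arch() {
  arch_name="$1"
  arch_out="$2/$1"            # distinct output path per architecture
  mkdir -p "$arch_out"
  echo "built for $arch_name" > "$arch_out/libexample.so"
}

# Run one background job per architecture, then wait for all of them.
build_all() {
  out_root="$1"
  shift
  pids=""
  for arch in "$@"; do
    build_arch "$arch" "$out_root" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1   # remember any failing job
  done
  return "$failed"
}

# Example usage:
#   build_all output/native/linux x64 arm64
```

If the real build cannot be pointed at separate directories, running the jobs one after another is the safer choice.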
Never say an error is "safe to ignore" without explaining exactly WHY. If you can't explain why it's safe, it's not safe.
- What is the known-good baseline?
- What changed since baseline?
- Can we reproduce the issue locally?
- Am I tracking changes and their effects?
- Have I traced conditional code paths for ALL affected platforms?
- Am I using success/failure patterns as diagnostic signals?
- Can I explain WHY the fix will work?
- Have I tested the hypothesis minimally first?
- Is this a surgical fix or a broad defensive fix?
- Have I verified any claims about compiler/platform behavior?
- Did I test one change at a time?
- Did the fix work on ALL affected platforms?
- If things got worse, did I immediately revert and re-analyze?
State hypotheses explicitly and test them:
- Hypothesis: "I believe X happens because Y"
- Test: "We can verify by doing Z"
- Result: Do Z, observe result
- Conclusion: Confirm or revise hypothesis
Example:
- Hypothesis: "Windows fails because clang-cl defines `_MSC_VER`, triggering AVX2 code paths"
- Test: "Check if `_MSC_VER` is defined by examining the preprocessor output"
- Result: Confirmed - clang-cl defines `_MSC_VER=1900`
- Conclusion: Hypothesis confirmed, fix should target the AVX2 code path on Windows
If a fix makes things worse:
- Stop immediately
- Revert the change
- Re-analyze - your mental model is wrong
- Don't pile more fixes on top of a broken fix
If you are stuck:
- Step back and re-read all the evidence
- Create a fresh tracking table
- Ask clarifying questions - maybe context is missing
- Request artifacts (build logs, binaries) to verify assumptions
An `undefined symbol: xxx` error means that none of the libraries actually linked into the binary provides that symbol.
# Compare linked libraries between platforms
docker run --rm -v "$(pwd)":/work debian:bookworm-slim bash -c \
"apt-get update -qq && apt-get install -y -qq binutils >/dev/null && \
echo '=== x64 ===' && readelf -d /work/output/native/linux/x64/libSkiaSharp.so | grep NEEDED && \
echo && echo '=== ARM64 ===' && readelf -d /work/output/native/linux/arm64/libSkiaSharp.so | grep NEEDED"

If a library appears in one but not the other, that's your root cause.
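Comparing the two lists by eye gets error-prone when there are many entries. The sketch below extracts library names from `readelf -d` output and prints the ones present for one platform but absent for the other. The function names and the `.txt` file names are hypothetical.

```shell
# Extract the library names from `readelf -d` output read on stdin.
# NEEDED entries look like: 0x... (NEEDED)  Shared library: [libfoo.so.1]
needed_libs() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | sort -u
}

# Print entries that appear in the first sorted list but not in the second.
only_in_first() {
  comm -23 "$1" "$2"
}

# Example usage:
#   readelf -d output/native/linux/x64/libSkiaSharp.so   | needed_libs > x64.txt
#   readelf -d output/native/linux/arm64/libSkiaSharp.so | needed_libs > arm64.txt
#   only_in_first x64.txt arm64.txt    # libraries ARM64 is missing
```

Running `only_in_first` in both directions covers the case where the failing platform links something extra instead of something less.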
The ninja file may have -lfoo but the linker silently skips it if it can't find the library:
# Check ninja file for expected libraries
grep "libs = " externals/skia/out/linux/arm64/obj/SkiaSharp.ninja
# Check if library exists in cross-compile sysroot
docker run --rm skiasharp-linux-gnu-cross-arm64 bash -c \
"ls -la /usr/aarch64-linux-gnu/lib/libfontconfig*"

Common issue: The `-dev` package provides a broken symlink (`libfoo.so -> libfoo.so.1.2.3`), but the actual `.so.1.2.3` file is in the runtime package (`libfoo1`), not the dev package.
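Broken symlinks are easy to miss in a plain `ls` listing. A minimal sketch for listing them in one directory follows. The function name `find_broken_links` is hypothetical, and it only inspects the top level of the directory it is given.

```shell
# List symlinks in a directory whose targets do not exist.
find_broken_links() {
  for entry in "$1"/*; do
    # -L: the entry is a symlink; ! -e: following it leads nowhere.
    if [ -L "$entry" ] && [ ! -e "$entry" ]; then
      echo "$entry -> $(readlink "$entry")"
    fi
  done
}

# Example usage inside the cross-compile container:
#   find_broken_links /usr/aarch64-linux-gnu/lib
```

Any line it prints names a library the linker cannot actually open, even though the file name appears to exist.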
| Root Cause | Fix Location |
|---|---|
| Library missing from linker flags | native/linux/build.cake or externals/skia/third_party/BUILD.gn |
| Library missing from cross-compile sysroot | scripts/Docker/debian/clang-cross/*/Dockerfile |
| Indirect dependency (A→B→C missing) | Fix B's linkage or add C explicitly |
Symptom: undefined symbol: uuid_generate_random on ARM64 only
Investigation:
- x64 had `libfontconfig.so.1` in DT_NEEDED
- ARM64 was missing `libfontconfig.so.1` in DT_NEEDED
- But ninja file had `-lfontconfig` for BOTH builds
Root cause: Cross-compile Docker only had libfontconfig1-dev which provides a broken symlink.
The actual shared library is in libfontconfig1 (runtime package).
Fix: Download both -dev (headers) AND runtime (actual .so) packages in the Dockerfile.
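A check like the one in this case study can be scripted so that it runs before the build. The sketch below takes the `libs = ` line from a ninja file and reports every `-l` flag that does not resolve to a real file in a given library directory. The name `check_libs` is hypothetical. It assumes shared libraries named `lib<name>.so` in a single directory and ignores static archives and additional search paths.

```shell
# Report which -l flags resolve to a real file in the given directory.
# $1 = the "libs = ..." line from a ninja file, $2 = library directory
check_libs() {
  rc=0
  for flag in $1; do
    case "$flag" in
      -l*)
        name="${flag#-l}"
        # -e follows symlinks, so a broken symlink counts as missing.
        if [ -e "$2/lib$name.so" ]; then
          echo "ok: $name"
        else
          echo "MISSING: $name"
          rc=1
        fi
        ;;
    esac
  done
  return "$rc"
}

# Example usage:
#   check_libs "$(grep 'libs = ' externals/skia/out/linux/arm64/obj/SkiaSharp.ninja)" \
#     /usr/aarch64-linux-gnu/lib
```

A non-zero exit status makes it usable as a guard in a build script.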
When testing different SkiaSharp NuGet versions on WASM, be aware that native `.wasm` binaries are cached in `bin/`, `obj/`, and `_framework`, and are version-specific (tied to the Emscripten version). Changing the NuGet version reference (e.g., via `sed`) without cleaning these directories leaves stale native files behind, producing false positive or false negative results. Always use a fresh project directory per version, or clean `bin/`, `obj/`, and `_framework` before rebuilding.
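The cleanup step can be sketched as a small helper. The function name `clean_build_outputs` is hypothetical, and it assumes that every directory named `bin`, `obj`, or `_framework` under the project root is build output that is safe to delete. Check that assumption before running it on a real project.

```shell
# Remove cached build output directories below a project root.
clean_build_outputs() {
  # -prune stops find from descending into a directory it is about to delete.
  find "$1" -type d \( -name bin -o -name obj -o -name _framework \) \
    -prune -exec rm -rf {} +
}

# Example usage:
#   clean_build_outputs ./MyWasmApp
```

Using a fresh project directory per version avoids the question of what is safe to delete.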
| Do | Don't |
|---|---|
| Establish baseline first | Jump to fixing immediately |
| Track changes and effects | Lose track of what changed when |
| Trace conditional code completely | Skim for keywords |
| Use platform differences as clues | Ignore success patterns |
| Make one change at a time | Batch multiple changes |
| Verify claims with evidence | State assumptions as facts |
| Explain why errors are safe to ignore | Dismiss errors without explanation |
| Revert when fixes make things worse | Pile more fixes on top |