Fix CI flakiness: async race conditions, disposal crash, and stdout truncation #9939
marcpopMSFT merged 1 commit into main
b420984 to b64ca55
25-pass milestone reached
Final validation results:
The branch has been squashed to a single clean commit. All 4 files changed address the 3 root causes identified through analysis of 32+ failed main-branch builds.
Build IDs (sample of passing builds)
Builds 1324187-1326432 on definition 24 (dnceng-public/public).
27fc5d0 to e28a87b
🎯 Milestone: 20 consecutive CI passes
20/25 consecutive passes achieved with 0 failures (targeting 25). Branch squashed and force-pushed at this milestone. Validation continues.
Build IDs (all succeeded)
1327510, 1327563, 1327613, 1327614, 1327625, 1327629, 1327635, 1327658, 1327660, 1327668, 1327682, 1327697, 1328330, 1328376, 1328412, 1328477, 1328626, 1328758, 1328841, 1328908
Baseline on main: ~63% failure rate → Current: 0% failure rate across 20 builds.
c8ea831 to 26933f3
🏆 Goal Achieved: 25 consecutive CI passes!
25/25 consecutive passes with 0 failures. Branch squashed and force-pushed.
Summary
All 25 Build IDs (all succeeded)
1327510, 1327563, 1327613, 1327614, 1327625, 1327629, 1327635, 1327658, 1327660, 1327668, 1327682, 1327697, 1328330, 1328376, 1328412, 1328477, 1328626, 1328758, 1328841, 1328908, 1328963, 1328994, 1329063, 1329107, 1329184
26933f3 to 8901e02
…artifact upload collision
Root causes fixed (from ~63% baseline failure rate to 25/25 consecutive passes):
1. Parallel.For async void anti-pattern in TemplatePackageManager
- Parallel.For(0, N, async (i) => ...) silently converts the lambda to async void
- Parallel.For only waits for the synchronous portion; scanResults[] left with nulls
- Fix: await Task.WhenAll(Enumerable.Range(...).Select(async ...))
2. GlobalSettings disposal race condition
- FileSystemWatcher callbacks fire on threadpool threads that race with Dispose()
- Original: _watcher?.Dispose() then _disposed = true (wrong order)
- Fix: set _disposed = true first, add pre-lock and post-lock disposal guards
- Stress testing confirms post-lock guard fires in ~99% of disposal races
- Added disposal guards in GlobalSettingsTemplatePackageProvider and
TemplatePackageManager event handlers for cascading disposal scenarios
3. Console output truncation
- SimpleConsoleLogger background queue not fully drained before exit
- Fix: Console.Out.Flush()/Console.Error.Flush() after logger disposal
4. Artifact upload collision (all platforms)
- Both Debug and Release matrix jobs upload to same PackageArtifacts and
BlobArtifacts containers simultaneously via Arcade SDK publish targets
- Concurrent blob chunk uploads corrupt each other
- Fix: /p:Publish=false for Debug builds on Windows, macOS, and Linux
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
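Root cause 1 in the commit message above can be illustrated with a minimal sketch. It is not the actual TemplatePackageManager code: ScanDemo, ScanAsync, and ScanAllAsync are hypothetical names, and the Task.Delay call merely stands in for real asynchronous scanning work. The point is that Task.WhenAll returns a task that completes only after every per-item task has completed, whereas Parallel.For with an async lambda returns as soon as each lambda reaches its first incomplete await.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ScanDemo
{
    // Hypothetical stand-in for an asynchronous package scan.
    public static async Task<string> ScanAsync(int index)
    {
        await Task.Delay(10); // simulated I/O
        return $"result-{index}";
    }

    // Broken shape, for comparison only (left as a comment):
    //   Parallel.For(0, count, async i => { results[i] = await ScanAsync(i); });
    // The lambda binds to Action<int>, so it compiles as async void and
    // Parallel.For returns before the awaited work has filled results[i].

    // Fixed shape: one task per index, all awaited before returning.
    public static async Task<string[]> ScanAllAsync(int count)
    {
        return await Task.WhenAll(Enumerable.Range(0, count).Select(i => ScanAsync(i)));
    }

    public static async Task Main()
    {
        string[] results = await ScanAllAsync(5);
        Console.WriteLine(string.Join(", ", results));
    }
}
```

Task.WhenAll returns the results in the same order as the input tasks, so an index-based result array is not needed in this shape.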
8901e02 to 20d03d6
Fix CI Flakiness: 4 Root Causes → 25/25 Consecutive Passes
Baseline: ~63% failure rate (20/32 recent main branch builds failed)
Result: 25 consecutive passing CI runs with 0 failures
Root Causes Fixed
1. Parallel.For async void anti-pattern (TemplatePackageManager.cs)
Parallel.For(0, N, async (i) => ...) silently converts the lambda to async void. Parallel.For only waits for the synchronous portion, leaving scanResults[] with nulls and causing NullReferenceException.
Fix: await Task.WhenAll(Enumerable.Range(...).Select(async ...))
2. GlobalSettings disposal race condition (GlobalSettings.cs + 2 others)
FileSystemWatcher callbacks fire on threadpool threads that race with Dispose(). The original code set _disposed = true after _watcher?.Dispose(), creating a window for callbacks to run on disposed state.
Fix:
- Corrected disposal order (_disposed = true first)
- Pre-lock and post-lock disposal guards in FileChanged
- Disposal guards in GlobalSettingsTemplatePackageProvider.OnGlobalSettingsChanged and the TemplatePackageManager event handler
Disposal Guard Validation: Stress testing (100 iterations of dispose-during-callback) confirms:
- Post-lock guard: fires in ~99% of disposal races ([…] _disposed set)
- ObjectDisposedException catch: 0 hits (defense-in-depth for future subscribers)
3. Console output truncation (ExecutableCommand.cs)
SimpleConsoleLogger's background queue is not fully drained before exit.
Fix: Console.Out.Flush()/Console.Error.Flush() after logger disposal
4. Artifact upload collision, all platforms (azure-pipelines-pr.yml)
Both Debug and Release matrix jobs upload to the same PackageArtifacts and BlobArtifacts containers simultaneously via Arcade SDK publish targets. Concurrent blob chunk uploads corrupt each other, causing "Blob is incomplete" errors.
Fix: /p:Publish=false for Debug builds on Windows, macOS, and Linux
Files Changed
- azure-pipelines-pr.yml: artifact collision fix
- src/.../Settings/TemplatePackageManager.cs: Parallel.For + disposal guard
- src/.../BuiltInManagedProvider/GlobalSettings.cs: disposal ordering + guards
- src/.../BuiltInManagedProvider/GlobalSettingsTemplatePackageProvider.cs: disposal guard
- test/.../GlobalSettingsTests.cs: test handler comment
- tools/.../Commands/ExecutableCommand.cs: console flush
CI Validation
25 consecutive passing builds on the feature/ci-flakiness branch (build IDs: 1327510–1329184)
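The disposal ordering and guards described under root cause 2 can be sketched as follows. This is an illustration under stated assumptions, not the actual GlobalSettings code: GuardedSettings, OnFileChanged, and HandledCount are hypothetical names, a MemoryStream stands in for state the callback touches, and having Dispose() take the lock before releasing that state is an assumption of this sketch rather than something the PR text confirms.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical class for illustration; not the actual GlobalSettings code.
public sealed class GuardedSettings : IDisposable
{
    private readonly object _lock = new object();
    private readonly MemoryStream _state = new MemoryStream(); // stand-in for callback-owned state
    private volatile bool _disposed;
    private int _handled;

    public int HandledCount => Volatile.Read(ref _handled);

    // Stand-in for a FileSystemWatcher callback, which may run on a thread-pool thread.
    public void OnFileChanged()
    {
        if (_disposed) return;      // pre-lock guard: cheap early exit
        lock (_lock)
        {
            if (_disposed) return;  // post-lock guard: Dispose() may have run while we waited
            _state.WriteByte(1);    // would throw ObjectDisposedException on a disposed stream
            _handled++;
        }
    }

    public void Dispose()
    {
        _disposed = true;           // set the flag first so late callbacks bail out
        lock (_lock)                // assumption: wait for a callback already inside the lock
        {
            _state.Dispose();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var settings = new GuardedSettings();
        settings.OnFileChanged();   // handled normally
        settings.Dispose();
        settings.OnFileChanged();   // ignored by the pre-lock guard, no exception
        Console.WriteLine(settings.HandledCount);
    }
}
```

Setting the flag before releasing anything is what closes the window: a callback that arrives after Dispose() has started sees the flag at one of the two guards instead of touching released state.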