Extend IngestionPipeline to support processing documents without a file system reader#7488
Extend IngestionPipeline to support processing documents without a file system reader#7488Copilot wants to merge 8 commits into
Conversation
…ngestionDocument), update file-system methods to take reader param - Remove IngestionDocumentReader from constructor and field - Add new ProcessAsync(IngestionDocument) overload returning Task<IngestionDocument> - Add IngestionDocumentReader reader parameter to file-system ProcessAsync methods - Add ProcessDocument activity constant to DiagnosticsConstants - Replace var with explicit types in pipeline and tests - Update all DataIngestor.cs template/snapshot files - Add CanProcessDocumentWithoutReader test - Update README.md with pipeline usage examples Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/54f5e258-8414-40a0-b8b5-953677d1cce2 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/54f5e258-8414-40a0-b8b5-953677d1cce2 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot The Microsoft.Extensions.AI.Templates.Tests.AIChatWebExecutionTests tests are failing, fix it:
Working Directory: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost
Local Shipping Path: /mnt/vss/_work/1/s/artifacts/packages/Release/Shipping
NuGet Packages Path: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/packages
Command failed with non-zero exit code: 1
Standard Output:
Determining projects to restore...
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-mr8r-92fq-pj8p [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-q834-8qmm-v933 [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
Failed to restore /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj (in 8.11 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.Web/AIChatWeb_gh_aais_A_T_ID_F_net9.Web.csproj (in 10.66 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj (in 13.84 sec).
… template tests Update OpenTelemetry.Exporter.OpenTelemetryProtocol from 1.14.0 to 1.15.3 to fix GHSA-mr8r-92fq-pj8p and GHSA-q834-8qmm-v933 vulnerabilities. Also update companion OTel packages to their latest versions. Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/c6522938-3c86-4211-a016-ca79ad9b591e Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Fixed in 6258a67. Updated OpenTelemetry packages in
|
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot please sync this branch with latest changes from the target branch (data-ingestion-preview2), solve all the merge conflicts. Make sure the tests are passing before pushing the changes.
Done in 991c14a. Merged latest |
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot please address my feedback
Fixed both |
🎉 Good job! The coverage increased 🎉
Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1481649&view=codecoverage-tab |
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot fix following build errors:
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net10.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net10.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net8.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net8.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net9.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net9.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net462]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extension
Fixed both errors: sorted |
adamsitnik
left a comment
There was a problem hiding this comment.
@copilot address my feedback
| { | ||
| Assert.NotEqual(Guid.Empty, retrieved[i].Key); | ||
| Assert.NotEmpty(retrieved[i].SerializedContent!); | ||
| Assert.NotNull(retrieved[i].Content); |
There was a problem hiding this comment.
It was accidentally removed during the merge conflict resolution. Restored it.
| IngestionDocumentReader reader = CreateReader(); | ||
| using IngestionPipeline pipeline = new(CreateChunker(), vectorStoreWriter); | ||
| List<IngestionResult> ingestionResults = await pipeline.ProcessAsync(reader, _sampleFiles).ToListAsync(); |
There was a problem hiding this comment.
just inline it
| IngestionDocumentReader reader = CreateReader(); | |
| using IngestionPipeline pipeline = new(CreateChunker(), vectorStoreWriter); | |
| List<IngestionResult> ingestionResults = await pipeline.ProcessAsync(reader, _sampleFiles).ToListAsync(); | |
| using IngestionPipeline pipeline = new(CreateChunker(), vectorStoreWriter); | |
| List<IngestionResult> ingestionResults = await pipeline.ProcessAsync(CreateReader(), _sampleFiles).ToListAsync(); |
There was a problem hiding this comment.
Done - inlined the reader variable.
| { | ||
| Assert.NotEqual(Guid.Empty, retrieved[i].Key); | ||
| Assert.NotEmpty(retrieved[i].SerializedContent!); | ||
| Assert.NotNull(retrieved[i].Content); |
There was a problem hiding this comment.
Why this assert got deleted?
There was a problem hiding this comment.
Same as above - accidentally removed during merge. Restored.
…r review feedback
IngestionPipeline.cs: remove_readerfield andreaderfrom constructor, addProcessAsync(IngestionDocument)returningTask<IngestionDocument>, addIngestionDocumentReader readeras first param to file-system ProcessAsync overloads and private helper, replace allvarwith explicit typesProcessDocumentactivity name toDiagnosticsConstants.csDataIngestor.csfiles (1 source template + 5 verified snapshots) to pass reader toProcessAsyncinstead of constructorIngestionPipelineTests.cs: update pipeline constructions & ProcessAsync calls, add test for document-basedProcessAsyncwithout reader, replacevarwith explicit typesREADME.mdwith pipeline creation and reader connection exampleMicrosoft.Extensions.DataIngestion.Teststo verify (124 passed, 11 skipped × 3 TFMs)Microsoft.Extensions.AI.Templates.IntegrationTestssnapshot tests to verify (5 passed)eng/packages/ProjectTemplates.propsto fix NU1902 vulnerability warnings