fix: fix byod flow and update integrated vectorization to work with byod flow #1905

Harsh-Microsoft · 2025-09-19T10:35:27Z

Purpose

This pull request introduces significant enhancements to the document chunking and embedding pipeline, primarily by adding a custom Azure Function and updating the Azure Cognitive Search skillset to use a more structured approach for handling document pages and their chunk numbers. It also improves the robustness of tests and adjusts authentication settings for Azure Search. The most important changes are grouped below:

1. Integrated Vectorization Pipeline Improvements

Added a custom Azure Function (combine_pages_and_chunknos.py) to combine page texts and chunk numbers into a single array of objects, exposed as a WebApiSkill endpoint for use in the Azure Cognitive Search skillset.
Updated the integrated vectorization skillset to:
- Include the new WebApiSkill for combining pages and chunk numbers.
- Adjust the AzureOpenAIEmbeddingSkill and index projections to operate on the new pages_with_chunks structure.
- Add a ShaperSkill to bundle metadata into a complex object for each chunk.
- Update field mappings and skillset composition accordingly.
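
For reviewers unfamiliar with custom skills, the combining step can be sketched roughly as follows. This is a minimal illustration and not the code in combine_pages_and_chunknos.py: the input names `pages` and `chunk_nos`, the per-item keys `page_text` and `chunk_no`, and both function names are assumptions. Only the `pages_with_chunks` output name and the general custom Web API skill envelope (`values`, `recordId`, `data`) come from the PR description and the skill contract.

```python
from typing import Any


def combine_pages_and_chunk_nos(
    pages: list[str], chunk_nos: list[int]
) -> list[dict[str, Any]]:
    """Pair each page text with its chunk number (key names are assumed)."""
    return [
        {"page_text": text, "chunk_no": number}
        for text, number in zip(pages, chunk_nos)
    ]


def handle_skill_request(body: dict[str, Any]) -> dict[str, Any]:
    """Build a custom Web API skill response: one output record per input record."""
    results = []
    for record in body.get("values", []):
        data = record.get("data", {})
        combined = combine_pages_and_chunk_nos(
            data.get("pages", []),      # assumed input name
            data.get("chunk_nos", []),  # assumed input name
        )
        results.append(
            {
                # recordId must be echoed back so the indexer can match records
                "recordId": record["recordId"],
                "data": {"pages_with_chunks": combined},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}
```

In an Azure Function, the HTTP trigger would parse the request JSON, pass it to a handler like this, and return the result as the JSON response body.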

2. Test Robustness and Coverage

Refactored functional tests to validate response structure and key fields rather than exact citation content, making tests more resilient to dynamic data (e.g., SAS tokens).
Improved request validation in tests by checking the presence and structure of key request fields instead of matching the entire payload.
Updated skillset creation tests to verify the presence and configuration of the new WebApiSkill and related outputs.
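
The structure-based assertions described above could look something like the sketch below. It is only an illustration of the approach: the helper names and the citation keys (`title`, `url`, `content`) are hypothetical and are not taken from the test suite in this PR.

```python
from typing import Any
from urllib.parse import urlsplit


def citation_base_url(url: str) -> str:
    """Drop the query string so a changing SAS token does not affect comparisons."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def find_missing_citation_fields(
    citations: list[dict[str, Any]],
    required: tuple[str, ...] = ("title", "url", "content"),  # hypothetical keys
) -> list[str]:
    """Return a description of every required key missing from a citation."""
    problems = []
    for index, citation in enumerate(citations):
        for key in required:
            if key not in citation:
                problems.append(f"citation {index} is missing '{key}'")
    return problems
```

A test can then assert that `find_missing_citation_fields(...)` returns an empty list and compare `citation_base_url(...)` against the expected blob path, instead of matching the full citation text.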

3. Application Logic and Authentication Adjustments

Changed the Azure Search authentication type to use user_assigned_managed_identity and include the managed identity resource ID, improving security and flexibility.
Fixed citation URL generation to ensure a SAS token placeholder is always present, preventing issues with missing tokens.
Cleaned up unused field mappings in the search configuration dictionary for clarity and maintainability.
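
The citation URL fix amounts to making the placeholder append idempotent. A minimal sketch, assuming the placeholder is a fixed marker string that is swapped for a real SAS token later; the literal value and the function name here are assumptions, not confirmed by this PR description:

```python
# Assumed marker value; the actual literal used by the application may differ.
SAS_TOKEN_PLACEHOLDER = "_SAS_TOKEN_PLACEHOLDER_"


def ensure_sas_placeholder(url: str) -> str:
    """Return the URL with exactly one SAS token placeholder appended."""
    if SAS_TOKEN_PLACEHOLDER in url:
        # Already present: leave the URL untouched to avoid duplicating it.
        return url
    return url + SAS_TOKEN_PLACEHOLDER
```

Calling the helper twice on the same URL yields the same result as calling it once, so downstream token substitution always finds a single placeholder.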

Does this introduce a breaking change?

Yes
No

How to Test

Get the code

git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install

Test the code

What to Check

Verify that the following are valid

...

Other Information

fix byod flow and update integrated vectorization to work with byod flow

dd5e30e

Harsh-Microsoft requested review from Avijit-Microsoft, Roopan-Microsoft, Prajwal-Microsoft, Fr4nc3, Vinay-Microsoft and aniaroramsft as code owners September 19, 2025 10:35

Harsh-Microsoft added 10 commits September 19, 2025 16:07

refactor: remove unnecessary logging

b60aa54

enhance test assertions for Azure BYOD response and updated integrated vectorization skillset

201c000

adjusting formatting

7523e4e

Merge branch 'dev' into hb-bug-23482

aa0963d

Add BACKEND_URL parameter for function module configuration

ecd97cf

Update managed identity type and enhance allowed FQDN list for CosmosDB

2fa8579

Merge branch 'dev' into hb-bug-23482

01dd998

rebuild main.json

94bf8e7

Merge branch 'dev' into hb-bug-23482

7763f12

rebuilt main.json

3958076

Prajwal-Microsoft approved these changes Sep 30, 2025

Prajwal-Microsoft merged commit 28e0a1e into Azure-Samples:dev Sep 30, 2025
5 checks passed
