@mworrell
Member

No description provided.

@mworrell mworrell changed the title from "mprove word splitting for string normalization" to "Improve word splitting for string normalization" Nov 13, 2025
@mworrell mworrell requested a review from Copilot November 13, 2025 12:32
Copilot finished reviewing on behalf of mworrell November 13, 2025 12:35
Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the word splitting logic in string normalization to better handle various Unicode space and control characters. The changes refactor separator detection into three distinct macros and enhance the normalize_words functions to properly handle characters that should be ignored versus those that should split or be mapped to spaces.

Key changes:

  • Introduces three new character classification macros: is_sep, is_map_space, and is_word_ignore_char for more granular control over character handling
  • Updates normalize_words_word and normalize_words_sep functions to use the new macros for improved word boundary detection
  • Adds test coverage for multi-word normalization scenarios
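
The three-way classification described above can be sketched roughly as follows. This is an illustration in Python, not the Erlang code from `src/z_string_normalize.erl`: the predicate names mirror the macro names mentioned in the overview, but the character sets, the `normalize_words` signature, and the lowercasing step are all assumptions made for the example.

```python
# Hypothetical character sets; the real macros may classify differently.
SEPARATORS = set(",.;:!?/()[]{}\"'")
MAP_TO_SPACE = {"\t", "\n", "\r", "\u00a0", "\u2003", "\u3000"}
IGNORED = {"\u00ad", "\u200b", "\u200d", "\ufeff"}


def is_sep(ch):
    """Characters that end the current word (assumed set)."""
    return ch == " " or ch in SEPARATORS


def is_map_space(ch):
    """Space-like characters that are treated as a plain space (assumed set)."""
    return ch in MAP_TO_SPACE


def is_word_ignore_char(ch):
    """Characters dropped without splitting the word (assumed set)."""
    return ch in IGNORED


def normalize_words(text):
    """Split text into lowercase words joined by single spaces."""
    words = []
    current = []
    for ch in text:
        if is_word_ignore_char(ch):
            # Ignorable characters vanish and do not break the word.
            continue
        if is_sep(ch) or is_map_space(ch):
            # Both classes close the word being collected, if any.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch.lower())
    if current:
        words.append("".join(current))
    return " ".join(words)
```

In this sketch a zero-width space inside a word is dropped without splitting it, while a no-break space separates two words in the same way a regular space would.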

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/z_string_normalize.erl | Refactors character classification into dedicated macros and updates word splitting logic to handle separators, space-like characters, and ignorable characters distinctly |
| test/z_string_test.erl | Renames existing single-word test and adds new multi-word normalization test case to verify improved word splitting behavior |

@mworrell mworrell merged commit d383b8e into master Nov 13, 2025
3 checks passed
@mworrell mworrell deleted the string-normalize-wordseps branch November 13, 2025 14:40
3 participants