-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Add Compliance tests for GB18030-2022 #118075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces comprehensive compliance testing for the GB18030-2022 Chinese character encoding standard by adding a new test project with thorough test coverage across multiple .NET API surfaces.
Key changes:
- Creates a complete test suite for GB18030 encoding compliance validation
- Implements tests for string operations, file I/O, directory operations, console I/O, and encoding roundtrips
- Adds test data file with Chinese character test cases for both short and extended length scenarios
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
Compliance.Tests.csproj | New test project configuration with GB18030 test files and dependencies |
TestHelper.cs | Central test helper providing GB18030 encoding, test data parsing, and culture/comparison configurations |
StringTests.cs | Comprehensive string operation tests including comparison, search, equality, and manipulation methods |
FileTests.cs | File I/O operation tests for reading/writing text with GB18030 encoding |
FileTestBase.cs | Abstract base class for file system operation tests with GB18030 filenames |
FileInfoTests.cs | FileInfo-specific implementation of file system tests |
DirectoryTests.cs | Directory operation tests using GB18030 directory names |
DirectoryTestBase.cs | Abstract base class for directory operation tests with nested directory support |
DirectoryInfoTests.cs | DirectoryInfo-specific implementation with enumeration tests |
ConsoleTests.cs | Console I/O tests for standard input/output/error streams with GB18030 encoding |
EncodingTests.cs | Basic encoding roundtrip validation tests |
Level3+Amendment_Test_Data_for_Mid_to_High_Volume_cases.txt | Test data file containing GB18030 character sequences |
Common.Tests.slnx | Solution file update to include the new compliance test project |
Comments suppressed due to low confidence (1)
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/StringTests.cs:13
- [nitpick] The constant name 'Dummy' is ambiguous and doesn't clearly indicate its purpose as a replacement character for testing. Consider renaming to 'ReplacementChar' or 'TestReplacementCharacter'.
private const string Dummy = "\uFFFF";
Tagging subscribers to this area: @dotnet/area-meta |
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/ConsoleTests.cs
Outdated
Show resolved
Hide resolved
[Theory] | ||
[MemberData(nameof(TestHelper.DecodedTestData), MemberType = typeof(TestHelper))] | ||
public void CreateSubdirectory(string decoded) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to do clean up after executing every test in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This apply on other directory and file tests if it is not already doing that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be cleaning up, TempDirectory
gets deleted on Dispose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not seeing the dispose is explicitly called here. Do you mean the temp directory has a finalizer that eventually likely calling the dispose? should we do teh following instead?
using var subDirInfo = TempDirectory.CreateSubdirectory(gb18030Line);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dispose is in the base class:
runtime/src/libraries/Common/tests/ComplianceTests/GB18030/Tests/DirectoryTestBase.cs
Line 77 in ba384e5
TempDirectory.Delete(true); |
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/StringTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/StringTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/StringTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/StringTests.cs
Outdated
Show resolved
Hide resolved
@jozkee I'll wait your changes specifically in the string tests and then will take another look then. Also, it would be good if you could add a specific net core app test that tests |
This test currently verifies whatever data is taken from the Unicode file UnicodeData.txt. If we happened to use an older version that doesn’t include the GB18030 characters, the test would still pass. While this is unlikely, I suggest adding checks in your new tests for some of the newly added GB18030 characters. That way, regardless of the Unicode version in use, we can be confident that GB18030 is covered. |
Tests for Regex, Char and CharUnicodeInfo are adding a considerable chunk of runtime in net481 (2.5 mins in my PC), it finishes in ~8s otherwise. For comparison, it takes ~30s in net10. Time increase is because I made theories for all characters in GB18030 ranges which are ~9k. We could skip them in net481 as you suggested, @tarekgh.
|
5bad2f5
to
ba384e5
Compare
{ | ||
string utf32String = testCase.Utf32CodeValue; | ||
Assert.True(char.IsLetter(utf32String, 0)); | ||
Assert.True(char.IsLetterOrDigit(utf32String, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does all added characters are letters? no other Unicode categories we should check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I discovered this because I didn't want to write code to handle all categories.
src/libraries/Common/tests/ComplianceTests/GB18030/Tests/CharUnicodeInfoTests.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jozkee!
No description provided.