Skip to content

Conversation

@byroot
Copy link
Member

@byroot byroot commented Jan 19, 2026

While trying to speedup various File.* methods, I realized they were way slower and complicated than they should for no apparent reason. However after asking Nobu he explained that Shift JIS encoded text can contain 0x5C (ASCII backslash) as the second byte of a two byte character sequence.

Since on Windows 0x5C is File::ALT_SEPARATOR, this can easily break naive path related algorithms searching for directory separators.

cc @eregon what do you think of that spec? I think it would have helped me figure things out way sooner, so would have been valuable. If you agree I can generalize it to more "path handling" methods.

@byroot byroot force-pushed the shiftjs-windows-path-operations branch 2 times, most recently from e8c8ec8 to 7cfe29b Compare January 19, 2026 05:53
While trying to speedup various `File.*` methods, I realized they
were way slower and complicated than they should for no apparent
reason. However after asking Nobu he explained that Shift JIS encoded
text can contain `0x5C` (ASCII backslash) as the second byte of a
two byte character sequence.

Since on Windows `0x5C` is `File::ALT_SEPARATOR`, this can easily
break naive path related algorithms searching for directory separators.
@byroot byroot force-pushed the shiftjs-windows-path-operations branch from 7cfe29b to 38a2dbf Compare January 19, 2026 12:48
@byroot byroot changed the title File.dirname: add a spec for Shift-JS handling File.dirname: add a spec for Shift JIS handling Jan 19, 2026
@byroot byroot requested a review from trinistr January 19, 2026 12:49
Comment on lines +84 to +102
valid.reject do |enc|
path = "/foo/bar".encode(enc)
expected = "/foo".encode(enc)

File.dirname(path) == expected
end.should == []

invalid.reject do |enc|
path = "/foo/bar".encode(enc)
rescue Encoding::ConverterNotFoundError
true
else
begin
File.dirname(path)
false
rescue Encoding::CompatibilityError
true
end
end.should == []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be made easier to understand:

Suggested change
valid.reject do |enc|
path = "/foo/bar".encode(enc)
expected = "/foo".encode(enc)
File.dirname(path) == expected
end.should == []
invalid.reject do |enc|
path = "/foo/bar".encode(enc)
rescue Encoding::ConverterNotFoundError
true
else
begin
File.dirname(path)
false
rescue Encoding::CompatibilityError
true
end
end.should == []
valid.should.all? do |enc|
path = "/foo/bar".encode(enc)
expected = "/foo".encode(enc)
File.dirname(path) == expected
end
invalid.should.none? do |enc|
path = "/foo/bar".encode(enc)
File.dirname(path)
rescue Encoding::ConverterNotFoundError, Encoding::CompatibilityError
false
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I'm just a contributor, not sure how much my review matters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants