Commit 22000db

Enhanced js/regex/duplicate-in-character-class's qhelp
1 parent 42a880b commit 22000db

1 file changed

+40
-10
lines changed

javascript/ql/src/RegExp/DuplicateCharacterInCharacterClass.qhelp

Lines changed: 40 additions & 10 deletions
@@ -5,26 +5,43 @@
 
 <overview>
 <p>
-Character classes in regular expressions represent sets of characters, so there is no need to specify
-the same character twice in one character class. Duplicate characters in character classes are at best
-useless, and may even indicate a latent bug.
+Character classes in regular expressions (denoted by square brackets <code>[]</code>) represent sets of characters: the pattern matches any single character from the set. Because character classes are sets, specifying the same character more than once is redundant and often indicates a programming error.
 </p>
 
+<p>
+Common mistakes include:
+</p>
+<ul>
+<li>Using square brackets <code>[]</code> instead of parentheses <code>()</code> for grouping alternatives</li>
+<li>Assuming that special regex characters such as <code>|</code>, <code>*</code>, <code>+</code>, <code>()</code>, and <code>-</code> behave the same inside character classes as they do outside</li>
+<li>Accidentally duplicating characters or escape sequences that represent the same character</li>
+</ul>
+
 </overview>
 <recommendation>
 
-<p>If the character was accidentally duplicated, remove it. If the character class was meant to be a
-group, replace the brackets with parentheses.</p>
+<p>
+Examine each duplicate character to determine the intended behavior:
+</p>
+<ul>
+<li><strong>If you see <code>|</code> inside square brackets (e.g., <code>[a|b|c]</code>)</strong>: This is usually a mistake. The author likely intended alternation. Replace the character class with a group: <code>(a|b|c)</code></li>
+<li><strong>For patterns like <code>e[m|x]</code> in unit expressions</strong>: These often represent alternatives like "em" or "ex". Convert to proper alternation: <code>e(m|x)</code>, which is equivalent to <code>(em|ex)</code></li>
+<li>If trying to match alternative strings, use parentheses <code>()</code> for grouping instead of square brackets</li>
+<li>If the duplicate was truly accidental, remove the redundant characters</li>
+<li>If trying to use special regex operators inside square brackets, note that most operators (like <code>|</code>) are treated as literal characters</li>
+</ul>
 
+<p>
+<strong>Important:</strong> Simply removing <code>|</code> characters from character classes is rarely the correct fix. Instead, analyze the pattern to understand what the author intended to match.
+</p>
 
 </recommendation>
 <example>
 <p>
-In the following example, the character class <code>[password|pwd]</code> contains two instances each
-of the characters <code>d</code>, <code>p</code>, <code>s</code>, and <code>w</code>. The programmer
-most likely meant to write <code>(password|pwd)</code> (a pattern that matches either the string
-<code>"password"</code> or the string <code>"pwd"</code>), and accidentally mistyped the enclosing
-brackets.
+<strong>Example 1: Confusing character classes with groups</strong>
+</p>
+<p>
+The pattern <code>[password|pwd]</code> does not match "password" or "pwd" as intended. Instead, it matches any single character from the set <code>{p, a, s, w, o, r, d, |}</code>. Note that <code>|</code> has no special meaning inside character classes.
 </p>
 
 <sample src="examples/DuplicateCharacterInCharacterClass.js" />
@@ -33,10 +50,23 @@ brackets.
 To fix this problem, the regular expression should be rewritten to <code>/(password|pwd) =/</code>.
 </p>
 
+<p>
+<strong>Example 2: CSS unit matching</strong>
+</p>
+<p>
+The pattern <code>r?e[m|x]</code> appears intended to match CSS units such as "em", "ex", "rem", or "rex", but it actually matches an optional "r", then "e", then any one of the characters <code>{m, |, x}</code>, so it also accepts strings such as "e|". The correct pattern is <code>r?e(m|x)</code>.
+</p>
+
+<p>
+Similarly, <code>v[h|w|min|max]</code> should be <code>v(h|w|min|max)</code> to properly match "vh", "vw", "vmin", or "vmax".
+</p>
+
 </example>
 <references>
 
 <li>Mozilla Developer Network: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions">JavaScript Regular Expressions</a>.</li>
+<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes">Character Classes</a> - Details on how character classes work.</li>
+<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges">Groups and Ranges</a> - Proper use of grouping with parentheses.</li>
 
 </references>
 </qhelp>
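The behavior described in the qhelp examples can be checked directly in JavaScript. The sketch below is illustrative only and is not part of the commit or of the referenced sample file. The variable names, the `^`/`$` anchors on the unit patterns, and the test strings are assumptions chosen to make the difference visible.

```javascript
// Example 1: character class vs. group.
// The class matches ONE character from {p, a, s, w, o, r, d, |} followed by " =".
const passwordClass = /[password|pwd] =/;
// The group matches the whole word "password" or "pwd" followed by " =".
const passwordGroup = /(password|pwd) =/;

// Both accept a real "pwd =" line, which is why the bug is easy to miss:
// the class matches the trailing "d =".
const classAcceptsPwd = passwordClass.test("pwd = secret");
const groupAcceptsPwd = passwordGroup.test("pwd = secret");

// Only the class accepts a stray "|", because "|" is a literal member of the set.
const classAcceptsPipe = passwordClass.test("x| = 1");
const groupAcceptsPipe = passwordGroup.test("x| = 1");

// Example 2: CSS units (anchored here so the whole string must be a unit).
const unitClass = /^r?e[m|x]$/;
const unitGroup = /^r?e(m|x)$/;

const unitClassAcceptsPipe = unitClass.test("e|");  // "|" is in the set
const unitGroupAcceptsPipe = unitGroup.test("e|");
const unitGroupAcceptsRem = unitGroup.test("rem");
const unitGroupAcceptsEx = unitGroup.test("ex");

// Viewport units: the class consumes exactly one character after "v",
// so multi-character alternatives such as "min" can never match.
const viewportClass = /^v[h|w|min|max]$/;
const viewportGroup = /^v(h|w|min|max)$/;

const viewportClassAcceptsVmin = viewportClass.test("vmin");
const viewportGroupAcceptsVmin = viewportGroup.test("vmin");
const viewportClassAcceptsVx = viewportClass.test("vx"); // "x" comes from "max"

console.log({
  classAcceptsPwd, groupAcceptsPwd, classAcceptsPipe, groupAcceptsPipe,
  unitClassAcceptsPipe, unitGroupAcceptsPipe, unitGroupAcceptsRem, unitGroupAcceptsEx,
  viewportClassAcceptsVmin, viewportGroupAcceptsVmin, viewportClassAcceptsVx,
});
```

In this sketch the character-class versions accept inputs such as `"e|"` and `"vx"` that the grouped versions reject, and the viewport class rejects `"vmin"` even though that string is one of the intended alternatives.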
