
Specify what constitutes white-space characters #69

@tahonermann


The C++ standard defines behavior that depends on whether a character constitutes white-space, but never defines what those characters are. Uses of the "whitespace" and "white-space" terms appear in:

P2178 proposal 2 sought to clarify the set of characters that constitute white-space and proposed the following set. These characters all satisfy the immutable Pattern_White_Space property (see UAX #44 and/or search for Pattern_White_Space in the UCD).

  • U+0009: CHARACTER TABULATION
  • U+000A: LINE FEED (LF)
  • U+000B: LINE TABULATION
  • U+000C: FORM FEED (FF)
  • U+000D: CARRIAGE RETURN (CR)
  • U+0020: SPACE
  • U+0085: NEXT LINE (NEL)
  • U+200E: LEFT-TO-RIGHT MARK
  • U+200F: RIGHT-TO-LEFT MARK
  • U+2028: LINE SEPARATOR
  • U+2029: PARAGRAPH SEPARATOR
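To make the proposed set concrete, here is a minimal C++ sketch that tests whether a code point is one of the eleven characters above. It assumes code points are held in `char32_t`. The names `kPatternWhiteSpace` and `is_proposed_white_space` are hypothetical. They do not come from P2178 or from any library.

```cpp
// Minimal sketch, assuming code points are held in char32_t.
// The table is the Pattern_White_Space set listed above, as proposed
// by P2178 proposal 2. Names here are hypothetical.
constexpr char32_t kPatternWhiteSpace[] = {
    0x0009,  // CHARACTER TABULATION
    0x000A,  // LINE FEED (LF)
    0x000B,  // LINE TABULATION
    0x000C,  // FORM FEED (FF)
    0x000D,  // CARRIAGE RETURN (CR)
    0x0020,  // SPACE
    0x0085,  // NEXT LINE (NEL)
    0x200E,  // LEFT-TO-RIGHT MARK
    0x200F,  // RIGHT-TO-LEFT MARK
    0x2028,  // LINE SEPARATOR
    0x2029,  // PARAGRAPH SEPARATOR
};

// Returns true if c is one of the proposed white-space characters.
constexpr bool is_proposed_white_space(char32_t c) {
    for (char32_t ws : kPatternWhiteSpace) {
        if (c == ws) return true;
    }
    return false;
}

// Compile-time spot checks against the two lists in this issue.
static_assert(is_proposed_white_space(0x0020), "SPACE is in the proposed set");
static_assert(is_proposed_white_space(0x200F), "RIGHT-TO-LEFT MARK is in the proposed set");
static_assert(!is_proposed_white_space(0x00A0), "NO-BREAK SPACE is not in the proposed set");
```

Because Pattern_White_Space is immutable, a fixed table like this one would not need updates as new Unicode versions are released.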

The above set of characters excludes the following characters that satisfy the (not immutable) White_Space property (see UAX #44 and/or search for White_Space in the UCD).

  • U+00A0: NO-BREAK SPACE
  • U+1680: OGHAM SPACE MARK
  • U+2000: EN QUAD
  • U+2001: EM QUAD
  • U+2002: EN SPACE
  • U+2003: EM SPACE
  • U+2004: THREE-PER-EM SPACE
  • U+2005: FOUR-PER-EM SPACE
  • U+2006: SIX-PER-EM SPACE
  • U+2007: FIGURE SPACE
  • U+2008: PUNCTUATION SPACE
  • U+2009: THIN SPACE
  • U+200A: HAIR SPACE
  • U+202F: NARROW NO-BREAK SPACE
  • U+205F: MEDIUM MATHEMATICAL SPACE
  • U+3000: IDEOGRAPHIC SPACE
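For comparison, a similar sketch can test the full White_Space property. The ranges below combine the characters listed above with the proposed characters other than U+200E and U+200F. Those two marks are format characters and do not have the White_Space property in the UCD's PropList.txt. The names `CodePointRange`, `kWhiteSpaceRanges`, and `is_unicode_white_space` are hypothetical. Because White_Space is not immutable, a table like this reflects only one Unicode version and may need updating later.

```cpp
// Hypothetical sketch of the Unicode White_Space property, assuming
// code points are held in char32_t. White_Space is NOT immutable, so
// this table may need updating for future Unicode versions.
struct CodePointRange {
    char32_t first;
    char32_t last;  // inclusive
};

constexpr CodePointRange kWhiteSpaceRanges[] = {
    {0x0009, 0x000D},  // CHARACTER TABULATION .. CARRIAGE RETURN (CR)
    {0x0020, 0x0020},  // SPACE
    {0x0085, 0x0085},  // NEXT LINE (NEL)
    {0x00A0, 0x00A0},  // NO-BREAK SPACE
    {0x1680, 0x1680},  // OGHAM SPACE MARK
    {0x2000, 0x200A},  // EN QUAD .. HAIR SPACE
    {0x2028, 0x2029},  // LINE SEPARATOR, PARAGRAPH SEPARATOR
    {0x202F, 0x202F},  // NARROW NO-BREAK SPACE
    {0x205F, 0x205F},  // MEDIUM MATHEMATICAL SPACE
    {0x3000, 0x3000},  // IDEOGRAPHIC SPACE
};

// Returns true if c falls inside any of the inclusive ranges above.
constexpr bool is_unicode_white_space(char32_t c) {
    for (const CodePointRange& r : kWhiteSpaceRanges) {
        if (c >= r.first && c <= r.last) return true;
    }
    return false;
}

// The two sets differ in both directions:
static_assert(is_unicode_white_space(0x00A0), "NO-BREAK SPACE is White_Space");
static_assert(!is_unicode_white_space(0x200E), "LEFT-TO-RIGHT MARK is not White_Space");
```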

When addressing this issue, we may want to take the opportunity to replace the existing "whitespace" and "white-space" terminology with "blank space"; ISO guidance may require such a renaming in the future.

Metadata

Labels

  • clarification: Something isn't clear
  • help wanted: Extra attention is needed
  • paper needed: A paper proposing a specific solution is needed