Commit 9c2eed8

Document uv_to_utf8_family
1 parent b31bc26 commit 9c2eed8

2 files changed: +28 -9 lines changed

pod/perldelta.pod

Lines changed: 11 additions & 0 deletions
@@ -414,6 +414,11 @@ L<perlapi/C<utf8_to_uv>> replaces L<perlapi/C<utf8_to_uvchr>> (which is
 retained for backwards compatibility), but you should convert to use the
 new form, as likely you aren't using the old one safely.
 
+To convert in the opposite direction, you can now use
+L<perlapi/C<uv_to_utf8>>. This is not a new function, but a new synonym
+for L<perlapi/C<uvchr_to_utf8>>. It is added so you don't have to learn
+two sets of names.
+
 There are also two new functions, L<perlapi/C<strict_utf8_to_uv>> and
 L<perlapi/C<c9strict_utf8_to_uv>> which do the same thing except when
 the input string represents a code point that Unicode doesn't accept as
@@ -440,6 +445,12 @@ L<perlapi/C<utf8_to_uv_errors>> replaces L<perlapi/C<utf8n_to_uvchr_error>>.
 L<perlapi/C<utf8_to_uv_msgs>> replaces
 L<perlapi/C<utf8n_to_uvchr_msgs>>.
 
+Also added are the inverse functions L<perlapi/C<uv_to_utf8_flags>>
+and L<perlapi/C<uv_to_utf8_msgs>>, which are synonyms for the existing
+functions, L<perlapi/C<uvchr_to_utf8_flags>> and
+L<perlapi/C<uvchr_to_utf8_flags_msgs>> respectively. These are provided only
+so you don't have to learn two sets of names.
+
 =item *
 
 Three new API functions are introduced to convert strings encoded in

utf8.c

Lines changed: 17 additions & 9 deletions
@@ -121,14 +121,14 @@ S_new_msg_hv(pTHX_ const char * const message, /* The message text */
 =for apidoc uvoffuni_to_utf8_flags
 
 THIS FUNCTION SHOULD BE USED IN ONLY VERY SPECIALIZED CIRCUMSTANCES.
-Instead, B<Almost all code should use L<perlapi/uvchr_to_utf8> or
-L<perlapi/uvchr_to_utf8_flags>>.
+Instead, B<Almost all code should use L<perlapi/uv_to_utf8> or
+L<perlapi/uv_to_utf8_flags>>.
 
 This function is like them, but the input is a strict Unicode
 (as opposed to native) code point. Only in very rare circumstances should code
 not be using the native code point.
 
-For details, see the description for L<perlapi/uvchr_to_utf8_flags>.
+For details, see the description for L<perlapi/uv_to_utf8_flags>.
 
 =cut
 */
@@ -155,9 +155,11 @@ const char super_cp_format[] = "Code point 0x%" UVXf " is not Unicode,"
 #define MASK UTF_CONTINUATION_MASK
 
 /*
-=for apidoc uvchr_to_utf8_flags_msgs
+=for apidoc uv_to_utf8_msgs
+=for apidoc_item uvchr_to_utf8_flags_msgs
 
-THIS FUNCTION SHOULD BE USED IN ONLY VERY SPECIALIZED CIRCUMSTANCES.
+These functions are identical. THEY SHOULD BE USED IN ONLY VERY SPECIALIZED
+CIRCUMSTANCES.
 
 Most code should use C<L</uvchr_to_utf8_flags>()> rather than call this directly.
 
@@ -367,26 +369,32 @@ Perl_uvoffuni_to_utf8_flags_msgs(pTHX_ U8 *d, UV input_uv, UV flags, HV** msgs)
 }
 
 /*
-=for apidoc uvchr_to_utf8
+=for apidoc uv_to_utf8
+=for apidoc_item uv_to_utf8_flags
+=for apidoc_item uvchr_to_utf8
 =for apidoc_item uvchr_to_utf8_flags
 
 These each add the UTF-8 representation of the native code point C<uv> to the
 end of the string C<d>; C<d> should have at least C<UVCHR_SKIP(uv)+1> (up to
 C<UTF8_MAXBYTES+1>) free bytes available. The return value is the pointer to
 the byte after the end of the new character. In other words,
 
-    d = uvchr_to_utf8(d, uv);
+    d = uv_to_utf8(d, uv);
 
 This is the Unicode-aware way of saying
 
     *(d++) = uv;
 
-C<flags> is used to make some classes of code points problematic in some way.
-C<uvchr_to_utf8> is effectively the same as calling C<uvchr_to_utf8_flags>
+(C<uvchr_to_utf8> is a synonym for C<uv_to_utf8>.)
+
+C<uv_to_utf8_flags> is used to make some classes of code points problematic in
+some way. C<uv_to_utf8> is effectively the same as calling C<uv_to_utf8_flags>
 with C<flags> set to 0, meaning no class of code point is considered
 problematic. That means any input code point from 0..C<IV_MAX> is considered
 to be fine. C<IV_MAX> is typically 0x7FFF_FFFF in a 32-bit word.
 
+(C<uvchr_to_utf8_flags> is a synonym for C<uv_to_utf8_flags>).
+
 A code point can be problematic in one of two ways. Its use could just raise a
 warning, and/or it could be forbidden with the function failing, and returning
 NULL.
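
Reviewer note: the calling convention documented in the last hunk (append at C<d>, return the pointer just past the new character, return NULL when a flag forbids the code point) can be illustrated with a standalone sketch. This is not Perl's implementation and does not use the Perl API. The names sketch_uv_to_utf8, sketch_uv_to_utf8_flags and SKETCH_DISALLOW_SURROGATE are hypothetical, the code assumes an ASCII platform where native and Unicode code points coincide, and it stops at U+10FFFF, whereas the documented functions accept 0..IV_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag for this sketch only; it stands in for the real
 * "make this class of code point problematic" flags. */
#define SKETCH_DISALLOW_SURROGATE 0x01u

/* Append the UTF-8 encoding of cp at d and return the byte after it.
 * Returns NULL if cp is forbidden by flags or is above U+10FFFF
 * (a limit of this sketch, not of the documented functions). */
static unsigned char *
sketch_uv_to_utf8_flags(unsigned char *d, uint32_t cp, unsigned flags)
{
    if ((flags & SKETCH_DISALLOW_SURROGATE) && cp >= 0xD800 && cp <= 0xDFFF)
        return NULL;

    if (cp < 0x80) {                 /* 1 byte: plain ASCII */
        *d++ = (unsigned char)cp;
    } else if (cp < 0x800) {         /* 2 bytes */
        *d++ = (unsigned char)(0xC0 | (cp >> 6));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       /* 3 bytes */
        *d++ = (unsigned char)(0xE0 | (cp >> 12));
        *d++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {      /* 4 bytes */
        *d++ = (unsigned char)(0xF0 | (cp >> 18));
        *d++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        *d++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        return NULL;
    }
    return d;
}

/* With flags set to 0, no class of code point is treated as problematic. */
#define sketch_uv_to_utf8(d, cp) sketch_uv_to_utf8_flags((d), (cp), 0)
```

Starting from d = buf, calling d = sketch_uv_to_utf8(d, cp) for 'A', U+00E9 and U+20AC in turn advances d by 1, 2 and 3 bytes. With SKETCH_DISALLOW_SURROGATE set, U+D800 yields NULL, so a caller should check the result before reassigning d.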
