-
Notifications
You must be signed in to change notification settings - Fork 7.9k
ext/standard: handle html entities empty string before processing #19220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's worth optimizing this edge case, anyway here's some feedback
Also note that benchmarks should be done with debug disabled. |
master
branch
|
Anyway, many functions accept empty string and return modified one. |
The first call generates deprecation error because the function argument is |
ext/standard/html.c
Outdated
@@ -1320,6 +1320,10 @@ static void php_html_entities(INTERNAL_FUNCTION_PARAMETERS, int all) | |||
Z_PARAM_BOOL(double_encode); | |||
ZEND_PARSE_PARAMETERS_END(); | |||
|
|||
if (EXPECTED(ZSTR_LEN(str) == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This most certainly is not expected. You should not just benchmark the case of passing an empty string, but also non-empty strings of varying lengths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean? as you can see the new benchmark is hitting non empty string as well
https://gist.github.com/xepozz/8a7cbb52dd2087634d2ddaa90a21acbe#file-htmlspecialchars-php-L25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I replace it with UNEXPECTED(len > 0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, instead, this should not have a branch hint at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, is there a guide how it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No guide. But UNEXPECTED should only be used for situations that are actually unexpected: it can move the code to the cold text section and pessimize optimizations. I only use it for error conditions. EXPECTED is less aggressive, but still inappropriate here as calling it with length 0 is an edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So,
UNEXPECTED is preferable for error (exceptions?) branch: call strlen() with null/int/resource
EXPECTED is preferable for most popular branch: call strlen() with non empty string
for the rest just don't write any hints?
Also, please use |
Emitting deprecations, warnings, ... causes a performance degradation indeed. |
|
Sure. I thought performance won't be affected much if I disable deprecation warning. |
Is this an apples-to-apples comparison? The bare You should test:
And for each of the cases you should test the base commit against the PR. For benchmarks it also helps if you include the CPU you test this on, since relative performance often is CPU-dependent (e.g. due to cache sizes). |
branch
master
I don't know why it should work, because the time is different every run. I've tried without manual --runs flag, with 1 000 and 10 000 run. Everytime I get different time both on master and the branch. |
@xepozz You're likely not doing enough iterations in your benchmark PHP code, which means that your run is mostly measuring startup time. |
@nielsdos ok <?php
error_reporting(E_ALL ^ E_DEPRECATED);
for($i=0;$i<=10_000;$i++) {
htmlspecialchars(null);
htmlspecialchars("");
htmlspecialchars(str_repeat('x', 16));
htmlspecialchars(str_repeat('x', 4096));
} hyperfine './sapi/cli/php html.php' -r 100 master: first 3 runs is it enough? |
You have to measure each of the 4 inputs separately, and compare them on master vs this PR. That's how you benchmark that there are improvements for the empty string and no degradations for other cases. |
Well, there are results: 3 runs on each branch: master
branch
Summary:master
branch:
Sourcehtml_null.php <?php
error_reporting(E_ALL ^ E_DEPRECATED);
$x=null;
for($i=0;$i<=10_000;$i++) {
htmlspecialchars($x);
} html_empty.php <?php
error_reporting(E_ALL ^ E_DEPRECATED);
$x="";
for($i=0;$i<=10_000;$i++) {
htmlspecialchars($x);
} html_x16.php <?php
error_reporting(E_ALL ^ E_DEPRECATED);
$x=str_repeat('x', 16);
for($i=0;$i<=10_000;$i++) {
htmlspecialchars($x);
} html_x4096.php <?php
error_reporting(E_ALL ^ E_DEPRECATED);
$x=str_repeat('x', 4096);
for($i=0;$i<=10_000;$i++) {
htmlspecialchars($x);
} command: hyperfine './sapi/cli/php html_null.php' './sapi/cli/php html_empty.php' './sapi/cli/php html_x16.php' './sapi/cli/php html_x4096.php' -r 100
|
btw this is sort of duplicate of #18126 that already contain this change (well if my suggestion is applied there). I think it would be probably worth it to pick that one up. @ArtUkrainskiy might be too busy but it would be good to get it can bring some other improvements. Although if your strategy is to do it piece by piece, that's of course even better as we can see impact of each improvement. |
That PR needs some refactoring though (see my comments there) so I'm fine to merge this on its own unless someone objects. |
There were some internal discussions about slower call of So I don't plan any other improvements unless I find others. |
The benchmark numbers do not make much sense to me though. Why master 2. Summary is a bigger difference than branch 2. Summary? |
You may see similar picture in my previous replies. There are just BIG deviation between runs. |
I benchmarked on my i7-4790, just tested the empty string and "X"*16, 2000000 iterations in PHP.
So PR seems worth it, performance difference of "X"*16 seems to be noise. Testing on "X"*4096 is not particularly useful as the cost of the check is upfront before the actual logic loop. |
https://gist.github.com/xepozz/8a7cbb52dd2087634d2ddaa90a21acbe
master
❯ ./sapi/cli/php test.php
Result: 0.12489318847656
Result: 0.025581121444702
Result: 0.024285078048706
branch
❯ ./sapi/cli/php test.php
Result: 0.11230993270874
Result: 0.012480974197388
Result: 0.01120400428772
all the tests were made with local php build with enabled debug and so on
performance increases because of eliminating memory allocation, I hope memory usage is also reduced