New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHP8.1: mb_convert_encoding not working with ASCII chars above 127 #8744
Comments
Well, there are no ASCII characters above 127. The stricter behavior is likely deliberate. @alexdowad, could you please clarify? |
Oh yeah, you're right. Didn't even think about this. So this was actually a bug in all older PHP versions? We fixed our application by using ISO-8859-1 instead of ASCII as $from_encoding. |
Yep, that is right.
Yes, it was. There were a lot of bugs in mbstring in older PHP versions. Most of them were so obscure that probably no user of PHP ever experienced them. |
Okay, closing as invalid then. |
...But thanks to @felixwatzka for reporting. Please keep letting us know if you notice anything else which seems unusual. |
I have a case when we load images from the NTEXT field (SQL Server) database. The images are saved to NTEXT from VBScript (I know that storing binary in NTEXT is deprecated and stupid, but the database is 20 years old and can't be easily changed). In PHP 8.0.11 we used the following conversion: mb_convert_encoding($this->getRawData(), 'Windows-1252', 'UTF-8') Which returned the correct binary representation of the Image but in PHP 8.1.8+, it doesn't work. Some bytes are replaced with 3F Any suggestion on how to emulate old behavior? I also struggle to find what exactly changes did this and in what version. UPD: This still works return @\UConverter::transcode($this->getRawData(), 'Windows-1252', 'UTF-8'); |
@rhulka Thank you very much for the report. I will be happy to investigate and report exactly what has caused the difference in behavior and when it changed. If the change is unintentional, we will revert it; if it is intentional, we will advise how you can work around it. Hope to check into this later today if possible. Thanks again for the report. |
@alexdowad, thanks a lot for the quick reply. We will go with To sum up PHP 8
PHP 8.1
|
@rhulka Please also post the value of |
Just investigating but can't get very far without knowing what |
ntext_image.txt - raw data from the database (NTEXT field SQLServer 2016), it's not a "text". I can't upload with the .bin extension. - here is result image after conversion in PHP 8.0.11 (MacOS and Debian in docker container) - after encoding in PHP 8.1.12 ntext_image_bin2hex.txt - bin2hex output $image = mb_convert_encoding($rawDataFromDbNtextField, 'Windows-1252', 'UTF-8');
header('Content-Type: image/jpeg');
echo $image;
exit; Thank you |
OK, it looks like the error markers ( Please see this information on Windows-1252: https://en.wikipedia.org/wiki/Windows-1252 Looking at the conversion table there, you can see that some codepoints like U+0081 and U+008F have no mapping in Windows-1252. The old implementation of mbstring would convert U+0081 to The Wikipedia page does have an interesting note here:
So if we want to improve compatibility with the Win32 API, we could restore mappings for U+0081, U+008D, U+0090, and U+009D. I'm not sure if this would be enough to make Windows-1252 work for @rhulka's use case or not. Any comments?? |
Calling @cmb69 |
Thinking about this a bit more. Given that mbstring is not "greenfield" but is a library with a long history, I think restoring the BC behavior is the right thing to do here. |
I think it would be enough. The encoder in VBScript (a legacy app that we rewrite in PHP) does just that, that is why we got the idea to use this approach in PHP in the first place. I am not pro in encodings conversion, just noticed that behavior differs between versions, and |
Does it was changed in this commit b5ff87c ? Then I can revert this and re-compile PHP from sources for testing |
@rhulka Yes, that should be the one. |
I agree. |
I should have a patch within a day or so. |
Thank you for bringing up this issue, I figured out that our app was exploiting this "bug" by converting from I apologize for commenting on a closed issue, I just thought it would be useful to add my case here so someone could maybe find this page in Google, I couldn't do it so had to browse the issues list. |
Thanks for posting, @themao. |
Description
The following code:
Resulted in this output:
But I expected this output instead:
In fact, since PHP 8.1, no ASCII character above 127 will be converted correctly. You can try the above example also here: https://3v4l.org/7ZASZ
PHP Version
PHP 8.1.6
Operating System
Ubuntu 22.04
The text was updated successfully, but these errors were encountered: