John Davidson

encoding - Converting UTF-16LE to UTF-8 in PHP - latin extended

Message:


I am trying to convert a UTF-16LE txt file to UTF-8.


I found a possible solution here on Stack Overflow and tried the following two calls:


 $utf8 = iconv('UTF-16LE', 'UTF-8', $str); // arguments: from charset, to charset, input

and


 $utf8 = mb_convert_encoding($str, 'UTF-8', 'UTF-16LE'); // arguments: input, to encoding, from encoding

This works, but only partially. The file I am converting is in Czech, which uses Latin Extended characters such as š, č, ř, ž, Š, Č, Ř, ě, and all of these non-English characters come out encoded incorrectly. Is there a way to convert them correctly as well?


I thought about str_replace for the converted text, but the problem is that some of the above are not converted to "weird symbols", but instead to other letters such as "q" or "+" sign, which means that if I replace them, it would also replace all "q" letters or "+" symbols, which is not what I want.


Here are all the possible non-English characters that we use in Czech:


 ěščřžýáíéťůúňĚŠČŘŽÝÁÍÉŮÚŤŇ


My Answer:
UTF-8 characters at all, so this approach wouldn't work for all cases.

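Both calls in the question are valid ways to convert UTF-16LE to UTF-8. Every character UTF-16LE can encode, including all the Czech letters above, also exists in UTF-8, so nothing needs to be skipped or replaced afterwards. Note that `//IGNORE` is an iconv suffix for the output charset; `mb_convert_encoding` does not accept it, so the call stays plain:

 $utf8_str = mb_convert_encoding($str, 'UTF-8', 'UTF-16LE');

If the Czech letters still come out wrong, the likely causes are that the input is not really UTF-16LE (it may be UTF-16BE, begin with a byte order mark, or be a single-byte encoding such as Windows-1250), or that the result is being displayed as something other than UTF-8. Checking the first bytes of the file and the charset of the output should narrow that down.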
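
As a minimal sketch of a conversion that checks for a byte order mark before converting, the helper below picks the byte order from the BOM and strips it. The function name `utf16_to_utf8` is hypothetical (it is not part of PHP or of the original post), the mbstring extension is assumed to be enabled, and falling back to little-endian when no BOM is present is an assumption taken from the question.

```php
<?php
// Hypothetical helper (not from the original post): convert UTF-16 text to
// UTF-8, using the byte order mark, if present, to choose the byte order.
function utf16_to_utf8(string $bytes): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it so it does not end up in the output.
        $from = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif ($bom === "\xFE\xFF") {
        // Big-endian BOM: the file is not UTF-16LE after all.
        $from = 'UTF-16BE';
        $bytes = substr($bytes, 2);
    } else {
        // No BOM: assume little-endian, as the question does.
        $from = 'UTF-16LE';
    }
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}

// Self-check with the Czech letters from the question.
// This source file is assumed to be saved as UTF-8.
$czech = 'ěščřžýáíéťůúňĚŠČŘŽÝÁÍÉŮÚŤŇ';
$utf16le = "\xFF\xFE" . mb_convert_encoding($czech, 'UTF-16LE', 'UTF-8');
$roundTrip = utf16_to_utf8($utf16le);
var_dump($roundTrip === $czech);
```

With a real file, the input would come from something like `file_get_contents()` on the file path. If the converted text is echoed into a web page, the page also has to declare UTF-8, otherwise correctly converted letters can still display as the wrong characters.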
