I did a quick bit of research on Japanese character encodings and how functions in PHP handle the conversions between them.
The table below summarizes the results.
We can see the following:
- Although Shift-JIS (SJIS) is still the most common format in Japan, it is terrible at handling special “hankaku” (single-width) characters. It simply leaves out a lot of them, even ones that we would like to use quite frequently.
- The PHP `mb_convert_encoding` function gives up when it can’t find a matching character, and deletes the character. On the other hand, `iconv` does a pretty good job of finding a good substitute if we specify `//TRANSLIT`.
- Judging from the webpages that I can find on the subject, a lot of people seem to prefer `mb_convert_encoding` with the sjis-win encoding. This is a lousy solution if you are using special “hankaku” characters. It’s better to use `iconv` with CP932 encoding and `//TRANSLIT`. There is one snag with CP932 encoding with `//TRANSLIT`, and that is with regard to the “hankaku” yen character (“¥”). Converting to “yen” isn’t really a nice solution. You can see, however, that `//TRANSLIT` always converts to ASCII, and “yen” probably is the only way you can sensibly convert the ¥ mark. Otherwise, it’s a good idea to use the “zenkaku” (double-width) “¥”.
- The micro mark “µ” is not supported in Shift-JIS but the Greek mu “μ” is. Therefore, if you want to write a micro mark in Shift-JIS, you should use the Greek mu instead. Again, `iconv` with `//TRANSLIT` does the correct thing (converting it to “u”).
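
The points above can be tried out with a small sketch like the one below. It is only an illustration, not a definitive recipe: `normalizeForSjis` and `compareConversions` are hypothetical helper names that do not come from the table or from PHP itself, and the sketch assumes PHP 7 or later with the mbstring and iconv extensions enabled. The first helper swaps the micro mark for the Greek mu and the “hankaku” yen for the “zenkaku” yen before converting, and the second runs one string through each approach so the resulting bytes can be compared.

```php
<?php
// Hypothetical helper (not from the original post): replace characters that
// Shift-JIS lacks with close equivalents it does have, before converting.
function normalizeForSjis(string $utf8): string
{
    return strtr($utf8, [
        "\u{00B5}" => "\u{03BC}", // micro sign -> Greek small letter mu
        "\u{00A5}" => "\u{FFE5}", // hankaku yen sign -> zenkaku yen sign
    ]);
}

// Hypothetical helper: run the same UTF-8 string through each approach and
// return the resulting bytes as hex, or 'failed' if a conversion returns false.
function compareConversions(string $utf8): array
{
    $hex = function ($bytes) {
        return $bytes === false ? 'failed' : bin2hex($bytes);
    };

    return [
        'mb_convert_encoding, SJIS-win' =>
            $hex(mb_convert_encoding($utf8, 'SJIS-win', 'UTF-8')),
        // @ silences the notice iconv raises when a conversion fails.
        'iconv, CP932//TRANSLIT' =>
            $hex(@iconv('UTF-8', 'CP932//TRANSLIT', $utf8)),
        'normalize, then iconv' =>
            $hex(@iconv('UTF-8', 'CP932//TRANSLIT', normalizeForSjis($utf8))),
    ];
}
```

Calling `print_r(compareConversions("5\u{00B5}m \u{00A5}100"));` prints the three byte strings side by side. The exact bytes depend on the PHP version and on which iconv implementation PHP was built against, so no expected output is listed here.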