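PC Brand Images

I use Macs almost exclusively, but now and then I have to use Windows, so yesterday I went to buy a Windows PC (the one in the office is a model released in 2003, and it has really become painful to use).

With a large number of manufacturers lined up on the shelves, I ended up buying a used ThinkPad. While choosing, I was very conscious of brand image. So, to keep a record of the images that I hold of PC brands at this point in time, I am writing them down here.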

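Brand image of Japanese manufacturers

Not competitive on the world stage
Expensive
No telling how many more years they will last. Some have already closed up shop (NEC and Sony VAIO)

The exceptions here are Toshiba’s Dynabook, which has competed globally, and Panasonic’s light and rugged Let’s Note.

However, the brand name “Let’s Note”, in contrast to the no-nonsense sturdiness of the product, sounds far too flimsy in English, and I feel I would be too embarrassed to take one abroad.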

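Asian manufacturers

Cheap for the performance
Might be shoddy

“Might be shoddy” is not limited to Asian manufacturers; it applies to every Windows OEM caught up in the price war. With Asian manufacturers, however, the worry is stronger.

In particular, the ASUS display unit at Sofmap had a trackpad that seemed to be lifting off the case, which made me quite uneasy.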

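US manufacturers

DELL, HP and the like.

You only use one because the head office of your foreign-owned company issued it, right?
They hand everything off to Asia to build it cheaply, right?
They don’t really want to sell PCs, do they? (HP, for instance)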

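And then, ThinkPad

Looking at it this way, it is clear that hardly any Windows machine has a good brand image. Isn’t ThinkPad the one exception?

A product born in Japan!
The reassurance that they care about more than specs, as seen in their legendary devotion to the keyboard

For these reasons, I bought a used T410s with a Core i5. It was quite worn and starting to get rickety, but a ThinkPad still makes you think that it will outlast a brand-new ASUS bought on a whim.

To me, a brand is an implicit promise that says “we pay attention to the parts you cannot see, the parts beyond the specs”. That is why the quality of a brand sinks in gradually after the purchase. And you find yourself thinking, “I’m glad I bought this brand after all”.

With a cheap brand, you regret it after buying. You end up telling yourself, “well, it was cheap, so it can’t be helped”.

I think this difference is a big one.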

Internet Explorer 8, 9 usage decline is quite slow

With support for Windows XP ending in three weeks, we as web developers would hope for usage of Internet Explorer 8 (the newest version of IE that runs on Windows XP) to rapidly approach zero.

[Image: Support for Windows XP is ending]

Unfortunately, this doesn’t seem to be the case. Looking at statistics from StatCounter, it appears that IE8 usage is still 7-8% in the USA and Japan. Encouragingly, the pace of usage decline seems to be accelerating and we might reach almost zero within the year 2015.

[Figure: StatCounter browser versions (partially combined), US, monthly, February 2013 to February 2014]

[Figure: StatCounter browser versions (partially combined), Japan, monthly, February 2013 to February 2014]

What is more troublesome is the IE9 data. IE9 usage is declining and is already quite low at 5-7%. However, the pace of decline is quite slow and it looks like IE9 will be with us at least as long as IE8. This is probably due to corporations blocking automatic updating of Internet Explorer.

Analyzing StatCounter data at per-day resolution, we can see that before IE10 debuted, IE9 was used a lot during weekends. However, after IE10 was introduced and most consumer users shifted to IE10, corporate users remained on IE9. As a result, IE9 usage became more pronounced on weekdays.

[Figure: StatCounter browser versions (partially combined), US, daily, February 2012 to February 2014]

[Figure: StatCounter browser versions (partially combined), Japan, daily, February 2012 to February 2014]
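The weekday versus weekend comparison described above can be sketched in a few lines of Python. This is only an illustration of the method: the function name is my own, and the sample values are made up for the example, not taken from StatCounter.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_share):
    """Return (weekday_mean, weekend_mean) for a {date: share_percent} mapping."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekdays = [share for day, share in daily_share.items() if day.weekday() < 5]
    weekends = [share for day, share in daily_share.items() if day.weekday() >= 5]
    return mean(weekdays), mean(weekends)


# Hypothetical values for one week (Monday 3 Feb 2014 to Sunday 9 Feb 2014).
# These numbers are invented purely to exercise the function.
sample = {
    date(2014, 2, 3): 7.0,
    date(2014, 2, 4): 7.0,
    date(2014, 2, 5): 7.0,
    date(2014, 2, 6): 7.0,
    date(2014, 2, 7): 7.0,
    date(2014, 2, 8): 5.0,
    date(2014, 2, 9): 5.0,
}

weekday_mean, weekend_mean = weekday_weekend_means(sample)
print(f"weekday: {weekday_mean}%, weekend: {weekend_mean}%")
```

A browser whose weekday mean is clearly higher than its weekend mean is probably being used mostly at work.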

In summary, despite Windows XP not being supported after April this year, it looks like IE8 will still be with us, at least till the end of this year. IE9 also looks quite stubborn and since it’s on Windows Vista and 7, it’s unlikely that we will see it go away. We web developers will still have to support these legacy browsers for another year.

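Is the Japanese Love of the iPhone Unusual, or Normal?

Why do Japanese people love the iPhone?

On January 15, 2014, Kantar Japan reported that the iPhone’s share in Japan was 69.1%, overwhelming Android’s 30%. The question of why Japanese people love the iPhone so much then became a hot topic online (for example, J-Cast: “Why do Japanese people like the iPhone so much? Is it because users have low IT literacy?”).

The reason was clear. In Japan the price of the iPhone was subsidized, so there was no price difference between it and Android handsets. In some cases the iPhone was actually cheaper than an Android phone.

“At the same price, most people would buy the iPhone, wouldn’t they?”

That is all there is to it.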

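Of course, other factors can be considered. The reasoning is quite a stretch, but the J-Cast article above, for example, also lists the following factor.

When asked, Satoshi Endo, chief researcher at KADOKAWA ASCII Research Laboratories, listed possible factors. First, he pointed out that Japanese users have lower IT literacy than users in Europe and the US. The grounding in IT education in Western schools is said to be beyond comparison with Japan’s. He therefore speculated that smartphone beginners may gravitate towards the iPhone, which is relatively easy to operate.

Well, however you look at it, this is an outrageous argument that can only be called far-fetched.

Many other things are being said as well, and arguments that are just as full of holes are going unchallenged.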

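In fact, Chinese people love the iPhone just as much as Japanese people do

Just the other day, Umeng, an app analytics company that is strong in China, released a report on smartphone and tablet usage trends. In it, they state the following.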

High-end smart phones (pricing above 500US$) have a significant market share in China, contributing 27% of total devices.

… 80% of these are iPhone.

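In other words, among Chinese people affluent enough to buy a smartphone that costs more than 500 US$, 80% buy an iPhone. In a nutshell,

“If they can afford it, most people will buy an iPhone, won’t they?”

That is all there is to it.

In Japan the iPhone is effectively 0 yen, so “if they can afford it” is a condition that every smartphone user satisfies.

So Japan’s iPhone share of 69.1% and the 80% share among high-end smartphone users in China can be said to be measuring the same thing.

For both China and Japan, we can conclude that

“if they can afford it, 70-80% of people will buy an iPhone”.

Data is hard to come by, but I suspect that the figures seen in Japan and China are probably the same in most countries of the world. In other words, the Japanese love of the iPhone is not an unusual phenomenon, much less a Galápagos one. I think it is universal.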

The OS for Wearable Devices (Android Not)

Google is releasing an Android SDK for wearables this month (March, 2014).

So what is their vision for wearables? The example that Pichai reportedly gave is a “smart jacket” with sensors.

Seriously?

The only wearables that I know of that are currently succeeding in the mass market are the fitness trackers: the Nike FuelBands and the Jawbones. NPD has reported that the market for digital fitness devices was $330 million. Given the price of these devices, it looks like millions have been sold.
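As a rough back-of-the-envelope check of “millions have been sold”, here is the arithmetic in Python. The $330 million figure is the NPD number quoted above; the price range of $100 to $150 per device is my own assumption for illustration, not a number from NPD.

```python
MARKET_USD = 330_000_000  # market size reported by NPD, as quoted above


def units_sold(market_usd, unit_price_usd):
    """Estimate units sold by dividing market size by an average unit price."""
    return market_usd / unit_price_usd


# Assumed average prices in USD; these are illustrative guesses only.
low_estimate = units_sold(MARKET_USD, 150)
high_estimate = units_sold(MARKET_USD, 100)

print(f"{low_estimate / 1e6:.1f} to {high_estimate / 1e6:.1f} million units")
```

Under those assumptions the estimate lands between roughly two and three million units.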

So the question is, does the FuelBand run Android? Does it run Linux?

The answer lies in the hardware that enables them to be small enough to comfortably fit on your wrist and last a full day on a single battery charge. It looks like the CPU is an ultra-low power ARM Cortex-M3 with 256 Kbytes flash (hacknikefuelband.com).

Not really enough to run Linux or Android.

Even the Pebble smartwatch, which is a Bluetooth-connected notification center, uses a non-Linux OS (FreeRTOS) according to Wikipedia.

Simply put, the hardware that would comfortably fit on your wrist cannot run Android yet. Pichai is right; you need something jacket-sized.

64-bit Android

In September, 2013, just after the iPhone 5s was announced, I wrote that we would be able to gauge Google’s commitment to the high-end based on when the 64-bit version of Android would be released. I commented that Google might not prioritize 64-bit, mainly because their focus has shifted to the low-end with the departure of Andy Rubin.

Until now, I had not heard any credible reports on when a 64-bit version of Android would be available. Now, on March 11th, ABI Research reports that “the first 64-bit version of Android OS is expected in the second half of the year”.

At this point, there is no way of knowing how accurate ABI Research’s prediction is. There is also no way of knowing if Android and ARM’s 64-bit implementation will deliver a significant performance improvement like Apple’s A7 chip did, or whether the gain will be rather insignificant as most industry pundits claimed when the A7 was announced.

All I can say is that we don’t know yet.

Writing No-Framework ASP.NET (part 3: reusing code)

In my previous two articles on writing a no-framework ASP.NET file, I described how ASP.NET handles encoding and how a .aspx file should be structured.

In my third article, I will discuss how we can make code reusable.

Making code reusable in classic ASP, PHP, etc.

Here, I will use the example that I showed in my second article and discuss how we could reuse the link method.

In PHP, you reuse code by placing it in a separate file and using the include statement to pull it into all the files where you want to use it.
[php]
include 'file_that_defines_the_link_method.php';
[/php]

You can use this (and variants like require) to reuse libraries, or to reuse HTML fragments (e.g. common headers and footers of the pages in the website).

In classic ASP, you use the include syntax to the same effect.
[html]
<!-- #include file="file_that_defines_the_link_method.asp" -->
[/html]

ASP.NET is different. There isn’t a single method to reuse code; instead there are several ways that each have their own intended use-cases. Coming from a PHP background, it was quite confusing; I didn’t know which method I should be using. This article should give you a good idea of the options that you have.

The important thing is that ASP.NET makes a distinction based on what kind of code you intend to reuse. If you are going to reuse libraries which contain functions, then you should put them in App_Code. On the other hand, if you want to reuse HTML fragments, then you should use user controls.

Reusing functions

The App_Code directory must be created at the root of your web site. All files in this directory or any subdirectories that have the extension .vb or .cs will be treated as Visual Basic or C# files respectively.

The code in App_Code will be available in every ASP.NET file that is in this application. Hence this is the ideal location to store libraries. In our case, it’s a good place to define the link function.

The downside to App_Code is that the files are pure Visual Basic or C#. This means that you can’t simply write HTML code in the files. You have to generate it as strings (and there is no HEREDOC syntax in Visual Basic either). Hence, the App_Code folder is not a good place to put “partials”, i.e. large fragments of reusable HTML code.

For our example, we create the following link.vb file in the App_Code directory.

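[vbnet]
Imports Microsoft.VisualBasic
Imports System.Collections
Imports System.Web

' Visual Basic functions must live inside a Module or a Class.
' The module name is arbitrary; its public members can be called unqualified.
Public Module LinkHelper

    ' URL-encode a string using its UTF-8 bytes.
    Private Function urlEncodeUtf8(myString As String) As String
        Return HttpUtility.UrlEncode(myString, New System.Text.UTF8Encoding())
    End Function

    ' Build an <a> tag; query is an array of {name, value} pairs.
    Public Function link(label As String, endpoint As String, query As String(,)) As String
        Dim tuples As New ArrayList()
        Dim i As Integer
        For i = LBound(query) To UBound(query)
            tuples.Add(query(i, 0) & "=" & urlEncodeUtf8(query(i, 1)))
        Next
        Dim href As String = "http://api.example.com/api.php?ep=" & endpoint & "&" & Join(tuples.ToArray(), "&")
        Return "<a href=""" & href & """>" & label & "</a>"
    End Function

End Module
[/vbnet]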
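For readers who do not have IIS at hand, the logic of the link function can be sketched in Python and run anywhere. This is a rough equivalent, not a port: it takes the query as a list of (name, value) tuples, it encodes spaces as %20 where HttpUtility.UrlEncode would emit “+”, and it additionally HTML-escapes the href, which the Visual Basic version above does not do.

```python
from html import escape
from urllib.parse import quote


def link(label, endpoint, query):
    """Build an <a> tag whose query values are percent-encoded as UTF-8.

    `query` is a list of (name, value) tuples, mirroring the two-column
    String(,) array used in the Visual Basic version.
    """
    pairs = [f"{name}={quote(value, encoding='utf-8')}" for name, value in query]
    href = f"http://api.example.com/api.php?ep={endpoint}&" + "&".join(pairs)
    # Escaping "&" as "&amp;" is an addition of this sketch.
    return f'<a href="{escape(href, quote=True)}">{escape(label)}</a>'


print(link("Bovine（ウシ）", "endpoint.php", [("reactivity", "Bovine（ウシ）")]))
```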

Then we simply use the link function as we did in our previous article.

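[vbnet]
<%= link("Bovine（ウシ）", "endpoint.php", New String(,) {{"type", "Whole IgG"}, {"reactivity", "Bovine（ウシ）"}, {"title", "Bovine Whole IgG secondary-antibodies"}}) %>
[/vbnet]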

Reusing HTML fragments

Another way to reuse code in ASP.NET is User Controls.
Instead of putting the link function in the App_Code directory, let’s see how we can reuse it with User Controls.

To implement the link function using User Controls, we would create the following control link.ascx file and place it in a “Controls” directory that we create at the root of the web site. Unlike the “App_Code” directory which always has to be at the root-level of the web site and named exactly “App_Code”, the location and name of the “Controls” directory is arbitrary because we specify the file path whenever we use our controls.

[vbnet]
<%@ Control Language="VB" %>
<script runat="server">
' code for "~/Controls/link.ascx"

Public label As String
Public endpoint As String
Public query As String(,)
Protected href As String

Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs)
    link(label, endpoint, query)
End Sub

Private Function urlEncodeUtf8(myString As String) As String
    urlEncodeUtf8 = HttpUtility.UrlEncode(myString, New System.Text.UTF8Encoding())
End Function

' Builds the URL and stores it in href for the markup below.
Public Sub link(label As String, endpoint As String, query As String(,))
    Dim tuples As New ArrayList()
    Dim i As Integer
    For i = LBound(query) To UBound(query)
        tuples.Add(query(i, 0) & "=" & urlEncodeUtf8(query(i, 1)))
    Next
    href = "http://api.example.com/api.php?ep=" & endpoint & "&" & Join(tuples.ToArray(), "&")
End Sub
</script>

<a href="<%= href %>"><%= label %></a>
[/vbnet]

Then, to use this control in a page, we would do the following;
[vbnet]
‘…

‘…

[/vbnet]

The Src attribute of the Register directive declares where the source code for the user control exists (“~/Controls/link.ascx”, where “~” indicates the root level of the web site). TagPrefix and TagName are used to define the name of the tag we will use for this control. This tag indicates where the user control will be inserted.

The control tag is where the link is going to be inserted in the resulting HTML. You can see that we are providing the label and the endpoint values using attributes. Notice that label and endpoint are defined as publicly accessible fields in the link.ascx user control.

Ideally, we would like to send the value for the query attribute to the user control in the same manner. However, it is not possible to send complex values as attributes. You have to do this programmatically on the user control object which is available as the link_1 variable (specified in the id attribute of the tag). That’s why we set the value of link_1.query on a separate line and not inside the tag.

This code example illustrates the strengths and the weaknesses of user controls. User controls are good if you have a lot of HTML that you want to output because you can simply write down the HTML in the .ascx file. They are also good if the values that you want to pass in are simple. Hence they are ideal for header and footer fragments on your web page.

On the other hand, our example mostly exposes the weaknesses of user controls. The link example outputs only a short fragment of HTML, so the strengths of user controls are not being used. Moreover, passing in complex values is cumbersome, so user controls are not a good idea if you need to do a lot of this.

If your intention was to reuse large fragments of HTML, user controls would definitely be the way to go.

Summary of the whole project

In my series “Writing No-Framework ASP.NET”, I wrote about my experience in writing simple ASP.NET code to add a few features to otherwise static HTML files.

Coming from a PHP background, these were the things to look out for;

ASP.NET will do charset encoding and decoding in the background. Make sure that it doesn’t meddle with your encodings in a way that you don’t expect it to.
ASP.NET forces you to put different parts of your code in different places. Know where you should put your function definitions and your string output statements.
Passing complex arguments to functions in a terse way is not common in most code examples that you can find. Compared to PHP, Ruby, etc., it can be downright ugly. It is possible to make it acceptable though.
There are different ways to reuse code based on whether you want to share libraries or whether you want to share HTML fragments. This can be confusing.

In closing, I would like to review an occasion where this No-Framework approach would be necessary. A typical case would be the following;

The original website is built with static HTML and is hosted on IIS.
The IIS server is run with default settings by a cautious administrator. The most we can do is persuade him to activate ASP.NET or ASP. We cannot persuade him to install non-Microsoft extensions like PHP.
We only need to add some simple code to pre-existing static HTML pages to make them easier to manage, or to pull-in some content from an API.
After setting the code up, the web designer will edit the file more frequently than you do, using an editor that is completely out of your control. The encoding of the file and the encoding of the output is their decision, not the programmer’s.
Maybe we need some simple single-page stuff like contact forms.
You, the programmer, mostly spend your time in PHP, Ruby, Python and other cool web languages. You’re worried that even if you managed to learn enough ASP.NET to use webform postbacks, you’ll forget how to do it in half-a-year. You don’t want to use postbacks. You want to write code that resembles PHP code.

I’m sure that this is quite a common situation but I couldn’t find web sites that guided me through the steps. I hope that this series will help others in a similar situation.

Writing No-Framework ASP.NET (part 2: code structure)

Basic ASP.NET web page structure

This is the second article in my series on writing a simple ASP.NET web page that resembles a simple PHP file. My first article discussed how encodings are handled in ASP.NET.

In this second article, I will discuss the code structure.

The Single-File Page Model structure

Microsoft describes the single-file page model where the page’s markup and the programming code are in the same physical .aspx file. This is exactly the same model that ASP.NET’s predecessor, classic ASP, was built upon, and it is also the model that PHP uses.

In my first article, I described my requirements as being quite simple. All I needed to do was to add simple functionality to a pre-existing web page (static HTML file). The single-file page model was the obvious place to start.

The following is a bird’s-eye view of Microsoft’s example;

[vbnet]
<%@ Page Language="VB" %>
<script runat="server">
' Controller code to set the text on Label1
' in response to a click on Button1
</script>
<html>
' View code in plain HTML with tags indicating the locations
' of Label1 and Button1
</html>
[/vbnet]

Let’s see how I added the functionality that I required onto this structure.

Encoding links easily

My objective is to create a helper function that generates url-encoded links.

Suppose I want to have links to a query results page. The query that we want to send is;

[ruby]
reactivity: “Bovine（ウシ）”
[/ruby]

Using an online url encoder, we can see that “Bovine（ウシ）” should be url encoded to “Bovine%ef%bc%88%e3%82%a6%e3%82%b7%ef%bc%89” (if we use UTF-8). Hence the link should look like the following;

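[html]
<a href="http://api.example.com/api.php?ep=endpoint.php&reactivity=Bovine%ef%bc%88%e3%82%a6%e3%82%b7%ef%bc%89">Bovine（ウシ）</a>
[/html]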
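If you would rather not rely on an online encoder, the same percent-encoding can be checked with a few lines of Python. Note that Python emits upper-case hex digits where the example above uses lower-case ones; both forms are equivalent in a URL. The Shift-JIS line is included only to show that the result depends on the charset you pick.

```python
from urllib.parse import quote

text = "Bovine（ウシ）"

# Percent-encode the UTF-8 bytes of the string.
utf8_encoded = quote(text, encoding="utf-8")

# The same text encoded from its Shift-JIS bytes gives a different result.
sjis_encoded = quote(text, encoding="shift_jis")

print(utf8_encoded)
print(sjis_encoded)
```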

We could always use the online url encoder and then paste the results to a static HTML file. However it would be difficult to manage the file because the urlencoded URL is incomprehensible to human beings.

What we need is a link function that will take non-ASCII arguments and create the url encoded link for us.

The code that I came up with is shown here;

[vbnet]
<script runat="server">
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs)
' initialization stuff to do immediately after page load
End Sub

Private Function urlEncodeUtf8(myString As String) As String
urlEncodeUtf8 = HttpUtility.UrlEncode(myString, New System.Text.UTF8Encoding())
End Function

'…
</script>
[/vbnet]

Visual Basic syntax itself is quite simple and easy to grasp. However, without an ASP.NET or Visual Basic background, there were quite a few concepts that I initially had difficulty with.

Code/Render blocks in ASP.NET

In PHP, you have <?php ?> tags that you can put anywhere in the page. Inside these tags, you place PHP code. There is no limit to what code you can insert and all the tags function identically. However, this is not the case for ASP.NET. ASP.NET .aspx files have distinct blocks of code that are called code blocks and render blocks. These two are intended to be used in distinct ways.

The code blocks are the regions surrounded by <script runat="server"> tags. Here you define global variables and functions. On the other hand, you cannot directly write code that will be executed; you can only write definitions. If you want to write code that will be executed, you have to write it inside the Page_Load function which is the event handler that is called immediately after the page is loaded.

Render blocks are surrounded by <% %> or <%= %> tags and are executed when the page is rendered. At first glance, they look exactly like the tags in classic ASP or the <?php ?> tags in PHP; you embed code inside HTML. There is a big difference though. You cannot declare functions or subroutines inside these render blocks. You can however declare variables.

So basically, you define your functions in the code blocks and render your results in render blocks. I suppose the idea is to make sure that you don’t mix too much code with your HTML.

In my example, we defined all our functions (Page_Load, urlEncodeUtf8 and link) at the top of the page inside a code block (<script>). In the render block (<%= %>), we simply called the link function.

Passing complex data as arguments in Visual Basic

For the link function, we want to pass the query as a data structure (not a string). In PHP, the natural choice would be to use associative arrays. Hence the syntax for calling the link function would look like this;

[php]
<?php echo link("Bovine（ウシ）", "endpoint.php",
    array("type" => "Whole IgG",
          "reactivity" => "Bovine（ウシ）",
          "title" => "Bovine Whole IgG secondary-antibodies")) ?>
[/php]

PHP 5.4 has added a short array syntax which makes it even more convenient to write the arguments for this function;

[php]
<?php echo link("Bovine（ウシ）", "endpoint.php",
    ["type" => "Whole IgG",
     "reactivity" => "Bovine（ウシ）",
     "title" => "Bovine Whole IgG secondary-antibodies"]) ?>
[/php]

Ruby also has a simple syntax for associative arrays or hashes;

[ruby]
<%= link "Bovine（ウシ）", "endpoint.php",
    "type" => "Whole IgG",
    "reactivity" => "Bovine（ウシ）",
    "title" => "Bovine Whole IgG secondary-antibodies" %>
[/ruby]

Ruby 1.9 took this one step further and made it this simple;

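[ruby]
<%= link "Bovine（ウシ）", "endpoint.php",
    type: "Whole IgG",
    reactivity: "Bovine（ウシ）",
    title: "Bovine Whole IgG secondary-antibodies" %>
[/ruby]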

I am emphasizing the terseness of the argument syntax because this code is going to sit inside the HTML. Verbose code would completely interrupt the HTML and make it harder to understand. Hence if we are going to insert Visual Basic code into the view at all, we should try to make it simple.

Now Visual Basic has collections like ArrayList, Hashtable, SortedList, NameValueCollection and others. A Hashtable or a NameValueCollection would be ideal for the arguments that we want to pass. In fact, the Request.QueryString property that is used to retrieve GET parameters returns a NameValueCollection.

The problem is that the NameValueCollection and all the other collections do not have a terse syntax for populating it with values. There apparently is a new From keyword available from .NET Framework 4 that addresses this, but it doesn’t seem to be widely used.

If we were to use a NameValueCollection, the traditional syntax (without “From”) would be like this;

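[vbnet]
<%
Dim query As New NameValueCollection()
query.Add("type", "Whole IgG")
query.Add("reactivity", "Bovine（ウシ）")
query.Add("title", "Bovine Whole IgG secondary-antibodies")
%>
<%= link("Bovine（ウシ）", "endpoint.php", query) %>
[/vbnet]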

This is totally ridiculous.

The exception is a Visual Basic Array. Visual Basic provides a relatively terse syntax to initialize arrays.

[vbnet]
Dim myArray() As String = {"first_element", "second_element"}
[/vbnet]

and you can do multi-dimensional arrays like so;

[vbnet]
Dim myArray(,) As String = {{"1-1", "1-2"}, {"2-1", "2-2"}}
[/vbnet]

Because the only terse way was to use multi-dimension arrays, we ended up using the following argument structure for the link function.

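[vbnet]
<%= link("Bovine（ウシ）", "endpoint.php", New String(,) {{"type", "Whole IgG"}, {"reactivity", "Bovine（ウシ）"}, {"title", "Bovine Whole IgG secondary-antibodies"}}) %>
[/vbnet]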

To sum up, we gave up on the more desirable collections like NameValueCollection because it would be ridiculously verbose. Instead we used multi-dimensional arrays. I would say that the result is pretty OK, but it is troubling how Visual Basic seems to be indifferent to making code concise.

Solution for .NET 4.0

If we are on .NET 4.0, we can use the From keyword to use collections and still keep the code concise (official documentation);

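[vbnet]
<%= link("Bovine（ウシ）", "endpoint.php", New NameValueCollection From {{"type", "Whole IgG"}, {"reactivity", "Bovine（ウシ）"}, {"title", "Bovine Whole IgG secondary-antibodies"}}) %>
[/vbnet]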

This is much better, but unfortunately doesn’t work on .NET 2.0. It’s unfortunate that Microsoft doesn’t promote this more on their documentation for the collection objects, because I think it’s a very important feature. I was initially bewildered that the NameValueCollection class couldn’t take a Visual Basic Array as an argument on its constructor method, but I suppose that that was due to compatibility concerns with C#. The corresponding syntax in C# is simpler and actually looks as if we are passing an array to the constructor, although that isn’t what is happening behind the scenes.

Summary

In this article, I showed how we used the single-file page structure to add simple code to a web page.

Although this was a very simple exercise, we learnt that we can’t place functions anywhere we want in ASP.NET. We have to place them inside a code block. We also learnt that sending complex arguments to a function can be a bit difficult.

In the next article, I will show how to reuse code for the link function in other pages, which was again quite a surprise for me.

Writing No-Framework ASP.NET (part 1: encodings)

For the past week, I was writing some code in ASP.NET that I wanted to run on a Microsoft IIS server. Here I want to post what I learnt and what the structure of my code looked like.

First, my requirements.

I didn’t want to code a full web application. All I needed to do was to add a few dynamic elements to otherwise static web pages written as HTML files.
I wanted to modify the existing pages as little as possible.
I didn’t want a solution that would require installing plugins to the IIS server. This meant that I couldn’t use PHP and that I had to code in ASP.NET.
I didn’t want to learn a lot to do this. Having to use an IDE or having to learn a framework was completely out of the question.

These are really simple requirements. However, the solution turned out to be quite complicated ranging from code reuse to character encoding. The following is an outline of what I will discuss.

Understanding how ASP.NET handles source file encoding. (part 1)
Basic ASP.NET web page (.aspx) structure. (part 2)
Making Visual Basic function calls as terse as possible in the view code. (part 2)
Ways to reuse code in ASP.NET. (part 3)

I’ll start with understanding source file encoding and then describe the others in separate articles.

The `.aspx` files

The simplest way to code ASP.NET web sites is to use ASP.NET web pages; .aspx files. These are similar to PHP .php pages and you can include both code and HTML markup into a single file.

The .aspx files are compiled before being run in IIS.

An important thing is that all strings in .NET are represented in Unicode. Strings that are not in Unicode are not allowed. This is very different from PHP. In PHP, strings are a simple collection of bytes. It is up to the programmer to keep track of which charset each string is encoded in.

This means that any .aspx files that are not in Unicode have to be converted from their original encoding on compilation. This includes all the hard-coded HTML strings. Also, if the output from the server is going to be non-Unicode, we have to do that conversion (internal Unicode to output encoding) as well.

Example of how encoding of an `.aspx` file would happen

Since a large number of websites in Japan still use Shift-JIS encoding, let’s assume that we are working with a Shift-JIS encoded website. Hence the source files that we want to add dynamic ASP.NET code are in Shift-JIS. We also want the HTML output to be in Shift-JIS so that all web pages on the site (both static HTML pages and .aspx pages) have the same charset.

In this scenario, character encoding conversion of .aspx files would happen in the following manner;

Shift-JIS encoded .aspx source files are converted to Unicode on compile
↓
Code inside the .aspx source files is run as Unicode
↓
Output is converted back to Shift-JIS
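The same three steps can be imitated in Python, which also keeps strings in Unicode internally. This is only an analogy to make the flow concrete; the render function, the {name} placeholder and the two constants are inventions of this sketch, and “cp932” is Python’s name for the Windows flavour of Shift-JIS.

```python
SOURCE_ENCODING = "cp932"    # plays the role of fileEncoding
RESPONSE_ENCODING = "cp932"  # plays the role of responseEncoding


def render(source_bytes, name):
    """Decode the source, run the 'code' on Unicode text, encode the output."""
    template = source_bytes.decode(SOURCE_ENCODING)  # step 1: source to Unicode
    page = template.replace("{name}", name)          # step 2: work in Unicode
    return page.encode(RESPONSE_ENCODING)            # step 3: Unicode to output


# A tiny "source file" stored as Shift-JIS bytes.
source = "<p>{name}</p>".encode("cp932")
output = render(source, "ウシ")

print(output)
```

If SOURCE_ENCODING does not match the bytes actually on disk, step 1 already produces the wrong characters, which is why the settings in the following sections matter.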

With this in mind, let’s look at how to configure stuff to ensure that the encoding happens correctly.

ASP.NET configuration hierarchy

Before going into the charset configuration, I want to briefly touch on how ASP.NET web applications are configured. The configuration hierarchy is quite complex. Compare this to PHP where you basically have one php.ini file to configure all PHP instances, and the Apache .htaccess file where you can put additional settings. In both ASP.NET and PHP, you can additionally change settings inside the application (.aspx or .php).

However, even with ASP.NET’s complex hierarchy, you will probably only have to worry about the web.config file. You basically place a web.config file at any location in your web application’s file hierarchy, and that file will change the settings for that directory and any subdirectories. It works like Apache .htaccess.
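The “nearest file wins” behaviour can be modelled with a small Python sketch. This is a deliberately simplified model of the idea, not how ASP.NET is implemented: it ignores machine-level configuration, and the dictionary of per-directory settings and the function name are made up for the illustration.

```python
from pathlib import PurePosixPath


def effective_settings(configs, directory):
    """Merge settings from the site root down to `directory`; deeper ones win."""
    path = PurePosixPath(directory)
    # Build the chain of directories, root first: "/", "/api", "/api/v1", ...
    chain = [path, *path.parents][::-1]
    merged = {}
    for folder in chain:
        merged.update(configs.get(str(folder), {}))
    return merged


# Hypothetical per-directory settings, standing in for web.config files.
configs = {
    "/": {"fileEncoding": "shift_jis", "responseEncoding": "shift_jis"},
    "/api": {"responseEncoding": "utf-8"},
}

print(effective_settings(configs, "/api/v1"))
print(effective_settings(configs, "/docs"))
```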

Telling ASP.NET what charset the source code files (`.aspx` files) are encoded in

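Configurations for encoding are set in the globalization element of a web.config file with the fileEncoding attribute.

[xml]
<configuration>
  <system.web>
    <globalization fileEncoding="shift_jis" />
  </system.web>
</configuration>
[/xml]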

If fileEncoding is not specified in the configuration hierarchy, the system encoding of the server is used. For machines running a Japanese OS, this would be Shift-JIS.

A list of possible encodings is provided by Microsoft. For Japanese encodings, we have “utf-8” (code page 65001), “shift_jis” (CP932: code page 932) and “EUC-JP” (code page 20932). These encodings are not pure Shift-JIS or pure EUC-JP but have extensions for Windows.

Now that I’ve talked about how to set the ASP.NET configuration for fileEncoding, let’s see how this affects charset conversion.

The rules are as follows;

If the .aspx source code file has a Unicode BOM, then the file is considered to be in the Unicode encoding as described in the BOM.
If there is no BOM, then the file is considered to be in the encoding as configured in the ASP.NET settings (i.e. web.config, etc.).
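The two rules above amount to a simple decision that can be sketched in Python. This is my own simplified illustration of the rule, not ASP.NET’s actual implementation: it only recognises UTF-8 and UTF-16 BOMs, and the function names and the “cp932” default are assumptions of the sketch.

```python
import codecs


def source_encoding(raw, configured):
    """Pick the encoding for source bytes: BOM first, then the configured value."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and strips the BOM
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"     # reads the BOM to find the byte order
    return configured       # no BOM: fall back to the configured fileEncoding


def read_source(raw, configured="cp932"):
    """Decode source bytes into Unicode text using the rule above."""
    return raw.decode(source_encoding(raw, configured))


with_bom = codecs.BOM_UTF8 + "ウシ".encode("utf-8")
without_bom = "ウシ".encode("cp932")

print(read_source(with_bom), read_source(without_bom))
```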

Telling ASP.NET what charset the HTML output should be encoded in

ASP.NET internally manages the .aspx file contents in Unicode, and converts them to the responseEncoding before it sends the response to the client browser. ASP.NET also sets the “Content-type: text/html; charset=???” HTTP header to responseEncoding.

ASP.NET however does not set the <meta charset=???> tag inside the HTML <head> element. You have to manage this yourself.

ASP.NET uses the following locations to set responseEncoding.

web.config

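[xml]
<configuration>
  <system.web>
    <globalization responseEncoding="shift_jis" />
  </system.web>
</configuration>
[/xml]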

The @ Page directive
.aspx files contain a @ Page directive to set page-specific attributes. You can set responseEncoding here with the following syntax;

[vbnet]
<%@ Page Language="VB" ResponseEncoding="shift_jis" %>
[/vbnet]

The Page object
You can also set responseEncoding on the Page object directly in code.

[vbnet]
Page.responseEncoding = 932
[/vbnet]

Telling ASP.NET what charset the request parameters are encoded in

In addition to setting the charset of the source file (fileEncoding) and setting the charset of the HTML output (responseEncoding), ASP.NET has another charset that you can specify. That is the charset of the request (requestEncoding).

This setting affects how the query-string data and the data coming in from POST requests is interpreted by the ASP.NET server. You set this in web.config like so;

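[xml]
<configuration>
  <system.web>
    <globalization requestEncoding="utf-8" />
  </system.web>
</configuration>
[/xml]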

The default value is “utf-8”.

The charset used by browsers to send queries and post data is a complicated issue. Ruby-on-Rails for example, adds an extra parameter to ensure that all data is in UTF-8.

Microsoft’s documentation suggests that requestEncoding should be set to the same charset as responseEncoding for single-server applications. Of course this depends on how ASP.NET servers work, but in general I don’t think this is a good idea. I think requestEncoding should be set to “utf-8” regardless of the responseEncoding (the charset of the HTML output), and this is also how Ruby-on-Rails does it.
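To see why the request charset matters, here is a small Python sketch of what happens when a query string is decoded with the wrong charset. It illustrates the general problem rather than ASP.NET’s internals; the parameter name “reactivity” is borrowed from the examples in this series.

```python
from urllib.parse import parse_qs, quote

# A browser sending the value as percent-encoded Shift-JIS bytes.
query_sjis = "reactivity=" + quote("ウシ", encoding="shift_jis")

# Server decodes with the charset the browser actually used.
right = parse_qs(query_sjis, encoding="shift_jis")

# Server assumes UTF-8; invalid bytes are replaced with U+FFFD by default.
wrong = parse_qs(query_sjis, encoding="utf-8")

print(right)
print(wrong)
```

The second result contains replacement characters instead of the original text, which is the kind of mojibake a mismatched requestEncoding would produce.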

Encoding settings for a Shift-JIS encoded website

Let’s go back to the settings that would be required if we were working on a Shift-JIS encoded website. The requirements are;

.aspx source files are encoded in Shift-JIS.
HTML output is in Shift-JIS.

Then the web.config file should look like

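[xml]
<configuration>
  <system.web>
    <globalization fileEncoding="shift_jis" requestEncoding="utf-8" responseEncoding="shift_jis" />
  </system.web>
</configuration>
[/xml]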

I arbitrarily set requestEncoding to “utf-8”. This is how I would set up the system, but it really depends on how your server decodes requests. It does not affect the HTML output from your .aspx files.

Summary

This was the first of my series on working with ASP.NET. It dealt with how ASP.NET handles source file encoding. ASP.NET has to encode the whole source file (.aspx files) into Unicode even before a programmer touches the code, and that is why you need a setting. This is also why you have to specify the output encoding.

PHP doesn’t meddle with string encodings in the source files. Encodings are performed on a per-function basis and the programmer is responsible for managing conversion. In practice, programmers will convert request parameters and output similar to ASP.NET. However, programmers will seldom touch the hard-coded HTML strings in the source files. The idea of converting the encoding of hard-coded HTML is quite surprising and it was a shock that ASP.NET does this in the background.

The ASP.NET way is not inherently a good or bad idea, but it can cause issues when you are simply adding dynamic content to a pre-existing website. You need to make sure that your settings align with the encodings and the workflows of your colleagues who might edit your .aspx files with various editors.

Coming from a web-development background, the ASP.NET way is certainly alien.

Notes on Character Encoding Conversions

I did a quick bit of research on Japanese character encodings and how functions in PHP handle the conversions between them.

The table below summarizes the results.

We can see the following;

Although Shift-JIS (SJIS) is still the most common format in Japan, it is terrible at handling special “hankaku” (half-width) characters. It simply leaves out a lot of them, even ones that we would like to use quite frequently.
The PHP mb_convert_encoding function gives up when it can’t find a matching character, and deletes the character. On the other hand, iconv does a pretty good job of finding a good substitute if we specify //TRANSLIT.
Judging from the web pages that I could find on the subject, a lot of people seem to prefer mb_convert_encoding with the sjis-win encoding. This is a lousy solution if you are using special “hankaku” characters. It’s better to use iconv with CP932 encoding and //TRANSLIT. There is one snag with this combination, and that is the “hankaku” yen character (“¥”), which is converted to the string “yen”. That isn’t really a nice solution. You can see, however, that //TRANSLIT always converts to ASCII, and “yen” is probably the only sensible way to convert the ¥ mark. Otherwise, it’s a good idea to use the “zenkaku” (full-width) “￥”.
The micro mark “µ” is not supported in Shift-JIS but the Greek mu “μ” is. Therefore, if you want to write a micro mark in Shift-JIS, you should use the Greek mu instead. Again, iconv with //TRANSLIT does the correct thing (converting it to “u”).
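The micro mark versus Greek mu point can be checked with a few lines of code. The sketch below uses Python purely for illustration (it is not the PHP setup used for the table), and it assumes that Python's built-in `shift_jis` codec is a fair stand-in for Shift-JIS as discussed here. The helper names `can_encode` and `micro_to_mu` are hypothetical.

```python
MICRO_SIGN = "\u00b5"  # the micro mark
GREEK_MU = "\u03bc"    # the Greek small letter mu


def can_encode(text, encoding="shift_jis"):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def micro_to_mu(text):
    """Swap the micro mark for the Greek mu, which Shift-JIS can represent."""
    return text.replace(MICRO_SIGN, GREEK_MU)


sample = "5 \u00b5m"
print(can_encode(sample))               # False: the micro mark is not in Shift-JIS
print(can_encode(micro_to_mu(sample)))  # True: the Greek mu is
```

Doing the substitution explicitly before conversion keeps the symbol readable, whereas a transliteration to ASCII turns it into a plain “u”.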

Beware of Character Encoding during Cut & Paste of Websites

The issue is very simple:

Do not assume that a website written in “Shift-JIS” will only contain characters that can be represented in “Shift-JIS”.

If you copy any characters that cannot be represented in “Shift-JIS” and paste them into another web page also coded in “Shift-JIS”, it may generate garbled text (mojibake).

For example, you may copy text from a website coded in Shift-JIS that uses character references such as &trade;, &copy; or &reg; to display ™, © or ®. These characters are not available in Shift-JIS. If you paste them into your own web page, which is also in Shift-JIS, you will see garbled text or a “?” sign (mojibake).
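One way to catch this before it reaches a page is to scan pasted text for characters that the target encoding cannot represent. The following is a minimal sketch in Python, used here only for illustration. It assumes Python's built-in `shift_jis` codec matches the encoding of your page, and `unencodable_chars` is a hypothetical helper name.

```python
def unencodable_chars(text, encoding="shift_jis"):
    """Return the distinct characters in text that encoding cannot represent."""
    missing = []
    for ch in text:
        if ch in missing:
            continue  # report each offending character only once
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Text copied from a browser: trademark, copyright and registered signs.
pasted = "Acme\u2122 \u00a9 2014 \u00ae"
print(unencodable_chars(pasted))  # ['™', '©', '®']
```

An empty list means the text is safe to save in that encoding. Anything else needs to be replaced or written as a character reference.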

Another example is if the page has an element that is loaded via Ajax. The Ajax payload will be handled by JavaScript, which will treat the payload as Unicode. It is capable of inserting characters into the DOM that are not representable in Shift-JIS.

The thing to keep in mind is that the HTML character set only determines how the page is encoded for transfer. It does not govern or limit in any way the text that can be displayed in the browser. Hence you cannot assume that all the text that you see in the browser is encodable in any particular encoding other than Unicode.
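Numeric character references are one reason this holds: a page transferred as Shift-JIS can still carry any Unicode character if it is written as a reference. The sketch below, again in Python for illustration only, uses the standard `xmlcharrefreplace` error handler to show what such a page would contain. The function name `encode_for_sjis_page` is hypothetical.

```python
def encode_for_sjis_page(text):
    """Encode text as Shift-JIS, writing unrepresentable characters
    as numeric character references instead of failing."""
    return text.encode("shift_jis", errors="xmlcharrefreplace")


print(encode_for_sjis_page("Acme\u2122"))  # b'Acme&#8482;'
```

The bytes are valid Shift-JIS, yet the browser will display the trademark sign, which is exactly the kind of text that cannot be pasted as-is into a Shift-JIS source file.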

Month: Mar 2014
