Character encodings (Japanese) in a web application built with Java

Question

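This is a question about the character encodings (for Japanese) handled by a perfectly ordinary web application built with Java.
The application has a two-layer architecture (presentation layer and business logic layer) with no
database; the business logic layer processes an input file and shows the result on screen.
I hear that Java's default character encoding is UTF-8 or similar, but when the client is,
for example, a Linux machine, I think the input text file would normally be EUC-JP.
I have two questions.
1. In this case, does the JVM automatically convert the EUC-JP text to UTF-8?
   If yes, does that mean the client side never needs to think about character
   encodings, regardless of the platform?
   If no, where is the conversion normally done?
2. Whatever the answer to 1 is, there is no need to change the client OS's default
   encoding to UTF-8, is there? And there is no need to verify whether the other related
   applications and middleware work with UTF-8, is there?
   If I have misunderstood something, please let me know.

I have no knowledge of servlets or the screen-related side of things, so I apologize if
the way I have asked this is odd.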

erichgumma · Accepted Answer

Bruce Eckel, "Thinking in Java (4th Edition)" (Prentice Hall, 2006),
pp. 922-923, says the following.

Java 1.1 made some significant modifications to the fundamental I/O stream library.
  When you see the Reader and Writer classes, your first thought (like mine) might be that these were meant to replace the InputStream and OutputStream classes.
  But that’s not the case.

Although some aspects of the original streams library are deprecated (if you use them you will receive a warning from the compiler), the InputStream and OutputStream classes still provide valuable functionality in the form of byte-oriented I/O, whereas the Reader and Writer classes provide Unicode-compliant, character-based I/O. In addition:

1. Java 1.1 added new classes into the InputStream and OutputStream hierarchy, so it’s obvious those hierarchies weren’t being replaced.

2. There are times when you must use classes from the “byte” hierarchy in combination with classes in the “character” hierarchy.  To accomplish this, there are “adapter” classes:   
  InputStreamReader converts an InputStream to a Reader and OutputStreamWriter converts an OutputStream to a Writer.

The most important reason for the Reader and Writer hierarchies is for internationalization.
  The old I/O stream hierarchy supports only 8-bit byte streams and doesn't handle the 16-bit Unicode characters well.

Since Unicode is used for internationalization (and Java's native char is 16-bit Unicode), the Reader and Writer hierarchies were added to support Unicode in all I/O operations.
  In addition, the new libraries are designed for faster operations than the old.
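
To make the "adapter" classes in the quote concrete, here is a minimal sketch that decodes EUC-JP bytes through an InputStreamReader and re-encodes them as UTF-8 through an OutputStreamWriter. The class name Transcode and the method transcode are hypothetical names chosen for illustration, and the sketch assumes the JVM in use provides the EUC-JP charset (standard JDK distributions normally do).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Hypothetical helper: decodes the bytes of `in` using `from`,
    // then re-encodes the resulting characters using `to` and writes them to `out`.
    public static void transcode(InputStream in, Charset from,
                                 OutputStream out, Charset to) throws IOException {
        Reader reader = new InputStreamReader(in, from);   // bytes -> chars
        Writer writer = new OutputStreamWriter(out, to);   // chars -> bytes
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push any buffered characters through the encoder
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this JVM supports EUC-JP; forName throws otherwise.
        Charset eucJp = Charset.forName("EUC-JP");
        // Unicode escapes for a three-kanji sample string, so the source
        // file's own encoding does not matter.
        String text = "\u65E5\u672C\u8A9E";
        byte[] eucBytes = text.getBytes(eucJp);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(eucBytes), eucJp, out, StandardCharsets.UTF_8);

        System.out.println("EUC-JP bytes: " + eucBytes.length);
        System.out.println("UTF-8 bytes:  " + out.size());
    }
}
```

The point of the sketch is that the conversion happens only because a charset is passed to each adapter explicitly; the byte streams themselves carry no encoding information.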

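> when the client is, for example, a Linux machine, I think the input text file would normally be EUC-JP.

In my environment it is UTF-8.

> there is no need to verify whether the other related applications and middleware
> work with UTF-8, is there?

Yes, there is. (For example, when you use legacy components.)

That said, Unicode was developed through the enormous efforts of many people precisely to solve this kind of character encoding problem, and yet there is still no end of people who avoid Unicode and use Shift_JIS merely because they are on a Windows machine. It is a real nuisance, and I hope this improves.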

askaaska · Answer

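1.
No, it won't do that for you automatically.
Typically you specify the encoding on the Stream that reads the file.

2.
Since it can be handled in code,
there is no need to go and fiddle with the OS settings.
But if you integrate with other middleware,
do test it properly.
That is only natural work for a developer.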
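
As a rough sketch of what "specify the encoding on the Stream that reads the file" can look like, the following reads a text file by wrapping its input stream in an InputStreamReader with an explicit charset. The class name ReadEucJp and the method readLines are hypothetical, the temporary file exists only for the demonstration, and the sketch assumes the JVM provides the EUC-JP charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadEucJp {
    // Hypothetical helper: reads all lines of `file`, decoding its bytes with `charset`
    // instead of relying on the platform default.
    public static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this JVM supports EUC-JP; forName throws otherwise.
        Charset eucJp = Charset.forName("EUC-JP");

        // Create a small EUC-JP encoded file to stand in for the client's input file.
        Path file = Files.createTempFile("input", ".txt");
        try {
            Files.write(file, "\u65E5\u672C\u8A9E\n".getBytes(eucJp));
            for (String line : readLines(file, eucJp)) {
                // Once decoded, the text is an ordinary Java String.
                System.out.println(line.length() + " characters read");
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

In a real application the charset would have to come from somewhere the application can trust, such as a configuration value or an agreed file format, since the file itself does not declare its encoding.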
