I want to convert Roman numerals

Closed

Asked by: niitan
Asked: 2004/12/21 18:42
Answers: 2

Language: perl5.00404

I want to convert Roman numerals, which are machine-dependent characters.
Example: convert the Roman numerals 1 to 10 into I, II, III, IV, V, ...

What I tried:
&jcode::tr(\$str, "\xAD\xB5", "I");
&jcode::tr(\$str, "\xAD\xB6", "II");
&jcode::tr(\$str, "\xAD\xB7", "III");
&jcode::tr(\$str, "\xAD\xB8", "IV");
&jcode::tr(\$str, "\xAD\xB9", "V");

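I did the conversion with jcode.
With this method, however, the Roman numerals 1 through 3
are all converted to just "I".
(Only the first character of the replacement string seems to be used.)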
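The truncation described above is how a character-for-character transliteration behaves. A minimal sketch using Perl's built-in tr/// on plain ASCII shows the difference from s///, which replaces a match with the whole replacement string. It assumes, without checking jcode.pl's source, that jcode::tr follows the same one-to-one rule as tr///.

```perl
use strict;

my $src = "a";

# tr/// works character by character: the single "a" in the search list
# pairs with "x", the first character of the replacement list; "yz" is ignored.
(my $by_tr = $src) =~ tr/a/xyz/;

# s/// replaces the whole match with the whole replacement string.
(my $by_s = $src) =~ s/a/xyz/g;

print "tr: $by_tr\n";    # tr: x
print "s:  $by_s\n";     # s:  xyz
```

Switching to a plain s/// is not quite enough for EUC-JP text, though: a pattern such as /\xAD\xB5/ can also match the second byte of one character followed by the first byte of the next, which is what boundary-aware patterns are meant to prevent.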

Since that wouldn't do, I tried the following regular expressions instead:

# Requires perl5.005 or later: qr// and the (?<!...) lookbehind do not exist in 5.004.
$eucpre = qr{(?<!\x8F)};    # not the tail of a 3-byte (SS3) character
$eucpost = qr{
    (?=
      (?:[\xA1-\xFE][\xA1-\xFE])*   # followed by zero or more JIS X 0208 characters,
      (?:[\x00-\x7F\x8E\x8F]|\z)    # then ASCII, SS2, SS3, or the end of the string
    )
}x;

# The patterns have no capturing group, so $1 is not needed in the replacement.
$str =~ s/$eucpre\xAD\xB5$eucpost/I/g;
$str =~ s/$eucpre\xAD\xB6$eucpost/II/g;
$str =~ s/$eucpre\xAD\xB7$eucpost/III/g;
$str =~ s/$eucpre\xAD\xB8$eucpost/IV/g;
$str =~ s/$eucpre\xAD\xB9$eucpost/V/g;

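However, my Perl version doesn't support this (it works on perl5.005 and later), so this approach is out too, and I'm at my wits' end. Could someone suggest a good way to do this?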
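Both qr// and the (?<!...) lookbehind in the code above first appeared in Perl 5.005, so one way to stay within 5.004 is to avoid lookaround entirely: split the string into whole EUC-JP characters with an ordinary m//g in list context, then swap characters through a lookup table. This is a sketch of that alternative technique, not the method from the question. The helper name roman_to_ascii is hypothetical, the codes for VI to X (\xAD\xBA to \xAD\xBE) are assumed to continue the sequence given in the question, and it has not been tested on 5.004_04.

```perl
use strict;

# Roman numerals 1-10 as EUC-JP byte pairs.
# ASSUMPTION: they run from \xAD\xB5 (I) to \xAD\xBE (X), continuing the
# sequence used in the question (NEC special characters, row 13).
my @roman = qw(I II III IV V VI VII VIII IX X);
my %to_ascii;
for my $i (0 .. $#roman) {
    $to_ascii{"\xAD" . chr(0xB5 + $i)} = $roman[$i];
}

# Hypothetical helper: convert every Roman numeral character in an
# EUC-JP byte string, leaving everything else alone. No qr//, no lookbehind.
sub roman_to_ascii {
    my ($str) = @_;
    # Cut the string into whole characters: 2 bytes (JIS X 0208, or SS2 +
    # half-width kana), 3 bytes (SS3 + JIS X 0212), else one byte (ASCII
    # or a stray byte, kept as-is).
    my @chars = $str =~ /[\x8E\xA1-\xFE][\xA1-\xFE]|\x8F[\xA1-\xFE][\xA1-\xFE]|[\x00-\xFF]/g;
    # Replace characters found in the table; pass the rest through.
    return join '', map { exists $to_ascii{$_} ? $to_ascii{$_} : $_ } @chars;
}

my $sample = "No.\xAD\xB7";              # "No." followed by Roman numeral III
print roman_to_ascii($sample), "\n";    # No.III
```

Because the string is consumed one whole character at a time, a \xAD\xB5 byte pair that straddles two characters is never treated as a Roman numeral.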


Answers (2)


No.2

Answered by: leaz024
Answered: 2004/12/22 15:37

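The regular-expression method you are using is the one presented in "Perlメモ：正しくパターンマッチさせる" (Perl memo: matching patterns correctly) at the reference URL. That page also describes a method that works on Perl versions earlier than 5.005, so take a look at it.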

Reference URL: http://www.din.or.jp/~ohzaki/perl.htm#JP_Match
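The pre-5.005 technique on the referenced page anchors every match with \G and lets a non-greedy group skip over whole characters, so the target can only match at a character boundary. A minimal sketch that runs it over a table instead of repeating the statement for each numeral; the byte values come from the question, while the helper name euc_replace_all and the explicit pos() reset before each pass are assumptions of this sketch, not taken from the page.

```perl
use strict;

# EUC-JP building blocks, following the referenced page. Single quotes keep
# the \x escapes intact for the regex engine.
my $ascii      = '[\x00-\x7F]';
my $twoBytes   = '[\x8E\xA1-\xFE][\xA1-\xFE]';
my $threeBytes = '\x8F[\xA1-\xFE][\xA1-\xFE]';

# Table built from the byte values in the question (Roman numerals I-V).
my @pairs = (
    ["\xAD\xB5", "I"],
    ["\xAD\xB6", "II"],
    ["\xAD\xB7", "III"],
    ["\xAD\xB8", "IV"],
    ["\xAD\xB9", "V"],
);

# Hypothetical helper: apply every pair to a copy of the string.
sub euc_replace_all {
    my ($str) = @_;
    foreach my $pair (@pairs) {
        my ($pattern, $replace) = @$pair;
        pos($str) = undef;    # start each \G search at the beginning
        # \G continues where the previous match ended; the non-greedy group
        # skips whole characters only, so $pattern can match only at a
        # character boundary. $1 restores the skipped text.
        $str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)\Q$pattern\E/$1$replace/g;
    }
    return $str;
}

print euc_replace_all("\xAD\xB9\xAD\xB5"), "\n";    # VI
```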


Thank-you note for this answer

Thank you for your answer.

I tried it, but ran into a puzzling problem.

[Works correctly]
$ascii = "[\x00-\x7F]";
$twoBytes = "[\x8E\xA1-\xFE][\xA1-\xFE]";
$threeBytes = "\x8F[\xA1-\xFE][\xA1-\xFE]";

$pattern = "(2)";
$replace = "II";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(3)";
$replace = "III";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(4)";
$replace = "IV";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(5)";
$replace = "V";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

[Causes an Internal Error]
$ascii = "[\x00-\x7F]";
$twoBytes = "[\x8E\xA1-\xFE][\xA1-\xFE]";
$threeBytes = "\x8F[\xA1-\xFE][\xA1-\xFE]";

$pattern = "(1)";
$replace = "I";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(2)";
$replace = "II";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(3)";
$replace = "III";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(4)";
$replace = "IV";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

$pattern = "(5)";
$replace = "V";
$str =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;

Trying to convert the Roman numeral (1) results in an error. Why is that?
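One generic way to see what is actually failing, rather than only the web server's "Internal Error", is to run each substitution inside eval and print $@ after sending the CGI header. This is a debugging sketch, not a diagnosis of the problem above: the helper name try_subst and the sample input are hypothetical, and eval only catches errors Perl itself raises, so a crash of the perl process would still appear as an Internal Error. Running the script from the command line with perl -w is another way to see the message.

```perl
use strict;

my $ascii      = '[\x00-\x7F]';
my $twoBytes   = '[\x8E\xA1-\xFE][\xA1-\xFE]';
my $threeBytes = '\x8F[\xA1-\xFE][\xA1-\xFE]';

# Hypothetical helper: apply one substitution from the question to $$strref.
# Returns "" on success, or the message Perl died with (for example a regex
# compile error in the interpolated pattern).
sub try_subst {
    my ($strref, $pattern, $replace) = @_;
    pos($$strref) = undef;    # start the \G search at the beginning
    eval {
        $$strref =~ s/\G((?:$ascii|$twoBytes|$threeBytes)*?)(?:$pattern)/$1$replace/g;
    };
    return $@;
}

# Print the CGI header first, so any later output reaches the browser.
print "Content-type: text/plain\n\n";

# Sample EUC-JP input: Roman numerals I and II separated by a space.
my $str = "\xAD\xB5 \xAD\xB6";
foreach my $pair (["\xAD\xB5", "I"], ["\xAD\xB6", "II"]) {
    my $err = try_subst(\$str, $pair->[0], $pair->[1]);
    print "failed on $pair->[1]: $err" if $err ne "";
}
print "$str\n";    # I II
```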