Inputting Japanese text in Linux and some BSDs

First, the disclaimers--Japanese in Linux and the BSDs constantly improves, and parts of this page are often deprecated. If you find any errors here, feel free to send an email to scottro11[at]gmail.com.

This page is now listed in the scim-anthy README.

As I get busier, it becomes harder to keep up with the various distributions. If you see that a section hasn't been updated in several months, you should probably check for yourself whether the recommended packages are in that distribution's package list. (For instance, the last time I went back to Vector, after a 6 month hiatus, I saw that scim-anthy was now available from their repositories.) If a section doesn't have a Last updated line, it hopefully means that I specified the version I was using in the section (such as CentOS 4.3).

As the number of distributions increases, in many cases, you'll be better off going to their forums to see if there's a Japanese howto for that particular distro.

If you read this page, you'll see that almost all of the information has been gathered from others--in other words, if you have a problem and write me, I will help if I can, but I don't know that much about it.

A few quick introductory links: For a far more detailed treatment of this subject, see Dr. Mike Fabian's page on the Suse website. Charles Muller's site has a page on Japanese in Mandrake and David Thiel's page on Japanese in FreeBSD is brief, but very easy to understand and useful. David Oftedal has a page about Internationalization in Gentoo Linux which covers several languages.

JWS has an excellent page on multi-lingual text in Linux (which is again referenced in the printing section of this page.)

Note that this article only covers using Japanese in X.

I most frequently use Japanese in an xterminal with things like vi and mutt as well as OpenOffice. Therefore, these are often the only things that I checked. In most cases, if it works in these applications, it will also work with firefox, thunderbird and the like.

The kinput2 cannaserver combination used to be the input method of choice. These days, scim and anthy have become far more popular. With distributions I have tested, I will give information on using scim and anthy. There are some distros that don't have scim-anthy or uim-anthy or uim-scim packages. Although I give information below about compiling scim, anthy and scim-anthy from source, some of the more newcomer friendly distributions have trouble with compiling source code. In such cases, if these distros do have packages for kinput2 and canna you might be better off using that.

RedHat and its offshoots

Fedora Core 4 and up have scim-anthy available from yum. However, if you choose Japanese support during installation, Fedora 4, 5 and, I believe 6, would install Canna.

Canna handles the conversion of hiragana to kanji; kinput2 is the input method that was most commonly used with it.

Actually, I prefer to install it without Japanese support, since I can install what I need afterwards.

The instructions below are for FC-4, 5 and 6, if you have chosen to install with Japanese support. (If you haven't installed with Japanese support, then ignore the parts about stopping and removing canna and see the end of the section which covers installing fonts. Otherwise, the instructions are the same. If you're using FC-7, you can again ignore the instructions concerning Canna whether or not you installed with Japanese support.)

Canna may be running at startup. So, first stop it

pkill cannaserver

Uninstall it

yum remove Canna

(Note that it's case sensitive, the upper case C is necessary).

Install anthy, scim and scim-anthy

yum install scim-anthy
This will pull in scim and anthy as dependencies.

Fedora's default is to boot up in graphic mode. If you boot in console mode, you can add the following to your .Xclients file, otherwise, add these lines to your .bash_profile. (For the absolute newcomer, these two files are located in your home directory. That is, if your user name is john, they'll be found in /home/john, often referred to as $HOME or even ~. Note that they are dotfiles, that is, they have a period in front of them, so if using one of the graphic text editors, you might have to specify that it show hidden files or show all files. I haven't used such editors in years, so I'm not quite sure of the latest way to do that.)

export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"
export LC_CTYPE=ja_JP.utf8
scim -d

To put them into effect immediately

source .bash_profile

You should see a message that scim is running. (However, this doesn't always work--if it doesn't, just log off and log on again.)
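
If scim does not seem to react, it can help to confirm that the variables really are set in the shell you are working in. The following is only a rough sketch: the function name check_im_env is made up for this example and is not part of scim. It simply reports on the three input method variables listed above.

```shell
#!/bin/sh
# check_im_env: hypothetical helper (not part of scim) that reports
# whether each input method variable has a non-empty value.
check_im_env() {
    for var in XMODIFIERS GTK_IM_MODULE QT_IM_MODULE; do
        # eval is used to look up a variable by name in plain sh
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is not set"
        fi
    done
}

check_im_env
```

If one of the variables shows up as not set after sourcing .bash_profile, logging off and on again, as mentioned above, is the simplest thing to try.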

With UTF-8 as your default encoding, theoretically, you shouldn't even have to set LC_CTYPE to ja_JP.utf8. You should be able to use your own language, e.g en_US.utf8. My experience has been that sometimes this works, and sometimes it doesn't. I would actually try setting LC_CTYPE to your own language (in my case, en_US.utf8) and seeing if things work properly. If they don't, then change LC_CTYPE from your native language to ja_JP.utf8.

Now when you start most applications, hitting ctrl+space will open up a little scim panel in the lower right of the screen. If you type romaji (Japanese spelled out in English letters), you will see hiragana appear. If you hit the space bar, it will convert the hiragana to kanji. Note that the panel should have the word Anthy on it. If it doesn't, click on whatever it does show, such as RAW CODE or English, and you should have an option for Japanese=>Anthy.

Using scim-anthy you should be able to use Japanese in most applications. If you have trouble inputting Japanese in an xterm, if, for example, you're using vi or mutt, use uxterm. (This can be called by simply typing uxterm from any command line.)

At some point, I got into a habit of only calling these variables when I needed them, and made a little lang.sh script.

#!/bin/sh 
XMODIFIERS='@im=SCIM' LC_CTYPE=ja_JP.UTF-8 ${1+"$@"} &

Then, I might call mutt, for example, with

lang.sh mutt

Whether or not this saves on resources, it became a habit.

If you choose to do it this way, it's not necessary to have the XMODIFIERS and LC_CTYPE lines in .bash_profile.

I've found that I can usually get away with leaving out the GTK and QT IM_MODULES variables, although it's probably better to include them.

This can be a bit confusing as it varies from distro to distro and application to application. Sometimes, one doesn't even need the LC_CTYPE line, if your distro's default is en_US.utf8. One has to play with the variables and see what works for their distribution or O/S of choice, as well as their favorite applications.

If you wish your menus and the like to be in Japanese as well, you can add, either to the lang.sh script or your .bash_profile

LANG=ja_JP.utf8

Now, most applications will also work in Japanese--some things may show up as mojibake (gibberish) but you will be able to use Sylpheed, xchat (an irc client) etc in Japanese without problem. You'll also be able to input kanji as text in GIMP.

On rare occasions, I've found that the Ctl+space hotkey combination wouldn't open up the scim widget, although clicking on the icon that would come up when it was started would work. Scim creates a $HOME/.scim/config file which should have the line
/Hotkeys/FrontEnd/Trigger = Control+space
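
Checking for that line can be done from a shell. This is a minimal sketch, assuming the config file uses exactly the format shown above. The function name ensure_scim_trigger is invented for this example; it appends the line only when no Trigger line exists yet.

```shell
#!/bin/sh
# ensure_scim_trigger: hypothetical helper; appends the trigger line
# to the given scim config file only when no such line exists yet.
ensure_scim_trigger() {
    config=$1
    if grep -q '^/Hotkeys/FrontEnd/Trigger' "$config" 2>/dev/null; then
        echo "trigger line already present"
    else
        echo '/Hotkeys/FrontEnd/Trigger = Control+space' >> "$config"
        echo "trigger line added"
    fi
}

# Example: ensure_scim_trigger "$HOME/.scim/config"
```

It is probably wise to make a copy of the config file first, and to restart scim afterwards so that it rereads the file.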

If that line is missing from the config file, add it.

For users of FC-3 and earlier, I suspect the easiest course of action is to use kinput2 and canna. Both should be available on one of the CDs, but if not, use yum. These instructions are untested, as I don't have any FC-3 installations anymore, but hopefully, they will work.
yum install kinput2 Canna

In this case, the lines in .bash_profile should read
export XMODIFIERS="@im=kinput2"
export LC_CTYPE=ja_JP.UTF-8

If you didn't choose Japanese support at installation, you will also need fonts. In Fedora, CentOS and Blag, I've found that doing
yum search fonts-ja

will give me some choices. Once fonts are installed, there should be no problem. In CentOS, I chose ttfonts-ja, I've forgotten the other choices and if that was their name in Blag.

CentOS and Japanese

The CentOS-4.3 yum repository doesn't have scim or anthy. However, if one adds Karan Singh's repository, they are available and one can simply follow the instructions given above. CentOS-5 does have scim-anthy, so the following instructions are only necessary for earlier versions.

To add Mr. Singh's repository
cd /etc/yum.repos.d/
wget http://centos.karan.org/kbsingh-CentOS-Extras.repo

The url is current at time of writing (July 2006). However, the reader is advised to check the Desktop Users Guide for CentOS 4, from which I got the information.

Like Fedora, if one chooses Japanese support at installation, fonts, kinput2 and canna are installed (with canna running at startup). If one doesn't choose Japanese support at installation time, then do a yum search for fonts-ja as detailed above.

Gentoo Linux and Japanese

(Last update, June 2006)

I used to have an entire section on Gentoo. However, as I don't use it these days, when updating this page, a bit of research indicated that my method was entirely deprecated. Aside from the link to David's page at the top, the reader can also check this thread on Gentoo Forums.

Kevin W. (AKA sandcrawler on Gentoo Forums) was kind enough to send me his mini Gentoo howto.

He added the following USE variables

immqt-bc nls cjk unicode

Then

emerge --newuse world

Emerge the necessary programs

emerge scim anthy scim-anthy scim-qtimm

He added the following to his .bash_profile

export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"
export LC_CTYPE=ja_JP.utf8
scim -f socket -c socket -d

(If not booting into X, you might leave off the scim line and put it in .xinitrc or whatever file you use to start X.)

This enables him to input Japanese in most applications.

Debian and Japanese

(Last tested with Ubuntu, February 2008)

Every once in awhile, I throw Ubuntu on something to see how they're doing with their number one bug, that Windows is more popular. They've now succeeded in making Japanese more difficult than it was 2 years ago. :). The following works on Hardy Heron Alpha 4 (so I should be fair, it is an alpha after all.)
sudo apt-get install scim-anthy im-switch

I add the following to my .bashrc file.
export XMODIFIERS='@im=SCIM'
export LC_CTYPE=en_US.utf8

You might have to add Japanese language support--in my case, it was already installed. Go to System, Administration, Language Support and make sure that Japanese support is installed. If not, then check it off and install it. Nowadays, you also have to run the im-switch (which you didn't have to do a couple of years ago, hence my comment that it's become more difficult.)

im-switch -z en_US -s scim

You can run scim as daemon when you log in by adding scim -d to your .bash_profile or .bashrc or simply run it when you need it, by typing scim -d at any command prompt. After that, in most applications, hitting ctrl+space should open up the scim widget in the lower right hand corner. It should work in the default gnome-terminal, however, you will probably have to go to the terminal's menu. Choose Terminal, Set Character Encoding and choose Unicode. Otherwise, although it will input the characters, when you hit enter, you see question marks or other mojibake.

Another thing that can be confusing, at least in the Hardy Heron alpha, is that Japanese will show as a supported language on a default install. However, if you open scim, Japanese won't be shown as an available language for input. To enable it, you still have to install additional Japanese support.
sudo apt-get install language-support-ja

ArchLinux

(Last tested February 2008)
ArchLinux, which is the Linux distribution I use most, has packages for scim, scim-anthy and anthy. Add scim-anthy with pacman
pacman -S scim-anthy

(The scim-anthy package has both scim and anthy as dependencies and will install all three packages for you.) Once installed, set the XMODIFIERS and LC_CTYPE and call scim in your .xinitrc, before the line calling your window manager. For example, if your window manager is fluxbox

export XMODIFIERS="@im=SCIM"
export LC_CTYPE=en_US.utf8
scim -d
exec startfluxbox

There is a package for rxvt_ja which supports euc and also a package for rxvt-unicode. If you install rxvt-unicode, it's called with the command urxvt.

However, these days, I much prefer mlterm or xterm's builtin uxterm. ArchLinux also has some truetype fonts for Japanese.
pacman -S ttf-arphic-uming ttf-arphic-ukai

Desired locales should be created using locale-gen. Open /etc/locale.gen and you will see a list of locales, commented out with a # sign. Uncomment the ones that you want, for example, en_US.UTF-8 UTF-8 and ja_JP.UTF-8 UTF-8. Then run
locale-gen

You should see a message that the desired locales were created.
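
For those who would rather not edit the file by hand, the uncommenting step can be sketched with sed. This is only an illustration: enable_locale is a made-up name, and it assumes entries of the form "#ja_JP.UTF-8 UTF-8". It prints the result rather than changing the file, so the output can be looked over before anything is replaced.

```shell
#!/bin/sh
# enable_locale: hypothetical helper that prints FILE with the line
# for LOCALE uncommented; it does not modify the file itself.
enable_locale() {
    file=$1
    locale_name=$2
    # Strip a leading '#' (and any spaces after it) from matching lines
    sed "s/^#[[:space:]]*\(${locale_name}[[:space:]]\)/\1/" "$file"
}

# Example (as root), writing to a copy first so it can be reviewed:
#   enable_locale /etc/locale.gen ja_JP.UTF-8 > /tmp/locale.gen.new
```

Once the new file looks right, it can be copied over /etc/locale.gen before running locale-gen.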

There was a thread on the ArchLinux forums started by someone who had better luck using uim instead of scim for Japanese input. For those who would prefer to use uim, the thread can be found here.

Installing from source.

If your distribution doesn't have a package for scim, anthy and scim-anthy, they can easily be installed from source.

Scim can be downloaded here, and anthy here. Note that the anthy link sends you to a download selection page. You want the latest version of anthy, not anthy-ss. At time of writing, it's 7900.

The scim-anthy source can be found here.

Once downloaded, untar and install the three programs. Install anthy first, then scim, and scim-anthy last. In each case, the commands are the same. The versions given in these examples are current at time of writing, change the command to fit the version you download.

tar -zxvf anthy-7900.tar.gz
cd anthy-7900
./configure --prefix=/usr && make && make install

Do the same for scim and scim-anthy in that order. Restart X and you should be able to call up scim input in any program by hitting ctrl+space.
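
Since the same commands are repeated for each of the three programs, the sequence can be sketched as a loop. This is an illustration only. The name build_all and the RUN variable are invented here, and the directory names passed to it must match whatever versions you actually downloaded. By default it only prints the commands; setting RUN to an empty string makes it execute them.

```shell
#!/bin/sh
# build_all: hypothetical helper that walks through source directories
# in the order given and prints the commands for each one. Set RUN to
# an empty string to execute the commands instead of printing them.
RUN=${RUN-echo}

build_all() {
    for dir in "$@"; do
        $RUN cd "$dir" || return 1
        $RUN ./configure --prefix=/usr || return 1
        $RUN make || return 1
        $RUN make install || return 1
        $RUN cd .. || return 1
    done
}

# Example, listing anthy first, then scim, then scim-anthy:
#   build_all anthy-7900 scim-VERSION scim-anthy-VERSION
```

Printing the commands first is a cheap way to check the order before running anything as root.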

You will also want Japanese fonts, especially if you are using Japanese in something like OpenOffice. Substitute kochi truetype fonts can be found from download.sourceforge.jp. You want the package kochi-substitute-20030809.tar.bz2.

Download it and untar it.

tar -jxvf kochi-substitute-20030809.tar.bz2

This will create a kochi-substitute-20030809 directory. You will see the kochi-mincho and kochi-gothic substitute fonts. They have a .ttf ending.

Move the fonts to /usr/X11R6/lib/X11/fonts/TrueType or /usr/X11R6/lib/X11/fonts/TTF if there is no TrueType directory.

(These are the typical directories called by the FontPath section in /etc/X11/xorg.conf. Doublecheck your system's xorg.conf and if the FontPath is different than the above, use that path.)
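
To see which directories your own xorg.conf lists, something along these lines should do. It is a rough sketch with an invented function name, font_paths, and it assumes each FontPath line holds a single quoted entry. Entries that point at a font server rather than a directory will be printed as well.

```shell
#!/bin/sh
# font_paths: hypothetical helper that prints each directory named on
# a FontPath line of the given xorg.conf-style file.
font_paths() {
    # Keep FontPath lines that are not commented out, then print the
    # text between the first pair of double quotes.
    grep -i '^[[:space:]]*FontPath' "$1" | sed 's/^[^"]*"\([^"]*\)".*/\1/'
}

# Example: font_paths /etc/X11/xorg.conf
```

If the output includes a TrueType or TTF directory, that is the place to put the kochi fonts.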

Slackware and some Slackware based distributions

(Last updated November, 2006)
Slackware worked without problem when I installed anthy, scim, and scim-anthy from source. I also installed the kochi fonts.

However, with one of its offshoots, Vector, at first I would open, for example, an mlterm session. I hit ctl+space and the scim panel appeared. I then entered romaji, but rather than seeing hiragana, I saw dotted squares. If I typed correctly, and hit space (for example, typing nihongo and hitting space once), the word nihongo, in kanji, would appear, however, I didn't see this until I hit enter.

The scim faq indicates that this is because scim isn't finding the fonts it needs. I am not sure what packages were missing--however, choosing to install gimp during the initial installation fixed the problem. Afterwards, even if I deinstalled gimp, it would still work properly.

Vector's default editor, like Slackware's, is elvis, which didn't work properly. I had to grab the Slackware package for vim and install it. I used a Slackware CD that I had, but if you don't have one, go to Slackware's package search site. I used the version from 10.1, which may have changed by the time you read this.

As of November, 2006, Vector has a scim-anthy package, which pulls in scim and anthy. I didn't see a font package, and used those kochi fonts I've mentioned above, that I manually retrieved from sourceforge.

kinput2 with canna.

In some cases, the scim-anthy combination might not work or not be available for your distribution.

Two programming friends, Godwin Stewart and Stuart Bouyer (who has done a great deal of work on Japanese input packages for Gentoo Linux) made me a tarball of a modified kinput2 and canna installation. It is not perfect--when one starts cannaserver, you see the message Terminated. However, doing pgrep cannaserver shows that it is running and it works perfectly for me.

The tarball is available from qnd-guides.net.

Thanks to the generosity of the Tokyo Linux User Group it is also available on their site. To use it, first download and untar it. You will see two gzipped files there, one for Canna and one for kinput. Install canna first as the kinput file will be looking for it.

tar -jxvf vanillajpn.tar.bz2
tar -zxvf Canna36p1.tar.gz
cd Canna36p1
xmkmf
make Makefile
make canna
make install
make install.man

When done, you'll have a file /usr/sbin/cannaserver.

Now kinput2

tar -zxvf kinput2-v3.1-beta3.tar.gz
cd kinput2-v3.1-beta3
xmkmf
make Makefiles
make depend
make 
make install

As every distribution has its own way to make a program run at startup, that is an exercise I will leave to the reader. For example, in Slackware, you can add a few lines to /etc/rc.d/rc.M. As I said, you will see, after starting /usr/sbin/cannaserver the word Terminated. However, it can be ignored.

You will need a terminal that can display Japanese. As mentioned above, one can use the builtin uxterm. The mlterm and rxvt-unicode programs also work.

Add these lines to your .xinitrc above the line that calls your window manager.

export XMODIFIERS='@im=kinput2'
export LC_CTYPE=ja_JP.utf8
kinput2 -canna &

This should enable you to input Japanese in most programs.

A quick note on the LC_CTYPE variable. In most Linux distributions, it is ja_JP.utf8, however, some distros have it differently, and it is case sensitive. To check it type

locale -a | grep ja_JP

If you get something like ja_JP.UTF-8 use that rather than utf8.

If you get an error similar to "Unable to set locale" that is often the reason, you have it as, for example, utf8 and the system is looking for UTF-8.
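
Rather than guessing at the spelling, one can let the shell pick it out of the locale -a output. This is a minimal sketch. The function name pick_ja_locale is invented for the example, and it only looks for the usual utf8, UTF8, utf-8 and UTF-8 spellings.

```shell
#!/bin/sh
# pick_ja_locale: hypothetical helper that reads a list of locale
# names on standard input and prints the first Japanese UTF-8 one,
# spelled exactly as the system spells it.
pick_ja_locale() {
    grep -i '^ja_JP\.utf-\{0,1\}8$' | head -n 1
}

# Example: export LC_CTYPE=$(locale -a | pick_ja_locale)
```

If it prints nothing, the system has no Japanese UTF-8 locale installed, and one will have to be generated or installed first.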

To sum up, most people consider the scim-anthy combination better than kinput2 and canna. If your distribution doesn't have packages for scim and anthy, you can download and install them, following the instructions given above. If they don't work for you, then use the kinput2 canna combination, using the vanillajpn tarball, for I have found that to work in almost every distribution that I have tried.

Despite there being over 400 Linux distributions, most of them seem to be based on RedHat, Debian or Slackware so the instructions above should work for almost every distribution.

FreeBSD

FreeBSD has a scim-anthy combination. If one installs the scim-anthy port it installs both scim and anthy.
cd /usr/ports/japanese/scim-anthy
make install clean

There is a package message, suggesting setting the LANG variable to ja_JP.eucJP. However, I haven't found this necessary.

In your .xinitrc file

export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"
export LC_CTYPE=ja_JP.UTF-8
scim -d

One will need a terminal capable of displaying unicode. There is the builtin uxterm, mlterm and rxvt-unicode. One oddity I have found is that if I try to type nihongo directly into one of these terminals, it may not display correctly. However, if one tries to cat a text file written in Japanese, it will display the file correctly.

FreeBSD's vi is nvi. I haven't gotten this working properly with Japanese, so I install /usr/ports/editors/vim-lite. One can create an alias by editing their shell's rc file. For example, I use zsh, so in my $HOME/.zshrc file I have

alias vi=vim

For OpenOffice and the like, I need Japanese fonts. I use the substitute kochi fonts in /usr/ports/japanese/kochi-ttfonts.

NetBSD

(Last updated November 2006)
NetBSD doesn't yet have scim or scim-anthy in pkgsrc. However, scim and scim-anthy are available in their Work In Progress collection. On my machine, the scim-anthy package failed to build due to an unable to allocate memory error in gcc. However, googling indicated that adding
UNLIMIT_RESOURCES=	datasize

to the scim-anthy Makefile would fix the problem. I tried that solution and it worked.

If you want to stick with pkgsrc they do have anthy and uim. To use that combination
cd /usr/pkgsrc/inputmethod/anthy
make install clean; make clean-depends
cd /usr/pkgsrc/inputmethod/uim
make PKG_OPTIONS.uim="-canna" install clean; make clean-depends

(This will install uim with anthy and gtk).

Add the following to your .xinitrc above the line calling your window manager.
export XMODIFIERS=@im=uim
uim-xim --engine=anthy &
After starting X, you can then enter Japanese text by hitting shift+space. You turn off Japanese input in the same manner, hitting shift+space again.

Although NetBSD 3.x has en_US.UTF-8 as a locale, they don't have ja_JP.UTF-8. I've had mixed results using en_US.UTF-8 as my LC_CTYPE. Sometimes, if you set LC_CTYPE to en_US.UTF-8 though you can input Japanese, after hitting enter, all that appears are blank squares. You can create a ja_JP.UTF-8 locale by downloading en_US.UTF-8.src from
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/share/locale/ctype/en_US.UTF-8.src

and then using mklocale. If you downloaded it into your home directory, as root or with root privilege (assuming your user name is joe)
cd /usr/share/locale
mklocale < /home/joe/en_US.UTF-8.src > ja_JP.UTF-8

Hopefully, then a locale -a | grep ja_JP will show ja_JP.UTF-8.

I've found this was necessary to get UTF-8 working with, for example, thunderbird, though it wasn't necessary to input Japanese in a terminal.

In NetBSD, I've never gotten mlterm working properly, so I use rxvt-unicode. I haven't researched this deeply, but the unicode fonts in urxvt aren't as clean as the fonts used by, say, rxvt with eucJP encoding. The choice is up to the reader.

For eucJP encoding, I use either rxvt or mrxvt. If you use rxvt, after it's installed, there is a message telling you that double-byte encoding is disabled by default. You then have to edit /usr/pkg/lib/X11/app-defaults/Rxvt. You will see several lines marked !Rxvt.multichar_encoding. One of them has eucj at the end of it. Take out the ! at the beginning of that line. (Also, put a ! at the beginning of the top line, which ends with noenc.)
If you use mrxvt then edit /usr/pkgsrc/x11/mrxvt/Makefile. You will see a section of CONFIGURE_ARGS+= enabling xft, text-shadow and the like. Add
CONFIGURE_ARGS+=	--enable-xim
CONFIGURE_ARGS+=	--enable-cjk
CONFIGURE_ARGS+=	--with-encoding=eucj

You may, of course, choose to use kinput2 and canna with NetBSD. If so
cd /usr/pkgsrc/inputmethod
cd canna; make install clean
cd ../kinput2; make PKG_OPTIONS.kinput2="-wnn4 -sj3" install clean

When done, you'll have a /usr/pkg/sbin/cannaserver as well as kinput2. Cannaserver should be started as daemon upon the next reboot. (You'll see that it also provides a script in /usr/pkg/local/rc.d)
Once again, set your variables in your .xinitrc.

export XMODIFIERS='@im=kinput2'
export LC_CTYPE=ja_JP.eucJP
kinput2 -canna &
Like FreeBSD, I've found that I have to use vim rather than vi.

You may get an error when trying to start kinput2. It will say it can't load the app-defaults file and that XFILESEARCHPATH might be set incorrectly. This can also be added to .xinitrc however, be sure to add it ABOVE the kinput2 -canna & line.

export XFILESEARCHPATH=/usr/pkg/lib/X11/app-defaults/Kinput2

DragonFlyBSD

(Last updated June 2006)
DragonFlyBSD now uses NetBSD's pkgsrc collection for third party software. The instructions above, for NetBSD, also work with DragonFly. Just use bmake instead of make when installing the various packages. DragonFly does have ja_JP.UTF-8 installed by default, so the reader can ignore the part about using mklocale.

A Digression about Terminals and UTF-8

Not all people need or want Japanese in an xterm. However many do. For example, I use mutt, so to use Japanese in an email, I need a terminal capable of handling Japanese.

Unfortunately, my favorite terminal, aterm, doesn't handle Japanese by default. Debian has an aterm-ml package, but it doesn't do on-the-spot conversion--that is, if one enters Japanese, a window appears underneath the term window. It works, but I find that annoying.

There is a patch to get aterm working with Japanese. The original site seems to have disappeared, however, I have a bzipped copy of the patch here.

The latest version of aterm is 1.0, however this patch is for the previous version, 0.4.2.

If you wish to use aterm with Japanese, you might choose to do it this way rather than use your distribution's version.

The 0.4.2 version can be downloaded from sourceforge.

Unzip the patch

bunzip2 aterm-0.4.2-ja.patch.bz2

Decompress and untar the aterm source.

tar -zxvf aterm-0.4.2.tar.gz

Move the patch into the newly created directory, CD into the directory and apply it.

mv aterm-0.4.2-ja.patch aterm-0.4.2/
cd aterm-0.4.2
patch -p1 < aterm-0.4.2-ja.patch

Run the configuration script, make and make install

./configure --enable-kanji --enable-xim --enable-fading
make 
make install 

The problem is that aterm can't handle unicode. I have to use it with euc. So, I call it with a script that changes the LC_CTYPE to euc. In FreeBSD, the script reads

#!/bin/sh 
XMODIFIERS='@im=SCIM' LC_CTYPE=ja_JP.eucJP ${1+"$@"} &

(For those with an interest in shell-scripting they can find a detailed explanation of the ${1+"$@"} here. It was written by Cameron Simpson who has frequently helped me with scripting questions.)
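
In brief, and as a sketch rather than a substitute for that explanation: ${1+"$@"} expands to "$@" when at least one argument was given, and to nothing at all otherwise. The two function names below are made up purely to demonstrate this.

```shell
#!/bin/sh
# count_args: prints how many arguments it received.
count_args() {
    echo "$#"
}

# wrapper: passes its own arguments on using the same construct as
# the script above.
wrapper() {
    count_args ${1+"$@"}
}

wrapper                  # no arguments are passed on, prints 0
wrapper "one two" three  # the embedded space survives, prints 2
```

The construct guards against some older shells which turned a bare "$@" with no arguments into a single empty argument.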

I recently received an email from an aterm user who has created a patch that works with aterm-1.x. (However, not with UTF-8, though I think that this is an aterm issue rather than the patch.)

This particular developer (calkina [at] geexbox.org) also hopes to work on adding UTF-8 support, but like many developers finds he has a lack of Copious Free Time(TM).

The patch is in CVS. The instructions for pulling down the CVS code are given on the aterm home page but at time of writing it's simply
cvs -d :pserver:anonymous@cvs.aftercode.net:/home/cvsroot co aterm1

This will pull down the source code and put it in a directory called aterm1. To install it
./configure --enable-kanji --enable-xim; make; make install

After that if you call aterm with ja_JP.eucJP (in FreeBSD) you can input Japanese and have the various other features of aterm 1.0.

Note that the patch doesn't alter the README.configure file, so if you look at README.configure, you won't see any mention of --enable-xim, but you can find it in the Makefile.

The creator of the patch suggests that if you like the patch, email aterm's maintainer (whose email address can be found at the aterm home page link given above) and request that it be included in the next release.

I use fluxbox as my window manager, and have an entry in my .fluxbox/keys file so I can call aterm with this script with a simple key combination. As you become more experienced, you may wish to experiment with different xterms and window managers.

UTF-8 is rapidly becoming the default for Asian languages in *nix. However, euc is still popular. Most browsers and email clients that can handle Asian languages are able to read both encodings.

Although most browsers can read it, you might have to manually select it. In opera, firefox and mozilla, it's in View => encodings. Although there is an autoselect for Japanese, it doesn't always work. If you get a page in Japanese that seems to be mojibake, then try different encodings, including Unicode (which isn't in the Japanese section) and one should work.

Dark Prince from bsdnexus.com forums was kind enough to send the following. If you are creating a web page with Japanese in UTF-8, this code should make the viewer's browser use UTF-8 on the page. He tested this on apache, but it should work with any server that can use php. At the top of the page put

<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Again, this will only work if your server has php enabled. Many ISP provided web pages don't support php.

Then, code that tells the browser to read UTF-8. (Dark Prince says this may not be necessary, but it probably can't hurt.) This code should be between the <head> </head> tags

<meta HTTP-EQUIV="content-type" CONTENT="text/html;charset=UTF-8">

Having the meta tags will not be sufficient to make the viewer's browser use UTF-8, the php code is necessary. However, if the page is in straight html, the meta HTTP-EQUIV tags should be enough. (Martin Swift was kind enough to point out that I'd neglected them myself, causing some of the special characters on the page to display incorrectly. I've fixed it since. Thank you Martin.) :)

Lately, I've been playing with mrxvt. Again, there is no unicode support. If building from source, one needs to configure it as follows (in addition to any options you choose)

./configure --enable-xim --enable-cjk --with-encoding=eucj

If you are using FreeBSD, a patch I submitted has been accepted to add EUC input. When installing the port simply type

make -DWITH_JAPANESE install clean

Lately, it seems as if mlterm has become the most popular of the multilanguage terminals. We have a quick and dirty guide to it here.

Note that the guide suggests setting mlterm's font size to 14. I've found that if I use the default size of 16, if I set LC_CTYPE to ja_JP.UTF-8, the terminal becomes overly large. (I found this happened even if I set LC_CTYPE to en_US.UTF-8.) This doesn't seem to be distro or window manager specific. The QND guide mentioned above discusses setting mlterm with a transparent background. This is a matter of preference. Josh, who wrote the guide, has younger eyes than I do, but I prefer a gray background with black type. One can set the background at the command line, in .Xdefaults, or do as Josh suggests, creating a directory in your home directory called .mlterm and a file in that directory called main.

One note for others with aging eyes. Recently, it seems to me that urxvt's fonts have gotten smaller. I've found that setting the font size in .Xdefaults helps. I have this entry
urxvt*font: a14

To get a list of available fonts one can type xlsfonts at a command prompt.

Using Putty from Windows

Much of this was taken from this page from umiacs.umd.edu.

If you are using Putty to open an ssh session, you can still view Japanese encodings. On the Windows machine, you will have to install Asian Language Support. (Control Panel, Regional Settings or Regional and Language Settings.) You will probably need the Windows installation CD for this. You should also choose the option to Install files for complex script and right to left languages. (The link given above has several screenshots.) Windows will suggest you reboot after installation. Do so.

Open your putty session, right click on the title bar, and choose Change Settings from the menu.
Go to Window, Appearance. Click the Change button in the Font settings section.
Choose MS Gothic or MS Mincho and Japanese as the script.
Go to the Translation section. In the dropdown box at top, choose UTF-8. Also check the box that says Treat ambiguous CJK characters as wide.

Once this is done, you should be able to view Japanese text in a putty ssh session from a Windows machine.

Printing

(Last updated January, 2008)
Several applications will translate a file to postscript level 2 (or possibly higher). Acroread, xpdf, OpenOffice, mozilla, firefox and seamonkey will all do this. With such applications, assuming you can already print from them, no further work is necessary and Japanese will print out as written.

Printing in *nix can be non-trivial in itself. CUPS is making it easier when it works. When it doesn't, one can spend a lot of time searching Google, only to find many people with the same error messages and few solutions. I have a few simple CUPS solutions on another page.

Depending upon distribution, installing OpenOffice can be a major undertaking. FreeBSD, for example, has the development version as a port that requires over 9 gigs of free space to compile. Building the port can take 6-8 hours on a reasonably fast machine.

In OpenOffice, be sure to enable Japanese support under Tools, Options, Language Support, Languages.

To print Japanese one needs the fonts (I use the kochi fonts mentioned above). Once they are installed, you can use spadmin to add the fonts. In FreeBSD, they'll be in /usr/X11R6/lib/X11/fonts/TrueType. In some distributions, the path is the same save that it's called truetype. You may have to be root or have root privilege to run spadmin.

In FreeBSD, at least, rather than using spadmin, I just either copy or symlink the fonts to /usr/local/openoffice(version)/share/fonts/truetype. Without the fonts, you will be able to input Japanese in OpenOffice, but it won't print correctly.

One can use openoffice, firefox or seamonkey (as well as any other browser that does the postscript conversion for you) to print Japanese text files. Open the textfile in firefox, for example, and then print it.

Recently, looking through a page about UTF-8 I came across another solution for printing textfiles. The author mentions using the openoffice command with the -p option. For example, in FreeBSD, OpenOffice is called with openoffice.org. Suppose I have a text file in Japanese, called nihongo.txt
openoffice.org -p nihongo.txt

will print the text file. (Note that the openoffice command will vary between Linux distributions as well as the BSDs. Many distros use the command ooffice, others use soffice and no doubt other distros use something else.) This can be used with both UTF-8 and EUC encoded textfiles. This is simpler than loading the file into OpenOffice by hand, as it will correctly print the file without having to open the OpenOffice application.
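
Since the command name varies, a small helper can try the names mentioned above in turn. This is only a sketch. find_office is a made-up name, and the list of candidates is just the three commands I've run across.

```shell
# Print the first OpenOffice launcher found in PATH, or fail if none is.
find_office() {
    for cmd in openoffice.org ooffice soffice; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Example use:
#   office=$(find_office) && "$office" -p nihongo.txt
```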

The author also mentions the paps program. It converts UTF-8 files to postscript. The mentions of FreeBSD below also apply to NetBSD.

It's available as a package in some distributions, though not in FreeBSD, and it requires pango. I found I was able to compile and install it in FreeBSD. It gave an error message installing the docs (due to differences in the cp command between FreeBSD and Linux), but the binary installed without problem. As I didn't really need the docs (it did install the man pages), I didn't fix it. If one is using FreeBSD and wants the docs, then after untarring the tarball and running configure to generate the Makefiles, cd into the paps-{version}/doc directory, find the cp -dpR in that directory's Makefile, and change it to cp -pR. The docs are geared towards other programmers rather than end users. The end user will find the man page and --help option sufficient.
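
The Makefile change can also be made with sed rather than an editor. This is a sketch, with fix_cp_flags being my own name for it. It writes to a temporary file and moves it into place, since the in-place option of sed differs between FreeBSD and Linux.

```shell
# Replace every "cp -dpR" with "cp -pR" in the given Makefile.
# $1 = path to the Makefile
fix_cp_flags() {
    makefile=$1
    sed 's/cp -dpR/cp -pR/g' "$makefile" > "$makefile.new" &&
        mv "$makefile.new" "$makefile"
}

# Example use, from the paps-{version}/doc directory:
#   fix_cp_flags Makefile
```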

You may also need pkgconfig (sometimes, as in FreeBSD, called pkg-config) for paps to compile.

It didn't work for me with the very basic cups laserjet or deskjet ppds. I also needed the hpijs program, which provides more specific drivers for various HP printers.

Most distributions (and FreeBSD) have hpijs available as a package. If not, you can go to the HPLIP home page and install from source. The HPLIP package includes the hpijs package.

Once the package is installed, you can modify your printer, using the cups web interface or the lpadmin command, to change your printer's ppd from the generic deskjet or laserjet to the hpijs driver for your printer.
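
From the command line, that might look like the sketch below. lpinfo -m lists the drivers CUPS knows about, and lpadmin -m assigns one to a printer. The printer name myprinter is made up, hpijs_drivers is a hypothetical helper, and I'm assuming the hpijs drivers have hpijs somewhere in their names, which may not hold everywhere.

```shell
# Keep only the driver lines that mention hpijs (reads standard input).
hpijs_drivers() {
    grep -i 'hpijs'
}

# Typical use, as root or with sufficient privileges:
#   lpinfo -m | hpijs_drivers
#   lpadmin -p myprinter -m "name-from-the-first-column"
```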

Make sure you have some Japanese fonts. Sometimes, even though scim will input Japanese perfectly, if you don't have some specific Japanese fonts paps won't print correctly. If your distribution doesn't have fonts available, use the kochi substitute fonts from sourceforge.

To use paps to print a textfile called nihongo.txt
paps nihongo.txt | lp 

I use mutt, and paps works well with it. With mutt, hitting the pipe key, |, will, obviously enough, pipe the email to another command. If I want to print a Japanese email I would use
|paps|lp

which sends the mail to paps and from there to the lp command.

Evolution and Thunderbird do the postscript conversion on their own--in other words, they will print Japanese emails as easily as firefox prints a Japanese web page.

Sylpheed needs the print command set to feed the email to paps. Changing the standard lpr %s that it uses for printing to
paps %s | lp

will enable Japanese emails to print. (This can be found in Configuration, Details, External Commands).

As JWS mentions in his page, if the file uses a different encoding, such as EUC, one can use iconv to first convert the file to UTF-8, then use paps. Running iconv with the -l option (a lower case L, as in list) shows the names the program accepts. For instance, EUC encoding can be specified as EUC-JP or EUCJP (or by typing the complete EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE, but I doubt anyone would want to type that).

The syntax is quite simple: -f (as in from) for the source encoding, -t (as in to) for the target encoding, then the file name. So, let's say we want to print a textfile called nihongo_euc.txt. (This is in FreeBSD, where iconv -l shows supported encodings in upper case. I'm not sure about Linux, where it may be lower case.)
iconv -f EUCJP -t UTF-8 nihongo_euc.txt | paps | lp
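
If one prints both UTF-8 and EUC files regularly, the conversion step can be tucked into a small function. This is a minimal sketch, and as_utf8 is a name I made up. It assumes the caller knows the file's encoding and passes it as the second argument, using a name that iconv -l lists.

```shell
# Write a file to standard output as UTF-8.
# $1 = file, $2 = its encoding (optional, UTF-8 assumed if omitted)
as_utf8() {
    file=$1
    enc=${2:-UTF-8}
    if [ "$enc" = "UTF-8" ]; then
        # Already UTF-8, so pass it through untouched.
        cat "$file"
    else
        iconv -f "$enc" -t UTF-8 "$file"
    fi
}

# Example use:
#   as_utf8 nihongo.txt | paps | lp
#   as_utf8 nihongo_euc.txt EUC-JP | paps | lp
```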

For me at least, paps was one of the final pieces that make Japanese almost as easy to use in Linux and some of the BSDs as it is in Windows. (I think Mac still has the edge as far as ease of Japanese use.)

Speaking of Mac, now that Apple bought CUPS, all of the above may become unnecessary. On Fedora 8, running cups-1.3.4, if I run lp nihon.txt it correctly prints the Japanese text. However, this wasn't the case on Ubuntu. So, whether this was something in Fedora, CUPS, or somewhere else, I don't know.

Romaji

While only indirectly related, there are times when one wishes to use special characters when typing romaji. When writing for people studying Japanese, my own tendency is to imitate hiragana, and write, for example, juudou for the martial art. However, for people with no knowledge of the language, this can be somewhat confusing. Most systems that have locales will also have a Compose file for that locale. For example, in FreeBSD, there is a file /usr/local/lib/X11/locale/en_US.UTF-8/Compose. It will probably be elsewhere on a Linux system. That file will have several entries for special characters. Mine has things like
<Multi_key> <underscore> <a>  : "ā"   U0101 # LATIN SMALL LETTER A WITH MACRON

(If that didn't look like a small letter a with a line over it, then your browser is probably using a different encoding. Go to View=>Encodings and choose UTF-8 and you should see (among other things) a lower case "a" with a line over it.)

This means that if I use the Multi or Compose key, then hit underscore, then a, I will get an a with a line over it. The problem is setting the compose key.

This can be done globally (for all users) by making an entry in /etc/X11/xorg.conf. For example, to use the right Windows key one could add
   Option "XkbOptions"  "compose:rwin"

(Finding the right name can be tricky as it varies on different systems. In FreeBSD, one can find the name in /usr/local/share/X11/xkb/keycodes/xfree86. Usually, you're looking for xkb/keycodes/xfree86.) Another way is to use the xev program, find the numeric code and add it to xmodmap. For example, if I want to use the right hand Windows key (the menu key, usually just to its right, works the same way), I run the xev program from an xterm, which opens up a little box. (The xev program is usually installed by default. If not, it's readily available for almost all systems. In FreeBSD it's in /usr/ports/x11/xev.)

In the box, hit a key. In the terminal you used to call xev you'll see various things including (if we hit the right Windows key) keycode 116. Now, we create (if it doesn't exist) a $HOME/.xmodmaprc file. In it we add
keycode 116 = Multi_key

Next type
xmodmap ~/.xmodmaprc
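
The same two steps can be wrapped in a small function, so that the entry isn't added twice if it is run again. This is only a sketch. add_multi_key is a name I made up, and the keycode should be whatever xev reported on your system.

```shell
# Add a "keycode N = Multi_key" line to an xmodmap file, once only.
# $1 = keycode from xev, $2 = file (optional, ~/.xmodmaprc by default)
add_multi_key() {
    keycode=$1
    rcfile=${2:-"$HOME/.xmodmaprc"}
    line="keycode $keycode = Multi_key"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
}

# Example use (xmodmap needs a running X server):
#   add_multi_key 116    # replace 116 with the keycode xev showed
#   xmodmap ~/.xmodmaprc
```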

If you now hit the right hand Windows key, then type _a, you'll see ā. So, I can use this to write, for example, jūdō.
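
To see which other macron sequences are available (for ū and ō, say), one can search the Compose file. A minimal sketch follows, where list_macrons is a hypothetical helper and the path is the FreeBSD one mentioned earlier. On Linux the file is often under /usr/share/X11/locale, but that is an assumption to check on your own system.

```shell
# Print the Compose-key entries that produce letters with a macron.
# $1 = path to a Compose file
list_macrons() {
    grep '^<Multi_key>' "$1" | grep 'WITH MACRON'
}

# Example use:
#   list_macrons /usr/local/lib/X11/locale/en_US.UTF-8/Compose
```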

Special thanks to Dr. Mike Fabian for all his help, as well as several other members of the Tokyo Linux Users Group (tlug).