tesseract-ocrをためしましたが何か？

tesseract-ocrという画像から文字を認識する処理を試しました
かなりお手軽に実験できます

・ソースコードを取ってくる
# wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
・日本語の翻訳ファイルと英語のファイル（これがないと怒られる）
# wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.jpn.tar.gz
# wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz

・解凍
tar -zxvf tesseract-ocr-3.02.02.tar.gz
tar -zxvf tesseract-ocr-3.02.jpn.tar.gz
tar -zxvf tesseract-ocr-3.02.eng.tar.gz

・ビルド(INSTALLを読んでの手順）
#./autogen.sh
・おっと、私の環境では(ubuntu 13.10)
libtool　とかleptonicaが無いと怒られましたので

# apt-get install libtool libleptonica-dev
・再度ビルド
# ./autogen
# ./configure
# make
# make check
・上手くいったっぽいので
# make install
・で早速sampleを(eurotext.tif)をキャプチャしようとすると
# tesseract ./eurotext.tif testDoc
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your “tessdata” directory.
Failed loading language ‘eng’
Tesseract couldn’t load any languages!
Could not initialize tesseract.

・で、/usr/local/share/tessdata/ に英語と日本語の翻訳ファイルを置きます
# sudo cp -i tesseract-ocr/tessdata/jpn.traineddata /usr/local/share/tessdata/.
# sudo cp -i tesseract-ocr/tessdata/eng.traineddata /usr/local/share/tessdata/.

・実行してみました
# tesseract ./eurotext.tif testDoc

こいつが

こんな風に
testDoc