Here is a pet package called puppyocr that is based on the well-established Tesseract optical character recognition engine. The pet is 1.9 MB.
I have included a command-line interface wrapper written in C++ that makes the task of using OCR a bit friendlier than the raw Tesseract command-line interface.
To use it, install the pet and then type "puppyocr" (no quotes) in a terminal and simply follow the prompts there.
When asked, type the name (including the extension) of a 'tif' file that is stored in your home folder.
You can use mtPaint or GIMP to create a 'tif' image file from a scanner.
When prompted for the name of the output file, you can use any name you like. This output file will be created in your home folder with a 'txt' extension.
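For anyone curious what happens behind the prompts, here is a rough Python sketch of how the two answers could be turned into a plain Tesseract call. It is not the C++ wrapper's actual code; the function name `build_tesseract_command` and its `home` parameter are made up for illustration. It relies only on the standard `tesseract IMAGE OUTPUTBASE` form, where Tesseract itself adds the 'txt' extension to the output name.

```python
from pathlib import Path


def build_tesseract_command(input_name, output_name, home=None):
    """Return the argument list for a plain `tesseract IMAGE OUTPUTBASE` call.

    Hypothetical helper: both files are assumed to live in the home folder.
    """
    home = Path(home) if home is not None else Path.home()
    image = home / input_name
    # Tesseract appends ".txt" to the output base on its own, so drop a
    # ".txt" the user may have typed to avoid getting "notes.txt.txt".
    if output_name.lower().endswith(".txt"):
        output_name = output_name[:-4]
    return ["tesseract", str(image), str(home / output_name)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually run the recognition, assuming the `tesseract` binary is on the PATH.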
Edit: replaced with an updated version that checks for a suitable input file. If none is found, the program exits with a warning.
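A check like that could look roughly like the Python sketch below. This is only a guess at the behaviour: the post does not say what counts as "suitable", so the sketch assumes it means any file with a 'tif' extension in the given folder. The function names `find_tif_files` and `require_tif_files` are invented for the example.

```python
import sys
from pathlib import Path


def find_tif_files(folder):
    """Return the sorted names of files in `folder` with a .tif extension.

    Assumption: "suitable" simply means a 'tif' extension (any letter case).
    """
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".tif"
    )


def require_tif_files(folder):
    """Return the candidate names, or exit with a warning if there are none."""
    candidates = find_tif_files(folder)
    if not candidates:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Warning: no 'tif' file found in {folder}")
    return candidates
```

Calling `require_tif_files(Path.home())` before prompting would stop early when there is nothing to recognize.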
Enjoy!
OCR for Puppy Linux
Life is too short to spend it in front of a computer
- TheAsterisk!
- Posts: 406
- Joined: Tue 10 Feb 2009, 08:52
Newer version available here:
http://akita.scottjarvis.com/puppyocr-1.22.pet
And mirrored here:
http://smokey01.com/saintless/Fredx181/ ... r-1.22.pet
Maybe this will also help (Edit: No, the download links do not work there):
http://www.murga-linux.com/puppy/viewto ... 7f975e7829
Thanks, saintless.
The pet is now stored in my toolbox.
With PuppyOCR I read old documents from 1800 to 1900 about the history of France. In spite of errors, 95 percent of the text is recognized. Don't ask for too much: fifteen lines at a time are enough, and a whole page doesn't work well. These documents were often scanned from books, and the edges are truncated.
PuppyOCR does what it can, but it does as much as the others.