Sunday, January 25, 2015

Screen Shot OCR

Recently I was trying to solve a problem at work, and I thought I'd share what I learned in hopes that it benefits others. I had some screen captures containing words (mostly medical terms), and I needed to convert those words into text. The first solution that came to mind was to use optical character recognition (OCR) software to extract the text from the images. For screen captures where the text was approximately 14-point, I tested a variety of OCR applications and found that ABBYY FineReader gave the best results. However, when I applied OCR to another screen capture where the text was approximately 9- or 10-point and stored within the cells of a table, the same FineReader application failed miserably.

How could it be that a screen capture with such clearly legible text (at least as legible as most text that has been passed through a scanner) was uninterpretable by OCR applications? It appears that mainstream OCR applications are designed to work with high-resolution images, whereas I was dealing with low-resolution, anti-aliased text from screen captures, which is a special use case for OCR. In fact, I found somebody's thesis and other research addressing this problem.
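A rough back-of-the-envelope calculation illustrates the resolution gap. The sketch below assumes a 96 DPI display and a 300 DPI scan (a commonly recommended scanning resolution for OCR). Both figures are assumptions for illustration, not measurements taken from my screen captures, and the function names are made up for this example.

```python
POINTS_PER_INCH = 72  # typographic points per inch

# Assumed resolutions, for illustration only.
SCREEN_DPI = 96   # a common default for desktop displays
SCAN_DPI = 300    # a commonly recommended scan resolution for OCR


def glyph_height_px(font_size_pt, dpi):
    """Approximate pixel height of a font's em box at a given resolution."""
    return font_size_pt * dpi / POINTS_PER_INCH


def upscale_factor(source_dpi, target_dpi):
    """How much an image must be enlarged to mimic the target resolution."""
    return target_dpi / source_dpi


if __name__ == "__main__":
    for size_pt in (9, 10, 14):
        on_screen = glyph_height_px(size_pt, SCREEN_DPI)
        scanned = glyph_height_px(size_pt, SCAN_DPI)
        print(f"{size_pt}-point: {on_screen:.1f} px on screen, "
              f"{scanned:.1f} px in a scan")
    print(f"Upscale factor: {upscale_factor(SCREEN_DPI, SCAN_DPI):.3f}")
```

Under these assumptions, 9-point text is only about 12 pixels tall on screen versus roughly 37 pixels in a scan, so an OCR engine tuned for scans sees about a third of the detail it expects. Enlarging a screen capture roughly threefold before running OCR may be worth trying, although I did not test that here.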

Fortunately, I found several applications designed to perform OCR against screen-captured text with small font sizes. For the most part, these applications are available only for Microsoft Windows. I applied the same process to test all applications: I opened a sample screen capture and used each app to interpret selected lines of text--this was done by using the OCR app to selectively highlight 1 row of text at a time (even though some of the apps also had options to interpret the entire screen shot). Here are my impressions of how these OCR applications performed against screen captures containing medical terminology.

  • Screen OCR (http://www.screenocr.com/, $29.95, 21-day free trial). I was very excited about this app since the first thing it did was build a font database which gave me the impression that it would fine-tune its accuracy by customizing its interpretation based on the font. I had nearly 700 fonts cataloged in its database, and the process took less than 5 minutes. However, when trying to apply the OCR, it produced only gibberish. I probably was doing something wrong, but after tinkering around with all the settings and reading the help documentation, I still couldn't figure out how to use it and simply gave up.
  • CaptureText (http://www.capturetext.com/, $29.95, 21-day free trial). Turns out that this is exactly the same application as Screen OCR. It did not build a font database, but I assume that is because Screen OCR had already built one. The user interface, options, color scheme, and results (gibberish) were exactly the same as Screen OCR, so I guess the company is trying to sell the same product under 2 different names. Weird.
  • Boxoft Screen OCR (http://boxoft.com/screen-ocr/, $27, 15-day free trial). Unlike the 2 OCR apps above, Boxoft Screen OCR did not build a font database, so I was able to start using it right out of the box. Overall accuracy was fair: it made several recurring errors, such as misinterpreting the "%" character as "'/o" and occasionally confusing "e" with "o".
  • Capture2Text (http://capture2text.sourceforge.net/, free/open source). Overall accuracy was good, but the user interface was clunky, and access to the clipboard with the OCR-converted text was awkward.
  • ABBYY Screenshot Reader (http://www.abbyy.com/bonussr/, free full unlimited license). This OCR app had good accuracy. Although it did make some mistakes, I felt that the mistakes were reasonable in the sense that the text was of a higher level of difficulty to interpret, and the accuracy was the best of all the apps that I tested. Similar to Capture2Text, I had a hard time figuring out how to reveal the OCR-converted text in the clipboard, but perhaps there is a setting that I need to tweak to see the text in real time.
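The ratings above ("fair", "good") are impressions rather than measurements. One rough way to put a number on them, assuming you have typed out the correct text for a few rows by hand, is to compare each app's output against that reference. The sketch below uses Python's standard difflib module. The app names and sample strings are hypothetical, apart from the "%" misreading described above, and the similarity ratio is only a crude stand-in for a proper character error rate.

```python
from difflib import SequenceMatcher


def similarity(reference, ocr_output):
    """Return a 0..1 similarity score between the correct text and OCR output."""
    return SequenceMatcher(None, reference, ocr_output).ratio()


if __name__ == "__main__":
    reference = "50%"  # hand-typed correct text for one row (hypothetical)
    outputs = {
        "Hypothetical app A": "50'/o",  # the "%" misreading noted above
        "Hypothetical app B": "50%",
    }
    for app, text in outputs.items():
        print(f"{app}: {similarity(reference, text):.2f}")
```

Averaging the score over the same set of rows for each app would give a simple basis for ranking them.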
I should also mention that I came across Tesseract, but since it is an open source OCR engine without a graphical user interface, I was unable to review it the same way as the apps above. I don't know if any of the above apps actually use Tesseract under the hood. In any case, if you are a developer, then maybe you can make use of it. I just don't know how.
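For developers who want to experiment, here is a minimal sketch of calling the Tesseract command-line program from Python on a cropped image of a single row of text. It assumes Tesseract 4 or later is installed and on the PATH (older releases spell the page segmentation flag differently), and the helper names and the file name row.png are made up for this example. Page segmentation mode 7 tells Tesseract to treat the image as a single line of text, which mirrors the one-row-at-a-time test I used above. I have not evaluated its accuracy on small screen-captured fonts.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, psm=7, lang="eng"):
    """Build the argument list for a single-image Tesseract run.

    "stdout" as the output base makes Tesseract print the recognized
    text instead of writing it to a file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_line(image_path):
    """Run Tesseract on one cropped row of text and return the result."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("The tesseract executable was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # row.png is a hypothetical cropped screen capture of one table row.
    print(ocr_line("row.png"))
```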

So in summary, I feel that ABBYY Screenshot Reader is the king of the mountain for OCR use cases where you need to grab small-font text from screen captures. Are you aware of other OCR apps that perform well against screen-captured text in small font sizes? Do they perform better than ABBYY Screenshot Reader? Please share; I would love to know!

1 comment:

  1. Nice article, Dr.Lee. Gives a good overview of technologies that we could leverage for some quick wins. Thanks.
