Sunday, January 25, 2015

Screen Shot OCR

Recently I was trying to solve a problem at work, and I thought I'd share what I learned in hopes that it would benefit others. The problem is that I had some screen captures containing words (mostly medical terms), and I needed to convert those words into text. The first solution that came to mind was to simply use optical character recognition (OCR) software to extract text from the images. I had some screen shots where the text was approximately 14-point, and after testing a variety of OCR applications, I found that the best results were obtained with ABBYY FineReader. However, after trying to apply OCR to another screen capture where the text was approximately 9- or 10-point and stored within cells of a table, the same FineReader application failed miserably.

How could it be that a screen capture with such clearly legible text (certainly at least as legible as most text that has been passed through a scanner) was uninterpretable by OCR applications? It appears that mainstream OCR applications are designed to work with high-resolution images, whereas I was dealing with low-resolution anti-aliased text from screen captures, which is a special use case for OCR. In fact, I found somebody's thesis and other research addressing this problem.

Fortunately, I found several applications designed to perform OCR against screen-captured text with small font sizes. For the most part, these applications are available only for Microsoft Windows. I applied the same process to test all applications: I opened a sample screen capture and used each app to interpret selected lines of text--this was done by using the OCR app to selectively highlight 1 row of text at a time (even though some of the apps also had options to interpret the entire screen shot). Here are my impressions of how these OCR applications performed against screen captures containing medical terminology.

  • Screen OCR (http://www.screenocr.com/, $29.95, 21-day free trial). I was very excited about this app since the first thing it did was build a font database which gave me the impression that it would fine-tune its accuracy by customizing its interpretation based on the font. I had nearly 700 fonts cataloged in its database, and the process took less than 5 minutes. However, when trying to apply the OCR, it produced only gibberish. I probably was doing something wrong, but after tinkering around with all the settings and reading the help documentation, I still couldn't figure out how to use it and simply gave up.
  • CaptureText (http://www.capturetext.com/, $29.95, 21-day free trial). Turns out that this is exactly the same application as Screen OCR. It did not build a font database, but I assume that is because Screen OCR already built one. The user interface, options, color scheme, and results (gibberish) were exactly the same as Screen OCR, so I guess the company is trying to sell the same product under 2 different names. Weird.
  • Boxoft Screen OCR (http://boxoft.com/screen-ocr/, $27, 15-day free trial). Unlike the other 2 OCR apps above, Boxoft Screen OCR did not build a font database, so I was able to start using it right out of the box. Overall accuracy was fair, as it made several common errors such as misinterpreting the "%" character as "'/o" and confusing "e" for "o" on occasion.
  • Capture2Text (http://capture2text.sourceforge.net/, free/open source). Overall accuracy was good, but the user interface was clunky, and access to the clipboard with the OCR-converted text was awkward.
  • ABBYY Screenshot Reader (http://www.abbyy.com/bonussr/, free full unlimited license). This OCR app had good accuracy. Although it did make some mistakes, I felt that the mistakes were reasonable in the sense that the text was of a higher level of difficulty to interpret, and the accuracy was the best of all the apps that I tested. Similar to Capture2Text, I had a hard time figuring out how to reveal the OCR-converted text in the clipboard, but perhaps there is a setting that I need to tweak to see the text in real time.
I should also mention that I came across Tesseract, but since it is an open source OCR engine without a user interface, I was unable to review it. I don't know if any of the above apps actually leverage Tesseract. In any case, if you are a developer, then maybe you can make use of it. I just don't know how.
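
For developers who want a starting point, here is a minimal sketch of calling the Tesseract command-line program from Python. I have not benchmarked this against the apps above, so treat it as an illustration only. It assumes the tesseract executable is installed and on your PATH, that it is a version that accepts the "--psm" option (older 3.x releases spelled it "-psm"), and that "row.png" is a hypothetical cropped image containing a single row of text. The function names are my own, not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=7):
    """Build the argument list for a one-line OCR run.

    "stdout" as the output base makes tesseract print the recognized text
    instead of writing it to a file; page segmentation mode 7 treats the
    image as a single line of text.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_line(image_path):
    """Run tesseract on one cropped row of text and return the result."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical file name; replace with your own cropped screen capture.
    print(build_tesseract_command("row.png"))
```

Cropping to one row at a time mirrors how I tested the apps above. Whether enlarging a small-font screen capture before passing it in improves the results is something you would have to test yourself.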

So in summary, I feel that ABBYY Screenshot Reader is the king of the mountain for OCR use cases where you need to grab text of small font sizes from screen captures. Are you aware of other OCR apps that perform well against screen-captured text in small font sizes? Do they perform better than ABBYY Screenshot Reader? Please share, I would love to know!

Saturday, January 3, 2015

Photography 101 - The Exposure Triple Constraint

People sometimes ask me to tell them the basics about photography, so in this post I will discuss the fundamental elements of exposure, specifically what I call the "exposure triple constraint" (perhaps more commonly referred to as the "exposure triangle").  In any case, I am borrowing/stealing the concept of the triple constraint from project management where the 3 constraints are cost, time, and scope.  That is, if the project scope expands, then you may need to increase cost and/or time to delivery.  If your project timeline contracts, then you may need to increase expenditures and/or reduce scope.  Finally, if project costs (budgeted dollars) are reduced, then you may need to increase time and/or reduce scope.  When one parameter changes, failure to simultaneously adjust other parameters will likely result in project deliverables of suboptimal quality.  If you've ever planned an event such as a birthday party or even a wedding, the concept of the project management triple constraint should be intuitive to understand.

There is a perfect analogy in photography, where the exposure triple constraint relates to the combination of aperture, shutter speed, and ISO.  Any photograph you've ever taken was shot with a certain combination of aperture, shutter speed, and ISO, and there is an intimate relationship between these 3 parameters.  Some photos may look the way you wanted them to look, while others may not.  Have you ever wondered why?  In some cases, maybe you could have achieved the desired effect by leveraging the exposure triple constraint.  The goal of this post is to provide novice photographers with an introduction to the basics of exposure, and it is geared toward camera bodies that allow you to individually define these settings (e.g., digital SLRs, ILCs, or even dedicated point-and-shoot cameras) rather than shoot exclusively in full automatic mode (e.g., most smartphone cameras). Let's explore each factor in more detail.

Aperture. The aperture is the adjustable opening in your lens that light passes through when you take a picture, and the f value describes how wide that opening is.  The smaller the f value, the wider the aperture and therefore the more light will be available to expose your image.  Most entry-level camera lenses will be capable of opening as wide as f/3.5 or maybe even f/2.8.  More expensive lenses will let you go to f/1.8, f/1.4, or even f/1.2 which allow even more light to enter.  Before you jump to the conclusion that wider apertures are always better, you first need to understand that wider apertures will also result in a shallower depth of field--in other words, your foreground and background will become increasingly blurred at wider apertures (smaller f values).  This might be desirable for a portrait where you want a subject's face to be in focus and the backdrop to be blurry.  On the other hand, landscape photos often have objects at different distances that you want to keep in focus, so you may prefer to shoot at smaller apertures (larger f values).  Without getting into too much detail, suffice it to say that for any given lens, the maximum image sharpness usually resides somewhere in between the extremes of apertures.  That is, if your lens is capable of opening as wide as f/1.4 and as narrow as f/22, you don't necessarily want to go to the extremes just because you can.  Rather, use the setting that you need to produce the desired effect.  To summarize the behavior of aperture within the exposure triple constraint, the wider the aperture, the more you can afford to use a faster shutter speed or lower ISO to get a proper exposure.  The narrower the aperture, the more you will need to slow down the shutter speed or increase ISO to get a proper exposure.
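
To put rough numbers on "smaller f value, more light," here is a small sketch based on the standard relationship that the light reaching the sensor scales with 1 divided by the square of the f value, so each doubling of light (one "stop") corresponds to dividing the f value by about 1.4. The function names are my own, and the f values printed on a lens are rounded, so real-world results will be approximate.

```python
import math


def relative_light(f_from, f_to):
    """How many times more light f_to admits compared with f_from."""
    # Light gathered is proportional to 1 / N^2 for f value N.
    return (f_from / f_to) ** 2


def stops_between(f_wide, f_narrow):
    """Number of stops of light lost when closing down from f_wide to f_narrow."""
    # One stop is a factor of 2 in light, i.e. a factor of sqrt(2) in f value.
    return 2 * math.log2(f_narrow / f_wide)


if __name__ == "__main__":
    # Opening up from f/4 to f/2 admits four times the light (two stops).
    print(relative_light(4.0, 2.0))
    print(stops_between(2.0, 4.0))
```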

Shutter Speed.  The shutter speed is a measure of how long your camera's shutter remains open when you take a picture.  It is measured in seconds.  A very fast shutter speed might be 1/500 sec, 1/1000 sec, or even 1/4000 sec.  Slow shutter speeds might be 1/30 sec, 1/10 sec, or even 1 sec or 30 sec. The faster the shutter speed, the more you are able to "freeze" action, and the slower the shutter speed, the more likely you will have a blurry image for handheld shots (photography with tripods and other mounts might be the topic of another post).  The faster your shutter speed, the more you will need a wider aperture or higher ISO to get a proper exposure.  The slower your shutter speed, the more you can afford to use a narrower aperture or lower ISO to get a proper exposure.

ISO.  The ISO is a measure of the sensitivity of your camera's sensor.  The higher the ISO, the more sensitive the sensor, and the lower the ISO, the less sensitive the sensor.  As with aperture and shutter speed, there are tradeoffs with different ISO settings.  While higher ISO settings are more sensitive to available light, this comes at the expense of introducing more noise to the image--in other words, your image may look more grainy at higher ISO settings.  Therefore, in general you will want to shoot at the lowest possible ISO to minimize noise.  Most entry-level cameras produce very clean images at ISO settings of 100, 200, 400, or even 800 and above, depending on the model.  The lower your ISO, the more you will need to slow down your shutter speed or widen your aperture to get a proper exposure.  The higher your ISO, the more you can afford to use a faster shutter speed and/or a narrower aperture to get a proper exposure.
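
The tradeoffs described above can be made concrete with the commonly used exposure value (EV) formula, where each whole number is one stop. The sketch below is my own illustration, not something taken from a camera manual: it computes an ISO-100-referenced exposure value as log2(N^2 / t) - log2(ISO / 100), where N is the f value and t is the shutter speed in seconds. Combinations that give the same number should produce the same image brightness, though they will differ in depth of field, motion blur, and noise.

```python
import math


def exposure_value(f_number, shutter_s, iso=100):
    """ISO-100-referenced exposure value for a combination of settings."""
    return math.log2(f_number ** 2 / shutter_s) - math.log2(iso / 100)


if __name__ == "__main__":
    # Three combinations that trade one parameter against another.
    combos = [
        (2.0, 1 / 200, 100),  # wide aperture, fast shutter, low ISO
        (4.0, 1 / 50, 100),   # narrower aperture, so a slower shutter
        (4.0, 1 / 200, 400),  # narrower aperture, so a higher ISO instead
    ]
    for f_number, shutter_s, iso in combos:
        ev = exposure_value(f_number, shutter_s, iso)
        print(f"f/{f_number} at {shutter_s:.4f} sec, ISO {iso}: EV {ev:.2f}")
```

All three rows come out to the same value, which is the triple constraint in action: closing the aperture by 2 stops was paid for either with 2 stops of shutter speed or with 2 stops of ISO.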

The following figure summarizes the parameters in the exposure triple constraint.  The figure originates from this web site, although I have edited it to remove an error on the left side of the triangle.


Now that we have reviewed the 3 inter-related components of the exposure triple constraint, I will discuss how your camera can automatically calculate the exposure settings for different shooting modes.  For this, I will use terminology that is common to Canon cameras since that's what I use the most.  For other kinds of cameras, check your owner's manual to find out the equivalent terms.

Automatic.  When you shoot photos in Automatic mode (designated by the letter "A"; not to be confused with "Av" for aperture value), the camera evaluates the available light and decides on the best compromise between aperture, shutter speed, and ISO to properly expose the image.  When light is abundant (e.g., in broad daylight), generally this results in a pretty decent image.  For most beginners, I'd recommend shooting in Automatic mode to get a feel for your camera's capabilities.  Most cameras have additional automatic settings that bias the triple constraint toward one factor over the others.  For example, the Portrait mode (depicted by an icon of a person's head) will bias the exposure toward a wider aperture and automatically adjust for a faster shutter speed and/or lower ISO.  The Landscape mode (depicted by an icon of a mountain and cloud) will bias the exposure toward a narrower aperture and automatically adjust for a slower shutter speed and/or higher ISO.  The Sports/Action mode (depicted by an icon of a person running) will bias the exposure toward a faster shutter speed and automatically adjust for a wider aperture and/or higher ISO.  This should give you a sense of how your camera automatically applies the exposure triple constraint--when one parameter changes, at least one of the other 2 factors must compensate to produce a proper exposure.  But what if the automatic settings still don't enable you to achieve the desired photographic effect that you're looking for?  Well, read on...

Program.  When you shoot photos in Program mode (designated by the letter "P"), you have the options to manually set the ISO and to manually turn the flash on/off.  As you manually set the ISO and flash settings, the camera will automatically calculate the optimum settings for aperture and shutter speed.  I find this setting most useful to override the automatic flash setting.  For example, if you're at a restaurant where the lighting is suboptimal, and you don't want to fire a flash directly into people's faces, you might choose to turn off the flash. This might result in underexposing the image, but you can adjust it during post-processing (e.g., using Lightroom or Photoshop).  Or perhaps you're outdoors where your subject is in the shade or has the sun behind them and the camera would not otherwise fire the flash--you might force the flash to fire to better illuminate the faces (this is referred to as "fill flash").  Generally I do not use Program mode solely to manually set the ISO, but that could be done--I just can't think of any common scenarios when you'd want to do that, but then again I'm not a professional photographer.  Please leave a comment if you want to share a use case for manually setting the ISO alone, I'd love to understand what others are doing!

Shutter Priority.  When you shoot photos in shutter priority (designated by "Tv" which stands for "time value"), you have the option to manually set the shutter speed (and ISO, if desired) while the camera automatically calculates the optimum aperture (and ISO, if not set manually).  Shutter priority is often used to photograph subjects that are in motion, where you need a faster shutter speed than your camera might otherwise set in an automatic mode.  This could be a speeding race car in broad daylight where you might desire a shutter speed of 1/1000 sec or faster, or it could be a toddler crawling/running around a poorly lit living room where the automatic setting might default to a shutter speed of 1/50 sec, but you really need 1/100 sec or faster to avoid blurring.  On the other hand, if you're shooting nighttime photos of the stars, or if you're shooting running water such as a waterfall or river where you want to intentionally smooth out the water, you might choose a shutter speed of 3, 10, or 30 sec.

Aperture Priority.  When you shoot photos in aperture priority (designated by "Av" which stands for "aperture value"; not to be confused with "A" for automatic), you have the option to manually set the aperture (and ISO, if desired) while the camera automatically calculates the optimum shutter speed (and ISO, if not set manually).  Aperture priority is commonly used to achieve the desired depth of field.  For example, portrait photos are often shot with a wide aperture to keep facial features (especially the eyes) in sharp focus while blurring out the foreground and background ("shallow" depth of field).  On the other hand, landscape photos are often shot with narrower apertures to ensure that both the foreground and background are in focus ("deep" or "large" depth of field).

Manual.  While the P, Av, and Tv modes can be thought of as "semi-automatic" modes since you manually set one constraint and the camera automatically calculates the other two, the Manual mode (designated by the letter "M") lets you independently set the aperture, shutter speed, and ISO, although you can "cheat" by letting the camera automatically calculate the ISO if you desire.
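
To illustrate what "the camera automatically calculates" might look like in the Av and Tv modes, here is a simplified sketch. It assumes the camera has metered the scene and arrived at an ISO-100-referenced exposure value (the hypothetical scene_ev100 argument), and it uses the standard relationship N^2 / t = 2^EV x (ISO / 100) to solve for the one missing setting. Real cameras also round to the nearest available setting and respect the limits of the lens and shutter, which this ignores. The function names are my own.

```python
import math


def shutter_for(scene_ev100, f_number, iso=100):
    """Aperture priority (Av): you pick the aperture, solve for shutter time in seconds."""
    return f_number ** 2 / (2 ** scene_ev100 * iso / 100)


def aperture_for(scene_ev100, shutter_s, iso=100):
    """Shutter priority (Tv): you pick the shutter time, solve for the f value."""
    return math.sqrt(shutter_s * 2 ** scene_ev100 * iso / 100)


if __name__ == "__main__":
    scene_ev100 = 12  # assumed meter reading for this example

    # Av mode: choosing f/4 at ISO 100 leads to a shutter time of 1/256 sec.
    t = shutter_for(scene_ev100, 4.0, iso=100)
    print(f"Av: f/4.0 -> 1/{round(1 / t)} sec")

    # Tv mode: choosing 1/1024 sec at ISO 400 leads back to f/4.
    n = aperture_for(scene_ev100, 1 / 1024, iso=400)
    print(f"Tv: 1/1024 sec at ISO 400 -> f/{n:.1f}")
```

Notice that raising the ISO by 2 stops (100 to 400) let the shutter speed get 2 stops faster (1/256 to 1/1024 sec) at the same aperture, which is the kind of compensation described for the Sports/Action mode above.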

The following table summarizes for each shooting mode how each parameter in the triple constraint is defined.  To reiterate, the automatic modes include not just the "A" setting but also the Portrait, Sports, Landscape, and other automatic settings.


Keep in mind that there are no absolute correct settings to use for any scenario.  The settings that you ultimately select for each exposure should depend on the lighting conditions and the effect that you intend to capture with each image.  Your personal experiences and your own creative mindset will guide your decisions about what kind of exposure is right for any given scenario.

In my opinion, to fully understand the concept of the exposure triple constraint, you need to experiment with different camera settings in different scenarios and evaluate the results.  Sometimes you can obviously tell through the in-camera LCD that you have a poor exposure (e.g., overexposure or underexposure), but sometimes the effects are more nuanced, and you will need to view the photos on a computer screen at 100% magnification to see flaws (e.g., not the desired depth of field or perhaps a blurry image from a subject in motion) and learn how to make the necessary adjustments next time.  After multiple iterations, you may gravitate to a certain style of exposure and figure out what is right for you.

When evaluating your photo, it is always helpful to know what aperture, shutter speed, and ISO you used to provide you with context about how the image was exposed.  Virtually all cameras store this and other information (also known as "metadata" which is data that describes other data) in a standard format called EXIF.  When reviewing photos in the camera's LCD display, you can usually toggle the exposure settings on/off, and the commands to do so differ across camera makes and models.  When reviewing photos on your computer, you can usually display the EXIF metadata, and the commands to do so differ across software applications.  Refer to your camera or computer software documentation to learn how to display the exposure settings.

In conclusion, the exposure triple constraint refers to the combination of aperture, shutter speed, and ISO.  Different shooting modes may give you the flexibility to control one or more parameters of the exposure triple constraint, so use the appropriate camera settings to achieve the desired effect in your photos.  Keep shooting and evaluating your photos to calibrate your understanding of how to use the triple constraint to your advantage.  Above all else, have fun!

Please leave a comment if you found this post to be helpful or if you have suggestions on how to improve it.  Thanks!