Fully Automated: The Template Approach
‘Templating’ is the process of marking zones on a document where information always appears in the same place. The template approach is fully automated: the software does not guess where information is located; it always looks at the same location (x, y, width, height) for each field. Fields are usually defined by viewing a sample image and rubber-banding field locations (drawing rectangles on the image through the software’s user interface). These locations are then stored as a template and applied during processing to images of that template type. Templates are used only in fixed-form data capture.
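As a rough illustration of the idea, a template can be thought of as a mapping from field names to fixed rectangles, and processing as cropping each rectangle out of every incoming image. The sketch below is not taken from any particular product: the names Zone, crop, apply_template and INVOICE_TEMPLATE are hypothetical, and the image is assumed to be a plain list of pixel rows so that the example needs no imaging library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A fixed rectangle on the page (hypothetical structure)."""
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


def crop(image, zone):
    """Return the pixels inside the zone; image is a list of pixel rows."""
    rows = image[zone.y:zone.y + zone.height]
    return [row[zone.x:zone.x + zone.width] for row in rows]


def apply_template(image, template):
    """Cut every templated field out of the image, always at the same place."""
    return {name: crop(image, zone) for name, zone in template.items()}


# Illustrative template: in practice the rectangles would come from
# rubber-banding fields on a sample image.
INVOICE_TEMPLATE = {
    "invoice_number": Zone(x=2, y=0, width=3, height=1),
    "total": Zone(x=0, y=2, width=2, height=2),
}
```

Each cropped region would then be handed to a recognition engine. Because the coordinates never change, this only works when every image of the template type has the same layout.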
The iterative template approach grew out of manual zoning and requires no more expertise than templating. It adds a training phase with the following steps. First, an operator creates a new definition for a document type. The operator then loads a set of samples that represent that document type and the variations within it, steps through each page in the training set, and rubber-bands the same fields on every page. As the operator moves from image to image, the software calculates how each field location varies from page to page, and so learns how a field may move on the page. Once training is done, the definition can be applied in production. This approach is employed by semi-structured forms processing systems.
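The training phase can be sketched in a few lines. The text does not say how real products model the variation, so the example below assumes one simple possibility: for each field, take the smallest rectangle that encloses every rectangle the operator drew across the training pages, and use it as the search region in production. The function name learn_field_regions and the (x, y, width, height) tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), assumed layout


def learn_field_regions(samples: List[Dict[str, Box]]) -> Dict[str, Box]:
    """Merge the rectangles drawn on each training page into one region per field.

    Each sample maps a field name to the rectangle rubber-banded on that page.
    The result is the bounding box of all rectangles seen for the field.
    """
    edges: Dict[str, Tuple[int, int, int, int]] = {}  # left, top, right, bottom
    for sample in samples:
        for name, (x, y, w, h) in sample.items():
            left, top, right, bottom = x, y, x + w, y + h
            if name in edges:
                l, t, r, b = edges[name]
                left, top = min(l, left), min(t, top)
                right, bottom = max(r, right), max(b, bottom)
            edges[name] = (left, top, right, bottom)
    # Convert edges back to (x, y, width, height).
    return {n: (l, t, r - l, b - t) for n, (l, t, r, b) in edges.items()}


# Three training pages on which the "total" field drifts slightly.
training_pages = [
    {"total": (100, 500, 80, 20)},
    {"total": (110, 520, 80, 20)},
    {"total": (95, 510, 90, 20)},
]
regions = learn_field_regions(training_pages)
```

A bounding box is the crudest possible model of variation; it is used here only because it makes the effect of adding more training pages easy to see.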