Busting the Myth of Data Extraction: OCR and EDI
Posted on November 9, 2015 by Liang McIntosh-Yee
When people evaluate whether to automate their accounts payable or accounts receivable processes, one of the first topics to come up is the capture process. How do you capture document information? Do you use OCR or EDI? How accurate is your data extraction? One thing is clear: many people are not sure what they want from document capture. They know they want high accuracy and efficiency at a low cost, but they are unsure how to get there. And they are not to blame for their confusion.
In the accounts payable and accounts receivable world, we are surrounded by people saying one thing or another, often promoting a single method as the best way to do something. Many of us are skeptical and unsure whom to believe. We have decided to share our experience and expertise with the different methods of data extraction, and how best to use all of the methods available, so that you can make informed decisions.
For as long as people have been using paper, getting information from it has been a manual process, whether that meant transcribing it to another piece of paper or, today, keying it into a computer. In the 1970s, Electronic Data Interchange (EDI) was introduced and hailed as the method of choice for exchanging data. Later, Optical Character Recognition (OCR) came to prominence as a way to bridge the gap between paper and electronic data, and more recently, Intelligent Character Recognition (ICR) has emerged as an attempt to solve some of the shortcomings of OCR. What these methods have in common is that they all have pros and cons, and each has at some point been promoted as THE method of data extraction.
Manual data extraction is burdensome, but it is also a tried-and-true method that can provide a high level of accuracy. It is time consuming and costly, and many companies don’t have the resources to support full-time manual data entry. On top of this, while a human can interpret handwriting and varied document and file formats that a computer cannot, relying on one also opens you up to the possibility of human error, especially if you don’t have workers dedicated to the task of keying information correctly.
EDI took human error out of the equation, providing 100% accuracy and completely removing manual processing. A perfect solution, right? Unfortunately, not all vendors can provide invoices or other documents in an EDI format, and EDI is not economically feasible for every vendor, meaning you still need another method to do business with them. For organizations that do use EDI, every field must be mapped for each new vendor and file type to achieve this accuracy, which makes implementation very resource intensive, especially for IT.
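To make the per-vendor mapping burden concrete, here is a minimal sketch in Python. It assumes the incoming records have already been parsed into dictionaries. The vendor names, field names, and the `normalize_record` helper are all hypothetical and only illustrate the idea. This is not how any particular EDI product works.

```python
from typing import Dict

# Hypothetical per-vendor mappings: vendor's field name -> internal field name.
# Every new vendor or file type needs its own entry, which is where the
# implementation effort described above comes from.
VENDOR_FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "acme": {"InvNo": "invoice_number", "Amt": "total", "Dt": "invoice_date"},
    "globex": {
        "invoice_id": "invoice_number",
        "gross_total": "total",
        "issued": "invoice_date",
    },
}

# Internal fields that must be present after mapping (illustrative choice).
REQUIRED_FIELDS = {"invoice_number", "total", "invoice_date"}


def normalize_record(vendor: str, record: Dict[str, str]) -> Dict[str, str]:
    """Translate one vendor record into the internal field names."""
    if vendor not in VENDOR_FIELD_MAPS:
        raise KeyError(f"No field mapping configured for vendor {vendor!r}")
    mapping = VENDOR_FIELD_MAPS[vendor]
    # Keep only the fields we know how to map; unmapped fields are dropped.
    normalized = {mapping[key]: value for key, value in record.items() if key in mapping}
    missing = REQUIRED_FIELDS - normalized.keys()
    if missing:
        raise ValueError(f"Missing fields for vendor {vendor!r}: {sorted(missing)}")
    return normalized


if __name__ == "__main__":
    print(normalize_record("acme", {"InvNo": "1001", "Amt": "250.00", "Dt": "2015-11-09"}))
```

A vendor without an entry in the mapping table cannot be processed at all, which mirrors the point above: without the mapping work, you still need another capture method for that vendor.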
OCR offered the same benefit of automated capture, taking the human out of the capture step and making it faster and more efficient than manual entry. And unlike EDI, it doesn’t require a vendor to send a specific file, instead allowing you to extract data from print, PDFs, emails and more. However, in exchange for the ability to work with these different formats, you sacrifice the 100% accuracy that EDI provides, which means that most organizations that use OCR still do a manual review to verify accuracy. On top of that, the cost to implement and maintain OCR can be very high and cannot always be economically justified. Recently, ICR has added functions such as the ability to read handwriting and sort by file type, but the cost to implement ICR is even higher than that of OCR, and the error rates are still high.
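One common way to limit that manual review is to check only the fields the engine is unsure about. The sketch below assumes a hypothetical OCR engine that reports a confidence score between 0.0 and 1.0 for each extracted field. The `ExtractedField` type, the field names, and the 0.95 threshold are illustrative assumptions, not values from any specific product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExtractedField:
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


# Illustrative threshold; a real deployment would tune this per field.
REVIEW_THRESHOLD = 0.95


def fields_needing_review(
    fields: Dict[str, ExtractedField],
    threshold: float = REVIEW_THRESHOLD,
) -> List[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return sorted(name for name, field in fields.items() if field.confidence < threshold)


if __name__ == "__main__":
    invoice = {
        "invoice_number": ExtractedField("INV-1001", 0.99),
        "total": ExtractedField("250.00", 0.82),
        "vendor_name": ExtractedField("Acme", 0.97),
    }
    # Only low-confidence fields are sent to a person for checking.
    print(fields_needing_review(invoice))
```

Raising the threshold sends more fields to a reviewer and lowers the risk of an undetected error, and lowering it does the opposite. That is the accuracy versus cost trade-off described above.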
The challenge with manual data entry is that it is costly and time consuming. The challenge with EDI is that not all vendors can send EDI files, and each vendor that sends a different EDI format represents additional cost. The challenge with OCR is that it still lacks 100% accuracy and has a high barrier to entry. Your challenge is to get the benefits of all of these methods while avoiding the drawbacks of each. How can you do that? For most companies, the best option is to seek out a service provider that offers document capture as a core competency. Such a provider will often let you take advantage of all of the various methods, without the cost of implementing and supporting each one yourself, while achieving higher levels of accuracy and efficiency.