Document capture pitfall: Failing to see the forest for the trees
Don’t focus on individual issues, but see the patterns instead
This one is a project management issue rather than a technical one. In technology projects, we’re used to creating tickets for bugs or defects, with each occurrence getting its own ticket. In a document capture project, that approach becomes a mistake when it gets too granular. You need to recognize the pattern behind the issues instead of trying to manage each occurrence of an issue.
For example, you do not want to create a ticket for every instance where Invoice Number was not successfully extracted. If you do, you will generate a tremendous amount of noise and find yourself fine-tuning very small issues that may or may not make a real impact in the end.
This is where having the right data matters. Throughout the life of your system, you will want to track the changes your operators make to capture the correct data from your documents. Each individual change should be recorded, but the analysis should happen in aggregate, looking for patterns. Specifically, look for fields that operators had to change on a double-digit percentage of documents. You will need to run several hundred documents through the system before a pattern really starts to show. Using this data, you can then determine the root cause and prescribe a fix.
You should gather the following data points:
- Fields changed on each document
- Number of keystrokes the operator made to change a given field
- Document type or variation of document type (e.g., Vendor format for invoices)
- Extraction confidence per field
- BONUS: Type of miss / Reason for miss (I don’t see any of the vendors implementing this in their out-of-the-box reporting tools, but it would be very helpful. Examples: Bad image quality, Keyword not found, No recall, Incorrect recall)
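To make the pattern-hunting concrete, here is a minimal Python sketch of how the data points above could be rolled up. Everything in it is an assumption for illustration: the FieldCorrection record layout, the function names, and the thresholds (10% as the floor for "double-digit", 200 documents as a stand-in for "several hundred") are hypothetical and do not come from any vendor's reporting tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FieldCorrection:
    """One field on one processed document (hypothetical record layout)."""
    document_id: str
    vendor_format: str   # document type or variation, e.g. vendor invoice format
    field_name: str      # e.g. "invoice_number"
    changed: bool        # True if the operator had to change the extracted value
    keystrokes: int      # keystrokes the operator used on this field
    confidence: float    # extraction confidence reported for the field


def change_rates(records):
    """Return {(vendor_format, field_name): (change_rate, documents_seen)}."""
    totals = defaultdict(int)
    changes = defaultdict(int)
    for r in records:
        key = (r.vendor_format, r.field_name)
        totals[key] += 1
        if r.changed:
            changes[key] += 1
    return {key: (changes[key] / totals[key], totals[key]) for key in totals}


def find_patterns(records, min_rate=0.10, min_documents=200):
    """List fields changed on at least min_rate of documents, worst first.

    Groups with fewer than min_documents are skipped, since a handful of
    documents is not enough to call something a pattern. Both thresholds
    are assumed values to tune for your own project.
    """
    flagged = [
        (key, rate, seen)
        for key, (rate, seen) in change_rates(records).items()
        if seen >= min_documents and rate >= min_rate
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Each entry returned by find_patterns is a candidate for a single ticket covering the whole pattern (for example, "Invoice Number misses on one vendor's format"), rather than one ticket per document. Keystrokes and confidence are carried on the record so the same grouping can later be extended to averages per field.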
Jimar Garcia is President and Principal Consultant of HatchWork Solutions, helping businesses refresh their business models through pragmatic technology adoption.