Data is the new oil: the most valuable resource in the age of information. The growing volume of data in enterprises is fast becoming a deluge, and International Data Corporation (IDC) predicts that worldwide data will exceed 175 zettabytes by 2025. While data is a prized asset, like oil it must be refined and extracted with specialized tools to be of real value. If we analyze where it resides within an enterprise, we find only 20% in structured formats such as Excel files, text files, and databases like Oracle and SQL Server, and the remaining 80% in difficult-to-extract unstructured formats such as e-mails, PDF files, web pages, Word documents, scanned images, and JPEG and GIF files. This unstructured data is important, yet companies tend to overlook it and thus greatly undervalue their data.
Companies that can extract and leverage both structured and unstructured data can drive business process automation for higher productivity and revenue. With the advent of robotic process automation (RPA), manual, mundane, repetitive processes that rely on structured data can be automated, up to a point. Scaling RPA bots becomes difficult when they must deal with the 80% of enterprise data that is semi-structured or unstructured. Automating such processes requires far more advanced capabilities than those of RPA bots: the unstructured data must first be processed and converted into structured formats so that simple RPA bots can handle it within a workflow. This is where intelligent, or cognitive, automation technologies offer a solution.
Gartner defined a new term, Intelligent Document Processing (IDP), for classifying tools that use technologies such as computer vision, optical character recognition (OCR), intelligent character recognition (ICR), natural language processing (NLP), and machine learning (ML) to overcome the challenges of automating processes that deal with semi-structured and unstructured data. Leading RPA vendors such as Automation Anywhere (AA) and UiPath have added IDP capabilities to their existing RPA platforms, enabling interoperability between RPA bots and IDP-enabled bots (called “IQ Bot” in Automation Anywhere and “Document Understanding” in UiPath).
RPA and data capture
Document extraction is a major part of many processes best suited for RPA, and getting it right is how we derive maximum ROI from an investment in RPA. Automation Anywhere offers a sophisticated IDP solution for such use cases: IQ Bot, which makes document extraction as seamless as possible. As simple as it may sound, document extraction is not an easy task. Documents vary in format, language, structure, and clarity. Scanned documents can be even harder to process, especially if they include handwritten content. AA’s purpose-built IQ Bot makes extracting data from such unstructured documents easier. Without IDP solutions such as AA’s IQ Bot, UiPath’s Document Understanding, or the Hyperscience platform, RPA projects still require knowledge workers to read documents and extract data by hand. IDP is key to unlocking the value of RPA; without it, end-to-end automation will remain just a wish.
IQ Bot combines the power of RPA with AI technologies such as computer vision, NLP, fuzzy logic, and ML to automatically classify, extract, and validate information from business documents and e-mails.
How does IQ Bot work?
Let us see how IQ Bot extracts data from unstructured and semi-structured documents such as passports, Aadhaar cards, tax forms, barcodes, manifests, engineering drawings, photographs, invoices, purchase orders, and legal documents.
Pre-processing: IQ Bot uses techniques such as noise reduction, binarization, and de-skewing to improve document quality. When a document is a scan or otherwise of low quality, these techniques reduce noise and improve the accuracy of extraction.
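To make two of these steps concrete, here is a minimal sketch (not IQ Bot's actual implementation, which is not described here) that removes speckle noise with a 3x3 median filter and then binarizes the page using Otsu's method, which picks the gray level that maximizes the variance between the dark and light pixel classes. It assumes the page is a plain 2D list of 8-bit grayscale values; de-skewing is left out.

```python
from statistics import median

Image = list[list[int]]  # grayscale pixels: 0 = black, 255 = white


def median_denoise(img: Image) -> Image:
    """Replace each pixel with the median of its 3x3 neighbourhood (edges clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def otsu_threshold(img: Image) -> int:
    """Return the threshold t that maximizes between-class variance (pixels <= t are dark)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Map pixels above the threshold to white and the rest to black."""
    return [[255 if p > t else 0 for p in row] for row in img]


# Usage: denoise first, then threshold the cleaned page.
page = [[220, 220, 220], [220, 30, 220], [220, 220, 220]]
clean = median_denoise(page)
bw = binarize(clean, otsu_threshold(clean))
```

Running the median filter before thresholding keeps isolated specks from being promoted to solid black pixels, at the cost of slightly eroding very thin strokes.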
Intelligent Document Classification: Documents must be classified, and large documents split into relevant sections, so that the right data can be extracted. To do this, IQ Bot uses NLP, supervised and unsupervised ML, OCR, and Google Vision.
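IQ Bot combines several techniques for classification; as a much simpler stand-in that shows what supervised document classification means, the sketch below trains a multinomial naive Bayes classifier (with add-one smoothing) on OCR text labeled by document type. The class name, labels, and training snippets are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the OCR text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesDocClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()             # label -> number of training documents
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled OCR snippets.
TRAINING = [
    ("Tax invoice. Invoice number INV-1001. Total amount due 100.00", "invoice"),
    ("Invoice date 2024-03-15. Amount due. Total payable", "invoice"),
    ("Purchase order PO-5512. Ship to warehouse. Quantity ordered 12", "purchase_order"),
    ("Purchase order. Vendor: ACME. Delivery date. Items ordered", "purchase_order"),
    ("Passport. Surname. Given names. Nationality. Date of birth. Place of issue", "passport"),
]

clf = NaiveBayesDocClassifier()
for text, label in TRAINING:
    clf.train(text, label)
```

Once trained, `clf.predict(ocr_text)` returns the most likely document type, which decides which extraction rules or model to apply next.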
Data Extraction: Extraction is the key part of IDP, and the preceding steps all serve to increase its accuracy. IQ Bot uses ML to extract specific data from documents; the bot is trained in the relevant context so that it can extract the required information. Precise extraction increases the percentage of tasks that can be automated end to end, significantly reducing the need for human intervention.
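IQ Bot learns where fields are with ML; the sketch below substitutes simple regular-expression rules to show the shape of the task: unstructured OCR text goes in, and a structured record that a downstream RPA bot could consume comes out. The field names, patterns, and sample invoice are hypothetical.

```python
import re
from typing import Optional

# Hypothetical field rules; a trained model would replace these patterns.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    "invoice_date": r"Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total\s*(?:Due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})",
}


def extract_fields(ocr_text: str) -> dict[str, Optional[str]]:
    """Return one value per field, or None when the field is not found."""
    record: dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        record[field] = match.group(1) if match else None
    return record


SAMPLE = """ACME Corp
Invoice No: INV-1001
Date: 2024-03-15
Widget x2   40.00
Gadget x1   60.00
Total Due: $100.00
"""

record = extract_fields(SAMPLE)
print(record)
```

Returning `None` for missing fields, rather than raising, lets a later validation step decide whether the document can go straight through or needs a human.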
Post-Processing: Once the data is extracted, AI-driven techniques validate it against defined rules and conditions, further improving the extraction results.
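A minimal sketch of rule-based validation, assuming an extracted invoice record with hypothetical `invoice_number`, `invoice_date`, and `total` fields plus a list of line-item amounts. It checks for missing fields, an unparseable date, and a total that disagrees with the line items; a document with any error would be sent for human review instead of passing straight through.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("invoice_number", "invoice_date", "total")  # hypothetical schema


def validate_invoice(record: dict, line_items: list[Decimal]) -> list[str]:
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing {field}")

    if record.get("invoice_date"):
        try:
            date.fromisoformat(record["invoice_date"])
        except ValueError:
            errors.append("invoice_date is not a valid ISO date")

    if record.get("total"):
        try:
            total = Decimal(record["total"].replace(",", ""))
        except InvalidOperation:
            errors.append("total is not a number")
        else:
            # Decimal avoids floating-point rounding when comparing currency amounts.
            if total != sum(line_items, Decimal("0")):
                errors.append("total does not match sum of line items")
    return errors


good = {"invoice_number": "INV-1001", "invoice_date": "2024-03-15", "total": "100.00"}
print(validate_invoice(good, [Decimal("40.00"), Decimal("60.00")]))
```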
Features of IQ Bot
- Dashboard to show progress across processes.
- Allows recognition and categorization of different document types and formats.
- Supports API-based integrations: APIs let you upload documents to IQ Bot and download processed documents.
- Ability to learn with every validation and facilitate continuous improvement.
- Users can switch between OCR engines (Tesseract 4, ABBYY FineReader, and Microsoft OCR are supported) while leveraging native document classification, autocorrection, and extraction capabilities.
- Extended international language support: 190 languages, including but not limited to Asian languages such as Japanese, Korean, Chinese (simplified) and Chinese (traditional).
- Database encryption: IQ Bot document data stored in database tables and columns can be encrypted to protect potentially sensitive information.
- Allows bots to share domain configurations, saving time during the design process.
- Uses computer vision and ML to detect patterns and classify documents into groups to reduce training redundancies.
- Provides an indicator that signals the accuracy and reliability of the bot.
- Lets developers add custom logic with Python scripts when they need to modify AI workflows.
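The accuracy indicator and learning from validation both rest on the idea of confidence-based routing. The sketch below is a generic illustration, not IQ Bot's documented behavior: each extracted field carries a confidence score from a hypothetical extraction engine, and any field below an assumed threshold of 0.9 sends the document to a human validation queue, whose corrections could then feed back into training.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extraction engine


def route_document(fields: list[ExtractedField], threshold: float = 0.9) -> tuple[str, list[str]]:
    """Return ("straight_through", []) or ("validation", names of low-confidence fields)."""
    low = [f.name for f in fields if f.confidence < threshold]
    return ("validation", low) if low else ("straight_through", [])


doc = [
    ExtractedField("total", "100.00", 0.97),
    ExtractedField("invoice_date", "2024-03-15", 0.62),
]
print(route_document(doc))
```

Raising the threshold trades a lower straight-through rate for fewer undetected extraction errors; the right value depends on how costly a wrong field is in the downstream process.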
Benefits of IQ Bot
- 10x faster setup for business users: Business users can set up IQ Bot easily, without technical know-how. Its intuitive interface, with an auto grouping/mapping feature, enables more automation use cases.
- More than 80% straight-through processing (STP) rate: This minimizes manual processing of documents by knowledge workers, compared with plain OCR/ICR solutions that achieve an STP rate of only 50%. A higher STP rate means fewer documents need manual handling, and therefore greater savings in running and maintaining the automation.
- Direct cost savings: Dramatically cuts the cost of processing large volumes of data.
- AI technology ensures high accuracy, even for low-resolution documents: The recommended minimum resolution is 300 dpi; however, with repeated training and continuous improvement, the bot can process lower-resolution documents with increasing accuracy.
- Learns from human-in-the-loop feedback.
- Provides end-to-end automation, since it integrates with RPA without requiring changes to the workflow.
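A quick calculation shows what the gap between an 80% and a 50% STP rate means in practice. The monthly volume of 10,000 documents is an assumed figure for illustration only.

```python
def stp_rate(straight_through: int, total: int) -> float:
    """Share of documents processed end to end with no human touch."""
    return straight_through / total


def manual_workload(total: int, rate: float) -> int:
    """Documents that still need a knowledge worker at a given STP rate."""
    return round(total * (1 - rate))


MONTHLY_DOCS = 10_000  # assumed volume, for illustration only
print(manual_workload(MONTHLY_DOCS, 0.80))  # 2000
print(manual_workload(MONTHLY_DOCS, 0.50))  # 5000
```

At this volume, moving from 50% to 80% STP shrinks the manual queue from 5,000 to 2,000 documents a month, a 60% reduction in documents that need human handling.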
Power of IDP
With the right tools and technologies, enterprises can leverage the power of their data to drive transformation. Cyient’s suite of digital solutions, such as our IntelliCyient RPA/Automation Tech Studio, helps enterprises leverage Industry 4.0 technologies to design automated workflows for end-to-end process automation. Market-leading IDP tools help enterprises drive automation and a higher ROI by combining IDP’s handling of unstructured documents with RPA bots.