April 2024

Generative AI: The future of Intelligent Document Processing

By Jos Polfliet, Chief Architect.

The emergence of generative AI is a technological trigger with a more drastic impact than any other technology in recent history.

Much of the excitement around generative AI comes from its ability to create all kinds of content, such as images and videos. But generative AI’s interpretation, text-generation, translation and summarisation capabilities can also be incredibly powerful for automation in operational teams.

When combined with other capabilities, such as the ability to interact with external systems, it becomes possible to create ‘Generative Agents’: software that can emulate human behaviour in ways that open up significant new avenues for automation.

Let’s take a look at why this technology is so impactful.

What are Generative Agents and what can they do?

Generative Agents are software designed to perform tasks autonomously by using various machine learning algorithms, models, and external services. They act in a predefined environment (i.e. a known set of possible actions) based on a given input (such as a document or email).
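To make the idea of a "predefined environment" concrete, here is a minimal sketch in Python. The action names and the run_action helper are hypothetical and chosen purely for illustration. The point is only that an agent picks from a known, fixed set of actions and cannot step outside it.

```python
from typing import Callable, Dict

# Hypothetical actions; a real platform would define its own set.
def extract_information(document: str) -> str:
    return f"extracted fields from: {document}"

def request_human_validation(document: str) -> str:
    return f"queued for human review: {document}"

# The predefined environment: the only actions the agent may take.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "extract_information": extract_information,
    "request_human_validation": request_human_validation,
}

def run_action(name: str, document: str) -> str:
    """Execute an action only if it belongs to the predefined set."""
    if name not in ACTIONS:
        raise ValueError(f"action {name!r} is outside the allowed set")
    return ACTIONS[name](document)

print(run_action("extract_information", "invoice 42"))
```

Whatever model chooses the action name, the lookup in ACTIONS keeps the agent's behaviour bounded to what the business has explicitly allowed.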

Generative Agents are able to emulate human behaviour such as task execution, planning and prioritisation. They can understand goals, create and follow a sequence of steps to achieve them, and learn from previous interactions to optimise their performance.

Generative Agents can perform a range of actions that vary in complexity. Simple tasks include:

  • Extracting information.
  • Parsing dates and numbers.
  • Recognising a document type.

More complex actions encompass tasks like:

  • Interacting with other software systems like ERP and CRM packages.
  • Summarisation.
  • Sending emails.
  • Linking information.
  • Nuanced understanding of language and context.
  • Making interpretative decisions.

Agents must have the ability to learn from feedback, to plan how to achieve a desired output and to take different actions to achieve that output. It’s also vital that the agent has calibrated confidence scores so that it knows when to ask for human validation.
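As a rough illustration of how confidence scores can decide when a human is asked to step in, consider the sketch below. The Prediction class, the route function and the 0.9 threshold are assumptions made for this example, not a description of any particular product. The sketch also assumes the scores are already calibrated, so that a confidence of 0.9 really does mean the value is right about nine times out of ten.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Prediction:
    field: str
    value: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]

def route(predictions: List[Prediction], threshold: float = 0.9) -> str:
    """Return 'auto' if every field clears the threshold, else 'human'."""
    if not predictions:
        # Nothing was extracted, so there is nothing to trust automatically.
        return "human"
    if all(p.confidence >= threshold for p in predictions):
        return "auto"
    return "human"

confident = [Prediction("total", "100.00", 0.97)]
uncertain = confident + [Prediction("date", "01/02/2024", 0.55)]
print(route(confident), route(uncertain))
```

If the scores were poorly calibrated, the same threshold would either let errors through or send far too many documents to a human, which is why calibration matters as much as raw accuracy.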

The impact of Generative Agents on document processing

By using generative AI and agents, Intelligent Document Processing platforms can offer many more capabilities besides extraction or recognition.

This greatly enhances the breadth of repetitive work that can be reliably automated – with humans still in control and in the loop. Indeed, business users can train, evaluate and deploy their own – private – document-processing agents.

Doing so removes the burden from IT, while – as you’ll see below – giving better results.

Generative Agents are a valuable asset for your intelligent document processing pipelines because they:

  • Learn nuances that are hard to analyse or describe in general rules, and therefore difficult to implement in code.
  • Learn from business users instead of requiring expensive IT services.
  • Reduce onboarding time by 90% by avoiding IT services.
  • Reduce the need for human validation by 50% by making better holistic decisions and taking actions.

Generative Agents will also unlock new use cases for automation, as they are able to handle complex requirements, such as:

  • Understanding lots of client-specific logic that is hard to describe.
  • Dealing with bad reference data quality – a known problem for static code.
  • Incorporating world knowledge, like synonyms, writing styles and conversions of units of measurements.
  • Dealing with edge cases.

The roadmap to Generative Agents for IDP platforms

Generative Agents will be powerful additions to Intelligent Document Processing platforms. However, they require certain foundational features to enable you to realise their full potential:

  • User-friendly, no-code functionality for training, evaluating, and deploying Large Language Models (LLMs).
  • Layout-aware models that can understand and represent documents based on their layout instead of just the text.
  • The connectivity to enable Agents to take actions in external systems.

Let’s look at each of these in more detail.

A no-code, user-friendly platform for training, evaluating and deploying private LLMs

Lots of AI solutions are based on out-of-the-box LLMs. These models are trained on vast amounts of data, but this makes them generic. Solutions that harness these models fall short for complex, customer-specific processes.

This is why you need an Adaptive IDP platform where you can train your own private models on your own data. Current open-source or commercial solutions do not allow users to train or fine-tune, evaluate, deploy and scale LLMs without coding skills.

This is an issue, because document processing tasks like data entry, order booking and invoice entry are typically performed by users who do not have specialised IT or AI knowledge.

We therefore believe in democratising access to private LLMs by automating those steps and giving power to business users.

Layout-savvy LLMs

Current state-of-the-art foundation models (those massive but generic LLMs we mentioned above) fall into two broad categories:

  1. Text-only LLMs. As the name suggests, these can only access and understand text. This is an issue for document processing tasks, which often rely heavily on layout and visual information. Text-only models receive just the extracted text and lose vital layout information such as tables, columns, forms and so on.
  2. Multi-modal foundation models combining text and images. The current state of the art in multi-modal foundation models is focussed heavily on photographs and pictures rather than documents. These models are notoriously bad at “reading” even the text of logos, captions or road signs. This is where specialised OCR models shine instead. Document processing tasks rely heavily on text that needs to be recognised with almost perfect accuracy. When numbers and names are extracted, even one misrecognised character is unacceptable.

There is a clear gap that needs filling by giving text-based LLMs access to more layout information while maintaining the accuracy of specialised OCR models.

This is why Duco Adaptive Intelligent Document Processing (AIDP) features custom supervised LLMs that extend pretrained multilingual text-only models with rich layout information, drastically improving performance. The next step is to harness the same techniques using generative AI models as foundation models.
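One generic way to give a text model access to layout is to pair each OCR word with its position on the page. The sketch below is an illustration of that general idea only and is not a description of how Duco AIDP works internally. The function names, the input format and the 0 to 1000 coordinate grid are assumptions for this example; scaling coordinates to a fixed grid simply makes pages of different sizes comparable.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units

def normalise_box(box: Box, page_width: float, page_height: float,
                  scale: int = 1000) -> Tuple[int, int, int, int]:
    """Map a bounding box onto a fixed integer grid, independent of page size."""
    x0, y0, x1, y1 = box
    return (
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    )

def to_layout_tokens(words: List[Tuple[str, Box]],
                     page_width: float, page_height: float) -> List[Dict]:
    """Attach a normalised box to every OCR word so layout travels with the text."""
    return [
        {"text": text, "box": normalise_box(box, page_width, page_height)}
        for text, box in words
    ]

# Hypothetical OCR output for one table row on a 600 x 800 page.
ocr_words = [("Total", (50, 700, 100, 720)), ("1,250.00", (450, 700, 550, 720))]
tokens = to_layout_tokens(ocr_words, page_width=600, page_height=800)
print(tokens)
```

Because "Total" and "1,250.00" share the same vertical coordinates, a model that sees the boxes can tell they sit on the same row, which is exactly the information a text-only representation throws away.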

The connectivity to enable agents to take actions in external systems

One of the biggest ways in which Generative Agents can benefit straight-through processing (STP) is by interacting with other systems and tools. This will become a vital feature of any Adaptive IDP platform.

Every process is different and Adaptive IDP platforms need to be able to tailor their behaviour to a company’s existing or target workflow. This will undoubtedly involve tasks such as external data lookup, interpretative decision-making, applying business logic, or data validation.

Without Generative Agents, processes like these remain reliant on IT and on repetitive work from the end user. Today they are often handled through custom code. Customising every aspect of the pipeline may sound great, but these steps often must be hard-coded by developers and engineers.

This has two big disadvantages:

  1. These custom steps do not learn automatically when a business user corrects the mistakes made. The user will have to correct the same error time and time again, unless the code is updated.
  2. They require IT services to implement, making the business users dependent on the budget and capacity of internal or external IT teams.

Clearly, that is not how an Adaptive IDP system should work. It’s therefore imperative that Generative Agents are able to connect to other systems to perform tasks like those listed above themselves. They can do with minimal training what currently takes coding, time, money and manual intervention to achieve.

The numbers: impact of Generative Agents on document processing STP

So what does all this add up to?

We looked at automation rates for various projects across our customers during a fixed timeframe. The projects vary in difficulty: some are so hard they reach no more than 50% STP, while others are fairly straightforward and achieve up to 99.2% STP.

This gave us an aggregated STP rate before Generative Agents of 76.6%.

We then looked at the instances where the platform requested human validation. This enabled us to see where a Generative Agent would have been able to resolve the issue in question automatically without needing human support.

For example, a Generative Agent is capable of making holistic decisions. The agent understands the total context of the document, instead of viewing validation and aggregation rules field-by-field. A good example of this is determining whether a date should be interpreted in US format (month, day, year) or European format (day, month, year). The agent can use the rest of the document as context to decide that, rather than viewing only the date field in isolation. It is therefore less likely to get it wrong and need a human to check the result manually.
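The date example can be illustrated with a deliberately simple, rule-based stand-in. A Generative Agent would draw on much richer context than this, so treat the sketch below as an illustration of the principle rather than of the agent itself. The function name and the voting rule are assumptions: any date elsewhere in the document whose first or second number exceeds 12 can only be read one way, and so reveals the convention the document uses.

```python
import re
from typing import Optional

DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def infer_day_first(document_text: str) -> Optional[bool]:
    """Return True for day-first, False for month-first, None if undecidable."""
    day_first_votes = 0
    month_first_votes = 0
    for a, b, _year in DATE_RE.findall(document_text):
        first, second = int(a), int(b)
        if first > 12 and second <= 12:
            day_first_votes += 1    # e.g. 25/04/2024 can only be day-first
        elif second > 12 and first <= 12:
            month_first_votes += 1  # e.g. 04/25/2024 can only be month-first
    if day_first_votes > month_first_votes:
        return True
    if month_first_votes > day_first_votes:
        return False
    return None  # no unambiguous evidence: a case for human validation

european = "Invoice date: 03/04/2024. Due date: 25/04/2024."
american = "Invoice date: 03/04/2024. Due date: 04/25/2024."
print(infer_day_first(european), infer_day_first(american))
```

Looking at the field "03/04/2024" in isolation, neither reading can be ruled out. Looking at the whole document, the due date settles the question.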

Another example is taking an action – for instance, some documents or emails require an action like looking up reference data in an external system, adding new data to systems, or sending an email request for more information. Generative Agents can do these things themselves.

Adding those back into the automation rate gave a potential STP rate of 88.1%. The share of documents needing human validation would fall from 23.4% to 11.9%, which is roughly a 50% reduction. That is time that can be spent on more value-adding tasks!
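For readers who want to check the arithmetic, the reduction in human validation follows from the two STP rates: whatever is not processed straight through needs a human. The short sketch below uses only the rounded rates quoted in the text, and the function name is an assumption for this example. With those rounded inputs the result is about 49%, in other words roughly half of the validation workload; a figure computed from unrounded project data may differ slightly.

```python
def validation_reduction(stp_before: float, stp_after: float) -> float:
    """Relative drop in documents needing human validation, as a fraction."""
    manual_before = 1.0 - stp_before  # share needing validation before
    manual_after = 1.0 - stp_after    # share needing validation after
    return (manual_before - manual_after) / manual_before

reduction = validation_reduction(0.766, 0.881)
print(f"Reduction in human validation: {reduction:.1%}")
```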

Preparing for Generative Agents

Generative Agents will significantly enhance STP rates, bringing you one step closer to full data automation. There is still work to be done before the potential of generative AI and Generative Agents is realised in this space. The important thing in the meantime is to ensure that you’re harnessing an Adaptive IDP platform that includes the features and functionality necessary to make full use of generative AI as it develops.