Going Beyond RPA with LLMs

Robotic Process Automation (RPA) has long been the go-to solution for streamlining repetitive tasks. But when it comes to complex, semi-structured or unstructured data, RPA falls short. Enter AI-powered intelligent automation, an approach that extends what can be automated in business operations. Unlike traditional RPA bots, AI agents bring contextual understanding to the table, which makes them more adaptable to real-world variation. In this blog post, we’ll explore two approaches driving this shift: horizontal AI enablers designed for advanced data extraction, and vertical solutions tailored to industry-specific needs. We’ll look at how these technologies make it possible to automate tasks that previously could not be automated, improving efficiency and employee satisfaction in the process.

How does Robotic Process Automation work?

Robotic Process Automation (RPA) works by automating manual tasks within an organisation. It does this by building software bots that mimic the clicks a person would make to complete a task:

  • Mimicking Manual Actions: RPA bots are programmed to perform specific tasks by replicating the exact sequence of clicks a human user would perform. This makes RPA very deterministic.
  • Suitable for Repetitive Tasks: RPA is typically used for automating tasks like data entry or invoice processing, which are common across many businesses but not core competencies. These are often tasks that are repetitive, manual and tedious.
  • Limitations due to Deterministic Nature: Because RPA relies on mimicking clicks, it can be easily disrupted by small changes, such as a misspelled name or a change in the layout of a website. The bot is programmed to follow one specific path, and if anything deviates from that path, the process fails.
  • Need for Manual Intervention: Because of these limitations, RPA can often complete only about 80% of a task; the remaining 20% requires manual intervention. In practice, this means RPA may not be reliable enough to complete a task without human assistance.
  • Implementation: Implementing RPA usually requires an implementation consultant who observes the steps a person takes to complete a task and then programs the RPA bot to replicate those steps.
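
To make the brittleness described above concrete, here is a minimal sketch of a recorded-steps bot. It is a toy model, not how any particular RPA product works: the screen is just a dictionary of element labels, and the names `Step`, `run_bot` and `StepFailed` are made up for this illustration. The point is only that a script tied to exact labels stops as soon as one label changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str      # "click" or "type"
    target: str      # exact label of the UI element captured during recording
    text: str = ""   # text to type, if any


class StepFailed(Exception):
    """Raised when the screen no longer matches the recorded script."""


def run_bot(steps, screen):
    """Replay recorded steps against a screen modelled as {label: field value}."""
    for i, step in enumerate(steps):
        if step.target not in screen:
            # A deterministic bot has no way to recover: it simply stops.
            raise StepFailed(f"step {i}: no element labelled {step.target!r}")
        if step.action == "type":
            screen[step.target] = step.text
        # In this toy model a "click" only checks that the element exists.
    return screen


# Steps "recorded" by watching a person fill in an invoice form (made-up data).
RECORDED = [
    Step("click", "New Invoice"),
    Step("type", "Customer Name", "Acme GmbH"),
    Step("type", "Amount", "1200.00"),
    Step("click", "Submit"),
]

original_ui = {"New Invoice": "", "Customer Name": "", "Amount": "", "Submit": ""}
# Same form after a redesign: one label was renamed.
renamed_ui = {"New Invoice": "", "Client Name": "", "Amount": "", "Submit": ""}

result = run_bot(RECORDED, dict(original_ui))

try:
    run_bot(RECORDED, dict(renamed_ui))
    failed_at = None
except StepFailed as exc:
    failed_at = str(exc)
```

On the original layout the script runs to completion. On the renamed layout it fails at the second step, which is the kind of case that ends up in the manual-intervention queue.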

How does AI-powered automation surpass RPA's limitations?

AI-powered automation surpasses the limitations of RPA by being able to process unstructured data and intelligently collect context to determine the best course of action. Unlike RPA, which relies on mimicking specific clicks and is therefore very deterministic, AI can handle variations and complexities in tasks.

  • Handling unstructured data: AI, particularly with Large Language Models (LLMs), can process unstructured data, which is not possible with RPA. For example, AI can extract key information from a phone call and input it into a system, whereas RPA cannot.
  • Intelligent context collection: AI agents can intelligently gather context and determine the best course of action, something RPA cannot do. For instance, an AI system can understand the information on a fax and input it into a database, while RPA would require very specific and inflexible instructions.
  • Flexibility and adaptability: RPA is easily disrupted by minor changes, such as a misspelled name or a change in website layout. In contrast, AI systems can adapt to these changes and continue to function effectively because they do not rely on pre-programmed clicks.
  • Self-service and ease of use: AI-powered automation solutions can offer user-friendly interfaces, such as drag-and-drop process flows, allowing users to set up their own automation processes without needing an implementation consultant. RPA, on the other hand, typically requires consultants to program the specific clicks.
  • Browser and web capabilities: AI agents can browse the web and interact with the pages they find, enabling them to perform tasks that were previously out of reach for RPA.
  • Ability to handle complex tasks: AI can take on processes with too many variations and exceptions for a fixed RPA script, such as referral management in healthcare.
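
As a rough illustration of the unstructured-data point above, the sketch below wraps an extraction step around a language model call. Everything here is an assumption made for the example: the field names, the prompt wording, and the `complete` callable, which stands in for whatever LLM client you use. `fake_model` returns a canned answer so the sketch runs without any API, and the sample fax text is invented.

```python
import json
from typing import Callable

# Hypothetical fields for a healthcare referral; adjust to your own schema.
REQUIRED_FIELDS = ("patient_name", "referring_doctor", "reason")


def build_prompt(text: str) -> str:
    """Ask for a JSON object containing exactly the fields we need."""
    fields = ", ".join(REQUIRED_FIELDS)
    return (
        "Extract the following fields from the text and answer with a JSON "
        f"object only, using null for anything not stated: {fields}\n\n"
        f"Text:\n{text}"
    )


def extract_referral(text: str, complete: Callable[[str], str]) -> dict:
    """Return a validated record, or route the document to human review."""
    raw = complete(build_prompt(text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"status": "needs_review", "reason": "output was not valid JSON"}
    if not isinstance(data, dict):
        return {"status": "needs_review", "reason": "output was not a JSON object"}
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        return {"status": "needs_review", "reason": "missing: " + ", ".join(missing)}
    return {"status": "ok", "record": {name: data[name] for name in REQUIRED_FIELDS}}


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; it ignores the prompt and returns canned JSON.
    return json.dumps(
        {"patient_name": "Jane Doe", "referring_doctor": "Dr. Smith", "reason": "knee pain"}
    )


# Invented example of free-form fax text.
FAX_TEXT = "Referral from Dr. Smith: pt Jane Doe, c/o knee pain x 3 weeks, pls schedule."
result = extract_referral(FAX_TEXT, fake_model)
```

The validation step matters as much as the model call: anything the model cannot fill in is sent to a person instead of being written into the target system.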

LLM-powered RPA - are we there yet?

While an LLM like Claude 3.5 Sonnet is not designed to implement Robotic Process Automation (RPA) in the traditional sense, it can perform many of the tasks that RPA is used for, and it can do so more flexibly.

Traditional RPA:

  • RPA uses software bots that mimic a user's clicks and actions within a system.
  • It is deterministic, meaning it follows a pre-defined set of steps.
  • RPA is often used for automating manual tasks like data entry and invoice processing.
  • It can be brittle and break when something deviates from the expected process, like a misspelled name or a change in a website layout.
  • RPA often requires manual intervention for the 20% of tasks it cannot handle.

Claude 3.5's "Computer Use" Feature:

  • Claude 3.5 interacts with computers using a "vision-only approach", observing the screen through real-time screenshots. It doesn't rely on HTML parsing or access to internal software APIs.
  • It employs a "reasoning-acting" paradigm, similar to ReAct, where it observes the environment before taking action.
  • Claude uses a selective observation strategy, only capturing screenshots when necessary, to reduce computational costs.
  • It can perform a range of tasks, including web searching, workflow automation, office productivity tasks, and even playing video games.
  • Claude 3.5 can navigate complex websites, transfer data between applications, and interact with various software, including closed-source software.
  • It can also extract data from websites, documents, and spreadsheets, and automate repetitive tasks like data entry.
  • Claude can interact with graphical user interfaces (GUIs), bridging the gap between the digital world and AI understanding.
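
The observe-then-act loop with selective screenshots can be sketched in a few lines. This is a toy simulation under stated assumptions, not Anthropic's implementation and not a call to any real API: `ToyDesktop` replaces a real screen (its "screenshot" is a dictionary, not pixels), and `stub_policy` is a hand-written stand-in for the model's reasoning. It only asks for a new screenshot when it has no observation yet or after a click that may have changed the screen.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDesktop:
    """Stand-in for a real screen: a small form with a submit button."""
    fields: dict = field(default_factory=lambda: {"Name": "", "Email": ""})
    submitted: bool = False
    screenshots_taken: int = 0

    def screenshot(self) -> dict:
        self.screenshots_taken += 1
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: dict) -> None:
        if action["type"] == "type_text":
            self.fields[action["field"]] = action["text"]
        elif action["type"] == "click" and action["target"] == "Submit":
            self.submitted = True


def stub_policy(goal: dict, observation, history: list) -> dict:
    """Hypothetical stand-in for the model: choose the next action."""
    if observation is None:
        return {"type": "screenshot"}
    # Actions taken since the most recent screenshot.
    last_shot = max(i for i, a in enumerate(history) if a["type"] == "screenshot")
    recent = history[last_shot + 1:]
    if any(a["type"] == "click" for a in recent):
        return {"type": "screenshot"}  # a click may have changed the screen
    if observation["submitted"]:
        return {"type": "done"}
    # Typing is predictable, so track it locally instead of re-observing.
    believed = dict(observation["fields"])
    for a in recent:
        if a["type"] == "type_text":
            believed[a["field"]] = a["text"]
    for name, wanted in goal.items():
        if believed.get(name) != wanted:
            return {"type": "type_text", "field": name, "text": wanted}
    return {"type": "click", "target": "Submit"}


def run_agent(goal: dict, desktop: ToyDesktop, policy, max_steps: int = 10) -> list:
    observation, history = None, []
    for _ in range(max_steps):
        action = policy(goal, observation, history)
        history.append(action)
        if action["type"] == "done":
            return history
        if action["type"] == "screenshot":
            observation = desktop.screenshot()
        else:
            desktop.execute(action)
    raise RuntimeError("step budget exhausted")


goal = {"Name": "Ada", "Email": "ada@example.com"}
desktop = ToyDesktop()
history = run_agent(goal, desktop, stub_policy)
```

In this run the agent fills both fields and submits the form using two screenshots: one at the start and one after the click to confirm the result. The step budget is a design choice worth keeping in a real system, since an agent that misreads the screen could otherwise loop indefinitely.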

Key Differences:

  • Claude 3.5's "computer use" is more adaptable than RPA because it can understand and interpret visual information from screenshots. This allows it to handle dynamic environments and unstructured data more effectively.
  • RPA is limited to mimicking pre-programmed clicks and actions, while Claude 3.5 can make decisions and adapt to the current GUI state using its reasoning capabilities.
  • Claude can take the context of previous actions into account by keeping a history of screenshots, while a traditional RPA bot keeps no such context.
  • Claude 3.5 uses pre-defined tools provided by developers, whereas RPA bots are created by recording the user's actions.
  • Claude can use tools like "click on a button" or "type text into a field", whereas RPA is based on a pre-programmed set of clicks.
  • Claude can receive feedback in the form of screenshots and text after it executes a tool, to determine if the task is complete or if further actions are required, whereas RPA cannot.
  • Claude 3.5 can integrate with a variety of tools, and also integrates its visual observation with its reasoning process.
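
The idea of developer-defined tools that report back after each call can be illustrated as follows. This is a hedged sketch, not the actual Computer Use tool interface: the registry, the `tool` decorator, the tool names and the in-memory `screen` are all invented for the example. What it shows is the shape of the contract: every call returns text feedback, including on failure, so the caller can decide what to do next.

```python
from typing import Callable

# Registry of tools the developer chooses to expose (hypothetical names).
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn):
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


# Toy stand-in for the current GUI state.
screen = {"buttons": {"Save"}, "fields": {"Title": ""}}


@tool
def click_button(label: str) -> str:
    if label not in screen["buttons"]:
        return f"error: no button labelled {label!r}"
    return f"clicked {label!r}"


@tool
def type_text(field: str, text: str) -> str:
    if field not in screen["fields"]:
        return f"error: no field labelled {field!r}"
    screen["fields"][field] = text
    return f"typed {len(text)} characters into {field!r}"


def call_tool(name: str, **arguments) -> str:
    """Run a tool and always return feedback text instead of raising."""
    fn = TOOLS.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    try:
        return fn(**arguments)
    except TypeError as exc:  # wrong or missing arguments
        return f"error: bad arguments for {name!r}: {exc}"


feedback = call_tool("type_text", field="Title", text="Q3 report")
```

Returning an error string where a script would crash is the difference the list above points at: a failed click becomes information the agent can reason about, where a recorded script would simply halt.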

In summary, both RPA and Claude 3.5 Sonnet aim to automate tasks, but Claude 3.5 Sonnet takes a more flexible approach: because it uses visual information in its reasoning and planning, it can adapt where traditional RPA cannot. The technology is still under active development.

What is next for RPA?

While LLMs like Claude 3.5 Sonnet offer more advanced automation capabilities than traditional Robotic Process Automation, they are not yet a complete replacement. LLMs can perform many of the tasks RPA is used for, and can do so more flexibly because they can reason and adapt. There are still limitations around accuracy, speed, and reliability, which are being actively addressed. The future of automation is likely to involve a combination of vertical automation solutions and horizontal AI enablers that leverage LLMs to tackle complex tasks RPA could not handle.

LLMs are on a path to replace RPA by providing more adaptable and intelligent automation. AI agents are similarly poised to replace SaaS by acting as the primary interface between the user and the underlying data and applications, orchestrating processes and automating tasks. Neither transition is complete yet, and both LLMs and AI agents are still maturing.

Photo by cottonbro studio
