LLMs in Your Pocket: An Overview of CAMPHOR by Apple

While Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and responding to complex queries, their reliance on server-side processing poses significant challenges for mobile assistants.

These challenges primarily revolve around two key issues:

  • Privacy: Mobile assistants frequently require access to sensitive personal information to provide accurate and relevant responses. Storing and processing this data on remote servers raises privacy concerns, since users may be unwilling to send such sensitive information off-device.
  • Latency: Server-side processing introduces delays between query understanding and execution, which can negatively impact the user experience, especially for tasks requiring multiple server-device interactions.

To address these limitations, researchers have explored deploying Small Language Models (SLMs) directly on user devices. SLMs offer lower latency and stronger privacy protection, since data never needs to leave the device. However, SLMs face their own set of challenges, chiefly accuracy and memory constraints.

This is where CAMPHOR comes in.

CAMPHOR: A Multi-Agent System for On-Device Query Understanding

CAMPHOR is a novel on-device SLM multi-agent framework designed to handle multiple user inputs and reason over personal context locally, ensuring user privacy.

Key Features of CAMPHOR:

  • Hierarchical Architecture: CAMPHOR utilises a hierarchical architecture in which a high-order reasoning agent decomposes complex tasks and coordinates expert agents, each responsible for a specific functionality:
      • Personal Context Agent: Retrieves relevant personal information from the user's device.
      • Device Information Agent: Fetches device-specific information such as location, time, and screen content.
      • User Perception Agent: Captures recent user activities on the device.
      • External Knowledge Agent: Seeks information from external sources such as web search engines.
      • Task Completion Agent: Generates function calls to represent user intent and complete tasks.
  • Parameter Sharing and Prompt Compression: By sharing parameters across agents and employing prompt compression techniques, CAMPHOR reduces model size, latency, and memory usage, making it suitable for on-device deployment.
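The hierarchical coordination described above can be sketched in a few lines of Python. This is a minimal illustration, not Apple's implementation: all class names, method signatures, and the hard-coded plan are assumptions standing in for what would actually be SLM-generated decisions.

```python
# Hypothetical sketch of CAMPHOR-style hierarchical dispatch.
# Class and method names are illustrative, not Apple's API.

class ExpertAgent:
    """Base class: each expert handles one narrow capability."""
    def handle(self, subtask: str) -> str:
        raise NotImplementedError

class PersonalContextAgent(ExpertAgent):
    def handle(self, subtask: str) -> str:
        # In CAMPHOR this would query on-device personal data.
        return f"personal-context result for: {subtask}"

class ExternalKnowledgeAgent(ExpertAgent):
    def handle(self, subtask: str) -> str:
        # In CAMPHOR this would call an external source such as web search.
        return f"web-search result for: {subtask}"

class HighOrderReasoningAgent:
    """Decomposes a query and routes each sub-task to an expert agent."""
    def __init__(self, experts: dict[str, ExpertAgent]):
        self.experts = experts

    def decompose(self, query: str) -> list[tuple[str, str]]:
        # A real system would use an SLM here; the plan is hard-coded
        # purely for illustration.
        return [
            ("personal_context", "look up 'travel buddy' contact"),
            ("external_knowledge", "find cheapest flights to Barcelona"),
        ]

    def run(self, query: str) -> list[str]:
        return [self.experts[name].handle(sub)
                for name, sub in self.decompose(query)]

agents = {
    "personal_context": PersonalContextAgent(),
    "external_knowledge": ExternalKnowledgeAgent(),
}
planner = HighOrderReasoningAgent(agents)
results = planner.run("Show me flights to Barcelona and tell my travel buddy")
print(results[0])  # personal-context result for: look up 'travel buddy' contact
```

The key structural point is that the high-order agent never answers sub-tasks itself; it only plans and routes, which is what lets each expert stay small and specialised.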

Understanding CAMPHOR's Approach

Let's break down how CAMPHOR works using an example:

User Query: "Can you show me the cheapest flight options to Barcelona next month and add it to my calendar? Also, let my travel buddy know about our trip plan."

CAMPHOR's Response:

  1. Task Decomposition: The high-order reasoning agent breaks down the query into sub-tasks:
  • Retrieve the user's current location.
  • Identify the "travel buddy" from the user's contacts.
  • Find the cheapest flights to Barcelona next month.
  • Add the chosen flight to the user's calendar.
  • Notify the travel buddy about the trip.
  2. Agent Collaboration: The high-order reasoning agent assigns these sub-tasks to the appropriate expert agents:
  • Device Information Agent retrieves the user's current location.
  • Personal Context Agent looks up the "travel buddy" contact.
  • External Knowledge Agent searches for flights.
  • Task Completion Agent handles adding the flight to the calendar and sending a message to the travel buddy.
  3. Function Call Generation: Each expert agent generates function calls based on its specific task and the available tools on the user's device. For example, the Task Completion Agent might generate functions like create_calendar_event and send_imessage_message.
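To make the final step concrete, here is a sketch of the kind of structured function-call payloads a Task Completion Agent might emit. The function names create_calendar_event and send_imessage_message come from the example above, but their parameter schemas, the flight details, and the execute dispatcher are all assumptions for illustration.

```python
import json

# Hypothetical function-call payloads; the argument schemas are assumed,
# not taken from CAMPHOR.
calls = [
    {
        "function": "create_calendar_event",
        "arguments": {
            "title": "Flight to Barcelona",
            "start": "2025-06-03T09:40",
            "end": "2025-06-03T11:55",
        },
    },
    {
        "function": "send_imessage_message",
        "arguments": {
            "recipient": "Travel Buddy",
            "body": "Booked the cheapest flight to Barcelona next month!",
        },
    },
]

def execute(call: dict) -> str:
    """Dispatch a structured call to a (stubbed) device tool."""
    fn, args = call["function"], call["arguments"]
    return f"executed {fn} with {json.dumps(args, sort_keys=True)}"

for call in calls:
    print(execute(call))
```

Representing intent as structured calls rather than free text is what allows the device to execute each step deterministically once the agents have finished reasoning.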

This collaborative and hierarchical approach allows CAMPHOR to effectively understand and respond to complex queries while utilising personal information securely and efficiently on the user's device.

Evaluating CAMPHOR's Performance

To evaluate CAMPHOR's effectiveness, researchers created a novel dataset, the CAMPHOR dataset, which simulates a user's smartphone environment with diverse personal information and tools. They fine-tuned SLM-based CAMPHOR agents on this dataset and compared their performance to various state-of-the-art LLMs using different prompting strategies.

The results demonstrate that:

  • Fine-tuned SLMs significantly outperform instruction-based LLMs in terms of task completion metrics.
  • Prompt compression techniques effectively reduce the prompt size with minimal impact on accuracy, making them crucial for on-device deployment.
  • RAG-based approaches, while commonly used for grounding LMs with external data, face limitations in this context due to sub-optimal retrieval recall, particularly for compositional queries.
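As a rough intuition for why prompt compression matters on-device, the sketch below prunes tool descriptions that are irrelevant to the current sub-task before they reach the SLM's prompt. This is a generic keyword-matching illustration, not CAMPHOR's actual compression method, and the tool specs are invented.

```python
# Illustrative prompt compression: keep only tool specs relevant to the
# sub-task, shrinking the prompt an on-device SLM must process.
# This keyword heuristic is a stand-in for CAMPHOR's real technique.

TOOL_SPECS = {
    "create_calendar_event": "create_calendar_event(title, start, end): add an event",
    "send_imessage_message": "send_imessage_message(recipient, body): send a message",
    "search_flights": "search_flights(dest, month): list flight options",
    "get_weather": "get_weather(city): current weather for a city",
}

def compress_prompt(subtask: str, specs: dict[str, str]) -> str:
    """Keep only tool descriptions whose name shares a word with the sub-task."""
    keywords = set(subtask.lower().split())
    kept = [text for name, text in specs.items()
            if keywords & set(name.split("_"))]
    return "\n".join(kept)

full_prompt = "\n".join(TOOL_SPECS.values())
compressed = compress_prompt("add flight to calendar", TOOL_SPECS)
print(len(compressed) < len(full_prompt))  # True
```

Even this crude filter shrinks the prompt substantially; the paper's learned compression achieves the same goal with far less risk of dropping a tool the model actually needs.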

CAMPHOR: A Promising Future for On-Device Personal Assistants

CAMPHOR presents a promising solution for building personalised, private, and efficient mobile assistants. By leveraging the power of on-device SLMs and a multi-agent architecture, CAMPHOR effectively addresses the limitations of traditional server-side LLMs. While the current research focuses on single-interaction queries, future work aims to extend CAMPHOR's capabilities to handle multi-turn conversations and complex runtime feedback.

