Anthropic releases automatic Prompt Improver - Is Prompt Engineering over?

Anthropic has released new features in its developer console to improve the quality of prompts used with its language model, Claude. The prompt improver automates the refinement of existing prompts using techniques such as chain-of-thought reasoning and example standardization. The console also allows users to manage multi-shot examples in a structured format and provides a prompt evaluator to test prompts with optional ideal outputs. These features aim to simplify prompt engineering and enhance the accuracy, consistency, and performance of AI applications built with Claude.

How does the prompt improver work?

The prompt improver assists developers in refining their prompts using several methods:

Chain-of-thought reasoning: The prompt improver can enhance a prompt by adding a dedicated section for Claude, the AI model, to engage in systematic reasoning before generating a response. This addition improves the accuracy and reliability of the model's output. For instance, if a developer was building a prompt for summarising factual topics, the prompt improver might add a section instructing Claude to first identify the key facts in the source material before generating the summary.
Example standardisation: The prompt improver converts existing examples into a consistent XML format, improving clarity and processing. This standardisation ensures that all examples are presented to the model in a uniform way, making it easier for the model to learn from them. For example, if a developer provided examples in different formats, the prompt improver would standardise them into a consistent XML format.
Example enrichment: The prompt improver can augment existing examples with chain-of-thought reasoning, aligning them with the newly structured prompt. This enrichment provides the model with more detailed and structured examples to learn from, further improving its performance. For example, if a developer was building a prompt for question answering, the prompt improver might enrich the existing examples by adding step-by-step reasoning that demonstrates how to arrive at the correct answer.
Rewriting: The prompt improver can rewrite the prompt itself to clarify its structure and address any minor grammatical or spelling errors. This rewriting ensures that the prompt is clear, concise, and easy for the model to understand. For instance, the prompt improver might rephrase a convoluted prompt to make it more straightforward for Claude to interpret.
Prefill addition: The prompt improver can prefill the Assistant message to guide Claude's actions and enforce specific output formats. This prefill helps to ensure that the model's responses are consistent and meet the developer's requirements. If a developer wanted the output in JSON format, the prompt improver could add a prefill that instructs Claude to format the response accordingly.

It can also modify prompts and examples based on specific developer requests, such as changing the output format from XML to JSON. This flexibility allows developers to tailor their prompts to their exact needs.

Here's the improved prompt of the example above:

You are an expert blog writer with deep knowledge across various subjects. Your task is to create an engaging and informative blog post on a given topic, while adhering to a specified tone.

Here's the blog topic you'll be writing about:
<blog_topic>
{{blog_topic}}
</blog_topic>

And here's the desired tone for the blog post:
<tone>
{{tone}}
</tone>

Before writing the blog post, take a moment to analyze the topic and plan your approach. Use the <blog_planning> tags to outline your thoughts and strategy.

<blog_planning>
1. Analyze the blog topic:
- What is the main subject?
- Who is the target audience?
- What key points should be covered?
- List 5-7 key words or phrases related to the topic

2. Consider the specified tone:
- How can I adjust my writing style to match this tone?
- What language, sentence structures, or literary devices would be appropriate?

3. Brainstorm potential titles:
- List 3-5 attention-grabbing titles that accurately reflect the content

4. Outline the blog post structure:
- Plan the introduction
- List main points for the body
- Consider potential sources or examples to support each main point
- Plan a compelling conclusion

5. Tone alignment check:
- Review the planned content and ensure it aligns with the specified tone
- Make any necessary adjustments to better match the desired tone
</blog_planning>

Now, write the blog post using the following structure:

1. Title: Choose the most suitable title from your brainstormed list.

2. Introduction: Write a brief introduction that hooks the reader and provides an overview of what the blog post will cover.

3. Main Body: Develop your main points in separate paragraphs. Use subheadings if appropriate. Ensure that your content is informative, engaging, and aligned with the specified tone.

4. Conclusion: Summarize the key points and provide a final thought or call to action.

Remember to maintain the specified tone throughout the blog post. Your writing should be clear, concise, and tailored to the target audience.

How good is it?

For obvious reasons, the quality of the result depends on the actual task. However it, achieved a 100% success rate in adhering to word count instructions for a summarisation task. This was achieved after applying the prompt improver to refine the original prompt.

The specific scenario involved providing Claude with ten Wikipedia articles. The task was to summarise these articles within a defined word count range. Following the application of the prompt improver, Claude consistently generated summaries that adhered to the specified word count limits, resulting in a 100% success rate.

While Anthropic didn't specify exactly how the outcome was achieved, we can infer that techniques like chain-of-thought reasoning and prefill addition played a role. Chain-of-thought reasoning could have been incorporated to guide Claude to systematically identify the key points in each article before condensing them into a summary within the specified word limit. Prefill addition might have been used to provide explicit instructions to Claude regarding the desired word count range for the summaries, ensuring that the output adhered to these constraints.

Conclusion

The introduction of Anthropic's prompt improver follows a somewhat typical pattern in the world of AI. First of all, it reduces the entry barrier and simplifies prompt optimisation by automating tasks that previously required manual effort and expertise. This accessibility could allow developers with less experience in prompt engineering to create effective prompts for Claude.

Then there's a shift from manual crafting to refinement: While well-crafted prompts remain important, the prompt improver suggests that the focus might shift towards refining existing prompts or those adapted from other AI models. This implies that developers might spend less time meticulously crafting prompts from scratch and more time iteratively improving them with the assistance of the tool.

Overall, the prompt improver seems poised to make prompt engineering more accessible, efficient, and iterative. Prompt engineering might become soon a craft that had its five minutes of fame, only to be automated away by the very same technology it was created for.