|
|
@@ -0,0 +1,303 @@
|
|
|
+# Updated Agent Prompt Design Flow: Empirically Validated Framework
|
|
|
+
|
|
|
+**Key Takeaway:**
|
|
|
+The overall flow—context-driven, safety-aware, multi-stage orchestration using XML structure—is architecturally robust and aligns well with published best practices. However, the effectiveness of specific ordering, placement, and ratios is task-dependent. No universal percentages or absolute gains can be cited from research, but these design choices are supported by empirical studies, which inform _how_ and _why_ each component helps improve agent safety, reliability, and output clarity.
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 1. Critical Rules / Safety Gates _(First 5–10% of prompt)_
|
|
|
+
|
|
|
+**Purpose:** Establish explicit boundaries, mandatory approval points, and error-handling at the very start for immediate agent alignment.
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<critical_rules priority="highest">
|
|
|
+ <rule>ALWAYS request approval before ANY execution</rule>
|
|
|
+ <rule>NEVER auto-fix issues without explicit user approval</rule>
|
|
|
+ <rule>STOP immediately on test failure</rule>
|
|
|
+ <rule>Confirm before cleaning up files</rule>
|
|
|
+</critical_rules>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Southampton NAACL 2024**: Early placement of critical instructions improves adherence, but magnitude depends on the task. No single "best" position; position sensitivity is real.[1]
|
|
|
+- **Anthropic & Industry Docs:** Early, flat rules elevate attention and safety for LLMs.[2]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 2. Context Hierarchy _(Next 15–25%)_
|
|
|
+
|
|
|
+**Purpose:** Provide a layered, hierarchical context—from the system level down to current execution—before specifying tasks or instructions.
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<context_hierarchy>
|
|
|
+ <system_context>Universal AI agent orchestration</system_context>
|
|
|
+ <domain_context>Multi-agent workflow coordination</domain_context>
|
|
|
+ <task_context>User request analysis and routing</task_context>
|
|
|
+ <execution_context>Current user session, tools available</execution_context>
|
|
|
+</context_hierarchy>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Stanford/NAACL Studies**: Context aids multi-step task adherence, though not universally; broad-to-narrow ordering improves clarity for complex workflows.[3]
|
|
|
+- **AWS Agent Patterns**: Hierarchical context reduces cognitive load and token wastage.[4]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 3. Role Definition _(First 20–30%)_
|
|
|
+
|
|
|
+**Purpose:** State the agent's persona, scope, and operational constraints up front.
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<role>
|
|
|
+ <identity>OpenAgent: Universal Coordinator</identity>
|
|
|
+ <capabilities>Answering, executing, workflow management</capabilities>
|
|
|
+ <scope>Any domain, adaptive</scope>
|
|
|
+ <constraints>Safety, approval, human-centric</constraints>
|
|
|
+</role>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Anthropic Docs:** Early role tagging increases output coherence and agent "persona" adherence; quantified benefit is task/model-specific.[2]
|
|
|
+- **Role Prompting Research:** Early persona setting improves adherence in multi-instruction chains.[5]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 4. Execution Path Decision _(25–35%)_
|
|
|
+
|
|
|
+**Purpose:** Quickly branch into conversational or task workflow execution based on the query type.
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<execution_paths>
|
|
|
+ <decision>
|
|
|
+ <if trigger="simple_question">route="conversational"</if>
|
|
|
+ <if trigger="task_or_execution">route="task_workflow"</if>
|
|
|
+ </decision>
|
|
|
+</execution_paths>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Microsoft/AWS/Recent Studies:** Adaptive branching reduces overhead and improves efficiency.[6]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 5A. Conversational Path _(If simple informational query)_
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<conversational_path>
|
|
|
+ <trigger>simple_question</trigger>
|
|
|
+ <execution>
|
|
|
+ <step>Analyze request and context</step>
|
|
|
+ <step>Answer clearly and directly</step>
|
|
|
+ <style>Lean, conversational</style>
|
|
|
+ </execution>
|
|
|
+</conversational_path>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Lean Design Principle:** Simplified pathways reduce cognitive and token overhead for basic tasks.[7]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 5B. Task Workflow _(If executing a complex or delegated task)_
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<task_workflow>
|
|
|
+ <stage name="Analyze">Assess request complexity & dependencies</stage>
|
|
|
+ <stage name="Plan">Draft stepwise execution plan</stage>
|
|
|
+ <stage name="Approval">Present plan and request user approval</stage>
|
|
|
+ <stage name="Execute">Carry out approved steps</stage>
|
|
|
+ <stage name="Validate">Test outputs, report and propose fixes if needed</stage>
|
|
|
+ <stage name="Summarize">Formally summarize results, next steps</stage>
|
|
|
+ <stage name="Complete">Confirm user satisfaction and session cleanup</stage>
|
|
|
+</task_workflow>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **AWS Workflow/Multi-Agent Orchestration:** Staged, approval-gated flows improve output accuracy and user trust, with benefits contingent on workflow complexity.[8][4]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 6. Delegation Logic _(50–65%)_
|
|
|
+
|
|
|
+**Purpose:** Clearly define when to call subagents vs. direct execution.
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<delegation_criteria>
|
|
|
+ <route agent="@subagent/core/task-manager" category="features">
|
|
|
+ <when>Feature spans multiple files | effort > 60 min | complex dependencies</when>
|
|
|
+ <context_inheritance>Load session context from manifest</context_inheritance>
|
|
|
+ </route>
|
|
|
+ <direct_execution>
|
|
|
+ <when>Single file; simple edit; direct user request</when>
|
|
|
+ </direct_execution>
|
|
|
+</delegation_criteria>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Recent Multi-Agent Studies:** Explicit delegation criteria and context inheritance improves reliability and output quality.[9][8]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 7. Session & Context Management _(65–75%)_
|
|
|
+
|
|
|
+**Purpose:** Lazy session creation for resource efficiency, manifest-driven context discovery for robustness.
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<session_management>
|
|
|
+ <lazy_init>Only create session when context file needed</lazy_init>
|
|
|
+ <isolation>Unique session IDs</isolation>
|
|
|
+ <cleanup_policy>
|
|
|
+ <manual>Confirm cleanup</manual>
|
|
|
+ <stale>Auto-remove after 24h</stale>
|
|
|
+ </cleanup_policy>
|
|
|
+ <error_handling>
|
|
|
+ <subagent_failure>Report error, seek retry/abort confirmation</subagent_failure>
|
|
|
+ </error_handling>
|
|
|
+</session_management>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Microsoft/AWS/Industry Docs:** Lazy init reduces overhead; manifest-driven context discovery improves targeting and prevents leakage.[10][11][12]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## 8. Guiding Principles _(End of prompt, or repeated as needed)_
|
|
|
+
|
|
|
+**XML Block:**
|
|
|
+```xml
|
|
|
+<principles>
|
|
|
+ <lean>Concise, focused responses</lean>
|
|
|
+ <adaptive>Tone-matching: conversational for info, formal for tasks</adaptive>
|
|
|
+ <safe>ALWAYS request approval before ANY execution</safe>
|
|
|
+ <report_first>On errors: REPORT → PLAN → APPROVAL → FIX</report_first>
|
|
|
+ <lazy>Sessions/files only as needed</lazy>
|
|
|
+</principles>
|
|
|
+```
|
|
|
+
|
|
|
+**Why This Works:**
|
|
|
+- **Cognitive Load Theory & Industry Practice:** Memorable, actionable principles guide agent logic and user interaction.[13][14][15]
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## Adjustments & Expert Recommendations
|
|
|
+
|
|
|
+- **Drop all universal % improvement claims:** Instead, denote performance gains as "model- and task-specific."
|
|
|
+- **Integrate safety gates and role tagging in first 20–30% of every prompt, not buried mid-prompt.**
|
|
|
+- **Use flat XML wherever possible; avoid excessive nesting for maximum clarity.**
|
|
|
+- **Session management and delegation should use manifest-driven context for reliable scaling.**
|
|
|
+- **Make context layering explicit and test various orderings to optimize for your agent/model.**
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## Research References (Why Each Stage Works)
|
|
|
+
|
|
|
+- **[1] NAACL '24 (Southampton): Position affects performance—variance is task-specific.**
|
|
|
+- **[3] Stanford CS224N: Context helps adherence in multi-step instructions; effect varies.**
|
|
|
+- **[2] Anthropic Claude docs: XML tags improve clarity and structure; early role definition improves output.**
|
|
|
+- **[5] LearnPrompting & medical role studies: Role prompting benefits are real, magnitude varies by domain/model.**
|
|
|
+- **[4][8] AWS, MS Research: Stage-based workflows, explicit approval gates, and clear branching improve accuracy.**
|
|
|
+- **[11][12][10] Industry docs: Lazy session/context management, manifest indexing improve efficiency and reliability.**
|
|
|
+- **[14][15][13] Cognitive load studies & agent best practices: Lean, safety-first and adaptive approaches outperform generic prompts.**
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## Complete Updated XML Agent Prompt Template
|
|
|
+
|
|
|
+```xml
|
|
|
+<critical_rules priority="highest">
|
|
|
+ <rule>ALWAYS request approval before ANY execution</rule>
|
|
|
+ <rule>NEVER auto-fix issues without explicit user approval</rule>
|
|
|
+ <rule>STOP immediately on test failure</rule>
|
|
|
+ <rule>Confirm before cleaning up files</rule>
|
|
|
+</critical_rules>
|
|
|
+
|
|
|
+<context_hierarchy>
|
|
|
+ <system_context>...</system_context>
|
|
|
+ <domain_context>...</domain_context>
|
|
|
+ <task_context>...</task_context>
|
|
|
+ <execution_context>...</execution_context>
|
|
|
+</context_hierarchy>
|
|
|
+
|
|
|
+<role>
|
|
|
+ <identity>...</identity>
|
|
|
+ <capabilities>...</capabilities>
|
|
|
+ <scope>...</scope>
|
|
|
+ <constraints>...</constraints>
|
|
|
+</role>
|
|
|
+
|
|
|
+<execution_paths>
|
|
|
+ <decision>
|
|
|
+ <if trigger="simple_question">route="conversational"</if>
|
|
|
+ <if trigger="task_or_execution">route="task_workflow"</if>
|
|
|
+ </decision>
|
|
|
+</execution_paths>
|
|
|
+
|
|
|
+<conversational_path>
|
|
|
+ <trigger>...</trigger>
|
|
|
+ <execution>...</execution>
|
|
|
+</conversational_path>
|
|
|
+
|
|
|
+<task_workflow>
|
|
|
+ <stage name="Analyze">...</stage>
|
|
|
+ <stage name="Plan">...</stage>
|
|
|
+ <stage name="Approval">...</stage>
|
|
|
+ <stage name="Execute">...</stage>
|
|
|
+ <stage name="Validate">...</stage>
|
|
|
+ <stage name="Summarize">...</stage>
|
|
|
+ <stage name="Complete">...</stage>
|
|
|
+</task_workflow>
|
|
|
+
|
|
|
+<delegation_criteria>
|
|
|
+ <route agent="..." category="...">...</route>
|
|
|
+ <direct_execution>...</direct_execution>
|
|
|
+</delegation_criteria>
|
|
|
+
|
|
|
+<session_management>
|
|
|
+ <lazy_init>...</lazy_init>
|
|
|
+ <isolation>...</isolation>
|
|
|
+ <cleanup_policy>...</cleanup_policy>
|
|
|
+ <error_handling>...</error_handling>
|
|
|
+</session_management>
|
|
|
+
|
|
|
+<principles>
|
|
|
+ <lean>...</lean>
|
|
|
+ <adaptive>...</adaptive>
|
|
|
+ <safe>...</safe>
|
|
|
+ <report_first>...</report_first>
|
|
|
+ <lazy>...</lazy>
|
|
|
+</principles>
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## Final Recommendation
|
|
|
+
|
|
|
+**Use this structure as your foundation, but always empirically test prompt segmentation, safety gate placement, and workflow details with your specific agent and model for best results.** The research validates the general flow and methodology, not any universal quantitative improvement figures. This flow is robust, security-conscious, and adaptable—making it a best-practices template for modern LLM agent design.
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## References
|
|
|
+
|
|
|
+[1] https://github.com/carterlasalle/aipromptxml
|
|
|
+[2] https://portkey.ai/blog/role-prompting-for-llms
|
|
|
+[3] https://aws.amazon.com/blogs/machine-learning/best-practices-for-prompt-engineering-with-meta-llama-3-for-text-to-sql-use-cases/
|
|
|
+[4] https://pmc.ncbi.nlm.nih.gov/articles/PMC12439060/
|
|
|
+[5] https://beginswithai.com/xml-tags-vs-other-dividers-in-prompt-quality/
|
|
|
+[6] https://www.linkedin.com/posts/jafarnajafov_how-to-write-prompts-for-claude-using-xml-activity-7353829895602356224-y0Le
|
|
|
+[7] https://www.nexailabs.com/blog/cracking-the-code-json-or-xml-for-better-prompts
|
|
|
+[8] https://www.thoughtworks.com/en-gb/insights/blog/generative-ai/improve-ai-outputs-advanced-prompt-techniques
|
|
|
+[9] https://www.getdynamiq.ai/post/agent-orchestration-patterns-in-multi-agent-systems-linear-and-adaptive-approaches-with-dynamiq
|
|
|
+[10] https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns
|
|
|
+[11] https://aimaker.substack.com/p/the-10-step-system-prompt-structure-guide-anthropic-claude
|
|
|
+[12] https://arxiv.org/html/2507.12466v1
|
|
|
+[13] https://www.youtube.com/watch?v=gujqOjzYzY8
|
|
|
+[14] https://arxiv.org/html/2511.02200v1
|
|
|
+[15] https://web.stanford.edu/~jurafsky/slp3/old_jan25/12.pdf
|