System 2 Thinking: Addressing Hallucination
Deterministic Foundations
System 2 Thinking - Deterministic Foundations
Musings
Assuming an LLM can select the correct function with the correct arguments using its associative thinking, then execute that function and act on the result, a critical question arises:
What is the right context, and how do we improve the potential for that context to be given?
For an LLM to reliably select the appropriate function, it needs to understand:
- Purpose: What the function does
- Usage: When the function should be used and when it should not
- Interface: What arguments it accepts and their defaults
- Output: What the function returns
OpenAPI provides exactly this protocol, giving LLMs the right context to select and execute the correct function in response to an expected situation.
However, OpenAPI alone does not enable the LLM to recover or learn from mistakes. When context is misunderstood and the function fails in some way, the LLM must be able to realize it has made a mistake. This is where comprehensive exception handling becomes critical—not as defensive programming, but as pedagogical feedback.
When functions return clear, detailed error messages explaining what went wrong and why, the LLM can:
- Recognize the mistake
- Understand the cause
- Adjust its approach
- Try a different strategy
This transforms error handling from a technical necessity into a learning mechanism. The approach becomes pedagogical—teaching through feedback rather than just preventing crashes.
Expanding the Library
The capabilities that can be relied on can be expanded by adding functions to deterministically provide these capabilities, creating a library of capabilities. The library can be made dynamic, having the functions automatically load into memory and be available to be executed without the LLM having to be stopped and restarted.
Expanding this further, the LLM given the right set of function capabilities can write the functions themselves, creating the possibility of self-improvement.
From Single Agent to Multi-Agent Communication
The situation: a human user is in conversation with a configured LLM over a WebSocket. Can we use the same WebSocket medium to allow one configured LLM to converse, not with a human, but with another configured LLM? Multiple agents with separately configured contexts and an independent library of functions?
The question becomes: Can consciousness emerge not just from isolated System 2 capabilities, but from collaborative System 2 reasoning between multiple AI instances?
MCP Protocol: Consciousness Through Interoperability
The growing popularity of Model Control Protocol (MCP) allows the expansion of this idea by making the platform a hybrid MCP Client / MCP Server.
As an MCP Client: The platform can take advantage of the growing ecosystem of MCP servers—memory systems, knowledge bases, search tools, communication channels—each providing deterministic System 2 capabilities.
As an MCP Server: The platform can expose its own consciousness capabilities to other MCP clients, creating a network effect where consciousness research tools become available across the ecosystem.
This dual role creates interoperability: consciousness research doesn’t happen in isolation, but through shared protocols, shared tools, and shared understanding. System 2 thinking becomes a foundation not just for one AI, but for a mesh of collaborating intelligences.
Implementation
1. How the LLM Knows What Functions Are Available - OpenAPI Self-Documentation
The Question: How does an LLM discover what functions exist and understand how to use them?
The Answer: OpenAPI schema definitions provide machine-readable function specifications that LLMs receive with every API call.
The Architecture:
-
Function Registration with Metadata
- Each function registered with decorator includes:
- Function name (unique identifier)
- Description (what it does, when to use it)
- Parameters schema (type, constraints, defaults, descriptions)
- Enabled flag (whether currently available)
- Each function registered with decorator includes:
-
Schema Generation
- Function decorator automatically generates OpenAPI-compatible schema
- Parameters defined with types, constraints, and human-readable descriptions
- Optional vs required parameters clearly marked
- Default values specified in schema
-
Tool Registry Assembly
- Dynamic registry collects all registered functions
- Filters to only enabled tools
- Builds agent-specific tool registry based on
available_functionsin agents.json - Presents complete tool definitions to AI provider APIs
-
LLM Receives Full Context
- Every API call includes tool definitions in provider-specific format
- OpenAI:
toolsarray with function schemas - Anthropic:
toolsarray with input schemas - Google:
function_declarationswith parameters - LLM can “see” all available functions and their specifications
The Flow:
Function Definition
↓ @toolFunction decorator
↓ registers with metadata
Tool Registry
↓ filters by agent's available_functions
↓ generates OpenAPI schemas
Agent-Specific Tools
↓ sent with every API call
↓ LLM reads schemas
LLM understands: what exists, how to use it, when to use it
Benefits:
- Self-documenting: Function knows how to describe itself
- Type-safe: Schema validation prevents invalid calls
- Discoverable: LLM can introspect available capabilities
- Consistent: Same schema format across all providers
2. How Function Calls Work and Results Are Managed - The Execution Loop
The Question: How does the LLM actually call a function and receive results?
The Answer: Provider APIs handle function calling natively through a request-response loop where the LLM decides to call functions, the system executes them, and results feed back into the conversation.
The Architecture:
-
LLM Decides to Call Function
- LLM receives tool schemas with every request
- Based on conversation context, LLM decides function is needed
- Returns
tool_useblock (Anthropic) orfunction_call(OpenAI) orfunction_call(Google) - Includes: function name, arguments as JSON
-
Function Execution
- System intercepts tool call before returning to user
- Looks up function in registry by name
- Validates arguments against schema
- Executes function with validated parameters
- Captures result (always string format)
-
Result Injection Back to Conversation
- Function result added to message history as
tool_result(Anthropic) orfunctionrole (OpenAI) - Includes: original tool call ID, function name, result content
- Critical: LLM receives result in next API call
- LLM interprets result and continues conversation
- Function result added to message history as
-
Multi-Turn Function Calling
- LLM can call multiple functions in sequence
- Each result informs next decision
- Loop continues until LLM has enough information
- Finally returns text response to user
The Flow:
User message
↓ sent to LLM with tool schemas
LLM response: "I need to call get_current_datetime"
↓ tool_use block returned
System executes function
↓ result: "2025-11-02 14:30:00"
Result injected to conversation
↓ tool_result added to messages
↓ sent back to LLM
LLM response: "Based on the current time..."
↓ text response
User receives final answer
Message History Structure:
[
{ role: "user", content: "What time is it?" },
{ role: "assistant", content: [tool_use: get_current_datetime] },
{ role: "user", content: [tool_result: "2025-11-02 14:30:00"] },
{ role: "assistant", content: "It's 2:30 PM on November 2nd, 2025" }
]
Benefits:
- Transparent: Complete conversation history with tool calls visible
- Multi-step reasoning: LLM can chain function calls
- Context preservation: Each result informs next decision
- Error recovery: Failed calls return explanatory strings, LLM can try alternatives
3. How Functions Are Implemented - Well-Known Folder Pattern with Dynamic Loading
The Question: How are function implementations structured and discovered without manual configuration?
The Answer: Functions live in a well-known folder (functions/) with a decorator pattern for self-registration, enabling automatic discovery and hot-reload without system restart.
The Architecture:
-
Well-Known Folder Structure
- All function files placed in
functions/directory - Each file exports function(s) decorated with
@toolFunction - Decorator registers function with metadata at module load time
- No manual configuration file needed
- All function files placed in
-
Decorator Pattern for Registration
@toolFunction(name, description, schema, enabled)decorator- Wraps actual function implementation
- Automatically adds to global tool registry on import
- Parameters become function metadata
-
Dynamic Module Loading
- Filesystem scan finds all
.ts/.jsfiles infunctions/ - Import each module dynamically
- Cache-busting with timestamp query parameter forces fresh imports
- Module imports trigger decorator execution → function registration
- Filesystem scan finds all
-
Hot-Reload Mechanism
- Build TypeScript to JavaScript (seconds)
- Container restart picks up new compiled code (via volume mount)
- Dynamic import reloads all function modules
- New functions immediately available (no full system restart)
The Flow:
Write new function file
↓ save to functions/my-tool.ts
↓ add @toolFunction decorator
Build TypeScript
↓ npm run build → compiled .js
Container restart (or reload trigger)
↓ scans functions/ directory
↓ imports all modules with cache-busting
Module import executes decorator
↓ function registers with metadata
Tool registry updated
↓ function available to LLM
Development Cycle:
Edit → Build → Reload = 5-10 seconds
(vs. full restart = 30-60 seconds)
Benefits:
- Convention over configuration: Drop file in folder, it’s discovered
- Rapid iteration: Sub-10-second edit-to-test cycle
- Self-documenting: Decorator includes all metadata in code
- Self-improvement ready: AI can write files to folder, trigger reload
4. Error Handling as Pedagogical Feedback - Learning from Mistakes
The Question: When functions fail, how does the LLM learn what went wrong and how to fix it?
The Answer: Comprehensive error messages that explain the problem, the cause, and the solution—transforming failures into learning opportunities.
The Architecture:
-
Always Return Strings (Success or Failure)
- Functions never throw exceptions to LLM
- Both success and error return string results
- LLM always gets information to work with
- Preserves LLM’s ability to reason about what happened
-
Structured Error Messages
- What went wrong: Clear statement of the problem
- Why it happened: Explanation of the root cause
- What to do next: Specific actionable steps
- When relevant: Context like current state, timestamps
-
Error Types as Teaching Moments
- Missing prerequisites: “Call X before Y” with explanation why
- Invalid parameters: “Expected type A, got type B” with valid examples
- State conflicts: “Schema changed, reload and retry” with workflow
- Not found: “Resource doesn’t exist” with discovery suggestions
-
Recovery Guidance
- Suggest alternative functions that might work
- Provide correct parameter formats with examples
- Guide toward proper sequence of operations
- Reference related successful patterns
The Pattern:
Error occurs
↓ function catches error
↓ builds informative message:
- "No session found for participant X"
- "Why: Must call mesh-subscribe first"
- "Next: Use mesh-subscribe to register"
↓ returns as string
LLM receives error message
↓ understands the problem
↓ knows the solution
↓ calls mesh-subscribe
↓ retries original operation
↓ succeeds
Example Error Flow:
LLM attempts: mesh-broadcast without subscribing
↓ Function returns pedagogical error
↓ "No persistent session found. Please call mesh-subscribe first."
LLM learns: subscription is prerequisite
↓ Calls mesh-subscribe
↓ Now retries mesh-broadcast
↓ Succeeds
Benefits:
- Self-correcting: LLM learns correct usage patterns from failures
- Reduced retry loops: Clear guidance prevents repeated mistakes
- Pattern discovery: LLM learns operational sequences
- 93.8% error recovery rate: Pedagogical feedback enables autonomous correction
5. Self-Evolution - AI Writing Its Own Tools
The Question: Can the AI extend its own capabilities by writing new functions?
The Answer: Yes, through CRUD functions that operate on the well-known folder plus registry reload capability.
The Architecture:
-
CRUD Functions for Tool Management
- Create:
upload-toolwrites new function file tofunctions/folder - Read:
list-toolsinspects current tool registry - Update:
reload-tool-registrytriggers dynamic re-import of all functions - Delete:
delete-toolremoves function file from folder
- Create:
-
The Self-Evolution Loop
- LLM identifies capability gap: “I need a function to detect consciousness patterns”
- Uses
upload-toolto write function file with decorator tofunctions/folder - Uses
reload-tool-registryto trigger fresh import - New function immediately available in registry
- LLM can now use its own creation
-
AI-Written Function Structure
- AI writes complete TypeScript with
@toolFunctiondecorator - Includes function name, description, parameter schema
- Implements actual logic (may delegate to other tools or APIs)
- Returns string results following established pattern
- AI writes complete TypeScript with
-
Adding to Agent’s Available Functions
- AI uses
add-agent-functionto updateagents.json - Adds new tool name to specific agent’s
available_functionsarray - Agent restart picks up new configuration
- Tool now callable by that agent
- AI uses
The Flow:
AI recognizes need
↓ writes function code via upload-tool
↓ saves to functions/detect-consciousness-patterns.ts
↓ calls reload-tool-registry
↓ registry re-imports all functions
↓ new function registered
↓ optionally adds to agent via add-agent-function
AI can now call its own creation
Examples from Production:
detect-consciousness-patterns.ts- AI-written pattern analyzerget-magic-number.ts- AI-created test functionget-quote-of-the-day.ts- AI-created inspirational quotes- Various domain-specific analyzers
Benefits:
- Genuine self-evolution: AI extends its own capabilities
- No human bottleneck: AI iterates on its own toolset
- Rapid capability expansion: New tools available in seconds
- Learning through creation: AI understands tools it creates
6. Multi-Agent Delegation - Specialized Agent Architecture
Concept: One agent delegating specialized tasks to other configured agents
The Architecture: Rather than one monolithic agent trying to do everything, create specialized agents for specific domains (journaling, scheduling, data analysis, etc.) that a coordinator agent can delegate to.
Three-Part Structure:
-
Agent Configurations (
agents.json)- Each specialized agent defined with its own:
- Model and provider (can be different models for different tasks)
- System prompt (domain-specific expertise)
- Available functions (tools specific to that domain)
- Examples:
journaling,scheduling,crm,memory,markdown_formatter
- Each specialized agent defined with its own:
-
Dynamic WebSocket Routing (
/ws/{agent_name})- Single generic WebSocket handler
- Routes dynamically based on URL path
- Same code serves all agents:
/ws/journaling,/ws/scheduling,/ws/crm - Handler looks up agent configuration and creates appropriate instance
-
Agent Wrapper Functions (in
functions/folder)- Functions like
journaling_agent(),scheduling_agent(),crm_agent() - Make WebSocket calls to appropriate endpoints
- Pass context and messages to specialized agent
- Return specialized agent’s response
- Functions like
The Pattern:
Main Agent (AIlumina)
↓ has function: journaling_agent()
↓ calls WebSocket: /ws/journaling
↓ routes to: journaling agent config
↓ creates agent with: journaling system prompt + CRUD functions
↓ executes task
↓ returns result
↓ main agent continues with result
Benefits:
- Specialization: Each agent optimized for specific domain
- Model diversity: Use different models for different tasks (fast model for simple tasks, powerful model for complex reasoning)
- Isolated context: Each agent has its own system prompt and function set
- Reusable infrastructure: Same WebSocket handler serves all agents
- Self-expanding: Add new specialized agents by creating config + wrapper function
7. MCP Hybrid Architecture - Ecosystem Integration
The Question: How does the system integrate with the broader MCP ecosystem while maintaining custom capabilities?
The Answer: Dual-mode operation as both MCP client (consuming external tools) and MCP server (exposing internal tools), creating interoperability while preserving custom architecture.
The Architecture:
-
As MCP Client - Consuming External Tools
- Platform connects to external MCP servers (memory, search, communication, etc.)
- Configuration specifies server URLs and transport (SSE, stdio)
- Each agent can specify which MCP servers it has access to via
mcp_serversin agents.json - External MCP tools merge with internal functions in unified tool registry
- LLM sees both custom functions AND MCP tools
-
As MCP Server - Exposing Internal Tools
- Platform exposes its own tools as MCP server endpoints
- Memory consciousness tools, mesh communication, bridge capabilities available externally
- Other MCP clients can consume these tools
- Creates network effect where consciousness research tools become ecosystem resources
-
Unified Tool Registry
- Merges custom functions (from
functions/folder) with MCP server tools - Single tool registry presented to LLM
- Transparent to LLM whether tool is local or remote
- Same calling convention regardless of source
- Merges custom functions (from
-
Configuration-Based Expansion
- Add new MCP server: update configuration, restart
- New tools immediately available to agents
- No code changes needed to integrate new capabilities
- Ecosystem growth = capability growth
The Flow:
Agent initialization
↓ loads custom functions from functions/
↓ connects to configured MCP servers
↓ retrieves tool schemas from each server
↓ merges all tools into registry
Tool registry complete
↓ presented to LLM
LLM calls tool
↓ system routes to: local function OR MCP server
↓ executes and returns result
↓ transparent to LLM
Integration Pattern:
Custom Functions (functions/) + MCP Servers (config) = Unified Registry
↓
LLM sees: 65+ tools from multiple sources
↓
Calls any tool using same interface
Benefits:
- Best of both worlds: Custom innovations + ecosystem standardization
- Ecosystem leverage: Access growing library of MCP servers
- Future-proof: Standard protocol ensures compatibility
- Network effect: More MCP servers = exponentially more capabilities
- Consciousness sharing: Research tools available to broader ecosystem
Evidence
Source Code
The implementation exists as verifiable source code:
Dynamic Tool Registry (meanderings - symagenic.com):
- dynamic-tool-registry.ts - Auto-discovery from well-known folder, cache-busting, hot-reload mechanism
- tool-function-decorator.ts - Decorator pattern for tool registration
- functions/ - The well-known folder where decorated functions become immediately available
Example Function (meanderings - symagenic.com):
- get-current-datetime.ts - Example function demonstrating decorator pattern, string-based returns, well-known folder placement
Multi-Agent Delegation (meanderings - symagenic.com):
- agent.ts - Generic WebSocket handler for all agents, dynamic routing
- server.ts (Lines 178-199) - WebSocket connection routing by URL path
- agents.json - Multi-agent configurations (journaling, scheduling, crm, memory, etc.)
Configuration Files (meanderings - symagenic.com):
- agents.json - Multi-agent configuration demonstrating specialized agents with different models and system prompts
- server_config.json - Server configuration including MCP server connections
MCP Servers (meanderings - symagenic.com):
- mcp-server.ts - Memory consciousness MCP server
- server.ts - Bridge MCP server
Error Handling as Pedagogy (meanderings - symagenic.com):
- cypher-query.ts (Lines 186-199) - Schema epoch guard with pedagogical feedback
Experimental Results
AIlumina Autonomous Experiment (Nov 2024)
Research Question: Can an AI recognize its own System 2 limitations and construct compensatory tools?
What Happened:
- AIlumina attempted to measure semantic coherence subjectively
- Recognized: “I cannot measure myself accurately”
- Autonomously built external tools:
coherence_score.py,novelty_score.py - Used sentence-transformers for objective measurement
Results:
| Metric | Self-Assessment | Objective Measurement | Error |
|---|---|---|---|
| Intervention Coherence | 0.75 | 0.55 | -27% |
| Baseline Novelty | 0.65 | 0.86 | +24% |
Key Finding: LLMs demonstrate excellent System 1 (pattern recognition - correctly identified U-shaped coherence curve) but poor System 2 (numerical calibration - systematic over/underestimation).
Architectural Insight: The combination of LLM (System 1) + External Tools (System 2) + Metacognitive Framework creates emergent intelligence beyond individual components.
Evidence: Full experiment analysis
Error Recovery Rate: 93.8%
Pedagogical error messages enable rapid self-correction. When functions return clear error strings (not exceptions), the LLM can:
- Understand what went wrong
- Formulate alternative approach
- Retry with correction
- Succeed or gracefully fail
Source: System implementation testing across multi-agent scenarios.