AI safety and guardrails

Operating AI agents in a regulated financial environment requires defenses against adversarial manipulation and model unreliability. Lola Send’s architecture implements multiple layers of containment and validation to ensure agents operate within defined boundaries — even when presented with adversarial inputs.

Prompt injection defense

Prompt injection attacks attempt to manipulate AI behavior by embedding instructions in sender messages. Lola Send’s architecture mitigates this at four levels:

System prompt isolation

System prompts are statically defined in code using PromptTemplate with fixed variables: current date, sender information, brand name, and assistant name. Sender input is processed as message content within the conversation — it is never interpolated into system instructions. The prompt template is resolved at agent initialization, not at message time.

Declarative tool boundaries

Each agent has a fixed, declarative set of tool functions registered on its MacawAssistant. The AI model can only invoke tools that are explicitly declared during agent construction. It cannot discover, create, or call undeclared tools — the tool boundary is enforced by the framework, not by the model’s behavior.

Middleware-enforced authentication

The middleware pipeline authenticates and enriches every message before it reaches any agent. A malicious prompt embedded in a sender message cannot bypass authentication, skip phone normalization, or circumvent identity verification. The agent only sees messages that have passed the full middleware chain.

Behavioral constraints in prompts

Agent prompts contain explicit operational boundaries:

Defined conversation scope (e.g., remittance operations only)
Language rules and tone guidelines
Required confirmation steps before financial operations
Restricted topics that the agent must decline to discuss
Instructions to defer to human agents for ambiguous requests

Jailbreak resistance

Jailbreak attacks attempt to make the model ignore its instructions and operate outside its intended role. Lola Send’s architecture provides structural defenses beyond prompt-level instructions:

Role-based agent isolation

The LogicRouter enforces strict role separation. Each agent handles only its designated sender state — onboarding, verified, blocked, pending, or unavailable. An agent cannot escalate its own permissions, switch to another agent’s context, or access tool functions from a different agent. This isolation is enforced at the framework level, not the model level.

Immutable runtime configuration

Agents cannot modify their own configuration at runtime. The system prompt, tool set, model parameters, and RAG retrieval configuration are set at initialization and remain fixed for the entire session lifecycle. There is no mechanism for the model to alter its own operating parameters.

Minimal attack surface for restricted agents

Blocked and pending CIP agents are configured with minimal tool sets — status queries only. These agents have no access to payment, recipient, or transactional operations. Even if a jailbreak succeeded at the model level, the agent has no tools to perform unauthorized actions.

Tool set comparison by agent type

Agent	Tool functions	Attack surface
Sender home	Full remittance capabilities — quotes, recipients, payments, receipts	Largest tool set; subject to confirmation requirements
Onboarding	Transaction and identity tools — quotes, country data, initial send	Moderate; new sender has no stored data to access
Blocked	`get_status` only	Minimal; cannot initiate any operations
Pending CIP	`get_status` only	Minimal; limited to status queries
Service unavailable	None	Zero; informational responses only

Hallucination control

In a financial context, hallucinated data — fabricated exchange rates, invented account numbers, or fictional compliance verdicts — presents a direct operational risk. Lola Send’s architecture ensures all factual financial data originates from verified backend services.

Tool-sourced financial data

All factual financial data comes from tool function responses calling backend services:

Data type	Source tool function
Exchange rates and fees	`pre_quote`
Recipient lists	`get_recipients`
Operation status and history	`get_operations`
Available countries	`get_countries`
Bank and payer lists	`get_banks`, `get_payers`
Payment methods	`get_payment_methods`
Receipts	`get_receipt`

The model never generates financial figures, account numbers, or compliance verdicts from its training data. Every piece of financial information displayed to the sender originates from a verified service response.

RAG-grounded responses

Retrieval-augmented generation grounds agent responses in bank-approved knowledge documents. These are Markdown-based knowledge bases loaded per brand and project configuration. The agent retrieves relevant passages before generating responses, reducing reliance on the model’s parametric knowledge for operational and policy questions.

Structured output validation

The Gemini message enhancer validates all outbound messages against a strict JSON schema. Responses must conform to defined types: text, buttons, select, or link. Malformed output — including hallucinated response structures — falls back to safe plain text delivery.

Bounded context window

The model’s conversation window is bounded by core_history_window_length, limiting context accumulation. This prevents unbounded conversation histories from degrading model performance or introducing contradictory context from earlier in long conversations.

Scope containment

Beyond the defenses above, Lola Send enforces structural containment at the framework level:

Typed parameter validation — each tool function declares typed parameters, and ctx.validate_params(validate_types=True) enforces parameter types at invocation. The model cannot pass malformed or unexpected data types to backend services.
Blend model review — the blend_model in MacawSettings provides a secondary processing layer for response refinement. This dual-model architecture separates reasoning from output, adding a review step before responses reach the sender.
Event interception — LogicRouter-level event handlers intercept unsupported message types (documents, images) before they reach any agent, preventing unexpected input from being processed.

Lola Send’s AI safety architecture follows the principle of least privilege: each agent has access only to the tools and data required for its specific role. Structural containment at the framework level ensures safety does not depend solely on model behavior.

Infrastructure security

AI safety

Data and compliance

Prompt injection defense

System prompt isolation

Declarative tool boundaries

Middleware-enforced authentication

Behavioral constraints in prompts

Jailbreak resistance

Role-based agent isolation

Immutable runtime configuration

Minimal attack surface for restricted agents

Hallucination control

Tool-sourced financial data

RAG-grounded responses

Structured output validation

Bounded context window

Scope containment

​Prompt injection defense

​System prompt isolation

​Declarative tool boundaries

​Middleware-enforced authentication

​Behavioral constraints in prompts

​Jailbreak resistance

​Role-based agent isolation

​Immutable runtime configuration

​Minimal attack surface for restricted agents

​Hallucination control

​Tool-sourced financial data

​RAG-grounded responses

​Structured output validation

​Bounded context window

​Scope containment

Prompt injection defense

System prompt isolation

Declarative tool boundaries

Middleware-enforced authentication

Behavioral constraints in prompts

Jailbreak resistance

Role-based agent isolation

Immutable runtime configuration

Minimal attack surface for restricted agents

Hallucination control

Tool-sourced financial data

RAG-grounded responses

Structured output validation

Bounded context window

Scope containment