Most AI sales agent tutorials stop at "make it respond to messages." That's the easy part. The hard part — and the part that determines whether your agent actually closes deals — is making it remember. Not just within a session, but across every conversation that customer has ever had with your product.
This guide walks through the complete architecture: what to store, how to extract it, and how to inject it so your agent walks into every conversation already knowing the customer.
Step 1: Design your customer profile schema
Before writing any code, you need to decide what your agent should know about a customer. The temptation is to store everything. Resist it. A bloated profile is as useless as no profile — your agent needs curated, actionable intelligence, not a transcript archive.
For a sales agent, the profile should capture:

- Buying triggers: upcoming events or needs that create urgency (a birthday, a renewal)
- Budget: the stated or implied spending range
- Objections: concerns the customer has raised, and whether they were resolved
- Preferences: product, style, and communication preferences
- Personal details: facts the customer volunteered that matter to the relationship
- Channel preference: where they actually reply (email, WhatsApp, SMS)
- Journey stage: where they sit in the funnel (new, warm lead, negotiating, customer)
- Last interaction: a one-sentence summary of the most recent conversation
Keep the schema tight. Every field should answer a question your agent would otherwise ask out loud.
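As a concrete starting point, here is one way to sketch that schema in code. The field names mirror the example profile shown later in this guide; everything else is an illustrative assumption you should adapt to your domain.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class CustomerProfile:
    """A sketch of a tight sales-agent profile schema.

    Field names follow the example API response later in this guide;
    defaults and stage names are assumptions, not a fixed standard.
    """
    customer_id: str
    buying_triggers: list = field(default_factory=list)  # e.g. "daughter's birthday"
    budget: str = ""                                     # e.g. "~$200"
    objections: list = field(default_factory=list)       # raised concerns, resolved or not
    preferences: list = field(default_factory=list)      # e.g. "prefers rose gold"
    personal_details: list = field(default_factory=list) # volunteered facts that matter
    channel_preference: str = ""                         # e.g. "WhatsApp"
    journey_stage: str = "new"                           # new / warm_lead / negotiating / customer
    last_interaction: str = ""                           # one-sentence summary

profile = CustomerProfile(customer_id="customer_123", budget="~$200")
print(asdict(profile)["budget"])  # → ~$200
```

A dataclass (or an equivalent JSON Schema) gives the extraction step a fixed structure to merge into, which is what keeps profiles from bloating over time.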
Step 2: Extract profile updates after each conversation
After each conversation ends, you need to update the customer profile with new information. This is an LLM task — you pass the conversation transcript and the current profile to a model and ask it to produce an updated profile.
The extraction prompt is critical. It should instruct the model to:
- Add new facts not present in the current profile
- Update existing facts that have changed (e.g. budget revised upward)
- Resolve contradictions in favor of the most recent statement
- Never duplicate — merge, don't append
- Ignore small talk, pleasantries, and irrelevant content
```python
# Pseudocode — extraction call
def extract_profile_update(customer_id, conversation, current_profile):
    prompt = f"""
You are a customer intelligence extractor.

Current profile:
{current_profile}

New conversation:
{conversation}

Return an updated profile JSON that:
- Merges new facts with existing ones
- Updates fields where the customer has shared new information
- Resolves contradictions in favor of the latest statement
- Keeps the same schema structure
- Omits nothing from the current profile unless explicitly contradicted
"""
    response = llm.call(prompt)
    updated_profile = parse_json(response)
    return store_profile(customer_id, updated_profile)
```
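The `parse_json` step deserves more care than the pseudocode suggests: models sometimes wrap JSON in code fences or surround it with commentary. A defensive sketch, assuming the profile is returned as a single JSON object:

```python
import json
import re

def parse_json(response: str) -> dict:
    """Parse LLM output into a dict, tolerating surrounding noise.

    A defensive sketch: extraction models sometimes wrap the JSON in
    code fences or add commentary before and after it, so we grab the
    outermost {...} span rather than parsing the raw response.
    """
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model response")
    return json.loads(match.group(0))

parse_json('Here is the updated profile: {"budget": "~$200"}')
# returns {'budget': '~$200'}
```

In production you would also validate the parsed object against your schema and retry the extraction call on failure, so one malformed response never corrupts a stored profile.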
Step 3: Inject the profile at session start
Before your agent handles the customer's first message in a new session, fetch their profile and prepend it to the system prompt. This is where the magic happens — the agent walks into the conversation already knowing everything.
```python
# Pseudocode — session initialization
def start_session(customer_id, incoming_message):
    # Fetch the profile (~400 tokens)
    profile = fetch_profile(customer_id)

    system_prompt = f"""
You are a helpful sales agent.

=== Customer Profile ===
{format_profile(profile)}
=======================

Use this profile to personalize your responses.
Reference relevant details naturally — don't recite the profile.
Update your approach based on known objections and preferences.
"""
    return llm.chat(
        system=system_prompt,
        message=incoming_message,
    )
```
That's it. The agent now has full context in ~400 tokens. It knows the customer's budget, their last objection, their daughter's upcoming birthday, and that they prefer WhatsApp over email — all without asking a single question.
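The `format_profile` helper is left abstract in the pseudocode. A minimal sketch of one way to render a stored profile as compact labeled lines, skipping IDs and empty fields so the prompt stays lean:

```python
def format_profile(profile: dict) -> str:
    """Render a profile dict as compact labeled lines for the system prompt.

    Skips the customer ID and empty fields so the injected block stays
    well under a few hundred tokens. A sketch — adapt to your schema.
    """
    lines = []
    for key, value in profile.items():
        if key == "customer_id" or not value:
            continue  # IDs and blanks add tokens but no signal
        if isinstance(value, list):
            value = "; ".join(value)
        lines.append(f"{key.replace('_', ' ').title()}: {value}")
    return "\n".join(lines)

print(format_profile({
    "customer_id": "customer_123",
    "budget": "~$200",
    "objections": ["worried about shipping time"],
    "channel_preference": "WhatsApp",
}))
# Budget: ~$200
# Objections: worried about shipping time
# Channel Preference: WhatsApp
```

Labeled plain-text lines tend to be cheaper and easier for the model to use than raw JSON dumped into the prompt.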
Step 4: Handle new customers gracefully
For customers with no prior profile, the agent's system prompt should simply omit the profile section. A first interaction is treated as a fresh conversation, and the extraction step after the session creates the profile from scratch.
```python
# Pseudocode — handle missing profile
profile = fetch_profile(customer_id)
if profile is None:
    system_prompt = base_system_prompt  # No profile section
else:
    system_prompt = base_system_prompt + format_profile(profile)
```
Using DeepRaven instead of building it yourself
The architecture above works, but building it properly — handling concurrent updates, profile conflicts, storage at scale, and a robust extraction pipeline — takes significant engineering time. DeepRaven implements all of this as a two-endpoint API:
```
# Ingest a conversation after it ends
POST https://api.deepraven.ai/v1/ingest
{
  "customer_id": "customer_123",
  "messages": [
    { "role": "user", "content": "..." },
    { "role": "assistant", "content": "..." }
  ]
}

# Fetch the profile before a new session
GET https://api.deepraven.ai/v1/profile/{customer_id}

# Response: compact profile ready to inject
{
  "customer_id": "customer_123",
  "buying_triggers": [...],
  "budget": "~$200",
  "objections": [...],
  "preferences": [...],
  "personal_details": [...],
  "channel_preference": "WhatsApp",
  "journey_stage": "warm_lead",
  "last_interaction": "Discussed rose gold necklace options..."
}
```
The result
An agent built with this pattern — whether you build the memory layer yourself or use DeepRaven — behaves fundamentally differently from a stateless agent. Every session feels like the continuation of a relationship, not a cold restart. Customers don't repeat themselves. Agents don't ask questions that have already been answered. Conversion rates improve because the agent spends its time closing, not catching up.
The technical investment is modest. The business impact is not.