Sounding like a real mother, not a clinic brochure
Generic AI replies in Indonesia sound formal, distant, and clinical, the opposite of how mothers actually talk to each other. Baibu is built to sound like a peer on WhatsApp: warm, brief, local, and emotionally tuned-in.
Same question. Two different worlds.
A worried mother asks the same thing two ways. The first response is what a generic assistant produces. The second is what we're building toward.
Reads like a leaflet. No warmth. No follow-up. Mother stops replying.
Validates feelings first. Practical next step. One clear question. Mother keeps talking.
400,000 words of real mothers talking
Our starting asset is a curated corpus of authentic WhatsApp conversations between Indonesian mothers and community helpers, all anonymized, consented, and labeled. It's the source material for everything that follows.
Words of authentic peer-to-peer WhatsApp conversation between Indonesian mothers.
Names, locations, phone numbers, clinics, and personal stories are aggressively stripped before anything is used.
Every turn is labeled by topic, emotion, intent, language mix, and quality, turning chat into structured data.
The goal isn't to copy individual mothers. It's to learn the shared register: warm, brief, practical, local.
Teach by example, not by retraining
Instead of expensively retraining a model, we show it the right examples at the right moment, a technique called in-context learning. It's faster, cheaper, and more transparent, and it improves continuously as the corpus grows.
The playbook
A short, hand-curated guide describing how Baibu should always sound: warm and peer-like, in short messages. It tells the model to acknowledge feelings before giving advice, ask one clear follow-up, and never read like a leaflet. It rides along with every conversation.
Scenario cards
A library of 10 to 30 hand-written cards covering recurring situations: worried-about-child-fever, embarrassed-question, family-stress, urgent-health-warning. Each card teaches how to handle that specific moment.
A library of model responses
Hundreds of sanitized "what a good reply looks like" examples, each one a user message paired with the kind of warm, local, brief response we'd want. They're labeled by topic, emotion, urgency, and language register.
examples at launch
examples within the first year
shown to the model each turn
What happens when a mother sends a message
Every incoming message is understood, matched, and answered in a few hundred milliseconds. The model never sees the whole corpus, only the right slice for this specific moment.
Understand the message
Detect the language mix (Indonesian, Javanese, Sundanese, code-switched), the topic, the emotion, and whether it's a safety-sensitive moment. These labels let messages in low-resource languages be matched reliably against the cards and examples.
Pick the right scenario cards
One or two scenario cards are selected, for example "worried child fever" plus "medical triage." They tell the model how to handle this kind of moment specifically.
Surface relevant examples
Three to five carefully chosen examples from the library are pulled in, matching topic, emotion, language register, and urgency. The set is kept diverse, with no duplicates.
Compose the reply
The playbook, the cards, and the examples are assembled into a compact prompt. The model writes a fresh response in the same voice as the examples, never copying them word-for-word.
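The last two steps can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration, not Baibu's actual implementation: the `Exemplar` and `Analysis` records, the label-overlap scoring, the `select_exemplars` and `compose_prompt` helpers, and the placeholder texts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Exemplar:
    # Hypothetical record for one library example and its labels.
    id: str
    topic: str
    emotion: str
    register: str
    urgency: str
    user: str
    reply: str

@dataclass(frozen=True)
class Analysis:
    # Hypothetical output of the "understand the message" step.
    topic: str
    emotion: str
    register: str
    urgency: str

def score(ex: Exemplar, a: Analysis) -> int:
    """Count how many of the four labels match the analysed message."""
    return sum([
        ex.topic == a.topic,
        ex.emotion == a.emotion,
        ex.register == a.register,
        ex.urgency == a.urgency,
    ])

def select_exemplars(library, a, k=5, min_score=1):
    """Pick up to k best-matching exemplars, skipping duplicate replies."""
    ranked = sorted(library, key=lambda ex: score(ex, a), reverse=True)
    chosen, seen_replies = [], set()
    for ex in ranked:
        if score(ex, a) < min_score:
            break  # ranked is sorted, so nothing further can match
        if ex.reply in seen_replies:
            continue  # never show the same reply twice
        chosen.append(ex)
        seen_replies.add(ex.reply)
        if len(chosen) == k:
            break
    return chosen

def compose_prompt(playbook, cards, exemplars, message):
    """Assemble playbook, cards, and examples into one compact prompt."""
    parts = [playbook, *cards]
    for ex in exemplars:
        parts.append(f"Mother: {ex.user}\nBaibu: {ex.reply}")
    parts.append(f"Mother: {message}\nBaibu:")
    return "\n\n".join(parts)

# Made-up placeholder data, not taken from the corpus.
library = [
    Exemplar("ex1", "child_fever", "worried", "informal_id", "medium", "user text A", "reply text A"),
    Exemplar("ex2", "child_fever", "worried", "informal_id", "medium", "user text B", "reply text A"),
    Exemplar("ex3", "child_fever", "calm", "informal_id", "low", "user text C", "reply text C"),
    Exemplar("ex4", "family_stress", "sad", "javanese_mix", "low", "user text D", "reply text D"),
]
analysis = Analysis("child_fever", "worried", "informal_id", "medium")
chosen = select_exemplars(library, analysis, k=3)
prompt = compose_prompt("PLAYBOOK TEXT", ["CARD: worried child fever"], chosen, "new message")
```

Here `ex2` is dropped because its reply duplicates `ex1`, and `ex4` is dropped because none of its labels match. A real system would likely use richer matching than exact label overlap; the sketch only shows that the model sees a small, labeled slice rather than the whole corpus.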
Do mothers keep talking to us?
Naturalness, warmth, and locality are subjective. But there is one signal that captures all of them at once: whether the mother sends a second message.
Second-turn continuation rate
If Baibu sounds like a real peer, the mother replies. If it sounds clinical, she stops. Every prompt change, scenario card, and exemplar gets measured against this single, honest signal.
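One way to compute this signal is sketched below in Python. The definition used here is an assumption: a conversation counts as continued if the mother sends at least one message after Baibu's first reply, and conversations where Baibu never replied are left out. The sender labels `"mother"` and `"baibu"` and the function name are hypothetical.

```python
def second_turn_continuation_rate(conversations):
    """Share of conversations where the mother writes again after Baibu's first reply.

    conversations: list of conversations, each an ordered list of
    (sender, text) tuples with sender in {"mother", "baibu"}.
    """
    eligible = 0
    continued = 0
    for turns in conversations:
        # Index of Baibu's first reply, or None if Baibu never answered.
        first_reply = next(
            (i for i, (sender, _) in enumerate(turns) if sender == "baibu"),
            None,
        )
        if first_reply is None:
            continue  # nothing to measure in this conversation
        eligible += 1
        if any(sender == "mother" for sender, _ in turns[first_reply + 1:]):
            continued += 1
    return continued / eligible if eligible else 0.0

# Made-up placeholder conversations.
sample = [
    [("mother", "msg 1"), ("baibu", "reply 1"), ("mother", "msg 2")],  # continued
    [("mother", "msg 1"), ("baibu", "reply 1")],                       # stopped
    [("mother", "msg 1")],                                             # no reply yet, skipped
]
rate = second_turn_continuation_rate(sample)
```

Computing the same rate separately for each prompt version, scenario card, or exemplar set would give the like-for-like comparison described above.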
More replies = more conversation = more data = better voice
Why teach-by-example beats retraining
Lower cost to start
No GPU training runs. We use existing open-weight multilingual models and shape their behavior with examples, which costs orders of magnitude less per iteration.
Transparent & auditable
For every reply we can show exactly which scenario cards and examples were used. Bad outputs get traced back to specific examples and fixed in minutes, not months.
Improves daily, not quarterly
Adding or removing an example is instant. The system gets better every time a contributor sends a message, with no model retraining required.
Safer for sensitive moments
Health-related conversations are routed through dedicated safety cards. Private content is never baked into model weights โ it stays in an auditable, removable data layer.
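The audit and removal idea can be illustrated with a small Python sketch. It assumes each reply is logged with the IDs of the cards and examples that went into its prompt; the log layout, IDs, and helper names are hypothetical.

```python
# Hypothetical per-reply trace: which cards and examples were in the prompt.
reply_log = [
    {"reply_id": "r1", "cards": ["worried_child_fever"], "exemplars": ["ex1", "ex3"]},
    {"reply_id": "r2", "cards": ["family_stress"], "exemplars": ["ex4"]},
    {"reply_id": "r3", "cards": ["worried_child_fever", "medical_triage"], "exemplars": ["ex3", "ex5"]},
]

def replies_using(log, exemplar_id):
    """Return the IDs of replies whose prompt included the given exemplar."""
    return [entry["reply_id"] for entry in log if exemplar_id in entry["exemplars"]]

def remove_exemplar(library, exemplar_id):
    """Return a copy of the library without the given exemplar."""
    return {ex_id: ex for ex_id, ex in library.items() if ex_id != exemplar_id}

# Placeholder library keyed by exemplar ID.
library = {"ex1": "reply text A", "ex3": "reply text C", "ex4": "reply text D", "ex5": "reply text E"}

affected = replies_using(reply_log, "ex3")      # replies to review
library_after = remove_exemplar(library, "ex3")  # exemplar is gone from the next prompt onward
```

Because the examples live in a data layer rather than in model weights, deleting one takes effect on the next prompt that is assembled.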
AI that actually sounds like the community it serves.
Funders, researchers, and clinical partners: talk to us about co-designing scenario coverage, safety cards, or evaluation sets.