Held AI App
A Dual-Engine Companion
"Caregivers are the invisible backbone of mental health recovery — Held ensures they are never alone, never misinformed, and always supported."
The Dual-Engine
Agentic Workflow
Every inbound message is first classified by a lightweight Gemini 1.5 Flash model acting as the Intent Router. It analyzes semantics, context, and emotional signals to determine — within milliseconds — whether the caregiver is venting and seeking emotional support, or asking a specific clinical/medical question. This binary routing decision drives the entire downstream pipeline, ensuring the right engine is always engaged.
When a caregiver is venting, distressed, or simply processing emotions, the pipeline bypasses the vector database entirely. A lightweight model responds with immediate, warm, empathetic conversational support — no latency from retrieval, no clinical framing.
✓ Dramatically reduced API cost
✓ Maintains therapeutic tone
✓ No hallucination risk (no retrieval)
Triggered for specific medical/diagnostic questions ("What does BPD's splitting behavior look like?"). Performs a semantic search against the preprocessed DSM-5-TR vector store, retrieves the most precise chunks, and synthesizes a clinically accurate answer with source citations.
2. Retrieve top-K hierarchical chunks
3. Synthesize via Claude 3.5 Sonnet / Gemini Pro
4. Return cited, hallucination-resistant answer
The DSM-5-TR is not simply ingested and chunked naively. Its 1,400-page structure demands extensive preprocessing to make it RAG-ready. Naive chunking by token count creates context bleed between diagnostic categories — a catastrophic failure mode for a clinical app.
Discovery-Level
Stack Overview
Flutter enables a single Dart codebase to produce native iOS and Android applications with pixel-perfect UI consistency. For a caregiver audience spanning all demographics, a seamless mobile-first experience is non-negotiable. Dart's compiled performance avoids the bridge overhead of React Native, and Flutter's widget tree maps naturally to the custom design language of the Held brand.
OCI was selected specifically for its enterprise-grade HIPAA-eligible environment. OCI's security posture — including compartments, IAM policies, data encryption at rest and in transit, and audit logging — maps precisely to HIPAA's Technical Safeguard requirements. Unlike AWS or GCP where HIPAA compliance requires extensive manual configuration, OCI's architecture makes compliant deployment the path of least resistance. Critically, OCI's HIPAA Business Associate Agreement (BAA) is included with the enterprise tier.
The pgvector extension transforms standard PostgreSQL into a fully capable vector database. This is a deliberate architectural decision that eliminates the need for a separate vector store (e.g., Pinecone, Weaviate) during the discovery and early launch phases — collapsing two infrastructure components into one.
Gemini 1.5 Pro
API vs. GPU:
Why APIs Win at Launch
| Component | API (OpEx) | Local GPU (CapEx) |
|---|---|---|
| Empathy Engine (80% traffic) | ~$45/mo | Amortized HW cost |
| Clinical RAG (20% traffic) | ~$80/mo | Amortized HW cost |
| Infrastructure (OCI) | ~$120/mo | $1,200+/mo |
| ML Ops Personnel | $0 | ~$8,000–12,000/mo |
| Total Monthly Burn | ~$245/mo | $9,200–13,200/mo |