01 · Executive Summary

Held AI App
A Dual-Engine Companion

A HIPAA-compliant AI platform delivering 24/7 emotional support and clinically grounded education for caregivers of patients with mental health disorders.

Platform Vision

"Caregivers are the invisible backbone of mental health recovery — Held ensures they are never alone, never misinformed, and always supported."

HIPAA Compliant · 24/7 Available · Clinically Accurate

1 in 5

Adults affected by mental illness annually (US)

53M+

Unpaid family caregivers in the US

80%

Caregivers report chronic stress & burnout

Dedicated AI tools built for this audience

Empathy Engine

A compassionate conversational layer that provides real-time emotional validation and support for caregivers who simply need to be heard — no medical query required.

Clinical RAG Engine

A retrieval-augmented generation pipeline grounded in DSM-5-TR and evidence-based sources, delivering hallucination-resistant answers to complex clinical questions.

HIPAA Compliance by Architecture

Compliance is not a checkbox — it is baked into the infrastructure layer via Oracle Cloud Infrastructure's dedicated HIPAA-eligible environment, encrypted data pipelines, and strict PHI segregation protocols.

02 · System Architecture

The Dual-Engine
Agentic Workflow

Every message flows through an intelligent Intent Router that determines whether a caregiver needs empathy or clinical precision — and routes accordingly in milliseconds.

The Gatekeeper

Intent Router

Gemini 1.5 Flash

Every inbound message is first classified by a lightweight Gemini 1.5 Flash model acting as the Intent Router. It analyzes semantics, context, and emotional signals to determine — within milliseconds — whether the caregiver is venting and seeking emotional support, or asking a specific clinical/medical question. This binary routing decision drives the entire downstream pipeline, ensuring the right engine is always engaged.
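The routing decision can be sketched as a small function that takes a classifier and returns one of two paths. This is a minimal illustration, not the production router: `keyword_classifier` is a hypothetical keyword heuristic standing in for the Gemini 1.5 Flash call, and the names `Route`, `route_message`, and `CLINICAL_CUES` are assumptions made for this example.

```python
from enum import Enum
from typing import Callable


class Route(Enum):
    EMPATHY = "empathy"    # Path A: conversational support
    CLINICAL = "clinical"  # Path B: retrieval-augmented answer


# Hypothetical cue list; a real router would ask the model instead.
CLINICAL_CUES = ("symptom", "diagnos", "criteria", "medication", "what does", "what is")


def keyword_classifier(message: str) -> Route:
    """Stand-in for the LLM classifier: flags messages that look like clinical questions."""
    text = message.lower()
    if any(cue in text for cue in CLINICAL_CUES):
        return Route.CLINICAL
    return Route.EMPATHY


def route_message(message: str, classify: Callable[[str], Route] = keyword_classifier) -> Route:
    """Pick the engine for a message; fall back to empathy if the classifier fails."""
    try:
        return classify(message)
    except Exception:
        # Assumption: when classification is unavailable, the empathy path is the safer default.
        return Route.EMPATHY
```

Passing the classifier in as a parameter keeps the routing logic testable without a live model call, and lets the heuristic be swapped for the real model behind the same interface.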

PATH A

≈ 80% OF TRAFFIC

The Empathy Engine

When a caregiver is venting, distressed, or simply processing emotions, the pipeline bypasses the vector database entirely. A lightweight model responds with immediate, warm, empathetic conversational support — no latency from retrieval, no clinical framing.

Key Advantages

✓ Sub-second response time
✓ Dramatically reduced API cost
✓ Maintains therapeutic tone
✓ Lower hallucination exposure (no clinical claims, no retrieval)

PATH B

≈ 20% OF TRAFFIC

The Clinical RAG Engine

Triggered for specific medical/diagnostic questions ("What does BPD's splitting behavior look like?"). Performs a semantic search against the preprocessed DSM-5-TR vector store, retrieves the most precise chunks, and synthesizes a clinically accurate answer with source citations.

Pipeline Steps

1. Embed query → pgvector similarity search
2. Retrieve top-K hierarchical chunks
3. Synthesize via Claude 3.5 Sonnet / Gemini 1.5 Pro
4. Return cited, hallucination-resistant answer
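Steps 1 and 2 reduce to ranking stored chunk vectors by similarity to the query vector. The sketch below shows that ranking in plain Python, assuming cosine similarity as the metric and tiny hand-written vectors in place of real embeddings. In the deployed pipeline the same ranking would run inside pgvector rather than in application code; `cosine_similarity` and `top_k` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
    store = [("chunk-a", [1.0, 0.0]), ("chunk-b", [0.0, 1.0]), ("chunk-c", [1.0, 1.0])]
    for chunk_id, score in top_k([1.0, 0.0], store, k=2):
        print(chunk_id, round(score, 3))
```

The retrieved chunk IDs are what step 4 would cite, so keeping them attached to the scores matters more than the scores themselves.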

DSM-5-TR Preprocessing — The Critical Complexity

1,400 Pages

The DSM-5-TR cannot be ingested and chunked naively. Its 1,400-page structure demands extensive preprocessing to become RAG-ready: chunking by fixed token count lets text from one diagnostic category bleed into the next, a catastrophic failure mode for a clinical app.

Step 1: Parse

Extract and clean raw text. Remove page headers, footnotes, and publishing artifacts that pollute vector embeddings.
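A cleaning pass of this kind might look like the sketch below. It assumes the extracted text arrives one page at a time as a string, that bare numeric lines are page numbers, and that running headers match a known pattern. The `RUNNING_HEADER` pattern is a hypothetical placeholder, since real header text depends on the source layout.

```python
import re

# A line containing only 1-4 digits is treated as a page number.
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")
# Hypothetical running-header pattern; adjust to the actual extraction output.
RUNNING_HEADER = re.compile(r"^\s*Diagnostic and Statistical Manual.*$", re.IGNORECASE)


def clean_page(raw: str) -> str:
    """Drop page numbers and running headers, then repair line-break hyphenation."""
    kept = [
        line.rstrip()
        for line in raw.splitlines()
        if not PAGE_NUMBER.match(line) and not RUNNING_HEADER.match(line)
    ]
    text = "\n".join(kept)
    # Rejoin words split across lines ("hyphen-\nated" -> "hyphenated").
    # Caveat: this also merges genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of blank lines left behind by removed artifacts.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Footnote removal is left out here because it usually needs layout information (font size or position) that plain text does not carry.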

Step 2: Hierarchical Chunking

Segment by semantic headings: Diagnostic Criteria, Prevalence, Differential Diagnosis, Comorbidities. Each chunk stays within its logical boundary.

Step 3: Metadata Tagging

Each chunk is tagged with disorder name, category (e.g., "Bipolar I / Diagnostic Criteria"), and page reference for downstream citation generation.
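Steps 2 and 3 can be sketched together: walk the cleaned lines for one disorder, open a new chunk at each recognized section heading, and carry the disorder name, section, and starting page as metadata. This assumes headings appear on their own line with exactly the text listed above; the `Chunk` dataclass and `chunk_by_heading` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SECTION_HEADINGS = ("Diagnostic Criteria", "Prevalence", "Differential Diagnosis", "Comorbidities")


@dataclass
class Chunk:
    disorder: str
    section: str
    page: int      # page where the section starts, kept for citations
    text: str

    @property
    def label(self) -> str:
        """Category tag in the 'Disorder / Section' form."""
        return f"{self.disorder} / {self.section}"


def chunk_by_heading(disorder: str, lines: List[Tuple[int, str]]) -> List[Chunk]:
    """Split (page, text) lines for one disorder into one chunk per section heading."""
    chunks: List[Chunk] = []
    current: Optional[Chunk] = None
    for page, raw in lines:
        text = raw.strip()
        if text in SECTION_HEADINGS:
            current = Chunk(disorder=disorder, section=text, page=page, text="")
            chunks.append(current)
        elif current is not None and text:
            current.text = f"{current.text} {text}".strip()
        # Lines before the first heading are skipped in this sketch.
    return chunks
```

Because a chunk only ever closes at the next heading, text from two sections cannot end up in the same chunk. A long section would still need a secondary split by size within its own boundary.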

Why It Matters

Meticulous chunking keeps context windows small and precise, preventing hallucinations and ensuring retrieval surfaces the exact diagnostic section — not adjacent, irrelevant text.

03 · Tech Stack

Discovery-Level
Stack Overview

A deliberate, production-ready architecture built for HIPAA compliance, scalability, and cost-efficient AI inference — without over-engineering at the discovery stage.

Frontend

Flutter

Cross-Platform

Flutter enables a single Dart codebase to produce native iOS and Android applications with pixel-perfect UI consistency. For a caregiver audience spanning all demographics, a seamless mobile-first experience is non-negotiable. Dart's compiled performance avoids the bridge overhead of React Native, and Flutter's widget tree maps naturally to the custom design language of the Held brand.

Cloud & Security

Oracle Cloud Infrastructure (OCI)

HIPAA Eligible

OCI was selected for its enterprise-grade HIPAA-eligible environment. Its security controls (compartments, IAM policies, data encryption at rest and in transit, and audit logging) map to HIPAA's Technical Safeguard requirements. AWS and GCP also offer HIPAA-eligible services under a BAA; the case for OCI is that its compartment model makes a segregated, compliant deployment the path of least resistance. A Business Associate Agreement (BAA) with Oracle must be executed before any PHI is stored, and its availability and terms should be confirmed for the contracted tier.

Database

PostgreSQL + pgvector

Unified Store

The pgvector extension transforms standard PostgreSQL into a fully capable vector database. This is a deliberate architectural decision that eliminates the need for a separate vector store (e.g., Pinecone, Weaviate) during the discovery and early launch phases — collapsing two infrastructure components into one.

Relational Data

User profiles, session history, caregiver–patient relationships, audit logs — all in structured PostgreSQL tables.

Vector Embeddings

DSM-5-TR hierarchical chunks stored in pgvector `vector` columns, enabling HNSW-indexed semantic similarity search at query time.
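A possible shape for the unified store is sketched below as SQL held in Python constants. The SQL relies on pgvector's `vector` type, its `hnsw` index method with the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator. The table name, column names, and the 768-dimension figure are assumptions for illustration, and the `%s` placeholders follow psycopg-style parameter binding.

```python
from typing import Sequence

EMBEDDING_DIM = 768  # assumption: set this to the chosen embedding model's output size

SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS dsm_chunks (
    id        bigserial PRIMARY KEY,
    disorder  text NOT NULL,
    section   text NOT NULL,
    page      integer,
    body      text NOT NULL,
    embedding vector({EMBEDDING_DIM})
);
CREATE INDEX IF NOT EXISTS dsm_chunks_embedding_idx
    ON dsm_chunks USING hnsw (embedding vector_cosine_ops);
"""

# Nearest chunks by cosine distance; parameters are (query_vector_literal, k).
SEARCH_SQL = """
SELECT id, disorder, section, page, body
FROM dsm_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in the bracketed text form pgvector parses, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

With a database driver such as psycopg, the search would run as `cursor.execute(SEARCH_SQL, (to_vector_literal(query_embedding), 5))`. Relational tables for users and sessions live in the same database, which is the point of the unified store.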

Cost Impact: Eliminating a managed vector database saves ~$200–$800/month at early scale. PostgreSQL is already provisioned; pgvector adds zero additional cost.

Intelligence Layer

Multi-Model Strategy

Routing + Empathy

Gemini 1.5 Flash

Ultra-fast, cost-efficient. Powers the intent router and the full Empathy Engine path with low latency and minimal cost per token.

Clinical RAG Synthesis

Claude 3.5 Sonnet /
Gemini 1.5 Pro

Reserved for the 20% of complex clinical queries. Longer context window handles retrieved DSM chunks; superior reasoning ensures accurate synthesis.

Stack at a Glance

Flutter

OCI (HIPAA)

PostgreSQL + pgvector

Intent Router (Flash)

Empathy / RAG Engine

04 · LLM Strategy & Unit Economics

API vs. GPU:
Why APIs Win at Launch

A rigorous OpEx vs. CapEx analysis that makes a clear case for closed-source API inference during the discovery and early launch phase.

RECOMMENDED

OpEx Model

Closed-Source APIs
(Gemini + Claude)

$0 upfront CapEx. Pay only per token consumed — no hardware procurement delay or capital lock-in.

Instant scalability. Traffic surges are absorbed by Google and Anthropic's infrastructure, not by the team's operational ceiling.

Zero ML Ops burden. No model fine-tuning, quantization, driver updates, or GPU health monitoring during an already complex build phase.

State-of-the-art models. Access to Gemini 1.5 Pro and Claude 3.5 Sonnet — models that outperform most open-source alternatives on clinical reasoning benchmarks.

Data processing agreements available. Both Google and Anthropic offer BAAs for healthcare use cases, maintaining HIPAA compliance.

DEFER TO SCALE

CapEx Model

On-Prem / Local
GPU Cluster

$14,000–$40,000+ upfront. An A100 node or H100 cluster requires massive capital before a single user is onboarded — an existential risk for a pre-revenue product.

Model fine-tuning required. Open-source models (LLaMA, Mistral) require significant prompt engineering and fine-tuning to reach clinical-grade accuracy. This is weeks of additional engineering work.

Dedicated ML Ops role needed. A full-time infrastructure engineer is required just to maintain GPU health, model serving, and inference optimization.

HIPAA PHI exposure risk. Hosting inference on-prem requires custom security architecture to ensure PHI never enters an uncontrolled processing environment.

Illustrative Unit Economics (Per 1,000 Active Users/Month)

Component	API (OpEx)	Local GPU (CapEx)
Empathy Engine (80% traffic)	~$45/mo	Amortized HW cost
Clinical RAG (20% traffic)	~$80/mo	Amortized HW cost
Infrastructure (OCI)	~$120/mo	$1,200+/mo
ML Ops Personnel	$0	~$8,000–12,000/mo
Total Monthly Burn	~$245/mo	$9,200–13,200/mo (excl. amortized HW)

* Illustrative estimates. Actual costs depend on average session length, query complexity, and storage requirements.
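The totals in the table can be reproduced with a few lines of arithmetic. The figures below are copied from the table as illustrative estimates. The local GPU totals cover only infrastructure and personnel, because the table gives no dollar value for amortized hardware.

```python
USERS = 1_000  # the table is stated per 1,000 active users per month

# Monthly API-path costs in USD, taken from the table.
api_costs = {"empathy_engine": 45, "clinical_rag": 80, "infrastructure": 120, "ml_ops": 0}

# Monthly local-GPU costs in USD; amortized hardware is not quantified in the table.
gpu_costs_low = {"infrastructure": 1_200, "ml_ops": 8_000}
gpu_costs_high = {"infrastructure": 1_200, "ml_ops": 12_000}

api_total = sum(api_costs.values())
gpu_total_low = sum(gpu_costs_low.values())
gpu_total_high = sum(gpu_costs_high.values())
api_cost_per_user = api_total / USERS

if __name__ == "__main__":
    print(f"API total: ${api_total}/mo (${api_cost_per_user:.3f} per user)")
    print(f"Local GPU total: ${gpu_total_low:,} to ${gpu_total_high:,}/mo plus amortized hardware")
```

Keeping the inputs in dictionaries makes it easy to rerun the comparison once measured session lengths and token counts replace the estimates.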

The Strategic Inflection Point

Local GPU infrastructure becomes cost-competitive only after ~50,000+ active monthly users with predictable usage patterns. Until then, API-first is the financially and operationally rational choice. The architecture is designed so that a future migration to self-hosted inference would be a configuration change, not a re-architecture.
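One way a migration could stay a configuration change is to have every engine call a generator function selected by a config value, so the calling code never names a vendor. The sketch below illustrates that pattern under assumed names (`InferenceConfig`, `BACKENDS`, `get_generator`). Both backends are placeholders that tag the prompt instead of calling a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Generate = Callable[[str], str]


def hosted_api_backend(prompt: str) -> str:
    # Placeholder: a real implementation would call the vendor API here.
    return f"[hosted] {prompt}"


def self_hosted_backend(prompt: str) -> str:
    # Placeholder: a real implementation would call an internal inference server.
    return f"[self-hosted] {prompt}"


BACKENDS: Dict[str, Generate] = {
    "hosted_api": hosted_api_backend,
    "self_hosted": self_hosted_backend,
}


@dataclass(frozen=True)
class InferenceConfig:
    backend: str = "hosted_api"  # flipping this value is the whole migration, in this sketch


def get_generator(config: InferenceConfig) -> Generate:
    """Look up the generator named in the config, rejecting unknown names early."""
    try:
        return BACKENDS[config.backend]
    except KeyError:
        raise ValueError(f"unknown inference backend: {config.backend!r}") from None
```

The pattern only holds if prompts and output parsing are kept model-agnostic. A self-hosted model would still need its own evaluation run before the switch.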

05 · Roadmap & Team

6-Month Path
to Launch

A lean, executable roadmap from discovery to market — with the exact team composition required to ship a compliant, production-grade product on time.

Build-to-Launch Timeline

Months 1–2: Foundation & Compliance Architecture

OCI environment setup with HIPAA controls. BAA execution with Google Cloud and Anthropic. PostgreSQL + pgvector deployment. DSM-5-TR data ingestion pipeline design and hierarchical chunking framework. Core Flutter app shell with authentication and onboarding flows.

Month 3: DSM-5-TR Processing & Vector Store

Complete DSM-5-TR preprocessing pipeline. Parse, clean, chunk hierarchically by diagnostic sections. Embed all chunks and load into pgvector. Build and validate retrieval evaluation harness (precision@K, recall@K). Intent Router prompt engineering and benchmarking.
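The two retrieval metrics named above have simple definitions: precision@K is the fraction of the top K retrieved chunks that are relevant, and recall@K is the fraction of all relevant chunks that appear in the top K. A minimal sketch follows, assuming chunk IDs are unique strings and that a labeled set of relevant IDs exists for each test query.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved chunk IDs that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant chunk IDs that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

Averaging both metrics over a fixed set of caregiver-style questions gives a regression signal whenever the chunking or embedding model changes.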

Month 4: Dual-Engine Build & Integration

Build Empathy Engine conversation loop with Gemini 1.5 Flash. Build Clinical RAG pipeline end-to-end. Connect Flutter frontend to backend APIs. Session management, conversation history, and user state. Internal red-team testing for hallucination and clinical inaccuracy.

Month 5: Beta Testing & Clinical Review

Closed beta with 50–100 caregivers. Mental health professional review of RAG outputs. Latency optimization and cost profiling. UI/UX refinement based on caregiver feedback. Security penetration testing and HIPAA audit trail review.

Month 6: Launch

App Store and Google Play submission. Public launch with observability dashboards. Automated cost monitoring and rate limiting. Support tooling for caregiver escalations. Post-launch iteration backlog and Series A preparation materials.

Required Lean Team

CTO

Lead Architect · Compliance

Owns system design, OCI HIPAA configuration, BAA negotiations, security architecture, and final technical decisions. The compliance authority for the entire platform.

AI/Data Engineer

RAG Pipeline · DSM Preprocessing

Owns the entire intelligence layer: DSM-5-TR ingestion, hierarchical chunking, pgvector integration, intent router, empathy and RAG engine prompt engineering, and evaluation harness.

Backend/Cloud Engineer

OCI Infrastructure · APIs

Provisions and manages OCI infrastructure. Builds REST/GraphQL APIs consumed by Flutter. Owns PostgreSQL schema, session management, observability, and CI/CD pipelines.

Mobile Developer

Flutter · iOS & Android

Builds the complete Flutter application. Owns UI/UX implementation, state management, API integration, offline handling, and App Store / Google Play submission processes.
