📚 System Overview

How Roost Client Data Works

A secure, human-in-the-loop system for managing client information with AI-powered ingestion and self-service portals.

🤔

The Problem

Client data comes from everywhere — forms, emails, documents, phone calls, CRM systems. It's scattered, often outdated, and there's no single source of truth. Staff waste hours chasing confirmations and manually updating records.

The Complete Data Flow

1

Data Flows In From Multiple Sources

📝

Customer Forms

Web forms clients fill out directly

🔗

CRM & Apps

Xero, Salesforce via webhooks

📄

Documents

PDFs, payslips, meeting notes

👤

Staff Upload

Manual entry from calls/meetings

2

Validation & AI Extraction

Form Data

Structured data maps directly to fields. Validated against schema rules.
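
As a rough illustration of "validated against schema rules", the sketch below checks a submitted form against a small hand-written schema. The field names (`full_name`, `salary`, `dependents`), the rules, and the `validate_form` helper are all assumptions made for this example; the overview does not list the real schema, and in the suggested stack this job would more likely be done with Pydantic models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldRule:
    """One schema rule: expected type, whether required, optional value check."""
    type_: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None


# Hypothetical schema; the real field list is not given in this overview.
CLIENT_FORM_SCHEMA = {
    "full_name": FieldRule(str, check=lambda v: len(v.strip()) > 0),
    "salary": FieldRule(int, check=lambda v: v >= 0),
    "dependents": FieldRule(int, required=False, check=lambda v: 0 <= v <= 20),
}


def validate_form(data: dict) -> list:
    """Return a list of error strings; an empty list means the form is valid."""
    errors = []
    for name, rule in CLIENT_FORM_SCHEMA.items():
        if name not in data:
            if rule.required:
                errors.append(f"{name}: missing required field")
            continue
        value = data[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule.type_):
            errors.append(f"{name}: expected {rule.type_.__name__}")
        elif rule.check is not None and not rule.check(value):
            errors.append(f"{name}: failed rule check")
    # Fields outside the schema are rejected rather than silently stored.
    for name in sorted(data.keys() - CLIENT_FORM_SCHEMA.keys()):
        errors.append(f"{name}: unknown field")
    return errors
```

A form that passes would then be turned into a pending change rather than written to the database directly.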

AI Extraction

Unstructured documents (PDFs, notes) are processed by AI to extract facts.

⚠️

AI extractions are flagged for review

Nothing from AI goes directly into the database. Staff always verify first.

3

Pending Queue (Human Approval)

3 changes awaiting review

John Henderson

Salary: $85,000 → $92,000

Mike Thompson (AI Extracted)

Income: $120,000, Dependents: 3

Approve → Goes to database
Reject → Discarded
Edit → Modify, then approve
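
The three review outcomes above can be sketched as a small in-memory queue. This is only an illustration under stated assumptions: `PendingChange`, `ReviewQueue`, the `source` labels, and the dictionary standing in for the live database are hypothetical names, not the actual implementation. The point it shows is that the only code path that writes to the database is `approve`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(eq=False)  # compare by identity so remove() targets this exact change
class PendingChange:
    client_id: str
    field_name: str
    new_value: object
    source: str  # assumed labels: "form", "webhook", "ai_extraction", "portal"
    status: Status = Status.PENDING
    reviewed_by: Optional[str] = None


class ReviewQueue:
    def __init__(self) -> None:
        self.pending = []    # changes awaiting staff review
        self.database = {}   # stand-in for the live database

    def submit(self, change: PendingChange) -> None:
        self.pending.append(change)

    def edit(self, change: PendingChange, new_value: object) -> None:
        """Staff correct the value; the change stays pending until approved."""
        self._require_pending(change)
        change.new_value = new_value

    def approve(self, change: PendingChange, reviewer: str) -> None:
        self._close(change, Status.APPROVED, reviewer)
        self.database[(change.client_id, change.field_name)] = change.new_value

    def reject(self, change: PendingChange, reviewer: str) -> None:
        self._close(change, Status.REJECTED, reviewer)  # discarded, never stored

    def _require_pending(self, change: PendingChange) -> None:
        if change.status is not Status.PENDING or change not in self.pending:
            raise ValueError("change is not awaiting review")

    def _close(self, change: PendingChange, status: Status, reviewer: str) -> None:
        self._require_pending(change)
        self.pending.remove(change)
        change.status = status
        change.reviewed_by = reviewer
```

Because AI extractions enter through the same `submit` path as every other source, they cannot bypass review in this design.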
4

Live Database (Single Source of Truth)

24 Active Clients
156 Data Points
100% Human Verified

Full Audit Trail

Every change tracked with who, what, when

Encrypted & Secure

PII encrypted at rest and in transit

🔄 The Feedback Loop

Client Self-Service Portal

Clients can review, confirm, and propose changes to their own data

👤 Client Views Their Data

Name: John Henderson ✓
Salary: $85,000
Employer: Acme Corp ✓

✏️ Client Proposes a Change

Salary: $85,000 → $92,000

"Got a raise in January"

📎 payslip-jan-2026.pdf

→ Goes to Pending Queue for staff approval

How Clients Access the Portal

📧 Staff sends invite
🔗 Magic link (48h expiry)
🔐 Secure session
✅ Review & confirm

AI-Powered Queries

Ask questions about your client data in natural language

"What's John Henderson's current salary?"

"Which clients have income over $100k?"

"Who hasn't confirmed their details in 6 months?"

Instant Answers

"John Henderson's salary is currently $85,000. There's a pending update to $92,000 awaiting approval (submitted 2 hours ago)."

Security Built In

🔒

HTTPS Only

TLS 1.3 encryption

🛡️

Row-Level Security

Clients see only their data

📋

Audit Logging

Every access tracked

🔑

Magic Links + MFA

Secure authentication

The Complete Picture

EXTERNAL (Forms, CRM, Docs) → API (Validate & Extract) → PENDING (Human Review) → DATABASE (Source of Truth) → OUTPUT (Portal, AI, API)
↩️ Client edits from Portal feed back into Pending Queue

🛠️ Technical Architecture

Suggested Tech Stack

Production-ready, secure, and scalable

🖥️

Frontend

⚛️

React / Next.js

Staff dashboard & client portal

🎨

Tailwind CSS

Styling & components

📊

shadcn/ui

Pre-built UI components

⚙️

Backend

🐍

FastAPI

High-performance Python API

🐘

PostgreSQL

Database + row-level security

⚡

Redis

Caching & rate limiting

🤖

AI & Auth

🦙

Ollama / OpenAI

Document extraction & queries

🔐

Auth0 / Clerk

Authentication & MFA

🔑

HashiCorp Vault

Secrets management

🐘 PostgreSQL Features

  • ✓ Row-Level Security (RLS): Clients can only access their own data at the database level
  • ✓ pgcrypto Extension: Encrypt PII fields at rest (name, email, salary)
  • ✓ JSONB Support: Flexible schema for varying client data structures
  • ✓ Full Audit Trail: Triggers log every change with timestamp and user
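
To make the trigger-based audit trail concrete, here is a minimal runnable sketch. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; PostgreSQL triggers are written differently (a trigger function plus `CREATE TRIGGER`), and the table and column names here are assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE client_facts (
        client_id   TEXT NOT NULL,
        field_name  TEXT NOT NULL,
        field_value TEXT,
        updated_by  TEXT NOT NULL,
        PRIMARY KEY (client_id, field_name)
    );

    CREATE TABLE audit_log (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        client_id  TEXT NOT NULL,
        field_name TEXT NOT NULL,
        old_value  TEXT,
        new_value  TEXT,
        changed_by TEXT NOT NULL,                          -- who
        changed_at TEXT NOT NULL DEFAULT (datetime('now')) -- when (UTC)
    );

    -- Every insert and update is copied into audit_log by the database itself,
    -- so application code cannot forget to log a change.
    CREATE TRIGGER log_fact_insert AFTER INSERT ON client_facts
    BEGIN
        INSERT INTO audit_log (client_id, field_name, old_value, new_value, changed_by)
        VALUES (NEW.client_id, NEW.field_name, NULL, NEW.field_value, NEW.updated_by);
    END;

    CREATE TRIGGER log_fact_update AFTER UPDATE ON client_facts
    BEGIN
        INSERT INTO audit_log (client_id, field_name, old_value, new_value, changed_by)
        VALUES (OLD.client_id, OLD.field_name, OLD.field_value, NEW.field_value, NEW.updated_by);
    END;
    """
)

conn.execute(
    "INSERT INTO client_facts VALUES (?, ?, ?, ?)",
    ("client:123", "salary", "85000", "staff:jane"),
)
conn.execute(
    "UPDATE client_facts SET field_value = ?, updated_by = ? "
    "WHERE client_id = ? AND field_name = ?",
    ("92000", "staff:sam", "client:123", "salary"),
)

audit_rows = conn.execute(
    "SELECT old_value, new_value, changed_by FROM audit_log ORDER BY id"
).fetchall()
```

After these two statements `audit_rows` holds one entry for the initial insert and one for the salary update, each with the acting user.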

🐍 FastAPI Features

  • ✓ Async Support: Handle thousands of concurrent connections
  • ✓ Auto-Generated Docs: OpenAPI/Swagger documentation out of the box
  • ✓ Pydantic Validation: Type-safe request/response validation
  • ✓ Dependency Injection: Clean auth middleware and DB session management

☁️ Hosting Options

🏠
Self-Hosted (NAS)

Run on your own hardware for full control and data sovereignty

• Docker Compose setup
• Cloudflare Tunnel for access
• Local backups

🚀
VPS / Cloud

Deploy to DigitalOcean, Hetzner, or AWS for reliability

• Single VM or Kubernetes
• Managed PostgreSQL option
• Auto-scaling available

⚡
Serverless

Use managed services for minimal ops overhead

• Vercel (Frontend)
• Railway / Render (API)
• Supabase (Database)

{} API Endpoints

# Ingestion (creates pending changes)
POST /api/ingest/form       # Customer form submission
POST /api/ingest/webhook    # External app webhooks
POST /api/ingest/document   # Document upload (AI extract)

# Approval (staff only)
GET  /api/pending           # List pending changes
POST /api/pending/{id}/approve
POST /api/pending/{id}/reject

# Query (read approved data)
GET  /api/clients           # List clients
GET  /api/clients/{id}      # Get client facts
POST /api/query             # Natural language query

# Client Portal
GET  /api/portal/me         # Get own data
POST /api/portal/confirm    # Confirm data is correct
POST /api/portal/propose    # Propose a change

🤖 AI Models

📄 Document Extraction

Extract structured data from unstructured documents (PDFs, meeting notes, payslips)

🦙 Llama 3.2 (Local / Free)
◐ GPT-4o (Cloud / Paid)
▲ Claude 3.5 (Cloud / Paid)
💬 Natural Language Queries

Ask questions about client data in plain English

🔷 Mistral 7B (Local / Free)
◐ GPT-4o mini (Low cost)
Gemini 1.5 Flash (Low cost)
💡 Recommended Approach
✓

Privacy-first: Use local Ollama models (Llama, Mistral) - data never leaves your server

✓

Best accuracy: Use GPT-4o or Claude for complex document extraction

✓

Hybrid: Local for queries, cloud for document extraction only

✓

Cost control: Set spending limits and fall back to local models
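
One way to read the hybrid and cost-control recommendations together is a small routing rule: queries always stay local, and document extraction uses a cloud model until a spending limit would be exceeded. The sketch below is an assumption-laden illustration; `ModelRouter`, the task labels, the model name strings, and the idea of a monthly budget in USD are hypothetical and not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass
class ModelRouter:
    monthly_budget_usd: float
    spent_usd: float = 0.0
    local_model: str = "llama3.2"  # placeholder name for a local Ollama model
    cloud_model: str = "gpt-4o"    # placeholder name for a cloud model

    def choose(self, task: str, estimated_cost_usd: float = 0.0) -> str:
        """Pick a model for a task: 'query' or 'extraction'."""
        if task == "query":
            # Hybrid rule: natural-language queries never leave the server.
            return self.local_model
        if task != "extraction":
            raise ValueError(f"unknown task: {task!r}")
        # Cost control: fall back to the local model once the limit would be hit.
        if self.spent_usd + estimated_cost_usd > self.monthly_budget_usd:
            return self.local_model
        return self.cloud_model

    def record_spend(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
```

A purely privacy-first deployment would correspond to a budget of zero, in which case every task is routed to the local model.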

🛡️ Security Architecture

1 Network Layer
Perimeter

HTTPS Only

TLS 1.3

HSTS

Force HTTPS

Rate Limiting

100 req/min

WAF / DDoS

Cloudflare
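
The 100 requests per minute limit could be enforced in several ways; a sliding-window counter is one common choice. The sketch below keeps its state in process memory so it is runnable on its own. In the suggested stack the counters would more plausibly live in Redis, and the `SlidingWindowLimiter` class and the choice of keying by client IP are assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window_seconds` span."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0,
                 clock=time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock                 # injectable so tests can control time
        self._hits = defaultdict(deque)    # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call `allow(client_ip)` first and answer with HTTP 429 when it returns `False`.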

2 Authentication Layer
Identity

Staff Login

  • Email + Password + MFA (TOTP)
  • SSO via Google/Microsoft
  • Session timeout: 8 hours
  • IP allowlisting (optional)

Client Portal

  • Magic link (256-bit token)
  • Link expires in 48 hours
  • Single-use or limited uses
  • Optional: verify DOB/phone
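
The magic-link properties listed above (256-bit token, 48-hour expiry, single use) can be sketched with the standard library alone. `MagicLinkStore` and its in-memory dictionary are hypothetical stand-ins for a database table. Storing only a SHA-256 digest of the token, rather than the token itself, is an extra precaution assumed here and not something the overview specifies.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

LINK_TTL = timedelta(hours=48)


class MagicLinkStore:
    def __init__(self) -> None:
        # token digest -> (client_id, expires_at); stand-in for a DB table
        self._links = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("ascii")).hexdigest()

    def issue(self, client_id: str, now: Optional[datetime] = None) -> str:
        """Create a token to embed in the invite email."""
        now = now or datetime.now(timezone.utc)
        token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
        self._links[self._digest(token)] = (client_id, now + LINK_TTL)
        return token

    def redeem(self, token: str, now: Optional[datetime] = None) -> Optional[str]:
        """Return the client id for a valid token, or None. Tokens are single-use."""
        now = now or datetime.now(timezone.utc)
        # pop() removes the entry, so a second redemption always fails.
        entry = self._links.pop(self._digest(token), None)
        if entry is None:
            return None
        client_id, expires_at = entry
        return client_id if now < expires_at else None
```

A "limited uses" variant would store a remaining-use counter instead of deleting the entry on first redemption.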
3 Authorization Layer
Access Control

👤 Client

  • View own data only
  • Propose changes
  • Confirm details

👔 Staff

  • View all clients
  • Approve/reject changes
  • Send review requests

⚙️ Admin

  • Manage users
  • Configure forms
  • API key management
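
The three roles above map naturally onto a permission table. The following sketch is illustrative only: the action names and the `is_allowed` helper are made up for this example, and because the overview does not say whether Admin also inherits Staff permissions, the roles are kept strictly separate here.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    CLIENT = "client"
    STAFF = "staff"
    ADMIN = "admin"


# Hypothetical action names derived from the role lists above.
PERMISSIONS = {
    Role.CLIENT: {"view_own", "propose_change", "confirm_details"},
    Role.STAFF: {"view_all", "approve_change", "reject_change", "send_review_request"},
    Role.ADMIN: {"manage_users", "configure_forms", "manage_api_keys"},
}

# Actions that are only valid when the actor is the client being acted on.
OWN_DATA_ONLY = {"view_own", "propose_change", "confirm_details"}


def is_allowed(role: Role, action: str,
               actor_id: Optional[str] = None,
               target_client_id: Optional[str] = None) -> bool:
    if action not in PERMISSIONS[role]:
        return False
    if action in OWN_DATA_ONLY:
        return actor_id is not None and actor_id == target_client_id
    return True
```

This application-level check complements, rather than replaces, row-level security in the database.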
4 Data Protection Layer
Encryption

At Rest

  • AES-256 encryption for PII fields
  • Encrypted database backups
  • Hashed email for lookups (SHA-256)
  • Keys stored in Vault, not env vars
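
Hashing emails for lookup lets the system find a record by email without decrypting the stored value. A minimal sketch, assuming emails are normalised by trimming and lowercasing before hashing (the overview does not state the normalisation rule): the first function is the plain SHA-256 approach named above, and the second is a keyed HMAC variant, offered here only as a possible hardening because unkeyed hashes of emails can be guessed by hashing candidate addresses.

```python
import hashlib
import hmac


def _normalise(email: str) -> bytes:
    # Assumed rule: surrounding whitespace and letter case are not significant.
    return email.strip().lower().encode("utf-8")


def email_lookup_hash(email: str) -> str:
    """Plain SHA-256 digest of the normalised email, as a hex string."""
    return hashlib.sha256(_normalise(email)).hexdigest()


def email_lookup_hmac(email: str, key: bytes) -> str:
    """Keyed variant (HMAC-SHA-256); the key would come from the secrets store."""
    return hmac.new(key, _normalise(email), hashlib.sha256).hexdigest()
```

Either digest can be stored in an indexed column next to the encrypted email and compared on lookup.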

In Transit

  • TLS 1.3 for all connections
  • Certificate pinning (mobile)
  • Encrypted database connections
  • No sensitive data in URLs
5 Audit & Monitoring Layer
Visibility

Access Logs

Who viewed what, when

Change History

Every modification tracked

Failed Logins

Alert on 3+ failures

Data Exports

Logged & approved
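
The "alert on 3+ failures" rule can be sketched as a simple counter. The overview does not say whether failures are counted consecutively or within a time window; this example assumes consecutive failures per account, reset by a successful login. `FailedLoginMonitor` is a hypothetical name.

```python
from collections import defaultdict


class FailedLoginMonitor:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._failures = defaultdict(int)  # account -> consecutive failures

    def record_failure(self, account: str) -> bool:
        """Count a failed login; return True when an alert should be raised."""
        self._failures[account] += 1
        return self._failures[account] >= self.threshold

    def record_success(self, account: str) -> None:
        """A successful login resets the consecutive-failure count."""
        self._failures.pop(account, None)
```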

Session Token Structure (JWT)

{
  "jti": "unique-session-id",     // For revocation
  "sub": "client:123",            // User identity
  "iat": 1706745600,              // Issued at
  "exp": 1706747400,              // Expires (30 min)
  "scope": ["read:own", "propose"], // Permissions
  "ip": "203.118.x.x"             // IP binding (optional)
}
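
To show how the claims above fit into a signed token, here is a minimal HS256 sketch using only the standard library. It is meant to illustrate the structure (header, payload, signature) and the `exp` and `jti` checks; the function names are made up for this example, the signing algorithm is an assumption since the overview does not name one, and a real deployment would normally rely on a vetted JWT library or the auth provider's SDK rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, key: bytes, now: int,
                 revoked_jtis=frozenset()) -> Optional[dict]:
    """Return the claims if the token is authentic, unexpired and not revoked."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = hmac.new(
            key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison; check the signature before trusting content.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong part count, bad base64, bad JSON, non-ASCII input
        return None
    if header.get("alg") != "HS256":
        return None
    if now >= claims.get("exp", 0):          # expired (30 min after iat above)
        return None
    if claims.get("jti") in revoked_jtis:    # session revoked server-side
        return None
    return claims
```

The `jti` check is what makes early revocation possible: logging a client out means adding the session id to the revoked set until the token would have expired anyway.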

Privacy Act NZ

Compliant data handling

Right to Access

Client portal provides this

Data Retention

Configurable auto-delete

🚀 Quick Start (Self-Hosted)

# Clone and run with Docker
git clone https://github.com/yourorg/roost-client-data
cd roost-client-data
cp .env.example .env
docker-compose up -d

# → API at localhost:8000
# → Dashboard at localhost:3000

Ready to see it in action?

Try the interactive demo to experience the full workflow
