Customer Support RAG Triage Agent
Support teams need consistent ticket triage, relevant evidence, and grounded draft responses without relying on a generic chatbot.

- Project type: AI engineering
- Core stack: Python, FastAPI, LangGraph
- Delivery: Case study
Case Study
The problem, implementation decisions, measured evidence, and next improvements.
Overview
A retrieval-grounded support triage system with observable agent steps, safe fallback behavior, and measurable evaluation.
Problem
High-volume support queues mix intent classification, urgency assessment, retrieval, response drafting, and escalation decisions. Treating this as one unconstrained chat prompt makes the workflow difficult to inspect, evaluate, and operate safely.
Solution
Built a typed seven-node LangGraph workflow that normalizes tickets, classifies intent, detects urgency, retrieves similar Banking77 cases from Qdrant, drafts a response, checks grounding, and recommends a human action. Provider-aware caching and fallback across Gemini, Groq, and Cerebras keep the workflow usable when one provider is unavailable.
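The seven steps can be pictured as small functions that each read a shared typed state and return only the fields they own. The following is a minimal, library-free sketch of that shape, not the project's LangGraph code: the field names, keyword rules, and action labels are assumptions made for illustration, and the retrieval node only stands in for a Qdrant query.

```python
from typing import Callable, TypedDict


class TicketState(TypedDict, total=False):
    """Shared state passed between nodes (field names are illustrative)."""
    text: str
    intent: str
    urgent: bool
    evidence: list[str]
    draft: str
    grounded: bool
    action: str
    trace: list[str]


def normalize(s: TicketState) -> TicketState:
    # Collapse whitespace and lowercase the ticket text.
    return {"text": " ".join(s["text"].split()).lower()}


def classify_intent(s: TicketState) -> TicketState:
    # Placeholder keyword rule; a real node would call a classifier or an LLM.
    return {"intent": "card_issue" if "card" in s["text"] else "general"}


def detect_urgency(s: TicketState) -> TicketState:
    return {"urgent": any(w in s["text"] for w in ("stolen", "fraud", "urgent"))}


def retrieve(s: TicketState) -> TicketState:
    # Stand-in for a vector search over similar support cases.
    return {"evidence": [f"similar case for intent={s['intent']}"]}


def draft(s: TicketState) -> TicketState:
    return {"draft": f"Draft reply citing {len(s['evidence'])} retrieved case(s)."}


def check_grounding(s: TicketState) -> TicketState:
    # Simplest possible check: a draft counts as grounded only if evidence exists.
    return {"grounded": bool(s["evidence"])}


def recommend_action(s: TicketState) -> TicketState:
    escalate = s["urgent"] or not s["grounded"]
    return {"action": "escalate_to_human" if escalate else "send_draft_for_review"}


# Fixed node order: the workflow is a pipeline, not an open-ended loop.
NODES: list[Callable[[TicketState], TicketState]] = [
    normalize, classify_intent, detect_urgency, retrieve,
    draft, check_grounding, recommend_action,
]


def run(text: str) -> TicketState:
    """Run every node in order and record which nodes executed."""
    state: TicketState = {"text": text, "trace": []}
    for node in NODES:
        state.update(node(state))
        state["trace"].append(node.__name__)
    return state
```

In LangGraph the same functions would be registered as graph nodes and connected by edges. The plain loop here only illustrates how a fixed node order yields a complete, inspectable trace.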
Outcome
Delivered a locally verified support-operations console with deterministic mock mode, semantic search, complete graph traces, provider health, and offline evaluation. The deployment work also identified embedding-model memory, rather than application correctness, as the limiting factor on free hosting.
What It Proves
RAG architecture, LangGraph orchestration, vector retrieval, provider routing, evaluation design, FastAPI, React, Docker, and deployment troubleshooting.
Key Features
- Fixed support workflow with typed state instead of an open-ended chatbot loop.
- Cache-first multi-provider routing with bounded retries and safe degraded output.
- Grounding checks, escalation logic, provider health, and end-to-end trace visibility.
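Cache-first routing with bounded retries and a degraded fallback can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the project's router: the `ProviderRouter` class, its result fields, the fallback message, and the choice to key the cache on a hash of the prompt are all hypothetical, and providers are modeled as plain callables rather than real Gemini, Groq, or Cerebras clients.

```python
import hashlib
from typing import Callable

Provider = Callable[[str], str]  # takes a prompt, returns generated text


class ProviderRouter:
    """Try the cache first, then each provider in order with bounded retries."""

    def __init__(self, providers: list[tuple[str, Provider]], max_attempts: int = 2):
        self.providers = providers
        self.max_attempts = max_attempts
        self.cache: dict[str, dict] = {}

    def complete(self, prompt: str) -> dict:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()

        # 1. Cache first: no provider is called on a hit.
        if key in self.cache:
            return {**self.cache[key], "cached": True}

        # 2. Walk providers in priority order, retrying each a bounded number of times.
        for name, call in self.providers:
            for _ in range(self.max_attempts):
                try:
                    text = call(prompt)
                except Exception:
                    continue  # count the failed attempt and try again
                result = {"provider": name, "text": text, "degraded": False}
                self.cache[key] = result  # the entry records which provider answered
                return {**result, "cached": False}

        # 3. Every provider failed: return a safe, clearly marked degraded output.
        #    Degraded results are not cached so a later call can recover.
        return {
            "provider": None,
            "text": "A support agent will review this ticket.",
            "degraded": True,
            "cached": False,
        }
```

Bounding the attempts per provider keeps worst-case latency predictable, and marking the fallback as degraded lets the console show that no model produced the text.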
Architecture
- 01 React operations console
- 02 FastAPI service
- 03 LangGraph workflow
- 04 Qdrant retrieval
- 05 Local BGE embeddings
- 06 LLM cache and provider router
- 07 Grounding and next action
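The final stage, grounding and next action, could be approximated with a simple lexical check. The sketch below is an assumption about one possible approach, not the project's verifier: it scores a draft by the share of its tokens that also appear in the retrieved evidence, and the 0.6 threshold and action names are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(draft: str, evidence: list[str]) -> float:
    """Fraction of distinct draft tokens that also occur in the evidence."""
    draft_tokens = _tokens(draft)
    if not draft_tokens:
        return 0.0
    evidence_tokens: set[str] = set()
    for passage in evidence:
        evidence_tokens |= _tokens(passage)
    return len(draft_tokens & evidence_tokens) / len(draft_tokens)


def next_action(draft: str, evidence: list[str], urgent: bool,
                threshold: float = 0.6) -> str:
    """Escalate urgent or weakly grounded drafts; otherwise queue for review."""
    if urgent or grounding_score(draft, evidence) < threshold:
        return "escalate_to_human"
    return "send_draft_for_review"
```

Token overlap is a coarse proxy: it can flag drafts with no support in the evidence, but it cannot confirm that a well-overlapping draft is factually faithful.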
Tech Stack
- Python
- FastAPI
- LangGraph
- Qdrant
- BGE embeddings
- React
- TypeScript
- Docker
Challenges & Trade-offs
- Uses 13,069 Banking77 support queries and preserves all 77 original labels while mapping them to nine operational intents.
- Exposes a seven-node execution trace so reviewers can inspect classification, retrieval, generation, and verification.
- Free-tier deployment testing showed that embedding-model memory must be treated as an infrastructure constraint.
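Mapping fine-grained labels onto a smaller operational set while keeping the original label can be sketched as a lookup that stores both values side by side. The example is hypothetical: the four source labels are written in the style of Banking77 label names, while the operational intent names, the fallback intent, and the `LabeledQuery` type are assumptions and not the project's actual nine-intent scheme.

```python
from dataclasses import dataclass

# Hypothetical excerpt: fine-grained label -> operational intent (names assumed).
LABEL_TO_INTENT: dict[str, str] = {
    "lost_or_stolen_card": "card_security",
    "card_arrival": "card_delivery",
    "top_up_failed": "top_up",
    "request_refund": "refund",
}
FALLBACK_INTENT = "general_inquiry"  # assumed catch-all for unmapped labels


@dataclass(frozen=True)
class LabeledQuery:
    text: str
    original_label: str  # preserved so fine-grained evaluation stays possible
    intent: str          # coarse operational intent used for routing


def to_operational(text: str, original_label: str) -> LabeledQuery:
    """Attach an operational intent without discarding the original label."""
    intent = LABEL_TO_INTENT.get(original_label, FALLBACK_INTENT)
    return LabeledQuery(text=text, original_label=original_label, intent=intent)


def unmapped(labels: list[str]) -> list[str]:
    """Labels with no explicit mapping, useful as a coverage check at ingest."""
    return [label for label in labels if label not in LABEL_TO_INTENT]
```

Running a check like `unmapped` over the full label set before ingestion would reveal any label that silently falls through to the catch-all intent.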
Future Improvements
- Add a reranker and multilingual support.
- Ingest help-center policy documents with stronger provenance.
- Capture human feedback and response ratings for continuous evaluation.
Repository README, implementation, and deployment verification reviewed June 14, 2026.