Case study: RAG / LangGraph / Qdrant

Customer Support RAG Triage Agent

Support teams need consistent ticket triage, relevant evidence, and grounded draft responses without relying on a generic chatbot.

Customer support RAG triage dashboard and workflow concept.

Project type: AI engineering
Core stack: Python, FastAPI, LangGraph
Delivery: Case study

Case Study

The problem, implementation decisions, measured evidence, and next improvements.

Overview

A retrieval-grounded support triage system with observable agent steps, safe fallback behavior, and measurable evaluation.

Problem

High-volume support queues mix intent classification, urgency assessment, retrieval, response drafting, and escalation decisions. Treating this as one unconstrained chat prompt makes the workflow difficult to inspect, evaluate, and operate safely.

Solution

Built a typed seven-node LangGraph workflow that normalizes tickets, classifies intent, detects urgency, retrieves similar Banking77 cases from Qdrant, drafts a response, checks grounding, and recommends a human action. Provider-aware caching and fallback across Gemini, Groq, and Cerebras keep the workflow usable when one model provider is unavailable.
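The fixed, typed pipeline described above can be illustrated with a dependency-free sketch. It deliberately does not use the LangGraph API. It only shows the idea of seven named steps that each read a shared typed state and return the keys they update, with a trace recorded along the way. Every name here (`TicketState`, `run_workflow`, the node functions) and every node body is a hypothetical stand-in, not the project's implementation.

```python
from typing import Callable, TypedDict


class TicketState(TypedDict, total=False):
    raw_text: str
    text: str
    intent: str
    urgency: str
    evidence: list[str]
    draft: str
    grounded: bool
    next_action: str
    trace: list[str]


# Each node reads the state and returns only the keys it updates.
# All node bodies are trivial stand-ins, not the project's logic.
def normalize(s: TicketState) -> TicketState:
    return {"text": " ".join(s["raw_text"].split()).lower()}


def classify_intent(s: TicketState) -> TicketState:
    return {"intent": "card_issue" if "card" in s["text"] else "other"}


def detect_urgency(s: TicketState) -> TicketState:
    return {"urgency": "high" if "stolen" in s["text"] else "normal"}


def retrieve(s: TicketState) -> TicketState:
    # Stand-in for a Qdrant similarity search over Banking77 cases.
    return {"evidence": ["my card was stolen, what should i do"]}


def draft(s: TicketState) -> TicketState:
    return {"draft": f"Similar case found: {s['evidence'][0]}"}


def check_grounding(s: TicketState) -> TicketState:
    # Stand-in check: the draft must quote at least one evidence item.
    return {"grounded": any(e in s["draft"] for e in s["evidence"])}


def recommend_action(s: TicketState) -> TicketState:
    escalate = s["urgency"] == "high" or not s["grounded"]
    return {"next_action": "escalate_to_human" if escalate else "send_draft"}


NODES: list[Callable[[TicketState], TicketState]] = [
    normalize, classify_intent, detect_urgency, retrieve,
    draft, check_grounding, recommend_action,
]


def run_workflow(raw_text: str) -> TicketState:
    state: TicketState = {"raw_text": raw_text, "trace": []}
    for node in NODES:
        state.update(node(state))            # merge the node's partial update
        state["trace"].append(node.__name__)  # record the step for inspection
    return state
```

Because the node order is fixed and every step is recorded in `trace`, a reviewer can see which stage produced each field, which is the property an open-ended chat loop does not give.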

Outcome

Delivered a locally verified support-operations console with deterministic mock mode, semantic search, complete graph traces, provider health, and offline evaluation. Deployment testing also showed that embedding-model memory, not application correctness, was the limiting factor on free-tier hosting.

What It Proves

RAG architecture, LangGraph orchestration, vector retrieval, provider routing, evaluation design, FastAPI, React, Docker, and deployment troubleshooting.

Key Features

Fixed support workflow with typed state instead of an open-ended chatbot loop.
Cache-first multi-provider routing with bounded retries and safe degraded output.
Grounding checks, escalation logic, provider health, and end-to-end trace visibility.
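As an illustration of cache-first routing with bounded retries and a safe degraded result, here is a minimal sketch using only the standard library. `ProviderRouter`, its fields, and the degraded message are hypothetical names chosen for this example. Providers are modeled as plain callables rather than real Gemini, Groq, or Cerebras clients, and the cache is keyed on the prompt alone, which is a simplification of whatever provider-aware key the project uses.

```python
import hashlib
from typing import Callable

Provider = Callable[[str], str]  # takes a prompt, returns text, may raise


class ProviderRouter:
    def __init__(self, providers: list[tuple[str, Provider]], max_attempts: int = 2):
        self.providers = providers          # tried in the order given
        self.max_attempts = max_attempts    # retry bound per provider
        self.cache: dict[str, dict] = {}
        self.health: dict[str, str] = {name: "unknown" for name, _ in providers}

    def complete(self, prompt: str) -> dict:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:  # cache first: no provider call at all
            return {**self.cache[key], "cached": True}

        for name, call in self.providers:
            for _ in range(self.max_attempts):
                try:
                    text = call(prompt)
                except Exception:
                    self.health[name] = "failing"
                    continue  # retry this provider until the bound is hit
                self.health[name] = "ok"
                result = {"text": text, "provider": name, "degraded": False}
                self.cache[key] = result
                return {**result, "cached": False}

        # Every provider exhausted its retries: return a safe placeholder
        # instead of raising, and do not cache it.
        return {
            "text": "Draft unavailable. Route this ticket to a human agent.",
            "provider": None,
            "degraded": True,
            "cached": False,
        }
```

Not caching the degraded result is a deliberate choice in this sketch: a later call with the same prompt gets another chance at a real provider once one recovers.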

Architecture

01 React operations console
02 FastAPI service
03 LangGraph workflow
04 Qdrant retrieval
05 Local BGE embeddings
06 LLM cache and provider router
07 Grounding and next action
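The retrieval step (embeddings plus vector search) can be pictured with a small brute-force sketch. This is not the Qdrant client or a BGE model: the three-dimensional vectors are hand-written toy values standing in for real embeddings, the linear scan stands in for Qdrant's index, and `cosine`, `top_k`, and `CASES` are hypothetical names. It only shows the ranking idea, which is returning the stored cases whose vectors are closest to the query vector by cosine similarity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# (text, original label, toy embedding). The vectors are made up.
CASES = [
    ("My card still has not arrived", "card_arrival", [0.9, 0.1, 0.0]),
    ("I think my card was stolen", "lost_or_stolen_card", [0.1, 0.9, 0.1]),
    ("What exchange rate do you use?", "exchange_rate", [0.0, 0.1, 0.9]),
]


def top_k(query_vec: list[float], k: int = 2) -> list[dict]:
    scored = [
        {"score": cosine(query_vec, vec), "text": text, "label": label}
        for text, label, vec in CASES
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:k]


# A toy query vector pointing mostly along the second axis.
hits = top_k([0.2, 0.8, 0.0], k=2)
```

In the real stack the query vector would come from the same embedding model used at indexing time, and the vector store would return the scored neighbours instead of this linear scan.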

Tech Stack

Python
FastAPI
LangGraph
Qdrant
BGE embeddings
React
TypeScript
Docker

Challenges & Trade-offs

Uses 13,069 Banking77 support queries and preserves all 77 original labels while mapping them to nine operational intents.
Exposes a seven-node execution trace so reviewers can inspect classification, retrieval, generation, and verification.
Free-tier deployment testing showed that embedding-model memory must be treated as an infrastructure constraint.
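Preserving the original Banking77 label while attaching a coarser operational intent can be sketched as a plain lookup. The four Banking77 label names below are taken from the public dataset, but the operational intent names and the helper functions are hypothetical, since the nine intents used by the project are not listed here.

```python
# Partial, illustrative mapping. The right-hand intent names are invented.
LABEL_TO_INTENT: dict[str, str] = {
    "card_arrival": "card_delivery",
    "activate_my_card": "card_delivery",
    "lost_or_stolen_card": "card_security",
    "exchange_rate": "fees_and_rates",
}


def annotate(label: str) -> dict[str, str]:
    """Keep the fine-grained label and add the operational intent."""
    if label not in LABEL_TO_INTENT:
        # Fail loudly rather than silently bucketing unknown labels.
        raise KeyError(f"no operational intent mapped for label: {label}")
    return {"original_label": label, "operational_intent": LABEL_TO_INTENT[label]}


def unmapped(labels: list[str]) -> list[str]:
    """Return the labels that still lack a mapping, for a coverage check."""
    return sorted(set(labels) - set(LABEL_TO_INTENT))
```

Running a check like `unmapped(all_labels)` over the full label set is one way to confirm that every one of the 77 labels has an operational intent before evaluation.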

Future Improvements

Add a reranker and multilingual support.
Ingest help-center policy documents with stronger provenance.
Capture human feedback and response ratings for continuous evaluation.

Repository README, implementation, and deployment verification reviewed June 14, 2026.