10x more users, half the wait!

A case study on how SupportSages cut the response time of a Langflow platform from 40s to 15s for an enterprise RAG chatbot provider


The client's RAG chatbot was hitting scale limits. We revamped its GCP infrastructure, making it faster, secure, and ready to handle 500+ users seamlessly.

About the Client

Our client had already built an advanced Retrieval-Augmented Generation (RAG) chatbot that plugs seamlessly into multiple websites and internal applications. Although the stack ran on Google Cloud Platform, the team needed, and their customers demanded, enterprise-grade security and the capacity to serve 500+ concurrent users without performance drops.

The Challenges

The client struggled with performance slowdowns and security gaps as their RAG chatbot scaled across multiple platforms. Their existing GCP setup couldn't efficiently handle 500+ concurrent users, leading to latency and reliability issues.

VM-based Architecture

Frequent downtime, no automated scaling, manual maintenance overhead.

High Memory Usage

Langflow runtime steadily consumed over 2 GB of memory, eventually leading to system crashes.

PostgreSQL Deadlocks

100% of requests started failing under load testing conditions.

Slow Response Times

Chat responses took 30-40 seconds under typical concurrency.

Low Concurrent Capacity

The system could reliably handle only ~50 concurrent users.

Public Exposure Risk

Endpoints were publicly accessible with no unified access control.

We set some ambitious goals!


Seamlessly handle 500+ concurrent users without performance dips.


Eliminate downtime and memory bottlenecks tied to VM-based setups.


Resolve critical PostgreSQL deadlocks during heavy concurrent flows.


Reduce the chat response latency for a smoother user experience.


Strengthen security with private-only access to infrastructure.


Keep costs predictable even as usage scaled month over month.


Auto-scale infrastructure to absorb traffic spikes effortlessly.


SupportSages’ Solution

We transformed the deployment from a monolithic VM-based architecture into a fully containerised, autoscaling, and secured GKE infrastructure.

Kubernetes Migration

We deployed the Langflow UI and runtime separately on a GKE Standard cluster with HPA and VPA for optimal scaling, using custom values.yaml files for each environment: Dev, UAT, and Prod.
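As an illustrative sketch of this split deployment (the chart paths, release names, namespace, and autoscaling thresholds below are hypothetical, not the client's actual configuration):

```shell
# Install the Langflow UI and runtime as separate Helm releases, each
# driven by a per-environment values file (names are placeholders).
helm upgrade --install langflow-ui ./charts/langflow-ui \
  -n langflow -f values-prod-ui.yaml
helm upgrade --install langflow-rt ./charts/langflow-runtime \
  -n langflow -f values-prod-runtime.yaml

# Horizontal Pod Autoscaler: scale the runtime on CPU utilisation.
kubectl autoscale deployment langflow-rt -n langflow \
  --min=2 --max=10 --cpu-percent=70
```

Separating UI and runtime lets each tier scale independently, so a burst of chat traffic grows the runtime pods without over-provisioning the UI.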


Secure Network Setup

We secured access via an Internal Load Balancer and Pritunl VPN on a hardened Ubuntu VM, using UFW and Cloud NAT for safe outbound connectivity without public IPs.
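A minimal sketch of the outbound-only networking and VM hardening described above (network, router, region, and port values are placeholders; the Pritunl port depends on its configuration):

```shell
# Cloud NAT gives nodes without public IPs safe outbound connectivity.
gcloud compute routers create langflow-router \
  --network=langflow-vpc --region=us-central1
gcloud compute routers nats create langflow-nat \
  --router=langflow-router --region=us-central1 \
  --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges

# Harden the VPN VM with UFW: deny all inbound except SSH and the VPN port.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp        # SSH (ideally restricted by source IP)
sudo ufw allow 1194/udp      # Pritunl/OpenVPN (port varies by setup)
sudo ufw enable
```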


PostgreSQL Stabilization via Cloud SQL

We migrated the database to Cloud SQL for PostgreSQL, optimized performance by tuning connections and indexes, and worked with the Langflow core team to refactor queries causing locks.
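Diagnosing which sessions block which is the first step in this kind of deadlock work. A sketch using PostgreSQL's built-in catalog views (connection parameters are placeholders; run from a VM with private-IP access to the Cloud SQL instance):

```shell
# List blocked sessions alongside the sessions holding the locks,
# using pg_blocking_pids() and pg_stat_activity.
psql "host=10.0.0.5 dbname=langflow user=postgres" -c "
  SELECT blocked.pid   AS blocked_pid,
         blocking.pid  AS blocking_pid,
         blocked.query AS blocked_query
  FROM pg_stat_activity blocked
  JOIN pg_stat_activity blocking
    ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
  WHERE blocked.wait_event_type = 'Lock';"
```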


CI/CD and Observability

We used Helm rolling updates for zero-downtime upgrades, added alerts for DB and autoscaler limits, and set up Cloud Monitoring to track pod performance, DB locks, and chat latency.
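The zero-downtime upgrade path can be sketched as follows (release and chart names are the same hypothetical placeholders as above):

```shell
# Helm performs a rolling update; --atomic rolls back automatically
# if the new pods never become ready within the timeout.
helm upgrade langflow-rt ./charts/langflow-runtime -n langflow \
  -f values-prod-runtime.yaml --atomic --timeout 10m

# Block until the rollout completes before running post-deploy checks.
kubectl rollout status deployment/langflow-rt -n langflow
```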


Memory-Aware Pod Configuration

We set strict memory requests and limits in deployment YAMLs, enabling Kubernetes to auto-evict and reschedule pods under memory pressure, which improved runtime stability by minimizing GC pauses.
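An equivalent imperative sketch of the memory requests and limits set in the deployment YAMLs (the resource values here are illustrative, not the client's actual sizing):

```shell
# Memory requests/limits let the kubelet evict and reschedule the pod
# under memory pressure instead of letting the node run out of RAM.
kubectl set resources deployment langflow-rt -n langflow \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=1,memory=2Gi
```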


High-Level Architecture Diagram

There’s more to our effort. Want to explore our client’s full story?

And, our solution resulted in...

Reliable Support for Concurrent Users

The platform now sustains 500+ concurrent users, up from just ~50 earlier.

Failure rate reduced to zero

PostgreSQL deadlocks were eliminated, reducing failures from 100% under load to 0% after optimization.

Deployment downtime was reduced to zero, compared to 2-3 minutes for every upgrade earlier.

Improved chat response time!

Chat response times improved by over 50%, dropping from 30 - 40 seconds to ~15 seconds.

Resolved out-of-memory crashes

Frequent out-of-memory crashes were resolved with autoscaling and automatic recovery mechanisms.

Network and infrastructure costs were cut by roughly 40%, making operations more efficient.


And, this impressive business impact!


Massive latency improvement

Average chat latency was halved under peak load.

Security-first Design

All traffic routed through VPN and Internal Load Balancer with zero public exposure.

More reliable and resilient backend

Langflow now operates without incident during internal demos, live tests, and at production scale.


Optimized Cost Structure

Automated horizontal scaling led to significant savings compared to always-on VMs.


Explore More...

Download the Case Study