
Building AI-IKB: How We Turned a Client’s Siloed Documentation Into a Dual-Intelligence AI System

Abhiram Thejas
  • 15 min read

A deep-dive into architecting a self-hosted, multi-tier RAG system using Open WebUI, LangGraph, Ollama, and pgvector - with one AI for the world, and one for the team.

The Problem We Were Hired to Solve 

When a client came to us, they had two pain points that felt disconnected - but were actually two sides of the same coin. 

Pain Point #1 - The Customer Support Bottleneck:

Their website had no intelligent way for visitors to find answers. Every question about services, pricing, or support went straight to an already-overwhelmed support team. Response times were slow. Conversions were dropping. 

Pain Point #2 - The Internal Knowledge Silo: 

Their engineering and operations team was drowning in 400+ Confluence pages, outdated runbooks, and “tribal knowledge” that lived only in the heads of senior engineers. Onboarding a new developer took weeks. Answering “what version is the auth service running in staging?” required a Slack message, a wait, and a prayer.

Both problems had the same root cause: their knowledge wasn’t working for them. It was locked in static files no one could query intelligently. 

Our solution: 

AI-IKB (Autonomous Infrastructure Knowledge Base) - a production-grade RAG system that serves two distinct audiences from one unified, self-hosted AI core.

The Vision: One Platform, Two Intelligent Surfaces 

The core architectural decision was this: we would not build one monolithic RAG system. Instead, we designed a platform with two completely isolated intelligence layers: 

| Layer | Audience | Data Sources | Use Case |
| --- | --- | --- | --- |
| Public AI | Website Visitors / Customers | Service docs, FAQs, pricing pages | Answer questions, qualify leads, reduce support load |
| Internal AI | Engineers & Operations Team | Confluence, Git repos, K8s manifests, architecture diagrams | Infrastructure queries, codebase awareness, incident response |


This is not a UX decision - it’s a security architecture decision. Customers should never be able to query internal infrastructure data. Engineers should never be limited to only public-facing knowledge. The two worlds must be enforced at the data layer, not just the UI layer. 

 The Tech Stack 

To guarantee data sovereignty for our client, we deployed everything on their own infrastructure - no data ever leaves their network. 
 

| Component | Technology | Role |
| --- | --- | --- |
| LLM Engine | Ollama + Gemma 4 | Local inference, self-hosted |
| Knowledge Hub | Open WebUI | API generation, knowledge collections, RBAC |
| Orchestration | LangGraph | Stateful, agentic tool routing |
| Vector & Relational DB | PostgreSQL + pgvector | Semantic search, state persistence |
| Embeddings | nomic-embed-text via Ollama | Document vectorization |
| Ingestion | Docling | Parsing docs, code, diagrams |
| Caching | Redis | Low-latency repeated query response |
| Deployment | Kubernetes (K8s) + Kustomize | Scalable, resilient container orchestration |

No GPU server? No problem. The system is provider-agnostic. The Ollama backend can be swapped for Google Gemini or OpenRouter APIs with a single environment variable change—giving teams full flexibility between self-hosted and cloud-based inference depending on budget, latency requirements, or privacy constraints. 
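That provider switch can be sketched as a small lookup keyed on an environment variable. This is an illustrative sketch only: the variable name `LLM_PROVIDER`, the key-variable names, and the base URLs are assumptions, not the client's actual configuration.

```python
import os

# Hypothetical provider table. Ollama needs no API key; the hosted providers
# read theirs from an environment variable. All values here are illustrative.
PROVIDERS = {
    "ollama":     {"base_url": "http://ollama.local:11434/v1", "key_env": None},
    "gemini":     {"base_url": "https://generativelanguage.googleapis.com", "key_env": "GEMINI_API_KEY"},
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
}

def resolve_backend(env=None):
    """Pick the inference backend from LLM_PROVIDER (defaults to self-hosted Ollama)."""
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "ollama")
    if name not in PROVIDERS:
        raise ValueError(f"unknown LLM_PROVIDER: {name!r}")
    cfg = PROVIDERS[name]
    api_key = env.get(cfg["key_env"]) if cfg["key_env"] else "not-needed"
    return {"name": name, "base_url": cfg["base_url"], "api_key": api_key}
```

Because only this lookup changes, the rest of the pipeline never needs to know which backend is serving completions.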


How It’s Implemented: A Technical Deep Dive 

Step 1 - Separate Knowledge Collections in Open WebUI  

The foundation of the entire system is data separation, enforced at the source. 

Using Open WebUI’s Knowledge Collections, we created two isolated silos: 

  • “Client Public” Collection: Ingested and indexed from the client’s website content, product documentation, FAQ pages, and service brochures. 
     
  • “Infrastructure & DevOps” Collection: Ingested from Confluence pages, private Git repositories, Kubernetes manifests, YAML configuration files, and architecture diagrams. 

Each collection is completely independent. There are no shared vector indexes, no shared tables, and no shared query paths between them.
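Ingestion-time separation can be enforced with a routing table that assigns every document source to exactly one collection and fails closed on anything unclassified. The routing-table shape below is an assumption about how such a pipeline could be wired; the collection names mirror the article.

```python
# Illustrative source-to-collection routing for the ingestion pipeline.
COLLECTION_FOR_SOURCE = {
    "website":    "Client Public",
    "faq":        "Client Public",
    "brochure":   "Client Public",
    "confluence": "Infrastructure & DevOps",
    "git":        "Infrastructure & DevOps",
    "k8s":        "Infrastructure & DevOps",
}

def route_document(doc: dict) -> str:
    """Return the single collection a document may be ingested into."""
    source = doc.get("source")
    try:
        return COLLECTION_FOR_SOURCE[source]
    except KeyError:
        # Fail closed: an unclassified document is never silently made public.
        raise ValueError(f"refusing to ingest document from unknown source: {source!r}")
```

Failing closed matters here: a misconfigured connector should halt ingestion, not leak an internal runbook into the public collection.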

Step 2 - Scoped API Keys: RBAC at the Infrastructure Level 

This is the most important security layer. Data separation is achieved through standard Role-Based Access Control (RBAC) - a User or Service Account is granted exclusive access to a specific Knowledge workspace, and their associated API key inherits these strict boundaries. 

  • Public API Key → Tied to a Service Account with access only to the “Client Public” Collection. This key cannot query the internal database, cannot trigger any tool execution, and cannot see any configuration data. Even if this key is leaked or the frontend is compromised, an attacker cannot reach internal infrastructure information. 
     
  • Internal API Key → Tied to an Internal Service Account with access exclusively to the “Infrastructure & DevOps” Collection. This key currently drives a strictly retrieval-only pipeline, but its identity is provisioned so it can securely accommodate the planned Phase 2 LangGraph tool nodes. 

This is RBAC at the API layer - not just in the UI, not just in the application logic, but baked into the identity of each API key itself. 
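Conceptually, the scoping reduces to a check like the one below. The key strings and the shape of the scope table are hypothetical; in the real deployment these boundaries live inside Open WebUI's account model rather than application code.

```python
# Minimal sketch of key-scoped collection access.
KEY_SCOPES = {
    "pub-xxxx": {"collections": {"Client Public"}, "tools_allowed": False},
    "int-yyyy": {"collections": {"Infrastructure & DevOps"}, "tools_allowed": False},
}

def authorize(api_key: str, collection: str) -> bool:
    """True only if this key's service account is scoped to the collection."""
    scope = KEY_SCOPES.get(api_key)
    return scope is not None and collection in scope["collections"]
```

The important property is that a leaked public key simply has no path to the internal collection: the denial happens at identity resolution, before any query runs.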

Step 3 - The Public Website Chat: How It Works End-to-End 

Website Visitor Types Question 
         ↓ 
Client's Website (JS widget) → HTTPS POST → FastAPI Proxy 
         ↓ 
FastAPI checks Redis cache: 
  → Cache HIT?  → Return cached response in <10ms 
  → Cache MISS? → Forward to Open WebUI (Public API Key) 
         ↓ 
Open WebUI triggers Hybrid Search on "Client Public" Collection 
  → BM25 keyword match (exact terms) 
  → nomic-embed-text semantic search (meaning-based) 
  → Top 4 most relevant chunks retrieved 
         ↓ 
Gemma 4 generates a response (temp=0.2 for factual consistency) 
         ↓ 
Response returned to visitor in <2 seconds 
Result cached in Redis for future identical queries 
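The Redis cache-aside step in the flow above can be sketched in a few lines. Here a plain dict stands in for Redis, and the normalization scheme (lowercase, trimmed, SHA-256) is an assumption rather than the deployed one.

```python
import hashlib

def cache_key(question: str) -> str:
    """Normalize the query so identical questions share one cache entry."""
    return "ai-ikb:" + hashlib.sha256(question.strip().lower().encode()).hexdigest()

def answer(question: str, cache: dict, rag_backend):
    """Cache-aside lookup: return (response, was_cache_hit)."""
    key = cache_key(question)
    if key in cache:                      # Redis GET in production
        return cache[key], True
    response = rag_backend(question)      # forward to Open WebUI (public API key)
    cache[key] = response                 # Redis SETEX with a TTL in production
    return response, False
```

With normalization, “What is your pricing?” and “  what is your PRICING? ” resolve to the same entry, which is what makes the sub-10ms cache-hit path pay off for high-frequency questions.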

Why this matters to the client: Their support team now handles a fraction of routine queries. Customers get instant, accurate answers 24/7. The AI never goes off-brand because the retrieval is restricted to approved, curated content. 

Key tuning decisions: 

  • Temperature 0.2: Ensures the model gives factual, consistent answers rather than creative but potentially wrong ones. 
  • Top-4 Context Chunks: Balances retrieval quality with speed. More context = slower response. 
  • Redis Token Bucket Rate Limiting: Protects the backend from abuse or scraping attempts. 
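The Redis token-bucket limiter mentioned above works like this in-process sketch. In production the bucket state would live in Redis, keyed per client IP, so all proxy replicas share one view; the capacity and refill numbers here are placeholders, not the deployed values.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: each request spends one token; tokens refill over time."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, capped at bucket capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The bucket lets legitimate users burst up to `capacity` requests while holding scrapers to the steady refill rate.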

Step 4 - The Internal Employee Portal: How It Works End-to-End 

The internal portal is, at its core, a knowledge retrieval engine. Employees interact with a secure, SSO-authenticated Open WebUI instance that is wired exclusively to the “Infrastructure & DevOps” Knowledge Collection. It does not execute live commands or connect to external systems in its current implementation — that is intentional. 

Here is the current request flow: 

Employee Asks: "What version of the auth service is running in staging?" 
         ↓ 
Open WebUI Internal UI (SSO authenticated) → Internal API Key 
         ↓ 
Hybrid Search on "Infrastructure & DevOps" Collection 
  → BM25 keyword match + nomic-embed-text semantic search 
  → Top relevant chunks retrieved from indexed runbooks, 
    Confluence docs, YAML files, and architecture notes 
         ↓ 
Gemma 4 synthesizes a natural language response 
  → Cites the relevant document/section 
  → Provides version, architecture context, or config details 
         ↓ 
Employee receives a grounded, sourced answer in seconds 
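The article does not specify how the BM25 and semantic rankings are merged. Reciprocal rank fusion (RRF) is one common choice for exactly this hybrid setup; the sketch below uses the conventional k=60 constant, which is an assumption, not a value from the deployment.

```python
def rrf_fuse(bm25_ranked, semantic_ranked, k: int = 60, top_n: int = 4):
    """Merge two ranked lists of chunk IDs via reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document; documents that rank
    well in both keyword and semantic search float to the top.
    """
    scores = {}
    for ranking in (bm25_ranked, semantic_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A chunk that appears near the top of both lists beats a chunk that dominates only one, which is the behavior you want when exact terms (version numbers, service names) and meaning both matter.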

Why this matters: A junior engineer can now get an answer in 10 seconds that previously required finding the right senior engineer, waiting for a response, and hoping the runbook was up to date. The AI becomes the institutional memory that never sleeps and never forgets. 

Codebase Awareness: We used Docling to recursively parse the client’s private Git repositories. The resulting embeddings allow the LLM to explain not just what the code does, but why it was architected that way — making it an invaluable onboarding and incident response tool. 

Safety Guardrails: What the AI Cannot Do 

A common concern when deploying internal AI assistants is unintended access. We addressed this by design, not as an afterthought: 

No Command Execution: The current system is strictly a retrieval and generation pipeline. It cannot run shell commands, kubectl commands, or any system-level scripts. There is no tool-call executor attached to the current deployment. 

No Secrets Retrieval via Chat: The AI does not have access to Kubernetes Secrets, .env files, or credential stores. These are explicitly excluded from the ingestion pipeline’s source scope. 

No Destructive Operations: Prompts asking the AI to perform actions like deletions (rm, kubectl delete, database drops) are handled gracefully — the model is instructed to decline and redirect to the appropriate human owner. 

Scoped Knowledge Only: The internal API key inherits strict RBAC boundaries restricting it to the “Infrastructure & DevOps” collection. It cannot query, infer from, or leak data from any other collection or data source. 
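A deny-list check mirroring the “No Destructive Operations” guardrail might look like the sketch below. The exact patterns and refusal wording are illustrative assumptions, not the deployed rules (which are enforced via system-prompt instruction as described above).

```python
import re

# Hypothetical patterns for destructive-operation requests.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf?\b",
    r"\bkubectl\s+delete\b",
    r"\bdrop\s+(table|database)\b",
]

def guard(prompt: str):
    """Return a refusal message if the prompt requests a destructive action, else None."""
    lowered = prompt.lower()
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, lowered):
            return ("I can't perform or describe destructive operations. "
                    "Please contact the appropriate service owner for this change.")
    return None
```

Layering a deterministic check like this in front of the model means the refusal does not depend on the LLM following its instructions on every sample.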

Planned: Agentic Tool Execution (Phase 2) 
In the next phase, we plan to introduce controlled LangGraph Tool Nodes that can execute a pre-approved, audited library of read-only Python scripts - for example, querying pod status from the Kubernetes API or fetching the latest Git commit for a service. Every tool will be whitelisted, sandboxed, and logged. This is a deliberate, phased approach to trust-building before granting the AI any operational access. 
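The whitelist-plus-audit pattern planned for Phase 2 can be sketched as a registry that only executes pre-registered, read-only functions. Tool names and the stub implementation here are hypothetical; in the real design each entry would wrap an approved, audited script.

```python
# Registry of approved read-only tools. Anything not registered cannot run.
READ_ONLY_TOOLS = {}

def register_tool(name):
    def wrap(fn):
        READ_ONLY_TOOLS[name] = fn
        return fn
    return wrap

@register_tool("get_pod_status")
def get_pod_status(namespace: str) -> str:
    # Would call a Kubernetes API read endpoint in the real Phase 2 tool node.
    return f"(stub) pods in {namespace}: all Running"

def run_tool(name: str, **kwargs) -> str:
    if name not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool {name!r} is not on the approved whitelist")
    print(f"AUDIT: {name} called with {kwargs}")  # stand-in for structured audit logging
    return READ_ONLY_TOOLS[name](**kwargs)
```

The dispatcher refuses by default, so granting the agent a new capability is an explicit, reviewable code change rather than a prompt tweak.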

Step 5 - Vector Database: Optimized for Speed and Accuracy 

  • HNSW Indexing: We used Hierarchical Navigable Small World (HNSW) indexes instead of the default IVFFlat. With cosine distance as the similarity metric, HNSW achieves better recall at lower latency, keeping vector similarity searches under 50ms even across millions of document chunks. 
     
  • Semantic Chunking via Docling: Standard RAG systems split documents every N characters. We implemented semantic-aware chunking that respects Markdown headers, code block boundaries, and table structures. This means the LLM always receives complete, logically intact context - not a sentence that got cut off halfway through a YAML key. 
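The boundary rules can be illustrated with a simplified standalone chunker: split at Markdown headers, but never inside a fenced code block. The real pipeline uses Docling, so this is an analogue of the idea, not the production code.

```python
def semantic_chunks(markdown: str):
    """Split a Markdown document at headers, keeping fenced code blocks intact."""
    chunks, current, in_code = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_code = not in_code          # track fenced-code state
        if line.startswith("#") and not in_code and current:
            chunks.append("\n".join(current))   # close the chunk at a header
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A `#` inside a code fence (a YAML comment, say) no longer triggers a split, which is exactly the failure mode fixed-size chunking suffers from.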

Step 6 - LLM Infrastructure: Purpose-Configured for Each Surface 

Both interfaces run on Ollama serving Gemma 4, but with configurations tuned per audience: 

| Config | Public Interface | Internal Interface |
| --- | --- | --- |
| Context Window | 4k tokens | 8k tokens |
| Temperature | 0.2 (factual) | 0.4 (analytical) |
| Scaling | Horizontal (HPA) | Reserved resources |

We enforced Kubernetes Tolerations and Node Affinity so public and internal workloads run on separate node pools. High-traffic public queries cannot starve the internal agent of GPU compute during peak hours. 
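An illustrative Kustomize-style Deployment patch for that node separation is shown below. The label key `pool`, its values, and the Deployment name are assumptions; the pattern (required node affinity plus a matching toleration) is the standard Kubernetes mechanism for pinning workloads to a dedicated node pool.

```yaml
# Hypothetical patch: pin the public inference Deployment to its own node pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-public
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: pool
                    operator: In
                    values: ["public-inference"]
      tolerations:
        - key: pool
          operator: Equal
          value: public-inference
          effect: NoSchedule
```

A mirror-image patch on the internal Deployment (tainted internal pool, matching toleration) completes the isolation.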

The Impact: Before vs. After 

| Metric | Before AI-IKB | After AI-IKB |
| --- | --- | --- |
| Customer support tickets (routine) | High volume | Significantly reduced |
| Time to answer “what’s running in staging?” | 15–30 min (Slack + human) | <15 seconds (AI) |
| Developer onboarding time | 2–3 weeks | Accelerated via codebase Q&A |
| Infrastructure change audit trail | Inconsistent | 100% Git-tracked |
| Knowledge accessibility | Siloed in Confluence | Instantly queryable |

Lessons Learned

Building AI-IKB taught us three things that no AI tutorial will tell you: 

  1. RAG is 20% LLM, 80% data governance. The quality of the answer is determined entirely by the quality, structure, and segmentation of your ingested data. The model is the last mile. 
     
  2. Security architecture must be enforced at the data layer. Hiding sensitive data behind a UI is not security. When you enforce isolation at the vector database and API key level, there is no application bug that can cause a leak. 
     
  3. The “Human-in-the-Loop” is not a limitation — it’s a feature. The most productive AI systems aren’t autonomous; they’re AI-assisted. The agent proposes; the human decides. This builds trust, and trust is what makes adoption succeed. 

Is Your Organization Sitting on Untapped Knowledge?

If your team recognizes any of these symptoms: 

  • Support teams drowning in repetitive, answerable questions 
  • Engineers spending hours searching Confluence for a YAML value 
  • New hires taking weeks to understand your codebase 
  • Infrastructure changes with no reliable audit trail 

…then you’re carrying a hidden operational cost that AI-IKB was purpose-built to eliminate. 

We build these systems end-to-end — from knowledge ingestion and vector architecture to custom frontends and GitOps guardrails. Whether you need a self-hosted, fully private deployment or a cloud-connected hybrid using Gemini or OpenRouter, we architect for your constraints. 

Let’s talk about what we can build for your team. → Reach out to SupportSages 
