Behind the Build: How We Created a Secure, AI-Powered Financial Document Assistant for State Bank
Introduction
In the world of finance, navigating complex documents quickly and securely is a constant challenge. For State Bank, privacy and data security are paramount—especially when leveraging the power of AI. Our team was tasked with building a solution that could answer questions about sensitive financial documents, all while ensuring data never left the organization’s secure environment.
The result? Pennywise: a local, AI-powered financial expert bot that combines cutting-edge language models, Retrieval Augmented Generation (RAG), and image understanding—all running privately, with no data sent to the cloud.
General Overview
Pennywise is a secure, on-premises assistant that allows users to:
- Upload financial documents (PDFs, images)
- Ask natural language questions about their content
- Receive clear, context-aware answers—even for information not present in the AI’s original training data
Key features include:
- Local LLM (Llama3) for privacy: All processing happens on-site, ensuring sensitive data never leaves the bank’s infrastructure.
- RAG (Retrieval Augmented Generation): Combines document search with AI reasoning for accurate, up-to-date answers.
- Image-to-Text Integration: Extracts and explains information from images within documents, ensuring no data is lost.
- Fast, user-friendly API: Built with FastAPI for seamless integration and rapid response times.
Tech Stack
- Frontend: (Not specified in code; API-first design allows for web, desktop, or internal tool integration)
- Backend: Python, FastAPI
- Database: ChromaDB (vector database for semantic search)
- Infrastructure/DevOps: Runs locally/on-premises; Uvicorn server with SSL support for secure deployment
- AI/ML Tools:
  - Llama3 (8B, 4-bit quantized, via Langchain)
  - Sentence Transformers (all-MiniLM-L6-v2 for embeddings)
  - Langchain, ChromaDB, OpenAI API (for hybrid or fallback scenarios)
  - PyPDF2, image-to-text modules for document and image parsing
How AI Powers the System
AI is at the heart of Pennywise, enabling it to understand, search, and explain complex financial documents:
- Local LLM (Llama3): The core language model, running entirely on the bank’s hardware, ensures privacy and compliance.
- RAG Pipeline: When a user asks a question, the system (sketched in code after this list):
  - Searches the uploaded documents using semantic embeddings (all-MiniLM-L6-v2)
  - Retrieves the most relevant passages
  - Feeds them, along with the user’s question, into the LLM for a tailored answer
- Image Understanding: If a document contains images (e.g., charts, scanned tables), the system uses GPT or similar models to generate text explanations, which are then indexed alongside the rest of the document. This ensures no information is lost, and users can query image content as easily as text.
- Integration: All AI components are orchestrated via Langchain, with ChromaDB providing fast, semantic search over both text and image-derived content.
- Challenges Solved:
  - Data Privacy: By running all models locally, no sensitive data is exposed to external APIs.
  - Image Data Loss: By extracting and saving image explanations, the system ensures complete document coverage.
  - Performance: Efficient vector search and quantized models keep response times low, even on large documents.
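To make the RAG pipeline concrete, here is a minimal sketch of the retrieve-then-generate step. It assumes Llama3 is served locally (for example through Ollama) behind LangChain's Ollama wrapper, and that document chunks have already been embedded with all-MiniLM-L6-v2 into a ChromaDB collection; the collection name, paths, prompt wording, and the answer_question function are illustrative, not the production Pennywise code.

```python
import chromadb
from sentence_transformers import SentenceTransformer
from langchain_community.llms import Ollama  # assumes Llama3 is exposed locally via Ollama

embedder = SentenceTransformer("all-MiniLM-L6-v2")               # same model used at indexing time
client = chromadb.PersistentClient(path="./pennywise_db")        # on-disk, on-prem vector store
collection = client.get_or_create_collection("financial_docs")   # hypothetical collection name
llm = Ollama(model="llama3")                                     # stands in for the 8B, 4-bit quantized build

def answer_question(question: str, k: int = 4) -> str:
    # 1. Embed the question with the same embedding model used for the document chunks
    query_embedding = embedder.encode([question]).tolist()

    # 2. Semantic search: fetch the k most similar chunks (text and image explanations)
    results = collection.query(query_embeddings=query_embedding, n_results=k)
    context = "\n\n".join(results["documents"][0])

    # 3. Feed the retrieved context plus the question into the local LLM
    prompt = (
        "You are a financial document expert. Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm.invoke(prompt)
```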
Technical Breakdown
- Database Structure: ChromaDB stores high-dimensional vector embeddings for all document chunks (text and image explanations), enabling fast similarity search.
- Backend Architecture (see the code sketch after this list):
  - FastAPI app with modular routers for PDF processing and query handling
  - Middleware for CORS and logging
  - SSL support for secure deployment
- Key API Routes:
  - /process_pdf/: Accepts PDF uploads, extracts text (and image explanations), splits into chunks, embeds, and stores in ChromaDB
  - /process_query/: Accepts user questions, retrieves relevant document chunks, and generates answers via the LLM
- Engineering Decisions:
  - Local-first AI: Chosen for privacy and compliance
  - RAG over pure LLM: Ensures up-to-date, document-specific answers
  - Image-to-Text: Prevents data loss from non-textual content
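Put together, the backend shape looks roughly like the skeleton below: a FastAPI app with CORS and logging middleware, modular routers for the two routes, and an SSL-terminated Uvicorn server. Route signatures, request fields, certificate paths, and router names are our assumptions for illustration rather than the actual Pennywise source.

```python
# app.py -- illustrative skeleton only; the route bodies are filled in elsewhere in this post
import logging

import uvicorn
from fastapi import APIRouter, FastAPI, File, Request, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

logger = logging.getLogger("pennywise")
app = FastAPI(title="Pennywise")

# CORS middleware so internal web/desktop clients can call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # tighten to trusted origins in production
    allow_methods=["*"],
    allow_headers=["*"],
)

# Simple request-logging middleware
@app.middleware("http")
async def log_requests(request: Request, call_next):
    response = await call_next(request)
    logger.info("%s %s -> %s", request.method, request.url.path, response.status_code)
    return response

pdf_router = APIRouter()
query_router = APIRouter()

class Query(BaseModel):
    question: str  # assumed request shape

@pdf_router.post("/process_pdf/")
async def process_pdf(file: UploadFile = File(...)):
    # Extract text and image explanations, chunk, embed, and store in ChromaDB
    # (see the ingestion sketch in the user journey section)
    return {"status": "indexed", "filename": file.filename}

@query_router.post("/process_query/")
async def process_query(query: Query):
    # Retrieve relevant chunks and generate an answer with the local LLM (see the RAG sketch above)
    return {"answer": "..."}

app.include_router(pdf_router)
app.include_router(query_router)

if __name__ == "__main__":
    # SSL-terminated Uvicorn for secure on-prem deployment (certificate paths are placeholders)
    uvicorn.run(app, host="0.0.0.0", port=8443,
                ssl_keyfile="key.pem", ssl_certfile="cert.pem")
```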
User Journey Walkthrough
- User uploads a financial document (PDF, possibly containing images) via the API or UI.
- Backend processes the document (ingestion sketched after this walkthrough):
  - Extracts all text using PyPDF2
  - Detects images, generates text explanations using GPT or similar
  - Splits content into manageable chunks
  - Embeds each chunk using all-MiniLM-L6-v2
  - Stores embeddings in ChromaDB for fast retrieval
- User asks a question about the document (e.g., “What is the total revenue in Q2?”)
- System retrieves relevant chunks from ChromaDB using semantic similarity
- RAG pipeline combines the retrieved context with the user’s question
- Local LLM (Llama3) generates an answer, using both the question and the retrieved document content
- User receives a clear, context-aware response—with privacy and security guaranteed
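The document-processing step of that journey (extract, chunk, embed, store) can be sketched roughly as below, assuming PyPDF2's PdfReader, a LangChain text splitter, and the same embedding model and ChromaDB collection used in the RAG sketch earlier. The image-explanation step is left as a placeholder comment, since the post describes it only at the level of "GPT or similar"; the chunk sizes and the ingest_pdf name are illustrative.

```python
import uuid

import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./pennywise_db")
collection = client.get_or_create_collection("financial_docs")  # hypothetical collection name

def ingest_pdf(path: str) -> int:
    # Extract all text from the PDF
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # Placeholder: image explanations would be generated here and appended to `text`
    # so that chart and table content becomes searchable alongside the body text.

    # Split the content into manageable, overlapping chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(text)

    # Embed each chunk and store it in ChromaDB for fast semantic retrieval
    embeddings = embedder.encode(chunks).tolist()
    collection.add(
        ids=[str(uuid.uuid4()) for _ in chunks],
        documents=chunks,
        embeddings=embeddings,
    )
    return len(chunks)
```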
Text-Based Flowchart / Architecture
User
↓
Uploads Document (PDF/Image) → [API: /process_pdf/]
↓
Backend
→ Extracts text from PDF
→ Detects images, generates text explanations
→ Splits content into chunks
→ Embeds chunks (text + image explanations)
→ Stores embeddings in ChromaDB
↓
User
↓
Asks Question → [API: /process_query/]
↓
Backend
→ Retrieves relevant chunks from ChromaDB (semantic search)
→ Combines context with user question (RAG)
→ Passes to Local LLM (Llama3)
→ Generates answer
↓
Returns Answer to User
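From a client's point of view, the whole flow above boils down to two HTTPS calls. Here is a hypothetical example using the requests library; the host, port, certificate bundle, and response fields are assumptions that simply mirror the sketches earlier in this post.

```python
import requests

BASE = "https://pennywise.internal:8443"  # hypothetical on-prem host and port

# 1. Upload a financial document for indexing
with open("q2_report.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/process_pdf/", files={"file": f}, verify="ca.pem")
print(resp.json())  # e.g. {"status": "indexed", "filename": "q2_report.pdf"}

# 2. Ask a question about it
resp = requests.post(
    f"{BASE}/process_query/",
    json={"question": "What is the total revenue in Q2?"},
    verify="ca.pem",  # internal CA bundle for the bank's certificate
)
print(resp.json()["answer"])
```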
Conclusion
Pennywise demonstrates how advanced AI can be harnessed for sensitive, real-world applications—without sacrificing privacy or security. By combining local LLMs, RAG, and image-to-text integration, we delivered a solution that’s both powerful and compliant with the strictest data protection requirements.
This project highlights our team’s ability to:
- Build secure, AI-driven systems for regulated industries
- Integrate state-of-the-art language and vision models
- Solve real business problems with creative engineering
Future plans include expanding to more document types, adding richer analytics, and further optimizing performance for even larger datasets. If you’re looking to bring secure, AI-powered insights to your organization, let’s connect!