11/17/2025

Building a Context-Aware AI System That Learns, Reuses, Retrieves, Decides, and Validates — for Automated Invoice Processing

 






Automating invoice processing sounds simple — until you try to do it securely, reliably, and at scale.

In real business environments, invoices vary widely. A modern system needs to:

  • Parse PDFs and image-only scans

  • Detect vendor layout automatically

  • Learn new invoice templates on the fly

  • Validate extracted fields using a vision model

  • Govern template promotion securely

  • Run entirely offline in an air-gapped environment

This guide walks you through a production-grade architecture that achieves all of this using:

  • LangGraph → context engineering & workflow automation

  • Milvus → layout-similarity retrieval (RAG)

  • Redis → template caching & success/failure metrics

  • Cerbos → policy enforcement for template promotion

  • Ollama → local LLM & vision self-evaluation

  • Custom Invoice Toolkit → extraction, OCR, signature hashing


What Is Context Engineering?

In modern LLM systems, context engineering means:

Designing the flow, structure, and evolution of all the information that guides an LLM’s behavior — before, during, and after inference.

It’s not prompt engineering.
It’s not model fine-tuning.
It’s the systematic orchestration of:

  • state

  • memory

  • retrieval

  • validation

  • control flow

  • policy

  • learned templates

  • environment variables

  • role-based decisions

  • vision inputs

  • multi-step reasoning

It’s workflow-level intelligence management.

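The invoice pipeline in this post touches each of these: state, retrieval, validation, policy, and learned templates all appear as explicit steps.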

Let’s break down how this AirGap AI system works end-to-end.


Part 1 — The Key Components

This project is structured around several cooperating services, each with a clear responsibility.


Docker Compose — The Orchestration Layer

 docker-compose.yml launches:

  • Redis — caching + metrics

  • Milvus — vector DB powering RAG

  • Ollama — local LLM + vision model

  • Cerbos — policy decision point

  • Supporting infrastructure

All of this runs inside an air-gapped network — no external dependencies or cloud calls.


Redis — Fast Memory Layer for Templates & Metrics

Redis serves two essential functions:

Template Cache

Stores:

  • active templates

  • staging templates

  • template promotion status

template_cache.py


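import json

def get_template(redis, signature):
    # Key names are simplified for readability; the Redis session later in
    # this post shows the keys the repository actually writes.
    # Prefer the promoted (active) template.
    tpl = redis.get(f"template:{signature}:active")
    if tpl:
        return json.loads(tpl)

    # Otherwise fall back to a staging template that is still being evaluated.
    tpl = redis.get(f"template:{signature}:staging")
    if tpl:
        return json.loads(tpl)

    return None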
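Moving a template from staging to active is the other half of the cache. A minimal sketch, assuming the same simplified key names as get_template and any client that exposes get, set and delete (redis-py's client does). The names promote_template and DictClient are hypothetical and exist only for this illustration:

```python
import json


class DictClient:
    """In-memory stand-in for a Redis client, for illustration only."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

    def delete(self, key):
        self.store.pop(key, None)


def promote_template(client, signature):
    """Move a staging template to the active slot; return True if one was moved."""
    staging_key = f"template:{signature}:staging"
    tpl = client.get(staging_key)
    if tpl is None:
        return False  # nothing staged for this signature
    client.set(f"template:{signature}:active", tpl)
    client.delete(staging_key)
    return True


client = DictClient()
client.set("template:abc:staging", json.dumps({"total": r"Total\s+(\d+\.\d{2})"}))
promoted = promote_template(client, "abc")
```

Against a real Redis, the two writes would normally be wrapped in a transaction (or replaced by RENAME) so the move is atomic.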


Metrics tracking

Tracks:

  • success counts

  • failures

  • vision_pass / vision_fail

  • template usage counters

def increment_success(redis, signature):

    redis.incr(f"metrics:{signature}:success_count")




This enables learning, promotion, and governance logic.
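The Redis session near the end of this post shows the metrics stored as one hash per signature, with fields such as vision_failures and promotions. A sketch of that shape, assuming a client with hincrby and hset (both exist in redis-py). The names record_metric and FakeHashClient are hypothetical:

```python
import time


class FakeHashClient:
    """In-memory stand-in exposing the two hash commands used below."""

    def __init__(self):
        self.hashes = {}

    def hincrby(self, key, field, amount=1):
        bucket = self.hashes.setdefault(key, {})
        bucket[field] = int(bucket.get(field, 0)) + amount
        return bucket[field]

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value


def record_metric(client, signature, field):
    """Increment one counter for a signature and stamp the update time."""
    key = f"invoice_metrics:{signature}"
    count = client.hincrby(key, field, 1)
    client.hset(key, "updated_at", int(time.time()))
    return count


client = FakeHashClient()
record_metric(client, "acme_72246d14", "vision_failures")
count = record_metric(client, "acme_72246d14", "vision_failures")
```

Keeping all counters for a signature in one hash means a single HGETALL returns everything the promotion logic needs.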


Milvus — Vector Search for Layout Retrieval

Every invoice gets a signature hash derived from its textual structure.
This is embedded into a vector and stored in Milvus.

Milvus enables:

  • layout similarity

  • fast suggestions

  • template reuse

This is Retrieval-Augmented Processing (RAP) for document pipelines.

node_milvus_suggest (from nodes.py)

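# Assumes `milvus` is a pymilvus MilvusClient and an L2-style metric,
# where a smaller distance means a closer layout.
results = milvus.search(collection_name="signatures", data=[state["embedding"]], limit=1)
hits = results[0]  # hits for the first (and only) query vector
if hits and hits[0]["distance"] < 0.2:
    state["suggested_signature"] = hits[0]["id"]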
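To make the threshold concrete: a suggestion is kept only when the nearest stored layout is close enough. A pure-Python sketch of that decision, assuming cosine distance (1 minus cosine similarity) and a hypothetical in-memory index standing in for Milvus:

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def suggest_signature(query, index, max_distance=0.2):
    """Return the closest stored signature, or None if nothing is near enough."""
    best_id, best_dist = None, float("inf")
    for sig, vector in index.items():
        dist = cosine_distance(query, vector)
        if dist < best_dist:
            best_id, best_dist = sig, dist
    return best_id if best_dist < max_distance else None


index = {"acme_v1": [1.0, 0.0, 0.2], "globex_v1": [0.0, 1.0, 0.0]}
near = suggest_signature([0.9, 0.1, 0.2], index)
far = suggest_signature([0.0, 0.0, 1.0], index)
```

The right cutoff depends on the embedding and the metric, so 0.2 should be treated as a starting point to tune rather than a fixed constant.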



Cerbos — Secure Policy Enforcement

Cerbos externalizes business rules such as:

  • Who can promote templates

  • Roles allowed to bypass review

  • Permission checks on actions

It decouples authorization from code.


cerbos_client.py

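from cerbos.sdk.client import CerbosClient
from cerbos.sdk.model import Principal, Resource

def can_promote_template(role: str) -> bool:
    # Ask the Cerbos PDP (HTTP port 3592 in docker-compose) whether this role may promote.
    principal = Principal("user", roles={role})
    resource = Resource("promotion", "template")  # (id, kind)
    with CerbosClient("http://localhost:3592") as client:
        return client.is_allowed("promote", principal, resource)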



LangGraph — The Workflow Conductor

LangGraph provides:

  • a deterministic state machine

  • branching decisions

  • node execution

  • in-memory or Redis-checkpointed state persistence

  • observability & debuggability

This is the brain coordinating all the specialists.


build.py (core of the pipeline)

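from langgraph.graph import StateGraph, END

graph = StateGraph(InvoiceState)

graph.add_node("extract_pdf", node_extract_pdf)
graph.add_node("ocr_if_needed", node_ocr_if_needed)
graph.add_node("signature", node_signature)
graph.add_node("check_cache", node_check_cache)
graph.add_node("milvus_suggest", node_milvus_suggest)
graph.add_node("learn_and_stage", node_learn_and_stage)
graph.add_node("extract_fields", node_extract_fields)
graph.add_node("vision_validate", node_vision_validate)
graph.add_node("promote_template", node_promote_template)
graph.add_node("mark_for_review", node_mark_for_review)
graph.add_node("done", node_done)

# Fixed edges, following the step-by-step walk-through in Part 4.
graph.set_entry_point("extract_pdf")
for src, dst in [("extract_pdf", "ocr_if_needed"), ("ocr_if_needed", "signature"),
                 ("signature", "check_cache"), ("learn_and_stage", "extract_fields"),
                 ("extract_fields", "vision_validate"), ("promote_template", "done"),
                 ("mark_for_review", "done"), ("done", END)]:
    graph.add_edge(src, dst)

# Conditional edges start at the node before each decision. The router returns a
# label and the mapping picks the next node. Labels other than use_suggest/learn are assumed.
graph.add_conditional_edges("check_cache", should_reuse_or_search, {"reuse": "extract_fields", "search": "milvus_suggest"})
graph.add_conditional_edges("milvus_suggest", should_use_suggest_or_learn, {"use_suggest": "extract_fields", "learn": "learn_and_stage"})
graph.add_conditional_edges("vision_validate", should_pass_or_review, {"pass": "promote_template", "review": "mark_for_review"})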



The Invoice Specialists — src/invoice/

Each module is a “specialist”:

  • pdf_io → PDF extraction + OCR

  • signature → signature hashing

  • template_cache → Redis template CRUD

  • template_learner → auto-regex rule generation via LLM

  • extract → apply regex rules to text

  • vision_validate → vision model consistency check

  • cerbos_client → policy decisions

  • metrics → record successes, failures, promotions

This cleanly separates concerns between workflow (LangGraph) and operations (specialists).


Part 2 — Architecture & Setup

Dependencies (requirements.txt)

Includes:

  • langgraph

  • langgraph-checkpoint

  • redis

  • pymilvus

  • Cerbos client

  • OCR & PDF processing libs

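langgraph-checkpoint>=1.0.0 provides LangGraph's checkpointing interfaces, and a Redis instance is already part of the stack. Persisting workflow state to Redis additionally needs a Redis-backed saver, for example the separate langgraph-checkpoint-redis package.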


requirements.txt

langgraph

langgraph-checkpoint

redis

pymilvus

cerbos

ollama

pdfplumber

pytesseract



Samples

samples/invoices/ contains:

  • image-only invoices

  • text-embedded PDFs

  • various formats for testing


Part 3 — Workflow Structure: Conductor vs Specialists

The project cleanly separates orchestration from behavior.

Flow Diagram:

https://dhanuka84.blogspot.com/p/invoicesmallblogv6bigmindmap.html


The Conductor (src/graph_invoice/)

build.py

Defines the entire workflow graph:

  • every node

  • conditional edges

  • success/failure paths

This is the sheet music for the whole pipeline.

state.py

Defines the InvoiceState, the object passed from node to node.

nodes.py

Bridges workflow and specialists by:

  • pulling in data

  • calling specialists

  • updating state

  • returning results


state.py — The InvoiceState Schema

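from typing import Any, List, TypedDict

# total=False: nodes fill the state in step by step, so every key is optional.
class InvoiceState(TypedDict, total=False):
    pdf_path: str
    text: str
    images: List[Any]
    signature: str
    vendor: str              # set by node_signature
    embedding: List[float]   # set by node_milvus_suggest
    template: dict
    extracted_fields: dict
    vision_pass: bool
    vision_score: float
    role: str
    done: bool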



nodes.py — Node-to-Specialist Bridge

Example: PDF extraction node

def node_extract_pdf(state: InvoiceState):

    text, images = pdf_io.extract(state["pdf_path"])

    state["text"] = text

    state["images"] = images

    return state


Signature node

def node_signature(state):

    sig, vendor = signature.make_signature(state["text"])

    state["signature"] = sig

    state["vendor"] = vendor

    return state


Milvus suggestion node

def node_milvus_suggest(state):

    embedding = signature.embed(state["signature"])

    state["embedding"] = embedding

    return rag_suggest(state)


Vision validation

def node_vision_validate(state):

    res = vision_validate.run(

        fields=state["extracted_fields"],

        images=state["images"]

    )

    state["vision_pass"] = res.pass_

    state["vision_score"] = res.score

    return state


Template promotion

def node_promote_template(state):

    if not can_promote_template(state["role"]):

        return state

    template_cache.promote(

        signature=state["signature"]

    )

    return state




The Specialists (src/invoice/)

Each file handles one business function — ingestion, extraction, learning, validation, caching, security.


Part 4 — The Graph in Motion (Step-by-Step)

Let’s walk the journey of an invoice through the pipeline:


1. Ingestion

  • Extract text & images using pdf_io

  • Fall back to OCR if needed

text, images = pdf_io.extract(pdf_path)

if len(text.strip()) < 20:

    text = ocr.run(images)
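The fallback rule can be written as a small helper. This is a sketch: extract_with_fallback is a hypothetical name, the 20-character cutoff is taken from the snippet above, and the OCR step is passed in as a callable so the example runs without Tesseract installed:

```python
def extract_with_fallback(text, images, run_ocr, min_chars=20):
    """Return (text, used_ocr); OCR runs only when the embedded text is too short."""
    if len(text.strip()) >= min_chars:
        return text, False
    return run_ocr(images), True


def fake_ocr(images):
    """Stub standing in for the real OCR engine."""
    return "INVOICE INV-1001 Total 107.50"


embedded, used_a = extract_with_fallback("Invoice INV-1001\nTotal 107.50", [], fake_ocr)
scanned, used_b = extract_with_fallback("  \n", ["page1.png"], fake_ocr)
```

Returning the used_ocr flag alongside the text makes it easy to record how often the slower OCR path is taken.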




2. Signature Identification

signature.py computes:

  • a structural signature hash

  • vendor identification

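from hashlib import sha256

# Simplified: hash of the extracted text, truncated to 12 hex characters.
signature = sha256(text.encode()).hexdigest()[:12]

A stable signature lets the pipeline recognise a known layout with an exact Redis lookup; layouts it has not seen before fall through to the Milvus similarity search.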
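Hashing the raw text gives a different digest for every invoice, because invoice numbers, dates and amounts change. The signatures in the sample output later in this post look like a slug of the header text plus a short hash, which suggests some normalisation happens first. A hedged sketch of one way to do that, not necessarily what signature.py does: slug the first lines, mask digits in the body, and hash the masked text. The name make_structural_signature is hypothetical:

```python
import re
from hashlib import sha256


def make_structural_signature(text, header_lines=2, hash_len=8):
    """Build '<header slug>_<short hash>' that ignores changing numbers in the body."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = " ".join(lines[:header_lines]).lower()
    slug = re.sub(r"[^a-z0-9]+", "_", header).strip("_")
    # Replace every run of digits with '#' so amounts, dates and invoice
    # numbers do not change the hash, then collapse whitespace.
    masked = re.sub(r"\d+", "#", text)
    masked = re.sub(r"\s+", " ", masked).strip().lower()
    digest = sha256(masked.encode("utf-8")).hexdigest()[:hash_len]
    return f"{slug}_{digest}"


invoice_a = "ACME Corporation\n123 Main Street\nInvoice INV-1001\nTotal 107.50"
invoice_b = "ACME Corporation\n123 Main Street\nInvoice INV-1002\nTotal 99.00"
other = "Globex Ltd\n9 Side Road\nBill 77\nAmount due 12.00"
sig_a = make_structural_signature(invoice_a)
sig_b = make_structural_signature(invoice_b)
sig_other = make_structural_signature(other)
```

Here the two ACME invoices share a signature even though their numbers and totals differ, while the Globex invoice gets its own.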


3. Cache Check

Redis determines:

  • Is there an active template?

  • Is there a staging template?

If found → skip straight to extraction.

tpl = template_cache.get_template(redis, signature)


4. Decision 1: Reuse or Search?

If no template exists → query Milvus.



5. RAG Suggestion

Milvus finds similar invoice signatures.

  • If a similar invoice has a known template → reuse

  • Otherwise → learn new template

results = milvus.search(collection, vector)


6. Decision 2: Suggest or Learn

use_suggest
→ go directly to extraction

learn
→ call the Template Learner module
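Both decisions are plain functions that read the state and return a label for LangGraph to route on. A sketch, assuming the state keys template and suggested_signature, and the labels reuse and search for Decision 1 (only use_suggest and learn are named in this post):

```python
def should_reuse_or_search(state):
    """Decision 1: reuse a cached template, or go and search Milvus."""
    return "reuse" if state.get("template") else "search"


def should_use_suggest_or_learn(state):
    """Decision 2: use the Milvus suggestion, or learn a new template."""
    return "use_suggest" if state.get("suggested_signature") else "learn"


cached = should_reuse_or_search({"template": {"total": r"Total\s+(\S+)"}})
uncached = should_reuse_or_search({"template": None})
suggested = should_use_suggest_or_learn({"suggested_signature": "acme_72246d14"})
unseen = should_use_suggest_or_learn({})
```

Keeping routers free of side effects means they can be unit-tested with plain dictionaries, without starting Redis or Milvus.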


7. Template Learning

The LLM learns:

  • regex rules

  • patterns for total/subtotal/tax

  • numerical extraction

  • structural hints

Template is saved as staging.

template = llm.learn_regex(text)

template_cache.save_staging(signature, template)



8. Extraction

Regex rules produce:

  • total

  • tax

  • subtotal

  • invoice number

  • line items

Math validation ensures consistency.

fields = extract.apply(template, text)
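A sketch of what applying a template and checking the arithmetic could look like. It assumes a template is a mapping from field name to a regex with one capture group, which this post does not spell out. The names apply_template and totals_consistent are hypothetical. Decimal avoids float rounding when comparing money:

```python
import re
from decimal import Decimal


def apply_template(template, text):
    """Run each field's regex over the text; missing fields map to None."""
    fields = {}
    for name, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        fields[name] = match.group(1) if match else None
    return fields


def totals_consistent(fields, tolerance=Decimal("0.01")):
    """Check that subtotal + tax equals total, within a one-cent tolerance."""
    try:
        subtotal = Decimal(fields["subtotal"])
        tax = Decimal(fields["tax"])
        total = Decimal(fields["total"])
    except (KeyError, TypeError, ArithmeticError):
        return False  # a required amount is missing or not a number
    return abs(subtotal + tax - total) <= tolerance


template = {
    "invoice_no": r"Invoice\s+No\.?\s*:?\s*(\S+)",
    "subtotal": r"^Subtotal\s*:?\s*([\d.]+)",
    "tax": r"^Tax\s*:?\s*([\d.]+)",
    "total": r"^Total\s*:?\s*([\d.]+)",
}
text = "Invoice No: INV-1001\nSubtotal: 100.00\nTax: 7.50\nTotal: 107.50"
fields = apply_template(template, text)
ok = totals_consistent(fields)
```

A failed arithmetic check is a cheap signal that a learned regex captured the wrong number, before the slower vision step runs.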


9. Vision Self-Evaluation

A local vision model (via Ollama) verifies:

  • Does the image actually show these values?

  • Are totals readable?

  • Do fields match the parsed text?

Returns:

  • vision_pass

  • vision_score

If score < threshold → review
If score >= threshold → promotion stage (if allowed)

res = vision_model.validate(fields, images)


10. Decision 3: Pass or Review

Self-evaluation determines next steps:

  • success path → promotion

  • failure path → manual review

All logged in Redis metrics.
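The threshold rule from step 9 fits in a small router. A sketch, assuming a hypothetical VISION_THRESHOLD of 0.8 (this post does not state the value) and the labels pass and review:

```python
VISION_THRESHOLD = 0.8  # assumed value, for illustration only


def should_pass_or_review(state, threshold=VISION_THRESHOLD):
    """Decision 3: pass when the vision check succeeded with a high enough score."""
    passed = bool(state.get("vision_pass"))
    score = float(state.get("vision_score", 0.0))
    return "pass" if passed and score >= threshold else "review"


failed_run = should_pass_or_review({"vision_pass": False, "vision_score": 0.0})
good_run = should_pass_or_review({"vision_pass": True, "vision_score": 1.0})
missing = should_pass_or_review({})
```

A state with no vision result at all is routed to review, so a skipped or crashed validation never counts as a pass.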


11. Template Promotion (Security Gate)

Promotion requires:

1. AUTO_PROMOTE_THRESHOLD

Redis tracks success_count.

2. Cerbos Authorization

Only approved roles (manager, auditor, etc.) may promote.

If both checks pass → staging → active.

# Gate 1: enough validated runs. Gate 2: Cerbos allows this role to promote.
if metrics.success_count(signature) >= threshold:
    if can_promote_template(role):
        template_cache.promote(signature)
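The two gates can be combined into one pure function, which keeps the rule easy to test. A sketch: promotion_decision is a hypothetical name, the AUTO_PROMOTE_THRESHOLD default of 3 is an assumption, and the Cerbos answer is passed in as a boolean so the example runs without a policy server:

```python
import os


def promotion_decision(success_count, role_allowed, threshold=None):
    """Return 'promote', 'pending' or 'denied' for a staging template."""
    if threshold is None:
        # Assumed default of 3 when the variable is not set.
        threshold = int(os.environ.get("AUTO_PROMOTE_THRESHOLD", "3"))
    if success_count < threshold:
        return "pending"  # not enough validated runs yet
    if not role_allowed:
        return "denied"  # threshold met, but the policy check refused
    return "promote"


first_run = promotion_decision(success_count=0, role_allowed=True, threshold=1)
manager = promotion_decision(success_count=1, role_allowed=True, threshold=1)
viewer = promotion_decision(success_count=5, role_allowed=False, threshold=1)
```

Checking the counter before the role means the policy server is only consulted for templates that are actually eligible.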


12. Completion

Final JSON state is printed showing:

  • chosen/learned template

  • extracted fields

  • validation results

  • promotion status


Part 5 — Workflow State Management (In-Memory vs Redis Checkpoints)

LangGraph supports two modes of state persistence, and this project has the building blocks for both.


1. In-Memory State (How run_invoice_graph.py Executes)

The script run_invoice_graph.py calls:

result = graph.invoke({"pdf_path": pdf, "role": role})

print(result)


This mode:

  • runs synchronously

  • keeps state in memory

  • returns final InvoiceState at the end

  • is perfect for CLI or simple flow

This is the default for development and testing.


2. Redis Checkpoints (Persistent, Resumable LangGraph Runs)


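# RedisSaver ships in the separate langgraph-checkpoint-redis package.
from langgraph.checkpoint.redis import RedisSaver

with RedisSaver.from_conn_string(redis_url) as saver:
    saver.setup()  # create the saver's indices on first use
    app = graph.compile(checkpointer=saver)
    # The thread_id identifies a run so it can be resumed later.
    config = {"configurable": {"thread_id": "invoice1"}}
    result = app.invoke({"pdf_path": pdf, "role": role}, config)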



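The project already contains:

  • langgraph-checkpoint>=1.0.0

  • A Redis instance in docker-compose.yml

  • A stateful workflow design

With a Redis-backed saver installed (for example the separate langgraph-checkpoint-redis package; check its Redis requirements against the redis:7-alpine image used here), the graph is ready for checkpointing.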

What checkpointing gives you:

  • Resume after crashes

  • Pause workflows mid-run

  • Long-running invoice batches

  • Distributed processing

  • Full audit trails

  • Human-in-the-loop resume points

LangGraph can serialize the InvoiceState into Redis at every step.

This is crucial for air-gapped enterprise environments, where:

  • reliability

  • auditability

  • resumability

  • distributed load

  • strict governance

…are required.


Part 6 — Hands-On: Running the System

Start services


Clone the GitHub repository: https://github.com/dhanuka84/local-secure-rag-invoice


```bash
docker-compose up -d

$ docker-compose ps
WARN[0000] /home/dhanuka84/research/local-secure-rag-invoice/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
NAME     IMAGE                          COMMAND                  SERVICE   CREATED      STATUS                  PORTS
cerbos   ghcr.io/cerbos/cerbos:latest   "/cerbos server"         cerbos    2 days ago   Up 16 hours (healthy)   0.0.0.0:3592->3592/tcp, [::]:3592->3592/tcp, 3593/tcp
milvus   milvusdb/milvus:v2.4.3         "/tini -- milvus run…"   milvus    2 days ago   Up 2 days               0.0.0.0:9091->9091/tcp, [::]:9091->9091/tcp, 0.0.0.0:19530->19530/tcp, [::]:19530->19530/tcp
ollama   ollama/ollama:latest           "/bin/ollama serve"      ollama    2 days ago   Up 2 days               0.0.0.0:11434->11434/tcp, [::]:11434->11434/tcp
redis    redis:7-alpine                 "docker-entrypoint.s…"   redis     2 days ago   Up 2 days               0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp

make models
```


Install dependencies


python -m venv .venv && source .venv/bin/activate

pip install --upgrade pip

pip install -r requirements.txt


Process an invoice


APP_ROLE=manager python -m src.graph_invoice.run_invoice_graph samples/invoices/invoice1.pdf


Environment variables:

  • AUTO_PROMOTE_THRESHOLD=1 → enable fast promotion for testing

  • APP_ROLE=manager → allowed to promote templates


Failed Scenario

$ APP_ROLE=manager python -m src.graph_invoice.run_invoice_graph samples/invoices/invoice1.pdf

{

 "pdf": "samples/invoices/invoice1.pdf",

 "signature": "acme_corporation_123_main_street_invoice_72246d14",

 "template_source": "learned",

 "promotion_status": null,

 "fields": {

   "invoice_no": "INV-1001",

   "date": "2025-11-05",

   "subtotal": "100.00",

   "tax": "7.50",

   "total": "107.50",

   "tax_rate": "0.0750"

 },

 "vision_pass": false,

 "vision_score": 0.0,

 "vision_critique": "In the image provided, there are no visible numbers to compare against the extracted invoice amount fields.",

 "done": true

}


(.venv)local-secure-rag-invoice$ python -m src.invoice.templates_cli list

Active:


Staging:

  acme_corporation_123_main_street_invoice_72246d14



Success Scenario


(.venv) dhanuka84@dhanuka84:~/research/local-secure-rag-invoice$ APP_ROLE=manager python -m src.graph_invoice.run_invoice_graph samples/invoices/invoice1.pdf

{

 "pdf": "samples/invoices/invoice1.pdf",

 "signature": "acme_corporation_123_main_street_invoice_72246d14",

 "template_source": "active",

 "promotion_status": "pending_success_0",

 "fields": {

   "invoice_no": "INV-1001",

   "date": "2025-11-05",

   "subtotal": "100.00",

   "tax": "7.50",

   "total": "107.50",

   "tax_rate": "0.0750"

 },

 "vision_pass": true,

 "vision_score": 1.0,

 "vision_critique": "All extracted fields match the images exactly.",

 "done": true

}


$ python -m src.invoice.templates_cli list

Active:

  acme_corporation_123_main_street_invoice_72246d14

Staging:


Checking the Redis Cache

$ docker exec -it redis redis-cli

keys *

1) "invoice:template:acme_corporation_123_main_street_invoice_72246d14"

2) "invoice_metrics:acme_corporation_123_main_street_invoice_72246d14"

127.0.0.1:6379> HGETALL invoice:template:acme_corporation_123_main_street_invoice_72246d14

(error) WRONGTYPE Operation against a key holding the wrong kind of value

127.0.0.1:6379> HGETALL invoice_metrics:acme_corporation_123_main_street_invoice_72246d14

1) "vision_failures"

2) "1"

3) "updated_at"

4) "1763327448"

5) "promotions"

6) "1"

The WRONGTYPE error above is expected: HGETALL only works on hashes, and the template key does not hold one. Running TYPE on the key shows what it holds; a template stored as a JSON string, as in the cache code earlier, is read with GET. The metrics key is a hash, so HGETALL returns its fields.



Conclusion — A Modern, Secure, Self-Learning Document AI Pattern

This architecture is a blueprint for modern document automation in secure environments.

It brings together:

  • LangGraph → deterministic & resumable workflows

  • Redis → lightning-fast memory layer

  • Milvus → layout-aware RAG

  • Ollama → local LLM + vision validation

  • Cerbos → enterprise-grade policy enforcement

  • Auto Template Learning → zero-shot adaptation

The result is a fully air-gapped, self-evaluating pipeline for invoice extraction and validation that can evolve safely while staying compliant with organizational rules.



