Overview

This document describes the technical implementation of the Legacy BMS PDF Import Pipeline, an automated system that receives policy premium PDFs by email, extracts structured data with a self-hosted AI model, validates that data against the legacy Business Management System (BMS), and stores the document and its metadata directly in the Access/JET Engine database.

All infrastructure is self-hosted and on-premises - no data leaves the client's network.


System Architecture

Component | Technology | Purpose
Workflow engine | n8n (self-hosted) | Orchestrates the full pipeline from email trigger to BMS storage
AI model | Ollama (self-hosted) | Processes and extracts structured data from PDFs on-prem
API bridge | Python / FastAPI | Exposes REST endpoints that connect to the Access/JET Engine database
Legacy BMS | Microsoft Access (JET Engine) | Core business management system - the target data store
Tunnel | ngrok | Exposes the local FastAPI bridge to n8n during development/demo
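
To illustrate how the API bridge might reach the legacy database, the sketch below builds an ODBC connection string for an Access file. It is a minimal sketch under stated assumptions: this document does not specify the driver or the file location, so the use of the Microsoft Access ODBC driver, the helper name access_connection_string, and the example path are all illustrative.

```python
# Assumption: the bridge talks to Access through the Microsoft Access ODBC driver.
ACCESS_DRIVER = "Microsoft Access Driver (*.mdb, *.accdb)"


def access_connection_string(db_path: str) -> str:
    """Build an ODBC connection string for an Access/JET database file.

    db_path is a hypothetical location; the real path is not given in this document.
    """
    # Doubled braces emit literal { } around the driver name, as ODBC expects.
    return f"DRIVER={{{ACCESS_DRIVER}}};DBQ={db_path};"


# Example with a made-up path.
conn_str = access_connection_string(r"C:\bms\legacy.mdb")
print(conn_str)
```

A string like this would typically be handed to an ODBC client (for example pyodbc.connect) inside the FastAPI process. Which client the bridge actually uses is not stated here.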

n8n Workflow

Workflow name: Health Cheque Demo

Workflow ID: CSzK2rozmbjqooll

Execution order: v1

Binary mode: separate

Node Sequence

  1. Gmail Trigger
  2. Extract PDF Attachment
  3. Extract PDF Data
  4. Lookup Client
  5. Store Document
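
The node sequence is a strictly linear chain: each node receives the previous node's output. The sketch below models only that ordering in Python. The stage bodies are placeholders that record that they ran, and the names stage, PIPELINE, and run_pipeline are hypothetical helpers, not part of the n8n workflow.

```python
from functools import reduce
from typing import Callable

Item = dict
Stage = Callable[[Item], Item]


def stage(name: str) -> Stage:
    """Build a placeholder stage that only appends its name to a trace."""
    def run(item: Item) -> Item:
        return {**item, "trace": item.get("trace", []) + [name]}
    return run


# Stage names are taken from the node sequence; their real logic lives in n8n.
PIPELINE: list[Stage] = [
    stage("Gmail Trigger"),
    stage("Extract PDF Attachment"),
    stage("Extract PDF Data"),
    stage("Lookup Client"),
    stage("Store Document"),
]


def run_pipeline(item: Item) -> Item:
    """Feed the item through every stage in order."""
    return reduce(lambda acc, s: s(acc), PIPELINE, item)


result = run_pipeline({})
print(" -> ".join(result["trace"]))
```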

1. Gmail Trigger

Setting | Value
Type | n8n-nodes-base.gmailTrigger v1.3
Poll interval | Every minute
Download attachments | Yes
Simple mode | Off (returns full message data)

The trigger polls the connected Gmail account every minute for new emails. Attachments are downloaded as binary data and passed to the next node.
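
For readers unfamiliar with how attachments travel as binary data, the following is a rough Python analogy using only the standard library. It is not how n8n implements the trigger: the function name pdf_attachments and the sample message are assumptions made for illustration.

```python
from email.message import EmailMessage


def pdf_attachments(msg: EmailMessage) -> list[tuple[str, bytes]]:
    """Return (filename, raw bytes) for every PDF attachment in a message."""
    found = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            # decode=True reverses the base64 transfer encoding, yielding raw bytes.
            found.append((part.get_filename() or "unnamed.pdf",
                          part.get_payload(decode=True)))
    return found


# Build a made-up message with one PDF and one non-PDF attachment.
msg = EmailMessage()
msg["Subject"] = "Policy premium"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 sample", maintype="application",
                   subtype="pdf", filename="premium.pdf")
msg.add_attachment(b"not a pdf", maintype="image",
                   subtype="png", filename="logo.png")

pdfs = pdf_attachments(msg)
print([name for name, _ in pdfs])
```

Only the PDF part is returned. The message body and the image are skipped, which mirrors the intent of passing just the policy document downstream.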


2. Extract PDF Attachment (Code Node)