Links

Overview

The current model used in production is layoutlmv3-lora-invoice-number-onnx.

Heuristics → Model Fallback is the preferred system architecture.
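As a rough illustration of the Heuristics → Model Fallback pattern, the sketch below tries a rule-based extractor first and calls the model only when the rule does not yield exactly one candidate. The regex `INVOICE_LABEL_RE`, the helper names, and the fallback condition are hypothetical assumptions: the production heuristics are not described on this page, and `model` is a placeholder for any LayoutLMv3 inference call.

```python
import re
from typing import Callable, Optional

# HYPOTHETICAL heuristic: an "Invoice No." / "Invoice Number" / "Invoice #" label
# followed by an alphanumeric token. Real rules would be tuned on labelled data.
INVOICE_LABEL_RE = re.compile(
    r"invoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9][A-Z0-9\-/]{2,})",
    re.IGNORECASE,
)


def heuristic_invoice_number(text: str) -> Optional[str]:
    """Return the heuristic match if it is unambiguous, otherwise None."""
    matches = {m.group(1) for m in INVOICE_LABEL_RE.finditer(text)}
    # Only trust the heuristic when it yields exactly one distinct candidate.
    if len(matches) == 1:
        return matches.pop()
    return None


def extract_invoice_number(
    text: str, model: Callable[[str], str]
) -> tuple[str, str]:
    """Run heuristics first; fall back to the model when they do not fire.

    Returns (invoice_number, source), where source is "heuristics" or "model".
    """
    candidate = heuristic_invoice_number(text)
    if candidate is not None:
        return candidate, "heuristics"
    return model(text), "model"
```

For example, `extract_invoice_number("Invoice No.: INV-2024-001", model)` would return `("INV-2024-001", "heuristics")` without invoking `model`. Cheap rules handle the easy documents and the model's latency is only paid for the rest, which is one plausible reason the system rows can differ from the model-only rows.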

The table below summarizes experiment results for the candidate models and systems:

| Models | Device / Provider | Accuracy | Accuracy (w/ human-in-review) | P95 Latency (ms) |
|---|---|---|---|---|
| layoutlmv3-lora-invoice-number-mps | MPS (Apple’s GPU) | 0.811 | 0.994 | 420.10 |
| layoutlmv3-lora-invoice-number-cpu | CPU | 0.811 | 0.994 | 748.43 |
| layoutlmv3-lora-invoice-number-onnx | CPUExecutionProvider | 0.811 | 0.994 | 550.71 |
| gemini-2.5-flash-zero-shot-txt-img | CPU | 0.950 | 0.981 | 4038.0 |
| gemini-2.5-flash-lite-zero-shot-txt-img | CPU | 0.907 | 0.947 | 3111.48 |
| Systems (Heuristics → Model Fallback) | Device / Provider | Accuracy | Accuracy (w/ human-in-review) | P95 Latency (ms) |
| layoutlmv3-lora-heuristics-mps | MPS | 0.895 | 0.994 | 607.50 |
| layoutlmv3-lora-heuristics-cpu | CPU | 0.895 | 0.994 | 288.66 |
| layoutlmv3-lora-heuristics-onnx | CPUExecutionProvider | 0.895 | 0.994 | 599.18 |
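This page does not state how the P95 latency column was computed. One common definition is the nearest-rank percentile; below is a minimal sketch, assuming per-document latencies in milliseconds collected over the test set (the function name `p95_latency_ms` is hypothetical):

```python
from typing import Sequence


def p95_latency_ms(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of per-document latencies (ms).

    Returns the smallest sample such that at least 95% of samples are
    less than or equal to it.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]
```

Interpolating methods, such as NumPy's default `numpy.percentile`, can give slightly different values on small samples, so the same method should be used for every row before latencies are compared.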

Model: layoutlmv3-lora-invoice-number

All evaluations were run on the test set in data-v2.1. layoutlmv3-lora-heuristics-mps is the recommended architecture, considering performance, latency, and cost.

Performance

| Experiment Name | Device | System Architecture | Accuracy | Accuracy (w/ human-in-review) |
|---|---|---|---|---|
| layoutlmv3-lora-invoice-number-mps | MPS (Apple’s GPU) | Model | 0.811 | 0.994 |
| layoutlmv3-lora-invoice-number-cpu | CPU | Model | 0.811 | 0.994 |
| layoutlmv3-lora-heuristics-mps | MPS | Heuristics → Model Fallback | 0.895 | 0.994 |
| layoutlmv3-lora-heuristics-cpu | CPU | Heuristics → Model Fallback | 0.895 | 0.994 |

*Additional metrics are available in Weights & Biases.

A receipt is routed to human-in-review if its invoice number is “Not Found” or if multiple invoice numbers are predicted. In all four experiments above, accuracy with human-in-review is 99.4%.
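The review rule above can be written as a small predicate. The sketch below also shows one way the two accuracy columns could relate, under an explicit assumption: documents flagged for review count as wrong for plain accuracy and as correct once reviewed. The page does not define either metric, so `needs_human_review`, `evaluate`, and the `NOT_FOUND` sentinel are hypothetical helpers, not the actual evaluation code.

```python
from typing import Sequence

# Sentinel string the extractor is assumed to emit when no invoice number is found.
NOT_FOUND = "Not Found"


def needs_human_review(predictions: Sequence[str]) -> bool:
    """True if the invoice number is "Not Found" or several distinct values were predicted."""
    distinct = set(predictions)
    # len != 1 covers both "no prediction" and "multiple predictions".
    return len(distinct) != 1 or NOT_FOUND in distinct


def evaluate(samples: Sequence[tuple[Sequence[str], str]]) -> tuple[float, float]:
    """Return (accuracy, accuracy with human-in-review) over (predictions, truth) pairs.

    ASSUMPTION: flagged documents are wrong for plain accuracy and are
    counted as correct after a reviewer has checked them.
    """
    auto_correct = 0
    flagged = 0
    for predictions, truth in samples:
        if needs_human_review(predictions):
            flagged += 1
        elif predictions[0] == truth:  # single, auto-accepted prediction
            auto_correct += 1
    n = len(samples)
    return auto_correct / n, (auto_correct + flagged) / n
```

Under this assumption, the only errors left after review are confident single predictions that are wrong, which is why accuracy with human-in-review can be much higher than plain accuracy.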