The current model used in production is layoutlmv3-lora-invoice-number-onnx.
Heuristics → Model Fallback is the preferred system architecture.
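As a rough illustration of the Heuristics → Model Fallback idea, the sketch below tries a cheap rule first and calls the model only when the rule finds nothing. It is a minimal sketch under stated assumptions, not the production implementation: the regular expression, the function name `extract_invoice_number`, the `model_predict` callable, and the "fall back when the heuristic finds no match" rule are all hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical heuristic: a label such as "Invoice No." followed by an
# alphanumeric ID that contains at least one digit.
INVOICE_PATTERN = re.compile(
    r"invoice\s*(?:number|no\.?|#)\s*[:\-]?\s*([A-Z0-9-]*\d[A-Z0-9-]*)",
    re.IGNORECASE,
)


def extract_invoice_number(
    text: str,
    model_predict: Callable[[str], List[str]],
) -> List[str]:
    """Return candidate invoice numbers, preferring the heuristic.

    `model_predict` stands in for the fine-tuned model; it is assumed to
    take the document text and return a list of predicted strings.
    """
    matches = INVOICE_PATTERN.findall(text)
    # Drop duplicate matches while preserving their order of appearance.
    candidates = list(dict.fromkeys(matches))
    if candidates:
        return candidates
    # Assumed fallback rule: call the model only when the heuristic is empty.
    return model_predict(text)
```

Because the model is passed in as a callable, the same wrapper could sit in front of any backend in this sketch, and documents resolved by the heuristic never reach the model at all.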
Experiment results for the candidate models and systems are highlighted below:

| Models | Device / Provider | Accuracy | Accuracy (w/ human-in-review) | P95 Latency (ms) |
|---|---|---|---|---|
| layoutlmv3-lora-invoice-number-mps | MPS (Apple’s GPU) | 0.811 | 0.994 | 420.10 |
| layoutlmv3-lora-invoice-number-cpu | CPU | 0.811 | 0.994 | 748.43 |
| layoutlmv3-lora-invoice-number-onnx | CPUExecutionProvider | 0.811 | 0.994 | 550.71 |
| gemini-2.5-flash-zero-shot-txt-img | CPU | 0.950 | 0.981 | 4038.0 |
| gemini-2.5-flash-lite-zero-shot-txt-img | CPU | 0.907 | 0.947 | 3111.48 |

| Systems (Heuristics → Model Fallback) | Device / Provider | Accuracy | Accuracy (w/ human-in-review) | P95 Latency (ms) |
|---|---|---|---|---|
| layoutlmv3-lora-heuristics-mps | MPS | 0.895 | 0.994 | 607.50 |
| layoutlmv3-lora-heuristics-cpu | CPU | 0.895 | 0.994 | 288.66 |
| layoutlmv3-lora-heuristics-onnx | CPUExecutionProvider | 0.895 | 0.994 | 599.18 |

All evaluations were run on the test dataset in data-v2.1. The recommended architecture is layoutlmv3-lora-heuristics-mps, considering performance, latency, and cost.

| Experiment Name | Device | System Architecture | Accuracy | Accuracy (w/ human-in-review) |
|---|---|---|---|---|
| layoutlmv3-lora-invoice-number-mps | MPS (Apple’s GPU) | Model | 0.811 | 0.994 |
| layoutlmv3-lora-invoice-number-cpu | CPU | Model | 0.811 | 0.994 |
| layoutlmv3-lora-heuristics-mps | MPS | Heuristics → Model Fallback | 0.895 | 0.994 |
| layoutlmv3-lora-heuristics-cpu | CPU | Heuristics → Model Fallback | 0.895 | 0.994 |

*Additional metrics can be found on Weights & Biases.

A receipt is routed to human-in-review if the predicted invoice number is “Not Found” or if the prediction contains multiple candidates. In all LayoutLMv3 experiments, accuracy with human-in-review is 99.4%.
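
The routing rule above can be sketched in a few lines, together with one plausible way to compute the "w/ human-in-review" accuracy. This is a sketch under assumptions rather than the evaluation code used for the tables: the function names are hypothetical, and it assumes that every receipt sent to a human reviewer ends up with the correct invoice number.

```python
from typing import Sequence

NOT_FOUND = "Not Found"


def needs_human_review(predictions: Sequence[str]) -> bool:
    """Flag a receipt when there is not exactly one usable prediction."""
    # Zero or multiple predictions, or a single "Not Found", go to a human.
    return len(predictions) != 1 or predictions[0] == NOT_FOUND


def accuracy_with_review(
    all_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
) -> float:
    """Accuracy where reviewed receipts are counted as correct (assumption)."""
    correct = 0
    for predictions, label in zip(all_predictions, labels):
        if needs_human_review(predictions) or predictions[0] == label:
            correct += 1
    return correct / len(labels)
```

Under this definition the only remaining errors are receipts with a single, confident, wrong prediction, since those are the ones that bypass review.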