Overview
Problem statement
Describe the problem you're trying to solve by doing this work.
We have had some trouble with GPT4-V output, as described in one of the previous videos.

At the same time, our pipeline cost is high.
Proposed work
High-level overview of what we're building and why we think it will solve the problem.
To address this, our solution is to add LayoutLMv3 and program synthesis to construct the page content.

For this, we will do the following:
- We quantitatively (rather than qualitatively) evaluate PaddleOCR against Azure using the Levenshtein distance between their outputs for the same bounding box. Azure's output serves as the ground truth, since we expect it to be more accurate.
- Doing this will help reduce the OCR cost.
- We create a new benchmark by labeling keys and values for training and evaluating LayoutLMv3/GPT4-V.
- We add this training set to our previous training set to get a better set of key and value outputs.
- Doing this will cut down the cost of GPT4-V.
- For questions that are directly identifiable with a value, we directly return the value, its bounding box, and its page location.
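
As a rough illustration of the OCR comparison step, the sketch below computes a per-box character error rate (Levenshtein distance divided by the length of the Azure text). It assumes the boxes from both engines have already been matched and are keyed by a shared ID. The function names, the box IDs, and the sample strings are all hypothetical and only serve the example; the actual matching and normalization may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance normalized by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def evaluate(azure_boxes: dict[str, str], paddle_boxes: dict[str, str]) -> dict[str, float]:
    """Per-box error rate with Azure as ground truth (assumption from the plan above).

    A box that PaddleOCR did not return is scored as an empty string.
    """
    return {
        box_id: character_error_rate(text, paddle_boxes.get(box_id, ""))
        for box_id, text in azure_boxes.items()
    }


# Hypothetical sample data: box IDs and text are made up for illustration.
azure = {"box_1": "Invoice No. 1042", "box_2": "Total: $310.00"}
paddle = {"box_1": "Invoice No. 1O42", "box_2": "Total: $310.00"}

scores = evaluate(azure, paddle)
mean_error = sum(scores.values()) / len(scores)
```

Normalizing by the reference length keeps long and short boxes comparable; averaging the per-box scores gives a single number to track across documents.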
Success criteria
The criteria that must be met in order to consider this project a success.
User stories
How the product should work for various user types.
User type 1
User type 2