
Open Source Checklist:

Release Model (Encoder + Text decoder)
Release Most Scripts
Vision Decoder / Weights (because of ethical concerns about fake document generation, we plan to release this functionality as an Azure API)
Demo

Introduction

UDOP unifies vision, text, and layout through a Vision-Text-Layout Transformer and unified generative pretraining tasks, including vision, text, layout, and mixed tasks. We show the task prompts (left) and task targets (right) for all self-supervised objectives (joint text-layout reconstruction, visual text recognition, layout modeling, and masked autoencoding) and two example supervised objectives (question answering and layout analysis).

Install

Set up the `python` environment

conda create -n UDOP python=3.8   # You can also use another environment.

Install other dependencies

pip install -r requirements.txt

Run Scripts

Switch model type by:

--model_type "UdopDual"

--model_type "UdopUnimodel"
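As a rough illustration of how a flag like this can be validated, here is a minimal sketch using Python's standard `argparse`. Only the flag name and its two values come from this README; `build_parser` and the choice to make the flag required are assumptions made for the example, and the repository's own scripts may parse their arguments differently.

```python
import argparse

# The two values this README lists for --model_type.
MODEL_TYPES = ("UdopDual", "UdopUnimodel")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: not the repository's actual argument handling."""
    parser = argparse.ArgumentParser(description="Illustrative UDOP options")
    parser.add_argument(
        "--model_type",
        choices=MODEL_TYPES,  # argparse rejects any other value
        required=True,        # assumption: no default is documented here
        help="Which UDOP variant to run.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--model_type", "UdopDual"])
print(f"Selected model type: {args.model_type}")
```

With `choices` set, a misspelled model type fails at startup instead of partway through a run.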

Finetuning on RVLCDIP

Download RVLCDIP first and change the path. For OCR, you might need to customize your code.

bash scripts/finetune_rvlcdip.sh   # Finetuning on RVLCDIP

Finetuning on DUE Benchmark

Download Duebenchmark and follow its procedure to preprocess the data.

The training code adapted to our framework is hosted under benchmarker. Run it with:

bash scripts/finetune_duebenchmark.sh   # Finetuning on DUE Benchmark; switch tasks by changing the path to the dataset

The generated outputs can be evaluated with Duebenchmark's due_evaluator.

Model Checkpoints

The model checkpoints are hosted on the Huggingface Hub.

Models	Huggingface Weights Address
Unimodel 512	udop-unimodel-large-512.zip
Unimodel 512 (new, trained on more steps)	udop-unimodel-large-512-300k-steps.zip
Dual 224	udop-dual-large-224.zip
Unimodel 224	udop-unimodel-large-224.zip
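Once an archive from the table has been downloaded, it needs to be unpacked before use. The sketch below is one possible way to do that with the standard library. The archive names are copied from the table above; the helper `extract_checkpoint`, its parameters, and the directory layout are hypothetical, and the internal structure of the archives is not described in this README.

```python
import zipfile
from pathlib import Path

# Archive names copied from the checkpoint table above.
CHECKPOINT_ARCHIVES = {
    "Unimodel 512": "udop-unimodel-large-512.zip",
    "Unimodel 512 (new, trained on more steps)": "udop-unimodel-large-512-300k-steps.zip",
    "Dual 224": "udop-dual-large-224.zip",
    "Unimodel 224": "udop-unimodel-large-224.zip",
}


def extract_checkpoint(model_name: str, download_dir: Path, target_dir: Path) -> Path:
    """Unpack an already downloaded archive (hypothetical helper).

    Returns the directory the archive was extracted into.
    """
    archive = Path(download_dir) / CHECKPOINT_ARCHIVES[model_name]
    if not archive.is_file():
        raise FileNotFoundError(f"Download {archive.name} first: {archive}")

    # Extract into a folder named after the archive, without the .zip suffix.
    out_dir = Path(target_dir) / archive.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    return out_dir
```

For example, `extract_checkpoint("Dual 224", Path("downloads"), Path("checkpoints"))` would unpack `downloads/udop-dual-large-224.zip` into `checkpoints/udop-dual-large-224`, where both directory names are placeholders.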

Citation

@article{tang2022unifying,
  title={Unifying Vision, Text, and Layout for Universal Document Processing},
  author={Tang, Zineng and Yang, Ziyi and Wang, Guoxin and Fang, Yuwei and Liu, Yang and Zhu, Chenguang and Zeng, Michael and Zhang, Cha and Bansal, Mohit},
  journal={arXiv preprint arXiv:2212.02623},
  year={2022}
}

Contact

Zineng Tang ([email protected])

Unifying Vision, Text, and Layout for Universal Document Processing (CVPR 2023 Highlight)

Introduction

Install

Setup `python` environment

Install other dependencies

Run Scripts

Finetuninng on RVLCDIP

Finetuninng on DUE Benchmark

Model Checkpoints

Citation

Contact

Files

i-Code-Doc

Directory actions

More options

Directory actions

More options

Latest commit

History

i-Code-Doc

Folders and files

parent directory

README.md

Unifying Vision, Text, and Layout for Universal Document Processing (CVPR 2023 Highlight)

Introduction

Install

Setup python environment

Install other dependencies

Run Scripts

Finetuninng on RVLCDIP

Finetuninng on DUE Benchmark

Model Checkpoints

Citation

Contact

Setup `python` environment