DePlot: Visual Language Reasoning on Charts and Plots
Code and checkpoints for training the visual language models introduced in the papers:
- DePlot: One-shot visual language reasoning by plot-to-table translation
- MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering.
Installation
To install DePlot, first clone the google-research repo:
git clone https://github.com/google-research/google-research.git
From the google_research folder, install the requirements in a conda environment by executing:
conda create -n deplot python=3.9
conda activate deplot
pip install -r deplot/requirements.txt -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
Our checkpoints are fully compatible with the codebase from the Pix2Struct paper, so all of the tools and instructions described in the Pix2Struct documentation apply here as well.
Because we use some gin configurations from Pix2Struct, clone that repository into a directory of your choice and export its path as the PIX2STRUCT environment variable:
git clone https://github.com/google-research/pix2struct.git $PIX2STRUCT
Thanks to the Hugging Face team, DePlot [doc] and MatCha [doc] implementations are also available in the HF Transformers library.
Models
We provide pre-trained models and fine-tuned models.
Task | GCS Path (Base) |
---|---|
Pre-trained | gs://deplot/models/base/matcha/v1 |
Chart-to-table | gs://deplot/models/base/deplot/v1 |
ChartQA | gs://deplot/models/base/chartqa/v1 |
PlotQA V1 | gs://deplot/models/base/plotqa_v1/v1 |
PlotQA V2 | gs://deplot/models/base/plotqa_v2/v1 |
Chart2Text Statista | gs://deplot/models/base/chart2text_statista/v1 |
Chart2Text Pew | gs://deplot/models/base/chart2text_pew/v1 |
The models are also available at Hugging Face:
Task | HF Path |
---|---|
Pre-trained | https://huggingface.co/google/matcha-base |
Chart-to-table | https://huggingface.co/google/deplot |
ChartQA | https://huggingface.co/google/matcha-chartqa |
PlotQA V1 | https://huggingface.co/google/matcha-plotqa-v1 |
PlotQA V2 | https://huggingface.co/google/matcha-plotqa-v2 |
Chart2Text Statista | https://huggingface.co/google/matcha-chart2text-statista |
Chart2Text Pew | https://huggingface.co/google/matcha-chart2text-pew |
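As a minimal sketch of how the Hugging Face checkpoints above might be used for chart-to-table inference, the snippet below loads google/deplot with the Pix2Struct classes from Transformers. The helper names `chart_to_table` and `parse_table` are hypothetical, and the assumption that the decoded output separates rows with the literal token `<0x0A>` and cells with `|` should be checked against your own outputs.

```python
DEPLOT_PROMPT = "Generate underlying data table of the figure below:"


def parse_table(decoded: str) -> list[list[str]]:
    """Split a linearized table into rows of cells.

    Assumption: rows are separated by the literal token "<0x0A>" and
    cells by "|". Adjust if your decoded output looks different.
    """
    rows = []
    for line in decoded.split("<0x0A>"):
        cells = [cell.strip() for cell in line.split("|")]
        if any(cells):  # skip rows that are entirely empty
            rows.append(cells)
    return rows


def chart_to_table(image_path: str, model_id: str = "google/deplot") -> list[list[str]]:
    """Run a plot-to-table checkpoint on one local image (hypothetical helper)."""
    # Imported lazily so parse_table can be used without these packages installed.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path)
    inputs = processor(images=image, text=DEPLOT_PROMPT, return_tensors="pt")
    # 512 mirrors the 'targets' length used in the demo configuration.
    output_ids = model.generate(**inputs, max_new_tokens=512)
    decoded = processor.decode(output_ids[0], skip_special_tokens=True)
    return parse_table(decoded)
```

Calling `chart_to_table("my_chart.png")` (the file name is a placeholder) downloads the checkpoint on first use and returns the table as a list of rows. For the question-answering checkpoints, the text passed to the processor would be the question rather than the plot-to-table prompt.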
Finetuning
Continued pretraining and finetuning of MatCha and DePlot are supported through Hugging Face Transformers. Please see here for more instructions.
Inference
The checkpoints are fully compatible with Pix2Struct.
For testing and demoing purposes, inference may be run on CPU.
In that case, set the JAX_PLATFORMS environment variable before launching by running export JAX_PLATFORMS=''.
Web Demo
While the command below is running, the web demo can be accessed at localhost:8080 (or at any port specified via the port flag), assuming you are running the demo locally. You can then upload a custom image and an optional prompt. To use a plot-to-table DePlot/MatCha model, specify the query as: "Generate underlying data table of the figure below:".
python -m pix2struct.demo \
--gin_search_paths="${PIX2STRUCT}/pix2struct/configs,${PIX2STRUCT}" \
--gin_file=models/pix2struct.gin \
--gin_file=runs/inference.gin \
--gin_file=sizes/base.gin \
--gin.MIXTURE_OR_TASK_NAME="'dummy_pix2struct'" \
--gin.TASK_FEATURE_LENGTHS="{'inputs': 4096, 'targets': 512}" \
--gin.BATCH_SIZE=1 \
--gin.CHECKPOINT_PATH="'gs://deplot/models/base/deplot/v1'"
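To try a different checkpoint from the Models table, it can help to build the same command programmatically. The following is only a sketch: `build_demo_command` and `run_demo` are hypothetical helper names, the flags are copied from the command above, and the `cpu_only` option simply sets JAX_PLATFORMS to an empty string for the child process as described in the Inference section.

```python
import os
import subprocess
import sys


def build_demo_command(pix2struct_dir: str, checkpoint_path: str) -> list[str]:
    """Return the demo invocation as an argument list (hypothetical helper).

    The inner single quotes are kept because gin expects string literals.
    """
    return [
        sys.executable, "-m", "pix2struct.demo",
        f"--gin_search_paths={pix2struct_dir}/pix2struct/configs,{pix2struct_dir}",
        "--gin_file=models/pix2struct.gin",
        "--gin_file=runs/inference.gin",
        "--gin_file=sizes/base.gin",
        "--gin.MIXTURE_OR_TASK_NAME='dummy_pix2struct'",
        "--gin.TASK_FEATURE_LENGTHS={'inputs': 4096, 'targets': 512}",
        "--gin.BATCH_SIZE=1",
        f"--gin.CHECKPOINT_PATH='{checkpoint_path}'",
    ]


def run_demo(checkpoint_path: str, cpu_only: bool = False) -> None:
    """Launch the web demo; requires the PIX2STRUCT variable to be exported."""
    pix2struct_dir = os.environ["PIX2STRUCT"]
    env = dict(os.environ)
    if cpu_only:
        # Same effect as `export JAX_PLATFORMS=''` in the shell.
        env["JAX_PLATFORMS"] = ""
    subprocess.run(
        build_demo_command(pix2struct_dir, checkpoint_path), env=env, check=True
    )
```

For example, `run_demo("gs://deplot/models/base/chartqa/v1", cpu_only=True)` would start the demo with the ChartQA checkpoint on CPU, assuming the environment from the Installation section is active.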
We also provide a DePlot+LLM demo and a MatCha chart QA demo, both hosted on Hugging Face Spaces.
How to cite DePlot and MatCha?
You can cite the DePlot paper and the MatCha paper as follows:
@inproceedings{liu-2022-deplot,
title={DePlot: One-shot visual language reasoning by plot-to-table translation},
author={Fangyu Liu and Julian Martin Eisenschlos and Francesco Piccinno and Syrine Krichene and Chenxi Pang and Kenton Lee and Mandar Joshi and Wenhu Chen and Nigel Collier and Yasemin Altun},
year={2023},
booktitle={Findings of the 61st Annual Meeting of the Association for Computational Linguistics},
url={https://arxiv.org/abs/2212.10505}
}
@inproceedings{liu-2022-matcha,
title={MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering},
author={Fangyu Liu and Francesco Piccinno and Syrine Krichene and Chenxi Pang and Kenton Lee and Mandar Joshi and Yasemin Altun and Nigel Collier and Julian Martin Eisenschlos},
year={2023},
booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},
url={https://arxiv.org/abs/2212.09662}
}
Disclaimer
This is not an official Google product.
Contact information
For help or issues, please submit a GitHub issue.