
DePlot: Visual Language Reasoning on Charts and Plots

Code and checkpoints for training the visual language models introduced in the papers

  • DePlot: One-shot visual language reasoning by plot-to-table translation
  • MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Installation

To install DePlot, it is necessary to clone the google-research repo:

git clone https://github.com/google-research/google-research.git

From the google-research folder, you may install the necessary requirements in a conda environment by executing:

conda create -n deplot python=3.9
conda activate deplot
pip install -r deplot/requirements.txt -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

Our checkpoints are fully compatible with the codebase from the Pix2Struct paper. Therefore, all of the tools and instructions described in the documentation apply here as well.

Since we use some gin configurations from Pix2Struct, its repository needs to be cloned to a directory of your choice, whose path should be exported in the PIX2STRUCT environment variable:

export PIX2STRUCT=/path/of/your/choice
git clone https://github.com/google-research/pix2struct.git $PIX2STRUCT

Thanks to the Hugging Face team, we also have DePlot [doc] and MatCha [doc] implementations in the HF Transformers library.
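As a minimal sketch of the Transformers route, the helper below runs DePlot's plot-to-table translation on a chart image. It assumes transformers, torch, and Pillow are installed; the function name is ours, not part of any API.

```python
# Sketch: chart-to-table inference with the Hugging Face DePlot checkpoint.
def chart_to_table(image_path: str) -> str:
    # Imports are local so the helper can be defined without the heavy
    # dependencies loaded up front.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained("google/deplot")
    model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

    image = Image.open(image_path)
    # DePlot expects this exact prompt for plot-to-table translation.
    inputs = processor(
        images=image,
        text="Generate underlying data table of the figure below:",
        return_tensors="pt",
    )
    ids = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(ids[0], skip_special_tokens=True)
```

The returned string is a linearized data table, which can then be passed to an LLM for one-shot reasoning as described in the DePlot paper.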

Models

We provide pre-trained models and fine-tuned models.

Task                | GCS Path (Base)
------------------- | -----------------------------------------------
Pre-trained         | gs://deplot/models/base/matcha/v1
Chart-to-table      | gs://deplot/models/base/deplot/v1
ChartQA             | gs://deplot/models/base/chartqa/v1
PlotQA V1           | gs://deplot/models/base/plotqa_v1/v1
PlotQA V2           | gs://deplot/models/base/plotqa_v2/v1
Chart2Text Statista | gs://deplot/models/base/chart2text_statista/v1
Chart2Text Pew      | gs://deplot/models/base/chart2text_pew/v1

The models are also available at Hugging Face:

Task                | HF Path
------------------- | -------------------------------------------------------
Pre-trained         | https://huggingface.co/google/matcha-base
Chart-to-table      | https://huggingface.co/google/deplot
ChartQA             | https://huggingface.co/google/matcha-chartqa
PlotQA V1           | https://huggingface.co/google/matcha-plotqa-v1
PlotQA V2           | https://huggingface.co/google/matcha-plotqa-v2
Chart2Text Statista | https://huggingface.co/google/matcha-chart2text-statista
Chart2Text Pew      | https://huggingface.co/google/matcha-chart2text-pew

Finetuning

Continued pretraining and finetuning of MatCha and DePlot are supported through Hugging Face Transformers. Please see here for more instructions.
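To illustrate what finetuning through Transformers involves, here is a sketch of a single training step. The function name and argument layout are ours; dataset handling, batching, and scheduling are omitted, and transformers plus torch are assumed to be installed.

```python
# Sketch: one finetuning step for a Pix2Struct-family model (MatCha/DePlot).
def train_step(model, processor, optimizer, image, prompt, target_text):
    # Encode the chart image (and optional text prompt) into model inputs.
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    # Tokenize the target sequence (e.g. a linearized table or an answer).
    labels = processor.tokenizer(target_text, return_tensors="pt").input_ids
    # The model computes the cross-entropy loss internally when labels are given.
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In practice you would wrap this in a loop over a DataLoader of (image, target) pairs, or use the Trainer API instead of a hand-written step.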

Inference

The checkpoints are fully compatible with Pix2Struct. For testing and demoing purposes, inference may be run on CPU. In that case, set the JAX_PLATFORMS environment variable to an empty string:

export JAX_PLATFORMS=''

Web Demo

Run the command below to start the web demo. Assuming you are running it locally, it can be accessed at localhost:8080 (or any port specified via the port flag). You can then upload a custom image and an optional prompt. To use a plot-to-table DePlot/MatCha model, specify the query as: "Generate underlying data table of the figure below:".

python -m pix2struct.demo \
  --gin_search_paths="${PIX2STRUCT}/pix2struct/configs,${PIX2STRUCT}" \
  --gin_file=models/pix2struct.gin \
  --gin_file=runs/inference.gin \
  --gin_file=sizes/base.gin \
  --gin.MIXTURE_OR_TASK_NAME="'dummy_pix2struct'" \
  --gin.TASK_FEATURE_LENGTHS="{'inputs': 4096, 'targets': 512}" \
  --gin.BATCH_SIZE=1 \
  --gin.CHECKPOINT_PATH="'gs://deplot/models/base/deplot/v1'"

We also provide a DePlot+LLM demo and a MatCha chart QA demo, both hosted on Hugging Face Spaces.

How to cite DePlot and MatCha?

You can cite the DePlot paper and the MatCha paper as follows:

@inproceedings{liu-2022-deplot,
  title={DePlot: One-shot visual language reasoning by plot-to-table translation},
  author={Fangyu Liu and Julian Martin Eisenschlos and Francesco Piccinno and Syrine Krichene and Chenxi Pang and Kenton Lee and Mandar Joshi and Wenhu Chen and Nigel Collier and Yasemin Altun},
  year={2023},
  booktitle={Findings of the 61st Annual Meeting of the Association for Computational Linguistics},
  url={https://arxiv.org/abs/2212.10505}
}

@inproceedings{liu-2022-matcha,
  title={MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering},
  author={Fangyu Liu and Francesco Piccinno and Syrine Krichene and Chenxi Pang and Kenton Lee and Mandar Joshi and Yasemin Altun and Nigel Collier and Julian Martin Eisenschlos},
  year={2023},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},
  url={https://arxiv.org/abs/2212.09662}
}

Disclaimer

This is not an official Google product.

Contact information

For help or issues, please submit a GitHub issue.
