VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining

This directory contains the model and inference code for the CVPR 2023 paper: "VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining" by Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, and Feng Yang.

Model overview

TFHub

The VILA-R model is available on TensorFlow Hub for predicting image aesthetic scores. See tfhub_inference.ipynb for a sample notebook to try out the model.

To dig deeper into the code and implementation, follow the instructions below.

Prerequisite

Install the dependencies (works with Python 3.10):

pip3 install -r requirements.txt

The model checkpoints can be downloaded from the gcloud directory link.

The folder contains the following checkpoints and tokenizer:

  • ./vila/checkpoints/vila_pretrain/: VILA-P, pretrained on the AVA-Captions dataset.
  • ./vila/checkpoints/vila_rank_tuned/: VILA-R, finetuned on the AVA MOS prediction task using the proposed rank-based adapter module.
  • ./vila/checkpoints/laion_pretrain/: The LAION-pretrained CoCa model.
  • ./vila/spm_model/: The SentencePiece tokenizer used by the models.
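Before running inference, it can help to confirm that the downloaded folder has the layout listed above. The sketch below is a hypothetical helper, not part of the VILA code: the relative paths come from the list above, and the tokenizer file name spm.model is taken from the example commands below.

```python
from pathlib import Path

# Hypothetical helper. Relative paths mirror the folder layout listed above;
# "spm.model" is the tokenizer file name used in the example commands.
EXPECTED_PATHS = (
    "vila/checkpoints/vila_pretrain",
    "vila/checkpoints/vila_rank_tuned",
    "vila/checkpoints/laion_pretrain",
    "vila/spm_model/spm.model",
)


def missing_vila_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

With the download placed under /tmp as in the commands below, missing_vila_paths("/tmp") should return an empty list.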

Run Inference

Example command for running the VILA-R model for aesthetic assessment:

python3 -m vila.run_vila_predict \
--ckpt_dir=/tmp/vila/checkpoints/vila_rank_tuned/ \
--image_path=/tmp/image.jpg \
--spm_model_path=/tmp/vila/spm_model/spm.model
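To call the same command from Python (for example, inside a larger pipeline), it can be wrapped with the standard-library subprocess module. This is a minimal sketch, not part of the VILA codebase: the function names are hypothetical, and it assumes the working directory is one from which `python3 -m vila.run_vila_predict` already works. It returns the raw stdout because the script's output format is not documented here.

```python
import subprocess
import sys


def vila_predict_command(ckpt_dir, image_path, spm_model_path):
    """Build the argv list for the VILA-R aesthetic-scoring command above."""
    return [
        sys.executable, "-m", "vila.run_vila_predict",
        f"--ckpt_dir={ckpt_dir}",
        f"--image_path={image_path}",
        f"--spm_model_path={spm_model_path}",
    ]


def run_vila_predict(ckpt_dir, image_path, spm_model_path):
    """Run the command; return its stdout, raising on a non-zero exit code."""
    result = subprocess.run(
        vila_predict_command(ckpt_dir, image_path, spm_model_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Using sys.executable instead of a literal python3 keeps the subprocess on the same interpreter (and virtual environment) that has the requirements installed.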

Example command for running the VILA model for captioning:

python3 -m vila.run_vila_decode \
--ckpt_dir=/tmp/vila/checkpoints/vila_rank_tuned/ \
--image_path=/tmp/image.jpg \
--spm_model_path=/tmp/vila/spm_model/spm.model

Example command for running the LAION-pretrained model for captioning:

python3 -m vila.run_vila_decode \
--is_pretrain \
--ckpt_dir=/tmp/vila/checkpoints/laion_pretrain/ \
--image_path=/tmp/image.jpg \
--spm_model_path=/tmp/vila/spm_model/spm.model
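The two captioning commands differ only in the checkpoint and the --is_pretrain flag. The sketch below captures that difference and captions every image in a directory by invoking the decode module once per file. The function names and the glob pattern are hypothetical, and it assumes the same working directory as the commands above. Each call reloads the model, so this is a convenience wrapper rather than an efficient batch pipeline.

```python
import subprocess
import sys
from pathlib import Path


def vila_decode_command(ckpt_dir, image_path, spm_model_path, is_pretrain=False):
    """Build argv for vila.run_vila_decode, adding --is_pretrain when asked."""
    cmd = [sys.executable, "-m", "vila.run_vila_decode"]
    if is_pretrain:
        # Used with the LAION-pretrained checkpoint, as in the example above.
        cmd.append("--is_pretrain")
    cmd += [
        f"--ckpt_dir={ckpt_dir}",
        f"--image_path={image_path}",
        f"--spm_model_path={spm_model_path}",
    ]
    return cmd


def caption_images(image_dir, ckpt_dir, spm_model_path, is_pretrain=False,
                   pattern="*.jpg"):
    """Run the decode command once per matching image; map file name to stdout."""
    captions = {}
    for image in sorted(Path(image_dir).glob(pattern)):
        result = subprocess.run(
            vila_decode_command(ckpt_dir, image, spm_model_path, is_pretrain),
            capture_output=True, text=True, check=True,
        )
        captions[image.name] = result.stdout.strip()
    return captions
```

For the LAION model, pass ckpt_dir="/tmp/vila/checkpoints/laion_pretrain/" together with is_pretrain=True.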

Citation

If you find this code useful for your publication, please cite the original paper:

@inproceedings{ke2023vila,
  title={VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining},
  author={Ke, Junjie and Ye, Keren and Yu, Jiahui and Wu, Yonghui and Milanfar, Peyman and Yang, Feng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10041--10051},
  year={2023}
}