Tools for replicating the results of Penn & Choma.

Files in this directory include make_paper_pc_plots.sh, plot_cooccurrence.py and preprocessors.py.

README

Implementation of Penn & Choma's correlation measure.
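The README does not spell the measure out. As a rough guide to what it computes, here is a minimal Python sketch. It assumes the measure is the sum of absolute Pearson correlations between per-document character counts, taken over every pair of character types. The function name `summed_absolute_correlation` and its `threshold` parameter are hypothetical, and the scripts in this directory may filter, threshold or normalize the correlations differently.

```python
import numpy as np


def summed_absolute_correlation(documents, threshold=0.0):
    """Sum |Pearson r| over all pairs of character types (assumed measure).

    documents: list of strings (or symbol sequences), one per document.
    threshold: hypothetical cutoff; only pairs with |r| > threshold count.
    """
    # Character inventory over the whole corpus, in a fixed order.
    types = sorted({ch for doc in documents for ch in doc})
    index = {ch: i for i, ch in enumerate(types)}

    # Document-by-type count matrix.
    counts = np.zeros((len(documents), len(types)))
    for d, doc in enumerate(documents):
        for ch in doc:
            counts[d, index[ch]] += 1

    # Drop types whose counts never vary: their correlation is undefined.
    counts = counts[:, counts.std(axis=0) > 0]
    if counts.shape[1] < 2:
        return 0.0

    # Pearson correlations between type columns (columns are variables).
    corr = np.corrcoef(counts, rowvar=False)

    # Count each unordered pair once via the strict upper triangle.
    pair_values = np.abs(corr[np.triu_indices_from(corr, k=1)])
    return float(pair_values[pair_values > threshold].sum())
```

For example, `summed_absolute_correlation(["ab", "aabb", "c"])` compares the count vectors of `a`, `b` and `c` across three documents: `a` and `b` move together perfectly, while `c` is negatively correlated with both.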

The script run.sh gives usage examples. Below are the results of one invocation of that script, which performs five runs for each case; each figure is the average over those five runs:

Summed absolute correlations:

                  Document is 1 chapter   Document is 6 chapters
Chinese                            5828                    14859
3-gram English                     6386                    15116
Korean                             8508                    18200

Mean number of characters per document:

                  Document is 1 chapter   Document is 6 chapters
Chinese                             782                     4674
3-gram English                     1110                     6635
Korean                             1118                     6686

Korean has higher sums overall, but it also has far fewer distinct characters, so any two characters have a better chance of co-occurring in a given document:

Number of distinct characters:

Chinese           3177
3-gram English    3194
Korean            1249

That is, 3-gram English and Chinese are closely matched in the number of character types, whereas Korean has only about two-fifths as many, which accounts for its higher sums.
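The README does not say exactly how "3-gram English" is built. One plausible reading, assumed here, is that English text is cut into non-overlapping letter trigrams so that each trigram acts as a single "character", giving an inventory comparable in size to Chinese. The sketch below shows that transformation and how a count of distinct types like the ones above could be obtained; the helper names `to_trigram_symbols` and `distinct_types`, and the letters-only lowercasing, are hypothetical choices, not taken from this repository.

```python
def to_trigram_symbols(text):
    """Split text into non-overlapping 3-letter chunks (assumed scheme)."""
    # Keep letters only and lowercase them; this normalization is an assumption.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    # Group consecutive letters in threes; an incomplete final chunk is dropped.
    return ["".join(letters[i:i + 3]) for i in range(0, len(letters) - 2, 3)]


def distinct_types(documents):
    """Number of distinct symbols across a collection of documents."""
    return len({sym for doc in documents for sym in doc})
```

For Chinese or Korean, each document can be passed to `distinct_types` as a plain string, since iterating over a Python string yields its characters.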

Between them, document size and the number of distinct characters can explain all of the differences that Penn & Choma report.
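A deliberately crude null model makes the direction of both effects visible. This model is an assumption of the sketch, not part of Penn & Choma's method: if each document were n characters drawn independently and uniformly from V types, then by inclusion-exclusion the probability that two given types both occur in a document would be 1 - 2(1 - 1/V)^n + (1 - 2/V)^n. The helper name `pair_cooccurrence_probability` is hypothetical; the inputs are the mean document lengths and distinct-character counts from the tables above.

```python
def pair_cooccurrence_probability(n, v):
    """P(two given types both occur) in n uniform draws from v types.

    Inclusion-exclusion: 1 - P(A absent) - P(B absent) + P(both absent).
    The uniform, independent-draw model is an assumption for illustration.
    """
    p_one_absent = (1 - 1 / v) ** n
    p_both_absent = (1 - 2 / v) ** n
    return 1 - 2 * p_one_absent + p_both_absent


# (name, mean chars for 1 chapter, mean chars for 6 chapters, distinct chars)
for name, n1, n6, v in [("Chinese", 782, 4674, 3177),
                        ("3-gram English", 1110, 6635, 3194),
                        ("Korean", 1118, 6686, 1249)]:
    p1 = pair_cooccurrence_probability(n1, v)
    p6 = pair_cooccurrence_probability(n6, v)
    print(f"{name}: 1 chapter {p1:.3f}, 6 chapters {p6:.3f}")
```

Under this model the Korean one-chapter probability comes out several times larger than the Chinese one, and moving to six-chapter documents raises every value. Real text is far from uniform, so these numbers only indicate direction, not the magnitudes in the tables.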
