Towards Learning a Universal Non-Semantic Representation of Speech
Papers using this code:
- Interspeech 2022: TRILLsson: Distilled Universal Paralinguistic Speech Representations
- ICASSP 2022: Universal Paralinguistic Speech Representations Using Self-Supervised Conformers
- IEEE JSTSP 2022: BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
- Interspeech 2021: FRILL: A Non-Semantic Speech Embedding for Mobile Devices
- Interspeech 2021: Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases
- Interspeech 2020: Towards Learning a Universal Non-Semantic Representation of Speech
This repository contains a benchmark for comparing speech representations, along with the evaluation code to run it. It also includes a description of our best baseline representation, TRILL.
Things you can do
- Reproduce the results from our paper
- Compute the performance of a new embedding on the Non-Semantic Speech Benchmark (NOSS)
- Run our embedding TRILL, or any of the other embedding networks, on a new dataset
Citation
To use this benchmark, please cite as follows:
@inproceedings{trill,
  author={Joel Shor and Aren Jansen and Ronnie Maor and Oran Lang and Omry Tuval and Félix de Chaumont Quitry and Marco Tagliasacchi and Ira Shavitt and Dotan Emanuel and Yinnon Haviv},
  title={Towards Learning a Universal Non-Semantic Representation of Speech},
  year={2020},
  booktitle={Interspeech},
  pages={140--144},
  doi={10.21437/Interspeech.2020-1242}
}
To use the embeddings, please cite the appropriate paper from the list above.
For questions, reach out to:
Joel Shor (joelshor@google.com)
Oran Lang (oranl@google.com)