Generative Pseudo-Labeling
This directory contains the code notebooks explained in the Generative Pseudo-Labeling (GPL) article. Notebooks include:
- 00-download-cord-19.ipynb shows how to download the CORD-19 dataset.
- 01-query-gen.ipynb demonstrates the synthetic query generation data prep step.
- 02-negative-mining.ipynb works through the second data prep step of negative mining.
- 03-ce-scoring.ipynb details the final data prep step of pseudo-labeling.
- 04-finetune.ipynb shows how to use the data created in the previous notebooks to fine-tune a bi-encoder using Margin MSE loss.
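To make the Margin MSE objective used in the fine-tuning step concrete, here is a minimal sketch in plain Python. It assumes dot-product similarity and toy two-dimensional embeddings. The names `dot` and `margin_mse_loss` are hypothetical and are not taken from the notebooks, which may rely on a library implementation instead. The idea is that the bi-encoder's margin between a positive and a negative passage is pushed toward the margin assigned by the cross-encoder during pseudo-labeling.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def margin_mse_loss(
    queries: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    target_margins: Sequence[float],
) -> float:
    """Mean squared error between bi-encoder margins and target margins.

    Each target margin is assumed to be the cross-encoder score of
    (query, positive) minus its score of (query, negative).
    """
    n = len(queries)
    if n == 0 or not (n == len(positives) == len(negatives) == len(target_margins)):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for q, p, neg, target in zip(queries, positives, negatives, target_margins):
        # Margin predicted by the bi-encoder: sim(q, p) - sim(q, neg).
        predicted = dot(q, p) - dot(q, neg)
        total += (predicted - target) ** 2
    return total / n


# Toy batch of two (query, positive, negative) triples with made-up values.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[2.0, 0.0], [0.0, 1.0]]
negatives = [[0.0, 1.0], [1.0, 0.0]]
target_margins = [2.0, 0.0]  # illustrative cross-encoder margins

loss = margin_mse_loss(queries, positives, negatives, target_margins)
print(loss)  # 0.5
```

In the toy batch, the first triple already matches its target margin and contributes nothing, while the second is off by 1.0, so the mean squared error is 0.5.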
All of this content is part of a course called NLP for Semantic Search.