arXiv Topic Modelling
This directory contains an example of how to use the arxiv Python package (a wrapper for the arXiv API), the BERTopic Python package (transformer-based topic modelling), and several functions from the unstructured library to run topic modelling on queried arXiv research papers. The notebook is very simple, but it can easily be modified for more complicated use cases.
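For orientation, here is a minimal sketch of the kind of pipeline described above: query arXiv, turn each result into one text document, and fit a topic model. It is not the notebook's actual code. The helper names (`normalize_text`, `build_document`, `fetch_documents`, `fit_topics`, `main`) and the example query are hypothetical, the whitespace cleanup is a standard-library stand-in for the unstructured cleaning functions, and the arXiv calls assume a recent release of the arxiv package.

```python
import re


def normalize_text(text: str) -> str:
    """Collapse runs of whitespace (stand-in for unstructured's cleaning helpers)."""
    return re.sub(r"\s+", " ", text).strip()


def build_document(title: str, abstract: str) -> str:
    """Join a paper's title and abstract into one string for the topic model."""
    return f"{normalize_text(title)}. {normalize_text(abstract)}"


def fetch_documents(query: str, max_results: int = 100) -> list[str]:
    """Query arXiv and return one document string per paper (needs network access)."""
    import arxiv  # imported lazily so the helpers above work without it

    search = arxiv.Search(query=query, max_results=max_results)
    results = arxiv.Client().results(search)
    return [build_document(r.title, r.summary) for r in results]


def fit_topics(documents: list[str]):
    """Fit BERTopic with default settings; returns the model and per-document topic ids."""
    from bertopic import BERTopic  # heavy import, so it is also deferred

    topic_model = BERTopic()
    topics, _probabilities = topic_model.fit_transform(documents)
    return topic_model, topics


def main() -> None:
    # Hypothetical query; BERTopic generally needs a few hundred documents
    # before the discovered topics become meaningful.
    docs = fetch_documents("natural language processing", max_results=300)
    model, _topics = fit_topics(docs)
    print(model.get_topic_info().head())
```

Calling `main()` runs the full pipeline; the two text helpers can be used on their own without any third-party packages installed.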
To get started, use the following steps:
- Ensure you have Python 3.10 or higher installed on your system
- Create a new Python virtual environment
- Run `pip install -r requirements.txt` to install the dependencies
- Run `PYTHONPATH=. jupyter notebook` from this directory to launch the notebook
At this point, you'll be able to run the topic modelling example notebook.
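If the notebook fails on import, a quick environment check along the following lines may help. This is only a sketch: the function names are hypothetical, and the list of import names (`arxiv`, `bertopic`, `unstructured`) is an assumption based on the packages named in this README, not on the contents of requirements.txt.

```python
import importlib.util
import sys

# Assumed import names for the packages mentioned in this README.
REQUIRED_PACKAGES = ["arxiv", "bertopic", "unstructured"]


def python_ok(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report() -> None:
    """Print a short summary of the environment."""
    print("Python 3.10+:", python_ok())
    missing = missing_packages(REQUIRED_PACKAGES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running `report()` inside the activated virtual environment shows whether the interpreter version and the three packages are in place before the notebook is launched.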