
Check back soon for a new paper. This work is also based on the older paper "Using deep learning to annotate the protein universe". The main difference is that in this work we don't rely on alignment methods to cut up proteins into domains before using the neural networks; instead, the neural networks themselves localize the domain calls within proteins.

You might also be interested in our other related work on ProteInfer.

Usage instructions

If you're interested in the command line interface, see below.

Install gcloud on your local machine if you don't already have it

sudo apt install -y google-cloud-sdk
gcloud auth login

Create a GCP instance with a GPU

gcloud compute instances create protenn-gpu --machine-type n1-standard-8 --zone us-west1-b --accelerator type=nvidia-tesla-v100,count=1  --image-family ubuntu-2004-lts --image-project ubuntu-os-cloud --maintenance-policy TERMINATE --boot-disk-size 250

SSH into the machine

# You may need to wait ~30 seconds for the machine to boot up first.
gcloud compute ssh protenn-gpu

Install CUDA dependencies for GPU support

sudo apt update
sudo add-apt-repository ppa:graphics-drivers -y

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb -O /tmp/cuda-keyring_1.0-1_all.deb
sudo dpkg -i /tmp/cuda-keyring_1.0-1_all.deb

sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda_learn.list'
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

sudo apt update
sudo apt install -y cuda-10-0 libcudnn7
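
Before moving on, it can help to confirm that the GPU is visible. The following is a minimal sketch, not part of the ProteNN code: it assumes the NVIDIA driver installed alongside CUDA provides the `nvidia-smi` tool, and `list_gpus` is a hypothetical helper name. A reboot may be needed before the driver reports the device.

```python
import shutil
import subprocess


def list_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none is visible."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, no GPU is usable yet.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU visible to nvidia-smi")
```

An empty list here usually means the driver is not loaded yet, so it is worth rechecking after a reboot.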

Set up a local Python virtual environment

sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install -y python3-venv python3.7 python3-pip python3.7-venv 
mkdir ~/python_venv
cd ~/python_venv
python3.7 -m venv protenn
source ~/python_venv/protenn/bin/activate
cd ~

Get our code from GitHub and install Python dependencies (e.g. numpy)

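# GitHub no longer serves Subversion, so this uses a sparse git checkout in place of `svn export`.
sudo apt install -y git
git clone --depth 1 --filter=blob:none --sparse https://github.com/google-research/google-research.git
git -C google-research sparse-checkout set protenn
mv google-research/protenn .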
pip3 install -r protenn/requirements.txt

Run our code on test sequences

python -m protenn.install_models
python -m protenn.predict -i protenn/testdata/test_hemoglobin.fasta -o ~/hemoglobin_predictions.tsv
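
To annotate several FASTA files, the same command can be driven from Python. This is a minimal sketch that uses only the `-i` and `-o` flags shown above; `build_predict_command` and `predict_all` are hypothetical helper names, and the script is assumed to run from the directory that contains `protenn`, inside the activated virtual environment.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_command(input_fasta, output_tsv):
    """Return the argument list for one protenn.predict run (-i input, -o output)."""
    return [sys.executable, "-m", "protenn.predict",
            "-i", str(input_fasta), "-o", str(output_tsv)]


def predict_all(fasta_dir, output_dir, dry_run=True):
    """Build one command per *.fasta file; run each one unless dry_run is set."""
    output_dir = Path(output_dir)
    commands = []
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        cmd = build_predict_command(fasta, output_dir / (fasta.stem + ".tsv"))
        commands.append(cmd)
        if not dry_run:
            output_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if protenn.predict fails
    return commands


# Show the command for the bundled test file without running it.
example = build_predict_command("protenn/testdata/test_hemoglobin.fasta",
                                "hemoglobin_predictions.tsv")
print(" ".join(example[1:]))
```

Calling `predict_all("protenn/testdata", "predictions", dry_run=False)` would then write one TSV per input file into the hypothetical `predictions` directory.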

The protenn.predict command should print output like the following:

$ python -m protenn.predict -i protenn/testdata/test_hemoglobin.fasta -o ~/hemoglobin_predictions.tsv
I0809 16:24:10.495073 140694289487680 protenn.py:420] Running with 1 ensemble elements
I0809 16:24:10.495324 140694289487680 protenn.py:186] Parsing input from protenn/testdata/test_hemoglobin.fasta
Loading models: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.14s/it]
I0809 16:24:12.682420 140694289487680 inference.py:280] Predicting for 1 sequences
Annotating batches of sequences: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.55it/s]
I0809 16:24:13.113105 140694289487680 inference.py:280] Predicting for 1 sequences
Annotating batches of sequences: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.20it/s]

# View your predictions.
$ cat ~/hemoglobin_predictions.tsv

sequence_name	predicted_label	start	end	description
sp|P69891|HBG1_HUMAN	PF00042	25	143	Globin
sp|Q7AP54|HBP2_LISMO	PF05031	31	127	Iron Transport-associated domain
sp|Q7AP54|HBP2_LISMO	PF05031	132	303	Iron Transport-associated domain
sp|Q7AP54|HBP2_LISMO	PF05031	352	488	Iron Transport-associated domain
sp|Q7AP54|HBP2_LISMO	PF00746	528	569	LPXTG cell wall anchor motif
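
The predictions file is easy to post-process. Here is a minimal sketch that groups domain calls by sequence, assuming the file is tab-separated with the header row shown above. `load_domain_calls` is a hypothetical helper name, and the sample rows are copied from the hemoglobin output so the snippet runs without ProteNN installed.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the hemoglobin predictions shown above (tab-separated).
SAMPLE_TSV = (
    "sequence_name\tpredicted_label\tstart\tend\tdescription\n"
    "sp|P69891|HBG1_HUMAN\tPF00042\t25\t143\tGlobin\n"
    "sp|Q7AP54|HBP2_LISMO\tPF05031\t31\t127\tIron Transport-associated domain\n"
    "sp|Q7AP54|HBP2_LISMO\tPF05031\t132\t303\tIron Transport-associated domain\n"
    "sp|Q7AP54|HBP2_LISMO\tPF05031\t352\t488\tIron Transport-associated domain\n"
    "sp|Q7AP54|HBP2_LISMO\tPF00746\t528\t569\tLPXTG cell wall anchor motif\n"
)


def load_domain_calls(handle):
    """Map each sequence name to a list of (label, start, end, description) tuples."""
    calls = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        calls[row["sequence_name"]].append(
            (row["predicted_label"], int(row["start"]), int(row["end"]),
             row["description"])
        )
    return dict(calls)


calls = load_domain_calls(io.StringIO(SAMPLE_TSV))
for name, domains in calls.items():
    print(name, [(label, start, end) for label, start, end, _ in domains])
```

To read a real predictions file, pass `open(path, newline="")` to `load_domain_calls` in place of the `io.StringIO` wrapper.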