pytorch-lightning

Форк
0

README.md

Creation GPU self-hosted agent pool

Prepare the machine

This is a slightly modified version of the script from https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/docker

apt-get update
apt-get install -y --no-install-recommends \
ca-certificates \
curl \
jq \
git \
iputils-ping \
libcurl4 \
libunwind8 \
netcat \
libssl1.0
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
mkdir /azp

Stating the agents

export TARGETARCH=linux-x64
export AZP_URL="https://dev.azure.com/Lightning-AI"
export AZP_TOKEN="xxxxxxxxxxxxxxxxxxxxxxxxxx"
export AZP_POOL="lit-rtx-3090"
for i in {0..7..2}
do
nohup bash .azure/start.sh \
"AZP_AGENT_NAME=litGPU-YX_$i,$((i+1))" \
"CUDA_VISIBLE_DEVICES=$i,$((i+1))" \
> "agent-$i.log" &
done

Check running agents

ps aux | grep start.sh

Machine maintenance

Since most of our jobs/checks are running in a Docker container, the OS/machine can become polluted and fail to run with errors such as:

No space left on device : '/azp/agent-litGPU-21_0,1/_diag/pages/8bb191f4-a8c2-419a-8788-66e3f0522bea_1.log'

In such cases, you need to log in to the machine and run docker system prune.

Automated ways

Let's explore adding a cron job for periodically removing all Docker caches:

  1. Open your user's cron tab for editing: crontab -e
  2. Schedule/add the command with the --force flag to force pruning without interactive confirmation:
    # every day at 2:00 AM clean docker caches
    0 2 * * * docker system prune --force
  3. Verify the entry: crontab -l

Note: You may need to add yourself to the Docker group by running sudo usermod -aG docker <your_username> to have permission to execute this command without needing sudo and entering the password.

Использование cookies

Мы используем файлы cookie в соответствии с Политикой конфиденциальности и Политикой использования cookies.

Нажимая кнопку «Принимаю», Вы даете АО «СберТех» согласие на обработку Ваших персональных данных в целях совершенствования нашего веб-сайта и Сервиса GitVerse, а также повышения удобства их использования.

Запретить использование cookies Вы можете самостоятельно в настройках Вашего браузера.