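1. worker agent: the host management program deployed on each GPU machine. It reports the machine's GPU count, CPU count, and memory size, and manages the lifecycle (creation → destruction) of user-created containers.
2. ft agent: a program deployed on each GPU machine for transferring data between hosts, used mainly by the instance copy feature. It is deployed and managed together with the worker agent.
3. proxy agent: a network bridging service that proxies user access to container instances. The access path is user → proxy agent → container rather than user → container, which allows centralized management when there are many users and containers.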

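For the role and position of these three services in the overall system, see the diagram:

[Figure: role and position of the three services in the overall system]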

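Image management

Default images

After a tenant is initialized, the following images are added to the system images by default and work out of the box. The latest framework images are also added periodically.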

| Image name | Image address | CUDA version |
| --- | --- | --- |
| miniconda:cuda12.2-cudnn8-devel-ubuntu22.04-py310 | ccr.ccs.tencentyun.com/autodl-private-cloud/miniconda:cuda12.2-cudnn8-devel-ubuntu22.04-py310 | 12.2 |
| miniconda:cuda11.8-cudnn8-devel-ubuntu22.04-py310 | ccr.ccs.tencentyun.com/autodl-private-cloud/miniconda:cuda11.8-cudnn8-devel-ubuntu22.04-py310 | 11.8 |
| miniconda:cuda11.6-cudnn8-devel-ubuntu20.04-py38 | ccr.ccs.tencentyun.com/autodl-private-cloud/miniconda:cuda11.6-cudnn8-devel-ubuntu20.04-py38 | 11.6 |
| miniconda:cuda11.3-cudnn8-devel-ubuntu18.04-py38 | ccr.ccs.tencentyun.com/autodl-private-cloud/miniconda:cuda11.3-cudnn8-devel-ubuntu18.04-py38 | 11.3 |
| torch:cuda12.1-cudnn8-devel-ubuntu22.04-py312-torch2.3.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda12.1-cudnn8-devel-ubuntu22.04-py312-torch2.3.0 | 12.1 |
| torch:cuda11.8-cudnn8-devel-ubuntu22.04-py310-torch2.1.2 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda11.8-cudnn8-devel-ubuntu22.04-py310-torch2.1.2 | 11.8 |
| torch:cuda11.8-cudnn8-devel-ubuntu20.04-py38-torch2.0.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda11.8-cudnn8-devel-ubuntu20.04-py38-torch2.0.0 | 11.8 |
| torch:cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.11.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.11.0 | 11.3 |
| torch:cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.10.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.10.0 | 11.3 |
| torch:cuda11.1-cudnn8-devel-ubuntu18.04-py38-torch1.9.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda11.1-cudnn8-devel-ubuntu18.04-py38-torch1.9.0 | 11.1 |
| torch:cuda10.1-cudnn7-devel-ubuntu18.04-py38-torch1.5.1 | ccr.ccs.tencentyun.com/autodl-private-cloud/torch:cuda10.1-cudnn7-devel-ubuntu18.04-py38-torch1.5.1 | 10.1 |
| tensorflow:cuda11.2-cudnn8-devel-ubuntu20.04-py38-tf2.9.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/tensorflow:cuda11.2-cudnn8-devel-ubuntu20.04-py38-tf2.9.0 | 11.2 |
| tensorflow:cuda11.2-cudnn8-devel-ubuntu18.04-py38-tf2.5.0 | ccr.ccs.tencentyun.com/autodl-private-cloud/tensorflow:cuda11.2-cudnn8-devel-ubuntu18.04-py38-tf2.5.0 | 11.2 |
| tensorflow:cuda11.4-py38-tf1.15.5 | ccr.ccs.tencentyun.com/autodl-private-cloud/tensorflow:cuda11.x-py38-tf1.15.5 | 11.4 |
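
Each tag in the table carries its CUDA version right after the `cuda` prefix. As a minimal sketch, the version can be read from an image name or a full image address with a small shell helper. The function name `cuda_version_of` is hypothetical and not part of the platform; it only parses the string and does not inspect the image itself.

```shell
# Hypothetical helper (an illustration, not platform tooling).
# Prints the CUDA version found after ":cuda" in an image name or address,
# e.g. "torch:cuda11.8-cudnn8-devel-ubuntu22.04-py310-torch2.1.2" -> 11.8.
# Prints nothing if the input has no ":cuda<major>.<minor>" part.
cuda_version_of() {
    printf '%s\n' "$1" | sed -nE 's/^.*:cuda([0-9]+\.[0-9a-z]+).*$/\1/p'
}

# Example call using a name from the table above.
cuda_version_of 'miniconda:cuda12.2-cudnn8-devel-ubuntu22.04-py310'
```

For the last row, the address tag reads `cuda11.x`, so the helper returns `11.x` when given that address and `11.4` when given the image name.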

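Custom images

After you adapt or install the SSH service and JupyterLab on top of an image you have built, the image can be registered in the system as a system image. If JupyterLab is not installed, JupyterLab will be unavailable in the instance, but the instance will still start normally.

Relevant Dockerfile snippets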

# (Required) Adjust the SSH configuration:
# allow root login in sshd_config and disable strict host key checking in ssh_config
RUN mkdir -p /var/run/sshd && \
      sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config && \
      cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \
      echo "    StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \
      mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config

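# (Optional, but strongly recommended) Install JupyterLab
# The version specifier is quoted so the shell does not treat ">" as a redirection
RUN pip install --no-cache-dir --upgrade \
      "jupyterlab>=3.0.0" \
      ipywidgets \
      matplotlib \
      jupyterlab_language_pack_zh_CN \
      -i https://mirrors.aliyun.com/pypi/simple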

# (Optional) Install Miniconda; for other versions see https://repo.anaconda.com/miniconda/
RUN cd /root && wget -q https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Linux-x86_64.sh \
        && bash ./Miniconda3-py38_4.10.3-Linux-x86_64.sh -b -f -p /root/miniconda3 \
        && rm -f ./Miniconda3-py38_4.10.3-Linux-x86_64.sh \
        && echo "PATH=/root/miniconda3/bin:/usr/local/bin:$PATH" >> /etc/profile \
        && echo "source /etc/profile" >> /root/.bashrc \
        && echo "source /etc/autodl-motd" >> /root/.bashrc

# (Optional) Set the locale and time zone
RUN export DEBIAN_FRONTEND=noninteractive && \
      locale-gen zh_CN zh_CN.GB18030 zh_CN.GBK zh_CN.UTF-8 en_US.UTF-8 && \
      update-locale && \
      echo "LANG=en_US.UTF-8" >> /etc/profile && \
      echo "LANGUAGE=en_US:en" >> /etc/profile && \
      echo "LC_ALL=en_US.UTF-8" >> /etc/profile && \
      cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
      echo 'Asia/Shanghai' >/etc/timezone
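
The `sed` expression in the SSH step only rewrites lines that already begin with `PermitRootLogin`. If your base image ships that directive commented out (an assumption about your base image, worth checking), the substitution leaves the file unchanged. A minimal sketch for verifying the result inside the built image is shown below; the function name `root_login_enabled` is hypothetical and not part of the platform tooling.

```shell
# Hypothetical check (an illustration, not platform tooling).
# Succeeds only if the given sshd_config file contains an active
# (uncommented) "PermitRootLogin yes" line.
root_login_enabled() {
    grep -Eq '^[[:space:]]*PermitRootLogin[[:space:]]+yes[[:space:]]*$' "$1"
}

# Example: run inside the image against the real config file.
# The path is the one used in the Dockerfile snippet above.
if [ -f /etc/ssh/sshd_config ]; then
    if root_login_enabled /etc/ssh/sshd_config; then
        echo "PermitRootLogin yes is active"
    else
        echo "PermitRootLogin yes is NOT active"
    fi
fi
```

If the check fails, one option is to append a `PermitRootLogin yes` line to `/etc/ssh/sshd_config` in your own Dockerfile, provided no earlier active `PermitRootLogin` line is present.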