
Deep Learning on Core with ML in a Box

ML in a Box is a one-stop-shop generic data science stack for those who want to get going quickly with deep learning and other forms of machine learning on Paperspace Core.

It includes stable and up-to-date installations of widely used ML software:

  • PyTorch 1.10.2
  • TensorFlow 2.5.0
  • H2O 3.34 (includes XGBoost)
  • Scikit-learn 0.24.2

along with other standard parts of the Python PyData ecosystem such as JupyterLab, NumPy, and Pandas. H2O and Scikit-learn cover machine learning algorithms outside of deep learning, such as gradient-boosted decision trees.

It also includes a full setup to enable the above to be used with our GPUs:

  • CUDA 11.3
  • cuDNN 8.2.1
  • CUDA toolkit 11.3
  • Nvidia Docker 2.6.0
  • Nvidia RAPIDS 22.02

The software stack is built on a standard Paperspace Ubuntu 20.04 VM template.

For full details of what is included and how it is installed, please see the ML in a Box GitHub repository.

How to run it

Create machine

  • Navigate to the Machines tab in Paperspace Core
  • Click Create a Machine
  • Select ML-in-a-Box Ubuntu 20.04

ML in a Box tile

  • Select your Machine Type and Region
  • For authentication, choose SSH key or password
  • Adjust any of the Advanced Options (not required)
  • Click Create

In the Machines tab, start the machine if needed.

Connect to machine

ML in a Box is a terminal / SSH only machine, as indicated on its tile in the create machine page above.

Therefore, connect to the machine from your terminal with an ssh command like

ssh paperspace@123.456.789.012

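That is, log in as the user paperspace at your machine's IP address. The address shown above is a placeholder; substitute your own machine's IP.

SSH then authenticates with your key or password, depending on the authentication method that you chose when creating the machine. This should bring you to a command prompt in the machine's terminal interface.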

On the machine, you will be in the home directory

/home/paperspace

The shell is /bin/bash, and your username is paperspace.

The home directory contains several subdirectories, but none of them is needed to run the ML software.

Common data science commands available in /usr/bin include:

apt awk bunzip2 bzip2 cat cc comm crontab curl cut df diff docker echo find free gcc git grep gunzip gzip head java jq ln ls make man md5sum mkdir mv nano nohup nvcc nvidia-smi nvlink ping rm rmdir rsync sed ssh sudo tail tar tee top tr uniq unzip vi watch wget whoami which xargs zip

Several others are in /home/paperspace/src/miniconda/bin:

python python3 python3.8 xz pip pip3 openssl

or /home/paperspace/.local/bin:

ipython ipython3 jupyter jupyter-lab tensorboard

Finally, some commands not present at the current time are:

7z ack anchor awscli csvkit dvc emacs lfs screen tmux vim zsh

When you try to run them, several of these print a hint on how to install them.
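To check which of the commands listed above are available on your own machine, a short Python script can help. This is a minimal sketch using only the standard library; the function name find_commands and the sample command list are illustrative choices, not part of ML in a Box.

```python
import shutil


def find_commands(names):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # A sample drawn from the lists above; edit to suit your project.
    sample = ["git", "docker", "nvidia-smi", "jupyter-lab", "tmux", "vim"]
    for name, path in find_commands(sample).items():
        print(f"{name}: {path or 'not found'}")
```

Any command reported as not found can then be installed with apt or pip as appropriate.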

You are therefore set up to start using the supplied ML software stack, or to install further items if needed.

Try out the software

You can try out various basic commands to verify that the main libraries that we supply with the machine are working.
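Before trying each library in turn, it can be useful to list the installed versions in one pass. The sketch below uses importlib.metadata from the Python 3.8 standard library. The package names are the usual PyPI distribution names and are an assumption here; a given install may use a different name (for example, a GPU-specific TensorFlow distribution).

```python
from importlib import metadata

# Assumed PyPI distribution names; adjust if your install differs.
PACKAGES = ["torch", "tensorflow", "h2o", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

The reported versions apply to whichever Python interpreter runs the script, so run it with the same python that you plan to use for your work.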

PyTorch

Start PyTorch and display GPU.

$ python
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA Quadro M4000'
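Beyond confirming that the GPU is visible, you may want to confirm that a computation actually runs on it. The following is a minimal sketch, not an official test: the function name matmul_check is hypothetical, and the script falls back to the CPU if no GPU is available and returns None if PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # lets the script run where PyTorch is absent
    torch = None


def matmul_check(size=256):
    """Multiply two all-ones matrices on the best available device.

    Returns (device_type, passed), or None if PyTorch is not installed.
    """
    if torch is None:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    ones = torch.ones(size, size, device=device)
    product = ones @ ones  # every entry should equal `size`
    passed = bool((product == size).all())
    return device.type, passed


if __name__ == "__main__":
    print(matmul_check())
```

On a working GPU machine the first element of the result should be cuda; a result of cpu suggests that PyTorch cannot see the GPU.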

TensorFlow

Start TensorFlow and display GPU.

$ python
...
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
pciBusID: 0000:00:05.0 name: NVIDIA Quadro M4000 computeCapability: 5.2
coreClock: 0.7725GHz coreCount: 13 deviceMemorySize: 7.94GiB deviceMemoryBandwidth: 179.11GiB/s
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

H2O

Run H2O demo.

$ python3 -c "import h2o; h2o.init(); h2o.demo('glm')"
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
...
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
...
-------------------------------------------------------------------------------
Demo of H2O's Generalized Linear Estimator.

This demo uploads a dataset to h2o, parses it, and shows a description.
Then it divides the dataset into training and test sets, builds a GLM
from the training set, and makes predictions for the test set.
Finally, default performance metrics are displayed.
-------------------------------------------------------------------------------
# Connect to H2O
h2o.init()
(press any key)

etc.

Scikit-learn

$ python
...
>>> import sklearn
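A successful import shows that Scikit-learn is installed, but fitting a small model is a stronger check. The sketch below trains a gradient-boosted classifier on synthetic data and reports held-out accuracy. The function name holdout_accuracy, the dataset size, and the hyperparameters are all illustrative assumptions.

```python
def holdout_accuracy(seed=0):
    """Fit a small gradient-boosted classifier and return its test accuracy."""
    # Imported lazily so that a missing install surfaces as ImportError here.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic binary classification data: 500 rows, 10 features.
    X, y = make_classification(n_samples=500, n_features=10, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    try:
        print(f"held-out accuracy: {holdout_accuracy():.3f}")
    except ImportError:
        print("scikit-learn is not installed in this environment")
```

This model runs on the CPU; Scikit-learn does not use the GPU here.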

GPU

Display GPU directly using nvidia-smi.

Thu Feb 17 16:21:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00    Driver Version: 470.82.00    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M4000        Off  | 00000000:00:05.0  On |                  N/A |
| 46%   32C    P8    16W / 120W |    217MiB /  8127MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       961      G   /usr/lib/xorg/Xorg                118MiB |
|    0   N/A  N/A      1290      G   /usr/bin/gnome-shell               94MiB |
+-----------------------------------------------------------------------------+
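If you want the same GPU information in a script rather than as a table, nvidia-smi can emit CSV through its --query-gpu and --format options. The sketch below wraps that call from Python. The helper names parse_gpu_rows and query_gpus are hypothetical, and the parser assumes that every memory field is a plain integer in MiB.

```python
import shutil
import subprocess

QUERY_FIELDS = ["name", "memory.total", "memory.used"]


def parse_gpu_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append(
            {
                "name": name,
                "memory_total_mib": int(total),
                "memory_used_mib": int(used),
            }
        )
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed rows (requires the Nvidia driver)."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_rows(completed.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        for gpu in query_gpus():
            print(gpu)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing on a machine without a GPU.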

Nvidia RAPIDS

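Nvidia RAPIDS is installed in its own Conda environment. Activate the environment, print the cuDF and cuML versions, then deactivate it. Running conda init bash is needed only once; if conda activate is not recognized afterwards, open a new shell session so that the change takes effect.

$ conda init bash
$ conda activate rapids-22.02
$ python3 -c "import cudf, cuml; print(cudf.__version__); print(cuml.__version__)"
$ conda deactivate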

Next Steps

ML in a Box is generic, so next steps largely depend on the needs of your project.

Some common ones are:

  • Run software as-is and proceed with ML
  • See the accompanying GitHub repository for full details of the software that is installed
  • Install your own additional software on top of our generic stack

Please report any issues, feedback, or requests for future software to be included in the stack to the Paperspace Community or support.