Transferring Data to Core

This guide will walk you through how to load data from various sources onto your Core machine.

Copying Data To and From a Core Linux Machine

In this guide we'll cover how to copy data onto and off of your machine from:

  • A local desktop or laptop using SCP
  • Publicly accessible URLs and buckets using wget
  • Private object storage e.g. S3 Buckets

Copying Files To or From your Laptop or Desktop

note

This article assumes you're using either macOS or Linux. If you're using Windows, you can install Windows Subsystem for Linux (WSL) and follow along as normal.

To copy files from your local laptop or desktop to your Core machine, you'll first need to have SSH set up correctly on your local machine.

Once you have SSH set up, it only takes a single command to copy files to your Core machine. However, you'll need to modify it slightly first. This command should be run in your local terminal (while not connected via SSH to your machine).

scp -i ~/.ssh/my-key.pem ~/path/to/local_file paperspace@machine-ip-address:~/.

To use this command you'll need to replace my-key with the name of the SSH key you created to connect to your machine (don't forget the .pem).

Next you'll need to replace ~/path/to/local_file with the path to the local file on your machine. Remember that ~ is shorthand for your home directory, so if you wanted to upload a data set in your Documents folder you would type something like: ~/Documents/my-data-set.csv.

After that you'll need to replace machine-ip-address with the IP address listed for your machine in the console.

Optionally, you can also replace the ~/. with whatever path you would like to copy the data to on your machine. If you leave it as is, it will be copied into your home directory on the machine.
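
For example, with a key named my-key.pem, a data set at ~/Documents/my-data-set.csv, and a machine at the placeholder address 203.0.113.10 (all example values you'd replace with your own), the filled-in command would look like:

scp -i ~/.ssh/my-key.pem ~/Documents/my-data-set.csv paperspace@203.0.113.10:~/.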

Copying results from your Core machine back to your local machine

Copying files back to your local machine is nearly the same as before. However, you have to flip the local and remote paths.

scp -i ~/.ssh/my-key.pem paperspace@machine-ip-address:~/results.ckpt .

You still need to replace my-key.pem as well as machine-ip-address with your own values. This time you specify the remote file you'd like to copy, e.g. ~/results.ckpt, and then tell scp where to send it, e.g. . (. being the current directory).
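
If you'd like to copy back a whole directory of results rather than a single file, scp can copy recursively with the -r flag. For example, assuming a results directory named ~/results on the remote machine (a placeholder name):

scp -r -i ~/.ssh/my-key.pem paperspace@machine-ip-address:~/results .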

Copying files from publicly accessible URLs and cloud storage buckets

To copy files from a public URL or cloud storage bucket we recommend using wget.

First make sure you're connected via SSH to your machine and then run:

wget https://example.com/example-data-set.tar.gz

where https://example.com/example-data-set.tar.gz is the publicly accessible URL of the data set you'd like to download. This works for publicly accessible S3, Azure, or Google Cloud bucket URLs.
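
If you'd rather download directly into a specific directory, wget's -P flag sets the destination folder. For example, assuming a ~/datasets directory on your machine:

wget -P ~/datasets https://example.com/example-data-set.tar.gz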

See below for downloading files from private buckets.

Opening compressed data sets

If the data set you've downloaded is compressed as a file with the .tar.gz extension you can decompress it using the tar command like so:

tar -xvf example-data-set.tar.gz
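
If you'd like to extract the archive into a different directory instead of the current one, you can pass tar the -C flag. For example, assuming a target directory of ~/datasets/example (a placeholder path):

mkdir -p ~/datasets/example
tar -xvf example-data-set.tar.gz -C ~/datasets/example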

If your data set is compressed as a file with the .zip extension you'll have to first install unzip using the command:

sudo apt-get install unzip

After installation you can then unzip the data set in its current folder using the unzip command:

unzip example-data-set.zip

If you'd like to extract the data set into a different directory when you unzip it, you can use the command:

unzip example-data-set.zip -d /path/to/new/directory

Copying files from private S3, Azure, or Google Cloud buckets

To copy files from a private cloud storage bucket we recommend installing the CLI of the specific cloud provider on your Core machine.

Example: S3 — using the AWS CLI

We recommend installing the AWS CLI via pip. To get the AWS CLI up and running for syncing files from S3 you'll need to SSH into your Core machine and then:

  • Install the AWS CLI via pip by following the instructions for Linux.
  • Configure the CLI to use the correct credentials for accessing the desired bucket.
  • Follow the S3 CLI documentation on using the aws s3 sync command to copy the desired files, as shown in the example below.
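
Putting those steps together, a typical session might look something like the sketch below. The bucket name (my-private-bucket) and local path are placeholders you'd replace with your own:

# Install the AWS CLI from PyPI
pip3 install --upgrade awscli

# Enter the access key, secret access key, and default region for an IAM user that can access the bucket
aws configure

# Copy the bucket contents down to the machine
aws s3 sync s3://my-private-bucket/datasets ~/datasets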

The most common errors we see with using the AWS CLI to download files from or upload files to S3 involve incorrect IAM roles and permissions. It's important to make sure that the IAM user associated with the access key you use has the correct permissions to read from and write to the desired bucket.
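
If a sync fails with a permissions error, a quick first check is to confirm which IAM identity the CLI is actually using:

aws sts get-caller-identity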