Distributed ML @ W&M

Example Dataset


Last updated 2 years ago

To illustrate how to use sklearn in a distributed system, we'll use a small CSV about student alcohol consumption from Kaggle (see https://www.kaggle.com/datasets/uciml/student-alcohol-consumption). As a first step, you'll need to upload the CSV to your home directory on SciClone (or any other directory you prefer). The easiest way to do this with a GUI is a program like FileZilla (a free file transfer program). If you are familiar with terminal-based tools such as scp, you can use those instead.
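For the scp route, here is a minimal Python sketch (Python being the language used throughout this tutorial) that builds and runs the standard `scp <file> <user>@<host>:` command. An empty remote path after the colon means "your home directory". The helper names `build_scp_command` and `upload`, and the username `your_username`, are hypothetical placeholders; scp will prompt for your password when it runs.

```python
import shlex
import subprocess

# Host from this tutorial; the username passed in below is a placeholder.
SCICLONE_HOST = "bora.sciclone.wm.edu"


def build_scp_command(local_path: str, username: str,
                      host: str = SCICLONE_HOST, remote_dir: str = "") -> list[str]:
    """Return the scp argument list for copying local_path to host.

    An empty remote_dir copies into your home directory; a relative
    remote_dir (e.g. "data/") is interpreted relative to your home directory.
    """
    # scp syntax: scp <source> <user>@<host>:<destination>
    return ["scp", local_path, f"{username}@{host}:{remote_dir}"]


def upload(local_path: str, username: str) -> None:
    """Run scp (hypothetical helper); scp prompts for your password."""
    cmd = build_scp_command(local_path, username)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if scp fails


if __name__ == "__main__":
    # Only print the command; call upload(...) yourself to actually copy.
    print(shlex.join(build_scp_command("studentpor.csv", "your_username")))
```

Keeping command construction separate from execution lets you inspect (or paste into a terminal) the exact command before anything is sent to the cluster.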

To upload the file with a program such as FileZilla, choose the SFTP - SSH File Transfer Protocol and set the host to bora.sciclone.wm.edu. After you enter your username and password, you'll see a fairly straightforward interface for uploading files. For more detail, see the full FileZilla tutorial.

[Figure: the FileZilla file-transfer interface]

From here, you can drag and drop the file you want to upload; in our case, that is the "studentpor.csv" file. Once it's uploaded, you can confirm it is present by typing ls in your SciClone terminal:

[Figure: ls output in the SciClone terminal listing the uploaded file]
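You can also check the upload from Python. The sketch below assumes the file sits in your home directory as `studentpor.csv`. It confirms the file exists and reports its header and row count using only the standard library. `summarize_csv` is a hypothetical helper, and the comma delimiter is an assumption you may need to change.

```python
import csv
from pathlib import Path

# Assumption: the CSV was uploaded to your home directory under this name.
CSV_PATH = Path.home() / "studentpor.csv"


def summarize_csv(path: Path, delimiter: str = ",") -> tuple[list[str], int]:
    """Return (header row, number of data rows) for a CSV file.

    The default delimiter is an assumption; pass delimiter=";" if your
    copy of the file uses semicolons.
    """
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; check the upload location and filename")
    with path.open(newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])          # empty list if the file is empty
        n_rows = sum(1 for _ in reader)    # count the remaining (data) rows
    return header, n_rows
```

On SciClone, calling `summarize_csv(CSV_PATH)` should return the column names and the number of data rows. A `FileNotFoundError` usually means the file landed in a different directory or under a different name.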