Persistence & Python
Setting up Python and Conda in K8S
Depending on the base image you are using, the process to set up Conda and Python may be slightly different. Here, we show how to do it using a common NVIDIA base image, but you may have to adapt some steps to your own use case.
File Persistence in Kubernetes
While individual pods are ephemeral (i.e., nothing written in them survives once the pod is deleted), it is possible to have those pods write to locations that are persistent, and that multiple pods can access. In Kubernetes, these are referred to as persistent volumes. You will be assigned a volume for your user. In this example, we will create our conda environments inside a persistent volume, so that future pods can access our environments at later dates.
Accessing a persistent volume
To access a persistent volume, you must specify both (a) the claimName for your volume, which the HPC team will provide, and (b) the path where you want your persistent volume to be mounted inside the pod. In the below example, I have been assigned a 500GB volume with the claimName dsmr-vol-01. You will not have access to this claim, and will need to replace it with your own. I am mounting this persistent volume to the path /kube/home within my pod, and then printing out the total disk space available.
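A minimal sketch of what such a persistence.yml might look like (the base image tag is an assumption; swap in whichever image you've been using):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: claim-example
spec:
  restartPolicy: Never
  containers:
    - name: claim-example
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # assumed tag; use your own base image
      command: ["/bin/bash", "-c"]
      # Print the space available on the mounted volume, then idle so we can exec in later.
      args: ["df -h /kube/home && sleep infinity"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /kube/home  # where the volume appears inside the pod
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: dsmr-vol-01  # replace with the claimName the HPC team gave you
```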
As a reminder, you would use `kubectl apply -f persistence.yml` to deploy your pod. You can then monitor its progress with `kubectl get pods`; while the pod is being created, its status will show as "ContainerCreating". Finally, to inspect the output of your pod, you can use `kubectl logs claim-example`, which in my case shows that I have 500 GB of space.
Note that pods with claims will take a few seconds longer to spin up, as K8S must attach the appropriate disks to your pod.
Creating an example persistent file
Now that we have a pod running with access to our persistent volume, we can log into that pod in an interactive job, create a file, then delete the pod to illustrate that the content remains available. To do this, first type in:
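```bash
kubectl exec -it claim-example -- /bin/bash
```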
Once you are logged into the pod, we are going to create two files - one in the normal file system (that will be destroyed), and another in the persistent file system (that will be retained). First let's create the file we know will be destroyed:
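```bash
# File name here is just for illustration - any path outside the
# mounted volume lives on the pod's ephemeral filesystem.
echo "this file will be destroyed" > /root/ephemeral.txt
```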
Now, let's create a file in our persistent volume:
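```bash
# Anything written under the mount point is stored on the persistent volume.
echo "this file will be retained" > /kube/home/persistent.txt
```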
Note the path here, `/kube/home`. This is the same path that's specified in our yml file as the mount point for our persistent volume (this excerpt is from the example above):
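```yaml
volumeMounts:
  - name: persistent-storage
    mountPath: /kube/home
```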
Now that we have our files, let's destroy the pod and check that it worked. Type `exit` to get out of the pod, then `kubectl delete pod claim-example` to delete the pod. Once it's deleted, create it again using `kubectl apply -f persistence.yml`, and log back in with `kubectl exec -it claim-example -- /bin/bash`. If you try to `cat` the two files we just created, you'll now see that only the file on the persistent volume survived:
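Using the illustrative file names from earlier:

```bash
cat /root/ephemeral.txt        # gone: "No such file or directory"
cat /kube/home/persistent.txt  # still prints "this file will be retained"
```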
Installing Conda in our Persistent Directory
In order to get conda environments working, we'll want to install them into our persistent volume claim. To do this, we can use the below yaml code. Most of this is code you've seen before, but there are a few notable differences. First, we aren't explicitly requesting resources, as we don't really care what we get here; anything from 1 CPU up will work. Second, we're both loading our base image (the NVIDIA base) and installing things within that base image, in this case the "wget" tool using apt-get. This is because wget doesn't exist on our base image, and we need it to install conda (note we also have to define a keyserver here; this stems from some unique issues with the NVIDIA image and would not be common to all cases). Finally, and most importantly, we download and install conda to the path we specify with an environment variable. If this path is on your persistent volume, then conda will stay installed between sessions.
Of note, this particular yaml does not have a sleep command at the end. Thus, once conda is installed, the pod finishes and its resources are released back to the cluster.
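A minimal sketch of such a 1_installConda.yml follows. The base image tag and the Miniconda installer URL are assumptions, and the keyserver workaround is only noted in a comment; adapt all three to your cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: conda-install
spec:
  restartPolicy: Never  # no sleep at the end - the pod exits once conda is installed
  containers:
    - name: conda-install
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # assumed tag; use your own base image
      env:
        - name: CONDA_DIR
          value: /kube/home/.envs/conda  # install path on the persistent volume
      command: ["/bin/bash", "-c"]
      args:
        - |
          # (If your NVIDIA image needs the apt keyserver workaround, add it here.)
          apt-get update && apt-get install -y wget
          wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
          bash /tmp/miniconda.sh -b -p "$CONDA_DIR"
      volumeMounts:
        - name: persistent-storage
          mountPath: /kube/home
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: dsmr-vol-01  # replace with your own claimName
```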
Go ahead and deploy this yml file via `kubectl apply -f 1_installConda.yml`. Once complete, it will have installed conda into your persistent claim at the mounted path `/kube/home/.envs/conda`. You can monitor the progress of the conda installation by typing either `kubectl get pods` or `kubectl logs conda-install`.
Exploring persistence with Conda
Now that conda is installed in our persistent directory, we can use it in any pod we want. You can do this interactively, or through a dedicated pod. We'll cover the dedicated pod approach in the next tutorial, but here we'll show how you can do it interactively.
First, create a simple pod that we can log in to, which has our persistent volume attached and adds the conda path to our global PATH:
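A sketch of what interactiveCondaPod.yml could look like, under the same image assumption as before:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: interactive-conda
spec:
  containers:
    - name: interactive-conda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # assumed tag; use your own base image
      env:
        - name: PATH  # prepend the conda install on the persistent volume
          value: /kube/home/.envs/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
      command: ["/bin/bash", "-c", "sleep infinity"]  # idle so we can exec in
      volumeMounts:
        - name: persistent-storage
          mountPath: /kube/home
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: dsmr-vol-01  # replace with your own claimName
```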
Once that file is created, run `kubectl apply -f interactiveCondaPod.yml` to deploy our pod. Once it's up and running, hop into an interactive shell with `kubectl exec -it interactive-conda -- /bin/bash`. In order to use conda interactively, we must first tell the shell where it is located. To do this, simply type these two lines into your shell. Note this has to be done any time you open a new shell.
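The exact lines depend on your install path; with conda installed at `/kube/home/.envs/conda` as above, they would look like:

```bash
# Put the conda binaries on this shell's PATH
export PATH=/kube/home/.envs/conda/bin:$PATH
# Enable `source activate` / `conda activate` in this shell
source /kube/home/.envs/conda/etc/profile.d/conda.sh
```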
After you export these paths, you can type "conda" and you will see the usual conda help documentation. Let's go ahead and create a very simple Python environment that has pandas installed:
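For example (the environment name and Python version here are illustrative):

```bash
# Create and activate a small environment with pandas
conda create -y -n pandas-env python=3.11 pandas
source activate pandas-env
```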
Note that you will frequently need to use the `source activate` technique in Kubernetes pods due to restrictions on user rights, but depending on your image `conda activate` may also sometimes work. Once you have pandas installed, double-check it's working:
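```bash
# Should print a version number rather than an ImportError
python -c "import pandas; print(pandas.__version__)"
```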
Finally, to illustrate persistence, log out of the pod and delete it using `kubectl delete pod interactive-conda`. Now, let's recreate it and confirm conda is still working:
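Using the illustrative environment name from above:

```bash
kubectl apply -f interactiveCondaPod.yml
kubectl exec -it interactive-conda -- /bin/bash

# Inside the new pod, point the shell at conda again, then test the environment:
export PATH=/kube/home/.envs/conda/bin:$PATH
source /kube/home/.envs/conda/etc/profile.d/conda.sh
source activate pandas-env
python -c "import pandas; print(pandas.__version__)"
```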
If everything worked correctly, pandas should import without any additional installs or information required!