Often, we use the DSI cluster to run batch jobs: we submit a procedure, let it run until it finishes, and then look at the results.
Often, we use our own computers (laptops) for interactive work, such as developing a procedure in Jupyter.
Sometimes, we need computing resources that are only available on the DSI cluster, such as specific datasets or the right type of GPU, and we need to work interactively because we’re still figuring out what we need to compute. That’s what this tutorial is for.
This procedure involves three computers: your own (maybe a laptop), the DSI head node, and a Slurm node that you will launch and run Jupyter on. When you type commands in a terminal, it’s important to know which computer you’re logged into when you run them.
The steps in the procedure are illustrated in the image below.
In all of the following, I’ll use > as a shell prompt and ALL-CAPITAL-LETTERS for text that you need to replace with your own.
The head node is fe01, fe02, or fe03 (your choice), so
> ssh YOUR-USERNAME@fe01.ds.uchicago.edu
If you haven’t already, make sure that you have a local installation of Micromamba in your home directory. Anything you put in your home directory is accessible to any computer in the cluster, so you can set up the Micromamba files once and for all on the head node. If you need to run a command to enable it (tell Linux that you want to run python from your Micromamba installation, instead of the general one), you will need to run that enabling command whenever you log into a new computer in the cluster. (These instructions don’t cover Micromamba; that problem can be handled separately.)
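The exact form of that enabling command depends on how Micromamba was installed; a common sketch (assuming a default shell-hook setup and an environment named YOUR-ENV, which you should replace with your own) looks like

> eval "$(micromamba shell hook --shell bash)"
> micromamba activate YOUR-ENV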
Use srun (see the full Slurm instructions) like this:
> srun -p general --gres=gpu:1 --pty -t 12:00:00 bash
srun: job 523847 queued and waiting for resources
srun: job 523847 has been allocated resources
This starts a node requiring a GPU (--gres=gpu:1) for 12 hours (-t 12:00:00) as an interactive environment that you can log into (bash). The time limit is important because it will automatically shut down after the specified length of time (and -p general selects the queue that allows more than 10 minutes). If you’re done early, shut it down manually with scancel to be a good citizen. Also, if you don’t need a GPU, don’t ask for one—let someone else use it!
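For example, the same interactive session without a GPU just drops the --gres option:

> srun -p general --pty -t 12:00:00 bash

And if you finish early, cancel the job using the JOBID that srun printed (523847 in the example above):

> scancel 523847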
The node has started, but srun hasn’t told you its name, which you need before you can start working. You can get its name from squeue:
> squeue -u YOUR-USERNAME
Here’s some sample output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
523847 general bash jpivarsk R 0:35 1 j001-ds
The node is j001-ds: it’s the one whose JOBID matches what srun returned. For the rest of this tutorial, I’ll call that YOUR-SLURM-NODE.
You log into the Slurm node with an ssh command run on the head node, not on your own computer. As shown in the image above, it’s a two-step process.
> ssh YOUR-SLURM-NODE
You can determine whether you’re there by running hostname.
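For example:

> hostname
j001-ds

It should print the Slurm node’s name (j001-ds in the sample output above), not the head node’s name (fe01, fe02, or fe03).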
If you need to run a command to enable Micromamba (set its environment variables so that you’re using it rather than the base Python), do it now, on the Slurm node. When this is set up properly, which python will respond with your installation of Python (a path in your home directory: /home/YOUR-USERNAME/micromamba/...). Also be sure that jupyterlab is one of the packages you’ve installed.
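A quick way to check both (a sketch; the exact output depends on your environment name and installation):

> which python
> micromamba list | grep jupyterlab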
If you’re accustomed to starting Jupyter on the command line, you’ll need a few additional arguments:
> jupyter lab --no-browser --port=8888 --ip=0.0.0.0
Since you’re running this command on a server in a rack somewhere, you don’t want it to try to open a web browser on the server (--no-browser). You should also specify a port (--port=8888) and make the server listen on all network interfaces (--ip=0.0.0.0), so that you can open it in a browser on your own computer. JupyterLab will then print a lot of log messages, including a block like
To access the server, open this file in a browser:
file:///home/jpivarski/.local/share/jupyter/runtime/jpserver-570302-open.html
Or copy and paste one of these URLs:
http://j001-ds:8888/lab?token=THE-LONG-STRING-OF-CHARACTERS-THAT-IS-A-PASSWORD
http://127.0.0.1:8888/lab?token=THE-LONG-STRING-OF-CHARACTERS-THAT-IS-A-PASSWORD
You’ll want the last URL (containing 127.0.0.1), but don’t try to open it just yet.
In another terminal on your own computer (see the image above), ssh into the head node again, but this time forward the port from YOUR-SLURM-NODE to your computer. This ssh connection just needs to stay open to connect JupyterLab, running on the server, to your own computer, through that port number.
> ssh -L 8888:YOUR-SLURM-NODE:8888 YOUR-USERNAME@fe01.ds.uchicago.edu
Now open the link Jupyter gave you in your own web browser. When it starts up, you should see the files of the remote computer in its file browser, and if you open Jupyter’s own web-based terminal, you should be able to verify that it’s running on YOUR-SLURM-NODE by typing the hostname command.
Happy computing!