Webpage for the University of Chicago Data Science Clinic
Hosted on GitHub Pages — Theme by orderedlist
This document contains instructions for two important pieces of the data science clinic:
- GitHub via SSH
- The DSI cluster via SSH

Depending on what you are working on, you may need both or only GitHub access. Sections marked [CLUSTER] are only required for using the cluster.
Please read all portions carefully and only skip if you really know what you are doing. If you come across an issue, check that it isn’t addressed in Troubleshooting before asking.
If you are looking for instructions on using Slurm to submit compute jobs, please refer to the clinic's Slurm documentation.
If there is a section which is unclear or needs updating, please open an issue or pull request!
Some students may already have access to the cluster and GitHub and may not need to follow the instructions below. To verify both your GitHub and cluster access, run the following in a terminal window.

`ssh -T git@github.com` which, if set up properly, should generate:

```
Hi NickRoss! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.
```

`ssh fe.ds` which, if set up properly, should generate:

```
~ ssh fe.ds
###############################################################################
# #
# ***** IMPORTANT NOTICE: DO NOT RUN COMPUTE JOBS ON LOGIN NODE ***** #
# #
# The login node is for connecting, editing, and submitting jobs only! #
# #
# High-intensive compute jobs must be submitted through the SLURM scheduler. #
# Use interactive sessions or submit batch jobs as appropriate. #
# #
# Failure to comply may result in job termination without notice. #
# #
# For help, contact techstaff@cs.uchicago.edu #
# #
###############################################################################
Last login: Fri Sep 13 15:06:37 2024 from 10.150.1.240
(base) nickross@fe01:~$
```
The above is also how we demonstrate access to the required resources. If you already have access to the required resources, you do not need to complete this document.
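GitHub never opens a shell, so `ssh -T git@github.com` exits with status 1 even when authentication works. A script should therefore check the greeting rather than the exit code. Below is a minimal sketch that assumes the greeting keeps the `Hi NAME! You've successfully authenticated` wording shown above. `github_user_from_greeting` is a hypothetical helper name and not part of any tool. The greeting arrives on stderr, which is why the usage line redirects `2>&1`.

```shell
# Hypothetical helper: read ssh's output on stdin and print the GitHub
# username if it contains the success greeting (prints nothing otherwise).
github_user_from_greeting() {
    # "." stands in for the apostrophe in "You've" so the sed script can
    # stay in single quotes (which also avoids bash history expansion of "!")
    sed -n 's/^Hi \([^!]*\)! You.ve successfully authenticated.*/\1/p'
}

# Usage:
# ssh -T git@github.com 2>&1 | github_user_from_greeting
```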
To connect to the DSI's cluster, you will need an internet connection, and you will log in using a protocol called Secure Shell ("SSH"). This document takes you through the steps to set up an account and log in to this system via SSH.
SSH is a command line tool, and it has a steeper learning curve than GUI-based systems such as VS Code. For this reason, the instructions below also install an extension for the VS Code IDE. The extension connects your entire VS Code window to the cluster, so you can use all of VS Code's features and extensions there.
This guide focuses on installing and verifying the software used in the steps below. The table below is a brief glossary of the components involved.
| Component Name | What it does / why it is important |
|---|---|
| WSL2 (Windows only) | Required on Windows machines to access bash and a Unix terminal. |
| OpenSSH | A common implementation of the SSH protocol. How you install or enable it depends on your operating system. |
| ssh-agent | A key manager for SSH that runs in the background. In this course, ssh-agent does two things. (1) It lets you avoid entering your SSH key passphrase every time you log in. (2) It allows SSH keys to be forwarded, so that while you are logged into the cluster you can keep using the keys on your machine. |
| Public SSH key | Creating an SSH key produces two files, and one of them is the public key. The public key is safe to share and is usually a file ending in `.pub`. |
| Private SSH key | The other file produced when you create an SSH key is the private key. Never share it: anyone who has it can use it to log in to systems as you. |
This guide is specifically tailored to the University of Chicago DSI Cluster, though it should be generally applicable to most Slurm clusters.
Before continuing, make sure that you have the following completed:
[CLUSTER] Note: You do not need access to a Slurm partition to set up access to the cluster, but you will need one to use the cluster.
| Do NOT go past this until you have completed the above. |
It can be annoying / burdensome to constantly type in your passwords (something only you know) to connect to the cluster or push/pull from GitHub. We can switch to authenticating based on something only you have using ssh keys and greatly reduce the friction of developing.
There are different steps for Windows users and Mac/Linux users. Follow the below steps.
To set up Windows to use SSH like Linux, complete the following (from this SO answer):

Run `where ssh` in cmd to confirm that the top listed path is in System32. Mine is installed at `C:\Windows\System32\OpenSSH\ssh.exe`. If it's not in the list, you may need to close and reopen cmd.

| Windows: Do NOT continue until `Get-Service ssh-agent`, run in PowerShell, returns a 'Running' status after rebooting. |
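1. Check whether ssh-agent is running with `echo $SSH_AUTH_SOCK`. If it prints a path, an agent is running.
2. If nothing is printed, start the agent. Running `ssh-agent` on its own starts an agent but only prints the environment variables your shell needs. `eval $(ssh-agent)` starts the agent and sets those variables in your current shell.
3. Add `eval $(ssh-agent)` to your shell configuration file (`.zshrc`/`.bashrc`). If ssh-agent was not running, reboot and verify that it loads on start.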
| Mac/Linux: Do NOT continue until you have verified that ssh-agent runs after rebooting. |
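A bare `eval $(ssh-agent)` in `.bashrc`/`.zshrc` starts a new agent in every new shell. The sketch below is a guarded alternative for bash or zsh. It starts an agent only when `ssh-add` cannot reach one. It relies on `ssh-add`'s exit codes: 0 when keys are listed, 1 when the agent holds none, and 2 when no agent can be contacted. This is one common pattern, not the clinic's official setup. Each terminal that starts its own agent this way gets a separate agent.

```shell
# Start ssh-agent only if this shell cannot already reach one.
# ssh-add -l exits with 2 when it cannot contact an agent.
agent_status=0
ssh-add -l >/dev/null 2>&1 || agent_status=$?
if [ "$agent_status" -eq 2 ]; then
    # ssh-agent -s prints Bourne-shell commands that export SSH_AUTH_SOCK
    # and SSH_AGENT_PID; eval runs them in the current shell.
    eval "$(ssh-agent -s)" >/dev/null
fi
unset agent_status
```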
1. Navigate to your `.ssh` (pronounced "dot-s-s-h") directory:
   - Mac/Linux: `cd ~/.ssh` (create it first with `mkdir -p ~/.ssh` if it does not exist)
   - Windows: `cd C:\Users\YOUR_USERNAME\.ssh\`
2. Create a key with `ssh-keygen` (instructions here). Recommended: use `ssh-keygen -t ecdsa -b 521` or `ssh-keygen -t ed25519` to generate your key.
3. When prompted for a file name, you can give the key a descriptive name such as `dsi_cluster`. If you do, verify that the file is in the directory listed above. Otherwise, press Enter to accept the default suggestion.
4. Check the `.ssh` directory. The command creates two files, `KEYNAME` and `KEYNAME.pub`. The file with the `.pub` extension is your public key and can be shared safely. The file with no extension is your private key and should never be shared. `KEYNAME` will be either the name you specified above or a default based on the key type (e.g. `id_ed25519`).

| Do not continue until you have verified that both files mentioned above exist in the .ssh directory. |
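You may prefer to name the key on the command line rather than at the interactive prompt. `ssh-keygen` accepts `-t` for the key type and `-f` for the output file; the `.pub` suffix is added automatically for the public key. The helper below is a hypothetical sketch, not part of the clinic's tooling. It assumes the `dsi_cluster` name used above. It still prompts for a passphrase unless you pass extra `ssh-keygen` options through.

```shell
# Hypothetical helper: create an ed25519 key pair named NAME inside DIR.
# Any extra arguments are passed straight through to ssh-keygen.
make_dsi_key() {
    local dir="$1" name="$2"
    shift 2
    mkdir -p "$dir"
    chmod 700 "$dir"    # OpenSSH expects the key directory to be private
    # -t picks the key type; -f is the private key path (NAME.pub is the public key)
    ssh-keygen -t ed25519 -f "$dir/$name" "$@"
}

# Usage (prompts for a passphrase, then writes dsi_cluster and dsi_cluster.pub):
# make_dsi_key "$HOME/.ssh" dsi_cluster
```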
If you are using Windows, you need to install WSL ("Windows Subsystem for Linux") on your machine. WSL gives Windows users access to core Unix-based functionality. If you are doing local development on your Windows machine, you should do it in WSL.
To confirm proper installation, run:

```
wsl printf 'Default shell: $0\nUsername: $USER\nHome Directory: $(cd ~ && pwd)'
```

This should return:

```
Default shell: /bin/bash
Username: YOUR_WSL_USERNAME
Home Directory: /home/YOUR_WSL_USERNAME
```
YOUR_WSL_USERNAME is the username you picked when setting up WSL. It should not be `root`. If any of these is incorrect, please go to the troubleshooting instructions.
The convenience of 'pretending' to have two separate operating systems on one machine can lead to complications. One of these involves SSH keys, which are the core method we use to authenticate to the DSI Cluster.
Your normal Windows system and WSL each use their own `.ssh` directory. This is fine in most cases, but it can lead to headaches when using VS Code. If you connect to a remote SSH machine in VS Code, it uses your Windows configuration. So even if you only use WSL and the VS Code WSL extension to code in WSL2, you must follow the Windows SSH instructions. To use the same keys on each system, you can copy them. Follow these instructions, adapted from this article:
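1. In WSL, run `mkdir -p ~/.ssh`. The `-p` flag tells `mkdir` not to complain if the directory already exists.
2. Run `cp -r /mnt/c/Users/YOUR_USERNAME/.ssh/. ~/.ssh/` to copy the contents of your Windows `.ssh` directory. The trailing `/.` matters: without it, `cp -r` copies the directory itself into `~/.ssh`, creating `~/.ssh/.ssh`.
3. Run `chmod 600 ~/.ssh/KEYNAME` and `chmod 644 ~/.ssh/KEYNAME.pub` for every KEYNAME you wish to use in WSL.
4. Run `chmod 700 ~/.ssh`.

| Do not continue until you have verified correct installation of WSL and can find your SSH key in both Windows and WSL |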
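The steps above can be wrapped into one helper that copies a single key pair and applies the same permissions. This is a sketch meant to be run inside WSL. `copy_key_to_wsl` is a hypothetical name. The `/mnt/c/Users/YOUR_USERNAME/.ssh` path in the usage line assumes the default Windows mount point.

```shell
# Hypothetical helper (run inside WSL): copy one key pair from the Windows
# .ssh directory into a WSL .ssh directory and apply the permissions
# OpenSSH expects.
copy_key_to_wsl() {
    local src_dir="$1" dest_dir="$2" name="$3"
    mkdir -p "$dest_dir"
    cp "$src_dir/$name" "$src_dir/$name.pub" "$dest_dir/"
    chmod 700 "$dest_dir"            # only you can list or enter the directory
    chmod 600 "$dest_dir/$name"      # private key: read/write for you only
    chmod 644 "$dest_dir/$name.pub"  # public key: readable by others is fine
}

# Usage (replace YOUR_USERNAME and KEYNAME):
# copy_key_to_wsl /mnt/c/Users/YOUR_USERNAME/.ssh "$HOME/.ssh" KEYNAME
```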
1. Add your private key to ssh-agent. To do this, run `ssh-add PATH_TO_PRIVATE_KEY`, where PATH_TO_PRIVATE_KEY is the full path to the private key file. You'll type your passphrase once, and the agent keeps the key for a period of time (the terminal session, or until your computer next reboots). This drastically reduces the number of times you have to type your passphrase.
2. Run `ssh-add -l` to list all keys in your ssh-agent. Your key should appear here. If this command returns `The agent has no identities.`, adding the key failed.

| Do not continue until you have verified that your key file appears when you run `ssh-add -l` |
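`ssh-add -l` lists fingerprints, so when several keys are loaded it can be unclear which one is yours. This is a minimal sketch that computes your key's fingerprint with `ssh-keygen -l` and looks for it in the agent's list. It assumes your public key sits next to the private key as `KEYNAME.pub`. `key_in_agent` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the agent holds the key whose private
# file is $1 (the matching public key is assumed to be at "$1.pub").
key_in_agent() {
    local fp
    # Field 2 of `ssh-keygen -l` output is the fingerprint, e.g. SHA256:...
    fp=$(ssh-keygen -l -E sha256 -f "$1.pub" | awk '{print $2}')
    [ -n "$fp" ] || return 1    # unreadable key: an empty pattern would match anything
    ssh-add -l -E sha256 2>/dev/null | grep -qF -- "$fp"
}

# Usage:
# ssh-add ~/.ssh/dsi_cluster
# key_in_agent ~/.ssh/dsi_cluster && echo "key loaded" || echo "key missing"
```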
We have now created an SSH key that will allow us to log in to the cluster. However, logging in would still require the path to the key file and your username every time, which is annoying and error-prone. The full command looks something like `ssh -i PATH_TO_KEY USERNAME@fe01.ds.uchicago.edu`. To simplify this, we will use a `config` file in our `.ssh` directory. After completing this process, we will be able to log in using just `ssh fe.ds`.
1. Open your SSH config file:
   - Windows: `code C:\Users\USERNAME\.ssh\config`, where USERNAME is your Windows username.
   - Mac/Linux: `touch ~/.ssh/config` to create the file if it does not exist, and `open ~/.ssh/config` to open it.
2. Look for existing blocks such as `Host *` or `Host fe01.ds.uchicago.edu`. If you have a block with the host information, you probably already had access to the cluster. Update that block to use the new keys you created.
3. Add the following to the file.

[Mac/Linux]
```
Host fe.ds*
    HostName fe01.ds.uchicago.edu
    IdentityFile PATH_TO_PRIVATE_KEY
    ForwardAgent yes
    User YOUR_CNET

Host *.ds !fe.ds
    HostName %h.uchicago.edu
    IdentityFile PATH_TO_PRIVATE_KEY
    ForwardAgent yes
    User YOUR_CNET
    ProxyJump fe.ds
```

[Windows]
```
Host fe.ds*
    HostName fe01.ds.uchicago.edu
    IdentityFile PATH_TO_PRIVATE_KEY
    ForwardAgent yes
    User YOUR_CNET
    MACs hmac-sha2-512

Host *.ds !fe.ds
    HostName %h.uchicago.edu
    IdentityFile PATH_TO_PRIVATE_KEY
    ForwardAgent yes
    User YOUR_CNET
    ProxyJump fe.ds
    MACs hmac-sha2-512
```
Replace YOUR_CNET with your CNET ID, and replace PATH_TO_PRIVATE_KEY with the path to the key you previously created. [Windows: PATH_TO_PRIVATE_KEY will be `/Users/USERNAME/.ssh/KEYNAME`, where USERNAME is your Windows username and KEYNAME is the name of the key you created. Starting with the root directory `/` is not standard for Windows and will not typically work in other situations.]

The first block maps `fe.ds` to an SSH connection to the listed HostName, with the listed User and the listed IdentityFile as your key. Setting ForwardAgent to `yes` lets the remote machine use the keys held by your local ssh-agent. For example, you can use your local SSH key for GitHub on the cluster. The keys themselves stay on your machine. The second block is for connecting directly to compute nodes: any host ending in `.ds` other than `fe.ds` is reached through the login node via ProxyJump.
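You can check what your config resolves to without connecting. `ssh -G HOST` prints the settings ssh would use for `HOST` after reading the config, and `-F` points it at a specific config file. This sketch assumes the config blocks above. `show_ssh_settings` is a hypothetical helper name. `m001.ds` in the usage line is only an illustrative compute-node name, not necessarily a real node.

```shell
# Hypothetical helper: print the key settings ssh would use for HOST,
# without opening a connection. `ssh -G` prints one "key value" per line.
show_ssh_settings() {
    local config="$1" host="$2"
    ssh -G -F "$config" "$host" \
        | grep -E '^(hostname|user|identityfile|forwardagent|proxyjump) '
}

# Usage:
# show_ssh_settings ~/.ssh/config fe.ds
# show_ssh_settings ~/.ssh/config m001.ds   # hypothetical node name
```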
For a private key to work for authentication, the service you are authenticating with must have your public key. We will set this up for GitHub and the cluster.
1. Print your public key:
   - Windows: `type C:\Users\USERNAME\.ssh\KEYNAME.pub`, where USERNAME is your Windows username and KEYNAME is the key you created.
   - Mac/Linux: `cat ~/.ssh/KEYNAME.pub`, where KEYNAME is the key you created.
2. Copy the output. `ctrl+c` may not work in a terminal; `ctrl+shift+c` or right click may work.
3. Add the copied key to your GitHub account (Settings → SSH and GPG keys → New SSH key).
4. Run `ssh -T git@github.com`. It should respond with `Hi GITHUB_USERNAME! You've successfully authenticated, but GitHub does not provide shell access.` or something similar.

| Do not continue until you have verified a success message when you run `ssh -T git@github.com` |
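GitHub lists each added key by its SHA256 fingerprint. This sketch prints the public key to copy, followed by the fingerprint to compare against GitHub's list. `show_public_key` is a hypothetical helper. Replace the path argument with your own `KEYNAME.pub`.

```shell
# Hypothetical helper: print a public key (line 1, to paste into GitHub)
# and its SHA256 fingerprint (line 2, to compare with GitHub's key list).
show_public_key() {
    local pub="$1"
    cat "$pub"
    ssh-keygen -l -E sha256 -f "$pub"
}

# Usage:
# show_public_key ~/.ssh/KEYNAME.pub
```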
[CLUSTER] Mac/Linux:
1. Run `ssh-copy-id -i ~/.ssh/KEYNAME_HERE.pub fe.ds`, replacing KEYNAME_HERE with the name of the public SSH key you would like to use (it should end with `.pub`).
2. When prompted for `USERNAME@fe01.ds.uchicago.edu's password`, enter your CNET password.
3. `ssh fe.ds` should now connect you to the cluster without typing any password.

[CLUSTER] Windows:
1. Run `ssh fe.ds` and type in your UChicago password. Your command prompt is now attached to the login node, and it should look something like `USERNAME@fe01:~$`.
2. Run `ls -a` and check that there is a `.ssh` directory. If there is not, run `mkdir .ssh`.
3. Run `echo "PUBLIC_KEY_HERE" >> .ssh/authorized_keys`, replacing PUBLIC_KEY_HERE with your copied public key and keeping the quotation marks. `ctrl+v` may not paste in your terminal; try right clicking, `ctrl+shift+v`, or `shift+insert`.
4. Run `exit` to leave the cluster and return to your Windows command prompt.
5. `ssh fe.ds` should now connect you to the cluster without typing any password.

Reboot your machine.
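On the cluster, running the `echo ... >> .ssh/authorized_keys` step twice leaves a duplicate line. Also, sshd may ignore the file if `~/.ssh` or `authorized_keys` is writable by other users. This is a sketch of a more defensive version, assuming a bash shell on the login node. `add_authorized_key` is a hypothetical helper, not a cluster-provided command.

```shell
# Hypothetical helper (run on the login node): append a public key to
# authorized_keys only if it is not already there, and tighten permissions.
add_authorized_key() {
    local key="$1" ssh_dir="${2:-$HOME/.ssh}"
    local auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth"
    chmod 600 "$auth"
    # -x: match whole lines; -F: treat the key as a fixed string, not a regex
    if ! grep -qxF -- "$key" "$auth"; then
        printf '%s\n' "$key" >> "$auth"
    fi
}

# Usage (paste your public key between the quotes):
# add_authorized_key "PUBLIC_KEY_HERE"
```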
At this point you should have access to GitHub and, optionally, the cluster. Verify your access before proceeding.
To learn more about using the cluster, see the Slurm documentation.