
Introduction to Clemson Computing Resources

Linh B. Ngo

Computing Resources Requirement

Assignments and projects will include components that must be executed and evaluated on distributed computing resources available to Clemson students:

Shared high performance computing environment (non-root): Palmetto Supercomputer

Shared data-intensive computing environment (non-root): Cypress Hadoop Cluster

Isolated experimental environment for small clusters (root): CloudLab

Remote computing resources (non-root): XSEDE

These resources are shared among all Clemson faculty and students and, in the case of CloudLab and XSEDE, among users nationwide. It is your responsibility to submit your jobs early enough to meet assignment and project deadlines.

Palmetto Supercomputer

Ranked $172^{nd}$ on the TOP500 list ($8^{th}$ among all U.S. academic institutions)
Sustained 870.0 TFLOPS in June 2017 (from 814.4 TFLOPS in November 2016)
Highly heterogeneous (more than 14 purchasing phases)
2022 compute nodes, 23863 cores

458 nodes are equipped with NVIDIA Tesla GPUs:

  - 280 nodes with NVIDIA K20 GPUs (2 per node)
  - 138 nodes with NVIDIA K40 GPUs (2 per node)
  - 40 nodes with NVIDIA P100 GPUs (2 per node)

4 nodes with Intel Phi co-processors (2 per node)
7 large memory nodes (5 with 505GB, 1 with 2TB, 1 with 15TB), 334 nodes with 128GB of memory
100GB of personal space (backed up daily for 42 days)
Myrinet, 10Gbps Ethernet, Infiniband networks
Global and local scratch spaces for temporary files (no quota per user - 522TB across all scratch spaces)

Who uses Palmetto?

Detailed information and guide:

https://www.palmetto.clemson.edu/palmetto

How to connect to Palmetto via Command-Line Terminal?

https://www.palmetto.clemson.edu/palmetto/userguide_basic_usage.html

Working with Palmetto

Node naming scheme

Login node: login.palmetto.clemson.edu (seen as login001 internally)
The login node is the only gateway to access Palmetto nodes from outside
DO NOT RUN ANYTHING ON THE LOGIN NODE
File Management: xfer01-ext.palmetto.clemson.edu (only accessible once you are logged into Palmetto)
Compute nodes: nodeXXXX.palmetto.clemson.edu
From inside Palmetto, all nodes can be accessed via their shortened name (without the .palmetto.clemson.edu suffix)
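The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the helper name short_name and the host node0001 are hypothetical, and the login node is a special case because it appears internally as login001 rather than as a stripped version of its external name.

```python
# Suffix shared by all Palmetto host names (taken from the naming scheme above).
PALMETTO_SUFFIX = ".palmetto.clemson.edu"


def short_name(fqdn: str) -> str:
    """Return the shortened name usable from inside Palmetto.

    Hypothetical helper: it only strips the common suffix. It does not
    handle the login node, which is seen internally as login001.
    """
    if fqdn.endswith(PALMETTO_SUFFIX):
        return fqdn[: -len(PALMETTO_SUFFIX)]
    return fqdn


# node0001 is a made-up example of the nodeXXXX pattern.
print(short_name("node0001.palmetto.clemson.edu"))
print(short_name("xfer01-ext.palmetto.clemson.edu"))
```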

Requesting Resources from Palmetto

SSH into Palmetto: You are now on login001
Two modes for submitting a resource request:
- Interactive mode: qsub -I -l [resource specification]
- Batch mode: qsub [PBS script containing resource specification and job description]
Resource requests can only be made from login001, not from any compute node
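As a rough sketch, the two submission modes can be assembled as argument lists in Python before being run on login001. The function names and the file name job.pbs are hypothetical; the resource values are borrowed from the sample PBS script in this document, and passing the walltime as a second -l option is an assumption about how you would split the request.

```python
import shlex


def interactive_cmd(select: str, walltime: str) -> list:
    """Build an interactive request: qsub -I -l [resource specification]."""
    return ["qsub", "-I", "-l", f"select={select}", "-l", f"walltime={walltime}"]


def batch_cmd(script_path: str) -> list:
    """Build a batch request: qsub [PBS script]."""
    return ["qsub", script_path]


# Print the commands instead of running them, so this works on any machine.
print(shlex.join(interactive_cmd("1:ncpus=4:mpiprocs=4:mem=8gb", "00:50:00")))
print(shlex.join(batch_cmd("job.pbs")))  # job.pbs is a hypothetical file name
```

Printing rather than executing keeps the sketch safe to try off-cluster; on login001 the same lists could be handed to subprocess.run.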

Further reading on Palmetto resource request from command line:

https://www.palmetto.clemson.edu/palmetto/pages/userguide.html#submitting

Working with Palmetto under Batch Mode

`qsub [PBS script containing resource specification and job description]`

qsub can only be called from the head node (login001), which you can reach in either of two ways:
- SSH into Palmetto from your local machine
- Run ssh login001 from a Jupyter terminal

PBS Script format

Bash script by nature
Lines starting with #PBS are interpreted by the PBS scheduler
Lines without #PBS are executed as bash commands

#!/bin/bash
#PBS -N cpsc4770
#PBS -l select=1:ncpus=4:mpiprocs=4:mem=8gb
#PBS -l walltime=00:50:00
#PBS -j oe
# Program execution instructions go here ...

PBS flags

-N (job name)
-l (resource requests)
-j oe (configure PBS to merge output and error logs)
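One way to see how the script format and flags fit together is to generate the script text from parameters. The sketch below is an assumption-laden illustration, not an official tool: render_pbs_script is a hypothetical helper, and the two commands placed in the body (changing to the submission directory, then running gethostname.py) are just one plausible choice of job description.

```python
def render_pbs_script(name, select, walltime, commands):
    """Return the text of a PBS script using the flags -N, -l, and -j oe."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",               # job name
        f"#PBS -l select={select}",      # resource request: chunks, cores, memory
        f"#PBS -l walltime={walltime}",  # resource request: maximum run time
        "#PBS -j oe",                    # merge output and error logs
    ]
    lines.extend(commands)               # remaining lines run as bash commands
    return "\n".join(lines) + "\n"


script = render_pbs_script(
    name="cpsc4770",
    select="1:ncpus=4:mpiprocs=4:mem=8gb",
    walltime="00:50:00",
    commands=['cd "$PBS_O_WORKDIR"', "python gethostname.py"],
)
print(script)
```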

PBS environment variables

PBS_O_WORKDIR: Environment variable indicating the directory from which qsub was called
PBS_NODEFILE: Name of the file containing list of compute nodes allocated for the job
PBS_ARRAY_INDEX: Index of job launched as part of a PBS job array.
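A program started by a PBS job can read these variables from its environment. The following is a minimal sketch, assuming the node file holds one host name per line; job_context is a hypothetical helper, and the fallbacks are there only so the code also runs outside a PBS job.

```python
import os


def job_context(env=None):
    """Collect the PBS variables described above into a dictionary."""
    env = os.environ if env is None else env

    # Directory from which qsub was called; fall back to the current directory.
    workdir = env.get("PBS_O_WORKDIR", os.getcwd())

    # File listing the compute nodes allocated to the job
    # (assumed here to contain one host name per line).
    nodes = []
    nodefile = env.get("PBS_NODEFILE")
    if nodefile and os.path.exists(nodefile):
        with open(nodefile) as fh:
            nodes = [line.strip() for line in fh if line.strip()]

    # Only set when the job is part of a PBS job array.
    raw_index = env.get("PBS_ARRAY_INDEX")
    array_index = int(raw_index) if raw_index is not None else None

    return {"workdir": workdir, "nodes": nodes, "array_index": array_index}


print(job_context())
```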

What happens after a job is submitted?

After qsub is executed, a job ID is returned to the command line
You can view the status of your job (from any node) using qstat -anu <yourusername>
You can delete a job (from the head node only) using qdel <job ID>
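The submit, check, and delete cycle can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, submit simply returns whatever qsub prints to standard output without assuming any particular job ID format, and the run parameter exists only so the function can be exercised without a real scheduler.

```python
import subprocess


def submit(script_path, run=subprocess.run):
    """Call qsub on a PBS script and return the job ID it prints."""
    result = run(["qsub", script_path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def status_cmd(username):
    """Command for viewing a user's jobs (usable from any node)."""
    return ["qstat", "-anu", username]


def delete_cmd(job_id):
    """Command for deleting a job (head node only)."""
    return ["qdel", job_id]


# Nothing is executed here; the commands are only printed.
print(status_cmd("yourusername"))
print(delete_cmd("<job ID>"))
```

On login001, a typical sequence would be to call submit on your script, pass the returned ID to delete_cmd if the job needs to be cancelled, and run the resulting list with subprocess.run.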

Connecting to Palmetto with Interactive Mode using Web Browser

https://clemsonciti.github.io/jupyter-docs/documentation.html

1) Log into Palmetto via JupyterHub interface

2) Request (default parameters):

- 1 chunk
- 1 cpu core per chunk
- 1gb of memory per chunk
- no GPU
- 30 minutes walltime
- workq

3) Via Jupyter, create a directory called cpsc4770-6770

4) Inside cpsc4770-6770, use Jupyter's editor to create a file named gethostname.py containing the following lines:

#gethostname.py
import socket
print("hello world from host %s" % socket.gethostname())

5) Execute gethostname.py from the Jupyter terminal

python gethostname.py

6) Execute qstat and compare the outcome with results from gethostname.py

qstat -anu <username>