Introduction to Clemson Computing Resources

Linh B. Ngo

Computing Resources Requirement

Assignments and projects will include components that must be executed and evaluated on distributed computing resources available to Clemson students:

  • Shared high performance computing environment (non-root): Palmetto Supercomputer
  • Shared data-intensive computing environment (non-root): Cypress Hadoop Cluster
  • Isolated experimental environment for small clusters (root): CloudLab
  • Remote computing resources (non-root): XSEDE

These resources are shared among all Clemson faculty and students and, in the case of CloudLab and XSEDE, nationally. It is your responsibility to submit your jobs to these resources early enough to meet assignment and project deadlines.

Palmetto Supercomputer

  • Ranked $172^{nd}$ on the TOP500 list ($8^{th}$ among all U.S. academic institutions)
  • Sustained 870.0 TFLOPS in June 2017 (from 814.4 TFLOPS in November 2016)
  • Highly heterogeneous (more than 14 purchasing phases)
  • 2022 compute nodes, 23863 cores
  • 458 nodes are equipped with NVIDIA Tesla GPUs:
      - 280 nodes with NVIDIA K20 GPUs (2 per node)
      - 138 nodes with NVIDIA K40 GPUs (2 per node)
      - 40 nodes with NVIDIA P100 GPUs (2 per node)
  • 4 nodes with Intel Phi co-processors (2 per node)
  • 7 large memory nodes (5 with 505GB, 1 with 2TB, 1 with 15TB), 334 nodes with 128GB of memory
  • 100GB of personal space (backed up daily for 42 days)
  • Myrinet, 10Gbps Ethernet, Infiniband networks
  • Global and local scratch spaces for temporary files (no quota per user - 522TB across all scratch spaces)

Who uses Palmetto?

Detailed information and guide:

https://www.palmetto.clemson.edu/palmetto

How to connect to Palmetto via Command-Line Terminal?

https://www.palmetto.clemson.edu/palmetto/userguide_basic_usage.html

Working with Palmetto

Node naming scheme

  • Login node: login.palmetto.clemson.edu (seen as login001 internally)
  • The login node is the only gateway to access Palmetto nodes from outside
  • DO NOT RUN ANYTHING ON THE LOGIN NODE
  • File Management: xfer01-ext.palmetto.clemson.edu (only accessible once you are logged into Palmetto)
  • Compute nodes: nodeXXXX.palmetto.clemson.edu
  • From inside Palmetto, all nodes can be accessed via their shortened name (without the .palmetto.clemson.edu suffix)

Requesting Resources from Palmetto

  • SSH into Palmetto: You are now on login001
  • Two modes for submitting a resource request:
    • Interactive mode: qsub -I -l [resource specification]
    • Batch mode: qsub [PBS script containing resource specification and job description]
  • Resource requests can only be made from login001, not from any of the compute nodes
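
The resource specification follows the select/walltime pattern that also appears in the PBS scripts used in this course. As a minimal sketch, the helpers below assemble the qsub command line for either mode. The function names and the script name job.pbs are hypothetical and exist only for illustration; the code builds the command but does not run it.

```python
def build_resource_args(chunks, ncpus, mem_gb, walltime):
    """Return the -l arguments of a PBS resource request as a list."""
    # One "select" statement describes the chunks; walltime is a separate -l.
    select = f"select={chunks}:ncpus={ncpus}:mem={mem_gb}gb"
    return ["-l", select, "-l", f"walltime={walltime}"]


def interactive_command(chunks, ncpus, mem_gb, walltime):
    """Command line for interactive mode: qsub -I -l [resource specification]."""
    return ["qsub", "-I"] + build_resource_args(chunks, ncpus, mem_gb, walltime)


def batch_command(script_path):
    """Command line for batch mode: the resources live inside the PBS script."""
    return ["qsub", script_path]


if __name__ == "__main__":
    print(" ".join(interactive_command(1, 4, 8, "00:50:00")))
    print(" ".join(batch_command("job.pbs")))  # job.pbs is a placeholder name
```

Either command would still have to be typed (or launched) on login001, since that is the only node that accepts resource requests.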

Further reading on Palmetto resource request from command line:

https://www.palmetto.clemson.edu/palmetto/pages/userguide.html#submitting

Working with Palmetto under Batch Mode

qsub [PBS script containing resource specification and job description]

  • qsub can only be called from the head node (login001)
  • From a local machine: connect to the head node via SSH
  • From a Jupyter terminal: run ssh login001

PBS Script format

  • Bash script by nature
  • Lines starting with #PBS are interpreted by the PBS scheduler
  • Lines without #PBS are executed as bash commands
#!/bin/bash
#PBS -N cpsc4770
#PBS -l select=1:ncpus=4:mpiprocs=4:mem=8gb
#PBS -l walltime=00:50:00
#PBS -j oe
# Program execution instructions go here ...
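
To make the distinction between scheduler directives and bash commands concrete, the following sketch separates the two kinds of lines in a script. It is a simplified illustration only and does not reproduce the scheduler's actual parsing rules; the function name split_pbs_script and the echo line are hypothetical additions.

```python
SCRIPT = """#!/bin/bash
#PBS -N cpsc4770
#PBS -l select=1:ncpus=4:mpiprocs=4:mem=8gb
#PBS -l walltime=00:50:00
#PBS -j oe
echo "job started"
"""


def split_pbs_script(text):
    """Split a script into (#PBS directives, bash command lines)."""
    directives, commands = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#PBS"):
            # Read by the scheduler; bash itself treats this as a comment.
            directives.append(stripped[len("#PBS"):].strip())
        elif stripped and not stripped.startswith("#"):
            # Everything else that is not a comment is run by bash.
            commands.append(stripped)
    return directives, commands


if __name__ == "__main__":
    directives, commands = split_pbs_script(SCRIPT)
    print("Scheduler directives:", directives)
    print("Bash commands:", commands)
```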

PBS flags

  • -N (job name)
  • -l (resource requests)
  • -j oe (configure PBS to merge output and error logs)

PBS environment variables

  • PBS_O_WORKDIR: Environment variable indicating the directory from which qsub was called
  • PBS_NODEFILE: Name of the file containing list of compute nodes allocated for the job
  • PBS_ARRAY_INDEX: Index of job launched as part of a PBS job array.
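
A program launched from a PBS script can read these variables like any other environment variable. The sketch below is one possible way to do so in Python; the function name describe_pbs_environment is hypothetical, and the fallbacks are an assumption added so the code also runs outside a PBS job, where none of the variables are set.

```python
import os


def describe_pbs_environment(env=None):
    """Collect the PBS variables listed above, with fallbacks outside a job."""
    env = os.environ if env is None else env

    # Directory from which qsub was called; fall back to the current directory.
    workdir = env.get("PBS_O_WORKDIR", os.getcwd())

    # PBS_NODEFILE names a file that lists the allocated compute nodes.
    nodes = []
    nodefile = env.get("PBS_NODEFILE")
    if nodefile and os.path.isfile(nodefile):
        with open(nodefile) as handle:
            nodes = [line.strip() for line in handle if line.strip()]

    # Only set when the job is part of a job array; otherwise None.
    array_index = env.get("PBS_ARRAY_INDEX")

    return {"workdir": workdir, "nodes": nodes, "array_index": array_index}


if __name__ == "__main__":
    info = describe_pbs_environment()
    print("Submission directory:", info["workdir"])
    print("Allocated nodes:", info["nodes"])
    print("Array index:", info["array_index"])
```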

What happens after a job is submitted?

  • After qsub is executed, a job ID is returned on the command line
  • You can view the status of your job from any node using qstat -anu <yourusername>
  • You can delete a job (from the head node only) using qdel <job ID>
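
The submit-then-check sequence above can also be scripted. The following is a rough sketch, not a recommended workflow: the function name submit_and_check and the script name job.pbs are hypothetical, and it assumes that your local login name matches your Palmetto username. It returns None when qsub is not installed, so it is safe to run on a machine other than the head node.

```python
import getpass
import shutil
import subprocess


def submit_and_check(script_path):
    """Submit a PBS script, then list the current user's jobs."""
    if shutil.which("qsub") is None:
        # Not on a machine with PBS installed; nothing to submit.
        return None

    # qsub prints the job ID on standard output.
    result = subprocess.run(
        ["qsub", script_path], capture_output=True, text=True, check=True
    )
    job_id = result.stdout.strip()

    # Show the status of this user's jobs (assumes matching usernames).
    subprocess.run(["qstat", "-anu", getpass.getuser()], check=False)
    return job_id


if __name__ == "__main__":
    job_id = submit_and_check("job.pbs")  # job.pbs is a placeholder name
    if job_id is None:
        print("qsub not found; run this on the Palmetto head node.")
    else:
        print("Submitted job:", job_id)
        print("To cancel it, run: qdel", job_id)
```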

Connecting to Palmetto with Interactive Mode using Web Browser

1) Log into Palmetto via JupyterHub interface

2) Request (default parameters):

- 1 chunk
- 1 cpu core per chunk
- 1gb of memory per chunk
- no GPU
- 30 minutes walltime
- workq

3) Via Jupyter, create a directory called cpsc4770-6770

4) Inside cpsc4770-6770, use Jupyter's editor to create a file named gethostname.py containing the following lines:

#gethostname.py
import socket
print("hello world from host %s" % socket.gethostname())

5) Execute gethostname.py from the Jupyter terminal

python gethostname.py

6) Execute qstat and compare the outcome with results from gethostname.py

qstat -anu <username>

Create a PBS submission script that requests 4 chunks, 8 cores per chunk, 8gb of memory per chunk, and a 2-minute walltime.

Have this script execute the gethostname.py program

View the PBS log file once the script is completed