CSC 496: Distributed and Cluster Computing

Linh B. Ngo

Course Information

Instructional Staff

Instructor: Linh B. Ngo

  • Email: LNGO at WCUPA dot EDU
  • Office: 144 UNA (25 University Avenue)
  • Office Hours:
    • MWF 10:00AM to 11:00AM
    • TR 3:30PM to 5:00PM

Course Descriptions

This course will investigate issues in modern distributed platforms by examining a number of important distributed computing technologies for computational and data-intensive problems.

By the end of the course, each student should understand and be able to weigh several specific tradeoffs in the development, performance, and management of parallel applications and algorithms on a number of distributed platforms.

Learning Objectives

  • Students will be able to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
  • Students will be able to analyze a problem, and identify and define the computing requirements appropriate to its solution.
  • Students will be able to apply design and development principles in the construction of large-scale computing systems.
  • Students will be able to function effectively on teams to accomplish a common goal.

Prerequisites

  • Working knowledge of C/C++, Java, and Linux system
  • Working knowledge of data structures and algorithms
  • Not being afraid of learning new languages (Python, Scala)

Important Dates

  • Tue, Sep 04, 2018: Last Day of Add/Drop
  • Tue, Oct 23, 2018: Last Day of Course Withdrawal
  • Tue, Nov 20, 2018: Reading/Writing Day
  • Thu, Nov 22, 2018: Fall/Thanksgiving Break
  • Thu, Dec 13 06:00PM to 08:00PM: FINAL EXAM

Laptop requirements

  • Having access to a laptop during class time is critical
    • Working with supercomputing resources in class
    • Working on in-class electronic quizzes on D2L
  • Make sure that your laptop is fully charged for the duration of the class (or come in early and get a spot with access to power outlets)

Software requirements

As laptop style and model can vary, the following common (and free) software environment will be enforced for all lectures and programming assignments:

Other software packages will be specified and installed inside the CentOS virtual machine as needed.

Course Materials

  • Lecture slides and example code will be available online via links inside the course's D2L page.

  • Links to papers on subjects we will be discussing in class will also be listed and/or embedded in the slides.

    • West Chester University maintains extensive licensed access to academic publishers such as ACM, IEEE, Elsevier, and Springer, and many of the papers required for this course will be available through the library's online database.
    • Google Scholar is another excellent source for downloading preprint or open-access versions of papers.

Grading

Grades will be based on the following distribution:

  • Assignment:
    • Assignment: 60%
  • Exam:
    • Exam 1: 15%
    • Exam 2: 10% (Comprehensive)
  • Quiz: 10%
  • Participation: 5%

Participation

  • Participation accounts for 5% of your grade.
  • This part of your grade will be determined by:
    • whether or not you show up to office hours, and
    • how actively you participate (ask questions, make comments, contribute to activities) during class.
  • Participation grades will be assigned in a coarse manner:
    • 100% to students who are fully engaged active participants,
    • 50% to students who are nominally engaged (physically there and willing to participate when asked to), and
    • 0% for students who really aren't engaged (trying to hide).

Office Hours:

  • Office hours are an opportunity to reinforce course topics either one-on-one or in small groups. If you are unable to attend during the posted time slots, I am happy to make an appointment.

  • You are required to come to office hours at least once (with something to discuss) before October 1st. If you don't, your participation score for the semester will be zero (0).

Letter grades are assigned according to the standard scale (i.e., A: 90-100, B: 80-89, C: 70-79, D: 60-69, F: 0-59), with plus and minus grades as follows:

Number    Letter
100-93    A
92-90     A-
89-87     B+
86-83     B
82-80     B-
79-77     C+
76-73     C
72-70     C-
69-67     D+
66-63     D
62-60     D-
<= 59     F
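The weights and letter bands above can be combined into a quick estimate of a final grade. The following is only a minimal sketch: the function names and the example scores are hypothetical, and it assumes that each component is scored on a 0-100 scale and that no rounding is applied before the letter lookup (the syllabus does not specify either).

```python
# Weights taken from the grading distribution above (they sum to 100%).
WEIGHTS = {
    "assignment": 0.60,
    "exam1": 0.15,
    "exam2": 0.10,
    "quiz": 0.10,
    "participation": 0.05,
}

# Lower bound of each letter band, taken from the plus/minus table above.
LETTER_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def final_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a numeric score to a letter (assumption: no rounding is applied)."""
    for cutoff, letter in LETTER_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical student, for illustration only.
example = {
    "assignment": 90,
    "exam1": 80,
    "exam2": 85,
    "quiz": 100,
    "participation": 100,
}
total = final_score(example)
print(f"{total:.1f} -> {letter_grade(total)}")
```

Because participation is worth 5 points of the total, a participation score of zero lowers this hypothetical student's total by 5 points.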

Grading Appeals

Mistakes occasionally happen during the grading process. If you think a mistake has been made regarding your grades, you should send me an email with a detailed justification within one week of the date the grades are posted. No changes to grades will be made after that week.

Attendance Policy

  • Attendance is critical to the success of students in this class.
  • We will take note of who attends, and will occasionally use an attendance check in place of a quiz score.
  • If you miss a class, you are responsible for obtaining lecture notes, handouts, and homework assignments from fellow students.
  • If the instructor is late for class, please wait 20 minutes before leaving.

Excused Absences Policy for University-Sanctioned Event

  • Students are advised to carefully read and comply with the excused absences policy for university-sanctioned events contained in the WCU Undergraduate Catalog.
  • In particular, please note that the “responsibility for meeting academic requirements rests with the student,” that this policy does not excuse students from completing required academic work, and that professors can require a “fair alternative” to attendance on those days that students must be absent from class in order to participate in a University-Sanctioned Event.

Late Work

  • Without prior approval from the instructor, late homework assignments will not be accepted and will be assigned a grade of zero.
  • Unless accompanied with a valid medical or University excuse, all late submissions will be penalized.
  • A make-up for the exams will be given only with a valid medical or University excuse.

Academic Integrity

It is the responsibility of each student to adhere to the university’s standards for academic integrity. Violations of academic integrity include any act that violates the rights of another student in academic work, that involves misrepresentation of your own work, or that disrupts the instruction of the course. Other violations include (but are not limited to): cheating on assignments or examinations; plagiarizing, which means copying any part of another’s work and/or using ideas of another and presenting them as one’s own without giving proper credit to the source; selling, purchasing, or exchanging term papers; falsifying information; and using your own work from one class to fulfill the assignment for another class without significant modification. Proof of academic misconduct can result in automatic failure of and removal from this course.

For questions regarding Academic Integrity, Sexual Harassment, or the Student Code of Conduct, students are encouraged to refer to the “Other” Menu of the Computer Science web page www.cs.wcupa.edu/, the Undergraduate Catalog, the Ram’s Eye View, and the University website at www.wcupa.edu.

Collaboration Policy (for this specific class)

  • Class assignments are opportunities for learning and discovery.
  • Collaboration between students on homework assignments in this class is permitted. Collaboration includes students working together to gain an understanding of course concepts, active discussions with the instructor and other people to learn about course material, and other activities in which a student is actively seeking to learn and understand the topics covered in the course.
  • I do expect that you understand and can explain any homework solution that you submit, no matter how you worked on it.

  • Plagiarism is not allowed. Taking assignments from other classmates or downloading completed assignments from websites is not allowed.
  • If you collaborate with other students in class or use sources other than those provided for everyone in the course (e.g., instructor, recommended textbook, the course web site, or the lectures) to help yourself learn and understand, then you must give appropriate credit to those collaborators and/or sources.

  • As long as you acknowledge the collaboration that occurred, your grade will not be affected nor will you be charged with academic misconduct.
  • Any assignment that does not include a collaboration statement will be considered to have been completed with only course materials. If this is found to not be the case, it is considered a failure to acknowledge collaborations or give appropriate credit to sources of help (other than course materials or personnel as noted above) and will be treated as academic dishonesty.

Collaboration Statement

The statement should say:

  • “I worked on this assignment alone, using only course materials.”

OR

  • “I worked on this assignment with [names of the people you worked with]. My role in completing the assignment was [provide a description of all your contributions], while [provide names of other collaborators]' role in completing the assignment was [provide a description of their contributions]. We consulted related material that can be found at [cite any other materials not provided as course materials].”

Disability Accommodations

  • To learn more about West Chester University’s Office of Services for Students with Disabilities (OSSD), contact the OSSD, which is located at 223 Lawrence Center. The OSSD hours of operation are Monday through Friday, 8:30 a.m. to 4:30 p.m. Their phone number is 610-436-2564, their fax number is 610-436-2600, their email address is ossd@wcupa.edu, and their website is at www.wcupa.edu/ussss/ossd.
  • If you have a disability that requires accommodations under the Americans with Disabilities Act (ADA), please present your letter of accommodations to OSSD as soon as possible so that OSSD can support your success in an informed manner. Accommodations cannot be granted retroactively.

Title IX Statement

  • West Chester University and its faculty are committed to assuring a safe and productive educational environment for all students. In order to meet this commitment and to comply with Title IX of the Education Amendments of 1972 and guidance from the Office for Civil Rights, the University requires faculty members to report incidents of sexual violence shared by students to the University's Title IX Coordinator, Ms. Lynn Klingensmith.
  • The only exceptions to the faculty member's reporting obligation are when incidents of sexual violence are communicated by a student during a classroom discussion, in a writing assignment for a class, or as part of a University-approved research project. Faculty members are obligated to report sexual violence or any other abuse of a student who was, or is, a child (a person under 18 years of age) when the abuse allegedly occurred to the person designated in the University protection of minors policy. Information regarding the reporting of sexual violence and the resources that are available to victims of sexual violence is set forth at the webpage for the Office of Social Equity at http://www.wcupa.edu/_admin/social.equity/.

  • Ms. Lynn Klingensmith is the West Chester University Title IX Coordinator and is also the Director of Social Equity. She can be reached at 610-436-2433 or by email at LKlingensmith@wcupa.edu and can connect you to resources both on and off campus, as well as provide information about the processes related to cases of sexual misconduct.

  • West Chester University community members also have a right to report acts of sexual misconduct to the Office of Civil Rights. They can be contacted at: The Wanamaker Building, 100 Penn Square East, Suite 515, Philadelphia, PA 19107-3323, (215) 656-8541, OCR.Philadelphia@ed.gov

Emergency Preparedness

  • All students are encouraged to sign up for the University’s free WCU ALERT service, which delivers official WCU emergency text messages directly to your cell phone.
  • For more information, visit www.wcupa.edu/wcualert. To report an emergency, call the Department of Public Safety at 610-436-3311.

Electronic Mail Policy

  • It is expected that faculty, staff, and students activate and maintain regular access to University provided e-mail accounts.
  • Official university communications, including those from your instructor, will be sent through your university e-mail account.
  • You are responsible for accessing that mail to be sure to obtain official University communications.
  • Failure to access will not exempt individuals from the responsibilities associated with this course.

Instructor Email Policy

For individual issues, it is best to contact me via email. I check my email frequently during normal working hours (9-5) on weekdays, and I will try to respond quickly (hopefully the same day). I also check email on weekends and evenings, but not nearly as frequently (almost never on Sundays). As a result, you should expect longer delays during these times.

If you send me an assignment-related email right before a deadline, I may not answer it in time to be helpful.

How to Succeed in this Class

Siemens, G. (2005). Connectivism: A learning theory for the digital age. International Journal of Instructional Technology and Distance Learning, 2(1). (5276 citations)

  • Learning involves the active creation of mental structures, rather than the passive internalization of information acquired from others or from the environment.
  • Learning (defined as actionable knowledge) can reside outside of ourselves (within an organization or a database), is focused on connecting specialized information sets, and the connections that enable us to learn more are more important than our current state of knowing.

Learning approach

  • Is based on the connective constructivism learning theory
  • Grading style: Emphasis on what you can learn. You will be graded not only on the correctness of your answers, but also the thought process that takes you to those answers.
  • Learning to learn (hence the collaboration policy)

Git

  • The class materials, including source code, will be disseminated via Git. Being able to use Git is a critical skill for most, if not all, software developers and IT professionals. There are many tutorials already available online for Git. Some of the more helpful ones include GitHub's, "the simple guide", and Atlassian's.

  • It would be a mistake to access the class materials only via the web browser. "This is a mistake. Just learn Git. The command line interface is faster and more powerful, and you're going to need to learn it at some point in your life. Why not today?" - Dr. Jacob Sorber, Clemson University.

Tentative Course Outline

  • Cluster of Computers
  • Parallel and Distributed File Systems
  • High Performance and Data-Intensive Computing
  • Job Scheduling on Cluster of Computers
  • Complex Distributed Systems

The Demand for Computation Speed

Measuring of Speed

  • FLOPS: Floating-Point Operations Per Second
    • $\text{FLOPS} = \text{Number of Cores} \times \text{Average Frequency} \times \text{Instructions per Cycle}$
    • First Computer (ENIAC): 500 FLOPS
  • MFLOPS = 1,000,000 FLOPS
  • GFLOPS = 1,000,000,000 FLOPS

In [1]:
# Intel Core i7-3770K Quad-Core Processor 3.5 GHz (Ivy Bridge):
# number of cores * average frequency * instructions per cycle
FLOPS = 4 * 3.5 * 8
print(FLOPS)

# What is the unit? (FLOPS/MFLOPS/GFLOPS)

112.0

  • TFLOPS = 1,000,000,000,000 FLOPS
    • Intel's ASCI Red was the first supercomputer in the world to achieve 1 TFLOPS, in 1996
    • It was used to simulate the development and maintenance of nuclear weapons
  • PFLOPS = 1,000,000,000,000,000 FLOPS
    • IBM RoadRunner was the first supercomputer to achieve 1PFLOPS in 2008
    • China’s Sunway TaihuLight, at 93 PFLOPS, was the fastest supercomputer in the world from June 2016 until June 2018
  • EFLOPS = 1,000,000,000,000,000,000 FLOPS
    • A21 (first U.S. Exascale computer, 2021, Argonne National Lab)
    • Tianhe 3 (first China Exascale computer, 2020, National University of Defense Technology)
    • Post-K (first Japan Exascale computer, 2020, RIKEN Center for Computational Science)
    • Europe, India, Taiwan on track
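The unit prefixes above can be sketched as a small helper that converts a raw FLOPS value into the largest fitting unit. This is an illustrative sketch only: `peak_flops` and `format_flops` are hypothetical names, the frequency is assumed to be given in Hz, and the 16-core processor in the last line is made up for the example.

```python
# Unit prefixes listed above, largest first.
UNITS = [
    (1e18, "EFLOPS"),
    (1e15, "PFLOPS"),
    (1e12, "TFLOPS"),
    (1e9, "GFLOPS"),
    (1e6, "MFLOPS"),
    (1.0, "FLOPS"),
]


def peak_flops(cores, frequency_hz, instructions_per_cycle):
    """Theoretical peak: cores * frequency (in Hz) * instructions per cycle."""
    return cores * frequency_hz * instructions_per_cycle


def format_flops(flops):
    """Express a raw FLOPS value using the largest unit that keeps it >= 1."""
    for scale, name in UNITS:
        if flops >= scale:
            return f"{flops / scale:.2f} {name}"
    return f"{flops:.2f} FLOPS"


print(format_flops(500))    # ENIAC, from the list above
print(format_flops(93e15))  # Sunway TaihuLight, from the list above
# Hypothetical processor: 16 cores, 2.0 GHz, 16 instructions per cycle.
print(format_flops(peak_flops(16, 2.0e9, 16)))
```

Keeping the frequency in Hz rather than GHz means the result is always in plain FLOPS, so the unit never has to be inferred from the inputs.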

Why do we need supercomputers?

Forecasting Sandy

Computational Power for Weather Forecast Agencies (2013)

http://blogs.agu.org/wildwildscience/2013/02/17/seriously-behind-the-numerical-weather-prediction-gap/

Catching up

http://www.noaa.gov/noaa-completes-weather-and-climate-supercomputer-upgrades

  • Two new supercomputers, Luna and Surge
  • 2.89 PFLOPS each for a total of 5.78 PFLOPS (the previous generation was only 776 TFLOPS)
  • Increases water quantity forecasts from 4,000 locations to 2.7 million locations (roughly a 700-fold increase in spatial density)
  • Can track and forecast 8 storms at any given time
  • 44.5 million dollar investment
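The figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in the list; the variable names are illustrative.

```python
TFLOPS = 1e12
PFLOPS = 1e15

previous_capacity = 776 * TFLOPS
new_capacity = 2 * 2.89 * PFLOPS  # Luna and Surge combined

# How many times more compute capacity the upgrade provides.
capacity_ratio = new_capacity / previous_capacity

# How many times more forecast locations are covered.
location_ratio = 2_700_000 / 4_000

print(f"Compute capacity: {capacity_ratio:.1f}x")
print(f"Forecast locations: {location_ratio:.0f}x")
```

The location ratio works out to 675, which the list rounds to a 700-fold increase.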

Reduce manufacturing costs

  • Boeing tested 77 prototype wing designs for the 767
  • For the 787, only 11 wing designs had to be physically developed and tested (a 7-fold reduction in prototyping cost)
    • This is due to more than 800,000 hours of computer simulation

Sources: Real-World Examples of Supercomputers Used for Economic and Societal Benefits

Improve Oil and Gas Exploration

  • Los Alamos National Lab
  • Development of large-scale data analytic techniques to simulate and predict subsurface fluid distribution, temperature, and pressure
  • This reduces the need for observation wells (has demonstrated commercial success)

Sources: Real-World Examples of Supercomputers Used for Economic and Societal Benefits

Fraud Detection at PayPal

  • 10M+ logins, 13M+ transactions, 300 variables per event
  • ~4B inserts, ~8B selects
  • MPI-like applications, Lustre Parallel File Systems, Hadoop
  • Saved over \$700M in fraudulent transactions during their first year of deployment

Sources: