Computing at Scale

Key Points

Motivation
  • When a problem can no longer run on a desktop or laptop, use a large-scale computing resource.

Resources and Accounts
  • Find a large-scale resource that matches your needs.

  • Resources will require an account request or application.

Logging In
  • To log in, you will need an account name and password, and the name of the login computer.

Jobs and Scheduling
  • A job consists of a computational task, usually defined by input data and a piece of software, producing output data.

  • Most large-scale systems consist of a head node, where you log in and submit jobs, and worker nodes, where the jobs actually run.

  • A batch scheduler controls where and when jobs run on the worker nodes.

Submitting a Job
  • In order to submit a job, you need to write a submit description file.

Monitoring a Job
  • Jobs pass through various states in the queue, and you can monitor them with a queue command.

Etiquette on Shared Resources
  • It is important to follow your resource’s policies and procedures.

  • Have a data management plan in place before you start computing.

Troubleshooting
  • Always run a test job before submitting a full-scale job.

  • To test a new job, use an interactive session before submitting.

  • You can use log, standard output, and standard error information to determine why jobs fail.
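A minimal Python sketch of the last point, run locally rather than through a scheduler. The helper name `run_test_task` and the failing command are hypothetical, chosen only for illustration; a real batch system writes its own log, output, and error files, whose names and formats depend on the scheduler.

```python
import subprocess
import sys


def run_test_task(command):
    """Run a command and return its exit code, standard output, and standard error."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Hypothetical failing task: it reports progress on standard output,
# then raises an error, which Python reports on standard error.
failing = [sys.executable, "-c", "print('step 1 done'); raise ValueError('bad input')"]

code, out, err = run_test_task(failing)
print("exit code:", code)                                   # non-zero means failure
print("stdout:", out.strip())                               # how far the task got
print("stderr (last line):", err.strip().splitlines()[-1])  # why it stopped
```

Standard output shows how far the task got, standard error shows why it stopped, and a non-zero exit code confirms that it failed. The same three pieces of information are worth checking for a test job before scaling up.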

Other Topics
  • Large-scale compute systems are powerful, but only if you ask questions and get the help you need.

Conversion table

Unit    Smaller unit
1 MB    1000 KB
1 GB    1000 MB
1 TB    1000 GB
1 PB    1000 TB

could use creative way to represent this…
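The conversion table above can also be expressed as a short Python sketch. It assumes the decimal units shown in the table, where each step is a factor of 1000; the function name `convert` is hypothetical and used only for illustration.

```python
# Decimal storage units from the table, smallest to largest.
# Each step up the list is a factor of 1000.
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def convert(value, from_unit, to_unit):
    """Convert a size between the decimal storage units in UNITS."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        # Converting to a smaller unit: multiply.
        return value * 1000 ** steps
    # Converting to a larger unit: divide.
    return value / 1000 ** (-steps)


print(convert(1, "GB", "MB"))    # one row of the table
print(convert(2, "TB", "MB"))    # two steps down
print(convert(500, "GB", "TB"))  # one step up
```

This kind of helper is useful when estimating whether the input and output data for a set of jobs will fit within a storage quota.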

Glossary

batch scheduler
software that controls where and when jobs run on the worker nodes
cluster
definition
computer
definition
core
definition
CPU
definition
disk
definition
execute node
definition
GPU
definition
hard drive, hard disk
definition
head node
where you log in
high-throughput computing
definition
high-performance computing
definition
large-scale computing
definition
log-in node
definition
memory
definition
MPI
definition
node
can be server, or process
parallel
definition
process
definition
processor
definition
RAM
see memory
scheduler
see batch scheduler
server
definition
submit node
definition
supercomputer
definition
thread
definition
worker node
definition