Introduction to HTC at CHTC: Glossary

Key Points

Introduction
  • High throughput computing is well suited to computations that can be broken into many independent pieces.

  • CHTC users can access the CHTC pool, the UW Grid, and the Open Science Grid.

Jobs and Scheduling
  • A job consists of a computational task, usually defined by input data and a piece of software, that produces output data.

  • Most large-scale systems consist of a head node, where users log in and submit jobs, and worker nodes, where the jobs actually run (see the example below).

  • A batch scheduler controls where and when jobs run on the worker nodes.
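
  On an HTCondor-based pool such as CHTC's, you typically reach the head (submit) node with ssh and can list the worker nodes with condor_status. This is a minimal sketch; the hostname and username below are placeholders, not real CHTC addresses.

      # Log in to the head (submit) node -- hostname and username are placeholders
      ssh username@submit.example.edu

      # List the worker nodes (and their slots) in the pool
      condor_status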

Submitting a Job
  • A submit file tells the job scheduler information such as resource requirements, software and data requirements, and what commands to run (see the sketch below).
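
  CHTC's scheduler is HTCondor, so submit files are written in HTCondor's submit language. The sketch below is illustrative only: the script name, file names, and resource amounts are made-up examples, not CHTC requirements.

      # hello.sub -- a minimal HTCondor submit file (names and amounts are illustrative)
      executable = hello.sh
      arguments  = input.dat

      # Where HTCondor writes the job's log, standard output, and standard error
      log    = hello.log
      output = hello.out
      error  = hello.err

      # Input files to transfer to the worker node
      transfer_input_files = input.dat

      # Resources the job needs
      request_cpus   = 1
      request_memory = 1GB
      request_disk   = 2GB

      # Submit one copy of the job
      queue

  The job is then handed to the scheduler with condor_submit hello.sub.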

Life Cycle of a Job
  • Jobs move through various states in the queue and can be monitored with a queue command (see the example below).
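
  With HTCondor, the queue command is condor_q. A brief sketch (the job ID shown is made up):

      # Show your jobs and their states (idle, running, held, ...)
      condor_q

      # Ask HTCondor why a particular job is idle or held (job ID is illustrative)
      condor_q -better-analyze 12345.0

      # Remove a job from the queue
      condor_rm 12345.0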

Understanding Job Output
  • Successful jobs leave the queue, bring back output files, and have no errors.

  • Log files record when and where a job waited and ran, and how much of each resource it used.

  • Output files capture any general information printed by the job’s executable.

  • Error files capture any errors printed while the job’s executable ran (see the example below).
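
  Using the illustrative file names from the submit file sketch above, the three files can be inspected on the submit node after the job finishes:

      # When and where the job waited and ran, and the resources it used
      cat hello.log

      # Anything the executable printed to standard output
      cat hello.out

      # Anything the executable printed to standard error
      cat hello.err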

Troubleshooting and Interactive Jobs
  • Always run a test job before submitting a full-scale job.

  • To test a new job, use an interactive session before submitting (see the example below).

  • You can use log, standard output, and standard error information to determine why jobs fail.
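
  With HTCondor, an interactive session can be started from a submit file with condor_submit -i; HTCondor places you on a worker node that satisfies the requirements in that file. The file name below follows the illustrative sketch above.

      # Start an interactive session on a worker node (file name is illustrative)
      condor_submit -i hello.sub

      # ...run and test commands by hand on the worker node...

      # Leave the interactive session and return to the submit node
      exit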

Being a Good Citizen in CHTC
  • It is important to follow the policies and procedures of the resources you use.

  • Have a data management plan in place before you start computing.

Glossary

  • batch scheduler: software that controls where and when jobs run on the worker nodes.

  • error file: captures anything a job’s executable printed to standard error.

  • head node: the machine users log in to in order to submit jobs; also called a submit node.

  • high throughput computing (HTC): a style of computing in which work is broken into many independent pieces that can run at the same time on many machines.

  • interactive session: a session on a worker node used to test commands by hand before submitting a full-scale job.

  • job: a computational task, usually defined by input data and a piece of software, that produces output data.

  • log file: records when and where a job waited and ran, and the resources it used.

  • output file: captures anything a job’s executable printed to standard output.

  • submit file: tells the scheduler a job’s resource, software, and data requirements and what commands to run.

  • worker node: a machine on which jobs actually run.