The default method used when allocating nodes and processors is to break the
request into slices, where the size of each slice is equal to the value of
ppn and the number of slices is equal to the value of nodes. On some
clusters, the scheduler will not necessarily place different slices on
separate nodes; instead, it will try to pack as many slices onto each node as
resource availability allows. This can lead to situations where you request
4 nodes with 2 processors per node, but are assigned only 2 nodes, with 6
processors on one node and 2 on the other.
Example:
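Assuming a Torque/PBS-style scheduler (an assumption; adapt the directive to
whatever scheduler your site runs), the request described above would be
written as:

    #PBS -l nodes=4:ppn=2

This asks for 8 processors in total, as 4 slices of 2 processors each.
Depending on what is free, the scheduler may instead grant those 8 processors
as, say, 6 on one node and 2 on another.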
This behavior originated to accommodate systems running many different kinds
of jobs with different requirements. When a cluster is in general use, some
nodes may have many processors free, while others have only a few, if any. In
these cases it may take a great deal of time for the appropriate number of
nodes, each with the appropriate number of free processors, to become
available, whereas if the job were simply assigned slices as they became
available, it could run much sooner. Depending on the type of job you are
running, this may be a very bad idea. Due to factors such as memory limits or
disk access behavior, you may want to restrict each node to only the number
of tasks specified, or you may be willing to wait until the scheduler can
free nodes exclusively for your job.
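For example, on clusters running Torque with the Moab or Maui scheduler (an
assumption; the exact option is site-specific, so check your cluster's
documentation), exclusive node access can often be requested through a node
access policy in the resource list:

    #PBS -l nodes=4:ppn=2
    #PBS -l naccesspolicy=singlejob

With such a policy in place, the scheduler waits until whole nodes are free
for your job rather than packing its slices alongside other jobs, trading a
potentially longer queue wait for predictable memory and disk behavior.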