The default method used when allocating nodes and processors is to break the
request into slices, where the size of each slice is equal to the value of
ppn and the number of slices is equal to the value of nodes. On some
clusters, the scheduler will not necessarily place different slices on
separate nodes; instead, it will try to pack as many slices onto each node as
resource availability allows. This can lead to situations where you request
4 nodes with 2 processors per node, but are assigned only 2 nodes, with 6
processors on one node and 2 on the other.
Example:
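Assuming a Torque/PBS-style scheduler (an assumption; adapt the directive to
whatever scheduler your site runs), the request described above would be
written as:

    #PBS -l nodes=4:ppn=2

This asks for 8 processors in total, as 4 slices of 2 processors each.
Depending on what is free, the scheduler may instead grant those 8 processors
as, say, 6 on one node and 2 on another.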
This behavior originated to accommodate systems running many different kinds
of jobs with different requirements. When a cluster is in general use, some
nodes may have many processors free, while others have only a few, if any. In
these cases it may take a great deal of time for the appropriate number of
nodes, each with the appropriate number of free processors, to become
available, whereas if the job were simply assigned slices as they became
available, it could run much sooner. Depending on the type of job you are
running, this may be a very bad idea. Due to factors such as memory limits or
disk access behavior, you may want to restrict each node to only the number
of tasks specified, or you may be willing to wait until the scheduler can
free nodes exclusively for your job.
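For example, on clusters running Torque with the Moab or Maui scheduler (an
assumption; the exact option is site-specific, so check your cluster's
documentation), exclusive node access can often be requested through a node
access policy in the resource list:

    #PBS -l nodes=4:ppn=2
    #PBS -l naccesspolicy=singlejob

With such a policy in place, the scheduler waits until whole nodes are free
for your job rather than packing its slices alongside other jobs, trading a
potentially longer queue wait for predictable memory and disk behavior.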