Knowledge of your storage system is an often overlooked but critical aspect of any cluster system. For most users, it may be enough to see that there is space, and so they use it however they see fit. Unfortunately, this is a mistake that often has a severe impact on the performance of the cluster system, causing it to slow or fail and, by extension, slowing or failing everything that runs on that cluster.
The key thing to realize is that there are different kinds of storage, and each kind is optimized for particular purposes. When it is time for you to use a cluster system, it is important to understand the demands your programs place on storage. Consider whether you need fast storage, local storage, shared storage, long-term storage, short-term storage, or large amounts of storage.
Not all cluster systems have all types of storage available to them. It is not unusual for a cluster to have only a few kinds of storage, or to be designed primarily around a single type of storage with other types being merely ancillary. For example, the umms-amino cluster is optimized for fast file access and reads, and is built around a fast storage system, while the axiom general cluster (at the time of this writing) has a storage solution engineered for data volume rather than performance. Most cluster systems provide local scratch, meaning storage located on each compute node and not shared between nodes (on most systems this is found at /tmp); however, some may not, and rely entirely on a shared storage system.
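If you are unsure whether a path such as /tmp is node-local or part of a shared filesystem, you can query the node's mount table. The Python sketch below assumes a Linux compute node with a /proc/mounts file (typical, but not universal) and a hand-picked set of filesystem types that commonly indicate network storage; treat both as assumptions to adapt for your own cluster.

```python
import os

# Filesystem types that usually indicate shared (network) storage.
# This set is an assumption; your cluster may use other types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs", "cifs", "beegfs"}

def filesystem_for(path):
    """Return (mount_point, fs_type) for the mount that backs path."""
    path = os.path.realpath(path)
    best = ("/", "unknown")
    with open("/proc/mounts") as mounts:  # Linux-specific mount table
        for line in mounts:
            device, mount_point, fs_type = line.split()[:3]
            # The longest mount point that is a prefix of path wins.
            if (path == mount_point
                    or path.startswith(mount_point.rstrip("/") + "/")):
                if len(mount_point) >= len(best[0]):
                    best = (mount_point, fs_type)
    return best

if __name__ == "__main__":
    mount_point, fs_type = filesystem_for("/tmp")
    kind = "shared" if fs_type in NETWORK_FS_TYPES else "likely node-local"
    print(f"/tmp is on {mount_point} ({fs_type}): {kind}")
```

A reported type such as nfs, lustre, or gpfs suggests shared storage, while something like ext4, xfs, or tmpfs under /tmp usually means node-local scratch.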
Any shared space can be saturated by the sheer number of programs running on a cluster system, so any need to read or write files in a shared space (like your home directory) can be affected by other jobs accessing that same general location. A busy cluster system may have hundreds, if not thousands, of jobs running on it at any given time. If you place a thousand cars at a traffic intersection, no matter how well the intersection is designed or how many lanes there are, it will still take some time for all those cars to clear the intersection.
Using storage on a cluster system is often a transparent process. The details of what the storage is, where it is, and what features it offers are often not readily apparent. As you change directories on any individual node in the cluster system, you may move seamlessly between different storage systems, each with different characteristics. For example, home directories (more on these later) are often on a shared storage system, but this may not be apparent to you without querying the system information and/or asking a system administrator. In most cases you will be directed to a preferred location (such as /tmp) from which to run your program, and it will be up to you to take advantage of this location when you submit jobs to the scheduling system.
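Before deciding where to place data, it can also help to compare how much space is free in the candidate locations. The short Python check below is only a sketch; the two paths listed are placeholders for whatever locations your site actually recommends.

```python
import os
import shutil

# Candidate locations; substitute the paths your cluster documents.
locations = {
    "home directory": os.path.expanduser("~"),
    "local scratch": "/tmp",
}

for name, path in locations.items():
    usage = shutil.disk_usage(path)  # total, used, free (in bytes)
    print(f"{name:>14} ({path}): {usage.free / 1e9:.1f} GB free")
```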
Note: When you log into a cluster system you typically start out in your home directory. The home directory is usually a shared location which you can use to set up your personal configuration and preferences, as well as to test and build programs to run on the cluster. The previously mentioned guidelines for a preferred run location apply mostly to program data, not to the programs themselves. If the programs (jobs) that you plan on submitting to the cluster read specific data files (databases or the like) or write out large data files (building and assembling a large data structure or a time step analysis), it is best for this data to be copied or staged in preferred locations (such as /tmp) according to any guidelines given to you.
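As a rough illustration of that staging pattern, the Python sketch below copies an input file from shared storage into node-local scratch, does its work there, and copies the result back only when finished. The paths and the "computation" are hypothetical placeholders; follow the specific staging guidelines your site provides.

```python
import os
import shutil
import tempfile

# Hypothetical paths: the input lives on shared storage (the home
# directory here), and results are copied back there at the end.
SHARED_INPUT = os.path.expanduser("~/data/input.dat")
SHARED_RESULTS = os.path.expanduser("~/results")

def main():
    # Stage the input into node-local scratch (assumed to be /tmp here).
    scratch = tempfile.mkdtemp(prefix="job_", dir="/tmp")
    local_input = shutil.copy(SHARED_INPUT, scratch)

    # Run the real computation against local_input, writing its output
    # into scratch. A trivial byte copy stands in for the computation.
    local_output = os.path.join(scratch, "output.dat")
    with open(local_input, "rb") as src, open(local_output, "wb") as dst:
        dst.write(src.read())

    # Copy results back to shared storage, then clean up local scratch.
    os.makedirs(SHARED_RESULTS, exist_ok=True)
    shutil.copy(local_output, SHARED_RESULTS)
    shutil.rmtree(scratch)

if __name__ == "__main__":
    main()
```

Keeping the reads and writes on local scratch during the run, and touching shared storage only at the start and end, is what spares the shared filesystem from being hammered by every job at once.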