Knowledge of your storage system is an often overlooked but critical aspect of any cluster system. For most users, it may be enough to see that there is space, and so they use it however they see fit. Unfortunately, this is a mistake that often has a severe impact on the performance of the cluster system, causing it to slow or fail and, by extension, slowing or failing everything that runs on that cluster.
The key thing to realize is that there are different kinds of storage, and each kind is optimized for particular purposes. When it is time for you to use a cluster system, it is important to understand the demands your programs place on storage. Consider whether you need fast storage, local storage, shared storage, long-term storage, short-term storage, or large amounts of storage.
Not all cluster systems have all types of storage available to them. It is not unusual for a cluster to have only a few kinds of storage, or to be designed primarily around a single type of storage with other types being merely ancillary. For example, the umms-amino cluster is optimized for fast file access and reads, and is built around a fast storage system, while the axiom general cluster (at the time of this writing) has a storage solution engineered for data volume rather than performance. Most cluster systems provide local scratch, meaning storage located on each compute node and not shared between nodes (on most systems this is found at /tmp); however, some may not, and rely entirely on a shared storage system.
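If you are unsure whether a path such as /tmp is node-local or part of a shared filesystem, you can query the node's mount table. The Python sketch below assumes a Linux compute node with a /proc/mounts file (typical, but not universal) and a hand-picked set of filesystem types that commonly indicate network storage; treat both as assumptions to adapt for your own cluster.

```python
import os

# Filesystem types that usually indicate shared (network) storage.
# This set is an assumption; your cluster may use other types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs", "cifs", "beegfs"}

def filesystem_for(path):
    """Return (mount_point, fs_type) for the mount that backs path."""
    path = os.path.realpath(path)
    best = ("/", "unknown")
    with open("/proc/mounts") as mounts:  # Linux-specific mount table
        for line in mounts:
            device, mount_point, fs_type = line.split()[:3]
            # The longest mount point that is a prefix of path wins.
            if (path == mount_point
                    or path.startswith(mount_point.rstrip("/") + "/")):
                if len(mount_point) >= len(best[0]):
                    best = (mount_point, fs_type)
    return best

if __name__ == "__main__":
    mount_point, fs_type = filesystem_for("/tmp")
    kind = "shared" if fs_type in NETWORK_FS_TYPES else "likely node-local"
    print(f"/tmp is on {mount_point} ({fs_type}): {kind}")
```

A reported type such as nfs, lustre, or gpfs suggests shared storage, while something like ext4, xfs, or tmpfs under /tmp usually means node-local scratch.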
Any shared space can be saturated by the sheer number of programs running on a cluster system, so any need to read or write files in a shared space (like your home directory) can be affected by other jobs accessing that same general location. A busy cluster system may have hundreds, if not thousands, of jobs running on it at any given time. If you place a thousand cars at a traffic intersection, no matter how well the intersection is designed or how many lanes there are, it will still take some time for all those cars to clear the intersection.
Using storage on a cluster system is often a transparent process. The details of what the storage is, where it is, and what features it offers are often not readily apparent. As you change directories on any individual node in the cluster system, you may move seamlessly between different storage systems, each with different characteristics. For example, home directories (more on these later) are often on a shared storage system, but this may not be apparent to you without querying the system information and/or asking a system administrator. In most cases you will be directed to a preferred location (such as /tmp) from which to run your program, and it will be up to you to take advantage of this location when you submit jobs to the scheduling system.
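Before deciding where to place data, it can also help to compare how much space is free in the candidate locations. The short Python check below is only a sketch; the two paths listed are placeholders for whatever locations your site actually recommends.

```python
import os
import shutil

# Candidate locations; substitute the paths your cluster documents.
locations = {
    "home directory": os.path.expanduser("~"),
    "local scratch": "/tmp",
}

for name, path in locations.items():
    usage = shutil.disk_usage(path)  # total, used, free (in bytes)
    print(f"{name:>14} ({path}): {usage.free / 1e9:.1f} GB free")
```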
Note: When you log into a cluster system you typically start out in your home directory. The home directory is usually a shared location which you can use to set up your personal configuration and preferences, as well as to test and build programs to run on the cluster. The previously mentioned guidelines for a preferred run location apply mostly to program data, not to the programs themselves. If the programs (jobs) that you plan on submitting to the cluster read specific data files (databases or the like) or write out large data files (building and assembling a large data structure or a time step analysis), it is best for this data to be copied or staged in preferred locations (such as /tmp) according to any guidelines given to you.
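As a rough illustration of that staging pattern, the Python sketch below copies an input file from shared storage into node-local scratch, does its work there, and copies the result back only when finished. The paths and the "computation" are hypothetical placeholders; follow the specific staging guidelines your site provides.

```python
import os
import shutil
import tempfile

# Hypothetical paths: the input lives on shared storage (the home
# directory here), and results are copied back there at the end.
SHARED_INPUT = os.path.expanduser("~/data/input.dat")
SHARED_RESULTS = os.path.expanduser("~/results")

def main():
    # Stage the input into node-local scratch (assumed to be /tmp here).
    scratch = tempfile.mkdtemp(prefix="job_", dir="/tmp")
    local_input = shutil.copy(SHARED_INPUT, scratch)

    # Run the real computation against local_input, writing its output
    # into scratch. A trivial byte copy stands in for the computation.
    local_output = os.path.join(scratch, "output.dat")
    with open(local_input, "rb") as src, open(local_output, "wb") as dst:
        dst.write(src.read())

    # Copy results back to shared storage, then clean up local scratch.
    os.makedirs(SHARED_RESULTS, exist_ok=True)
    shutil.copy(local_output, SHARED_RESULTS)
    shutil.rmtree(scratch)

if __name__ == "__main__":
    main()
```

Keeping the reads and writes on local scratch during the run, and touching shared storage only at the start and end, is what spares the shared filesystem from being hammered by every job at once.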