5.7. Storage Management Day-to-Day

System administrators must pay attention to storage in the course of their day-to-day routine. There are various issues that should be kept in mind:

  • Monitoring free space

  • Disk quota issues

  • File-related issues

  • Directory-related issues

  • Backup-related issues

  • Performance-related issues

  • Adding/removing storage

The following sections discuss each of these issues in more detail.

5.7.1. Monitoring Free Space

Making sure there is sufficient free space available should be at the top of every system administrator's daily task list. Regular, frequent free space checking is important because free space is so dynamic; there can be more than enough space one moment, and almost none the next.
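
A free space check of this kind is easy to automate. The following is a minimal Python sketch, not a complete monitoring tool; the function name free_space_report and the 10% threshold are assumptions chosen for illustration, and an appropriate threshold depends on the site.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Return (free_fraction, is_low) for the file system that holds path.

    min_free_fraction is an illustrative threshold, not a recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo file systems report zero size; treat them as not low.
        return 0.0, False
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction
```

Running such a check on a schedule and recording the results also supplies the history needed to spot both sudden drops and gradual growth.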

In general, there are three reasons for insufficient free space:

  • Excessive usage by a user

  • Excessive usage by an application

  • Normal growth in usage

These reasons are explored in more detail in the following sections.

5.7.1.1. Excessive Usage by a User

Different people have different levels of neatness. Some people would be horrified to see a speck of dust on a table, while others would not think twice about having a collection of last year's pizza boxes stacked by the sofa. It is the same with storage:

  • Some people are very frugal in their storage usage and never leave any unneeded files hanging around.

  • Some people never seem to find the time to get rid of files that are no longer needed.

When a user is responsible for using large amounts of storage, it is often the second type of person who is found to be responsible.

5.7.1.1.1. Handling a User's Excessive Usage

This is one area in which a system administrator needs to summon all the diplomacy and social skills they can muster. Quite often discussions over disk space become emotional, as people feel that enforcement of disk usage restrictions makes their job more difficult (or impossible), that the restrictions are unreasonably small, or that they just do not have the time to clean up their files.

The best system administrators take many factors into account in such a situation. Are the restrictions equitable and reasonable for the type of work being done by this person? Does the person seem to be using their disk space appropriately? Can you help the person reduce their disk usage in some way (by creating a backup CD-ROM of all emails over one year old, for example)? Your job during the conversation is to discover whether the usage is, in fact, justified, while making sure that someone who has no real need for that much storage cleans up their act.

In any case, the thing to do is to keep the conversation on a professional, factual level. Try to address the user's issues in a polite manner ("I understand you are very busy, but everyone else in your department has the same responsibility to not waste storage, and their average utilization is less than half of yours.") while moving the conversation toward the matter at hand. Be sure to offer assistance if a lack of knowledge/experience seems to be the problem.

Approaching the situation in a sensitive but firm manner is often better than using your authority as system administrator to force a certain outcome. For example, you might find that sometimes a compromise between you and the user is necessary. This compromise can take one of three forms:

  • Provide temporary space

  • Make archival backups

  • Give up

You might find that the user can reduce their usage if they have some amount of temporary space that they can use without restriction. People who take advantage of this arrangement often find that it allows them to work without worrying about space until they get to a logical stopping point, at which time they can perform some housekeeping and determine which files in temporary storage are really needed.

Warning

If you offer this situation to a user, do not fall into the trap of allowing this temporary space to become permanent space. Make it very clear that the space being offered is temporary, and that no guarantees can be made as to data retention; no backups of any data in temporary space are ever made.

In fact, many administrators often underscore this fact by automatically deleting any files in temporary storage that are older than a certain age (a week, for example).
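
The automatic clean-up described above can be sketched in Python as follows. This is one possible approach under stated assumptions: the function names are hypothetical, file age is judged by modification time, and the one-week default simply mirrors the example in the text. The dry_run flag defaults to True so that nothing is deleted until the list of candidates has been reviewed.

```python
import os
import stat
import time


def find_stale_files(root, max_age_days=7, now=None):
    """Yield regular files under root not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                yield path


def purge_stale_files(root, max_age_days=7, dry_run=True):
    """Return the stale files; delete them only when dry_run is False."""
    stale = list(find_stale_files(root, max_age_days))
    if not dry_run:
        for path in stale:
            os.remove(path)
    return stale
```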

Other times, the user may have many files that are so obviously old that it is unlikely continuous access to them is needed. Make sure you determine that this is, in fact, the case. Sometimes individual users are responsible for maintaining an archive of old data; in these instances, you should make a point of assisting them in that task by providing multiple backups that are treated no differently from your data center's archival backups.

However, there are times when the data is of dubious value. In these instances you might find it best to offer to make a special backup for them. You then back up the old data, and give the user the backup media, explaining that they are responsible for its safekeeping, and if they ever need access to any of the data, to ask you (or your organization's operations staff — whatever is appropriate for your organization) to restore it.

There are a few things to keep in mind so that this does not backfire on you. First and foremost is to not include files that are likely to need restoring; do not select files that are too new. Next, make sure that you are able to perform a restoration if one ever is requested. This means that the backup media should be of a type that you are reasonably sure will be used in your data center for the foreseeable future.

Tip

Your choice of backup media should also take into consideration those technologies that can enable the user to handle data restoration themselves. For example, even though backing up several gigabytes onto CD-R media is more work than issuing a single command and spinning it off to a 20GB tape cartridge, consider that the user would then be able to access the data on CD-R whenever they want, without ever involving you.

5.7.1.2. Excessive Usage by an Application

Sometimes an application is responsible for excessive usage. The reasons for this can vary, but can include:

  • Enhancements in the application's functionality require more storage

  • An increase in the number of users using the application

  • The application fails to clean up after itself, leaving no-longer-needed temporary files on disk

  • The application is broken, and the bug is causing it to use more storage than it should

Your task is to determine which of the reasons from this list apply to your situation. Being aware of the status of the applications used in your data center should help you eliminate several of these reasons, as should your awareness of your users' processing habits. What remains to be done is often a bit of detective work into where the storage has gone. This should narrow down the field substantially.

At this point you must then take the appropriate steps, be it the addition of storage to support an increasingly-popular application, contacting the application's developers to discuss its file handling characteristics, or writing scripts to clean up after the application.
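
The detective work mentioned above usually starts with finding out which directories hold the most data. The following Python sketch totals file sizes beneath each immediate subdirectory of a starting point; the function names are hypothetical, and the figures are apparent file sizes rather than blocks actually allocated on disk, so they may differ somewhat from what an operating system's own usage tools report.

```python
import os


def tree_size(path):
    """Total apparent size, in bytes, of all files beneath path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def directory_sizes(root):
    """Map each immediate subdirectory of root to its total size."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes[entry.name] = tree_size(entry.path)
    return sizes


def largest(sizes, count=5):
    """Return the count largest (name, bytes) pairs, biggest first."""
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]
```

Repeating the same report one level down inside the largest directory is usually enough to locate the files an application has left behind.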

5.7.1.3. Normal Growth in Usage

Most organizations experience some level of growth over the long term. Because of this, it is normal to expect storage utilization to increase at a similar pace. In nearly all circumstances, ongoing monitoring can reveal the average rate of storage utilization at your organization; this rate can then be used to determine the time at which additional storage should be procured before your free space actually runs out.
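
As a rough illustration of how a growth rate translates into a procurement deadline, the Python sketch below fits a straight line through the first and last usage measurements. The function name and the sample figures are assumptions for illustration only; real usage rarely grows perfectly linearly, so the result should be treated as an estimate.

```python
def days_until_full(samples, capacity_bytes):
    """Estimate the days remaining before the storage fills up.

    samples is a list of (day_number, used_bytes) pairs in date order.
    Returns None when usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("samples must span at least one day")
    rate = (last_used - first_used) / (last_day - first_day)  # bytes per day
    if rate <= 0:
        return None
    return (capacity_bytes - last_used) / rate


GB = 10**9
# Hypothetical history: 40GB used on day 0, 55GB on day 30, 100GB disk.
remaining = days_until_full([(0, 40 * GB), (30, 55 * GB)], 100 * GB)
print(f"About {remaining:.0f} days of free space remain")
```

With these hypothetical figures, usage grows by 0.5GB per day and 45GB remain, so the estimate is 90 days. The time needed to purchase and install new storage should be subtracted from such an estimate.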

If you are in the position of unexpectedly running out of free space due to normal growth, you have not been doing your job.

However, sometimes large additional demands on your systems' storage can come up unexpectedly. Your organization may have merged with another, necessitating rapid changes in the IT infrastructure (and therefore, storage). A new high-priority project may have literally sprung up overnight. Changes to an existing application may have resulted in greatly increased storage needs.

No matter what the reason, there are times when you will be taken by surprise. To plan for these instances, try to configure your storage architecture for maximum flexibility. Keeping spare storage on-hand (if possible) can alleviate the impact of such unplanned events.

5.7.2. Disk Quota Issues

The first thing most people think of when they think about disk quotas is using them to force users to keep their directories clean. While there are sites where this may be the case, it also helps to look at the problem of disk space usage from another perspective. What about applications that, for one reason or another, consume too much disk space? It is not unheard of for applications to fail in ways that cause them to consume all available disk space. In these cases, disk quotas can help limit the damage caused by such errant applications, forcing them to stop before no free space is left on the disk.

The hardest part of implementing and managing disk quotas revolves around the limits themselves. What should they be? A simplistic approach would be to divide the disk space by the number of users and/or groups using it, and use the resulting number as the per-user quota. For example, if the system has a 100GB disk drive and 20 users, each user should be given a disk quota of no more than 5GB. That way, each user would be guaranteed 5GB (although the disk would be 100% full at that point).

For those operating systems that support it, temporary quotas could be set somewhat higher — say 7.5GB, with a permanent quota remaining at 5GB. This would have the benefit of allowing users to permanently consume no more than their percentage of the disk, but still permitting some flexibility when a user reaches (and exceeds) their limit. When using disk quotas in this manner, you are actually over-committing the available disk space. The temporary quota is 7.5GB. If all 20 users exceeded their permanent quota at the same time and attempted to approach their temporary quota, that 100GB disk would actually have to be 150GB to allow everyone to reach their temporary quota at the same time.

However, in practice not everyone exceeds their permanent quota at the same time, making some amount of overcommitment a reasonable approach. Of course, the selection of permanent and temporary quotas is up to the system administrator, as each site and user community is different.
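
The over-commitment arithmetic above can be checked with a few lines of Python. This sketch uses the figures from the example (a 100GB disk, 20 users, a 5GB permanent quota, and a 7.5GB temporary quota); the function name is hypothetical.

```python
def quota_commitment(disk_gb, users, permanent_gb, temporary_gb):
    """Summarize how much disk space a pair of quota settings commits."""
    permanent_total = users * permanent_gb
    temporary_total = users * temporary_gb
    return {
        "permanent_total_gb": permanent_total,
        "temporary_total_gb": temporary_total,
        # A ratio above 1.0 means the disk is over-committed.
        "overcommit_ratio": temporary_total / disk_gb,
    }


summary = quota_commitment(disk_gb=100, users=20,
                           permanent_gb=5, temporary_gb=7.5)
print(summary)
```

With these figures the permanent quotas commit exactly 100GB, while the temporary quotas commit 150GB, an over-commitment ratio of 1.5.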

5.7.3. File-Related Issues

System administrators often have to deal with file-related issues. The issues include:

  • File Access

  • File Sharing

5.7.3.1. File Access

Issues relating to file access typically revolve around one scenario — a user is not able to access a file they feel they should be able to access.

Often this is a case of user #1 wanting to give a copy of a file to user #2. In most organizations, the ability for one user to access another user's files is strictly curtailed, leading to this problem.

There are three approaches that could conceivably be taken:

  • User #1 makes the necessary changes to allow user #2 to access the file wherever it currently exists.

  • A file exchange area is created for such purposes; user #1 places a copy of the file there, which can then be copied by user #2.

  • User #1 uses email to give user #2 a copy of the file.

There is a problem with the first approach — depending on how access is granted, user #2 may have full access to all of user #1's files. Worse, it might have been done in such a way as to permit all users in your organization access to user #1's files. Still worse, this change may not be reversed after user #2 no longer requires access, leaving user #1's files permanently accessible by others. Unfortunately, when users are in charge of this type of situation, security is rarely their highest priority.

The second approach eliminates the problem of making all of user #1's files accessible to others. However, once the file is in the file exchange area the file is readable (and depending on the permissions, even writable) by all other users. This approach also raises the possibility of the file exchange area becoming filled with files, as users often forget to clean up after themselves.

The third approach, while seemingly an awkward solution, may actually be the preferable one in most cases. With the advent of industry-standard email attachment protocols and more intelligent email programs, sending all kinds of files via email is a mostly foolproof operation, requiring no system administrator involvement. Of course, there is the chance that a user will attempt to email a 1GB database file to all 150 people in the finance department, so some amount of user education (and possibly limitations on email attachment size) would be prudent.

Still, none of these approaches deals with the situation of two or more users needing ongoing access to a single file. In these cases, other methods are required.

5.7.3.2. File Sharing

When multiple users need to share a single copy of a file, allowing access by making changes to file permissions is not the best approach. It is far preferable to formalize the file's shared status. There are several reasons for this:

  • Files shared out of a user's directory are vulnerable to disappearing unexpectedly when the user either leaves the organization or does nothing more unusual than rearranging their files.

  • Maintaining shared access for more than one or two additional users becomes difficult, leading to the longer-term problem of unnecessary work required whenever the sharing users change responsibilities.

Therefore, the preferred approach is to:

  • Have the original user relinquish direct ownership of the file

  • Create a group that will own the file

  • Place the file in a shared directory that is owned by the group

  • Make all users needing access to the file part of the group

Of course, this approach would work equally well with multiple files as it would with single files, and can be used to implement shared storage for large, complex projects.
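
On systems with UNIX-style permissions, the shared directory step might look like the following Python sketch. It is an illustration under several assumptions: the function name is hypothetical, the group is assumed to exist already (creating groups and adding members is done with operating system tools not shown here), and the set-group-ID bit is assumed to make new files inherit the directory's group, which is the behavior on Linux but may differ elsewhere.

```python
import os
import shutil
import stat

# Owner and group get full access, others get none; the set-group-ID bit
# asks the system to give new files the directory's group (octal 2770).
SHARED_DIR_MODE = stat.S_ISGID | stat.S_IRWXU | stat.S_IRWXG


def make_shared_directory(path, group=None):
    """Create path as a group-shared directory and return its mode bits.

    group is the name of an existing group; changing group ownership
    usually requires administrative privileges.
    """
    os.makedirs(path, exist_ok=True)
    if group is not None:
        shutil.chown(path, group=group)
    os.chmod(path, SHARED_DIR_MODE)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Because access is controlled by group membership, users can later be added or removed without touching the permissions on the files themselves.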

5.7.4. Adding/Removing Storage

Because the need for additional disk space is never-ending, a system administrator often needs to add disk space, while sometimes also removing older, smaller drives. This section provides an overview of the basic process of adding and removing storage.

Note

On many operating systems, mass storage devices are named according to their physical connection to the system. Therefore, adding or removing mass storage devices can result in unexpected changes to device names. When adding or removing storage, always make sure you review (and update, if necessary) all device name references used by your operating system.

5.7.4.1. Adding Storage

The process of adding storage to a computer system is relatively straightforward. Here are the basic steps:

  1. Installing the hardware

  2. Partitioning

  3. Formatting the partition(s)

  4. Updating system configuration

  5. Modifying the backup schedule

The following sections look at each step in more detail.

5.7.4.1.1. Installing the Hardware

Before anything else can be done, the new disk drive has to be in place and accessible. While there are many different hardware configurations possible, the following sections go through the two most common situations — adding an ATA or SCSI disk drive. Even with other configurations, the basic steps outlined here still apply.

Tip

No matter what storage hardware you use, you should always consider the load a new disk drive adds to your computer's I/O subsystem. In general, you should try to spread the disk I/O load over all available channels/buses. From a performance standpoint, this is far better than putting all disk drives on one channel and leaving another one empty and idle.

5.7.4.1.1.1. Adding ATA Disk Drives

ATA disk drives are mostly used in desktop and lower-end server systems. Nearly all systems in these classes have built-in ATA controllers with multiple ATA channels — normally two or four.

Each channel can support two devices — one master, and one slave. The two devices are connected to the channel with a single cable. Therefore, the first step is to see which channels have available space for an additional disk drive. One of three situations is possible:

  • There is a channel with only one disk drive connected to it

  • There is a channel with no disk drive connected to it

  • There is no space available

The first situation is usually the easiest, as it is very likely that the cable already in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place only has two connectors (one for the channel and one for the already-installed disk drive), then it is necessary to replace the existing cable with a three-connector model.

Before installing the new disk drive, make sure that the two disk drives sharing the channel are appropriately configured (one as master and one as slave).

The second situation is a bit more difficult, if only for the reason that a cable must be procured so that it can connect a disk drive to the channel. The new disk drive may be configured as master or slave (although traditionally the first disk drive on a channel is normally configured as master).

In the third situation, there is no space left for an additional disk drive. You must then make a decision. Do you:

  • Acquire an ATA controller card, and install it

  • Replace one of the installed disk drives with the newer, larger one

Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer's bus slots, there must be an open slot for it, and it must be supported by your operating system.

Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:

  • Write the data to a backup device and restore it after installing the new disk drive

  • Use your network to copy the data to another system with sufficient free space, restoring the data after installing the new disk drive

  • Use the space physically occupied by a third disk drive by:

    1. Temporarily removing the third disk drive

    2. Temporarily installing the new disk drive in its place

    3. Copying the data to the new disk drive

    4. Removing the old disk drive

    5. Replacing it with the new disk drive

    6. Reinstalling the temporarily-removed third disk drive

  • Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer

As you can see, sometimes a bit of effort must be expended to get the data (and the new hardware) where it needs to go.

5.7.4.1.1.2. Adding SCSI Disk Drives

SCSI disk drives normally are used in higher-end workstations and server systems. Unlike ATA-based systems, SCSI systems may or may not have built-in SCSI controllers; some do, while others use a separate SCSI controller card.

The capabilities of SCSI controllers (whether built-in or not) also vary widely. A controller may supply a narrow or wide SCSI bus. The bus speed may be normal, fast, ultra, ultra2, or ultra160.

These terms were discussed briefly in Section 5.3.2.2 SCSI. Whether or not they are familiar to you, you must determine the capabilities of your hardware configuration and select an appropriate new disk drive. The best resource for this information would be the documentation for your system and/or SCSI adapter.

You must then determine how many SCSI buses are available on your system, and which ones have available space for a new disk drive. The number of devices supported by a SCSI bus varies according to the bus width:

  • Narrow (8-bit) SCSI bus — 7 devices (plus controller)

  • Wide (16-bit) SCSI bus — 15 devices (plus controller)

The first step is to see which buses have available space for an additional disk drive. One of three situations is possible:

  • There is a bus with fewer than the maximum number of disk drives connected to it

  • There is a bus with no disk drives connected to it

  • There is no space available on any bus

The first situation is usually the easiest, as it is likely that the cable in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place does not have an unused connector, it is necessary to replace the existing cable with one that has at least one more connector.

The second situation is a bit more difficult, if only for the reason that a cable must be procured so that it can connect a disk drive to the bus.

If there is no space left for an additional disk drive, you must make a decision. Do you:

  • Acquire and install a SCSI controller card

  • Replace one of the installed disk drives with the new, larger one

Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer's bus slots, there must be an open slot for it, and it must be supported by your operating system.

Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:

  • Write the data to a backup device, and restore it after installing the new disk drive

  • Use your network to copy the data to another system with sufficient free space, and restore after installing the new disk drive

  • Use the space physically occupied by a third disk drive by:

    1. Temporarily removing the third disk drive

    2. Temporarily installing the new disk drive in its place

    3. Copying the data to the new disk drive

    4. Removing the old disk drive

    5. Replacing it with the new disk drive

    6. Reinstalling the temporarily-removed third disk drive

  • Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer

Once you have an available connector in which to plug the new disk drive, you must make sure that the drive's SCSI ID is set appropriately. To do this, you must know what all of the other devices on the bus (including the controller) are using for their SCSI IDs. The easiest way to do this is to access the SCSI controller's BIOS. This is normally done by pressing a specific key sequence during the system's power-up sequence. You can then view the SCSI controller's configuration, along with the devices attached to all of its buses.
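
Once the IDs already in use are known, choosing an unused one is simple bookkeeping. The Python sketch below lists the free IDs on a bus; the function name is hypothetical, and the example assumes the controller sits at ID 7, which is a common convention but should be confirmed in the controller's configuration screen.

```python
def free_scsi_ids(used_ids, wide=False):
    """Return the unused SCSI IDs on a narrow (0-7) or wide (0-15) bus.

    used_ids must include the controller's own ID.
    """
    max_id = 15 if wide else 7
    used = set(used_ids)
    for scsi_id in used:
        if not 0 <= scsi_id <= max_id:
            raise ValueError(f"ID {scsi_id} is not valid on this bus")
    return [scsi_id for scsi_id in range(max_id + 1) if scsi_id not in used]


# Hypothetical narrow bus: controller at ID 7, disk drives at IDs 0 and 1.
print(free_scsi_ids({7, 0, 1}))
```

For the hypothetical narrow bus shown, IDs 2 through 6 remain available for the new disk drive.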

Next, you must consider proper bus termination. When adding a new disk drive, the rule is actually quite straightforward — if the new disk drive is the last (or only) device on the bus, it must have termination enabled. Otherwise, termination must be disabled.

At this point, you can move on to the next step in the process — partitioning your new disk drive.

5.7.4.1.2. Partitioning

Once the disk drive has been installed, it is time to create one or more partitions to make the space available to your operating system. Although the tools vary depending on the operating system, the basic steps are the same:

  1. Select the new disk drive

  2. View the disk drive's current partition table, to ensure that the disk drive to be partitioned is, in fact, the correct one

  3. Delete any unwanted partitions that may already be present on the new disk drive

  4. Create the new partition(s), being sure to specify the desired size and partition type

  5. Save your changes and exit the partitioning program

Warning

When partitioning a new disk drive, it is vital that you are sure the disk drive you are about to partition is the correct one. Otherwise, you may inadvertently partition a disk drive that is already in use, resulting in lost data.

Also make sure you have decided on the best partition size. Always give this matter serious thought, because changing it later is much more difficult than taking a bit of time now to think things through.

5.7.4.1.3. Formatting the Partition(s)

At this point, the new disk drive has one or more partitions that have been created. However, before the space contained within those partitions can be used, the partitions must first be formatted. By formatting, you are selecting a specific file system to be used within each partition. As such, this is a pivotal time in the life of this disk drive; the choices you make now cannot be changed later without going through a great deal of work.

The actual process of formatting is done by running a utility program; the steps involved in this vary according to the operating system. Once formatting is complete, the disk drive is properly configured for use.

Before continuing, it is always best to double-check your work by accessing the partition(s) and making sure everything is in order.
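
One simple way to double-check a newly formatted partition, once it is accessible as a directory, is to write a small file to it and read the file back. The following Python sketch does this; the function name is hypothetical, and the check only confirms basic read/write access, not the health of the entire disk drive.

```python
import os
import uuid


def verify_mount_point(mount_point):
    """Write a small probe file under mount_point and read it back.

    Returns True when the data read matches the data written. The probe
    file is removed afterwards.
    """
    payload = uuid.uuid4().hex.encode("ascii")
    probe = os.path.join(mount_point, f".probe-{uuid.uuid4().hex}")
    try:
        with open(probe, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data toward the device
        with open(probe, "rb") as handle:
            return handle.read() == payload
    finally:
        if os.path.exists(probe):
            os.remove(probe)
```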

5.7.4.1.4. Updating System Configuration

If your operating system requires any configuration changes to use the new storage you have added, now is the time to make the necessary changes.

At this point you can be relatively confident that the operating system is configured properly to automatically make the new storage accessible every time the system boots (although if you can afford a quick reboot, it would not hurt to do so — just to be sure).

The next section explores one of the most commonly-forgotten steps in the process of adding new storage.

5.7.4.1.5. Modifying the Backup Schedule

Assuming that the new storage is being used to hold data worthy of being preserved, this is the time to make the necessary changes to your backup procedures and ensure that the new storage will, in fact, be backed up. The exact nature of what you must do to make this happen depends on the way that backups are performed on your system. However, here are some points to keep in mind while making the necessary changes:

  • Consider what the optimal backup frequency should be

  • Determine what backup style would be most appropriate (full backups only, full with incrementals, full with differentials, etc.)

  • Consider the impact of the additional storage on your backup media usage, particularly as it starts to fill up

  • Judge whether the additional backup could cause the backups to take too long and start using time outside of your allotted backup window

  • Make sure that these changes are communicated to the people that need to know (other system administrators, operations personnel, etc.)

Once all this is done, your new storage is ready for use.
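
Whether the additional data still fits within the backup window can be estimated with simple arithmetic, sketched here in Python. The function names and all of the figures are assumptions for illustration; the sustained throughput of a real backup system should be measured rather than taken from a data sheet.

```python
def backup_hours(data_gb, throughput_mb_per_s):
    """Hours needed to back up data_gb at a sustained throughput."""
    seconds = (data_gb * 1000) / throughput_mb_per_s  # 1GB taken as 1000MB
    return seconds / 3600


def fits_window(existing_gb, added_gb, throughput_mb_per_s, window_hours):
    """Return (hours_needed, fits) for a full backup of all the data."""
    needed = backup_hours(existing_gb + added_gb, throughput_mb_per_s)
    return needed, needed <= window_hours


# Hypothetical site: 400GB already backed up, 200GB of new storage,
# 25MB/s sustained throughput, and an 8-hour backup window.
hours, ok = fits_window(400, 200, 25, 8)
print(f"Full backup needs {hours:.1f} hours; fits in window: {ok}")
```

The estimate assumes the new storage is full, which matches the advice above to consider its impact as it starts to fill up.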

5.7.4.2. Removing Storage

Removing disk space from a system is straightforward, with most of the steps being similar to the installation sequence (except, of course, in reverse):

  1. Move any data to be saved off the disk drive

  2. Modify the backup schedule so that the disk drive is no longer backed up

  3. Update the system configuration

  4. Erase the contents of the disk drive

  5. Remove the disk drive

As you can see, compared to the installation process, there are a few extra steps to take. These steps are discussed in the following sections.

5.7.4.2.1. Moving Data Off the Disk Drive

Should there be any data on the disk drive that must be saved, the first thing to do is to determine where the data should go. This decision depends mainly on what is going to be done with the data. For example, if the data is no longer going to be actively used, it should be archived, probably in the same manner as your system backups. This means that now is the time to consider appropriate retention periods for this final backup.

Tip

Keep in mind that, in addition to any data retention guidelines your organization may have, there may also be legal requirements for retaining data for a certain length of time. Therefore, make sure you consult with the department that had been responsible for the data while it was still in use; they should know the appropriate retention period.

On the other hand, if the data is still being used, then the data should reside on the system most appropriate for that usage. Of course, if this is the case, perhaps it would be easiest to move the data by reinstalling the disk drive on the new system. If you do this, you should make a full backup of the data before doing so — people have dropped disk drives full of valuable data (losing everything) while doing nothing more hazardous than walking across a data center.
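
When the data is moved by copying it rather than by relocating the disk drive, it is worth confirming that the copy is complete before the original is erased. The Python sketch below copies a directory tree and then compares SHA-256 checksums of every file; the function names are hypothetical, and the destination directory is assumed not to exist yet.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tree(source_dir, dest_dir):
    """Return relative paths that are missing or differ in dest_dir."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(dest_dir, rel)
            if not os.path.isfile(dst) or file_digest(src) != file_digest(dst):
                mismatches.append(rel)
    return mismatches


def copy_and_verify(source_dir, dest_dir):
    """Copy source_dir to a new dest_dir, then verify every file."""
    shutil.copytree(source_dir, dest_dir)
    return verify_tree(source_dir, dest_dir)
```

An empty list from copy_and_verify means every file in the source was found in the destination with identical contents.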

5.7.4.2.2. Erase the Contents of the Disk Drive

No matter whether the disk drive has valuable data or not, it is a good idea to always erase a disk drive's contents prior to reassigning or relinquishing control of it. While the obvious reason is to make sure that no sensitive information remains on the disk drive, it is also a good time to check the disk drive's health by performing a read-write test for bad blocks over the entire drive.
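
The idea of overwriting the contents and reading them back can be illustrated with the Python sketch below. It is deliberately limited: the function name is hypothetical, it operates on a regular file (such as a disk image) whose size the operating system can report, it makes a single pass with a fixed pattern, and the read-back may be served from memory rather than from the media. Real disk drives are normally erased and tested with tools supplied by the operating system or the drive vendor.

```python
import os


def overwrite_and_verify(path, pattern=b"\x00", chunk_size=1 << 20):
    """Overwrite the file at path with pattern and read it back.

    Returns the number of chunks that did not read back as written.
    This destroys the existing contents of path.
    """
    if not pattern or len(pattern) > chunk_size:
        raise ValueError("pattern must be non-empty and fit in one chunk")
    size = os.path.getsize(path)
    block = pattern * (chunk_size // len(pattern))

    # Pass 1: overwrite every byte of the existing contents.
    with open(path, "r+b") as handle:
        remaining = size
        while remaining > 0:
            count = min(remaining, len(block))
            handle.write(block[:count])
            remaining -= count
        handle.flush()
        os.fsync(handle.fileno())

    # Pass 2: read everything back and compare with what was written.
    bad_chunks = 0
    with open(path, "rb") as handle:
        remaining = size
        while remaining > 0:
            count = min(remaining, len(block))
            if handle.read(count) != block[:count]:
                bad_chunks += 1
            remaining -= count
    return bad_chunks
```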

Important

Many companies (and government agencies) have specific methods of erasing data from disk drives and other data storage media. You should always be sure you understand and abide by these requirements; in many cases there are legal ramifications if you fail to do so. The example above should in no way be considered the ultimate method of wiping a disk drive.

In addition, organizations that work with classified data may find that the final disposition of the disk drive may be subject to certain legally-mandated procedures (such as physical destruction of the drive). In these instances your organization's security department should be able to offer guidance in this matter.