Individual Data Management Strategies


One of the most basic and important aspects of using a computer is understanding where you store your files, and how you take care of your collection.  What do you save?  Why?  Who else might need it?  Who _requires_ it?  Does it need to be backed up?  In addition to knowing your options for storing files at Lehigh, you should also keep some basic strategies in mind.  Having a strategy is equally important, if subtly different, for faculty, students and staff.

1. Do it Yourself

No one is likely to have a clearer understanding than you do of what data you have, what data you need, how it's likely to be used, and how soon it's going to be needed.  In a large organization it's easy to get the idea that everything is handled by someone else, and that your every need will be seen to by a specially assigned functionary.  Unfortunately, your data is too specific to you to be handled practically or efficiently by someone else.  Others can provide tools to make your data management easier, more efficient and more beneficial, but the actual decisions and management are up to you.

2. Do it Regularly

In today's world more than ever, data builds up constantly, and at an ever-increasing rate.  It doesn't organize itself; someone has to take action.  For most scholars, access to information and creation of new information are their lifeblood and their livelihood.  Anyone who doesn't take time to put their information in order will end up inundated, with their work hobbled and possibly forgotten once they leave their post or organization.  As with cleaning a house, regular, frequent time spent reviewing stored data makes for easier work.

3. Do it with a Clear Understanding of your Data and your Needs

Key things to know about your data (and which get more important the more you have and the more people you share it with) include:

  • Where is your data?

It may seem simplistic, but the absolute first step to managing data is being aware of where and what it is.  Getting a comprehensive picture of your data might require some focused thought, legwork, and consultation with co-workers.  Is it in Word files?  Excel files?  Family photos you set as your desktop background?  Matlab scripts?  Search criteria?  Videos?  Spectra?  Lab notes?  Stored journal articles?  Are they on the hard drive of your assigned office PC?  The instrument interface machine in the lab?  The old PC that you put in the grad student office?  The network file server?  Google?  Course Site?  Other servers?  A $20 flash drive in the desk drawer?  That old external hard drive in the file cabinet?

For many, an actual written description of your data and its locations can be a handy thing to have.  For some grants, it's required.

Some software and devices that you might use don't make it entirely obvious where your data is stored (on your device, or in the cloud), or who else has access to that data and what it's used for.  Keep that in mind when considering information privacy requirements.

Further, when examining storage media like your computer's hard disk, it's important to distinguish the files you've created from those created automatically by the operating system and software, and then to decide which of them you really need to save.  Personal computer operating systems set up a folder structure automatically and, by default, segregate each user's files into profile folders (C:\Users\<username> on Windows; Macintosh HD/Users/<username> on macOS).  Additional volumes are left to the user to manage.  There's no substitute for familiarizing yourself with the file system on your own computer.
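As a starting point for a written description of your data, a short script can summarize what kinds of files live under a folder.  The following is a minimal sketch in Python, not an LTS-provided tool; the function name inventory_by_extension is hypothetical, and it only groups files by extension, which is a rough stand-in for "what kind of data is this?"

```python
import os
from collections import Counter


def inventory_by_extension(root):
    """Return (counts, sizes): number of files and total bytes per extension."""
    counts, sizes = Counter(), Counter()
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Files with no extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that cannot be read (permissions, broken links).
                continue
            counts[ext] += 1
            sizes[ext] += size
    return counts, sizes
```

Calling inventory_by_extension(os.path.expanduser("~")) would summarize your own profile folder; the same call can be pointed at an external drive or a mapped network drive.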

  • How much do you have?

Only when you have a complete picture can you make a full accounting.  Depending on what kind of data you have, and which of it is really yours to manage, its size can vary greatly.  Video files are large, audio files less so, and generally text files and spreadsheets much smaller.

While your operating system has some tools for finding the size of folders (right-click and Properties on Windows, Get Info on macOS; 'Disk Cleanup,' 'Manage Storage'), third-party tools such as WinDirStat (Windows Directory Statistics) and Disk Inventory X (macOS) can provide convenient, comprehensive visual representations of your local storage drives.  Google Drive also has a 'Storage' tab to show you the size of each file.
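If you prefer a scriptable alternative to the graphical tools, the same accounting can be sketched in a few lines of Python.  This is an illustrative example only; the function names folder_size and largest_subfolders are hypothetical, and the totals are plain sums of file sizes, which can differ slightly from the "size on disk" your operating system reports.

```python
import os


def folder_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for folder, _subdirs, files in os.walk(path):
        for name in files:
            full = os.path.join(folder, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Skip unreadable files rather than stopping the scan.
                continue
    return total


def largest_subfolders(root, top=10):
    """Return up to `top` (size_in_bytes, name) pairs, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((folder_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]
```

Running largest_subfolders on your profile folder gives a quick ranked list of where the space is going.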

  • How fast is it being created?  (How much more storage will you need, and how soon?)

The next important statistic is how fast you are creating files.  Is it twenty 2-GB videos every two months?  At that rate (about 20 GB per month), when will the hard drive fill up?
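The arithmetic behind that question is simple enough to sketch.  The example below is a rough estimate in Python that assumes a constant growth rate; the function names and the 500 GB free-space figure in the usage note are assumptions for illustration, not values from this article.

```python
import shutil


def free_space_gb(path):
    """Free space on the volume containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def months_until_full(free_gb, files_per_period, gb_per_file, months_per_period):
    """Estimate months until the free space is used up at a steady rate."""
    growth_per_month = files_per_period * gb_per_file / months_per_period
    if growth_per_month <= 0:
        raise ValueError("growth rate must be positive")
    return free_gb / growth_per_month
```

For twenty 2-GB videos every two months, months_until_full(500, 20, 2, 2) works out to 25 months on a drive with a hypothetical 500 GB free.  Passing free_space_gb("C:\\") or free_space_gb("/") as the first argument uses the actual free space instead.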

  • How much is in active use, and how much is in storage?  (How much can be relegated to low-speed or offline storage?)

Most people actively work on only a fraction of their total volume of retained data at any one time, even if they have a large staff of researchers.  The level of performance and availability of a storage system substantially affects its cost, so it pays to match each set of data to an appropriate tier.  DVD discs?  Amazon Glacier?  Ceph?  I-Drive?

Cloud storage systems offer intelligent synchronization tools that allow users to keep a portion of their files on local hard disks, while lesser-used files stay in the cloud.  Google Drive for desktop and Dropbox's Smart Sync feature are good examples.

Be mindful of temporary file storage areas like the 'Downloads' folder, and keep them clean -- move files worth saving to storage, delete software installers, etc.

Regularly moving data between these categories takes some amount of time.  Can you automate it?  If you delete something from your active storage, is it kept in your backup?
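One small piece of that routine that lends itself to automation is finding files that have not been touched in a long time.  The sketch below, in Python, only lists candidates and never moves or deletes anything; the function name files_untouched_since is hypothetical, and using the last-modified time as the measure of "active use" is an assumption that may not suit every kind of data.

```python
import os
import time


def files_untouched_since(root, days, now=None):
    """Return sorted paths under root whose last modification is older than `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    stale = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
            except OSError:
                # Skip files that disappear or cannot be read during the scan.
                continue
    return sorted(stale)
```

For example, files_untouched_since(os.path.expanduser("~/Downloads"), 90) would list everything in a Downloads folder that is more than about three months old, which you can then review by hand.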

  • Which data needs to be protected with additional backups?

For everyone at Lehigh, important data should be stored on network file servers.  LTS backs those up as a matter of course.  But -- space on those servers is a limited resource, and frequently, for various reasons, data must be stored on internal hard drives of personal or shared systems, or external systems (like cloud services).  How are they backed up?  Have you tried a recovery?  Can you do it yourself?  LTS has recommendations.
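One way to answer "Have you tried a recovery?" is to restore a folder to a scratch location and confirm that it matches the original.  The following Python sketch compares the two folders file by file using SHA-256 checksums.  It is an illustration, not an LTS procedure; the function names are hypothetical, and it assumes the original files have not changed since the backup was taken.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original, restored):
    """Return (missing, different): relative paths absent from or altered in restored."""
    missing, different = [], []
    for folder, _subdirs, files in os.walk(original):
        for name in files:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, original)
            dst = os.path.join(restored, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            elif sha256_of(src) != sha256_of(dst):
                different.append(rel)
    return sorted(missing), sorted(different)
```

If both returned lists are empty, every file in the original folder was recovered intact.  Files that exist only in the restored copy are not reported by this sketch.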

  • What are the most practical ways to break it into groups and smaller, useful pieces?

When data collections get large, managing and caring for them requires that they be broken down into more manageable chunks.  Just as in item 1, you're the best person to know the logic for that division.

Data management work for you or your team might warrant an intentional approach.

4. Prioritize Storage Areas by Security and Co-worker Accessibility

Being a good steward of the information you generate for the university requires consideration of how that data is stored relative to its security and its accessibility to the people who need it.  Storage options vary in those regards, and LTS recommends the following priorities:

  1. Approved Cloud Storage Vendors
    1. Dropbox
    2. Google
  2. LTS Servers / Contracted Services:  Banner, Course Site, Panopto, Ensemble, Argos, etc.
  3. LTS-Administered, Backed-Up LAN Storage
    1. H-Drive
    2. I-Drive
  4. LTS-Administered Volume Storage (R-drive, Ceph)
  5. Backup-Protected Desktop Drives
    1. Internal PC drives
    2. External drives
  6. Everything else


