by Valerie Enriquez
Ask a humanities major to envision research, and they will probably imagine long hours in a library or archive, perusing books and documents. Ask a scientist to visualize research, however, and they will likely picture collecting data out in the field.
Open Science was born from Newton’s idea that scientific advancement relies on “standing on the shoulders of giants.” The sharing of ideas encourages the advanced development of knowledge. However, in the world of publish or perish, the shadow cast upon the shoulder is doubt: fear that in sharing preliminary data, a researcher may be scooped, to borrow a journalism term. As a budding archivist, I find the idea of preserving knowledge for future use appealing and the fear of being scooped short-sighted when considering the long game. What if raw data from someone’s research could be the missing piece in finding the cure for cancer or, at the very least, in figuring out why all the bees have gone and what we need to do to bring them back? What if important datasets faded into obscurity without anybody ever knowing about them?
What can we, as librarians, do to help encourage more sharing of research data? Article citation rates help researchers by giving them a way to measure their impact upon the literature within their field. DataCite is an initiative to help bring this level of prestige to data publication. So, why not help encourage data sharing and citation through outreach and advocacy? For example, we could provide handouts or workshops about data research and the proper citation of reused data, which (as per Altman 2007) includes the following elements:
- Dataset Author
- Dataset Title
- Date the dataset was published/made public
- Unique Global Identifier (such as a DOI or Handle)
- Universal Numeric Fingerprint
- Bridge Service (such as the DOI resolver)
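
For a handout or workshop, the elements above could be sketched as a small record that assembles a citation string. This is only an illustration: the class name, field names, output layout, and all example values (including the DOI and the fingerprint) are hypothetical, and the layout is not claimed to match the exact format recommended by Altman.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed above."""
    author: str                      # dataset author
    title: str                       # dataset title
    year: int                        # year the dataset was made public
    identifier: str                  # unique global identifier, e.g. a DOI or Handle
    unf: Optional[str] = None        # Universal Numeric Fingerprint, if available
    bridge_service: str = "https://doi.org/"  # resolver that turns the identifier into a link

    def resolvable_link(self) -> str:
        # Prefix the identifier with the bridge service so readers can follow it.
        return self.bridge_service + self.identifier

    def format(self) -> str:
        # Assumed layout: Author (Year). Title. Link [Fingerprint]
        parts = [
            f"{self.author} ({self.year}).",
            f"{self.title}.",
            self.resolvable_link(),
        ]
        if self.unf:
            parts.append(self.unf)
        return " ".join(parts)


# All values below are invented for illustration only.
example = DatasetCitation(
    author="Doe, J.",
    title="Hypothetical Bee Colony Survey",
    year=2010,
    identifier="10.1234/example.5678",
    unf="UNF:5:hypothetical-fingerprint",
)
print(example.format())
```

Keeping the identifier separate from the bridge service means the citation can still be resolved if the resolver address changes, since only the prefix needs updating.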
There are many tools available to help researchers share data. For example, OpenWetware offers researchers a wiki-format lab notebook, where they can share their observations with each other and solicit feedback. Digital repositories help ensure that the data created through the hard work of researchers is preserved for future researchers to build upon. Examples include ORNL DAAC (Oak Ridge National Laboratory Distributed Active Archive Center) for biogeochemical dynamics, ecological data, and environmental processes; TreeBASE for phylogenetic information; GenBank for genetic sequences; and PANGAEA for geoscientific and environmental data.
Last summer, I participated in an internship with DataONE, where I attempted to find examples of articles citing data that had been created in prior studies. The experience was like trying to find a friend on Facebook if all I knew about them was their hair color and favorite breakfast cereal. At first, I felt like a failure. After all, what is an information scientist who cannot find the information she is seeking? However, this turned out to be an opportunity to prove the necessity of enforcing data citation standards and creating tools that track data reuse in the same way that we track article citation and journal impact factors.
What can we do? Ongoing evaluation is needed to determine the impact of data reuse and the need for citation standards. I am currently taking courses in evaluation and in digital preservation and curation to learn more about past efforts and see how they have been refined over time. My internship mentor from DataONE is going to coordinate a related project that she refers to as the “Tracking 1000 Datasets Project.” Along with staying on top of trends in data research, we must also drive the creation of standards and tools to best serve our user populations. It is time to stop thinking of research and raw data as merely a step towards the end product of publishing. If it is truly a “publish or perish” world, we need to advance the idea of publishing. Helping faculty and students find a place to deposit their initial data could be as much of an outreach and instruction opportunity as helping them find related articles or datasets.
It is little wonder that data librarianship is one of the fastest-growing fields in library science. It is up to us to grab such opportunities and stay up to date on the resources available to our users, or risk falling off the shoulders of giants.
Thus, we should lead by example through:
- evaluation of the existing literature and of our own practices
- collaboration with our users, other institutions, and our vendors
- instruction of our users and new librarians, along with our own continuing education.
As I like to think of it, we are all in the process of building: building upon our individual base of knowledge, the knowledge of those in the library science field, and the knowledge of those who require our services. If we do not build upon past information and lessons learned from prior mistakes, our structure will have no foundation and will fall. If we do not build in conjunction with our present users and the creators of tools, we risk having our great tower of learning fall to pieces, walling us in isolation and hindering communication. The past, present, and future of our profession are as inextricably connected as our relationships with researchers ought to be.
Valerie Enriquez is a Fellow with the Association of Research Libraries Career Enhancement Program pursuing an MLIS from Simmons College with a concentration in archives management. Her internships have included the Massachusetts Institute of Technology Center for Advanced Visual Studies, the Harvard Countway Center for the History of Medicine, and the DataONE Project. Her career goal is to use the past to contextualize the present and shape the future of how we seek and process information.