In the December issue of Communications of the ACM, Francine Berman calls for a “Research Data Census” to inform reliable, cost-effective storage and preservation of research data on a national scale. Berman says: “Just as the U.S. Census drives planning for infrastructure in the physical world, a Research Data Census would inform cost-effective planning for stewardship of federally funded, shared cyberinfrastructure in the Digital World.”
Berman sees the following benefits derived from the data census:
- Useful estimates of the storage capacity required for data stewardship, and a lower bound on data that must be preserved for future timeframes. Data required by regulation or policy to be preserved is a lower bound on valued preservation-worthy research data—additional data sets will need to be preserved for research progress (for example, National Virtual Observatory data sets).
- The types of data services most important for research efforts. Knowing the most common types of useful services and tools can help drive academic and commercial efforts.
- Estimates of the size, training, and skill sets that will be needed for today’s and tomorrow’s data work force.
Putting together a collective memory for the research community will surely result in many other benefits. Three come to mind immediately: 1) reducing unnecessary duplication in research efforts; 2) facilitating the linking (accessing, aggregating, and reusing) of data sets to enhance understanding of common research questions; and 3) building the foundation for interdisciplinary research through the standardization of metadata.