"Migrating the entire Arecibo data set, well over a petabyte in size, would take many months or even years if done inefficiently, but could take only weeks with proper hardware, software, and configurations," said Hans Addleman, Principal Network Systems Engineer for EPOC. The EPOC team contributed the infrastructure expertise and resources that helped Arecibo design its data transfer framework using the latest research tools. The CI CoE Pilot team is helping Arecibo evaluate data storage solutions and plan its future data management and stewardship so that Arecibo's data is easily accessible to the scientific community.
"Arecibo is an amazing project that has enabled astronomers, planetary scientists, and atmospheric scientists to collect and analyze extremely valuable scientific data over many decades," said Ewa Deelman, Research Director at the USC Information Sciences Institute and PI of the CI CoE Pilot project.
"The CI CoE Pilot project is very excited to be working with Arecibo, EPOC, TACC, and Globus members in this community effort, making sure the precious data is preserved and made easily findable, accessible, interoperable, and reusable (FAIR). Recently, we have also reached out to members of the International Virtual Observatory Alliance (IVOA), and in particular Bruce Berriman (Caltech/IPAC-NExScI, Vice-Chair of the IVOA Executive Committee), to explore the role of Arecibo's data in the international community. The collaboration formed around and with Arecibo shows how NSF-funded projects can come together, amplify each other's efforts, and have an impact on the international scientific community," Deelman added.
CI CoE Pilot contributes expertise in several areas spanning the Arecibo data lifecycle, including data archiving (Angela Murillo, Indiana University), identity management (Josh Drake, IU), semantic technologies (Chuck Vardeman, University of Notre Dame), visualization (Valerio Pascucci and Steve Petruzza, University of Utah), and workflow management (Mats Rynge and Karan Vahi, USC). The CI CoE Pilot effort is coordinated by Wendy Whitcup (USC).
Because Arecibo's own Internet connectivity is limited, the University of Puerto Rico and Engine-4, a non-profit coworking space and laboratory, are contributing to the data transfer process by letting Arecibo use their Internet infrastructure. In addition, because the data is irreplaceable, the transfer required a solution that would guarantee data integrity while maximizing transfer speed. This motivated the use of Globus, a research data management platform developed and operated by the University of Chicago.