Summary: Digital Archiving: A FEDORA-based Infrastructure to Preserve Electronic Journal Articles, Ronald Jantz, Rutgers University

Ronald Jantz is Data Librarian at Rutgers University Libraries.  He spends a good deal of time working on digital libraries, digital repositories and digital preservation.

The Rutgers project protects original artifacts from handling or total loss by using digital surrogates.  This is most apparent with non-textual material.  Jantz gives the example of the system storing images of all views of ancient Roman coins along with metadata related to these artifacts.  Researchers can access the items without handling them. If the originals are destroyed, the digital objects still remain. 

Jantz sets the scene for the current gaps in digital preservation with an analogy: at the current rate of preservation of digital information, our society is destroying the equivalent of 100 million books in a 12-year period.  He then lays out the goal and objectives of the Rutgers project.  The first objective is perpetual access.  The University is focusing on developing what it calls the core infrastructure to support access and preservation.  This infrastructure must be sustainable.  The storage is not strictly an “institutional” repository, since some of the stored data does not relate to Rutgers University.

The preservation budget is small, so decisions must be made about what to preserve. In 2004 the University installed a 7-Terabyte system and by the following year they were ready to invest in more.

Jantz touched on migration, the transfer of visual material from one media format to another.  This is really the core of lifecycle management, but it is a costly process.  Because of limited funding, Rutgers emphasizes preservation of born-digital or rare artifacts from its special collections.  Examples include e-journals and science data (e.g., GIS data from the Rutgers Center for Remote Sensing).

Rutgers chose FEDORA (http://www.fedora.info/), open source software, to manage and deliver its digital content.  Two features drove the choice: FEDORA copies metadata automatically, and it keeps an audit trail of every transaction, encapsulated in the object itself.  Jantz notes that FEDORA and DSpace are quite different.  DSpace is an end product.  FEDORA is a “digital library operating system” with no associated applications.  Applications may be purchased or obtained as open source.  For example, Amberfish is the open source search engine that runs on the e-journal platforms.

The Rutgers Libraries are actively publishing open access journals in FEDORA through Open Journal Systems (OJS).  Each new journal platform is cloned from an instance of the OJS platform, customized, and then turned over to the editor.  The journal platform supports electronic peer review.  When authors submit manuscripts, keywords are captured, technical metadata is generated automatically, persistent identifiers are computed automatically, and signatures are verified.  Persistent identifiers may be seen as IP addresses for digital objects.  Digital signatures reside in the metadata and are also kept outside of the repository, so they can be used to detect unauthorized changes.  Articles are offered in both PDF and DjVu formats.  Institutional e-journal publishing affords new roles for librarians, who can actively facilitate the publishing of journals.
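
The idea of storing a signature in the metadata and keeping a second copy outside the repository can be sketched as follows.  This is a minimal illustration under stated assumptions, not the Rutgers or FEDORA implementation: it uses a plain SHA-256 digest as a stand-in for a true digital signature (which would also involve a signing key), and the function names, record layout and identifier are hypothetical.

```python
import hashlib
import hmac


def compute_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of an article's bytes."""
    return hashlib.sha256(content).hexdigest()


def make_record(identifier: str, content: bytes) -> dict:
    """Build a hypothetical metadata record holding the digest."""
    return {"identifier": identifier, "sha256": compute_digest(content)}


def is_unchanged(record: dict, content: bytes) -> bool:
    """Check the content against the digest stored in the record."""
    return hmac.compare_digest(record["sha256"], compute_digest(content))


# Hypothetical article bytes and identifier, for illustration only.
article = b"%PDF-1.4 example article bytes"
record = make_record("hypothetical-id:0001", article)

# A second copy of the record, standing in for the one kept outside the repository.
external_copy = dict(record)

print(is_unchanged(record, article))
print(is_unchanged(external_copy, article + b" tampered"))
```

The first check passes because the bytes match the stored digest; the second fails because the content was altered after the record was made.  Keeping the external copy means that a change to both the article and its in-repository metadata would still be detectable.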

Jantz discussed the financial model for e-journals envisioned at Rutgers.  The University is currently absorbing the cost of launching the journals.  The long-term plan is to institute an annual fee that is in effect an escrow fund to help pay for any expected migration.  Current staffing includes four programmers, along with graduate library school and computer science students who work in partnership, part-time, earning wages and graduate credits.