Summary: Digital Archiving: A FEDORA-based Infrastructure to Preserve
Electronic Journal Articles, Ronald Jantz, Rutgers University
Ronald Jantz is Data Librarian at Rutgers University Libraries.
He spends a good deal of time working on digital libraries, digital
repositories and digital preservation.
The Rutgers project protects original artifacts from handling or total loss
by using digital surrogates. The benefit is most apparent with non-textual
material.
Jantz gives the example of the system storing images of all views of
ancient Roman coins along with metadata related to these artifacts.
Researchers can access the items without handling them. If the originals
are destroyed, the digital objects still remain.
Jantz sets the scene for describing the current gaps in digital preservation
with an analogy: at the current rate of preservation of digital information,
our society is destroying the equivalent of 100 million books in a 12-year
period.
Jantz lays out the goal and objectives of the Rutgers project.
The first objective is perpetual access. The University is focusing on
developing what they call the core infrastructure that would support access and
preservation. This infrastructure must be sustainable. The storage is not
strictly an “institutional” repository, since some of the data stored does
not relate to Rutgers University.
The preservation
budget is small, so decisions must be made about what to preserve. In 2004 the
University installed a 7-Terabyte system and by the following year they were
ready to invest in more.
Jantz touched on the migration, or transfer, of visual material from one
media format to another. This is really the core of lifecycle management, but
it is a costly process. Because of limited funding, Rutgers emphasizes
preservation of born-digital or rare artifacts from its special collections.
Examples include e-journals and science data (e.g., GIS data from the Rutgers
Center for Remote Sensing).
Rutgers chose to work with FEDORA (http://www.fedora.info/), open source
software for managing and delivering digital content, because it copies
metadata automatically and keeps an audit trail of every transaction,
encapsulated in the object itself. Jantz notes that FEDORA and DSpace are
quite different. DSpace is an end product. FEDORA is a “digital library
operating system” with no associated applications. Applications may be
purchased or obtained as open source. For example, Amber Fish is the open
source search engine in operation on the e-journal platforms.
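The idea of an audit trail that travels inside the object can be illustrated with a minimal Python sketch. This is not FEDORA's API or data model; the `DigitalObject` and `AuditEntry` classes, their fields, and the `example:1` identifier are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """One recorded transaction against a digital object (hypothetical)."""
    timestamp: str
    action: str
    agent: str


@dataclass
class DigitalObject:
    """Hypothetical digital object that carries its own audit trail."""
    pid: str
    metadata: dict = field(default_factory=dict)
    audit_trail: list = field(default_factory=list)

    def _record(self, action: str, agent: str) -> None:
        # Each change appends a new entry; this sketch never edits or removes entries.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append(AuditEntry(stamp, action, agent))

    def set_metadata(self, key: str, value: str, agent: str) -> None:
        # Apply the change, then log it inside the same object.
        self.metadata[key] = value
        self._record(f"set metadata '{key}'", agent)


obj = DigitalObject(pid="example:1")
obj.set_metadata("title", "Sample article", agent="editor")
obj.set_metadata("format", "application/pdf", agent="ingest-script")
for entry in obj.audit_trail:
    print(entry.timestamp, entry.agent, entry.action)
```

Because the trail is a field of the object, copying or migrating the object carries its history along with it, which is the property the summary attributes to FEDORA.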
The Rutgers
Libraries are actively publishing open access journals in FEDORA through the
Open Journal System (OJS). Each new
journal platform is a clone of an instance of the OJS platform, customized and
then turned over to the editor. The
journal platform supports electronic peer review.
When authors submit manuscripts, keywords are captured, technical
metadata is generated automatically, persistent identifiers are computed
automatically, and signatures are verified.
Persistent identifiers may be seen as IP addresses for digital objects.
Digital signatures reside in the metadata and are also kept outside of
the repository. Digital signatures guard against unauthorized changes, since
any alteration to an article causes verification to fail. Article formats
offered are both PDF and DjVu.
Institutional e-journal publishing gives librarians a new role: actively
facilitating the publishing of journals.
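The submission steps described above (capturing keywords, generating technical metadata, computing a persistent identifier, and verifying a signature) can be sketched in Python. This is an illustrative sketch, not the Rutgers implementation: the function names, the content-derived identifier scheme, the `example` prefix, and the signing key are all assumptions. It uses an HMAC from the standard library as a stand-in for a digital signature; a production system would more likely use public-key signatures.

```python
import hashlib
import hmac
import mimetypes

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical signing key


def technical_metadata(filename: str, content: bytes) -> dict:
    """Derive basic technical metadata from the submitted file."""
    mime_type, _ = mimetypes.guess_type(filename)
    return {
        "filename": filename,
        "size_bytes": len(content),
        "mime_type": mime_type or "application/octet-stream",
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def persistent_id(content: bytes, prefix: str = "example") -> str:
    """Compute an identifier from the content (hypothetical scheme)."""
    return f"{prefix}:{hashlib.sha256(content).hexdigest()[:12]}"


def sign(content: bytes) -> str:
    """Produce a keyed digest that stands in for a digital signature."""
    return hmac.new(SECRET_KEY, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str) -> bool:
    """Return True only if the content still matches the signature."""
    return hmac.compare_digest(sign(content), signature)


def ingest(filename: str, content: bytes, keywords: list) -> tuple:
    """Build the repository record and a copy of the signature kept outside it."""
    signature = sign(content)
    record = {
        "pid": persistent_id(content),
        "keywords": list(keywords),
        "technical": technical_metadata(filename, content),
        "signature": signature,  # signature stored in the metadata
    }
    external_signatures = {record["pid"]: signature}  # copy kept outside the repository
    return record, external_signatures


manuscript = b"%PDF-1.4 sample manuscript bytes"
record, external = ingest("article.pdf", manuscript, ["preservation", "FEDORA"])
print(verify(manuscript, external[record["pid"]]))            # unchanged content
print(verify(manuscript + b"tampered", record["signature"]))  # altered content
```

The first check prints `True` and the second prints `False`: once the bytes change, the stored signature no longer verifies. Keeping a second copy of the signature outside the repository means a change to both the content and its in-repository metadata can still be detected.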
Jantz discussed the financial model for e-journals envisioned at Rutgers. The
University is currently absorbing the cost of launching the journal. The
long-term plan is to institute an annual fee that in effect is an escrow fund
that will help pay for any expected migration. Current staffing includes four
programming staff, along with graduate library school and computer science
students who work in partnership, part-time, earning wages and graduate
credits.