ETANA:
Electronic Tools and Ancient Near Eastern Archives


ETANA Technology Plan

Draft: February 15, 2001

ETANA aims to be the definitive source on the Web for the field of Ancient Near East (ANE) Archaeology, and meeting that goal will require a significant technical infrastructure. This section describes the technical components that will support each of the major features proposed for ETANA and estimates their associated costs.

Assumptions

Technical Infrastructure at Vanderbilt University

The Heard Library at Vanderbilt University has invested heavily in technology. It provides a wide array of information services for its users and is actively involved in multiple digital library projects. The Heard Library relies on a variety of computer platforms in its technical infrastructure, and aims to achieve a high level of integration among these diverse platforms.

Novell NDS. The Heard Library has long used Novell NetWare to support file sharing, print services, and software application management for its network. The library currently operates a set of 10 NetWare servers, and plans to add at least one more within the next year. Novell Directory Services (NDS) provides the framework for the organization of network resources and provides a single point of management. The library also relies on NetWare for its Web servers. The Netscape Enterprise Server software integrated with NetWare has proven to be a very flexible and powerful Web server environment.

Large-scale storage. The library is developing a network storage system for its ever-expanding collection of digital objects. In addition to text-based resources, the library's current projects include streaming video (TV News Archive), streaming audio (the "Voices of Vanderbilt" collection), and images (Special Collections Photo Archives). To support these initiatives, the library is creating a scalable network storage environment that can expand to accommodate dozens of terabytes of information. This storage system is being built from rack-mounted Dell PowerEdge servers and high-capacity UltraSCSI storage units, and will operate under Novell NetWare 5.1. The environment supports access from multiple computing platforms (Unix through NFS, Windows NT, and Macintosh) as well as Web, streaming media, and FTP access.

Sun Solaris. For applications that require Unix, the version supported by the Heard Library is Sun Solaris. The library currently operates several major systems in this environment, including its Unicorn Library Management System from SIRSI Corporation, SiteSearch from OCLC, ERL/WebSPIRS from SilverPlatter Information, and ETD (Electronic Theses and Dissertations) from Virginia Tech. The library does not currently support Linux.

Windows NT. While the library does not use Windows NT for file and print services, it does use this platform as a database and applications server. Several library applications currently run on Windows NT servers, including a number of Web-enabled databases based on DB/TextWorks. The metadata and search engine components of the library's current digital library projects also use DB/TextWorks.

Campus Network. Vanderbilt University operates a campus network using ATM switching technologies for its backbone and 100 Mb/s Ethernet for local area networks. The University is in the process of migrating its core network from ATM to Gigabit Ethernet.

Internet Access. Vanderbilt University's campus network connects to the commercial Internet through a 25 Mb/s service provided by Sprint. As an Internet2 member, Vanderbilt also operates a 160 Mb/s connection to that network.

ETANA: Primary Web server

Functional description

ETANA will be a thoroughly Web-oriented set of services. A Web server dedicated to ETANA-related projects will be established. This server must be able to handle a high number of simultaneous users, be very reliable, be well integrated into the overall technology infrastructure, and be easily expandable. It will host the primary Web pages for ETANA and will also be available to host affiliated projects through its Scholar's Commons section.

Hardware and software components

The technical characteristics of this server might be as follows:

Description                                                  Estimated Cost
Server-class computer (rack mount)                           $10,500
    2-processor 700 MHz Pentium III CPU                      included
    256 MB system RAM                                        included
    50 GB RAID 5 disk storage                                included
    High-performance network card                            included
Novell NetWare 5.1 operating system (25-user)                $1,005
    Netscape Enterprise Server Web software                  included
High-performance network port and associated infrastructure  VU contribution
    Vanderbilt University will provide a 100 Mb/s switched Ethernet
    connection for this server. Access to this server will be provided
    through Vanderbilt's campus network and through its connections to
    the Internet and to Internet2.
Uninterruptible Power Supply (UPS)                           $1,000
    protects the server from power outages

As described above, the top-level Web server for ETANA is currently designated to operate on NetWare, as are other Web servers supported by the Heard Library at Vanderbilt. Using NetWare will make it easier for the library's technical staff to support the ETANA Web server. The use of NetWare rather than the traditional Apache-on-Unix approach will be completely transparent to ETANA's end users and to its content providers. Database transactions and other specialized applications will be handed off to other servers. This approach is tentative, however: there are also advantages to running this Web server under Unix with the open-source Apache Web server, and such a change would not have a major budget impact.

Technical development

This server will require 8-12 hours of a Network Administrator's time for initial installation. Tasks include the physical set-up of the hardware, installation of the network operating system, connection to the network, configuration of network interfaces, registration with DNS, installation of the Web server software, creation of related user accounts, and configuration of file and directory permissions. The Network Administrator will then verify that the server operates properly and securely.

[Cost: VU Contribution of Network Analyst]

Content development

A Web design team will create the pages that define the basic features of the site. This team will be responsible for a consistent look and feel across all components of the ETANA Web site. It will organize the basic structure of the site and create the graphics, style sheets, and templates used throughout the site. The team will include both experts in Web design and specialists in the site's major content areas. We expect this team to take two to four months to create the initial design of the ETANA Web site.

The ETANA Web site will require perpetual maintenance. One or more content editors will need to be involved to facilitate the addition of new material and features to the site. Technical staff will need to be available to maintain the hardware and software, to adjust file and directory permissions, and to assist with integrating new content and features into the server.

Scholar's Commons

The Scholar's Commons will be a feature of ETANA that gives scholars a Web-based arena for scholarly communication in multiple forms. This area of ETANA might include Web space to host pages for projects and institutes in the field, a place to manage collections of images, maps, and texts, and forums for message-based or interactive communication among scholars. The Scholar's Commons might, for example, include general and focused threaded discussion groups and could offer live interactive chat.
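A threaded discussion group can be pictured as a tree of messages in which each reply hangs under the post it answers. The sketch below is a hypothetical data model (class and field names are illustrative and are not taken from the ArsDigita software or any MIT portal) that stores replies under their parent and renders a thread with indentation.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One message in a threaded discussion group (hypothetical model)."""
    post_id: int
    author: str
    subject: str
    replies: list["Post"] = field(default_factory=list)

    def reply(self, post: "Post") -> "Post":
        """Attach a reply under this post and return it for chaining."""
        self.replies.append(post)
        return post


def render_thread(post: Post, depth: int = 0) -> list[str]:
    """Flatten a thread depth-first, indenting two spaces per reply level."""
    lines = [f"{'  ' * depth}{post.subject} ({post.author})"]
    for child in post.replies:
        lines.extend(render_thread(child, depth + 1))
    return lines


root = Post(1, "scholar_a", "Dating the Level III pottery")
r1 = root.reply(Post(2, "scholar_b", "Re: Dating the Level III pottery"))
r1.reply(Post(3, "scholar_a", "Re: Re: Dating the Level III pottery"))
root.reply(Post(4, "scholar_c", "Comparanda from a nearby site"))
print("\n".join(render_thread(root)))
```

A production forum would also need persistence, timestamps, and moderation, which this sketch leaves out.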

As conceived by ETANA, the Scholar's Commons shares many of the features offered by other e-community portals. Sites with similar features include CogNet and ArchNet, both developed at MIT. ETANA might enter a collaborative arrangement with MIT to benefit from its experience in developing and operating an electronic community of discipline-specific scholarship.

ETANA's Scholar's Commons might also include some forms of e-publishing. An e-print server could be established for scholars in the field to self-archive preprints of their articles. An ETD server could be created to collect the theses and dissertations related to ANE studies that are accepted at universities worldwide.

Costs. The general Web space for hosting ANE projects is included in the costs of the primary ETANA Web server. A generous amount of storage has been specified to accommodate the collections of project Web pages that might ultimately be hosted by ETANA. Implementing e-community features in the Scholar's Commons will involve design and development tasks. Open-source software for managing e-communities is available from ArsDigita. So that ETANA can benefit from MIT's experience in customizing this software into an academic, discipline-specific portal, MIT will license its source code to ETANA for a cost of $xxxxx. This source code would require significant customization and reworking to be applied to the ETANA Scholar's Commons.

The general infrastructure for ETANA and the ATK includes the ability to accommodate additional image collections that might be added to the site. As large collections of images are added, the organizations that own the images would assume the costs of incrementally expanding the network storage servers.

The software currently available for ETD and e-print servers operates under Unix. These two services could exist on the same server. An appropriately configured computer to run these two services would be about $10,000, assuming a configuration of Sun Solaris on a dual-processor Intel server. The ETD and E-Print software is available without cost. About five days of effort from a Unix systems administrator would be required to configure the hardware, install the operating system, and install and configure the software applications.

Abzu

Functional description

Abzu currently operates as a Web site that provides access to material on the Internet related to the field of ANE studies. This project, developed and supported by the Oriental Institute of the University of Chicago, is the most comprehensive guide to material in the discipline. It has a wide audience and is heavily used.

In its current form, the content of Abzu is managed entirely through HTML pages. We envision transforming Abzu into a database-driven application that becomes the metadata engine tying together not only the ANE resources that exist throughout the Internet but also those that are directly part of ETANA.

We estimate that Abzu currently describes more than 10,000 ANE Web resources, and receives about 30,000 user sessions per week.

Under ETANA, the content in Abzu would be converted from its current HTML form into database records. The database management system currently used at Vanderbilt is DB/TextWorks from Inmagic, Inc. Preliminary analysis indicates that DB/TextWorks will work well as the basis for Abzu.

The process of developing a new database-driven Abzu will involve establishing a database server appropriately scaled, designing a database structure, programming user query screens and record presentation pages, and implementing a search and retrieval interface. These interfaces would be created in Perl, making use of the SQL-based ODBC interface available in DB/TextWorks.
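The database structure and query step can be illustrated with a small relational sketch. It uses Python's built-in sqlite3 module purely as a stand-in for DB/TextWorks and its ODBC driver, and the table and column names are hypothetical. What it is meant to show is the shape of the work: a resource table, and a search that passes the user's term as a bound parameter rather than pasting it into the SQL string.

```python
import sqlite3

# Stand-in schema: sqlite3 replaces DB/TextWorks + ODBC for illustration only.
# Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE resource (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    url      TEXT NOT NULL UNIQUE,
    subject  TEXT,
    summary  TEXT
)
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the resource table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def search(conn: sqlite3.Connection, term: str) -> list[tuple[str, str]]:
    """Substring match on title, subject, or summary (LIKE is
    case-insensitive for ASCII in SQLite). The term is a bound parameter."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title, url FROM resource "
        "WHERE title LIKE ? OR subject LIKE ? OR summary LIKE ? "
        "ORDER BY title",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()


conn = open_db()
conn.executemany(
    "INSERT INTO resource (title, url, subject, summary) VALUES (?, ?, ?, ?)",
    [
        ("Cuneiform sign lists", "http://example.org/signs", "Assyriology", "Sign lists"),
        ("Egyptian papyri", "http://example.org/papyri", "Egyptology", "Papyrus images"),
    ],
)
print(search(conn, "cuneiform"))
```

The sample records and example.org URLs are placeholders, not Abzu entries.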

Once the database server has been appropriately configured, the data from the current Abzu site will be harvested and converted into database records and loaded into DB/TextWorks.

A Web-based interface will also be created for submitting new resources to the Abzu database. This interface will include a feature that allows all new postings to be reviewed by the designated Abzu editor before becoming part of the live Abzu database.
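The editorial review layer amounts to a status field that the public search respects. The sketch below assumes a simple three-state workflow (pending, live, rejected); the table, states, and function names are hypothetical, and sqlite3 again stands in for the production database.

```python
import sqlite3

# Hypothetical review workflow: submissions start as 'pending' and only
# appear in public results after the Abzu editor approves them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission ("
    " id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending'"
    " CHECK (status IN ('pending', 'live', 'rejected')))"
)


def submit(title: str, url: str) -> int:
    """Record a new resource suggestion; returns its id."""
    cur = conn.execute("INSERT INTO submission (title, url) VALUES (?, ?)", (title, url))
    return cur.lastrowid


def review(sub_id: int, approve: bool) -> None:
    """Editor decision; only records still pending can change state."""
    new_status = "live" if approve else "rejected"
    conn.execute(
        "UPDATE submission SET status = ? WHERE id = ? AND status = 'pending'",
        (new_status, sub_id),
    )


def live_titles() -> list[str]:
    """What the public Abzu search would see."""
    return [row[0] for row in conn.execute(
        "SELECT title FROM submission WHERE status = 'live' ORDER BY id")]


a = submit("Ur excavation reports", "http://example.org/ur")
b = submit("Unrelated advertisement", "http://example.org/ad")
print(live_titles())   # nothing is public yet
review(a, approve=True)
review(b, approve=False)
print(live_titles())
```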

Introductory and query pages related to Abzu will reside on the primary ETANA Web server. As users browse or search Abzu, their requests will be processed by the Abzu database server. The interactions between the ETANA Web server and the Abzu database server will be transparent to users.

Hardware and software components

The technical characteristics of this server might be as follows:

Description                                                  Estimated Cost
Server-class computer (rack mount)                           $12,800
    4-processor 700 MHz Pentium III CPU                      included
    256 MB system RAM                                        included
    50 GB RAID 5 disk storage                                included
    High-performance network card                            included
    Windows 2000 Advanced Server Edition (25-user)           included
    Microsoft Internet Information Server                    included
Inmagic DB/TextWorks                                         $1,100
Inmagic ODBC Driver for DB/TextWorks                         $5,280
ActivePerl for WinNT                                         $0
Win32::ODBC Perl module                                      $0
High-performance network port and associated infrastructure  VU contribution
    Vanderbilt University will provide a 100 Mb/s switched Ethernet
    connection for this server. Access to this server will be provided
    through Vanderbilt's campus network and through its connections to
    the Internet and to Internet2.
Uninterruptible Power Supply (UPS)                           $1,000
    protects the server from power outages
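As a cross-check on the two server tables, the separately priced line items can be totalled with a few lines of Python. This is a minimal sketch that assumes rows marked "included" or "VU contribution" add no separate cost and that each server's UPS is budgeted on its own; the dictionary keys are only labels.

```python
# Separately priced items from the primary Web server table.
PRIMARY_WEB_SERVER = {
    "Server-class computer (rack mount)": 10_500,
    "Novell NetWare 5.1 (25-user)": 1_005,
    "Uninterruptible power supply": 1_000,
}

# Separately priced items from the Abzu database server table.
ABZU_DATABASE_SERVER = {
    "Server-class computer (rack mount)": 12_800,
    "Inmagic DB/TextWorks": 1_100,
    "Inmagic ODBC Driver for DB/TextWorks": 5_280,
    "ActivePerl for WinNT": 0,
    "Win32::ODBC Perl module": 0,
    "Uninterruptible power supply": 1_000,
}


def subtotal(items: dict[str, int]) -> int:
    """Sum the estimated cost of every line item in one server table."""
    return sum(items.values())


web = subtotal(PRIMARY_WEB_SERVER)
abzu = subtotal(ABZU_DATABASE_SERVER)
print(f"Primary Web server:   ${web:,}")
print(f"Abzu database server: ${abzu:,}")
print(f"Combined:             ${web + abzu:,}")
```

Under these assumptions the primary Web server comes to $12,505 and the Abzu database server to $20,180.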

Technical development

This server will require 8-12 hours of a Network Administrator's time for initial installation. Tasks include the physical set-up of the hardware, installation of the network operating system, connection to the network, configuration of network interfaces, registration with DNS, installation of the Web server software, creation of related user accounts, and configuration of file and directory permissions. The Network Administrator will then verify that the server operates properly and securely.

Preparing the server to function as a database server is a two-hour process that includes installing ActivePerl and the Win32::ODBC Perl module. The DB/TextWorks software and the DB/TextWorks ODBC driver will also be installed.

The more complex work involves the design of the database structure and the creation of the query forms, the pages that display results, and the programs that control general search and retrieval functions. If this work is modeled on similar projects that have been developed at Vanderbilt, we expect that this portion of the work will involve about 5 days of programming time.

An additional 2-5 days of programming time will be necessary to create the layer that allows new submissions to be screened before they are entered into the live database.

A considerable amount of data currently exists for Abzu. It would be desirable to create a set of scripts and programs that can harvest this information from the current HTML pages and convert it into database records. Writing these programs and converting the existing data would take about 10 days of programming time.
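The harvesting step might start from something like the following sketch, which uses Python's standard html.parser module to pull each link and its anchor text out of a page. It assumes each Abzu entry is an <a href> element; the real pages would almost certainly need additional rules for annotations, headings, and dates, and the sample HTML is invented.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkHarvester(HTMLParser):
    """Collect {url, title} records from an Abzu-style HTML page.
    Assumes each resource is an <a href=...> element (hypothetical rule)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[dict[str, str]] = []
        self._href: Optional[str] = None   # href of the <a> we are inside
        self._text: list[str] = []         # anchor text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:                       # skip <a name=...> anchors
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            self.records.append({"url": self._href, "title": title})
            self._href = None


page = """<ul>
<li><a href="http://example.org/ebla">Ebla   tablets</a> - archive project</li>
<li><a name="top">anchor without href</a></li>
</ul>"""
harvester = LinkHarvester()
harvester.feed(page)
print(harvester.records)
```

Each harvested record could then be inserted into the new database and queued for editorial checking.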

Content development

Once the new Abzu site has been created, one or more editors will need to manage its content. Abzu is currently managed by the Oriental Institute at the University of Chicago. This project assumes no change in this editorial arrangement.

The Oriental Institute will contribute editorial staff time to the new version of Abzu at about the same levels as are currently in place.

Archeologist's Tool Kit

Functional description

The Archeologist's Tool Kit (ATK) will be a new software application developed under the direction of ETANA.

The primary goals of this application will be to provide software tools that will facilitate the management of data related to archeological excavations in the field of Near East Archeology, to provide tools for the analysis of archaeological data, and to expedite the process of disseminating findings. The ATK will be designed to be interoperable with other key resources in the field. Its metadata could be incorporated into other archaeological information services through the protocols established by the Open Archives Initiative.
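To illustrate what exposing ATK metadata through the Open Archives Initiative could involve, the sketch below serializes one item as an unqualified Dublin Core (oai_dc) record, the metadata format every OAI-PMH repository is required to support. The item fields and identifier scheme are hypothetical, and a real OAI-PMH response would wrap this record in a GetRecord or ListRecords envelope and normally add an xsi:schemaLocation attribute.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(item: dict[str, str]) -> str:
    """Serialize one ATK item as an oai_dc record, skipping empty fields."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element in ("identifier", "title", "type", "coverage", "date"):
        value = item.get(element)
        if value:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical ATK item; the identifier scheme is illustrative only.
record = to_oai_dc({
    "identifier": "atk:site-12:locus-104:obj-0007",
    "title": "Painted bowl rim sherd",
    "type": "PhysicalObject",
    "coverage": "Grid square B4",
})
print(record)
```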

The ATK will ultimately manage a very large amount of data and will have a moderate number of simultaneous users. The technical infrastructure will be scaled accordingly. The specific hardware and software platforms have not been determined in advance. These decisions will be made as the overall design, features, and expected use patterns are developed.

Some of the features of the ATK will include:

Data Collection Interface. An environment that can be used on-site at archaeological digs to record all the pertinent information about all objects discovered.

This environment must be portable, rugged, and efficient. We anticipate that this portion of the ATK would operate on laptop computers or hand-held Personal Digital Assistants.

The application must be able to describe a wide variety of objects and artifacts, such as rocks, bones, pottery, coins, and carvings.

The system must have the ability to record a large number of data elements regarding each item. Each site uses a complex matrix of locus numbers, grid coordinates, and geospatial data to describe the horizontal and vertical location where each item was discovered. A number of other data elements describe the condition of the object, the type of soil surrounding it, and the like.
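The data elements described above map naturally onto a record type. The sketch below is one hypothetical field set (names and units are assumptions, not an agreed ANE standard; defining that standard is the job of the meetings described later in this section). It rejects records missing the keys that tie a find to its excavation context.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class FindRecord:
    """One object recorded at a dig site (hypothetical field set)."""
    site: str
    locus: str            # site-specific locus number
    grid_square: str      # excavation grid reference, e.g. "B4"
    easting_m: float      # horizontal position within the site grid
    northing_m: float
    elevation_m: float    # vertical position (level)
    object_type: str      # rock, bone, pottery, coin, carving, ...
    condition: str
    soil: str             # description of the surrounding soil
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject records missing the keys that tie a find to its context.
        for name in ("site", "locus", "grid_square", "object_type"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")


find = FindRecord(
    site="Site 12", locus="104", grid_square="B4",
    easting_m=12.35, northing_m=7.80, elevation_m=-1.42,
    object_type="pottery", condition="rim sherd, chipped", soil="ashy loam",
)
print(asdict(find))
```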

Data Transmission System. Data collected in the field will need to be transmitted to the central servers on a regular basis, either in real time or at least through daily uploads. The means of delivering the data may vary with the infrastructure available at each dig site. In some cases Internet connections may be readily available, but given the nature of the discipline, many sites may be far removed from direct Internet access. The ATK may therefore need the ability to use satellite-based communications to transfer information from the data collection clients to the central server. Another alternative would be to allow sites to install local servers for data collection during the dig season, with a means of transmitting the data in bulk at regular intervals during the season and at its end.
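Whatever the link (Internet, satellite, or a local server uploading in bulk), each transfer should let the central server detect a damaged or incomplete batch. The sketch below is a hypothetical bundle format, not a chosen protocol: a day's records are packaged as JSON with a record count and a SHA-256 digest of the records, which the server checks before loading.

```python
import hashlib
import json


def make_bundle(site: str, day: str, records: list[dict]) -> bytes:
    """Package one day's records for bulk upload (hypothetical format).
    The digest covers the records so the server can detect corruption."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    envelope = {
        "site": site,
        "day": day,
        "count": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "records": records,
    }
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def accept_bundle(raw: bytes) -> list[dict]:
    """Server side: verify checksum and count before loading.
    A truncated transfer fails json.loads, which also raises ValueError."""
    envelope = json.loads(raw)
    payload = json.dumps(envelope["records"], sort_keys=True).encode("utf-8")
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        raise ValueError("checksum mismatch; request retransmission")
    if len(envelope["records"]) != envelope["count"]:
        raise ValueError("record count mismatch")
    return envelope["records"]


records = [
    {"locus": "104", "object_type": "pottery", "elevation_m": -1.42},
    {"locus": "104", "object_type": "bone", "elevation_m": -1.47},
]
raw = make_bundle("Site 12", "2001-06-14", records)
print(len(accept_bundle(raw)), "records accepted")
```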

Digital Imaging. Each object may also have one or more digital photographs associated with it. A major component of the ATK will be a digital imaging system to manage these photographs. The images themselves will be housed on a large-scale storage system and accessed through metadata in the ATK. The digital imaging component will rely on incremental expansions of the storage systems currently being created at Vanderbilt University for other multimedia digital library projects.

Data Repository. Once the data are collected at the dig site, they will be transferred to one or more database servers. These database servers will have the ability to accommodate data from a large number of archeological dig sites.

One of the major challenges in the construction of the Archeologist's Tool Kit involves the development of data structures that are versatile and robust enough to accommodate the practices of all archaeologists in ANE. In the current environment, there are significant differences in the way that each researcher in the field collects, describes, organizes, and interprets their findings.

The success of the ATK will depend on both a technical design versatile enough to accommodate differing practices and the development of standards by which archaeologists can describe their findings consistently. ETANA will sponsor a set of meetings, conducted both in person and virtually, to define the standards that will underlie the ATK. This group will also be asked to participate in developing the functional requirements of the ATK software. The group assembled to develop these standards will include archaeologists, database design specialists, interface design specialists, and a trained facilitator.

Analytical Tools. The ATK must not only provide raw storage for the data related to each dig but also provide the means to summarize, analyze, and process this information. Based on specific needs articulated by the archeologists, ETANA will develop programs that facilitate the analysis of the archaeological data. Researchers should be able to use computer analysis to study their own site, and scholars should be able to combine data from multiple sites to discern patterns that would not be visible when each site's data are managed separately.
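As a small illustration of cross-site analysis, the sketch below pools finds from separately managed sites and counts object types at one stratigraphic level. It assumes the sites already share a level designation and an object-type vocabulary, which is exactly what the standards work described above is meant to provide; the data and field names are invented.

```python
from collections import Counter


def type_distribution(finds: list[dict], level: str) -> Counter:
    """Count object types at one stratigraphic level, pooled across sites."""
    return Counter(f["object_type"] for f in finds if f["level"] == level)


# Hypothetical finds from two separately managed sites.
finds = [
    {"site": "A", "level": "III", "object_type": "pottery"},
    {"site": "A", "level": "III", "object_type": "bone"},
    {"site": "B", "level": "III", "object_type": "pottery"},
    {"site": "B", "level": "II", "object_type": "coin"},
]
print(type_distribution(finds, "III").most_common())
```

Neither site alone shows pottery as the dominant Level III type; only the pooled count does, which is the kind of pattern the paragraph above has in mind.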

Data Publishing components. One of the primary goals of the ATK involves collapsing the time interval that exists between when data are collected at a dig site and when that information is published to the scholarly community.

A key component of the ATK will be an interface that provides access to this repository of archaeological data to the scholarly community and to the general public.

But this access will be controlled. Each archaeologist will have the ability to suppress public display of their data until it has been completed and verified. The ATK will be built to include authentication, authorization, and rights management features that can be tuned to accommodate any of a number of decisions that might be made regarding the access of its content.
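One simple policy that fits this description is sketched below, purely as an assumption about how the rules might be tuned: members of a site's own team see all of that site's records, while everyone else sees only records that the archaeologist has both verified and released. The Record fields and the visible_records function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    site: str
    verified: bool      # data checked by the excavating archaeologist
    public: bool        # archaeologist's release decision


def visible_records(records: list[Record], user_sites: set[str]) -> list[str]:
    """Return the IDs a user may see under this hypothetical policy:
    a site's own team sees everything from that site; others see only
    records that are both verified and released."""
    return [
        r.record_id
        for r in records
        if r.site in user_sites or (r.verified and r.public)
    ]


records = [
    Record("A-1", "A", verified=True, public=True),
    Record("A-2", "A", verified=False, public=False),
    Record("B-1", "B", verified=True, public=False),
]
print(visible_records(records, set()))    # general public
print(visible_records(records, {"A"}))    # member of the site A team
```

Keeping the policy in one function makes it easy to change the rules later without touching the stored data.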

Other efforts have been identified with similar goals to the ATK. As part of the design of the ATK, each of these other projects will be studied for areas of overlap, and opportunities for partnership and collaboration.

Technical Development

The creation of the ATK will be a large project with many components. Although we anticipate some of the general characteristics of the proposed ATK, there are many features and design elements that will need to be determined as the project goes forward.

Project Staffing. A project manager will be assigned to coordinate the technical development process. These coordination activities will total about 8-10 hours per week for the duration of the development cycle. The cost of this position will be contributed by Vanderbilt University.

A programmer will be hired in a one-year term position to write the programs that underlie the ATK. This position will be funded by the Grant for 1.5 years at an expected annual salary of $50,000.

Throughout the development process, archaeologists in ANE studies will be asked to participate in the design, feature specification, and in the testing of the application.

Some development tasks will be contracted out.

Early Core Texts

The technology costs related to the Early Core Texts digitization process will include network storage, access software, and associated support. The storage for these core texts will be provided by the same network storage systems that support the Digital Imaging components of the ATK. If we estimate a storage requirement of 1.5 MB per page and 300 pages per volume, an average volume will require 450 MB of disk storage. At a cost of $40 per GB for online magnetic storage, the required storage will add about $18 to the $540 per-volume cost estimated for the digitization process. We will also need to obtain presentation software that ETANA users will use to view these materials. We anticipate being able to use software from other digital library projects, such as that created for the Making of America collections.
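The per-volume storage arithmetic can be checked directly. This sketch assumes decimal units (1 GB = 1,000 MB); with binary units (1 GB = 1,024 MB) the storage cost comes to about $17.58, so the figure is roughly $18 either way.

```python
# Per-volume storage cost for the Early Core Texts, using the figures above.
# Assumes decimal units: 1 GB = 1,000 MB.
MB_PER_PAGE = 1.5
PAGES_PER_VOLUME = 300
DOLLARS_PER_GB = 40
DIGITIZATION_COST = 540          # per volume, from the digitization estimate

storage_mb = MB_PER_PAGE * PAGES_PER_VOLUME               # 450 MB
storage_cost = storage_mb * DOLLARS_PER_GB / 1000         # 0.45 GB x $40
total = DIGITIZATION_COST + storage_cost

print(f"{storage_mb:.0f} MB, ${storage_cost:.2f} storage, ${total:.2f} per volume")
```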

High definition digital images

The InscriptiFact Project sponsored by the Library at the University of Southern California is in the process of creating high resolution digital images of a large collection of tablets, papyri, and other ancient written artifacts. To make selected images from this digital collection available to the general audience of ANE scholars, ETANA will be linked to InscriptiFact. Mechanisms will be implemented within ETANA content so that references to items held within the InscriptiFact collections can be easily viewed through hyperlinks.

Creating the links between ETANA and InscriptiFact will be somewhat complex, because InscriptiFact includes multiple images of each artifact. A reference to a tablet may be well served by pointing to the most general view or the best-quality image available, or the text may refer to one specific image.

The most practical solution for this feature would be a linking mechanism based on OpenURL. References in ETANA would be coded as URLs that embed enough metadata to identify the single InscriptiFact image to be presented.

Each OpenURL would be sent to a link resolver server, which would in turn query the InscriptiFact database server to retrieve the desired image based on the metadata carried in the OpenURL.

ETANA users will also have the option of connecting to InscriptiFact directly to take advantage of the complete features of its advanced interface.
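The sketch below shows the general shape of such a link. The referring source is named in sid, and the data needed to pick one image travel in pid, the OpenURL 0.1 field reserved for private, locally agreed data. The resolver address and the artifact/view convention inside pid are hypothetical and would have to be agreed with the InscriptiFact staff; the second function shows the first thing a resolver would do with the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical link-resolver address; not a real service.
RESOLVER = "http://resolver.example.edu/openurl"


def inscriptifact_link(artifact: str, view: str = "best") -> str:
    """Build an OpenURL-style link for one InscriptiFact image.
    The artifact/view keys inside 'pid' are an assumed convention."""
    query = {
        "sid": "ETANA:text",
        "pid": f"artifact={artifact};view={view}",
    }
    return f"{RESOLVER}?{urlencode(query)}"


def parse_link(url: str) -> dict[str, str]:
    """Resolver side: recover the source id and private metadata."""
    params = parse_qs(urlsplit(url).query)
    private = dict(part.split("=", 1) for part in params["pid"][0].split(";"))
    return {"sid": params["sid"][0], **private}


link = inscriptifact_link("tablet-0042", view="obverse")
print(link)
print(parse_link(link))
```

Defaulting the view to "best" mirrors the case where a text simply cites a tablet rather than one particular image.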

The image linking mechanisms will be developed by the InscriptiFact staff at the University of Southern California. They estimate a cost of ??? for this component of the project.

Technology Infrastructure Upgrades

The set of projects subsumed under ETANA will involve the need for some general technology infrastructure upgrades at Vanderbilt University, the primary host site. These upgrades include additional power protection equipment and additional data backup devices.

The Heard Library currently relies on a tape backup system to ensure the safety of all the data on its servers. The system is based on 8mm tapes, using tape drives from Contemporary Cybernetics and the Arcserve Enterprise Edition software from the Cheyenne division of Computer Associates. As the volume of data that must be backed up each day grows, additional hardware is needed. Three tape drives are currently available for the backup system; we propose to add another tape drive with a robotic tape changer.

Archival Preservation

As the facets of ETANA create content, it is our obligation to guarantee that this content be permanently preserved. In order to ensure that these digital materials continue to be accessible into the future, data security and digital archival preservation will be well integrated into the ETANA projects.

While Vanderbilt University will operate the online systems that provide access to ETANA content, Case Western Reserve University (CWRU) will maintain digital archives of these collections. Vanderbilt will follow standard data center backup procedures to ensure that the active copy can be restored in the event of any hardware failure, software fault, or physical catastrophe. Data will also be transferred regularly to CWRU for permanent archiving. These data will be stored in formats that conform to current standards and on the current generation of optical storage media. As data standards and media formats evolve, CWRU will migrate all ETANA content through these generational changes.
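One common way to confirm that each transfer to CWRU, and each later media migration, has preserved the data unchanged is a checksum manifest. The sketch below is an assumption about how this could be done, not an agreed procedure: Vanderbilt would generate a SHA-256 manifest before sending data, and CWRU would re-run the check after receipt and after each migration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are now missing or whose contents changed."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

An empty list from verify means every file listed in the manifest arrived intact.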

Costs. CWRU will obtain sufficient digital storage capacity to hold a copy of all ETANA data. We propose that CWRU maintain one copy on online magnetic storage at all times and also create copies on DVD-ROM or CD-ROM at regular intervals. The amount of online storage needed at the outset of the project will be minimal but will grow gradually as ETANA becomes a production service. We project that ETANA's storage could grow to as much as one terabyte within three years. Expanding the existing storage systems at CWRU to accommodate this volume would cost about $30,000 today, but actual costs will likely fall by the time this capacity is needed. A recordable DVD-ROM drive for writing the data onto optical discs currently costs about $5,500.

Born Digital Electronic Publications

ETANA intends to create an electronic publishing infrastructure for existing journals of the professional societies in the discipline and new publications. This effort, however, will be deferred for at least one year. The development tasks already proposed are aggressive, and we feel that in the next year there may be facilities emerging to accomplish this goal more effectively than those available today. We are interested, for example, in the DSpace (http://web.mit.edu/dspace/) digital library environment being developed at the Massachusetts Institute of Technology in partnership with Hewlett-Packard.

We anticipate developing a future proposal for the electronic publishing component of ETANA.