OVERVIEW OF CURRENT DIGITIZATION PRACTICES

A WHITE PAPER FOR THE ENVIRONMENTAL PROTECTION AGENCY

Amy R. Berman

Donald F. Egan

Alan S. Linden

JULY 2007

Overview of Current Digitization Practices 1

INTRODUCTION

This paper reviews digitization practices in use today, and emphasizes factors that may be applicable to library digitization efforts at the Environmental Protection Agency (EPA). The Stratus team identified these practices through market research of digital library practices used by various government agencies and universities and through reviews of our collective experience with comparable efforts in both the public and private sectors. This paper includes recommendations regarding technique, cost, and quality of work for digitization of hard-copy library materials, taking into account the various document types that may be digitized, including printed text, rare or damaged text, book illustrations, manuscripts, maps and other oversized items, photographic prints, transparencies, and microfilm.

The paper addresses the following topics:

• Document preparation: the scanning process
• Processing requirements for scanning, including the types of scanners and scanning capture software
• Indexing, including assigning and capturing bibliographic data such as subject terms for cataloging and Online Computer Library Center (OCLC) services
• Storage and archiving
• Imaging and library standards and policies, including those established by the American National Standards Institute (ANSI), Association for Information and Image Management (AIIM), American Library Association (ALA), and Library of Congress
• Industry performance metrics
• Document enhancement software
• Staffing and training requirements.

1 Notice: The views, opinions, and findings contained in this report are those of LMI and should not be construed as an official agency position, policy, or decision, unless so designated by other official documentation.

LMI © 2007. All rights reserved.

DRAFT: 3/26/08

DOCUMENT PREPARATION

Taking time to properly prepare documents for scanning saves time and money in the digitization process. When preparing any type of paper document, such as reports or manuscripts, for scanning, the following manual steps are typically taken:

• Remove the document from its binder or folder, or separate it from its binding (to avoid removing the binding from a valuable document, a book scanner can be used)
• Flatten dog-eared pages and ensure the proper orientation of all pages
• Remove paperclips, staples, sticky notes, and other items attached to the document
• Insert separator sheets with bar codes between document sections for indexing purposes
• Group documents by size when a collection contains multiple sizes, which lessens the time needed to recalibrate scanners and cameras.
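The separator-sheet step above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the capture software reports for each scanned page either the bar code value it decoded or nothing at all; the function name `split_on_separators` and the input format are illustrative assumptions, not features of any particular scanning product.

```python
from typing import Dict, List, Optional


def split_on_separators(pages: List[Optional[str]]) -> List[Dict]:
    """Group a scanned batch into sections at bar-coded separator sheets.

    `pages` holds one entry per scanned sheet, in scan order (hypothetical
    input format): the decoded bar code value for a separator sheet, or
    None for an ordinary document page.
    """
    sections: List[Dict] = []
    current: Dict = {"index_key": None, "pages": []}
    for page_number, barcode in enumerate(pages, start=1):
        if barcode is not None:
            # A separator sheet closes the section in progress and
            # starts a new one keyed by the bar code value.
            if current["pages"] or current["index_key"] is not None:
                sections.append(current)
            current = {"index_key": barcode, "pages": []}
        else:
            current["pages"].append(page_number)
    # Keep the final section once the batch ends.
    if current["pages"] or current["index_key"] is not None:
        sections.append(current)
    return sections


if __name__ == "__main__":
    batch = [None, "SEC-A", None, None, "SEC-B", None]
    for section in split_on_separators(batch):
        print(section)
```

Page numbers here are positions in the scan stream, so separator sheets consume a number but never appear in a section. Pages scanned before the first separator are kept in a section with no index key rather than discarded, so that an operator can review them.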

Standard: ANSI/AIIM TR15-1997, Planning Considerations Addressing Preparation of Documents for Image Capture Systems, should be followed when preparing documents for scanning.

PROCESSING REQUIREMENTS

Processing requirements address the choices of software and hardware, such as scanners, computers and monitors, resolution, speed, single- or double-sided scan, document delimiters, level of color accuracy, image quality, corrections criteria and procedures, type of character recognition, format, indexing, and standards regarding metadata.

Standards: ANSI/AIIM TR19-1993, Electronic Imaging Display Devices, should be followed when selecting imaging devices, and ANSI/AIIM TR34-1996, Sampling Procedures for Inspection by Attributes of Images in Electronic Image Management (EIM), should be followed for the sampling rules used in image quality control and quality assurance.
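As a rough illustration of inspection by sampling, the Python sketch below draws a reproducible random sample of images from a scanned batch and applies a simple accept/reject rule. The sample size and acceptance number shown are arbitrary placeholders chosen for the example; they are not taken from ANSI/AIIM TR34-1996, which should be consulted for the actual sampling plans. The function names are likewise hypothetical.

```python
import random
from typing import List, Sequence


def draw_inspection_sample(image_ids: Sequence[str], sample_size: int,
                           seed: int = 0) -> List[str]:
    """Pick images for manual inspection, without replacement.

    A fixed seed makes the draw repeatable so the same sample can be
    re-inspected later. `sample_size` is an assumed value supplied by
    the caller, not a figure from any standard.
    """
    if sample_size < 0:
        raise ValueError("sample_size must not be negative")
    rng = random.Random(seed)
    k = min(sample_size, len(image_ids))
    return sorted(rng.sample(list(image_ids), k))


def batch_passes(defects_found: int, acceptance_number: int) -> bool:
    """Accept the batch when defects in the sample do not exceed the limit."""
    return defects_found <= acceptance_number


if __name__ == "__main__":
    batch = [f"img_{n:04d}.tif" for n in range(1, 501)]
    sample = draw_inspection_sample(batch, sample_size=20, seed=42)
    print(sample)
    # Placeholder rule: accept if at most 1 defective image in the sample.
    print(batch_passes(defects_found=1, acceptance_number=1))
```

Inspecting by attributes means each sampled image is simply judged acceptable or defective; a batch that fails would typically be rescanned or fully inspected.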

Scanners

In any digital imaging lab, it is essential to choose the correct scanner for the imaging need. Several types of scanners, such as flatbed, drum, and film scanners, are available. High-speed scanners are used for standard paper sizes, typically with documents batched to optimize scanner use. Libraries that have digitized collection materials have found that no single scanner solves all digitization needs and that a variety of scanners is needed to complete digitization tasks spanning multiple formats. The type and condition of a document drive the scanner selection. For example, a flatbed scanner may be used for fragile, unbound documents or for rescanning, but for non-fragile, unbound items, a high-speed duplex scanner is optimal. Specialty scanners for bound books are available, though usually at a greater cost.

Scanner selection is based on a number of criteria, including the following:

• Volume (average number of pages and images to be scanned)
• Scanner duty cycle (average number of scans recommended for a scanner model)
• Need for color, black and white, or gray scale scans
• Resolution and format
• Document size
• Single or double sided (also referred to as simplex or duplex)
• Scanner warranty
• Maintenance requirements.
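The relationship between the first two criteria, volume and duty cycle, can be sketched as a simple capacity calculation. The Python example below assumes the duty cycle is expressed as recommended scans per day and that the work is spread evenly over the schedule; the figures and the function name `scanners_needed` are illustrative assumptions, not vendor data.

```python
import math


def scanners_needed(total_pages: int, working_days: int,
                    daily_duty_cycle: int) -> int:
    """Smallest number of scanners that keeps each within its duty cycle.

    Assumes (hypothetically) an even daily workload and a duty cycle
    stated in scans per scanner per day.
    """
    if total_pages < 0 or working_days <= 0 or daily_duty_cycle <= 0:
        raise ValueError("inputs must be positive (total_pages may be 0)")
    pages_per_day = math.ceil(total_pages / working_days)
    return math.ceil(pages_per_day / daily_duty_cycle)


if __name__ == "__main__":
    # Illustrative numbers only: 500,000 pages over 100 working days
    # on scanners rated for 3,000 scans per day.
    print(scanners_needed(500_000, 100, 3_000))
```

Duplex scanning produces two images per sheet, so when the duty cycle is rated in images rather than sheets, the page count passed in should count both sides.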

Table 1 summarizes recommended resolutions and bit depths for various document types. Resolution is measured in dots per inch (dpi). The information in the table is based on studies published by several university libraries and government organizations.2 The following subsections address three key scanner features in more detail.

Table 1. Recommended Imaging Requirements

Document type                   Resolution                        Bit depth
Books (text pages)              400 or 600 dpi (access quality)   1 bit (black and white bitonal)
                                600 dpi (preservation quality)    24 bit (color)
Rare/damaged printed text       300–600 dpi                       8 bit (gray scale)
                                                                  24 bit (color)

2 Digital Library Federation, Benchmark for Faithful Digital Reproductions of Monographs and Serials, December 2002; Government Printing Office (GPO), Specifications and Metrics for Quality Control of Converted Content, March 2006; University of Virginia Library, University of Virginia Community Digitization Guidelines, March 6, 2006; and Western States Digital Standards Group, Digital Imaging Working Group, Western States Digital Imaging Best Practices, Version 1.0, January 2003.

Book illustrations

or figures

Manuscripts

400 dpi (access quality) 8 bit (gray scale)

600 dpi (preservation

quality)

24 bit (color)

300 (larger pages) or

400 dpi

8 bit (gray scale)

300 dpi (with text)

24 bit (color)

300–600 dpi

8 bit (gray scale)

24 bit (color, if color present in original)

Maps and other oversized items  400 dpi                           8 bit (gray scale)
                                                                  24 bit (color)

Standard: ANSI/AIIM MS44-1998, Recommended Practice for Quality Control of Image Scanners, should be followed to ensure scanner quality control and continued maintenance of an established level of quality.

COLOR, BLACK AND WHITE, AND GRAY-SCALE SCANNERS

Most scanners offer color features because the cost of color scanning has been radically reduced. In addition, storage costs per gigabyte (GB) have declined rapidly, so storage capacity is less of a cost issue than in the past. Color gives a truer rendition of the document if colors are present in the original. Gray scale can be used to improve the scanned quality of low-contrast images.

RESOLUTION

Resolution is the “density of pixels captured in the digitization of an image.”3 Images of library materials can be captured at anywhere from 300 dpi to 600 dpi, depending on the nature of the documents. The resolution should be determined according to the type of document being scanned, with quality of the image taking precedence. The Library of Congress’s standard is 300 dpi.4 This resolution is also recommended by AIIM and should be considered the minimum resolution for EPA scanning.

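Bit depth is the number of bits used to record each pixel, which determines the number of colors or gray levels that can be displayed: 1 bit yields two values (black and white), 8 bits yield 256, and 24 bits yield about 16.7 million. The higher the bit depth, the more color variation that can be captured and displayed.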
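Resolution and bit depth together determine how much storage an uncompressed image requires, which is useful when estimating storage needs. The Python sketch below works through the arithmetic for a letter-size page; it ignores file headers, row padding, and compression, so the results are approximations rather than exact file sizes. The function names are illustrative.

```python
def distinct_values(bit_depth: int) -> int:
    """Number of colors or gray levels representable at a given bit depth."""
    return 2 ** bit_depth


def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bit_depth: int) -> int:
    """Approximate raw image size: pixels wide x pixels high x bits per pixel.

    Ignores headers, row padding, and compression, so real files
    (for example TIFF or JPEG) will differ.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # An 8.5 x 11 inch page at the resolutions and bit depths discussed above.
    for dpi in (300, 600):
        for bits in (1, 8, 24):
            size = uncompressed_size_bytes(8.5, 11, dpi, bits)
            print(f"{dpi} dpi, {bits:>2}-bit: {size / 1_000_000:.1f} MB")
```

Doubling the resolution quadruples the pixel count, so a 600 dpi, 24-bit scan of a letter-size page is roughly 101 MB uncompressed, compared with about 25 MB at 300 dpi.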

FORMAT

Documents are typically stored as Tagged Image File Format (TIFF), Portable Document Format (PDF), Portable Document Format for archiving (PDF/A), or Joint Photographic Experts Group (JPEG) files. The preferred format depends on

3 University of Virginia Library, Internal Production Digitization Standards, March 6, 2006.

4 Fleischhauer, Carl. Digital Formats for Content Reproductions. The Library of Congress. July 13, 1998.
