Order As Received: First Virtual Order for Digital Objects



Order As Received: A Foundational Virtual Order for Digital Records[1]

Patricia Galloway

School of Information, University of Texas at Austin

The archivist deals with the archival collection just as the paleontologist does with the bones of a prehistoric animal: he tries from these bones to put the skeleton of the animal together again....The archivist resembles the paleontologist from still another point of view: both can restore only one particular state of the reconstructed organism, whereas the living organism changed its state again and again.

—Muller, Feith, and Fruin, Manual, 1898

Deleted file information is like a fossil: its skeleton may be missing a bone here or there, but the fossil remains, unchanged, until it is completely overwritten.

—Farmer and Venema, Forensic Discovery, 2005

Introduction

Traditionally, Western archivists have at least claimed to pride themselves on the detection and restoration of the “original order” of the archival fonds, which represented most of the time-honored labor of archival arrangement. This part of archival practice had a rather complex development (so far mostly limited to records having a material form), and its details remain tightly bound to local practice. Although canonical texts of several theoretical persuasions once universally blessed it with the certainty of assertion, by the mid-twentieth century many archivists took it as a matter of faith that since for modern records “original order” is usually a standard filing system, “convenient order” might be much more useful. As backlogs mounted, however, the minimal processing movement brought what amounted in effect to a return to “final active order.” Digital records’ affordances have further complicated this picture. Even in active use they lend themselves to multiple virtual orderings, none of them representing an actual physical ordering of the records on any medium. Further, the way these records reside in a digital file system allows one to set aside the curses of an archival “age of abundance” and investigate what the preservation of more than one order might mean for archivist and historian alike.

In this paper I want to discuss experiments in archiving digital records in a formal “order as received” based upon groupings of digital files on received legacy media prepared by or for a donor and documented through description of the set of derivative orderings available through the original operating system environment. This practice is designed to capture and describe a specific documented state of some part or parts of a fonds, to provide to the potential user a representation of stages in creator activity and archival processing that are normally invisible, and to create a documented basis for other derivative orderings, including those provided by archivists for access and even those that might be contributed by individual researchers and users. I want to suggest that this kind of archival practice is emblematic of a set of practices that can take advantage, without extra work, of existing digital system functions to allow a fuller portrayal of what is really meant by “context” in the digital environment, including the documentation of work practices of records creators (and archivists) that normally remain tacit.

“Original order” in American archival practice: Wisdom as received, ca. 2005

Since the turn of the twentieth century, with the publication and widespread appreciation of the Dutch Manual for the Arrangement and Description of Archives,[2] it has been a fundamental article of faith among Western archivists that materials preserved in an archives must be kept in provenance-based groups representing a more or less “organically” cumulated fonds (or archive group or record group) of some kind and where possible should be preserved in what is referred to as “original order.” These generalizations were developed in the late nineteenth century in Europe primarily for government records then being brought fully under rational discipline by national archives for the first time in the history of emerging nation-states. Since then the concepts have at least ostensibly come to be applied broadly to all kinds of archival materials, from government to business to personal, and have made their way into the enshrined terms of art of North American archival practice. According to Pearce-Moses’s Glossary,[3] “original order” is defined as:

The organization and sequence of records established by the creator of the records....Original order is a fundamental principle of archives. Maintaining records in original order serves two purposes. First, it preserves existing relationships and evidential significance that can be inferred from the context of the records. Second, it exploits the record creator's mechanisms to access the records, saving the archives the work of creating new access tools.

In this definition the term “sequence” seems to be applied rather broadly. That is not too surprising when one looks at the literature: Holmes and Schellenberg, for example, construct sequences all the way from record group down to file unit and even document (if they went that far);[4] where record groups have been abjured in favor of series, the need to shelve physical records still forces sequence. But any single sequence at any level has to be a palimpsest if the records were ever used at all. In fact most bodies of documentary materials do not arrive at the archives in some perfect “original order,” or at best they arrive in an order that may reflect any number of events that have happened to the materials since they were created, were filed, were used, and finally were sent to the archives.[5] This is especially true in the case of privately-created personal documentation normally subject to the informal or idiosyncratic organizational practices of its creator. Given that most records that have a long enough life to reach an archives or to be recognized for secondary use have usually undergone rearrangements in use, especially where an organization has been reorganized or an individual has experienced at least one geographical move, it is clear that what archives have normally received for custody, at least in after-the-fact deposits, amounts to an “order as last found.”[6]

Archivists have, however, often resisted considering the order in which they first see the records as the original order, apparently respecting the ordering principle rather than the actual dialogic use of it in practice by real records creators. Indeed, from the originary summary of the modern practices of arrangement and description by Muller, Feith, and Fruin (MFF), there have been repeated assertions by archival theorists that of course they would rather accept a coherent order from the recordkeepers and any rearrangement is regrettable and a last resort, but the archivist must take on the responsibility of obliterating this unwanted “messy” order and restoring “original order,” acting in accord with any perceptible filing practices to restore the assumed original order and to rationalize any minor departures from it.[7] Although MFF asserted repeatedly that any such changes must be a last resort, that the desires of researchers for subject groupings and complete dismemberment of a fonds for their convenience must be resisted, and that any rearrangement must be fully documented, in practice these warnings have frequently been elided in both governmental and especially in manuscript archives practice.[8]

The sequence of development of the concept through its reception and application by others is interesting. In MFF Chapter II, “The Arrangement of Archival Documents,” the authors begin by laying down the principle: “16. The system of arrangement must be based on the original organization of the archival collection, which in the main corresponds to the organization of the administrative body that produced it”—yet they then proceed to advise methods to use in clearing away prior arrangements of earlier archivists.[9] Jenkinson began his instruction on arrangement by forbidding any change in the order of documents as they arrived at the archives and requiring the preservation of old lists and numeration on the documents themselves. He went further, advising the addition of an accession numeration of all documents in the order received, precisely to preserve that order in case later rearrangement might be found to have been mistaken. Finally, Jenkinson instructed his reader that the goal is to “establish or re-establish the original arrangement.”[10] Less punctilious than his predecessors, Schellenberg attempted to hold back the sea of paper. Although he regarded provenance as sacrosanct, he considered original order as variously applicable according to level. For Schellenberg, the order of series within record group was simply a matter of usability and coherence, while the ordering of items within series might be preserved if it were informative as to administrative process, but filing systems in government use generally provided no evidence of specific activity in their arbitrariness and the archivist could freely rearrange them for use.[11]

MFF presented a metaphor taken from paleontology, used by one of the commenters on section 20 of the Manual, who saw the archivist’s basic work as restoring the structure of the fonds as a paleontologist restores the relationships of an ancient animal skeleton, pointing out that only one state could be restored.[12] This metaphor was quoted by Jenkinson in the context of his arguments for original order.[13] Series of filed materials have thus been fitted into an archivally-created structure deemed to best represent the functional structure of the creating body at some ideal past time.[14] Yet opinions have been varied on how materials should be restored to original order, since although early European writers referred to Dutch and French authorities, in dealing with their ancient records, at least, they all had idiosyncratic institutions to replicate—so the writing on this topic tends to be very much case-based. In current definitions it becomes clear that original order is primarily taken to refer to what Holmes sketched out as the lower levels of arrangement: filing units in a series and documents within filing units.[15] From Pearce-Moses:

Original order is not the same as the order in which materials were received. Items that were clearly misfiled may be refiled in their proper location. Materials may have had their original order disturbed, often during inactive use, before transfer to the archives; see restoration of original order.

Unfortunately, archivists today also rarely follow Jenkinson’s advice in creating records of precisely what changes were made to which records in the reordering, since the overhead of the reordering is already burdensome.[16] At best, a series of standard practices will be outlined in a processing manual, which over time will be revised and its previous versions discarded.[17] It is therefore very difficult or impossible for the researcher to restore the “original chaos” (see below) and thereby to understand more about the work practices represented by it.[18] In the case of individuals’ records, special collections archivists, inheritors of a tradition that favors organization by document and by functional record types, have frequently taken more drastic measures, since such records frequently come to them upon the death of the creator and much the worse for confusion in such packing-up as they may have been subjected to by relatives. In such a case the records are overtly arranged for the convenience of the user and often according to local categories. In arriving at original order in normal processing practice, whether in the archival or manuscript tradition, it has also been customary to “weed” duplicates and other unwanted categories of materials while this arrangement is taking place.[19] Thus in spite of the complaints of misguided arrangements with which the early literature was rife, archival practice has now been following for nearly a century handbooks in which logical tweaking or even replacement of original order is portrayed as permissible and even called-for.[20] The Glossary, echoing Schellenberg,[21] observes rather more acerbically:

A collection may not have meaningful order if the creator stored items in a haphazard fashion. In such instances, archivists often impose order on the materials to facilitate arrangement and description. The principle of respect for original order does not extend to respect for original chaos.

Yet every archivist who has worked with a “disordered” collection knows that it is nearly impossible for the “disorder” to extend to true randomness or complete chaos: disordered paper materials are after all not shuffled as playing cards may be—in fact cannot be so shuffled because their physical characteristics do not lend themselves to such action. Instead, there are likely to be many loci of order within such a collection, a fact that most authorities admit. Further, if the creator was a messy filer, why should the archivist presume to turn him into a neat one?[22] These archival “norms” are clearly the result of serving two masters—archival convenience and least-common-denominator user requirements—while ignoring the needs of two others: archival managers who want to cut down on processing costs and researchers who want to approach the records with as little intermediation as possible. They also assume physical custody of paper records frozen in time. And finally, these “norms” have not stayed still but have evolved over time, as both quantity and medium of archival records have changed.

Problematics of Ordering

Why is order so important an abstract aspect of archival work? What meaning does “disorder” have? Various commentators have pointed out that the Enlightenment heritage of modern archival practice and its respect for abstract/logical order grew from the intention of reinforcing an idea of the stability of a system of statist discipline (the practices MFF were rationalizing were all about supporting such systems). From this perspective the dialogic interaction between filers and their filing and classification technologies, and the resulting modification of these technologies, in practice and over time, were seen as disorder, to be expunged if possible to preserve stasis/stability. Classical theory also insists (as it must when it pertains to the infrastructure of a system of governmentality) on the intentionality of record creation as a part of the structure of a recordkeeping system, else there is no authority for the system to support. Thus disorder (or “chaos”) may even be read as theft or treason within the active system, or metaphorical theft or treason when researchers wish to use inactive records. In the programmatic literature of archival practice in government there has been little explicit consideration of the meaning that may be present in disorder, because the assignment of abstract intentionality to the locus of authority has erased the existence of low-level file handlers and their work practices, while the affordances of paper have hidden much filing system reorganization.

Modern Ordering of Paper in Organizations

JoAnne Yates’s magisterial study of the flow of communications in modern business organizations via the examination of the communication technologies that supported them provides a revelatory view of the historical introduction of filing technologies and the documentary forms they enabled, and her study is complemented by T.R. Schellenberg’s exposition of the history and practice of the creation and management of active records in American government in the chapters on “Record Management” in Modern Archives.[23] Yates shows how vertical file systems, strongly related to the existing systems innovated by Melvil Dewey for filing library card catalogs, emerged into popularity very quickly when introduced because they provided a previously unknown ease of reordering paper documents (newly efficiently produced via typewriter and reproduced via carbon paper) that could only be in one place at a time, a feature that Schellenberg also observed.[24] The filing systems that vertical files enabled were simple: Yates points to the major choices of numerical, alphabetical, geographical, and subject-based decimal (borrowed from the Dewey Decimal system). Their advantages included the fact that because at least the first three were based on orderings familiar to most school children, they were “self-indexing,” another saving of effort.[25]

These filing systems were popular in business and government because they supported the integration of incoming, outgoing, and internal correspondence documenting a single matter or transaction, promoted the use of correspondence, and favored its preservation as a persistent record. Further, the availability of copying technologies permitted the decentralization of filing. The Taft Commission on Economy and Efficiency for the U.S. government studied these practices as they were developing in large companies after the turn of the twentieth century and promoted their adoption in government recordkeeping, for which the subject-based decimal system was preferred.[26] To reproduce these systems, first private secretarial schools and then public education made training in the practices of filing related to them widely available.

Discussion of filing systems, however, is not particularly clear on how they worked in use. Yates provides an image illustrating the difficulties of using previous systems of bound volumes and box files. Schellenberg advises that in the context of office management the appropriate filing system be applied to the appropriate type of files but insists that instruction and manuals be available to those who will do the filing to promote consistency, since “[i]nadequacies of filing are more often attributable to human failings than to failings of system.”[27] Choices might also be a matter of scale: Yates observes that for large and geographically distributed companies registry systems were useful for control, while for small local businesses the simpler schemes sufficed. Collecting archives have long been well aware that individuals’ filing practices carried out for their own purposes, if they exist at all, tend to be idiosyncratic and quite specific to the task at hand, even when influenced by practices learned at work. On the other hand, people who do a lot of their own document management at home are likely to carry those practices over into the workplace.

Archival discourse suggests, however, that complex filing systems could not succeed without precise instruction and constant attention or filing specialists who were specifically entrusted with the task, either of which would negate the efficiency that vertical-file based filing systems were supposed to provide. It is obvious that the more complex and subjective filing systems are likely to change over time as the organization or individuals generating the documents change, causing a kind of “filing drift” that is familiar to everyone who has done her own filing for more than a few months. This drift will manifest as inconsistency in filing needing correction if “order as received” is taken as representing a single Platonic moment outside of time rather than a single temporally-situated state of the files. Without detailed examination of the files’ content (not to mention an in-depth study of the political economy of filing education, filing accuracy, filing supplies, filing furniture, and the cognitive burden of various filing systems) it is not possible to dispel this assumption. Further, this line of thinking, which animates the profession of records management, operates under the assumption that any information organization activity at any time is a completely routine and routinizable process, whereas in fact the time when lowly file clerks were responsible for it is long past. Many of those who do it now are the same people who created the records being filed, and how they organize those files is a part of the creativity they bring to their work. Research on filing and classification in the digital environment suggests that apparent “mess” in creators’ filing should even be considered valuable for the tacit knowledge it implies or even encodes.[28]

Ordering Digital Records

I want to suggest that especially in the case of digital materials we should not take the drastic step of attempting to establish an idealized original order, and need not do so; it is increasingly obvious that the accepted norms of ordering are an artifact of the affordances of paper as a medium. The affordances of digital records problematize the paper-based notion of order in the first place. A digital record is itself generally a congeries of fragments dictated by an underlying storage scheme optimized for efficiency of access; the records are assembled and presented to the user as wholes by the operating system and/or application program or viewer.[29] Groups of digital records arranged in hierarchical directories are also a construct of the operating system interface in obedience to the user’s choice of representation or to some default assignment of location. The records themselves are not only fragmentary as just noted, but are also not “arranged” on a medium as represented by the directory. New digital files are placed wherever free space is available (and that may be anywhere, given that “free space” includes old files marked by the system as erased), and the file name in a directory merely represents a pointer to the header of the represented file. Any ordering of digital records, original or otherwise, is thus a representation rather than a physical order (and given the affordances of the display program through which the ordering is viewed, may be only one of several views immediately available to the user, as sorted by title, date, filename, etc.). Representations are the only lenses through which any ordering is perceived in normal use, although at a system level software tools can list media contents as they are physically situated.
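The claim that an ordering is a view rather than a physical arrangement can be illustrated with a minimal Python sketch. The records, field names, and values below are hypothetical, invented only for illustration; the point is that one unordered set yields several orderings without any record being moved.

```python
from operator import itemgetter

# Hypothetical records; names, dates, and sizes are invented for illustration.
records = [
    {"name": "memo.doc", "modified": "1998-03-02", "size": 2048},
    {"name": "budget.xls", "modified": "1997-11-15", "size": 9216},
    {"name": "draft.txt", "modified": "1999-01-20", "size": 512},
]

def view(records, key):
    """Return record names in a derived order; the stored list is left untouched."""
    return [r["name"] for r in sorted(records, key=itemgetter(key))]

by_name = view(records, "name")
by_date = view(records, "modified")
by_size = view(records, "size")
```

Each call produces a different sequence from the same stored data, much as a file browser re-sorts a directory listing on request.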

From the perspective of specialists in digital forensics, there are three components of the ordering of files as found on digital media. First there is a “geology” of the results of processes that are autonomous from the average user’s perspective, in that they operate automatically and without direction from the user: most of the management of files, including especially their placement on media, falls under this heading, as does the user interface to the file system, which in current usage offers a kind of metaphor of vertical filing.[30] Second there is an “archaeology” of the processes controlled by the user’s decision, including the “arrangement” of files as the user wants to see them through a virtual file directory that she uses the features of the user interface to construct. Finally, there is a “stratigraphy” encompassing the artifacts of both geology and archaeology, which can be ordered by time-stamps so as to reveal how the current state of a file system came to be as it is. Underneath the familiar representations of the state of the file system, then, there is a plethora of orderings that mesh to keep the system operating efficiently and the user’s desires with reference to the content and arrangement of the files satisfied, and all of this information is potentially available to be used.[31]
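A very rough sense of what ordering by time-stamps involves can be given in a few lines of Python. This is only a sketch under narrow assumptions: it reads just the modification time that the operating system reports for each overt file, whereas forensic tools draw on many more traces. The function name `timeline` is introduced here for illustration.

```python
import os

def timeline(root):
    """List (modification_time, relative_path) pairs for files under root, oldest first."""
    events = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # st_mtime is the last-modified time as recorded by the file system.
            events.append((os.stat(full).st_mtime, os.path.relpath(full, root)))
    return sorted(events)
```

Sorting the pairs puts the oldest modification first regardless of where a file sits in the directory hierarchy, which is the simplest form of the time-based reading described above.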

Receipt of Digital Records by an Archives

As we have seen with paper records, a first consideration of the archivist is their order, and this is no less true for digital files. Their routes to the repository have been little considered in formal models, but they are a serious part of archival provenance.[32] The possibilities are many, whether the records are received on removable storage media or devices, still stored on an integral disk internal to a computer received by the archives, uploaded directly to the repository by the creator or someone acting for her, or harvested from cloud storage by the archives.

Different sources may have different information to offer about the circumstances of transfer. In the case of removable media, the storage media or devices, of which there are many possible forms, may represent legacy media discovered in an office, backup activities carried out by the creator in normal use, or explicit preparation specifically for transmission to the archives carried out by the creator or someone else. In the case of entire computers used by the creator, we may consider that the records are most likely to be in some kind of “untouched” or terminal state, providing a contextual snapshot of multiple activities in various stages of completion (including the user’s external relationships via the Internet), especially if we know the circumstances under which the computer became inactive before coming into possession of the archives.[33]

Under another option, not yet much in evidence though likely to become more common over time, the record creator will have specifically deposited individual records in an external repository (which might be, by prior arrangement, an archival repository or a commercial cloud storage service), at or near the time of creation, whether through some kind of record management application designed to effect regular capture and storage or through a specific action of deposit. In both cases the order found will probably be some predetermined order into which the materials are made to fit, and it may or may not have been designed through consultation with the creator.[34]

Routes from Desktop to Archival Deposit

| Removable media            | Normal storage                                                 |
|                            | Backups (subset of content): routine, versioning, incremental |
|                            | Selections for deposit                                         |
| Whole computer             | Interim state                                                  |
|                            | Terminal state                                                 |
| Direct deposit via network | As captured by RMA to repository                               |
|                            | As selected and sent by donor to repository                    |
|                            | As recovered from cloud storage                                |

Backup media may have been created using specific backup utility programs and can represent the whole contents of a computer or some specified subset. Quite frequently, however, backup media may represent a creator’s specific work practices of versioning, as for example when she makes use of removable storage to perform periodic and perhaps piecemeal backup of the different versions of a particular document undergoing intensive work. Some computer users are conscientious in backing up at regular temporal intervals, whereas others will habitually back up groups of files undergoing active work at smaller intervals—usually representing the amount of work they would rather not lose.[35]

Transfer media may also have been prepared especially for archival deposit, whether by the donor, others acting for the donor (including dealers, assistants, or relatives), or acquisition archivists working with the donor, and all of these groups represent different motivations that may be manifest in the order imposed on the media content.[36] In all of these cases, although the backup or copying process will have restored contiguous relationship of file parts if it does not include incremental stages, whether the backup was created by specific software or a simple batch copy made from the file-system display, the order as received undoubtedly has meaning. This meaning may be manifest, for example, in the contrast between the dates listed for the individual items by the operating system and additional dates of those same items as made available in the internal metadata of the individual files.
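The contrast between the two kinds of dates can be made concrete with a small Python sketch. The paths and dates are hypothetical, and the sketch assumes that the internal date has already been extracted from each file by some format-specific means that is not shown here.

```python
from datetime import date

# Hypothetical listing: fs_date is the date the operating system reports on the
# transfer medium; internal_date comes from metadata embedded in the file itself.
items = [
    {"path": "letters/smith.doc", "fs_date": date(2004, 6, 1), "internal_date": date(1996, 4, 12)},
    {"path": "letters/jones.doc", "fs_date": date(2004, 6, 1), "internal_date": date(2004, 6, 1)},
]

def date_gaps(items):
    """Return (path, difference_in_days) for items whose two dates disagree."""
    return [
        (item["path"], (item["fs_date"] - item["internal_date"]).days)
        for item in items
        if item["fs_date"] != item["internal_date"]
    ]

gaps = date_gaps(items)
```

A large gap of this kind might suggest, for example, that an older document was copied to the medium in a single later batch, though any such reading would need corroboration.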

Where backup copies are deposited on an external server piecemeal or at intervals by the creator, whether through an automated system or by hand, the question arises as to how much influence the creator had over the structure of the file system into which the materials were copied. Whatever might be the case, however, there would be interest in how this practice took place. There are also two means by which such materials might arrive at the archives: downloaded and sent by the creator or downloaded directly by the archives. In both cases there would be concern for the form of the materials transferred—whether they preserved any groupings intact—and a transaction step to be documented.[37]

Even though archivists may not be interested in the meaning overlaid on the materials by what may be seen as third parties, the potential for recovering this meaning can be seen as a valid target for research.[38] This meaning, furthermore, may in fact capture the creator’s ordering in some way, depending on how and when it was carried out. In any case, loss of any of these orderings loses one state of the archival bond, one view of the records’ relationships, and a potential opportunity for contrasting this ordering with other related orderings (as e.g. contrasting an automated backup of an entire computer with the actual computer’s contents).

Arranging Digital Records[39]

Considering that the order as received of digital objects represents a more or less intentional order, it is clear that in preserving it we are preserving something of the work practices of the creator or of people around her or services she used, and as the media in question serve as the interface between the creator and the archives, they also represent at least part of the transfer process. If this aspect of archival materials is of interest, as new views of the continuum of creation and use of records would suggest, the digital affordances discussed above provide a workable possibility for us to preserve this state of the materials and at the same time to expose the professional work of archivists directly. Rather than devising some kind of archivist-created arrangement as the only representation available, we can ingest digital materials into a digital archival repository just as they come to us, according to whatever directory structure may have been created on the medium used to transfer the materials to the archives, creating a virtual “order as received” whose circumstances are fully documented and not making any changes to whatever relationships may have existed among the files and persisted on the transfer medium. Since users can in any case potentially search through all these materials without reference to any ordering, or can use digital tools to order materials according to date, author, or any number of other available attributes, one may adhere to the ideal of “more product less process” and do no more “arranging” at all, thereby preserving one more step in the lives of the records received.[40] If the records are recovered using forensic techniques, there is no need to modify the creating system to generate positional or relational metadata, only to capture what the system has already created.
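As a hedged illustration of documenting an order as received without altering it, the following Python sketch walks a directory tree copied from a transfer medium and records each file's relative path, size, and checksum. The function name `manifest` and the choice of SHA-256 are assumptions made for this example, not a description of any particular repository's ingest process.

```python
import hashlib
import os

def manifest(root):
    """Record path, size, and SHA-256 for every file under root, leaving files untouched."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a repeatable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                # Reads the whole file at once; adequate for a sketch, not for large files.
                digest = hashlib.sha256(fh.read()).hexdigest()
            entries.append({
                "path": os.path.relpath(full, root),
                "size": os.path.getsize(full),
                "sha256": digest,
            })
    return entries
```

The resulting list preserves the directory relationships as found and gives later orderings a fixed set of objects to point to.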

But in addition, whenever time permits and over time as needs and access requirements change, the archivist may add value—and be seen to add value—by creating a virtual ordering adherent to local archival series arrangements by mapping digital objects from the “order as received” representation into an alternative “local archival order” representation designed to serve one or more groups of potential users.[41] Further, where living authors may be interested in becoming involved with arrangement, an “authorially preferred order” representation may also be constructed; and other virtual orderings suggested by users might also be shared. The point is that in the digital environment, where multiple virtual orderings can be made available and where there can be great interest in seeing any orderings that represent some part of the chain of custody process by which the materials reached the archives and indeed by which they became archival (especially on the chance that they may reflect some ordering through which the materials were used by their creator or primary user), I think it is finally advisable to abandon the too frequently silent emendation by archivists of “order as received” and to recognize such an order as not just a troublesome mess to be easily swept away but as a source of information that should not be discarded.[42]
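The mapping from one representation to another can be sketched as a simple lookup. In the Python example below, the object identifiers, paths, and series names are all hypothetical; the design assumption is that each additional ordering stores only references to objects, so the order as received is never rewritten.

```python
# Order as received: hypothetical object identifiers mapped to paths on the transfer medium.
received = {
    "obj1": "DISK01/misc/ltr1.doc",
    "obj2": "DISK01/misc/draft2.doc",
    "obj3": "DISK01/old/ltr0.doc",
}

# A hypothetical local archival order: series that refer to the same objects by identifier.
local_archival = {
    "Series 1: Correspondence": ["obj3", "obj1"],
    "Series 2: Drafts": ["obj2"],
}

def resolve(ordering, received):
    """Translate an ordering of identifiers into the paths recorded at receipt."""
    return {series: [received[i] for i in ids] for series, ids in ordering.items()}

resolved = resolve(local_archival, received)
```

An authorially preferred order or a user-contributed order could be added as further dictionaries of the same shape, each resolvable against the same received set.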

Implementing Order as Received in the Digital Repository

There are several ways to deal with digital records’ order as received. The closest one can come to preserving the “actual” order, the frequently fragmented order as it actually exists on the medium as distributed by the native operating system, is to capture a disk image or clone of the medium via a sector-by-sector copy captured without creating any metadata recording the fact of capture. Such a disk image contains not only the overt files—those that are displayed by the operating system through an ordinary user interface—but also any partially overwritten “erased” files that may still reside on the medium, all of them distributed across the medium as left by the operating system, as well as any record of system activity appropriate to recording on the medium in question. This is about as close to the archival concept of “fixed to a medium” as one can get in the digital context. A disk image permits the re-representation of the overt file arrangement according to all the affordances of the native operating system when mounted as a disk in a “clean-room” installation of that environment or in an emulation of it—to most archival intents and purposes it is the same disk.[43] Further, files captured in this way are true clones, identical to the files as they existed on the original media (even in being scattered throughout the image), instead of copies of individual files created within an operating system, which is designed to add new time-stamp metadata when the copy function is used. 
The disk image also permits the recovery of “erased” file fragments using forensic software.[44] Where a donor agreement does not allow the capture in the first place of “erased” files or file fragments, file relationships may instead be captured using non-proprietary “archiving” utilities, like cpio and tar in the UNIX environment, which are designed to preserve overt files without change (although they will have been reconstructed or defragmented, so will not appear as they were at any time “fixed to a medium”) and also to preserve the data needed to restore their current relationships in directories and their permissions and other behavioral characteristics within the native system.
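
As a rough illustration of the second approach, the following Python sketch uses the standard-library tarfile module to bundle a directory tree and then read back, without extracting anything, the paths, permission bits, and modification times that the archive carries. It is a minimal sketch under stated assumptions rather than a description of any project's actual procedure: the directory names, the file, and the date are hypothetical stand-ins for a media unit, and a POSIX file system is assumed.

```python
import os
import stat
import tarfile
import tempfile


def bundle_tree(source_dir, archive_path):
    """Bundle a directory tree into an uncompressed tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        # arcname="." stores paths relative to the top of the media unit
        tar.add(source_dir, arcname=".")


def list_relationships(archive_path):
    """Read back path, kind, permission bits, and modification time per member."""
    rows = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            kind = "dir" if member.isdir() else "file"
            rows.append((member.name, kind, stat.S_IMODE(member.mode), member.mtime))
    return rows


# Build a small stand-in for a media unit (hypothetical names and dates).
work = tempfile.mkdtemp()
media = os.path.join(work, "media_unit")
os.makedirs(os.path.join(media, "drafts"))
draft = os.path.join(media, "drafts", "act1.txt")
with open(draft, "w") as fh:
    fh.write("first draft\n")
os.chmod(draft, 0o640)
os.utime(draft, (946684800, 946684800))  # 2000-01-01 00:00:00 UTC

archive = os.path.join(work, "media_unit.tar")
bundle_tree(media, archive)
for name, kind, mode, mtime in list_relationships(archive):
    print(f"{kind:4} {mode:o} {mtime} {name}")
```

The listing shows that the directory relationships and the file's own modification time travel inside the bundle, independent of the date on which the bundle itself was made.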

Research being carried out currently will likely establish as standard the use of disk imaging techniques to copy and view for archival purposes a complete image of a magnetic medium.[45] This will obviously permit an even fuller view of the creator’s work processes, especially for older media, since during early use of any magnetic medium it is rarely cheap enough not to reuse and few users to date have known that patterns of former use remain on the medium partly written over by more recent files, or have cared enough to erase the medium more thoroughly through reformatting, which is usually time-consuming. It will be interesting to see what effect this has on recovering “order,” since there is considerable potential for reconstructing previous orderings on most magnetic media that are still intact, because the erased fragments are still present and/or can be recovered via remanent magnetism.[46] Versions of such forensic orderings are at a remove from the user’s grasp of them, and there are discussions still to be had about the ethics of recovering information the creator did not intend to be seen. I think it likely that from these considerations will emerge a more complex donor agreement addressing these affordances and the archives’ right of access to them that will begin to resemble informed consent agreements. As a practical matter, should the creator agree, it will most likely become standard to archive a disk image of the medium in question along with the overt materials copied forensically from the medium and a specification of their order.

Once archived, however, the question arises of how to represent the captured files and relationships for the archives user. To achieve a representation of order as received, there are at present a couple of options that I have been exploring with my students at the University of Texas at Austin School of Information, limited to a DSpace storage and management environment. The simplest is to create a collection for each media unit, to contain the disk image or overt file bundle plus the individual files extracted from it for convenience of user access. This would allow a researcher able to mount the image or unbundle the bundle in the relevant environment (which could itself be provided in some way by the archives) to access, with the relevant system tools, anything recovered from the media unit, just as the creator would have seen it. But where the researcher only wanted to access individual files, extracted files could be provided. In the case of hierarchical directory trees where the researcher wanted to see some kind of schematic representation of the relationships among the files, a derivative version of the image, constituting overt files obtained from the image or bundle, could be mapped to a virtual structure that would present them in the archival environment in something approximating their relationships on the original media unit. Interestingly, once it is possible to permit the researcher to interact with the image or bundle directly, we must discuss the degree of distance the potential user is likely to want. Emory University, for example, has created an emulation project that allows researchers to sit down and experience Salman Rushdie’s computer environment just as he did, at several different times, but as yet researchers seem to be unclear on how to use it. We are doing some similar work with the Briscoe Center for American History to reconstruct videogame creation environments as used by programmers.
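
The schematic representation mentioned above can be sketched very simply. The Python below takes a flat list of relative paths, such as might be listed from an image or bundle, and rebuilds an indented outline of the directory relationships. It is only a sketch: the function names and the sample paths are hypothetical, it assumes that no name is used both as a file and as a directory, and it says nothing about how DSpace itself maps collections. No sorting is applied, so the outline keeps the sequence in which the paths were listed.

```python
def build_tree(paths):
    """Turn flat relative paths into a nested dict; files map to None."""
    root = {}
    for path in paths:
        parts = [p for p in path.split("/") if p]
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root


def render(tree, depth=0):
    """Return indented outline lines in the order the entries were first seen."""
    lines = []
    for name, child in tree.items():
        suffix = "/" if child is not None else ""
        lines.append("  " * depth + name + suffix)
        if child is not None:
            lines.extend(render(child, depth + 1))
    return lines


# Hypothetical paths as they might be listed from one media unit.
extracted = [
    "game/src/main.c",
    "game/art/title.bmp",
    "game/README.TXT",
    "letters/1998-03.doc",
]
print("\n".join(render(build_tree(extracted))))
```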

Case Studies: Some Projects in Digital Archiving

From 2009 onward my INF 392K classes in digital archiving have been experimenting with putting these ideas into practice for several projects of processing and ingesting collections of digital objects into our DSpace repository. The sample of materials provided an excellent opportunity to apply this procedure because the records were received on a wide variety of legacy media and represented a like variety of content. In all cases the media processed represented materials set aside from active work rather than a “live” state of a main computer storage device. In the discussions that follow, an individual disk or other medium will be referred to as a “media unit.”

The 2009 projects made archival copies (that is, plain copies with the copying occasion fully documented) and used tar archiving, while the 2010 and 2011 projects were mostly recovered using disk imaging. In one case (the Heather Kelley Papers from the Videogame Archive at the Briscoe Center for American History, 2009) the digital archival team first recovered the contents of seven Jaz disks as tar archives, placing the archive file for each disk plus the files extracted from it into a separate collection. They then created, using the affordances of DSpace for mapping established collections onto virtual collections, an alternate virtual grouping of the extracted files, which represented program elements for a videogame by file categories of interest to a hypothetical user group of programmers. In a second 2009 case (Terrence McNally Papers from the Harry Ransom Center), the 167 disks and 7 CD-ROMs came to the archives in an uncertain order from the Wisconsin Historical Society, but the Ransom Center had preserved that order. The digital archival team copied the disk contents into a single collection per disk where the disks were enumerated in the preserved order as received and then designed a finding aid order for mapping according to the HRC’s practices, designed to serve humanities researchers.

In 2010 we began the routine recovery of disk images from legacy media and archiving in order as received. In 2011 one of the digital archival teams worked with a collection from the Briscoe Center for American History by a well-known linguist from the UT faculty (Denise Schmandt-Besserat Papers), representing research and drafts of her scholarly works. The team established the order as received, based on plastic containers in which the disks were stored, imaged each disk, extracted the files, and placed image and files in ordered collections.

We have already seen some evidence that the practice of preserving order as received can be of benefit beyond the desire simply to preserve a state of the fonds that may be studied and/or experienced by researchers. Notably, in the case of the Harry Ransom Center, which is preparing to create its own digital repository and is accordingly inventorying the backup file copies that were prepared in addition to those placed in our digital repository, it has been found that it is very difficult and time-consuming when managing large numbers of digital files to recover their groupings as originally received unless those groupings are somewhere instantiated as such, whereas maintaining a functional order as received that can be mounted and directly queried enables much simpler management. Imaging alone also provides an intensified guarantee of genuineness in that it captures not only the evidence of the existence of the archival bond between records; it also provides some evidence of how that bond came to be and was maintained. As researchers’ interests grow in the “co-creative” activities of people who are involved in the management of digital materials or otherwise in the production process of these materials (for example, Judith McNally, Norman Mailer’s long-time secretary, who was responsible for all the digital versions of his work at its various stages of completion), the order as received allows a more granular glimpse into the activities surrounding the creation of digital materials. It is in carrying out this kind of study that the relevance of the analytic possibilities of forensic software and other forms of text-mining and analysis tools can be brought to bear by scholars, but none of this will be possible across creator-authored orders if those orders are not preserved.[47]

Conclusions

The role of original order has varied across the historical situation of archival practice. Schellenberg scoffed at Jenkinson’s waste of time in recording a state of ordering that the American saw as of no interest. But Schellenberg was operating in an environment where rational filing systems were either dominant or were seen as needing to be so in order to stem the tide of paper proliferation (which Yates has demonstrated was encouraged by those very filing systems); whereas Jenkinson’s orders were emerging from disciplined file rooms and were made up less of indexical files than of case or project files, always the most problematic to form in the first place and also the most historically telling. Over the past thirty years we have seen the emergence to dominance of digital record-making (though not yet digital recordkeeping), yet in spite of significant efforts by archival researchers, we still have a long way to go before we can preserve and morally defend the digital archives as easily as Jenkinson did the paper ones.

I think we may begin conventionally with file capture, by preserving a digital order as received. I have tried to suggest that we should lie back and enjoy the fruits of the heavy lifting that the designers of computer systems have already done at the fundamental level of file systems. In a sense the “order as received” construct takes advantage of an “order for free” that is emergent from the need for such systems to manage computer files efficiently and effectively.[48] Traces of the live system and the relationships between files residing on it are explicitly captured by disk imaging in a well-documented form that can be repurposed to provide metadata for preserving those relationships and even for exploring them in a live setting, depending upon donor agreement. What is really remarkable is that it can be captured, and in a scope and format that makes it not only visible in a conventional way, but visualizable in additional ways. A preserved order as received preserves the granular provenance of individual files—and one of the most sobering aspects of digital records is that they have to be tracked individually. Further, as has been observed, the order as received, as representing a reasonable original order for individual records, is quite good enough for minimal processing.

From the point of view of the user of the archives, order as received, especially as effected by imaging of the original media, allows a much more detailed view of how the records came to be. Time and again my students have observed that whereas they imagined that digital archivy would be a cold and arid activity, they found that in carrying out the work I have described, the personality of the creator emerged strongly, through the use of specific technologies and programs, through file-naming practices, through versioning habits. And finally, order as received provides a solid anchor for additional orders that may stem from the specifically archival pluralization of digital materials, in that additional orders using it as a beginning point can be created and archived by archivists, researchers, and other interested parties.

Thus I would also like to suggest that this notion of order as received begins to answer to current ideas of a more complete preservation of the complexities of recordkeeping in the digital environment. I have already said that it reflects Debra Barr’s “accession units,” and clearly it also answers to the concept of a “final active order.”[49] We have shown in our experiments with virtual orders that it can support Terry Cook’s call for “new conceptual or virtual ‘orders’ (or ‘series’) for different transactions by different creators.”[50] Peter Horsman’s vision of a “future archivology” in which is instantiated the “concept of preservation of existing structures in an open fonds”[51] calls for the flexibility of preservation of many orders as well, as does Eric Ketelaar when he says “[a]t every stage of the record’s trajectory some ‘archiver,’ while activating the record, tells a story. We have to document these stories.”[52] Tom Nesmith, in redefining provenance to take these new and broader views of “the archive” into account, sums it up:

The provenance of a given record or body of records consists of the social and technical processes of the records’ inscription, transmission, contextualization, and interpretation which account for its existence, characteristics, and continuing history.

For the truth is, archives of the future can look to receive and document many more orders of records, derivative of what they have but also original, offering new combinations and making new links. It is not a bad thing to begin with the order we receive.

-----------------------

[1] In this paper I will not explore in depth the details of archival arrangement; I offer only a tentative discussion of an issue raised by digital records. I also set aside explicit consideration of digital organizational records except as they are a special case of individuals’ records (since desktop records are, in fact if not in law, individually-created records, even when they reflect multiple creators and even in an organization). When I use the term “archives” here I mean the archives charged with preserving the collective memory of a social or political entity.

[2] S. Muller, J.A. Feith, and R. Fruin, Manual for the Arrangement and Description of Archives, 2nd ed., trans. Arthur H. Leavitt (Chicago: SAA, 2003, reprint of 1940 ed.), hereafter MFF.

[3] Richard Pearce-Moses, A Glossary of Archival and Records Terminology (SAA, 2005). Accessed from All citations to this work are reachable from the digital version by term.

[4] See Oliver W. Holmes, “Archival Arrangement—Five Different Operations at Five Different Levels,” American Archivist 27(1), January 1964, 21-41; T.R. Schellenberg, “Archival Principles of Arrangement,” American Archivist 24(1) January 1961, 11-24.

[5] The Australian records continuum approach does recognize these complexities and the necessity to build the ability to record them into digital recordkeeping systems; long before digital records, Peter Scott articulated the notion of “order as last found” (see Colin Smith, “A Case for Abandonment of ‘Respect’,” Archives and Manuscripts 14(2), 1986, 154-168; “A Case for Abandonment of ‘Respect’, Part II,” Archives and Manuscripts 15(1), 1987, 20-28).

[6] Debra Barr, “Protecting Provenance: Response to the Working Group on Description at the Fonds Level,” Archivaria 28 (Summer 1989), 141-145, makes an argument for preserving strict original order in “accession units” of continuing series so as to signal piecemeal transfer of materials (there is an interesting correspondence here with the OAIS concept of the Submission Information Package agreement for transfers to digital repositories). The MPLP approach to processing also favors acceptance of original order if at all possible, and its authors found in a literature review that a trend was developing in support of this position (Mark Greene and Dennis Meissner, “More Product, Less Process: Revamping Traditional Archival Processing,” American Archivist 68(2), 2005: 208-263; p. 213).

[7] See summaries in Terry Cook, “What is Past is Prologue: A History of Archival Ideas Since 1898, and the Future Paradigm Shift,” Archivaria 43 (Spring 1997) and John Ridener, From Polders to Postmodernism: A Concise History of Archival Theory (Duluth, MN: Litwin Books, 2008); both, however, are more concerned with appraisal than arrangement.

[8] Kathleen D. Roe, in Arranging and Describing Archives and Manuscripts (Chicago: SAA, 2005), advises that the arrangement of manuscripts and personal papers is frequently nonexistent or impenetrable, necessitating arrangement by role if personal, type of material, time period—with a view to assisting access.

[9] MFF, 52.

[10] Jenkinson, Manual, section “Arrangement: Procedure,” [p. 66]

[11] Schellenberg, “Archival Principles.” As we shall see, the break in concern for order at the lower levels of documentation marked not so much a change in theoretical stance as a change in the state of the records received.

[12] MFF, section 20, pp. 70-71, quoted in the epigraph of this paper.

[13] Hilary Jenkinson, A Manual of Archive Administration... (Oxford: Clarendon Press, 1922), 88. Perhaps independently, JoAnne Yates has referred to “the bound volumes and files” of her three case studies of business documentation as “only the skeletal remains of the communication systems that once controlled and coordinated these companies” from which “the muscle and flesh can be deduced….Nevertheless,” she observed, these records “provide both explicit and implicit evidence of changes in these systems” (Control Through Communication: The Rise of System in American Management [Baltimore: Johns Hopkins, 1989], xix). Yates’s problem was to interpret the reconstructed order she found in the archives, which she did by seeking explicit suggestions of change in the genres and documentary form.

[14] Or perhaps at some historical present.

[15] Holmes, “Archival Arrangement.” Because paper records must be arranged sequentially, an ordering is important at Holmes’s higher levels as well. Rejecting record groups for series still doesn’t get around this, since paper series must be “disciplined” in some way because they have to be sequential, hence to obey some kind of temporal-plus-flattened-organizational-chart order.

[16] Hugh Taylor almost quotes Jenkinson in observing of preserving an accession order: “An older archival tradition had archivists or clerks spending a great deal of time marking each document in a collection with an accession number in the precise order in which the collection was received so that subsequent re-arrangement might be reversable. There may still be some merit in this for series of documents varied in nature yet apparently related in some complex way. For straightforward series it is a waste of time.” See The Arrangement and Description of Archival Records (Munich: K.G. Saur, 1980), 24.

[17] It should be noted that on the evidence of Jenkinson’s detailed description of working with records in “order of arrival,” it is likely that elaborate notes were also accumulated during the remainder of the arrangement process.

[18] For two examples of reorderings wiping out previous meanings, see Peter Horsman, “Dirty Hands: A New Perspective on the Original Order,” Archives and Manuscripts 26(1), 1994, 42-53 and John Randolph, “On the Biography of the Bakunin Family Archive,” in Antoinette Burton (ed.), Archive Stories: Facts, Fictions and the Writing of History (Duke, 2005).

[19] Jenkinson far preceded current minimal processing ideals in pointing out how laborious and impractical this practice really is if carried out as stated.

[20] We have already seen what MFF, Jenkinson, and Schellenberg have to say. Since 1977 the Society of American Archivists has been producing basic manuals of archival practice, and in spite of lip service to “original order,” the practice of ignoring any “order as received” in order to create a usable order has been consistently advised. See David B. Gracy II, Archives and Manuscripts: Arrangement and Description (Chicago: Society of American Archivists, 1977); Fredric M. Miller, Arranging and Describing Archives and Manuscripts (Chicago: SAA, 1990); and Kathleen D. Roe, Arranging and Describing Archives and Manuscripts (Chicago: SAA, 2005).

[21] Schellenberg observed that preserving filing errors, though it might be interesting in some cases and would even be advisable in the case of the peccadilloes of the famous, would ordinarily be “obviously carrying logic too far” (“Archival Principles”).

[22] Thomas Tanselle brought a needed correction to historical editing practice in the 1970s when he reminded people that correcting George Washington’s spelling did nothing for the authenticity of a historical edition of his letters. See Heather MacNeil’s review of the evolution of literary and historical editorial practice in “Picking our Text: Archival Description, Authenticity, and the Archivist as Editor,” American Archivist 68(2), 2005, 264-278.

[23] This is not a coincidence: the revolution in documentary business communication and the management of business records was directly adopted by American government through the advice of management experts.

[24] Yates, Control through Communication, 56-63; Schellenberg, Modern Archives, 81-82.

[25] See Schellenberg, “American Filing Systems,” (Modern Archives, 78-93). Modern Archives is a significant source here because Schellenberg examined the practices of English-speaking countries in some detail—the book is dedicated to the archivists of Australia, where Schellenberg had been teaching, and it offers comparisons throughout.

[26] Yates, Control, 61-62.

[27] Schellenberg, Modern Archives, 91.

[28] Patricia Galloway, “Big Buckets or Big Ideas: Classification vs. Innovation on the Enterprise 2.0 Desktop,” ARMA Education and Research Foundation Research Report, 2008; accessed 12/1/2011 from

[29] This is true of now dominant random-access media (magnetic or optical disks or flash memory units), although these media are now optimized to keep files in predominantly contiguous stretches, but it is not generally true of formerly dominant sequential tape media. It should be noted that the presence of a fragmentary scattering of files will itself be a sign that the order stems from an active working environment.

[30] In the Apple Macintosh OSX environment, the Finder utility displays the file system as a literal row of file folders.

[31] See Dan Farmer and Wietse Venema, Forensic Discovery (Addison-Wesley, 2005), Chapter 1, section 1.7, accessed 11/27/2011 from

[32] The Open Archival Information System Submission Information Package specification only suggests that the specific occasion of deposit be documented, but not how the transfer medium is formatted or how it came to be prepared and offered.

[33] Some collecting archives have made arrangements to collect a donor’s outmoded computer at the point when an upgrade occasions its disposal, but given the fact that even powering up the computer can change its stored contents, are now likely to request that the computer not be “tested” to see if it is functional.

[34] So far the only self-deposit that is widely solicited by archives is the deposit of materials by academics into institutional repositories, but at some leading American universities this practice is being mandated and academics are being provided with tools to do it easily, which enforce repository policies directly or permit the mediation of metadata technicians. This was the original idea behind the creation of the DSpace repository software by the MIT Libraries.

[35] Users now often choose to safeguard versions by emailing them to themselves at an external mailbox.

[36] For archival plans in working with creators of personal records see Lucie Paquet, “Appraisal, Acquisition, and Control of Personal Electronic Records: From Myth to Reality,” Archives and Manuscripts (November 2000), 71-91; and the Paradigm Project Workbook on Digital Private Papers (2006) accessed 12/1/2011 at . It would be interesting to ascertain whether creators whose collections are donated or sold to collecting repositories keep copies of their deposited materials for themselves, especially considering the fact that as yet few if any collecting archives accord creators online access to their own materials.

[37] As we begin to see some archival digital repositories taking on storage tasks for other archives (cf. the Washington State Digital Archives, which is said to be accepting deposits from neighboring states; the multi-state PEDALS project, and Florida’s DAITSS project), it is possible to consider the archives’ serving an agency or author as a trusted cloud environment, whether as a funding mechanism as in the first case or as an attraction for donation in the second. This may prove to be a more widely acceptable model than archival supervision of environments it does not control.

[38] For literary holdings there is now a keen interest in the effects of precisely these third parties, sometimes amounting to what may be in effect significant co-authorship.

[39] In the discussion that follows I introduce technological processes capable of preserving the order as received from any digital media, but the primary focus is on legacy removable media.

[40] Greene and Meissner, “More Product, Less Process.”

[41] It is interesting that there has been a good deal of discussion of how archivists might strive to claim credit for their work, yet apart from the discussion of colophons as a means of identifying collection processors and their work (Michelle Light and Tom Hyry, “Colophons and Annotations: New Directions for the Finding Aid,” American Archivist 65 (Fall-Winter 2002), 216-230), archival arrangement has remained invisible and normative.

[42] Compare Barr, “Protecting Provenance,” who with respect to several examples of (re)arrangement uses the terms “destroy” and “obscure” to express their effect on the research value of collections. Her special concern is the interfiling of accretions (“accession units”) to series and the resulting loss of context and temporal process. For researcher interest, see Matthew Kirschenbaum, Mechanisms: New Media and the Forensic Imagination (Cambridge: MIT Press, 2008).

[43] It is possible to go further by making a copy of the magnetic flux patterns on a disk, so that the underlying formatting of the original disk is captured, but it is then necessary to recover the information known to the native operating system using statistical software. This kind of imaging is done in order to recover unknown disk formats so that an emulated controller can then extract a conventional disk image. It may also be used in the adversarial forensic environment to seek evidence that may have been hidden purposefully.

[44] Although forensic software exists to perform such recovery, it is also possible to open a simple disk image made with the UNIX utility dd—which behaves like a single large file—to find “erased” files using more pedestrian utility software designed for system maintenance work and programming.

[45] The CLIR report Computer Forensics and Born-Digital Content in Cultural Heritage Collections, by Matthew Kirschenbaum, Richard Ovenden, and Gabriela Redwine (CLIR), provides a useful overview of this work. For more technical detail for an archival setting and a discussion of forensic software, see Kam Woods, Christopher Lee, and Simson Garfinkel, “Extending Digital Repository Architectures to Support Disk Image Preservation and Access,” JCDL ’11 (ACM, 2011), 57-66. The latter source is more focused on the contents of system hard drives than on legacy removable media.

[46] Digital forensics specialists distinguish several layers of evidence as manifest in a disk image: digital geology, which reflects the autonomous activities of the operating system of which the user generally has no inkling; digital archaeology, which reflects the intentional actions of the user within the user environment; and digital stratigraphy, which views both of these layers in temporal sequence. See Dan Farmer and Wietse Venema, Forensic Discovery (Addison-Wesley, 2005), Chapter 1, section 1.7, accessed 11/27/2011 from

[47] It should be noted that digital media units bear a significant resemblance to Barr’s “accession units” and clearly preserve the kind of information about work practice that she wished to see preserved.

[48] “Order for free” is an expression fostered by Stuart Kauffman to capture the notion of self-organization arising from complex systems; see At Home in the Universe: The Search for the Laws of Self-Organization and Complexity (New York: Oxford University Press, 1995).

[49] Colin Smith, “A Case for Abandonment.”

[50] “What is Past is Prologue: A History of Archival Ideas Since 1898, and the Future Paradigm Shift,” Archivaria 43 (Spring 1997).

[51] “Dirty Hands,” 51.

[52] “Tacit Narratives: The Meanings of Archives,” Archival Science 1 (2001), 131-141; 140.
