HARRISBURG, Pa. (WHTM) — The State Archive is getting ready to move to a new building, under construction along Sixth Street in Harrisburg. But amidst all the packing (70,000 boxes), taking of inventory, and barcoding, they still must see to the daily round: taking in, sorting, storing and digitizing documents.
“We have about 19 million documents digitized,” says State Archive Director David Carmicheal, “but we have an estimated 250 million documents in the archives.”
That works out to a bit over seven percent of their collection digitized, and that’s just in the Archive Building in downtown Harrisburg. They also have a warehouse.
“This building has 250 million documents in it, my State Records Center has about 750 million documents in it, so we’re caring for a billion pieces of paper,” Carmicheal said.
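For readers who want to check the arithmetic, here is a minimal Python sketch using the rounded estimates Carmicheal quotes. The variable names are illustrative only and do not come from the Archive.

```python
# Rounded estimates quoted in the article; variable names are illustrative.
digitized = 19_000_000          # documents digitized so far
archive_building = 250_000_000  # documents in the Archive Building
records_center = 750_000_000    # documents in the State Records Center

# Total paper holdings across both sites.
total_holdings = archive_building + records_center

# Share of the Archive Building's collection that has been digitized.
share_of_building = digitized / archive_building * 100

print(f"Total holdings: {total_holdings:,}")
print(f"Digitized share of Archive Building: {share_of_building:.1f}%")
```

The sum gives the billion pieces of paper Carmicheal mentions, and the ratio gives the "bit over seven percent" figure (about 7.6 percent).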
Conserving hard copy documents gets harder, as cheap wood pulp paper nudges out more expensive rag paper.
“The documents we have here that are a couple of hundred years old are actually easier to preserve than the ones that are modern because the paper is so much better,” Carmicheal said.
But with new technology, they are at least no longer trapped in The Red Queen’s race, running as fast as they can just to stay in the same place.
“Now we’re receiving electronic records from the agencies that we’re collecting from, and they’re already electronic, so now we don’t have to do that step of digitizing.”
Digital documents save a lot of space, and a lot of work, but they come with their own set of problems. Obsolescence ranks high on the list.
“If you give me a piece of paper, I can put it on the shelf and leave it for a hundred years,” Carmicheal said. “As long as it’s not got too much light and heat, I’ll come back and I can still read it. If you give me a Word document on a thumb drive and come back in a hundred years, there’s no chance you’re going to be able to read that Word document on the thumb drive.”
Digital documents need complicated preservation actions to make them last. The choice of format is particularly important.
“We do try to use formats that are specially formulated to not disappear,” Carmicheal said. “For instance, everybody uses PDF documents. There’s actually an archival version of PDF, PDF/A, that is completely self-contained. There’s no other software to read, it just needs itself.”
One thing PDF/A does is embed fonts in the document file so that if, for example, you put a paragraph into a letter using that Klingon font you picked up at a Star Trek convention, it will show up that way even if the computer reading it doesn’t have that font in its own font folder. Most word processing programs have an option to create a basic PDF/A, just by checking a box in the export options. (Yes, do try this at home.) As Carmicheal notes, that’s just one of the things they can do to refresh and preserve digital documents.
But no matter what technology they have for squeezing more material into a given space, at some point archivists have to say “this much, and no more.”
“We ultimately preserve only about four percent of all the records the state produces,” Carmicheal explained. “Most records have a very short lifespan, and we don’t need them anymore. We only collect stuff that you’re going to need for a very long time.”