Archivists and Technology
Sonia Yaco
Old Dominion University, Norfolk, Virginia

Paper presented at the 2007 Annual Conference of the International Council on Archives Section on University and Research Institution Archives, University of Dundee, Scotland, August 15, 2007

Archivists are at a strategic crossroads. As members of a profession with a long history, we find it easy to rely on our fine-tuned skills for appraising, describing and providing access to our collections. However, in so doing we are putting ourselves at risk. Archivists cannot afford to let our professional abilities stagnate while the world of technology continues to evolve at an ever-quickening rate. A lack of technical skills among archivists bars us from using a variety of new technologies that can serve as tools for processing and for providing better access to our collections. Technical tasks ranging from Web page design to EAD encoding to setting up components such as style sheets and SGML/XML/XSL/HTML parsers for publishing EAD on the Web are increasingly needed to ensure that archives are visible to users. Additionally, understanding new technologies like podcasts and voice recognition software will help us to increase access to our collections.

Today I will discuss two research projects I have conducted on different aspects of using technology to facilitate discovery of our holdings. First, I will discuss a pilot study I conducted evaluating the potential for using voice recognition software in the appraisal and transcription of oral history tapes. [1] Automatic computer transcription of spoken word material could revolutionize archivists' and researchers' intellectual access to this material. Audio content, now hidden in sparsely described catalog records, could have keyword indexes automatically created with voice recognition software.
Low-cost transcripts could be used by archivists evaluating and describing spoken word material as well as by researchers seeking to understand the content of tapes before visiting a remote archives. Second, I will look at the results of a survey I conducted with sixteen archives in the United States examining barriers to the implementation of EAD. One of the major findings from that survey is that there is a gap between the technology needed to encode and publish EAD and the skills of many archivists.

Voice recognition software, simply put, is software that transcribes spoken words into written text. Use of this software has the potential to increase intellectual access to the riches of oral histories and other audio tapes. Having computers automatically create transcripts could greatly increase the speed at which archivists process tapes, increase the richness of their descriptions and make inexpensive transcripts available to researchers. Sound tapes, which typically come into archives on analog cassette or reel-to-reel tape, could be converted to digital format and then automatically transcribed with this software.

Using voice recognition software begins with a user speaking into a microphone attached to a computer. There are two aspects to voice recognition technology: acoustic phoneme recognition, in which the software maps sounds to words, and statistical language modeling, which weighs the probability that any given word will appear in a sentence. Together, these two components make it possible for the software to choose the most statistically likely word among similar-sounding ones.

In my pilot study I evaluated two types of voice recognition software. Dragon NaturallySpeaking™ is speaker dependent, designed to recognize the voice of a specific speaker. AudioMining is speaker independent, intended to recognize multiple speakers' voices.
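The interplay of the two components described above, acoustic recognition and statistical language modeling, can be illustrated with a very small sketch. This is not how Dragon NaturallySpeaking or AudioMining is implemented; the candidate words, the scores and the function name `best_transcript` are all invented for illustration. The sketch simply multiplies an acoustic score by a word-sequence probability and keeps the most likely sentence.

```python
# Toy illustration only: every score and word below is invented,
# not taken from any real voice recognition product.

# Acoustic step: for each sound segment, candidate words with a score
# for how well each word matches the sound (higher is better).
acoustic_candidates = [
    {"marital": 0.5, "martial": 0.5},    # the two words sound alike
    {"property": 0.9, "properly": 0.1},
]

# Language-model step: probability of a word given the previous word.
# "<s>" marks the start of the sentence.
bigram_probability = {
    ("<s>", "marital"): 0.6,
    ("<s>", "martial"): 0.4,
    ("marital", "property"): 0.8,
    ("marital", "properly"): 0.01,
    ("martial", "property"): 0.05,
    ("martial", "properly"): 0.01,
}


def best_transcript(candidates, bigrams, unseen=1e-6):
    """Return the word sequence with the highest combined score."""
    # Each hypothesis is (score, words); start with an empty sentence.
    hypotheses = [(1.0, ["<s>"])]
    for segment in candidates:
        extended = []
        for score, words in hypotheses:
            for word, acoustic in segment.items():
                # Word pairs missing from the table get a tiny probability.
                language = bigrams.get((words[-1], word), unseen)
                extended.append((score * acoustic * language, words + [word]))
        hypotheses = extended
    best_score, best_words = max(hypotheses, key=lambda h: h[0])
    return best_words[1:]  # drop the start marker


print(best_transcript(acoustic_candidates, bigram_probability))
```

In this toy case the acoustic scores cannot separate "marital" from "martial", so the language model decides. Real systems search far larger sets of candidates and do not enumerate every combination as this sketch does.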
The spoken material I used as test data in this study was a series of cassette tapes containing interviews conducted by University of Wisconsin-Madison emerita Law Professor June Weisberger in 2004. Weisberger interviewed eight people about their role in the creation of the 1984 Wisconsin Marital Property Reform Bill, which sought to create equal rights for women to property during a marriage. The nine speakers on the tapes are college-educated Caucasians ranging in age from 60 to 80. Eight of the speakers are women and one is a man. They were raised in the Midwestern or eastern part of the United States. For all of these speakers, English is their first language.

Table 1 shows the transcription of the first sixty seconds of one of the tapes by three different transcription methods: manual, Dragon NaturallySpeaking™ and AudioMining. There is significant variation among the three methods of transcription. The Dragon NaturallySpeaking™ transcription bears almost no resemblance to the manual transcription. Only three phrases from the entire 60-second transcript were accurately transcribed: 'where we were', 'all of these resources' and 'actually became one of our'. None of these phrases would be helpful in determining the subject of the tape. There is much more of a correlation between AudioMining's transcription and the manual transcription. Although the first few phrases are not transcribed accurately, important keywords in the rest of the transcript, including 'Divorce Reform Act,' are correctly transcribed.
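A side-by-side comparison like the one above could, in principle, be partly automated. The following is a minimal sketch, assuming both transcripts are available as plain text; it is not the method used in this study. The function name `shared_phrases` and the sample sentences are invented, and the sample text is not taken from Table 1. It uses Python's standard `difflib` module to find runs of words that appear, in the same order, in both transcripts.

```python
from difflib import SequenceMatcher


def shared_phrases(manual, automatic, min_words=3):
    """Return runs of at least min_words words found in both transcripts.

    Only runs that line up in the same order in both texts are reported.
    """
    manual_words = manual.lower().split()
    automatic_words = automatic.lower().split()
    matcher = SequenceMatcher(None, manual_words, automatic_words, autojunk=False)
    phrases = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            start = block.a
            phrases.append(" ".join(manual_words[start:start + block.size]))
    return phrases


# Invented sample text, NOT the transcripts from Table 1.
manual = "we talked about where we were and how the bill passed"
automatic = "he walked a boat where we were in how to build past"

print(shared_phrases(manual, automatic))
```

Counting such shared phrases gives only a rough sense of accuracy. As noted above, a phrase can be transcribed correctly and still say nothing about the subject of the tape.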
Findings

Dragon NaturallySpeaking™ produced a transcript that has little relation to the actual contents of the spoken interview. Few important keywords were transcribed. AudioMining was much more accurate in transcribing Professor Weisberger's interviews. It created transcripts that conveyed a good sense of the general topics covered in the interviews. Many important keywords, including 'divorce', 'commission on women', 'marital property reform', and even 'Weisberger', were correctly transcribed. However, AudioMining is very expensive, approximately $5,000 [2], and requires a great deal of technical expertise to install and use.

My criterion for success in this study was that the software accurately create a written transcription of the contents of the tapes that could be used by archivists evaluating or appraising them, and by researchers wanting access to the information contained on the tapes without listening to them. In their current versions, the voice recognition programs I evaluated in this pilot study produce results that are too inexact to provide substantial assistance to archivists or researchers. However, the keyword identification done by AudioMining shows great promise. Combined with tools that analyze the frequency of keywords in a text, AudioMining could allow researchers to discover relevant collections of spoken word material. [3]

Several new voice recognition products have been released in the last year, such as Sonic Foundry's Mediasite [4], which contains a voice recognition component with keyword searching capabilities for videotape recordings, and Blinkx [5], a web search engine that uses automatic speech recognition technology to crawl audio and video content on multimedia web sites. The continued development of video, audio, commercial and personal computer voice recognition software suggests that we are tantalizingly close to the development of technology that could provide keyword search capability for oral histories.
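The idea of pairing automatic transcripts with keyword frequency analysis can be sketched briefly. This is a minimal illustration under stated assumptions, not a description of AudioMining, Grokker, Mediasite or Blinkx. The tape identifiers, the sample transcripts, the stop-word list and the function names `keyword_counts` and `build_index` are all hypothetical.

```python
from collections import Counter
import re

# A small, illustrative stop-word list; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "to", "in", "was", "for", "that"}


def keyword_counts(transcript):
    """Count how often each non-stop-word appears in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


def build_index(transcripts):
    """Map each keyword to the set of tapes whose transcripts contain it."""
    index = {}
    for tape_id, text in transcripts.items():
        for word in keyword_counts(text):
            index.setdefault(word, set()).add(tape_id)
    return index


# Invented sample transcripts keyed by hypothetical tape identifiers.
transcripts = {
    "tape-01": "The commission on women studied marital property reform.",
    "tape-02": "Divorce reform was debated before the property bill.",
}

index = build_index(transcripts)
print(sorted(index["property"]))   # tapes that mention "property"
print(keyword_counts(transcripts["tape-01"]).most_common(3))
```

Even an imperfect transcript can feed an index like this one, because a keyword only has to be recognized once per tape for that tape to be found.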
Now that we've discussed one technology that could help provide access to some of our collections in the future, I'd like to discuss the problems involved in implementing a current technology tool designed to increase access. That tool is Encoded Archival Description (EAD). This markup language was designed as a way to standardize finding aids and, in so doing, facilitate discovery of archival collections across multiple institutions in online environments.

What prevents archives that want to implement EAD from doing so? Nine years ago, when EAD was first released, the answer was that the technology simply was not there: no software was readily available for editing or viewing EAD. Today the answer appears to be more complex. This spring I surveyed sixteen archival repositories to try to identify what the obstacles are and how they might be removed. Through this survey I was able to identify three main barriers to implementing EAD and have explored some possible solutions to those barriers.

To see what issues there might be among current implementers, I looked at the 78 EAD Implementors Listings on SAA's website. I identified several trends from them. Implementers use a complicated software workflow; some of them use six different packages to mark up and publish their finding aids in EAD. Numerous institutions had server and technology problems in the implementation process. Of the solutions found for these problems, the one mentioned most consistently was the use of outside resources for various phases of the process.
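To give a sense of what marking up and publishing a finding aid involves, here is a minimal sketch. It assumes a heavily simplified EAD fragment and uses Python's standard library in place of the style sheets and parsers an institution would actually use. The collection title, the item titles and the function name `ead_to_html` are invented, and the fragment omits elements that a complete, valid EAD file requires.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, invented finding aid. A real EAD file needs additional
# required elements (for example a header) and would be validated.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Sample Oral History Collection</unittitle>
      <unitdate>2004</unitdate>
    </did>
    <dsc>
      <c01 level="item"><did><unittitle>Interview 1</unittitle></did></c01>
      <c01 level="item"><did><unittitle>Interview 2</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>
"""


def ead_to_html(ead_xml):
    """Turn the simplified EAD fragment into a short HTML listing."""
    root = ET.fromstring(ead_xml)
    title = root.findtext("archdesc/did/unittitle", default="Untitled")
    date = root.findtext("archdesc/did/unitdate", default="undated")
    items = [
        component.findtext("did/unittitle", default="")
        for component in root.iterfind("archdesc/dsc/c01")
    ]
    lines = [f"<h1>{escape(title)} ({escape(date)})</h1>", "<ul>"]
    lines += [f"  <li>{escape(item)}</li>" for item in items]
    lines.append("</ul>")
    return "\n".join(lines)


print(ead_to_html(EAD_SAMPLE))
```

Even this toy version involves two separate steps, encoding the description and transforming it for the Web, which hints at why implementers end up chaining several software packages together.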
[1] This study is described in more detail in Sonia Yaco, 'The Potential for Use of Voice Recognition Software in Appraisal and Transcription of Oral History Tapes,' Association for Recorded Sound Collections Journal 2007 (Fall), 214:225.
[2] The full retail purchase price of AudioMining in July 2007 was $5,000.
[3] An example is the search engine Grokker, http://grokker.com.
[4] 'Mediasite.com,' http://www.mediasite.com, (accessed: 13 April 2007).
[5] 'Video Search Engine Blinkx,' http://www.blinkx.com, (accessed: 21 October 2007).