ICA/SUV

Data Warehouses and Archives:
Challenges and Opportunities for University Archivists

Rebecca Schulte
University Archivist, University of Kansas

Paper Presented at the 2007 Annual Conference of the International Council on Archives Section on University and Research Institution Archives, University of Dundee, Scotland, August 14, 2007



The information that I will be presenting here today has its roots in work that I did as a 2004 recipient of an NHPRC Electronic Records Fellowship. For three consecutive years the National Historical Publications and Records Commission, a United States governmental agency, provided funding for archives practitioners to do research, a luxury that few of us can afford but something that some of us in academic institutions are expected, indeed required, to do because of our standing as members of the university faculty. This program presented the opportunity for people "in the trenches" to step out of the daily routine of responding to reference questions, organizing records, and directing staff projects, to look at the big picture, and to consider their jobs and what they do in an analytical way.

The topic that I chose to study during my Fellowship is the data warehouse, a fixture on many college and university campuses and something that I knew nothing about. The conference organizers have asked me to focus on these two key questions for this presentation:
  1. What are the values of data warehousing in a university or college setting; and
  2. What are the values of data warehousing to archival and records management practices.
But before I delve more deeply into the mysteries of data warehouses, I'd like to tell you a bit about the University of Kansas and my department, the University Archives.

KU is located in Lawrence, Kansas, near the center of the United States and about 30 miles from the large city of Kansas City, which straddles the Missouri/Kansas border. KU was established in 1866 and has been a member of the Association of American Universities since 1909. We are a Carnegie Doctoral/Research Extensive University with 85 academic departments; 90 research centers, laboratories, and units; nearly 30,000 students; and more than 4,500 faculty and staff.

The university has a substantial and growing international student population as well as a well-established program of collaboration with many other countries including Peru, Costa Rica, and Italy.

The University Archives is located within the Kenneth Spencer Research Library and is part of the University-wide library system. The Libraries, along with Information Technology and Networking and Telecommunications Services, report to the Vice Provost for Information Services.

Established in 1968, the Archives encompasses nearly 20,000 linear feet of administrative records, serials, books, newspapers, and memorabilia, over one million photographs, slides and negatives, many thousands of feet of movie film, audio and video tape and more than 450 personal paper collections of faculty, staff and alumni. With 3 staff members and 3 student assistants who work part-time, we do extensive indexing, work with staff in Spencer's Processing Department to encode finding aids in EAD and create collection level records in MARC for the University's online catalog. We also do reference, public programming and work with the faculty to incorporate our resources into their curriculum. The Archives has functioned in a records management capacity since it was established but does not have a certified records manager on staff.

As I began to prepare for this presentation by considering these two questions, I realized that it's difficult to separate the values of data warehousing in colleges and universities from the values of data warehousing to archival and records management practices, because the values that data warehouses bring to the educational enterprise are also values that archives and records management programs require. So at times I may be repeating the same ideas. But before I begin describing these values, I think I should provide a few basic definitions.

In 1992, William H. Inmon published what has been called the most widely recognized definition of a data warehouse:

"a subject-oriented, integrated, time variant, non-volatile collection of data in support of management's decision-making process."


Several years later, John Rome, the Associate Provost for University Technology at Arizona State University, wrote this working definition for a university environment:

"an integrated repository of enterprise-generated, departmentally captured, and/or externally acquired data used to facilitate data access, reporting, and tactical/strategic decision making."

He further defines a good data warehouse as one that "can identify data problems in time to avoid them, and can locate opportunities that an institution might otherwise miss."

Examples of data found in an academic data warehouse include student directory, admissions, and demographic information; information about sponsored programs; academic courses; and financial data.

For a data warehouse to be successful, standardization in reporting and consistency of information are required. As a result, the information can be considered reliable and authoritative. As an added bonus, staff are given a single place for their reporting needs, which saves valuable time. The data warehouse provides valuable historical data, and there is decreased reliance on departmental "shadow systems" that may not be as reliable. In addition, transactional systems are protected from having to allocate resources for reporting. This last value is one that I can appreciate as a member of a university department that has very limited resources, and it makes me think about ways that we can capitalize on similar work done by another campus entity such as the data warehouse. From these descriptions I think it is fairly easy to identify some of the values that data warehouses bring to the academic enterprise.

Early in my Fellowship I wanted to get a wider view of how data warehouses operate on different campuses, so I made a few site visits. I visited Arizona State University, the Massachusetts Institute of Technology, Harvard, and Tufts and interviewed both the archivists and the data warehouse systems managers. Their comments made me realize that even though these institutions are very different, they share some commonalities.

Data has become "mission critical." One data warehouse director went so far as to say that their purpose was to "capture history." I remember thinking, "Wait a minute; I thought that was my purpose." Several managers noted that much time and many resources are spent on staff training. Since I had just come off a campus-wide committee that had spent quite a bit of time exploring how to inform and train staff to deal with electronic records, that comment also struck a chord.

Several organizations and tools have been developed to assist with data warehouse management. One, Data Warehousing in Higher Education, maintains an online directory of higher education institutions that use data warehouses. The metadata captured about each warehouse listed in the directory includes the URL for the data warehouse, the subject areas covered, the database, the data modeling tool, the schema type, and contact information.
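
To make the shape of such a directory entry concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the class name, field names, and sample values are hypothetical and are not taken from the actual Data Warehousing in Higher Education directory; only the list of captured elements comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WarehouseDirectoryEntry:
    """One hypothetical directory record; field names are illustrative only."""
    institution: str
    url: str                      # URL for the data warehouse
    subject_areas: List[str] = field(default_factory=list)
    database: str = ""            # database product in use
    data_modeling_tool: str = ""
    schema_type: str = ""
    contact: str = ""             # contact information


def find_by_subject(entries, subject):
    """Return the entries whose subject areas include `subject` (case-insensitive)."""
    wanted = subject.lower()
    return [e for e in entries
            if wanted in (s.lower() for s in e.subject_areas)]


# Invented sample records, for illustration only.
entries = [
    WarehouseDirectoryEntry(
        institution="Example University",
        url="https://warehouse.example.edu",
        subject_areas=["Admissions", "Financial"],
        schema_type="star",
    ),
    WarehouseDirectoryEntry(
        institution="Sample College",
        url="https://dw.example.org",
        subject_areas=["Sponsored Programs"],
    ),
]

matches = find_by_subject(entries, "admissions")
print([e.institution for e in matches])
```

A record like this is simply structured description of a resource, which is familiar territory for anyone who has written a collection-level catalog record.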

Another resource is the Higher Education Data Warehousing Forum. A network of higher education colleagues at 202 institutions worldwide, the HEDW Forum meets once a year and also sponsors a listserv.

I want to take a few minutes to describe DEMIS, KU's data warehouse. DEMIS stands for Departmental Executive Management Information System. A few key elements of the system are:
  • Access is by permission only - so security is maintained
  • Note that the "official record" does not reside on DEMIS - significant as one appraises the data warehouse for university record keeping purposes
  • Staff can request individual reports and use the Web front-end for long-term access
Examples of questions that can be answered by using DEMIS include:

          How many students were enrolled by Sept. 1?
          How many women were on the Geology faculty in FY'03, FY'04, and FY'05?
          How many Master's degrees were conferred in the School of Religion
          in any given year?
          What was the rate of query activity on DEMIS?
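
As an illustration of how a question like the Geology faculty one might be put to a warehouse, here is a minimal sketch using Python's built-in sqlite3 module. Everything specific in it is an assumption on my part: the table name `faculty_snapshot`, its columns, and the sample rows are invented for the example and do not reflect DEMIS's actual schema or data.

```python
import sqlite3

# Hypothetical warehouse table: one row per faculty member per fiscal year.
# Table and column names are illustrative, not the real DEMIS schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE faculty_snapshot ("
    " fiscal_year INTEGER, department TEXT, person_id INTEGER, gender TEXT)"
)

# Invented sample rows, for illustration only.
rows = [
    (2003, "Geology", 1, "F"),
    (2003, "Geology", 2, "M"),
    (2004, "Geology", 1, "F"),
    (2004, "Geology", 3, "F"),
    (2005, "Geology", 1, "F"),
    (2005, "Geology", 3, "F"),
    (2005, "Geology", 4, "F"),
    (2005, "History", 5, "F"),
]
conn.executemany("INSERT INTO faculty_snapshot VALUES (?, ?, ?, ?)", rows)


def women_on_faculty(conn, department, fiscal_years):
    """Count women faculty in one department for each requested fiscal year."""
    placeholders = ",".join("?" for _ in fiscal_years)
    query = (
        "SELECT fiscal_year, COUNT(*) FROM faculty_snapshot "
        "WHERE department = ? AND gender = 'F' "
        f"AND fiscal_year IN ({placeholders}) "
        "GROUP BY fiscal_year ORDER BY fiscal_year"
    )
    # Years with no matching rows are simply absent from the result.
    return dict(conn.execute(query, [department, *fiscal_years]).fetchall())


counts = women_on_faculty(conn, "Geology", [2003, 2004, 2005])
print(counts)  # {2003: 1, 2004: 2, 2005: 3}
```

The point of the sketch is that the question can be answered only because a row is kept for every fiscal year; a transactional system that overwrote last year's record could not answer it.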

As I noted earlier in my presentation, the capture of metadata is a valuable asset. Included are data definitions, line-by-line accounting, and information about comparisons and calculations.

I also discovered during my research that a suite of useful tools has been developed by the National Center for Higher Education Management Systems to assist with metadata creation. CHESS, the Consortium for Higher Education Software Services, provides the following tools: Data Definitions for Colleges and Universities, 2nd ed., 2004; CHESS Taxonomy of College and University Activities, 3rd ed., 2004; and the CHESS MetaData Administrator, first release 2004.

Reviewing these products first made me wonder whether here, too, there could be applications that would be useful within the archives world.

Which brings me round to the second question being considered for this presentation:

What are the values of data warehouses to archival and records management practices?

When I began my research several years ago, it was clear that little had been written about the relationship between archives and data warehouses. But as I consider this question of shared values, several things do come to mind.

First, and most important, the institution's data warehouse should share the vision and mission of the parent institution (in my example, the university). In order to be truly successful, both the archives and the data warehouse must begin with their institution's stated purpose.

Second, the staff involved in the development and maintenance of the data warehouse are information professionals, as are archivists. One main difference is that they are trained from the beginning to work with data in an electronic medium, while many of us were not.

I've already spoken about our shared concerns about security, authenticity, and integrity, but they bear repeating, as these are elements that cannot be forgotten.

Finally, let's switch our focus a bit to the user. My institution has not yet begun to put enough resources into the preservation of and access to electronic records being generated by offices and staff. Conversely, within that same institution, access is one of the keys to the success of our data warehouse. That department has spent much time and effort to develop a staff training program and make its Web front-end user friendly.

I think the historical value is yet to be determined in terms of the data held in the warehouse. Much will depend upon how administrators will want to use it and how much of it is archived. Nevertheless, the data warehouse is still a part of the modern university and a chapter about its role will need to be included in any institutional histories written in the future.

I am not by any stretch of the imagination an expert on electronic records, but it seems to me that some of the basics of electronic records management are in place within the data warehouse environment, and it's up to us to make the connection with the data warehouse experts at our own institutions. We must also take the initiative to make the case to our administrators that resource sharing between the institution's data warehouse and the records management/archives program could be advantageous to the university.

At this point this is just a glimmer of an idea but I've been told that there may be possibilities here to explore further.
