August 2, 2004

Mr. Beagrie, in modern societies information is expected to be digitally available. Why is this a threat to the future viability of this information?

Beagrie: In the right conditions, papyrus or paper can survive by accident or through benign neglect for centuries or, in the case of the Dead Sea Scrolls, for thousands of years. It takes hundreds of years for languages and handwriting to evolve to the point where only a few specialists can read them.

In contrast, digital information will never survive and remain accessible by accident: it requires ongoing active management. The information and the ability to read it can be lost in a few years. Storage media such as paper tape, floppy disks, CD-ROM, and DVD evolve and fall out of use rapidly. Digital storage media have relatively short archival life-spans compared to other media. As the volumes, heterogeneity, and complexity of digital information grow, this requirement for active management becomes more challenging and more critical to a wider range of organisations.

How real and how urgent is this threat?

Beagrie: The threat is very real and insidious and will eat away at the future of our cultural heritage, knowledge economies, and information society if we fail to address it. Statistics on current losses are difficult to compile although there are a number of well-known individual examples of loss or near loss such as the BBC Domesday Disks. Wider overviews are rare. In part this is because few organisations wish to publicise losses. Also sometimes the information can be recovered or substituted in some way (e.g. a paper copy). In such cases the loss is often more subtle: information has effectively been degraded through loss of functionality, linking, or documentation, substantially reducing its real value. We do have current statistics in some areas. For the Web we know the average life of a webpage is around 44 days. This impacts not only on ephemera of just local interest but on core information resources. For example some studies have shown the impact on access to URLs cited in medical articles after only one or three years.

For the future, there is good statistical information on the current explosive growth of digital information and clear projections for a future data deluge in areas such as scientific research. Instruments and experiments currently being built will generate in a few years more data than has previously been generated in the whole of human history up to that point. Not all of this information has constant and persistent value but a significant proportion of it does. A serious and worsening gap has developed between our ability to create digital information and our infrastructure and capacity to manage and preserve it over time. Some commentators have referred to the likely cumulative effect of this as a future “digital dark ages”.

How does this affect our cultural heritage? Does all this hold for companies, too?

Beagrie: Digital preservation is needed for our literature, art, archives, and research data held in digital formats. It will affect our research and capacity to monitor issues such as global climate change. It will increasingly affect both individuals and companies. The threat is already important for companies although they will use a different set of “labels” for the challenges compared to memory organisations such as libraries. New business models based on renting access to digital information rather than ownership of physical copies mean publishers are increasingly looking to trusted third parties such as national libraries to guarantee archiving and re-assure their customers about long-term viability and preservation of digital information.

Increasing regulation, compliance, and accountability across all sectors but particularly in banking, pharmaceuticals, medicine, and aerospace, mean companies often must retain and keep digital information accessible for a decade or more. Also for some companies in broadcast and media their business assets are now largely or solely digital. As noted above, the digital challenges affecting memory organisations that have to think in centuries actually begin to manifest themselves in a decade or less, hence similar issues are beginning to impact on companies.

I find it interesting that key concepts such as information life-cycle management which have been around in the digital preservation community for some years are now beginning to be taken up by vendors and companies. In time I can see interest in the commercial sector in other areas of digital curation and preservation research and practice such as archiving and curation of databases, shared registries of file format information, and tools for emulation or migration of complex and unstructured information.

What technical processes are at hand to ensure continued access to digital information?

Beagrie: Some of the basic building blocks such as mass storage, back-up and replication solutions are widely available. Migration and emulation are established techniques for addressing technical obsolescence. However, in my opinion no vendor currently provides solutions which fully address long-term issues or are scaleable and inter-operable for the volumes and increasing complexity of the information we are creating. Technical solutions are only part of the answer though. Organisational issues such as data policies and implementation, and collaboration within and between organisations are equally critical to an effective solution.

What do you do to prevent data theft?

Beagrie: For very sensitive or commercial materials libraries and archives implement a range of physical and system security measures to control and monitor access. For less sensitive material where the creators such as academics or artists wish to encourage sharing and use of their work there are licences which promote open access but reserve rights such as citation or commercial exploitation. There is an active movement of copyright lawyers and organisations promoting “creative commons” type licences which permit creative re-use or use with citation and acknowledgement.

If information and knowledge are only digitally available, how easy will it be to delete such information or to rewrite, say, the history of the 20th century?

Beagrie: Unfortunately unless we have an infrastructure of trusted digital repositories and some degree of replication between them, it will be all too easy. Trust in sources and services is already a major issue on the Web. Libraries, archives, museums, and learned societies must have a central role in safeguarding the authenticity and preservation of information in the digital age in the same way they have had this role in previous centuries. Replication will also be central. Information which is only held by one organisation or within one country is at far greater cumulative risk over a period of decades and centuries.
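
As an illustration of why replication between repositories matters for authenticity, the following is a minimal sketch, not a description of any existing repository system. It assumes each repository can supply its copy of a record as bytes; the function names (`digest`, `check_replicas`) and the site names in the usage note are hypothetical. It compares SHA-256 digests across copies and flags any copy that disagrees with the majority.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one copy of a record."""
    return hashlib.sha256(data).hexdigest()


def check_replicas(copies: dict[str, bytes]) -> dict:
    """Compare copies of one record held at several sites (hypothetical helper).

    Returns the digest shared by a strict majority of sites, if any,
    and the sites whose copy differs from it.
    """
    if not copies:
        raise ValueError("at least one copy is required")

    digests = {site: digest(data) for site, data in copies.items()}
    top_digest, top_count = Counter(digests.values()).most_common(1)[0]

    if top_count > len(copies) / 2:
        dissenting = sorted(s for s, d in digests.items() if d != top_digest)
        return {"consensus": top_digest, "dissenting": dissenting}

    # No strict majority: no copy can be trusted over the others.
    return {"consensus": None, "dissenting": sorted(digests)}
```

For example, calling `check_replicas` with three copies held at `"library_a"`, `"archive_b"`, and `"museum_c"`, where only the third has been altered, reports `"museum_c"` as dissenting. With a single copy there is nothing to compare against, so an alteration cannot be detected this way, which is the risk described above for information held by only one organisation.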

Are there international differences in the view on this situation?

Beagrie: There is a high level of agreement on major issues and desirable approaches between leading countries in Europe and the USA. The major differences perhaps are in levels of funding for research or infrastructure. The USA for example has recently launched a $100 million National Digital Information Infrastructure and Digital Preservation Program (NDIIPP). There is nothing approaching this as yet across Europe. However European countries probably have a stronger tradition of co-ordination and collaborative programmes which is a relative strength.

Is there some sort of international cooperation with respect to digital preservation?

Beagrie: Yes, there is an increasing level of international and national collaboration. In the UK the Digital Preservation Coalition has a membership of 27 major organisations, including the national libraries and archives, publisher organisations, data centres, university research libraries, and funding bodies. As well as UK activity, it collaborates with programmes in Australia, the USA, and increasingly Europe. Similar agencies co-ordinating and promoting cross-sectoral collaboration are being established in other European countries including Germany and Denmark. International interchange is also facilitated by activities such as PADI and Erpanet.

There is also growing international collaboration between the national libraries or the national archives. For example the British Library is part of an international consortium of national libraries working with the Internet Archive on the harvesting and archiving of the Web. This requires a very high level of automation and sophisticated tools evolved from those used by search engines for crawling and indexing the Web.

How can we avoid facing the same problem again in a few decades?

Beagrie: I would highlight five areas:

  • attend to the basics: procedures for system security, back-up and disaster recovery, auditing your information assets and documenting them so that you know what you have and it is secure
  • support implementation of open standards and archiving-friendly IPR to ensure long-term inter-operability, migration and archiving of data between systems
  • get information life-cycle management embedded in your organisation so you understand how it needs to be created, managed, and used. Think about records and information needs at the system design and records creation stage. Automate processes and provide support for these activities. Document return on investment achieved through better storage management and information management
  • develop trusted repositories and long-term national infrastructure for digital information. Without this the huge investment and potential benefits of digitisation of our knowledge base will be undermined
  • encourage collaboration between the ICT industry and the digital curation and preservation community to address challenges in long-term management of digital information. This will be needed to produce systems and tools that will scale, be sustainable, and meet the needs of individuals and organisations whose information base will become larger, increasingly diverse, and digital in coming years.
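
The first of these areas, auditing and documenting information assets so that you know what you have, can be sketched in code. The following is only an illustrative sketch under stated assumptions: the assets are files under one directory, SHA-256 is an acceptable checksum, and the function names (`sha256_of`, `build_manifest`, `audit`) are hypothetical rather than taken from any particular tool. It records each file's size and checksum in a manifest and later reports files that are missing, unexpected, or changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files are not loaded into memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(root: Path) -> dict[str, dict]:
    """Document every file under root: relative path, size, and checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def audit(root: Path, manifest: dict[str, dict]) -> dict[str, list[str]]:
    """Compare the files now under root with an earlier manifest."""
    current = build_manifest(root)
    shared = set(manifest) & set(current)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name for name in shared
            if manifest[name]["sha256"] != current[name]["sha256"]
        ),
    }
```

A manifest built once and stored away from the files themselves can be re-checked on a schedule with `audit`. This only detects loss or alteration; it does not repair anything, so it complements back-up and replication rather than replacing them.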

