E-GOVERNMENT
Digital library to save web content forever
08-06-2006
by Silicon.com
A new national library that will preserve UK web content forever is being developed by the British Library.
The new National Digital Library will store everything from digitised versions of centuries-old manuscripts to digital journals and web archives, and will hoover up 300 terabytes of data in the next five years.
The British Library has a collection of 150 million items such as books and manuscripts, and is already collecting digital content. And this is going to gather pace when the Legal Deposit Libraries Act 2003 comes into force, probably by 2008.
As Roderic Parker, communications officer for the British Library's Digital Object Management Programme, explained: "When that comes into effect the publishers of electronic journals and books and theses published in the UK will be obliged to deposit copies with us - the British Library has the right to receive everything. That's the way it will work with the digital stuff as well."
The library has already been running a voluntary deposit scheme under which UK publishers of digital content can hand over their material. While some people might question the value of some of the content on the web, Parker said even the most lowbrow publication is important.
"Some of it is important from the point of cultural history, just as much as a pamphlet from an 18th century election; it tells you how they thought at the time. It's not our job as a library to say that things are ephemeral," he said.
And while paper can rot, digital materials have different problems - such as obsolescence of the hardware or software used to access them.
"One of the things that we have to do is make sure that [digital materials] can be kept in the long term. There are huge problems with keeping things technologically available. We've got printed materials that are five to six hundred years old and they are in pretty good condition, whereas digital stuff can be unworkable in 20 years," Parker explained.
The British Library is using cryptographic time-stamping technology to protect the integrity of the electronic documents in its new archive.
It is using nCipher's DSE200 document sealing engine to time-stamp and digitally sign every item to prove that documents are authentic and have not been modified.
"If we supply you with a book, you can see if someone has torn out a page or if someone has tried to insert something, but you can't do that with an electronic file, so we are trying to guarantee you can do that," Parker said.
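To give a rough idea of how time-stamping and sealing can expose tampering, the Python sketch below hashes a document, binds the hash to a timestamp, and authenticates the pair with a secret key. It is an illustration only and not the nCipher DSE200's interface: the function names seal_document and verify_seal are hypothetical, and a keyed hash (HMAC) from the standard library stands in for the public-key digital signature and trusted time source a real sealing engine would use.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def seal_document(content: bytes, key: bytes, timestamp: Optional[str] = None) -> dict:
    """Build a seal record that binds the content's SHA-256 digest to a timestamp.

    Hypothetical helper: HMAC-SHA256 is an assumed stand-in for a real
    digital signature, and the timestamp comes from the local clock
    rather than a trusted time-stamping authority.
    """
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    digest = hashlib.sha256(content).hexdigest()
    # Serialise digest and timestamp deterministically before authenticating them.
    payload = json.dumps({"sha256": digest, "timestamp": timestamp}, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"sha256": digest, "timestamp": timestamp, "tag": tag}


def verify_seal(content: bytes, seal: dict, key: bytes) -> bool:
    """Return True only if the content and the seal's timestamp are both unaltered."""
    digest = hashlib.sha256(content).hexdigest()
    payload = json.dumps(
        {"sha256": seal["sha256"], "timestamp": seal["timestamp"]}, sort_keys=True
    )
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    content_ok = hmac.compare_digest(digest, seal["sha256"])
    seal_ok = hmac.compare_digest(expected, seal["tag"])
    return content_ok and seal_ok
```

Sealing a file and later calling verify_seal on the same bytes returns True; changing a single byte of the file, or editing the recorded timestamp, makes it return False, which is the electronic counterpart of spotting a torn-out or inserted page.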
The library is building the new archive system by starting with the storage technology: designing a system which has to be very scalable, fault tolerant, reliable and resilient. The system will also have to ensure future users can view the material with contemporary applications but still experience the original look-and-feel.
As Parker points out: "If we are serious we have to make sure this is available for centuries. People have to come here in 2100 or 2200 and find what was on the web in 2006. We don't want to be in the position of saying 'we had it but it was damaged and now we can't retrieve it.'"
Steve Ranger writes for Silicon.com.
Reprinted with permission from Silicon.com

