This week I had the pleasure of attending the Archiving Conference (AC) 2014 in Berlin. The event was hosted at the Kino Arsenal at Potsdamer Platz, in one of the cinema rooms with extremely comfortable red seats. The conference belongs to a group of conferences run by the Society for Imaging Science and Technology. I usually attend the Color Imaging Conference (CIC) or the Electronic Imaging conference (EIC), both in the US, so when I saw that AC was taking place in Berlin this year, I jumped at the chance and joined the event.

Compared to CIC, the topics at AC are much more applied and not dedicated solely to imaging problems. The variety of represented fields makes this event very interesting. Starting from archiving itself, we can understand it as covering all the problems related to collecting documents: papers, books, audio files, video files, electronic documents for e-government, art collections, websites, the internet. Once the databases are built, you have to think about how to access the documents, knowing that the amount of information never stops growing; having documents digitized does not mean they no longer need to be processed (e.g. how do you save the scanned pages of an old book together with its hand-written annotations?). On the scanning side, the one-day industrial exhibition gave us the chance to see different book scanners with special features for handling fragile documents.

A never-ending challenge in this domain is the continuous evolution of technology. While the task remains the same - to archive documents, whatever they are - the people in charge of it, working in libraries, museums and archive departments, have to preserve the documents and keep alive the technology used to actually store them. And since archives are public, they have to remain accessible to the public. This also raises the question of funding: the people responsible have to continuously argue with politicians to keep their funding at a reasonable level.

To give a very simple image: in the past, all documents were saved on solid media (i.e. stone, paper, film…), something physical with a long lifespan. Now time is speeding up, we accumulate more and more information, we need more space to archive it, and we need to reduce the physical size of the archiving media. Data centers full of servers are available, but the lifespan they offer is not as long as that of physical media: you end up in a configuration where these centers continuously copy and migrate information from server to server, racing against the lifespan of the storage media. Our global archives are therefore always in motion: a global power outage could mean the end of our archives.

Archiving film documents

An important parameter is the constant evolution of technology, something the movie industry also faces continuously. In a way, the challenges of restoring and archiving very old film material are very similar to those faced during a movie production when different sources (i.e. content from different cameras, different software) have to be combined: there are different workflows that need to be merged in order to produce the final document/film. There are discussions proposing to use the DCP film distribution format to store both new and old films. One advantage is that the format is understood by many people, but it brings the matter of DRM into the document. One problem is that the projection system and the encryption are linked (if I'm correct): what happens when the projection technology changes? All these aspects were tackled in the opening keynote given by a former fellow from Fraunhofer IIS.

Crowd sourcing and database browsing

Crowd sourcing was to me a very interesting topic. It appeared in oral presentations and in posters. I group it together with database browsing because I think the two points are related. The amount of available information - as I mentioned above - keeps increasing, and if you think of big data, we are standing with both feet inside it. To process their documents, archiving departments (e.g., judging from the conference program, the BNF in Paris and institutions in the Netherlands and in Germany) have started campaigns asking the public to carry out tasks that validate the digitization (I'm simplifying the problem here) whenever the task can't be performed fully automatically without errors. It follows the web 2.0 model where the user is also the one creating the content, except that here they are helping to work on archive media, which concerns everybody; it's our memory after all.

Graph databases were also mentioned, and I would like to hear more about them, especially how database query languages are being developed to ease access to the information they contain. This also implies thinking about how to build these databases in the first place. Pretty interesting in any case.
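To give a feel for why a graph model makes such queries natural, here is a minimal sketch of my own (not something shown at the conference; all the document names and relations below are invented for illustration). It links a digitized book to its annotations, scanner and volunteer contributors, and then retrieves everything directly connected to the book:

```python
from collections import defaultdict

# A tiny property-graph sketch: nodes are archive entities (documents,
# annotations, contributors), edges are named relations between them.
# All identifiers here are made up for illustration.
edges = [
    ("book:1854_herbarium", "has_annotation", "annotation:margin_note_12"),
    ("annotation:margin_note_12", "transcribed_by", "volunteer:anna"),
    ("book:1854_herbarium", "scanned_by", "scanner:station_3"),
    ("volunteer:anna", "validated", "book:1854_herbarium"),
]

# Build an adjacency index so neighbourhood queries are a single lookup.
graph = defaultdict(list)
for source, relation, target in edges:
    graph[source].append((relation, target))
    graph[target].append((f"inverse_{relation}", source))

def related_to(node):
    """Return every (relation, entity) pair directly linked to a node."""
    return graph[node]

# "What is connected to this digitized book?" becomes a direct traversal
# instead of a series of joins over separate tables.
for relation, entity in related_to("book:1854_herbarium"):
    print(f"{relation} -> {entity}")
```

A real graph database query language would express the same traversal declaratively, but even this dictionary version shows the point: the relationships themselves become first-class data, which is exactly what browsing an archive needs.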

First impressions

To keep it short, my impressions were mostly good. Different people, different backgrounds, but similar challenges: archiving, digital preservation, curation. Multidisciplinary, just as I like it.