Although the price of storing data may be falling, there are additional economic challenges in ensuring that digital content remains understandable for future generations. Guaranteeing long-term usability for the spiralling amounts of data produced or controlled by organisations with commercial interests is quickly becoming a major problem, says Michael Wilson, Secretary of the UK e-Infrastructure Leadership Council.
Wilson gave some excellent background on the subject during Thursday's e-health session at eChallenges, presenting findings from the ENSURE project, which is examining economical solutions for long-term digital preservation in three use cases: healthcare, clinical trials and finance.
Digital medical data serves different purposes depending on the stakeholder. Health records and data can be preserved for the benefit of patients, their families and future medical research, and over time the reasons for collecting specific datasets may also change. Medical imaging data accounts for up to 30% of the digital universe, and each type of record has its own format (e.g. pathology images are saved in DICOM). In another 10 to 20 years the software will inevitably have changed, but virtual environments can preserve the software and its surrounding context (e.g. associated manuals, hardware and operating systems) so that the data remains usable. Each preservation strategy carries its own risks, and this is what ENSURE is researching: the cost and value of different strategies, how to automate the preservation lifecycle, and the scalability offered by new technologies such as cloud computing.
For researchers, a major flaw in social media is its transitory nature. One project, BlogForever, is already examining how to preserve and manage weblogs. Recent studies have revealed that blogs on major historical events have already been lost (see another blog post). BlogForever aims to provide a way to preserve and organise all blogs, especially those of historical significance; one project partner is CERN, which aims to preserve physics-related blogs. GridCast itself originated at CERN. The ultimate vision of the project is to preserve collections of blogs cost-efficiently while safeguarding their authenticity and integrity for users and organisations (e.g. a National Library of Medicine might want to preserve a collection of health and medicine blogs). Other aims include enabling full-text searching, tagging, sharing and reuse of content.