AI-enabled data rescue from historic charts
The Environment Agency has around 10,000 years’ worth of hydrological data on river levels and flows, but it is stored on materials that are degrading fast.
Flood and drought risk in England is a priority for the Environment Agency (EA), which works to protect and enhance the environment, contribute to sustainable development and help protect the nation’s security in the face of emergencies.
Over the years, the EA has collected a vast amount of hydrological data by hand, building a physical archive of approximately 10,000 years’ worth of valuable river level and flow information. This data could be used to build more accurate climate and flood models and to help forecast, and minimise the impact of, future adverse weather events.
However, much of this historical environmental monitoring data is stored on materials that degrade over time, such as paper charts, microfilm and punch tape. These documents are at risk of irreversible deterioration and therefore need to be catalogued urgently. Compounding the problem, the EA is losing the expertise needed to interpret the archive as experienced staff retire.
Manual data extraction is under way, but because plotting the physical data onto graphs is so time-consuming, the process is currently estimated to take 40 years. This is unsustainable, and a new, faster solution was needed. The Department for Environment, Food & Rural Affairs (Defra) approached the Accelerated Capability Environment (ACE) on behalf of the EA to explore whether cutting-edge artificial intelligence (AI) and machine-learning technology could digitise, read and interpret the physical data significantly faster while maintaining accuracy.
Bringing new technologies to bear
ACE invited suppliers from its Vivace community to present ideas for a proof of concept (PoC) to test whether AI could fully or partially automate manual data rescue. From seven bids, The London Data Company was selected to define the exact data rescue requirement (including characterising the features of the physical data to be digitised), identify the best-fit method, and build and test a PoC data rescue tool.
Working with domain specialists and data users from across the EA, the team carried out an initial options analysis that identified two suitable open-source tools to take forward to the PoC stage, rather than the expected one: a fully automated tool and a tool with a human in the loop.
The first PoC, the fully automated tool, showed low feasibility for effective digitisation. It could not reliably capture handwritten information that is crucial for interpreting the charts, such as axis labels and metadata like location and start date, and it struggled to adapt to variations between charts, such as gaps in the record or smudges caused by water damage. Further assessment is recommended in future as Optical Character Recognition (OCR) performance improves.
Pivoting to the second, human-in-the-loop tool for AI-assisted data rescue produced better results. Recommendations were also made for feature changes to adapt the tool to live datasets and increase its effectiveness, including integrating additional AI elements from the first PoC.
The importance of collating and analysing good-quality historical data and records, to better understand climatic trends and improve the management of river catchments, cannot be overstated: it is key to protecting the UK’s security in the face of emergencies.
As the EA works to create better places for people and wildlife and to support sustainable development, ACE looks forward to supporting the next steps in this and many other priority environmental areas.