In Italy, hydro-meteorological data collection was managed at the national level by the National Hydrological and Mareographic Service (Servizio Idrografico e Mareografico Nazionale, SIMN) from the early 1900s. The dismantlement of the SIMN about 30 years ago transferred data collection to the regional level, that is, to 19 Regions and 2 Autonomous Provinces. This shift has made it difficult to obtain complete and homogeneous records for the whole country.
Data acquired in recent years is typically available in digital format. Historical measurements, by contrast, are often available only in the printed Hydrological Yearbooks published by the SIMN. In the past, a few initiatives attempted to partially recover this information, but they covered only a limited number of years and/or some regions.
Is this lack of data in a digital format a problem?
Yes, definitely! One of the major problems that both hydrologists and climatologists face is the limited amount of historical data that can be used to test new methodologies or train models. This lack of data is even more critical in a country like Italy, with complex morphology and climate that varies substantially across the territory.
The recovery of this considerable amount of data would not only allow a better understanding of the climate of the last century but would also serve to estimate how the climate and the hydrological cycle could change in the future.
Why is it important to digitize historical time series?
Let's take Piedmont as a case study. Figure 1 shows an estimate of the number of historical series of daily average flows available in each year in Piedmont. Only the most recent observations, from 1995 to the present (in gray), are available in digital format through the Arpa Piemonte web portal.
Fig. 1: Number of time series of daily average flows available in Piedmont in each year. The historical series available only in the volumes of the Hydrological Yearbooks are shown in blue.
The less recent series (before 1995, in light blue) are reported in the Hydrological Yearbooks. They account for the majority of the daily average flow observations available in Piedmont. A considerable digitization effort is needed to make these historical series easily usable.
Figure 2 shows an example of a time series of daily flows for the Tanaro River at Montecastello. The observations from 1942 to 1985 are available in the Hydrological Yearbooks, while the observations from 1996 to 2010 are available in digital format. Figure 2 also shows the series of average flows in spring and autumn (below), with the corresponding long-term averages in the two periods. The difference between the values of these two periods is evident and suggests the presence of trends in the hydrological regime.
This example highlights the importance of digitizing and reconstructing all the available time series, especially when analyzing trends and changes in hydrological regimes.
Fig. 2: Historical series of daily average flows of the Tanaro River at Montecastello and average flows in spring (March, April and May) and autumn (September, October and November). The horizontal lines indicate the long-term average over the two periods.
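For readers who want to try this kind of comparison once a series has been digitized, the seasonal averages described for Fig. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`seasonal_means`, `long_term_mean`, `synthetic_year`) are hypothetical, the flow values are made up and are not Tanaro observations, and only the season definitions and the two periods (1942 to 1985 and 1996 to 2010) come from the text above.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Season definitions taken from the caption of Fig. 2.
SEASONS = {"spring": (3, 4, 5), "autumn": (9, 10, 11)}


def seasonal_means(daily_flows):
    """Return {(year, season): mean flow} from a {date: flow} mapping."""
    groups = defaultdict(list)
    for day, flow in daily_flows.items():
        for season, months in SEASONS.items():
            if day.month in months:
                groups[(day.year, season)].append(flow)
    return {key: mean(values) for key, values in groups.items()}


def long_term_mean(means, season, first_year, last_year):
    """Average the yearly seasonal means over an inclusive range of years."""
    selected = [
        value
        for (year, name), value in means.items()
        if name == season and first_year <= year <= last_year
    ]
    return mean(selected) if selected else None


def synthetic_year(year, spring_flow, autumn_flow):
    """Build one year of made-up daily flows (illustration only)."""
    flows = {}
    day = date(year, 1, 1)
    while day.year == year:
        if day.month in SEASONS["spring"]:
            flows[day] = spring_flow
        elif day.month in SEASONS["autumn"]:
            flows[day] = autumn_flow
        else:
            flows[day] = 50.0
        day += timedelta(days=1)
    return flows


# Synthetic data: two "Yearbook" years and two "digital" years.
daily = {}
for y in (1942, 1943):
    daily.update(synthetic_year(y, spring_flow=200.0, autumn_flow=100.0))
for y in (1996, 1997):
    daily.update(synthetic_year(y, spring_flow=150.0, autumn_flow=120.0))

means = seasonal_means(daily)
historical_spring = long_term_mean(means, "spring", 1942, 1985)
recent_spring = long_term_mean(means, "spring", 1996, 2010)
print(historical_spring, recent_spring)
```

With real data, the `daily` mapping would be filled from the transcribed Yearbook tables and from the digital records. Years with missing days would need more careful handling than this sketch provides.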
In other words... we need your help!
Within the SIREN (Saving Italian hydRological mEasuremeNts) project, we aim to digitize the historical series of daily flows by crowd-sourcing the recovery of hydrological measurements from historical Hydrological Yearbooks and to produce a consistent dataset. Phase 1 of the SIREN project will be devoted to recovering daily discharge measurements.
Why do we need your help? Why not use optical character recognition software?
Despite the remarkable improvements achieved in recent years by Optical Character Recognition (OCR) software and machine learning / artificial intelligence techniques, manual transcription is still the most accurate digitization approach.
Most of these records are printed in old documents, and the ink may be partially damaged. In these conditions, for example, an "8" can easily be misread as a "3".
Moreover, these tables contain several handwritten corrections made by different people, and therefore in different handwriting. All these peculiarities limit the applicability of standardized automatic approaches.
If you are interested in contributing to the digitization of this data, you will find information about the project and a digitization tool at https://www.zooniverse.org/projects/siren-project/siren-project. Even just 10 minutes of your time will be precious for the project!
The research group consists of Paola Mazzoglio, Luca Lombardo, Alberto Viglione, Francesco Laio and Pierluigi Claps of the Department of Environment, Land and Infrastructure Engineering of Politecnico di Torino, and Miriam Bertola of the Vienna University of Technology.
Project released by Politecnico di Torino – Department of Environment, Land and Infrastructure Engineering on World Water Day, 22 March 2023.