Initial situation
As part of the co-operation between the federal and provincial governments required by the Water Act, a monitoring network is operated to determine the water balance in Austria. This data is collected by the federal states and transferred to the Hydrographic Data Management System (HyDaMS).
HyDaMS has so far been based on proprietary software that stores more than 300,000 time series, covering different parameters and time references, in a highly specialised format.
Under the INSPIRE Directive (2007; implemented in Austria by the Geodata Infrastructure Act 2010), geodata must be described and published (metadata), harmonised into a defined common data structure, and made available through data services.
In 2019, the old PSI Directive (Public Sector Information) was revised to become the Open Data Directive, which was implemented in Austria in 2022 as the Federal Information Reuse Act 2022. It contains minimum rules for the reuse of public data and introduces the open-by-default principle: public data should be open (open data) unless there is a specific reason against it. In addition, dynamic data (sensors/time series) must be made available via services/API.
These requirements were tightened in February 2023 with the publication of the HVD Regulation (High Value Datasets Regulation): the datasets listed there must be made available free of charge as open data via a service/API.
Objective
Modern big data analysis tools and dashboards cannot access the time series data within HyDaMS because of the specialised data format used to date.
In order to enable big data analyses or model calculations, the time series are to be mirrored automatically and periodically into an open source (OS) database system, and monitored there, via an interface that is yet to be developed. This concerns (i) the verified hydrographic data, which are to be imported annually once a yearbook has been completed, and (ii) the current remotely transmitted (and unverified) data from the federal states, which are to be imported continuously.
A modern database system is also a prerequisite for numerous innovations planned by Division I/3 Water Balance, such as a national water balance model. The way data are provided via the WebGIS portal eHYD is also outdated: the time series and master data must first be exported from HyDaMS as text files and then manually integrated into eHYD. Because of the large data volumes, many time series are offered only at an aggregated temporal resolution. Once a database with a modern interface is available, eHYD could be linked directly to it, giving users access to the entire hydrographic data set.
HyDaMS does not have a machine-readable interface (API) and, according to the software company, cannot be retrofitted with one. The present data pipeline and database project between the BML and the BAB is therefore also driven by legal requirements (see ‘Initial situation’).
Main goals
- Stable export of data from HyDaMS (master data and time series)
- Selection of a suitable open source database system
- Development of data pipelines for the import of data from HyDaMS into a selected open source database
- Development and operation of a test system
- Provision of interfaces to dashboard and state-of-the-art evaluation tools
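As a rough illustration of the pipeline goal above, the following minimal sketch loads station master data and time series into a relational database and reads the verified values back, joined with their master data. SQLite from the Python standard library stands in for the open source database, which has not yet been selected. The table names, column names and sample values are hypothetical and do not reflect the actual HyDaMS export format.

```python
import sqlite3

# Hypothetical sample records; the real HyDaMS export format is not described here.
MASTER_DATA = [
    # (station_id, name, parameter)
    (1001, "Station A", "water_level"),
    (1002, "Station B", "discharge"),
]
TIME_SERIES = [
    # (station_id, ISO timestamp, value, verified flag)
    (1001, "2024-01-01T00:00:00", 152.0, 1),
    (1001, "2024-01-01T00:15:00", 153.5, 1),
    (1002, "2024-01-01T00:00:00", 12.4, 0),
]


def load(conn, master_data, time_series):
    """Create the assumed schema and insert master data and time series."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS station (
            station_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            parameter  TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS measurement (
            station_id INTEGER NOT NULL REFERENCES station(station_id),
            ts         TEXT NOT NULL,
            value      REAL NOT NULL,
            verified   INTEGER NOT NULL,
            PRIMARY KEY (station_id, ts)
        );
        """
    )
    with conn:  # commit on success, roll back on error
        # INSERT OR REPLACE keeps repeated imports idempotent.
        conn.executemany(
            "INSERT OR REPLACE INTO station VALUES (?, ?, ?)", master_data
        )
        conn.executemany(
            "INSERT OR REPLACE INTO measurement VALUES (?, ?, ?, ?)", time_series
        )


def verified_series(conn, station_id):
    """Return verified values for one station, joined with its master data."""
    rows = conn.execute(
        """
        SELECT s.name, s.parameter, m.ts, m.value
        FROM measurement AS m
        JOIN station AS s ON s.station_id = m.station_id
        WHERE m.station_id = ? AND m.verified = 1
        ORDER BY m.ts
        """,
        (station_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, MASTER_DATA, TIME_SERIES)
```

Keeping a verified flag on each value is one possible way to hold the annually imported yearbook data and the continuously imported unverified data in the same table. A production schema for billions of entries would likely look different.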
Status of the project
- BML Dept. I/3: existing data (master data and time series) have been reviewed and evaluated
- BAB: The suitability of various OS databases for efficiently managing a big data test set of sensor time series (approx. 6 billion data entries) is being examined and compared; this work is currently being finalised
Planned procedure, implementation
- BAB: Implementation of a suitable database schema
- BAB / BML Dept. I/2: Development of scripts for the following applications:
  - Linking the time series from HyDaMS/Callisto with the master data
  - Exporting to arrays with format conversions
  - Transfer of data to OS databases
- BAB: Ensuring parallelisation, logging and monitoring
- BAB: Time-controlled execution:
  - OS-based workflow management
  - Create, manage and monitor workflows
  - Mapping workflows with directed acyclic graphs
- BAB: Provision and hosting of a test DBMS
- BAB: Export time series data from HyDaMS and Callisto
- BAB: Import time series data into the selected test system
- BAB / BML Dept. I/3: Data validation
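The idea of mapping workflows as directed acyclic graphs, combined with parallelisation and logging, can be sketched as follows. This is a minimal illustration using only the Python standard library (`graphlib`, `concurrent.futures`, `logging`), not the workflow management tool that the project will use. The task names are hypothetical and the task body is a placeholder. In practice a dedicated workflow manager would also handle scheduling, retries and monitoring.

```python
import logging
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

# Hypothetical task names; each task maps to the set of tasks it depends on.
DAG = {
    "export_master_data": set(),
    "export_time_series": set(),
    "link_with_master_data": {"export_master_data", "export_time_series"},
    "convert_formats": {"link_with_master_data"},
    "load_into_database": {"convert_formats"},
    "validate": {"load_into_database"},
}


def run_task(name):
    """Placeholder for the real work of a task; only logs and returns the name."""
    log.info("running %s", name)
    return name


def run_dag(dag, max_workers=2):
    """Run tasks in dependency order, executing independent tasks in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    finished = []
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for name in sorter.get_ready():
                pending.add(pool.submit(run_task, name))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = future.result()  # re-raises any exception from the task
                sorter.done(name)
                finished.append(name)
    return finished


order = run_dag(DAG)
```

In this sketch the two export tasks have no dependencies and can run at the same time, while linking, conversion, loading and validation run strictly one after another because each depends on the previous step.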
Schedule
Project start: 01/2024
Project end: 12/2025