Research projects

BAB 069/24: Open Source Data Pipeline & Database for HyDaMS (Hydrographisches Daten Management System)

Initial situation

As part of the cooperation between the federal and provincial governments required by the Water Act, a monitoring network is operated to determine the water balance in Austria. The data from this network is collected by the federal states and transferred to the Hydrographic Data Management System (HyDaMS).

HyDaMS is a version of Toposoft's TopoDesk software, a proprietary product based on the AZUR programming language.

It stores more than 300,000 time series, covering different parameters and temporal references, in a highly specialized format.

Under the INSPIRE Directive (2007; transposed in Austria by the Geodata Infrastructure Act 2010), geodata must be described and published (metadata) and made available in a harmonized form, i.e. in a defined common data structure, via data services.

In 2019, the former PSI Directive (Public Sector Information) was revised and recast as the Open Data Directive, which Austria transposed into national law in 2022 as the Federal Information Reuse Act 2022. It sets minimum rules for the reuse of public-sector data and introduces the open-by-default principle: public data should in principle be open (Open Data) unless there are compelling reasons against it. In addition, dynamic data (sensor data, time series) must be made available via services/APIs.

Requirements were tightened further in February 2023, when the High Value Datasets (HVD) Regulation took effect. The datasets listed in it must be made available free of charge (Open Data) via a service/API by June 2024 at the latest.

Objective

Because of the specialized structure of the data format used so far, modern big data analysis tools and dashboards cannot access the time series stored in HyDaMS.

To enable big data analyses and model calculations, the time series are to be mirrored automatically at regular intervals into an open source database system via an interface still to be developed, with the transfer being monitored. This concerns (i) the verified hydrographic data, which is to be imported once a year after each yearbook is completed, and (ii) the current, unverified data transmitted remotely by the federal states, which is to be imported continuously.
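The two import streams raise a practical question: what happens when continuous, unverified data and the annual verified yearbook data refer to the same timestamp? A minimal sketch, assuming one possible rule (verified values always win), uses SQLite as a stand-in open source database; the table `observation`, its columns, the function `mirror` and the series ID `Q_0001` are hypothetical, not the project's actual schema. The `ON CONFLICT ... DO UPDATE ... WHERE` upsert syntax requires SQLite 3.24 or later.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical record layout for one exported value: (series_id, ISO 8601 timestamp, value)
Record = Tuple[str, str, float]

# Hypothetical table; verified = 1 for yearbook data, 0 for unverified telemetry.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observation (
    series_id TEXT    NOT NULL,
    ts        TEXT    NOT NULL,
    value     REAL,
    verified  INTEGER NOT NULL,
    PRIMARY KEY (series_id, ts)
);
"""

# Upsert: an incoming value replaces a stored one only if it is at least as
# trusted, so continuous telemetry never overwrites verified yearbook values.
UPSERT = """
INSERT INTO observation (series_id, ts, value, verified)
VALUES (?, ?, ?, ?)
ON CONFLICT (series_id, ts) DO UPDATE
SET value = excluded.value, verified = excluded.verified
WHERE excluded.verified >= observation.verified
"""


def mirror(conn: sqlite3.Connection, records: Iterable[Record], verified: bool) -> None:
    """Write one exported batch into the mirror database as a single transaction."""
    flag = 1 if verified else 0
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, ((sid, ts, val, flag) for sid, ts, val in records))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    mirror(conn, [("Q_0001", "2023-05-01T00:00", 4.2)], verified=False)  # telemetry
    mirror(conn, [("Q_0001", "2023-05-01T00:00", 4.0)], verified=True)   # yearbook import
    mirror(conn, [("Q_0001", "2023-05-01T00:00", 9.9)], verified=False)  # late telemetry, ignored
    print(conn.execute("SELECT value, verified FROM observation").fetchall())  # [(4.0, 1)]
```

Writing each batch in one transaction keeps the mirror consistent if an export is interrupted; monitoring of the transfer is not shown here.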

A modern database system is also a prerequisite for numerous innovations planned by Division I/3 Water Balance, such as a national water balance model. Data provision via the WebGIS portal eHYD is likewise outdated: time series and master data must first be exported from HyDaMS as text files and then integrated into eHYD manually. Because of the large data volumes, many time series are offered only at an aggregated temporal resolution. With a database that offers a modern interface, eHYD could be connected to it directly, giving users access to the complete hydrographic dataset.
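To illustrate what offering a series "at an aggregated temporal resolution" means for data volume, here is a minimal sketch that reduces a sub-daily series to daily means. The input format, the function name `daily_means` and the use of the arithmetic mean are assumptions; the actual aggregation rules used for eHYD are not described here. At a 15-minute interval, for example, this collapses 96 values per day into one.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def daily_means(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate (ISO 8601 timestamp, value) pairs to one arithmetic mean per calendar day.

    Assumes all timestamps share one time zone and that gaps need no special handling.
    """
    by_day: Dict[str, List[float]] = defaultdict(list)
    for ts, value in samples:
        day = datetime.fromisoformat(ts).date().isoformat()
        by_day[day].append(value)
    # Sorted so the aggregated series is returned in chronological order
    return {day: fmean(values) for day, values in sorted(by_day.items())}


if __name__ == "__main__":
    # Hypothetical 15-minute series over two days: 2 * 96 = 192 raw values
    raw = [
        (f"2023-05-0{d}T{h:02d}:{m:02d}", float(d))
        for d in (1, 2)
        for h in range(24)
        for m in (0, 15, 30, 45)
    ]
    daily = daily_means(raw)
    print(len(raw), "->", len(daily))  # 192 -> 2
```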

HyDaMS has no machine-readable interface (API), and according to Toposoft one cannot be added. The present data pipeline and database project between the BML and the BAB is therefore also driven by legal requirements (see "Initial situation").

Main goals

  • Stable export of data from HyDaMS (master data and time series)
  • Selection of a suitable open source database system
  • Development of data pipelines for the import of data from HyDaMS into a selected open source database
  • Development and operation of a test system
  • Provision of interfaces to dashboards and state-of-the-art analysis tools

Planned procedure, implementation

  • BML Dept. I/3: Reviewing and assessing the existing data (master data and time series) so that only cleansed time series useful for further analysis are processed
  • BAB: Testing and comparing the suitability of various open source databases for efficiently managing a big data test dataset of sensor time series (100 million entries)
  • BAB: Implementation of a suitable database schema
  • BAB / BML Dept. I/2: Development of scripts for:
      - Linking the time series from HyDaMS/Callisto with the master data
      - Exporting to arrays, including format conversions
      - Transferring the data to the open source database
  • BAB: Ensuring the following functions:
      - Parallelization
      - Logging
      - Monitoring
  • BAB: Scheduled execution:
      - Open source workflow management
      - Creating, managing and monitoring workflows
      - Modeling workflows as directed acyclic graphs (DAGs)
  • BAB: Provision and hosting of a test DBMS
  • BAB: Export of time series data from HyDaMS and Callisto
  • BAB: Import of time series data into the selected test system
  • BAB / BML Dept. I/3: Data validation
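The scheduling, parallelization and logging items above can be illustrated with a minimal sketch of a DAG-driven pipeline run. The source does not name the workflow management tool that will be used, so this sketch relies only on Python's standard library (`graphlib`, available from Python 3.9, and `concurrent.futures`). The function `run_dag` and the task names (`export_hydams`, `link_master_data`, and so on) are hypothetical placeholders for the export, linking, load and validation scripts.

```python
import logging
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

log = logging.getLogger("pipeline")


def run_dag(tasks: Dict[str, Callable[[], None]],
            deps: Dict[str, Set[str]],
            max_workers: int = 4) -> List[str]:
    """Run tasks in dependency order; tasks whose inputs are ready run in parallel.

    deps maps a task name to the set of tasks that must finish before it.
    Returns task names in completion order; re-raises the first task failure.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    finished: List[str] = []
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all completed
            for name in sorter.get_ready():
                log.info("start %s", name)
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # propagates the task's exception and stops the run
                log.info("done %s", name)
                finished.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return finished


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    def placeholder() -> None:
        """Stands in for a real export/import script."""

    # Hypothetical task names mirroring the planned steps
    names = ["export_hydams", "export_callisto", "link_master_data", "load_db", "validate"]
    deps = {
        "link_master_data": {"export_hydams", "export_callisto"},  # both exports run in parallel
        "load_db": {"link_master_data"},
        "validate": {"load_db"},
    }
    print(run_dag({n: placeholder for n in names}, deps))
```

The two exports have no dependency on each other, so they can run concurrently, while linking, loading and validation wait for their inputs. A dedicated workflow manager would add scheduling and a monitoring interface on top of this basic idea.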

Schedule

Project start: 01/2024
Project end: 12/2025

Dietrichgasse 27
1030 Wien
 +43 (1) 71100 - 637415

© 2024 bab.gv.at. All rights reserved.