Development of analysis tools for recurring agricultural policy issues, using drought monitoring as an example
Initial situation
The Federal Institute of Agricultural Economics, Rural and Mountain Research (BAB) operates a computing cluster on which an Open Data Cube (ODC, https://www.opendatacube.org/) has been set up. Its database is being expanded continuously as part of ongoing development. The ODC was originally developed to manage the constantly growing volume of satellite image data efficiently. What distinguishes the BAB's Open Data Cube is that this technology, originally intended for satellite images, has been extended so that other data can also be indexed and loaded as time series into the multidimensional data cube. This makes it possible to intersect and analyse a wide range of raster and rasterised vector data (ZAMG, ALS, INVEKOS and many more) together with satellite images in one system. In addition to purely spatial analysis, the time dimension can be included in the calculations and evaluations, which enables time series analysis.
From the perspective of basic agricultural research and agricultural policy, questions of drought or crop monitoring, such as how the cultivated area of bread grain has changed over time, are of great importance. This is precisely where data cube technology is strong: it supports high-performance analysis of large data volumes along the additional time dimension. This makes it possible, for example, to show the development of bread grain cultivation over the last 20 years (altitudes, geographic distribution, climate data from ZAMG, etc.). MODIS satellite data are well suited to such time series analysis because long time series (from 2000) are already available, whereas Sentinel-2 products only start in 2015.
The need for these tools already became apparent in 2018, when a drought map for processing drought compensation payments to farmers had to be created with relatively rudimentary and technologically outdated IT/GIS applications.
With current IT and GIS technologies, it should be possible to answer such recurring questions with little lead time when new extreme weather events occur (e.g. the spring drought of 2020). The improved drought monitoring analysis tool should help to answer such agricultural policy questions in a much more targeted manner in the future. Within the framework of a future digitisation strategy in agriculture (geodata infrastructure, GDI), the existing ODC at the BAB can serve as the IT system basis for the drought monitoring application. The findings from the project are to be made freely available to interested experts in public administration.
Objective
The aim of the project is to develop a drought monitoring system. To this end, current Austrian weather data will be analysed and compared with 30-year time series (climate period) in order to identify periods of drought. The effects of droughts on specific crop species will be investigated, and the temporal changes in field use under changing climatic conditions and regional shifts will be analysed (time series analysis). In order to identify all influencing factors, statistical learning and pattern recognition methods (machine learning, AI) will also be used. In addition to a (limited) forecasting capability, these methods primarily serve to assess the importance of the various influencing factors and to reduce the developed models to as few, but all the more relevant, environmental variables as possible.
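The comparison of current weather data with a 30-year climate period can be illustrated with a minimal sketch. It assumes monthly precipitation totals in millimetres, one list of 12 values per year. The function names and the 75 percent threshold are hypothetical and chosen only for illustration; they are not the project's actual method or drought definition.

```python
from statistics import mean


def monthly_normal(reference_years):
    """Mean precipitation per calendar month over the reference period.

    reference_years: list of 12-element lists (mm per month), one per year,
    e.g. 30 lists for a 30-year climate period.
    """
    return [mean(year[m] for year in reference_years) for m in range(12)]


def percent_of_normal(current_year, normal):
    """Current monthly totals as a percentage of the long-term monthly mean.

    Assumes every monthly normal is greater than zero.
    """
    return [100.0 * cur / norm for cur, norm in zip(current_year, normal)]


def dry_months(current_year, normal, threshold=75.0):
    """Indices (0 = January) of months below `threshold` percent of normal.

    The threshold is an illustrative assumption, not an official criterion.
    """
    pct = percent_of_normal(current_year, normal)
    return [m for m, p in enumerate(pct) if p < threshold]
```

In a data cube the same calculation would be applied per grid cell or per municipality along the time dimension; the list-based version here only shows the arithmetic.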
The ODC should serve as a networked data centre and be open for data integration via cloud object storage interfaces (e.g. S3). This should replace part of the existing geodata infrastructure. The solution should minimise sources of error, make the latest data directly available to all users and be managed decentrally. Users should be able to carry out the necessary analyses independently in a web browser on the ODC infrastructure. Existing analysis functions should be expanded, and it should be possible to update recurring evaluations of various questions.
Status of the project
The drought monitoring analysis tool was successfully completed in 2021, and the relevance of the Open Data Cube as a data repository and analysis tool was clarified. Furthermore, defined climate parameters (e.g. the climatic water balance, hot days and dry periods of 10 or more days) were calculated for each cadastral municipality of Austria for the climate normal periods 1961-1990 and 1991-2020. Without the ODC, such complex calculations would not have been possible within the short processing time of a few days. The results of these analyses were made available to the BMF for further processing. This created a basis that allows these or similar recurring questions to be answered at short notice with current data if required. The analysis functions are provided to users as Jupyter notebooks and can be adapted and expanded independently as needed.
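The climate parameters named above can be sketched for a single daily series as follows. This is an illustration under stated assumptions, not the project's implementation: the climatic water balance is taken as precipitation minus potential evapotranspiration, a hot day as a daily maximum of at least 30 °C, and a dry day as less than 1 mm of precipitation. The function names and threshold defaults are hypothetical.

```python
def climatic_water_balance(precip_mm, pet_mm):
    """Sum of daily precipitation minus potential evapotranspiration (mm)."""
    return sum(p - e for p, e in zip(precip_mm, pet_mm))


def count_hot_days(tmax_c, threshold_c=30.0):
    """Number of days whose maximum temperature reaches the threshold.

    The 30 degree default is an assumption for this sketch.
    """
    return sum(1 for t in tmax_c if t >= threshold_c)


def dry_spells(precip_mm, min_length=10, wet_day_mm=1.0):
    """Runs of at least `min_length` consecutive dry days.

    A day counts as dry if its precipitation is below `wet_day_mm`
    (assumed threshold). Returns a list of (start_index, length) tuples.
    """
    spells = []
    run_start = None
    for i, p in enumerate(precip_mm):
        if p < wet_day_mm:
            if run_start is None:
                run_start = i  # a new dry run begins
        else:
            if run_start is not None and i - run_start >= min_length:
                spells.append((run_start, i - run_start))
            run_start = None
    # Close a dry run that lasts until the end of the series.
    if run_start is not None and len(precip_mm) - run_start >= min_length:
        spells.append((run_start, len(precip_mm) - run_start))
    return spells
```

For a municipality-level result, such functions would be applied to the daily series aggregated over each cadastral municipality and then averaged over the years of the respective normal period.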
In 2022, the infrastructure around the ODC was completely rebuilt. With a JupyterHub instance and a dedicated container registry (Harbor), a new platform was created within a Kubernetes cluster (a system for orchestrating container applications). This allows the Jupyter Notebook application to be provided to a large number of users and makes it possible to offer them pre-configured environments. The environments are tailored to the different needs of the users, spare them the installation process and can be extended or adapted as required. The ODC is one of these environments. The platform is easily scalable and supports a variety of authentication protocols.
Planned work 2023
Because of the positive experience gained in the course of the ODC project and the strengths of the ODC, namely its ability to store large quantities of raster data (satellite data, climate data, ALS, etc.) in a structured manner and to evaluate them in combination for many questions, the ODC will be retained as a permanent analysis tool. The results of analyses already carried out, such as the calculation of climate parameters for arable and grassland areas, will be updated once current data have been acquired. The infrastructure will be further optimised and expanded into a Dask cluster. This will make it possible to parallelise processes in order to make the best possible use of the hardware resources of the Kubernetes cluster and to increase performance.
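The parallelisation pattern behind this step can be sketched as splitting the work into independent units, processing them concurrently and combining the results. The example below deliberately uses Python's standard library (concurrent.futures) as a stand-in and is not Dask code; threads here only illustrate the structure and do not speed up pure-Python arithmetic. The function names and the per-municipality input layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def summarise_series(values):
    """Work done for one independent unit, here the mean of a series."""
    return mean(values)


def summarise_all(series_by_unit, max_workers=4):
    """Apply `summarise_series` to every unit concurrently.

    series_by_unit: dict mapping a unit name (e.g. a municipality) to a
    list of numbers. Returns a dict with one result per unit.
    """
    names = list(series_by_unit)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = pool.map(summarise_series,
                           (series_by_unit[n] for n in names))
        return dict(zip(names, results))
```

A Dask cluster follows the same split, apply and combine idea but distributes the units across the worker processes of the cluster, which is what allows the hardware of the Kubernetes cluster to be used fully.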
Schedule
Project start: 01/2021
Project end: 12/2023