31. ML for the identification of Lagrangian flow features


Julian Quinting

Our Helmholtz Young Investigator Group (VH-NG-1243) at KIT researches the dynamics and predictability of weather. To this end, we analyse large datasets produced by numerical weather prediction models. A particular focus is on developing and applying machine-learning-based models to identify processes that reduce the skill of numerical weather prediction models at medium-range to subseasonal lead times of several days to a few weeks.

What is the data science project’s research question?

The overarching question is whether machine learning methods can identify Lagrangian flow features in the output of numerical weather prediction and climate models. These flow features are of interest to us because they can be associated with severe rainfall events (so-called warm conveyor belts) or can trigger dust outbreaks in desert regions (so-called dry intrusions). Identifying these features with classical numerical methods, which rely on computing air-parcel trajectories, is computationally expensive and requires data at high temporal and spatial resolution. However, numerical weather forecasts and climate model projections are archived at comparatively low resolution to minimise storage requirements. The aim of the project is therefore to develop a computationally less expensive machine learning model that reliably identifies these features from low-resolution data. A first approach has already been implemented successfully for warm conveyor belts, but we are convinced that a talented data scientist will take our machine learning model to the next level. In addition, our group collaborates with Dr Shira Raveh-Rubin at the Weizmann Institute of Science, whose group focuses on dry intrusions, and we are keen to test whether our model can be applied to other Lagrangian flow features such as dry intrusions.

What data will be worked on? The data science project will be based on so-called atmospheric reanalyses. These are gridded datasets that describe the state of the atmosphere; our group has them available for the past 40 years. The trajectory-based predictand datasets have already been computed, so the project can get off to an efficient start.
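To illustrate how trajectory-based predictands can be paired with gridded reanalysis inputs, here is a minimal sketch of one common way to turn air-parcel positions at a given time into a binary footprint on a regular latitude-longitude grid. It assumes nearest-grid-point assignment on an evenly spaced grid that spans the full 360 degrees of longitude. The helper name `trajectories_to_mask` and the 1-degree grid are hypothetical, and the group's actual predictand computation may differ.

```python
import numpy as np


def trajectories_to_mask(lats, lons, grid_lats, grid_lons):
    """Mark every grid cell that contains at least one trajectory position.

    Hypothetical helper: assigns each air-parcel position to the nearest
    point of a regular latitude-longitude grid and returns a 0/1 mask.
    Assumes ascending, evenly spaced axes and a full 360-degree longitude axis.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float) % 360.0  # wrap longitudes to [0, 360)
    dlat = grid_lats[1] - grid_lats[0]
    dlon = grid_lons[1] - grid_lons[0]
    # Index of the nearest grid point; longitude indices wrap around the globe
    i = np.rint((lats - grid_lats[0]) / dlat).astype(int)
    j = np.rint((lons - grid_lons[0]) / dlon).astype(int) % len(grid_lons)
    inside = (i >= 0) & (i < len(grid_lats))  # drop positions off the grid
    mask = np.zeros((len(grid_lats), len(grid_lons)), dtype=np.uint8)
    mask[i[inside], j[inside]] = 1
    return mask


# Usage: three parcel positions on an assumed 1-degree global grid
grid_lats = np.arange(-90.0, 90.5, 1.0)  # 181 latitudes
grid_lons = np.arange(0.0, 360.0, 1.0)   # 360 longitudes
mask = trajectories_to_mask([45.3, 45.4, 10.0], [7.8, 8.1, 359.6], grid_lats, grid_lons)
```

The first two parcels fall into the same grid cell, and the third wraps across the 0-degree meridian, so the mask has two marked cells. A mask of this kind could serve as the target field for a model that takes reanalysis variables on the same grid as input.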

What tasks will this project involve? The main task of the project will be the design, training and validation of a deep learning model. So that the code can be made publicly available, it should be well documented and shared on software development platforms.
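Validation of a model that outputs gridded yes/no feature masks is often done with categorical verification scores. The sketch below computes the critical success index and the frequency bias from a 2x2 contingency table of predicted versus reference grid cells. The choice of these two scores, the 0.5 probability threshold and the helper name `categorical_scores` are assumptions for illustration, not the project's prescribed validation protocol.

```python
import numpy as np


def categorical_scores(prob, truth, threshold=0.5):
    """Contingency-table scores for a predicted probability field vs. a 0/1 mask.

    Hypothetical helper: `prob` holds model probabilities, `truth` the
    trajectory-based reference mask; both must have the same shape.
    Returns (critical success index, frequency bias); NaN if undefined.
    """
    pred = np.asarray(prob) >= threshold
    obs = np.asarray(truth).astype(bool)
    hits = int(np.sum(pred & obs))
    misses = int(np.sum(~pred & obs))
    false_alarms = int(np.sum(pred & ~obs))
    # Critical success index: hits / (hits + misses + false alarms)
    denom = hits + misses + false_alarms
    csi = hits / denom if denom else float("nan")
    # Frequency bias: number of predicted events / number of observed events
    n_obs = hits + misses
    bias = (hits + false_alarms) / n_obs if n_obs else float("nan")
    return csi, bias


# Usage on a tiny 2x2 example: 2 hits, 0 misses, 1 false alarm
prob = np.array([[0.9, 0.2], [0.7, 0.6]])
truth = np.array([[1, 0], [1, 0]])
csi, bias = categorical_scores(prob, truth)
```

Here the critical success index is 2/3 and the frequency bias is 1.5, indicating that this toy prediction overestimates the area covered by the feature.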

What makes this project interesting to work on? 

The identification of Lagrangian flow features via machine learning is a newly emerging topic in atmospheric science and is particularly challenging because it operates in four dimensions (time plus three spatial dimensions). Novel machine learning techniques may therefore significantly advance this research. From an application point of view, numerical weather and climate prediction centres are interested in using machine learning approaches to verify how well their forecasting models represent these Lagrangian flow features. We therefore see great potential for the models developed in this project to be used worldwide and, in the long term, to help improve numerical weather prediction and climate models.
Beyond this, the applicant will become part of a lively young investigator group that meets not only for work but also for social activities. Our established contact with Dr Shira Raveh-Rubin’s team at the Weizmann Institute of Science would also help to strengthen exchange with Israel-based scientists, fostering a strong interdisciplinary Israeli-German collaboration.

What is the expected outcome? Contribution to a research paper, contribution to software development

What infrastructure, programs and tools will be used? Can they be used remotely? The applicant will have access to high performance computing (HPC) facilities at KIT. The programming language will be Python, and models will be developed on GPUs using the TensorFlow library. Yes, remote access to the HPC facilities is granted.

What skills are necessary for this project? Data analytics / statistics, Data mining / Machine learning, Deep learning

Is the data open source?  Yes

Interested candidates should be at Master’s level. Julian Quinting is looking for two visiting scientists to work on the project together with the team.