In 2022, our German partners submitted 25 projects. Take a look at this year's proposals.
Click on the '+' to learn more about the respective topics, the mentors, and the conditions of participation.
Our primary research interests are situated at the intersection of geometrical deep learning, topological machine learning, and representation learning. We want to make use of geometrical and topological information—also known as manifold learning—to imbue neural networks with more information in their respective tasks, leading to better and more robust outcomes. Following the dictum ‘theory without practice is empty,’ we also develop methods to address challenges in biomedicine or healthcare applications.
bastian.rieck@helmholtz-muenchen.de
What is the project's research question?
How can we use topological structures (cycles, cliques) in graphs to compress them (in order to improve generalisation performance)?
What data will your exchange student work on?
Benchmark data sets for graph representation learning as well as new data sets collected from sensor data and social networks (I am very open towards creating new data sets for the community, but I would leave this up to the preferences of the individual student).
What tasks will the project involve?
- Getting acquainted with state-of-the-art models in geometric deep learning and topological machine learning
- Implementing your own graph neural network architectures with `pytorch-geometric`
- Learning about how to train (graph) neural networks with `pytorch-lightning`
- Checking how generalisation performance in graph learning tasks is influenced by 'compression' of relevant graph structures.
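To make the 'cycles' part of the research question concrete, here is a minimal, self-contained sketch (not project code; the function name and the edge-list representation are illustrative) of one topological descriptor a compression scheme could build on: the cycle rank, or first Betti number, of a graph.

```python
from collections import defaultdict

def cycle_rank(edges):
    """First Betti number b1 = |E| - |V| + #components: the number of
    independent cycles, a simple topological summary of a graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(adj[n] - seen)
    return len(edges) - len(nodes) + components

# A triangle with a pendant edge has exactly one independent cycle.
print(cycle_rank([(0, 1), (1, 2), (2, 0), (2, 3)]))  # → 1
```

In the project itself, such structural summaries would be computed on `pytorch-geometric` data objects rather than raw edge lists.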
What makes this project interesting to work on?
You will learn a lot about the state of the art in geometric deep learning and topological machine learning, two rapidly growing domains of machine learning that deal with structured data sets. In addition, you will pick up knowledge about deep learning frameworks—these skills can be super helpful in other projects as well.
The techniques developed in this project will be generically applicable to many graph learning problems (graph classification, link prediction, node classification) on very diverse data sets (sensor networks, meshes of 3D shapes, ...).
Geometrical deep learning and topological machine learning are rapidly growing areas with connections to applications in the life sciences (gene regulatory networks). This project could be the first step for students to get acquainted with these areas and topics.
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Essentially, only Python (with deep learning frameworks based on `pytorch` etc.) and some compute resources will be required. All of these can be provided remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Software development, Python
Interested candidates should be at Bachelor's, Master's, PhD, or postdoc level.
The Institute for Software Technology conducts software research on scientific and engineering topics in aeronautics, transportation, and energy. At the DLR site in Braunschweig, the main research is on model-based systems engineering and model-driven software development for space systems. An additional outstanding research activity is the interactive visualization of very large scientific datasets (in conjunction with high-performance computing research), immersive environments, augmented reality, mixed reality, visual analytics, and more.
What is the project's research question?
Interactive data processing for explorative visualization of multi-variate climate data
What data will your exchange student work on?
The successful candidate will work on algorithms for atmospheric simulation and measurement datasets. The main topic will be global time-dependent simulation datasets for weather forecasts and climate informatics. The goal is to process data for interactive visualization approaches. The required level-of-detail model has to support view-dependent refinement and streaming from remote high-performance clusters. Topology-based techniques need to be developed to allow analysis and visualization of multi-field datasets at different levels of detail. Besides mapping data onto a 3D planet, extracted features from data products are to be processed in a view-dependent manner as well.
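As a rough illustration of the level-of-detail idea described above (a simplified sketch; the function and grid are invented for this example, and real climate data lives on far larger, often non-Cartesian grids), a pyramid of progressively coarser grids can be built by block averaging:

```python
import numpy as np

def build_lod_pyramid(grid, levels):
    """Build a simple level-of-detail pyramid by repeated 2x2 averaging:
    coarse levels are streamed first, finer levels are fetched only where
    the current view demands them."""
    pyramid = [grid]
    for _ in range(levels):
        g = pyramid[-1]
        # Average non-overlapping 2x2 blocks to halve the resolution.
        g = g.reshape(g.shape[0] // 2, 2, g.shape[1] // 2, 2).mean(axis=(1, 3))
        pyramid.append(g)
    return pyramid

field = np.arange(16.0).reshape(4, 4)   # stand-in for one climate variable
pyr = build_lod_pyramid(field, 2)
print([p.shape for p in pyr])  # → [(4, 4), (2, 2), (1, 1)]
```

View-dependent refinement then amounts to choosing, per screen region, the coarsest pyramid level whose projected cell size stays below one pixel.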
What tasks will the project involve?
Literature research; Prototype development; Demonstrating prototype; Writing summary report; Presenting result as talk
What makes this project interesting to work on?
Improved programming skills; Understanding how to process and visualize globe-covering 3D simulation datasets; Insight into interaction methods in immersive environments; Improved communication and presentation skills; Insight into German space missions.
What is the project's expected outcome?
Contribution to software; research methods might eventually result in a joint publication.
Is the data open source? CosmoScout VR is open source. Depending on the security level, proprietary data or benchmark datasets are used.
What infrastructure, programs and tools will be used? Can they be used remotely?
The Institute offers access to virtual reality laboratories equipped with a powerwall installation and multiple cluster-based display walls. Some walls offer multi-touch interaction. Besides high-performance workstations, GPU clusters and High-Performance Data Analysis (HPDA) clusters are available for data processing. In addition to this infrastructure, we have head-mounted displays (Vive, Oculus Rift, etc.) as well as HoloLens glasses.
CosmoScout VR is the software framework in use. It is open source.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Geographic information systems, Software engineering, Software development, C/C++
Interested candidates should be at Master's, PhD, or postdoc level.
We offer collaboration-as-a-service to matter research scientists across the Helmholtz Association, supporting them in adopting or expanding their use of machine learning. To this end, we study basic usage of machine learning in data science applications, trustworthiness aspects of machine learning, reproducibility, and advanced use of machine learning for non-canonical datasets.
What is the project's research question?
Does the use of stochastic gradient Langevin dynamics (SGLD) offer an advantage over canonical mean/median summary statistics for MAP estimation?
What data will your exchange student work on?
The student will work on standard simulation examples that expose reference datasets in the community; see https://github.com/sbi-benchmark/sbibm/tree/main/sbibm/tasks. Aside from that, the student will work on surrogate or real simulation datasets from accelerator beam monitoring.
What tasks will the project involve?
- understand simulation-based inference (see also https://arxiv.org/abs/2112.03235) and why it is needed
- train task with symmetric posterior
- train task with asymmetric posterior
- implement MAP estimation with stochastic gradient Langevin dynamics
- benchmark SGLD-, mean- and median-based MAP estimation
- visualise results and report
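The core comparison in the task list can be sketched on a toy posterior (a hedged, minimal example; the standard-normal target, step size, and iteration count are arbitrary choices for illustration, not from the project):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(theta):
    # Toy posterior: standard normal, whose MAP and mean are both 0.
    return -0.5 * theta ** 2

def grad_log_p(theta):
    return -theta

# Stochastic gradient Langevin dynamics: gradient ascent on log p
# plus Gaussian noise whose variance matches the step size.
eps, theta, traj = 0.05, 3.0, []
for _ in range(5000):
    theta += 0.5 * eps * grad_log_p(theta) + np.sqrt(eps) * rng.standard_normal()
    traj.append(theta)

samples = np.array(traj[1000:])                # discard burn-in
map_sgld = samples[np.argmax(log_p(samples))]  # highest-density visited point
map_mean = samples.mean()                      # mean-based summary statistic
print(map_sgld, map_mean)
```

On a symmetric posterior both estimators agree; the interesting benchmarks in the project are the asymmetric tasks, where the posterior mean can sit far from the mode.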
What makes this project interesting to work on?
Machine-learning-supported Bayesian statistics for simulation-based inference: the goal of this project is to benchmark stochastic gradient Langevin dynamics (SGLD) based optimization for MAP estimation against mean/median-based MAP estimation computed from posterior samples.
What is the project's expected outcome?
Co-authorship of a research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
This project can be conducted remotely. An average laptop with a fully functional Python environment is enough. The sbi [1] and sbibm [2] packages may be required. HPC infrastructure will be offered if needed.
[1] https://github.com/mackelab/sbi/
[2] https://github.com/sbi-benchmark/sbibm
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Python
Interested candidates should be at Master's, PhD, or postdoc level.
Daniel Abou-Ras's group works on correlative electron microscopy applied to thin-film solar cells. Christoph Koch's research focuses on the structural and chemical characterization of matter by electron-beam-based techniques, at length scales ranging from micrometers to atomic resolution. For this purpose we operate several microscopes and related equipment, develop new data acquisition techniques, and, along with those, new numerical methods to extract relevant information from the collected data.
daniel.abou-ras@helmholtz-berlin.de
https://www.helmholtz-berlin.de/people/daniel-abou-ras/index_de.html
https://www.physik.hu-berlin.de/en/sem
What is the project's research question?
Extract structure-property relationships in optoelectronic semiconductor devices at the nanometer scale
What data will your exchange student work on?
Multidimensional imaging and spectroscopy data from identical specimen positions acquired using various techniques in scanning electron microscopy and transmission electron microscopy
What tasks will the project involve?
- Preparing electron microscopy data for machine learning
-- Using Python package NionSwift for interactive data analysis (GUI running on remote server)
-- Homogenizing length scales of datasets (possibly warping)
-- Dividing data into subsets
- Extracting correlations using machine learning
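The 'homogenizing length scales' step might, in its simplest 1D form, look like the following (an illustrative sketch with invented pixel sizes; real data are 2D/3D images and may additionally need warping):

```python
import numpy as np

def resample_to_common_grid(signal, pixel_size_nm, target_size_nm, n_out):
    """Resample a line profile recorded at one pixel size onto a common
    physical grid, so datasets from different microscopes can be stacked."""
    x_in = np.arange(len(signal)) * pixel_size_nm
    x_out = np.arange(n_out) * target_size_nm
    return np.interp(x_out, x_in, signal)

# Two profiles of the same ~80 nm feature, recorded at 2 nm and 5 nm/pixel.
a = resample_to_common_grid(np.linspace(0, 1, 40), 2.0, 4.0, 20)
b = resample_to_common_grid(np.linspace(0, 1, 16), 5.0, 4.0, 20)
print(a.shape, b.shape)  # both now (20,) on the same 4 nm grid
```

Once all modalities share one physical grid, correlations between imaging and spectroscopy channels can be extracted position by position.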
What makes this project interesting to work on?
- Establishing new methods and making them available as online service
- Extracting relevant materials and device properties directly usable in the research and development of optoelectronic semiconductor devices
- Prospect of joint publication in peer-reviewed journal
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software, establishing an online service for data analysis of correlative microscopy
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
- GPU cluster with 7 NVIDIA V100 GPUs (remotely accessible)
- Electronic lab notebook (open source, elabftw) with Apache Guacamole service (for running applications on the server)
- Access to all the raw data, including the evaluation software used for imaging and spectroscopy
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Software development, Python
Interested candidates should be at Master's, PhD, or postdoc level.
The department "Matter under Extreme Conditions" is one of the research areas of the Center for Advanced Systems Understanding, Helmholtz-Zentrum Dresden-Rossendorf.
We conduct digital systems research of complex systems by combining methods from mathematics, physics, systems theory, data science, and scientific computing.
Within the department, we investigate the non-equilibrium behavior of matter under extreme conditions by developing innovative electronic structure methods and fusing them with machine-learning methodologies for the numerical modeling of high energy density phenomena in warm dense matter induced by extreme electromagnetic fields, temperatures, and pressures.
https://www.casus.science/research/matter-under-extreme-conditions/
What is the project's research question?
Can we develop a novel class of machine learning model that fuses concepts from quantum computing and random neural networks?
What data will your exchange student work on?
Standard machine learning datasets will be used.
What tasks will the project involve?
Within the scope of the proposed internship, the student will contribute to the validation of our proposed ARQN model using quantum algorithms via the PennyLane quantum software platform, with a limited number of qubits and the high-performance computing resources available at our institute.
Further details:
Recently, quantum computing (QC) has been leveraged for machine learning with the hope that the uncertainty in QC can be a great advantage for probability-based modeling, inspiring new research for Noisy Intermediate Scale Quantum (NISQ) devices.
In this project, we focus on a particular class of artificial neural networks called Random Neural Network (RNN).
In RNNs, the neurons are connected such that excitatory (positive) and inhibitory (negative) spike signals are interchanged. While classical (i.e., non-quantum) RNNs have demonstrated effective applications in decision making, signal processing, and image recognition tasks, their implementation has so far been limited to deterministic digital systems that output probability distributions in lieu of stochastic behaviors.
To better exploit the random nature of RNNs, we develop an Artificial Random Quantum Neuron (ARQN) model with a robust training strategy.
The ARQN model relies on the dynamical evolution of two easy-to-implement Hamiltonians and subsequent local measurements. The architecture allows exploiting complex amplitudes and back-action from measurements to influence the input. This approach to learning protocols is advantageous in the case where the input and output of the system are both quantum states. We will demonstrate this by classifying Bell pairs which can be seen as a certification protocol. Stacking the introduced elementary building blocks into larger RNN networks combines the stochastic features of an RNN with the non-local quantum correlations across the networks.
Furthermore, the ARQN has the potential to deal with noise which is crucial for various applications, including computer vision in NISQ devices.
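As a small, classical illustration of the Bell-pair classification mentioned above (a plain-numpy stand-in, not the ARQN model or PennyLane code), the Z⊗Z correlation already separates the Φ from the Ψ Bell states:

```python
import numpy as np

# The four Bell states in the computational basis |00>, |01>, |10>, |11>.
bell = {
    "phi+": np.array([1, 0, 0, 1]) / np.sqrt(2),
    "phi-": np.array([1, 0, 0, -1]) / np.sqrt(2),
    "psi+": np.array([0, 1, 1, 0]) / np.sqrt(2),
    "psi-": np.array([0, 1, -1, 0]) / np.sqrt(2),
}

def zz_correlation(state):
    """Expectation of Z⊗Z: +1 if the two qubits always agree, -1 if they
    always disagree under a computational-basis measurement."""
    probs = np.abs(state) ** 2
    signs = np.array([+1, -1, -1, +1])   # parity of |00>, |01>, |10>, |11>
    return float(probs @ signs)

# phi states are correlated, psi states anti-correlated:
print({k: round(zz_correlation(v), 6) for k, v in bell.items()})
```

Distinguishing Φ+ from Φ− additionally requires a measurement in another basis (e.g. X⊗X), which is where richer quantum protocols such as the one sketched above come in.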
What makes this project interesting to work on?
This project is at the forefront of fusing artificial intelligence with quantum computing. By embedding the ARQN neuron model in classical RNNs using classical-quantum algorithms, we will exploit RNNs for faster, secure, and energy-efficient computation in noisy environments. It also aims to improve accuracy for various applications such as pattern recognition, optimization, learning, associative memory, and others.
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The high-performance computing facilities of the Helmholtz-Zentrum Dresden-Rossendorf will be used, which are accessible remotely.
What skills are necessary for this project?
Machine learning, Deep learning, High-performance computing, Software development, Python
Interested candidates should be at Bachelor's, Master's, or PhD level.
We use satellite remote sensing and simulations to study the dynamics of the ice sheets in Greenland and Antarctica. Satellite remote sensing comprises optical, SAR, and altimetry missions, which are processed in our group into data products for later analysis. We use the data products to investigate the sea level contribution of ice sheets and for process studies.
https://www.awi.de/en/about-us/service/expert-database/angelika-humbert.html
What is the project's research question?
Can ML improve the detection of surface returns in terms of quality and efficiency?
What data will your exchange student work on?
CryoSat-2 data and airborne data
What tasks will the project involve?
ML training with artificial data, followed by application to satellite altimetry data (CryoSat-2). After that, a comparison between the results of conventional retrackers and the newly developed approach. If time permits, an additional application to Sentinel-3 data.
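For orientation, the conventional retrackers used as a baseline include simple threshold retrackers, which can be sketched as follows (the synthetic waveform and the threshold fraction are illustrative, not mission parameters):

```python
import numpy as np

def threshold_retrack(waveform, frac=0.5):
    """Conventional threshold retracker: the surface return is placed at
    the first range bin where power exceeds a fraction of the peak power.
    Linear interpolation between bins gives a sub-bin estimate."""
    level = frac * waveform.max()
    i = int(np.argmax(waveform >= level))   # first bin above threshold
    if i == 0:
        return 0.0
    p0, p1 = waveform[i - 1], waveform[i]
    return (i - 1) + (level - p0) / (p1 - p0)

# Synthetic waveform: noise floor, sharp leading edge at bin 40, decaying tail.
bins = np.arange(128)
wf = 0.02 + np.exp(-((bins - 45) / 20.0) ** 2) * (bins >= 40)
print(threshold_retrack(wf))
```

An ML retracker would instead be trained on many such artificial waveforms with known surface positions and then compared against this kind of baseline on real CryoSat-2 data.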
What makes this project interesting to work on?
Ice sheets are major contributors to sea level change. To quantify the mass loss of ice sheets, satellite altimetry data are used: the change in distance between the sensor and the glacier surface is measured repeatedly, so that elevation change of the ice sheet surface can be converted into mass loss. These measurements are based on radar pulses transmitted from space-borne or airborne instruments, with the return recorded as a waveform. The better the quality of the waveform processing, the lower the measurement error; hence this project is a direct contribution to a societally relevant topic: reducing the uncertainty in measurements of the sea level contribution of ice sheets. This context, as well as the fact that the signal processing skills obtained in this field can easily be applied in other research and engineering contexts, makes the project beneficial for the applicant.
This is a joint project with collaboration partners at GFZ Potsdam (Dr. Tilo Schöne).
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The HPC infrastructure of AWI is the platform to work on. Tools will potentially be TensorFlow and other standard AI software packages, as well as preprocessing scripts on a Unix platform. The tools will also be available via remote (VPN) access.
What skills are necessary for this project?
Machine learning, Deep learning, Python, Unix
Interested candidates should be at Master's level.
The Department for Planetary Laboratories bundles the astrobiological, spectroscopy and analytical laboratory activities of the Institute for Planetary Research. The department combines the Astrobiological Laboratories, the Planetary Spectroscopy Laboratory (PSL) and the new Sample Analysis Laboratory (SAL). Within this department we offer a wide range of laboratory techniques as well as environmental chambers that cover almost all bodies in the solar system and beyond, as well as participation in ongoing and past solar system missions.
https://www.dlr.de/pf/desktopdefault.aspx/tabid-17241/
What is the project's research question?
Can we understand the mineralogy of remote targets based on remote hyperspectral data and laboratory measurements?
What data will your exchange student work on?
Hyperspectral data from lunar samples, Mars data, and laboratory measurements.
What tasks will the project involve?
Study the research literature on the remote target and the laboratory measurements, select an appropriate algorithm, and evaluate its performance and generalization on known and unknown targets.
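One candidate algorithm family is linear spectral unmixing, sketched here with invented endmember spectra (the mineral names and reflectance values are purely illustrative, not laboratory data):

```python
import numpy as np

# Hypothetical endmember spectra (reflectance at 5 wavelengths) for two
# minerals measured in the laboratory.
olivine  = np.array([0.30, 0.25, 0.20, 0.35, 0.40])
pyroxene = np.array([0.10, 0.15, 0.30, 0.20, 0.10])
E = np.column_stack([olivine, pyroxene])

# A remote pixel assumed to be a linear mixture of the endmembers.
mixed = 0.7 * olivine + 0.3 * pyroxene

# Least-squares abundance estimate (a real pipeline would add
# non-negativity and sum-to-one constraints).
abundances, *_ = np.linalg.lstsq(E, mixed, rcond=None)
print(np.round(abundances, 2))  # → [0.7 0.3]
```

Evaluating generalization then means testing whether abundances estimated from laboratory mixtures remain accurate on remote hyperspectral pixels of unknown composition.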
What makes this project interesting to work on?
Mix of theory and application, work on solar system remote sensing data, possibility to learn laboratory measurement skills.
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software
Is the data open source? After publication.
What infrastructure, programs and tools will be used? Can they be used remotely?
Python, institute server, laboratory data, laboratory instruments (the latter are available only locally).
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Geographic information systems, Python
Interested candidates should be at Master's or PhD level.
Using high-energy electrons for real-space imaging of two- and three-dimensional spin structures in magnetic nanostructures and devices;
Developing a variety of algorithms to reconstruct 3-dimensional vector fields from 2-dimensional projections of scalar fields;
Searching for novel localized spin structures that are protected by topology.
https://scholar.google.de/citations?user=Ydcc8tkAAAAJ&hl=zh-CN
What is the project's research question?
Inverse problem from 2-dimensional scalar fields to 3-dimensional vector fields
What data will your exchange student work on?
Simulated data of the forward problem; data acquisition from electron microscopy
What tasks will the project involve?
Developing or improving algorithms that aim to solve the inverse problem from 2-dimensional scalar fields to 3-dimensional vector fields
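The forward side of this inverse problem can be stated in a few lines (a toy sketch; real electron-microscopy projections involve tilt series and phase measurements, not a simple axis sum):

```python
import numpy as np

def project(volume, axis=0):
    """Forward operator of the tomography problem: line integrals of a 3D
    scalar field along the beam direction give one 2D projection. The
    inverse problem is to recover the 3D field from a few such projections."""
    return volume.sum(axis=axis)

# One component of a magnetization field on an 8x8x8 grid.
m = np.zeros((8, 8, 8))
m[3:5, 3:5, 3:5] = 1.0
proj_z = project(m, axis=2)   # what the microscope effectively measures
print(proj_z.shape, proj_z.max())
```

Because only a limited set of projection directions is available, the reconstruction is under-determined, which is exactly where learned priors can help.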
What makes this project interesting to work on?
There are not yet algorithms that can perform proper tomographic reconstruction of vector fields from a limited 2D dataset. In particular, in electron microscopy there is not yet an algorithm that can recover spin structures in 3D from the measured 2D scalar fields (one component of the magnetic vector potential).
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Jülich Supercomputing Centre, Python, TensorFlow, etc.
What skills are necessary for this project?
Machine learning, Deep learning, High-performance computing, Parallel/distributed programming with GPUs, Python
Interested candidates should be at PhD or postdoc level.
The group "Applied Web- and Social Media Data Analysis" is embedded in the "Data Acquisition and Mobilisation" department (Institute of Data Science, German Aerospace Center (DLR)). Our current research has a strong focus on Twitter data analysis (NLP), machine learning (supervised and unsupervised), and semantic, spatial, and temporal pattern analysis. Target applications are disaster management and human-environment interactions, for instance the effect of Covid-19 on observed Twitter sentiments. List of publications: [https://scholar.google.de/citations?](https://scholar.google.de/citations?hl=de&user=n6C4T6AAAAAJ&view_op=list_works&sortby=pubdate)
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-18211/28950_read-76230/
What is the project's research question?
How can information shared on Twitter be fact-checked and validated using the Twitter stream and other complementary open web sources?
What data will your exchange student work on?
Twitter (full archive access) and webpage text data (NLP), images might also be of interest
What tasks will the project involve?
Many possible directions:
- test tailored word embeddings for semantic Twitter stream clustering
- NLP to detect named entities with special emphasis on place names
- applying web-scraping techniques
- information retrieval
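The first bullet could start from something as simple as nearest-seed assignment with cosine similarity (toy vectors invented for illustration; tailored embeddings would be trained on the Twitter stream itself):

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy word embeddings (a real pipeline would load tailored, domain-specific
# vectors rather than hand-written 3D stand-ins).
emb = {
    "flood":    np.array([0.9, 0.1, 0.0]),
    "flooding": np.array([0.8, 0.2, 0.1]),
    "concert":  np.array([0.0, 0.1, 0.9]),
}

# Assign each term the cluster of its most similar seed term.
seeds = {"disaster": emb["flood"], "event": emb["concert"]}
for word, vec in emb.items():
    label = max(seeds, key=lambda s: cosine_sim(seeds[s], vec))
    print(word, "->", label)
```

Clustering the semantic stream this way groups related tweets before the heavier steps (named-entity recognition, cross-source validation) are applied.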
What makes this project interesting to work on?
The research goes beyond the typical focus on a single data source and is intended to be applied in real scenarios. Our practitioners (e.g. World Food Programme) need information context - not only a single Tweet that was classified by a machine learning model. For decision support, they need validated information in order to reduce the noise and to enable the utilization and incorporation of social media and web data into well-established workflows that mainly rely on remote sensing, GIS data and in-situ observations.
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software
Is the data open source? Twitter has its own terms of use, but the other data is open source.
What infrastructure, programs and tools will be used? Can they be used remotely?
We use Python and TensorFlow for programming and have access to our own HPC infrastructure, which can be accessed remotely.
Ideally, the PC used to log in to our network is provided by DLR (in order to fulfill security regulations).
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Python; web scraping and information retrieval would help
Interested candidates should be at Bachelor's, Master's, or PhD level.
I am an interdisciplinary researcher with a passion for developing and integrating qualitative and quantitative tools into natural hazards research. I am particularly interested in investigating the impacts of floods and droughts. Currently, I am diving into the amazing world of data analytics. I use a set of tools to support corpus-based investigations. I employ a series of natural language processing (NLP) techniques and both supervised and unsupervised machine learning methods to uncover hidden patterns in text data.
https://www.ufz.de/index.php?en=46549
What is the project's research question?
What were the direct and indirect impacts of the 2021 flood in Germany?
What data will your exchange student work on?
The student/researcher will work on a dataset that comprises about 30,000 newspaper articles reporting on the 2021 flood in Germany. The data spans from July 2021 until November 2021.
What tasks will the project involve?
The student/researcher will make use of unsupervised machine learning (ML) tools to find latent structure in the unlabelled newspaper text data. The goal is to identify impacts (topics) mentioned in the newspaper articles that the data itself will define.
State-of-the-art probabilistic (e.g. latent Dirichlet allocation, LDA) and neural-network-based (e.g. doc2vec, word2vec) topic modelling algorithms will be tested (Task 1). The advantage of these tools is that they do not require a priori knowledge of the events and their impacts. They allow for automatically discovering coherent topics in large text corpora. These “topics” correspond to clusters of words that are likely to co-occur following an estimated probability. In this way, topic modelling allows for inductively identifying salient topics within a collection of texts. Standard statistical evaluation metrics, like recall, precision, F-score, as well as topic coherence and diversity, will be used to select the best model. Outputs of this task will indicate, in a boolean fashion, if an article mentions a specific impact. These could include water supply disruption, electricity shortages, closure of roads, and breakage of dams.
Besides using unsupervised ML, the student/researcher will use recent advances in keyword assisted or semi-supervised topic modelling (Task 2) to identify impacts to specific sectors (e.g. forestry, agriculture, energy, food). This will allow us to assess impacts of particular interest and increase the interpretability of the model outcomes. For this semi-supervised technique, the student will integrate domain knowledge via “anchor words”, which will be defined by the Principal Investigator (Mariana Madruga de Brito) before fitting the model. This will allow the student to guide the topic model in the direction of the selected words.
Based on the inferred impacts per event, the PI together with the student/researcher will investigate trends in the impacts over time and space (Task 3). The outcomes of the topic modelling algorithm (i.e. topic similarity) will be used to characterize and summarize connections and analogies between the articles. Through quantifying the similarity of pairs of texts, we will cluster articles with related impacts. Visualization tools such as similarity graphs and PCA (principal component analysis) will be used to visualize how the articles are related in terms of their impacts. Moreover, chord diagrams will be used to analyze connections between the impacts.
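Task 1 in miniature, assuming scikit-learn is available (the corpus, topic count, and parameters are invented for illustration):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "flood damaged roads and bridges, roads closed",
    "dam breakage caused water supply disruption",
    "electricity shortage after the flood, power outage",
    "roads closed and bridges damaged by water",
]

# Bag-of-words counts, then LDA topic inference.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # per-article topic proportions
print(doc_topics.shape)  # → (4, 2)
```

The per-article topic proportions produced here are exactly the quantities Task 3 would feed into similarity graphs, PCA, and chord diagrams.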
What makes this project interesting to work on?
The project will explore novel ways of overcoming some of the major methodological obstacles related to the assessment of flood impacts. NLP is currently not a standard tool in natural hazards research. Hence, there is potential for innovation even when using state-of-the-art NLP tools. Outcomes of this stay can potentially lead to a publication of high impact in the field.
What is the project's expected outcome?
Co-authorship of a research paper
Is the data open source? The newspaper texts are not freely accessible depending on the news outlet
What infrastructure, programs and tools will be used? Can they be used remotely?
The student will use the programming language of their choice. The UFZ computer clusters will be available for the student and they can be used remotely.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Parallel/distributed programming with GPUs, Python
Interested candidates should be at Master's, PhD, or postdoc level.
The Oceanic Machine Vision Group at GEOMAR works on optical underwater surveys employing artificial intelligence (AI) and classical computer vision approaches. To this end, we seek to enable cameras to serve as faithful measurement instruments and navigation sensors in the deep sea. This visually challenging environment presents us with many geometric (refraction) and radiometric (attenuation, scattering) problems, which we seek to solve.
What is the project's research question?
How can differentiable physical models be integrated with neural networks? Optional: devise a fixed calibration structure that is beneficial for data acquisition.
What data will your exchange student work on?
We have multiple sets of underwater imagery, specifically tailored to the problems we work on:
We have sets of real imagery taken in a water test tank and in the Baltic Sea: they comprise RAW image data for radiometric calibration and a corresponding test set. In addition, we have 3D models of seafloors and test objects to test water and shadow removal on complex structures. We also have synthetic datasets, which can be used to develop and test algorithms with respect to known ground truth. All datasets comprise different light/water setups.
If necessary, we also have camera-light systems, which can be operated manually, inside a test tank or directly in the Baltic Sea. In addition, they can be attached to AUVs and the like, to capture actual deep sea datasets.
What tasks will the project involve?
Optionally: acquire new data with existing approaches, or devise new (more robust) data acquisition strategies/hardware.
Main Task:
Devising novel approaches to integrate neural networks into differentiable physical models, thus replacing parts of (potentially unsolvable) closed-form approaches with learned sub-modules.
Or reversely: devise novel approaches to integrate differentiable physical models into neural networks, thus enabling enhanced parameter estimation from images and a physically interpretable latent space estimate.
The actual implementation would involve writing Python/C++ code on Linux machines, potentially with some remote work (over SSH).
The evaluation can also take place on-site or remotely (over SSH).
We have a lot of prior work and can adapt the tasks to the level and interests of the applicant.
Finally, we strive to publish the jointly achieved result.
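A minimal example of the 'differentiable physical model' idea above (a toy Beer-Lambert attenuation model with a hand-derived gradient; the invented values are illustrative, and the real work would couple such models to neural networks, e.g. via automatic differentiation in PyTorch):

```python
import numpy as np

# Beer-Lambert style attenuation: I(d) = I0 * exp(-c * d). Because the
# model is differentiable in c, the coefficient can be recovered from
# noisy observations by plain gradient descent on the mean squared error.
rng = np.random.default_rng(1)
I0, c_true = 1.0, 0.4
d = np.linspace(0.5, 5.0, 50)                  # distances through water
I_obs = I0 * np.exp(-c_true * d) + 0.001 * rng.standard_normal(50)

c = 0.1  # initial guess for the attenuation coefficient
for _ in range(2000):
    I_pred = I0 * np.exp(-c * d)
    residual = I_pred - I_obs
    grad = np.mean(2 * residual * I_pred * (-d))   # d(MSE)/dc, by hand
    c -= 0.1 * grad
print(round(c, 2))  # → 0.4
```

In the project, such a physical forward model would sit inside (or around) a neural network, so that learned sub-modules and physically interpretable parameters like c are trained jointly.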
What makes this project interesting to work on?
The goal of applying AI methods -- while maintaining the precision of classical computer vision/photogrammetry approaches -- to actual deep-sea imagery entails a twofold set of interesting aspects:
On the theoretical side, novel approaches at the intersection of computer vision and computer graphics can be discovered. The emerging field of differentiable physical models and its combination with neural networks offers a glimpse on a new generation of physically faithful AI.
On the practical side: a very close, hands-on experience with the data and its acquisition is to be expected. We do not download a dataset from the web; we get our hands wet to get it out of the water. This not only offers a direct experience of the effects we want to work with (refraction, attenuation, and the like) but also the opportunity to improve results by particularly improving on the input side.
What is the project's expected outcome?
Co-authorship of a research paper, contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
(Optionally) Taking new data, of course, would involve on site work, which can be conducted in our underwater camera-lab.
Real data can be taken in our test tank under controlled conditions.
Furthermore, GEOMAR is situated directly at the Baltic Sea, allowing for easy real-data acquisition.
However, the existing data is readily available through our Git server, which is -- of course -- accessible throughout the world.
We will also provide a gitlab project space for the development process.
If no local computing power is available to the student, we can provide remote access to an HPC workstation featuring 2x NVIDIA RTX A6000 cards (48 GiB each), an 8-core Xeon CPU, and 128 GiB RAM.
What skills are necessary for this project?
Machine learning, Deep learning, Computer vision and image processing/analysis, Python, C/C++
Interested candidates should be at Master, PhD or Postdoc level.
When software engineering and AI system development are regarded simply as a means to an end, as in many scientific settings, security considerations take a back seat. We research reliable vulnerability detection in source code, code clone detection, automatic type inference, and code quality assessment; our automatic solutions can improve security and safety without requiring much additional effort. We further consider privacy and data protection aspects with privacy-preserving learning algorithms and investigate the overall information security of AI systems.
clemens-alexander.brust@dlr.de
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-12909/22555_read-54542/
What is the project's research question?
Can we improve the general applicability of ML-based type inference solutions by generating and analyzing multi-domain training data?
What data will your exchange student work on?
Our group already has "CrossDomainTypes4Py", a large cross-domain type inference dataset focusing on the two domains of scientific code and web development. It is mined from public source repositories, and projects are associated with domains by studying their dependencies.
What tasks will the project involve?
- Extension of the existing pipeline to more than two domains
- Identification of suitable domains in a methodical fashion
- Evaluation of state-of-the-art type inference methods across domains, including third-order interactions, which have not been studied yet.
What makes this project interesting to work on?
Machine-learning based methods have had difficulties with type inference tasks, for example due to the nesting of generic types. Formulating the task as a classification problem assumes a finite number of classes, and therefore struggles with open vocabularies. At the same time, reasonably performant tools based on static analysis exist. However, machine learning-based approaches can take advantage of semantic information, such as identifiers and comments, that static tools do not use. They can also be fine-tuned to certain problem domains and learn common expressions, patterns and usages. This project extends our current research into cross-domain type inference, where we want to generalize from one problem domain to another by introducing third-order interactions: for example, one might train on two domains and then apply the resulting model to a third domain where labeled training data is not available.
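As a rough, hypothetical illustration of how supervised training pairs for type inference can be mined from annotated code (the actual CrossDomainTypes4Py pipeline is more involved), Python's standard `ast` module can extract (identifier, annotation) pairs:

```python
import ast

def extract_type_pairs(source: str):
    """Collect (identifier, annotation) pairs from annotated Python code.
    Such pairs can serve as supervised labels for ML-based type inference."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        # variable annotations, e.g.  y: float = ...
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            pairs.append((node.target.id, ast.unparse(node.annotation)))
        # parameter and return annotations on function definitions
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for arg in node.args.args:
                if arg.annotation is not None:
                    pairs.append((arg.arg, ast.unparse(arg.annotation)))
            if node.returns is not None:
                pairs.append((node.name, ast.unparse(node.returns)))
    return pairs

code = "def scale(x: float, factor: int) -> float:\n    y: float = x * factor\n    return y\n"
pairs = extract_type_pairs(code)
```

An ML model would then learn to predict the annotation from the surrounding code context, while the extracted pairs serve as ground truth.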
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
All our tools are open-sourced. Our infrastructure, including a HPDA cluster, can be accessed remotely via a VPN connection. Most of the institute works from home currently, so we are well suited to communicating this way.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Software development, Python
Interested candidates should be at Bachelor or Master level.
The research aims of the Remote Sensing and Geoinformatics Section at GFZ are to establish remote sensing as a core method in geosciences. In particular, we aim to increase awareness of the considerable value of remotely sensed data for knowledge generation about Earth’s surface properties and processes, which arises from its ability to provide complete coverage over large spatial scales.
https://www.gfz-potsdam.de/en/section/remote-sensing-and-geoinformatics/overview/
What is the project's research question?
Which deep learning algorithms can be applied to soil spectroscopy data to predict soil properties accurately?
What data will your exchange student work on?
The exchange student will work on combining several open access soil spectral libraries in the Visible-Near Infrared-Mid-infrared range for predicting soil properties.
What tasks will the project involve?
- Descriptive analysis
- Outlier detection
- Soil spectral pre-processing
- Model development
- Model validation
What makes this project interesting to work on?
Soil spectroscopy is the measurement of light absorption when light in the visible, near-infrared or mid-infrared (Vis-NIR-MIR) regions of the electromagnetic spectrum is applied to a soil surface. Vis-NIR-MIR reflectance spectroscopy senses the proportion of the incident radiation reflected by the soil. These characteristic spectra can then be used to estimate numerous soil attributes, including minerals, organic compounds and water. Additionally, it can improve soil property prediction when surface reflectance from satellite imagery is used.
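As a small, hypothetical example of the "soil spectral pre-processing" step listed above, the standard normal variate (SNV) transform is one common scatter correction applied to Vis-NIR-MIR spectra before modelling; the exact pre-processing chain used in the project may differ:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually,
    a common correction for scatter effects in Vis-NIR-MIR soil spectra."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# Two toy spectra (rows), four wavelengths (columns).
raw = np.array([[0.2, 0.4, 0.6, 0.8],
                [1.1, 1.3, 1.2, 1.4]])
corrected = snv(raw)
```

After SNV, every spectrum has zero mean and unit variance, which removes multiplicative scatter differences between samples before model development.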
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
All data processing needs to be done using python or R language.
GFZ's high-performance computing system will be used for developing models.
All development can be realized remotely.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Parallel/distributed programming with GPUs, Python,
Interested candidates should be at Master, PhD or Postdoc level.
The machine learning group is situated at the junction of fundamental machine learning research and practical applications within the German Aerospace Center (DLR), such as computer vision for earth observation data, anomaly detection, explainable artificial intelligence and many more. It aims to be at the state of the art in deep learning, to further develop such methods, and determine how to put them into practice for DLR problems. Consequently, the group considers machine learning not just for a specific set of applications or data sets, but from a holistic perspective.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-13689/23754_read-54469/
What is the project's research question?
Which anomaly categories can be mined from the lab notebooks of the EDEN ISS greenhouse, and how can they be clustered and mapped to the respective telemetry data?
What data will your exchange student work on?
The main dataset consists of three years of telemetry data (99 variables) recorded in the EDEN ISS research greenhouse in Antarctica.
For the same duration, we have the lab notebooks that contain information about anomalous events that occurred during daily work.
We want to make this expert information usable and map it to the anomalies found in the telemetry data.
What tasks will the project involve?
The goal can be approached from different directions depending on the intern's interests and skillset, e.g. extracting the information using NLP methods, categorizing the found events into useful anomaly categories, or focusing on the mapping to the telemetry dataset.
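As a hypothetical sketch of the NLP direction, plain TF-IDF scoring can already surface candidate anomaly keywords in notebook-style entries before any clustering or mapping to telemetry (the entries below are invented, not EDEN ISS data):

```python
import math
from collections import Counter

# Invented mini-corpus standing in for lab-notebook entries.
entries = [
    "pump failure in nutrient loop, flow dropped",
    "routine harvest of lettuce, nominal conditions",
    "co2 sensor drift observed, recalibrated sensor",
]
docs = [e.split() for e in entries]
n_docs = len(docs)

def tfidf(term, doc):
    """Term frequency in this entry, down-weighted by corpus-wide frequency."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(n_docs / df)

# Highest-scoring term per entry as a crude candidate keyword
# (ties are broken arbitrarily in this toy version).
keywords = [max(set(d), key=lambda t: tfidf(t, d)) for d in docs]
```

In practice one would tokenize properly and use established libraries, but the principle -- rare, entry-specific terms score high -- is the same.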
What makes this project interesting to work on?
The project offers the opportunity to work on a unique dataset and a challenging task, and to bring in one's own ideas and research interests. The EDEN ISS facility was established in 2018 to conduct research on growing vegetables in closed-system environments. This knowledge will be used to develop greenhouse habitat modules for future space missions, e.g. to the Moon or Mars.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
High Performance Compute Cluster (can only be used on-site)
Python
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc level.
Our lab develops and applies genomics and computational approaches, in particular from machine learning, to understand mechanisms of gene regulation in eukaryotic organisms. Computational biology has become indispensable to analyze and ultimately make sense of large-scale data sets that look at the phenomenon of gene regulation from different angles. Our long-term goal is to investigate how regulatory networks enable the correct development of complex organisms, with their multitude of cell types that carry out different functions despite the same genome.
https://ohlerlab.mdc-berlin.de/ and https://github.com/ohlerlab
What is the project's research question?
Improving and extending a deep learning model for multimodal single-cell data integration (related to multi-view learning); generating biologically meaningful joint representations of different biological data types with distinct properties (dimensionality and scale of features, information content, noise)
What data will your exchange student work on?
- Multimodal single-cell data of well studied human cell populations of peripheral blood mononuclear cells and bone marrow mononuclear cells
- Multimodal single-cell data refers to sequencing-based paired measurements of distinct molecular layers of information, such as, e.g., gene expression (RNA) and chromatin accessibility (DNA), originating from the same single-cell
- Data represented as count matrices
What tasks will the project involve?
Dependent on the student's interest, the project will revolve around one of the following aspects:
- Optimizing our existing model for vertical and horizontal multimodal single-cell data integration with respect to its training time, by optimizing the existing code (written in PyTorch) or rewriting it using a different framework (e.g., Pyro).
- Extending the existing model to allow for mosaic integration, i.e. combining single-view and multi-view datasets. Here, we already have ideas for modifying the existing architecture, but the student could also contribute their own ideas.
- Extending the framework to be able to work with more than two modalities at the same time
- For related prior research of our group see: https://www.biorxiv.org/content/10.1101/2021.05.11.443540v1.full and https://linkinghub.elsevier.com/retrieve/pii/S0168-9525(21)00255-9
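To illustrate the integration idea in miniature (a hypothetical NumPy stand-in, not the group's PyTorch model): each modality gets its own encoder, and the per-modality embeddings are fused into one joint latent representation, here by simple averaging in place of a product-of-experts style fusion:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_peaks, latent_dim = 100, 500, 2000, 16

# Toy count matrices standing in for paired RNA / chromatin measurements.
rna = rng.poisson(2.0, size=(n_cells, n_genes))
atac = rng.poisson(0.5, size=(n_cells, n_peaks))

# Hypothetical linear "encoders"; in the real model these are trained networks.
w_rna = rng.normal(size=(n_genes, latent_dim)) / np.sqrt(n_genes)
w_atac = rng.normal(size=(n_peaks, latent_dim)) / np.sqrt(n_peaks)

def encode(counts, w):
    x = np.log1p(counts)                       # variance-stabilising transform
    x = (x - x.mean(0)) / (x.std(0) + 1e-8)    # per-feature standardisation
    return x @ w

# Joint representation: average the per-modality embeddings.
z = 0.5 * (encode(rna, w_rna) + encode(atac, w_atac))
```

The research questions above -- mosaic integration, more than two modalities -- then amount to making this fusion step robust when some modalities are missing or additional ones are added.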
What makes this project interesting to work on?
- Insight into problems at the frontier of developments in the field of single-cell genomics
- Using ML to solve a real-world problem of high relevance for experimentalists
- Speed-up reduces the computational burden and enhances competitiveness
- Extends applicability of existing models to more use cases
What is the project's expected outcome?
Contribution to software, possible co-authorship dependent on project state and outcome
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
- GPU nodes on the institute's cluster
- Python
- (yet) unpublished in-house software library (python/Pytorch)
- Pytorch
- potentially Pyro
- Jupyter Notebooks
- Standard python data science libraries, e.g., pandas, matplotlib, sklearn, domain-specific libraries, and data structures scanpy and anndata
All of them can be used remotely.
What skills are necessary for this project?
Deep learning, Software development, Python
Interested candidates should be at Bachelor or Master level.
The Causal Inference group at the DLR-Institute of Data Science in Jena develops theoretical foundations, algorithms, and accessible software tools for causal inference and machine learning. Causal inference is a challenging and promising research field and its application to domains such as climate science will have a high impact both to advance science and to address topics of critical importance for society. The core methodological topics include causal inference and causal discovery for spatio-temporal dynamical systems, machine learning, deep learning, and nonlinear time series analysis.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-12906/22550_read-52195/
What is the project's research question?
Are novel algorithms from the emerging field of causal representation learning able to reliably learn latent causal variables and their causal interactions in spatio-temporal data?
What data will your exchange student work on?
We will use synthetic spatio-temporal data generated from a simplified stochastic climate model based on a VAR process that emulates spatially aggregated modes of variability (these are the latent causal variables) which interact via teleconnections (these are their causal interactions).
The data need to be synthetic in order to have ground-truth knowledge about the latent variables and their interactions for the purpose of method evaluation.
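A minimal, hypothetical version of such a data generator might look as follows: latent modes evolve as a VAR(1) process whose coupling matrix encodes the "teleconnections", and each mode is projected onto the grid through a spatial loading pattern (all coefficients and sizes here are illustrative, not the project's model):

```python
import numpy as np

rng = np.random.default_rng(42)
n_latent, n_time, grid = 3, 2000, (10, 10)

# VAR(1) coupling matrix: off-diagonal entries are the causal links.
A = np.array([[0.6, 0.0, 0.0],     # mode 0 drives only itself
              [0.4, 0.5, 0.0],     # mode 0 -> mode 1
              [0.0, 0.3, 0.5]])    # mode 1 -> mode 2

z = np.zeros((n_time, n_latent))
for t in range(1, n_time):
    z[t] = A @ z[t - 1] + rng.normal(size=n_latent)

# Each latent mode imprints a spatial loading pattern on the grid.
patterns = rng.normal(size=(n_latent, grid[0] * grid[1]))
fields = (z @ patterns).reshape(n_time, *grid)   # (time, lat, lon)
```

A causal representation learning algorithm sees only `fields` and is evaluated on how well it recovers `z` and the structure of `A`.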
What tasks will the project involve?
We will give input and guidance at all steps of the project. These are:
- Familiarize yourself with the task of causal representation learning and the selected algorithms.
- Familiarize yourself with the data-generating model.
- Apply the selected algorithms to synthetically generated data for varying setups and systematically evaluate the algorithms' performance according to appropriate metrics.
What makes this project interesting to work on?
Causal representation learning is a novel and cutting-edge line of research in the field of machine learning. Moreover, its application to spatio-temporal data is of high relevance for data-driven approaches in applied fields such as the climate and environmental sciences. The results of this study may guide future research in this important direction.
The project thus offers the opportunity to work with exciting novel machine learning methods towards an important and relevant research direction.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Synthetic data generated from an open source model
What infrastructure, programs and tools will be used? Can they be used remotely?
We will use GitLab to organize and exchange files. This is accessible remotely. Other tools, e.g. a Python IDE and a LaTeX editor, can be used locally. For computationally expensive operations access to a computing cluster will be provided.
To facilitate remote work we would hold regular online meetings for discussing goals, progress, and other aspects of the project.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Python
Interested candidates should be at Master, PhD or Postdoc level.
We develop and apply numerical-modelling and data-science approaches to simulate and understand the processes responsible for sculpting the Earth's surface. These processes include, but are not limited to, incision by rivers and glaciers, formation of deltas, coastal erosions, natural hazards, and Arctic permafrost landscapes (with our partners in the Alfred Wegener Institute), especially under a changing climate. One of our foci is on using deep-learning methods to augment and enhance our models and datasets.
https://www.gfz-potsdam.de/en/section/earth-surface-process-modelling/overview/
What is the project's research question?
How is the Arctic permafrost degrading in the warming climate and how do we quantify its effects and impact?
What data will your exchange student work on?
Data will consist of remote-sensing imagery from Landsat, Sentinel, and other satellites or aerial surveys with repeat coverage over time. The images are of Earth-surface features related to permafrost in the Arctic and sub-Arctic. Some of the permafrost features are labelled, and some labels may be supplied by existing machine-learning models trained to identify specific classes of features.
What tasks will the project involve?
Participate in the design and development of a deep-learning model that can detect and quantify changes of permafrost features in the pan-Arctic region. Depending on the level of experience and expertise, tasks ranging from data-pipeline development to deep-learning model implementation or even model-design changes are possible. The specifics can be adapted to the candidate, after discussions with the candidate.
What makes this project interesting to work on?
Quantitative change-detection has wide and transferable applications across many fields, both inside and outside of academic research. Moreover, the Arctic region is particularly vulnerable to global warming, with temperatures rising about twice as quickly as the global average -- known as "Arctic Amplification". The permafrost harbours huge quantities of soil organic carbon, potentially exacerbating climate change if thawed and released. The ecosystem and societies living in the Arctic are also intricately linked to the fate of permafrost.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? The data come from both public/open and proprietary sources.
What infrastructure, programs and tools will be used? Can they be used remotely?
Access to a suitable computing server with GPU capabilities will be provided. All necessary software packages are open source and can be downloaded and installed on most machines. Remote access is possible with VPN.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, High-performance computing, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Geographic information systems, Python
Interested candidates should be at Bachelor, Master or PhD level.
The machine learning group is situated at the junction of fundamental machine learning research and practical applications within the German Aerospace Center (DLR), such as computer vision for earth observation data, anomaly detection, explainable artificial intelligence and many more. It aims to be at the state of the art in deep learning, to further develop such methods, and determine how to put them into practice for DLR problems. Consequently, the group considers machine learning not just for a specific set of applications or data sets, but from a holistic perspective.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-13689/23754_read-54469/
What is the project's research question?
What is the quality of the labels in a popular remote sensing dataset used for training deep neural networks?
What data will your exchange student work on?
Aerial or satellite imagery
What tasks will the project involve?
- Choosing a dataset to work on from a selection of suitable datasets
- Getting familiar with a method of assessing label quality, most likely by estimating uncertainty
- Implementing and applying the method on the chosen dataset
- Discussing the validity of the results
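One simple, commonly used uncertainty score for the label-quality assessment step is the predictive entropy of a model's softmax output; samples where the model is most uncertain are candidates for label inspection. The probabilities below are invented for illustration:

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities: high entropy
    means the model is unsure, flagging a possible label problem."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # near-uniform: candidate label problem
    [0.70, 0.20, 0.10],
])
scores = predictive_entropy(probs)
suspects = np.argsort(scores)[::-1]   # most uncertain samples first
```

Ranking a whole dataset this way gives a prioritised list of samples whose labels are worth manual review.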
What makes this project interesting to work on?
Knowledge in the field of Deep Learning and uncertainty quantification will be gained. The student will furthermore obtain skills in the processing of remote sensing imagery, as well as get insights in the working life of a research institute.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
HPDA cluster can be used remotely
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Python
Interested candidates should be at Bachelor or Master level.
The machine learning group is situated at the junction of fundamental machine learning research and practical applications within the German Aerospace Center (DLR), such as computer vision for earth observation data, anomaly detection, explainable artificial intelligence, and many more. It aims to be at the state of the art in deep learning, to further develop such methods, and determine how to put them into practice for DLR problems. Consequently, the group considers machine learning not just for a specific set of applications or data sets, but from a holistic perspective.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-13689/23754_read-54469/
What is the project's research question?
Can information about structural uncertainties help to decrease human effort with uncertain Neural Network predictions?
What data will your exchange student work on?
The student will work in the field of uncertainty quantification in Neural Network predictions. The student will start with evaluating predictions of pre-trained neural networks with a special focus on the calibration and structure in the predicted uncertainties. The setup will be realized using artificial dummy data and remote sensing data received from satellites and labeled for land cover classification.
What tasks will the project involve?
* Getting familiarized with neural networks and predictive uncertainty in neural networks.
* Evaluating predictions from pre-trained neural networks with a special focus on predictive uncertainties.
* Based on the previous evaluations and in exchange with the supervisor, potential improvements in the training and inference strategies (data feeding, loss function, ...) should be proposed and tested.
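One standard way to evaluate the calibration of predicted uncertainties, as mentioned above, is the expected calibration error (ECE); a minimal sketch (binning scheme and toy numbers are illustrative):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Expected calibration error: per confidence bin, the gap between
    average confidence and accuracy, weighted by bin occupancy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return total

# Four predictions, all with confidence 0.79, but only 3 of 4 correct:
# the calibration gap is |0.75 - 0.79| = 0.04.
value = ece([0.79, 0.79, 0.79, 0.79], [1, 1, 1, 0])
```

A well-calibrated model has ECE near zero; systematic over- or under-confidence shows up directly in this number.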
What makes this project interesting to work on?
A crucial point for using deep learning approaches in safety-critical real-world applications is the robustness of such approaches and proper uncertainty quantification for difficult predictions. The student will gain many insights into this field and have the chance to take first steps within an established research environment. Besides this interesting field of research, there will be close cooperation with the supervisor, who works in the same field. And visiting the beautiful city of Jena is also definitely worth it.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
* Python with freely available standard packages. The deep learning part is implemented using PyTorch.
* For visitors who come to Jena (Germany), the larger computations can be run directly on a high-performance computing cluster. Remotely this is unfortunately not possible. (But as Jena is really nice you should come here anyway 😉 (if possible)).
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Python
Interested candidates should be at Bachelor or Master level.
Michael Denker is leader of the team “Data Science for Electro- and Optophysiology Behavioural Neuroscience” at INM-10, Forschungszentrum Jülich, to meet the upcoming challenges in the field of research data management in neuroscience. The main research interest of his team is to investigate the relationship between the correlation structure and spatio-temporal organization of neural activity. In the context of the EU flagship project Human Brain Project (HBP) he coordinates the development and community building of tools for improving reproducibility in analysis and model validation, such as Elephant (python-elephant.org), one of the leading open source analysis tools for electrophysiological data.
https://www.fz-juelich.de/inm/inm-6/EN/Forschung/Gruen/DSEO.html?nn=724694
What is the project's research question?
In this project we will perform a batch analysis of large datasets of activity data from the brain, with the aim to characterize significant repeating neural activation patterns in the data. Knowledge about the statistics of such patterns is expected to contribute to our understanding of the functional role of ubiquitously observed brain waves. The project consists of two parts: (i) expansion of the capabilities of the distributed tensor framework Heat (https://github.com/helmholtz-analytics/heat) as part of the underlying analysis algorithm, and (ii) application of the method to a range of data sets using high-performance compute resources, with subsequent characterization of the results.
What data will your exchange student work on?
The project will utilize neuroscientific activity data featuring electrophysiological recordings of single neuron spiking activity, i.e., point time series, obtained from a complex timed motor coordination task. The data, covering multiple recording sessions, are readily available, curated and prepared.
What tasks will the project involve?
The project will involve using the ASSET analysis method to detect recurring sequences of brain activity in the data (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004939). While an implementation of this method exists, in its current naïve form it cannot handle datasets of this size in full on a standard compute node. For this reason, a new implementation of the method has been developed that uses the Heat framework (https://github.com/helmholtz-analytics/heat) to distribute matrix operations across multiple nodes. This implementation currently lacks the final step of the ASSET analysis, which is based on a DBSCAN clustering algorithm.
In the first part of the project we will implement a solution to hook the distributed ASSET into an existing parallel implementation of DBSCAN (https://github.com/Markus-Goetz/hpdbscan). In the second part of the project we will apply the method to the available data sets, in order to extract sequences of activity patterns that repeat in excess of chance expectation within and across recording sessions. We will pool data to construct corresponding pattern statistics, such as sequence lengths, neuronal participation numbers, sequence reliability, and pattern similarities.
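For intuition about the missing final step, here is a minimal single-node DBSCAN sketch in pure Python, on 1-D points for brevity (the project would instead hook into the parallel HPDBSCAN implementation linked above):

```python
def dbscan(points, eps, min_pts):
    """Toy DBSCAN: label -1 marks noise, other labels are cluster ids."""
    n = len(points)
    labels = [None] * n                       # None = not yet visited
    def neighbors(i):
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                    # noise (may become a border point)
            continue
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster           # border point reached from a core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:            # j is itself a core point: expand
                seeds.extend(jn)
        cluster += 1
    return labels

# Two dense groups plus one isolated outlier.
labels = dbscan([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 100.0], eps=0.5, min_pts=2)
```

The distributed challenge is exactly this neighborhood search and cluster expansion, but on a matrix far too large for one node.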
What makes this project interesting to work on?
By participating in this project, you will be involved in parallelizing machine learning algorithms, and you will help answer fundamental questions on brain function via statistical pattern-mining methods in computational neuroscience.
The Heat framework for parallel computing in Python is being developed to make data-intensive research possible that is otherwise severely hindered by single-CPU memory bottlenecks. When completed, the synergy between Heat and the ASSET method will enable our group to find correlations within our brain activity data on an unprecedented scale.
What is the project's expected outcome?
Contribution to software
Is the data open source? The data is published as Brochier, T., Zehl, L., Hao, Y., Duret, M., Sprenger, J., Denker, M., Grün, S., Riehle, A., 2018. Massively parallel recordings in macaque motor cortex during an instructed delayed reach-to-grasp task. Scientific Data 5, 180055. https://doi.org/10.1038/sdata.2018.55
What infrastructure, programs and tools will be used? Can they be used remotely?
Both Heat and ASSET are written in Python, using NumPy and PyTorch functionalities, and MPI for parallel operations.
Heat: https://github.com/helmholtz-analytics/heat
ASSET (Elephant library): https://github.com/NeuralEnsemble/elephant
We use GitHub, Mattermost and video calls for developer discussions.
We have access to the supercomputers at the Jülich Supercomputing Center for performance and scalability tests:
https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/supercomputers_node.html
All the infrastructure can be used remotely.
What skills are necessary for this project?
Data analytics, statistics, Parallel/distributed programming with GPUs, Python
Interested candidates should be at PhD or Postdoc level.
We work in the field of machine learning and high-performance computing. The methods we work on include classical image processing methods, complex modelling steps such as diffusion tensor reconstruction, and modern machine learning techniques such as deep learning models. We develop and adapt methods originally developed for workstations or small clusters for scaling to HPC systems at the Jülich Supercomputing Centre.
https://www.fz-juelich.de/ias/jsc/EN/Expertise/SimLab/slns/_node.html
What is the project's research question?
This project aims at analyzing different approaches to parallelize linear regression so that it can be applied to massive amounts of data. After the different approaches are analyzed theoretically, they will be implemented in HEAT (https://github.com/helmholtz-analytics/heat), an open-source software library for high-performance data analytics and machine learning. The resulting algorithms will be benchmarked on a high-performance computing (HPC) system and finally applied to real-world data from atmospheric science.
What data will your exchange student work on?
The data comes from atmospheric research, on the one hand from air quality forecasting systems (CAMS project), on the other hand from measuring stations all over Europe.
What tasks will the project involve?
The tasks will involve theoretical analysis of machine learning algorithms (distributed linear regression, e.g. https://arxiv.org/abs/1810.00412), their implementation in software, and benchmarking on an HPC system. Finally, the implemented method will be applied to real-world data from atmospheric science.
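One of the approaches to distributed linear regression can be sketched as follows: each worker computes the partial normal-equation terms on its data chunk, a reduction sums them, and only one small solve remains. Here the workers are simulated by a loop; in the project the reduction would run over MPI/Heat, and all data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
true_coef = np.array([2.0, -1.0, 0.5])

# Synthetic regression problem standing in for the atmospheric data.
X = rng.normal(size=(1200, 3))
y = X @ true_coef + rng.normal(scale=0.01, size=1200)

# "Map" phase: each of 4 simulated workers accumulates its partial
# normal-equation terms X_k^T X_k and X_k^T y_k on its own chunk.
xtx = np.zeros((3, 3))
xty = np.zeros(3)
for idx in np.array_split(np.arange(1200), 4):
    xtx += X[idx].T @ X[idx]
    xty += X[idx].T @ y[idx]

# "Reduce" phase (with MPI this would be an allreduce of the small terms),
# followed by a single solve of the aggregated 3x3 system.
coef = np.linalg.solve(xtx, xty)
```

The appeal of this scheme is that only the small d-by-d terms travel over the network, never the raw data, so it scales to very large sample counts.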
What makes this project interesting to work on?
The project offers the opportunity to work at the interface between machine learning and high-performance computing. It also allows the student to contribute to a state-of-the-art ML software library and to work on an HPC system.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
The implementation in HEAT (https://github.com/helmholtz-analytics/heat) will be done in Python using the ML library PyTorch (https://pytorch.org/). In addition, knowledge of HPC systems is an advantage, including knowledge of MPI which will be used via MPI4py (https://mpi4py.readthedocs.io/en/stable/). The student has access to the supercomputers at the Jülich Supercomputing Center for performance and scaling tests as well as for the final application to real world data: https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/supercomputers_node.html
All work can be done remotely.
What skills are necessary for this project?
Machine learning, High-performance computing, Software development, Python
Interested candidates should be at Master, PhD or Postdoc level.
1. The Earth System Data Exploration (ESDE) research group at the Jülich Supercomputing Centre (JSC) develops innovative methods and tools for the integration and analysis of complex, heterogeneous, and big datasets related to air pollution, weather, and climate.
2. ESDE explores state-of-the-art deep learning for air quality, weather, and climate applications
3. ESDE develops parallelized deep learning workflow toolkit and performs scalable deep learning on HPC systems
What is the project's research question?
Can deep learning be an effective tool for downscaling atmospheric fields? (This task is analogous to the super-resolution task in the computer vision domain, which maps an input image from low to high resolution.) Can deep learning models be generalized by transferring a pre-trained model across geographical regions (particularly to data-sparse regions), with or without additional fine-tuning, in the context of downscaling?
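For orientation, the classical baseline that a deep-learning downscaling model has to beat is plain interpolation; a minimal separable bilinear upscaling in NumPy (purely illustrative):

```python
import numpy as np

def bilinear_upscale(field, factor):
    """Upscale a 2-D field by an integer factor with separable linear
    interpolation -- the classical baseline for learned 'downscaling'."""
    h, w = field.shape
    src_y, src_x = np.arange(h), np.arange(w)
    tgt_y = np.linspace(0, h - 1, h * factor)
    tgt_x = np.linspace(0, w - 1, w * factor)
    # Interpolate along x for every row, then along y for every column.
    tmp = np.stack([np.interp(tgt_x, src_x, row) for row in field])
    out = np.stack([np.interp(tgt_y, src_y, col) for col in tmp.T], axis=1)
    return out

coarse = np.array([[0.0, 1.0],
                   [2.0, 3.0]])       # e.g. a 2x2 coarse temperature field
fine = bilinear_upscale(coarse, 4)    # 8x8 "downscaled" field
```

A super-resolution network is trained to reconstruct fine-scale structure that such smooth interpolation cannot recover, and is evaluated against exactly this kind of baseline.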
What data will your exchange student work on?
The student will primarily work on the weather and climate benchmark datasets that have been prepared in the MAELSTROM project (see details: https://www.maelstrom-eurohpc.eu/products-ml-apps). Particularly, the student will explore the application “Datasets for 2m temperature and precipitation short-range forecasts” and the application “Dataset for 2m temperature downscaling”.
What tasks will the project involve?
1. Further develop advanced deep learning methods (e.g. GANs, vision transformers) for temperature and precipitation downscaling
2. Explore domain adaptation approaches to transfer pre-trained neural networks across multiple geographic regions and data sources
3. Scale the deep learning networks on the JUWELS and JUWELS Booster systems at the Jülich Supercomputing Center
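As a toy illustration of the super-resolution analogy, the following numpy sketch implements plain bilinear upsampling of a coarse 2-D field; such interpolation is the trivial baseline a learned downscaling model is expected to beat. This is an illustrative baseline, not part of the MAELSTROM pipeline:

```python
import numpy as np

def upscale_bilinear(field, factor):
    """Bilinearly interpolate a 2-D field onto a grid `factor` times finer;
    a toy stand-in for the low-res -> high-res mapping a DL model learns."""
    h, w = field.shape
    # target sample positions mapped back into source coordinates
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    wy = (ys - y0)[:, None]   # fractional row weights
    wx = (xs - x0)[None, :]   # fractional column weights
    f = field
    return ((1 - wy) * (1 - wx) * f[y0][:, x0]
            + (1 - wy) * wx * f[y0][:, x0 + 1]
            + wy * (1 - wx) * f[y0 + 1][:, x0]
            + wy * wx * f[y0 + 1][:, x0 + 1])

coarse = np.array([[0.0, 1.0], [2.0, 3.0]])   # e.g. a 2x2 temperature field
fine = upscale_bilinear(coarse, 2)            # 4x4 field
```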
What makes this project interesting to work on?
1. The student will contribute to a EuroHPC project and work with machine learning, HPC, and Earth science researchers from world-class international research centres and universities
2. The student will have the opportunity to tackle an important challenge in weather and climate research with deep learning
3. The student will gain experience and knowledge in the fields of deep learning, HPC, and Earth science.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The student will have remote access to the JUWELS and JUWELS Booster HPC systems at JSC, and will be able to use the Jupyter-JSC web service, which provides an interactive environment for application development (see details: https://docs.jupyter-jsc.fz-juelich.de/github/FZJ-JSC/jupyter-jsc-notebooks/blob/master/Jupyter-JSC_supercomputing-in-the-browser.pdf). All infrastructure and tools can be used remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, High-performance computing, Computer vision and image processing/analysis, Software development, Python
Interested candidates should be at Master, PhD or Postdoc level.
The central theme of the metadata management group is making data available not just in its initial context but beyond the boundaries of projects, institutions, and communities. Our activities revolve around areas such as metadata, the Semantic Web & knowledge graphs, as well as data management in research and industry. Additionally, we aim to explore which further services we can build on top of knowledge graphs and semantic metadata to support users.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-12905/22531_read-49439/
What is the project's research question?
In this project, voice recognition software solutions are to be evaluated with respect to their usefulness for non-contact data and metadata collection in scientific laboratories. The research question is how a speech recognition solution in a scientific laboratory can support (meta)data capture with respect to both data quality (e.g. the ability to work efficiently in the presence of background noise) and accuracy (the capability to recognise scientific terminology).
What data will your exchange student work on?
Voice input/output (of scientific text) and their electronic text transcription. Data preparation and generation will be part of the project.
What tasks will the project involve?
- tool selection for tests
- input data preparation & creation in a set of experiments
- evaluating performance of selected tools
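Evaluating transcription accuracy usually comes down to the word error rate (WER): the word-level Levenshtein distance normalised by the reference length. A minimal pure-Python sketch (the example sentences are made up):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length;
    the standard metric for comparing speech-recognition tools."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

# one substitution ("chloride" -> "bromide") out of six reference words
wer = word_error_rate("add five ml of sodium chloride",
                      "add five ml of sodium bromide")
```

A per-tool comparison on domain-specific vocabulary (chemical names, units) would simply aggregate this metric over a test set recorded under different noise conditions.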
What makes this project interesting to work on?
Modern laboratory work requires innovative solutions to enable the scientist to capture (meta)data in a digital form efficiently and easily at the same time. The use of electronic laboratory notebooks (ELNs) helps with improving research data management and laboratory processes. However, it is not always possible to use traditional ELNs. Voice recognition is a powerful tool that can support modern laboratories along their path towards digitalisation: collecting data directly during the experiment, transforming speech input into digital text data, and pushing them to the dedicated ELN. Combining artificial intelligence and scientific data management is the unique characteristic of the project.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? one of the tasks will be preparing the data
What infrastructure, programs and tools will be used? Can they be used remotely?
For the voice input capture we will provide the necessary infrastructure, i.e. hardware and access to our server infrastructure. Part of the infrastructure can be used remotely. For the output data analysis, necessary infrastructure is also accessible remotely and will be provided.
What skills are necessary for this project?
Machine learning, Software engineering, Software development, speech recognition, NLP
Interested candidates should be at Master level.
The Earth Surface Geochemistry group at GFZ Potsdam uses cosmogenic and stable metal isotopes to trace material turnover on the Earth’s surface. We employ these isotope fingerprints to understand weathering and climate interactions, to study soil, plant and nutrient cycles and quantify erosion processes and global sediment cycles.
https://www.gfz-potsdam.de/en/section/earth-surface-geochemistry/overview/
What is the project's research question?
Laser ablation methods are gaining huge traction in the Earth sciences, thanks to the unique insights they can offer and the high throughput of sample measurements that is possible. However, although the analyses are fast and require little or no sample preparation, data processing post-analysis is currently time-consuming and laborious, and is a rate-limiting step. The question addressed in this project is how data science methods can be applied to analyse and visualise transient stable isotope ratio signals more efficiently, apply corrections more reproducibly, evaluate large datasets more rapidly, and check the accuracy of known samples.
What data will your exchange student work on?
Our group is a leading centre for the determination of stable isotope ratios by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Variation in the abundance of different stable isotopes records processes in biological, geological and chemical systems and allows us to trace pathways and reactions without interfering with the natural conditions. Isotope ratios are routinely employed for age determination and source provenance, and as proxies for e.g. past oceanic pH values and atmospheric CO2 levels. Whereas most isotope ratios are determined by dissolving and processing bulk samples, with lengthy sample preparation steps, our group employs in situ techniques based on laser ablation. In LA-ICP-MS, a focused laser beam ablates a small spot in a solid sample; the aerosol formed is transported by a stream of helium into a plasma, where the sample is ionised. The ions are accelerated, sorted according to their mass-to-charge ratio, and recorded time-resolved in a mass spectrometer.
ICP-MS is a relative technique, meaning that the signal recorded by the mass spectrometer is potentially biased and the measurement must be corrected by comparison to known reference materials (calibration). In addition, a challenge with LA-ICP-MS is that the element of interest is not separated from the other components of the samples, which can introduce interferences that must also be corrected for.
For concentration measurements, vendor-provided software packages are typically sufficient to perform fast and reproducible data reduction. However, for stable isotope ratios – in particular when time-resolved data are recorded – the provided software is not suitable. Typically, most labs (ours included) use a set of in-house spreadsheets, macros and simple scripts to apply the correction scheme and calibrate their results. Often this means manually copying in data from raw mass spectrometer datafiles (tab-delimited text format) and visually screening the data. This lack of automation means that evaluation of outliers, blank correction and data standardisation require a significant time investment and can potentially introduce subjectivity.
What tasks will the project involve?
The student’s first task will be to explore representative datasets, to understand current practices in evaluating transient isotope ratio signals and the calculations needed, and better evaluate existing workflows and where they could be improved by automation.
The student would then be tasked with coding an interface to load and parse mass spectrometer datafiles, and perform necessary calculations and visualisations. This should include (semi)automated integration of the sample signal and background to be subtracted (based on user-defined criteria), calculation of isotope ratios, outlier rejection (based on selectable criteria), (manual) classification of sample type, standardisation of the isotope ratio measurements with reference to accompanying measurements of reference materials, and reporting of the results and data quality metrics (uncertainty).
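To illustrate the kind of calculation involved, the sketch below applies a toy blank subtraction and standard-sample bracketing to blocks of (heavy, light) intensity pairs. The function name, data layout, and correction scheme are simplified assumptions for illustration, not the group's actual workflow:

```python
import numpy as np

def delta_bracketed(sample, std_before, std_after, blank=0.0):
    """Toy standard-sample bracketing for an isotope-ratio measurement.
    Each block is a sequence of (heavy, light) intensity pairs; `blank`
    is a background intensity subtracted from every channel. Returns
    the delta value in permil relative to the bracketing standards."""
    def ratio(block):
        block = np.asarray(block, dtype=float) - blank
        heavy, light = block[:, 0].mean(), block[:, 1].mean()
        return heavy / light
    # drift correction: average the standards measured before and after
    r_std = 0.5 * (ratio(std_before) + ratio(std_after))
    return (ratio(sample) / r_std - 1.0) * 1000.0

std = [(10.0, 100.0)] * 3            # standard: ratio 0.1
sample = [(10.01, 100.0)] * 2        # sample: ratio 0.1001
d = delta_bracketed(sample, std, std)   # ~1 permil offset
```

The real tool would additionally handle signal/background integration windows, outlier rejection, and uncertainty propagation, as described above.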
What makes this project interesting to work on?
Laser ablation is an extremely versatile tool, which enables researchers to probe the elemental and isotopic composition of a wide variety of materials at the µm scale. The types of data that the student will work with – and will help to advance in the future – include measurements of atmospheric CO2 concentrations in the geological past, the chemistry of meteorites, and the geochemical signals of bleaching events in coral skeletons. The new tool could dramatically improve the speed and efficiency at which we produce these data, meaning the student’s work could have a direct benefit for our understanding of the planet, its climate sensitivity, and the effects of human-induced climate change.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software. If successful, we would expect to make the code publicly available for other scientists to use, with its own peer-reviewed descriptive paper (on which the student would be an author).
Is the data open source? The software tool would be made available as an open source project, with an accompanying publication describing its use. The (raw) data that will be used to test and evaluate the software tool are acquired for different research projects, and the availability of these data will be dictated by the needs of each project. However, our group actively encourages FAIR data principles and shares datasets in open-access data repositories. Typically, only fully processed data are made publicly available.
What infrastructure, programs and tools will be used? Can they be used remotely?
The basis of the tool can be an extension of currently available (commercial) software packages, or preferably a design from scratch in a freely available language: R or, preferably, Python. Licenses for some commercial software packages are available at GFZ Potsdam, and remote access can be granted to the collaborator.
What skills are necessary for this project?
Data analytics, statistics, Python
Interested candidates should be at Bachelor or Master level.
The German Research Centre for Geosciences (GFZ) is Germany’s national research centre for Solid-Earth Sciences, which investigates the dynamics of planet Earth as it is shaped by physical, chemical and biological processes. The Seismology group is engaged in the development and application of methods to image the elastic structure of the Earth based on the signals of natural earthquakes and the ambient background ‘hum’, as well as to analyse earthquake activity across a wide range of scales. Machine learning is currently used to enhance earthquake analysis and improve knowledge of earthquake physics.
https://www.gfz-potsdam.de/en/section/seismology/overview/
What is the project's research question?
How can we improve earthquake monitoring with Deep Learning, and how can we turn the nominal confidences returned by DL models in seismology into calibrated uncertainties?
What data will your exchange student work on?
Time series data representing ground motion from thousands of seismometers worldwide, partially unlabelled, partially labelled with annotations of seismic wave arrivals and the corresponding earthquake parameters. Standard Python libraries (ObsPy) will be used to read and preprocess the data. We operate a global network of seismic stations, a seismological data centre and an earthquake monitoring service (https://geofon.gfz-potsdam.de), which provides rich archives and real-time data streaming.
What tasks will the project involve?
The ultimate target of the project is to improve the reliability of Deep Learning (DL) based earthquake analysis models. Performant algorithms for the most straightforward applications (automatic picking of the first-arriving wave and selected secondary waves) have been developed recently, but there is still a nearly complete lack of understanding of how the nominal confidence returned by DL models relates to actual uncertainties. You will be tasked with testing the performance of algorithms for picking seismic arrivals in different settings and under different noise conditions, and with suggesting improvements to training strategies (or even model design) and evaluation metrics.
You will be working in the SeisBench framework: https://pypi.org/project/seisbench/ (see also Woollam et al. 2021 https://arxiv.org/abs/2111.00786 ; Münchmeyer et al. 2022 https://doi.org/10.1029/2021JB023499 ), which uses pytorch for implementation of DL models, and provides tools for rapidly benchmarking algorithms.
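One standard way to quantify how nominal confidences relate to actual uncertainties is the expected calibration error (ECE), which bins predictions by confidence and compares each bin's mean confidence with its observed accuracy. A minimal numpy sketch (the binning scheme and bin count are illustrative choices, not SeisBench functionality):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare mean confidence with
    observed accuracy per bin; a well-calibrated picker has ECE ~ 0."""
    confidences = np.asarray(confidences, float)
    correct = np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap   # weight by fraction of samples in bin
    return ece

# toy picks: confidence of each pick, and whether it matched a true arrival
conf = [0.95, 0.95, 0.95, 0.55, 0.55]
hits = [1, 1, 1, 1, 0]
ece = expected_calibration_error(conf, hits)
```

In the project, `correct` would come from comparing model picks against analyst-labelled arrivals within some tolerance window.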
What makes this project interesting to work on?
You will work with a team of experienced scientists and software engineers and gain experience in coding for a science-driven production environment. You will deepen your knowledge of machine learning for scientific data analysis and the related Python packages. Depending on how the project proceeds, your contributions might be included in the SeisBench package or you might become a co-author on a scientific publication. Finally, the developments are designed to directly improve our operational earthquake monitoring, so your work may directly result in improvements to an operational service.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
A desktop or laptop computer will be made available for development work. We rely on open source software for all processing and model training. Both GFZ's high-performance computing facilities and the HAICORE Helmholtz AI GPU cluster can be accessed by this project. All resources can be accessed remotely.
What skills are necessary for this project?
Machine learning, Deep learning, Software development, Python
Interested candidates should be at Master, PhD level.
Our primary research interests are situated at the intersection of geometrical deep learning, topological machine learning, and representation learning. We want to make use of geometrical and topological information—also known as manifold learning—to imbue neural networks with more information in their respective tasks, leading to better and more robust outcomes. Following the dictum ‘theory without practice is empty,’ we also develop methods to address challenges in biomedicine or healthcare applications.
bastian.rieck@helmholtz-muenchen.de
What is the project's research question?
How can we use topological structures (cycles, cliques) in graphs to compress them (in order to improve generalisation performance)?
What data will your exchange student work on?
Benchmark data sets for graph representation learning as well as new data sets collected from sensor data and social networks (I am very open towards creating new data sets for the community, but I would leave this up to the preferences of the individual student).
What tasks will the project involve?
- Getting acquainted with state-of-the-art models in geometric deep learning and topological machine learning
- Implementing your own graph neural network architectures with `pytorch-geometric`
- Learning about how to train (graph) neural networks with `pytorch-lightning`
- Checking how generalisation performance in graph learning tasks is influenced by ‘compression’ of relevant graph structures
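As a small taste of the topological structures involved, the sketch below counts triangles (3-cliques), one of the substructures a compression scheme could collapse. This brute-force version is for illustration only and would not scale to real benchmark graphs:

```python
from itertools import combinations

def count_triangles(edges):
    """Count 3-cliques (triangles) in an undirected graph given as an
    edge list; triangles and cycles are candidate structures for
    topology-aware graph compression."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(
        1 for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )

# a 4-cycle 0-1-2-3 plus the diagonal 0-2 contains exactly two triangles
n = count_triangles([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
```

In practice one would use `pytorch-geometric` data objects and scalable clique/cycle enumeration rather than this O(n³) loop.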
What makes this project interesting to work on?
You will learn a lot about the state of the art in geometric deep learning and topological machine learning, two rapidly-growing domains of machine learning that deal with structured data sets. In addition, you will pick up knowledge about deep learning frameworks—these skills can be super helpful in other projects as well.
The techniques developed in this project will be generically applicable to many graph learning problems (graph classification, link predictions, node classification) on very diverse data sets (sensor networks, meshes of 3D shapes, ...).
Geometrical deep learning and topological machine learning are rapidly-growing areas with connections to applications in the life sciences (gene regulatory networks). This project could be the first step for students to get acquainted with these areas and topics.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Essentially, only Python (with deep learning frameworks based on `pytorch` etc.) and some compute resources will be required. All of these can be provided remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Software development, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc-level.
The Institute for Software Technology conducts software research for scientific and engineering topics in aeronautics, transportation, and energy. At the DLR site in Braunschweig, the main research is on model-based systems engineering and model-driven software development for space systems. Additionally, an outstanding research activity is the interactive visualization of very large scientific datasets (in conjunction with high-performance computing research), immersive environments, augmented reality, mixed reality, visual analytics, and more.
What is the project's research question?
How can multi-variate climate data be processed interactively for explorative visualization?
What data will your exchange student work on?
The successful candidate will work on algorithms for atmospheric simulation and measurement datasets. The main focus will be on global, time-dependent simulation datasets for weather forecasts and climate informatics. The goal is to process data for interactive visualization approaches. The required level-of-detail model has to support view-dependent refinement and streaming from remote high-performance clusters. Topology-based techniques need to be developed to allow analysis and visualization of multi-field datasets at different levels of detail. Besides mapping data onto a 3D planet, features extracted from data products are to be processed in a view-dependent manner as well.
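The view-dependent refinement mentioned above can be sketched as a screen-space error criterion: refine a tile until its projected geometric error drops below a pixel tolerance. The function below is a simplified illustration; the names and the pinhole-style error model are assumptions, not CosmoScout VR's actual scheme:

```python
import math

def lod_level(distance, base_error, screen_tolerance, max_level):
    """Pick the coarsest refinement level whose projected error stays
    below a screen-space tolerance: geometric error halves per level,
    and projected error falls off linearly with viewer distance."""
    if distance <= 0:
        return max_level   # viewer inside the tile: maximum detail
    # smallest L with base_error / 2**L / distance <= screen_tolerance
    needed = math.log2(max(base_error / (screen_tolerance * distance), 1.0))
    return min(max_level, math.ceil(needed))

# nearby tiles refine deeply, distant tiles stay coarse
near = lod_level(distance=2.0, base_error=16.0, screen_tolerance=1.0, max_level=10)
far = lod_level(distance=32.0, base_error=16.0, screen_tolerance=1.0, max_level=10)
```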
What tasks will the project involve?
Literature research; Prototype development; Demonstrating prototype; Writing summary report; Presenting result as talk
What makes this project interesting to work on?
Improved programming skills; Understanding how to process and visualize globe covering 3D simulation datasets; Insight into interaction methods in immersive environments; Improved communication and presentation skills; Insight into German space missions.
What is the project's expected outcome?
Contribution to software, Research methods might eventually result in a joint publication.
Is the data open source? CosmoScout VR is open source. Depending on the security level, proprietary data or benchmark datasets are used.
What infrastructure, programs and tools will be used? Can they be used remotely?
The Institute offers access to virtual reality laboratories equipped with a powerwall installation and multiple cluster-based display walls. Some walls offer multi-touch interaction. Besides high-performance workstations, GPU clusters and High-Performance Data Analysis (HPDA) clusters are available for data processing. In addition to this infrastructure, we have head-mounted displays (Vive, Oculus Rift, etc.) as well as HoloLens glasses.
CosmoScout VR is the software framework in use. It is open source.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Geographic information systems, Software engineering, Software development, C/C++
Interested candidates should be at Master, PhD or Postdoc-level.
We offer collaboration-as-a-service to matter research scientists across the Helmholtz Association, supporting them in adopting or expanding their use of machine learning. For this, we study basic usage of machine learning in data science applications, trustworthiness aspects of machine learning, reproducibility, and advanced use of machine learning for non-canonical datasets.
What is the project's research question?
Does stochastic gradient Langevin dynamics (SGLD) offer an advantage over canonical mean/median summary statistics for MAP estimation?
What data will your exchange student work on?
The student will work on standard simulation examples which expose reference datasets in the community, see https://github.com/sbi-benchmark/sbibm/tree/main/sbibm/tasks. Aside from that, the student will work on surrogate or real simulation datasets from accelerator beam monitoring.
What tasks will the project involve?
- understand simulation based inference (see also https://arxiv.org/abs/2112.03235) and why it is needed
- train task with symmetric posterior
- train task with asymmetric posterior
- implement MAP estimation with stochastic gradient Langevin dynamics
- benchmark SGLD, mean and median based MAP estimation
- visualise results and report
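To make the comparison concrete, here is a minimal Langevin sampler on a skewed toy posterior (a Gumbel log-density), where a density-based MAP estimate and the sample mean necessarily disagree. This uses the full gradient, i.e. the zero-minibatch-noise limit of SGLD, and all names are illustrative:

```python
import math, random

def sgld(grad_logp, x0, step=0.01, n_steps=50_000, seed=42):
    """Langevin update: half a gradient step on the log-posterior plus
    Gaussian noise with variance equal to the step size. (Full-gradient
    variant; SGLD proper would use minibatch gradients.)"""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x += 0.5 * step * grad_logp(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Skewed toy posterior: Gumbel log-density with mode 0 and mean ~0.577,
# so the density-based MAP estimate and the sample mean must differ.
logp = lambda x: -x - math.exp(-x)
grad_lp = lambda x: math.exp(-x) - 1.0

samples = sgld(grad_lp, x0=0.0)
map_est = max(samples, key=logp)          # highest-density sample
mean_est = sum(samples) / len(samples)    # canonical summary statistic
```

On symmetric posteriors the two estimates coincide; the benchmark asks how much the SGLD-style estimate gains on asymmetric ones.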
What makes this project interesting to work on?
Machine learning supported Bayesian statistics for simulation-based inference: The goal of this project would be to benchmark Stochastic gradient Langevin dynamics based optimization for MAP estimation in comparison to mean/median based MAP estimation based on posterior samples.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
This project can be conducted remotely. An average laptop with a fully functional Python environment is enough. The sbi [1] and sbibm [2] packages may be required. HPC infrastructure will be offered if needed.
[1] https://github.com/mackelab/sbi/
[2] https://github.com/sbi-benchmark/sbibm
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Python
Interested candidates should be at Master, PhD or Postdoc-level.
Daniel Abou-Ras's group works on correlative electron microscopy applied to thin-film solar cells. Christoph Koch's research focuses on the structural and chemical characterization of matter by electron-beam-based techniques, at length scales ranging from micrometers to atomic resolution. For this purpose we operate several microscopes and related equipment, develop new data acquisition techniques, and, along with those, new numerical methods to extract relevant information from the collected data.
daniel.abou-ras@helmholtz-berlin.de
https://www.helmholtz-berlin.de/people/daniel-abou-ras/index_de.html
https://www.physik.hu-berlin.de/en/sem
What is the project's research question?
How can we extract structure-property relationships in optoelectronic semiconductor devices at the nanometer scale?
What data will your exchange student work on?
Multidimensional imaging and spectroscopy data from identical specimen positions acquired using various techniques in scanning electron microscopy and transmission electron microscopy
What tasks will the project involve?
- Preparing electron microscopy data for machine learning
-- Using the Python package Nion Swift for interactive data analysis (GUI running on a remote server)
-- Homogenizing length scales of datasets (possibly warping)
-- Dividing data into subsets
- Extracting correlations using machine learning
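Homogenizing length scales can be as simple as resampling every dataset onto a common pixel grid before correlation. The nearest-neighbour sketch below is a toy stand-in for proper registration or warping:

```python
import numpy as np

def match_shape(image, target_shape):
    """Resample a 2-D dataset onto a target pixel grid by nearest-neighbour
    lookup, so datasets recorded at different magnifications share one
    length scale before correlations are extracted."""
    h, w = image.shape
    th, tw = target_shape
    # map each target pixel to the nearest source pixel
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return image[np.ix_(rows, cols)]

small = np.arange(4).reshape(2, 2)     # e.g. a low-magnification map
big = match_shape(small, (4, 4))       # resampled to a 4x4 grid
```

For real SEM/TEM data one would instead use calibrated pixel sizes and, where drift or distortion matters, an interpolating warp between registered landmark points.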
What makes this project interesting to work on?
- Establishing new methods and making them available as online service
- Extracting relevant materials and device properties directly usable in the research and development of optoelectronic semiconductor devices
- Prospect of joint publication in peer-reviewed journal
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software, Establishing online service for data analysis of correlative microscopy
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
- GPU cluster with 7 NVidia V100 (remotely accessible)
- Electronic lab notebook (open source, elabftw) with Apache Guacamole service (for running applications on the server)
- Access to all the raw data, including the evaluation software used for imaging and spectroscopy
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Software development, Python
Interested candidates should be at Master, PhD or Postdoc-level.
The department "Matter under Extreme Conditions" is one of the research areas of the Center for Advanced Systems Understanding, Helmholtz-Zentrum Dresden-Rossendorf.
We conduct digital systems research of complex systems by combining methods from mathematics, physics, systems theory, data science, and scientific computing.
Within the department, we investigate the non-equilibrium behavior of matter under extreme conditions by developing innovative electronic structure methods and fusing them with machine-learning methodologies for the numerical modeling of high energy density phenomena in warm dense matter induced by extreme electromagnetic fields, temperatures, and pressures.
https://www.casus.science/research/matter-under-extreme-conditions/
What is the project's research question?
Can we develop a novel class of machine learning models that fuses concepts from quantum computing and random neural networks?
What data will your exchange student work on?
Standard machine learning datasets will be used.
What tasks will the project involve?
Within the scope of the proposed internship, the student will contribute to the validation of our proposed ARQN model using quantum algorithms via the PennyLane quantum software platform, with a limited number of qubits and the high-performance computing resources available at our institute.
Further details:
Recently, quantum computing (QC) has been leveraged for machine learning with the hope that the uncertainty in QC can be a great advantage for probability-based modeling, inspiring new research for Noisy Intermediate Scale Quantum (NISQ) devices.
In this project, we focus on a particular class of artificial neural networks called Random Neural Network (RNN).
In RNNs, the neurons are connected such that excitatory (positive) and inhibitory (negative) spike signals are interchanged. While classical (i.e., non-quantum) RNNs have demonstrated effective applications in decision making, signal processing, and image recognition tasks, their implementation has so far been limited to deterministic digital systems that output probability distributions in lieu of stochastic behaviors.
To better exploit the random nature of RNNs, we develop an Artificial Random Quantum Neuron (ARQN) model with a robust training strategy.
The ARQN model relies on the dynamical evolution of two easy-to-implement Hamiltonians and subsequent local measurements. The architecture allows exploiting complex amplitudes and back-action from measurements to influence the input. This approach to learning protocols is advantageous in the case where the input and output of the system are both quantum states. We will demonstrate this by classifying Bell pairs which can be seen as a certification protocol. Stacking the introduced elementary building blocks into larger RNN networks combines the stochastic features of an RNN with the non-local quantum correlations across the networks.
Furthermore, the ARQN has the potential to deal with noise which is crucial for various applications, including computer vision in NISQ devices.
What makes this project interesting to work on?
This project is at the forefront of fusing artificial intelligence with quantum computing. By embedding the ARQN neuron model in classical RNN using classical-quantum algorithms, we will exploit RNN for faster, secure, and energy-efficient computation in noisy environments. It also aims to improve the resulting accuracy for various applications such as pattern recognition, optimization, learning, associative memory, and others.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The high-performance computing facilities of the Helmholtz-Zentrum Dresden-Rossendorf will be used, which are accessible remotely.
What skills are necessary for this project?
Machine learning, Deep learning, High-performance computing, Software development, Python
Interested candidates should be at Bachelor, Master or PhD level.
We use satellite remote sensing and simulations to study the dynamics of the ice sheets in Greenland and Antarctica. Satellite remote sensing comprises optical, SAR and altimetry missions, which are processed in our group into data products for later analysis. We use the data products to investigate the sea level contribution of ice sheets and for process studies.
https://www.awi.de/en/about-us/service/expert-database/angelika-humbert.html
What is the project's research question?
Can ML improve the detection of surface returns in terms of quality and efficiency?
What data will your exchange student work on?
CryoSat-2 data and airborne data
What tasks will the project involve?
ML training with artificial data, followed by application to satellite altimetry data (CryoSat-2). After that, a comparison between the results of conventional retrackers and the newly developed approach. If time permits, additional application to Sentinel-3 data.
What makes this project interesting to work on?
Ice sheets are major contributors to sea level change. To quantify the mass loss of ice sheets, satellite altimetry data is used. Essentially, the change in distance between the sensor and the glacier surface is measured repeatedly, so that elevation change of the ice sheet surface can be converted into mass loss. These measurements are based on radar pulses transmitted from space-borne or airborne instruments, with the return recorded as a waveform. The better the quality of the waveform processing, the lower the measurement error; hence this project is a direct contribution to a societally relevant topic: the reduction of uncertainty in measurements of the sea level contribution of ice sheets. This context, as well as the fact that the signal processing skills obtained in this research field can easily be applied in other research and engineering contexts, makes the project beneficial for the applicant.
This is a joint project together with collaboration partners from GFZ Potsdam, Dr. Tilo Schöne.
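For context, a conventional retracker of the kind the ML approach would be benchmarked against can be as simple as a threshold retracker: find where the waveform's leading edge first crosses a fraction of the peak power, with sub-bin interpolation. A minimal numpy sketch (the threshold value and linear interpolation are illustrative choices):

```python
import numpy as np

def threshold_retrack(waveform, threshold=0.5):
    """Classic threshold retracker: return the (fractional) range bin
    where the leading edge crosses `threshold` x peak power."""
    wf = np.asarray(waveform, float)
    level = threshold * wf.max()
    i = np.nonzero(wf >= level)[0][0]   # first bin at or above the level
    if i == 0:
        return 0.0
    # linear sub-bin interpolation between bins i-1 and i
    return (i - 1) + (level - wf[i - 1]) / (wf[i] - wf[i - 1])

wf = [0.0, 0.1, 0.4, 1.0, 0.7, 0.3]    # toy waveform, peak in bin 3
bin_pos = threshold_retrack(wf, 0.5)
```

An ML surface-return detector would be trained to produce the same quantity (the retracking point) from the full waveform, and the project would compare the two on CryoSat-2 data.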
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The HPC infrastructure of AWI is the platform to work on. Tools will potentially include TensorFlow and other standard AI software packages, as well as preprocessing scripts on a Unix platform. The tools are also available via remote (VPN) access.
What skills are necessary for this project?
Machine learning, Deep learning, Python, unix
Interested candidates should be at Master level.
The Department for Planetary Laboratories bundles the astrobiological, spectroscopy and analytical laboratory activities of the Institute for Planetary Research. The department combines the Astrobiological Laboratories, the Planetary Spectroscopy Laboratory (PSL) and the new Sample Analysis Laboratory (SAL). Within this department we offer a wide range of laboratory techniques and environmental chambers that cover almost all bodies in the solar system and beyond, as well as participation in ongoing and past solar system missions.
https://www.dlr.de/pf/desktopdefault.aspx/tabid-17241/
What is the project's research question?
Can we understand the mineralogy of remote targets, based on remote hyperspectral data and laboratory measurements?
What data will your exchange student work on?
Hyperspectral data from lunar samples, Mars data, and laboratory measurements.
What tasks will the project involve?
Study the research literature on the remote target and on laboratory measurements, select an appropriate algorithm, and evaluate its performance and generalization on known and unknown targets.
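As a minimal sketch of the "select an algorithm, evaluate generalization" step, the example below classifies purely synthetic spectra (two hypothetical mineral classes distinguished by an absorption band) with a random forest and cross-validation; the real project would use actual laboratory and remote hyperspectral data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical setup: two classes with an absorption band at
# different wavelengths (synthetic, for illustration only).
wavelengths = np.linspace(1.0, 2.5, 100)  # micrometres

def spectrum(center):
    band = np.exp(-((wavelengths - center) ** 2) / 0.005)
    return 1.0 - 0.3 * band + 0.02 * rng.standard_normal(wavelengths.size)

X = np.array([spectrum(1.4) for _ in range(50)] + [spectrum(1.9) for _ in range(50)])
y = np.array([0] * 50 + [1] * 50)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # generalization estimate
print(round(scores.mean(), 3))
```

Evaluating on held-out folds (and, in the project, on entirely unseen targets) is what separates apparent from genuine generalization.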
What makes this project interesting to work on?
A mix of theory and application, work on solar system remote data, and the possibility to learn laboratory measurement skills.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? After publication.
What infrastructure, programs and tools will be used? Can they be used remotely?
Python, institute server, laboratory data, laboratory instruments (the latter are available only locally).
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Geographic information systems, Python
Interested candidates should be at Master or PhD level.
Using high-energy electrons for real-space imaging of two- and three-dimensional spin structures in magnetic nanostructures and devices;
Developing a variety of algorithms to reconstruct 3-dimensional vector fields from 2-dimensional projections of scalar fields;
Searching for novel localized spin structures that are protected by topology.
https://scholar.google.de/citations?user=Ydcc8tkAAAAJ&hl=zh-CN
What is the project's research question?
How can the inverse problem of reconstructing 3-dimensional vector fields from 2-dimensional scalar fields be solved?
What data will your exchange student work on?
Simulated data of the forward problem; data acquisition from electron microscopy
What tasks will the project involve?
Developing or improving algorithms that aim to solve the inverse problem of reconstructing 3-dimensional vector fields from 2-dimensional scalar fields
What makes this project interesting to work on?
There are not yet algorithms that can do proper tomographic reconstruction of vector fields from a limited 2-dimensional dataset. In particular, in electron microscopy there is no algorithm yet that can recover 3D spin structures from the measured 2D scalar fields (one component of the magnetic vector potential).
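The flavour of such an inverse problem can be sketched with a much-simplified toy: a linear forward operator maps many unknowns to few measurements (as in limited-angle tomography), and a regularized least-squares objective is minimized by gradient descent. The operator, field, and regularizer below are all illustrative stand-ins, not the project's actual forward model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear inverse problem: measurements m = A @ v with far fewer
# measurements than unknowns, solved by gradient descent on a
# least-squares objective plus a smoothness prior.
n_unknowns, n_meas = 60, 20
A = rng.standard_normal((n_meas, n_unknowns)) / np.sqrt(n_meas)
v_true = np.sin(np.linspace(0, 3 * np.pi, n_unknowns))  # unknown "field"
m = A @ v_true

D = np.diff(np.eye(n_unknowns), n=2, axis=0)  # second-difference operator

v = np.zeros(n_unknowns)
lr, lam = 0.1, 0.01
for _ in range(3000):
    # Gradient of 0.5*||A v - m||^2 + 0.5*lam*||D v||^2
    grad = A.T @ (A @ v - m) + lam * (D.T @ (D @ v))
    v -= lr * grad

rel_residual = np.linalg.norm(A @ v - m) / np.linalg.norm(m)
print(round(rel_residual, 4))
```

In the actual project the forward model would be the physical projection of a 3D vector field to 2D scalar fields, and the optimizer could be replaced or augmented by a neural network.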
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
Juelich supercomputing center, Python, Tensorflow, etc.
What skills are necessary for this project?
Machine learning, Deep learning, High-performance computing, Parallel/distributed programming with GPUs, Python
Interested candidates should be at PhD or Postdoc-level.
The group "Applied Web- and Social Media Data Analysis" is embedded in the "Data Acquisition and Mobilisation" Department (Institute of Data Science, German Aerospace Center (DLR)). Our current researches has a strong focus on Twitter data analysis (NLP), Machine Learning (supervised and unsupervised) and semantic, spatial and temporal pattern analysis. Target applications are disaster management and human-environment interactions, for instance, the effect of Covid-19 on observed Twitter sentiments. List of publications: [https://scholar.google.de/citations?](https://scholar.google.de/citations?hl=de&user=n6C4T6AAAAAJ&view_op=list_works&sortby=pubdate)
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-18211/28950_read-76230/
What is the project's research question?
How can information shared on Twitter be fact-checked and validated using the Twitter stream and other complementary open web sources?
What data will your exchange student work on?
Twitter (full archive access) and webpage text data (NLP), images might also be of interest
What tasks will the project involve?
Many possible directions:
- test tailored word embeddings for semantic Twitter stream clustering
- NLP to detect named entities with special emphasis on place names
- applying web-scraping techniques
- information retrieval
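As a minimal illustration of the first direction (semantic clustering of a tweet stream), the sketch below clusters a tiny set of invented tweets with TF-IDF features and k-means; the project would use tailored word embeddings and the full-archive Twitter stream instead:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical mini-stream of tweets (illustration only)
tweets = [
    "flood warning river rising water levels critical",
    "river flood water levels rising fast tonight",
    "football goal match game incredible tonight",
    "amazing football goal in the game after the match",
]

# TF-IDF vectors as a simple stand-in for tailored word embeddings
X = TfidfVectorizer(stop_words="english").fit_transform(tweets)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

The resulting clusters group the flood-related and the football-related tweets; replacing TF-IDF with domain-specific embeddings is exactly where the research question starts.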
What makes this project interesting to work on?
The research goes beyond the typical focus on a single data source and is intended to be applied in real scenarios. Our practitioners (e.g. the World Food Programme) need information context, not only a single tweet classified by a machine learning model. For decision support, they need validated information in order to reduce the noise and to enable the utilization and incorporation of social media and web data into well-established workflows that mainly rely on remote sensing, GIS data and in-situ observations.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Twitter has own terms of use, but other data is open source
What infrastructure, programs and tools will be used? Can they be used remotely?
We use Python and tensorflow for programming and have access to an own HPC infrastructure, which can be accessed remotely.
Ideally, the PC used to log in to our network is provided by DLR (in order to fulfill security regulations).
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Python; web scraping and information retrieval would help
Interested candidates should be at Bachelor, Master or PhD level.
I am an interdisciplinary researcher with a passion for developing and integrating qualitative and quantitative tools into natural hazards research. I am particularly interested in investigating the impacts of floods and droughts. Currently, I am diving into the amazing world of data analytics. I use a set of tools to support corpus-based investigations. I employ a series of natural language processing (NLP) techniques and both supervised and unsupervised machine learning methods to uncover hidden patterns in text data.
https://www.ufz.de/index.php?en=46549
What is the project's research question?
What were the direct and indirect impacts of the 2021 flood in Germany?
What data will your exchange student work on?
The student/researcher will work on a dataset that comprises about 30,000 newspaper articles reporting on the 2021 flood in Germany. The data span July 2021 to November 2021.
What tasks will the project involve?
The student/researcher will make use of unsupervised machine learning (ML) tools to find latent structure in the unlabelled newspaper text data. The goal is to identify impacts (topics) mentioned in the newspaper articles that the data itself will define.
State-of-the-art probabilistic (e.g. latent Dirichlet allocation, LDA) and neural network-based (e.g. doc2vec, word2vec) topic modelling algorithms will be tested (Task 1). The advantage of these tools is that they do not require a priori knowledge of the events and their impacts. They allow for automatically discovering coherent topics in large text corpora. These “topics” correspond to clusters of words that are likely to co-occur following an estimated probability. In this way, topic modelling allows for inductively identifying salient topics within a collection of texts. Standard statistical evaluation metrics, like recall, precision, F-score, as well as topic coherence and diversity, will be used to select the best model. Outputs of this task will indicate, in a Boolean fashion, whether an article mentions a specific impact. These could include water supply disruption, electricity shortages, closure of roads, and breakage of dams.
Besides using unsupervised ML, the student/researcher will use recent advances in keyword assisted or semi-supervised topic modelling (Task 2) to identify impacts to specific sectors (e.g. forestry, agriculture, energy, food). This will allow us to assess impacts of particular interest and increase the interpretability of the model outcomes. For this semi-supervised technique, the student will integrate domain knowledge via “anchor words”, which will be defined by the Principal Investigator (Mariana Madruga de Brito) before fitting the model. This will allow the student to guide the topic model in the direction of the selected words.
Based on the inferred impacts per event, the PI together with the student/researcher will investigate trends in the impacts over time and space (Task 3). The outcomes of the topic modelling algorithm (i.e. topic similarity) will be used to characterize and summarize connections and analogies between the articles. Through quantifying the similarity of pairs of texts, we will cluster articles with related impacts. Visualization tools such as similarity graphs and PCA (principal component analysis) will be used to visualize how the articles are related in terms of their impacts. Moreover, chord diagrams will be used to analyze connections between the impacts.
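A minimal sketch of Task 1 with scikit-learn's LDA implementation is shown below, on a tiny invented corpus standing in for the ~30,000 newspaper articles; the per-article topic proportions are the quantity that would later be thresholded into Boolean impact indicators:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny illustrative corpus (real input: ~30,000 newspaper articles)
docs = [
    "dam breakage electricity shortage power outage reported",
    "power outage and electricity shortage after the flood",
    "road closure bridge damage traffic disruption",
    "traffic disruption after road closure and bridge damage",
]

# Bag-of-words counts, then fit a two-topic LDA model
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)  # per-article topic proportions
print(doc_topics.round(2))
```

For Task 2, keyword-assisted variants would additionally seed selected topics with the PI-defined anchor words before fitting.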
What makes this project interesting to work on?
The project will explore novel ways of overcoming some of the major methodological obstacles related to the assessment of flood impacts. NLP is currently not a standard tool in natural hazards research. Hence, there is potential for innovation even when using state-of-the-art NLP tools. Outcomes of this stay can potentially lead to a publication of high impact in the field.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Depending on the news outlet, the newspaper texts are not freely accessible.
What infrastructure, programs and tools will be used? Can they be used remotely?
The student will use the programming language of their choice. The UFZ computer clusters will be available for the student and they can be used remotely.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Parallel/distributed programming with GPUs, Python
Interested candidates should be at Master, PhD or Postdoc level.
The Oceanic Machine Vision Group at GEOMAR works on optical underwater surveys employing artificial intelligence (AI) and classical computer vision approaches. To this end, we seek to enable cameras to serve as faithful measurement instruments and navigation sensors in the deep sea. This visually challenging environment presents many geometric (refraction) and radiometric (attenuation, scattering) problems, which we seek to solve.
What is the project's research question?
How can differentiable physical models be integrated with neural networks? Optionally, on the data-acquisition side: devise a fixed calibration structure that is beneficial for data acquisition.
What data will your exchange student work on?
We have multiple sets of underwater imagery, specifically tailored to the problems we work on:
We have sets of real imagery taken in a water test tank and in the Baltic Sea, comprising RAW image data for radiometric calibration and a corresponding test set. In addition, we have 3D models of seafloors and test objects to test water and shadow removal on complex structures. We also have synthetic datasets, which can be used to develop and test algorithms with respect to known ground truth. All datasets comprise different light/water setups.
If necessary, we also have camera-light systems, which can be operated manually, inside a test tank or directly in the Baltic Sea. In addition, they can be attached to AUVs and the like, to capture actual deep sea datasets.
What tasks will the project involve?
Optionally: Acquire new data, with existing approaches, or devise new (more robust) data acquisition strategies/hardware.
Main Task:
Devising novel approaches to integrate neural networks into differentiable physical models, thus replacing parts of (potentially unsolvable) closed-form approaches with learned sub-modules.
Or, conversely: devising novel approaches to integrate differentiable physical models into neural networks, thus enabling enhanced parameter estimation from images and a physically interpretable latent-space estimate.
The actual implementation would involve writing Python/C++ code on Linux machines, potentially partly as remote (over SSH) work.
The evaluation can also take place on-site or remotely (over SSH).
We have a lot of prior work and can adapt the tasks to the level and interests of the applicant.
Finally, we strive to publish the jointly achieved result.
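To make the idea of a differentiable physical model concrete, the toy below fits the attenuation coefficient of a Beer-Lambert decay law by gradient descent, with an analytic gradient written out in NumPy. It is an illustration only; in the project such a physical layer would sit inside a neural network (e.g. in an autodiff framework), with its parameters predicted by learned sub-modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable physical model: Beer-Lambert attenuation
#   I(d) = I0 * exp(-beta * d)
# with unknown attenuation coefficient beta, fitted by gradient descent.
beta_true, I0 = 0.35, 1.0
d = np.linspace(0.5, 5.0, 40)  # hypothetical distances in metres
I_obs = I0 * np.exp(-beta_true * d) + 0.005 * rng.standard_normal(d.size)

beta = 0.1  # initial guess
lr = 0.05
for _ in range(500):
    I_pred = I0 * np.exp(-beta * d)
    # Analytic gradient of the mean squared error w.r.t. beta
    grad = np.mean(2 * (I_pred - I_obs) * I_pred * (-d))
    beta -= lr * grad

print(round(beta, 2))
```

Because the physical model is differentiable, the same gradient flow that trains the network can also estimate physically meaningful parameters such as beta.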
What makes this project interesting to work on?
The goal of applying AI methods -- while maintaining the precision of classical computer vision / photogrammetry approaches -- to actual deep-sea imagery entails two sets of interesting aspects:
On the theoretical side, novel approaches at the intersection of computer vision and computer graphics can be discovered. The emerging field of differentiable physical models and their combination with neural networks offers a glimpse of a new generation of physically faithful AI.
On the practical side, a very close, hands-on experience with the data and its acquisition is to be expected. We do not download a dataset from the web; we get our hands wet pulling it out of the water. This not only offers direct experience of the effects we work with (refraction, attenuation and the like) but also the opportunity to improve results by specifically improving the input side.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
(Optionally) Taking new data would, of course, involve on-site work, which can be conducted in our underwater camera lab.
Real data can be taken in our test tank under controlled conditions.
Furthermore, GEOMAR is situated directly at the Baltic Sea, allowing for easy real-data acquisition.
The existing data, however, is readily available through our Git, which is -- of course -- accessible throughout the world.
We will also provide a gitlab project space for the development process.
If no local computing power is available to the student, we can provide remote access to an HPC workstation featuring 2x NVIDIA RTX A6000 cards (48 GiB each), an 8-core Xeon CPU, and 128 GiB RAM.
What skills are necessary for this project?
Machine learning, Deep learning, Computer vision and image processing/analysis, Python, C/++
Interested candidates should be at Master, PhD or Postdoc level.
When software engineering and AI system development are regarded simply as a means to an end, such as in scientific settings, security considerations take a back seat. We research reliable vulnerability detection in source code, code clone detection, automatic type inference and code quality assessment; our automatic solutions can improve security and safety without requiring much extra time. We further consider privacy and data protection aspects with privacy-preserving learning algorithms and investigate the overall information security of AI systems.
clemens-alexander.brust@dlr.de
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-12909/22555_read-54542/
What is the project's research question?
Can we improve the general applicability of ML-based type inference solutions by generating and analyzing multi-domain training data?
What data will your exchange student work on?
Our group already has "CrossDomainTypes4Py", a large cross-domain type inference dataset, which focuses on the two domains of scientific code and web development. It is mined from public source repositories, and the projects are associated with domains by studying their dependencies.
What tasks will the project involve?
- Extension of the existing pipeline to more than two domains
- Identification of suitable domains in a methodical fashion
- Evaluation of state-of-the-art type inference methods across domains, including third-order interactions, which has not been done yet
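To illustrate the dependency-based domain association mentioned above, here is a deliberately simplified sketch; the keyword sets and the overlap heuristic are hypothetical stand-ins for the actual CrossDomainTypes4Py mining pipeline:

```python
# Hypothetical mapping from package dependencies to problem domains,
# mimicking how projects could be associated with domains.
DOMAIN_KEYWORDS = {
    "scientific": {"numpy", "scipy", "pandas", "matplotlib"},
    "web": {"django", "flask", "requests", "fastapi"},
    "machine-learning": {"torch", "tensorflow", "scikit-learn"},  # a candidate third domain
}

def classify_project(dependencies):
    """Assign a project to the domain whose keyword set overlaps most
    with its declared dependencies; None if nothing matches."""
    deps = {d.lower() for d in dependencies}
    best = max(DOMAIN_KEYWORDS, key=lambda dom: len(DOMAIN_KEYWORDS[dom] & deps))
    return best if DOMAIN_KEYWORDS[best] & deps else None

print(classify_project(["numpy", "scipy", "flask"]))  # scientific wins (2 hits vs 1)
print(classify_project(["torch", "tensorflow"]))
```

Extending the pipeline to more domains then amounts to curating further keyword sets (or a more principled, methodical domain identification, as the task list states).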
What makes this project interesting to work on?
Machine-learning-based methods have had difficulties with type inference tasks, for example due to the nesting of generic types. Formulating type inference as a classification problem presupposes a finite number of classes and therefore struggles with open vocabularies. At the same time, there exist reasonably performant tools based on static analysis. However, machine learning-based approaches could take advantage of semantic information such as identifiers and comments that are not used by static tools. They can also be fine-tuned to certain problem domains and learn common expressions, patterns and usages. This project extends the current research into cross-domain type inference, where we want to generalize from one problem domain to another by introducing third-order interactions. For example, one might learn on two domains and then apply the resulting model to a third domain where labeled training data is not available.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
All our tools are open-sourced. Our infrastructure, including an HPDA cluster, can be accessed remotely via a VPN connection. Most of the institute currently works from home, so we are well suited to communicating this way.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Software development, Python
Interested candidates should be at Bachelor or Master level.
The research aims of the Remote Sensing and Geoinformatics Section at GFZ are to establish remote sensing as a core method in geosciences. In particular, we aim to increase awareness of the considerable value of remotely sensed data for knowledge generation about Earth’s surface properties and processes, which arises from its ability to provide complete coverage over large spatial scales.
https://www.gfz-potsdam.de/en/section/remote-sensing-and-geoinformatics/overview/
What is the project's research question?
Which deep learning algorithms can be applied to soil spectroscopy data to predict soil properties accurately?
What data will your exchange student work on?
The exchange student will work on combining several open access soil spectral libraries in the Visible-Near Infrared-Mid-infrared range for predicting soil properties.
What tasks will the project involve?
- Descriptive analysis
- Outlier detection
- Soil spectral pre-processing
- Model development
- Model validation
What makes this project interesting to work on?
Soil spectroscopy is the measurement of light absorption when light in the visible, near-infrared or mid-infrared (Vis-NIR-MIR) regions of the electromagnetic spectrum is applied to a soil surface. Vis-NIR-MIR reflectance spectroscopy senses the proportion of the incident radiation reflected by the soil. These characteristic spectra can then be used to estimate numerous soil attributes, including minerals, organic compounds and water. Additionally, it can improve soil property prediction when surface reflectance from satellite imagery is used.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
All data processing needs to be done using Python or R.
GFZ's high-performance computing system will be used for developing models.
All development can be realized remotely.
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Parallel/distributed programming with GPUs, Python
Interested candidates should be at Master, PhD or Postdoc level.
The machine learning group is situated at the junction of fundamental machine learning research and practical applications within the German Aerospace Center (DLR), such as computer vision for earth observation data, anomaly detection, explainable artificial intelligence and many more. It aims to be at the state of the art in deep learning, to further develop such methods, and determine how to put them into practice for DLR problems. Consequently, the group considers machine learning not just for a specific set of applications or data sets, but from a holistic perspective.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-13689/23754_read-54469/
What is the project's research question?
What anomaly categories can be mined from the lab notebooks of the EDEN ISS greenhouse, and how can they be clustered and mapped to the respective telemetry data?
What data will your exchange student work on?
The main dataset consists of three years of telemetry data (99 variables) recorded in the EDEN ISS research greenhouse in Antarctica.
For the same period we have the lab notebooks, which contain information about anomalous events that occurred during daily work.
We want to make this expert information usable and map it to the anomalies found in the telemetry data.
What tasks will the project involve?
The goal can be approached from different directions depending on the intern's interests and skill set, e.g. extracting the information using NLP methods, categorizing the found events into useful anomaly categories, or focusing on the mapping to the telemetry dataset.
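For the mapping direction, one simple building block is aligning timestamped notebook events with anomalies detected in the telemetry. The sketch below uses pandas' `merge_asof` on invented example records (the event texts, variables, and tolerance are purely illustrative):

```python
import pandas as pd

# Hypothetical notebook events (e.g. extracted via NLP) and telemetry
# anomalies; merge_asof maps each anomaly to the nearest preceding
# notebook entry within a tolerance window.
events = pd.DataFrame({
    "time": pd.to_datetime(["2019-03-01 08:00", "2019-03-05 14:30"]),
    "event": ["pump maintenance", "CO2 sensor replaced"],
})
anomalies = pd.DataFrame({
    "time": pd.to_datetime(["2019-03-01 09:15", "2019-03-05 15:00", "2019-04-01 00:00"]),
    "variable": ["nutrient_flow", "co2_ppm", "air_temp"],
})

# Both frames must be sorted by the key column for merge_asof
mapped = pd.merge_asof(anomalies, events, on="time", tolerance=pd.Timedelta("6h"))
print(mapped[["variable", "event"]])
```

Anomalies with no notebook entry inside the window stay unexplained, which is itself useful output: those are candidates for new anomaly categories.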
What makes this project interesting to work on?
The project offers the opportunity to work on a unique dataset and a challenging task, and to bring in one's own ideas and research interests. The EDEN ISS facility was established in 2018 to conduct research on growing vegetables in closed-system environments. This knowledge will be used to develop greenhouse habitat modules for future space missions, e.g. to the Moon or Mars.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
High Performance Compute Cluster (can only be used on-site)
Python
What skills are necessary for this project?
Data analytics, statistics, Scientific computation, data mining, Machine learning, Deep learning, Python
Interested candidates should be at Bachelor, Master, PhD or Postdoc level.
Our lab develops and applies genomics and computational approaches, in particular from machine learning, to understand mechanisms of gene regulation in eukaryotic organisms. Computational biology has become indispensable to analyze and ultimately make sense of large-scale data sets that look at the phenomenon of gene regulation from different angles. Our long-term goal is to investigate how regulatory networks enable the correct development of complex organisms, with their multitude of cell types that carry out different functions despite the same genome.
https://ohlerlab.mdc-berlin.de/ and https://github.com/ohlerlab
What is the project's research question?
Improving and extending a deep learning model for multimodal single-cell data integration (related to multi-view learning); generating biologically meaningful joint data representations of different biological data types with distinct properties (dimensionality and scale of features, information content, noise)
What data will your exchange student work on?
- Multimodal single-cell data of well studied human cell populations of peripheral blood mononuclear cells and bone marrow mononuclear cells
- Multimodal single-cell data refers to sequencing-based paired measurements of distinct molecular layers of information, such as, e.g., gene expression (RNA) and chromatin accessibility (DNA), originating from the same single-cell
- Data represented as count matrices
What tasks will the project involve?
Dependent on the student's interest, the project will revolve around one of the following aspects:
- Optimizing our existing model for vertical and horizontal multimodal single-cell data integration concerning its training time, by optimizing the existing code (written in PyTorch) or rewriting it using a different framework (e.g., pyro).
- Extending the existing model to allow for mosaic integration, i.e. combining single-view and multi-view datasets. Here, we already have ideas for modifying the existing architecture, but the student could also contribute their own ideas.
- Extending the framework to be able to work with more than two modalities at the same time
- For related prior research of our group see: https://www.biorxiv.org/content/10.1101/2021.05.11.443540v1.full and https://linkinghub.elsevier.com/retrieve/pii/S0168-9525(21)00255-9
What makes this project interesting to work on?
- Insight into problems at the frontier of developments in the field of single-cell genomics
- Using ML to solve a real-world problem of high relevance for experimentalists
- Speed-up reduces the computational burden and enhances competitiveness
- Extends applicability of existing models to more use cases
What is the project's expected outcome?
Contribution to software, possible co-authorship dependent on project state and outcome
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
- GPU nodes on the institute's cluster
- Python
- (yet) unpublished in-house software library (python/Pytorch)
- Pytorch
- potentially Pyro
- Jupyter Notebooks
- Standard python data science libraries, e.g., pandas, matplotlib, sklearn, domain-specific libraries, and data structures scanpy and anndata
All of them can be used remotely.
What skills are necessary for this project?
Deep learning, Software development, Python
Interested candidates should be at Bachelor or Master level.
The Causal Inference group at the DLR-Institute of Data Science in Jena develops theoretical foundations, algorithms, and accessible software tools for causal inference and machine learning. Causal inference is a challenging and promising research field and its application to domains such as climate science will have a high impact both to advance science and to address topics of critical importance for society. The core methodological topics include causal inference and causal discovery for spatio-temporal dynamical systems, machine learning, deep learning, and nonlinear time series analysis.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-12906/22550_read-52195/
What is the project's research question?
Are novel algorithms from the emerging field of causal representation learning able to reliably learn latent causal variables and their causal interactions in spatio-temporal data?
What data will your exchange student work on?
We will use synthetic spatio-temporal data generated from a simplified stochastic climate model based on a VAR process that emulates spatially aggregated modes of variability (these are the latent causal variables) which interact via teleconnections (these are their causal interactions).
The data need to be synthetic in order to have ground-truth knowledge about the latent variables and their interactions for the purpose of method evaluation.
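A minimal emulator in the spirit of this data generator can be sketched as follows: latent modes follow a VAR(1) process whose off-diagonal coefficients play the role of teleconnections, and each mode is projected onto a spatial pattern. The coefficients, grid, and patterns below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent modes as a VAR(1) process; the off-diagonal entry makes
# mode 1 drive mode 2 (a "teleconnection"-style causal link).
A = np.array([[0.7, 0.0],
              [0.4, 0.6]])
T, n_modes = 500, 2
z = np.zeros((T, n_modes))
for t in range(1, T):
    z[t] = A @ z[t - 1] + rng.standard_normal(n_modes)

# Spatial patterns: two Gaussian blobs on a 20x20 grid
grid = np.stack(np.meshgrid(np.arange(20), np.arange(20)), axis=-1)
centers = np.array([[5, 5], [14, 14]])
patterns = np.exp(-((grid[None] - centers[:, None, None]) ** 2).sum(-1) / 10.0)

# Observed spatio-temporal field: latent modes mapped onto the grid
field = np.einsum("tm,mxy->txy", z, patterns)
print(field.shape)
```

A causal representation learning algorithm would see only `field` and be evaluated on how well it recovers `z` and the links encoded in `A`, which is exactly why synthetic ground truth is needed.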
What tasks will the project involve?
We will give input and guidance at all steps of the project. These are:
- Familiarize yourself with the task of causal representation learning and the selected algorithms.
- Familiarize yourself with the data-generating model.
- Apply the selected algorithms to synthetically generated data for varying setups and systematically evaluate the algorithms' performance according to appropriate metrics.
What makes this project interesting to work on?
Causal representation learning is a novel and cutting-edge line of research in the field of machine learning. Moreover, its application to spatio-temporal data is of high relevance for data-driven approaches in applied fields such as the climate and environmental sciences. The results of this study may guide future research in this important direction.
The project thus offers the opportunity to work with exciting novel machine learning methods towards an important and relevant research direction.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Synthetic data generated from an open source model
What infrastructure, programs and tools will be used? Can they be used remotely?
We will use GitLab to organize and exchange files. This is accessible remotely. Other tools, e.g. a Python IDE and a LaTeX editor, can be used locally. For computationally expensive operations access to a computing cluster will be provided.
To facilitate remote work we would hold regular online meetings for discussing goals, progress, and other aspects of the project.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Python
Interested candidates should be at Master, PhD or Postdoc level.
We develop and apply numerical-modelling and data-science approaches to simulate and understand the processes responsible for sculpting the Earth's surface. These processes include, but are not limited to, incision by rivers and glaciers, formation of deltas, coastal erosions, natural hazards, and Arctic permafrost landscapes (with our partners in the Alfred Wegener Institute), especially under a changing climate. One of our foci is on using deep-learning methods to augment and enhance our models and datasets.
https://www.gfz-potsdam.de/en/section/earth-surface-process-modelling/overview/
What is the project's research question?
How is the Arctic permafrost degrading in the warming climate and how do we quantify its effects and impact?
What data will your exchange student work on?
Data will consist of remote-sensing imagery from Landsat, Sentinel, and other satellites or aerial surveys with repeat coverage over time. The images are of Earth-surface features related to permafrost in the Arctic and sub-Arctic. Some of the permafrost features are labelled, and some labels may be supplied by existing machine-learning models trained to identify specific classes of features.
What tasks will the project involve?
Participate in the design and development of a deep-learning model that can detect and quantify changes of permafrost features in the pan-Arctic region. Depending on the level of experience and expertise, tasks ranging from data-pipeline development to deep-learning model implementation or even model-design changes are possible. The specifics can be adapted to the candidate, after discussions with the candidate.
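As a conceptual baseline for what the deep-learning model would improve upon, the toy below does per-pixel change detection between two co-registered acquisitions by thresholding the difference image; the "lake" and all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy change detection: a dark patch (e.g. a thermokarst lake that grew
# as permafrost thawed) appears between two acquisition dates.
img_t0 = np.ones((64, 64)) + 0.05 * rng.standard_normal((64, 64))
img_t1 = img_t0.copy()
yy, xx = np.mgrid[:64, :64]
lake_new = (yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2  # expanded lake area
img_t1[lake_new] = 0.2                                # water appears darker

diff = np.abs(img_t1 - img_t0)
changed = diff > 0.3  # simple threshold on the difference image
print(int(changed.sum()), "pixels changed")
```

A trained model replaces the hand-set threshold with learned, feature-aware change detection that is robust to illumination, seasonal, and sensor differences.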
What makes this project interesting to work on?
Quantitative change-detection has wide and transferable applications across many fields, both inside and outside of academic research. Moreover, the Arctic region is particularly vulnerable to global warming, with temperatures rising about twice as quickly as the global average -- known as "Arctic Amplification". The permafrost harbours huge quantities of soil organic carbon, potentially exacerbating climate change if thawed and released. The ecosystem and societies living in the Arctic are also intricately linked to the fate of permafrost.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? The data come from both public/open and proprietary sources.
What infrastructure, programs and tools will be used? Can they be used remotely?
Access to a suitable computing server with GPU capabilities will be provided. All necessary software packages are open source and can be downloaded and installed on most machines. Remote access is possible with VPN.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, High-performance computing, Parallel/distributed programming with GPUs, Computer vision and image processing/analysis, Geographic information systems, Python
Interested candidates should be at Bachelor, Master or PhD level.
The machine learning group is situated at the junction of fundamental machine learning research and practical applications within the German Aerospace Center (DLR), such as computer vision for earth observation data, anomaly detection, explainable artificial intelligence and many more. It aims to be at the state of the art in deep learning, to further develop such methods, and determine how to put them into practice for DLR problems. Consequently, the group considers machine learning not just for a specific set of applications or data sets, but from a holistic perspective.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-13689/23754_read-54469/
What is the project's research question?
How good is the quality of the labels in a popular remote sensing dataset used for training deep neural networks?
What data will your exchange student work on?
Aerial or satellite imagery
What tasks will the project involve?
Choosing a dataset to work on from a selection of suitable datasets
Getting familiar with a method of assessing label quality, most likely by estimating uncertainty
Implementing and applying the method on the chosen dataset
Discussing the validity of the results
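One common way to assess label quality via uncertainty, as the tasks above suggest, is to look at the disagreement of an ensemble (or of multiple MC-dropout passes) and flag high-entropy samples for inspection. The following numpy sketch is an illustrative assumption about the approach, not the project's prescribed method:

```python
import numpy as np

def predictive_entropy(prob_stack):
    """Entropy of the mean softmax over an ensemble / MC-dropout passes.

    prob_stack: array of shape (n_passes, n_samples, n_classes).
    Returns one entropy per sample; high entropy marks samples whose
    labels deserve manual inspection as potential label noise.
    """
    mean_prob = prob_stack.mean(axis=0)
    return -(mean_prob * np.log(mean_prob + 1e-12)).sum(axis=1)

# toy ensemble of two passes over two samples:
# sample 0 is confidently classified, sample 1 is ambiguous
probs = np.array([
    [[0.98, 0.02], [0.55, 0.45]],
    [[0.97, 0.03], [0.40, 0.60]],
])
ent = predictive_entropy(probs)
# ent[1] > ent[0]: the second sample is a label-noise candidate
```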
What makes this project interesting to work on?
The student will gain knowledge in deep learning and uncertainty quantification, acquire skills in processing remote sensing imagery, and get insights into the working life of a research institute.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
HPDA cluster can be used remotely
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, Python
Interested candidates should be at Bachelor or Master level.
The machine learning group is situated at the junction of fundamental machine learning research and practical applications within the German Aerospace Center (DLR), such as computer vision for earth observation data, anomaly detection, explainable artificial intelligence, and many more. It aims to be at the state of the art in deep learning, to further develop such methods, and determine how to put them into practice for DLR problems. Consequently, the group considers machine learning not just for a specific set of applications or data sets, but from a holistic perspective.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-13689/23754_read-54469/
What is the project's research question?
Can information about structural uncertainties help to decrease human effort with uncertain Neural Network predictions?
What data will your exchange student work on?
The student will work in the field of uncertainty quantification in Neural Network predictions. The student will start with evaluating predictions of pre-trained neural networks with a special focus on the calibration and structure in the predicted uncertainties. The setup will be realized using artificial dummy data and remote sensing data received from satellites and labeled for land cover classification.
What tasks will the project involve?
* Getting familiarized with neural networks and predictive uncertainty in neural networks.
* Evaluating predictions from pre-trained neural networks with a special focus on predictive uncertainties.
* Based on the previous evaluations and in exchange with the supervisor, potential improvements in the training and inference strategies (data feeding, loss function, ...) should be proposed and tested.
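A standard way to quantify the calibration mentioned above is the Expected Calibration Error (ECE): bin predictions by confidence and compare each bin's confidence to its accuracy. This sketch is a generic illustration under the assumption that per-sample confidences and correctness flags are already available:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: the bin-size-weighted gap between
    mean confidence and accuracy over equally spaced confidence bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# perfectly calibrated toy case: 80% confidence, 80% accuracy -> ECE 0
conf = np.full(10, 0.8)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
ece = expected_calibration_error(conf, corr)
```

A miscalibrated network would show a nonzero gap, which is exactly what the proposed training and inference changes aim to reduce.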
What makes this project interesting to work on?
A crucial point for using deep learning approaches in safety-critical real-world applications is the robustness of such approaches and proper uncertainty quantification for difficult predictions. The student will gain many insights into this field and will have the chance to take their first steps within an established research environment. Besides this interesting field of research, there will be close cooperation with the supervisor, who works in the same field. And visiting the beautiful city of Jena is definitely worth it.
What is the project's expected outcome?
Co-authorship to research paper
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
* Python with freely available standard packages. The deep learning part is implemented using PyTorch.
* For visitors who come to Jena (Germany), the larger computations can be run directly on a high-performance computing cluster. Remotely this is unfortunately not possible. (But as Jena is really nice you should come here anyway 😉 (if possible)).
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Python
Interested candidates should be at Bachelor or Master level.
Michael Denker leads the team “Data Science for Electro- and Optophysiology Behavioural Neuroscience” at INM-10, Forschungszentrum Jülich, which addresses the upcoming challenges in the field of research data management in neuroscience. The main research interest of his team is the relationship between the correlation structure and the spatio-temporal organization of neural activity. In the context of the EU flagship Human Brain Project (HBP), he coordinates the development and community building of tools for improving reproducibility in analysis and model validation, such as Elephant (python-elephant.org), one of the leading open source analysis tools for electrophysiological data.
https://www.fz-juelich.de/inm/inm-6/EN/Forschung/Gruen/DSEO.html?nn=724694
What is the project's research question?
In this project we aim to perform a batch analysis of large datasets of brain activity data in order to characterize significant repeating neural activation patterns. Knowledge about the statistics of such patterns is expected to contribute to our understanding of the functional role of ubiquitously observed brain waves. The project consists of two parts: (i) expanding the capabilities of the distributed tensor framework Heat (https://github.com/helmholtz-analytics/heat) as part of the underlying analysis algorithm, and (ii) applying the method to a range of datasets using high-performance compute resources and characterizing the results.
What data will your exchange student work on?
The project will utilize neuroscientific activity data featuring electrophysiological recordings of single neuron spiking activity, i.e., point time series, obtained from a complex timed motor coordination task. The data, covering multiple recording sessions, are readily available, curated and prepared.
What tasks will the project involve?
The project will employ the ASSET analysis method to detect recurring sequences of brain activity in the data (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004939). While an implementation of this method exists, in its current naïve form the datasets are too large to be handled in full on a standard compute node. For this reason, a new implementation of the method has been developed that uses the Heat framework (https://github.com/helmholtz-analytics/heat) to distribute matrix operations across multiple nodes. This implementation currently lacks the final step of the ASSET analysis, which is based on the DBSCAN clustering algorithm.
In the first part of the project we will implement a solution to hook the distributed ASSET to an existing parallel implementation of DBSCAN (https://github.com/Markus-Goetz/hpdbscan). In the second part we will apply the method to the available datasets in order to extract sequences of activity patterns that repeat in excess of chance expectation within and across recording sessions. We will pool data to construct corresponding pattern statistics, such as sequence lengths, neuronal participation numbers, sequence reliability, and pattern similarities.
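The missing final clustering step can be pictured with a serial toy version of DBSCAN. The project itself will hook into the distributed HPDBSCAN implementation, so this numpy sketch (with made-up 2-D points standing in for entries of the ASSET intersection matrix) only illustrates the algorithm's logic, not the actual integration:

```python
import numpy as np

def dbscan(points, eps=1.0, min_pts=3):
    """Toy serial DBSCAN: assign cluster ids, -1 = noise.
    Core points (>= min_pts neighbours within eps) seed clusters,
    which are then grown through density-connected neighbours."""
    n = len(points)
    labels = np.full(n, -1)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue
        stack = [i]           # grow a new cluster from core point i
        labels[i] = cluster
        while stack:
            j = stack.pop()
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if len(neighbours[k]) >= min_pts:
                        stack.append(k)   # only core points are expanded
        cluster += 1
    return labels

# two well-separated toy blobs -> two clusters, no noise
pts = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.3, 0.3],
                [5, 5], [5.4, 5], [5, 5.4], [5.2, 5.2]], dtype=float)
labels = dbscan(pts, eps=1.0, min_pts=3)
```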
What makes this project interesting to work on?
By participating in this project, you will be involved in parallelizing machine learning algorithms, and you will help answer fundamental questions on brain function via statistical pattern-mining methods in computational neuroscience.
The Heat framework for parallel computing in Python is being developed to make data-intensive research possible that is otherwise severely hindered by single-CPU memory bottlenecks. When completed, the synergy between Heat and the ASSET method will enable our group to find correlations within our brain activity data on an unprecedented scale.
What is the project's expected outcome?
Contribution to software
Is the data open source? The data is published as Brochier, T., Zehl, L., Hao, Y., Duret, M., Sprenger, J., Denker, M., Grün, S., Riehle, A., 2018. Massively parallel recordings in macaque motor cortex during an instructed delayed reach-to-grasp task. Scientific Data 5, 180055. https://doi.org/10.1038/sdata.2018.55
What infrastructure, programs and tools will be used? Can they be used remotely?
Both Heat and ASSET are written in Python, using NumPy and PyTorch functionalities, and MPI for parallel operations.
Heat: https://github.com/helmholtz-analytics/heat
ASSET (Elephant library): https://github.com/NeuralEnsemble/elephant
We use GitHub, Mattermost and video calls for developer discussions.
We have access to the supercomputers at the Jülich Supercomputing Center for performance and scalability tests:
https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/supercomputers_node.html
All the infrastructure can be used remotely.
What skills are necessary for this project?
Data analytics, statistics, Parallel/distributed programming with GPUs, Python
Interested candidates should be at PhD or Postdoc level.
We work in the field of machine learning and high-performance computing. The methods we work on include classical image processing methods, complex modelling steps such as diffusion tensor reconstruction, and modern machine learning techniques such as deep learning models. We develop and adapt methods originally developed for workstations or small clusters so that they scale to HPC systems at the Jülich Supercomputing Centre.
https://www.fz-juelich.de/ias/jsc/EN/Expertise/SimLab/slns/_node.html
What is the project's research question?
This project aims at analyzing different approaches to parallelizing linear regression applied to massive amounts of data. After analyzing the different approaches theoretically, they will be implemented in HEAT (https://github.com/helmholtz-analytics/heat), an open-source software for high-performance data analytics and machine learning. The resulting algorithms will be benchmarked on a high-performance computing (HPC) system and finally applied to real-world data from atmospheric science.
What data will your exchange student work on?
The data comes from atmospheric research, on the one hand from air quality forecasting systems (CAMS project), on the other hand from measuring stations all over Europe.
What tasks will the project involve?
The tasks will involve theoretical analysis of machine learning algorithms (distributed linear regression, e.g. https://arxiv.org/abs/1810.00412), their implementation in software, and benchmarking on an HPC system. Finally, the implemented method will be applied to real-world data from atmospheric science.
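One classical route to distributed linear regression is to have each node accumulate the partial sums of the normal equations on its own data chunk and then reduce them globally. The sketch below simulates the "nodes" serially with plain numpy; in HEAT/MPI the per-chunk terms would live on separate ranks and be combined by an allreduce. The function name and the chunk layout are illustrative assumptions:

```python
import numpy as np

def linreg_distributed(chunks):
    """Least squares via per-node partial sums of the normal equations.

    Each (X_i, y_i) chunk contributes X_i^T X_i and X_i^T y_i locally;
    summing these terms yields the global solution without ever
    gathering the raw data on one node.
    """
    d = chunks[0][0].shape[1]
    xtx, xty = np.zeros((d, d)), np.zeros(d)
    for X, y in chunks:          # in practice: one term per MPI rank
        xtx += X.T @ X
        xty += X.T @ y
    return np.linalg.solve(xtx, xty)

# toy data split across two "nodes": y = 1 + 2*x
X = np.c_[np.ones(6), np.arange(6.0)]
y = 1 + 2 * X[:, 1]
beta = linreg_distributed([(X[:3], y[:3]), (X[3:], y[3:])])
# beta recovers the intercept 1 and slope 2
```

Comparing this normal-equations scheme against, e.g., distributed QR or iterative solvers is exactly the kind of theoretical trade-off analysis the project calls for.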
What makes this project interesting to work on?
The project offers the opportunity to work at the interface of machine learning and high-performance computing, to contribute to a state-of-the-art ML software library, and to gain experience on an HPC system.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? No
What infrastructure, programs and tools will be used? Can they be used remotely?
The implementation in HEAT (https://github.com/helmholtz-analytics/heat) will be done in Python using the ML library PyTorch (https://pytorch.org/). In addition, knowledge of HPC systems is an advantage, including knowledge of MPI which will be used via MPI4py (https://mpi4py.readthedocs.io/en/stable/). The student has access to the supercomputers at the Jülich Supercomputing Center for performance and scaling tests as well as for the final application to real world data: https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/supercomputers_node.html
All work can be done remotely.
What skills are necessary for this project?
Machine learning, High-performance computing, Software development, Python
Interested candidates should be at Master, PhD or Postdoc level.
1. The Earth System Data Exploration (ESDE) research group at the Jülich Supercomputing Centre (JSC) develops innovative methods and tools for the integration and analysis of complex, heterogeneous, and big datasets related to air pollution, weather, and climate
2. ESDE explores state-of-the-art deep learning for air quality, weather, and climate applications
3. ESDE develops a parallelized deep learning workflow toolkit and performs scalable deep learning on HPC systems
What is the project's research question?
Can deep learning be an effective tool for downscaling atmospheric fields? (This task is analogous to super-resolution in computer vision, which maps an input image from low to high resolution.) Can deep learning models generalize, i.e. can a pre-trained model be transferred across geographical regions (particularly to data-sparse regions) with or without additional fine-tuning in the context of downscaling?
What data will your exchange student work on?
The student will primarily work on the weather and climate benchmark datasets that have been prepared in the MAELSTROM project (see details: https://www.maelstrom-eurohpc.eu/products-ml-apps). In particular, the student will explore the applications “Datasets for 2m temperature and precipitation short-range forecasts” and “Dataset for 2m temperature downscaling”.
What tasks will the project involve?
1. Further develop advanced deep learning methods (e.g. GANs, vision transformers) for temperature and precipitation downscaling
2. Explore domain adaptation approaches to transfer pre-trained neural networks across geographic regions and data sources.
3. Scale the deep learning networks on the JUWELS and JUWELS Booster systems at the Jülich Supercomputing Center
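The downscaling setup in the tasks above can be framed with a tiny numpy sketch: block-average a high-resolution field to mimic the coarse model input, and establish a trivial upsampling baseline that any trained super-resolution network (GAN, transformer, ...) must beat. The field values and factor here are illustrative, not taken from the MAELSTROM datasets:

```python
import numpy as np

def coarsen(field, factor):
    """Block-average a high-resolution field to simulate the coarse
    model output that the downscaling network takes as input."""
    h, w = field.shape
    return field.reshape(h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3))

def upsample_baseline(coarse, factor):
    """Nearest-neighbour upsampling: the naive baseline against which
    a learned downscaling model is evaluated."""
    return np.kron(coarse, np.ones((factor, factor)))

rng = np.random.default_rng(0)
hires = rng.normal(15.0, 3.0, size=(16, 16))   # e.g. a 2m-temperature patch
coarse = coarsen(hires, 4)                     # 16x16 -> 4x4
baseline = upsample_baseline(coarse, 4)        # back to 16x16
rmse = np.sqrt(((baseline - hires) ** 2).mean())
```

A trained network is judged by how much it reduces this RMSE relative to the baseline, and the transfer question is whether that gain survives a change of region or data source.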
What makes this project interesting to work on?
1. The student will contribute to a EuroHPC project and work with machine learning, HPC, and Earth science researchers from world-class international research centres and universities
2. The student will have the opportunity to tackle an important challenge in weather and climate research with deep learning
3. The student will gain experience and knowledge in the fields of deep learning, HPC, and Earth science.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
The student will have remote access to the JUWELS and JUWELS Booster HPC systems at JSC and will be able to use the Jupyter-JSC web service, which provides an interactive environment for application development (see details: https://docs.jupyter-jsc.fz-juelich.de/github/FZJ-JSC/jupyter-jsc-notebooks/blob/master/Jupyter-JSC_supercomputing-in-the-browser.pdf). All infrastructure and tools can be used remotely.
What skills are necessary for this project?
Data analytics, statistics, Machine learning, Deep learning, High-performance computing, Computer vision and image processing/analysis, Software development, Python
Interested candidates should be at Master, PhD or Postdoc level.
The central theme of the metadata management group is making data available not just in its initial context but beyond the boundaries of projects, institutions, and communities. Our activities revolve around metadata, the Semantic Web & knowledge graphs, and data management in research and industry. Additionally, we explore what other services we can build on top of knowledge graphs and semantic metadata to support users.
https://www.dlr.de/dw/en/desktopdefault.aspx/tabid-12905/22531_read-49439/
What is the project's research question?
In this project, voice recognition software solutions will be evaluated with respect to their usefulness for non-contact data and metadata collection in scientific laboratories. The research question is how a speech recognition solution in a scientific laboratory can support (meta)data capture with respect to both data quality (e.g. the ability to work efficiently in the presence of background noise) and accuracy (the capability to recognize scientific terminology).
What data will your exchange student work on?
Voice input/output (of scientific text) and their electronic text transcription. Data preparation and generation will be part of the project.
What tasks will the project involve?
- tool selection for tests
- input data preparation & creation in a set of experiments
- evaluating performance of selected tools
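Evaluating the candidate tools will likely involve the standard speech-recognition metric, Word Error Rate (WER): the word-level edit distance between a reference transcript and the tool's hypothesis, divided by the reference length. A self-contained sketch (the example lab sentence is invented):

```python
def word_error_rate(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance divided by the
    number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming edit-distance table
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

wer = word_error_rate("add five millilitres of buffer",
                      "at five milliliters of buffer")
# two substitutions over five reference words -> WER 0.4
```

For this project, the reference transcripts would additionally be weighted towards scientific terminology, where generic recognisers tend to fail.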
What makes this project interesting to work on?
Modern laboratory work requires innovative solutions to enable the scientist to capture (meta)data in a digital form efficiently and easily at the same time. The use of electronic laboratory notebooks (ELNs) helps with improving research data management and laboratory processes. However, it is not always possible to use traditional ELNs. Voice recognition is a powerful tool that can support modern laboratories along their path towards digitalisation: collecting data directly during the experiment, transforming speech input into digital text data, and pushing them to the dedicated ELN. Combining artificial intelligence and scientific data management is the unique characteristic of the project.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? One of the tasks will be preparing the data.
What infrastructure, programs and tools will be used? Can they be used remotely?
For voice input capture we will provide the necessary hardware and access to our server infrastructure. Part of this infrastructure can be used remotely. The infrastructure needed for the output data analysis is also accessible remotely and will be provided.
What skills are necessary for this project?
Machine learning, Software engineering, Software development, speech recognition, NLP
Interested candidates should be at Master level.
The Earth Surface Geochemistry group at GFZ Potsdam uses cosmogenic and stable metal isotopes to trace material turnover on the Earth’s surface. We employ these isotope fingerprints to understand weathering and climate interactions, to study soil, plant and nutrient cycles and quantify erosion processes and global sediment cycles.
https://www.gfz-potsdam.de/en/section/earth-surface-geochemistry/overview/
What is the project's research question?
Laser ablation methods are gaining huge traction in the Earth sciences, thanks to the unique insights they can offer and the high throughput of sample measurements they make possible. However, although the analyses are fast and require little or no sample preparation, post-analysis data processing is currently time-consuming and laborious, and is a rate-limiting step. The question addressed in this project is how data science methods can be applied to analyse and visualise transient stable isotope ratio signals more efficiently, apply corrections more reproducibly, evaluate large datasets more rapidly, and check the accuracy of known samples.
What data will your exchange student work on?
Our group is a leading centre for the determination of stable isotope ratios by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Variation in the abundance of different stable isotopes records processes in biological, geological and chemical systems and allows us to trace pathways and reactions without interfering with the natural conditions. Isotope ratios are routinely employed for age determination and source provenance, and as proxies for e.g. past oceanic pH values and atmospheric CO2 levels. Whereas most isotope ratios are determined by dissolving and processing bulk samples, with lengthy sample preparation steps, our group employs in situ techniques based on laser ablation. LA-ICP-MS is a technique in which a focused laser beam ablates a small spot on a solid sample; the resulting aerosol is transported by a stream of helium into a plasma, where the sample is ionised. The ions are accelerated, sorted according to their mass-to-charge ratio, and recorded time-resolved in a mass spectrometer.
ICP-MS is a relative technique, meaning that the signal recorded by the mass spectrometer is potentially biased and the measurement must be corrected by comparison to known reference materials (calibration). In addition, a challenge with LA-ICP-MS is that the element of interest is not separated from the other components of the samples, which can introduce interferences that must also be corrected for.
For concentration measurements, vendor-provided software packages are typically sufficient for fast and reproducible data reduction. However, for stable isotope ratios, in particular when time-resolved data are recorded, the provided software is not suitable. Typically, most labs (ours included) use a set of in-house-developed spreadsheets, macros and simple scripts to apply the correction scheme and calibrate their results. Often this means manually copying in data from raw mass spectrometer datafiles (tab-delimited text format) and visually screening data. This lack of automation means that outlier evaluation, blank correction and data standardisation require a significant time investment and can introduce subjectivity.
What tasks will the project involve?
The student’s first task will be to explore representative datasets in order to understand current practices in evaluating transient isotope ratio signals and the calculations needed, and to evaluate where existing workflows could be improved by automation.
The student would then be tasked with coding an interface to load and parse mass spectrometer datafiles, and perform necessary calculations and visualisations. This should include (semi)automated integration of the sample signal and background to be subtracted (based on user-defined criteria), calculation of isotope ratios, outlier rejection (based on selectable criteria), (manual) classification of sample type, standardisation of the isotope ratio measurements with reference to accompanying measurements of reference materials, and reporting of the results and data quality metrics (uncertainty).
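The core of the correction scheme described above (blank subtraction, per-cycle ratios, criterion-based outlier rejection, averaging) can be sketched in a few lines of numpy. The channel layout, blank handling and the 2-sigma criterion here are illustrative assumptions, not the group's actual reduction scheme:

```python
import numpy as np

def isotope_ratio(signal_a, signal_b, bkg_a, bkg_b, n_sigma=2.0):
    """Blank-correct two time-resolved isotope channels, form the
    per-cycle ratio, reject outliers beyond n_sigma, and average.
    Returns the mean ratio and the boolean mask of retained cycles."""
    a = signal_a - bkg_a.mean()      # gas-blank (background) subtraction
    b = signal_b - bkg_b.mean()
    ratio = a / b
    mu, sd = ratio.mean(), ratio.std()
    keep = np.abs(ratio - mu) <= n_sigma * sd
    return ratio[keep].mean(), keep

# toy transient signal: flat blank, flat ablation signal, one spike
bkg = np.full(20, 100.0)
sig_a = np.full(50, 1100.0)
sig_b = np.full(50, 600.0)
sig_a[10] = 5100.0                   # hypothetical spike to be rejected
mean_ratio, kept = isotope_ratio(sig_a, sig_b, bkg, bkg)
# the spike cycle is rejected; the remaining cycles give ratio 2.0
```

In the real tool, a subsequent standardisation step would bracket each sample with reference-material measurements to correct the instrumental bias.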
What makes this project interesting to work on?
Laser ablation is an extremely versatile tool that enables researchers to probe the elemental and isotopic composition of a wide variety of materials at the µm scale. The types of data that the student will work with, and will help advance in the future, include measurements of atmospheric CO2 concentrations in the geological past, the chemistry of meteorites, and the geochemical signals of bleaching events in coral skeletons. The new tool could dramatically improve the speed and efficiency with which we produce these data, meaning the student’s work could have a direct benefit for our understanding of the planet, its climate sensitivity, and the effects of human-induced climate change.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software, If successful, we would expect to make the code publicly available for other scientists to use, with its own peer-reviewed descriptive paper (on which the student would be an author).
Is the data open source? The software tool would be made available, as an open source project, with an accompanying publication describing its use. The (raw) data that will be used to test and evaluate the software tool is acquired for different research projects, and these data’s availability will be dictated by the needs of each project. However, our group actively encourages FAIR data principles and shares datasets in open-access data repositories. Typically, only fully processed data are made publicly available.
What infrastructure, programs and tools will be used? Can they be used remotely?
The tool can be built as an extension of currently available (commercial) software packages, or preferably designed from scratch in a freely available language such as R or, ideally, Python. Licenses for some commercial software packages are available at GFZ Potsdam, and remote access can be granted to the collaborator.
What skills are necessary for this project?
Data analytics, statistics, Python
Interested candidates should be at Bachelor or Master level.
The German Research Centre for Geosciences (GFZ) is Germany’s national research centre for Solid-Earth Sciences, which investigates the dynamics of planet Earth as it is shaped by physical, chemical and biological processes. The Seismology group is engaged in the development and application of methods to image the elastic structure of the Earth based on the signals of natural earthquakes and the ambient background `hum’, as well as to analyse earthquake activity across a wide range of scales. Machine learning is currently used to enhance earthquake analysis and improve knowledge of earthquake physics.
https://www.gfz-potsdam.de/en/section/seismology/overview/
What is the project's research question?
How can we improve earthquake monitoring with Deep Learning, and how can we turn the nominal confidences returned by DL models in seismology into calibrated uncertainties?
What data will your exchange student work on?
Time series data representing ground motion from thousands of seismometers worldwide, partially unlabelled and partially labelled with annotations of seismic wave arrivals and corresponding earthquake parameters. Standard Python libraries (ObsPy) will be used to read and preprocess the data. We operate a global network of seismic stations, a seismological data centre and an earthquake monitoring service (https://geofon.gfz-potsdam.de), which provides rich archives and real-time data streaming.
What tasks will the project involve?
The ultimate target of the project is to improve the reliability of Deep Learning (DL) based earthquake analysis models. Performant algorithms for the most straightforward applications (automatic picking of the first-arriving wave and selected secondary waves) have been developed recently, but there is still a nearly complete lack of understanding of how the nominal confidence returned by DL models relates to actual uncertainties. You will be tasked with testing the performance of algorithms for picking seismic arrivals in different settings and under different noise conditions, and with suggesting improvements to training strategies (or even model design) and evaluation metrics.
You will be working in the SeisBench framework: https://pypi.org/project/seisbench/ (see also Woollam et al. 2021 https://arxiv.org/abs/2111.00786 ; Münchmeyer et al. 2022 https://doi.org/10.1029/2021JB023499 ), which uses pytorch for implementation of DL models, and provides tools for rapidly benchmarking algorithms.
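A first diagnostic for the confidence question is to check whether higher nominal pick confidence actually corresponds to more accurate picks against catalogue arrivals. The sketch below uses plain numpy with invented pick times and confidences; it is not the SeisBench API, only an illustration of the evaluation logic:

```python
import numpy as np

def pick_performance(pred_times, confidences, true_times,
                     tol=0.1, thresholds=(0.3, 0.5, 0.7)):
    """For each confidence threshold, report the fraction of accepted
    picks landing within `tol` seconds of the catalogue arrival.
    If this fraction does not grow with the threshold, the nominal
    confidences are not behaving like calibrated probabilities."""
    results = {}
    for t in thresholds:
        accepted = confidences >= t
        if accepted.any():
            hits = np.abs(pred_times[accepted] - true_times[accepted]) <= tol
            results[t] = hits.mean()
    return results

# toy picks: high-confidence picks are accurate, low-confidence ones drift
pred = np.array([10.02, 20.01, 30.50, 40.80])
conf = np.array([0.9, 0.8, 0.4, 0.3])
true = np.array([10.00, 20.00, 30.00, 40.00])
perf = pick_performance(pred, conf, true)
```

Turning such curves into calibrated uncertainties (e.g. via recalibration of the confidence output) is the harder, open part of the project.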
What makes this project interesting to work on?
You will work with a team of experienced scientists and software engineers and gain experience in coding for a science-driven production environment. You will deepen your knowledge of machine learning for scientific data analysis and the related Python packages. Depending on how the project proceeds, your contributions might be included in the SeisBench package or you might become a co-author on a scientific publication. Finally, the developments are designed to directly improve our operational earthquake monitoring, so your work could directly result in improvements to an operational service.
What is the project's expected outcome?
Co-authorship to research paper, Contribution to software
Is the data open source? Yes
What infrastructure, programs and tools will be used? Can they be used remotely?
A desktop or laptop computer will be made available for development work. We rely on open-source software for all processing and model training. Both GFZ's high-performance computing facilities and the HAICORE Helmholtz AI GPU cluster can be used for this project. All resources can be accessed remotely.
What skills are necessary for this project?
Machine learning, Deep learning, Software development, Python
Interested candidates should be at Master, PhD level.