Blog post – Jenia Jitsev, Mehdi Cherti

Large-Scale Transfer Learning Applied to X-Ray-Based COVID-19 Diagnostics


Please briefly describe the project that exchange participants would be working on and what makes it exciting.

In this project we aim to test the benefit of training very large deep learning models in terms of better generalization on previously unseen tasks, for instance when transferring models across different datasets and domains after training.

Recently, strong evidence has emerged that training large models on large data increases their capability to generalize to unseen data and to transfer to specific, smaller datasets. We would like to test these claims in one currently and acutely relevant scenario, namely COVID-19 diagnostics from different kinds of medical imaging data. For instance, COVIDx is an openly available X-Ray lung image dataset that contains samples from COVID-19 and non-COVID-19 patients.

Such a dataset is still comparatively small, so the question arises whether pre-training large deep neural networks on large generic natural image datasets like ImageNet-21k (~14 million images), on generic medical image datasets like CheXpert (~200,000 images), or on both, produces any benefit when transferring to small specific datasets like COVIDx, compared to just training on them from scratch without any pre-training. COVID-19 diagnostics is just one possible scenario; the same experiments can be conducted with other medical imaging datasets, for instance X-Ray lung image based tuberculosis diagnostics. The project covers many essential aspects at the frontiers of deep learning: the fundamental question of how to learn a generic model that can successfully transfer to other domains and tasks, working on real-world problems of inferring patient state from medical imaging data, and employing supercomputers like JUWELS Booster at JSC, which is necessary for the large-scale distributed training required to learn such large generic models that may transfer well.
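The transfer workflow described above can be sketched in PyTorch. This is a minimal, hypothetical illustration: the tiny backbone below stands in for a large network pre-trained on ImageNet-21k or CheXpert, and the 3-class head and all layer sizes are made up for the example. During fine-tuning, only the newly attached classification head is trained.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a large pre-trained backbone; in practice this
# would be a big network trained on ImageNet-21k and/or CheXpert.
backbone = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # X-Ray images: 1 channel
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the "pre-trained" weights so only the new head is fine-tuned.
for p in backbone.parameters():
    p.requires_grad = False

# New task-specific head, e.g. 3 classes (normal / pneumonia / COVID-19).
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)

# Optimize only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# One fine-tuning step on a dummy batch of 4 single-channel 224x224 images.
x = torch.randn(4, 1, 224, 224)
y = torch.randint(0, 3, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```

Training from scratch, the baseline the project compares against, would simply skip the freezing step and randomly initialize the backbone as well.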

How did this project come about? What sparked your interest in it in the first place?

In our lab we are interested in understanding how to train deep learning models that are highly reusable and transferable across different scenarios, not just on the dataset they were trained on. Back in April 2020, when the first COVID-19 wave arrived, it became clear that we had a real-world problem that is both acutely relevant and poses a challenging setting for transfer learning: learning to perform diagnostics on the basis of conventional X-Ray images. X-Ray imaging is cheap and widely available, and recent studies hint that it can be used to predict different relevant states of COVID-19 patients. However, the datasets available for training such diagnostic tools are still small and were collected under different conditions, which makes it difficult to create a diagnostic tool from scratch, especially in regions of the world where developing a deep learning model from scratch is impossible due to a lack of the necessary equipment. Therefore, we decided to combine our effort to understand transfer learning with the need to provide an easy way to develop and deploy such diagnostic tools based on already pre-trained, transferable models. This resulted in the COVIDNetX initiative that started in April 2020, within which the project will also take place.

Which recent studies from your group were exciting and why?

There are two studies I would like to mention that exemplify the broad range of problems we are working on at the Juelich Supercomputing Centre and in the frame of Helmholtz AI. One study uses generative models, in this case generative adversarial networks (GANs), to achieve super-resolution for 3D turbulent flow field prediction in a real application to combustion processes.

It shows strikingly that networks originally developed for a certain problem in a particular domain, in this case for super-resolution on natural images, can be adapted to perform very well in scientific fields where high precision, comparable to large-scale simulation of a physical process, is required. Another study looks at deep reinforcement learning in the changing environment of a 3D game, the so-called Obstacle Tower Challenge. It was surprising for us to see that proximal policy optimization (PPO), equipped with a quite simple convolutional feed-forward neural network as a visual encoder, is able to learn the game up to level 10, despite a drifting environment and the challenging tasks that are introduced when progressing through the levels.

In follow-up work, we also study the generalization capabilities this kind of learning can achieve when confronted with different changes in the game.
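The PPO objective mentioned above rests on clipping the probability ratio between the current and the old policy, which keeps updates conservative even in a drifting environment. A minimal sketch of that clipped surrogate loss (function and tensor names are hypothetical, not from our codebase) might look like this:

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, returned as a loss to minimize."""
    # Probability ratio between the current and the behavior policy.
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped and clipped surrogate terms.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the element-wise minimum; negate to obtain a loss.
    return -torch.min(surr1, surr2).mean()

# Dummy batch: identical old and new policies give ratio 1,
# so the loss reduces to minus the mean advantage.
lp = torch.log(torch.tensor([0.5, 0.25, 0.25]))
adv = torch.tensor([1.0, -0.5, 0.2])
loss = ppo_clip_loss(lp, lp, adv)
```

In the actual agent, the log-probabilities would come from the convolutional visual encoder and policy head evaluated on game frames; the clipping threshold of 0.2 is the commonly used default.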

What is it like to work in your lab? How many people are there, and what are their backgrounds?

Our institute hosts different groups that deal with various aspects of basic and applied machine learning. The project is hosted by two labs, the Cross-Sectional Team Deep Learning (CST-DL) and the Helmholtz AI consultant team. The people in the labs, currently about 8 members, have different scientific backgrounds, coming from computer science, machine learning, supercomputing, physics, or electrical engineering, and together form a very international crowd. We have contacts with different labs around the campus and are also part of JULAIN, a campus-wide network of researchers interested in machine learning, which also recently organized a hackathon, the Juelich Data Challenges, where COVIDNetX was one of the challenges to work on.

 What else would you like to share with data scientists interested in applying?

We, Dr. Jenia Jitsev, leading CST-DL, and Dr. Mehdi Cherti, researcher in the Helmholtz AI consultant team, are looking for people eager to join us in exploring this transfer learning scenario, where many of the current challenges at the frontiers of deep learning as a field are well represented: the fundamental question of what makes up a transferable, reusable model, dealing with imbalanced, small data, uncertainty quantification and explainability, handling high-resolution inputs, and doing distributed training of large models on supercomputers. Potential candidates should make sure they feel comfortable with deep learning workflows and also feel at home coding in Python and using the usual libraries like PyTorch or TensorFlow. We have the usual arrangements for remote work, especially given the current situation, and use standard tools to code together, such as git, GitLab, and wikis. So the scientific and technological stage for the adventure is set; just bring your skills and desire to discover, and it will be exciting to work on this project together!