28. Requirements for a training set to achieve the desired level of performance in detection of anomalous railway switch condition


Daniela Narezo

The data science working group Asset Monitoring and Management at the Institute of Transportation Systems researches new data-driven approaches to condition monitoring of traffic infrastructure, with the aim of enabling predictive maintenance. Machine learning is applied to derive condition information from embedded wayside and vehicle-mounted sensors. Our research is based on real-world data gathered together with industry partners in relevant operational environments.

What is the data science project’s research question? What is the minimum size of the training set, and which properties must it have, for our readily available anomaly detection model to reach a specified level of performance?

What data will be worked on? Simulated data: an annotated set of point machine current curves (labelled normal or abnormal), which are used to monitor the health condition of a switch.

What tasks will this project involve?  

• Review of methods
• Design a study that evaluates anomaly detection model skill versus the size of the training dataset
• Create learning curve graphs
• Develop machine learning pipelines in Python
• Validation of results
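
The study design named in the tasks above (model skill versus training set size) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the group's actual model or data: each current curve is reduced to one hypothetical scalar feature, normal and abnormal curves are drawn from made-up Gaussian distributions, and a simple z-score detector stands in for the anomaly detection model. All function names and parameter values are invented for the example.

```python
import random
import statistics


def simulate_feature(rng, n, abnormal=False):
    """Draw n values of a hypothetical scalar feature per current curve.

    The means and spread are illustrative assumptions, not measured values.
    """
    mu, sigma = (4.0, 0.5) if abnormal else (2.0, 0.5)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def fit_detector(train_normal):
    """Fit a stand-in z-score detector on normal-only training data."""
    return statistics.fmean(train_normal), statistics.stdev(train_normal)


def predict(detector, xs, z_threshold=3.0):
    """Flag a sample as abnormal if it lies more than z_threshold std devs away."""
    mean, std = detector
    return [abs(x - mean) / std > z_threshold for x in xs]


def f1_score(y_true, y_pred):
    """F1 score with 'abnormal' (True) as the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def learning_curve(train_sizes, n_repeats=20, seed=0):
    """Return {train_size: (mean F1, std of F1)} over repeated training draws."""
    rng = random.Random(seed)
    # One fixed test set so that scores are comparable across training sizes.
    test_x = simulate_feature(rng, 200) + simulate_feature(rng, 50, abnormal=True)
    test_y = [False] * 200 + [True] * 50
    curve = {}
    for n in train_sizes:
        scores = []
        for _ in range(n_repeats):
            detector = fit_detector(simulate_feature(rng, n))
            scores.append(f1_score(test_y, predict(detector, test_x)))
        curve[n] = (statistics.fmean(scores), statistics.stdev(scores))
    return curve


if __name__ == "__main__":
    for n, (mean_f1, std_f1) in learning_curve([5, 10, 20, 50, 100, 200]).items():
        print(f"train size {n:4d}: F1 = {mean_f1:.3f} +/- {std_f1:.3f}")
```

Repeating each training size several times gives both a mean score and a spread, which is what a learning curve graph would typically show as a line with an error band. The smallest training size at which the mean reaches the target score, with an acceptable spread, is one possible way to read off a minimum training set size.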

What makes this project interesting to work on? This research is part of a larger project on the real operating conditions of railway switches, so it has the potential to make a real difference to railway operations. Additionally, this research question often arises when working with machine learning methods, so the experience gained can be useful for further projects.

What is the expected outcome? Contribution to a research paper and to software development

What infrastructure, programs and tools will be used? Can they be used remotely?   

• Python
• Open source machine and deep learning frameworks

What skills are necessary for this project?  Data analytics / statistics, Scientific computation, Data mining / Machine learning, Visualization

Is the data open source?  No

Interested candidates should be at PhD level. Daniela Narezo is looking for one visiting scientist, who will work on the project together with the team.