The Data-management Technologies group at the Institute of Data Science is concerned with various aspects of managing large scientific datasets.
One focus area is using Semantic Technologies to improve and leverage the metadata descriptions of such datasets.
This ranges from supporting users in creating high-quality metadata with little effort to easing data access by leveraging semantic connections.
What is the data science project’s research question?
What are useful domain-independent criteria to characterize Semantic Web resources (ontologies, vocabularies, SPARQL endpoints, …) and how can they be applied/implemented at scale?
As a knowledge engineer, you are often faced with the task of finding existing resources that cover (parts of) your current domain. On the one hand, this saves time, as parts can be reused rather than reinvented. On the other hand, it strengthens the integration with other parts of the growing Linked Data Cloud. With the number of available datasets ever increasing, you can no longer afford to evaluate each resource in detail. Instead, you need a quick summary that lets you discard unsuitable resources quickly.
In this project we want to look at both concepts and implementations that characterize and summarize a given Semantic Web resource. While it is rather easy to describe a single, well-behaved resource, challenges arise from the heterogeneity of resources in the wild, each following its own ideas of how such resources should be built or published.
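To make the idea of a quick summary concrete, the sketch below queries a SPARQL endpoint for two basic size indicators (triple count and distinct-predicate count). This is a hypothetical illustration, not part of the project specification: the `summarize_endpoint` helper, the choice of metrics, and the `format=json` request parameter are all assumptions; a real implementation would also need to handle endpoints that time out or reject expensive counting queries.

```python
# Hypothetical sketch: probe a public SPARQL endpoint for a quick
# size summary. Endpoint URL, metric choice, and the "format=json"
# parameter (a common convention, not part of the SPARQL standard)
# are illustrative assumptions.
import json
import urllib.parse
import urllib.request

SUMMARY_QUERY = """
SELECT
  (COUNT(*) AS ?triples)
  (COUNT(DISTINCT ?p) AS ?properties)
WHERE { ?s ?p ?o }
"""


def summarize_endpoint(endpoint_url: str) -> dict:
    """Send the summary query and return the first result binding."""
    params = urllib.parse.urlencode({"query": SUMMARY_QUERY, "format": "json"})
    with urllib.request.urlopen(f"{endpoint_url}?{params}", timeout=30) as resp:
        data = json.load(resp)
    # SPARQL JSON results: {"results": {"bindings": [{...}]}}
    return data["results"]["bindings"][0]
```

Even these two numbers let a knowledge engineer distinguish a toy vocabulary from a large dataset before investing time in a detailed evaluation.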
What data will be worked on? Different publicly available ontologies (will initially be provided by the supervisors, but may be extended throughout the project).
What tasks will this project involve?
Get a rough overview of the state of the art (very rough – the supervisors will provide some input for this).
Define metrics for the characterization of Semantic Web resources – based on the state of the art and in discussion with the supervisors.
Implement a selection of those metrics and test them on the provided resources; ensure scalability of the implementation.
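As a flavor of what such a metric implementation could look like, the sketch below computes two candidate metrics (predicate vocabulary size and most-used classes) over N-Triples data using only the Python standard library. The metric selection and the simplified triple pattern are illustrative assumptions; the line-by-line processing at least hints at the scalability requirement, since large files can be streamed.

```python
# Minimal sketch of two possible characterization metrics, computed
# over N-Triples input with only the standard library. The chosen
# metrics and the simplified triple regex are illustrative
# assumptions, not the project's actual metric set.
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Very simplified pattern: three whitespace-separated terms and a final dot.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def characterize(lines):
    """Return simple metrics for an iterable of N-Triples lines."""
    n_triples = 0
    predicates = set()
    class_usage = Counter()
    for line in lines:
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # skip blank or non-matching lines
        s, p, o = m.groups()
        n_triples += 1
        predicates.add(p)
        if p == RDF_TYPE:
            class_usage[o] += 1
    return {
        "triples": n_triples,
        "distinct_predicates": len(predicates),
        "top_classes": class_usage.most_common(3),
    }


# Usage with a tiny hypothetical dataset:
data = [
    '<http://ex.org/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Person> .',
    '<http://ex.org/a> <http://ex.org/name> "Alice" .',
]
metrics = characterize(data)
```

Because `characterize` accepts any iterable of lines, the same code works on an open file handle, which keeps memory usage flat regardless of dataset size.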
What makes this project interesting to work on? Graphs are an increasingly important way to store highly connected data. In this project the student will get hands-on experience with graph data from different sources and learn how to start analyzing real-world datasets without any prior knowledge of their contents.
What is the expected outcome? Contribution to software development
What infrastructure, programs and tools will be used? Can they be used remotely? The ontologies used are, in general, publicly available.
For developing and running the actual analysis we will provide access to our Gitlab instance as well as our server infrastructure.
With regular meetings discussing goals, progress, and other aspects of the project, remote work will be possible.
What skills are necessary for this project? Data analytics / statistics, Databases
Is the data open source? Yes
Interested candidates should be at Master level. Sirko Schindler is looking for 1 visiting scientist to work on the project together with the team.