Our research is mainly concerned with data mining and machine learning. In particular, we develop theory and algorithms for answering exploratory questions about data, such as ‘what are the causal dependencies in my data’ or ‘this is my data, tell me what I need to know’. To identify what structure is worthwhile, we often employ well-founded statistical methods based on information theory. On the basis of such scores we then develop efficient algorithms that can extract useful and insightful results from large and complex data.
What is the data science project’s research question? Given a knowledge graph, how can we discover those edges, facts, and paths that are surprising either because they are present, or because they are missing? How can we connect this ‘surprisingness’ to commonsense knowledge (that is often not explicitly present in a KG) or commonsense reasoning? Combined, the main question we’d like to study here is how to find those aspects of a knowledge graph that are surprising from a commonsense perspective. Just like the title says.
What data will be worked on? We will primarily work on publicly available knowledge graphs (e.g. Yago) and their commonsense variants. In addition, we may consider the raw text data these are extracted from (e.g. Wikipedia).
What tasks will this project involve? We will develop both the theory (e.g. what is surprisingness in a KG, how does commonsense stand out in this regard, and how do we measure either) and the method (how to discover surprising and commonsense elements from a large KG).
What makes this project interesting to work on? No matter how hard some may wish for it, data alone is not enough; some questions require background understanding or assumptions on how the world works. Ideally these assumptions are expressed in causal terms, as that permits causal reasoning. Knowledge graphs are well suited to encode such knowledge, but often lack good coverage of commonsense knowledge, mostly because nobody finds it necessary to say e.g. that elephants are typically grey. In this project we’ll take a causal perspective on knowledge graphs, both to reason commonsensically with a knowledge graph and to infer what is commonsensically missing, and hence obtain better commonsense knowledge graphs.
What is the expected outcome? Contribution to a research paper.
What infrastructure, programs and tools will be used? Can they be used remotely? We will rely on existing CISPA infrastructure for asynchronous communication (Mattermost), compute (GPU and CPU), and version control (GitLab) where necessary, all of which can be used remotely. If the exchange is hosted on site, our OTM team will assist in finding housing, and we’ll provide an office with the necessary equipment. If the exchange is virtual, we’ll use video-conferencing (Zoom) as the main means of communication; in either case we’ll also use it to communicate with the student’s advisor. When it comes to instantiating our ideas, we will be pragmatic in choosing the programming language that best fits the project and the skills of the candidate (e.g. C++, Python, etc.).
What skills are necessary for this project? Data analytics / statistics, Data mining / Machine learning
Is the data open source? Yes
Interested candidates should be at PhD level. Jilles Vreeken is looking for 1 visiting scientist, working on the project together with the team.