Machine Learning for Complex Networks

Research Projects

Our research has led to third-party funded projects with a total volume of more than EUR 2.4 million. Below is an overview of our current research projects.

Principal Investigator at our chair: Prof. Dr. Ingo Scholtes

Academic partners: Prof. Dr. Dominic Grün, Prof. Dr. Andreas Hotho

Project duration: 2023-2026

Funding (our share): EUR 227,286 from BMBF

Principal Investigator at our chair: Lisi Qarkaxhija

Industry Partner: DATEV eG

Project duration: 2024-2025

Funding: EUR 135,000 from BMBF

Principal Investigator at our chair: Prof. Dr. Ingo Scholtes

Academic partners: Prof. Dr. Martin Tomasik, Prof. Dr. Volkan Cevher, Prof. Dr. Urs Moser

Project duration: 2021-2024

Funding (our share): CHF 250,000 from the Swiss National Science Foundation (SNSF)

Short summary

Drawing on intensive longitudinal data from the large-scale formative assessment system implemented in primary and secondary schools of four cantons in Switzerland (N = 100,000), we propose to develop and implement a novel methodology for analyzing the developmental trajectories of students from a truly intraindividual perspective. Based on previous research showing that interindividual differences rarely correspond with intraindividual change (e.g., Molenaar & Campbell, 2009), this project aims to contribute to a paradigmatic shift in education science by reinstating the individual student as the primary focus of empirical research. To this end, we suggest four hierarchically related approaches that address issues that are both fundamental in developmental science and highly relevant in the domain of education. These approaches are built around four increasingly complex statistical concepts (i.e., mean, variance, covariance, and multilevel). First, we will compare different modelling approaches in order to determine a valid developmental score despite the challenge of imbalanced categorical data that have been collected at varying time intervals with different test items at each assessment. Second, we will explore interindividual heterogeneity within intraindividual change using growth-mixture modelling, dynamic latent class analysis, generalized mixture modelling with dynamic structural equation models, and graph models based on higher-order time-series data. Third, we will examine intraindividual covariation between the different concepts and content domains underlying the developmental score using dynamic systems as well as machine learning approaches. Fourth and finally, using a summer vacation design, we will disentangle the institutional from the noninstitutional effects on students’ intraindividual learning progress in order to explore how much of the heterogeneity in learning progress can be explained by which factors.
The methods developed and the lessons learned in the proposed project will enable future researchers in education science to conduct research based on data from digital learning platforms and show how such data repositories can provide an opportunity for future long-term longitudinal research on learning trajectories.

Principal Investigator at our chair: Prof. Dr. Ingo Scholtes

Project duration: 2018-2024

Funding: CHF 1.5 million from the Swiss National Science Foundation (SNSF)

Short summary

Graph analytics and (social) network analysis have become cornerstones of data science. They are widely applied to relational data studied in disciplines such as computer science, physics, systems biology, social science or economics. However, we are increasingly confronted with high-frequency, time-resolved data which tell us not only who is related to whom, but also when and in which sequence these relations occurred. The analysis of such data is still a challenge. A naive application of network analysis and modeling techniques discards information on the timing and ordering of relations. This information is the foundation of so-called causal or time-respecting paths, i.e., it is needed to answer the question of who can influence whom. In my research, I study the effects of temporal ordering in time-resolved relational data from real-world systems. Using a combination of information-theoretic and statistical methods, we demonstrated that temporal correlations in data from social and biological systems break the transitivity of causal paths. We further showed that applying network-based data analysis and modeling techniques, as well as algebraic methods, to time-stamped data can therefore yield wrong results.
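The difference between static and time-respecting reachability can be illustrated with a minimal sketch. The toy edge list and the function names below are hypothetical and chosen for illustration only; they are not taken from the project's software. A path counts as time-respecting here if its time stamps strictly increase, which is one common convention.

```python
from collections import defaultdict


def static_reachable(edges, source):
    """Nodes reachable from `source` when time stamps are ignored."""
    adj = defaultdict(set)
    for u, v, _t in edges:
        adj[u].add(v)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {source}


def temporal_reachable(edges, source):
    """Nodes reachable from `source` via time-respecting paths.

    Assumption: consecutive edges on a path must have strictly
    increasing time stamps.
    """
    # earliest[v] = earliest time at which v can be reached from source
    earliest = {source: float("-inf")}
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        can_leave_u = u in earliest and earliest[u] < t
        if can_leave_u and t < earliest.get(v, float("inf")):
            earliest[v] = t
    return set(earliest) - {source}


# Hypothetical toy data: (source, target, time stamp).
# b contacts c at time 1, and only afterwards a contacts b at time 2.
edges = [("a", "b", 2), ("b", "c", 1)]

static_from_a = static_reachable(edges, "a")
temporal_from_a = temporal_reachable(edges, "a")
```

In the static view, a can reach c via b, because the links a-b and b-c both exist. Once the ordering is respected, c is no longer reachable from a, since the contact b-c happened before a-b. This is the sense in which temporal ordering breaks the transitivity of paths.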

Addressing the problem that common graphical representations of relational data discard information on the temporal ordering of relations, we developed a data analysis framework based on higher-order graphical models. Extending the common network perspective, it allows us to combine information on both topological and temporal characteristics of time-resolved relational data into compact probabilistic graphical models. This approach provides new ways to (i) model dynamical processes like diffusion, cascades or epidemic spreading, (ii) detect temporal-topological clusters based on higher-order Laplacians and spectral methods, (iii) assess the importance of nodes, and (iv) study the controllability of complex systems. This research aims at methodological advances which not only provide us with novel data mining techniques, but whose impact reaches beyond computer science, with applications in the modeling of complex systems in physics, systems biology, social science and economics.
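As a rough illustration of the idea behind higher-order models, the following sketch compares first-order transition probabilities (the usual network view) with second-order ones, in which each state is a link and the next step may depend on the previous one. This is a simplified sketch on hypothetical path data, not the framework developed in the project; all function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def _normalize(counts):
    """Turn nested transition counts into transition probabilities."""
    probs = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        probs[src] = {dst: n / total for dst, n in targets.items()}
    return probs


def first_order_transitions(paths):
    """Transition probabilities between nodes (memoryless view)."""
    counts = defaultdict(Counter)
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[u][v] += 1
    return _normalize(counts)


def second_order_transitions(paths):
    """Transition probabilities between links (u, v) -> (v, w)."""
    counts = defaultdict(Counter)
    for path in paths:
        for u, v, w in zip(path, path[1:], path[2:]):
            counts[(u, v)][(v, w)] += 1
    return _normalize(counts)


# Hypothetical observed paths: whoever arrives at c from a continues to d,
# whoever arrives from b continues to e.
paths = [("a", "c", "d"), ("a", "c", "d"), ("b", "c", "e"), ("b", "c", "e")]

first = first_order_transitions(paths)
second = second_order_transitions(paths)
```

In this toy data the first-order model assigns probability 0.5 to both c-d and c-e, which suggests that a path from a to e exists. The second-order model keeps the information that the step after c depends on the step before it, so the path a-c-e gets probability zero.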

Principal Investigator: Prof. Dr. Ingo Scholtes

Academic partner: Prof. Dr. Aniko Hannak

Project duration: 2020-2024

Funding: CHF 400,000 from Honda Research Institute GmbH

Short summary

With recent significant advances in intelligent systems, the question of the future relationship between human and artificial intelligence has attracted growing interest. There are a number of reasons to promote cooperative systems as opposed to purely autonomous systems. Cooperative systems will not replace the human (not even in cases where this might be functionally possible) but will work together with the human in a team. At the moment, we have a very limited understanding of how such teams should be organized and what would be necessary for the human to feel comfortable in such a new team situation. How would human-robot or, more generally, human-AI teams differ from purely human teams? Can artificial intelligence be integrated with human experience, creativity and intelligence such that the resulting collaboration between human and AI surpasses both human- and AI-level performance? Experiences from human-animal teams, as in security or rescue operations with dogs or horses, as well as insights into the optimal composition of teams whose members have heterogeneous skills, can help us chart possible routes to optimal cooperation patterns between humans and AI technologies.

Principal Investigator at our chair: Prof. Dr. Ingo Scholtes

Project duration: 2020-2024

Industry Partner: genua GmbH

Academic Partner: Dr. Christoph Gote, Chair of Systems Design, ETH Zürich

Short summary

Software systems are at the heart of the digital society: They control critical infrastructures like communication or energy systems, fuel the increasing automation in industrial manufacturing and are key drivers of the digital economy. Despite this importance, the development of complex software systems is still a fundamental challenge. Credible reports indicate that the majority of software projects run over time or budget, or fail altogether, resulting in billions of dollars wasted every year. And while technical aspects such as programming techniques, testing methods, or developer support tools have improved significantly over the past years, our understanding of how human and social factors contribute to the success or failure of software projects is still in its infancy.

Addressing these challenges, I use data science to quantitatively study collaborative software engineering processes. As an example, we use network analysis and statistical modeling to study the evolution of software architectures based on large-scale data from software repositories. This allows us not only to trace the maintainability of software systems, but also to assist developers in refactoring code. We further extract large data sets from online support tools and analyze them to better understand how social factors influence software development processes. This approach has helped us to uncover social mechanisms at work in software development, to quantify risks in Open Source communities, and to improve information systems used by software development teams.
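One simple way to turn repository data into a network is a co-change graph, in which two files are linked whenever they were modified in the same commit. The sketch below is an illustrative assumption about how such a network could be built, not a description of the method used in the project; the commit data and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_network(commits):
    """Weighted co-change graph from a list of commits.

    Each commit is given as a list of file names changed together.
    Returns a Counter mapping an (alphabetically ordered) file pair
    to the number of commits in which both files were changed.
    """
    weights = Counter()
    for files in commits:
        # sorted(set(...)) removes duplicates and fixes the pair order
        for a, b in combinations(sorted(set(files)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical commit history: files touched per commit.
commits = [
    ["core.py", "api.py"],
    ["core.py", "api.py", "docs.md"],
    ["docs.md"],
]

weights = co_change_network(commits)
```

File pairs with high weights change together often, which can hint at coupling between modules that is not visible in the declared architecture.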