Program > Keynote speakers

What do the Sources Say? Exploring Heterogeneous Journalistic Data As a Graph

by Ioana Manolescu

Professional journalism is of utmost importance nowadays. It is a key feature distinguishing democracies from dictatorships, and a mirror sorely needed by society to look upon itself and understand its own functioning. Understanding, in turn, is necessary for making informed decisions, such as political choices.

With the world turning increasingly digital, journalists need to analyze very large amounts of data, while having no control over the structure, organization, and format of the data. Since 2013, my team has been working to understand data journalism and computational fact-checking use cases, to identify and develop tools adapted for this challenging setting. I will describe our SourcesSay project (2020-2024), in which extremely heterogeneous data sources are integrated as graphs, on top of which journalistic applications can be supported through flexible graph queries. I will explain the data source integration module, the role played by Information Extraction and Entity Disambiguation, as well as novel techniques to explore and simplify these graphs.
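As a purely illustrative sketch (not the SourcesSay implementation), the following Python snippet shows the general idea described above: facts from two heterogeneous sources are integrated into a single graph, entity disambiguation maps a surface mention to its canonical node, and a simple reachability query then connects information across sources. All names and relations are invented:

```python
from collections import defaultdict, deque

# One graph integrating edges from two heterogeneous sources
graph = defaultdict(list)

# Source 1: tabular records (e.g., a spreadsheet of funding links)
for subj, rel, obj in [("Acme Corp", "funds", "Think Tank A"),
                       ("Think Tank A", "cites", "Report X")]:
    graph[subj].append((rel, obj))

# Source 2: a fact extracted from text; entity disambiguation maps the
# surface mention "ACME" onto the canonical node "Acme Corp"
canonical = {"ACME": "Acme Corp"}
graph[canonical["ACME"]].append(("quoted_in", "Article Y"))

def reachable(start):
    """All nodes reachable from `start` -- a minimal 'graph query'."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for _, target in graph[node]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# The query spans both sources, although neither source alone links
# "Acme Corp" to all three targets.
print(sorted(reachable("Acme Corp")))  # ['Article Y', 'Report X', 'Think Tank A']
```

The point of the sketch is only that, once disambiguation has merged the mentions, a single generic graph query can traverse facts regardless of which source contributed them.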

This work is joint with Angelos Anadiotis, Oana Balalau, Helena Galhardas, Tayeb Merabti, Emmanuel Pietriga, and many other colleagues.

Project Web site: https://sourcessay.inria.fr

Ioana Manolescu

Ioana Manolescu is a senior researcher at Inria Saclay and a part-time professor at Ecole Polytechnique, France. She leads the Inria CEDAR team, which focuses on rich data analytics at cloud scale. She is also the scientific director of LabIA, a program run by the French government in which AI problems raised by branches of the local and national French public administration are tackled by French research teams. She is a member of the PVLDB Endowment Board of Trustees, and has been an Associate Editor for PVLDB, president of the ACM SIGMOD PhD Award Committee, chair of the IEEE ICDE conference, and a program chair of EDBT, SSDBM, and ICWE, among others.
She has co-authored more than 150 articles in international journals and conferences, as well as books on "Web Data Management" and "Cloud-based RDF Data Management".
Her main research interests include algebraic and storage optimizations for semistructured data, in particular Semantic Web graphs; novel data models and languages for complex data management; and data models and algorithms for fact-checking and data journalism, a topic on which she collaborates with journalists from Le Monde. She is also the recipient of the ANR AI Chair "SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas" (2020-2023).

Ontologies for On-Demand Design of Data-Centric Systems

by Magdalena Ortiz

Over the last decade, ontologies have found impactful applications in data management, where they help improve access to data that is incomplete, heterogeneous, or poorly structured. An ontology can act as a mediator between users and data sources to facilitate query formulation, and allows us to obtain more complete query answers by leveraging domain knowledge to infer implicit facts. Huge research efforts have been put into understanding the problems associated with query evaluation leveraging diverse ontology languages, many algorithms have been developed, and off-the-shelf engines for knowledge-enriched query answering exist.
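To make the idea of inferring implicit facts concrete, here is a minimal Python sketch (not an actual ontology-mediated query engine): a subclass axiom lets a query return an answer that is never explicitly stated in the data. All classes and individuals below are invented for illustration:

```python
# Data: explicit class assertions only.
data = {("Dog", "rex"), ("Cat", "felix")}

# Ontology: subclass axioms, e.g. "every Dog is an Animal".
ontology = {("Dog", "Animal"), ("Cat", "Animal"), ("Animal", "LivingThing")}

def superclasses(cls):
    """Transitive closure of the subclass axioms starting from `cls`."""
    result, frontier = set(), {cls}
    while frontier:
        frontier = {sup for sub, sup in ontology if sub in frontier} - result
        result |= frontier
    return result

def query(cls):
    """Individuals that are explicit OR inferred instances of `cls`."""
    return {ind for c, ind in data if c == cls or cls in superclasses(c)}

# Neither rex nor felix is stated to be an Animal, yet both are answers.
print(sorted(query("Animal")))  # ['felix', 'rex']
```

Without the ontology, the query for "Animal" would return nothing; the domain knowledge completes the answers, which is the mediation role described above.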

In this talk, we advocate a bolder use of ontologies: not only to access the data stored in systems, but to facilitate the development of the systems themselves. Can an ontology be leveraged to infer on-demand the organization of data for a specific application? We claim it can. We address the question of how to specify the scope of a desired application, and propose automated inference services for obtaining the desired data schema from such a description, validating it, and querying it. The techniques we describe are generic and admit any logical theory as an ontology, but we also present concrete decidability and complexity results for ontologies written in specific description logics.

Magdalena Ortiz

Magdalena Ortiz is an assistant professor of Knowledge Representation and Reasoning at the Vienna University of Technology, where she works on the boundary between artificial intelligence and databases. She studied computer science in Mexico before moving to Europe to study computational logic in Italy and Austria, and then did her PhD at the Vienna University of Technology. She was an FWF Hertha Firnberg fellow, and is now the leader of a research group that aims at using knowledge to make data-centric systems smarter and more reliable, especially using formalisms based on description logics.

Towards human-guided rule learning

by Matthijs van Leeuwen

Interpretable machine learning approaches such as predictive rule learning have recently witnessed a strong increase in attention, both within and outside the scientific community. Within the field of data mining, the discovery of descriptive rules has long been studied under the name of subgroup discovery. Although predictive and descriptive rule learning have subtle yet important differences, they both suffer from two drawbacks that make them unsuitable for use in many real-world scenarios. First, hyperparameter optimisation is typically cumbersome and/or requires large amounts of data, and second, results obtained by purely data-driven approaches are often unsatisfactory to domain experts.

In this talk I will argue that domain experts often have relevant knowledge not present in the data, which suggests a need for human-guided rule learning that integrates knowledge-driven and data-driven modelling. A first step in this direction is to eliminate the need for extensive hyperparameter tuning. To this end we propose a model selection framework for rule learning that 1) allows for virtually parameter-free learning, naturally trading off model complexity with goodness of fit; and 2) unifies predictive and descriptive rule learning, more specifically (multi-class) classification, regression, and subgroup discovery. The framework we propose is based on the minimum description length (MDL) principle, we consider both (non-overlapping) rule lists and (overlapping) rule sets as models, and we introduce heuristic algorithms for finding good models.
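The complexity/fit trade-off at the heart of MDL-based model selection can be illustrated with a toy two-part score, L(model) + L(data | model). The encoding below (a fixed per-rule cost plus the entropy cost of the residual errors, in bits) is invented for illustration and is not the encoding used in the actual framework:

```python
import math

BITS_PER_RULE = 8.0  # illustrative fixed cost per rule in the model

def data_cost(n, errors):
    """Bits to encode n labels when the model misclassifies `errors` of them."""
    if errors in (0, n):
        return 0.0
    p = errors / n
    return n * -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mdl_score(num_rules, n, errors):
    """Two-part MDL total: model cost plus data-given-model cost."""
    return num_rules * BITS_PER_RULE + data_cost(n, errors)

# 100 examples: a 1-rule model with 10 errors vs. a 5-rule model with 2 errors.
# The better model is simply the one with the smaller total description length,
# so no error-weight hyperparameter needs to be tuned.
score_simple = mdl_score(1, 100, 10)
score_complex = mdl_score(5, 100, 2)
print(round(score_simple, 1), round(score_complex, 1))
```

With these invented costs, the larger model pays 32 extra bits for its four additional rules but saves more than that on encoding the errors, so it wins; shrink the data set and the simple model wins instead, which is exactly the "virtually parameter-free" trade-off described above.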

In the last part of the talk, I will give a glimpse of the next steps towards human-guided rule learning, which concern exploiting expert knowledge to further improve rule learning. Specifically, I will describe initial results obtained within the iMODEL project, in which we develop theory and algorithms for interactive model selection, involving the human in the loop to obtain results that are more relevant to domain experts.

Matthijs van Leeuwen

Dr. Matthijs van Leeuwen is an associate professor and leads the Explanatory Data Analysis group at LIACS, Leiden University, the Netherlands. His primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and—ultimately—novel knowledge? Van Leeuwen has been awarded a number of grants, including NWO Rubicon, TOP2, and TTW Perspectief grants, as well as several best paper awards. He has co-organised international conferences and workshops, and is on the editorial boards of DAMI and the ECML PKDD Journal Track. He was guest editor of a TKDD special issue on ‘Interactive Data Exploration and Analytics’.

Sustainable AI – What does it take for continued success in deployed applications?

by Stefan Wrobel

Advances in machine learning research have been so impressive that one would be tempted to believe that today most practical problems could easily be solved purely with data and machine learning. However, in the real world, the requirements demanded from a deployed application go far beyond achieving an acceptably low error for a trained model. Deployed applications must guarantee sustained success with respect to their functionality, their business viability, and their ethical acceptability. In this talk, we will analyze the challenges faced in practice with respect to these three dimensions of sustainability, pointing out risks and common misunderstandings and highlighting the role of hybrid modeling. We will then discuss our lessons learned from a number of real-world projects for companies with respect to approaches for engineering and operating ML systems. The talk will conclude with a perspective on the demands placed on AI systems by customers and society, presenting our methodology for testing and ultimately certifying such systems.


Prof. Dr. Stefan Wrobel is Professor of Computer Science at University of Bonn and Director of the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS.

He studied computer science and artificial intelligence in Bonn and Atlanta, Georgia/USA (M.S., Georgia Institute of Technology) and obtained his PhD at the University of Dortmund. After holding positions in Berlin and Sankt Augustin he was appointed Professor of Practical Computer Science at Magdeburg University, before taking up his current position in 2002. In addition, he is the managing director of the Bonn-Aachen International Center for Information Technology (b-it) and one of the two spokespersons of the Competence Center Machine Learning Rhine-Ruhr (ML2R).

Professor Wrobel’s work is focused on questions of the digital revolution, in particular intelligent algorithms and systems for the large-scale analysis of data and the influence of Big Data/Smart Data on the use of information in companies and society. He is the author of a large number of publications on data mining and machine learning, is on the Editorial Board of several leading academic journals in his field, and is an elected founding member of the “International Machine Learning Society”. He was honored by the Gesellschaft für Informatik as one of the formative minds in German AI history.

As Speaker of the “Fraunhofer Big Data and Artificial Intelligence Alliance”, director of the “Fraunhofer Center for Machine Learning”, vice-speaker of the “Fraunhofer Information and Communication Technology Group”, as well as speaker of the “Fachgruppe Knowledge Discovery, Data Mining und Maschinelles Lernen”, a Special Interest Group of the German Computer Science Society, he is engaged nationally and internationally in pushing forward the benefits of digitization, big data and artificial intelligence.
