In industry, society and science, advanced software is used to solve planning, scheduling and resource allocation problems, collectively known as constraint satisfaction or optimization problems. At the same time, vast amounts of data about these problems are gathered continuously. This project starts from the observation that current software typically does not exploit such data to update schedules, resources and plans. It aims to develop a new approach in which gathered data is analysed systematically in order to dynamically revise and adapt constraints and optimization criteria. Ultimately, this could create a new ICT paradigm, called Inductive Constraint Programming, that bridges the gap between data mining and machine learning on the one hand and constraint programming on the other. If successful, this would change the face of data mining as well as constraint programming technology. It would not only allow data mining techniques to be used in constraint programming to improve the formulation and solution of constraint satisfaction problems, but also allow declarative constraint programming principles to be employed in data mining and machine learning.
Project overview
Automated Model Acquisition
With today’s technology, modelling constraint satisfaction and optimization problems is a cumbersome task. For non-expert users in particular, formalizing a model using appropriate variables and constraints is difficult. The key challenge is to build systems that learn how to model a problem themselves, or that help a user to build models. We use basic data mining and machine learning technology to address this challenge, focusing on improving constraint programming technology.
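To make the idea of learning a model from data concrete, here is a minimal sketch in Python. It is not the project's method: it assumes a small, hypothetical set of candidate binary relations (`RELATIONS`) and keeps every candidate constraint that holds in all example solutions supplied by a user. The function name `learn_constraints` and the example data are made up for illustration.

```python
from itertools import combinations
import operator

# Hypothetical candidate relations (the learner's "bias"); an assumption
# made for this sketch, not something taken from the project.
RELATIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "!=": operator.ne,
}


def learn_constraints(variables, solutions):
    """Return each candidate binary constraint satisfied by every solution."""
    learned = []
    for x, y in combinations(variables, 2):
        for name, rel in RELATIONS.items():
            # Keep the constraint only if no example solution violates it.
            if all(rel(sol[x], sol[y]) for sol in solutions):
                learned.append((x, name, y))
    return learned


# Made-up example solutions, as a non-expert user might provide them.
examples = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 3, "c": 2},
    {"a": 2, "b": 3, "c": 1},
]

model = learn_constraints(["a", "b", "c"], examples)
for x, name, y in model:
    print(x, name, y)
```

A learner of this kind tends to over-generate (for instance, it reports both `a < b` and the weaker `a != b`), so a realistic system would also need negative examples or user queries to prune the candidate model.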
Model Reformulation and Solver Optimization
Within the framework of constraint programming (CP), the first step in solving a problem is to model it as a constraint satisfaction or optimization problem (CSP(O)). However, even if a problem has been modelled in a semantically correct way as a CSP(O), the resulting CSP(O) is often far too hard for state-of-the-art constraint solvers to solve. We use machine learning and data mining to improve a given model or search strategy.
Knowledge-based Data Mining
The result of constraint programming is often a schedule or a roster (e.g., a bus timetable) that can subsequently be applied in the real world. Once data has been collected from the real world, the inputs to a data mining algorithm include: background knowledge on the world (the layout of a city); background knowledge on schedule constraints (the frequency of bus rides); solutions deployed (the bus schedule); data about the actual execution (the number of bus travellers entering or leaving a neighbourhood); and data from other sources (spatio-temporal trajectories of private traffic).
Declarative Data Mining
Many data mining problems can be formalized as combinatorial problems in a declarative way. For instance, tasks such as the discovery of patterns in data, or finding clusters of similar examples in data, often require constraints to be satisfied and require solutions that are optimal with respect to a given scoring function.
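As an illustration of the declarative view, the following Python sketch states a pattern mining task purely as a list of constraints over candidate itemsets and leaves the search to a generic routine. It is a naive enumeration written for clarity, not a constraint solver and not the project's implementation; the transaction data, the thresholds and the names `cover`, `mine`, `min_support` and `max_size` are all assumptions made for this example.

```python
from itertools import combinations

# Made-up transaction database for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def cover(itemset, data):
    """Transactions that contain every item of the itemset."""
    return [t for t in data if itemset <= t]


def mine(data, constraints):
    """Enumerate all non-empty itemsets that satisfy every constraint."""
    items = sorted(set().union(*data))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if all(check(itemset, data) for check in constraints):
                result.append(itemset)
    return result


# The task is specified declaratively as a conjunction of constraints.
def min_support(itemset, data):
    return len(cover(itemset, data)) >= 2


def max_size(itemset, data):
    return len(itemset) <= 2


patterns = mine(transactions, [min_support, max_size])
for p in patterns:
    print(sorted(p), len(cover(p, transactions)))
```

Changing the task (for example, adding a constraint that a pattern must contain a given item) only requires adding another predicate to the list; the search routine stays the same. A real constraint solver would additionally use the constraints to prune the search rather than test every candidate.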
Application: Energy-aware Data Centres
Predicting running time and resource requirements for individual tasks prior to allocation to servers within a data centre is a prerequisite for building constraint programming tools for managing energy-aware data centres. We use earlier workloads to learn models that predict the duration, memory, CPU, disk and other requirements of a given task. This helps to schedule tasks in an energy-aware manner, reducing energy costs for data centres. Furthermore, combining workloads on a single server influences resource utilization in a non-trivial manner.
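The prediction step can be sketched as follows. This is a deliberately simple stand-in, assuming a single hypothetical feature (input size in GB) and an ordinary least-squares line fitted to made-up historical workload records; the project does not state which learning method or features it uses.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical workload: (input size in GB, observed duration in s).
history = [(1, 12), (2, 22), (3, 32), (4, 42)]
sizes = [size for size, _ in history]
durations = [duration for _, duration in history]

slope, intercept = fit_line(sizes, durations)


def predict_duration(size_gb):
    """Predicted running time for a new task of the given input size."""
    return slope * size_gb + intercept


print(predict_duration(5))
```

The predicted durations would then serve as input parameters to the scheduling model, for example as task lengths when assigning tasks to servers.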
Application: Learning and Optimisation for Healthcare Delivery
In the current economic climate, public health services throughout Europe face an ever-increasing challenge to deliver an effective and efficient service to patients. For example, the Health Service Executive (HSE), the Irish Government agency responsible for healthcare in Ireland, has a dedicated division focusing on this challenge. The HSE Directorate of Quality and Clinical Care has developed a strategy for chronic disease management that focuses on improving quality and increasing access while reducing costs.
Application: Human Mobility
We focus on the analysis of human mobility data for planning public and private transportation systems. In both cases, we make use of massive real-life GPS datasets obtained from subscribers to a pay-as-you-drive car insurance contract, under which the vehicle trajectories are periodically sent (through the GSM network) to a central server for anti-fraud and anti-theft purposes. The data has been donated to KDDLAB - UNIPI for research purposes by a leading company in Europe.