Bertossi, L. (2022). Declarative Approaches to Counterfactual Explanations for Classification. Theory Pract. Log. Program., Early Access.
Abstract: We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the model's outcome, the resulting counterfactual entities serve as a basis for defining and computing causality-based explanation scores, namely responsibility scores, for the feature values of the entity under classification. The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this study is on the specification and computation of best counterfactual entities, that is, those that lead to maximum responsibility scores. From them one can read off the explanations as maximum-responsibility feature values in the original entity. We also extend the programs to bring semantic or domain knowledge into the picture. We show how the approach could be extended by means of probabilistic methods, and how the underlying probability distributions could be modified through the use of constraints. Several examples of programs written in the syntax of the DLV ASP-solver, and run with it, are shown.
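For intuition about the responsibility score these programs maximize, a brute-force sketch in Python (not the ASP programs themselves; the toy loan classifier, feature domains, and entity are invented for illustration):

    from itertools import combinations, product

    def x_resp(classify, entity, domains, feature):
        # Responsibility of feature's value in entity: 1/(1+|G|), where G is a
        # smallest "contingency" set of other features whose values must first
        # be changed so that changing `feature` alone then flips the label.
        orig = classify(entity)
        others = [f for f in entity if f != feature]
        for k in range(len(others) + 1):            # try smallest G first
            for G in combinations(others, k):
                for w in product(*(domains[f] for f in G)):
                    e1 = dict(entity)
                    e1.update(zip(G, w))
                    if classify(e1) != orig:        # G alone must not flip the label
                        continue
                    for v in domains[feature]:
                        if v == entity[feature]:
                            continue
                        e2 = dict(e1)
                        e2[feature] = v
                        if classify(e2) != orig:    # changing the feature flips it
                            return 1.0 / (1 + k)
        return 0.0                                  # not a counterfactual cause

    # Invented toy black-box: approve iff income is high, or employed with low debt.
    classify = lambda e: int(e["income"] == "high"
                             or (e["employed"] == 1 and e["debt"] == "low"))
    domains = {"income": ["low", "high"], "employed": [0, 1], "debt": ["low", "high"]}
    entity = {"income": "low", "employed": 1, "debt": "low"}
    print(x_resp(classify, entity, domains, "employed"))   # 1.0: flipping it alone suffices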
|
Bertossi, L. (2021). Specifying and computing causes for query answers in databases via database repairs and repair-programs. Knowl. Inf. Syst., 63, 199–231.
Abstract: There is a recently established correspondence between database tuples as causes for query answers and tuple-based repairs of inconsistent databases with respect to denial constraints. In this work, answer-set programs that specify database repairs are used as a basis for solving computational and reasoning problems around causality in databases, including causal responsibility. Furthermore, causes are also introduced at the attribute level by appealing to an attribute-based repair semantics that uses null values. Corresponding repair-programs are introduced and used as a basis for computation and reasoning about attribute-level causes. The answer-set programs are extended to capture causality under integrity constraints.
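The tuple-level responsibility notion at play here can be computed by brute force on a small instance (a minimal Python sketch, not the repair-programs of the paper; the relations and the Boolean join query are invented):

    from itertools import combinations

    def responsibility(query, db, t):
        # Causal responsibility of tuple t for a (true) Boolean query answer:
        # 1/(1+|G|) for a smallest contingency set G of tuples such that the
        # query still holds on db - G but becomes false on db - G - {t}.
        rest = [u for u in db if u != t]
        for k in range(len(rest) + 1):
            for G in combinations(rest, k):
                reduced = set(db) - set(G)
                if query(reduced) and not query(reduced - {t}):
                    return 1.0 / (1 + k)
        return 0.0    # t is not an actual cause for the answer

    # Invented instance: R(a,1), R(b,2), S(1), S(2); query: exists x,y R(x,y) & S(y).
    db = {("R", "a", 1), ("R", "b", 2), ("S", 1), ("S", 2)}
    q = lambda d: any(r[2] == s[1] for r in d if r[0] == "R"
                                   for s in d if s[0] == "S")
    print(responsibility(q, db, ("R", "a", 1)))    # 0.5: ("R","b",2) must also go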
|
Bertossi, L., & Geerts, F. (2020). Data Quality and Explainable AI. ACM J. Data Inf. Qual., 12(2), 11.
Abstract: In this work, we provide some insights and develop some ideas, with few technical details, about the role of explanations in data quality in the context of data-based machine learning (ML) models. In this direction, there are, as expected, roles for causality and explainable artificial intelligence. The latter area sheds light not only on the models, but also on the data that support model construction. There is also room for defining, identifying, and explaining errors in data, in particular in ML, and also for suggesting repair actions. More generally, explanations can be used as a basis for defining dirty data in the context of ML, and for measuring or quantifying dirtiness. We think of dirtiness as relative to the ML task at hand, e.g., classification.
|
Bertossi, L., Li, J., Schleich, M., Suciu, D., & Vagena, Z. (2020). Causality-based Explanation of Classification Outcomes. In DEEM'20: Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning (Article 6, pp. 1–10).
|
Livshits, E., Bertossi, L., Kimelfeld, B., & Sebag, M. (2021). The Shapley Value of Tuples in Query Answering. Log. Methods Comput. Sci., 17(3), 22.
Abstract: We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory, and in many applications of game theory, for assessing the contribution of a player to a coalition game. It was established already in the 1950s, and is theoretically justified as the unique wealth-distribution measure that satisfies certain natural axioms. While this value has been investigated in several areas, it has received little attention in data management. We study this measure in the context of conjunctive and aggregate queries by defining corresponding coalition games. We provide algorithmic and complexity-theoretic results on the computation of Shapley-based contributions to query answers and, for the hard cases, we present approximation algorithms.
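For a concrete feel for the measure, the Shapley value of a tuple can be evaluated directly from its textbook definition on a tiny instance (a brute-force Python sketch under the simplifying assumption that all tuples are endogenous; the instance and query are invented):

    from itertools import combinations
    from math import factorial

    def shapley(query, db, t):
        # Shapley value of tuple t in the coalition game whose worth function
        # is the (0/1 or numeric) query answer on a set of tuples.
        others = [u for u in db if u != t]
        n = len(db)
        val = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                val += weight * (query(set(S) | {t}) - query(set(S)))
        return val

    # Invented instance: two disjoint joining pairs; query is 1 iff a pair is present.
    db = {("R", "a", 1), ("R", "b", 2), ("S", 1), ("S", 2)}
    q = lambda d: int(any(r[2] == s[1] for r in d if r[0] == "R"
                                       for s in d if s[0] == "S"))
    for t in sorted(db):
        print(t, round(shapley(q, db, t), 3))      # 0.25 each, by symmetry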
|
Livshits, E., Bertossi, L., Kimelfeld, B., & Sebag, M. (2021). Query Games in Databases. Sigmod Rec., 50(1), 78–85.
Abstract: Database tuples can be seen as players in the game of jointly realizing the answer to a query. Some tuples may contribute more than others to the outcome, which can be a binary value in the case of a Boolean query, a number for a numerical aggregate query, and so on. To quantify the contributions of tuples, we use the Shapley value that was introduced in cooperative game theory and has found applications in a plethora of domains. Specifically, the Shapley value of an individual tuple quantifies its contribution to the query. We investigate the applicability of the Shapley value in this setting, as well as the computational aspects of its calculation in terms of complexity, algorithms, and approximation.
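Since exact computation is intractable in the hard cases, one generic route to approximation is permutation sampling (a hedged Python sketch of the standard Monte Carlo estimator, not necessarily the specific approximation algorithms studied in the paper):

    import random

    def shapley_mc(query, db, t, samples=5000, seed=0):
        # Estimate t's Shapley value by sampling random orderings of the tuples
        # and averaging t's marginal contribution to the tuples preceding it.
        rng = random.Random(seed)
        tuples = list(db)
        total = 0.0
        for _ in range(samples):
            rng.shuffle(tuples)
            prefix = set()
            for u in tuples:
                if u == t:
                    break
                prefix.add(u)
            total += query(prefix | {t}) - query(prefix)
        return total / samples

On the four-tuple instance sketched above, this converges to 0.25 for every tuple; the estimator is unbiased, with error shrinking as the number of sampled permutations grows.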
|