# Entropy


## Accurate Causal Inference on Discrete Data

by Budhathoki, Kailash; Vreeken, Jilles, MPI Saarland

Published in: Proceedings – IEEE International Conference on Data Mining, ICDM, 2018; http://eda.mmci.uni-saarland.de/pubs/2018/acid-budhathoki,vreeken.pdf

## Problem setting

We are given two discrete random variables $X, Y$ with finite domains $\mathcal{X}$ and $\mathcal{Y}$, and samples drawn from their joint distribution $P_{X,Y}$. Assuming that there is no common cause of $X$ and $Y$ (no confounder), we want to decide whether $X$ is a cause of $Y$ (denoted $X \rightarrow Y$) or vice versa. The paper assumes that the data are generated by an additive noise model (ANM), for example $X := N_X$, $Y := f(X) + N_Y$, with $N_Y \perp X$.
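As an illustration, discrete ANM data of this form can be sampled as follows. The domains, the mechanism $f$, and the noise distribution are arbitrary choices for demonstration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# X := N_X, uniform on {0, ..., 4} (arbitrary cause distribution)
X = rng.integers(0, 5, size=n)

# deterministic mechanism f (arbitrary illustrative choice)
def f(x):
    return (2 * x) % 7

# additive noise N_Y, drawn independently of X
N_Y = rng.choice([-1, 0, 1], size=n, p=[0.2, 0.6, 0.2])

# Y := f(X) + N_Y
Y = f(X) + N_Y
```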

It is well known that ANMs in the discrete setting are identifiable in all but a few degenerate cases: $P_{X,Y}$ then admits an ANM from $X$ to $Y$ but not in the reverse direction, or vice versa.

The general idea for deciding whether $X$ causes $Y$ or vice versa is to fit an ANM in both directions and choose as causal the direction with the "best" independence between the residual noise and the putative cause, i.e. between $N_Y$ and $X$, or between $N_X$ and $Y$. The paper uses the Shannon entropy $H(X) := -\sum_{x \in \mathcal{X}} P(X=x) \log P(X=x)$ as its measure. Since $P(Y \mid X) = P(N_Y \mid X)$, the total entropy of a sample under the direction $X \rightarrow Y$ is $H(X) + H(N_Y \mid X)$. The central statement of the paper is: if $P_{X,Y}$ is induced by an ANM from $X$ to $Y$, then $H_{X \rightarrow Y} := H(X) + H(N_Y) < H(Y) + H(N_X) =: H_{Y \rightarrow X}$.
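A minimal sketch of this decision rule, using empirical plug-in entropies and a simple per-value mode regression as a stand-in for the paper's entropy-minimizing discrete regression (the function names and the toy data-generating process below are my own illustrative choices, not from the paper):

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy (in bits) of a discrete sample."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mode_regression(cause, effect):
    """f(x) := most frequent effect value per cause value -- a crude
    stand-in for the paper's entropy-minimizing discrete regression."""
    f = {}
    for x in set(cause):
        ys = [y for x2, y in zip(cause, effect) if x2 == x]
        f[x] = Counter(ys).most_common(1)[0][0]
    return f

def total_entropy(cause, effect):
    """H(cause) + H(residual noise) for the direction cause -> effect."""
    f = mode_regression(cause, effect)
    residuals = [y - f[x] for x, y in zip(cause, effect)]
    return entropy(cause) + entropy(residuals)

# Toy ANM data with true direction X -> Y (arbitrary illustrative choices)
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=2000)
Y = (2 * X) % 7 + rng.choice([-1, 0, 1], size=2000, p=[0.2, 0.6, 0.2])

# Decision rule: the direction with the smaller total entropy is inferred causal.
print("X -> Y" if total_entropy(X, Y) < total_entropy(Y, X) else "Y -> X")
```

On data that actually follows an ANM from $X$ to $Y$, the forward total entropy comes out smaller, so the rule recovers the true direction.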

## Solution

An algorithm called ACID (https://github.molgen.mpg.de/EDA/cisc) is proposed that minimizes the entropy of the noise term. ACID performs a discrete regression of $Y$ on $X$; it converges, but can get stuck in local minima. The algorithm is a heuristic, since the straightforward exact version would be intractable: its computational complexity is $O(|\mathcal{Y}|^{|\mathcal{X}|})$. For domains of size less than $40$, the runtime is in an acceptable range.
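To see why the exact version is intractable: a brute-force search would have to score all $|\mathcal{Y}|^{|\mathcal{X}|}$ candidate regression functions $f: \mathcal{X} \rightarrow \mathcal{Y}$. A back-of-the-envelope illustration (not code from the paper):

```python
# Number of candidate regression functions f: X -> Y that an exact
# (brute-force) search would have to score.
def num_functions(x_size, y_size):
    return y_size ** x_size

print(num_functions(3, 3))    # 27 -- trivially enumerable
print(num_functions(40, 40))  # ~10^64 -- hopeless for brute force
```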

## Applications and Experimental Results

ACID can be applied to discrete data samples (and, with reasonable preprocessing, to continuous data as well). It was tested on synthetic data generated from artificial ANMs with several different families of distributions, and on a few selected real-world datasets. Since ACID is designed for data coming from an ANM, it performs very well on the synthetic data, and it is also quite accurate on the (few) selected real-world datasets. However, on other test sets (multiplicative noise, more complex real-world data), the overall performance of ACID is not very good compared to other methods (around 50–60% accuracy). The conclusion of the authors is therefore somewhat misleading, biased by the good results on the synthetic data.

## Conclusion

The paper provides a heuristic algorithm for discrete regression between two discrete random variables, theoretically founded on the information-theoretic fact that data generated by an ANM with $X \rightarrow Y$ have smaller total entropy in that direction than in the reverse one. Its assumptions on the model and the data are quite strong, so the impact for practical applications is not too high.

Relevance: relevant for discrete data samples (small domain sizes), strong assumptions

Impact: low to medium for practical applications.

Level of difficulty: easy (basic knowledge about statistics and probability theory)