Appendix

Mathematical Notations

Notation	Definition
$\mathbb{R}$	Set of real numbers, i.e., any finite numerical value on the continuous number line.
$\mathbb{N}$	Natural numbers, i.e., any integer greater than 0.
$O$	Object space, i.e., a set of real-world objects.
$\phi$	Feature map, i.e., a map that defines the values of the features for objects.
$\mathcal{F}$	Feature space, i.e., the values of all features. Often $\mathcal{F} = \mathbb{R}^d$, i.e., the $d$-dimensional real space. In this case there are $d \in \mathbb{N}$ features.
$X$ (clustering, classification, regression)	Used for the instances of objects in the feature space. Depending on the context, $X$ is either a set of instances, i.e., $X = \{x_1, ..., x_n\} \subseteq \mathcal{F}$, or a random variable. In the latter case, the set consists of $n$ realizations of this random variable.
$Y$ (clustering, classification, regression)	Used for the value of interest, e.g., the classes for classification or the dependent variable in regression. Defined either as a set $Y= \{y_1, ..., y_n\}$ or a random variable (see $X$).
$I$	Finite set of items $I = \{i_1, ..., i_m\}$.
$T$	Finite set of transactions $T=\{t_1, ..., t_n\}$ where $t_i \subseteq I$ for $i=1, ..., n$.
$X$ (association rules)	Antecedent of an association rule.
$Y$ (association rules)	Consequent of an association rule.
$X \Rightarrow Y$	Association rule where $Y$ is a consequent of $X$.
${n \choose k}$	The binomial coefficient ${n \choose k} = \frac{n!}{(n-k)!k!}$.
$\mathcal{P}(I)$	The power set of a finite set $I$.
$\vert \cdot \vert$	Cardinality of a set, e.g., $\vert X \vert$ for the number of elements of $X$.
$d(x,y)$	Distance between two vectors $x$ and $y$, e.g., Euclidean distance, Manhattan distance, or Chebyshev distance.
$argmin_{i=1,...,k} f(i)$	The value of $i$ for which the function $f$ is minimized.
$argmin_{i \in \{1, ..., k\}} f(i)$	Same as $argmin_{i=1,...,k} f(i)$.
$\min_{i=1,...,k} f(i)$	The minimal value of the function $f$ for any value of $i$.
$\min_{i \in \{1, ..., k\}} f(i)$	Same as $\min_{i=1,...,k} f(i)$.
$argmax$	See $argmin$.
$\max$	See $\min$.
$\sim$	Used to define the distribution of a random variable, e.g., $X \sim \mathcal{N}(\mu, \sigma)$ to specify that $X$ is normally distributed with mean value $\mu$ and standard deviation $\sigma$.
$C$ (classification)	Set of classes.
$C$ (clustering)	Description of a cluster.
$h$	Hypothesis, concept, classifier, classification model.
$h^*$	Target concept.
$h'_c$	Score-based hypothesis that computes the scores for the class $c$.
$P(X=x)$	Probability that the random variable $X$ is realized by the value $x$.
$p(x)$	$p(x) = P(X=x)$ for a random variable $X$.
$P(X \vert Y)$	Conditional probability of the random variable $X$ given the random variable $Y$.
$H(X)$	Entropy of the random variable $X$.
$H(X \vert Y)$	Conditional entropy of the random variable $X$ given the random variable $Y$.
$I(X; Y)$	Information gain about one of the variables $X$ and $Y$ if the other variable is known. Also known as mutual information.
$e_x$	Residual of a regression.
$x_t$	Values of a time series $\{x_1, ..., x_T\} = \{x_t\}_{t=1}^T$.
$T_t$	Trend term of a time series.
$S_t$	Seasonal term of a time series.
$R_t$	Autoregressive term of a time series.
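
Some of the notations above may be easier to read next to a small computation. The following is a minimal sketch in Python using only the standard library. The function names, the example values, and the choice of the base-2 logarithm for the entropy are assumptions made for illustration and are not fixed by the table.

```python
import math
from itertools import combinations

# d(x, y): three common distances between two vectors x and y.
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
d_euclid = euclidean(x, y)
d_manh = manhattan(x, y)
d_cheb = chebyshev(x, y)

# min vs. argmin for i = 1, ..., k: f maps each index i to the value f(i).
f = {1: 2.5, 2: 0.5, 3: 1.0}
min_value = min(f.values())   # the minimal value of f
argmin_i = min(f, key=f.get)  # the index i at which f is minimal

# Binomial coefficient and power set of a finite item set I.
items = ["a", "b", "c"]
n_pairs = math.comb(len(items), 2)
power_set = [set(c) for r in range(len(items) + 1)
             for c in combinations(items, r)]

# H(X): entropy of a discrete distribution (assumption: logarithm to base 2).
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

h_fair_coin = entropy([0.5, 0.5])
```

Here, `min_value` is the smallest value that $f$ takes, while `argmin_i` is the index at which that value occurs, and `len(power_set)` equals $2^{\vert I \vert}$.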