Association Discovery with the BigML Dashboard

2.1 Association Measures

This section gives the exact formulas used to compute the BigML association measures. Consider the association rule \((A \to C)\), where \(A\) is the antecedent itemset and \(C\) is the consequent, over a dataset \(D\) containing \(N\) instances in total. The measures used by BigML associations are defined as follows:

Support: the proportion of instances in the dataset that contain an itemset.

\[ Support (itemset) = \frac{|\{\, instance \in D : itemset \subseteq instance \,\}|}{N} \]

\[ Support (A \to C) = Support (A \cup C) \]
Coverage: the support of the antecedent of an association rule, i.e., the proportion of instances in the dataset that contain the antecedent itemset. It measures how often a rule can be applied.

\[ Coverage (A \to C) = Support (A) \]
Confidence (or Strength): the proportion of instances containing the antecedent that also contain the consequent. It is computed as the support of the association rule divided by the coverage of the rule, i.e., the support of the antecedent.

\[ Confidence (A \to C) = \frac{Support (A \to C)}{Support (A)} \]
Leverage: the difference between the support of the rule and the support that would be expected if the antecedent and consequent were statistically independent.

\[ Leverage (A \to C) = Support (A \to C) - (Support (A)\times Support(C)) \]
Lift: how many times more often antecedent and consequent occur together than expected if they were statistically independent.

\[ Lift (A \to C) = \frac{Support (A \to C)}{Support (A)\times Support(C)} \]
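
The definitions above can be checked by hand on a small dataset. The following is a minimal Python sketch, not BigML's implementation: the function names `support` and `rule_measures` and the toy grocery instances are hypothetical and chosen only for illustration. It uses exact fractions so the results can be compared directly with the formulas, and it assumes the antecedent and the consequent each occur in at least one instance (otherwise the divisions are undefined).

```python
from fractions import Fraction


def support(itemset, instances):
    """Proportion of instances that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for instance in instances if itemset <= set(instance))
    return Fraction(hits, len(instances))


def rule_measures(antecedent, consequent, instances):
    """Compute the five measures for the rule antecedent -> consequent.

    Assumes Support(A) > 0 and Support(C) > 0; otherwise confidence and
    lift raise ZeroDivisionError.
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    supp_a = support(a, instances)
    supp_c = support(c, instances)
    supp_rule = support(a | c, instances)  # Support(A -> C) = Support(A u C)
    return {
        "support": supp_rule,
        "coverage": supp_a,
        "confidence": supp_rule / supp_a,
        "leverage": supp_rule - supp_a * supp_c,
        "lift": supp_rule / (supp_a * supp_c),
    }


# Hypothetical toy dataset: each instance is a set of items (N = 5).
instances = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

measures = rule_measures({"bread"}, {"milk"}, instances)
for name, value in measures.items():
    print(f"{name}: {value} ({float(value):.4f})")
```

For the rule bread → milk in this toy data, the antecedent and the consequent each appear in 4 of 5 instances and together in 3 of 5, so support is 3/5, coverage is 4/5, and confidence is 3/4. Leverage is 3/5 − 16/25 = −1/25 and lift is 15/16, both indicating that the two items co-occur slightly less often than independence would predict.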