Association Discovery with the BigML Dashboard

8.3 Association Set Score

Each predicted item has an associated score, which is used to rank the predicted items returned. (See Figure 8.10.) The score measures the similarity between the left-hand side of each discovered rule, a.k.a. the Antecedent, and the input data.

Figure 8.10 Score for predicted items

The score uses cosine similarity to measure the overlap between the input data of the association set and the antecedent of each association rule:

\[ sim(inputs, antecedent) = \frac{|inputs \cap antecedent|}{\sqrt{|inputs|}\sqrt{|antecedent|}} \]
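A minimal Python sketch of this similarity, assuming both the input data and the antecedent are represented as sets of items and counting only the items they share (so disjoint itemsets score zero). The helper name `cosine_similarity` is hypothetical and not part of any BigML API.

```python
import math

def cosine_similarity(inputs, antecedent):
    """Cosine similarity between two itemsets viewed as binary vectors:
    |inputs ∩ antecedent| / (sqrt(|inputs|) * sqrt(|antecedent|))."""
    inputs, antecedent = set(inputs), set(antecedent)
    shared = len(inputs & antecedent)
    if shared == 0:
        return 0.0  # no common items (also covers empty sets, avoiding division by zero)
    # sqrt(a) * sqrt(b) == sqrt(a * b); a single sqrt avoids floating-point drift
    return shared / math.sqrt(len(inputs) * len(antecedent))

print(cosine_similarity(["oranges", "bananas"], ["oranges", "bananas"]))  # 1.0
print(cosine_similarity(["oranges", "bananas"], ["pears"]))  # 0.0
```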

If the rule’s antecedent does not contain any of the input items, the score will be zero. If the rule’s antecedent contains at least one item from the ones given in the input data, the score will be greater than zero. If the antecedent matches the input items exactly, then it will yield the maximum similarity score, which is one. For example, if we have the following rules:

  • Rule 0: \([pears] \rightarrow [kiwis]\)

  • Rule 1: \([bananas, pears] \rightarrow [kiwis]\)

  • Rule 2: \([oranges, bananas] \rightarrow [peaches]\)

  • Rule 3: \([oranges] \rightarrow [apples]\)

Given the input itemset \([oranges, bananas]\), Rule 0 will have a score of zero because its antecedent shares no items with the input, while Rule 2 will yield a score of one since its antecedent matches the input data exactly. Rule 1 and Rule 3 will have scores between zero and one because they match the input only partially.
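Counting the shared items \(|inputs \cap antecedent|\) for each rule, with \(|inputs| = 2\), gives the following scores:

\[
\begin{aligned}
\text{Rule 0: } sim(inputs, [pears]) &= \frac{0}{\sqrt{2}\sqrt{1}} = 0 \\
\text{Rule 1: } sim(inputs, [bananas, pears]) &= \frac{1}{\sqrt{2}\sqrt{2}} = 0.5 \\
\text{Rule 2: } sim(inputs, [oranges, bananas]) &= \frac{2}{\sqrt{2}\sqrt{2}} = 1 \\
\text{Rule 3: } sim(inputs, [oranges]) &= \frac{1}{\sqrt{2}\sqrt{1}} \approx 0.707
\end{aligned}
\]

Rule 3 scores higher than Rule 1 even though each shares one item with the input, because Rule 3's single-item antecedent is fully matched, while only half of Rule 1's antecedent is.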

This similarity score is then multiplied by the selected rule measure to produce a similarity-weighted score. You can select any of the measures explained in section 2.1 to weight the score: Coverage, Support, Confidence, Leverage or Lift. (See Figure 8.11.) By default, BigML uses the same measure that was used to create the association (see section 4.3).
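Written out, and using \(m(r)\) as our own shorthand for the value of the selected measure on rule \(r\), the similarity-weighted score of a rule is:

\[ score(r) = sim(inputs, antecedent_r) \times m(r) \]

For instance, if Rule 3 from the example above had a Confidence of 0.8 (a hypothetical value chosen only for illustration), its weighted score would be \(0.707 \times 0.8 \approx 0.566\).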

Figure 8.11 Association set scoring measure

For each rule with a non-zero score, its Consequent is added to the prediction, as long as it is not already contained in the input set. If a consequent is predicted by multiple rules, its score is the sum of the individual rules' scores.
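The scoring steps described in this section can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not BigML's implementation: `predict` and the `(antecedent, consequent, measure)` tuple layout are hypothetical, the Confidence values attached to the example rules are made up, and each consequent is assumed to be a list of items.

```python
import math
from collections import defaultdict

def cosine_similarity(inputs, antecedent):
    # |inputs ∩ antecedent| / sqrt(|inputs| * |antecedent|)
    shared = len(inputs & antecedent)
    if shared == 0:
        return 0.0
    return shared / math.sqrt(len(inputs) * len(antecedent))

def predict(inputs, rules):
    """Score consequents for an input itemset (hypothetical helper).

    rules: iterable of (antecedent, consequent, measure) tuples, where
    measure is the value of the chosen rule measure (e.g. Confidence).
    Returns (item, score) pairs sorted from highest to lowest score.
    """
    inputs = set(inputs)
    scores = defaultdict(float)
    for antecedent, consequent, measure in rules:
        # Similarity-weighted score of this rule
        rule_score = cosine_similarity(inputs, set(antecedent)) * measure
        if rule_score == 0:
            continue  # only rules with a non-zero score contribute
        for item in consequent:
            if item not in inputs:  # skip items already in the input set
                scores[item] += rule_score  # sum over all rules predicting it
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# The example rules from this section, with hypothetical Confidence values
rules = [
    (["pears"], ["kiwis"], 0.6),
    (["bananas", "pears"], ["kiwis"], 0.9),
    (["oranges", "bananas"], ["peaches"], 0.5),
    (["oranges"], ["apples"], 0.8),
]
for item, score in predict(["oranges", "bananas"], rules):
    print(item, round(score, 3))
# apples 0.566
# peaches 0.5
# kiwis 0.45
```

With these made-up values, apples (from Rule 3) outranks peaches (from Rule 2) even though Rule 2's antecedent matches the input exactly, because the weighting measure also enters the final score.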

For further reading about the association set score, refer to this paper.