Association Discovery with the BigML Dashboard

1 Introduction

Some problems require finding meaningful relationships among two or more values in large datasets that contain thousands of distinct values, e.g., discovering which products customers buy together (market basket analysis), finding interesting web usage patterns, or detecting software intrusions. These problems can be solved using Association Discovery, a well-known unsupervised learning technique for finding relevant associations among values in high-dimensional datasets.
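To make the market basket case concrete, the following is a minimal sketch that counts how often pairs of items appear together in a handful of transactions. It is not the BigML algorithm: the `baskets` data and the `pair_support` helper are hypothetical names introduced only for illustration, and the fraction computed here is what is commonly called the support of an item pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]


def pair_support(transactions):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for basket in transactions:
        # Sort the items so each pair has a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


support = pair_support(baskets)

# List pairs from most to least frequent (ties broken alphabetically).
for pair, value in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, round(value, 2))
```

Exhaustive pair counting like this only works for tiny inputs. With thousands of distinct values the number of candidate combinations grows very quickly, which is why dedicated association discovery algorithms are needed.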

The BigML associations algorithm was acquired from Professor Geoff Webb (Monash University), a globally acknowledged expert who spent ten years developing association discovery in Magnum Opus. Read more about the BigML algorithm in Chapter 2.

Association Discovery (also called Association Mining) complements other Machine Learning techniques in two main ways:

  • It avoids the problems associated with model selection. Most Machine Learning techniques produce a single global model of the data. A problem with such a strategy is that there will often be many such models, all of which describe the available data equally well. A typical system chooses between these models arbitrarily, without necessarily notifying the user that the alternatives exist. However, while the system may have no reason to prefer one model over another, the user may. For example, two medical tests may be almost equally predictive in a given application. If so, the user is likely to prefer the model that uses the cheaper or less invasive test.

  • It finds models that are locally optimal. A single model that is globally optimal may be suboptimal in specific regions of the problem space. By seeking local models, association mining can find models that are optimal in any given region. If there is no need for a global model, locally optimized models may be more effective.

This chapter provides a comprehensive description of BigML associations, including how to create them with 1-click (Chapter 3), all the configuration options (Chapter 4), and the two visualizations provided by BigML, a network chart and a table (Chapter 5). BigML provides measures that rate each association; these are explained in section 2.1. There is also a section devoted to structuring your data (section 2.2), which is very useful for getting the best performance from your association model. You can also export your associations to a CSV file (section 9.1), move your associations to another project (Chapter 12), or delete them permanently (Chapter 14).

In BigML, the sixth tab on the main menu of your Dashboard lists all your available associations. For each association, the association list view (Figure 1.1) shows the dataset it was created from, the association’s Name, K (the number of rules found), Age (the time elapsed since it was created), and Size. The search option in the top right corner allows you to search your associations by name.

\includegraphics[]{images/assoc_list}
Figure 1.1 Associations list view

When you first create an account with BigML, or every time you start a new project, your list view for associations will be empty (see Figure 1.2).

\includegraphics[]{images/assoc_dashboard}
Figure 1.2 Empty Dashboard association view

Finally, Figure 1.3 shows the icon used to represent an association.

\includegraphics[width=2cm]{images/assoc-icon}
Figure 1.3 Associations icon