Cluster Analysis with the BigML Dashboard

2.4 Clusters with Images

Images are one of the most important categories of data, and their presence is ever increasing. It is estimated that more than 85% of all Internet traffic today is visual data. Research also indicates that 90% of the information transmitted to the human brain is visual. It is therefore very important to support and develop machine learning with images.

BigML extracts image features at the source level. Image features are sets of numeric fields computed for each image. They can capture parts or patterns of an image, such as edges, colors and textures. BigML also supports image features extracted by pre-trained CNNs, which capture more sophisticated features. Depending on the machine learning use case and goal, all of these image features can be effective in cluster analysis, as well as in other unsupervised and supervised models.

For more information about image features, please refer to the Image Analysis section of the Sources with the BigML Dashboard document [22].
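To make the idea of "a set of numeric fields per image" concrete, the following is a minimal sketch of one simple kind of feature: a normalized intensity histogram per color channel. It is an illustration only and is not BigML's feature extractor; the function name level_histogram, the bins parameter and the tiny example image are all hypothetical.

```python
def level_histogram(pixels, bins=4):
    """Return a normalized per-channel intensity histogram as a flat list.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    The result has 3 * bins numeric values, one "field" per bin.
    """
    counts = [0] * (3 * bins)
    total = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map an intensity in 0..255 to a bin index in 0..bins-1.
            index = min(value * bins // 256, bins - 1)
            counts[channel * bins + index] += 1
        total += 1
    if total == 0:
        return [0.0] * (3 * bins)
    # Dividing by the pixel count makes images of different sizes comparable.
    return [count / total for count in counts]


# A hypothetical 2x2 image: two pure red pixels, one black, one white.
image = [(255, 0, 0), (255, 0, 0), (0, 0, 0), (255, 255, 255)]
features = level_histogram(image, bins=4)
```

Here features is a list of 12 numbers (4 bins for each of the 3 channels). Each number would correspond to one numeric field in a dataset row, which is the shape of data a clustering algorithm can work with.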

\includegraphics[]{images/clusters/cluster-image-dataset-resnet18}
Figure 2.1 A dataset with images and image features

As shown in Figure 2.1, the example dataset has an image field, image_id. It also has image features extracted from the images referenced by image_id. Image feature fields are hidden by default to reduce clutter. To show them, click the “Click to show image features” icon, which is next to the “Search by name” box. In Figure 2.2, the example dataset has 512 image feature fields, extracted by a pre-trained CNN, ResNet-18.

\includegraphics[]{images/clusters/cluster-image-dataset-resnet18-fields}
Figure 2.2 A dataset with image feature fields shown

From image datasets like this, clusters can be created and configured using the steps described in the following chapters.
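As a rough illustration of what clustering such a dataset involves, the sketch below runs a plain k-means loop over synthetic 512-dimensional vectors that stand in for the ResNet-18 feature fields. It is not BigML's implementation and does not use the BigML API; the helper names, the farthest-first initialization, and the synthetic data (two groups of 10 rows drawn around 0.0 and 5.0) are assumptions made only for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=10):
    """Return (assignments, centroids) after a fixed number of iterations."""
    centroids = farthest_first_centroids(points, k)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its member rows.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Synthetic stand-ins for image feature rows (hypothetical data).
DIMENSIONS = 512
rng = random.Random(42)


def synthetic_features(center, count):
    """Generate `count` rows of DIMENSIONS values scattered around `center`."""
    return [
        [rng.gauss(center, 0.1) for _ in range(DIMENSIONS)]
        for _ in range(count)
    ]


rows = synthetic_features(0.0, 10) + synthetic_features(5.0, 10)
assignments, centroids = kmeans(rows, k=2)
```

Because the two synthetic groups are far apart, the first 10 rows end up in one cluster and the last 10 in the other. Real image features are rarely this cleanly separated, which is why the number of clusters and the other configuration options matter in practice.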