Cluster Analysis with the BigML Dashboard

7.6 Descriptive Information

Descriptive information is what allows you to describe a centroid so you can find it later and easily recognize it among other centroids.

Each centroid has an associated name, description, category, and tags, each described briefly in the following subsections. Figure 7.32 shows the options that the More info panel provides for editing them.

Figure 7.32 Edit centroids

7.6.1 Name

If you do not specify a name for your centroid, BigML assigns a default name that depends on the type of prediction:

  • Single centroid: the name always follows the structure “Centroid for <objective field name>”.

  • Batch centroid: BigML combines your prediction dataset name and the cluster name: “Batch centroid for <cluster name> with <dataset name>”.
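
The two default naming patterns above can be sketched in a few lines of Python. This is only an illustration of the templates quoted in the list; the function names are hypothetical and are not part of BigML or its bindings.

```python
def default_single_centroid_name(objective_field_name: str) -> str:
    """Build the default name for a single centroid, per the template above."""
    return f"Centroid for {objective_field_name}"


def default_batch_centroid_name(cluster_name: str, dataset_name: str) -> str:
    """Build the default name for a batch centroid, per the template above."""
    return f"Batch centroid for {cluster_name} with {dataset_name}"


# Example values are made up for illustration.
single_name = default_single_centroid_name("species")
batch_name = default_batch_centroid_name("Iris cluster", "Iris test set")
```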

Centroid names are displayed in the list view and also on the top bar of a prediction view. Centroid names are indexed to be used in searches. You can rename your centroids at any time from the More info panel.

The name of a centroid cannot be longer than 256 characters. There is no restriction on the characters that can be used in a name. More than one centroid can have the same name, even within the same project, since each centroid is automatically assigned a unique internal identifier.
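
As a rough sketch of these rules, the following Python checks the 256-character limit before a rename and shows two centroids sharing one name while remaining distinct by identifier. The helper name and the identifiers are hypothetical placeholders, and the check is a local convenience rather than BigML's own validation.

```python
MAX_NAME_LENGTH = 256  # limit stated in the text above


def validate_centroid_name(name: str) -> str:
    """Return the name unchanged, or raise if it exceeds the length limit."""
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(
            f"name has {len(name)} characters; the maximum is {MAX_NAME_LENGTH}"
        )
    return name


# Names need not be unique: the (placeholder) identifiers keep the
# two centroids apart even though both carry the same name.
centroids_by_id = {
    "centroid/000000000000000000000001": validate_centroid_name("Iris centroid"),
    "centroid/000000000000000000000002": validate_centroid_name("Iris centroid"),
}
```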

7.6.2 Description

Each cluster prediction also has a description, which is very useful for documenting your Machine Learning projects. Centroids take their description from the cluster used to create them.

Descriptions can be written in plain text or in Markdown. BigML provides a simple Markdown editor that accepts a subset of Markdown syntax (see Figure 7.33).

Figure 7.33 Markdown editor for centroid descriptions

Descriptions cannot be longer than 8192 characters and can use almost any character.

7.6.3 Category

Each prediction has an associated category, taken from the cluster used to create it. Categories are useful for classifying predictions according to the domain your data comes from, which helps when you use BigML to solve problems across industries or for multiple customers.

A prediction category must be one of the categories listed in Table 7.1.

Table 7.1 Categories used by BigML to classify predictions

Category

Aerospace and Defense

Automotive, Engineering and Manufacturing

Banking and Finance

Chemical and Pharmaceutical

Consumer and Retail

Demographics and Surveys

Energy, Oil and Gas

Fraud and Crime

Healthcare

Higher Education and Scientific Research

Human Resources and Psychology

Insurance

Law and Order

Media, Marketing and Advertising

Miscellaneous

Physical, Earth and Life Sciences

Professional Services

Public Sector and Nonprofit

Sports and Games

Technology and Communications

Transportation and Logistics

Travel and Leisure

Uncategorized

Utilities
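
A minimal sketch of checking a category against Table 7.1 before assigning it is shown below. It works with the display names from the table only; how the BigML API represents categories internally is not covered here, and the function name is hypothetical.

```python
# Display names copied from Table 7.1.
CATEGORIES = frozenset({
    "Aerospace and Defense",
    "Automotive, Engineering and Manufacturing",
    "Banking and Finance",
    "Chemical and Pharmaceutical",
    "Consumer and Retail",
    "Demographics and Surveys",
    "Energy, Oil and Gas",
    "Fraud and Crime",
    "Healthcare",
    "Higher Education and Scientific Research",
    "Human Resources and Psychology",
    "Insurance",
    "Law and Order",
    "Media, Marketing and Advertising",
    "Miscellaneous",
    "Physical, Earth and Life Sciences",
    "Professional Services",
    "Public Sector and Nonprofit",
    "Sports and Games",
    "Technology and Communications",
    "Transportation and Logistics",
    "Travel and Leisure",
    "Uncategorized",
    "Utilities",
})


def validate_category(category: str) -> str:
    """Return the category unchanged, or raise if it is not in Table 7.1."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return category
```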

7.6.4 Tags

A prediction can also have a number of tags associated with it. Tags can help you retrieve the prediction via the BigML API or attach extra information to it. Your prediction inherits the tags from the cluster used to create it. Each tag is limited to a maximum of 128 characters. Each prediction can have up to 32 different tags.
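
The tag limits and the inheritance from the cluster can be sketched as follows. The function names are hypothetical, and dropping duplicate tags before counting is an assumption based on the phrase "32 different tags"; BigML's actual handling may differ.

```python
MAX_TAG_LENGTH = 128  # per-tag limit stated in the text
MAX_TAGS = 32         # per-prediction limit stated in the text


def validate_tags(tags):
    """Deduplicate tags (assumption), then enforce both limits."""
    unique = list(dict.fromkeys(tags))  # drops repeats, keeps first-seen order
    if len(unique) > MAX_TAGS:
        raise ValueError(f"{len(unique)} tags given; the maximum is {MAX_TAGS}")
    for tag in unique:
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(
                f"tag {tag[:20]!r}... has {len(tag)} characters; "
                f"the maximum is {MAX_TAG_LENGTH}"
            )
    return unique


def inherit_tags(cluster_tags, extra_tags=()):
    """Start from the cluster's tags and append any tags added afterwards."""
    return validate_tags([*cluster_tags, *extra_tags])


# Example values are made up for illustration.
prediction_tags = inherit_tags(["iris", "flowers"], ["iris", "q3"])
```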