Sources with the BigML Dashboard

1.2 Creating a first source

Figure 1.6 shows an example of a source in ML-ready format. Each row represents a user of a cell phone service and each column is an attribute of that user. The data is structured to predict whether a user will cancel her account (Churn?) given her current plan (Plan), the number of minutes used last month (Talk), the number of text messages sent last month (Text), the number of applications purchased last month (Purchases), the number of megabytes of data consumed last month (Data), and the current age of the user (Age). The source is a CSV (Comma-Separated Values) file and is therefore in the right format to be processed by BigML.

Plan, Talk, Text, Purchases, Data, Age, Churn?
family, 148, 72, 0, 33.6, 50, TRUE
business, 85, 66, 0, 26.6, 31, FALSE
business, 83, 64, 0, 23.3, 32, TRUE
individual, 9, 66, 94, 28.1, 21, FALSE
family, 15, 0, 0, 35.3, 29, FALSE
individual, 66, 72, 175, 25.8, 51, TRUE
business, 0, 0, 0, 30, 32, TRUE
family, 18, 84, 230, 45.8, 31, TRUE
individual, 71, 110, 240, 45.4, 54, TRUE
family, 59, 64, 0, 27.4, 40, FALSE
Figure 1.6 An example of a CSV file
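
To check locally that a file has this row-per-instance, column-per-field shape before uploading it, a minimal sketch with Python's standard csv module could look like the following. The data is inlined from Figure 1.6 so the example is self-contained. The variable names are illustrative only, and this is not how BigML itself parses the file.

```python
import csv
import io

# Contents of the file in Figure 1.6, inlined for a self-contained example.
CSV_TEXT = """Plan, Talk, Text, Purchases, Data, Age, Churn?
family, 148, 72, 0, 33.6, 50, TRUE
business, 85, 66, 0, 26.6, 31, FALSE
business, 83, 64, 0, 23.3, 32, TRUE
individual, 9, 66, 94, 28.1, 21, FALSE
family, 15, 0, 0, 35.3, 29, FALSE
individual, 66, 72, 175, 25.8, 51, TRUE
business, 0, 0, 0, 30, 32, TRUE
family, 18, 84, 230, 45.8, 31, TRUE
individual, 71, 110, 240, 45.4, 54, TRUE
family, 59, 64, 0, 27.4, 40, FALSE
"""

# skipinitialspace=True ignores the blank that follows each comma.
reader = csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True)
rows = list(reader)

print(reader.fieldnames)           # the seven field names from the header row
print(len(rows), "instances")      # one dict per user
print(rows[0]["Plan"], rows[0]["Churn?"])
```

Every row should yield the same seven keys. A row with a missing or extra value is a sign that the file needs cleaning before it is uploaded.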

To bring the source in Figure 1.6 to BigML, you can just drag and drop the file containing it on top of the BigML Dashboard. You can also paste its content into the BigML inline editor (see Chapter 9). A new source will then appear in the source list view (see Figure 1.7).

\includegraphics[width=\textwidth ]{images/sources/dashboard-with-example}
Figure 1.7 Source list view with a first source on it

BigML automatically assigns each source a unique identifier, “source/id”, where id is a string of 24 alphanumeric characters, e.g., “source/570c9ae884622c5ecb008cb6”. This ID can be used to retrieve and refer to the source both via the BigML Dashboard and the BigML API.
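
The identifier format described above can be checked with a regular expression. The sketch below assumes only what this section states, that is, the literal prefix "source/" followed by 24 alphanumeric characters. The helper name is_source_id is hypothetical and not part of any BigML library.

```python
import re

# "source/" followed by exactly 24 alphanumeric characters, as described above.
SOURCE_ID_PATTERN = re.compile(r"source/[A-Za-z0-9]{24}")


def is_source_id(value: str) -> bool:
    """Return True if value has the shape of a source identifier."""
    return SOURCE_ID_PATTERN.fullmatch(value) is not None


print(is_source_id("source/570c9ae884622c5ecb008cb6"))  # True
print(is_source_id("source/570c9ae8"))                   # False: too short
```

A check like this only validates the shape of the string. It cannot tell whether a source with that ID actually exists in your account.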

Once you click on the newly created source, you will arrive at a new page whose URL matches the assigned ID. You will see that BigML has parsed the source and automatically identified the type of each of its seven fields, as shown in Figure 1.8.
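
To give an intuition for what identifying a field's type involves, here is a deliberately naive sketch. It is an assumption made for illustration: a field is called numeric if every value parses as a number, and categorical otherwise. BigML's actual type detection is not described in this section and should not be inferred from this code.

```python
def guess_field_type(values):
    """Naive guess: 'numeric' if every value parses as a float, else 'categorical'."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return "categorical"
    return "numeric"


# A few values per field, taken from Figure 1.6.
columns = {
    "Plan": ["family", "business", "individual"],
    "Talk": ["148", "85", "9"],
    "Data": ["33.6", "26.6", "28.1"],
    "Churn?": ["TRUE", "FALSE", "FALSE"],
}

for name, values in columns.items():
    print(name, guess_field_type(values))
```

Under this simplification, Plan and Churn? come out as categorical, and Talk and Data as numeric.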

\includegraphics[]{images/sources/bigml-source-example}
Figure 1.8 A source view

Note: In a source view, BigML transposes rows and columns compared to your original data (compare Figure 1.6 and Figure 1.8). That is, each row is associated with one of the fields of your original data, and each column shows the corresponding values of an instance. When sources contain hundreds or thousands of fields, this arrangement makes them much easier to navigate in a web browser. A source view only shows the first 25 instances of your data. The main goal of this view is to help you quickly identify whether BigML is parsing your data correctly.
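
The transposition described in this note can be reproduced on a small scale. The sketch below is an illustration of the layout only, not BigML's implementation. It takes a few instances from Figure 1.6, keeps at most the first 25, and turns each field into a row. The name PREVIEW_LIMIT is hypothetical.

```python
header = ["Plan", "Talk", "Text", "Purchases", "Data", "Age", "Churn?"]
instances = [
    ["family", "148", "72", "0", "33.6", "50", "TRUE"],
    ["business", "85", "66", "0", "26.6", "31", "FALSE"],
    ["business", "83", "64", "0", "23.3", "32", "TRUE"],
]

# The source view shows only the first 25 instances.
PREVIEW_LIMIT = 25
preview = instances[:PREVIEW_LIMIT]

# zip(*preview) groups the values by field, so each field becomes one row
# and each instance becomes one column.
transposed = [[name, *values] for name, values in zip(header, zip(*preview))]

for row in transposed:
    print("\t".join(row))
```

With seven fields and three instances, the output has seven rows, each holding a field name followed by three values.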