Sources with the BigML Dashboard


4.4 Table+Image Composite Sources

Machine learning with images often requires labels for the images, especially in image classification applications. In many scenarios, images and their labels are prepared separately, so they appear in separate files. For instance, there may be a collection of images while their labels are in a CSV or JSON file.

Besides labels, CSV or JSON files can provide other information about the images, such as captions, comments, geo-coordinates, etc.

In this context, CSV or JSON files provide information about the images in a table format, so here we call these files table files.

To accommodate this common practice of using a separate table file for image labels and other information, BigML provides two solutions. In the first one, users upload the images and the table file separately, and then import all label fields in the table file to the image composite source. This is covered in subsection 4.3.4.

The second solution is to use composite sources of the “Table+Image” format. As the name implies, the data has two parts: a collection of images and a table file, which is a CSV or JSON file. The ultimate goal of this format is to create datasets that include the images and the fields of the CSV. When a JSON file is used, the goal is to create datasets that include the images and the lists or dictionaries of the JSON.

By using “Table+Image” composite sources, users don’t have to upload the images and the table file separately. They don’t need to perform the importing operation, and they can create datasets directly from “Table+Image” composite sources.

As described in section 4.2, a composite source can be created by uploading an archived collection of files. When the archive contains a set of images and a table file, the resulting composite source will have the format “Table+Image”.

The components of a composite source created from images and a table file have different fields: the image sources have only image fields, while the CSV or JSON source has other fields, including labels.

Strictly speaking, such a “Table+Image” composite source is heterogeneous, meaning that not all component sources have the same fields, so it could be classified as the “mixed” format (section 2.2). Instead, BigML recognizes that this is a table plus images, with the CSV or JSON providing the tabular data, and essentially sets the fields of the composite source to those of the CSV or JSON. Additionally, the auto-generated fields of the features extracted from the images are attached to each row (subsection 4.3.2).

The component source from the CSV or JSON in the “Table+Image” composite source is also called the table component. It is expected that one or more columns of the table component refer to an image. Those columns become fields in the composite source with the optype path, and they contain the (relative) file name of the corresponding image, as extracted from the zip file index. BigML tries to discover which fields in the table component refer to images using the following heuristics:

The field is named “file”, “filename”, “file name”, “path” or “image”, possibly followed by punctuation (/, -, _, or blank) and a number (e.g. “path 3”, “image/2”).
The preview of the field contains values also found in the preview of the filenames extracted from the images.

In the rare case that BigML does not recognize the path field properly, users can go to the source configuration and update the optype of the intended field to path.
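
The two heuristics above can be approximated in a few lines of Python. This is only an illustrative sketch, not BigML's actual implementation: the regular expression, the case-insensitive matching, and the function names (looks_like_path_name, shares_values_with_images, guess_path_fields) are all assumptions made for the example.

```python
import re

# Base names come from the text above; the exact pattern (optional single
# separator, then digits, case-insensitive) is an assumption of this sketch.
_PATH_NAME = re.compile(
    r"^(file|filename|file name|path|image)(?:[/\-_ ]?\d+)?$", re.IGNORECASE
)


def looks_like_path_name(field_name):
    """Heuristic 1: the name is a known base name, optionally numbered."""
    return bool(_PATH_NAME.match(field_name.strip()))


def shares_values_with_images(preview_values, image_filenames):
    """Heuristic 2: a previewed value is also an extracted image file name."""
    names = set(image_filenames)
    return any(value.strip() in names for value in preview_values)


def guess_path_fields(table_preview, image_filenames):
    """Return the columns flagged by either heuristic.

    table_preview maps each column name to a list of previewed string values.
    """
    return [
        name
        for name, values in table_preview.items()
        if looks_like_path_name(name)
        or shares_values_with_images(values, image_filenames)
    ]
```

For example, calling guess_path_fields on a preview with the columns “image” and “label” from the example below would flag only “image”.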

Here is a simple example, which is a zip file containing 6 files:

Archive:  images.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       78  08-18-2020 17:07   label.csv
    15980  08-18-2020 17:03   img03.jpg
    51886  08-18-2020 17:03   img02.jpg
    38361  08-18-2020 17:03   img01.jpg
    17700  08-18-2020 17:03   img05.jpg
    55856  08-18-2020 17:03   img04.jpg
---------                     -------
   179861                     6 files

There are 5 image files and a CSV file, which looks like:

image, label
img01.jpg, a
img02.jpg, b
img03.jpg, b
img04.jpg, c
img05.jpg, a
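
An archive with this layout can be assembled with Python's standard library. The following is a minimal sketch: the helper name build_table_image_zip and its parameters are hypothetical, and the CSV is written without the blank after each comma.

```python
import csv
import io
import os
import zipfile


def build_table_image_zip(zip_path, image_paths, labels, table_name="label.csv"):
    """Write a zip holding a CSV table plus the images it refers to.

    image_paths: paths of the image files on disk.
    labels: dict mapping each archived file name (e.g. "img01.jpg") to a label.
    """
    # Build the table file in memory, one row per labeled image.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image", "label"])
    for name in sorted(labels):
        writer.writerow([name, labels[name]])

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(table_name, buffer.getvalue())
        for path in image_paths:
            # Store each image under its base name so it matches the CSV column.
            archive.write(path, arcname=os.path.basename(path))
```

Storing the images under their base names keeps the values in the “image” column identical to the file names in the zip index, which is what the path field is matched against.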

Once the zip file is uploaded to the BigML Dashboard, a “Table+Image” composite source is created:

Figure 4.43: A composite source in Table+Image format

As seen in Figure 4.43 above, the composite source contains not only the image path, image ID and extracted image features as fields, but also the “label” field from the CSV.

The uploading of the zip file above and the subsequent creation of its dataset is essentially equivalent to the following operations combined:

Upload the CSV and create a source;
Upload the images and create an image composite source;
Create datasets from the two sources, respectively;
Perform a dataset join on the column “image” in the dataset created from the CSV and the field “image” in the image dataset.
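
The join in the last step can be illustrated with plain Python dictionaries. This is a conceptual sketch only, assuming an inner join; the image rows and their “image_id” values are placeholders, and a real image dataset would also carry the extracted image features.

```python
import csv
import io


def join_on_image(table_rows, image_rows, key="image"):
    """Inner-join table rows with image rows on a shared file-name key."""
    images_by_name = {row[key]: row for row in image_rows}
    joined = []
    for row in table_rows:
        match = images_by_name.get(row[key])
        if match is not None:
            # Merge the image fields with the table fields for this file.
            joined.append({**match, **row})
    return joined


csv_text = "image, label\nimg01.jpg, a\nimg02.jpg, b\n"
# skipinitialspace drops the blank that follows each comma in the example CSV.
table_rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))

# Placeholder rows standing in for the image dataset.
image_rows = [
    {"image": "img01.jpg", "image_id": "id-1"},
    {"image": "img02.jpg", "image_id": "id-2"},
    {"image": "img03.jpg", "image_id": "id-3"},
]

joined = join_on_image(table_rows, image_rows)
```

Each joined row carries both the image fields and the “label” field, which mirrors the fields of the “Table+Image” composite source.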

4.4.1 Views of Table+Image Composite Sources

Just like “Image” composite sources, “Table+Image” composite sources also have three views.

The fields view lists all the fields in the source: the fields from the images, which include at least the image field and the path field, as well as the fields from the table source, such as a CSV.

Figure 4.44: The fields view of a Table+Image composite source

As seen above, the fields view shows the image field and the path field from the images, and the categorical field called “label” from the table source.

Optionally, users can click on the “show image features” icon next to the search box to show all image features in the view.

The sources view lists all component sources, including the table component, which is a CSV file in the example below.

Figure 4.45: The sources view of a Table+Image composite source

The images view of a “Table+Image” composite source is different from that of an “Image” composite source. In the images view of an “Image” composite source, users can preview images and add or edit labels. In the images view of a “Table+Image” composite source, users can only preview images.

Figure 4.46: The images view of a Table+Image composite source

4.4.2 Convert Table+Image Composites to Editable Image Composites

Users can convert a “Table+Image” composite source to an “Image” composite source, which is editable. Under any view of a “Table+Image” composite source, mouse over the cloud action icon to the right of the source title, then click on the menu item CONVERT TO EDITABLE COMPOSITE.

Figure 4.47: Convert a Table+Image composite source to an editable Image composite source

A new “Image” composite source is then created. Its default title is the original title with “editable” added.

Figure 4.48: A new Image composite source after conversion

In the new “Image” composite source, all fields from the table source become label fields. It contains all image sources as its component sources, and the total number of component sources is reduced by 1 compared to the original “Table+Image” composite source, because the table source is gone.

Figure 4.49: The images view after conversion

In the images view of the converted composite source, the images can be previewed with pagination and the label fields can also be edited.