Sources with the BigML Dashboard

15 Takeaways

This chapter explains Sources in detail. Here’s a list of key points:

  • A source allows you to bring data to BigML.

  • BigML recognizes a variety of formats, protocols, and storages to create new sources.

  • A source stores an arbitrarily large collection of instances describing an Entity of interest you want to model.

  • BigML works best with data in a tabular format where each row represents an Instance of the entity you want to model, and each column represents a Field describing all the instances.

  • After you create your source in BigML, each field in your source is displayed as a row and each instance as a column. This is because, for high-dimensional data, the transposed layout provides better navigability (e.g., datasets with thousands of fields can be paginated more easily).

  • A source helps BigML to know how to parse your data so that the instances and field types can be correctly processed.

  • You can configure your source in multiple ways to ensure BigML parses every field correctly.

  • You can create sources from local files, remote files, or using an inline editor.

  • Uploading one non-archive file, or one archive file (tar or zip) containing only one file, will create a single source. Uploading an archive file (tar or zip) containing multiple files will create a composite source.

  • BigML supports sources in different formats, such as Table (CSV or JSON), Image, or Table+Image.

  • A source is open when it can be modified. When a source is used to create a dataset, it’s automatically closed.

  • A source can be cloned. A cloned source is created as an open source.

  • You can create sources using image files. BigML supports a wide range of image formats.

  • An archive (tar or zip) file containing more than one image will create an image composite source, which can be used to create datasets for machine learning.

  • If images are inside folders in the archive file (tar or zip), uploading the archive file will create an image composite source with an added label field, where the label of each image is the name of its innermost folder.

  • In the fields view, sources view, and images view of an image composite source, you can view its fields, component sources, and images, respectively. You can also select component sources or images to perform certain operations, including adding labels to images in an open image composite source.

  • When an image composite source is created, by default BigML extracts 234 features per image, representing its histogram of gradients. You can configure five sets of extracted image features in an open image composite source. You can also select one pre-trained convolutional neural network (CNN).

  • You can furnish your source with descriptive information (name, description, tags, and category) and do the same for every individual field (name, label, and description).

  • You can only assign a source to a specific Project.

  • You can permanently delete a source.

  • Figure 15.1 graphically represents the workflows a BigML source enables. A BigML source can be created from local, remote, cloud-stored, or inline data and can be used to create datasets.

\includegraphics[]{images/sources/sources-workflow}
Figure 15.1 Source workflow
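
The archive rules listed above (one file yields a single source, several files yield a composite source, and images in folders are labeled with their innermost folder name) can be illustrated with a small local sketch. This is not BigML's implementation and does not call BigML; the function names describe_archive and build_zip are hypothetical helpers introduced here, and the sketch only inspects a zip file in memory to show how the rules would apply.

```python
import io
import zipfile
from pathlib import PurePosixPath


def build_zip(paths):
    """Hypothetical helper: build an in-memory zip with empty files at `paths`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path in paths:
            archive.writestr(path, b"")
    return buffer.getvalue()


def describe_archive(zip_bytes):
    """Hypothetical helper: apply the takeaway rules to a zip archive.

    Returns (kind, labels), where kind is "single" or "composite" and
    labels maps each file to its innermost folder name (None at the root).
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Directory entries end with "/"; only real files are counted.
        files = [name for name in archive.namelist() if not name.endswith("/")]
    if not files:
        raise ValueError("archive contains no files")

    # One file -> single source; several files -> composite source.
    kind = "single" if len(files) == 1 else "composite"

    # The label is the innermost folder name; root-level files get no label.
    labels = {name: (PurePosixPath(name).parent.name or None) for name in files}
    return kind, labels


if __name__ == "__main__":
    kind, labels = describe_archive(build_zip(["pets/cat/1.jpg", "pets/dog/2.jpg"]))
    print(kind)    # composite
    print(labels)  # {'pets/cat/1.jpg': 'cat', 'pets/dog/2.jpg': 'dog'}
```

Note that only the innermost folder is used: in the example, the outer folder pets does not contribute to the label.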