Sources with the BigML Dashboard

8.7 Cloud Storages

You can create BigML Sources by importing data from your cloud storage accounts. Because cloud storage is so widely used, BigML lets you configure your cloud storage providers on the Dashboard.

8.7.1 Configuring Cloud Storages

BigML allows you to configure the following cloud storage providers at https://bigml.com/account/cloudstorages (see Figure 8.14):

  • Google Cloud Storage

  • Google Drive

  • Dropbox

  • Microsoft Azure Marketplace

Figure 8.14 Configuration Panel of Cloud Storages

Once you enable a cloud storage provider, a new menu option appears in the source list view, where you can use a widget to browse that storage and locate your source (see Figure 8.15).

Figure 8.15 Menu options to create a source from cloud storages

To use any of these cloud storage providers, you must first grant BigML access to it or provide your credentials. You can revoke that access or disable the new menu options at any time.

8.7.2 Dropbox

If you have an OAuth token for a Dropbox file, you can request its download as a source using the dropbox scheme. The URI has no host, and the token is passed in the query string:

dropbox:/path/to/file.csv?token=adfwdfda_weke23423_fheh324sxke33
Figure 8.16 Dropbox URL template

For instance, for the file iris.csv at the root of your Dropbox you could use:

dropbox:/iris.csv?token=adfwdfda_weke23423_fheh324sxke33
Figure 8.17 Example of a Dropbox URL

For the same file inside a csv folder, the URI would be:

dropbox:/csv/iris.csv?token=adfwdfda_weke23423_fheh324sxke33
Figure 8.18 Example of a Dropbox URL using a folder in the path
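
The Dropbox URI pattern above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name dropbox_uri is hypothetical, and percent-encoding unusual characters in the path is an assumption based on general URI practice rather than something the Dropbox scheme description states.

```python
from urllib.parse import quote, urlencode


def dropbox_uri(path: str, token: str) -> str:
    """Build a host-less dropbox: URI (hypothetical helper, not a BigML API)."""
    # Normalize to exactly one leading slash, since the URI has no host part.
    clean_path = "/" + path.lstrip("/")
    # Assumption: percent-encode unsafe characters but keep "/" as separator.
    encoded_path = quote(clean_path, safe="/")
    # The OAuth token travels in the query string as the "token" parameter.
    return "dropbox:" + encoded_path + "?" + urlencode({"token": token})


if __name__ == "__main__":
    # Reproduces the folder example shown above.
    print(dropbox_uri("csv/iris.csv", "adfwdfda_weke23423_fheh324sxke33"))
```

Called with the path iris.csv or csv/iris.csv and the sample token, the helper yields the same strings as Figures 8.17 and 8.18.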

8.7.3 Google Cloud Storage

Remote sources can use the gcs scheme to specify any file stored in a Google Cloud Storage bucket. For publicly shared files, no other parameter is needed. For example, if iris.csv is in the customerdata folder of the bigml bucket, use:

gcs://bigml/customerdata/iris.csv
Figure 8.19 Example of a Google Cloud Storage URL

If the file is protected and you have an OAuth2 access token that has not yet expired, specify it via the token query string parameter:

gcs://bigml/test.csv?token=ya29.ygCrfy3xq1Bg5eIPMlIPUUqzEvOnC0kIXPdI
Figure 8.20 Example of a Google Cloud Storage URL using OAuth2

If you also have a refresh token, your client identifier, and your application secret, you can pass them alongside the token using the query string parameters refresh-token, client-id, and app-secret, respectively. BigML will then refresh the token whenever it has expired.
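
The three variants of a gcs URI described in this section (public file, access token only, token plus refresh credentials) can be sketched with one helper. This is an illustrative sketch, not part of any BigML library: the function name gcs_uri and its argument names are hypothetical, while the query parameter names are the ones given above. Leaving "/" unescaped in parameter values is a choice made here, not a documented requirement.

```python
from urllib.parse import quote, urlencode


def gcs_uri(bucket, object_path, token=None, refresh_token=None,
            client_id=None, app_secret=None):
    """Build a gcs:// URI (hypothetical helper, not a BigML API)."""
    # Bucket acts as the host; the object path follows it.
    uri = "gcs://" + bucket + "/" + quote(object_path.lstrip("/"), safe="/")
    # Query parameter names as listed in this section.
    candidates = {
        "token": token,
        "refresh-token": refresh_token,
        "client-id": client_id,
        "app-secret": app_secret,
    }
    # Keep only the parameters that were actually supplied.
    params = {name: value for name, value in candidates.items()
              if value is not None}
    if params:
        # safe="/" leaves slashes in values unescaped (a choice made here).
        uri += "?" + urlencode(params, safe="/")
    return uri


if __name__ == "__main__":
    print(gcs_uri("bigml", "customerdata/iris.csv"))
    print(gcs_uri("bigml", "test.csv",
                  token="ya29.ygCrfy3xq1Bg5eIPMlIPUUqzEvOnC0kIXPdI"))
```

With no credentials the helper returns the public form from Figure 8.19; with only token it returns the form from Figure 8.20.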

8.7.4 Google Drive

Remote sources using the gdrive scheme refer to files stored in Google Drive (GDrive). The URI has no host, so it usually starts with gdrive:///, and its only path component is the file-id of the required file, as provided by the Google Drive service.

Access to GDrive files is granted via OAuth2, so you also need a client ID, an app secret, a token, and a refresh token to access the file. In general, a gdrive URI looks like:

gdrive:///<file-id>?token=<..>&refresh-token=<..>&app-secret=<..>&client-id=<...>
Figure 8.21 Template of a Google Drive URL

For example:

gdrive:///0BxGbAMhJezOScTFBUVFPMy1xT1E?token=ya29.AQHpyxUssLrU7Gy4oEsUjqyVmPJSPDuZKSc_ze3_Q8_l4miBDJPfOxnqkGC2vPH01savQVGt7oqSg-w&refresh-token=1/x6zd8Wjy__yk437S7AxZ5Yy7ZVXjKRME8TUE-Xh06ro&client-id=00723478965317-07gjg5o912o1v422hhlkf2rmif7m3no6.apps.googleusercontent.com&app-secret=AvbIGURFindytojt2342HQWTm4h
Figure 8.22 Example of a Google Drive URL
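
Because a gdrive URI carries four OAuth2 parameters, it can be useful to check one before submitting it. The sketch below parses a URI with the Python standard library and reports which of the parameters named in the template are missing. The function name check_gdrive_uri and the placeholder values in the usage lines are hypothetical; the check reflects only the structure described in this section, not any validation BigML itself performs.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters named in the gdrive URI template.
REQUIRED_PARAMS = ("token", "refresh-token", "client-id", "app-secret")


def check_gdrive_uri(uri):
    """Return (file_id, missing_params) for a gdrive URI (hypothetical helper)."""
    parts = urlsplit(uri)
    # The scheme must be gdrive and the URI must not name a host.
    if parts.scheme != "gdrive" or parts.netloc:
        raise ValueError("expected a host-less gdrive:/// URI")
    # The only path component is the file-id.
    file_id = parts.path.lstrip("/")
    if not file_id or "/" in file_id:
        raise ValueError("path must consist of a single file-id")
    # parse_qs drops parameters with empty values, so they count as missing.
    supplied = parse_qs(parts.query)
    missing = [name for name in REQUIRED_PARAMS if name not in supplied]
    return file_id, missing


if __name__ == "__main__":
    # Placeholder values only; real ones come from the Google Drive service.
    file_id, missing = check_gdrive_uri("gdrive:///FILE_ID?token=TOKEN")
    print(file_id, missing)
```

An empty list of missing parameters means the URI has the full shape shown in Figure 8.21.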