WhizzML Reference Manual

4.14 Resource workflow

Every resource in BigML is fully reproducible. By inspecting the attributes defined in its JSON and those of the resources it’s derived from, we can reconstruct the chain of steps that led to its creation. A summary of these create, update and get operations can be obtained using the resource-workflow procedure.

(resource-workflow res-id bool bool [mapoptions]) → map

The first argument should be the ID of the resource whose creation workflow is to be rebuilt. When the second argument is set to true, the workflow will not contain the name or, in the case of datasets, the range of rows, so that you can reuse it to process new data files with the same structure. When the third argument is set to true, only the last step of the workflow is rebuilt. The last, optional argument is a map whose entries can be used to tweak the process.
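The three positional arguments can be sketched as follows. This is only an illustration: the dataset ID is a placeholder and the variable names are hypothetical, so any existing resource ID in your account should be used instead.

```whizzml
;; Placeholder ID, used only for illustration
(define dataset-id "dataset/5babb5bf92fb56105d001f32")

;; Full workflow: names and row ranges are kept (second argument false)
;; and every step is traced back (third argument false)
(define full-wf (resource-workflow dataset-id false false))

;; Reusable workflow: the name and the range of rows are left out
(define reusable-wf (resource-workflow dataset-id true false))

;; Only the last step that led to the resource
(define last-step-wf (resource-workflow dataset-id false true))
```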

The options available so far are excluded-attrs, stop-res-ids, level and prediction-wf. The excluded-attrs option is mapped to a list of attributes that will be left out of the procedure output. It lets you avoid storing attributes that, even if they are mandatory for tracing purposes, should be dropped when retraining or sharing resources, like the project a resource has been created in. To stop the recursive process at some specific resource, use stop-res-ids mapped to a list of IDs: when any of these resources is found in the chain of parent resources, the recursive call stops. Alternatively, the chain can be stopped after a number of steps using the level option, set to the number of steps back that you want to script. Finally, setting prediction-wf to true generates a workflow that only takes into account the transformations needed for the test datasets to reproduce prediction resources, like a batchprediction or a batchcentroid. The models involved in the prediction chain are stored in the inputs attribute of the workflow using their IDs.
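A minimal sketch of how the options map might be passed is shown next. All IDs are hypothetical placeholders, and "project" is assumed here to be the attribute name that holds the project of a resource.

```whizzml
;; Hypothetical IDs, used only for illustration
(define dataset-id "dataset/5babb5bf92fb56105d001f32")
(define source-id "source/5babb5bf92fb56105d001f30")
(define batch-prediction-id "batchprediction/5babb5bf92fb56105d001f40")

;; Leave the (assumed) project attribute out of the output and stop
;; the recursion when the given source is reached
(define wf-stopped
  (resource-workflow dataset-id
                     false
                     false
                     {"excluded-attrs" ["project"]
                      "stop-res-ids" [source-id]}))

;; Alternatively, go back only two steps from the resource
(define wf-two-levels
  (resource-workflow dataset-id false false {"level" 2}))

;; Workflow restricted to the test data transformations that are
;; needed to reproduce a batch prediction
(define wf-prediction
  (resource-workflow batch-prediction-id false false {"prediction-wf" true}))
```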

The resulting map will contain the following attributes:

| Workflow attribute | Description |
|---|---|
| steps | List of maps describing the information about the resource being operated on, and the operations applied |
| inputs | List of inputs that must be provided for the workflow to start, either IDs or remote URLs |
| output | ID of the resource being reified |
| name | Name of the resource being reified |
| description | Description of the workflow |
| last-step | Whether the workflow includes only the last step |
| reuse | Whether the workflow has been adapted for reuse |
| type-counters | Summary of the number of resources per type used in the workflow |
| mapped-ids | Map of the IDs in the workflow to the variables that represent them |
| var-ids | Map of the variable names in the workflow to the original IDs |

Table 4.2 Workflow attributes
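As an illustration of how the attributes of the resulting map could be read, the following sketch logs a few of them. The dataset ID is a placeholder and the variable names are hypothetical.

```whizzml
;; Placeholder ID, used only for illustration
(define dataset-id "dataset/5babb5bf92fb56105d001f32")
(define wf (resource-workflow dataset-id false false))

;; Attributes are read from the resulting map by their key
(define steps (get wf "steps"))
(define inputs (get wf "inputs"))

(log-info "Number of steps: " (count steps))
(log-info "Inputs needed to start the workflow: " inputs)
(log-info "Resources per type: " (get wf "type-counters"))
```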

Each step also has a uniform structure:

| Step attribute | Description |
|---|---|
| action | Type of action (create, update or get) |
| origin | The variable that contains the information used as origin when operating on the resource. It can also be a map with the origin attribute and the variable that contains its value |
| order | Number of the step in the workflow |
| output | Variable that will contain the generated output |
| args | Arguments used in the API call action. Contains both the origin information and the configuration attributes. Usually it’s a map, except when the action is get |
| ref | Map that contains the reference information for the resource that is the output of the step, like its ID, name, name_options and creator |

Table 4.3 Step attributes

An example of a step would be:

{"ref"
   {"id" "dataset/5babb5bf92fb56105d001f32"
    "name" "iris"
    "name_options" "150 instances, 5 fields (4 numeric, 1 text)"
    "creator" "demo_user"}
   "output" "dataset2"
   "action" "create"
   "origin" {"source" "source2"}
   "args"
   {"source" "source2" "objective_field" {"id" "000003"} "all_fields" false}
   "order" 6}
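Since every step shares this structure, the list of steps can be processed with the usual list procedures. The helpers below are a hypothetical sketch, not part of the library: they assume wf is a map returned by resource-workflow.

```whizzml
;; Hypothetical helper: keep only the steps whose action is "create"
(define (create-steps wf)
  (filter (lambda (step) (= "create" (get step "action")))
          (get wf "steps")))

;; Hypothetical helper: variable names that hold the output of each step
(define (step-outputs wf)
  (map (lambda (step) (get step "output"))
       (get wf "steps")))

;; Hypothetical helper: IDs of the resources referenced by each step,
;; read from the nested "ref" map
(define (step-resource-ids wf)
  (map (lambda (step) (get-in step ["ref" "id"]))
       (get wf "steps")))
```

For the step shown above, the output variable would be "dataset2" and the referenced ID "dataset/5babb5bf92fb56105d001f32".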