WhizzML Reference Manual

4.14 Resource workflow

Every resource in BigML is fully reproducible. By inspecting the attributes defined in its JSON and those of the resources it is derived from, we can reproduce the chain of steps that led to its creation. A summary of these create, update and get operations can be obtained using the resource-workflow procedure.

(resource-workflow res-id bool bool [mapoptions]) → map

The first argument should be the ID of the resource whose creation workflow is rebuilt. When the second argument is set to true, the workflow will not contain the name or, in the case of datasets, the range of rows, so that you can reuse it to process new data files with the same structure. When the third argument is set to true, only the last step of the workflow is rebuilt. The last, optional argument can contain a map whose parameters tweak the process.
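As a minimal sketch of the two boolean arguments, the calls below rebuild a reusable full workflow and a last-step-only workflow. The dataset ID is only a placeholder and the variable names are illustrative.

```whizzml
;; Placeholder ID: replace it with a resource from your own account.
(define dataset-id "dataset/5babb5bf92fb56105d001f32")

;; Full workflow adapted for reuse: the second argument (true) leaves out
;; names and dataset row ranges; the third (false) keeps every step.
(define reusable-wf (resource-workflow dataset-id true false))

;; Only the last step of the workflow, keeping names and row ranges.
(define last-step-wf (resource-workflow dataset-id false true))
```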

The options available so far are excluded-attrs, stop-res-ids, level and prediction-wf. Map excluded-attrs to a list of attributes to keep them out of the procedure output. This lets you avoid storing attributes that, although mandatory for tracing purposes, should be left out when retraining or sharing resources, such as the project a resource was created in. To stop the recursive process at a specific resource, map stop-res-ids to a list of IDs: when any of these resources is found in the chain of parent resources, the recursion stops. Alternatively, the chain can be stopped after a given number of steps by setting the level option to the number of steps back that you want to script. Finally, setting prediction-wf to true generates a workflow that only takes into account the transformations that the test datasets need in order to reproduce prediction resources, such as a batchprediction or a batchcentroid. The models involved in the prediction chain are stored, by their IDs, in the inputs attribute of the workflow.
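The options map could be used along the following lines. This is a sketch under assumptions: the option names are written as string keys, "project" is taken to be the attribute that holds the project, and all resource IDs are hypothetical placeholders.

```whizzml
;; Hypothetical IDs: replace them with resources from your own account.
(define dataset-id "dataset/5babb5bf92fb56105d001f32")

;; Leave the project out of the output and stop the recursion
;; as soon as the given source is reached in the chain of parents.
(define trimmed-wf
  (resource-workflow dataset-id
                     true
                     false
                     {"excluded-attrs" ["project"]
                      "stop-res-ids" ["source/5babb5bf92fb56105d001f2e"]}))

;; Alternatively, go back at most two steps in the chain.
(define shallow-wf
  (resource-workflow dataset-id true false {"level" 2}))

;; Only the test-data transformations behind a batch prediction.
(define prediction-wf
  (resource-workflow "batchprediction/5babb6a192fb56105d001f40"
                     true
                     false
                     {"prediction-wf" true}))
```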

The resulting map will contain the following attributes:

Workflow attribute	Description
steps	List of maps describing the information about the resource being operated on, and the operations applied
inputs	List of inputs that must be provided for the workflow to start, either IDs or remote URLs
output	ID of the resource being reified
name	Name of the resource being reified
description	Description of the workflow
last-step	Whether the workflow includes only the last step
reuse	Whether the workflow has been adapted for reuse
type-counters	Summary of the number of resources per type used in the workflow
mapped-ids	Map of the IDs in the workflow to the variables that represent them
var-ids	Map of the variable names in the workflow to the original IDs

Table 4.2 Workflow attributes
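The attributes of the resulting map can be read with the standard get procedure. The sketch below assumes that the attribute names are string keys, as in the step maps, and uses a placeholder dataset ID.

```whizzml
;; Placeholder ID: replace it with a resource from your own account.
(define wf (resource-workflow "dataset/5babb5bf92fb56105d001f32" true false))

;; List of step maps, and the inputs needed to start the workflow.
(define steps (get wf "steps"))
(define inputs (get wf "inputs"))

;; Number of steps that were rebuilt.
(define n-steps (count steps))

;; Whether the workflow has been adapted for reuse.
(define reusable? (get wf "reuse"))
```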

Each step also has a uniform structure:

Step attribute	Description
action	Type of action (create, update or get)
origin	The variable that contains the information used as origin when operating on the resource. It can also be a map with the origin attribute and the variable that contains its value
order	Number of the step in the workflow
output	Variable that will contain the generated output
args	Arguments used in the API call action. Contains both the origin information and the configuration attributes. Usually it’s a map, but when the action is get
ref	Map that contains the reference information for the resource being the output of the step, like its ID, name, name_options and creator

Table 4.3 Step attributes

An example of a step would be:

{"ref" {"id" "dataset/5babb5bf92fb56105d001f32"
        "name" "iris"
        "name_options" "150 instances, 5 fields (4 numeric, 1 text)"
        "creator" "demo_user"}
 "output" "dataset2"
 "action" "create"
 "origin" {"source" "source2"}
 "args" {"source" "source2"
         "objective_field" {"id" "000003"}
         "all_fields" false}
 "order" 6}
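Because every step shares this structure, the list of steps can be processed with ordinary list procedures. The following sketch keeps only the create steps and collects the variables that will hold their outputs. It assumes string keys as in the example step, and the dataset ID and variable names are placeholders.

```whizzml
;; Placeholder ID: replace it with a resource from your own account.
(define wf (resource-workflow "dataset/5babb5bf92fb56105d001f32" true false))

;; Keep only the steps whose action is "create".
(define create-steps
  (filter (lambda (step) (= "create" (get step "action")))
          (get wf "steps")))

;; Names of the variables that will contain the created resources,
;; such as "dataset2" in the example step.
(define created-vars
  (map (lambda (step) (get step "output")) create-steps))
```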