How Do Machines Work?
Workflow execution and machines
There are two steps in assigning machines to a workflow execution:
- Assigning the machine type for each workflow node,
- Assigning the number of machines for a workflow execution in order to achieve parallelism.
Assigning the machine type to a node
As shown in Executing a Workflow and Scheduling a Workflow, before creating a Run, you should check and set the right machine type for each workflow node (tool, script, and splitter). A virtual machine of the selected type will be dedicated to that node's execution.
Default machine type
When building a workflow from scratch, every node is assigned a small machine by default. If the workflow is copied from the Library, the machine configuration is the one set by the workflow author.
Tool Failed due to "No enough resources"
If a node (tool or script) fails in a Run with the error message "No enough resources", go back to your workflow's Builder, assign a larger machine type to that node, and execute the workflow again.
The machine type should be chosen based on the complexity of both the node and the workflow as a whole. If the tool or script performs a heavy-load task, a more powerful machine should be assigned.
There are two key questions to ask when choosing the optimal machine type:
- Does the node need to be executed faster?
- Does the node consume or work with a lot of data?
If the answer to either question is yes, you should probably give this node a more powerful machine.
Here is a benchmark you can use:
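The two questions above can be read as a simple rule of thumb. The following is a minimal sketch, not platform code: the function name `suggest_machine_type` and the "move up one size" step are assumptions made for illustration, and only the size names (small, medium, large) come from this page.

```python
# Machine sizes as named on this page, ordered from least to most powerful.
MACHINE_SIZES = ["small", "medium", "large"]


def suggest_machine_type(current: str, needs_speed: bool, handles_lots_of_data: bool) -> str:
    """Suggest a machine size for a node (hypothetical helper).

    Assumption: a "yes" to either question moves the node up one size,
    capped at the largest size. The platform itself does not do this for you.
    """
    index = MACHINE_SIZES.index(current)
    if needs_speed or handles_lots_of_data:
        index = min(index + 1, len(MACHINE_SIZES) - 1)
    return MACHINE_SIZES[index]


# A script on the default small machine that works with a lot of data.
suggestion = suggest_machine_type("small", needs_speed=False, handles_lots_of_data=True)
```

In practice you would still set the suggested type yourself in the Builder; the sketch only makes the decision rule explicit.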
| Node Type | Node Purpose                  | Machine Type |
| --------- | ----------------------------- | ------------ |
| Script    | Merging files into one        | Small        |
| Script    | Grep strings in file          | Small        |
| Tool      | Bruteforcing with 20 threads  | Small        |
| Script    | Zip all screenshots in folder | Medium       |
| Tool      | Bruteforcing with 100 threads | Medium       |
| Tool      | Resolve 100k of subdomains    | Medium       |
| Tool      | Bruteforcing with 200 threads | Large        |
| Tool      | Screenshot 30k subdomains     | Large        |
| Tool      | Spider url recursively        | Large        |
Assigning the number of machines to a run
As shown in Executing a Workflow and Scheduling a Workflow, in the final step of creating a run, you need to determine the number of machines dedicated to the run. There are two cases:
- The workflow is a single branch: nodes are executed consecutively (one after another). There's no reason to assign more than one machine of a specific type.
- The workflow contains parallel nodes: at least two nodes are set in parallel and can execute at the same time. In this case, you can assign multiple machines of a specific type to achieve faster, parallel execution.
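To get a feel for which of the two cases a workflow falls into, you can estimate how many nodes could run at the same time. The sketch below is an illustration under stated assumptions, not part of the platform: the workflow is described as a hypothetical dictionary mapping each node name to the nodes it depends on, it contains no cycles, and nodes are grouped into levels by their longest dependency chain. The widest level is a rough estimate of useful parallelism, not an exact figure.

```python
from collections import Counter
from functools import lru_cache


def max_parallel_nodes(dependencies: dict[str, list[str]]) -> int:
    """Estimate how many nodes can run at once (hypothetical helper).

    `dependencies` maps every node name to the list of nodes it waits for.
    Assumption: the workflow is acyclic and every node appears as a key.
    """

    @lru_cache(maxsize=None)
    def level(node: str) -> int:
        # A node with no dependencies is at level 0; otherwise it sits
        # one level below its deepest dependency.
        parents = dependencies.get(node, [])
        return 0 if not parents else 1 + max(level(parent) for parent in parents)

    # Count nodes per level and return the size of the widest level.
    counts = Counter(level(node) for node in dependencies)
    return max(counts.values(), default=0)


# One branch: a -> b -> c, so nodes run one after another.
one_branch = {"a": [], "b": ["a"], "c": ["b"]}

# Parallel nodes: b and c both depend only on a, then d waits for both.
with_parallel_nodes = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

single_branch_width = max_parallel_nodes(one_branch)
parallel_width = max_parallel_nodes(with_parallel_nodes)
```

A width of 1 corresponds to the first case, where one machine per type is enough. A larger width suggests that assigning more machines of the relevant type could shorten the run.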
Machines are reserved for the duration of the run
All machines assigned to a specific run are turned on and occupied from the run's beginning until its end, no matter what the workflow looks like. We are working on optimising this mechanism.
Parallelism and splitter
For more about determining the number of machines when a workflow contains a splitter, see the Splitter Nodes & Max Machines section.
Splitter Nodes & Max Machines
Workflows containing splitter nodes have no restriction on the number of machines needed to execute them. The reason is that a tool connected to a splitter can run as multiple parallel tasks, so you can choose how many of those tasks you want to execute at the same time.
Imagine that you have a single URL that you want to brute-force. The tool is executed on one machine, and no additional machines are needed to scale it.
Now, imagine a list of URLs that you want to brute-force with the same tool. This is a great case for using the splitter because it will multiply that tool and run it on multiple machines in parallel.
The tool is executed once for each line of the input, which means that it could process up to 500 URLs. How fast it finishes depends on the number of machines assigned to the workflow execution. A workflow like this has no restriction on how many machines it requires, because it can be executed on a single machine or on the maximum number of machines in your Fleet.
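The effect of the machine count on a splitter can be illustrated with a little arithmetic. This is a simplified model, not a description of the scheduler: it assumes each machine processes one input line at a time and that every line takes roughly the same time. The function name `splitter_rounds` is hypothetical.

```python
import math


def splitter_rounds(input_lines: int, machines: int) -> int:
    """Number of sequential rounds needed to process all lines (simplified model).

    Assumption: one line per machine at a time, with equal duration per line.
    """
    if machines < 1:
        raise ValueError("at least one machine is required")
    return math.ceil(input_lines / machines)


# 500 URLs on a single machine: every URL is processed one after another.
rounds_with_one_machine = splitter_rounds(500, 1)

# The same 500 URLs spread over 10 machines.
rounds_with_ten_machines = splitter_rounds(500, 10)
```

Under this model, going from one machine to ten cuts the number of sequential rounds from 500 to 50, which is why the run time of a splitter workflow depends so strongly on the machines assigned to it.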