gospider
Basic Usage Examples
Crawl One Site
Pass a site (e.g. http://trickest.io) to the site-to-crawl input (type string).
![screenshot of gospider node connected on the left with input domain node in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236834/docs/gospider/gospider_single_site.png)
Run gospider against a single site
Crawl Multiple Sites
Pass a file listing the sites to crawl, one per line, to the sites-to-crawl input (type file):
https://trickest.com
http://trickest.io
![screenshot of gospider node connected on the left with the input file domain node in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236833/docs/gospider/gospider_multiple_site.png)
Run gospider against multiple sites
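As a rough sketch, a sites file in the format shown above could be generated with a few lines of Python. The file name `sites.txt` and the helper `write_sites_file` are hypothetical, and the check that every entry carries an `http` or `https` scheme is an assumption drawn from the example entries rather than a documented requirement.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_sites_file(sites, path="sites.txt"):
    """Write one URL per line, rejecting entries without an http(s) scheme."""
    for site in sites:
        # Assumption: each entry should be a full URL, as in the example file.
        if urlparse(site).scheme not in ("http", "https"):
            raise ValueError(f"missing http(s) scheme: {site!r}")
    out = Path(path)
    out.write_text("\n".join(sites) + "\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    # Hypothetical usage: produces a file ready for the sites-to-crawl input.
    print(write_sites_file(["https://trickest.com", "http://trickest.io"]))
```

The resulting file can then be supplied to the sites-to-crawl input.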
Improvements
Number of Concurrent Requests
By default, gospider allows at most 5 concurrent requests per site. To change this limit, pass the desired number of concurrent requests to the number-of-concurent-req input (type string).
![screenshot of gospider node connected on the left with two input nodes in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236833/docs/gospider/gospider_concurent_req.png)
Recursion Depth
To limit the recursion depth of visited sites, pass the desired maximum depth to the max-recursion-depth input (type string).
![screenshot of gospider node connected on the left with three input nodes in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236833/docs/gospider/gospider_recursion_depth.png)
Multi-threaded Crawl
To crawl multiple sites in parallel, pass the desired number of threads to the threads input (type string).
![screenshot of gospider node connected on the left with four input nodes in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236834/docs/gospider/gospider_threads.png)
Use Special User-Agent
To set a custom User-Agent, pass the desired mobile or web User-Agent value to the user-agent input (type string).
![screenshot of gospider node connected on the left with two input nodes in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236834/docs/gospider/gospider_user_agent.png)
Include Subdomains
Use the include-subdomains boolean input to include subdomains crawled from the given sites.
![screenshot of gospider node connected on the left with two input nodes in the workflow editor](https://res.cloudinary.com/db14crach/image/upload/v1700236833/docs/gospider/gospider_include_subdomains.png)
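For readers more familiar with the gospider command line, the sketch below shows how the inputs described on this page might translate into a single command. The mapping of each node input to a flag (`-s`, `-S`, `-c`, `-d`, `-t`, `-u`, `--subs`) is an assumption based on gospider's own help output, not something this page confirms, and `build_gospider_command` is a hypothetical helper. It only builds and prints the command; it does not run gospider.

```python
import shlex


def build_gospider_command(site=None, sites_file=None, concurrent=None,
                           depth=None, threads=None, user_agent=None,
                           include_subdomains=False):
    """Build a gospider argv list from node-style inputs (assumed flag mapping)."""
    if (site is None) == (sites_file is None):
        raise ValueError("provide exactly one of site or sites_file")
    cmd = ["gospider"]
    # site-to-crawl is assumed to map to -s, sites-to-crawl to -S.
    cmd += ["-s", site] if site is not None else ["-S", sites_file]
    # Optional value inputs and the flags they are assumed to map to.
    for flag, value in (("-c", concurrent), ("-d", depth),
                        ("-t", threads), ("-u", user_agent)):
        if value is not None:
            cmd += [flag, str(value)]
    if include_subdomains:
        cmd.append("--subs")  # assumed counterpart of include-subdomains
    return cmd


if __name__ == "__main__":
    cmd = build_gospider_command(sites_file="sites.txt", concurrent=10,
                                 depth=2, threads=4, include_subdomains=True)
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps each argument separate, so values such as a User-Agent containing spaces would not need manual quoting if the list were later passed to a process launcher.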