Category

Content Discovery

Inputs

urls
file
required
List of urls
depth
string
Maximum crawling depth
header
string
Header(s) to include in HTTP requests
in-scope
file
List of URLs, paths, or regular expressions to include in crawling
rate-limit
string
Maximum number of requests to send per second per machine
header-file
file
Header(s) to include in HTTP requests
out-of-scope
file
List of URLs, paths, or regular expressions to exclude from crawling

Outputs

url-detailsurls

Discover Paths via Crawling

Description

Crawl a list of web server URLs to discover endpoints and form a comprehensive map of each asset on your attack surface.

Features

  • Supports headless browser crawling for more accurate spidering.
  • Parses JavaScript code to discover additional endpoints and hidden paths.
  • Can crawl thousands of web servers simulataneously.

Inputs

Required

  • urls: a list of URLs
https://foo.example.com

https://bar.example.com

Optional

  • depth: Maximum crawling depth (default: 5)
  • headless: Enable headless browser mode (default: false)
  • header: Header(s) to include in HTTP requests
  • header-file: File with header(s) to include in HTTP requests
  • rate-limit: Maximum number of requests to send per second per machine (default: 300)

Outputs

  • urls: List of discovered URLs.
https://foo.example.com/about

https://foo.example.com/login

https://bar.example.com/app.js

https://bar.example.com/admin
  • url-details: JSONLines records of URL discovery details.
{"url": "https://foo.example.com/about", "hostname": "foo.example.com", "domain_name": "example.com", "data_source": "crawling", "status_code": 200, "content_length": 9283}

{"url": "https://foo.example.com/login", "hostname": "foo.example.com", "domain_name": "example.com", "data_source": "crawling", "status_code": 200, "content_length": 2031}

{"url": "https://bar.example.com/app.js", "hostname": "bar.example.com", "domain_name": "example.com", "data_source": "crawling", "status_code": 200, "content_length": 4212}

{"url": "https://bar.example.com/admin", "hostname": "bar.example.com", "domain_name": "example.com", "data_source": "crawling", "status_code": 403, "content_length": 385}

Changelog

  • v1.0.0
    • Initial release
  • v1.1.0
    • Added header-file input