waymore
Find way more from the Wayback Machine
Details
Category: Recon
Publisher: trickest-mhmdiaa
Created Date: 7/1/2022
Container: quay.io/trickest/waymore:v4.3
Source URL: https://github.com/xnl-h4ck3r/waymore
Parameters
mode
string
required
Command:
-mode
- The mode to run: U (retrieve URLs only), R (download Responses only) or B (Both). If -i is a domain only, then -mode will default to B. If -i is a domain with path then -mode will default to R.
input
file
required
Command:
--input
- The list of domains to find links for. Each entry can be a domain only, or a domain with a specific path. If it is a domain only, to get everything for that domain, don't prefix it with www. You can also specify a TLD only by prefixing it with a period, e.g. .mil, which will get all subs for all domains with that TLD (NOTE: The Alien Vault OTX source is excluded if searching for a TLD because it requires a full domain).
limit
string
Command:
--limit
- How many responses will be saved (if -b is not passed). A positive value will get the first N results, a negative value will get the last N results. A value of 0 will get ALL responses (default: 5000).
config
file
Command:
--config
- Path to the YML config file.
no-subs
boolean
Command:
--no-subs
- Don't include subdomains of the target domain (only used if input is not a domain with a specific path).
retries
string
Command:
--retries
- The number of retries for requests that get a connection error or are rate limited (default: 1).
timeout
string
Command:
--timeout
- For archived responses only, how many seconds to wait for the server to send data before giving up (default: 30).
to-date
string
Command:
--to-date
- What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. 2021, 202112, etc.
verbose
boolean
Command:
--verbose
- Verbose output.
from-date
string
Command:
--from-date
- What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. 2016, 201805, etc.
processes
string
Command:
--processes
- The number of processes (threads) used (default: 1).
check-only
boolean
Command:
--check-only
- This will make a few minimal requests to show you how many requests it would take, and roughly how long, to get URLs from the sources and download responses from Wayback Machine.
regex-after
string
Command:
--regex-after
- RegEx for filtering purposes against links found from archive.org/commoncrawl.org AND responses downloaded. Only positive matches will be output.
input-domain
string
required
Command:
--input
- The target domain to find links for. This can be a domain only, or a domain with a specific path. If it is a domain only, to get everything for that domain, don't prefix it with www. You can also specify a TLD only by prefixing it with a period, e.g. .mil, which will get all subs for all domains with that TLD (NOTE: The Alien Vault OTX source is excluded if searching for a TLD because it requires a full domain).
url-filename
boolean
Command:
-url-filename
- Set the file name of downloaded responses to the URL that generated the response; otherwise it will be set to the hash value of the response. Using the hash value means multiple URLs that generated the same response will only result in one file being saved for that response.
keywords-only
boolean
Command:
--keywords-only
- Only return links and responses that contain keywords that you are interested in. This can reduce the time it takes to get results. Keywords are given in the config.yml file with the FILTER_KEYWORDS key.
limit-requests
string
Command:
--limit-requests
- Limit the number of requests that will be made when getting links from a source (this doesn't apply to Common Crawl). Some targets need a huge number of requests that is just not feasible to make, so this can be used to manage that situation. This defaults to 0 (zero), which means there is no limit.
notify-discord
boolean
Command:
--notify-discord
- Whether to send a notification to Discord when waymore completes. It requires WEBHOOK_DISCORD to be provided in the config.yml file.
capture-interval
string
Command:
--capture-interval
- Filters the search on archive.org to only get at most 1 capture per hour (h), day (d) or month (m). This filter is used for responses only. The default is 'd' but can also be set to 'none' to not filter anything and get all responses.
exclude-url-scan
boolean
Command:
-xus
- Exclude checks for links from urlscan.io
memory-threshold
string
Command:
--memory-threshold
- The memory threshold percentage. If the machine's memory usage goes above the threshold, the program will be stopped gracefully before running out of memory (default: 95).
output-inline-js
boolean
Command:
--output-inline-js
- Whether to save the combined inline JavaScript of all relevant files in the response directory when -mode R (or -mode B) has been used. The files are saved with the name combinedInline{}.js where {} is the number of the file, saving 1000 unique scripts per file. The file combinedInlineSrc.txt will also be created, containing the src value of all external scripts referenced in the files.
match-status-code
string
Command:
-mc
- Only match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config FILTER_CODE and -fc.
filter-status-code
string
Command:
-fc
- Filter HTTP status codes for retrieved URLs and responses. Comma separated list of codes (default: the FILTER_CODE values from config.yml). Passing this argument will override the value from config.yml.
limit-common-crawl
string
Command:
-lcc
- Limit the number of Common Crawl index collections searched, e.g. -lcc 10 will just search the latest 10 collections (default: 3). As of July 2023 there are currently 95 collections. Setting to 0 (default) will search ALL collections. If you don't want to search Common Crawl at all, use the -xcc option.
match-keywords-only
string
Command:
--keywords-only
- Only return links and responses that contain keywords that you are interested in. This can reduce the time it takes to get results. You can pass a specific Regex value to use, e.g. -ko admin to only get links containing the word admin, or -ko "\.js(\?|$)" to only get JS files. The Regex check is NOT case sensitive.
exclude-alient-vault
boolean
Command:
-xav
- Exclude checks for links from alienvault.com
exclude-common-crawl
boolean
Command:
-xcc
- Exclude checks for links from commoncrawl.org
filter-responses-only
boolean
Command:
--filter-responses-only
- The initial links from Wayback Machine will not be filtered, only the responses that are downloaded, e.g. it may be useful to still see all available paths from the links even if you don't want to check the content.
limit-common-crawl-year
string
Command:
-lcy
- Limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting to 0 (default) will search collections of any year (but in conjunction with -lcc). For example, if you are only interested in data from 2015 and after, pass -lcy 2015. This will override the value of -lcc if passed. If you don't want to search Common Crawl at all, use the -xcc option.
exclude-wayback-matchine
boolean
Command:
-xwm
- Exclude checks for links from Wayback Machine (archive.org)
urlscan-rate-limit-retry
string
Command:
--urlscan-rate-limit-retry
- The number of minutes the user wants to wait for a rate limit pause on URLScan.io instead of stopping with a 429 error (default: 1).
wayback-rate-limit-retry
string
Command:
--wayback-rate-limit-retry
- The number of minutes the user wants to wait for a rate limit pause on Wayback Machine (archive.org) instead of stopping with a 429 error (default: 3).
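To show how a few of the parameters above combine into one invocation, here is a minimal Python sketch that assembles a command line from them. It is an illustration only: the helper name build_waymore_command is hypothetical, the executable name waymore is an assumption about how the tool is installed, and only flags listed in this reference are used. The sketch builds the argument list and does not run the tool.

```python
import shlex

# Modes described under the "mode" parameter: URLs only, Responses only, Both.
VALID_MODES = {"U", "R", "B"}


def build_waymore_command(target, mode=None, limit=None, from_date=None,
                          to_date=None, no_subs=False,
                          exclude_common_crawl=False):
    """Assemble an argument list from a subset of the documented flags.

    Hypothetical helper for illustration; "waymore" as the executable
    name is an assumption.
    """
    argv = ["waymore", "--input", target]
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
        argv += ["-mode", mode]
    if limit is not None:
        # Positive: first N responses, negative: last N, 0: all responses.
        argv += ["--limit", str(limit)]
    if from_date is not None:
        # Partial dates such as 2016 or 201805 are accepted.
        argv += ["--from-date", str(from_date)]
    if to_date is not None:
        argv += ["--to-date", str(to_date)]
    if no_subs:
        argv.append("--no-subs")
    if exclude_common_crawl:
        argv.append("-xcc")
    return argv


if __name__ == "__main__":
    cmd = build_waymore_command("example.com", mode="U",
                                from_date="2021", no_subs=True)
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string keeps each value as a separate argument, which avoids quoting problems if the list is later handed to a process launcher.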