
Cloudflare bypass - Discover IP addresses of Web servers in AWS

Tags: discover ips, cloudflare, incapsula, sucuri, AWS, scan

January 30, 2023
6 mins read

Carlos Polop
Cloud Pentesting Team Leader

In order to improve the security of web pages, companies use proxy services from providers such as Cloudflare, Incapsula, Sucuri, and Akamai. Web applications are placed behind these proxies, so the public can only interact with them through the proxies, which apply security measures. However, the web applications are still running on a server with its own IP address. Therefore, if a pentester manages to find the origin IP address hosting a web application, they may be able to access the application directly at that IP, bypassing all of the proxies' protections.

According to Statista, in 2022, 64% of the Internet was running on the three main cloud providers, and 38% on AWS alone. Therefore, there is a good chance that the IP running a web application behind a proxy is inside a cloud provider.

This blog post covers searching for the origin IP address of a web page by scanning all the IP addresses belonging to AWS and checking each one using Trickest workflow automation.

Configuring a Page Behind Cloudflare

Let's start with the web page and put it behind Cloudflare.

This is what it looks like:

[Screenshot: banner of the "Carlos Polop Hacker's Tools" web page]

The web page is running on an EC2 instance with its own IP, but if I check the "A" DNS record, I get an IP inside Cloudflare's network:


```
; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>>
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16132
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;   IN  A
```


Therefore, a pentester won't know on which IP this web page is running.

Moreover, if someone accesses the web page through the IP directly, this is what they will find:

[Screenshot: default Apache2 Ubuntu page shown when accessing the IP directly]

Searching for Web Servers in AWS IPs

Let's start by creating a workflow to discover all the web applications on ports 80 and 443 on IPs that belong to AWS.

Only ports 80 and 443 will be checked because those are the ports typically used when hiding services behind Cloudflare.

  1. Download AWS IP ranges

The initial node downloads the AWS IP ranges with a Python script and is connected to a mapcidr node that splits overly large ranges into smaller ones, because rustscan had trouble handling big ranges.

[Screenshot: Load AWS IPs node in the Trickest workflow editor]
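
For reference, this is roughly what such a download script could look like. The URL is AWS's officially published IP ranges feed; the output format (one IPv4 prefix per line) is my assumption about what the next node expects:

```python
# A minimal sketch of the kind of Python script this node could run. The URL
# is AWS's documented public feed of its IP ranges; the output format is an
# assumption about what the mapcidr node expects.
import json
import urllib.request

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

with urllib.request.urlopen(AWS_RANGES_URL) as response:
    data = json.load(response)

# Keep only the IPv4 CIDR blocks, one prefix per line for the next node.
for prefix in sorted({entry["ip_prefix"] for entry in data["prefixes"]}):
    print(prefix)
```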

  2. Port scan ports 80 and 443 on all the IP ranges using rustscan in parallel

This part of the workflow parallelizes the use of rustscan.

The BATCH_SIZE used in the generate-line-batches node is 50000, a value I found that won't overload rustscan with too many IP addresses (as it will be running on a medium-sized machine).
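
Conceptually, generating line batches boils down to something like this sketch (the real node is a Trickest built-in, not this code):

```python
# A sketch of what generating line batches boils down to: split the target
# list into chunks of BATCH_SIZE lines so each parallel rustscan run gets a
# manageable workload.
BATCH_SIZE = 50_000

def line_batches(path: str, batch_size: int = BATCH_SIZE):
    batch = []
    with open(path) as targets:
        for line in targets:
            batch.append(line.strip())
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch
```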

If you don't understand this part of the workflow, check out the Trickest docs as a guideline.

[Screenshot: standardize part of the scan workflow in the Trickest editor]

The parameters used with rustscan are:

  • ports: 80, 443
  • targets: List of IPs to scan

Then, from the rustscan output, the standardized-output node (a custom shell script) produces a clean list in the format <IP>:<PORT>.
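
Here is a sketch of that standardization step, redone in Python for readability (the actual node is a shell script, and the greppable rustscan output format assumed in the comment may differ):

```python
# A sketch of the standardization step. Assumption: rustscan's greppable
# output lines look like "203.0.113.10 -> [80,443]"; we expand them into
# one <IP>:<PORT> per line.
import re
import sys

RESULT_LINE = re.compile(r"^(?P<ip>[\d.]+)\s*->\s*\[(?P<ports>[\d,]+)\]$")

for raw in sys.stdin:
    match = RESULT_LINE.match(raw.strip())
    if not match:
        continue  # skip banners and progress output
    for port in match.group("ports").split(","):
        print(f"{match.group('ip')}:{port}")
```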

Note that I used rustscan instead of masscan because, even though masscan can be faster, in my tests it had too many false negatives (open ports weren't reported as open).

  3. Check all found open ports for web servers

The tool httpx can be used to confirm which of the open ports correspond to web servers.

[Screenshot: httpx node of the Scan AWS Trickest workflow editor]

This task is again done in parallel, with a BATCH_SIZE of 55000 in this case.

The parameters used with httpx are listed below, followed by a sketch of an equivalent invocation:

  • threads: 200
  • silent: true
  • random-agent: true
  • follow-redirects: true
  • domain-list: The list of <IP>:<PORT> to check
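
Roughly, an equivalent stand-alone invocation would look like this (flag names follow projectdiscovery's httpx documentation; the input file name is hypothetical):

```python
# A sketch of an equivalent httpx invocation driven from Python. The flag
# names follow projectdiscovery's httpx; "ip-port-list.txt" is a hypothetical
# file holding the <IP>:<PORT> lines from the previous step.
import subprocess

subprocess.run(
    [
        "httpx",
        "-l", "ip-port-list.txt",
        "-threads", "200",
        "-silent",
        "-random-agent",
        "-follow-redirects",
    ],
    check=True,
)
```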

The code seen in the image is from the get valid urls of webs node, which extracts the valid URLs from the httpx output. Some URLs answer with eternal redirects, and this node removes those as well.
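
As a rough sketch of such a filter (assuming httpx prints one URL per line and that collapsing duplicates is enough to handle the redirect loops):

```python
# A rough sketch of a "get valid URLs" filter. Assumptions: httpx output
# arrives one URL per line on stdin; duplicate targets (e.g. endless
# redirects that all resolve to the same final URL) collapse to one entry.
import sys
from urllib.parse import urlparse

seen = set()
for line in sys.stdin:
    url = line.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        continue  # drop anything that is not a well-formed http(s) URL
    if url not in seen:
        seen.add(url)
        print(url)
```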

  4. Zip output

[Screenshot: zip node of the Scan AWS Trickest workflow editor]

The last step is to get all the valid URLs in a list and zip that list.

Run It!

[Screenshot: output of the zip-to-out node in the Trickest workflow editor]

One Trickest run with 3 medium machines took close to 20 hours to complete and found almost 13 million URLs (a 35MB zip file).

Try 1: Finding the Origin IP Address

Somewhere inside those 13M URLs is the one hosting the target web page. hakoriginfinder is a tool capable of sending HTTP requests to hosts with a chosen Host header set, and comparing each response with the original response of the host to find out if it's potentially the same one.

Therefore, a workflow could run this tool over all the previously discovered URLs to find the host serving the web page.
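
To illustrate the idea behind hakoriginfinder (this is a toy sketch, not the tool's actual code; the domain and the similarity threshold are placeholders):

```python
# A toy illustration of hakoriginfinder's idea, not its actual code: fetch
# the page through the proxy once, then request each candidate with the
# target Host header and flag bodies that look almost identical.
import difflib

import requests  # assumption: the third-party 'requests' package is installed

TARGET_HOST = "www.example.com"  # hypothetical target domain


def fetch_body(url: str, host: str | None = None) -> str:
    headers = {"Host": host} if host else {}
    return requests.get(url, headers=headers, timeout=5, verify=False).text


reference = fetch_body(f"https://{TARGET_HOST}")  # the response via Cloudflare


def looks_like_origin(candidate_url: str) -> bool:
    try:
        candidate = fetch_body(candidate_url, host=TARGET_HOST)
    except requests.RequestException:
        return False
    # Arbitrary threshold: treat near-identical bodies as a potential origin.
    return difflib.SequenceMatcher(None, reference, candidate).ratio() > 0.9
```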

  1. Getting the URLs from the previous workflow

In the initial part of the workflow, the zip from the previous workflow is downloaded and unzipped.

[Screenshot: Get Zip node of the Check domain for cloudflare Trickest workflow editor]

  2. Search for the origin IP address

In the second part, the aforementioned hakoriginfinder tool is run over all the AWS URLs discovered previously, and a final grep keeps only the matches (sketched below).

[Screenshot: hakoriginfinder node in the Check domain for cloudflare Trickest workflow editor]
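
The grep itself is trivial. Assuming hakoriginfinder prefixes each result line with MATCH or NOMATCH, as its README describes, the filter boils down to:

```python
# Keep only hakoriginfinder's positive results. Assumption: result lines are
# prefixed with "MATCH" or "NOMATCH", per the tool's README.
import sys

for line in sys.stdin:
    if line.startswith("MATCH"):
        print(line.rstrip())
```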

Run It!

[Screenshot: Run tab of the Check domain for cloudflare Trickest workflow]

Well, this is disappointing... I left it running for 28 hours with 3 medium parallel machines, and it only checked a bit more than 10% of all the URLs.

Then I stopped it, and that's why you see red in the last two nodes.

On a positive note, the origin IP address of the web page appeared among the matches found so far, so at least the technique works.

Try 2: Finding the Origin IP Address

Using hakoriginfinder was a good idea, as it's capable of recognizing the IP where the page is located, but it was super slow.

In order to make the IP discovery faster, several changes to the previous workflow were made.

  1. Filter using httpx and a unique string

After getting the valid URLs from AWS, an httpx node can be used that accesses all the URLs with the searched domain set in the Host header and filters the results by an expected unique string in the web response.

[Screenshot: highlighted httpx node in the Final Check Cloudflare domain faster Trickest workflow editor]

This will run much faster than hakoriginfinder and will filter a vast amount of URLs from the initial list.
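
For reference, an equivalent invocation might look like this (the domain, input file, and unique string are placeholders; flag names follow httpx's documentation, but treat the exact invocation as an approximation):

```python
# A sketch of the faster httpx filter: probe every AWS URL with the target
# domain in the Host header and keep only responses containing a unique
# string from the real page. All file names and values are placeholders.
import subprocess

subprocess.run(
    [
        "httpx",
        "-l", "aws-urls.txt",               # URLs found in the first workflow
        "-H", "Host: www.example.com",      # the domain we are hunting for
        "-match-string", "Hacker's Tools",  # unique string from the real page
        "-threads", "200",
        "-silent",
    ],
    check=True,
)
```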

  2. Filter results with a custom script

Then, from all those URLs, a custom script can be used to filter the list even further based on the responses.

[Screenshot: highlighted get exact/similar URLs node in the Final Check Cloudflare domain faster Trickest workflow editor]

The script performs these actions (a condensed sketch follows the list):

  • Removes URLs that redirect to the page indicated in the Host header or load it via SSRF.
  • Gets similar URLs using data such as the HTML title and the response length.
  • Gets similar URLs using the calculated simhash.
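
A condensed sketch of those three checks (assuming the response bodies are already fetched; the PyPI simhash package and the thresholds are stand-ins for whatever the real script uses):

```python
# A condensed sketch of the similarity checks. Assumptions: response bodies
# are already available as strings, and the PyPI 'simhash' package plus the
# thresholds below stand in for whatever the real node uses.
import re

from simhash import Simhash  # assumption: pip install simhash


def _title(html: str) -> str:
    match = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    return match.group(1).strip() if match else ""


def _features(html: str) -> list[str]:
    return re.findall(r"\w+", html.lower())


def is_similar(reference_html: str, candidate_html: str) -> bool:
    same_title = _title(reference_html) == _title(candidate_html) != ""
    close_length = abs(len(reference_html) - len(candidate_html)) < 0.1 * max(len(reference_html), 1)
    close_hash = Simhash(_features(reference_html)).distance(Simhash(_features(candidate_html))) <= 8
    return same_title or close_length or close_hash
```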

  3. Filter with hakoriginfinder

At this point, all of the remaining results have a high chance of pointing to the origin IP. I usually add this last filter to pick the best results from hakoriginfinder's output.

[Screenshot: highlighted hakoriginfinder tool in the Final Check Cloudflare domain faster Trickest workflow editor]

Run It!

[Screenshot: Run tab of the Final Check Cloudflare domain faster Trickest workflow]

It took 13 hours to complete, and the final node held 12 URLs, the first of which was the one we were looking for. Therefore, we can say this search was finally a success!

Conclusion & Improvements

Trickest managed to bypass Cloudflare and find the origin IP of a website behind a proxy. However, let me sum up what you should keep in mind.

If the web page were correctly configured, it would only allow Cloudflare's proxies to access the website's content, and this technique wouldn't be able to find it (yet companies usually don't set up this protection).

The final Trickest workflow found 12 URLs, but only 1 was valid. It's a great result, but tuning the custom script would probably remove the 11 false positives.

13 hours were necessary to complete the workflow with 3 medium parallel machines, but it checked almost 13 million URLs. You can come up with plenty of ideas to improve this workflow and reduce execution time; the simplest is to use more machines in parallel.

Another improvement would be to implement a tool or script capable of finding a highly probable unique string from a website, so the user doesn't need to provide one.
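
A naive sketch of that idea, using the page <title> as the candidate string (an assumption; a real implementation would need something more robust):

```python
# A toy sketch of the suggested improvement: automatically pick a string that
# is likely to be unique to the site, here naively the <title> contents, with
# the longest word on the page as a fallback.
import re
import urllib.request


def probable_unique_string(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode(errors="replace")
    match = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    if match:
        return match.group(1).strip()
    return max(re.findall(r"\w{4,}", html), key=len, default="")
```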

Additionally, there are other well-known techniques to discover the origin IP addresses of web pages, like historical DNS records. You can find more info about them in HackTricks.

If you enjoyed this blog post and like the idea of automating your work, sign up and explore Trickest automated workflows.

