Scan 10M websites for the X-Recruiting header using Go on an AWS Free Tier instance

Requirements and constraints:

  1. Scanning should be done by a pool of workers (we have a huge list of domains to scan).
  2. Random DNS servers should be used (otherwise a single DNS server will ban us, because we’ll do so many DNS lookups).
  3. Memory usage. We want the app to use a small amount of memory, so that it can run on a Free Tier instance that has only 1 GB of RAM.
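The first and third requirements can be sketched together as a fixed-size worker pool fed from a line-by-line reader, so the full domain list is never held in memory. This is only a minimal sketch, not the project's actual code: `scan`, `scanText`, `checkFunc`, `hit` and `stubCheck` are hypothetical names, and the stub stands in for the real DNS lookup and HTTP request.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// checkFunc reports the X-Recruiting header value for a domain, if present.
// Hypothetical hook: a real one would resolve the domain and send an HTTP request.
type checkFunc func(domain string) (value string, found bool)

// hit pairs a domain with the header value found on it.
type hit struct {
	Domain string
	Value  string
}

// scan reads domains from r one line at a time and checks them with a
// fixed number of workers. The unbuffered jobs channel keeps the reader
// from running ahead of the workers, so memory use stays flat.
func scan(r io.Reader, workers int, check checkFunc) []hit {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	found := make(chan hit)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for domain := range jobs {
				if v, ok := check(domain); ok {
					found <- hit{Domain: domain, Value: v}
				}
			}
		}()
	}

	// Feed the workers, then close the results channel once they finish.
	go func() {
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			if d := strings.TrimSpace(sc.Text()); d != "" {
				jobs <- d
			}
		}
		close(jobs)
		wg.Wait()
		close(found)
	}()

	var hits []hit
	for h := range found {
		hits = append(hits, h)
	}
	return hits
}

// scanText is a small convenience wrapper for in-memory input.
func scanText(text string, workers int, check checkFunc) []hit {
	return scan(strings.NewReader(text), workers, check)
}

// stubCheck is a stand-in checker used only for this example.
func stubCheck(domain string) (string, bool) {
	if strings.HasSuffix(domain, ".org") {
		return "we are hiring", true
	}
	return "", false
}

func main() {
	for _, h := range scanText("example.com\nexample.org\n\nexample.net\n", 2, stubCheck) {
		fmt.Printf("%s: %s\n", h.Domain, h.Value)
	}
}
```

In a real run the reader would be the opened domains file, and hits would more likely be printed as they arrive than collected in a slice.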
unzip unzip
awk -F "\"*,\"*" '{print $2}' top-1m.csv > umbrella-1m-domains.txt
awk -F "\"*,\"*" 'NR>1 {print $3}' majestic_million.csv > majestic-1m-domains.txt
awk -F "\"*,\"*" 'NR>1 {print $2}' top10milliondomains.csv > domcop-10m-domains.txt
cat domcop-10m-domains.txt majestic-1m-domains.txt umbrella-1m-domains.txt | sort -u > uniq-domains.txt
awk -F "\"*,\"*" 'NR>1 && NF>0 {print $1}' nameservers.csv > dns-servers.txt

About the code.

DNS lookups are implemented using a popular package. An interface is defined so that the IP lookup implementation can be swapped if needed. The resolver loads the contents of the DNS servers file into memory and then uses a random server for each lookup. The Resolve method returns the list of IPs found, if any.
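The resolver described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: it uses the standard library's `net.Resolver` with a custom `Dial` instead of the third-party package, and `Resolver`, `randomResolver`, `parseServers` and `pick` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"net"
	"strings"
	"time"
)

// Resolver is the lookup abstraction, so the implementation can be swapped.
type Resolver interface {
	Resolve(domain string) ([]net.IP, error)
}

// randomResolver keeps the DNS server list in memory and
// sends every lookup to a randomly chosen server.
type randomResolver struct {
	servers []string // "host:port" addresses
	timeout time.Duration
}

// parseServers turns the contents of a file like dns-servers.txt
// (one IP per line) into "ip:53" addresses, skipping malformed entries.
func parseServers(text string) []string {
	var servers []string
	for _, field := range strings.Fields(text) {
		if net.ParseIP(field) == nil {
			continue
		}
		servers = append(servers, net.JoinHostPort(field, "53"))
	}
	return servers
}

// pick returns a random server; it assumes the list is not empty.
func (r *randomResolver) pick() string {
	return r.servers[rand.Intn(len(r.servers))]
}

// Resolve looks the domain up through one randomly chosen DNS server.
func (r *randomResolver) Resolve(domain string) ([]net.IP, error) {
	server := r.pick()
	res := &net.Resolver{
		PreferGo: true, // use the Go resolver so the custom Dial is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: r.timeout}
			return d.DialContext(ctx, network, server)
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
	defer cancel()

	addrs, err := res.LookupIPAddr(ctx, domain)
	if err != nil {
		return nil, err
	}
	ips := make([]net.IP, 0, len(addrs))
	for _, a := range addrs {
		ips = append(ips, a.IP)
	}
	return ips, nil
}

func main() {
	// Example input; in practice this would be the contents of dns-servers.txt.
	servers := parseServers("8.8.8.8\n1.1.1.1\nnot-an-ip\n")
	var r Resolver = &randomResolver{servers: servers, timeout: 2 * time.Second}
	_ = r // r.Resolve("example.com") would perform a real network lookup
	fmt.Println("servers loaded:", len(servers))
}
```

Keeping the server list as a plain slice of strings costs little memory even for tens of thousands of entries, which fits the 1 GB constraint.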

Build and run the app locally

To run it locally, you need to clone the project, install the dependencies, unzip the data files, install the app, and run it.

# clone the project
mkdir -p ${GOPATH}/src/
cd ${GOPATH}/src/
git clone .
cd hrscanner
# install the dependency manager used in the project
go get -u
# install project's dependencies
govendor sync
# install the app
go install
# unzip domains and DNS servers files
cd ${GOPATH}/src/
unzip && rm
unzip &&
cd ${GOPATH}/src/
# run the app with default settings

Run the app on the server

I use a MacBook, so to build a binary that runs on Linux, I compile it statically inside Docker. I prepared a tiny script for this.

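# copy the files to the server
scp -r -C -i ~/.ssh/mykey.pem ${GOPATH}/src/ ec2-user@remote_host:~/data/
scp -i ~/.ssh/mykey.pem ${GOPATH}/src/ ec2-user@remote_host:~/
# raise the open files limit: create the file and put the two lines below into it
sudo touch /etc/security/limits.d/custom.conf
* soft nofile 1000000
* hard nofile 1000000
# raise the kernel limits: add the four lines below to the file
sudo nano /etc/sysctl.conf
fs.file-max = 1000000
fs.nr_open = 1000000
net.ipv4.netfilter.ip_conntrack_max = 1048576
net.nf_conntrack_max = 1048576
# check the open files limit
ulimit -n
# install tmux and start a session
sudo apt install tmux
tmux
# run the app with custom workers number
./hrscanner -workers=500 > logs.txt &
# exit the session
tmux detach
# now you can close the terminal
# reattach to the session later
tmux attach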
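
Every in-flight lookup and HTTP connection holds a file descriptor, which is why the limits above are raised. As a rough sketch, the app could check at startup that the soft limit covers the chosen number of workers. The helper names and the estimate of two descriptors per worker plus a small reserve are assumptions for illustration, not measurements from the project.

```go
package main

import (
	"fmt"
	"syscall"
)

// descriptorsNeeded estimates how many file descriptors a run may hold open.
// perWorker and reserve are assumed values, not measured ones.
func descriptorsNeeded(workers, perWorker, reserve int) uint64 {
	return uint64(workers*perWorker + reserve)
}

// fitsLimit reports whether the estimate stays within the soft limit.
func fitsLimit(softLimit uint64, workers, perWorker, reserve int) bool {
	return descriptorsNeeded(workers, perWorker, reserve) <= softLimit
}

func main() {
	// Read the current open-files limit of this process (what `ulimit -n` shows).
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Println("cannot read RLIMIT_NOFILE:", err)
		return
	}

	const workers, perWorker, reserve = 500, 2, 64
	fmt.Printf("soft limit: %d, estimated need: %d, ok: %v\n",
		uint64(rl.Cur),
		descriptorsNeeded(workers, perWorker, reserve),
		fitsLimit(uint64(rl.Cur), workers, perWorker, reserve))
}
```

With a common default soft limit of 1024, 500 workers would already exceed this estimate, while the raised limit of 1000000 leaves plenty of headroom.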

Alexander Ravikovich

In GO we trust. Software Engineer. @Isreal