Back in 2014, I launched a site called estipaper.com and was trying to figure out how to get some early traffic. I noticed that someone had posted it on a website where people rank pointless sites (yes, I did want to get to the top of that list, and I understand the irony of that). The higher the rank, the more traffic. So I started a long game of cat and mouse with the site operator, where I would vote-spam my site to get higher in the rankings. Eventually they just removed Estipaper from the site entirely, but it was a lot of fun (and it got tens of thousands of visits in the meantime). The cat-and-mouse game probably lasted 2-3 months.
#!/usr/bin/python
#estimation: estipaper is 450 above cookie clicker on 12/20/14. Clicker caught up on 1/2/15.
#estimation: estipaper is 495 above cookie clicker on 1/2/15. Clicker caught up on 1/10/15.
#estimation: estipaper is 1229 above cookie clicker on 1/10/15. Went down to position 11 on 1/12/15. It seems as though they have frozen the vote count for estipaper.
#appears as though they've finally implemented a check for the actual source address.
#looking into proxies now: scraping from hidemyass.com's free ones.
#low: X-Forwarded-For was my real IP; HTTP_VIA present and sometimes a ProxyID.
#medium: the IP was sometimes not even the proxy's IP, but seemed to route traffic through another proxy, with X-Forwarded-For being the first proxy's IP. Another: unknown X-Forwarded-For, did show HTTP_VIA. Another: routed through two proxies? The proxy IP wasn't visible at all. Another: proxy IP, but Forwarded-For was a random IP. Another: another proxy, X-Forwarded-For similar to the proxy's IP. Another: proxy IP, but Forwarded-For was loopback (127.0.0.1).
#high: proxy IP and just HTTP_VIA. Five more: just HTTP_VIA. Three more: no headers at all. (A quick classifier along these lines is sketched after the imports.)
#https: these were all high-anonymity proxies, and they're perfectly able to carry plain http too. They tested as either Elite or Anonymous.
#Found one proxy listed as high + https that was actually transparent. Probably aren't many of those, though.
#using hide-my-ass scraper: ./hide_my_python.py -o test -pr {http,https}
#test uniqueness of browser (should work with curl): https://panopticlick.eff.org
#If DNS is a concern, can specify DNS server with --dns-servers
#Looks like I can send http over https proxies.
import os
import sys
import time
import random
import traceback
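
# Sketch of the anonymity check from the notes above: curl a header-echo service
# through the proxy and look for the leak markers. httpbin.org/headers is an
# assumption here, standing in for whatever echo page was actually used.
def checkProxyAnonymity(proxyIP, proxyPort, myRealIP):
    os.system("curl --max-time 30 --proxy http://{0}:{1} http://httpbin.org/headers > proxycheck".format(proxyIP, proxyPort))
    with open("proxycheck") as f:
        echoed = f.read()
    if myRealIP in echoed:
        return "low"    # real IP leaks through, e.g. in X-Forwarded-For
    if "Via" in echoed or "X-Forwarded-For" in echoed:
        return "medium" # flagged as a proxy, but the real IP stays hidden
    return "high"       # no proxy markers at all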
arrayOfProxies = [("58.246.199.122","3128"),("92.255.242.163","8800")] # remaining proxies redacted
arrayOfUserAgents = ['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36','Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36','Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0','Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10; rv:33.0) Gecko/20100101 Firefox/33.0','Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36']
numberOfProxies = len(arrayOfProxies)
numberOfUserAgents = len(arrayOfUserAgents)
runID = sys.argv[1]            # unique ID so parallel runs keep their temp files separate
targetSiteSearch = sys.argv[2] # search term that brings the target site up on the results page
targetSiteID = sys.argv[3]     # the site's ID as it appears in the vbut() onclick handler
timeToWait = sys.argv[4]       # seconds to sleep between votes
runNumber = 0
while True:
    try:
        runNumber = runNumber + 1
        # start each vote with a fresh cookie jar
        os.system("rm -f cookies{0}".format(runID))
        # rotate proxies and user agents so each vote looks like a different visitor
        currentProxy = arrayOfProxies[random.randint(0,numberOfProxies-1)]
        currentUserAgent = arrayOfUserAgents[random.randint(0,numberOfUserAgents-1)]
        print "Using user agent: "+currentUserAgent
        print "Using the following proxy: "+currentProxy[0]+":"+currentProxy[1]+" out of "+str(numberOfProxies)
        # fetch the search results page through the proxy, with browser-like headers
        os.system("curl --max-time 90 --connect-timeout 150 --proxy http://{0}:{1} -b cookies{2} -c cookies{2} 'http://www.pointlesssites.com/site-search.asp?t={3}' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: {4}' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' -H 'Cache-Control: max-age=0' -H 'Connection: keep-alive' --compressed > homepage{2}".format(currentProxy[0],currentProxy[1],runID,targetSiteSearch,currentUserAgent))
        with open("homepage{0}".format(runID)) as homepage:
            homepageAsString = homepage.read()
        # the vote button's onclick looks like vbut("<siteID>",n1,n2); keep everything after it
        secondHalfOfHomepage = homepageAsString.split('vbut("{0}"'.format(targetSiteID))[1]
        nValuesAsString = secondHalfOfHomepage.split(')\'>')[0]
        # the onclick arguments are ("<siteID>",n1,n2), so splitting on commas
        # gives the two n values the vote handler posts back
        n1 = nValuesAsString.split(",")[1]
        n2 = nValuesAsString.split(",")[2]
        # send the vote through the same proxy/user agent/cookie jar
        # (the vote.asp path and parameter names below are assumptions; the handoff
        # of targetSiteID, n1, and n2 is what the page's vbut() handler implies)
        os.system("curl --max-time 90 --connect-timeout 150 --proxy http://{0}:{1} -b cookies{2} -c cookies{2} 'http://www.pointlesssites.com/vote.asp?id={3}&n1={4}&n2={5}' -H 'User-Agent: {6}' --compressed > response{2}".format(currentProxy[0], currentProxy[1], runID, targetSiteID, n1, n2, currentUserAgent))
with open("response{0}".format(runID)) as response:
responseAsString = response.read()
print "Run: #{0}".format(runNumber) + ". Response is: " + responseAsString
print "Avoiding rate limit and suspicion, waiting {0} seconds...".format(timeToWait)
time.sleep(int(timeToWait))
    except Exception:
        traceback.print_exc()
        errorTimeToWait = random.randint(350,400)
        print "Hit exception, waiting a random {0} seconds...".format(errorTimeToWait)
        time.sleep(errorTimeToWait)
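
For reference, the script takes four positional arguments: a run ID (so parallel copies keep their cookie/homepage/response files separate), the homepage search term, the site's vbut() ID, and the seconds to sleep between votes. A hypothetical invocation (the filename and site ID here are made up):

python vote.py 1 estipaper 1234 240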