Who’s That Bot?

MalBot · December 22, 2017, 11:30am

If you own a website, you already know that servers are visited all day long by bots and crawlers with multiple intents, sometimes good but also sometimes bad. An interesting field in web server logs is the “user-agent”. The RFC 2616 describes the User-Agent field used in HTTP requests:

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests. The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent. By convention, the product tokens are listed in order of their significance for identifying the application.

It’s always interesting to keep an eye on the User-Agents found in your logs even if often they are spoofed. It can indeed contain almost anything. Note that many websites trust the User-Agent to display some content in different ways depending on the browser, the operating system. During an old pentest engagement, I also saw an authentication bypass via a specific User-Agent… which is bad! That’s why there exists a tool to stress test a website with multiple variations of User-Agent strings: ua-tester.

Most tools and browsers allow selecting the User-Agent via a configuration file or a plugin (for Browsers). The choice of a User-Agent string may vary depending on how your goal. During a penetration test, you’ll try to work below the radar by using a very common User-Agent (a well-known browser on a modern OS):

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) \
  Chrome/63.0.3239.84 Safari/537.36

Sometimes, you will change the User-Agent to detect behaviour changes (ex: to access the mobile version of a website).

Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG SM-G318H Build/KTU84P) AppleWebKit/537.36 \
  (KHTML, like Gecko) SamsungBrowser/2.0 Chrome/34.0.1847.76 Mobile Safari/537.36

Finally, sometimes it’s better to show clean hands!

For my researches and while hunting, I’m fetching a *lot* of data from websites. Sometimes, I’m accessing the same pages again and again. This behaviour can be seen as intrusive by the website owner. In this case, it’s always better to be polite and to present yourself. In my scripts, I’m always using the following User-Agent:

XmeBot/1.0 (https://blog.rootshell.be/bot/)

The URL is hidden on my blog and is available to provide more information about me and my intents (basically, why I’m fetching data). By keeping an eye on the page access statistics, it also helps me to learn who’s keeping an eye on their website logs

[The post Who’s That Bot? has been first published on /dev/random]

Article Link: https://blog.rootshell.be/2017/12/22/whos-that-bot/