Good Bots vs. Bad Bots: How to Tell the Difference and How to Manage Them
A bot is automated software that runs on a website or network to perform certain tasks, usually repetitive ones. Bot traffic is any traffic to a website that comes from bots rather than from human visitors.
The thing is, some bot traffic comes from 'good' bots and can benefit our site. In fact, these good bots might be installed by the website owners themselves. However, there are also bad bots that can severely harm a website, application, or network.
Here we will discuss the difference between good bots and bad bots, and what to expect from them.
What Is a Bot?
Any software application that is programmed to perform automated tasks can be considered a 'bot'. Automated means these programs run by themselves, without a human starting them up or tweaking them during operation. In most cases, bots handle simple, repetitive tasks, although advanced bots (powered by AI and machine learning) can now perform more sophisticated work.
There are a lot of bots on the internet. In fact, by many estimates more than half of all internet traffic comes from bots crawling web pages, scanning content, interacting with human users, or looking for potential targets.
Here are some of the most common bots you might have encountered:
- Web crawlers: bots, such as Googlebot, that 'crawl' web pages and scan content all over the internet
- Chatbots: A.I. bots that simulate human conversations via live chat
- Social bots: bots that operate on social media platforms to moderate content, among other tasks
These applications above are known as ‘good’ bots. They provide benefits to the sites they operate upon, and also for the users. However, there are also ‘bad’, malicious bots that are programmed to perform harmful activities such as:
- Content scraping, or web scraping: the bot copies or downloads a website's content, which might then be reused on another site, creating duplicate-content issues. Data scraping in any form may be illegal if the website owner doesn't allow it.
- Credential stuffing: automatically attempting to log in to other services using credentials obtained from a previous data breach
- Brute-force attacks: bots attempt to 'guess' username and password credentials across various sites
- Sending spam content (self-explanatory)
- Email address harvesting
- Click fraud: automated clicking on ads to generate fraudulent revenue for the ad publisher
Below, we will also discuss some examples of bad bots and good bots, how to differentiate them, and how to stop malicious bot activity on your network.
Different Types of Bots and Their Functions
Spider Bots
These bots get the nickname 'spider' because their main job is to crawl website content; for the same reason, they are also called crawlers or crawler bots. Googlebot is probably the most famous spider bot. It operates on Google's constantly updated algorithm to study a site's content, index it, and inform the site's ranking.
Are spider/crawler bots good or bad bots? The answer depends on two things: whether the bot's owner is legitimate, and the purpose of its crawling.
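One common legitimacy check: a genuine Googlebot request comes from an IP whose reverse-DNS name ends in googlebot.com or google.com and resolves back to the same IP. A minimal sketch in Python (the function names are my own, and the full check requires network access):

```python
import socket

def looks_like_google_host(host: str) -> bool:
    # Genuine Googlebot reverse-DNS names end in googlebot.com or google.com.
    return host.endswith((".googlebot.com", ".google.com"))

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-DNS lookup, then forward-confirm the hostname resolves back."""
    try:
        host = socket.gethostbyaddr(ip)[0]  # e.g. "crawl-66-249-66-1.googlebot.com"
    except (socket.herror, socket.gaierror):
        return False
    if not looks_like_google_host(host):
        return False
    try:
        # Forward confirmation guards against spoofed reverse-DNS records.
        return ip in {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        return False
```

The forward-confirmation step matters because anyone can set an arbitrary reverse-DNS record for an IP they control; only Google can make that hostname resolve back to the same address.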
Nevertheless, website owners can prevent some or all of their content from being crawled via the Robots Exclusion Standard, more commonly known as robots.txt. For example, you might not want old content to be crawled by Googlebot, since it can eat into your crawl budget. (Note that robots.txt is advisory: good bots honor it, but bad bots typically ignore it.)
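To see how a compliant crawler interprets these rules, Python's standard urllib.robotparser can evaluate a robots.txt file; the file content below is a hypothetical example that blocks an old archive:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /old-archive/ for everyone, plus an
# internal-search page for Googlebot specifically.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-archive/

User-agent: Googlebot
Disallow: /old-archive/
Disallow: /internal-search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/blog/new-post"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/old-archive/p1"))   # False
```

A well-behaved crawler runs exactly this kind of check before fetching each URL; a malicious scraper simply skips it.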
Spider bots are not limited to search engines. For example, Pricing Assistant is technically a crawler that scans various eCommerce sites for price changes. In general, when a crawler's purpose is not malicious in the common-sense and legal sense, it can be considered a good bot.
Chatbots
The definition of a chatbot is actually pretty broad: if it is automated and it carries on a conversation, it is a chatbot. Most chatbots are considered 'good', and nowadays they are getting much better at understanding human language and mimicking human conversation. However, the applications of chatbots are still fairly narrow at the moment.
For example, chatbots might be implemented on an eCommerce website to qualify incoming prospects. The chatbot will ask preliminary questions (i.e. “what is your budget?”) and if the prospect is qualified, it can pass the prospect to a human salesperson.
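A purely illustrative sketch of such a qualification rule (the threshold, field names, and routing labels are assumptions for the example, not any real product's logic):

```python
# Hypothetical qualification rule: route a prospect to a human salesperson
# only if their budget and timeline meet assumed thresholds.
MIN_BUDGET = 1000        # assumed minimum budget to qualify
MAX_TIMELINE_MONTHS = 6  # assumed longest acceptable buying timeline

def qualify_prospect(answers: dict) -> str:
    """Return a routing decision based on the chatbot's preliminary questions."""
    budget = answers.get("budget", 0)
    timeline = answers.get("timeline_months", 99)  # unknown timeline = not urgent
    if budget >= MIN_BUDGET and timeline <= MAX_TIMELINE_MONTHS:
        return "route_to_sales"
    return "send_resources"

print(qualify_prospect({"budget": 5000, "timeline_months": 2}))  # route_to_sales
print(qualify_prospect({"budget": 200}))                         # send_resources
```

Real chatbots layer natural-language understanding on top, but the hand-off decision usually reduces to simple rules like these.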
Transactional Bots
As the name suggests, bots in this category interact with external systems to perform transactions. It's important not to limit 'transaction' to commercial transactions: any exchange of information, and any movement of data from one platform to another, can be described as a transaction.
So the applications of transactional bots are quite diverse, and we can expect them to keep advancing in the years to come. While transactional bots can be considered 'good' in the sense that they improve productivity, they may also displace some human jobs.
Entertainment Bots
A good example here is video game bots, including the computer-controlled opponents we play against. With demand for entertainment higher than ever, we can expect more implementations of entertainment bots in the years to come.
Informational Bots
This type of bot collects and provides real-time information such as news, currency rates, and weather. Google Assistant and Siri can be considered informational bots. The primary purpose of these good bots is to make sharing and capturing information more seamless.
Scraper Bots
Scraper bots are designed to scrape, or steal, content and other information from websites. The most common application is stealing content and repurposing it elsewhere, usually without the original owner's consent.
Hacker Bots
These bots are designed to spot security vulnerabilities and spread malware in order to attack websites and entire networks. The aim is to take control of computers and networks, which can then be used for a variety of malicious purposes. For example, networks of infected computers, known as 'botnets', can be used to launch DDoS attacks, often without their owners' knowledge.
Spambots
Spambots, as the name suggests, are designed to spread spam (low-quality content) that drives traffic to the spammer's website. The most common example is bots that post links to spam sites in comment sections on blogs, social media platforms, and forums.
In the past few years, spambot volume has declined significantly, not because we have found better ways to detect and block them, but because the tactic has become highly unprofitable due to changes made by Google Ads and other advertising networks.
Imposter Bots
The defining characteristic of bots in this category is that they mimic natural user behavior, which makes them hard to distinguish from legitimate visitors. Imposter bots have various applications; for example, propaganda bots deployed during elections to sway political opinion and drown out opposing views are imposter bots. Because of their nature, imposter bots are also often used in DDoS attacks.
How to Stop Malicious Bot Activity
The first step in stopping bad bots is differentiating them from good bots. One of the best ways to tell a good bot from a bad one is to analyze its behavior. Malicious bot behaviors can include:
- Slowly or rapidly overloading the site with traffic
- Probing the site for hidden files
- Content scraping or data scraping in general
- Brute force login attempts
- Hunting for vulnerabilities on the site, like attempting to find outdated apps, plugins, etc.
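The rate-based and probing checks above can be sketched as a pass over parsed access-log entries. The thresholds and "suspicious path" list below are illustrative assumptions, not recommended values:

```python
from collections import defaultdict

# Assumed input: (ip, unix_timestamp, path) tuples parsed from an access log.
SUSPICIOUS_PATHS = ("/.env", "/.git/", "/wp-login.php", "/backup")
MAX_REQUESTS = 100  # assumed: >100 requests in any 60-second window looks bot-like

def flag_suspicious(requests):
    """Return the set of IPs showing bot-like behavior."""
    by_ip = defaultdict(list)
    for ip, ts, path in requests:
        by_ip[ip].append((ts, path))

    flagged = set()
    for ip, events in by_ip.items():
        times = sorted(ts for ts, _ in events)
        # Sliding window: too many requests within any 60-second span.
        lo = 0
        for hi in range(len(times)):
            while times[hi] - times[lo] > 60:
                lo += 1
            if hi - lo + 1 > MAX_REQUESTS:
                flagged.add(ip)
                break
        # Probing for hidden or sensitive files.
        if any(path.startswith(SUSPICIOUS_PATHS) for _, path in events):
            flagged.add(ip)
    return flagged
```

A real bot detector would combine many more signals (headers, mouse movement, IP reputation), but rate spikes and probes for hidden files are among the simplest to check.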
Bot management is the practice of filtering bad bots from good ones: blocking traffic from malicious bots while allowing good bots to access the website and deliver their benefits. This is done by detecting and analyzing bot activity, identifying its source, and discerning good bot behavior from bad.
Various tools, such as DataDome, provide bot management services. Nowadays bot management is a necessity: unchecked bot activity can cause problems, some very severe, ranging from slowing down the website to stealing sensitive and valuable information, or even taking control of a website or network altogether.
A good bot management software should be able to:
- Differentiate between bots and legitimate human visitors
- Identify the source of bot traffic and its reputation
- Identify the IP address of bot origin, and filter based on IP reputation
- Analyze each bot’s behavior
- Differentiate between good bots and bad bots, and add good bots to an allowlist
- Rate-limit any bots that over-access the website or service
- Deny bad bots access, or limit their access to certain content or resources only, and serve alternative content to 'fool' them
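The rate-limiting capability, for instance, is commonly implemented with a per-client token bucket. A minimal sketch, with purely illustrative parameters:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: allows short bursts, caps sustained rate."""

    def __init__(self, rate=5.0, capacity=10.0):
        self.rate = rate          # tokens refilled per second (sustained rate)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets = defaultdict(TokenBucket)  # one bucket per client IP

def handle_request(ip: str) -> int:
    """Return an HTTP status: 200 if allowed, 429 if rate-limited."""
    return 200 if buckets[ip].allow() else 429
```

Over-eager bots burn through their bucket and start receiving 429 responses, while humans browsing at a normal pace never notice the limit.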
In turn, with a proper bot management tool we can mitigate the damage done by bad bots and the impact of their activities by:
- Blocking any bot from accessing certain content or the whole website/app.
- Identifying bots with certain characteristics, taking appropriate action to prevent damage, and using those characteristics to detect similar bots, improving detection speed and accuracy.
- Challenging suspicious activity with a CAPTCHA or other means. This can also help in recognizing future activity from the same bot, even if it changes form.
It's important to note that not all bot activity on your website is bad. In fact, some good bots can be crucial to your website's success: if you block all of Googlebot's activity, for example, you won't rank in Google's SERPs, which is obviously undesirable.
So, discerning good bots from bad, and sorting harmful bot activity from helpful bot activity and legitimate user traffic, is very important. Investing in a proper bot management solution lets you identify and block malicious bots based on proper analysis while still allowing good bots to access your content and web assets.