
User Agents for Web Scraping

When scraping large amounts of information, the main problem is the risk of blocking and how to avoid it. We have already discussed that you can use captcha-solving services, proxies, or even a web scraping API that takes care of your difficulties.

However, suppose you are collecting data by making simple HTTP requests and want to build your scraper entirely yourself. In that case, you cannot do without headers in general and User-Agents in particular.

In this article, we will tell you what User Agents are, why they are needed, what they mean, and where to get them. In addition, we will provide code examples for both setting and rotating User Agents in Python and NodeJS.

What Is a User-Agent String

A User-Agent is a string that a web browser sends to a server in the User-Agent request header whenever it requests a web page. It contains information about the browser, operating system, and device.

Regularly changing the User-Agent, along with the proxy, is a crucial strategy for avoiding blocks in web scraping. By changing the User-Agent header, you can emulate different devices and browsers, which makes it harder for websites to detect and block automated scraping requests.

The Importance of User Agents in Web Scraping

User-Agents play a crucial role in web scraping: they help you avoid detection and blocking and give you access to device-specific content. This section explores why you should use User-Agents in your scraping scripts.

Avoiding IP blocking

Not all websites are bot-friendly. Many have implemented anti-bot measures to protect their content and prevent unauthorized access, so setting and changing your User-Agent is crucial to keep your IP from being blocked when you make automated requests. Not every User-Agent belongs to a human, but a request with no User-Agent at all raises red flags and instantly screams bot.

For example, suppose your script retrieves data without a headless browser and relies on simple requests. In that case, unless you set it explicitly, your HTTP library either sends no User-Agent at all or sends a default one that names the library itself (Requests, for instance, identifies itself as python-requests followed by its version). Real browsers, on the other hand, send a browser User-Agent with every request.

Websites are wary of bots and actively block them to prevent malicious activities. Without a realistic User-Agent, your IP address might be flagged and blocked, hindering your data collection efforts.

To avoid getting blocked, ensure your bot includes a User-Agent string in its requests. This simple step can make your bot appear more human-like and avoid website detection.
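
To see what a plain HTTP client announces by default, here is a minimal sketch that uses only Python's standard library (urllib instead of Requests, so it runs without third-party packages) and makes no network call. The browser string is the macOS Chrome example used elsewhere in this article, and the variable names are illustrative.

```python
import urllib.request

# Headers that urllib attaches when you do not set any yourself.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)["User-agent"]
print(default_ua)  # starts with "Python-urllib/", an obvious non-browser client

# The same kind of request with an explicit browser-like User-Agent.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
request = urllib.request.Request(
    "https://httpbin.org/headers",
    headers={"User-Agent": BROWSER_UA},
)

# urllib stores header names in the normalized form "User-agent".
print(request.get_header("User-agent"))
```

The request object is only built here, not sent; passing it to urllib.request.urlopen() would transmit the browser-like header instead of the default one.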

Mimicking different devices and browsers

User-Agent spoofing allows scrapers to mimic different devices and browsers, which can help you access alternative versions of a website and content optimized for specific devices.

This is especially important when you want to access information that is only available to specific devices. For example, Google search results can vary significantly depending on the device type used to make the request.

User Agent Syntax

The User-Agent string is a specific format that contains information about the browser, operating system, and other parameters. In general, it looks like this:

User-Agent: <product>/<product-version> <comment>

Here, <product> is the product identifier (its name or code name), <product-version> is the product version number, and <comment> is additional information, such as sub-product details.

For browsers, the syntax expands to:

Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]

Let’s take a closer look at each parameter and its meaning.

Understanding the components of a user agent

The general syntax of a User-Agent string includes the following components:

  1. Prefix and version: A product token and its version come first. For historical compatibility reasons, almost all modern browsers start with “Mozilla/5.0”.
  2. System information: The parentheses right after the prefix describe the operating system and architecture on which the request is made, for example “Windows NT 10.0; Win64; x64”.
  3. Platform details: Next comes the layout engine the browser uses to render web pages, together with its version, such as “AppleWebKit/537.36 (KHTML, like Gecko)”.
  4. Browser name: The name and version of the browser making the request follow, for example “Chrome/121.0.6167.87”.
  5. Extensions: The User-Agent may end with other parameters, such as extra compatibility tokens (e.g., “Safari/537.36”) or, in some User-Agents, language information (e.g., “en-GB”).

Let’s use this to compose a User-Agent string that specifies the Windows 10 operating system and Chrome browser version 121.0.6167.87.

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.87 Safari/537.36

User agents for other devices can be composed following a similar pattern.
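
As a rough illustration of that pattern, the sketch below assembles a desktop Chrome-style string from its components. The helper name build_chrome_user_agent is hypothetical, and the fixed AppleWebKit/537.36 and Safari/537.36 tokens are simply copied from the example above.

```python
def build_chrome_user_agent(system_info: str, chrome_version: str) -> str:
    """Assemble a desktop Chrome-style User-Agent from its components."""
    return (
        f"Mozilla/5.0 ({system_info}) "            # prefix + system information
        "AppleWebKit/537.36 (KHTML, like Gecko) "  # platform details
        f"Chrome/{chrome_version} Safari/537.36"   # browser + compatibility token
    )


windows_ua = build_chrome_user_agent("Windows NT 10.0; Win64; x64", "121.0.6167.87")
linux_ua = build_chrome_user_agent("X11; Linux x86_64", "121.0.6167.87")

print(windows_ua)
print(linux_ua)
```

Only the system information changes between the two results; mobile and non-Chrome browsers add or replace tokens, so this helper is not a general generator.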

Common formats and variations

User-agent strings often follow standard formats, like the one shown in the example above. However, some User-Agent strings may contain additional parameters, such as information about browser plugins or unique device identifiers.

To make our examples more complete, let’s consider different variations of User-Agents for different devices:

  1. Linux:
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0
  2. macOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
  3. Mobile browsers:
Mozilla/5.0 (Linux; Android 10; HD1913) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.210 Mobile Safari/537.36 EdgA/120.0.2210.126

Now that we’ve covered User-Agent syntax, let’s look at a list of up-to-date ones you can use in your projects.

List Of Latest User Agents For Web Scraping

Below are tables with regularly updated lists of common User-Agents for popular platforms. Our scrapers automatically update the list of User-Agents on a daily basis, so you can be sure you’re always using the latest.

Windows User Agents:

OS & Browser | User-Agent
Chrome 127.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Edge 126.0.2592, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/126.0.2592.113
Edge 44.18363.8131, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; Xbox; Xbox One) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edge/44.18363.8131
Firefox 128.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0
Opera 113.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0
Opera 113.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; WOW64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0

MacOS User Agents:

OS & Browser | User-Agent
Chrome 127.0.0, Mac OS X 10.15.7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Edge 126.0.2592, Mac OS X 10.15.7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/126.0.2592.113
Firefox 128.0, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:128.0) Gecko/20100101 Firefox/128.0
Safari 17.5, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15
Opera 113.0.0, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0

Pay attention to the browser version when choosing or composing a User-Agent. The best choice is usually the latest version of Chrome: because Chrome updates itself automatically, most of its users run a recent release, so a User-Agent with the latest Chrome version blends in with the largest group of real visitors.
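
One way to apply this advice is to pick, from whatever list you maintain, the entry with the highest Chrome major version. The sketch below is a simple illustration; the function name and the candidate list are made up for the example. Note that Edge and Opera strings also carry a Chrome/ token, so filter them out first if you need Chrome specifically.

```python
import re


def chrome_major_version(user_agent: str) -> int:
    """Return the Chrome major version found in a User-Agent, or -1 if absent."""
    match = re.search(r"Chrome/(\d+)", user_agent)
    return int(match.group(1)) if match else -1


candidates = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
]

# Strings without a Chrome token score -1 and are never selected
# as long as at least one Chrome entry is present.
newest_chrome = max(candidates, key=chrome_major_version)
print(newest_chrome)
```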

How to Set User Agent

The configuration of User Agents depends on the context in which you want to use them. Typically, this involves your scripts that make requests to different websites. Let’s look at how to set User-Agents in two popular programming languages.

We will make requests to the website https://httpbin.org/headers, which returns all headers, including the user agent header:

  1. Python. We will use the Requests library to make the request:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}

response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.text)

Output:

{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-65c0adfb-7a198b2f3bf4dff157696ce2"
  }
}

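  2. NodeJS. We will use fetch(), which is built into Node.js 18 and later, to make the request:
fetch('https://httpbin.org/headers', {
    headers: {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    }
})
    .then(response => response.text())  // read the response body as text
    .then(body => console.log(body))    // print the headers echoed back by the server
    .catch(error => console.error(error));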

The response is similar to the previous one.

If you want to change your User-Agent in your browser rather than in a script, you can override it in the browser’s developer tools (DevTools), typically in the network settings or the device emulation mode. This can be useful for testing websites or web applications. In addition, there are browser extensions that let you switch User-Agents easily.

How to Rotate User Agents

User-Agent rotation is an important part of a strategy to avoid IP address blocking. It means changing the User-Agent string that your software sends, ideally with each request. This can help you shorten the time between requests while lowering the risk of being blocked.

Importance of rotating user agents

As we mentioned earlier, User-Agent rotation is a crucial mechanism for bypassing protection measures and ensuring the continuity of web scraping operations and automated processes on the Internet. In short, using User-Agent rotation allows you to:

  1. Increase the chances of avoiding IP address blocking.
  2. More effectively mask requests.
  3. Increase the reliability of the scraper.
  4. Emulate requests from different devices and browsers.

In other words, User-Agent rotation allows you to mask requests, making them look more like regular requests made by human users, to access content optimized for specific platforms, or to test the compatibility of web pages on different devices. And if any User-Agent is temporarily blocked or stops working, you can switch to another one to continue scraping without downtime.
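
The last point, switching away from a User-Agent that stops working, can be sketched as a small pool that retires entries once a site rejects them. Everything here is an illustrative assumption: the class and function names, the choice of 403 and 429 as "blocked" signals, and the fake_send stand-in that replaces a real HTTP call so the example runs offline.

```python
from collections import deque
from typing import Callable, Iterable, Tuple

BLOCKED_STATUSES = (403, 429)  # assumption: treat these statuses as "blocked"


class UserAgentPool:
    """Round-robin pool that drops User-Agents a site starts rejecting."""

    def __init__(self, user_agents: Iterable[str]) -> None:
        self._active = deque(user_agents)

    def next(self) -> str:
        if not self._active:
            raise RuntimeError("no usable User-Agents left")
        user_agent = self._active[0]
        self._active.rotate(-1)  # move the entry just used to the back
        return user_agent

    def retire(self, user_agent: str) -> None:
        if user_agent in self._active:
            self._active.remove(user_agent)


def fetch_with_pool(url: str, pool: UserAgentPool,
                    send: Callable[[str, str], int],
                    max_attempts: int = 3) -> Tuple[str, int]:
    """Try successive User-Agents until one is not blocked."""
    for _ in range(max_attempts):
        user_agent = pool.next()
        status = send(url, user_agent)
        if status not in BLOCKED_STATUSES:
            return user_agent, status
        pool.retire(user_agent)  # stop using the rejected User-Agent
    raise RuntimeError("every attempt was blocked")


def fake_send(url: str, user_agent: str) -> int:
    # Stand-in for a real request: pretend the site rejects very old Chrome.
    return 403 if "Chrome/51." in user_agent else 200


pool = UserAgentPool([
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/51.0.2704.103 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
])
used_agent, status = fetch_with_pool("https://example.com", pool, fake_send)
print(status, used_agent)
```

In a real scraper, send would wrap your HTTP client and return the response status code; the pool logic stays the same.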

Techniques for rotating user agents in web scraping

Now that we have covered why User-Agent rotation is necessary, let’s look at simple examples in Python and NodeJS that allow you to implement this functionality.

We will use the previous examples as a basis and add a variable containing a list of User-Agents and a loop that will call different User-Agents from the list. Then, we will make a request to the website, which will return the contents of the headers, display it on the screen, and move on to the next User-Agent.

The algorithm we’ve considered can be implemented in Python as follows:

import requests

# List of User Agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
]

# Index to track the current User Agent
user_agent_index = 0

# Make a request with a rotated User Agent
def make_request(url):
    global user_agent_index
    headers = {'User-Agent': user_agents[user_agent_index]}
    response = requests.get(url, headers=headers)
    user_agent_index = (user_agent_index + 1) % len(user_agents)
    return response.text

# Example usage
url_to_scrape = 'https://httpbin.org/headers'

for _ in range(5):
    html_content = make_request(url_to_scrape)
    print(html_content)
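
If you prefer to avoid a global index, the same round-robin order can be expressed with itertools.cycle from the standard library. This is only an alternative sketch; the shortened list and the next_headers helper are illustrative, and no request is sent.

```python
from itertools import cycle

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0",
]

# cycle() yields the items in order and restarts after the last one.
rotation = cycle(user_agents)


def next_headers() -> dict:
    """Build request headers using the next User-Agent in the rotation."""
    return {"User-Agent": next(rotation)}


picked = [next_headers()["User-Agent"] for _ in range(4)]
for user_agent in picked:
    print(user_agent)
```

The fourth pick wraps around to the first entry, which is the same behavior as the modulo arithmetic in the example above.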

For NodeJS, you can use the following code:

const axios = require('axios');

// List of User Agents
const userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
];

// Index to track the current User Agent
let userAgentIndex = 0;

// Function to make a request with a rotated User Agent
async function makeRequest(url) {
    // Pick the User Agent and advance the index before awaiting;
    // otherwise all calls started in the loop below would read index 0
    const headers = {'User-Agent': userAgents[userAgentIndex]};
    userAgentIndex = (userAgentIndex + 1) % userAgents.length;
    const response = await axios.get(url, {headers});
    return response.data;
}

// Example usage: httpbin echoes the request headers back
const urlToScrape = 'https://httpbin.org/headers';
for (let i = 0; i < 5; i++) {
    makeRequest(urlToScrape)
        .then(content => console.log(content))
        .catch(error => console.error(error));
}

Both of these options successfully handle User-Agent rotation, and if you find them useful, you are free to use and modify them according to your needs.

How Websites Use User Agents for Identification

Content delivery optimization: Most websites can serve different layouts or styles based on the user agent. For example, a mobile user agent might trigger the website to serve a mobile-friendly version with touch-friendly navigation and simplified content. Additionally, certain features or optimizations may only be available or needed for specific browsers. For instance, a website might use a different method for rendering graphics on Google Chrome compared to Firefox.

Analytics and logging: User agents help in understanding the types of devices and browsers visitors are using. This information is valuable for website analytics to optimize content and improve user experience. Also, data on user agents can be used to track the popularity of different browsers and operating systems over time.

Access control and security: Websites can detect and block known malicious bots and known web scrapers based on their user agent strings. Some sites maintain lists of known bad user agents to automatically deny access. User agents can be used in conjunction with IP addresses to enforce rate limits. If excessive requests are detected from a particular user agent, the server might slow down or block access temporarily.

Feature support and compatibility: Web servers identify the browser, so they can enable or disable features that are known to work or fail in specific environments. For instance, a site might avoid using a particular HTML5 feature on an older browser that doesn’t support it. Furthermore, websites can load additional scripts or polyfills to support features in older browsers identified by their user agent strings.
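
To make the rate limiting described under access control and security more concrete, here is a minimal sketch of a sliding-window limiter keyed by the pair of IP address and User-Agent. The class name, the limits, and the explicit now argument (used instead of a real clock so the example is deterministic) are all assumptions made for illustration; real anti-bot systems are considerably more elaborate.

```python
from collections import defaultdict, deque


class ClientRateLimiter:
    """Allow at most max_requests per window for each (IP, User-Agent) pair."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # (ip, user_agent) -> request timestamps

    def allow(self, ip: str, user_agent: str, now: float) -> bool:
        hits = self._hits[(ip, user_agent)]
        # Forget timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the server would throttle or block
        hits.append(now)
        return True


limiter = ClientRateLimiter(max_requests=2, window_seconds=10.0)

same_agent = [limiter.allow("203.0.113.7", "UA-A", t) for t in (0.0, 1.0, 2.0)]
other_agent = limiter.allow("203.0.113.7", "UA-B", 2.0)
after_window = limiter.allow("203.0.113.7", "UA-A", 10.5)

print(same_agent, other_agent, after_window)
```

The third request from the same pair is rejected, while a request from the same IP with a different User-Agent is counted separately. That is why limits keyed this way are sensitive to User-Agent changes.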

Why Is a User Agent Important for Web Scraping?

Content negotiation: Websites often serve different content based on the device and browser. For example, mobile devices may receive a mobile-optimized version of the site, while desktop browsers get a more feature-rich version. By identifying as a specific browser or device, web scraping tools can ensure they receive the correct version of the content.

Tailoring user experience: Some websites customize the user experience based on the user agent. This includes things like enabling or disabling certain features, changing layouts, and adjusting the presentation to better suit the identified client.

Differentiating human users from bots: By analyzing user agents, websites can differentiate between human users and web scraping bots. They may serve CAPTCHAs or other challenges to suspected bots based on their user agent header.

Avoiding detection: Websites often look for an unusual or generic user agent as an indicator of scraping activity. User agent switching that mimics a real browser helps web scrapers avoid detection and blocking.

Respecting website terms of service: Some websites explicitly forbid data extraction in their terms of service but allow access to most web browsers. Using a legitimate user agent helps scrapers respect these boundaries and reduce the risk of legal issues.

Content variations: Websites may serve different content to different devices or browsers. For example, a news site might serve more text-based content to mobile devices and media-rich content to desktops. Using the appropriate user agent ensures the scraper gets the desired version of the content. Different user agents can access different web content, allowing scrapers to customize their requests based on the desired content and target audience.

Testing and validation: By simulating a different user agent, scrapers can test how the target website behaves across various browsers and devices. This is particularly useful with developer tools for understanding cross-browser compatibility and device-specific issues.
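
The check for an unusual or generic user agent mentioned under avoiding detection can be approximated with a few lines of Python. The marker list below is a small illustrative assumption, not a list any particular website is known to use.

```python
# Substrings that typically appear in the default User-Agent of HTTP
# libraries and command-line tools (illustrative, not exhaustive).
GENERIC_CLIENT_MARKERS = (
    "python-requests",
    "python-urllib",
    "curl",
    "wget",
    "axios",
    "node-fetch",
)


def looks_automated(user_agent: str) -> bool:
    """Flag empty User-Agents and ones that name a generic HTTP client."""
    normalized = user_agent.strip().lower()
    if not normalized:
        return True  # a missing User-Agent is itself suspicious
    return any(marker in normalized for marker in GENERIC_CLIENT_MARKERS)


samples = [
    "",
    "python-requests/2.31.0",
    "curl/8.4.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
]
flags = [looks_automated(sample) for sample in samples]
print(flags)
```

Only the browser-like string passes this check, which is why a scraper that keeps its library's default User-Agent is easy to spot.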

How to Check User Agents?

In order to check user agents, websites analyze the User-Agent header in the HTTP request. This process helps them identify the type of client making the request and respond accordingly. Here’s how different web pages check the user agent header:

  1. Receiving the request: When a client (browser, scraper, etc.) sends an HTTP request to a web server, it includes various headers, including the User-Agent.
  2. Extracting the user-agent header: The server reads the User-Agent header from the request to understand the client’s identity.
  3. Analyzing the user-agent string: The server parses the User-Agent string to identify the browser, operating system, device type, and sometimes even the version of the browser.
  4. Responding appropriately: Based on the user agent, the server can: serve different content (e.g., mobile vs. desktop), allow or block requests (e.g., blocking known bots), or apply rate limiting or other access controls.

Below is a Python code snippet that mimics the functionality of a web server checking the user agent string. This example uses the Flask framework to create a simple web server that checks the User-Agent headers from incoming requests:

from flask import Flask, request, jsonify

app = Flask(__name__)

# List of known user agents to block
blocked_user_agents = [
    'BadBot/1.0',  # Example of a known bad bot user agent
    # A well-known crawler, listed here only to demonstrate an exact-match block
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
]

@app.route('/')
def check_user_agent():
    user_agent = request.headers.get('User-Agent', '')
    
    # Log the user agent
    print(f"User-Agent: {user_agent}")
    
    # Check if the user agent is blocked
    if user_agent in blocked_user_agents:
        return jsonify({"message": "Access Denied"}), 403
    
    # Respond based on the type of user agent
    if 'Mobile' in user_agent or 'Android' in user_agent:
        return jsonify({"message": "Mobile Content"}), 200
    elif 'Windows' in user_agent or 'Macintosh' in user_agent:
        return jsonify({"message": "Desktop Content"}), 200
    else:
        return jsonify({"message": "Generic Content"}), 200

if __name__ == '__main__':
    app.run(debug=True)
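
The analysis step (step 3 above) can also be sketched without a web framework. The parser below splits a browser-style User-Agent into its prefix, system information, and product tokens using regular expressions. It is a simplified illustration with hypothetical names; production systems usually rely on dedicated, regularly updated parsing libraries.

```python
import re

# prefix, then "(system information)", then the remaining product tokens
UA_PATTERN = re.compile(r"^(?P<prefix>\S+) \((?P<system>[^)]*)\)\s*(?P<rest>.*)$")


def parse_user_agent(user_agent: str) -> dict:
    """Split a browser-style User-Agent into prefix, system info, and products."""
    match = UA_PATTERN.match(user_agent)
    if match is None:
        # Not browser-shaped (e.g. "BadBot/1.0"): return it unparsed.
        return {"prefix": user_agent, "system": [], "products": {}}
    system = [part.strip() for part in match.group("system").split(";")]
    # Collect name/version pairs such as Chrome/121.0.6167.87.
    products = dict(re.findall(r"([A-Za-z]+)/([\w.]+)", match.group("rest")))
    return {"prefix": match.group("prefix"), "system": system, "products": products}


parsed = parse_user_agent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.6167.87 Safari/537.36"
)
print(parsed["prefix"])
print(parsed["system"])
print(parsed["products"])
```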

Best Practices and Tips

To increase your success in data scraping, we recommend following some guidelines that can help reduce the risk of getting blocked. While not mandatory, these tips can enhance your script.

Updating user agents regularly

Regular User-Agent rotation helps to prevent blocking. Websites have more difficulty detecting and blocking bots that constantly change their User-Agent.

Additionally, it’s essential to keep your User-Agent up to date. Using outdated User-Agents (e.g., Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36) can also lead to blocking.

Keep Random Intervals Between Requests

Besides keeping User-Agents up to date, don’t forget to implement random delays between requests. Real users neither send requests back-to-back without pauses nor wait a fixed interval (e.g., exactly 5 seconds) between them. Both patterns are typical of bots and are easily detectable.

Random delays between requests help to simulate typical human user behavior, making it harder to detect automated processes. Additionally, delays can reduce the load on the server and make scraping less suspicious.

Rotate User Agents

As mentioned, rotating User-Agents reduces the risk of IP blocking since each request appears to come from a different user. This is especially useful if a website has restrictions on the frequency of requests from the same User-Agent. By rotating User-Agents, you can bypass these restrictions and continue accessing the website without issues.

How to Avoid Getting Your UA Banned

There are multiple approaches to keeping your user agent strings from being banned. Let’s take a closer look at four popular methods:

1. Rotate User Agents

Rotating user agents is an effective technique to avoid getting your user agent banned while scraping websites. As you rotate user agents in your HTTP requests, you can simulate traffic from multiple devices and browsers, making it harder for websites to detect and block your scraping activity.

  • Diversifying requests: Thanks to user agent rotation, your requests appear to come from various browsers and devices, reducing the likelihood that a single user agent will be flagged for suspicious activity.
  • Avoiding patterns: Consistently using the same user agent can create a detectable pattern. Rotating them introduces randomness, making it harder for anti-scraping mechanisms to identify your scraper.
  • Evading detection algorithms: Some websites use machine learning algorithms to detect scraping based on user agent patterns. Rotate user agents to bypass these algorithms.
  • Reducing rate limiting: Websites may impose rate limits based on the user agent header. Rotating user agents can distribute the requests across different identities, potentially bypassing these limits.

Here’s a Python code snippet that demonstrates how to implement rotation of the user agent string using the requests library. This example will fetch a web page using different user agents randomly selected from a predefined list.

import requests
from random import choice

# Define the URL you want to scrape
url = 'https://example.com'

# List of different user agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1',
    'Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Mobile Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0'
]

# Function to make a request with a random user agent
def fetch_page_with_random_user_agent(url):
    # Choose a random user agent from the list
    user_agent = choice(user_agents)
    
    # Set up the headers with the chosen user agent
    headers = {
        'User-Agent': user_agent
    }
    
    # Send the HTTP request with the custom headers
    response = requests.get(url, headers=headers)
    
    # Print the chosen user agent and the response status
    print(f"Used User-Agent: {user_agent}")
    print(f"Response Status Code: {response.status_code}")
    
    return response.content

# Example usage
for _ in range(5):  # Fetch the page 5 times with different user agents
    content = fetch_page_with_random_user_agent(url)
    # Process the content as needed

2. Keep Random Intervals Between Requests

Adding random intervals between requests is another effective method to avoid detection and banning while scraping websites. By introducing randomness in the timing of your requests, you can mimic human browsing behavior, making it harder for websites to detect your scraping activity as automated.

  • Mimicking human behavior: Human browsing behavior is not consistent and has natural pauses. Random intervals between requests simulate this behavior, making your scraper appear more like a real user.
  • Reducing pattern detection: Consistent request patterns can be easily detected by anti-scraping mechanisms. Random intervals introduce variability, making it harder to identify scraping activity.
  • Evasion of bot detection: Some websites employ sophisticated algorithms to detect bots based on the frequency and regularity of requests. Random intervals can help evade these detections.
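
The following Python snippet combines a randomly chosen user agent with a random delay after each request:

import requests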
import time
import random

# Define the URL you want to scrape
url = 'https://example.com'

# List of different user agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1',
    'Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Mobile Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0'
]

# Function to make a request with a random user agent and random interval
def fetch_page_with_random_delay(url):
    # Choose a random user agent from the list
    user_agent = random.choice(user_agents)
    
    # Set up the headers with the chosen user agent
    headers = {
        'User-Agent': user_agent
    }
    
    # Send the HTTP request with the custom headers
    response = requests.get(url, headers=headers)
    
    # Print the chosen user agent and the response status
    print(f"Used User-Agent: {user_agent}")
    print(f"Response Status Code: {response.status_code}")
    
    return response.content

# Example usage
for _ in range(5):  # Fetch the page 5 times with different user agents
    # Fetch the page with random user agent
    content = fetch_page_with_random_delay(url)
    # Process the content as needed
    
    # Introduce a random delay between 1 and 5 seconds
    delay = random.uniform(1, 5)
    print(f"Sleeping for {delay:.2f} seconds")
    time.sleep(delay)

3. Use Up-to-date User Agents

Use modern, up-to-date user agents to lower the risk of getting banned while scraping. Websites often maintain lists of outdated UAs and of UAs associated with bots and scrapers. By using an up-to-date user agent, you can blend in with legitimate traffic, reducing the likelihood of being flagged or blocked.

  • Avoiding known bot user agents: Websites often block or monitor requests from outdated or commonly used bot UA strings. Using the latest user agents helps you avoid these lists.
  • Mimicking real users: Up-to-date user agents reflect current browser versions that real users are likely to be using, making your scraping activity less suspicious.
  • Staying compatible: Some websites serve different content or features based on the user agent. Using common, modern user agents helps ensure that you receive the same content as a real user.
  • Avoiding detection: Anti-scraping mechanisms are often updated to recognize outdated user agents. Keeping your user agents up-to-date helps evade these detections.

The snippet below shows one way to do this in Python. The endpoint is a placeholder, so point it at a real service or maintain your own list:

import requests
import random

# URL to fetch the latest user agents (example URL; you might need to
# use an actual service or maintain your own list)
latest_user_agents_url = 'https://api.example.com/latest-user-agents'

# Function to get the latest user agents
def get_latest_user_agents():
    try:
        response = requests.get(latest_user_agents_url, timeout=10)
        if response.status_code == 200:
            return response.json()['user_agents']
    except (requests.RequestException, ValueError, KeyError):
        pass  # Network error or unexpected payload: use the fallback below
    # Fallback to a predefined list if fetching fails
    # (placeholder values; replace them with current browser versions)
    return [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15',
        'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1',
        'Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Mobile Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0'
    ]

# Function to make a request with a random up-to-date user agent
def fetch_page_with_latest_user_agent(url, user_agents):
    # Choose a random user agent from the list
    user_agent = random.choice(user_agents)
    
    # Set up the headers with the chosen user agent
    headers = {
        'User-Agent': user_agent
    }
    
    # Send the HTTP request with the custom headers
    response = requests.get(url, headers=headers)
    
    # Print the chosen user agent and the response status
    print(f"Used User-Agent: {user_agent}")
    print(f"Response Status Code: {response.status_code}")
    
    return response.content

# Get the latest user agents
user_agents = get_latest_user_agents()

# Example usage
url = 'https://example.com'
for _ in range(5):  # Fetch the page 5 times with different user agents
    content = fetch_page_with_latest_user_agent(url, user_agents)
    # Process the content as needed

4. Custom User Agents

Using custom user agents can be another effective method to avoid detection and banning while web scraping. By creating custom user agent strings, you can tailor your requests to appear as if they are coming from specific devices or browsers, and even include additional metadata that can further obscure your scraping activity.

  • Tailoring to specific needs: Custom user agents can be designed to mimic specific browsers, operating systems, and devices, so the web server is less likely to identify the traffic as web scraping.
  • Adding complexity: By including additional metadata in your user agent strings, you can introduce variability that can confuse detection algorithms.
  • Avoiding known patterns: Custom user agents can help you avoid detection by steering clear of commonly blocked or flagged user agent information.
  • Evading simple filters: Websites that use simple filters to block other user agents may not recognize your custom user agents, allowing your requests to pass through.

Here is a minimal Python example that uses placeholder custom user agents:

import requests
import random

# Define the URL you want to scrape
url = 'https://example.com'

# List of custom user agents
custom_user_agents = [
    'CustomUserAgent/1.0 (Windows NT 10.0; Win64; x64) CustomBrowser/91.0.4472.124',
    'CustomUserAgent/1.0 (Macintosh; Intel Mac OS X 10_15_7) CustomBrowser/14.0.3',
    'CustomUserAgent/1.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) CustomBrowser/14.1.1',
    'CustomUserAgent/1.0 (Linux; Android 10; SM-G973F) CustomBrowser/79.0.3945.79',
    'CustomUserAgent/1.0 (Windows NT 10.0; WOW64) CustomBrowser/45.0'
]

# Function to make a request with a custom user agent
def fetch_page_with_custom_user_agent(url):
    # Choose a random custom user agent from the list
    user_agent = random.choice(custom_user_agents)
    
    # Set up the headers with the chosen user agent
    headers = {
        'User-Agent': user_agent
    }
    
    # Send the HTTP request with the custom headers
    response = requests.get(url, headers=headers)
    
    # Print the chosen user agent and the response status
    print(f"Used Custom User-Agent: {user_agent}")
    print(f"Response Status Code: {response.status_code}")
    
    return response.content

# Example usage
for _ in range(5):  # Fetch the page 5 times with different custom user agents
    content = fetch_page_with_custom_user_agent(url)
    # Process the content as needed

Conclusion and Takeaways

This article has provided an overview of User-Agents in the context of web scraping. We reviewed the reasons for using User-Agents, explored the basics of the syntax, and offered a list of actual User-Agents and code examples for setting up User-Agents in two popular programming languages.

In addition, we described how to improve the effectiveness of User-Agents by rotating them and explained the importance of this practice. Finally, we concluded the article with practical tips to help you reduce the risk of being blocked while scraping and effectively mimic the behavior of real users.

Dr. Harun Ar Rashid

Dr. MD Harun Ar Rashid, FCPS, MD, PhD, is a highly respected medical specialist celebrated for his exceptional clinical expertise and unwavering commitment to patient care. With advanced qualifications including FCPS, MD, and PhD, he integrates cutting-edge research with a compassionate approach to medicine, ensuring that every patient receives personalized and effective treatment. His extensive training and hands-on experience enable him to diagnose complex conditions accurately and develop innovative treatment strategies tailored to individual needs. In addition to his clinical practice, Dr. Harun Ar Rashid is dedicated to medical education and community outreach, often participating in initiatives that promote health awareness and advance medical knowledge. His career is a testament to the high standards represented by his credentials, and he continues to contribute significantly to his field, driving improvements in both patient outcomes and healthcare practices.
