Proxy IP Address Rotation
Till can randomly rotate through a list of third-party proxy IP addresses that you supply.
Rotating IP addresses helps you avoid detection by the anti-scraping measures that some websites use.
To turn on this feature, you need to set the following configuration:
| Configuration | Description |
|---|---|
| proxy-file | Path to a text file that contains a list of proxies. For example: ~/.config/datahen/till/proxylist.txt |
Important: If you don't specify a proxy list file, Till will use your real local IP address, which is NOT advisable when scraping at scale.
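To picture what rotation and the fallback mean for each request, here is a minimal Python sketch. It assumes every request picks a proxy uniformly at random and that an empty list means a direct connection from your own IP; the helper pick_proxy and the 203.0.113.x addresses are hypothetical, and this is not Till's actual implementation.

```python
import random
from typing import Optional, Sequence


def pick_proxy(proxies: Sequence[str], rng: random.Random) -> Optional[str]:
    """Hypothetical illustration of per-request proxy rotation.

    Assumes a uniform random pick for every request. An empty list returns
    None, standing in for "no proxy file: use the real local IP address".
    """
    if not proxies:
        return None
    return rng.choice(proxies)


if __name__ == "__main__":
    # Documentation-range (TEST-NET-3) addresses, not real proxies.
    proxies = [
        "http://your-user:your-password@203.0.113.1:8080",
        "http://your-user:your-password@203.0.113.2:8080",
        "http://your-user:your-password@203.0.113.3:8080",
    ]
    rng = random.Random()
    for request_no in range(5):
        print(request_no, pick_proxy(proxies, rng) or "direct (real IP)")
```

Because each pick is independent, the same proxy can appear on consecutive requests; a round-robin scheme would be an alternative design, but nothing here implies that Till uses one.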
First, create a text file containing a list of proxies, one proxy per line, with the following format:
http://<user>:<password>@<ip>:<port>
For example, you can create the file ~/.config/datahen/till/proxylist.txt with the following contents:
http://your-user:<password>@<ip>:1
http://your-user:<password>@<ip>:2
http://your-user:<password>@<ip>:3
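Before pointing Till at the file, it can help to check that every line matches the http://<user>:<password>@<ip>:<port> format. The following is a minimal Python sketch of such a check; the helper names (parse_proxy_line, write_proxy_list, load_proxy_list) are hypothetical, it accepts only http:// URLs because that is the format shown above, and it is not how Till itself parses the file.

```python
from pathlib import Path
from typing import List, NamedTuple
from urllib.parse import urlsplit


class ProxyEntry(NamedTuple):
    user: str
    password: str
    host: str
    port: int


def parse_proxy_line(line: str) -> ProxyEntry:
    """Split one 'http://<user>:<password>@<ip>:<port>' line into its parts."""
    parts = urlsplit(line.strip())
    if parts.scheme != "http":
        raise ValueError(f"expected an http:// proxy URL, got {line!r}")
    if not parts.username or parts.password is None:
        raise ValueError(f"missing user or password in {line!r}")
    # .port itself raises ValueError if the port is not a valid number.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"missing ip or port in {line!r}")
    return ProxyEntry(parts.username, parts.password, parts.hostname, parts.port)


def write_proxy_list(path: str, proxies: List[str]) -> Path:
    """Create the proxy list file (and parent directories), one proxy per line."""
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(proxies) + "\n", encoding="utf-8")
    return target


def load_proxy_list(path: str) -> List[ProxyEntry]:
    """Read and validate every non-blank line of the proxy list file."""
    text = Path(path).expanduser().read_text(encoding="utf-8")
    return [parse_proxy_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Validate an existing file without modifying it.
    for entry in load_proxy_list("~/.config/datahen/till/proxylist.txt"):
        print(f"{entry.host}:{entry.port} (user {entry.user})")
```

A malformed line raises ValueError with the offending text, which is usually easier to debug than a proxy that silently fails at request time.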
Next, configure Till to use the file you created in the previous step.
You can do this in two ways: through the CLI or in the config file.
To configure Till via the CLI, run the command with the --proxy-file flag:
$ till serve --proxy-file ~/.config/datahen/till/proxylist.txt
Note: The example above assumes that you've created the proxy list file at ~/.config/datahen/till/proxylist.txt.
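If you start Till from a Python script instead of a shell, keep in mind that ~ is normally expanded by your shell, and subprocess with an argument list does not go through a shell. The sketch below, which assumes the till binary is on your PATH and uses a hypothetical helper till_serve_command, expands the path explicitly before launching the same command.

```python
import os
import subprocess
from typing import List

DEFAULT_PROXY_FILE = "~/.config/datahen/till/proxylist.txt"


def till_serve_command(proxy_file: str = DEFAULT_PROXY_FILE) -> List[str]:
    """Build the argv for 'till serve --proxy-file <path>'.

    '~' is expanded here because no shell is involved to expand it.
    """
    return ["till", "serve", "--proxy-file", os.path.expanduser(proxy_file)]


if __name__ == "__main__":
    # Assumes 'till' is on PATH; this call blocks for as long as Till runs.
    subprocess.run(till_serve_command(), check=True)
```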
If you already have a config.yaml file, add the following setting to it:
# Proxy IP settings.
# Path to the text file that contains a list of proxy IPs.
# If you don't specify this, Till will use your real local IP address.
proxy-file: ~/.config/datahen/till/proxylist.txt
Note: The example above assumes that you've created the proxy list file at ~/.config/datahen/till/proxylist.txt.
Finally, verify that your Till configuration is working.
To check it with curl, run the following command:
$ curl 'https://api.ipify.org' -H 'X-DH-Cache-Freshness: now' -k --proxy http://localhost:2933
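The response should show one of the proxy IP addresses you supplied rather than your own. Because Till picks proxies at random, running the command several times should generally show different addresses from your list.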
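The same check can be scripted in Python with only the standard library. This sketch mirrors the curl command: it sends requests through Till at http://localhost:2933, adds the same X-DH-Cache-Freshness: now header (intended, as we understand it, to make Till fetch a fresh response instead of a cached one), and disables certificate verification the way curl's -k does. The helper names are hypothetical and the script assumes Till is already running locally.

```python
import ssl
import urllib.request
from collections import Counter

TILL_PROXY = "http://localhost:2933"   # Till's address, as in the curl example
IP_ECHO_URL = "https://api.ipify.org"  # replies with the caller's public IP


def build_opener() -> urllib.request.OpenerDirector:
    """Route HTTP and HTTPS through Till, skipping TLS checks like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    proxy = urllib.request.ProxyHandler({"http": TILL_PROXY, "https": TILL_PROXY})
    return urllib.request.build_opener(proxy, urllib.request.HTTPSHandler(context=ctx))


def build_request() -> urllib.request.Request:
    # Same header as the curl command.
    return urllib.request.Request(IP_ECHO_URL, headers={"X-DH-Cache-Freshness": "now"})


def sample_exit_ips(n: int = 5) -> Counter:
    """Make n requests through Till and count which exit IPs were seen."""
    opener = build_opener()
    ips: Counter = Counter()
    for _ in range(n):
        with opener.open(build_request(), timeout=30) as resp:
            ips[resp.read().decode().strip()] += 1
    return ips


if __name__ == "__main__":
    # Several distinct IPs from your proxy list suggest rotation is working.
    print(sample_exit_ips())
```

Disabling verification is only reasonable here because the traffic goes to a local Till instance for testing; avoid copying that setting into code that talks to untrusted networks.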