Proxy IP Address Rotation

Till can randomly rotate through a list of third-party proxy IP addresses that you supply.

IP address rotation helps you avoid detection by the anti-scraping measures that some websites use.

To turn on this feature, set the following configuration:

Configuration    Value
proxy-file       Path to a text file that contains a list of proxies. For example: ~/.config/datahen/till/proxylist.txt

Important: If you don't specify a proxy list file, Till will use your real local IP address, which is NOT advisable when scraping at scale.

Configuration Steps

Step 1: Create the proxy list file

First, create a text file containing a list of proxies, one proxy per line, in the following format:

http://<user>:<password>@<ip>:<port>

For example, you can create the file ~/.config/datahen/till/proxylist.txt with the following contents:

http://your-user:<password>@<ip>:1
http://your-user:<password>@<ip>:2
http://your-user:<password>@<ip>:3
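Before pointing Till at the file, it can help to check that every line has the expected shape. The following is a minimal sketch, not part of Till: validate_proxy_list is a hypothetical helper name, and the pattern only checks the overall http://<user>:<password>@<ip>:<port> layout. It assumes that every line, including blank ones, must be a proxy entry, which may be stricter than what Till itself accepts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Till): report lines that do not look like
# http://<user>:<password>@<ip>:<port>.
# Returns 0 if every line matches, 1 if any line does not, 2 if the file is unreadable.
validate_proxy_list() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "cannot read proxy list: $file" >&2
        return 2
    fi
    # -n prints line numbers, -v selects non-matching lines, -E enables extended regex.
    # grep exits 0 when it selects at least one line, i.e. when a bad entry exists.
    if grep -nvE '^http://[^:@/[:space:]]+:[^@/[:space:]]+@[^:@/[:space:]]+:[0-9]+$' "$file"; then
        echo "invalid proxy entries found in $file" >&2
        return 1
    fi
    return 0
}

# Example usage (path taken from the example above):
#   validate_proxy_list ~/.config/datahen/till/proxylist.txt
```

Any line the check rejects is printed with its line number, so a malformed entry can be fixed before Till reads the file.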

Step 2: Configure Till

Next, configure Till to point to the file that you created in Step 1 above.

There are two ways to configure Till: through the CLI or in the config file.

CLI

To configure Till via the CLI, run the serve command with the --proxy-file flag.

$ till serve --proxy-file ~/.config/datahen/till/proxylist.txt

Note: The example above assumes that you've created the proxy list file at ~/.config/datahen/till/proxylist.txt.

Config File

If you have already created a config.yaml file, you can add a configuration like so:

# Proxy IP settings.
# Path to the text file that contains a list of proxy IPs.
# If you don't specify this, Till will use your real local IP address.
proxy-file: ~/.config/datahen/till/proxylist.txt

Note: The example above assumes that you've created the proxy list file at ~/.config/datahen/till/proxylist.txt.

Step 3: Verify Till

Now, verify that your Till configuration is working.

To verify it using curl, run the following command:

$ curl 'https://api.ipify.org' -H 'X-DH-Cache-Freshness: now' -k --proxy http://localhost:2933

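The response should show one of the proxy IP addresses that you've supplied rather than your real IP address. Because Till picks a proxy at random, repeating the request should eventually show other addresses from your list.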
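To see the rotation over several requests, the curl command above can be wrapped in a small loop. This is a sketch under assumptions: check_rotation and summarize_ips are hypothetical helper names, Till is assumed to be listening on localhost:2933 as in the command above, and the default of 10 requests is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: count how often each distinct line appears on stdin,
# most frequent first.
summarize_ips() {
    sort | uniq -c | sort -rn
}

# Hypothetical helper: send N requests through Till and summarize the IPs returned.
#   $1 = number of requests (default 10)
#   $2 = Till proxy address (default http://localhost:2933)
check_rotation() {
    requests="${1:-10}"
    proxy="${2:-http://localhost:2933}"
    i=0
    while [ "$i" -lt "$requests" ]; do
        # Same request as the verification command, with -s to silence progress output.
        curl -s 'https://api.ipify.org' -H 'X-DH-Cache-Freshness: now' -k --proxy "$proxy"
        echo   # the response has no trailing newline, so add one per request
        i=$((i + 1))
    done | summarize_ips
}

# Example usage (requires a running Till instance):
#   check_rotation 20
```

Each output line is a count followed by an IP address. If only your own address appears, Till is probably not reading the proxy list file.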