Sticky Sessions
By default, Till randomizes the Proxy IP and User-Agent string for every request that passes through it, and gives each request its own dedicated Cookie Jar.
Sometimes, to avoid anti-scraper detection and mimic real user behavior, you need to reuse the same values across several requests.
The Sticky Sessions feature allows you to "stick" the Proxy IP, User-Agent, and Cookie Jar to a certain session.
You can then specify a set of requests to use this session.
When used properly, it enables advanced scraping scenarios that let you scale your scrapers while avoiding anti-scraper detection.
The Sticky Sessions feature also lets you manage cookies across your requests.
It lets you "stick" a cookie jar to a session. When multiple requests use this session, they send the cookies stored in that cookie jar. Every cookie that the target server sets or modifies is also saved into the session's cookie jar.
This is useful for advanced scraping scenarios and for avoiding anti-scrapers, because it lets you mimic a real user interacting with the target website.
To use sticky sessions, specify the X-DH-Session-ID header on the HTTP request.
HTTP Header | Value |
---|---|
X-DH-Session-ID | Any string value. For example: foo |
This is an example of using sticky sessions with curl:
$ curl 'https://fetchtest.datahen.com/echo/request' -H 'X-DH-Session-ID: foo' -H 'X-DH-Cache-Freshness: now' -k --proxy http://localhost:2933
Note: The above assumes that you're using the foo session ID for this request.
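The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official Till client: the helper names `build_request`, `build_opener`, and `fetch` are hypothetical, and it assumes Till is listening on http://localhost:2933 as in the curl example. Certificate verification is disabled to mirror curl's -k flag.

```python
import ssl
import urllib.request

# Assumption: Till's proxy address, taken from the curl example above.
TILL_PROXY = "http://localhost:2933"


def build_request(url: str, session_id: str) -> urllib.request.Request:
    """Build a GET request that asks Till to use the given sticky session."""
    req = urllib.request.Request(url)
    req.add_header("X-DH-Session-ID", session_id)
    # Also sent in the curl example above.
    req.add_header("X-DH-Cache-Freshness", "now")
    return req


def build_opener(proxy: str = TILL_PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP and HTTPS traffic through the proxy."""
    # Equivalent of curl's -k: skip certificate verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=ctx),
    )


def fetch(url: str, session_id: str) -> str:
    """Send the request through the proxy and return the response body as text."""
    with build_opener().open(build_request(url, session_id), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With Till running, `fetch("https://fetchtest.datahen.com/echo/request", "foo")` would send the request under the foo session. Note that urllib changes the capitalization of header names, which should not matter because HTTP header names are case-insensitive.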
The following table shows how your requests behave when you use sticky sessions.
In this example, we are mimicking two different users accessing the target website at the same time.
Req # | Session ID | IP Used | User-Agent | Cookie Jar |
---|---|---|---|---|
1 | user1 | 198.51.100.1 | chrome | (empty) |
2 | user2 | 198.51.100.2 | firefox | (empty) |
3 | user1 | 198.51.100.1 | chrome | some-val: val1 |
4 | user2 | 198.51.100.2 | firefox | some-val: val1 |
5 | user1 | 198.51.100.1 | chrome | some-val: val1, another-val: val2 |
6 | user2 | 198.51.100.2 | firefox | some-val: val1, another-val: val2 |
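The behavior in the table can be modeled with a small toy class. This is only an illustration of the idea, not Till's implementation: the `StickySessions` and `Session` names are hypothetical, the IP and User-Agent pools are the placeholder values from the table, and the model hands them out in round-robin order for determinism, whereas Till randomizes them.

```python
import itertools
from dataclasses import dataclass, field

# Illustrative pools only, copied from the table above.
PROXY_IPS = ["198.51.100.1", "198.51.100.2"]
USER_AGENTS = ["chrome", "firefox"]


@dataclass
class Session:
    ip: str
    user_agent: str
    cookies: dict = field(default_factory=dict)


class StickySessions:
    """Toy model: each session ID keeps one IP, one User-Agent, and one cookie jar."""

    def __init__(self):
        self._sessions = {}
        self._ips = itertools.cycle(PROXY_IPS)
        self._uas = itertools.cycle(USER_AGENTS)

    def request(self, session_id, set_cookies=None):
        """Return (ip, user_agent, cookies sent), then store any cookies the server set."""
        session = self._sessions.get(session_id)
        if session is None:
            # First request for this ID: pick values once and stick to them.
            session = Session(next(self._ips), next(self._uas))
            self._sessions[session_id] = session
        sent = dict(session.cookies)  # jar contents at the time of the request
        session.cookies.update(set_cookies or {})
        return session.ip, session.user_agent, sent
```

Calling `request("user1", {"some-val": "val1"})` and then `request("user1")` on the same instance returns the same IP and User-Agent both times, with the cookie jar empty on the first call and holding some-val on the second, which matches requests 1 and 3 in the table.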
Note: Sticky Sessions is a Premium feature. If you've already upgraded your plan, you can restart Till and it will be turned on.
The following are the configuration options that you can set:
Configuration | Value |
---|---|
sessions.disabled | Allowed values: true or false (default: false). |
sessions.ttl | Time-to-live for the session records. Allowed values: minute, hour, day, week, fortnight, month, year, or forever (default: week). |
Note: Till stores the session data inside your Data Directory on your local disk. You can change the TTL setting to save disk space. The shorter the TTL, the less disk space is used.
If you have already created a config file, you can add the configuration like so:
# Sticky Session settings
sessions:
  # Disable the sticky sessions feature.
  # Defaults to false.
  disabled: false
  # TTL (Time To Live). How long a session record will be allowed to live before it gets deleted.
  # Defaults to "week".
  ttl: "week"
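If you generate or review config files programmatically, the allowed values and defaults listed above can be checked before starting Till. The following is a minimal sketch under that assumption: `resolve_sessions_config` is a hypothetical helper, not part of Till, and it expects the config to be already loaded into a Python dict.

```python
# Allowed values and defaults, taken from the configuration table above.
ALLOWED_TTLS = {"minute", "hour", "day", "week", "fortnight", "month", "year", "forever"}
DEFAULTS = {"disabled": False, "ttl": "week"}


def resolve_sessions_config(config: dict) -> dict:
    """Apply the documented defaults and reject values outside the allowed set."""
    sessions = config.get("sessions") or {}
    resolved = {**DEFAULTS, **sessions}
    if not isinstance(resolved["disabled"], bool):
        raise ValueError("sessions.disabled must be true or false")
    if resolved["ttl"] not in ALLOWED_TTLS:
        raise ValueError(f"sessions.ttl must be one of {sorted(ALLOWED_TTLS)}")
    return resolved
```

For example, `resolve_sessions_config({"sessions": {"ttl": "day"}})` keeps the feature enabled and shortens the TTL, while an unsupported value such as "decade" raises a ValueError.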
Now, verify that your Till configuration is working and that your requests are served using their sticky sessions.
To verify that, we will send a request to the echo endpoint, then a request to an endpoint that sets cookies, and finally another request to the echo endpoint to compare with the first.
Note: In the following examples, we will use the session ID user1.
Let's get started:
Let's send a request to the echo endpoint so that we can compare the result with later requests.
You can use the following curl command:
$ curl 'https://fetchtest.datahen.com/echo/request' -H 'X-DH-Session-ID: user1' -H 'X-DH-Cache-Freshness: now' -kv --proxy http://localhost:2933
...
> GET /echo/request HTTP/1.1
> Host: fetchtest.datahen.com
> User-Agent: curl/7.58.0
> Accept: */*
> X-DH-Session-ID: user1
> X-DH-Cache-Freshness: now
>
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Alt-Svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400
< Cf-Cache-Status: DYNAMIC
< Cf-Ray: 674e9ae4abedebc5-LAX
< Connection: keep-alive
< Content-Type: text/plain; charset=utf-8
< Date: Mon, 26 Jul 2021 15:19:13 GMT
< Expect-Ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Nel: {"report_to":"cf-nel","max_age":604800}
< Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=%2FfZqElU9nSOIMEuhIsctzRRc8xaaNpaG8JtS2%2B9KlZvCCChutSdwabhLWfE2RZfU3fDZHylptfiSRw1ogSceUkj5BbBQk9BFRN4%2Bmol9Ap7UNUjvt7Zfb9K95EjMO49PI%2F6YlXEFcwk%3D"}],"group":"cf-nel","max_age":604800}
< Server: cloudflare
< X-Dh-Gid: fetchtest.datahen.com-144a91f641d36c08dade39f739b05d31
<
GET /echo/request HTTP/2.0
Host: fetchtest.datahen.com
Accept: */*
Accept-Encoding: gzip
Cdn-Loop: cloudflare
Cf-Connecting-Ip: 198.51.100.1
Cf-Ipcountry: US
Cf-Ray: 674e9ae4abedebc5-LAX
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183 Safari/537.36
X-Forwarded-For: 198.51.100.1
X-Forwarded-Proto: https
Make a note of the IP and User-Agent in the response body so that you can compare them with a later request.
Cf-Connecting-Ip: 198.51.100.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183 Safari/537.36
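If you prefer to capture these two values in a script instead of reading them by eye, the echoed request can be parsed as plain text. This is a small sketch with hypothetical helper names (`parse_echo_headers`, `session_fingerprint`); it assumes the response body has the "Name: value" layout shown above.

```python
def parse_echo_headers(body: str) -> dict:
    """Parse the echoed request into a dict keyed by lower-cased header name."""
    headers = {}
    for line in body.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skips the request line and blank lines
            headers[name.strip().lower()] = value.strip()
    return headers


def session_fingerprint(body: str) -> tuple:
    """Return the (IP, User-Agent) pair that the target server saw."""
    headers = parse_echo_headers(body)
    return headers.get("cf-connecting-ip"), headers.get("user-agent")
```

Storing the result of `session_fingerprint` for the first response gives you a value to compare against later responses from the same session.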
Next, we're going to check whether the sticky session's cookie jar saves cookies.
Let's send a request to a URL endpoint that sets a cookie:
$ curl 'https://fetchtest.datahen.com/cookie' -H 'X-DH-Session-ID: user1' -H 'X-DH-Cache-Freshness: now' -kv --proxy http://localhost:2933
...
> GET /cookie HTTP/1.1
> Host: fetchtest.datahen.com
> User-Agent: curl/7.58.0
> Accept: */*
> X-DH-Session-ID: user1
> X-DH-Cache-Freshness: now
>
< HTTP/1.1 200 OK
< Content-Length: 26
< Alt-Svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400
< Cf-Cache-Status: DYNAMIC
< Cf-Ray: 674ea161aa2ce50e-LAX
< Connection: keep-alive
< Content-Type: text/plain; charset=utf-8
< Date: Mon, 26 Jul 2021 15:23:39 GMT
< Expect-Ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Nel: {"report_to":"cf-nel","max_age":604800}
< Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=MyzDNi0GSp7Y4lNYKDHOPbxUpQrI4nrDgdZmQDOz4cQXWKYczOqIpSW4NsG7OYSu%2BIGR5ptTKNkLqN8O1lKl2VPIlVxbePbsfgMtNgicIlVYScmbUYYZ%2BUvMSVWLrUDvlRCqlEYBbSc%3D"}],"group":"cf-nel","max_age":604800}
< Server: cloudflare
< Set-Cookie: cookieA=cookieAhere
< Set-Cookie: cookieB=cookieBhere
< X-Dh-Gid: fetchtest.datahen.com-7147f8e17fdc930f6798549682ecb175
<
* Connection #0 to host localhost left intact
This should set the cookie
The above request sets the following cookies, which Till will save in the cookie jar of the session with the ID user1:
< Set-Cookie: cookieA=cookieAhere
< Set-Cookie: cookieB=cookieBhere
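To see what a later request in the same session should carry, the Set-Cookie values above can be turned into the matching Cookie header. This sketch uses Python's standard `http.cookies` module; the function name `cookie_header_from_set_cookie` is hypothetical, and it illustrates the general cookie round trip rather than how Till stores its cookie jar internally.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values) -> str:
    """Build the Cookie header value a client would send back for these Set-Cookie values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # attributes such as Path are parsed but not echoed back
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

Passing the two values above, cookieA=cookieAhere and cookieB=cookieBhere, gives the header value you should expect to see echoed on the next request in this session.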
Now that the cookies set on the previous request have been saved into the cookie jar, let's verify this by doing another request to the echo endpoint.
$ curl 'https://fetchtest.datahen.com/echo/request' -H 'X-DH-Session-ID: user1' -H 'X-DH-Cache-Freshness: now' -kv --proxy http://localhost:2933
...
> GET /echo/request HTTP/1.1
> Host: fetchtest.datahen.com
> User-Agent: curl/7.58.0
> Accept: */*
> X-DH-Session-ID: user1
> X-DH-Cache-Freshness: now
>
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Alt-Svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400
< Cf-Cache-Status: DYNAMIC
< Cf-Ray: 674eaa759b5831af-LAX
< Connection: keep-alive
< Content-Type: text/plain; charset=utf-8
< Date: Mon, 26 Jul 2021 15:29:51 GMT
< Expect-Ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Nel: {"report_to":"cf-nel","max_age":604800}
< Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=BLYoAu46fx4eTjPgGS8hjwpz9IpCwloExRM6aYshaWLJQibUxWdcyJSj6wBykofo%2FbQi%2FFv3PSYX2PsbXYTaBn1anbiGJ3MaiGHn%2FLuB0BMv3I8F0QCl94q3DeEoTzRSrvVacyO0HlI%3D"}],"group":"cf-nel","max_age":604800}
< Server: cloudflare
< X-Dh-Gid: fetchtest.datahen.com-144a91f641d36c08dade39f739b05d31
<
GET /echo/request HTTP/1.1
Host: fetchtest.datahen.com
Accept: */*
Accept-Encoding: gzip
Cdn-Loop: cloudflare
Cf-Connecting-Ip: 198.51.100.1
Cf-Ipcountry: US
Cf-Ray: 674eaa759b5831af-LAX
Cf-Visitor: {"scheme":"https"}
Connection: Keep-Alive
Cookie: cookieA=cookieAhere; cookieB=cookieBhere
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183 Safari/537.36
X-Forwarded-For: 198.51.100.1
X-Forwarded-Proto: https
We now need to verify two things:
First, verify that the proxy IP and User-Agent are the same. Check that the following values match those from request 1.
Cf-Connecting-Ip: 198.51.100.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183 Safari/537.36
Next, verify that the cookies set in request 2 were sent in this request. If you see the following value in the response, it works correctly.
Cookie: cookieA=cookieAhere; cookieB=cookieBhere
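Both checks can also be expressed as a short script that compares the response body from request 1 with the one from request 3. This is a sketch with hypothetical helper names (`echoed_headers`, `check_sticky_session`); it assumes the echo endpoint returns the request in the "Name: value" layout shown above.

```python
def echoed_headers(body: str) -> dict:
    """Parse an echoed request into a dict keyed by lower-cased header name."""
    headers = {}
    for line in body.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def check_sticky_session(first_body: str, later_body: str, expected_cookie: str) -> dict:
    """Compare two echo responses from the same session and report each check."""
    first = echoed_headers(first_body)
    later = echoed_headers(later_body)

    def same(name: str) -> bool:
        # A missing header counts as a failed check, not as a match.
        return first.get(name) is not None and first.get(name) == later.get(name)

    return {
        "same_ip": same("cf-connecting-ip"),
        "same_user_agent": same("user-agent"),
        "cookies_sent": later.get("cookie") == expected_cookie,
    }
```

If every value in the returned dict is True, the session kept its IP and User-Agent and sent the stored cookies.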
Congratulations! You've set up sticky sessions correctly and used them successfully.