HTTP Request Interceptions

This feature lets you intercept any HTTP request that passes through Till and respond with whatever response you want.

The following are some examples of useful scenarios:

  • Ignoring Google Analytics JavaScript
  • Ignoring images or other files
  • Replacing (stubbing) an API call with a different response
  • Restricting your scraper to only certain URL patterns

By using interceptions, you can scale and maintain your scrapers more easily, without having to build interception logic into your scraper code.

Configuration

Note: HTTP Request Interception is a Premium feature. If you've already upgraded your plan, restart Till to turn it on.

Once you have created a config file, you can add a configuration like so:

Note: The following interception is just an example. Your specific interception configuration may be different.

# Request Interception settings.
interceptions:

  # this is an example of one interception
  # you can add as many interceptions as you wish, 
  # as long as the `name` is unique.
  - name: foo_bar # name can be anything, must be unique.

    # Whether this interception is disabled (default: false)
    disabled: true
    
    # This is the matcher that will be used to determine
    # if a request should be intercepted or not.
    matches:  
      
      # regex pattern of the URLs you're trying to match
      pattern: '.+\.(jpe?g|png|tiff|bmp|gif|webp)' 
      
      # HTTP methods of the requests you're trying to match (comma-separated)
      method: GET,POST
    
    # This is what will be served on any intercepted requests.
    responds: 
      
      # HTTP status code
      code: 200 

      # The HTTP header
      header:
        "Content-Type": "image/png"
        "Foo": "bar"
        
      # Respond with either a `body` or a `file`, not both.
      body: "foo body"
      # Or, to serve a local file instead, use `file`:
      # file: "/path/to/your/image.png"
      
  # Add more of your interceptions below this line

The following are the options that you can set for each interception:

Configuration      Value
name               String. Anything, as long as it's unique. (Example: foo_bar)
disabled           Allowed values: true or false. (Default: false)
matches            Used to match an HTTP request in order to intercept it.
matches.pattern    Regular expression pattern for the URLs to match.
matches.method     Any HTTP method. For multiple methods, separate them with a comma. (Example: GET,POST)
responds           Once a request is intercepted, the interception responds with the following:
responds.code      HTTP status code. (Example: 200)
responds.header    HTTP response headers. (Example: "Content-Type": "image/png")
responds.body      Anything to be served as the response body. (Example: foo bar)
responds.file      Path of a file to serve as the response body. (Example: /path/to/your/file.txt)
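As a mental model of how the matches options combine, here is a minimal Python sketch. It is not Till's implementation: the helper names rule_matches and first_match are hypothetical, and it assumes that the pattern may match anywhere in the URL, that both pattern and method must match, and that the first enabled interception that matches is the one used. Till's own regex engine may also differ from Python's re module in edge cases.

```python
import re


def rule_matches(rule, method, url):
    """Return True if the request (method, url) satisfies the rule's `matches` block."""
    matches = rule["matches"]
    # `method` is a comma-separated list such as "GET,POST".
    allowed = {m.strip().upper() for m in matches["method"].split(",")}
    if method.upper() not in allowed:
        return False
    # Assumption: the pattern may match anywhere in the URL (unanchored search).
    return re.search(matches["pattern"], url) is not None


def first_match(rules, method, url):
    """Return the name of the first enabled rule that matches, or None."""
    for rule in rules:
        if rule.get("disabled", False):
            continue  # disabled interceptions are skipped
        if rule_matches(rule, method, url):
            return rule["name"]
    return None


# The example interception from the configuration above, enabled for this demo.
rules = [
    {
        "name": "foo_bar",
        "disabled": False,
        "matches": {
            "pattern": r".+\.(jpe?g|png|tiff|bmp|gif|webp)",
            "method": "GET,POST",
        },
    },
]

print(first_match(rules, "GET", "https://example.com/logo.png"))     # foo_bar
print(first_match(rules, "DELETE", "https://example.com/logo.png"))  # None
print(first_match(rules, "GET", "https://example.com/index.html"))   # None
```

The example.com URLs are placeholders. Swapping in your own pattern and a handful of URLs from your scraper gives a quick check of what a rule would and would not catch.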

Recipes

The following are starter recipes that you can copy and modify based on your preference.

Intercepting Google Analytics JS

To intercept the Google Analytics JavaScript URL https://www.google-analytics.com/analytics.js, add a configuration under the interceptions configuration like so:

interceptions:
  # Add the lines below directly under the `interceptions` configuration:
  - name: google_analytics
    disabled: false
    matches:
      pattern: 'google-analytics\.com\/\S*\.js'
      method: GET
    responds:
      code: 200 
      header:
        "Content-Type": "application/json"
      body: '{"msg":"Yaay it got intercepted!"}'

Now, confirm that this URL pattern has been intercepted by running the following curl command:

$ curl 'https://www.google-analytics.com/analytics.js' -kv --proxy http://localhost:2933
...
< HTTP/1.1 200 OK
< Connection: close
< Content-Type: application/json
< X-Dh-Gid: www.google-analytics.com-f400c0924cacf5d1918f5a188dfdb2fd
<
* TLSv1.2 (IN), TLS alert, Client hello (1):
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):
{"msg":"Yaay it got intercepted!"}

If you see the above message, your interception worked properly.
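The curl check above can also be scripted, which is handy if you want to run it as part of a test suite. The following is a minimal sketch using only the Python standard library. The helper names make_opener, was_intercepted, and check_interception are hypothetical, and the proxy address is simply the one used in the curl command. Certificate verification is turned off to mirror curl's -k flag, so use this only against a local test proxy.

```python
import ssl
import urllib.request

TILL_PROXY = "http://localhost:2933"  # proxy address taken from the curl command


def make_opener(proxy_url=TILL_PROXY):
    """Build an opener that sends HTTP and HTTPS requests through the proxy."""
    # Mirrors curl's -k flag: skip TLS certificate verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=ctx),
    )


def was_intercepted(status, content_type, body, expected_type, expected_body):
    """Return True if the response looks like the stub defined in the config."""
    return (
        status == 200
        and content_type.startswith(expected_type)
        and body.strip() == expected_body.strip()
    )


def check_interception(url, expected_type, expected_body, proxy_url=TILL_PROXY):
    """Fetch `url` through the proxy and report whether the stub came back."""
    with make_opener(proxy_url).open(url, timeout=10) as resp:
        return was_intercepted(
            resp.status,
            resp.headers.get("Content-Type", ""),
            resp.read().decode("utf-8"),
            expected_type,
            expected_body,
        )
```

With Till running and the google_analytics interception enabled, calling check_interception('https://www.google-analytics.com/analytics.js', 'application/json', '{"msg":"Yaay it got intercepted!"}') should return True.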

Intercepting Files

In this recipe, we're going to intercept the cat image at https://fetchtest.datahen.com/assets/img/cat.jpeg and replace it with a local text file.

First, let's create a text file. Copy the following content into a file on your computer called replacement.txt:

Hello from replacement.txt

Next, add a configuration under the interceptions configuration like so:

interceptions:
  # Add the lines below directly under the `interceptions` configuration:
  - name: cat_replacement
    disabled: false
    matches:
      pattern: 'fetchtest\.datahen\.com\/\S*\.jpeg'
      method: GET
    responds:
      code: 200 
      header:
        "Content-Type": "text/plain"
      file: '/path/to/your/replacement.txt'

Now, confirm that this URL pattern has been intercepted by running the following curl command:

$ curl 'https://fetchtest.datahen.com/assets/img/cat.jpeg' -kv --proxy http://localhost:2933
...
< HTTP/1.1 200 OK
< Connection: close
< Content-Type: text/plain
< X-Dh-Gid: fetchtest.datahen.com-7c46f04a27de059e8c4eb7bd199dea2c
<
* TLSv1.2 (IN), TLS alert, Client hello (1):
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):
Hello from replacement.txt

If you see the above message, your interception worked properly.
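If the real image comes back instead of the text file, the pattern is the first thing to check. An easy slip is dropping the backslash in \S*: \S* means any run of non-whitespace characters, while a bare S* means zero or more literal S characters, which will not span a path such as /assets/img/. The following minimal sketch shows the difference using Python's re module as a stand-in. It assumes unanchored matching, and Till's own regex engine may differ in edge cases.

```python
import re

URL = "https://fetchtest.datahen.com/assets/img/cat.jpeg"

# `\S*` spans any run of non-whitespace characters, so it covers the path.
with_backslash = r"fetchtest\.datahen\.com\/\S*\.jpeg"
# Without the backslash, `S*` only allows literal "S" characters after the slash.
without_backslash = r"fetchtest\.datahen\.com\/S*\.jpeg"


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


print(matches(with_backslash, URL))     # True
print(matches(without_backslash, URL))  # False
```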