Integrating Till with Scrapy

Till can be integrated with Scrapy with only a few code changes.

Please follow the steps below.

Step 1: Install Till

Follow the installation instructions to install Till.

Step 2: Modify your Scrapy project

Next, modify your existing Scrapy project to integrate with Till.

You only need to do two things:

  1. Add a custom middleware
  2. Modify the settings file

Note: A working example is available. It is a Scrapy project taken from Scrapy's tutorial page and modified to integrate with Till.

1. Add a custom middleware

In your middlewares.py file, add the TillMiddleware class.

# Your custom middleware
class TillMiddleware:
    def process_request(self, request, spider):
        # Route the request through the Till proxy
        request.meta["proxy"] = "http://localhost:2933"
        # Add the header to force a Cache Miss on Till
        request.headers["X-DH-Cache-Freshness"] = "now"
        # Return None so Scrapy continues processing the request
        return None

Your middlewares.py file should now look like the one in the working example.
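
If the proxy address or the cache header value differs between environments, one option is to read them from the project settings instead of hard-coding them. The following is a minimal sketch under stated assumptions, not part of Till itself: the class name ConfigurableTillMiddleware and the setting names TILL_PROXY_URL and TILL_CACHE_FRESHNESS are hypothetical, while from_crawler and crawler.settings.get are standard Scrapy hooks.

```python
class ConfigurableTillMiddleware:
    """Variant of TillMiddleware that reads its values from Scrapy settings.

    TILL_PROXY_URL and TILL_CACHE_FRESHNESS are hypothetical setting names
    chosen for this sketch; Till does not define them.
    """

    def __init__(self, proxy_url, cache_freshness):
        self.proxy_url = proxy_url
        self.cache_freshness = cache_freshness

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when it builds the middleware.
        settings = crawler.settings
        return cls(
            proxy_url=settings.get("TILL_PROXY_URL", "http://localhost:2933"),
            cache_freshness=settings.get("TILL_CACHE_FRESHNESS", "now"),
        )

    def process_request(self, request, spider):
        # Route the request through the Till proxy.
        request.meta["proxy"] = self.proxy_url
        # Same header as in TillMiddleware; "now" forces a cache miss.
        request.headers["X-DH-Cache-Freshness"] = self.cache_freshness
        # Returning None lets Scrapy continue processing the request.
        return None
```

With this variant, the DOWNLOADER_MIDDLEWARES entry would point at tutorial.middlewares.ConfigurableTillMiddleware, and the defaults match the hard-coded values used in TillMiddleware, so nothing changes until one of the hypothetical settings is defined in settings.py.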

2. Modify the settings file

In your settings.py file, enable the DOWNLOADER_MIDDLEWARES setting and add the tutorial.middlewares.TillMiddleware key.

DOWNLOADER_MIDDLEWARES = {
    'tutorial.middlewares.TillMiddleware': 350,  # Add this middleware
}

Your settings.py file should now look like the one in the working example.

Step 3: Run your Scrapy project

Next, run your Scrapy project as you normally would.
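
Because every request is now routed through the proxy address set in the middleware, the crawl will fail if Till is not running. A quick pre-flight check can catch this before you start the spider. The sketch below is an illustration only: the function name till_is_listening is hypothetical, and it assumes Till listens on localhost:2933 as configured in the middleware. It tests only that a TCP connection can be opened, not that the listener is actually Till.

```python
import socket


def till_is_listening(host="localhost", port=2933, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling till_is_listening() before launching the crawl and stopping early when it returns False gives a clearer failure than a log full of connection errors.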

Note: If you don't have an existing Scrapy project to try with Till, you can use our working example.

Step 4: Verify that it works

Visit the Till UI at http://localhost:2980/requests to confirm that your new requests appear.
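
If you want to script this check, for example on a machine without a browser, the following sketch confirms that the UI address answers over HTTP. It is an assumption-laden illustration: the function name till_ui_responds is hypothetical, and the check only tells you that the page is reachable, not that your requests are listed on it, so opening the page is still the real verification.

```python
import urllib.request


def till_ui_responds(url="http://localhost:2980/requests", timeout=5.0):
    """Return True if the URL answers with an HTTP 2xx status."""
    # An empty ProxyHandler makes the check ignore any system proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError (4xx/5xx) are subclasses of OSError,
        # as are refused connections and timeouts.
        return False
```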

Request Log UI