Integrating Till with Scrapy
Integration Steps:
Till integrates with Scrapy with only a few code changes.
Follow the steps below.
First, follow the instructions to install Till.
Next, modify your existing Scrapy project to integrate with Till.
You only need to do two things:
Note: To see a working example, you can visit this link which shows an example of a Scrapy project that was taken from Scrapy's tutorial page and modified to integrate with Till.
In your middlewares.py file, add the TillMiddleware class.
# Your custom middleware
class TillMiddleware(object):
    def process_request(self, request, spider):
        # Connect to Till
        request.meta["proxy"] = "http://localhost:2933"
        # Add the header to force a Cache Miss on Till
        request.headers["X-DH-Cache-Freshness"] = "now"
Your middlewares.py file should now look like the example here.
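If Till does not run on localhost in every environment, the proxy address can be read from the project settings instead of being hard-coded. The following is a minimal sketch of that idea, not part of Till itself: the class name ConfigurableTillMiddleware and the setting names TILL_PROXY_URL and TILL_FORCE_CACHE_MISS are hypothetical and chosen only for illustration. It relies on Scrapy's from_crawler hook and on the settings object's get and getbool methods.

```python
class ConfigurableTillMiddleware:
    """Variant of TillMiddleware that reads its options from Scrapy settings.

    TILL_PROXY_URL and TILL_FORCE_CACHE_MISS are hypothetical setting names
    used for this sketch; neither Till nor Scrapy defines them.
    """

    def __init__(self, proxy_url="http://localhost:2933", force_cache_miss=True):
        self.proxy_url = proxy_url
        self.force_cache_miss = force_cache_miss

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook, when present, to build the middleware
        # and gives it access to the project settings.
        settings = crawler.settings
        return cls(
            proxy_url=settings.get("TILL_PROXY_URL", "http://localhost:2933"),
            force_cache_miss=settings.getbool("TILL_FORCE_CACHE_MISS", True),
        )

    def process_request(self, request, spider):
        # Route the request through Till.
        request.meta["proxy"] = self.proxy_url
        if self.force_cache_miss:
            # Same header as in the step above: force a cache miss on Till.
            request.headers["X-DH-Cache-Freshness"] = "now"
        # Returning None tells Scrapy to keep processing the request.
        return None
```

With this variant, the defaults reproduce the behavior of the class shown in the step above, and a different Till address only requires changing one setting.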
In your settings.py file, enable DOWNLOADER_MIDDLEWARES and add the tutorial.middlewares.TillMiddleware key.
DOWNLOADER_MIDDLEWARES = {
    'tutorial.middlewares.TillMiddleware': 350,  # Add this middleware
}
Your settings.py file should now look like the example here.
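The number 350 is the middleware's priority. Scrapy calls process_request on downloader middlewares in increasing priority order, so a lower number runs earlier. The sketch below illustrates that ordering in plain Python; it is not Scrapy's own code. The names BASE, PROJECT and request_order are hypothetical, and the two built-in priorities are copied from Scrapy's documented defaults, which may differ between versions.

```python
# Subset of Scrapy's default downloader middlewares. The priorities are
# taken from the Scrapy documentation and may vary by Scrapy version.
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

# The entry added to settings.py in the step above.
PROJECT = {"tutorial.middlewares.TillMiddleware": 350}


def request_order(base, project):
    """Return middleware paths in the order process_request is called."""
    merged = {**base, **project}
    # In Scrapy, assigning None to a middleware disables it.
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


if __name__ == "__main__":
    for path in request_order(BASE, PROJECT):
        print(path)
```

Under these assumptions TillMiddleware comes first, so request.meta["proxy"] is already set by the time Scrapy's built-in HttpProxyMiddleware handles the request.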
Next, run your Scrapy project like you normally would.
Note: If you don't have an existing Scrapy project to try with Till, you can try our working example here.
Visit the Till UI at http://localhost:2980/requests to see that your new requests are shown.
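If no requests appear in the UI, one quick thing to rule out is that Till is not listening at all. The sketch below only checks whether a TCP connection can be opened to the two ports mentioned on this page (2933 for the proxy, 2980 for the UI); it does not talk to Till in any other way. The helper name port_open is hypothetical, and the hosts and ports assume a default local setup.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the examples on this page.
    for name, port in (("Till proxy", 2933), ("Till UI", 2980)):
        status = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

If either port is reported as not reachable, start Till (or check which ports it was configured with) before running the spider again.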