Signals

Scrapy uses signals extensively to notify when certain actions occur. You can catch some of those signals in your Scrapy project or extension to perform additional tasks or extend Scrapy to add functionality not provided out of the box.

Even though signals provide several arguments, the handlers which catch them don’t have to receive all of them.
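To illustrate the point above, here is a minimal sketch of a dispatcher that passes each handler only the arguments it declares. It is a stand-in written for illustration, not pydispatcher or Scrapy code: the `send` helper, the handler names and the sample values are all hypothetical.

```python
import inspect


def send(handlers, **signal_args):
    """Call each handler with only the keyword arguments it declares.

    Hypothetical stand-in for a signal dispatcher, for illustration only.
    """
    results = []
    for handler in handlers:
        params = inspect.signature(handler).parameters
        accepts_any = any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )
        if accepts_any:
            # A handler declaring **kwargs receives every argument.
            kwargs = dict(signal_args)
        else:
            # Otherwise drop the arguments the handler did not ask for.
            kwargs = {k: v for k, v in signal_args.items() if k in params}
        results.append(handler(**kwargs))
    return results


def log_item(item):
    # Only interested in the item.
    return "item=%r" % (item,)


def log_item_and_spider(item, spider):
    # Interested in the item and the spider, but not the response.
    return "item=%r spider=%r" % (item, spider)


# The signal carries three arguments; each handler sees only its own subset.
outputs = send(
    [log_item, log_item_and_spider],
    item={"title": "x"},
    spider="example",
    response=None,
)
```

Both handlers run without error even though neither one declares the `response` argument.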

For more information about working with signals, see the documentation of pydispatcher (the library used to implement signals).

Built-in signals reference

Here’s a list of the signals used in Scrapy and their meaning.

engine_started

scrapy.core.signals.engine_started()

Sent when the Scrapy engine is started (for example, when a crawling process has started).

engine_stopped

scrapy.core.signals.engine_stopped()

Sent when the Scrapy engine is stopped (for example, when a crawling process has finished).

item_scraped

scrapy.core.signals.item_scraped(item, spider, response)

Sent when the engine receives a new scraped item from the spider, and right before the item is sent to the Item Pipeline.

Parameters:
  • item (Item object) – the item scraped
  • spider (BaseSpider object) – the spider which scraped the item
  • response (Response object) – the response from which the item was scraped

item_passed

scrapy.core.signals.item_passed(item, spider, output)

Sent after an item has passed all the Item Pipeline stages without being dropped.

Parameters:
  • item (Item object) – the item which passed the pipeline
  • spider (BaseSpider object) – the spider which scraped the item
  • output – the output of the item pipeline. This is typically the same Item object received in the item parameter, unless some pipeline stage created a new item.

item_dropped

scrapy.core.signals.item_dropped(item, spider, exception)

Sent after an item has been dropped from the Item Pipeline when some stage raised a DropItem exception.

Parameters:
  • item (Item object) – the item dropped from the Item Pipeline
  • spider (BaseSpider object) – the spider which scraped the item
  • exception (DropItem exception) – the exception (which must be a DropItem subclass) which caused the item to be dropped
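As a rough sketch of how the item_passed and item_dropped signals might be used together, the class below keeps per-spider counters. The `ItemCounter` class is hypothetical, the handler signatures simply mirror the parameters listed above, and plain strings and a generic exception stand in for real spider, item and DropItem objects. Connecting the handlers to the signals is left out.

```python
from collections import defaultdict


class ItemCounter:
    """Counts passed and dropped items per spider (illustrative only)."""

    def __init__(self):
        self.passed = defaultdict(int)
        self.dropped = defaultdict(int)

    def item_passed(self, item, spider, output):
        # Called once per item that went through every pipeline stage.
        self.passed[spider] += 1

    def item_dropped(self, item, spider, exception):
        # Called once per item that a pipeline stage dropped.
        self.dropped[spider] += 1


# Simulate the calls a dispatcher would make; strings stand in for spiders.
counter = ItemCounter()
counter.item_passed({"id": 1}, "example_spider", {"id": 1})
counter.item_passed({"id": 2}, "example_spider", {"id": 2})
counter.item_dropped({"id": 3}, "example_spider", Exception("duplicate"))
```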

spider_closed

scrapy.core.signals.spider_closed(spider, reason)

Sent after a spider has been closed. This can be used to release per-spider resources reserved on spider_opened.

Parameters:
  • spider (BaseSpider object) – the spider which has been closed
  • reason (str) – a string which describes the reason why the spider was closed. If it was closed because the spider has completed scraping, the reason is 'finished'. Otherwise, if the spider was manually closed by calling the close_spider engine method, the reason is the one passed in the reason argument of that method (which defaults to 'cancelled'). If the engine was shut down (for example, by hitting Ctrl-C to stop it), the reason is 'shutdown'.
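A handler can branch on the reason string. The sketch below records every spider that closed for a reason other than 'finished'. The `closed_early` list and the string spider names are hypothetical placeholders.

```python
# Spiders that stopped before completing their crawl, with the reason given.
closed_early = []


def spider_closed(spider, reason):
    """Record spiders that closed for any reason other than 'finished'."""
    if reason != "finished":
        closed_early.append((spider, reason))


# Simulate three closures, using the reason values described above.
spider_closed("spider_a", "finished")
spider_closed("spider_b", "cancelled")
spider_closed("spider_c", "shutdown")
```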

spider_opened

scrapy.core.signals.spider_opened(spider)

Sent after a spider has been opened for crawling. This is typically used to reserve per-spider resources, but can be used for any task that needs to be performed when a spider is opened.

Parameters:
  • spider (BaseSpider object) – the spider which has been opened
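The reserve-on-open, release-on-close pattern described for spider_opened and spider_closed could look roughly like this. The `PerSpiderBuffers` class is hypothetical, an in-memory buffer stands in for a real resource such as a file or a database connection, and a string stands in for the spider object.

```python
import io


class PerSpiderBuffers:
    """Reserves a buffer when a spider opens and releases it on close."""

    def __init__(self):
        self.buffers = {}

    def spider_opened(self, spider):
        # Reserve the per-spider resource.
        self.buffers[spider] = io.StringIO()

    def spider_closed(self, spider, reason):
        # Release the resource reserved in spider_opened, if there is one.
        buf = self.buffers.pop(spider, None)
        if buf is not None:
            buf.close()


# Simulate one spider's lifetime.
resources = PerSpiderBuffers()
resources.spider_opened("example_spider")
buf = resources.buffers["example_spider"]
buf.write("some per-spider data")
resources.spider_closed("example_spider", "finished")
```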

spider_idle

scrapy.core.signals.spider_idle(spider)

Sent when a spider has gone idle, which means the spider has no further:

  • requests waiting to be downloaded
  • requests scheduled
  • items being processed in the item pipeline

If the idle state persists after all handlers of this signal have finished, the engine starts closing the spider. After the spider has finished closing, the spider_closed signal is sent.

You can, for example, schedule some requests in your spider_idle handler to prevent the spider from being closed.

Parameters:
  • spider (BaseSpider object) – the spider which has gone idle
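The idle rule can be modelled with a toy engine: when the queue drains, the idle handlers run, and the spider is closed only if the queue is still empty afterwards. Everything here is a hypothetical stand-in (`ToyEngine`, `schedule`, `backlog`) and it models only that rule, not the real engine or its scheduling API.

```python
from collections import deque


class ToyEngine:
    """Hypothetical stand-in for the engine; models only the idle rule."""

    def __init__(self, idle_handlers):
        self.queue = deque()
        self.idle_handlers = idle_handlers
        self.processed = []
        self.closed = False

    def schedule(self, request):
        self.queue.append(request)

    def run(self, spider):
        while not self.closed:
            while self.queue:
                self.processed.append(self.queue.popleft())
            # Queue drained: the spider is idle, so notify the handlers.
            for handler in self.idle_handlers:
                handler(spider)
            # Still idle after all handlers have run: close the spider.
            if not self.queue:
                self.closed = True


# Extra work that is only handed over when the spider runs out of requests.
backlog = ["page-2", "page-3"]


def spider_idle(spider):
    # Scheduling a request here keeps the spider from being closed.
    if backlog:
        engine.schedule(backlog.pop(0))


engine = ToyEngine([spider_idle])
engine.schedule("page-1")
engine.run("example_spider")
```

The spider stays open while the handler keeps supplying requests, and closes on the first idle notification that leaves the queue empty.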

request_received

scrapy.core.signals.request_received(request, spider)

Sent when the engine receives a Request from a spider.

Parameters:
  • request (Request object) – the request received
  • spider (BaseSpider object) – the spider which generated the request

request_uploaded

scrapy.core.signals.request_uploaded(request, spider)

Sent right after the downloader has sent a Request.

Parameters:
  • request (Request object) – the request uploaded/sent
  • spider (BaseSpider object) – the spider which generated the request

response_received

scrapy.core.signals.response_received(response, spider)

Sent when the engine receives a new Response from the downloader.

Parameters:
  • response (Response object) – the response received
  • spider (BaseSpider object) – the spider for which the response is intended

response_downloaded

scrapy.core.signals.response_downloaded(response, spider)

Sent by the downloader right after an HTTPResponse is downloaded.

Parameters:
  • response (Response object) – the response downloaded
  • spider (BaseSpider object) – the spider for which the response is intended