Signals
Scrapy uses signals extensively to notify when certain actions occur. You can catch some of those signals in your Scrapy project or extension to perform additional tasks or extend Scrapy to add functionality not provided out of the box.
Even though signals provide several arguments, the handlers which catch them don’t have to receive all of them.
For more information about working with signals, see the documentation of pydispatcher (the library used to implement signals).
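The point that handlers need not accept every argument a signal provides can be illustrated with a small, self-contained sketch. The `connect` and `send` functions below are hypothetical stand-ins written for this example; they are not Scrapy's or pydispatcher's API, and only imitate the idea of passing each handler the keyword arguments it declares.

```python
import inspect

# Hypothetical stand-in registry: signal -> list of handlers.
_handlers = {}


def connect(handler, signal):
    """Register a handler for a signal (illustrative only)."""
    _handlers.setdefault(signal, []).append(handler)


def send(signal, **kwargs):
    """Call each handler with only the keyword arguments it declares."""
    results = []
    for handler in _handlers.get(signal, []):
        params = inspect.signature(handler).parameters
        takes_any = any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )
        if takes_any:
            accepted = kwargs
        else:
            accepted = {k: v for k, v in kwargs.items() if k in params}
        results.append(handler(**accepted))
    return results


# A plain object used as the signal identifier in this sketch.
item_scraped = object()


def count_item(item):
    # Declares only `item`; `spider` and `response` are not passed to it.
    return "got %s" % item["name"]


def log_item(item, spider):
    # Declares `item` and `spider`; `response` is not passed to it.
    return "%s scraped %s" % (spider, item["name"])


connect(count_item, item_scraped)
connect(log_item, item_scraped)

results = send(
    item_scraped, item={"name": "book"}, spider="example_spider", response=None
)
```

Both handlers are called for the same signal even though neither declares the `response` argument.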
Built-in signals reference
Here’s a list of signals used in Scrapy and their meaning.
engine_started
scrapy.core.signals.engine_started()
Sent when the Scrapy engine is started (for example, when a crawling process has started).
engine_stopped
scrapy.core.signals.engine_stopped()
Sent when the Scrapy engine is stopped (for example, when a crawling process has finished).
item_scraped
scrapy.core.signals.item_scraped(item, spider, response)
Sent when the engine receives a new scraped item from the spider, right before the item is sent to the Item Pipeline.
Parameters:
- item (Item object) – the item scraped
- spider (BaseSpider object) – the spider which scraped the item
- response (Response object) – the response from which the item was scraped
item_passed
scrapy.core.signals.item_passed(item, spider, output)
Sent after an item has passed all the Item Pipeline stages without being dropped.
Parameters:
- item (Item object) – the item which passed the pipeline
- spider (BaseSpider object) – the spider which scraped the item
- output – the output of the item pipeline. This is typically the same Item object received in the item parameter, unless some pipeline stage created a new item.
item_dropped
scrapy.core.signals.item_dropped(item, spider, exception)
Sent after an item has been dropped from the Item Pipeline because some stage raised a DropItem exception.
Parameters:
- item (Item object) – the item dropped from the Item Pipeline
- spider (BaseSpider object) – the spider which scraped the item
- exception (DropItem exception) – the exception (which must be a DropItem subclass) which caused the item to be dropped
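As a rough illustration of the handler signatures for the three item signals, the sketch below counts scraped, passed, and dropped items. It does not import Scrapy: the `DropItem` class, the `ItemStats` class, and the loop that calls the handlers directly are all assumptions made for this example, standing in for whatever the engine and pipeline would do.

```python
from collections import Counter


class DropItem(Exception):
    """Stand-in for Scrapy's DropItem exception (assumption for this sketch)."""


class ItemStats:
    """Hypothetical collector whose methods mirror the signal arguments."""

    def __init__(self):
        self.counts = Counter()
        self.drop_reasons = []

    def item_scraped(self, item, spider, response):
        self.counts["scraped"] += 1

    def item_passed(self, item, spider, output):
        self.counts["passed"] += 1

    def item_dropped(self, item, spider, exception):
        self.counts["dropped"] += 1
        self.drop_reasons.append(str(exception))


stats = ItemStats()

# Simulate the order in which the signals would fire for two items:
# every item is scraped first, then either passes or is dropped.
for item in [{"price": 10}, {"price": None}]:
    stats.item_scraped(item, spider="example_spider", response=None)
    if item["price"] is None:
        stats.item_dropped(
            item, spider="example_spider", exception=DropItem("missing price")
        )
    else:
        stats.item_passed(item, spider="example_spider", output=item)
```

In a real project these methods would be connected to the corresponding signals rather than called by hand.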
spider_closed
scrapy.core.signals.spider_closed(spider, reason)
Sent after a spider has been closed. This can be used to release per-spider resources reserved on spider_opened.
Parameters:
- spider (BaseSpider object) – the spider which has been closed
- reason (str) – a string which describes the reason why the spider was closed. If it was closed because the spider has completed scraping, the reason is 'finished'. Otherwise, if the spider was manually closed by calling the close_spider engine method, the reason is the one passed in the reason argument of that method (which defaults to 'cancelled'). If the engine was shut down (for example, by hitting Ctrl-C to stop it), the reason will be 'shutdown'.
spider_opened
scrapy.core.signals.spider_opened(spider)
Sent after a spider has been opened for crawling. This is typically used to reserve per-spider resources, but can be used for any task that needs to be performed when a spider is opened.
Parameters: spider (BaseSpider object) – the spider which has been opened
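The pairing of spider_opened (reserve) and spider_closed (release) might look roughly like the following sketch. The `SpiderLogs` class is hypothetical, spiders are represented by plain strings instead of BaseSpider objects, and the handlers are called directly here instead of being connected to the signals.

```python
import io


class SpiderLogs:
    """Hypothetical extension keeping one in-memory log buffer per open spider."""

    def __init__(self):
        self.buffers = {}        # spider -> open buffer (the reserved resource)
        self.archived = {}       # spider -> final log text
        self.close_reasons = {}  # spider -> reason given at close time

    def spider_opened(self, spider):
        # Reserve the per-spider resource.
        buf = io.StringIO()
        buf.write("opened\n")
        self.buffers[spider] = buf

    def spider_closed(self, spider, reason):
        # Release the resource reserved in spider_opened and record why.
        buf = self.buffers.pop(spider)
        buf.write("closed: %s\n" % reason)
        self.archived[spider] = buf.getvalue()
        buf.close()
        self.close_reasons[spider] = reason


logs = SpiderLogs()
logs.spider_opened("spider_a")
logs.spider_opened("spider_b")
logs.spider_closed("spider_a", reason="finished")
logs.spider_closed("spider_b", reason="shutdown")
```

Keying the resources by spider keeps the state of concurrently open spiders separate, and the reason string lets the handler distinguish a normal finish from a cancellation or shutdown.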
spider_idle
scrapy.core.signals.spider_idle(spider)
Sent when a spider has gone idle, which means the spider has no further:
- requests waiting to be downloaded
- requests scheduled
- items being processed in the item pipeline
If the idle state persists after all handlers of this signal have finished, the engine starts closing the spider. After the spider has finished closing, the spider_closed signal is sent.
You can, for example, schedule some requests in your spider_idle handler to prevent the spider from being closed.
Parameters: spider (BaseSpider object) – the spider which has gone idle
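The idle behaviour described above can be modelled with a toy loop: when the queue is empty the idle handlers run, and the spider is closed only if the queue is still empty afterwards. `ToyEngine` is an illustrative assumption and does not reflect how Scrapy's engine is implemented; requests are plain strings.

```python
from collections import deque


class ToyEngine:
    """Toy model of the idle check; not Scrapy's engine."""

    def __init__(self):
        self.queue = deque()
        self.idle_handlers = []
        self.processed = []
        self.close_reason = None

    def schedule(self, request):
        self.queue.append(request)

    def run(self, spider):
        while True:
            # Process everything currently scheduled.
            while self.queue:
                self.processed.append(self.queue.popleft())
            # Nothing left: the spider is idle, so notify the handlers.
            for handler in self.idle_handlers:
                handler(spider=spider)
            # Close only if the idle state persists after the handlers ran.
            if not self.queue:
                self.close_reason = "finished"
                return


engine = ToyEngine()
pending_batches = [["page-2", "page-3"], ["page-4"]]


def refill_on_idle(spider):
    # Schedule one more batch, if any, which keeps the spider open.
    if pending_batches:
        for request in pending_batches.pop(0):
            engine.schedule(request)


engine.idle_handlers.append(refill_on_idle)
engine.schedule("page-1")
engine.run("example_spider")
```

The handler keeps the spider alive for as long as it has batches to schedule; once it schedules nothing, the idle state persists and the spider closes.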
request_received
scrapy.core.signals.request_received(request, spider)
Sent when the engine receives a Request from a spider.
Parameters:
- request (Request object) – the request received
- spider (BaseSpider object) – the spider which generated the request
request_uploaded
scrapy.core.signals.request_uploaded(request, spider)
Sent right after the downloader has sent a Request.
Parameters:
- request (Request object) – the request uploaded/sent
- spider (BaseSpider object) – the spider which generated the request
response_received
scrapy.core.signals.response_received(response, spider)
Sent when the engine receives a new Response from the downloader.
Parameters:
- response (Response object) – the response received
- spider (BaseSpider object) – the spider for which the response is intended
response_downloaded
scrapy.core.signals.response_downloaded(response, spider)
Sent by the downloader right after an HTTPResponse is downloaded.
Parameters:
- response (Response object) – the response downloaded
- spider (BaseSpider object) – the spider for which the response is intended