feat(settings.py): add RedisKeySpiderMiddleware to the SPIDER_MIDDLEWARES to handle Redis key generation for spiders

chore(settings.py): remove unused code and configurations to improve code cleanliness and performance
Morelatto committed Sep 20, 2023
1 parent b9eb2d7 commit cd72534
Showing 1 changed file with 4 additions and 9 deletions.
13 changes: 4 additions & 9 deletions rent_crawler/settings.py
@@ -27,6 +27,10 @@
'scrapy_poet.InjectionMiddleware': 543,
}

SPIDER_MIDDLEWARES = {
'rent_crawler.middlewares.RedisKeySpiderMiddleware': 600,
}

USER_AGENT = (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 "
"Safari/601.3.9"
@@ -38,17 +42,8 @@
# "scrapy_fake_useragent.providers.FixedUserAgentProvider",
# ]

# Use Scrapy-Redis's scheduler and dupefilter
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "rent_crawler.dupefilter.RedisDupeFilter"
DUPEFILTER_DEBUG = True

# Don't cleanup Redis queues, allows pause/resume crawls
SCHEDULER_PERSIST = True

# Schedule requests using a priority queue (FIFO by default)
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'

ITEM_PIPELINES = {
"rent_crawler.pipelines.MongoDBPipeline": 100
}
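The diff registers `rent_crawler.middlewares.RedisKeySpiderMiddleware` but does not show its implementation. A minimal sketch of what a spider middleware with that responsibility might look like follows; the class body, the `KEY_TEMPLATE` naming scheme, and the `redis_key` attribute are assumptions for illustration, not the repository's actual code. Only the method name `process_start_requests` comes from Scrapy's spider-middleware API.

```python
# Hypothetical sketch of RedisKeySpiderMiddleware; the real implementation
# lives in rent_crawler/middlewares.py and is not shown in this diff.
class RedisKeySpiderMiddleware:
    """Derives a per-spider Redis key and exposes it on the spider."""

    # Assumed naming scheme: "<spider name>:requests".
    KEY_TEMPLATE = "{spider}:requests"

    def process_start_requests(self, start_requests, spider):
        # Attach the computed key before any request is scheduled,
        # then pass the start requests through unchanged.
        spider.redis_key = self.KEY_TEMPLATE.format(spider=spider.name)
        yield from start_requests
```

With the priority `600` used in `SPIDER_MIDDLEWARES`, this would run after Scrapy's built-in spider middlewares, which is a common slot for project-specific middleware.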
