A Brief Look at Scrapy Cookies
First, to dispel any doubts: Scrapy manages cookies automatically, just like a browser does. From the Scrapy FAQ:
> Does Scrapy manage cookies automatically?
>
> Yes, Scrapy receives and keeps track of cookies sent by servers, and sends them back on subsequent requests, like any regular web browser does.
Cookie management is implemented in CookiesMiddleware, one of Scrapy's downloader middlewares: every request and response passes through it.
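As a side note, this middleware is on by default and can be toggled from a project's settings; `COOKIES_ENABLED` and `COOKIES_DEBUG` are the standard Scrapy settings (the latter feeds the middleware's `debug` flag seen in the code below):

```python
# settings.py -- standard Scrapy settings that control CookiesMiddleware
COOKIES_ENABLED = True   # set to False to bypass cookie handling entirely
COOKIES_DEBUG = True     # log Cookie/Set-Cookie headers (the middleware's `debug` flag)
```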
Let's first look at the part that handles requests:
```python
class CookiesMiddleware(object):
    """This middleware enables working with sites that need cookies"""

    def __init__(self, debug=False):
        # a dict that lazily creates one CookieJar per key
        self.jars = defaultdict(CookieJar)
        self.debug = debug

    def process_request(self, request, spider):
        if request.meta.get('dont_merge_cookies', False):
            return

        # each cookie jar's key is taken from the request's meta dict
        cookiejarkey = request.meta.get("cookiejar")
        jar = self.jars[cookiejarkey]
        cookies = self._get_request_cookies(jar, request)
        # store the request's own cookies in the cookie jar
        for cookie in cookies:
            jar.set_cookie_if_ok(cookie, request)

        # set Cookie header
        # drop any Cookie header already present
        request.headers.pop('Cookie', None)
        # write the jar's cookies into the request headers
        jar.add_cookie_header(request)
        self._debug_cookie(request, spider)
```
The flow is as follows:
- Multiple cookie jars are kept in a dict, created lazily per key
- The cookie jar specified by each request's meta is looked up
- The request's own cookies are added to that jar, subject to the cookie policy
- Finally, the matching cookies from the jar are written into the request's Cookie header
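The jar-selection step above can be sketched with just the standard library (a sketch: Scrapy uses its own CookieJar wrapper, but the defaultdict pattern is the same, and the key names here are made up for illustration):

```python
from collections import defaultdict
from http.cookiejar import CookieJar

# One jar per key, created lazily on first access -- mirroring
# `self.jars = defaultdict(CookieJar)` in CookiesMiddleware.
jars = defaultdict(CookieJar)

# Requests without a "cookiejar" meta key all share the jar stored
# under None; any other key gets its own independent session.
default_jar = jars[None]
session_a = jars["session_a"]   # hypothetical keys
session_b = jars["session_b"]

assert session_a is not session_b   # separate sessions
assert jars[None] is default_jar    # same key -> same jar
print(len(jars))                    # 3
```

In a spider, you pick a jar by setting `meta={"cookiejar": some_key}` on a Request, which is exactly the key that `request.meta.get("cookiejar")` reads above.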
Next, let's see how cookies in the response are handled:
```python
def process_response(self, request, response, spider):
    if request.meta.get('dont_merge_cookies', False):
        return response

    # extract cookies from Set-Cookie and drop invalid/expired cookies
    cookiejarkey = request.meta.get("cookiejar")
    jar = self.jars[cookiejarkey]
    jar.extract_cookies(response, request)
    self._debug_set_cookie(response, spider)
    return response
```
The flow is as follows:
- Look up the cookie jar corresponding to the request in the jars dict.
- Use extract_cookies to add the cookies from the response's Set-Cookie headers to that jar.
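The two halves can be exercised end to end with the standard library's http.cookiejar, which exposes the same extract_cookies / add_cookie_header pair (a sketch: `FakeResponse` and the URLs are invented for illustration; Scrapy adapts its own Response objects to this kind of interface):

```python
import urllib.request
from email.message import Message
from http.cookiejar import CookieJar

class FakeResponse:
    """Minimal response stub: extract_cookies only needs .info() -> headers."""
    def __init__(self, headers):
        self._headers = headers
    def info(self):
        return self._headers

jar = CookieJar()
request = urllib.request.Request("http://example.com/login")

# process_response step: pull cookies out of Set-Cookie into the jar
headers = Message()
headers["Set-Cookie"] = "sessionid=abc123; Path=/"
jar.extract_cookies(FakeResponse(headers), request)

# process_request step: write the jar's cookies into the next request
follow_up = urllib.request.Request("http://example.com/next")
jar.add_cookie_header(follow_up)
print(follow_up.get_header("Cookie"))  # sessionid=abc123
```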
The content above is copyright piaosanlang or its affiliates; to follow or support this content or its associated open-source project, please visit piaosanlang.