Adding authentication to the scrapyd server

We can also put a reverse proxy in front of scrapyd to handle user authentication. Taking nginx as an example:

Install nginx

  sudo apt-get install nginx

Configure nginx

Open /etc/nginx/nginx.conf with vi and modify it as follows:

  # Scrapyd local proxy for basic authentication.
  # Don't forget iptables rule.
  # iptables -A INPUT -p tcp --destination-port 6800 -s ! 127.0.0.1 -j DROP
  http {
      server {
          listen 6801;
          location / {
              proxy_pass http://127.0.0.1:6800/;
              auth_basic "Restricted";
              auth_basic_user_file /etc/nginx/conf.d/.htpasswd;
          }
      }
  }
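The comment in the config above refers to an iptables rule that drops outside connections to port 6800 so that clients cannot bypass the authenticated proxy. A minimal sketch (note that modern iptables puts the negation before the option, i.e. ! -s rather than -s !):

  # Drop TCP packets to scrapyd's port 6800 unless they come from localhost,
  # so external clients cannot bypass the authenticated nginx proxy.
  sudo iptables -A INPUT -p tcp --destination-port 6800 ! -s 127.0.0.1 -j DROP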

Next, create the password file /etc/nginx/conf.d/.htpasswd referenced in the config above and add the user information. In this example the username is enlong and the password is test.

Creating the user credentials for nginx with htpasswd

  python@ubuntu:/etc/nginx/conf.d$ sudo htpasswd -c .htpasswd enlong
  New password:
  Re-type new password:
  Adding password for user enlong
  python@ubuntu:/etc/nginx/conf.d$ cat .htpasswd
  enlong:$apr1$2slPhvee$6cqtraHxoxclqf1DpqIPM.
  python@ubuntu:/etc/nginx/conf.d$ sudo htpasswd -b .htpasswd admin admin

Examples of the Apache htpasswd command

1. How do you add a user with the htpasswd command?

  htpasswd -bc .passwd www.leapsoul.cn php

  This generates a .passwd file in the bin directory with the username www.leapsoul.cn and the password php, hashed with MD5 by default.

2. How do you add another user to an existing password file?

  htpasswd -b .passwd leapsoul phpdev

  Drop the -c option to add a second user after the first, and so on for further users.
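Newer Apache versions (2.4+) also ship a -v option that checks a typed password against an existing entry, which is handy for confirming the file was written correctly; a quick sketch:

  # Verify that the stored hash for enlong matches the password you type.
  sudo htpasswd -v /etc/nginx/conf.d/.htpasswd enlong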

Restart nginx

  sudo service nginx restart

Test nginx

  F:\_____gitProject_______\curl-7.33.0-win64-ssl-sspi\tieba_baidu>curl http://localhost:6801/schedule.json -d project=tutorial -d spider=tencent -u enlong:test
  {"status": "ok", "jobid": "5ee61b08428611e6af1a000c2969bafd", "node_name": "ubuntu"}

Configure the scrapy.cfg file

  [deploy]
  url = http://192.168.17.129:6801/
  project = tutorial
  username = admin
  password = admin

Note that the url above has been changed to the port nginx listens on (6801), and the username/password pair matches an entry in the htpasswd file.
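With these credentials in scrapy.cfg, deploying through the authenticated proxy works as usual; a sketch, assuming the scrapyd-client package (which provides scrapyd-deploy and reads url, username and password from scrapy.cfg) is installed:

  # Build an egg of the tutorial project and upload it via nginx on port 6801.
  pip install scrapyd-client
  scrapyd-deploy -p tutorial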

Reminder: remember to set the bind_address field in the scrapyd configuration on the server to 127.0.0.1, so that nginx cannot be bypassed by connecting to port 6800 directly from outside. See the configuration file settings later in this article.

Edit the configuration file

sudo vi /etc/scrapyd/scrapyd.conf

  [scrapyd]
  bind_address = 127.0.0.1
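After restarting scrapyd you can verify that it now only listens on the loopback interface; a quick check, assuming ss is available (netstat -tlnp works similarly):

  # The only listener on port 6800 should be bound to 127.0.0.1.
  ss -tlnp | grep 6800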

When scrapyd starts, it automatically searches for configuration files, loading them in the following order:

  /etc/scrapyd/scrapyd.conf
  /etc/scrapyd/conf.d/*
  scrapyd.conf
  ~/.scrapyd.conf

Files loaded later override settings from earlier ones; for example, a per-user ~/.scrapyd.conf wins over the system-wide /etc/scrapyd/scrapyd.conf.

The default configuration file is shown below; modify it as needed.

  [scrapyd]
  eggs_dir = eggs
  logs_dir = logs
  items_dir = items
  jobs_to_keep = 5
  dbs_dir = dbs
  max_proc = 0
  max_proc_per_cpu = 4
  finished_to_keep = 100
  poll_interval = 5
  bind_address = 0.0.0.0
  http_port = 6800
  debug = off
  runner = scrapyd.runner
  application = scrapyd.app.application
  launcher = scrapyd.launcher.Launcher

  [services]
  schedule.json = scrapyd.webservice.Schedule
  cancel.json = scrapyd.webservice.Cancel
  addversion.json = scrapyd.webservice.AddVersion
  listprojects.json = scrapyd.webservice.ListProjects
  listversions.json = scrapyd.webservice.ListVersions
  listspiders.json = scrapyd.webservice.ListSpiders
  delproject.json = scrapyd.webservice.DeleteProject
  delversion.json = scrapyd.webservice.DeleteVersion
  listjobs.json = scrapyd.webservice.ListJobs
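Each entry in the [services] section maps a JSON endpoint to its handler, and all of them sit behind the nginx authentication just like schedule.json. For example, through the proxy:

  # List the available projects, then the jobs of the tutorial project.
  curl -u enlong:test http://localhost:6801/listprojects.json
  curl -u enlong:test "http://localhost:6801/listjobs.json?project=tutorial"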

For the exact meaning of each configuration parameter, see the official documentation:

http://scrapyd.readthedocs.io/en/stable/config.html