Urlencode

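urlencode() works in two steps: it first converts each non-ASCII character (such as a Chinese character) to bytes in some character encoding, then writes every byte as two hexadecimal digits prefixed with the marker %. Because the marker is added per byte, not per character, one Chinese character becomes several %XX groups, as in this URL: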

http://www.lagou.com/jobs/list_Python?px=default&city=%E5%8C%97%E4%BA%AC&district=%E6%9C%9D%E9%98%B3%E5%8C%BA&bizArea=%E6%9C%9B%E4%BA%AC#filterBox
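
The byte-by-byte rule can be sketched in a few lines. This is a minimal illustration, assuming Python 3 (the original examples use Python 2); `percent_encode` is a hypothetical helper, not a library function. Unlike the real `quote()`, it escapes every byte, including ASCII letters, so it only matches the library output for text that contains no ASCII characters.

```python
from urllib.parse import quote


def percent_encode(text, encoding="utf-8"):
    """Hypothetical helper: escape every byte of `text` as %XX."""
    raw = text.encode(encoding)
    # Each byte becomes '%' followed by two uppercase hexadecimal digits.
    return "".join("%{:02X}".format(byte) for byte in raw)


manual = percent_encode("北京")
library = quote("北京")  # quote() encodes as UTF-8 by default in Python 3

print(manual)
print(library)
```

Both lines print the same value as the `city` parameter in the URL above: two characters, six bytes, six %XX groups.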

Figure 1: urlencode encoding

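This raises a question: which character encoding is used to turn the Chinese characters into bytes before they are written as hexadecimal?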

URL encoding with utf-8, gb2312, and gbk

  • utf-8 bytes compared with their URL-encoded form (Python 2 session)

        >>> import urllib
        >>> country = u'中国'
        >>> country.encode('utf-8')
        '\xe4\xb8\xad\xe5\x9b\xbd'
        >>> urllib.quote(country.encode('utf-8'))
        '%E4%B8%AD%E5%9B%BD'

  • gb2312 bytes compared with their URL-encoded form (Python 2 session)

        >>> import urllib
        >>> country = u'中国'
        >>> country.encode('gb2312')
        '\xd6\xd0\xb9\xfa'
        >>> urllib.quote(country.encode('gb2312'))
        '%D6%D0%B9%FA'
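
The comparison above can also be reproduced in Python 3, which is an assumption here since the sessions above are Python 2. In Python 3, `urllib.parse.quote` accepts a `str` and an `encoding` argument, so the explicit `.encode()` step is not needed. The variable names are illustrative only.

```python
from urllib.parse import quote

country = "中国"

# The same text yields different escapes depending on the byte encoding.
utf8_form = quote(country, encoding="utf-8")
gb2312_form = quote(country, encoding="gb2312")
gbk_form = quote(country, encoding="gbk")

print(utf8_form)    # %E4%B8%AD%E5%9B%BD
print(gb2312_form)  # %D6%D0%B9%FA
print(gbk_form)     # %D6%D0%B9%FA (GBK is a superset of GB2312)
```

A server that expects gbk will misread a utf-8 escaped value, so the encoding has to match what the target site uses.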

Examples

Reproduce the following Lagou (拉勾网) URL:

http://www.lagou.com/jobs/list_Python?px=default&city=%E5%8C%97%E4%BA%AC&district=%E6%9C%9D%E9%98%B3%E5%8C%BA&bizArea=%E6%9C%9B%E4%BA%AC#filterBox

    # -*- coding: utf-8 -*-
    # Python 2
    import urllib
    import chardet

    # Encode each Unicode value to UTF-8 bytes before URL-encoding it.
    city = u'北京'.encode('utf-8')
    district = u'朝阳区'.encode('utf-8')
    bizArea = u'望京'.encode('utf-8')
    query = {
        'city': city,
        'district': district,
        'bizArea': bizArea,
    }

    print chardet.detect(query['city'])
    # {'confidence': 0.7525, 'encoding': 'utf-8'}

    # A Python 2 dict does not preserve insertion order, so the parameters
    # may come out in a different order than in the target URL.
    print urllib.urlencode(query)
    # city=%E5%8C%97%E4%BA%AC&bizArea=%E6%9C%9B%E4%BA%AC&district=%E6%9C%9D%E9%98%B3%E5%8C%BA

    print 'http://www.lagou.com/jobs/list_Python?px=default&' + urllib.urlencode(query) + '#filterBox'
    # http://www.lagou.com/jobs/list_Python?px=default&city=%E5%8C%97%E4%BA%AC&bizArea=%E6%9C%9B%E4%BA%AC&district=%E6%9C%9D%E9%98%B3%E5%8C%BA#filterBox
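
For readers on Python 3, a rough equivalent might look like the sketch below. It assumes Python 3's `urllib.parse.urlencode`, which encodes `str` values as UTF-8 by default, and it passes a list of (key, value) pairs so the parameters keep the order of the target URL. The variable names are illustrative.

```python
from urllib.parse import urlencode

# A list of pairs keeps the parameters in a fixed, explicit order.
params = [
    ("px", "default"),
    ("city", "北京"),
    ("district", "朝阳区"),
    ("bizArea", "望京"),
]

# str values are encoded as UTF-8 by default, then percent-escaped.
query_string = urlencode(params)
url = "http://www.lagou.com/jobs/list_Python?" + query_string + "#filterBox"
print(url)
```

The `#filterBox` fragment is appended by hand because it is not part of the query string.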

Reproduce the following Alibaba (阿里巴巴, 1688.com) URL:

https://s.1688.com/selloffer/offer_search.htm?keywords=%CA%D6%BB%FA%BC%B0%C5%E4%BC%FE%CA%D0%B3%A1

Figure 2: urlencode encoding

    # -*- coding: utf-8 -*-
    # Python 2
    import urllib
    import chardet

    # This site expects gbk bytes rather than utf-8.
    keywords = u'手机及配件市场'.encode('gbk')
    query = {
        'keywords': keywords,
    }

    # These characters all belong to GB2312, a subset of GBK,
    # so the bytes are identical in both encodings.
    print chardet.detect(query['keywords'])
    # {'confidence': 0.99, 'encoding': 'GB2312'}

    print urllib.urlencode(query)
    # keywords=%CA%D6%BB%FA%BC%B0%C5%E4%BC%FE%CA%D0%B3%A1

    print 'https://s.1688.com/selloffer/offer_search.htm?' + urllib.urlencode(query)
    # https://s.1688.com/selloffer/offer_search.htm?keywords=%CA%D6%BB%FA%BC%B0%C5%E4%BC%FE%CA%D0%B3%A1
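
In Python 3 the same URL can be built without encoding the value by hand, assuming the `encoding` parameter of `urllib.parse.urlencode`. This is a sketch of one way to do it, not the only one; the variable names are illustrative.

```python
from urllib.parse import urlencode

# encoding="gbk" tells urlencode() which bytes to produce before escaping.
query_string = urlencode({"keywords": "手机及配件市场"}, encoding="gbk")
url = "https://s.1688.com/selloffer/offer_search.htm?" + query_string
print(url)
```

Leaving out `encoding="gbk"` would produce UTF-8 escapes, three %XX groups per character instead of two, which does not match the target URL.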

Exercise

Reproduce the following 环球经贸网 (nowec.com) URL:

http://search.nowec.com/search?q=%B0%B2%C8%AB%C3%C5

    # -*- coding: utf-8 -*-
    # Python 2
    import urllib
    import chardet

    # This site expects gb2312 bytes.
    q = u'安全门'.encode('gb2312')
    query = {
        'q': q,
    }

    print chardet.detect(query['q'])
    # {'confidence': 0.99, 'encoding': 'GB2312'}

    print urllib.urlencode(query)
    # q=%B0%B2%C8%AB%C3%C5

    print 'http://search.nowec.com/search?' + urllib.urlencode(query)
    # http://search.nowec.com/search?q=%B0%B2%C8%AB%C3%C5
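
One way to check the result is to decode the URL again and confirm that the original keyword comes back. The sketch below assumes Python 3 and uses `urllib.parse.parse_qs` with its `encoding` parameter; it also shows what happens when the wrong encoding is assumed. The variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://search.nowec.com/search?q=%B0%B2%C8%AB%C3%C5"
query = urlsplit(url).query

# Decoding with the encoding the site uses recovers the keyword.
decoded = parse_qs(query, encoding="gb2312")
print(decoded["q"][0])

# Decoding with the default (UTF-8) does not: invalid bytes are replaced
# with U+FFFD because parse_qs() uses errors="replace" by default.
wrong = parse_qs(query)
print("\ufffd" in wrong["q"][0])
```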