Lookaround Assertions

  • Lookaround assertions (also written "look around")
    • comprise
      • lookahead assertions
        • positive lookahead assertion: (?=xxx)
        • negative lookahead assertion: (?!xxx)
      • lookbehind assertions
        • positive lookbehind assertion: (?<=xxx)
        • negative lookbehind assertion: (?<!xxx)

If lookahead and lookbehind seem confusing, this figure makes them easier to understand:

[Figure: python_look_ahead_and_look_behind]

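There are only two ideas:

  • Standing at the content currently being matched
    • which direction to look
      • ahead: forward, to the right ➡️, toward the part of the string that comes after the current position
        • left to right is the reading direction, so "ahead" means "what follows"
      • behind: backward, to the left ⬅️, toward the part of the string that comes before the current position
        • the syntax therefore adds a less-than sign < to mean "look behind"
          • (?<=xxx)
          • (?<!xxx)
    • positive/negative:
      • positive = affirmative, written with an equals sign =, read as: = xxx (the text must be xxx)
      • negative = negated, written with an exclamation mark !, read as: != xxx (the text must not be xxx)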

==> Combining the two gives the four forms:

  • positive lookahead assertion: (?=xxx)
  • negative lookahead assertion: (?!xxx)
  • positive lookbehind assertion: (?<=xxx)
  • negative lookbehind assertion: (?<!xxx)
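As a quick, hedged illustration of the four forms, the sketch below applies each one to a made-up sample string. The string and the helper name match_starts are illustrative assumptions, not part of the original text. Every match is the same five characters, "apple"; the assertion only decides which occurrences qualify.

```python
import re

# Made-up sample text; "apple" occurs at offsets 0, 11 and 27.
text = "apple pie, apple tart, pineapple"

def match_starts(pattern):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

ahead_pos = match_starts(r"apple(?= pie)")    # followed by " pie"
ahead_neg = match_starts(r"apple(?! pie)")    # not followed by " pie"
behind_pos = match_starts(r"(?<=pine)apple")  # preceded by "pine"
behind_neg = match_starts(r"(?<!pine)apple")  # not preceded by "pine"

print(ahead_pos)   # [0]
print(ahead_neg)   # [11, 27]
print(behind_pos)  # [27]
print(behind_neg)  # [0, 11]

# The assertions are zero-width: the matched text is always just "apple".
print(re.findall(r"apple(?= pie)", text))  # ['apple']
```

Note that the negative lookbehind also succeeds at offset 0, where there is nothing before the match to contradict it.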

From the official documentation:

  (...)
    Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group;
    the contents of a group can be retrieved after a match has been performed,
    and can be matched later in the string with the \number special sequence, described below.
    To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].

  (?=...)
    Matches if ... matches next, but doesn't consume any of the string.
    This is called a lookahead assertion.
    For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.

  (?!...)
    Matches if ... doesn't match next.
    This is a negative lookahead assertion.
    For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.

  (?<=...)
    Matches if the current position in the string is preceded by a match for ... that ends at the current position.
    This is called a positive lookbehind assertion.
    (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches.
    The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
    Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function.

  (?<!...)
    Matches if the current position in the string is not preceded by a match for ....
    This is called a negative lookbehind assertion.
    Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length.
    Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
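The documentation's lookahead examples and its fixed-length rule for lookbehind can be checked directly. The following is a minimal sketch: the "Isaac Newton" input is an added sample, and the variable names are illustrative.

```python
import re

# Lookahead examples taken from the quoted documentation.
followed = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
not_followed = re.search(r"Isaac (?!Asimov)", "Isaac Asimov")
other_name = re.search(r"Isaac (?!Asimov)", "Isaac Newton")  # added sample input

print(repr(followed.group(0)))    # 'Isaac '  ("Asimov" is checked, not consumed)
print(not_followed)               # None
print(repr(other_name.group(0)))  # 'Isaac '

# A lookbehind body must have a fixed width; a* does not, so compiling fails.
try:
    re.compile(r"(?<=a*)def")
    fixed_width_error = None
except re.error as exc:
    fixed_width_error = str(exc)

print(fixed_width_error)
```

The lookahead places no such restriction on its body; only lookbehind has to know how many characters to step back.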

Detailed explanation in code:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Author: Crifan Li
    # Update: 20191224
    # Function: Demo python re lookahead and lookbehind group

    import re

    def demoReLookAheadBehind():
        inputStrList = [
            "date=20191224&name=CrifanLi&language=python",
            "language=python&name=CrifanLi&date=20191224", # lookahead will NOT match
            "language=python&name=CrifanLi date=20191224", # negative lookahead CAN match, the whole 'name=CrifanLi'
            "language=go&name=CrifanLi date=20191224", # positive lookbehind CAN match
            "language=go name=CrifanLi date=20191224", # negative lookbehind CAN match
        ]

        # Raw strings keep \w from being read as a string escape
        groupNormalPattern = r"name=(\w+)" # matches any name=XXX, where XXX is letters, digits or underscores
        groupLookaheadPattern = r"name=(\w+)(?=&language)" # matches only when followed by &language
        groupNegativelookaheadPattern = r"name=(\w+)(?!&)" # matches only when not followed by &
        groupPositivelookbehindPattern = r"(?<=go&)name=(\w+)" # matches only when preceded by go&
        groupNegativelookbehindPattern = r"(?<!&)name=(\w+)" # matches only when not preceded by &

        for curIdx, eachInputStr in enumerate(inputStrList):
            print("\n%s [%d] %s %s" % ("="*20, curIdx, eachInputStr, "="*20))

            print("%s %s %s" % ("-"*10, "normal group", "-"*10))
            foundGroupNormal = re.search(groupNormalPattern, eachInputStr)
            print("foundGroupNormal=%s" % foundGroupNormal)
            if foundGroupNormal:
                wholeMatchStrNormal = foundGroupNormal.group(0)
                print("wholeMatchStrNormal=%s" % wholeMatchStrNormal)
                matchedGroupsNormal = foundGroupNormal.groups()
                print("matchedGroupsNormal=%s" % (matchedGroupsNormal, ))
                foundName = foundGroupNormal.group(1)
                print("foundName=%s" % foundName)

            print("%s %s %s" % ("-"*10, "lookahead group", "-"*10))
            foundGroupLookahead = re.search(groupLookaheadPattern, eachInputStr)
            print("foundGroupLookahead=%s" % foundGroupLookahead)
            if foundGroupLookahead:
                wholeMatchStrLookahead = foundGroupLookahead.group(0)
                print("wholeMatchStrLookahead=%s" % wholeMatchStrLookahead)
                matchedGroupsLookahead = foundGroupLookahead.groups()
                print("matchedGroupsLookahead=%s" % (matchedGroupsLookahead, ))
                foundName = foundGroupLookahead.group(1)
                print("foundName=%s" % foundName)

            print("%s %s %s" % ("-"*10, "negative lookahead group", "-"*10))
            foundGroupNegativelookahead = re.search(groupNegativelookaheadPattern, eachInputStr)
            print("foundGroupNegativelookahead=%s" % foundGroupNegativelookahead)
            if foundGroupNegativelookahead:
                wholeMatchStrNegativelookahead = foundGroupNegativelookahead.group(0)
                print("wholeMatchStrNegativelookahead=%s" % wholeMatchStrNegativelookahead)
                matchedGroupsNegativelookahead = foundGroupNegativelookahead.groups()
                print("matchedGroupsNegativelookahead=%s" % (matchedGroupsNegativelookahead, ))
                foundName = foundGroupNegativelookahead.group(1)
                print("foundName=%s" % foundName)

            print("%s %s %s" % ("-"*10, "positive lookbehind group", "-"*10))
            foundGroupPositivelookbehind = re.search(groupPositivelookbehindPattern, eachInputStr)
            print("foundGroupPositivelookbehind=%s" % foundGroupPositivelookbehind)
            if foundGroupPositivelookbehind:
                wholeMatchStrPositivelookbehind = foundGroupPositivelookbehind.group(0)
                print("wholeMatchStrPositivelookbehind=%s" % wholeMatchStrPositivelookbehind)
                matchedGroupsPositivelookbehind = foundGroupPositivelookbehind.groups()
                print("matchedGroupsPositivelookbehind=%s" % (matchedGroupsPositivelookbehind, ))
                foundName = foundGroupPositivelookbehind.group(1)
                print("foundName=%s" % foundName)

            print("%s %s %s" % ("-"*10, "negative lookbehind group", "-"*10))
            foundGroupNegativelookbehind = re.search(groupNegativelookbehindPattern, eachInputStr)
            print("foundGroupNegativelookbehind=%s" % foundGroupNegativelookbehind)
            if foundGroupNegativelookbehind:
                wholeMatchStrNegativelookbehind = foundGroupNegativelookbehind.group(0)
                print("wholeMatchStrNegativelookbehind=%s" % wholeMatchStrNegativelookbehind)
                matchedGroupsNegativelookbehind = foundGroupNegativelookbehind.groups()
                print("matchedGroupsNegativelookbehind=%s" % (matchedGroupsNegativelookbehind, ))
                foundName = foundGroupNegativelookbehind.group(1)
                print("foundName=%s" % foundName)

    # Output:
    # ==================== [0] date=20191224&name=CrifanLi&language=python ====================
    # ---------- normal group ----------
    # foundGroupNormal=<re.Match object; span=(14, 27), match='name=CrifanLi'>
    # wholeMatchStrNormal=name=CrifanLi
    # matchedGroupsNormal=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- lookahead group ----------
    # foundGroupLookahead=<re.Match object; span=(14, 27), match='name=CrifanLi'>
    # wholeMatchStrLookahead=name=CrifanLi
    # matchedGroupsLookahead=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- negative lookahead group ----------
    # foundGroupNegativelookahead=<re.Match object; span=(14, 26), match='name=CrifanL'>
    # wholeMatchStrNegativelookahead=name=CrifanL
    # matchedGroupsNegativelookahead=('CrifanL',)
    # foundName=CrifanL
    # ---------- positive lookbehind group ----------
    # foundGroupPositivelookbehind=None
    # ---------- negative lookbehind group ----------
    # foundGroupNegativelookbehind=None
    # ==================== [1] language=python&name=CrifanLi&date=20191224 ====================
    # ---------- normal group ----------
    # foundGroupNormal=<re.Match object; span=(16, 29), match='name=CrifanLi'>
    # wholeMatchStrNormal=name=CrifanLi
    # matchedGroupsNormal=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- lookahead group ----------
    # foundGroupLookahead=None
    # ---------- negative lookahead group ----------
    # foundGroupNegativelookahead=<re.Match object; span=(16, 28), match='name=CrifanL'>
    # wholeMatchStrNegativelookahead=name=CrifanL
    # matchedGroupsNegativelookahead=('CrifanL',)
    # foundName=CrifanL
    # ---------- positive lookbehind group ----------
    # foundGroupPositivelookbehind=None
    # ---------- negative lookbehind group ----------
    # foundGroupNegativelookbehind=None
    # ==================== [2] language=python&name=CrifanLi date=20191224 ====================
    # ---------- normal group ----------
    # foundGroupNormal=<re.Match object; span=(16, 29), match='name=CrifanLi'>
    # wholeMatchStrNormal=name=CrifanLi
    # matchedGroupsNormal=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- lookahead group ----------
    # foundGroupLookahead=None
    # ---------- negative lookahead group ----------
    # foundGroupNegativelookahead=<re.Match object; span=(16, 29), match='name=CrifanLi'>
    # wholeMatchStrNegativelookahead=name=CrifanLi
    # matchedGroupsNegativelookahead=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- positive lookbehind group ----------
    # foundGroupPositivelookbehind=None
    # ---------- negative lookbehind group ----------
    # foundGroupNegativelookbehind=None
    # ==================== [3] language=go&name=CrifanLi date=20191224 ====================
    # ---------- normal group ----------
    # foundGroupNormal=<re.Match object; span=(12, 25), match='name=CrifanLi'>
    # wholeMatchStrNormal=name=CrifanLi
    # matchedGroupsNormal=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- lookahead group ----------
    # foundGroupLookahead=None
    # ---------- negative lookahead group ----------
    # foundGroupNegativelookahead=<re.Match object; span=(12, 25), match='name=CrifanLi'>
    # wholeMatchStrNegativelookahead=name=CrifanLi
    # matchedGroupsNegativelookahead=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- positive lookbehind group ----------
    # foundGroupPositivelookbehind=<re.Match object; span=(12, 25), match='name=CrifanLi'>
    # wholeMatchStrPositivelookbehind=name=CrifanLi
    # matchedGroupsPositivelookbehind=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- negative lookbehind group ----------
    # foundGroupNegativelookbehind=None
    # ==================== [4] language=go name=CrifanLi date=20191224 ====================
    # ---------- normal group ----------
    # foundGroupNormal=<re.Match object; span=(12, 25), match='name=CrifanLi'>
    # wholeMatchStrNormal=name=CrifanLi
    # matchedGroupsNormal=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- lookahead group ----------
    # foundGroupLookahead=None
    # ---------- negative lookahead group ----------
    # foundGroupNegativelookahead=<re.Match object; span=(12, 25), match='name=CrifanLi'>
    # wholeMatchStrNegativelookahead=name=CrifanLi
    # matchedGroupsNegativelookahead=('CrifanLi',)
    # foundName=CrifanLi
    # ---------- positive lookbehind group ----------
    # foundGroupPositivelookbehind=None
    # ---------- negative lookbehind group ----------
    # foundGroupNegativelookbehind=<re.Match object; span=(12, 25), match='name=CrifanLi'>
    # wholeMatchStrNegativelookbehind=name=CrifanLi
    # matchedGroupsNegativelookbehind=('CrifanLi',)
    # foundName=CrifanLi

    if __name__ == "__main__":
        demoReLookAheadBehind()

The results of the code above can be summarized as follows:

  • name=(\w+): an ordinary group
    • Match results
      • all 5 inputs match
        • date=20191224&name=CrifanLi&language=python
        • language=python&name=CrifanLi&date=20191224
        • language=python&name=CrifanLi date=20191224
        • language=go&name=CrifanLi date=20191224
        • language=go name=CrifanLi date=20191224
      • the matched text is always:
        • name=CrifanLi
    • Explanation: this is just an ordinary (xxx) group with no extra constraint, so every input matches
  • name=(\w+)(?=&language): lookahead = positive lookahead
    • Match results
      • only 1 input matches:
        • date=20191224&name=CrifanLi&language=python
      • the other 4 do not match
        • language=python&name=CrifanLi&date=20191224
        • language=python&name=CrifanLi date=20191224
        • language=go&name=CrifanLi date=20191224
        • language=go name=CrifanLi date=20191224
    • Explanation:
      • (?=&language) requires that the match be immediately followed by &language
        • in the 4 inputs above, name=CrifanLi is followed by, respectively:
          • "&date="
          • " date="
          • " date="
          • " date="
        • so none of them match
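  • name=(\w+)(?!&): negative lookahead
    • Match results
      • all 5 inputs match, but the matched text differs
        • 2 inputs match: name=CrifanL
          • date=20191224&name=CrifanLi&language=python
          • language=python&name=CrifanLi&date=20191224
        • 3 inputs match: name=CrifanLi
          • language=python&name=CrifanLi date=20191224
          • language=go&name=CrifanLi date=20191224
          • language=go name=CrifanLi date=20191224
    • Explanation
      • note that the first 2 matches lack the final i: they are CrifanL, not CrifanLi
      • this is because (?!&) requires that the match not be followed by &
        • so for text such as
          • name=CrifanLi&language
          • name=CrifanLi&date
        • \w+ first consumes CrifanLi, but the next character is &, so the assertion fails; the engine then backtracks one character to CrifanL, where the next character is i rather than &, and the assertion succeeds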
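  • (?<=go&)name=(\w+): positive lookbehind
    • Match results
      • only 1 input matches
        • language=go&name=CrifanLi date=20191224
    • Explanation
      • (?<=go&) requires that the match be immediately preceded by go&, so only this input matches
      • in the others
        • 20191224&name=
        • python&name=
        • python&name=
        • go name=
      • the text before name= does not satisfy the condition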
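  • (?<!&)name=(\w+): negative lookbehind
    • Match results
      • only 1 input matches:
        • language=go name=CrifanLi date=20191224
    • Explanation
      • (?<!&) requires that the match not be immediately preceded by &
        • so only
          • go name=
        • satisfies it, while in the others
          • 20191224&name=
          • python&name=
          • python&name=
          • go&name=
        • name= is preceded by &, so the condition fails and there is no match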
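The truncated name=CrifanL result from the negative lookahead is a common surprise. One possible way to avoid it, sketched here as an assumption and not as the author's own fix, is to also forbid a word character right after the group, so that \w+ cannot give back its last character to satisfy the assertion. The names loose, strict and whole_match are illustrative.

```python
import re

# The same five inputs used in the demo above.
inputs = [
    "date=20191224&name=CrifanLi&language=python",
    "language=python&name=CrifanLi&date=20191224",
    "language=python&name=CrifanLi date=20191224",
    "language=go&name=CrifanLi date=20191224",
    "language=go name=CrifanLi date=20191224",
]

loose = re.compile(r"name=(\w+)(?!&)")       # pattern from the demo
strict = re.compile(r"name=(\w+)(?![\w&])")  # hypothetical stricter variant

def whole_match(pattern, text):
    """Return the full matched text, or None when nothing matches."""
    found = pattern.search(text)
    return found.group(0) if found else None

loose_results = [whole_match(loose, s) for s in inputs]
strict_results = [whole_match(strict, s) for s in inputs]

print(loose_results)
# ['name=CrifanL', 'name=CrifanL', 'name=CrifanLi', 'name=CrifanLi', 'name=CrifanLi']
print(strict_results)
# [None, None, 'name=CrifanLi', 'name=CrifanLi', 'name=CrifanLi']
```

With the stricter variant, the two inputs where the name is followed by & no longer match at all, which is usually what "not followed by &" is meant to express.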

(Positive) lookbehind

From the official documentation:

  (?<=...)
    Matches if the current position in the string is preceded by a match for ... that ends at the current position.
    This is called a positive lookbehind assertion.
    (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches.
    The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
    Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function.

  (?P<name>...)
    Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name.
    Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression.
    A symbolic group is also a numbered group, just as if the group were not named.
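Two points in this excerpt are easy to verify: a pattern that starts with a positive lookbehind cannot match at the beginning of the string, and a named group is also reachable by number. A minimal sketch follows; the group name word is an illustrative assumption.

```python
import re

# re.match() only tries position 0, where there is nothing to look back at.
anchored = re.match(r"(?<=abc)def", "abcdef")
# re.search() scans forward and succeeds at offset 3, right after "abc".
scanned = re.search(r"(?<=abc)def", "abcdef")

print(anchored)        # None
print(scanned.span())  # (3, 6)

# A named group is also a numbered group.
m = re.search(r"(?<=-)(?P<word>\w+)", "spam-egg")
print(m.group("word"))  # egg
print(m.group(1))       # egg
```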

Detailed explanation in code:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Author: Crifan Li
    # Update: 20191224
    # Function: Demo python re lookbehind group

    import re

    def demoReLookbehind():
        namedGroupPattern = r"(SID=)(?P<sidValue>[^&]+)"
        lookBehindGroupPattern = r"(?<=SID=)(?P<sidValue>[^&]+)"
        # How to read the regex (?<=SID=)[^&]+
        #   (?<=SID=)  has the form (?<=XXX), where XXX is the fixed-length, 4-character string "SID=";
        #              it checks for the SID= that precedes the wanted value in SID=YYY
        #   [^&]+
        #     []  the brackets list the allowed characters
        #     ^   negates the set, so [^&] is any character except &; since the input looks like
        #         SID=8E3qOreRiOnbhkl84Uc&YYY, this keeps the trailing & and YYY out of the match
        #     +   one or more, stopping at the disallowed & character, so here it matches
        #         from 8 to c, i.e. 8E3qOreRiOnbhkl84Uc

        inputStrList = [
            "GeneralSearch&SID=8E3qOreRiOnbhkl84Uc&preferencesSaved",
        ]

        for eachInputStr in inputStrList:
            print("="*60)
            foundNamedGroup = re.search(namedGroupPattern, eachInputStr)
            foundLookbehindGroup = re.search(lookBehindGroupPattern, eachInputStr)
            if foundNamedGroup and foundLookbehindGroup:
                print("foundNamedGroup=%s" % foundNamedGroup)
                print("foundLookbehindGroup=%s" % foundLookbehindGroup)

                # The lookbehind version has no group for "SID=", because the assertion captures nothing
                groupsNamedGroup = foundNamedGroup.groups()
                print("groupsNamedGroup=%s" % (groupsNamedGroup, ))
                groupsLookbehindGroup = foundLookbehindGroup.groups()
                print("groupsLookbehindGroup=%s" % (groupsLookbehindGroup, ))

                # group(0) includes "SID=" only when it was actually consumed
                wholeStrNamedGroup = foundNamedGroup.group(0)
                print("wholeStrNamedGroup=%s" % wholeStrNamedGroup)
                wholeStrLookbehindGroup = foundLookbehindGroup.group(0)
                print("wholeStrLookbehindGroup=%s" % wholeStrLookbehindGroup)

                sidValueNamedGroup = foundNamedGroup.group("sidValue")
                print("sidValueNamedGroup=%s" % sidValueNamedGroup)
                sidValueLookbehindGroup = foundLookbehindGroup.group("sidValue")
                print("sidValueLookbehindGroup=%s" % sidValueLookbehindGroup)

    # Output (after the separator line):
    # foundNamedGroup=<re.Match object; span=(14, 37), match='SID=8E3qOreRiOnbhkl84Uc'>
    # foundLookbehindGroup=<re.Match object; span=(18, 37), match='8E3qOreRiOnbhkl84Uc'>
    # groupsNamedGroup=('SID=', '8E3qOreRiOnbhkl84Uc')
    # groupsLookbehindGroup=('8E3qOreRiOnbhkl84Uc',)
    # wholeStrNamedGroup=SID=8E3qOreRiOnbhkl84Uc
    # wholeStrLookbehindGroup=8E3qOreRiOnbhkl84Uc
    # sidValueNamedGroup=8E3qOreRiOnbhkl84Uc
    # sidValueLookbehindGroup=8E3qOreRiOnbhkl84Uc

    if __name__ == "__main__":
        demoReLookbehind()
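Because the lookbehind consumes nothing, the whole match is just the value, which is convenient with re.findall and re.sub. The sketch below reuses the demo's input string; the replacement text "***" and the variable names are illustrative assumptions.

```python
import re

url_query = "GeneralSearch&SID=8E3qOreRiOnbhkl84Uc&preferencesSaved"

# With no capturing group, findall returns the whole match, i.e. only the value.
sid_values = re.findall(r"(?<=SID=)[^&]+", url_query)
print(sid_values)  # ['8E3qOreRiOnbhkl84Uc']

# sub replaces only the value; "SID=" stays because it was never consumed.
masked = re.sub(r"(?<=SID=)[^&]+", "***", url_query)
print(masked)  # GeneralSearch&SID=***&preferencesSaved
```

The same replacement with the ordinary-group pattern (SID=)([^&]+) would need a backreference such as \1 in the replacement text to keep the SID= prefix.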