Configs in detail: selector

selector is the page element selector class. The methods it provides are described below.

select($html, $selector, $selector_type = 'xpath')

@param $html the page content to select from
@param $selector the selector rule
@param $selector_type the selector type: xpath, regex, or css; defaults to xpath

Example 1:

Extract the page title with an XPath selector

  $html = requests::get("http://www.epooll.com/archives/806/");
  $data = selector::select($html, "//div[contains(@class,'page-header')]//h1//a");
  var_dump($data);
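
To try the same XPath rule without a network request, here is a minimal offline sketch built on PHP's bundled DOM extension. It is not phpspider's implementation: the helper `xpath_texts()` and the sample markup are hypothetical, and the return shape of `selector::select` may differ from the plain array used here.

```php
<?php
// Hypothetical local markup standing in for the downloaded page.
$html = <<<HTML
<div class="page-header">
  <h1><a href="/archives/806/">Sample title</a></h1>
</div>
HTML;

// Hypothetical helper (not part of phpspider): returns the trimmed text
// of every node matched by an XPath query.
function xpath_texts(string $html, string $query): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so silence libxml parse warnings.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$titles = xpath_texts($html, "//div[contains(@class,'page-header')]//h1//a");
var_dump($titles);
```

Running the rule against a saved copy of the page this way makes it easier to debug an expression before wiring it into a crawl.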

Example 2:

Extract the page title with a CSS selector

  $html = requests::get("http://www.epooll.com/archives/806/");
  $data = selector::select($html, ".page-header > h1 > a", "css");
  var_dump($data);
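
The CSS rule above describes the same nodes as a slightly stricter XPath expression. The sketch below is a hand-written translation run through PHP's DOMXPath on hypothetical sample markup; it only illustrates how the two notations relate and is not a description of how phpspider handles the css type internally.

```php
<?php
// Hypothetical local markup standing in for the downloaded page.
$html = <<<HTML
<div class="page-header">
  <h1><a href="/archives/806/">Sample title</a></h1>
</div>
HTML;

$dom = new DOMDocument();
@$dom->loadHTML($html); // silence warnings from imperfect HTML
$xpath = new DOMXPath($dom);

// Hand-written XPath equivalent of the CSS rule ".page-header > h1 > a":
// the concat/normalize-space idiom matches "page-header" as a whole class
// token, and each "/" step mirrors the CSS child combinator ">".
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' page-header ')]/h1/a";

$titles = [];
foreach ($xpath->query($query) as $node) {
    $titles[] = trim($node->textContent);
}
var_dump($titles);
```

Note that `>` selects direct children only, whereas the `//` steps in Example 1 match descendants at any depth.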

Example 3:

Extract the page title with a regular expression

  $html = requests::get("http://www.epooll.com/archives/806/");
  $data = selector::select($html, '@<title>(.*?)</title>@', "regex");
  var_dump($data);
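
The pattern in this example can be checked on its own with PHP's `preg_match`. This is a minimal sketch on a hypothetical HTML string; whether `selector::select` returns the capture group in exactly this form is an assumption.

```php
<?php
// Hypothetical local markup standing in for the downloaded page.
$html = '<html><head><title>Sample title</title></head><body></body></html>';

// The @ characters are the pattern delimiters; (.*?) lazily captures
// everything between the opening and closing title tags.
if (preg_match('@<title>(.*?)</title>@', $html, $matches)) {
    $title = $matches[1]; // first capture group
} else {
    $title = null;        // no title tag found
}
var_dump($title);
```

If a title may span several lines, adding the `s` modifier after the closing delimiter lets `.` match newlines as well.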

remove($html, $selector, $selector_type = 'xpath')

@param $html the page content to filter
@param $selector the selector rule
@param $selector_type the selector type: xpath, regex, or css; defaults to xpath

For example:

  $html =<<<STR
  <div id="demo">
  aaa
  <span class="tt">bbb</span>
  <span>ccc</span>
  <p>ddd</p>
  </div>
  STR;
  // Get the content of the div whose id is "demo"
  $html = selector::select($html, "//div[contains(@id,'demo')]");
  // From that content, remove the span tags whose class is "tt"
  $data = selector::remove($html, "//span[contains(@class,'tt')]");
  print_r($data);
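
The effect of removing matched nodes can be reproduced with PHP's DOM extension alone. The following is a rough sketch under that assumption, not phpspider's own code, and the exact string that `selector::remove` returns may be formatted differently.

```php
<?php
$html = <<<STR
<div id="demo">
aaa
<span class="tt">bbb</span>
<span>ccc</span>
<p>ddd</p>
</div>
STR;

$dom = new DOMDocument();
@$dom->loadHTML($html); // silence warnings from imperfect HTML
$xpath = new DOMXPath($dom);

// Copy the matches into an array so the loop does not depend on the
// node list while the tree is being modified.
$matches = iterator_to_array($xpath->query("//span[contains(@class,'tt')]"));
foreach ($matches as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the demo div back to an HTML string.
$div = $xpath->query("//div[@id='demo']")->item(0);
$result = $dom->saveHTML($div);
echo $result;
```

The span containing "bbb" is gone from the output, while "aaa", the plain span, and the paragraph remain.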