words-match

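words-match is implemented with a trie (DFA), Unix socket communication and custom processes. The component was built to help you deploy a content-detection (sensitive-word filtering) service quickly.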
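The trie (DFA) idea can be sketched in plain PHP. This is a minimal illustration, not words-match's internal code. `TrieMatcher` and `splitChars()` are hypothetical names, and the sketch only reports each matched word with its character offsets. Each dictionary word becomes a path of characters in a nested array. Scanning starts a walk at every position of the text and records a hit whenever the current node marks the end of a word.

```php
<?php
// Hypothetical sketch of trie-based (DFA-style) sensitive-word matching.
// Illustrative only: this is NOT the words-match implementation.

/** Split a UTF-8 string into characters without requiring mbstring. */
function splitChars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

final class TrieMatcher
{
    /** Nested arrays: char => child node; the '#end' key stores a complete word. */
    private array $root = [];

    public function add(string $word): void
    {
        $node = &$this->root;
        foreach (splitChars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['#end'] = $word; // mark the end of a dictionary word
        unset($node);          // break the reference
    }

    /** Returns [word => list of character offsets] for every dictionary word found. */
    public function find(string $text): array
    {
        $chars = splitChars($text);
        $n = count($chars);
        $hits = [];
        for ($i = 0; $i < $n; $i++) {
            // Walk the trie from position $i as far as the text allows.
            $node = $this->root;
            for ($j = $i; $j < $n && isset($node[$chars[$j]]); $j++) {
                $node = $node[$chars[$j]];
                if (isset($node['#end'])) {
                    $hits[$node['#end']][] = $i;
                }
            }
        }
        return $hits;
    }
}

// Usage with words from the sample dictionary.
$m = new TrieMatcher();
foreach (['php', 'java', 'golang', '程序员', '代码', '逻辑'] as $w) {
    $m->add($w);
}
$hits = $m->find('php程序员写代码,php');
// $hits === ['php' => [0, 10], '程序员' => [3], '代码' => [7]]
```

Each walk is bounded by the length of the longest dictionary word rather than by the number of words. This is why a trie suits large word lists.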

Use cases

  • Any product that deals with text content

  • Checking blog articles and comments

  • Checking chat messages

  • Blocking spam content

Requirements

None

Installation

composer require easyswoole/words-match

Repository

easyswoole/words-match

Basic usage

Preparing the dictionary

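When the service starts, it reads the dictionary one line at a time. The first column of each line is the sensitive word, and any further columns hold attached information. For example: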

    php,是世界上,最好的语言
    java
    golang
    程序员
    代码
    逻辑
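Here is a minimal reading of the format described above: one entry per line, with the first comma-separated column as the word and the rest as attached information. `loadDictionary()` is a hypothetical helper written for illustration. It is not the component's own loader, which may interpret extra columns differently.

```php
<?php
// Hypothetical loader for the dictionary format described above.
// Assumption: columns are separated by ASCII commas; not the words-match parser.

function loadDictionary(string $path): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    $dict = [];
    while (($line = fgets($fh)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $columns = explode(',', $line);
        $word = array_shift($columns); // first column: the sensitive word
        $dict[$word] = $columns;       // remaining columns: attached info (may be empty)
    }
    fclose($fh);
    return $dict;
}

// Usage: write the sample dictionary to a temporary file and load it.
$path = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($path, "php,是世界上,最好的语言\njava\ngolang\n程序员\n代码\n逻辑\n");
$dict = loadDictionary($path);
unlink($path);
```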

Registering the service

    <?php
    namespace EasySwoole\EasySwoole;

    use EasySwoole\Component\Di;
    use EasySwoole\EasySwoole\AbstractInterface\Event;
    use EasySwoole\EasySwoole\Swoole\EventRegister;
    use EasySwoole\Http\Request;
    use EasySwoole\Http\Response;
    use EasySwoole\WordsMatch\WMServer;

    class EasySwooleEvent implements Event
    {
        public static function initialize()
        {
            date_default_timezone_set('Asia/Shanghai');
            Di::getInstance()->set(SysConst::HTTP_GLOBAL_ON_REQUEST, function (Request $request, Response $response): bool {
                // TODO: Implement onRequest() method.
                return true;
            });
            Di::getInstance()->set(SysConst::HTTP_GLOBAL_AFTER_REQUEST, function (Request $request, Response $response): void {
                // TODO: Implement afterRequest() method.
            });
        }

        public static function mainServerCreate(EventRegister $register)
        {
            // Configure words-match
            $wdConfig = new \EasySwoole\WordsMatch\Config();
            $wdConfig->setDict(__DIR__ . '/dictionary.txt'); // path to the dictionary file
            $wdConfig->setMaxMEM(1024); // maximum memory per process (MB), default 512 MB
            $wdConfig->setTimeout(3.0); // content-detection timeout (seconds), default 3.0 s
            $wdConfig->setWorkerNum(3); // number of processes
            // $wdConfig->setSockDIR(sys_get_temp_dir()); // (changing this is not recommended) directory for the socket files, default sys_get_temp_dir(), typically '/tmp'
            // Register the service with the main Swoole server
            WMServer::getInstance($wdConfig)->attachServer(ServerManager::getInstance()->getSwooleServer());
        }
    }

Client usage

    <?php
    namespace App\HttpController;

    use EasySwoole\Http\AbstractInterface\Controller;
    use EasySwoole\WordsMatch\WMServer;

    class Index extends Controller
    {
        public function detect()
        {
            // Content to check for sensitive words ("PHP is the best language in the world")
            $content = 'php是世界上最好的语言';
            // Detection result: -1 means the check timed out; when matches are found, the matched sensitive words are returned
            $result = WMServer::getInstance()->detect($content, 3);
            var_dump($result);
            /**
             * Output:
             * array(1) {
             *   [0]=>
             *   object(EasySwoole\WordsMatch\Dictionary\DetectResult)#96 (5) {
             *     ["word"]=>
             *     string(30) "php是世界上最好的语言"
             *     ["location"]=>
             *     array(1) {
             *       [0]=>
             *       array(3) {
             *         ["word"]=>
             *         string(30) "php是世界上最好的语言"
             *         ["length"]=>
             *         int(12)
             *         ["location"]=>
             *         array(1) {
             *           [0]=>
             *           int(0)
             *         }
             *       }
             *     }
             *     ["count"]=>
             *     int(1)
             *     ["remark"]=>
             *     string(0) ""
             *     ["type"]=>
             *     int(1)
             *   }
             * }
             */
        }
    }
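If you need a simple verdict instead of the raw dump, a small wrapper can normalise the return value. `summarizeDetect()` is a hypothetical helper and not part of words-match. It assumes, based on the comment and output above, that `detect()` returns the integer -1 on timeout and otherwise an array of `DetectResult` objects with public `word` and `count` properties. It also assumes that an empty array means nothing matched.

```php
<?php
// Hypothetical helper, not part of words-match.
// Assumes detect() returns int -1 on timeout, else an array of objects
// exposing public ->word and ->count (as in the var_dump output above).

/**
 * @param int|array $result value returned by WMServer::getInstance()->detect()
 * @return array{status: string, words: array<string, int>}
 */
function summarizeDetect($result): array
{
    if ($result === -1) {
        return ['status' => 'timeout', 'words' => []];
    }
    $words = [];
    foreach ((array) $result as $hit) {
        $words[$hit->word] = $hit->count; // matched word => number of occurrences
    }
    return ['status' => $words ? 'blocked' : 'clean', 'words' => $words];
}
```

Inside the controller this could be used as `$verdict = summarizeDetect(WMServer::getInstance()->detect($content, 3));`, with the response then chosen based on `$verdict['status']`.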

Benchmark results

The component was benchmarked with dictionaries of 15,000 and 130,000 words. The service ran with its default of 3 processes.

These figures are for reference only. Verify performance in your own production environment.

Test machine

    MacBook Air (13-inch, 2017)
    Processor: 1.8 GHz Intel Core i5
    Memory: 8 GB 1600 MHz DDR3

15,000 words

Concurrency 10, 100 requests in total
    Concurrency Level: 10
    Time taken for tests: 0.067 seconds
    Complete requests: 100
    Failed requests: 0
    Non-2xx responses: 100
    Total transferred: 17300 bytes
    HTML transferred: 2600 bytes
    Requests per second: 1492.49 [#/sec] (mean)
    Time per request: 6.700 [ms] (mean)
    Time per request: 0.670 [ms] (mean, across all concurrent requests)
    Transfer rate: 252.15 [Kbytes/sec] received

Concurrency 100, 1,000 requests in total
    Concurrency Level: 100
    Time taken for tests: 0.239 seconds
    Complete requests: 1000
    Failed requests: 0
    Non-2xx responses: 1000
    Total transferred: 173000 bytes
    HTML transferred: 26000 bytes
    Requests per second: 4189.17 [#/sec] (mean)
    Time per request: 23.871 [ms] (mean)
    Time per request: 0.239 [ms] (mean, across all concurrent requests)
    Transfer rate: 707.74 [Kbytes/sec] received

130,000 words

Concurrency 10, 100 requests in total
    Concurrency Level: 10
    Time taken for tests: 0.057 seconds
    Complete requests: 100
    Failed requests: 0
    Non-2xx responses: 100
    Total transferred: 17300 bytes
    HTML transferred: 2600 bytes
    Requests per second: 1751.71 [#/sec] (mean)
    Time per request: 5.709 [ms] (mean)
    Time per request: 0.571 [ms] (mean, across all concurrent requests)
    Transfer rate: 295.94 [Kbytes/sec] received

Concurrency 100, 1,000 requests in total
    Concurrency Level: 100
    Time taken for tests: 0.225 seconds
    Complete requests: 1000
    Failed requests: 0
    Non-2xx responses: 1000
    Total transferred: 173000 bytes
    HTML transferred: 26000 bytes
    Requests per second: 4444.84 [#/sec] (mean)
    Time per request: 22.498 [ms] (mean)
    Time per request: 0.225 [ms] (mean, across all concurrent requests)
    Transfer rate: 750.93 [Kbytes/sec] received
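The figures above appear to be ApacheBench (ab) output. Assume ab's usual definitions: N = Complete requests, T = Time taken for tests (seconds), C = Concurrency Level and B = Total transferred (bytes). The derived rows then follow from:

$$
\text{Requests per second} = \frac{N}{T},\qquad
t_{\text{mean}} = \frac{C\,T}{N},\qquad
t_{\text{across}} = \frac{T}{N},\qquad
\text{Transfer rate} = \frac{B}{1024\,T}\ \text{KB/s}
$$

For the 15,000-word run at concurrency 10:

$$
\frac{100}{0.067} \approx 1492.5\ \text{req/s},\qquad
\frac{10 \times 0.067}{100} = 6.7\ \text{ms},\qquad
\frac{0.067}{100} = 0.67\ \text{ms},\qquad
\frac{17300}{1024 \times 0.067} \approx 252.2\ \text{KB/s}
$$

The small differences from the reported values come from the rounded elapsed time. Note also that in every run Non-2xx responses equals Complete requests, with 173 bytes per response and a 26-byte body. The throughput therefore measures whatever the test endpoint returned, which may not be a full successful detection round trip.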