Tracing system calls

To find out what a process is doing, use the strace command to trace every system call the process makes.

1. Run php start.php status to see information about the workerman processes, for example:

  Hello admin
  ---------------------------------------GLOBAL STATUS--------------------------------------------
  WorkerMan version:3.0.1
  start time:2014-08-12 17:42:04 run 0 days 1 hours
  load average: 3.34, 3.59, 3.67
  1 users 8 workers 14 processes
  worker_name exit_status exit_count
  BusinessWorker 0 0
  ChatWeb 0 0
  FileMonitor 0 0
  Gateway 0 0
  Monitor 0 0
  StatisticProvider 0 0
  StatisticWeb 0 0
  StatisticWorker 0 0
  ---------------------------------------PROCESS STATUS-------------------------------------------
  pid memory listening timestamp worker_name total_request packet_err thunder_herd client_close send_fail throw_exception suc/total
  10352 1.5M tcp://0.0.0.0:55151 1407836524 ChatWeb 12 0 0 2 0 0 100%
  10354 1.25M tcp://0.0.0.0:7272 1407836524 Gateway 3 0 0 0 0 0 100%
  10355 1.25M tcp://0.0.0.0:7272 1407836524 Gateway 0 0 1 0 0 0 100%
  10365 1.25M tcp://0.0.0.0:55757 1407836524 StatisticWeb 0 0 0 0 0 0 100%
  10358 1.25M tcp://0.0.0.0:7272 1407836524 Gateway 3 0 2 0 0 0 100%
  10364 1.25M tcp://0.0.0.0:55858 1407836524 StatisticProvider 0 0 0 0 0 0 100%
  10356 1.25M tcp://0.0.0.0:7272 1407836524 Gateway 3 0 2 0 0 0 100%
  10366 1.25M udp://0.0.0.0:55656 1407836524 StatisticWorker 55 0 0 0 0 0 100%
  10349 1.25M tcp://127.0.0.1:7373 1407836524 BusinessWorker 5 0 0 0 0 0 100%
  10350 1.25M tcp://127.0.0.1:7373 1407836524 BusinessWorker 0 0 0 0 0 0 100%
  10351 1.5M tcp://127.0.0.1:7373 1407836524 BusinessWorker 5 0 0 0 0 0 100%
  10348 1.25M tcp://127.0.0.1:7373 1407836524 BusinessWorker 2 0 0 0 0 0 100%
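
When several processes share a worker name, it can help to pull their pids out of the status output before attaching strace. The following is a minimal sketch. It assumes the PROCESS STATUS rows look like the sample above, with the pid in the first column and worker_name in the fifth. The function name pids_for_worker is hypothetical and is not part of workerman.

```shell
# Hypothetical helper (not part of workerman): read status output on stdin
# and print the pid of every process row whose worker_name column (assumed
# to be column 5, as in the sample above) equals the first argument.
pids_for_worker() {
    awk -v name="$1" '$1 ~ /^[0-9]+$/ && $5 == name { print $1 }'
}

# Usage (assumes start.php is in the current directory):
#   php start.php status | pids_for_worker Gateway
```

For the sample above, filtering on Gateway would list 10354, 10355, 10358 and 10356, one per line.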

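2. For example, to find out what the Gateway process with pid 10354 is doing, run strace -p 10354 (root privileges may be required). The output looks similar to the following: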

  sudo strace -p 10354
  Process 10354 attached - interrupt to quit
  clock_gettime(CLOCK_MONOTONIC, {118627, 242986712}) = 0
  gettimeofday({1407840609, 102439}, NULL) = 0
  epoll_wait(3, 985f4f0, 32, -1) = -1 EINTR (Interrupted system call)
  --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
  send(7, "\f", 1, 0) = 1
  sigreturn() = ? (mask now [])
  clock_gettime(CLOCK_MONOTONIC, {118627, 699623319}) = 0
  gettimeofday({1407840609, 559092}, NULL) = 0
  epoll_wait(3, {{EPOLLIN, {u32=9, u64=9}}}, 32, -1) = 1
  clock_gettime(CLOCK_MONOTONIC, {118627, 699810499}) = 0
  gettimeofday({1407840609, 559277}, NULL) = 0
  recv(9, "\f", 1024, 0) = 1
  recv(9, 0xb60b4880, 1024, 0) = -1 EAGAIN (Resource temporarily unavailable)
  epoll_wait(3, 985f4f0, 32, -1) = -1 EINTR (Interrupted system call)
  --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
  send(7, "\f", 1, 0) = 1
  sigreturn() = ? (mask now [])
  clock_gettime(CLOCK_MONOTONIC, {118628, 699497204}) = 0
  gettimeofday({1407840610, 558937}, NULL) = 0
  epoll_wait(3, {{EPOLLIN, {u32=9, u64=9}}}, 32, -1) = 1
  clock_gettime(CLOCK_MONOTONIC, {118628, 699588603}) = 0
  gettimeofday({1407840610, 559023}, NULL) = 0
  recv(9, "\f", 1024, 0) = 1
  recv(9, 0xb60b4880, 1024, 0) = -1 EAGAIN (Resource temporarily unavailable)
  epoll_wait(3, 985f4f0, 32, -1) = -1 EINTR (Interrupted system call)
  --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
  send(7, "\f", 1, 0) = 1
  sigreturn() = ? (mask now [])
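
A long trace is easier to read once it is reduced to a count per system call. The sketch below is one way to do that. It assumes the trace has been saved to a file (strace writes to stderr by default, and its -o option writes to a file instead). The function name count_syscalls and the path /tmp/gateway.trace are hypothetical.

```shell
# Hypothetical helper: read strace output on stdin and print "count name"
# for each system call, most frequent first. Lines that do not start with
# a call name followed by "(" (for example the "--- SIGUSR2 ..." signal
# lines) are ignored.
count_syscalls() {
    awk '/^[a-z_][a-z0-9_]*\(/ {
             name = substr($0, 1, index($0, "(") - 1)
             n[name]++
         }
         END { for (k in n) print n[k], k }' | sort -k1,1nr -k2,2
}

# Usage (the file path is only an example):
#   sudo strace -p 10354 -o /tmp/gateway.trace    # stop with Ctrl-C
#   count_syscalls < /tmp/gateway.trace
```

strace can also produce a summary itself: with -c it reports per-call counts, errors and time when it detaches, which may be enough when the individual calls are not needed.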

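3. Each line is one system call (lines beginning with --- report a signal the process received). From this output it is easy to see what the process is doing and to locate where it is stuck, for example on establishing a connection or on reading network data.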
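
To make the "where is it stuck" step concrete, the sketch below looks at the last system call in a saved trace and gives a rough hint about what the process was doing. When strace attaches to a blocked process, the call in progress typically appears as the last, unfinished line. The function name last_syscall_hint and the mapping from call names to hints are illustrative assumptions, not an exhaustive classification.

```shell
# Hypothetical helper: read strace output on stdin and describe, in rough
# terms, the last system call that appears in it.
last_syscall_hint() {
    awk '/^[a-z_][a-z0-9_]*\(/ {
             name = substr($0, 1, index($0, "(") - 1)
         }
         END {
             if (name == "") { print "no system call seen"; exit }
             if (name == "connect") hint = "establishing a connection"
             else if (name ~ /^(recv|recvfrom|read)$/) hint = "reading data"
             else if (name ~ /^(send|sendto|write)$/) hint = "writing data"
             else if (name ~ /^(epoll_wait|poll|select)$/) hint = "waiting for events"
             else hint = "other"
             print name ": " hint
         }'
}

# Usage (the file path is only an example):
#   sudo strace -p 10354 -o /tmp/gateway.trace    # stop with Ctrl-C
#   last_syscall_hint < /tmp/gateway.trace
```

A process that ends on epoll_wait is usually idle and waiting for events, while one that ends on connect or recv for a long time is more likely blocked on the network.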