AI Audit Log Reference

Kong AI Gateway emits analytics events for its AI plugins in a standardized logging format, making it possible to aggregate AI usage analytics across providers.

Log Formats

Each AI plugin records the tokens it consumed, together with cost and model metadata, under its own key in the log entry.

All log entries include the following attributes:

  1. "ai": {
  2. "payload": { "request": "[$optional_payload_request_]" },
  3. "[$plugin_name_1]": {
  4. "payload": { "response": "[$optional_payload_response]" },
  5. "usage": {
  6. "prompt_token": 28,
  7. "total_tokens": 48,
  8. "completion_token": 20,
  9. "cost": 0.0038,
  10. "time_per_token": 133
  11. },
  12. "meta": {
  13. "request_model": "command",
  14. "provider_name": "cohere",
  15. "response_model": "command",
  16. "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
  17. "llm_latency": 2670
  18. }
  19. },
  20. "[$plugin_name_2]": {
  21. "payload": { "response": "[$optional_payload_response]" },
  22. "usage": {
  23. "prompt_token": 89,
  24. "total_tokens": 145,
  25. "completion_token": 56,
  26. "cost": 0.0012,
  27. "time_per_token": 87
  28. },
  29. "meta": {
  30. "request_model": "gpt-35-turbo",
  31. "provider_name": "azure",
  32. "response_model": "gpt-35-turbo",
  33. "plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b",
  34. "llm_latency": 4927
  35. }
  36. }
  37. }

Log Details

Each log entry includes the following details (a short aggregation sketch follows the table):

Property                                       Description
ai.payload.request                             The request payload.
ai.[$plugin_name].payload.response             The response payload.
ai.[$plugin_name].usage.prompt_token           Number of tokens used for prompting.
ai.[$plugin_name].usage.completion_token       Number of tokens used for completion.
ai.[$plugin_name].usage.total_tokens           Total number of tokens used.
ai.[$plugin_name].usage.cost                   The total cost of the request (input and output cost).
ai.[$plugin_name].usage.time_per_token         The average time to generate an output token, in milliseconds.
ai.[$plugin_name].meta.request_model           Model used for the AI request.
ai.[$plugin_name].meta.provider_name           Name of the AI service provider.
ai.[$plugin_name].meta.response_model          Model used for the AI response.
ai.[$plugin_name].meta.plugin_id               Unique identifier of the plugin.
ai.[$plugin_name].meta.llm_latency             The time, in milliseconds, it took the LLM provider to generate the full response.
ai.[$plugin_name].cache.cache_status           The cache status. This can be Hit, Miss, Bypass, or Refresh.
ai.[$plugin_name].cache.fetch_latency          The time, in milliseconds, it took to return a cached response.
ai.[$plugin_name].cache.embeddings_provider    For semantic caching, the provider used to generate the embeddings.
ai.[$plugin_name].cache.embeddings_model       For semantic caching, the model used to generate the embeddings.
ai.[$plugin_name].cache.embeddings_latency     For semantic caching, the time, in milliseconds, taken to generate the embeddings.
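
Because each plugin writes its own usage and meta objects under the shared ai key, aggregating analytics is a matter of walking those nested objects. Below is a minimal sketch of that traversal in Python; the entry is abbreviated from the example above, and the plugin key "ai-proxy" and the summarize helper are illustrative names, not part of the logging format:

  import json

  # Abbreviated log entry following the structure documented above.
  # The plugin key "ai-proxy" is an illustrative name.
  entry = json.loads("""
  {
    "ai": {
      "payload": { "request": "..." },
      "ai-proxy": {
        "usage": { "prompt_token": 28, "total_tokens": 48,
                   "completion_token": 20, "cost": 0.0038 },
        "meta": { "provider_name": "cohere", "request_model": "command" }
      }
    }
  }
  """)

  def summarize(entry):
      """Sum token counts and cost across every AI plugin in one log entry."""
      totals = {"total_tokens": 0, "cost": 0.0}
      for key, section in entry.get("ai", {}).items():
          if key == "payload":  # the request payload is not a plugin block
              continue
          usage = section.get("usage", {})
          totals["total_tokens"] += usage.get("total_tokens", 0)
          totals["cost"] += usage.get("cost", 0.0)
      return totals

  print(summarize(entry))  # {'total_tokens': 48, 'cost': 0.0038}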

Cache logging

If you’re using the AI Semantic Cache plugin, log entries include additional details about caching:

  1. "ai": {
  2. "payload": { "request": "[$optional_payload_request_]" },
  3. "[$plugin_name_1]": {
  4. "payload": { "response": "[$optional_payload_response]" },
  5. "usage": {
  6. "prompt_token": 28,
  7. "total_tokens": 48,
  8. "completion_token": 20,
  9. "cost": 0.0038,
  10. "time_per_token": 133
  11. },
  12. "meta": {
  13. "request_model": "command",
  14. "provider_name": "cohere",
  15. "response_model": "command",
  16. "plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
  17. "llm_latency": 2670
  18. },
  19. "cache": {
  20. "cache_status": "Hit",
  21. "fetch_latency": 21
  22. }
  23. },
  24. "[$plugin_name_2]": {
  25. "payload": { "response": "[$optional_payload_response]" },
  26. "usage": {
  27. "prompt_token": 89,
  28. "total_tokens": 145,
  29. "completion_token": 56,
  30. "cost": 0.0012,
  31. },
  32. "meta": {
  33. "request_model": "gpt-35-turbo",
  34. "provider_name": "azure",
  35. "response_model": "gpt-35-turbo",
  36. "plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b",
  37. },
  38. "cache": {
  39. "cache_status": "Hit",
  40. "fetch_latency": 444,
  41. "embeddings_provider": "openai",
  42. "embeddings_model": "text-embedding-3-small",
  43. "embeddings_latency": 424
  44. }
  45. }
  46. }

Note: When a response is served from the cache, time_per_token and llm_latency are omitted. A cached response can be served either from the semantic cache or from the exact cache. If it comes from the semantic cache, the entry includes additional details: the embeddings provider, embeddings model, and embeddings latency.
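
Downstream consumers should therefore treat time_per_token and llm_latency as optional fields. Below is a minimal sketch of selecting a latency figure per plugin block under that assumption; the response_latency_ms helper is a hypothetical name, not part of the gateway:

  from typing import Optional

  def response_latency_ms(plugin_block: dict) -> Optional[int]:
      """Pick the most relevant latency for one plugin's log block:
      fetch_latency for cache hits, llm_latency otherwise."""
      cache = plugin_block.get("cache", {})
      if cache.get("cache_status") == "Hit":
          return cache.get("fetch_latency")  # cache responses omit llm_latency
      return plugin_block.get("meta", {}).get("llm_latency")

  # Cache hit: llm_latency is absent, so fetch_latency (444 ms) is reported.
  hit = {
      "meta": {"provider_name": "azure"},
      "cache": {"cache_status": "Hit", "fetch_latency": 444},
  }
  print(response_latency_ms(hit))  # 444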