This page covers the basics of emitting logs to create availability and latency SLIs. It also provides implementation examples of how to define SLOs using logs-based metrics.
Using data elements in log entries to create service-level indicators is one way to take advantage of existing log payloads. If the data you need isn't already in your logs, you can add logging to an existing service, which might be easier than creating new metric instrumentation.
Logs and metrics
Logs collect records, called log entries, that describe specific events that take place in computer systems. Logs are written by code, by the platform services the code runs on (for example, Dataflow), and by the infrastructure the platform depends on (for example, Compute Engine instances).
Because logs in modern systems descend from—and sometimes still are—text files written to disk, a log entry is analogous to a line in a log file and can be considered the quantum unit of logging.
A log entry minimally consists of two things: a timestamp that indicates when the event took place or when it was ingested, and a payload, either unstructured text or structured data (often in JSON).
Logs can also carry associated metadata, especially when they're ingested into Cloud Logging. Such metadata might include the resource that's writing the log, the log name, and a severity for each entry.
Logs are used for two main purposes: to record specific events so that they can be examined later, and to provide data from which aggregate measurements, such as metrics, can be derived.
Unlike logs, metrics usually don't describe specific events. More commonly, metrics are used to represent the state or health of a system over time. A metric is made up of a series of data points that measure something about your system; each data point includes a timestamp and a numeric value.
Metrics can also have metadata associated with them; the series of data points, referred to as a time series, might include information like the metric name, a description, and often labels that specify which resource is writing the data. For information on the Monitoring metric model, see Metrics, time series, and resources.
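For example, a single data point in a time series written through the Monitoring API might look roughly like the following. The metric type and values here are illustrative assumptions, not data taken from this page's example:
{
  "metric": {
    "type": "logging.googleapis.com/user/example_request_count"
  },
  "resource": {
    "type": "global",
    "labels": {
      "project_id": "stack-doctor"
    }
  },
  "metricKind": "DELTA",
  "valueType": "INT64",
  "points": [
    {
      "interval": {
        "startTime": "2020-09-01T03:30:00Z",
        "endTime": "2020-09-01T03:31:00Z"
      },
      "value": {
        "int64Value": "12"
      }
    }
  ]
}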
Logs-based metrics
Logs-based metrics are metrics created by extracting information from log entries and transforming it into time-series data. Cloud Logging provides mechanisms for creating two kinds of metrics from log entries:
Counter metrics, which count the number of log entries that match a particular filter. You can use a counter metric to determine, for example, the number of requests or errors recorded in the log.
Distribution metrics, which use a regular expression to extract numeric values from the payload of each matching log entry and record them as a distribution. You can use a distribution metric to capture, for example, the request latencies recorded in the log.
For more information on logs-based metrics in Cloud Logging, see Using logs-based metrics.
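In the Cloud Logging API, both kinds of logs-based metrics are represented by the LogMetric resource. As a rough sketch, a counter metric needs little more than a name and a filter; the metric name and filter below are illustrative, not part of the example service:
{
  "name": "example_request_count",
  "description": "Counts log entries that record a request",
  "filter": "resource.type=\"global\" jsonPayload.message=\"request made\""
}
A distribution metric additionally sets a metricDescriptor with valueType DISTRIBUTION, a valueExtractor expression, and optionally bucketOptions; the latency example later on this page shows those fields.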
Using logs-based metrics as SLIs
Logs-based metrics let you extract data from logs in a form you can use for building SLIs in Monitoring:
You can use logs-based counter metrics to express a request-based availability SLI.
You can use a logs-based distribution metric to express a request-based latency SLI.
The Stack Doctor application is an example of a service instrumented to emit log messages that record the requests made to the service, any errors, and request latency. The code for the service is available in the stack-doctor GitHub repository.
The service writes Cloud Logging log entries to the projects/stack-doctor/logs/bunyan_log log. The log entry for each type of event includes a different message value. The log entries for the different types of events look like the following:
On every request:
{
"insertId": "..........iTRVT5MOK2VOsVe31bzrTD",
"jsonPayload": {
"pid": 81846,
"time": "Mon Aug 31 2020 20:30:49 GMT-0700 (Pacific Daylight Time)",
"hostname": "<hostname>",
"level": 30,
"message": "request made",
"v": 0,
"name": "sli-log"
},
"resource": {
"type": "global",
"labels": {
"project_id": "stack-doctor"
}
},
"timestamp": "2020-09-01T03:30:49.263999938Z",
"severity": "INFO",
"logName": "projects/stack-doctor/logs/bunyan_log",
"receiveTimestamp": "2020-09-01T03:30:50.003471183Z"
}
On successful requests:
{
"insertId": "..........qTRVT5MOK2VOsVe31bzrTD",
"jsonPayload": {
"name": "sli-log",
"v": 0,
"pid": 81846,
"level": 30,
"hostname": "<hostname>",
"time": "Mon Aug 31 2020 20:30:49 GMT-0700 (Pacific Daylight Time)",
"message": "success!"
},
"resource": {
"type": "global",
"labels": {
"project_id": "stack-doctor"
}
},
"timestamp": "2020-09-01T03:30:49.874000072Z",
"severity": "INFO",
"logName": "projects/stack-doctor/logs/bunyan_log",
"receiveTimestamp": "2020-09-01T03:30:50.201547371Z"
}
On completed requests:
{
"insertId": "..........mTRVT5MOK2VOsVe31bzrTD",
"jsonPayload": {
"time": "Mon Aug 31 2020 20:30:49 GMT-0700 (Pacific Daylight Time)",
"level": 30,
"name": "sli-log",
"message": "slept for 606 ms",
"hostname": "<hostname>",
"pid": 81846,
"v": 0
},
"resource": {
"type": "global",
"labels": {
"project_id": "stack-doctor"
}
},
"timestamp": "2020-09-01T03:30:49.874000072Z",
"severity": "INFO",
"logName": "projects/stack-doctor/logs/bunyan_log",
"receiveTimestamp": "2020-09-01T03:30:50.201547371Z"
}
On error:
{
"insertId": "..........DTRVT5MOK2VOsVe31bzrTD",
"jsonPayload": {
"hostname": "<hostname>",
"level": 50,
"pid": 81846,
"message": "failure!",
"name": "sli-log",
"time": "Mon Aug 31 2020 20:30:44 GMT-0700 (Pacific Daylight Time)",
"v": 0
},
"resource": {
"type": "global",
"labels": {
"project_id": "stack-doctor"
}
},
"timestamp": "2020-09-01T03:30:44.414999961Z",
"severity": "ERROR",
"logName": "projects/stack-doctor/logs/bunyan_log",
"receiveTimestamp": "2020-09-01T03:30:46.182157077Z"
}
Based on these entries, you can create logs-based metrics that count all requests, count errors, and track request latency. You can then use the logs-based metrics to create availability and latency SLIs.
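If you want to experiment with these metrics without running the service, you can also write similar test entries by hand with the Google Cloud CLI. The following sketch assumes gcloud is authenticated and configured for the stack-doctor project; gcloud logging write uses the global monitored resource by default, which matches the entries above:
# Write a "request made" entry to the bunyan_log log as a JSON payload.
gcloud logging write bunyan_log \
  '{"name": "sli-log", "message": "request made"}' \
  --payload-type=json --severity=INFO

# Write an error entry.
gcloud logging write bunyan_log \
  '{"name": "sli-log", "message": "failure!"}' \
  --payload-type=json --severity=ERROR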
Creating logs-based metrics for SLIs
Before you can create SLIs from logs-based metrics, you must create the logs-based metrics themselves.
After you create your logs-based metrics, you can find them in Monitoring by searching for them in Metrics Explorer. In Monitoring, logs-based metrics have the prefix logging.googleapis.com/user.
Note: The log entries in this example are written against the global monitored resource, which is generally inadvisable. For a real deployment, take care to emit logs against an existing monitored resource.
Metrics for availability SLIs
You express a request-based availability SLI in the Cloud Monitoring API by using the TimeSeriesRatio structure to set up a ratio of "good" or "bad" requests to total requests. This ratio is used in the goodTotalRatio field of a RequestBasedSli structure.
To construct this ratio, you must create logs-based counter metrics. You must create at least two of the following:
A metric that counts total events; use this metric in the ratio's totalServiceFilter.
For the "stack-doctor" example, you can create a logs-based metric that counts log entries in which the message string "request made" appears.
A metric that counts "bad" events; use this metric in the ratio's badServiceFilter.
For the "stack-doctor" example, you can create a logs-based metric that counts log entries in which the message string "failure!" appears.
A metric that counts "good" events; use this metric in the ratio's goodServiceFilter.
For the "stack-doctor" example, you can create a logs-based metric that counts log entries in which the message string "success!" appears.
The SLI in this example is based on a metric for total requests named log_based_total_requests and a metric for errors named log_based_errors.
You can create logs-based metrics by using the Google Cloud console, the Cloud Logging API, or the Google Cloud CLI. To create logs-based counter metrics by using the Google Cloud console, use the following procedure; a Google Cloud CLI sketch follows the procedure:
In the Google Cloud console, go to the Log-based Metrics page:
If you use the search bar to find this page, then select the result whose subheading is Logging.
The logs-based metrics page shows a table of user-defined metrics and a table of system-defined metrics.
Click Create Metric, located above the table of user-defined metrics.
In the Metric type pane, select Counter.
In the Details pane, give your new metric a name. For the "stack-doctor" example, enter log_based_total_requests or log_based_errors; you need both metrics, so repeat this procedure once for each name.
You can ignore the other fields for this example.
In the Filter selection panel, create a query that retrieves only the log entries that you want to count in your metric.
For the "stack-doctor" example, the query for log_based_total_requests
might include the following:
resource.type="global" logName="projects/stack-doctor/logs/bunyan_log" jsonPayload.message="request made"
The query for log_based_errors changes the message string:
resource.type="global" logName="projects/stack-doctor/logs/bunyan_log" jsonPayload.message="failure!"
Click Preview logs to check your filter, and adjust it if necessary.
Ignore the Labels pane for this example.
Click Create Metric to finish the procedure.
For more information on creating logs-based counter metrics, see Creating a counter metric.
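You can also create the same counter metrics from the command line. The following sketch uses the Google Cloud CLI and assumes gcloud is authenticated and configured for the stack-doctor project:
# Counter metric for total requests.
gcloud logging metrics create log_based_total_requests \
  --description="Counts 'request made' log entries" \
  --log-filter='resource.type="global" logName="projects/stack-doctor/logs/bunyan_log" jsonPayload.message="request made"'

# Counter metric for errors.
gcloud logging metrics create log_based_errors \
  --description="Counts 'failure!' log entries" \
  --log-filter='resource.type="global" logName="projects/stack-doctor/logs/bunyan_log" jsonPayload.message="failure!"'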
Metrics for latency SLIs
You express a request-based latency SLI in the Cloud Monitoring API by using a DistributionCut structure, which is used in the distributionCut field of a RequestBasedSli structure. To create a latency SLI, you must create a logs-based distribution metric. This example creates a logs-based distribution metric named log_based_latency.
You can create logs-based metrics by using the Google Cloud console, the Cloud Logging API, or the Google Cloud CLI. To create logs-based distribution metrics by using the Google Cloud console, use the following procedure; a Google Cloud CLI sketch follows the procedure:
In the Google Cloud console, go to the Log-based Metrics page:
If you use the search bar to find this page, then select the result whose subheading is Logging.
The logs-based metrics page shows a table of user-defined metrics and a table of system-defined metrics.
Click Create Metric, located above the table of user-defined metrics.
In the Metric type pane, select Distribution.
In the Details pane, give your new metric a name. For the "stack-doctor" example, enter log_based_latency.
You can ignore the other fields for this example.
In the Filter selection panel, create a query that retrieves only the log entries that you want to count in your metric.
For the "stack-doctor" example, the query for log_based_latency
might include the following:
resource.type="global" logName="projects/stack-doctor/logs/bunyan_log" jsonPayload.message="slept for"
Specify the following fields for the filter query:
Field name: json.message
Regular expression: \s(\d*)\s
The message string for completed requests has the form "slept for n ms". The regular expression extracts the latency value n from the string.
Ignore the Labels pane for this example.
Click Create Metric to finish the procedure.
For more information on creating logs-based distribution metrics, see Creating Distribution metrics.
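You can also create the distribution metric from the command line. The following sketch assumes your version of the Google Cloud CLI supports the --config-from-file flag for logs-based metrics with advanced settings; the bucket layout shown is an illustrative choice, not a requirement:
# latency_metric.yaml: a LogMetric definition whose value extractor pulls
# the millisecond value out of "slept for N ms" messages.
cat > latency_metric.yaml <<'EOF'
description: Request latency extracted from bunyan_log messages
filter: >-
  resource.type="global"
  logName="projects/stack-doctor/logs/bunyan_log"
  jsonPayload.message:"slept for"
valueExtractor: REGEXP_EXTRACT(jsonPayload.message, "\s(\d*)\s")
metricDescriptor:
  metricKind: DELTA
  valueType: DISTRIBUTION
  unit: ms
bucketOptions:
  exponentialBuckets:
    numFiniteBuckets: 64
    growthFactor: 2
    scale: 0.01
EOF

gcloud logging metrics create log_based_latency --config-from-file=latency_metric.yaml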
Availability SLIs
In Cloud Monitoring, you express a request-based availability SLI by using a TimeSeriesRatio structure. The following example shows an SLO that uses the log_based_total_requests and log_based_errors metrics in the ratio. This SLO expects that the ratio of "good" requests to total requests is at least 98% over a rolling 24-hour period:
{
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"totalServiceFilter":
"metric.type=\"logging.googleapis.com/user/log_based_total_requests\"
resource.type=\"global\"",
"badServiceFilter":
"metric.type=\"logging.googleapis.com/user/log_based_errors\"
resource.type=\"global\""
}
}
},
"goal": 0.98,
"rollingPeriod": "86400s",
"displayName": "Log-Based Availability"
}
Latency SLIs
In Cloud Monitoring, you express a request-based latency SLI by using a DistributionCut structure. The following example shows an SLO that uses the log_based_latency metric and expects that 98% of requests complete in under 500 ms over a rolling 24-hour period:
{
"serviceLevelIndicator": {
"requestBased": {
"distributionCut": {
"distributionFilter":
"metric.type=\"logging.googleapis.com/user/log_based_latency\"
resource.type=\"global\"",
"range": {
"min": 0,
"max": 500
}
}
}
},
"goal": 0.98,
"rollingPeriod": "86400s",
"displayName": "98% requests under 500 ms"
}
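The JSON documents above are ServiceLevelObjective definitions for the Cloud Monitoring API. As a sketch of how you might create one, the following call assumes you have already defined a custom service (the SERVICE_ID shown is a placeholder) and saved one of the SLO definitions to a file named slo.json:
# Create the SLO under an existing service by using the Service Monitoring API.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @slo.json \
  "https://monitoring.googleapis.com/v3/projects/stack-doctor/services/SERVICE_ID/serviceLevelObjectives"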
Additional resources