This document introduces the structures used to represent services and SLOs in the SLO API and maps them to the concepts described generally in Concepts in service monitoring.
The SLO API is used to set up service-level objectives (SLOs) that can be used to monitor the health of your services.
Service Monitoring adds the following resources to the Monitoring API:
For information on invoking the API, see Working with the API.
Services
A service is represented by a Service object. This object includes the following fields:
BasicService types.
To define a basic service, you specify the type of service and provide a set of service-specific labels that describe the service:
{
  "serviceType": string,
  "serviceLabels": {
    string: string,
    ...
  }
}
The following sections provide examples for each type of service.
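As a minimal sketch of the structure described above, the following Python helper assembles a basic service definition from a service type and its labels. The function name make_basic_service is hypothetical and not part of the API; the displayName and basicService wrapper fields follow the examples in this document, and the sketch only builds the JSON body without sending it anywhere.

```python
import json


def make_basic_service(display_name, service_type, service_labels):
    """Build the JSON body for a service based on the BasicService type.

    Hypothetical helper for illustration; it performs no validation.
    """
    return {
        "displayName": display_name,
        "basicService": {
            "serviceType": service_type,
            # Copy the mapping so later changes by the caller don't leak in.
            "serviceLabels": dict(service_labels),
        },
    }


body = make_basic_service("DISPLAY_NAME", "APP_ENGINE", {"module_id": "MODULE_ID"})
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with json.dumps avoids hand-written JSON mistakes such as trailing commas.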
Basic service types
This section provides examples of service definitions built on the BasicService type, where the value of the serviceType field is one of the following:
APP_ENGINE
CLOUD_ENDPOINTS
CLUSTER_ISTIO
ISTIO_CANONICAL_SERVICE
CLOUD_RUN
Each of these service types uses the BasicSli service-level indicator.
App Engine
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "APP_ENGINE",
    "serviceLabels": {
      "module_id": "MODULE_ID"
    }
  }
}
Cloud Endpoints
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "CLOUD_ENDPOINTS",
    "serviceLabels": {
      "service": "SERVICE"
    }
  }
}
Cluster Istio
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "CLUSTER_ISTIO",
    "serviceLabels": {
      "location": "LOCATION",
      "cluster_name": "CLUSTER_NAME",
      "service_namespace": "SERVICE_NAMESPACE",
      "service_name": "SERVICE_NAME"
    }
  }
}
Istio Canonical Service
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "ISTIO_CANONICAL_SERVICE",
    "serviceLabels": {
      "mesh_uid": "MESH_UID",
      "canonical_service_namespace": "CANONICAL_SERVICE_NAMESPACE",
      "canonical_service": "CANONICAL_SERVICE"
    }
  }
}
Cloud Run
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "CLOUD_RUN",
    "serviceLabels": {
      "service_name": "SERVICE_NAME",
      "location": "LOCATION"
    }
  }
}
Basic GKE service types
This section contains examples of GKE service definitions built on the BasicService type, where the value of the serviceType field is one of the following:
GKE_NAMESPACE
GKE_WORKLOAD
GKE_SERVICE
You must define SLIs for these service types. They can't use BasicSli service-level indicators. For more information, see Service-level indicators.
GKE namespace
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "GKE_NAMESPACE",
    "serviceLabels": {
      "project_id": "PROJECT_ID",
      "location": "LOCATION",
      "cluster_name": "CLUSTER_NAME",
      "namespace_name": "NAMESPACE_NAME"
    }
  }
}
GKE workload
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "GKE_WORKLOAD",
    "serviceLabels": {
      "project_id": "PROJECT_ID",
      "location": "LOCATION",
      "cluster_name": "CLUSTER_NAME",
      "namespace_name": "NAMESPACE_NAME",
      "top_level_controller_type": "TOPLEVEL_CONTROLLER_TYPE",
      "top_level_controller_name": "TOPLEVEL_CONTROLLER_NAME"
    }
  }
}
GKE service
{
  "displayName": "DISPLAY_NAME",
  "basicService": {
    "serviceType": "GKE_SERVICE",
    "serviceLabels": {
      "project_id": "PROJECT_ID",
      "location": "LOCATION",
      "cluster_name": "CLUSTER_NAME",
      "namespace_name": "NAMESPACE_NAME",
      "service_name": "SERVICE_NAME"
    }
  }
}
Custom services
You can create custom services if none of the basic service types is suitable. A custom service looks like the following:
{ "displayName": "DISPLAY_NAME", "custom": {} }
You must define SLIs for custom services. They can't use BasicSli service-level indicators. For more information, see Service-level indicators.
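The label sets used in the service examples above can be collected into a lookup table and used for a quick local check before a definition is sent to the API. The sketch below is illustrative only: EXAMPLE_LABELS, BASIC_SLI_TYPES, check_service, and can_use_basic_sli are hypothetical names, and treating the labels shown in the examples as the exact required set for each type is an assumption, not something the API documentation states here.

```python
# Label keys as they appear in the examples in this document (assumed complete).
EXAMPLE_LABELS = {
    "APP_ENGINE": {"module_id"},
    "CLOUD_ENDPOINTS": {"service"},
    "CLUSTER_ISTIO": {"location", "cluster_name", "service_namespace", "service_name"},
    "ISTIO_CANONICAL_SERVICE": {"mesh_uid", "canonical_service_namespace", "canonical_service"},
    "CLOUD_RUN": {"service_name", "location"},
    "GKE_NAMESPACE": {"project_id", "location", "cluster_name", "namespace_name"},
    "GKE_WORKLOAD": {"project_id", "location", "cluster_name", "namespace_name",
                     "top_level_controller_type", "top_level_controller_name"},
    "GKE_SERVICE": {"project_id", "location", "cluster_name", "namespace_name",
                    "service_name"},
}

# Service types that this document says use the BasicSli indicator.
BASIC_SLI_TYPES = {"APP_ENGINE", "CLOUD_ENDPOINTS", "CLUSTER_ISTIO",
                   "ISTIO_CANONICAL_SERVICE", "CLOUD_RUN"}


def check_service(service):
    """Return a list of problems found in a service definition (empty if none)."""
    if "custom" in service:
        return []  # custom services carry no serviceType or labels
    basic = service.get("basicService")
    if basic is None:
        return ["expected either 'basicService' or 'custom'"]
    service_type = basic.get("serviceType")
    if service_type not in EXAMPLE_LABELS:
        return [f"unknown serviceType: {service_type!r}"]
    labels = set(basic.get("serviceLabels", {}))
    expected = EXAMPLE_LABELS[service_type]
    problems = []
    if expected - labels:
        problems.append(f"missing labels: {sorted(expected - labels)}")
    if labels - expected:
        problems.append(f"unexpected labels: {sorted(labels - expected)}")
    return problems


def can_use_basic_sli(service):
    """True only for the basic service types listed as using BasicSli."""
    return service.get("basicService", {}).get("serviceType") in BASIC_SLI_TYPES


incomplete = {"displayName": "DISPLAY_NAME",
              "basicService": {"serviceType": "CLOUD_RUN",
                               "serviceLabels": {"service_name": "SERVICE_NAME"}}}
print(check_service(incomplete))
```

A check like this catches misspelled or missing label keys early; the API itself remains the authority on what it accepts.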
Service-level indicators
A service-level indicator (SLI) provides a measure of the performance of a service. An SLI is based on a metric captured by the service. Exactly how the SLI is defined depends on the type of metric used as the indicator metric, but it is generally some comparison between acceptable results and total results.
An SLI is represented by the ServiceLevelIndicator object. This object is a collective way to refer to the three supported types of SLIs:
A basic SLI, which is created automatically for instances of the BasicService service type. This type of SLI is described in Service-level objectives; it is represented by a BasicSli object and measures availability or latency.
A request-based SLI, which you can use to count events that represent acceptable service. Use of this type of SLI is described in Request-based SLOs; it is represented by a RequestBasedSli object.
A windows-based SLI, which you can use to count periods of time that meet some goodness criterion. Use of this type of SLI is described in Windows-based SLOs; it is represented by a WindowsBasedSli object.
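To make the three-way split concrete, the following sketch inspects a ServiceLevelIndicator body and reports which kind of SLI it holds. The key names basicSli, requestBased, and windowsBased are the ones used in the JSON examples in this document; the function sli_kind is hypothetical, and the rule that exactly one of the three keys must be present is an assumption made for this illustration.

```python
SLI_KINDS = {
    "basicSli": "basic",
    "requestBased": "request-based",
    "windowsBased": "windows-based",
}


def sli_kind(sli):
    """Return the kind of SLI held by a ServiceLevelIndicator-style dict."""
    present = [key for key in SLI_KINDS if key in sli]
    if len(present) != 1:
        # Assumption: an indicator carries exactly one of the three kinds.
        raise ValueError(f"expected exactly one of {sorted(SLI_KINDS)}, found {present}")
    return SLI_KINDS[present[0]]


print(sli_kind({"basicSli": {"availability": {}}}))
```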
For example, the following shows a basic availability SLI:
{
  "basicSli": {
    "availability": {},
    "location": [
      "us-central1-c"
    ]
  }
}
Structures for request-based SLIs
A request-based SLI is based on a metric that counts units of service as a ratio between a particular outcome and the total. For example, if you use a metric that counts requests, you can build the ratio between the number of requests that return success and the total number of requests.
There are two ways to build a request-based SLI:
TimeSeriesRatio, when the ratio of good service to total service is computed from two time series whose values have a metric kind of DELTA or CUMULATIVE.
DistributionCut, when the time series has value type DISTRIBUTION and its values have a metric kind of DELTA or CUMULATIVE. The good-service value is the count of items that fall into the histogram buckets in a specified range, and the total is the count of all values in the distribution.
The following shows the JSON representation of an SLI that uses a time-series ratio:
{
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter": "resource.type=https_lb_rule metric.type=\"loadbalancing.googleapis.com/https/request_count\"",
      "goodServiceFilter": "resource.type=https_lb_rule metric.type=\"loadbalancing.googleapis.com/https/request_count\" metric.label.response_code_class=200"
    }
  }
}
The time series in this ratio are identified by the pair of monitored-resource type and metric type:
https_lb_rule
loadbalancing.googleapis.com/https/request_count
The value for the totalServiceFilter is represented by the pair of metric and resource type. The value for the goodServiceFilter is represented by the same pair but where some label has a particular value; in this case, when the value of the response_code_class label is 200.
The ratio between the filters measures the number of requests that return a 2xx HTTP status over the total number of requests.
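Because each filter string contains a quoted metric type, the inner double quotes must be escaped when the filter is embedded in JSON. The following sketch composes the two filters from their parts and lets json.dumps handle the escaping. The helper build_filter is a hypothetical name; the filter layout simply mirrors the strings used in the example above.

```python
import json


def build_filter(resource_type, metric_type, metric_labels=None):
    """Compose a filter string in the layout used by the example above."""
    parts = [f"resource.type={resource_type}", f'metric.type="{metric_type}"']
    for key, value in (metric_labels or {}).items():
        parts.append(f"metric.label.{key}={value}")
    return " ".join(parts)


RESOURCE = "https_lb_rule"
METRIC = "loadbalancing.googleapis.com/https/request_count"

sli = {
    "requestBased": {
        "goodTotalRatio": {
            # Total service: every time series for the resource/metric pair.
            "totalServiceFilter": build_filter(RESOURCE, METRIC),
            # Good service: the same pair, restricted by one label value.
            "goodServiceFilter": build_filter(RESOURCE, METRIC, {"response_code_class": 200}),
        }
    }
}

print(json.dumps(sli, indent=2))
```

Deriving both filters from the same resource and metric constants keeps the numerator and denominator of the ratio aligned.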
The following shows the JSON representation of an SLI that uses a distribution cut:
{
  "requestBased": {
    "distribution_cut": {
      "distribution_filter": "resource.type=https_lb_rule metric.type=\"loadbalancing.googleapis.com/https/backend_latencies\" metric.label.response_code_class=200",
      "range": {
        "min": "-Infinity",
        "max": 500.0
      }
    }
  }
}
The time series is identified by the monitored-resource type, metric type, and value for a metric label:
https_lb_rule
loadbalancing.googleapis.com/https/backend_latencies
response_code_class = 200
The range of latencies considered good is designated by the range field. This SLI computes the ratio of latencies of 2xx-class responses below 500 to the latencies of all 2xx-class responses.
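The arithmetic behind a distribution cut can be sketched with a small histogram. In the example below, the bucket boundaries and counts are made-up sample data, distribution_cut_ratio is a hypothetical helper, and counting only buckets that lie entirely inside the range is an assumption; this document does not say how the service treats a bucket that straddles a range boundary.

```python
import math


def distribution_cut_ratio(buckets, range_min=-math.inf, range_max=math.inf):
    """Return good/total for a histogram given as (lower, upper, count) tuples.

    A bucket counts as good only if it lies entirely within the range
    (an assumption made for this sketch).
    """
    total = sum(count for _, _, count in buckets)
    if total == 0:
        raise ValueError("the distribution is empty")
    good = sum(count for lower, upper, count in buckets
               if lower >= range_min and upper <= range_max)
    return good / total


# Made-up latency histogram: (lower bound, upper bound, number of responses).
buckets = [
    (0, 100, 40),
    (100, 250, 30),
    (250, 500, 20),
    (500, 1000, 8),
    (1000, math.inf, 2),
]

ratio = distribution_cut_ratio(buckets, range_max=500.0)
print(ratio)  # 90 of the 100 responses fall in buckets at or below 500
```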
Structures for windows-based SLIs
A windows-based SLI counts time windows in which the provided service is considered good. The criterion for determining what counts as good service is part of the SLI definition.
All windows-based SLIs include a window period, which can range from 60 seconds to 86,400 seconds (1 day).
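A local check of the window period might look like the following sketch. It assumes the period is written as a whole number of seconds followed by "s", as in the "300s" value used in this document's windows-based example; parse_window_period is a hypothetical helper, and fractional durations are deliberately not handled.

```python
MIN_WINDOW_SECONDS = 60
MAX_WINDOW_SECONDS = 86_400  # 1 day


def parse_window_period(period):
    """Parse a period such as '300s' and check it against the allowed range."""
    if not period.endswith("s"):
        raise ValueError(f"expected a value like '300s', got {period!r}")
    seconds = int(period[:-1])  # raises ValueError for non-integer input
    if not MIN_WINDOW_SECONDS <= seconds <= MAX_WINDOW_SECONDS:
        raise ValueError(
            f"window period must be {MIN_WINDOW_SECONDS} to {MAX_WINDOW_SECONDS} "
            f"seconds, got {seconds}"
        )
    return seconds


print(parse_window_period("300s"))
```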
There are two ways to specify the good-service criterion for a windows-based SLI:
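Create a filter that returns time series with only Boolean values. A window is good if the value for that window is true. This filter is called the goodBadMetricFilter.
Create a PerformanceThreshold object that represents a threshold for acceptable performance. This object is specified as the value of the goodTotalRatioThreshold field.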
A PerformanceThreshold object specifies a threshold value and a performance SLI. If the value of the performance SLI meets or exceeds the threshold value, then the time window counts as good.
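The threshold rule can be illustrated with a few lines of Python. The per-window performance values below are made-up sample data, and count_good_windows is a hypothetical helper; the only behavior taken from the text is that a window is good when its performance meets or exceeds the threshold.

```python
def count_good_windows(performance_per_window, threshold):
    """Return (good_windows, total_windows) for a list of per-window values."""
    good = sum(1 for value in performance_per_window if value >= threshold)
    return good, len(performance_per_window)


# Made-up availability measurements, one value per time window.
samples = [0.95, 0.90, 0.85, 1.0]

good, total = count_good_windows(samples, threshold=0.9)
print(f"{good} of {total} windows are good")
```

Note that the window with a value of exactly 0.90 counts as good, because the comparison is "meets or exceeds".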
There are two ways to specify the performance SLI:
A BasicSli object in the basicSliPerformance field.
A RequestBasedSli object in the performance field. If you use a request-based SLI, then the metric kind of your SLI must be DELTA or CUMULATIVE. You can't use GAUGE metrics in request-based SLIs.
The following shows the JSON representation of a windows-based SLI built on a performance threshold for a basic availability SLI:
{
  "windowsBased": {
    "goodTotalRatioThreshold": {
      "threshold": 0.9,
      "basicSliPerformance": {
        "availability": {},
        "location": [
          "us-central1-c"
        ]
      }
    },
    "windowPeriod": "300s"
  }
}
This SLI specifies good performance as a 5-minute window in which availability reaches 90% or better. The structure of a basic SLI is shown in Service-level indicators.
You can also embed a request-based SLI in the windows-based SLI. For more information on the embedded structures, see Structures for request-based SLIs.
Service-level objectives
A service-level objective (SLO) is represented by a ServiceLevelObjective object. This object includes the following fields:
ServiceLevelIndicator object
The following shows the JSON representation of an SLO that uses a basic availability SLI as the value of the serviceLevelIndicator field:
{
  "name": "projects/PROJECT_NUMBER/services/PROJECT_ID-zone-us-central1-c-csm-main-default-currencyservice/serviceLevelObjectives/3kavNVTtTMuzL7KcXAxqCQ",
  "serviceLevelIndicator": {
    "basicSli": {
      "availability": {},
      "location": [
        "us-central1-c"
      ]
    }
  },
  "goal": 0.98,
  "calendarPeriod": "WEEK",
  "displayName": "98% Availability in Calendar Week"
}
This SLO sets the performance goal at 98 percent availability over a period of a week.
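To show what the goal value means numerically, the following sketch compares a measured ratio of good to total service against the goal. The counts are made-up sample data and slo_met is a hypothetical helper; the service computes compliance itself, so this is only an illustration of the arithmetic.

```python
def slo_met(good, total, goal):
    """Return True if the measured good/total ratio reaches the SLO goal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total >= goal


slo = {"goal": 0.98, "calendarPeriod": "WEEK"}

# Made-up weekly counts of good and total units of service.
print(slo_met(9_850, 10_000, slo["goal"]))  # 98.5% measured against a 98% goal
print(slo_met(9_700, 10_000, slo["goal"]))  # 97.0% measured against a 98% goal
```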