Showing content from http://cloud.google.com/stackdriver/docs/solutions/slo-monitoring/sli-metrics/overview below:

Overview | Google Cloud Observability

Overview

This section reviews the concept of service-level indicators (SLIs), defines what makes for a good or useful SLI, and provides examples of SLI implementations for selected services. This page is intended for people who want examples that implement service-specific SLIs.

Introduction to SLIs

The reliability of a service is an abstract notion; what reliability means depends on the service and the needs of its users. A service-level indicator (SLI) is a measure of that reliability, used both to communicate about the reliability of the service and to manage the service.

SLIs are measured over a time window. The size of the window typically depends upon the decision the information is being used to make. For example, you might measure a single SLI in the following ways:

We recommend 28 days as a starting point for measuring your SLI; this value provides a good balance between the strategic and tactical use cases.
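As a minimal sketch of a windowed measurement, the following Python function computes the fraction of good events in a trailing window that defaults to 28 days. The function name `windowed_sli` and the `(timestamp, is_good)` event shape are assumptions made for illustration; they are not part of any Google Cloud API.

```python
from datetime import datetime, timedelta


def windowed_sli(events, now, window_days=28):
    """Return the fraction of good events in the trailing window.

    `events` is an iterable of (timestamp, is_good) pairs (a hypothetical
    shape chosen for this sketch). Returns None when the window is empty.
    """
    start = now - timedelta(days=window_days)
    # Keep only the outcomes whose timestamps fall inside (start, now].
    in_window = [is_good for timestamp, is_good in events if start < timestamp <= now]
    if not in_window:
        return None
    # True counts as 1 and False as 0, so the sum is the number of good events.
    return sum(in_window) / len(in_window)


now = datetime(2025, 1, 29)
events = [
    (now - timedelta(days=1), True),
    (now - timedelta(days=2), False),
    (now - timedelta(days=40), False),  # outside the default 28-day window
]
print(windowed_sli(events, now))
```

Passing a different `window_days` value applies the same calculation to a shorter or longer period, which is one way to view a single SLI over more than one window.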

For more information, see the following sections of the Site Reliability Engineering Workbook:

Properties of a good SLI

We consider "good" SLIs to be those measures that meet the following criteria:

Promises

One way of implementing an SLI that has these properties is to think of the SLI in terms of promises made to your users. By counting the promises that you made and upheld over a time window, you can derive a number that ranges from 0% to 100%. Such SLIs also translate well into error budgets: for a given SLO, your error budget is the number of promises you can fail to uphold over a time window while still meeting your SLO.
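The arithmetic behind promises and error budgets can be sketched as follows. The function names and the choice to pass the SLO as a decimal string (to avoid floating-point rounding in the budget) are assumptions made for this illustration.

```python
import math
from fractions import Fraction


def sli_percent(promises_upheld, promises_made):
    """Return the SLI as a percentage between 0 and 100."""
    if promises_made <= 0:
        raise ValueError("promises_made must be positive")
    return 100.0 * promises_upheld / promises_made


def error_budget(promises_made, slo_percent):
    """Return how many promises can be broken while still meeting the SLO.

    `slo_percent` is a decimal string such as "99.9" so that the
    calculation is exact.
    """
    allowed_failure_ratio = 1 - Fraction(slo_percent) / 100
    return math.floor(promises_made * allowed_failure_ratio)


made = 1_000_000
upheld = 999_500
budget = error_budget(made, "99.9")
remaining = budget - (made - upheld)
print(sli_percent(upheld, made), budget, remaining)
```

In this example, a 99.9% SLO over 1,000,000 promises allows 1,000 broken promises; 500 were broken, so half of the error budget remains.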

Examples of promises include:

SLI specifications and implementations

An SLI specification is what you want to measure. The specification doesn't include the exact technical details of how you are going to measure it. For example, the following is a specification of an SLI for page-loading time:

There can be many ways to measure an SLI, each with trade-offs and benefits. Each way of measuring the SLI is an SLI implementation. For example, you might implement the page-loading specification as one of the following:

Each of these choices involves trade-offs between the following characteristics:

Fidelity to user experience usually improves when the SLI is measured closer to the user. For example, the implementation that uses code in the user's browser results in a more accurate measurement of the latency perceived by the user than other measurement choices.

The trade-off is that the browser-based measurement also includes any latency introduced by the user's connection to your service. For example, when a service is used over the public internet, this latency might vary significantly with geographic location or network conditions.

The result is that the browser-based signal is a good proxy for user happiness. However, this signal might not provide actionable information you can use to improve the reliability of your service.

For information about combining multiple measurements to balance this trade-off, see this post from The Telegraph.

Bucketing

You might need multiple SLIs for a service when your service performs different kinds of work for different users, or when it performs a particular task with different possible outcomes.

Different tasks

Services benefit from multiple SLIs when they perform multiple types of work for different categories of users and each type of work influences user happiness differently.

For example, if your service handles both read and write requests, users performing those tasks might have different requirements:

To capture these different requirements, your SLI must be able to distinguish between these two cases. Typically, the SLI metric has a label that you can use to classify values into one of several buckets.
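A minimal sketch of label-based bucketing, assuming each event is a dictionary with a `request_type` label and a boolean `good` field. Both field names are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def sli_by_bucket(events, label):
    """Return a per-bucket SLI as {label value: fraction of good events}."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [good count, total count]
    for event in events:
        counts = totals[event[label]]
        if event["good"]:
            counts[0] += 1
        counts[1] += 1
    return {bucket: good / total for bucket, (good, total) in totals.items()}


events = [
    {"request_type": "read", "good": True},
    {"request_type": "read", "good": True},
    {"request_type": "read", "good": False},
    {"request_type": "write", "good": True},
    {"request_type": "write", "good": True},
]
slis = sli_by_bucket(events, "request_type")
print(slis)
```

Keeping the buckets separate lets each one be compared against its own target, rather than letting a high volume of one request type mask problems in the other.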

One task with different outcomes

Services that perform a single type of work, but whose users have different expectations depending on the response, also benefit from multiple SLIs.

For example, if your service offers only read access to data, users might have different tolerance for latency depending on the outcome of the request:

In this case, your latency SLI needs to be able to distinguish between successful and unsuccessful requests.
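One way to sketch a latency SLI that separates outcomes is to apply a different latency threshold to each outcome. The threshold values, the `(succeeded, latency_ms)` request shape, and the function name below are illustrative assumptions, not values recommended by this page.

```python
def latency_sli_by_outcome(requests, thresholds_ms):
    """Return {outcome: fraction of requests at or under that outcome's threshold}.

    `requests` is an iterable of (succeeded, latency_ms) pairs.
    `thresholds_ms` maps "success" and "failure" to latency limits in
    milliseconds. An outcome with no requests maps to None.
    """
    counts = {outcome: [0, 0] for outcome in thresholds_ms}  # [fast, total]
    for succeeded, latency_ms in requests:
        outcome = "success" if succeeded else "failure"
        counts[outcome][1] += 1
        if latency_ms <= thresholds_ms[outcome]:
            counts[outcome][0] += 1
    return {
        outcome: (fast / total if total else None)
        for outcome, (fast, total) in counts.items()
    }


# Hypothetical thresholds: which outcome tolerates more latency depends on your users.
thresholds = {"success": 500, "failure": 100}
requests = [(True, 200), (True, 700), (False, 50), (False, 150), (False, 80)]
result = latency_sli_by_outcome(requests, thresholds)
print(result)
```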

What's next

For information about implementing SLIs for Google Cloud services using Google Cloud metrics, see the following:

For information about implementing application-specific SLIs, see the following:

For an example that illustrates how to create an SLI for services that report custom metrics, see Setting SLOs: observability using custom metrics.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-11 UTC.
