Google Cloud Observability includes observability services that help you to understand the behavior, health, and performance of your applications. Visibility into how applications behave and how components are connected helps you to anticipate, identify, and respond to unexpected changes more quickly and effectively.
This document includes the following information:
Observability is a holistic approach to gathering and analyzing telemetry data in order to understand the state of your environment. Telemetry data is metrics, logs, traces, and other data generated by your applications and the application infrastructure that provide information about application health and performance. Application-centric observability refers to tools that let you visualize and analyze the telemetry data from the perspective of an application.
A log is a generated record of system or application activity over time. Each log is a collection of timestamped log entries, and each log entry describes an event at a specific point in time.
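As a minimal sketch of the idea above, the following Python snippet models a log as an ordered collection of timestamped entries, each written as one JSON line. The field names (`timestamp`, `severity`, `message`) and the `customer_id` value are assumptions chosen for illustration. Whether a particular logging backend recognizes them depends on how that backend is configured.

```python
import json
from datetime import datetime, timezone


def make_log_entry(severity: str, message: str, **fields) -> str:
    """Return one log entry as a single JSON line with a UTC timestamp."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "severity": severity,
        "message": message,
    }
    entry.update(fields)  # extra context, e.g. a hypothetical customer_id
    return json.dumps(entry)


# A "log" here is just an ordered collection of such entries.
checkout_log = [
    make_log_entry("INFO", "order received", customer_id="c-123"),
    make_log_entry("ERROR", "payment declined", customer_id="c-123"),
]

for line in checkout_log:
    print(line)
```

Each printed line describes one event at one point in time. Read in order, the lines form the record of activity for this hypothetical checkout component.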
A log often contains rich, detailed information that helps you understand what happened with a specific part of your application. However, logs don't provide good information about how a change in one component of your application relates to activity in another component. Traces can help to bridge that gap.
Traces represent the path of a request across the parts of your distributed application. A metric or log entry in one application component that triggered an alert notification might be a symptom of a problem that originates in another component. Traces let you follow the flow of a request and examine latency data to help you to identify the root cause of an issue.
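To make the idea of following a request concrete, here is a small illustrative model of a trace as a list of spans linked by parent IDs. It is not the Cloud Trace API. The `Span` class, the component names, and the timings are all hypothetical, and the self-time calculation assumes that child spans run sequentially without overlapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    component: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> int:
    """Time spent in this span, excluding its direct children.

    Assumes children run sequentially and do not overlap.
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def slowest_component(spans: List[Span]) -> str:
    """Return the component whose span has the largest self time."""
    return max(spans, key=lambda s: self_time_ms(s, spans)).component


# Hypothetical trace: frontend -> checkout -> database
trace = [
    Span("a", None, "frontend", 0, 500),
    Span("b", "a", "checkout", 20, 480),
    Span("c", "b", "database", 40, 450),
]

print(slowest_component(trace))  # prints "database"
```

In this example the frontend span takes the longest overall (500 ms), but almost all of that time is spent waiting on the database. A latency alert on the frontend would therefore be a symptom of a problem that originates in another component.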
You can gain additional insights by analyzing metrics, logs, and traces in the context of other data. For example, a label for the severity of an alert or the customer ID associated with a request in logs provides context that can be useful for troubleshooting and debugging.
Monitoring, debugging, and troubleshooting distributed applications can be difficult because there are many systems and software components involved, often with a mix of open source and commercial software.
Observability tools help you to navigate this complexity by collecting meaningful data and providing features to explore, analyze, and correlate the data. An observable environment helps you to:
In short, an observable environment helps you to maintain application reliability. An application is reliable when it meets your current objectives for availability and resilience to failures.
To learn more about reliability practices, including principles and practices related to observability, read the book Site Reliability Engineering: How Google Runs Production Systems. Topics include:
Services in Google Cloud Observability help you to collect, analyze, and correlate telemetry data, both from your applications and from the underlying infrastructure. These services also provide built-in defaults to help you get started faster, such as default dashboards for your App Hub applications and preconfigured alerting policies.
Cloud Monitoring, Cloud Logging, and Cloud Trace are among the services enabled by default when you create a Google Cloud project.
Monitoring: Use collected metrics to monitor health and performance, identify trends and issues, and receive notifications about changes in behavior.
Logging: Use collected logs to debug, troubleshoot, and gain insights about your applications.
Error Reporting: View and analyze errors from running cloud services.
Trace: View and analyze the flow and latency of application requests when you are debugging and troubleshooting.
Cloud Profiler: Analyze CPU and memory usage for your applications so that you can identify opportunities to improve performance.
This section describes steps you can take to get familiar with observability features in Google Cloud.
Try the quickstarts
Try the quickstarts to get familiar with the available services.
Look at automatically collected data
Most Google Cloud services automatically generate predefined metrics and logs. This means that you can start looking at some observability data for supported Google Cloud services without additional configuration.
You can also chart collected metrics in Metrics Explorer, view logs in Logs Explorer, or view traces in Trace. To review related data together, create custom dashboards. For example, you can create a dashboard that includes logs, performance metrics, and alerting policies for virtual machines.
Configure Compute Engine VMs to collect additional data
Without the Ops Agent, Compute Engine VMs collect only basic system metrics and logs by default.
Install the Ops Agent to collect additional telemetry data (logs, metrics, and traces) from your Compute Engine instances and applications for troubleshooting, performance monitoring, and alerting.
By default, GKE clusters send system logs and system metrics to Logging and Monitoring. Google Cloud Managed Service for Prometheus handles collection of third-party and user-defined metrics.
If you have a Cloud Run service that writes Prometheus metrics, then you can use the Prometheus sidecar to send the metrics to Cloud Monitoring.
If your Cloud Run service writes OTLP metrics instead, then you can use an OpenTelemetry sidecar. For an example, see the tutorial for collecting OTLP metrics by using the sidecar.
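For the Prometheus case, "writing Prometheus metrics" usually means exposing them in the Prometheus text exposition format. The sketch below renders a single counter in that format by hand so that you can see its shape. The `render_prometheus` helper, the metric name, and the label values are hypothetical. The sketch doesn't escape special characters in label values and doesn't cover the sidecar configuration. In practice, a Prometheus client library would typically produce this output for you.

```python
def render_prometheus(name: str, help_text: str, samples: dict) -> str:
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    Label values are not escaped; this is a simplification.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts for a service.
requests_by_status = {
    (("method", "GET"), ("code", "200")): 1027,
    (("method", "POST"), ("code", "500")): 3,
}

print(render_prometheus(
    "http_requests_total",
    "Total HTTP requests handled.",
    requests_by_status,
))
```

The output contains a `# HELP` line, a `# TYPE` line, and one sample line for each label combination, such as `http_requests_total{method="GET",code="200"} 1027`.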
Instrument your applications
Instrumentation is code that you add to an application to emit telemetry data. Several open-source instrumentation frameworks let you collect metrics, logs, and traces from your application and send that data to any vendor, including Google Cloud. However, you might not need to instrument your application. For example, Cloud Run, Cloud Run functions, and App Engine provide automatic tracing.
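To illustrate what "code that you add to an application to emit telemetry data" can look like, the following sketch uses only the Python standard library. It is a hand-rolled decorator that records the latency and outcome of each call into an in-memory list. It is not an instrumentation framework, and the names `timed`, `recorded`, and `lookup_price` are hypothetical. A real application would normally rely on an instrumentation framework and an exporter instead of a list.

```python
import functools
import time

# In-memory sink standing in for a real telemetry backend (hypothetical).
recorded = []


def timed(operation: str):
    """Decorator that records the latency and outcome of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                recorded.append(
                    {"operation": operation, "status": status, "latency_ms": elapsed_ms}
                )
        return wrapper
    return decorator


@timed("lookup_price")
def lookup_price(sku: str) -> float:
    # Hypothetical application logic.
    return {"apple": 0.5}[sku]


lookup_price("apple")
print(recorded[-1]["operation"], recorded[-1]["status"])  # prints "lookup_price ok"
```

The application logic in `lookup_price` is unchanged. The decorator wraps it and emits one record per call, including calls that raise an exception.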
To instrument your application, we recommend that you use a vendor-neutral instrumentation framework that is open source, such as OpenTelemetry, instead of vendor- and product-specific APIs or client libraries. For information about instrumenting your application, see Instrumentation and observability.
For code samples that illustrate how to instrument your application to send telemetry to Google Cloud, see the following:
You might also be interested in exploring the following topics: