
Amazon SageMaker Model Monitor

Amazon SageMaker Model Monitor allows you to create a set of baseline statistics and constraints using the data with which your model was trained, then set up a schedule to monitor the predictions made on your endpoint.


Background

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that encompasses the entire machine learning workflow. You can label and prepare your data, choose an algorithm, train a model, and then tune and optimize it for deployment. You can deploy your models to production with Amazon SageMaker to make predictions at lower costs than was previously possible.

Amazon SageMaker Model Monitor enables you to capture the input, output, and metadata for the invocations of the models that you deploy. It also enables you to analyze the data and monitor its quality. This guide shows how Amazon SageMaker enables these capabilities.

Setup

To get started, you must satisfy the following prerequisites:

Capture real-time inference data from Amazon SageMaker endpoints

To enable data capture for monitoring the model data quality, specify the capture option, DataCaptureConfig, when deploying to an endpoint. With this configuration you can capture the request payload, the response payload, or both. The capture config applies to all variants. For more about the DataCaptureConfig object, see the API documentation.

from sagemaker.model_monitor import DataCaptureConfig

# Capture 100% of request and response payloads to the given S3 prefix
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri='s3://path/for/data/capture'
)

# Deploy the model with data capture enabled
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    data_capture_config=data_capture_config
)

When you invoke the endpoint, the request and response payloads, along with some additional metadata, are saved in the Amazon S3 location that you specified in the DataCaptureConfig. Files are organized by the hour in which the invocation occurred, so expect to see separate files for different time periods. The format of the Amazon S3 path is:

s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl

You can use the S3Downloader utility to view and download the captured data in Amazon S3:

from sagemaker.s3 import S3Downloader

# Invoke the endpoint
predictor.predict(data)

# Get a list of S3 URIs
S3Downloader.list('s3://path/for/data/capture')

# Read a specific file
S3Downloader.read_file('s3://path/for/data/capture/endpoint-name/variant-name/2020/01/01/00/filename.jsonl')

Each captured file is an Amazon SageMaker-specific JSON Lines (jsonl) file. Each inference request is captured on a single line, which contains the input and output merged together.
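
For example, you can parse a captured line with the standard json module. A minimal sketch; the captureData, endpointInput, and endpointOutput keys reflect the capture record layout described above, but verify them against your own files:

import json

from sagemaker.s3 import S3Downloader

# Read one captured file and parse its first record
raw = S3Downloader.read_file(
    's3://path/for/data/capture/endpoint-name/variant-name/2020/01/01/00/filename.jsonl'
)
first_record = json.loads(raw.splitlines()[0])

# Each record holds the request and the response side by side
print(first_record['captureData']['endpointInput']['data'])
print(first_record['captureData']['endpointOutput']['data'])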

Baselining and continuous monitoring

In addition to collecting the data, Amazon SageMaker provides the capability for you to monitor and evaluate the data observed by the endpoints. Two tasks are needed for this, covered in the sections that follow: suggesting a baseline of constraints and statistics from the training dataset, and scheduling continuous monitoring jobs that compare live traffic against that baseline.

Constraint suggestion with baseline/training dataset

You can ask Amazon SageMaker to suggest a set of baseline constraints and generate descriptive statistics that characterize the data in a training dataset stored in Amazon S3. DefaultModelMonitor.suggest_baseline() starts a Processing Job using a Model Monitor container provided by Amazon SageMaker to generate the constraints. You can read more about suggest_baseline() in the API documentation.

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_monitor.suggest_baseline(
    baseline_dataset='s3://path/to/training-dataset-with-header.csv',
    dataset_format=DatasetFormat.csv(header=True),
)

With the monitor object, you can also explore the generated constraints and statistics:

import pandas as pd

baseline_job = my_monitor.latest_baselining_job
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
schema_df.head(10)

constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict["features"])
constraints_df.head(10)
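
You can also iterate over the constraints document directly. A small sketch, assuming the name, inferred_type, and completeness fields that the baselining job emits for each feature:

# Print the suggested per-feature constraints
for feature in baseline_job.suggested_constraints().body_dict["features"]:
    print(feature["name"], feature["inferred_type"], feature.get("completeness"))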

Analyze the data collected for data quality issues

You can also analyze and monitor the data with Monitoring Schedules.

Using DefaultModelMonitor.create_monitoring_schedule(), you can create a model monitoring schedule for an endpoint that compares the baseline resources (constraints and statistics) against real-time traffic. For more about this method, see the API documentation.

from sagemaker.model_monitor import CronExpressionGenerator

my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule',
    endpoint_input=predictor.endpoint_name,
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

The schedule starts jobs at the specified interval.
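
To confirm that the schedule is active, you can describe it. A quick check; the MonitoringScheduleStatus key comes from the underlying DescribeMonitoringSchedule API response:

# Inspect the schedule's current status (e.g. 'Scheduled')
print(my_monitor.describe_schedule()['MonitoringScheduleStatus'])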

Note

Even for an hourly schedule, Amazon SageMaker allows a buffer period of up to 20 minutes to schedule your execution. This is expected behavior, and is done for load balancing on the backend.

Once the executions have started, you can use list_executions() to view them:

executions = my_monitor.list_executions()
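
If the schedule has only just been created, the list may be empty until the first execution starts. A minimal polling sketch, allowing for the scheduling buffer noted above:

import time

# Poll until the first scheduled execution appears. With an hourly
# schedule plus the up-to-20-minute buffer, this can take a while.
while not my_monitor.list_executions():
    print('Waiting for the first execution...')
    time.sleep(60)

# Refresh the list once at least one execution exists
executions = my_monitor.list_executions()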

You can also view the status of a specific execution:

latest_execution = executions[-1]

latest_execution.describe()['ProcessingJobStatus']
latest_execution.describe()['ExitMessage']

Here are the possible terminal states and what each of them means:

Completed: the monitoring execution completed, and no violations were found in the report.
CompletedWithViolations: the execution completed, but constraint violations were detected.
Failed: the monitoring execution failed, for example because of a client error (such as incorrect role permissions) or an infrastructure issue. Examine the FailureReason and ExitMessage to determine the cause.
Stopped: the job exceeded the maximum runtime or was manually stopped.

You can also get the S3 URI for the output with latest_execution.output.destination and analyze the results.
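
For example, you can list and read the generated report files with the S3Downloader utility. A minimal sketch; the constraint_violations.json file name is assumed here, so check the listing first:

from sagemaker.s3 import S3Downloader

# List the report files written by the latest execution
report_uri = latest_execution.output.destination
print(S3Downloader.list(report_uri))

# Read the violations report, if one was produced (file name assumed)
print(S3Downloader.read_file(report_uri.rstrip('/') + '/constraint_violations.json'))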

Visualize results

You can use the monitor object to gather reports for visualization:

suggested_constraints = my_monitor.suggested_constraints()
baseline_statistics = my_monitor.baseline_statistics()

latest_monitoring_violations = my_monitor.latest_monitoring_constraint_violations()
latest_monitoring_statistics = my_monitor.latest_monitoring_statistics()
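
For a quick look at the violations without a full visualization, you can flatten the report into a DataFrame. A minimal sketch, assuming the violations list (with feature_name, constraint_check_type, and description entries) that the Model Monitor container writes:

import pandas as pd

# The report object may be None if no monitoring run has produced violations
if latest_monitoring_violations is not None:
    violations = latest_monitoring_violations.body_dict.get("violations", [])
    print(pd.json_normalize(violations))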

For a tutorial on how to visualize the results, see SageMaker Model Monitor - visualizing monitoring results.

Delete the resources

Before deleting an endpoint, you must first delete the monitoring schedule:

my_monitor.delete_monitoring_schedule()

predictor.delete_endpoint()
predictor.delete_model()

Learn More

Consult our notebook examples for in-depth tutorials: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_model_monitor

