Showing content from https://docs.aws.amazon.com/step-functions/latest/dg/input-output-itemreader.html below:

ItemReader (Map)

The ItemReader field is a JSON object that specifies a dataset and its location. A Distributed Map state uses this dataset as its input.

The following example shows the syntax of the ItemReader field in a JSONPath-based workflow, for a dataset in a text delimited file that's stored in an Amazon S3 bucket.

"ItemReader": {
    "ReaderConfig": {
        "InputType": "CSV",
        "CSVHeaderLocation": "FIRST_ROW"
    },
    "Resource": "arn:aws:states:::s3:getObject",
    "Parameters": {
        "Bucket": "myBucket",
        "Key": "csvDataset/ratings.csv",
        "VersionId": "BcK42coT2jE1234VHLUvBV1yLNod2OEt"
    }
}

In JSONata-based workflows, the Parameters field is replaced with Arguments, as the following example shows.

"ItemReader": {
    "ReaderConfig": {
        "InputType": "CSV",
        "CSVHeaderLocation": "FIRST_ROW"
    },
    "Resource": "arn:aws:states:::s3:getObject",
    "Arguments": {
        "Bucket": "amzn-s3-demo-bucket",
        "Key": "csvDataset/ratings.csv"
    }
}

Tip

In Workflow Studio, you specify the dataset and its location in the Item source field.

Contents of the ItemReader field

Depending on your dataset, the contents of the ItemReader field vary. For example, if your dataset is a JSON array passed from a previous step in the workflow, the ItemReader field is omitted. If your dataset is an Amazon S3 data source, this field contains the following sub-fields.

ReaderConfig

A JSON object that specifies details about the dataset, such as its InputType (for example, CSV, JSON, JSONL, or MANIFEST) and, for text delimited files, the CSVHeaderLocation and an optional CSVDelimiter. The examples in this topic show the sub-fields that apply to each dataset type.

Resource

The Amazon S3 API action, such as s3:getObject or s3:listObjectsV2, that Step Functions must invoke for the specified dataset.

Parameters

A JSON object that specifies the Amazon S3 bucket name and object key that the dataset is stored in. In this field, you can also provide the Amazon S3 object version, if the bucket has versioning enabled.

Important

Make sure that your Amazon S3 buckets are in the same AWS account and AWS Region as your state machine.

Note that even though your state machine may be able to access files in buckets across different AWS accounts within the same AWS Region, Step Functions supports listing objects only in S3 buckets that are in both the same AWS account and the same AWS Region as the state machine.

Examples of datasets

You can specify one of the following options as your dataset:

Important

Step Functions needs appropriate permissions to access the Amazon S3 datasets that you use. For information about IAM policies for the datasets, see IAM policies for datasets.

A Distributed Map state can accept a JSON input passed from a previous step in the workflow. This input must either be an array, or must contain an array within a specific node. To select a node that contains the array, you can use the ItemsPath (Map, JSONPath only) field.

To process individual items in the array, the Distributed Map state starts a child workflow execution for each array item. The following tabs show examples of the input passed to the Map state and the corresponding input to a child workflow execution.

Note

Step Functions omits the ItemReader field when your dataset is a JSON array from a previous step.

Input passed to the Map state

Consider the following JSON array of three items.

"facts": [
    {
        "verdict": "true",
        "statement_date": "6/11/2008",
        "statement_source": "speech"
    },
    {
        "verdict": "false",
        "statement_date": "6/7/2022",
        "statement_source": "television"
    },
    {
        "verdict": "mostly-true",
        "statement_date": "5/18/2016",
        "statement_source": "news"
    }
]
Input passed to a child workflow execution

The Distributed Map state starts three child workflow executions. Each execution receives an array item as input. The following example shows the input received by a child workflow execution.

{
  "verdict": "true",
  "statement_date": "6/11/2008",
  "statement_source": "speech"
}
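The fan-out described above can be sketched locally in Python. This is a conceptual illustration of the array-to-child-input mapping, not the Step Functions implementation; the `state_input` variable reuses the sample data from this topic.

```python
import json

# Conceptual sketch (not the Step Functions implementation): a Distributed
# Map state selects the array named by ItemsPath and starts one child
# workflow execution per array item.
state_input = {
    "facts": [
        {"verdict": "true", "statement_date": "6/11/2008", "statement_source": "speech"},
        {"verdict": "false", "statement_date": "6/7/2022", "statement_source": "television"},
        {"verdict": "mostly-true", "statement_date": "5/18/2016", "statement_source": "news"},
    ]
}

# ItemsPath "$.facts" selects the node that contains the array.
items = state_input["facts"]

# Each child workflow execution receives exactly one array item as its JSON input.
child_inputs = [json.dumps(item) for item in items]
```

With three items in the array, three child inputs are produced, matching the three child workflow executions described above.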

A Distributed Map state can iterate over the objects that are stored in an Amazon S3 bucket. When the workflow execution reaches the Map state, Step Functions invokes the ListObjectsV2 API action, which returns an array of Amazon S3 object metadata. Each item in this array contains metadata, such as ETag and Key, for an object stored in the bucket.

To process individual items in the array, the Distributed Map state starts a child workflow execution. For example, suppose that your Amazon S3 bucket contains 100 images. Then, the array returned after invoking the ListObjectsV2 API action contains 100 items. The Distributed Map state then starts 100 child workflow executions to process each array item.

The following tabs show examples of the ItemReader field syntax and the input passed to a child workflow execution for this dataset.

ItemReader syntax

In this example, you've organized your data, which includes images, JSON files, and objects, within a prefix named processData in an Amazon S3 bucket named amzn-s3-demo-bucket.

"ItemReader": {
    "Resource": "arn:aws:states:::s3:listObjectsV2",
    "Parameters": {
        "Bucket": "amzn-s3-demo-bucket",
        "Prefix": "processData"
    }
}
Input passed to a child workflow execution

The Distributed Map state starts as many child workflow executions as the number of items present in the Amazon S3 bucket. The following example shows the input received by a child workflow execution.

{
  "Etag": "\"05704fbdccb224cb01c59005bebbad28\"",
  "Key": "processData/images/n02085620_1073.jpg",
  "LastModified": 1668699881,
  "Size": 34910,
  "StorageClass": "STANDARD"
}
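Because ListObjectsV2 returns results in pages, collecting the full set of object metadata conceptually looks like the following sketch. The `pages` argument stands in for successive API responses (a real caller might use boto3's `get_paginator("list_objects_v2")`); the sample data is hypothetical and no AWS call is made.

```python
def collect_object_metadata(pages):
    """Gather object metadata across paginated ListObjectsV2 responses.

    `pages` stands in for the successive response pages that a real call
    would return. Hypothetical local sketch; no AWS call is made here.
    """
    items = []
    for page in pages:
        # Each "Contents" entry is object metadata, not object contents.
        items.extend(page.get("Contents", []))
    return items

# Hypothetical sample pages; each entry becomes one child workflow input.
fake_pages = [
    {"Contents": [{"Key": "processData/images/n02085620_1073.jpg", "Size": 34910}]},
    {"Contents": [{"Key": "processData/images/n02085620_10074.jpg", "Size": 27034}]},
]

child_inputs = collect_object_metadata(fake_pages)
```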

A Distributed Map state can accept a JSON file that's stored in an Amazon S3 bucket as a dataset. The JSON file must contain an array.

When the workflow execution reaches the Map state, Step Functions invokes the GetObject API action to fetch the specified JSON file. The Map state then iterates over each item in the array and starts a child workflow execution for each item. For example, if your JSON file contains 1000 array items, the Map state starts 1000 child workflow executions.

The following tabs show examples of the ItemReader field syntax and the input passed to a child workflow execution for this dataset.

For this example, imagine you have a JSON file named factcheck.json. You've stored this file within a prefix named jsonDataset in an Amazon S3 bucket. The following is an example of the JSON dataset.

[
  {
    "verdict": "true",
    "statement_date": "6/11/2008",
    "statement_source": "speech"
  },
  {
    "verdict": "false",
    "statement_date": "6/7/2022",
    "statement_source": "television"
  },
  {
    "verdict": "mostly-true",
    "statement_date": "5/18/2016",
    "statement_source": "news"
  },
  ...
]
ItemReader syntax
"ItemReader": {
    "Resource": "arn:aws:states:::s3:getObject",
    "ReaderConfig": {
        "InputType": "JSON"
    },
    "Parameters": {
        "Bucket": "amzn-s3-demo-bucket",
        "Key": "jsonDataset/factcheck.json"
    }
}
Input to a child workflow execution

The Distributed Map state starts as many child workflow executions as the number of array items present in the JSON file. The following example shows the input received by a child workflow execution.

{
  "verdict": "true",
  "statement_date": "6/11/2008",
  "statement_source": "speech"
}
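The contract for a JSON dataset can be sketched as follows: the object fetched with GetObject must parse to a top-level array, and each array item becomes one child workflow input. This is a local illustration using a shortened version of the sample dataset above.

```python
import json

# Sketch of the JSON dataset contract: the fetched object must contain a
# top-level array; each array item becomes one child workflow input.
file_body = """
[
  {"verdict": "true", "statement_date": "6/11/2008", "statement_source": "speech"},
  {"verdict": "false", "statement_date": "6/7/2022", "statement_source": "television"}
]
"""

data = json.loads(file_body)
if not isinstance(data, list):
    raise ValueError("A JSON dataset must contain an array")

child_inputs = data  # one child workflow execution per item
```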

A Distributed Map state can accept a JSON Lines file that's stored in an Amazon S3 bucket as a dataset.

The following tabs show examples of the ItemReader field syntax and the input passed to a child workflow execution for this dataset.

For this example, imagine you have a JSON Lines file named factcheck.jsonl. You've stored this file within a prefix named jsonlDataset in an Amazon S3 bucket. The following is an example of the file's contents.

{"verdict": "true", "statement_date": "6/11/2008", "statement_source": "speech"} 
{"verdict": "false", "statement_date": "6/7/2022", "statement_source": "television"}
{"verdict": "mostly-true", "statement_date": "5/18/2016", "statement_source": "news"}
ItemReader syntax
"ItemReader": {
    "Resource": "arn:aws:states:::s3:getObject",
    "ReaderConfig": {
        "InputType": "JSONL"
    },
    "Parameters": {
        "Bucket": "amzn-s3-demo-bucket",
        "Key": "jsonlDataset/factcheck.jsonl"
    }
}
Input to a child workflow execution

The Distributed Map state starts as many child workflow executions as the number of lines present in the JSONL file. The following example shows the input received by a child workflow execution.

{
  "verdict": "true",
  "statement_date": "6/11/2008",
  "statement_source": "speech"
}
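The line-by-line mapping can be sketched locally: each non-empty line of the JSON Lines file is parsed as an independent JSON object, and each object becomes one child workflow input. The sample data below reuses the factcheck.jsonl contents from this topic.

```python
import json

# Sketch: each line of a JSON Lines dataset is parsed independently, and
# each parsed object becomes the input of one child workflow execution.
jsonl_body = (
    '{"verdict": "true", "statement_date": "6/11/2008", "statement_source": "speech"}\n'
    '{"verdict": "false", "statement_date": "6/7/2022", "statement_source": "television"}\n'
    '{"verdict": "mostly-true", "statement_date": "5/18/2016", "statement_source": "news"}\n'
)

child_inputs = [json.loads(line) for line in jsonl_body.splitlines() if line.strip()]
```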

Note

The CSVDelimiter field gives ItemReader the flexibility to support files that are delimited by characters other than a comma. Therefore, references to CSV files in relation to ItemReader also include files that use any of the delimiters accepted by the CSVDelimiter field.

A Distributed Map state can accept a text delimited file that's stored in an Amazon S3 bucket as a dataset. If you use a text delimited file as your dataset, you must specify the location of the column header. For information about how to specify a header, see Contents of the ItemReader field.

Step Functions parses text delimited files based on a set of rules. For more information about how Step Functions parses a text delimited file, see Example of parsing an input CSV file.

When the workflow execution reaches the Map state, Step Functions invokes the GetObject API action to fetch the specified file. The Map state then iterates over each row in the file and starts a child workflow execution to process the items in each row. For example, suppose that you provide a text delimited file that contains 100 rows as input. Then, the interpreter passes each row to the Map state. The Map state processes items in serial order, starting after the header row.

The following tabs show examples of the ItemReader field syntax and the input passed to a child workflow execution for this dataset.

ItemReader syntax

For example, say that you have a CSV file named ratings.csv. Then, you've stored this file within a prefix that's named csvDataset in an Amazon S3 bucket.

{
  "ItemReader": {
    "ReaderConfig": {
      "InputType": "CSV",
      "CSVHeaderLocation": "FIRST_ROW",
      "CSVDelimiter": "PIPE"
    },
    "Resource": "arn:aws:states:::s3:getObject",
    "Parameters": {
      "Bucket": "amzn-s3-demo-bucket",
      "Key": "csvDataset/ratings.csv"
    }
  }
}
Input to a child workflow execution

The Distributed Map state starts as many child workflow executions as the number of rows present in the CSV file, excluding the header row, if one is present. The following example shows the input received by a child workflow execution.

{
  "rating": "3.5",
  "movieId": "307",
  "userId": "1",
  "timestamp": "1256677221"
}
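The row-to-input mapping can be sketched with Python's csv module: with CSVHeaderLocation set to FIRST_ROW, the first row supplies the field names and every subsequent row becomes one child input object. The delimiter-name-to-character mapping below is an assumption inferred from the CSVDelimiter field values shown in this topic, and the sample data is a pipe-delimited version of the ratings.csv contents.

```python
import csv
import io

# Assumed mapping from CSVDelimiter names to characters (hypothetical,
# based on the field values used in this topic).
DELIMITERS = {"COMMA": ",", "PIPE": "|", "SEMICOLON": ";", "SPACE": " ", "TAB": "\t"}

# Sample pipe-delimited data; the first row is the header (FIRST_ROW).
csv_body = "userId|movieId|rating|timestamp\n1|307|3.5|1256677221\n"

# DictReader pairs each data row with the header names, producing objects
# shaped like the child workflow input shown above.
reader = csv.DictReader(io.StringIO(csv_body), delimiter=DELIMITERS["PIPE"])
child_inputs = [dict(row) for row in reader]
```

Note that every field value arrives as a string, which matches the quoted numbers in the child input example above.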

A Distributed Map state can accept an Amazon S3 inventory manifest file that's stored in an Amazon S3 bucket as a dataset.

When the workflow execution reaches the Map state, Step Functions invokes the GetObject API action to fetch the specified Amazon S3 inventory manifest file. The Map state then iterates over the objects in the inventory to return an array of Amazon S3 inventory object metadata.

The following is an example of an inventory file in CSV format. This file lists the objects stored under the csvDataset and imageDataset prefixes in an Amazon S3 bucket that's named amzn-s3-demo-source-bucket.

"amzn-s3-demo-source-bucket","csvDataset/","0","2022-11-16T00:27:19.000Z"
"amzn-s3-demo-source-bucket","csvDataset/titles.csv","3399671","2022-11-16T00:29:32.000Z"
"amzn-s3-demo-source-bucket","imageDataset/","0","2022-11-15T20:00:44.000Z"
"amzn-s3-demo-source-bucket","imageDataset/n02085620_10074.jpg","27034","2022-11-15T20:02:16.000Z"
...

Important

Currently, Step Functions doesn't support a user-defined Amazon S3 inventory report as a dataset. You must also make sure that the output format of your Amazon S3 inventory report is CSV. For more information about Amazon S3 inventories and how to set them up, see Amazon S3 Inventory in the Amazon S3 User Guide.

The following example of an inventory manifest file shows, in the fileSchema field, the CSV headers for the inventory object metadata.

{
  "sourceBucket" : "amzn-s3-demo-source-bucket",
  "destinationBucket" : "arn:aws:s3:::amzn-s3-demo-inventory",
  "version" : "2016-11-30",
  "creationTimestamp" : "1668560400000",
  "fileFormat" : "CSV",
  "fileSchema" : "Bucket, Key, Size, LastModifiedDate",
  "files" : [ {
    "key" : "amzn-s3-demo-bucket/destination-prefix/data/20e55de8-9c21-45d4-99b9-46c732000228.csv.gz",
    "size" : 7300,
    "MD5checksum" : "a7ff4a1d4164c3cd55851055ec8f6b20"
  } ]
}

The following tabs show examples of the ItemReader field syntax and the input passed to a child workflow execution for this dataset.

ItemReader syntax
{
  "ItemReader": {
    "ReaderConfig": {
      "InputType": "MANIFEST"
    },
    "Resource": "arn:aws:states:::s3:getObject",
    "Parameters": {
      "Bucket": "amzn-s3-demo-destination-bucket",
      "Key": "destination-prefix/amzn-s3-demo-bucket/config-id/YYYY-MM-DDTHH-MMZ/manifest.json"
    }
  }
}
Input to a child workflow execution
{
  "LastModifiedDate": "2022-11-16T00:29:32.000Z",
  "Bucket": "amzn-s3-demo-source-bucket",
  "Size": "3399671",
  "Key": "csvDataset/titles.csv"
}

Depending on the fields you selected while configuring the Amazon S3 inventory report, the contents of your manifest.json file may vary from the example shown.
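The relationship between the manifest and the child inputs can be sketched as follows: the manifest's fileSchema field names the columns of the inventory data files, and applying those names to each inventory CSV row yields the child input objects. This is a local illustration using the sample data from this topic; a real reader would first download and decompress the .csv.gz files listed under "files".

```python
import csv
import io

# The fileSchema field from the sample manifest.json above.
file_schema = "Bucket, Key, Size, LastModifiedDate"
headers = [h.strip() for h in file_schema.split(",")]

# One row from the sample inventory CSV shown earlier in this topic.
inventory_row = ('"amzn-s3-demo-source-bucket","csvDataset/titles.csv",'
                 '"3399671","2022-11-16T00:29:32.000Z"')
values = next(csv.reader(io.StringIO(inventory_row)))

# Pairing schema columns with row values produces the child input object.
child_input = dict(zip(headers, values))
```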

IAM policies for datasets

When you create workflows with the Step Functions console, Step Functions can automatically generate IAM policies based on the resources in your workflow definition. These policies include the least privileges necessary to allow the state machine role to invoke the StartExecution API action for the Distributed Map state. These policies also include the least privileges necessary for Step Functions to access AWS resources, such as Amazon S3 buckets and objects and Lambda functions. We highly recommend that you include only those permissions that are necessary in your IAM policies. For example, if your workflow includes a Map state in Distributed mode, scope your policies down to the specific Amazon S3 bucket and folder that contains your dataset.

Important

If you specify an Amazon S3 bucket and object, or prefix, with a reference path to an existing key-value pair in your Distributed Map state input, make sure that you update the IAM policies for your workflow. Scope the policies down to the bucket and object names the path resolves to at runtime.

The following IAM policy examples grant the least privileges required to access your Amazon S3 datasets using the ListObjectsV2 and GetObject API actions.

Example IAM policy for Amazon S3 object as dataset

The following example shows an IAM policy that grants the least privileges to access the objects organized within a prefix named processImages in an Amazon S3 bucket named amzn-s3-demo-bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "processImages"
                    ]
                }
            }
        }
    ]
}
Example IAM policy for a CSV file as dataset

The following example shows an IAM policy that grants least privileges to access a CSV file named ratings.csv.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/csvDataset/ratings.csv"
            ]
        }
    ]
}
Example IAM policy for an Amazon S3 inventory as dataset

The following example shows an IAM policy that grants least privileges to access an Amazon S3 inventory report.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::destination-prefix/amzn-s3-demo-bucket/config-id/YYYY-MM-DDTHH-MMZ/manifest.json",
                "arn:aws:s3:::destination-prefix/amzn-s3-demo-bucket/config-id/data/*"
            ]
        }
    ]
}
