If you are experiencing S3 connector or workflow failures after adding a new S3 bucket or updating an existing S3 bucket, the cause could be S3 propagation latency. You might need to wait up to a few hours before the related S3 connectors and workflows begin working without failures. Various Amazon S3 operations, such as propagating DNS records for new buckets, updating bucket access policies and permissions, reusing bucket names after deletion, and using AWS Regions that are not geographically close to your users or applications, can take from a few minutes to several hours to fully propagate across the Amazon network.
The preceding video does not show how to create an AWS account; enable anonymous access to the bucket (which is supported but not recommended); or generate AWS STS temporary access credentials if required by your organization's security requirements. For more information about these requirements, see the following.
For read access, the authenticated AWS IAM user must have at minimum the permissions of s3:ListBucket and s3:GetObject for that bucket. For write access, the authenticated AWS IAM user must have at minimum the permission of s3:PutObject for that bucket. Permissions can be granted in several ways, for example directly through an IAM policy attached to the user or through a bucket policy such as the examples later in this section.
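For example, here is a minimal sketch of granting these minimum permissions directly to an IAM user with the AWS CLI. The user name my-user and policy name S3ConnectorMinimalAccess are hypothetical placeholders, and <my-bucket-name> is the name of your bucket:
# Attach an inline policy granting the minimum read and write permissions
# for the target bucket to a specific IAM user.
aws iam put-user-policy \
  --user-name my-user \
  --policy-name S3ConnectorMinimalAccess \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
        "Resource": [
          "arn:aws:s3:::<my-bucket-name>",
          "arn:aws:s3:::<my-bucket-name>/*"
        ]
      }
    ]
  }'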
If you use AWS STS temporary credentials, you need the AWS access key ID (AccessKeyId), the AWS secret access key (SecretAccessKey), and the AWS STS session token (SessionToken). AWS STS credentials can be valid for as little as 15 minutes or as long as 36 hours, depending on how the credentials were initially generated. After the expiration time, the credentials are no longer valid and will no longer work with the corresponding S3 connector. You must get a new set of credentials to replace the expired ones by having the user temporarily assume the role again by using the AWS CLI or an AWS SDK, which produces a new, refreshed temporary AWS access key ID, AWS secret access key, and AWS STS session token.
To overwrite the expired credentials with the new set:
Specify the new values for --key, --secret, and --token (CLI) or key, secret, and token (Python) in your command or code for the corresponding S3 source or destination connector.
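For example, a minimal sketch of generating refreshed temporary credentials with the AWS CLI; the role ARN and session name shown here are hypothetical placeholders:
# Assume the role again to get a fresh set of temporary credentials.
aws sts assume-role \
  --role-arn arn:aws:iam::<my-account-id>:role/<my-role> \
  --role-session-name s3-connector-session \
  --duration-seconds 3600
The Credentials object in the response contains the new AccessKeyId, SecretAccessKey, and SessionToken values.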
The connector also needs the path to the bucket or folder, formatted as protocol://bucket/ (for example, s3://my-bucket/) if the target files are in the bucket's root. If the target files are in a folder, the path to the target folder in the S3 bucket is formatted as protocol://bucket/path/to/folder/ (for example, s3://my-bucket/my-folder/).
).Your organization might have stricter bucket policy requirements. Check with your AWS account administrator if you are unsure.
To change the following bucket policy to restrict it to a specific user in the AWS account, change root to that specific username. In this policy, replace the following:
<my-account-id> with your AWS account ID.
<my-bucket-name> in two places with the name of your bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAuthenticatedUsersInAccountReadWrite",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<my-account-id>:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<my-bucket-name>",
        "arn:aws:s3:::<my-bucket-name>/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:PrincipalType": "IAMUser"
        }
      }
    }
  ]
}
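If you prefer the AWS CLI over the console, one way to apply this policy (assuming you saved it to a local file named policy.json, a hypothetical filename) is:
# Apply the saved bucket policy to the bucket.
aws s3api put-bucket-policy --bucket <my-bucket-name> --policy file://policy.json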
Your organization might have stricter bucket policy requirements. Check with your AWS account administrator if you are unsure.
Save the following CloudFormation template to a file named create-s3-bucket.yaml. To change the following bucket policy to restrict it to a specific user in the AWS account, change root to that specific username.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'CloudFormation template to create an S3 bucket with specific permissions for account users.'
Parameters:
  BucketName:
    Type: String
    Description: 'Name of the S3 bucket to create'
Resources:
  MyS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: !Ref BucketName
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: false
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  BucketPolicy:
    Type: 'AWS::S3::BucketPolicy'
    Properties:
      Bucket: !Ref MyS3Bucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: AllowAllAuthenticatedUsersInAccount
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action:
              - 's3:GetObject'
              - 's3:PutObject'
              - 's3:ListBucket'
              - 's3:DeleteObject'
            Resource:
              - !Sub 'arn:aws:s3:::${BucketName}'
              - !Sub 'arn:aws:s3:::${BucketName}/*'
Outputs:
  BucketName:
    Description: 'Name of the created S3 bucket'
    Value: !Ref MyS3Bucket
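One way to deploy this template with the AWS CLI might look like the following; the stack name my-s3-bucket-stack is a hypothetical placeholder:
# Create a CloudFormation stack from the saved template, passing the bucket name as a parameter.
aws cloudformation create-stack \
  --stack-name my-s3-bucket-stack \
  --template-body file://create-s3-bucket.yaml \
  --parameters ParameterKey=BucketName,ParameterValue=<my-unique-bucket-name>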
Your organization might have stricter bucket policy requirements. Check with your AWS account administrator if you are unsure.
Save the following script to a file named create-s3-bucket.sh. To change the following bucket policy to restrict it to a specific user in the AWS account, change root to that specific username. In this script, replace the following:
<my-account-id> with your AWS account ID.
<my-unique-bucket-name> with the name of your bucket.
<us-east-1> with your AWS Region.
#!/bin/bash
# Set variables for the AWS account ID, Amazon S3 bucket name, and AWS Region.
ACCOUNT_ID="<my-account-id>"
BUCKET_NAME="<my-unique-bucket-name>"
REGION="<us-east-1>"

# Temporary filename for the bucket policy.
# Do not change this variable.
POLICY_FILE="bucket_policy.json"

# Create the bucket. Outside of us-east-1, the create-bucket call requires a
# LocationConstraint that names the target Region.
if [ "$REGION" = "us-east-1" ]; then
  aws s3api create-bucket --bucket "$BUCKET_NAME" --region "$REGION"
else
  aws s3api create-bucket --bucket "$BUCKET_NAME" --region "$REGION" \
    --create-bucket-configuration LocationConstraint="$REGION"
fi

# Wait for the bucket to exist.
echo "Waiting for bucket '$BUCKET_NAME' to be fully created..."
aws s3api wait bucket-exists --bucket "$BUCKET_NAME"

# Check if the wait command was successful.
if [ $? -eq 0 ]; then
  echo "The bucket '$BUCKET_NAME' has been fully created."
else
  echo "Error: Timed out waiting for bucket '$BUCKET_NAME' to be created."
  exit 1
fi

# Remove the "block public policy" bucket access setting so that the bucket
# policy can be applied. The other public access protections stay enabled,
# matching the CloudFormation template earlier in this section.
aws s3api put-public-access-block \
  --bucket "$BUCKET_NAME" \
  --public-access-block-configuration \
  '{"BlockPublicAcls": true, "IgnorePublicAcls": true, "BlockPublicPolicy": false, "RestrictPublicBuckets": true}'

# Check if the operation was successful.
if [ $? -eq 0 ]; then
  echo "The block public policy access setting was removed from '$BUCKET_NAME'."
else
  echo "Error: Failed to remove the block public policy access setting from '$BUCKET_NAME'."
  exit 1
fi

# Create the bucket policy.
cat << EOF > $POLICY_FILE
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAuthenticatedUsersInAccountReadWrite",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::$ACCOUNT_ID:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::$BUCKET_NAME",
        "arn:aws:s3:::$BUCKET_NAME/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:PrincipalType": "IAMUser"
        }
      }
    }
  ]
}
EOF

# Apply the bucket policy.
aws s3api put-bucket-policy --bucket "$BUCKET_NAME" --policy file://$POLICY_FILE

# Check if the policy application was successful.
if [ $? -eq 0 ]; then
  echo "The bucket policy was applied to '$BUCKET_NAME'."
else
  echo "Error: Failed to apply the bucket policy to '$BUCKET_NAME'."
  exit 1
fi

# Verify the applied policy.
echo "Verifying the applied policy:"
aws s3api get-bucket-policy --bucket "$BUCKET_NAME" --query Policy --output text

# Remove the temporary bucket policy file.
rm $POLICY_FILE
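After you fill in the variables, you can run the script like this (assuming a Unix-like shell):
# Make the script executable, then run it.
chmod +x create-s3-bucket.sh
./create-s3-bucket.sh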
In Amazon S3, the name of each item of user-defined metadata for a file begins with x-amz-meta- and is followed by a unique name. For more information about how to add or replace user-defined metadata for a file in S3, see the Amazon S3 documentation.
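For example, one way to set user-defined metadata when uploading a file with the AWS CLI; the filename and metadata name and value here are hypothetical:
# Upload a file with the user-defined metadata x-amz-meta-mymetadata set to myvalue.
aws s3 cp my-file.pdf s3://my-bucket/my-file.pdf --metadata mymetadata=myvalue

# To replace metadata on an existing object, copy the object over itself.
aws s3 cp s3://my-bucket/my-file.pdf s3://my-bucket/my-file.pdf \
  --metadata mymetadata=newvalue --metadata-directive REPLACE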
Unstructured outputs any user-defined metadata that it finds for a file into the metadata.data_source.record_locator.metadata field of the document elements' output for the corresponding file. Note that the x-amz-meta- prefix is dropped from each metadata name in this output. For example, if Unstructured processes a file with the user-defined metadata name x-amz-meta-mymetadata set to the value myvalue, Unstructured outputs the following into the metadata.data_source.record_locator.metadata field of the document elements' output for the corresponding file:
[
  {
    "type": "...",
    "element_id": "...",
    "text": "...",
    "metadata": {
      "data_source": {
        "record_locator": {
          "metadata": {
            "mymetadata": "myvalue"
          }
        }
      }
    }
  }
]
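To confirm which user-defined metadata a file carries before processing, you can inspect the object; for example, with the AWS CLI:
# The Metadata map in the response lists user-defined names without the x-amz-meta- prefix.
aws s3api head-object --bucket my-bucket --key my-file.pdf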
Create the source connector
To create an S3 source connector, see the following examples.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import CreateSourceConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type="s3",
                config={
                    # For anonymous authentication:
                    # "anonymous": True,

                    # For AWS access key ID with AWS secret access key authentication:
                    # "key": "<key>",
                    # "secret": "<secret>",

                    # For AWS STS token authentication:
                    # "token": "<token>",
                    # "key": "<key>",
                    # "secret": "<secret>",

                    "remote_url": "<remote_url>",
                    "endpoint_url": "<endpoint-url>",
                    "recursive": <True|False>  # Boolean: True or False, no quotes
                }
            )
        )
    )

    print(response.source_connector_information)
Replace the preceding placeholders as follows:
<name> (required): A unique name for this connector.
<key>: The AWS access key ID for the authenticated AWS IAM user. Required for key-based and AWS STS authentication.
<secret>: The AWS secret access key that corresponds to the preceding AWS access key ID. Required for key-based and AWS STS authentication.
<token>: The AWS STS session token for temporary access. Required for AWS STS authentication.
<endpoint-url>: A custom URL, if connecting to a non-AWS S3 bucket.
<remote_url> (required): The S3 URI to the bucket or folder, formatted as s3://my-bucket/ (if the files are in the bucket's root) or s3://my-bucket/my-folder/.
recursive (source connector only): Set to True to access subfolders within the bucket. The default is False if not otherwise specified.
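Assuming you saved the preceding Python code to a file named create_s3_source.py (a hypothetical filename), you could run it like this:
# Provide your Unstructured API key, then run the script.
export UNSTRUCTURED_API_KEY="<your-api-key>"
python create_s3_source.py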