Query Data from a Cross-Region, Cross-Account AWS Glue Data Catalog

  1. Your account (or its ETL role) needs permissions on the source S3 bucket itself, granted in the source account, specifically s3:List* and s3:Get*. A bucket policy like the following does that (a boto3 sketch for applying it follows the policy):
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<your account id>:root"
      },
      "Action": [
        "s3:List*",
        "s3:Get*"
      ],
      "Resource": [
        "arn:aws:s3:::<source bucket in us-west-2>",
        "arn:aws:s3:::<source bucket in us-west-2>/*"
      ]
    }
  ]
}
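
If you would rather set this bucket policy from code than from the console, a minimal boto3 sketch could look like the following. The bucket name and account ID are placeholders, and it has to run with credentials from the source account:

import json
import boto3

# Hypothetical placeholder values; replace with your own
SOURCE_BUCKET = "source-bucket-in-us-west-2"
TARGET_ACCOUNT_ID = "111122223333"

bucket_policy = {
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "Access",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{TARGET_ACCOUNT_ID}:root"},
            "Action": ["s3:List*", "s3:Get*"],
            "Resource": [
                f"arn:aws:s3:::{SOURCE_BUCKET}",
                f"arn:aws:s3:::{SOURCE_BUCKET}/*",
            ],
        }
    ],
}

# Run this with credentials from the *source* account
s3 = boto3.client("s3", region_name="us-west-2")
s3.put_bucket_policy(Bucket=SOURCE_BUCKET, Policy=json.dumps(bucket_policy))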
  2. If the source bucket is encrypted with an AWS KMS customer managed key, the key policy in the source account must also allow your account to use that key. For example (here too, a boto3 sketch for applying the policy follows):

{
  "Version": "2012-10-17",
  "Id": "key-consolepolicy",
  "Statement": [
    {
      "Sid": "Enable IAM User Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<source account id>:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Allow access for Key Administrators",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<source account id>:role/Admin"
      },
      "Action": [
        "kms:Create*",
        "kms:Describe*",
        "kms:Enable*",
        "kms:List*",
        "kms:Put*",
        "kms:Update*",
        "kms:Revoke*",
        "kms:Disable*",
        "kms:Get*",
        "kms:Delete*",
        "kms:TagResource*",
        "kms:UntagResource*",
        "kms:ScheduleKeyDeletion*",
        "kms:CancelKeyDeletion*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow use of the key",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<source account id>:role/Admin",
          "arn:aws:iam::<your account id>:root"
        ]
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow attachment of persistent resources",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<source account id>:role/Admin"
      },
      "Action": [
        "kms:CreateGrant",
        "kms:ListGrants",
        "kms:RevokeGrant"
      ],
      "Resource": "*",
      "Condition": {
        "Bool": {
          "kms:GrantIsForAWSResource": "true"
        }
      }
    }
  ]
}
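
As a rough sketch of applying that key policy with boto3 (the key ID and account IDs below are placeholders, the policy is abbreviated to the two statements that matter for cross-account access, and it must run with source-account credentials):

import json
import boto3

# Hypothetical placeholder values; replace with your own
SOURCE_ACCOUNT_ID = "444455556666"
TARGET_ACCOUNT_ID = "111122223333"
SOURCE_KEY_ID = "1234abcd-12ab-34cd-56ef-1234567890ab"

# Abbreviated version of the key policy shown above: keep the source
# account's full access and grant the target account use of the key.
key_policy = {
    "Version": "2012-10-17",
    "Id": "key-consolepolicy",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{SOURCE_ACCOUNT_ID}:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{TARGET_ACCOUNT_ID}:root"},
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey",
            ],
            "Resource": "*",
        },
    ],
}

# Run this with credentials from the *source* account
kms = boto3.client("kms", region_name="us-west-2")
kms.put_key_policy(
    KeyId=SOURCE_KEY_ID,
    PolicyName="default",  # customer managed keys have a single policy, named "default"
    Policy=json.dumps(key_policy),
)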
  3. Finally, create a Glue ETL job in your account that reads the table through the Data Catalog and writes the result to a bucket you own:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Destination path in a bucket your account owns
output_dir = "s3://your-bucket-name/your-path/"

# Read the source table through the Glue Data Catalog
items = glueContext.create_dynamic_frame.from_catalog(
    database="<source db>",
    table_name="<source table name>"
)

# Write the data out to your bucket in the format you want, e.g. "parquet"
glueContext.write_dynamic_frame.from_options(
    frame=items,
    connection_type="s3",
    connection_options={"path": output_dir},
    format="<format you want>"
)

job.commit()
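
A note on the read above: as written, from_catalog resolves the database and table in whatever catalog your job's account sees by default (for example through a resource link). If the table definition lives directly in the source account's Data Catalog, the call also accepts a catalog_id argument; this is a hedged variation, assuming the source catalog's resource policy grants your account access:

# Variation of the read in the script above; <source account id> is a placeholder
items = glueContext.create_dynamic_frame.from_catalog(
    database="<source db>",
    table_name="<source table name>",
    catalog_id="<source account id>"
)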
