Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

Many organizations use identity providers (IdPs) to authenticate users, manage their attributes, and manage group memberships for secure, efficient, and centralized identity management. You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and the data in your data warehouse, and you might be looking for a centralized and scalable way to define and manage data access based on IdP identities. AWS Lake Formation makes it straightforward to centrally govern, secure, and globally share data for analytics and machine learning (ML). Currently, you might have to map user identities and groups to AWS Identity and Access Management (IAM) roles, and data access permissions are defined at the IAM role level within Lake Formation. This setup isn't efficient. Setting up and maintaining the mapping from IdP groups to IAM roles as new groups are created is time consuming, and it makes it difficult to determine what data was accessed, and from which service, at a given time.

Amazon Redshift, Amazon QuickSight, and Lake Formation now integrate with the new trusted identity propagation capability in AWS IAM Identity Center to authenticate users seamlessly across services. In this post, we discuss two use cases that configure trusted identity propagation with Amazon Redshift and Lake Formation.

Solution overview

Trusted identity propagation provides a new authentication option for organizations that want to centralize data permissions management and authorize requests based on their IdP identity across service boundaries. With IAM Identity Center, you can configure an existing IdP to manage users and groups, and then use Lake Formation to define fine-grained access control permissions on catalog resources for these IdP identities. Amazon Redshift supports identity propagation when querying data with Amazon Redshift Spectrum and with Amazon Redshift data sharing. You can use AWS CloudTrail to audit data access by IdP identities, which helps your organization meet its regulatory and compliance requirements.

With this new capability, users can connect to Amazon Redshift from QuickSight with a single sign-on experience and create direct query datasets. Using IAM Identity Center as a shared identity source makes this possible. With trusted identity propagation, when QuickSight assets such as dashboards are shared with other users, each QuickSight user's database permissions apply: their end-user identity is propagated from QuickSight to Amazon Redshift, which enforces their individual data permissions. Depending on the use case, the author can apply additional row-level and column-level security in QuickSight.

The following diagram illustrates an example of the solution architecture.

In this post, we walk through how to configure trusted identity propagation with Amazon Redshift and Lake Formation. We cover the following use cases:

Redshift Spectrum with Lake Formation
Redshift data sharing with Lake Formation

Prerequisites

This walkthrough assumes you have set up a Lake Formation administrator role, or a similar role, so you can follow along with the instructions in this post. To learn more about setting up permissions for a data lake administrator, see Create a data lake administrator.

Additionally, you must create the following resources as detailed in Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On:

An Okta account integrated with IAM Identity Center to sync users and groups
A Redshift managed application with IAM Identity Center
A Redshift source cluster with IAM Identity Center integration enabled
A Redshift target cluster with IAM Identity Center integration enabled (you can skip the section on setting up Amazon Redshift role-based access)
Users and groups from IAM Identity Center assigned to the Redshift application
A permission set assigned to AWS accounts to enable Redshift Query Editor v2 access

Add the following permissions to the IAM role that the Redshift managed application uses for its integration with IAM Identity Center:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
            "Resource": "*"
        }
    ]
}
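Because the policy is plain JSON, you can check it locally before you attach it. A local check catches a misspelled top-level key or a dropped action. The following Python sketch is illustrative only. The helper `missing_actions` and the `REQUIRED_ACTIONS` set (copied from the policy above) are hypothetical names and are not part of any AWS SDK. The check also doesn't expand wildcard actions such as `glue:*`.

```python
import json

# Hypothetical constant: the actions listed in the policy above.
REQUIRED_ACTIONS = {
    "lakeformation:GetDataAccess",
    "glue:GetTable",
    "glue:GetTables",
    "glue:SearchTables",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetPartitions",
    "lakeformation:GetResourceLFTags",
    "lakeformation:ListLFTags",
    "lakeformation:GetLFTag",
    "lakeformation:SearchTablesByLFTags",
    "lakeformation:SearchDatabasesByLFTags",
}


def missing_actions(policy_text: str) -> set:
    """Return required actions that no Allow statement in the policy grants."""
    policy = json.loads(policy_text)
    # Misspelled top-level keys (for example "Statment") fail here, not at attach time.
    if policy.get("Version") != "2012-10-17" or "Statement" not in policy:
        raise ValueError("policy needs Version 2012-10-17 and a Statement")
    statements = policy["Statement"]
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    granted = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted
```

If you save the policy as policy.json, `missing_actions(open("policy.json").read())` should return an empty set. You can then attach the policy with `aws iam put-role-policy --role-name <role> --policy-name <policy> --policy-document file://policy.json`, where the role and policy names are placeholders for your own values.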

Use case 1: Redshift Spectrum with Lake Formation

Before you start this use case, complete the following steps to copy the sample data:

Log in to the AWS Management Console as an IAM administrator.
Open AWS CloudShell, or a terminal with the AWS CLI configured, and run the following command to copy the data. Replace <bucketname> with your bucket name:

aws s3 sync s3://redshift-demos/data/NY-Pub/ s3://<bucketname>/data/NY-Pub/

In this post, we use an AWS Glue crawler to create the external table ny_pub, which is stored in Apache Parquet format in the Amazon S3 location s3://<bucketname>/data/NY-Pub/. In the next step, we use AWS CloudFormation to create the solution resources in a stack named CrawlS3Source-NYTaxiData in us-east-1.

Download the .yml file or launch the CloudFormation stack.

The stack creates the following resources:

The crawler NYTaxiCrawler, along with the new IAM role AWSGlueServiceRole-RedshiftAutoMount
The AWS Glue database automountdb

When the stack is complete, continue with the following steps to finish setting up your resources:

On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
Open NYTaxiCrawler and choose Edit.

Under Choose data sources and classifiers, choose Edit.

For Data source, choose S3.
For S3 path, enter s3://<bucketname>/data/NY-Pub/.
Choose Update S3 data source.

Choose Next, then choose Update.
Choose Run crawler.

After the crawler is complete, a new table called ny_pub appears in the Data Catalog under the automountdb database.
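If you prefer the AWS CLI to the console for the crawler edits, the following Python sketch builds the input for `aws glue update-crawler --cli-input-json`. It points the crawler's S3 target at the path you copied the data to. The function name `crawler_update_input` is hypothetical. The `Name` and `Targets.S3Targets[].Path` fields follow the AWS Glue UpdateCrawler request shape.

```python
import json


def crawler_update_input(bucket: str, crawler_name: str = "NYTaxiCrawler") -> dict:
    """Build an UpdateCrawler request that points the crawler at the copied NY-Pub data."""
    if not bucket or bucket.startswith("s3://") or "/" in bucket:
        raise ValueError("pass the bare bucket name, without s3:// or a prefix")
    return {
        # Name identifies the crawler; Targets lists the S3 path it should crawl.
        "Name": crawler_name,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/data/NY-Pub/"}]},
    }


if __name__ == "__main__":
    print(json.dumps(crawler_update_input("<bucketname>"), indent=2))
```

Save the printed JSON to a file and pass it with `--cli-input-json file://crawler.json`. Then start the crawler with `aws glue start-crawler --name NYTaxiCrawler`.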

After you create the resources, complete the steps in the following sections. They set up Lake Formation permissions on the AWS Glue table ny_pub for the sales IdP group and then query the table through Redshift Spectrum.

Enable Lake Formation propagation for the Redshift managed application

Complete the following steps to enable Lake Formation propagation for the Redshift managed application you created in Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On:

Log in to the console as admin.
On the Amazon Redshift console, choose IAM Identity Center connection in the navigation pane.
Select the managed application whose name starts with redshift-iad and choose Edit.

Under Trusted identity propagation, select Enable AWS Lake Formation access grants and save your changes.

Set up Lake Formation as an IAM Identity Center application

Complete the following steps to set up Lake Formation as an IAM Identity Center application:

On the Lake Formation console, under Administration in the navigation pane, choose IAM Identity Center integration.

Review the options and choose Submit to enable Lake Formation integration.

The integration status updates to Success.
Alternatively, you can run the following command:

aws lakeformation create-lake-formation-identity-center-configuration \
    --cli-input-json '{"CatalogId": "<catalog_id>", "InstanceArn": "<identitycenter_arn>"}'
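The `--cli-input-json` string is easy to get wrong by hand. The following Python sketch builds it and rejects obviously malformed values. The helper name is hypothetical. The instance ARN pattern (`arn:aws:sso:::instance/ssoins-` followed by 16 characters) is an assumption based on typical IAM Identity Center instance ARNs, so copy the real value from the IAM Identity Center settings page.

```python
import json
import re

# Assumed shape of an IAM Identity Center instance ARN; confirm against your own instance.
INSTANCE_ARN = re.compile(r"arn:aws:sso:::instance/(sso)?ins-[a-zA-Z0-9.-]{16}")


def lf_idc_config_input(account_id: str, instance_arn: str) -> str:
    """Return --cli-input-json text for create-lake-formation-identity-center-configuration."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("CatalogId must be the 12-digit AWS account ID")
    if not INSTANCE_ARN.fullmatch(instance_arn):
        raise ValueError(f"unexpected IAM Identity Center instance ARN: {instance_arn!r}")
    return json.dumps({"CatalogId": account_id, "InstanceArn": instance_arn})
```

For example, `lf_idc_config_input("111122223333", "arn:aws:sso:::instance/ssoins-1234567890abcdef")` uses placeholder values and returns a JSON string that you can paste after `--cli-input-json`.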

Register the data with Lake Formation

In this section, we register the data with Lake Formation. Complete the following steps:

On the Lake Formation console, under Administration in the navigation pane, choose Data lake locations.
Choose Register location.
For Amazon S3 path, enter the path where the table data resides (s3://<bucketname>/data/NY-Pub/).
For IAM role, choose a Lake Formation user-defined role. For more information, refer to Requirements for roles used to register locations.
For Permission mode, select Lake Formation.
Choose Register location.
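These console steps correspond to the Lake Formation RegisterResource API (`aws lakeformation register-resource`). The following Python sketch builds that request for the NY-Pub prefix. Lake Formation expects an S3 ARN (`arn:aws:s3:::bucket/prefix`), not an `s3://` URI. The helper name and the example role `LFRegisterRole` are hypothetical. Mapping the console's Lake Formation permission mode to `HybridAccessEnabled: False` is an assumption, so confirm it against the API reference.

```python
import json
import re


def register_resource_input(bucket: str, prefix: str, role_arn: str) -> dict:
    """Build a RegisterResource request for an S3 prefix using a user-defined role."""
    if not re.fullmatch(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+", role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return {
        # Lake Formation identifies S3 locations by ARN, not by s3:// URI.
        "ResourceArn": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}",
        "UseServiceLinkedRole": False,  # use the role passed in instead
        "RoleArn": role_arn,
        # Assumption: False matches the console's "Lake Formation" permission mode.
        "HybridAccessEnabled": False,
    }


if __name__ == "__main__":
    role = "arn:aws:iam::111122223333:role/LFRegisterRole"  # hypothetical role
    print(json.dumps(register_resource_input("<bucketname>", "data/NY-Pub/", role), indent=2))
```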

Next, verify that the IAMAllowedPrincipals group doesn't have permissions on the database:

On the Lake Formation console, under Data Catalog in the navigation pane, choose Databases.
Select automountdb and, on the Actions menu, choose View permissions.
If IAMAllowedPrincipals is listed, select the principal and choose Revoke.
Repeat these steps to verify permissions for the table ny_pub.
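To script this check, you can list the permissions on the database or table, for example with `aws lakeformation list-permissions --resource '{"Database": {"Name": "automountdb"}}'`. Then look for the principal identifier `IAM_ALLOWED_PRINCIPALS`, which is how the API represents the IAMAllowedPrincipals group. The following Python sketch turns matching entries into `revoke-permissions` inputs. The helper name is hypothetical. The sketch assumes a single page of results (it ignores `NextToken`) and handles only the `Permissions` list, not `PermissionsWithGrantOption`.

```python
def iam_allowed_principal_revokes(response: dict) -> list:
    """Build revoke-permissions inputs for IAMAllowedPrincipals entries in a list-permissions response."""
    revokes = []
    for entry in response.get("PrincipalResourcePermissions", []):
        principal = entry.get("Principal", {}).get("DataLakePrincipalIdentifier")
        # The console label IAMAllowedPrincipals appears as IAM_ALLOWED_PRINCIPALS in the API.
        if principal != "IAM_ALLOWED_PRINCIPALS" or not entry.get("Permissions"):
            continue
        revokes.append({
            "Principal": {"DataLakePrincipalIdentifier": principal},
            "Resource": entry["Resource"],
            "Permissions": list(entry["Permissions"]),
        })
    return revokes
```

You can pass each returned dictionary, serialized with `json.dumps`, to `aws lakeformation revoke-permissions --cli-input-json`.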

Grant the IAM Identity Center group permissions on the AWS Glue database and table

Complete the following steps to grant database permissions to the IAM Identity Center group:

On the Lake Formation console, under Data Catalog in the navigation pane, choose Databases.
Select the database automountdb and, on the Actions menu, choose Grant.
Choose Grant database.
Under Principals, select IAM Identity Center and choose Add.
In the pop-up window, if this is the first time you're assigning users and groups, choose Get started.
Enter the IAM Identity Center group in the search bar and choose the group.
Choose Assign.
Under LF-Tags or catalog resources, automountdb is already selected for Databases.
For Database permissions, select Describe.
Choose Grant to apply the permissions.

Alternatively, you can run the following command:

aws lakeformation grant-permissions --cli-input-json '
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::group/<identitycenter_group_id>"
    },
    "Resource": {
        "Database": {
            "Name": "automountdb"
        }
    },
    "Permissions": [
        "DESCRIBE"
    ]
}'
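The following Python sketch builds the same database grant for `--cli-input-json`. It assumes that the principal ARN ends with the group's IAM Identity Center group ID. This matches the form that the audit event at the end of this post uses for users (`arn:aws:identitystore:::user/<user ID>`). The helper names, the ID pattern, and the sample group ID are hypothetical. You can find the real ID on the group's page in the IAM Identity Center console.

```python
import json
import re

# Assumed Identity Store ID format: a UUID, optionally prefixed by 10 hex digits and a hyphen.
IDENTITY_STORE_ID = re.compile(
    r"([0-9a-f]{10}-)?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def idc_group_principal(group_id: str) -> dict:
    """Return a Lake Formation principal for an IAM Identity Center group ID."""
    if not IDENTITY_STORE_ID.fullmatch(group_id):
        raise ValueError(f"expected an Identity Center group ID, got {group_id!r}")
    return {"DataLakePrincipalIdentifier": f"arn:aws:identitystore:::group/{group_id}"}


def grant_database_input(group_id: str, database: str, permissions=("DESCRIBE",)) -> str:
    """Return --cli-input-json text that grants database permissions to the group."""
    return json.dumps({
        "Principal": idc_group_principal(group_id),
        "Resource": {"Database": {"Name": database}},
        "Permissions": list(permissions),
    })
```

For example, `grant_database_input("906744a8-3001-70a7-f30f-4c6b1e1d1eab", "automountdb")` uses a made-up group ID and produces the same structure as the command above. A display name such as awssso-sales is rejected.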

Next, grant table permissions to the IAM Identity Center group:

Under Data Catalog in the navigation pane, choose Databases.
Select the database automountdb and, on the Actions menu, choose Grant.
Under Principals, select IAM Identity Center and choose Add.
Enter the IAM Identity Center group in the search bar and choose the group.
Choose Assign.
Under LF-Tags or catalog resources, automountdb is already selected for Databases.
For Tables, choose ny_pub.
For Table permissions, select Describe and Select.
Choose Grant to apply the permissions.

Alternatively, you can run the following command:

aws lakeformation grant-permissions --cli-input-json '
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::group/<identitycenter_group_id>"
    },
    "Resource": {
        "Table": {
            "DatabaseName": "automountdb",
            "Name": "ny_pub"
        }
    },
    "Permissions": [
        "SELECT",
        "DESCRIBE"
    ]
}'
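Instead of two separate `grant-permissions` calls, you can use Lake Formation's `batch-grant-permissions`, which takes a list of entries, each with its own `Id`. The following Python sketch builds one batch that covers the database `DESCRIBE` grant and the table `SELECT` and `DESCRIBE` grants. The function name and entry IDs are hypothetical. Check the `Failures` list in the response, because a batch call can partially succeed.

```python
import json


def batch_grant_input(principal_arn: str, database: str, table: str) -> dict:
    """Build a BatchGrantPermissions request for the database and table grants."""
    for name in (database, table):
        # Reject empty names or names with leading/trailing spaces; they won't match the crawled table.
        if not name or name != name.strip():
            raise ValueError(f"invalid catalog name: {name!r}")
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    return {
        "Entries": [
            {
                "Id": "database-describe",
                "Principal": principal,
                "Resource": {"Database": {"Name": database}},
                "Permissions": ["DESCRIBE"],
            },
            {
                "Id": "table-select-describe",
                "Principal": principal,
                "Resource": {"Table": {"DatabaseName": database, "Name": table}},
                "Permissions": ["SELECT", "DESCRIBE"],
            },
        ]
    }


if __name__ == "__main__":
    arn = "arn:aws:identitystore:::group/<identitycenter_group_id>"
    print(json.dumps(batch_grant_input(arn, "automountdb", "ny_pub"), indent=2))
```

Save the output to a file and pass it to `aws lakeformation batch-grant-permissions --cli-input-json file://grants.json`.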

Set up Redshift Spectrum table access for the IAM Identity Center group

Complete the following steps to set up Redshift Spectrum table access:

Sign in to the Amazon Redshift console using the admin role.
Navigate to Query Editor v2.
Choose the options menu (three dots) next to the cluster and choose Create connection.

Connect as the admin user and run the following commands to make the ny_pub data in the S3 data lake available to the sales group:

create external schema if not exists nyc_external_schema from DATA CATALOG database 'automountdb' catalog_id '<accountid>';
grant usage on schema nyc_external_schema to role "awsidc:awssso-sales";
grant select on all tables in schema nyc_external_schema to role "awsidc:awssso-sales";
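If you repeat this setup for several groups or AWS Glue databases, generating the statements helps you avoid quoting mistakes. In this post, the role name is `awsidc:awssso-sales`: the IAM Identity Center namespace configured for the Redshift application, followed by a colon and the group name. It must be double-quoted because it contains a colon and a hyphen. The following Python sketch is a hypothetical helper that only produces SQL text. Run its output in Query Editor v2 as before.

```python
import re

# Only simple lowercase schema names are accepted, so the schema needs no quoting.
SIMPLE_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def quote_ident(name: str) -> str:
    """Double-quote a Redshift identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def spectrum_grant_sql(schema, glue_db, account_id, namespace, group):
    """Return the external schema and grant statements for one Identity Center group."""
    if not SIMPLE_IDENT.fullmatch(schema):
        raise ValueError(f"unsupported schema name: {schema!r}")
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    role = quote_ident(f"{namespace}:{group}")  # for example "awsidc:awssso-sales"
    db_literal = glue_db.replace("'", "''")  # escape single quotes inside the string literal
    return [
        f"create external schema if not exists {schema} from DATA CATALOG "
        f"database '{db_literal}' catalog_id '{account_id}';",
        f"grant usage on schema {schema} to role {role};",
        f"grant select on all tables in schema {schema} to role {role};",
    ]
```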

Validate Redshift Spectrum access as an IAM Identity Center user

Complete the following steps to validate access:

On the Amazon Redshift console, navigate to Query Editor v2.
Choose the options menu (three dots) next to the cluster and choose Create connection.
For the connection option, select IAM Identity Center. Enter your Okta user name and password in the browser pop-up.
After you're connected as a federated user, run the following SQL command to query the ny_pub data lake table:

select * from nyc_external_schema.ny_pub;

Use case 2: Redshift data sharing with Lake Formation

This use case assumes you have set up IAM Identity Center integration with Amazon Redshift, with Lake Formation propagation enabled as described in the previous section.

Create a datashare with objects and share it with the Data Catalog

Complete the following steps to create a datashare:

Sign in to the Amazon Redshift console using the admin role.
Navigate to Query Editor v2.
Choose the options menu (three dots) next to the Redshift source cluster and choose Create connection.

Connect as the admin user with the Temporary credentials using a database user name option, and run the following SQL commands to create a datashare:

CREATE DATASHARE salesds;
ALTER DATASHARE salesds ADD SCHEMA sales_schema;
ALTER DATASHARE salesds ADD TABLE sales_schema.store_sales;
GRANT USAGE ON DATASHARE salesds TO ACCOUNT '<accountid>' VIA DATA CATALOG;
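The following Python sketch generates the same datashare statements for any list of tables. It schema-qualifies each table so that `ADD TABLE` doesn't depend on the session's `search_path`, and it puts straight single quotes around the account ID, as SQL requires. The helper name is hypothetical. The identifier check is deliberately strict and accepts only lowercase, unquoted names.

```python
import re

SIMPLE_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def datashare_sql(share, schema, tables, account_id):
    """Return statements that share a schema's tables with an account through the Data Catalog."""
    for name in (share, schema, *tables):
        if not SIMPLE_IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    # Qualify each table with its schema instead of relying on search_path.
    stmts += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{t};" for t in tables]
    stmts.append(f"GRANT USAGE ON DATASHARE {share} TO ACCOUNT '{account_id}' VIA DATA CATALOG;")
    return stmts
```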

On the Amazon Redshift console, choose Datashares in the navigation pane.
Select the datashare salesds and choose Authorize.

Now you can register the datashare in Lake Formation as an AWS Glue database.

Sign in to the Lake Formation console as the data lake administrator IAM user or role.
Under Data Catalog in the navigation pane, choose Data sharing and view the Redshift datashare invitations on the Configuration tab.
Select the datashare salesds and choose Review invitation.
After you review the details, choose Accept.
Enter a name for the AWS Glue database (for example, salesds) and choose Skip to Review and create.

After the AWS Glue database is created on the Redshift datashare, you can view it under Shared databases.

Grant the IAM Identity Center user group permissions on the AWS Glue database and table

Complete the following steps to grant database permissions to the IAM Identity Center group:

On the Lake Formation console, under Data Catalog in the navigation pane, choose Databases.
Select the database salesds and, on the Actions menu, choose Grant.
Choose Grant database.
Under Principals, select IAM Identity Center and choose Add.
In the pop-up window, enter awssso in the search bar and choose the awssso-sales group.
Choose Assign.
Under LF-Tags or catalog resources, salesds is already selected for Databases.
For Database permissions, select Describe.
Choose Grant to apply the permissions.

Next, grant table permissions to the IAM Identity Center group:

Under Data Catalog in the navigation pane, choose Databases.
Select the database salesds and, on the Actions menu, choose Grant.
Under Principals, select IAM Identity Center and choose Add.
In the pop-up window, enter awssso in the search bar and choose the awssso-sales group.
Choose Assign.
Under LF-Tags or catalog resources, salesds is already selected for Databases.
For Tables, choose sales_schema.store_sales.
For Table permissions, select Describe and Select.
Choose Grant to apply the permissions.

Mount the external schema in the target Redshift cluster and enable access for the IAM Identity Center user

Complete the following steps:

Sign in to the Amazon Redshift console using the admin role.
Navigate to Query Editor v2.
Connect to the target cluster as an admin user and run the following SQL commands. They mount the AWS Glue database salesds as an external schema and enable access for the sales group:

create external schema if not exists sales_datashare_schema from DATA CATALOG database 'salesds' catalog_id '<accountid>';
create role "awsidc:awssso-sales"; -- Only if the role doesn't already exist
grant usage on schema sales_datashare_schema to role "awsidc:awssso-sales";
grant select on all tables in schema sales_datashare_schema to role "awsidc:awssso-sales";

Access Redshift datashares as an IAM Identity Center user

Complete the following steps to access the datashares:

On the Amazon Redshift console, navigate to Query Editor v2.
Choose the options menu (three dots) next to the cluster and choose Create connection.
Connect with IAM Identity Center, and enter the IAM Identity Center user name and password at the browser login.
Run the following SQL command to query the shared table:

SELECT * FROM "dev"."sales_datashare_schema"."sales_schema.store_sales";
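In the shared AWS Glue database, the table name is `sales_schema.store_sales`: the source schema and table names joined by a dot. That is why the query double-quotes each part of the three-part name. Without the quotes, Redshift would read the inner dot as a separator. The following Python sketch builds such a name safely. The helper name is hypothetical.

```python
def three_part_name(database: str, schema: str, table: str) -> str:
    """Quote database, schema, and table so that a dot inside any part stays literal."""
    parts = (database, schema, table)
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


query = f"SELECT * FROM {three_part_name('dev', 'sales_datashare_schema', 'sales_schema.store_sales')};"
```

Here `query` equals the statement above, character for character.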

With trusted identity propagation, you can now audit which user accessed a dataset and which service they used to access it, which gives you full traceability. For example, for the federated user Ethan, whose IAM Identity Center user ID is 459e10f6-a3d0-47ae-bc8d-a66f8b054014, AWS CloudTrail records the following Lake Formation event (excerpt):

{
    "eventSource": "lakeformation.amazonaws.com",
    "eventName": "GetDataAccess",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "redshift.amazonaws.com",
    "userAgent": "redshift.amazonaws.com",
    "requestParameters": {
        "tableArn": "arn:aws:glue:us-east-1:xxxx:table/automountdb/ny_pub",
        "durationSeconds": 3600,
        "auditContext": {
            "additionalAuditContext": "{\"invokedBy\":\"arn:aws:redshift:us-east-1:xxxx:dbuser:redshift-consumer/awsidc:ethan.doe@gmail.com\", \"transactionId\":\"961953\", \"queryId\":\"613842\", \"isConcurrencyScalingQuery\":\"false\"}"
        },
        "cellLevelSecurityEnforced": true
    },
    "responseElements": null,
    "additionalEventData": {
        "requesterService": "REDSHIFT",
        "LakeFormationTrustedCallerInvocation": "true",
        "lakeFormationPrincipal": "arn:aws:identitystore:::user/459e10f6-a3d0-47ae-bc8d-a66f8b054014",
        "lakeFormationRoleSessionName": "AWSLF-00-RE-726034267621-K7FUMxovuq"
    }
}
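The `additionalAuditContext` field is a JSON document stored as a string inside the event, so reading it requires a second parse. The following Python sketch extracts the IAM Identity Center user ID, the table, the calling service, and the Redshift query ID from one such event. The event could come from a CloudTrail log file, or from `aws cloudtrail lookup-events` output after you parse its `CloudTrailEvent` string. The function name and the returned keys are hypothetical. The field names come from the event above.

```python
import json

USER_ARN_PREFIX = "arn:aws:identitystore:::user/"


def summarize_lf_access(event: dict) -> dict:
    """Summarize who read which table, through which service, from one GetDataAccess event."""
    if event.get("eventName") != "GetDataAccess":
        raise ValueError("expected a Lake Formation GetDataAccess event")
    params = event.get("requestParameters", {})
    extra = event.get("additionalEventData", {})
    # additionalAuditContext is a JSON document serialized as a string, so parse it again.
    context = json.loads(params.get("auditContext", {}).get("additionalAuditContext", "{}"))
    principal = extra.get("lakeFormationPrincipal", "")
    return {
        "identity_center_user_id": (
            principal[len(USER_ARN_PREFIX):] if principal.startswith(USER_ARN_PREFIX) else None
        ),
        "table_arn": params.get("tableArn"),
        "requester_service": extra.get("requesterService"),
        "invoked_by": context.get("invokedBy"),
        "query_id": context.get("queryId"),
    }
```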

Clean up

Complete the following steps to clean up your resources:

Delete the data from the S3 bucket.
Delete the Lake Formation application and the Redshift provisioned clusters that you created for testing.
Sign in to the CloudFormation console as the IAM admin you used to create the CloudFormation stack, and delete the stack.

Conclusion

In this post, we covered how to simplify access management for analytics by using IAM Identity Center to propagate user identity across Amazon Redshift and Lake Formation. We showed how to get started with trusted identity propagation by connecting Amazon Redshift and Lake Formation, and how to configure Redshift Spectrum and data sharing to support trusted identity propagation.

Learn more about IAM Identity Center with Amazon Redshift and AWS Lake Formation. Leave your questions and feedback in the comments section.

About the Authors

Harshida Patel is an Analytics Specialist Principal Solutions Architect with AWS.

Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building data mesh solutions and sharing them with the community.

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Poulomi Dasgupta is a Senior Analytics Solutions Architect with AWS. She is passionate about helping customers build cloud-based analytics solutions to solve their business problems. Outside of work, she likes travelling and spending time with her family.