Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. With Amazon Redshift Serverless and Query Editor v2, you can load and query large datasets in just a few clicks and pay only for what you use. The decoupled compute and storage architecture of Amazon Redshift enables you to build highly scalable, resilient, and cost-effective workloads. Many customers migrate their data warehousing workloads to Amazon Redshift and benefit from the rich capabilities it offers, such as the following:
- Amazon Redshift seamlessly integrates with broader data, analytics, and AI or machine learning (ML) services on AWS, enabling you to choose the right tool for the right job. Modern analytics is much wider than SQL-based data warehousing. With Amazon Redshift, you can build lake house architectures and perform any kind of analytics, such as interactive analytics, operational analytics, big data processing, visual data preparation, predictive analytics, machine learning, and more.
- You don't need to worry about workloads such as ETL (extract, transform, and load), dashboards, ad hoc queries, and so on interfering with each other. You can isolate workloads using data sharing, while using the same underlying datasets (see the sketch after this list).
- When users run many queries at peak times, compute seamlessly scales within seconds to provide consistent performance at high concurrency. You get 1 hour of free concurrency scaling capacity for every 24 hours of usage. This free credit meets the concurrency demand of 97% of the Amazon Redshift customer base.
- Amazon Redshift is straightforward to use with self-tuning and self-optimizing capabilities. You can get faster insights without spending valuable time managing your data warehouse.
- Fault tolerance is built in. All data written to Amazon Redshift is automatically and continuously replicated to Amazon Simple Storage Service (Amazon S3). Any failed hardware is automatically replaced.
- Amazon Redshift is simple to interact with. You can access data with traditional, cloud-native, containerized, serverless web services or event-driven applications. You can also use your favorite business intelligence (BI) and SQL tools to access, analyze, and visualize data in Amazon Redshift.
- Amazon Redshift ML makes it easy for data scientists to create, train, and deploy ML models using familiar SQL. You can also run predictions using SQL.
- Amazon Redshift provides comprehensive data security at no additional cost. You can set up end-to-end data encryption, configure firewall rules, define granular row-level and column-level security controls on sensitive data, and more.
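As a sketch of the data sharing approach mentioned above, a minimal producer-consumer setup in Redshift SQL might look like the following; all object names and namespace IDs are hypothetical placeholders:

```sql
-- Producer namespace: expose a schema through a datashare.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA sales;
ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA sales;

-- Grant the share to a consumer namespace (placeholder ID).
GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '<consumer-namespace-id>';

-- Consumer namespace: create a database from the share and query it in place.
CREATE DATABASE sales_db FROM DATASHARE sales_share OF NAMESPACE '<producer-namespace-id>';
SELECT * FROM sales_db.sales.orders LIMIT 10;
```

Because the consumer reads the producer's data in place, an ETL workgroup and a dashboard workgroup can work from the same datasets without contending for compute.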
In this post, we show how to migrate a data warehouse from Microsoft Azure Synapse to Redshift Serverless using AWS Schema Conversion Tool (AWS SCT) and AWS SCT data extraction agents. AWS SCT makes heterogeneous database migrations predictable by automatically converting the source database code and storage objects to a format compatible with the target database. Any objects that can't be automatically converted are clearly marked so that they can be manually converted to complete the migration. AWS SCT can also scan your application code for embedded SQL statements and convert them.
Solution overview
AWS SCT uses a service account to connect to your Azure Synapse Analytics. First, we create a Redshift database into which the Azure Synapse data will be migrated. Next, we create an S3 bucket. Then, we use AWS SCT to convert Azure Synapse schemas and apply them to Amazon Redshift. Finally, to migrate data, we use AWS SCT data extraction agents, which extract data from Azure Synapse, upload it into an S3 bucket, and copy it to Amazon Redshift.
The following diagram illustrates our solution architecture.
This walkthrough covers the following steps:
- Create a Redshift Serverless data warehouse.
- Create the S3 bucket and folder.
- Convert and apply the Azure Synapse schema to Amazon Redshift using AWS SCT:
  - Connect to the Azure Synapse source.
  - Connect to the Amazon Redshift target.
  - Convert the Azure Synapse schema to a Redshift database.
  - Analyze the assessment report and address the action items.
  - Apply the converted schema to the target Redshift database.
- Migrate data from Azure Synapse to Amazon Redshift using AWS SCT data extraction agents:
  - Generate trust and key stores (this step is optional).
  - Install and configure the data extraction agent.
  - Start the data extraction agent.
  - Register the data extraction agent.
  - Add virtual partitions for large tables (this step is optional).
  - Create a local data migration task.
  - Start the local data migration task.
- View data in Amazon Redshift.
Prerequisites
Before starting this walkthrough, you must have the following prerequisites:
- A workstation with AWS SCT, Amazon Corretto 11, and Redshift drivers.
- A database user account that AWS SCT can use to connect to your source Azure Synapse Analytics database.
- VIEW DEFINITION and VIEW DATABASE STATE granted to that database user on each schema you are trying to convert (see the example grants after this list).
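For reference, the source-side grants might look like the following T-SQL; the sct_user name and the dbo schema are assumptions for illustration:

```sql
-- Run on the source Azure Synapse dedicated SQL pool.
-- sct_user is a hypothetical migration user; repeat the schema-level
-- grant for each schema you plan to convert.
GRANT VIEW DEFINITION ON SCHEMA::dbo TO sct_user;
GRANT VIEW DATABASE STATE TO sct_user;
```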
Create a Redshift Serverless data warehouse
In this step, we create a Redshift Serverless data warehouse with a workgroup and namespace. A workgroup is a collection of compute resources and a namespace is a collection of database objects and users. To isolate workloads and manage different resources in Redshift Serverless, you can create namespaces and workgroups and manage storage and compute resources separately.
Follow these steps to create a Redshift Serverless data warehouse with a workgroup and namespace:
- On the Amazon Redshift console, choose the AWS Region that you want to use.
- In the navigation pane, choose Redshift Serverless.
- Choose Create workgroup.
- For Workgroup name, enter a name that describes the compute resources.
- Verify that the VPC is the same as the VPC of the EC2 instance with AWS SCT.
- Choose Next.
- For Namespace, enter a name that describes your dataset.
- In the Database name and password section, select Customize admin user credentials.
- For Admin user name, enter a user name of your choice (for example, awsuser).
- For Admin user password, enter a password of your choice (for example, MyRedShiftPW2022).
- Choose Next.
Note that data in the Redshift Serverless namespace is encrypted by default.
- In the Review and Create section, choose Create.
Now you create an AWS Identity and Access Management (IAM) role and set it as the default for your namespace. Note that there can be only one default IAM role.
- On the Redshift Serverless Dashboard, in the Namespaces / Workgroups section, choose the namespace you just created.
- On the Security and encryption tab, in the Permissions section, choose Manage IAM roles.
- Choose Manage IAM roles, then choose Create IAM role.
- In the Specify an Amazon S3 bucket for the IAM role to access section, choose one of the following methods:
  - Choose No additional Amazon S3 bucket to allow the created IAM role to access only the S3 buckets with names containing the word redshift.
  - Choose Any Amazon S3 bucket to allow the created IAM role to access all S3 buckets.
  - Choose Specific Amazon S3 buckets to specify one or more S3 buckets for the created IAM role to access. Then choose one or more S3 buckets from the table.
- Choose Create IAM role as default.
- Capture the endpoint for the Redshift Serverless workgroup you just created:
  - On the Redshift Serverless Dashboard, in the Namespaces / Workgroups section, choose the workgroup you just created.
  - In the General information section, copy the endpoint (it looks similar to workgroup-name.account-id.region.redshift-serverless.amazonaws.com:5439/dev).
Create the S3 bucket and folder
During the data migration process, AWS SCT uses Amazon S3 as a staging area for the extracted data. Follow these steps to create an S3 bucket:
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Choose Create bucket.
- For Bucket name, enter a unique DNS-compliant name for your bucket (for example, uniquename-as-rs).
For more information about bucket names, refer to Bucket naming rules.
- For AWS Region, choose the Region in which you created the Redshift Serverless workgroup.
- Choose Create bucket.
- Choose Buckets in the navigation pane and navigate to the S3 bucket you just created (uniquename-as-rs).
- Choose Create folder.
- For Folder name, enter incoming.
- Choose Create folder.
Convert and apply the Azure Synapse schema to Amazon Redshift using AWS SCT
To convert the Azure Synapse schema to Amazon Redshift format, we use AWS SCT. Start by logging in to the EC2 instance that you created previously and launch AWS SCT.
Connect to the Azure Synapse source
Complete the following steps to connect to the Azure Synapse source:
- On the File menu, choose Create New Project.
- Choose a location to store your project files and data.
- Provide a meaningful but memorable name for your project (for example, Azure Synapse to Amazon Redshift).
- To connect to the Azure Synapse source data warehouse, choose Add source.
- Choose Azure Synapse and choose Next.
- For Connection name, enter a name (for example, olap-azure-synapse).
AWS SCT displays this name in the object tree in the left pane.
- For Server name, enter your Azure Synapse server name.
- For SQL pool, enter your Azure Synapse pool name.
- Enter a user name and password.
- Choose Test connection to verify that AWS SCT can connect to your source Azure Synapse data warehouse.
- When the connection is successfully validated, choose OK and Connect.
Connect to the Amazon Redshift target
Follow these steps to connect to Amazon Redshift:
- In AWS SCT, choose Add target.
- Choose Amazon Redshift, then choose Next.
- For Connection name, enter a name to describe the Amazon Redshift connection.
AWS SCT displays this name in the object tree in the right pane.
- For Server name, enter the Redshift Serverless workgroup endpoint you captured earlier.
- For Server port, enter 5439.
- For Database, enter dev.
- For User name, enter the user name you chose when creating the Redshift Serverless workgroup.
- For Password, enter the password you chose when creating the Redshift Serverless workgroup.
- Deselect Use AWS Glue.
- Choose Test connection to verify that AWS SCT can connect to your target Redshift workgroup.
- When the test is successful, choose OK.
- Choose Connect to connect to the Amazon Redshift target.
Alternatively, you can use connection values that are stored in AWS Secrets Manager.
Convert the Azure Synapse schema to a Redshift data warehouse
After you create the source and target connections, you will see the source Azure Synapse object tree in the left pane and the target Amazon Redshift object tree in the right pane. We then create mapping rules to describe the source-target pair for the Azure Synapse to Amazon Redshift migration.
Follow these steps to convert the Azure Synapse dataset to Amazon Redshift format:
- In the left pane, choose (right-click) the schema you want to convert.
- Choose Convert schema.
- In the dialog box, choose Yes.
When the conversion is complete, you will see a new schema created in the Amazon Redshift pane (right pane) with the same name as your Azure Synapse schema.
The sample schema we used has three tables; you can see these objects in Amazon Redshift format in the right pane. AWS SCT converts all the Azure Synapse code and data objects to Amazon Redshift format. You can also use AWS SCT to convert external SQL scripts, application code, or additional files with embedded SQL.
Analyze the assessment report and address the action items
AWS SCT creates an assessment report to assess the migration complexity. AWS SCT can convert the majority of code and database objects, but some objects may require manual conversion. AWS SCT highlights these objects in blue in the conversion statistics diagram and creates action items with a complexity attached to them.
To view the assessment report, switch from Main view to Assessment Report view as shown in the following screenshot.
The Summary tab shows objects that were converted automatically and objects that were not converted automatically. Green represents automatically converted objects or objects with simple action items. Blue represents medium and complex action items that require manual intervention.
The Action items tab shows the recommended actions for each conversion issue. If you choose an action item from the list, AWS SCT highlights the object that the action item applies to.
The report also contains recommendations for how to manually convert the schema item. For example, after the assessment runs, detailed reports for the database and schema show you the effort required to design and implement the recommendations for converting action items. For more information about deciding how to handle manual conversions, see Handling manual conversions in AWS SCT. AWS SCT completes some actions automatically while converting the schema to Amazon Redshift; objects with such actions are marked with a red warning sign.
You can evaluate and inspect the individual object DDL by selecting it in the right pane, and you can also edit it as needed. In the following example, AWS SCT changes the ID column data type from decimal(3,0) in Azure Synapse to the smallint data type in Amazon Redshift.
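To make the conversion concrete, the following is a hypothetical before-and-after DDL pair (a sketch, not the literal AWS SCT output) for a table like the one used later in this post:

```sql
-- Azure Synapse dedicated SQL pool (source); columns are illustrative.
CREATE TABLE dbo.tbl_currency
(
    id            DECIMAL(3,0) NOT NULL,
    currency_code CHAR(3)      NOT NULL,
    currency_name NVARCHAR(50) NOT NULL
)
WITH (DISTRIBUTION = HASH(id), CLUSTERED COLUMNSTORE INDEX);

-- Amazon Redshift (converted); decimal(3,0) becomes smallint, and
-- AWS SCT may widen string columns to allow for multibyte characters.
CREATE TABLE dbo.tbl_currency
(
    id            SMALLINT     NOT NULL,
    currency_code CHAR(3)      NOT NULL,
    currency_name VARCHAR(100) NOT NULL
)
DISTSTYLE KEY
DISTKEY (id)
SORTKEY (id);
```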
Apply the converted schema to the target Redshift data warehouse
To apply the converted schema to Amazon Redshift, select the converted schema in the right pane, right-click, and choose Apply to database.
Migrate data from Azure Synapse to Amazon Redshift using AWS SCT data extraction agents
AWS SCT extraction agents extract data from your source database and migrate it to the AWS Cloud. In this section, we configure AWS SCT extraction agents to extract data from Azure Synapse and migrate it to Amazon Redshift. For this post, we install the AWS SCT extraction agent on the same Windows instance that has AWS SCT installed. For better performance, we recommend that you use a separate Linux instance to install extraction agents if possible. For very large datasets, AWS SCT supports the use of multiple data extraction agents running on several instances to maximize throughput and increase the speed of the data migration.
Generate trust and key stores (optional)
You can use Secure Sockets Layer (SSL) encrypted communication with AWS SCT data extractors. When you use SSL, all data passed between the applications remains private and integral. To use SSL communication, you need to generate trust and key stores using AWS SCT. You can skip this step if you don't want to use SSL. We recommend using SSL for production workloads.
Follow these steps to generate trust and key stores:
- In AWS SCT, choose Settings, Global settings, and Security.
- Choose Generate trust and key store.
- Enter a name and password for the trust and key stores.
- Enter a location to store them.
- Choose Generate, then choose OK.
Install and configure the data extraction agent
In the installation package for AWS SCT, you can find a subfolder called agents (aws-schema-conversion-tool-1.0.latest.zip\agents). Locate and install the executable file with a name like aws-schema-conversion-tool-extractor-xxxxxxxx.msi.
During the installation process, follow these steps to configure the AWS SCT Data Extractor:
- For Service port, enter the port number the agent listens on. The default is 8192.
- For Working folder, enter the path where the AWS SCT data extraction agent will store the extracted data.
The working folder can be on a different computer from the agent, and a single working folder can be shared by multiple agents on different computers.
- For Enter Redshift JDBC driver file or files, enter the location where you downloaded the Redshift JDBC drivers.
- For Add the Amazon Redshift driver, enter YES.
- For Enable SSL communication, enter yes. Enter No here if you don't want to use SSL.
- Choose Next.
- For Trust store path, enter the storage location you specified when creating the trust and key store.
- For Trust store password, enter the password for the trust store.
- For Enable client SSL authentication, enter yes.
- For Key store path, enter the storage location you specified when creating the trust and key store.
- For Key store password, enter the password for the key store.
- Choose Next.
Start the data extraction agent
Use the following procedure to start extraction agents. Repeat this procedure on each computer that has an extraction agent installed.
Extraction agents act as listeners. When you start an agent with this procedure, the agent starts listening for instructions. You send the agents instructions to extract data from your data warehouse in a later section.
To start the extraction agent, navigate to the AWS SCT Data Extractor Agent directory. For example, in Microsoft Windows, use C:\Program Files\AWS SCT Data Extractor Agent\StartAgent.bat.
On the computer that has the extraction agent installed, from a command prompt or terminal window, run the command listed for your operating system. To stop an agent, run the same command but replace start with stop. To restart an agent, run the RestartAgent.bat file.
Note that you should have administrator access to run these commands.
Register the data extraction agent
Follow these steps to register the data extraction agent:
- In AWS SCT, change the view to Data Migration view and choose Register.
- Select Redshift data agent, then choose OK.
- For Description, enter a name to identify the agent.
- For Host name, if you installed the extraction agent on the same workstation as AWS SCT, enter 0.0.0.0 to indicate local host. Otherwise, enter the host name of the machine on which the AWS SCT extraction agent is installed. It is recommended to install extraction agents on Linux for better performance.
- For Port, enter the number you used for the listening port (default 8192) when installing the AWS SCT extraction agent.
- Select Use SSL to encrypt the AWS SCT connection to the data extraction agent.
- If you're using SSL, navigate to the SSL tab.
- For Trust store, choose the trust store you created earlier.
- For Key store, choose the key store you created earlier.
- Choose Test connection.
- After the connection is validated successfully, choose OK and Register.
Create a local data migration task
To migrate data from Azure Synapse Analytics to Amazon Redshift, you create, run, and monitor the local migration task from AWS SCT. This step uses the data extraction agent to migrate data by creating a task.
Follow these steps to create a local data migration task:
- In AWS SCT, under the schema name in the left pane, choose (right-click) the table you want to migrate (for this post, we use the table tbl_currency).
- Choose Create Local task.
- Choose from the following migration modes:
  - Extract the source data and store it on a local PC or virtual machine where the agent runs.
  - Extract the data and upload it to an S3 bucket.
  - Extract the data, upload it to Amazon S3, and copy it into Amazon Redshift. (We choose this option for this post.)
- On the Advanced tab, provide the extraction and copy settings.
- On the Source server tab, make sure you are using the current connection properties.
- On the Amazon S3 settings tab, for Amazon S3 bucket folder, provide the bucket and folder names of the S3 bucket you created earlier.
The AWS SCT data extraction agent uploads the data to this S3 bucket and folder before copying it to Amazon Redshift.
- Choose Test Task.
- When the task is successfully validated, choose OK, then choose Create.
Start the local data migration task
To start the task, choose Start or Restart on the Tasks tab.
First, the data extraction agent extracts data from Azure Synapse. Then the agent uploads the data to Amazon S3 and launches a COPY command to move the data to Amazon Redshift.
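Conceptually, that final step resembles a Redshift COPY command like the following sketch; the bucket and folder are the ones created earlier, while the table name, file format, and use of the default IAM role are assumptions:

```sql
-- Illustrative only; AWS SCT generates and runs its own COPY statement,
-- and the staged file format may differ.
COPY dbo.tbl_currency
FROM 's3://uniquename-as-rs/incoming/'
IAM_ROLE default
FORMAT AS CSV
GZIP;
```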
At this point, AWS SCT has successfully migrated data from the source Azure Synapse table to the Redshift table.
View data in Amazon Redshift
After the data migration task is complete, you can connect to Amazon Redshift and validate the data. Complete the following steps:
- On the Amazon Redshift console, navigate to Query Editor v2.
- Open the Redshift Serverless workgroup you created.
- Choose Query data.
- For Database, enter the name of your database.
- For Authentication, select Federated user.
- Choose Create connection.
- Open a new editor by choosing the plus sign.
- In the editor, write a query to select from the schema and the table or view name you want to verify, as shown in the example after this list.
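For example, a quick validation might look like the following; the schema and table names come from this walkthrough's sample and may differ in your environment:

```sql
-- Compare the migrated row count with the source table.
SELECT COUNT(*) FROM dbo.tbl_currency;

-- Spot-check a few migrated rows.
SELECT * FROM dbo.tbl_currency LIMIT 10;
```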
You can explore the data, run ad hoc queries, and make visualizations, charts, and views.
The following screenshot shows the view of the source Azure Synapse dataset we used in this post.
Clean up
Follow the steps in this section to clean up any AWS resources you created as part of this post.
Terminate the EC2 instance
Follow these steps to terminate the EC2 instance:
- On the Amazon EC2 console, in the navigation pane, choose Instances.
- Select the instance you created.
- Choose Instance state, then choose Terminate instance.
- Choose Terminate when prompted for confirmation.
Delete the Redshift Serverless workgroup and namespace
Follow these steps to delete the Redshift Serverless workgroup and namespace:
- On the Redshift Serverless Dashboard, in the Namespaces / Workgroups section, choose the workgroup you created.
- On the Actions menu, choose Delete workgroup.
- Select Delete the associated namespace.
- Deselect Create final snapshot.
- Enter delete in the confirmation text box and choose Delete.
Delete the S3 bucket
Follow these steps to delete the S3 bucket:
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Choose the bucket you created.
- Choose Delete.
- To confirm deletion, enter the name of the bucket.
- Choose Delete bucket.
Conclusion
Migrating a data warehouse can be a challenging, complex, yet rewarding project. AWS SCT reduces the complexity of data warehouse migrations. This post showed how a data migration task extracts, downloads, and migrates data from Azure Synapse to Amazon Redshift. The solution we presented performs a one-time migration of database objects and data. Data changes made in Azure Synapse while the migration is in progress won't be reflected in Amazon Redshift. While the data migration is in progress, put your ETL jobs to Azure Synapse on hold, or rerun them pointing to Amazon Redshift after the migration. Consider using the best practices for AWS SCT.
To get started, download and install AWS SCT, sign in to the AWS Management Console, check out Redshift Serverless, and start migrating!
About the Authors
Ahmed Shehata is a Senior Analytics Specialist Solutions Architect at AWS based in Toronto. He has more than two decades of experience helping customers modernize their data platforms. Ahmed is passionate about helping customers build efficient, performant, and scalable analytic solutions.
Jagadish Kumar is a Senior Analytics Specialist Solutions Architect at AWS focused on Amazon Redshift. He is deeply passionate about data architecture and helps customers build analytics solutions at scale on AWS.
Anusha Challa is a Senior Analytics Specialist Solutions Architect at AWS focused on Amazon Redshift. She has helped many customers build large-scale data warehouse solutions in the cloud and on premises. Anusha is passionate about data analytics and data science and about enabling customers to achieve success with their large-scale data projects.