
Simplify Amazon Redshift monitoring using the new unified SYS views


Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, providing up to 5 times better price-performance than any other cloud data warehouse, with performance innovation out of the box at no additional cost to you. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads.

In this post, we discuss Amazon Redshift SYS monitoring views and how they simplify the monitoring of your Amazon Redshift workloads and resource usage.

Overview of SYS monitoring views

SYS monitoring views are system views in Amazon Redshift that can be used to monitor query and workload resource usage for provisioned clusters as well as for serverless workgroups. They offer the following benefits:

  • They are categorized based on functional alignment, considering query state, performance metrics, and query types
  • They introduce new performance metrics like planning_time, lock_wait_time, remote_read_io, and local_read_io to help with performance troubleshooting
  • They improve the usability of monitoring views by logging the user-submitted query instead of the Redshift optimizer-rewritten query
  • They provide more troubleshooting metrics using fewer views
  • They enable unified Amazon Redshift monitoring by letting you use the same query across provisioned clusters and serverless workgroups

Let's look at some of the features of SYS monitoring views and how they can be used for monitoring.

Unify various query-level monitoring metrics

The following table shows how you can unify various metrics and information for a query from multiple system tables and views into one SYS monitoring view.

STL/SVL/STV | Information element | SYS monitoring view
STL_QUERY | elapsed time, query label, user ID, transaction, session, label, stopped queries, database name | SYS_QUERY_HISTORY
STL_WLM_QUERY | queue time, runtime | SYS_QUERY_HISTORY
SVL_QLOG | result cache | SYS_QUERY_HISTORY
STL_ERROR | error code, error message | SYS_QUERY_HISTORY
STL_UTILITYTEXT | non-SELECT SQL | SYS_QUERY_HISTORY
STL_DDLTEXT | DDL statements | SYS_QUERY_HISTORY
SVL_STATEMENTTEXT | all types of SQL statements | SYS_QUERY_HISTORY
STL_RETURN | returned rows and bytes | SYS_QUERY_HISTORY
STL_USAGE_CONTROL | usage limit | SYS_QUERY_HISTORY
STV_WLM_QUERY_STATE | current state of WLM | SYS_QUERY_HISTORY
STV_RECENTS | most recent and in-flight queries | SYS_QUERY_HISTORY
STV_INFLIGHT | in-flight queries | SYS_QUERY_HISTORY
SVL_COMPILE | compilation | SYS_QUERY_HISTORY

The SYS_QUERY_HISTORY columns covering these information elements include user_id, query_id, query_label, transaction_id, session_id, database_name, query_type, status, result_cache_hit, start_time, end_time, elapsed_time, queue_time, execution_time, error_message, returned_rows, returned_bytes, query_text, redshift_version, usage_limit, compute_type, compile_time, planning_time, and lock_wait_time.
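For example, the following query retrieves many of these formerly scattered metrics for a single statement from SYS_QUERY_HISTORY alone (query_id 1000 is a placeholder; substitute a query identifier from your own workload):

SELECT  
  query_id
  , query_type
  , status
  , result_cache_hit
  , queue_time
  , lock_wait_time
  , planning_time
  , compile_time
  , execution_time
  , elapsed_time
  , returned_rows
  , returned_bytes
  , error_message
FROM    
  sys_query_history
WHERE query_id = 1000;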

For more information on SYS to STL/SVL/STV mapping, refer to Migrating to SYS monitoring views.

User query-level logging

To enhance query performance, the Redshift query engine can rewrite user-submitted queries. The user-submitted query identifier is different from the rewritten query identifier. In this post, we refer to the user-submitted query as the parent query and the rewritten query as the child query.

The following diagram illustrates logging at the parent query level and the child query level. The parent query identifier is 1000, and the child query identifiers are 1001, 1002, and 1003.
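Because SYS views log the user-submitted (parent) query, you can look up the parent entry in SYS_QUERY_HISTORY and then drill into the runs of its rewritten child queries through the child_query_sequence column of SYS_QUERY_DETAIL. The following is a minimal sketch; query_id 1000 corresponds to the parent query in the diagram:

--parent-level entry
SELECT  
  query_id
  , query_type
  , status
  , query_text::VARCHAR(100)
FROM    
  sys_query_history
WHERE query_id = 1000;

--child query runs of the same parent query
SELECT  
  query_id
  , child_query_sequence
  , segment_id
  , step_id
  , start_time
  , end_time
FROM    
  sys_query_detail
WHERE query_id = 1000
ORDER BY
  child_query_sequence
  , segment_id
  , step_id;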

Query lifecycle timings

SYS_QUERY_HISTORY has an enhanced list of columns that provide granular time metrics for the different query lifecycle phases. Note that all times are recorded in microseconds. The following table summarizes these metrics.

Time metric | Description
planning_time | The time the query spent before running, which typically includes query lifecycle phases like parse, analyze, planning, and rewriting.
lock_wait_time | The time the query spent acquiring locks on the database objects it references.
queue_time | The time the query spent in the queue waiting for resources to become available to run.
compile_time | The time the query spent compiling.
execution_time | The time the query spent running. In the case of a SELECT query, this also includes the return time.
elapsed_time | The end-to-end time of the query run.
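As a quick sketch, the following query combines these columns (all in microseconds) to see where time is spent for the slowest recent queries:

SELECT  
  query_id
  , elapsed_time
  , queue_time
  , lock_wait_time
  , planning_time
  , compile_time
  , execution_time
FROM    
  sys_query_history
ORDER BY
  elapsed_time DESC
LIMIT 10;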

Solution overview

We discuss the following scenarios to help you gain familiarity with the SYS monitoring views:

  • Workload and query lifecycle monitoring
  • Data ingestion monitoring
  • External query monitoring
  • Slow query performance troubleshooting

Prerequisites

You should have the following prerequisites to follow along with the examples in this post:

Additionally, download all the SQL queries that are referenced in this post as Redshift Query Editor v2 SQL notebooks.

Workload and query lifecycle monitoring

In this section, we discuss how you can monitor the workload and the query lifecycle.

Identify in-flight queries

SYS_QUERY_HISTORY provides a single view to look at all the in-flight queries as well as historical runs. See the following example query:

SELECT  
  *
FROM    
  sys_query_history
WHERE    status IN ('planning', 'queued', 'running', 'returning')
ORDER BY
  start_time;

We get the following output.

Identify top long-running queries

The following query helps retrieve the top 100 queries that take the longest to run. Analyzing (and, if possible, optimizing) these queries can help improve overall performance. Note that all time values are in microseconds.

--top long-running queries by elapsed_time
SELECT  
  user_id
  , transaction_id
  , query_id
  , database_name
  , query_type
  , query_text::VARCHAR(100)
  , lock_wait_time
  , planning_time
  , compile_time
  , execution_time
  , elapsed_time
FROM    
  sys_query_history
ORDER BY
  elapsed_time DESC
LIMIT 100;

We get the following output.

Gather daily counts of queries by query type, period, and status

The following query provides insight into the distribution of different types of queries across different days and helps evaluate and track any changes in the workload:

--daily breakdown of workload by query type and status
SELECT  
  DATE_TRUNC('day', start_time) period_daily
  , query_type
  , status
  , COUNT(*)
FROM    
  sys_query_history
GROUP BY
  period_daily
  , query_type
  , status
ORDER BY
  period_daily
  , query_type
  , status;

We get the following output.

Gather run details of an in-flight query

To determine the run-level details of a query that is in-flight, you can use the is_active = 't' filter when querying the SYS_QUERY_DETAIL view. See the following example:

SELECT  
  query_id
  , child_query_sequence
  , stream_id
  , segment_id
  , step_id
  , step_name
  , table_id
  , coalesce(table_name,'') || coalesce(source,'') as table_name
  , start_time
  , end_time
  , duration
  , blocks_read
  , local_read_io
  , remote_read_io
FROM    
  sys_query_detail
WHERE is_active = 't'
ORDER BY
  query_id
  , child_query_sequence
  , stream_id
  , segment_id
  , step_id;

To view the latest 100 COPY queries run, use the following code:

SELECT  
  session_id
  , transaction_id
  , query_id
  , database_name
  , table_name
  , data_source
  , loaded_rows
  , loaded_bytes
  , duration / 1000.00 duration_ms
FROM    
  sys_load_history
ORDER BY
  start_time DESC LIMIT 100;

We get the following output.

Gather transaction-level details for commits and undo

SYS_TRANSACTION_HISTORY provides transaction-level logging, with insights into committed transactions and details like blocks committed, status, and isolation level (serializable or snapshot). It also logs details about rolled back or undone transactions.

The following screenshots illustrate fetching details about a transaction that was committed successfully.

The following screenshots illustrate fetching details about a transaction that was rolled back.
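Because the queries themselves appear only in the screenshots, the following is a minimal sketch of such a lookup; the transaction_id value is a placeholder, and the available columns beyond transaction_id should be verified against the SYS_TRANSACTION_HISTORY documentation:

SELECT  
  *
FROM    
  sys_transaction_history
WHERE transaction_id = 123456;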

Stats and vacuum

The SYS_ANALYZE_HISTORY monitoring view provides details like the last timestamp of analyze operations, the duration for which a particular analyze ran, the number of rows in the table, and the number of rows modified. The following example query provides a list of the latest analyze operations that ran for all the permanent tables:

SELECT  
  TRIM(schema_name) schema_name
  , TRIM(table_name) table_name
  , table_id
  , status
  , COUNT(*) times_analyze_was_triggered
  , MAX(last_analyze_time) last_analyze_time
  , MAX(end_time) end_time
  , AVG(ROWS) "rows"
  , AVG(modified_rows) modified_rows
FROM    
  sys_analyze_history
WHERE
   status != 'Skipped'
GROUP BY
  schema_name
  , table_name
  , table_id
  , status
ORDER BY
  schema_name
  , table_name
  , table_id
  , status
  , end_time;

We get the following output.

The SYS_VACUUM_HISTORY monitoring view provides a complete set of details on VACUUM in a single view. For example, see the following code:

SELECT  
  user_id
  , transaction_id
  , query_id
  , TRIM(database_name) as database_name
  , TRIM(schema_name) as schema_name
  , TRIM(table_name) table_name
  , table_id
  , vacuum_type
  , is_automatic as is_auto
  , duration
  , rows_before_vacuum
  , size_before_vacuum
  , reclaimable_rows
  , reclaimed_rows
  , reclaimed_blocks
  , sortedrows_before_vacuum
  , sortedrows_after_vacuum
FROM    
  sys_vacuum_history
WHERE    status LIKE '%Completed%'
ORDER BY
  start_time;

We get the following output.

Data ingestion monitoring

In this section, we discuss how you can monitor data ingestion.

Summary of ingestion

SYS_LOAD_HISTORY provides details on the statistics of COPY commands. Use this view for summarized insights into your ingestion workload. The following example query provides an hourly summary of ingestion, broken down by the tables into which data was ingested:

SELECT  
  date_trunc('hour', start_time) period_hourly
  , database_name
  , table_name
  , status
  , file_format
  , SUM(loaded_rows) total_rows_ingested
  , SUM(loaded_bytes) total_bytes_ingested
  , SUM(source_file_count) num_of_files_to_process
  , SUM(file_count_scanned) num_of_files_processed
  , SUM(error_count) total_errors
FROM    
  sys_load_history
GROUP BY
  period_hourly
  , database_name
  , table_name
  , status
  , file_format
ORDER BY
  table_name
  , period_hourly
  , status;

We get the following output.

File-level ingestion logging

SYS_LOAD_DETAIL provides more granular insights into how ingestion is performed at the file level. For example, see the following query using sys_load_history:

SELECT  
  *
FROM    
  sys_load_history
WHERE table_name = 'catalog_sales'
ORDER BY
  start_time;

We get the following output.

The following example shows what detailed file-level monitoring looks like:

 SELECT  
  user_id
  , query_id
  , TRIM(file_name) file_name
  , bytes_scanned
  , lines_scanned
  , splits_scanned
  , record_time
  , start_time
  , end_time
FROM    
  sys_load_detail
WHERE query_id = 1824870
ORDER BY
  start_time;

Check for errors during the ingestion process

SYS_LOAD_ERROR_DETAIL enables you to track and troubleshoot errors that may have occurred during the ingestion process. This view logs details for the file that encountered an error during ingestion, along with the line number at which the error occurred and the column details within that line. See the following code:

SELECT * FROM sys_load_error_detail ORDER BY start_time LIMIT 100;

We get the following output.

External query monitoring

SYS_EXTERNAL_QUERY_DETAIL provides run details for external queries, which include Amazon Redshift Spectrum and federated queries. This view logs details at the segment level and provides useful insights to troubleshoot and monitor the performance of external queries in a single monitoring view. The following are a few useful metrics and data points this monitoring view provides:

  • Number of external files scanned (scanned_files) and the format of external files (file_format), such as Parquet, text file, and so on
  • Data scanned in terms of rows (returned_rows) and bytes (returned_bytes)
  • Usage of partitioning (total_partitions and qualified_partitions) by external queries and tables
  • Granular insights into the time taken in listing (s3list_time) and qualifying partitions (get_partition_time) for a given external object
  • External file location (file_location) and external table name (table_name)
  • Type of external source (source_type), such as Amazon Simple Storage Service (Amazon S3) for Redshift Spectrum, or federated
  • Recursive scans of subdirectories (is_recursive) or access to a nested column data type (is_nested)

For example, the following query shows an hourly summary of the number of external queries run and the data scanned:

SELECT  
  DATE_TRUNC('hour', start_time) period_hourly
  , user_id
  , TRIM(source_type) source_type
  , COUNT (DISTINCT query_id) query_counts
  , SUM(returned_rows) returned_rows
  , ROUND(SUM(returned_bytes) / 1024^3,2) returned_gb
FROM    
  sys_external_query_detail
GROUP BY
  period_hourly
  , user_id
  , source_type
ORDER BY
  period_hourly
  , user_id
  , source_type;

We get the following output.

Usage of partitions

You can verify whether the external queries scanning large amounts of data and files use partitions or not. When you use partitions, you can restrict the amount of data your external query has to scan by pruning based on the partition key. See the following code:

SELECT  
  file_location
  , CASE
      WHEN NVL(total_partitions,0) = 0
      THEN 'No'
      ELSE 'Yes'
    END is_partitioned
  , SUM(scanned_files) total_scanned_files
  , COUNT(DISTINCT query_id) query_count
FROM    
  sys_external_query_detail
GROUP BY
  file_location
  , is_partitioned
ORDER BY
  total_scanned_files DESC;

We get the following output.

For any errors encountered with external queries, look into SYS_EXTERNAL_QUERY_ERROR, which logs details at the granularity of file_location, column, and rowid within that file.
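A minimal sketch of such a check follows; the available columns beyond file_location, column, and rowid should be confirmed against the SYS_EXTERNAL_QUERY_ERROR documentation:

SELECT  
  *
FROM    
  sys_external_query_error
LIMIT 100;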

Slow query performance troubleshooting

Refer to the sysview_slow_query_performance_troubleshooting SQL notebook downloaded as part of the prerequisites for a step-by-step guide on how to perform query-level troubleshooting using SYS monitoring views and find answers to the following questions (a small comparison sketch follows the list):

  • Do the queries being compared have similar query text?
  • Did the query use the result cache?
  • Which parts of the query lifecycle (queuing, compilation, planning, lock wait) contribute the most to query runtimes?
  • Has the query plan changed?
  • Is the query reading more data blocks?
  • Is the query spilling to disk? If so, is it spilling to local or remote storage?
  • Is the query highly skewed with respect to data (distribution) and time (runtime)?
  • Do you see more rows processed in join steps or nested loops?
  • Are there any alerts indicating stale statistics?
  • When was the last vacuum and analyze performed for the tables involved in the query?
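As a minimal starting point for the first few questions (the query IDs are placeholders for two runs you want to compare), you can put the lifecycle columns of both runs side by side:

SELECT  
  query_id
  , result_cache_hit
  , queue_time
  , lock_wait_time
  , planning_time
  , compile_time
  , execution_time
  , elapsed_time
  , query_text::VARCHAR(100)
FROM    
  sys_query_history
WHERE query_id IN (1001, 2001);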

Clean up

If you created any Redshift provisioned clusters or Redshift Serverless workgroups as part of this post and no longer need them for your workloads, you can delete them to avoid incurring additional costs.

Conclusion

In this post, we explained how you can use the Redshift SYS monitoring views to monitor the workloads of provisioned clusters and serverless workgroups. The SYS monitoring views provide simplified monitoring of workloads, access to various query-level monitoring metrics from a unified view, and the ability to run the same SYS monitoring view query across both provisioned clusters and serverless workgroups. We also covered some key monitoring and troubleshooting scenarios using SYS monitoring views.

We encourage you to start using the new SYS monitoring views for your Redshift workloads. If you have any feedback or questions, please leave them in the comments.


About the authors

Urvish Shah is a Senior Database Engineer at Amazon Redshift. He has more than a decade of experience working on databases, data warehousing, and analytics. Outside of work, he enjoys cooking, travelling, and spending time with his daughter.

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 15 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.
