This post is co-authored with Guillaume Saint-Martin at Solar King.
Solar King is the world’s leading off-grid solar energy company, and is on a mission to power access to brighter lives through off-grid solar. Solar King designs, distributes, installs, and finances solar home energy products for people currently living without reliable energy access. It serves over 100 million users in 65 countries across the world.
Over 26,000 agents across Africa today help local households get access to Solar King off-grid products to have more productive lives. These agents are informed in near real time so they can find the right geographical areas and households that do not have access to low-cost power. Solar King is driven by data, analyzing areas of growth across thousands of miles using dashboards that are powered by Amazon Redshift.
In this post, we share how Solar King uses Amazon Redshift and Redshift features such as data sharing to improve the performance of queries in Looker for over 1,000 of our staff.
Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it a widely used cloud data warehouse. You can run and scale analytics in seconds on all your data without having to manage your data warehouse infrastructure.
Use case
Solar King uses a Redshift provisioned cluster to run its extract, transform, and load (ETL) and analytics processes to source and transform data from various sources. It then provides access to this data for business users through Looker. Amazon Redshift currently manages varied consumption requirements for Looker users across the globe.
Amazon Redshift is used to clean and aggregate data into pre-processed tables, run Solar King’s ETL pipelines, and process Looker “persistent derived tables” (PDTs) scheduled at an hourly frequency or less. These ETL pipelines and PDTs were competing workloads and sometimes ran into read/write conflicts.
As a data-driven company that continues to expand, Solar King needed a solution that does the following:
- Allows hundreds of queries to run in parallel with the desired query throughput.
- Optimizes workload management so that ETL, business intelligence (BI), and Looker workloads can run concurrently without impacting one another.
- Seamlessly scales capacity as the user base grows while maintaining cost efficiency.
Solution overview
As data volumes, query counts, and users continue to grow, Solar King decided to move from a single cluster to a multi-cluster architecture with data sharing. This takes advantage of workload isolation and separates ETL and analytics workloads across different clusters while still using a single copy of the data.
The solution at Solar King comprises multiple Redshift provisioned clusters and an Amazon Elastic Compute Cloud (EC2) Network Load Balancer, using the data sharing capability in Amazon Redshift.
Amazon Redshift data sharing allows data access across Redshift clusters without having to copy or move data. Therefore, when a workload is moved from one Redshift cluster to another, the workload can continue to access data in the initial Redshift cluster. For more information, refer to Sharing Amazon Redshift data securely across Amazon Redshift clusters for workload isolation.
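To make the producer/consumer relationship concrete, the following is a minimal Python sketch that builds the Redshift SQL statements typically involved in setting up a datashare. It is an illustration under stated assumptions, not Solar King's actual setup: the datashare, schema, and database names and the namespace IDs are hypothetical placeholders, and the helper functions are invented for this example. Only the SQL statement forms (CREATE DATASHARE, ALTER DATASHARE, GRANT USAGE ON DATASHARE, CREATE DATABASE ... FROM DATASHARE) come from Amazon Redshift.

```python
import re

# Conservative validation so the generated SQL only contains expected tokens.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")
_NAMESPACE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def _check(pattern, value, label):
    """Return value unchanged if it matches pattern, otherwise raise."""
    if not pattern.match(value):
        raise ValueError(f"invalid {label}: {value!r}")
    return value


def producer_statements(share, schema, consumer_namespaces):
    """SQL to run on the producer cluster: create the share, add objects, grant usage."""
    share = _check(_IDENTIFIER, share, "datashare name")
    schema = _check(_IDENTIFIER, schema, "schema name")
    statements = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
    ]
    for namespace in consumer_namespaces:
        namespace = _check(_NAMESPACE, namespace, "consumer namespace")
        statements.append(
            f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{namespace}';"
        )
    return statements


def consumer_statement(local_db, share, producer_namespace):
    """SQL to run on each consumer cluster: expose the share as a local database."""
    local_db = _check(_IDENTIFIER, local_db, "database name")
    share = _check(_IDENTIFIER, share, "datashare name")
    producer_namespace = _check(_NAMESPACE, producer_namespace, "producer namespace")
    return (
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    )


if __name__ == "__main__":
    # Hypothetical placeholder namespace IDs, for illustration only.
    producer_ns = "11111111-2222-3333-4444-555555555555"
    consumer_ns = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
    for sql in producer_statements("etl_share", "analytics", [consumer_ns]):
        print(sql)
    print(consumer_statement("shared_db", "etl_share", producer_ns))
```

Once the consumer database exists, queries on the consumer cluster read the producer's tables in place, which is what lets a workload move between clusters without a second copy of the data.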
The solution consists of the following key components:
- Core ETL cluster: A core ETL producer cluster (8 ra3.xlplus nodes) with data sharing.
- Looker cluster: A producer/consumer cluster (8 ra3.4xlarge nodes) with data sharing to run the following:
  - Large ETL processes
  - Looker-initiated ETL processes (PDTs)
  - Data team workloads
- BI clusters: Four large consumer clusters (6 ra3.4xlarge nodes each):
  - Three clusters using reserved instances (RIs) that are on 24/7
  - One on-demand cluster turned on for six hours every weekday
- Network Load Balancer: The Network Load Balancer distributes queries originating from Looker between the consumer clusters.
- Concurrency scaling free tier: Each of the three clusters using RIs accrues one hour of concurrency scaling credits per day, which are used on Mondays, while the on-demand cluster accrues four hours of concurrency scaling credits, keeping the concurrency scaling cost within the free tier.
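The weekday schedule of the on-demand BI cluster can be expressed as a small decision function. The sketch below is illustrative only: the post does not say which six hours the cluster runs or how it is switched on and off, so the `start_hour` default of 8 and both function names are assumptions made for this example.

```python
from datetime import datetime, timedelta


def on_demand_cluster_should_run(now, start_hour=8, hours_on=6):
    """Return True if the on-demand cluster should be up at `now`.

    start_hour is a hypothetical value; the source only states
    "six hours every weekday".
    """
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = start_hour <= now.hour < start_hour + hours_on
    return is_weekday and in_window


def weekly_on_hours(week_start, start_hour=8, hours_on=6):
    """Count the hours in the 7 days after week_start during which the cluster is up."""
    return sum(
        on_demand_cluster_should_run(week_start + timedelta(hours=h), start_hour, hours_on)
        for h in range(7 * 24)
    )


if __name__ == "__main__":
    monday = datetime(2024, 1, 1)  # a Monday
    print(on_demand_cluster_should_run(monday.replace(hour=9)))
    print(weekly_on_hours(monday))
```

A scheduler could evaluate this function periodically and pause or resume the cluster when the result changes. With six hours on each of five weekdays, the cluster runs 30 hours per week instead of 168.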
The following diagram shows the solution and workflow steps.
Outcomes
Solar King saw the following improvements with this solution:
- Performance: The improvement in performance was drastic and immediate after implementing the distributed producer/consumer architecture. Most queries (95%) that used to take 50-90 seconds to complete now take at most 40 seconds, and 75% of queries that used to take up to 5 seconds now take less than one second. Additionally, the number of queries run (Amazon Redshift adoption) increased by 40%, driven by higher usage of Looker following the architecture change.
- Workload management: After this architectural change, queries no longer spend a long time queued. The following chart illustrates queued vs. running queries on one of the clusters before and after the modernization engagement.
- Scalability: With this Redshift data sharing architecture, the Solar King data team was able to bring acceptable performance back to its users. This led to renewed engagement, measured by a doubling of the number of monthly queries over the following few months, and increased adoption of Amazon Redshift across the company.
Solar King’s costs are estimated to increase by only 35%. This is achieved by reserving most of the instances used for 3 years (26 ra3.4xlarge and 8 ra3.xlplus) and relying on the concurrency scaling free tier for a performance boost on the day of highest usage. The comparison is with the previous setup, which had fewer reserved nodes (8 ra3.4xlarge) and much heavier use of concurrency scaling (two concurrency scaling clusters, almost always on). This modernization increased the productivity of the agents by giving them faster, near-real-time information on areas that need access to low-cost power.
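The reserved node counts quoted above can be cross-checked against the component list in the solution overview. The sketch below tallies nodes per node type; the cluster names are hypothetical labels, and marking the core ETL and Looker clusters as reserved is an inference from the quoted totals rather than something the post states directly.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str        # hypothetical label, not an actual cluster identifier
    node_type: str
    nodes: int
    reserved: bool   # True if assumed to run on reserved instances


CLUSTERS = [
    Cluster("core-etl", "ra3.xlplus", 8, True),
    Cluster("looker", "ra3.4xlarge", 8, True),
    Cluster("bi-1", "ra3.4xlarge", 6, True),
    Cluster("bi-2", "ra3.4xlarge", 6, True),
    Cluster("bi-3", "ra3.4xlarge", 6, True),
    Cluster("bi-on-demand", "ra3.4xlarge", 6, False),
]


def node_totals(clusters, reserved):
    """Sum node counts per node type for clusters with the given reservation status."""
    totals = Counter()
    for cluster in clusters:
        if cluster.reserved == reserved:
            totals[cluster.node_type] += cluster.nodes
    return dict(totals)


if __name__ == "__main__":
    print("reserved:", node_totals(CLUSTERS, reserved=True))
    print("on-demand:", node_totals(CLUSTERS, reserved=False))
```

Under these assumptions, the reserved tally is 8 + 3 × 6 = 26 ra3.4xlarge nodes plus 8 ra3.xlplus nodes, which matches the figures quoted for the three-year reservation, with the remaining 6 ra3.4xlarge nodes belonging to the on-demand cluster.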
Conclusion
In this post, we discussed how Solar King used Amazon Redshift data sharing capabilities to distribute workloads and scale Amazon Redshift to meet end-user performance requirements from Looker while keeping control over the cost of Amazon Redshift consumption. Try the approaches discussed in this post and let us know your feedback in the comments.
About the authors
Guillaume Saint-Martin leads the Data and Analytics team at Solar King. With 10 years of experience in the data and development sectors, he manages a team of over 30 analysts, data engineers, and data scientists to support Solar King’s long-term modeling and trend analysis.
Aaber Jah is a Senior Analytics Specialist at AWS based in Chicago, Illinois. He focuses on driving and maintaining AWS Data Analytics business value for customers.
Rohit Vashishtha is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, Texas. He has over 17 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.