Friday, March 29, 2024

Nexthink scales to trillions of events per day with Amazon MSK


Real-time data streaming and event processing present scalability and management challenges. AWS offers a broad selection of managed real-time data streaming services to run these workloads at any scale.

In this post, Nexthink shares how Amazon Managed Streaming for Apache Kafka (Amazon MSK) empowered them to achieve massive scale in event processing. Experiencing business hyper-growth, Nexthink migrated to AWS to overcome the scaling limitations of on-premises solutions. With Amazon MSK, Nexthink now processes trillions of events per day, reaching over 5 GB per second of aggregated throughput.

In the following sections, Nexthink introduces their product and the need for scalability. They then highlight the challenges of their legacy on-premises application and present their transition to a cloud-native software as a service (SaaS) architecture powered by Amazon MSK. Finally, Nexthink details the benefits achieved by adopting Amazon MSK.

Nexthink’s need to scale

Nexthink is the leader in digital employee experience (DeX). The company is shaping the future of work by providing IT leaders and C-levels with insights into employees’ daily technology experiences at the device and application level. This allows IT to evolve from reactive problem-solving to proactive optimization.

The Nexthink Infinity platform combines analytics, monitoring, automation, and more to manage the employee digital experience. By collecting device and application events, processing them in real time, and storing them, our platform analyzes data to solve problems and boost experiences for over 15 million employees across five continents.

In just 3 years, Nexthink’s business grew tenfold, and with the introduction of more real-time data our application had to scale from processing 200 MB per second to 5 GB per second and trillions of events daily. To enable this growth, we modernized our application from an on-premises single-tenant monolith to a cloud-based scalable SaaS solution powered by Amazon MSK.

The next sections detail our modernization journey, including the challenges we faced and the benefits we realized with our new cloud-native, AWS-based architecture.

The on-premises solution and its challenges

Let’s first explore our previous on-premises solution, Nexthink V6, before examining how Amazon MSK addressed its challenges. The following diagram illustrates its architecture.

Nexthink v6

V6 was made up of two monolithic, single-tenant Java and C++ applications that were tightly coupled. The portal was a backend-for-frontend Java application, and the core engine was an in-house C++ in-memory database application that also handled device connections, data ingestion, aggregation, and querying. Because all these functions were bundled together, the engine became difficult to manage and improve.

V6 also lacked scalability. It initially supported 10,000 devices, but some new tenants had over 300,000. We reacted by deploying multiple V6 engines per tenant, which increased complexity and cost, hampered user experience, and delayed time to market. This also led to longer proof of concept and onboarding cycles, which hurt the business.

Furthermore, the absence of a streaming platform like Kafka created dependencies between teams through tight HTTP/gRPC coupling. Teams also couldn’t access real-time events before ingestion into the database, which limited feature development. We lacked a data buffer as well, risking data loss during outages. Such constraints impeded innovation and increased risk.

In summary, although the V6 system served its initial purpose, reinventing it with cloud-native technologies became imperative to improve scalability and reliability and to foster innovation by our engineering and product teams.

Transitioning to a cloud-native architecture with Amazon MSK

To achieve our modernization goals, after thorough research and iterations, we implemented an event-driven microservices design on Amazon Elastic Kubernetes Service (Amazon EKS), using Kafka on Amazon MSK for distributed event storage and streaming.

Our transition from the V6 on-premises solution to the cloud-native platform was phased over four iterations:

  • Phase 1 – We lifted and shifted from on premises to virtual machines in the cloud, reducing operational complexity and accelerating proof of concept cycles while transparently migrating customers.
  • Phase 2 – We extended the cloud architecture by implementing new product features with microservices and self-managed Kafka on Kubernetes. However, operating Kafka clusters ourselves proved overly difficult, leading us to Phase 3.
  • Phase 3 – We switched from self-managed Kafka to Amazon MSK, improving stability and reducing operational costs. We realized that managing Kafka wasn’t our core competency or differentiator, and the overhead was high. Amazon MSK enabled us to focus on our core application, freeing us from the burden of undifferentiated Kafka management.
  • Phase 4 – Finally, we eliminated all legacy components, completing the transition to a fully cloud-native SaaS platform. This journey of learning and transformation took 3 years.

Today, after our successful transition, we use Amazon MSK for two key functions:

  • Real-time data ingestion and processing of trillions of daily events from over 15 million devices worldwide, as illustrated in the following figure.

Nexthink Architecture Ingestion

  • Enabling an event-driven system that decouples data producers and consumers, as depicted in the following figure.

Nexthink Architecture Event Driven

To further enhance our scalability and resilience, we adopted a cell-based architecture, using the wide availability of Amazon MSK across AWS Regions. We currently operate over 10 cells, each representing an independent regional deployment of our SaaS solution. This cell-based approach minimizes the area of impact in case of issues, addresses data residency requirements, and enables horizontal scaling across AWS Regions, as illustrated in the following figure.

Nexthink Architecture Cells
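
The cell idea can be sketched in a few lines. This is a minimal illustration, assuming tenants are first filtered by the Region they must reside in and then pinned to one cell with a stable hash. The cell names, the Region assignments, and the `assign_cell` helper are hypothetical and are not taken from Nexthink's actual routing logic.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str    # hypothetical cell identifier
    region: str  # AWS Region hosting this cell's independent deployment


# Hypothetical cells for illustration; the post only states there are more than 10.
CELLS = (
    Cell("eu-a", "eu-central-1"),
    Cell("eu-b", "eu-central-1"),
    Cell("us-a", "us-east-1"),
)


def assign_cell(tenant_id: str, required_region: str, cells=CELLS) -> Cell:
    """Pick a cell inside the tenant's required Region (data residency)."""
    candidates = [c for c in cells if c.region == required_region]
    if not candidates:
        raise ValueError(f"no cell available in {required_region}")
    # A stable hash keeps a tenant pinned to the same cell across calls.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return candidates[int.from_bytes(digest[:4], "big") % len(candidates)]


print(assign_cell("tenant-42", "us-east-1").name)  # only one US cell is defined above
```

Because each tenant lives in exactly one cell, a problem in one cell affects only the tenants assigned to it, and adding a cell adds capacity without touching the others.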

Benefits of Amazon MSK

Amazon MSK has been critical in enabling our event-driven design. In this section, we outline the main benefits we gained from its adoption.

Improved data resilience

In our new architecture, data from devices is pushed directly to Kafka topics in Amazon MSK, which provides high availability and resilience. This makes sure that events can be safely received and stored at any time. Our services consuming this data inherit the same resilience from Amazon MSK. If our backend ingestion services face disruptions, no event is lost, because Kafka retains all published messages. When our services resume, they continue processing from where they left off, thanks to Kafka’s delivery semantics, which allow processing messages exactly-once, at-least-once, or at-most-once based on application needs.
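
The resume behavior described above can be illustrated with a small in-memory model. This is a sketch rather than a Kafka client: `TopicLog` and `Consumer` are hypothetical stand-ins for a single partition and for a consumer that commits its offset only after processing succeeds, which gives at-least-once processing.

```python
class TopicLog:
    """Append-only list standing in for a single Kafka partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record


class Consumer:
    """Reads from a TopicLog and remembers the next offset to process."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # stored durably by Kafka; kept in memory here

    def poll_and_process(self, handler, max_records=10):
        batch = self.log.records[self.committed:self.committed + max_records]
        for value in batch:
            handler(value)  # an exception here leaves `committed` untouched
        # Commit only after the whole batch succeeded, so a crash causes the
        # batch to be re-read rather than lost (at-least-once).
        self.committed += len(batch)
        return len(batch)


log = TopicLog()
for event in ["e0", "e1", "e2"]:
    log.append(event)

processed = []
consumer = Consumer(log)
consumer.poll_and_process(processed.append, max_records=2)  # handles e0 and e1

log.append("e3")  # devices keep publishing while the service is down
consumer.poll_and_process(processed.append)  # resumes at e2, then e3
print(processed)
```

Committing before calling the handler would turn the same loop into at-most-once processing; exactly-once additionally relies on Kafka's idempotent producers and transactions, which this model does not cover.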

Amazon MSK lets us tailor the data retention duration to our specific requirements, ranging from seconds to unlimited. This flexibility gives our application uninterrupted data availability, which wasn’t possible with our previous architecture. Furthermore, to safeguard data integrity in the event of processing errors or corruption, Kafka enabled us to implement a data replay mechanism, ensuring data consistency and reliability.
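
As a rough illustration of per-topic retention and replay: `retention.ms` is a standard Kafka topic-level setting, and a value of `-1` removes the time limit. The topic names, the chosen durations, and the `retention_ms` and `replay` helpers below are assumptions made for this sketch.

```python
from datetime import timedelta
from typing import Optional


def retention_ms(duration: Optional[timedelta]) -> str:
    """Value for the Kafka topic config `retention.ms`; None means no time limit."""
    if duration is None:
        return "-1"
    return str(int(duration.total_seconds() * 1000))


# Hypothetical topics and retention choices.
topic_configs = {
    "device-heartbeats": {"retention.ms": retention_ms(timedelta(seconds=30))},
    "device-events": {"retention.ms": retention_ms(timedelta(days=7))},
    "audit-trail": {"retention.ms": retention_ms(None)},
}


def replay(records, from_offset, handler):
    """Re-run `handler` over retained records starting at `from_offset`."""
    replayed = 0
    for value in records[from_offset:]:
        handler(value)
        replayed += 1
    return replayed
```

Replay only works for records that are still within the retention window, so the retention setting effectively bounds how far back a processing error can be corrected.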

Organizational scaling

By adopting an event-driven architecture with Amazon MSK, we decomposed our monolithic application into loosely coupled, stateless microservices communicating asynchronously via Kafka topics. This approach enabled our engineering organization to scale rapidly from just 4–5 teams in 2019 to over 40 teams and approximately 350 engineers today.

The loose coupling between event publishers and subscribers empowered teams to focus on distinct domains, such as data ingestion, identification services, and data lakes. Teams could develop solutions independently within their domains, communicating through Kafka topics without tight coupling. This architecture accelerated feature development by minimizing the risk of new features impacting existing ones. Teams could efficiently consume events published by others, offering new capabilities more rapidly while reducing cross-team dependencies.

The following figure illustrates the workflow of adding new domains to our system.

Adding domains
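
The decoupling between publishers and subscribers can be modeled in memory. In this sketch, `Topic` is a hypothetical stand-in for a Kafka topic in which every consumer group tracks its own offset, so a new domain can start consuming without any change on the publishing side. The group names are illustrative.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a Kafka topic with per-group offsets."""

    def __init__(self):
        self.records = []
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, event):
        self.records.append(event)

    def consume(self, group):
        """Return the events this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        self.offsets[group] = len(self.records)
        return self.records[start:]


device_events = Topic()
device_events.publish({"device": "d1", "cpu": 0.93})
device_events.publish({"device": "d2", "cpu": 0.12})

alerts_batch = device_events.consume("alerts")        # existing domain
data_lake_batch = device_events.consume("data-lake")  # domain added later, same events
```

The publisher never learns which groups exist, which is what lets a new team add a consumer without coordinating a release with the team that owns the producer.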

Furthermore, the event-driven design allowed teams to build stateless services that could automatically scale based on MSK metrics like messages per second. This event-driven scalability eliminated the need for extensive capacity planning and manual scaling efforts, freeing up development time.
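
A scaling rule of this kind can be sketched as a pure function. The per-replica capacity and the replica bounds are assumptions made for illustration, and `desired_replicas` is a hypothetical helper rather than part of any autoscaler API.

```python
import math


def desired_replicas(messages_per_sec, per_replica_capacity,
                     min_replicas=1, max_replicas=50):
    """Replicas needed to keep up with the observed message rate, within bounds."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    needed = math.ceil(messages_per_sec / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))


# Assumed figures: 120,000 messages/s observed, 10,000 messages/s per replica.
print(desired_replicas(120_000, 10_000))
```

In practice the upper bound is usually capped at the topic's partition count, because consumers beyond that number in the same group would sit idle.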

By using an event-driven microservices architecture on Amazon MSK, we achieved organizational agility, enhanced scalability, and accelerated innovation while minimizing operational overhead.

Seamless infrastructure scaling

Nexthink’s business grew tenfold in 3 years, and many new capabilities were added to the product, leading to a substantial increase in traffic from 200 MB per second to 5 GB per second. This exponential data growth was enabled by the robust scalability of Amazon MSK. Achieving such scale with an on-premises solution would have been challenging and expensive, if not infeasible.

Attempting to self-manage Kafka imposed unnecessary operational overhead without providing business value. Running it with just 5% of today’s traffic was already complex and required two engineers. At today’s volumes, we estimated needing 6–10 dedicated staff, increasing costs and diverting resources away from core priorities.
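
The figures above can be put side by side with some quick arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes) and treats 5 GB per second as if it were sustained for a full day, which gives an upper bound rather than a measured daily volume.

```python
MB = 10**6  # decimal units, an assumption
GB = 10**9
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

throughput_before = 200 * MB  # bytes per second
throughput_after = 5 * GB     # bytes per second

growth_factor = throughput_after // throughput_before           # 25
daily_volume_tb = throughput_after * SECONDS_PER_DAY // TB      # 432
self_managed_throughput_mb = throughput_after * 5 // 100 // MB  # 250

# Naive linear extrapolation of staffing: 2 engineers at 5% of the traffic.
linear_staff_estimate = 2 * 100 // 5                            # 40
```

Traffic grew 25-fold, and the self-managed cluster was handling roughly 250 MB per second when it already required two engineers. The estimate of 6–10 staff at today's volume is well below the 40 that a strictly linear extrapolation would give, so it already assumes some economies of scale.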

Real-time capabilities

By channeling all our data through Amazon MSK, we enabled real-time processing of events. This unlocked capabilities like real-time alerts, event-driven triggers, and webhooks that were previously impossible. As such, Amazon MSK was instrumental in facilitating our event-driven architecture and powering impactful innovations.

Secure data access

Transitioning to our new architecture, we met our security and data integrity goals. With Kafka ACLs, we enforced strict access controls, allowing consumers and producers to only interact with authorized topics. We based these granular data access controls on criteria like data type, domain, and team.
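
The effect of such ACLs can be modeled with a deny-by-default check. This is a simplified sketch: real Kafka ACLs are evaluated by the broker's authorizer and support more operations and pattern types. The principals, topic prefixes, and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str     # for example "User:alerts-service"
    operation: str     # "READ" or "WRITE" in this simplified model
    topic_prefix: str  # grants apply to every topic starting with this prefix


# Hypothetical grants, organized by domain.
ACLS = (
    Acl("User:ingestion-service", "WRITE", "ingestion."),
    Acl("User:alerts-service", "READ", "ingestion."),
    Acl("User:alerts-service", "WRITE", "alerts."),
)


def is_allowed(principal, operation, topic, acls=ACLS):
    """Deny by default; allow only when some grant matches all three fields."""
    return any(
        acl.principal == principal
        and acl.operation == operation
        and topic.startswith(acl.topic_prefix)
        for acl in acls
    )


print(is_allowed("User:alerts-service", "READ", "ingestion.device-events"))
print(is_allowed("User:alerts-service", "WRITE", "ingestion.device-events"))
```

Prefix-based grants are one way to express access by domain or team without listing every topic individually.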

To securely scale decentralized management of topics, we introduced proprietary Kubernetes Custom Resource Definitions (CRDs). These CRDs enabled teams to independently manage their own topics, settings, and ACLs without compromising security.

Amazon MSK encryption made sure that the data remained encrypted at rest and in transit. We also introduced a Bring Your Own Key (BYOK) option, allowing application-level encryption with customer keys for all single-tenant and multi-tenant topics.

Enhanced observability

Amazon MSK gave us great visibility into our data flows. The out-of-the-box Amazon CloudWatch metrics let us see the amount and types of data flowing through each topic and cluster. This helped us quantify the usage of our product features by tracking data volumes at the topic level. The Amazon MSK operational metrics made it easy to monitor and right-size clusters and brokers. Overall, the rich observability of Amazon MSK facilitated data-driven decisions about architecture and product features.

Conclusion

Nexthink’s journey from an on-premises monolith to a cloud SaaS was streamlined by using Amazon MSK, a fully managed Kafka service. Amazon MSK allowed us to scale seamlessly while benefiting from enterprise-grade reliability and security. By offloading Kafka management to AWS, we could stay focused on our core business and innovate faster.

Going forward, we plan to further improve performance, costs, and scalability by adopting Amazon MSK features such as tiered storage and AWS Graviton-based EC2 instance types.

We are also working closely with the Amazon MSK team to prepare for upcoming service features. Rapidly adopting new capabilities will help us remain at the forefront of innovation while continuing to grow our business.

To learn more about how Nexthink uses AWS to serve its global customer base, explore the Nexthink on AWS case study. Additionally, discover other customer success stories with Amazon MSK by visiting the Amazon MSK blog category.


About the Authors

Moe Haidar is a principal engineer and special projects lead at the CTO office of Nexthink. He has been involved with AWS since 2018 and is a key contributor to the cloud transformation of the Nexthink platform to AWS. His focus is on product and technology incubation and architecture, but he also loves doing hands-on activities to keep his knowledge of technologies sharp and up to date. He still contributes heavily to the code base and loves to tackle complex problems.
Simone Pomata is a Senior Solutions Architect at AWS. He has worked enthusiastically in the tech industry for more than 10 years. At AWS, he helps customers succeed in building new technologies every day.
Magdalena Gargas is a Solutions Architect passionate about technology and solving customer challenges. At AWS, she works mostly with software companies, helping them innovate in the cloud. She participates in industry events, sharing insights and contributing to the advancement of the containerization field.
