Driving Resilience with Userguiding: Implementing AWS OpenSearch Disaster Recovery with DevOps Practices

Alperen Ruhbaş
Published in bestcloudforme
May 25, 2023

About the Customer

Userguiding is a dynamic SaaS company that specializes in providing user onboarding and product adoption solutions to businesses across various industries. With a strong focus on improving user experiences, Userguiding enables companies to create seamless and interactive onboarding experiences for their users. By offering intuitive product tours, personalized checklists, and contextual help widgets, Userguiding helps businesses drive user engagement, increase product adoption, and reduce churn.

The company’s user-centric approach and commitment to innovation have earned them a trusted reputation among their customers. Userguiding’s user onboarding platform empowers businesses to easily design and implement effective onboarding processes, ensuring that users have a smooth and successful journey from the very beginning. By providing step-by-step guidance, Userguiding helps users quickly familiarize themselves with new products, features, and workflows, ultimately leading to increased user satisfaction and loyalty.

Customer Challenge

AWS OpenSearch (Elasticsearch) is a core component of Userguiding’s system design, so any issue with the service can severely affect the whole system. To protect against this, a Disaster Recovery (DR) scenario was needed that meets the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) targets while keeping costs to a minimum. The DR solution also had to recover quickly without manual intervention and return the system to its desired state.

A second critical design constraint was the size and continuous growth of the data: over 10,000 indexes and 3 terabytes stored in AWS OpenSearch. The solution had to work with this large and expanding dataset. This meant breaking the structure down into smaller components, so that it could be moved between regions or scaled up to significantly larger scenarios while the system remained functional and operable.

Partner Solution

As the Bestcloudforme team, we designed the architecture around the following considerations:

The AWS OpenSearch (Elasticsearch) service plays a crucial role in the resilience and availability of the system. Given the potential impact of any issue with it, a robust Disaster Recovery (DR) scenario was needed that minimizes both the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO) while keeping costs to a minimum.

Just as important was making the design work at the size of the AWS OpenSearch domain: 3 terabytes of data and over 10,000 indexes. This was achieved by breaking the structure down into smaller components, which allows migration to a different region or scaling to larger deployments while maintaining operational capability.

The RPO was minimized by using the native snapshot capability of AWS OpenSearch to take frequent backups to S3: the shorter the interval between snapshots, the less data can be lost. Optimizing the RTO and the cost then became the central concern as the process evolved.
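The relationship between snapshot frequency and RPO can be illustrated with a small check. The following is a minimal sketch, not the project's actual code: it assumes the input is the JSON body returned by the snapshot listing API (`GET _snapshot/<repository>/_all`), and the function names and the one-hour default are illustrative choices. It measures the exposure from the snapshot's start time, on the assumption that data written after a snapshot starts is not guaranteed to be captured by it.

```python
from datetime import datetime, timedelta, timezone


def latest_successful_snapshot(listing):
    """Return the most recently started snapshot whose state is SUCCESS, or None."""
    done = [s for s in listing.get("snapshots", []) if s.get("state") == "SUCCESS"]
    return max(done, key=lambda s: s["start_time_in_millis"], default=None)


def data_loss_window(snapshot, now):
    """Time elapsed since the snapshot started, i.e. the data at risk right now."""
    started = datetime.fromtimestamp(
        snapshot["start_time_in_millis"] / 1000, tz=timezone.utc
    )
    return now - started


def within_rpo(listing, now, rpo=timedelta(hours=1)):
    """True if a usable snapshot exists and is no older than the RPO target."""
    snapshot = latest_successful_snapshot(listing)
    return snapshot is not None and data_loss_window(snapshot, now) <= rpo
```

A check like this could run on a schedule and raise an alert when the newest successful snapshot is older than the target, which would indicate that the snapshot job has stalled.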

To achieve the shortest possible RTO, keeping a second AWS OpenSearch cluster continuously active was considered first. However, because that cluster has to store all of the restored data, its nodes would need to be scaled up accordingly, which would increase costs significantly. To address this challenge, a balanced approach was adopted that prioritizes AWS serverless resources. Two key services were leveraged:

  • AWS Step Functions: A state machine built with Step Functions runs the process steps in sequence and handles potential errors along the way.
  • AWS Lambda: Lambda functions apply custom configuration adjustments, resolve potential errors early, dynamically retrieve up-to-date snapshot and index information from AWS OpenSearch, and initiate the restore process based on that information.
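To make the Lambda step more concrete, here is a minimal sketch of how a function might turn snapshot information into restore requests, splitting the indexes into smaller batches. It is an illustration under stated assumptions rather than the team's implementation: the event shape, the batch size of 500, the repository name, and the choice to skip system indexes (names starting with a dot) are all hypothetical. Only the restore endpoint (`POST _snapshot/<repository>/<snapshot>/_restore`) and its `indices` and `include_global_state` fields come from the snapshot API itself. Sending the signed HTTP requests to the domain is deliberately left out.

```python
import json


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_restore(snapshot, repository, batch_size=500):
    """Build one restore request per batch of indexes in a snapshot entry.

    `snapshot` is assumed to be one element of the "snapshots" array returned
    by the snapshot listing API, with "snapshot" (name) and "indices" keys.
    """
    # Assumption: system indexes such as .kibana are excluded from the restore.
    indices = sorted(i for i in snapshot["indices"] if not i.startswith("."))
    path = f"/_snapshot/{repository}/{snapshot['snapshot']}/_restore"
    return [
        {
            "method": "POST",
            "path": path,
            "body": json.dumps(
                {"indices": ",".join(batch), "include_global_state": False}
            ),
        }
        for batch in chunk(indices, batch_size)
    ]


def lambda_handler(event, context):
    """Hypothetical handler: the event carries the snapshot chosen earlier."""
    batches = plan_restore(
        event["snapshot"], event["repository"], event.get("batch_size", 500)
    )
    return {"batches": batches, "count": len(batches)}
```

With more than 10,000 indexes the returned list of batches can become large. Step Functions limits the payload passed between states to 256 KiB, so in practice the plan might need to be written to S3 and referenced by key instead of being returned inline.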

Because these serverless tools incur no cost while idle and keep execution times short, both the RTO and the costs could be optimized. In addition, the process can be started with a single click instead of keeping resources running continuously, which makes recovery fast and streamlined.

In the final testing phase, the architecture achieved an RPO of 1 hour and an RTO of 45 minutes for the dataset of over 10,000 indexes and 3 terabytes of data. Moreover, the solution incurs no costs when not in use, which further improves its cost-effectiveness.
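The way a state machine can combine sequential execution with error handling is sketched below as an Amazon States Language definition built in Python. This is an assumed layout, not the definition used in the project: the state names, the three Lambda functions, the retry settings, and the `$.batches` input path are hypothetical. The `Retry`, `Catch`, `Map`, and `MaxConcurrency` fields are standard Amazon States Language features.

```python
import json

# Retry any error up to three times with exponential backoff (values assumed).
RETRY = [{
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 30,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]
# When retries are exhausted, route to a terminal failure state.
CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "RestoreFailed"}]


def task(resource_arn, next_state):
    """A Lambda-backed Task state with shared retry and catch rules."""
    return {
        "Type": "Task",
        "Resource": resource_arn,
        "Retry": RETRY,
        "Catch": CATCH,
        "Next": next_state,
    }


def build_definition(arns):
    """Return the state machine definition as a dict (ARNs are placeholders)."""
    return {
        "Comment": "Sketch of a snapshot restore workflow",
        "StartAt": "FindLatestSnapshot",
        "States": {
            "FindLatestSnapshot": task(arns["find_snapshot"], "PlanRestore"),
            "PlanRestore": task(arns["plan_restore"], "RestoreBatches"),
            "RestoreBatches": {
                "Type": "Map",
                "ItemsPath": "$.batches",
                "MaxConcurrency": 1,  # process the batches one after another
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "RestoreBatch",
                    "States": {
                        "RestoreBatch": {
                            "Type": "Task",
                            "Resource": arns["restore_batch"],
                            "Retry": RETRY,
                            "End": True,
                        }
                    },
                },
                "Catch": CATCH,
                "Next": "RestoreSucceeded",
            },
            "RestoreSucceeded": {"Type": "Succeed"},
            "RestoreFailed": {
                "Type": "Fail",
                "Error": "RestoreFailed",
                "Cause": "A restore step failed after all retries",
            },
        },
    }


def to_json(arns):
    """Serialize the definition, e.g. for use when creating the state machine."""
    return json.dumps(build_definition(arns), indent=2)
```

Setting `MaxConcurrency` to 1 keeps the restore strictly sequential. Raising it would trade a lower RTO for more load on the target domain, and the right value would have to be found by testing.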

Results and Benefits

The implemented architecture yielded significant results and brought numerous benefits to the organization:

1. Enhanced Resilience
2. Cost Optimization
3. Efficient Snapshot Management
4. Streamlined Operations
5. Scalable Architecture

Overall, the implemented architecture delivered improved resilience, cost optimization, efficient snapshot management, streamlined operations, and scalability. The organization now has a reliable and cost-effective DR solution for its critical AWS OpenSearch infrastructure, ensuring minimal downtime, data integrity, and business continuity in the face of potential disruptions.

Thank you so much for taking the time to read this article. I hope you found the information helpful and valuable. If you have any questions, comments, or feedback, please feel free to reach out.

Contact E-Mail: hello@bestcloudfor.me
Project Team:

  • Ahmet Uçan
  • Görkem Aktaş
  • Alperen Ruhbaş
