Innovation doesn’t need to wait for a “unicorn event”

Vanguard Tech
Nov 15, 2023 · 10 min read

In many complex organizations, the road from innovation to execution presents a multifaceted challenge. While top-down initiatives from leadership are indispensable for shaping the future of the IT landscape, they often face hurdles at the operational level, such as urgent security vulnerabilities, infrastructure provisioning delays, or service downtime. Additionally, the rapid pace of technological change creates a race against time to deliver each innovation before moving on to the next one in the pipeline.

At Vanguard, we believe in the addition of a bottom-up approach to innovation to accelerate the delivery of our highest value objectives. Who is better equipped to tackle the day-to-day issues slowing down teams than those on the front lines?

In this article, we will walk through an example of one project that grew from a local idea to enterprise-wide adoption. By examining the key cultural factors that facilitated this growth, we aim to shed light on how a “two-way street” approach to innovation can unlock immense organizational potential.

Iteration 0: The origin

Vanguard has been building microservices in AWS using serverless technologies such as Lambda and Step Functions, and over time, our applications have become more distributed. While this transformation has increased development speed, one major challenge has been end-to-end testing. This was the case for our team, which was developing a modern trading application in AWS. Our application’s architecture was distributed and consisted of a Step Function orchestrating several Lambdas that communicated with external webservices (see Figure 1). Due to local environment constraints, we were not able to run these distributed tests on our own machines, and the continuous integration pipeline could only test each artifact in isolation. The only option left was to deploy resources to AWS and test against the data in the QA environments. This worked for a while, but it soon led to major issues:

  1. Data scarcity: Quality data from external services was not abundant in the QA region.
  2. Destructive tests: Running tests called APIs that had side effects, such as changing mock balances or holdings, which could affect other tests.
  3. Exact responses from dependencies: To test specific scenarios, such as what occurs when a trade executes for more than expected in the market, a specific set of responses from dependencies is needed. Given the QA region mimicked the market’s unpredictable price movement, it was nearly impossible to curate a specific scenario.
  4. Dependency stability: Webservice dependencies in the QA region frequently failed with unexpected issues such as timeout errors, 5xx status codes, or outages.

If any of these four issues occurred, a test failed. As you can begin to see in the figure below, the number of failure points grows with each dependency, so the odds of a clean run shrink with every external call. For illustration, if each dependency behaved correctly 90% of the time, a scenario touching five dependencies would complete cleanly only about 59% of the time (0.9⁵), and even a small test suite would spend most of its wall-clock time on reruns.

Figure 1: Possible failures when manually testing the application.

Iteration 1: A coupled dependency system

After delivery speed had decreased to painful levels, the team was determined to find a solution. We decided to build a tool to simulate our dependencies in-house rather than use an existing industry tool, leaving the door open for future customization. A more tenured member of the team coached us on how to advocate for funding. With a renewed sense of enthusiasm, we were able to gain support from our product owner and ultimately secured time to build it out.

Through this collaboration, we were able to highlight two key benefits of the new testing solution: an improved testing process and, consequently, better business outcomes. With that sprint time, we created a solution that proxied external webservice calls to a simulator that would return prefabricated responses back to the application as if it had called the real service. This accomplished two things: it removed the growing failure points, and it gave more flexibility to create any scenario, whether a success, a delayed response, or a specific failure response. Hence, we could spend the majority of our time focusing on testing just our application and leave the validation of external interactions to contract testing and targeted manual integration tests.

This marked the start of the first iteration of our dependency simulation service, colloquially named “Sanic,” an endearing twist on the blue hedgehog video-game character Sonic, known for his speed. At this stage, we were solely focused on tackling our team’s specific application and quickly assessing feasibility rather than being extensible to other applications. The figure below shows, at a high level, how the solution worked.

Figure 2: A coupled version of the dependency simulation application

How it works:

  1. Code the desired webservice’s mock responses into the Scenario Creator.
  2. Call the dependency simulation service to create mock response data in the application’s QA region database for the upcoming test scenarios.
  3. Invoke the application under test with the unique test scenario ID.
  4. The Lambdas use the new simulation logic to route outgoing requests to the simulator along with the scenario ID, which lets the dependency simulator know which fabricated response to return (a simplified sketch follows this list).
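To make step 4 concrete, the per-Lambda routing could look roughly like the TypeScript sketch below. The endpoint URLs, header name, and event shape are hypothetical stand-ins rather than the actual implementation; the idea is simply that the presence of a scenario ID redirects the outbound call to the simulator.

```typescript
// Hypothetical sketch of the iteration-1 routing logic inside a Lambda.
// URLs, the header name, and the event shape are illustrative only.

const SIMULATOR_URL = process.env.SIMULATOR_URL ?? "https://dependency-simulator.qa.example.internal";
const REAL_SERVICE_URL = "https://pricing-service.qa.example.internal";

interface TradeEvent {
  orderId: string;
  scenarioId?: string; // present only when the invocation is part of a simulated test run
}

export async function getQuote(event: TradeEvent, symbol: string): Promise<unknown> {
  // When a scenario ID accompanies the event, send the request to the simulator,
  // which returns the prefabricated response stored for that scenario.
  const baseUrl = event.scenarioId ? SIMULATOR_URL : REAL_SERVICE_URL;

  const response = await fetch(`${baseUrl}/quotes/${symbol}`, {
    headers: event.scenarioId ? { "x-scenario-id": event.scenarioId } : {},
  });

  if (!response.ok) {
    throw new Error(`Quote lookup failed with status ${response.status}`);
  }
  return response.json();
}
```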

The first iteration was simple. It was tightly coupled to the service, utilized the service’s database, and required custom code for each Lambda to decide when to call the dependency simulator. Response types were limited to only those necessary for our application. It made assumptions about how the system stored its data and how messages were passed from Lambda to Lambda inside the Step Function. When a new dependency was introduced, code changes to the simulation service were required.

While the first iteration left room for improvement, Sanic achieved its objectives and was considered a success. Tests that took days to weeks to accomplish in an unstable QA region could now be completed in minutes. The team was happier, more confident, and more streamlined in deploying changes. The test suites were now more stable and could be run almost instantly, which enabled quick feedback that the core logical flows of the distributed service were operating as expected.

Iteration 2: Generalized dependency simulation

Later, a new application with similar architecture experienced the same challenges in the QA region. Our product owner, along with help from leadership, communicated our success and identified an opportunity for our solution to be adopted not only by this other team, but also for broader use across any application. Given the proven success of the initial implementation, leadership felt confident enough to allocate dedicated sprint time to expand and generalize the service.

Enter “Knackles,” a tool similarly named after the video-game character “Knuckles” and containing several critical improvements:

Improvement 1: Make it a standalone tool.

Dependency simulator logic and storage were moved into a centralized, shared service and were no longer tightly coupled to any one application. Any new service could adopt dependency simulation with no infrastructure provisioning and minimal setup.

Improvement 2: Minimize change required in the application under test.

All logic to decide whether to call real services or the simulator was moved into libraries. Because of variations in languages, frameworks, and architectures (event-driven vs. request/response), we created a set of libraries to accommodate each combination. For example, NestJS services had the dependency logic abstracted via configuration, while event-driven systems that relied on AWS integrations required a bit more work to wrap the SDKs so the scenario ID could be passed along to the dependency simulation service.
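For the event-driven case, the SDK wrapping described above might look something like the sketch below, where the scenario ID travels with the message as an attribute so downstream consumers can keep routing to the simulator. The wrapper name and attribute name are assumptions for illustration, not the actual library’s API.

```typescript
// Illustrative sketch of an SDK wrapper for event-driven services.
// The wrapper signature and the "scenarioId" attribute name are hypothetical.
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

export async function sendWithScenario(
  queueUrl: string,
  body: object,
  scenarioId?: string,
): Promise<void> {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: queueUrl,
      MessageBody: JSON.stringify(body),
      // Attach the scenario ID so the consuming service knows to call the
      // dependency simulator instead of the real downstream dependency.
      MessageAttributes: scenarioId
        ? { scenarioId: { DataType: "String", StringValue: scenarioId } }
        : undefined,
    }),
  );
}
```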

Improvement 3: Generalize how scenarios are created.

Previously, a code change was required to create a new scenario. This shifted to an API-based approach where a user could submit the scenario they wanted to create directly to the system.

Improvement 4: Reduce a scenario’s reliance on the exact order of external calls.

In the first iteration, one had to know that a call to service A happened first, and a call to service B happened after. Now, one just needed to provide a map of expected external calls and the desired mock responses, regardless of order (see Figure 3.1).

This resulted in a more flexible system as shown below.

Figure 3: A generalized version of the dependency simulation application

How it works:

  1. Call the simulation creator API with the expected requests and associated mock responses and receive a unique scenario ID (see the sketch after Figure 3.1).
  2. Invoke the process with the scenario ID (implementation differs based on the type of service).
  3. The process routes outgoing requests to the dependency simulation tool along with the scenario ID and gets the simulated responses back.
  4. The scenario data expires and is cleaned up automatically by DynamoDB using a TTL (time-to-live) attribute.

Figure 3.1: A simplified comparison of inputting mock responses in iterations 1 and 2
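To make the comparison concrete, creating an iteration-2 scenario might look roughly like the following sketch. The endpoint path, payload fields, and dependency names are illustrative assumptions; the real API shape may differ.

```typescript
// Hypothetical example of creating a test scenario through the scenario creator API.
// The URL, payload structure, and response shape are illustrative only.
const SIMULATOR_API = process.env.SIMULATOR_API ?? "https://dependency-simulator.qa.example.internal";

export async function createScenario(): Promise<string> {
  const response = await fetch(`${SIMULATOR_API}/scenarios`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      // A map of expected external calls to mock responses; order does not matter.
      mocks: {
        "pricing-service:GET /quotes/VTI": { status: 200, body: { price: 231.17 } },
        "order-service:POST /orders": { status: 504, body: { message: "gateway timeout" } },
      },
      // Scenario data expires automatically via a DynamoDB TTL attribute.
      ttlSeconds: 3600,
    }),
  });

  const { scenarioId } = (await response.json()) as { scenarioId: string };
  return scenarioId; // pass this ID when invoking the application under test
}
```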

Knackles was also a success for the new team, and it was now generalized for others to use.

Through socialization of the tool and its accomplishments, a sense of enthusiasm was cultivated among developers. Even outside the team, developers volunteered their time to help implement libraries and features. Leadership also began to identify additional use cases for the tool.

Iteration 3: Performance testing

Our division initiated a top-down directive to bring application resiliency to the forefront through increased performance testing. This would bring potential weaknesses to light and avoid unwanted surprises in production. Not surprisingly, the QA region was again complicating these efforts: the large load exacerbated existing limitations and led to disruptions that affected other teams’ tests. With dependency simulation, we could run these tests in a prod-like environment without impacting outside applications. Moreover, it offered the flexibility to design specific load scenarios, delays, and timeouts at scale to assess how applications would respond to unique constraints.

Now fully aligned on the value-adding potential of dependency simulation, divisional leaders appointed a new team to build a performance testing framework, named “Taals,” around the tool. Through partnership with us and Vanguard’s central performance testing platform team, this new team implemented caching and moved the platform from Lambda to ECS to lower cost at high loads. We also added a feature to change at run time which subset of external application calls would route to the simulator. This last feature enabled consumers to pinpoint specific external applications to include in the performance test without risking an outage on applications that weren’t ready.
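As a rough illustration of that run-time routing toggle, the library could consult a per-test allow-list before each outbound call, as in the sketch below. The environment variable, dependency names, and function signature are assumptions for illustration rather than the actual implementation.

```typescript
// Hypothetical sketch of per-dependency routing during a performance test.
// The SIMULATED_DEPENDENCIES variable and dependency names are illustrative only,
// e.g. SIMULATED_DEPENDENCIES="pricing-service,order-service"

const simulated = new Set(
  (process.env.SIMULATED_DEPENDENCIES ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean),
);

export function resolveBaseUrl(dependency: string, realUrl: string, simulatorUrl: string): string {
  // Dependencies on the allow-list are routed to the simulator at load-test volumes;
  // everything else continues to call the real service.
  return simulated.has(dependency) ? simulatorUrl : realUrl;
}
```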

Soon, this was integrated into an enterprise-wide performance testing strategy from the Resiliency Architecture Office, and both Taals and Knackles were supported by dedicated teams.

What started as a simple tool for one team became a tool for a division and soon grew into an enterprise solution. Every company experiences constraints in its work environment; ours were local testing limitations and QA instability, and they led us to this innovation. The constraints were captured by the engineers who dealt with them daily. With the issues identified and a climate created for innovation, a local idea became something much greater.

The secret sauce

What innovation enablers made this project a success, and what changes has Vanguard made to lay the groundwork for our full innovative potential?

Developer mentality

Project enablers: Throughout each stage of the dependency simulation project, developers were self-motivated to pursue excellence and challenge the status quo.

Examples of Vanguard’s groundwork: Foster an ownership mindset in developers by promoting active participation in application design and technical requirements gathering. Assign thought-leadership responsibilities, such as being the subject-matter expert on a new feature and presenting and defending decisions at technical forums.

Support

Project enablers: Leaders within the team mentored and coached the developers on how to prove the worth of their idea. Additionally, leaders outside the team used their professional network to bubble the idea up the chain of command, vetting the value and cascading it through to higher levels.

Examples of Vanguard’s groundwork: Create a process where innovative ideas from anywhere can be properly evaluated and gain support. During a regularly occurring event, submitted ideas are evaluated by localized technical and business leadership before being bubbled up to higher levels of leadership, closing the gap between technical know-how and business value.

Time allocation

Project enablers: Teams involved in the project used dedicated innovation time to test out their ideas and were later given dedicated sprint time to implement a full-scale solution.

Examples of Vanguard’s groundwork: Regularly reserve meeting-free time at a department level that can optionally be used for personal development and innovation. This offers stigma-free focus time that developers have used to study for certifications, or ideate on how to improve their own processes, increasing Vanguard’s level of engineering excellence.

Conclusion

In sharing this experience at Vanguard, we hope to have inspired you to be solution-minded when it comes to bottom-up innovation. We’ve learned that innovation doesn’t have to be a “unicorn event,” but rather something that can be developed through a carefully curated environment. By cultivating developers’ ownership mentality, providing mentorship and structure for innovations, and protecting time allocation for the most value-added ideas, organizations can complement their innovation strategy through strategic partnership between developer teams and high-level leadership. Let’s aspire to create a future where everyone can participate as a thought leader and no good idea is left unexplored.

Come work with us!
Vanguard’s technologists design, architect, and build modernized cloud-based applications to deliver world-class experiences to 50 million investors worldwide. Hear more about our tech — and the crew behind it — at vanguardjobs.com.
