
Kelley Blue Book's Journey to Microservices

Vincit Dev Talks
March 2nd 2020
Vincit Dev Talks is a quarterly tech meetup held in Southern California. It’s designed to bring directors down from their towers and developers out from their cubicles. We feature speeches on a variety of topics ranging from marketing and business development to software engineering. Talks will be brief but informative, leaving time to chat and enjoy the free food and drinks. Check out vincitdevtalks.com for upcoming events.
In this talk from our February 2020 event, developers Himaja Suman and Anika Corpus from Cox Automotive take us through Kelley Blue Book's decision to implement microservices and how it improved the development process for their engineers.
Himaja: Thank you.
Anika: Hi, my name's Anika Corpus. I am a user interface engineer at Kelley Blue Book. I've been working there for almost three years. When I first started, I worked on a microservice, then went back to the monolith, and then worked on more microservices. I quickly realized how different those two developer experiences were.
Himaja: Hi, I'm Himaja. I'm a senior UI engineer at Cox Automotive. I've been working with Cox for over two years, and I've been involved in the whole process of moving KBB.com from a monolith to microservices. Currently I'm working on a complete migration and look-and-feel refresh of the owner's application, where you can find the value of your car. So, today, we are here to share our journey to microservices.
Anika: So, with a show of hands, who's actually heard of KBB.com before this talk? Okay, pretty good actually. Wasn't expecting that. Well, KBB.com is a site where you can get car values when you're either buying or selling your car. You can also get expert reviews and an instant cash offer, and today you can even get a service and repair estimate or schedule your next appointment. To give you an idea of how big KBB.com is, we get about 400 visits per minute. That's about 24,000 visits per hour, which is pretty huge. In a month, we get about 18.5 million unique visitors. I have a clicker, I don't know why I'm using it. And about 408 million vehicles provided last year alone.
Anika: Well, why should we move from the monolith to the microservices architecture? Up until two years ago, we were built on a traditional n-tier MVC architecture spanning two on-premises data centers. Can you imagine if one of those two data centers went down?
Anika: We have over a million Google-indexed pages and a number of different application funnels. It was very difficult to keep the user experience consistent from page to page. We have over 17,000 files on GitHub, and yet we only deployed twice a week. Onboarding was very slow: it could take a developer almost two days to set up their dev machine or local environment. We needed to move faster without any downtime.
Anika: We wanted to spend our time improving our architecture and evolving the technologies we used. We wanted to improve agility, deliver products with a faster turnaround time, and provide better value for our customers.
Himaja: And this is what our monolith architecture looked like. It was built using ASP.NET MVC. We had sections of our website divided into modules, where each module could have been an application in itself. As our product evolved, we kept adding to these modules, and the modules were differentiated only by folders in one huge code repository. Though MVC gives you separation of concerns, due to the way these modules were set up, there was a lot of code being reused between them.
Himaja: There were no clear application boundaries, and the contracts between these services and applications weren't clearly defined or managed. Simply put, it was a big ball of mud. A minor edit could take down the complete website, which directly affects our users and therefore our revenue. Business logic was often duplicated and accidentally different, which caused data discrepancies, and we ended up putting lots of hacks in our code. Scaling the infrastructure and technology became very cumbersome, scaling the team created overhead that slowed down the pace of development, and it was difficult to introduce and use new technologies and patterns.
Himaja: A simple decision to use ES6 code on our front end caused several issues, and we ended up creating a separate folder at the root level just for ES6 code. Our builds and deployments were huge and had to be very carefully coordinated, with a lot of manual process. We had a twice-per-week deployment cadence, where it would take from two days to a week for a small bug fix or minor feature to go live.
Himaja: These build and deployment times were slow due to the manual processes. We had scrum teams to support agility and independence, but the application and code were still very tightly coupled, and it was challenging for teams to move faster, be accountable, and monitor the performance of the components they had built.
Anika: So, you've seen what the monolith looked like, but how do we migrate that into smaller, more consumable pieces? The first step was a lift and shift. We essentially took the monolith, which was in one GitHub repository, and moved it into the cloud. We also moved all of our external services to the cloud. The next task was how to strangle the monolith in a way that satisfied not only engineering needs but also product alignment. Our answer was domain-driven design.
Anika: Domain-driven design is the concept where you take business goals from a user's perspective, and in that sense we identified nine bounded contexts. You can describe a bounded context as an intent a user has on the site. For example, if I'm a user looking to sell my car, then since I own a car, I fall under the owner's bounded context. Another example: if I'm a user looking to buy a car but I don't know what car I want, I fall under the research bounded context.
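To make the idea of "a bounded context is a user intent" concrete, here is a small hypothetical sketch that classifies an intent into a context. Only the two contexts described in the talk appear; the other seven are not named, so unmatched intents return `undefined`. The `UserIntent` shape and its field names are invented for illustration.

```typescript
// Hypothetical sketch: the talk names nine bounded contexts but only describes
// two of them, so only "owners" and "research" appear here.
type BoundedContext = "owners" | "research";

// A user's intent, phrased from the user's perspective rather than by technology.
interface UserIntent {
  goal: "sell" | "buy";
  knowsWhichCar: boolean; // only meaningful when goal === "buy"
}

// Map an intent to the bounded context that owns it, or undefined when it
// belongs to one of the seven contexts not described in the talk.
function contextFor(intent: UserIntent): BoundedContext | undefined {
  if (intent.goal === "sell") {
    return "owners"; // "I own a car and I'm looking to sell it"
  }
  if (!intent.knowsWhichCar) {
    return "research"; // "I want to buy a car but don't know which one"
  }
  return undefined;
}
```

The point of the sketch is that the classification is driven by what the user is trying to do, not by which pages, services, or tables happen to be involved.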
Anika: After defining each bounded context, we worked closely with UX teams to create a component library. The purpose of the component library was to maintain a consistent look and feel across all of the bounded contexts and microservices without having to recreate each piece every single time.
Anika: From there, we determined service level agreements and objectives between dependent services. We automated our builds and deployments and embedded the people from those previous roles into each team as a resource for support. Each scrum team is now responsible for the build, deployment, logging, and monitoring of its own bounded contexts.
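The talk does not say how KBB expresses its service level objectives. One common way to make an availability objective concrete is an "error budget": the downtime the target allows over a window. A minimal sketch, assuming a hypothetical `Slo` shape and example targets that are not KBB's actual numbers:

```typescript
// Minimal sketch; the Slo shape and the example targets are hypothetical.
interface Slo {
  name: string;
  availabilityTarget: number; // e.g. 0.999 means 99.9% available
  windowDays: number;         // rolling window the target applies to
}

// Minutes of unavailability the objective tolerates over its window.
function errorBudgetMinutes(slo: Slo): number {
  const windowMinutes = slo.windowDays * 24 * 60;
  return (1 - slo.availabilityTarget) * windowMinutes;
}

// Fraction of the budget left after `downMinutes` of observed downtime,
// clamped at 0 once the budget is exhausted.
function budgetRemaining(slo: Slo, downMinutes: number): number {
  const budget = errorBudgetMinutes(slo);
  return Math.max(0, (budget - downMinutes) / budget);
}
```

With a 99.9% target over 30 days, the budget is 0.001 × 43,200 minutes, or about 43 minutes of downtime, which gives a dependent service a concrete number to plan around.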
Himaja: So, this is what our microservice architecture looks like. As Anika mentioned earlier, we decomposed KBB.com into nine bounded contexts, where each bounded context focuses on a single business task that is meaningful to users, not on technology or infrastructure. We aligned our infrastructure to this principle by creating prod and non-prod accounts in AWS for each bounded context. Each bounded context application can have several microservice apps, where each microservice app handles a single, well-defined entity with its own business logic and data layer.
Himaja: Now the question is how big a microservice should be, and whether all the microservices in a bounded context should be in one big code repository or in multiple repositories. We handled all these questions by creating a KBB blueprint that provides guidance and clarity on what a microservice should be.
Himaja: We started strangling KBB.com by moving our lowest-traffic bounded context into a microservice. We experimented with technologies, monitored performance, and came up with a baseline tech stack and some best practices. With all these learnings, we created the KBB blueprint, which provides guiding principles for creating these microservices. It also provides templates to make this migration faster.
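The talk doesn't describe the routing mechanics, but strangling a monolith is commonly done with a routing layer in front of it (the "strangler fig" pattern): requests for a migrated bounded context go to its new app, and everything else still reaches the monolith. A hypothetical sketch; the origins and the `/example-context` prefix are placeholders, not KBB's real hosts or URLs, and say nothing about which context was migrated first.

```typescript
// Hypothetical sketch of strangler-fig routing. Origins and the path prefix are
// invented placeholders, not KBB's real hosts or URL structure.
const MONOLITH_ORIGIN = "https://monolith.example.internal";

// Bounded contexts migrated so far, as [path prefix, origin] pairs.
// Migrating another context means adding one entry here.
const migratedContexts: Array<[string, string]> = [
  ["/example-context", "https://example-context.example.internal"],
];

// Decide which origin should serve a request path.
function originFor(path: string): string {
  for (const [prefix, origin] of migratedContexts) {
    // Match the prefix itself or a whole path segment under it, so that
    // "/example-contextual" is not routed to "/example-context".
    if (path === prefix || path.startsWith(prefix + "/")) {
      return origin;
    }
  }
  return MONOLITH_ORIGIN; // anything not yet migrated stays on the monolith
}
```

Starting with a low-traffic context keeps the blast radius small while the routing table, tooling, and monitoring are proven out; the monolith shrinks one entry at a time.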
Himaja: These microservice apps can live in separate repositories or in one big monorepo, but each must be buildable separately and deployable independently. We applied this pattern to our front end as well and created our component library to ensure consistency in UI and UX across these bounded context apps. As we progressed, we identified some services, like location detection or feature flag management, that are needed across these bounded context apps, and we moved them to a separate AWS account and started consuming them as external services.
Himaja: So, this helped us establish clear dependencies between apps and define the service level agreements and objectives. The scrum teams are now each responsible for one single bounded context app, where they can focus on the build, measure, learn cycle.
Anika: So, you've seen the bigger picture of the microservices and how they talk to each other. But what does a single microservice look like? We have several layers: a UI layer, a BFF layer (I'll get to that later), a cache, a REST API, a database, and an infrastructure layer. For our UI layer, we use a combination of React and Emotion. Emotion is an open source CSS-in-JS library.
Anika: For our BFF layer, which is the back end for your front end, we use a combination of GraphQL and Apollo. We also use Node.js for server-side rendering to meet SEO requirements and improve performance. All of this is deployed to Elastic Beanstalk, an AWS service that we use to deploy and scale our web applications.
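A BFF's job is to turn several backend calls into the one shape a page needs. A plain-TypeScript sketch of that aggregation, assuming two hypothetical backend services (`VehicleService`, `ValuationService`) with invented fields; in the stack described above this logic would sit in a GraphQL resolver served by Apollo, whose API is deliberately not reproduced here.

```typescript
// Hypothetical downstream services the BFF talks to; shapes are invented.
interface VehicleService {
  getVehicle(id: string): Promise<{ id: string; make: string; model: string; year: number }>;
}
interface ValuationService {
  getValue(id: string): Promise<{ vehicleId: string; amountUsd: number }>;
}

// The view model the page component needs: one object, no extra fields.
interface VehicleCard {
  title: string;
  valueUsd: number;
}

// BFF-style resolver: fan out to both services in parallel, then shape the result
// so the browser makes one request instead of two.
async function resolveVehicleCard(
  id: string,
  vehicles: VehicleService,
  valuations: ValuationService,
): Promise<VehicleCard> {
  const [vehicle, value] = await Promise.all([
    vehicles.getVehicle(id),
    valuations.getValue(id),
  ]);
  return {
    title: `${vehicle.year} ${vehicle.make} ${vehicle.model}`,
    valueUsd: value.amountUsd,
  };
}
```

Passing the services in as parameters keeps the resolver testable with stubs, which is also how the check below exercises it.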
Anika: To minimize our BFF calls, we also use a cache layer, which is deployed to AWS ElastiCache. If complex data manipulation or business logic is needed, we put it in the web API, which is built on .NET Core and deployed to AWS Lambda. And if you really need a database, we use Amazon S3 or Amazon Aurora. We also use Amazon S3 to save the results of our web API calls as JSON files, so you can access them directly instead of making those calls. And for infrastructure as code, to create and destroy our Amazon services, we use Terraform.
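A cache placed in front of backend calls usually follows the cache-aside pattern: read from the cache, and only call the backend on a miss or an expired entry. A minimal sketch, assuming an in-memory `Map` in place of ElastiCache and a hypothetical `load` callback in place of the web API call; the TTL and the injectable clock are illustrative choices, not details from the talk.

```typescript
// Minimal cache-aside sketch. A Map stands in for ElastiCache and `load`
// stands in for the web API call; both are assumptions for illustration.
class CacheAside<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly load: (key: string) => Promise<V>,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  async get(key: string): Promise<V> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // fresh entry: skip the backend call
    }
    const value = await this.load(key); // miss or expired: go to the source
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Saving precomputed API responses as JSON files in S3 is the same trade-off taken further: readers hit a static copy and the expensive call happens ahead of time rather than on demand.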
Himaja: So how do we know whether we are prepared to run 100% of our traffic on microservices? Well, this migration made us revisit our automated testing strategy, so we added automated tests at every layer of our microservice apps. We use Jest and Enzyme to test our front end. We automated our REST API tests using Postman, and we run end-to-end and integration tests with Puppeteer.
Himaja: We also do load testing with BlazeMeter, where we see how our app reacts under high load and high traffic. Traditionally, scrum teams were only responsible for product development and delivery, not for operations or infrastructure, but "you build it, you run it" made them responsible for the complete stack. So, to guard ourselves against failures, we implemented extensive logging, monitoring, and alerting in our applications. We enabled logging and monitoring for all our AWS services, and we set up dashboards and alerts using CloudWatch.
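Alerts are only useful if a single noisy datapoint doesn't page someone. CloudWatch alarms address this with an "M out of N" setting (datapoints to alarm out of evaluation periods) and report states such as `OK`, `ALARM`, and `INSUFFICIENT_DATA`. The following is a simplified sketch of that evaluation idea, not CloudWatch's implementation: it ignores missing-data treatment, and the `AlarmRule` shape and thresholds are hypothetical.

```typescript
// Simplified "M out of N" alarm evaluation, similar in spirit to CloudWatch's
// datapoints-to-alarm setting. Rule shape and thresholds are hypothetical.
interface AlarmRule {
  threshold: number;         // a datapoint breaches when it exceeds this value
  evaluationPeriods: number; // N: how many recent datapoints to look at
  datapointsToAlarm: number; // M: how many of those must breach
}

type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

function evaluate(rule: AlarmRule, datapoints: number[]): AlarmState {
  if (datapoints.length < rule.evaluationPeriods) {
    return "INSUFFICIENT_DATA";
  }
  // Only the most recent N datapoints count.
  const recent = datapoints.slice(-rule.evaluationPeriods);
  const breaching = recent.filter((d) => d > rule.threshold).length;
  return breaching >= rule.datapointsToAlarm ? "ALARM" : "OK";
}
```

For example, an error-rate rule with a 5% threshold and "3 out of 5" fires on a sustained problem but stays quiet through one or two isolated spikes.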
Himaja: We also track the performance of our application in New Relic, using metrics like throughput, network latency, CPU usage, memory usage, error rates, and so on. We could have done the same using CloudWatch, but for our team in particular, New Relic is just easier to use. We also set up New Relic Synthetics, which notify us when our app is down or when alert conditions are triggered.
Himaja: We often conduct game days, where we test the resilience of our application. We simulate failure scenarios on our applications and compare the results with our hypotheses. These game days helped us find the cracks in our code, logging, and monitoring. We also run fire drills, which implement game days slightly differently, to test the resilience of our operations as well.
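A game day boils down to "inject a failure, then check the hypothesis about how the system behaves". The talk doesn't say what tooling KBB uses; the sketch below assumes a hypothetical `withFaultInjection` wrapper that makes a configurable fraction of calls fail, plus a hypothetical caller whose hypothesis is "fall back to a cached value when the dependency fails".

```typescript
// Game-day sketch: wrap a dependency call so a configurable fraction of calls
// fail. Wrapper name, error message, and fallback behavior are hypothetical.
function withFaultInjection<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  failureRate: number,                 // 0 = never fail, 1 = always fail
  random: () => number = Math.random,  // injectable for repeatable drills
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    if (random() < failureRate) {
      throw new Error("injected fault (game day)");
    }
    return fn(...args);
  };
}

// Caller under test. Hypothesis: it serves a cached value when the dependency fails.
async function getValueOrFallback(
  fetchValue: () => Promise<number>,
  cached: number,
): Promise<number> {
  try {
    return await fetchValue();
  } catch {
    return cached;
  }
}
```

If a drill shows the caller surfacing an error instead of the fallback, or the failure never shows up on a dashboard or alert, that is exactly the kind of crack in code, logging, or monitoring a game day is meant to expose.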
Himaja: These game days helped us prepare for failure scenarios. And finally, we have PagerDuty set up, where every member of the team is in the on-call rotation. Fortunately, we aren't on call this week, but if we were, we might get a call right now. Remember, with great power comes great responsibility. You can clap for that. Thanks.
Anika: Okay. You've seen all the exciting technology that we use, you've seen our architecture, and you've seen how we test, but how do we actually deploy into production?
Anika: We created an in-house tool that is basically a hook on your GitHub source repository: it triggers a Lambda that uploads a configuration file to AWS CodeBuild. Within AWS CodeBuild, we install dependencies, compile code, run unit tests, and start your app up. It then deploys your application to your inactive pool and runs integration tests, and only when all of the tests pass does it switch from your inactive pool to your active pool. And there you have it: you're live at 100% with just one click of a button. Besides going to production, we also use the CI/CD pipeline for testing feature branches. It's good for early integration testing and also for UAT. We also use our tool, Git to Code Build, to destroy the infrastructure for that branch.
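The inactive/active pool switch described above is a blue/green deployment. A hypothetical sketch of just the decision logic, assuming the two pools are called `blue` and `green` and that `deploy` and `runTests` are callbacks standing in for the in-house Git to Code Build and CodeBuild steps, whose real interfaces are not described in the talk.

```typescript
// Blue/green sketch: deploy to the inactive pool, test it, and only switch
// traffic if every test passes. Pool names and callbacks are hypothetical
// stand-ins for the in-house tooling.
type Pool = "blue" | "green";

interface Deployment {
  active: Pool; // the pool currently serving production traffic
}

const otherPool = (p: Pool): Pool => (p === "blue" ? "green" : "blue");

async function blueGreenDeploy(
  current: Deployment,
  deploy: (pool: Pool) => Promise<void>,
  runTests: (pool: Pool) => Promise<boolean>,
): Promise<Deployment> {
  const target = otherPool(current.active);
  await deploy(target); // the new build goes to the pool serving no traffic
  const passed = await runTests(target);
  // Switch only on success; on failure production keeps the old active pool.
  return passed ? { active: target } : current;
}
```

Because users never touch the inactive pool until the switch, a failed integration run costs nothing but a rebuild, and rolling back is just switching the pools again.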
Himaja: So, how are these microservices doing? Well, this process is still a work in progress. 40% of our code is still on the monolith. Because of several reorgs, business commitments, and smaller team sizes, we weren't able to move faster. Having said that, we saw significant improvements in our application and page performance just by modernizing our tech stack. It also eliminated the deep coupling between these modules, and our applications are now fault-tolerant, resilient, and highly available.
Himaja: Recently, our monolith went down, and it took eight hours to get it back up and running. In contrast, when one of our bounded context apps went down, it took about 15 to 20 minutes to get it back up and running. This alone showed us how beneficial this whole migration process is. And now, with microservices, it is easier to test new technologies and adopt new patterns.
Himaja: We were able to work on the complete look-and-feel refresh of our website while migrating to microservices, and it also improved agility. The teams are now more focused and accountable because they're only responsible for the things they built. Our push-button CI/CD helped us move faster and deliver products faster. Now we spend more time building robust applications and delivering value to users. We also had significant cost savings by adopting cloud-native and serverless architectures and solutions. And onboarding new employees has become much easier, because they don't have to learn the whole application; they can stay focused on one area.
Anika: So you've seen some of the benefits of the microservices architecture, but what are some of the challenges we face? Now that everything is an individual service, one of our struggles is global updates, coordination, and maintenance. For example, this past quarter my team worked on a header redesign, so we had to support every single microservice that we had. Some teams hadn't kept up with all of the package updates out there, such as React, Apollo, and our in-house KBB.com packages, so all of those changes had to be accounted for within one single branch.
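Before a cross-cutting change like a shared header, it can help to know which services lag behind on a shared dependency. A hypothetical sketch that reads simplified manifests (service name plus a `package.json`-style dependency map) and lists services below a target major version. The manifest shape, service names, and versions are invented, and the regex only pulls the leading major number; full semver range handling is out of scope here.

```typescript
// Hypothetical dependency-drift check across services. Manifest shape,
// service names, and versions are invented for illustration.
type Manifest = { service: string; dependencies: Record<string, string> };

// Extract the major version from strings like "16.13.1", "^16.8.0", or "~3.1.0".
// Simplified: does not understand full semver ranges.
function majorOf(range: string): number | undefined {
  const match = /(\d+)\./.exec(range);
  return match ? Number(match[1]) : undefined;
}

// Services that use `pkg` at a major version below `targetMajor`.
function servicesBehind(manifests: Manifest[], pkg: string, targetMajor: number): string[] {
  return manifests
    .filter((m) => {
      const version = m.dependencies[pkg];
      if (version === undefined) return false; // service doesn't use the package
      const major = majorOf(version);
      return major !== undefined && major < targetMajor;
    })
    .map((m) => m.service);
}
```

Running a check like this ahead of time turns "some teams didn't keep up" into a concrete list of upgrades to schedule before the shared change lands.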
Anika: We also don't have a team dedicated to building our shared tools. Most of the tools, like Git to Code Build, were built on the side to help other teams. On top of that, a lot of things are abstracted away or happening under the hood, which makes them difficult to debug. Another challenge for me personally was this whole "you build it, you run it" concept. Before we moved to the cloud, I literally just pushed my UI code and I was done. But now, our team is in charge of the app's full life cycle: building, deploying, monitoring, and on-call duties.
Anika: Depending on your team, you may have up to four bounded contexts and multiple apps to manage. On top of that, if someone on your team built a [inaudible 00:18:56], they're also responsible for maintaining it and adding more features to the app. Also, as we move to this new architecture, stakeholders need to keep up as well. Stakeholders such as marketing, analytics, and sales may need to change some of their processes to keep up with this new [inaudible 00:19:15].
Himaja: So, today we showed you the challenges we had with our monolith and walked you through the step-by-step process of how we slowly moved toward microservices. We also broke down our evolving microservice architecture and tech stack. We told you how we prepared to run 100% of our traffic on microservices. We walked you through our CI/CD process. We also shared the current state we are in and the challenges we are currently facing.
Anika: Here are some things you can take away from this talk. Be sure to automate what you can. Whether it's infrastructure as code or your testing strategy, automate wherever you can. Establish a CI/CD pipeline; it will save you time and a lot of effort. Dashboards are your friend: set up logging, monitoring, and automated dashboards. This is important for measuring the performance of your app and figuring out how you're going to improve it. Do what's best for you. It's important to understand your project's needs and your users' needs, not just the end users' needs, and not to adopt something just because it's the newest trend out there. The last one: failure is okay. It's how you learn, iterate, and get better.
Himaja: We hope you enjoyed our talk and learned something from our experience that you can apply to your next project. Thank you.
