PegaWorld 2025: From Chaos to Control: Testing & Releasing Dynamic Pega App with Smart Test Automation

In a world where our Pega application changes as fast as the business demands, traditional testing just couldn’t keep up. We reimagined our QA strategy by shifting testing left and replacing manual regression with smart, API-based automation. This let us catch issues at the source—well before the UI ever came into play—slashing feedback time and boosting test accuracy.

But we didn’t stop there. To truly reflect real-world performance, we engineered a unique testing setup that connected our Pega app to a production-like backend. This gave us powerful insight into how the app behaves under live conditions—surfacing bottlenecks and edge cases that would otherwise slip through.

The result? Faster releases, fewer bugs, and a quality pipeline built to scale with change.

PegaWorld 2025: From Chaos to Control - Testing & Releasing Dynamic Pega App with Smart Test Automation

Presenter: Harneek Singh, Engineering Manager Testing at Customer Service, Booking.com

Good afternoon everyone. How is everyone doing? I know it's the last session of the day, but trust me, it will be fun.

A little about myself. I'm Harneek Singh, Engineering Manager for Testing in Customer Service at Booking.com, and I'm responsible for gatekeeping the quality of Booking.com's customer service application, which is built on the Pega Customer Service platform. I've lived in the Netherlands for more than eight years, and when the weather allows, which is only about one month a year, I love to play tennis.

Before I begin, I would like to ask a question. How many of you have used Booking.com? Nice. I expected more hands, but fine. How many of you have called Booking.com customer service when something went wrong with your reservation or booking? Okay, nice. What was your main expectation when you called customer service at Booking.com? You wanted to get your solution in one call, right? So let's hold on to that thought.

Let's talk about the mission of Booking.com. Booking.com wants to make it easier for everyone to travel. And how does customer service help here? Customer service wants to support every customer with their query the first time around, so they really want to ensure that all your calls are solved in one go. To make this mission possible, Booking.com engaged with Pega for the inception of the customer service application at Booking.com, which was built on the Pega Customer Service platform.

Let me give you an idea of the scale of customer service at Booking.com. We have multiple business units: accommodations for all the hotels listed, flights for all the flights you book through Booking.com, trips and transport for all the car rentals, and something called attractions, where you can book attractions. We have multiple layers of end users: our agents who are servicing the customers, our partners who are the hotel listers, and multiple levels of agents as well, some senior, some not. We have roughly 30 case types and 15-plus teams working in parallel to make this application possible. That's the scale. And we have 12,000 to 15,000 concurrent users at peak time, which for us is July to August.

So let's now zoom in on the timeline between 2023 and 2024. When we started, development was in silos. A lot of things were missing: there was no proper regression, and deployment was not automated, which meant very low quality and very slow releases. Then we made some changes and moved to faster releases by automating the deployment, automating a bit of the regression, and ensuring that unit tests became an integral part of our development life cycle. So we improved our quality and our releases, but we were still not there. And finally, we worked on some very innovative things that let us deliver a high-quality customer service Pega application really fast.

And how did we do that? How did we make it possible? In the next 20 minutes I will explain that by covering four major topics: what the significance of automation is and how you should automate your regression; what the significance of automated sanity testing is when you have multiple downstream-to-upstream environments; what you should focus on in your regression once it is automated; and, even with everything automated, how you ensure your application performs well, especially when it is used by 12,000 to 15,000 agents concurrently. I will explain this with the help of something we might all be familiar with: the quality pyramid.

How many of you have seen this pyramid? If you open a software development textbook, this is the first thing you see there. Yeah, I will be using that.

So let's look at the before and after state. When we started, we had multiple downstream-to-upstream environments, from development to testing to acceptance, and our deployment was largely manual. We had branches floating around in all the environments, which meant you could push your change to any upstream or downstream environment without any quality check. We worked really hard on automating the deployment, but also on ensuring that every step has unit test coverage using PegaUnit, so that we cover the base business logic of the application.

So, as we see here, we covered the basics: we made the foundation strong by covering the unit test layer and ensuring that every deployment from downstream to upstream has unit test coverage. But testing quickly and at scale is hard. There are multiple teams doing regression at the same time, and a full regression cycle for such a large app takes a lot of time. We needed to release faster.

So what do we need to do to release faster? Definitely, we need to automate the regression. But there are challenges in automating a very large and dynamically changing application. When 15 teams are working on the application and there are frequent UI changes, building and scaling the automation is hard. There are so many changes that you automate a flow one day and the next day it won't be the same, so your automation goes through a lot of maintenance.

So we had three options to solve this problem. The first option was to stick with some of the existing automation and do everything else manually. But we had a problem there: we wouldn't achieve our faster releases. We would still have slow releases, and everything would be tested manually.

The second option was to do some API-level automation. By the way, are you aware of what API-level automation is? Basically, you bypass the UI and hit the APIs directly. I will explain this more in my later slides. So we thought: let's do some API automation and switch over to UI automation where it's not possible to go through the API. But this flow also had a bit of a problem: every switch is a failure point. Whenever you switch from API to UI, your flow can fail.

And then we had a third option: let's aim for almost 90% API automation and still do 10% through the UI. Why? Because we want to see some flows on the UI, and we want to do some cross-browser testing, since we support multiple browsers and want to ensure that our tests run on both of them. This approach had a bit of a drawback: the UI might change rapidly, but it would only impact 10% of the flows. We had a lot of conviction that this idea could work and could be a success, but every idea has to be tested.

So we did a POC on the two options. We selected five complex scenarios and automated them with both approaches. Then we evaluated them on a six-parameter ranking: the learning curve, how easy it is to learn; how easy it is to maintain; how fast the tests run; what the failure rate of the tests is, because there is no use in automation if your tests fail all the time, that's equal to not having automation at all; and whether we can reuse the automation, because we don't want to reinvent the wheel again and again by creating different automation.

And you can see that we had a very clear winner, which is option number three, the API-first approach. The only negative point in this approach was that maintainability was a bit difficult. But as you can see, the frequency of maintenance was very low, so it was fine: if it breaks rarely, it's okay to spend some time maintaining it.

So we took these findings and started extending the POC into a full-scale automation framework. We nailed down a scope of about 150 scenarios and started the knowledge transfer. Not to forget, our implementation partner was Mindtree, who helped us execute this automation framework over a period of four to five months. We planned to do it in four months, five at most, because we wanted to use this automation as soon as possible, before going into our 2025 peak, which is right now.

But as you know, plans don't go the way we think they will. We had a lot of customizations in our application, flows were changing rapidly, and it was very hard to create a stable JSON. As you might be aware, JSON is how these APIs talk to each other. But we had done a successful POC, so we decided: let's start creating reusable components, let's start creating some data, and let's start fixing some of the customizations so that our framework can work. And we actually started fixing those customizations.

I will now go a bit deeper into this API approach and explain exactly what we did. I will take the most common customer service journey as an example: cancellation. You cannot make it to a holiday, your flight is canceled, or you change your plans. Now you want to cancel the hotel or cancel the room.

So what happens is that you call customer service: hello, I cannot make it, can I cancel my room? And it depends on your cancellation policy; as you might have seen, some things you can cancel for free, and for some cancellations you need to pay something. Whenever a customer calls an agent, that is called an interaction, and a case gets created, which here is a cancellation case. The agent collects the details and then contacts the partner, which is the hotel listed on Booking.com. The partner replies whether they agree to the cancellation or not, we process the partner's reply, and then we go towards guest approval: we want to say, hey, your cancellation is possible free of cost, or this much is the fee. And then we go towards finalization of the case, and the case gets resolved. So this is a cancellation journey, at a very high level.

Every step here is a UI screen, and you go from creating the case to resolving the case through UI screens. We thought: let's do the same workflow, but not through the UI, because the UI is changing rapidly, and instead go through API calls. And instead of sending data through the UI, we send the parameters through JSON files. That's what we did.

In this process we extracted the Pega APIs that are responsible for creating a case and executing it through the different workflows, depending on the case type. We effectively copied the entire DX API into our automation framework, and we used almost only two APIs here: a create-case API and an assignment API. So we only had to use two APIs, with different combinations of JSON, to make every possible flow work through APIs. That was a victory. As you can see in the screenshot of the folder structure, we keep a copy of every JSON request and response going to the DX APIs, covering every possible journey in our application.
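
As a rough sketch of that two-API pattern (an illustration under assumptions, not the framework code shown in the talk), one step of a cancellation journey driven through the Pega DX API v1 with REST Assured could look roughly like this; the base URL, credentials, case type ID, action ID, JSON bodies, and response field names are placeholders:

```java
import static io.restassured.RestAssured.given;

public class CancellationStepSketch {

    // Hypothetical values; the real base URL, case type, and action IDs
    // depend on the application under test.
    static final String BASE = "https://example-pega-instance/prweb/api/v1";
    static final String CASE_TYPE = "BKNG-CS-Work-Cancellation";

    public static void main(String[] args) {
        // Create-case API: the JSON body mirrors what the "create case" UI screen would submit.
        String caseId =
            given()
                .auth().preemptive().basic("automation.user", "secret")
                .contentType("application/json")
                .body("{ \"caseTypeID\": \"" + CASE_TYPE + "\", \"content\": { \"ReservationID\": \"12345\" } }")
            .when()
                .post(BASE + "/cases")
            .then()
                .statusCode(201)
                .extract().path("ID");

        // Look up the open assignment on the new case (response field names assumed).
        String assignmentId =
            given()
                .auth().preemptive().basic("automation.user", "secret")
            .when()
                .get(BASE + "/cases/" + caseId)
            .then()
                .statusCode(200)
                .extract().path("assignments[0].ID");

        // Assignment API: drive the case forward with the JSON the next screen would have sent.
        given()
            .auth().preemptive().basic("automation.user", "secret")
            .contentType("application/json")
            .queryParam("actionID", "CollectDetails")
            .body("{ \"content\": { \"CancellationReason\": \"Change of plans\" } }")
        .when()
            .post(BASE + "/assignments/" + assignmentId)
        .then()
            .statusCode(200);
    }
}
```

Chaining the same two calls with different stored JSON templates is what lets a small set of endpoints cover partner approval, guest approval, and resolution as well.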

I will give a short demo now. Is it playing? Yeah. So here I'm running the test. Mind you, it's all headless: you don't see anything on the UI. And we also do data creation: we create reservations, so creating the reservation data is part of the test, and we delete the data afterwards, so that we start with a clean state and end with a clean state.

Here we're starting the case, which is equivalent to starting an interaction: a customer calling an agent. Now the agent is collecting details, and it's all happening through API calls. Now the case has been sent for partner approval. Depending on the cancellation policy, the partner will approve the cancellation, reject it, or say: okay, I'll give you a partial refund.

So here, what's happening? We verify the partner response. I want to highlight that we are able to check every email template automatically. We have more than 200 email templates, and we can verify them through this automation framework; we don't need to manually send emails every time to check them.

Now we are looking at guest approval: whether the guest accepts or rejects, it will just show the outcome. As you can see on the screen, all the data is being sent: you can see which reservation it was, the reservation ID, the cancellation fees, everything in the logs. We turned on the logs so that you can see it. And then we go towards resolving the case. The case has been resolved.

Mind you, this flow takes 20 minutes for a tester to execute manually, because it has a lot of screens, and it took us, I think, less than two minutes to execute. That is the success of this automation framework: we are bypassing the UI but still executing our flows automatically.
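
The clean-state pattern from the demo (create the reservation test data before the flow, delete it afterwards) could be sketched as a JUnit 5 test like the one below. The ReservationTestData and CaseApi helpers are invented for illustration and stubbed out so the sketch compiles; in a real framework they would wrap the create-case and assignment APIs shown earlier:

```java
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class CancellationJourneyTest {

    // Hypothetical helpers, stubbed so the example is self-contained.
    static class ReservationTestData {
        static String create(String policy) { return "RES-0001"; } // would create a throwaway reservation
        static void delete(String id) { /* would remove the reservation again */ }
    }

    static class CaseApi {
        static String createCancellationCase(String reservationId) { return "CASE-0001"; }
        static void collectDetails(String caseId) { }
        static void recordPartnerReply(String caseId, String reply) { }
        static void recordGuestApproval(String caseId, String decision) { }
        static String status(String caseId) { return "Resolved-Completed"; }
    }

    private String reservationId;

    @BeforeEach
    void createTestData() {
        // Start from a clean state: every run creates its own reservation.
        reservationId = ReservationTestData.create("FREE_CANCELLATION");
    }

    @Test
    void cancellationCaseIsResolvedEndToEnd() {
        // Each step sends the JSON the corresponding UI screen would have submitted.
        String caseId = CaseApi.createCancellationCase(reservationId);
        CaseApi.collectDetails(caseId);
        CaseApi.recordPartnerReply(caseId, "APPROVED");
        CaseApi.recordGuestApproval(caseId, "ACCEPTED");
        Assertions.assertEquals("Resolved-Completed", CaseApi.status(caseId));
    }

    @AfterEach
    void deleteTestData() {
        // End with a clean state: remove the data the test created.
        ReservationTestData.delete(reservationId);
    }
}
```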

So let's look at the same timeline again. We had some tests already created, and to prove their worth we started including them in our CI/CD pipeline. The tests were running as soon as we had 10 to 20 of them, and we started running them nightly and on demand. And we realized that we didn't have to cover 150 scenarios: 120 scenarios can cover more than 90% of our application. Fewer tests, more coverage. It's a victory.

And I would like to highlight a very important point: every run saves us 46 person-days of work per release. We do two releases every month, so we save 92 person-days of work every month, which is a lot of time. And we have almost 90% of the regression covered; I'm being conservative when I say 90%, we cover even more.

But things keep breaking, tests keep breaking. So let's say we cover 90%. We have one more problem, though: we have a lot of downstream and upstream environments, and the code has to go from development to testing to acceptance to production. Every deployment has to be tested, sanity-checked, and every sanity check takes almost two hours if you do it manually. Now that we have automated tests, why not use them for the sanity testing?

So we created several sanity packs, and we trigger the sanity tests whenever we deploy from testing to acceptance to production, and to any environments in between. So we can actually go to production in 120 minutes if no bugs are found. We have tested this in our patch releases, where not a lot of manual regression is required: we can go to production in 120 minutes, which means we can release any time we want. And time is the most critical factor in software development: you want to get to your end users as soon as possible, so that you can get feedback as soon as possible.
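
One common way to implement sanity packs like this (an assumption about tooling, not a detail confirmed in the talk) is to tag a small subset of the JUnit tests and have the deployment pipeline run only that tag after each promotion:

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class PartnerReplySanityTest {

    @Test
    @Tag("sanity")      // picked up by the post-deployment sanity run
    void partnerApprovalMovesCaseToGuestApproval() {
        // drive the flow through the APIs as in the earlier sketches
    }

    @Test
    @Tag("regression")  // only part of the full nightly / on-demand regression
    void partialRefundIsCalculatedFromCancellationPolicy() {
        // ...
    }
}
```

With Maven Surefire, the pipeline step would then run something like "mvn test -Dgroups=sanity" against the environment it has just deployed to.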

Now let's get back to our friendly quality pyramid. We covered the middle layer of the pyramid as well: we covered the integration layer and made sure we test the application at the integration level.

But what happens when you automate regression? What should the focus of your regression be? What we do is test all the newly introduced flows through manual checks and exploratory testing, because they are new and we haven't had a chance to automate them yet. And because they are new, it's good to test them exploratorily, so that you can test the freshly built flows more deeply. But the whole testing team also pretends to be end-user agents, and we execute some end-to-end flows manually, not every flow: doing a live interaction and then taking a case from the first step to the final step, like a real agent. So we run end-to-end user tests as well. In that process we also check the UI, because we believe code should not test a UI; a human can test a UI better. A human can spot mistakes, whereas unless you use screenshot verification with pixel-to-pixel comparison, a UI test can only ensure that the button is there; if the button is upside down, it won't spot that.

Which means we covered the top layer of the pyramid as well: we make sure our UI is working without any bugs, and we do that through exploratory tests.

I would like to highlight the before and now states. Our regression used to take four days for 23 testers; actually more than four days, almost five. We used to start on Monday and finish on Friday. Now we can do the whole regression with automated tests in two days, and in the last release we were able to finish in less than a day and a half. So in a day and a half we can run all the automated tests, which cover the entire application, quickly check the new journeys and scenarios, and release.

Before, when we tested all our journeys manually, it used to take us a day to find the bugs; we would find them in the middle or at the end of the second day, and then we had to go through the cycle of fixing them: is it a bug or not, and then the whole thing starts. Now a tester comes in the morning, checks the run, does some debugging, and finds out: oh, there's a bug we have to fix before we go to production. So we already know the bugs even before we start the regression. I will give you an example. In our previous release we had a very critical bug, because of which the application didn't work, and our tests were all red. We spotted what happened, saw that a simple thing was missing, and it was fixed; that was the only bug in the whole release. It would otherwise have taken a day to be discovered.

And since we are using the Pega DX APIs, we can use the same automation framework on Constellation with a little bit of modification. Yes, we are still on UI Kit, and that's why we wanted to invest in this framework: if in the future we move to Constellation, we don't have to throw away our automation framework; we can reuse the same thing.
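
One way to make that reuse concrete (an illustrative design, not the team's stated one) is to keep the test scenarios talking to a thin interface and confine the DX API version differences to its implementations:

```java
// Hypothetical abstraction: scenarios depend only on this interface, so moving
// from DX API v1 (UI Kit today) to v2 (Constellation later) means swapping the
// implementation, not rewriting the ~120 scenarios built on top of it.
public interface CaseDriver {
    String createCase(String caseTypeId, String contentJson);
    void performAssignmentAction(String caseId, String actionId, String contentJson);
    String caseStatus(String caseId);
}

// e.g. class DxApiV1Driver implements CaseDriver { /* uses the /api/v1 endpoints */ }
// and later class DxApiV2Driver implements CaseDriver { /* uses the v2 endpoints */ }
```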

And until now (I think we deployed our tests completely in February) they have been 90% reliable; there is only 10% flakiness, which we are also trying to reduce.

But automation is not everything, right? An application at the scale of 15,000 concurrent users, which is a lot, with a call coming in every two or three minutes, needs to perform very well. And for that we do some very innovative performance testing as well.

First of all, we have set some strict SLAs and SLOs: CPU utilization should not exceed 70%, the error rate should not exceed 3%, and heap usage should not climb above 80%. As you can see, we have 12,000 to 15,000 agents, depending on how busy the day is. Right now, for example, we have 20 web nodes and ten batch nodes; web nodes are used for web processing, batch nodes for background processing. But how did we decide on these web nodes and batch nodes? How do you decide the size of your production environment?

So we do two types of performance testing. We do infrastructure performance testing to find the right size of the infrastructure, and we do application performance testing to find the bottlenecks in the application: where the slow points are and where we should invest to make it faster. Until now we have done multiple performance tests to find the right size of our Pega Cloud 3 infrastructure, because we recently moved to Pega Cloud 3 and were not aware what the optimal size of our infrastructure should be, and we keep running these performance tests to fine-tune the sizing. Which means we didn't just cover the pyramid; we also covered the tip of the iceberg, which is the non-functional testing.
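
The SLOs mentioned above (CPU at most 70%, error rate at most 3%, heap at most 80%) lend themselves to a simple automated gate on a performance-test run. A toy sketch, with the metric names and their source assumed rather than taken from the talk:

```java
import java.util.Map;

public class SloGate {

    // Thresholds from the talk: CPU <= 70%, error rate <= 3%, heap <= 80%.
    static final double MAX_CPU = 0.70;
    static final double MAX_ERROR_RATE = 0.03;
    static final double MAX_HEAP = 0.80;

    /** Returns true if a performance-test run stayed within the SLOs. */
    static boolean withinSlos(Map<String, Double> metrics) {
        return metrics.getOrDefault("cpu_utilization", 1.0) <= MAX_CPU
            && metrics.getOrDefault("error_rate", 1.0) <= MAX_ERROR_RATE
            && metrics.getOrDefault("heap_utilization", 1.0) <= MAX_HEAP;
    }

    public static void main(String[] args) {
        // Example numbers only; in practice they would come from the load-test
        // report or monitoring system for the run.
        Map<String, Double> run = Map.of(
            "cpu_utilization", 0.63,
            "error_rate", 0.012,
            "heap_utilization", 0.71);
        System.out.println(withinSlos(run) ? "SLOs met" : "SLOs breached, fail the gate");
    }
}
```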

Now let's look back at the timeline I showed in the beginning. So what did we do? We went from chaos, which means uncountable change requests, production bugs, no release cadence, and ten days to release (in the beginning it really did take ten days to release), to a little bit of rest: regression of about five days, monthly releases, manageable. And then to total control, where there are very few production bugs, it only takes two days to do regression, and we can release every sprint. How beautiful is it that we can release the Pega CSS application for Booking.com every sprint?

And how did we do that? Was there magic? No, we just stuck to the basics: the quality pyramid. We made sure that our foundations are strong. That's how we went from chaos to control in Booking.com CSS, which is built on the Pega Customer Service platform.

Thank you. That's it.

Q&A Session

All right. I told you that was going to be worth waiting for. How good is that? I mean, we're talking real transformation-at-scale results there. I cannot thank you enough for sharing that story with us. So folks, who is going to be brave and take the question microphone first, to start the flood of questions I'm sure are coming? Can I invite the first willing volunteer to stand up and ask a question? I thought we'd have one there. Yes, sir, please stand up and take the mic.

Question 1: Nice presentation, thanks. So the question is: the automation is more on the quality side, right? But do we have any focus on the development side, like unit test automation or scenario test automation?

Answer: Yeah. So we have a quality gate on unit tests, which we have to cover at every step. Scenario testing: you mean the Pega scenario tests? We tried that, we didn't find it reliable, and we didn't use it. We are getting more results from the automation framework I showed you before. So the idea is that what to test and how to test is most important, and we found that this is the way that suits us best. There is a focus on development: there are strict code reviews, there is a strict check on unit test coverage, and our deployments are automated. So if you don't pass your quality gates, your deployment is blocked; it won't go ahead. So basically we follow a shift-left approach: we want to get as close to development as possible, so that we are not at the end of the process and can find bugs and issues at the beginning. I hope that answered your question.

Question 2: How hard is it to maintain your code? Like, if you need to do an update, if you need to change your code, how much work does that require in the automation tests?

Answer: Yeah. As I mentioned in my presentation, the maintenance is low. We found that the APIs don't change much; even if we add a field on the UI, either we can find it very easily, or we already know about it, or it isn't needed, because not every field on the UI is mandatory. To give you the real facts: every run, one or two tests fail, and it's usually because something has changed, for example a space in a text, or something in the backend has failed. So we are also spotting backend bugs through these automation tests. So maintenance is not that heavy and not very frequent, which was the main reason for selecting this framework, because with UI automation the maintenance would be immense, and we don't want that. The only thing is, if a test fails, you have to go into the Pega development environment and find out what changed and what the equivalent JSON parameter should be. That's a bit of a challenging task, but since our tests don't fail that often, we rarely have to do it.

Question 3: Excellent, excellent story. The question I have is: where do you draw the line with PDC? When I say PDC: we use PDC to get flags and notifications when something goes bad from a performance standpoint or when something breaks, right? But we don't have anything in production to flag issues. The test cases you mentioned run from dev to UAT or pre-prod, but do you have anything pointing at production to flag certain problems?

Answer: Yeah. So we monitor PDC in production; smoke tests, yes. We keep an eye on the PDC graphs as well, but at Booking.com we have active monitoring too. We have Grafana dashboards where we keep looking at the latency of our services, how many cases there are, and how many users are in the system, and there are alerts when something goes wrong. We have something called core agent actions, which means that if agents are not able to take a call or pick an item, we get alerts, and then we go through an incident and find out what's happening. Also, a slice of these automation tests runs on production as well, so we can test the flows on production. And we do performance tests on an exact replica of production: we find the issues, and if there are any, we fine-tune the production infrastructure as well. So it's a mix of active monitoring, passive monitoring, some performance tests, and some automation tests.

Follow-up: So you have even configured some core functional test cases to monitor production? Yeah. Okay.

Answer: Yeah, we recently did that. We are running around 25 tests on production, key end-to-end flows, to ensure that production is not broken. What we want to do next is a health-check run, so that if something breaks, it sends an alert that one of the flows is not working and we had better take a look. Thank you.

Question 4: Yeah. You mentioned you are not using Pega scenario testing, but are you using Pega unit testing at all?

Answer: Yes, we are using PegaUnit to cover the unit tests.

Follow-up: Okay. And then how do you run the coverage report to ensure that? Because we've been trying and we are not able to see how much code is covered. How do you make sure it's 100% covered?

Answer: We have health metrics, and we also check the quality percentage in Deployment Manager. I'm not sure about the finer details, but we have a bottom-line coverage that we make sure we meet, in combination with the PDC scores and the health of every ruleset. We cover all those things, and we have a requirement that every new piece of functionality should be covered by unit tests during development.

Follow-up: Okay. So to recap, you use Pega unit testing plus the automated regression testing. And do you use a specific tool for the automation?

Answer: Yeah. We use the most trusted API tool, which is REST Assured; I think everyone who has done API testing has used it. And for the little bit of UI testing we do, we use Playwright, which is the up-and-coming and most stable tool in the market right now. Thank you.
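
For the small UI slice, a minimal Playwright for Java sketch (the URL and selector are placeholders; the talk does not show the actual UI tests) could look like this, including the cross-browser angle mentioned earlier:

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserType;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;
import java.util.List;

public class UiSmokeSketch {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            // Run the same check on both supported browsers.
            for (BrowserType type : List.of(playwright.chromium(), playwright.firefox())) {
                try (Browser browser = type.launch()) {
                    Page page = browser.newPage();
                    page.navigate("https://example-pega-instance/prweb"); // placeholder URL
                    // Placeholder check: a key element of the agent workspace renders.
                    if (!page.isVisible("text=New Interaction")) {
                        throw new AssertionError("Agent workspace did not load");
                    }
                }
            }
        }
    }
}
```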

Question 5: Yeah, thanks. On that coverage part, you mentioned coverage for unit testing; this is basically a follow-up on her question. The coverage score, in my opinion, doesn't say that much about the quality of the testing. How do you determine the quality of the tests that are actually there, in a unit test for example? (You mean the unit tests? Yes, in the unit testing.)

Answer: Yeah. So there are active reviews by the lead system architects; they keep reviewing the new development to ensure that the quality of the unit tests is good.

Follow-up: And are you including testers in that part? Are you actually including testers in that review?

Answer: Not yet, but we have an idea to include testers in that review as well. What we do is supply scenarios to the developers: hey, these are the scenarios we're going to check, so that developers can write unit tests with those scenarios in mind. But what you're describing is the next step we want to take: we want to open the reviews to the testers as well, because in high-code development testers actually review the unit tests, and that's where we want to go, or at least to include the skills of the testers in that part. We have some basic things to fix first, and after that, that's the next step: basically shift left, bring everything, testing, performance, everything, as close to development as possible.

Follow-up: Okay. Thanks.

Question 6: You mentioned that your application is still on UI Kit, and that the test scenarios you have created will work with a little bit of modification on Constellation as well. Is that a POC you have already done? Because the UI part is quite different between UI Kit and Constellation.

Answer: Yeah. So we don't connect to the UI; we use the APIs, and both UI Kit and Constellation use the APIs: one uses v1 and Constellation uses v2. So there are a few changes we need to make, maybe in the JSON format or in how you connect, but all in all 80% to 90% of it will stay the same. From a UI testing point of view, yes, that will definitely change, because the Constellation UI looks totally different. That's why we only have 10% of our tests on the UI: we don't want to invest too much in UI tests when they are flaky and will break. The UI in Pega is also auto-generated, so it's very hard to automate UI tests; it took us a lot of time to automate even 10% of the tests through the UI, so it's a very effort-intensive task. That's why this whole framework was conceived: we didn't want to deal with the UI, or with changing from UI Kit to Constellation, and maybe in the future something else comes along. We didn't want a throwaway automation framework.

Um, let's give him another huge round of applause. Thanks, everyone. Thanks for coming. Thanks.
