Regression consuming every sprint
Hi there,
I'm the PO for two products (very similar apps), and my team is facing "regression" issues with each release cycle. I did read other threads but did not find an answer matching my issue - maybe you can point me towards one:
We have a quite large regression suite that grew over 5 years (about 1,300 end-to-end test cases (TCs); unit tests not counted, we have no issues with them). A lot is automated, and our automation run already takes about 18h to complete (we use nightly runs) with an 80% pass rate. The other 20% need to be rechecked manually. In addition, we have a manual suite for things that are not automated yet, cannot be automated, or would take a lot of effort to automate.
All in all, it takes about 14 working days (for one person) to do these activities. One sprint is 2 weeks (10 working days), so we usually need two sprints after the main work is done to get a release candidate that is securely tested. That is +4 weeks = not very agile.
Now you might have some classic arguments here:
a) Automation is key, you need more automation -> True, but my automation team tells me that (1) there are a lot of TCs that cannot or should not be automated, and (2) our DEV team is as fast with changes as the automation team is with automating (= they have to fix TCs all the time)
b) Definition of Done: regression should happen in the same sprint, otherwise you have no Done increment -> Also true, but we simply cannot do that right now since the regression requires more time than the sprint itself (+ nothing else could be tested then, like new User Stories)
c) You need to test less -> True, but my manual team tells me that they cannot remove any (big part of the) TCs without potentially missing critical areas
d) You need more people -> Not so sure, since it looks like a quite expensive (ROI?) solution to just throw a lot of people at this task
I simply see no possibility to shape this process so we can release every two sprints - not even dreaming about every sprint - without introducing major risks by omitting a lot of testing. Even if everything were automated (and I don't see how that would be possible), with the 20% update-and-fail rate we would still need more than 3 days in each sprint for these activities.
This is my opinion, but I will say that I have a background in Software Test, and what I am about to describe has been proven to work.
In an agile environment, you have to change how you work. When it comes to testing, that means becoming smarter. Instead of testing everything to make sure nothing is broken, you need to focus testing on the areas of the code base that have changed to ensure that the changes produce the expected behavior. Here is how I did it.
I worked with our QA Engineers and Software Engineers early in refinement to identify the portions of the code base that would be targeted with changes. Then, using that information, the Engineers would identify areas of functionality that depend on those code areas under change. The QA would then determine the best way to test the changes. Those "best ways" include testing at the unit, integration, and system levels. If a change can adequately be tested by unit tests, there is no reason to test it from the UI. Test the changes at the level that makes the most sense and results in the lowest effort (see the sketch at the end of this post).
This is done for every Product Backlog item during refinement, so that by the time an item is pulled into a Sprint, the team has already identified a test plan for it. It also allows you to quit relying on a large suite of tests to possibly find problems, and instead focus on preventing those problems from being introduced into the iteration. As work begins, you continue to inspect and adapt the test plan based on learnings that occur during the Sprint.
This method makes testing more focused, faster, and specific. It also leads to a better understanding of the code base by the Developers of the Scrum Team (remember that everyone involved in the work to change the product is considered a Developer on the Scrum Team). Over time, our product quality improved and we did not find any benefit to having a large regression suite.
I will say that this is not what has been done in the past, so it is not comfortable for everyone. Software Engineers became very reliant on QA Engineers to validate their changes and confirm that they didn't cause problems in other parts of the application suite. In my opinion, that has led to some long test cycles that never should have happened. Move the testing closer to the code changes, make it part of the code changes, focus it on the actual code changes. This is more efficient and eliminates waste. What waste, you ask? How often do you run your 2 week regression cycle to find a few, or no, significant issues?
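To make "test at the level that makes the most sense" concrete, here is a minimal sketch in Python (the validate_email function and module name are hypothetical, invented for illustration): when a change is confined to a validation rule, a unit test covers it in milliseconds, whereas the equivalent end-to-end case has to drive the whole UI.

```python
# Hypothetical sketch: the same rule, checked at the unit level.
# "myapp.validation" and validate_email are illustrative names only.
from myapp.validation import validate_email

def test_rejects_missing_at_sign():
    # Runs in milliseconds: no browser, no simulator, no screen scraping.
    assert validate_email("user.example.com") is False

def test_accepts_plain_address():
    assert validate_email("user@example.com") is True

# The equivalent end-to-end case would launch the app, navigate to the
# form, type the value, submit, and read the error off the screen.
```

If the change only touched the validation rule, tests like these give you the feedback; a handful of UI cases then confirm the wiring, not every rule.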
Just a few things to think about:
18 hours to run 1,300 tests is very slow. I'm currently working with a team that maintains an automated test suite over 10x larger in number of test cases, and it doesn't take much longer to run, even in a configuration where it publishes test execution evidence to a third-party test case management suite. What is making your automated test suite so slow? I'd look at everything from the infrastructure the tests run on to opportunities for parallelization to profiling for performance bottlenecks (either in the application under test - users may also be experiencing these performance problems - or in the tests themselves).
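If the suite happens to be pytest-based, a first pass at both levers might look like this (a sketch only; it assumes the pytest-xdist plugin is installed, and the test path is illustrative):

```python
# Sketch: run the suite in parallel and surface the slowest tests.
import subprocess

subprocess.run(
    [
        "pytest", "tests/e2e",  # illustrative path to the suite
        "-n", "auto",           # pytest-xdist: one worker per CPU core
        "--durations=25",       # report the 25 slowest tests for profiling
    ],
    check=True,
)
```

Whatever the framework, the idea is the same: spread independent tests across workers, and measure where the time actually goes before optimizing.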
I'd point out that a typical work day is 8 hours. Some organizations work 10 hour days. That leaves 14 hours in the day. If you can get your test suite to consistently run in less than 14 hours, you can run it every night, perhaps on the most recent development snapshot.
Along these lines, Daniel brings up a good point. Do you need to execute all 1,300 tests every release? There's a concept called risk-based testing. You need to be more exhaustive when testing the changes, but you may be able to use white-box techniques to drive the selection of your regression tests based on the specific changes made, the architecture of the system, and the criticality of certain functions provided by the system. If you run fewer tests, your runs finish faster, and if you also work on performance improvements, the gains compound. Including unit and integration test coverage in addition to the system test coverage can also be useful in risk-based testing strategies.
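As a rough sketch of what change-driven selection can look like (everything here - the mapping, the paths - is invented for illustration; a real mapping would come out of your architecture and criticality analysis):

```python
# Sketch: derive changed files from git and run only the mapped test suites.
import subprocess

# Hypothetical mapping from source areas to the test suites that cover them.
AREA_TO_TESTS = {
    "src/wizard/": "tests/e2e/wizard",
    "src/billing/": "tests/integration/billing",
}

def changed_files(base="origin/main"):
    """List files changed relative to the given base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def select_test_paths(files):
    """Pick the test suites whose source areas were touched."""
    return sorted(
        tests
        for area, tests in AREA_TO_TESTS.items()
        if any(f.startswith(area) for f in files)
    )

if __name__ == "__main__":
    paths = select_test_paths(changed_files())
    if paths:
        subprocess.run(["pytest", *paths], check=True)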
A 20% failure rate is also very high. That's over 250 failed tests per execution. Of these, how many are true failures versus false positives? I would invest in the stability of the tests. Flaky tests and false failures are expensive, since someone has to figure out whether the problem is real and where it is. That's time that could be spent elsewhere to create more value.
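A low-tech way to start separating the two is to rerun failures and flag tests that pass on retry as flakiness suspects; a minimal sketch (the decorator below is illustrative, and plugins such as pytest-rerunfailures package the same idea):

```python
# Sketch: retry a test and log intermittent failures as flakiness suspects.
import functools
import logging
import time

def retry_flaky(attempts=3, delay=2.0):
    """Rerun a failing test; passing on a retry marks it as likely flaky."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    result = test_fn(*args, **kwargs)
                    if attempt > 1:
                        logging.warning(
                            "%s passed on attempt %d: likely flaky, stabilize it",
                            test_fn.__name__, attempt,
                        )
                    return result
                except AssertionError:
                    if attempt == attempts:
                        raise  # fails consistently: treat as a true failure
                    time.sleep(delay)
        return wrapper
    return decorator
```

The retries aren't the fix; the log of suspects is, because it gives you a prioritized stabilization backlog instead of 250 failures to triage by hand every night.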
Get more people involved in testing. Outside of critical systems, there's no reason why testing can't be done by the whole team. Even when building those critical systems, the independent testing is done by a team outside the Scrum Team, so a round of testing can still be done by all of the Scrum Team's Developers as part of the Sprint. You may want someone with a little more independence to do things like review changes made to test cases or test scripts by someone who is primarily a programmer rather than a test engineer. Still, you will get through the manual test cases faster and get more maintenance done on your automation scripts if you get everyone involved.
Enable your developers to run the automated system tests as part of their development process. This pairs nicely with a risk-based testing strategy: if you can choose which tests to run, developers can look at their changes and run the relevant tests before handing anything off. Perhaps even get their help to identify and fix flaky tests. Having test runs before the regression boosts confidence in the stability of the system.
If developers are able to develop the automated test suite, then you could make some level of automation maintenance and automated test case development part of the Definition of Done. Minimally, having the developers disable the tests affected by their changes (so the test runs are true regression runs) would get you some improvement, but I'd go further: have the developers update those tests, and perhaps even create at least "happy path" test cases for their changes, as part of being Done.
I struggle with the idea that some test cases should not be automated. Depending on the architecture and design of the system, there may be cases that are difficult to automate, and you can perform a cost/benefit analysis to prioritize test cases for automation, but a low priority doesn't mean a test case shouldn't be on the list of things to figure out how to automate.
Now you might have some classic arguments here
Yes I do: TDD.
You haven't mentioned refactoring once. When refactoring is short-changed, technical debt is incurred, and you may well end up with an unscalable product and test architecture.
So, if I understand correctly, these are end-to-end test cases (is TC = test case?), and so you are testing on the User Interface.
One possible solution is to run all the tests on the layer under the UI, and then choose specific tests to run on the UI. I suspect that (waiting for) the UI is what is slowing the testing down.
This works, of course, only when the UI really is just the UI and doesn't make decisions. It may require refactoring, but can you continue where you are now? Soon it'll be 2 Sprints before the testing finishes!
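A minimal sketch of what a below-the-UI check can look like, assuming a REST-style service layer (the endpoint, payload, and URL are all hypothetical): the same validation an end-to-end case exercises through the screen is verified here in milliseconds.

```python
# Sketch: exercise input validation at the service layer instead of the UI.
import requests

BASE_URL = "http://localhost:8080"  # hypothetical test environment

def test_rejects_invalid_email():
    resp = requests.post(
        f"{BASE_URL}/api/validate",  # hypothetical endpoint
        json={"email": "not-an-email"},
        timeout=5,
    )
    assert resp.status_code == 400
    assert "email" in resp.json()["errors"]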
Thanks everyone for the answers. Let me give you a bit more insight:
@Daniel W. Instead of testing everything to make sure nothing is broken, you need to focus testing on the areas of the code base that have changed to ensure that the changes produce the expected behavior. [...] Then, using that information, the Engineers would identify areas of functionality that depend on those code areas under change.
We do exactly that, but that isn't regression (at least for us), is it? A new change in a given User Story is tested together with the affected area right away in the sprint, as part of our Definition of Done. Regression is checking the other, old parts that in theory should not have been affected - but of course you can never be sure.
@Daniel W. How often do you run your 2 week regression cycle to find a few, or no, significant issues?
We usually have 4 sprints with dev + affected-areas testing as the Definition of Done (=> we cannot release these items yet without risk), followed by the regression to release everything.
@Thomas O. What is making your automated test suite so slow? I'd look at everything from the infrastructure the tests run on to opportunities for parallelization to profiling for performance bottlenecks
Note that e.g. our unit tests (we have about 5K) run in about 1 min and are required for every merge. We already use parallel runs on parallel machines. The regression is built from manual TCs (test cases), and they are all at the UI level. So the regression uses an automated simulator to perform the actions on screen and then checks the content of the screen. 1 TC can take 30-60 sec (since it is end-to-end, from user input to system output).
@Thomas O. There's a concept called risk-based testing. You need to be more exhaustive when testing the changes
We started to do that (taking the regression down to about 50% of its content... still far too much to handle, of course). And my QC experts tell me that it is very high risk to exclude even more.
@Thomas O. A 20% failure rate is also very high. That's over 250 failed tests per execution. Of these, how many are true failures versus false positives?
Excluding them would still mean running them manually (no big change in effort). Since the code changes frequently, we have to fix old TCs all the time: imagine we have a setup wizard and change its first screen. All TCs that go through the wizard now fail, even though it was a small change for the DEV team.
@Thomas O. Get more people involved in testing. Outside of critical systems, there's no reason why testing can't be done by the whole team.
The DEVs do "DEV testing" and run their unit tests, but of course that is something different from the end-to-end tests. (In addition - and this is of course anti-Scrum - my DEVs want to do "the coding" and not "the manual testing" part... so there might be a mindset issue here which is hard to change.)
@Thomas O. I struggle with the idea that some test cases should not be automated.
There are just technical limitations. E.g. we have some tests for notifications at a specific time, but our framework cannot handle time changes in a stable way, and other tests require changes in a (3rd-party) backend we do not own, so we cannot set up e.g. the preconditions automatically.
@Ian M. You haven't mentioned refactoring once.
Refactoring is done all the time. If it changes the UI, then a lot of end-to-end TCs break (usually). If it is just a technical change, then we have no issue since we can test it automatically, so I don't understand your point here.
@Mario One possible solution is to run all the tests on the layer under the UI, and then choose specific tests to run on the UI. I suspect that (waiting for) the UI is what is slowing the testing down. This works, of course, only when the UI really is just the UI and doesn't make decisions.
Indeed, these are all UI cases. But of course we have automated unit tests, too. There are just no issues with them, so I didn't talk about them. However, there is so much happening at the UI level (input validation, navigation, display) that we can't imagine testing it mainly with unit tests.
Refactoring is done all the time. If it changes the UI, then a lot of end-to-end TCs break (usually). If it is just a technical change, then we have no issue since we can test it automatically, so I don't understand your point here.
Is continuous refactoring used to improve product and test architecture, ensuring that neither becomes too monolithic?
We usually have 4 sprints with dev + affected-areas testing as the Definition of Done (=> we cannot release these items yet without risk), followed by the regression to release everything.
You didn't answer my question. If you are running these regression suites and never find significant issues, what is the return on investment for doing them? Yes, not running every single test that has ever been written carries risk. But if you really evaluate that risk, you could probably justify taking on a bit more of it.
We do exactly that, but that isn't regression (at least for us), is it?
I argue that it is regression, but focused on the actual work that was done, rather than a large effort run in the hope that it finds some issues. This is the point you need to address, and it is mostly what everyone here is saying. An agile organization is willing to make changes even if those changes are initially uncomfortable. From the way you talk, your organization is not willing to make changes in some areas. If your QC org is not willing to adapt, and your QA org is not willing to adapt, then you really have no choice but to accept waste in your process, which will slow down your time to delivery. If that is the choice, your best option is to have a Sprint where all the Developers take on the testing work and no new development is done.
The common thread that I see is the need to make significant investments in change.
One change is to end the idea of "dev testing". The developers are part of a team that is supposed to be delivering a working system to stakeholders. Although the best case is for the people to be using their strongest skills, that's not always possible and people may need to help in other areas that they may not be as strong in. Sometimes, that means that developers need to be working on the automated tests and test framework to improve it. Other times, that could mean manual testing. If they don't like to do manual testing, they can use their knowledge and skills to try to automate that manual testing. Developers do need to do more than write production code and throw it over the wall to testers, though, for a team and organization to be successful.
Investing in your test framework and test environment would also help. For example, you say that your current framework can't handle time changes in a stable way. Fix that. If your system is dependent on time, that may mean migrating to a new framework that can support the tests needed to exercise your system and give you and your stakeholders confidence. If you have dependencies on third-party tools, you may need to either work with the third-party to be able to test the system or build mocks, simulators, or emulators to remove those dependencies in development and test environments. Mocking or simulating the third-party may not be perfect because of differences between the test environment and the real third-party system, but it may be helpful to reduce integration risks.
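On the time-change point specifically, one common approach is to stop reading the system clock directly and inject it instead, so tests control "now". A minimal sketch (Notifier and the fake clock are illustrative, not from any particular framework):

```python
# Sketch: inject a clock so time-dependent behavior becomes testable.
from datetime import datetime

class Notifier:
    def __init__(self, clock=datetime.now):
        self._clock = clock  # injected so tests can control "now"

    def is_due(self, scheduled_at):
        return self._clock() >= scheduled_at

def test_notification_due_only_after_scheduled_time():
    fixed_now = datetime(2024, 1, 1, 9, 0)
    notifier = Notifier(clock=lambda: fixed_now)
    assert notifier.is_due(datetime(2024, 1, 1, 8, 59))     # schedule has passed
    assert not notifier.is_due(datetime(2024, 1, 1, 9, 1))  # still in the future
```

The same injection idea extends to the third-party backend: hide it behind an interface you own, and substitute a simulator for it in development and test environments.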
There won't be easy solutions to these cultural and technical problems. But if you want to be agile and responsive to change, you need to shorten the feedback loops. 14 days for regression testing is where you are now - without intervention, that cycle will only grow longer. The longer you wait to make these investments, the more expensive they will need to be to achieve the necessary changes.
I agree wholeheartedly with all the points provided by the experts. However, one thing that stands out is the heavy use of UI automation, which is weighing the whole process down.
Please keep in mind that UI test automation is the COSTLIEST test automation there is, and it will become a nightmare to manage as your regression suite steadily grows. Instead, I would suggest focusing more on unit-level testing (which, as I understand, is already being done nicely) as well as service-layer testing (testing all the APIs/microservices/batch functions or whatever your application uses underneath to drive business logic). This will be 10-50 times faster, more cost-effective to run, and can cover all the input validations, edge cases, and other scenarios. If your application has specific business logic implemented in the UI, which therefore must be automated there, that seems like a design & architecture issue in itself and a candidate for refactoring in the long run.
Finally, I would summarize by saying that yours seems to be a classic Inverted Test Pyramid issue that needs to be optimized so it shapes up like an actual pyramid.
So let's see:
Ian: Is continuous refactoring used to improve product and test architecture, ensuring that neither becomes too monolithic?
I'd argue yes. Both the normal and the automation team have a refactoring task in each sprint. (E.g. we are currently splitting some code parts so they don't need to be tested when those areas are not changed.)
Daniel: If you are running these regression suites and never find significant issues, what is the return on investment for doing them?
We do find some issues in each regression - maybe about 10. Some of them I would rate major/critical, but the majority are minor/trivial. The issue is that of course you never know beforehand what the tests will find.
Thomas: One change is to end the idea of "dev testing".
To highlight that again: we already have DEV testing and unit tests. They are coded and run as part of the Definition of Done in each sprint, and we have no issues with that. The only issue is the huge UI-automation regression at the end.
Thomas: Investing in your test framework and test environment would also help
I do agree, but I'm not an expert in this field, and my automation experts tell me that it is not possible/feasible at the moment. It is hard to argue against that, since I want the team to pick the solutions that allow them to create the most value, and they think we have a very good system.
Soumyadeep: Finally, I would summarize by saying that yours seems to be a classic Inverted Test Pyramid issue that needs to be optimized so it shapes up like an actual pyramid.
Here I lack experience, but it kind of results in the same issue as above. My QC/QA team thinks that we get the most value with the current approach, and I lack the expert knowledge to be sure that this is wrong... I can only point out the issue with e.g. time to market creating the struggle I'm currently in :)
Here I lack experience, but it kind of results in the same issue as above. My QC/QA team thinks that we get the most value with the current approach, and I lack the expert knowledge to be sure that this is wrong... I can only point out the issue with e.g. time to market creating the struggle I'm currently in :)
Something to help you have a discussion with your team:
https://martinfowler.com/articles/practical-test-pyramid.html
Don’t let perfect be the enemy of good. Consider if you have entered into a rigidity trap on how much regression is actually needed for your increments vs. feeling like you need it because you have always done it that way.
Consider the scope of your Sprint Goals. Smaller product changes mean the team can better understand what has changed and what the actual impact will be. Should issues be found, it is easier to correct them. If a small change warrants a full regression test, then it may be worth looking at the stability of the system.
It may also be time to clean up your regression suite. A regression suite that has grown over 5 years is likely to have accumulated redundant or obsolete test cases. Is every test case still valid? How do you know? Are they optimized? Are there other opportunities here?
If you are finding major/critical issues during regression, that indicates weaknesses in your earlier testing; perhaps do some root cause analysis to find out why. It is very costly to find issues during regression, so look for ways to strengthen earlier testing.
You keep mentioning “teams”, which leads me to believe this isn’t a Scrum Team in the framework sense. This may be contributing to the resistance to change, as each team seems to be working in isolation, focused only on its own function, and handing things over as someone else’s problem. As one Scrum Team, everyone ought to look at delivery as a cohesive unit. There is no such thing as “main work is done” or “dev done” or any other kind of Done until the work is in a releasable state.
You also mention “they think we have a very good system”. What empirical evidence is available to support this? You have called out that you have a time to market issue. Do you have data on release frequency, cycle time, lead time etc. to support this? Has this been held up to the team as a mirror so they see the whole picture, vs. just their function? If you bring full transparency to the situation they may be in a better position to inspect and adapt as a cohesive team focusing on the overall delivery system.