I am sure you are familiar with the following scenario: a user reports to your Support team that something is not working as expected. Support investigates the issue and agrees that there is a bug in the system. They open a JIRA bug for the R&D department with all the information they have collected, as they are expected to. But then… a furious argument begins on the ticket. Support thinks R&D should solve this bug within a week. The Customer Success Manager says this is a critical customer just before renewal, so we need to make every effort to solve it within 48 hours. R&D doesn’t see this as an urgent matter and thinks the bug should be solved within 30 days.
Who is right?!
In the last two engineering organizations I led as a VP, we faced this kind of dilemma daily when I joined. When we analyzed it, we figured out that there were simply too many open bugs across the board. In this reality, Support is trying to decide what the right SLA is. The developers are overwhelmed by the number of open bugs, have no clear idea how to prioritize among them, and therefore set loose deadlines. The client-facing teams, trying to find their way through this mess and the lack of SLA communication, don’t count on R&D to solve bugs soon enough (or at all), and therefore expedite those bugs more than needed.
The solution was a combination of a “Zero Bugs Policy”, a formal “fix policy”, and better communication interfaces. We achieved it in a few straightforward, easy-to-implement steps.
Reducing the number of open bugs.
Go over the list of all your open bugs and decide, for each one, what it really is. If it is a real bug, it requires a code change and a new version deployed. If it is a logical issue (e.g., statistical, AI-related behavior), move it into a task if one is required. If it is a configuration issue, open a support ticket and let your technical support personnel solve it soon enough. Even if it is a real bug, decide whether you need to fix it: if it is minor and has already been alive for a few months, it is probably not worth a fix, and if it is in a component you are going to replace soon, don’t fix it. If it still “survives” as a bug, you should solve it fast enough (in no longer than 30 days), which means bugs do not age in the system.
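The triage rules above can be sketched as a small decision function. This is only an illustration, not the process we actually scripted: the field names, the `Kind` and `Outcome` enums, and the 90-day reading of “a few months” are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    REAL_BUG = "real bug"              # needs a code change and a new deployment
    LOGICAL_ISSUE = "logical issue"    # e.g. statistical, AI-related behavior
    CONFIG_ISSUE = "configuration issue"


class Outcome(Enum):
    CONVERT_TO_TASK = "convert to task"
    SUPPORT_TICKET = "open a support ticket"
    WONT_FIX = "close without a fix"
    FIX_WITHIN_30_DAYS = "fix within 30 days"


@dataclass
class OpenBug:
    # Hypothetical fields; map them to whatever your tracker stores.
    kind: Kind
    is_minor: bool = False
    age_days: int = 0
    component_being_replaced: bool = False


# Assumption: "a few months" is read here as roughly 90 days.
STALE_AFTER_DAYS = 90


def triage(bug: OpenBug) -> Outcome:
    """Decide what to do with one item from the open-bug list."""
    if bug.kind is Kind.CONFIG_ISSUE:
        return Outcome.SUPPORT_TICKET
    if bug.kind is Kind.LOGICAL_ISSUE:
        return Outcome.CONVERT_TO_TASK
    # From here on it is a real bug; decide whether it deserves a fix.
    if bug.component_being_replaced:
        return Outcome.WONT_FIX
    if bug.is_minor and bug.age_days >= STALE_AFTER_DAYS:
        return Outcome.WONT_FIX
    return Outcome.FIX_WITHIN_30_DAYS
```

Running every open item through a function like `triage` once is what shrinks the list; whatever comes back as `FIX_WITHIN_30_DAYS` is the real backlog.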
This step reduced the number of open bugs in the system from the high hundreds to the low tens. It is now clearer and easier to communicate over a shorter list of bugs. On average, each team has no more than 5 open bugs to solve at any given moment.
Formulate the fix policy and correlate it with contractual SLAs
The next step was to make sure everyone was aligned on which fix policy applies to which bug. We added custom fields to JIRA, reused some of the existing JIRA bug fields, and made every field the formula needs mandatory when opening a new bug. From these fields we automatically calculated (with a JIRA script) a “bug score”. It is advisable to revisit the formula every few months.
Such fields can be: the number of affected clients and end users (we are a B2B SaaS product), the affected component, whether there is a workaround, which client it is (prioritizing clients by a VIP score that the client-facing organization can reset every month), and whether this is a mission-critical function.
Once we had this score, we tied it to the “fix policy”. We decided to support the following SLAs: Incident (ASAP), 48 hours, 1 week, 30 days.
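To make the idea concrete, here is a minimal sketch of a score that is then mapped to one of the four fix policies. The weights, the 0 to 100 scale, the thresholds, and the field names are all invented for illustration; our actual JIRA script used its own formula, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    # Hypothetical mandatory fields, mirroring the inputs listed above.
    affected_clients: int
    has_workaround: bool
    vip_score: int          # assumed scale: 0 (regular) to 5 (top VIP)
    mission_critical: bool
    component_weight: int   # assumed scale: 0 (peripheral) to 3 (core)


def bug_score(bug: BugReport) -> int:
    """Return a score from 0 to 100; the weights are illustrative assumptions."""
    score = min(bug.affected_clients, 10) * 2   # up to 20 points
    score += bug.vip_score * 4                  # up to 20 points
    score += bug.component_weight * 5           # up to 15 points
    if bug.mission_critical:
        score += 30
    if not bug.has_workaround:
        score += 15
    return score


def fix_policy(score: int) -> str:
    """Map a score to a fix policy; the thresholds are assumptions."""
    if score >= 80:
        return "Incident (ASAP)"
    if score >= 55:
        return "48 hours"
    if score >= 30:
        return "1 week"
    return "30 days"


if __name__ == "__main__":
    renewal_case = BugReport(
        affected_clients=3,
        has_workaround=False,
        vip_score=4,
        mission_critical=True,
        component_weight=2,
    )
    print(fix_policy(bug_score(renewal_case)))
```

The point of the design is that the argument moves from the individual ticket to the formula: teams debate the weights once every few months instead of debating every bug.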
The result of this step was that every bug in JIRA has a clear fix policy. Statistically, we found that 95% of the automatically calculated fix policies were accepted by the client-facing team, the technical support team, and the R&D organization. We also added a JIRA process through which anyone can dispute the automatically calculated score.
Monitor bug-fixing progress
The last step was to create visualizations and push/pull calls to action around our fix policies.
JIRA dashboards and Data Studio reports present all the needed information, including aging bugs, a policy-meeting grade (at the team level), and bug counts (created and solved).
Push notifications, via a Slack channel and email, were sent to the bug’s reporter and assignee whenever a bug wasn’t progressing at the right pace. We allowed the bug’s assignee a maximum of 3 days to dispute the “fix policy” and/or to confirm that the bug was indeed theirs. We alerted on “close to aging” bugs, and we nudged when a bug exceeded its fix policy.
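The alerting logic can be sketched roughly as follows. This is an assumption-laden illustration: the status names, the function, and the choice to warn once 80% of the fix window has elapsed are mine for the example, and the actual delivery to Slack or email is left out.

```python
from datetime import datetime, timedelta

# Fix windows for the time-boxed policies. "Incident (ASAP)" has no window
# and is assumed to be handled by a separate incident process.
FIX_WINDOWS = {
    "48 hours": timedelta(hours=48),
    "1 week": timedelta(days=7),
    "30 days": timedelta(days=30),
}

# Assumption: a bug counts as "close to aging" once 80% of its window is used.
WARN_FRACTION = 0.8


def nudge_status(opened_at: datetime, policy: str, now: datetime) -> str:
    """Classify an open bug as 'on track', 'close to aging', or 'breached'."""
    window = FIX_WINDOWS[policy]
    elapsed = now - opened_at
    if elapsed > window:
        return "breached"           # nudge reporter and assignee
    if elapsed >= window * WARN_FRACTION:
        return "close to aging"     # early warning
    return "on track"
```

A scheduled job could call `nudge_status` for every open bug and send a message only when the status is not "on track", which keeps the channel quiet for bugs that are progressing normally.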
At both companies, these steps took a few weeks to deploy. As a result, all teams were internally aligned on our bug-fixing policies. Communication and overhead around bugs were significantly reduced, and people were motivated to solve bugs on time.
The quality of our product was much better. Clients praised our communication around bug fixing (time to first response, time to respond) and the actual handling (time to solve). Our NPS was higher.
As a VP of R&D, I needed only a little buy-in for this process and then almost zero micromanagement (automate everything you can!). I could monitor everything, share it with my peers, and quickly deep-dive into a specific bug when needed.