Project Overview
- I started this project while I was still in the IBM Research business unit and exploring a move to a product business unit. It became my bridge between the two and my chance to show that I could bring the best of both worlds.
- The project sat under the CTO team in the product business unit, which is now part of Kyndryl. It was a high-impact project aimed at preventing the high costs of customer attrition caused by incidents.
- The idea has two parts: flag code changes that are likely to cause incidents, based on past patterns, and, once an incident is reported, identify and rank within a few minutes the code changes that could have caused it.
- The team consisted of three technical members (including me) and one domain expert on incident codes and reporting flows, each in a different country and time zone.
- I took ownership of the multi-level similarity detection module that compares code change descriptions with incident descriptions, of computing additional derived features for the combined model, and of drafting and presenting the disclosure submission for this idea, which is now a patent submission with the USPTO.
- For both internal and client use of the system, we had to present how it works, along with metrics, savings potential, and real-time efficiency, to several groups, because it affected operational efficiency for many entities.
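To make the similarity idea from the bullets above concrete, here is a minimal sketch of scoring code change descriptions against an incident description at two levels (word tokens and character trigrams) and ranking the changes by the combined score. It is an illustration only, not the module built on the project: the function names, the Jaccard measure, the equal weighting, and the sample descriptions are all assumptions.

```python
import re


def tokens(text):
    """Word-level view: the set of lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def char_ngrams(text, n=3):
    """Character-level view: n-grams over lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(change_desc, incident_desc, word_weight=0.5):
    """Weighted mix of the two levels (the 0.5 weight is an assumption)."""
    word = jaccard(tokens(change_desc), tokens(incident_desc))
    char = jaccard(char_ngrams(change_desc), char_ngrams(incident_desc))
    return word_weight * word + (1 - word_weight) * char


def rank_changes(changes, incident_desc, top_n=3):
    """Return the ids of the top_n changes most similar to the incident."""
    scored = [(similarity(desc, incident_desc), cid) for cid, desc in changes.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:top_n]]


# Hypothetical sample data for illustration.
changes = {
    "c1": "Update login timeout handling in auth service",
    "c2": "Fix typo in README",
    "c3": "Refactor billing report export",
}
incident = "Users report login timeout errors in auth service"
top = rank_changes(changes, incident)
```

With this sample data, `c1` shares the most words and trigrams with the incident text, so it is ranked first.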
Skills
The bridge between business units
While I had already worked on ML going into products at IBM Research, this was my first experience using ML for a use case with high monetization and cost-saving potential.
Real time impact
The risk-based cost mitigation this system provided had to run in near real time. Because the data volume was huge, we needed a multi-stage filtering system, including clustering to group and rank several code changes together, before dividing and conquering.
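A staged filter of this kind might look like the sketch below: a cheap time-window filter first, then grouping, then scoring one representative per group to prune whole groups, and only then scoring every surviving change. This is a simplified illustration under stated assumptions, not the production design: grouping by a `component` field stands in for real clustering, and the field names, the 48-hour window, and the choice of the most recent change as representative are all hypothetical.

```python
from collections import defaultdict


def candidate_window(changes, incident_time, window_hours=48):
    """Stage 1: keep only changes deployed within the window before the incident."""
    return [c for c in changes
            if 0 <= incident_time - c["deployed_at"] <= window_hours]


def cluster_by_component(changes):
    """Stage 2: group changes; grouping by component is a stand-in for clustering."""
    clusters = defaultdict(list)
    for c in changes:
        clusters[c["component"]].append(c)
    return list(clusters.values())


def rank_candidates(changes, incident_time, score, keep_clusters=2):
    """Stage 3: prune clusters using one representative each, then rank survivors."""
    clusters = cluster_by_component(candidate_window(changes, incident_time))

    def representative_score(members):
        # Assumption: the most recently deployed change represents its cluster,
        # so the expensive score runs once per cluster at this stage.
        newest = max(members, key=lambda c: c["deployed_at"])
        return score(newest)

    kept = sorted(clusters, key=representative_score, reverse=True)[:keep_clusters]
    survivors = [c for members in kept for c in members]
    return [c["id"] for c in sorted(survivors, key=score, reverse=True)]


# Hypothetical data: times are hours on a shared clock, scores are precomputed.
changes = [
    {"id": "a", "component": "auth", "deployed_at": 90},
    {"id": "b", "component": "auth", "deployed_at": 95},
    {"id": "c", "component": "billing", "deployed_at": 80},
    {"id": "d", "component": "ui", "deployed_at": 10},
    {"id": "e", "component": "search", "deployed_at": 99},
]
scores = {"a": 0.9, "b": 0.4, "c": 0.2, "d": 0.8, "e": 0.5}
ranked = rank_candidates(changes, incident_time=100, score=lambda c: scores[c["id"]])
```

The point of the staging is that the expensive scoring function runs on a handful of representatives and survivors instead of on every code change in the history.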
Selling the solution
With the amount of interest in and visibility of this project came high demands on metrics such as top-n ranking accuracy, throughput, and latency, as well as on security. We worked to improve these, and we set up and demonstrated the system on client systems for clients who could not move or share information about their code outside their own environment.
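For reference, top-n ranking accuracy is commonly computed as the share of incidents whose true causing change appears among the first n ranked candidates. The sketch below shows that calculation; the function name, the data layout, and the sample ids are assumptions for illustration, not the project's evaluation code.

```python
def top_n_accuracy(ranked_by_incident, true_cause_by_incident, n=3):
    """Fraction of incidents whose true cause is within the first n ranked changes."""
    if not ranked_by_incident:
        return 0.0
    hits = sum(
        1
        for incident_id, ranked in ranked_by_incident.items()
        if true_cause_by_incident.get(incident_id) in ranked[:n]
    )
    return hits / len(ranked_by_incident)


# Hypothetical rankings and ground truth for four incidents.
ranked_by_incident = {
    "i1": ["c1", "c2", "c3"],
    "i2": ["c4", "c5", "c6"],
    "i3": ["c7", "c8", "c9"],
    "i4": ["c10", "c11"],
}
true_cause_by_incident = {"i1": "c2", "i2": "c6", "i3": "c0", "i4": "c10"}

top2 = top_n_accuracy(ranked_by_incident, true_cause_by_incident, n=2)
top3 = top_n_accuracy(ranked_by_incident, true_cause_by_incident, n=3)
```

Reporting the metric at several values of n shows how much of the benefit depends on engineers reading past the first candidate.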