Project Overview

  • This project was an experimental direction within an ongoing collaboration between the lab and a local hospital
  • We had already built several other ML models on the hospital's data and on open data sources (available after HIPAA training), covering tasks such as protein clustering and predicting a patient's length of stay in the hospital from predictor variables
  • After we had explored all the other sources and features, the discharge summaries were still lying untouched and unexplored
  • At that point, no one else in the lab had much knowledge of NLP techniques, nor had the bandwidth to explore this path
  • So I took it upon myself to explore this path and discovered that representing text as vectors is a whole research topic in itself, before even training models on that representation
  • I then explored a hybrid embedding technique combining character-level and word-level representations, to capture context while staying tolerant of the misspellings and typos in the discharge summaries
  • The dataset consisted of delimiter-separated, padded sentences from the discharge summaries, each multi-labelled with the medical entities present in it, such as an adverse event, a drug, a medical procedure etc.
  • The entity labeling was done using cTAKES, a rule-based system. The idea behind using this silver-standard data was to show that, with the right representation of language, the NLP model can learn to go beyond the rules and recognize new ways to identify entities
  • With this embedding technique and a multi-label classification setup, a biLSTM was trained to label each sentence with the entities identified in it. Later, when a question such as "What drugs has this patient taken?" is asked, all the sentences labelled with the drug entity are returned
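
The ideas in the list above can be illustrated with a small, hedged sketch. This is not the original implementation: the hashed character trigrams are only a toy stand-in for the trained hybrid embedding, no biLSTM is trained, and the label names, example sentences and helper functions (char_ngrams, hybrid_vector, encode_labels, sentences_with) are all hypothetical. It only shows why a character-level view tolerates a misspelling that a word-level lookup misses, and how multi-hot sentence labels can answer a question such as "What drugs has this patient taken?".

```python
import zlib
from dataclasses import dataclass
from typing import List

# Hypothetical label set; the write-up names adverse events, drugs and procedures.
LABELS = ["adverse_event", "drug", "procedure"]


def char_ngrams(word, n=3):
    """Set of character n-grams of a word, with boundary markers."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_overlap(a, b):
    """Jaccard overlap between the character trigram sets of two words."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def hybrid_vector(word, vocab, char_dim=16):
    """Toy hybrid representation: one-hot word part + hashed trigram counts."""
    word_part = [1.0 if word.lower() == v else 0.0 for v in vocab]
    char_part = [0.0] * char_dim
    for gram in char_ngrams(word):
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        char_part[zlib.crc32(gram.encode("utf-8")) % char_dim] += 1.0
    return word_part + char_part


@dataclass
class LabelledSentence:
    text: str
    labels: List[int]  # multi-hot vector aligned with LABELS


def encode_labels(entities):
    """Turn a set of entity names into a multi-hot vector."""
    return [1 if name in entities else 0 for name in LABELS]


def sentences_with(label, sentences):
    """Return the text of every sentence tagged with the given label."""
    idx = LABELS.index(label)
    return [s.text for s in sentences if s.labels[idx] == 1]


# Made-up sentences; in the project the labels came from the trained model.
summary = [
    LabelledSentence("Patient was started on metoprolol.",
                     encode_labels({"drug"})),
    LabelledSentence("Developed a rash after amoxicillin.",
                     encode_labels({"drug", "adverse_event"})),
    LabelledSentence("Underwent appendectomy without complications.",
                     encode_labels({"procedure"})),
]

drug_sentences = sentences_with("drug", summary)
misspelled = hybrid_vector("metroprolol", vocab=["metoprolol", "rash"])
```

Here the misspelled "metroprolol" gets an all-zero word-level part because it is out of vocabulary, while its character-level part still shares most trigrams with "metoprolol". In the real setup, a learned embedding and the biLSTM classifier would take the place of these hand-built features and hand-assigned labels.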

Skills

Treading into the world of NLP

This was a purely exploratory step that I went into without many expectations, but it got me so interested and excited that I ended up presenting my work at a conference, which later led to a research internship and then a full-time role. My first dance with NLP would later evolve into a career

Importance of the right representation

I discovered that representation learning has an entire research stream, conferences and demand of its own. Investing in this research and learning early on allowed me to build custom models and embedding algorithms that can adapt to new domains and business problems, work that I would go on to publish and patent

Discovering entire markets and industries powered by NLP and ML solutions

At the time I did this project, NLP was still at a very nascent stage. But getting involved at that stage and talking about my work connected me to the booming tech and research community applying these techniques to critical problems involving highly expressive, rich human language