Project Overview

  • This project was explored as part of the Watson Health offerings, to embed knowledge from ontologies that have been carefully curated over the years
  • Embeddings can learn the context of and between two words, provided there is sufficient data to establish that context. However, they still cannot capture ordering relations such as is-a and part-of (see the sketch after this list)
  • In the medical domain, these relationships are found mostly in ontologies such as MeSH and SNOMED-CT, with the latter having greater depth of ordering relationships than the former
  • Embedding this information while representing texts gives the NLP systems built on these embeddings generalization and specialization abilities with respect to diseases and treatments
  • I worked on this project with one other researcher, and we started with its application in the medical domain
  • Later we tested its efficacy on generic English as well, using WordNet, and published the findings
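
As a minimal illustration of the kind of ordering relations involved (the concept names and the ancestors helper below are hypothetical examples for this sketch, not the project's actual data or code), an ontology fragment can be viewed as a directed graph of is-a edges, and generalization amounts to walking those edges upward:

    # Hypothetical sketch: ordering relations as a directed graph.
    # Each edge points from a more specific concept to a more general one.
    IS_A = {
        "viral pneumonia": "pneumonia",
        "pneumonia": "lung disease",
        "lung disease": "disease",
    }

    def ancestors(concept, relation=IS_A):
        """Yield increasingly general concepts by following is-a edges."""
        while concept in relation:
            concept = relation[concept]
            yield concept

    # "viral pneumonia" generalizes to pneumonia -> lung disease -> disease,
    # an ordering that co-occurrence statistics alone do not expose.
    print(list(ancestors("viral pneumonia")))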

Skills

Importance of using knowledge bases

This project showed how important it is to pause and use existing knowledge in the right way, so that ML systems learn efficiently and in the right direction, rather than just expecting them to figure it out on their own

Graph structure and efficient programming

One of the main selling points of this work was its efficient computation compared to larger embedding models. For this I had to learn to code efficiently, with several case handlers and yield-based generators for the graph structures we worked with
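
A rough sketch of what that generator-based style looks like (the graph representation and the descendants function below are illustrative assumptions, not the project's actual code): traversing an ontology graph lazily with yield lets consumers stop early instead of materializing every path up front.

    from collections import deque

    def descendants(graph, root):
        """Lazily yield all concepts below `root`, breadth-first.

        `graph` maps a concept to the list of its direct children
        (its more specific successors). Visited concepts are tracked
        so shared subtrees are emitted only once.
        """
        seen = {root}
        queue = deque(graph.get(root, []))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            yield node
            queue.extend(graph.get(node, []))

    # Consumers can break out early without paying for the full traversal.
    graph = {"disease": ["lung disease"], "lung disease": ["pneumonia"]}
    for concept in descendants(graph, "disease"):
        print(concept)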

Successful Brainstorming

This was one of the most intensive projects I worked on, in terms of the number of new directions explored to arrive at this algorithm. The teammate I worked with was a seasoned researcher with a unique skill for running brainstorming sessions that open up many possibilities and yet add clarity. I was able to take part in these sessions and pick up some of those skills