Conceptualizing health and illness through word embeddings

Twenty minutes into her conversation with a patient diagnosed with irritable bowel syndrome (IBS), Dr. Zurcher realizes that she and her patient aren’t on the same page at all. With her own concept of “IBS” in mind, she tries her best to convey that IBS is a syndrome characterized by constipation and/or diarrhea. Her patient, on the other hand, is less interested in discussing his constipation or his IBS medication than in bringing to his doctor’s attention his crippling social anxiety, which disrupts his life far more than any of his gastrointestinal complaints. Dr. Zurcher’s grasp of IBS as a diagnosis established according to the Rome III criteria, while medically sound, has little to do with her patient’s conceptualization of his illness, and unless she appreciates this, the encounter is unlikely to be productive.

As much as medical schools and residencies train physicians to listen carefully to their patients, physicians invariably approach the patient encounter with an agenda (to document a patient encounter, generate ICD-10 codes, and establish a problem list and plan) that doesn’t always coincide with a patient’s agenda.

To better understand how my patients conceptualize health and illness, I trained gensim’s word2vec implementation on 2 million disease-specific tweets. The beauty of this method is its capacity to uncover both obvious and less obvious semantic relationships among words. I challenge healthcare professionals to contrast their own understanding of disease with their patients’ conceptualizations of illness.
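For readers curious about what goes into such a model: word2vec trains on lists of tokens, one list per document, and that is the shape gensim’s `Word2Vec` class accepts as its `sentences` argument. The preprocessing used for these tweets isn’t described here, so the sketch below is only one plausible way to get from raw tweets to token lists. The cleanup rules, the `tokenize_tweet` helper, and the example tweets are all assumptions made for illustration.

```python
import re

# Assumed cleanup rules; the preprocessing actually used is not described.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize_tweet(text):
    """Lowercase a tweet, drop URLs and @mentions, and split it into word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text)


# Made-up example tweets, for illustration only.
tweets = [
    "My IBS flares up every time my anxiety does @friend https://example.com/x",
    "Heart failure clinic today. Exhausted.",
]

# One token list per tweet: the input shape a word2vec trainer expects.
sentences = [tokenize_tweet(t) for t in tweets]
```

A real corpus would likely need more care than this (hashtags, emoji, and multi-word terms such as “heart failure” in particular), but the output shape stays the same.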

Try searching for “heart failure”, “obesity”, “alcohol”, or “IBS”, for example. Each query returns its 10 nearest neighbors (semantically and/or lexically) in the 100-dimensional embedding space, along with each neighbor’s cosine similarity to the query term. The closer the similarity is to 1.0, the closer the two terms sit in that space.
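To make the ranking concrete, here is a minimal sketch of a cosine-similarity nearest-neighbor lookup in plain Python. It is not the code behind the search box (gensim offers this kind of lookup through its `most_similar` method). The function names and the toy 3-dimensional vectors are invented for illustration, whereas the real model uses 100 dimensions learned from the tweets.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, vectors, topn=10):
    """Return the topn (word, similarity) pairs closest to query, best first."""
    q = vectors[query]
    scored = [
        (word, cosine_similarity(q, vec))
        for word, vec in vectors.items()
        if word != query  # a word is trivially its own nearest neighbor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors invented for illustration; they are not taken from the trained model.
toy_vectors = {
    "ibs": [0.9, 0.1, 0.3],
    "anxiety": [0.8, 0.2, 0.4],
    "constipation": [0.7, 0.0, 0.1],
    "marathon": [0.0, 1.0, 0.1],
}

neighbors = nearest_neighbors("ibs", toy_vectors, topn=2)
for word, score in neighbors:
    print(f"{word}\t{score:.3f}")
```

Because cosine similarity depends only on the angle between two vectors, not their lengths, a frequent word and a rare word can still score close to 1.0 if they appear in similar contexts.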
