Carly, a voice-activated health coach for Amazon Echo


Since my team won MIT’s Hacking Medicine hackathon two years ago with an app that generates structured documentation from an unstructured patient-doctor interaction (by passively listening to and watching the encounter), I’ve taken on the challenge of natural user interfaces. Recently, APIs such as Google’s Web Speech API, which I’ve used in academic research, have proven able to convert speech to text with enough fidelity to be useful in real-life applications. With devices like Amazon Echo and wide-range microphones capable of discerning speech through ambient noise, we’ve cleared the second major hurdle in natural user interfaces.

Now it’s time to start bidding farewell to keyboards, mice, and ugly software that looks like an Excel spreadsheet. Instead of scrolling through a mind-numbing list of vital signs and lab values in an electronic health record, a provider should simply be able to say: “How high did John Doe’s blood pressure get in the past 4 hours?” or “Trend Jane Doe’s creatinine level over the past week.” This is what I mean by a natural language interface: software that allows humans to interact with it the way humans think, which is through natural speech.

If you have an Amazon Echo, check out my new Alexa Skill, Carly, a voice-activated health coach for Amazon Echo. The next stop will be introducing doctors to Carly and the joy of natural user interface.

Conceptualizing health and illness through word embeddings

Twenty minutes into her conversation with a patient with a diagnosis of irritable bowel syndrome (IBS), Dr. Zurcher realizes that she and her patient aren’t at all on the same page. With her own concept of “IBS” in mind, she tries her best to convey the fact that IBS is a syndrome characterized by constipation and/or diarrhea. Her patient, on the other hand, is less interested in discussing his constipation or medication for IBS than in bringing to his doctor’s attention his crippling social anxiety, which disrupts his life far more than any of his gastrointestinal complaints. Dr. Zurcher’s grasp of IBS as a diagnosis established according to the Rome III criteria, while medically sound, has little to do with her patient’s conceptualization of his illness, and unless she appreciates this, the encounter is unlikely to be productive.

As much as medical schools and residencies train physicians to listen carefully to their patients, physicians invariably approach the patient encounter with an agenda (to document a patient encounter, generate ICD-10 codes, and establish a problem list and plan) that doesn’t always coincide with a patient’s agenda.

To better understand how my patients conceptualize health and illness, I trained gensim’s word2vec implementation on 2 million disease-specific tweets. The beauty of this method is its capacity to uncover both obvious and less obvious semantic relationships among words. I challenge healthcare professionals to contrast their understanding of disease with their patients’ conceptualizations of illness.

Try searching for “heart failure”, “obesity”, “alcohol”, or “IBS”, for example. Each query returns the 10 semantically and/or lexically nearest neighbors in 100-dimensional space, along with their cosine similarity to the query term. The closer the similarity is to 1.0, the nearer the two terms lie in that space.
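Under the hood, a nearest-neighbor query like gensim’s ranks every vocabulary term by cosine similarity to the query vector. Here is a minimal sketch of that lookup; the toy 3-dimensional embeddings are invented for illustration (the trained model uses 100 dimensions and a vocabulary drawn from 2 million tweets):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest_neighbors(query, embeddings, topn=10):
    """Rank every other vocabulary term by cosine similarity to the query term."""
    q = embeddings[query]
    scored = [(term, cosine_similarity(q, vec))
              for term, vec in embeddings.items() if term != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy, made-up embeddings: medically related terms point in similar directions.
toy = {
    "obesity":  [0.9, 0.1, 0.0],
    "diabetes": [0.8, 0.2, 0.1],
    "opera":    [0.0, 0.1, 0.9],
}
print(nearest_neighbors("obesity", toy, topn=2))
```

In the toy vocabulary, “diabetes” comes back with a similarity near 1.0 and “opera” near 0, which is exactly the behavior the real queries above exhibit at scale.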


Machine Understanding

I’m at the opera house watching The Nutcracker. Toward the end of Act II, Scene 1, one of the lead ballerinas stumbles, nearly falling over. The audience falls silent, but before anyone can grasp what’s happening, she leaps into her role again. Thunderous applause follows the curtain’s fall, despite the less than perfect rendition. In the next scene, a robot replaces Ms. Akhmatova, and robo-ballerina executes an immaculate interpretation of the “Dance of the Sugarplum Fairy.”

As I wait in the long line to the men’s restroom in this fictional opera house, I ask myself which of the ballerinas, Ms. Akhmatova or robo-ballerina, has a better grasp of the ballet. If the essence of a ballet lies in its execution, does the robot, with its flawless performance, “understand” the ballet more completely than Ms. Akhmatova, whose occasional missteps fail to escape the seasoned observer?

With “Recursive Neural Networks Can Learn Logical Semantics”, Samuel Bowman, Christopher Potts, and Christopher Manning successfully trained recursive neural networks (RNNs) to apply logical inference to natural language. Like many other pivotal scientific works, its significance won’t be fully appreciated except in retrospect. Machine learning (ML) researchers have been applying neural networks (NNs) to a variety of problems, from image recognition to signal processing, but as a student of natural language processing, I found that this work renewed my faith in neural networks’ capacity to live up to the term “deep learning” and uncover profundity in data.

There is a tendency among non-technical admirers of ML to regard these methods as beyond their creators: independent entities that will one day, given refined enough algorithms and enough energy, out-comprehend their human creators and overwhelm humanity with their artificial consciousnesses. The term “neural networks” is itself a misnomer that doesn’t at all reflect the elaborate complexity of how human neurons represent and acquire information; it’s simply a term for nonlinear classification algorithms that began catching on once the computing power to run them emerged.

The question of whether Samuel Bowman’s NN, or the robo-ballerina in the opening scenario, is capable of “understanding” is largely a theoretical concern for the ML practitioner, who spends the bulk of his or her time curating manually labeled data and fine-tuning a neural classifier with methods (or hacks) such as dropout, stochastic gradient descent, convolution, and recursion to increase its accuracy by a few fractions of a percentage point. Ten or twenty years from now, I imagine we’ll be dealing with a novel set of ML tools that will evolve with the rise of quantum computing (the term “machine learning” will probably be ancient history, too), but the essence of these methods will probably remain: to train a mathematical model to perform task X while generalizing its performance to the real world.

I don’t mean to detract from the brilliance of Sam Bowman’s work. I don’t remember the last time a scientific paper excited me so much (in contrast to the medical literature, with its mantra of randomized control trials and cohort studies), and I can’t help but let my imagination wander at the thought that an RNN can actually learn logical inference. As exciting as I find Bowman et al.’s paper, it also led me to grapple with a hairy question: What is understanding, and what is mimicry? Trying to answer this question (without using the word “consciousness”) led to a great deal of mental turmoil that culminated in the writing of this essay.

Professor Timothy Winters, a philosopher from Oxford University, praised man’s ability to name as his greatest gift. Implicit in this statement, I think, is man’s ability to conceptualize. When I call the energy illuminating my desk lamp “electricity,” I’m not just associating a phonetic time series with my halogen bulb’s white glow; I’m also instantiating an abstract class of natural phenomena and associating with it a body of hypotheses (for instance, Ohm’s Law and Kirchhoff’s circuit laws). Had I called it “light” instead of “electricity”, I would have been operating under a different set of hypotheses using different mental schemata.

So what is understanding? To understand is to admit that one doesn’t comprehend anything at all. To understand is to use our uniquely human ability to create mental schemata of the world, models of how things and people interrelate, and to systematically test and revise these hypotheses. These models might be inspired by a combination of personal experience, bodies of scientific thought, religion, or spirituality, but they are models nevertheless, subject to change, and we ought to accept them as such, lest we one day discover our worlds to be as brittle as the models themselves.

My understanding of people as inherently good, my understanding of myself as a member of society with a moral duty to serve others, and my belief in human reason are models subject to change based on my own experiences and the experiences of those who influence me. The word “understand” is itself utopian, an attempt at an ultimately impossible feat.

5 years in pursuit of meaning

There is a zen to hard work that leaves one too weary to think deeply about anything. I’ve spent the past week working 14-16 hour days, caring for patients and not writing code. It was a typical week on the medical wards. These are days of fasting, days that leave me spiritually satisfied but intellectually starved.

But today I’m rested and at peace with my thoughts for the first time in a long time. A question that has been whirling about in my mind’s undercurrents for years now resurfaces, as it does on days like these, bobbing up and down, restlessly spinning on its way downstream. Cool mist sweeps over San Francisco; my French press stands at the edge of my desk, the smoky taste of dark coffee lingers on my tongue.

I sift through the bold, curly lines my Uniball pen leaves on my Moleskine’s thick pages in pursuit of meaning. This act is mechanically easier when I write on paper than when I type on my PC, but the search is just as fruitless. I comb through words and brush them off the notebook’s lined pages until I’m staring at a blank page, and I start to make out an image of a cold, overcast day in October 2012. I’m reading Wittgenstein’s Tractatus logico-philosophicus in the original German on the balcony of my Berlin apartment. A woman next to me sips her coffee and lights a cigarette. All around us Berlin is perpetually becoming but never being [1]. I blink, and a new skyscraper appears. The young woman puts out her cigarette, and passengers exit a new train station that wasn’t there moments ago.

Wittgenstein’s words overtook me like a hallucinogen, profoundly changing the way I would think and perceive the world thereafter. The zen-like opening line “Die Welt ist alles, was der Fall ist” (“The world is everything that is the case”) leads into crisp deductive reasoning that uses logic to piece together a Weltbild as sound and beautiful as a diamond. The truth is in logic, I thought: that unadulterated fabric holding the world together, free, unbiased, untainted by language, loyal to no school of thought and no civilization of the Occident or Orient. And so I felt, for the subsequent years, that I had stumbled upon something extraordinary. Pull the fabric here, and this happens. Pull it there, and that will happen.

My faith in logic and love for words led me to the discipline of “natural language processing”, a term I grow to dislike the more experienced I become in the field. Nearly every day for the past year, I’ve spent a few hours dipping my bucket in the endless ether, collecting data and running calculations. The results themselves are scientifically interesting, but the more data I have, the more removed I feel from that original goal of understanding semantics: to hold the word “coffee”, squeeze it between my fingers, and watch the dark drops of meaning stain my pages, drops whose coarse texture I can feel between my fingers and whose bitterness I can taste on my tongue, smear them across the page, and say: “Here it is! Here is the meaning of the word!”

Five years into my pursuit of meaning and I catch myself in a free-fall, grasping for “it” but reaching only that logical fabric connecting words with one another. I can hardly even make out the individual words. Dangling from the fabric holding together “cool” and “mist”, my fingers cramp, my muscles ache…I can’t hold on any longer…

…I fall…

…and catch myself on the fabric connecting “sweeps” and the prepositional phrase “over San Francisco.” My fingers slip, and I fall again…This continues, again and again and again, until I begin to wonder, in my exhausted delirium, whether words themselves are entirely devoid of meaning. Does meaning lie in associations between words rather than the words themselves? To know “Omar Metwally” the hacker, the physician, Twitter handle “osmode” is not to know Omar Metwally at all. But to know Omar Metwally, the son of Moustafa Metwally, the husband of Marwa El-Hamidi, the father of Ismail, the neighbor of Evgeniy, is to begin to know him — as a node in a web of inter-relationships, and it is these inter-relationships, I believe, which correspond to “meaning” as we understand it.

I’ve read Kafka’s The Metamorphosis at least a dozen times in English and German [2]. When meaninglessness overwhelmed me, I turned to Kafka’s writing for its rich, layered meaning, each sentence woven to the preceding and succeeding ones by time (the few seconds it takes to read each flowing sentence) and space (their arrangement on the page). In pursuit of meaning, I unravel The Metamorphosis, splitting the story into sentences, breaking its spatial and temporal semantic bonds, and reconnect them based on lexical similarity (that is, how many words they share) [3,4]. Kafka must be rolling in his grave now; forgive me for the sake of this thought experiment.
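The unraveling can be sketched roughly like this: split the text into sentences, strip common words, and reconnect sentence pairs whose remaining vocabulary overlaps. The stopword list and similarity threshold below are illustrative stand-ins, not the actual values used (the real code is available by email, per note [4]):

```python
import re
from itertools import combinations

# Illustrative stand-in for the 500 commonest English words excluded in [3]
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "his", "it", "as", "at"}

def content_words(sentence):
    """Lowercase word set with common words removed."""
    return set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS

def lexical_similarity(s1, s2):
    """Jaccard overlap between two sentences' content words."""
    t1, t2 = content_words(s1), content_words(s2)
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)

def reconnect(sentences, threshold=0.2):
    """Link sentence pairs by shared vocabulary instead of narrative order."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(sentences), 2):
        sim = lexical_similarity(a, b)
        if sim >= threshold:
            edges.append((i, j, sim))
    return sorted(edges, key=lambda e: e[2], reverse=True)
```

Sorting the edges by similarity puts the most lexically entangled sentence pairs first, and it is this ordering that replaces the story’s original temporal and spatial bonds.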




What new meaning, if any, does this text now have? Certainly not the literary grace it once carried; gone is the melancholic apartment Gregor Samsa shared with family until the day he woke to find himself a cockroach. Gone is his angry boss, his family, his miserable job as a traveling salesman — as Gregor the cockroach observed it from the cold walls of his former home. The above looks more like a story written by a search engine. In place of that sad apartment, which Gregor’s family rented out to strangers to support themselves (now that their son-turned-cockroach became unable to help the family pay off its debt), is an ugly, urban mess: apartment buildings filled with people who don’t know each other and don’t want to know each other, buildings connected by fiber optic cables, high-speed rails, and crowded streets.

The result is far from meaningless, but it certainly lacks meaning in the sense that it once carried, as it exists in the crumbling yet very much living pages on my bookshelf.

My attention wanders across the bookshelf, to a 3-ring notebook from my first-year linguistics seminar on discourse, which I had the privilege of attending with Professor John Swales, a pioneer of the field (and one of the most cultured Englishmen I have ever met). My semester project was “An interdisciplinary examination of textbook interactivity,” in which I analyzed the grammars of history, calculus, and chemistry textbooks to understand how grammatical structure correlates with a textbook’s perceived interactivity. I smile at the memory of spending Thanksgiving break during my first college semester at the University of Michigan circling second-person pronouns and manually counting words in textbooks. If I were to repeat the project in 2016 rather than 2003, I would probably write a Python script to do the task in a few seconds.

But there was something romantic about holing myself up in my apartment, watching that winter’s first snowfall, and circling words, as there was about the “natural language processing” that Professor Swales pioneered. He would cringe if he ever heard me describe his work as NLP, and in fact, his work on discourse is too artistic and not quantitative enough to be called NLP [5]. Yet it’s precisely because he is neither a machine learning practitioner nor a computer scientist that his work is so far-reaching in the linguistics community. He is the proverbial Englishman at the polo club who has traveled so much that his ears can recognize any Arabic dialect and poke fun at linguistic nuances that go over most of our heads.

My search for meaning continues somewhere between the statistical methods currently in vogue and Professor Swales’ softer, almost literary approach. Quantifying linguistics helps us identify patterns and test hypotheses, but sacrificing art on the altar of computation, as I hope this essay illustrates, can devolve into meaninglessness if we are not careful.

–Omar Metwally


[1] These are Schopenhauer’s words. He described the perpetually becoming but never being world (“die immer werdende aber nie seiende Welt”) in his Die Welt als Wille und Vorstellung.

[2] The English version of The Metamorphosis used for this experiment is from the Gutenberg Project.

[3] The 500 commonest English words (such as this, and, a, the,…) are excluded here.

[4] Email me for my code. I’m happy to share it.

[5] NLP and computational linguistics are different but overlapping fields, and linguistics itself is a very broad discipline. I will not get into that here but simply acknowledge these facts.


Trumping political discourse on Twitter


In this political netnography study, my goal was to map clusters of conversation about Donald Trump on Twitter. I randomly selected subsets of tweets from a database of ~20,000 tweets collected with the Twitter API by searching for “Donald Trump” over the past week. I vectorized these tweets and calculated the cosine similarity between them. Using the Python networkx module, I created edges between all nodes with cosine similarity >= 0.65. All tweets containing the words “Muslim” and/or “Islam” are colored red.
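The pipeline above can be sketched as follows. The raw term-count vectorizer and the threshold default are my own simplifications (the original may have weighted terms differently), but the edge criterion and the red coloring follow the description:

```python
import math
from collections import Counter

import networkx as nx

def vectorize(text):
    """Bag-of-words term counts for one tweet (a simplification)."""
    return Counter(text.lower().split())

def cosine(c1, c2):
    """Cosine similarity between two term-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def build_tweet_graph(tweets, threshold=0.65):
    """Nodes are tweets; edges join pairs with cosine similarity >= threshold.
    Tweets mentioning Muslims or Islam get the red color used in the figure."""
    g = nx.Graph()
    vectors = [vectorize(t) for t in tweets]
    for i, t in enumerate(tweets):
        red = "muslim" in t.lower() or "islam" in t.lower()
        g.add_node(i, text=t, color="red" if red else "gray")
    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                g.add_edge(i, j)
    return g
```

Centrality and clustering coefficients can then be read straight off the graph (e.g., nx.clustering(g)), which is how the highly connected tweets listed below surface.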

The result is a lexical topography of Twitter discourse about GOP candidate Donald Trump. This method nicely clusters lexically related tweets: notice the tentacular pattern emerging from the center of the figure and the smaller clusters closer to the periphery. The closer to the center of the graph, the higher the centrality of the tweets (the more connected they are to other tweets). Interestingly, there is a lot of red in the central part of the figure, corresponding to tweets with high centrality and lexical power about Muslims and Islam.

Here are a handful of tweets with high clustering coefficients:

  • “seeking to alter jury selection, lawyer in terror case cites donald trump’s muslim remarks”
  • “trump digs the hole deeper as he justifies his creepy sexism by blaming hillary clinton”
  • “man how did this country even allow donald trump to run”
  • “this year will be remember by terrorist attacks and donald trump. what a time to be alive”
  • “tyrese blasts donald trump for bigotry on Instagram: watch america show you how we really feel”
  • “my mom said if donald trump becomes president we’re changing our names and moving to mexico”
  • “bernie sanders explaining what’s so dangerous about donald trump running for president”
  • “donald trump saw a boy who was lost in new york and didn’t tell anyone. is this the man we can trust as president?”
  • “gael garcia bernal: donald trump calls mexicans rapists and drug dealers”

As the saying goes, it doesn’t matter what they’re saying about you, as long as they’re talking about you. Provocative, Islamophobic comments seem to be kindling wood feeding the fire of Mr. Trump’s political campaign, winning him the attention he needs on social media.


— Omar Metwally