Research

Incorporating domain expertise into machine learning

Machine learning has the potential to help medical experts deliver better healthcare. There are, however, important technical challenges that need to be solved before we can develop reliable models for clinical practice, including: (1) a limited number of labeled instances, (2) uncertainty in the labels used during training, and (3) differences between the distributions that generated the training and test data. My research focuses on strategies for effectively applying machine learning under these circumstances.

For learning models from a limited number of labeled instances, we propose incorporating domain expert knowledge during the training process. This expertise can be encoded in the form of probabilistic labels, which provide more information per instance than the commonly used categorical labels, or by using machine learning to extend the medical models currently used by human experts. We demonstrate the effectiveness of probabilistic labels in three medical image classification tasks: diagnosing hip dysplasia, fatty liver, and glaucoma. We observed gains of up to 22% in classification accuracy compared with the use of categorical labels. We also show how to use machine learning to extend an SIR epidemiological model for predicting the evolution of the number of people infected with COVID-19, achieving state-of-the-art results in terms of mean absolute percentage error (MAPE) on data from the United States and Canada.
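As a minimal sketch of why probabilistic labels carry more information per instance than categorical ones (the numbers below are invented for illustration, not taken from our studies), compare the cross-entropy loss a model receives from a hard label versus a soft label encoding 70% expert confidence:

```python
import numpy as np

def cross_entropy(y_target, y_pred, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_target * np.log(y_pred), axis=-1)

# A categorical (hard) label collapses the expert's uncertainty to one class:
hard_label = np.array([0.0, 1.0])   # "diseased"
# A probabilistic label preserves the expert's confidence:
soft_label = np.array([0.3, 0.7])   # "70% sure it is diseased"

y_pred = np.array([0.4, 0.6])       # model output for one instance

print(cross_entropy(hard_label, y_pred))
print(cross_entropy(soft_label, y_pred))
```

Training against the soft label penalizes overconfident predictions on ambiguous cases, which is where the extra per-instance information comes from.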

For addressing the uncertainty around the labels, we use probabilistic graphical models. Instead of providing a point estimate, probabilistic models predict an entire probability distribution, which accounts for the uncertainty in the data. Probabilistic models are a key component of the probabilistic labels mentioned above, and they also allow the incorporation of human decision making for tracking the number of new infections when using machine learning with the SIR model.
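A toy illustration of the point-estimate versus distribution contrast, using a conjugate Beta-Bernoulli update (the counts and prior below are made up for illustration; this is not the model from our papers):

```python
import numpy as np

# Suppose 7 of 20 screened cases are positive. A point estimate reports
# 7/20 = 0.35 and nothing else; a probabilistic model reports a full
# posterior distribution over the rate, which quantifies uncertainty.
positives, n = 7, 20
a, b = 1.0, 1.0                              # uniform Beta(1, 1) prior

# Conjugate update: posterior is Beta(a + positives, b + negatives)
post_a, post_b = a + positives, b + (n - positives)
posterior_mean = post_a / (post_a + post_b)

# Uncertainty summarized via Monte Carlo percentiles of the posterior
rng = np.random.default_rng(0)
samples = rng.beta(post_a, post_b, size=100_000)
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"mean={posterior_mean:.3f}, 95% interval=({lo:.2f}, {hi:.2f})")
```

With only 20 observations the interval is wide, which is exactly the information a point estimate throws away.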

Finally, a consequence of training machine learning models with a limited number of labeled instances is that the training set might not be an accurate reflection of the data used during inference; in particular, the test set might not follow the same probability distribution that generated the training data. This means that a predictor learned from one dataset might do poorly when applied to a second dataset. This problem is known as batch effects or dataset shift, and approaches that correct for the discrepancies between these probability distributions fall under the umbrella term domain adaptation. Depending on the assumptions about what causes the discrepancy, these problems are studied under more specific names, such as covariate shift, class imbalance, etc.
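As a sketch of the covariate shift setting, the snippet below uses importance weighting, one standard correction: each training instance is reweighted by the ratio p_test(x) / p_train(x). Here both densities are known Gaussians, which is rarely true in practice; in real problems the ratio must itself be estimated.

```python
import numpy as np

# Toy covariate shift: training inputs follow N(0, 1), test inputs follow
# N(1, 1), but the labeling rule p(y | x) is shared between domains.
def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=5000)                  # source samples
w = gaussian_pdf(x_train, 1.0, 1.0) / gaussian_pdf(x_train, 0.0, 1.0)

# A weighted average over training data approximates the same average
# under the test distribution (whose true mean is 1.0).
shifted_mean = np.average(x_train, weights=w)
print(shifted_mean)
```

The same reweighting applied to a training loss yields an unbiased estimate of the test-domain risk, which is the basis of covariate-shift correction.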

Although all the applications in my research are related to the medical domain, we expect the techniques shown here to be applicable whenever: (1) expert knowledge can be encoded as probabilities, (2) there exists a parametric model currently used by domain experts for analyzing a phenomenon, and/or (3) there is a discrepancy between the source and target domains.

Related publications

Computational Psychiatry and machine learning

As part of the computational psychiatry group, most of my current research focuses on the application of machine learning techniques to the diagnosis of mental disorders. This is a very challenging task, even for medical experts, since there are still many unknowns about the brain and there is no standard biological test for diagnosing mental illnesses. The task is no easier for a computer. The number of labeled instances (patients or healthy controls) is usually in the tens or hundreds, while the dimensionality of the data collected (brain imaging, genetic data, clinical tests) is in the hundreds of thousands. This problem is known in machine learning as “small n, large p”. How to effectively learn in this scenario, and how to effectively combine data obtained from different modalities, is part of my current research.
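The “small n, large p” danger can be reproduced in a few lines. In the sketch below (all sizes are invented for illustration), a linear model perfectly fits training labels that are pure noise, so training accuracy tells us nothing about generalization:

```python
import numpy as np

# 40 subjects, 5,000 features: with p >> n, the minimum-norm least-squares
# solution interpolates the training labels even though the features carry
# no information about them at all.
rng = np.random.default_rng(0)
n, p = 40, 5000
X = rng.normal(size=(n, p))
y = rng.choice([-1.0, 1.0], size=n)          # labels independent of X

w, *_ = np.linalg.lstsq(X, y, rcond=None)    # minimum-norm interpolator
train_acc = np.mean(np.sign(X @ w) == y)

X_test = rng.normal(size=(200, p))
y_test = rng.choice([-1.0, 1.0], size=200)
test_acc = np.mean(np.sign(X_test @ w) == y_test)
print(train_acc, test_acc)                   # ~1.0 vs ~0.5 (chance)
```

This is why regularization, feature selection, and prior knowledge are not optional extras in this regime.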

Computational Psychiatry Group at University of Alberta

Batch effects in fMRI* data


* fMRI (functional magnetic resonance imaging) is a brain imaging technique that consists of taking a 3D picture of your brain every two or three seconds, for a period of a few minutes. Informally, you can think of it as a ‘movie’ that shows the activity of the brain. One way of analyzing the data is to create a ‘connection diagram’ that shows which parts of the brain are acting together.

Example of functional brain connectivity computed using fMRI
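As an illustrative sketch (not our actual pipeline), functional connectivity is often computed as the correlation matrix of the regional time series; the region count and scan length below are invented:

```python
import numpy as np

# Each brain region contributes one time series; regions whose series are
# correlated are "acting together" in the connection diagram.
rng = np.random.default_rng(0)
t, regions = 150, 4                           # 150 time points, 4 regions
ts = rng.normal(size=(t, regions))
ts[:, 1] = 0.8 * ts[:, 0] + 0.2 * ts[:, 1]    # make regions 0 and 1 co-activate

conn = np.corrcoef(ts, rowvar=False)          # regions x regions matrix
print(np.round(conn, 2))                      # entry (0, 1) is close to 1
```

The resulting matrix (or its vectorized upper triangle) is a common feature representation for the classifiers discussed below.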

Imagine that you are the manager of hospital A, and you want to use machine learning to build a classifier. Given an fMRI scan of a subject, you want to determine whether that subject has schizophrenia. You collect data from 100 patients and 100 healthy controls using your MRI scanner and build your classifier. You then test it in your hospital on new prospective patients, and you find that its accuracy is 70%.

Days later, you learn that your friend, who is the manager of hospital B, is also collecting data from people with schizophrenia and healthy controls at her hospital. She has collected data from 100 patients and 100 healthy controls, so you decide to share data to build a better classifier. You do it, but when you test this new classifier at your hospital, the accuracy is now 65%. What happened? Isn’t “the more data the better” one of the mantras of machine learning?

Batch effects are a phenomenon in which technical noise confounds the real biological signal. In this case, the technical noise is introduced by using different MRI scanners. The end result is that the probability distribution of the data from hospital A is different from the one from hospital B, which complicates the job of machine learning. Our current research consists of finding a representation of the data such that A and B follow the same probability distribution, so we can effectively combine them and increase accuracy relative to using data from a single site. Batch effects are closely related to transfer learning and domain adaptation.
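A minimal sketch of the harmonization idea, assuming the site effect is just a per-feature shift and scale (real harmonization methods such as ComBat model site effects per feature with empirical Bayes; this shows only the core intuition, on synthetic data):

```python
import numpy as np

# Hospital B's scanner adds an offset and a different gain, so the raw
# feature distributions of the two sites do not match.
rng = np.random.default_rng(0)
site_a = rng.normal(loc=0.0, scale=1.0, size=(200, 5))   # hospital A features
site_b = rng.normal(loc=2.0, scale=3.0, size=(200, 5))   # hospital B features

def standardize_per_site(x):
    """Remove each site's own mean and scale, feature by feature."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

pooled = np.vstack([standardize_per_site(site_a), standardize_per_site(site_b)])
print(pooled.mean(), pooled.std())   # close to 0 and 1: sites now comparable
```

After this per-site correction, pooling the two hospitals no longer lets the classifier exploit (or be confused by) the scanner signature.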

Related publications

Sparse graphical models for psychiatry


Graphical model showing the statistical relationship of some tests in people with depression.

For some applications, besides having good classification accuracy, we are interested in gaining some insight about the data. Or maybe you have some knowledge about the data and want your classifier to use this information. Imagine that you want to identify the differences between the ‘connectivity diagrams’ of people with schizophrenia and healthy controls. Graphical models are a tool that allows us not only to make predictions (inference), but also to identify statistical relationships between the nodes. In this example, the nodes would be brain regions.

For the case of continuous variables (such as brain connectivity), this problem is well studied. Several algorithms, such as graphical LASSO, are widely used to create these graphical models. In recent years, there has been some research on effective ways of learning these sparse models also for discrete or for mixed data (discrete and continuous). Right now I’m exploring these techniques and their applications to mental health problems. Mixed graphical models are one alternative for combining multimodal data (e.g. genetic data, which is discrete, and brain imaging data, which is continuous). Besides, more often than not, we have the problem of missing values in medical data. Probabilistic graphical models offer a good solution for dealing with missing values in an easy way via marginalization.
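The property that makes Gaussian graphical models useful here is that a zero entry in the precision (inverse covariance) matrix means the two variables are conditionally independent given all the others; graphical LASSO adds an L1 penalty so that, from limited data, these entries become exactly zero. A numpy-only sketch of the underlying fact, on synthetic chain-structured data:

```python
import numpy as np

# Build a chain X1 - X2 - X3: X3 depends on X1 only through X2, so the
# (X1, X3) entry of the precision matrix should be (near) zero even
# though X1 and X3 are marginally correlated.
rng = np.random.default_rng(0)
n = 200_000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)

cov = np.cov(np.vstack([x1, x2, x3]))
precision = np.linalg.inv(cov)
print(np.round(precision, 2))        # entry (0, 2) is close to 0
```

In a real study each node would be a brain region or clinical test, and the estimated zeros give the interpretable structure that plain classifiers do not provide.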

Related publications
  • Neurocognitive graphs of First-Episode Schizophrenia and Major Depression Based on Cognitive Features [Journal] [Preprint]
Unpublished reports (Course projects)

Analysis of EEG* data


* EEG (electroencephalography) is another modality for extracting data from the brain. It consists of placing electrodes on the scalp to get an estimate of the electrical activity in the brain.

Over the years, I have worked on several projects using EEG data, mainly in the context of the development of brain-machine interfaces using motor imagery tasks. In a motor imagery task, a person just “imagines” performing a certain action (e.g. moving the right hand) without actually doing it. The objective is to identify, using only information from the brain activity, the action that the person was imagining. This is relevant for people with severe physical disabilities, who cannot execute the actual movement. Most of my research in this area has been in collaboration with the group of Prof. Mauricio Antelis in Mexico. I have used Convolutional and Recurrent Neural Networks, as well as Hidden Markov Models, while working with this data.
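As a toy sketch of a classic motor-imagery feature (not a full BCI pipeline; all signal parameters below are invented): motor imagery attenuates mu-band (8-12 Hz) power over the motor cortex, so band power is a standard input to classifiers like the ones mentioned above.

```python
import numpy as np

fs = 250                                   # sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)            # one 2-second EEG epoch

def mu_band_power(signal, fs):
    """Average spectral power in the 8-12 Hz (mu) band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    band = (freqs >= 8) & (freqs <= 12)
    return spectrum[band].sum() / len(signal)

rng = np.random.default_rng(0)
# Rest: strong 10 Hz mu rhythm; imagery: the same rhythm, suppressed.
rest = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)
imagery = 0.2 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)

print(mu_band_power(rest, fs) > mu_band_power(imagery, fs))  # True
```

Stacking such band-power features across electrodes gives a compact representation that simple classifiers, as well as CNNs and HMMs operating on the raw signal, can exploit.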

Related publications
Unpublished reports (Course projects)