Catherine Yeo
Explaining Machine Learning Predictions & Building Trust with LIME
A technique to explain how black-box machine learning classifiers make predictions

It goes without saying: machine learning is powerful.
At the most basic level, machine learning algorithms can be used to classify things. Given a collection of cute animal pictures, a classifier can separate the pictures into buckets of ‘dog’ and ‘not a dog’. Given data about customer restaurant preferences, a classifier can predict what restaurant a user goes to next.
However, the role of humans in this technology is often overlooked. It does not matter how powerful a machine learning model is if no one uses it. And when a model offers little explanation or reasoning for how it arrived at its predictions, users who do not trust the model or a prediction will not use it.
“If the users do not trust a model or a prediction, they will not use it.”
As machine learning is deployed in ever more domains, such as medical diagnosis and recidivism prediction, the decisions these models make can have serious consequences. It is therefore of utmost importance to understand and explain how their predictions came to be, which in turn builds trust.
In their paper “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier”, Ribeiro, Singh, and Guestrin present a new technique to do so: LIME (Local Interpretable Model-agnostic Explanations). This post will summarize their findings and introduce LIME.
One Line Summary
LIME is a new technique that explains predictions of any machine learning classifier and has been shown to increase human trust and understanding.
Explaining Predictions

Figure 1 from the paper: a model predicts that a patient has the flu, and LIME highlights the symptoms that contributed to (or against) that prediction, helping the doctor decide whether to trust the model.
Why is explaining predictions useful?
Let’s look at the example use case of medical diagnosis. Given the patient’s symptoms and measurements, a doctor must make their best judgment as to what the patient’s diagnosis is.
Humans (both the doctor and the patient) are more willing to accept (i.e., trust) a diagnosis when they can bring their prior knowledge to bear on the reasoning behind it.
A model, with access to far more data and the ability to scale, has the potential to help a doctor even further. Adding an explanation into the process, as in the figure above, would then help humans trust and use machine learning more effectively.
What do explanations need?
1) The explanation needs to be interpretable.
An interpretable explanation provides a qualitative understanding of the relationship between the inputs and the output.
Interpretability must also take into account user limitations and target audience. It is not reasonable to expect a user to understand why a prediction was made if thousands of features contribute to that prediction.
2) The explanation needs to be locally faithful.
Fidelity measures how well the explanation approximates the model’s prediction. High fidelity is good; low fidelity makes an explanation useless. Local fidelity means the explanation must approximate the model’s behavior well in the vicinity of the instance being explained.
3) The explanation needs to be model agnostic.
We should always treat the original machine learning model as a black box. This puts non-interpretable and interpretable models on equal footing and adds the flexibility to explain future classifiers.
4) The explanation needs to provide a global perspective.
Rather than only explaining one prediction, we should select a few explanations to present to users such that they represent the whole model.
How does LIME work?
“The overall goal of LIME is to identify an interpretable model over the interpretable representation that is locally faithful to the classifier.”
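In the paper, this goal is written as a single optimization problem: the explanation for an instance x is ξ(x) = argmin_{g ∈ G} L(f, g, π_x) + Ω(g), where f is the black-box model being explained, G is a class of interpretable models (for example, sparse linear models), π_x measures how close a perturbed sample is to x, L measures how unfaithful g is to f in the locality defined by π_x, and Ω(g) penalizes the complexity of g so that explanations stay simple enough for humans.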
LIME boils down to one central idea: we can learn a model’s local behavior by varying the input and seeing how the outputs (predictions) change.
This is really useful for interpretability, because we can change the input in ways that make sense to humans (words, images, etc.), even if the model itself uses a more complicated representation of the data. We call this input-changing process perturbation. Examples of perturbation include adding or removing words and hiding part of an image.
Rather than trying to approximate the model globally, which is a daunting task, it is easier to approximate it locally (close to the prediction we want to explain). We do so by fitting an interpretable model to perturbed versions of the original input, where each perturbed sample is weighted by how similar it is to the original.
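To make the idea concrete, here is a minimal from-scratch sketch in Python for a text classifier. This is not the authors’ implementation (they released an open-source lime package); the predict_proba function, the sampling scheme, and the kernel width below are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_text(text, predict_proba, num_samples=1000, num_features=6, kernel_width=0.75):
    """LIME-style sketch: perturb a sentence by dropping words, query the
    black-box model, and fit a locally weighted linear surrogate."""
    words = text.split()
    d = len(words)

    # 1. Interpretable representation: a binary vector per sample,
    #    1 = word kept, 0 = word removed.
    z = np.random.binomial(1, 0.5, size=(num_samples, d))
    z[0] = 1  # keep the original instance in the sample set

    # 2. Query the black box on each perturbed sentence.
    perturbed = [" ".join(w for w, keep in zip(words, row) if keep) for row in z]
    probs = np.array([predict_proba(t) for t in perturbed])  # P(class being explained)

    # 3. Weight each sample by its similarity to the original
    #    (here: an exponential kernel on the fraction of words removed).
    distances = 1.0 - z.sum(axis=1) / d
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4. Fit an interpretable surrogate (a weighted linear model) locally.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(z, probs, sample_weight=weights)

    # 5. The explanation: the words with the largest surrogate weights.
    top = np.argsort(np.abs(surrogate.coef_))[::-1][:num_features]
    return [(words[i], surrogate.coef_[i]) for i in top]
```

Calling explain_text on a sentence with a sentiment model’s probability function, for instance, would return the words that push the prediction up or down, with the sign of each weight indicating the direction.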
The paper demonstrates this with both text classification and image classification. Here is an image classification example:

Say we want to explain a classification model that predicts whether an image contains a frog. Given the original image (left), we carve the photo up into interpretable components (right): contiguous patches of pixels called superpixels.

Then, we generate a dataset of perturbed samples by hiding some of the interpretable components (the parts colored gray). For each sample, as shown in the middle table above, we obtain the model’s probability that a frog is in the image. We then learn a locally weighted interpretable model from this dataset, where perturbed samples more similar to the original image carry more weight.
Finally, we return the parts of the image with the highest weights as the explanation.
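For reference, the authors also released an open-source lime package that implements this image workflow. A sketch of what it might look like in practice is below; image and classifier_fn (a function mapping a batch of images to class probabilities) are assumed to exist, and exact API details may differ between versions.

```python
from lime import lime_image
from skimage.segmentation import mark_boundaries

# Assumed inputs: `image` is an RGB image as a numpy array, and `classifier_fn`
# takes a batch of images of shape (N, H, W, 3) and returns class probabilities.
explainer = lime_image.LimeImageExplainer()

explanation = explainer.explain_instance(
    image,
    classifier_fn,
    top_labels=1,       # explain the top predicted class (e.g. "frog")
    hide_color=0,       # color used to hide superpixels in perturbed samples
    num_samples=1000,   # number of perturbed images to generate
)

# Keep only the superpixels that contributed most to the prediction.
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=True,
    num_features=5,
    hide_rest=True,
)
highlighted = mark_boundaries(img, mask)  # the explanation, ready to display
```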
User studies with real humans
To evaluate the effectiveness of LIME, the authors ran experiments with both simulated users and human subjects, designed around three questions:
Are the explanations faithful to the model?
Can the explanations help users increase trust in predictions?
Are the explanations useful for evaluating the model as a whole?
Are the explanations faithful to the model?
For each classifier, the researchers identified a gold set of features, the most important features for that model, and then computed the fraction of these gold features recovered by LIME’s explanations. In the simulated user experiments, LIME consistently achieved over 90% recall on all datasets.
Can the explanations help users increase trust in predictions?
In the simulated user experiments, LIME outperformed the other explanation methods tested. With real human subjects (Amazon Mechanical Turk workers), explanations led to high agreement in choosing the better classifier and helped users improve an untrustworthy one.
“Before observing the explanations, more than a third trusted the classifier… After examining the explanations, however, almost all of the subjects identified the correct insight, with much more certainty that it was a determining factor.”
Are the explanations useful for evaluating the model as a whole?
From both the simulated user and human subject experiments, yes, it does seem so. Explanations proved useful, especially in the text and image domains, for deciding which model to use, assessing trust, improving untrustworthy classifiers, and gaining insight into models’ predictions.
My Final Thoughts
LIME presents a new method to explain the predictions of machine learning classifiers. It is certainly a necessary step toward greater explainability and trust in AI, but it is not perfect: more recent work has demonstrated flaws in LIME. For example, a paper from 2019 showed that adversarial attacks on LIME and SHAP (another interpretability technique) could successfully fool both systems. I am excited to see continued research and improvements on LIME and other interpretability techniques.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier.” ACM Conference on Knowledge Discovery and Data Mining (KDD) 2016.
. . .
Thank you for reading! Subscribe to read more about research, resources, and issues related to fair and ethical AI.