
When machine-learning models are deployed in real-world situations, perhaps to flag potential disease in X-rays for a radiologist to review, human users must know when to trust the model’s predictions.
But machine-learning models are so large and sophisticated that even the scientists who design them don’t understand exactly how the models make predictions. So, they create techniques, known as saliency methods, that seek to elucidate model behavior.
With new methods being released all the time, researchers from MIT and IBM Research created a tool to help users choose the best saliency method for their particular task. They developed saliency cards, which provide standardized documentation of how a method operates, including its strengths and weaknesses and explanations to help users interpret it correctly.
They hope that, armed with this information, users can deliberately select an appropriate saliency method for both the type of machine-learning model they are using and the task that model is performing, explains co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and a member of the Visualization Group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
Interviews with AI researchers and experts from other fields revealed that the cards help people quickly conduct a side-by-side comparison of different methods and pick a task-appropriate technique. Choosing the right method gives users a more accurate picture of how their model is behaving, so they are better equipped to correctly interpret its predictions.
“Saliency cards are designed to give a quick, glanceable summary of a saliency method and also break it down into the most critical, human-centric attributes. They are really designed for everyone, from machine-learning researchers to lay users who are trying to understand which method to use and choose one for the first time,” says Boggust.
Joining Boggust on the paper are co-lead author Harini Suresh, an MIT postdoc; Hendrik Strobelt, a senior research scientist at IBM Research; John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT; and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.
Picking the right method
The researchers have previously evaluated saliency methods using the notion of faithfulness. In this context, faithfulness captures how accurately a method reflects a model’s decision-making process.
But faithfulness is not black-and-white, Boggust explains. A method might perform well under one test of faithfulness but fail another. With so many saliency methods, and so many possible evaluations, users often settle on a method because it is popular or because a colleague has used it.
However, picking the “wrong” method can have serious consequences. For instance, one saliency method, known as integrated gradients, compares the importance of features in an image to a meaningless baseline. The features with the largest importance over the baseline are the most meaningful to the model’s prediction. This method typically uses all 0s as the baseline, but when applied to images, all 0s equates to the color black.
“It will tell you that any black pixels in your image aren’t important, even if they are, because they are identical to that meaningless baseline. This could be a big deal if you are looking at X-rays, since black can be meaningful to clinicians,” says Boggust.
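To see the baseline issue concretely, here is a minimal sketch of integrated gradients in PyTorch. The toy model and image are hypothetical stand-ins, not part of the researchers’ work; the point is only that any pixel equal to the baseline value receives zero attribution.

```python
# Minimal sketch of integrated gradients with an all-zeros baseline,
# illustrating why black (value 0) pixels get zero attribution.
import torch

def integrated_gradients(model, x, baseline, steps=50):
    """Approximate attributions with a Riemann sum along the straight-line
    path from `baseline` to the input `x`."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    path = baseline + alphas * (x - baseline)   # interpolated images
    path.requires_grad_(True)
    model(path).sum().backward()
    avg_grad = path.grad.mean(dim=0)
    # Attribution = (input - baseline) * average gradient along the path,
    # so wherever x equals the baseline the attribution is exactly zero.
    return (x - baseline).squeeze(0) * avg_grad

# Hypothetical toy model and "image" for illustration only.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16, 1))
x = torch.rand(1, 1, 4, 4)
x[0, 0, 0, 0] = 0.0                      # a black pixel
zero_baseline = torch.zeros_like(x)

attr = integrated_gradients(model, x, zero_baseline)
print(attr[0, 0, 0])                     # the black pixel's attribution is 0
```

Swapping in a different baseline (for example, a blurred or mean image) changes which pixels are treated as uninformative, which is exactly the kind of sensitivity a saliency card is meant to surface.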
Saliency cards can help users avoid these kinds of problems by summarizing how a saliency method works in terms of 10 user-focused attributes. The attributes capture the way saliency is calculated, the relationship between the saliency method and the model, and how a user perceives its outputs.
For instance, one attribute is hyperparameter dependence, which measures how sensitive the saliency method is to user-specified parameters. A saliency card for integrated gradients would describe its parameters and how they affect its performance. With the card, a user could quickly see that the default parameters (a baseline of all 0s) might generate misleading results when evaluating X-rays.
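The idea is easy to picture as a small data structure. Below is a hypothetical sketch of how a card’s content could be organized in code, using the three attribute groupings described above; the field names and example values are illustrative assumptions, not the schema from the paper or its repository.

```python
# Illustrative (assumed) representation of a saliency card's attributes.
from dataclasses import dataclass, field

@dataclass
class SaliencyCard:
    method_name: str
    methodology: dict = field(default_factory=dict)        # how saliency is calculated
    model_relationship: dict = field(default_factory=dict)  # how the method relates to the model
    perceptibility: dict = field(default_factory=dict)      # how users perceive the output

ig_card = SaliencyCard(
    method_name="Integrated Gradients",
    methodology={
        "hyperparameter dependence": (
            "Sensitive to the choice of baseline; the default all-zeros baseline "
            "assigns zero attribution to black pixels, which can mislead on X-rays."
        ),
    },
    model_relationship={
        "computational efficiency": "Requires many gradient evaluations per input.",
    },
)

print(ig_card.methodology["hyperparameter dependence"])
```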
The cards can also be useful for scientists by exposing gaps in the research space. For instance, the MIT researchers were unable to identify a saliency method that was computationally efficient but could also be applied to any machine-learning model.
“Can we fill that gap? Is there a saliency method that can do both things? Or maybe these two ideas are theoretically in conflict with one another,” Boggust says.
Showing their cards
Once they had created several cards, the team conducted a user study with eight domain experts, from computer scientists to a radiologist who was unfamiliar with machine learning. During interviews, all participants said the concise descriptions helped them prioritize attributes and compare methods. And even though he was unfamiliar with machine learning, the radiologist was able to understand the cards and use them to take part in the process of choosing a saliency method, Boggust says.
The interviews also revealed a few surprises. Researchers often expect that clinicians want a method that is sharp, meaning it focuses on a specific object in a medical image. But the clinician in this study actually preferred some noise in medical images to help them attenuate uncertainty.
“As we broke it down into these different attributes and asked people, not a single person had the same priorities as anyone else in the study, even when they were in the same role,” she says.
Moving forward, the researchers want to explore some of the more under-evaluated attributes and perhaps design task-specific saliency methods. They also want to develop a better understanding of how people perceive saliency method outputs, which could lead to better visualizations. In addition, they are hosting their work in a public repository so others can provide feedback that will drive future work, Boggust says.
“We’re really hopeful that these will be living documents that grow as new saliency methods and evaluations are developed. Ultimately, this is really just the beginning of a larger conversation around what the attributes of a saliency method are and how those play into different tasks,” she says.
The research was supported, in part, by the MIT-IBM Watson AI Lab, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.