Explanations, please

Explainability & Interpretability of ML Model Behavior


  • I see a better understanding of neural networks, their decision making, and their training processes as a key requirement for applying them in the real world.
  • I’m especially interested in going beyond single instances and explaining the global behavior of complex deep-learning models.


PyPremise makes it easy to identify patterns that explain where a machine learning classifier performs well and where it fails. It is independent of any specific classifier or architecture and has been evaluated both on NLP text tasks and on data with binary features. For a recent Visual Question Answering model, for example, it identifies that the model struggles with counting, visual orientation, and higher-level reasoning questions.
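To give a feeling for this kind of analysis, here is a minimal sketch of the underlying idea: given binary (e.g. bag-of-words) features and a record of which instances the classifier got wrong, rank features by how much more often they occur among the errors than overall. This is an illustrative simplification, not the PyPremise API or its actual pattern-mining algorithm; all names and data below are made up.

```python
# Illustrative sketch only: naive per-feature error analysis, NOT the
# PyPremise API. PyPremise mines richer (multi-feature) patterns.
from collections import Counter

def error_lift(instances, errors, min_count=2):
    """Rank binary features by how over-represented they are among
    the classifier's errors.

    instances: list of sets of feature tokens (one set per instance)
    errors:    list of bools, True where the classifier was wrong
    Returns a list of (feature, lift) pairs, highest lift first, where
    lift > 1 means the feature appears more often in errors than overall.
    """
    total = Counter()   # how often each feature occurs at all
    wrong = Counter()   # how often it occurs in misclassified instances
    for feats, is_error in zip(instances, errors):
        for f in feats:
            total[f] += 1
            if is_error:
                wrong[f] += 1
    n, n_err = len(instances), sum(errors)
    lifts = {}
    for f, c in total.items():
        if c < min_count or n_err == 0:
            continue  # skip rare features and the no-error edge case
        lifts[f] = (wrong[f] / n_err) / (c / n)
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)

# Toy VQA-style example: counting questions are misclassified.
instances = [{"how", "many"}, {"how", "many"},
             {"what", "color"}, {"what", "color"}, {"where"}]
errors = [True, True, False, False, False]
ranking = error_lift(instances, errors)
```

In this toy data, "how" and "many" come out on top, hinting that the model fails on counting questions. PyPremise goes beyond such single-feature statistics by mining combinations of features that jointly describe error regions.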

You can check out our Python library on GitHub.


  1. Michael A. Hedderich, Jonas Fischer, Dietrich Klakow, and Jilles Vreeken
    In International Conference on Machine Learning (ICML), 2022
  2. In Proceedings of the 31st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2021
  3. Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, and Dietrich Klakow
    In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Nov 2020