Mateusz Malinowski

I am a PhD student in Computer Vision at the Max Planck Institute for Informatics in Saarbruecken, supervised by Dr. Mario Fritz.

I am interested in building and understanding holistic machines that take advantage of increasingly large amounts of data, adapt to new tasks, and combine different modalities with little, easily accessible supervision.


  1. Visual Turing Challenge
    This line of research focuses on building machines that answer questions about image content, and on exploring different ways of benchmarking such machines on this complex and subjective task. Another goal is to distill and understand the main challenges behind the Visual Question Answering task. In this project, we introduce a dataset for the real-world visual question answering task and a set of automatic performance measures, and we propose and examine two architectures: a symbolic one and a neural one. The symbolic approach, based on a semantic parser, explicitly uses a chain of perception, knowledge representation, and a formal deduction system to retrieve the answer. The alternative neural approach is an end-to-end, jointly trained, and scalable architecture that builds upon a CNN and an LSTM and generates multi-word answers based on an image and a question. A part of this project was mentioned in Bloomberg Business.

  2. Learning Spatial Relations
    Despite strong progress on object recognition and image-to-text retrieval techniques, surprisingly little has been done to incorporate spatial representations into the inference process. In this work, we propose a pooling interpretation of spatial relations and show how it improves the image-to-text retrieval task. We show improvements over previous work on two datasets and provide additional insights on a new dataset with an explicit focus on spatial relations.

  3. Learning Smooth Pooling Regions for Visual Recognition
    This line of research argues for joint, discriminative training of the last two stages of multi-stage recognition architectures, namely pooling and classification. Here, we introduce and examine a learnable variant of the pooling stage, which couples a classifier with a novel aggregation operator. The experimental evaluation shows that our approach significantly improves over similar recognition architectures with a hand-designed pooling stage.