Publications

2024

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
L. Barsellotti, R. Amoroso, M. Cornia, L. Baraldi, R. Cucchiara
Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition

FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval
Luca Barsellotti, Roberto Amoroso, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

What’s Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU
Maximilian Bernhard, Yannic Kindermann, Roberto Amoroso, Matthias Schubert, Lorenzo Baraldi, Rita Cucchiara, Volker Tresp
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

2023

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
M. Cornia, L. Baraldi, G. Fiameni, R. Cucchiara
International Journal of Computer Vision

Fully-Attentive Iterative Networks for Region-based Controllable Image and Video Captioning
M. Cornia, L. Baraldi, A. Tal, R. Cucchiara
Computer Vision and Image Understanding

Evaluating Synthetic Pre-Training for Handwriting Processing Tasks
V. Pippi, S. Cascianelli, L. Baraldi, R. Cucchiara
Pattern Recognition Letters

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates
Nicholas Moratelli, Manuele Barraco, Davide Morelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Sensors

Video Surveillance and Privacy: A Solvable Paradox?
Rita Cucchiara, Lorenzo Baraldi, Marcella Cornia, Sara Sarto
IEEE Computer

Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)

Enhancing Open-Vocabulary Semantic Segmentation with Prototype Retrieval
Luca Barsellotti, Roberto Amoroso, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 22nd International Conference on Image Analysis and Processing

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
Proceedings of the ACM International Conference on Multimedia 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models
Roberto Amoroso, Matteo Tomei, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the British Machine Vision Conference 2023

SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning
Davide Caffagni, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 22nd International Conference on Image Analysis and Processing

Towards Explainable Navigation and Recounting
Samuele Poppi, Niyati Rawal, Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 22nd International Conference on Image Analysis and Processing

Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis
Federico Cocchi, Lorenzo Baraldi, Samuele Poppi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 22nd International Conference on Image Analysis and Processing

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE/CVF International Conference on Computer Vision

2022

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions
Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
International Journal on Document Analysis and Recognition

Focus on Impact: Indoor Exploration with Intrinsic Motivation
Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
IEEE Robotics and Automation Letters

From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara
IEEE Transactions on Pattern Analysis and Machine Intelligence

Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach
Marcella Cornia, Matteo Tomei, Lorenzo Baraldi, Rita Cucchiara
ACM Transactions on Multimedia Computing, Communications and Applications

Explaining Transformer-based Image Captioning Models: An Empirical Analysis
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
AI Communications

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval
Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara
Proceedings of the 19th International Conference on Content-based Multimedia Indexing

CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco, Matteo Stefanini, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 26th International Conference on Pattern Recognition

Dual-Branch Collaborative Transformer for Virtual Try-On
Emanuele Fenocchi, Davide Morelli, Marcella Cornia, Lorenzo Baraldi, Fabio Cesari, Rita Cucchiara
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

Embodied Navigation at the Art Gallery
R. Bigazzi, F. Landi, S. Cascianelli, M. Cornia, L. Baraldi, R. Cucchiara
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Investigating Bidimensional Downsampling in Vision Transformer Models
Paolo Bruno, Roberto Amoroso, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 21st International Conference on Image Analysis and Processing

Retrieval-Augmented Transformer for Image Captioning
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 19th International Conference on Content-based Multimedia Indexing

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments
Federico Landi, Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 26th International Conference on Pattern Recognition

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition
Silvia Cascianelli, Vittorio Pippi, Martin Maarand, Marcella Cornia, Lorenzo Baraldi, Christopher Kermorvant, Rita Cucchiara
Proceedings of the 26th International Conference on Pattern Recognition

The Unreasonable Effectiveness of CLIP features for Image Captioning: an Experimental Analysis
Manuele Barraco, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

2021

From Show to Tell: A Survey on Image Captioning
M. Stefanini, M. Cornia, L. Baraldi, S. Cascianelli, G. Fiameni, R. Cucchiara
IEEE Transactions on Pattern Analysis and Machine Intelligence

Video action detection by learning graph-based spatio-temporal interactions
Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara
Computer Vision and Image Understanding

A Computational Approach for Progressive Architecture Shrinkage in Action Recognition
Matteo Tomei, Lorenzo Baraldi, Giuseppe Fiameni, Simone Bronzin, Rita Cucchiara
Software, Practice and Experience

Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara
Computer Vision and Image Understanding

Working Memory Connections for LSTM
Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Neural Networks

Assessing the Role of Boundary-level Objectives in Indoor Semantic Segmentation
Roberto Amoroso, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 19th International Conference on Computer Analysis of Images and Patterns

Estimating (and fixing) the Effect of Face Obfuscation in Video Recognition
Matteo Tomei, Lorenzo Baraldi, Simone Bronzin, Rita Cucchiara
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

Improving Indoor Semantic Segmentation with Boundary-level Objectives
Roberto Amoroso, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 16th International Work-conference on Artificial Neural Networks

Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data
Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Maria Ludovica Piazzi, Rosiana Schiuma, Rita Cucchiara
Proceedings of the 19th International Conference on Computer Analysis of Images and Patterns

Learning to Select: A Fully Attentive Approach for Novel Object Captioning
Marco Cagrandi, Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the ACM International Conference on Multimedia Retrieval

Out of the Box: Embodied Navigation in the Real World
Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 19th International Conference on Computer Analysis of Images and Patterns

Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis
Samuele Poppi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
2021 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

2020

Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling
Federico Bolelli, Stefano Allegretti, Lorenzo Baraldi, Costantino Grana
IEEE Transactions on Image Processing

A Unified Cycle-Consistent Neural Model for Text and Image Retrieval
Marcella Cornia, Lorenzo Baraldi, Hamed R. Tavakoli, Rita Cucchiara
Multimedia Tools and Applications

A Novel Attention-based Aggregation Function to Combine Vision and Language
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 25th International Conference on Pattern Recognition

Explore and Explain: Self-supervised Navigation and Recounting
Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 25th International Conference on Pattern Recognition

Meshed-Memory Transformer for Image Captioning
Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
International Conference on Robotics and Automation

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions
Iulian Cojocaru, Silvia Cascianelli, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
Proceedings of the 25th International Conference on Pattern Recognition

AI4AR: An AI-based Mobile Application for the Automatic Generation of AR Contents
R. Pierdicca, M. Paolanti, E. Frontoni, L. Baraldi
Augmented Reality, Virtual Reality, and Computer Graphics, AVR 2020, Part I

RMS-Net: Regression and Masking for Soccer Event Spotting
Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara
Proceedings of the 25th International Conference on Pattern Recognition

2019

M-VAD Names: a Dataset for Video Captioning with Naming
Stefano Pini, Marcella Cornia, Federico Bolelli, Lorenzo Baraldi, Rita Cucchiara
Multimedia Tools and Applications

Explaining Digital Humanities by Aligning Images and Textual Descriptions
Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
Pattern Recognition Letters

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Artpedia: A New Visual-Semantic Dataset with Visual and Contextual Sentences in the Artistic Domain
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
Image Analysis and Processing – ICIAP 2019

Image-to-Image Translation to Unfold the Reality of Artworks: an Empirical Analysis
Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Image Analysis and Processing – ICIAP 2019

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Connected Components Labeling on DRAGs: Implementation and Reproducibility Notes
Federico Bolelli, Michele Cancilla, Lorenzo Baraldi, Costantino Grana
Reproducible Research in Pattern Recognition

Towards Cycle-Consistent Models for Text and Image Retrieval
Marcella Cornia, Lorenzo Baraldi, Hamed Rezazadegan Tavakoli, Rita Cucchiara
Computer Vision – ECCV 2018 Workshops

Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach
Angelo Carraggi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Computer Vision – ECCV 2018 Workshops

What was Monet seeing while painting? Translating artworks to photo-realistic images
Matteo Tomei, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Computer Vision – ECCV 2018 Workshops

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
Proceedings of the 30th British Machine Vision Conference

A Deep-learning-based approach to VM behavior Identification in Cloud Systems
M. Stefanini, R. Lancellotti, L. Baraldi, S. Calderara
CLOSER 2019 - Proceedings of the 9th International Conference on Cloud Computing and Services Science

Recognizing social relationships from an egocentric vision perspective
Stefano Alletto, Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
Multimodal Behavior Analysis in the Wild: Advances and Challenges

2018

Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era
Marcella Cornia, Davide Abati, Lorenzo Baraldi, Andrea Palazzi, Simone Calderara, Rita Cucchiara
Intelligenza Artificiale

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
ACM Transactions on Multimedia Computing, Communications and Applications

Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
IEEE Transactions on Image Processing

Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms
Federico Bolelli, Michele Cancilla, Lorenzo Baraldi, Costantino Grana
Journal of Real-Time Image Processing

A Hierarchical Quasi-Recurrent approach to Video Captioning
Federico Bolelli, Lorenzo Baraldi, Costantino Grana
2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)

Aligning Text and Document Illustrations: towards Visually Explainable Digital Humanities
Lorenzo Baraldi, Marcella Cornia, Costantino Grana, Rita Cucchiara
Proceedings of the 24th International Conference on Pattern Recognition

Automatic Image Cropping and Selection using Saliency: an Application to Historical Manuscripts
Marcella Cornia, Stefano Pini, Lorenzo Baraldi, Rita Cucchiara
Digital Libraries and Multimedia Archives

Connected Components Labeling on DRAGs
Federico Bolelli, Lorenzo Baraldi, Michele Cancilla, Costantino Grana
2018 24th International Conference on Pattern Recognition (ICPR)

LAMV: Learning to align and match videos with kernelized temporal layers
Lorenzo Baraldi, Matthijs Douze, Rita Cucchiara, Hervé Jégou
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

SAM: Pushing the Limits of Saliency Prediction Models
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

2017

Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
IEEE Transactions on Multimedia

A Video Library System Using Scene Detection and Automatic Tagging
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Digital Libraries and Archives

Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era
Marcella Cornia, Davide Abati, Lorenzo Baraldi, Andrea Palazzi, Simone Calderara, Rita Cucchiara
AI*IA 2017 Advances in Artificial Intelligence

Hierarchical Boundary-Aware Neural Encoder for Video Captioning
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Layout analysis and content classification in digitized books
Andrea Corbelli, Lorenzo Baraldi, Fabrizio Balducci, Costantino Grana, Rita Cucchiara
Digital Libraries and Multimedia Archives

Modeling Multimodal Cues in a Deep Learning-based Framework for Emotion Recognition in the Wild
Stefano Pini, Olfa Ben Ahmed, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara, Benoit Huet
Proceedings of the 19th ACM International Conference on Multimodal Interaction

NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing

Towards Video Captioning with Naming: a Novel Dataset and a Multi-Modal Approach
Stefano Pini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Image Analysis and Processing - ICIAP 2017

Visual Saliency for Image Captioning in New Multimedia Services
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

Preface
C. Grana, L. Baraldi
Communications in Computer and Information Science

2016

A Browsing and Retrieval System for Broadcast Videos using Scene Detection and Automatic Annotation
Lorenzo Baraldi, Costantino Grana, Alberto Messina, Rita Cucchiara
Proceedings of the 2016 ACM on Multimedia Conference

A Deep Multi-Level Network for Saliency Prediction
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
2016 23rd International Conference on Pattern Recognition (ICPR)

Analysis and Re-use of Videos in Educational Digital Libraries with Automatic Scene Detection
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Digital Libraries on the Move

Context Change Detection for an Ultra-Low Power Low-Resolution Ego-Vision Imager
Francesco Paci, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara, Luca Benini
Computer Vision – ECCV 2016 Workshops

Historical Document Digitization through Layout Analysis and Deep Content Classification
Andrea Corbelli, Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Proceedings of the 23rd International Conference on Pattern Recognition

Multi-Level Net: a Visual Saliency Prediction Model
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
Computer Vision – ECCV 2016 Workshops

Optimized Connected Components Labeling with Pixel Prediction
Costantino Grana, Lorenzo Baraldi, Federico Bolelli
Advanced Concepts for Intelligent Vision Systems

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

Shot, scene and keyframe ordering for interactive video re-use
Lorenzo Baraldi, Costantino Grana, Guido Borghi, Roberto Vezzani, Rita Cucchiara
Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

YACCLAB - Yet Another Connected Components Labeling Benchmark
Costantino Grana, Federico Bolelli, Lorenzo Baraldi, Roberto Vezzani
2016 23rd International Conference on Pattern Recognition (ICPR)

2015

A Deep Siamese Network for Scene Detection in Broadcast Videos
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Proceedings of the 23rd ACM international conference on Multimedia

2013

Hand Segmentation for Gesture Recognition in EGO-Vision
Giuseppe Serra, Marco Camurri, Lorenzo Baraldi, Michela Benedetti, Rita Cucchiara
Proceedings of the 3rd ACM international workshop on Interactive multimedia on mobile & portable devices