
Lorenzo Baraldi

Tenure Track Assistant Professor (RTD-B), AImageLab
ELLIS Scholar and Coordinator of the Modena Unit
University of Modena and Reggio Emilia
  • Email: lorenzo.baraldi -at-
  • Curriculum: C.V.


Paper accepted to NeurIPS 2024 · 09/26/2024

Our paper, "Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments", has been accepted to the NeurIPS 2024 Datasets and Benchmarks track!

Introducing LLaVA-MORE · 08/03/2024

🔥 Today we are introducing LLaVA-MORE, a family of models that enhances LLaVA by integrating LLaMA 3.1 as the language model. Check out our GitHub repo!

Oral paper accepted to BMVC 2024 · 07/20/2024

Our paper, "Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization", has been accepted for oral presentation at BMVC 2024!

MINERVA proposal successful! · 07/04/2024

Our proposal MINERVA, submitted to the DIGITAL-EUROHPC-JU-2023-AISC-03-01 call and coordinated by CINECA, has been approved!

Three papers accepted at ECCV 2024! · 07/01/2024

We are glad to announce that three of our papers have been accepted at ECCV 2024: "Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models", "Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities" and "BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues".

Paper accepted to ACL 2024 · 05/16/2024

Our paper, "The Revolution of Multimodal Large Language Models: A Survey", has been accepted to the Findings of ACL 2024!

Paper accepted as highlight at CVPR 2023 · 03/02/2023

We are glad to announce that our paper "Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation" has been accepted to CVPR 2023 as a highlight paper (top 2.5% of submissions). arXiv and GitHub.

ELLIS Scholar · 07/29/2021

I have been elected as an ELLIS Scholar in the ELLIS society, the European Laboratory for Learning and Intelligent Systems.

Interview with La Repubblica · 09/23/2020

I have been interviewed by Jaime D'Alessandro on Rep: Scienze, about GPT-3 and Transformer-based language models. You can read the article here.

LAMV is being used at Facebook to detect harmful content · 08/05/2019

Our solution for matching and detecting copied videos, published at CVPR 2018, is now being used at production scale at Facebook to detect harmful content.

See the official announcement on the Facebook newsroom website, and the GitHub repository with the source code.

Older news can be found in the news archive.

Featured publications

The complete list is available on the publications page.

Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
Luca Barsellotti, Roberto Bigazzi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
NeurIPS 2024

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
ECCV 2024


Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara
ECCV 2024


BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
ECCV 2024


Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition


The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Findings of the Association for Computational Linguistics: ACL 2024


What’s Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU
Maximilian Bernhard, Yannic Kindermann, Roberto Amoroso, Matthias Schubert, Lorenzo Baraldi, Rita Cucchiara, Volker Tresp
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)


FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval
Luca Barsellotti, Roberto Amoroso, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)


Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
M. Cornia, L. Baraldi, G. Fiameni, R. Cucchiara
International Journal of Computer Vision


Fully-Attentive Iterative Networks for Region-based Controllable Image and Video Captioning
M. Cornia, L. Baraldi, A. Tal, R. Cucchiara
Computer Vision and Image Understanding


With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE/CVF International Conference on Computer Vision


Video Surveillance and Privacy: A Solvable Paradox?
Rita Cucchiara, Lorenzo Baraldi, Marcella Cornia, Sara Sarto
IEEE Computer


Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models
Roberto Amoroso, Matteo Tomei, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the British Machine Vision Conference 2023


Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition


Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
Proceedings of ACM International Conference on Multimedia 2023


From Show to Tell: A Survey on Image Captioning
M. Stefanini, M. Cornia, L. Baraldi, S. Cascianelli, G. Fiameni, R. Cucchiara


Video action detection by learning graph-based spatio-temporal interactions
Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara
Computer Vision and Image Understanding


SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
International Conference on Robotics and Automation


Meshed-Memory Transformer for Image Captioning
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition


Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
BMVC 2019


Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
CVPR 2019


Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
CVPR 2019


Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara


LAMV: Learning to align and match videos with kernelized temporal layers
Lorenzo Baraldi, Matthijs Douze, Rita Cucchiara, Hervé Jégou
CVPR 2018


Hierarchical Boundary-Aware Neural Encoder for Video Captioning
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
CVPR 2017



Teaching

The complete list is available on the teaching page.

Architettura dei Calcolatori (2024/2025)
Course material
Ingegneria Informatica
Rita Cucchiara, Lorenzo Baraldi

Computer Vision and Cognitive Systems (2023/2024)
Course material · Upcoming exams
Laurea Magistrale in Ingegneria Informatica
Lorenzo Baraldi, Vittorio Cuculo

AI for Automotive (2023/2024)
Electronic Engineering for Intelligent Vehicles
Rita Cucchiara, Lorenzo Baraldi

Scalable AI (2023/2024)
Course material
Laurea Magistrale in Ingegneria Informatica
Lorenzo Baraldi, Giuseppe Fiameni, Marta Lovino