Lorenzo Baraldi

Tenure Track Assistant Professor (RTD-B), AImageLab
ELLIS Scholar and Coordinator of the Modena Unit
University of Modena and Reggio Emilia
  • Email: lorenzo.baraldi -at- unimore.it
  • Curriculum: C.V.

News

Paper accepted to NeurIPS 2024 · 09/26/2024

Our paper, "Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments", has been accepted to the NeurIPS 2024 Datasets and Benchmarks track!

Introducing LLaVA-MORE · 08/03/2024

🔥 Today we are introducing LLaVA-MORE, a family of models that enhances LLaVA by integrating LLaMA 3.1 as the language model. Check out our GitHub repo!

Oral paper accepted to BMVC 2024 · 07/20/2024

Our paper, "Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization", has been accepted for oral presentation at BMVC 2024!

MINERVA proposal successful! · 07/04/2024

Our MINERVA proposal, submitted to the DIGITAL-EUROHPC-JU-2023-AISC-03-01 call and coordinated by CINECA, has been approved!

Three papers accepted at ECCV 2024! · 07/01/2024

Glad to announce that we have three papers accepted at ECCV 2024: "Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models", "Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities" and "BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues". 

Paper accepted to ACL 2024 · 05/16/2024

Our paper, "The Revolution of Multimodal Large Language Models: A Survey", has been accepted to the Findings of ACL 2024!

Paper accepted as highlight at CVPR 2023 · 03/02/2023

We are glad to announce that our paper "Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation" has been accepted to CVPR 2023 as a highlight paper (top 2.5% of submissions). arXiv and GitHub.

ELLIS Scholar · 07/29/2021

I have been elected as an ELLIS Scholar in the ELLIS society, the European Laboratory for Learning and Intelligent Systems.

Interview with La Repubblica · 09/23/2020

I have been interviewed by Jaime D'Alessandro on Rep: Scienze about GPT-3 and Transformer-based language models. You can read the article here.

LAMV is being used at Facebook to detect harmful content · 08/05/2019

Our solution for matching and detecting copied videos, published at CVPR 2018, is now being used at production scale at Facebook to detect harmful content.

See the official announcement on the Facebook newsroom website, and the GitHub repository with the source code.

Older news can be found in the news archive.

Featured publications

The complete list is available on the publications page.

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
ECCV 2024

   

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara
ECCV 2024

   

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
ECCV 2024

   

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition

   

The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Findings of the Association for Computational Linguistics: ACL 2024

 

What’s Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU
Maximilian Bernhard, Yannic Kindermann, Roberto Amoroso, Matthias Schubert, Lorenzo Baraldi, Rita Cucchiara, Volker Tresp
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

 

FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval
Luca Barsellotti, Roberto Amoroso, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

 

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
M. Cornia, L. Baraldi, G. Fiameni, R. Cucchiara
International Journal of Computer Vision

 

Fully-Attentive Iterative Networks for Region-based Controllable Image and Video Captioning
M. Cornia, L. Baraldi, A. Tal, R. Cucchiara
Computer Vision and Image Understanding

 

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the IEEE/CVF International Conference on Computer Vision

   

Video Surveillance and Privacy: A Solvable Paradox?
Rita Cucchiara, Lorenzo Baraldi, Marcella Cornia, Sara Sarto
IEEE Computer

 

Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models
Roberto Amoroso, Matteo Tomei, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the British Machine Vision Conference 2023

 

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition

   

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
Proceedings of ACM International Conference on Multimedia 2023

 

From Show to Tell: A Survey on Image Captioning
M. Stefanini, M. Cornia, L. Baraldi, S. Cascianelli, G. Fiameni, R. Cucchiara
IEEE TPAMI

   

Video action detection by learning graph-based spatio-temporal interactions
Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara
Computer Vision and Image Understanding

   

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
International Conference on Robotics and Automation

 

Meshed-Memory Transformer for Image Captioning
Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition

   

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
BMVC 2019

   

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
CVPR 2019

   

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
CVPR 2019

   

Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
IEEE Transactions on Image Processing

   

LAMV: Learning to align and match videos with kernelized temporal layers
Lorenzo Baraldi, Matthijs Douze, Rita Cucchiara, Hervé Jégou
CVPR 2018

   

Hierarchical Boundary-Aware Neural Encoder for Video Captioning
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
CVPR 2017

 

Teaching

The complete list is available on the teaching page.

Architettura dei Calcolatori (2024/2025)
Course material
Ingegneria Informatica
Rita Cucchiara, Lorenzo Baraldi

Computer Vision and Cognitive Systems (2023/2024)
Course material · Upcoming exams
Laurea Magistrale in Ingegneria Informatica
Lorenzo Baraldi, Vittorio Cuculo

AI for Automotive (2023/2024)
Electronic Engineering for Intelligent Vehicles
Rita Cucchiara, Lorenzo Baraldi

Scalable AI (2023/2024)
Course material
Laurea Magistrale in Ingegneria Informatica
Lorenzo Baraldi, Giuseppe Fiameni, Marta Lovino