Munich🥨NLP x XPeng: Synthetic Data Generation

Save the Date!

We’re thrilled to announce our next meetup, hosted in collaboration with XPeng at their office (Weimarer Str. 32, 80807 München) on Wednesday, June 10, 2026!

RSVP

About the Event

Join us for an evening of cutting-edge research, industry insights, and networking at the intersection of synthetic data generation and multilingual LLM evaluation.

Agenda

18:00 | Doors open + 🍕 food & drinks (pizza, juice, schorle, soda)
18:45 – 19:00 | Intro (XPeng, MunichNLP)
19:00 – 19:40 | Talk 1: “Synthetic Dialogue Generation & Audio Augmentation for Voice Assistants”
19:40 – 19:50 | Short break
19:50 – 20:30 | Talk 2: “Synthetic Data for LLM Evaluation: Toward Dynamic, Scalable, and Multilingual Assessment”
20:30 – 21:00 | Networking & drinks

Talks

Florent Duême & Galina Lavrenteva: Synthetic Dialogue Generation & Audio Augmentation for Voice Assistants
Discover how XPeng generates large-scale synthetic dialogues and augments them with realistic acoustic environments for voice assistant training and evaluation. Built on SDialog, this talk covers the full pipeline: LLM-orchestrated persona-driven text generation, controlled evaluation of dialogue quality, and generative audio methods for simulating background noise and spatial sound conditions. The result is a scalable framework improving both NLU and ASR robustness.
Florent Duême focuses on synthetic text data generation and evaluation—designing persona-driven dialogue orchestration, scenario scripting, and automated quality metrics.
Galina Lavrenteva specializes in audio augmentation, leveraging generative audio to simulate realistic acoustic environments and noise conditions.

Raoyuan Zhao: Synthetic Data for LLM Evaluation: Toward Dynamic, Scalable, and Multilingual Assessment
Static, English-centric benchmarks struggle to keep pace with LLM advancements. Raoyuan will present recent work on synthetic data for controllable, adaptive evaluation, including perturbations to probe model robustness (e.g., typographical variation) and methods to reduce data contamination. She’ll also discuss reliability, diversity, and efficiency in evaluation frameworks.
Raoyuan Zhao is a PhD student at LMU Munich’s MaiNLP Lab, supervised by Dr. Michael A. Hedderich. Her research focuses on LLM evaluation, synthetic data, reinforcement learning, and multilinguality, with a commitment to advancing scalable and inclusive AI assessment.