Building Rewrite: How Phi Silica Enables Efficient On-Device Paraphrasing
Save the Date!
July 29th, 2025 19:00-20:00 – Munich🥨NLP Discord Server.
About this Event
In this talk, Marat Saidov will discuss advances in building efficient on-device language models optimized for NPUs, covering techniques such as memory-mapped embeddings, KV caching, 4-bit quantization, and speculative decoding. A key focus is Rewrite, Microsoft’s publicly available paraphrasing skill: its comprehensive data collection strategies, its carefully designed evaluation metrics based on LLM-as-a-judge, and adapters such as LoRA. He will also examine the role of system prompts and soft prompts and how competitive they are with LoRA. Finally, he will share insights on deploying compact models at scale, practical lessons learned, and the challenges that lie ahead.
Speakers
Marat Saidov is a Senior Software Engineer in the Applied Sciences Group at Microsoft, based in Belgrade, Serbia. He previously worked on improving the Speech Recognition and Natural Language Understanding services for the Alice voice assistant at Yandex. He was also an NLP Research Assistant at HSE University, Russia.