Nikolai Zakharov (罗一阳)
Senior Machine Learning Engineer (Multimodal / LLM) at VK AI
Multimodal AI engineer building LLMs, AI Agents, and audio-text systems. Research foundation (Tsinghua MSc) with industry pragmatism. 10 years in China's tech ecosystem. Currently at VK AI working on next-generation speech understanding.
Technical
From Sound to Meaning: Leveraging Audio Language Models for Music Relevance Assessment
How foundation audio models and Audio LLMs bridge the gap between acoustic similarity and perceptual relevance. Exploring MusicFM, CLAP, MuQ-MuLan, and Audio LLMs for music recommendation.
[AI for all] What is Intelligence?
Exploring the fundamental question of what constitutes intelligence, both human and artificial, and the implications for AI development.
Life & Experience
Beijing Car Visa: The Capital's Invisible Border
The unique challenges of entering Beijing by car: the physical and bureaucratic barriers that make Beijing a special case in China's governance system.
Education in China: From Disappointment to Opportunity
How I pivoted from mechanical engineering to CS at Tsinghua University, navigating language barriers and cultural challenges.
Thoughts on Learning Chinese
Reflections on language learning and cultural immersion, from necessity to fluency in China's tech ecosystem.
About
I'm Nikolai Zakharov, a Multimodal AI Engineer specializing in Audio LLMs and speech understanding. With 8+ years of production experience across VK, NIO, Huawei, and Tinkoff, I bridge the gap between research and industry deployment.
My track record includes Audio LLMs at VK, RAG-based AI Agents at NIO, voice synthesis at Tinkoff, and multilingual NLU at Huawei. With fluency in Russian, English, and Chinese, I bring a unique cross-cultural perspective to AI development.
I am currently on VK AI's Applied Research team, building next-generation speech understanding systems.