Nikolai Zakharov (罗一阳)

Senior Machine Learning Engineer (Multimodal / LLM) at VK AI

Multimodal AI engineer building LLMs, AI Agents, and audio-text systems. Research foundation (Tsinghua MSc) with industry pragmatism. 10 years in China's tech ecosystem. Currently at VK AI working on next-generation speech understanding.

Audio LLMs Multimodal AI LLM Inference 🇷🇺 🇬🇧 🇨🇳

Technical

From Sound to Meaning: Leveraging Audio Language Models for Music Relevance Assessment

How foundation audio models and Audio LLMs bridge the gap between acoustic similarity and perceptual relevance. Exploring MusicFM, CLAP, MuQ-MuLan, and Audio LLMs for music recommendation.

Mar 7, 2026 Audio LLM Research 23 min read

[AI for all] What is Intelligence?

Exploring the fundamental question of what constitutes intelligence, both human and artificial, and the implications for AI development.

Jan 29, 2025 LLM Research 1 min read

Pet Projects

Paper Clusterer: An Obsidian Plugin for Research Note Organization

Turn your scattered research notes into an interactive knowledge map. AI-powered clustering with embeddings, automatic labeling, and beautiful visualizations. Built with agentic coding tools.

Mar 14, 2026 Open Source Tool 5 min read

Life & Experience

Beijing Car Visa: The Capital's Invisible Border

The unique challenges of entering Beijing by car: the physical and bureaucratic barriers that make Beijing a special case in China's governance system.

Jan 28, 2025 China Experience 2 min read

Education in China: From Disappointment to Opportunity

How I pivoted from mechanical engineering to CS at Tsinghua University, navigating language barriers and cultural challenges.

Jan 27, 2025 China Experience 3 min read

Thoughts on Learning Chinese

Reflections on language learning and cultural immersion, from necessity to fluency in China's tech ecosystem.

Jan 26, 2025 China Experience 2 min read

About

I'm Nikolai Zakharov, a Multimodal AI Engineer specializing in Audio LLMs and speech understanding. With 8+ years of production experience across VK, NIO, Huawei, and Tinkoff, I bridge the gap between research and industry deployment.

My track record includes Audio LLMs at VK, RAG-based AI Agents at NIO, voice synthesis at Tinkoff, and multilingual NLU at Huawei. With fluency in Russian, English, and Chinese, I bring a cross-cultural perspective to AI development.

I am currently on VK AI's Applied Research team, building next-generation speech understanding systems.