Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
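From PPO to DPO to GRPO: A Guide to Classic Reinforcement Learning Algorithms for Large Models
TL;DR: Reinforcement learning plays a central role in the alignment of large language models (LLMs). From PPO, proposed by OpenAI, to DPO, proposed at Stanford University, to GRPO, proposed by DeepSeek, each generation of algorithms addresses the pain points of the one before it. This post walks through the core ideas and evolutionary logic of these three algorithms, from principles and formula derivations to engineering implementation, to help readers build a complete technical map.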
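The Evolution of Positional Encoding: From Absolute and Relative to Multimodal Rotary Encoding
TL;DR: This post traces four generations of positional encoding: from the limitations of the earliest learnable absolute positional encodings (BERT, GPT) and sinusoidal absolute encodings (Transformer), to the improvements of relative positional encoding (T5, Transformer-XL), to the breakthrough of rotary position embedding (RoPE). By design, RoPE "takes the form of absolute encoding while achieving the effect of relative encoding," combining the advantages of both. 2D-RoPE and M-RoPE then extend this mechanism to vision and multimodal settings, providing a solid mathematical foundation for multidimensional spatiotemporal position awareness in modern vision-language models such as Qwen2-VL and Qwen3-VL. The core insight of this post is that the development of positional encoding is, at heart, a deepening understanding of the fundamental concept of "distance": from implicit relative relations that are hard to capture, to relative positions explicitly encoded through rotation matrices, to a unified spatiotemporal coordinate system that can handle text, images, and video at once.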
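From Qwen-VL to Qwen3-VL: The Four-Generation Evolution of Multimodal Large Models
TL;DR: This post systematically reviews the technical evolution of four generations of Qwen-VL vision-language models: from basic vision-language alignment (Qwen-VL), to native dynamic resolution and multimodal positional encoding (Qwen2-VL), to engineering-level inference efficiency optimization (Qwen2.5-VL), and finally to deeper vision-language fusion (Qwen3-VL).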
CapRL: Using Reinforcement Learning to Stimulate the Captioning Ability of Vision-Language Models
Paper: CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning (CVPR 2025)
Authors: Xing et al.
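TL;DR: CapRL proposes a novel reinforcement learning framework that turns the subjective question "is this caption good?" into the objective question "can the questions be answered correctly?", effectively addressing reward hacking in image captioning and significantly improving the model's ability to generate dense and accurate captions.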
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
publications
Paper Title Number 1
Published in Journal 1, 2009
This paper is about the number 1. The number 2 is left for future work.
Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1).
Download Paper | Download Slides | Download Bibtex
Paper Title Number 2
Published in Journal 1, 2010
This paper is about the number 2. The number 3 is left for future work.
Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2).
Download Paper | Download Slides
Paper Title Number 3
Published in Journal 1, 2015
This paper is about the number 3. The number 4 is left for future work.
Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3).
Download Paper | Download Slides
Paper Title Number 4
Published in GitHub Journal of Bugs, 2024
This paper is about fixing template issue #693.
Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper
Paper Title Number 5, with math \(E=mc^2\)
Published in GitHub Journal of Bugs, 2024
This paper is about a famous math equation, \(E=mc^2\)
Recommended citation: Your Name, You. (2024). "Paper Title Number 5, with math \(E=mc^2\)." GitHub Journal of Bugs. 1(3).
Download Paper
talks
Talk 1 on Relevant Topic in Your Field
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
This is a description of your conference proceedings talk. Note the different value in the type field; you can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.