Local Deployment of DeepSeek R1 Distilled Models | xiaojing's personal blog

DeepSeek went viral over the Spring Festival. Use its hosted service, and after only a few answers you will see the following message:

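So is there a way to deploy it on your own computer? First, forget about the "full-strength" R1 model (671b, 671 billion parameters): an ordinary PC cannot run it. The recommended configuration is:

Disk: at least 404 GB of install space
Memory: 1342 GB
GPU: multiple GPUs, e.g. NVIDIA A100 80GB ×16

The full version is out of reach, but you can still try the "reduced" models with fewer parameters. DeepSeek provides the following six small models, built on Alibaba's Qwen and Meta's Llama.

The DeepSeek team has shown that the reasoning patterns of larger models can be distilled into smaller models, and that this gives better performance than the reasoning patterns discovered by running reinforcement learning directly on small models.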
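The 1342 GB memory figure quoted above matches a simple back-of-the-envelope estimate: 671 billion parameters at 2 bytes each. The sketch below reproduces that arithmetic and applies it to the six distilled sizes. It assumes FP16/BF16 weights (2 bytes per parameter), counts weights only (no KV cache or activations), and uses the nominal parameter counts from the model names, so treat the numbers as rough lower bounds rather than measured requirements. The function name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: 2 bytes per parameter (FP16/BF16). Quantized builds need less.
    """
    return params_billion * 1e9 * bytes_per_param / 1e9


# Nominal sizes in billions of parameters, taken from the model names.
nominal_sizes = {
    "DeepSeek-R1 (full)": 671,
    "Distill-Qwen-1.5B": 1.5,
    "Distill-Qwen-7B": 7,
    "Distill-Llama-8B": 8,
    "Distill-Qwen-14B": 14,
    "Distill-Qwen-32B": 32,
    "Distill-Llama-70B": 70,
}

for name, size in nominal_sizes.items():
    print(f"{name}: ~{weight_memory_gb(size):.0f} GB of weights at FP16")
```

Passing a smaller `bytes_per_param` (for example 0.5 for a 4-bit quantization) gives a rough idea of why quantized builds of the smaller models fit on consumer hardware.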

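The models below were created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. Evaluation results show that the distilled smaller dense models perform very well on benchmarks.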

Distilled model	Base model
DeepSeek-R1-Distill-Qwen-1.5B	Qwen2.5-Math-1.5B
DeepSeek-R1-Distill-Qwen-7B	Qwen2.5-Math-7B
DeepSeek-R1-Distill-Llama-8B	Llama-3.1-8B
DeepSeek-R1-Distill-Qwen-14B	Qwen2.5-14B
DeepSeek-R1-Distill-Qwen-32B	Qwen2.5-32B
DeepSeek-R1-Distill-Llama-70B	Llama-3.3-70B-Instruct

Evaluation results for the distilled models:

Model	AIME 2024 pass@1	AIME 2024 cons@64	MATH-500 pass@1	GPQA Diamond pass@1	LiveCodeBench pass@1	CodeForces rating
GPT-4o-0513	9.3	13.4	74.6	49.9	32.9	759
Claude-3.5-Sonnet-1022	16.0	26.7	78.3	65.0	38.9	717
o1-mini	63.6	80.0	90.0	60.0	53.8	1820
QwQ-32B-Preview	44.0	60.0	90.6	54.5	41.9	1316
DeepSeek-R1-Distill-Qwen-1.5B	28.9	52.7	83.9	33.8	16.9	954
DeepSeek-R1-Distill-Qwen-7B	55.5	83.3	92.8	49.1	37.6	1189
DeepSeek-R1-Distill-Qwen-14B	69.7	80.0	93.9	59.1	53.1	1481
DeepSeek-R1-Distill-Qwen-32B	72.6	83.3	94.3	62.1	57.2	1691
DeepSeek-R1-Distill-Llama-8B	50.4	80.0	89.1	49.0	39.6	1205
DeepSeek-R1-Distill-Llama-70B	70.0	86.7	94.5	65.2	57.5	1633
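To make the table easier to read, here is a small sketch that checks which distilled models beat a given baseline on one column. The scores are copied from the AIME 2024 pass@1 column above. The helper name `distilled_models_beating` is invented for this example and is not part of any DeepSeek tooling.

```python
# AIME 2024 pass@1 scores, copied from the table above.
AIME_PASS1 = {
    "GPT-4o-0513": 9.3,
    "Claude-3.5-Sonnet-1022": 16.0,
    "o1-mini": 63.6,
    "QwQ-32B-Preview": 44.0,
    "DeepSeek-R1-Distill-Qwen-1.5B": 28.9,
    "DeepSeek-R1-Distill-Qwen-7B": 55.5,
    "DeepSeek-R1-Distill-Qwen-14B": 69.7,
    "DeepSeek-R1-Distill-Qwen-32B": 72.6,
    "DeepSeek-R1-Distill-Llama-8B": 50.4,
    "DeepSeek-R1-Distill-Llama-70B": 70.0,
}


def distilled_models_beating(baseline: str, scores: dict[str, float]) -> list[str]:
    """Return the distilled models that score strictly above `baseline`, best first."""
    threshold = scores[baseline]
    winners = (
        name
        for name, score in scores.items()
        if name.startswith("DeepSeek-R1-Distill") and score > threshold
    )
    return sorted(winners, key=scores.get, reverse=True)


for name in distilled_models_beating("o1-mini", AIME_PASS1):
    print(f"{name}: {AIME_PASS1[name]}")
```

On this one column, the 14B, 32B and 70B distilled models come out ahead of o1-mini. The other columns can be compared the same way by swapping in their scores.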