pydata: Huiming's learning notes

Keep Looking, Don't Settle

DeepSeek V3 learning notes

1. What problem does it solve?

When I went home for the Spring Festival, DeepSeek released a new model. For a while, media outlets of all kinds discussed it at length, almost raising it to a matter of national destiny. The most important points of discussion were probably two: the first is the …

DeepSeek V3

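1. What problem does it solve?

When I went home for the Spring Festival, DeepSeek happened to release a new model. For a while, media outlets of all kinds discussed it heatedly, almost elevating it to a matter of national destiny. The most important points discussed should …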

Prediction in decoder and KV-Cache

1. Prediction in Decoder

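The earlier GPT summary gave an overall introduction to the GPT model. Here a fake example is used to explain, step by step, what GPT does, how self attention is computed, and what the KV cache is.

GPT is a decoder-only model: it predicts the next token from the preceding tokens. For example, take the sentence "it is sunny today.", and suppose we now have the initial input …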
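As a rough illustration of the KV cache idea mentioned above, here is a minimal single-head sketch in NumPy. It is not taken from the post itself: the toy embeddings, the weight matrices `Wq`, `Wk`, `Wv`, and the helper names are all hypothetical. It only aims to show that feeding tokens one at a time while reusing cached keys and values gives the same result as full causal self attention.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(x, Wq, Wk, Wv):
    """Single-head causal self attention over all tokens at once."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so token i only attends to tokens 0..i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

def cached_attention_step(x_new, Wq, Wk, Wv, cache):
    """Attention output for one new token, reusing cached keys and values."""
    q = x_new @ Wq
    # Only the new token's key and value are computed; older ones are reused.
    cache["k"].append(x_new @ Wk)
    cache["v"].append(x_new @ Wv)
    K = np.stack(cache["k"])
    V = np.stack(cache["v"])
    weights = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(5, d))  # 5 toy token embeddings (hypothetical values)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

full = causal_attention(x, Wq, Wk, Wv)
cache = {"k": [], "v": []}
stepwise = np.stack([cached_attention_step(t, Wq, Wk, Wv, cache) for t in x])
print(np.allclose(full, stepwise))
```

The cached version never recomputes keys or values for earlier tokens, which is the saving a KV cache provides during token-by-token decoding.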

Image Generation 2: Latent Diffusion model / Stable Diffusion

In the previous post we introduced the diffusion model (DDPM), which learns the step (time \(t\)) and the noise function (an NN model) by adding Gaussian noise to an image step by step and then reversing the process, denoising from Gaussian noise back to an image. Diffusion model is the most …
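The "adding Gaussian noise step by step" part of DDPM can be sketched with the standard closed-form forward process, \(x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon\). The snippet below is a minimal sketch under assumptions: the linear `betas` schedule, the number of steps `T`, and the toy 8x8 "image" are illustrative choices, not values taken from the post, and the learned denoising network is not shown.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in one shot, returning the noise used."""
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s)
    eps = rng.normal(size=x0.shape)      # Gaussian noise the model would learn to predict
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
T = 1000                                 # assumed number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)       # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy stand-in for an image
xt, eps = forward_noise(x0, T - 1, betas, rng)

# With the true noise known, x0 can be recovered exactly; the reverse
# process replaces eps with the network's prediction of it.
x0_rec = (xt - np.sqrt(1.0 - alpha_bar[T - 1]) * eps) / np.sqrt(alpha_bar[T - 1])
print(np.allclose(x0, x0_rec))
```

At the last step `alpha_bar` is close to zero, so `xt` is almost pure Gaussian noise, which is the starting point for the reverse (denoising) process.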