DeepSeek Expert Parallelism Load Balancer (EPLB) Code Reading

Sun 20 April 2025

Introduction

In the previous introduction to DeepSeek-V3, one crucial component highlighted was DeepSeekMoE. When employing Expert Parallelism, different Experts are assigned to different GPUs. Since the load on each Expert may vary with the current workload, keeping the load balanced across GPUs is critical. DeepSeekMoE addresses this …

DeepSeek V3 learning notes

Sun 23 February 2025

1. What problem does it solve?

When I went home for the Spring Festival, DeepSeek happened to release a new model. For a while, media outlets everywhere were buzzing about it, with some discussions elevating it almost to a matter of national destiny. The two most discussed points were: the first is the …

DeepSeek V3

Sun 16 February 2025

1. What problem does it solve?

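When I went home for the Spring Festival, DeepSeek happened to release a new model, and for a while media outlets everywhere were abuzz, almost elevating it to a matter of national destiny. The most important points of discussion should …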

Prediction in decoder and KV-Cache

Sun 21 April 2024

1. Prediction in Decoder

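The earlier GPT summary gave a comprehensive introduction to the GPT models. Here, a fake example is used to explain step by step how GPT works, how self-attention is computed, and what the KV cache is.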

GPT is a decoder-only model that predicts the next token from the preceding tokens. For example, take the sentence "it is sunny today."; now there is an initial input …

Image Generation 2: Latent Diffusion model / Stable Diffusion

Sun 01 October 2023

In the previous blog we introduced the diffusion model (DDPM), which learns a noise function (an NN model) conditioned on the step (time \(t\)) by adding Gaussian noise to an image step by step, and then reverses the process by denoising from Gaussian noise back to an image. The diffusion model is the most …

Image Generation 1: Diffusion model

Tue 04 July 2023

The previous notes introduced the text generation models (the GPT family). This reading note covers image generation papers.

Similar to text generators, which generate the next token, OpenAI has image-GPT, a large transformer trained on next-pixel prediction in which the pixels are concatenated into a vector to …

GPT-1, GPT-2, GPT-3, InstructGPT / ChatGPT and GPT-4 summary

Sun 28 May 2023

1. GPT-1

Improving Language Understanding by Generative Pre-Training

What problem does GPT-1 solve?

Before GPT-1, NLP typically relied on supervised models: for each task, one collects labeled data and then develops a supervised model on that data. This approach has several problems. First, labeled data is …

Recommendation System 05 - Bayesian Optimization

Fri 31 December 2021

Recommendation System 04 - Gaussian process regression

Sun 26 December 2021

Workplace Topics: Some Thoughts on Workplace Communication Skills (Reposted)

Wed 22 December 2021


pydata: Huiming's learning notes

Keep Looking, Don't Settle
