KL Divergence and PSI

Zhang Bojun
Aug 10 · 10 min read

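Relative entropy is another name for KL divergence. Suppose we want to know how far a given strategy is from the optimal strategy. We can measure the difference between the two with relative entropy. Relative entropy equals the cross-entropy of the given strategy minus the information entropy. Here the information entropy is computed from the system's true distribution, and it corresponds to the optimal strategy. The formula is as follows:

$$D_{KL}(P \| Q) = H(P, Q) - H(P) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$

Here P is the true distribution and Q is the distribution used by the strategy being evaluated.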
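A minimal Python sketch of the relationship described above follows. It assumes discrete distributions given as lists of probabilities over the same outcomes, uses the natural logarithm (so results are in nats), and assumes Q(x) > 0 wherever P(x) > 0. The function names and the example distributions `p` and `q` are hypothetical, chosen only for illustration.

```python
import math


def entropy(p):
    """Information entropy H(P) = -sum P(x) log P(x), in nats.

    Outcomes with P(x) = 0 contribute nothing (0 * log 0 is taken as 0).
    """
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(P, Q) = -sum P(x) log Q(x), in nats.

    Assumes Q(x) > 0 wherever P(x) > 0; otherwise math.log raises ValueError.
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Relative entropy D_KL(P || Q) = sum P(x) log(P(x) / Q(x)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example: p plays the role of the true distribution (optimal
# strategy), q the distribution used by some other strategy.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Both lines print the same value, up to floating-point rounding.
print(kl_divergence(p, q))
print(cross_entropy(p, q) - entropy(p))
```

Computing the divergence directly from the log ratio and computing it as cross-entropy minus entropy give the same number. The divergence is zero only when the two distributions are identical, so `kl_divergence(p, p)` returns 0.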
