#entropy zero g/t
trollsntrash Ā· 6 months ago
Text
"So, can I call you Big Boss now?" "Fuck you, Wilson." messing around with gmod animated props and discovered there was a model resizing feature so i made entropy: zero g/t out of boredom
clone cop's NOT amused LMFAOOO
4 notes Ā· View notes
sunnydaleherald Ā· 2 months ago
Text
The Sunnydale Herald Newsletter, Wednesday, September 18-Thursday, September 19
Master: You're dead! Buffy: I may be dead, but I'm still pretty. Which is more than I can say for you.
~~Prophecy Girl~~
[Drabbles & Short Fiction]
Nothing's The Same (Buffy, PG) by badly_knitted
bufy a girl (Buffy/Spike, M) by lottiecrabie
A Tale of Two Slayers (Buffy/Kendra, E) by Ann_Douglas
Convincing (Buffy/Giles, T) by simple_nothings
[Chaptered Fiction]
Erotic Entertainment, Chapter 4 (Multi, E) by TheInfiniteDoctor
Cat Scratch Fever, Chapter 2 (Buffy/Cordelia/Oz, E) by KNZ1
My (love) Kill, Chapter 1 (Buffy/Spike, G) by scared_worm
Love Lives Here, Chapter 100 (Buffy/Spike, NC-17) by Passion4Spike
Forty-eight days in LA, Chapter 7 (Buffy/Spike, NC-17) by Blissymbolics
Mysterious Destinies, Chapter 10 (Buffy/Spike, NC-17) by EnchantedWillow
Lost in Desolation, Chapter 9 (Buffy/Spike, NC-17) by Melme1325
A Nice Surprise, Chapter 1 (Crossover with Riptide, FR15) by calikocat
Meanwhile, back on the farm..., Chapter 6 (Multiple crossings, FR13) by hysteriumredux
Seeing Clearly, Chapter 200 (Multiple crossings, FR13) by JoeB
Perception, Chapter 4 (Buffy/Spike, G) by flootzavut
Champion of War, Chapter 3 (Buffy/Spike, NC-17) by Desicat
To All We Guard, Chapter 25 (Buffy/Spike, NC-17) by simmony
Waiting for You, Chapter 21 (Buffy/Spike, NC-17) by honeygirl1885
A Sword in the Man, Chapter 4 (Buffy/Spike, PG-13) by Desicat
[Images, Audio & Video]
Artwork: The Slayer's pet by JSBirsa
Manip: moodboard: buffy summers by courtillyy
Manip: Collage #172 by thedecadentraven
Gifset: Buffy Meme: [1/8 Episodes] 6x07 Once More With Feeling by lovebvffys
Gifset: 7x15 | "Get It Done" by clarkgriffon
Gifset: 6x18 "Entropy" by peeta-mellark
Gifset: Timestamp Roulette 3x03 | Faith, Hope, & Trick by clarkgriffon
Gifset: 3.04 | Beauty and the Beasts by bangelgifs
Gifset: Mercedes McNab as HARMONY KENDALL 🦄 by whatisyourchildhoodtrauma
Video: Buffy Best Comebacks // Buffy The Vampire Slayer Funniest Moments by Hollistic Witch
Video: Buffy and Faith - Empty by juliaroxs241
[Reviews & Recaps]
Big Bad Powerful She-Witches | Buffy the Vampire Slayer 7x9 "Never Leave Me" | Normies Reaction! by The Normies
Buffy the Vampire Slayer 4x11 & 4x12 REACTION | "Doomed" & "A New Man" by The Horror Bandwagon
Buffy the Vampire Slayer REACTION | Season 7, Episode 8: Sleeper by JayPeaKay
*BETTER THAN MCDONALD'S?!* Buffy the Vampire Slayer S6 Ep 12 "Double Meat Palace" Reaction FIRST by Nick Reacts
First Date: Buffy 7x14 Reaction by Dakara
[Fandom Discussions]
[Something Blue with zero context] by mybloodyvampire
[Headcanon- Dru has a lot of things] by voices-not-echoes
Question relating to Season 7 by multiple authors
I just watched s6e19 for the first time… by multiple authors
I do like this moment of sincerity by multiple authors
Poll - Would you rather scenario regarding Giles in season 7 by multiple authors
The man in the middle by multiple authors
How has your perspective on the show changed as you get older? by multiple authors
What's on your Drusilla/Spike playlist? by multiple authors
Why is "Anne" such an disliked episode? by multiple authors
What are your best Buffy trivia questions? by multiple authors
How much time passes between 'The Body' and 'The Gift'? by multiple authors
Jasmine's best exploits/power statements? by multiple authors
Top 2 favorite episodes... by multiple authors
I finished the show 20 years ago, and I'm only on season 1 of my rewatch, but I have thoughts about Cordelia and Angel by multiple authors
Buffy and Anya by multiple authors
Response to criticism over Buffy's perceived hardness in S:7 by multiple authors
What did Faith and Willow talk about on their car ride back from Los Angeles? by multiple authors
Just finished watching Angel for the first time ever-VENT by multiple authors
It always seemed to me that Willow's arc was setting up to stall before Smashed and Wrecked by multiple authors
I Robot, You Jane appreciation thread by multiple authors
Buffy The Vampire Slayer is one of the most iconic shows ever produced but which season was THE season and why? by multiple authors
Submit a link to be included in the newsletter!
Join the editor team :)
5 notes Ā· View notes
stucky-ficrecs Ā· 4 years ago
Text
Completed WIPs, June 2020
The Color Of Blood by MonsieurBlueSky (MyChemicalRachel) (T, 9K)
8:31pm by justforme23 (T, 1775)
Otherkind by MizDiablo (E, 65K)
Toothpaste Kisses by buckybees (T, 19K)
Try Me On For Size by fandomfluffandfuck (E, 7K)
where else would we go by kocuria (G, 1433)
A Higher Epsilon by deadto27 (E, 91K)
Red Carpet Rescue Mission by darter_blue (E, 10K)
Encouragement and Persuasion by orphan_account (NR, 36K)
The Discovery by fandomfluffandfuck (E, 5K)
Siberia by CC99trialanderrorgirl (E, 10K)
A Study In Steve Rogers' Stupidity by TheGameIsOver (T, 8K)
He Wears It Well by roe87 (E, 15K)
Nothing colder than being alone by lalalalalahahahahaha (M, 4K)
Heavy Lies the Head by inflomora, odetteandodile (M, 60K)
The Safer Course by seapigeon (M, 8K)
Broken memories by mcuwhore (G, 26K)
Very, Very Easy To Be True by Whendoestheshipsail (AO3 login required) (E, 57K)
Paradise Lost (& Found) by JJK (M, 121K)
Latte Art and Slow Dancing in the Dark by deadonarrival for The Twitter Collaborative (E, 90K)
"Hey, Sarge, you got a dame?" by rox_fanfics (T, 3K)
Out Of Order by Marvel_Mania (E, 19K)
Bucky Makes Cheesecake for Clint's Birthday Dinner by E_Greer (M, 8K)
way far beyond by squirmingbubbles (T, 7K)
Third one's the charmed by NatyCeleste (E, 12K)
When The World Was At War (we just kept dancing) by PeachyKeener (E, 41K)
Renegades by crinklefries (M, 142K)
Phase One: MechAvengers Assemble by DarkFairytale (T, 51K)
Podfic for "Dinner for Two" by Dira Sudis by MsPooslie for Dira Sudis (dsudis), hpismyhomeboy (E, 22)
Take Me Home by fandomfluffandfuck (NR, 5K)
cradling the flame by astudyinsolitude for ZepysGirl (E, 17K)
Brooklyn Syndrome by lordelannette (E, 158K)
They Were Zoomates!!! by Written_prose_things (G, 4K)
college 101 by mareviils (E, 14K)
Under the Table and Dreaming by Daretodream66 (M, 76K)
So. You want to know more about the A/B/O universe. by moonythejedi394 (M, 13K)
Corrupted by nerdyrose24 (T, 3K)
The Comeback Kid by grimeysociety (E, 169K)
Latitude Zero by Madara_Nycteris (E, 27K)
why would i shake your hand (when i could shake your bed)? by mediocre_fanfics (NR, 23K)
in the ruins of our worlds by made_of_sunshine (T, 56K)
Old Friends, New Problems by BFab (NR, 14K)
A Perfect Prescription by thewaythatwerust (E, 45K)
I Call Out Your Name, It Feels Like a Song I Know So Well by breatheforeverypart (M, 21K)
two gentlemen of brooklyn by rooonil_waazlib (E, 8K)
Baby Seasons Change But People Don't by fandomfluffandfuck (NR, 6K)
put you on something new by howdoyousleep, the1918 (E, 14K)
Raging War Within by softboibarnes (E, 124K)
Any Day We Can Wash Out To Sea by ashdeanmanns (NR, 26K)
entropy by truehumandisaster (NR, 4K)
Starting Over by Annaelle (T, 50K)
Everybody's a Dreamer by powerfulowl (StuckyFlangst) (E, 13K)
Quality Wood by cloudycelebrations (G, 600)
14 notes Ā· View notes
nelsonbeauchejason Ā· 2 years ago
Text
ā€œTo have someone understand your mind is a different kind of intimacy.ā€
ā€” Unknown
4K notes Ā· View notes
arxt1 Ā· 4 years ago
Text
Gravitational domain wall in two-dimensional dilaton gravity. (arXiv:2006.07962v1 [hep-th])
We study a special two-dimensional dilaton gravity with Lagrangian $\mathcal{L}=\frac{1}{2}\sqrt{-g}(\phi R+W(\phi))$ where $W(\phi)={\rm sech}^2\phi$. This theory describes two-dimensional spacetimes that are asymptotically flat. Very interestingly, it has an exact solution for the metric, ${\rm d} s^2=-(\tanh x){\rm d} t^2+1/(\tanh x)\, {\rm d} x^2$, which presents an event horizon but no singularity. Because of the kink profile for the metric components appearing in this solution, we refer to it as gravitational domain wall with the wall simply being the event horizon and separating two asymptotically Minkowskian spacetimes. The global causal structure for such an object is studied via coordinate extension and the thermodynamical quantities are computed. While the gravitational domain wall has non-zero temperature $1/4\pi$, its energy and entropy are vanishing.
from gr-qc updates on arXiv.org https://ift.tt/3hwHU5L
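As a quick consistency check of the quoted temperature (my annotation, not part of the abstract): for a static metric of the assumed form $ds^2 = -f(x)\,dt^2 + f(x)^{-1}\,dx^2$, the standard Hawking temperature is $T_H = f'(x_h)/4\pi$ at a horizon $f(x_h) = 0$, which reproduces the stated $1/4\pi$:

```latex
% Assumed static form: ds^2 = -f(x) dt^2 + f(x)^{-1} dx^2 with f(x) = tanh x
\[
  f(x_h) = \tanh x_h = 0 \;\Rightarrow\; x_h = 0, \qquad
  f'(x) = \operatorname{sech}^2 x \;\Rightarrow\; f'(0) = 1,
\]
\[
  T_H = \frac{\kappa}{2\pi} = \frac{f'(x_h)}{4\pi} = \frac{1}{4\pi}.
\]
```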
0 notes
theresawelchy Ā· 6 years ago
Text
tmp
<span id="MathJax-Element-1-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 109%; text-align: center; position: relative;" data-mathml="ā€ role=ā€presentationā€>
Lecturer: Aram Harrow
Scribes: Sinho Chewi, William Moses, Tasha Schoenstein, Ary Swaminathan
November 9, 2018
Outline
Sampling from thermal states was one of the first and (initially) most important uses of computers. In this blog post, we will discuss both classical and quantum Gibbs distributions, also known as thermal equilibrium states. We will then discuss Markov chains that have Gibbs distributions as stationary distributions. This leads into a discussion of the equivalence of mixing in time (i.e. the Markov chain quickly equilibrates over time) and mixing in space (i.e. sites that are far apart have small correlation). For the classical case, this equivalence is known. After discussing what is known classically, we will discuss difficulties that arise in the quantum case, including (approximate) Quantum Markov states and the equivalence of mixing in the quantum case.
Gibbs distributions
We have already learned about phase transitions in a previous blog post, but they are important, so we will review them again. The Gibbs or thermal distribution is defined as follows: Suppose that we have an energy function $E : \{0,1\}^n \to \mathbb{R}$, which takes $n$-bit strings to real numbers. Usually, $E = \sum_{i=1}^m E_i$, where each $E_i$ term depends only on a few bits. For example, the energy might be the number of unsatisfied clauses in a 3-SAT formula, or it may arise from the Ising model. The Gibbs distribution is

$$p(x) = \frac{\exp\{-E(x)/T\}}{Z}$$

where the normalization factor in the denominator, also called the partition function, is $Z = \sum_{x \in \{0,1\}^n} \exp\{-E(x)/T\}$. Another, perhaps more operational, way to define the Gibbs distribution is:

$$p = \operatorname*{arg\,max}_{q \in \mathcal{P}(\{0,1\}^n)} H(q) \quad \text{subject to the constraint } \langle q, E \rangle = \bar{E}.$$

In this expression, $\mathcal{P}(\{0,1\}^n)$ is the set of probability distributions on $\{0,1\}^n$, $H$ is the Shannon entropy, and $\bar{E}$ is a constant representing the average energy. We are thinking of probability distributions and $E$ as vectors of size $2^n$. It turns out that if we solve this optimization problem, then the Gibbs distribution is the unique solution.
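To make the definition concrete, here is a minimal brute-force sketch; the energy function (the Hamming weight of the string) is an assumed toy example, not one from the post. It enumerates all $2^n$ strings, computes the partition function $Z$, and normalizes.

```python
import numpy as np
from itertools import product

# Brute-force Gibbs distribution over {0,1}^n for a toy energy;
# only feasible for small n, since the probability vector has size 2^n.
n, T = 4, 1.0
states = np.array(list(product([0, 1], repeat=n)))
E = states.sum(axis=1)          # assumed toy energy: number of 1 bits
weights = np.exp(-E / T)
Z = weights.sum()               # partition function
p = weights / Z
print(p.sum())                  # 1.0: a valid probability distribution
print(states[p.argmax()])       # the all-zeros string has the highest weight
```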
Uses of Gibbs distributions
Why is it useful to work with Gibbs distributions?
Gibbs distributions arise naturally in statistical physics systems, such as constraint satisfaction problems (CSPs), the Ising model, and spin glasses. One approach to deal with Gibbs distributions is through belief propagation (BP), which yields exact inference on tree graphical models and sometimes phase transition predictions on loopy graphs. Instead, we will focus on a different approach, namely, sampling from the Gibbs distribution.
If we want to minimize $E$ (say, to find a 3-SAT solution), we can use simulated annealing (a minimal sketch appears after this list). The idea of annealing is that we want to produce a crystal; a crystal is the lowest-energy configuration of molecules. If we heat the substance into a liquid and then cool it quickly, we will not get a nice crystal, because little bits of the material will point in different directions. In order to form a crystal, we need to cool the system slowly.
In computer science terms, we take a sample at a high temperature, because sampling is generally easier at a higher temperature than at a lower one. We then use that sample as the starting point for an equilibration process at a slightly lower temperature, and repeat this procedure. If we reach zero temperature, then we are sampling from the minimizers of $E$. In practice, the system will usually stop mixing before we get to zero temperature, but this is a good heuristic. You can think of this process as gradient descent with some additional randomness.
Gibbs distributions are used to simulate physical systems.
Gibbs distributions are used in Bayesian inference due to the Hammersley-Clifford theorem, which will be discussed next.
Gibbs distributions are also connected to multiplicative weights for linear programming (not discussed in this blog post).
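Here is the simulated-annealing sketch promised above. The energy (the number of 1 bits) and the geometric cooling schedule are illustrative assumptions, not a tuned recipe; the accept/reject step is the Metropolis rule developed later in this post.

```python
import math, random

def anneal(n=30, T0=2.0, Tmin=0.01, cooling=0.95, sweeps=50):
    x = [random.randint(0, 1) for _ in range(n)]  # random hot start
    T = T0
    while T > Tmin:
        for _ in range(sweeps):          # equilibrate at temperature T
            i = random.randrange(n)
            dE = 1 - 2 * x[i]            # flipping bit i changes sum(x) by this
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x[i] ^= 1                # Metropolis acceptance rule
        T *= cooling                     # then cool slightly and repeat
    return x

print(sum(anneal()))                     # near 0, i.e. near a minimizer of E
```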
Bayesian inference & the Hammersley-Clifford theorem
In order to present the Hammersley-Clifford theorem, we must first discuss Markov networks. For this part, we will generalize our setup to a finite alphabet $\Sigma$, so the energy function is now a function $\Sigma^n \to \mathbb{R}$.
Markov chains
First, let us recall the idea of a Markov chain with variables $X_1$, $X_2$, $X_3$.
The random variables $X_1$, $X_2$, $X_3$ form a Markov chain if their joint distribution can be written in a factored way: $p(x_1,x_2,x_3) = p_{1,2}(x_1,x_2)\, p_{3 \mid 2}(x_3 \mid x_2)$. For example, imagine that $X_1$, $X_2$, $X_3$ represent the weather on Monday, Tuesday, and Wednesday respectively. These random variables form a Markov chain if, conditioned on the weather on Tuesday, we have all of the information we need to forecast the weather on Wednesday. Another way to say this is that, conditioned on the weather on Tuesday, the weather on Monday and the weather on Wednesday are conditionally independent. Note that the weather on Monday and the weather on Wednesday are not independent; there can be correlations, but these correlations are mediated through the weather on Tuesday. It is important to note that the definition of a Markov chain is symmetric with respect to going forwards or backwards in time, so we can also write the conditional independence condition as $p(x_1,x_2,x_3) = p_{2,3}(x_2,x_3)\, p_{1 \mid 2}(x_1 \mid x_2)$.
The conditional independence condition can also be written as $p_{1,3 \mid 2}(x_1, x_3 \mid x_2) = p_{1 \mid 2}(x_1 \mid x_2)\, p_{3 \mid 2}(x_3 \mid x_2)$. Recall that two random variables $X_1$ and $X_2$ with joint distribution $p$ are independent, i.e. $p(x_1,x_2) = p_1(x_1)\, p_2(x_2)$, if and only if $I(X_1; X_2) = 0$, where $I$ here denotes the mutual information. Similarly, conditional independence is equivalent to the conditional mutual information $I(X_1; X_3 \mid X_2)$ equaling zero. This quantity is defined as $I(X_1; X_3 \mid X_2) = H(X_1 \mid X_2) + H(X_3 \mid X_2) - H(X_1, X_3 \mid X_2)$.
Keep in mind that conditional independence is characterized in two equivalent ways: via an algebraic condition on the distributions, and via mutual information.
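The equivalence is easy to check numerically. A small sketch under assumptions (a random 3-state chain, built so the joint factors as $p_{1,2} \cdot p_{3 \mid 2}$):

```python
import numpy as np

# Verify that I(X1; X3 | X2) = 0 when p(x1,x2,x3) = p12(x1,x2) * p(x3|x2).
rng = np.random.default_rng(0)
k = 3
p12 = rng.random((k, k)); p12 /= p12.sum()                           # joint of (X1, X2)
p3g2 = rng.random((k, k)); p3g2 /= p3g2.sum(axis=1, keepdims=True)   # rows: p(x3 | x2)

p = p12[:, :, None] * p3g2[None, :, :]   # joint p(x1, x2, x3), axes (x1, x2, x3)

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()       # Shannon entropy in bits

# I(X1; X3 | X2) = H(X1,X2) + H(X2,X3) - H(X2) - H(X1,X2,X3)
cmi = H(p.sum(axis=2)) + H(p.sum(axis=0)) - H(p.sum(axis=(0, 2))) - H(p)
print(f"I(X1; X3 | X2) = {cmi:.2e}")     # ~0 up to floating-point error
```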
Markov networks
A Markov network is like a Markov chain, but with more random variables and a more interesting structure. Imagine that we have a graph, where each node is associated with a random variable and the edges encode possible correlations. A Markov network has the property that if we take any disjoint collection of nodes $A$, $B$, and $C$ such that $A$ and $C$ are fully separated by $B$ (that is, any path from $A$ to $C$ must go through $B$, or alternatively, removing $B$ leaves $A$ and $C$ disconnected), then $I(X_A; X_C \mid X_B) = 0$. The notation $X_A$ here means the collection of random variables associated with the nodes in $A$.
For example:
Here, if $A = \{1,5,6\}$, $B = \{2,7\}$, and $C = \{3,4\}$, then $B$ separates $A$ and $C$.
A Markov network is also called a graphical model or a Markov random field; and yet another name for them is Gibbs distribution, which is the content of the following theorem:
Theorem 1 (Hammersley-Clifford Theorem): Let $p$ be a strictly positive distribution on $\Sigma^n$. Then, $p$ can be represented as a Markov network with respect to a graph $G$ if and only if $p$ can be expressed as a Gibbs distribution $p(x) \propto \exp\{-\sum_{C \in \mathcal{C}(G)} E_C(x_C)\}$, where $\mathcal{C}(G)$ is the set of cliques (fully connected subsets) of $G$.
This theorem says that Markov networks are the same as Gibbs states, with the same notion of locality.
The Hammersley-Clifford theorem implies an area law for mutual information; we will explain what this is and sketch why it is true. Divide a system into two disjoint pieces $A$ and $B$. We want to know about the mutual information between $A$ and $B$, $I(A;B)$. The Hammersley-Clifford theorem gives us a bound which depends only on the size of the boundary $\partial$ between these sets. For simplicity, assume $\partial \subseteq B$. Also, assume that the interactions have bounded range; then, the Hammersley-Clifford theorem tells us that $I(A; B \mid \partial) = 0$.
Now, we will use the fact that $I(A; B \mid \partial) = I(A; B, \partial) - I(A; \partial)$. We can see this by writing out the expressions, but the intuition is that the term on the left asks how much $A$ knows about $B$, having already known about $\partial$. This equals how much $A$ knows about $B$ and $\partial$ combined, minus how much $A$ knows about $\partial$ alone. In this case, since we said $\partial \subseteq B$, $I(A; B)$ is the same as $I(A; B, \partial)$. In general, however, we have an upper bound:
<span id="MathJax-Element-88-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 109%; text-align: center; position: relative;" data-mathml="I(A;B)ā‰¤I(A;B,āˆ‚)=I(A;āˆ‚)+I(A;Bāˆ£āˆ‚)ā‰¤H(āˆ‚)ā‰¤|āˆ‚|logā”|Ī£|ā€ role=ā€presentationā€>I(A;B)ā‰¤I(A;B,āˆ‚)=I(A;āˆ‚)+I(A;Bāˆ£āˆ‚) ā‰¤H(āˆ‚)ā‰¤|āˆ‚|log|Ī£|I(A;B)ā‰¤I(A;B,āˆ‚)=I(A;āˆ‚)+I(A;Bāˆ£āˆ‚)ā‰¤H(āˆ‚)ā‰¤|āˆ‚|logā”|Ī£|
I(A;B) \le I(A; B, \partial) = I(A; \partial) + \cancel{I(A;B \mid \partial)} \le H(\partial) \le |\partial| \log |\Sigma|
In this calculation, we have used $I(A; \partial) = H(\partial) - H(\partial \mid A)$ (the information between $A$ and $\partial$ is the amount by which the entropy of $\partial$ gets reduced once we know $A$) and $H(\partial \mid A) \ge 0$ (which is true classically).
Since the mutual information scales only with the surface area of the boundary and not with the size of the two regions $A$ and $B$, this is known as an area law [1].
Relationship to Bayesian inference
In Bayesian inference, we have a model for a system which can be very complicated. The model represents our assumptions about how parts of the system are causally related to the rest of the system. We have some observations, and we want to sample from the distribution conditioned on those fixed observations. Sampling from a conditional distribution is not the same as sampling from the original distribution, but we can still formally represent the conditional distribution as a Markov network. Therefore, sampling from Markov networks is a broadly useful task.
As an example of a complicated Bayesian model, consider a hierarchical Bayesian model [2]. Bayesian statistics requires choosing a prior distribution, and when there is a natural parameterized family of priors that a statistician can use, it may make sense to introduce a distribution over the priors; this is known as introducing a hyperparameter, and inference in the resulting hierarchical model (including computation of the posterior distribution) is frequently intractable. However, it is still desirable to work with these models because they are often more accurate than models in which the prior is handpicked by a statistician.
Sampling from Gibbs distributions
The task of sampling from an arbitrary Gibbs distribution is MA-complete [3], and it is not hard to see that at low enough temperatures this problem is at least NP-hard. So, how do we sample from these distributions?
This section will discuss Markov chain Monte Carlo (MCMC) methods, namely the Metropolis-Hastings algorithm and Glauber dynamics. Readers familiar with these methods may wish to skip to the discussion of mixing in time. For readers who wish to build more intuition about Markov chains before proceeding, see the Appendix, where the simple example of the random walk on a cycle is treated in detail.
Markov chain Monte Carlo (MCMC) methods
The general approach is to use a Markov chain. Let $\Omega = \Sigma^n$ be the possible states of the system. Effectively, a Markov chain is a way of doing a random walk over $\Omega$.
The transition probabilities of the Markov chain are $\mathbb{P}\{X(t+1) = y \mid X(t) = x\} = T_{y,x}$. Here, $T$ is the transition probability matrix. The column at index $x$ of $T$ is the probability distribution of the next state of the Markov chain, if the current state is $x$. The row at index $y$ is a row of probability values which give the probabilities of jumping into state $y$ from every other state. The matrix has the properties that its entries are non-negative and, for every $x$, $\sum_{y \in \Omega} T_{y,x} = 1$. These properties say that $T$ is a (column) stochastic matrix.
Suppose we start at a state $x(0)$; or, more generally, we start with a distribution $p$ over $\Omega$. If we move according to the chain once, the distribution will be $Tp$. If we move again, the distribution will be $T^2 p$. In general, after $t$ moves, the distribution is $T^t p$. So, we can express the dynamics of the chain as matrix-vector multiplication.
It is worth mentioning that if we are simulating the chain on a computer and the states are $n$-bit strings, then these probability vectors have size $2^n$, so it becomes impractical to store the entire probability distribution.
The justification for our algorithms is the following theorem.
Theorem 2 (Perron-Frobenius Theorem): If $T$ is a stochastic matrix that is irreducible and aperiodic, then one of its eigenvalues is $1$, and all other eigenvalues have magnitude strictly less than $1$. There is a unique probability distribution $\pi$ such that $T\pi = \pi$.
The theorem implies that $T^t p$ will converge to the stationary distribution $\pi$ as $t \to \infty$. So, if we want to sample from a distribution, this provides a method of doing so: cook up a Markov chain that equilibrates to the desired distribution, and then run the Markov chain until convergence. A priori, it is not obvious how we can design the Markov chain. At first, our problem was to sample from a probability distribution (a vector), and now we have changed the problem to designing an entire matrix, which does not appear to make our task easier.
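A tiny numerical illustration of Theorem 2 and the matrix-vector picture above; the $3 \times 3$ chain here is an arbitrary assumed example, not one from the notes.

```python
import numpy as np

# Columns are current states: column x holds the next-state distribution
# given x (column-stochastic, matching the T_{y,x} convention above).
T = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.4],
              [0.2, 0.2, 0.5]])

p = np.array([1.0, 0.0, 0.0])            # start deterministically in state 0
for _ in range(100):
    p = T @ p                            # one step of the chain: p <- T p

# Stationary distribution: the eigenvector of T with eigenvalue 1.
w, v = np.linalg.eig(T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
print(np.round(p, 6), np.round(pi, 6))   # the two agree closely
```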
Now, the question becomes: how does one come up with Markov chains that give you the desired stationary distribution?
Metropolis-Hastings algorithm
The first algorithm we will introduce is the Metropolis-Hastings algorithm. One more desirable feature of a Markov chain is that it satisfies detailed balance, which says $\pi_x T_{y,x} = \pi_y T_{x,y}$ for all $x$ and $y$. This condition says that if we pick a point with probability according to the stationary distribution and transition, the probability of picking $x$ and then moving to $y$ should be the same as the probability of picking $y$ and then moving to $x$.
For a Markov chain in equilibrium, the total amount of probability flowing out of $x$ must equal the total amount of probability flowing into $x$. For example, the United States might export products to Europe and import from China. Detailed balance says that the flow along each edge must balance, which is a more demanding condition. In the example with country trade deficits, we are requiring that all bilateral trade deficits be zero.
Mathematically, detailed balance implies that $T$ can be transformed, via similarity transformations, into a symmetric matrix. The Metropolis-Hastings algorithm says that we should choose $T$ with the property $\frac{T_{x,y}}{T_{y,x}} = \frac{\pi_x}{\pi_y}$. Suppose that we have an underlying graph on our state space, and suppose that we are at a state $x$. The algorithm chooses a random neighbor, say $y$, and then accepts or rejects this move with some probability. If the move is accepted, then we move to $y$ and continue the algorithm from there. Otherwise, if the move is rejected, then we stay at $x$. We are free to choose any underlying graph (as long as it is connected and has a self-loop), and then we tune the acceptance probability so that detailed balance holds.
Look at the trial move $x \to y$. One way we can accomplish detailed balance is by looking at the ratio $\pi_y/\pi_x$. If $\pi_y/\pi_x \ge 1$, then always accept the move. If $\pi_y/\pi_x < 1$, then accept the move with probability $\pi_y/\pi_x$; in other words, the acceptance probability is $\min\{1, \pi_y/\pi_x\}$.
To get an idea for how the algorithm works, suppose that our underlying graph is $d$-regular. Then, for neighbors $x$ and $y$,

$$T_{y,x}=\min\Bigl\{1,\frac{\pi_y}{\pi_x}\Bigr\}\frac{1}{d}, \qquad T_{x,y}=\min\Bigl\{1,\frac{\pi_x}{\pi_y}\Bigr\}\frac{1}{d}.$$

Claim: $T_{y,x} \pi_x = \frac{1}{d} \min\{\pi_x,\pi_y\}$, which is manifestly symmetric in $x$ and $y$; thus, we have reversibility. This is the basic idea of the Metropolis-Hastings algorithm.
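To make this concrete, here is a minimal Python sketch of the Metropolis rule on an abstract graph. The helper names are assumptions of this sketch, not a fixed API: `neighbors(x)` is assumed to return the list of neighbors of $x$ in the underlying ($d$-regular) graph, and `pi(x)` returns a weight proportional to $\pi_x$.

```python
import random

def metropolis_step(x, neighbors, pi):
    """One Metropolis step from state x on the underlying graph."""
    y = random.choice(neighbors(x))      # propose a uniformly random neighbor
    ratio = pi(y) / pi(x)                # pi_y / pi_x; pi may be unnormalized
    if ratio >= 1 or random.random() < ratio:
        return y                         # accept the move
    return x                             # reject: stay at x

def run_chain(x0, neighbors, pi, steps):
    x = x0
    for _ in range(steps):
        x = metropolis_step(x, neighbors, pi)
    return x
```

Note that only the ratio `pi(y) / pi(x)` is ever used, which is exactly why the method survives an unknown normalizing constant, as discussed next.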
How does it work for a Gibbs distribution $\pi_x = \exp\{-E(x)/T\}/Z$, where the energy function might, for example, count the number of violated clauses in a 3-SAT formula? In this case, we might be a little worried. The numerator of $\pi_x$ is pretty easy to compute (we can count how many violated constraints there are), but the denominator is hard to compute. In general, it is #P-hard to compute the denominator, because as $T$ drops to $0$, the partition function in this case approaches the number of 3-SAT solutions. So, how do we calculate the ratios $\pi_y/\pi_x$ that the algorithm requires? We are able to do this because the ratio does not depend on $Z$:

$$\frac{\pi_y}{\pi_x} = \exp \frac{E(x)-E(y)}{T}.$$

Suppose that the energy is a sum of local terms, and the underlying graph corresponds to modifying one site at a time. What this means is that the vertex set of the graph is $\Omega = \{0,1\}^n$ and the edges in the graph correspond to flipping exactly one bit. In this case, it becomes very easy to evaluate the computations needed for the algorithm; in fact, we can even do them in parallel.
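As an illustrative sketch of this locality for the 3-SAT energy above: when flipping bit $i$, only the clauses mentioning bit $i$ can change status, so the ratio can be computed from those clauses alone. The clause encoding here (tuples of nonzero integers, where literal $v > 0$ means bit $v-1$ must be $1$ and $v < 0$ means bit $-v-1$ must be $0$) and the `clauses_at` index are assumptions chosen for brevity.

```python
import math

def violated(clause, x):
    """A clause is a tuple of nonzero ints: literal v > 0 is satisfied when
    x[v-1] == 1, and v < 0 is satisfied when x[-v-1] == 0. The clause is
    violated when none of its literals is satisfied."""
    return not any((x[v - 1] == 1) if v > 0 else (x[-v - 1] == 0) for v in clause)

def flip_ratio(x, i, clauses_at, T):
    """pi_y / pi_x for flipping bit i, where clauses_at[i] lists only the
    clauses mentioning bit i; all other clauses contribute nothing."""
    before = sum(violated(c, x) for c in clauses_at[i])
    x[i] ^= 1                                   # trial flip
    after = sum(violated(c, x) for c in clauses_at[i])
    x[i] ^= 1                                   # undo the trial flip
    return math.exp((before - after) / T)       # = exp((E(x) - E(y)) / T)
```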
How do we choose the underlying graph? The key idea is that we do not want the majority of our moves to be rejected. A good example to keep in mind is the Ising model, where the configurations are $x \in \{0,1\}^n$ and the energy is $E(x) = -\sum_{i,j=1}^n J_{i,j} x_i x_j$. If $J_{i,j} \ge 0$ for all $i$, $j$, then we say that the model is ferromagnetic (we obtain lower energy by making the sites agree with each other). Of course, an antiferromagnetic model is just the opposite of this.
Assume that the bits are laid out in a square and $J_{i,j} = J$ if $i$ and $j$ are neighbors on the square, and $J_{i,j} = 0$ if they are not. As we vary the quantity $J/T$, we observe a phase transition. If $J/T$ is small, then the coupling between the random variables is weak and the different parts of the system are almost independent; we call this the disordered phase. If $J/T$ is large, then the spins want to align in the same direction and the Gibbs distribution will look almost like the following: with probability $1/2$, all spins are $+$, and with probability $1/2$, all spins are $-$; we call this the ordered phase.
In the disordered phase, when the spins do not need to align so closely, the Metropolis-Hastings algorithm will work well. In the ordered phase, the algorithm is doomed. Indeed, suppose that most of the spins are $+$. As time proceeds, any $-$s will switch to $+$. There may be islands of $-$ spins initially, but it will be energetically favorable for these islands to shrink over time. Therefore, there will be an exponentially small chance for the system to switch to a configuration with mostly $-$'s, and thus the chain takes exponentially long to mix. Here, people are interested in understanding the autocorrelation time, because the goal is to run the chain for some time, get one sample, run the chain for some more time, get another sample, etc.
Glauber dynamics
This next method (Glauber dynamics) is essentially the same as Metropolis-Hastings, but this is not immediately obvious. We are at a state $x = (x_1,\dotsc,x_n) \in \Sigma^n$. (For the Metropolis-Hastings algorithm, we could be walking on a state space without a product structure. However, Glauber dynamics requires a product structure.) Then, we update $x$ to $(x_1,\dotsc,x_{i-1},x_i',x_{i+1},\dotsc,x_n)$ with chance $\pi_{i\mid -i}(x_i' \mid x_1,\dotsc,x_{i-1},x_{i+1},\dotsc,x_n)$. In other words, we hold all other bits fixed, and conditioned on those other bits, we resample the $i$th bit. As with Metropolis-Hastings, $\pi$ is stationary for this chain.
It is not obvious that these conditional distributions can be computed efficiently, but it is possible since normalizing the conditional distribution only requires summing over the possible configurations for a single random variable. On a Markov network, the conditional probability is $\pi_{i \mid N(i)}(x_i' \mid x_{N(i)})$, where $N(i)$ denotes the set of neighbors of $i$. This makes the computation a constant-sized calculation (i.e., it does not depend on the size of the system).
For example, in the Ising model, suppose we are at state $x \in \{\pm 1\}^n$. In Glauber dynamics, we pick a vertex $i \in [n]$ u.a.r. and update it to $+$ with probability

$$p_{i \mid N(i)}(+ \mid x_{N(i)}) = \frac{\exp(T^{-1}\sum_{j\in N(i)} x_j)}{\exp(-T^{-1} \sum_{j\in N(i)} x_j) + \exp(T^{-1} \sum_{j\in N(i)} x_j)}.$$
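A minimal sketch of this update in Python, assuming the ferromagnetic case $J = 1$ as in the formula above; the adjacency list `adj` (mapping each site to its neighbors) is an assumption of this sketch.

```python
import math, random

def glauber_step(x, adj, T):
    """One Glauber update for the Ising model with J = 1: pick a site
    uniformly at random and resample it conditioned on its neighbors."""
    i = random.randrange(len(x))
    s = sum(x[j] for j in adj[i])                # sum of neighboring spins
    p_plus = math.exp(s / T) / (math.exp(-s / T) + math.exp(s / T))
    x[i] = +1 if random.random() < p_plus else -1
    return x
```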
Mixing in time
Mixing in time means that the dynamics will equilibrate rapidly. It turns out that this is equivalent to mixing in space, which means that $\pi$ itself has decaying correlations. For example, the Ising model at low temperature has a lot of long-range correlations, but at high temperature it does not. For the high temperature regime, we can prove that mixing in time occurs. We will prove this for the ferromagnetic Ising model. The result is known more generally, but the proofs are much easier for the Ising model.
People have known about the Metropolis-Hastings algorithm since the 1950s, but only recently have researchers been able to prove convergence guarantees for the 2D Ising model. There is a large gap between theory and practice, but in some situations we can prove that the algorithm works.
Sampling from the distribution is roughly equivalent to estimating the partition function (sampling-counting equivalence). There have been many papers addressing tasks such as estimating the non-negative permanent, the number of colorings of a graph, etc.2 A dominant way of accomplishing these tasks is proving that the Metropolis-Hastings algorithm converges for these problems. It is easy to find algorithms for these problems that converge to Gibbs distributions, but the convergence may take exponential time.
We will look at the situation when the energy function looks like the Ising model, in the sense that the interactions are local and reflect the structure of some underlying space. Also, assume that the interactions are of size $O(1)$ and that the scaling comes from the size of the system. When can we expect that our algorithms work? There are two main cases when we can argue that there should be rapid mixing.
High temperature regime: The system is very disordered, and in the limit as the temperature approaches infinity, we get the uniform distribution.
One dimension: In 1D, we can exactly compute the partition function using dynamic programming. Before, we mentioned that if there is a sea of $+$s and an island of $-$s, then it is energetically favorable for the island to shrink; note that this is no longer true in 1D. In a way, 1D systems are more "boring" because they cannot exhibit arbitrarily long-range correlations.
In this part of the blog post, we will try to be more proof-oriented. We will start by explaining why it is plausible that high temperature means that the chain will mix rapidly in time.
Coupling method
One method of proving rates of convergence for Markov chains is by analyzing the spectral gap. Another method is the coupling method.
The idea behind the coupling method is to start with two configurations $X(0),Y(0) \in \Omega$. We want each one to evolve under the Markov chain.

The key part is that there is still some freedom with respect to what the dynamics looks like. In particular, we are allowed to correlate the $X$ and $Y$ processes. Thus, we are defining a joint transition probability ${\mathbb P}\{X(1)=x(1),Y(1)=y(1) \mid X(0),Y(0)\}$. We want to design the process such that $X(1)$ and $Y(1)$ are closer together than $X(0)$ and $Y(0)$. Imagine that we have two particles bouncing around. Each particle follows the dynamics of $T$, but they are correlated so that they drift together, and once they meet, they stick together. It turns out that the mixing time can be upper bounded by the time it takes for the particles to meet each other.
Assume we have some sort of distance function $\operatorname{dist}$ on the underlying space and we can prove that $\operatorname{\mathbb E}\operatorname{dist}(X(1),Y(1)) \le \exp(-\alpha) \operatorname{dist}(X(0),Y(0))$. Then, it turns out that the mixing time $t_{\rm mix}(\epsilon)$, i.e. the time required to get within $\epsilon$ of the stationary distribution, is upper bounded as

$$t_{\rm mix}(\epsilon) \le \frac{\log\{(\operatorname{diam}\Omega)/\epsilon\}}{\alpha}.$$

Initially, the two particles can be $\operatorname{diam}\Omega$ apart, but the expected distance is exponentially shrinking as we run the coupling, so the mixing time is logarithmic in the diameter.
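To sketch where this bound comes from (assuming the distance is integer-valued, so that $\operatorname{dist}(x,y) \ge 1$ whenever $x \ne y$), iterate the one-step contraction:

$$\Pr\{X(t) \ne Y(t)\} \le \operatorname{\mathbb E}\operatorname{dist}\bigl(X(t),Y(t)\bigr) \le e^{-\alpha t} \operatorname{dist}\bigl(X(0),Y(0)\bigr) \le e^{-\alpha t} \operatorname{diam}\Omega.$$

Taking $Y(0) \sim \pi$ makes the left-hand side an upper bound on the total variation distance to stationarity, by the coupling characterization given below; setting the right-hand side equal to $\epsilon$ and solving for $t$ gives the stated bound.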
The distance between probability distributions is defined as follows. Let $p$ and $q$ be two probability distributions on $\Omega$. Then, the metric is:3

$$\frac{1}{2} \|p-q\|_1 = \frac{1}{2}\sum_{x \in \Omega}|p(x)-q(x)| = \min_{\substack{(X,Y) \sim r \in {\mathcal{P}}(\Omega \times \Omega) \\ r_1 = p \\ r_2 = q}} {\mathbb P}_r\{X \ne Y\}.$$

In this expression, $r_1$ and $r_2$ denote the first and second marginals of $r$ respectively. The minimum is taken over all couplings of $p$ and $q$. This is the correct way to measure the distance between distributions. To give some intuition for this quantity, the quantity on the right represents the best test to distinguish the two distributions. If $p$ and $q$ are the same, we can take a coupling in which $X \sim p$ and $Y \sim q$ are always identical. If $p$ and $q$ have disjoint supports, then no matter what coupling we use, $X$ and $Y$ will never be equal.
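For small finite distributions, both sides of this identity are easy to compute directly; here is a short sketch (the dictionary representation of distributions is just a convenience of this example). An optimal coupling puts mass $\min\{p(x),q(x)\}$ on the diagonal, leaving disagreement probability $1 - \sum_x \min\{p(x),q(x)\}$.

```python
def tv_distance(p, q):
    """Total variation distance between dicts mapping outcomes to probabilities."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def optimal_coupling_disagreement(p, q):
    """P{X != Y} under an optimal coupling: 1 - sum_x min{p(x), q(x)}."""
    keys = set(p) | set(q)
    return 1.0 - sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
assert abs(tv_distance(p, q) - optimal_coupling_disagreement(p, q)) < 1e-12
```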
It suffices to consider when $X(0)$ and $Y(0)$ are neighbors, i.e. at distance $1$ apart. This is because if we have $X(0)$ and $Y(0)$ far apart, then we could look at the path between them and reduce to the case when they are neighbors. Formally, this is known as path coupling. The formal statement is in Theorem 12.3 of [4]:
Theorem 3: Let $\Gamma$ be a connected weighted graph on the state space, where no edge has weight less than $d_{\min}$. Let $d(C,C')$ be the length of the shortest path from $C$ to $C'$ in $\Gamma$ and let $d_{\max} = \max_{C,C' \in \Omega} d(C,C')$ be the diameter of $\Gamma$. Suppose there is a coupling such that for some $\delta > 0$,

$$\operatorname{\mathbb E}\bigl[d\bigl(X(1),Y(1)\bigr) \bigm\vert \bigl(X(0),Y(0)\bigr) = (C,C')\bigr] \le (1-\delta)d(C,C')$$

for all neighboring pairs $C$, $C'$, i.e., those pairs connected by an edge in $\Gamma$. Then, the mixing time is bounded by

$$t_{\rm mix}(\epsilon) \le \frac{\log(\epsilon^{-1}d_{\max}/d_{\min})}{\delta}.$$
Glauber dynamics at high temperature
Recall that in Glauber dynamics, we pick a site $i$ randomly and then update the site conditioned on its neighbors. The first way we will couple together $X(1)$ and $Y(1)$ is by picking the same site for both of them.

Pick a random $i \in [n]$.

If ${X(0)}_{N(i)} = {Y(0)}_{N(i)}$, then set ${X(1)}_i = {Y(1)}_i$ (if the neighborhoods of the two points agree, then update them the same way). Otherwise, update them using the best possible coupling, i.e., pick a coupling for $({X(1)}_i, {Y(1)}_i)$ which minimizes ${\mathbb P}\{ {X(1)}_i \ne {Y(1)}_i \}$.
So if $X(0) = Y(0)$, then the points will never drift apart. Analyzing this coupling is non-trivial because there is a chance that the distance between the two points can increase.
Assume that the degree of the graph is $\Delta$. Suppose that $\operatorname{dist}(X(0),Y(0)) = 1$, that is, there is a single $a$ such that ${X(0)}_a \ne {Y(0)}_a$. What will happen to $X(1)$ and $Y(1)$? We start by picking a random $i \in [n]$. There are three cases:
<span id="MathJax-Element-267-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="iāˆ‰({a}āˆŖN(a))ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>iāˆ‰({a}āˆŖN(a))iāˆ‰({a}āˆŖN(a))
i \notin (\{a\} \cup N(a)) (with probability <span id="MathJax-Element-268-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="1āˆ’(Ī”+1)/nā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>1āˆ’(Ī”+1)/n1āˆ’(Ī”+1)/n
1 ā€“ (\Delta + 1)/n): Nothing changes; <span id="MathJax-Element-269-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="X(0)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>X(0)X(0)
X(0) and <span id="MathJax-Element-270-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="Y(0)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>Y(0)Y(0)
Y(0) agree at <span id="MathJax-Element-271-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="iā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>ii
i, and <span id="MathJax-Element-272-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="X(1)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>X(1)X(1)
X(1) and <span id="MathJax-Element-273-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="Y(1)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>Y(1)Y(1)
Y(1) will also agree at <span id="MathJax-Element-274-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="iā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>ii
i. The distance remains at <span id="MathJax-Element-275-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="1ā€³ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>11
1.
<span id="MathJax-Element-276-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="i=aā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>i=ai=a
i = a (with probability <span id="MathJax-Element-277-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="1/nā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>1/n1/n
1/n): We picked the one spot in which the two configurations differ. The neighborhoods of <span id="MathJax-Element-278-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="aā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>aa
a are the same for <span id="MathJax-Element-279-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="X(0)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>X(0)X(0)
X(0) and <span id="MathJax-Element-280-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="Y(0)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>Y(0)Y(0)
Y(0), so we update in the same way for both processes, and the distance drops to <span id="MathJax-Element-281-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="0ā€³ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>00
0.
<span id="MathJax-Element-282-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="iāˆˆN(a)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>iāˆˆN(a)iāˆˆN(a)
i \in N(a) (with probability <span id="MathJax-Element-283-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="Ī”/nā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>Ī”/nĪ”/n
\Delta/n): We could have different updates. Here, we have to use the high temperature assumption, which says that if we change one bit, the probability of a configuration cannot change too much.
In the Ising model, <span id="MathJax-Element-284-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="E(x)=āˆ‘i,j=1nJi,jxixjā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>E(x)=āˆ‘ni,j=1Ji,jxixjE(x)=āˆ‘i,j=1nJi,jxixj
E(x) = \sum_{i,j=1}^n J_{i,j} x_i x_j. Changing <span id="MathJax-Element-285-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="aā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>aa
a can bias the energy by at most <span id="MathJax-Element-286-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="Ī”maxiJi,aā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>Ī”maxiJi,aĪ”maxiJi,a
\Delta\max_i J_{i,a}, so the expected distance afterwards is <span id="MathJax-Element-287-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="1+O(maxi,j=1nJi,j/T)ā€ role=ā€presentationā€ style=ā€font-size: 110%; position: relative;ā€>1+O(maxni,j=1Ji,j/T)1+O(maxi,j=1nJi,j/T)
1 + O(\max_{i,j=1}^n J_{i,j}/T).
Adding these cases up to get the overall expected distance gives

$$\operatorname{\mathbb E}\operatorname{dist}\bigl(X(1), Y(1)\bigr) = 1-\frac{1}{n} + \underbrace{O\Bigl(\frac{\Delta J_{\max}}{T}\Bigr)}_{\le 1}\frac{1}{n} = 1 - \frac{c}{n}$$

for $T$ large enough, so the expected distance will shrink. This argument also tells us how large the temperature must be, which is important for applications. This gives us $t_{\rm mix}(\epsilon) = O\Bigl(n\log\frac{n}{\epsilon}\Bigr)$. Notice that this is the same dependence as the coupon collector problem. Therefore, in the high temperature regime, the system behaves qualitatively as if there are no correlations.
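Here is a sketch of this coupling in simulation, for the Ising model on a cycle (a toy graph chosen only to keep the code short). Feeding both chains the same random site and the same uniform variate is one way to realize the best possible coupling for binary spins, since the per-site disagreement probability then equals the total variation distance between the two conditional distributions.

```python
import math, random

def p_plus(x, i, adj, T):
    s = sum(x[j] for j in adj[i])
    return math.exp(s / T) / (math.exp(-s / T) + math.exp(s / T))

def coupled_step(X, Y, adj, T):
    """Update both chains at the same site with the same uniform variate;
    identical neighborhoods then force identical updates."""
    i = random.randrange(len(X))
    u = random.random()
    X[i] = +1 if u < p_plus(X, i, adj, T) else -1
    Y[i] = +1 if u < p_plus(Y, i, adj, T) else -1

n, T = 100, 10.0                                          # high temperature
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}   # cycle graph
X = [random.choice([-1, 1]) for _ in range(n)]
Y = [random.choice([-1, 1]) for _ in range(n)]
t = 0
while X != Y:
    coupled_step(X, Y, adj, T)
    t += 1
print("chains met after", t, "updates")  # empirically ~ n log n at high T
```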
Temporal and spatial mixing equivalence
The analysis of Glauber dynamics at high temperature is already a version of the equivalence between mixing in time and mixing in space. It says that if even the correlations with the immediate neighbors of a site are weak, then Glauber dynamics mixes rapidly.
Now, we want to consider the situation in which there can be strong correlations between immediate neighbors, but weak correlation with far away sites. We want to show that spatial mixing implies temporal mixing.
We will give a few definitions of correlation decay. (Note: The definitions of correlation decay below are not exactly the ones from Aram's lecture. These definitions are from [5] and [6].)
For non-empty $W \subseteq V$ and $\tau \in \Sigma^{V\setminus W}$, let $\mu_W^\tau$ be the distribution of the spins in $W$ conditional on the spins in $V \setminus W$ being fixed to $\tau$. For $\Delta \subseteq W$, let $\mu_{W,\Delta}^\tau$ be the marginal of $\mu_W^\tau$ on the spins in $\Delta$. We will assume that the interactions between the spins have finite range $r > 0$, and $\partial_r W$ denotes the $r$-boundary of $W$, i.e., $\{v \in V \setminus W : \operatorname{dist}(v,W) \le r\}$.
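For intuition, these conditional distributions can be brute-forced on tiny systems. The following sketch enumerates the spins in $W$ with the outside fixed to $\tau$; the callable `energy` and the dictionary representation of configurations are assumptions of this sketch, and the cost is exponential in $|W|$, so this is only for building intuition.

```python
import itertools, math

def mu_W_tau(W, tau, energy, T):
    """Brute-force mu_W^tau: enumerate all configurations of the spins in W,
    with the spins outside W fixed to tau (a dict site -> spin)."""
    weights = {}
    for sigma in itertools.product([-1, +1], repeat=len(W)):
        x = dict(tau)                    # spins on V \ W
        x.update(zip(W, sigma))          # spins on W
        weights[sigma] = math.exp(-energy(x) / T)
    Z = sum(weights.values())            # normalizing over W only is cheap
    return {s: w / Z for s, w in weights.items()}

# Example: two sites in W on a 3-site path, with the outside spin fixed.
E = lambda x: -(x[0] * x[1] + x[1] * x[2])   # ferromagnetic path energy
dist = mu_W_tau(W=(0, 1), tau={2: +1}, energy=E, T=1.0)
```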
(Weak decay of correlations) Weak spatial mixing holds for $W$ if there exist constants $C, \xi > 0$ such that for any subset $\Delta \subseteq W$,

$$\sup_{\tau,\tau' \in \Sigma^{V\setminus W}}\|\mu_{W,\Delta}^\tau - \mu_{W,\Delta}^{\tau'}\|_1 \le C\sum_{x\in\Delta, \; y \in \partial_r W} \exp\Bigl(- \frac{\operatorname{dist}(x,y)}{\xi}\Bigr).$$
(Strong decay of correlations) Strong spatial mixing holds for $W$ if there exist constants $C,\xi > 0$ such that for every $\Delta \subseteq W$ and every $\tau,\tau' \in \Sigma^{V\setminus W}$ differing only at site $y \in V\setminus W$,

$$\|\mu_{W,\Delta}^\tau - \mu_{W,\Delta}^{\tau'}\|_1 \le C\exp\Bigl(-\frac{\operatorname{dist}(y,\Delta)}{\xi}\Bigr).$$
(Strong decay of correlations) Strong spatial mixing in the truncated sense holds for $V$ if there exist $n, \xi > 0$ such that for all functions $f, g : \Omega \to {\mathbb R}$ which depend only on the sites in $\Lambda_f$ and $\Lambda_g$ respectively and such that $\operatorname{dist}(\Lambda_f,\Lambda_g) \ge n$,

$$\sup_{\tau \in \Sigma^{V\setminus W}} \operatorname{cov}_{\mu_W^\tau}(f, g) \le |\Lambda_f||\Lambda_g|\|f\|_\infty \|g\|_\infty \exp\Bigl(-\frac{d(\Lambda_f,\Lambda_g)}{\xi}\Bigr).$$
Here, $\xi$ is the correlation length (in physics, it is the characteristic length scale of a system). In the disordered phase, the correlation length is a constant independent of system size. For our purposes, the main consequence of these definitions is that the effective interaction range of each spin is $O(\xi)$. For the Ising model, there is a key simplification due to monotonicity. Namely, the ferromagnetic Ising model has the nice property (which is not true for other models) that if we flip a sign from $-$ to $+$, this only makes $+$ more likely everywhere. This is because the spins want to agree. There are a lot of boundary conditions to consider, but here, due to monotonicity, we only need to consider two: all of the spins are $-$, and all of the spins are $+$. All $+$ spins will give the highest probability of a $+$ spin, and all $-$ spins will give the lowest probability of a $+$ spin. This monotonicity property is generally not required for time-space mixing equivalence to hold, but it greatly simplifies proofs.
It is a very non-obvious fact that all of these notions of spatial mixing are equivalent. We will sketch a proof that strong correlation decay implies that $t_{\rm mix} = O(n\log n)$.
The idea is to use another coupling argument. Let $X(0), Y(0) \in \{\pm 1\}^V$ differ in one coordinate, i.e., ${X(0)}_a \ne {Y(0)}_a$ and ${X(0)}_i = {Y(0)}_i$ for $i \ne a$. We want to argue that the expected distance between the processes will decrease. The proof uses a generalization of Glauber dynamics called block Glauber dynamics. In Glauber dynamics, we take a single spin and resample it conditioned on its neighbors. In block Glauber dynamics, we take an $L\times L$ box and resample it conditioned on its neighbors. There is an argument, called canonical paths, which can be used to show that if block Glauber dynamics mixes, then regular Glauber dynamics also mixes (slightly more slowly; we lose a $\operatorname{poly}(L)$ factor, but anyway $L$ will be a large constant), so analyzing block Glauber dynamics is fine.
If $a$ lies in the box, then the expected change in distance is $-L^2/n$. If $a$ is far away from the box, then there is no change. If $a$ is on the boundary of the box, then it is possible for the distance to increase. However, strong spatial mixing allows us to control the influence of a single site, so the expected change in distance is bounded by $O(L\xi^2/n)$. Now, since $\xi$ is a constant, if we choose $L$ sufficiently large, then we will have the same situation as in the high temperature case: the expected distance will shrink exponentially over time.
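Summing the contributions of the three cases makes the tradeoff explicit. Schematically (with $c$ a placeholder constant supplied by the strong spatial mixing bound, not a quantity from the original argument):

$$\mathbb{E}[\Delta \operatorname{dist}] \le -\frac{L^2}{n} + \frac{c L \xi^2}{n} = -\frac{L}{n}\left( L - c \xi^2 \right),$$

which is strictly negative once $L > c \xi^2$.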
Quantum systems
The quantum version of Markov chains has many more difficulties. The first difficulty is that the Hammersley-Clifford theorem (which we have been relying on throughout this blog post) fails.
Notation
To properly discuss what we mean, let's set up some notation. Readers already familiar with density matrices, quantum entropy, and quantum mutual information may wish to skip to the next subsection. Most of the time we discuss quantum objects here, we'll be using density matrices, often denoted $\rho$. A density matrix can be thought of as an extension of the usual quantum states $\ket{\psi}$ to the situation where there is some classical source of uncertainty.
A density matrix is a positive semidefinite matrix with trace $1$. This extends the notion of a classical probability distribution; in the quantum setting, a classical probability distribution $p$ (thought of as a vector whose entries sum to $1$) is represented as the density matrix $\operatorname{diag}(p)$.
For example, we can consider a situation in which there is a $1/2$ probability that we started with the quantum state $\ket{\psi}$ and a $1/2$ probability that we started with the quantum state $\ket{\phi}$. This would be denoted as follows:

$$\rho = \frac{1}{2} \ket{\psi}\bra{\psi} + \frac{1}{2} \ket{\phi}\bra{\phi}$$
Density matrices are useful for many tasks, but for our purposes a density matrix captures both the classical and quantum "uncertainty" we have about what state we have.
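As a quick sanity check, here is a minimal NumPy sketch (the choice of $\ket{\psi} = \ket{0}$ and $\ket{\phi} = \ket{+}$ is ours, purely for illustration) that builds such a mixture and verifies the two defining properties:

import numpy as np

# Two arbitrary pure states on one qubit: |psi> = |0> and |phi> = |+>.
psi = np.array([1, 0], dtype=complex)
phi = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Classical 50/50 mixture: rho = 1/2 |psi><psi| + 1/2 |phi><phi|.
rho = 0.5 * np.outer(psi, psi.conj()) + 0.5 * np.outer(phi, phi.conj())

print(np.trace(rho).real)                         # 1.0: unit trace
print(np.all(np.linalg.eigvalsh(rho) >= -1e-12))  # True: positive semidefinite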
Now let's also talk about a second important piece of notation: the tensor product. Often when discussing quantum states, it is important to discuss multiple quantum states simultaneously. For example, Alice has one system $A$ and Bob has another system $B$. However, these systems might be entangled, meaning that the measurement results of the two systems are correlated.
For instance, let us consider the following state:

$$\ket{\psi} = \frac{1}{\sqrt{2}} \left( \ket{+}_A \ket{+}_B + \ket{-}_A \ket{-}_B \right)$$
This particular state has the property that Alice and Bob will always both measure $+$ or they will both measure $-$. The notation for tensors is often ambiguous in the literature as there are many ways of specifying them. For instance, above we used subscripts to explicitly denote which particle was in system $A$ and which was in system $B$. One may also choose to simply use the index of the system as below. The symbol $\otimes$ is used to denote a tensor product between states (where it is assumed that the first state is system $A$ and the second, system $B$). Gradually folks may shorten the notation as follows:

$$\ket{\psi} = \frac{1}{\sqrt{2}} \left( \ket{+}\ket{+} + \ket{-}\ket{-} \right) = \frac{1}{\sqrt{2}} \left( \ket{++} + \ket{--} \right) = \frac{1}{\sqrt{2}} \left[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right]$$
These are all notations for the same state. Let's now talk about this state in the context of a density matrix. The density matrix of this state is as follows:

$$\rho_{A,B} = \frac{1}{2} \left( \ket{++} + \ket{--} \right)\left( \bra{++} + \bra{--} \right) = \frac{1}{2} \left( \ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--} \right) = \frac{1}{2} \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$
Writing the density matrix $\rho$ as $\rho_{A,B}$ makes explicit that this is the density matrix over systems $A$ and $B$.
A crucial operation that one will often perform on density matrices is the partial trace. The partial trace lets us focus on a smaller part of the system while taking into account the influence of the larger system around it.
Here's an example: suppose Bob wants to know what his state is. However, Bob really doesn't care about Alice's system and just wants to know the density matrix for his own system. Bob's density matrix is simply the following (a 50% chance of being in $\ket{+}$ and a 50% chance of being in $\ket{-}$):
$$\rho_B = \frac{1}{2} \left( \ket{+}\bra{+} + \ket{-}\bra{-} \right)$$
More explicitly, we could write the following:
$$\rho_B = \frac{1}{2} \left( \ket{+}_B\bra{+}_B + \ket{-}_B\bra{-}_B \right)$$
The partial trace is an operation that takes our original density matrix $\rho_{A,B}$ and generates a new density matrix $\rho_B$ that ignores system $A$. This is specifically called the partial trace over $A$, written $\operatorname{tr}_A$.
So how do we do this? We simply sum over the states of $A$ (effectively taking a trace, but only along one axis):

$$\operatorname{tr}_A \rho_{A,B} = \sum_i \bra{i}_A \, \frac{1}{2} \left( \ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--} \right) \ket{i}_A$$
This is easier to evaluate using certain choices of notation:

$$\begin{aligned} \operatorname{tr}_A \rho_{A,B} &= \bra{+}_A \frac{1}{2} \left( \ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--} \right) \ket{+}_A \\ &\quad + \bra{-}_A \frac{1}{2} \left( \ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--} \right) \ket{-}_A \\ &= \frac{1}{2} \left( \ket{+}_B\bra{++} + \ket{+}_B\bra{--} \right) \ket{+}_A + \frac{1}{2} \left( \ket{-}_B\bra{++} + \ket{-}_B\bra{--} \right) \ket{-}_A \\ &= \frac{1}{2} \ket{+}_B\bra{+}_B + \frac{1}{2} \ket{-}_B\bra{-}_B \\ &= \frac{1}{2} \left( \ket{+}_B\bra{+}_B + \ket{-}_B\bra{-}_B \right) = \rho_B \end{aligned}$$
This gives us the answer that we had expected.
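Here is a minimal NumPy version of the same computation (a sketch; we identify $\ket{+}, \ket{-}$ with the standard basis vectors, which is harmless since only orthonormality matters):

import numpy as np

# (|++> + |-->)/sqrt(2), with |+>, |-> encoded as the standard basis of C^2.
psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)          # index 0 = "++", index 3 = "--"
rho_AB = np.outer(psi, psi.conj())        # the 4x4 joint density matrix above

# Partial trace over A: view rho as a tensor with indices (a, b, a', b')
# and contract the two A indices.
rho_B = np.einsum('ijik->jk', rho_AB.reshape(2, 2, 2, 2))

print(np.round(rho_B.real, 3))            # [[0.5, 0], [0, 0.5]], i.e. rho_B above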
We now have all of the tools we need to talk about quantum entropy. Intuitively, entropy can be thought of as the amount of uncertainty we have about our system, or equivalently the amount of information it takes to describe our system. The entropy for a quantum system $\rho$ is defined as follows:

$$H(\rho) = -\operatorname{tr}(\rho \log_2 \rho)$$
Note that here we use the shorthand $\rho_B$ to denote $\operatorname{tr}_A \rho_{A,B}$. Writing $\operatorname{tr}$ without a subscript indicates the full or ordinary trace (or equivalently, the partial trace over all systems). We can now define the conditional entropy of a system as follows:

$$H(A \mid B)_\rho = H(\rho_{A,B}) - H(\rho_B)$$
This definition intuitively makes sense since we can think of conditional entropy as the amount of information it takes to describe our joint system $(A,B)$, given that we already know what $B$ is.
We can now discuss quantum mutual information, the amount of information that measuring system $A$ will provide about system $B$. Like the classical case, this is defined as follows:

$$I(A;B)_\rho = H(A,B)_\rho - H(A \mid B)_\rho - H(B \mid A)_\rho$$
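As a sketch of these definitions in NumPy (reusing the Bell pair from before), note the strikingly non-classical numbers they produce: the conditional entropy of an entangled pure state is negative, which has no classical analogue.

import numpy as np

def H(rho):
    # Von Neumann entropy -tr(rho log2 rho), via eigenvalues (0 log 0 := 0).
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)          # the Bell pair again
rho_AB = np.outer(psi, psi.conj())
t = rho_AB.reshape(2, 2, 2, 2)
rho_A = np.einsum('ijkj->ik', t)          # trace out B
rho_B = np.einsum('ijik->jk', t)          # trace out A

print(H(rho_AB))                          # 0.0: the joint state is pure
print(H(rho_B))                           # 1.0: each half is maximally mixed
print(H(rho_AB) - H(rho_B))               # H(A|B) = -1: negative!
print(H(rho_A) + H(rho_B) - H(rho_AB))    # I(A;B) = 2 bits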
We can now finally discuss quantum conditional mutual information (QCMI), defined as follows: $I(A;B \mid C)_\rho = I(A;B,C)_\rho - I(A;C)_\rho$. With some algebraic simplifications, one can arrive at the expression:
$$I(A;B \mid C)_\rho = H(A,C)_\rho + H(B,C)_\rho - H(A,B,C)_\rho - H(C)_\rho.$$
The QCMI equals $0$ if and only if $\rho$ is a quantum Markov state. Classically, the entropic characterization of conditional independence corresponds to an algebraic characterization.
Recovery Maps
Here, the algebraic characterization is more grueling. We have

$$\rho_{ABC} = \exp\left( \log \rho_{AB} + \log \rho_{BC} - \log \rho_B \right)$$
Equivalently,

$$\rho_{ABC} = \rho_{AB}^{1/2} \rho_B^{-1/2} \rho_{BC} \rho_B^{-1/2} \rho_{AB}^{1/2} = R_{B \to AB}(\rho_{BC})$$
Here, $R_{B \to AB}$ is called the Petz recovery map,4 defined by $R_{B \to AB}(X) = \rho_{AB}^{1/2} \rho_B^{-1/2} X \rho_B^{-1/2} \rho_{AB}^{1/2}$. One can think of a recovery map as a way to reconstruct the entire system $A, B$ using just system $B$. It is not obvious that this is a quantum channel, but it is.
Suppose $\rho$ is a probability distribution, so $\rho = \operatorname{diag}(p)$ for some vector $p$. Then all of the density matrices are diagonal and commute. In this case, the recovery map divides by $p_B$ and multiplies by $p_{AB}$, i.e., it multiplies by $p_{A \mid B}$. This is the natural thing to do if we lost our information about $A$ and were trying to figure out what $A$ was based on our knowledge of $B$. This is why $R_{B \to A,B}$ is known as a recovery map, and it is used to discuss conditional distributions in the quantum setting. In the classical case, if we start with $B, C$, look only at $B$, and use this to reconstruct $A$, then we recover the whole state exactly when the distribution is a Markov chain. That is why this is a plausible quantum version of being a Markov chain.
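To see the classical collapse concretely, here is a small NumPy sketch (the particular transition probabilities are arbitrary choices of ours) checking that the Petz map, which reduces to multiplication by $p_{A \mid B}$ in the commuting case, exactly rebuilds a classical Markov chain:

import numpy as np

# A classical Markov chain A -> B -> C on bits: p(a,b,c) = p(a) p(b|a) p(c|b).
pA = np.array([0.3, 0.7])
pBgA = np.array([[0.9, 0.1], [0.2, 0.8]])    # p(b|a), rows indexed by a
pCgB = np.array([[0.6, 0.4], [0.5, 0.5]])    # p(c|b), rows indexed by b
p = np.einsum('a,ab,bc->abc', pA, pBgA, pCgB)

# Marginals playing the roles of rho_AB, rho_B, rho_BC (all diagonal here).
p_AB = p.sum(axis=2)
p_B = p.sum(axis=(0, 2))
p_BC = p.sum(axis=0)

# Diagonal matrices commute, so rho_AB^{1/2} rho_B^{-1/2} X rho_B^{-1/2} rho_AB^{1/2}
# collapses to multiplication by p_AB / p_B = p(a|b).
recovered = np.einsum('ab,b,bc->abc', p_AB, 1.0 / p_B, p_BC)

print(np.allclose(recovered, p))   # True: knowing B suffices to rebuild A

For a joint distribution that is not Markov on $A$-$B$-$C$, the same map would fail to return the original $p$.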
However, quantum Gibbs states are not, in general, quantum Markov chains. The failure of this statement to hold is related to topological order, which is similar to the degrees of freedom that show up in error-correcting codes.
Quantum Markov Networks
Here, we will formally define a quantum Markov network. The reference for this is [7].
Let $G = (V, E)$ be a finite graph. We associate with each vertex $v \in V$ a Hilbert space $\mathcal{H}_v$ and we consider a density matrix $\rho_V$ acting on $\bigotimes_{v \in V} \mathcal{H}_v$. Then, $(G, \rho_V)$ is a quantum Markov network if for all $U \subseteq V$, $U$ is conditionally independent of $V \setminus (U \cup \partial U)$ given $\partial U$, where the conditional independence statement is w.r.t. $\rho_V$ and means that the corresponding QCMI satisfies $I(U; V \setminus (U \cup \partial U) \mid \partial U)_{\rho_V} = 0$.
A quantum Markov network is called positive if $\rho_V$ has full rank. (Recall that in the statement of the Hammersley-Clifford Theorem, it is assumed that the distribution is strictly positive.)
Now, consider the following example. First, we introduce the Pauli matrices

$$\sigma^x := \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad \sigma^z := \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad \sigma^y := \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}.$$
We define a Hamiltonian on three qubits $A$, $B$, $C$ by

$$H := \left( \sigma_A^x \sigma_B^x + \sigma_A^y \sigma_B^y + \sigma_A^z \sigma_B^z \right) I_C + I_A \left( \sigma_B^x \sigma_C^x + \sigma_B^y \sigma_C^y + \sigma_B^z \sigma_C^z \right)$$
(Juxtaposition in the above expression signifies the tensor product, as discussed before.) Finally, for $\beta > 0$, we define the Gibbs state

$$\rho_{A,B,C}(\beta) := \frac{1}{Z(\beta)} \exp(-\beta H)$$
The Hamiltonian here has local terms which correspond to the interactions $(A,B)$ and $(B,C)$. However, it can be shown that the QCMI between $A$ and $C$ conditioned on $B$ w.r.t. $\rho_{A,B,C}$ is non-zero, which means that this is not a quantum Markov network w.r.t. the line graph $A \leftrightarrow B \leftrightarrow C$. This demonstrates the failure of the Hammersley-Clifford Theorem in the quantum setting.
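A direct numerical check of this claim is straightforward (a sketch; $\beta = 1$ is an arbitrary choice of ours, and $B$ plays the conditioning role in the QCMI formula above):

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def kron(*ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

# The Heisenberg interactions on the chain A - B - C defined above.
H = sum(kron(s, s, I2) + kron(I2, s, s) for s in (sx, sy, sz))

def entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

beta = 1.0
evals, U = np.linalg.eigh(H)
rho = U @ np.diag(np.exp(-beta * evals)) @ U.conj().T
rho /= np.trace(rho).real                  # the Gibbs state rho_{A,B,C}(beta)

t = rho.reshape(2, 2, 2, 2, 2, 2)          # indices a, b, c, a', b', c'
rho_AB = np.einsum('abcxyc->abxy', t).reshape(4, 4)   # trace out C
rho_BC = np.einsum('abcayz->bcyz', t).reshape(4, 4)   # trace out A
rho_B = np.einsum('abcayc->by', t)                    # trace out A and C

# I(A;C|B) = H(A,B) + H(B,C) - H(A,B,C) - H(B)
qcmi = entropy(rho_AB) + entropy(rho_BC) - entropy(rho) - entropy(rho_B)
print(qcmi)                                # strictly positive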
Important Results
We will briefly discuss the results of two papers.
[8] This paper shows that mixing in space implies mixing in time in the quantum case. However, the result of the paper only applies to commuting Hamiltonians. For commuting Hamiltonians, it turns out that quantum Gibbs states are quantum Markov networks. They use a version of Glauber dynamics, which can be simulated on a quantum computer but is also a plausible dynamics for a physical system in nature. This is a difficult paper to read, but it is worth digesting if you want to work in the field.
[9] This second paper is much easier and more general, covering non-commuting Hamiltonians, but it requires more conditions. They give a method of preparing the Gibbs state which can run on a quantum computer, but the dynamics are not plausible as a physical system because they are too complicated. The more complicated dynamics allow them to make the proof work. The paper also uses QCMI.
They have two assumptions. The first assumption looks like mixing in space (weak correlation decay). The second assumption is that the state looks approximately like a quantum Markov network (this is definitely not met in general). A very important paper in this space is a recent breakthrough ([10]) which characterizes approximate quantum Markov chains. They show that if the QCMI is bounded by $\epsilon$, then the recovered state $R_{B \to A,B}(\rho_{BC})$ is $\epsilon'$-close to $\rho_{ABC}$, i.e., low QCMI implies that the recovery map works well. This is trivial to prove classically, but very difficult in the quantum world.
The algorithm in [9] is very elegant. Essentially, we take the entire system and punch out constant-sized boxes. If we can reconstruct the region outside of the boxes, then we can use the recovery maps to reconstruct the regions inside of the boxes, and the boxes are far enough apart that they are almost independent. For this argument, we must assume that the QCMI decays exponentially. Whenever we have exponential decay, we get a correlation length that sets the size of the boxes. It is very difficult to condition on quantum states, but recovery maps provide a sense in which it is meaningful to do so. The paper gives an efficient method of preparing Gibbs states and simulating quantum systems on quantum computers.
Additional reading
The standard treatment of information theory is [11]. This book contains definitions and properties of entropy, conditional entropy, mutual information, and conditional mutual information.
To see a treatment of the subject of Markov chains from the perspective of probability theory, see [12] or the mathematically more sophisticated counterpart [13]. An introduction to coupling can be found in [14], as well as [4] (the latter also contains an exposition of spatial mixing). The connection between Markov chain mixing and the so-called logarithmic Sobolev inequality is described in [15].
Appendix: Intuition for Markov chains
Random walk on the cycle
We have $n$ points on the cycle, $0, 1, \dotsc, n-1$. At each step, we move left or right with probability $1/2$. We can write the transition matrix as

$$T = \frac{S + S^{-1}}{2}$$
where $S$ is the shift operator $S\ket{x} = \ket{x + 1 \bmod n}$. The matrix $S$ is diagonalized by the Fourier transform. Define, for $k = 0, 1, \dotsc, n-1$,

$$\ket{\tilde k} = \frac{1}{\sqrt{n}} \sum_{x=0}^{n-1} \exp\Bigl( \frac{2\pi i k x}{n} \Bigr) \ket{x}$$
We have the same amount of amplitude at every point, but there is a varying phase which depends on $k$. If $k = 0$, we get the all-ones vector. If $k$ is small, then the phase is slowly varying. If $k$ is large, then the phase is rapidly varying. Look at what happens after we apply the shift operator:
$$S\ket{\tilde k} = \frac{1}{\sqrt{n}} \sum_{x=0}^{n-1} \exp\Bigl( \frac{2\pi i k x}{n} \Bigr) \ket{x + 1 \bmod n} = \frac{1}{\sqrt{n}} \sum_{x=1}^{n} \exp\Bigl( \frac{2\pi i k (x-1)}{n} \Bigr) \ket{x \bmod n} = \exp\Bigl( -\frac{2\pi i k}{n} \Bigr) \ket{\tilde k}.$$
After the shift, we pick up an additional phase based on how rapidly the phase is varying. From this, we get:
$$T\ket{\tilde k} = \frac{\exp(2\pi i k/n) + \exp(-2\pi i k/n)}{2} \ket{\tilde k} = \cos\Bigl( \frac{2\pi k}{n} \Bigr) \ket{\tilde k}.$$
The eigenvalues are

$$\lambda_k = \cos \frac{2\pi k}{n}, \qquad k = 0, 1, \dotsc, n-1.$$
Only $k = 0$ will give me an eigenvalue of $1$.
How do we analyze $T^t \ket{p}$? We should Fourier transform the distribution.

$$T^t \ket{p} = T^t \sum_{k=0}^{n-1} p_k \ket{\tilde k} = \sum_{k=0}^{n-1} p_k \lambda_k^t \ket{\tilde k}.$$
If $n$ is odd, then as $t \to \infty$, $\lambda_k^t \to 0$ for all $k = 1, \dotsc, n-1$, so $T^t \to \ket{\pi}\bra{1_n}$. Whatever you put into this operator, you get $\pi$ out.
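A quick numerical sanity check of the spectrum and of the convergence to the uniform distribution $\pi$ (a sketch; $n = 7$ is an arbitrary odd choice):

import numpy as np

n = 7                                      # an odd cycle length
S = np.roll(np.eye(n), 1, axis=0)          # shift operator: S e_x = e_{x+1 mod n}
T = (S + S.T) / 2                          # S^{-1} = S^T for a permutation matrix

# The eigenvalues should be cos(2 pi k / n) for k = 0, ..., n-1.
predicted = np.cos(2 * np.pi * np.arange(n) / n)
print(np.allclose(np.sort(np.linalg.eigvalsh(T)), np.sort(predicted)))  # True

# Starting from a point mass, T^t p converges to the uniform distribution.
p = np.zeros(n)
p[0] = 1.0
for _ in range(2000):
    p = T @ p
print(np.allclose(p, np.ones(n) / n, atol=1e-6))                        # True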
Spectral gap
The example of the random walk on the cycle shows that there is generally a unique stationary distribution and suggests that the speed of convergence is determined by how close the other eigenvalues are to $1$. Specifically, suppose for simplicity that the eigenvalues of $T$ are $1 = \lambda_0 \ge \lambda_1 \ge \cdots \ge 0$ (real and positive). Then, the convergence time is on the order of $1/(1 - \lambda_1)$.
Typically, the distance of the eigenvalues from $1$ reflects the size of the physical system. Even from this simple example, we can get some physical intuition. If $k$ is small, then the eigenvalue is $\cos(2\pi k/n) = 1 - O(k^2/n^2)$, so the spectral gap is $O(1/n^2)$. Thus, the convergence time is $\sim 1/(1 - \lambda_1) \sim n^2$, which is indeed the convergence time for a random walk on a cycle.
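One can confirm the $n^2$ scaling numerically as well: the product of the spectral gap with $n^2$ settles near $2\pi^2 \approx 19.7$ (a minimal sketch reusing the cycle walk above):

import numpy as np

def spectral_gap(n):
    S = np.roll(np.eye(n), 1, axis=0)
    T = (S + S.T) / 2
    lam = np.sort(np.linalg.eigvalsh(T))[::-1]   # lam[0] = 1
    return 1.0 - lam[1]

for n in (25, 50, 100, 200):
    print(n, spectral_gap(n) * n**2)             # roughly constant, ~ 2*pi^2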
References
[1] S. Gharibian, Y. Huang, Z. Landau, and S. W. Shin, "Quantum Hamiltonian complexity," Found. Trends Theor. Comput. Sci., vol. 10, no. 3, pp. 159-282, 2014.
[2] R. W. Keener, Theoretical Statistics. Springer, New York, 2010.
[3] E. Crosson, D. Bacon, and K. R. Brown, "Making classical ground state spin computing fault-tolerant," Physical Review E, vol. 82, no. 3, Sep. 2010.
[4] C. Moore and S. Mertens, The Nature of Computation. Oxford University Press, Oxford, 2011.
[5] F. Martinelli, "Lectures on Glauber dynamics for discrete spin models," in Lectures on Probability Theory and Statistics (Saint-Flour, 1997), vol. 1717, Springer, Berlin, 1999, pp. 93-191.
[6] F. Martinelli and E. Olivieri, "Finite volume mixing conditions for lattice spin systems and exponential approach to equilibrium of Glauber dynamics," in Cellular Automata and Cooperative Systems (Les Houches, 1992), vol. 396, Kluwer Acad. Publ., Dordrecht, 1993, pp. 473-490.
[7] M. S. Leifer and D. Poulin, "Quantum graphical models and belief propagation," Ann. Physics, vol. 323, no. 8, pp. 1899-1946, 2008.
[8] M. J. Kastoryano and F. G. S. L. Brandão, "Quantum Gibbs samplers: the commuting case," Comm. Math. Phys., vol. 344, no. 3, pp. 915-957, 2016.
[9] F. G. S. L. Brandão and M. J. Kastoryano, "Finite correlation length implies efficient preparation of quantum thermal states," arXiv e-prints, Sep. 2016.
[10] O. Fawzi and R. Renner, "Quantum conditional mutual information and approximate Markov chains," Comm. Math. Phys., vol. 340, no. 2, pp. 575-611, 2015.
[11] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley-Interscience, Hoboken, NJ, 2006.
[12] R. Durrett, Essentials of Stochastic Processes. Springer, Cham, 2016.
[13] R. Durrett, Probability: Theory and Examples, 4th ed., vol. 31. Cambridge University Press, Cambridge, 2010.
[14] M. Mitzenmacher and E. Upfal, Probability and Computing, 2nd ed. Cambridge University Press, Cambridge, 2017.
[15] F. Cesi, "Quasi-factorization of the entropy and logarithmic Sobolev inequalities for Gibbs random fields," Probab. Theory Related Fields, vol. 120, no. 4, pp. 569-584, 2001.
1. This is the opposite of the probabilists' convention, i.e., the transition probability matrix that we define here is the transpose of the one usually found in most probability theory textbooks.
2. As a side note, it may be a good research question to investigate to what extent quantum algorithms can be used to compute summations whose terms are possibly negative. In quantum Monte Carlo, the quantum Hamiltonian is converted to a classical energy function; this conversion always works, but sometimes you end up with complex energies, which is terrible for estimating the partition function because terms can cancel each other out.
3. You may recognize this as the total variation norm.
4. Petz wrote about quantum relative entropy in 1991, way before it was cool.
0 notes
doodlyaim Ā· 6 years ago
Text
n e g a t i v i t y Ā  w e e k
Hey, I've been absorbing nothing but negativity this week. Honestly, it's not helping; I'm so affected by the negativity of my surroundings. Entropy is really greater than zero. My patience is maxed out, friend. Every time someone annoys me, all I do is shout at them because they're really getting on my nerves. The one exception to my madness is the only person who motivates me and tells me I can do this, thank you girl! The rest are annoying me. It's weird; maybe it's partly the hormones that make me like this. Like, I find everything around me maddening to the highest level. Back to the old me again? :-(
I'm ranting on Tumblr because my Twitter account is inactive, and people there will gossip and judge you and act like they care, but the truth is they don't really give a fuck.
Sorry, I just need to get this thought out of my head. And this is not good because I'm putting a lot of negativity here. 😭😭
0 notes
subkulturrecords-blog Ā· 7 years ago
Video
vimeo
musicians with guns - overstepping artifacts from ricardo montalban on Vimeo.
Musicians With Guns - Overstepping Artifacts LP (drone/ambient/experimental/LoFi music) ---> entropy-records.com/release3032.htm
*** New fractal video for Art Zero label here : vimeo.com/85096694 ***
> / °°000 - #CLS
> / °°010 - #LOAD /mwg/protocol/crawling through the amazing box -r -p -s -n
> / °°022 - #RUN --
> / °°143 - #ERR [00054d64-ba6c5411] ; "nonlinear overlapping function"
> / °°285 - #LIST [4016457fa-6552b31c]
> / °°299 - #DEBUG [23d3c447-9ff4c812]
> / °°356 - #COORD [????????-????????]
> / °°387 - #TRACE [535b8c47-88f5a7bb] ; "analytical distance estimator"
> / °°408 - #ERR [012d2556-9a9b1445] ; "overstepping artifacts"
> / °°437 - #LOSTINTIMEANDSPACE [8342dc41-e4a22f1b]
> / °°446 - #MOFRACT [7271dd5c2 - a948b8d4] ; "half octahedral planes"
> / °°489 - #ROTATE [61d5f4aa-62b49c10] ; "helispiral rotation"
> / °°524 - #TRAPPED [33a49ee2-bb6462c7]
> / °°551 - #ZOOM [+inf/-inf -g -s -t]
> / °°637 - #ERR [e682b6d4-fe945aa0] ; "lost in a different coordinate system"
> / °°793 - #NOEND /mwg/cometa -i
0 notes
sikoko Ā· 7 years ago
Text
Don't Fear The Reaper
I've mostly stopped development on my honeypot. Instead, I'm just watching the logs for any unusual behavior. About 3 weeks ago, I noticed something odd: the number of SYN scans against 23/tcp took a dramatic uptick. The ports under attack look similar to the Mirai botnet's, so this appeared to be yet another Mirai variant. I've been profiling different types of botnets, and I call this one 'Botnet[E]'. (Botnet[A] refers to the network library used by Mirai.)

Except... My honeypot records a lot of detail about the packet header. One of the things that I began recording last month is the TCP sequence number. And with these attacks, I kept seeing the same sequence numbers over and over. In this case, the TCP sequence number (a 4-byte value) was the exact same as the IP target address (a 4-byte value). This becomes a very distinct value for identifying this particular scanning bot. And based on the number of IP addresses during the scanning, this is a huge botnet.
Sequencing
The TCP header contains two fields for counters. There's the sequence number and the acknowledgment number. These two fields work together during the TCP 3-way handshake and data transfer:
Handshake step 1: SYN. During the initial connection, the client sends a SYN packet to the server. The SYN packet sets the initial sequence number. This could be literally any 4-byte value -- it just needs to be set. The ACK number is ignored and usually zeros.
Handshake step 2: SYN-ACK. The server responds with a SYN-ACK. The server sets its own initial sequence number and responds with the client's sequence number (plus one, since the SYN consumes one sequence number) as the acknowledgment number. This tells the client that the server has synchronized with the client's counter.
Handshake step 3: ACK. The client responds with an ACK. The client's sequence number does not otherwise change, and the server's sequence number (plus one) becomes the acknowledgment number. This tells the server that the client has synchronized with the server's counter.
Data transfers. After the handshake, both sides start exchanging data. The server's sequence number increases by one for every byte of data that the server sends to the client. And the client's sequence number increases by one for every byte of data that the client sends to the server. The corresponding ACK packets tell each side how much data has been received and whether to resend data that got lost. (A minimal packet-level sketch of the first two handshake steps follows this list.)
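Here is a small Scapy sketch of steps 1 and 2 (illustrative only: 192.0.2.1 is a reserved documentation address standing in for a real host, and sending raw packets requires root privileges):

from scapy.all import IP, TCP, sr1

ISN = 0x12345678                          # our arbitrarily chosen initial sequence number
syn = IP(dst="192.0.2.1") / TCP(dport=80, flags="S", seq=ISN)

synack = sr1(syn, timeout=2)              # step 2 arrives if the port is open
if synack is not None and synack.haslayer(TCP):
    print("server's ISN:", synack[TCP].seq)               # the server's own counter
    print("acked our ISN+1:", synack[TCP].ack == ISN + 1)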
Different systems use different methods to initialize these TCP sequence numbers:
Old operating systems. Old systems would initialize it based on something like the current time, current uptime, or just use an incremental counter. The time-based initializers were fun, because you could tell the system's time or how long the computer had been up since the last reboot just by looking at the initial sequence number. However, if an attacker can guess what the sequence number will be, then they could hijack the TCP connection. For this reason, most systems have moved away from time-based initial values. Today, time-based initial values are still the case with some embedded devices.
Modern operating systems. Most of today's computers use an effectively random value to defeat information leakage.
Scanners. Tools like masscan and zmap are capable of scanning the entire Internet in under an hour. They can sling out a huge number of packets. Of course, they don't want to store every outgoing packet sequence number. Instead, they just want to send out SYN packets and see what (if anything) responds. These tools initialize the sequence number with a pre-determined value. For example, the initial sequence could be the 1st four bytes of hash(entropy XOR target IP address XOR port number). This makes the sequence numbers look random, but the scanner can tell when a valid response is received. (This stops jerks from flooding the scanner with fake SYN-ACK responses. Since you don't know the sequence number, the fake SYN-ACK is ignored.)
Botnets. Different botnets use different initial sequence values. One botnet that I see often uses an initial sequence number of 0x00000000. Another botnet uses a two-byte value, leaving the first two bytes zero. For example, seq=00000e44 and seq=0000441a. And a different botnet leaves the last two bytes zero: seq=0a4d0000 and seq=467e0000. In these cases, it's not just that it's a botnet; I can tell the botnet family based on the sequence number format. (A small sketch of these stateless sequence-number tricks follows this list.)
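Here is a small Python sketch of both tricks (illustrative only -- this is not the actual code of any scanner or botnet, and the secret is a hypothetical placeholder): deriving a sequence number statelessly from the target, and checking a SYN for the seq-equals-target fingerprint described next:

import hashlib
import socket
import struct

SECRET = b"per-run entropy"                # hypothetical per-scan secret

def ip_to_int(ip):
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def scanner_seq(dst_ip, dport):
    # masscan/zmap-style idea: derive seq from hash(secret, target, port),
    # so a returning SYN-ACK can be validated without storing any state.
    h = hashlib.sha256(SECRET + socket.inet_aton(dst_ip) + struct.pack("!H", dport))
    return struct.unpack("!I", h.digest()[:4])[0]

def is_fingerprint(syn_seq, dst_ip):
    # This botnet's scheme: the sequence number is the target address itself.
    return syn_seq == ip_to_int(dst_ip)

print(hex(scanner_seq("182.231.151.141", 23)))        # looks random, but reproducible
print(is_fingerprint(0xb6e7978d, "182.231.151.141"))  # True: flag ip.src as infected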
And that brings us back to this new botnet. It initializes the TCP sequence number with the target's IP address. For example, if the SYN packet is sent from 201.52.93.90 to 182.231.151.141, then the initial SYN packet's sequence number will be 0xb6e7978d (that's 182.231.151.141 written in hex, or 3,068,630,925 in decimal). This is actually kind of brilliant. Each bot slings out packets and doesn't store any information. When a response comes back, the botnet can identify the sender by the sequence number. Moreover, if the response sequence number doesn't match the sender's network address, then they can use this information to partially map out network addresses. (Although more likely, they just use it to check if the SYN-ACK is valid, or ignore it altogether and just use the SYN-ACK packet's sender address.)

Of course, since I know the sequence number, I can be a jerk and send back SYN-ACK responses for arbitrary addresses. I just need to wait until my honeypot receives one of these scans. At that point, I know the IP address of an infected IoT device. Then I can start sending back fake packets with fake IP addresses but valid sequence numbers in the ACK field. This will make the botnet think that lots of systems are vulnerable, even when they aren't. Each node in the botnet may end up spending a lot of time attacking ghosts. This either slows down the scan as they attack nothing, or fills the attack queue with garbage. And yes, I've been doing this for a week. Here's my script:
#!/bin/bash
# Attack back
ip2dec () {
  local a b c d ip=$@
  IFS=. read -r a b c d <<< "$ip"
  printf '%d\n' "$((a * 256 ** 3 + b * 256 ** 2 + c * 256 + d))"
}

tail -f /var/log/honeypot.log | grep --line-buffered seq=ip.src |
sed -u -e 's@^.*honeypot: @@' -e 's@\[@ @g' -e 's@\]@ @g' -e 's@/tcp@@g' |
# My logs include source address, source port, destination address, and destination port
# The source is the botnet.
while read src sport junk dst dport junk ; do
  # Send 10 fake responses
  loop=0
  while [ $loop -lt 10 ] ; do
    # Don't use the real destination; select one at random.
    # Or I could loop over specific addresses I want the botnet to attack.
    dst="$((RANDOM%6)).$((RANDOM%6)).$((RANDOM%6)).$((RANDOM%6))"
    # Derive the sequence and acknowledgment numbers from the addresses.
    seq=$(ip2dec "$src")
    ack=$(ip2dec "$dst")
    # Get the date for logging
    date=$(date +"%Y-%m-%d %H:%M:%S %Z")
    echo -n "$date "
    # SYN goes from src to dst
    #echo "SYN from $src:$sport[$ack] to $dst:$dport"
    # SYN-ACK goes from dst to src
    echo "SYN-ACK from $dst:$dport[$seq] to $src:$sport[$ack]"
    # use hping3 to send the fake packet
    hping3 -q -i 1u -c 1 -a "$dst" -L "$ack" -M "$seq" -t 242 -s "$dport" -p "$sport" -S -A "$src" > /dev/null 2>&1
    ((loop=$loop+1))
  done
done
My honeypot is currently receiving scans from this botnet about 1-3 times per minute, and it is rare to see the same infected IP address twice. Nearly all are scans for 23/tcp; a small percentage scan for 22/tcp, 2323/tcp, and other ports.
Comparing Results
Good science means that results can be repeated, and the same results can be determined using different means. I've been communicating with Troy Mursch from Bad Packets. He runs a honeypot but doesn't use the same tools as me. (I wrote my own tools, so I'm very certain that we don't have the same setup.) Every now and then, I'll send him a query like, "Are you seeing any activity from xx.xx.xx.xx?" or he'll ask if I just received a flood from yy.yy.yy.yy. (It's really entertaining when the initial response is "no", but a few hours later it changes to "yes -- they just showed up.")

Troy and I began comparing notes and findings with regards to packets with this specific fingerprint. We were seeing the same volume of traffic, the same types of source hosts, and even the same ports being scanned. (The vast majority of scans were for 23/tcp. A distant second was 22/tcp, then 2323/tcp and other common IoT ports.) Troy has a write-up of his findings at "EnGenius Routers Found in Reaper Botnet with Distinct Network Fingerprint".

Going back through our logs: I had been seeing these scans since mid-September. At least, that's when I started tracking sequence numbers. Troy traced through his logs and found this same profile dating back to early May 2017. We compared the observed source addresses and, amazingly, had very little overlap. That means that this botnet is massive -- something in the "millions of infected hosts" range.

We had been working on tracking this botnet for about a week when the news broke: there's a new big botnet in town. Check Point Research calls it "IoTroop", but other security companies are calling it "Reaper" or "IoT_reaper". Like the Mirai botnet, Reaper infects IoT devices. But there's a twist. Mirai just looked for default logins on IoT devices. Reaper also uses a few known exploits to compromise the IoT devices. As Andy Greenberg at Wired described it, "It's the difference between checking for open doors and actively picking locks."

As soon as I read the description and the timeframe for when it appeared, I became convinced that Troy and I had been looking at packets from Reaper. It's the same timeframe, same volume, and same types of devices. As botnet infections go, this one is bad because of its sheer size. If the owner of this botnet were to use it for a DDoS attack, the results would be enough to take Brian Krebs offline. (And Brian has been able to withstand some really massive DDoS attacks.)
Detecting Reaper
Most of the reports about the Reaper botnet lack details for detecting an infection. The folks over at Radware have a detailed report that lists ports that Reaper scans. However, I'm not able to validate their findings. For example, they say:
The first wave consists of SYN scans on TCP ports in the following order: 20480, 20736, 36895, 37151, 22528, 16671, 14340, 20992, 4135, 64288, 45090, 21248, 21504, 31775, 39455, 47115 and 42254.
I'm just not seeing this in my logs. For example, in the last five days, my honeypot has seen five (5) scans against 20480/tcp. All five of those had the TCP SYN sequence number set to the destination IP address. In contrast, during that same timeframe my logs identified 4,212 scans from Reaper -- and most were against 23/tcp. Perhaps Radware is describing how Reaper appears on the infected system, or how Reaper attacks a vulnerable system, and not how it looks as it tries to spread across the Internet.

I'm also seeing a much wider range of infected devices than those listed in the other researchers' reports. To test devices, I just waited for the IP address to scan me, and then I checked for a publicly accessible web server. For example, I saw a scan from 168.167.177.180 and found an infected airOS router. (You can tell it's infected because of the code error at the top of the page.)

And here's an accessible TripMate Nano that scanned my honeypot: NOTE: It defaults to filling in the username as 'admin'. I didn't type that.

The TripMate Nano scan was shortly followed by a D-Link router:

For network administrators who want to detect infected hosts from this new botnet: look for SYN packets where tcp.seq == ip.dst. If you see a match, then the ip.src denotes an infected address. Either the device at that address is infected, or something behind that NAT router is infected.

Source: http://ift.tt/2zS84di
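For a rough sketch of that tcp.seq == ip.dst check in practice, here's one way to express it with tshark. This is my own sketch, not part of the original write-up: the interface name is illustrative, and it assumes tshark is installed.

#!/bin/bash
# Flag SYN packets whose raw TCP sequence number equals the destination IP.
tshark -i eth0 -Y 'tcp.flags.syn == 1 && tcp.flags.ack == 0' \
  -o tcp.relative_sequence_numbers:false \
  -T fields -e ip.src -e ip.dst -e tcp.seq |
while read -r src dst seq ; do
  # Convert the dotted-quad destination address to a 32-bit integer.
  IFS=. read -r a b c d <<< "$dst"
  want=$((a * 256 ** 3 + b * 256 ** 2 + c * 256 + d))
  # A match means $src (or something behind its NAT) is likely infected.
  if [ -n "$seq" ] && [ "$seq" -eq "$want" ] ; then
    echo "possible Reaper scan from $src (seq matches $dst)"
  fi
done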
0 notes
chemphdblr-blog · 8 years ago
Text
[under construction] first year undergraduate thermodynamics: a simplified glossary, discussion and sign convention summary
idk about you, but thermodynamics just fucks me up. every. fricking. time. the different sign conventions don't help either. this is a short summary of undergraduate first year thermo; ideally, you would have a textbook and notes to refer to, and this is just a no-nonsense rundown of need-to-knows. this should be nothing more than a springboard for you to review topics and refresh your memory. THIS IS BY NO MEANS A COMPREHENSIVE LIST.
Glossary (will be redefined within equations but important to know & understand these at the start)
Δ : change; the "big version" of the derivative d
U (sometimes E) : internal energy
Q : heat, a form of energy
W : work, a form of energy
H : enthalpy; the total heat of the system. It is different from Q in that it accounts for any heat lost as work in a system
S : entropy. generally we say it is the "disorder"/randomness
G : Gibbs free energy
F : Helmholtz free energy
Equations (we will go over these individually, but I find it convenient to have them listed at the start so we know what we're going over; plus it gives me a guideline for how I can connect these and talk about them)
ΔU = ΔQ + ΔW
ΔU = mcΔT + PΔV (c = c_v)
ΔH = ΔU + Δ(PV) ****differs from PΔV***** (not discussed here, but note ΔH = H_products - H_reactants as well, when looking for heats of formation, etc)
ΔS = Δq/T
F = U - TS
G = U - TS + PV
***note on entropy S in statistical mechanics****
the real stuff

ΔU = ΔQ + ΔW
about: the first law of thermodynamics. it's another form of conservation of energy. It says that the total energy in an isolated system (((link to a page on isolated system))) is constant. theoretically, should you find Q and W, you know the energy of that system! of course this is unrealistic, but the basis of thermodynamics starts here.
the equation, but in words: the change in total energy of this system (assumed isolated) is equal to the heat energy put into the system (+) or released from it (-) PLUS the work energy done on the system (+) or done by the system (-).
sign convention: depending on your book or course, you may see ΔU = ΔQ - ΔW. i'm not entirely sure why, but for me, keeping things consistent makes sense. so I like ΔU = ΔQ + ΔW, since a positive ΔQ and ΔW indicate energy being put into the system by heat and work. If you find the other way easier to understand, please work with that. but keep in mind what your teacher says the test will be like!
Is it a state function? Yes
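a quick worked example of this sign convention (my own numbers, purely illustrative): a gas absorbs 100 J of heat while doing 40 J of work pushing back its surroundings. work done by the system enters with a minus sign:

\Delta U = \Delta Q + \Delta W = (+100\,\mathrm{J}) + (-40\,\mathrm{J}) = +60\,\mathrm{J}

so the internal energy rises by 60 J.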
ΔU = mcΔT + PΔV
about: So, this is the most common form of ΔU I think you'll see. Note that it is for a system of gases only. Other equations of state will exist in other studies, but I don't think that's true for very basic thermodynamics. So this equation arises from Q = mcΔT, and W for gases in some sort of changeable volume (like a container with a piston) is PΔV.
Sign convention: -PΔV is the normal convention. mcΔT is the normal convention.
the equation, but in words:
Is it a state function? It's just another form of the above, so yes
ΔH = ΔU + Δ(PV)
A related concept to internal energy, so I'll go into it here rather than with the second law.
About: This is mostly relevant for gases. In liquids/solids, ΔU and ΔH are usually quite close because there tends to be no or very small change in volume for stuff in those states. Gases, however, can expand or be compressed; in doing so, the pressure the gas exerts on the environment (or has exerted on it) changes, and so does its volume. Ideally, we would have no escape of gas and a system that is isochoric and isobaric; however, for the most part, that's untrue. The Δ(PV) is expanded out into PΔV + VΔP. **IT DETERMINES EXO/ENDOTHERMICITY**
the equation, but in words: since most reactions are done in open containers (i.e., at constant pressure, not constant volume), internal energy is not the total energy change of the system; you must account for any energy lost as work. i.e., gas pushing (pressure) against a container is work; the gas changing volume is work, etc.
Sign convention: None besides the ones associated with ΔU
Is it a state function? Yes
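to see why ΔH is the heat measured at constant pressure (a standard derivation, added here for the reasoning): expand Δ(PV) with the product rule, then drop VΔP because P is constant:

\Delta H = \Delta U + P\Delta V + V\Delta P
\text{constant } P:\quad \Delta H = \Delta U + P\Delta V = Q_p

that Q_p is exactly the heat you'd measure in an open beaker.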
ALSO ΔH = ΔHfinal - ΔHinitial; Hess's Law (see notes below)
Consider: water boiling in an open container. The energy needed for water molecules to escape the surface of the liquid and overcome the equilibrium of the condensation/evaporation cycle is very high, more than is accounted for in ΔU; so ΔH cannot be zero.
ΔS = Δq/T (for a reversible process), and this can be written as ΔU <= TΔS - PΔV. For an irreversible process, ΔS > Δq/T.
about: entropy has to do with the randomness of a system -- the tendency of things to even out and spread apart; i.e., something hot will try to disperse that heat to something less hot; molecules tend to spread away from each other. Adds some more "rules" about the energy in a system. The entropy of an isolated system will stay the same or tend to get bigger. Important to note that in a reversible process, it's the above equation.
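a quick worked example (standard textbook numbers, quoted from memory): melting one mole of ice reversibly at 273 K absorbs about 6010 J, so

\Delta S = \frac{q_{\mathrm{rev}}}{T} = \frac{6010\,\mathrm{J}}{273\,\mathrm{K}} \approx +22\,\mathrm{J/K}

positive, as expected for solid turning into liquid.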
ΔG = ΔH - TΔS
gibbs free energy is a relationship that describes whether a chemical reaction is predicted to be spontaneous or not.
A negative ΔG indicates it is spontaneous in the forward direction as written.
A zero ΔG indicates equilibrium; i.e., no driving force forwards or backwards.
it becomes effectively temperature independent when ΔS is a very small value in either the positive or negative direction; this is because the product of the temperature and the entropy change would be much smaller than the enthalpy change.
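a worked example of the temperature dependence (my own numbers, just for illustration): take ΔH = -100 kJ and ΔS = -0.20 kJ/K (both negative, so low temperature should win):

T = 300\,\mathrm{K}:\quad \Delta G = -100 - (300)(-0.20) = -40\,\mathrm{kJ} \quad (\text{spontaneous})
T = 600\,\mathrm{K}:\quad \Delta G = -100 - (600)(-0.20) = +20\,\mathrm{kJ} \quad (\text{not spontaneous})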
some terms:
spontaneity determines if a reaction will occur or not. IT DOES NOT DETERMINE RATE OR EXO/ENDOTHERMICITY
spontaneous reactions mean that the products are lower in free energy than the reactants
exo/endothermicity is whether a reaction gives off heat to the environment or takes heat from it
memorization list (once you understand the concepts, this list is what you need to know quickly. I don't advocate for memorization EVER, but if you're shaky on topics and you have these memorized, you can sometimes reason your way into understanding something)
a (-) ΔU indicates the system lost energy/did work on the environment. something cooled off, or the container expanded
a (+) ΔU indicates the system got energy from the environment (heat put in by a bunsen burner, or you squeezed the container)
(-) ΔQ is heat lost from the system to the environment
(+) ΔQ is heat gained by the system from the environment
(-) ΔW is work done by the system on the environment
(+) ΔW is work done to the system by the environment
a (-) ΔH indicates an exothermic reaction; the system released heat. (think of it as the system put out/lost energy as heat)
a (+) ΔH indicates an endothermic reaction; the system had to take in heat to occur.
a (-) ΔS indicates a loss of entropy (more ordered)
a (+) ΔS indicates an increase in entropy (more chaos)
a (-) ΔG indicates a spontaneous forward reaction, as the equation is written
a (+) ΔG indicates a nonspontaneous forward reaction, although the reverse *may* be
ΔG = 0 indicates equilibrium.
when ΔH is (+) and ΔS is (+), ΔG could be either (+) or (-) depending on temperature, and is spontaneous at high temperature
when ΔH is (-) and ΔS is (+), ΔG is always (-) and always spontaneous
when ΔH is (-) and ΔS is (-), ΔG could be either (+) or (-) depending on temperature, and is spontaneous at low temperature
when ΔH is (+) and ΔS is (-), ΔG is always (+), and is never spontaneous
Discuss later: Hess's Law (of constant heat summation): the enthalpy change of any reaction is equal to the sum of the enthalpy changes of a series of reactions into which the overall reaction may be divided. (bc enthalpy is a state function)
ΔH = sum H (bonds broken) - sum H (bonds formed) *** NOT Hfinal - Hinitial ****
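a standard example of Hess's Law (the usual tabulated values, quoted from memory -- double-check against your own table):

\mathrm{C} + \tfrac{1}{2}\mathrm{O_2} \rightarrow \mathrm{CO} \qquad \Delta H_1 = -110.5\,\mathrm{kJ}
\mathrm{CO} + \tfrac{1}{2}\mathrm{O_2} \rightarrow \mathrm{CO_2} \qquad \Delta H_2 = -283.0\,\mathrm{kJ}
\mathrm{C} + \mathrm{O_2} \rightarrow \mathrm{CO_2} \qquad \Delta H = \Delta H_1 + \Delta H_2 = -393.5\,\mathrm{kJ}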
thermodynamic potentials and their relationships [x]
0 notes