#entropy zero g/t
Text
"So, can I call you Big Boss now?" "Fuck you, Wilson." messing around with gmod animated props and discovered there was a model resizing feature so i made entropy: zero g/t out of boredom
clone cop's NOT amused LMFAOOO
#g/t#giant/tiny#sfw g/t#sfw giant/tiny#sfw gt#gt#using the source engine to kick artblock's ass rn#entropy zero g/t#entropy zero 2 g/t
Text
The Sunnydale Herald Newsletter, Wednesday, September 18-Thursday, September 19
Master: You're dead! Buffy: I may be dead, but I'm still pretty. Which is more than I can say for you.
~~Prophecy Girl~~
[Drabbles & Short Fiction]
Nothing's The Same (Buffy, PG) by badly_knitted
bufy a girl (Buffy/Spike, M) by lottiecrabie
A Tale of Two Slayers (Buffy/Kendra, E) by Ann_Douglas
Convincing (Buffy/Giles, T) by simple_nothings
[Chaptered Fiction]
Erotic Entertainment, Chapter 4 (Multi, E) by TheInfiniteDoctor
Cat Scratch Fever, Chapter 2 (Buffy/Cordelia/Oz, E) by KNZ1
My (love) Kill, Chapter 1 (Buffy/Spike, G) by scared_worm
Love Lives Here, Chapter 100 (Buffy/Spike, NC-17) by Passion4Spike
Forty-eight days in LA, Chapter 7 (Buffy/Spike, NC-17) by Blissymbolics
Mysterious Destinies, Chapter 10 (Buffy/Spike, NC-17) by EnchantedWillow
Lost in Desolation, Chapter 9 (Buffy/Spike, NC-17) by Melme1325
A Nice Surprise, Chapter 1 (Crossover with Riptide, FR15) by calikocat
Meanwhile, back on the farm..., Chapter 6 (Multiple crossings, FR13) by hysteriumredux
Seeing Clearly, Chapter 200 (Multiple crossings, FR13) by JoeB
Perception, Chapter 4 (Buffy/Spike, G) by flootzavut
Champion of War, Chapter 3 (Buffy/Spike, NC-17) by Desicat
To All We Guard, Chapter 25 (Buffy/Spike, NC-17) by simmony
Waiting for You, Chapter 21 (Buffy/Spike, NC-17) by honeygirl1885
A Sword in the Man, Chapter 4 (Buffy/Spike, PG-13) by Desicat
[Images, Audio & Video]
Artwork: The Slayer's pet by JSBirsa
Manip: moodboard: buffy summers by courtillyy
Manip: Collage #172 by thedecadentraven
Gifset: Buffy Meme: [1/8 Episodes] 6x07 Once More With Feeling by lovebvffys
Gifset: 7x15 | "Get It Done" by clarkgriffon
Gifset: 6x18 "Entropy" by peeta-mellark
Gifset: Timestamp Roulette 3x03 | Faith, Hope, & Trick by clarkgriffon
Gifset: 3.04 | Beauty and the Beasts by bangelgifs
Gifset: Mercedes McNab as HARMONY KENDALL by whatisyourchildhoodtrauma
Video: Buffy Best Comebacks // Buffy The Vampire Slayer Funniest Moments by Hollistic Witch
Video: Buffy and Faith - Empty by juliaroxs241
[Reviews & Recaps]
Big Bad Powerful She-Witches | Buffy the Vampire Slayer 7x9 "Never Leave Me" | Normies Reaction! by The Normies
Buffy the Vampire Slayer 4x11 & 4x12 REACTION | "Doomed" & "A New Man" by The Horror Bandwagon
Buffy the Vampire Slayer REACTION | Season 7, Episode 8: Sleeper by JayPeaKay
*BETTER THAN MCDONALD'S?!* Buffy the Vampire Slayer S6 Ep 12 "Double Meat Palace" Reaction FIRST by Nick Reacts
First Date: Buffy 7x14 Reaction by Dakara
[Fandom Discussions]
[Something Blue with zero context] by mybloodyvampire
[Headcanon- Dru has a lot of things] by voices-not-echoes
Question relating to Season 7 by multiple authors
I just watched s6e19 for the first time… by multiple authors
I do like this moment of sincerity by multiple authors
Poll - Would you rather scenario regarding Giles in season 7 by multiple authors
The man in the middle by multiple authors
How has your perspective on the show changed as you get older? by multiple authors
What's on your Drusilla/Spike playlist? by multiple authors
Why is "Anne" such a disliked episode? by multiple authors
What are your best Buffy trivia questions? by multiple authors
How much time passes between 'The Body' and 'The Gift'? by multiple authors
Jasmine's best exploits/power statements? by multiple authors
Top 2 favorite episodes... by multiple authors
I finished the show 20 years ago, and I'm only on season 1 of my rewatch, but I have thoughts about Cordelia and Angel by multiple authors
Buffy and Anya by multiple authors
Response to criticism over Buffy's perceived hardness in S:7 by multiple authors
What did Faith and Willow talk about on their car ride back from Los Angeles? by multiple authors
Just finished watching Angel for the first time ever-VENT by multiple authors
It always seemed to me that Willow's arc was setting up to stall before Smashed and Wrecked by multiple authors
I Robot, You Jane appreciation thread by multiple authors
Buffy The Vampire Slayer is one of the most iconic shows ever produced, but which season was THE season and why? by multiple authors
Submit a link to be included in the newsletter!
Join the editor team :)
Text
Completed WIPs, June 2020
The Color Of Blood by MonsieurBlueSky (MyChemicalRachel) (T, 9K)
8:31pm by justforme23 (T, 1775)
Otherkind by MizDiablo (E, 65K)
Toothpaste Kisses by buckybees (T, 19K)
Try Me On For Size by fandomfluffandfuck (E, 7K)
where else would we go by kocuria (G, 1433)
A Higher Epsilon by deadto27 (E, 91K)
Red Carpet Rescue Mission by darter_blue (E, 10K)
Encouragement and Persuasion by orphan_account (NR, 36K)
The Discovery by fandomfluffandfuck (E, 5K)
Siberia by CC99trialanderrorgirl (E, 10K)
A Study In Steve Rogers' Stupidity by TheGameIsOver (T, 8K)
He Wears It Well by roe87 (E, 15K)
Nothing colder than being alone by lalalalalahahahahaha (M, 4K)
Heavy Lies the Head by inflomora, odetteandodile (M, 60K)
The Safer Course by seapigeon (M, 8K)
Broken memories by mcuwhore (G, 26K)
Very, Very Easy To Be True by Whendoestheshipsail (AO3 login required) (E, 57K)
Paradise Lost (& Found) by JJK (M, 121K)
Latte Art and Slow Dancing in the Dark by deadonarrival for The Twitter Collaborative (E, 90K)
"Hey, Sarge, you got a dame?" by rox_fanfics (T, 3K)
Out Of Order by Marvel_Mania (E, 19K)
Bucky Makes Cheesecake for Clint's Birthday Dinner by E_Greer (M, 8K)
way far beyond by squirmingbubbles (T, 7K)
Third one's the charmed by NatyCeleste (E, 12K)
When The World Was At War (we just kept dancing) by PeachyKeener (E, 41K)
Renegades by crinklefries (M, 142K)
Phase One: MechAvengers Assemble by DarkFairytale (T, 51K)
Podfic for "Dinner for Two" by Dira Sudis by MsPooslie for Dira Sudis (dsudis), hpismyhomeboy (E, 22)
Take Me Home by fandomfluffandfuck (NR, 5K)
cradling the flame by astudyinsolitude for ZepysGirl (E, 17K)
Brooklyn Syndrome by lordelannette (E, 158K)
They Were Zoomates!!! by Written_prose_things (G, 4K)
college 101 by mareviils (E, 14K)
Under the Table and Dreaming by Daretodream66 (M, 76K)
So. You want to know more about the A/B/O universe. by moonythejedi394 (M, 13K)
Corrupted by nerdyrose24 (T, 3K)
The Comeback Kid by grimeysociety (E, 169K)
Latitude Zero by Madara_Nycteris (E, 27K)
why would i shake your hand (when i could shake your bed)? by mediocre_fanfics (NR, 23K)
in the ruins of our worlds by made_of_sunshine (T, 56K)
Old Friends, New Problems by BFab (NR, 14K)
A Perfect Prescription by thewaythatwerust (E, 45K)
I Call Out Your Name, It Feels Like a Song I Know So Well by breatheforeverypart (M, 21K)
two gentlemen of brooklyn by rooonil_waazlib (E, 8K)
Baby Seasons Change But People Don't by fandomfluffandfuck (NR, 6K)
put you on something new by howdoyousleep, the1918 (E, 14K)
Raging War Within by softboibarnes (E, 124K)
Any Day We Can Wash Out To Sea by ashdeanmanns (NR, 26K)
entropy by truehumandisaster (NR, 4K)
Starting Over by Annaelle (T, 50K)
Everybody's a Dreamer by powerfulowl (StuckyFlangst) (E, 13K)
Quality Wood by cloudycelebrations (G, 600)
Text
"To have someone understand your mind is a different kind of intimacy."
– Unknown
Text
Gravitational domain wall in two-dimensional dilaton gravity. (arXiv:2006.07962v1 [hep-th])
We study a special two-dimensional dilaton gravity with Lagrangian $\mathcal{L}=\frac{1}{2}\sqrt{-g}(\phi R+W(\phi))$ where $W(\phi)={\rm sech}^2\phi$. This theory describes two-dimensional spacetimes that are asymptotically flat. Very interestingly, it has an exact solution for the metric, ${\rm d} s^2=-(\tanh x)\,{\rm d} t^2+1/(\tanh x)\, {\rm d} x^2$, which presents an event horizon but no singularity. Because of the kink profile of the metric components in this solution, we refer to it as a gravitational domain wall, with the wall simply being the event horizon separating two asymptotically Minkowskian spacetimes. The global causal structure of such an object is studied via coordinate extension, and the thermodynamic quantities are computed. While the gravitational domain wall has non-zero temperature $1/4\pi$, its energy and entropy vanish.
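As a quick consistency check on the quoted temperature (a standard computation, not part of the abstract): for a static two-dimensional metric of the form ${\rm d}s^2 = -f(x)\,{\rm d}t^2 + f(x)^{-1}\,{\rm d}x^2$, the Hawking temperature of a horizon at $f(x_h)=0$ is $T_H = |f'(x_h)|/(4\pi)$. Here $f(x)=\tanh x$ puts the horizon at $x_h=0$, so

$$T_H = \frac{|f'(x_h)|}{4\pi} = \frac{{\rm sech}^2 x}{4\pi}\bigg|_{x=0} = \frac{1}{4\pi},$$

matching the value stated above.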
from gr-qc updates on arXiv.org https://ift.tt/3hwHU5L
Text
Lecturer: Aram Harrow
Scribes: Sinho Chewi, William Moses, Tasha Schoenstein, Ary Swaminathan
November 9, 2018
Outline
Sampling from thermal states was one of the first and (initially) most important uses of computers. In this blog post, we will discuss both classical and quantum Gibbs distributions, also known as thermal equilibrium states. We will then discuss Markov chains that have Gibbs distributions as stationary distributions. This leads into a discussion of the equivalence of mixing in time (i.e. the Markov chain quickly equilibrates over time) and mixing in space (i.e. sites that are far apart have small correlation). For the classical case, this equivalence is known. After discussing what is known classically, we will discuss difficulties that arise in the quantum case, including (approximate) Quantum Markov states and the equivalence of mixing in the quantum case.
Gibbs distributions
We have already learned about phase transitions in a previous blog post, but they are important, so we will review them again. The Gibbs or thermal distribution is defined as follows. Suppose that we have an energy function $E : \{0,1\}^n \to \mathbb{R}$, which takes $n$-bit strings to real numbers. Usually, $E = \sum_{i=1}^m E_i$, where each $E_i$ term depends only on a few bits. For example, the energy might be the number of unsatisfied clauses in a 3-SAT formula, or it may arise from the Ising model. The Gibbs distribution is

$$p(x) = \frac{\exp\{-E(x)/T\}}{Z}$$

where the normalization factor in the denominator, also called the partition function, is $Z = \sum_{x \in \{0,1\}^n} \exp\{-E(x)/T\}$. Another, perhaps more operational, way to define the Gibbs distribution is:

$$p = \operatorname*{arg\,max}_{q \in \mathcal{P}(\{0,1\}^n)} H(q) \quad \text{subject to the constraint } \langle q, E \rangle = \bar{E}.$$

In this expression, $\mathcal{P}(\{0,1\}^n)$ is the set of probability distributions on $\{0,1\}^n$, $H$ is the Shannon entropy, and $\bar{E}$ is a constant representing the average energy. We are thinking of probability distributions and $E$ as vectors of size $2^n$. It turns out that if we solve this optimization problem, then the Gibbs distribution is the unique solution.
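To make this concrete, here is a minimal brute-force sketch (our illustration, not from the lecture): it enumerates all $2^n$ strings for a small $n$, computes the partition function $Z$ explicitly, and returns the Gibbs probabilities for an arbitrary energy function.

```python
import itertools
import math

def gibbs_distribution(energy, n, T):
    """Exact Gibbs distribution p(x) = exp(-E(x)/T) / Z over {0,1}^n.

    Brute force: enumerates all 2^n states, so only feasible for small n.
    """
    states = list(itertools.product([0, 1], repeat=n))
    weights = [math.exp(-energy(x) / T) for x in states]
    Z = sum(weights)  # the partition function
    return {x: w / Z for x, w in zip(states, weights)}

# Toy energy: the number of disagreeing neighboring bits
# (a 1D ferromagnetic Ising chain, up to rescaling).
def energy(x):
    return sum(x[i] != x[i + 1] for i in range(len(x) - 1))

p = gibbs_distribution(energy, n=4, T=0.5)
# At low temperature, mass concentrates on the two aligned minimizers:
print(p[(0, 0, 0, 0)], p[(1, 1, 1, 1)])
```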
Uses of Gibbs distributions
Why is it useful to work with Gibbs distributions?
Gibbs distributions arise naturally in statistical physics systems, such as constraint satisfaction problems (CSPs), the Ising model, and spin glasses. One approach to deal with Gibbs distributions is through belief propagation (BP), which yields exact inference on tree graphical models and sometimes phase transition predictions on loopy graphs. Instead, we will focus on a different approach, namely, sampling from the Gibbs distribution.
If we want to minimize $E$ (say, to find a 3-SAT solution), we can use simulated annealing (a small sketch of this procedure appears after this list). The idea of annealing is that we want to produce a crystal; a crystal is the lowest energy configuration of molecules. If we heat up the substance to a liquid and then cool it quickly, we will not get a nice crystal, because little bits of the material will point in different directions. In order to form a crystal, we need to cool the system slowly.

In computer science terms, we take a sample at a high temperature, because sampling is generally easier at a higher temperature than at a lower temperature. We then use that sample as the starting point for an equilibration process at a slightly lower temperature, and repeat this procedure. If we reach zero temperature, then we are sampling from the minimizers of $E$. In practice, the system will usually stop mixing before we get to zero temperature, but this is a good heuristic. You can think of this process as gradient descent with some additional randomness.
Gibbs distributions are used to simulate physical systems.
Gibbs distributions are used in Bayesian inference due to the Hammersley-Clifford theorem, which will be discussed next.
Gibbs distributions are also connected to multiplicative weights for linear programming (not discussed in this blog post).
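Here is the promised sketch of simulated annealing (again our own illustration): single-bit-flip moves with a Metropolis-style acceptance rule, anticipating the Metropolis-Hastings discussion below, run under a slowly decreasing temperature schedule.

```python
import math
import random

def anneal(energy, n, T_start=5.0, T_end=0.05, steps=20000, seed=0):
    """Heuristically minimize `energy` over {0,1}^n by single-bit-flip
    moves under a geometric cooling schedule from T_start to T_end."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    E = energy(x)
    best, best_E = list(x), E
    for t in range(steps):
        T = T_start * (T_end / T_start) ** (t / steps)  # cool slowly
        i = rng.randrange(n)
        x[i] ^= 1                        # propose flipping one bit
        E_new = energy(x)
        # Accept downhill moves always; accept uphill moves with
        # probability exp(-(E_new - E) / T), which shrinks as T -> 0.
        if E_new <= E or rng.random() < math.exp(-(E_new - E) / T):
            E = E_new
            if E < best_E:
                best, best_E = list(x), E
        else:
            x[i] ^= 1                    # reject: undo the flip
    return best, best_E

# Example: minimize the number of disagreeing neighbor pairs.
print(anneal(lambda x: sum(x[i] != x[i + 1] for i in range(len(x) - 1)), n=30))
```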
Bayesian inference & the Hammersley-Clifford theorem
In order to present the Hammersley-Clifford theorem, we must first discuss Markov networks. For this part, we will generalize our setup to a finite alphabet $\Sigma$, so the energy function is now a function $\Sigma^n \to \mathbb{R}$.
Markov chains
First, let us recall the idea of a Markov chain with variables $X_1$, $X_2$, $X_3$.

The random variables $X_1$, $X_2$, $X_3$ form a Markov chain if their joint distribution can be written in a factored way: $p(x_1,x_2,x_3) = p_{1,2}(x_1,x_2)\, p_{3 \mid 2}(x_3 \mid x_2)$. For example, imagine that $X_1$, $X_2$, $X_3$ represent the weather on Monday, Tuesday, and Wednesday respectively. These random variables form a Markov chain if, conditioned on the weather on Tuesday, we have all of the information we need to forecast the weather on Wednesday. Another way to say this is that conditioned on the weather on Tuesday, the weather on Monday and the weather on Wednesday are conditionally independent. Note that the weather on Monday and the weather on Wednesday are not independent; there can be correlations, but these correlations are mediated through the weather on Tuesday. It is important to note that the definition of a Markov chain is symmetric with respect to going forwards or backwards in time, so we can also write the conditional independence condition as $p(x_1,x_2,x_3) = p_{2,3}(x_2,x_3)\, p_{1 \mid 2}(x_1 \mid x_2)$.
The conditional independence condition can also be written as $p_{1,3 \mid 2}(x_1, x_3 \mid x_2) = p_{1 \mid 2}(x_1 \mid x_2)\, p_{3 \mid 2}(x_3 \mid x_2)$. Recall that two random variables $X_1$ and $X_2$ with joint distribution $p$ are independent, i.e., $p(x_1,x_2) = p_1(x_1) p_2(x_2)$, if and only if $I(X_1; X_2) = 0$, where $I$ here denotes the mutual information. Similarly, conditional independence is equivalent to the conditional mutual information $I(X_1; X_3 \mid X_2)$ equaling zero. This quantity is defined as $I(X_1; X_3 \mid X_2) = H(X_1 \mid X_2) + H(X_3 \mid X_2) - H(X_1, X_3 \mid X_2)$.

Keep in mind that conditional independence is characterized in two equivalent ways: via an algebraic condition on the distributions, and via mutual information.
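As a quick numeric sanity check of this equivalence (our toy example, not from the lecture): build a joint distribution in the factored form $p_{1,2}(x_1,x_2)\, p_{3 \mid 2}(x_3 \mid x_2)$ and verify that the conditional mutual information vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint distribution of a binary Markov chain:
# p(x1, x2, x3) = p12(x1, x2) * p(x3 | x2).
p12 = rng.random((2, 2)); p12 /= p12.sum()
p3g2 = rng.random((2, 2)); p3g2 /= p3g2.sum(axis=1, keepdims=True)
p = np.einsum('ab,bc->abc', p12, p3g2)       # axes: (x1, x2, x3)

def H(q):
    """Shannon entropy (in bits) of an array of probabilities."""
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

# I(X1; X3 | X2) = H(X1,X2) + H(X2,X3) - H(X2) - H(X1,X2,X3)
I = H(p.sum(2)) + H(p.sum(0)) - H(p.sum((0, 2))) - H(p)
print(f"I(X1; X3 | X2) = {I:.2e}")           # ~0, up to float error
```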
Markov networks
A Markov network is like a Markov chain, but with more random variables and a more interesting structure. Imagine that we have a graph, where each node is associated with a random variable and the edges encode possible correlations. A Markov network has the property that if we take any disjoint collection of nodes $A$, $B$, and $C$ such that $A$ and $C$ are fully separated by $B$ (that is, any path from $A$ to $C$ must go through $B$, or alternatively, removing $B$ leaves $A$ and $C$ disconnected), then $I(X_A; X_C \mid X_B) = 0$. The notation $X_A$ here means the collection of random variables associated with the nodes in $A$.
For example:
Here, if $A = \{1,5,6\}$, $B = \{2,7\}$, and $C = \{3,4\}$, then $B$ separates $A$ and $C$.
A Markov network is also called a graphical model or a Markov random field; and yet another name for them is Gibbs distribution, which is the content of the following theorem:
Theorem 1 (Hammersley-Clifford Theorem): Let $p$ be a strictly positive distribution on $\Sigma^n$. Then $p$ can be represented as a Markov network with respect to a graph $G$ if and only if $p$ can be expressed as a Gibbs distribution $p(x) \propto \exp\{-\sum_{C \in \mathcal{C}(G)} E_C(x_C)\}$, where $\mathcal{C}(G)$ is the set of cliques (fully connected subsets) of $G$.
This theorem says that Markov networks are the same as Gibbs states, with the same notion of locality.
The Hammersley-Clifford theorem implies an area law for mutual information; we will explain what this is and sketch why this is true. Divide a system into two disjoint pieces $A$ and $B$. We want to know about the mutual information between $A$ and $B$, $I(A;B)$. The Hammersley-Clifford theorem gives us a bound which depends only on the size of the boundary $\partial$ between these sets. For simplicity, assume $\partial \subseteq B$. Also, assume that the interactions have bounded range; then, the Hammersley-Clifford theorem tells us that $I(A; B \mid \partial) = 0$.
Now, we will use the fact $I(A; B \mid \partial) = I(A; B, \partial) - I(A; \partial)$. We can see this by writing out the expressions, but the intuition is that the term on the left asks about how much $A$ knows about $B$, having already known about $\partial$. This equals how much $A$ knows about $B$ and $\partial$ combined, minus how much $A$ knows about $\partial$ alone. In this case, since we said $\partial \subseteq B$, $I(A; B)$ is the same as $I(A; B, \partial)$. In general, however, we have an upper bound:

$$I(A;B) \le I(A; B, \partial) = I(A; \partial) + \cancel{I(A;B \mid \partial)} \le H(\partial) \le |\partial| \log |\Sigma|$$

In this calculation, we have used $I(A; \partial) = H(\partial) - H(\partial \mid A)$ (the information between $A$ and $\partial$ is the amount by which the entropy of $\partial$ gets reduced once we know $A$) and $H(\partial \mid A) \ge 0$ (which is true classically).

Since the mutual information only scales with the surface area of the boundary and not with the area of the two regions $A$ and $B$, this is known as an area law [1].
Relationship to Bayesian inference
In Bayesian inference, we have a model for a system which can be very complicated. The model represents our assumptions on how parts of the system are causally related to the rest of the system. We have some observations, and we want to sample from a distribution conditionally on the fixed observations. Sampling from a conditional distribution is not the same as sampling from the original distribution, but we can still formally represent the conditional distribution as a Markov network. Therefore, sampling from Markov networks is a broadly useful task.
As an example of a complicated Bayesian model, consider a hierarchical Bayesian model [2]. Bayesian statistics requires choosing a prior distribution, and when there is a natural parameterized family of priors that a statistician can use, it may make sense to introduce a distribution over the priors; this is known as introducing a hyperparameter, and inference in the resulting hierarchical model (including computation of the posterior distribution) is frequently intractable. However, it is still desirable to work with these models because they are often more accurate than models in which the prior is handpicked by a statistician.
Sampling from Gibbs distributions
The task of sampling from an arbitrary Gibbs distribution is MA-complete [3], and it is not hard to see that at low enough temperatures this problem is at least NP-hard. So, how do we sample from these distributions?
This section will discuss Monte Carlo Markov chain (MCMC) methods, namely the Metropolis-Hastings algorithm and Glauber dynamics. Readers familiar with these methods may wish to skip to the discussion of mixing in time. For readers who wish to build more intuition about Markov chains before proceeding, see the Appendix, where the simple example of the random walk on a cycle is treated in detail.
Monte Carlo Markov chain (MCMC) methods
The general approach is to use a Markov chain. Let $\Omega = \Sigma^n$ be the possible states of the system. Effectively, a Markov chain is a way of doing a random walk over $\Omega$.

The transition probabilities of the Markov chain are $\mathbb{P}\{X(t+1) = y \mid X(t) = x\} = T_{y,x}$. Here, $T$ is the transition probability matrix. The column at index $x$ of $T$ is the probability distribution of the next state of the Markov chain, if the current state is $x$. The row at index $y$ is a row of probability values which give the probabilities of jumping into state $y$ from every other state. The entries of $T$ are non-negative and, for every $x$, $\sum_{y \in \Omega} T_{y,x} = 1$. These properties say that $T$ is a (column) stochastic matrix.

Suppose we start at a state $x(0)$; or, more generally, suppose we start with a distribution $p$ over $\Omega$. If we move according to the chain once, the distribution will be $Tp$. If we move again, the distribution will be $T^2 p$. In general, after $t$ movements, the distribution is $T^t p$. So, we can express the dynamics of the chain as matrix-vector multiplication.

It is worth mentioning that if we are simulating the chain on a computer and we are manipulating $n$-bit numbers, then these probability vectors are of size $2^n$, so it becomes impractical to store the entire probability distributions.
The justification for our algorithms is the following theorem.
Theorem 2 (Perron-Frobenius Theorem): If $T$ is the transition matrix of an irreducible, aperiodic Markov chain, then one of its eigenvalues is $1$, and all other eigenvalues have magnitude strictly less than $1$. There is a unique probability distribution $\pi$ such that $T\pi = \pi$.
The theorem implies that $T^t p$ will converge to the stationary distribution $\pi$ as $t \to \infty$. So, if we want to sample from a distribution, this provides a method of doing so: cook up a Markov chain that equilibrates to the desired distribution, and then run the Markov chain until convergence. A priori, it is not obvious how we can design the Markov chain. At first, our problem was to sample from a probability distribution (a vector), and now we have changed the problem to designing an entire matrix, which does not appear to make our task easier.
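This convergence is easy to watch numerically. A small sketch (ours, with an arbitrary 3-state chain): iterate $p \mapsto Tp$ and compare against the eigenvector of $T$ with eigenvalue $1$.

```python
import numpy as np

# A column-stochastic matrix: T[y, x] = P(next = y | current = x),
# so each column sums to 1.
T = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.2],
              [0.2, 0.2, 0.5]])

p = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(100):            # after the loop, p = T^100 p0
    p = T @ p

# Stationary distribution: the eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

print(np.allclose(p, pi))       # True: T^t p converged to pi
```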
Now, the question becomes: how does one come up with Markov chains that give you the desired stationary distribution?
Metropolis-Hastings algorithm
The first algorithm we will introduce is the Metropolis-Hastings algorithm. One more desirable feature of a Markov chain is that it satisfies detailed balance, which says $\pi_x T_{y,x} = \pi_y T_{x,y}$ for all $x$ and $y$. This condition says that if we pick a point with probability according to the stationary distribution and transition, the probability of picking $x$ and then moving to $y$ should be the same as picking $y$ and then moving to $x$.
For a Markov chain in equilibrium, the total amount of probability flowing out of $x$ must equal the total amount of probability flowing into $x$. For example, the United States might export products to Europe and import from China. Detailed balance says that the flow along each edge must balance, which is a more demanding condition. In the example with country trade deficits, we are requiring that all bilateral trade deficits must be zero.
Mathematically, detailed balance implies that $T$ can be transformed, via similarity transformations, into a symmetric matrix. The Metropolis-Hastings algorithm says that we should choose $T$ with the property

$$\frac{T_{x,y}}{T_{y,x}} = \frac{\pi_x}{\pi_y}.$$

Suppose that we have an underlying graph on our state space, and suppose that we are at a state $x$. The algorithm chooses a random neighbor, say $y$, and then accepts or rejects this move with some probability. If the move is accepted, then we move to $y$ and continue the algorithm from there. Otherwise, if the move is rejected, then we stay at $x$. We are free to choose any underlying graph (as long as it is connected and has a self-loop), and then we will tune the acceptance probability so that detailed balance holds.

Look at the trial move $x \to y$. One way we can accomplish detailed balance is by looking at the ratio $\pi_y/\pi_x$. If $\pi_y/\pi_x \ge 1$, then always accept the move. If $\pi_y/\pi_x < 1$, then accept the move with probability $\pi_y/\pi_x$.
To get an idea for how the algorithm works, suppose that our underlying graph is $d$-regular. Then, for neighbors $x$ and $y$,

$$T_{y,x} = \min\Big\{1, \frac{\pi_y}{\pi_x}\Big\}\frac{1}{d}, \qquad T_{x,y} = \min\Big\{1, \frac{\pi_x}{\pi_y}\Big\}\frac{1}{d}.$$

Claim: $T_{y,x}\,\pi_x = \frac{1}{d}\min\{\pi_x,\pi_y\}$, which is manifestly symmetric in $x$ and $y$; thus, we have reversibility. This is the basic idea of the Metropolis-Hastings algorithm.
How does it work for a Gibbs distribution $\pi_x = \exp\{-E(x)/T\}/Z$, where the energy function might, for example, count the number of violated clauses in a 3-SAT formula? In this case, we might be a little worried. The numerator of $\pi_x$ is pretty easy to compute (we can count how many violated constraints there are), but the denominator is hard to compute. In general, it is #P-hard to compute the denominator, because as $T$ drops to $0$, the partition function in this case approaches the number of 3-SAT solutions. So, how do we calculate the ratios $\pi_y/\pi_x$ that the algorithm requires? We're able to do this because the ratio does not depend on $Z$:

$$\frac{\pi_y}{\pi_x} = \exp\frac{E(x)-E(y)}{T}.$$
Suppose that the energy is a sum of local terms, and the underlying graph corresponds to modifying one site at at a time. What this means is that the graph is <span id="MathJax-Element-161-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" data-mathml="Ī©={0,1}nā role=āpresentationā style=āfont-size: 109%; position: relative;ā>Ī©={0,1}nĪ©={0,1}n
\Omega = {\{0,1\}}^n and the edges in the graph correspond to flipping exactly one bit. In this case, it becomes very easy to evaluate the computations needed for the algorithm; in fact, we can even do them in parallel.
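To make this concrete, here is a minimal Python sketch of a single-bit-flip Metropolis-Hastings step; the `energy` function below is a toy stand-in, and in practice one would compute the energy change from the local terms alone rather than re-evaluating the full energy.

```python
import math
import random

def metropolis_step(x, energy, T):
    """One Metropolis-Hastings step on {0,1}^n with single-bit-flip proposals.

    Propose flipping a uniformly random bit and accept with probability
    min(1, exp((E(x) - E(y)) / T)), so the partition function Z never appears.
    """
    n = len(x)
    i = random.randrange(n)            # propose flipping bit i
    y = list(x)
    y[i] = 1 - y[i]
    delta = energy(y) - energy(x)      # E(y) - E(x); only local terms change
    if delta <= 0 or random.random() < math.exp(-delta / T):
        return y                       # accept the move
    return x                           # reject: stay at x

# Toy energy: count disagreements between adjacent bits.
energy = lambda x: sum(x[i] != x[i + 1] for i in range(len(x) - 1))
x = [random.randint(0, 1) for _ in range(20)]
for _ in range(1000):
    x = metropolis_step(x, energy, T=1.0)
```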
How do we choose the underlying graph? The key idea is that we do not want the majority of our moves to be rejected. A good example to keep in mind is the Ising model, where the configurations are $x \in \{0,1\}^n$ and the energy is $E(x) = -\sum_{i,j=1}^n J_{i,j} x_i x_j$. If $J_{i,j} \ge 0$ for all $i$, $j$, then we say that the model is ferromagnetic (we obtain lower energy by making the sites agree with each other). Of course, an antiferromagnetic model is just the opposite of this.
Assume that the bits are laid out in a square and $J_{i,j} = J$ if $i$ and $j$ are neighbors on the square, and $J_{i,j} = 0$ if they are not. As we vary the quantity $J/T$, we observe a phase transition. If $J/T$ is small, then the coupling between the random variables is weak and the different parts of the system are almost independent; we call this the disordered phase. If $J/T$ is large, then the spins want to align in the same direction and the Gibbs distribution will look almost like the following: with probability $1/2$, all spins are $+$, and with probability $1/2$, all spins are $-$; we call this the ordered phase.
In the disordered phase, when the spins do not need to align so closely, the Metropolis-Hastings algorithm will work well. In the ordered phase, the algorithm is doomed. Indeed, suppose that most of the spins are $+$. As time proceeds, any $-$s will switch to $+$. There may be islands of $-$ spins initially, but it will be energetically favorable for these islands to shrink over time. Therefore, there is only an exponentially small chance for the system to switch to a configuration with mostly $-$s, and thus the chain takes exponentially long to mix. Here, people are interested in understanding the autocorrelation time, because the goal is to run the chain for some time, get one sample, run the chain for some more time, get another sample, etc.
Glauber dynamics
This next method (Glauber dynamics) is essentially the same as Metropolis-Hastings, but this is not immediately obvious. We are at a state $x = (x_1,\dotsc,x_n) \in \Sigma^n$. (For the Metropolis-Hastings algorithm, we could be walking on a state space without a product structure. However, Glauber dynamics requires a product structure.) Then, we update $x$ to $(x_1,\dotsc,x_{i-1},x_i',x_{i+1},\dotsc,x_n)$ with chance $\pi_{i\mid -i}(x_i' \mid x_1,\dotsc,x_{i-1},x_{i+1},\dotsc,x_n)$. In other words, we hold all other bits fixed, and conditioned on those other bits, we resample the $i$th bit. As with Metropolis-Hastings, $\pi$ is stationary for this chain.
It is not obvious that these conditional distributions can be computed efficiently, but it is possible, since normalizing the conditional distribution only requires summing over the possible configurations of a single random variable. On a Markov network, the conditional probability is $\pi_{i \mid N(i)}(x_i' \mid x_{N(i)})$, where $N(i)$ denotes the set of neighbors of $i$. This makes the update a constant-sized calculation (i.e., it does not depend on the size of the system).
For example, in the Ising model, suppose we are at state $x \in \{\pm 1\}^n$. In Glauber dynamics, we pick a vertex $i \in [n]$ uniformly at random and update it to $+$ with probability

$$p_{i \mid N(i)}(+ \mid x_{N(i)}) = \frac{\exp(T^{-1}\sum_{j\in N(i)} x_j)}{\exp(-T^{-1} \sum_{j\in N(i)} x_j) + \exp(T^{-1} \sum_{j\in N(i)} x_j)}.$$
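As a minimal Python sketch of this update (assuming a neighbor list `N` describing the underlying graph, with couplings $J = 1$ as in the formula above):

```python
import math
import random

def glauber_step(x, N, T):
    """One Glauber update for Ising spins in {-1, +1}.

    N[i] is the list of neighbors of site i. We pick a site uniformly at
    random and resample it from its conditional distribution given the
    current neighboring spins.
    """
    i = random.randrange(len(x))
    s = sum(x[j] for j in N[i])  # local field at site i
    p_plus = math.exp(s / T) / (math.exp(-s / T) + math.exp(s / T))
    x[i] = +1 if random.random() < p_plus else -1
    return x
```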
Mixing in time
Mixing in time means that the dynamics equilibrate rapidly. It turns out that this is equivalent to mixing in space, which means that $\pi$ itself has decaying correlations. For example, the Ising model at low temperature has a lot of long-range correlations, but at high temperature it does not. For the high-temperature regime, we can prove that mixing in time occurs. We will prove this for the ferromagnetic Ising model; the result is known more generally, but the proofs are much easier for the Ising model.
People have known about the Metropolis-Hastings algorithm since the 1950s, but only recently have researchers been able to prove convergence guarantees for the 2D Ising model. There is a large gap between theory and practice, but in some situations we can prove that the algorithm works.
Sampling from the distribution is roughly equivalent to estimating the partition function (sampling-counting equivalence). There have been many papers addressing tasks such as estimating the permanent of a non-negative matrix, the number of colorings of a graph, etc.² A dominant way of accomplishing these tasks is proving that the Metropolis-Hastings algorithm converges for these problems. It is easy to find algorithms for these problems that converge to Gibbs distributions, but the convergence may take exponential time.
We will look at the situation when the energy function looks like the Ising model, in the sense that the interactions are local and reflect the structure of some underlying space. Also, assume that the interactions are of size $O(1)$ and that the scaling comes from the size of the system. When can we expect our algorithms to work? There are two main cases in which we can argue that there should be rapid mixing.

- High temperature regime: The system is very disordered, and in the limit as the temperature approaches infinity, we get the uniform distribution.
- One dimension: In 1D, we can exactly compute the partition function using dynamic programming. Before, we mentioned that if there is a sea of $+$s and an island of $-$s, then it is energetically favorable for the island to shrink; note that this is no longer true in 1D. In a way, 1D systems are more "boring" because they cannot exhibit arbitrarily long-range correlations.
In this part of the blog post, we will try to be more proof-oriented. We will start by explaining why it is plausible that high temperature means that the chain will mix rapidly in time.
Coupling method
One method of proving rates of convergence for Markov chains is by analyzing the spectral gap. Another method is the coupling method.
The idea behind the coupling method is to start with two configurations $X(0), Y(0) \in \Omega$. We want each one to evolve under the Markov chain.

The key part is that there is still some freedom with respect to what the dynamics look like. In particular, we are allowed to correlate the $X$ and $Y$ processes. Thus, we are defining a joint transition probability $\mathbb{P}\{X(1)=x(1),\, Y(1)=y(1) \mid X(0), Y(0)\}$. We want to design the process such that $X(1)$ and $Y(1)$ are closer together than $X(0)$ and $Y(0)$. Imagine that we have two particles bouncing around. Each particle follows the dynamics of $T$, but they are correlated so that they drift together, and once they meet, they stick together. It turns out that the mixing time can be upper bounded by the time it takes for the particles to meet each other.
Assume we have some sort of distance function $\operatorname{dist}$ on the underlying space and we can prove that

$$\operatorname{\mathbb E}\operatorname{dist}(X(1),Y(1)) \le \exp(-\alpha) \operatorname{dist}(X(0),Y(0)).$$

Then, it turns out that the mixing time $t_{\rm mix}(\epsilon)$, i.e., the time required to get within $\epsilon$ of the stationary distribution, is upper bounded as

$$t_{\rm mix}(\epsilon) \le \frac{\log\{(\operatorname{diam}\Omega)/\epsilon\}}{\alpha}.$$

Initially, the two particles can be $\operatorname{diam}\Omega$ apart, but the expected distance shrinks exponentially as we run the coupling, so the mixing time is logarithmic in the diameter.
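To fill in the step from contraction to mixing (a sketch, under the mild assumption that $\operatorname{dist}(x,y) \ge 1$ whenever $x \ne y$): iterating the contraction $t$ times gives

$$\mathbb{P}\{X(t) \ne Y(t)\} \le \operatorname{\mathbb E}\operatorname{dist}(X(t),Y(t)) \le e^{-\alpha t} \operatorname{diam}\Omega,$$

and by the coupling characterization of the distance between distributions (stated next), the right-hand side bounds the distance to stationarity; it is at most $\epsilon$ once $t \ge \alpha^{-1}\log\{(\operatorname{diam}\Omega)/\epsilon\}$.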
The distance between probability distributions is defined as follows. Let $p$ and $q$ be two probability distributions on $\Omega$. Then, the metric is:³

$$\frac{1}{2} \|p-q\|_1 = \frac{1}{2}\sum_{x \in \Omega}|p(x)-q(x)| = \min_{\substack{(X,Y) \sim r \in \mathcal{P}(\Omega \times \Omega) \\ r_1 = p,\; r_2 = q}} \mathbb{P}_r\{X \ne Y\}.$$
In this expression, $r_1$ and $r_2$ denote the first and second marginals of $r$, respectively. The minimum is taken over all couplings of $p$ and $q$. This is the correct way to measure the distance between distributions. To give some intuition for this quantity, the quantity on the right represents the best test to distinguish the two distributions. If $p$ and $q$ are the same, we can take a coupling in which $X \sim p$ and $Y \sim q$ are always identical. If $p$ and $q$ have disjoint supports, then no matter what coupling we use, $X$ and $Y$ will never be equal.
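As a small numeric illustration (a sketch for finite distributions stored as arrays): the optimal coupling places mass $\min\{p(x),q(x)\}$ on the diagonal, so $\mathbb{P}\{X \ne Y\} = 1 - \sum_x \min\{p(x),q(x)\}$, which equals the total variation distance.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two finite distributions."""
    return 0.5 * np.abs(p - q).sum()

# Under the optimal coupling, P{X = Y} = sum_x min(p(x), q(x)),
# so P{X != Y} = 1 - sum_x min(p(x), q(x)) = tv_distance(p, q).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
assert np.isclose(tv_distance(p, q), 1 - np.minimum(p, q).sum())
```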
It suffices to consider the case when $X(0)$ and $Y(0)$ are neighbors, i.e., at distance $1$ apart. This is because if $X(0)$ and $Y(0)$ are far apart, then we can look at the path between them and reduce to the case when they are neighbors. Formally, this is known as path coupling. The formal statement is in Theorem 12.3 of [4]:
Theorem 3: Let $\Gamma$ be a connected weighted graph on the state space, where no edge has weight less than $d_{\min}$. Let $d(C,C')$ be the length of the shortest path from $C$ to $C'$ in $\Gamma$ and let $d_{\max} = \max_{C,C' \in \Omega} d(C,C')$ be the diameter of $\Gamma$. Suppose there is a coupling such that for some $\delta > 0$,

$$\operatorname{\mathbb E}\bigl[d\bigl(X(1),Y(1)\bigr) \bigm\vert \bigl(X(0),Y(0)\bigr) = (C,C')\bigr] \le (1-\delta)\,d(C,C')$$

for all neighboring pairs $C$, $C'$, i.e., those pairs connected by an edge in $\Gamma$. Then, the mixing time is bounded by

$$t_{\rm mix}(\epsilon) \le \frac{\log(\epsilon^{-1}d_{\max}/d_{\min})}{\delta}.$$
Glauber dynamics at high temperature
Recall that in Glauber dynamics, we pick a site $i$ randomly and then update the site conditioned on its neighbors. The first way we will couple $X(1)$ and $Y(1)$ together is by picking the same site for both of them:

Pick a random $i \in [n]$. If ${X(0)}_{N(i)} = {Y(0)}_{N(i)}$, then set ${X(1)}_i = {Y(1)}_i$ (if the neighborhoods of the two points agree, then update them the same way). Otherwise, update them using the best possible coupling, i.e., pick a coupling for $({X(1)}_i, {Y(1)}_i)$ which minimizes $\mathbb{P}\{{X(1)}_i \ne {Y(1)}_i\}$.
So if $X(0) = Y(0)$, then the points will never drift apart. The reason why analyzing this coupling is non-trivial is that there is a chance that the distance between the two points can increase.
Assume that the degree of the graph is $\Delta$. Suppose that $\operatorname{dist}(X(0),Y(0)) = 1$, that is, there is a single site $a$ such that ${X(0)}_a \ne {Y(0)}_a$. What will happen to $X(1)$ and $Y(1)$? We start by picking a random $i \in [n]$. There are three cases:
- $i \notin (\{a\} \cup N(a))$ (with probability $1-(\Delta+1)/n$): Nothing changes; $X(0)$ and $Y(0)$ agree at $i$, and $X(1)$ and $Y(1)$ will also agree at $i$. The distance remains at $1$.
- $i = a$ (with probability $1/n$): We picked the one spot in which the two configurations differ. The neighborhoods of $a$ are the same for $X(0)$ and $Y(0)$, so we update in the same way for both processes, and the distance drops to $0$.
- $i \in N(a)$ (with probability $\Delta/n$): We could have different updates. Here, we have to use the high temperature assumption, which says that if we change one bit, the probability of a configuration cannot change too much.
In the Ising model, $E(x) = -\sum_{i,j=1}^n J_{i,j} x_i x_j$. Changing $a$ can bias the energy by at most $\Delta \max_i J_{i,a}$, so in the third case the expected distance afterwards is $1 + O(\max_{i,j} J_{i,j}/T)$.
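To see where this bound comes from (a sketch, ignoring constants): under the best coupling of the two single-site updates, $\mathbb{P}\{{X(1)}_i \ne {Y(1)}_i\}$ equals the distance between the two conditional distributions at $i$, which for binary spins is $|p^X_+ - p^Y_+|$. The two local fields at $i$ differ only through the spin at $a$, i.e., by at most $2J_{i,a}$, and the update probability is a smooth function of the field with slope $O(1/T)$, so

$$\mathbb{P}\{{X(1)}_i \ne {Y(1)}_i\} = |p^X_+ - p^Y_+| = O\Bigl(\frac{J_{\max}}{T}\Bigr).$$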
Adding these cases up to get the overall expected distance gives

$$\operatorname{\mathbb E}\operatorname{dist}\bigl(X(1), Y(1)\bigr) = 1-\frac{1}{n} + \underbrace{O\Bigl(\frac{\Delta J_{\max}}{T}\Bigr)}_{\le 1}\frac{1}{n} = 1 - \frac{c}{n}$$
for $T$ large enough, so the expected distance will shrink. This argument also tells us how large the temperature must be, which is important for applications. This gives us $t_{\rm mix}(\epsilon) = O\bigl(n\log\frac{n}{\epsilon}\bigr)$. Notice that this is the same dependence as in the coupon collector problem. Therefore, in the high temperature regime, the system behaves qualitatively as if there are no correlations.
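A sketch of this coupling in Python, reusing the Glauber update from earlier: the only change is that both chains see the same random site $i$ and the same uniform random number $u$, which (for two Bernoulli updates) achieves the minimal disagreement probability $|p^X_+ - p^Y_+|$.

```python
import math
import random

def coupled_glauber_step(X, Y, N, T):
    """One step of the coupled dynamics on two Ising configurations.

    Sharing the site i and the uniform u means the two chains disagree
    at i with probability |p_plus(X) - p_plus(Y)|, the minimum possible.
    Returns the Hamming distance after the step.
    """
    i = random.randrange(len(X))
    u = random.random()
    for Z in (X, Y):
        s = sum(Z[j] for j in N[i])  # local field at i in chain Z
        p_plus = math.exp(s / T) / (math.exp(-s / T) + math.exp(s / T))
        Z[i] = +1 if u < p_plus else -1
    return sum(a != b for a, b in zip(X, Y))
```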
Temporal and spatial mixing equivalence
The analysis of Glauber dynamics at high temperature is already a version of the equivalence between mixing in time and mixing in space: it says that if even the correlations with the immediate neighbors of a node are weak, then Glauber dynamics mixes rapidly.

Now, we want to consider the situation in which there can be strong correlations between immediate neighbors, but weak correlations with faraway sites. We want to show that spatial mixing implies temporal mixing.
We will give a few definitions of correlation decay. (Note: The definitions of correlation decay below are not exactly the ones from Aramās lecture. These definitions are from [5] and [6].)
For non-empty $W \subseteq V$ and $\tau \in \Sigma^{V\setminus W}$, let $\mu_W^\tau$ be the distribution of the spins in $W$ conditional on the spins in $V \setminus W$ being fixed to $\tau$. For $\Delta \subseteq W$, let $\mu_{W,\Delta}^\tau$ be the marginal of $\mu_W^\tau$ on the spins in $\Delta$. We will assume that the interactions between the spins have finite range $r > 0$, and $\partial_r W$ denotes the $r$-boundary of $W$, i.e., $\{v \in V \setminus W : \operatorname{dist}(v,W) \le r\}$.
(Weak decay of correlations) Weak spatial mixing holds for $W$ if there exist constants $C, \xi > 0$ such that for any subset $\Delta \subseteq W$,

$$\sup_{\tau,\tau' \in \Sigma^{V\setminus W}}\|\mu_{W,\Delta}^\tau - \mu_{W,\Delta}^{\tau'}\|_1 \le C\sum_{x\in\Delta, \; y \in \partial_r W} \exp\Bigl(- \frac{\operatorname{dist}(x,y)}{\xi}\Bigr).$$
(Strong decay of correlations) Strong spatial mixing holds for $W$ if there exist constants $C, \xi > 0$ such that for every $\Delta \subseteq W$ and every $\tau, \tau' \in \Sigma^{V\setminus W}$ differing only at site $y \in V\setminus W$,

$$\|\mu_{W,\Delta}^\tau - \mu_{W,\Delta}^{\tau'}\|_1 \le C\exp\Bigl(-\frac{\operatorname{dist}(y,\Delta)}{\xi}\Bigr).$$
(Strong decay of correlations) Strong spatial mixing in the truncated sense holds for $V$ if there exist $n, \xi > 0$ such that for all functions $f, g : \Omega \to \mathbb{R}$ which depend only on the sites in $\Lambda_f$ and $\Lambda_g$ respectively, and which satisfy $\operatorname{dist}(\Lambda_f,\Lambda_g) \ge n$,

$$\sup_{\tau \in \Sigma^{V\setminus W}} \operatorname{cov}_{\mu_W^\tau}(f, g) \le |\Lambda_f|\,|\Lambda_g|\,\|f\|_\infty \|g\|_\infty \exp\Bigl(-\frac{\operatorname{dist}(\Lambda_f,\Lambda_g)}{\xi}\Bigr).$$
Here, $\xi$ is the correlation length (in physics, it is the characteristic length scale of a system). In the disordered phase, the correlation length is a constant independent of the system size. For our purposes, the main consequence of these definitions is that the effective interaction range of each spin is $O(\xi)$. For the Ising model, there is a key simplification due to monotonicity. Namely, the ferromagnetic Ising model has the nice property (which is not true for other models) that if we flip a sign from $-$ to $+$, this only makes $+$ more likely everywhere. This is because the spins want to agree. There are a lot of boundary conditions to consider, but here, due to monotonicity, we only need to consider two: all of the spins are $-$, and all of the spins are $+$. The all-$+$ boundary condition gives the highest probability of a $+$ spin, and the all-$-$ boundary condition gives the lowest probability of a $+$ spin. This monotonicity property is generally not required for the time-space mixing equivalence to hold, but it greatly simplifies proofs.
It is a very non-obvious fact that all of these notions of spatial mixing are equivalent. We will sketch a proof that strong correlation decay implies $t_{\rm mix} = O(n\log n)$.
The idea is to use another coupling argument. Let $X(0), Y(0) \in \{\pm 1\}^V$ differ in one coordinate, i.e., ${X(0)}_a \ne {Y(0)}_a$ and ${X(0)}_i = {Y(0)}_i$ for $i \ne a$. We want to argue that the expected distance between the processes will decrease. The proof uses a generalization of Glauber dynamics called block Glauber dynamics. In Glauber dynamics, we take a single spin and resample it conditioned on its neighbors. In block Glauber dynamics, we take an $L\times L$ box and resample it conditioned on its neighbors. There is an argument, called canonical paths, which can be used to show that if block Glauber dynamics mixes, then regular Glauber dynamics also mixes (slightly more slowly; we lose a $\operatorname{poly}(L)$ factor, but $L$ will in any case be a large constant), so analyzing block Glauber dynamics is fine.
If $a$ lies in the box, then the expected change in distance is $-L^2/n$. If $a$ is far away from the box, then there is no change. If $a$ is in the boundary of the box, then it is possible for the distance to increase. However, strong spatial mixing allows us to control the influence of a single site, so the expected change in distance is bounded by $O(L\xi^2/n)$. Now, since $\xi$ is a constant, if we choose $L$ sufficiently large, then we have the same situation as in the high temperature case: the expected distance shrinks exponentially over time.
Quantum systems
The quantum version of Markov chains has many more difficulties. The first difficulty is that the Hammersley-Clifford theorem (which we have been relying on throughout this blog post) fails.
Notation
To properly discuss what we mean, let's set up some notation. Readers already familiar with density matrices, quantum entropy, and quantum mutual information may wish to skip to the next subsection. Most of the time we discuss quantum objects here, we'll be using density matrices, often denoted $\rho$. A density matrix can be thought of as an extension of regular quantum states $\ket{\psi}$ to the setting where there is some classical source of uncertainty.
A density matrix is a positive semidefinite matrix with trace $1$. This extends the notion of a classical probability distribution; in the quantum setting, a classical probability distribution $p$ (thought of as a vector whose entries sum to $1$) is represented as the density matrix $\operatorname{diag}(p)$.
For example, we can consider a situation in which there is a $1/2$ probability that we started with the quantum state $\ket{\psi}$ and a $1/2$ probability that we started with the quantum state $\ket{\phi}$. This would be denoted as follows:

$$\rho = \frac{1}{2} \ket{\psi}\bra{\psi} + \frac{1}{2} \ket{\phi}\bra{\phi}.$$
Density matrices are generally useful for a lot of tasks, but for our purposes a density matrix will be used to capture both the classical and the quantum "uncertainty" we have about which state we hold.
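As a small numpy illustration of such a mixture (the choice of $\ket{\psi} = \ket{0}$ and $\ket{\phi} = \ket{+}$ below is arbitrary, just to make the sketch concrete):

```python
import numpy as np

psi = np.array([1, 0], dtype=complex)                # |0>
phi = np.array([1, 1], dtype=complex) / np.sqrt(2)   # |+>

# rho = 1/2 |psi><psi| + 1/2 |phi><phi|
rho = 0.5 * np.outer(psi, psi.conj()) + 0.5 * np.outer(phi, phi.conj())

assert np.isclose(np.trace(rho), 1)                  # trace 1
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)     # positive semidefinite
```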
Now let's also talk about a second important piece of notation: the tensor product. Often when discussing quantum states, it is important to discuss multiple quantum systems simultaneously. For example, Alice has one system $A$ and Bob has another system $B$. However, these systems might be entangled, meaning that the measurement results of the two systems are correlated.
For instance, let us consider the following state:

$$\ket{\psi} = \frac{1}{\sqrt{2}}\left(\ket{+}_A \ket{+}_B + \ket{-}_A \ket{-}_B \right).$$
This particular state has the property that Alice and Bob will always both measure $+$ or they will both measure $-$. The notation for tensors is often ambiguous in the literature as there are many ways of specifying tensors. For instance, above we used subscripts to explicitly denote which particle was in system $A$ and which was in system $B$. One may also choose to simply use the index of the system as below. The symbol $\otimes$ is used to denote a tensor between states (where it is assumed that the first state is system $A$ and the second, system $B$). Gradually folks may shorten the notation as follows:
<span id="MathJax-Element-370-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 110%; text-align: center; position: relative;" data-mathml="|Ļā©=12(|+ā©|+ā©+|āā©|āā©)|Ļā©=12(|++ā©+|āāā©)|Ļā©=12(10)ā(01)ā role=āpresentationā>|Ļā©=1ā2(|+ā©|+ā©+|āā©|āā©)|Ļā©=1ā2(|++ā©+|āāā©)|Ļā©=1ā2(10)ā(01)|Ļā©=12(|+ā©|+ā©+|āā©|āā©)|Ļā©=12(|++ā©+|āāā©)|Ļā©=12(10)ā(01)
%
These are all notations for the same state. Let's now talk about this state in the context of a density matrix. The density matrix of this state is as follows:
<span id="MathJax-Element-371-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 110%; text-align: center; position: relative;" data-mathml="ĻA,B=12(|++ā©+|āāā©)(āØ++|+āØāā|)ĻA,B=12(|++ā©āØ++|+|āāā©āØ++|+|++ā©āØāā|+|āāā©āØāā|)ĻA,B=12(1001000000001001)ā role=āpresentationā>ĻA,B=12(|++ā©+|āāā©)(āØ++|+āØāā|)ĻA,B=12(|++ā©āØ++|+|āāā©āØ++|+|++ā©āØāā|+|āāā©āØāā|)ĻA,B=12āā ā āā1001000000001001āā ā āā ĻA,B=12(|++ā©+|āāā©)(āØ++|+āØāā|)ĻA,B=12(|++ā©āØ++|+|āāā©āØ++|+|++ā©āØāā|+|āāā©āØāā|)ĻA,B=12(1001000000001001)
%
Writing the density matrix $\rho$ as $\rho_{A,B}$ makes explicit that this is the density matrix over systems $A$ and $B$.
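In code, the tensor product of vectors is `np.kron`; a short sketch (working in the $\{\ket{+}, \ket{-}\}$ basis, so $\ket{+} \mapsto (1,0)^\top$, an assumption made for illustration) reproduces the $4 \times 4$ matrix above:

```python
import numpy as np

plus = np.array([1, 0], dtype=complex)    # |+> in the {|+>, |->} basis
minus = np.array([0, 1], dtype=complex)   # |->

# |psi> = (|++> + |-->)/sqrt(2); np.kron is the tensor product on vectors
psi = (np.kron(plus, plus) + np.kron(minus, minus)) / np.sqrt(2)
rho_AB = np.outer(psi, psi.conj())

print(np.round(rho_AB.real, 2))
# [[0.5 0.  0.  0.5]
#  [0.  0.  0.  0. ]
#  [0.  0.  0.  0. ]
#  [0.5 0.  0.  0.5]]
```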
A crucial operation that one will often perform using density matrices is the partial trace. The partial trace is a way of allowing us to consider only a smaller part of a larger system, while taking into account the influence of the rest of the system around it.
Here's an example: Suppose Bob wants to know what his state is. However, Bob really doesn't care about Alice's system and just wants to know what the density matrix for his system is. Bob's density matrix is simply the following density matrix (a 50% chance of being in $\ket{+}$ and a 50% chance of being in $\ket{-}$):

$$\rho_B = \frac{1}{2}\left(\ket{+}\bra{+} + \ket{-}\bra{-}\right)$$

More explicitly, we could write the following:

$$\rho_B = \frac{1}{2}\left(\ket{+}_B\bra{+}_B + \ket{-}_B\bra{-}_B\right)$$
The partial trace is an operation that will let us take our original density matrix $\rho_{A,B}$ and generate a new density matrix $\rho_B$ that ignores system $A$. This is specifically called the partial trace over $A$, written $\operatorname{tr}_A$.
So how do we do this? We simply sum over the states of system $A$ (effectively taking a trace, but only along one axis):

$$\operatorname{tr}_A \rho_{A,B} = \sum_i \bra{i}_A \frac{1}{2}\left(\ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--}\right) \ket{i}_A$$
This is easier to evaluate using certain choices of notation:

$$\begin{aligned} \operatorname{tr}_A \rho_{A,B} &= \bra{+}_A \frac{1}{2}\left(\ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--}\right) \ket{+}_A \\ &\quad + \bra{-}_A \frac{1}{2}\left(\ket{++}\bra{++} + \ket{--}\bra{++} + \ket{++}\bra{--} + \ket{--}\bra{--}\right) \ket{-}_A \\ &= \frac{1}{2}\left(\ket{+}_B\bra{++} + \ket{+}_B\bra{--}\right)\ket{+}_A + \frac{1}{2}\left(\ket{-}_B\bra{++} + \ket{-}_B\bra{--}\right)\ket{-}_A \\ &= \frac{1}{2}\left(\ket{+}_B\bra{+}_B\right) + \frac{1}{2}\left(\ket{-}_B\bra{-}_B\right) \\ &= \frac{1}{2}\left(\ket{+}_B\bra{+}_B + \ket{-}_B\bra{-}_B\right) = \rho_B \end{aligned}$$
This gives us the answer that we had expected.
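Numerically, the partial trace amounts to reshaping the $4\times 4$ matrix into a $2\times2\times2\times2$ tensor and contracting the $A$ indices; here is a minimal self-contained sketch:

```python
import numpy as np

def partial_trace_A(rho, dA=2, dB=2):
    """Trace out the first (A) factor of a density matrix on A (x) B."""
    t = rho.reshape(dA, dB, dA, dB)    # axes: (a, b, a', b')
    return np.einsum('abac->bc', t)    # contract a = a'

plus = np.array([1, 0], dtype=complex)
minus = np.array([0, 1], dtype=complex)
psi = (np.kron(plus, plus) + np.kron(minus, minus)) / np.sqrt(2)
rho_AB = np.outer(psi, psi.conj())

print(np.round(partial_trace_A(rho_AB).real, 2))
# [[0.5 0. ]
#  [0.  0.5]]  i.e. rho_B = (|+><+| + |-><-|)/2, as computed by hand above
```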
We now have all of the tools we need to talk about quantum entropy. Intuitively, entropy can be thought of as the amount of uncertainty we have for our system, or equivalently the amount of information it takes to define our system. The entropy for a quantum system $\rho$ is defined as follows:

$$H(\rho) = -\operatorname{tr}(\rho \log_2 \rho)$$
Note that here we use the shorthand $\rho_B$ to denote $\operatorname{tr}_A \rho_{A,B}$. Here, writing $\operatorname{tr}$ without the subscript indicates that this is the full or normal trace that one might expect (or equivalently, performing the partial trace over all systems). We can now define the conditional entropy of a system as follows:

$$H(A \mid B)_\rho = H(\rho_{A,B}) - H(\rho_B)$$
This definition intuitively makes sense since we can think of conditional entropy as the amount of information it takes to describe our joint system $(A,B)$, given that we already know what $B$ is.
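Here is a minimal self-contained sketch of these two definitions for the entangled state from earlier; note the distinctly quantum punchline that $H(A \mid B)$ comes out negative, which can never happen classically:

```python
import numpy as np

def entropy(rho):
    """H(rho) = -tr(rho log2 rho), computed from the eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                      # convention: 0 log 0 = 0
    return float(-(ev * np.log2(ev)).sum())

# The state (|++> + |-->)/sqrt(2) as a vector, and its reduced state on B
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_AB = np.outer(psi, psi.conj())
rho_B = np.einsum('abac->bc', rho_AB.reshape(2, 2, 2, 2))

print(entropy(rho_AB))                   # 0.0: the joint state is pure
print(entropy(rho_B))                    # 1.0: Bob alone sees a maximally mixed qubit
print(entropy(rho_AB) - entropy(rho_B))  # H(A|B) = -1.0
```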
We can now discuss quantum mutual information, the amount of information that measuring system $A$ will provide you about system $B$. Like the classical case, this is defined as follows:

$$I(A;B)_\rho = H(A,B)_\rho - H(A \mid B)_\rho - H(B \mid A)_\rho$$
We can now finally discuss quantum conditional mutual information (QCMI), defined as follows:

$$I(A;B \mid C)_\rho = I(A;B,C)_\rho - I(A;C)_\rho$$

With some algebraic simplifications, one can arrive at the expression:

$$I(A;B \mid C)_\rho = H(A,C)_\rho + H(B,C)_\rho - H(A,B,C)_\rho - H(C)_\rho.$$
The QCMI equals $0$ if and only if $\rho$ is a quantum Markov state. Classically, the entropic characterization of conditional independence corresponds to an algebraic characterization.
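These definitions are easy to play with numerically. Below is a minimal sketch for three qubits; the subsystem ordering $A \otimes B \otimes C$ and the qubit dimensions are illustrative assumptions.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy H(rho) = -tr(rho log2 rho), via eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                     # 0 log 0 = 0
    return float(-(ev * np.log2(ev)).sum())

def ptrace(rho, dims, keep):
    """Partial trace keeping the subsystems whose indices are in `keep`."""
    n = len(dims)
    t = rho.reshape(tuple(dims) + tuple(dims))
    traced = 0
    for i in range(n - 1, -1, -1):          # trace out high indices first
        if i not in keep:
            t = np.trace(t, axis1=i, axis2=i + (n - traced))
            traced += 1
    d = int(np.prod([dims[i] for i in sorted(keep)]))
    return t.reshape(d, d)

def qcmi(rho, dims=(2, 2, 2)):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(ptrace(rho, dims, {0, 2})) + entropy(ptrace(rho, dims, {1, 2}))
            - entropy(rho) - entropy(ptrace(rho, dims, {2})))

# Sanity check: a fully product (maximally mixed) state has I(A;B|C) = 0
print(round(qcmi(np.eye(8) / 8), 6))  # 0.0
```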
Recovery Maps
Here, the algebraic characterization is more grueling. We have

$$\rho_{ABC} = \exp\left(\log \rho_{AB} + \log \rho_{BC} - \log \rho_B\right)$$
Equivalently,

$$\rho_{ABC} = \rho_{AB}^{1/2} \rho_B^{-1/2} \rho_{BC} \rho_B^{-1/2} \rho_{AB}^{1/2} = R_{B\to AB}(\rho_{BC})$$
Here, $R_{B\to AB}$ is called the Petz recovery map,⁴ $R_{B\to AB}(X) = \rho_{AB}^{1/2} \rho_B^{-1/2} X \rho_B^{-1/2} \rho_{AB}^{1/2}$. One can think of a recovery map as a way that we can reconstruct the entire system $A,B$ using just system $B$. It is not obvious that this is a quantum channel, but it is.
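As a sanity check, here is a minimal numpy sketch of the Petz map on three qubits. The test state, a full-rank mixture on $A,B$ tensored with a diagonal state on $C$, is an assumption chosen so that the Markov condition holds exactly and recovery is perfect; the pseudo-inverse is likewise an assumption used to sidestep rank issues.

```python
import numpy as np
from scipy.linalg import sqrtm

I2 = np.eye(2)

def petz_B_to_AB(rho_AB, rho_B, rho_BC):
    """R_{B->AB}(X) = rho_AB^{1/2} rho_B^{-1/2} X rho_B^{-1/2} rho_AB^{1/2},
    with every factor lifted to the full A (x) B (x) C space."""
    sq_AB = np.kron(sqrtm(rho_AB), I2)                           # identity on C
    inv_sq_B = np.kron(np.kron(I2, np.linalg.pinv(sqrtm(rho_B))), I2)
    X = np.kron(I2, rho_BC)                                      # identity on A
    return sq_AB @ inv_sq_B @ X @ inv_sq_B @ sq_AB

# A quantum Markov state: sigma_AB (x) tau_C, so recovery should be exact.
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
sigma_AB = 0.5 * bell + 0.5 * np.eye(4) / 4                      # full rank on AB
tau_C = np.diag([0.7, 0.3])
rho_ABC = np.kron(sigma_AB, tau_C)

rho_B = np.einsum('abac->bc', sigma_AB.reshape(2, 2, 2, 2))      # trace out A
rho_BC = np.kron(rho_B, tau_C)

print(np.allclose(petz_B_to_AB(sigma_AB, rho_B, rho_BC), rho_ABC))  # True
```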
Suppose $\rho$ is a probability distribution, so $\rho = \operatorname{diag}(p)$ for some vector $p$. Then, all of the density matrices are diagonal and commuting. Then, the recovery map means that we divide by $p_B$ and multiply by $p_{AB}$, i.e., multiply by $p_{A \mid B}$. This is the natural thing to do if we lost our information about $A$ and were trying to figure out what $A$ was based on our knowledge of $B$. This is why $R_{B\to A,B}$ is known as a recovery map, and it is used to discuss conditional distributions in the quantum setting. In the classical case, if we start with $B, C$, look only at $B$, and use this to reconstruct $A$, then we would have the whole state in a Markov chain. That is why this is a plausible quantum version of being a Markov chain.
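That classical reading can be checked directly: encode a classical Markov chain $A \to B \to C$ as a diagonal density matrix and apply the same map (this sketch reuses `petz_B_to_AB` from above; the particular transition probabilities are arbitrary):

```python
import numpy as np

# p(a,b,c) = p(a) p(b|a) p(c|b) over bits: a classical Markov chain A -> B -> C
pa = np.array([0.6, 0.4])
pb_given_a = np.array([[0.8, 0.2], [0.3, 0.7]])
pc_given_b = np.array([[0.9, 0.1], [0.25, 0.75]])
p = np.einsum('a,ab,bc->abc', pa, pb_given_a, pc_given_b)

rho_ABC = np.diag(p.reshape(8))
rho_AB = np.diag(p.sum(axis=2).reshape(4))
rho_BC = np.diag(p.sum(axis=0).reshape(4))
rho_B = np.diag(p.sum(axis=(0, 2)))

# Diagonal case: the Petz map just multiplies by p(a|b), so recovery is exact.
print(np.allclose(petz_B_to_AB(rho_AB, rho_B, rho_BC), rho_ABC))  # True
```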
However, quantum Gibbs states are not, in general, quantum Markov chains. The failure of this statement to hold is related to topological order, which is similar to the degrees of freedom that show up in error correcting codes.
Quantum Markov Networks
Here, we will formally define a quantum Markov network. The reference for this is [7].
Let $G = (V, E)$ be a finite graph. We associate with each vertex $v \in V$ a Hilbert space $\mathcal{H}_v$ and we consider a density matrix $\rho_V$ acting on $\bigotimes_{v\in V} \mathcal{H}_v$. Then, $(G, \rho_V)$ is a quantum Markov network if for all $U \subseteq V$, $U$ is conditionally independent of $V \setminus (U \cup \partial U)$ given $\partial U$, where the conditional independence statement is w.r.t. $\rho_V$ and means that the corresponding QCMI satisfies $I(U; V \setminus (U \cup \partial U) \mid \partial U)_{\rho_V} = 0$.
A quantum Markov network is called positive if $\rho_V$ has full rank. (Recall that in the statement of the Hammersley-Clifford Theorem, it is assumed that the distribution is strictly positive.)
Now, consider the following example. First, we introduce the Pauli matrices

$$\sigma^x := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^z := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad \sigma^y := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.$$
We define a Hamiltonian on three qubits $A$, $B$, $C$ by

$$H := \left(\sigma_A^x \sigma_B^x + \sigma_A^y \sigma_B^y + \sigma_A^z \sigma_B^z\right) I_C + I_A \left(\sigma_B^x \sigma_C^x + \sigma_B^y \sigma_C^y + \sigma_B^z \sigma_C^z\right)$$
(Juxtaposition in the above expression signifies the tensor product, as discussed before.) Finally, for $\beta > 0$, we define the Gibbs state

$$\rho_{A,B,C}(\beta) := \frac{1}{Z(\beta)} \exp(-\beta H)$$
The Hamiltonian here has local terms which correspond to interactions $(A,B)$ and $(B,C)$. However, it can be shown that the QCMI between $A$ and $C$ conditioned on $B$ w.r.t. $\rho_{A,B,C}$ is non-zero, which means that this is not a quantum Markov network w.r.t. the line graph $A \leftrightarrow B \leftrightarrow C$. This demonstrates the failure of the Hammersley-Clifford Theorem in the quantum setting.
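This is easy to check numerically. The sketch below (reusing `entropy` and `ptrace` from the QCMI sketch above, and picking $\beta = 1$ as an arbitrary illustrative value) builds the Gibbs state and evaluates $I(A;C \mid B) = H(\rho_{A,B}) + H(\rho_{B,C}) - H(\rho_{A,B,C}) - H(\rho_B)$:

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# Heisenberg interactions on (A,B) and on (B,C)
H = sum(kron3(s, s, I2) + kron3(I2, s, s) for s in (sx, sy, sz))

beta = 1.0
rho = expm(-beta * H)
rho = rho / np.trace(rho).real   # normalize by Z(beta)

dims = (2, 2, 2)
i_AC_given_B = (entropy(ptrace(rho, dims, {0, 1})) + entropy(ptrace(rho, dims, {1, 2}))
                - entropy(rho) - entropy(ptrace(rho, dims, {1})))
print(i_AC_given_B)  # strictly positive: not a quantum Markov network on A-B-C
```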
Important Results
We will briefly discuss the results of two papers.
[8] This paper shows that mixing in space implies mixing in time in the quantum case. However, the result of the paper only applies to commuting Hamiltonians. For commuting Hamiltonians, it turns out that quantum Gibbs states are quantum Markov networks. They use a version of Glauber dynamics, which can be simulated on a quantum computer but are also plausible dynamics for a physical system in nature. This is a difficult paper to read, but it is worth digesting if you want to work in the field.
[9] This second paper is much easier and more general, covering non-commuting Hamiltonians, but it requires more conditions. They give a method of preparing the Gibbs state which can run on a quantum computer, but the dynamics are not plausible as a physical system because they are too complicated. The more complicated dynamics allows them to make the proof work. The paper also uses QCMI.
They have two assumptions. The first assumption looks like mixing in space (weak correlation decay). The second assumption is that the state looks approximately like a quantum Markov network (this is definitely not met in general). A very important paper in this space is a recent breakthrough ([10]) which characterizes quantum Markov chains. They show that if the QCMI is bounded by $\epsilon$, then the recovery map $R_{B\to A,B}(\rho_{BC})$ is $\epsilon'$-close to $\rho_{ABC}$, i.e., low QCMI implies that the recovery map works well. This is trivial to prove classically, but very difficult in the quantum world.
The algorithm in [9] is very elegant. Essentially, we take the entire system and punch out constant-sized boxes. If we can reconstruct the region outside of the boxes, then we can use the recovery maps to reconstruct the regions inside of the boxes, and the boxes are far enough apart that they are almost independent. For this argument, we must assume that the QCMI decays exponentially. Whenever we have exponential decay, we get a correlation decay that sets the size of the boxes. It is very difficult to condition on quantum states, but recovery maps provide a sense in which it is meaningful to do so. The paper gives an efficient method of preparing Gibbs states and simulating quantum systems on quantum computers.
Additional reading
The standard treatment of information theory is [11]. This book contains definitions and properties of entropy, conditional entropy, mutual information, and conditional mutual information.
To see a treatment of the subject of Markov chains from the perspective of probability theory, see [12] or the mathematically more sophisticated counterpart [13]. An introduction to coupling can be found in [14], as well as [4] (the latter also contains an exposition to spatial mixing). The connection between Markov chain mixing and the so-called logarithmic Sobolev inequality is described in [15].
Appendix: Intuition for Markov chains
Random walk on the cycle
We have $n$ points on the cycle, $0, 1, \dotsc, n-1$. At each step, we move left or right with probability $1/2$. We can write the transition matrix as

$$T = \frac{S + S^{-1}}{2}$$
where $S$ is the shift operator $S\ket{x} = \ket{x+1 \bmod n}$. The matrix $S$ is diagonalized by the Fourier transform. Define, for $k = 0, 1, \dotsc, n-1$,

$$\ket{\tilde{k}} = \frac{1}{\sqrt{n}} \sum_{x=0}^{n-1} \exp\left(\frac{2\pi i k x}{n}\right) \ket{x}$$
We have the same amount of amplitude at every point, but there is a varying phase which depends on $k$. If $k = 0$, we get the all-ones vector. If $k$ is small, then the phase is slowly varying. If $k$ is large, then the phase is rapidly varying. Look at what happens after we apply the shift operator:
<span id="MathJax-Element-466-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 110%; text-align: center; position: relative;" data-mathml="S|k~ā©=1nāx=0nā1expā”(2Ļikxn)|x+1modnā©=1nāx=1nexpā”(2Ļik(xā1)n)|xmodnā©=expā”(ā2Ļikn)|k~ā©.ā role=āpresentationā>S|~kā©=1ānnā1āx=0exp(2Ļikxn)|x+1modnā©=1ānnāx=1exp(2Ļik(xā1)n)|xmodnā©=exp(ā2Ļikn)|~kā©.S|k~ā©=1nāx=0nā1expā”(2Ļikxn)|x+1modnā©=1nāx=1nexpā”(2Ļik(xā1)n)|xmodnā©=expā”(ā2Ļikn)|k~ā©.
%
After the shift, we pick up an additional phase based on how rapidly the phase is varying. From this, we get:

$$T\ket{\tilde{k}} = \frac{\exp(2\pi i k/n) + \exp(-2\pi i k/n)}{2} \ket{\tilde{k}} = \cos\left(\frac{2\pi k}{n}\right) \ket{\tilde{k}}.$$
The eigenvalues are

$$\lambda_k = \cos\frac{2\pi k}{n}, \qquad k = 0, 1, \dotsc, n-1.$$
Only $k = 0$ will give me an eigenvalue of $1$.
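A quick numerical confirmation of this spectrum (with $n = 7$ as an arbitrary odd choice):

```python
import numpy as np

n = 7
S = np.roll(np.eye(n), 1, axis=0)        # shift operator: S|x> = |x+1 mod n>
T = (S + S.T) / 2                        # S^{-1} = S^T for a permutation matrix

computed = np.sort(np.linalg.eigvals(T).real)
expected = np.sort(np.cos(2 * np.pi * np.arange(n) / n))
print(np.allclose(computed, expected))   # True
```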
How do we analyze $T^t \ket{p}$? We should Fourier transform the distribution:

$$T^t \ket{p} = T^t \sum_{k=0}^{n-1} p_k \ket{\tilde{k}} = \sum_{k=0}^{n-1} p_k \lambda_k^t \ket{\tilde{k}}.$$
If $n$ is odd, then as $t \to \infty$, $\lambda_k^t \to 0$ for all $k = 1, \dotsc, n-1$, so $T^t \to \ket{\pi}\bra{1_n}$. Whatever you put into this operator, you get $\pi$ out.
Spectral gap
The example of the random walk on the cycle shows that there is generally a unique stationary distribution and suggests that the speed of convergence is determined by how close the other eigenvalues are to $1$. Specifically, suppose for simplicity that the eigenvalues of $T$ are $1 = \lambda_0 \ge \lambda_1 \ge \cdots \ge 0$ (real and positive). Then, the convergence time is on the order of $\sim 1/(1-\lambda_1)$.
Typically, the distance of the eigenvalues from $1$ reflects the size of the physical system. Even from the simple example, we can get some physical intuition from this. If $k$ is small, then the corresponding eigenvalue is $\cos(2\pi k/n) = 1 - O(k^2/n^2)$, so the spectral gap is $1 - \lambda_1 = O(1/n^2)$. Thus, the convergence time is $\sim 1/(1-\lambda_1) \sim n^2$, which is indeed the convergence time for a random walk on a cycle.
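The $n^2$ scaling is easy to see empirically; this sketch measures, for a few odd $n$, how many steps a walk started at a point mass needs to come within total variation distance $1/4$ of uniform (the $1/4$ threshold is the usual mixing-time convention):

```python
import numpy as np

def mixing_time(n, eps=0.25):
    S = np.roll(np.eye(n), 1, axis=0)
    T = (S + S.T) / 2
    p = np.zeros(n); p[0] = 1.0              # start at a point mass
    pi = np.full(n, 1.0 / n)                 # uniform stationary distribution
    t = 0
    while 0.5 * np.abs(p - pi).sum() > eps:  # total variation distance
        p = T @ p
        t += 1
    return t

for n in (11, 21, 41, 81):
    print(n, mixing_time(n), mixing_time(n) / n ** 2)  # last column roughly constant
```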
References
[1] S. Gharibian, Y. Huang, Z. Landau, and S. W. Shin, "Quantum Hamiltonian complexity," Found. Trends Theor. Comput. Sci., vol. 10, no. 3, pp. front matter, 159–282, 2014.
[2] R. W. Keener, Theoretical statistics. Springer, New York, 2010, p. xviii+538.
[3] E. Crosson, D. Bacon, and K. R. Brown, "Making Classical Ground State Spin Computing Fault-Tolerant," Physical Review E, vol. 82, no. 3, Sep. 2010.
[4] C. Moore and S. Mertens, The nature of computation. Oxford University Press, Oxford, 2011, p. xviii+985.
[5] F. Martinelli, "Lectures on Glauber dynamics for discrete spin models," in Lectures on probability theory and statistics (Saint-Flour, 1997), vol. 1717, Springer, Berlin, 1999, pp. 93–191.
[6] F. Martinelli and E. Olivieri, "Finite volume mixing conditions for lattice spin systems and exponential approach to equilibrium of Glauber dynamics," in Cellular automata and cooperative systems (Les Houches, 1992), vol. 396, Kluwer Acad. Publ., Dordrecht, 1993, pp. 473–490.
[7] M. S. Leifer and D. Poulin, "Quantum graphical models and belief propagation," Ann. Physics, vol. 323, no. 8, pp. 1899–1946, 2008.
[8] M. J. Kastoryano and F. G. S. L. Brandão, "Quantum Gibbs samplers: the commuting case," Comm. Math. Phys., vol. 344, no. 3, pp. 915–957, 2016.
[9] F. G. S. L. Brandão and M. J. Kastoryano, "Finite correlation length implies efficient preparation of quantum thermal states," ArXiv e-prints, Sep. 2016.
[10] O. Fawzi and R. Renner, "Quantum conditional mutual information and approximate Markov chains," Comm. Math. Phys., vol. 340, no. 2, pp. 575–611, 2015.
[11] T. M. Cover and J. A. Thomas, Elements of information theory, Second. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, 2006, p. xxiv+748.
[12] R. Durrett, Essentials of stochastic processes. Springer, Cham, 2016, p. ix+275.
[13] R. Durrett, Probability: theory and examples, Fourth., vol. 31. Cambridge University Press, Cambridge, 2010, p. x+428.
[14] M. Mitzenmacher and E. Upfal, Probability and computing, Second. Cambridge University Press, Cambridge, 2017, p. xx+467.
[15] F. Cesi, "Quasi-factorization of the entropy and logarithmic Sobolev inequalities for Gibbs random fields," Probab. Theory Related Fields, vol. 120, no. 4, pp. 569–584, 2001.
1. This is the opposite of the probabilists' convention, i.e., the transition probability matrix that we define here is the transpose of the one usually found in most probability theory textbooks. ↩
2. As a side note, it may be a good research question to investigate to what extent quantum algorithms can be used to compute summations whose terms are possibly negative. In quantum Monte Carlo, the quantum Hamiltonian is converted to a classical energy function; this conversion always works, but sometimes you end up with complex energies, which is terrible for estimating the partition function because terms can cancel each other out. ↩
3. You may recognize this as the total variation norm. ↩
4. Petz wrote about quantum relative entropy in 1991, way before it was cool. ↩
Windows On Theory published first on Windows On Theory
Text
n e g a t i v i t y   w e e k
Hey, all I've been absorbing this week is negativity. Honestly, it's not helping; I'm really affected by the negativity of my surroundings. Entropy is really greater than zero. My patience is maxed out, and all I did every time someone annoyed me was shout at them because they were really getting on my nerves. An exception to my madness is the only person who motivates me and tells me I can do this, thank you girl! The rest are annoying me. It's weird; maybe it's also because of hormones that I'm like this. Like I find everything around me maddening to the highest level. Back to the old me again? :-(
I'm ranting on Tumblr because my Twitter account is inactive, and people there will gossip and judge you and act like they care, but the truth is they don't really give a fuck.
Sorry, I just need to get this thought out of my head. And this is not good because I'm putting a lot of negativity here.
Video
musicians with guns - overstepping artifacts from ricardo montalban on Vimeo.
Musicians With Guns - Overstepping Artifacts LP (drone/ambient/experimental/LoFi music) ---> entropy-records.com/release3032.htm
*** New fractal video for Art Zero label here : vimeo.com/85096694 ***
> / °°000 - #CLS
> / °°010 - #LOAD /mwg/protocol/crawling through the amazing box -r -p -s -n
> / °°022 - #RUN --
> / °°143 - #ERR [00054d64-ba6c5411] ; "nonlinear overlapping function"
> / °°285 - #LIST [4016457fa-6552b31c]
> / °°299 - #DEBUG [23d3c447-9ff4c812]
> / °°356 - #COORD [????????-????????]
> / °°387 - #TRACE [535b8c47-88f5a7bb] ; "analytical distance estimator"
> / °°408 - #ERR [012d2556-9a9b1445] ; "overstepping artifacts"
> / °°437 - #LOSTINTIMEANDSPACE [8342dc41-e4a22f1b]
> / °°446 - #MOFRACT [7271dd5c2 - a948b8d4] ; "half octahedral planes"
> / °°489 - #ROTATE [61d5f4aa-62b49c10] ; "helispiral rotation"
> / °°524 - #TRAPPED [33a49ee2-bb6462c7]
> / °°551 - #ZOOM [+inf/-inf -g -s -t]
> / °°637 - #ERR [e682b6d4-fe945aa0] ; "lost in a different coordinate system"
> / °°793 - #NOEND /mwg/cometa -i
Text
Don't Fear The Reaper
I've mostly stopped development on my honeypot. Instead, I'm just watching the logs for any unusual behavior. About 3 weeks ago, I noticed something odd. The number of SYN-scans against 23/tcp took a dramatic uptick. The ports under attack look similar to the Mirai botnet, so this appeared to be yet-another Mirai variant. I've been profiling different types of botnets, and I call this one 'Botnet[E]'. (Botnet[A] refers to the network library used by Mirai.) Except... My honeypot records a lot of detail about the packet header. One of the things that I began recording last month is the TCP sequence number. And with these attacks, I kept seeing the same sequence numbers over and over. In this case, the TCP sequence number (a 4-byte value) was the exact same as the IP target address (a 4-byte value). This becomes a very distinct value for identifying this particular scanning bot. And based on the number of IP addresses during the scanning, this is a huge botnet.
Sequencing
The TCP header contains two fields for counters. There's the sequence number and the acknowledgment number. These two fields work together during the TCP 3-way handshake and data transfer:
Handshake step 1: SYN. During the initial connection, the client sends a SYN packet to the server. The SYN packet sets the initial sequence number. This could be literally any 4-byte value -- it just needs to be set. The ACK number is ignored and usually zeros.
Handshake step 2: SYN-ACK. The server responds with a SYN-ACK. The server sets it's own initial sequence number, and responds with the client's sequence number as the acknowledgment number. This tells the client that the server has synchronized with the client's counter.
Handshake step 3: ACK. The client responds with an ACK. The sequence number should not change, and the server's sequence number becomes the acknowledgment number. This tells the server that the client has synchronized with the server's counter.
Data transfers. After the handshake, both sides start exchanging data. The server's sequence number increases by one byte for every byte of data that the server sends to the client. And the client's sequence number increases by one byte for every byte of data that the client sends to the server. The corresponding ACK packets tell each side how much data has been received and whether to resend data that got lost.
Different systems use different methods to initialize these TCP sequence numbers:
Old operating systems. Old systems would initialize it based on something like the current time, current uptime, or just use an incremental counter. The time-based initializers were fun, because you could tell the system's time or how long the computer had been up since the last reboot just by looking at the initial sequence number. However, if an attacker can guess what the sequence number will be, then they could hijack the TCP connection. For this reason, most systems have moved away from time-based initial values. Today, time-based initial values are still the case with some embedded devices.
Modern operating systems. Most of today's computers use an effectively random value to defeat information leakage.
Scanners. Tools like masscan and zmap are capable of scanning the entire Internet in under an hour. They can sling out a huge number of packets. Of course, they don't want to store every outgoing packet sequence number. Instead, they just want to send out SYN packets and see what (if anything) responds. These tools initialize the sequence number with a pre-determined value. For example, the initial sequence could be the 1st four bytes of hash(entropy XOR target IP address XOR port number). This makes the sequence numbers look random, but the scanner can tell when a valid response is received. (This stops jerks from flooding the scanner with fake SYN-ACK responses. Since you don't know the sequence number, the fake SYN-ACK is ignored.) A sketch of this style of sequence derivation appears after this list.
Botnets. Different botnets use different initial sequence values. One botnet that I see often uses an initial sequence number of 0x00000000. Another botnet uses a two-byte value, leaving the first two bytes zero. For example, seq=00000e44 and seq=0000441a. And a different botnet leaves the last two bytes zero: seq=0a4d0000 and seq=467e0000. In these cases, it's not just that it's a botnet. I can tell the botnet family based on the sequence number format.
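To make the scanner-style initialization concrete, here is a minimal Python sketch of a stateless sequence derivation of that flavor (the exact hash construction and the `SECRET` value are assumptions for illustration, not the actual masscan or zmap implementation):

```python
import hashlib
import socket
import struct

SECRET = b"per-scan entropy chosen once at startup"  # hypothetical

def ip_to_int(ip: str) -> int:
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def syn_seq(target_ip: str, target_port: int) -> int:
    """Stateless initial sequence number: first 4 bytes of a keyed hash."""
    material = SECRET + struct.pack("!IH", ip_to_int(target_ip), target_port)
    return struct.unpack("!I", hashlib.sha256(material).digest()[:4])[0]

# On receiving a SYN-ACK, recompute syn_seq() for the claimed sender; if the
# packet's ACK number isn't syn_seq(...) + 1, it is spoofed and can be dropped.
print(hex(syn_seq("192.0.2.10", 23)))
```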
And that brings us back to this new botnet. It initializes the TCP sequence number with the target's IP address. For example, if the SYN packet is sent from 201.52.93.90 to 182.231.151.141, then the initial SYN packet's sequence number will be 0xb6e7978d (that's 182.231.151.141 written in hex, or 3,068,630,925 in decimal). This is actually kind of brilliant. Each bot slings out packets and doesn't store any information. When a response comes back, the botnet can identify the sender by the sequence number. Moreover, if the response sequence number doesn't come from the sender's network address, then they can use this information to partially map out network addresses. (Although more likely, they just use it to check if the SYN-ACK is valid, or ignore it altogether and just use the SYN-ACK packet's sender address.) Of course, since I know the sequence number, I can be a jerk and send back SYN-ACK responses for arbitrary addresses. I just need to wait until my honeypot receives one of these scans. At that point, I know the IP address of an infected IoT device. Then I can start sending back fake packets with fake IP addresses but valid sequence numbers in the ACK field. This will make the botnet think that lots of systems are vulnerable, even when they aren't. Each node in the botnet may end up spending a lot of time attacking ghosts. This either slows down the scan as they attack nothing, or fills the attack queue with garbage. And yes, I've been doing this for a week. Here's my script:
```bash
#!/bin/bash
# Attack back
ip2dec () {
  local a b c d ip=$@
  IFS=. read -r a b c d <<< "$ip"
  printf '%d\n' "$((a * 256 ** 3 + b * 256 ** 2 + c * 256 + d))"
}

tail -f /var/log/honeypot.log | grep --line-buffered seq=ip.src |
sed -u -e 's@^.*honeypot: @@' -e 's@\[@ @g' -e 's@\]@ @g' -e 's@/tcp@@g' |
# My logs include source address, source port, destination address, and destination port
# The source is the botnet.
while read src sport junk dst dport junk ; do
  # Send 10 fake responses
  loop=0; while [ $loop -lt 10 ] ; do
    # Don't use the real destination; select one at random.
    # Or I could loop over specific addresses I want the botnet to attack.
    dst="$((RANDOM%6)).$((RANDOM%6)).$((RANDOM%6)).$((RANDOM%6))"
    # Derive the sequence and acknowledgment numbers from the addresses.
    seq=$(ip2dec "$src")
    ack=$(ip2dec "$dst")
    # Get the date for logging
    date=$(date +"%Y-%m-%d %H:%M:%S %Z")
    echo -n "$date "
    # SYN goes from src to dst
    #echo "SYN from $src:$sport[$ack] to $dst:$dport"
    # SYN-ACK goes from dst to src
    echo "SYN-ACK from $dst:$dport[$seq] to $src:$sport[$ack]"
    # use hping3 to send the fake packet
    hping3 -q -i 1u -c 1 -a "$dst" -L "$ack" -M "$seq" -t 242 -s "$dport" -p "$sport" -S -A "$src" > /dev/null 2>&1
    ((loop=$loop+1))
  done
done
```
My honeypot is currently receiving scans from this botnet about 1-3 times per minute. And it is rare to see the same infected IP address twice. Nearly all are scans for 23/tcp. There is a small percent of scans for 22/tcp, 2323/tcp, and other ports.
Comparing Results
Good science means that results can be repeated, and the same results can be determined using different means. I've been communicating with Troy Mursch from Bad Packets. He runs a honeypot but doesn't use the same tools as me. (I wrote my own tools, so I'm very certain that we don't have the same setup.) Every now and then, I'll send him a query like, "Are you seeing any activity from xx.xx.xx.xx?" or he'll ask if I just received a flood from yy.yy.yy.yy. (It's really entertaining when the initial response is "no", but a few hours later it changes to "yes -- they just showed up.") Troy and I began comparing notes and findings with regards to packets with this specific fingerprint. We were seeing the same volume of traffic, same types of source hosts, and even the same ports being scanned. (The vast majority of scans were for 23/tcp. A distant second came 22/tcp, then 2323/tcp, and other common IoT ports.) Troy has a write-up of his findings at "EnGenius Routers Found in Reaper Botnet with Distinct Network Fingerprint". Going back through our logs: I had been seeing these scans since mid-September. At least, that's when I started tracking sequence numbers. Troy traced through his logs and found this same profile dating back to early May 2017. We compared the observed source addresses and, amazingly, had very little overlap. That means that this botnet is massive. Something in the "millions of infected hosts" range. We had been working on tracking this botnet for about a week, when the news broke. There's a new big botnet in town. Checkpoint Research calls it "IoTroop", but other security companies are calling it "Reaper" or "IoT_reaper". Like the Mirai botnet, Reaper infects IoT devices. But there's a twist. With Mirai, it just looked for default logins on IoT devices. Reaper also uses a few known exploits to compromise the IoT devices. As Andy Greenberg at Wired described it, "It's the difference between checking for open doors and actively picking locks." As soon as I read the description and the timeframe for when it appeared, I became convinced that Troy and I had been looking at packets from Reaper. It's the same timeframe, same volume, and same types of devices. As botnet infections go, this one is bad because of the sheer size of it. If the owner of this botnet were to use it for a DDoS attack, the results would be enough to take Brian Krebs offline. (And Brian's been able to withstand some really massive DDoS attacks.)
Detecting Reaper
Most of the reports about the Reaper botnet lack details for detecting an infection. The folks over at Radware have a detailed report that lists ports that Reaper scans. However, I'm not able to validate their findings. For example, they say:
The first wave consists of SYN scans on TCP ports in the following order: 20480, 20736, 36895, 37151, 22528, 16671, 14340, 20992, 4135, 64288, 45090, 21248, 21504, 31775, 39455, 47115 and 42254.
I'm just not seeing this in my logs. For example, in the last five days, my honeypot has seen five (5) scans against 20480/tcp. All five of those had the TCP SYN sequence set to the destination IP address. In contrast, during that same timeframe my logs identified 4,212 scans from Reaper -- and most were against 23/tcp. Perhaps Radware is talking about how Reaper appears on the infected system, or how Reaper attacks a vulnerable system, and not how it looks as it tries to spread across the Internet.

I'm also seeing a much wider range of infected devices than those listed in the other researcher reports. To test devices, I just waited for the IP address to scan me, and then I checked for a publicly accessible web server. For example, I saw a scan from 168.167.177.180 and saw an infected airOS router. (You can tell it's infected because of the code error at the top of the page.) And here's an accessible TripMate Nano that scanned my honeypot: NOTE: It defaults to filling in the username as 'admin'. I didn't type that. The TripMate Nano scan was shortly followed by a D-Link router:

For network administrators who want to detect infected hosts from this new botnet: Look for SYN packets where tcp.seq==ip.dst. If you see a match, then the ip.src denotes an infected address. Either the device at that address is infected, or something behind that NAT router is infected. Source: http://ift.tt/2zS84di
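As a rough sketch of that rule in code (scapy is an assumption here; any capture tool that exposes the TCP flags and sequence number works equally well, and sniffing requires root):

```python
import socket
import struct

from scapy.all import IP, TCP, sniff  # pip install scapy

def ip_to_int(ip: str) -> int:
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def check(pkt):
    # Initial SYN only (SYN set, ACK clear)
    if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt[TCP].flags == "S":
        if pkt[TCP].seq == ip_to_int(pkt[IP].dst):
            print(f"possible Reaper scan: {pkt[IP].src} -> "
                  f"{pkt[IP].dst}:{pkt[TCP].dport}")

sniff(filter="tcp[tcpflags] == tcp-syn", prn=check, store=False)
```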
Text
[under construction] first year undergraduate thermodynamics: a simplified glossary, discussion and sign convention summary
idk about you, but thermodynamics just fucks me up. every. fricking. time. the different sign conventions don't help either. this is a short summary of undergraduate first year thermo; ideally, you would have a textbook and notes to refer to, and this is just a no-nonsense rundown of need-to-knows. this should be nothing more than a springboard for you to review topics and refresh your memory. THIS IS BY NO MEANS A COMPREHENSIVE LIST.
Glossary (will be redefined within equations but important to know & understand these at the start)
Δ : change; the "big version" of the derivative d
U (sometimes E) : internal energy
Q : heat, a form of energy
W : work, a form of energy
H : enthalpy; the total heat of the system. It is different from Q in that it accounts for any heat lost as work in a system
S : entropy. generally we say it is the "disorder"/randomness
G : Gibbs free energy
F : Helmholtz free energy
Equations (we will go over these individually, but I find it convenient to have them listed at the start so we know what we're going over; plus it gives me a guideline for how I can connect these and talk about them)
ΔU = ΔQ + ΔW
ΔU = mcΔT + PΔV
ΔH = ΔU + Δ(PV) ****differs from PΔV***** (not discussed here, but note ΔH = ΔH(products) - ΔH(reactants) as well, when looking for heats of formation, etc)
ΔS = Δq/T
F = U - TS
G = U - TS + PV ***note on entropy S in statistical mechanics****
the real stuff
ΔU = ΔQ + ΔW
about: the first law of thermodynamics. it's another form of conservation of energy. It says that the total energy in an isolated system (((link to a page on isolated system))) is constant. theoretically, should you find Q and W, you know the energy of that system! of course this is unrealistic, but the basis of thermodynamics starts here.
the equation, but in words: the total energy in this system (assumed isolated) is equal to the heat energy put into (+) or released from the system (-) PLUS the work energy exerted on the system (+) or the work that the system did (-).
sign convention: depending on your book or course, you may see ΔU = ΔQ - ΔW. i'm not entirely sure why, but for me, keeping things consistent makes sense. so I like ΔU = ΔQ + ΔW, since a positive ΔQ and ΔW indicate energy being put into the system by heat and work. If you find the other way easier to understand, please work with that. but keep in mind what your teacher says the test will be like!
Is it a state function? Yes
ΔU = mcΔT + PΔV
about: So, this is the most common form of ΔU I think you'll see. Note that it is for a system of gases only. Other equations of state will exist in other studies, but I don't think that's true for very basic thermodynamics. So this equation arises from Q = mcΔT, and W for gases in some sort of changeable volume (like a container with a piston) is PΔV.
Sign convention: -PΔV is the normal convention. mcΔT is the normal convention.
the equation, but in words:
Is it a state function?: It's just another form of the above, so yes
ΔH = ΔU + Δ(PV)
A related concept to internal energy, so I'll go into it here rather than in the second law.
About: This is mostly relevant for gases. In liquids/solids, ΔU and ΔH are usually quite close because there tends to be no or very small change in volume for stuff in those states. Gases, however, can expand/compress; and in doing so, the pressure the gas exerts on the environment or has exerted on it would change. The volume of a gas expanding or compressing will also change in those cases. Ideally, we would have no escape of gas and a system that is isochoric and isobaric; however, for the most part, that's untrue. The Δ(PV) is expanded out into PΔV + ΔPV. **IT DETERMINES EXO/ENDOTHERMICITY**
the equation, but in words: since most reactions are done in open containers (i.e., not isochoric or isobaric), internal energy is not the total energy of the system; you must account for any energy lost as work. i.e., gas pushing (pressure) against a container is work; the gas changing volume is work, etc.
Sign convention: None besides the ones associated with ΔU
Is it a state function? Yes
ALSO ΔH = H_final − H_initial (since H is a state function); this is what makes Hess's Law work (see notes below)
Consider: water boiling in an open container. The energy needed for water molecules to escape the surface of the liquid and overcome the condensation/evaporation equilibrium is more than the internal energy change alone accounts for; the expanding vapor also does PΔV work pushing back the atmosphere, so ΔH is larger than ΔU.
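a quick worked number for that boiling example (my own illustration; the ΔH_vap value is approximate):

```python
# when 1 mol of liquid water becomes vapor at 373 K and 1 atm, the vapor
# must push back the atmosphere, so dH - dU ~ dn_gas * R * T

R, T = 8.314, 373.15
dH_vap = 40_700.0       # J/mol, approx. enthalpy of vaporization of water
pv_work = 1 * R * T     # ~3.1 kJ/mol of the input goes to pushing the air away
dU_vap = dH_vap - pv_work
print(f"dH = {dH_vap/1000:.1f} kJ, PV work = {pv_work/1000:.1f} kJ, dU = {dU_vap/1000:.1f} kJ")
```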
ΔS = q_rev/T (for a reversible process); for an irreversible process, ΔS > q/T (the Clausius inequality). Combined with the first law, this can also be written as ΔU ≤ TΔS − PΔV.

about: entropy has to do with the randomness of a system; the tendency of things to even out and spread apart. i.e., something hot will try to disperse that heat to something less hot; molecules tend to spread away from each other. It adds some more "rules" about the energy in a system: the entropy of an isolated system will stay the same or tend to get bigger. Important to note that the equality ΔS = q_rev/T only holds for a reversible process.
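a textbook-style python sketch of ΔS = q_rev/T, using ice melting at its melting point (the numbers are approximate and chosen by me):

```python
# melting at the equilibrium melting point is (nearly) reversible,
# so dS = q_rev / T = dH_fus / T_melt

dH_fus = 6_010.0   # J/mol, approx. enthalpy of fusion of water
T_melt = 273.15    # K

dS = dH_fus / T_melt
print(f"dS = {dS:.1f} J/(mol*K)")  # about +22: solid -> liquid gains disorder
```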
ΔG = ΔH − TΔS

gibbs free energy is a relationship that describes whether a chemical reaction is predicted to be spontaneous or not. A negative ΔG indicates it is spontaneous in the forward direction as written. A zero ΔG indicates equilibrium; i.e., no driving force forwards or backwards. The sign of ΔG becomes effectively temperature independent when ΔS is a very small value (in either the positive or negative direction); this is because the product of the temperature and the entropy change would then be small compared to the enthalpy change.
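here's a small python sketch of ΔG = ΔH − TΔS and the crossover temperature T = ΔH/ΔS where the sign flips (the reaction values are invented for illustration):

```python
def gibbs(dH, dS, T):
    """dH in J/mol, dS in J/(mol*K), T in K -> dG in J/mol."""
    return dH - T * dS

dH, dS = 44_000.0, 119.0   # made-up endothermic reaction with positive dS
for T in (298.0, 380.0, 450.0):
    dG = gibbs(dH, dS, T)
    print(f"T = {T:.0f} K: dG = {dG/1000:+.1f} kJ/mol "
          f"({'spontaneous' if dG < 0 else 'non-spontaneous'})")
print(f"crossover near T = {dH/dS:.0f} K")  # the dG = 0 boundary, ~370 K here
```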
some terms:
spontaneity tells you whether a reaction is thermodynamically favored to occur. IT DOES NOT DETERMINE RATE OR EXO/ENDOTHERMICITY
spontaneous reactions mean that the products are lower in free energy than the reactants
exo/endothermicity is whether a reaction gives off heat to the environment or takes heat from it
memorization list (once you understand the concepts, this list is what you need to know at a glance. I don't advocate for memorization EVER, but if you're shaky on a topic and have these memorized, you can sometimes reason your way back into understanding it)
a (-) ΔU indicates the system lost energy/did work on the environment; something cooled off, or the container expanded
a (+) ΔU indicates the system got energy from the environment (heat put in by a bunsen burner, or you squeezed the container)
(-) ΔQ is heat lost from the system to the environment
(+) ΔQ is heat gained by the system from the environment
(-) ΔW is work done by the system on the environment
(+) ΔW is work done to the system by the environment
a (-) ΔH indicates an exothermic reaction; the system released heat (think of it as the system put out/lost energy as heat)
a (+) ΔH indicates an endothermic reaction; the system had to take in heat to occur
a (-) ΔS indicates a loss of entropy (more ordered)
a (+) ΔS indicates an increase in entropy (more chaos)
a (-) ΔG indicates a spontaneous forward reaction, as the equation is written
a (+) ΔG indicates a nonspontaneous forward reaction, although the reverse *may* be spontaneous
ΔG = 0 indicates equilibrium
when ΔH is (+) and ΔS is (+), ΔG could be either (+) or (-) depending on temperature, and is spontaneous at high temperature
when ΔH is (-) and ΔS is (+), ΔG is always (-), and always spontaneous
when ΔH is (-) and ΔS is (-), ΔG could be either (+) or (-) depending on temperature, and is spontaneous at low temperature
when ΔH is (+) and ΔS is (-), ΔG is always (+), and is never spontaneous (these four cases are sketched in code right after this list)
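the sign table above, as a tiny python function (the function and wording are my own):

```python
def spontaneity(dH, dS):
    """dH, dS: only the signs matter here. returns a description of dG."""
    if dH < 0 and dS > 0:
        return "always spontaneous (dG always negative)"
    if dH > 0 and dS < 0:
        return "never spontaneous (dG always positive)"
    if dH > 0 and dS > 0:
        return "spontaneous only at high T (above T = dH/dS)"
    return "spontaneous only at low T (below T = dH/dS)"

print(spontaneity(dH=+44_000, dS=+119))   # high-T spontaneous
print(spontaneity(dH=-57_000, dS=-176))   # low-T spontaneous
```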
Discuss later: Hess's Law (of constant heat summation): the enthalpy change of any reaction is equal to the sum of the enthalpy changes of a series of reactions into which the overall reaction may be divided (because enthalpy is a state function). Related bond-enthalpy estimate: ΔH ≈ Σ H(bonds broken) − Σ H(bonds formed) *** note this runs broken − formed, NOT Hfinal − Hinitial, because bond enthalpies are the energies required to BREAK bonds ***
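a quick python sketch of Hess's law using the classic carbon-combustion example (values are approximate, in kJ/mol):

```python
# summing the step enthalpies gives the overall dH, since H is a state function:
#   C + 1/2 O2 -> CO     dH1 = -110.5
#   CO + 1/2 O2 -> CO2   dH2 = -283.0
# overall: C + O2 -> CO2

steps = [-110.5, -283.0]
dH_overall = sum(steps)
print(f"dH(C + O2 -> CO2) = {dH_overall:.1f} kJ/mol")  # -393.5, matching tables
```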
thermodynamic potentials and their relationships [x]