<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Jiajun Yao]]></title>
<link href="http://blog.jjyao.me/atom.xml" rel="self"/>
<link href="http://blog.jjyao.me/"/>
<updated>2025-12-16T10:47:52-08:00</updated>
<id>http://blog.jjyao.me/</id>
<author>
<name><![CDATA[jjyao]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Demystify Smaps]]></title>
<link href="http://blog.jjyao.me/blog/2025/12/14/demystify-smaps/"/>
<updated>2025-12-14T22:19:03-08:00</updated>
<id>http://blog.jjyao.me/blog/2025/12/14/demystify-smaps</id>
<content type="html"><![CDATA[<p>This post demystifies the Linux <code>/proc/<pid>/smaps</code> file and its <code>Shared_Clean</code>, <code>Shared_Dirty</code>, <code>Private_Clean</code> and <code>Private_Dirty</code> fields.</p>
<!-- more -->
<h2>What is smaps</h2>
<p>The smaps file in Linux provides detailed information for each of the process’s VMAs (Virtual Memory Areas), represented by <a href="https://github.com/torvalds/linux/blob/v5.15/include/linux/mm_types.h#L319">struct vm_area_struct</a>. The content of smaps file of a process is constructed by <a href="https://github.com/torvalds/linux/blob/v5.15/fs/proc/task_mmu.c#L955">iterating</a> through each VMA and calling <a href="https://github.com/torvalds/linux/blob/v5.15/fs/proc/task_mmu.c#L811">show_smap</a>. The <code>Shared_Clean</code>, <code>Shared_Dirty</code>, <code>Private_Clean</code> and <code>Private_Dirty</code> fields are printed <a href="https://github.com/torvalds/linux/blob/v5.15/fs/proc/task_mmu.c#L790">here</a> and calculated <a href="https://github.com/torvalds/linux/blob/v5.15/fs/proc/task_mmu.c#L403">here</a>.</p>
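<p>To make the four fields concrete, here is a small Python sketch (my own illustration, not kernel code) that parses a smaps-style excerpt and sums the fields per VMA; the sample text is a hypothetical excerpt in the <code>/proc/<pid>/smaps</code> format.</p>

```python
# Sketch: sum the Shared/Private x Clean/Dirty fields from smaps-style text.
# SAMPLE is a made-up excerpt in the /proc/<pid>/smaps format.
SAMPLE = """\
7f418625d000-7f41a625d000 rw-s 00000000 103:01 132    /tmp/smaps
Size:             524288 kB
Rss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        64 kB
Private_Dirty:         0 kB
"""

FIELDS = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")

def parse_smaps(text):
    """Return {field: total kB} summed over every VMA in the text."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in FIELDS:
            totals[key] += int(rest.split()[0])  # values are always in kB
    return totals

totals = parse_smaps(SAMPLE)
print(totals)  # all 64 kB resident is Private_Clean in this excerpt
```

<p>Summing these four fields for a VMA gives its Rss, which is a handy consistency check when reading real smaps output.</p>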
<h2>Shared vs Private</h2>
<p>From the perspective of smaps, whether a page is considered shared or private is determined by its <a href="https://github.com/torvalds/linux/blob/v5.15/fs/proc/task_mmu.c#L467C18-L467C31">page_mapcount</a>, which is essentially <a href="https://github.com/torvalds/linux/blob/v5.15/include/linux/mm.h#L877">page->_mapcount</a>, NOT by the <code>MAP_SHARED</code> or <code>MAP_PRIVATE</code> flags passed to <a href="https://man7.org/linux/man-pages/man2/mmap.2.html">mmap</a>. The <a href="https://github.com/torvalds/linux/blob/v5.15/include/linux/mm_types.h#L201">_mapcount</a> of a page is a counter that tracks how many page table entries (PTEs) point to the physical page across ALL processes. A page is considered shared if at least 2 PTEs point to it.</p>
<p>Let’s verify this with different cases:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>dd <span class="k">if</span><span class="o">=</span>/dev/zero <span class="nv">of</span><span class="o">=</span>/tmp/smaps <span class="nv">bs</span><span class="o">=</span>1M <span class="nv">count</span><span class="o">=</span>1024
</span></code></pre></td></tr></table></div></figure>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp">#include <stdio.h></span>
</span><span class='line'><span class="cp">#include <stdlib.h></span>
</span><span class='line'><span class="cp">#include <sys/mman.h></span>
</span><span class='line'><span class="cp">#include <sys/stat.h></span>
</span><span class='line'><span class="cp">#include <fcntl.h></span>
</span><span class='line'><span class="cp">#include <unistd.h></span>
</span><span class='line'><span class="cp">#include <string.h></span>
</span><span class='line'>
</span><span class='line'><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
</span><span class='line'> <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/tmp/smaps"</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
</span><span class='line'> <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="n">perror</span><span class="p">(</span><span class="s">"Failed to open"</span><span class="p">);</span>
</span><span class='line'> <span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'>
</span><span class='line'> <span class="k">const</span> <span class="kt">int</span> <span class="n">num_mmaps</span> <span class="o">=</span> <span class="n">atoi</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
</span><span class='line'> <span class="k">const</span> <span class="kt">bool</span> <span class="n">map_shared</span> <span class="o">=</span> <span class="n">strcmp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="s">"MAP_SHARED"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'> <span class="k">const</span> <span class="kt">bool</span> <span class="n">map_populate</span> <span class="o">=</span> <span class="n">strcmp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="s">"MAP_POPULATE"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'> <span class="n">printf</span><span class="p">(</span><span class="s">"Command line args: num_mmaps=%d, map_shared=%d, map_populate=%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
</span><span class='line'> <span class="n">num_mmaps</span><span class="p">,</span> <span class="n">map_shared</span><span class="p">,</span> <span class="n">map_populate</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'> <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">num_mmaps</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="kt">void</span> <span class="o">*</span><span class="n">map</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="mi">512</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
</span><span class='line'> <span class="p">(</span><span class="n">map_shared</span> <span class="o">?</span> <span class="nl">MAP_SHARED</span> <span class="p">:</span> <span class="n">MAP_PRIVATE</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">map_populate</span> <span class="o">?</span> <span class="nl">MAP_POPULATE</span> <span class="p">:</span> <span class="mi">0</span><span class="p">),</span>
</span><span class='line'> <span class="n">fd</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'> <span class="k">if</span> <span class="p">(</span><span class="n">map</span> <span class="o">==</span> <span class="n">MAP_FAILED</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="n">perror</span><span class="p">(</span><span class="s">"Failed to mmap"</span><span class="p">);</span>
</span><span class='line'> <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
</span><span class='line'> <span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'>
</span><span class='line'> <span class="kt">int</span> <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span> <span class="n">map</span><span class="p">;</span>
</span><span class='line'> <span class="n">printf</span><span class="p">(</span><span class="s">"data = %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'>
</span><span class='line'> <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="n">sleep</span><span class="p">(</span><span class="mi">2000000</span><span class="p">);</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
<p>This test program mmaps 512 MB of the 1 GB file in various ways, controlled by its command-line arguments.</p>
<h3>Single MAP_SHARED mmap</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>./test <span class="m">1</span> MAP_SHARED NO_MAP_POPULATE
</span></code></pre></td></tr></table></div></figure>
<p>Content of smaps:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>7f418625d000-7f41a625d000 rw-s <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">64</span> kB
</span><span class='line'>Shared_Clean: <span class="m">0</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">64</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span><span class='line'>...
</span></code></pre></td></tr></table></div></figure>
<p>Several things to note here:</p>
<ul>
<li>The mmaped pages are considered private even though we use <code>MAP_SHARED</code>, because each physical page has only 1 PTE pointing to it.</li>
<li>mmap is lazy and doesn’t populate the page table unless MAP_POPULATE is specified, which is why we don’t see 524288 kB in Private_Clean.</li>
<li>We see 64 kB rather than just 4 kB in Private_Clean even though we only read a single integer from the file; that’s due to <a href="https://github.com/torvalds/linux/blob/v5.15/mm/memory.c#L4107">fault-around</a>.</li>
</ul>
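<p>The 64 kB number lines up with the kernel’s default fault-around window (<code>fault_around_bytes</code> defaults to 65536 bytes; treating that as the default here is an assumption, and it is tunable via debugfs), as a quick back-of-the-envelope check shows:</p>

```python
# One read fault populates a whole fault-around window, not just a single page.
PAGE_SIZE = 4096                 # bytes
FAULT_AROUND_BYTES = 65536       # assumed kernel default, tunable via debugfs

pages_per_fault = FAULT_AROUND_BYTES // PAGE_SIZE
populated_kb = pages_per_fault * PAGE_SIZE // 1024
print(pages_per_fault, populated_kb)  # 16 pages -> 64 kB, matching Private_Clean
```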
<h3>Multiple MAP_SHARED mmaps in single process</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>./test <span class="m">2</span> MAP_SHARED NO_MAP_POPULATE
</span></code></pre></td></tr></table></div></figure>
<p>Content of smaps:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>7f764be9a000-7f766be9a000 rw-s <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">32</span> kB
</span><span class='line'>Shared_Clean: <span class="m">64</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span><span class='line'>...
</span><span class='line'>7f766be9a000-7f768be9a000 rw-s <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">32</span> kB
</span><span class='line'>Shared_Clean: <span class="m">64</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span><span class='line'>...
</span></code></pre></td></tr></table></div></figure>
<p>In this case, we mmap the same file twice in the same process, so each page now has 2 PTEs pointing to it and is therefore considered shared.</p>
<h3>Multiple MAP_SHARED mmaps in multiple processes</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>./test <span class="m">1</span> MAP_SHARED NO_MAP_POPULATE
</span></code></pre></td></tr></table></div></figure>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>./test <span class="m">1</span> MAP_SHARED NO_MAP_POPULATE
</span></code></pre></td></tr></table></div></figure>
<p>Content of smaps:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>7fb76d49e000-7fb78d49e000 rw-s <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">32</span> kB
</span><span class='line'>Shared_Clean: <span class="m">64</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span></code></pre></td></tr></table></div></figure>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>7f4e1eb61000-7f4e3eb61000 rw-s <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">32</span> kB
</span><span class='line'>Shared_Clean: <span class="m">64</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span></code></pre></td></tr></table></div></figure>
<p>In this case, we mmap the same file once in each of two processes, so each page again has 2 PTEs pointing to it and is considered shared.</p>
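<p>The Pss numbers follow directly from the mapcounts: each resident page is charged at its size divided by the number of PTEs mapping it. A minimal sketch of that proportional accounting (my own illustration, not kernel code):</p>

```python
# Pss charges each resident page at page_size / mapcount, so summing Pss
# across all mappers of a page adds up to its full size exactly once.
def pss_kb(mapcounts, page_kb=4):
    """mapcounts: one entry per resident page, giving its number of PTEs."""
    return sum(page_kb / m for m in mapcounts)

# 16 resident pages (64 kB), each mapped by 2 PTEs as in the two-process case:
print(pss_kb([2] * 16))  # 32.0 kB of Pss per mapping, while Rss stays 64 kB
```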
<h3>Multiple MAP_PRIVATE mmaps in single process</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>./test <span class="m">2</span> MAP_PRIVATE NO_MAP_POPULATE
</span></code></pre></td></tr></table></div></figure>
<p>Content of smaps:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>7f726c909000-7f728c909000 rw-p <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">32</span> kB
</span><span class='line'>Shared_Clean: <span class="m">64</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span><span class='line'>...
</span><span class='line'>7f728c909000-7f72ac909000 rw-p <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">64</span> kB
</span><span class='line'>Pss: <span class="m">32</span> kB
</span><span class='line'>Shared_Clean: <span class="m">64</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">0</span> kB
</span><span class='line'>...
</span></code></pre></td></tr></table></div></figure>
<p>Since MAP_PRIVATE is copy-on-write (COW) and we only read from the file, the two VMAs are backed by the same physical pages, which is why those pages are considered shared.</p>
<h3>Multiple MAP_PRIVATE mmaps with MAP_POPULATE in single process</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>./test <span class="m">2</span> MAP_PRIVATE MAP_POPULATE
</span></code></pre></td></tr></table></div></figure>
<p>Content of smaps:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>7fafbaabe000-7fafdaabe000 rw-p <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">524288</span> kB
</span><span class='line'>Pss: <span class="m">524288</span> kB
</span><span class='line'>Shared_Clean: <span class="m">0</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">524288</span> kB
</span><span class='line'>...
</span><span class='line'>7fafdaabe000-7faffaabe000 rw-p <span class="m">00000000</span> 103:01 <span class="m">132</span> /tmp/smaps
</span><span class='line'>Size: <span class="m">524288</span> kB
</span><span class='line'>KernelPageSize: <span class="m">4</span> kB
</span><span class='line'>MMUPageSize: <span class="m">4</span> kB
</span><span class='line'>Rss: <span class="m">524288</span> kB
</span><span class='line'>Pss: <span class="m">524288</span> kB
</span><span class='line'>Shared_Clean: <span class="m">0</span> kB
</span><span class='line'>Shared_Dirty: <span class="m">0</span> kB
</span><span class='line'>Private_Clean: <span class="m">0</span> kB
</span><span class='line'>Private_Dirty: <span class="m">524288</span> kB
</span><span class='line'>...
</span></code></pre></td></tr></table></div></figure>
<p>If the mapping is PROT_WRITE, MAP_PRIVATE and MAP_POPULATE, the kernel will not only populate the page tables but also <a href="https://github.com/torvalds/linux/blob/v5.15/mm/gup.c#L1477">eagerly trigger COW</a>, so each VMA is backed by its own private physical pages.</p>
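<p>A quick tally of what this eager COW costs: every mapping gets its own copies, so resident memory scales with the number of mappings instead of being shared.</p>

```python
# With PROT_WRITE + MAP_PRIVATE + MAP_POPULATE, each mmap of the region is
# backed by its own private pages, as the Private_Dirty numbers above show.
MAPPING_KB = 524288              # 512 MB per mmap, matching the smaps output
num_mmaps = 2

total_kb = num_mmaps * MAPPING_KB
print(total_kb // 1024, "MB resident")  # 1024 MB, vs 512 MB if pages were shared
```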
<h2>Clean vs Dirty</h2>
<p>From the perspective of smaps, whether a page is considered clean or dirty is determined by <a href="https://github.com/torvalds/linux/blob/v5.15/fs/proc/task_mmu.c#L419">this check</a>. Conceptually, a page is dirty if it cannot simply be discarded without data loss, i.e., it has modifications that haven’t been flushed to the underlying file or swap space yet.</p>
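<p>The lifecycle is easy to see with Python’s <code>mmap</code> module: a store into a file-backed shared mapping dirties the page, and an msync (<code>mmap.flush</code>) writes it back to the file, after which the page can be treated as clean. A small self-contained sketch (my own, not from the kernel sources):</p>

```python
import mmap
import os
import tempfile

# Create a one-page file and map it shared and writable.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE)   # shared, writable mapping by default

m[:5] = b"dirty"   # store through the mapping: the page is now dirty
m.flush()          # msync: write the page back; it can then be discarded safely

with open(path, "rb") as f:
    data = f.read(5)
print(data)        # the store is visible in the file after the flush

m.close()
os.close(fd)
os.unlink(path)
```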
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[LLM Learning Resources]]></title>
<link href="http://blog.jjyao.me/blog/2025/10/16/llm-learning-resources/"/>
<updated>2025-10-16T21:54:04-07:00</updated>
<id>http://blog.jjyao.me/blog/2025/10/16/llm-learning-resources</id>
<content type="html"><![CDATA[<p>This post lists resources that I find useful during my journey of learning LLM as a system enginer.</p>
<!-- more -->
<h2>Neural Networks</h2>
<p>LLMs are large neural networks so having a basic understanding of what neural networks are is helpful.</p>
<h4>Victor Zhou’s Neural Networks From Scratch</h4>
<p><a href="https://victorzhou.com/series/neural-networks-from-scratch/">Neural Networks From Scratch</a> is a 4-post series
that introduces classic neural networks, recurrent neural networks (RNNs) and convolutional neural networks (CNNs).
It doesn’t require any prior knowledge except for some math. One good thing about this series is that it’s very hands-on: you will learn, step by step, how to write a simple neural network from scratch, using only numpy, to solve a real problem.</p>
<h4>Michael Nielsen’s Visual proof that neural nets can compute any function</h4>
<p><a href="http://neuralnetworksanddeeplearning.com/chap4.html">Visual proof that neural nets can compute any function</a> gives a visual explanation of the <a href="https://en.wikipedia.org/wiki/Universal_approximation_theorem">universal approximation theorem</a> that neural networks can be used to approximate any continuous function to any desired precision.</p>
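The core construction of that proof — pairs of steep sigmoids forming “towers” whose heights trace the target function — can be sketched in a few lines of numpy. The function names here are mine, not Nielsen’s:

```python
import numpy as np

def sigmoid(z):
    # Numerically safe sigmoid (clip to avoid overflow in exp).
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def tower(x, a, b, height, steepness=10000.0):
    # Two steep sigmoids make an approximate "tower" of the given height
    # on [a, b): a step up at a minus a step up at b.
    return height * (sigmoid(steepness * (x - a)) - sigmoid(steepness * (x - b)))

def approx(f, x, n_towers=100):
    # Approximate f on [0, 1) with a piecewise-constant sum of towers,
    # i.e. a single hidden layer of 2 * n_towers sigmoid neurons.
    edges = np.linspace(0.0, 1.0, n_towers + 1)
    y = np.zeros_like(x)
    for a, b in zip(edges[:-1], edges[1:]):
        y += tower(x, a, b, f((a + b) / 2))
    return y

x = np.linspace(0.0, 0.999, 1000)
err = np.max(np.abs(approx(lambda t: t * t, x) - x * x))
print(f"max error with 100 towers: {err:.4f}")
```

Increasing `n_towers` shrinks the error toward zero, which is exactly the theorem’s claim: enough hidden neurons approximate any continuous function to any desired precision.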
<h2>LLM</h2>
<h4>Andrej Karpathy’s Neural Networks: Zero to Hero</h4>
<p><a href="https://karpathy.ai/zero-to-hero.html">Neural Networks: Zero to Hero</a> is a hands-on course that starts with building a simple neural network from scratch and ends with building a GPT from scratch. It’s a great course to learn LLM progressively.</p>
<h3>Inference</h3>
<p>IMO the best way to learn LLM inference is to write an inference engine from scratch; doing so also helps you better understand model architectures.</p>
<h4>Andrej Karpathy’s llama2.c</h4>
<p><a href="https://github.com/karpathy/llama2.c">llama2.c</a> is an inference engine for Llama 2 in one file of pure C. It can give you a taste of what LLM inference looks like. Based on it, I wrote a <a href="https://github.com/jjyao/llm.edu/tree/main/llama2.rs">Rust implementation</a> with optimizations like tensor parallelism.</p>
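To give a flavor of the structure such an engine has, here is a toy autoregressive decoding loop. The transformer forward pass is replaced by a hypothetical bigram lookup table, so this illustrates only the shape of the loop, not llama2.c’s actual code:

```python
# A toy autoregressive decoding loop: generate one token at a time,
# feeding each output back in as the next input. A real engine replaces
# the lookup below with a transformer forward pass over the KV cache.
BIGRAM = {  # hypothetical "model": token -> most likely next token
    "<s>": "the", "the": "cat", "cat": "sat", "sat": "down", "down": "</s>",
}

def generate(prompt_token="<s>", max_tokens=10):
    tokens = [prompt_token]
    for _ in range(max_tokens):
        nxt = BIGRAM.get(tokens[-1], "</s>")  # "forward pass" + greedy argmax
        if nxt == "</s>":                     # stop at end-of-sequence
            break
        tokens.append(nxt)
    return tokens

print(generate())  # ['<s>', 'the', 'cat', 'sat', 'down']
```

Everything interesting in a real engine — attention, the KV cache, sampling strategies, tensor parallelism — lives inside that one “forward pass” line, which is why writing it out yourself is so instructive.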
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[High Output Management]]></title>
<link href="http://blog.jjyao.me/blog/2025/08/17/high-output-management/"/>
<updated>2025-08-17T21:29:32-07:00</updated>
<id>http://blog.jjyao.me/blog/2025/08/17/high-output-management</id>
<content type="html"><![CDATA[<p>The output of a manager is the output of the organizational units under his or her supervision or influence. This book talks about how to increase the managerial output.</p>
<!-- more -->
<h2>Managerial Leverage</h2>
<p>Managerial output = Output of organization = A1 * L1 + A2 * L2 + … where A is a managerial activity and L is the corresponding leverage. Managerial activities include information-gathering, information-giving, decision-making, nudging, and being a role model. Leverage, by definition, is the measure of the output generated by any given managerial activity. Based on the formula, a manager can increase his output by:</p>
<ol>
<li>Increasing the rate at which a manager performs his activities, speeding up his work.</li>
<li>Increasing the leverage associated with the various managerial activities.</li>
<li>Shifting the mix of a manager’s activities from those with lower to those with higher leverage.</li>
</ol>
<p>The art of management lies in the capacity to select from the many activities of seemingly comparable significance the one or two or three that provide leverage well beyond the others and concentrate on them. This is how we should allocate our time —— the single most important resource that we allocate from one day to the next. Given time is a finite resource, and when we say “yes” to one thing we are inevitably saying “no” to another, we should be very selective about things we choose to do.</p>
<h2>Meetings —— The Medium of Managerial Work</h2>
<p>There are two kinds of meetings. The first is the <strong>process-oriented</strong> meeting, in which knowledge is shared and information is exchanged. Such meetings take place on a regularly scheduled basis. The second is the <strong>mission-oriented</strong> meeting, whose purpose is to solve a specific problem and produce a decision. These are ad hoc affairs, not scheduled long in advance, because they usually can’t be.</p>
<p>One-on-one is a process-oriented meeting. It should be regarded as the subordinate’s meeting, with its agenda and tone set by him.</p>
<p>For a mission-oriented meeting, the purpose should be decided beforehand, and it’s the meeting chairman’s job to make sure the meeting accomplishes the purpose for which it was called. Since the goal of this type of meeting is to make a decision, we should control the number of attendees. A decision-making meeting is hard to keep moving if more than six or seven people attend. Eight people should be the absolute cutoff. Decision-making is not a spectator sport, because onlookers get in the way of what needs to be done. Once the meeting is over, the chairman must nail down exactly what happened by sending out minutes that summarize the discussion that occurred, the decision made, and the actions to be taken. And it’s very important that attendees get the minutes quickly, before they forget what happened. The minutes should also be as clear and as specific as possible, telling the reader what is to be done, who is to do it, and when.</p>
<h2>Decisions, Decisions</h2>
<p>An organization does not live by its members agreeing with one another at all times about everything. It lives instead by people committing to support the decisions and the moves of the business.</p>
<p><img src="http://blog.jjyao.me/images/post/high-output-management/ideal-decision-making-process.png"></p>
<p>The ideal decision-making process involves three stages. The first stage should be <strong>free</strong> discussion, in which all points of view and all aspects of an issue are openly welcomed and debated. The greater the disagreement and controversy, the more important becomes the word free. The next stage is reaching a <strong>clear</strong> decision. Again, the greater the disagreement about the issue, the more important becomes the word clear. Finally, everyone involved must give the decision reached by the group <strong>full</strong> support.</p>
<p>The person who makes the decision is the DRI, and everyone else needs to <strong>disagree and commit</strong> once the decision is made. The DRI should have the confidence to make the decision especially when consensus is not reached, and that self-confidence mostly comes from a gut-level realization that nobody has ever died from making a wrong business decision, or taking inappropriate action, or being overruled (most decisions are two-way doors).</p>
<p>We shouldn’t enter the decision-making stage too early or wait too long. If we make the decision too early, we may miss some opinions. If we make the decision too late (aiming for the absolutely right decision), we are not moving fast enough. Often, it’s better to try out a 60-point decision and iterate on it quickly based on feedback than to search endlessly for a 100-point decision.</p>
<p>For decision-making, one of the manager’s key tasks is to settle six important questions in advance:</p>
<ul>
<li>What decision needs to be made?</li>
<li>When does it have to be made?</li>
<li>Who will decide?</li>
<li>Who will need to be consulted prior to making the decision?</li>
<li>Who will ratify or veto the decision?</li>
<li>Who will need to be informed of the decision?</li>
</ul>
<h2>Planning: Today’s Actions for Tomorrow’s Output</h2>
<p>Planning is to answer the question: What do I have to do today to solve —— or better, avoid —— tomorrow’s problem? Today’s gap represents a failure of planning sometime in the past.</p>
<p>MBO (management-by-objectives) is a planning process that can be applied to daily work. A successful MBO system needs only to answer two questions:</p>
<ul>
<li>Where do I want to go? (The answer provides the objective).</li>
<li>How will I pace myself to see if I am getting there? (The answer gives us milestones, or key results.)</li>
</ul>
<h2>The Sports Analogy</h2>
<p>The role of the manager is that of the coach. First, an ideal coach takes no personal credit for the success of his team, and because of that his players trust him. Second, he is tough on his team. By being critical, he tries to get the best performance his team members can provide. Third, a good coach was likely a good player himself at one time. And having played the game well, he also understands it well.</p>
<h2>Performance Appraisal: Manager as Judge and Jury</h2>
<p>The performance review is the single most important form of task-relevant feedback we as supervisors can provide. The goal is to improve the subordinate’s performance, not to cleanse our system of all the truths we may have observed about the subordinate, so less may very well be more (give one piece of feedback and one area for improvement at a time).</p>
<p>It might be counterintuitive but we should spend more time trying to improve the performance of our stars. After all, these people account for a disproportionately large share of the work in any organization. Put another way, concentrating on the stars is a high-leverage activity: if they get better, the impact on group output is very great indeed. We must keep in mind, however, that no matter how stellar a person’s performance level is, there is always room for improvement.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Python Heap Dump]]></title>
<link href="http://blog.jjyao.me/blog/2024/09/13/python-heap-dump/"/>
<updated>2024-09-13T21:42:32-07:00</updated>
<id>http://blog.jjyao.me/blog/2024/09/13/python-heap-dump</id>
<content type="html"><![CDATA[<p>This post shows how to heap dump a <em>running</em> Python process using <a href="https://pypi.org/project/pyrasite/">pyrasite</a> and <a href="https://pypi.org/project/guppy3/">guppy3</a>.</p>
<!-- more -->
<h3>Install pyrasite</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>pip install pyrasite
</span></code></pre></td></tr></table></div></figure>
<p><code>pyrasite</code> allows you to attach to a running Python process and run arbitrary Python code. It needs ptrace to function properly and the way to enable it varies depending on your OS. For Ubuntu, you can do:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>sudo sysctl -w kernel.yama.ptrace_scope<span class="o">=</span>0
</span></code></pre></td></tr></table></div></figure>
<p>If you use <code>Conda</code>, you might need to run <code>unset LD_LIBRARY_PATH</code> so that <code>gdb</code> can use the system <code>libstdc++.so</code> instead of the one installed inside your conda env which can be incompatible.</p>
<h3>Install guppy3</h3>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>pip install guppy3
</span></code></pre></td></tr></table></div></figure>
<p><code>guppy3</code> has a subpackage <code>heapy</code> that allows you to inspect the heap.</p>
<h3>Dump Heap</h3>
<p>Once everything is installed, we can then use pyrasite to attach to the target running Python process:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>pyrasite-shell <pid>
</span></code></pre></td></tr></table></div></figure>
<p>This attaches to the process and opens a REPL that you can run the heap dump code using <code>guppy3</code>:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="kn">from</span> <span class="nn">guppy</span> <span class="kn">import</span> <span class="n">hpy</span>
</span><span class='line'><span class="n">h</span> <span class="o">=</span> <span class="n">hpy</span><span class="p">()</span>
</span><span class='line'><span class="n">heap</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="n">heap</span><span class="p">()</span>
</span><span class='line'>
</span><span class='line'><span class="k">print</span><span class="p">(</span><span class="n">heap</span><span class="o">.</span><span class="n">all</span><span class="p">)</span> <span class="c"># print the heap dump</span>
</span><span class='line'><span class="sd">"""</span>
</span><span class='line'><span class="sd">Partition of a set of 222038 objects. Total size = 26540189 bytes.</span>
</span><span class='line'><span class="sd"> Index Count % Size % Cumulative % Kind (class / dict of class)</span>
</span><span class='line'><span class="sd"> 0 67095 30 6591256 25 6591256 25 str</span>
</span><span class='line'><span class="sd"> 1 50143 23 3623400 14 10214656 38 tuple</span>
</span><span class='line'><span class="sd"> 2 14415 6 2563320 10 12777976 48 types.CodeType</span>
</span><span class='line'><span class="sd"> 3 28124 13 2205321 8 14983297 56 bytes</span>
</span><span class='line'><span class="sd"> 4 2175 1 2064360 8 17047657 64 type</span>
</span><span class='line'><span class="sd"> 5 4816 2 1962952 7 19010609 72 dict (no owner)</span>
</span><span class='line'><span class="sd"> 6 13577 6 1846472 7 20857081 79 function</span>
</span><span class='line'><span class="sd"> 7 2175 1 1087024 4 21944105 83 dict of type</span>
</span><span class='line'><span class="sd"> 8 690 0 1019024 4 22963129 87 dict of module</span>
</span><span class='line'><span class="sd"> 9 10472 5 312536 1 23275665 88 int</span>
</span><span class='line'><span class="sd"> 10 273 0 244608 1 23520273 89 google._upb._message.MessageMeta</span>
</span><span class='line'><span class="sd">"""</span>
</span><span class='line'>
</span><span class='line'><span class="k">print</span><span class="p">(</span><span class="n">heap</span><span class="o">.</span><span class="n">all</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span><span class="o">.</span><span class="n">shpaths</span><span class="p">)</span> <span class="c"># print the shortest path from root to object with index 10</span>
</span><span class='line'><span class="sd">"""</span>
</span><span class='line'><span class="sd"> 0: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['ActorTableData']</span>
</span><span class='line'><span class="sd"> 1: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['AvailableResources']</span>
</span><span class='line'><span class="sd"> 2: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['ErrorTableData']</span>
</span><span class='line'><span class="sd"> 3: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['GcsEntry']</span>
</span><span class='line'><span class="sd"> 4: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['GcsNodeInfo']</span>
</span><span class='line'><span class="sd"> 5: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['JobConfig']</span>
</span><span class='line'><span class="sd"> 6: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['JobTableData']</span>
</span><span class='line'><span class="sd"> 7: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['PlacementGroupTableData']</span>
</span><span class='line'><span class="sd"> 8: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['PubSubMessage']</span>
</span><span class='line'><span class="sd"> 9: hp.Root.i0_modules['ray._private.gcs_utils'].__dict__['ResourceDemand']</span>
</span><span class='line'><span class="sd"><... 271 more paths ...></span>
</span><span class='line'><span class="sd">"""</span>
</span></code></pre></td></tr></table></div></figure>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[The 4 Disciplines of Execution]]></title>
<link href="http://blog.jjyao.me/blog/2023/04/16/the-4-disciplines-of-execution/"/>
<updated>2023-04-16T21:21:10-07:00</updated>
<id>http://blog.jjyao.me/blog/2023/04/16/the-4-disciplines-of-execution</id>
<content type="html"><![CDATA[<p>This book shows how a team can execute an important goal successfully in the face of the massive amount of day-to-day operations that are urgently required to keep the business running.</p>
<!-- more -->
<p>There are lots of urgent things (the whirlwind) to do every day to keep the lights on. Failure to do so can kill the business today. However, there are also important things to do to take the business to the next level. Failure to do those can kill the business tomorrow. As a result, being able to execute important things successfully while doing the day job is critical for the long-term success of the business. To help teams do so, this book presents a methodology called 4DX that consists of four disciplines.</p>
<h2>Discipline 1: Focus on the Wildly Important</h2>
<p>The key to the first discipline is <strong>focus</strong>. Attempting to spread the limited capacity across multiple goals is the most common cause of failure in execution. Instead, <strong>one and only one</strong> wildly important goal (WIG) should be identified and executed.</p>
<p>To identify the WIG, one can ask the following question: If every other aspect of our team’s performance remained at its current level, what is the one area where significant improvement would have the greatest <strong>impact</strong>? The resulting WIG should be specific, clearly measurable and in the form of <strong>From X to Y by when</strong> (e.g. improve the NPS score from 1 to 6 by end of the year).</p>
<h2>Discipline 2: Act on the Lead Measures</h2>
<p>The key to the second discipline is <strong>leverage</strong>. Lag measures are the tracking measurements of the WIG and they tell you whether you achieve the WIG or not. For example, if the WIG is improving the NPS score from 1 to 6 by end of the year, then the lag measure is the NPS score. The issue with a lag measure is that you cannot influence it directly, and it is lagging in the sense that by the time you get the result, it is already too late: the performance that drove it is in the past. In contrast, lead measures measure things that lead to the WIG and can be influenced by us directly. For example, burning 1000 calories every day is a good lead measure for a WIG of losing 20 pounds in 3 months.</p>
<p>A valid lead measure has two basic characteristics: it is <strong>predictive</strong> of achieving the WIG and it is directly <strong>influenceable</strong>. There could be many lead measures for a WIG, and finding the highest-leverage ones is perhaps the toughest and most intriguing challenge for leaders trying to execute a WIG. Finding lead measures requires the involvement of the frontline team members, and that involvement brings engagement and commitment. In the end, it’s the frontline team that acts on the lead measures. A leader can only veto but not dictate what the final lead measures should be. The resulting lead measures are the team’s bet: they bet that by driving these lead measures, the team is going to achieve the WIG.</p>
<p><img src="http://blog.jjyao.me/images/post/the-4-disciplines-of-execution/lead-measure.png"></p>
<h2>Discipline 3: Keep a Compelling Scoreboard</h2>
<p>The key to the third discipline is <strong>engagement</strong>. A compelling player’s scoreboard that clearly shows whether a team is winning or losing drives the highest level of engagement, and that engagement drives the highest level of performance. The fundamental purpose of a player’s scoreboard is to motivate the players to win.</p>
<p>A compelling scoreboard has the following four characteristics:</p>
<ul>
<li><p>It’s simple as a player’s scoreboard in a sporting event, not a more complex coach’s scoreboard.</p></li>
<li><p>It can be seen easily. A visible scoreboard makes sure that the WIG and lead measures are not forgotten in the constant urgency of day-to-day responsibilities.</p></li>
<li><p>It shows both the lead and lag measures. From the scoreboard, the team can see what they are doing (the lead measures) and what they are getting (the lag measures). Once the team sees that the lag measure is moving because of the efforts they have made on the leads, it has a dramatic effect on engagement because they know they are directly impacting the results.</p></li>
<li><p>It tells if a team is winning or losing immediately. If the team can’t quickly determine if they are winning or not by looking at the scoreboard, then it’s not a game, it’s just data.</p></li>
</ul>
<h2>Discipline 4: Create a Cadence of Accountability</h2>
<p>The key to the fourth discipline is <strong>accountability</strong>. This is where execution actually happens. The previous three disciplines set up the game, but until this discipline is applied, the team is not in the game. Without this discipline, there will always be actions the team members know they should perform but never actually do with real consistency due to the whirlwind.</p>
<p>To create a cadence of accountability, a WIG session is held on the same day and at the same time every week with the following agenda:</p>
<ol>
<li><p>Account. Report on last week’s commitments.</p></li>
<li><p>Review the scoreboard. Learn from successes and failures.</p></li>
<li><p>Plan. Clear the path and make new commitments.</p></li>
</ol>
<p>To figure out the <em>high-impact</em> commitments for the week, ask the question “What are the one or two most important things I can do this week to impact the lead measures?” Once the commitments are made from team members, they are <em>unconditional</em> regardless of the whirlwind and the team member is accountable to each other.</p>
<p>Finally, a successful 4DX not only helps the team to achieve the specific WIG but also produces a winning team that performs at the next level and is ready for the next WIG.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[The Design of Everyday Things]]></title>
<link href="http://blog.jjyao.me/blog/2022/06/21/the-design-of-everyday-things/"/>
<updated>2022-06-21T22:40:54-07:00</updated>
<id>http://blog.jjyao.me/blog/2022/06/21/the-design-of-everyday-things</id>
<content type="html"><![CDATA[<p>What’s a good design? How to produce a good design? This book provides some answers.</p>
<!-- more -->
<h2>Human-centered Design</h2>
<p>A good design should be human-centered. It should put human needs, capabilities, and behavior first, then accommodate those needs, capabilities, and ways of behaving. <em>The design should adapt to people, not the opposite</em>.</p>
<p>A good design should be understandable and usable. The design should help users establish a concept model of the system that is an explanation, usually highly simplified, of how the system works. There are seven design principles to achieve a good design:</p>
<ul>
<li>Discoverability: It is possible to determine what actions are possible and the current state of the device.</li>
<li>Feedback: There is full and continuous information about the results of actions and the current state of the product or service. After an action has been executed, it is easy to determine the new state.</li>
<li>Conceptual model: The design projects all the information needed to create a good conceptual model of the system, leading to understanding and a feeling of control. The conceptual model enhances both discoverability and evaluation of results.</li>
<li>Affordances: The proper affordances exist to make the desired actions possible.</li>
<li>Signifiers: Effective use of signifiers ensures discoverability and that the feedback is well communicated and intelligible.</li>
<li>Mappings: The relationship between controls and their actions follows the principles of good mapping, enhanced as much as possible through spatial layout and temporal contiguity.</li>
<li>Constraints: Providing physical, logical, semantic, and cultural constraints guides actions and eases interpretation.</li>
</ul>
<h2>The Double-Diamond Model of Design</h2>
<p>Finding the right problem and finding the right solution are the two components of design and this corresponds to the two phases of the design process. The double-diamond model of design describes these two phases.</p>
<p><img src="http://blog.jjyao.me/images/post/the-design-of-everyday-things/double-diamond-model-of-design.png"></p>
<p>To find the real, ROOT problem to solve, we can use the <em>five whys</em> approach. We should keep asking why until we find the real, fundamental, root problem. Don’t rely on whatever people tell you because they will tell you that they want “a faster horse”. In fact, the author has a somewhat counterintuitive rule for himself: never solve the problem I am asked to solve. Because the problem being asked to solve, invariably, is not the real, fundamental, root problem.</p>
<h2>No Human Error, Only Bad Design</h2>
<p>Human error usually is a result of poor design. Whenever human errors happen, we should think about how we can design better to eliminate those errors or reduce the impact if complete elimination is impossible. This is an important mindset to have: if a system lets you make the error, it’s badly designed.</p>
<p>There are several design practices to prevent human errors:</p>
<ul>
<li>Avoid procedures that have identical opening steps but then diverge.</li>
<li>Ensure that controls and displays for different purposes are significantly different from one another.</li>
<li>Try to avoid modes. If they are necessary, the equipment must make it obvious which mode is invoked.</li>
<li>People will be interrupted during their activities and they may need assistance in resuming their operations.</li>
<li>Don’t count on much being retained in people’s short-term memory. The most effective way of helping people remember is to make it unnecessary.</li>
</ul>
<h2>Successful Product, Successful Design</h2>
<p>The author argues that the design is successful only if the final product is successful —— if people buy it, use it and enjoy it. It doesn’t matter how great the design is if people don’t buy it.</p>
<p>To create a successful product, people should focus on strengths, not weaknesses. If the product has real strengths, it can afford to just be “good enough” in the other areas. People should also focus on the true needs of the people who use the product and ignore competing voices.</p>
<p>To turn a good idea into a successful product requires <em>timing</em>. Good ideas that appear too early will fail even if eventually others introduce them successfully. It takes a long time (may be decades), for good ideas to traverse the distance from conception to successful products. We should <em>ride the wave</em> to make the product more likely to be successful.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Crossing the Chasm]]></title>
<link href="http://blog.jjyao.me/blog/2021/12/27/crossing-the-chasm/"/>
<updated>2021-12-27T20:26:25-08:00</updated>
<id>http://blog.jjyao.me/blog/2021/12/27/crossing-the-chasm</id>
<content type="html"><![CDATA[<p>For a B2B high-tech startup, there is a period when the product is already successful in the early market and is trying to enter the mainstream market. This period is called the crossing-the-chasm period, and it is perilous. Many startups die during this period and never reach the mainstream market. This book tells us why there is such a period and why it’s dangerous. It also shows how we can get through this period and be successful in the mainstream market.</p>
<!-- more -->
<h2>Technology Adoption Life Cycle</h2>
<p><img src="http://blog.jjyao.me/images/post/crossing-the-chasm/technology-adoption-life-cycle.png"></p>
<p>For the technology or product offered by a company, its adoption goes through several phases from left to right.</p>
<h4>Innovators (Technology Enthusiasts)</h4>
<p>This is a group of people who are techies. They just like new technologies for their own sake and are willing to try them. They don’t represent a significant market in themselves, nor do they have enough buying power. However, they are still important to win over because 1) they are the sounding board and test bed for the new technology or product and 2) they are the reference base for early adopters.</p>
<h4>Early Adopters (Visionaries)</h4>
<p>These people treat the new technology or product as an opportunity to make a strategic leap forward. They want fundamental breakthroughs enabled by the new technology and are willing to take high risks due to the potential order-of-magnitude return on investment. These people have big budgets to implement their strategic initiatives, so they are an important source of high-tech development capital.</p>
<h4>Early Majority (Pragmatists)</h4>
<p>These people represent the mainstream market that any startup wants to win due to its volume. Unlike visionaries, they want incremental, predictable improvements instead of disruptive ones. They have the following characteristics: 1) they are loyal once won and will even help you defend against newcomers 2) they want to buy the <em>whole product</em> from the <em>market leader</em> 3) they reference other pragmatists, not visionaries.</p>
<h4>Late Majority (Conservatives)</h4>
<p>They buy extremely mature, low-cost products just to stay on par with the rest of the world. They are the low-margin end of the market but come in high volume.</p>
<h4>Laggards (Skeptics)</h4>
<p>Even though they may not buy products, they do point continually to the discrepancies between the sales claims and the delivered product. Startups should use their feedback to continuously improve the products.</p>
<h2>Word of Mouth</h2>
<p>In the high-tech buying process, word of mouth is the number-one source of information that buyers reference. People reference each other during the buying decision.</p>
<h2>Whole Product</h2>
<p><img src="http://blog.jjyao.me/images/post/crossing-the-chasm/whole-product.png"></p>
<p>In the mainstream market, people want to buy whole product not core product. In other words, they want to buy the product that has a surrounding ecosystem, which radically reduces their burden of support. Since whole products grow up around the market leading products and not around the others, pragmatists buy from market leaders.</p>
<h2>Chasm</h2>
<p>The chasm between visionaries and pragmatists exists because visionaries are a very poor reference base for pragmatists. They are fundamentally different groups of people for the following reasons: 1) visionaries don’t need many references to buy a product; rather, they want to be the first to buy the product and create competitive advantage; pragmatists, on the other hand, need extensive references to prove the validity of the new product 2) visionaries care more about the future (in fact, they are defining the future) while pragmatists don’t put a lot of stake in futuristic things 3) visionaries don’t expect the existence of the whole product and are willing to piece one together themselves in return for getting a jump on their competition, while pragmatists only want to buy the whole product 4) visionaries, successful or not, don’t plan to stick around long, while pragmatists are cautious about their decisions since they know they will have to live with the results. This means that when we try to win over pragmatists, there is effectively no reference base to start with.</p>
<h2>D-Day</h2>
<p>The way to cross the chasm successfully is by launching a D-Day type of invasion focusing on a highly specific target segment within a mainstream marketplace.</p>
<h4>Target the Point of Attack</h4>
<p>The first thing to do is to find the target niche market segment, also called the beachhead segment. The segment should be <em>big enough to matter, small enough to win, and a good fit with your crown jewels</em>. Customers in that segment should have a compelling reason to buy the product, or in other words, feel enough pain. Instead of picking the optimal beachhead, which is very hard if not impossible, what’s more important is winning whatever good-enough beachhead is picked.</p>
<p>It’s very important to pick a very specific beachhead to conquer instead of the entire mainstream market when we are crossing the chasm, for the following reasons: 1) word of mouth has boundaries, usually within a market segment, so winning over one or two customers in each of five or ten different segments is worse than winning four or five customers in one segment in terms of word-of-mouth effect 2) it’s easier to become <em>a big fish in a small pond</em> and achieve market leadership, given that pragmatists want to buy from market leaders 3) our scarce resources are only enough to build the whole product for a single niche market.</p>
<h4>Assemble the Invasion Force</h4>
<p>The next thing is to create the ecosystem around our core product, namely the whole product. This often requires bringing in the partners and allies needed to make it a reality.</p>
<h4>Define the Battle</h4>
<p>Then we need to create the competition and positioning in order for our product to be <em>easy to buy</em>. Viable competition is important since where there is no competition, there is no market. <em>Market alternatives</em> indicate the existence of the budget dollars to buy our products and <em>product alternatives</em> call out the differentiation. The positioning of our product should be short enough to pass the elevator test so that it can successfully create and occupy a space inside the target customers’ head.</p>
<h4>Launch the Invasion</h4>
<p>Finally we need to select the intended distribution channel and set pricing to give us motivational leverage over that channel.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Multipliers: How the Best Leaders Make Everyone Smarter]]></title>
<link href="http://blog.jjyao.me/blog/2021/08/29/multipliers-how-the-best-leaders-make-everyone-smarter/"/>
<updated>2021-08-29T22:07:08-07:00</updated>
<id>http://blog.jjyao.me/blog/2021/08/29/multipliers-how-the-best-leaders-make-everyone-smarter</id>
<content type="html"><![CDATA[<p>When you ask people whether they want to be multipliers, who make the people around them better, or diminishers, who make everyone worse, most people will say that they want to be multipliers. However, in reality, many people are accidental diminishers in some aspects. Being a true multiplier requires a multiplier mindset and certain approaches. This book tells you what multipliers are and how to become one.</p>
<!-- more -->
<p>Multipliers are leaders who make people better and more capable. They can access and revitalize the intelligence in the people around them. They are <em>genius makers</em> by accessing and multiplying the genius in others. They have a <em>growth mindset</em>, which is a belief that basic qualities like intelligence and ability can be cultivated through effort. In contrast, diminishers drain intelligence and capability out of the people around them.</p>
<h2>Multiplier Mindset</h2>
<p>What people believe affects their behaviors. As a result, we need to have a multiplier mindset first in order to become one. The fundamental assumption a multiplier has is that people are smart. They discover the genius in people by asking “how is this person smart?”. Multipliers believe that:</p>
<ol>
<li><em>People are smart and will figure things out.</em></li>
<li><em>If I can find someone’s genius, I can put them to work.</em></li>
<li><em>People’s best thinking must be given, not taken.</em></li>
<li><em>People get smarter by being challenged.</em></li>
<li><em>With enough minds, we can figure it out.</em></li>
</ol>
<h2>Challenger vs Know-it-all</h2>
<p>Since multipliers believe that people get smarter by being challenged, they ask really insightful and interesting questions that make people think. They don’t limit the team to what they know; they push their teams beyond their own knowledge and that of the organization. Even if they have the answer, they don’t just give it. Instead, they provide just enough information to provoke thinking and to help people discover and see the opportunity for themselves. <em>What’s more important as a leader is not having the right answer but asking the right questions.</em></p>
<h2>Debate Maker vs Decision Maker</h2>
<p>Multipliers believe that with enough minds, we can figure it out. As a result, they like <em>collective debate</em>. Through debate, they challenge and stretch what people know, thus making the organization smarter over time and creating the organizational will to execute the decisions made. In contrast, when people are handed an undebated decision, they turn to debating its soundness rather than executing it.</p>
<h2>Liberator vs Tyrant</h2>
<p>Multipliers believe that people’s best thinking must be given, not taken. They provide a safe environment for people to think and make mistakes. The highest quality of thinking cannot emerge without learning and learning can’t happen without mistakes. <em>Intimidation and fear rarely produce truly great work.</em> Speaking of creating a safe environment, there is no easier way to invite experimentation and learning than to <em>share stories about multipliers’ own mistakes</em>.</p>
<p>While multipliers give space and a safe environment, they also demand the best work. It’s a fair trade. They create an intense environment that requires people’s best thinking and their best work. They generate pressure, but they don’t generate stress. Requiring people’s best work is different from insisting on desired outcomes. Stress is created when people are expected to produce outcomes that are beyond their control. But they feel positive pressure when they are held to their best work. <em>Multipliers distinguish best work from outcomes.</em></p>
<h2>Investor vs Micro-manager</h2>
<p>Multipliers believe that people are smart and will figure things out. So they operate as investors, giving ownership that keeps rolling back to other people. As investors, they define ownership, invest resources and hold people <em>accountable</em>. When they teach, they invest in their people’s ability to solve and avoid problems in the future. In the end, multipliers enable others to operate independently by giving other people ownership for results and investing in their success. They create organizations that can perform and win, not only without them on the field, but long after their direct influence is felt.</p>
<h2>Talent Magnet vs Empire Builder</h2>
<p>Multipliers believe that if they can find someone’s genius, they can put them to work. They are like magnets that draw in talent and develop it to its fullest. They look for talent everywhere, find people’s <em>native genius</em>, utilize people to their fullest and remove the blockers. Multipliers not only notice people’s talent, they <em>label</em> it for them. By telling people what they see, they raise people’s awareness and confidence, allowing them to contribute their capability more fully. What’s more, once they uncover the native genius of others, they look for opportunities that demand that capability.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[What I Have Learned From the LinkedIn Graph Database Team]]></title>
<link href="http://blog.jjyao.me/blog/2021/08/22/what-i-have-learned-from-the-linkedin-graph-database-team/"/>
<updated>2021-08-22T19:48:33-07:00</updated>
<id>http://blog.jjyao.me/blog/2021/08/22/what-i-have-learned-from-the-linkedin-graph-database-team</id>
<content type="html"><![CDATA[<p>I worked on the LinkedIn graph database team for 5+ years, and we successfully built a <a href="https://engineering.linkedin.com/blog/2020/liquid-the-soul-of-a-new-graph-database-part-1">graph database</a> serving the entire LinkedIn economic graph. In this post, I want to share what I have learned. Disclaimer: much of the wisdom here is from my great colleagues.</p>
<!-- more -->
<h3>All incidents are gifts</h3>
<p>Incidents are opportunities for us to fix bugs, improve the stability of the system and improve the process of handling incidents. We should treat them as gifts and learn as much as possible out of them.</p>
<h3>All incidents should be novel</h3>
<p>This basically means that we should never make the same mistake twice. Once an incident happens, we try as hard as we can to fix it and make sure it will never happen again with the same root cause.</p>
<h3>Hardware failure is common at scale</h3>
<p>We have, on average, 2 DIMM failures per week, so we should design our software in a fault-tolerant way.</p>
<h3>API is sticky</h3>
<p>Once clients start to use the exposed APIs, they become extremely sticky. That means we need to design them carefully since changing them afterwards is <a href="https://www.joelonsoftware.com/2004/06/13/how-microsoft-lost-the-api-war/">costly</a>.</p>
<h3>Logging is talking to the user/operator</h3>
<p>How programs talk to humans has a huge impact on the rate at which mistakes can be fixed. If programs tell humans exactly what is wrong, that rate can be very fast. If programs are silent or overwhelm humans with too much information, that rate can be extremely slow. When there are too many spurious errors, people get alert fatigue and overlook the real problems.</p>
<p>When we write logs in our code, we need to remember that the audience is not just us but also people who may not be familiar with the entire codebase, like SREs. That means the log messages should be crystal clear and actionable. Imagine how frustrating it is when oncalls get paged at 2am and have no clue what the log messages mean or how to act on them. <a href="https://spark.apache.org/error-message-guidelines.html">Here</a> is a guideline on how to write good error messages.</p>
<h3>Comment on why</h3>
<p>Code comments should say why the code is there instead of what the code does. What the code does should be clear from the code itself. If it is not the case, then we should refactor the code to make it clear instead of adding a comment. A large decaying comment is frequently just an apology for crappy code. Don’t accept the apology. Fix the code. Don’t give up until you try your best. Then, as a last resort, write the comment.</p>
<h3>Good enough is not enough</h3>
<p>The math is simple: 0.8 * 0.8 * 0.8 * … = 0. If every time we only achieve good enough, the compounded result eventually becomes zero, i.e. failure. We should never settle and should always do the best we can.</p>
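<p>The compounding can be made concrete with a quick calculation (a hypothetical illustration, not from the original post):</p>

```python
# Hypothetical illustration: if every step in a chain of work is only
# "good enough" (say 80% quality), the compounded quality of the whole
# chain decays toward zero as the chain grows.
def compounded_quality(per_step_quality: float, steps: int) -> float:
    result = 1.0
    for _ in range(steps):
        result *= per_step_quality
    return result

print(compounded_quality(0.8, 1))   # 0.8
print(compounded_quality(0.8, 10))  # roughly 0.11
print(compounded_quality(0.8, 30))  # roughly 0.001
```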
<h3>Conduct code review</h3>
<p>Code review is not just about finding bugs. We should also think about how we can rewrite the code in a better way that is easily understandable and unlikely to cause future bugs.</p>
<p>Also reading other people’s good reviews allows us to learn not only from our own mistakes but also the mistakes of others.</p>
<h3>Write design document</h3>
<p>As Leslie Lamport said, “Writing is nature’s way of letting you know how sloppy your thinking is.” The very act of writing the design document helps to clarify the design itself. It also helps people learn or understand the system in the future.</p>
<h3><a href="http://blog.jjyao.me/blog/2021/08/18/keep-your-eyes-open/">Keep your eyes open</a></h3>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Keep Your Eyes Open]]></title>
<link href="http://blog.jjyao.me/blog/2021/08/18/keep-your-eyes-open/"/>
<updated>2021-08-18T08:10:01-07:00</updated>
<id>http://blog.jjyao.me/blog/2021/08/18/keep-your-eyes-open</id>
<content type="html"><![CDATA[<p>As engineers, our job is to solve problems. In order to do that, we need to discover them first. To me, the best way to find problems is by doing things while keeping our eyes open.</p>
<!-- more -->
<p>Problems are everywhere; it’s just a question of whether we can find them. Whenever we are doing some task, it’s not just about finishing the task itself, it’s also about discovering new problems along the way. Here we have a cycle of <code>doing things -> discovering problems -> doing more things</code>. Through this cycle, we make whatever we are building better and better.</p>
<p>If we treat everything we do as an opportunity of discovering new problems, then we will find tons of them. To make this more concrete, let me give some examples:</p>
<ul>
<li>Whenever we touch a piece of code, even just one line, it’s an opportunity to look at the surrounding code and see if we can refactor to make it better.</li>
<li>Whenever we do some repetitive work, it’s an opportunity to automate it.</li>
<li>Whenever we talk to clients, it’s an opportunity to learn their pain points.</li>
</ul>
<p>Amar Bose also told an interesting <a href="https://www.youtube.com/watch?v=ySAXW-7WrDg">story</a> where the student with his eyes open saw a huge opportunity while doing quite tedious work.</p>
<p>Keep your eyes open, problems and opportunities are around you.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Response Time and Throughput]]></title>
<link href="http://blog.jjyao.me/blog/2021/04/04/response-time-and-throughput/"/>
<updated>2021-04-04T13:43:32-07:00</updated>
<id>http://blog.jjyao.me/blog/2021/04/04/response-time-and-throughput</id>
<content type="html"><![CDATA[<p>For the discussion in this post, response time is the time between a service receiving a request and returning a response. It is the sum of waiting time and processing time. Waiting time is how long the request waits in queues before being processed. Processing time is the time to actually do the work of the request. Throughput is the number of requests completed per unit time. This post discusses how these can be related.</p>
<!-- more -->
<h2>Lower Processing Time & Higher Throughput</h2>
<p>If we reduce the processing time, the throughput might be higher. For example, the throughput is 10 requests per second if the processing time is 100ms CPU time assuming it’s a single CPU system. If the processing time is reduced to 10ms CPU time, the throughput is increased to 100 requests per second.</p>
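<p>The arithmetic above can be sketched with a small helper (a hypothetical sketch, assuming fully utilized CPUs and zero waiting time, so throughput is bounded by CPU time alone):</p>

```python
# Hypothetical sketch: upper bound on throughput when each request
# consumes a fixed amount of CPU time and the CPUs are fully busy.
def max_throughput_rps(processing_time_s: float, num_cpus: int = 1) -> float:
    return num_cpus / processing_time_s

print(max_throughput_rps(0.100))  # 10 requests per second
print(max_throughput_rps(0.010))  # 100 requests per second
```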
<h2>Higher Processing Time & Lower Throughput</h2>
<p>This is the opposite of lower processing time & higher throughput. It is undesirable since we lose on both processing time and throughput.</p>
<h2>Lower Processing Time & Lower Throughput</h2>