<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Ceph Source Code Reading (1): librbd's Calls into the rados Interfaces</title>
<url>/2020/07/29/Ceph%E6%BA%90%E7%A0%81%E9%98%85%E8%AF%BB-1/</url>
<content><![CDATA[<h2 id="1-概述"><a href="#1-概述" class="headerlink" title="1. Overview"></a>1. Overview</h2><h3 id="1-1-Librados"><a href="#1-1-Librados" class="headerlink" title="1.1 Librados"></a>1.1 Librados</h3><p>Ceph's RADOS distributed store provides API bindings for multiple languages, packaged in the librados library. Client applications call librados to access a remote rados cluster. The relevant interfaces can be browsed in the api folder of the librados source directory.</p>
<p>librados is the interface for operating on the rados object store. It comes in two flavors: a C interface declared in include/librados.h and a C++ interface declared in include/librados.hpp; both are implemented in librados.cc.</p>
<p>The interfaces fall into five main categories[^1]:</p>
<ul>
<li>Creation and destruction of the <strong>cluster handle (an instance of the rados client class)</strong>, plus its configuration and connection; creation and destruction of <strong>pools</strong>; creation and destruction of the <strong>I/O context (ioctx)</strong>.<ul>
<li>Create a cluster handle</li>
<li><strong>Configure</strong> the handle from a configuration file, command-line arguments, and environment variables</li>
<li>Connect to the cluster, which enables the rados client to communicate with it</li>
<li><strong>Create pools</strong>, configuring different CRUSH placement rules, replication levels, placement policies, and so on</li>
<li>Create and obtain the <strong>I/O context</strong></li>
</ul>
</li>
<li><strong>Snapshot interfaces</strong>: librados supports snapshots of an entire pool, with calls for creating and deleting snapshots, rolling back to a given snapshot version, querying snapshots, and so on.</li>
<li><strong>Synchronous I/O interfaces</strong>: read, write, overwrite, append, object-data clone, delete, truncate, getting and setting a named extended attribute, batch-reading extended attributes, iterating over extended attributes, special key/value lookups, and so on.</li>
<li><strong>Asynchronous I/O interfaces</strong>: asynchronous read, write, overwrite, append, and delete. librados also provides object watching: rados_watch registers a callback that notifies the upper layer when the object changes.</li>
<li><strong>Atomic groups of I/O operations</strong>: a series of I/O operations on the same object can be placed in one group, and all operations in the group then execute atomically: either all succeed or all fail, so the user is never shown an inconsistent state.</li>
</ul>
<p>The following walks through a client using librados to connect to a cluster and read/write an object[^2]:</p>
<ol>
<li>Obtain a cluster handle and connect to one of the cluster's Monitors to fetch the Cluster Map;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">cluster_name</span><span class="params">(<span class="string">"ceph"</span>)</span></span>;</span><br><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">user_name</span><span class="params">(<span class="string">"client.admin"</span>)</span></span>;</span><br><span class="line">librados::Rados cluster ;</span><br><span class="line">cluster.init2(user_name.c_str(), cluster_name.c_str(), <span class="number">0</span>);</span><br><span class="line">cluster.conf_read_file(<span class="string">"/etc/ceph/ceph.conf"</span>);</span><br><span class="line">cluster.<span class="built_in">connect</span>();</span><br></pre></td></tr></table></figure>
<ol start="2">
<li>Create an I/O context and bind it to an existing pool;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">librados::IoCtx io_ctx ;</span><br><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">pool_name</span><span class="params">(<span class="string">"data"</span>)</span></span>;</span><br><span class="line">cluster.ioctx_create(pool_name.c_str(), io_ctx);</span><br></pre></td></tr></table></figure>
<ol start="3">
<li>Synchronously write an object;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">librados::bufferlist bl;</span><br><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">objectId</span><span class="params">(<span class="string">"hw"</span>)</span></span>;</span><br><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">objectContent</span><span class="params">(<span class="string">"Hello World!"</span>)</span></span>;</span><br><span class="line">bl.append(objectContent);</span><br><span class="line">io_ctx.<span class="built_in">write</span>(objectId, bl, objectContent.<span class="built_in">size</span>(), <span class="number">0</span>);</span><br></pre></td></tr></table></figure>
<ol start="4">
<li>Add an extended attribute to the object;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">librados::bufferlist lang_bl;</span><br><span class="line">lang_bl.append(<span class="string">"en_US"</span>);</span><br><span class="line">io_ctx.setxattr(objectId, <span class="string">"lang"</span>, lang_bl);</span><br></pre></td></tr></table></figure>
<ol start="5">
<li>Asynchronously read the object;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">librados::bufferlist read_buf;</span><br><span class="line"><span class="keyword">int</span> read_len = <span class="number">4194304</span>;</span><br><span class="line">librados::AioCompletion *read_completion = librados::Rados::aio_create_completion();</span><br><span class="line">io_ctx.aio_read(objectId, read_completion, &read_buf, read_len, <span class="number">0</span> );</span><br></pre></td></tr></table></figure>
<ol start="6">
<li>Disconnect</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">io_ctx.<span class="built_in">close</span>();</span><br><span class="line">cluster.<span class="built_in">shutdown</span>();</span><br></pre></td></tr></table></figure>
<h3 id="1-2-librbd"><a href="#1-2-librbd" class="headerlink" title="1.2 librbd"></a>1.2 librbd</h3><p>Librbd是Ceph提供块存储的库,它实现了RBD接口,基于Librados实现了对块设备的基本操作。[^3]librbd的基本架构及功能如下图所示。</p>
<p><img src="https://zhoubofsy.github.io/images/ceph/librbd_frame.png" alt="librbd architecture"></p>
<p>librbd wraps the block-device interface by calling down into librados. The sections below walk through how this is implemented.</p>
<h2 id="2-librbd接口实现"><a href="#2-librbd接口实现" class="headerlink" title="2. librbd接口实现"></a>2. librbd接口实现</h2><p>首先,librbd中调用的rados类如下:,包括:</p>
<table>
<thead>
<tr>
<th>Name</th>
<th>Declaration</th>
<th>Definition</th>
<th align="left">Notes</th>
</tr>
</thead>
<tbody><tr>
<td>librados::(v14_2_0::)IoCtx</td>
<td>include/rados/librados.hpp</td>
<td>src/librados/librados_cxx.cc</td>
<td align="left">rados I/O context instance</td>
</tr>
<tr>
<td>librados::(v14_2_0::)AioCompletion</td>
<td>include/rados/librados.hpp</td>
<td>src/librados/librados_cxx.cc</td>
<td align="left">callback wrapper for asynchronous rados operations</td>
</tr>
<tr>
<td>librados::(v14_2_0::)Rados</td>
<td>include/rados/librados.hpp</td>
<td>src/librados/librados_cxx.cc</td>
<td align="left">rados cluster instance</td>
</tr>
</tbody></table>
<p>IoCtx is the rados context instance. It must be bound to a <strong>pool</strong> when created, and it wraps a large set of operations on that pool, for example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">create</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& oid, <span class="keyword">bool</span> exclusive)</span></span>; <span class="comment">//创建object</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">write</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& oid, bufferlist& bl, <span class="keyword">size_t</span> len, <span class="keyword">uint64_t</span> off)</span> <span class="comment">//从偏移处修改某个对象</span></span></span><br><span class="line"><span class="function"></span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">append</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& oid, bufferlist& bl, <span class="keyword">size_t</span> len)</span></span>;<span class="comment">// 在对象末尾追加</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">aio_read</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& oid, AioCompletion *c,</span></span></span><br><span class="line"><span class="function"><span class="params"> bufferlist *pbl, <span class="keyword">size_t</span> len, <span class="keyword">uint64_t</span> off, <span class="keyword">uint64_t</span> snapid)</span></span>;<span class="comment">// 异步读</span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">aio_write</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& oid, AioCompletion *c, <span class="keyword">const</span> bufferlist& bl,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">size_t</span> len, <span class="keyword">uint64_t</span> off)</span></span>;<span class="comment">//异步写</span></span><br><span class="line">...</span><br></pre></td></tr></table></figure>
<p>As the listing shows, IoCtx provides synchronous interfaces for creating, deleting, reading, and writing rados objects, along with asynchronous counterparts for the same operations.</p>
<p>The AioCompletion class deserves special attention. It exposes the callback-related interface; put simply, an AioCompletion represents the callback we want to run when an asynchronous operation completes.</p>
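<p>As a rough sketch of the idea (illustrative only: the exact aio_create_completion signature varies across Ceph releases, and on_read_done is a made-up name), a callback can be attached to an AioCompletion so that librados invokes it when the operation finishes:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// Hypothetical user callback; librados invokes it on completion.</span></span><br><span class="line"><span class="keyword">void</span> on_read_done(librados::completion_t cb, <span class="keyword">void</span> *arg) {</span><br><span class="line">  <span class="comment">// inspect the result through the owning AioCompletion,</span></span><br><span class="line">  <span class="comment">// e.g. via get_return_value()</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">librados::AioCompletion *c =</span><br><span class="line">    librados::Rados::aio_create_completion(<span class="literal">nullptr</span>, on_read_done, <span class="literal">nullptr</span>);</span><br><span class="line">io_ctx.aio_read(oid, c, &read_buf, len, <span class="number">0</span>);</span><br><span class="line"><span class="comment">// ... the callback fires when the read completes; afterwards:</span></span><br><span class="line">c-><span class="built_in">release</span>();</span><br></pre></td></tr></table></figure>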
<h3 id="2-1-librbd使用实例"><a href="#2-1-librbd使用实例" class="headerlink" title="2.1 librbd使用实例"></a>2.1 librbd使用实例</h3><p>Ceph的rbd设备使用方法与对象存储的使用有很大的相关性,原因在于rbd本质上是对于rados的再次封装。相关接口可以直接查看*/include/rbd/librbd.hpp*查看。</p>
<ol>
<li>Obtain a cluster handle and connect to one of the cluster's Monitors to fetch the Cluster Map;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">cluster_name</span><span class="params">(<span class="string">"ceph"</span>)</span></span>;</span><br><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">user_name</span><span class="params">(<span class="string">"client.admin"</span>)</span></span>;</span><br><span class="line">librados::Rados cluster ;</span><br><span class="line">cluster.init2(user_name.c_str(), cluster_name.c_str(), <span class="number">0</span>);</span><br><span class="line">cluster.conf_read_file(<span class="string">"/etc/ceph/ceph.conf"</span>);</span><br><span class="line">cluster.<span class="built_in">connect</span>();</span><br></pre></td></tr></table></figure>
<ol start="2">
<li>Create an I/O context and bind it to an existing pool;</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">librados::IoCtx io_ctx;</span><br><span class="line"><span class="function"><span class="built_in">std</span>::<span class="built_in">string</span> <span class="title">pool_name</span><span class="params">(<span class="string">"data"</span>)</span></span>;</span><br><span class="line">cluster.ioctx_create(pool_name.c_str(), io_ctx);</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ol start="3">
<li>Create the rbd device (the virtual block device we want) and an image structure. The image structure ties <em>myimage</em> to <em>ioctx</em>, so the <em>ioctx</em> can later be reached directly through the image. The ioctx is duplicated into <em>data_ioctx</em> and <em>md_ctx</em>; as the names suggest, one handles the rbd's stored data and the other its management metadata.</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">rbd_inst.create(ioctx,'myimage',size);</span><br><span class="line">image = rbd.Image(ioctx,'myimage')</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>After this, the block device can be read and written through the image interfaces such as aio_write and aio_read, as sketched below.</p>
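<p>A minimal sketch of such calls (my own illustration, reusing the io_ctx and image from the snippets above; synchronous variants shown for brevity):</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">librados::bufferlist bl;</span><br><span class="line">bl.append(<span class="string">"hello rbd"</span>);</span><br><span class="line"><span class="comment">// Synchronously write bl at offset 0 of the image, then read it back.</span></span><br><span class="line">image.<span class="built_in">write</span>(<span class="number">0</span>, bl.length(), bl);</span><br><span class="line"></span><br><span class="line">librados::bufferlist out;</span><br><span class="line">image.<span class="built_in">read</span>(<span class="number">0</span>, bl.length(), out);</span><br><span class="line"><span class="comment">// The asynchronous variants, aio_write/aio_read, take an</span></span><br><span class="line"><span class="comment">// RBD::AioCompletion instead of blocking.</span></span><br></pre></td></tr></table></figure>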
<h3 id="2-2-librbd读写流程"><a href="#2-2-librbd读写流程" class="headerlink" title="2.2 librbd读写流程"></a>2.2 librbd读写流程</h3><ol>
<li><em>image.read(data, 0)</em> starts the life of a read request through the image, specifying the two basic elements of the request: buffer = data and offset = 0. The call lands in <em>Image::read()</em> in librbd.cc, which in turn calls the read function of ImageRequestWQ.</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">ssize_t</span> <span class="title">Image::read</span><span class="params">(<span class="keyword">uint64_t</span> ofs, <span class="keyword">size_t</span> len, bufferlist& bl)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> ImageCtx *ictx = (ImageCtx *)ctx;</span><br><span class="line"> ...</span><br><span class="line"> <span class="keyword">int</span> r = ictx->io_work_queue-><span class="built_in">read</span>(ofs, len, io::ReadResult{&bl}, <span class="number">0</span>);</span><br><span class="line"> tracepoint(librbd, read_exit, r);</span><br><span class="line"> <span class="keyword">return</span> r;</span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ol start="2">
<li>The implementation of ImageRequestWQ::read lives in ImageRequestWQ.cc.</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">ssize_t</span> ImageRequestWQ<I>::<span class="built_in">read</span>(<span class="keyword">uint64_t</span> off, <span class="keyword">uint64_t</span> len,</span><br><span class="line"> ReadResult &&read_result, <span class="keyword">int</span> op_flags) {</span><br><span class="line"> CephContext *cct = m_image_ctx.cct;</span><br><span class="line"> ldout(cct, <span class="number">20</span>) << <span class="string">"ictx="</span> << &m_image_ctx << <span class="string">", off="</span> << off << <span class="string">", "</span></span><br><span class="line"> << <span class="string">"len = "</span> << len << dendl;</span><br><span class="line"></span><br><span class="line"> C_SaferCond cond; <span class="comment">//---a</span></span><br><span class="line"> AioCompletion *c = AioCompletion::create(&cond); <span class="comment">//---b</span></span><br><span class="line"> aio_read(c, off, len, <span class="built_in">std</span>::<span class="built_in">move</span>(read_result), op_flags, <span class="literal">false</span>); <span class="comment">//---c</span></span><br><span class="line"> <span class="keyword">return</span> cond.wait(); <span class="comment">//---d</span></span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ul>
<li>a. Create a context that provides a wait mechanism.</li>
<li>b. Create the completion from that context, i.e. the callback to invoke once aio_read finishes.</li>
<li>c. aio_read carries the read request forward.</li>
<li>d. Block in cond.wait() until aio_read invokes the callback, at which point the call returns.</li>
</ul>
<p>These steps show that Ceph's synchronous reads and writes are in fact built on top of the asynchronous path, plus a synchronization mechanism.</p>
<ol start="3">
<li>So what does aio_read do once it has the request's offset and buffer?</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> I></span><br><span class="line"><span class="keyword">void</span> ImageRequestWQ<I>::aio_read(AioCompletion *c, <span class="keyword">uint64_t</span> off, <span class="keyword">uint64_t</span> len,</span><br><span class="line"> ReadResult &&read_result, <span class="keyword">int</span> op_flags,</span><br><span class="line"> <span class="keyword">bool</span> native_async) {</span><br><span class="line"> CephContext *cct = m_image_ctx.cct;</span><br><span class="line"> ...</span><br><span class="line"> <span class="function">RWLock::RLocker <span class="title">owner_locker</span><span class="params">(m_image_ctx.owner_lock)</span></span>;</span><br><span class="line"> <span class="keyword">if</span> (m_image_ctx.non_blocking_aio || writes_blocked() || !writes_empty() ||</span><br><span class="line"> require_lock_on_read()) { <span class="comment">//---a</span></span><br><span class="line"> <span class="built_in">queue</span>(ImageDispatchSpec<I>::create_read_request(</span><br><span class="line"> m_image_ctx, c, {{off, len}}, <span class="built_in">std</span>::<span class="built_in">move</span>(read_result), op_flags,</span><br><span class="line"> trace));</span><br><span class="line"> } <span class="keyword">else</span> { <span class="comment">//---b</span></span><br><span class="line"> c->start_op();</span><br><span class="line"> ImageRequest<I>::aio_read(&m_image_ctx, c, {{off, len}},</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">move</span>(read_result), op_flags, trace);</span><br><span class="line"> finish_in_flight_io();</span><br><span class="line"> }</span><br><span class="line"> trace.event(<span class="string">"finish"</span>);</span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>The input is handled case by case here.</p>
<ul>
<li>a. If non-blocking AIO is enabled, writes are blocked, the write queue is not empty, or the read requires taking a lock, the read request is put onto the work queue.</li>
<li>b. Otherwise ImageRequest::aio_read is called directly to perform the read.</li>
</ul>
<p>Let's follow the second path. Inside <em>ImageRequest::aio_read</em>, the read request flows, in the order shown below, into <em>ImageReadRequest::send_request</em>, where the request against the block device is translated into requests against objects.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">graph TB</span><br><span class="line"> D[ImageRequestWQ::aio_read] --> A</span><br><span class="line"> A[ImageRequest::aio_read] -->B[ImageRequest::send]</span><br><span class="line"> B --> C[ImageWriteRequest::send_request]</span><br></pre></td></tr></table></figure>
<ol start="4">
<li><em>ImageReadRequest::send_request</em> mainly performs the splitting of the block-device request.</li>
</ol>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">void</span> ImageReadRequest<I>::send_request() {</span><br><span class="line"> I &image_ctx = <span class="keyword">this</span>->m_image_ctx;</span><br><span class="line"> CephContext *cct = image_ctx.cct;</span><br><span class="line"> ...</span><br><span class="line"> Striper::file_to_extents(cct, image_ctx.format_string, &image_ctx.layout,</span><br><span class="line"> extent.first, extent.second, <span class="number">0</span>, object_extents,</span><br><span class="line"> buffer_ofs); <span class="comment">// ---a</span></span><br><span class="line"> ...</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> &object_extent : object_extents) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> &extent : object_extent.second) {</span><br><span class="line"> <span class="keyword">auto</span> req_comp = <span class="keyword">new</span> io::ReadResult::C_ObjectReadRequest(</span><br><span class="line"> aio_comp, extent.offset, extent.length,</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">move</span>(extent.buffer_extents));</span><br><span class="line"> <span class="keyword">auto</span> req = ObjectDispatchSpec::create_read(</span><br><span class="line"> &image_ctx, OBJECT_DISPATCH_LAYER_NONE, extent.oid.name,</span><br><span class="line"> extent.objectno, extent.offset, extent.length, snap_id, m_op_flags,</span><br><span class="line"> <span class="keyword">this</span>->m_trace, &req_comp->bl, &req_comp->extent_map, req_comp);</span><br><span class="line"> req->send(); <span class="comment">// ---b</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> aio_comp-><span class="built_in">put</span>();</span><br><span class="line"></span><br><span class="line"> image_ctx.perfcounter->inc(l_librbd_rd);</span><br><span class="line"> image_ctx.perfcounter->inc(l_librbd_rd_bytes, buffer_ofs);</span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ul>
<li>a. Depending on its size, the request must be divided up per object; <em>file_to_extents</em> does this and stores the results, grouped by object, in object_extents. This function splits the original request.</li>
</ul>
<blockquote>
<p>An rbd device is composed of many objects; the device is chopped into chunks, each chunk is an object, and each object defaults to 4 MB (the size can also be specified). file_to_extents maps the large request onto those objects, splitting it into many small requests as pictured below. The mapping result is stored in ObjectExtent structures.</p>
<p><img src="http://static.oschina.net/uploads/space/2015/1119/145240_0zUe_2460844.jpg" alt="request-to-object mapping"></p>
<p>The original offset is the offset inside the rbd image (the position being written); after file_to_extents it becomes offsets inside one or more objects (offset0), and the requests within each object are then processed as a batch. A worked example of this mapping follows below.</p>
</blockquote>
<ul>
<li>b. Call <em>ObjectDispatchSpec::send</em> to dispatch and process the split per-object read requests.</li>
</ul>
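<p>As a concrete illustration of the mapping arithmetic (my own sketch, assuming the default 4 MiB object size and trivial striping), a 6 MiB read starting at image offset 2 MiB splits into the tail of object 0 and all of object 1:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">#include &lt;algorithm&gt;</span><br><span class="line">#include &lt;cstdint&gt;</span><br><span class="line">#include &lt;cstdio&gt;</span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> main() {</span><br><span class="line">  <span class="keyword">const</span> <span class="keyword">uint64_t</span> obj_size = <span class="number">4ull</span> &lt;&lt; <span class="number">20</span>;         <span class="comment">// default 4 MiB objects</span></span><br><span class="line">  <span class="keyword">uint64_t</span> off = <span class="number">2ull</span> &lt;&lt; <span class="number">20</span>, len = <span class="number">6ull</span> &lt;&lt; <span class="number">20</span>;  <span class="comment">// image-level request</span></span><br><span class="line">  <span class="keyword">while</span> (len > <span class="number">0</span>) {</span><br><span class="line">    <span class="keyword">uint64_t</span> objectno = off / obj_size;         <span class="comment">// which object</span></span><br><span class="line">    <span class="keyword">uint64_t</span> obj_off  = off % obj_size;         <span class="comment">// offset inside that object</span></span><br><span class="line">    <span class="keyword">uint64_t</span> n = <span class="built_in">std</span>::min(len, obj_size - obj_off);</span><br><span class="line">    <span class="built_in">std</span>::printf(<span class="string">"object %llu: off=%llu len=%llu\n"</span>,</span><br><span class="line">                (<span class="keyword">unsigned</span> <span class="keyword">long</span> <span class="keyword">long</span>)objectno,</span><br><span class="line">                (<span class="keyword">unsigned</span> <span class="keyword">long</span> <span class="keyword">long</span>)obj_off,</span><br><span class="line">                (<span class="keyword">unsigned</span> <span class="keyword">long</span> <span class="keyword">long</span>)n);</span><br><span class="line">    off += n;</span><br><span class="line">    len -= n;</span><br><span class="line">  }</span><br><span class="line">  <span class="comment">// Prints: object 0 gets [2 MiB, 4 MiB) and object 1 gets [0, 4 MiB).</span></span><br><span class="line">  <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>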
<ol start="5">
<li><em>ObjectDispatchSpec::send</em> proceeds, in the order shown below, into <em>ObjectDispatcher::send</em>:</li>
</ol>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">graph TB</span><br><span class="line"> D[ObjectDispatchSpac::send] --> A[ObjectDispatcherInterface::send]</span><br><span class="line"> A -->B[ObjectDispatcher::send]</span><br><span class="line"> B --> C[ImageReadRequest::send_request]</span><br></pre></td></tr></table></figure>
<p><em>ObjectDispatcher::send</em> creates a SendVisitor, and the visitor carries the read/write flow onward.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">void</span> ObjectDispatcher<I>::send(ObjectDispatchSpec* object_dispatch_spec) {</span><br><span class="line"> <span class="keyword">auto</span> cct = m_image_ctx->cct;</span><br><span class="line"> ...</span><br><span class="line"> <span class="keyword">bool</span> handled = boost::apply_visitor(</span><br><span class="line"> SendVisitor{object_dispatch, object_dispatch_spec},</span><br><span class="line"> object_dispatch_spec->request);</span><br><span class="line"> object_dispatch_meta.async_op_tracker->finish_op(); <span class="comment">// 创建SendVistor</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// handled ops will resume when the dispatch ctx is invoked</span></span><br><span class="line"> <span class="keyword">if</span> (handled) {</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> object_dispatch_spec->dispatcher_ctx.complete(<span class="number">0</span>);</span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ol start="6">
<li>Stepping into <em>ObjectDispatcher::SendVisitor</em>, the read flow runs as shown below and finally reaches <em>read_object</em>.</li>
</ol>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">graph TB</span><br><span class="line"> D[ObjectDispatcher::SendVisitor] --> A[ObjectDispatchInterface::read]</span><br><span class="line"> A -->B[ObjectDispatch::read]</span><br><span class="line"> B --> C[ObjectReadRequest::read]</span><br><span class="line"> C --> E[ObjectReadRequest::read_object]</span><br><span class="line"> </span><br></pre></td></tr></table></figure>
<p>Finally, <em>read_object</em> invokes the rados object I/O interface.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">void</span> ObjectReadRequest<I>::read_object() {</span><br><span class="line"> I *image_ctx = <span class="keyword">this</span>->m_ictx;</span><br><span class="line"> ...</span><br><span class="line"> librados::ObjectReadOperation op; <span class="comment">// ---a</span></span><br><span class="line"> ...</span><br><span class="line"> librados::AioCompletion *rados_completion = util::create_rados_callback<</span><br><span class="line"> ObjectReadRequest<I>, &ObjectReadRequest<I>::handle_read_object>(<span class="keyword">this</span>); <span class="comment">// ---b</span></span><br><span class="line"> </span><br><span class="line"> <span class="keyword">int</span> flags = image_ctx->get_read_flags(<span class="keyword">this</span>->m_snap_id);</span><br><span class="line"> <span class="keyword">int</span> r = image_ctx->data_ctx.aio_operate(</span><br><span class="line"> data_object_name(<span class="keyword">this</span>->m_ictx, <span class="keyword">this</span>->m_object_no), rados_completion, &op,</span><br><span class="line"> flags, <span class="literal">nullptr</span>,</span><br><span class="line"> (<span class="keyword">this</span>->m_trace.valid() ? <span class="keyword">this</span>->m_trace.get_info() : <span class="literal">nullptr</span>)); <span class="comment">// ---c</span></span><br><span class="line"> ceph_assert(r == <span class="number">0</span>);</span><br><span class="line"></span><br><span class="line"> rados_completion-><span class="built_in">release</span>();</span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ul>
<li>a. Create the object operation. Depending on the circumstances, either a <em>read</em> or a <em>sparse_read</em> is issued.</li>
<li>b. Create the callback for the read operation; it delivers the result of the read when it completes.</li>
<li>c. In <code>int r = image_ctx->data_ctx.aio_operate(...)</code>, data_ctx is the IoCtx of the pool that holds the image. Through the IoCtx interface, the I/O request is finally handed off to librados.</li>
</ul>
<h3 id="2-3-librbd小结"><a href="#2-3-librbd小结" class="headerlink" title="2.3 librbd小结"></a>2.3 librbd小结</h3><p>上文已经介绍,librbd的基本功能是将块存储的请求转化为对象存储。事实上,librbd实现的功能远远不止这些。包括调用rados的snap机制完成<strong>快照</strong>,借助journal完成<strong>镜像</strong>功能,并能在故障后进行<strong>故障恢复</strong>,完成<strong>回滚</strong>操作等等。</p>
<p>To keep the block device running correctly, librbd also manages a large amount of auxiliary data, all of which is stored as objects in the rados cluster, including:</p>
<ul>
<li>Metadata: rbd_directory, rbd_id, rbd_header, rbd_object_map, etc.</li>
<li>Cache data: cache_object, cache_parent, cache_writeAround, etc.</li>
<li>Journal data: journal</li>
</ul>
<p>Because the feature set is so broad, the librbd code is quite complex; a full-tree search of librbd turns up as many as <strong>1042</strong> call sites into librados interfaces.</p>
<p>[^1]: <a href="https://blog.csdn.net/hit1944/article/details/38330975">ceph的librados api解释</a><br>[^2]: <a href="https://my.oschina.net/u/2271251/blog/369820">ceph librados接口说明</a><br>[^3]: <a href="https://blog.csdn.net/csnd_pan/article/details/78728743">Ceph学习——Librbd块存储库与RBD读写流程源码分析</a><br>[^4]: <a href="https://zhoubofsy.github.io/2017/01/22/storage/ceph/librbd-frame-analyse/">librbd 架构分析</a><br>[^5]: <a href="https://my.oschina.net/u/2460844/blog/532755">ceph的数据存储之路(4) —– rbd client 端的数据请求处理</a><br>[^6]: <a href="https://docs.ceph.com/docs/master/rados/operations/cache-tiering/">Ceph Tiering官方文档</a> </p>
]]></content>
<categories>
<category>Ceph</category>
</categories>
<tags>
<tag>Ceph</tag>
<tag>RBD</tag>
<tag>RADOS</tag>
<tag>c++</tag>
</tags>
</entry>
<entry>
<title>Hello World</title>
<url>/2020/07/29/hello-world/</url>
<content><![CDATA[<p>Welcome to <a href="https://hexo.io/">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues">GitHub</a>.</p>
<h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/writing.html">Writing</a></p>
<h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo server</span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/server.html">Server</a></p>
<h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo generate</span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/generating.html">Generating</a></p>
<h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo deploy</span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/one-command-deployment.html">Deployment</a></p>
]]></content>
</entry>
<entry>
<title>Deploy a Ceph Cluster Manually</title>
<url>/2020/08/17/Deploy%20a%20Ceph%20Cluster%20Manually/</url>
<content><![CDATA[<h4 id="Preparation:"><a href="#Preparation:" class="headerlink" title="Preparation:"></a>Preparation:</h4><ul>
<li><p>disable firewall and SELinux</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> systemctl stop firewalld</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> setenforce 0</span></span><br></pre></td></tr></table></figure>
</li>
<li><p>set hosts</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> vim /etc/hosts</span></span><br><span class="line">192.168.2.172 ca12</span><br><span class="line">192.168.2.179 ca19</span><br><span class="line">192.168.2.95 ca95</span><br><span class="line">192.168.2.98 ca98</span><br></pre></td></tr></table></figure>
</li>
<li><p>set up passwordless ssh</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ssh-keygen</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ssh-copy-id ca12</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ssh-copy-id ca19</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ssh-copy-id ca95</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ssh-copy-id ca98</span></span><br></pre></td></tr></table></figure>
</li>
</ul>
<h4 id="1-Arch-of-my-cluster"><a href="#1-Arch-of-my-cluster" class="headerlink" title="1. Arch of my cluster"></a>1. Arch of my cluster</h4><table>
<thead>
<tr>
<th><strong>hostname</strong></th>
<th><strong>IP</strong></th>
<th><strong>role</strong></th>
<th><strong>info</strong></th>
</tr>
</thead>
<tbody><tr>
<td>ca12</td>
<td>192.168.2.172</td>
<td>Mon, OSD</td>
<td>a monitor and one OSD running on an SSD</td>
</tr>
<tr>
<td>ca19</td>
<td>192.168.2.179</td>
<td>Mon, OSD</td>
<td>a monitor and one OSD running on an SSD</td>
</tr>
<tr>
<td>ca95</td>
<td>192.168.2.95</td>
<td>Mon, OSD</td>
<td>a monitor and one OSD running on an SSD</td>
</tr>
</tbody></table>
<h4 id="2-Partition-disk-for-bluestore"><a href="#2-Partition-disk-for-bluestore" class="headerlink" title="2. Partition disk for bluestore"></a>2. Partition disk for bluestore</h4><p>split each SSD into 4 partitions. Then use ‘mkfs.xfs’ to format /dev/sdb1.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> parted /dev/sdb </span></span><br><span class="line">(parted) mkpart osd-device-0-data 0G 10G</span><br><span class="line"><span class="meta">$</span><span class="bash"> parted /dev/sdb </span></span><br><span class="line">(parted) mkpart osd-device-0-wal 10G 20G</span><br><span class="line"><span class="meta">$</span><span class="bash"> parted /dev/sdb </span></span><br><span class="line">(parted) mkpart osd-device-0-db 20G 30G</span><br><span class="line"><span class="meta">$</span><span class="bash"> parted /dev/sdb </span></span><br><span class="line">(parted) mkpart osd-device-0-block 30G 70G</span><br><span class="line"></span><br><span class="line"><span class="meta">$</span><span class="bash"> mkfs.xfs /dev/sdb1</span></span><br></pre></td></tr></table></figure>
<p>Result:</p>
<table>
<thead>
<tr>
<th>Number</th>
<th>Start</th>
<th>End</th>
<th>Size</th>
<th>File system</th>
<th>Name</th>
<th>Flags</th>
</tr>
</thead>
<tbody><tr>
<td>1</td>
<td>1049kB</td>
<td>10.0GB</td>
<td>9999MB</td>
<td></td>
<td>osd-device-0-data</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>10.0GB</td>
<td>20.0GB</td>
<td>9999MB</td>
<td></td>
<td>osd-device-0-wal</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>20.0GB</td>
<td>30.0GB</td>
<td>10.0GB</td>
<td></td>
<td>osd-device-0-db</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>30.0GB</td>
<td>70.0GB</td>
<td>40.0GB</td>
<td></td>
<td>osd-device-0-block</td>
<td></td>
</tr>
</tbody></table>
<h4 id="3-Ceph-conf"><a href="#3-Ceph-conf" class="headerlink" title="3. Ceph.conf"></a>3. Ceph.conf</h4><p>create directory <strong>/etc/ceph</strong> to store conf and keyring。</p>
<p>/etc/ceph/ceph.conf:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">[global]</span><br><span class="line">fsid = 57077c8f-0a92-42e8-a82c-61198875a30e</span><br><span class="line">osd crush chooseleaf type =0</span><br><span class="line"></span><br><span class="line">[mon]</span><br><span class="line">mon data=/data/$name</span><br><span class="line"> </span><br><span class="line">[mon.ca12]</span><br><span class="line">host=ca12</span><br><span class="line">mon addr=192.168.2.172:6789</span><br><span class="line">public addr=192.168.2.172</span><br><span class="line"></span><br><span class="line">[mon.ca19]</span><br><span class="line">host=ca19</span><br><span class="line">mon addr=192.168.2.179:6789</span><br><span class="line">public addr=192.168.2.179</span><br><span class="line"></span><br><span class="line">[mon.ca95]</span><br><span class="line">host=ca95</span><br><span class="line">mon addr=192.168.2.95:6789</span><br><span class="line">public addr=192.168.2.95</span><br><span class="line"></span><br><span class="line">[osd]</span><br><span class="line">osd mkfs type=xfs</span><br><span class="line">osd data = /data/$name</span><br><span class="line">enable_experimental_unrecoverable_data_corrupting_features= bluestore</span><br><span class="line">osd objectstore = bluestore</span><br><span class="line">bluestore = true</span><br><span class="line">bluestore fsck on mount = true</span><br><span class="line">bluestore block create = true</span><br><span class="line">bluestore block db size =67108864</span><br><span class="line">bluestore block db create = true</span><br><span class="line">bluestore block wal size =134217728</span><br><span class="line">bluestore block wal create =true</span><br><span class="line"></span><br><span class="line">[osd.0]</span><br><span class="line">host = ca12</span><br><span class="line">bluestore block db path =/dev/sdc2</span><br><span class="line">bluestore block wal path =/dev/sdc3</span><br><span class="line">bluestore block path = /dev/sdc4</span><br><span class="line"></span><br><span class="line">[osd.1]</span><br><span class="line">host = ca19</span><br><span class="line">bluestore block db path =/dev/sdd2</span><br><span class="line">bluestore block wal path =/dev/sdd3</span><br><span class="line">bluestore block path = /dev/sdd4</span><br><span class="line"></span><br><span class="line">[osd.2]</span><br><span class="line">host = ca95</span><br><span class="line">bluestore block db path =/dev/sdb2</span><br><span class="line">bluestore block wal path =/dev/sdb3</span><br><span class="line">bluestore block path = /dev/sdb4</span><br><span class="line"></span><br><span class="line">[mgr]</span><br><span class="line">mgr modules = dashboard balancer</span><br><span class="line">mgr data = /data/$name</span><br></pre></td></tr></table></figure>
<blockquote>
<ul>
<li>db path: stores RocksDB data</li>
<li>wal path: RocksDB write-ahead log, for atomic operations</li>
<li>block path: stores user data</li>
</ul>
<p>Metadata, such as the keyring, is stored in the first partition /dev/sdb1.</p>
</blockquote>
<h4 id="4-Mount-osd"><a href="#4-Mount-osd" class="headerlink" title="4. Mount osd"></a>4. Mount osd</h4><figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> ca12</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> mount /dev/sdc1 /data/osd.0</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> touch /data/osd.0/keyring</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> ca19</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> mount /dev/sdd1 /data/osd.1</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> touch /data/osd.1/keyring</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> ca95</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> mount /dev/sdb1 /data/osd.2</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> touch /data/osd.2/keyring</span></span><br></pre></td></tr></table></figure>
<h4 id="5-Deploy-a-MON-in-ca12"><a href="#5-Deploy-a-MON-in-ca12" class="headerlink" title="5. Deploy a MON in ca12"></a>5. Deploy a MON in ca12</h4><figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> create mon keyring</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --<span class="built_in">cap</span> mon <span class="string">'allow *'</span> </span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> create admin keyring</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --<span class="built_in">cap</span> mon <span class="string">'allow *'</span> --<span class="built_in">cap</span> osd <span class="string">'allow *'</span> --<span class="built_in">cap</span> mds <span class="string">'allow'</span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> add admin keyring into mon keyring </span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> generate monmap,and save as /etc/ceph/monmap. Then register ca12 <span class="keyword">in</span> monmap.</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> monmaptool --create --clobber --add ca12 192.168.2.172 --fsid 57077c8f-0a92-42e8-a82c-61198875a30e /etc/ceph/monmap</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> create work dir of mon.ca12</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-mon --mkfs -i ca12 --monmap /etc/ceph/monmap --keyring /etc/ceph/ceph.mon.keyring</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> start mon service</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-mon -i ca12 </span></span><br></pre></td></tr></table></figure>
<h4 id="6-Deploy-a-osd-in-ca12"><a href="#6-Deploy-a-osd-in-ca12" class="headerlink" title="6. Deploy a osd in ca12"></a>6. Deploy a osd in ca12</h4><figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> generate an osd id。</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd create</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> general keyring of osd</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-osd -i 0 --mkfs --mkkey --no-mon-config</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> add keyring of osd into ceph auth</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph auth add osd.0 osd <span class="string">'allow *'</span> mon <span class="string">'allow profile osd'</span> -i /data/osd.0/keyring</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> create a host <span class="keyword">in</span> crushmap</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd crush add-bucket ca12 host</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> add the host into root of crushmap</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd crush move ca12 root=default </span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> add osd.0 into host ca12</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd crush add osd.0 1.0 host=ca12</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> start osd service</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-osd -i 0 </span></span><br></pre></td></tr></table></figure>
<h4 id="3-Deploy-mgr"><a href="#3-Deploy-mgr" class="headerlink" title="3. Deploy mgr"></a>3. Deploy mgr</h4><figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> mkdir /data/mgr.admin</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> generate keyring of mgr</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> bin/ceph --cluster ceph auth get-or-create mgr.admin mon <span class="string">'allow profile mgr'</span> osd <span class="string">'allow *'</span> > /data/mgr.admin/keyring</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> start mgr service</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph-mgr -i admin</span></span><br></pre></td></tr></table></figure>
<p>Now a local cluster is deployed on node <strong>ca12</strong>. We can check its health info from the shell.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">[root@localhost build]# bin/ceph -s</span><br><span class="line">*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***</span><br><span class="line"> cluster:</span><br><span class="line"> id: 57077c8f-0a92-42e8-a82c-61198875a30e</span><br><span class="line"> health: HEALTH_WARN</span><br><span class="line"> 13 mgr modules have failed dependencies</span><br><span class="line"> 1 monitors have not enabled msgr2</span><br><span class="line"> OSD count 1 < osd_pool_default_size 3</span><br><span class="line"> </span><br><span class="line"> services:</span><br><span class="line"> mon: 1 daemons, quorum ca12 (age 18h)</span><br><span class="line"> mgr: 0(active, since 9s)</span><br><span class="line"> osd: 1 osds: 1 up (since 29m), 1 in (since 29m)</span><br><span class="line"> </span><br><span class="line"> data:</span><br><span class="line"> pools: 0 pools, 0 pgs</span><br><span class="line"> objects: 0 objects, 0 B</span><br><span class="line"> usage: 10 GiB used, 36 GiB / 47 GiB avail</span><br><span class="line"> pgs: </span><br></pre></td></tr></table></figure>
<p>Next, let’s deploy more MONs and OSDs on the other nodes.</p>
<h4 id="4-Deploy-more-MONs"><a href="#4-Deploy-more-MONs" class="headerlink" title="4. Deploy more MONs"></a>4. Deploy more MONs</h4><ul>
<li>register new MON in MonMap (Configue in old moniter nodes)</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> monmaptool --add ca19 192.168.2.179 --fsid 57077c8f-0a92-42e8-a82c-61198875a30e /etc/ceph/monmap</span></span><br></pre></td></tr></table></figure>
<ul>
<li>use <code>scp</code> to copy the conf, keyrings, and monmap to the new node.</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> scp -r root@192.168.2.172:/etc/ceph root@192.168.2.179:/etc/</span></span><br></pre></td></tr></table></figure>
<ul>
<li>switch to the new node, and create the work directory of the new monitor.</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ceph-mon --mkfs -i ca19 --monmap /etc/ceph/monmap --keyring /etc/ceph/ceph.mon.keyring</span></span><br></pre></td></tr></table></figure>
<ul>
<li>start mon service</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ceph-mon -i ca19</span></span><br></pre></td></tr></table></figure>
<h4 id="5-Deploy-more-OSDs"><a href="#5-Deploy-more-OSDs" class="headerlink" title="5. Deploy more OSDs"></a>5. Deploy more OSDs</h4><ul>
<li>general osd id in <strong>MON node</strong></li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ceph osd create</span></span><br></pre></td></tr></table></figure>
<ul>
<li>use <code>scp</code> to copy the conf, keyrings, and monmap to the new node.</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> scp -r root@192.168.2.172:/etc/ceph root@192.168.2.179:/etc/</span></span><br></pre></td></tr></table></figure>
<ul>
<li>switch to the new node, and generate the keyring of the new osd.</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ceph-osd -i 1 --mkfs --mkkey --no-mon-config</span></span><br></pre></td></tr></table></figure>
<ul>
<li>add the new keyring to ceph auth</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ceph auth add osd.1 osd <span class="string">'allow *'</span> mon <span class="string">'allow profile osd'</span> -i /data/osd.1/keyring</span></span><br></pre></td></tr></table></figure>
<ul>
<li>add osd into the crush map</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> create a host <span class="keyword">in</span> crushmap</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd crush add-bucket ca19 host</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> add the host into the root of crushmap</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd crush move ca19 root=default </span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> add osd.1 into host ca19</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ceph osd crush add osd.1 1.0 host=ca19</span></span><br></pre></td></tr></table></figure>
<ul>
<li>start osd service</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ceph-osd -i 1 </span></span><br></pre></td></tr></table></figure>
<p>Repeat the same steps on <em>ca95</em> to create <strong>osd.2</strong>.</p>
<p>The result is as follows:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">[root@localhost build]# bin/ceph osd tree</span><br><span class="line">*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***</span><br><span class="line">ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF</span><br><span class="line"> -1 3.00000 root default </span><br><span class="line"> -2 1.00000 host ca12 </span><br><span class="line"> 0 ssd 1.00000 osd.0 up 1.00000 1.00000</span><br><span class="line"> -8 1.00000 host ca19 </span><br><span class="line"> 1 ssd 1.00000 osd.1 up 1.00000 1.00000</span><br><span class="line">-11 1.00000 host ca95 </span><br><span class="line"> 2 ssd 1.00000 osd.2 up 1.00000 1.00000</span><br><span class="line"></span><br><span class="line">[root@localhost build]# bin/ceph -s</span><br><span class="line">*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***</span><br><span class="line"> cluster:</span><br><span class="line"> id: 57077c8f-0a92-42e8-a82c-61198875a30e</span><br><span class="line"> health: HEALTH_WARN</span><br><span class="line"> 13 mgr modules have failed dependencies</span><br><span class="line"> 3 monitors have not enabled msgr2</span><br><span class="line"> </span><br><span class="line"> services:</span><br><span class="line"> mon: 3 daemons, quorum ca12,ca19,ca95 (age 28m)</span><br><span class="line"> mgr: 0(active, since 8h), standbys: admin</span><br><span class="line"> osd: 3 osds: 3 up (since 15m), 3 in (since 15m)</span><br><span class="line"> </span><br><span class="line"> data:</span><br><span class="line"> pools: 0 pools, 0 pgs</span><br><span class="line"> objects: 0 objects, 0 B</span><br><span class="line"> usage: 31 GiB used, 109 GiB / 140 GiB avail</span><br><span class="line"> pgs: </span><br><span class="line"></span><br></pre></td></tr></table></figure>
]]></content>
<categories>
<category>Ceph</category>
</categories>
<tags>
<tag>Ceph</tag>
<tag>shell</tag>
</tags>
</entry>
<entry>
<title>LightNVM + QEMU: Building an Open-Channel SSD Test Environment</title>
<url>/2020/11/10/Lightnvm+qemu%20%E6%90%AD%E5%BB%BAOpen%20Channel%20SSD%E6%B5%8B%E8%AF%95%E7%8E%AF%E5%A2%83/</url>
<content><![CDATA[<p>The LightNVM authors document two ways to set things up. The first uses a real device, such as an Open-Channel SSD from CNEX Labs or Huawei, which is expensive. The second, covered here, uses <a href="https://www.qemu.org/">QEMU</a> to expose an emulated NVMe device with open-channel support to the host. QEMU merely emulates the host controller interface logic of NVMe specification 1.2.1, extended by the authors with LightNVM support, so the host-side driver believes a real device sits underneath. This makes for a fast, convenient way to try LightNVM : )<br>My setup process follows.</p>
<h3 id="配置环境"><a href="#配置环境" class="headerlink" title="配置环境"></a>配置环境</h3><ul>
<li>host: CentOS 7.6</li>
<li>guest: Ubuntu 18.04 on qemu-nvme</li>
</ul>
<h3 id="在主机端安装nvme-qemu"><a href="#在主机端安装nvme-qemu" class="headerlink" title="在主机端安装nvme-qemu"></a>在主机端安装nvme-qemu</h3><ul>
<li><p>克隆增加lightnvm支持的分支,nvme-qemu(<a href="https://github.com/OpenChannelSSD/qemu-nvme">https://github.com/OpenChannelSSD/qemu-nvme</a>)<br><code>$ git clone https://github.com/OpenChannelSSD/qemu-nvme.git</code></p>
</li>
<li><p>Build and install the QEMU source. Note that qemu must be installed on a Linux host. On my machine the installed glusterfs headers did not match the arguments qemu-nvme calls them with (probably a glusterfs version mismatch), so I disabled glusterfs support.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ./configure --python=/usr/bin/python2 --<span class="built_in">enable</span>-kvm --target-list=x86_64-softmmu --<span class="built_in">enable</span>-linux-aio --prefix=(安装的目录,如<span class="variable">$HOME</span>/qemu-nvme) --<span class="built_in">disable</span>-glusterfs</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> make -j8</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> make install</span></span><br></pre></td></tr></table></figure>
</li>
</ul>
<h3 id="创建QEMU虚拟机"><a href="#创建QEMU虚拟机" class="headerlink" title="创建QEMU虚拟机"></a>创建QEMU虚拟机</h3><h4 id="1-创建一个空白的磁盘文件"><a href="#1-创建一个空白的磁盘文件" class="headerlink" title="1. 创建一个空白的磁盘文件"></a>1. 创建一个空白的磁盘文件</h4><p> <code>$ qemu-img create -f qcow2 ubuntu.qcow2 20G</code></p>
<p>The VM's maximum disk size is 20G, in qcow2 format, with the space growing dynamically.</p>
<h4 id="2-下载新版本的Ubuntu镜像"><a href="#2-下载新版本的Ubuntu镜像" class="headerlink" title="2. 下载新版本的Ubuntu镜像"></a>2. 下载新版本的Ubuntu镜像</h4><p>下载Ubuntu 18.04 64位桌面版。</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">$ wget https://releases.ubuntu.com/18.04.5/ubuntu-18.04.5-desktop-amd64.iso</span><br></pre></td></tr></table></figure>
<h4 id="3-在该磁盘文件中安装Ubuntu系统。"><a href="#3-在该磁盘文件中安装Ubuntu系统。" class="headerlink" title="3. 在该磁盘文件中安装Ubuntu系统。"></a>3. 在该磁盘文件中安装Ubuntu系统。</h4><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">$ qemu-system-x86_64 -m 2G -enable-kvm ubuntu.qcow2 -cdrom ~/ubuntu-18.04.5-desktop-amd64.iso</span><br></pre></td></tr></table></figure>
<p>I ran into some problems here. I was connected to the server with Xshell and ran the QEMU VM on the server, and this step threw the following error:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">ALSA lib pulse.c:<span class="number">243</span>:(pulse_connect) PulseAudio: Unable to <span class="built_in">connect</span>: Connection refused</span><br><span class="line"></span><br><span class="line">alsa: Could <span class="keyword">not</span> initialize DAC</span><br><span class="line">alsa: Failed to <span class="built_in">open</span> `<span class="keyword">default</span><span class="number">'</span>:</span><br><span class="line">alsa: Reason: Connection refused</span><br><span class="line">ALSA lib pulse.c:<span class="number">243</span>:(pulse_connect) PulseAudio: Unable to <span class="built_in">connect</span>: Connection refused</span><br><span class="line"></span><br><span class="line">alsa: Could <span class="keyword">not</span> initialize DAC</span><br><span class="line">alsa: Failed to <span class="built_in">open</span> `<span class="keyword">default</span><span class="number">'</span>:</span><br><span class="line">alsa: Reason: Connection refused</span><br><span class="line">audio: Failed to create voice `pcspk<span class="number">'</span></span><br><span class="line">qemu-system-x86_64: Initialization of device isa-pcspk failed: Initializing audio voice failed</span><br></pre></td></tr></table></figure>
<p>The messages suggest no audio output device could be found. After trying several versions, I found that <strong>only the recent master build</strong> has this problem (commit head = b6fb7eb1e9d708b920f24b559c503e68d0eb0329). After googling the error, I worked around it by setting an environment variable.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">[root@localhost OCSSD_vm]# export QEMU_AUDIO_DRV=none</span><br><span class="line">[root@localhost OCSSD_vm]# qemu-system-x86_64 -m 2G -enable-kvm ubuntu.qcow2 -cdrom ~/ubuntu.iso</span><br><span class="line">VNC server running on ::1:5900</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>By default QEMU starts a VNC server; once a client connects, the VM can be operated from the client side. So open a shell on another host, install vncviewer, and connect with it. Note that the connection opens a graphical window directly, so displaying it properly requires <strong>installing Xmanager</strong>.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> apt install vncviewer -y</span></span><br></pre></td></tr></table></figure>
<blockquote>
<p>Xmanager (cracked version) download link: <a href="https://pan.baidu.com/s/1AjcVwoh4euAzxe34bPwghw">https://pan.baidu.com/s/1AjcVwoh4euAzxe34bPwghw</a> password: g8me</p>
</blockquote>
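<p>Since QEMU binds the VNC server to ::1:5900 (localhost only), an alternative to the Xmanager route is to tunnel the VNC port over SSH and run a viewer locally. A minimal sketch, with a hypothetical account and hostname:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"># on your local machine: forward local port 5900 to the server's VNC port</span><br><span class="line">$ ssh -L 5900:localhost:5900 user@server-host</span><br><span class="line"># in a second local shell: connect through the tunnel</span><br><span class="line">$ vncviewer localhost:5900</span><br></pre></td></tr></table></figure>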
<p>Once the GUI is up, go through the normal Ubuntu installation.</p>
<h3 id="加载含有LightNVM模块的内核"><a href="#加载含有LightNVM模块的内核" class="headerlink" title="Load a kernel with the LightNVM module"></a>Load a kernel with the LightNVM module</h3><p>For this step I read a lot of blog posts and tried swapping in a new kernel via qemu, but it always produced odd results: after loading a new kernel through qemu the root filesystem could not be found, and after adding an initrd the system image could not be read, and so on.</p>
<p>After several days of struggling, right when I was about to give up, I noticed a small, inconspicuous virtual device under /dev/ in the stock Ubuntu 18.04 system: lightnvm.</p>
<p>!!????</p>
<p>OK, so the Ubuntu 18.04 kernel on my machine (5.4.0-42-generic) already ships the lightnvm kernel module, and no kernel swap is needed. We can proceed directly to the steps below. (So don't follow old blog posts blindly; much of their content is out of date.)</p>
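<p>Before hunting for a custom kernel, it is worth checking whether the running kernel already has LightNVM built in. Two quick checks on a stock Ubuntu kernel (the kernel config symbol is CONFIG_NVM):</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"># CONFIG_NVM=y means the LightNVM subsystem is compiled in</span><br><span class="line">$ grep CONFIG_NVM= /boot/config-$(uname -r)</span><br><span class="line"># the control device appears once the subsystem is active</span><br><span class="line">$ ls -l /dev/lightnvm/control</span><br></pre></td></tr></table></figure>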
<h3 id="创建Virtual-OCSSD块设备"><a href="#创建Virtual-OCSSD块设备" class="headerlink" title="创建Virtual OCSSD块设备"></a>创建Virtual OCSSD块设备</h3><p>这一步大部分博客的方法都过期,可以直接参考github上的说明。</p>
<p>在主机端创建OCSSD镜像。(2 group,每个group有)</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">$ qemu-img create -f ocssd -o num_grp=2,num_pu=4,num_chk=60 ocssd.img</span><br></pre></td></tr></table></figure>
<p>For details of the create options, check the help output.</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">$ qemu-img create -f ocssd -o help</span><br></pre></td></tr></table></figure>
<p>Once created, attach it to the VM and boot:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> /usr/qemu-nvme/bin/qemu-system-x86_64 -m 4G -<span class="built_in">enable</span>-kvm ./ubuntu.qcow2 -blockdev ocssd,node-name=nvme01,file.driver=file,file.filename=ocssd.img -device nvme,drive=nvme01,serial=deadbeef,id=lnvm</span></span><br></pre></td></tr></table></figure>
<p>Open the VM with vncviewer again, ready to install liblightnvm.</p>
<h3 id="安装liblightnvm"><a href="#安装liblightnvm" class="headerlink" title="Install liblightnvm"></a>Install liblightnvm</h3><p>Inside the VM, first install nvme-cli.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> sudo apt-get install nvme-cli</span></span><br></pre></td></tr></table></figure>
<p>List all NVMe devices. This shows the nvme0n1 device we just added.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> sudo nvme lnvm list</span></span><br><span class="line">Device Block manager Version</span><br><span class="line">nvme0n1 gennvm (1,0,0)</span><br></pre></td></tr></table></figure>
<p>Next, install liblightnvm.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> git <span class="built_in">clone</span> https://github.com/OpenChannelSSD/liblightnvm.git</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">cd</span> liblightnvm</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> sudo apt install cmake libcunit1-dev</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> make configure</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> make</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> sudo make install</span></span><br></pre></td></tr></table></figure>
<p>After installation, verify with the hello-world test code (found in liblightnvm/doc/src/quick_start/hello.c).</p>
<figure class="highlight c"><table><tr><td class="code"><pre><span class="line">#include <stdio.h></span><br><span class="line">#include <liblightnvm.h></span><br><span class="line"></span><br><span class="line">int main(int argc, char **argv)</span><br><span class="line">{</span><br><span class="line">	struct nvm_dev *dev = nvm_dev_open("/dev/nvme0n1");</span><br><span class="line">	if (!dev) {</span><br><span class="line">		perror("nvm_dev_open");</span><br><span class="line">		return 1;</span><br><span class="line">	}</span><br><span class="line">	nvm_dev_pr(dev);	/* print device attributes and geometry */</span><br><span class="line">	nvm_dev_close(dev);</span><br><span class="line"></span><br><span class="line">	return 0;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Save the code above as hello.c,<br>then compile it:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> gcc hello.c -fopenmp -llightnvm -o hello</span></span><br></pre></td></tr></table></figure>
<p>and run it:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> sudo ./hello</span></span><br></pre></td></tr></table></figure>
<p>You should get output identical to running the liblightnvm command-line tool:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> nvm_dev info /dev/nvme0n1</span></span><br></pre></td></tr></table></figure>
<p>The output is:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> Device information -- nvm_dev_pr</span></span><br><span class="line">dev_attr:</span><br><span class="line"> verid: 0x02</span><br><span class="line"> be_id: 0x01</span><br><span class="line"> be_name: 'NVM_BE_IOCTL'</span><br><span class="line"> name: 'nvme0n1'</span><br><span class="line"> path: '/dev/nvme0n1'</span><br><span class="line"> fd: 3</span><br><span class="line"> ssw: 12</span><br><span class="line"> mccap: '00000000000000000000000000000001'</span><br><span class="line"> bbts_cached: 0</span><br><span class="line"> quirks: '00000000'</span><br><span class="line">dev_geo:</span><br><span class="line"> verid: 0x02</span><br><span class="line"> npugrp: 8</span><br><span class="line"> npunit: 4</span><br><span class="line"> nchunk: 1474</span><br><span class="line"> nsectr: 6144</span><br><span class="line"> nbytes: 4096</span><br><span class="line"> nbytes_oob: 16</span><br><span class="line"> tbytes: 1187021586432</span><br><span class="line"> tmbytes: 1132032</span><br><span class="line">dev_cmd_opts:</span><br><span class="line"> mask: '00000000000000000000000011001000'</span><br><span class="line"> iomd: 'SYNC'</span><br><span class="line"> addr: 'VECTOR'</span><br><span class="line"> plod: 'PRP'</span><br><span class="line">dev_vblk_opts:</span><br><span class="line"> pmode: 'SNGL'</span><br><span class="line"> erase_naddrs_max: 64</span><br><span class="line"> read_naddrs_max: 64</span><br><span class="line"> write_naddrs_max: 64</span><br><span class="line"> meta_mode: 0</span><br><span class="line">dev_ppaf: ~</span><br><span class="line">dev_ppaf_mask: ~</span><br><span class="line">dev_lbaf:</span><br><span class="line"> pugrp: 3</span><br><span class="line"> punit: 2</span><br><span class="line"> chunk: 11</span><br><span class="line"> sectr: 13</span><br><span class="line">dev_lbaz:</span><br><span class="line"> pugrp: 26</span><br><span class="line"> punit: 24</span><br><span class="line"> chunk: 13</span><br><span class="line"> sectr: 0</span><br><span class="line">dev_lbam:</span><br><span class="line"> pugrp: '0000000000000000000000000000000000011100000000000000000000000000'</span><br><span class="line"> punit: '0000000000000000000000000000000000000011000000000000000000000000'</span><br><span class="line"> chunk: '0000000000000000000000000000000000000000111111111110000000000000'</span><br><span class="line"> sectr: '0000000000000000000000000000000000000000000000000001111111111111'</span><br></pre></td></tr></table></figure>
<h3 id="参考:"><a href="#参考:" class="headerlink" title="参考:"></a>参考:</h3><ol>
<li><a href="https://www.dazhuanlan.com/2019/10/02/5d9395b689db4/">Ubuntu搭建使用LightNVM开发Open Channel SSD的QEMU虚拟机</a></li>
<li><a href="https://blog.csdn.net/nikokvcs/article/details/84973529">Virtual OCSSD实验平台搭建</a></li>
<li><a href="https://openchannelssd.readthedocs.io/en/latest/commands/">openchannelssd</a></li>
<li><a href="http://lightnvm.io/liblightnvm/quick_start/index.html#cli-hello-open-channel-ssd">liblightnvm</a></li>
<li><a href="https://github.com/OpenChannelSSD/qemu-nvme">github: qemu-nvme</a></li>
<li><a href="https://blog.csdn.net/zlx_csdn/article/details/80672057?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.pc_relevant_is_cache&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.pc_relevant_is_cache">xmanager 5 破解版,有需要自己下载</a></li>
<li><a href="https://blog.51cto.com/568273240/1689280">xshell远程qemu-kvm虚拟机安装</a></li>
</ol>
]]></content>
<categories>
<category>OCSSD</category>
</categories>
<tags>
<tag>lightnvm</tag>
<tag>qemu</tag>
<tag>OCSSD</tag>
</tags>
</entry>
<entry>
<title>DiskSim + SSD extent, SSD FTL simulation - (1) Installation</title>
<url>/2020/11/02/DiskSim+SSD%20extent%E6%A8%A1%E6%8B%9FSSD%E8%A1%8C%E4%B8%BA/</url>
<content><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="\assets\css\APlayer.min.css"><script src="\assets\js\APlayer.min.js" class="aplayer-secondary-script-marker"></script><script class="meting-secondary-script-marker" src="\assets\js\Meting.min.js"></script><p>首先感谢<a href="http://cighao.com/">Hao Chen</a>博客记录的内容。笔者对于DiskSim没有接触,第一次成功安装全是按照前人博客完成。相对而言DiskSim + SSD extent的编译流程还是比较复杂,设计源码和make文件的大量改动。前人栽树后人乘凉,已经有现成的patch文件,避免了我们手动改动代码,不仅耗时还容易出错。</p>
<p><strong>再次感谢!!</strong></p>
<p><strong>笔者使用的系统介绍:centos7.6-64bit,内核版本5.8.6,编译器为4.8.5</strong></p>
<h3 id="安装依赖"><a href="#安装依赖" class="headerlink" title="安装依赖"></a>安装依赖</h3><ul>
<li>linux 如没安装flex、bison的话,先要安装。</li>
</ul>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash">sudo yum install bison flex -y</span></span><br></pre></td></tr></table></figure>
<ul>
<li>Download the source packages</li>
</ul>
<p>disksim 4.0: <a href="http://www.pdl.cmu.edu/DiskSim/">http://www.pdl.cmu.edu/DiskSim/</a><br>SSD extension: <a href="http://research.microsoft.com/en-us/downloads/b41019e2-1d2b-44d8-b512-ba35ab814cd4/">http://research.microsoft.com/en-us/downloads/b41019e2-1d2b-44d8-b512-ba35ab814cd4/</a></p>
<h3 id="解压安装包"><a href="#解压安装包" class="headerlink" title="解压安装包"></a>解压安装包</h3><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">$ tar xfz disksim-4.0-with-dixtrac.tar.gz</span><br><span class="line">$ cd disksim-4.0</span><br><span class="line">$ unzip ../ssd-add-on.zip</span><br></pre></td></tr></table></figure>
<h3 id="补丁"><a href="#补丁" class="headerlink" title="补丁"></a>补丁</h3><p>补丁为:从Hao Chen的github上下载的相关patch文件。</p>
<p>点击链接,下载 <a href="https://github.com/cighao/disksim-4.0-with-ssdmodel-patch">‘modify-patch’</a> 和 <a href="https://github.com/cighao/disksim-4.0-with-ssdmodel-64bit-patch">‘64bit-patch’</a>。下载好后将 <code>modify-patch</code> 和 <code>64bit-patch</code> 这两个文件都放到 <code>disksim-4.0</code> 路径下。</p>
<p>step1. 集成 ssdmodel</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">patch -p1 < ssdmodel/ssd-patch</span><br></pre></td></tr></table></figure>
<p>step2. Apply the source modifications</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">patch -p1 < modify-patch</span><br></pre></td></tr></table></figure>
<p>If you only need to run on a 32-bit system, step3 can be skipped; just run make.</p>
<p>step3. Apply the 64-bit compatibility patch</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">patch -p1 < 64bit-patch</span><br></pre></td></tr></table></figure>
<h3 id="make"><a href="#make" class="headerlink" title="make"></a>make</h3><figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> make</span></span><br></pre></td></tr></table></figure>
<h3 id="运行"><a href="#运行" class="headerlink" title="运行"></a>运行</h3><p>step1. 测试disksim能否顺利执行。</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">cd</span> valid</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ./runvalid</span></span><br></pre></td></tr></table></figure>
<p>If it runs correctly, the output looks like:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">These results represent actual drive validation experiments</span><br><span class="line"></span><br><span class="line">QUANTUM_QM39100TD-SW (rms should be about 0.378)</span><br><span class="line">rms = 0.378078</span><br><span class="line"></span><br><span class="line">SEAGATE_ST32171W (rms should be about 0.349)</span><br><span class="line">rms = 0.347863</span><br><span class="line"></span><br><span class="line">SEAGATE_ST34501N (rms should be about 0.318)</span><br><span class="line">rms = 0.318228</span><br><span class="line"></span><br><span class="line">SEAGATE_ST39102LW (rms should be about 0.107)</span><br><span class="line">rms = 0.107098</span><br></pre></td></tr></table></figure>
<p>step2. Test that the SSD extension's validation script runs correctly.</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> chmod a+x ../ssdmodel/valid/runvalid</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">cd</span> ../ssdmodel/valid</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> ./runvalid</span></span><br></pre></td></tr></table></figure>
<p>If it runs correctly, the output looks like:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">---Running tests with the synthetic workload generator---</span><br><span class="line"></span><br><span class="line">Sequential read (250K I/Os): average SSD response time should be around 0.132 ms</span><br><span class="line">ssd Response time average: 0.132511</span><br><span class="line">Sequential write (250K I/Os): average SSD response time should be around 0.310 ms</span><br><span class="line">ssd Response time average: 0.310895</span><br><span class="line">Sequential write (5M I/Os): average SSD response time should be around 0.334 ms</span><br><span class="line">ssd Response time average: 0.334365</span><br><span class="line">Random read (250K I/Os): average SSD response time should be around 0.136 ms</span><br><span class="line">ssd Response time average: 0.136118</span><br><span class="line">Random write (250K I/Os): average SSD response time should be around 0.329 ms</span><br><span class="line">ssd Response time average: 0.329458</span><br><span class="line">Random write (5M I/Os): average SSD response time should be around 0.593 ms</span><br><span class="line">ssd Response time average: 0.593438</span><br><span class="line">---Running tests with the real traces---</span><br><span class="line"></span><br><span class="line">IOzone: average SSD response time should be around 6.394276 ms</span><br><span class="line">ssd Response time average: 6.394276</span><br><span class="line">Postmark: average SSD response time should be around 4.140330 ms</span><br><span class="line">ssd Response time average: 4.140330</span><br></pre></td></tr></table></figure>
]]></content>
<categories>
<category>OCSSD</category>
</categories>
<tags>
<tag>DiskSim</tag>
<tag>SSD extent</tag>
<tag>OCSSD</tag>
</tags>
</entry>
<entry>
<title>SATA, mSATA, M.2, M.2 (NVMe), and PCIe SSD interfaces explained</title>
<url>/2021/01/19/SATA%E3%80%81mSATA%E3%80%81M.2%E3%80%81M.2%EF%BC%88NVMe%EF%BC%89%E3%80%81PCIE%E5%9B%BA%E6%80%81%E7%A1%AC%E7%9B%98%E6%8E%A5%E5%8F%A3%E8%AF%A6%E8%A7%A3/</url>
<content><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="\assets\css\APlayer.min.css"><script src="\assets\js\APlayer.min.js" class="aplayer-secondary-script-marker"></script><script class="meting-secondary-script-marker" src="\assets\js\Meting.min.js"></script><p>引用自:<a href="https://blog.csdn.net/shuai0845/article/details/98330290">shuai0845 - SATA、mSATA、M.2、M.2(NVMe)、PCIE固态硬盘接口详解</a></p>
<p>目前固态硬盘的主要接口有:</p>
<h2 id="SATA接口"><a href="#SATA接口" class="headerlink" title="SATA接口"></a><strong>SATA接口</strong></h2><p>作为目前应用最多的硬盘接口,SATA 3.0接口最大的优势就是成熟。普通2.5英寸SSD以及HDD硬盘都使用这种接口,理论传输带宽6Gbps,虽然比起新接口的10Gbps甚至32Gbps带宽差多了,但普通2.5英寸SSD也没这么高的需求,500MB/s多的读写速度也够用。</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133229660.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133239257.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<h2 id="mSATA接口"><a href="#mSATA接口" class="headerlink" title="mSATA接口"></a><strong>mSATA接口</strong></h2><p>mSATA接口,全称迷你版SATA接口(mini-SATA)。是早期为了更适应于超级本这类超薄设备的使用环境,针对便携设备开发的mSATA接口应运而生。可以把它看作标准SATA接口的mini版,而在物理接口上(也就是接口类型)是跟mini PCI-E接口是一样的。</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133304630.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>mSATA was an important step in shrinking SSDs, but it never shed some of SATA's shortcomings: it still uses the SATA channel, still at 6Gbps. For various reasons mSATA never took off and was instead superseded by the more upgrade-friendly M.2 SSD.</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133316185.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<h2 id="M-2接口"><a href="#M-2接口" class="headerlink" title="M.2接口"></a><strong>M.2接口</strong></h2><p>M.2接口是Intel推出的一种替代mSATA的新的接口规范,也就是我们以前经常提到的NGFF,即Next Generation Form Factor。</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/2019080313333130.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>An M.2 SSD is 22mm wide and 2.75mm thick single-sided (no more than 3.85mm with flash on both sides), yet M.2 is highly extensible: cards can be up to 110mm long, allowing higher capacities. Like mSATA, M.2 SSDs come without a metal casing. The common sizes are 2242, 2260, and 2280; all 22mm wide, differing only in length.</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133345703.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>Beyond length, the M.2 connector also comes in two keying variants, "socket2" and "socket3".</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133356831.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>They may look like the same M.2 connector, but the protocols they support differ, and the effect on speed is dramatic. M.2 currently supports two buses: SATA and PCI-E. The SATA channel is capped by its theoretical bandwidth (6Gb/s) at about 600MB/s, while the PCI-E channel reaches 10Gb/s. So drives with seemingly identical M.2 connectors can differ widely in speed simply because they travel different "roads".</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/2019080313341120.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>The figure above shows M.2 speeds over the SATA bus.</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133426433.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>The figure above shows M.2 speeds over the PCIe bus.</p>
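<p>On Linux you can usually tell which bus an M.2 drive uses without opening the case; one way to check (device names will vary per machine):</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"># NVMe (PCIe) drives enumerate as /dev/nvme*, SATA drives as /dev/sd*</span><br><span class="line">$ ls /dev/nvme* /dev/sd*</span><br><span class="line"># an NVMe controller is also visible on the PCI bus</span><br><span class="line">$ lspci | grep -i "non-volatile memory"</span><br></pre></td></tr></table></figure>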
<h3 id="M-2接口-NVMe协议"><a href="#M-2接口-NVMe协议" class="headerlink" title="M.2接口(NVMe协议)"></a><strong>M.2接口(NVMe协议)</strong></h3><p>NVM Express(NVMe),或称非易失性内存主机控制器接口规范(Non-Volatile Memory express),是一个逻辑设备接口规范。他是与AHCI类似的、基于设备逻辑接口的总线传输协议规范(相当于通讯协议中的应用层),用于访问通过PCI-Express(PCIe)总线附加的非易失性内存介质,虽然理论上不一定要求 PCIe 总线协议。</p>
<p>此规范目的在于充分利用PCI-E通道的低延时以及并行性,还有当代处理器、平台与应用的并行性,在可控制的存储成本下,极大的提升固态硬盘的读写性能,降低由于AHCI接口带来的高延时,彻底解放SATA时代固态硬盘的极致性能。</p>
<p>NVMe具体优势包括:</p>
<p>①性能有数倍的提升;</p>
<p>②可大幅降低延迟;</p>
<p>③NVMe可以把最大队列深度从32提升到64000,SSD的IOPS能力也会得到大幅提升;</p>
<p>④自动功耗状态切换和动态能耗管理功能大大降低功耗;</p>
<p>⑤NVMe标准的出现解决了不同PCIe SSD之间的驱动适用性问题。</p>
<p><strong>延时更低:</strong></p>
<p>说到NVMe标准对比AHCI标准的优势,其中之一就是低延时。因为AHCI标准本身就是为高延迟的机械硬盘而设,虽然SSD发展至今,主流产品已经开始不能满足性能的高速发展,特别是在延迟方面。而面向SSD产品的NVMe标准,降低存储时出现的高延迟,就是其要解决的问题之一。</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133449868.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>NVMe SSDs effectively reduce latency (image from the web)</p>
<p>At the software level, NVMe's latency is less than half of AHCI's. NVMe streamlines the call path and needs no register reads to issue a command, whereas AHCI requires four register reads per command, consuming about 8000 CPU cycles in total and adding roughly 2.5 microseconds of latency.</p>
<p><strong>Much higher IOPS:</strong></p>
<p>NVMe's other focus is raising SSD IOPS (I/O operations per second). Well-performing SATA SSDs on the market are tested at a queue depth of at most 32; the root cause is that this is AHCI's ceiling, even though many flash controllers could handle deeper queues. NVMe raises the maximum queue depth from 32 to 64000, greatly boosting SSD IOPS.</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133458697.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p><strong>A large jump in queue depth (image from the web)</strong></p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/2019080313351546.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>Low latency plus good parallelism let an SSD's random performance improve dramatically. This is a live benchmark of the 950 PRO series SSD; its random performance is absolutely first-rate, delivering excellent speed at any queue depth.</p>
<p><strong>Lower power consumption:</strong></p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133525280.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>More advanced power management (image from the web)</p>
<p>NVMe adds automatic power-state switching and dynamic power management. After 50ms idle in power state 0, a device can quickly switch to state 1, and after 500ms idle it enters the even lower-power state 2. Switching states incurs a brief delay, but idle power in these states can be kept very low, a big advantage over mainstream SATA SSDs, which especially helps the battery life of laptops and other mobile devices.</p>
<p><strong>Broad driver support:</strong></p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133535511.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>Mainstream operating systems are gradually adding NVMe support (image from the web)</p>
<p>The NVMe standard removes the driver-compatibility problem between PCIe SSDs: an NVMe SSD works across platforms and operating systems without a vendor-supplied driver. Windows, Linux, Solaris, Unix, VMware, UEFI, and others now support NVMe SSDs.</p>
<h2 id="PCI-E接口:"><a href="#PCI-E接口:" class="headerlink" title="The PCI-E interface:"></a><strong>The PCI-E interface:</strong></h2><p>With a traditional SATA drive, a data operation reads from disk into memory, moves into the CPU for computation, writes back to memory, and is then stored to disk. PCI-E is different: data travels over the bus directly to the CPU, skipping the memory-mediated disk access, so transfer efficiency and speed multiply. Put simply, think of the two channels as two identical cars: the PCI-E car drives on a highway while the SATA car drives on a rough mountain road. Clearly, PCI-E SSD transfer speeds far exceed SATA SSDs.</p>
<p><img src= "/img/loading.gif" data-src="https://img-blog.csdnimg.cn/20190803133550963.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NodWFpMDg0NQ==,size_16,color_FFFFFF,t_70" alt="img"></p>
<p>Current PCI-E SSDs use PCI-E 2.0 x2 or PCI-E 3.0 x4 links, topping out at 32Gbps, enough for some time to come. The early problem of PCI-E drives not being bootable has long been solved, and most flagship SSDs now choose PCI-E.</p>
<p>Despite these benefits, PCI-E SSDs are not for everyone. Because of flash and controller quality, their overall cost is high, making them pricier than traditional SATA SSDs. Also, a PCI-e drive consumes bus lanes; entry-level and mid-range platforms have few CPU lanes and are poorly suited to adding a PCI-e SSD. Only top platforms such as Z170, or X79 and X99, can fully realize a PCI-E SSD's performance. In short, if money is no object, go for a PCI-e SSD!</p>
]]></content>
<categories>
<category>SSD</category>
</categories>
<tags>
<tag>SSD</tag>
<tag>interface</tag>
</tags>
</entry>
<entry>
<title>Setting up a LAN proxy with tinyproxy on CentOS 7</title>
<url>/2021/01/19/tinyproxy%E9%85%8D%E7%BD%AE%E5%B1%80%E5%9F%9F%E7%BD%91%E4%BB%A3%E7%90%86/</url>
<content><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="\assets\css\APlayer.min.css"><script src="\assets\js\APlayer.min.js" class="aplayer-secondary-script-marker"></script><script class="meting-secondary-script-marker" src="\assets\js\Meting.min.js"></script><h2 id="背景介绍"><a href="#背景介绍" class="headerlink" title="背景介绍"></a>背景介绍</h2><p>ssh工具对于大部分程序猿是最常使用的工具了。但跨网络使用ssh可能会产生种种问题,以下是我们的场景:</p>
<p>有一个服务器集群,<strong>内部由局域网连接</strong>,同时只指定了<strong>一个公网IP给网关机</strong>。我们在局域网内,使用指定网关的方式,可以很简单的使用外网,但如果想要<strong>从公网上</strong>使用ssh连接到<strong>指定服务器</strong>,却需要额外配置。</p>
<p>通常使用的方法包括:</p>
<ol>
<li>NAT traversal</li>
<li>Forward proxy</li>
</ol>
<p>We focus on the second approach. With a forward proxy, we can establish ssh connections using the single known proxy IP (the gateway's IP) plus the target server's LAN IP. Once the proxy is configured, the cluster can grow or shrink dynamically with no further changes on the proxy server.</p>
<p>The proxy server (gateway) has public IP 222.222.22.2 (a made-up address) and LAN IP 192.168.1.1/24.</p>
<p>The server we want to reach has LAN IP 192.168.1.32/24.</p>
<h2 id="步骤"><a href="#步骤" class="headerlink" title="Steps"></a>Steps</h2><h3 id="安装iptables"><a href="#安装iptables" class="headerlink" title="Install iptables"></a>Install iptables</h3><p>The proxy server runs CentOS 7, so we install iptables for fine-grained access control. On CentOS 6, skip this step.</p>
<ol>
<li>Check whether iptables is installed</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> systemctl status iptables</span></span><br></pre></td></tr></table></figure>
<ol start="2">
<li>If not installed, install iptables</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> yum -y install iptables-services</span></span><br></pre></td></tr></table></figure>
<ol start="3">
<li>Disable SELinux and firewalld</li>
</ol>
<blockquote>
<p> Disable SELinux; if it stays on, iptables will not read its configuration file. </p>
<p> CentOS 7's default firewall is firewalld; it must be stopped before iptables can be used.</p>
</blockquote>
<p>First, disable SELinux:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> setenforce 0 <span class="comment"># disable SELinux immediately</span></span></span><br><span class="line"><span class="meta">$</span><span class="bash"> vim /etc/selinux/config <span class="comment"># the file edit takes effect after a reboot</span></span></span><br></pre></td></tr></table></figure>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> This file controls the state of SELinux on the system.</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> SELINUX= can take one of these three values:</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> enforcing - SELinux security policy is enforced.</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> permissive - SELinux prints warnings instead of enforcing.</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> disabled - No SELinux policy is loaded.</span></span><br><span class="line">SELINUX=disabled # set to disabled</span><br><span class="line"><span class="meta">#</span><span class="bash"> SELINUXTYPE= can take one of three values:</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> targeted - Targeted processes are protected,</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> minimum - Modification of targeted policy. Only selected processes are protected. </span></span><br><span class="line"><span class="meta">#</span><span class="bash"> mls - Multi Level Security protection.</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> SELINUXTYPE=targeted <span class="comment"># comment out this line</span></span></span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>Then, stop the firewall:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> stop the firewall</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> systemctl stop firewalld</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> disable it at boot</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> systemctl <span class="built_in">disable</span> firewalld</span></span><br></pre></td></tr></table></figure>
<ol start="4">
<li>Configure iptables</li>
</ol>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line"># 打开iptables文件</span><br><span class="line">$ vim /etc/sysconfig/iptables</span><br></pre></td></tr></table></figure>
<p>完成的iptables配置文件如下:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> Generated by iptables-save v1.4.7 on Tue Nov 17 17:11:57 2020</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> 配置nat,</span></span><br><span class="line">*nat </span><br><span class="line">:PREROUTING ACCEPT [639:204921]</span><br><span class="line">:POSTROUTING ACCEPT [10:682]</span><br><span class="line">:OUTPUT ACCEPT [13:854]</span><br><span class="line">-A POSTROUTING -s 192.168.1.0/24 -j SNAT --to-source 222.222.22.2</span><br><span class="line">COMMIT</span><br><span class="line"><span class="meta">#</span><span class="bash"> Completed on Tue Nov 17 17:11:57 2020</span></span><br><span class="line"><span class="meta">#</span><span class="bash"> Generated by iptables-save v1.4.7 on Tue Nov 17 17:11:57 2020</span></span><br><span class="line">*filter</span><br><span class="line">:INPUT ACCEPT [0:0]</span><br><span class="line">:FORWARD ACCEPT [0:0]</span><br><span class="line">:OUTPUT ACCEPT [12404:4566425]</span><br><span class="line">-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT</span><br><span class="line">-A INPUT -p icmp -j ACCEPT</span><br><span class="line">-A INPUT -i lo -j ACCEPT</span><br><span class="line"><span class="meta">#</span><span class="bash"> 开启22端口,允许ssh访问本机</span></span><br><span class="line">-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT</span><br><span class="line"><span class="meta">#</span><span class="bash"> 开启9001端口,允许本机作为tinyproxy的代理服务器</span></span><br><span class="line">-A INPUT -p tcp -m state --state NEW -m tcp --dport 9001 -j ACCEPT</span><br><span class="line"><span class="meta">#</span><span class="bash"> 开启5001端口,允许使用本机作为iperf测试服务器</span></span><br><span class="line">-A INPUT -p tcp -m state --state NEW -m tcp --dport 5001 -j ACCEPT</span><br><span class="line">-A INPUT -j REJECT --reject-with icmp-host-prohibited</span><br><span class="line"><span class="meta">#</span><span class="bash"> 允许外网流量进入192.168.1.0网段内的机器,即局域网</span></span><br><span class="line">-A FORWARD -d 192.168.1.0/24 -j ACCEPT</span><br><span class="line">-A FORWARD -s 192.168.1.0/24 -j ACCEPT</span><br><span class="line">-A INPUT -p icmp -j ACCEPT</span><br><span class="line">-A FORWARD -j REJECT --reject-with icmp-host-prohibited</span><br><span class="line">COMMIT</span><br><span class="line"><span class="meta">#</span><span class="bash"> Completed on Tue Nov 17 17:11:57 2020</span></span><br></pre></td></tr></table></figure>
<ol start="4">
<li>开启iptables服务</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> systemctl start iptables <span class="comment">#启动</span></span></span><br><span class="line"><span class="meta">$</span><span class="bash"> systemctl <span class="built_in">enable</span> iptables <span class="comment">#设置开机自启</span></span></span><br></pre></td></tr></table></figure>
<h3 id="安装tinyproxy"><a href="#安装tinyproxy" class="headerlink" title="安装tinyproxy"></a>安装tinyproxy</h3><ol>
<li>安装epel源,能够安装额外的软件包(包括tinyproxy)</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> yum install -y epel-release</span></span><br></pre></td></tr></table></figure>
<ol start="2">
<li>Install tinyproxy</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> yum -y install tinyproxy</span></span><br></pre></td></tr></table></figure>
<ol start="3">
<li>Configure tinyproxy. First, open the configuration file</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> vim /etc/tinyproxy/tinyproxy.conf</span></span><br></pre></td></tr></table></figure>
<p>The settings we need to change:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line"># proxy listen port</span><br><span class="line">Port 9001 </span><br><span class="line"></span><br><span class="line"># maximum number of simultaneous connections</span><br><span class="line">MaxClients 100000 </span><br><span class="line"></span><br><span class="line"># ports the proxy may CONNECT to; we need to proxy ssh, so add 22</span><br><span class="line">ConnectPort 443</span><br><span class="line">ConnectPort 563</span><br><span class="line">ConnectPort 22</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<ol start="4">
<li>Start tinyproxy</li>
</ol>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> systemctl start tinyproxy</span></span><br></pre></td></tr></table></figure>
<h2 id="效果"><a href="#效果" class="headerlink" title="效果"></a>效果</h2><p>完成上述配置后,我们可以使用代理ip+内网ip的方式访问内网服务器。</p>
]]></content>
<categories>
<category>proxy</category>
</categories>
<tags>
<tag>http proxy</tag>
<tag>tinyproxy</tag>
<tag>代理</tag>
</tags>
</entry>
</search>