<html>
<head>
<!-- Google tag (gtag.js) -->
<!-- <script async src="https://www.googletagmanager.com/gtag/js?id=G-VTKHNTKBM4"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-VTKHNTKBM4');
</script> -->
<meta charset="utf-8" />
<title>Botany-Bot</title>
<!-- TODO double check these are updated -->
<!-- Website Metadata -->
<meta content="Gaussian Splatting + GARField + Robot Interaction for plant Digital Twins"
name="description" />
<meta content="Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats" property="og:title" />
<meta content="Create a segmented 3D reconstruction of plants with Gaussian Splatting and Garfield, then use a robot's interaction with the plant to reveal even more information."
property="og:description" />
<meta content="https://berkeleyautomation.github.io/Botany-Bot/data/preview_card.png" property="og:image" />
<meta content="Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats" property="twitter:title" />
<meta content="Create a segmented 3D reconstruction of plants with Gaussian Splatting and Garfield, then use a robot's interaction with the plant to reveal even more information."
property="twitter:description" />
<meta content="https://berkeleyautomation.github.io/Botany-Bot/data/preview_card.png" property="twitter:image" />
<meta property="og:type" content="website" />
<meta content="summary_large_image" name="twitter:card" />
<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1" />
<!-- Fonts -->
<link href="https://fonts.googleapis.com" rel="preconnect" />
<link href="https://fonts.gstatic.com" rel="preconnect" crossorigin="anonymous" />
<script src="https://ajax.googleapis.com/ajax/libs/webfont/1.6.26/webfont.js" type="text/javascript"></script>
<script
type="text/javascript">WebFont.load({ google: { families: ["Lato:100,100italic,300,300italic,400,400italic,700,700italic,900,900italic", "Montserrat:100,100italic,200,200italic,300,300italic,400,400italic,500,500italic,600,600italic,700,700italic,800,800italic,900,900italic", "Ubuntu:300,300italic,400,400italic,500,500italic,700,700italic", "Open Sans:300,300italic,400,400italic,600,600italic,700,700italic,800,800italic", "Changa One:400,400italic", "Varela Round:400", "Bungee Shade:regular", "Roboto:300,regular,500", "Bungee Outline:regular"] } });</script>
<!--[if lt IE 9]><script src="https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv.min.js" type="text/javascript"></script><![endif]-->
<!-- JQuery, scripts etc -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script src="script.js" type="text/javascript"></script>
<script src="js/carousel_utils.js" type="text/javascript"></script>
<!-- Stylesheets; tabler icons, fonts, ...-->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@tabler/icons@latest/iconfont/tabler-icons.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<link href="style.css" rel="stylesheet" type="text/css" />
<link href="data/botanybot.png" rel="shortcut icon" type="image/x-icon" />
</head>
<body>
<div>
<!-- Title, Subtitle, Authors -->
<div>
<h1 class="title">
<span style="text-wrap: nowrap">Botany-Bot: Digital Twin Monitoring</span> <span style="text-wrap: nowrap">of Occluded and Underleaf Plant Structures with Gaussian Splats</span>
</h1>
<!-- Author list -->
<div class="base-row">
<!-- Refer to how author wrapping is performed in https://brentyi.github.io/tilted/ -->
<div style="display: flex; flex-wrap: wrap; justify-content: center;">
<div>
<a href="https://simeonoa.github.io/" target="_blank" class="author-text">
Simeon Adebola
</a><sup>1</sup>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://chungmin99.github.io/" target="_blank" class="author-text">
Chung Min Kim
</a><sup>1</sup>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://kerrj.github.io/" target="_blank" class="author-text">
Justin Kerr
</a><sup>1</sup>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://ehehee.github.io/" target="_blank" class="author-text">
Shuangyu Xie
</a><sup>1</sup>
</div>
<div>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://scholar.google.com/citations?user=SGXXVx8AAAAJ&hl=en" target="_blank" class="author-text">
Prithvi Akella
</a><sup>2</sup>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://scholar.google.com/citations?user=vk2qKkYAAAAJ&hl=en" target="_blank" class="author-text">
Jose Luis Susa Rincon
</a><sup>2</sup>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://scholar.google.com.au/citations?user=AvvaaJcAAAAJ&hl=en" target="_blank" class="author-text">
Eugen Solowjow
</a><sup>2</sup>
<div style="width: 1.25em; display: inline-block"></div>
<a href="https://goldberg.berkeley.edu/" target="_blank" class="author-text">
Ken Goldberg
</a><sup>1</sup>
</div>
</div>
</div>
<div style="text-align: center">
<h1 id="uc-berkeley"><sup>1</sup>UC Berkeley </h1>
<h1 id="uc-berkeley"><sup>2</sup>Siemens Research Lab, Berkeley</h1>
<!-- <span class="text-star">*</span>
Denotes Equal Contribution -->
</div>
</div>
<!-- Submission status -->
<div class="title-row">
<!-- <h2 class="subheader">CoRL 2024 (Oral)</h1> -->
<h2 class="subheader">IROS 2025</h2>
</div>
<!-- Paper / code / data URLs -->
<!-- TODO: Update arxiv/code/data link -->
<div class="base-row add-top-padding">
<!-- Paper -->
<a href="https://arxiv.org/abs/2510.17783" target="_blank" class="link-block">
<figure>
<img src="https://uploads-ssl.webflow.com/51e0d73d83d06baa7a00000f/5cab99df4998decfbf9e218e_paper-01.png"
alt="paper"
srcset="https://uploads-ssl.webflow.com/51e0d73d83d06baa7a00000f/5cab99df4998decfbf9e218e_paper-01-p-500.png"
style="max-height: 4em" />
</figure>
<figcaption>
<strong class="link-labels-text">Paper </strong>
</figcaption>
</a>
<!-- Code -->
<a href="https://berkeleyautomation.github.io/Botany-Bot/" target="_blank" class="link-block">
<figure>
<img src="https://uploads-ssl.webflow.com/51e0d73d83d06baa7a00000f/5cae3b53b42ebb3dd4175a82_68747470733a2f2f7777772e69636f6e66696e6465722e636f6d2f646174612f69636f6e732f6f637469636f6e732f313032342f6d61726b2d6769746875622d3235362e706e67.png"
alt="code" style="max-height: 4em" />
</figure>
<figcaption>
<strong class="link-labels-text">Code</strong>
</figcaption>
<figcaption>
<strong class="link-labels-text">Coming Soon </strong>
</figcaption>
</a>
<!-- Data -->
<a href="https://huggingface.co/datasets/SimeonOA/Botany-Bot" target="_blank" class="link-block">
<figure>
<img src="data/database_icon.jpg" alt="data" style="max-height: 4em" />
</figure>
<figcaption>
<strong class="link-labels-text">Data </strong>
</figcaption>
</a>
</div>
<!-- TL;DR + Teaser video -->
<div class="section base-row add-top-padding">
<h1 class="tldr">
<b>TL;DR</b>:
Botany-Bot uses Gaussian Splatting and GARField to create a segmented 3D reconstruction of plants, then uses this reconstruction to guide a robot's interaction with the plant to reveal even more information.
</h1>
<video id="main-video" autobuffer muted autoplay loop controls playsinline>
<source id="mp4" src="data/supp_video.mp4" type="video/mp4">
</video>
</div>
<!-- Abstract -->
<div class="section base-row add-top-padding">
<h1>Abstract</h1>
<p class="paragraph">
Commercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed “annotated digital twins” of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmented Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the underside/topside of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed overside/underside images with 77.3% accuracy.
</p>
</div>
<div class="section base-row add-top-padding">
<h1>Full Pipeline</h1>
<h1 class="tldr">
Botany-Bot builds a <b>3D model</b> of a plant from a scan, then uses a robot arm to manipulate the plant's leaves to reveal information that the 3D model alone does not provide. <br>
</h1>
<img src="data/pipeline.png" style="max-width: 100%" />
</div>
<div class="section add-top-padding">
<div class="base-row">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
</div>
</div>
<div class="section base-row add-top-padding">
<h1>Plant Modeling</h1>
<p class="paragraph">
Botany-Bot uses a light box, two fixed cameras and a digital turntable to obtain a plant scan. To obtain multi-view camera poses we place an ArUco
marker on the turntable and calibrate the camera-to-turntable pose for angles through which the turntable moves. Next, we place the plant on top of
the turntable and repeat the same angles, which results in a multi-view posed capture. We utilize two ZED 2Stereo cameras oriented vertically, for a total of 4 angles of elevation, and rotate the turntable to evenly spaced radial angles. Every plant also has an ArUco marker which we use to save a relative pose between the plant and the turntable by calculating the relative pose between the camera-
to-turntable pose and the camera-to-plant pose.
</p>
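<p class="paragraph">The relative-pose bookkeeping above can be sketched with homogeneous transforms: given the camera-to-turntable and camera-to-plant poses (each estimated from its ArUco marker), the plant's pose in the turntable frame is their composition. A minimal NumPy sketch with illustrative poses, not the system's calibration code:</p>

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 homogeneous transform from a rotation and translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(camera_T_turntable, camera_T_plant):
    """Pose of the plant in the turntable frame, composed from the two
    camera-relative poses (each estimated from its ArUco marker)."""
    return np.linalg.inv(camera_T_turntable) @ camera_T_plant

# Illustrative poses: turntable rotated 90 degrees about z, half a meter
# in front of the camera; plant offset 0.1 m along the camera x-axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
camera_T_turntable = make_pose(Rz, [0.0, 0.0, 0.5])
camera_T_plant = make_pose(np.eye(3), [0.1, 0.0, 0.5])
turntable_T_plant = relative_pose(camera_T_turntable, camera_T_plant)
# The plant sits at (0, -0.1, 0) in the turntable frame.
```

<p class="paragraph">Saving this relative pose once means the plant can be re-localized from either marker in later scans.</p>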
<img src="data/scan_platform.png" style="max-width: 100%" />
<h1>3D Plant Reconstruction and Segmentation</h1>
<p class="paragraph">
The rotating-turntable multi-view capture breaks the core assumption in NeRF
and 3DGS that the scene remains static during capture, in two
ways: 1) the background around the object is static relative
to the camera, and 2) lighting on the surface of the object
is not 3D-consistent. To alleviate 1), we preprocess the input
data by automatically masking the potted plant with <a href="https://github.com/facebookresearch/sam2" class="author-text" target="_blank">Segment
Anything 2 (SAM 2)</a>. During radiance field construction,
we do not compute the standard loss functions on pixels lying
outside this mask. We also implement an extra L1 loss between
the potted plant’s mask and the accumulation in the Gaussian
Splatting reconstruction, which allows us to delete spurious geometry in the scene. We refer to this loss as an alpha loss.
We use <a href="https://www.garfield.studio/" class="author-text" target="_blank">GARField</a> to segment the various parts of the plants. Below, see the Gaussian Splatting reconstructions of the plants; click on parts to see the resulting segmentation.
</p>
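<p class="paragraph">The alpha loss described above can be sketched as an L1 term between the rendered accumulation (per-pixel opacity) and the SAM 2 mask. A minimal NumPy sketch, assuming dense per-pixel renders rather than the actual per-ray training loop:</p>

```python
import numpy as np

def masked_alpha_loss(accumulation, mask):
    """L1 penalty pulling rendered opacity ("accumulation") toward the
    SAM 2 plant mask: pixels outside the mask should render empty,
    pixels inside should be fully covered. This suppresses spurious
    floaters; a sketch, not the actual training code."""
    return float(np.abs(accumulation - mask).mean())

# Illustrative 2x2 render: left column is plant, right column background.
accumulation = np.array([[0.9, 0.2],
                         [0.8, 0.1]])   # rendered per-pixel opacity
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])           # binary SAM 2 plant mask
loss = masked_alpha_loss(accumulation, mask)
```

<p class="paragraph">In training, this term would be added to the standard photometric loss with some weight, so opaque geometry outside the mask is penalized and removed.</p>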
<div id="main-results">
<h1 class="tldr"><b>Plant Visualizations</b></h1>
<div id="iframe-container" class="iframe-container">
<div class="click-and-move-overlay">
<h1 class="tldr">
<b>
<img src="data/drag_icon.png" alt="" class="inline-image">
Click and move me!
<img src="data/drag_icon.png" alt="" class="inline-image">
</b>
</h1>
</div>
<iframe
id="pepper1"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/pepper1.viser&initDistanceScale=15"
></iframe>
<iframe
id = "pepper2"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/pepper2.viser&initDistanceScale=15"
></iframe>
<iframe
id = "pepper3"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/pepper3.viser&initDistanceScale=15"
></iframe>
<iframe
id = "alocasia"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/alocasia.viser&initDistanceScale=12"
></iframe>
<iframe
id = "anthurium"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/anthurium.viser&initDistanceScale=12"
></iframe>
<iframe
id="pinkprincess"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/pinkprincess.viser&initDistanceScale=12"
></iframe>
<iframe
id="belize"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/belize.viser&initDistanceScale=12"
></iframe>
<iframe
id="croton"
class = "iframe"
data-src="https://berkeleyautomation.github.io/Botany-Bot/build/?playbackPath=https://berkeleyautomation.github.io/Botany-Bot/recordings/croton.viser&initDistanceScale=12"
></iframe>
</div>
<div style="position: relative; display: flex;">
<button class="results-slide-arrow" id="results-slide-arrow-prev" onclick="results_slide_left()">
‹
</button>
<div class="results-slide-row" id="results-objs-scroll">
<div data-img-src="data/thumbnails/pepper1_zoomed.jpg" data-id="pepper1-thumb" data-label="Pepper 1"></div>
<div data-img-src="data/thumbnails/pepper2_zoomed.jpg" data-id="pepper2-thumb" data-label="Pepper 2"></div>
<div data-img-src="data/thumbnails/pepper3_zoomed.jpg" data-id="pepper3-thumb" data-label="Pepper 3"></div>
<div data-img-src="data/thumbnails/alocasia_zoomed.jpg" data-id="alocasia-thumb" data-label="Alocasia"></div>
<div data-img-src="data/thumbnails/anthurium_zoomed.jpg" data-id="anthurium-thumb" data-label="Anthurium"></div>
<div data-img-src="data/thumbnails/pinkprincess_zoomed.jpg" data-id="pinkprincess-thumb" data-label="Pink Princess"></div>
<div data-img-src="data/thumbnails/belize_zoomed.jpg" data-id="belize-thumb" data-label="Belize"></div>
<div data-img-src="data/thumbnails/croton_zoomed.jpg" data-id="croton-thumb" data-label="Croton"></div>
</div>
<button class="results-slide-arrow" id="results-slide-arrow-next" onclick="results_slide_right()">
›
</button>
</div>
</div>
<p class="tldr">These 3D segmented reconstructions are rendered in-browser! If you think that's cool, check out <a href="https://viser.studio/latest/" class="author-text" target="_blank">Viser</a>!</p>
</div>
<div class="section add-top-padding">
<div class="base-row">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
</div>
</div>
<div class="section base-row add-top-padding">
<h1>Inspection Planning</h1>
<p class="paragraph">
Certain plant regions, such as leaf undersides, can be under-reconstructed in the resulting 3D model, yet they are crucial for detecting issues such as pest infestations or diseases. To
address this, Botany-Bot uses robot interaction with a custom
end-effector to lift/push down each leaf toward a static
camera, capturing high-resolution underside/overside images.
<br>
</p>
<img src="data/robot_setup_fig_newer.png" style="max-width: 100%" />
<p class="paragraph">
To achieve this, we define three task primitives for
manipulating a target leaf using the robot arm with its
inspection tool and the turntable:
<ol class="paragraph">
<li>
<b>Rotation Alignment</b>: The turntable rotates the target leaf
by θ such that its principal
axis (i.e., the stem direction of the leaf) q<sub>i</sub> is aligned with
the high-resolution camera z-axis within a small margin ϵ,
and the leaf center lies on the camera z-axis. We denote the
set of turntable rotations that satisfies the alignment condition
as A<sub>rotate</sub> ⊂ SO(2).
</li>
<li><b>Tool Positioning</b>: The inspection tool is moved directly
above or underneath the leaf center to position it for
lifting/pushing. This step ensures that the tool’s position
p aligns with the leaf center in the horizontal plane:
x<sub>prepare</sub> = (x<sub>i</sub>, y<sub>i</sub>, z<sub>t<sub>initial</sub></sub>), where z<sub>t<sub>initial</sub></sub> is the initial height
of the tool. The tool also needs to be parallel to the
leaf surface. We denote the inspection tool pose set that
satisfies this positioning condition as A<sub>prepare</sub> ⊂ SE(3).</li>
<li><b>Manipulation</b>: The inspection tool moves upward/downward
while simultaneously rotating to lift/push down the
leaf. The upward motion follows x<sub>lift</sub> = (x<sub>i</sub>, y<sub>i</sub>, z<sub>t<sub>final</sub></sub>)
where z<sub>t<sub>final</sub></sub> > z<sub>t<sub>initial</sub></sub>, while the downward motion follows
x<sub>push</sub> = (x<sub>i</sub>, y<sub>i</sub>, z<sub>t<sub>final</sub></sub>) where z<sub>t<sub>final</sub></sub> < z<sub>t<sub>initial</sub></sub>. Simultaneously,
the inspection tool rotates by an angle ϕ, applied as
R<sub>lift/push</sub> ⊂ SO(3), so that the leaf is lifted/pushed down
in a controlled manner. The inspection tool pose set
A<sub>lift/push</sub> ⊂ SE(3) satisfies the lifting/pushing condition.</li>
</ol>
</p>
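<p class="paragraph">The Rotation Alignment primitive can be sketched as a search over turntable angles: rotate the leaf's principal axis q<sub>i</sub> about the vertical axis and keep the angles that bring it within ϵ of the camera z-axis. A brute-force NumPy sketch with illustrative vectors, yielding a discretized analogue of A<sub>rotate</sub> rather than the paper's planner:</p>

```python
import numpy as np

def alignment_angles(leaf_axis, camera_z, eps=0.05, n=360):
    """Brute-force sketch of Rotation Alignment: return the turntable
    angles theta (rotations about the vertical axis) that bring the
    leaf principal axis within eps radians of the camera z-axis.
    Names, eps, and the grid search are illustrative assumptions."""
    leaf_axis = leaf_axis / np.linalg.norm(leaf_axis)
    camera_z = camera_z / np.linalg.norm(camera_z)
    good = []
    for theta in np.linspace(0.0, 2.0 * np.pi, n, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        # Angle between the rotated leaf axis and the camera optical axis.
        cos_angle = np.clip((Rz @ leaf_axis) @ camera_z, -1.0, 1.0)
        if np.arccos(cos_angle) < eps:
            good.append(float(theta))
    return good

# A leaf pointing along +y aligns with a camera looking along +x
# after roughly a 270-degree turntable rotation.
angles = alignment_angles(np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0]))
```

<p class="paragraph">Any angle in the returned set is a valid alignment; in practice the planner would also prefer angles that keep the leaf unoccluded and reachable by the arm.</p>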
</div>
<div class="section base-row add-top-padding">
<h1>Leaf Lifting/Pushing</h1>
<div class="video-container" style="max-width: 10000px;">
<div class="video-item">
<video autoplay muted loop playsinline>
<source src="data/lifting-pushing.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</div>
</div>
</div>
<!-- Just for vertical spacing... -->
<div class="section add-top-padding">
<div class="base-row">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
</div>
</div>
<div class="section base-row add-top-padding">
<h1>Experiments</h1>
<div>
<p class="paragraph">
We evaluate the following three metrics for 3D reconstruction:
</p>
<ol class="paragraph">
<li>
Ratio of leaves correctly <b>segmented</b> per plant
</li>
<li>
Ratio of leaves successfully <b>detected</b> per plant
</li>
<li>
Physical accuracy of estimated leaf area and leaf height, measured against ground truth on a selected subset of leaves
</li>
</ol>
<p class="paragraph">
and these two metrics for autonomous robot leaf inspection:
</p>
<ol class="paragraph">
<li>
Number of leaves autonomously <b>lifted</b> by robot
</li>
<li>
Undersides of leaves fully <b>revealed</b> to high-res camera
</li>
</ol>
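<p class="paragraph">
As a concrete illustration, the metrics above could be computed with a minimal Python sketch like the following; the function names and the example counts/measurements are hypothetical, not taken from our evaluation code:
</p>

```python
# Hypothetical sketch of the evaluation metrics described above.
# All names and example numbers are illustrative.

def segmentation_ratio(correct_leaves: int, total_leaves: int) -> float:
    """Fraction of a plant's leaves with correct segmentation masks."""
    return correct_leaves / total_leaves

def detection_ratio(detected_leaves: int, total_leaves: int) -> float:
    """Fraction of a plant's leaves that are detected at all."""
    return detected_leaves / total_leaves

def relative_error(estimate: float, ground_truth: float) -> float:
    """Relative error of an estimated physical quantity (area or height)."""
    return abs(estimate - ground_truth) / ground_truth

# Example: a plant with 10 leaves, 9 detected, 8 segmented correctly;
# one leaf's area estimated at 21.0 cm^2 against a 20.0 cm^2 measurement.
print(segmentation_ratio(8, 10))   # 0.8
print(detection_ratio(9, 10))      # 0.9
print(relative_error(21.0, 20.0))  # 0.05
```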
</div>
<img src="data/table1.png" style="max-width: 100%" />
<img src="data/table2.png" style="max-width: 100%" />
<img src="data/table3.png" style="max-width: 100%" />
</div>
<div class="section add-top-padding">
<div class="base-row">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
<img src="data/botanybot.png" style="height: 40px; margin-left: 5px; margin-right: 5px; margin-top: 40px; margin-bottom: 40px;">
</div>
</div>
<div class="section base-row add-top-padding">
<h1>Limitations and Failures</h1>
<p class="paragraph">
Across 68 safely accessible leaves on 8 plants,
the robot successfully lifts/pushes 53 leaves. When the
leaf is not sufficiently aligned with the camera, we mark
this as a failure even if the lift/push motion executes
correctly. In one notable case (Croton), the robot successfully
pushes down a leaf, but the leaf breaks; we exclude
this leaf from the reported results. Besides breakage,
lifting/pushing has three main failure cases:
</p>
<ol class="paragraph">
<li>Leaf obstructions</li>
<li>Plant dynamics</li>
<li>Pose error</li>
</ol>
<p class="paragraph">
A robot may fail to interact with a leaf properly if the gripper
accidentally catches on a lower leaf during its motion, bending
the stem and rotating the target leaf out of the way. Also,
even if the robot makes proper contact with the leaf initially,
the leaf may slip out of the way depending on how it is
connected to the stem. Any pose registration error only
exacerbates these problems. Solving this would require some
form of closed-loop visual servoing to detect and correct such errors.
</p>
<p class="paragraph">
Out of the 53 leaves that were pushed/lifted, the
overside/underside is fully visible in 41 cases. For observing leaf oversides/undersides, the biggest challenges are singulating the leaf and choosing the correct distance to lift/push it. Most failure cases (6/12)
are due to a nearby leaf that also gets lifted up/pushed down
and blocks the underside of the target leaf from the camera
view. Collision-free, contact-aware motion planning with the
3D plant model would be required to carefully “burrow” between
leaves. In another failure mode (5/12), the leaf is
lifted/pushed, but not far enough to expose its underside/overside.
This is because the leaf lift/push distance selection is a naive
implementation: the lift/push distance should depend on the
leaf’s position relative to the camera image center, as leaves
lower in the camera view should be lifted up a larger
amount while leaves higher in the camera view should
be pushed down a larger amount. Another solution could be
a closed-loop motion that captures an image just
before losing leaf contact. Lastly, we notice that the gripper
did not always orient parallel to the leaf surface; improving the plant-specific tuning of this gripper-orientation parameter could help.
</p>
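<p class="paragraph">
The position-dependent distance selection suggested above could be sketched as follows. This is a hypothetical linear mapping, not our implementation; the function name and constants are illustrative, with distances in centimeters:
</p>

```python
def lift_push_distance(leaf_y: float, image_height: float,
                       min_dist: float = 2.0, max_dist: float = 8.0) -> float:
    """Scale the lift/push distance (cm) by how far the leaf sits from the
    image center: leaves lower in the image (larger y) get a larger lift,
    leaves higher in the image get a larger push-down."""
    # Normalized offset from the image center, in [-1, 1]:
    # negative = above center (push down), positive = below center (lift up).
    offset = (leaf_y - image_height / 2) / (image_height / 2)
    return min_dist + abs(offset) * (max_dist - min_dist)

# A leaf at the image center gets the minimum motion; a leaf at the
# bottom (or top) edge gets the maximum lift (or push-down).
print(lift_push_distance(240, 480))  # 2.0
print(lift_push_distance(480, 480))  # 8.0
```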
</div>
<div class="section citation" style="margin-top: 50px">
<h1 id="abstract">Citation </h1>
<p class="paragraph"> If you use this work or find it helpful, please consider citing: </p>
<pre id="codecell0">@inproceedings{adebola2025botanybot,
title={Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats},
author={Simeon Adebola and Chung Min Kim and Justin Kerr and Shuangyu Xie and Prithvi Akella and Jose Luis Susa Rincon and Eugen Solowjow and Ken Goldberg},
booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2025},
}
</pre>
</div>
<footer>
<div class="section" style="margin-top: 40px;">
<p class="paragraph">
The website template is adapted from the <a class="author-text" href="https://robot-see-robot-do.github.io/">Robot See Robot Do</a>
project page.
</p>
</div>
</footer>
</div>
</body>