robinbaumann.github.io/gsoc20.html at master · RobinBaumann/robinbaumann.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1.0, user-scalable=no"/>
    <meta name="theme-color" content="#008080">
    <title>Robin Baumann</title>

    <!-- CSS  -->
    <link href="min/plugin-min.css" type="text/css" rel="stylesheet">
    <link href="min/custom-min.css" type="text/css" rel="stylesheet" >
    <link href="img/icons/hackerrank/styles.css" type="text/css" rel="stylesheet">
</head>

<style>
        ul.custom {list-style-type: circle;}
</style>

<body id="top" class="scrollspy">

<!-- Pre Loader -->
<div id="loader-wrapper">
    <div id="loader"></div>

    <div class="loader-section section-left"></div>
    <div class="loader-section section-right"></div>

</div>

<!--Navigation-->
 <div class="navbar-fixed">
    <nav id="nav_f" class="default_color" role="navigation">
        <div class="container">
            <div class="nav-wrapper">
            <a href="index.html" id="logo-container" class="brand-logo">Robin Baumann</a>
                <ul class="right hide-on-med-and-down">
                    <li><a href="index.html#about">About</a></li>
                    <li><a href="index.html#work">Work</a></li>
                    <li><a href="index.html#contact">Contact</a></li>
                    <li><a href="cv/online_cv_robinbaumann.pdf">CV</a></li>
                </ul>
                <ul id="nav-mobile" class="side-nav">
                    <li><a href="index.html#about">About</a></li>
                    <li><a href="index.html#work">Work</a></li>
                    <li><a href="index.html#contact">Contact</a></li>
                    <li><a href="cv/online_cv_robinbaumann.pdf">CV</a></li>
                </ul>
            <a href="index.html" data-activates="nav-mobile" class="button-collapse"><i class="mdi-navigation-menu"></i></a>
            </div>
        </div>
    </nav>
</div>

<!--Hero
<div class="section no-pad-bot" id="index-banner">
    <div class="container">
        <h1 class="text_h center header cd-headline letters type">
            <span>HI! I'm <strong>Robin Baumann</strong></span> <br>
            <span style="font-family: monospace, monospace; font-size: 30%; font-weight: lighter">Master student in Machine Learning from Karlsruhe, Germany.</span>
        </h1>
    </div>
</div>-->

<!--Intro and service-->
<div id="about" class="section scrollspy">
    <div class="container">
        <div class="row">
            <div  class="col s12">
                <h2 class="center header text_h2"> Google Summer of Code 2020: <br \> Implement Mesh R-CNN in TensorFlow Graphics </h2>
            </div>

            <div  class="col s12 m4 l4"></div>

            <div  class="s12 m4 l4">
                <div class="center promo promo-example">
                    <h5 class="promo-caption">Project Summary</h5>
                    <p><a class="span_h2" target="_blank" href="https://summerofcode.withgoogle.com/projects/#5892825423544320">GSoC Projecet Page</a></p>

                    <p>During my Google Summer of Code experience, I implemented the main components of <a class="span_h2" href="https://arxiv.org/abs/1906.02739">Mesh R-CNN</a>
                        and a TensorFlow Datasets Provider for <a class="span_h2" href="http://pix3d.csail.mit.edu/">Pix3D</a> for TensorFlow Graphics. It was a great experience and
                        I have learned a lot during the three months, not least because of the good support from my mentors. The following lines contain an overview of my achievements
                    together with links to all work that I have submitted to the <a class="span_h2" href="https://github.com/tensorflow/graphics">TensorFlow Graphics GitHub Repository</a>.</p>

                    <h5 class="promo-caption">Objectives</h5>
                    <ul class="custom">
                        <li>Pix3D Dataset feature connectors (<a class="span_h2" href="https://github.com/tensorflow/graphics/pull/386">PR #386</a>)</li>
                        <li>Pix3D Dataset implementation (<a class="span_h2" href="https://github.com/tensorflow/graphics/pull/410">PR #410</a>)</li>
                        <li>Mesh R-CNN Core ops: Cubify and VertAlign (<a class="span_h2" href="https://github.com/tensorflow/graphics/pull/437">PR #437</a>)</li>
                        <li>Mesh R-CNN Network heads: Mesh Refinement (<a class="span_h2" href="https://github.com/tensorflow/graphics/pull/441">PR #441</a>)</li>
                        <li>Mesh R-CNN Loss functions: Mesh Losses (<a class="span_h2" class="span_h2" href="https://github.com/tensorflow/graphics/pull/449">PR #449</a>)</li>
                        <li>Mesh R-CNN Network heads: Voxel branch (<a class="span_h2" href="https://github.com/tensorflow/graphics/pull/450">PR #450</a>)</li>
                        <li>Mesh R-CNN 3D Prediction integration (<a class="span_h2" href="https://github.com/tensorflow/graphics/pull/454">PR #454</a>)</li>
                    </ul>
                    <br>

                    <b>Note:</b>
                    Since the project was very extensive, I had to limit myself to implementing the main components of Mesh R-CNN.
                    In consultation with my mentors, we decided to focus on high-quality, well tested implementations of the core ops of the 3D part of Mesh R-CNN rather than to set up an end-to-end pipeline with a 2D backbone.
                    <h5>Main Challenges</h5>
                    <p>The following list contains some of the main challenges I faced during the summer:</p>

                    <ol>
                        <li><b>Pix3D Dataset <tt>FeatureConnectors</tt></b>: The encoding schema for TensorFlow Datasets <tt>FeatureConnectors</tt> only supports one unknown dimension
                        per feature. This is a problem for datasets containing multiple objects per image, which would usually store a list of all objects with corresponding
                        labels together with one image. Nesting multiple objects to a list with unknown length (like the number of objects per image, which could vary for different images),
                        forces the TFDS encoding to introduce additional dimensions of unknown size. As Pix3D has only one object per image, this is not particulary a problem, but most of the 2D backbones in e.g. the TensorFlow Object Detection API
                        assume the provided data in a nested structure, as e.g. defined by the <a class="span_h2" href="https://www.tensorflow.org/datasets/catalog/coco">COCO Dataset Provider</a>. Thus, these models could not be used out of the box with the Pix3D Dataset provider.</li>
                        <li><b>Erros in Pix3D Dataset</b>: While exploring the Pix3D dataset and preparing the TFDS <tt>DatasetBuilder</tt>, we found that two samples of Pix3D are flawed. More precisely, two of the segmentation masks are of wrongly rendered. I raised an Issue in the official
                            <a class="span_h2" href=https://github.com/xingyuansun/pix3d/issues/18>Pix3D GitHub Repository</a> and hopefully, the creators of Pix3D will fix this.
                        Removing those samples from the new <tt>DatasetBuilder</tt> fixed the issue and makes the dataset usable. However, we still wanted to give users the ability to
                        use the full dataset. Thus, we rendered the two wrong masks with the correct parameters(See <a class="span_h2" href="https://github.com/tensorflow/graphics/tree/master/tensorflow_graphics/datasets/pix3d/fixed_masks">here</a>) and provide a documentation on how users could integrate those in their dataset in the <tt>DatasetInfo</tt> specification.</li>
                        <li><b>Working with meshes</b>: Using meshes in an deep learning approach comes with many difficulties, as meshes representing different object may have different topologies and thus, a different number of vertices and faces each.
                            One way to mitigate this is to pad vertex and face tensors to the same length and pass them through the ops. However, simply padding the meshes is not enough, since especially face and edge lists contain indices into the vertex list. Adding padding to these lists could lead to degenerate meshes.
                            To mitigate this, I implemented a <tt>Meshes</tt> class, which supports transformations of meshes into different representations while keeping track of all padded elements. This allows to use meshes of different size in a batch. The class also contains some additional functionality to efficiently extract vertex adjacencies from the batch of meshes.
                            For more information <a class="span_h2" href="https://github.com/RobinBaumann/graphics/blob/GSoC20_Mesh_R-CNN_integration/tensorflow_graphics/projects/mesh_rcnn/structures/mesh.py">have a look at the implementation.</a>
                        </li>
                        <li><b>High quality implementations</b>: Writing cumputational efficient, generalizable (in the means of supporting arbitrary batch dimension) and well tested code can be very challenging. Especially the implementation of cubify, which converts voxel grid occupancy probabilities to triangle meshes and the Mesh R-CNN mesh loss functions were very challenging. The <tt>Meshes</tt> class simplified a lot of those conversions between different 3D data representations.
                            But I definately learned a lot during the discussions with my mentors.</li>
                    </ol>

                    <h5>Future Work</h5>
                    <p>Although the Google Summer of Code 2020 is officially over, I would love to continue working on this project. There are a few things that are still left to do, like the connection to a 2D backbone and running some experiments on the Pix3D dataset.
                       Additionally, I will continue to address all comments in the pull requests and provide minor refactorings in the <tt>Meshes</tt> class, as I think this would be a really cool feature to have in the core TF Graphics repository.
                    </p>

                    <h5>Special thanks</h5>

                    <p>My Google Summer of Code Experience was great and a large part of this great experience was the good mentoring of <span class="span_h2" >Avneesh Sud</span>, <span class="span_h2">Abhijit Kundu</span>
                    and last but not least, <span class="span_h2">Paige Bailey</span>, who made me aware of the program and encouraged me to apply.</p>
                   </div>
            </div>
            </div>
        </div>
    </div>
</div>


<!--Parallax-->
<div class="parallax-container">
    <div class="parallax"><img src="img/parallax.jpeg"></div>
</div>

<!--Footer-->
<footer id="contact" class="page-footer default_color scrollspy">
    <div class="container">
        <div class="row">
            <div class="col l3 s12">
                <h5 class="white-text">Robin Baumann</h5>
                <ul>
                    <li><a class="white-text" href="index.html">Home</a></li>
                    <li>
                        <a class="white-text" href="&#109;&#097;&#105;&#108;&#116;&#111;&#058;&#114;&#111;&#098;&#105;&#110;&#046;&#098;&#097;&#117;&#109;&#097;&#110;&#110;&#056;&#049;&#048;&#064;&#103;&#109;&#097;&#105;&#108;&#046;&#099;&#111;&#109;">
                            Email
                        </a>
                    </li>
                </ul>
            </div>
            <div class="col l3 s12">
                <h5 class="white-text">Social</h5>
                <ul>
                    <li>
                        <a class="white-text" href="https://www.linkedin.com/in/robin-baumann-272959159/">
                            <i title="Follow me on LinkedIn" class="small how fa fa-linkedin-square"></i>
                        </a> &ensp;
                        <a class="white-text" href="https://github.com/RobinBaumann">
                            <i title="see my projects on Github" class="small hov fa fa-github-square"></i>
                        </a> &ensp;
                        <a class="white-text" href="https://twitter.com/_RobinBaumann">
                            <i title="Follow me on Twitter" class="small hov fa fa-twitter-square" ></i>
                        </a>
                    </li>
                </ul>
            </div>
        </div>
    </div>
    <div class="footer-copyright default_color">
        <div class="container">
            Made by <a class="white-text" href="index.html">Robin Baumann</a>
            . Design framework:
            <a class="white-text" href="http://materializecss.com/">materializecss</a>
        </div>
    </div>
</footer>


    <!--  Scripts-->
    <script src="min/plugin-min.js"></script>
    <script src="min/custom-min.js"></script>

    </body>
</html>