<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Chapter 4 Data Acquisition | R for HR: A Hands-On Introduction to Human Resource Analytics Using R</title>
<meta name="description" content="Human resource (HR) analytics is a growing area of HR, and the purpose of this book is to show how the R programming language can be used to manage, analyze, and visualize HR data in order to derive insights." />
<meta name="generator" content="bookdown 0.20 and GitBook 2.6.7" />
<meta property="og:title" content="Chapter 4 Data Acquisition | R for HR: A Hands-On Introduction to Human Resource Analytics Using R" />
<meta property="og:type" content="book" />
<meta property="og:description" content="Human resource (HR) analytics is a growing area of HR, and the purpose of this book is to show how the R programming language can be used to manage, analyze, and visualize HR data in order to derive insights." />
<meta name="github-repo" content="davidcaughlin/R-for-HR" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Chapter 4 Data Acquisition | R for HR: A Hands-On Introduction to Human Resource Analytics Using R" />
<meta name="twitter:description" content="Human resource (HR) analytics is a growing area of HR, and the purpose of this book is to show how the R programming language can be used to manage, analyze, and visualize HR data in order to derive insights." />
<meta name="author" content="David E. Caughlin" />
<meta name="date" content="2020-07-22" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="prev" href="gentleintro.html"/>
<link rel="next" href="references.html"/>
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-clipboard.css" rel="stylesheet" />
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li><a href="./">R for HR</a></li>
<li class="divider"></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>Preface</a><ul>
<li class="chapter" data-level="0.1" data-path="index.html"><a href="index.html#growth-of-hr-analytics"><i class="fa fa-check"></i><b>0.1</b> Growth of HR Analytics</a></li>
<li class="chapter" data-level="0.2" data-path="index.html"><a href="index.html#hr-analytics-project-life-cycle"><i class="fa fa-check"></i><b>0.2</b> HR Analytics Project Life Cycle</a></li>
<li class="chapter" data-level="0.3" data-path="index.html"><a href="index.html#about-the-author"><i class="fa fa-check"></i><b>0.3</b> About the Author</a></li>
<li class="chapter" data-level="0.4" data-path="index.html"><a href="index.html#orientation-to-the-book"><i class="fa fa-check"></i><b>0.4</b> Orientation to the Book</a><ul>
<li class="chapter" data-level="0.4.1" data-path="index.html"><a href="index.html#hr-analytics-life-cycle"><i class="fa fa-check"></i><b>0.4.1</b> HR Analytics Life Cycle</a></li>
<li class="chapter" data-level="0.4.2" data-path="index.html"><a href="index.html#multiple-options"><i class="fa fa-check"></i><b>0.4.2</b> Multiple Options</a></li>
<li class="chapter" data-level="0.4.3" data-path="index.html"><a href="index.html#parametric-nonparametric-statistics"><i class="fa fa-check"></i><b>0.4.3</b> Parametric & Nonparametric Statistics</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>I Introduction</b></span></li>
<li class="chapter" data-level="1" data-path="install.html"><a href="install.html"><i class="fa fa-check"></i><b>1</b> Installing R & RStudio</a><ul>
<li class="chapter" data-level="1.1" data-path="install.html"><a href="install.html#downloading-installing-r"><i class="fa fa-check"></i><b>1.1</b> Downloading & Installing R</a><ul>
<li class="chapter" data-level="1.1.1" data-path="install.html"><a href="install.html#for-windows-operation-systems"><i class="fa fa-check"></i><b>1.1.1</b> For Windows Operation Systems</a></li>
<li class="chapter" data-level="1.1.2" data-path="install.html"><a href="install.html#for-mac-operating-systems"><i class="fa fa-check"></i><b>1.1.2</b> For Mac Operating Systems</a></li>
</ul></li>
<li class="chapter" data-level="1.2" data-path="install.html"><a href="install.html#downloading-installing-rstudio"><i class="fa fa-check"></i><b>1.2</b> Downloading & Installing RStudio</a></li>
<li class="chapter" data-level="1.3" data-path="install.html"><a href="install.html#orientation-to-rstudio"><i class="fa fa-check"></i><b>1.3</b> Orientation to RStudio</a></li>
<li class="chapter" data-level="1.4" data-path="install.html"><a href="install.html#summary"><i class="fa fa-check"></i><b>1.4</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2" data-path="gettingstarted.html"><a href="gettingstarted.html"><i class="fa fa-check"></i><b>2</b> Getting Started with R</a><ul>
<li class="chapter" data-level="2.1" data-path="gettingstarted.html"><a href="gettingstarted.html#setting-a-working-directory"><i class="fa fa-check"></i><b>2.1</b> Setting a Working Directory</a><ul>
<li class="chapter" data-level="2.1.1" data-path="gettingstarted.html"><a href="gettingstarted.html#determining-current-working-directory"><i class="fa fa-check"></i><b>2.1.1</b> Determining Current Working Directory</a></li>
<li class="chapter" data-level="2.1.2" data-path="gettingstarted.html"><a href="gettingstarted.html#setting-a-new-working-directory"><i class="fa fa-check"></i><b>2.1.2</b> Setting a New Working Directory</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="gettingstarted.html"><a href="gettingstarted.html#creating-saving-an-r-script"><i class="fa fa-check"></i><b>2.2</b> Creating & Saving an R Script</a><ul>
<li class="chapter" data-level="2.2.1" data-path="gettingstarted.html"><a href="gettingstarted.html#creating-a-new-r-script"><i class="fa fa-check"></i><b>2.2.1</b> Creating a New R Script</a></li>
<li class="chapter" data-level="2.2.2" data-path="gettingstarted.html"><a href="gettingstarted.html#using-an-r-script"><i class="fa fa-check"></i><b>2.2.2</b> Using an R Script</a></li>
<li class="chapter" data-level="2.2.3" data-path="gettingstarted.html"><a href="gettingstarted.html#saving-an-r-script"><i class="fa fa-check"></i><b>2.2.3</b> Saving an R Script</a></li>
<li class="chapter" data-level="2.2.4" data-path="gettingstarted.html"><a href="gettingstarted.html#opening-a-saved-r-script"><i class="fa fa-check"></i><b>2.2.4</b> Opening a Saved R Script</a></li>
</ul></li>
<li class="chapter" data-level="2.3" data-path="gettingstarted.html"><a href="gettingstarted.html#creating-a-project-in-rstudio"><i class="fa fa-check"></i><b>2.3</b> Creating a Project in RStudio</a><ul>
<li class="chapter" data-level="2.3.1" data-path="gettingstarted.html"><a href="gettingstarted.html#creating-a-new-rstudio-project"><i class="fa fa-check"></i><b>2.3.1</b> Creating a New RStudio Project</a></li>
<li class="chapter" data-level="2.3.2" data-path="gettingstarted.html"><a href="gettingstarted.html#opening-an-existing-r-project"><i class="fa fa-check"></i><b>2.3.2</b> Opening an Existing R Project</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="gettingstarted.html"><a href="gettingstarted.html#orientation-to-written-tutorials"><i class="fa fa-check"></i><b>2.4</b> Orientation to Written Tutorials</a></li>
<li class="chapter" data-level="2.5" data-path="install.html"><a href="install.html#summary"><i class="fa fa-check"></i><b>2.5</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="3" data-path="gentleintro.html"><a href="gentleintro.html"><i class="fa fa-check"></i><b>3</b> Basic Features and Operations of the R Language</a><ul>
<li class="chapter" data-level="3.1" data-path="gentleintro.html"><a href="gentleintro.html#r-as-a-calculator"><i class="fa fa-check"></i><b>3.1</b> R as a Calculator</a></li>
<li class="chapter" data-level="3.2" data-path="gentleintro.html"><a href="gentleintro.html#functions"><i class="fa fa-check"></i><b>3.2</b> Functions</a></li>
<li class="chapter" data-level="3.3" data-path="gentleintro.html"><a href="gentleintro.html#packages"><i class="fa fa-check"></i><b>3.3</b> Packages</a></li>
<li class="chapter" data-level="3.4" data-path="gentleintro.html"><a href="gentleintro.html#variable-assignment"><i class="fa fa-check"></i><b>3.4</b> Variable Assignment</a></li>
<li class="chapter" data-level="3.5" data-path="gentleintro.html"><a href="gentleintro.html#types-of-data"><i class="fa fa-check"></i><b>3.5</b> Types of Data</a><ul>
<li class="chapter" data-level="3.5.1" data-path="gentleintro.html"><a href="gentleintro.html#numeric-data"><i class="fa fa-check"></i><b>3.5.1</b> <code>numeric</code> Data</a></li>
<li class="chapter" data-level="3.5.2" data-path="gentleintro.html"><a href="gentleintro.html#character-data"><i class="fa fa-check"></i><b>3.5.2</b> <code>character</code> Data</a></li>
<li class="chapter" data-level="3.5.3" data-path="gentleintro.html"><a href="gentleintro.html#date-data"><i class="fa fa-check"></i><b>3.5.3</b> <code>Date</code> Data</a></li>
<li class="chapter" data-level="3.5.4" data-path="gentleintro.html"><a href="gentleintro.html#logical-data"><i class="fa fa-check"></i><b>3.5.4</b> <code>logical</code> Data</a></li>
</ul></li>
<li class="chapter" data-level="3.6" data-path="gentleintro.html"><a href="gentleintro.html#vectors"><i class="fa fa-check"></i><b>3.6</b> Vectors</a></li>
<li class="chapter" data-level="3.7" data-path="gentleintro.html"><a href="gentleintro.html#lists"><i class="fa fa-check"></i><b>3.7</b> Lists</a></li>
<li class="chapter" data-level="3.8" data-path="gentleintro.html"><a href="gentleintro.html#data-frames"><i class="fa fa-check"></i><b>3.8</b> Data Frames</a></li>
<li class="chapter" data-level="3.9" data-path="gentleintro.html"><a href="gentleintro.html#annotations"><i class="fa fa-check"></i><b>3.9</b> Annotations</a></li>
<li class="chapter" data-level="3.10" data-path="gentleintro.html"><a href="gentleintro.html#removing-objects-from-environment"><i class="fa fa-check"></i><b>3.10</b> Removing Objects from Environment</a></li>
<li class="chapter" data-level="3.11" data-path="install.html"><a href="install.html#summary"><i class="fa fa-check"></i><b>3.11</b> Summary</a></li>
</ul></li>
<li class="part"><span><b>II Data Acquisition</b></span></li>
<li class="chapter" data-level="4" data-path="dataacquire.html"><a href="dataacquire.html"><i class="fa fa-check"></i><b>4</b> Data Acquisition</a><ul>
<li class="chapter" data-level="4.1" data-path="dataacquire.html"><a href="dataacquire.html#reading-data-to-r"><i class="fa fa-check"></i><b>4.1</b> Reading Data to R</a><ul>
<li class="chapter" data-level="4.1.1" data-path="dataacquire.html"><a href="dataacquire.html#initial-steps"><i class="fa fa-check"></i><b>4.1.1</b> Initial Steps</a></li>
<li class="chapter" data-level="4.1.2" data-path="dataacquire.html"><a href="dataacquire.html#read-data"><i class="fa fa-check"></i><b>4.1.2</b> Read Data</a></li>
<li class="chapter" data-level="4.1.3" data-path="dataacquire.html"><a href="dataacquire.html#read-.csv-data-file-with-2-rows-of-variable-nameslabels"><i class="fa fa-check"></i><b>4.1.3</b> Read .csv Data File With 2+ Rows of Variable Names/Labels</a></li>
</ul></li>
<li class="chapter" data-level="4.2" data-path="dataacquire.html"><a href="dataacquire.html#remove-variable-names-from-a-data-frame-object"><i class="fa fa-check"></i><b>4.2</b> Remove Variable Names from a Data Frame Object</a></li>
<li class="chapter" data-level="4.3" data-path="dataacquire.html"><a href="dataacquire.html#add-variable-names-from-a-data-frame-object"><i class="fa fa-check"></i><b>4.3</b> Add Variable Names from a Data Frame Object</a></li>
<li class="chapter" data-level="4.4" data-path="dataacquire.html"><a href="dataacquire.html#writing-data"><i class="fa fa-check"></i><b>4.4</b> Writing Data</a><ul>
<li class="chapter" data-level="4.4.1" data-path="dataacquire.html"><a href="dataacquire.html#initial-steps-1"><i class="fa fa-check"></i><b>4.4.1</b> Initial Steps</a></li>
<li class="chapter" data-level="4.4.2" data-path="dataacquire.html"><a href="dataacquire.html#write-data-frame-to-working-directory"><i class="fa fa-check"></i><b>4.4.2</b> Write Data Frame to Working Directory</a></li>
<li class="chapter" data-level="4.4.3" data-path="dataacquire.html"><a href="dataacquire.html#write-table-to-working-directory"><i class="fa fa-check"></i><b>4.4.3</b> Write Table to Working Directory</a></li>
</ul></li>
<li class="chapter" data-level="4.5" data-path="install.html"><a href="install.html#summary"><i class="fa fa-check"></i><b>4.5</b> Summary</a></li>
<li class="chapter" data-level="4.6" data-path="dataacquire.html"><a href="dataacquire.html#references"><i class="fa fa-check"></i><b>4.6</b> References</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="dataacquire.html"><a href="dataacquire.html#references"><i class="fa fa-check"></i>References</a></li>
<li class="divider"></li>
<li><a href="https://github.com/rstudio/bookdown" target="blank">Published with bookdown</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">R for HR:<br />
A Hands-On Introduction to Human Resource Analytics Using R</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="dataacquire" class="section level1">
<h1><span class="header-section-number">Chapter 4</span> Data Acquisition</h1>
<p><strong>Functions & Packages Introduced</strong></p>
<table>
<thead>
<tr class="header">
<th>Function</th>
<th>Package</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>read.csv</code></td>
<td>base</td>
</tr>
<tr class="even">
<td><code>read_csv</code></td>
<td><code>readr</code></td>
</tr>
<tr class="odd">
<td><code>Read</code></td>
<td><code>lessR</code></td>
</tr>
<tr class="even">
<td><code>excel_sheets</code></td>
<td><code>readxl</code></td>
</tr>
<tr class="odd">
<td><code>read_excel</code></td>
<td><code>readxl</code></td>
</tr>
<tr class="even">
<td><code>View</code></td>
<td>base</td>
</tr>
<tr class="odd">
<td><code>print</code></td>
<td>base</td>
</tr>
<tr class="even">
<td><code>head</code></td>
<td>base</td>
</tr>
<tr class="odd">
<td><code>tail</code></td>
<td>base</td>
</tr>
<tr class="even">
<td><code>names</code></td>
<td>base</td>
</tr>
<tr class="odd">
<td><code>colnames</code></td>
<td>base</td>
</tr>
<tr class="even">
<td><code>install.packages</code></td>
<td>base</td>
</tr>
<tr class="odd">
<td><code>library</code></td>
<td>base</td>
</tr>
<tr class="even">
<td><code>write.csv</code></td>
<td>base</td>
</tr>
<tr class="odd">
<td><code>write.table</code></td>
<td>base</td>
</tr>
<tr class="even">
<td><code>table</code></td>
<td>base</td>
</tr>
</tbody>
</table>
<div id="reading-data-to-r" class="section level2">
<h2><span class="header-section-number">4.1</span> Reading Data to R</h2>
<p><strong>Link to Video Tutorial:</strong> <a href="https://youtu.be/smWjqhaxHY8" class="uri">https://youtu.be/smWjqhaxHY8</a></p>
<p><strong>Reading data</strong> refers to the process of importing data from a working directory or website into the R environment. When we read a data file into R, we often read it in as a <strong>data frame (df)</strong>, which is a tabular structure with columns representing variables and rows representing cases. Many different data file formats can be read into R as data frames, such as .csv, .xls/.xlsx, .txt, .sas7bdat (SAS), and .sav (SPSS). As you will learn in this tutorial, several different functions can be used to read data into R.</p>
<div id="initial-steps" class="section level3">
<h3><span class="header-section-number">4.1.1</span> Initial Steps</h3>
<p>Any function that appears in the Initial Steps section has been covered in a previous chapter. If you need a refresher, please view the relevant chapter. In addition, a previous chapter may show you how to perform the same action using different functions or packages.</p>
<p>You can access the data for this project using two methods. As the <em>first option</em>, you can save the files called <strong>"PersData.csv"</strong> and <strong>"PersData_Excel"</strong> into a folder on your computer that you set as your working directory. As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: <a href="https://github.com/davidcaughlin/R-Tutorial-Data-Files" class="uri">https://github.com/davidcaughlin/R-Tutorial-Data-Files</a>; once you've followed the link to GitHub, just click "Clone or Download" followed by "Download ZIP", which will download all of the data files for this book. For simplicity, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.</p>
<p>Next, set your working directory by using the <code>setwd</code> function (see below) or by doing it manually. Your working directory folder will likely be different than the one shown below; "H:/RWorkshop" just happens to be the name of the folder that I save my data files to and that I set as my working directory. You can manually set your working directory folder in your drop-down menus by going to <em>Session > Set Working Directory > Choose Directory...</em>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Set your working directory to the folder containing your data file</span>
<span class="kw">setwd</span>(<span class="st">"H:/RWorkshop"</span>)</code></pre></div>
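<p>Before reading in a file, it can help to confirm that R is pointed at the folder you expect. The following is a minimal sketch using base R functions only; <code>list.files</code> and <code>file.exists</code> are not covered elsewhere in this chapter, the file name <strong>"PersData.csv"</strong> comes from this chapter, and the results you see will depend on your own computer.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Display the folder that R currently treats as the working directory
getwd()

# List any .csv files that R can see in the working directory
list.files(pattern = "\\.csv$")

# Check whether a file with this name is in the working directory
# (returns TRUE if found and FALSE otherwise)
file.exists("PersData.csv")</code></pre></div>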
<p>Finally, I highly recommend that you create a new R Script file (.R), which will allow you to edit and save your script and annotations.</p>
</div>
<div id="read-data" class="section level3">
<h3><span class="header-section-number">4.1.2</span> Read Data</h3>
<p>One of the easiest data file formats to work with when reading data into R is the .csv (comma-separated values) format. This format is commonly used among R users, and such files can be created in Microsoft Excel and Google Sheets (as well as other programs). For example, many survey, data analysis, and data-acquisition platforms allow data to be exported to .csv files. When getting started in R, the way in which the .csv file is formatted can make your life easier. Specifically, the most straightforward .csv file format to read in is one in which the first row contains the name of each variable in each column, and in which the second row contains the first row of observed values (i.e., data) for the cases (i.e., observations, entities, people, units). Later in the chapter, I will show you how to read in .csv files in which the observed values do not begin until the third row or later; in addition, I will demonstrate how to read in other file formats.</p>
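<p>To make that layout concrete, here is a small sketch that writes a three-line .csv file to a temporary location and prints its raw text. The file contents are a shortened stand-in for this chapter's data, the object names <code>csv_path</code> and <code>raw_lines</code> are placeholders chosen for illustration, and <code>tempfile</code>, <code>writeLines</code>, and <code>readLines</code> are base R functions that are not otherwise covered in this chapter.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Create a temporary file path ending in .csv (for illustration only)
csv_path &lt;- tempfile(fileext = ".csv")

# The first row holds the variable names; each later row holds one case
writeLines(c("id,lastname,firstname",
             "153,Sanchez,Alejandro",
             "154,McDonald,Ronald"),
           csv_path)

# Print the raw text of the file, one element per row
raw_lines &lt;- readLines(csv_path)
raw_lines</code></pre></div>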
<p>In this tutorial, you will learn how to read data into R using four different functions. If there are any missing values in your data, each function we cover will represent them as <code>NA</code> by default. One caveat is that, with the <code>read.csv</code> function, a blank cell in a text (character) column is read in as an empty string rather than <code>NA</code> unless you change the <code>na.strings</code> argument. <em>I personally recommend that you get comfortable with Option 2 (<code>read_csv</code> function from <code>readr</code> package), as this function has some advantages when it comes to reading in .csv files specifically.</em></p>
<div id="option-1-read.csv-function-from-base-r" class="section level4">
<h4><span class="header-section-number">4.1.2.1</span> Option 1: <code>read.csv</code> Function from Base R</h4>
<p>The <code>read.csv</code> function comes standard with base R, which means that you don't need to install a package to access it. As the function name implies, this function is used when the source data file is in .csv format. Typically, the <code>read.csv</code> function requires only a single argument within the parentheses: the <em>exact</em> name of the data file enclosed in quotation marks. The file should be located in your working directory folder. Remember, R is sensitive to case and spacing when it comes to names; that is, if there are spaces in your file name, the same spaces need to appear in the file name in your R script, and if some letters are upper case in your file name, the corresponding letters need to be upper case in your R script. Let's practice reading in a file called <strong>"PersData.csv"</strong> by entering the exact name of the file followed by the .csv extension, all within quotation marks. Remember, the file called <strong>"PersData.csv"</strong> should already be saved in your working directory folder (see <em>Initial Steps</em>).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data from working directory</span>
<span class="kw">read.csv</span>(<span class="st">"PersData.csv"</span>)</code></pre></div>
<pre><code>##    id   lastname firstname startdate gender
## 1 153    Sanchez Alejandro  1/1/2016   male
## 2 154   McDonald    Ronald  1/9/2016   male
## 3 155      Smith      John  1/9/2016   male
## 4 165        Doe      Jane  1/4/2016 female
## 5 125   Franklin  Benjamin  1/5/2016   male
## 6 111     Newton     Isaac  1/9/2016   male
## 7 198    Morales     Linda  1/7/2016 female
## 8 201 Providence     Cindy  1/9/2016 female
## 9 282     Legend      John  1/9/2016   male</code></pre>
<p>As you can see, the data that appear in your Console contain only a handful of rows and columns; nonetheless, this gives you an idea of how the <code>read.csv</code> function works.</p>
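<p>As a quick check of how <code>read.csv</code> handles missing values, the sketch below reads a tiny made-up .csv file in which one numeric cell is blank and another contains the text NA. The file, the <code>tenure</code> variable, and the object names <code>tmp</code> and <code>demo</code> are hypothetical and are not part of this chapter's data files.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Write a small, hypothetical .csv file to a temporary location
tmp &lt;- tempfile(fileext = ".csv")
writeLines(c("id,lastname,tenure",
             "153,Sanchez,4",
             "154,McDonald,",
             "155,Smith,NA"),
           tmp)

# Read the file with read.csv
demo &lt;- read.csv(tmp)

# The blank cell and the NA cell in the numeric column are both read as NA
demo$tenure

# The is.na function flags which values are missing
is.na(demo$tenure)</code></pre></div>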
<p>Often, you will want to create a data frame object that is stored in your Global Environment for subsequent use. By creating a data frame object, you can manipulate and/or analyze the data within the object using a variety of functions (and without changing the data in the source file). To create a data frame object, we simply (a) use the same <code>read.csv</code> function from above, (b) add either a <code>&lt;-</code> or <code>=</code> to the left of the <code>read.csv</code> function, and (c) create a name of our choosing for the data frame object by entering that name to the left of the <code>&lt;-</code> or <code>=</code>. You can name your data frame object whatever you would like as long as the name doesn't include spaces, doesn't start with a numeral, and doesn't include special characters like <code>*</code> or <code>-</code> (to name a few). I recommend choosing a name that is relatively short but descriptive, and that is not the same as the name of an R function or another object that you plan to use. Below, I name the new data frame object <code>personaldata</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read in data and name data frame object</span>
personaldata &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"PersData.csv"</span>)</code></pre></div>
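<p>Once the data frame object exists, a few base R functions can confirm what was read in. This is a minimal sketch: to keep it self-contained, it builds a two-row stand-in for the data with the <code>text</code> argument of <code>read.csv</code> rather than reading <strong>"PersData.csv"</strong> from your working directory, and the functions <code>class</code>, <code>dim</code>, and <code>str</code> are not otherwise covered in this chapter.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Stand-in for the file: read.csv can also read from a character string
personaldata &lt;- read.csv(text = "id,lastname,firstname
153,Sanchez,Alejandro
154,McDonald,Ronald")

# Confirm that the object is a data frame
class(personaldata)

# Number of rows (cases) and columns (variables)
dim(personaldata)

# Compact summary of each variable and its data type
str(personaldata)</code></pre></div>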
<p>If your data file resides in a folder other than your specified working directory, then you can simply add the folder path followed by a forward slash (<code>/</code>) before the file name. Please note that your file path will almost certainly be different from the one I show below.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data using the full file path and name data frame object</span>
personaldata &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"H:/RWorkshop/PersData.csv"</span>)</code></pre></div>
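<p>If you prefer not to type the full path as a single string, the base R function <code>file.path</code> can assemble it from its parts. The sketch below reuses the example folder from this chapter; <code>data_folder</code> and <code>data_file</code> are placeholder names used only for illustration, and the file is read only if it actually exists at that location on your computer.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Folder that contains the data file (yours will differ)
data_folder &lt;- "H:/RWorkshop"

# Join the folder and the file name with a forward slash
data_file &lt;- file.path(data_folder, "PersData.csv")
data_file

# Read the file only if it exists at that location
if (file.exists(data_file)) {
  personaldata &lt;- read.csv(data_file)
}</code></pre></div>
<p>On Windows, paths copied from File Explorer use backslashes. Within an R string, either change them to forward slashes or double each backslash (e.g., <code>"H:\\RWorkshop\\PersData.csv"</code>).</p>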
<p>If you are working in RStudio, you will see the data frame object appear in your Global Environment window, as shown below. If you click on the name of the data frame object in your Global Environment window, a new tab will open up, allowing you to view the data.</p>
<p><img src="dfobject.png" /><br />
Alternatively, you can use the <code>View</code> function from base R with the name of the data frame object we just created as the sole argument in the parentheses. Note that the <code>View</code> function begins with an upper-case <code>V</code>; R is case sensitive when it comes to function names. Further, the name of the data frame object you enter into the parentheses of the function must be <em>exactly</em> the same as the name you gave the data frame object when you created it (i.e., when you read the data into R and named the object). That is, R won't recognize the data frame object if you type it as <code>PersonalData</code>, but R will recognize it if you type it as <code>personaldata</code>. Sometimes it helps to copy and paste the exact name of the object into the function parentheses.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View data within data frame object</span>
<span class="kw">View</span>(personaldata)</code></pre></div>
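<p>Because object names are case sensitive, you can check whether a given spelling refers to an existing object with the base R function <code>exists</code>, which is not otherwise covered in this chapter. A minimal sketch, using a one-row stand-in for the data so that it runs on its own:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># One-row stand-in for the data frame object created earlier
personaldata &lt;- read.csv(text = "id,lastname
153,Sanchez")

# The exact name is found
exists("personaldata")

# A differently capitalized name is not found, unless you have separately
# created an object with that name
exists("PersonalData")</code></pre></div>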
<p>Instead of using the <code>View</code> function, you could just "run" the name of the data frame object by highlighting <code>personaldata</code> in your R Script and clicking "Run" (or you can enter the name of the data frame object directly into your Console command line and press Enter). Another option is to use the <code>print</code> function (from base R) with the name of the data frame object as the sole argument in the parentheses. In addition, if you have many rows of data, you can use the <code>head</code> function from base R to see just the first 6 rows of data, or you can use the <code>tail</code> function from base R to see just the last 6 rows of data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Highlight the name of data frame object and click Run to view data in Console</span>
personaldata</code></pre></div>
<pre><code>##    id   lastname firstname startdate gender
## 1 153    Sanchez Alejandro  1/1/2016   male
## 2 154   McDonald    Ronald  1/9/2016   male
## 3 155      Smith      John  1/9/2016   male
## 4 165        Doe      Jane  1/4/2016 female
## 5 125   Franklin  Benjamin  1/5/2016   male
## 6 111     Newton     Isaac  1/9/2016   male
## 7 198    Morales     Linda  1/7/2016 female
## 8 201 Providence     Cindy  1/9/2016 female
## 9 282     Legend      John  1/9/2016   male</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Use print function with the name of the data frame object to view data in Console</span>
<span class="kw">print</span>(personaldata)</code></pre></div>
<pre><code>## id lastname firstname startdate gender
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
## 7 198 Morales Linda 1/7/2016 female
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View just the first 6 rows of the data frame object in Console</span>
<span class="kw">head</span>(personaldata)</code></pre></div>
<pre><code>## id lastname firstname startdate gender
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View just the last 6 rows of the data frame object in Console</span>
<span class="kw">tail</span>(personaldata)</code></pre></div>
<pre><code>## id lastname firstname startdate gender
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
## 7 198 Morales Linda 1/7/2016 female
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male</code></pre>
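<p>If 6 rows is not the number you want, both <code>head</code> and <code>tail</code> accept an <code>n</code> argument. The sketch below uses a small stand-in data frame with a hypothetical name (<code>demo_data</code>) rather than the full file.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Small stand-in data frame (hypothetical excerpt of the data)
demo_data <- data.frame(id = c(153, 154, 155, 165),
                        firstname = c("Alejandro", "Ronald", "John", "Jane"))
# View just the first 2 rows
head(demo_data, n = 2)
# View just the last 2 rows
tail(demo_data, n = 2)</code></pre></div>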
<p>As a final note, where available, you can use the <code>read.csv</code> function to read in .csv data from a website. For example, rather than save the .csv file to a folder on your computer, you can read in the raw data directly from my GitHub site. Within the quotation marks (<code>" "</code>), simply paste in the following URL: <a href="https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv" class="uri">https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv</a>, as shown below.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data using URL</span>
personaldata <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv"</span>)</code></pre></div>
<p>Note that by naming the data frame object <code>personaldata</code> we have overwritten the previous version of the object with that same name.</p>
</div>
<div id="option-2-read_csv-function-from-readr-package" class="section level4">
<h4><span class="header-section-number">4.1.2.2</span> Option 2: <code>read_csv</code> Function from <code>readr</code> Package</h4>
<p>As part of the <code>tidyverse</code> of R packages, the <code>readr</code> package and its functions can be used to read in a few different data file formats (as long as they are rectangular), including .csv files. We will use the <code>read_csv</code> function from the package, which as the name implies is used to read in .csv files. Among other advantages over the <code>read.csv</code> function we learned in Option 1, the <code>read_csv</code> function is notably faster. Further, <code>read_csv</code> creates a tibble (as opposed to a data frame), which behaves like a data frame for most purposes; for more information on tibbles, check out Wickham and Grolemund's (2017) chapter on tibbles: <a href="http://r4ds.had.co.nz/tibbles.html" class="uri">http://r4ds.had.co.nz/tibbles.html</a>.</p>
<p>To use the <code>read_csv</code> function, the <code>readr</code> package must be installed and accessed using the <code>install.packages</code> and <code>library</code> functions, respectively. Type <code>"readr"</code> (note the quotation marks) into the parentheses of the <code>install.packages</code> function. Next, type <code>readr</code> (without quotation marks) into the parentheses of the <code>library</code> function.</p>
<p>Just like with the <code>read.csv</code> function, enter the <em>exact</em> name of the data file (as named in your working directory), followed by .csv -- and all within quotation marks (<code>" "</code>). Further, either the <code><-</code> or <code>=</code> operator can be used to name the data frame object. Below, I name the data frame object <code>personaldata2</code> to distinguish it from the data frame object we previously read in and named using the <code>read.csv</code> function.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Install readr package</span>
<span class="kw">install.packages</span>(<span class="st">"readr"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Access readr package</span>
<span class="kw">library</span>(readr)
<span class="co"># Read data and name data frame object</span>
personaldata2 <-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">"PersData.csv"</span>)</code></pre></div>
<pre><code>## Parsed with column specification:
## cols(
## id = col_double(),
## lastname = col_character(),
## firstname = col_character(),
## startdate = col_character(),
## gender = col_character()
## )</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View just the first 6 rows of the data frame in Console</span>
<span class="kw">head</span>(personaldata2)</code></pre></div>
<pre><code>## # A tibble: 6 x 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male</code></pre>
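<p>If you are curious how a tibble relates to a data frame, the following sketch may help. It assumes the <code>readr</code> package is installed, and it writes a tiny temporary .csv file (a hypothetical two-row excerpt) so that the example does not depend on your working directory. The names <code>demo_file</code>, <code>demo_tibble</code>, and <code>demo_df</code> are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Access readr package
library(readr)
# Write a tiny temporary .csv file to read back in
demo_file <- tempfile(fileext = ".csv")
writeLines(c("id,firstname", "153,Alejandro", "154,Ronald"), demo_file)
# read_csv returns a tibble
demo_tibble <- read_csv(demo_file)
class(demo_tibble)
# A tibble still counts as a data frame
is.data.frame(demo_tibble)
# Convert to a plain data frame if a function requires one
demo_df <- as.data.frame(demo_tibble)
class(demo_df)</code></pre></div>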
<p>Where available, you can also use the <code>read_csv</code> function to read in .csv data from a website. For example, rather than save the .csv file to a folder on your computer, you can read in the raw data directly from my GitHub site. Within the quotation marks (<code>" "</code>), simply paste in the following URL: <a href="https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv" class="uri">https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv</a>, as shown below.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data using URL</span>
personaldata2 <-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">"https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv"</span>)</code></pre></div>
<pre><code>## Parsed with column specification:
## cols(
## id = col_double(),
## lastname = col_character(),
## firstname = col_character(),
## startdate = col_character(),
## gender = col_character()
## )</code></pre>
<p>Note that by naming the data frame object <code>personaldata2</code> we have overwritten the previous version of the object with that same name.</p>
</div>
<div id="option-3-read-function-from-lessr-package" class="section level4">
<h4><span class="header-section-number">4.1.2.3</span> Option 3: <code>Read</code> Function from <code>lessR</code> Package</h4>
<p>Just like the <code>read.csv</code> and <code>read_csv</code> functions, the <code>Read</code> function from the <code>lessR</code> package can read in .csv files; however, it can also read in other file formats like .xls/x, .sas7bdat (SAS), and .sav (SPSS). When reading in a .csv file using the <code>Read</code> function, the <em>exact</em> name of your data file from your working directory needs to be entered as an argument (followed by .csv and surrounded by quotation marks). Further, either the <code><-</code> or <code>=</code> operator can be used to name the data frame object. To use the <code>Read</code> function, the <code>lessR</code> package needs to be installed and accessed using the <code>install.packages</code> and <code>library</code> functions, respectively.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Install lessR package</span>
<span class="kw">install.packages</span>(<span class="st">"lessR"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Access lessR package</span>
<span class="kw">library</span>(lessR)
<span class="co"># Read data and name data frame object</span>
personaldata3 <-<span class="st"> </span><span class="kw">Read</span>(<span class="st">"PersData.csv"</span>)</code></pre></div>
<pre><code>##
## >>> Suggestions
## To read a csv or Excel file of variable labels, var_labels=TRUE
## Each row of the file: Variable Name, Variable Label
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 id integer 9 0 9 153 154 155 ... 198 201 282
## 2 lastname character 9 0 9 Sanchez McDonald ... Providence Legend
## 3 firstname character 9 0 8 Alejandro Ronald ... Cindy John
## 4 startdate character 9 0 5 1/1/2016 1/9/2016 ... 1/9/2016 1/9/2016
## 5 gender character 9 0 2 male male male ... female female male
## ------------------------------------------------------------------------------------------</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View just the first 6 rows of the data frame object in Console</span>
<span class="kw">head</span>(personaldata3)</code></pre></div>
<pre><code>## id lastname firstname startdate gender
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male</code></pre>
<p>Where available, you can also use the <code>Read</code> function to read in data from a website. For example, rather than save the .csv file to a folder on your computer, you can read in the raw data directly from my GitHub site. Within the quotation marks (<code>" "</code>), simply paste in the following URL: <a href="https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv" class="uri">https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv</a>, as shown below.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data using URL</span>
personaldata3 <-<span class="st"> </span><span class="kw">Read</span>(<span class="st">"https://raw.githubusercontent.com/davidcaughlin/R-Tutorial-Data-Files/master/PersData.csv"</span>)</code></pre></div>
<pre><code>##
## >>> Suggestions
## To read a csv or Excel file of variable labels, var_labels=TRUE
## Each row of the file: Variable Name, Variable Label
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 id integer 9 0 9 153 154 155 ... 198 201 282
## 2 lastname character 9 0 9 Sanchez McDonald ... Providence Legend
## 3 firstname character 9 0 8 Alejandro Ronald ... Cindy John
## 4 startdate character 9 0 5 1/1/2016 1/9/2016 ... 1/9/2016 1/9/2016
## 5 gender character 9 0 2 male male male ... female female male
## ------------------------------------------------------------------------------------------</code></pre>
<p>Note that by naming the data frame object <code>personaldata3</code> we have overwritten the previous version of the object with that same name.</p>
<p>For more information on the <code>Read</code> function from the <code>lessR</code> package, check out David Gerbing's website for the package and specifically the section with links to video tutorials: <a href="http://www.lessrstats.com/videos.html" class="uri">http://www.lessrstats.com/videos.html</a>.</p>
</div>
<div id="option-4-read_excel-function-from-readxl-package" class="section level4">
<h4><span class="header-section-number">4.1.2.4</span> Option 4: <code>read_excel</code> Function from <code>readxl</code> Package</h4>
<p>The <code>read_excel</code> function from the <code>readxl</code> package can be used to read in Excel workbook files (.xlsx or .xls). To use the function, the <code>readxl</code> package needs to be installed and accessed using the <code>install.packages</code> and <code>library</code> functions, respectively. Because an Excel workbook can contain more than one worksheet, we will first use the <code>excel_sheets</code> function with the name of the file (in quotation marks) to list the names of its worksheets.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Install readxl package</span>
<span class="kw">install.packages</span>(<span class="st">"readxl"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Access readxl package</span>
<span class="kw">library</span>(readxl)
<span class="co"># View Excel file worksheets</span>
<span class="kw">excel_sheets</span>(<span class="st">"PersData_Excel.xlsx"</span>)</code></pre></div>
<pre><code>## [1] "Year1" "Year2"</code></pre>
<p>Note that the .xlsx file contains two worksheets called "Year1" and "Year2". We can now reference each of these worksheets when reading in the data from the Excel workbook file. To do so, we will use the <code>read_excel</code> function. As the first argument, enter the <em>exact</em> name of the data file (as named in your working directory), followed by .xlsx, all within quotation marks (<code>" "</code>); alternatively, you can enter the full file path, as I do below. As the second argument, type <code>sheet=</code> followed by the name of the worksheet (in quotation marks) containing the data you wish to read in; let's read in the data from the worksheet called "Year1". Finally, either the <code><-</code> or <code>=</code> operator can be used to name the data frame object. Below, I name the data frame object <code>personaldata4</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data from sheet called "Year1" and name data frame object</span>
personaldata4 <-<span class="st"> </span><span class="kw">read_excel</span>(<span class="st">"H:/RWorkshop/PersData_Excel.xlsx"</span>, <span class="dt">sheet=</span><span class="st">"Year1"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View the data frame object in Console</span>
<span class="kw">print</span>(personaldata4)</code></pre></div>
<pre><code>## # A tibble: 9 x 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <dttm> <chr>
## 1 153 Sanchez Alejandro 2016-01-01 00:00:00 male
## 2 154 McDonald Ronald 2016-01-09 00:00:00 male
## 3 155 Smith John 2016-01-09 00:00:00 male
## 4 165 Doe Jane 2016-01-04 00:00:00 female
## 5 125 Franklin Benjamin 2016-01-05 00:00:00 male
## 6 111 Newton Isaac 2016-01-09 00:00:00 male
## 7 198 Morales Linda 2016-01-07 00:00:00 female
## 8 201 Providence Cindy 2016-01-09 00:00:00 female
## 9 282 Legend John 2016-01-09 00:00:00 male</code></pre>
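<p>Notice that <code>read_excel</code> returned <code>startdate</code> as a date-time column, whereas the .csv readers above kept it as text. If you want the text version stored as dates, one option is the <code>as.Date</code> function from base R. The sketch below assumes the values are in month/day/year order (which matches the Excel output above) and uses a hypothetical vector called <code>startdate_text</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Start dates stored as text, as they appear after reading the .csv file
startdate_text <- c("1/1/2016", "1/9/2016", "1/4/2016")
# Convert text to dates, assuming month/day/year order
startdate_date <- as.Date(startdate_text, format = "%m/%d/%Y")
# Confirm the new class and view the converted values
class(startdate_date)
startdate_date</code></pre></div>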
<p>Let's repeat the process for the worksheet called "Year2".</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Read data from sheet called "Year2" and name data frame object</span>
personaldata5 <-<span class="st"> </span><span class="kw">read_excel</span>(<span class="st">"H:/RWorkshop/PersData_Excel.xlsx"</span>, <span class="dt">sheet=</span><span class="st">"Year2"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View the data frame object in Console</span>
<span class="kw">print</span>(personaldata5)</code></pre></div>
<pre><code>## # A tibble: 9 x 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <dttm> <chr>
## 1 153 Sanchez Alejandro 2016-01-01 00:00:00 male
## 2 155 Smith John 2016-01-09 00:00:00 male
## 3 165 Doe Jane 2016-01-04 00:00:00 female
## 4 125 Franklin Benjamin 2016-01-05 00:00:00 male
## 5 111 Newton Isaac 2016-01-09 00:00:00 male
## 6 201 Providence Cindy 2016-01-09 00:00:00 female
## 7 282 Legend John 2016-01-09 00:00:00 male
## 8 312 Ramos Jorge 2017-03-01 00:00:00 male
## 9 395 Lucas Nadia 2017-03-04 00:00:00 female</code></pre>
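<p>The <code>sheet</code> argument also accepts a worksheet's position instead of its name. Here is a minimal sketch of that idea. So that it runs without the workshop file, it uses an example workbook that ships with the <code>readxl</code> package (located with the <code>readxl_example</code> function); the object names are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Access readxl package
library(readxl)
# Locate an example workbook bundled with readxl
example_path <- readxl_example("datasets.xlsx")
# List the worksheet names
sheet_names <- excel_sheets(example_path)
sheet_names
# Read the second worksheet by name, and then by position
by_name <- read_excel(example_path, sheet = sheet_names[2])
by_position <- read_excel(example_path, sheet = 2)
# Both approaches should return the same data
identical(by_name, by_position)</code></pre></div>
<p>Referring to a worksheet by name tends to be safer in practice, because the result does not change if someone reorders the worksheets in the workbook.</p>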
</div>
</div>
<div id="read-.csv-data-file-with-2-rows-of-variable-nameslabels" class="section level3">
<h3><span class="header-section-number">4.1.3</span> Read .csv Data File With 2+ Rows of Variable Names/Labels</h3>
</div>
</div>
<div id="remove-variable-names-from-a-data-frame-object" class="section level2">
<h2><span class="header-section-number">4.2</span> Remove Variable Names from a Data Frame Object</h2>
<p>In some instances, you may wish to remove the variable names from a data frame. To do so, apply the <code>names</code> function with the data frame name as the argument, and then use the <code><-</code> operator with <code>NULL</code> to remove the variable names. Let's practice this by removing the variable names from the <code>personaldata</code> data frame object that we read in under Option 1 above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Remove variable names</span>
<span class="kw">names</span>(personaldata) <-<span class="st"> </span><span class="ot">NULL</span>
<span class="co"># View just the first 6 rows of the data frame object in Console</span>
<span class="kw">head</span>(personaldata)</code></pre></div>
<pre><code>##
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male</code></pre>
</div>
<div id="add-variable-names-from-a-data-frame-object" class="section level2">
<h2><span class="header-section-number">4.3</span> Add Variable Names to a Data Frame Object</h2>
<p>In other instances, you might find yourself with a dataset that lacks variable names, which means that you will need to add those variable names to the data frame. To do so, we can use the <code>colnames</code> function from base R, and enter the name of the data frame as the argument. Then, using the <code><-</code> operator, we assign a vector of variable names created with the <code>c</code> function, with each name in quotation marks (<code>" "</code>) and listed in the same order as the columns.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View just the first 6 rows of the data frame object in Console to verify that no variable names exist</span>
<span class="kw">head</span>(personaldata)</code></pre></div>
<pre><code>##
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Add variable names to data frame object</span>
<span class="kw">colnames</span>(personaldata) <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"id"</span>, <span class="st">"lastname"</span>, <span class="st">"firstname"</span>, <span class="st">"startdate"</span>, <span class="st">"gender"</span>)
<span class="co"># View just the first 6 rows of data in Console to verify that the variable names have been added</span>
<span class="kw">head</span>(personaldata)</code></pre></div>
<pre><code>## id lastname firstname startdate gender
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male</code></pre>
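<p>If the file itself lacks a header row, you can also supply the variable names at the moment you read in the data. The sketch below uses the <code>header</code> and <code>col.names</code> arguments of <code>read.csv</code>; the temporary file and its two rows are a hypothetical stand-in for a headerless data file.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Write a tiny temporary .csv file that has no row of variable names
no_header_file <- tempfile(fileext = ".csv")
writeLines(c("153,Sanchez,Alejandro", "154,McDonald,Ronald"), no_header_file)
# Tell read.csv there is no header row and supply the variable names
no_header_data <- read.csv(no_header_file, header = FALSE,
                           col.names = c("id", "lastname", "firstname"))
# Verify that the variable names were added
names(no_header_data)</code></pre></div>
<p>Without <code>header = FALSE</code>, <code>read.csv</code> would treat the first row of data as the variable names.</p>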
</div>
<div id="writing-data" class="section level2">
<h2><span class="header-section-number">4.4</span> Writing Data</h2>
<p><strong>Writing data</strong> refers to the process of exporting data from the R environment to a (working directory) folder. If you collaborate with others who do not work in R, writing data will allow them to use the data you cleaned, managed, or manipulated in the R environment in other software programs. In this tutorial we focus on how to write a data frame and a table to our working directory folder as .csv files.</p>
<div id="initial-steps-1" class="section level3">
<h3><span class="header-section-number">4.4.1</span> Initial Steps</h3>
<p>Any function that appears in the Initial Steps section has been covered in another tutorial. If you need a refresher, please view the relevant tutorial. In addition, a previous tutorial may show you how to perform the same action using different functions or packages.</p>
<p>If you haven't already, save the file called <strong>"PersData.csv"</strong> into your working directory folder, wherever that is located and set your working directory. Your working directory folder will likely be different than the one shown below (i.e., <code>"H:/RWorkshop"</code>). You can manually set your working directory folder in your drop-down menus by going to <em>Session > Set Working Directory > Choose Directory...</em>. Create a new R Script file (.R) or update an existing R Script file so that you can save your script and annotations.</p>
<p>Next, read in the file called <strong>"PersData.csv"</strong> using a read function of your choosing. In this example, I use the <code>read_csv</code> function from the <code>readr</code> package. If you use the <code>read_csv</code> function, be sure that you have installed and accessed the <code>readr</code> package using the <code>install.packages</code> and <code>library</code> functions.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Set your working directory</span>
<span class="kw">setwd</span>(<span class="st">"H:/RWorkshop"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Install readr package if you haven't already</span>
<span class="kw">install.packages</span>(<span class="st">"readr"</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Access readr package</span>
<span class="kw">library</span>(readr)
<span class="co"># Read in data</span>
personaldata <-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">"PersData.csv"</span>)</code></pre></div>
<pre><code>## Parsed with column specification:
## cols(
## id = col_double(),
## lastname = col_character(),
## firstname = col_character(),
## startdate = col_character(),
## gender = col_character()
## )</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View the names of the variables in the data frame object</span>
<span class="kw">names</span>(personaldata)</code></pre></div>
<pre><code>## [1] "id" "lastname" "firstname" "startdate" "gender"</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># View data frame object</span>
personaldata</code></pre></div>
<pre><code>## # A tibble: 9 x 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
## 7 198 Morales Linda 1/7/2016 female
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male</code></pre>
</div>
<div id="write-data-frame-to-working-directory" class="section level3">
<h3><span class="header-section-number">4.4.2</span> Write Data Frame to Working Directory</h3>
<p>The <code>write.csv</code> function from base R can be used to write a data frame object to your working directory or to a folder of your choosing. Let's write the <code>personaldata</code> data frame (that we read in and named above) to our working directory. Before doing so, however, let's make a minor change to the data frame. Specifically, let's remove the <code>lastname</code> variable from the data frame. To do so, type the name of the data frame (<code>personaldata</code>), followed by the <code>$</code> symbol and then the name of the variable in question (<code>lastname</code>). Next, type the <code><-</code> operator followed by <code>NULL</code>. This script will remove the variable from the data frame.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Remove variable from data frame</span>
personaldata<span class="op">$</span>lastname <-<span class="st"> </span><span class="ot">NULL</span>
<span class="co"># View data frame object</span>
personaldata</code></pre></div>
<pre><code>## # A tibble: 9 x 4
## id firstname startdate gender
## <dbl> <chr> <chr> <chr>
## 1 153 Alejandro 1/1/2016 male
## 2 154 Ronald 1/9/2016 male
## 3 155 John 1/9/2016 male
## 4 165 Jane 1/4/2016 female
## 5 125 Benjamin 1/5/2016 male
## 6 111 Isaac 1/9/2016 male
## 7 198 Linda 1/7/2016 female
## 8 201 Cindy 1/9/2016 female
## 9 282 John 1/9/2016 male</code></pre>
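<p>Assigning <code>NULL</code> changes the data frame in place. If you would rather keep the original intact, one alternative is to create a second object that leaves the variable out. This is a sketch using a small stand-in data frame with hypothetical names (<code>demo_data</code> and <code>demo_trimmed</code>).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Small stand-in data frame (hypothetical excerpt of the data)
demo_data <- data.frame(id = c(153, 154),
                        lastname = c("Sanchez", "McDonald"),
                        firstname = c("Alejandro", "Ronald"))
# Keep every column except lastname; drop = FALSE keeps the result a data frame
demo_trimmed <- demo_data[, names(demo_data) != "lastname", drop = FALSE]
# The new object lacks lastname
names(demo_trimmed)
# The original object is unchanged
names(demo_data)</code></pre></div>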
<p>To write our edited data frame (<code>personaldata</code>) to our working directory, we use the <code>write.csv</code> function from base R. As the first argument in the parentheses, type the name of the data frame (<code>personaldata</code>). <em>Type a comma (<code>,</code>) before the second argument, as this is how we separate arguments from one another when there are more than one.</em> As the second argument, let's type what we want to name the file that we will create in our working directory. Make sure that the name of the new .csv file is in quotation marks (<code>" "</code>). Here, I name the new file <strong>"Edited PersData.csv"</strong>; it is important that you keep the .csv extension at the end of the name you provide.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Write data frame to working directory</span>
<span class="kw">write.csv</span>(personaldata, <span class="st">"Edited PersData.csv"</span>)</code></pre></div>
<p>If you go to your working directory folder, you will find the file called <strong>"Edited PersData.csv"</strong> saved there.</p>
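<p>One detail worth knowing: by default, <code>write.csv</code> also writes the row names (here, the row numbers) as an extra first column. If you don't want that column in the file, add the argument <code>row.names=FALSE</code>. The sketch below writes a small stand-in data frame to temporary files (hypothetical names) both ways and reads each file back in to compare.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Small stand-in data frame (hypothetical excerpt of the data)
example_data <- data.frame(id = c(153, 154), firstname = c("Alejandro", "Ronald"))
# Temporary files to write to
with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")
# Default behavior: row names are written as an extra first column
write.csv(example_data, with_rownames)
# Suppress the row names
write.csv(example_data, without_rownames, row.names = FALSE)
# Compare the variable names after reading each file back in
names(read.csv(with_rownames))
names(read.csv(without_rownames))</code></pre></div>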
<p>We can also specify the folder we want to write our data to by providing the full file path, ending with the name we would like to give the new .csv file.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Write data frame to folder</span>
<span class="kw">write.csv</span>(personaldata, <span class="st">"H:/RWorkshop/Edited PersData2.csv"</span>)</code></pre></div>
<p>If you go to the folder you specified (which, in this example, is also the working directory folder), you will find the file called <strong>"Edited PersData2.csv"</strong>.</p>
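<p>Rather than typing the full path by hand, you can build it with the <code>file.path</code> function from base R, which joins a folder and a file name with the appropriate separator. In this sketch, <code>tempdir()</code> is only a stand-in for a folder such as <code>"H:/RWorkshop"</code>, and the object names are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Stand-in folder; replace with your own folder, such as "H:/RWorkshop"
output_folder <- tempdir()
# Join the folder and file name into a full path
output_path <- file.path(output_folder, "Edited PersData2.csv")
# Small stand-in data frame (hypothetical excerpt of the data)
demo_data <- data.frame(id = c(153, 154), firstname = c("Alejandro", "Ronald"))
# Write data frame to folder
write.csv(demo_data, output_path)
# Confirm that the file was created
file.exists(output_path)</code></pre></div>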
</div>
<div id="write-table-to-working-directory" class="section level3">
<h3><span class="header-section-number">4.4.3</span> Write Table to Working Directory</h3>
<p>Sometimes we work with table objects in R. If we wish to write a table to our working directory, we can use the <code>write.table</code> function from base R. Before doing so, we need to create a data table object as an example, which we can do using the <code>table</code> function from base R.</p>
<p>To create a table, first, come up with a name for your new table object; in this example, I name the table <code>table_example</code> (because I'm so creative). Second, type the <code><-</code> operator to the right of your new table name to tell R that you are creating a new object. Third, type the name of the table-creation function, which is <code>table</code>. Fourth, in the function's parentheses, as the first argument, enter the name of the first variable you wish to use to make the table, and use the <code>$</code> symbol to indicate that the variable (<code>gender</code>) belongs to the data frame in question (<code>personaldata</code>), which should look like this: <code>personaldata$gender</code>. Fifth, as the second argument, enter the name of the second variable you wish to use to make the table, and use the <code>$</code> symbol to indicate that the variable (<code>startdate</code>) belongs to the data frame in question (<code>personaldata</code>), which should look like this: <code>personaldata$startdate</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Create table from gender and startdate variables from personaldata data frame</span>
table_example <-<span class="st"> </span><span class="kw">table</span>(personaldata<span class="op">$</span>gender, personaldata<span class="op">$</span>startdate)
<span class="co"># View table in Console</span>
table_example</code></pre></div>
<pre><code>##
## 1/1/2016 1/4/2016 1/5/2016 1/7/2016 1/9/2016
## female 0 1 0 1 1
## male 1 0 1 0 4</code></pre>
<p>The table above shows how many female versus male employees started working on a given date.</p>
<p>Now we are ready to write the table called <code>table_example</code> to our working directory using the <code>write.table</code> function. As the first argument, type the name of the table object (<code>table_example</code>). Second, type what we would like to call the file when it is saved in our working directory (<code>"Practice Table.csv"</code>); be sure to include the .csv extension in the name and wrap it all in quotation marks. Third, use the <code>sep=","</code> argument to specify that the values in the table are separated by commas, as this will be a comma-separated values file. Fourth, add the argument <code>col.names=NA</code> so that the column names line up with their respective values. This fourth argument is needed because the first column of the written file contains the table's row names (here, <code>female</code> and <code>male</code>). Without it, the header row begins with the first column name (here, the first start date) rather than with a blank cell above the row names, so every column name sits one column to the left of its values. The <code>col.names=NA</code> argument leaves the first cell in the top row blank so that the first column name appears in the next column to the right. [To understand what the table would look like <em>without</em> this fourth argument, simply omit it, and open the resulting file in your working directory to see what happens.]</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Write table to working directory</span>
<span class="kw">write.table</span>(table_example, <span class="st">"Practice Table.csv"</span>, <span class="dt">sep=</span><span class="st">","</span>, <span class="dt">col.names=</span><span class="ot">NA</span>)</code></pre></div>
<p>If you go to your working directory, you will find the file called <strong>"Practice Table.csv"</strong>.</p>
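<p>To make the effect of <code>col.names=NA</code> concrete without opening the file in another program, the following sketch writes a small table both ways to temporary files and compares the header rows. The vectors and object names are hypothetical stand-ins for the variables used above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Small stand-in variables (hypothetical excerpt of the data)
gender <- c("male", "male", "female", "female", "male")
startdate <- c("1/1/2016", "1/9/2016", "1/4/2016", "1/9/2016", "1/9/2016")
table_demo <- table(gender, startdate)
# Temporary files to write to
aligned_file <- tempfile(fileext = ".csv")
shifted_file <- tempfile(fileext = ".csv")
# With col.names=NA, the header row begins with a blank cell
write.table(table_demo, aligned_file, sep = ",", col.names = NA)
# Without it, the header row has one fewer cell than the rows below it
write.table(table_demo, shifted_file, sep = ",")
# Compare the first line (header row) of each file
aligned_header <- readLines(aligned_file, n = 1)
shifted_header <- readLines(shifted_file, n = 1)
aligned_header
shifted_header</code></pre></div>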
</div>
</div>
<div id="summary" class="section level2">
<h2><span class="header-section-number">4.5</span> Summary</h2>
<p>Reading data into R is an important first step, and it is often the step that causes the most problems for new R users. The <code>read.csv</code>, <code>read_csv</code>, and <code>Read</code> functions can all be used to read data into R. The <code>read_csv</code> function has the advantage of being fast, which can be helpful when reading in large data files. The <code>Read</code> function has the advantage of being able to read in data file formats other than .csv. With all that said, if you're working with smaller data files in the .csv format, the <code>read.csv</code> function typically works just fine. In all subsequent tutorials, I use the <code>read_csv</code> function from the <code>readr</code> package. Writing data from the R environment to your working directory or another folder can be useful, especially when collaborating with those who do not use R. The <code>write.csv</code> function writes a data frame object to a .csv file, whereas the <code>write.table</code> function, used with the <code>sep=","</code> argument, writes a table object to a .csv file.</p>
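<p>As a compact recap, the sketch below writes a data frame to a .csv file and reads it back in. The data frame <code>employees</code> and its values are hypothetical, a temporary file is used in place of the working directory, and the <code>read_csv</code> step runs only if the <code>readr</code> package happens to be installed.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data frame used only for illustration
employees &lt;- data.frame(
  id   = c(101, 102, 103),
  dept = c("HR", "IT", "IT")
)

# Write the data frame to a temporary .csv file, omitting row names
csv_path &lt;- tempfile(fileext = ".csv")
write.csv(employees, csv_path, row.names = FALSE)

# Read the file back in with the base R function
from_base &lt;- read.csv(csv_path)
print(from_base)

# Read the same file with readr, if that package is available
if (requireNamespace("readr", quietly = TRUE)) {
  from_readr &lt;- readr::read_csv(csv_path)
  print(from_readr)
}</code></pre></div>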
</div>
<div id="references" class="section level2">
<h2><span class="header-section-number">4.6</span> References</h2>
<p>Wickham, H., & Grolemund, G. (2017). <em>R for data science: Visualize, model, transform, tidy, and import data.</em> Sebastopol, CA: O'Reilly Media, Inc. <a href="https://r4ds.had.co.nz/" class="uri">https://r4ds.had.co.nz/</a></p>
</div>
</div>
</section>
</div>
</div>
</div>
<a href="gentleintro.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="references.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
</div>
</div>
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
<script src="libs/gitbook-2.6.7/js/clipboard.min.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-sharing.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-clipboard.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": false,
"facebook": true,
"twitter": true,
"linkedin": false,
"weibo": false,
"instapaper": false,
"vk": false,
"all": ["facebook", "twitter", "linkedin", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": null,
"text": null
},
"history": {
"link": null,
"text": null
},
"view": {
"link": null,
"text": null
},
"download": ["R for HR.pdf", "R for HR.epub"],
"toc": {
"collapse": "subsection"
}
});
});
</script>
</body>
</html>