
When passed a tibble(), ds() always reports Warning: Unknown or uninitialised column: distbegin #218

@jjrob


Unless you are doing a binned analysis, you should not include distbegin and distend columns in your data frame passed to ds(). But when you don't, ds() always reports warnings about those columns being missing. Here's an example:

> head(obs1)
# A tibble: 6 x 4
  object Sample.Label  size distance
   <int> <chr>        <dbl>    <dbl>
1 276831 16936673      1        368.
2 276844 16936692      2       3089.
3 276846 16936693      4.67     635.
4 276832 16936673      3        583.
5 276895 16936828      1       3640.
6 276847 16936694      2       2333.
> summary(obs1)
     object       Sample.Label            size          distance     
 Min.   :268824   Length:329         Min.   : 1.00   Min.   :   0.0  
 1st Qu.:271124   Class :character   1st Qu.: 1.00   1st Qu.: 574.2  
 Median :273335   Mode  :character   Median : 1.25   Median :1499.4  
 Mean   :273629                      Mean   : 2.14   Mean   :1980.6  
 3rd Qu.:276267                      3rd Qu.: 3.00   3rd Qu.:3072.5  
 Max.   :293257                      Max.   :15.00   Max.   :8230.6  
> ds(obs, truncation = 6000, key = "hr", adjustment = NULL)
Warning: Unknown or uninitialised column: `distbegin`.
Warning: Unknown or uninitialised column: `distend`.
Warning: Unknown or uninitialised column: `distbegin`.
Fitting hazard-rate key function
AIC= 5400.108
No survey area information supplied, only estimating detection function.

Distance sampling analysis object

Detection function:
 Hazard-rate key function 

Estimated abundance in covered region: 836.4272 

This problem occurs in the latest version of Distance at the time of this writing, version 2.0.1. My guess is that it's caused by validation code in checkdata.R, such as this check at line 29:

  if(any(!is.null(data$distbegin), !is.null(data$distend)) && !all(!is.null(data$distbegin), !is.null(data$distend))){
    stop("You have provided either a 'distbegin' or 'distend' column in your dataset but not both. Please provide both or remove these and provide a distance column and use the cutpoint argument.", call. = FALSE)
  }
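The condition in that check is true exactly when one of the two columns is present and the other is not, i.e. it is an exclusive OR. A quick truth-table check in R (a sketch; `has_begin` and `has_end` are stand-ins for the two `!is.null(...)` tests, not names used in Distance):

```r
# Compare the checkdata.R condition with xor() over all four combinations of
# "distbegin present" / "distend present".
combos <- expand.grid(has_begin = c(FALSE, TRUE), has_end = c(FALSE, TRUE))

original <- mapply(function(a, b) any(a, b) && !all(a, b),
                   combos$has_begin, combos$has_end)
exclusive_or <- xor(combos$has_begin, combos$has_end)

# Side-by-side table: the last two columns should agree on every row.
print(cbind(combos, original, exclusive_or))
```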

Checks like !is.null(data$distbegin) will report the warning whenever data is a tibble that does not contain a distbegin column. Here is a demonstration, in which I first check my obs1 tibble above for an existing column, distance, and then for a nonexistent column, foobar:

> is.null(obs1$distance)
[1] FALSE
> is.null(obs1$foobar)
Warning: Unknown or uninitialised column: `foobar`.
[1] TRUE
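The warning comes from the `$` method that the tibble package defines for its own class; base R's `$` on a plain `data.frame` returns `NULL` for a missing column without warning. A minimal sketch comparing the two, using made-up toy data rather than obs1 (the tibble part only runs if tibble is installed, and `collect_warnings()` is a hypothetical helper written for this illustration):

```r
# Toy data with a `distance` column but no `distbegin`/`distend` columns.
df <- data.frame(distance = c(368, 3089, 635))

# Hypothetical helper: evaluate an expression and record any warning messages.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Plain data.frame: `$` on a missing column returns NULL silently.
df_result <- collect_warnings(df$distbegin)
print(df_result)

if (requireNamespace("tibble", quietly = TRUE)) {
  # tibble: the same lookup also returns NULL, but warns.
  tb <- tibble::as_tibble(df)
  tb_result <- collect_warnings(tb$distbegin)
  print(tb_result)

  # Converting back to a data.frame drops the tibble class and its `$` method.
  back_result <- collect_warnings(as.data.frame(tb)$distbegin)
  print(back_result)
}
```

This also suggests a possible user-side workaround, untested here against ds() itself: passing `as.data.frame(obs1)` instead of the tibble.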

A better way to check for a missing column might be the hasName() function from the utils package that ships with base R:

> !hasName(obs1, "foobar")
[1] TRUE
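Applied to the check from checkdata.R, a hasName()-based version might look like the sketch below. `check_bin_columns()` is a hypothetical helper, not a function in Distance; the error text is copied from the original check, `xor()` expresses the same "one but not both" condition, and because hasName() only inspects `names(data)` it never calls a tibble's `$` method. Returning whether both columns are present is an assumption made for the example.

```r
# Hypothetical helper: validate the distbegin/distend pair without using `$`,
# so tibbles lacking these columns do not trigger warnings.
check_bin_columns <- function(data) {
  has_begin <- utils::hasName(data, "distbegin")
  has_end <- utils::hasName(data, "distend")
  # Same condition as any(...) && !all(...): exactly one column is present.
  if (xor(has_begin, has_end)) {
    stop("You have provided either a 'distbegin' or 'distend' column in your dataset but not both. Please provide both or remove these and provide a distance column and use the cutpoint argument.", call. = FALSE)
  }
  # Invisibly report whether the data look binned (both columns present).
  invisible(has_begin && has_end)
}

# Usage with toy data (not the obs1 data above):
unbinned <- data.frame(distance = c(368, 3089, 635))
binned <- data.frame(distbegin = c(0, 500), distend = c(500, 1000))
half <- data.frame(distbegin = c(0, 500))

check_bin_columns(unbinned)   # no error, no warning
check_bin_columns(binned)     # no error
try(check_bin_columns(half))  # error: only distbegin supplied
```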

Am I missing something? If my understanding is correct, this problem affects every Distance user who passes a tibble to ds() (except those performing binned analyses). It came up recently on the email list, e.g. here, but was not diagnosed as a bug at that time.
