Reading `.dta` with value labels

Stata stores a value-labeled variable as a vector of integers or doubles (the codes are not necessarily a consecutive sequence starting at `1`) plus a mapping from code to label, which in Julia is naturally a `Dict` of `Int => String`.

Accessing the string labels, which are generally what we care about most, is hard with `ReadStat`. You have to:

1. Use `ReadStat` rather than `StatFiles` to access the internal fields of the Stata file
2. Construct the `DataFrame` from the data and header fields
3. Use the `value_label_dict` field to perform the replacement
4. Use `get` on the `DataValue` elements of the array

This is not the most user-friendly workflow.
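To make the replacement in step 3 concrete, here is a minimal sketch in plain Julia. It does not call `ReadStat` at all: `codes` and `value_labels` are hypothetical stand-ins for the code vector and label dictionary you would pull out of the file, and `apply_labels` is an illustrative helper, not part of any package.

```julia
# Hypothetical stand-ins for what a reader would return:
# raw Stata codes (not consecutive, possibly missing) and a code => label Dict.
codes = Union{Int,Missing}[1, 5, 9, missing, 5]
value_labels = Dict(1 => "low", 5 => "medium", 9 => "high")

# Replace each code with its label. Missing values stay missing, and a code
# without a label falls back to its string form.
function apply_labels(codes, labels)
    return map(codes) do c
        ismissing(c) ? missing : get(labels, c, string(c))
    end
end

labelled = apply_labels(codes, value_labels)
```

The result only holds strings, so the original codes have to be kept in a separate column if they are still needed.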

There isn't a great solution for this in Julia, as we don't have a `CategoricalArray` equivalent whose underlying dict maps arbitrary values to strings. Converting to a categorical array therefore drops the underlying integers, which are worth keeping for interoperability.
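As a rough illustration of what such a type could look like, the sketch below wraps the raw codes and the label dictionary in one vector. `LabeledVector`, `label`, and `all_labels` are hypothetical names invented for this example, not an existing API, and missing values are ignored for brevity.

```julia
# Hypothetical type: behaves like a vector of the raw codes,
# but carries the code => label lookup alongside.
struct LabeledVector{T} <: AbstractVector{T}
    codes::Vector{T}
    labels::Dict{T,String}
end

# Minimal AbstractVector interface: indexing returns the raw code.
Base.size(v::LabeledVector) = size(v.codes)
Base.IndexStyle(::Type{<:LabeledVector}) = IndexLinear()
Base.getindex(v::LabeledVector, i::Int) = v.codes[i]

# Label of element i; falls back to the code printed as a string.
label(v::LabeledVector, i::Int) = get(v.labels, v.codes[i], string(v.codes[i]))
all_labels(v::LabeledVector) = [label(v, i) for i in eachindex(v)]

v = LabeledVector([1, 5, 9, 5], Dict(1 => "low", 5 => "medium", 9 => "high"))
```

With this, `v[2]` still returns the integer code while `label(v, 2)` returns the string, so neither representation is lost.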

`haven` in R recently changed how this is handled, using the `<dbl+lbl>` vector type, though working with it is a bit of a pain; see [here](https://stackoverflow.com/a/63566175/8891877).

I can email a dataset along with an MWE to anyone who wants more information.





Reading `.dta` with value labels #74
