As you know, Stata stores value-labeled data as a vector of integers or doubles (not necessarily a consecutive sequence starting at 1) together with what is effectively a `Dict` mapping `Int => String`.
Accessing the string values, which we generally care about most, is hard with `ReadStat`. You have to:
1. Use `ReadStat`, not `StatFiles`, to access the internal fields of the Stata file
2. Construct the `DataFrame` from the data and header fields
3. Use the `value_label_dict` field to perform the replacement
4. Use `get` on the `DataValue` elements of the array

This is not the most user-friendly workflow.
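To make the replacement step concrete, here is a minimal sketch in plain Julia. It deliberately does not call `ReadStat`: `codes` and `value_labels` are hypothetical stand-ins for the raw column and the label dictionary a reader exposes, and `Missing` is used in place of `DataValue` to keep the example dependency-free.

```julia
# Hypothetical stand-ins (not ReadStat output): raw codes and their label dictionary.
codes = Union{Int,Missing}[1, 5, 9, missing, 5]
value_labels = Dict(1 => "low", 5 => "medium", 9 => "high")

# Replace each code with its label; missing or unlabeled codes become `missing`.
label_of(code) = ismissing(code) ? missing : get(value_labels, code, missing)
labels = map(label_of, codes)
```

Even in this simplified form the codes and the labels live in two separate objects that the user has to keep in sync by hand.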
There isn't a great solution for this in Julia, as we don't have a `CategoricalArray` equivalent where the base dict maps arbitrary types to strings. Converting to a categorical array therefore drops the underlying integers, which are worth keeping for interoperability.
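As a rough illustration of what such a type could look like, the sketch below keeps the raw codes and the label dictionary together. `LabeledVector`, `labelof`, and `aslabels` are hypothetical names invented for this example; they are not part of `ReadStat`, `StatFiles`, or `CategoricalArrays`.

```julia
# Hypothetical type, not from any package: raw codes stored next to their labels.
struct LabeledVector{T} <: AbstractVector{T}
    codes::Vector{T}
    labels::Dict{T,String}
end

# Behave like an ordinary vector of the underlying codes.
Base.size(v::LabeledVector) = size(v.codes)
Base.getindex(v::LabeledVector, i::Int) = v.codes[i]
Base.IndexStyle(::Type{<:LabeledVector}) = IndexLinear()

# Look up the label of element i; codes without a label map to `missing`.
labelof(v::LabeledVector, i::Int) = get(v.labels, v.codes[i], missing)
aslabels(v::LabeledVector) = [labelof(v, i) for i in eachindex(v)]

v = LabeledVector([1, 5, 9, 5], Dict(1 => "low", 5 => "medium", 9 => "high"))
```

With this layout, indexing (`v[2]`) still returns the original code, while `aslabels(v)` produces the strings on demand, so nothing is lost in a round trip.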
`haven` in R recently changed how this is handled, using the `<dbl+lbl>` vector type, though working with it is a bit of a pain; see here.
I can email someone a dataset with an MWE for more information.