categorize
Data exploration and transformation
Create a categorical variable from a continuous one.
Description
categorize
is a shortcut and extension of egen newvar = cut(args) [...], icodes
.
Unlike egen
with the cut()
function, categorize
:
Does not require the user to include the minimum and the maximum value of the continuous variable in the list of breaks.
Creates more descriptive value labels for the generated categorical variable. Users can specify a variable label for the new variable.
Allows users working with age or poverty ratio variables to use “default” breaks.
Syntax
generate(newvar) {breaks(numlist)|default(string)} [options] categorize varname,
Option | Description |
---|---|
generate(newvar) |
Name of the categorical variable to be generated. |
breaks(numlist) |
Left-hand ends of the grouping intervals. Do not include the minimum or the maximum value of varname. Either breaks() or default() must be specified. |
default(string) |
Use default breaks; “age” or “povratio”. For default("age") , these are 18 and 65. For default("povratio") , these are 50, 100, 150, 200, and 250. Cannot be combined with breaks() . |
lblname(string) |
Name of value label to create; default is “varname_lbl”. Ignored if nolabel is specified. |
nformat(%fmt) |
Numeric format to use in value labels; default is %13.0gc. Ignored if nolabel is specified. |
nolabel |
Do not assign value labels to newvar. |
varlabel(string) |
Variable label for newvar. |
Examples
Using user-specified breaks.
generate(pincp_cat) breaks(25000 50000 100000) categorize pincp_adj,
Using default breaks.
generate(age_cat) default("age") varlabel("Age group") categorize agep,