categorize
Data exploration and transformation
Create a categorical variable from a continuous one.
Description
categorize is a shortcut for, and an extension of, egen newvar = cut(args) [...], icodes.
Unlike egen with the cut() function, categorize:
- Does not require the user to include the minimum and the maximum value of the continuous variable in the list of breaks.
- Creates more descriptive value labels for the generated categorical variable. Users can specify a variable label for the new variable.
- Allows users working with age or poverty ratio variables to use “default” breaks.
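The first difference can be illustrated with a minimal sketch that groups agep at 18 and 65 both ways. The outer bounds 0 and 120 passed to egen are assumed limits for an age variable, chosen only for illustration, and age_cat_egen is a hypothetical variable name.

```stata
* With egen, the breaks must bracket the full range of the variable:
* values below the first break or at/above the last break become missing.
* 0 and 120 are assumed bounds for agep, not values taken from categorize.
egen age_cat_egen = cut(agep), at(0 18 65 120) icodes

* With categorize, only the interior breaks are listed.
categorize agep, generate(age_cat) breaks(18 65)
```

Going by the description above, both calls should place observations in the same three groups (under 18, 18 to 64, 65 and over), with categorize also attaching value labels to the result.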
Syntax
categorize varname, generate(newvar) {breaks(numlist)|default(string)} [options]

| Option | Description |
|---|---|
| generate(newvar) | Name of the categorical variable to be generated. |
| breaks(numlist) | Left-hand ends of the grouping intervals. Do not include the minimum or the maximum value of varname. Either breaks() or default() must be specified. |
| default(string) | Use default breaks; “age” or “povratio”. For default("age"), these are 18 and 65. For default("povratio"), these are 50, 100, 150, 200, and 250. Cannot be combined with breaks(). |
| lblname(string) | Name of value label to create; default is “varname_lbl”. Ignored if nolabel is specified. |
| nformat(%fmt) | Numeric format to use in value labels; default is %13.0gc. Ignored if nolabel is specified. |
| nolabel | Do not assign value labels to newvar. |
| varlabel(string) | Variable label for newvar. |
Examples
Using user-specified breaks.
categorize pincp_adj, generate(pincp_cat) breaks(25000 50000 100000)
Using default breaks.
categorize agep, generate(age_cat) default("age") varlabel("Age group")
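
Using the remaining options. The following is a sketch rather than tested output: povpip is a hypothetical poverty ratio variable, and the names pov_cat, pincp_cat2, pincp_raw, and pincp_lbl, along with the format %9.0fc, are illustrative choices rather than defaults of categorize.

```stata
* Default poverty-ratio breaks (50, 100, 150, 200, and 250).
* povpip is a hypothetical variable name used only for illustration.
categorize povpip, generate(pov_cat) default("povratio") varlabel("Poverty ratio group")

* User-specified breaks with a custom value-label name and numeric format.
categorize pincp_adj, generate(pincp_cat2) breaks(25000 50000 100000) lblname(pincp_lbl) nformat(%9.0fc)

* Same breaks, but without value labels on the new variable.
categorize pincp_adj, generate(pincp_raw) breaks(25000 50000 100000) nolabel

* Inspect the resulting groups.
tabulate pincp_cat2
```

Because lblname() and nformat() are ignored when nolabel is specified, there is no reason to combine them with nolabel in the same call.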