This post shows how to display descriptive statistics for variables quickly, that is, the usual and agile ways to do it in SAS.
The following examples answer the questions below (very simple, but quite common):
- How to count distinct values
- How to count variables by group
- How to produce the frequency table of variables
- How to calculate the statistics for variables
In R, Hmisc::describe is one option, but it is not the only one; other external packages or base functions like summary also work very well.
Count Values or Distinct Values
Here we use proc sql with the SAS dataset sashelp.BirthWgt to count the non-missing values of the Race variable:
proc sql; select count(Race) as cnt_race from sashelp.BirthWgt; quit;
But I feel that counting only the total number of Race values does not make much sense. If we would like to count the Married values grouped by Race, we can add a group by clause:
proc sql; select Race, count(Married) as cnt_married from sashelp.BirthWgt group by Race; quit;
If you want to count the distinct values, add the distinct keyword inside the count function:
proc sql; select count(distinct Married) as distinct_married from sashelp.BirthWgt; quit;
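As a side-by-side sketch of the queries above, the following step puts the three kinds of count into one select. The column aliases (n_rows, n_married, n_distinct_married) are names chosen here for illustration only. Note that count(*) counts every row, while count(Married) counts only the rows where Married is not missing.

```sas
proc sql;
    /* Aliases below are illustrative names, not from the original post */
    select count(*)                as n_rows,             /* all rows */
           count(Married)          as n_married,          /* non-missing values of Married */
           count(distinct Married) as n_distinct_married  /* number of distinct non-missing values */
    from sashelp.BirthWgt;
quit;
```

If the first two numbers differ, the gap is the number of rows with a missing Married value.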
We can use proc freq to create frequency tables for one or more variables. The example below tabulates the SomeCollege variable with missing values included, processed by Race, and saves the output as the result dataset, which includes cumulative frequencies and percentages. Because the by statement needs sorted input and the sashelp library is normally read-only, the data is first sorted into a copy in the work library.
proc sort data=sashelp.BirthWgt out=work.BirthWgt; by Race; run; proc freq data=work.BirthWgt; tables SomeCollege / out=result missing outcum; by Race; run;
By the way, if you add a statistical option like chisq to the tables statement, the printed output also includes the statistics for the chi-square tests.
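A minimal sketch of the chisq option follows. The pairing of Race and Married is picked here only for illustration and is not taken from the original example; any two categorical variables in the dataset could be crossed in the same way.

```sas
/* Two-way frequency table with chi-square tests of association. */
/* Race*Married is an illustrative choice of variables.          */
proc freq data=sashelp.BirthWgt;
    tables Race*Married / chisq;
run;
```

No sorting is needed here because the step does not use a by statement.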
Alternatively, we can use proc tabulate to create a table that displays multiple statistics quickly:
proc tabulate data = sashelp.cars; var weight; table weight * (N Min Q1 Median Mean Q3 Max); run;
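To connect this with the earlier question about grouping, here is a sketch of the same idea with a class variable, assuming we want the statistics broken down by the Origin variable of sashelp.cars (a choice made here for illustration). The class variable forms the rows and the statistics form the columns.

```sas
/* One row per Origin level, one column per statistic of weight. */
/* Origin is an illustrative grouping variable.                  */
proc tabulate data=sashelp.cars;
    class origin;
    var weight;
    table origin, weight * (N Min Q1 Median Mean Q3 Max);
run;
```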
But I think proc means is more convenient when you want to save the output to a dataset, like this:
proc means data = sashelp.cars n nmiss mean std median p25 p75 min max; var weight; output out=weight_tbl n=n nmiss=nmiss mean=mean std=std median=median p25=p25 p75=p75 min=min max=max; run;
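As a quick check, assuming the proc means step above has already run and created weight_tbl in the work library, the saved dataset can be printed. Besides the requested statistics, the output dataset also carries the automatic _TYPE_ and _FREQ_ variables that proc means adds.

```sas
/* Print the one-row summary saved by the output statement above. */
proc print data=weight_tbl noobs;
run;
```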
Please indicate the source: http://www.bioinfo-scrounger.com