0%

Select N Rows or K-th Elements from Macro Variable - SASlearner

Selecting N rows from a dataset or K-th element from a macro variable is a common data manipulation process. This post is listing the ways how to resolve these questions.

First or Last N Rows

For the proc sql method, inobs or outobs both can be used to select N rows from a dataset, but worth noting that it will cause differences if you join tables.

proc sql inobs = 5 /*outobs=5*/;
    create table cls as
        select * from sashelp.class;
quit;

For the SAS code instead of proc sql, the most straightforward method is to utilize the obs that is very similar to the sql method. Otherwise if you would like to select a range of rows, just add another parameter firstobs.

data raw_o;
    set sashelp.class(firstobs = 5 obs = 10);
run;

Utilizing the _N_ variable with the IF-ELSE statement to reach this purpose I suppose is more flexible sometimes.

data raw_1;
    set sashelp.class;
    if 5<= _N_ <=10 then output;
run;

So how about selecting the last rows? It seems we have to know the total number of rows, and then utilize the _N_ variable once.

data raw_2;
    set sashelp.class;
    if &n_rows.-4<=_N_<=&n_rows. then
        output;
run;

BTW how to select N observations randomly, we can use the proc surveyselect procedure and define method = srs as the simple random selection method, so that we get the random 5 rows from this dataset.

proc surveyselect data = sashelp.class out = rd_class
    method = srs sampsize = 5 seed = 123456;
run;

Besides for the row number, we can also add a new row number by group, as shown the following example:

proc sort data = sashelp.class out = sorted_class;
    by age;
run;

data sorted_class_2;
    set sorted_class;
    by age;
    if first.age then new_row_number=0;
    new_row_number+1;
run;

K-th Elements from Macro Variable

First off, we would want to create a macro list to store information. If we just want to store a single value, as following:

proc sql;
    select count(name) into: n_name trimmed from sashelp.class;
quit;
%put &n_name;

Storing multiple values is also very similar.

proc sql;
    select count(name),mean(height) format=10.2
        into: n_name trimmed, :mean_height trimmed
    from sashelp.class;
quit;
%put &n_name &mean_height;

Or just simply want the list values to be assigned to a list of macro variables.

proc sql;
    select distinct(name) into: n1-:n19 from sashelp.class;
quit;

But the above example seems like you have to know the total number of distinct values in the dataset. So maybe the common way is to store the column values in a list separated by any delimiter you want.

proc sql;
    select distinct(name) into: nameList separated by ' '
        from sashelp.class;
/*    %let numNames = &sqlobs;*/
quit;
%put &nameList.;

Supposed that I just want the second element in this macro variable, what should I do? Maybe the %scan function is enough to reach our purpose.

%put %scan(&nameList,2);

Obviously, it seems not as convenient to extract the element as nameList[2] in R, but it is enough to use in SAS.

Another way is to loop the macro variable by %do, as below to assign each element to a new column.

%let cntName = %sysfunc(countw(&nameList));
data raw;
    array names[&cntName] $200 name1-name&cntName;
    do i=1 to &cntName;
        names[i]=scan("&nameList", i);
    end;
    drop i;
run;

Otherwise there are still many great documents that have been posted to show how to store and manipulate lists in SAS, like Choosing the Best Way to Store and Manipulate Lists in SAS


Reference

How to use the SAS SCAN Function?
Creating Lists! Using the Powerful INTO Clause with PROC SQL to Store Information in Macro Lists
How to Select the First N Rows in SAS
Using SAS® Macro Variable Lists to Create Dynamic Data-Driven Programs
Storing and Using a List of Values in a Macro Variable

Please indicate the source: http://www.bioinfo-scrounger.com