If you often work with data manipulation, obviously you need to know how to merge and transpose data as that is very common in our data processing.
In R, you can use the
dplyr package to
inner_join and other functions
*_join() to handle your data in any form. As per to transposing, we generally call it Pivot data, and use corresponding
pivot_longer() functions from the
tidyr package to handle your data from long to wide, or the opposite. This how to use R to pivot data could refer to my post. Pivoting data in R
As for the SAS, let's see this with step-by-step examples.
In SAS, it defines the merge processing as three approaches.
- one-to-one reading
The first two ways are the usage of
set, not to discuss in this post. The third way I suppose is the most common in our data processing.
First, we create two example datasets as shown below.
proc sort data = sashelp.class; by Name; run; data cls1; set sashelp.class(keep = Name Sex obs = 10); run; data cls2; set sashelp.class(keep = Name Age firstobs = 5 obs = 15); run;
Then we define the most important argument
by to specify which variable to join by. Before merging, we must make sure the dataset is sorted by that
A simple answer is that the SAS match-merge is based on the classic sequential match algorithm, and the latter is based on the premise that all input streams are sorted identically.
data clss; merge cls1 cls2; by Name; run;
Actually in my option, the above code is not commonly used, that is just similar with the
full_join all x rows, followed by unmatched y rows. Maybe I would like to use merge processing such as
inner_join. In this case we need to specify the
IN= argument in the
For instance, post-process like
data clss2; merge cls1(in = x) cls2(in = y); by Name; if x; run;
data clss2; merge cls1(in = x) cls2(in = y); by Name; if x and y; run;
It can be seen above that we can use
IN= to control which rows to be filtered.
However, considering we have to sort the dataset first, I sometimes would like to use
proc sql to merge data. Since it’s close to the form I used in R.
proc sql; create table clss3 as select x.*, y.* from cls1 as x left join cls2 as y on x.Name=y.Name; quit;
In my opinion, for transpose processing, it's better to learn and understand it with a few examples. Simply memorizing the arguments is easy to confuse. I suggest running each code and seeing the output, and thinking of how to realize it.
So let's see the examples to show transpose a dataset from long to wide, i.e. rows to columns.
First, create an example data from
data shoes; set sashelp.shoes; if Subsidiary in ("Johannesburg" "Nairobi"); keep Region Product Subsidiary Sales Inventory; run;
By default, the transpose procedure only transposes the numeric columns from long to wide and ignores any character column. But actually in normal work, we generally will define a series of arguments like
label. Besides with the
var statement, we can select which column or columns you want to transpose. And for the
id statement you can use the variable of a column as the new variable names.
proc transpose data = shoes(where = (Subsidiary = "Johannesburg")) out = res; var Sales; id Product; run;
If you want to group the data by a variable, then add the
proc transpose data = shoes out = res; var Sales; id Product; by Subsidiary; run;
If you want to re-define the columns
_LABEL, then add the
proc transpose data = shoes out = res name = var_name label = label_name; var Sales; id Product; by Subsidiary; run;
Then we see an example to show how to transpose data from wide to long.
proc transpose data = res(drop = var_name label_name) out = res2; var Boot Sandal Slipper; by Subsidiary; run;
SAS MERGING TUTORIAL
MATCH MERGING DATA FILES IN SAS | SAS LEARNING MODULES
Complete Guide to PROC TRANSPOSE in SAS
HOW TO RESHAPE DATA WIDE TO LONG USING PROC TRANSPOSE | SAS LEARNING MODULES
Please indicate the source: http://www.bioinfo-scrounger.com