
Encoding in SAS

An encoding problem in SAS

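When SAS is started with the sasv9.cfg configuration file from the u8 (Unicode) directory, the -ENCODING system option is set to UTF-8. If the input data is in another encoding, such as euc-cn Simplified Chinese (EUC), and the data is not converted to the session's UTF-8 encoding, you may get the following error: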

ERROR: Some character data was lost during transcoding in the data set MYDATA.DS3. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
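Before changing anything, it can help to confirm that the error really comes from a mismatch between the session encoding and the data set encoding. A minimal sketch: PROC OPTIONS and the automatic macro variable SYSENCODING report the session encoding, and PROC CONTENTS shows the data set's encoding in its "Encoding" row. The path is the same placeholder used further below, and the libref MYDATA and member DS3 are taken from the error message.

```sas
/* Session encoding: PROC OPTIONS prints it to the log, and the   */
/* automatic macro variable SYSENCODING holds its name.           */
proc options option=encoding;
run;
%put NOTE: session encoding is &sysencoding.;

/* Data set encoding: PROC CONTENTS lists it in the "Encoding" row. */
/* The path is a placeholder; point it at your own library.         */
libname mydata "path to original data set";
proc contents data=mydata.ds3;
run;
```

If the session reports utf-8 while the data set reports euc-cn Simplified Chinese (EUC), the data has to be transcoded on read, which is where the truncation error can appear.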

To troubleshoot SAS encoding problems, you can follow the steps in the SAS documentation Migrating Data to UTF-8 for SAS.

For the ERROR above, see Determine Whether the CVP Engine Is Needed to Read Your Data without Truncation, which reads the data through the CVP engine:

  1. The first LIBNAME statement points to the original data set. Use a second LIBNAME statement to point to the location of the library that will contain the new data set.

     libname mylib cvp "path to original data set";
     libname mylib2 "path to new data set";
  2. Use PROC DATASETS with the COPY statement and the OVERRIDE= option. When you specify OVERRIDE=(ENCODING=SESSION OUTREP=SESSION) in the COPY statement, the new data set is created in the host data representation and encoding of the SAS session that is executing the COPY statement. Add the CONTENTS statement to view a description of the content of the new data set.

     proc datasets nolist;
       copy in=mylib out=mylib2 override=(encoding=session outrep=session);
       contents data=mylib2.mydata; 
     run;
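The truncation comes from byte lengths: a Simplified Chinese character takes 2 bytes in EUC-CN but 3 bytes in UTF-8, so a character variable can need up to 1.5 times its original length after transcoding. The CVP (character variable padding) engine avoids this by widening character variables as it reads them. The sketch below sets that expansion explicitly with CVPMULTIPLIER=; the option name is taken from the SAS CVP engine documentation and is an assumption here, so check that your SAS release supports it. The path and the member MYDATA are the placeholders from the steps above.

```sas
/* Pad every character variable to 1.5x its length while reading:     */
/* 2-byte EUC-CN characters become 3-byte UTF-8 characters.           */
/* CVPMULTIPLIER= is assumed from the CVP engine docs; verify it.     */
libname mylib cvp "path to original data set" cvpmultiplier=1.5;

/* The Variables section shows the padded lengths CVP will use */
proc contents data=mylib.mydata;
run;
```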

So either of the following two approaches works (both are based on the principle above):

libname mylib cvp "./Documents/test";
libname mylib2 "./Documents/format";
proc datasets nolist;
  copy in=mylib out=mylib2 override=(encoding=session outrep=session);
  contents data=mylib2.foo; 
run;

Or:

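libname inlib cvp "./Documents/test";
libname outlib "./Documents/format" outencoding="UTF-8";
/* Read FOO through CVP; OUTENCODING= makes the new copy UTF-8 */
data outlib.foo;
  set inlib.foo;
run;
proc datasets nolist;
  contents data=outlib.foo;
run;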
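To check that the converted copy holds the same values as the CVP view of the original, one option (not part of the original workflow) is PROC COMPARE. This sketch reuses the librefs and member name FOO from the first approach; PROC COMPARE writes its result code to the automatic macro variable SYSINFO, where 0 means no differences were detected.

```sas
/* Librefs and member name follow the first approach above */
libname mylib cvp "./Documents/test";
libname mylib2 "./Documents/format";

proc compare base=mylib.foo compare=mylib2.foo;
run;

/* Read SYSINFO immediately after PROC COMPARE; 0 = no differences */
%put NOTE: PROC COMPARE return code = &sysinfo.;
```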

If you do not see the ERROR above, try the following methods for general encoding problems:

  • Using the FILE Statement to Specify an Encoding for Writing to an External File
  • Using the FILENAME Statement to Specify an Encoding for Reading an External File
  • Using the FILENAME Statement to Specify an Encoding for Writing to an External File
  • Changing Encoding for Message Body and Attachment
  • Using the INFILE= Statement to Specify an Encoding for Reading from an External File

For the code, see: ENCODING Examples
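As a minimal sketch of the FILE, FILENAME and INFILE variants listed above, assuming a UTF-8 session: the step first writes a small EUC-CN demo file into the WORK directory (the file names are hypothetical) so that it runs without external data, then reads it back with ENCODING= on FILENAME and writes a UTF-8 copy with ENCODING= on FILE.

```sas
/* Hypothetical demo file in WORK so the sketch is self-contained */
%let demo = %sysfunc(pathname(work))/demo_euccn.txt;

/* FILE ... ENCODING= transcodes output from the session to EUC-CN */
data _null_;
  file "&demo" encoding="euc-cn";
  put "编码测试";          /* Chinese for "encoding test" */
  put "SAS encoding demo";
run;

/* FILENAME ... ENCODING= tells SAS how to decode the file on input; */
/* each line is transcoded into the session encoding.                */
filename rawin "&demo" encoding="euc-cn";

data work.foo_lines;
  infile rawin truncover;
  length line $200;
  input line $char200.;
run;

/* Write the lines out again as UTF-8 */
data _null_;
  set work.foo_lines;
  file "%sysfunc(pathname(work))/demo_utf8.txt" encoding="utf-8";
  put line;
run;
```

ENCODING= can also be given directly on the INFILE statement, e.g. `infile "&demo" encoding="euc-cn" truncover;`, instead of on FILENAME.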

The above is how I personally worked through this SAS encoding problem; it is for reference only.

This article comes from http://www.bioinfo-scrounger.com; please credit the source when reposting.