Saturday, October 30, 2010

SAS - Efficiency

Some SAS recommendations for making the code more efficient.
  • Avoid unnecessary DATA steps.
  • Use subsetting IF statements as early as possible, before reading the rest of the raw data.
  • Use of WHERE statements in procedures.
  • Drop unnecessary variables by using KEEP/DROP.
  • Optimize size of variables using LENGTH.
  • Use of IF-THEN-ELSE statements when possible instead of an IF-THEN sequence.
  • Order of conditions inside the statements. The most frequent case should be the first.
  • Use the IN operator instead of multiple OR operators.
  • Use of DATA _NULL_ when creating Reports.
  • Use of permanent data sets in libraries.
  • Use of PROC DATASETS to modify variable attributes (names, labels, formats) instead of a DATA step.
  • PROC APPEND to join similar datasets.
  • Use the RETAIN statement to initialize constants.
  • Avoid unnecessary sorts: a single two-level sort instead of a one-level sort followed by a two-level sort.
  • Sort only what we have to sort.
  • Sort: use the NOEQUALS option if we do not need to preserve the relative order of observations with equal BY values.
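
Some of these recommendations can be sketched in one short program. This is only an illustration: the data sets work.sales and work.sales_all and the variables region, amount and year are hypothetical names, and the assumption that small amounts are the most frequent case is made up for the example.

```sas
/* Hypothetical data set and variable names, for illustration only. */

/* KEEP= discards unneeded variables as the data set is read.      */
/* IF-THEN-ELSE stops testing after the first true condition.      */
data work.sales_small;
  set work.sales (keep=region amount year);
  length size $ 6;                             /* no longer than the longest value */
  if amount < 100 then size = 'small';         /* assumed most frequent case first */
  else if amount < 1000 then size = 'medium';
  else size = 'large';
run;

/* WHERE filters inside the procedure, so no extra DATA step is needed. */
proc means data=work.sales_small;
  where region = 'North';
  var amount;
run;

/* PROC APPEND adds the new rows to the base data set without rereading it. */
proc append base=work.sales_all data=work.sales_small;
run;
```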

SAS - Macros

Macros are powerful tools used in SAS code which allow the execution of generic code applied to specific tables.
  • Macro variables
&name; /* Macro variable */
%name; /* Use of created macros or system macros */


%let city = Barcelona; /* no quotes: they would become part of the value */
TITLE "City: &city";

  • Generation of code by using macros
%macro b;
  NAME_DATA /* text generated by the macro; NAME_DATA is a placeholder table name */
%mend b;

proc print data=%b;
run;
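
A macro can also take parameters, which is what makes the same generic code reusable on different tables. The following is a minimal sketch; the macro name print_head and its parameters table and n are hypothetical, chosen only for this example.

```sas
/* Hypothetical macro: prints the first n rows of any table. */
%macro print_head(table, n=5);
  proc print data=&table (obs=&n);
  run;
%mend print_head;

/* Each call generates a complete PROC PRINT step. */
%print_head(NAME_DATA)
%print_head(NAME_DATA_2, n=10)
```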

SAS - Retain and Lag

In SAS, working inside one observation is easy.
On the contrary, working across observations is complicated.

In this post we will cover the basics of two tools that allow us to pass information across observations.
With RETAIN, we can keep the value of a variable from one observation to the next. In the following example, each observation gets an additional subject field: it is initialized to 0 and incremented before each read, so the first observation has subject 1, the second 2, and so on.

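data better; /* the two data lines below are illustrative scores */
retain subject 0; /* initial value, kept across iterations */
subject = subject + 1;
input score1 score2;
datalines;
90 85
70 95
;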

RETAIN has some pitfalls, especially when dealing with missing values. If a retained variable is also read from the input and the incoming value is missing, the missing value overwrites the previous non-missing one.
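
One common way to keep the last non-missing value is to retain a separate variable and update it only when the incoming value is present. A minimal sketch, assuming a hypothetical data set raw with a numeric variable score:

```sas
/* Hypothetical names: raw, score, last_score. */
data filled;
  set raw;
  retain last_score;                               /* not reset between observations */
  if not missing(score) then last_score = score;   /* update only on real values */
run;
```

Here last_score carries the most recent non-missing score forward, while score itself is left untouched.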

  • LAG
The LAG function returns the value its argument had the last time the function was executed, which is not necessarily the value from the previous observation. For example,

if subj ne lag(subj) then old = new;

Here, old only changes when there is a new subject.
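
Because LAG remembers the value from its last execution, it is usually safer to call it on every observation and apply the condition afterwards. The sketch below computes the change in score within each subject; the data set visits and its four rows are invented for the example.

```sas
/* Hypothetical input: one row per visit, sorted by subj. */
data visits;
  input subj score;
  datalines;
1 10
1 12
2 20
2 25
;
run;

data changes;
  set visits;
  prev_subj  = lag(subj);     /* LAG is called on every row, unconditionally */
  prev_score = lag(score);
  if subj ne prev_subj then diff = .;   /* first row of a subject: no previous score */
  else diff = score - prev_score;
  drop prev_subj prev_score;
run;
```

With these four rows, diff is missing, 2, missing and 5.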

SAS - Basic statements

SAS is a powerful statistical language built mainly around DATA steps and procedures (PROC), including PROC SQL, an extended SQL dialect. In this post we will summarize the basic statements.

This page, from UCLA, has a basic SAS tutorial.

  • DATA statement
data NAME_DATA;
input VAR_1 VAR_2 VAR_3 $ VAR_4; /* $ indicates VAR_3 is alphanumeric */
/* start of data reading */
datalines;
123 1 foo 234
124 2 foo2 221
;
run;

  • SET statement (inside DATA)
data NAME_DATA;
set NAME_DATA_2;
run; /* NAME_DATA := NAME_DATA_2 */

  • SET statement with conditional (inside DATA)
data NAME_DATA;
set NAME_DATA_2 NAME_DATA_3; /* concatenates name_data_2 and name_data_3 */
if VAR_1 = 'yes' then answer = 1;
else answer = 0;
run;
/* NAME_DATA := NAME_DATA_2 U NAME_DATA_3 with the additional answer field in each row */

  • PROC SQL
proc sql;
/* sql code like */
create table NAME_DATA as
select * from ANOTHER_TABLE group by VAR_1;
quit;

It is important to notice that a proc sql block ends with quit, whereas DATA steps and most other procedures end with run.

  • LEFT JOIN (inside PROC SQL)
LEFT JOIN joins two tables on some relation between them, with the additional guarantee that every row of the first table (the left table) appears at least once in the result, even if it matches no row of the second table. In that case the columns coming from the second table are missing.
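
As a sketch, assume two hypothetical tables, customers(id, name) and orders(id, amount). Every customer appears in the result; customers without orders get a missing amount.

```sas
/* Hypothetical tables: customers(id, name) and orders(id, amount). */
proc sql;
  create table customer_orders as
  select c.id, c.name, o.amount
  from customers as c
  left join orders as o
    on c.id = o.id;
quit;
```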

  • SELECT DISTINCT (inside SQL code)
SELECT DISTINCT ensures that each combination of the selected fields appears only once in the result. To get one row per element, list its identifying fields in the select distinct clause.
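
For example, assuming a hypothetical table customers in which the same id and name may be repeated, the following keeps one row per (id, name) pair:

```sas
/* Hypothetical table: customers(id, name), possibly with repeated rows. */
proc sql;
  create table unique_customers as
  select distinct id, name
  from customers;
quit;
```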

  • How to sort a table?
proc sort data = NAME_DATA;
by VAR_1 descending VAR_2; /* no commas in a BY statement */
run;

proc sql;
select * from NAME_DATA order by VAR_1, VAR_2 desc;
quit;

Or using the assistant: menu Data -> Query.

  • Variable names: no special characters, only letters, digits and '_', and they cannot start with a digit. Name literals are quoted strings followed by n, like 'Hello World!'n

Tuesday, October 19, 2010

LU decomposition with Lapack

2 pages where we can find information:

Saturday, October 16, 2010


Useful commands to plot histograms:

gnuplot> set style data histograms
gnuplot> set style histogram clustered gap 0
gnuplot> set style fill solid
gnuplot> set boxwidth 0.9 relative
gnuplot> plot "random1.dat" using 2:xticlabels(1)

Some interesting examples:

If we don't want to show the key:

gnuplot> unset key