- Avoid unnecessary DATA steps.
- Apply subsetting IF statements as early as possible when loading raw data.
- Use WHERE statements in procedures.
- Drop unnecessary variables by using KEEP/DROP.
- Optimize the size of variables using LENGTH.
- Use IF-THEN-ELSE statements when possible instead of a sequence of IF-THEN statements.
- Order the conditions inside these statements so that the most frequent case comes first.
- Use IN instead of multiple OR operators.
- Use DATA _NULL_ when creating reports.
- Use permanent data sets in libraries.
- Use PROC DATASETS to modify variables, instead of DATA steps.
- Use PROC APPEND to join similar data sets.
- Use the RETAIN statement to initialize constants.
- Avoid unnecessary sorts: use a single two-level sort instead of a one-level sort followed by a two-level sort.
- Sort only what we have to sort.
- Sort using NOEQUALS if we do not need the relative order.
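Several of the tips above can be sketched together in one small program. This is only an illustration: it assumes the SASHELP.CLASS sample table that ships with SAS, and the WORK data set names and the group variable are made up for the example.

```sas
/* Work on a copy so that SASHELP is never modified */
data work.class;
   set sashelp.class(keep=name sex age height);   /* KEEP= drops unneeded variables on input */
   where age >= 12;                                /* WHERE filters rows before they enter the step */
   length group $ 7;                               /* LENGTH sizes the new variable explicitly */
   if age in (12, 13) then group = 'younger';      /* IN instead of several ORs */
   else if age in (14, 15) then group = 'older';   /* ELSE skips the remaining tests after a match */
   else group = 'oldest';
run;

/* WHERE inside a procedure avoids an extra subsetting DATA step */
proc means data=work.class mean;
   where sex = 'F';
   var height;
run;

/* PROC DATASETS changes a variable attribute without rewriting the data */
proc datasets library=work nolist;
   modify class;
   label group = 'Age group';
run;
quit;

/* Split the sample table in two, then join the parts again with PROC APPEND */
data work.first_part work.second_part;
   set sashelp.class;
   if _n_ <= 10 then output work.first_part;
   else output work.second_part;
run;

proc append base=work.first_part data=work.second_part;
run;

/* One two-level sort; NOEQUALS because the relative order of ties is not needed */
proc sort data=work.class out=work.class_sorted noequals;
   by sex descending age;
run;
```

PROC APPEND only reads the DATA= table and adds its rows to the end of BASE=, which is why it tends to be cheaper than re-reading both tables with SET.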
Welcome to Fer programes, a blog created to help us and other programmers in the development of IT applications. It includes some commands used previously to solve specific problems, links to interesting web pages and general explanations about computing topics. You are free to contribute comments whenever you consider it appropriate.
Saturday, October 30, 2010
SAS - Efficiency
Some SAS recommendations for making the code more efficient.
SAS - Macros
Macros are a powerful SAS feature that allows generic code to be written once and then applied to specific tables.
- Macro variables
%name; /* Call to a user-defined or system macro */
Example:
%let city = Barcelona; /* no quotes: they would become part of the value */
TITLE "City: &city";
- Generation of code by using macros
%macro b;
a
%mend;
proc print data=%b; /* resolves to: proc print data=a; */
run;
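The idea of generic code applied to specific tables can be sketched with a parameterized macro. The macro name print_table and its parameters ds and nobs are hypothetical, and the SASHELP tables are just the samples shipped with SAS.

```sas
/* Hypothetical macro: prints the first rows of any table passed in */
%macro print_table(ds, nobs=5);
   title "First &nobs rows of &ds";
   proc print data=&ds(obs=&nobs);
   run;
   title;   /* clear the title so it does not leak into later output */
%mend print_table;

/* The same code is generated for two different tables */
%print_table(sashelp.class)
%print_table(sashelp.cars, nobs=3)
```

Each call expands to an ordinary PROC PRINT step, so the macro adds no run-time cost beyond the generated code itself.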
SAS - Retain and Lag
In SAS, working inside one observation is easy.
By contrast, working across observations is complicated.
This post covers the basics of RETAIN and LAG, two tools that pass information from one observation to the next.
- RETAIN
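data better;
retain subject 0; /* subject keeps its value from one observation to the next */
subject = subject + 1;
input score1 score2;
/* the two data lines below are illustrative */
datalines;
10 20
30 40
;
run;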
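RETAIN presents some problems, especially when dealing with missing values. If the retained variable is also read from the input and the new value is missing, the missing value overwrites the previous non-missing one, so RETAIN alone does not preserve it.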
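One common way around the missing-value problem is to retain a separate variable and update it only when the incoming value is not missing. The sketch below assumes made-up variable names (subj, score, last_score) and made-up data.

```sas
data work.locf;
   input subj score;
   retain last_score;                                /* not read by INPUT, so it is never overwritten by it */
   if not missing(score) then last_score = score;    /* update only on non-missing values */
   datalines;
1 10
2 .
3 30
4 .
;
run;

proc print data=work.locf;
run;
```

In this sketch score keeps its missing values as read, while last_score carries the last non-missing score forward.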
- LAG
if subj ne lag(subj) then old = new;
Here, old only changes when there is a new subject.
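LAG returns the value its argument had the previous time that particular LAG call was executed, so it is usually safest to call it on every observation and store the result. A minimal sketch, with hypothetical variables (subj, visit, score) and made-up data, computes the change in score within each subject:

```sas
data work.changes;
   input subj visit score;
   prev_subj  = lag(subj);     /* executed on every observation */
   prev_score = lag(score);
   /* only compare scores that belong to the same subject */
   if subj = prev_subj then change = score - prev_score;
   drop prev_subj prev_score;
   datalines;
1 1 10
1 2 14
2 1 20
2 2 17
;
run;
```

The first visit of each subject gets a missing change, because there is no earlier observation for that subject to compare with.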
SAS - Basic statements
SAS is a powerful statistical language that basically consists of an extended SQL dialect plus its own statements. In this post we will summarize the basic ones.
A basic SAS tutorial is available on this page from UCLA.
- DATA statement
data NAME_DATA;
input VAR_1 VAR_2 VAR_3 $ VAR_4; /* $ indicates VAR_3 is alphanumeric */
/* the data lines start right after CARDS and end at the line with the semicolon */
cards;
123 1 foo 234
124 2 foo2 221
;
run;
- SET statement (inside DATA)
data NAME_DATA;
set NAME_DATA_2;
run; /* NAME_DATA := NAME_DATA_2 */
- SET statement with conditional (inside DATA)
data NAME_DATA;
set NAME_DATA_2 NAME_DATA_3;
/* concatenates name_data_2 and name_data_3 */
if VAR_1 = 'yes' then answer = 1;
else answer = 0;
/* NAME_DATA := NAME_DATA_2 U NAME_DATA_3 with the additional answer field in each row */
run;
- PROC SQL
proc sql;
/* sql code like */
create table NAME_DATA as
select * from ANOTHER_TABLE group by VAR_1;
quit;
It is important to notice that PROC SQL ends with QUIT, whereas DATA steps and most other procedures end with RUN.
- LEFT JOIN (inside PROC SQL)
- SELECT DISTINCT (inside SQL code)
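The two items above can be illustrated with a small sketch. The tables work.customers and work.orders and their contents are invented for the example.

```sas
data work.customers;
   input id name $;
   datalines;
1 Ana
2 Ben
3 Cai
;
run;

data work.orders;
   input id amount;
   datalines;
1 50
1 20
3 75
;
run;

proc sql;
   /* LEFT JOIN keeps every customer; amount is missing when there is no order */
   create table work.customer_orders as
   select c.id, c.name, o.amount
   from work.customers as c
        left join work.orders as o
        on c.id = o.id;

   /* SELECT DISTINCT lists each id that has at least one order only once */
   select distinct id
   from work.orders;
quit;
```

With this data the joined table has four rows (two for customer 1, one each for customers 2 and 3), and the DISTINCT query returns ids 1 and 3.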
- How to sort a table?
proc sort data=NAME_DATA;
by VAR_1 descending VAR_2;
run;
proc sql;
select * from NAME_DATA order by VAR_1, VAR_2 desc;
quit;
Or use the assistant: menu Data -> Query
- Variable names: no special characters, only '_'. Name literals end with n, like 'Hello World!'n
Tuesday, October 19, 2010
LU decomposition with Lapack
Two pages where we can find information:
http://www.physics.orst.edu/~rubin/nacphy/lapack/compile.html
http://www.physics.orst.edu/~rubin/nacphy/lapack/codes/linear-f.html
Saturday, October 16, 2010
Gnuplot
Useful commands to plot histograms:
gnuplot> set style data histograms
gnuplot> set style histogram clustered gap 0
gnuplot> set style fill solid
gnuplot> set boxwidth 0.9 relative
gnuplot> plot "random1.dat" using 2:xticlabels(1)
Some interesting examples:
http://objectmix.com/graphics/140008-gnuplot-histogram-has-large-spaces.html
If we don't want to show the key (run this before plot, or replot afterwards):
gnuplot> unset key
Friday, October 8, 2010
Python - Functional Programming
- A good tutorial is here: