Writing Modular Code with Macros



SAS macros are an extension of the macro variables we've discussed thus far. They can perform more complex tasks beyond the capabilities of macro variables alone. You define a macro by enclosing the code you want to reuse between a %MACRO statement and a %MEND statement. For example:

%LET libref = SASHELP;
%LET dsn = RETAIL;
%LET nobs = 20;

%MACRO head;
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
RUN;
%MEND head;

In this example, %MACRO head; starts the definition of a SAS macro named head, and %MEND head; marks the end of that definition. Once you've defined the macro, you can call it anywhere in your SAS program using the following syntax:

%head;

When %head is called, it first resolves the global macro variables libref, dsn, and nobs, and then executes PROC PRINT with the resolved values.
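Because %head looks up the global macro variables each time it runs, you can point it at a different data set simply by reassigning them before the next call. A minimal sketch, assuming %head has been defined as above; SASHELP.CLASS is one of the sample data sets shipped with SAS:

```sas
/* Reassign the global macro variables that %head reads at call time */
%LET dsn = CLASS;
%LET nobs = 5;

/* Same macro, new values: prints the first 5 rows of SASHELP.CLASS */
%head;
```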


Understanding Scope of Macro Variables

Unlike data set variables, macro variables have their values stored in symbol tables in memory. These tables act as dictionaries, mapping macro variable names to their values. The table in which a macro variable is stored determines its scope, that is, where in your program the variable is visible and accessible. For example:

%LET global_var = global;

%MACRO show_scope;
%LET local_var = local;
%PUT ***** Inside the macro *****;
%PUT Global Variable: &global_var;
%PUT Local Variable: &local_var;
%MEND show_scope;

%show_scope;

%PUT ***** Outside the macro *****;
%PUT Global Variable: &global_var;
%PUT Local Variable: &local_var;

  • Global Scope: Macro variables defined outside of any macro definition are called global macro variables. They are stored in the global symbol table, and each one holds a single value that is accessible to all macros and to open code for the rest of your SAS session.
  • Local Scope: A local macro variable is accessible only within the macro where it is defined and within any macros called from that macro while it runs. Because macros can call other macros, this creates a hierarchy of nested local symbol tables, each of which exists only while its macro is executing.

For instance, in the program shown above, %LET global_var = global; defines a global macro variable named global_var. Because it is outside any macro definition, it has global scope and is accessible throughout the program. On the other hand, local_var is defined within the show_scope macro, so it has local scope and is accessible only within that macro and any macros it calls. This is why the last %PUT statement, executed outside the macro, cannot resolve &local_var; instead, SAS writes a warning to the log that the symbolic reference LOCAL_VAR was not resolved.
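Scope also follows macro calls: a macro invoked from inside another macro can read the caller's local variables, because its own local symbol table is nested inside the caller's while both are executing. A short sketch with two hypothetical macros, outer and inner; the %LOCAL statement explicitly creates outer_var in the local symbol table of outer:

```sas
%MACRO inner;
    /* inner has no outer_var of its own, so SAS searches the caller's table */
    %PUT Inner sees: &outer_var;
%MEND inner;

%MACRO outer;
    %LOCAL outer_var;            /* create outer_var in the local table */
    %LET outer_var = from outer;
    %inner
%MEND outer;

%outer
```

Running %outer should write Inner sees: from outer to the log. Once %outer finishes, its local symbol table is deleted, so outer_var no longer exists.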

Defining Macros with Parameters

When creating a macro, you can define and use macro variables with %LET statements, but relying solely on %LET quickly becomes cumbersome. In particular, a %LET statement inside a macro is not flexible enough to handle different values: you would have to edit the macro every time you want to change a variable's value.

To address this limitation, creating macros with parameters is the preferred approach. Macro parameters allow you to pass different values into the macro during each call, making it adaptable to various scenarios. Values are assigned to the parameters when the macro is called, not when the macro is written.

For macros with a small number of parameters (typically fewer than four), where the order of the parameters is clear or not very important, it is convenient to define a macro with positional parameters. For example:

%MACRO stacking_two_datasets(dsn1, dsn2);
DATA Output;
SET &dsn1 &dsn2;
RUN;
PROC PRINT DATA=Output;
TITLE "Dataset: &dsn1 + &dsn2";
RUN;
%MEND stacking_two_datasets;

The stacking_two_datasets macro is defined with two positional parameters, dsn1 and dsn2. Inside the macro, &dsn1 and &dsn2 are used directly in the SET statement of the DATA step and in the TITLE statement. This macro can be called as follows:

%stacking_two_datasets(SASHELP.NVST1, SASHELP.NVST2);

When stacking_two_datasets is called, the provided values are assigned by position: SASHELP.NVST1 is assigned to dsn1 and SASHELP.NVST2 is assigned to dsn2. The macro then uses these values in its body.

In essence, the order in which you list the arguments when calling the macro determines which parameter each one is assigned to. Mixing up the order of arguments can therefore lead to errors or unexpected results.
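One way to see this is to turn on the MPRINT system option, which writes the SAS code generated by a macro to the log. In the sketch below the two arguments are swapped, so dsn1 now receives SASHELP.NVST2:

```sas
OPTIONS MPRINT;   /* echo macro-generated statements to the log */

/* Arguments swapped: dsn1 = SASHELP.NVST2, dsn2 = SASHELP.NVST1 */
%stacking_two_datasets(SASHELP.NVST2, SASHELP.NVST1);

OPTIONS NOMPRINT; /* switch the echo off again */
```

The MPRINT lines in the log should show the generated statement SET SASHELP.NVST2 SASHELP.NVST1;, so the observations from NVST2 now come first in Output.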

Sometimes you don't know in advance how many values your macro will need, and you would like it to accept any number of them. In such cases, you can use a single parameter as a placeholder for a whole list of values. For example:

%MACRO stacking_datasets(ds_list);
DATA Output;
SET &ds_list;
RUN;
PROC PRINT DATA=Output;
TITLE "Dataset: &ds_list";
RUN;
%MEND stacking_datasets;

%stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3);
%stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3 SASHELP.NVST4 SASHELP.NVST5);

Here, when stacking_datasets is called, &ds_list resolves to the whole space-separated list passed in the call. The first call passes three data set names and the second passes five, yet in both cases the list is a single argument to one parameter. This trick lets your macro handle as many data sets as needed.

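Notice that there are no commas between the data set names in the macro call. In a macro call, commas separate arguments, so a comma would make SAS treat the next data set name as an additional positional argument; since stacking_datasets defines only one parameter, SAS would report that more positional parameters were found than defined. Separating the names with blanks keeps the whole list in one argument, which also matches the SET statement, where data sets are listed without commas.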

In the %MACRO statement, you can also define keyword parameters. Unlike positional parameters, keyword parameters can be specified in any order and can have default values assigned to them. Keyword parameters are especially convenient when a macro has more than four parameters, when you want to provide default values, or when the parameter names themselves convey useful information. For example:

%MACRO head(libref=, dsn=, nobs=5, var_format_pair=);
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
FORMAT &var_format_pair;
RUN;
%MEND head;

In this example, head is defined with four keyword parameters: libref, dsn, nobs, and var_format_pair. Of these, only nobs has a default value (5), while the others have none. Now, let's call this macro as follows:

%head(libref=SASHELP, dsn=RENT, var_format_pair=Date EURDFDD10. Amount EUROX12.2);

In this call, values are provided for libref, dsn, and var_format_pair, while nobs resolves to its default value of 5. Note that if a keyword parameter has no default value and is omitted from the call, it resolves to a null string.
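Because keyword arguments are matched by name, they can be given in any order, and a default is overridden simply by naming the parameter. A sketch calling the same %head macro on the sample data set SASHELP.CLASS; the formats for Height and Weight are chosen only for illustration:

```sas
/* Keyword order does not matter; nobs=3 overrides the default of 5 */
%head(dsn=CLASS, nobs=3, libref=SASHELP, var_format_pair=Height 5.1 Weight 5.1);
```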

In practice, you will often combine positional and keyword parameters in the same %MACRO statement. In that case, you must list all positional parameters before any keyword parameters. For example:

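%MACRO stock_chart(ticker, period, int, open=*, high=*, low=*, close=*);
%LOCAL max_date;
/* Find the most recent date for this ticker first: a DATA step
   WHERE clause cannot compute a column maximum such as MAX(Date) */
PROC SQL NOPRINT;
SELECT MAX(Date) FORMAT=8. INTO :max_date
FROM MyData.SP500
WHERE Ticker = "&ticker";
QUIT;

DATA StockSubset;
SET MyData.SP500;
WHERE Ticker = "&ticker" AND Date >= INTNX("&int", &max_date, -&period);
RUN;

PROC SGPLOT DATA=StockSubset;
TITLE "&ticker Stock";
FOOTNOTE "Last &period &int";
&open SERIES X = DATE Y = OPEN;
&high SERIES X = DATE Y = HIGH;
&low SERIES X = DATE Y = LOW;
&close SERIES X = DATE Y = CLOSE;
YAXIS LABEL = 'USD';
RUN;
%MEND stock_chart;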

The stock_chart macro is defined with seven parameters: ticker, period, and int are positional parameters, while the remaining four are keyword parameters. In the definition, all positional parameters are placed before the keyword parameters. Likewise, a call to this macro must list the positional arguments before any keyword arguments:

%stock_chart(ABT, 5, Year, high=, low=);

In this call, stock_chart is invoked with five arguments: ABT, 5, Year, high=, and low=. Note that each keyword parameter has a default value of *, which acts as an automatic comment-out switch: when a parameter keeps its default, the corresponding SERIES statement begins with * and SAS treats it as a comment statement. When a null value is passed instead, as with high= and low= here, the SERIES statement is left intact and that price is plotted on the chart.
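To make the commenting-out trick concrete, this is roughly what the four plotting statements inside PROC SGPLOT resolve to for the call %stock_chart(ABT, 5, Year, high=, low=). The open and close statements start with *, so only the high and low series are drawn:

```sas
* SERIES X = DATE Y = OPEN;    /* open= kept its default *: comment statement */
SERIES X = DATE Y = HIGH;      /* high= passed as null: plotted */
SERIES X = DATE Y = LOW;       /* low= passed as null: plotted */
* SERIES X = DATE Y = CLOSE;   /* close= kept its default *: comment statement */
```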





Flow Controls in SAS Macros

A control structure in programming is a block of code that evaluates conditions and makes decisions based on them. It determines which path your program follows, allowing you to control program flow. The SAS macro language also provides control structures with its own syntax. In fact, implementing a statistical algorithm with SAS macros typically requires conditional logic.

%IF-%THEN/%ELSE Statements

Recall that IF-THEN/ELSE statements in a DATA step let you write programs that execute conditionally based on the values of variables in the Program Data Vector (PDV). The macro %IF-%THEN and %ELSE statements act similarly. However, unlike their DATA step counterparts, %IF-%THEN/%ELSE statements do not evaluate variables in the PDV.

Instead, they control program flow by evaluating macro expressions, such as whether a macro variable equals a specified value. This is because the macro processor resolves macro code before the generated DATA step is compiled and executed, so DATA step variable values do not yet exist when a %IF condition is evaluated. Thus, %IF-%THEN statements compare a macro variable with another macro variable or with a literal. For example:

%IF &SYSDAY = Sunday %THEN %DO;
%LET workout_routine = Leg;
%PUT Today is &workout_routine. day!;
%END;

In this example, %IF compares the resolved value of the automatic macro variable &SYSDAY with the literal Sunday. If &SYSDAY resolves to Sunday, the macro variable workout_routine is assigned the value Leg, and %PUT writes Today is Leg day! to the log.

The %DO block is analogous to the DO block of the DATA step. It begins with %DO; and ends with %END;. A %DO block is needed when you want to execute multiple macro calls, multiple macro statements, or multiple SAS statements after %THEN.

To specify what to do when the %IF condition is not true, you should add an %ELSE block. For example:

%IF &SYSDAY = Sunday %THEN %DO;
%LET workout_routine = Leg;
%PUT Today is &workout_routine. day!;
%END;
%ELSE %DO;
%LET workout_routine = Push;
%PUT Today is &workout_routine. day!;
%END;



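However, you cannot nest %IF blocks in open code. %IF-%THEN/%ELSE statements are allowed in open code only starting with SAS 9.4M5, and even then they cannot be nested; nested %IF-%THEN blocks must be placed inside a macro definition. So, the following open code will not work: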

%IF &SYSDAY = Sunday %THEN %DO;
%LET workout_routine = Leg;
%PUT Today is &workout_routine. day!;
%END;
%ELSE %DO;
%IF &SYSDAY = Monday %THEN %DO;
%LET workout_routine = Push;
%PUT Today is &workout_routine. day!;
%END;
%ELSE %DO;
%LET workout_routine = Pull;
%PUT Today is &workout_routine. day!;
%END;
%END;
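Inside a macro definition, the same nested logic is allowed. A minimal sketch wrapping the code above in a hypothetical macro named workout and calling it:

```sas
%MACRO workout;
    %IF &SYSDAY = Sunday %THEN %DO;
        %LET workout_routine = Leg;
    %END;
    %ELSE %DO;
        /* Nested %IF: legal here because we are inside a macro */
        %IF &SYSDAY = Monday %THEN %LET workout_routine = Push;
        %ELSE %LET workout_routine = Pull;
    %END;
    %PUT Today is &workout_routine. day!;
%MEND workout;

%workout
```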

You can also use the AND, OR, and NOT operators to build more complex conditions in the %IF clause. For example:

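%IF &SYSDAY = Sunday OR &SYSDAY = Wednesday %THEN %DO;
%LET workout_routine = Leg;
%PUT Today is &workout_routine. day!;
%END;
%IF &SYSDAY = Monday OR &SYSDAY = Thursday %THEN %DO;
%LET workout_routine = Push;
%PUT Today is &workout_routine. day!;
%END;
/* Chaining %ELSE %IF would nest a %IF, which open code does not allow,
   so the remaining days get a third, mutually exclusive condition */
%IF NOT (&SYSDAY = Sunday OR &SYSDAY = Wednesday OR &SYSDAY = Monday OR &SYSDAY = Thursday) %THEN %DO;
%LET workout_routine = Pull;
%PUT Today is &workout_routine. day!;
%END;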


Iterative %DO-%END Blocks 

In SAS macros, %DO-%END blocks are not only used to group multiple statements that follow the same program flow path. You can also use them for iteration, with arguments that specify the loop's behavior. Iterative %DO-%END blocks are analogous to iterative DO-END blocks in the DATA step, except that:

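  • %WHILE and %UNTIL conditions cannot be combined with an index specification (start %TO stop).
  • Start, stop, and %BY increment values must be integers.
  • Only one specification is allowed; you cannot list several values or ranges separated by commas as you can in a DATA step DO statement.
  • %DO defines and increments a macro variable, not a data set variable.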

A common use case for the iterative %DO-%END block is generating a series of data set or variable names that share a common prefix or suffix. For example:

%MACRO names(name, first, last);
%LOCAL i;
%DO i = &first %TO &last;
&name._&i
%END;
%MEND names;

%PUT %names(MyData, 1, 5);

In this example, the local macro variable i is incremented by one from &first to &last. In each iteration of the %DO-%END loop, the macro generates the text &name._&i, that is, the value of &name, an underscore, and the value of &i (the period ends the reference to &name). Notice that this line is not terminated by a semicolon. This is because it is meant to generate a list of names as plain text, not a macro statement or a DATA step statement.

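If you append a semicolon to &name._&i, %PUT prints only MyData_1: the first generated semicolon ends the %PUT statement, and the remaining four items (MyData_2; through MyData_5;) are passed to SAS as stand-alone statements, each of which produces an error.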

You can call this macro for naming a series of SAS data sets or variables. For example:

/* Creates five empty data sets from scratch */
DATA %names(MyData, 1, 5);
ATTRIB
VarA LENGTH=8 FORMAT=BEST12. LABEL="Dummy variable A"
VarB LENGTH=8 FORMAT=BEST12. LABEL="Dummy variable B"
VarC LENGTH=8 FORMAT=BEST12. LABEL="Dummy variable C"
;
STOP;
RUN;
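The same macro can regenerate the list wherever a series of names is expected. For example, a sketch that deletes the five data sets created above from the WORK library once they are no longer needed, using PROC DATASETS:

```sas
/* DELETE resolves to: DELETE MyData_1 MyData_2 MyData_3 MyData_4 MyData_5; */
PROC DATASETS LIBRARY=WORK NOLIST;
    DELETE %names(MyData, 1, 5);
RUN;
QUIT;
```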

%WHILE and %UNTIL specify conditions that control when a loop stops. %WHILE evaluates its condition at the top of each iteration: if the condition is true, the current iteration is executed; otherwise, the loop ends. For example:

%MACRO count_while(n);
%PUT Count starts at: &n;
%DO %WHILE(&n < 3);
%PUT *** &n ***;
%LET n = %EVAL(&n + 1);
%END;
%PUT Count ends at: &n;
%MEND count_while;

%count_while(1);
%count_while(5);

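Here, %count_while(1) prints 1 and 2 and ends at 3, while %count_while(5) never enters the loop because 5 < 3 is false from the start.

Conversely, %UNTIL executes the body of the current iteration first and then evaluates its condition; if the condition is true, the loop terminates. Because the condition is checked only after the body runs, the loop always executes at least once: in the example below, %count_until(1) prints 1 and 2 and ends at 3, and %count_until(5) still prints 5 once and ends at 6. For example: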

%MACRO count_until(n);
%PUT Count starts at: &n;
%DO %UNTIL(&n >= 3);
%PUT *** &n ***;
%LET n = %EVAL(&n + 1);
%END;
%PUT Count ends at: &n;
%MEND count_until;

%count_until(1);
%count_until(5);







Documenting Your Macro

After creating a macro, documenting it is generally considered good practice. Well-documented macros are easier to modify and debug, because the documentation explains the macro's purpose, functionality, and parameters, as well as its intended behavior. Furthermore, when working in a team, documentation promotes code sharing and reuse within the team, reducing the redundant effort of writing new code with the same functionality.

Style guides and documentation templates vary across teams and projects, but most follow rules like the ones listed below:

  • Use descriptive parameter names, so that users can easily grasp what each parameter does.
  • Supply default values whenever reasonable defaults are available.
  • List each parameter on its own line with its default value and a short explanation, such as the range of acceptable values or some examples.

For example:

%MACRO stock_chart(ticker, period, int, open=*, high=*, low=*, close=*);
/*
ticker Ticker symbol of the SP500 company of interest.
period Desired time period for analysis.
int Unit of time interval. Available options are: Year | Month | Day
open=* To plot open price on the chart, pass open=
high=* To plot high price on the chart, pass high=
low=* To plot low price on the chart, pass low=
close=* To plot close price on the chart, pass close=
*/
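%LOCAL max_date;
/* Find the most recent date for this ticker first: a DATA step
   WHERE clause cannot compute a column maximum such as MAX(Date) */
PROC SQL NOPRINT;
SELECT MAX(Date) FORMAT=8. INTO :max_date
FROM MyData.SP500
WHERE Ticker = "&ticker";
QUIT;

DATA StockSubset;
SET MyData.SP500;
WHERE Ticker = "&ticker" AND Date >= INTNX("&int", &max_date, -&period);
RUN;

PROC SGPLOT DATA=StockSubset;
TITLE "&ticker Stock";
FOOTNOTE "Last &period &int";
&open SERIES X = DATE Y = OPEN;
&high SERIES X = DATE Y = HIGH;
&low SERIES X = DATE Y = LOW;
&close SERIES X = DATE Y = CLOSE;
YAXIS LABEL = 'USD';
RUN;
%MEND stock_chart;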


Macros Invoking Macros

In SAS macro programming, it is common practice to define macros as separate building blocks in open code and to invoke them from other macros. This approach promotes modularity by compartmentalizing functionality and improves maintainability by allowing isolated changes. For example, consider the %interleaving_two_datasets macro shown below:

%MACRO interleaving_two_datasets(dsn1, dsn2, by_var_list);
%sorting_obs(&dsn1, out_dsn1, &by_var_list);
%sorting_obs(&dsn2, out_dsn2, &by_var_list);
DATA Output;
SET out_dsn1 out_dsn2;
BY &by_var_list;
RUN;
%MEND interleaving_two_datasets;

%MACRO sorting_obs(input_dsn, output_dsn, by_var_list);
PROC SORT DATA=&input_dsn OUT=&output_dsn;
BY &by_var_list;
RUN;
%MEND sorting_obs;

%interleaving_two_datasets(SASHELP.NVST1, SASHELP.NVST2, Date);

In this example, %interleaving_two_datasets takes parameters whose values it passes on to %sorting_obs as arguments. The values you specify when calling %interleaving_two_datasets are resolved and handed to %sorting_obs. Thus, invoking %interleaving_two_datasets with the three arguments SASHELP.NVST1, SASHELP.NVST2, and Date generates the following code:

PROC SORT DATA=SASHELP.NVST1 OUT=out_dsn1; BY Date; RUN;

PROC SORT DATA=SASHELP.NVST2 OUT=out_dsn2; BY Date; RUN;

DATA Output;
SET out_dsn1 out_dsn2;
BY Date;
RUN;

Notice that %interleaving_two_datasets is defined before %sorting_obs, the macro it calls. This works because of how the SAS macro processor handles macro definitions and calls:

  1. Compilation: When you submit a %MACRO ... %MEND block, the macro processor compiles the definition and stores it in a catalog in the WORK library. Macro calls inside the body, such as %sorting_obs, are stored as part of the definition and are not resolved at this point.
  2. Execution: When you invoke a macro, the macro processor executes its stored definition, resolving macro references and generating text that is passed on to SAS for compilation and execution.
  3. Nested Calls: When execution reaches a call to another macro, the macro processor looks up that macro's compiled definition at that moment and executes it in place, before continuing with the rest of the calling macro.

Thus, the order in which macros are defined does not matter, as long as every macro has been defined before any call that needs it is executed. In this example, the calls to %sorting_obs are not executed until %interleaving_two_datasets itself is executed, by which time both definitions have been compiled.

In practice, a separate macro that does nothing but call other macros is often referred to as a master macro. This approach promotes code reuse and efficiency. By incorporating the control structures we've discussed so far, such as conditionals and loops, you can build master macros that automate complex workflows, execute tasks conditionally, or iterate through processes in a single organized batch.
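As a sketch of such a master macro, the hypothetical %sort_all below loops over a space-separated list of data sets, checks that each one exists, and calls %sorting_obs (as defined above) for it. COUNTW and EXIST are DATA step functions called through %SYSFUNC; the output names sorted_1, sorted_2, and so on are assumptions for illustration:

```sas
%MACRO sort_all(ds_list, by_var_list);
    %LOCAL i n dsn;
    /* Count the blank-separated names in the list */
    %LET n = %SYSFUNC(COUNTW(&ds_list, %STR( )));
    %DO i = 1 %TO &n;
        /* Use only the blank as delimiter so SASHELP.NVST1 stays one word */
        %LET dsn = %SCAN(&ds_list, &i, %STR( ));
        %IF %SYSFUNC(EXIST(&dsn)) %THEN %DO;
            %sorting_obs(&dsn, sorted_&i, &by_var_list);
        %END;
        %ELSE %PUT NOTE: &dsn does not exist and was skipped.;
    %END;
%MEND sort_all;

%sort_all(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3, Date);
```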

Bad Practice: Nesting Macro Definitions

Nested macro definitions occur when a %MACRO ... %MEND block is enclosed within another macro's definition. This almost always stems from a misunderstanding of how macros are stored; nesting macro definitions is only very rarely necessary or advisable.

For example, in the following program, %interleaving_two_datasets is rewritten so that it contains the definition of %sorting_obs. Although this program works just as before, it is inefficient: every time %interleaving_two_datasets is executed, the nested macro %sorting_obs is redundantly recompiled:

%MACRO interleaving_two_datasets(dsn1, dsn2, by_var_list);
%MACRO sorting_obs(input_dsn, output_dsn, by_var_list);
PROC SORT DATA=&input_dsn OUT=&output_dsn;
BY &by_var_list;
RUN;
%MEND sorting_obs;
%sorting_obs(&dsn1, out_dsn1, &by_var_list);
%sorting_obs(&dsn2, out_dsn2, &by_var_list);
DATA Output;
SET out_dsn1 out_dsn2;
BY &by_var_list;
RUN;
%MEND interleaving_two_datasets;

There is no need for this. If you define %sorting_obs in open code, outside any other macro, its definition is compiled only once. Then, whenever %interleaving_two_datasets invokes %sorting_obs, the already compiled definition is reused.

So it is almost always best practice to avoid nested definitions. The only justifiable use I can think of is when the nested macro's definition needs to vary based on a conditional evaluation. Even then, I would recommend defining two separate macros in open code and invoking the appropriate one based on the evaluation.





Macro Functions

Numerical Evaluations on Macro Variables

Again, the value a macro variable holds is neither character nor numeric. Rather, it is literal text that is substituted into your SAS code. Thus, you generally cannot perform arithmetic on a macro variable directly. However, sometimes we want an expression to be evaluated. One solution in such cases is the %EVAL function, which always performs integer arithmetic. For example:

%LET A = 5;
%LET B = &A + 1;
%LET C = %EVAL(&B + 1);
%LET D = %EVAL(&A / 2);
%LET E = %EVAL(&A + 0.2);

%PUT A: &A;
%PUT B: &B;
%PUT C: &C;
%PUT D: &D;
%PUT E: &E;


We see that &B resolves to the text 5 + 1 rather than being evaluated as 6. On the other hand, &C resolves to 7, because %EVAL evaluates 5 + 1 + 1. Similarly, %EVAL evaluates &A / 2, but because it uses integer arithmetic, it truncates the fractional part, so &D resolves to 2.
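%EVAL is not limited to the four basic operations. It also handles exponentiation as well as comparison and logical operators, returning 1 for true and 0 for false, which is how %IF conditions are evaluated behind the scenes. A small sketch reusing A = 5:

```sas
%LET A = 5;
%PUT Power: %EVAL(&A ** 2);               /* 25 */
%PUT Compare: %EVAL(&A > 3);              /* 1 (true) */
%PUT Logical: %EVAL(&A > 3 AND &A < 4);   /* 0 (false) */
```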

%EVAL accepts only integer operands, so %EVAL(&A + 0.2) causes SAS to write an error to the log instead of returning 5.2. For floating-point evaluations, use the %SYSEVALF function instead of %EVAL. For example:

%LET X = 5/3;
%PUT Default: %SYSEVALF(&X);
%PUT Bool: %SYSEVALF(&X, BOOLEAN);
%PUT Ceil: %SYSEVALF(&X, CEIL);
%PUT Floor: %SYSEVALF(&X, FLOOR);
%PUT Integer: %SYSEVALF(&X, INTEGER);
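For X = 5/3, BOOLEAN returns 1 (the value is nonzero), CEIL returns 2, and FLOOR and INTEGER both return 1. The difference between FLOOR and INTEGER only shows up with negative values: FLOOR rounds toward negative infinity, while INTEGER simply drops the fractional part. A sketch using -5/3 (about -1.67):

```sas
%LET Y = -5/3;
%PUT Ceil: %SYSEVALF(&Y, CEIL);        /* -1 */
%PUT Floor: %SYSEVALF(&Y, FLOOR);      /* -2 */
%PUT Integer: %SYSEVALF(&Y, INTEGER);  /* -1 */
```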


Macro Functions for Text Modification

Sometimes you may need to modify the text stored in a macro variable or extract information from it. Macro text functions are helpful in these scenarios. Here are some commonly used ones:

  • %INDEX(arg1, arg2): Searches arg1 for the first occurrence of arg2 and returns its position, or 0 if arg2 is not found.
  • %LENGTH(arg): Returns the length of its argument.
  • %SCAN(arg1, arg2, <delimiters>): Returns the arg2-th word of arg1. If delimiters are omitted, a default set of delimiters is used, which includes the blank as well as characters such as the period and the slash.
  • %SUBSTR(arg, pos, <length>): Returns length characters of arg, starting at position pos. If length is omitted, it returns everything from pos to the end of the string.
  • %UPCASE(arg): Converts all characters in arg to uppercase. This function is useful when you need to compare text strings that may have inconsistent case.

For example:

%LET my_pangram = The jovial fox jumps over the lazy dog;
%LET pos_jumps = %INDEX(%UPCASE(&my_pangram), JUMPS);
%LET my_substr = %SUBSTR(&my_pangram, &pos_jumps, %LENGTH(jumps));
%PUT &my_pangram;
%PUT &pos_jumps;
%PUT &my_substr;

%LET x = XYZ.ABC/XYY;
%LET word = %SCAN(&x, 3);
%LET part = %SCAN(&x, 1, z);
%PUT WORD is &word and PART is &part;
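Keep in mind that %SCAN delimiters are case-sensitive. In the %SCAN example above, &word resolves to XYY because the default delimiters split XYZ.ABC/XYY at the period and the slash, while &part resolves to the whole string XYZ.ABC/XYY because it contains no lowercase z. The sketch below contrasts a few delimiter choices on the same value:

```sas
%LET x = XYZ.ABC/XYY;
%PUT %SCAN(&x, 1, z);   /* XYZ.ABC/XYY: no lowercase z to split on */
%PUT %SCAN(&x, 1, Z);   /* XY: splits at the uppercase Z */
%PUT %SCAN(&x, 2, /);   /* XYY: only the slash is a delimiter */
```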

