Creating and Using SAS Macro Variables

In computer programming, a macro is essentially a predefined piece of text that is mapped to a certain input. Once defined, the macro can be called to replace the input with that text whenever needed. For example, imagine you have an email client, and you type the phrase "Best regards, John Doe" at the end of your emails. Instead of typing it out every time, you can create a macro that automatically inserts this phrase whenever you type a specific shortcut, like "/signature."

  • Step 1: You define a macro where "/signature" is mapped to "Best regards, John Doe."
  • Step 2: Each time you type "/signature" in your email, the program automatically replaces it with "Best regards, John Doe."

The same concept applies in SAS programming. Frequently used code elements, such as dataset names, SAS statements, or options, can be bundled together and stored as a SAS macro. You can then simplify your SAS program by referencing the macro instead of repeatedly writing the same code elements.

To be more specific, when you submit a SAS program, SAS first checks whether it contains any macro declarations or references. If any are found, they are passed to the macro processor, which replaces them with the corresponding text. This preprocessing step, where the macro processor generates standard SAS code by substituting macro references with their declared values, is referred to as macro resolution. Because you are writing a program that writes a program, this is sometimes called meta-programming.

This fundamental concept can answer common questions in SAS macro programming, such as:

  • Can macro %IF statements be used interchangeably with DATA step IF statements?
  • Why can't I assign a DATA step variable value to a macro variable using the %LET statement?
  • Why can't I use a DATA step IF to conditionally execute a %LET statement?
  • Why do data set variables not have values when using them in %IF statements?

The single answer to all four of these questions is that macro code is resolved before any standard SAS code, including DATA steps and PROC steps, is executed. In other words, at the time a macro reference is resolved, nothing from the standard SAS code, such as DATA step variable values, exists yet. This is a key concept in SAS macro programming: it gives you an overall understanding of how the SAS macro facility works and helps you avoid common pitfalls in your programs.
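The timing issue behind these questions can be seen in a small sketch. It assumes the sashelp.class sample dataset is available, and the macro variable name flag is made up for illustration. The %LET inside the DO block is handled by the macro processor while the DATA step is still being read, so it runs once regardless of the IF condition:

```sas
%LET flag = before;

DATA _NULL_;
    SET sashelp.class;
    /* No student in sashelp.class is older than 100, so this
       condition is never true when the DATA step executes. */
    IF age > 100 THEN DO;
        /* Handled by the macro processor before the step runs;
           the DATA step itself only sees an empty DO block. */
        %LET flag = after;
    END;
RUN;

%PUT flag is &flag;
```

The log shows "flag is after", even though the IF condition was never satisfied, because the assignment happened before the DATA step started executing.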

Defining and Using Macro Variables

The most basic element of the SAS macro facility is the macro variable, also known as a symbolic variable to differentiate it from a SAS dataset variable. The macro variable is a very simple, yet powerful tool all by itself. Even if you know nothing else about the SAS macro facility other than how to define and use macro variables properly, you can accomplish a great deal and confidently say that you are "proficient" in SAS macro programming during job interviews.

Long story short, defining a macro variable means mapping a text value to a name, so that whenever the macro variable is referenced in the code, the reference is resolved into the corresponding text. The resolved text then acts as a piece of actual SAS code. To define a macro variable, use the %LET statement following the syntax below:

%LET <macro-variable-name> = <text-value>;

Macro variable names follow the same naming rules as dataset variables: they can be up to 32 characters long, must begin with a letter or an underscore, and may only include letters, numbers, and underscores. Additionally, it is advisable to avoid names that start with "SYS," as this prefix is reserved for automatic macro variables. Using such names for your own macro variables may lead to unintended behavior or errors.
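To see which names are already taken, the sketch below prints two automatic macro variables and then lists the automatic and user-defined macro variables in the current session. SYSDATE9 and SYSVER are standard automatic variables; the exact list written to the log depends on your SAS version and environment.

```sas
/* Automatic macro variables are created by SAS itself */
%PUT Session started on &SYSDATE9 using SAS &SYSVER;

/* List every automatic macro variable in the log */
%PUT _AUTOMATIC_;

/* List the macro variables you have defined yourself */
%PUT _USER_;
```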

On the right side of the equal sign, you specify the text value. This text replaces the macro variable references in your code before any standard SAS code is compiled and run, so the final code contains the substituted text as part of the SAS program.
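A minimal sketch of how %LET stores its value may help here. The variable names city, total, and quoted are made up for illustration. The value is always kept as plain text: leading and trailing blanks are dropped, arithmetic is not evaluated, and quotation marks become part of the value.

```sas
%LET city   =    New York   ;   /* leading and trailing blanks are dropped */
%LET total  = 1 + 2;            /* stored as the text 1 + 2, not 3 */
%LET quoted = "New York";       /* the quotes become part of the value */

/* The brackets make the start and end of each value visible in the log */
%PUT city=[&city];
%PUT total=[&total];
%PUT quoted=[&quoted];
```

Because quotes are stored literally, the usual practice is to leave them out of the %LET statement and add them where the macro variable is referenced.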

For example, consider the following code: 

/* Defining macro variables */
%LET region_name = United States;
%LET ros_threshold = 0.05;

PROC PRINT DATA = sashelp.shoes;
/* Referencing macro variables */
WHERE region = "&region_name" AND returns / sales >= &ros_threshold;
RUN;

The sashelp.shoes dataset has 395 observations from a fictitious shoe company. In this example, two macro variables, region_name and ros_threshold, are defined using %LET statements. When you submit this program, SAS first resolves the macro variables. The macro reference &region_name is replaced with United States, resulting in the WHERE condition region = "United States". Similarly, the macro reference &ros_threshold is replaced with 0.05, yielding the WHERE condition returns / sales >= 0.05. Note that when a macro variable is referenced inside a quoted string, as demonstrated here, the string must be enclosed in double quotes ("), not single quotes ('). Otherwise, the macro variable will not be resolved.

Here is the final SAS code after resolving the two macro variables:

PROC PRINT DATA = sashelp.shoes;
/* Referencing macro variables */
WHERE region = "United States" AND returns / sales >= 0.05;
RUN;
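The quoting rule can be checked without running a full procedure. The following sketch writes the same reference to the log twice, once inside double quotes and once inside single quotes:

```sas
%LET region_name = United States;

/* Inside double quotes the reference is resolved */
%PUT Double quotes: "&region_name";

/* Inside single quotes the text is left exactly as typed */
%PUT Single quotes: '&region_name';
```

In a WHERE statement, the single-quoted version would compare region against the literal text &region_name and match no rows.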

One might wonder what the point of using macro variables is in this specific example. The code uses each macro variable only once, for a simple substitution. The macro variables are nothing more than placeholders for static values (United States and 0.05); if you were only analyzing US data with a fixed return-on-sales threshold, there would be little advantage to using macro variables.

However, macro variables become very powerful in more complex scenarios:

  • Parameterization: Imagine you want to analyze data for multiple regions (US, Europe, Asia) or vary the ros_threshold for different analyses. With macro variables, you change the values in the %LET statements once, and the entire code adapts.
  • Code Reusability: If the WHERE conditions are part of a larger, more intricate program, using macro variables makes the code more modular and easier to maintain. You can easily adjust the analysis by modifying the macro variable definitions.
  • Dynamic Code Generation: Macros can generate complex SAS code based on input values or conditions, which is necessary for automating tasks and creating flexible programs.

While this simple example might not fully demonstrate the power of macro variables, they are a fundamental tool in SAS programming for enhancing code flexibility, reusability, and maintainability, particularly in larger and more dynamic projects.
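As a rough illustration of parameterization, the sketch below reuses the same two macro variables in both a title and a WHERE condition. It assumes the Region, Returns, and Sales columns of sashelp.shoes; the values Canada and 0.03 are arbitrary choices for the example.

```sas
/* Change only these two lines to rerun the report for another case */
%LET region_name = Canada;
%LET ros_threshold = 0.03;

TITLE "Shoe sales in &region_name with returns / sales >= &ros_threshold";

PROC PRINT DATA = sashelp.shoes;
    WHERE region = "&region_name" AND returns / sales >= &ros_threshold;
RUN;

/* Clear the title so it does not carry over to later output */
TITLE;
```

Every place that references the macro variables picks up the new values, so the title and the filter cannot drift out of sync.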

Concatenating Macro Variables with Other Text

In essence, SAS macro programming is writing a program that generates another program, which is why it is called meta-programming. You therefore write macro code with the final generated SAS code in mind, and you need to ensure that the macro code produces the correct SAS code to achieve the desired results. Predicting what the SAS statements will look like after resolution is generally straightforward, but it can occasionally be less intuitive. This is particularly true when a macro variable reference is concatenated with other text.

When resolved, a macro variable is basically replaced by its text value. No concatenator is needed, as the macro variable substitutes the actual code itself, not a data value. The SAS code with the resolved text is then compiled for execution. For example:


%LET team_name = Chicago;

DATA team_&team_name;
SET sashelp.baseball;
WHERE team = "&team_name";
RUN;

PROC PRINT DATA=team_&team_name;
RUN;

In this example, the macro variable reference &team_name is appended to the preceding text, team_. Thus, this code will be resolved into:

DATA team_Chicago;
SET sashelp.baseball;
WHERE team = "Chicago";
RUN;

PROC PRINT DATA=team_Chicago;
RUN;

Similarly, you can place text after a macro variable reference. However, this can create some ambiguity, as it may not be clear whether the following text is part of the macro variable name. For example, consider the following program referencing the team_name macro variable defined earlier:

DATA &team_name_team;
SET sashelp.baseball;
WHERE team = "&team_name";
RUN;

In this program, our intention was to name the output dataset Chicago_team by substituting the macro variable &team_name with Chicago. However, because of the appended text (_team), it is ambiguous whether the reference means the entire &team_name_team or just &team_name. SAS reads the longest valid name, so it looks for a macro variable called team_name_team, which does not exist. As a result, running this code produces an error:

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 %LET team_name = Chicago;
70
WARNING: Apparent symbolic reference TEAM_NAME_TEAM not resolved.
71 DATA &team_name_team;
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;, _DATA_, _LAST_, _NULL_.
ERROR 200-322: The symbol is not recognized and will be ignored.
72 SET sashelp.baseball;
73 WHERE team = "&team_name";
74 RUN;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEAM_NAME_TEAM may be incomplete. When this step was stopped there were 0 observations and 24 variables.

To avoid this ambiguity, you can add a period (.) after the macro reference. This period acts as a delimiter that marks the end of the macro variable name:

DATA &team_name._team;
SET sashelp.baseball;
WHERE team = "&team_name";
RUN;

In this case, the period after &team_name ensures that the macro variable &team_name is correctly resolved to Chicago, and the text _team is treated as part of the dataset name, not part of the macro variable. Thus, this program will be interpreted as:

DATA Chicago_team;
SET sashelp.baseball;
WHERE team = "Chicago";
RUN;

Sometimes the first character of the text following a macro variable needs to be a period. However, as mentioned earlier, a single period appended to the macro variable acts as a delimiter and will not appear in the resolved text. One common scenario where this occurs is when you store library and dataset names as macro variables and reference them in a dataset specification. For example:

%LET libref = mydata;
%LET dsn = mlb;
%LET team_name = Chicago;

DATA &libref.&dsn._&team_name;
SET sashelp.baseball;
WHERE team = "&team_name";
RUN;

In this case, since the single period after &libref will not appear in the resolved text, the program will be interpreted as creating mydatamlb_Chicago under the default WORK library.

To get around this, add a second period. The first period ends the macro variable name, and the second one is kept as the literal period between the library reference and the dataset name:

%LET libref = mydata;
%LET dsn = mlb;
%LET team_name = Chicago;

DATA &libref..&dsn._&team_name;
SET sashelp.baseball;
WHERE team = "&team_name";
RUN; 
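A quick way to check how a concatenated reference will resolve, before using it in a DATA step, is to write it to the log with %PUT. This sketch uses the same three macro variables as above and needs no library to be assigned:

```sas
%LET libref = mydata;
%LET dsn = mlb;
%LET team_name = Chicago;

/* One period: the period is consumed as a delimiter */
%PUT One period: &libref.&dsn._&team_name;

/* Two periods: the second period survives as literal text */
%PUT Two periods: &libref..&dsn._&team_name;
```

The first statement writes mydatamlb_Chicago to the log, and the second writes mydata.mlb_Chicago.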

Multi-Level Resolutions

When two or more ampersands (&) appear consecutively, SAS makes successive passes, or scans, before reaching the final resolution. The two notable cases are double and triple ampersands. A double ampersand (&&) is used to resolve a macro variable whose name is built partly from another macro variable. A triple ampersand (&&&) is used to resolve a macro variable whose name is stored as the value of another macro variable.

This quickly becomes confusing. To clarify, let's define the following macro variables:

%LET dsn = nvst;
%LET n = 5;
%LET nvst5 = sashelp.nvst5;

With these variables, let's consider the two combinations listed below:

Combination    First scan resolves to    Second scan resolves to
&&dsn&n        &dsn5                     Not resolved: no macro variable named dsn5 is defined
&&&dsn&n       &nvst5                    sashelp.nvst5

To make things easy, you can think of the double ampersand (&&) as a special reference that resolves to a single ampersand (&). For example, in the first case, && is resolved into &, and the following text, dsn, is carried over unchanged. Next, &n is resolved to 5. After the first scan, the result is &dsn5. In the second scan, SAS attempts to resolve the macro variable &dsn5. Since no such macro variable is defined, SAS issues an "Apparent symbolic reference DSN5 not resolved" warning during the second scan, and the leftover &dsn5 text typically causes a syntax error in the surrounding code.

Let's move on to the next example. When a triple ampersand (&&&) is encountered, the following occurs. During the first scan, the first two ampersands are resolved into a single ampersand (&). The remaining ampersand and the text that follows it (&dsn) are then resolved into nvst. Similarly, &n is resolved into 5. The outcome of the first scan, &nvst5, is resolved again in the second scan, ultimately resulting in sashelp.nvst5.

One tip for avoiding confusion when writing macro references with consecutive ampersands is to start from the final result and work backward. By clearly identifying what the final outcome should look like, you can better plan how each macro variable should be defined and referenced in order to achieve that result.
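The scans can be traced in the log with the SYMBOLGEN system option. In the sketch below, the values of dsn, n, and nvst5 are assumptions inferred from the discussion above, and team1 and i are hypothetical names added to show a double ampersand that does resolve:

```sas
/* Values assumed from the discussion; team1 and i are hypothetical */
%LET dsn = nvst;
%LET n = 5;
%LET nvst5 = sashelp.nvst5;

OPTIONS SYMBOLGEN;       /* write each resolution step to the log */

/* Scan 1: &nvst5, scan 2: sashelp.nvst5 */
%PUT &&&dsn&n;

%LET team1 = Chicago;
%LET i = 1;

/* Scan 1: &team1, scan 2: Chicago */
%PUT &&team&i;

OPTIONS NOSYMBOLGEN;     /* turn the tracing back off */
```

Working backward from the desired result, Chicago is stored in team1, so the reference has to become &team1 after the first scan, which is exactly what &&team&i produces.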
