Note: When filename is listed, it is assumed you are in the directory where this file is located. Also, % is the shell prompt; do not type it.
A useful online resource covering many UNIX commands is located at
the following address: http://www.itc.virginia.edu/desktop/unix/docs/u001.unix.resources.html
GETTING DATA
Method 1 (UNIX to UNIX):
Step 1: Use the command: % ftp newhost
Ex: % ftp pitman.stat.virginia.edu
Step 2: Enter login and password
Ex: If you want data for STAT512, enter stat512(login) and yty-btxty(password)
Step 3: Change to the correct directory using the command: % cd directoryname
Step 4: Use command: % ls to view files
Step 5: To get a particular file: % get filename
Step 6: To exit: % bye
Method 2 (Between UNIX and PC):
Step 1: Double Click on the WS_FTP95 LE icon.
Step 2: Click the arrow in the Profile Name window
Step 3: Scroll down until you find the account of interest
For example: Scroll down until you find STAT512
Step 4: Highlight the account of interest (STAT512).
Note: If you forget where your file is located, click the Start button
at the bottom of the screen and select Find, then Files or Folders. Type in the
name of your file and a list of the places where it is located should appear.
A Basic Introduction to SAS (in UNIX) at the Department of Statistics, University of Virginia
Section 1: Creating a new SAS program
Within Exceed, open a text editor. This is done by clicking on the up arrow above the paper and pencil icon. Then click on Text Editor. This text editor will be where you write your SAS program.
1.1 Syntax Rules and Statement Ordering (from: Lefkowitz, Jerry M., Introduction to Statistical Computer Packages, Boston: Duxbury Press, 1985.)
In most cases, you will be importing a data set from the class UNIX account. For this reason, I will only describe the INFILE statement, and not cover inputting the data directly into your program (i.e. using CARDS). Please refer to the handout Getting Data for ways to get SAS data.
For the following examples, assume that we have imported a data set
that contains eight variables. These variables are:
| Variable | Description |
| PULSE1 | First pulse rate |
| PULSE2 | Second pulse rate |
| RAN | 1=ran in place, 2=did not run in place |
| SMOKES | 1=smokes regularly, 2=does not smoke regularly |
| SEX | male, female (note: not numeric) |
| HEIGHT | Height in inches |
| WEIGHT | Weight in pounds |
| ACTIVITY | Level of activity (1=light, 2=moderate, 3=heavy) |
Sample data line: 64 88 1 1 male 66 140 2 (Note no semicolon at the end)
OPTIONS LS=75;
DATA EXAMPLE;
INFILE "/home/mjs5b/STAT512/OUR_DATA.SET";
INPUT PULSE1 PULSE2 RAN SMOKES SEX $ HEIGHT WEIGHT ACTIVITY;
RUN;
The first line, OPTIONS LS=75;, tells SAS to print output at 75 characters per line. This should be the first line of every program. The next line tells SAS that the name of our data set is EXAMPLE. The INFILE statement says that the data file, OUR_DATA.SET, is located in /home/mjs5b/STAT512. Note that text inside quotes is case sensitive. For example, SAS would not find the data if I had used /HOME/MJS5B/STAT512/OUR_DATA.SET. On the next line, note the $ after SEX. This tells SAS that SEX is a character (alphanumeric) variable. The RUN statement tells SAS to execute all statements up to this point.
Now, let’s assume you want to create a new variable. For example, let’s say that we want the average pulse rate.
OPTIONS LS=75;
DATA EXAMPLE;
INFILE "/home/mjs5b/STAT512/OUR_DATA.SET";
INPUT PULSE1 PULSE2 RAN SMOKES SEX $ HEIGHT WEIGHT ACTIVITY;
AVGPULSE= (PULSE1 + PULSE2)/2;
RUN;
Almost any mathematical operator can be used in defining a new variable. A list of these is located in the SAS manuals. Just remember that the statement creating a new variable goes after the INPUT statement and must end with a semicolon.
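As a further illustration, here is a minimal sketch of a DATA step that creates several new variables at once. The data set name EXAMPLE2 and the variables PULSEDIF, HTSQ, and LOGWT are hypothetical names made up for this example; they are not part of the class data set.

```sas
OPTIONS LS=75;
DATA EXAMPLE2;
INFILE "/home/mjs5b/STAT512/OUR_DATA.SET";
INPUT PULSE1 PULSE2 RAN SMOKES SEX $ HEIGHT WEIGHT ACTIVITY;
/* Hypothetical derived variables, for illustration only */
PULSEDIF = PULSE2 - PULSE1;   /* change in pulse rate       */
HTSQ     = HEIGHT**2;         /* ** means raise to a power  */
LOGWT    = LOG(WEIGHT);       /* natural logarithm          */
RUN;
```

Each assignment statement sits between the INPUT statement and the RUN statement, and each ends with a semicolon.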
1.3 The PROC Step
The PROC steps immediately follow the DATA step. PROCs perform various functions and computations on SAS data sets. For STAT512 and STAT513, you will mostly be working with either the GLM or the REG procedure.
1.3.1 Descriptive Procedures
PROC PRINT DATA=EXAMPLE;
VAR AVGPULSE SMOKES SEX HEIGHT WEIGHT;
RUN;
This PROC simply prints a list of the data for the specified variables: AVGPULSE, SMOKES, SEX, HEIGHT, and WEIGHT. Note that there are no commas between the specified variables. If you want a list of all variables, you do not need to include the VAR statement.
PROC MEANS DATA= EXAMPLE N MEAN STD;
VAR PULSE1 PULSE2 HEIGHT WEIGHT;
CLASS SEX;
TITLE 'Basic Descriptive Statistics';
RUN;
This PROC step produces the number of observations (N), the mean (MEAN), and the standard deviation (STD) for the specified variables (PULSE1, PULSE2, HEIGHT, WEIGHT), separately for each sex (the CLASS SEX statement). If no statistics are listed, the MEANS procedure prints its defaults: N, MEAN, STD, MIN, and MAX. Again, if you want descriptive statistics for all numeric variables, omit the VAR statement. Also, if you don’t want descriptive statistics separately for each sex, delete the CLASS statement. The TITLE statement gives a title to the table of descriptive statistics in the output. Again, note that text inside quotes is case sensitive: it is printed exactly as typed.
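The variations described above might look like the following sketch. The choice of SMOKES as the CLASS variable and the title text are only illustrative assumptions; any classification variable in the data set could be used instead.

```sas
/* Pulse rates summarized separately for smokers and non-smokers */
PROC MEANS DATA=EXAMPLE N MEAN STD MIN MAX;
VAR PULSE1 PULSE2;
CLASS SMOKES;
TITLE 'Pulse Rates by Smoking Status';
RUN;

/* No VAR or CLASS statement: default statistics for every
   numeric variable, computed over the whole data set */
PROC MEANS DATA=EXAMPLE;
RUN;
```

Note that a TITLE stays in effect for later procedures until another TITLE statement replaces it.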
1.3.2 The REG Procedure
Let’s say that we are interested in the relationship between height and weight. This is a classic example of simple linear regression. The REG procedure is useful for determining whether weight is linearly related to height and whether that relationship is significant. It also provides estimates of the regression coefficients.
PROC REG DATA=EXAMPLE;
MODEL WEIGHT=HEIGHT;
OUTPUT OUT=REG1 P=PREDY R=RESID;
RUN;
PROC PLOT DATA=REG1;
PLOT WEIGHT*HEIGHT PREDY*HEIGHT='*' / OVERLAY;
PLOT RESID*PREDY;
RUN;
The REG procedure performs a regression analysis on the data set EXAMPLE. The MODEL statement specifies that WEIGHT is the dependent variable and HEIGHT is the independent variable. In general, the MODEL statement has the form: MODEL dependent variable = independent variable(s) / options;. The OUTPUT statement creates a new data set, REG1, that contains two new variables: PREDY (predicted values of WEIGHT) and RESID (residuals). If you used the PRINT procedure on the data set REG1, the listing would be identical to the EXAMPLE data set except that it would also include the residuals and the predicted values of WEIGHT.
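To see the new variables for yourself, a short PROC PRINT step along the following lines could be run after the REG step. This is only a sketch; the VAR list is an illustrative choice, and leaving it out would list every variable in REG1.

```sas
/* List the observed, predicted, and residual values from REG1 */
PROC PRINT DATA=REG1;
VAR HEIGHT WEIGHT PREDY RESID;
RUN;
```

For each observation, RESID is the observed WEIGHT minus PREDY.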
The PLOT procedure now uses the new data set generated by our
regression analysis, REG1. The first PLOT statement tells SAS to
produce a plot of WEIGHT against HEIGHT (WEIGHT*HEIGHT) and a plot of
predicted WEIGHT against HEIGHT (PREDY*HEIGHT). The OVERLAY option places
the second plot on top of the first, with its points denoted by ‘*’.
The ‘*’ can be replaced by any letter, number, or symbol. The plot would
look similar to Figure 1. The second PLOT statement produces a residual
plot: the residuals against the predicted values of weight.
Figure 1: Plot of weight vs. height with predicted value of weight vs. height denoted as ‘*’.
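The general form of the MODEL statement given earlier also covers more than one independent variable and options after the slash. The sketch below is an assumption-laden illustration, not an analysis from the class: using PULSE1 as a second predictor is an arbitrary choice, and P and R are MODEL statement options in PROC REG that print the predicted values and a residual analysis in the output.

```sas
/* Two independent variables, with options after the slash */
PROC REG DATA=EXAMPLE;
MODEL WEIGHT = HEIGHT PULSE1 / P R;
RUN;
```

As in the VAR statement, the independent variables are separated by spaces, not commas.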
1.3.3 The GLM Procedure
The GLM procedure is quite similar to the REG procedure. In STAT512 and STAT513, either one can be used for the majority of analyses. Specific cases where one should be used over the other will be addressed in class. Below is an example that uses the GLM procedure and the RANK procedure to produce a normal scores plot, which is a good diagnostic check of your data.
PROC GLM DATA=EXAMPLE;
MODEL WEIGHT=HEIGHT;
OUTPUT OUT=REG2 P=PREDY R=RESID;
RUN;
PROC RANK DATA=REG2 OUT=REG3 NORMAL=BLOM;
RANKS NSCORE;
VAR RESID;
RUN;
PROC PLOT DATA=REG3;
PLOT RESID*NSCORE;
RUN;
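Again, the model is the same as in the REG procedure. Next, you will notice
the RANK procedure. It ranks the residuals (VAR RESID) in the data set REG2
produced by the GLM procedure and writes the result to a new data set, REG3.
The NORMAL=BLOM option tells SAS to convert the ranks into normal scores,
that is, the values expected for those ranks under a normal distribution,
using Blom’s formula. The RANKS NSCORE statement stores these normal scores
in a new variable named NSCORE. The PLOT procedure then produces a normal
scores plot of the residuals against their normal scores; a roughly straight
line suggests that the residuals are approximately normal.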
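For reference, the normal score that NORMAL=BLOM assigns to each residual can be written as follows. The symbols are notation introduced here, not taken from the handout: r_i is the rank of the i-th residual, n is the number of nonmissing residuals, and \Phi^{-1} is the inverse of the standard normal distribution function.

```latex
\[
  \mathrm{NSCORE}_i \;=\; \Phi^{-1}\!\left( \frac{r_i - 3/8}{\,n + 1/4\,} \right)
\]
```

The smallest residual therefore gets the most negative score and the largest residual gets the most positive score.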
Section 2: Executing the SAS Program You Created
After you have created your SAS program, save the file. For bookkeeping purposes, it is a good idea to save the file as filename.sas. This lets you know within your file manager that this is your SAS program. Once you have the file saved, go to a terminal and make sure you are in the directory where the SAS program is located. Once you are in the correct directory, type the command: % sas filename.sas
Once you do this, SAS creates two additional files. One is a .lst file which contains the output from your PROC steps. The other is a .log file. This file is a copy of the original program without the data listed. Any SAS error messages are listed here, along with information about the data set that was created.
If an error was made, go into the .log file and find the error message. The error messages are often hard to interpret and can be confusing. First go through your program and check for the syntax errors listed above. If the syntax is correct and the problem occurs in a specific PROC step, refer to the SAS manuals for an idea of where the error might have come from. Basically, the best way to learn SAS is through trial and error. Use old programs as references for new programs and use the SAS manuals for further information.
I recommend the following book: Applied Statistics and the SAS Programming
Language by Ronald P. Cody and Jeffrey K. Smith. I think it is the
best book out there for those just learning SAS and a good introduction
to some of the various PROC statements. The price is about $40
new (Bigwords.com), which is not a bad investment for a book that you can
always use.
PRINTING TIPS
This puts 2 pages per sheet.
2) % mpage -Pstat_l1 filename.lst
From acrobat reader you can print this file without messing up the printers.
Step 2: From here, get your job number and find out the host that it was sent from.
Step 3: Log in to the host that the job was sent from and use the command:
% lprm -Pstat_l1 jobnumber (Again, note that you need to select the printer that the job was sent to.)
The Home Directory
· The Home Directory is a service provided by ITC that is useful for backing up your computer files.
Features and Advantages of the Home Directory Service
2. The Home Directory is backed up daily. Files that you accidentally delete can usually be recovered.
3. Files are secure in case of hardware failure. If your personal files are saved to the Home Directory Service and the hard disk on your computer crashes, you will have copies of your work.
4. You can publish Web pages without the need to use a Unix account and you will have easy access to personal Web space through the Home Directory.
Step 1: You need an account on blue.unix.virginia.edu. Go to the website http://www.itc.virginia.edu/unixsys/homedir/about.html to see if you have an account. If not, there are directions on how to obtain one.
Step 2: The Home Directory is mounted on all the stat machines. Therefore, log into one of the stat machines (e.g., Wishart or Galois).
Step 3: Once logged into a stat machine, create a symbolic link to the home directory using a command such as % ln -s /h1/m/mj/mjs5b home.dir
or % ln -s /h1/s/sa/sax4u home.dir, where home.dir is simply the name given to the link to the Home Directory.
Step 1: Let’s say you created a Microsoft Word document and you wanted to save it to your Home Directory. Simply save the file on the C: Drive of the PC.
Step 2: Using FTP in much the same way as described in the Getting Data section of this handout, ftp the file into your home directory in UNIX.
Step 3: Delete the file from the PC’s C: Drive by finding the file and single clicking on it. Go to the File menu and select Delete. Then click Yes. This is merely recommended if you don’t want others looking at your work.
It should be pointed out that what I’m describing as using the Home
Directory Service is much like ftp’ing files into your UNIX account. This
is because our PCs are set up in such a way that we can’t mount the
Home Directory Service Login that you would have access to at other
computing labs. Still, this is an excellent way to back up files, and
I would strongly suggest that anything you want saved be put in your
home directory.
© Matt Soukup, 2000