USEFUL UNIX COMMANDS
Note: When filename is listed, it is assumed you are in the directory where this file is located. Also, % is a prompt and you need not type this.
Note: directoryname is where you want the copied or the moved file to be stored.
Note: To connect to another stat host, type only the new host name (Example: % rlogin galois). To log in to a remote host outside of stat, you need the whole address (Example: % rlogin blue.unix.virginia.edu).
A useful online resource for many UNIX commands is located at the following address: http://www.itc.virginia.edu/desktop/unix/docs/u001.unix.resources.html
GETTING DATA
Method 1 (UNIX to UNIX):
Step 1: Use the command: % ftp newhost
Ex: % ftp pitman.stat.virginia.edu
Step 2: Enter login and password
Ex: If you want data for STAT512, enter stat512 (login) and yty-btxty (password)
Step 3: Change to the correct directory using the command: % cd directoryname
Step 4: Use command: % ls to view files
Step 5: To get a particular file: % get filename
Step 6: To exit: % bye
Method 2 (Between UNIX and PC):
Step 1: Double-click on the WS_FTP95 LE icon.
Step 2: Click the arrow in the Profile Name window
Step 3: Scroll down until you find the account of interest
For example: Scroll down until you find STAT512
Step 4: Highlight the account of interest (STAT512).
(Note: you will see that the connection has already been defined with the current password in place.)
Step 5: Click on the OK button (this will establish your connection with the class account in UNIX)
(Note: all class directories (on the PC) are listed in the Local System window and the current UNIX account is listed in the Remote System Window)
To change directories within either window:
To make a directory within either window:
To view a file within either window:
Step 6: To ftp a file (from PC to UNIX):
Step 7: To ftp a file (from UNIX to PC):
Step 8: Close the program
Note: If you forget where your file is located, go to the Start icon at the bottom of the screen and select Find, then Files or Folders. Type in the name of your file and it should bring up a list of the places where the file is located.
SAS 101
A Basic Introduction to SAS at the Department of Statistics, University of Virginia
Section 1: Creating a new SAS program
Within Exceed, open a text editor. This is done by clicking on the up arrow above the paper and pencil icon. Then click on Text Editor. This text editor will be where you write your SAS program.
1.1 Syntax Rules and Statement Ordering (from: Lefkowitz, Jerry M., Introduction to Statistical Computer Packages, Boston: Duxbury Press, 1985.)
1.2 The DATA Step
In most cases, you will be importing a data set from the class UNIX account. For this reason, I will only describe the INFILE command, and not cover inputting the data directly into your program (i.e. using CARDS). Please refer to the handout Getting Data for ways to get SAS data.
For the following examples, assume that we have imported a data set that contains eight variables. These variables are:
Variable | Description
PULSE1 | First pulse rate
PULSE2 | Second pulse rate
RAN | 1=ran in place, 2=did not run in place
SMOKES | 1=smokes regularly, 2=does not smoke regularly
SEX | male, female (note: not numeric)
HEIGHT | Height in inches
WEIGHT | Weight in pounds
ACTIVITY | Level of activity (1=light, 2=moderate, 3=heavy)
Sample data line: 64 88 1 1 male 66 140 2 (Note no semicolon at the end)
On the first line, the statement options ls=75; tells SAS to print output at 75 characters per line. This should be the first line of every program. The next line tells SAS that the name of our data set is EXAMPLE. The INFILE statement says that the data file, our_data.set, is located in /home/mjs5b/STAT512. Note that SAS is case sensitive for anything in quotes. For example, SAS would not find the data if I had used /HOME/MJS5B/STAT512/OUR_DATA.SET. On the INPUT line, note the $ after SEX. This tells SAS that SEX is a character (alphanumeric) variable. The RUN statement tells SAS to execute all statements up to this point.
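The program that this paragraph walks through is not reproduced in this copy of the handout. A minimal reconstruction from the description, saved from a terminal instead of the Text Editor, might look like the sketch below. The file name example.sas is an assumption; the statements follow the description above.

```shell
# Save the DATA step described above as example.sas (hypothetical file name).
# Quoting 'EOF' keeps the shell from touching the $ after SEX.
cat > example.sas <<'EOF'
options ls=75;
data example;
  /* the path inside the quotes is case sensitive */
  infile '/home/mjs5b/STAT512/our_data.set';
  /* the $ marks SEX as a character variable */
  input pulse1 pulse2 ran smokes sex $ height weight activity;
run;
EOF
```

Each name on the INPUT statement lines up, in order, with one value on the sample data line.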
Now, let’s assume you want to create a new variable. For example, let’s say that we want the average pulse rate.
One can use almost any mathematical operator in defining a new variable. A list of these is located in the SAS manuals. Just remember that to create a new variable, you put its defining statement after the INPUT statement and end it with a semicolon.
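As a sketch of the average pulse rate example, the DATA step might gain one assignment statement after the INPUT statement. The variable name AVGPULSE is the one used later in this handout; the file name avgpulse.sas is an assumption.

```shell
# Write a DATA step that also computes the average pulse rate
# (avgpulse.sas is a hypothetical file name).
cat > avgpulse.sas <<'EOF'
options ls=75;
data example;
  infile '/home/mjs5b/STAT512/our_data.set';
  input pulse1 pulse2 ran smokes sex $ height weight activity;
  /* new variable: mean of the two pulse rates */
  avgpulse = (pulse1 + pulse2) / 2;
run;
EOF
```

For the sample data line, with pulse rates of 64 and 88, AVGPULSE would be 76.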
1.3 The PROC Step
The PROC statements immediately follow the DATA statements. PROCs perform various functions and computations on SAS data sets. For STAT512 and STAT513, you will mostly be working with either the GLM or REG procedures.
1.3.1 Descriptive Procedures
This PROC simply prints a list of the data with specified variables: AVGPULSE, SMOKES, SEX, HEIGHT, and WEIGHT. Note that there are no commas in between the specified variables. If you want a list of all variables, you do not need to include the VAR statement.
This PROC statement produces the number of observations (N), the mean (MEAN), and standard deviation (STD), for the specified variables (PULSE1, PULSE2, HEIGHT, WEIGHT) separately for each sex (CLASS SEX statement). The default statistics for the MEANS procedure are N, MEAN, STD, MAX, and MIN. Again, if you wanted descriptive statistics for all variables you would not use the VAR statement. Also, if you didn’t want descriptive statistics separately for each sex, delete the CLASS statement. The TITLE statement simply gives a title to the table of descriptive statistics in the output. Again, note that anything in quotes is case sensitive, so the title prints exactly as you type it.
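The two descriptive procedures described above might be written as in the sketch below. The statements are reconstructed from the descriptions; the title text and the file name describe.sas are assumptions.

```shell
# Write a program with the PRINT and MEANS steps described above
# (describe.sas and the title text are hypothetical).
cat > describe.sas <<'EOF'
options ls=75;
data example;
  infile '/home/mjs5b/STAT512/our_data.set';
  input pulse1 pulse2 ran smokes sex $ height weight activity;
  avgpulse = (pulse1 + pulse2) / 2;
run;

/* list selected variables; drop the VAR statement to list them all */
proc print data=example;
  var avgpulse smokes sex height weight;
run;

/* N, mean, and standard deviation, separately for each sex */
proc means data=example n mean std;
  class sex;
  var pulse1 pulse2 height weight;
  title 'Pulse, height, and weight by sex';
run;
EOF
```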
1.3.2 The REG Procedure
Let’s say that we are interested in the relationship between height and weight. This is a classic example of simple linear regression. The REG procedure is useful in determining whether this relationship is significant and linear. It is also useful in determining the estimates of the coefficients.
The REG procedure in (1) does a regression analysis on the data set EXAMPLE. The MODEL statement specifies that WEIGHT is the dependent variable and HEIGHT is the independent variable. In general, the MODEL statement is as follows: MODEL dependent variable = independent variable(s) / options;. The OUTPUT statement is used to create a new data set that now has the variables PREDY (predicted values of WEIGHT) and RESID (residuals). If you ran the PRINT procedure on the data set REG1, the listing would be identical to that of the EXAMPLE data set except that it would also list the residuals and the predicted values of WEIGHT.
The PLOT procedure in (2) uses the new data set that we generated from our regression analysis, REG1. The first PLOT statement tells SAS to produce a plot of WEIGHT*HEIGHT and a plot of predicted WEIGHT*HEIGHT. The second plot is overlaid onto the first, and its points are denoted by a ‘*’. The ‘*’ can be any letter, number, or symbol. The plot would look similar to the one shown in Figure 1. The second PLOT statement simply produces a residual plot against the predicted values of weight.
Figure 1: Plot of weight vs. height with predicted value of weight vs. height denoted as ‘*’.
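Steps (1) and (2) described above might look like the following sketch. The data set and variable names (REG1, PREDY, RESID) come from the text. The file name regression.sas is an assumption, and so is plotting RESID against PREDY in the second PLOT statement.

```shell
# Write a program with the REG and PLOT steps described above
# (regression.sas is a hypothetical file name).
cat > regression.sas <<'EOF'
options ls=75;
data example;
  infile '/home/mjs5b/STAT512/our_data.set';
  input pulse1 pulse2 ran smokes sex $ height weight activity;
run;

/* (1) regress WEIGHT on HEIGHT; save predictions and residuals in REG1 */
proc reg data=example;
  model weight = height;
  output out=reg1 p=predy r=resid;
run;

/* (2) overlay the predicted values as '*', then plot the residuals */
proc plot data=reg1;
  plot weight*height predy*height='*' / overlay;
  plot resid*predy;   /* assumed axes: residuals against predicted values */
run;
EOF
```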
1.3.3 The GLM Procedure
The GLM procedure is quite similar to the REG procedure. In STAT512 and STAT513, either one can be used for the majority of analyses. Specific times where one should be used over another will be addressed in class. Below is an example of a PROC statement that uses the GLM procedure and the RANK procedure to produce a normal scores plot, which is a good diagnostic check of your data.
Again, the model is the same as in the REG procedure. Next, you will notice the RANK procedure. This works on the data set produced by the GLM procedure. The NORMAL=BLOM option tells SAS to convert the ranks into normal scores (using Blom’s formula) rather than report the raw ranks. The RANKS NSCORE statement stores the normal scores for the residuals in a new variable, NSCORE. The PLOT procedure then produces the normal scores plot.
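A sketch of the GLM, RANK, and PLOT steps described above follows. NORMAL=BLOM and RANKS NSCORE come from the text; the data set names GLM1 and RANKED, the variable names PREDY and RESID, and the file name normal_scores.sas are assumptions.

```shell
# Write a program that produces a normal scores plot of the residuals
# (normal_scores.sas, GLM1, and RANKED are hypothetical names).
cat > normal_scores.sas <<'EOF'
options ls=75;
data example;
  infile '/home/mjs5b/STAT512/our_data.set';
  input pulse1 pulse2 ran smokes sex $ height weight activity;
run;

/* same model as in PROC REG; save the residuals in GLM1 */
proc glm data=example;
  model weight = height;
  output out=glm1 p=predy r=resid;
run;

/* turn the ranks of the residuals into normal scores named NSCORE */
proc rank data=glm1 normal=blom out=ranked;
  var resid;
  ranks nscore;
run;

/* normal scores plot: residuals against their normal scores */
proc plot data=ranked;
  plot resid*nscore;
run;
EOF
```

If the residuals are roughly normal, the points on this plot should fall close to a straight line.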
SECTION 2: Executing the SAS Program You Created
After you have created your SAS program, save the file. For bookkeeping purposes, it is a good idea to save the file as filename.sas. This lets you know within your file manager that this is your SAS program. Once you have the file saved, go to a terminal and make sure you are in the directory where the SAS program is located. Once you are in the correct directory, type the command % sas filename.sas
Once you do this, SAS creates two additional files. One is a .lst file which contains the output from your PROC steps. The other is a .log file. This file is a copy of the original program without the data listed. Any SAS error messages are listed here, along with information about the data set that was created.
If an error was made, simply go into the .log file and find the error message. Many times the error messages are hard to interpret and can be confusing. First check your program against the syntax rules listed above. If the syntax is correct and the problem comes from a specific PROC, refer to the SAS manuals for an idea as to where the error might have come from. Basically, the best way to learn SAS is through trial and error. Use old programs as references for new programs and use the SAS manuals for further information.
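Rather than reading the whole .log file, you can pull out just the flagged lines from the terminal. The sketch below relies on SAS starting those log lines with ERROR or WARNING; the function name sas_errors is made up for this example.

```shell
# Print every ERROR or WARNING line in a SAS log, with its line number.
# sas_errors is a hypothetical helper name.
sas_errors() {
  grep -n -E '^(ERROR|WARNING)' "$1"
}

# Typical use after "sas filename.sas" has written filename.log:
if [ -f filename.log ]; then
  sas_errors filename.log
fi
```

The line numbers refer to the .log file, not to your program, so look at the program statements echoed just above each message.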
I recommend the following book: Applied Statistics and the SAS Programming Language by Ronald P. Cody and Jeffrey K. Smith. I think it is the best book out there for those just learning SAS and for a good introduction into using some of the various PROC statements. The price is about $40 new (Bigwords.com). It’s a book that you can always use and for $40 it’s not a bad investment.
PRINTING TIPS
stat_l1 is the default printer for Unix print jobs. As a result, you don’t need to use -Pstat_l1 if you want your print job sent to stat_l1. If you want to send a print job to stat_l2, simply replace -Pstat_l1 with -Pstat_l2.
1) % mpage -2l -L62 -Pstat_l1 filename.lst
This puts 2 pages per sheet.
2) host:/home/username/correct.dir % mpage -Pstat_l1 filename.lst
This will jam up the printers and make my life miserable. In order to print these types of files do the following: % acroread filename.pdf
From Acrobat Reader you can print this file without messing up the printers.
Step 1: From the directory where the job was sent: % lpq -Pstat_l1 (Note: select the printer that you sent the job to.)
Step 2: From here, get your job number and find out the host that it was sent from.
Step 3: Log in to the host that the job was sent from and use the command:
% lprm -Pstat_l1 jobnumber (Again, note that you need to select the printer that the job was sent to.)
Final note: The above commands used -Pstat_l1 (the default printer). Replacing this with -Pstat_l2 merely changes the destination of the print job, and no harm is done.
© Matt Soukup, 2000