USEFUL UNIX COMMANDS

Note: When a filename is listed, it is assumed that you are in the directory where the file is located. Also, % is the prompt; you do not need to type it.

 

Note: directoryname is where you want the copied or the moved file to be stored.

Note: If you just want to connect to another stat host, type the host name only (Example: % rlogin galois). If you want to log in to a remote host outside of stat, you need the whole address (Example: % rlogin blue.unix.virginia.edu).

A useful online resource for many UNIX commands is located at the following address: http://www.itc.virginia.edu/desktop/unix/docs/u001.unix.resources.html

GETTING DATA

 

Method 1 (UNIX to UNIX):

Step 1: Use the command: % ftp newhost

Ex: % ftp pitman.stat.virginia.edu

Step 2: Enter the login and password.

Ex: If you want data for STAT512, enter stat512 (login) and yty-btxty (password).

Step 3: Change to the correct directory using the command: % cd directoryname

Step 4: Use the command % ls to view the files.

Step 5: To get a particular file: % get filename

Step 6: To exit: % bye

 

 

Method 2 (Between UNIX and PC):

 

Step 1: Double-click on the WS_FTP95 LE icon.

Step 2: Click the arrow in the Profile Name window.

Step 3: Scroll down until you find the account of interest.

For example: Scroll down until you find STAT512.

Step 4: Highlight the account of interest (STAT512).

(Note: you will see that the connection has already been defined with the current password in place.)

Step 5: Click on the OK button (this will establish your connection with the class account in UNIX)

(Note: all class directories (on the PC) are listed in the Local System window and the current UNIX account is listed in the Remote System Window)

 

To change directories within either window:

    1. scroll down to the directory of interest
    2. highlight the directory of interest
    3. click on the ChgDir button

 

To make a directory within either window:

    1. click on the MkDir button
    2. type in directory name

 

To view a file within either window:

    1. highlight the file you wish to view
    2. click on the View button

 

Step 6: To ftp a file (from PC to UNIX):

    1. highlight the file of interest (within the Local System)
    2. click on the right arrow button

Step 7: To ftp a file (from UNIX to PC):

    1. highlight the file of interest (within the Remote System)
    2. click on the left arrow button

Step 8: Close the program

 

 

Note: If you forget where your file is located, click the Start button at the bottom of the screen and select Find, then Files or Folders. Type in the name of your file, and it should bring up a list of the places where the file is located.

SAS 101

A Basic Introduction to SAS at the Department of Statistics, University of Virginia

 

 

Section 1: Creating a new SAS program

 

Within Exceed, open a text editor. This is done by clicking on the up arrow above the paper and pencil icon. Then click on Text Editor. This text editor will be where you write your SAS program.

 

1.1 Syntax Rules and Statement Ordering (from: Lefkowitz, Jerry M., Introduction to Statistical Computer Packages, Boston: Duxbury Press, 1985.)

 

  1. A SAS program consists of DATA steps and PROC steps. DATA steps create and/or modify SAS data sets. PROC steps analyze SAS data sets. PROC steps may also create other SAS data sets.

  2. DATA steps begin with a DATA statement, which names the data set. The data set name is optional, but it is strongly recommended that you always name the data set. Data set names should be at most eight characters long, begin with a letter, use only letters, digits, or the underscore character, and not have an underscore as both the first and last character of the name.

  3. PROC steps begin with a PROC statement, which identifies the procedure and the SAS data set to be used. Specifying the data set name is optional, but it is strongly recommended that you always do so. (If you do not specify which data set to analyze in the PROC step, the data set analyzed is the last one created. If a previous PROC step created a SAS data set, perhaps without your realizing it, you may end up analyzing the wrong data set. Specifying the data set name takes little extra work and guarantees that you will not analyze the wrong data set.)

  4. All SAS statements end with a semicolon. (Data lines do not end with semicolons because they are not SAS statements; a data line is merely a line of data entered after a CARDS statement.) Extra text is not allowed within SAS statements.

  5. Items in a list (Ex: the variable names in an INPUT statement) are separated with blanks (not commas).

  6. You can begin SAS statements anywhere on the line. Statements can be continued on additional lines. Just remember that the semicolon goes at the end of the statement, not necessarily at the end of each line. Indenting can help make the program more readable, but it is not required.

  7. Variable names can be up to eight characters long and should start with a letter. Letters and digits can be used in variable names. Blanks and special characters cannot be used.

  8. An underscore can be used as a letter (Ex: SAMPLE_1), but don’t use an underscore as both the first and last character.

  9. A hyphen can be used to abbreviate a list of variables in some SAS statements. For example, X1-X4 is equivalent to X1 X2 X3 X4.

  10. SAS is not case sensitive. For example, PROC is equivalent to proc.
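
Several of the rules above can be seen together in a short sketch. The data set name DEMO, the variables X1-X4 and TOTAL, and the data values are all made up for illustration, and CARDS is used only so that the example is self-contained.

```sas
/* Hypothetical example: all names and data values are invented. */
data demo;
  input x1-x4;
  total = x1 + x2
        + x3 + x4;
  cards;
1 2 3 4
5 6 7 8
;
run;

PROC PRINT DATA=demo;
  var x1-x4 total;
run;
```

Here X1-X4 abbreviates X1 X2 X3 X4, the assignment to TOTAL runs over two lines but ends with a single semicolon, the data lines carry no semicolons, and the upper-case PROC PRINT is read the same way as the lower-case statements.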

 

1.2 The DATA Step

 

In most cases, you will be importing a data set from the class UNIX account. For this reason, I will describe only the INFILE statement and will not cover entering the data directly in your program (i.e., using CARDS). Please refer to the handout Getting Data for ways to get SAS data.

 

For the following examples, assume that we have imported a data set that contains eight variables. These variables are:

 

Variable   Description
PULSE1     First pulse rate
PULSE2     Second pulse rate
RAN        1=ran in place, 2=did not run in place
SMOKES     1=smokes regularly, 2=does not smoke regularly
SEX        male, female (note: not numeric)
HEIGHT     Height in inches
WEIGHT     Weight in pounds
ACTIVITY   Level of activity (1=light, 2=moderate, 3=heavy)

 

Sample data line: 64 88 1 1 male 66 140 2 (Note: there is no semicolon at the end.)

The first statement, options ls=75;, tells SAS to print output with 75 characters per line. This should be the first line of every program. The next line, the DATA statement, tells SAS that the name of our data set is EXAMPLE. The INFILE statement says that the data file, our_data.set, is located in /home/mjs5b/STAT512. Note that text in quotes is case sensitive. For example, SAS would not find the data if I had used /HOME/MJS5B/STAT512/OUR_DATA.SET. In the INPUT statement, note the $ after SEX. This tells SAS that SEX is a character (alphanumeric) variable. The RUN statement tells SAS to execute all statements up to this point.
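
The program described above can be sketched as follows. The order of the variables in the INPUT statement is an assumption: it follows the table of variables and the sample data line.

```sas
options ls=75;

data example;
  /* The quoted path is case sensitive. */
  infile '/home/mjs5b/STAT512/our_data.set';
  /* Assumed variable order; $ marks SEX as a character variable. */
  input pulse1 pulse2 ran smokes sex $ height weight activity;
run;
```

With the sample data line, this would read PULSE1=64, PULSE2=88, RAN=1, SMOKES=1, SEX=male, HEIGHT=66, WEIGHT=140, and ACTIVITY=2.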

 

Now, let’s assume you want to create a new variable. For example, let’s say that we want the average pulse rate.

Almost any mathematical operator can be used in defining a new variable; a list of these is located in the SAS manuals. To create a new variable, place its assignment statement after the INPUT statement and end it with a semicolon.
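
For the average pulse rate, the DATA step might look like the sketch below. The name AVGPULSE is the one used in the PRINT procedure later in this handout; the formula, a simple mean of the two pulse readings, is an assumption about what the average pulse rate means here.

```sas
data example;
  infile '/home/mjs5b/STAT512/our_data.set';
  input pulse1 pulse2 ran smokes sex $ height weight activity;
  /* New variable, defined after the INPUT statement (assumed formula). */
  avgpulse = (pulse1 + pulse2) / 2;
run;
```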

 

1.3 The PROC Step

 

The PROC steps immediately follow the DATA step. PROCs perform various functions and computations on SAS data sets. For STAT512 and STAT513, you will mostly be working with either the GLM or the REG procedure.

 

1.3.1 Descriptive Procedures

The PRINT procedure prints a list of the data for the specified variables: AVGPULSE, SMOKES, SEX, HEIGHT, and WEIGHT. Note that there are no commas between the specified variables. If you want a list of all variables, you do not need to include the VAR statement.
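
A PROC PRINT step matching this description can be sketched as follows, assuming AVGPULSE has already been created in the DATA step that builds EXAMPLE.

```sas
proc print data=example;
  /* Variable names are separated by blanks, not commas. */
  var avgpulse smokes sex height weight;
run;
```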

 

 

 

 

 

 

The MEANS procedure can produce the number of observations (N), the mean (MEAN), and the standard deviation (STD) for the specified variables (here PULSE1, PULSE2, HEIGHT, and WEIGHT), separately for each sex (the CLASS SEX statement). The default statistics for the MEANS procedure are N, MEAN, STD, MIN, and MAX. Again, if you want descriptive statistics for all numeric variables, omit the VAR statement. If you do not want the statistics separately for each sex, delete the CLASS statement. The TITLE statement gives a title to the table of descriptive statistics in the output. Again, note that text in quotes is case sensitive.
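
A PROC MEANS step along these lines can be sketched as follows. The wording of the title is hypothetical; any text in quotes would do.

```sas
proc means data=example n mean std;
  class sex;
  var pulse1 pulse2 height weight;
  /* Hypothetical title text; it is printed exactly as typed. */
  title 'Descriptive Statistics by Sex';
run;
```

Listing N, MEAN, and STD on the PROC statement restricts the output to those three statistics; leaving them off gives the defaults.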

 

1.3.2 The REG Procedure

 

Let’s say that we are interested in the relationship between height and weight. This is a classic example of simple linear regression. The REG procedure is useful for determining whether the relationship is linear and statistically significant. It is also useful for estimating the coefficients.

The REG procedure in (1) performs a regression analysis on the data set EXAMPLE. The MODEL statement specifies that WEIGHT is the dependent variable and HEIGHT is the independent variable. In general, the MODEL statement has the form: MODEL dependent variable = independent variable(s) / options;. The OUTPUT statement creates a new data set, REG1, that contains two additional variables: PREDY (the predicted values of WEIGHT) and RESID (the residuals). If you ran the PRINT procedure on REG1, the listing would be identical to that of the EXAMPLE data set except that it would also include the residuals and the predicted values of WEIGHT.

The PLOT procedure in (2) uses the new data set generated by the regression analysis, REG1. The first PLOT statement tells SAS to produce a plot of WEIGHT*HEIGHT and a plot of PREDICTED WEIGHT*HEIGHT. The second plot is overlaid on the first, and its points are denoted by a ‘*’. The ‘*’ can be replaced by any letter, number, or symbol. The plot would look similar to the one shown in Figure 1. The second PLOT statement produces a residual plot: the residuals against the predicted values of weight.
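
The two steps described above can be sketched as follows. The names REG1, PREDY, and RESID come from the description; plotting RESID against PREDY in the second PLOT statement is an assumption about which residual plot is meant.

```sas
/* (1) Regress WEIGHT on HEIGHT; save predictions and residuals in REG1. */
proc reg data=example;
  model weight = height;
  output out=reg1 p=predy r=resid;
run;

/* (2) Plot the data with the fitted values overlaid, then the residuals. */
proc plot data=reg1;
  plot weight*height predy*height='*' / overlay;
  /* Assumed axes for the residual plot. */
  plot resid*predy;
run;
```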

 

 

 

 

 

Figure 1: Plot of weight vs. height with predicted value of weight vs. height denoted as ‘*’.

 

1.3.3 The GLM Procedure

 

The GLM procedure is quite similar to the REG procedure. In STAT512 and STAT513, either one can be used for the majority of analyses. Specific cases where one should be used over the other will be addressed in class. The GLM procedure can be combined with the RANK procedure to produce a normal scores plot, which is a good diagnostic check of your data.

Again, the MODEL statement is the same as in the REG procedure. Next comes the RANK procedure, which works on the data set produced by the GLM procedure. It ranks the residuals, and the NORMAL=BLOM option converts those ranks into normal scores using Blom’s formula. The RANKS NSCORE statement stores the normal scores in a new variable, NSCORE. The PLOT procedure then plots the residuals against the normal scores, which gives the normal scores plot.
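
A sketch of these three steps is given below. Only NORMAL=BLOM and RANKS NSCORE are taken from the description; the data set names GLM1 and NPLOT, the variable names PREDY and RESID, and the choice of axes in the PLOT statement are assumptions made for illustration.

```sas
/* Fit the model and save the residuals (GLM1, PREDY, RESID are assumed names). */
proc glm data=example;
  model weight = height;
  output out=glm1 p=predy r=resid;
run;

/* Turn the ranks of the residuals into normal scores stored in NSCORE. */
proc rank data=glm1 normal=blom out=nplot;
  var resid;
  ranks nscore;
run;

/* Normal scores plot: residuals against their normal scores. */
proc plot data=nplot;
  plot resid*nscore;
run;
```

If the residuals are roughly normal, the points in the last plot should fall close to a straight line.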

 

 

 

Section 2: Executing the SAS Program You Created

After you have created your SAS program, save the file. For bookkeeping purposes, it is a good idea to save the file as filename.sas. This lets you know within your file manager that this is your SAS program. Once you have the file saved, go to a terminal and make sure you are in the directory where the SAS program is located. Once you are in the correct directory, type the command: % sas filename.sas

Once you do this, SAS creates two additional files. One is a .lst file, which contains the output from your PROC steps. The other is a .log file, which is a copy of the original program without the data listed. Any SAS error messages are listed here, along with information about the data set that was created.

If an error was made, go into the .log file and find the error message. Error messages are often hard to interpret and can be confusing. First go through your program and check for the syntax errors listed in Section 1.1. If the syntax is correct and the problem occurs in a specific procedure, refer to the SAS manuals for an idea of where the error might have come from. Basically, the best way to learn SAS is through trial and error. Use old programs as references for new programs and use the SAS manuals for further information.

I recommend the following book: Applied Statistics and the SAS Programming Language by Ronald P. Cody and Jeffrey K. Smith. I think it is the best book out there for those just learning SAS and a good introduction to using some of the various procedures. The price is about $40 new (Bigwords.com). It’s a book that you can always use, and for $40 it’s not a bad investment.

 

 

REFERENCES:

 

  1. Lefkowitz, Jerry M., Introduction to Statistical Computer Packages, Boston: Duxbury Press, 1985.

  2. Cody, Ronald P. and Smith, Jeffrey K., Applied Statistics and the SAS Programming Language, Upper Saddle River, New Jersey: Prentice Hall, 1997.

PRINTING TIPS

 

stat_l1 is the default printer for UNIX print jobs. As a result, you don’t need to use -Pstat_l1 if you want your print job sent to stat_l1. If you want to send a print job to stat_l2, simply replace -Pstat_l1 with -Pstat_l2.

1) % mpage -2l -L62 -Pstat_l1 filename.lst

This puts 2 pages per sheet.

2) host:/home/username/correct.dir % mpage -Pstat_l1 filename.lst

This puts 4 pages per sheet. This is recommended for large amounts of output.

Do not send PDF files directly to the printer. This will jam up the printers and make my life miserable. To print these types of files, do the following: % acroread filename.pdf

From Acrobat Reader you can print the file without messing up the printers.

To remove a job from the print queue:

Step 1: From the directory where the job was sent: % lpq -Pstat_l1 (Note: select the printer that you sent the job to.)

Step 2: From here, get your job number and find out the host that it was sent from.

Step 3: Log in to the host that the job was sent from and use the command:

% lprm -Pstat_l1 jobnumber (Again, note that you need to select the printer that the job was sent to.)

 

Final note: The above commands used -Pstat_l1 (the default printer). Replacing this with -Pstat_l2 merely changes the destination of the print job, and no harm is done.

 

 

© Matt Soukup, 2000