USEFUL UNIX COMMANDS

Note: When filename is listed, it is assumed you are in the directory where this file is located. Also, % is the shell prompt; do not type it.

Note: directoryname is where you want the copied or moved file to be stored.

Note: If you want to connect to a new stat host, type the host name only (Example: % rlogin galois). If you want to log in to a remote host outside of stat, you need the whole address (Example: % rlogin blue.unix.virginia.edu).
 
 

A useful online resource for many UNIX commands is located at the following address: http://www.itc.virginia.edu/desktop/unix/docs/u001.unix.resources.html
 
 

GETTING DATA

Method 1 (UNIX to UNIX):

Step 1: Use command: % ftp newhost

Ex: % ftp pitman.stat.virginia.edu

Step 2: Enter login and password

Ex: If you want data for STAT512, enter stat512 (login) and yty-btxty (password)

Step 3: Change to the correct directory using the command: % cd directoryname

Step 4: Use command: % ls to view files

Step 5: To get a particular file: % get filename

Step 6: To exit: % bye
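If you repeat the same download often, the steps above can be sketched as a small shell function that prints the ftp commands in order. This is only a sketch under assumptions: the function name ftp_commands is made up for illustration, and it assumes your ftp client accepts the -n option (no automatic login) and the user command, which should then prompt you for the password. Clients differ, so try it by hand first.

```shell
# Print the ftp commands for one download session, one per line.
# Arguments: login name, remote directory, file name.
# (ftp_commands is a hypothetical helper, not a standard UNIX command.)
ftp_commands() {
    printf 'user %s\n' "$1"   # Step 2: log in (ftp asks for the password)
    printf 'cd %s\n' "$2"     # Step 3: change to the correct directory
    printf 'get %s\n' "$3"    # Step 5: get the file
    printf 'bye\n'            # Step 6: exit
}

# Hypothetical usage (not run here), with auto-login turned off by -n:
#   ftp_commands stat512 directoryname filename | ftp -n pitman.stat.virginia.edu
```

Running ftp_commands by itself only prints the four commands, so you can check them before piping them into ftp.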
 
 

Method 2 (Between UNIX and PC):

Step 1: Double Click on the WS_FTP95 LE icon.

Step 2: Click the arrow in the Profile Name window

Step 3: Scroll down until you find the account of interest

For example: Scroll down until you find STAT512

Step 4: Highlight the account of interest (STAT512).

(Note: you will see that the connection has already been defined with the current password in place.)

Step 5: Click on the OK button (this will establish your connection with the class account in UNIX).

(Note: all class directories (on the PC) are listed in the Local System window and the current UNIX account is listed in the Remote System window.)

To change directories within either window:
    1. scroll down to the directory of interest
    2. highlight the directory of interest
    3. click on the ChgDir button
To make a directory within either window:
    1. click on the MkDir button
    2. type in directory name
To view a file within either window:
    1. highlight the file you wish to view
    2. click on the View button
Step 6: To ftp a file (from PC to UNIX):
    1. highlight the file of interest (within the Local System)
    2. click on the right arrow button
Step 7: To ftp a file (from UNIX to PC):
    1. highlight the file of interest (within the Remote System)
    2. click on the left arrow button
Step 8: Close the program
 
 

Note: If you forget where your file is located, go to the Start icon at the bottom of the screen and select Find/Files or Folders. Type in the name of your file and it should bring up a list of the places where the file is located.
 
 

A Basic Introduction to SAS (in UNIX) at the Department of Statistics,
University of Virginia



Section 1: Creating a new SAS program

Within Exceed, open a text editor. This is done by clicking on the up arrow above the paper and pencil icon. Then click on Text Editor. This text editor will be where you write your SAS program.

1.1 Syntax Rules and Statement Ordering (from: Lefkowitz, Jerry M., Introduction to Statistical Computer Packages, Boston: Duxbury Press, 1985.)

  1. A SAS program consists of DATA steps and PROC steps. DATA steps create and/or modify SAS data sets. PROC steps analyze SAS data sets. PROC steps may also create other SAS data sets.
  2. DATA steps begin with a DATA statement, which names the data set. The data set name is optional, but it is strongly recommended that you always name the data set. Data set names should be up to eight characters in length, begin with a letter, use only letters, digits, or the underscore character, use no other special characters, and not have an underscore as both the first and last character of the name.
  3. PROC steps begin with a PROC statement, which identifies the PROCEDURE and SAS data set to be used. Specifying the data set name is optional, but it is strongly recommended that you always do it. (If you do not specify which data set to analyze on the PROC step, the data set analyzed is the last one created. If a previous PROC step created a SAS data set, perhaps without you realizing it, you may end up analyzing the wrong data set. In general, specifying the data set name requires very little extra work and guarantees that you will not analyze the wrong data set.)
  4. All SAS statements end with a semicolon. (However, data lines do not end with semicolons because they are not considered SAS statements. Note that a data line is not a SAS statement within a SAS step, but merely a line of data entered in a CARDS statement.) Extra text is not allowed within SAS statements.
  5. Items in a list (Ex: the variable names in an INPUT statement) are separated with blanks (not commas).
  6. You can begin SAS statements anywhere on the line. Statements can be continued on additional lines. Just remember that the semicolon goes at the end of the statement, not necessarily at the end of each line. Also, indenting can help make the program more readable, but is not required.
  7. Variable names can be up to eight characters long, and should start with a letter. Letters and digits can be used in variable names. No special characters can be used, nor can blanks be used.
  8. An underscore can be used as a letter (Ex: SAMPLE_1), but don’t use underscore as both the first and last character.
  9. A hyphen can be used to abbreviate a list of variables in some SAS statements. For example, X1-X4 is equivalent to X1 X2 X3 X4.
  10. SAS is not case sensitive. For example, PROC is equivalent to proc.
1.2 The DATA Step

In most cases, you will be importing a data set from the class UNIX account. For this reason, I will only describe the INFILE command, and not cover inputting the data directly into your program (i.e. using CARDS). Please refer to the handout Getting Data for ways to get SAS data.

For the following examples, assume that we have imported a data set that contains eight variables. These variables are:
 
Variable    Description
PULSE1      First pulse rate
PULSE2      Second pulse rate
RAN         1=ran in place, 2=did not run in place
SMOKES      1=smokes regularly, 2=does not smoke regularly
SEX         male, female (note: not numeric)
HEIGHT      Height in inches
WEIGHT      Weight in pounds
ACTIVITY    Level of activity (1=light, 2=moderate, 3=heavy)

Sample data line: 64 88 1 1 male 66 140 2 (Note no semicolon at the end)

OPTIONS LS=75;

DATA EXAMPLE;
 INFILE "/home/mjs5b/STAT512/OUR_DATA.SET";
 INPUT PULSE1 PULSE2 RAN SMOKES SEX $ HEIGHT WEIGHT ACTIVITY;
RUN;

On the first line, the statement OPTIONS LS=75; tells SAS to limit each line of output to 75 characters. This should be the first line of every program. The next line tells SAS that the name of our data set is EXAMPLE. The INFILE statement says that the data file, OUR_DATA.SET, is located in /home/mjs5b/STAT512. Note that text in quotes is case sensitive. For example, SAS would not find the data if I had used /HOME/MJS5B/STAT512/OUR_DATA.SET. In the INPUT statement, note the $ after SEX. This tells SAS that SEX is a character (alphanumeric) variable. The RUN statement tells SAS to execute all statements up to this point.
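Before running SAS, it can save some grief to check from the UNIX prompt that the data file really has eight blank-separated values on every line, as the INPUT statement above expects. The following is a minimal sketch using awk; the function name count_bad_lines is made up for illustration and is not a standard command.

```shell
# Print how many lines of a data file do NOT have exactly 8
# blank-separated fields (the number of variables in the INPUT statement).
# (count_bad_lines is a hypothetical helper name.)
count_bad_lines() {
    awk 'NF != 8 { bad++ } END { print bad + 0 }' "$1"
}

# Hypothetical usage (not run here):
#   count_bad_lines /home/mjs5b/STAT512/OUR_DATA.SET
```

A result of 0 means every line has eight fields. Note that blank lines are counted as bad lines, since they have no fields at all.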

Now, let’s assume you want to create a new variable. For example, let’s say that we want the average pulse rate.

OPTIONS LS=75;

DATA EXAMPLE;
 INFILE "/home/mjs5b/STAT512/OUR_DATA.SET";
 INPUT PULSE1 PULSE2 RAN SMOKES SEX $ HEIGHT WEIGHT ACTIVITY;
 AVGPULSE = (PULSE1 + PULSE2)/2;
RUN;

One can use almost any mathematical operator in defining a new variable. A list of these is located in the SAS manuals. Just remember that, in order to create a new variable, you put the assignment statement after the INPUT statement and end it with a semicolon.
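If you want to double-check the arithmetic outside of SAS, the same average can be computed at the UNIX prompt. This is just a sketch using awk, assuming the first two values on each data line are PULSE1 and PULSE2 as in the sample data line; the function name avg_pulse is made up for illustration.

```shell
# Print the average of the first two fields (PULSE1 and PULSE2)
# for each data line read from a file or from standard input.
# (avg_pulse is a hypothetical helper name.)
avg_pulse() {
    awk '{ print ($1 + $2) / 2 }' "$@"
}

# The sample data line from above: (64 + 88)/2
printf '64 88 1 1 male 66 140 2\n' | avg_pulse   # prints 76
```

The value printed should match AVGPULSE for the same observation in your SAS output.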

1.3 The PROC Step

The PROC steps immediately follow the DATA step. PROCs perform various functions and computations on SAS data sets. For STAT512 and STAT513, you will mostly be working with either the GLM or REG procedures.

1.3.1 Descriptive Procedures

PROC PRINT DATA=EXAMPLE;
 VAR AVGPULSE SMOKES SEX HEIGHT WEIGHT;
RUN;

This PROC simply prints a list of the data with the specified variables: AVGPULSE, SMOKES, SEX, HEIGHT, and WEIGHT. Note that there are no commas between the specified variables. If you want a list of all variables, you do not need to include the VAR statement.

PROC MEANS DATA=EXAMPLE N MEAN STD;
 VAR PULSE1 PULSE2 HEIGHT WEIGHT;
 CLASS SEX;
 TITLE 'Basic Descriptive Statistics';
RUN;

This PROC step produces the number of observations (N), the mean (MEAN), and the standard deviation (STD) for the specified variables (PULSE1, PULSE2, HEIGHT, WEIGHT) separately for each sex (CLASS SEX statement). The default statistics for the MEANS procedure are N, MEAN, STD, MAX, and MIN. Again, if you wanted descriptive statistics for all variables you would not use the VAR statement. Also, if you didn’t want descriptive statistics separately for each sex, delete the CLASS statement. The TITLE statement simply gives a title to the table of descriptive statistics in the output. Again, note that text in quotes is case sensitive.

1.3.2 The REG Procedure

Let’s say that we are interested in the relationship between height and weight. This is a classic example of simple linear regression. The REG procedure is useful in determining whether this relationship is significant and whether it is linear. It is also useful for estimating the coefficients.

PROC REG DATA=EXAMPLE;
 MODEL WEIGHT=HEIGHT;
 OUTPUT OUT=REG1 P=PREDY R=RESID;
RUN;

PROC PLOT DATA=REG1;
 PLOT WEIGHT*HEIGHT PREDY*HEIGHT='*' / OVERLAY;
 PLOT RESID*PREDY;
RUN;

The REG procedure performs a regression analysis on the data set EXAMPLE. The MODEL statement specifies that WEIGHT is the dependent variable and HEIGHT is the independent variable. In general, the MODEL statement is as follows: MODEL dependent variable = independent variable(s) / options;. The OUTPUT statement is used to create a new data set that now has the variables PREDY (predicted values of WEIGHT) and RESID (residuals). If you used the PRINT procedure on the data set REG1, it would be identical to the EXAMPLE data set except that now it would list both the residuals and the predicted values of WEIGHT.

The PLOT procedure is now using the new data set that we generated from our regression analysis, REG1. The first PLOT statement tells SAS to produce a plot of WEIGHT*HEIGHT and also a plot of PREDY*HEIGHT (predicted weight against height). This second plot is to be overlaid onto the first plot and its points are to be denoted by a ‘*’. The ‘*’ can be any letter, number, or symbol. A plot would look similar to the one shown in Figure 1. The second PLOT statement simply produces a residual plot: the residuals against the predicted values of weight.
 


Figure 1: Plot of weight vs. height with predicted value of weight vs. height denoted as ‘*’.

1.3.3 The GLM Procedure

The GLM procedure is quite similar to the REG procedure. In STAT512 and STAT513, either one can be used for the majority of analyses. Specific times where one should be used over the other will be addressed in class. Below is an example that uses the GLM procedure and the RANK procedure to produce a normal scores plot, which is a good diagnostic check of your data.

PROC GLM DATA=EXAMPLE;
 MODEL WEIGHT=HEIGHT;
 OUTPUT OUT=REG2 P=PREDY R=RESID;
RUN;

PROC RANK DATA=REG2 OUT=REG3 NORMAL=BLOM;
 RANKS NSCORE;
 VAR RESID;
RUN;

PROC PLOT DATA=REG3;
 PLOT RESID*NSCORE;
RUN;
 

Again, the model is the same as in the REG procedure. Next, you will notice the RANK procedure. This ranks the residuals in the data set produced by the GLM procedure (REG2). The NORMAL=BLOM option tells SAS to convert those ranks into normal scores, and the RANKS statement names the variable (NSCORE) that holds the normal scores for the variable listed in the VAR statement (RESID). The PLOT procedure then produces a normal scores plot of the residuals.
 

Section 2: Executing the SAS Program You Created
 

After you have created your SAS program, save the file. For bookkeeping purposes, it is a good idea to save the file as filename.sas. This lets you know within your file manager that this is your SAS program. Once you have the file saved, go to a terminal and make sure you are in the directory where the SAS program is located. Once you are in the correct directory, type the command % sas filename.sas

Once you do this, SAS creates two additional files. One is a .lst file which contains the output from your PROC steps. The other is a .log file. This file is a copy of the original program without the data listed. Any SAS error messages are listed here, along with information about the data set that was created.

If an error was made, simply go into the .log file and find the error message. Many times the error messages are hard to interpret and can be confusing. First go through your program and check for the syntax errors listed above. If the syntax is correct and the problem comes from a specific procedure, refer to the SAS manuals for an idea as to where the error might have come from. Basically, the best way to learn SAS is through trial and error. Use old programs as references for new programs and use the SAS manuals for further information.
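For a long .log file, you can pull out just the problem lines from the UNIX prompt instead of reading the whole file. This is a minimal sketch, assuming that error and warning messages in the log start with the words ERROR or WARNING at the beginning of a line; the function name sas_log_problems is made up for illustration.

```shell
# Print the lines of a SAS log that start with ERROR or WARNING,
# each with its line number in the log file.
# (sas_log_problems is a hypothetical helper name.)
sas_log_problems() {
    # "|| true" keeps the exit status at 0 when nothing is found
    grep -n -E '^(ERROR|WARNING)' "$1" || true
}

# Hypothetical usage (not run here), after % sas filename.sas:
#   sas_log_problems filename.log
```

If nothing is printed, no line of the log starts with ERROR or WARNING. It is still worth glancing at the notes in the log about the data set that was created.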

I recommend the following book: Applied Statistics and the SAS Programming Language by Ronald P. Cody and Jeffrey K. Smith. I think it is the best book out there for those just learning SAS and for a good introduction to some of the various procedures. The price is about $40 new (Bigwords.com). It’s a book that you can always use and for $40 it’s not a bad investment.
 
 

REFERENCES:

  1. Lefkowitz, Jerry M., Introduction to Statistical Computer Packages, Boston: Duxbury Press, 1985.
  2. Cody, Ronald P. and Smith, Jeffrey K., Applied Statistics and the SAS Programming Language, Upper Saddle River, New Jersey: Prentice Hall, 1997.

 
 

PRINTING TIPS

stat_l1 is the default printer for Unix print jobs. As a result, you don’t need to use -Pstat_l1 if you want your print job sent to stat_l1. If you want to send a print job to stat_l2, simply replace -Pstat_l1 with -Pstat_l2.

        1) % mpage -2l -L62 -Pstat_l1 filename.lst

                This puts 2 pages per sheet.

        2) % mpage -Pstat_l1 filename.lst

                This puts 4 pages per sheet. This is recommended for large amounts of output.

Do not send PDF files straight to the printer. This will jam up the printers and make my life miserable. In order to print these types of files do the following: % acroread filename.pdf

From Acrobat Reader you can print this file without messing up the printers.

To remove a print job from the queue:

Step 1: From the directory where the job was sent: % lpq -Pstat_l1 (Note: select the printer that you sent the job to.)

Step 2: From here, get your job number and find out the host that it was sent from.

Step 3: Log in to the host that the job was sent from and use the command:

% lprm -Pstat_l1 jobnumber (Again, note that you need to select the printer that the job was sent to.)

Final note: The above commands used -Pstat_l1 (the default printer). Replacing this with -Pstat_l2 merely changes the destination of the print job, and no harm is done. I (Matt Soukup) recommend that you use the default printer, stat_l1. If there are printer problems, it is much easier for me to correct them on the HP printer, which makes my job less complicated.
 
 


The Home Directory

The Home Directory is a service provided by ITC that is useful for backing up your computer files.

Features and Advantages of the Home Directory Service

1. You can save, store and access files (whether you are using a computer on-grounds or off-grounds) on a central file service. This eliminates the need for floppy disks as file storage, since files stored on the Home Directory Service are accessible from any computer connected to the UVa network.

2. The Home Directory is backed up daily. Files that you accidentally delete can usually be recovered.

3. Files are secure in case of hardware failure. If your personal files are saved to the Home Directory Service and the hard disk on your computer crashes, you will have copies of your work.

4. You can publish Web pages without the need to use a Unix account and you will have easy access to personal Web space through the Home Directory.

How to Get Set Up Using the Home Directory

Step 1: You need an account on blue.unix.virginia.edu. Go to the website http://www.itc.virginia.edu/unixsys/homedir/about.html to see if you have an account. If not, there are directions on how to obtain one.

Step 2: The Home Directory is mounted on all the stat machines. Therefore, log into one of the stat machines (e.g. Wishart, Galois, etc.).

Step 3: Once logged into a stat machine, create a symbolic link to the Home Directory using the following command: % ln -s /h1/m/mj/mjs5b home.dir

Note: /h1/m/mj/mjs5b is simply a path that leads to my account. For example, if your computing ID were sax4u you would use the command

% ln -s /h1/s/sa/sax4u home.dir, where home.dir is simply the name of the link to your Home Directory.
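The two example paths above suggest a pattern: /h1, then the first letter of the computing ID, then its first two letters, then the full ID. The following is a minimal sketch that builds the path that way. The pattern is only inferred from the two examples (mjs5b and sax4u), so check that the directory it prints actually exists before linking to it; the function name homedir_path is made up for illustration.

```shell
# Build the Home Directory path for a computing ID, assuming the
# pattern /h1/<first letter>/<first two letters>/<id> seen in the examples.
# (homedir_path is a hypothetical helper name.)
homedir_path() {
    id=$1
    first=$(printf '%.1s' "$id")       # first letter, e.g. s
    first_two=$(printf '%.2s' "$id")   # first two letters, e.g. sa
    printf '/h1/%s/%s/%s\n' "$first" "$first_two" "$id"
}

homedir_path sax4u   # prints /h1/s/sa/sax4u

# Hypothetical usage (not run here):
#   ln -s "$(homedir_path sax4u)" home.dir
```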

How to Save Files in Your Home Directory

Step 1: Let’s say you created a Microsoft Word document and you wanted to save it to your Home Directory. Simply save the file on the C: Drive of the PC.

Step 2: Using FTP in much the same way as described in the Getting Data section of this handout, ftp the file into your home directory in UNIX.

Step 3: Delete the file from the PC’s C: Drive by finding the file and single clicking on it. Go to the File menu and select Delete. Then click Yes. This is merely recommended if you don’t want others looking at your work.

It should be pointed out that what I’m describing as using the Home Directory Service is much like ftp’ing files into your UNIX account. This is because our PCs are set up in such a way that we can’t mount the Home Directory Service login that you would have access to at other computing labs. Still, this is an excellent way to back up files, and I would strongly suggest that anything you want saved be put in your home directory.
 

© Matt Soukup, 2000