This file contains usage information for the programs tpWY and 
one_sided_tpWY, developed by CBIL at the Center for Bioinformatics at
the University of Pennsylvania (http://www.cbil.upenn.edu).

The input file format for these two programs is also discussed below.

----------

tpWY:

This program computes t-statistics and p-values (both unadjusted and 
adjusted according to the Westfall and Young step-down algorithm) for 
two-sided alternative hypotheses, as described in the paper 
Dudoit S., Yang Y.H., Callow M.J., Speed T.P. (2000), UC Berkeley, 
Technical report #578.

The program takes up to 9 arguments.

usage: tpWY arg1 arg2 .... arg9

where

arg1 is mandatory and is the name of the input file.
arg2 is mandatory and is the name of the output file.
arg3 is mandatory and is the number of rows in the input file.
arg4 is mandatory and is the number of columns in the input file 
(INCLUDING the identifiers column).
arg5 is mandatory and is the number of columns corresponding to the first
of the two groups being compared.
arg6 is mandatory and is a fixed float value used to represent missing 
values in the input file (see input file format below).
arg7 is mandatory and can be either 0 or 1. If set to 0, only the 
t-statistics are computed; if set to 1, the unadjusted and adjusted 
p-values are also computed. (The output is sorted by t-statistic.)
arg8 is mandatory when arg7 is set to 1 and can be either 0 or 1. If set
to 1, all rearrangements of the data columns are considered when p-values
are computed; if set to 0, arg9 is mandatory and should be an integer 
which gives the number of random rearrangements to be used in such 
computations. arg8 should be set to 0 and arg9 should be provided when the 
number of all possible rearrangements is very large.
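
To judge whether the number of all possible rearrangements is "very large",
it can help to count them first. The Python sketch below is not part of the
tpWY distribution; it assumes that a rearrangement means a distinct
assignment of the data columns to the two groups, so that the count is a
binomial coefficient. The function name count_rearrangements is made up for
this illustration.

```python
from math import comb


def count_rearrangements(total_columns: int, group1_columns: int) -> int:
    """Count the distinct splits of the data columns into two groups.

    total_columns includes the identifier column (as arg4 does), so the
    number of data columns is total_columns - 1.
    ASSUMPTION: one rearrangement = one choice of which data columns form
    group 1; the README does not spell out how tpWY counts them.
    """
    data_columns = total_columns - 1
    if not 0 < group1_columns < data_columns:
        raise ValueError("group 1 must hold some, but not all, data columns")
    return comb(data_columns, group1_columns)


# 17 columns in total (16 data columns), 8 of them in group 1.
print(count_rearrangements(17, 8))  # 12870
```

Under this assumption, a count in the tens of thousands is likely small
enough for arg8 = 1, while larger designs grow quickly and are candidates
for arg8 = 0 together with arg9.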

Example 1: if the input file is in_file and the desired output file is
out_file and if:
(i) there are 6000 rows in the input file,
(ii) there are 17 columns in the input file (including the identifier 
column), of which the first 8 belong to the first group,
(iii) missing values are represented by -100 (which therefore does not 
correspond to any actual value),
(iv) p-value calculations are desired using all possible column 
rearrangements,

then the usage is

tpWY in_file out_file 6000 17 8 -100 1 1

----------

one_sided_tpWY:

This program is the analogue of tpWY for one-sided alternative 
hypotheses. The usage is the same as that for tpWY except that an extra 
argument (arg0) must be provided BEFORE the arguments described above. 

arg0 should be either 1 or -1.
For the alternative hypothesis (mean of group 2)>(mean of group 1), set 
arg0 to 1.
For the alternative hypothesis (mean of group 2)<(mean of group 1), set 
arg0 to -1.

Example 2:
For the same scenario as in Example 1 and to test the alternative 
hypothesis that (mean of group 2)<(mean of group 1), the usage is

one_sided_tpWY -1 in_file out_file 6000 17 8 -100 1 1
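
The argument order for both programs can be sketched as a small Python
helper that assembles the command line. This is only an illustration: the
function build_command and its parameter names are made up here, and it only
builds the argument list without running anything.

```python
from typing import List, Optional


def build_command(in_file: str, out_file: str, rows: int, columns: int,
                  group1_columns: int, missing: float,
                  p_values: bool = True,
                  permutations: Optional[int] = None,
                  direction: Optional[int] = None) -> List[str]:
    """Assemble the argument list for tpWY or one_sided_tpWY.

    direction=None selects tpWY; 1 or -1 selects one_sided_tpWY and is
    passed as arg0. permutations=None asks for all rearrangements (arg8=1);
    an integer sets arg8=0 and supplies arg9.
    """
    if direction is None:
        command = ["tpWY"]
    elif direction in (1, -1):
        command = ["one_sided_tpWY", str(direction)]
    else:
        raise ValueError("direction (arg0) must be 1 or -1")
    command += [in_file, out_file, str(rows), str(columns),
                str(group1_columns), str(missing)]
    if not p_values:
        return command + ["0"]            # arg7=0: t-statistics only
    if permutations is None:
        return command + ["1", "1"]       # arg7=1, arg8=1
    return command + ["1", "0", str(permutations)]  # arg7=1, arg8=0, arg9


print(" ".join(build_command("in_file", "out_file", 6000, 17, 8, -100)))
print(" ".join(build_command("in_file", "out_file", 6000, 17, 8, -100,
                             direction=-1)))
```

The two printed lines reproduce the commands of Example 1 and Example 2.
The returned list could be handed to subprocess.run, assuming the
executables are on the search path.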

----------

Input file format:

The input file should be a tab delimited text file (any two adjacent 
columns should be separated by a tab), whose first column 
contains the identifiers (20 characters max, NO WHITE SPACES), e.g. image 
clone ids. 
Subsequent columns are the data columns. If you wish to normalize and/or
log your intensities, you should do so before feeding the input to the 
above programs (tpWY and one_sided_tpWY take the intensities as 
provided by the input file; they do not normalize or transform them in 
any way).
Put the columns for your group 1 (e.g. your "control" group) first, 
preceding all the columns for your group 2 (e.g. your "treatment" group).

There should be no header, no footer, and no empty line at the beginning 
or end of the input file. 

Missing values: the programs deal with missing values, but for ease of 
parsing and speed these should be represented by a fixed float (e.g. -100
or any other number which does not correspond to an actual value 
appearing in your data). Do not use labels like "na" or "*" or the like
to denote missing values. Just choose any number which is not an
actual value and put this in place of any missing value in your data. 
This way each row in your input file will contain the same number of 
values. The value used to represent missing values should be provided as
an argument when the programs are run (see the usage described above).
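
The format rules above can be sketched as a Python writer that produces a
conforming file and reports the row and column counts needed for arg3 and
arg4. The function write_input, the use of None for a missing value and the
sample identifiers are assumptions made for this illustration only.

```python
import io
from typing import Iterable, Optional, Sequence, TextIO, Tuple

Row = Tuple[str, Sequence[Optional[float]], Sequence[Optional[float]]]


def write_input(rows: Iterable[Row], handle: TextIO,
                missing: float = -100) -> Tuple[int, int]:
    """Write (identifier, group 1 values, group 2 values) rows, tab delimited.

    None stands for a missing value and is written as `missing`.
    Returns (rows, columns), with columns counting the identifier column.
    """
    n_rows, n_columns = 0, None
    for identifier, group1, group2 in rows:
        if (not identifier or len(identifier) > 20
                or any(c.isspace() for c in identifier)):
            raise ValueError(f"bad identifier: {identifier!r}")
        values = list(group1) + list(group2)   # group 1 columns come first
        if any(v is not None and v == missing for v in values):
            raise ValueError("missing-value marker collides with real data")
        fields = [identifier] + [str(missing if v is None else v)
                                 for v in values]
        if n_columns is None:
            n_columns = len(fields)
        elif len(fields) != n_columns:
            raise ValueError("every row must have the same number of values")
        handle.write("\t".join(fields) + "\n")  # no header, no blank lines
        n_rows += 1
    return n_rows, n_columns or 0


buffer = io.StringIO()
shape = write_input([("clone_1", [1.2, None], [0.7, 0.9]),
                     ("clone_2", [2.0, 2.1], [None, 1.8])], buffer)
print(shape)  # (2, 5)
```

With a real file, the handle would come from open(path, "w"), and the
returned pair gives the values to pass as arg3 and arg4.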

----------

NOTE on tpWY and one_sided_tpWY: If the t-statistic is undefined for one 
of the gene tags in the original set, the program exits with an explanatory 
message, so that the user can remove the problem rows from the input file and 
rerun the program on the data for which the t-statistic is defined.
If the t-statistic is undefined for one of the gene tags in one of the permuted 
sets, the program discards that set, and at the end it outputs the actual number 
of permutations employed in the calculation of p-values.
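
Rows that are likely to trigger this exit can be screened out before running
the programs. The Python sketch below is a guess at the condition, not the
programs' actual check: it ASSUMES a two-sample t-statistic with separate
group variances, which is undefined when a group has fewer than two
non-missing values or when both group variances are zero. The function name
t_statistic_defined is made up for this illustration.

```python
from statistics import variance
from typing import Sequence


def t_statistic_defined(values: Sequence[float], group1_columns: int,
                        missing: float) -> bool:
    """Return True if a two-sample t-statistic can be computed for one row.

    values holds the data columns of the row (identifier excluded), with
    the group 1 columns first.
    ASSUMPTION: "undefined" means too few non-missing values in a group or
    a zero denominator; tpWY itself may apply a different test.
    """
    group1 = [v for v in values[:group1_columns] if v != missing]
    group2 = [v for v in values[group1_columns:] if v != missing]
    if len(group1) < 2 or len(group2) < 2:
        return False  # a sample variance needs at least two values
    denominator = (variance(group1) / len(group1)
                   + variance(group2) / len(group2))
    return denominator > 0


print(t_statistic_defined([1.0, 2.0, 3.0, 4.0], 2, -100.0))     # True
print(t_statistic_defined([1.0, -100.0, 3.0, 4.0], 2, -100.0))  # False
print(t_statistic_defined([1.0, 1.0, 2.0, 2.0], 2, -100.0))     # False
```

Rows for which the function returns False can be dropped from the input
file, with arg3 reduced accordingly.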

----------