This file contains usage information for the programs tpWY and one_sided_tpWY, developed by CBIL at the Center for Bioinformatics at the University of Pennsylvania (http://www.cbil.upenn.edu). The input file format for these two programs is also discussed below.

----------

tpWY:

This program computes t-statistics and p-values (both unadjusted and adjusted according to the Westfall and Young step-down algorithm) for two-sided alternative hypotheses, as described in the paper Dudoit S., Yang Y.H., Callow M.J., Speed T.P. (2000), UC Berkeley, Technical report #578.

The program takes up to 9 arguments.

usage: tpWY....

where

arg1 is mandatory and is the name of the input file.

arg2 is mandatory and is the name of the output file.

arg3 is mandatory and is the number of rows in the input file.

arg4 is mandatory and is the number of columns in the input file (INCLUDING the identifiers column).

arg5 is mandatory and is the number of columns corresponding to the first of the two groups being compared.

arg6 is mandatory and is a fixed float value used to represent missing values in the input file (see input file format below).

arg7 is mandatory and can be either 0 or 1. If set to 0, only the t-statistics are computed; if set to 1, the unadjusted and adjusted p-values are also computed. (The output is sorted by t-statistic.)

arg8 is mandatory when arg7 is set to 1 and can be either 0 or 1. If set to 1, all rearrangements of the data columns are considered when p-values are computed; if set to 0, arg9 is mandatory and should be an integer which gives the number of random rearrangements to be used in such computations. arg8 should be set to 0 and arg9 should be provided when the number of all possible rearrangements is very large.
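As a rough guide for choosing between arg8 = 1 and arg8 = 0, the following Python sketch counts the possible rearrangements for a given column layout. It assumes, which this README does not state, that a rearrangement is a distinct assignment of the data columns to the two groups, so that the count is a binomial coefficient. The function names and the limit of 100000 are hypothetical and are not part of tpWY.

```python
from math import comb


def count_rearrangements(total_columns, group1_columns):
    """Count the ways to choose which data columns form group 1.

    total_columns follows arg4, so it INCLUDES the identifier column.
    Assumption: one rearrangement = one distinct split of the data
    columns into the two groups (not confirmed by the README).
    """
    data_columns = total_columns - 1  # drop the identifier column
    return comb(data_columns, group1_columns)


def suggest_arg8(total_columns, group1_columns, limit=100_000):
    """Return 1 (use all rearrangements) or 0 (use a random sample).

    The limit is an arbitrary, hypothetical cut-off; pick one that
    matches the running time you can afford.
    """
    if count_rearrangements(total_columns, group1_columns) <= limit:
        return 1
    return 0
```

Under that assumption, an input file with 17 columns, of which 8 belong to the first group, gives C(16, 8) = 12870 rearrangements.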
Example 1: if the input file is in_file and the desired output file is out_file and if:

(i) there are 6000 rows in the input file,
(ii) there are 17 columns in the input file (including the identifier column), of which the first 8 belong to the first group,
(iii) missing values are represented by -100 (which therefore does not correspond to any actual value),
(iv) p-value calculations are desired using all possible column rearrangements,

then the usage is

    tpWY in_file out_file 6000 17 8 -100 1 1

----------

one_sided_tpWY:

This program is the analogue of tpWY for one-sided alternative hypotheses. The usage is the same as that for tpWY except that an extra argument (arg0) must be provided BEFORE the arguments described above.

arg0 should be either 1 or -1. For the alternative hypothesis (mean of group 2)>(mean of group 1), set arg0 to 1. For the alternative hypothesis (mean of group 2)<(mean of group 1), set arg0 to -1.

Example 2: For the same scenario as in Example 1 and to test the alternative hypothesis that (mean of group 2)<(mean of group 1), the usage is

    one_sided_tpWY -1 in_file out_file 6000 17 8 -100 1 1

----------

Input file format:

The input file should be a tab-delimited text file (any two adjacent columns should be separated by a tab), whose first column contains the identifiers (20 characters max, NO WHITE SPACES), e.g. image clone ids. Subsequent columns are the data columns.

If you wish to normalize and/or log your intensities, you should do so before feeding the input to the above programs (tpWY and one_sided_tpWY take the intensities as provided by the input file; they do not normalize or transform them in any way).

Put the columns for your group 1 (e.g. your "control" group) first, preceding all the columns for your group 2 (e.g. your "treatment" group).

There should be no header, no footer, and no empty lines at the beginning or end of the input file.
Missing values: the programs deal with missing values, but for ease of parsing and speed these should be represented by a fixed float (e.g. -100 or any other number which does not correspond to an actual value appearing in your data). Do not use labels like "na" or "*" or anything similar to denote missing values. Just choose any number which is not an actual value and put it in place of every missing value in your data. This way each row in your input file will contain the same number of values. The value used to represent missing values should be provided as an argument when the programs are run (see the usage described above).

----------

NOTE on tpWY and one_sided_tpWY:

If the t-statistic is undefined for one of the gene tags in the original set, the program exits with an explanatory message. The user can then remove the offending rows from the input file and run the program on the data for which the t-statistic is defined.

If the t-statistic is undefined for one of the gene tags in one of the permuted sets, the program discards that set, and at the end it outputs the actual number of permutations employed in the calculation of p-values.

----------
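Because the row and column counts must be passed on the command line, it can help to check an input file before running either program. The following Python sketch is one possible pre-flight check built only from the format rules stated in this README; it is not part of the tpWY distribution, and the function name check_input_file is hypothetical. It also treats empty lines anywhere in the file as errors, which is stricter than the rule stated above.

```python
import math


def check_input_file(path, missing_value):
    """Validate a tab-delimited input file.

    Returns (rows, columns, missing_count), where columns INCLUDES the
    identifier column, matching arg3 and arg4. Raises ValueError on the
    first line that breaks a format rule.
    """
    rows = 0
    columns = None
    missing_count = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                raise ValueError(f"line {line_number}: empty line")
            fields = line.split("\t")
            identifier, values = fields[0], fields[1:]
            # Identifiers: 1 to 20 characters, no white space.
            if (not identifier or len(identifier) > 20
                    or any(ch.isspace() for ch in identifier)):
                raise ValueError(f"line {line_number}: bad identifier")
            # Every row must have the same number of columns.
            if columns is None:
                columns = len(fields)
            elif len(fields) != columns:
                raise ValueError(f"line {line_number}: wrong column count")
            for text in values:
                try:
                    number = float(text)
                except ValueError:
                    # Labels such as "na" or "*" end up here.
                    raise ValueError(
                        f"line {line_number}: non-numeric value {text!r}"
                    ) from None
                if not math.isfinite(number):
                    raise ValueError(
                        f"line {line_number}: non-finite value {text!r}"
                    )
                if number == missing_value:
                    missing_count += 1
            rows += 1
    if rows == 0:
        raise ValueError("file contains no rows")
    return rows, columns, missing_count
```

The first two returned numbers correspond to arg3 and arg4. The third is the number of entries equal to the missing-value marker, which is a quick way to confirm that the marker was applied as intended.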