www.pudn.com > WAPtree.rar > WAPtree.cpp
/*
This is the WAP algorithm program based on the description in:
Jian Pei, Jiawei Han, Behzad Mortazavi-asl, and Hua Zhu, "Mining Access
Patterns Eciently from Web Logs", PAKDD 2000.
1.Development environment:
Although initial version is developed under the
hardware/software environment specified below, the program
runs on more powerful and faster multiprocessor UNIX environments as well.
(1)Hardware: Intel Celeron 400 PC, 64M Memory;
(2)Operting system: Windows 98;
(3)Development tool: Inprise(Borland) C++ Builder 6.0.
Note: The algorithm is developed under C++ Builder 6.0. However, it is possible
to compile the program under any standard C++ development tools and
run it.
Program is in WAPtree.cpp, compiled with Unix "g++ WAPtree.cpp" and
executed with a.out.
2. INPUT:
(1) test.data:
For simplifying input process of the program, we assume that all input data
has been preprocessed such that all events belongs to a same user id have been
gathered together, and formed as a sequence which is saved in a named "test.data".
The "test.data" file is composed of hundreds of thousands lines of sequence
where each line represents a web access sequence for each user.
Every line include UserID, length of sequence and the sequence which are
seperated by table spaces.
For example, given a line:
100 5 10 20 40 10 30
100 represents UserID, 5 means the length of sequence is 5, the sequence is
10,20,40,10,30.
(2) middle.data:
used to save the conditoinal middle pattern. During the mining process of
WAP tree, following the linkage, once the sum of support for a event is
found greater than minimum support, all its prefix condition patterns are
saved in the "middle.data" file for next round mining. The format of
"middle.data" is following:
each line include the length of sequence, the occurrence of the sequence,
and the events in the sequence.
For example, given a line in middle.data:
5 4 10 20 40 10 30
5 means the length of sequence is 5, 4 indicates the sequence occurred 4
times in the previous condition WAP tree. the sequence is 10,20,40,10,30.
(3) minimum support:
The program also needs to accept a value between 0 and 1 which is called
minimum support.
The minimum support is prompted to enter when the program starts.
3. OUTPUT: result_WAP.data
Once the program terminates, we can find the result patterns in a file named
"result_WAP.data".
It may contains lines of patterns. Each line represents a pattern.
4. METHODS:
(1)BuildTree: Build the WAP tree/conditional WAP tree
(2)MiningProcess: produce sequential pattern/conditional prefix sub-pattern from WAP tree/conditional WAP tree.
5. DATA STRUCTURE
three struct are used in this program:
(1) the node struct indicates a WAP node which contains the information:
a.the event name
b.the number of occurrence of the event
e. the linkage to next node same event name in WAP tree
d. a pointer to it's left son
e. a pointer to it's rights sibling
f. a pointer to it's parent
(2) a linkage struct
descibed in the program .
6. ADDTIONAL INFORMATION:
The run time is display on the screen with start time, end time and total
seconds for running the program.
*/
#include
#include