A General Model for Sequential Pattern Mining with a Progressive Database
Although there have been many recent studies on mining sequential patterns in a static database and in a database with increasing data, these works generally do not fully explore the effect of deleting old data from the sequences in the database. When sequential patterns are generated, newly arriving patterns may not be identified as frequent sequential patterns because of the old data and sequences that remain. Even worse, obsolete sequential patterns that are no longer frequent may stay in the reported results. In practice, users are usually more interested in recent data than in old data. To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database, in which the data may be static, inserted, or deleted. In addition, we present a progressive algorithm, Pisa, which stands for Progressive mining of Sequential patterns, to progressively discover sequential patterns in a defined time period of interest (POI). The POI is a sliding window that advances continuously as time goes by. Pisa utilizes a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of the proposed sequential pattern tree is bounded by the length of the POI, which effectively limits the memory required by Pisa to significantly less than that needed by the alternative method, Direct Appending (DirApp). Note that sequential pattern mining with a static database and with an incremental database are special cases of progressive sequential pattern mining. By changing the start time and end time of the POI, Pisa can easily handle a static database or an incremental database as well. The complexity of the proposed algorithms is analyzed. The experimental results show that Pisa not only outperforms the prior methods in execution time by orders of magnitude but also scales gracefully.
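To make the progressive-database model concrete, the following is a minimal Python sketch of mining within a POI. It is not the Pisa algorithm and does not build the progressive sequential tree; it is a naive re-enumeration that only illustrates how data outside the window stops contributing to support. The function name `mine_poi`, the `(sequence_id, timestamp, item)` event layout, the absolute `min_support` count, and the `max_len` cap are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def mine_poi(events, poi_start, poi_end, min_support, max_len=3):
    """Naively mine sequential patterns from events inside [poi_start, poi_end].

    events: iterable of (sequence_id, timestamp, item) tuples (hypothetical layout).
    Returns {pattern_tuple: support}, where support is the number of
    sequences that contain the pattern as an ordered subsequence.
    """
    # Keep only the elements that fall inside the POI, in time order.
    sequences = defaultdict(list)
    for sid, t, item in sorted(events, key=lambda e: e[1]):
        if poi_start <= t <= poi_end:
            sequences[sid].append(item)

    # Count each pattern at most once per sequence.
    support = defaultdict(int)
    for items in sequences.values():
        seen = set()
        for k in range(1, max_len + 1):
            # combinations() preserves input order, so each tuple is a subsequence.
            seen.update(combinations(items, k))
        for pattern in seen:
            support[pattern] += 1

    return {p: c for p, c in support.items() if c >= min_support}


events = [
    ("s1", 1, "A"), ("s1", 2, "B"), ("s1", 5, "C"),
    ("s2", 1, "A"), ("s2", 3, "B"), ("s2", 5, "C"),
    ("s3", 4, "B"), ("s3", 5, "C"),
]

# POI of length 3 sliding from [1, 3] to [3, 5]: patterns involving the
# old item "A" drop out once its timestamps leave the window.
early = mine_poi(events, 1, 3, min_support=2)
late = mine_poi(events, 3, 5, min_support=2)

# Static database as a special case: an unbounded POI covers all data.
static = mine_poi(events, float("-inf"), float("inf"), min_support=2)
```

Fixing `poi_start` while advancing `poi_end` would correspond to the incremental case described above. Re-enumerating every window from scratch is exactly the cost that an incremental structure such as the paper's tree is meant to avoid; the sketch only shows the semantics.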
Keywords
Progressive Sequential Pattern.