Design and Implementation of Domain-specific Business Information Search System in Electronic Commerce Environment(1)

By Pauline Parker,2014-08-06 14:35


Ruijun Xia, Qing Wang, Dingwei Wang, Lili Liu

    1. Institute of System Engineering, Information College, Northeastern University, Shenyang, 110004


    2. Modern Logistics Center, Shanghai University, Shanghai, 200072, China

Abstract: In the Electronic Commerce (EC) environment, the quality of business information directly affects the level of enterprise operations. This paper analyses the common methods of business information retrieval in the EC environment and designs a software system that automatically gathers business information from the Internet and lets an enterprise extract the business information it needs directly from a database. The system adopts a meta-search engine to extend the search range, and applies information retrieval, web mining and agent technology to analyze and filter the business information, improving the search quality.

Key Words: Electronic Commerce (EC); Business information; Information Retrieval (IR); Meta-Search Engine (MSE)


1 INTRODUCTION

In recent years, the application of Electronic Commerce (EC) has become more and more widespread. Enterprises need more and more business information, such as information on raw materials, products, suppliers and customers, and use it to support their decision-making. Whether an enterprise in the EC environment can obtain accurate, comprehensive and necessary business information in time therefore bears on the success or failure of its EC operations. The enterprise must go beyond the relatively narrow operating environment of the past and collect and use business information effectively.

In the EC environment, the main methods an enterprise can use to search for business information are as follows:

• Using a General-purpose Search Engine (GSE). This covers a wide range of business information, but the results contain too many irrelevant pages, so precision is low and the personalization requirements of the user cannot be met.
• Logging in to the web site of an individual enterprise. This yields accurate information, such as the types and prices of that enterprise's products, but the search range is very limited, so recall is low.
• Logging in to a large business portal website. Such a site contains a lot of product information, but not all enterprises publish their product information there. Compared with the business information available on the whole Internet, the amount of information on these sites is very limited and cannot meet the requirements of the enterprise.
• Using specialized search engines. These give good search results, but the number of results is limited and depends on the database of the site.

In view of the above problems, this paper designs a business information search system for the EC environment, which automatically gathers business information from the Internet and lets the enterprise extract the business information it needs directly from a database. The system adopts a meta-search engine that integrates several General-purpose Search Engines (GSE) to extend the search range and improve recall. It applies information retrieval, web mining and agent technology to analyze and filter business information and to extract customer, supplier and product information of potential value to the enterprise, which improves precision.

This work is supported by the National Natural Science Foundation under Grant 74105110 and by the Innovative Research Team Project of the National Natural Science Foundation under Grant 60821063.

2 DOMAIN-SPECIFIC BUSINESS INFORMATION SEARCH

Many scholars have studied domain-specific business search [2-6]. Paper [2] proposed an agent-based framework for a dynamic information retrieval process that manages business status intelligently and dynamically. Paper [3] presented a method for building a personalized domain-specific search engine. It adopted domain-based grading thesauruses and a Chinese segmentation algorithm with a disambiguation mechanism to ensure high accuracy, and relied on the retrospective, state-memory and linear nature of the segmentation algorithm to ensure the engine's efficiency. Paper [4] proposed a business search algorithm based on a Hopfield neural network, in which a set of extended query terms is generated automatically by the network from the query keywords the user inputs. Searching a general-purpose search engine with those extended query terms can extend the search range and improve search precision. Paper [5] proposed a Bayesian Network (BN) based business information retrieval model, in which the customized query requirement of the enterprise is expressed in terms of predefined illustrative documents related to the business domain. The similarities between the documents and the query are evaluated with the conditional probabilities among the nodes in the BN. Paper [6] proposed a method for building a domain-specific search engine based on a Meta-Search Engine (MSE) on the Internet. It selects keywords by the Odds Ratio (OR) method and weights them by the TF-IDF method. The domain query expression is derived by the Decision Tree (DT) method. Finally, the returned documents are ranked by the Extended Boolean Model. The method can effectively remedy the drawbacks of the KS method and performs better in terms of precision and recall.

Based on the study and analysis of this existing theoretical research, this paper designs and implements domain-specific business search software that adopts an MSE as its framework and applies the theory to a practical system in modular form.

3 DESIGN OF DOMAIN-SPECIFIC BUSINESS INFORMATION SEARCH SYSTEM

To help business people obtain information on commodities, suppliers and customers, and to provide a reference for further inquiry and commodity pricing, the system is designed to collect the business information required by the enterprise from the Internet automatically, according to the characteristics of business information.

3.1 Main functions of the system

• Meta-search engine function: The user enters several keywords belonging to the business field. The system searches several GSEs for business information, removes duplicate and invalid pages, parses the pages, and extracts their abstract or full text.
• System search function: The system gathers relevant information from the Internet regularly and automatically, according to the predetermined system search keywords and search times, and stores it in the database.
• User search function: The system searches the database according to the query statement entered by the user, and returns the retrieval results to the user as the required business information.
• User interaction function: The user can perform basic operations such as input, output and parameter setting, feed back evaluation information about the query results, modify the parameters of the retrieval model, and modify the query expression to adapt to changes in the network environment and in information requirements, so as to obtain more accurate and valuable business information.
• System management function: The user can initialize the necessary parameters, define the update strategy for business information, record users' visits to the system and information update times, and set user access rights.

3.2 Architecture based on MSE

This system adopts an MSE-based architecture, as shown in Fig 1. It is divided into 3 main modules: the meta-search and system search module, the user search module and the user interaction module. Each module includes various sub-modules.

Fig 1. Basic architecture of business information search system

This system uses Lucene [8] as its database to enhance the indexing and retrieval functions. Lucene is a Java-based full-text indexing toolkit that provides a number of API functions and a flexible, customizable data storage structure, and can easily be embedded into various applications to provide or enhance indexing and retrieval functions. Unlike other databases, Lucene stores information in the form of index files, and retrieval is faster than with other databases. In addition, it does not adopt a B-tree structure, which causes a large number of I/O operations when the index is updated. Instead, it creates a new small index file for each update and then merges these small index files into a large one, which improves indexing efficiency without affecting search efficiency.

This system builds its index and implements the user search functions primarily through the APIs provided by Lucene. Other information retrieval models (such as the Hopfield neural network based information retrieval model [4]) could also be added to the user search module to give the user more precise business information.

3.3 Design of the system module

Fig 2 shows the detailed functions of the system. To provide management functions, a system management module is added. The detailed functions of each module are as follows.

3.3.1 Meta-search and system search module

This module includes 3 sub-modules: the domain-specific expression sub-module, the search engine agent sub-module and the information extraction sub-module. The functions of each sub-module are as follows.

• Domain-specific expression sub-module: (a) Selecting domain-specific keywords: adopting the Odds Ratio (OR) method [6] to select domain-specific keywords from sample documents, and weighting the domain-specific keywords. (b) Generating the domain query expression: using the domain-specific keywords to construct the domain query expression, and modifying the expression according to the modification information fed back by users.
• Search engine agent sub-module: (a) Constructing query URLs: building a query URL from the system search keywords and the domain-specific expression, and submitting it simultaneously to several individual GSEs over HTTP. The purpose is to collect a large amount of domain-specific business information. (b) Analyzing web pages: fetching the search result pages over HTTP, analyzing the links in these pages, removing invalid and duplicate pages, and saving the remaining pages to the buffer.
• Information extraction sub-module: (a) Parsing pages: using HTML Parser to parse the pages, removing the HTML tags, and extracting the summary or full text. (b) Document preprocessing: removing punctuation, stop words, etc., and extracting important nouns and verbs as index terms. (c) Weighting the index terms: adopting the TF-IDF method to weight the index terms. (d) Similarity calculation: constructing the user query vector and the page vectors, using the cosine of the angle between two vectors to express the similarity between the user query and a page, and selecting the pages whose similarity is above a certain threshold as business information. (e) Page information extraction: further processing the selected pages and extracting important information such as URL, title, summary and update time. (f) Creating the index: using the IndexWriter class of Lucene to index the extracted page information so that it can be retrieved by the enterprise user. The index structure of the business information is shown in Table 1.

Table 1. Index structure of business information

FIELD               INDEX   TOKENIZED   STORE
Document ID         NO      NO          YES
Page URL            NO      NO          YES
Page Title          YES     YES         YES
Page Summary        YES     YES         YES
Page Update Time    YES     NO          YES

The Vector Space Model [8] with a query expansion function is applied in this module. Other information retrieval models could also be applied, such as the KS method based business information retrieval model [6] or the Bayesian network based business information retrieval model [5].

3.3.2 User search module

This module includes 2 sub-modules: the query statement processing sub-module and the information retrieval sub-module. The functions of each sub-module are as follows.

(1) Query statement processing sub-module: performing lexical analysis on the query statement entered by the user in order to extract keywords, then submitting the logical expression composed of these keywords to the query parser of Lucene.

(2) Information retrieval sub-module: (a) Query engine: calling the IndexSearcher class of Lucene to search the business information database according to the query expression, and building the result set from all records returned by the database. (b) Query result processing: filtering and ranking the query results with a chosen algorithm, and returning them to the enterprise user as business information.

3.3.3 User interaction module

This module includes 2 sub-modules: the user interface sub-module and the user feedback sub-module. The functions of each sub-module are as follows.

(1) User interface sub-module: (a) Submitting input information: users submit the system search keywords to the search engine agent sub-module, and submit the user query statement to the query statement processing sub-module. (b) Submitting feedback information: users evaluate the degree of similarity between the retrieval results and the query requirement, and submit the evaluation information to the user feedback sub-module. (c) Displaying result information: outputting the business information required by the enterprise user through the visual interface.

(2) User feedback sub-module: (a) Evaluation result analysis: analyzing the retrieval results according to the evaluation information and calculating the relevant parameters, or directly putting the retrieval results into the sample document database as business information samples. (b) Domain-specific expression modification: according to the analysis result, choosing how to modify the domain-specific expression, for example by modifying the weights of the domain-specific keywords or updating the domain-specific expression. (c) Search keywords modification: according to the analysis result, modifying the user search keywords.
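The two steps of the search engine agent sub-module in Section 3.3.1 (constructing query URLs, then removing invalid and duplicate links) can be illustrated with a minimal Java sketch. It is not the authors' implementation. The class name, the method names and the three engine URL templates are hypothetical, the page fetching itself is omitted, and "invalid" is assumed here to mean only "not a well-formed http(s) URL", whereas the paper may also have checked for dead links.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SearchEngineAgentSketch {

    // Hypothetical query templates, not the real endpoints of any search engine.
    static final List<String> ENGINE_TEMPLATES = Arrays.asList(
            "https://engine-a.example/search?q=%s",
            "https://engine-b.example/s?wd=%s",
            "https://engine-c.example/web?query=%s");

    // Build one query URL per engine from the search keywords plus domain-specific terms.
    static List<String> buildQueryUrls(List<String> keywords, List<String> domainTerms) {
        List<String> all = new ArrayList<>(keywords);
        all.addAll(domainTerms);
        String encoded;
        try {
            encoded = URLEncoder.encode(String.join(" ", all), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        List<String> urls = new ArrayList<>();
        for (String template : ENGINE_TEMPLATES) {
            urls.add(String.format(template, encoded));
        }
        return urls;
    }

    // Normalize a link so that trivially different spellings compare equal.
    // Returns null for links that are not well-formed http(s) URLs.
    static String normalize(String link) {
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return null;
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) {
                return null;
            }
            String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
            String path = uri.getRawPath() == null ? "" : uri.getRawPath();
            if (path.endsWith("/")) {
                path = path.substring(0, path.length() - 1);
            }
            String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
            // The fragment (#...) is dropped on purpose: it does not identify a different page.
            return scheme + "://" + host.toLowerCase(Locale.ROOT) + port + path + query;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Keep the first occurrence of every valid link, in the original order.
    static List<String> removeInvalidAndDuplicate(List<String> links) {
        Set<String> seen = new LinkedHashSet<>();
        for (String link : links) {
            String key = normalize(link);
            if (key != null) {
                seen.add(key);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrls(Arrays.asList("gasoline", "engine"),
                Arrays.asList("supplier")));
        System.out.println(removeInvalidAndDuplicate(Arrays.asList(
                "http://example.com/a", "HTTP://Example.com/a/", "javascript:void(0)")));
    }
}
```

Merging the result lists of several engines into one list before calling removeInvalidAndDuplicate gives the cross-engine duplicate removal that a meta-search engine needs. The insertion-ordered set keeps the rank order of the first engine that returned each page.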

Fig 2. Detailed function framework of business information search system
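Steps (c) and (d) of the information extraction sub-module (TF-IDF weighting, then cosine similarity against a threshold) can be sketched in Java as follows. This is an illustration under stated assumptions, not the paper's code. The paper does not say which TF-IDF variant it uses, so a smoothed form, tf * (ln((N + 1) / (df + 1)) + 1), is assumed. The class and method names are hypothetical, and tokenization and stop-word removal are taken as already done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class VectorSpaceFilterSketch {

    // Count in how many documents each term occurs.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Weight the terms of one document (or query) with the assumed smoothed TF-IDF.
    static Map<String, Double> tfIdf(List<String> terms, Map<String, Integer> docFreq, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : terms) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((numDocs + 1.0) / (df + 1.0)) + 1.0;
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    // Cosine of the angle between two sparse vectors; 0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the indices of the pages whose similarity to the query reaches the threshold.
    static List<Integer> selectRelevant(Map<String, Double> query,
                                        List<Map<String, Double>> pages,
                                        double threshold) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            if (cosine(query, pages.get(i)) >= threshold) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("engine", "gasoline", "engine"),
                Arrays.asList("hotel", "booking"));
        Map<String, Integer> df = documentFrequencies(pages);
        Map<String, Double> query = tfIdf(Arrays.asList("gasoline", "engine"), df, pages.size());
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> page : pages) {
            vectors.add(tfIdf(page, df, pages.size()));
        }
        System.out.println(selectRelevant(query, vectors, 0.5));
    }
}
```

The threshold trades precision against recall: a higher value keeps fewer, more relevant pages. The value 0.5 used in main is arbitrary and would need tuning on sample documents from the target domain.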

3.3.4 System management module

This module includes 4 sub-modules: the system initialization sub-module, the information updating sub-module, the log management sub-module and the user management sub-module. The functions of each sub-module are as follows.

(1) System initialization sub-module: setting the necessary parameters, clearing the data of the business information database, etc.

(2) Information updating sub-module: setting the update strategy, update time and update mode of the business information.

(3) Log management sub-module: recording information such as users' visits to the system and information update times.

(4) User management sub-module: managing basic information about the users, setting user access rights, etc.

4 IMPLEMENTATION OF DOMAIN-SPECIFIC BUSINESS INFORMATION SEARCH SYSTEM

To verify the design, a business information search system has been developed for the domain of auto parts. It integrates 3 well-known GSEs: Google, Baidu and Sogou. The system is written in Java, with JDK 1.6 providing the Java Virtual Machine (JVM) and Eclipse 3.2 as the development platform.

4.1 Class design of main module of the system

The program is divided into 4 packages: PageParsing, SystemSearch, UserSearch and MainInterface. The PageParsing and SystemSearch packages implement the meta-search and system search functions, the UserSearch package implements the user search function, and the MainInterface package implements the user interaction and system management functions. The relationships between the packages are described with a UML package diagram, as shown in Fig 3.

Fig 3. Package diagram of the domain-specific business search system

The core module of the system is the meta-search and system search module, which is taken as an example to describe how the system is implemented. The relationships of the classes in the PageParsing package and the SystemSearch package are described with UML class diagrams, as shown in Fig 4 and Fig 5 respectively. The classes in PageParsing are responsible for parsing pages, while the classes in the SystemSearch package call the WebParserWrapper class in the PageParsing package to implement the other functions of this module.

Fig 4. Class diagram of the PageParsing package

Fig 5. Class diagram of the SystemSearch package

4.2 Operation effect of the system

Fig 6 shows the interface through which the system collects business information from the Internet. The user can choose a GSE and set the search keywords. For example, choosing Baidu and setting 汽油发动机 (gasoline engine) as the keywords gives the search results shown in Fig 6.

Fig 6. Interface of system search

The system also provides an automatic search function, which works as follows:

(1) The user sets several search keywords and a search time (such as off-hours) on the automatic search interface, and saves them to the database.
(2) According to the search time, the system periodically reads the search keywords from the database and submits them to each of the three GSEs.
(3) The system analyzes and filters the search results and automatically saves the domain-specific business information to the database.

Fig 7 shows the interface through which the system retrieves business information from the database. The user can enter several keywords at once for a Boolean query, filter the search results by the update time of the pages, and rank the search results by relevance or update time.

Fig 7. Interface of user search

This system is a preliminary experimental system. Although the interface is relatively simple, all the module functions of the business information search system have been implemented, which is sufficient to verify the effectiveness of the system design.

5 CONCLUSION

To address the business information search problem in Electronic Commerce, this paper designed a software system that automatically gathers business information from the Internet and lets the enterprise conveniently extract the information it needs from a database at any time, and implemented it with Java development tools. The system adopts a meta-search engine to extend the search range, and applies information retrieval, web mining and agent technology to analyze and filter the business information, improving the search quality. The system can be embedded into an enterprise's existing management information system. By continually collecting business information from the Internet and building up a large business information database, it could provide comprehensive and accurate information support for enterprise decision-making in the electronic commerce environment.

REFERENCES

[1] Wang Qing, Wang Zheng, Wang Dingwei, Application of Web Mining in Business, Computer Engineering, Vol. 34, No. 11, 197-199, 2008.
[2] Hua Hu, Bin Xu, An Agent-based Framework for Intelligent and Dynamic Business Information Retrieval, Workshop on Intelligent Information Technology Application.
[3] Lei Zhang, Yong Peng, Xiangwu Meng, Jie Guo, Personalized Domain-specific Search Engine, Industrial Informatics, 1308-1313, 2008.
[4] Zheng Wang, Qing Wang, Dingwei Wang, Searching Business Information with Hopfield Neural Network in Electronic Commerce Environment, International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2007), 552-554, 2007.
[5] Zheng Wang, Qing Wang, Ding-Wei Wang, Bayesian network based business information retrieval model, Knowledge and Information Systems, DOI 10.1007/s10115-008-0151-5, 2008.
[6] Zheng Wang, Qing Wang, DingWei Wang, Application of Domain-Specific Search Method in Meta-Search Engine on Internet, IMACS Multi-conference on Computational Engineering in Systems Applications, 2078-2085, 2006.
[7] Otis Gospodnetic, Erik Hatcher, Lucene in Action, O'Reilly & Associates Inc, 2005.
[8] Ricardo B Y, Berthier R N, Modern Information Retrieval, China Machine Press, Beijing, China, 2004.
