mongodb improves big data analysis performance on electric health record system v0401

By Patricia Robertson,2014-10-16 18:37

    MongoDB Improves Big Data Analysis Performance on Electric Health Record System

    Wei Xu (1), Zhonghua Zhou (2), Hong Zhou (1), Wu Zhang (1), Jiang Xie (1)*

     1 School of Computer Engineering and Science, Shanghai University
     2 Shanghai University Hospital


    Abstract. Electronic Health Record (EHR) systems are widely used in hospitals, health welfare institutions and educational institutions. However, health information is usually complex and unstructured, which makes it hard for general relational databases to handle. We build a NoSQL-based EHR system on MongoDB, named Shanghai University Electronic Health Record System (SHU-EHR), for health data management and analysis. The experiments demonstrate that the performance of SHU-EHR is far better than that of a SQL-based EHR system.


    Keywords: Electronic Health Record, NoSQL, MongoDB

    1 Introduction

    An electronic health record (EHR) is a systematic collection of electronic health information about an individual patient or population [1]. With the development of information technology, EHR systems have become increasingly popular in hospitals, health welfare institutions and educational institutions. Researchers have devoted considerable effort to EHR systems [2-4], and many EHR systems have been developed by commercial companies such as Cerner [5], McKesson [6], eClinicalWorks [7] and Allscripts altenahealth [8]. However, most EHR systems are built on SQL databases, which can hardly process big data in a short time.

    MongoDB [9-10], which is written in C++, is an open-source document database rather than a traditional relational database. It is the leading NoSQL database so far. Unlike SQL databases, MongoDB provides weaker consistency guarantees, a trade-off that gives it better performance than SQL databases in big data management and analysis. MongoDB has already been applied in several different areas [11-13].

    Shanghai University Electronic Health Record System (SHU-EHR) is a NoSQL-based EHR system built on MongoDB. It includes 13 different types of student health data, such as physical examination records and medical records. This paper introduces the architecture, database components and data synchronization method of SHU-EHR. Two experiments are conducted on SHU-EHR to compare the performance of the SQL database and the MongoDB database.

     * the corresponding author

    2 Architecture

    This section introduces the basic architecture of SHU-EHR. SHU-EHR adopts the .NET MVC framework and Entity Framework for maintainability. Figure 1 shows the basic architecture.

    Fig. 1. Architecture of SHU-EHR. From top to bottom: the view (statistic chart engine, health data view), the data logic and health record security guard, and the model (security model, data model).

    SHU-EHR mainly consists of three parts. The model part includes two main components, the security model and the data model. The security model is one of the most important parts of SHU-EHR because each person's health record is highly private. This model sits in the kernel of SHU-EHR and keeps all the health data safe. The data model maintains all 13 types of health records. It keeps the data in a uniform format so that the data can be easily shared.

    The middle part of SHU-EHR is the controller, which handles all the data logic and user requests. The health record security guard is the basic component of the controller. It protects every request, and the security core records the user access logs. The data logic part dispatches the requests to different controller instances and processes query and computational requests.

    On top is the view part. The health data view displays the details of the 13 kinds of health records. The statistic chart engine shows computational results in various charts, such as line charts, bar charts, pie charts and polar charts. With the help of the chart engine, system data managers and department leaders can easily understand the overall health conditions.

    3 Mixture Database

    SHU-EHR uses a mixture of two databases. Figure 2 shows the database architecture of SHU-EHR. The security model and the data model described in section 2 are mapped to different databases. The security model is mapped to the SQL database because SHU-EHR implements Microsoft ASP.NET Identity, which is stable, reliable and secure. The data model is mapped to the NoSQL database, which makes queries and calculations much faster.

    Fig. 2. Mixture Database of SHU-EHR. SHU-EHR uses two different kinds of databases. The SQL database is used for user role identification and the NoSQL database is used for data storage and data query.

    3.1 SQL Based Component

    The SQL database of SHU-EHR plays the key role in security. This database holds the user authority information, system configuration, system logs and the original health data. The user authority module applies the Microsoft ASP.NET Identity model, which includes profile support and OAuth integration and works with the Open Web Interface for .NET (OWIN) [14]. With the help of this module, SHU-EHR offers many useful data interfaces and web APIs for different occasions.

    The other security information consists of the system configuration and the running logs. The system configuration controls the whole system, and the running logs trace user operations. Both kinds of information are stored in the SQL database and should not normally be accessible to ordinary users.

    The original data are first stored in the SQL database, and SHU-EHR then transfers them to the NoSQL database. Figure 3 shows the SQL database component of SHU-EHR.

    Fig. 3. SQL database component of SHU-EHR. The security model maps to the SQL database, which holds the system configuration and logs, the user authority information and the imported data.

    3.2 NoSQL Based Component

    The NoSQL database of SHU-EHR is shown in figure 4. This database stores two kinds of data: the users’ health data and the statistical results. Most query requests concern user data and statistical results, which are high-dimensional and have complex relationships, so it is difficult for SQL databases to respond in a short time when the data volume is large. MongoDB can address this problem.

    Fig. 4. NoSQL database component of SHU-EHR. The data model maps to the NoSQL database, which holds the health data and the statistic results.
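
    As an illustration of why a document store suits records of differing shape, the sketch below wraps two record types in one uniform envelope. It is only a sketch: the class name `HealthRecord`, the helper `to_document` and every field name are hypothetical and are not taken from SHU-EHR.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class HealthRecord:
    """Hypothetical uniform envelope shared by every record type."""
    student_id: str
    record_type: str      # e.g. "physical_exam" or "medical_record"
    recorded_on: str      # ISO 8601 date string
    payload: dict = field(default_factory=dict)  # type-specific fields


def to_document(record):
    """Convert a record to a plain dict, the shape a document store accepts."""
    return asdict(record)


# Two record types with different payloads but the same top-level fields.
exam = HealthRecord("S001", "physical_exam", "2014-09-01",
                    {"height_cm": 172, "weight_kg": 65})
visit = HealthRecord("S001", "medical_record", "2014-09-15",
                     {"diagnosis": ["common cold"],
                      "prescriptions": [{"drug": "paracetamol", "days": 3}]})

exam_doc = to_document(exam)
visit_doc = to_document(visit)
print(sorted(exam_doc))
print(sorted(visit_doc))
```

    Both documents share the same top-level keys, while the nested payload is free to vary per record type, which a fixed relational schema would have to spread over several tables.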

    3.3 Multi-Database Synchronization

    The health data in the NoSQL database are imported from the SQL database. SHU-EHR has two interfaces for data synchronization. The synchronous interface synchronizes user health information as soon as new data is inserted into the SQL database. If a load through the synchronous interface fails, the unloaded data are handled by the asynchronous interface, which reloads them when the system is idle. All the data are defined as SHU-EHR health record objects and transferred in a uniform format between the SQL and NoSQL databases. At the same time, the user access logs and data transfer logs are stored in the NoSQL database by the synchronous interface. Figure 5 shows the two interfaces.

Fig. 5. Data synchronization of SHU-EHR.
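
    The interplay of the two interfaces can be sketched as follows. This is a minimal in-memory illustration, not the SHU-EHR implementation: the class `SyncBridge`, its method names and the `flaky_write` stand-in for the NoSQL writer are all hypothetical.

```python
from collections import deque


class SyncBridge:
    """Hypothetical stand-in for the two synchronization interfaces."""

    def __init__(self, write_to_nosql):
        self.write_to_nosql = write_to_nosql  # callable that may raise
        self.pending = deque()                # records whose load failed

    def on_sql_insert(self, record):
        """Synchronous interface: push a record right after the SQL insert."""
        try:
            self.write_to_nosql(record)
            return True
        except Exception:
            self.pending.append(record)       # hand over to the async path
            return False

    def reload_pending(self):
        """Asynchronous interface: retry failed records when the system is idle."""
        reloaded = 0
        for _ in range(len(self.pending)):
            record = self.pending.popleft()
            try:
                self.write_to_nosql(record)
                reloaded += 1
            except Exception:
                self.pending.append(record)   # keep it for the next idle pass
        return reloaded


# Usage: the target store is down for the first insert, then recovers.
store = []
state = {"up": False}


def flaky_write(record):
    if not state["up"]:
        raise ConnectionError("NoSQL store unavailable")
    store.append(record)


bridge = SyncBridge(flaky_write)
bridge.on_sql_insert({"student_id": 1, "record_type": "physical_exam"})
state["up"] = True
bridge.on_sql_insert({"student_id": 2, "record_type": "medical_record"})
bridge.reload_pending()
print(len(store), len(bridge.pending))
```

    Keeping failed records in a queue means the synchronous path never blocks on a retry. The cost is that records can reach the NoSQL database out of insertion order.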

    4 Experiments

    In this section we conduct two experiments on the same health data to compare the query performance and statistic performance of the SQL database and the NoSQL database. The SQL database is Microsoft SQL Server v11.00.2100 and the NoSQL database is MongoDB v2.4.9. Both experiments were conducted on the same computer, with 4 GB of memory, an Intel Core i3 3.4 GHz dual-core processor and the Windows 8.1 operating system.

    4.1 Query Performance

    Table 1 displays the query performance of the two databases. We searched for the top 10 records of the total data. As the number of records increases from 5,000 to 500,000, the SQL database query time increases by over 100 times (from 179 ms to 20148 ms), while the MongoDB query time less than doubles (from 4 ms to 7 ms). At 1,000,000 records, no SQL time is available (N/A in Table 1), while MongoDB takes about 8 ms. Figure 6 plots the same results. Because the gap between the two query times is so large, and the MongoDB query time is almost zero by comparison, the y-axis of this figure is transformed by equation 1.

    \[ y = 10 \log(\text{Query Time}) \tag{1} \]

    Table 1. Data Query Performance between SQL and NoSQL Databases.

    Data Number    SQL Time (ms)    NoSQL Time (ms)
    5K             179.3558         4.0036
    10K            306.2019         4.0027
    20K            633.4176         6.0043
    50K            1255.828         5.0045
    100K           3026.469         5.0036
    200K           6065.009         6.0046
    500K           20148.33         7.0043
    1000K          N/A              8.0056

    Fig. 6. Query performance between SQL and NoSQL Databases.
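
    The arithmetic behind Table 1 and the y-axis transform of equation 1 can be reproduced with a short script. It is a sketch under one assumption: the logarithm in equation 1 is taken to be base 10, which the text does not state. The function names are illustrative.

```python
import math

# Table 1 values in milliseconds; None marks the N/A entry.
ROWS = [
    ("5K", 179.3558, 4.0036),
    ("10K", 306.2019, 4.0027),
    ("20K", 633.4176, 6.0043),
    ("50K", 1255.828, 5.0045),
    ("100K", 3026.469, 5.0036),
    ("200K", 6065.009, 6.0046),
    ("500K", 20148.33, 7.0043),
    ("1000K", None, 8.0056),
]


def log_scale(ms):
    """Equation 1, assuming a base-10 logarithm: y = 10 * log10(ms)."""
    return 10 * math.log10(ms)


def slowdown(sql_ms, nosql_ms):
    """How many times slower the SQL query is; None when SQL has no value."""
    return None if sql_ms is None else sql_ms / nosql_ms


for label, sql_ms, nosql_ms in ROWS:
    ratio = slowdown(sql_ms, nosql_ms)
    sql_y = "N/A" if sql_ms is None else f"{log_scale(sql_ms):.1f}"
    ratio_text = "N/A" if ratio is None else f"{ratio:.0f}x"
    print(f"{label:>6}  sql_y={sql_y:>5}  nosql_y={log_scale(nosql_ms):.1f}  "
          f"slowdown={ratio_text}")
```

    On the transformed axis both series fall within the same few tens of units, so the MongoDB curve stays visible next to the SQL curve.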

    4.2 Statistic Performance

    Table 2 shows the calculation time of the two databases. In this experiment we count the number of records in each of 10 different groups. When the data set is small, the two databases perform comparably (269 ms for SQL versus about 301 ms for MongoDB at 5K records). However, as the number of records increases, the computing time of the SQL database grows much faster than that of the NoSQL database: at 500K records SQL takes about 15198 ms and MongoDB about 4343 ms, roughly 3.5 times less. Figure 7 plots the statistic performance of SQL and MongoDB.

    Table 2. Calculation Performance between SQL and NoSQL Databases.

    Data Number    SQL (ms)      NoSQL (ms)
    5K             269           300.7302
    10K            341.0455      154.1017
    20K            805.5341      296.1966
    50K            1325.8772     490.3232
    100K           2570.7013     989.6535
    200K           8056.3307     1935.2803
    500K           15198.039     4342.87
    1000K          N/A           7919.1565

    Fig. 7. Calculation performance between SQL and NoSQL Databases.
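
    A grouped record count of the kind measured in Table 2 can be sketched as below. The paper does not give its query, so this is an assumption-laden illustration: the field name `group` and the synthetic records are hypothetical, and the function is a pure-Python stand-in that runs without a database. The MongoDB aggregation pipeline that would express the same count is included only as a data literal.

```python
from collections import Counter

# Equivalent MongoDB aggregation pipeline, shown only as a data literal.
# The field name "group" is an assumption, not taken from the paper.
GROUP_COUNT_PIPELINE = [
    {"$group": {"_id": "$group", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]


def count_by_group(records, field="group"):
    """Pure-Python stand-in: return {group value: number of records}."""
    return dict(Counter(record[field] for record in records))


# Synthetic data: 5000 records spread evenly over 10 groups.
records = [{"student_id": i, "group": i % 10} for i in range(5000)]
counts = count_by_group(records)
print(len(counts), counts[0])
```

    With a running MongoDB instance and a driver such as PyMongo, the pipeline literal could be passed to a collection's `aggregate` method. That step is left out here so the example stays self-contained.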

    5 Conclusion

    Using MongoDB in SHU-EHR greatly improves both the query and the statistic performance of the system. Building on MongoDB's data analysis methods, SHU-EHR offers many APIs for big data analysis. Investigators who are not familiar with system coding can easily use SHU-EHR for big health data analysis.

    Acknowledgement

    This research is partially supported by the Specialized Research Fund for the Doctoral Program of Higher Education [SRFDP 20113108120022], the Key Project of Science and Technology Commission of Shanghai Municipality [No. 11510500300], and the Major Research Plan of NSFC [No. 91330116].


    References

    1. Gunter T D, Terry N P. The emergence of national electronic health record architectures in the United States and Australia: models, costs, and questions [J]. Journal of Medical Internet Research, (2005).
    2. Lowry S Z, Quinn M T, Ramaiah M, et al. Technical evaluation, testing and validation of the usability of electronic health records [J]. National Institute of Standards and Technology, (2012).
    3. Baron R J, Fabens E L, Schiffman M, et al. Electronic health records: just around the corner? Or over the cliff? [J]. Annals of Internal Medicine, (2005).
    4. Tang P C, Ash J S, Bates D W, et al. Personal health records: definitions, benefits, and strategies for overcoming barriers to adoption [J]. Journal of the American Medical Informatics Association, (2006).
    5. Cerner,
    6. McKesson,
    7. eClinicalWorks,
    8. Allscripts altenahealth,
    9. MongoDB,
    10. C.O. Truica, A. Boicea, I. Trifan, CRUD Operations in MongoDB, Adv Intel Sys Res, 41
    11. D.I. Cogean, M. Fotache, V. Greavu-Serban, Nosql In Higher Education. A Case Study, Int Conf Inform Econ, (2013).
    12. J.H. Yang, W.Y. Ping, L. Liu, Q.P. Hu, Memcache and MongoDB based GIS Web Service, Second International Conference on Cloud And Green Computing / Second International Conference on Social Computing And Its Applications (Cgc/Sca 2012),
    13. D. Dykstra, Comparison of the Frontier Distributed Database Caching System to NoSQL Databases, International Conference on Computing In High Energy And Nuclear Physics 2012 (Chep2012), (2012).
    14. ASP.NET Identity,
