MongoDB Improves Big Data Analysis Performance on
Electronic Health Record System
Wei Xu¹, Zhonghua Zhou², Hong Zhou¹, Wu Zhang¹, Jiang Xie¹*
¹ School of Computer Engineering and Science, Shanghai University
² Shanghai University Hospital
Abstract. Electronic Health Record (EHR) systems have been widely used in settings such as hospitals, health welfare institutions and educational institutions. However, health information is usually complex and unstructured, which makes it hard for general relational databases to handle. We built a NoSQL-based EHR system named Shanghai University Electronic Health Record System (SHU-EHR) for health data management and analysis with MongoDB. The experiments demonstrate that the performance of SHU-EHR is far better than that of a SQL-based EHR system.
Keywords: Electronic Health Record, NoSQL, MongoDB
An electronic health record (EHR) is a systematic collection of electronic health information about an individual patient or a population [1]. With the development of information technology, EHR systems have become increasingly popular in hospitals, health welfare institutions and educational institutions. Researchers have studied EHR systems from many angles [2-4], and many EHR systems have been developed by commercial companies such as Cerner [5], McKesson [6], eClinicalWorks [7], and Allscripts and athenahealth [8]. However, most EHR systems are built on SQL databases, which struggle to process large volumes of data in a short time.
MongoDB [9-10], written in C++, is an open-source document database rather than a traditional relational database, and it is one of the leading NoSQL databases. Unlike SQL databases, MongoDB provides weaker consistency guarantees, which allows it to outperform SQL databases in big data management and analysis. MongoDB has already been applied in several different areas [11-13].
Shanghai University Electronic Health Record System (SHU-EHR) is a NoSQL-based EHR system built on MongoDB. It manages 13 different types of student health data, such as physical examination records and medical records. This paper introduces the architecture, the database components and the data synchronization method of SHU-EHR. Two experiments are conducted on SHU-EHR to compare the performance of the SQL database and the MongoDB database.
* Corresponding author
This section introduces the basic architecture of SHU-EHR. SHU-EHR adopts the ASP.NET MVC framework and Entity Framework to keep the system maintainable. Figure 1 shows the basic architecture.
Fig. 1. Architecture of SHU-EHR (view: Health Data View, Statistic Chart Engine; controller: Health Record Security Guard; model: Security Model, Data Model).
SHU-EHR mainly consists of three parts. The model part includes two components: the security model and the data model. The security model is one of the most important parts of SHU-EHR, because each person's health record is highly private. It lies at the core of SHU-EHR and keeps all health data safe. The data model maintains all 13 types of health records in a uniform format so that the data can be shared easily.
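The paper does not publish the schema of its data model. As a minimal sketch, assuming hypothetical field names (`student_id`, `record_type`, `recorded_at`, `items`) and a hypothetical `HealthRecord` class, the idea of keeping every record type in one uniform format can be illustrated by a single wrapper whose `to_document` method emits a plain dictionary that a document store such as MongoDB could hold. Only two of the 13 record types are named in the paper, so only those two appear here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict

# Hypothetical subset of record types; the paper names only these two of its 13.
KNOWN_RECORD_TYPES = {"physical_examination", "medical_record"}


@dataclass
class HealthRecord:
    """Hypothetical uniform wrapper shared by every health record type."""
    student_id: str
    record_type: str
    recorded_at: datetime
    items: Dict[str, Any] = field(default_factory=dict)  # type-specific fields

    def __post_init__(self) -> None:
        # Reject record types outside the known set.
        if self.record_type not in KNOWN_RECORD_TYPES:
            raise ValueError(f"unknown record type: {self.record_type}")

    def to_document(self) -> Dict[str, Any]:
        # Every type yields the same top-level keys, so consumers can share
        # records without knowing each type's internal layout.
        return {
            "student_id": self.student_id,
            "record_type": self.record_type,
            "recorded_at": self.recorded_at.isoformat(),
            "items": dict(self.items),
        }


rec = HealthRecord("S001", "physical_examination", datetime(2014, 9, 1),
                   {"height_cm": 172, "weight_kg": 63})
print(rec.to_document())
```

Keeping the type-specific fields inside one nested `items` map is a design choice of this sketch: the outer envelope stays fixed while each record type can carry its own fields.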
The middle layer of SHU-EHR is the controller, which handles all data logic and user requests. The health record security guard is the basic component of the controller: every request is screened by this component, and user access logs are recorded by the security core. The data logic part dispatches requests to different controller instances and processes query and computation requests. The top layer is the view. The health data view displays the details of all 13 kinds of health records, and the statistic chart engine presents computational results as line charts, bar charts, pie charts and polar charts. With the help of the chart engine, system data managers and department leaders can easily understand overall health conditions.
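The controller flow described above, in which every request first passes through the security guard that logs the access and the data logic then dispatches it to a query or computation handler, can be sketched as follows. This is an illustrative Python sketch under assumed names (`SecurityGuard`, `dispatch`, the `"query"` and `"compute"` request kinds, and the permission table); SHU-EHR itself is built on ASP.NET MVC and the paper does not describe its actual classes.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]


class SecurityGuard:
    """Hypothetical guard: checks permission and records every access attempt."""

    def __init__(self, allowed: Dict[str, set]):
        self.allowed = allowed  # user -> permitted request kinds
        self.access_log: List[Tuple[str, str, bool]] = []

    def check(self, req: Request) -> bool:
        ok = req["kind"] in self.allowed.get(req["user"], set())
        self.access_log.append((req["user"], req["kind"], ok))  # denials are logged too
        return ok


def handle_query(req: Request) -> str:
    return f"records for {req['target']}"


def handle_compute(req: Request) -> str:
    return f"statistics for {req['target']}"


# Hypothetical dispatch table: request kind -> controller handler.
HANDLERS: Dict[str, Callable[[Request], str]] = {
    "query": handle_query,
    "compute": handle_compute,
}


def dispatch(guard: SecurityGuard, req: Request) -> str:
    # The guard screens every request before any data logic runs.
    if not guard.check(req):
        return "denied"
    return HANDLERS[req["kind"]](req)


guard = SecurityGuard({"doctor": {"query", "compute"}, "student": {"query"}})
print(dispatch(guard, {"user": "student", "kind": "compute", "target": "S001"}))
print(dispatch(guard, {"user": "doctor", "kind": "query", "target": "S001"}))
print(guard.access_log)
```

Placing the guard in front of the dispatch table means no handler can be reached without leaving an entry in the access log, which mirrors the paper's statement that all requests are protected and logged.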
3 Mixture Database
Fig. 2. Mixture Database of SHU-EHR. SHU-EHR uses two different kinds of databases: the SQL database is used for user role identification, and the NoSQL database is used for data storage and data query.
SHU-EHR uses a mixture database. Figure 2 shows the database architecture of SHU-EHR. The security model and the data model described in Section 2 are mapped to different databases. The security model is mapped to the SQL database because SHU-EHR implements Microsoft ASP.NET Identity, which is stable, reliable and secure. The data model is mapped to the NoSQL database, which makes queries and calculations much faster.
3.1 SQL Based Component
The SQL database of SHU-EHR plays a key role in system security. It holds the user authority information, the system configuration, the system logs and the original health data. The user authority module implements the Microsoft ASP.NET Identity model, which includes profile support and OAuth integration and works with Open Web Interface for .NET (OWIN) [14]. With the help of this module, SHU-EHR offers many useful data interfaces and web APIs for different purposes.
The remaining security information consists of the system configuration and the running logs. The system configuration controls the whole system, and the running logs trace user operations. Both are stored in the SQL database and normally should not be accessible to ordinary users.
The original data are first stored in the SQL database, and SHU-EHR then transfers them to the NoSQL database. Figure 3 shows the SQL database component of SHU-EHR.
Fig. 3. SQL database component of SHU-EHR (User Authority; System Config and Logs; Imported Data).
3.2 Nosql Based Component
The NoSQL database of SHU-EHR is shown in Figure 4. This database stores two kinds of data: the users' health data and the statistic results. Most query requests concern user data and statistic results, which are high-dimensional and have complex relationships, so SQL databases find it difficult to respond in a short time when the data volume is large. MongoDB addresses this problem.
Fig. 4. NoSQL database component of SHU-EHR (Health Data; Statistic Results).
3.3 Multi-Database Synchronization
The health data in the NoSQL database are imported from the SQL database. SHU-EHR has two different interfaces for data synchronization. The synchronous interface synchronizes user health information as soon as new data are inserted into the SQL database. If loading through the synchronous interface fails, the unloaded data are handled by the asynchronous interface, which reloads them when the system is idle. All data are defined as SHU-EHR health record objects and transferred in this uniform format between the SQL and NoSQL databases. At the same time, the synchronous interface stores the user access logs and data transfer logs in the NoSQL database. Figure 5 shows the two interfaces.
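The two-interface scheme above can be illustrated with a small in-memory sketch: the synchronous path tries to copy each newly inserted record into the NoSQL store at once, and any record whose load fails is queued for the asynchronous path, which retries when the system is idle. The `Synchronizer` class, the dictionary standing in for MongoDB, and the simulated transient failure are all hypothetical; the paper does not describe how failures are detected or retried.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Synchronizer:
    """Hypothetical sync/async loader from the SQL side into a NoSQL store."""

    def __init__(self, load: Callable[[Dict], None]):
        self.load = load                     # writes one record to the NoSQL store
        self.pending: Deque[Dict] = deque()  # records the sync path failed to load
        self.transfer_log: List[str] = []

    def on_insert(self, record: Dict) -> None:
        """Synchronous interface: called right after a SQL insert."""
        try:
            self.load(record)
            self.transfer_log.append(f"sync ok {record['id']}")
        except Exception:  # any load failure hands the record to the async path
            self.pending.append(record)
            self.transfer_log.append(f"sync failed {record['id']}")

    def on_idle(self) -> None:
        """Asynchronous interface: retry unloaded records when the system is free."""
        for _ in range(len(self.pending)):
            record = self.pending.popleft()
            try:
                self.load(record)
                self.transfer_log.append(f"async ok {record['id']}")
            except Exception:
                self.pending.append(record)  # keep it for the next idle period


nosql_store: Dict[int, Dict] = {}
fail_once = {2}  # simulate one transient failure for record 2


def flaky_load(record: Dict) -> None:
    if record["id"] in fail_once:
        fail_once.discard(record["id"])
        raise ConnectionError("NoSQL store unavailable")
    nosql_store[record["id"]] = record


sync = Synchronizer(flaky_load)
for rid in (1, 2, 3):
    sync.on_insert({"id": rid, "record_type": "medical_record"})
sync.on_idle()
print(sorted(nosql_store))
print(sync.transfer_log)
```

Record 2 fails on the synchronous path, waits in the pending queue, and is loaded on the next idle pass, so all three records end up in the store.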
Fig. 5. Data synchronization of SHU-EHR.
In this section we conduct two experiments on the same health data to compare the query performance and statistic performance of the SQL database and the NoSQL database. The SQL database is Microsoft SQL Server v11.00.2100 and the NoSQL database is MongoDB v2.4.9. Both experiments were conducted on the same computer with 4 GB of memory, an Intel Core i3 3.4 GHz dual-core processor and the Windows 8.1 operating system.
4.1 Query Performance
Table 1 displays the query performance of the two databases. Each query retrieves the top 10 records from the full data set. As the number of records increases from 5K to 500K, the SQL Server query time grows by more than 100 times (from 179 ms to 20148 ms), while the MongoDB query time grows by less than 2 times (from 4 ms to 7 ms). At 1000K records no SQL Server time is reported (N/A), whereas MongoDB still answers in 8 ms. Figure 6 plots the same comparison. Because the two query times differ by several orders of magnitude and the MongoDB query time is close to zero on a linear scale, the y-axis of this figure is transformed by Equation (1).
$$y = 10\log(\text{Query Time}) \qquad (1)$$
Table 1. Data Query Performance between SQL and Nosql Databases.
Records   SQL time (ms)   NoSQL time (ms)
5K 179.3558 4.0036
10K 306.2019 4.0027
20K 633.4176 6.0043
50K 1255.828 5.0045
100K 3026.469 5.0036
200K 6065.009 6.0046
500K 20148.33 7.0043
1000K N/A 8.0056
Fig. 6. Query performance between SQL and Nosql Databases.
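To make Equation (1) concrete, the sketch below applies it to the measurements in Table 1 and also computes the growth factors quoted in the text. It assumes a base-10 logarithm, which the paper does not state; the row where SQL Server reported N/A is shown as N/A for the SQL column.

```python
import math

# (records, SQL Server ms, MongoDB ms) from Table 1; None marks N/A.
TABLE1 = [
    (5_000, 179.3558, 4.0036),
    (10_000, 306.2019, 4.0027),
    (20_000, 633.4176, 6.0043),
    (50_000, 1255.828, 5.0045),
    (100_000, 3026.469, 5.0036),
    (200_000, 6065.009, 6.0046),
    (500_000, 20148.33, 7.0043),
    (1_000_000, None, 8.0056),
]


def scaled(ms: float) -> float:
    """Equation (1), assuming a base-10 logarithm: y = 10 * log10(time in ms)."""
    return 10 * math.log10(ms)


for n, sql_ms, mongo_ms in TABLE1:
    sql_y = f"{scaled(sql_ms):5.1f}" if sql_ms is not None else "  N/A"
    print(f"{n:>9,}  SQL y={sql_y}  MongoDB y={scaled(mongo_ms):5.1f}")

# Growth from 5K to 500K records, the largest size with both measurements.
print("SQL Server growth:", round(TABLE1[6][1] / TABLE1[0][1], 1))
print("MongoDB growth:", round(TABLE1[6][2] / TABLE1[0][2], 2))
```

On this scale a tenfold difference in query time becomes a fixed gap of 10 units on the y-axis, which is why both curves fit on one plot.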
4.2 Statistic Performance
Table 2 shows the calculation time of the two databases. In this experiment we count the records in each of 10 different groups. When the data set is small, the two databases perform comparably (269 ms for SQL Server versus 301 ms for MongoDB at 5K records). However, as the number of records increases, the computing time of the SQL database grows much faster than that of MongoDB: at 500K records SQL Server needs about 15.2 s, while MongoDB needs about 4.3 s. Figure 7 displays the results of the statistic performance experiment for SQL Server and MongoDB.
Table 2. Calculation Performance between SQL and Nosql Databases.
Records   SQL time (ms)   NoSQL time (ms)
5K 269 300.7302
10K 341.0455 154.1017
20K 805.5341 296.1966
50K 1325.8772 490.3232
100K 2570.7013 989.6535
200K 8056.3307 1935.2803
500K 15198.039 4342.87
1000K N/A 7919.1565
Fig. 7. Calculation performance between SQL and Nosql Databases.
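The statistic in Section 4.2 is a count of records per group. The paper does not give its query, so the following is an assumed reconstruction: in MongoDB such a count is normally expressed as an aggregation pipeline whose `$group` stage sums 1 per document, the counterpart of SQL's `GROUP BY ... COUNT(*)`. Because running the pipeline needs a live server, the sketch only builds the pipeline and checks an equivalent plain-Python `Counter` over synthetic documents with a hypothetical `group` field; with a PyMongo connection the same pipeline would be passed to `collection.aggregate(pipeline)`.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical grouping field; the paper does not name the field it grouped on.
GROUP_FIELD = "group"

# MongoDB aggregation pipeline: one output document per group with its count.
pipeline: List[Dict] = [
    {"$group": {"_id": f"${GROUP_FIELD}", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]


def count_by_group(docs: List[Dict]) -> Dict[str, int]:
    """In-memory equivalent of the pipeline above, sorted by group key."""
    return dict(sorted(Counter(d[GROUP_FIELD] for d in docs).items()))


# 1,000 synthetic documents spread evenly over 10 groups (g0..g9).
docs = [{"_id": i, GROUP_FIELD: f"g{i % 10}"} for i in range(1000)]
counts = count_by_group(docs)
print(len(counts), counts["g0"])
```

Pushing the grouping into the database with `$group` avoids transferring every record to the application, which is one reason the aggregation scales with the data size as it does in Table 2.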
Using MongoDB in SHU-EHR greatly improves both the query performance and the statistic performance of the system. Thanks to MongoDB, SHU-EHR offers many APIs for big data analysis built on MongoDB's data analysis methods, so investigators who are not familiar with programming can easily use SHU-EHR for big health data analysis.
Acknowledgement
This research is partially supported by the Specialized Research Fund for the Doctoral Program of Higher Education [SRFDP 20113108120022], the Key Project of Science and Technology Commission of Shanghai Municipality [No. 11510500300], and the Major Research Plan of NSFC [No. 91330116].
References
1. Gunter T D, Terry N P. The emergence of national electronic health record architectures
in the United States and Australia: models, costs, and questions [J]. Journal of Medical
Internet Research, (2005).
2. Lowry S Z, Quinn M T, Ramaiah M, et al. Technical evaluation, testing and validation of
the usability of electronic health records [J]. National Institute of Standards and
Technology.
3. Baron R J, Fabens E L, Schiffman M, et al. Electronic health records: just around the
corner? Or over the cliff? [J]. Annals of Internal Medicine, (2005).
4. Tang P C, Ash J S, Bates D W, et al. Personal health records: definitions, benefits, and
strategies for overcoming barriers to adoption [J]. Journal of the American Medical
Informatics Association, (2006).
5. Cerner, http://www.cerner.com/
6. McKesson, http://www.mckesson.com/
7. eClinicalWorks, http://www.eclinicalworks.com/
8. Allscripts, athenahealth, https://www.allscripts.com/international1.html
9. MongoDB, https://www.mongodb.org/
10. C.O. Truica, A. Boicea, I. Trifan, CRUD Operations in MongoDB, Adv Intel Sys Res, 41
11. D.I. Cogean, M. Fotache, V. Greavu-Serban, Nosql In Higher Education. A Case Study,
Int Conf Inform Econ, (2013).
12. J.H. Yang, W.Y. Ping, L. Liu, Q.P. Hu, Memcache and MongoDB based GIS Web
Service, Second International Conference on Cloud And Green Computing / Second
International Conference on Social Computing And Its Applications (Cgc/Sca 2012),
13. D. Dykstra, Comparison of the Frontier Distributed Database Caching System to NoSQL
Databases, International Conference on Computing In High Energy And Nuclear Physics
2012 (Chep2012), (2012).
14. ASP.NET Identity, http://www.asp.net/identity