Seasonality and software vulnerabilities in major database management systems

Popularity and market share are important indices for software users and vendors, since more popular systems tend to offer a better user experience and ecosystem. Periodic fluctuations in popularity and market share could be vital factors when we estimate the potential risk in target systems. Meanwhile, software vulnerabilities are discovered in major relational database management systems every now and then. Today, almost every organization depends on those database systems to store and retrieve all kinds of information, for reasons of security, effectiveness, etc. Organizations have to manage and evaluate the level of risk created by software vulnerabilities so that they can avoid potential losses before security defects damage their reputations. Here, we examine seasonal fluctuations from the viewpoint of software security risk in four major database systems, namely MySQL, MariaDB, Oracle Database and Microsoft SQL Server.


I. INTRODUCTION
GENERALLY speaking, database management systems are tools that provide storage where we can put and get any kind of data. Although it is well recognized that valuable data should be stored where others cannot access it, if we need to reach that information over the Internet ourselves, there is technically always a possibility that others can also reach the data when something goes wrong. While we store and retrieve information, database systems provide many handy features in terms of security, effectiveness, multiuser environments, etc. Database management systems such as MySQL, MariaDB, Oracle Database and Microsoft SQL Server are considered de facto standards in the database industry.
Meanwhile, a software security vulnerability is a weakness mistakenly introduced by developers that could be misused by malicious users to cause harm or loss in the security system [1]. Sometimes those security bugs are traded in black hat communities [2]. Software security vulnerabilities that have been publicly revealed but not fixed present security risks for users and their organizations. A single security risk could do considerable damage to stakeholders' reputations.
When software systems are developed, security vulnerabilities are frequently introduced by coding mistakes. Once those vulnerabilities exist, they might be found in various ways, such as through many kinds of software tests or through normal software usage. If we are lucky, some vulnerabilities may not be found until their host software has left the market. When software vulnerabilities are discovered by white hats or benign users, the security bugs are reported directly to the software vendors. The vendors should then create the corresponding patch and release it to users as soon as possible to minimize the impact. The authors of [3] show that delivering a patch usually takes about forty-five days.
On the other hand, when a black hat or malicious user discovers a security vulnerability, the probability that the bug will be exploited is high. Most of the time this is for economic reasons or for reputation in the black hat communities. For a target software system, the vulnerability discovery rate can be estimated to some degree by vulnerability discovery models when sufficient data sets are available [4]. Malicious users and black hats also frequently share information about the vulnerabilities they have found, which can create fertile ground for security breaches that put a large number of software users in danger. As a result, known vulnerabilities without proper patches create disorder in the related industry sectors. Fortunately, when a patch with no defects is available, we can expect the death of the vulnerability. Occasionally, however, vulnerability patches themselves introduce new vulnerabilities.
Despite their high reputations, the database management systems MySQL, MariaDB, Oracle Database and Microsoft SQL Server frequently introduce security vulnerabilities. Vulnerabilities in those systems are especially hazardous, and sometimes extremely risky, because they can give malicious users access to computing systems used by many database users. Extremely sensitive information might be leaked, or the system could be vandalized for no reason. Consequently, many concerns have been raised among the related IT industries and security researchers about very feasible security exploits. To control the risk posed by software vulnerabilities, the related industries need methods for expressing that risk in numbers so that they can keep their security risks under control. After all, "If You Can't Measure It, You Can't Improve It".
Here, we investigate four major relational database management systems, namely MySQL, MariaDB, Oracle Database and Microsoft SQL Server. In this paper, we first examine the seasonal fluctuations using graphs of Google Web search patterns. Then we present some future work on software vulnerability security from a quantitative point of view. Table I presents some of the characteristics of the four database systems. The information is from https://db-engines.com as of December 2020. Oracle Database is the most popular database management system, followed by Microsoft SQL Server and MySQL. Although MariaDB ranks twelfth, it behaves much like MySQL. MySQL was originally developed by the open source community, but it later came under Oracle's management. Although MySQL is still open to the public, developers in the open source community decided to prepare an alternative just in case. As a result, MariaDB was designed for high compatibility with MySQL. Oracle, SQL Server and MySQL occupy ranks one to three in market share; for completeness, we added MariaDB to the analysis in this paper.
Figure 1 shows the popularity of the presented database systems by region. The figure was generated from the Google Trends data sets used in Figure 2. Microsoft SQL Server is doing well in North America, the northern part of Europe, the northern part of Africa, South Africa and Oceania. Many of these are English-speaking countries, while the rest of the world tends to prefer MySQL. It is counterintuitive that Oracle hardly appears in the figure, since Oracle Database places number one in Table I. It is possible that Oracle has a relatively small number of users, but that this user group pays more in license fees than the users of any other database system. Oracle is quite notorious for its expensive license fees https://www.quora.com/Why-is-Oracle-licensing-so-expensive.
Although a great deal of research has been conducted on software security vulnerabilities, studies focusing on relational database management systems are not easy to find. Furthermore, many software vulnerability studies concentrate on qualitative methods and frequently discuss finding and preventing a particular security vulnerability. Only after the year 2000 did the datasets on software vulnerabilities become large enough to be examined by researchers. As a result, some researchers started to examine security vulnerabilities quantitatively for major software systems such as Web servers, browsers and operating systems [5], [6].
Some researchers liken quantitative and qualitative studies to science and art, respectively [?]. Methods used in qualitative research, just like art, can be very subjective with respect to the experts' opinions and might not be explained mathematically, while quantitative methods are considered science because they are based on clear numbers. Empirical analyses driven by actual data and statistical tests produce objective results for a given issue. Therefore, future work in this research area, which concerns software vulnerabilities, should rely more on quantitative methods as long as data sets are available. For a quantitative software security vulnerability analysis, the related data sets for the four relational database management systems can be collected from cvedetails.com, where we can find the software vulnerability information from NVD (nvd.nist.gov).
The rest of the paper is organized as follows. The following section investigates the seasonal fluctuations in the popularity of the four database systems according to the number of Web searches from Google Trends. Finally, we conclude the paper by presenting some future work related to vulnerability risk analysis for the major relational database management systems.

II. SEASONAL ANALYSIS ON THE WEB SEARCHES
It is hard to determine which database is more popular than the others without spending a budget on the related surveys. However, we can approximate their relative popularity by observing the number of Web searches. Here, we use Google Trends https://trends.google.com to gauge the popularity of the software systems. Figure 2 shows the Web search pattern for the four database management systems. It is clearly observed that the number of searches for all four databases has decreased since the beginning of the plot. MySQL, which is open source software, has been number one the whole time, followed by Microsoft SQL Server and Oracle Database. That is because open source software is free to use. As a result, its popularity is high in both academia and industry. Moreover, the database systems share the majority of their syntax; only a small portion of the syntax differs from one system to another. Hence, users who would like to learn SQL select MySQL as their first choice.
However, commercial companies need to solve problems related to their databases every now and then, and in that case they are willing to pay for the service. This could explain why commercial businesses prefer a more reliable and easy-to-use system to a free software system. In Table I, the first, second and third ranks are Oracle, SQL Server and MySQL, respectively. In Figure 2, however, the order by number of searches is the opposite. That is probably because small businesses and students tend to use MySQL while big companies tend to use commercial software systems, and the former group has a larger number of users than the latter. Now, when we check Figure 3, which shows only the past five years of the Web search trend, it clearly tells us that there is some kind of fluctuation in the graph. The time line in the figure extends from 2015.12.06 to 2020.11.29. The x-axis and y-axis have the same meaning as in the previous figure. The figure shows that around winter time the number of searches is lower than at other times of the year. Furthermore, although it is not as clear as in winter, the middle of the year also seems to tend to be inactive. Seasonality or periodicity in software systems is sometimes observed [5]. Those attributes could help a lot to characterize and mitigate the risk posed by user behavior. In fact, such attributes are well-known research subjects in fields such as the stock market [8] or birth defects [9].
In order to check whether there is a seasonal fluctuation in a given data set, researchers use the seasonal index. It measures how far the average value of a specific period of time lies above or below the expected value. The index is given by Equation (1), where $d$ is the grand average value, $d_i$ is the average value for the $i$-th time point and $s_i$ represents the seasonal index for the $i$-th time point [10].
$$s_i = \frac{d_i}{d} \qquad (1)$$
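As an illustration only, the seasonal index described above (the average for a period divided by the grand average) can be sketched in a few lines of Python. The function name `seasonal_indices`, the input layout of (period, value) pairs, and the sample numbers are assumptions made for this sketch; they are not taken from the Google Trends data used in the paper. The grand average is assumed here to be the mean of all observations.

```python
from collections import defaultdict
from statistics import mean


def seasonal_indices(observations):
    """Return {period: s_i} with s_i = d_i / d.

    observations: iterable of (period, value) pairs, for example
    (month_number, search_interest). Hypothetical input layout.
    """
    by_period = defaultdict(list)
    all_values = []
    for period, value in observations:
        by_period[period].append(value)
        all_values.append(value)
    if not all_values:
        raise ValueError("no observations given")
    grand_average = mean(all_values)  # d: assumed to be the mean of all observations
    if grand_average == 0:
        raise ValueError("grand average is zero; the index is undefined")
    # d_i is the mean of the values observed in period i
    return {p: mean(vals) / grand_average for p, vals in by_period.items()}


if __name__ == "__main__":
    # Made-up quarterly values for two years, for illustration only.
    sample = [(1, 90), (2, 110), (3, 100), (4, 100),
              (1, 70), (2, 110), (3, 100), (4, 120)]
    for period, index in sorted(seasonal_indices(sample).items()):
        print(period, round(index, 2))
```

An index below 1 marks a period that is quieter than average, and an index above 1 marks a busier one.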
Based on the seasonal index given above, future work could apply the technique to the software vulnerabilities of the four database systems. From that kind of research we could expect better resource allocation in areas ranging from database patch management to the related software marketing. Table II presents the top ten countries where each database system is most and least popular. When a sampling value is not significant, the corresponding value is represented by a dash. The top 10 most popular list for MySQL contains four Asian countries, five African countries and a Caribbean island. They are all emerging countries with limited economic resources. As a result, their first choice is naturally MySQL. This is an interesting contrast with graphic design software such as Adobe products, since Adobe's pricing policy virtually lets users in those countries use its products for free [11], [12]. Some of the countries in the list also appear in the top 10 least popular lists of the two commercial systems, which makes sense. On the other hand, the top 10 least popular list for MySQL contains seven Arab countries, two European countries and an African island (St. Helena). Some of the Arab countries appear again in the most popular list for Oracle Database, but interestingly none of them is shown in the SQL Server list.

III. DISCUSSION
A Black Hat conference paper called The Laws of Vulnerabilities [13] presents several vulnerability datasets that show seasonal patterns or periodic fluctuations. In that paper the authors do not actually emphasize the seasonality of the datasets, but rather four quantifiable and distinct characteristics that can be found in software vulnerabilities: Half-life, Prevalence, Persistence and Exploitation. However, the datasets they present turn out to contain periodic patterns [7]. Half-life represents "time interval for reducing occurrence of a vulnerability by half. Average duration of half-life continues to be about 30 days, varying by industry sector," prevalence "measures the turnover rate of vulnerabilities in the top twenty list during a year. Prevalence has increased, with 60% remaining in the list in 2008 compared to 50% in 2004," persistence means "total life span of vulnerabilities. Persistence remains virtually unlimited," and exploitation represents "time interval between an exploit announcement and the first attack. Exploitation is faster, often happening in less than 10 days compared to 60 days in 2004." It might be interesting research to apply autocorrelation function analysis and an ANOVA (analysis of variance) test to the seasonal data sets from the database systems. While the autocorrelation test can tell us the period of the seasonality, the ANOVA test can help pinpoint the specific time points at which the peak values appear. Once that future work is conducted, we could optimally allocate the proper resources at the right time, so that we could mitigate the potential risk posed by vulnerabilities located in the database management systems.
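To make the proposed analysis more concrete, the following Python sketch shows one possible way to compute a sample autocorrelation coefficient and a one-way ANOVA F statistic using only the standard library. It is a minimal illustration under stated assumptions, not the procedure of any cited work: the function names `autocorrelation` and `one_way_anova_f` and the toy series are hypothetical, and the F statistic is reported without a p-value, which would additionally require the F distribution.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    m = mean(series)
    denominator = sum((x - m) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum((series[t] - m) * (series[t + lag] - m)
                    for t in range(n - lag))
    return numerator / denominator


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (e.g. values per month)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    ss_within = sum((x - gm) ** 2
                    for g, gm in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Toy series with a period of four time points (illustration only).
    toy = [0, 1, 0, -1] * 3
    for lag in range(1, 6):
        print("lag", lag, round(autocorrelation(toy, lag), 3))
    # Toy groups: values collected for three different periods.
    print("F =", one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

In such a sketch, a lag whose autocorrelation stands out from its neighbors suggests the period of the seasonality, while a large F value indicates that the period means differ; a follow-up comparison of the individual group means would still be needed to say which period holds the peak.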
Furthermore, if we incorporate the idea of seasonal attributes into a vulnerability discovery model, we expect the model's performance in estimating the number of vulnerabilities discovered to be considerably enhanced.