INTRODUCTION

The quality of a government is a strong indicator of many conditions within a country. Conflict and corruption that stem from government action can dismantle a country from within, while other factors, such as sound education and infrastructure systems, can strengthen it. Countries differ in their political systems and in how they act, and these differences are not necessarily damaging. Rather, they make it possible to compare countries, allowing researchers to forecast positive and negative trends that can inform the decisions a country makes for its citizens. The data used in this report come from the QOG (Quality of Government) Standard Dataset, compiled by the QOG Institute at the University of Gothenburg in Sweden.

Our first question concerns two education variables: does higher government expenditure on education correspond to higher primary school enrollment in Europe? The World Bank identifies education as one of the leading factors in development. It is associated with higher hourly earnings and greater economic growth, and it helps narrow the gap between social classes, acting as an equalizer. Across the globe, 53% of children in low- and middle-income countries cannot read and understand a short story by the end of primary school. Learning poverty is a generational barrier, but also a social and economic one for countries trying to advance. With this question, we want to see whether investment in education by national governments actually results in higher primary school enrollment. While other factors also affect enrollment rates, government expenditure on education is a variable that policymakers can directly control. We want to see whether there is a direct linear relationship between the two variables, meaning that higher spending produces higher enrollment, or whether the relationship levels out. If it does level out, we ask whether there is a “perfect” amount that countries should spend on education that can be extrapolated from the relationship between these two variables.

Our second question asks: how well does GDP predict the level of human development in Europe? Although broad, this question has numerous implications. A country's GDP indicates the capacity and performance of its economy, and trends in GDP can reveal whether that economy is stable. The QOG dataset describes human development as “the summary of the measure of average achievement in having a decent standard of living, being knowledgeable, and living a long and healthy life.” With this question, we examine what influence, if any, the health of an economy has on the well-being of individuals. Human development depends on many factors, such as education and health; its standard-of-living component is measured by the United Nations using Gross National Income (GNI) per capita. GNI counts the income received by a country's residents, including net income earned abroad, whereas Gross Domestic Product (GDP) measures the value produced within the country's borders by its various industries. The main difference between the two measurements is therefore how income that crosses borders is counted. GDP is the more common indicator, whereas some consider GNI a slightly more accurate indicator of how a country is doing. By comparing GDP to HDI, we gain insight into how well the more common indicator, GDP, predicts HDI.

DATA

For this project, our group used the Quality of Government Basic Dataset from the University of Gothenburg in Sweden. This is a compilation dataset: its variables are drawn from various academically recognized sources to provide broad coverage of many areas of interest. The data are organized by country and year. The years span 1946 through 2019, although fewer data points are available after 2016, as some of the newer data have not yet been released. Given the breadth of this dataset, we narrowed it by time (2000 to 2019) and by region (European countries). We chose the last 20 years and modern nation-states so that we could compare these countries effectively and predict future tendencies. Some countries lack data for the full span of years because of the creation or division of nation-states; however, limiting our analysis to Europe over the last twenty years avoids most of those issues.

The variables we used were wdi_expeduge, wdi_nerp, gle_cgdpc, undp_hdi, year, region, and cname. wdi_expeduge is government expenditure on education as a percentage of total government expenditure; it comes from the World Bank's World Development Indicators (WDI). wdi_nerp, also from the WDI, is the percentage of children of official primary school age who are enrolled in school. gle_cgdpc is Gross Domestic Product per capita, taken from Kristian Skrede Gleditsch's dataset of trade and GDP data. undp_hdi is the Human Development Index from the United Nations Development Programme, which combines health, education, and standard of living to score how developed a country is, on a scale from 0 to 1.
Example rows from the filtered dataset:

cname     region    year  wdi_expeduge  wdi_nerp  gle_cgdpc  undp_hdi
Bulgaria  Eastern   2003  11.18025      94.30700  8276.06    0.738
Georgia   Eastern   2008  8.93760       98.30193  6322.75    0.728
Greece    Southern  2004  7.70183       96.72800  21955.16   0.835
Serbia    Southern  2008  10.39082      96.99485  11150.91   0.757
Greece    Southern  2002  7.47716       95.15271  20469.77   0.818
Denmark   Northern  2011  15.03626      97.48009  37406.76   0.922
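The time and region filtering described above can be sketched in Python, assuming the QOG data have been exported to CSV with the column names used in this report. This is an illustration, not the code we ran: the lookup REGION_OF and the function filter_qog are hypothetical names, and REGION_OF is abbreviated here (the full grouping is the region table that follows).

```python
import csv
import io

# Hypothetical, abbreviated country-to-region lookup; extend with the full table.
REGION_OF = {
    "Bulgaria": "Eastern",
    "Denmark": "Northern",
    "Greece": "Southern",
    "Germany": "Western",
}

# Numeric QOG columns used in this report.
NUMERIC = ["wdi_expeduge", "wdi_nerp", "gle_cgdpc", "undp_hdi"]


def filter_qog(csv_text, start=2000, end=2019):
    """Keep European country-years in [start, end] with the analysis columns.

    Countries missing from REGION_OF are treated as non-European and dropped.
    Empty cells (missing values) become None; other numeric cells become floats.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        region = REGION_OF.get(raw["cname"])
        year = int(raw["year"])
        if region is None or not (start <= year <= end):
            continue  # outside Europe or outside the study window
        row = {"cname": raw["cname"], "region": region, "year": year}
        for col in NUMERIC:
            row[col] = float(raw[col]) if raw[col].strip() else None
        rows.append(row)
    return rows
```

Keeping missing values as None, instead of dropping whole rows, lets each later analysis use every country-year that has the particular variables it needs.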

The following table shows the regions into which we divided Europe and the countries in each.

Southern: Albania, Andorra, Croatia, Greece, Italy, Malta, Montenegro, Portugal, San Marino, Serbia, Slovenia, Spain, North Macedonia, Bosnia and Herzegovina
Western: Austria, Belgium, France, Germany, Liechtenstein, Luxembourg, Monaco, Netherlands, Switzerland
Eastern: Belarus, Bulgaria, Czech Republic, Hungary, Poland, Moldova, Romania, Russia, Slovakia, Ukraine, Armenia, Azerbaijan, Cyprus, Georgia, Kazakhstan, Turkey
Northern: Denmark, Estonia, Finland, Iceland, Ireland, Latvia, Lithuania, Norway, Sweden, United Kingdom

These boxplots show the distribution of our four main variables within each European region.

RESULTS

Question I

To better understand trends in government education expenditure and primary school enrollment, we created a slope chart for each variable comparing 2000 with 2016; we chose 2016 as the endpoint because it had the most complete data. Mean education expenditure increased in the Eastern, Western, and Southern regions and decreased in the Northern region. Over the same period, mean primary school enrollment increased in the Southern and Northern regions and decreased in the Western and Eastern regions. These results indicate that the two variables did not always move in the same direction.
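Each line in a slope chart connects a region's mean in 2000 to its mean in 2016. A minimal Python sketch of those endpoint means, assuming rows are dicts with region, year, and variable keys and None for missing values; endpoint_means is a hypothetical helper, not our original code.

```python
from collections import defaultdict
from statistics import mean


def endpoint_means(rows, var, years=(2000, 2016)):
    """Regional means of `var` in each endpoint year, skipping missing values.

    Returns {region: (mean_first_year, mean_last_year)}. Regions without data in
    both years are left out, since their slope-chart line has no two endpoints.
    """
    values = defaultdict(list)
    for r in rows:
        if r["year"] in years and r.get(var) is not None:
            values[(r["region"], r["year"])].append(r[var])
    regions = {region for region, _ in values}
    return {
        region: tuple(mean(values[(region, y)]) for y in years)
        for region in regions
        if all((region, y) in values for y in years)
    }
```

Because the set of countries reporting a value can differ between 2000 and 2016, part of a change in a regional mean may reflect which countries are included rather than a real change within countries.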

We fit a separate linear regression model for each of the four regions using R's lm() function, with primary enrollment (wdi_nerp) as the dependent variable and education expenditure (wdi_expeduge) as the independent variable. The residual standard error (RSE) was substantial in every region and highest in the Eastern region, at 4.446 percentage points. As the plot shows, the Eastern region also has the most erratic relationship and the worst fit to its regression line. The slopes for the Northern and Southern regions had p-values below .05, meaning that both were statistically significant. These two regions include countries that are commonly seen as “successful”, such as Denmark, Sweden, Norway, the United Kingdom, Italy, Greece, and Spain. It was interesting that the Western region, which includes Belgium, France, and Germany, did not show a significant linear relationship between the two variables, since those countries are typically seen as among the most developed. As we expected, Eastern European countries showed no significant relationship, which may reflect factors such as depopulation, economic crises, and political unrest.
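For readers who want to reproduce the per-region fits outside R, the sketch below computes the quantities that summary(lm(wdi_nerp ~ wdi_expeduge)) reports for a simple regression: the slope, its two-sided p-value, and the residual standard error. It is a standard-library Python illustration under our own assumptions (rows as dicts with a region key, None for missing values); simple_ols, fit_by_region, and the helper functions are hypothetical names. The p-value uses the usual continued-fraction evaluation of the regularized incomplete beta function, which gives the Student t tail probability.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term, then odd-numbered term, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def simple_ols(x, y):
    """Simple linear regression y = intercept + slope * x (needs n >= 3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    rse = math.sqrt(sse / df)      # lm()'s "Residual standard error"
    se = rse / math.sqrt(sxx)      # standard error of the slope
    # t-test of slope = 0; a perfect fit (se == 0) gets p = 0 unless slope == 0.
    p = t_two_sided_p(slope / se, df) if se > 0 else float(slope == 0)
    return {"slope": slope, "intercept": intercept, "rse": rse, "p_value": p}


def fit_by_region(rows, xvar="wdi_expeduge", yvar="wdi_nerp"):
    """Fit yvar ~ xvar separately within each region, skipping missing pairs."""
    data = {}
    for r in rows:
        if r.get(xvar) is not None and r.get(yvar) is not None:
            xs, ys = data.setdefault(r["region"], ([], []))
            xs.append(r[xvar])
            ys.append(r[yvar])
    return {region: simple_ols(xs, ys) for region, (xs, ys) in data.items()}
```

Pooling all country-years in a region treats repeated observations of the same country as independent, which tends to understate standard errors, so p-values from this model (or from lm() on the same pooled data) are best read as approximate.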

We fit a logarithmic regression between wdi_nerp and wdi_expeduge to see whether the relationship begins to level off, which would suggest an ideal amount that a government should spend on education to maximize primary school enrollment. We used the logarithmic function \[Y = a + b\log(x),\] where Y is primary enrollment (wdi_nerp) and x is education expenditure (wdi_expeduge), to plot a regression curve for each region. In this model, the Southern region was the only one with a p-value below .05, making it the only statistically significant model. However, its RSE was still large at 3.17, indicating a poor fit of the points to the curve. From this plot, we found no particular amount that a European country should spend on education to improve its enrollment rate. This likely reflects the fact that each country functions differently and that many factors besides government spending affect school enrollment. A better research question would examine the relationship between the two variables within each individual country rather than at the regional level.
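The logarithmic model is still linear in its coefficients, so it can be fit by ordinary least squares after replacing x with log(x). A minimal Python sketch, assuming natural logarithms (the default of R's log()) and strictly positive expenditure values; fit_log_model and marginal_effect are hypothetical names.

```python
import math


def fit_log_model(x, y):
    """Least-squares fit of y = a + b * ln(x); every x must be positive.

    Returns (a, b, rse), where rse is the residual standard error on n - 2
    degrees of freedom, as reported by lm().
    """
    lx = [math.log(v) for v in x]  # natural log, like R's log()
    n = len(lx)
    mx, my = sum(lx) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in lx)
    b = sum((v - mx) * (yi - my) for v, yi in zip(lx, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * v) ** 2 for v, yi in zip(lx, y))
    return a, b, math.sqrt(sse / (n - 2))


def marginal_effect(b, x):
    """Slope dy/dx = b / x of the fitted curve at x: shrinks, never hits zero."""
    return b / x
```

Because the fitted curve's slope is b/x, a logarithmic model can show diminishing returns to spending, but it never reaches a maximum; locating a spending level where enrollment peaks would require a model that can turn over, such as a quadratic in x.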

Question II

To answer our second question, we first created a scatter plot to check for a correlation between the Human Development Index and GDP. The plot displays GDP and HDI for European countries over our range of years. It shows that over time GDP generally rises across European countries and HDI steadily increases as well. This positive relationship suggests that GDP could serve as one predictor of the Human Development Index.

Having seen that GDP and HDI rise together, we investigated how strongly they are correlated and how well GDP works as a predictor of HDI. We first ran correlation tests using three methods: Pearson, Spearman, and Kendall. The resulting coefficients range from about 0.81 to just over 0.95: around 0.81 for Pearson, around 0.95 for Spearman, and around 0.82 for Kendall. A coefficient above 0.80 is generally taken to indicate a very high correlation. Because all three coefficients exceed 0.80 (their mean is about 0.86), we conclude that HDI and GDP are very highly correlated. The Spearman coefficient, which depends only on ranks, is notably higher than the Pearson coefficient, which measures linear association; this suggests that the relationship is strongly monotonic but not linear.

Method                                 Correlation coefficient
Pearson’s product-moment correlation   0.8050229
Spearman’s rank correlation rho        0.9543072
Kendall’s rank correlation tau         0.8221583
Mean                                   0.8604961
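The method names in the table match the headings printed by R's cor.test(), which suggests that is how the coefficients were produced. The standard-library Python sketch below shows what each coefficient measures; the function names are our own, and the Kendall version is tau-b, which adjusts for ties and equals the original tau when there are none.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r: strength of the straight-line association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant and discordant pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from tau-b
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (pairs + ties_x) * (pairs + ties_y))
```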

From these observations, we were confident that HDI and GDP are highly correlated. To find out how good a predictor GDP is, we trained several models and measured the accuracy of their predictions. First, we split our dataset into 80% training and 20% testing data. We then fit linear, polynomial, and logistic models to predict HDI from GDP using the training set. With these models, we obtained predictions and residuals for the testing set, which we plotted against the actual values. We then calculated summary statistics of the residuals, including the Mean Absolute Error and Root Mean Squared Error. The lowest errors came from the logistic model and from Poly(3) and Poly(4), the polynomial models of degree 3 and 4; their errors are nearly identical, with the logistic model marginally lower on both MAE and RMSE. The linear model had the largest errors. We therefore concluded that GDP is a good predictor of HDI and that nonlinear models, namely the logistic model and polynomials of degree 3 or 4, predict it best.
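The 80/20 split and the polynomial fits can be sketched with the Python standard library. This is an illustration under our own assumptions, not our original code: split_train_test and fit_poly are hypothetical names, the seed is arbitrary, and GDP is standardized before fitting because raising values in the tens of thousands to the fourth power makes the normal equations badly conditioned. R's poly() uses an orthogonal basis instead, but both span the same polynomials, so the least-squares predictions are the same. The logistic model is omitted because its exact form is not specified here.

```python
import random


def split_train_test(rows, test_frac=0.2, seed=42):
    """Shuffle a copy of rows; return (train, test) with test_frac held out."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def _solve(a, b):
    """Solve the linear system a @ coef = b (Gaussian elimination, pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_poly(x, y, degree):
    """Least-squares polynomial of `degree` in standardized x; returns predict()."""
    n = len(x)
    mu = sum(x) / n
    sd = (sum((v - mu) ** 2 for v in x) / (n - 1)) ** 0.5
    # Design matrix columns: 1, z, z^2, ..., z^degree with z = (x - mu) / sd.
    design = [[((v - mu) / sd) ** k for k in range(degree + 1)] for v in x]
    k1 = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k1)]
           for i in range(k1)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k1)]
    coef = _solve(xtx, xty)  # normal equations: (X'X) coef = X'y

    def predict(x_new):
        z = (x_new - mu) / sd
        return sum(c * z ** k for k, c in enumerate(coef))

    return predict
```

For example, fit_poly([r["gle_cgdpc"] for r in train], [r["undp_hdi"] for r in train], 3) would return a function that can be evaluated on each test row's GDP.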

Model     Mean Bias   Mean Absolute Error  Root Mean Squared Error
Linear    -0.0010151  0.0359483            0.0463605
Poly(2)   0.0007622   0.0181223            0.0241004
Poly(3)   0.0018636   0.0158624            0.0212497
Poly(4)   0.0018674   0.0159758            0.0213745
Logistic  0.0018076   0.0158021            0.0208525
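The three error measures in the table can be computed from the test-set residuals as below. We assume residual = actual - predicted (the convention of R's residuals()); under the opposite convention only the sign of the mean bias flips. error_summary is a hypothetical name.

```python
import math


def error_summary(actual, predicted):
    """Mean bias, MAE and RMSE of predictions, with residual = actual - predicted."""
    resid = [a - p for a, p in zip(actual, predicted)]
    n = len(resid)
    return {
        "mean_bias": sum(resid) / n,                          # systematic over/under
        "mae": sum(abs(r) for r in resid) / n,                # typical miss size
        "rmse": math.sqrt(sum(r * r for r in resid) / n),     # penalizes big misses
    }
```

RMSE is always at least as large as MAE; a wide gap between the two indicates that a few predictions miss badly.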

CONCLUSION

In our first question, we analyzed government expenditure on education and primary school enrollment to determine whether higher government expenditure on education results in higher enrollment in European countries. Our simple linear regression models were statistically significant only for the Northern and Southern regions, indicating a linear relationship between the two variables in those regions. Our logarithmic regression model was statistically significant only for the Southern region. These results suggest that a linear model may describe the relationship between education expenditure and primary school enrollment better than a logarithmic one. Because we analyzed these variables at the regional level, differences among the countries within each region can cause the regression to under- or overestimate the relationship for any given country. This analysis led our team to conclude that there is little to no relationship between the two variables in aggregate, although the relationships may be more clearly defined on a country-by-country basis. As the slope charts show, education expenditure and primary enrollment both increased only in the Southern region; in the other regions they moved in opposite directions. Therefore, it would be more productive to compare specific countries, perhaps adding covariates that characterize each country, rather than relying on regional groupings.

For our second question, we examined how well GDP predicts a country's HDI. We first confirmed that a relationship existed between the two variables and then trained several models to measure prediction accuracy. From these steps, we concluded that GDP is a good predictor of HDI and that the best-performing models were the logistic model and the polynomials of degree 3 and 4, which had nearly identical errors.

Our conclusions were largely expected. For the first question, we had expected a stronger positive relationship between expenditure and enrollment; the weaker relationship may partly reflect sporadic data collection in some countries. For the second question, we expected both the positive relationship between GDP and the Human Development Index and that GDP would be a good predictor of HDI. However, the logistic model and the cubic and quartic polynomials performed almost identically, so our results cannot tell whether HDI simply levels off and slows its increase as GDP increases, as the logistic model implies, or follows a different curve, such as one that starts to decrease past a certain point.

The results of our analysis are important because they suggest that national policy and economic conditions are linked to citizens' quality of life. We looked at four of the roughly 2,000 variables in the dataset and found relationships between education expenditure and primary school enrollment in some regions, and between GDP and human development. Because we did not find a “perfect” amount of education spending, other variables likely have a larger impact on primary school enrollment in Europe than government spending. Researchers and policymakers should therefore continue to investigate other variables that affect student learning and achievement, such as whether the nation is at peace and its poverty levels. Similarly, by measuring how well GDP predicts HDI, we can build models that visualize the relationship between the two in detail and examine how effectively GDP can serve as a measure of the standard of living in different countries. With this information, we could identify the level of GDP associated with the highest HDI, as well as the range of GDP over which HDI increases fastest. These findings could also give researchers and analysts a way to evaluate a nation's achievements in quality of life and to use GDP fluctuations to track social and economic development. Furthermore, they could help researchers propose ways for countries to reach higher HDI levels, benefiting their residents.

In the future, we recommend that researchers explore the relationship between government spending on education and primary school enrollment country by country, rather than grouping by region, which would allow stronger comparisons. Because there are many European countries, it would also be useful to be more deliberate about which countries are included in the study, for example those that rank highest in student achievement. Researchers could also examine other variables that may predict HDI, to get a clearer picture of which policies could improve it, and look into other correlated variables to minimize any negative outcomes associated with those predictors. While our dataset contains a rich variety of variables that can be used to measure the quality of government, many values were missing. With more complete and up-to-date data for every country, we could have produced more useful results. One remedy would be to collect data from other organizations and merge it into a single dataset. More advanced modeling techniques, such as neural networks, could also be used to improve the performance of our models.