## NULL
This lab journal is created for the Summer-School project BIGSSS - Segregation and Polarization.
In this section we are defining Polarization and constructing custom functions to calculate the level of polarization in each polling station.
A. Read the following papers:
B. For this BIGSSS we will use election results (at the polling station-level) to tap into the degree of polarization. What kind of polarization are we measuring and trying to explain?
C. How would you determine the position of parties on specific political topics with data that is readily available? And, which topic(s) would you use to determine the level of political polarization (in the NL)?
D. Which theoretical article on political polarization should we definitely all read for this summer school? Please provide a reference and motivate your answer.
There are myriad ways in how we could define polarization.
A. Read the paper of (Esteban and Ray 1994).
B. Please summarize some of the key or defining features of (a measure of) polarization.
If we have data on the political opinion of each agent/voter, we could simply calculate the variance in political opinion.
\[\begin{equation} VAR(p_i) = \frac{1}{N-1} \Sigma_i(p_i-\bar{p_i})^2, \tag{2.1} \end{equation}\]
with \(p_i\) the position of one voter with respect to one specific political issue. Note, that using the variance as a measure of political polarization will only work for one political opinion at the time.
Note, that we are not working with data that contains information on the political opinions of individual voters yet. We will work with election outcomes. We therefore assign to each voter, the position of the party the voter voted for
We have data on party positions and the vote share each party obtained in each polling station. We could use the function Hmisc::wtd.var
to calculate the weighted variance (without the need to disaggregate our macro-level data).
A somewhat more general measure would be to define polarization as the variance in the distribution of pairwise opinion differences \(d_{ij}\) across all dyads of voters (where, once again in our case, voters are assigned the positions of the parties they voted for) (Baldassarri and Gelman 2008). Formally:
\[\begin{equation} VAR(d_{ij}) = \frac{1}{N-1} \Sigma_{ij}(d_{ij}-\bar{d_{ij}})^2, \tag{2.2} \end{equation}\]
where \(d_{ij}\) is the difference (or distance) in political position between the voter-dyads: \(\delta(p_i, p_j)\) determined by some function \(\delta\). The distance can be calculated as, for example, Manhatten or Euclidean distance. Since we can calculate the difference in a multi-dimensional space, this measure can also be applied to multi-dimensional opinion settings.
Next to taking the distance of each voter-dyad central, there are also measures that focus on the distance of each voter-group to the center (of mass) (Aleskerov and Oleynik 2016):
\[\begin{equation} P = k \Sigma_{j=1}^J\omega_j*\delta(p_j, c) , \tag{2.3} \end{equation}\]
where \(\delta\) is some distance function that calculates the distance between the (multidimensional) position of the voter-group to \(c\), the center of mass or ‘mean voter position (i.e. \(\bar{p}\))’. \(\omega_j\) is the proportion of group \(j\). These groups \(j\) can be individuals, or groups of individuals with the same position. In our case, the groups are the different parties, voters could vote for and \(\omega_j\) is the vote share of each party in each polling station. \(k\) is a normalization coefficient.
if we define \(\delta\) as:
\[\begin{equation} \delta(p_j, c) = (p_j - c)^2 , \tag{2.4} \end{equation}\]
and set \(k=1\) this measure becomes:
\[\begin{equation} V = \Sigma_{j=1}^J\omega_j(p_j - \bar{p})^2 , \tag{2.5} \end{equation}\]
the (weighted) sum of squared differences.
A much used measure of (group) segregation is the original Esteban-Rey measure of segregation (Esteban and Ray 1994):
\[\begin{equation} P_{ER}(\pi, x) = K \Sigma_{c=1}^h\Sigma_{d=1}^h\pi_c^{1+\alpha}\pi_d|x_c-x_d|, \tag{2.6} \end{equation}\]
with \(\pi_c\) the proportion of group \(c\) and \(x_c\) position of group c and where \(K\) is a normalization constant to make sure we can compare polarization scores among populations of different sizes. The parameter \(\alpha\) is the polarization sensitivity parameter and \(\alpha \in (0,\alpha^*]\) where \(\alpha^*\approx 1.6\). The challenge in this measure is how we should define ‘opinion groups’. This is also an important distinction between the previous measures which do not assume groups but start from individuals. However, in our case we define the groups based on the parties voters voted for. And in this case, I am not sure why you could not simply extend this measure to the multidimensional case if we replace \(|x_c-x_d|\) with \(\delta(p_c, p_d)\) with \(\delta\) being a distance function (e.g. Manhattan).
Now that we provided some formal theoretical background on polarization measures, let us move on and see how we can implement these in R.
rm(list = ls())
fpacage.check
: Check if packages are installed (and install if not) in R (source).fsave
: Function to save data with time stamp in correct directoryfsave <- function(x, file, location = "./data/processed/", ...) {
if (!dir.exists(location))
dir.create(location)
datename <- substr(gsub("[:-]", "", Sys.time()), 1, 8)
totalname <- paste(location, datename, file, sep = "")
print(paste("SAVED: ", totalname, sep = ""))
save(x, file = totalname)
}
fpackage.check <- function(packages) {
lapply(packages, FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
})
}
colorize <- function(x, color) {
sprintf("<span style='color: %s;'>%s</span>", color, x)
}
Let us first load some necessary packages
compiler
: We will use this one to compile the functions we programmed. This makes them a lot faster.tidyverse
: If you can’t base them, join them.Hmisc
: for weighted mean/variancestats
for distance measures.packages = c("tidyverse", "compiler", "Hmisc", "stats")
fpackage.check(packages)
The input of our functions will be:
a vector containing the number of votes each (included) party obtained in each polling station.
a matrix (or dataframe) with in the rows the different parties (in the same order as the votes vector) and in the columns the position of each party on one or more issues.
Let us make sure that all our opinions are in the range \([0,1]\) before we use them as input.
fPvar <- function(votes, positions, method = "euclidean") {
positions <- positions * 2 #this function wants a range of 2 for each attitude
distances <- as.matrix(dist(positions, method = method))
votes_mat <- votes %o% votes
diag(votes_mat)[diag(votes_mat) > 0] <- diag(votes_mat)[diag(votes_mat) > 1] - 1 #we do not want to include distance to yourself, thus i cannot by j in the dyad_ij.
Pvar <- Hmisc::wtd.var(as.numeric(distances), as.numeric(votes_mat))/NCOL(positions) #we normalize for number of opinions
return(Pvar)
}
fPvar <- cmpfun(fPvar)
I programmed three different types of distances:
fPV <- function(votes, positions, method = "euclidean") {
shares <- votes/sum(votes, na.rm = TRUE)
pbar <- rep(NA, NCOL(positions))
pbar <- as.numeric(t(shares) %*% as.matrix(positions)) #center of mass / mean position
# distances to mean
if (method != "sq") {
if (NCOL(positions) == 1) {
distances <- as.matrix(stats::dist(c(pbar, positions), method = method))[, 1][-1]
} else {
distances <- as.matrix(stats::dist(rbind(pbar, positions), method = method))[, 1][-1]
}
}
# if (method=='sq') {distances <- ??}
# defining the constant
if (method == "euclidean") {
k <- 2/sqrt(NCOL(positions))
}
if (method == "manhattan") {
k <- 2/NCOL(positions)
}
if (method == "sq") {
k <- 1
}
PV <- k * sum(shares * distances)
return(PV)
}
fPV <- cmpfun(fPV)
As you see, the function above is not yet ready. It does not yet calculate distances if
method=="sq"
. Please update this function and make it work!
fPER <- function(alpha = 1, votes, positions, method = "euclidean") {
positions <- positions
distances <- as.matrix(stats::dist(positions, method = method))
shares <- votes/sum(votes, na.rm = TRUE)
sharesi <- shares^(1 + alpha)
sharesj <- shares
ER <- as.numeric(sharesi %*% distances %*% sharesj)
return(ER)
}
fPER <- cmpfun(fPER)
Let us generate the positions for 6 parties on two dimensions, x and y.
x <- c(0, 0.5, 1, 0, 0.5, 1)
y <- c(0, 0.5, 1, 1, 0.5, 0)
positions <- data.frame(x, y)
And let us generate a vote share for each party at three polling station.
votes1 <- c(100, 100, 100, 100, 100, 100)
votes2 <- c(100, 0, 100, 0, 0, 0)
votes3 <- c(0, 0, 100, 0, 0, 100)
# this one wants range of 2
Hmisc::wtd.var(positions$x * 2, votes1)
Hmisc::wtd.var(positions$x * 2, votes2)
Hmisc::wtd.var(positions$x * 2, votes3)
Hmisc::wtd.var(positions$y * 2, votes1)
Hmisc::wtd.var(positions$y * 2, votes2)
Hmisc::wtd.var(positions$y * 2, votes3)
#> [1] 0.6677796
#> [1] 1.005025
#> [1] 0
#> [1] 0.6677796
#> [1] 1.005025
#> [1] 1.005025
fPvar(votes = votes1, positions = positions[, 1])
fPvar(votes = votes2, positions = positions[, 1])
fPvar(votes = votes3, positions = positions[, 1])
fPvar(votes = votes1, positions = positions)
fPvar(votes = votes2, positions = positions)
fPvar(votes = votes3, positions = positions)
#> [1] 0.5432073
#> [1] 1.000025
#> [1] 0
#> [1] 0.3710884
#> [1] 1.000025
#> [1] 0.5000125
fPV(votes = votes1, positions = positions[, 1])
fPV(votes = votes2, positions = positions[, 1])
fPV(votes = votes3, positions = positions[, 1])
fPV(votes = votes1, positions = positions)
fPV(votes = votes2, positions = positions)
fPV(votes = votes3, positions = positions)
#> [1] 0.6666667
#> [1] 1
#> [1] 0
#> [1] 0.6666667
#> [1] 1
#> [1] 0.7071068
fPER(votes = votes1, positions = positions[, 1])
fPER(votes = votes2, positions = positions[, 1])
fPER(votes = votes2, positions = positions[, 1])
fPER(votes = votes1, positions = positions)
fPER(votes = votes2, positions = positions)
fPER(votes = votes3, positions = positions)
# if you want to check with existing function require('acid') acid::polarisation.ER(alpha=1,
# rho=data.frame(means=positions[,1], shares=votes1/sum(votes1)), comp = FALSE)$P
# acid::polarisation.ER(alpha=1, rho=data.frame(means=positions[,1], shares=votes2/sum(votes2)),
# comp = FALSE)$P acid::polarisation.ER(alpha=1, rho=data.frame(means=positions[,1],
# shares=votes3/sum(votes3)), comp = FALSE)$P
#> [1] 0.07407407
#> [1] 0.25
#> [1] 0.25
#> [1] 0.1156045
#> [1] 0.3535534
#> [1] 0.25
Copyright © 2022 Jochem Tolsma / Thomas Feliciani / Rob Franken