Many would have you believe that French politics is going the way of Britain and the United States, with the left buckling to right-leaning politics. But how much should we read into that? Below, I explain my methodology for forecasting the 2017 French Presidential Election.

To understand how to forecast France’s election, we must know how its elections operate. For readers from the United States, or another country with the usual first-past-the-post electoral system, it might help to read up on France’s own peculiar method of electing its President. Specifically, France uses a two-round system. In the first round, voters can pick from any number of eligible candidates. The two candidates with the highest vote shares then go on to the second round (unless one candidate wins an outright majority in the first round, in which case no runoff is held). Then, everyone votes again! Whichever of the two finalists wins the most votes becomes President of France!
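For readers who think in code, the two-round rule can be sketched as a pair of small Python functions. This is an illustration only: the function names and the dict-of-percentages input are assumptions, not part of the forecast’s code. It also encodes the outright-majority rule, under which a candidate with more than 50% in the first round is elected without a runoff.

```python
def first_round_outcome(shares: dict[str, float]) -> tuple[str, ...]:
    """Return the outright winner, or the two candidates who advance to a runoff.

    `shares` maps candidate names to first-round vote shares in percent.
    """
    ranked = sorted(shares, key=shares.get, reverse=True)  # highest share first
    if shares[ranked[0]] > 50:
        # An absolute majority in round one means no second round is held.
        return (ranked[0],)
    # Otherwise the top two candidates advance to the second round.
    return (ranked[0], ranked[1])


def second_round_winner(shares: dict[str, float]) -> str:
    """The runoff goes to whichever of the two finalists gets more votes."""
    return max(shares, key=shares.get)


print(first_round_outcome({"Hamon": 15, "Fillon": 20, "Macron": 25, "Le Pen": 30}))
# ('Le Pen', 'Macron')
```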

With that being said, here’s how the forecast works:

Overview

Forecasting the French election is not much simpler than forecasting a U.S. election, but the process will sound more intuitive to most. In brief, it works as follows:

- Collect and correct polls
  - Corrections based on the recency of each poll
- Simulate the first round of the election (April 23, 2017)
  - For each simulated election (what we call a “trial”):
    - Vary the polls based on what we can expect from historical polling error
    - Pick two winners, according to the varied polling average
- Simulate the second round (May 7, 2017)
  - For each trial:
    - Again, vary the polls based on past error
    - Pick one winner, according to the varied polling average
- Result:
  - 50,000 “trials” for possible outcomes of the election
  - A proportion of how often each candidate wins those trials, and thus
  - A probability representing each candidate’s chances of winning the election, according to our 50,000 trials.

1. Polls

Round One:

The polling for the French election comes in two forms. First, there are traditional polls, which measure the first-round vote share of every candidate for the Presidency. Once these polls are collected from various online sources (all data is entered by hand to ensure integrity), it is time to compute our average of the polls.

However, not all polls are created equal. So instead of taking a simple average of all polls, we compute a weighted average, much like the way your grade-school math teacher calculated your final grade.

The weights are assigned based on just one factor: the recency of a poll. This way, our model treats polls that measured the public more recently as better polls. Does it make sense that a week-old poll should count the same as a one-day-old poll? Of course not. We try to fix this problem.

Once all the weights are assigned, we get today’s snapshot average of the polls, which we keep for the rest of the day’s model run.
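A minimal sketch of a recency-weighted average in Python. The post does not say how fast the weights decay, so the exponential decay and the 7-day half-life below are assumptions chosen for illustration, and the names `recency_weight` and `weighted_average` are hypothetical.

```python
# Hypothetical half-life: a poll loses half of its weight every 7 days.
HALF_LIFE_DAYS = 7.0


def recency_weight(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential-decay weight: 1.0 for a poll taken today, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life)


def weighted_average(polls: list[tuple[float, dict[str, float]]]) -> dict[str, float]:
    """Recency-weighted average of vote shares.

    Each poll is (age in days, {candidate: share in percent}); this sketch
    assumes every poll reports the same set of candidates.
    """
    totals: dict[str, float] = {}
    weight_sum = 0.0
    for age, shares in polls:
        w = recency_weight(age)
        weight_sum += w
        for candidate, share in shares.items():
            totals[candidate] = totals.get(candidate, 0.0) + w * share
    return {candidate: total / weight_sum for candidate, total in totals.items()}


# Made-up example: a 1-day-old poll counts twice as much as an 8-day-old one.
polls = [(1, {"Macron": 24, "Le Pen": 26}), (8, {"Macron": 20, "Le Pen": 27})]
print(weighted_average(polls))
```

With a 7-day half-life, the two polls above are weighted 2:1, so Macron’s average lands at about 22.7 rather than the simple average of 22.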

Round Two:

The process above is then repeated for every two-way matchup that pollsters ask their French respondents about. For example, we compute an average for Fillon v. Le Pen, Macron v. Le Pen, Fillon v. Macron, and so on.
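One way to keep the head-to-head averages is a dictionary keyed by the pair of candidates. In this sketch, the `frozenset` key, the 7-day half-life, and the function name are assumptions; a `frozenset` makes the key order-independent, so a “Fillon v. Le Pen” poll and a “Le Pen v. Fillon” poll are averaged together.

```python
from collections import defaultdict

# Hypothetical half-life for the recency weights (same idea as round one).
HALF_LIFE_DAYS = 7.0


def matchup_averages(polls):
    """Recency-weighted average for every two-way matchup.

    `polls` is a list of (age in days, {candidate: share in percent}) with exactly
    two candidates per poll. Results are keyed by a frozenset of the two names.
    """
    grouped = defaultdict(list)
    for age, shares in polls:
        grouped[frozenset(shares)].append((age, shares))

    averages = {}
    for pair, pair_polls in grouped.items():
        weights = [0.5 ** (age / HALF_LIFE_DAYS) for age, _ in pair_polls]
        total = sum(weights)
        averages[pair] = {
            name: sum(w * shares[name] for w, (_, shares) in zip(weights, pair_polls)) / total
            for name in pair
        }
    return averages


# Made-up polls, for illustration only.
polls = [
    (0, {"Macron": 60, "Le Pen": 40}),
    (7, {"Le Pen": 42, "Macron": 58}),
    (0, {"Fillon": 55, "Le Pen": 45}),
]
averages = matchup_averages(polls)
print(averages[frozenset({"Macron", "Le Pen"})])
```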

It’s time to move on to the fun part: simulating an election (or rather, simulating 50,000 of them)!

2. Simulating the Election: Round One

Round one of the election is the easy part: we pick two winners among all the candidates (again, the two with the highest vote shares) and plug them into our formula for the second round. Since we already have the polling averages for each competing party (and even some extras, which I won’t get into here), the next steps are pretty simple.

2a. Trial 1

Here I explain how we run a single simulation of the election. Keep in mind that we repeat this 50,000 times, so no single trial carries much weight in the forecast. That’s the idea behind the law of large numbers: with enough trials, the share of trials each candidate wins settles close to their underlying probability, and outliers have a negligible effect.
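The law of large numbers can be made concrete with the standard binomial formula for the sampling error of a simulated probability, sqrt(p(1 - p) / n). This noise is largest when p = 0.5; with 50,000 trials it is about 0.22 percentage points, versus about 1.6 points with 1,000 trials. The helper name below is hypothetical.

```python
import math


def monte_carlo_se(p: float, n_trials: int) -> float:
    """Standard error of a win probability p estimated from n_trials independent trials."""
    return math.sqrt(p * (1 - p) / n_trials)


# p = 0.5 is the worst case, since p * (1 - p) is largest there.
for n in (1_000, 50_000):
    print(n, round(monte_carlo_se(0.5, n), 4))
# 1000 0.0158
# 50000 0.0022
```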

Step 1

First, we randomly vary each party’s vote share to get an idea of the range of possibilities in the election. We do this with the Dirichlet distribution, which generates a random set of vote shares that always sums to 100%. We estimate the range of possibilities for each party using historical polling error data gathered by political scientists Will Jennings and Christopher Wlezien. Since 1965, first-round polls have missed their mark by an average of 2.5%, though the error can be as high as 5.5% or as low as 0%. This error is larger earlier in the campaign and shrinks as election day approaches.

For example, the first step of a trial might look like this (note: for simplicity, the numbers below are made up):

Before random variation:

Trial   Other   Hamon   Fillon   Macron   Le Pen   (vote shares in %)
1       10      15      20       25       30

After one trial’s random variation:

Trial   Other   Hamon   Fillon   Macron   Le Pen   (vote shares in %)
1       12      14      20       22       32
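The variation in the example can be sketched with NumPy’s `Generator.dirichlet`. The post does not give its parameterization, so everything here is an assumption: the Dirichlet parameters are set to k times the polling average, and the concentration k is chosen so that a candidate polling at 25% gets a standard deviation of 2.5 points (loosely treating the 2.5% average error as a standard deviation). The function names are hypothetical; a larger k means less variation.

```python
import numpy as np

CANDIDATES = ["Other", "Hamon", "Fillon", "Macron", "Le Pen"]


def concentration(sigma: float, p_ref: float = 0.25) -> float:
    """Dirichlet concentration k that gives a share of p_ref a standard deviation of sigma.

    With alpha = k * p, each share has variance p * (1 - p) / (k + 1).
    """
    return p_ref * (1 - p_ref) / sigma**2 - 1


def vary_first_round(shares_pct, sigma=0.025, rng=None):
    """Draw one randomly varied set of first-round shares, in percent, summing to 100."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(shares_pct, dtype=float)
    p = p / p.sum()                   # normalize so the mean shares sum to 1
    alpha = concentration(sigma) * p  # the Dirichlet mean is exactly p
    return 100.0 * rng.dirichlet(alpha)


rng = np.random.default_rng(2017)
print(dict(zip(CANDIDATES, vary_first_round([10, 15, 20, 25, 30], rng=rng).round(1).tolist())))
```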

Step 2

Then we simply assign a winner and runner-up to this trial:

Trial   Other   Hamon   Fillon   Macron   Le Pen   Winner   Runner-up   (vote shares in %)
1       12      14      20       22       32       Le Pen   Macron
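Picking the winner and runner-up is just a sort. In this sketch (the name `top_two` is hypothetical), the “Other” column is assumed to pool minor candidates, so it is excluded from advancing.

```python
CANDIDATES = ["Other", "Hamon", "Fillon", "Macron", "Le Pen"]


def top_two(values, names=CANDIDATES, exclude=("Other",)):
    """Return (winner, runner_up) for one trial's varied first-round shares."""
    shares = dict(zip(names, values))
    eligible = [name for name in names if name not in exclude]  # "Other" is not a candidate
    ranked = sorted(eligible, key=shares.get, reverse=True)     # highest share first
    return ranked[0], ranked[1]


print(top_two([12, 14, 20, 22, 32]))  # ('Le Pen', 'Macron')
```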

2b. Trials 2-50,000

We repeat this process 50,000 times to get an idea of the “true” range of possibilities for each candidate, based on their party’s current position and past polling error. You can imagine that our final data looks something like this:

Trial    Other   Hamon   Fillon   Macron   Le Pen   Winner   Runner-up   (vote shares in %)
1        12      14      20       22       32       Le Pen   Macron
…        …       …       …        …        …        …        …
50,000   7       25      16       28       24       Macron   Hamon
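Because the trials are independent, all 50,000 can be drawn at once with NumPy instead of looping. This sketch assumes a Dirichlet parameterization with a hypothetical concentration of k = 299 (a standard deviation of about 2.5 points for a candidate at 25%) and assumes “Other” cannot advance; the function name is hypothetical.

```python
import numpy as np

CANDIDATES = np.array(["Other", "Hamon", "Fillon", "Macron", "Le Pen"])
N_TRIALS = 50_000


def simulate_first_round(shares_pct, k=299.0, n_trials=N_TRIALS, seed=None):
    """Simulate n_trials first rounds; return arrays of winner and runner-up names.

    k is a hypothetical Dirichlet concentration: a larger k means less variation.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(shares_pct, dtype=float)
    p = p / p.sum()
    draws = rng.dirichlet(k * p, size=n_trials)  # shape (n_trials, n_candidates)
    draws[:, CANDIDATES == "Other"] = -np.inf     # "Other" cannot advance
    order = np.argsort(draws, axis=1)[:, ::-1]    # each row: highest share first
    return CANDIDATES[order[:, 0]], CANDIDATES[order[:, 1]]


winners, runners_up = simulate_first_round([10, 15, 20, 25, 30], seed=2017)
names, counts = np.unique(winners, return_counts=True)
print({str(n): int(c) for n, c in zip(names, counts)})
```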

3. Simulating the Election: Round Two

After we have the 50,000 first-round simulations detailed above, it’s time to move on to the second round of the election. Here, the process is much simpler.

3a. Trial 1

Step 1

First, we take the polling average in the two-way matchup between the winner and runner-up of the given election simulation.

Then, we randomly vary those vote shares based on the historical error and current uncertainty of those polls. In 2012, polls were off by less than 2% across the board, so this error is not large, but it is necessary to include. Recall that the first round’s error decreases over time; similar trends appear in polls of the second round.

Historically, though, the most accurate second-round polls were taken around 100 days before election day. Those taken the day before the election have about 3% average error, and as much as 7%(!). This is where uncertainty comes into play.

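We vary these vote shares using a t distribution. With only two candidates, a single random draw is enough, since one candidate’s varied share fixes the other’s (the two always sum to 100%), and the t distribution’s heavier-than-normal tails leave room for the occasional large polling miss.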

Before random variation:

Trial   Macron   Le Pen   (vote shares in %)
1       60       40

After random variation:

Trial   Macron   Le Pen   (vote shares in %)
1       55       45

Step 2

Then, just as in round one, we assign a winner to the trial.

Trial   Macron   Le Pen   Winner
1       55       45       Macron
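A minimal sketch of one runoff trial with NumPy’s `Generator.standard_t`. The post does not give the scale or degrees of freedom of its t distribution, so `sigma=2.0` points and `df=5` are placeholder assumptions. Note that `sigma` scales the draw rather than being its standard deviation (a t draw with 5 degrees of freedom has a standard deviation of about 1.29). The function name is hypothetical.

```python
import numpy as np


def vary_second_round(share_a, sigma=2.0, df=5, rng=None):
    """Randomly vary candidate A's two-way share (percent); candidate B gets the rest.

    sigma (in points) and df are hypothetical settings, not the forecast's values.
    """
    rng = np.random.default_rng() if rng is None else rng
    varied_a = share_a + sigma * float(rng.standard_t(df))
    varied_a = min(max(varied_a, 0.0), 100.0)  # keep the share a valid percentage
    return varied_a, 100.0 - varied_a


rng = np.random.default_rng(7)
macron, le_pen = vary_second_round(60.0, rng=rng)
winner = "Macron" if macron > le_pen else "Le Pen"
print(round(macron, 1), round(le_pen, 1), winner)
```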

3b. Trials 2-50,000

Trial   Macron   Le Pen   Winner
1       55       45       Macron
2       49       51       Le Pen
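Putting round two together: for each trial, look up the polling average for that trial’s two finalists, add t-distributed noise, and record the winner. The matchup numbers below are made-up placeholders (only the 60-40 Macron v. Le Pen figure echoes the example above), and `sigma`, `df`, and the function names are hypothetical.

```python
import numpy as np

# Made-up placeholder averages: the FIRST candidate's two-way share, in percent.
MATCHUPS = {
    ("Macron", "Le Pen"): 60.0,
    ("Fillon", "Le Pen"): 57.0,
    ("Fillon", "Macron"): 45.0,
}


def lookup_share(matchups, a, b):
    """Candidate a's polling average against b, whichever order the pair was stored in."""
    if (a, b) in matchups:
        return matchups[(a, b)]
    return 100.0 - matchups[(b, a)]


def simulate_second_round(finalists, matchups=MATCHUPS, sigma=2.0, df=5, seed=None):
    """Run one runoff per trial and return the list of winners.

    `finalists` holds one (winner, runner_up) pair per first-round trial.
    """
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_t(df, size=len(finalists))  # one t draw per trial
    winners = []
    for (a, b), eps in zip(finalists, noise):
        share_a = lookup_share(matchups, a, b) + eps
        winners.append(a if share_a > 50.0 else b)
    return winners


finalists = [("Le Pen", "Macron"), ("Macron", "Fillon"), ("Fillon", "Le Pen")]
print(simulate_second_round(finalists, seed=2017))
```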

Voilà: we have 50,000 election simulations with winners/runners-up for both the first and second round of the 2017 French election. Finally, let’s determine win probabilities.

4. Tallying Up the Score

This is perhaps the easiest step, as we’re just counting the number of simulations each candidate wins. Then, we divide that number of wins by the total number of simulations (50,000) to get our final win probabilities.
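The tally is a simple count. A minimal sketch using Python’s `collections.Counter`, with a hypothetical function name and made-up trial results:

```python
from collections import Counter


def win_probabilities(winners):
    """Share of trials each candidate wins: wins divided by the total number of trials."""
    counts = Counter(winners)
    n = len(winners)
    return {name: count / n for name, count in counts.most_common()}


print(win_probabilities(["Macron", "Le Pen", "Macron", "Macron"]))
# {'Macron': 0.75, 'Le Pen': 0.25}
```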

That’ll do it for today. Questions? Comments? Concerns? (Hopefully not concerns). Send us a tweet.

Keep your eye on the forecast for updates and new features.