Submitted by **adebar** t3_z049lt
in **dataisbeautiful**

## Comments

#
**adebar**
OP
t1_ix3gu51 wrote

Have you heard of /r/wallstreetbets? You'd be right at home there. 😁

#
**pinkshirtbadman**
t1_ix3hh6l wrote

You laugh now, but when my $3.87 pays off...

#
**adebar**
OP
t1_ix5cmo2 wrote

Please promise me to YOLO any Costa Rica winnings into deep OTM calls 😁

#
**wesblog**
t1_ix3m749 wrote

Didn't Costa Rica beat Brazil? Or was it Germany? In the last Olympics?

#
**pinkshirtbadman**
t1_ix6bbtv wrote

I mean, this thread is pretty dang close to 100% of my soccer knowledge, so I'm the wrong guy to ask...

#
**islayblog**
t1_ix3n16w wrote

Greece, anyone?

#
**Naive-Kangaroo3031**
t1_ix3xuus wrote

50% for the USA is a bit generous.

#
**adebar**
OP
t1_ix4lx6i wrote

50% for the USA is for them to qualify for the round of 16. All they need is one win (most likely against Iran) and then a draw (e.g. against Wales). 50% could be in the right ballpark.

Also, to clarify where the probabilities come from: The probabilities are the implied probabilities from the betting markets for each of those events (p = 1 / odds). This is purely based on the quotes from the market and does not involve any modelling/forecasting.

#
**Naive-Kangaroo3031**
t1_ix4sm35 wrote

You're right, I misread it as 50% to reach the final. That makes much more sense. Very good work

#
**adebar**
OP
t1_ix3g76o wrote

The underlying data is aggregated from Betfair. Data analysis was done in Python and I used Vega/Altair for the visualisation.

EDIT: Just to clarify where the probabilities come from: The probabilities are the **implied probabilities from the betting markets** for each of those events (p = 1 / odds). This is purely based on the quotes from the market and does not involve any modelling/forecasting.
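
A minimal sketch of that conversion, assuming the quotes are decimal odds. The function name and the odds values are made up for illustration and are not taken from the Betfair data:

```python
def implied_probability(decimal_odds: float) -> float:
    """Convert decimal odds to an implied probability via p = 1 / odds."""
    if decimal_odds < 1.0:
        # Decimal odds below 1 would imply a probability above 100%.
        raise ValueError("decimal odds must be at least 1")
    return 1.0 / decimal_odds


# Hypothetical odds, for illustration only (not real market quotes).
example_odds = {"Team A": 2.0, "Team B": 4.0, "Team C": 5.0}

implied = {team: implied_probability(odds) for team, odds in example_odds.items()}
print(implied)  # {'Team A': 0.5, 'Team B': 0.25, 'Team C': 0.2}
```

So odds of 2.0 for an event (say, reaching the round of 16) correspond to an implied probability of 50%.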

#
**kungfupandey123**
t1_ix3xggy wrote

Nice analysis, and I'd say pretty realistic too.

Care to share the data and Python code too?

#
**adebar**
OP
t1_ix8q5j4 wrote

Sure, I just hacked together a blog for myself and you can find it there:

The data is available as a .csv file and you can also see the Python/altair code there.

#
**Derpthinkr**
t1_ix4gk4l wrote

Canada finishing so far behind the USA and Mexico. What data do these models use? 5-year histories?

#
**adebar**
OP
t1_ix4lzg2 wrote

Just to clarify where the probabilities come from: The probabilities are the implied probabilities from the betting markets for each of those events (p = 1 / odds). This is purely based on the quotes from the market and does not involve any modelling/forecasting.

#
**[deleted]**
t1_ix3ya3m wrote

[removed]

#
**phdoofus**
t1_ix4ft1x wrote

How about past initial predictions vs outcome?

#
**Ravmagn**
t1_ix3ogbt wrote

I see Netherlands going far in this tournament and a more likely candidate to win than Brazil.

#
**adebar**
OP
t1_ix4m1gk wrote

Just to clarify where the probabilities come from: The probabilities are the implied probabilities from the betting markets for each of those events (p = 1 / odds). This is purely based on the quotes from the market and does not involve any modelling/forecasting.

#
**wwarnout**
t1_ix3lkok wrote

Does that take into account the teams that are paid to throw the game?

#
**pinkshirtbadman**
t1_ix3gd2a wrote

Putting all my money on Costa Rica