Submitted by **Barra79** t3_11i5hv3
in **dataisbeautiful**

#
**RenegadeMoose**
t1_jaxre3x wrote

Dumb question, but why is the red line between 10km/h and 20km/h plotted so much higher above the dense mass of dots below it?

Shouldn't the red line be coming in a bit lower and angling up a bit steeper along that part of the graph?

( or are all those low-density outliers above the red line causing it to appear higher up? )

#
**Kualityy**
t1_jb0d62m wrote

>or are all those low-density outliers above the red line causing it to appear higher up?

Pretty much this. Least squares regression is sensitive to outliers.
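
A quick way to see the effect, as a minimal sketch on made-up data (the linear trend, the noise level, and the size of the upward offsets are all assumptions, not the OP's dataset): fit the same points with and without a handful of high-only outliers and compare the two lines.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a clean linear trend with small symmetric noise.
x = np.linspace(0, 10, 200)
y_clean = 2.0 * x + rng.normal(0, 0.5, x.size)

# Push 10% of the points upward only, mimicking one-sided outliers.
y_outliers = y_clean.copy()
idx = rng.choice(x.size, size=20, replace=False)
y_outliers[idx] += 30.0

# Ordinary least-squares straight-line fits to both versions.
fit_clean = np.polyfit(x, y_clean, 1)
fit_outliers = np.polyfit(x, y_outliers, 1)

# Average vertical gap between the two fitted lines over the data range.
shift = np.mean(np.polyval(fit_outliers, x) - np.polyval(fit_clean, x))
print(f"mean upward shift of the fitted line: {shift:.2f}")
```

Because a least-squares fit with an intercept always passes through the mean of the y values, the average shift here equals the mean of the added offsets (20 × 30 / 200 = 3), even though 90% of the points did not move.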

#
**Barra79**
OP
t1_jaxsay9 wrote

I'm using a polyfit function set to the third degree: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
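
For anyone following along, a call like that might look roughly as follows. This is only a sketch: the `speed` and `power` arrays are synthetic stand-ins, and the variable names are assumptions rather than the code behind the chart.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the plotted data: speed (km/h) vs. energy output.
speed = rng.uniform(0, 40, 500)
power = np.clip(0.002 * speed**3 + rng.normal(0, 5, speed.size), 0, None)

# Least-squares cubic; coefficients come back highest power first.
coeffs = np.polyfit(speed, power, 3)
cubic = np.poly1d(coeffs)

# Evaluate on an evenly spaced grid to get the points of the fitted curve.
grid = np.linspace(speed.min(), speed.max(), 100)
line = cubic(grid)
```

`np.polyfit` minimises the sum of squared vertical distances, so every point gets a vote weighted by how far it sits from the curve.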

#
**KiwasiGames**
t1_jaykfef wrote

Check your residuals. A third degree polynomial doesn’t look particularly appropriate here.
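
One way to do that check, sketched on synthetic data (the saturating curve, the bin count, and all variable names are assumptions for illustration): subtract the fitted cubic from the observations and look at the average residual in slices along the x axis.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data with a saturating shape that a cubic may not follow well.
x = rng.uniform(0, 40, 2000)
y = 100 / (1 + np.exp(-(x - 20) / 3)) + rng.normal(0, 3, x.size)

coeffs = np.polyfit(x, y, 3)
residuals = y - np.polyval(coeffs, x)

# Mean residual in each of 8 equal-width x bins.
edges = np.linspace(0, 40, 9)
which = np.digitize(x, edges[1:-1])  # bin index 0..7 for every point
bin_means = np.array([residuals[which == k].mean() for k in range(8)])
print(np.round(bin_means, 2))
```

If the model is appropriate, the binned means hover around zero with no pattern. Runs of consistently positive or negative bins mean the curve is systematically too low or too high in that range, which is what the question about the 10 to 20 km/h region was getting at.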

#
**VikThorior**
t1_jb22gkq wrote

As I said under another post you made, don't do a regression if you don't have a model in mind. The model may just be hypothetical, but you must have an explanation as to why you chose this regression in particular, other than "it fits pretty well". A 100th degree polynomial will fit better, and an Nth degree polynomial, with N the number of points, will fit perfectly.
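
A small sketch of that last point, on six made-up points (the data and the `rmse` helper are illustrative assumptions): degree N-1 is already enough to pass through N points with distinct x values, and the training error tells you nothing about whether the curve is sensible.

```python
import numpy as np

rng = np.random.default_rng(3)

# Six hypothetical noisy samples of a straight line.
x = np.arange(6, dtype=float)
y = 2.0 * x + rng.normal(0, 1.0, x.size)

line = np.polyfit(x, y, 1)
wiggly = np.polyfit(x, y, x.size - 1)  # degree N-1 interpolates N points

def rmse(coeffs, xs, ys):
    """Root-mean-square gap between the polynomial and the given values."""
    return np.sqrt(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

print("training RMSE, degree 1:", rmse(line, x, y))
print("training RMSE, degree 5:", rmse(wiggly, x, y))

# Compare against the noise-free line just outside the sampled range.
x_new = np.array([-1.0, 6.0])
print("error off the data, degree 1:", rmse(line, x_new, 2.0 * x_new))
print("error off the data, degree 5:", rmse(wiggly, x_new, 2.0 * x_new))
```

The degree-5 training error is zero up to rounding, because the polynomial is threading the noise rather than describing the relationship.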

Also, the problem you have here is that you have "positive" outliers but you don't have negative outliers for the lowest values, because energy production can't go below 0. So you have a regression which is higher than the truth. You should find a way to identify and eliminate these outliers.
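
One possible way to do that, offered as a rough sketch and not as the right method for this dataset: refit repeatedly and drop points whose residual is far from the bulk, using the median absolute deviation (MAD) as a spread estimate that the outliers themselves barely affect. The function name `fit_with_clipping`, the 3-sigma threshold, and the test data are all assumptions.

```python
import numpy as np

def fit_with_clipping(x, y, degree=3, n_sigma=3.0, max_iter=10):
    """Refit repeatedly, dropping points far from the current curve."""
    keep = np.ones(x.size, dtype=bool)
    for _ in range(max_iter):
        coeffs = np.polyfit(x[keep], y[keep], degree)
        resid = y - np.polyval(coeffs, x)
        # Robust spread: MAD of the kept residuals, scaled so that it
        # matches the standard deviation for Gaussian noise.
        center = np.median(resid[keep])
        sigma = 1.4826 * np.median(np.abs(resid[keep] - center))
        new_keep = np.abs(resid - center) <= n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    # Final fit on the points that survived the last pass.
    return np.polyfit(x[keep], y[keep], degree), keep

# Hypothetical data: a linear trend with 10% high-only outliers.
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 300)
y = 2.0 * x + rng.normal(0, 0.5, x.size)
bad = rng.choice(x.size, size=30, replace=False)
y[bad] += 30.0

coeffs, keep = fit_with_clipping(x, y, degree=1)
print("points dropped:", np.count_nonzero(~keep))
print("slope, intercept:", coeffs)
```

This only works when the bulk of the points really does follow the chosen curve, so it does not remove the need for a model you can justify.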

And if you can't, that's not a problem! We don't need a regression all the time. We see the relationship pretty well, so the red line is not needed. It just shows a model which is obviously wrong for many reasons.

#
**RenegadeMoose**
t1_jaxtf0m wrote

Thanks! Like I said, "dumb question" but I had to ask :)
