Thursday, September 14, 2017

Letdowns After Streaks

I've been telling people for a few days now that the Indians need to lose a game and break their streak, so that they have time to get through the letdown and get back to normal before the playoffs.

But is there really a letdown after a streak? It feels like there is, but maybe that's just an effect of elevated expectations.

I checked the won-lost records of teams in the 11 games after the 10 longest streaks after 1900. Ignoring the game that ended the streak, which by definition has to be a loss, what was the overall record? (Two of the streaks ended very late in the season and there weren't 11 games left.)

It turns out to be .471 (41-46), which is not great baseball, especially for a team strong enough to pull off a long streak. But it's not quite as dire as I imagined.

For the record, the streaks I looked at were

1916 Giants (both the famous 26-game record streak and an earlier 17-game streak that same season)
1935 Cubs
2002 A's
1906 White Sox
1947 Yankees
1904 Giants
1953 Yankees
1907 Giants
1912 Senators

This is a small sample, so take it for what it is.

As I write this the Indians are down 2-1 to KC. We can only hope...

Streaks Part 2

In my last post I estimated the odds of winning streaks of various lengths by simulating a large number of seasons. I came up with a 0.75% chance per season of a win streak of 19 games or longer, but the actual history is 8 such streaks in 137 seasons (6%). That is significantly more than my estimate.

One obvious correction would be to fatten the tails of my team-strength distribution; an "outlier" good team would be more likely to have a long win streak. But my distribution was already uniform - a .450 team is as likely as a .500 team - which is to say it already has very fat tails. (I verified that this reasoning was true with a simulation, because I never trust my statistical intuition.) If I fattened the tails any more, you'd have teams winning 120 and 130 games a season, which never happens.

So I did two things, both based on the fact that a season is not made up of 162 random matchups as my model originally assumed, but of about 50 3- and 4-game series with each series being either all home games or all away games against the same team. That seems like it would increase the likelihood of a streak, because you could line up a bunch of home series against weak teams.

First, I changed the season from 162 individual matchups to 54 3-game series between the same teams. That had basically no effect on the likelihood of a 19-game win streak. Then, I gave the home team a slight edge by increasing its strength 5% and decreasing the visiting team's strength by 5%. This is based on the average home record being about .550 compared to .450 on the road, which I got here. That barely moved the needle. The likelihood of a 19-game win streak was still a little less than 1%, compared to historical experience of 6%.

I didn't include the effects of home stands or road trips; that is, the fact that teams usually play three or four series in a row at home or away instead of cycling between the two. But I don't see that being a big player.

A couple of other possible explanations: streaks may psychologically build on themselves (probably impossible to verify with any rigor, given the luck factor), or team strength may wax and wane during the season instead of being a fixed value throughout. The second effect seems promising because many injuries take a few weeks to heal. The worst teams at any given time probably include good teams that have a lot of injured players; when those players get better, all of a sudden the team is good again. Then there's the streakiness of individual players. I speculate that many slumps are due to players being injured but functional and not telling anyone.

At some point I'll build up my team strengths from player stats, instead of assigning them randomly. Then we'll be cookin' with gas, as they say.

One correction: I said that the 1916 26-game winning streak of the Giants was interrupted by a tie. That is not exactly correct. The "tie" was actually a suspended game that, by the rules of the day, had to be replayed from the start instead of picked up from where it left off as it would be today. They did replay the game (I didn't know this when I made my original remarks) and the Giants won. That's not a tie in my book. So the record really is 26 wins in a row and there should be no asterisk by it.

Tuesday, September 12, 2017

Streak Odds

The simulation I developed to find the effect of luck in baseball can be used to estimate the odds of various streaks. As it happens, the Cleveland Indians are currently sitting on a 19-game winning streak, which is the sixth-longest winning streak since 1880.

In an earlier post I said the all-time longest winning streak was 26 by the Giants in 1916, but it turns out that streak was 27 wins interrupted between wins 15 and 16 by a tie with Pittsburgh. (A tie? According to Retrosheet they finished the top of the 9th tied at 1-1, but the Giants didn't bat in the bottom of the 9th, for unrecorded reasons. I'm guessing it started raining, and then they never completed the game because neither team contended that year.)

The 1916 Giants also had a 17-game winning streak earlier in the season, but they only came in fourth!

What are the odds of any team getting a 19-game win streak or better in a given season? I set up my team strengths as shown in the scatter plot on the left, and then ran 1000 simulated 162-game seasons. The histogram of longest streaks is shown on the right. There were 15 win or loss streaks of 19 or more, so that would be 15/2/1000 = 0.75% chance per season.

The actual number of streaks of 19 or more since 1880 (137 seasons but most were fewer than 162 games) is 8 (6%). So there's a fat tail effect, or something, going on that I'm not accounting for.

A window company in Cleveland offered free window jobs to anyone who bought windows in July, if the Indians had a 15-game winning streak. You got the deal if you bought by July 31, at which time the Tribe had 58 games left to play. What are the odds of a 15-game-or-better winning streak by the Indians in those 58 games?

I calculated it by looking at the longest streak for "Team 1" of my ensemble over 10,000 58-game "seasons". (It took 10,000 simulated seasons to get a stable value.) That streak was 15 or greater just 14 times, so the chance of a 15-game winning streak was 14/2/10,000 = 0.07%. The figure below shows on the left the strength and actual wins over the 58 games for the 30 teams, and on the right the histogram of longest streaks by Team 1 in each season. The caption should read 10,000 seasons, not 1000 seasons.
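For concreteness, here's a sketch of that calculation, with an average (strength-81) stand-in for Team 1 facing opponents drawn fresh each game from a flat distribution (the 53-109 range is an assumption), using the u < S_A / (2 × S_B) win rule from the luck post:

```python
import random

def longest_win_streak(wins):
    """Length of the longest consecutive run of True values."""
    best = cur = 0
    for w in wins:
        cur = cur + 1 if w else 0
        best = max(best, cur)
    return best

rng = random.Random(7)
N = 10_000                 # it took about this many runs to stabilize
hits = 0
for _ in range(N):
    # One 58-game stretch for an average (strength-81) team.
    games = [rng.random() < 81 / (2 * rng.uniform(53, 109))
             for _ in range(58)]
    if longest_win_streak(games) >= 15:
        hits += 1

p = hits / N
print(f"P(15+ game win streak in 58 games) ~ {p:.2%}")
print(f"Expected payout on $2M of sales:    ${p * 2_000_000:,.0f}")
```

The simplifications here (fresh opponents every game, average team strength) will shift the number somewhat relative to the full ensemble simulation.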

I didn't use all the information available. I could have looked only at teams that happened to have 57 wins in their first 104 games (as the Indians did), which would have taken a lot more simulations but probably wouldn't have changed the results much, because 57 wins out of 104 is not much better than average.

As is typical of these kinds of promotions, the window company itself didn't take on the risk of having to pay out. They paid a promotion company, which took the risk. What would have been a fair price to pay the promotion company? They sold about $2 million worth of windows, so the expected payout would be 0.07% x $2 million or $1400. Even if they paid $10,000, that promotion company had to eat a very spicy meatball when the Tribe won their 15th game.

Now, a philosophical excursion. It only makes sense to talk about probability and odds when there is some degree of ignorance. On July 31, everyone was ignorant of how the Indians would actually play, but there were varying degrees of knowledge about their record so far, their injuries, which teams they were scheduled to play, how many home versus away games and other information that a sophisticated model could use to estimate the odds. Given what we know today, what are the odds the Indians would have won their 19th straight last night?


Monday, September 4, 2017


I trained a neural network on the lyrics of a certain popular songwriter and then had it generate a short song:

Got in a little favor for him.
I wanna find one place, I wanna find one face that ain't looking through me.
Down in the U.S.A.
Born in the shadow of the refinery.

I'm a cool rocking Daddy in the face of these....
Whoa whoa whoa badlands!. Whoa whoa whoa badlands!. Whoa whoa whoa whoa whoa badlands!. Whoa whoa whoa.
For the ones who had a woman he loved in Saigon.

I was born in the shadow of the penitentiary.
I was born in the night, with a fear so real, you spend your life just covering up.
Learned real good right now, you better listen to me, baby.
I'm a long gone Daddy in the shadow of the penitentiary.

If you can't figure out whose lyrics I trained the network on, you must not be between the ages of 30 and 80. I used this guy's code.

Are You Ready For Some Football?

tl;dr: The small number of games in the NFL season strongly exaggerates the differences between teams. The NFL rule of scheduling six of a team's games against division rivals would have no effect on the actual results of a season decided by coin flips. But division scheduling very slightly exaggerates differences when real differences already exist.

Major League Baseball teams play 162 games a season, which are clearly enough to separate the truly good teams from the merely lucky ones. In the NFL it's only 16 games. Is that enough to separate the great from the lucky?

First, I ran the same simulation I used in my last two posts but set the number of teams to 32 and the number of games per season to 16. With each game decided by the flip of a fair coin (therefore, no ties), here is one example season (sorry about the formatting, Blogger has a fixed column width):

AFC:
North              South               East                West
Cincinnati 12-4    Indianapolis 9-7    NY Jets 14-2        Denver 8-8
Cleveland 12-4     Tennessee 9-7       New England 9-7     LA Chargers 8-8
Pittsburgh 9-7     Jacksonville 8-8    Miami 5-11          Oakland 7-9
Baltimore 7-9      Houston 6-10        Buffalo 7-9         Kansas City 3-13

NFC:
North              South               East                West
Minnesota 11-5     Atlanta 11-5        Philadelphia 8-8    Arizona 10-6
Chicago 10-6       Tampa Bay 8-8       Washington 7-9      Seattle 9-7
Detroit 7-9        Carolina 7-9        Dallas 7-9          LA Rams 6-10
Green Bay 6-10     New Orleans 4-12    NY Giants 6-10      San Francisco 6-10

Two things you notice right away are that there seem to be too many teams within a game of .500 (7-9, 8-8, or 9-7), and that there isn't enough separation between the teams in most of the divisions. There are 17 teams within one game of .500, but in 2016 there were actually only 11 such teams. And in two divisions, no team is more than two games from .500. That's unusual.

Obviously, if I assigned unequal strengths to the teams, this would tend to create some separation. But there is another thing that might work. In my simulation, the schedule ignores divisions. That is,  each of the 16 games a team plays is a random matchup with one of the other 31 teams. The Browns are as likely to play the Saints as they are the Steelers. But in the real NFL, 

1. A team plays its division rivals twice.
2. A team plays all four teams in another division in its conference once.
3. A team plays all four teams in another division in the other conference once.
4. A team plays its remaining two games against teams from the two remaining divisions in its conference.

Rule 1 seems like it might be important in creating separation within a division. In effect, 3/8 of the season is played between just four teams, and each of those games separates two teams in a division by one game. There is a 100% chance of creating a one-game separation. In contrast, when two teams play opponents outside the division, there's a 50% chance of a one-game separation (one team wins, one loses) and a 50% chance of no separation (both win or both lose).

I almost bought that argument. But when the games are decided by coin flips, the expectation value of separation per game is still zero regardless of the number of teams. If that doesn't convince you, consider that in a simulation of 1000 seasons, the coefficient of variation of wins per team was 0.4065 for a 6-game, 4-team season and 0.4093 for a 6-game, 32-team season - not a statistically significant difference.
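Here's a sketch of that coin-flip check: a 6-game, 4-team double round robin (the division-style schedule) versus six random-pairing games among 32 teams, comparing the CV of wins. The pairing details are my own simplification:

```python
import itertools
import random
import statistics

def cv(xs):
    """Coefficient of variation: population std dev over mean."""
    return statistics.pstdev(xs) / statistics.mean(xs)

rng = random.Random(3)

# (a) 4-team league, 6-game season: each team plays the other three
# twice, every game a fair coin flip.
wins_a = []
for _ in range(1000):
    w = [0] * 4
    for i, j in itertools.combinations(range(4), 2):
        for _ in range(2):
            w[i if rng.random() < 0.5 else j] += 1
    wins_a.extend(w)

# (b) 32-team league, 6-game season: six rounds of random pairings,
# one coin-flip game per team per round.
wins_b = []
for _ in range(1000):
    w = [0] * 32
    for _ in range(6):
        order = list(range(32))
        rng.shuffle(order)
        for k in range(0, 32, 2):
            w[order[k] if rng.random() < 0.5 else order[k + 1]] += 1
    wins_b.extend(w)

print(f"CV, 4-team division schedule: {cv(wins_a):.4f}")
print(f"CV, 32-team random schedule:  {cv(wins_b):.4f}")
```

Both land near 0.41, the CV of a Binomial(6, 0.5) win count, consistent with the argument that coin-flip games create no extra separation.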

Anyway, I re-ran the simulation continuing to decide games by coin flips but taking into account Rule 1. Here's how it came out:

AFC:
North              South               East                West
Cincinnati 12-4    Jacksonville 11-5   Miami 12-4          Oakland 10-6
Baltimore 9-7      Houston 10-6        New England 11-5    Kansas City 8-8
Pittsburgh 6-10    Indianapolis 8-8    NY Jets 7-9         Denver 5-11
Cleveland 5-11     Tennessee 6-10      Buffalo 8-8         LA Chargers 4-12

NFC:
North              South               East                West
Detroit 9-7        Tampa Bay 11-5      Philadelphia 10-6   San Francisco 9-7
Green Bay 7-9      Carolina 10-6       NY Giants 7-9       Seattle 8-8
Chicago 7-9        Atlanta 8-8         Washington 7-9      LA Rams 7-9
Minnesota 6-10     New Orleans 7-9     Dallas 6-10         Arizona 5-11

It made very little difference. There are now only 15 teams within one game of .500, but there are still two tightly bunched divisions. 

What happens if we assign random team strengths instead of just flipping a coin? I'll just base it on the CV. For uniformly distributed team strengths between 4 wins/season and 12 wins/season, the CV of wins per team for a league without divisions (no Rule 1) was 0.46 in 1000 simulated seasons. With Rule 1, it was 0.49.  So Rule 1 does exaggerate the differences between teams when a real difference already exists. But it's a weak effect. 

The range of 4-12 wins/season for team true strengths seems about right. So now you want to see the cloud plot of wins for the NFL. Here it is for 100 simulated seasons:

The scatter in wins per season is huge. An average team wins anywhere from 3 to 13 games a season. And the plot of "luck ratio" on the right is cleaner than it was for baseball and clearly shows that there's more random variation in wins for weaker teams than for stronger ones. 

Sunday, September 3, 2017

More Baseball Simulations

One question raised by yesterday's post is what shape the distribution of true strengths has. Is it a bell curve, a uniform distribution, or something in between?

We can't answer that question directly, because we can never observe the true strengths, only the actual win-loss records. But the shape of the true strength distribution might have an effect on the shape of the actual distributions of wins per season, which we can observe.

If I assume the following bell curve for true strength

then I get the following distribution of wins per season (this was over 137 seasons for reasons I'll explain later):

But if I assume the following flat distribution of strengths:

then I get this distribution of wins per season:

This example looks "blockier" than the one from the bell curve, but in fact its coefficient of variation is 0.12, compared to 0.13 for the bell curve result. So it's not really possible to tell from the actual outcomes whether the distribution of true strengths is bell-shaped or flat - and if you can't tell, then it doesn't matter, at least for the purpose of predicting the distribution of wins.
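As a sketch of the comparison (my simplified random-pairing schedule and role assignment may shift the absolute CV a bit, but the bell-versus-flat contrast is the point):

```python
import random
import statistics

def season_wins(strengths, rng):
    """162 rounds of random pairings; A beats B when u < S_A / (2 * S_B)."""
    n = len(strengths)
    wins = [0] * n
    for _ in range(162):
        order = list(range(n))
        rng.shuffle(order)
        for k in range(0, n, 2):
            a, b = order[k], order[k + 1]
            if rng.random() < strengths[a] / (2 * strengths[b]):
                wins[a] += 1
            else:
                wins[b] += 1
    return wins

def cv_of_wins(draw_strengths, n_seasons, rng):
    """CV of per-team wins pooled over simulated seasons."""
    all_wins = []
    for _ in range(n_seasons):
        all_wins.extend(season_wins(draw_strengths(rng), rng))
    return statistics.pstdev(all_wins) / statistics.mean(all_wins)

rng = random.Random(5)
sd = 0.2 * 81                  # normal draw: mean 81, CV 0.2
half = sd * 3 ** 0.5           # half-width of the sd-matched flat draw
bell = lambda r: [max(1.0, r.gauss(81, sd)) for _ in range(30)]
flat = lambda r: [r.uniform(81 - half, 81 + half) for _ in range(30)]

cv_bell = cv_of_wins(bell, 50, rng)
cv_flat = cv_of_wins(flat, 50, rng)
print(f"CV of wins, bell-curve strengths: {cv_bell:.3f}")
print(f"CV of wins, flat strengths:       {cv_flat:.3f}")
```

The two CVs come out close to each other, which is the "can't tell them apart from the outcomes" result.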

The histogram of actual wins for the last six MLB seasons is

and its CV is 0.135. You could probably do some more sophisticated tests, but in my experience doing this kind of modeling, if a result isn't apparent to the eye, no fancy test is going to be convincing.

One thing that's interesting about the actual MLB histogram is the "dip" in the middle. This could just be random chance and might go away if more seasons were included, but it could be that as the season goes on, talent tends to drain from the weaker teams and go to the stronger teams, which could make the win histogram bimodal. Teams that have big payrolls but are out of the playoff hunt by August are often looking to unload what talent they have to the teams that are going to make a run for October, so bad teams get worse and good teams get better.

I ran 137 seasons because I wanted to get some statistics on win streaks. Here's a histogram of the longest win streak by any team during each of the 137 simulated seasons:

This distribution is definitely skewed. Its mode (the commonest value) is 12, but in no season was the longest streak less than 10. There are 22 streaks of 16 or longer, and the longest streak in all 137 simulated seasons is 26. This is not far from reality. In the past 137 seasons, there have been 30 MLB streaks of 16 games or longer, and the all-time longest streak during that span was by the New York Giants of John McGraw, who won 26 in a row in a ridiculous September 101 years ago.

Saturday, September 2, 2017

The Role of Luck in Baseball

In Major League Baseball, there is decent parity. The spread between the worst and best teams in baseball right now runs from the 51-83 (.381) Phillies to the 92-41 (.692) Dodgers. In contrast, the worst and best teams in the NBA last year were .244 (Brooklyn) and .817 (Golden State), and the worst and best teams in the NFL were .063 (Cleveland, eeegh) and .875 (New England).

There is an element of luck in every game. When the Phillies play the Dodgers, the Dodgers will probably win, but nobody is really shocked if the Phillies pull one out. Maybe the Dodgers stayed out too late the night before, or had a rough flight to Philly.

But in the long run, the "better" team will beat the "worse" team more often than not. I put better and worse in quotes because I haven't exactly defined a team's true strength yet. Here is my definition: the true strength of a baseball team is the average number of wins it would get over an infinite number of seasons. That way, the effect of luck washes out completely. For example, an average team would get 81 wins per 162 games, if they played forever. By forever, I mean the same roster, at the same age and skill level, playing hypothetical repeated seasons forever. Obviously, they aren't getting older and older in these hypothetical seasons, as they would in real life.

Considering the effect of luck, you can see how the shortness of the NFL season (16 games) might tend to exaggerate differences between teams. The Browns clearly suck, but over a large set of seasons they might average 2 or 3 wins instead of the single win they got last year.

How does luck affect the number of wins a baseball team gets in one season, compared to its true strength? The baseball season has 10 times as many games as the NFL season, so the effect of luck should be a lot less than in the NFL. I ran some simulations to find out.

I ran 100 full seasons where 30 teams play each other in random matchups for 162 games. At the beginning of each season, I assign true strengths to the teams from a normal distribution with a coefficient of variation of 0.2. That results in true strengths running from about 40 to about 120 expected wins per season. Then I run through all 162 x 15 = 2,430 games per season. (Remember that on each game day, 30 teams play a total of 15 games.)

Each game goes like this: I draw a number from a uniform distribution between 0 and 1. If that number is less than

Team A's strength / (2 * Team B's strength)

then Team A wins. Otherwise, Team B wins. From this formula you can verify that if Team A has strength 90, and plays average teams (strength of 81) over and over, then Team A will win an average of 90 games per season in the long run. So this satisfies my definition of the team's true strength.
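In code, the game rule looks like this (a minimal sketch; note that for extreme mismatches the ratio exceeds 1, which simply caps the win probability at a sure thing):

```python
import random

def play_game(strength_a, strength_b, draw):
    """Team A wins if a uniform draw falls below S_A / (2 * S_B).

    For big mismatches (e.g. 120 vs 40) the ratio exceeds 1, so the
    stronger team wins every time.
    """
    return draw() < strength_a / (2 * strength_b)

# Sanity check from the text: a strength-90 team against average
# (strength-81) opponents should win 90/162 of its games in the long run.
rng = random.Random(0)
wins = sum(play_game(90, 81, rng.random) for _ in range(200_000))
print(wins / 200_000 * 162)  # should be close to 90
```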

But the outcome of each game has an element of chance. Drumroll, please...

From left: Distribution of team strengths, actual wins versus strength for all teams and seasons, and ratio of actual to expected wins for all teams and seasons

When the team strengths are normally distributed, an average team (81 wins per season averaged over infinite seasons) won as few as 65 and as many as 95 games during the 100 simulated seasons. That's the difference between first and last place. The plot of actual divided by expected wins was a check: it should average to 1 for all strength values, which it does, except for the very weak teams (not sure what's going on there, maybe a problem with my random number generator). But it shows that the scatter is bigger for weak teams. That is, it's more likely for a weak team to do unexpectedly well or unexpectedly poorly than for a strong team. That is good - it means luck plays the smallest role for the strongest teams, which are the ones that get the glory. If a really crappy team gets lucky, it probably still won't be enough to affect a championship.

I then repeated the simulation but instead of choosing normally distributed strengths, I chose them from a uniformly random distribution on an interval. I set the interval width such that the standard deviation of the uniform distribution matched that of the normal distribution used previously.

Uniformly random draw of team strengths and the resulting actual wins and "luck ratio" versus team strength

In this simulation, the scatter was a little smaller, as might be expected. A team of average true strength (81 wins expected) got between maybe 68 and 90 wins over the 100 simulated seasons. It looks like the actual/expected plot shows the same narrowing of the scatter as team strength increases, but it's hard to say. 

With all the strengths set equal to 81 (average), the outcome of each game is essentially decided by a coin flip; any team that wins more than 81 games does so purely through luck. In this case I found that, on average, the winningest team had 90-95 wins per season, which is a very solid year. This suggests that in any given season, one of the division champions is probably a complete fluke. It took a large number of seasons (more than 10,000) to get a stable value for this number, and I didn't have the patience to narrow it down further. The type of distribution used for team strengths didn't seem to matter.
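The all-coin-flip case is easy to sketch (the random-pairings-per-game-day schedule is my assumption):

```python
import random
import statistics

def winningest(rng, teams=30, games=162):
    """One all-coin-flip season: random pairings each game day.
    Returns the win total of the season's luckiest team."""
    wins = [0] * teams
    for _ in range(games):
        order = list(range(teams))
        rng.shuffle(order)
        for k in range(0, teams, 2):
            wins[order[k] if rng.random() < 0.5 else order[k + 1]] += 1
    return max(wins)

rng = random.Random(9)
maxes = [winningest(rng) for _ in range(300)]
print(f"Average wins by the luckiest team: {statistics.mean(maxes):.1f}")
```

Even 300 seasons gives a noisy average; as noted above, it takes 10,000+ to pin the number down.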

You could do all kinds of things with this simulation - and I'm sure serious gamblers do. For example, you could estimate the likelihood of a 10-game winning streak and then try to find someone to bet against who underestimated the true odds. With a lot of bets like that, I suspect you could make money consistently. But that's a suspicion I probably shouldn't pursue until my kids are out of college.