<div style="text-align: left;"><span style="font-family: verdana;"><b>Sabermetric Research</b>, by Phil Birnbaum</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /><b>1961 Yankees fielding is double-counted against Whitey Ford</b> (posted 2023-03-18, updated 2023-03-20)</span></div><div style="text-align: left;"><span style="font-family: verdana;">Here are the 1961 pitching lines of Whitey Ford and Jack Kralick that </span><a href="https://www.billjamesonline.com/jack_kralick/" style="font-family: verdana;">Bill James wrote about</a><span style="font-family: verdana;"> back in 2019:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b> IP W-L H K BB ERA <br />Ford 283 25- 4 242 209 92 3.21 <br />Kralick 242 13-11 257 101 97 3.61 </b> </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Kralick's season is decent, but clearly no match for Whitey, who has him beat in every category.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But, surprisingly, Baseball Reference has Kralick <a href="https://www.baseball-reference.com/leagues/AL/1961-value-pitching.shtml#players_value_pitching::18" target="_blank">leading the American League</a> with a WAR of 6.0. Whitey Ford, on the other hand, is 12th with only 3.7 WAR. </span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b> IP W-L H K BB ERA <span style="color: #cc0000;">WAR</span><br />Ford 283 25- 4 242 209 92 3.21 <span style="color: #cc0000;">3.7</span><br />Kralick 242 13-11 257 101 97 3.61 <span style="color: #cc0000;">6.0</span></b></span><br /><br /><span style="font-family: verdana;">What's going on?</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The answer, I think, is that the fielding and park adjustments WAR uses are overinflated. That's for a future post. 
This post is about how, while trying to figure out what happened, I think I found an issue with the adjustments that turns out to be randomly specific to Whitey Ford's 1961 Yankees.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-----</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />B-R uses <a href="http://www.baseballprojection.com/defense/home.htm">Sean Smith's "Total Zone Rating"</a> (TZR) to estimate fielding and calculate defensive WAR (dWAR). For seasons before 1989, TZR is based on Retrosheet data, which, for most games, includes information on the type and location of balls in play (BIP). Basing dWAR on TZR means that for the team as a whole, the defensive evaluation is roughly equivalent to what you'd see just looking at batting average on balls in play (BABIP). </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />There are other factors too -- baserunner advancement, catcher arm -- but it's mostly BABIP. To confirm that, I ran a regression to predict dWAR based on BABIP (compared to league) with a dummy variable for franchise (which Sean's website says TZ adjusts for to take outfield park variation into account). For the years 1960-73, the correlation was high (r-squared .77), with the coefficient of BABIP coming out very close to the win value of turning outs into hits. For the 14 Yankee seasons specifically, the correlation was over 0.9.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So it seems like most of dWAR up to 1988 is BABIP.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-----</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In 1961, the Yankees allowed an opposition BABIP of .261, <a href="https://www.baseball-reference.com/leagues/AL/1961-advanced-pitching.shtml#teams_advanced_pitching">compared to the AL average .275</a>. 
That's an advantage of .014, or "14 points". </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Yankee pitchers allowed 4414 balls in play that year. So the extra .014 represents about 62 hits turned into outs. I use 0.8 as the run value of each of those outs, so that's 49.4 runs. Call it 50 for short. <br /><br />-----</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />As an aside: Baseball Reference has the 1961 Yanks at 72 runs, not 50. Why such a big difference? I'm not sure. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Maybe they use MLB instead of AL as their baseline. That would add 14 runs or so, because in 1961 the <a href="https://www.baseball-reference.com/leagues/majors/1961-advanced-pitching.shtml#teams_advanced_pitching">BABIP for both leagues combined</a> was .279 instead of the AL-only .275. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />It could also be a result of TZR not including popups and line drives in its evaluation (because presumably there's not much difference in fielding those). It also adds measures of outfielder arms (by looking at baserunner advancement), double play ability, and caught stealings for catchers. And there's that franchise adjustment. All those might contribute to the difference.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But there's another anomaly in the raw data. The MLB average for the 1961 season was almost +12 defensive runs per team. You'd think the average would have to be zero, by definition. It could be that the system uses an average based on a large number of seasons, and 1961 just happened to be a great year for fielding. But that doesn't seem like it could be the answer. For 1959-1967, the MLB total defensive runs saved is positive every one of those nine seasons. 
Then, it switches over: from <a href="https://www.baseball-reference.com/leagues/majors/1968-standard-fielding.shtml" target="_blank">1968</a> to 1975, every season total is negative. It doesn't seem plausible to me that MLB spent nine years with good fielders, then the next eight with worse fielders.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Whatever it is ... over all 17 years, 1961 is the biggest outlier. The <a href="https://www.baseball-reference.com/leagues/majors/1961-standard-fielding.shtml#teams_standard_fielding">total fielding runs for 1961</a> is +214, which is +11.9 per team. The next highest, 1960, is only +146. None of the other positives break 100, and the largest negative is -71 in 1971.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Anyway, that still isn't my main point; it's just something that I noticed.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />OK, now the interesting part.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />As I said, the 1961 Yankees' opponents' BABIP was 14 points lower than the league. All things being equal, you'd expect the fielding to be equally good at home and on the road -- about the same 14 points either way. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But the Yankees' BABIP advantage was much higher at home. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Overall, <a href="https://www.baseball-reference.com/leagues/split.cgi?t=p&lg=AL&year=1961#hmvis" target="_blank">AL teams</a> were 8 points better at home than on the road. 
But the <a href="https://www.baseball-reference.com/teams/split.cgi?t=p&team=NYY&year=1961#hmvis">Yankees</a> were 37 points better:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b> NYY AL<br />------------------<br />home .242 .271<br />road .279 .279</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b>------------------</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b>diff .037 .008</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />On the road, the Yankee fielders were the same as the AL average, holding opponents to a .279 BABIP. At home, though, they were 29 points better than the league.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So, in effect, all 50 runs the Yankees fielders saved via BABIP were saved at Yankee Stadium.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Why does that matter? Because those 50 runs are going to be *double counted* against Yankee pitchers and their WAR.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />First, those 50 runs will be attributed to the skill of the Yankee fielders. Whitey Ford's WAR will drop, because it appears the fielders behind him were responsible for turning so many of his balls in play into outs.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Second, those same 50 runs are going to be used in calculating the Park Factor, which is based on actual runs scored (by both teams). With 50 fewer runs scored at Yankee Stadium because of BABIP, and no fewer runs scored on the road, the calculation will implicitly attribute those 50 runs to the park and the park factor will drop. 
Whitey Ford's WAR will drop again because he pitches in a park where it appears to be easier to prevent runs.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The adjustment for BABIP is made twice: the first time it's attributed to the fielders, and the second time it's attributed to the park. But it can't be both. At least not fully both -- it could be 50% fielding and 50% park, but the WAR method treats it as 100% fielding and 100% park.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Specifically, according to Baseball Reference, the dWAR calculation credits the Yankee defense with <a href="https://www.baseball-reference.com/players/f/fordwh01-pitch.shtml#pitching_value" target="_blank">0.43 runs per game behind Whitey Ford</a>. Over Whitey's 283 innings, that's 13.5 runs. At 10 runs per win, that's 1.35 WAR. At 8.5 runs per win -- which is what B-R seems to be using for the 1961 American League -- it's 1.6 WAR.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Whitey is being adjusted, implicitly, by 3.2 WAR instead of 1.6. Turning that double-counting back into single-counting would raise Whitey from 3.7 to 5.3, which seems much more reasonable for his performance.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">-----</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Except ... not quite. My calculation assumed that park factor is based on a single season's runs. It's actually the average of three seasons -- in this case, 1960, 1961, and 1962.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So only a third of the Yankee defense is being double-counted in 1961. That means you'd only adjust Whitey for 0.5 WAR, not the full 1.6. 
That brings him only to 4.2.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />However, the other 1.0 WAR or so is still double-counted: it's just that one third of the total moves to 1960, and another third to 1962. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That means that Whitey will be shorted 0.5 WAR in 1960, and again in 1962. If you fixed that, his 1960 would go from 2.0 to 2.5, and his 1962 from 5.1 to 5.6.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Over Whitey's career, though, the Yankees' overall home BABIP (compared to road) will indeed wind up double-counted towards his total WAR (with the exception of his first two and last two years, which will be "1.3-counted" or "1.6-counted").</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I think this is something that will happen all the time, if my understanding of how pitching WAR is calculated is correct. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">Any difference between home and road fielding will be counted as part of the park factor adjustment in addition to being counted as a fielding adjustment. If BABIP is better at home, the pitcher will be debited twice. If BABIP is worse at home, the pitcher will be credited twice.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />BABIP is, like any other stat, subject to random variation. By my calculation, the SD of luck for home-minus-road BABIP is about 14 points, or 24 runs for a team-season. That's a lot. Whitey Ford pitched about 19.5 percent of his team's innings in 1961, so the SD of his random BABIP luck is about 5 runs. 
(The 1961 number looks like it was double that, or 2 SD, assuming no park effects. Whitey was double-counted by about 10 runs by raw BABIP, and a little more than that by the dWAR calculation.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Now, dWAR does remove popups and line drives from consideration ... that will reduce the luck SD (I'm not sure how much) compared to raw BABIP. But even if the SD drops from 0.5 WAR to 0.4, or something, that's still pretty big.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We could just adjust dWAR for this (the BABIP home/road numbers are readily available on B-R). But I think the adjustments for fielding and park are exaggerated in other ways -- as I wrote about in previous sets of posts. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />A different algorithm for adjusting pitcher WAR -- where we regress both fielding and park to the mean by significant amounts -- might reduce the double-count enough that we won't really need to make a correction for it. 
It will probably adjust both Whitey Ford and Jack Kralick enough that Whitey winds up on top, although I haven't checked that in detail yet.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I'll work on that for the next post.<br /><br /><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /><br /></span><br /></div><div style="text-align: left;"><span style="font-family: verdana;"><b>Home field advantage is naturally higher in a hitter's park</b> (posted 2022-11-16)<br /><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">The Rockies have always had a <a href="https://blogs.fangraphs.com/the-rockies-are-historically-road-averse/">huge home-field advantage</a> (HFA) at Coors. From 1993 to 2021, Colorado has played .545 at home, but only .395 on the road. That's the equivalent of the difference between going 89-73 and 64-98. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Why such a big difference? I have some ideas I'm working on, but the most obvious one -- although it's not that big, as we will see -- is that higher scoring naturally, mathematically, leads to a bigger HFA.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />When teams play better at home than on the road -- for whatever reason -- the manifestation of "better" is in physical performance, not winning percentage as such. The translation from performance to winning percentage depends on the characteristics of the game. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In MLB, historically, the home team plays around .540. 
But if the commissioner decreed that now games were going to be 36 innings long instead of 9, the home advantage would roughly double, with the home team now winning at a .580 pace.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />(Why? With the game four times as long, the SD of the score difference by luck would double. But the home team's run advantage would quadruple. So the run differential by talent would double compared to luck. Since the normal distribution is almost linear at such small differences (roughly, from 0.1 SD to 0.2 SD), HFA would approximately double.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But it's not *always* that a higher score number increases HFA. If it was decided that all runs now count as 2 points, like in basketball, scoring would double, but, obviously, HFA would stay the same. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Roughly speaking, increased scoring increases the home advantage only if it also increases the "signal to noise ratio" of performance to luck. Increasing the length of the game does that; doubling all the scores does not.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In 2000, Coors Field increased scoring by about 40%. If that forty percent was obtained by increasing games from 9 innings to 13 innings, HFA would be around 20% higher. If the forty percent was obtained by making every run count as 1.4 runs, HFA would be 0% higher. 
In reality, the increase could be anywhere between 0% and 20%, or beyond.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We probably have the tools available to get a pretty good estimate of the true increase.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Let's start with the overall average HFA. My subscription to Baseball Reference allowed me to obtain <a href="https://stathead.com/baseball/split_finder.cgi?request=1&match=seasons&order_by_asc=0&order_by=HR&year_min=1980&year_max=2022&split_1=locat%3Ahmvis&split_total_comp=gt&class=team&type=b&combine_lg=M">home and road batting records</a>, all teams combined, for the 1980-2022 seasons:</span></div><div style="text-align: left;"><br /><p style="text-align: left;"><span style="font-family: IBM Plex Mono;"><span style="color: #990000;"><b> AB H 2B 3B HR BB SO<br /></b></span><span style="color: #990000;"><b>------------------------------------------------------<br /></b></span><span style="color: #990000;"><b>home 3209469 846723 161290 19928 95790 321178 612545<br /></b></span><span style="color: #990000;"><b>road 3363640 859813 163954 17203 96043 308047 668363</b></span></span></p></div><div style="text-align: left;"><span style="font-family: verdana;"><br />What's the run differential between those two batting lines? We can look at actual runs, or even the difference in run statistics like Runs Created or Extrapolated Runs. But, for better accuracy, I used Tom Tango's on-line Markov Calculator (the version modified by Bill Skelton, found <a href="http://tangotiger.net/markov_wes.html">here</a>). 
It turns out the home batting line leads to 4.79 runs per nine innings, and the road batting line works out to 4.36 R/9.</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: IBM Plex Mono; font-size: x-small;"><b> AB H 2B 3B HR BB SO R/9<br />-------------------------------------------------------------<br />home 3209469 846723 161290 19928 95790 321178 612545 4.79<br />road 3363640 859813 163954 17203 96043 308047 668363 4.36<br />-------------------------------------------------------------<br />difference 0.43</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That's a difference of 0.43 runs per game. Using the rule of thumb that 10 runs equals one win, a rough estimate is that the home team should have a win advantage of 0.043 wins per game, for a winning percentage of .543. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That's a pretty good estimate -- home teams actually went .539 in that span (51832-44409). But, we'll actually need to be more accurate than that, because the "10 runs per win" figure will change significantly for higher-scoring environments such as Coors. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So let's calculate an estimate of the actual runs per win for this scoring environment.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The Tango/Skelton Markov calculator includes a feature where, given the batting line, it will show the probability of a team scoring any particular number of runs in a nine-inning game. 
Here's part of that output:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: IBM Plex Mono;"><b> home road<br />----------------------<br />2 runs: .1201 .1342<br />3 runs: .1315 .1404<br />4 runs: .1282 .1309</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />From this table, which actually extends from 0 to 30+ runs, we can calculate how many runs it would take for the road team to turn a loss into a win.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Case 1: If the road team is tied after 9 innings, it has about a 50% chance of winning. With one additional run, it turns that into 100%. So an additional run in a tie game is worth half a win.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />How often is the game tied? Well, the chance of a 2-2 tie is .1201*.1342, or about 1.6%. The chance of a 3-3 tie is .1315*.1404, or 1.8%. Adding up the 2-2 and the 3-3 and the 0-0 and the 1-1 and the 4-4 and the 5-5, and so on all the way down the line, the overall chance is 9.7%.<br /> <br />Case 2: If the road team is down a run after 9 innings, it loses, which is a 0% chance of winning. With one additional run, it's tied, and turns that into a 50% chance. So, an additional run there is also worth half a win.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />How often is the road team down a run? Well, the chance of a 3-2 result is .1315*.1342, or about 1.8%. The chance of 4-3 is .1282*.1404, another 1.8%. And so on.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The total: a 9.54% chance the road team winds up losing by one run.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />What's the chance that the additional run will give the *home* team the extra half win? 
We can repeat the calculation, but instead of 3-2, we'll calculate 2-3. Instead of 4-3, we'll calculate 3-4. And so on.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The total: only 8.54%. It makes sense that it's smaller, because the better team is less likely to be behind by a run than ahead by a run.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We'll average the home and road numbers to get 9.04%. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So, we have:</span></div><div style="text-align: left;"><br /><span style="color: #0b5394; font-family: IBM Plex Mono;"><b>9.7% chance of a tie<br />9.0% chance of being behind by one run<br />----------------------------------------------<br />18.7% chance that a run will create half a win</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Converting that 18.7% chance to R/W:</span></div><div style="text-align: left;"><br /><span style="color: #0b5394; font-family: IBM Plex Mono;"><b> 0.187 half-wins per run <br />= 5.35 runs per half-win <br />= 10.7 runs per win</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So, we'll use 10.7 runs per win for our calculation.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />(Why, by the way, do we get 10.7 runs per win instead of the rule of thumb that it should be 10.0 flat? I think it's because the Markov simulation always plays the bottom of the ninth, even when the home team is already up. It therefore includes a bunch of meaningless runs that don't occur in reality. When some of the run currency is randomly useless, it pushes the price of a win higher.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We'd expect that roughly 1/18 of all runs scored are in the bottom of the ninth with the home team having already won. 
If we discount those by multiplying 10.7 by 17/18, we get ... 10.1 runs per win. Bingo.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We saw earlier that the home team had an advantage of 0.43 runs per game.</span><span style="font-family: verdana;"> Dividing that by 10.3 runs per win gives us</span></div><div style="text-align: left;"><span style="font-family: verdana;"><span style="color: #0b5394;"><br /></span></span></div><div style="text-align: left;"><span style="color: #0b5394; font-family: IBM Plex Mono;"><b>Predicted: HFA of .042 wins per game (.542)<br />Actual: HFA of .039 wins per game (.539)</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We're off a bit. The difference is about 2 SD. My guess is that the Markov calculation, which is necessarily simplified, is very slightly off, and we only notice because of the huge sample size of almost 100,000 actual games. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />OK, now let's do the same thing, but this time for Coors Field only.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I could do the same thing I did for MLB as a whole: split the combined Coors batting line into home and road, and calculate those individually. The problem with that is ... 
well, if I do that, I'll be getting the Rockies' actual HFA at Coors, which is huge, because it includes all kinds of factors that we're not concerned with, like altitude acclimatization, tailoring of personnel to field, etc.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So, I'm going to try to convert the Coors line into an approximation of what the split would look like if it were similar to MLB as a whole.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here's that 1980-2022 MLB split from above, except I've added the percentage difference between home and road (on a per-AB basis) below:</span></div><div style="text-align: left;"><br /><p style="text-align: left;"><span style="font-family: IBM Plex Mono;"><span style="color: #990000; font-size: x-small;"><b> AB H 2B 3B HR BB SO<br /></b></span><span style="color: #990000; font-size: x-small;"><b>---------------------------------------------------------<br /></b></span><span style="color: #990000; font-size: x-small;"><b>home 3209469 846723 161290 19928 95790 321178 612545<br /></b></span><span style="color: #990000; font-size: x-small;"><b>road 3363640 859813 163954 17203 96043 308047 668363<br /></b></span><span style="color: #990000; font-size: x-small;"><b>---------------------------------------------------------<br /></b></span><span style="color: #990000; font-size: x-small;"><b>diff +3.2% +3.5% +21.4% +4.5% +9.3% -3.9%</b></span></span></p></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I'll try to create something similar for 2000 Coors. 
The overall batting line, for both teams, looked like this:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: IBM Plex Mono;"><b> AB H 2B 3B HR BB SO R/9 <br />---------------------------------------------<br />Coors 5843 1860 359 56 245 633 933 7.43</b></span></div><div style="text-align: left;"><br /><span style="font-family: verdana;">Here's my arbitrary split, into Rockies vs. road team, in such a way to keep roughly the same percentage differences as in MLB overall, while also keeping the R/9 roughly 7.43. Here's what I came up with:</span><br /><span style="font-family: verdana;"> </span><br /><p style="text-align: left;"><span style="font-family: IBM Plex Mono;"><span style="color: #990000; font-size: x-small;"><b> AB H 2B 3B HR BB SO <br /></b></span><span style="color: #990000; font-size: x-small;"><b>--------------------------------------------------------<br /></b></span><span style="color: #990000; font-size: x-small;"><b> home 5843 1884 362 66 249 672 936<br /></b></span><span style="color: #990000; font-size: x-small;"><b> road 5843 1826 350 54 238 615 974<br /></b></span><span style="color: #990000; font-size: x-small;"><b>--------------------------------------------------------<br /></b></span><span style="color: #990000; font-size: x-small;"><b> diff +3.2% +3.4% +22.2% +4.6% +9.3% -3.9%</b></span></span></p></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I ran those through Tango's calculator to get runs per 9 innings:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: IBM Plex Mono; font-size: x-small;"><b> AB H 2B 3B HR BB SO R/9<br />---------------------------------------------------------<br /> home 5843 1884 362 66 249 672 936 7.783<br /> road 5843 1826 350 54 238 615 974 7.071<br />---------------------------------------------------------<br /> avg 7.427<br />---------------------------------------------------------<br /> diff 
+.712</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Next, I ran the runs-per-game distribution calculation to get a runs-per-win estimate. (I won't go through the details here, but it's the same thing as before: calculate the probability of a tie, then a one-run home win, then a one-run road win, etc.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The result: 14.37 runs per win. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />As expected, that's significantly higher than the 10.7 we calculated for MLB overall. (Adjusting 14.37 for the superfluous bottom-of-the-ninth gives about 13.6, so, if you prefer, you can compare 13.6 Coors to 10.1 overall.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The difference of .712 runs per game, divided by 14.37 runs per win, gives an HFA of </span></div><div style="text-align: left;"><span style="font-family: IBM Plex Mono;"><br /><span style="color: #0b5394;"><b>0.0495 wins per game</b></span></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Which translates to a home winning percentage of .5495. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Comparing the two results:</span></div><div style="text-align: left;"><br /><span style="color: #0b5394; font-family: IBM Plex Mono;"><b>.542 home field winning percentage normal<br />.549 home field winning percentage Coors<br />-----------------------------------------<br />.007 difference</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The difference of .007 is worth only about half a win per home season. Sure, half a win is half a win, but I'm a little disappointed that's all we wind up with after all this work. 
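</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The runs-per-win and HFA arithmetic in this post can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for the example (they are not part of the Tango/Skelton calculator), the two teams' run totals are assumed to be independent, and the inputs are the rounded figures quoted in the text, so the results agree only to the precision shown.</span></div>
<pre>
```python
def close_game_probs(home_dist, road_dist):
    """Tie and one-run probabilities from two independent run distributions.

    home_dist[k] and road_dist[k] are the chances of scoring exactly k runs
    in nine innings (hypothetical inputs; the post takes them from the
    Markov calculator's output table).
    """
    n = min(len(home_dist), len(road_dist))
    p_tie = sum(home_dist[k] * road_dist[k] for k in range(n))
    p_road_down_one = sum(home_dist[k + 1] * road_dist[k] for k in range(n - 1))
    p_home_down_one = sum(home_dist[k] * road_dist[k + 1] for k in range(n - 1))
    return p_tie, p_road_down_one, p_home_down_one


def runs_per_win(p_tie, p_road_down_one, p_home_down_one):
    """Runs per win, following the post's reasoning.

    One extra run is worth half a win when the game is tied or when the
    team is behind by exactly one run. The two one-run cases are averaged.
    """
    p_one_run = (p_road_down_one + p_home_down_one) / 2
    half_wins_per_run = p_tie + p_one_run
    return 2 / half_wins_per_run


# Aggregate probabilities quoted in the post for MLB, 1980-2022.
mlb_rpw = runs_per_win(0.097, 0.0954, 0.0854)

# HFA in wins per game = home run advantage / runs per win.
mlb_hfa = 0.43 / 10.3      # divisor used in the post for MLB overall
coors_hfa = 0.712 / 14.37  # the post's Coors figures

print(round(mlb_rpw, 1), round(mlb_hfa, 3), round(coors_hfa, 4))
```
</pre>
<div style="text-align: left;"><span style="font-family: verdana;">Given the calculator's full 0-to-30+ run distributions for each side, close_game_probs would supply the three probabilities directly, instead of the rounded totals typed in here. 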
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />It's certainly not as much of an effect as I thought there would be before I started. Even if you deducted this inherent .007, it would barely make a dent in the Rockies' 150 percentage point difference between Coors and road. The Rockies would still be in first place on the FanGraphs chart by a sizeable margin -- 42 points instead of 49.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Looked at another way, an additional .007 would move an average team from the middle of the 29-year standings, to about halfway to the top. So maybe it's not that small after all.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Still, our conclusion has to be that the Rockies' huge HFA over the years is maybe 10 percent a mathematical inevitability of all those extra runs, and 90 percent other causes.<br /><br /><br /><br /></span><br /></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com8tag:blogger.com,1999:blog-31545676.post-19096845737902107842021-09-07T13:21:00.003-04:002021-09-07T13:28:03.739-04:00Are umpires racially biased? A 2021 study (Part II)<div style="text-align: left;"><span style="font-family: verdana; font-size: x-small;">(Part I is <a href="http://blog.philbirnbaum.com/2021/08/are-umpires-racially-biased-2021-study.html" target="_blank">here</a>.)</span></div><div style="text-align: left;"><span style="font-family: verdana; font-size: x-small;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">20 percent of drivers own diesel cars, and the other 80 percent own regular (gasoline) cars. Diesels are, on average, less reliable than regular cars. The average diesel costs $2,000 a year in service, while the average regular car only costs $1,000. 
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Researchers wonder if there's a way to reduce costs. Maybe diesels cost more partly because mechanics don't like them, or are unfamiliar with them? They create a regression that controls for the model, age, and mileage of the car, as well as driver age and habits. But they also include a variable for whether the mechanic owns the same type of car (diesel or gasoline) as the owner. They call that variable "UTM," or "user/technician match".</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />They run the regression, and the UTM coefficient turns out negative and significant. It turns out that when the mechanic owns the same type of car as the user, maintenance costs are about 12 percent lower! The researchers conclude that finding a mechanic who owns the same kind of car as you will substantially reduce your maintenance costs.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But that's not correct. The mechanic makes no difference at all. That 12 percent from the regression is showing something completely different.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />If you want to solve this as a puzzle, you can stop reading and try. There's enough information here to figure it out. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The overall average maintenance cost, combining gasoline and diesel, is $1200. That's the sum of 80 percent of $1000, plus 20 percent of $2000.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So what's the average cost for only those cars that match the mechanic's car? My first thought was, it's the same $1200. 
Because, if the mechanic's car makes no difference, how can that number change?</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But it does change. The reason is: when the user's car matches the mechanic's, it's much less likely to be a diesel. The gasoline owners are over-represented when it comes to matching: each has an 80% chance of being included in the "UTM" sample, while the diesel owner has only a 20% chance.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In the overall population, the ratio of gasoline to diesel is 4:1. But the ratio of "gasoline/gasoline" to "diesel/diesel" is 16:1. So instead of 20%, the proportion of "double diesels" in the "both cars match" population is only 1 in 17, or 5.9%.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That means the average cost of UTM repairs is only $1059. That's 94.1 percent of $1000, plus 5.9% of $2000. That works out to 11.8 percent less than the overall $1200.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here's a chart that may make it clearer. It shows how the raw numbers of user/technician pairings break down, per 1000 cars:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b><span style="color: #990000;">Technician Gasoline Diesel Total<br />-------------------------------------------<br />User gasoline <span style="background-color: #fcff01;">640</span> 160 800<br />User diesel 160 <span style="background-color: #fcff01;"> 40</span> 200<br />-------------------------------------------<br />Total 800 200 1000 </span></b> </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The highlighted diagonal is where the user matches the mechanic. 
There are 680 cars on that diagonal, but only 40 (1 in 17) are diesel.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In short: the "UTM" coefficient is significant not because matching the mechanic selects better mechanics, but because it selectively samples for more reliable (gasoline) cars.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />--------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In the umpire/race study I talked about last post, they had a regression like that, where they put all the umpires and batters together into one regression and looked at the "UBM" variable, where the umpire's race matches the batter's race. <br />From last post, here's the table the author included. The numbers are umpire errors per 1000 outside-of-zone pitches (negative favors the batter).</span></div><div style="text-align: left;"><br /><span style="color: #0b5394; font-family: courier;"><b>Umpire Black Hispanic White<br />-------------------------------------------<br />Black batter: --- -5.3 -0.3<br />Hispanic batter +7.8 --- +5.9<br />White batter +5.6 -4.4 ---</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I had adjusted that to equalize the baseline:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b>Umpire Black Hispanic White<br />------------------------------------------<br />Black batter: -5.6 -0.9 -0.3<br />Hispanic batter +2.2 +4.4 +5.9<br />White batter --- --- ---</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I think I'm able to estimate, from the original study, that the batter population was almost exactly in the 2:3:4 range -- 22 percent Black, 34 percent Hispanic, and 44 percent White. 
Using those numbers, I'm going to adjust the chart one more time, to show approximately what it would look like if the umpires were exactly alike (no bias) and each column added to zero. </span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b>Umpire Black Hispanic White<br />------------------------------------------<br />Black batter: <span style="background-color: #fcff01;">-2.2</span> -2.2 -2.2<br />Hispanic batter +3.8 <span style="background-color: #fcff01;">+3.8</span> +3.8<br />White batter -1.7 -1.7 <span style="background-color: #fcff01;">-1.7</span></b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I chose those numbers so the average UBM (average of diagonals in ratio 22:34:44) is zero, and also to closely fit the actual numbers the study found. That is: suppose you ran a regression using the author's data, but controlling for batter and umpire race. And suppose there was no racial bias. In that case, you'd get that table, which represents our null hypothesis of no racial bias.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">If the null hypothesis is true, what will a regression spit out for UBM? 
If the batters were represented in their actual ratio, 22:34:44, you'd get zero:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b>Diagonal Effect Weight Product</b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>-------------------------------------------------<br />Black UBM <span style="background-color: #fcff01;">-2.2</span> 22% -0.5 <br />Hispanic UBM <span style="background-color: #fcff01;">+3.8</span> 34% +1.3 <br />White UBM <span style="background-color: #fcff01;">-1.7</span> 44% -0.8 <br />-------------------------------------------------<br />Overall UBM 100% 0.0 per 1000</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />However: in the actual population in the MLB study, the diagonals do NOT appear in the 22:34:44 ratio. That's because the umpires were overwhelmingly White -- 88 percent White. Only 5 percent of umpires were Black, and 7 percent Hispanic. So the White batters matched their umpire much more often than the Hispanic or Black batters.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Using 5:7:88 for umpires, and 22:34:44 for batters, the relative frequency of each combination looks like this. 
Here's the breakdown per 1000 pitches:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b> Batter<br />Umpire Black Hispanic White Total<br />---------------------------------------------------<br />Black batter <span style="background-color: #fcff01;">11</span> 15 194 220<br />Hispanic batter 17 <span style="background-color: #fcff01;">24</span> 300 341<br />White batter 22 31 <span style="background-color: #fcff01;">387</span> 439<br />---------------------------------------------------<br />Umpire total 50 70 881 1000</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Because there are so few minority umpires, there are only 24 Hispanic/Hispanic pairs out of 422 total matches on the UBM diagonal. That's only 5.7% Hispanic batters, rather than 34 percent:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b>Diagonal Frequency Percent<br />----------------------------------<br />Black UBM <span style="background-color: #fcff01;">11</span> 2.6% <br />Hispanic UBM <span style="background-color: #fcff01;">24</span> 5.7%<br />White UBM <span style="background-color: #fcff01;">387</span> 91.7% <br />----------------------------------<br />Overall UBM 422 100%</b></span></div><div style="text-align: left;"><br /><span style="font-family: verdana;">If we calculate the observed average of the diagonal, with this 11/24/387 breakdown, we get this:</span><br /><span style="font-family: verdana;"> </span><br /><span style="color: #990000; font-family: courier;"><b> Effect Weight Product<br />--------------------------------------------------<br />Black UBM -2.2 2.6% -0.06 per 1000<br />Hispanic UBM +3.8 5.7% +0.22 per 1000<br />White UBM -1.7 91.7% -1.56 per 1000 <br />--------------------------------------------------<br />Overall UBM 100% -1.40 per 1000</b></span></div><div style="text-align: left;"><span style="font-family: 
verdana;"><br />Hispanic batters receive more bad calls for reasons other than racial bias. By restricting the sample of Hispanic batters to only those who see a Hispanic umpire, we selectively sample fewer Hispanic batters in the UBM pool, and so we get fewer bad calls. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Under the null hypothesis of no bias, UBM plate appearances still see 1.40 fewer bad calls per 1000 pitches, because of selective sampling.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That 1.40 figure is compared to the overall average. The regression coefficient, however, compares it to the non-UBM case. What's the average of the non-UBM case?</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Well, if a UBM happens 422 times out of 1000, and results in 1.40 fewer bad calls than average, and the average is zero, then the other 578 times out of 1000, there must have been 1.02 more bad calls than average. </span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b> Effect Weight Product<br />--------------------------------------------------<br />UBM -1.40 42.2% -0.59 per 1000<br />Non-UBM +1.02 57.8% +0.59 per 1000<br />--------------------------------------------------<br />Full sample 100% 0.00 per 1000</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So the coefficient the regression produces -- UBM compared to non-UBM -- will be 2.42.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />What did the actual study find? 2.81. 
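<br /><br />The selective-sampling arithmetic above can be reproduced with a short Python sketch. The shares and no-bias effects are the figures used in this post; the variable names are mine, and the sketch follows the post in treating the full-sample average as exactly zero, which is an approximation since the rounded effects don't quite cancel.

```python
# Sketch of the selective-sampling arithmetic, using the post's figures.
# Variable names are illustrative, not taken from the study.

batter_share = {"Black": 0.22, "Hispanic": 0.34, "White": 0.44}
umpire_share = {"Black": 0.05, "Hispanic": 0.07, "White": 0.88}

# No-bias effect on each batter group, in bad calls per 1000 pitches.
batter_effect = {"Black": -2.2, "Hispanic": 3.8, "White": -1.7}

# Share of all pitches where umpire and batter race match, by race.
match_share = {
    race: batter_share[race] * umpire_share[race] for race in batter_share
}
p_match = sum(match_share.values())

# Average effect over matched (UBM) pitches only.
ubm_mean = sum(
    match_share[race] * batter_effect[race] for race in match_share
) / p_match

# Assumption carried over from the post: the full-sample average is zero,
# so the non-UBM average must exactly offset the UBM average.
non_ubm_mean = -ubm_mean * p_match / (1 - p_match)

# What a regression with only a UBM dummy would report under no bias.
coefficient = ubm_mean - non_ubm_mean

print(f"P(match)      = {p_match:.3f}")
print(f"UBM mean      = {ubm_mean:+.2f} per 1000")
print(f"non-UBM mean  = {non_ubm_mean:+.2f} per 1000")
print(f"coefficient   = {coefficient:+.2f} per 1000")
```

Working from unrounded intermediate values, the coefficient should land within about a hundredth of the 2.42 computed above.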
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That leaves only 0.39 as the estimate of potential umpire bias:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b>-2.81 Selective sampling plus possible bias<br />-2.42 Effect of selective sampling only<br />---------------------------------------------<br /><span style="background-color: #fcff01;">-0.39 Revised estimate of possible bias</span></b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The study found 2.81 fewer bad calls (per 1000) when the umpire matched the batter, but 2.42 of that is selective sampling, leaving only 0.39 that could be umpire bias.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Is that 0.39 statistically significant? I doubt it. For what it's worth, the original estimate had an SD of 0.44. So adjusting for selective sampling, we're less than 1 SD from zero.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />--------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So, the conclusion: the study's finding of a 0.28% UBM effect cannot be attributed to umpire bias. 
It's mostly a natural mathematical artifact resulting from the fact that</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />(a) Hispanic batters see more incorrect calls for reasons other than bias, </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />(b) Hispanic umpires are rare, and</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />(c) The regression didn't control for the race of batter and umpire separately.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Because of that, almost the entire effect the study attributes to racial bias is just selective sampling.<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /></span><br /></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-67416979224320729932021-08-30T13:13:00.002-04:002021-09-07T13:24:18.835-04:00Are umpires racially biased? A 2021 study (Part I)<div style="text-align: left;"><span style="font-family: verdana;">Are MLB umpires racially biased? There's a recent new study that claims they are. The author, who wrote it as an undergrad thesis, mentioned it on Twitter, and when I checked a week or so later, there were lots of articles and links to it. (<a href="https://www.baseballprospectus.com/news/article/68963/moonshot-a-new-study-shows-umpire-discrimination-against-non-white-players/" target="_blank">Here</a>, for instance, is a Baseball Prospectus post reporting on it. 
And here's a <a href="https://ca.news.yahoo.com/mlb-umpires-show-discrimination-against-non-white-players-according-to-new-study-191649525.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAALt6-ElJPuCRQvIUQUPIYb5QxpupT-Y7recPbLURfm2u5I37apH1Q89OiSLq0M7WF-1zybB6g4yI4Vtukw0EtlMtB3y3qtZ8NhZy_PW2H0l9u7Zg4R55QD-etR0h1ei_hHcJQ4dwVSg-XUmPnzPynvGlhmKf0X9hNY24cIaTmpr3" target="_blank">Yahoo! report</a>.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The study tried to figure out whether umpires make more bad calls against batters* of a race other than theirs (where there is no "umpire-batter match," or "UBM," as the literature calls it). It ran regressions on called pitches from 2008 to 2020, to figure out how best to predict the probability of the home-plate umpire calling a pitch incorrectly (based on MLB "Gameday" pitch location). The author controlled for many different factors, and found a statistically significant coefficient for UBM, concluding that the batter gains an advantage when the umpire is of the same race. It also argues that white umpires in particular "could be the driving force behind discrimination in MLB." </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I don't think any of that is right. I think the results point to something different, and benign. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">---------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Imagine a baseball league where some teams are made up of dentists, while the others are jockeys. 
The league didn't hire any umpires, so the players take turns, and promise to call pitches fairly.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />They play a bunch of games, and it turns out that the umpires call more strikes against the dentists than against the jockeys. Nobody is surprised -- jockeys are short, and thus have small strike zones.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />It's true that the data shows that if you look at the Jockey umpires, you'll see that they call a lot fewer strikes against batters of their own group than against batters of the other group. Their "UBM" coefficient is high and statistically significant.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Does that mean the jockey umps are "racist" against dentists? No, of course not. It's just that the dentists have bigger strike zones. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />It's the same, but in reverse, for the dentist umpires. They call more strikes against their fellow dentists -- again, not because of pro-jockey "reverse racism," but because of the different strike zones.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Later, teams of NBA players enter the league. These guys are tall, with huge strike zones, so they get a lot of called strikes, even from their own umpires.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Let's put some numbers on this: we'll say there are 10 teams of dentists, 1 team of jockeys, and 2 teams of NBA players. The jockeys are -10 in called strikes compared to average, and the NBA players are +10. 
That leaves the dentists at -1 (in order for the average to be zero).</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here's a chart that shows every umpire is completely fair and unbiased. </span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b><span style="color: #990000;">Umpire Jockey NBA Dentist</span><br /><span style="color: #990000;">-------------------------------------------</span><br /><span style="color: #990000;">Jockey batter: <span style="background-color: #fcff01;">-10</span> -10 -10</span><br /><span style="color: #990000;">NBA batter +10 <span style="background-color: #fcff01;">+10</span> +10</span><br /><span style="color: #990000;">Dentist batter -1 -1 <span style="background-color: #fcff01;"> -1</span></span></b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I've highlighted the "UBM" cells where the umpire matches the batter. If you look only at those cells, and don't think too much about what's going on, you could think the umpires are horribly biased. The Jockey batters get 10 fewer strikes than average from Jockey umpires! That's awful!</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But then when you look closer, you see the horizontal row is *all* -10. That means all the umpires called the jockeys the same way (-10), so it's probably something about the jockey batters that made that happen. In this case, it's that they're short.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I think this is what's going on in the actual study. But it's harder to see, because the chart isn't set up with the raw numbers. The author ran different regressions for the three different umpire races, and set a different set of batters as the zero-level for each. 
Since they're calibrated to a different standard of player, the results make the umpires look very different.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />If I had done here what the author did there, the chart above would have looked like this:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier;"><b>Umpire Jockey NBA Dentist<br />------------------------------------------<br />Jockey batter: <span style="background-color: #fcff01;"> 0</span> -20 -9<br />NBA batter +20 <span style="background-color: #fcff01;"> 0</span><span style="white-space: pre;"> </span> +11<br />Dentist batter +9 -11<span style="white-space: pre;"> </span> <span style="background-color: #fcff01;"> 0</span></b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />If you just look at this chart without knowing you can't compare the columns to each other (because they're based on a different zero baseline), it's easy to think there's evidence of bias. You'd look at the chart and say, "Hey, it looks like Jockey umpires are racist against NBA batters and dentists. Also, dentist umpires are racist against NBA players but favor Jockeys somewhat. But, look! NBA umpires actually *favor* other races! That's probably because NBA umpires are new to the tournament, and are going out of their way to appear unbiased." </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That's a near-perfect analogue to the actual study. This is the top half of Table 8, which measures "over-recognition" of pitchers, meaning balls incorrectly called as strikes (hurting the batter). 
I've multiplied everything by 1000, so the numbers are "wrong strike calls per 1000 called pitches outside the zone".</span></div><div style="text-align: left;"><br /><span style="color: #741b47; font-family: courier;"><b>Umpire Black Hispanic White<br />-------------------------------------------<br />Black batter: --- -5.3 -0.3<br />Hispanic batter +7.8 --- +5.9<br />White batter +5.6 -4.4 ---</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />It's very similar to my fake table above, where the dentists and Jockeys look biased, but the NBA players look "reverse biased". </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The study notes the chart and says,</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /><blockquote>"For White umpires, the results suggest that for pitches outside the zone, Hispanic batters ... face umpire discrimination. [But Hispanic umpires have a] "reverse-bias effect ... [which] holds for both Black and White batters... Lastly, the bias against non-Black batters by Black umpires is relatively consistent for both Hispanic and White batters."</blockquote></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />And it rationalizes the apparent "reverse racism" from Hispanic umpires this way:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /><blockquote>"This is perhaps attributable to the recent increase in MLB umpires from Hispanic countries, who could potentially fear the consequences of appearing biased towards Hispanic players."</blockquote></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But ... no. The apparent result is almost completely the result of setting a different zero level for each umpire/batter race -- in other words, by arbitrarily setting the diagonal to zero. That only works if the groups of batters are exactly the same. 
They're not. Just as Jockey batters have different characteristics than NBA player batters, it's likely that Hispanic batters don't have exactly the same characteristics as White and Black batters.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The author decided that White, Black, and Hispanic batters all should get exactly the same results from an unbiased umpire. If that assumption is false, the effect disappears. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Instead, the study could have made a more conservative assumption: that unbiased umpires of any race should call *White* batters the same. (Or Black batters, or Hispanic batters. But White batters have the largest sample size, giving the best signal-to-noise ratio.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That is, use a baseline where the bottom row is zero, rather than one where the diagonal is zero. To do that, take the original, set the bottom cells to zero, but keep the differences between any two rows in the same column:</span></div><div style="text-align: left;"><b><br /><span style="color: #741b47; font-family: courier;">Umpire Black Hispanic White<br />------------------------------------------<br />Black batter: -5.6 -0.9 -0.3<br />Hispanic batter +2.2 +4.4 +5.9<br />White batter --- --- ---</span></b></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Does this look like evidence of umpire bias? I don't think so. For any given race of batter, all three groups of umpires call about the same amount of bad strikes. In fact, all three groups of umpires even have the same *order* among batter groups: Hispanic the most, White second, and Black third. (The raw odds of that happening are 1 in 36). 
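<br /><br />The re-baselining step is simple enough to sketch in Python. The table values are the Table 8 figures quoted above, with each "---" cell entered as zero; the function and variable names are my own, and the 1-in-36 figure rests on the assumption that all six orderings of the three batter groups are equally likely.

```python
from fractions import Fraction

# Table 8 values as quoted in the post (wrong strike calls per 1000 pitches
# outside the zone). Outer key: umpire race; inner key: batter race.
# Each "---" diagonal cell is that regression's zero baseline.
original = {
    "Black":    {"Black": 0.0,  "Hispanic": 7.8, "White": 5.6},
    "Hispanic": {"Black": -5.3, "Hispanic": 0.0, "White": -4.4},
    "White":    {"Black": -0.3, "Hispanic": 5.9, "White": 0.0},
}


def rebaseline(table, reference_batter):
    """Shift each umpire column so the reference batter group reads zero."""
    return {
        umpire: {
            batter: round(value - column[reference_batter], 1)
            for batter, value in column.items()
        }
        for umpire, column in table.items()
    }


rebased = rebaseline(original, "White")

# Batter groups ordered from most to fewest wrong strike calls, per umpire group.
orderings = {
    umpire: sorted(column, key=column.get, reverse=True)
    for umpire, column in rebased.items()
}

# Chance that the other two umpire groups repeat the first group's ordering,
# assuming each of the 3! = 6 orderings is equally likely.
chance_same_order = Fraction(1, 6) ** 2

for umpire, column in rebased.items():
    print(umpire, "umpires:", column, "order:", orderings[umpire])
print("Chance of identical orderings:", chance_same_order)
```

Subtracting a constant from a whole column leaves the differences between rows in that column unchanged, which is why the shift loses no information from the original table.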
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The only anomaly is that maybe it looks like there's some evidence that Black umpires benefit Black batters by about 5 pitches per 1,000, but even that difference is not statistically significant. <br /><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">In other words: the entire effect in the study disappears when you remove the hidden assumption that Hispanic batters respond to pitches exactly the same way as White or Black batters. And the pattern of "discrimination" is *exactly* what you'd expect if the Hispanic batters respond to pitches in ways that result in more errors -- that is, it explains the anomaly that Hispanic umpires tend to look "reverse racist."</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">Also, I think the entire effect would disappear if the author had expanded his regression to include dummy variables for the race of the batter. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />If, like me, you find it perfectly plausible that Hispanic batters respond to pitches in ways that generate more umpire errors, you can skip this section. If not, I will try to convince you.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />First, keep in mind that it's a very, very small difference we're talking about: maybe 4 pitches per 1,000, or 0.4 percent. 
Compare that to some of the other, much larger effects the study found:</span></div><div style="text-align: left;"><br /><span style="color: #783f04; font-family: courier;"><b> +8.9% 3-0 count on the batter<br /> -0.9% two outs<br /> +2.8% visiting team batting<br /> -3.3% right-handed batter<br /> +0.5% right-handed pitcher<br />+19.7% bases loaded (!!!)<br /> +1.4% pitcher 2 WAR vs. 0 WAR<br /> +0.9% pitcher has two extra all-star appearances<br /> +4.0% 2019 vs. 2008</b></span></div><div style="text-align: left;"><span style="color: #783f04; font-family: courier;"><b>---------------------------------------------------<br /> +0.4% batter is Hispanic</b></span></div><div style="text-align: left;"><span style="color: #783f04; font-family: courier;"><b>---------------------------------------------------</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I wouldn't have expected most of those other effects to exist, but they do. And they're so large that they make this one, at only +0.4%, look unremarkable. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Also: with so many large effects found in the study, there are probably other factors the author didn't consider that are just as large. Just to make something up ... since handedness of pitcher and batter are so important, suppose that platoon advantage (the interaction between pitcher and batter hand, which the study didn't include) is worth, say, 5%. And suppose Hispanic batters are likely to have the platoon advantage, say, 8% less than White batters. That would give you an 0.4% effect right there.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I don't have data specifically for Hispanic batters, but I do have data for country of birth. Not all non-USA players are Hispanic, but probably a large subset are, so I split them up that way. 
Here are batting-handedness stats for players from 1969 to 2016:</span></div><div style="text-align: left;"><br /><span style="color: #660000; font-family: courier;"><b>Born in USA: 61.7% RHB<br />Born outside USA: 67.1% RHB</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That's a difference of 5.4 percentage points in handedness, or roughly 9 percent in relative terms. I don't know how that translates into platoon advantage, but it's got to be the same order of magnitude as what we'd need for 0.4%.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here's another theory. They used to say, about prospects from the Dominican Republic, that they deliberately become free swingers because "you can't walk off the island." </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Suppose that, knowing a certain player is a free swinger, the pitcher aims a bit more outside the strike zone than usual, since the batter is likely to swing anyway. If the catcher sets a target outside, and the pitcher hits it perfectly, the umpire may be more likely to miscall it as a strike (at least according to many broadcasters I've heard).</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Couldn't that explain why Hispanic players get very slightly more erroneous strike calls? 
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In support of that hypothesis, here are K/W ratios for that same set of batters (total K divided by total BB):</span></div><div style="text-align: left;"><br /><b><span style="color: #783f04; font-family: courier;">Born in USA: 1.82 K per BB<br />Born outside USA: 2.05 K per BB </span></b></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Again, that seems around the correct order of magnitude.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I'm not saying these are the right explanations -- they might be right, or they might not. The "right answer" is probably several factors, perhaps going different directions, but adding up to 0.4%. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />But the point is: there do seem to be significant differences in hitting styles between Hispanic and non-Hispanic batters, certainly significant enough that an 0.4% difference in bad calls is quite plausible. Attributing the entire 0.4% to racist umpires (and assuming that all races of umpires would have to discriminate against Hispanics!) doesn't have any justification whatsoever -- at least not without additional evidence.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here's a TLDR summary, with a completely different analogy this time:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /><blockquote><span style="color: #990000;">Eddie Gaedel's father calls fewer strikes on Eddie Gaedel than Aaron Judge's father calls on Aaron Judge. So Gaedel Sr. must be biased! 
</span></blockquote></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />--------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">There's another part of the study -- actually, the main part -- that throws everything into one big regression and still comes out with a significant "UBM" effect, which again it believes is racial bias. I think that conclusion is also wrong, for reasons that aren't quite the same. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That's Part II, which is now <a href="http://blog.philbirnbaum.com/2021/09/are-umpires-racially-biased-2021-study.html" target="_blank">here</a>.<br /></span><br /></div><div style="text-align: left;">----------</div><div style="text-align: left;"><br /></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><span style="font-family: verdana; font-size: x-small;">(*The author found a similar result for pitchers, who gained an advantage in more called strikes when they were the same race as the umpire, and a similar result for called balls as well as called strikes. 
In this post, I'll just talk about the batting side and the called strikes, but the issues are the same for all four combinations of batter/pitcher ball/strike.)</span></div><div style="text-align: left;"><span style="font-family: verdana; font-size: small;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana; font-size: small;"><br /></span></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-59555438819377843062021-07-26T13:31:00.000-04:002021-07-26T13:31:52.174-04:00DRS team fielding seems overinflated<div style="text-align: left;"><span style="font-family: verdana;">In a previous post, I noticed that the <a href="http://www.fieldingbible.com/TeamDefensiveRunsSaved">DRS estimates of team fielding</a> seemed much too high in many cases. In fact, the spread (standard deviation) of team DRS was almost three times as high as other methods (<a href="https://www.fangraphs.com/leaders.aspx?pos=all&stats=fld&lg=all&qual=0&type=1&season=2016&month=0&season1=2016&ind=0&team=0,ts&rost=0&age=0&filter=&players=0&startdate=&enddate=">UZR</a> and <a href="https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielding_Team&startYear=2016&endYear=2016&split=no&team=&range=year&min=q&pos=&roles=&viz=hide">OAA</a>).</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />For instance, here are the three competing systems for the 2016 Chicago Cubs:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>UZR: +43 runs (range)<br />OAA: +29 runs<br />DRS: +96 runs (107 - 11 for catcher framing)</b></span></div><div style="text-align: left;"><br /><span style="font-family: verdana;">Since I wrote that, the DRS people (Baseball Info Solutions, or BIS) have <a href="https://sportsinfosolutionsblog.com/2021/03/05/an-update-to-the-last-few-years-of-defensive-runs-saved/">issued significant corrections</a> for 
the 2018 and 2019 seasons (and smaller corrections for 2017). It seems the MLB feeds were off in their timing; when the camera switched from showing the batter to showing the batted ball, they skipped a fraction of a second, which is a big deal when evaluating fielders.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The corrections are a big improvement -- most of the extreme figures have shrunk. For instance, the 2018 Phillies improve from -111 runs to -75 runs. It seems that Baseball Reference has not yet updated with the new figures ... <a href="https://www.baseball-reference.com/players/n/nolaaa01.shtml">Aaron Nola's numbers</a> remain where they were when I wrote the previous post in January. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />However, as far as I can tell, DRS numbers before 2017 remain unchanged, so the problem is still there.</span></div><div style="text-align: left;"><br /><span style="font-family: verdana;">------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In my <a href="http://blog.philbirnbaum.com/2020/12/splitting-defensive-credit-between_29.html">previous posts</a>, I found that team BABIP (batting average on balls in play, which is where the effects of fielding should be seen) had an SD of about 35 runs. 
(That's about 44 plays out of 3900, at an assumed value of 0.8 runs per play.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Again from those posts, we should expect only about 42 percent of that variation to belong to the fielders -- the rest are the result of pitchers giving up easier balls in play (48 percent) and park effects (10 percent).</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In other words, we should be seeing</span></div><div style="text-align: left;"><br /><b><span style="font-family: courier;">23 runs fielders<br />24 runs pitchers<br />11 runs park<br />----------------<br />35 runs total</span></b></div><div style="text-align: left;"><br /><span style="font-family: verdana;">That means any metric that tries to quantify the performance of the fielders should come in with an SD of about 23 runs. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />DRS, from 2003 to 2019, comes in at 41 runs (<a href="https://www.fangraphs.com/leaders.aspx?pos=all&stats=fld&lg=all&qual=0&type=1&season=2019&month=0&season1=2003&ind=1&team=0,ts&rost=0&age=0&filter=&players=0&startdate=&enddate=">data courtesy Fangraphs</a>). I didn't calculate the SD after subtracting off catcher, because Fangraphs doesn't provide it in their downloadable spreadsheet. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">I did figure it out for 2018, where I typed in the numbers manually from the <a href="http://www.fieldingbible.com/TeamDefensiveRunsSaved">DRS website</a>. That season, the SD without catcher was about 85 percent of the total SD. 
Using the same adjustment for other years would bring the multi-year observed SD from 41 down to 35.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />That SD of 35 runs happens to be the same as the SD of BABIP runs. That means DRS is effectively attributing the *entire* team BABIP performance to the fielders, and none to the pitchers or park.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />By comparison, the other two metrics are more reasonable:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>OAA: 18 runs<br />UZR: 25 runs<br />DRS: 35 runs</b></span></div><div style="text-align: left;"><br /><span style="font-family: verdana; font-size: x-small;">(Note: Tango tells me I need to bump up official OAA by about 7 percent to account for missing plays, so I've done that. In previous posts, I used 20 percent, which is now wrong -- first, because the data has been improved since then, and, second, because I previously forgot about regression to the mean for the missing data. I should have used 14 percent instead of 20 then, I think.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />OAA and UZR are right around the theoretical 23 runs. DRS, on the other hand, is much higher. To get DRS down to 23 runs, you have to regress it to the mean by about a third. So the 2016 Cubs need to fall from +96 to +63.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />To get DRS down to the OAA level of 18 runs, you have to regress by about half, from +96 to around +49.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-----</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />If DRS is overinflated, does that mean it's also less accurate in identifying the good and bad fielding teams? Apparently not! 
Despite its outsized values, in 2018 DRS predicted BABIP better than OAA did, in terms of correlations:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>OAA: .58 correlation<br />DRS: .62 correlation</b></span></div><div style="text-align: left;"><br /><span style="font-family: verdana;">Correlations include a "built in" regression to the mean, which is why DRS could do well despite being exaggerated.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />In 2019, though, DRS isn't nearly as accurate:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>OAA: .48 correlation<br />DRS: .33 correlation</b></span></div><div style="text-align: left;"><br /><span style="font-family: verdana;">I guess you could do more years and figure out which metric is better, and by how much. You could include UZR in there too. I probably should have done that myself, but I didn't think of it earlier and I'm too lazy to go do it now.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />And, just for reference, here are the SDs for 2018 and 2019 specifically (DRS does not include catcher):</span><br /><br /></div><div style="text-align: left;"><span style="font-family: courier;"><b>2018 DRS: 41<br />2018 OAA + 7%: 20</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b>------------------------------------</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b>regress DRS to mean 51% to match OAA</b></span></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><span style="font-family: courier;"><b>2019 DRS: 44<br />2019 OAA + 7%: 19</b></span></div><div style="text-align: left;"><span style="font-family: 
courier;"><b>------------------------------------</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b>regress DRS to mean 57% to match OAA</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So I'm not sure what's going on with DRS. They seem to be double-counting somewhere in their algorithm, but I don't know how or where.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />If you're using DRS, I would suggest you first regress to the mean by around a third if you want to match the theoretical SD of 23, and by around half if you want to match the OAA SD of 19. The correlations to BABIP suggest the regressed DRS could be as accurate as OAA after regressing.</span><br /><br /><br /><br /></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-48570852520214236062021-01-31T14:47:00.005-05:002021-02-01T16:33:20.901-05:00Splitting defensive credit between pitchers and fielders (Part III)<div style="text-align: left;"><span style="font-family: verdana;">(This is part 3. Part 1 is <a href="http://blog.philbirnbaum.com/2020/12/splitting-defensive-credit-between_29.html">here</a>; part 2 is <a href="http://blog.philbirnbaum.com/2021/01/splitting-defensive-credit-between.html">here</a>.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">UPDATE, 2021-02-01: Thanks to Chone Smith in the comments, who pointed out an error. I investigated and found an error in my code. I've updated this post -- specifically, the root mean error and the final equation. 
The description of how everything works remains the same.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">------</span></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><span style="font-family: verdana;">Last post, we estimated that in 2018, Phillies fielders were 3 outs better than league average when Aaron Nola was on the mound. That estimate was based on the team's BAbip and Nola's own BAbip.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Our first step was to estimate the Phillies' overall fielding performance from their BAbip. We had to do that because BAbip is a combination of both pitching and fielding, and we had to guess how to split those up. To do that, we just used the overall ratio of fielding BAbip to overall BAbip, which was 47 percent. So we figured that the Phillies fielders were -24, which is 47 percent of their overall park-adjusted -52.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We can do better than that kind of estimate, because, at least for recent years, we have actual fielding data that can substitute for that estimate. Statcast tells us that the Phillies fielders were -39 outs above average (OAA) for the season*. That's 75 percent of BAbip, not 47 percent ... but still well within typical variation for teams. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /><span style="font-size: x-small;">(*The <a href="https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielding_Team&year=2018&team=&range=year&min=10&pos=&roles=&viz=hide">published estimate</a> is -31, but I'm adding 25 percent (per Tango's suggestion) to account for games not included in the OAA estimate.) 
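</span></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />As a quick sanity check on the figures above, here is a minimal Python sketch of the arithmetic. The variable names are illustrative only (they are not from the original analysis); the inputs are just the numbers quoted in this post, so treat it as a back-of-envelope check rather than the actual method.</span></div>

```python
# Back-of-envelope check of the 2018 Phillies figures quoted above.
# All names are hypothetical; the inputs are the numbers stated in the post.

team_babip_plays = -52   # park-adjusted team BAbip, in plays vs. average
fielding_share = 0.47    # overall ratio of fielding BAbip to total BAbip
published_oaa = -31      # published Statcast outs above average
oaa_bump = 1.25          # add 25 percent for games missing from OAA

# Old approach: assume fielding is 47 percent of the team's BAbip result.
babip_based_estimate = team_babip_plays * fielding_share

# New approach: start from observed OAA, bumped up for missing games.
adjusted_oaa = published_oaa * oaa_bump

# How large is the OAA figure relative to the team's BAbip result?
oaa_share_of_babip = adjusted_oaa / team_babip_plays

print(round(babip_based_estimate))   # -24
print(round(adjusted_oaa))           # -39
print(round(oaa_share_of_babip, 2))  # 0.75
```

<div style="text-align: left;"><span style="font-family: verdana;"><span style="font-size: x-small;">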
</span></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />So we can get much more accurate by starting with the true zone fielding number of -39, instead of the weaker estimate of -24. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />First, let's convert the -39 back to BAbip, by dividing it by 3903 BIP. That gives us ... almost exactly -10 points.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The SD of fielding talent is 6.1. The SD of fielding luck in 3903 BIP is 3.65. So it works out that luck is 2.6 of the 10 points, and talent is the remaining 7.3. (That's because 3.65^2/(3.65^2+6.1^2) is about 0.26, and 0.26 of the 10 points is 2.6.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />We have no reason (yet) to believe Nola is any different from the rest of the team, so we'll start out with an estimate that he got team average fielding talent of -7.3, and team average fielding luck of -2.6.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Nola's BAbip was .254, in a league that was .295. That's an observed 41 point benefit. But, with fielders that averaged -0.0073 talent and -0.0026 luck, in a park that was +0.0025, that +41 becomes +48.5. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">That's what we have to break down. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here's Nola's SD breakdown, for his 519 BIP. We will no longer include fielding talent in the chart, because we're using the fixed team figure for Nola, which is estimated elsewhere and not subject to revision. 
But we keep a reduced SD for fielding luck relative to team, because that's different for every pitcher.</span></div><div style="text-align: left;"><br /><b><span style="font-family: courier;"> 9.4 fielding luck<br /> 7.6 pitching talent<br />17.3 pitching luck<br /> 1.5 park<br />--------------------<br />21.2 total</span></b></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Converting to percentages:</span></div><div style="text-align: left;"><br /><b><span style="font-family: courier;"> 20% fielding luck<br /> 13% pitching talent<br /> 67% pitching luck<br /> 1% park<br />--------------------<br />100% total</span></b></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Using the above percentages, the 48.5 becomes:</span></div><div style="text-align: left;"><br /><b><span style="font-family: courier;">+ 9.5 points fielding luck<br />+ 6.3 points pitching talent<br />+32.5 points pitching luck<br />+ 0.2 points park<br />-------------------<br />+48.5 points</span></b></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Adding back in the -7.3 points for observed Phillies talent, -2.6 for Phillies luck, and 2.5 points for the park, gives</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b> -7.3 points fielding talent [0 - 7.3]<br /> +6.9 points fielding luck [+9.5 - 2.6]<br /> +6.3 points pitching talent<br />+32.5 points pitching luck<br /> +2.7 points park [0.2 + 2.5]<br />-----------------------------------------<br /> 41 points</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Stripping out the two fielding rows:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>-7.3 points fielding talent <br />+6.9 points fielding luck<br />-----------------------------<br />-0.4 points fielding</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br 
/>The conclusion: instead of hurting him by 10 points, as the raw team BAbip might suggest, or helping him by 6 points, as we figured last post ... Nola's fielders only hurt him by 0.4 points. That's less than a fifth of a run. Basically, Nola got league-average fielding.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />--------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Like before, I ran this calculation for all the pitchers in my database. Here are the correlations to actual "gold standard" OAA behind the pitcher:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>r=0.23 assume pitcher fielding BAbip = team BAbip<br />r=0.37 BAbip method from last post<br />r=0.48 assume pitcher OAA = team OAA<br />r=0.53 this method</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />And the root mean square error:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>13.7 assume pitcher fielding BAbip = team BAbip<br />11.3 BAbip method from last post<br />10.2 assume pitcher OAA = team OAA<br />10.0 this method</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Like in the last post, here's a simple formula that comes very close to the result of all these manipulations of SDs:</span></div><div style="text-align: left;"><br /><span style="color: #990000; font-family: courier; font-size: large;"><b>F = 0.8*T + 0.2*P</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Here, "F" is fielding behind the pitcher, which is what we're trying to figure out. "T" is team OAA/BIP. 
"P" is player BAbip compared to league.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Unlike the last post, here the team *does* include the pitcher you're concerned with. We had to do it this way because presumably we don't have data for the team without the pitcher. (If we did, we'd just subtract it from team and get the pitcher's number directly!)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />It looks like 20% of a pitcher's discrepancy is attributable to his fielders. That number is for workloads similar to those in my sample -- around 175 IP. It does vary with playing time, but only slightly. At 320 IP, you can use 19% instead. At 40 IP, you can use 22%. Or, just use 20% for everyone, and you won't be too far wrong.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Full disclosure: the real life numbers for 2017-19 are different. The theory is correct -- I wrote a simulation, and everything came out pretty much perfect. But on real data, not so perfect.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />When I ran a linear regression to predict OAA from team and player BIP, it didn't come out to 20%. It came out to only about 11.5%. 
The 95% confidence interval only brings it up to 15% or 16%.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />The same thing happened for the formula from the last post: instead of the predicted 26%, the actual regression came out to 17.5%.<br /> <br />For the record, these are the empirical regression equations, all numbers relative to league:</span></div><div style="text-align: left;"><br /><span style="font-family: courier;"><b>F = 0.23*(Team BAbip without pitcher) + 0.175*P<br />F = 0.92*(Team OAA/BIP including pitcher) + 0.115*P</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />Why so much lower than expected? I'm pretty sure it's random variation. The empirical estimate of 11.5% is very sensitive to small variations in the seasonal balance of variation in pitching and fielding luck vs. talent -- so sensitive that the difference between 11.5 points and 20 points is not statistically significant. Also, the actual number changes from year-to-year because of variation. So, I believe that the 20% number is correct as a long-term average, but for the seasons in the study, the actual number is probably somewhere between 11.5% and 20%.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br />I should probably explain that in a future post. But, for now, if you don't believe me, feel free to use the empirical numbers instead of my theoretical ones. 
Whether you use 11.5% or 20%, you'll still be much more accurate than using 100%, which is effectively what happens when you use the traditional method of assigning the overall team number equally to every pitcher.<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /></span><br /></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com5tag:blogger.com,1999:blog-31545676.post-33004576613447521382021-01-11T16:03:00.027-05:002021-01-31T14:50:24.865-05:00Splitting defensive credit between pitchers and fielders (Part II)<div><span style="font-family: verdana;">(Part 1 is <a href="http://blog.philbirnbaum.com/2020/12/splitting-defensive-credit-between_29.html">here</a>. This is Part 2. If you want to skip the math and just want the formula, it's at the bottom of this post.)</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">------</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">When evaluating a pitcher, you want to account for how good his fielders were. The "traditional" way of doing that is, you scale the team fielding to the pitcher. Suppose a pitcher was +20 plays better than normal, and his team fielding was -5 for the season. If the pitcher pitched 10 percent of the team innings, you might figure the fielding cost him 0.5 runs, and adjust him from +20 to +20.5.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I <a href="http://blog.philbirnbaum.com/2016/11/how-should-we-evaluate-detroits-defense.html">have argued</a> that this isn't right. Fielding performance varies from game to game, just like run support does. 
Pitchers with better ball-in-play numbers probably got better fielding during their starts than pitchers with worse ball-in-play numbers.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">By analogy to run support: in 1972, Steve Carlton famously went 27-10 on a Phillies team that was 32-87 without him. Imagine how good he must have been to go 27-10 for a team that scored only 3.22 runs per game!</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Except ... in the games Carlton started, the Phillies actually scored 3.76 runs per game. In games he didn't start, the Phillies scored only 3.03 runs per game. </span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">The fielding version of Steve Carlton might be <a href="https://www.baseball-reference.com/players/split.fcgi?id=nolaaa01&year=2018&t=p">Aaron Nola in 2018</a>. A couple of years ago, Tom Tango <a href="https://twitter.com/tangotiger/status/1081587340492050432">pointed out</a> <a href="https://twitter.com/tangotiger/status/1081602562044907521">the problem</a> using Nola as an example, so I'll follow his lead.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Nola went 17-6 for the Phillies with a 2.37 ERA, and gave up a batting average on balls in play (BAbip) of only .254, against a league average of .295 -- that, despite an estimate that his fielders were 0.60 runs per game worse than average. If you subtract 0.60 from Nola's stat line, you wind up with Nola's pitching equivalent to an ERA in the 1s. As a result, Baseball-Reference winds up assigning Nola a WAR of 10.2, tied with Mike Trout for best in MLB that year.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">But ... could Nola really have been hurt that much by his fielders? 
A BAbip of .254 is already exceptionally low. An estimate of -0.60 runs per game implies his BAbip with average fielders would have been .220, which is almost unheard of.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana; font-size: x-small;">(In fairness: the Phillies 0.60 DRS fielding estimate, which comes from Baseball Info Solutions, is much, much worse than estimates from other sources -- three times the UZR estimate, for instance. I suspect there's some kind of scaling bug in recent BIS ratings, because, roughly, if you divide DRS by 3, you get more realistic numbers, and standard deviations that now match the other measures. But I'll save that for a future post.)</span></div><div><span style="font-family: verdana;"><br /></span></div><div><span style="font-family: verdana;">So Nola was almost certainly hurt less by his fielders than his teammates were, the same way Steve Carlton was hurt less by his hitters than his teammates were. </span><span style="font-family: verdana;">But, how much less? </span></div><div><span style="font-family: verdana;"><br /></span></div><div><span style="font-family: verdana;">Phrasing the question another way: Nola's BAbip (I will leave out the word "against") was .254, on a team that was .306, in a league that was .295. What's the best estimate of how his fielders did?</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I think we can figure that out, extending the results in my previous post.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">------</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">First, let's adjust for park. In the five years prior to 2018, <strike>the Phillies</strike> BAbip for both teams combined was .0127 ("12.7 points") better at Citizens Bank Park than in Phillies road games. 
Since only half of Phillies games were at home, that's 6.3 points of park factor. Since there's a lot of luck involved, I regressed 60 percent to the mean of zero (with a limit of 5 points of regression, to avoid ruining outliers like Coors Field), leaving the Phillies with 2.5 points of park factor.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Now, look at how the Phillies did with all the other pitchers. For non-Nolas, the team BAbip was .3141, against a league average of .2954. Take the difference, subtract the park factor, and the Phillies were 21 points worse than average.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">How much of those 21 points came from below-average fielding talent? To figure that out, here's the SD breakdown from the previous post, but adjusted. I've bumped luck upwards for the lower number of PA, dropped park down to 1.5 since we have an actual estimate, and increased the SD of pitching because the Phillies had more high-inning guys than average:</span></div><div><span style="font-family: verdana;"><br /></span></div><div><b style="color: #990000; font-family: courier;">6.1 points fielding talent</b><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b>3.9 points fielding luck</b></span><br /><span style="color: #990000; font-family: courier;"><b>5.6 points pitching talent</b></span><br /><span style="color: #990000; font-family: courier;"><b>6.8 points pitching luck</b></span><br /><span style="color: #990000; font-family: courier;"><b>1.5 points park</b></span><br /><span style="color: #990000; font-family: courier;"><b>---------------------------</b></span><br /><span style="color: #990000; font-family: courier;"><b>11.5 points total</b></span></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">Of the 
Phillies' 21 points in BAbip, what percentage is fielding talent? The answer: (6.1/11.5)^2, or 28 percent. That's 5.9 points.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">So, we assume that the Phillies' fielding talent was 5.9 points of BAbip worse than average. With that number in hand, we'll leave the Phillies without Nola and move on to Nola himself.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">-------</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">On the raw numbers, Nola was 41 points better than the league average. But, we estimated, his fielding was about 6 points worse, while his park helped him by 2.5 points, so he was really 44.5 points better.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">For an individual pitcher with 700 BIP, here's the breakdown of SDs, again from the previous post:</span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b> 6.1 fielding talent</b></span></div><div><span style="color: #990000; font-family: courier;"><b> 7.6 fielding luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 7.6 pitching talent</b></span><br /><span style="color: #990000; font-family: courier;"><b>15.5 pitching luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 3.5 park</b></span><br /><span style="color: #990000; font-family: courier;"><b>---------------------</b></span><br /><span style="color: #990000; font-family: courier;"><b>20.2 total</b></span></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">We have to adjust all of these for Nola.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">First, fielding 
talent goes down to 5.2. Why? Because we estimated it from other data, and so we have less variance than if we just took the all-time average. (A simulation suggests that we multiply the 6.1 by, from the "team without Nola" case, (SD without fielding talent)/(SD with fielding talent).)</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Fielding luck and pitching luck increase because Nola had only 519 BIP, not 700.</span><br /><span style="font-family: verdana;"><br /></span></div><div><span style="font-family: verdana;">Finally, park goes to 1.5 for the same reason as before. </span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b> 5.2 fielding talent</b></span></div><div><span style="color: #990000; font-family: courier;"><b>10.0 fielding luck </b></span></div><div><span style="color: #990000; font-family: courier;"><b> 7.6 pitching talent</b></span><br /><span style="color: #990000; font-family: courier;"><b>17.3 pitching luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 1.5 park</b></span><br /><span style="color: #990000; font-family: courier;"><b>--------------------</b></span><br /><span style="color: #990000; font-family: courier;"><b>22.1 total</b></span></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">Convert to percentages:</span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b> 5.5% fielding talent</b></span></div><div><span style="color: #990000; font-family: courier;"><b>20.4% fielding luck</b></span><br /><span style="color: #990000; font-family: courier;"><b>11.8% pitching talent</b></span><br /><span style="color: #990000; font-family: courier;"><b>61.3% pitching luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 0.5% park</b></span><br /><span 
style="color: #990000; font-family: courier;"><b>---------------------</b></span><br /><span style="color: #990000; font-family: courier;"><b>100% total</b></span></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">Multiply by Nola's 44.5 points:</span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b> 2.5 fielding talent </b></span></div><div><span style="color: #990000; font-family: courier;"><b> 9.1 fielding luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 5.3 pitching talent</b></span><br /><span style="color: #990000; font-family: courier;"><b>27.3 pitching luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 0.2 park</b></span><br /><span style="color: #990000; font-family: courier;"><b>--------------------</b></span><br /><span style="color: #990000; font-family: courier;"><b>44.5 total</b></span></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">Now we add in our previous estimates of fielding talent and park, to get back to Nola's raw total of 41 points:</span></div><div><span style="font-family: verdana;"> </span></div><div><span style="color: #990000; font-family: courier;"><b>-3.4 fielding talent [2.5-5.9]</b></span></div><div><span style="color: #990000; font-family: courier;"><b> 9.1 fielding luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 5.3 pitching talent</b></span><br /><span style="color: #990000; font-family: courier;"><b>27.3 pitching luck</b></span><br /><span style="color: #990000; font-family: courier;"><b> 2.7 park [0.2+2.5]</b></span><br /><span style="color: #990000; font-family: courier;"><b>------------------------------</b></span><br /><span style="color: #990000; font-family: courier;"><b>41 total</b></span></div><div><span style="color: #990000; 
font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">Consolidate fielding and pitching:</span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b> 5.6 fielding</b></span></div><div><span style="color: #990000; font-family: courier;"><b>32.6 pitching </b></span><br /><span style="color: #990000; font-family: courier;"><b> 2.7 park </b></span></div><div><span style="color: #990000; font-family: courier;"><b>------------- </b></span><br /><span style="color: #990000; font-family: courier;"><b>41 total</b></span></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">Conclusion: The best estimate is that Nola's fielders actually *helped him* by 5.6 points of BAbip. That's about 3 extra outs in his 519 BIP. At 0.8 runs per out, that's 2.4 runs, in 212.1 IP, for about 0.24 WAR or 10 points of ERA.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Baseball-reference had him at 60 points of ERA; we have him at 10. Our estimate brings his WAR down from 10.3 to 9.1, or something like that. (Again, in fairness, most of that difference is the weirdly-high DRS estimate of 0.60. If DRS had him at a more reasonable .20, we'd have adjusted him from 9.4 to 9.1, or something.)</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">-------</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Our estimate of +3 outs is ... just an estimate. It would be nice if we had real data instead. We wouldn't have to do all this fancy stuff if we had a reliable zone-based estimate specifically for Nola.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Actually, we do! 
Since 2017, Statcast has been analyzing batted balls and tabulating "outs above average" (OAA) for every pitcher. For Nola, in 2018, they have +2. Tom Tango told me Statcast doesn't have data for all games, so I should multiply the OAA estimate by 1.25. </span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">That brings Statcast to +2.5. We estimated +3. Not bad!</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">But Nola is just one case. And we might be biased in the case of Nola. This method is based on a pitcher of average talent. Nola is well above average, so it's likely some of the difference we attributed to fielding is really due to Nola's own BAbip pitching tendencies. Maybe instead of +3, his fielders were really +1 or something.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">So I figured I'd better test other players too.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I found all pitchers from 2017 to 2019 that had Statcast estimates, with at least 300 BIP for a single team. There were a few players whose names didn't quite correlate with my Lahman database, so I just let those go instead of fixing them. That left 342 pitcher-seasons. I assume almost all of them were starters.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">For each pitcher, I ran the same calculation as for Nola. For comparison, I also did the "traditional" estimate where I gave the pitcher the same fielding as the rest of the team. 
Here are the correlations to the "gold standard" OAA:</span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #351c75; font-family: courier;"><b>r=0.37 this method</b></span></div><div><span style="color: #351c75; font-family: courier;"><b>r=0.23 traditional</b></span><br /><span style="font-family: verdana;"><br /></span></div><div><span style="font-family: verdana;">Here are the approximate root-mean-square errors (lower is better):</span><br /><span style="color: #351c75; font-family: courier;"><b><br /></b></span></div><div><span style="color: #351c75; font-family: courier;"><b>11.3 points of BAbip this method</b></span></div><div><span style="color: #351c75; font-family: courier;"><b>13.7 points of BAbip traditional</b></span><br /><span style="font-family: verdana;"><br /></span></div><div><span style="font-family: verdana;">This method is meant to be especially relevant for a pitcher like Nola, whose own BAbip is very different from his team's. Here are the root-mean-squared errors for pitchers who, like Nola, had a BAbip at least 10 plays better than their team's:</span><br /><span style="color: #351c75; font-family: courier;"><b><br /></b></span></div><div><span style="color: #351c75; font-family: courier;"><b> 9.3 points this method</b></span></div><div><span style="color: #351c75; font-family: courier;"><b>11.9 points traditional </b></span><br /><span style="font-family: verdana;"><br /></span></div><div><span style="font-family: verdana;">And for pitchers at least 10 plays worse:</span><br /><span style="color: #351c75; font-family: courier;"><b><br /></b></span></div><div><span style="color: #351c75; font-family: courier;"><b> 9.3 points this method</b></span></div><div><span style="color: #351c75; font-family: courier;"><b>10.9 points traditional</b></span></div><div><span style="color: #351c75; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">------</span></div><div><span 
style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Now, the best part: there's an easy formula to get our estimates, so we don't have to use the messy sums-of-squares stuff we've been doing so far. </span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">We found that the original estimate for team fielding talent was 28% of observed-BAbip-without-pitcher. And then, our estimate for additional fielding behind that pitcher was 26% of the difference between that pitcher and the team. In other words, if the team's non-Nola BAbip (relative to the league) is T, and Nola's is P,</span></div><div><span style="font-family: verdana;"><br /></span><b style="color: #990000; font-family: courier;">Fielders = .28T + .26(P-.28T)</b></div><div><span style="color: #990000; font-family: courier;"><b><br /></b></span><span style="font-family: verdana;">The coefficients vary by numbers of BIPs. But the .28 is pretty close for most teams. And, the .26 is pretty close for most single-season pitchers: luck is 25% fielding, and talent is about 30% fielding, so no matter your proportion of randomness-to-skill, you'll still wind up between 25% and 30%.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Expanding that out gives an easier version of the fielding adjustment, which I'll print bigger.</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">------</span></div><div><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Suppose you have an average pitcher, and you want to know how much his fielders helped or hurt him in a given season. 
You can use this estimate:</span></div><div><span style="font-family: verdana;"><br /></span><span style="color: #741b47; font-family: courier;"><b><span style="font-size: large;">F = .21T + .26P</span><span style="font-size: medium;"> </span></b></span></div><div><span style="color: #741b47; font-family: courier; font-size: medium;"><b><br /></b></span><b style="color: #351c75; font-family: courier;">Where: </b></div><div><span style="color: #351c75; font-family: courier;"><b><br /></b></span><span style="color: #351c75; font-family: courier;"><b>T is his team's BAbip relative to league for the other pitchers on the team, and</b></span></div><div><span style="color: #351c75; font-family: courier;"><b><br /></b></span></div><div><span style="color: #351c75; font-family: courier;"><div style="color: black; font-family: "Times New Roman";"><span style="color: #351c75; font-family: courier;"><b>P is the pitcher's BAbip relative to league, and </b></span></div><div style="color: black; font-family: "Times New Roman";"><span style="color: #351c75; font-family: courier;"><b><br /></b></span></div><div style="color: black; font-family: "Times New Roman";"><b style="color: #351c75; font-family: courier;">F is the estimated BAbip performance of the fielders, relative to league, when that pitcher was on the mound.</b></div></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="font-family: verdana;">-----</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">Next: <a href="http://blog.philbirnbaum.com/2021/01/splitting-defensive-credit-between_31.html">Part III</a>, splitting team OAA among pitchers.</span></div><div style="text-align: left;"><span 
style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-85239143777926143342020-12-29T15:56:00.005-05:002021-01-16T09:55:04.357-05:00Splitting defensive credit between pitchers and fielders (Part I)<p><span style="font-family: verdana; font-size: x-small;">(Update, 2020-12-29: This is take 2. I had posted this a few days ago, but, after further research, I tweaked the numbers and this is the result. Explanations are in the text.)</span></p><p><span style="font-family: verdana; font-size: x-small;">-----</span></p><p><span style="font-family: verdana;">Suppose a team has a good year in terms of opposition batted ball quality. Instead of giving up a batting average on balls in play (BAbip) of .300, their opponents hit only .280. In other words, they were .020 better than average in turning (inside-the-park) batted balls into outs. </span></p><p><span style="font-family: verdana;">How much of those "20 points" was because of the fielders, and how much was because of the pitcher?</span></p><p><span style="font-family: verdana;">Thanks to previous work by Tom Tango, Sky Andrecheck, and others, I think we have what we need to figure this out. If you don't want to see the math or logic, just head to the last section of this post for the two-sentence answer.</span></p><p><span style="font-family: verdana;">------</span></p><p><span style="font-family: verdana;">In 2003, a paper called "<a href="http://www.tangotiger.net/solvingdips.pdf">Solving DIPS</a>," (by Erik Allen, Arvin Hsu, Tom Tango, et al) did a great job in trying to establish what factors affect BAbip, and in what proportion. 
I did <a href="http://blog.philbirnbaum.com/2015/05/pitchers-influence-babip-more-than.html">my own estimation</a> in 2015 (having forgotten about the previous paper). I'll use my breakdown here. </span></p><p><span style="font-family: verdana;">Looking at a large number of actual team-seasons, I found that the observed SD of BAbip was 11.2 points. I estimated the breakdown of SDs as:</span></p><p><span style="font-family: verdana;"><br /></span></p><div style="text-align: left;"><span style="font-family: courier;"><b> 7.7 fielding talent<br /> 2.5 pitching staff talent<br /> 7.1 luck<br /> 2.5 park<br />--------------------------<br />11.0 total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana; font-size: x-small;">(If you haven't seen this kind of chart before, the "total" doesn't actually add up to the components unless you square them all. That's how SDs work -- when you have two independent variables, the SD of their sum is the square root of the sum of their squares.)</span></p><p><span style="font-family: verdana;">OK, this is where I update a bit from the numbers in the previous version of this post.</span></p><p><span style="font-family: verdana;">First, I'm bumping the SD of park from 2.5 points to 3.5 points, to match Tango's numbers for 1999-2002. Second, I'm bumping luck to 7.3, since that's the theoretical value (as I'll calculate later). Third, I'm bumping the pitching staff to 4.3, because after checking, it turns out I made an incorrect mathematical assumption in the previous post. Finally, fielding talent drops to 6.1 to make it all add up. 
So the new breakdown:</span></p><p><span style="font-family: verdana;"><br /></span></p><p><b style="color: #990000; font-family: courier;"> 6.1 fielding talent<br /> 4.3 pitching staff talent<br /> 7.3 luck<br /> 3.5 park<br />--------------------------<br />11.0 total</b></p><p><br /></p><p><span style="font-family: verdana;">----</span></p><p><span style="font-family: verdana;">We can use that chart to break the team's 20-point advantage into its components. But ... we can't yet calculate how much of that 20 points goes to the fielders, and how much to the pitchers. </span><span style="font-family: verdana;">Because, we have an entry called "luck". We need to know how to break down the luck and assign it to either side. </span></p><p><span style="font-family: verdana;">Your first reaction might be -- it's luck, so why should we care? If we're looking to assign deserved credit, why would we want to assign randomness?</span></p><p><span style="font-family: verdana;">But ... if we want to know how the players actually performed, we *do* want to include the luck. We want to know that Roger Maris hit 61 home runs in 1961, even if it's undoubtedly the case that he played over his head in doing so. In this context, "luck" just means the team did somewhat better or worse than their actual talent. That's still part of their record.</span></p><p><span style="font-family: verdana;">Similarly here. If a team gets lucky in opponent BAbip, all that means is they did better than their talent suggests. But how much of that extra performance was the pitchers, giving up easier balls in play? And how much was the fielders, making more and better plays than expected?</span></p><p><span style="font-family: verdana;">That's easy to figure out if we have zone-type fielding stats, calculated by watching where the ball is hit (and sometimes how fast and at what angle), and figuring out the difficulty of every ball, and whether or not the fielders were able to turn it into an out. 
With those stats, we don't have to risk "blaming" a fielder for not making a play on a bloop single he really had no chance on. </span></p><p><span style="font-family: verdana;">So where we have those stats, and they work, we have the answer right there, and this post is unnecessary. If the team was +60 runs on balls in play, and the fielders' zone ratings add up to +30, that's half-and-half, so we can say that the 20-point BAbip advantage was 10 points pitching and 10 points fielding.</span></p><p><span style="font-family: verdana;">But for seasons where we don't have the zone rating, what do we do, if we don't know how to split up the luck factor?</span></p><p><span style="font-family: verdana;">Interestingly, it will be the stats compiled by the Zone Rating people that allow us to calculate estimates for the years in which we don't have them.</span></p><p><span style="font-family: verdana;">------</span></p><p><span style="font-family: verdana;">Intuitively, the more common "easy outs" and "sure hits" are, the less fielders matter. In fact, if *all* balls in play were 0% or 100%, fielding performance wouldn't matter at all, and fielding luck wouldn't come into play. All the luck would be in what proportion the pitcher split between 0s and 100s. </span></p><p><span style="font-family: verdana;">On the other hand, if all balls in play were exactly the league average of 30%, it would be the other way around. There would be no difference in the types of hits pitchers gave up, which means there would be no BAbip pitching luck at all. All the luck would be in whether the fielders handled more or fewer than 30% of the chances.</span></p><p><span style="font-family: verdana;">So: the more BIP are "near-automatic" hits or "near-automatic" outs, the more pitchers matter. The more BIP that could go either way, the more fielders matter.</span></p><p><span style="font-family: verdana;">That means we need to know the distribution of ball-in-play difficulty. 
And that's the data that we wouldn't have without the development of Zone ratings now keeping track of it. </span></p><p><span style="font-family: verdana;">The data I'm using comes from Sky Andrecheck, who actually <a href="http://baseballanalysts.com/archives/2009/09/defense_never_s.php">published</a> it in 2009, but I didn't realize what it could do until now. (Actually, I'm repeating some of Sky's work here, because I got his data before I saw his analysis of it. See also <a href="http://www.insidethebook.com/ee/index.php/site/comments/baseball_guts_part_2/">Tango's post</a> at his old blog.)</span></p><p><span style="font-family: verdana;">Here's the distribution. Actually, I tweaked it just a tiny bit to make the average work out to .300 (.29987) instead of Sky's .310, for no other reason than I've been thinking .300 forever and didn't want to screw up and forget I need to use .310. Either way, the results that follow would be almost the same. </span></p><p><span style="font-family: verdana;"><br /></span></p><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>43.0% of BIP: .000 to .032 chance of a hit*<br />23.0% of BIP: .032 to .140 chance of a hit<br />10.3% of BIP: .140 to .700 chance of a hit<br /> 4.7% of BIP: .700 to 1.000 chance of a hit<br />19.0% of BIP: 1.000 chance of a hit<br />---------------------------------------------<br />overall average: really close to .300</b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b><br /></b></span></div><p><span style="color: #990000; font-family: courier; font-size: x-small;">(*Within a group, the probability is uniform, so anything between .032 and .140 is equally likely once that group is selected.)</span></p><p><span style="font-family: courier; font-size: x-small;"><br /></span></p><p><span style="font-family: verdana;">The SD of this distribution is around .397. 
Over 3900 BIP, which I used to represent a team-season, it's .00636. That's the SD of pitcher luck.</span></p><p><span style="font-family: verdana;">The random binomial SD of BAbip over 3900 BIP is the square root of (.3)(1-.3)/3900, which comes out to .00733. That's the SD of overall luck.</span></p><p><span style="font-family: verdana;">Since var(overall luck) = var(pitcher luck) + var(fielder luck), we can solve for fielder luck, which turns out to be .00367.</span></p><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>6.36 points pitcher luck (.00636)<br />3.67 points fielder luck (.00367)<br />--------------------------------<br />7.33 points overall luck (.00733)</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">If you square all the numbers and convert to percentages, you get</span></p><p><span style="font-family: verdana;"><br /></span></p><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b> 75.3 percent pitcher luck<br /> 24.7 percent fielder luck<br />--------------------------<br />100.0 percent overall luck</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">So there it is. BAbip luck is, on average, 75 percent pitching and 25 percent fielding. Of course, it varies randomly around that, but those are the averages.</span></p><p><span style="font-family: verdana;">What does that mean in practice? Suppose you notice that a team from the past, which you know has average talent in both pitching and fielding, gave up 20 fewer hits than expected on balls in play. 
If you were to go back and watch re-broadcasts of all 162 games, you'd expect to find that the fielders made 5 more plays than expected, based on what types of balls in play they were. And, you'd expect to find that the other 15 plays were the result of balls having been hit a bit easier to field than average.</span></p><p><span style="font-family: verdana;">Again, we are not estimating talent here: we are estimating *what happened in games*. This is a substitute for actually watching the games and measuring balls in play, or having zone ratings, which are based on someone else actually having done that. </span></p><p><span style="font-family: verdana;">------</span></p><p><span style="font-family: verdana;">So, now that we know the luck breaks down 75/25, we can take our original breakdown, which was this:</span></p><p><span style="font-family: verdana;"><br /></span></p><div style="text-align: left;"><b style="color: #990000; font-family: courier;"> 6.1 fielding talent<br /> 4.3 pitching staff talent<br /> 7.3 luck<br /> 3.5 park<br />--------------------------<br />11.0 total</b></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">And split up the 7.3 points of luck as we calculated:</span></p><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>6.36 pitching luck<br />3.67 fielding luck<br />--------------------------<br />7.3 total luck</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">And substitute that split back in to the original:</span></p><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b> 6.1 fielding 
talent<br /> 3.67 fielding luck<br /> 4.3 pitching staff talent<br /> 6.36 pitching staff luck<br /> 3.5 park<br />--------------------------<br />11.0 total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">Since talent+luck = observed performance, and talent and luck are independent, we can consolidate each pair of "talent" and "luck" by summing their squares and taking the square root:</span></p><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b> 7.1 fielding observed<br /> 7.7 pitching observed <br /> 3.5 park<br />----------------------<br />11.0 total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">Squaring, taking percentages, and rounding, we get</span></p><div style="text-align: left;"><span style="font-family: verdana;"><b style="color: #990000; font-family: courier;"> 42 percent fielding<br /></b></span><span style="font-family: verdana;"><b style="color: #990000; font-family: courier;"> 48 percent pitching</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><b style="color: #990000; font-family: courier;"> 10 percent park<br /></b></span><b style="color: #990000; font-family: courier;">--------------------<br /></b><span style="font-family: verdana;"><b style="color: #990000; font-family: courier;">100 percent total </b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><b style="color: #990000; font-family: courier;"><br /></b></span></div><p><span style="font-family: verdana;">If you're playing in an average park, or you're adjusting for park some other way, it doesn't apply here, and you can say </span></p><div style="text-align: left;"><br /></div><div style="text-align: 
left;"><span style="color: #990000; font-family: courier;"><b> 47 percent fielding<br /> 53 percent pitching<br />---------------------<br />100 percent total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">So now we have our answer. If you see a team's stats one year that show them to have been particularly good or bad at turning batted balls into outs, on average, after adjusting for park, 47 percent of the credit goes to the fielders, and 53 percent to the pitchers.</span></p><p><span style="font-family: verdana;">But it varies. Some teams might have been 40/60, or 60/40, or even 120/-20! (The latter result might happen if, say, the fielders saved 24 hits, but the pitchers gave up harder BIPs that cost 4 extra hits.)</span></p><p><span style="font-family: verdana;">How can you know how far a particular team is from the 47/53 average? Watch the games and calculate zone ratings. Or, just rely on someone else's reliable zone rating. Or, start with 47/53, and adjust for what you know about how good the pitching and fielding were, relative to each other. Or, if you don't know, just use 47/53 as your estimate.</span></p><p><span style="font-family: verdana;">To verify empirically whether I got this right, find a bunch of published Zone Ratings that you trust, and see if they work out to about 42 percent of what you'd expect if the entire excess BAbip was allocated to fielding. (I say 42 percent because I assume zone ratings correct for park.)</span></p><p><span style="font-family: verdana;">(Actually, I ran across about five years of data, and tried it, and it came out to 39 percent rather than 42 percent. 
Maybe I'm a bit off, or it's just random variation, or I'm way off and there's lots of variation.)</span></p><p><span style="font-family: verdana;">-------</span></p><p><span style="font-family: verdana;">So what we've found so far:</span></p><p><span style="font-family: verdana;">-- Luck in BAbip belongs 25% to fielders, 75% to pitchers;</span></p><p><span style="font-family: verdana;">-- For a team-season, excess performance in observed BAbip belongs 42% to fielders, 48% to pitchers, and 10% to park.</span></p><p><span style="font-family: verdana;">-------</span></p><p><span style="font-family: verdana;">That 42 percent figure is for a team-season only. For an individual pitcher, it's different. </span></p><p><span style="font-family: verdana;">Here's the breakdown for an individual pitcher who allows 700 BIP for the season. </span></p><p><span style="font-family: courier;"><b><br /></b></span></p><div style="text-align: left;"><span style="color: #0c343d;"><span style="font-family: courier;"><b> 6.1 fielding talent<br /></b></span><span style="font-family: courier;"><b> 7.6 pitching talent</b></span></span></div><div style="text-align: left;"><span style="color: #0c343d; font-family: courier;"><b>17.3 luck</b></span></div><div style="text-align: left;"><span style="color: #0c343d;"><span style="font-family: courier;"><b> 3.5 park<br /></b></span><span style="font-family: courier;"><b>---------------------------</b></span><br /><span style="font-family: courier;"><b>20.2 total</b></span></span></div><p style="text-align: left;"></p><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: verdana;">The SD of pitching talent is larger now, because you're dealing with one specific pitcher, rather than the average of all the team's pitchers (who will partially offset each other, reducing variability). 
Also, luck has jumped from 7.3 points to 17.3, because of the smaller sample size.</span></div><p></p><p><span style="font-family: verdana;">OK, now let's break up the luck portion again:</span></p><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><b> 6.1 fielding talent<br /> 7.6 fielding luck<br /> 7.6 pitching talent<br />15.5 pitching luck</b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><b> 3.5 park<br />---------------------------<br />20.2 total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">And consolidating:</span></p><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><b> 9.75 observed fielding<br />17.3 observed pitching</b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><b> 3.5 park<br />---------------------------<br />20.2 total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><p><span style="font-family: verdana;">Converting to percentages, and rounding from 31/69:</span></p><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><b> 23% observed fielding<br /> 73% observed pitching</b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><b> 3% park<br />---------------------------<br />100% total</b></span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: 
left;"><span style="font-family: verdana;">If we've already adjusted for park, then</span></div><div style="text-align: left;"><span style="font-family: courier;"><b><br /></b></span></div><div style="text-align: left;"><span style="color: #073763; font-family: courier;"><div style="font-family: 'Times New Roman';"><span style="font-family: courier;"><b> 24% observed fielding<br /> 76% observed pitching</b></span></div><div style="font-family: 'Times New Roman';"><span style="font-family: courier;"><b>---------------------------<br />100% total</b></span></div></span></div><p><span style="font-family: verdana;"><br /></span></p><p><span style="font-family: verdana;">So it's quite different for an individual pitcher than for a team season, because luck and talent break down differently between pitchers and fielders. </span></p><p><span style="font-family: verdana;">The conclusion: if you know nothing specific about the pitcher, his fielders, his park, or his team, your best guess is that 25 percent of his BAbip (compared to average) came from how well his fielders made plays, and 75 percent of his BAbip came from what kind of balls in play he gave up.</span></p><p><span style="font-family: verdana;">------</span></p><p><span style="font-family: verdana;">Here's the two-sentence summary. 
On average,</span></p><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>-- For teams with 3900 BIP, 47 percent of BABIP is fielding and 53 percent is pitching.<br /><br /></b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>-- For starters with 700 BIP, 24 percent of BABIP is fielding and 76 percent is pitching.</b></span></div><p><span style="font-family: verdana;"><b>------</b></span></p><p><span style="font-family: verdana;">Next: <a href="http://blog.philbirnbaum.com/2021/01/splitting-defensive-credit-between.html">Part II</a>, where I try applying this to pitcher evaluation, such as WAR.</span></p><p><span style="font-family: verdana;"><br /></span></p><p><span style="font-family: verdana;"><br /></span></p><p><br /></p>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-71659451505638905482020-10-31T16:17:00.004-04:002020-10-31T16:19:26.953-04:00Calculating park factors from batting lines instead of runs<div style="text-align: left;"><span style="font-family: verdana;">I missed a <a href="http://tangotiger.com/index.php/site/comments/how-much-random-variation-is-there-in-park-factors#2">post</a> Tango wrote back in 2019 about park factors. In the comments, he said,</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;"></span></div><blockquote><div style="text-align: left;"><span style="color: #0b5394; font-family: verdana;">"That’s one place where we failed with our park factors, using actual runs instead of "component" runs. 
They should be based on Linear Weights or RC or wOBA, something like that.</span></div><div style="text-align: left;"><span style="color: #0b5394;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">"Using actual runs means introducing unnecessary random variation in the mix."</span></span></div></blockquote><div style="text-align: left;"><span style="font-family: verdana;"></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Yup. One of those bits of brilliance that's obvious in retrospect.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">The idea is, there's a certain amount of luck involved in turning batting events into runs, which depends on the sequence -- in other words, "clutch hitting," which is thought to be mostly random. If teams wind up scoring, say, 20 runs above average in a certain park, it could be that the park lends itself to higher offense. But, it could also be that the park is neutral, and those 20 runs just came from better clutch hitting.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">So if we calculate park factors from raw batting lines instead of actual runs, we eliminate that luck, and should get better estimates. We can still convert to expected runs afterwards.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Let's do it. I'll start by using runs as usual. 
Then, I'll do it for wOBA, and we'll compare.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I used team-seasons from 2000-2019, except Coors Field (because it's such an extreme outlier). I included only parks that were used in at least 16 of the 20 seasons. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">To get the observed park effects, I just took home scoring (both teams combined) and subtracted road scoring (both teams combined). </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">For those 444 datapoints, I got</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(observed) = 81.6 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">To estimate luck, I used the rule of thumb that SD(runs) for a single team in a single game is about 3. (Tango uses the square root of total runs for both teams, but I didn't bother.) </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">If SD(1 game) = 3, then SD(81 games) = 27. But we want both teams combined, so multiply by root 2. Then, we want (home - road), so multiply by root 2 again. 
That gives us 54.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(luck) = 54 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Since var(observed) = var(luck) + var(non-luck), we get*</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(non-luck) = 61.2 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana; font-size: x-small;">*"var" is variance, the square of SD. I'm using it instead of "SD^2" because it's much easier to read.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Now, what's this thing I called "non-luck"? It's a combination of the differences between parks, and season-to-season differences within the same park -- weather, how well the players are suited to the park, the parks used by other teams in the division (because of the unbalanced schedule), the parks used by interleague opponents, the somewhat-random distribution of opposing pitchers ... stuff like that.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>var(non-luck) = var(between parks) + var(within park)</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">To estimate SD(within park), I just looked at the observed SDs of the same park across the 16-20 seasons in the dataset. There were 23 parks in the sample, and I took the root-mean-square of those 23 individual SDs. 
I got</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(different seasons of park) = 64.1</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">But ... that 64.1 includes luck, and we want only the non-luck portion. So let's remove luck:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>var(diff. seas. of park)= var(luck) + var(within park)<br />64.1 squared = 54 squared + var(within park)<br />SD(within park) = 34.5 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">And now we can estimate SD(between parks):</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>var(non-luck) = var(between parks) + var(within park)<br />61.2 squared = var(between parks) + 34.5 squared<br />SD(between parks) = 50.5 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Summarizing:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b>81.6 runs total<br />---------------------------------<br />54 luck<br />50.5 between parks<br />34.5 within park between seasons</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Park squared is only 38 percent of the total squared. 
That means that only 38 percent of the observed park effect is real, and you have to regress to the mean by 62 percent to get an unbiased estimate.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">That's a lot. And it's one reason that most sites publish park factors based on more than one season, to give luck a chance to even out.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Now, let's try Tango's suggestion to use wOBA instead, and see how much luck that squeezes out.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">For the same individual parks, I calculated every year's observed park difference the same way as for runs -- home wOBA minus road wOBA, both teams combined.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">For the sample, SD(observed) was 0.01524, against an average wOBA of .3248. That's a ratio of 4.7%. I did a regression and found runs-per-PA increase 1.8x as fast as wOBA (probably proportional to the 1.77th power, or something), so 4.7% in wOBA is 8.45% in runs.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">In the full sample, there were .118875 runs per PA, and an average 6207 PA for each home park-season. That's about 738 runs. 
Taking 8.45 percent of that works out to an SD of 67.3 runs.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(observed) = 67.3 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">The luck SD for wOBA for a single PA is .532 (as calculated from an average batting line APBA card). I'll spare you repeating the percentage calculations, but for 6207 PA,</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(luck) = 41.9 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">As before, var(observed) = var(luck) + var(non-luck), so</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(non-luck) = 52.7 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Looking at the RMS between-season SD of the 23 parks in the sample, </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>SD(different seasons of park) = 51.2 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Eliminating luck to get true season-to-season differences:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span></div><div style="text-align: left;"><span style="font-family: courier;"><b>var(diff. seas. 
of park)= var(luck) + var(within park)<br />51.2 squared = 41.9 squared + var(within park)<br />SD(within park) = 29.4 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">And, finally,</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>var(non-luck) = var(between parks) + var(within park)<br />52.7 squared = var(between parks) + 29.4 squared<br />SD(between parks) = 43.7 runs</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">The summary:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b>67.3 runs total<br />---------------------------------<br />41.9 luck<br />43.7 between park<br />29.4 within park between seasons</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Here the "between park" variance is 42 percent of the total, up from 38 percent when we used runs. So we have, in fact, gotten more accurate estimates.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">But wait! The two methods really should give us the same estimate of the SD of the "between" and "within" park factors, since they're trying to measure the same thing. 
But they don't:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: courier;"><b>runs wOBA<br />-----------------------------------------<br />81.6 67.3 runs total<br />-----------------------------------------<br />54 41.9 luck<br /><span style="color: #ff00fe;">50.5 43.7</span> between park<br /><span style="color: red;">34.5 29.4</span> within park between seasons</b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">(The "luck" SD is supposed to be different, since that was the whole purpose of using wOBA, to eliminate some of the random noise.)</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I think the difference is due to the fact that the wOBA variances were all based on averages per PA, while the runs variances were based on averages per game (roughly, per 27 outs).</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">On average, the more runs you score, the more PA you'll have. So changing the denominator to PA reduces the high-scoring games relative to the low-scoring games, which compresses the differences, which reduces the SD. </span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Although the differences in PA look small, they actually indicate large differences in scoring. Because, per season, every park gets roughly the same number of outs, which means roughly the same number of PA that are outs. 
So any "extra PA" are mostly baserunners, and very valuable in terms of runs.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">If you switch from "observed runs per game" to "observed runs per 6207 PA," the observed SD drops from 81.6 to 72.7 runs. That's an 11 percent drop. When I did the same for wOBA, the observed SD dropped by 13 percent. So, let's estimate that the difference between "per game" and "per PA" is 12 percent, and reduce everything in the runs column by 12 percent:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="color: #990000; font-family: courier;"><b>runs wOBA<br />--------------------------------------------<br />71.8 67.3 runs total<br />--------------------------------------------<br />47.5 41.9 luck<br />44.4 43.7 between park<br />30.4 29.4 within park between seasons<br />--------------------------------------------</b></span></div><div style="text-align: left;"><span style="color: #990000; font-family: courier;"><b>62% 58% regression to long-term mean </b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I'm not 100% sure this is legitimate, but it's probably pretty close. 
One thing I want to do to make the comparison better, is to use the same value for "between park" and "within park", since we expect the methods to produce the same estimate, and we expect that any difference is random (in things like wOBA to run conversion, or how PA vary between games, or the fact that the wOBA calculation omits factors like baserunning).</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">So after my manual adjustment, we have:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="color: #2b00fe; font-family: courier;"><b>runs wOBA<br />--------------------------------------------<br />71.4 67.8 runs total<br />--------------------------------------------<br />47.5 41.9 luck<br />44 44 between park<br />30 30 within park between seasons<br />--------------------------------------------<br />62% 58% regression to long-term mean </b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">That's still a fair bit you have to regress either way -- more than half -- but that would be reduced if you used more than one season in your sample. If we go to the average of four seasons, "luck" and "within park" both get cut in half (the square root of 1/4). 
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">I'll divide both of those by 2, and recalculate the top and bottom line:</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="color: #2b00fe; font-family: courier;"><b>runs wOBA</b></span></div><div style="text-align: left;"><span style="color: #2b00fe; font-family: courier;"><b>--------------------------------------------<br />52.3 51.0 runs total<br />--------------------------------------------<br />24 21 luck<br />44 44 between park<br />15 15 within park between seasons<br />--------------------------------------------</b></span></div><div style="text-align: left;"><span style="color: #2b00fe; font-family: courier;"><b>29% 26% regression to long-term mean </b></span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">So if we use a four-year park average, we should only have to regress 29 percent (for runs) or 26 percent (for wOBA). 
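<br /><br />For anyone who wants to re-run the arithmetic in this post, here is a minimal Python sketch of the variance bookkeeping. It is not the code used to produce the numbers above; the function names (decompose, regression_to_mean) are made up for illustration, the inputs are the SDs quoted in the text, and the multi-season step assumes that luck and within-park effects are independent from season to season, so that their variances shrink by 1/n when n seasons are averaged.

```python
import math


def decompose(sd_observed, sd_luck, sd_seasons_of_park):
    """Split an observed park-effect SD (in runs) into its components.

    Hypothetical helper: variances (SD squared) add, so each component
    is found by subtracting variances and taking the square root.
    """
    var_non_luck = sd_observed ** 2 - sd_luck ** 2
    var_within = sd_seasons_of_park ** 2 - sd_luck ** 2
    var_between = var_non_luck - var_within
    return {
        "non_luck": math.sqrt(var_non_luck),
        "within": math.sqrt(var_within),
        "between": math.sqrt(var_between),
    }


def regression_to_mean(sd_luck, sd_between, sd_within, seasons=1):
    """Fraction by which to regress an n-season average park effect.

    Assumes luck and within-park variation are independent across
    seasons, so averaging n seasons divides their variance by n.
    """
    var_noise = (sd_luck ** 2 + sd_within ** 2) / seasons
    var_total = sd_between ** 2 + var_noise
    return var_noise / var_total


# SDs quoted in the post: observed, luck, and seasons-of-same-park.
runs = decompose(81.6, 54.0, 64.1)
woba = decompose(67.3, 41.9, 51.2)
print({k: round(v, 1) for k, v in runs.items()})
print({k: round(v, 1) for k, v in woba.items()})

# Adjusted per-PA table: luck 47.5 (runs) or 41.9 (wOBA), between 44, within 30.
for n in (1, 4):
    r = regression_to_mean(47.5, 44.0, 30.0, seasons=n)
    w = regression_to_mean(41.9, 44.0, 30.0, seasons=n)
    print(n, round(100 * r), round(100 * w))
```

Run as written, the rounded results match the tables above: between-park SDs of 50.5 and 43.7 runs, and regression of 62 and 58 percent for one season, dropping to 29 and 26 percent for a four-season average. Changing the seasons argument shows how quickly the required regression falls as more years are averaged. 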
</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">-------</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">Thanks to Tango for the wOBA data making this possible, and for other observations I'm saving for a future post.</span></div><div style="text-align: left;"><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;">My three previous posts on park factors are here: <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-i.html">one</a> <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-ii.html">two</a> <a href="http://blog.philbirnbaum.com/2020/04/regressing-park-factors-part-iii.html">three</a></span><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;"><br /></span><span style="font-family: verdana;"><br /></span><br /></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-51402525747473505322020-08-27T15:13:00.003-04:002020-08-27T15:13:49.710-04:00Charlie Pavitt: Open the Hall of Fame to sabermetric pioneers<p><span style="background-color: white; font-family: verdana;">This guest post is from occasional contributor Charlie Pavitt. </span><a href="http://blog.philbirnbaum.com/search?q=charlie+pavitt+" style="background-color: white; color: #666666; font-family: verdana;">Here's a link</a><span style="background-color: white; font-family: verdana;"> to some of Charlie's previous posts.</span></p><p><span style="font-family: verdana;">-----</span></p><p><span style="font-family: verdana;">Induction into the National Baseball Hall of Fame (HOF) is
of course the highest honor available to those associated with the game. When one thinks of the HOF, one first thinks
of the greatest players, such as the first five inductees in 1936 (Cobb,
Johnson, Mathewson, Ruth, and Wagner). But other categories of contributors were added almost immediately:
league presidents (Morgan Bulkeley, Ban Johnson) and managers (Mack, McGraw)
plus George Wright in 1937, pioneers (Alexander Cartwright and Henry Chadwick)
in 1938, owners (Charles Comiskey) in 1939, umpires (Bill Klem) and what would
now be considered general managers (Ed Barrow) in 1953, and even union leaders
(Marvin Miller, this year for induction next year). There is an additional type
of honor associated with the HOF for contributions to the game: the J. G.
Taylor Spink Award (given by the Baseball Writers Association of America)
annually since 1962, the Ford C. Frick Award for broadcasters annually since
1978, and thus far five Buck O’Neil Lifetime Achievement Awards given every
three years since 2008. Even songs get
honored ("Centerfield", 2010; "Talkin' Baseball", 2011).</span></p>
<p class="MsoNormal"><span style="font-family: verdana;">But what about sabermetricians?<span style="mso-spacerun: yes;"> </span>Are they not having a major influence on the
game?<span style="mso-spacerun: yes;"> </span>Are there not some who are
deserving of an honor of this magnitude?</span></p>
<p class="MsoNormal"><span style="font-family: verdana;">I am proposing that an honor analogous to the Spink, Frick,
and O’Neil awards be given to sabermetricians who have made significant and
influential contributions to the analytic study of baseball. I would have
called it the Henry Chadwick Award to pay tribute to the inventor of the box
score, batting average, and earned run average, but SABR has already reserved
that title for its award for research contributions, a few of which have gone
to sabermetricians but most to other contributors.<span style="mso-spacerun: yes;"> </span>So instead I will call it the F. C. Lane award,
not in reference to Frank C. Lane (general manager of several teams in the
1950s and 1960s) but rather Ferdinand C. Lane, editor of Baseball Magazine
between 1911 and 1937. Lane wrote two articles for the publication ("Why the
System of Batting Should Be Reformed," January 1917, pages 52-60; "The Base on
Balls," March 1917, pages 93-95) in which he proposed linear weight formulas
for evaluating batting performance, the second of which is remarkably accurate.</span></p>
<p class="MsoNormal"><span style="font-family: verdana;">I shall now list those who I think have made "significant
and influential contributions to the analytic study of baseball" (that phrase
was purposely worded in order to delineate the intent of the award).<span style="mso-spacerun: yes;"> </span>The HOF began inductions with five players,
so I will propose who I think should be the first five recipients:</span></p><p class="MsoNormal"><span style="font-family: verdana;"><br /></span></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>George Lindsay</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">Between 1959 and 1963, based on data from a few hundred
games either he or his father had scored, George Lindsay published three
academic articles in which he examined issues such as the stability of the
batting average, average run expectancies for each number of outs during an
inning and for different innings, the length of extra-inning games, the
distribution of total runs for each team in a game, the odds of winning games
with various leads in each inning, and the value of intentional walks and base
stealing.<span style="mso-spacerun: yes;"> </span>It was revolutionary work, and
opened up areas of study that have been built upon by generations of
sabermetricians since.</span></p>
<p class="MsoNormal"><o:p><span style="font-family: verdana;"> </span></o:p></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>Bill James</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">Starting with his first self-published Baseball Abstract
back in 1977, James built up an audience that resulted in the Abstract becoming
a conventionally-published best seller between 1982 and 1988.</span><span style="font-family: verdana;"> </span><span style="font-family: verdana;">During those years, he proposed numerous
concepts – to name just three, Runs Created, the Pythagorean Equation, and the
Defensive Spectrum – that have influenced sabermetric work ever since.</span><span style="font-family: verdana;"> </span><span style="font-family: verdana;">But at least as important, if not more so, were his
other contributions.</span><span style="font-family: verdana;"> </span><span style="font-family: verdana;">He proposed and got
off the ground Project Scoresheet, the first volunteer effort to compile
pitch-by-pitch data for games to be made freely available to researchers; this
was the forerunner and inspiration for Retrosheet.</span><span style="font-family: verdana;"> </span><span style="font-family: verdana;">During the same years as the Abstract was
conventionally published, he oversaw a sabermetric newsletter/journal, the Baseball
Analyst, which provided a pre-Internet outlet for amateur sabermetricians
(including myself) who had few if any other opportunities to get their work out
to the public.</span><span style="font-family: verdana;"> </span><span style="font-family: verdana;">Perhaps most importantly,
his work was the first serious sabermetric (a term he coined) analysis many of
us saw, and served as an inspiration for us to try our hand at it too.</span><span style="font-family: verdana;"> </span><span style="font-family: verdana;">I might add that calls for James to be
inducted into the Hall itself can be found in a New York Times article from
January 20, 2019 by Jamie Malinowski and on the <a href="https://lastwordonbaseball.com/2020/05/17/baseball-hall-of-fame-contributors/">Last Word on Baseball website</a> by
its editor Evan Thompson.</span></p><p class="MsoNormal"><span style="font-family: verdana;"><br /></span></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>Pete Palmer</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">George Lindsay’s work was not readily available.<span style="mso-spacerun: yes;"> </span>The Hidden Game of Baseball, written by
Palmer and John Thorn, was, and included both a history of previous
quantitative work and advancement on that work in the spirit of Lindsay’s.
Palmer’s uses of linear-weight equations to measure offensive performance and of
run expectancies to evaluate strategy options were not entirely new, as Lane
and Lindsay, respectively, had been first, but it was Palmer’s presentation that
familiarized those who followed with these possibilities, and, as with
James, these were inspirations to many of us to try our hands at baseball
analytics ourselves.<span style="mso-spacerun: yes;"> </span>Probably the most
important of Palmer’s contributions has been On-base Plus Slugging (OPS), one
of the few sabermetric concepts to have become commonplace on baseball broadcasts.</span></p>
<p class="MsoNormal"><o:p><span style="font-family: verdana;"> </span></o:p></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>David Smith</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">I’ve already mentioned Project Scoresheet, which lasted as a
volunteer organization from 1984 through 1989.<span style="mso-spacerun: yes;"> </span>I do not wish to go into its fiery ending, a product of a fight about
conflict of interest and, in the end, money.<span style="mso-spacerun: yes;">
</span>Out of its ashes like the proverbial phoenix rose Retrosheet, the go-to
online source for data describing what occurred during all games dating back to
1973, most games back to 1920, and some from before then.<span style="mso-spacerun: yes;"> </span>Since its beginning, those involved with
Retrosheet have known not to repeat the Project’s errors and have made data
freely available to everyone even if the intended use for that data is personal
financial profit.<span style="mso-spacerun: yes;"> </span>Dave Smith was the
last director of Project Scoresheet, the motivator behind the beginning of
Retrosheet, and the latter’s president ever since.<span style="mso-spacerun: yes;"> </span>Although it is primed to continue when Dave
is gone, Retrosheet’s existence would be inconceivable without him.<span style="mso-spacerun: yes;"> </span>Baseball Prospectus’s analyst Russell
Carleton, whose work relies on Retrosheet, has made it clear in print that he
thinks that Dave should be inducted into the Hall itself.</span></p>
<p class="MsoNormal"><o:p><span style="font-family: verdana;"> </span></o:p></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>Sean Forman</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">It is true that Forman copied from other sources, but no
matter; it took a lot of work to begin what is now the go-to online source for
data on seasonal performance.<span style="mso-spacerun: yes;"> </span>Baseball
Reference began as a one-man sideline for an academic, and has become home to
information about all American major team sports plus world-wide info on “real”
football.<span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal"><o:p><span style="font-family: verdana;"> </span></o:p></p><p class="MsoNormal"><o:p><span style="font-family: verdana;">-----</span></o:p></p>
<p class="MsoNormal"><span style="font-family: verdana;">Here are two others that I believe should eventually be recipients.</span></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>Sherri Nichols</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">Only two women have received HOF-related awards;
Claire Smith is a past winner of the Spink Award and Rachel Robinson is a
recipient of the O’Neill Award.<span style="mso-spacerun: yes;"> </span>Sherri
Nichols would become the third.<span style="mso-spacerun: yes;"> </span>I became
convinced that she deserved it after reading <a href="https://www.theringer.com/mlb/2018/2/20/17030428/sherri-nichols-baseball-sabermetric-movement">Ben Lindbergh’s tribute</a>, and
recommend it to all interested in learning about the "founding mother" of
sabermetrics.<span style="mso-spacerun: yes;"> </span>I remember when the late Pete DeCoursey (I
was scoring Project Scoresheet Phillies games and he was our team captain)
proposed the concept of Defensive Average, for which (as Lindbergh’s article
noted) Nichols did the computations.<span style="mso-spacerun: yes;"> </span>This was revolutionary work at that time, and laid the groundwork for
all of the advanced fielding information we now have at our disposal.</span></p>
<p class="MsoNormal"><o:p><span style="font-family: verdana;"> </span></o:p></p>
<p class="MsoNormal"><u><span style="font-family: verdana;"><b>Tom Tango</b></span></u></p>
<p class="MsoNormal"><span style="font-family: verdana;">Tango has had significant influence on many areas of
sabermetric work, two of which have joined Palmer’s OPS as commonplaces on
baseball-related broadcasts.<span style="mso-spacerun: yes;"> </span>Wins Above
Replacement (WAR) was actually Bill James’s idea, but James never tried to
implement it.<span style="mso-spacerun: yes;"> </span>Tango has helped define
it, and his offensive index wOBA underlies the two most prominent
instantiations, those from Baseball Reference (alternatively referred to as
bWAR and rWAR) and FanGraphs (fWAR).<span style="mso-spacerun: yes;">
</span>Leverage was an idea whose time had come, as our blogmaster Phil
Birnbaum came up with the same concept at about the same time, but it was
Tango’s usage that became definitive.<span style="mso-spacerun: yes;"> </span>His Fielding Independent Pitching (FIP) corrective to weaknesses in ERA
is also well-known and often used.<span style="mso-spacerun: yes;"> </span>Tango
currently oversees data collection for MLB Advanced Media, and has done
definitive work on MLBAM’s measurement of fielding (click <a href="https://www.reddit.com/r/baseball/comments/em9klj/history_of_the_fielding_a_white_paper_from_tom/">here</a> for a magisterial discussion of that topic).</span></p>
<p class="MsoNormal"><span style="font-family: verdana;">There are some historical figures who might be deserving;
Craig Wright, Dick Cramer, and Allan Roth come to mind as possibilities.<span style="mso-spacerun: yes;"> </span>Maybe even Earnshaw Cook, as wrong as he was
about just about everything, because of what he was attempting to do without
the data he needed to do it right (see his Percentage Baseball book for a
historically significant document).<span style="mso-spacerun: yes;"> </span>Perhaps the Award could also go to organizations as a whole, such as
Baseball Prospectus and FanGraphs; if so, SABR should get it first.</span></p>
<p class="MsoNormal"><br /></p>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com3tag:blogger.com,1999:blog-31545676.post-7597298007846762402020-08-05T12:36:00.008-04:002020-08-08T00:04:24.755-04:00The NEJM hydroxychloroquine study fails to notice its largest effect<div><font face="verdana">Before hydroxychloroquine was a Donald Trump joke, the drug was considered a promising possibility for prevention and treatment of Covid-19. It had been previously shown to work against respiratory viruses in the lab, and, for decades, it was safely and routinely given to travellers before departing to malaria-infested regions. A doctor friend of mine (who, I am hoping, will have reviewed this post for medical soundness before I post it) recalls having taken it before a trip to India.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">Travellers start on hydroxychloroquine two weeks before departure; this gives the drug time to build up in the body. Large doses at once can cause gastrointestinal side effects, but since hydroxychloroquine has a very long half-life in the body -- three weeks or so -- you build it up gradually.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">For malaria, hydroxychloroquine can also be used for treatment. However, several recent studies have found it to be ineffective treating advanced Covid-19.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">That leaves prevention. Can hydroxychloroquine be used to prevent Covid-19 infections? The "gold standard" would be a randomized double-blind placebo study, and we got one a couple of months ago, <a href="https://www.nejm.org/doi/pdf/10.1056/NEJMoa2016638?articleTools=true" target="_blank">in the New England Journal of Medicine</a> (NEJM). 
</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">It found no statistically significant difference between the treatment and placebo groups, and concluded</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana"><blockquote><font color="#073763">"After high-risk or moderate-risk exposure to Covid-19, hydroxychloroquine did not prevent illness compatible with Covid-19 or confirmed infection when used as postexposure prophylaxis within 4 days after exposure."</font></blockquote></font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">But ... after looking at the paper in more detail, I'm not so sure.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">-------</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">The study reported on 821 subjects who had been exposed, within the past four days, to a patient testing positive for Covid-19. They received a dose of either hydroxychloroquine or placebo for the next five days (the first day was a higher "loading dose"), and were followed over the next couple of weeks to see if they contracted the virus.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">The results:</font></div><div><font face="verdana"><br /></font></div><div><font color="#990000" face="courier"><b>49 of 414 treatment subjects (11.8%) became infected</b></font></div><div><font color="#990000" face="courier"><b>58 of 407 placebo subjects (14.3%) became infected.</b></font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">That's about 17 percent fewer cases in patients who got the real drug. </font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">But that wasn't a large enough difference to show statistical significance, with only about 400 subjects in each group. 
The paper recognizes that, stating the study was designed only with sufficient power to find a reduction of at least 50 percent, not the 17 percent reduction that actually appeared. Still, by the usual academic standards for this sort of thing, the authors were able to declare that "hydroxychloroquine did not prevent illness."</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">At this point I would normally rant about statistical significance and how "absence of evidence is not evidence of absence." But I'll skip that, because there's something more interesting going on.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">------</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">Recall that the study tested hydroxychloroquine on subjects who feared they were already exposed to the virus. That's not really testing prevention ... it's testing treatment, albeit early treatment. It does have elements of prevention in it, as perhaps the subjects may not have been infected at that point, but would be infected later. (The study doesn't say explicitly, but I would assume some of the exposures were to family members, so repeated exposures over the next two weeks would be likely.)</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">Also: it did take five days of dosing until the full dose of hydroxychloroquine was taken. That means the subject didn't get a full dose until up to nine days after exposure to the virus.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">So this is where it gets interesting. 
Here's Figure 2 from the paper:</font></div><div><font face="verdana"><br /></font></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsMNPyo52spJQ4OD-3yJQHiuCIVzV20FyuneiT_1sdBC09hYFCp_g8k0YN5DHSYjw4uJHgUCPGcR2KPX0vcUW67e87BotzfCbNDQDtE6T87KVzNcV9hE7UadrQfJZupkSeqnP0wQ/s801/hcq.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="680" data-original-width="801" height="278" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsMNPyo52spJQ4OD-3yJQHiuCIVzV20FyuneiT_1sdBC09hYFCp_g8k0YN5DHSYjw4uJHgUCPGcR2KPX0vcUW67e87BotzfCbNDQDtE6T87KVzNcV9hE7UadrQfJZupkSeqnP0wQ/w328-h278/hcq.jpg" width="328" /></a></div><font face="verdana"><br /></font></div><div><br /></div><div><font face="verdana">These lines are cumulative infections during the course of the study. As of day 5, there were actually more infections in the group that took hydroxychloroquine than in the group that got the placebo ... which is perhaps not that surprising, since the subjects hadn't finished their full doses until that fifth day. By day 10, the placebo group has caught up, but the groups are still about equal.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">But now ... look what happens from Day 10 to Day 14. The group that got the hydroxychloroquine doesn't move much ... but the placebo group shoots up.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">What's the difference in new cases? The study doesn't give the exact numbers that correspond to the graph, so I used a pixel ruler to measure the distances between points of the graph. 
It turns out that from Day 10 to Day 14, they found:</font></div><div><font face="verdana"><br /></font></div><div><font color="#990000" face="courier"><b>-- 11 new infections in the placebo group</b></font></div><div><font color="#990000" face="courier"><b>-- 2 new infections in the hydroxychloroquine group.</b></font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">What is the chance that of 13 new infections, they would get split 11:2? </font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">About 1.12 percent one-tailed, 2.24 percent two-tailed.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">Now, I know that it's usually not legitimate to pick specific findings out of a study ... with 100 findings, you're bound to find one or two random ones that fall into that significance level. But this is not an arbitrary random pattern -- it's exactly what we would have expected to find if hydroxychloroquine worked as a preventative. </font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">It takes, on average, about a week for symptoms to appear after COVID-19 infection. So for those subjects in the "1-5" group, most were probably infected *before* the start of their hydroxychloroquine regimen (up to four days before, as the study notes). So those don't necessarily provide evidence of prevention. </font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">In the "6-10" group, we'd expect most of them to have been already infected before the drugs were administered; the reason they were admitted to the study in the first place was because they feared they had been exposed. So probably many of those who didn't experience symptoms until, say, Day 9, were already infected but had a longer incubation period. 
Also, most of the subsequently-infected subjects in that group probably got infected in the first five days, while they didn't have a full dose of the drug yet.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">But in the last group, the "11-14" group, that's when you'd expect the largest preventative effect -- they'd have had a full dose of the drug for at least six days, and they were the most likely to have become infected only after the start of the trial.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">And that's when the hydroxychloroquine group had an <b>84 percent lower</b> infection rate than the placebo group.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">------</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">In everything I've been reading about hydroxychloroquine and this study, I have not seen anyone notice this anomaly, that beyond ten days, there were almost seven times as many infections among those who didn't get the hydroxychloroquine. In fact, even the authors of the study didn't notice. They stopped the study on the basis of "futility" once they realized they were not going to achieve statistical significance (or, in other words, once they realized the reduction in infections was much less than the 50% minimum they would endorse). In other words: they stopped the study just as the results were starting to show up! </font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">And then the FDA, noting the lack of statistical significance, revoked authorization to use hydroxychloroquine.</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">I'm not trying to push hydroxychloroquine here ... and I'm certainly not saying that I think it will definitely work. If I had to give a gut estimate, based on this data and everything else I've seen, I'd say ... 
I dunno, maybe a 15 percent chance. Your guess may be lower. Even if your gut says there's only a one-in-a-hundred chance that this 84 percent reduction is real and not a random artifact ... in the midst of this kind of pandemic, isn't even 1 percent enough to say, hey, maybe it's worth another trial?</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana">I know hydroxychloroquine is considered politically unpopular, and it's fun to make a mockery of it. But these results are strongly suggestive that there might be something there. If we all agree that Trump is an idiot, and even a stopped clock is right twice a day, can we try evaluating the results of this trial on what the evidence actually shows? Can we not elevate common sense over the politics of Trump, and the straitjacket of statistical significance, and actually do some proper science?</font></div><div><font face="verdana"><br /></font></div><div><font face="verdana"><br /></font></div><div><font face="verdana"><br /></font></div><div><br /></div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com4tag:blogger.com,1999:blog-31545676.post-85637883671432023082020-05-17T17:52:00.001-04:002020-05-20T15:12:20.210-04:00Herd immunity comes faster when some people are more infectious<span style="font-family: "verdana" , sans-serif;">By now, we all know about "R0" and how it needs to drop below 1.0 for us to achieve "herd immunity" to the COVID virus. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The estimates I've seen are that the "R0" (or "R") for the COVID-19 virus is around 4. That means, in a susceptible population with no interventions like social distancing, the average infected person will pass the virus on to 4 other people. Each of those four passes it on to four others, and each of those 16 newly-infected people passes it on to four others, and the incidence grows exponentially.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, suppose that 75 percent of the population is immune, perhaps because they've already been infected. Then, each infected person can pass the virus on to only one other person (since the other three, who would otherwise be infected, are immune). That means R0 has dropped from 4 to 1. With R0=1, the number of infected people will stay level. As more people become immune, R drops further, and the disease eventually dies out in the population.</span><br />
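<span style="font-family: verdana, sans-serif;"><br /></span>
<span style="font-family: verdana, sans-serif;">The arithmetic behind that 75 percent figure can be written out as a small sketch. It assumes everyone is equally infectious and that the immune people are a random cross-section of the population, which is exactly the assumption the rest of this post questions. The function names are made up for illustration.</span><br />

```python
def effective_r(r0: float, immune_fraction: float) -> float:
    """Effective reproduction number when a random fraction of people is immune."""
    return r0 * (1.0 - immune_fraction)


def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction at which the effective R falls to exactly 1."""
    return 1.0 - 1.0 / r0


print(effective_r(4, 0.75))        # 1.0
print(herd_immunity_threshold(4))  # 0.75
```

<span style="font-family: verdana, sans-serif;">With R0 = 4, three of every four would-be infections have to land on immune people, which is where the 75 percent comes from.</span><br />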
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's the argument most experts have been making, so far -- that we can't count on herd immunity any time soon, because we'd need 75 percent of the population to get infected first.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But that can't be right, as "Zvi" points out in <a href="https://www.lesswrong.com/posts/TcgWKeuSNfQvjK6Mj/covid-19-5-7-fighting-limbo#Seriously__Stop_Thinking_It_Takes_75__Infected_To_Get_Herd_Immunity">a post on LessWrong</a>*. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(*I recommend LessWrong as an excellent place to look to for <a href="https://www.lesswrong.com/tag/coronavirus">good reasoning on coronavirus issues</a>, with arguments that make the most sense.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's because not everyone is equal in terms of how much they're likely to spread the virus. In other words, everyone has his or her own personal R0. Those with a higher R0 -- people who don't wash their hands much, or shake hands with a lot of people, or just encounter more people for face-to-face interactions -- are also likely to become infected sooner. When they become immune, they drop the overall "societal" R0 more than if they were average.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you want to reduce home runs in baseball by 75 percent, you don't have to eliminate 75 percent of plate appearances. You can probably do it by eliminating as little as, say, 25 percent, if you get rid of the top power hitters only.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><a href="https://www.lesswrong.com/posts/TcgWKeuSNfQvjK6Mj/covid-19-5-7-fighting-limbo#Seriously__Stop_Thinking_It_Takes_75__Infected_To_Get_Herd_Immunity">As Zvi writes</a>,</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<blockquote class="tr_bq">
<span style="color: #660000; font-family: "verdana" , sans-serif;">"Seriously, stop thinking it takes 75% infected to get herd immunity...</span><br />
<br />
<span style="color: #660000; font-family: "verdana" , sans-serif;">"... shame on anyone who doesn’t realize that you get partial immunity much bigger than the percent of people infected. </span><br />
<br />
<span style="color: #660000; font-family: "verdana" , sans-serif;">"General reminder that people’s behavior and exposure to the virus, and probably also their vulnerability to it, follow power laws. When half the population is infected and half isn’t, the halves aren’t chosen at random. They’re based on people’s behaviors.</span><br />
<br />
<span style="color: #660000; font-family: "verdana" , sans-serif;">"Thus, expect much bigger herd immunity effects than the default percentages."</span></blockquote>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But to what extent does the variance in individual behavior affect the spread of the virus? Is it just a minimal difference, or is it big enough that, for instance, New York City (with some 20 percent of people having been exposed to the virus) is appreciably closer to herd immunity than we think?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To check, I wrote a simulation. It is probably in no way actually realistic in terms of how well it models the actual spread of COVID, but I think we can learn something from the differences in what the model shows for different assumptions about individual R0.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I created 100,000 simulated people, and gave them each a "spreader rating" to represent their R0. The actual values of the ratings don't matter, except relative to the rest of the population. I created a fixed number of "face-to-face interactions" each day, and the chance of being involved in one is directly proportional to the rating. So, people with a rating of "8" are four times as likely to have a chance to spread/catch the virus as people with a rating of "2". </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In each of those interactions, if one person was infected and the other wasn't, there was a fixed probability of the infection spreading to the other person.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For every simulation, I jigged the numbers to get the R0 to be around 4 for the first part of the pandemic, from the start until the point where around three percent of the population was infected. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The simulation started with 10 people newly infected. I assumed that infected people could spread the virus only for the first 10 days after infection. </span><br />
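<span style="font-family: verdana, sans-serif;"><br /></span>
<span style="font-family: verdana, sans-serif;">For readers who want to experiment, here is a minimal sketch of a simulation along the lines described above. It is not the code used for this post: the function name, the number of interactions per day, and the transmission probability are all assumptions, the parameters have not been tuned to give an initial R0 of 4, and the example run uses a much smaller population so that it finishes quickly.</span><br />

```python
import random


def simulate(ratings, n_days=60, interactions_per_day=1000, p_transmit=0.3,
             infectious_days=10, n_seed=10, seed=0):
    """Return cumulative infections at the end of each day (hypothetical sketch)."""
    rng = random.Random(seed)
    people = range(len(ratings))
    infected_on = [None] * len(ratings)  # day of infection, or None if never infected
    for person in rng.sample(people, n_seed):
        infected_on[person] = 0
    total = n_seed
    cumulative = []

    def infectious(person, day):
        start = infected_on[person]
        # Contagious from the day after infection through infectious_days later.
        return start is not None and start < day <= start + infectious_days

    for day in range(1, n_days + 1):
        # Each participant is drawn with probability proportional to their rating.
        firsts = rng.choices(people, weights=ratings, k=interactions_per_day)
        seconds = rng.choices(people, weights=ratings, k=interactions_per_day)
        for a, b in zip(firsts, seconds):
            for src, dst in ((a, b), (b, a)):
                if (infectious(src, day) and infected_on[dst] is None
                        and rng.random() < p_transmit):
                    infected_on[dst] = day
                    total += 1
        cumulative.append(total)
    return cumulative


# Assumed example: 2,000 people who all have the same spreader rating.
curve = simulate([1] * 2000)
print(curve[-1], "of 2000 infected after", len(curve), "days")
```

<span style="font-family: verdana, sans-serif;">Passing a different list of ratings, with everything else held fixed, is all it takes to compare scenarios with more or less variation between people.</span><br />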
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The four simulations were:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. Everyone has the same rating.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. Everyone rolls a die until "1" or "2" comes up, and their spreader rating is the number of rolls it took. (On average, that's 3 rolls. But in a hundred thousand trials, you get some pretty big outliers. I think there was typically a 26 or higher -- 26 consecutive high rolls happens one time in 37,877.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">3. Same as #2, except that 1 percent of the population are superspreaders, with a spreader rating of 30. The first nine infected people were chosen randomly, but the tenth was always set to "superspreader."</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">4. Same as #3, but the superspreaders got an 80 instead of a 30.</span><br />
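<span style="font-family: verdana, sans-serif;"><br /></span>
<span style="font-family: verdana, sans-serif;">The die-rolling rule in #2 and the superspreader overlay in #3 and #4 can be sketched as follows. The helper names and the random seed are hypothetical, and this sketch leaves out the detail that the tenth initial infection is forced to be a superspreader.</span><br />

```python
import random


def roll_rating(rng):
    """Roll a die until a 1 or 2 comes up; the rating is the number of rolls."""
    rolls = 1
    while rng.randint(1, 6) > 2:
        rolls += 1
    return rolls


def make_ratings(n, rng, superspreader_share=0.0, superspreader_rating=30):
    """Give everyone a die-roll rating, then overwrite a share with a fixed rating."""
    ratings = [roll_rating(rng) for _ in range(n)]
    n_super = round(n * superspreader_share)
    for i in rng.sample(range(n), n_super):
        ratings[i] = superspreader_rating
    return ratings


rng = random.Random(2020)
ratings = make_ratings(100_000, rng)
print(sum(ratings) / len(ratings))  # close to 3, the expected number of rolls
print(max(ratings))                 # the biggest outlier in this particular draw
print(1.5 ** 26)                    # about 37,877: the odds against 26 straight high rolls
```

<span style="font-family: verdana, sans-serif;">Calling make_ratings(100_000, rng, 0.01, 80) would give the ratings for scenario #4.</span><br />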
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In the first simulation, everyone got the same rating. With an initial R0 of around 4, it did, indeed, take around 75 percent of the population to get infected before R0 dropped below 1.0. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Overall, around 97 percent of the population wound up being infected before the virus disappeared completely.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's the graph:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYrSQ6FIPINVstgHxkZ5GJ5c1r-QIpEyEmRWopzC05p7Rf7LafJgcYZVUaOr9uqMAvjgQdmP1X4AfV7uIK9E_ionmuEibtb6S6gv3O1ugGIRAhxRlaSG1l7ckQOPTDd8c6pZDaYg/s1600/covid1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="352" data-original-width="623" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYrSQ6FIPINVstgHxkZ5GJ5c1r-QIpEyEmRWopzC05p7Rf7LafJgcYZVUaOr9uqMAvjgQdmP1X4AfV7uIK9E_ionmuEibtb6S6gv3O1ugGIRAhxRlaSG1l7ckQOPTDd8c6pZDaYg/s400/covid1.jpg" width="400" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The point where R0 drops below 1.0 is where the next day's increase is smaller than the previous day's increase. It's hard to eyeball that on the curve, but it's around day 32, where the total crosses the 75,000 mark.</span><br />
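<span style="font-family: verdana, sans-serif;"><br /></span>
<span style="font-family: verdana, sans-serif;">Rather than eyeballing the curve, that turning point can be located from the cumulative totals. The sketch below takes the day with the largest daily increase as the turning point, which is an assumption that works for a smooth simulated curve but would need smoothing for noisy real-world counts. The example numbers are invented for illustration and are not output from the simulation.</span><br />

```python
def peak_growth_day(cumulative):
    """Return (day index, cumulative total) for the day with the largest increase."""
    daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
    peak = max(range(len(daily)), key=daily.__getitem__)
    # daily[peak] is the jump from index peak to index peak + 1.
    return peak + 1, cumulative[peak + 1]


# Invented cumulative totals; the daily increases are 30, 120, 440, 900, 600, 300, 100.
example = [10, 40, 160, 600, 1500, 2100, 2400, 2500]
print(peak_growth_day(example))  # (4, 1500)
```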
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As I mentioned, I jigged the other three curves so that for the first 22 days, they had about the same R0 of around 4, so as to match the "everyone the same" graph.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's the graph of all four simulations for those first 22 days:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhptql2VVQTz_BfpwsuqkpRaivQLE6cp8fOXKyP2t5YciHLpYA4TK7C0R7zSTKKsf4oLcmNUl_B7wZpjbgtAD5ytnj6pSoQmydeB9UWAfxNSLQ0FcTCNSMOkaxGco4vxUBOiXrpeQ/s1600/covid2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="352" data-original-width="623" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhptql2VVQTz_BfpwsuqkpRaivQLE6cp8fOXKyP2t5YciHLpYA4TK7C0R7zSTKKsf4oLcmNUl_B7wZpjbgtAD5ytnj6pSoQmydeB9UWAfxNSLQ0FcTCNSMOkaxGco4vxUBOiXrpeQ/s400/covid2.jpg" width="400" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Aside from the scale, they're pretty similar to the curves we've seen in real life. Which means that, based on the data we've seen so far, we can't really tell from the numbers which simulation is closest to our true situation.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But ... after that point, as Zvi explained, the four curves do diverge. Here they are in full:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjC57SDTriEw9berfXh1pCjY2uqMMFrRb-LYRHXBFqMj29RywqQc2odvQK1bdtUO-PSQ1SBjWgfKwI6h4jcZvY0CSXK9VLAt4dNpRliLDts_YLH2hA8rZe8sMFXv8GiQ8oy7cBUw/s1600/covid3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="352" data-original-width="623" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjC57SDTriEw9berfXh1pCjY2uqMMFrRb-LYRHXBFqMj29RywqQc2odvQK1bdtUO-PSQ1SBjWgfKwI6h4jcZvY0CSXK9VLAt4dNpRliLDts_YLH2hA8rZe8sMFXv8GiQ8oy7cBUw/s400/covid3.jpg" width="400" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Big differences, in the direction that Zvi explained. The bigger the variance in individual R0, the more attenuated the progression of the virus.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which makes sense. All four curves had an R0 of around 4.0 at the beginning. But the bottom curve was 99 percent with an average of 3 encounters, and 1 percent superspreaders with an average of 80 encounters. Once those superspreaders are no longer superspreading, the R0 plummets. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In other words: herd immunity brings the curve under control by reducing opportunity for infection. In the bottom curve, eliminating the top 1% of the population reduces opportunity by 40%. In the top curve, eliminating 1% of the population reduces opportunity only by 2%.</span><br />
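<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's one way to get close to those two percentages. It's a sketch, not the simulation's actual code. The 99%/3 and 1%/80 figures come from the paragraph above; the "everyone at 4 encounters" world, the function name, and the rule that opportunity shrinks with the *square* of the remaining share of encounters (because both the infected person and the susceptible person have to show up) are my assumptions.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
def opportunity_reduction(groups, removed):
    """Fraction of infection opportunity lost when some groups become immune.

    groups:  list of (population_share, avg_encounters) pairs
    removed: set of indexes of the groups that are now immune
    Assumption (not from the post): opportunity is proportional to the
    square of the share of all encounters that still involve
    non-immune people.
    """
    total = sum(share * enc for share, enc in groups)
    left = sum(share * enc
               for i, (share, enc) in enumerate(groups)
               if i not in removed)
    return 1 - (left / total) ** 2


# Bottom curve: 99% average 3 encounters, 1% superspreaders average 80.
superspreader_world = [(0.99, 3), (0.01, 80)]
# Top curve: everyone the same (4 encounters each is a hypothetical value).
same_world = [(0.99, 4), (0.01, 4)]

bottom = opportunity_reduction(superspreader_world, removed={1})
top = opportunity_reduction(same_world, removed={1})
print(f"bottom curve: {bottom:.0%}, top curve: {top:.0%}")
```

<span style="font-family: "verdana" , sans-serif;">Under those assumptions, the bottom curve loses a bit under 40 percent of its opportunity and the top curve about 2 percent, in line with the figures above.</span><br />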
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For all four curves, here's where R0 dropped below 1.0:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">75% -- all people the same</span><br />
<span style="font-family: "verdana" , sans-serif;">58% -- all different, no superspreaders</span><br />
<span style="font-family: "verdana" , sans-serif;">44% -- all different, superspreaders 10x average</span><br />
<span style="font-family: "verdana" , sans-serif;">20% -- all different, superspreaders 26x average</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">And here's the total number of people who ever got infected:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">97% -- all people the same</span><br />
<span style="font-family: "verdana" , sans-serif;">81% -- all different, no superspreaders</span><br />
<span style="font-family: "verdana" , sans-serif;">65% -- all different, superspreaders 10x average</span><br />
<span style="font-family: "verdana" , sans-serif;">33% -- all different, superspreaders 26x average</span><br />
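<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For the "all people the same" line only, the textbook formulas give a rough cross-check. This is a sketch under the standard assumption of a fully homogeneous population with R0 = 4, not the simulation itself: the threshold is 1 - 1/R0, and the final infected share z solves z = 1 - exp(-R0 * z). The function names are mine.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
import math


def herd_immunity_threshold(r0):
    # Immune share at which each case infects, on average, one other person.
    return 1 - 1 / r0


def final_size(r0, iterations=200):
    # Solve z = 1 - exp(-r0 * z) by fixed-point iteration
    # (the classic final-size relation for a homogeneous epidemic).
    z = 0.5
    for _ in range(iterations):
        z = 1 - math.exp(-r0 * z)
    return z


threshold = herd_immunity_threshold(4.0)
total_infected = final_size(4.0)
print(f"threshold {threshold:.0%}, final size {total_infected:.0%}")
```

<span style="font-family: "verdana" , sans-serif;">That gives a threshold of 75 percent and a final size of roughly 98 percent, close to the simulated 75 and 97. The small gap isn't surprising, since the simulation runs in discrete days. These formulas say nothing about the other three lines, which is the whole point of the post.</span><br />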
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Does it seem counterintuitive that the more superspreaders, the better the result? How can more infecting make things better?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It doesn't. More *initial* infecting makes things better *only holding the initial R0 constant.* </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If the aggregate R0 is still only 4.0 after including superspreaders, it must mean that the non-superspreaders have an R0 significantly less than 4.0. You can think of a "R=4.0 with superspreaders" society like maybe a "R=2.0" society that's been infected by 1% gregarious handshaking huggers and church-coughers.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In other words, the good news is: if everyone were at the median, the overall R0 would be less than 4. It just looks like R0=4 because we're infested by dangerous superspreaders. Those superspreaders will more quickly turn benign and lower our R0 faster.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">---------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, the shape of the distribution of spreaders matters a great deal. Of course, we don't know the shape of our distribution, so it's hard to estimate which line in the chart we're closest to. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But we *do* know that we have at least a certain amount of variance -- some people shake a lot of hands, some people won't wear masks, some people are probably still going to hidden dance parties. So I think we can conclude that we'll need significantly less than 75 percent to get to herd immunity.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">How much less? I guess you could study data sources and try to estimate. I've seen at least one non-wacko argument that says New York City, with an estimated infection rate of at least 20 percent, might be getting close. Roughly speaking, that would be something like the fourth line on the graph, the one on the bottom. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which line is closest, if not that one? My gut says ... given that we know the top line is wrong, and from what we know about human nature ... the second line from the top is a reasonable conservative assumption. Changing my default from 75% to 58% seems about right to me. But I'm pulling that out of my gut. The very end part of my gut, to be more precise. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">What we do know for sure is that the 75%, the top line of the graph, must be too pessimistic. To estimate how far too pessimistic, we need more data and more arguments. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com5tag:blogger.com,1999:blog-31545676.post-28923753803032136482020-05-06T15:29:00.001-04:002020-05-06T15:40:22.999-04:00Regression to higher ground<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We know that if an MLB team wins 76 games in a particular season, it's probably a better team than its record indicates. To get its talent from its record, we have to regress to the mean.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Tango has often said that straightforward regression to the mean sometimes isn't right -- you have to regress to the *specific* mean you're concerned with. If Wade Boggs hits .280, you shouldn't regress him towards the league average of .260. You should regress him towards his own particular mean, which is more like .310 or something.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">This came up when I was figuring regression to the mean <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-i.html">for</a> <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-ii.html">park</a> <a href="http://blog.philbirnbaum.com/2020/04/regressing-park-factors-part-iii.html">factors</a>. To oversimplify for purposes of this discussion: the distribution of hitters' parks in MLB is bimodal. There's Coors Field, and then everyone else. Roughly like this random pic I stole from the internet:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5qZfZTTjctTyUsNIqjPHBZ4CM9u8MA2yxL2DK0jafZvnJjDRgh2bghTqSIDuNCGyxwr1ONZaOxpb1ZNePTSxQGnjaUhHi90l9CdHFsAhZ9FKndKrWrA7ThMUOKsv3_vpgEMlFXA/s1600/bimodal.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="292" data-original-width="588" height="158" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5qZfZTTjctTyUsNIqjPHBZ4CM9u8MA2yxL2DK0jafZvnJjDRgh2bghTqSIDuNCGyxwr1ONZaOxpb1ZNePTSxQGnjaUhHi90l9CdHFsAhZ9FKndKrWrA7ThMUOKsv3_vpgEMlFXA/s320/bimodal.jpg" width="320" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, suppose you have a season of Coors Field that comes in at 110. If you didn't know the distribution was bimodal, you might regress it back to the mean of 100, by moving it to the left. But if you *do* know that the distribution is bimodal, and you can see the 110 belongs to the hump on the right, you'd regress it to the Coors mean of 113, by moving it to the right.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But there are times when there is no obvious mean to regress to.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You have a pair of perfectly fair 9-sided dice. You want to count the number of rolls it takes before you roll your first snake eyes (which has a 1 in 81 chance each roll). On average, you expect it to take 81 rolls, but that can vary a lot. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You don't have a perfect count of how many rolls it took, though. Your counter is randomly inaccurate with an SD of 6.4 rolls (coincidentally the same as the SD of luck for team wins).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You start rolling. Eventually you get snake eyes, and your counter estimates that it took 76 rolls. The mean is 81. What's your best estimate of the actual number? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">This time, it should be LOWER than 76. You actually have to regress AWAY from the mean.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Let's go back to the usual case for a second, where a team wins 76 games. Why do we expect its talent to be higher than 76? Because there are two possibilities:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">(a) its talent was lower than 76, and it got lucky; or</span><br />
<span style="font-family: "verdana" , sans-serif;">(b) its talent was higher than 76, and it got unlucky.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But (b) is more likely than (a), because the true number will be higher than 76 more often than it'll be lower than 76. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You can see that from this graph that represents distribution of team talent:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiDCLy1eDPFIhMoXHuzcbeE4HFWhnEpYuhsrp6aUsIyCCjg0tcfujLOfUbxxE32aBM2sefJDs8p1LNNG8M3C9Har5C6yB36uvNj84B4KO1pvzkqRNKRhla6cY1Tw8aXqJwXnrLfQ/s1600/binomial2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="415" data-original-width="694" height="191" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiDCLy1eDPFIhMoXHuzcbeE4HFWhnEpYuhsrp6aUsIyCCjg0tcfujLOfUbxxE32aBM2sefJDs8p1LNNG8M3C9Har5C6yB36uvNj84B4KO1pvzkqRNKRhla6cY1Tw8aXqJwXnrLfQ/s320/binomial2.jpg" width="320" /></a></div>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The blue bars are the times that talent was less than 76, and the team got lucky. The pink bars are the times the talent was more than 76, and the team got unlucky.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The blue bars around 76 are shorter than the pink bars around 76. That means better teams getting unlucky are more common than worse teams getting lucky, so the average talent must be higher than 76.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But the dice case is different. Here's the distribution of when the first snake-eyes (1 in 81 chance) appears:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8aDi4eFBXoipTMkYCMTeKVxOaBu_GmCT8RJH3r6JwoggotSJ1pCgfVzRfjWydtgHcdTDtVEan9oMpJtVix0_otO4X22E1rKNpcCkad2SNi89vg5bHhus67KN_dA-wZiFmJ16LcA/s1600/geometric2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="361" data-original-width="1130" height="127" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8aDi4eFBXoipTMkYCMTeKVxOaBu_GmCT8RJH3r6JwoggotSJ1pCgfVzRfjWydtgHcdTDtVEan9oMpJtVix0_otO4X22E1rKNpcCkad2SNi89vg5bHhus67KN_dA-wZiFmJ16LcA/s400/geometric2.jpg" width="400" /></a></div>
<br />
<br />
<span style="font-family: "verdana" , sans-serif;">The mean is still 81, but, this time, the curve slopes down at 76, not up.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which means: it's more likely that you rolled less than 76 times and counted too high, than that you rolled more than 76 times and counted too low. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which means that to estimate the actual number of rolls, you have to regress *down* from 76, which is *away* from the mean of 81.</span><br />
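<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To put a number on it, here's a minimal sketch of the full Bayesian calculation. The 1-in-81 chance, the SD of 6.4, and the count of 76 are from the example above. The assumption that the counter's error is normally distributed is mine (the example only gives its SD), as are the function name and the cutoff of 2,000 rolls.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
import math

P_SNAKE_EYES = 1 / 81   # chance of snake eyes on each roll
ERROR_SD = 6.4          # SD of the counter's error
OBSERVED = 76           # what the counter reported


def posterior_mean(observed, p=P_SNAKE_EYES, sd=ERROR_SD, max_rolls=2000):
    """Expected true number of rolls, given the noisy count."""
    weighted_sum = 0.0
    total_weight = 0.0
    for n in range(1, max_rolls + 1):
        # Prior: first snake eyes arrives on exactly roll n (geometric).
        prior = (1 - p) ** (n - 1) * p
        # Likelihood: counter reads `observed` when the truth is n.
        # Assumed normal error; the normalizing constant cancels out.
        likelihood = math.exp(-0.5 * ((observed - n) / sd) ** 2)
        weight = prior * likelihood
        weighted_sum += n * weight
        total_weight += weight
    return weighted_sum / total_weight


estimate = posterior_mean(OBSERVED)
print(round(estimate, 2))
```

<span style="font-family: "verdana" , sans-serif;">Under those assumptions the best estimate comes out to about 75.5, half a roll *below* the 76 on the counter.</span><br />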
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That logic -- let's call it the "Dice Method" -- seems completely correct, right? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, the standard "Tango Method" contradicts it.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The SD of the distribution of the dice graph is around 80.5. The SD of the counting error is 6.4. So we can calculate:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><b>SD(true) = 80.5</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>SD(error) = 6.4</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>--------------------</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>SD(observed) = 80.75</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">By the Tango method, we have to regress by (6.4/80.75)^2, which is less than 1% of the way to the mean. Small, but still towards the mean!</span><br />
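<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's that arithmetic spelled out, using the numbers above. The variable names are just for illustration.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
import math

sd_true = 80.5    # SD of the true number of rolls
sd_error = 6.4    # SD of the counting error

# Independent variances add, so the SDs combine in quadrature.
sd_observed = math.sqrt(sd_true ** 2 + sd_error ** 2)

# Fraction of the distance to the mean that we regress.
fraction = (sd_error / sd_observed) ** 2

mean, observed = 81, 76
regressed = observed + fraction * (mean - observed)

print(round(sd_observed, 2), round(fraction, 4), round(regressed, 2))
```

<span style="font-family: "verdana" , sans-serif;">The fraction is about 0.6 percent, which moves the 76 up to roughly 76.03.</span><br />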
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So we have two answers, that appear to contradict each other:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #351c75; font-family: "verdana" , sans-serif;">-- Dice Method: regress away from the mean</span><br />
<span style="color: #351c75; font-family: "verdana" , sans-serif;">-- Tango Method: regress towards the mean</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which is correct?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">They both are.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The Tango Method is correct on average. The Dice Method is correct in this particular case.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you don't know how many rolls you counted, you use the Tango Method.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you DO know that the count was 76 rolls, you use the Dice Method.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Side note:</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">The Tango Method's regression to the mean looked wrong to me, but I think I figured out where it comes from.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Looking at the graph at a quick glance, it looks like you should always regress to the left, because the left side of every point is always higher than the right side of every point. That means that if you're below the mean of 81, you regress away from the mean (left). If you're above the mean of 81, you regress toward the mean (still left).</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">But, there are a lot more datapoints to the left of 81 than to the right of 81 -- by a ratio of about 64 percent to 36 percent. So, overall, it looks like the average should be regressing away from the mean.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Except ... it's not true that the left is always higher than the right. Suppose your counter said "1". You know the correct count couldn't possibly have been zero or less, so you have to regress to the right. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Even if your counter said "2" ... sure, a true count of 1 is more likely than a true count of 3, but 4, 5, and 6 are more likely than 0, -1, or -2. So again you have to regress to the right.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Maybe the zero/negative logic is a factor when you have, say, 8 tosses or less, just to give a gut estimate. Those might constitute, say, 10 percent of all snake eyes rolled. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">So, the overall "regress less than 1 percent towards the mean of 81" is the average of:</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace; font-size: x-small;">-- 36% regress left towards the mean a bit (>81)</span><br />
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace; font-size: x-small;">-- 54% regress left away from the mean a bit (9-81)</span><br />
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace; font-size: x-small;">-- 10% regress right towards the mean a lot (8 or less)</span><br />
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> -----------------------------------------------------</span><br />
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace; font-size: x-small;">-- Overall average: regress towards the mean a tiny bit.</span><br />
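<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Those three shares are gut estimates, but they can be checked against the geometric distribution directly. A quick sketch, keeping the cutoff of 8 rolls from the guess above (the cutoff itself is still a guess, and the names are mine):</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>

```python
p = 1 / 81      # chance of snake eyes on each roll
q = 1 - p       # chance of no snake eyes on a roll


def prob_at_most(n):
    # Probability the first snake eyes shows up within the first n rolls.
    return 1 - q ** n


above_mean = 1 - prob_at_most(81)                 # true count > 81
middle = prob_at_most(81) - prob_at_most(8)       # true count 9 to 81
small_counts = prob_at_most(8)                    # true count 8 or less

print(f"{above_mean:.1%} {middle:.1%} {small_counts:.1%}")
```

<span style="font-family: "verdana" , sans-serif; font-size: x-small;">That works out to roughly 37, 54, and 9 percent, so the rounded shares in the list are about right.</span><br />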
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The "Tango Method" and the "Dice Method" are just consequences of Bayes' Theorem that are easier to implement than doing all the Bayesian calculations every time. The Tango Method is a mathematically precise consequence of Bayes' Theorem, and the Dice Method is a heuristic from eyeballing. Tango's "regress to the specific mean" is another Bayes heuristic.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We can reduce the three methods into one by noting what they have in common -- they all move the estimate from lower on the curve to higher on the curve. So, instead of "regress to the mean," maybe we can say</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "verdana" , sans-serif;">"regress to higher ground."</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's sometimes how I think of Bayes' Theorem in my own mind. In fact, I think you can explain Bayes exactly, as a more formal method of figuring where the higher ground is, by explicitly calculating how much to weight the closer ground relative to the distant ground. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-19851394155709736682020-04-15T13:15:00.000-04:002020-04-15T18:04:22.796-04:00Regressing Park Factors (Part III)<span style="font-family: "verdana" , sans-serif;">I <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-i.html">previously calculated</a> that to estimate the true park factor (BPF) for a particular season, you have to take the "standard" one and regress it to the mean by 38 percent.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's the generic estimate, for all parks combined. If you take Coors Field out of the pool of parks, you have to regress even more.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I ran the same study as in my other post, but this time I left out all the Rockies. Now, instead of 38 percent, you have to regress 50 percent. (It was actually 49-point-something, but I'm calling it 50 percent for simplicity.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In effect, the old 38 percent estimate comes from a combination of </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. Coors Field, which needs to be regressed virtually zero, and</span><br />
<span style="font-family: "verdana" , sans-serif;">2. The other parks, which need to be regressed 50 percent.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For the 50-percent estimate, the 95% confidence interval is (41, 58), which is very wide. But the <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-ii.html">theoretical method from last post</a>, which I also repeated without Colorado, gave 51 percent, right in line with the observed number.</span><br />
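<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To make the percentages concrete, here's a small sketch of what "regress 50 percent" does to a published park factor. The 110 is a made-up example value and the function name is mine; the 38 and 50 percent figures are the estimates above.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
def regress_park_factor(observed, fraction, mean=100.0):
    """Move an observed park factor `fraction` of the way toward `mean`."""
    return observed + fraction * (mean - observed)


# Hypothetical non-Coors park with a published BPF of 110.
non_coors = regress_park_factor(110, 0.50)   # regress 50 percent
all_parks = regress_park_factor(110, 0.38)   # old all-parks estimate

print(f"{non_coors:.1f} {all_parks:.1f}")
```

<span style="font-family: "verdana" , sans-serif;">So a published 110 becomes an estimated true 105 under the non-Coors figure, versus about 106.2 under the old 38 percent. Passing a different `mean` handles the "regress to the specific mean" case.</span><br />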
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I tried this method for the Rockies only, and it turns out that the point estimate is that you have to regress slightly *away* from the mean of 100. But with so few team-seasons, the confidence interval is so huge that I'd just take the park factors at face value and not regress them at all. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The proper method would probably be to regress the Rockies' park factor to the <a href="https://www.baseball-reference.com/teams/COL/attend.shtml">Coors Field mean</a>, which is about 113. You could probably crunch the numbers and figure out how much to regress. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The overall non-Coors value is 50 percent, but it turns out that every decade is different. *Very* different:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>1960s: regress 15 percent</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>1970s: regress 27 percent</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>1980s: regress 80 percent</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>1990s: regress 84 percent</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>2000s: regress 28 percent</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>2010-16: regress 28 percent </b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Why do the values jump around so much? One possibility is that it's random variation on how teams are matched to parks. The method expects batters in hitters' parks to be equal to batters in pitchers' parks, but if (for instance) the Red Sox had a bad team in the 80s, this method would make the park effect appear smaller.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As soon as I wrote that, I realized I could check it. Here are the correlations between BPF and team talent in terms of RS-RA (per 162 games) for team-seasons, by decade. I'll include the regression-to-the-mean amount to make it easier to compare:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> r RTM</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>---------------------</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>1960s: +0.14 15% </b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>1970s: +0.06 27%</b></span><br />
<span style="color: red; font-family: "courier new" , "courier" , monospace;"><b>1980s: -0.14 80%</b></span><br />
<span style="color: red; font-family: "courier new" , "courier" , monospace;"><b>1990s: +0.03 84%</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>2000s: +0.16 28%</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>2010s: +0.23 28%</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>---------------------</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>overall: +0.05 50%</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It does seem to work out that the more positive the correlation between hitting and BPF, the less you have to regress. The two lowest correlations were the ones with the two highest levels of regression to the mean.</span><br />
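<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As a rough check, you can correlate the two columns of that table with each other. This is only a sketch on six decade-level data points copied from the table, so it shows the direction of the relationship and not much else. The helper function is mine.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
import math

# Columns copied from the table above, 1960s through 2010s.
r_by_decade = [0.14, 0.06, -0.14, 0.03, 0.16, 0.23]
rtm_by_decade = [15, 27, 80, 84, 28, 28]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


corr = pearson(r_by_decade, rtm_by_decade)
print(round(corr, 2))
```

<span style="font-family: "verdana" , sans-serif;">The result is strongly negative, around -0.77: decades where better-hitting teams happened to play in hitters' parks needed less regression.</span><br />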
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">(The 1990s does seem a little out of whack, though. Maybe it has something to do with the fact that we're leaving out the Rockies, so the NL BPFs are deflated for 1993-99, but the RS-RA are inflated because the Rockies were mediocre that decade. With the Rockies included, the 1990s correlation would turn negative.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The "regress 50 percent to the mean" estimate seems to be associated with an overall correlation of +.05. If we want an estimate that assumes zero correlation, we should probably bump it up a bit -- maybe to 60 percent or something.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I'd have to think about whether I wanted to do that, though. My gut seems more comfortable with the actual observed value of 50 percent, but I can't justify that.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<br />Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com6tag:blogger.com,1999:blog-31545676.post-39211947373937063782020-03-23T17:36:00.004-04:002023-01-21T16:46:20.213-05:00Regressing Park Factors (Part II)<span style="font-family: verdana;"><span face=""verdana" , sans-serif">Note: math/stats post.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">-------------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif"><a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-i.html">Last post</a>, I figured the breakdown of variance for (three-year average) park effects (BPFs) from the Lahman database. It came out like this:</span></span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">All Parks [chart 1]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">4.8 = SD(3-year observed)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">4.3 = SD(3-year true) </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">2.1 = SD(3-year luck)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-------------------------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: verdana;"><span face=""verdana" , sans-serif">Using the usual method, we would figure, theoretically, that you have to regress park factor by (2.1/4.8)^2, which is about 20 percent. </span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">But when we used empirical data to calculate the real-life amount of regression required, it turned out to be 38 percent.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">Why the difference? Because the 20 percent figure is to regress the observed three-year BPF to its true three-year average. But the 38 percent is to regress the observed three-year BPF to a single-year BPF.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">My first thought was: the 3-year true value is the average of three 1-year true values. If each of those were independent, we could just break the 3-year SD into three 1-year SDs by dividing by the square root of 3. </span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">But that wouldn't work. That's because when we split a 3-year BPF into three 1-year BPFs, those three are from the same park. So, we'd expect them to be closer to each other than if they were three random BPFs from different parks. (That fact is why we choose a three-year average instead of a single year -- we expect the three years to be very similar, which will increase our accuracy.)</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">Three years of the same park are similar, but not exactly the same. Parks do change a bit from year to year; more importantly, *other* parks change. (In their first season in MLB, the Rockies had a BPF of 118. All else being equal, the other 13 teams would see their BPF drop by about 1.4 points to keep the average at 100.)</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">So, we need to figure out the SD(true) for different seasons of the same park. </span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">--------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">From the Lahman database, I took all ballparks (1960-2016) with the same name for at least 10 seasons. For each park, I calculated the SD of its BPF for those years. Then, I took the root mean square of those numbers. That came out to 3.1.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
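<span style="font-family: verdana;">Here's a rough sketch of that calculation in Python. The park names and BPF values are made up for illustration, and it assumes the population SD for each park; a sample SD would give slightly larger numbers.</span><br />
<span style="font-family: verdana;"><br /></span>

```python
import math
import statistics

# Hypothetical BPF histories, one list per park (illustrative values only,
# not taken from the Lahman database).
bpf_by_park = {
    "Park A": [104, 106, 101, 103, 107, 105, 102, 104, 108, 100],
    "Park B": [96, 95, 98, 94, 97, 99, 93, 96, 95, 97],
}

def rms_of_park_sds(bpf_by_park, min_seasons=10):
    """Root mean square of each park's own SD of BPF across seasons."""
    sds = [
        statistics.pstdev(values)  # population SD: an assumption of this sketch
        for values in bpf_by_park.values()
        if len(values) >= min_seasons  # keep only parks with enough seasons
    ]
    return math.sqrt(sum(sd * sd for sd in sds) / len(sds))

print(round(rms_of_park_sds(bpf_by_park), 2))
```

<span style="font-family: verdana;"><br /></span>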
<span face=""verdana" , sans-serif">We already calculated that the SD of luck for the average of three seasons is 2.1. That means we can fill in SD(true)=2.3.</span></span><div><span style="font-family: verdana;"><br /></span>
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">Same Park [Chart 2]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">------------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">3.1 = SD(3-year observed, same park)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">------------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>2.3 = SD(3-year true, same park)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">2.1 = SD(3-year luck, any parks)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">------------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: verdana;"><span face=""verdana" , sans-serif">(That's the only number we will actually need from that chart.)</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">Now, from Chart 1, we found SD(true) was 4.3 for all park-years. That 4.3 is the combination of (a) variance of different years from the same park, and (b) variance between the different parks. We now know (a) is 2.3, so we can calculate (b) is root (4.3 squared minus 2.3 squared), which equals 3.6.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">So we'll break the "4.3" from Chart 1 into those two parts:</span></span></div><div><span style="font-family: verdana;"><br /></span>
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">All Parks [Chart 3]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">4.8 = SD(3-year observed)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>3.6 = SD(3-year true between parks)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>2.3 = SD(3-year true within park)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">2.1 = SD(3-year luck)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: verdana;"><span face=""verdana" , sans-serif">Now, let's assume that for a given park, the annual deviations from its overall average are independent from year to year. That's not absolutely true, since some changes are more permanent, like when Coors Field joins the league. But it's probably close enough.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">With the assumption of independence, we can break the 3-year SD down into three 1-year SDs. That converts the single 2.3 into three SDs of 1.3 (obtained by dividing 2.3 by the square root of 3):</span></span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">All Parks [Chart 4]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">4.8 = SD(3-year observed)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">3.6 = SD(3-year true between parks)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>1.3 = SD(this year true for park)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>1.3 = SD(next year true for park)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>1.3 = SD(last year true for park)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">2.1 = SD(3-year luck)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: verdana;"><span face=""verdana" , sans-serif">What we're interested in is the SD of this year's value. That's the combination of the first two numbers in the breakdown: the SD of the difference between parks, and the SD of this year's true value for the current park.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">The bottom three numbers are different kinds of "luck," for what we're trying to measure: the actual luck in run scoring, and the "luck" in how the park changed in the other two years we're using in the smoothing for the current year. </span></span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">All Parks [Chart 4a]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">4.8 = SD(3-year observed)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span style="color: red; font-family: "courier new" , "courier" , monospace;"><b>3.6 = SD(3-year true between parks)</b></span><br />
<span style="color: red; font-family: "courier new" , "courier" , monospace;"><b>1.3 = SD(this year true for park)</b></span><br />
<span style="color: #660000; font-family: "courier new" , "courier" , monospace;"><b><br /></b></span>
<b style="font-family: "courier new", courier, monospace;"><span style="color: blue;">1.3 = SD(next year true for park)</span></b><br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>1.3 = SD(last year true for park)</b></span><br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>2.1 = SD(3-year luck)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">-----------------------------------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif" style="font-family: verdana;">Combining the top two and bottom three, we get</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">All Parks [Chart 5]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">----------------------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">4.8 = SD(3-year observed)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">----------------------------------------------</span><br />
<span style="color: red; font-family: "courier new" , "courier" , monospace;"><b>3.8 = SD(true values impacting observed BPF)</b></span><br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>2.8 = SD(random values impacting observed BPF)</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">----------------------------------------------</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span style="font-family: verdana;"><span face=""verdana" , sans-serif">So we regress by (2.8/4.8) squared, which works out to 34 percent. That's pretty close to the actual figure of 38 percent.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">We can do another attempt, with a slightly different assumption. Back in Chart 2, when we figured SD(three year observed, same park) was 3.1 ... that estimate was based on parks with at least ten years of data. If I reduce the requirement to three years of data, SD(three year observed, same park) goes up to 3.2, and the final result is ... 36 percent regression to the mean.</span><br />
<span face=""verdana" , sans-serif"><br /></span>
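<span style="font-family: verdana;">To make the chain from Chart 1 to Chart 5 easy to check, here's a sketch in Python. The function and argument names are invented for the example, and it carries unrounded variances through instead of the rounded SDs shown in the charts, so intermediate values can differ in the last digit.</span><br />
<span style="font-family: verdana;"><br /></span>

```python
def regression_to_single_year(sd_observed_all=4.8, sd_luck=2.1,
                              sd_observed_same_park=3.1):
    """Follow Charts 1-5 and return the fraction to regress a 3-year BPF."""
    var_obs = sd_observed_all ** 2
    var_luck = sd_luck ** 2
    # Chart 2: true season-to-season variance within a park.
    var_within = sd_observed_same_park ** 2 - var_luck
    # Chart 3: the rest of the true variance is between parks.
    var_between = (var_obs - var_luck) - var_within
    # Chart 4: split the within-park variance evenly over the three seasons
    # (this is where the independence assumption comes in).
    var_one_year = var_within / 3
    # Chart 5: between-park variance plus this year's share is signal;
    # the other two years plus run-scoring luck are noise.
    var_signal = var_between + var_one_year
    var_noise = 2 * var_one_year + var_luck
    return var_noise / (var_signal + var_noise)

print(round(regression_to_single_year(), 2))                           # 0.34
print(round(regression_to_single_year(sd_observed_same_park=3.2), 2))  # 0.36
```

<span style="font-family: verdana;"><br /></span>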
<span face=""verdana" , sans-serif">So there it is. I think this method is valid, but I'm not completely sure. The 95% confidence interval for the true value seems to be wide -- regression to the mean between 28 percent and 49 percent -- so it might just be coincidence that this calculation matches. </span><br />
<span face=""verdana" , sans-serif"><br /></span>
<span face=""verdana" , sans-serif">If you see a problem, let me know.</span><br />
<span face=""verdana" , sans-serif"><br /></span></span><span face=""verdana" , sans-serif"><span style="font-family: verdana;">Part III is <a href="http://blog.philbirnbaum.com/2020/04/regressing-park-factors-part-iii.html">here</a>.</span><br /></span>
<span face=""verdana" , sans-serif"><br /></span>
</div>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-6794185616001886402020-03-12T16:12:00.000-04:002020-03-24T12:07:11.096-04:00Regressing Park Factors (Part I)<span style="font-family: "verdana" , sans-serif;">I think park factors* are substantial overestimates of the effect of the park. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">At their core, park effects are basically calculated based on runs scored at home divided by runs scored on the road. But that figure is heavily subject to the effects of luck. One random 10-8 game at home can skew the park effect by more than half a point.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Because of this, most sabermetric sources calculate park effects based on more than one year of data. A three-year sample is common ... I think Fangraphs, Baseball Reference, and the Lahman database all use three years. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That helps, but not enough. It looks like park factors are still too extreme, and need to be substantially regressed to the mean.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">* Here's a quick explanation of how park factors work, for those not familiar with them.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Park Factor is a number that tells us how relatively easy or difficult it is for a team to score runs because of the characteristics of its home park. The value "100" represents the average park. A number bigger than 100 means it's a hitters' park, where more runs tend to score, and smaller than 100 means it's a pitchers' park, where fewer runs tend to score.</span><span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Perhaps confusingly, the park factor averages the home park with an amalgam of road parks, in equal proportion. So if Chase Field has a park factor of 105, which is 5 percent more than average, that really means it's about 10 percent more runs at home, and about average on the road.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">The point of park factor is that you can use it to adjust a hitter's stats to account for his home park. So if Eduardo Escobar creates 106 runs for the Diamondbacks, you divide by 1.05 and figure he'd have been good for about 101 runs if he had played in a neutral home park.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For my large sample of batters (1960-2016, minimum 300 PA), I calculated their runs created per 500 PA (RC500), and their park-adjusted RC500. Then, I binned the players by batting park factor (BPF), and took the average for each bin. If park adjustment worked perfectly, you'd expect every bin to have the same level of performance. After all, there's no reason to think batters who play for the Red Sox or Rockies should be any better or worse overall than batters who are current Mets or former Astros.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">(Because of small sample sizes, I grouped all parks 119+ into a single bin. The average BPF for those parks was 123.4.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's the chart:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>BPF<span style="white-space: pre;"> </span> PA<span style="white-space: pre;"> </span> Runs<span style="white-space: pre;"> </span> Adj'd Regressed</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>------------------------------------------</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 88<span style="white-space: pre;"> </span> 8554 <span style="white-space: pre;"> </span> 67.07<span style="white-space: pre;"> </span> 76.22<span style="white-space: pre;"> </span> 72.67</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 89<span style="white-space: pre;"> </span> 2960<span style="white-space: pre;"> </span> 56.57<span style="white-space: pre;"> </span> 63.56<span style="white-space: pre;"> </span> 60.85</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 90<span style="white-space: pre;"> </span> 43213<span style="white-space: pre;"> </span> 62.53<span style="white-space: pre;"> </span> 69.48<span style="white-space: pre;"> </span> 66.78</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 91<span style="white-space: pre;"> </span> 61195<span style="white-space: pre;"> </span> 62.42<span style="white-space: pre;"> </span> 68.59<span style="white-space: pre;"> </span> 66.19</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 92<span style="white-space: pre;"> </span>121203<span style="white-space: pre;"> </span> 61.04<span style="white-space: pre;"> </span> 66.35<span style="white-space: pre;"> </span> 64.29</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 93<span style="white-space: pre;"> </span>195382<span style="white-space: pre;"> </span> 62.21<span style="white-space: pre;"> </span> 66.89<span style="white-space: pre;"> </span> 65.07</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 94<span style="white-space: pre;"> </span>241681<span style="white-space: pre;"> </span> 61.93<span style="white-space: pre;"> </span> 65.89<span style="white-space: pre;"> </span> 64.35</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 95<span style="white-space: pre;"> </span>304270<span style="white-space: pre;"> </span> 64.72<span style="white-space: pre;"> </span> 68.12<span style="white-space: pre;"> </span> 66.80</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 96<span style="white-space: pre;"> </span>325511<span style="white-space: pre;"> </span> 64.13<span style="white-space: pre;"> </span> 66.81<span style="white-space: pre;"> </span> 65.77</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 97<span style="white-space: pre;"> </span>463537<span style="white-space: pre;"> </span> 63.05<span style="white-space: pre;"> </span> 65.00<span style="white-space: pre;"> </span> 64.24</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 98<span style="white-space: pre;"> </span>520621<span style="white-space: pre;"> </span> 65.62<span style="white-space: pre;"> </span> 66.96<span style="white-space: pre;"> </span> 66.44</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> 99<span style="white-space: pre;"> </span>712668<span style="white-space: pre;"> </span> 64.29<span style="white-space: pre;"> </span> 64.94<span style="white-space: pre;"> </span> 64.69</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>100<span style="white-space: pre;"> </span>674090<span style="white-space: pre;"> </span> 64.86<span style="white-space: pre;"> </span> 64.86<span style="white-space: pre;"> </span> 64.86</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>101<span style="white-space: pre;"> </span>589401<span style="white-space: pre;"> </span> 66.53<span style="white-space: pre;"> </span> 65.87<span style="white-space: pre;"> </span> 66.12</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>102<span style="white-space: pre;"> </span>514724<span style="white-space: pre;"> </span> 66.19<span style="white-space: pre;"> </span> 64.89<span style="white-space: pre;"> </span> 65.39</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>103<span style="white-space: pre;"> </span>440940<span style="white-space: pre;"> </span> 65.48<span style="white-space: pre;"> </span> 63.58<span style="white-space: pre;"> </span> 64.32</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>104<span style="white-space: pre;"> </span>415243<span style="white-space: pre;"> </span> 66.07<span style="white-space: pre;"> </span> 63.53<span style="white-space: pre;"> </span> 64.51</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>105<span style="white-space: pre;"> </span>319334<span style="white-space: pre;"> </span> 67.35<span style="white-space: pre;"> </span> 64.15<span style="white-space: pre;"> </span> 65.39</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>106<span style="white-space: pre;"> </span>177680<span style="white-space: pre;"> </span> 66.15<span style="white-space: pre;"> </span> 62.41<span style="white-space: pre;"> </span> 63.86</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>107<span style="white-space: pre;"> </span>138748<span style="white-space: pre;"> </span> 65.85<span style="white-space: pre;"> </span> 61.54<span style="white-space: pre;"> </span> 63.21</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>108<span style="white-space: pre;"> </span>105850<span style="white-space: pre;"> </span> 67.58<span style="white-space: pre;"> </span> 62.57<span style="white-space: pre;"> </span> 64.51</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>109<span style="white-space: pre;"> </span> 25751<span style="white-space: pre;"> </span> 68.50<span style="white-space: pre;"> </span> 62.84<span style="white-space: pre;"> </span> 65.04</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>110<span style="white-space: pre;"> </span> 48127<span style="white-space: pre;"> </span> 65.46<span style="white-space: pre;"> </span> 59.51<span style="white-space: pre;"> </span> 61.82</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>111<span style="white-space: pre;"> </span> 34278<span style="white-space: pre;"> </span> 69.61<span style="white-space: pre;"> </span> 62.71<span style="white-space: pre;"> </span> 65.39</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>112<span style="white-space: pre;"> </span> 36977<span style="white-space: pre;"> </span> 65.12<span style="white-space: pre;"> </span> 58.14<span style="white-space: pre;"> </span> 60.85</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>113<span style="white-space: pre;"> </span> 23001<span style="white-space: pre;"> </span> 67.94<span style="white-space: pre;"> </span> 60.13<span style="white-space: pre;"> </span> 63.16</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>115<span style="white-space: pre;"> </span> 13778<span style="white-space: pre;"> </span> 74.55<span style="white-space: pre;"> </span> 64.83<span style="white-space: pre;"> </span> 68.60</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>116<span style="white-space: pre;"> </span> 7994<span style="white-space: pre;"> </span> 72.21<span style="white-space: pre;"> </span> 62.25<span style="white-space: pre;"> </span> 66.12</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>117<span style="white-space: pre;"> </span> 20901<span style="white-space: pre;"> </span> 73.62<span style="white-space: pre;"> </span> 62.92<span style="white-space: pre;"> </span> 67.07</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>123.4 39629<span style="white-space: pre;"> </span> 79.28<span style="white-space: pre;"> </span> 64.28<span style="white-space: pre;"> </span> 70.10</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>---------------------------------------<span style="white-space: pre;"> </span></b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>row/row diff 0.60% 0.39% 0.00%</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Start with the third column, which is the raw RC500. As you'd expect, the higher the park factor, the higher the unadjusted runs. That's the effect we want BPF to remove.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, I adjusted by BPF, and that's column 4. We now expect everything to even out, and the column to look uniform. But it doesn't -- now, it goes the other way. Batters in pitchers' parks now look like they're better hitters than the batters in hitters' parks.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That shows that we overadjusted. By how much? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Take a look at the bottom row of the chart. Unadjusted, each row is about 0.6 percent higher than the row above it. We'd expect about 1 percent, if BPF worked perfectly. Adjusted, each row is about 0.4 percent lower. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So BPF overestimates the true park factor by around 40 percent. Which means, if we regress park factors to the mean by 40 percent, we should remove the bias.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's what the last column is. And the numbers look pretty level.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Actually, I didn't use 40 percent ... I used 38.8 percent. That's what gave the best flat fit. (Part of the difference is due to rounding, and the rest due to the fact that I ignored nonlinearity when I calculated the percentages.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
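<span style="font-family: "verdana" , sans-serif;">Here's a minimal sketch of the two adjustments in Python. The function name and the linear blend between raw and fully adjusted runs are assumptions of the sketch, though the blend does reproduce the first row of the chart.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
def park_adjust(rc500, bpf, regress=0.0):
    """Park-adjust a runs figure, optionally pulling the adjustment back.

    regress=0.0 applies the full BPF adjustment; regress=1.0 leaves the raw
    figure untouched. The linear blend is an assumption of this sketch.
    """
    fully_adjusted = rc500 / (bpf / 100.0)
    return fully_adjusted + regress * (rc500 - fully_adjusted)

# First row of the chart: BPF 88, raw RC500 of 67.07.
print(round(park_adjust(67.07, 88), 2))                 # 76.22
print(round(park_adjust(67.07, 88, regress=0.388), 2))  # 72.67
```

<span style="font-family: "verdana" , sans-serif;"><br /></span>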
<span style="font-family: "verdana" , sans-serif;">Just to be more rigorous and get a more accurate estimate, I ran a regression. Instead of binning the players, I just used all players separately, and did a "weighted regression" that effectively adjusts for the number of PA associated with each player. Because of the weights, I was able to drop the minimum from 300 PA to 10 PA. Also, I included a dummy variable for year, just in case there were a lot of pitchers' parks in 1987, or something.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The result came out almost exactly the same -- regress by 38.3 percent.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
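<span style="font-family: "verdana" , sans-serif;">A stripped-down version of that kind of weighted regression can be sketched in a few lines of Python. This is only an illustration: it fits a single weighted slope of raw RC500 on BPF, leaves out the year dummies, and the five player-seasons are invented, not drawn from the real sample.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>

```python
def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x, with weights w."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return cov / var

# Hypothetical player-seasons: park factor, raw RC500, and PA as the weight.
bpf = [92, 97, 100, 104, 110]
rc500 = [61.0, 63.5, 65.0, 66.0, 69.5]
pa = [450, 610, 120, 580, 35]

runs_per_bpf_point = weighted_slope(bpf, rc500, pa)
print(runs_per_bpf_point)
```

<span style="font-family: "verdana" , sans-serif;">Dividing the slope by the weighted average RC500 expresses it as a percent per BPF point, which is the figure to hold up against the 1 percent you'd expect if BPF worked perfectly.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>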
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Could we have calculated this mathematically just from raw park factors? Yes, I think so -- but not quite in the usual way. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I'll show the usual way here, and save the rest for the next post. If you don't care about the math, you can just stop here.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If we used the usual technique for figuring how much to regress, we'd use</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><b>SD^2(observed) = SD^2(true) + SD^2(luck)</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We can figure luck. The SD of team runs in a game is about 3. For two teams combined, multiply by root 2. To get the difference from a road game, multiply by root 2 again. Then, for 81 games, divide by the square root of 81, which is 9. Finally, because we're using 3 years, divide by the square root of 3. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You get 0.385 runs. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That figure, 0.385 runs, is 4.28 percent of the usual 9 runs per game. To convert that to a park factor, take half. That's 2.14 points. I'll round to 2.1.</span><br />
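<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For anyone who wants to check the arithmetic, here's a minimal Python sketch of the luck calculation. The inputs are just the round figures quoted above (SD of 3 runs, 81 home games, 3 seasons, 9 runs per game); the function and constant names are made up for illustration.</span><br />

```python
from math import sqrt

# Round figures quoted in the text, not fresh estimates.
SD_TEAM_RUNS_PER_GAME = 3.0      # SD of one team's runs in one game
HOME_GAMES = 81
SEASONS = 3
RUNS_PER_GAME_BOTH_TEAMS = 9.0

def luck_sd_runs():
    """SD, in runs per game, of the home-minus-road difference due to luck."""
    sd_both_teams = SD_TEAM_RUNS_PER_GAME * sqrt(2)   # two teams combined
    sd_home_vs_road = sd_both_teams * sqrt(2)         # home game minus road game
    sd_one_season = sd_home_vs_road / sqrt(HOME_GAMES)
    return sd_one_season / sqrt(SEASONS)              # three-year average

def luck_sd_park_factor_points():
    """Same thing expressed in park factor points (half the percentage)."""
    percent_of_scoring = 100 * luck_sd_runs() / RUNS_PER_GAME_BOTH_TEAMS
    return percent_of_scoring / 2

print(round(luck_sd_runs(), 3))
print(round(luck_sd_park_factor_points(), 1))
```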
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The observed SD, from the Lahman database, is 4.8 points. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We can now calculate SD(true), since 4.8^2 = SD^2(true) + 2.1^2. It works out to 4.3 points.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>SD(observed)= 4.8</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-----------------</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> SD(true)= 4.3</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> SD(luck)= 2.1</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, to regress observed to true, we regress the park factor by (2.1/4.8)^2, which is about 19 percent.</span><br />
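<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As a rough sketch, the same variance decomposition in Python looks like this. The 4.8 and 2.1 are the numbers from above; the helper names and the assumption that park factors are centered at 100 are mine, for illustration only.</span><br />

```python
from math import sqrt

def sd_true(sd_observed, sd_luck):
    """Solve SD^2(observed) = SD^2(true) + SD^2(luck) for SD(true)."""
    return sqrt(sd_observed**2 - sd_luck**2)

def regression_fraction(sd_observed, sd_luck):
    """Share of the observed deviation to regress toward the mean."""
    return (sd_luck / sd_observed) ** 2

def regress_park_factor(observed_pf, fraction, mean_pf=100.0):
    """Shrink an observed park factor toward mean_pf (assumed to be 100)."""
    return mean_pf + (1 - fraction) * (observed_pf - mean_pf)

fraction = regression_fraction(4.8, 2.1)
print(round(sd_true(4.8, 2.1), 1))
print(round(fraction, 2))
print(round(regress_park_factor(110, fraction), 1))
```

<span style="font-family: "verdana" , sans-serif;">The last line shows what a 19 percent regression does to a hypothetical three-year park factor of 110.</span><br />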
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Why isn't it 38.3 percent?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Because, in this case, the "true" value is the three-year average. For that, regress 19 percent. But, that's not what we really want when it comes to a single year's performance. For that, we want SD(true) and SD(luck) for just that one year's park factor, not the average of the three years.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It makes sense you need to regress more for one year than three, because there's more randomness: the first 19 percent is for the randomness in the three-year average, and the next 19 percent is for the randomness in the fact that the other two of the three years the park might have been different, so the three-year BPF might not be representative of the year you're looking at.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">There's no obvious relationship between the 19 percent and the 38.3 percent -- it's just coincidence that it comes out double. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But I think I've worked out how we could have calculated the 38.3 figure. I'll write that up for Part II. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span><span style="font-family: "verdana" , sans-serif;">(P.S. You might have noticed that the last column of the chart was fairly level, except for the extreme hitters' parks. I'll talk about that in part III.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span><span style="font-family: "verdana" , sans-serif;">Update, 3/24/20: Part II is <a href="http://blog.philbirnbaum.com/2020/03/regressing-park-factors-part-ii.html">here</a>.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-4388182719052900942020-02-28T15:48:00.000-05:002020-02-28T15:49:12.935-05:00Park adjusting a player's batting line has to depend on who the player is<span style="font-family: "verdana" , sans-serif;">Suppose home runs are scarce in the Astrodome, so that only half as many home runs are hit there as in any other park. One year, Astros outfielder Joe Slugger hits 15 HR at the Astrodome. How do you convert that to be park-neutral? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It seems like you should adjust it to 30. Take the 15 HR, double it, and there you go. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But I don't think that works. I think if you do it that way, you overestimate what Joe would have done in a normal park. I think you need to adjust Joe to something substantially less than 30.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">One reason is that Joe might not necessarily be hurt by the park as much as other players. Maybe the park hurts weaker hitters more, the kind who hit mostly 310-foot home runs. Maybe Joe is the kind who hits most of his fly balls 430 feet, so when the indoor dead air shortens them by 15 feet, they still have enough carry to make it over the fence.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's almost certain that some players should have different park factors than others. Many parks are asymmetrical, so lefties and righties will hit to different outfield depths. Some parks may have more impact on players who hit more line drive HRs, and less impact on towering fly balls. And so on.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I suspect that's actually a big issue, but I'm going to assume it away for now. I'll continue as if every player is affected by the park the same way, and I'll assume that Joe hit exactly 15 HR at home and 30 HR on the road, exactly in line with expectations.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Also, to keep things simple, two more assumptions. First I'll assume that the park factor is caused by distance to the outfield fence -- that the Astrodome is, say, 10 percent deeper than the average park. Second, I'll assume that in the alternative universe where Joe played in a different home park, he would have hit every ball with exactly the same trajectory and distance that he did in the Astrodome.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">My argument is that with these assumptions, the Astros overall would have hit twice as many HR at home as they actually did. But Joe Slugger would have hit *fewer* than twice as many.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Let's start by defining two classes of deep fly balls:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A: fly balls deep enough to be HR in any park, including the Astrodome; </span><br />
<span style="font-family: "verdana" , sans-serif;">B: fly balls deep enough to be HR in any park *except* the Astrodome.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We know that, overall, class A is exactly equal in size to class B, since (A+B) is exactly twice A.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's why, when we saw 15 HR in class A, we immediately assumed that implies 15 HR in class B. And so we assumed that Joe would have hit an extra 15 HR in any other park.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That seems like it should work, but it doesn't. Here's a series of analogies that shows why.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. You have a pair of fair dice. You expect them to come up snake eyes (1-1) exactly as often as box cars (6-6). You roll the dice 360 times, and find that 1-1 came up 15 times. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Since 6-6 comes up as often as 1-1, should you estimate that 6-6 also came up 15 times? You should not. Since the dice are fair, you expect 6-6 to have come up 1 time in 36, or 10 times.* The fact that 1-1 got lucky, and came up more often, doesn't mean that 6-6 must have come up more often.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(*Actually, you should expect that 6-6 came up only about 9.86 times: the 15 rolls that came up 1-1 can't also be 6-6, which leaves 345 rolls, each with a 1-in-35 chance of being 6-6. But never mind for now.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. You have a pair of fair dice, and an APBA card. On that card, 1-1 is a home run, and 6-6 represents a home run anywhere except the Astrodome.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You roll the dice 360 times, and 1-1 comes up 15 times. Do you also expect that 6-6 came up 15 times? Same answer: you expect it came up only 10 times. The fact that 1-1 got lucky doesn't mean that 6-6 must also have gotten lucky.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">3. You have a simulation game, with some number of fair dice, and a card for Joe Slugger. You know the probability of Joe hitting an Astrodome HR is equal to the probability of Joe hitting an "anywhere but Astrodome" HR. But that probability -- Joe's talent level -- isn't necessarily 1 in 36.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You play a season's worth of Joe's home games, and he hits 15 HR. Can you assume that he also hit 15 "anywhere but Astrodome" HR? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, in one special case, you can. If the 15 HR was Joe's actual expectation, based on his talent -- that is, his card -- then, yes, you can assume 15 near-HR. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, in all other cases, you can't. If Joe's 15 HR total was lucky, based on his talent, you should assume fewer than 15 near-HR. And if the 15 HR was unlucky, you should assume more than 15 near-HR.</span><br />
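<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's a small Python sketch of the logic behind all three analogies. It assumes each trial is exactly one of "A", "B", or "neither," with A and B equally likely; that's exact for the dice, and only a simplification for a real hitter. The 300 plate appearances and 1-in-30 talent for Joe are hypothetical numbers chosen so that his expectation is 10.</span><br />

```python
from fractions import Fraction

def expected_b_given_a(trials, a_count, p):
    """Expected number of B outcomes, given that a_count trials were A.

    Each trial is A with probability p, B with probability p, else neither.
    Once the A trials are set aside, every remaining trial is B with
    probability p / (1 - p).
    """
    return (trials - a_count) * p / (1 - p)

# Dice: 360 rolls, 15 snake eyes, each of 1-1 and 6-6 has probability 1/36.
boxcars = expected_b_given_a(360, 15, Fraction(1, 36))
print(boxcars, round(float(boxcars), 2))

# Hypothetical Joe Slugger: 300 home PA, true talent of 1 HR per 30 PA.
near_hr = expected_b_given_a(300, 15, Fraction(1, 30))
print(round(float(near_hr), 2))
```

<span style="font-family: "verdana" , sans-serif;">In both cases the estimate for B stays close to the talent-based expectation of 10, no matter that A came in at 15.</span><br />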
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So I think you can't park adjust players via the standard method of multiplying their performance by their park factor. The park adjustment has to be based on their *expected* performance, not their observed performance.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Suppose Joe Slugger, at the beginning of the season, was projected by the Marcel system to hit 10 HR at home. That means that he was expected to hit 10 HR at the Astrodome, and 10 "almost HR" at the Astrodome.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Instead, he winds up hitting 15 HR there. But we still estimate that he hit only 10 "almost HR". So, instead of bumping his 15 HR total to 30, we bump it only to 25.</span><br />
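<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In code, the difference between the two approaches might look something like this. It's a sketch under the simplifying assumptions above (every player affected equally, neutral parks yielding twice as many HR as the Astrodome); the function names and the ratio parameter are illustrative, not any standard formula.</span><br />

```python
def naive_neutral_hr(observed_home_hr, neutral_to_park_ratio=2.0):
    """Standard method: scale the observed total by the park ratio."""
    return observed_home_hr * neutral_to_park_ratio

def talent_based_neutral_hr(observed_home_hr, expected_home_hr,
                            neutral_to_park_ratio=2.0):
    """Keep the observed HR, but estimate the "almost HR" from talent.

    expected_home_hr is a preseason projection (Marcel, in the example).
    """
    expected_almost_hr = expected_home_hr * (neutral_to_park_ratio - 1.0)
    return observed_home_hr + expected_almost_hr

print(naive_neutral_hr(15))              # 30.0
print(talent_based_neutral_hr(15, 10))   # 25.0
```

<span style="font-family: "verdana" , sans-serif;">The two methods agree only when the observed total equals the projection.</span><br />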
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I was surprised by this, that there's no way to convert the Astrodome to a normal park that doesn't require you to estimate the player's talent. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But here's what surprised me even more, when I worked it out: you only need to know the player's talent when you're adjusting from a pitchers' park. When you adjust from a hitters' park, one formula works for everyone!</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Let's take it the other way, and suppose that Fenway affords twice as many home runs as any other park. And, suppose Joe Slugger, now with the Red Sox, hits 40 at Fenway and 20 on the road.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">How many would he have hit if none of his games were at Fenway?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, on average, half of his 40 HR would have been HR on the road. So, that's 20. End of calculation. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It doesn't matter who the batter is, or what his talent is -- as long as we stick to the assumption that every player's expectation is twice as many HR at Fenway, the expectation is that half his Fenway HR would also have been HR on the road.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">(In reality, it might have been more, or it might have been less, since the breakdown of the 40 HR is random, like 40 coin tosses. But the point is, it doesn't depend on the player.)</span><br />
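<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The "40 coin tosses" remark can be sketched in a few lines of Python. This treats each Fenway HR as an independent 50/50 chance of also being a HR elsewhere, which is the simplifying assumption of this post rather than anything measured; the function name is made up.</span><br />

```python
from math import sqrt

def road_equivalent_hr(fenway_hr, share_hr_anywhere=0.5):
    """Mean and SD of how many Fenway HR would also be HR in other parks."""
    mean = fenway_hr * share_hr_anywhere
    sd = sqrt(fenway_hr * share_hr_anywhere * (1 - share_hr_anywhere))
    return mean, sd

mean, sd = road_equivalent_hr(40)
print(mean, round(sd, 1))
```

<span style="font-family: "verdana" , sans-serif;">Nothing about the batter's talent appears anywhere in the function, which is the point.</span><br />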
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you're not convinced, here's a coin toss analogy that might make it clearer.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We ask MLB players to do a coin toss experiment. We give them a fair coin. We tell them, take your day of birth, multiply it by 10, toss the coin that many times, and count the heads. Then, toss the coin that many times again, but this time, count the number of tails.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For the Fenway analogy: heads are "Fenway only" HR. Tails are "any park" HR.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We ask each player to come back and tell us H+T, the total number of Fenway HR. We then try to estimate the heads, the number of "Fenway only" HR.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's easy: we just assume half the number. Mathematically, the expectation for any player, no matter who, is that H will be half of (H+T). That's because no matter how lucky or unlucky he was, there's no reason to expect he was luckier in H than T, or vice-versa.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, for the Astrodome analogy. Heads are "Any park including Astrodome" HR. Tails are "other park only" HR.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We ask each player to come back and tell us only the number of heads, which is the Astrodome HR total. We'll try to estimate tails, the non-Astrodome HR total.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Rob Picciolo comes back and says he got 15 heads. Naively, we might estimate that he also tossed 15 tails, since the probabilities are equal. But that would be wrong. Because, we would check Baseball Reference, and we would see that Picciolo was born on the 4th of the month, not the 3rd. Which means he actually had 40 tosses, not 30, and was unlucky in heads.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In his 40 tosses for tails, there's no reason to expect he'd have been similarly unlucky, so we estimate that Picciolo tossed 20 tails, not 15.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">On the other hand, Barry Bonds comes back and says he got 130 heads. On average, players who toss 130 heads would also have averaged about 130 tails. But Barry Bonds was born on the 24th of July. We should estimate that he tossed only 120 tails, not 130.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For Fenway, when we know the total number of heads and tails, the player's birthday doesn't factor into our estimate of tails. For the Astrodome, when we know only the total number of heads, the player's birthday *does* factor into our estimate of tails.</span><br />
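<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you'd rather see it computed than argued, here's a Python sketch that works out both expectations exactly from binomial probabilities. It follows the experiment as described (two separate batches of 10 tosses per day of birth, fair coin); the function names are mine.</span><br />

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def fenway_expected_heads(total, day):
    """E[heads | heads + tails == total], with 10*day tosses per batch."""
    n = 10 * day
    num = 0.0
    den = 0.0
    for heads in range(total + 1):
        tails = total - heads
        if heads > n or tails > n:
            continue  # impossible split for this many tosses
        weight = binom_pmf(heads, n) * binom_pmf(tails, n)
        num += heads * weight
        den += weight
    return num / den

def astrodome_expected_tails(heads, day):
    """E[tails | heads]: the tails batch is independent, so heads is ignored."""
    return 10 * day * 0.5

# Fenway: same answer for a player born on the 3rd or the 20th.
print(round(fenway_expected_heads(40, 3), 6), round(fenway_expected_heads(40, 20), 6))
# Astrodome: the answer depends on the birthday, not on the reported heads.
print(astrodome_expected_tails(15, 4), astrodome_expected_tails(130, 24))
```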
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, when Joe Slugger plays 81 games at the Astrodome and tosses 15 home run "heads," we can't just expect him to have also tossed 15 long fly ball "tails". We have to look up his home run talent "date of birth". If he was only born on the 2nd of the month, so that we'd have only expected him to hit 10 HR "heads" and 10 near-HR "tails" in the first place, then we estimate he hit only 10 near-HR "tails," not 15, for a neutral-park total of 25 rather than 30. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If we don't do that -- if we don't look at his "date of birth" talent and just double his actual Astrodome HR -- our estimates will be too high for players who were lucky, and too low for players who were unlucky. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Obviously, players who were lucky will have higher totals. That means that if we park-adjust the numbers for the Astros every year, the players who have the best seasons will tend to be the ones we overadjust the most. In other words, when a player was both good and lucky, we're going to make his good seasons look great, his great seasons look spectacular, and his spectacular seasons look like insane outliers. When a player is bad and unlucky, his bad seasons will look even worse.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But if we park-adjust the Red Sox every year ... there's no such effect, and everything should work reasonably well.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">My gut still doesn't want to believe it, but my brain thinks it's correct. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, my gut *didn't* want to believe it, when I wrote that sentence originally. Now, I realize that the effect is pretty small. When a player gets lucky by, say, 20 runs, with a season park factor of 95 ... well, that's only 1 run total. My gut is more comfortable with a 1-run effect.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, suppose you're adjusting a Met superstar, trying to figure out what he'd hit in Colorado. Runs are about 60 percent more abundant in Coors Field than Citi Field, which means the park factor is around 30 percent higher. If the player was 20 runs lucky in that particular season, you'd wind up overestimating him by 6 runs, which is now worth worrying about.</span><br />
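<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The back-of-envelope arithmetic in those two examples is just luck multiplied by the size of the park adjustment. Here it is as a tiny Python sketch; the function name is hypothetical, and the linear rule is only the rough approximation used above.</span><br />

```python
def overadjustment_runs(luck_runs, park_factor_gap):
    """Approximate runs of luck that get wrongly scaled by the park adjustment.

    park_factor_gap is the adjustment as a fraction, e.g. 0.05 for a
    park factor of 95, or 0.30 for the Citi Field to Coors Field example.
    """
    return luck_runs * park_factor_gap

print(round(overadjustment_runs(20, 0.05), 1))
print(round(overadjustment_runs(20, 0.30), 1))
```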
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #073763;"><span style="font-family: "verdana" , sans-serif;">(Note: After writing this, but before final edit, I discovered that </span><a href="http://www.tangotiger.net/parks.html" style="font-family: Verdana, sans-serif;">Tom Tango made a similar argument</a><span style="font-family: "verdana" , sans-serif;"> years ago. His analysis dealt with the specific case where the player's observed performance matches his expectation, and for that instance I have reinvented his wheel, 15 years later.)</span></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com3tag:blogger.com,1999:blog-31545676.post-57983725114721671722019-11-18T18:12:00.000-05:002019-11-18T19:03:51.638-05:00Why you can't calculate aging trajectories with a standard regression<span style="font-family: "verdana" , sans-serif;">I found myself in a little Twitter discussion last week about using regression to analyze player aging. I argued that regression won't give you accurate results, and that the less elegant "<a href="https://tht.fangraphs.com/how-do-baseball-players-age-part-1/">delta method</a>" is the better way to go.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Although I did a small example to try to make my point, Tango suggested I do a bigger simulation and a blog post. That's this.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(Some details if you want:</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">For the kind of regression we're talking about, each season of a career is an input row. Suppose Damaso Griffin created 2 WAR at age 23, 2.5 WAR at age 24, and 3 WAR at age 25. And Alfredo Garcia created 1, 1.5, and 1.5 WAR at age 24, 25, and 26. The file would look like:</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">2 23 Damaso Griffin</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">2.5 24 Damaso Griffin</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">3 25 Damaso Griffin</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">1 24 Alfredo Garcia</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">1.5 25 Alfredo Garcia</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">1.5 26 Alfredo Garcia</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">And so on, for all the players and ages you're analyzing. (The names are there so you can have dummy variables for individual player skills.)</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">You take that file and run a regression, and you hope to get a curve that's "representative" or an "average" or a "consolidation" of how those players truly aged.)</span><br />
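<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To make the file layout concrete, here's a Python sketch of how those six rows could be turned into regression inputs. The choice of an intercept, age, and age-squared (matching the quadratic used later in this post), and the choice of the alphabetically first player as the baseline for the dummies, are assumptions for illustration.</span><br />

```python
# (WAR, age, player) rows, exactly as in the example file.
rows = [
    (2.0, 23, "Damaso Griffin"),
    (2.5, 24, "Damaso Griffin"),
    (3.0, 25, "Damaso Griffin"),
    (1.0, 24, "Alfredo Garcia"),
    (1.5, 25, "Alfredo Garcia"),
    (1.5, 26, "Alfredo Garcia"),
]

players = sorted({name for _, _, name in rows})

def design_row(age, player):
    """Intercept, age, age squared, then one dummy per non-baseline player."""
    dummies = [1 if player == other else 0 for other in players[1:]]
    return [1, age, age * age] + dummies

X = [design_row(age, name) for _, age, name in rows]   # predictors
y = [war for war, _, _ in rows]                        # response (WAR)

for x_row, war in zip(X, y):
    print(x_row, war)
```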
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I simulated 200 player careers. I decided to use a quadratic (parabola), symmetric around peak age. I would have used just a linear regression, but I was worried that it might seem like the conclusions were the result of the model being too simple.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Mathematically, there are three parameters that define a parabola. For this application, they represent (a) peak age, (b) peak production (WAR), and (c) how steep or gentle the curve is.* </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(*The equation is: </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">y = (x - peak age)^2 / -steepness + peak production. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">"Steepness" is related to how fast the player ages: because it sits in the denominator, a higher value means a gentler curve and slower decay. Assuming a player has a job only when his WAR is positive, his career length can be computed as twice the square root of (peak WAR * steepness). So, if steepness is 2 and peak WAR is 4, that's a 5.7 year career. If steepness is 6 and peak WAR is 7, that's a 13-year career.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">You can also represent a parabola as y = ax^2+bx+c, but it's harder to get your head around what the coefficients mean. They're both the same thing ... you can use basic algebra to convert one into the other.)</span><br />
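<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's the footnote's equation as a short Python sketch, including the career-length formula and the conversion to ax^2+bx+c form. The function names are made up for illustration.</span><br />

```python
from math import sqrt

def war_at_age(age, peak_age, peak_war, steepness):
    """y = (x - peak age)^2 / -steepness + peak production."""
    return -((age - peak_age) ** 2) / steepness + peak_war

def career_length(peak_war, steepness):
    """Years with positive WAR: twice the square root of peak WAR * steepness."""
    return 2 * sqrt(peak_war * steepness)

def to_standard_form(peak_age, peak_war, steepness):
    """Coefficients (a, b, c) of the same parabola written as ax^2 + bx + c."""
    a = -1 / steepness
    b = 2 * peak_age / steepness
    c = peak_war - peak_age**2 / steepness
    return a, b, c

print(round(career_length(4, 2), 1))   # steepness 2, peak WAR 4
print(round(career_length(7, 6), 1))   # steepness 6, peak WAR 7
```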
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For each player, I randomly gave him parameters from these distributions: (a) peak age normally distributed with mean 27 and SD 2; (b) peak WAR with mean 4 and SD 2; and (c) steepness (mean 2, SD 5; but if the result was less than 1.5, I threw it out and picked a new one).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I arbitrarily decided to throw out any careers of length three years or fewer, which reduced the sample from 200 players to 187. Also, I assumed nobody plays before age 18, no matter how good he is. I don't think either of those decisions made a difference.</span><br />
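<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For readers who want to try something similar, here's a rough Python sketch of that setup. It is not the code behind the charts in this post: it assumes peak WAR and steepness are drawn from normal distributions, measures career length as a continuous span of positive WAR, and uses an arbitrary seed, so it won't reproduce the 187 exactly.</span><br />

```python
import random
from math import sqrt

def draw_steepness(rng):
    """Mean 2, SD 5, redrawing anything below 1.5."""
    while True:
        steepness = rng.gauss(2, 5)
        if steepness >= 1.5:
            return steepness

def simulate_careers(n_players=200, seed=0):
    rng = random.Random(seed)
    careers = []
    for _ in range(n_players):
        peak_age = rng.gauss(27, 2)
        peak_war = rng.gauss(4, 2)
        steepness = draw_steepness(rng)
        if peak_war > 0:                          # otherwise he never has a job
            half_span = sqrt(peak_war * steepness)
            start = max(18.0, peak_age - half_span)   # nobody plays before 18
            end = peak_age + half_span
            if end - start > 3:                   # drop careers of 3 years or fewer
                careers.append((peak_age, peak_war, steepness, start, end))
    return careers

careers = simulate_careers()
print(len(careers), "careers kept out of 200")
```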
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's the plot of all 187 aging curves on one graph:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoptauFoMn3GxcUSREFvpL5O3inHE47JDYvF3iv-hMtUzeuJD2XVP8B1zar9dVC4G7vPBuTEG5SkPiVNw5O_QF3zioUH3sO3-3wLAutWOujB_VkyLT7JPBtZIMwfGZhhtYJDtQ4w/s1600/aging1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="423" data-original-width="734" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoptauFoMn3GxcUSREFvpL5O3inHE47JDYvF3iv-hMtUzeuJD2XVP8B1zar9dVC4G7vPBuTEG5SkPiVNw5O_QF3zioUH3sO3-3wLAutWOujB_VkyLT7JPBtZIMwfGZhhtYJDtQ4w/s400/aging1.jpg" width="400" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The idea, now, is to consolidate the 187 curves into one representative curve. Intuitively, what are we expecting here? Probably, something like, the curve that belongs to the average player in the list.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The average random career turned out to be age 26.9, peak WAR 4.19, and steepness 5.36. Here's a curve that matches those parameters:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirUWGqGIYDeRjRpAuPPafFYqEW_-u2Ko09jN0K2H7fBZDHtYiaGduT0oZbHFXgtp4t4nOKAPxir8LpycXwHi7jl4QNMfWXX59jOthYObdw5_OVSLbe3_CdX8jFy1ByRJRiGWk2Iw/s1600/aging2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="423" data-original-width="734" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirUWGqGIYDeRjRpAuPPafFYqEW_-u2Ko09jN0K2H7fBZDHtYiaGduT0oZbHFXgtp4t4nOKAPxir8LpycXwHi7jl4QNMfWXX59jOthYObdw5_OVSLbe3_CdX8jFy1ByRJRiGWk2Iw/s400/aging2.jpg" width="400" /></a></div>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That seems like what we expect, when we ask a regression to find the best-fit curve. We want a "typical" aging trajectory. Eyeballing the graph, it does look pretty reasonable, although to my eye, it's just a bit small. Maybe half a year bigger left and right, and a bit higher? But close. Up to you ... feel free to draw on your monitor what you think it should look like. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But when I ran the regression ... well, what came out wasn't close to my guess, and probably not close to your guess either:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS9hgF64LbJ2bLse-eYpagtzy4nB-xwlrz4SahcNIW3gVxmfqLXPbRg3okLl_9x1H5JmJhba5_7RihdsVaxCRk0ctuiOnr-woP3nim2pc95NXVqXRT9ALa54k5DlGPjqr6RKOxtg/s1600/aging3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="423" data-original-width="734" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS9hgF64LbJ2bLse-eYpagtzy4nB-xwlrz4SahcNIW3gVxmfqLXPbRg3okLl_9x1H5JmJhba5_7RihdsVaxCRk0ctuiOnr-woP3nim2pc95NXVqXRT9ALa54k5DlGPjqr6RKOxtg/s400/aging3.jpg" width="400" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's much, much gentler than it should be. Even if your gut told you something different than the black curve, there's no way your gut was thinking this. The regression came up with a 19-year career. A career that long happened only once in the entire 187-player sample. We expected "representative," but the regression gave us the 99.5th percentile.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">What happened?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's the same old "selective sampling"/"<a href="http://tangotiger.com/index.php/site/comments/how-can-we-handle-survivorship-bias">survivorship bias</a>" problem.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The simulation decided that when a player's curve scores below zero, those seasons aren't included. It makes sense to code the simulation that way, to match real life. If Jerry Remy had played five years longer than he did, what would his WAR be at age 36? We have no idea.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, with this simulation, we have a God's-eye view of how negative every player would go. So, let's include that in the plot, down to -20:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhahFQqkwkdCy7UXw10cTbYrlEdcmn1N64b3aEFvmW0CEhQzbzxQWEUCFqPvIsu_7sVb7ggvN2gzfBhJEjjgdfPbGwsiwg1ZjM56TrBJZ9Aa2hh91hOw_PBKgYPZhi83_YOXgWOEw/s1600/aging4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="423" data-original-width="734" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhahFQqkwkdCy7UXw10cTbYrlEdcmn1N64b3aEFvmW0CEhQzbzxQWEUCFqPvIsu_7sVb7ggvN2gzfBhJEjjgdfPbGwsiwg1ZjM56TrBJZ9Aa2hh91hOw_PBKgYPZhi83_YOXgWOEw/s400/aging4.jpg" width="400" /></a></div>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">See what's happening? The black curve is based on *all* the green data, both above and below zero, and it lands in the middle. The red curve is based only on the green data above zero, so it ignores all the green negatives at the extremes.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you like, think of the green lines as magnets, pulling the lines towards them. The green magnets bottom-left and bottom-right pull the black curve down and make it steeper. But only the green magnets above zero affect the red line, so it's much less steep.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In fact, if you scroll back up to the other graph, the one that's above zero only, you'll see that at almost every vertical age, the red line bisects the green forest -- there's about as much green magnetism above the red line as there is below it.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In other words: survivorship bias is causing the difference.</span><br />
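<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you want to try this at home, here's a rough Python sketch of the same kind of experiment. It is not the code behind the graphs above. The parabola shape, the "half_width" parameter, and the random distributions are all stand-in assumptions of mine, chosen only to be in the same ballpark as the averages quoted earlier. It fits one quadratic to every simulated season (the God's-eye view) and another to only the seasons above zero, and returns the two curvatures.</span><br />

```python
import random


def simulate_careers(n=187, seed=1):
    """Hypothetical careers as (peak_age, peak_war, half_width) tuples."""
    rng = random.Random(seed)
    return [(rng.gauss(27.0, 2.0), rng.uniform(1.0, 7.0), rng.uniform(3.0, 8.0))
            for _ in range(n)]


def war(career, age):
    """Assumed shape: a downward parabola that crosses zero half_width years from the peak."""
    peak_age, peak_war, half_width = career
    return peak_war * (1.0 - ((age - peak_age) / half_width) ** 2)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(points, center=27.0):
    """Least-squares fit of y = a*x^2 + b*x + c, with x = age - center.

    Solves the 3x3 normal equations with Cramer's rule; returns (a, b, c).
    """
    s = [0.0] * 5  # s[k] = sum of x^k
    t = [0.0] * 3  # t[k] = sum of x^k * y
    for age, y in points:
        x = age - center
        for k in range(5):
            s[k] += x ** k
        for k in range(3):
            t[k] += (x ** k) * y
    matrix = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        swapped = [row[:] for row in matrix]
        for r in range(3):
            swapped[r][col] = rhs[r]
        coeffs.append(det3(swapped) / d)
    return tuple(coeffs)


def compare_fits(careers, ages=range(17, 38)):
    """Curvature of the fit to all seasons vs. the fit to above-zero seasons only."""
    everything = [(age, war(c, age)) for c in careers for age in ages]
    survivors = [(age, w) for age, w in everything if w > 0]
    return fit_quadratic(everything)[0], fit_quadratic(survivors)[0]


if __name__ == "__main__":
    a_all, a_survivors = compare_fits(simulate_careers())
    print("curvature, all seasons:       ", a_all)
    print("curvature, above-zero seasons:", a_survivors)
```

<span style="font-family: "verdana" , sans-serif;">Both curvatures should come out negative, with the above-zero one closer to zero. That's the gentler red curve.</span><br />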
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">What's really going on is the regression is just falling for the same classic fallacy we've been warning against for the past 30 years! It's comparing players active (above zero) at age 27 to players active (above zero) at age 35. And it doesn't find much difference. But that's because the two sets of players aren't the same. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">One more thing to make the point clearer. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Let's suppose you find every player active last year at age 27, and average their performance (per 500PA, or whatever). And then you find every player active last year at age 35, and average their performance.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">And you find there's not much difference. And you conclude, hey, players age gracefully! There's hardly any dropoff from age 27 to age 35!</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, that's the fallacy saberists have been warning against for 30 years, right? The canonical (correct) explanation goes something like this:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<blockquote class="tr_bq">
<span style="color: #073763; font-family: "verdana" , sans-serif;">"The problem with that logic is that it doesn't actually measure aging, because those two sets of players aren't the same. The players who are able to still be active at 35 are the superstars. The players who were able to be active at 27 are ... almost all of them. All this shows is that superstars at 35 are almost as good as the league average at 27. It doesn't actually tell us how players age."</span></blockquote>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, that logic is *exactly* what the regression is doing. It's calculating the average performance at every age, and drawing a parabola to join them. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's one last graph. I've included the "average at each age" line (blue) calculated from my random data. It's almost a perfect match to the (red) regression line.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_B3CRDBJCnY5828QcIdzuhCoxlDofXjSU91Q2Nm9M7KsfsirjeFbSJNIocVvpCMFeFv1Rxlz4FWz3an8rMa60U1fZsw-2ux7F1fZIqYWAc_iaxDAaLA87oIM57UYjtnCDvoDRyg/s1600/aging5.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="432" data-original-width="735" height="235" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_B3CRDBJCnY5828QcIdzuhCoxlDofXjSU91Q2Nm9M7KsfsirjeFbSJNIocVvpCMFeFv1Rxlz4FWz3an8rMa60U1fZsw-2ux7F1fZIqYWAc_iaxDAaLA87oIM57UYjtnCDvoDRyg/s400/aging5.jpg" width="400" /></a></div>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Bottom line: all the aging regression does is commit the same classic fallacy we repeatedly warn about. It just winds up hiding it -- by complicating, formalizing, and blackboxing what's really going on. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com11tag:blogger.com,1999:blog-31545676.post-27152854608103064112019-10-13T15:07:00.000-04:002019-10-13T15:10:48.789-04:00A study on NBA home court advantage<span style="font-family: "verdana" , sans-serif;">Economist Tyler Cowen often links to NBA studies in his "Marginal Revolution" blog ... <a href="https://marginalrevolution.com/marginalrevolution/2019/08/sunday-assorted-links-225.html">here's a recent one</a>, from an August post. (Follow his link to download the study ... you can also find a press release by Googling the title.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The study used a neural network to try to figure out what factors are most important for home (court) advantage (which I'll call "HCA"). The best fit model used twelve variables: two-point shots made, three-point shots made, and free throws made -- repeated for team at home, opposition on road, team on road, and opposition at home.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The authors write, </span><br />
<br />
<blockquote class="tr_bq">
<span style="color: #990000;"><span style="font-family: "verdana" , sans-serif;">"Networks that include shot attempts, shooting percentage, total points scored, field goals, attendance statistics, elevation and market size as predictors added no improvement in performance. ...</span><span style="font-family: "verdana" , sans-serif;"><br /></span><span style="font-family: "verdana" , sans-serif;"><br />"Contrary to previous work, attendance, elevation and market size were not relevant to understanding home advantage, nor were shot attempts, shooting percentage, overall W-L%, and total points scored."</span></span></blockquote>
<div>
<span style="font-family: "verdana" , sans-serif;"><br /></span></div>
<span style="font-family: "verdana" , sans-serif;">On reflection, it's not surprising that those other variables don't add anything ... the ones they used, shots made, are enough to actually compute points scored and allowed. Once you have that, what does it matter what the attendance was? If attendance matters at all, it would affect wins through points scored and allowed, not something independent of scoring. And "total points scored" weren't "relevant" because they were redundant, given shots made.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The study then proceeds to a "sensitivity analysis," where they increase the various factors, separately, to see what happens to HCA. It turns out that when you increase two-point shots made by 10 percent, you get three to four times the impact on HCA compared to when you increase three-point shots made by the same 10 percent.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The authors write,</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<blockquote class="tr_bq">
<span style="color: #990000; font-family: "verdana" , sans-serif;">"[This] suggests teams can maximize their advantage -- and hence their odds of winning -- by employing different shot selection strategies when home versus away. When playing at home, teams can maximize their advantage by shooting more 2P and forcing opponents to take more 2P shots. When playing away, teams can minimize an opponent's home advantage by shooting more 3P and forcing opponents to take more 3P shots."</span></blockquote>
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, yes, but, at the same time, no. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The reason increasing 2P by 10 percent leads to a bigger effect than increasing 3P by 10 percent is ... that 10 percent of 2P is a lot more points! Eyeball the graph of "late era" seasons the authors used (I assume it's the sixteen seasons ending with 2015-16). Per team-season, it looks like the average is maybe 2500 two-point shots made, but only 500 three-point shots.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Adding 10 percent more 2P is 250 shots for 500 points. Adding 10 percent more 3P is 50 shots for 150 points. 500 divided by 150 gives a factor of three-and-a-third -- almost exactly what the paper shows!</span><br />
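<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's that arithmetic as a few lines of Python, using my eyeballed per-team-season totals (2500 and 500 are rough readings off the paper's graph, not exact figures, and the function name is just for illustration):</span><br />

```python
def added_points(shots_made, points_per_shot, pct_increase=10):
    """Extra points from making pct_increase percent more shots of one type."""
    return shots_made * pct_increase / 100 * points_per_shot


two_point_gain = added_points(2500, 2)    # 250 extra shots, 2 points each
three_point_gain = added_points(500, 3)   # 50 extra shots, 3 points each
ratio = two_point_gain / three_point_gain

if __name__ == "__main__":
    print(two_point_gain, three_point_gain, round(ratio, 2))
```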
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I'd argue that what the study discovered is that points seem to affect HCA and winning percentage equally, regardless of how they are scored. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Even so, the argument in the paper doesn't work. By the authors' own choice of variables, HCA is increased by *making* 2P shots, not by *taking* 2P shots. Rephrasing the above quote, what the study really shows is,</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">"When playing at home, teams can maximize their advantage by concentrating on *making* more 2P and on forcing opponents to *miss* more 2P. That's assuming that it's just as easy to impact 2P percentages by 10 percent as to impact 3P percentages by 10 percent."</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But we could have figured that out easily, just by noticing that 10 percent of 2P is more points than 10 percent of 3P.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The authors found that you increase your HCA more with a 10 percent increase in road three-pointers than by a 10 percent increase in road two-pointers. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Sure. But that's because, with the 3P, you actually wind up adding fewer road points than you would with the 2P. Which means you win fewer road games. Which makes your HCA larger, since winning fewer road games increases the difference between home and road. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's because the worse you do on the road, the bigger your home court advantage!</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Needless to say, you don't really want to increase your HCA by tanking road games. The authors didn't notice that's what they were suggesting.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I think the issue is that the paper assumes that increasing your HCA is always a good thing. It's not. It's actually neutral. The object isn't to increase or decrease your HCA. It's to *win more games*. You can do that by winning more games at home, increasing your home court advantage, or by winning more games on the road, decreasing your home court advantage.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's one of those word biases we all have if we don't think too hard. "Increasing your advantage" sounds like something we should strive for. The problem is, in this context, the home "advantage" is relative to *your own performance* on the road. So it really isn't an "advantage," in the sense of something that makes you more likely to beat the other team. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In fact, if you rotate "Home Court Advantage" 180 degrees and call it "Road Court Disadvantage," now it feels like you want to *decrease* it -- even though it's exactly the same number!</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But HCA isn't something you should want to increase or decrease for its own sake. It's just a description of how your wins are distributed.</span><br />
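<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A toy example, in Python, to make that concrete. I'm assuming here that HCA is simply home winning percentage minus road winning percentage, which may not be exactly how the paper defines it, and the win totals are made up. Two teams improve by the same five wins, one at home and one on the road:</span><br />

```python
def summarize(home_wins, road_wins, games_each=41):
    """Return (total wins, HCA), with HCA assumed to be home win% minus road win%."""
    hca = home_wins / games_each - road_wins / games_each
    return home_wins + road_wins, hca


baseline = summarize(25, 16)      # made-up starting point
better_home = summarize(30, 16)   # five more home wins
better_road = summarize(25, 21)   # five more road wins

if __name__ == "__main__":
    for label, (wins, hca) in [("baseline", baseline),
                               ("better at home", better_home),
                               ("better on road", better_road)]:
        print(f"{label:15s} wins={wins}  HCA={hca:+.3f}")
```

<span style="font-family: "verdana" , sans-serif;">Both improved teams finish with the same record, but one's HCA went up and the other's went down.</span><br />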
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span><span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-35911440212294280592019-09-06T10:55:00.000-04:002019-09-06T19:20:08.608-04:00Evidence confirming the DH "penalty"<span style="font-family: "verdana" , sans-serif;">In "The Book," Tango/Lichtman/Dolphin found that batters perform significantly worse when they play a game as DH than when they play a fielding position. Lichtman (MGL) later <a href="https://mglbaseball.com/2013/12/09/pinch-hitter-dh-and-other-penalties-revisited/">followed up with detailed results</a> -- a difference of about 14 points of wOBA. That translates to about 6 runs per 500 PA.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A side effect of my new "luck" database is that I'm able to confirm MGL's result in a different way.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The way my luck algorithm works: it tries to "predict" a player's season by averaging the rest of his career -- before and after -- while adjusting for league, park, and age. Any difference between actual and predicted I ascribe to luck.</span><br />
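<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Stripped of the adjustments, the core idea looks something like the Python sketch below. This is a simplification, not my actual algorithm: it assumes the inputs are already adjusted for league, park, and age, and it weights every other season equally instead of by playing time. The function name and the numbers are just for illustration.</span><br />

```python
def luck_by_season(performance):
    """performance: dict of season -> runs per 500 PA (assumed already adjusted).

    Each season is "predicted" by the plain average of the player's other
    seasons; luck is actual minus predicted.
    """
    luck = {}
    for season, actual in performance.items():
        others = [v for s, v in performance.items() if s != season]
        if not others:
            continue  # a one-season career has nothing to compare against
        predicted = sum(others) / len(others)
        luck[season] = actual - predicted
    return luck


if __name__ == "__main__":
    print(luck_by_season({2001: 10.0, 2002: 20.0, 2003: 30.0}))
```

<span style="font-family: "verdana" , sans-serif;">In this unweighted version, a player's luck always sums to exactly zero over his career.</span><br />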
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I calibrated the algorithm so the overall average luck, over thousands of player-seasons, works out to zero. For most breakdowns -- third basemen, say, or players whose first names start with "M" -- average luck stays close to zero. But, for seasons where the batter was exclusively a DH, the average luck worked out negative -- an average of -3.8 runs per 500 PA. I'll round that to 4.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-6 R/500PA MGL</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-4 R/500PA Phil</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">My results are smaller than what MGL found, but that's probably because we used different methods. I considered only players who never played in the field that year. MGL's study also included the DH games of players who did play fielding positions. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(My method also included PH who never fielded that year. </span><span style="font-family: "verdana" , sans-serif; font-size: x-small;">I made sure to cover the same set of seasons as MGL -- 1998 to 2012.</span><span style="font-family: "verdana" , sans-serif; font-size: x-small;">)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">MGL's study would have included players who were DHing temporarily because they were recovering from injury, and I'm guessing that's the reason for my missing 2 runs.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, what about the 4 runs we have in common? What's going on there? Some possibilities:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. Injury. Maybe when players spend a season DHing, they're more likely to be recovering from some longer-term problem, which also winds up impacting their hitting.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. It's harder to bat as a DH than when playing a position. As "The Book" suggests, maybe "there is something about spending two hours sitting on the bench that hinders a player's ability to make good contact with a pitch."</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">3. Selective sampling. Most designated hitters played a fielding position at some time earlier in their careers. The fact that they are no longer doing so suggests that their fielding ability has declined. Whatever aspect of aging caused the fielding decline may have also affected their batting. In that case, looking at DHs might be selectively choosing players who show evidence of having aged worse than expected.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">4. Something else I haven't thought of.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">You could probably get a better answer by looking at the data a little closer. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For the "harder to DH" hypothesis, you could isolate PA from the top of the first inning, when all hitters are on equal footing with the DH, since the road team hasn't been out on defense yet. And, for the "injury" hypothesis, you could maybe check batters who had DH seasons in the middle of their careers, rather than the end, and check if those came out especially unlucky. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">One test I was able to do is a breakdown of the full-season designated hitters by age:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>Age R/500PA sample size</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-----------------------------</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>28-32 -13.7 2,316 PA</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>33-37 - 6.4 4,305 PA</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>38-42 + 1.4 6,245 PA</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(I've left out the age groups with too few PA to be meaningful.)</span><br />
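<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As a rough sanity check, you can take the PA-weighted average of the three rows shown, as in this Python snippet. It won't reproduce the overall -3.8 exactly, since the omitted age groups aren't included, but it should land in the same neighborhood.</span><br />

```python
# (age group, luck in runs per 500 PA, plate appearances), copied from the table
GROUPS = [("28-32", -13.7, 2316), ("33-37", -6.4, 4305), ("38-42", 1.4, 6245)]


def pa_weighted_average(rows):
    """Average the per-500-PA rates, weighting each group by its PA."""
    total_pa = sum(pa for _, _, pa in rows)
    return sum(rate * pa for _, rate, pa in rows) / total_pa


if __name__ == "__main__":
    print(round(pa_weighted_average(GROUPS), 2))
```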
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Young DHs underperform, and older DHs overperform. I think that's suggestive more of the injury and selective-sampling explanations than of the "it's hard to DH" hypothesis. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">----</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /><span style="color: #0b5394;">UPDATE: This 2015 <a href="https://tht.fangraphs.com/re-examining-wars-defensive-spectrum/">post by Jeff Zimmerman</a> finds a similar result. Jeff found that designated hitters had a larger "penalty" for the season in cases where they normally played a fielding position, or when they spent some time on the DL.</span></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-67836869190780486532019-08-14T14:37:00.004-04:002019-08-14T14:51:41.193-04:00Aggregate career year luck as evidence of PED use<span style="font-family: "verdana" , sans-serif;">Back in 2005, I came up with a method to try to estimate how lucky a player was in a given season (see my article in BRJ 34, <a href="https://sabr.box.com/shared/static/980if91o2s5purivbnmp.pdf">here</a>). I compared his performance to a weighted average of his two previous seasons and his two subsequent seasons, and attributed the difference to luck.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I'm working on improving that method, as I've been promising Chris Jaffe I would (for the last eight years or something). One thing I changed was that now, I use a player's entire career as the comparison set, instead of just four seasons. One reason I did that is that I realized that, the old way, a player's overall career luck was based almost completely on how well he did at the beginning and end of his career.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The method I used was to weight the four surrounding seasons in a ratio of 1/2/2/1. If the player didn't play all four of those years, the missing seasons just get left out.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, suppose a batter played from 1981 to 1989. The sum of his luck wouldn't be zero:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(81 luck) = (81) - 2/3(82) - 1/3(83) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(82 luck) = (82) - 2/5(81) - 2/5(83) - 1/5(84) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(83 luck) = (83) - 2/6(82) - 1/6(81) - 2/6(84) - 1/6(85) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(84 luck) = (84) - 2/6(83) - 1/6(82) - 2/6(85) - 1/6(86) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(85 luck) = (85) - 2/6(84) - 1/6(83) - 2/6(86) - 1/6(87) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(86 luck) = (86) - 2/6(85) - 1/6(84) - 2/6(87) - 1/6(88) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(87 luck) = (87) - 2/6(86) - 1/6(85) - 2/6(88) - 1/6(89)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(88 luck) = (88) - 2/5(87) - 1/5(86) - 2/5(89) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(89 luck) = (89) - 2/3(88) - 1/3(87) </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">---------------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">total luck = 13/30(81) - 1/6(82) - 7/30(83) - 1/30(84) - 1/30(86) - 7/30(87) - 1/6(88) + 13/30(89)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace; font-size: x-small;">(*Year numbers not followed by the word "luck" refer to player performance level that year).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span><span style="font-family: "verdana" , sans-serif;">(Sorry about the small font.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If a player has a good first year or last year, he'll score lucky. If he has a good second, third, or fourth year (or second-last, third-last, or fourth-last year), he'll score unlucky. The years in the middle (in this case, 1985, but, for longer careers, any seasons other than the first four and last four) cancel out and don't affect the total.</span><br />
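<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you'd rather not add up the fractions by hand, here's a short Python sketch that tallies each season's net coefficient in the career total under the 1/2/2/1 weighting, dropping neighboring seasons the player didn't play. The function name is mine, and it assumes the player appears in every season between his first and last.</span><br />

```python
from fractions import Fraction

# 1/2/2/1 weights for the seasons two before, one before, one after, two after.
WEIGHTS = {-2: 1, -1: 2, 1: 2, 2: 1}


def total_luck_coefficients(first, last):
    """Net coefficient of each season's performance in the sum of yearly luck."""
    years = range(first, last + 1)
    coef = {y: Fraction(0) for y in years}
    for y in years:
        # Keep only the comparison seasons that fall inside the career.
        present = {y + off: w for off, w in WEIGHTS.items()
                   if first <= y + off <= last}
        total_weight = sum(present.values())
        coef[y] += 1  # the season's own performance
        for other, w in present.items():
            coef[other] -= Fraction(w, total_weight)  # minus its share of the estimate
    return coef


if __name__ == "__main__":
    for year, c in total_luck_coefficients(1981, 1989).items():
        print(year, c)
```

<span style="font-family: "verdana" , sans-serif;">For the 1981-1989 example, the first and last seasons each come out to 13/30, 1985 comes out to zero, and all nine coefficients sum to zero.</span><br />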
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, by comparing each year to the player's entire career, that problem is gone. Now, every player's luck will sum close to zero (before regressing to the mean).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's not that big a deal, but it was still worth fixing.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">This meant I had to adjust for age. The old way, when a player was (say) 36, his estimate was based on his performance from age 34-38 ... reasonably close to 36. Although players decline from 34 to 38, I could probably assume that the decline from 34 to 36 was roughly equal to the decline from 36 to 38, so the age biases would cancel out.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But now, I'm comparing a 36-year-old player to his entire career ... say, from age 25 to 38. Now, we can't assume the 25-35 years, when the player was in his prime, cancel out the 37-38 years, when he's nowhere near the player he was.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">---------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So ... I have to adjust for age. What adjustment should I use? I don't think there's an accepted aging scale. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But ... I think I figured out how to calculate one.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Good luck should be exactly as prevalent as bad luck, by definition. That means that when I look at all players of any given age, the total luck should add up to zero.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, I experimented with age adjustments until all ages had overall luck close to zero. It wasn't possible to get them to exactly zero, of course, but I got them close.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">From age 20 to 36, for both batting and pitching, no single age was lucky or unlucky more than half a run per 500 PA. Outside of that range, there were sample size issues, but that's OK, because if the sample is small enough, you wouldn't expect them close to zero anyway.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">---------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Anyway, it occurred to me: maybe this is an empirical way to figure out how players age! Even if my "luck" method isn't perfect, as long as it's imperfect roughly the same way for various ages, the differences should cancel out. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As I said, I'm still fine-tuning the adjustments, but, for what it's worth, here's what I have for age adjustments for batting, from 1950 to 2016, denominated in Runs Created per 500 PA:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(1-17) = 0.7</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(18) = 0.74</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(19) = 0.75</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(20) = 0.775</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(21) = 0.81</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(22) = 0.84</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(23) = 0.86</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(24) = 0.89</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(25) = 0.9</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(26) = 0.925</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(27) = 0.925</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(28) = 0.925</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(29) = 0.925</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(30) = 0.91</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(31) = 0.8975</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(32) = 0.8775</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(33) = 0.8625</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(34) = 0.8425</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(35) = 0.8325</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(36) = 0.8225</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(37) = 0.8025</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(38) = 0.7925</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(39-42) = 0.7</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b> age(43+) = 0.65</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">These numbers only make sense relative to each other. For instance, players created 11 percent more runs per PA at age 24 than they did at age 37 (.89 divided by .8025 equals 1.11).</span><br />
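<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As a rough illustration of reading the chart as ratios, here is a minimal Python sketch. The factors are copied from the table above; the function names (age_factor, relative_production) are made up for this example.</span><br />

```python
# Batting age factors copied from the table above (1950-2016).
AGE_FACTOR = {
    18: 0.74, 19: 0.75, 20: 0.775, 21: 0.81, 22: 0.84, 23: 0.86,
    24: 0.89, 25: 0.90, 26: 0.925, 27: 0.925, 28: 0.925, 29: 0.925,
    30: 0.91, 31: 0.8975, 32: 0.8775, 33: 0.8625, 34: 0.8425,
    35: 0.8325, 36: 0.8225, 37: 0.8025, 38: 0.7925,
}


def age_factor(age):
    """Return the table's factor, using its grouped values at the extremes."""
    if age <= 17:
        return 0.70
    if 39 <= age <= 42:
        return 0.70
    if age >= 43:
        return 0.65
    return AGE_FACTOR[age]


def relative_production(age_a, age_b):
    """Ratio of runs created per PA at age_a to runs created per PA at age_b."""
    return age_factor(age_a) / age_factor(age_b)


# The age-24 versus age-37 comparison from the text.
print(round(relative_production(24, 37), 2))
```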
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(*Except ... there might be an issue with that. It's kind of subtle, but here goes.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">The "24" number is based on players at age 24 compared to the rest of their careers. The "37" number is based on players at age 37 compared to the rest of their careers. It doesn't necessarily follow that the ratio is the same for those players who were active both at 24 and 37. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">If you don't see why: imagine that every active player had to retire at age 27, and was replaced by a 28-year-old who never played MLB before. Then, the 17-27 groups and the 28-43 groups would have no players in common, and the two sets of aging numbers would be completely independent of each other. (You could, for instance, triple all the numbers in one group, and everything would still work.)</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">In real life, there's definitely an overlap, but only a minority of players straddle both groups. So, you could have somewhat of the same situation here, I think.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">I checked batters who were active at both 24 and 37, and had at least 1000 PA combined for those two seasons. On average, they showed lucky by +0.2 runs per 500 PA. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">That's fine ... but from 750 to 999 PA, there were 73 players, and they showed unlucky by -3.7 runs per 500 PA. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">You'd expect those players with fewer PA to have been unlucky, since if they were lucky, they'd have been given more playing time. (And players with more PA to have been lucky.) But is 3.7 runs too big to be a natural effect? (And is the +0.2 runs too small?)</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">My gut says: maybe, by a run or two. Still, if this aging chart works for this selective sample within a couple of runs in 500 PA, that's still pretty good.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">Anyway, I'm still thinking about this, and other issues.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">---------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In the process of experimenting with age adjustments, I found that aging patterns weren't constant over that 67-year period. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For instance: for batters from 1960 to 1970, the peak ages from 27 to 31 all came out unlucky (by the standard of 1950-2016), while 22-26 and 32-34 were all lucky. That means the peak was lower that decade, which means more gentle aging. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Still: the bias was around +1/-1 run of luck per 500 PA -- still pretty good, and maybe not enough to worry about.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">---------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If the data lets us see different aging patterns for different eras, we should be able to use it to see the effects of PEDs, if any.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's luck per 500 PA by age group for hitters, 1995 to 2004 inclusive:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-1.75 age 17-22</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-0.74 age 23-27</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+0.61 age 28-32</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+0.99 age 33-37</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+0.45 age 38-42</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That seems like it's in the range we'd expect given what we know, or think we know, about the prevalence of PEDs during that period. It's maybe 2/3 of a run better than normal for ages 28 to 42. If, say, 20 percent of hitters in that group were using PEDs, that would be around 3 runs each. Is that plausible? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's pitchers:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-1.22 age 17-22</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-0.51 age 23-27</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+1.36 age 28-32 </b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+1.42 age 33-37 </b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: #990000;">+1.07 age 38-42</span></b> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, that's pretty big (and statistically significant), all the way from 28 to 42: for a starter who faces 800 batters, it's about 2 runs. If 20 percent of pitchers are on PEDs, that's 10 runs each.</span><br />
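<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The back-of-envelope arithmetic for hitters and pitchers can be sketched in Python. This is only an approximation: it takes an unweighted average of the three age groups from 28 to 42 (an assumption, since the groups don't have equal playing time), and the function name is hypothetical.</span><br />

```python
def implied_effect_per_user(group_luck, share_using, pa=500):
    """Spread the average group luck over only the assumed share of PED users.

    group_luck  -- luck per 500 PA for each age group
    share_using -- assumed fraction of players using PEDs (0.20 in the text)
    pa          -- playing time to scale to (500 for hitters, 800 BF for a starter)
    """
    avg_per_500 = sum(group_luck) / len(group_luck)  # unweighted: an assumption
    league_wide = avg_per_500 * pa / 500
    return league_wide, league_wide / share_using


# Ages 28-32, 33-37, 38-42 from the 1995-2004 tables above.
hitters = implied_effect_per_user([0.61, 0.99, 0.45], 0.20)
pitchers = implied_effect_per_user([1.36, 1.42, 1.07], 0.20, pa=800)

print("hitters  (overall, per user):", hitters)
print("pitchers (overall, per user):", pitchers)
```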
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">By checking the post-steroid era, we can check the opposing argument that it's not PEDs, it's just better conditioning, or some such. </span><span style="font-family: "verdana" , sans-serif;">Here's pitchers again, but this time 2007-2013:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-0.06 age 17-22</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+1.01 age 23-27</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+0.30 age 28-32</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>-1.67 age 33-37</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>+0.59 age 38-42</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, from 28 to 42, pitchers were *unlucky* on average, overall.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I'd say this is pretty good support for the idea that pitchers were aging better due to PEDs ... especially given actual knowledge and evidence that PED use was happening.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-81001911434267140752019-03-26T13:25:00.001-04:002019-03-26T14:17:29.897-04:00True talent levels for individual players<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(Note: Technical post about practical methods to figure MLB distribution of player talent and regression to the mean.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For a long time, we've been using the "Palmer/Tango" <a href="http://www.insidethebook.com/ee/index.php/site/comments/true_talent_levels_for_sports_leagues/">method</a> for estimating the spread of talent among MLB teams. You're probably sick of seeing it, but I'll run it again real quick for 2013:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "verdana" , sans-serif;">1. Find the SD of observed team winning percentage from the standings. In 2013, SD(observed) was 0.0754.</span><br />
<span style="color: #990000; font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "verdana" , sans-serif;">2. Calculate the theoretical SD of luck in a team-season. Statistical theory tells us the formula is the square root of p(1-p)/162, where p is the probability of winning. Assuming teams aren't that far from .500, SD(luck) works out to around 0.039.</span><br />
<span style="color: #990000; font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "verdana" , sans-serif;">3. Since luck is independent of talent, we can say that SD(observed)^2 = SD(luck)^2 + SD(talent)^2 . Substituting the numbers gives our estimate that SD(talent) = 0.0643. </span><br />
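<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The three steps above can be sketched in a few lines of Python. The function names are made up for illustration; the inputs are the 2013 numbers quoted in the steps, with p assumed to be .500.</span><br />

```python
import math


def binomial_luck_sd(p, n):
    """Theoretical SD of a binomial proportion over n trials."""
    return math.sqrt(p * (1 - p) / n)


def talent_sd(sd_observed, sd_luck):
    """Solve SD(observed)^2 = SD(luck)^2 + SD(talent)^2 for SD(talent).

    Returns None when the difference of squares is negative, which is
    the case that would otherwise give an imaginary result.
    """
    variance = sd_observed ** 2 - sd_luck ** 2
    return math.sqrt(variance) if variance >= 0 else None


sd_luck = binomial_luck_sd(0.5, 162)    # step 2
sd_talent = talent_sd(0.0754, sd_luck)  # step 3, using step 1's SD(observed)

print(round(sd_luck, 4), round(sd_talent, 4))
```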
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That works great for teams. But what about players? What's the spread of talent, in, say, on-base percentage, for individual hitters?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It would be great to use the same method, but there's a problem. Unlike team-seasons, where every team plays 162 games, every player bats a different number of times. Sure, we can calculate SD(luck) for each hitter individually, based on his playing time, but then how do we combine them all into one aggregate "SD(luck)" for step 3? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Can we use the average number of plate appearances? I don't think that would work, actually, because the SD isn't linear. It's inversely proportional to the square root of PA, but even if we used the average of that, I still don't think it would work.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Another possibility is to consider only batters with close to some arbitrary number of plate appearances. For instance, we could just take players in the range 480-520 PA, and treat them as if they all had 500 PA. That would give a reasonable approximation.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, that would only help us find talent for batters who make it to 500 PA. Those batters are generally the best in baseball, so the range we find will be much too narrow. Also, batters who do make it to 500 PA are probably somewhat lucky (if they started off 15-for-100, say, they probably wouldn't have been allowed to get to 500). That means our theoretical formula for binomial luck probably wouldn't hold for this sample.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, what do we do?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I don't think there's an easy way to figure that out. Unless Tango already has a way ... maybe I've missed something and reinvented the wheel here, because after thinking about it for a while, I came up with a more complicated method. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The thing is, we still need to have all hitters have the same number of PA. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We take the batter with the lowest playing time, and use that. It might be 1 PA. In that case, for all the hitters who have more than 1 PA, we reduce them down to 1 PA. Now that they're all equal, we can go ahead and run the usual method. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Well, actually, that's a bit of an exaggeration ... 1 PA doesn't work. It's too small, for reasons I'll explain later. But 20 PA does seem to work OK. So, we reduce all batters down to 20 PA.* </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">*The only problem is, we'll only be finding the talent range for the subset of batters who are good (or lucky) enough to make it to 20 plate appearances. That should be reasonable enough for most practical purposes, though. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">How do we take a player with 600 PA, and reduce his batting line to 20 PA? We can't just scale down. Proportionally, there's much less randomness in large samples than small, so if we treated a player's 20 PA as an exact replica of his performance in 600 PA, we'd wind up with the "wrong" amount of luck compared to what the formulas expect, and we'd get the wrong answer.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, what I did was: I took a random sample of 20 PA from every batter's batting line, sampling "without replacement" (which means not using the same plate appearance twice). </span><br />
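<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here is one way the sampling step might look in Python. It's a sketch, not necessarily the code used for this post: it assumes a batting line can be reduced to "times on base" and "plate appearances," and the function name is hypothetical.</span><br />

```python
import random


def subsample_obp(times_on_base, pa, n=20, rng=random):
    """Draw n plate appearances without replacement and return their OBP.

    The batting line is expanded into one entry per PA (1 = reached base,
    0 = out), so no plate appearance can be picked twice.
    """
    if pa < n:
        raise ValueError("batter has fewer than n plate appearances")
    outcomes = [1] * times_on_base + [0] * (pa - times_on_base)
    picked = rng.sample(outcomes, n)
    return sum(picked) / n


# A made-up regular: 210 times on base in 600 PA (.350 OBP).
rng = random.Random(2013)  # seeded only so the example is repeatable
print(subsample_obp(210, 600, rng=rng))
```

<span style="font-family: "verdana" , sans-serif;">Sampling without replacement matters for the low-PA batters: a hitter with exactly 20 PA just gets his actual batting line back.</span><br />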
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Once that's done, and every hitter is down to 20 PA, we can just go ahead and use the standard method. Here it is for 2013:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. There were 602 non-pitchers in the sample. The SD of the 602 observed batter OBP values (based on 20 PA per player) was 0.1067.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. Those batters had an aggregate OBP of .2944. The theoretical SD(luck) in 20 PA with a .2944 expectation is 0.1019.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">3. The square root of (0.1067 squared - 0.1019 squared) equals 0.0317.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, our estimate of SD(talent) = 0.0317. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That implies that 95% of batters range between .247 and .373. Seems pretty reasonable.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I think this method actually works quite decently. One issue, though, is that it includes a lot of randomness. All the regulars with 500 or 600 plate appearances ... we just randomly pick 20, and ignore the rest. The result is sensitive to which random numbers are pulled. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">How sensitive? To give you an idea, here are the results of 10 different random runs:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0317</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0286</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0340</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0325</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0464</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0471</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0257</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0421</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>imaginary</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>0.0435</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I should explain the "imaginary" one. That happens when, just by random chance, SD(observed) is smaller than the expected SD(luck). It's more frequent when the sample size is so small -- say, 20 PA -- that luck is much larger than talent. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In our original run, SD(observed) was 0.107 and SD(luck) was 0.102. Those are pretty close to each other. It doesn't take much random fluctuation to reverse their order ... in the "imaginary" run, the numbers were 0.1021 and 0.1022, respectively.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">More generally, when SD(observed) and SD(luck) are so close, SD(talent) is very sensitive to small random changes in SD(observed). And so the estimates jump around a lot.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">(And that's the reason I used the 20 PA minimum. With a sample size of 1 PA, there would be too much distortion from the lack of symmetry. I think. Still investigating.)</span><br />
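<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To get a feel for how touchy the subtraction is, here's a small Python sketch. It holds SD(luck) at the 0.1019 from the original run and nudges SD(observed) up and down by 0.002; that nudge is an arbitrary amount chosen for illustration, and the function name is made up.</span><br />

```python
import math


def talent_sd(sd_observed, sd_luck):
    """SD(talent) from the difference of squares, or None if it is imaginary."""
    variance = sd_observed ** 2 - sd_luck ** 2
    return math.sqrt(variance) if variance >= 0 else None


SD_LUCK = 0.1019      # theoretical luck in 20 PA, from the original run
SD_OBSERVED = 0.1067  # observed spread, from the original run
NUDGE = 0.002         # hypothetical random error in SD(observed)

base = talent_sd(SD_OBSERVED, SD_LUCK)
low = talent_sd(SD_OBSERVED - NUDGE, SD_LUCK)
high = talent_sd(SD_OBSERVED + NUDGE, SD_LUCK)

print("drop from a negative nudge:", round(base - low, 4))
print("rise from a positive nudge:", round(high - base, 4))

# When SD(observed) dips below SD(luck), there is no real-valued answer.
print(talent_sd(0.1010, SD_LUCK))
```

<span style="font-family: "verdana" , sans-serif;">An equal-sized error moves the estimate further when it's negative than when it's positive, because the square root gets steeper as the difference of squares approaches zero.</span><br />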
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The obvious thing to do is just do a whole bunch of random runs, and take the average. That doesn't quite work, though. One problem is that you can't average the imaginary numbers that sometimes come up. Another problem -- actually, the same problem -- is that the errors aren't symmetrical. A negative random error decreases the estimate more than a positive random error increases the estimate. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To help get around that, I ran 500 random runs, but I didn't average the 500 resulting estimates of SD(talent). Instead, I averaged the 500 values of SD(observed), and the 500 values of SD(luck). Then, I calculated SD(talent) from those two averages.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The result:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #cc0000; font-family: "courier new" , "courier" , monospace;"><b>SD(talent) = 0.0356</b></span><br />
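<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A minimal Python sketch of that averaging idea, using three made-up runs instead of 500. The numbers in the two lists are invented for illustration (the second run would be "imaginary" on its own), and the function name is hypothetical.</span><br />

```python
import math
from statistics import mean


def pooled_talent_sd(sd_observed_runs, sd_luck_runs):
    """Average SD(observed) and SD(luck) across runs, then solve once.

    This avoids having to average individual estimates, some of which
    may be imaginary. Returns None if even the averages give a negative
    difference of squares.
    """
    variance = mean(sd_observed_runs) ** 2 - mean(sd_luck_runs) ** 2
    return math.sqrt(variance) if variance >= 0 else None


# Invented values for three random runs.
observed_runs = [0.1067, 0.1015, 0.1110]
luck_runs = [0.1019, 0.1024, 0.1026]

print(pooled_talent_sd(observed_runs, luck_runs))
```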
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Even with this method, I suspect the estimate is still a bit off. I'm thinking about ways to improve it. I still think it's decent enough, though.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, now we have our estimate that for 2013, SD(talent)=0.0356. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The next step: estimating a batter's true talent based on his observed OBP.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">We know, from Tango, that we can estimate any player's talent by regressing to the mean -- specifically, "diluting" his batting line by adding a certain number of PA of average performance. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">How many PA do we need to add? As <a href="http://blog.philbirnbaum.com/2011/08/tango-method-of-regression-to-mean-kind.html">Tango showed</a>, it's the number that makes SD(luck) equal to SD(talent). </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In the 500 simulations, SD(luck) averaged 0.1023 in 20 PA. To get luck down to 0.0356, where it would equal SD(talent), we'd need 166 PA. (That's 20 multiplied by the square of (0.1023 / 0.0356)). I'll just repeat that for reference:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #cc0000; font-family: "courier new" , "courier" , monospace;"><b>Regress by 166 PA</b></span><br />
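<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's the same calculation as a Python sketch, along with what "adding PA of average performance" looks like for one batter. The function names are made up, and the batter and the .300 mean in the example are placeholders, not figures from the data. With the rounded inputs shown, the answer comes out near 165 rather than exactly 166.</span><br />

```python
def regression_pa(sd_luck, sd_talent, n=20):
    """PA at which binomial luck shrinks to the size of SD(talent).

    Luck scales with 1/sqrt(PA), so the PA count scales with the square
    of the ratio of the two SDs.
    """
    return n * (sd_luck / sd_talent) ** 2


def regress_obp(times_on_base, pa, mean_obp, extra_pa):
    """Dilute a batting line with extra_pa plate appearances at mean_obp."""
    return (times_on_base + extra_pa * mean_obp) / (pa + extra_pa)


extra = regression_pa(0.1023, 0.0356)
print(round(extra, 1))

# Placeholder batter: on base 40 times in 100 PA, regressed toward .300.
print(round(regress_obp(40, 100, 0.300, 166), 3))
```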
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A value of 166 PA seems reasonable. To check, I ran every season from 1950 to 2016, and 166 was right in line. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The average of the 67 seasons was 183 PA. The highest was 244 PA (1981); the lowest was 108 PA (1993). </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now we know we need to add 166 PA of average performance to a batting line to go from observed performance to estimated talent. But what, exactly, is "average performance"?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">There are at least four different possibilities:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. Regress to the observed real-life OBP. In MLB in 2013, for non-pitchers with at least 20 PA, that was .3186. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. Regress to the observed real-life OBP weighting every batter equally. That works out to .2984. (It's smaller than the actual MLB number because, in real life, worse hitters get fewer-than-equal PA.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">3. Regress to the average *talent*, weighted by real-life PA.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">4. Regress to the average *talent*, weighting every batter equally.</span><br />
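<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The gap between the first two candidates is easy to see with a toy example in Python. The four batters below are invented, chosen so that the better hitters get more playing time; the function names are hypothetical.</span><br />

```python
def pa_weighted_obp(batters):
    """League OBP the usual way: total times on base over total PA."""
    return sum(ob for ob, pa in batters) / sum(pa for ob, pa in batters)


def equal_weighted_obp(batters):
    """Average of individual OBPs, counting every batter once."""
    return sum(ob / pa for ob, pa in batters) / len(batters)


# Invented (times on base, PA) lines: two regulars, two part-timers.
batters = [(230, 650), (160, 500), (25, 100), (5, 25)]

print(round(pa_weighted_obp(batters), 4))
print(round(equal_weighted_obp(batters), 4))
```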
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which one is correct? I had never actually thought about the question before. That's because I had only ever used this method on team talent, and, for teams, all four averages are .500. Here, they're all different. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I won't try to explain why, but I think the correct answer is number 4. We want to regress to the average talent of the players in the sample.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Except ... now we have a Catch-22. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To regress performance to the mean, we need to know the league's average talent. But to know the league's average talent, we need to regress performance to the mean!</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">What's the way out of this? It took me a while, but I think I have a solution.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The Tango method has an implicit assumption that -- while some players may have been lucky in 2013, and some unlucky -- overall, luck evened out. Which means, the observed OBP in MLB in 2013 is exactly equal to the expected OBP based on player talent.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Since the actual OBP was .3186, it must be that the expected OBP, based on player talent, is also .3186. That is: if we regress every player towards X by 166 PA, the overall league OBP has to stay .3186. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">What value of X makes that happen?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I don't think there can be an easy formula for X, because it depends on the distribution of playing time -- most importantly, how much more playing time the good hitters got that year compared to the bad hitters.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So I had to figure it out by trial and error. The answer:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #cc0000; font-family: "courier new" , "courier" , monospace;"><b>Mean of player talent = .30995</b></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(If you want to check that yourself, just regress every player's OBP while keeping PA constant, and verify that the overall average (weighted by PA) remains the same. Here's the SQL I used for that:</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">SELECT </span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">sum(H+bb)*1.0/sum(ab+bb) AS actual, </span></span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">sum((h+bb+.30995*166)/(ab+bb+166)*(ab+bb)) / sum(ab+bb) AS regressed </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">FROM batting</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">WHERE yearid=2013 and ab+bb>=20 and primarypos <> 'P'</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">The idea is that "actual" and "regressed" should come out equal.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">The "primarypos" column is one I created and populated myself, but the rest should work right from the Lahman database. </span><span style="font-family: "verdana" , sans-serif; font-size: x-small;">You can leave out the "primarypos" and just use all hitters with 20+ PA. You'll probably find that it'll be something lower than .30995 that makes it work, since including pitchers brings down the average talent. </span><span style="font-family: "verdana" , sans-serif; font-size: x-small;">Also, with a different population of talent, the correct number of PA to regress should be something other than 166 -- probably a little lower? -- but 166 is probably close.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">While I'm here ... I should have said earlier that I used only walks, AB, and hits in my definition of OBP, all through this post.)</span><br />
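<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">The same check can be sketched in Python. This is only an illustration: the player lines are made up rather than taken from the Lahman database, and the function names are hypothetical. One observation, offered tentatively: the condition "regressed league OBP equals actual league OBP" is linear in X, so once the player data is in hand, X can be solved for directly instead of by trial and error. It still depends on the whole distribution of playing time, which is why it can't be read off the league totals alone.</span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
```python
# Sketch: find the regression target X that leaves the PA-weighted
# league OBP unchanged after regressing every player by K PA.
# The player lines below are invented for illustration only.

K = 166  # PA of "average performance" to add, from the post


def league_obp(players):
    """Actual OBP: total times on base divided by total PA."""
    return sum(ob for ob, pa in players) / sum(pa for ob, pa in players)


def regressed_league_obp(players, x, k=K):
    """PA-weighted average of each player's OBP regressed toward x."""
    weighted = sum((ob + x * k) / (pa + k) * pa for ob, pa in players)
    return weighted / sum(pa for ob, pa in players)


def solve_target(players, k=K):
    """X for which regressed_league_obp equals league_obp.

    Setting the two equal and solving the (linear) equation for x
    reduces to the ratio of sums below.
    """
    num = sum(ob / (pa + k) for ob, pa in players)
    den = sum(pa / (pa + k) for ob, pa in players)
    return num / den


# (times on base, PA): hypothetical players, better hitters playing more
players = [(230, 650), (180, 600), (90, 320), (30, 120), (5, 25)]
x = solve_target(players)
print(round(league_obp(players), 5), round(x, 5))
```
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">With this toy data the target comes out below the league OBP, for the same reason as in the post: the better hitters get more of the playing time.</span><br />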
<br />
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, a summary of the method:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. For each player, take a random 20 PA subset of his batting line. Figure SD(observed) and SD(luck).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. Repeat the above enough times to get a large sample size, and average out to get a stable estimate of SD(observed) and SD(luck).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">3. Use the Tango method to calculate SD(talent).</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">4. Use the Tango method to calculate how many PA to regress to the mean to estimate player talent.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">5. Figure what mean to regress to by trial and error, to get the playing-time-weighted average talent equal to the actual league OBP.</span><br />
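<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Steps 3 and 4 can be sketched in a few lines of Python. This assumes the usual form of the Tango method, where var(observed) = var(talent) + var(luck), and the number of PA to regress is the sample size at which luck variance would shrink to equal talent variance. The inputs below are hypothetical, not the figures from this post, and the function names are illustrative.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
```python
import math


def sd_talent(sd_observed, sd_luck):
    """Step 3: var(talent) = var(observed) - var(luck)."""
    var_talent = sd_observed ** 2 - sd_luck ** 2
    if var_talent <= 0:
        raise ValueError("observed spread is no larger than luck alone")
    return math.sqrt(var_talent)


def pa_to_regress(sd_observed, sd_luck, sample_pa=20):
    """Step 4: PA at which luck variance would equal talent variance.

    Luck variance shrinks in proportion to 1/PA, so scale the
    sample size by var(luck) / var(talent).
    """
    var_talent = sd_talent(sd_observed, sd_luck) ** 2
    return sample_pa * sd_luck ** 2 / var_talent


# Hypothetical inputs, NOT the post's actual figures.
p = 0.310                              # assumed typical OBP
luck = math.sqrt(p * (1 - p) / 20)     # binomial SD of OBP over 20 PA
observed = 0.1095                      # assumed SD of the 20-PA samples
print(sd_talent(observed, luck), pa_to_regress(observed, luck))
```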
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">----------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If I did that right, it should work for any stat, not just OBP. Eventually I'll run it for wOBA, and RC27, and BABIP, and whatever else comes to mind. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As always, let me know if I've got any of this wrong.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span><span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com6tag:blogger.com,1999:blog-31545676.post-30958423626799076222019-01-15T19:21:00.000-05:002019-09-08T03:34:55.511-04:00Fun with splits<span style="font-family: "verdana" , sans-serif;">This was Frank Thomas in 1993, a year in which he was American League MVP with an OPS of 1.033.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace;"> PA H 2B 3B HR BB K BA OPS </span><br />
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace;">-------------------------------------------------- </span><br />
<span style="color: #0b5394; font-family: "courier new" , "courier" , monospace;">'93 F. Thomas 676 174 36 0 41 112 54 .317 1.033 </span><span style="font-family: "verdana" , sans-serif;"> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Most of Thomas's hitting splits were fairly normal:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">Home/Road: 1.113/0.950</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">First vs. Second Half: 0.970/1.114</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Vs. RHP/LHP: 1.019/1.068</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Outs in inning: 1.023/1.134/0.948</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Team ahead/behind/tied: 1.016/0.988/1.096</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Early/mid/late innings: 1.166/0.950/0.946</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Night/day: 1.071/0.939</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But I found one split that was surprisingly large:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"> PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">Thomas 1 352 108 22 0 33 58 34 .367 1.251 14.81 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #990000;">Thomas 2 309 66 14 0 8 54 20 .259 0.796 5.45</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">"Thomas 1" was nearly three times as productive as "Thomas 2" by RC/G, to the extent that you wouldn't recognize them as the same player. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">This is a real split ... it's not a selective-sampling trick, like "team wins vs. losses," where "team wins" were retroactively more likely to have been games in which Thomas hit better. (For the record, that particular split was 1.172/.828 -- this one is wider.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So what is this split? The answer is ... </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">.</span><br />
<span style="font-family: "verdana" , sans-serif;">.</span><br />
<span style="font-family: "verdana" , sans-serif;">.</span><br />
<br />
<span style="font-family: "verdana" , sans-serif;">The first line is games on odd-numbered days of the month. The second line is even-numbered days.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In other words, this split is random.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In terms of OPS difference -- 455 points -- it's the biggest odd/even split I found for any player in any season from 1950 to 2016 with at least 251 AB <strike>PA</strike> each half. </span><br />
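<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For anyone who wants to reproduce a split like this from a game log, here is a minimal Python sketch of the bookkeeping. The three-game log is invented for illustration, and the function name and dictionary layout are assumptions, not the code used for this post.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
```python
from datetime import date


def odd_even_split(game_log):
    """Sum each counting stat separately for odd and even days of the month."""
    totals = {"odd": {}, "even": {}}
    for game_date, stats in game_log:
        half = "odd" if game_date.day % 2 == 1 else "even"
        for name, value in stats.items():
            totals[half][name] = totals[half].get(name, 0) + value
    return totals


# Hypothetical three-game log: (date, counting stats for that game)
log = [
    (date(1993, 4, 9), {"PA": 4, "H": 2, "HR": 1}),
    (date(1993, 4, 10), {"PA": 5, "H": 1, "HR": 0}),
    (date(1993, 4, 11), {"PA": 4, "H": 3, "HR": 1}),
]
split = odd_even_split(log)
print(split["odd"], split["even"])
```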
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If we go down to a 150 AB minimum, the biggest is Ken Phelps in 1987:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">1987 Phelps PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">odd 204 31 3 0 8 39 33 .188 <b>0.695</b> 3.79 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #990000;">even 208 55 10 1 19 41 42 .329 <b>1.204</b> 13.03</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">And if we go down to 100 AB, it's Mike Stanley, again in 1987, but on the opposite days to Phelps:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">1987 Stanley PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #990000;">odd 134 42 6 1 6 18 23 .362 <b>1.034</b> 10.49 </span></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">even 113 17 2 0 0 13 25 .170 <b>0.455</b> 1.55 </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But, from here on, I'll stick to the 251 AB standard.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That 1993 Frank Thomas split was also the biggest gap in home runs, with a 25 HR difference between odd and even (33 vs. 8). Here's another I found interesting -- Dmitri Young in 2001:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2001 D Young PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">Odd 285 68 12 2 <b> 2</b> 18 40 .255 0.639 3.48 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #990000;">Even 292 95 16 1 <b>19</b> 19 37 .348 1.013 9.51</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Only two of Young's 21 home runs came on odd-numbered days. The binomial probability of that happening randomly (19-2/2-19 or better) is about 1 in 4520.* And, coincidentally, there were exactly 4516 players in the sample!</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">(* Actually, it must be more likely than 1 in 4520. The binomial probability assumes each opportunity is independent, and equally likely to occur on an even day as an odd day. But, PA tend to happen in daily clusters of 3 to 5. Since PAs are more likely to cluster, so are HR. </span><br />
<span style="font-family: "verdana" , sans-serif; font-size: x-small;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">To see that more easily, imagine extreme clustering, where there are only two games a year (instead of 162), with 250 PA each game. Half of all players would have either all odd PA or all even PA, and you'd see lots of extreme splits.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For K/BB ratio, check out Derek Jeter's 2004: </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2004 Jeter PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">---------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">odd 362 113 27 1 15 <b>14 63</b> .325 0.888 7.12 </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">even 327 75 17 0 8 <b>32 36</b> .254 0.720 4.40 </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">There were bigger differences, but I found Jeter's the most interesting. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In 1978, all 10 of Rod Carew's triples came on even-numbered days:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">1978 Carew PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">---------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">odd 333 92 10 <b> 0</b> 0 45 34 .319 0.766 5.46 </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">even 309 96 16 <b>10</b> 5 33 28 .348 0.950 8.69 </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A 10-0 split is a 1-in-512 shot. I'd say again that it's actually a bit more likely than that because of PA clustering, but ... Carew actually had *fewer* PA in that situation! </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Oh, and Carew also hit all five of his HR on even days. Combining them into 15-0 is binomial odds of 16383 to 1, if you want to do that.</span><br />
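<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The binomial figures for Young and Carew can be checked with a short Python sketch. It assumes, as the simple binomial model does, that each event is independent and equally likely to land on an odd or even day, and it counts a split that lopsided in either direction. The function name is illustrative.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
```python
from math import comb


def lopsided_split_probability(total, smaller):
    """Chance that `total` independent 50/50 events split at least as
    unevenly as smaller-vs-(total - smaller), in either direction."""
    if smaller < 0 or 2 * smaller >= total:
        raise ValueError("smaller must be the short side of an uneven split")
    one_side = sum(comb(total, k) for k in range(smaller + 1))
    return 2 * one_side / 2 ** total


for label, total, smaller in [
    ("Young, 19-2 HR", 21, 2),
    ("Carew, 10-0 3B", 10, 0),
    ("Carew, 15-0 3B+HR", 15, 0),
]:
    p = lopsided_split_probability(total, smaller)
    print(f"{label}: about 1 in {1 / p:.0f}")
# Young, 19-2 HR: about 1 in 4520
# Carew, 10-0 3B: about 1 in 512
# Carew, 15-0 3B+HR: about 1 in 16384
```
<span style="font-family: "verdana" , sans-serif;">A probability of 1 in 16384 is the same thing as odds of 16383 to 1.</span><br />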
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Strikeouts and walks aren't quite as impressive. It's Justin Upton 2013 for strikeouts:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2013 Upton PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">-----------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">odd 330 71 14 1 16 31 <b>102</b> .237 0.761 4.67 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #990000;">even 303 76 13 1 11 44 <b>59</b> .293 0.875 6.84</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">And Mike Greenwell 1988 for walks:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">88 Greenwell PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">-----------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">odd 357 91 15 3 10 <b>62</b> 18 .308 0.910 7.61 </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">even 320 101 24 5 12 <b>25 </b> 20 .342 0.973 8.85 </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Interestingly, Greenwell was actually more productive on the even-numbered days, when he took fewer than half as many walks.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Finally, here's batting average, Grady Sizemore in 2005:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2005 Sizemore PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">-----------------------------------------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">odd 344 69 9 4 12 26 79 <b>.217</b> 0.660 3.45 </span><br />
<span style="color: #990000;"><span style="font-family: "courier new" , "courier" , monospace;">even 348 116 28 7 10 26 53 <b>.360</b> 0.992 9.50</span><span style="font-family: "verdana" , sans-serif;"> </span></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Another anomaly -- Sizemore hit more home runs on his .217 days than on his .360 days.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Anyway, what's the point of all this? Fun, mostly. But, for me, it did give me a better idea of what kinds of splits can happen just by chance. If it's possible to have a split of 33 odd homers and 8 even homers, just by luck, then it's possible to have 33 first-half homers and 8 second-half homers, just by luck. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Of course, you should only expect that size of effect once every 40 years or so. It might be more intuitive to go from a 40-year standard to a single-season standard, to get a better idea of what we can expect each year. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To do that, I looked at 1977 to 2016 -- 39 seasons plus 1994. Averaging the top 39 should roughly give us the average for the year. Instead of the average, I figured I'd just (unscientifically) take the 25th biggest ... that's probably going to be close to the median MLB-leading split for the year, taking into account that some years have more than one of the top 39.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For HR, the 25th ranked is Fred McGriff's 2002. It's an impressive 22/8 split:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span><span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">02 McGriff PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">odd 297 70 11 1 <b>22</b> 42 47 .275 0.961 7.74 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #351c75;">even 289 73 16 1 <b> 8</b> 21 52 .272 0.754 4.89</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For OPS, it's Scott Hatteberg in 2004:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">04 Hatteberg PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">odd 312 92 19 0 10 37 23 .335 <b>0.926</b> 8.12 </span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">even 310 64 11 0 5 35 25 .233 <b>0.647</b> 3.47</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For strikeouts, it's Felipe Lopez, 2005. Not that huge a deal ... only 27 K difference.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">05 F. Lopez PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">odd 316 78 15 2 12 19 <b> 69</b> .263 0.755 4.75 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #351c75;">even 321 91 19 3 11 38 <b>42</b> .322 0.928 7.95</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For walks, it's Darryl Strawberry's 1987. The difference is only 23 BB, but to me it looks more impressive than the 27 strikeouts:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">87 Strwb'ry PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #351c75;"><span style="font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span></span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">odd 315 77 15 2 19 <b>37</b> 55 .277 0.912 7.02 </span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">even 314 74 17 3 20 <b>60</b> 67 .291 1.045 9.49 </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For batting average, number 25 is Orestes Infante, 2011, but I'll show you the 24th ranked, which is Rickey Henderson in his rookie card year. (Both players round to a .103 difference.)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">1980 Rickey PA H 2B 3B HR BB K BA OPS RC/G </span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">----------------------------------------------------</span><br />
<span style="color: #351c75; font-family: "courier new" , "courier" , monospace;">odd 340 100 13 1 2 60 21 <b>.357</b> 0.903 8.07 </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #351c75;">even 368 79 9 3 7 57 33 <b>.254</b> 0.739 4.67</span> </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I'm going to think of this as, every year, the league-leading random split is going to look like those. Some years it'll be higher, some lower, but these will be fairly typical.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">That's the league-leading split for *each category*. There'll be a random home/road split of this magnitude (in addition to actual home/road effect). There'll be a random early/late split of this magnitude (in addition to any fatigue/weather effects). There'll be a random lefty/righty split of this magnitude (in addition to actual platoon effects). And so on.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Another way I might use this is to get an intuitive grip on how much I should trust a potentially meaningful split. For instance, if a certain player hits substantially worse in the second half of the season than in the first half, how much should you worry? To figure that out, I'd list a season's biggest even/odd splits alongside the season's biggest early/late splits. If the 20th biggest real split is as big as the 10th biggest random split, then, knowing nothing else, you can start with a guess that there's a 50 percent chance the decline is real.</span><br />
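<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The arithmetic behind that 50 percent is simple enough to write down. If 20 real splits are at least this big, and chance alone produces about 10 that big, then roughly 10 of the 20 are left over as "real." Here is a small Python sketch of that rule of thumb. It assumes both lists are drawn from the same pool of player-seasons, and the function name is hypothetical.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
```python
def chance_split_is_real(rank_among_real, rank_among_random):
    """Rough share of equally large real-category splits not explained by luck.

    `rank_among_real` splits in the real category are at least this big;
    about `rank_among_random` of them are what chance alone would produce.
    """
    if rank_among_random >= rank_among_real:
        return 0.0  # chance alone produces at least as many splits this large
    return (rank_among_real - rank_among_random) / rank_among_real


print(chance_split_is_real(20, 10))  # 0.5, the example in the text
```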
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Sure, you could do it mathematically, by figuring out the SD of the various stats. But that's harder to appreciate. And it's not nearly as much fun as being able to say that, in 1978, Rod Carew hit every one of his 10 triples and 5 homers on even-numbered days. Especially when anyone can go to <a href="https://www.baseball-reference.com/players/gl.fcgi?id=carewro01&t=b&year=1978">Baseball Reference</a> and verify it.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com2tag:blogger.com,1999:blog-31545676.post-63482248577664864782018-12-18T12:30:00.000-05:002018-12-18T12:31:44.061-05:00Does the NHL's "loser point" help weaker teams?<span style="font-family: "verdana" , sans-serif;">Back <a href="http://blog.philbirnbaum.com/2013/01/luck-vs-talent-in-nhl-standings.html">when I calculated</a> that it took 73 NHL games for skill to catch up with luck in the standings, I was surprised it was so high. That's almost a whole season. In MLB, it was less than half a season, and in the NBA, Tango found it was only 14 games, less than one-fifth of the full schedule.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Seventy-three games seemed like a lot of luck. Why so much? As it turns out, it was an anomaly -- the NHL was just having an era where differences in team talent were small. Now, it's <a href="http://blog.philbirnbaum.com/2018/12/2007-12-was-era-of-competitive-balance.html">back under 40 games</a>.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But I didn't know that at the time, so I had a different explanation: it </span><span style="font-family: "verdana" , sans-serif;">must be the extra point the NHL started giving out for overtime losses. The "loser point," I reasoned, was reducing the importance of team talent, by giving the worse teams more of a chance to catch up to the better teams.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">My line of thinking was something like this: </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<blockquote class="tr_bq">
<span style="color: #990000; font-family: "verdana" , sans-serif;">1. Loser points go disproportionately to worse teams. For team-seasons, there's a correlation of around .4 between negative goal differential (a proxy for team quality) and OTL. So, the loser point helps the worse teams gain ground on the better teams.</span><br />
<br />
<span style="color: #990000; font-family: "verdana" , sans-serif;">2. Adding loser points adds more randomness. When you lose by one goal, whether that goal comes early in the game, or after the third period, is largely a matter of random chance. That adds "when the goals were" luck to the "how many goals there were" luck, which should help mix up the standings more. In fact, as I write this, the Los Angeles Kings have two more wins and three fewer losses than the Chicago Blackhawks. But, because Chicago has five OTL to the Kings' one, they're actually tied in the standings.</span></blockquote>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But ... now I realize that argument is wrong. And, the conclusion is wrong. It turns out the loser point actually does NOT help competitive balance in the NHL. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, what's the flaw in my old argument? </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">I think the answer is: the loser point does affect how compressed the standings get in terms of actual points, but it doesn't have much effect on the *order* of teams. The bottom teams wind up still at the bottom, but (for instance) instead of having only half as many points as the top teams, they have two-thirds as many points.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's one way to see that. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Suppose there's no loser point, so the winner always gets two points and the loser always gets none (even if it was an overtime or shootout loss). </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, make a change so the losing team gets a point, but *every time*. In that case, the difference between any two teams gets cut in half, in terms of points -- but the order of teams stays exactly the same. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The old way, if you won W games, your point total was 2W. Now, it's W+82. Either way, the order of standings stays the same -- it's just that the differences between teams are cut in half, numerically.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It's still true that the "loser point" goes disproportionately to the worse teams -- the 50-32 team gets only 32 loser points, while the 32-50 team gets 50 of them. But that doesn't matter, because those points are never enough to catch up to any other team. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If you ran the luck vs. skill numbers for the new system compared to the old system, it would work out exactly the same.</span><br />
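<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's a minimal sketch of that thought experiment in Python. The five win totals are made up for illustration, not real standings; the only thing taken from the text is the 82-game schedule and the two point systems (2W versus W+82).</span><br />

```python
GAMES = 82

def points_win_only(wins):
    # Two points per win, nothing for any loss.
    return 2 * wins

def points_loser_always(wins):
    # Two points per win, plus one point for every loss.
    return 2 * wins + (GAMES - wins)

# Hypothetical win totals for five teams (not real standings).
wins = [50, 45, 41, 37, 32]

old = [points_win_only(w) for w in wins]
new = [points_loser_always(w) for w in wins]

def rank_order(points):
    # Team indices sorted from most points to fewest.
    return sorted(range(len(points)), key=lambda i: -points[i])

same_order = rank_order(old) == rank_order(new)

# Gaps between adjacent teams under each system.
old_gaps = [a - b for a, b in zip(old, old[1:])]
new_gaps = [a - b for a, b in zip(new, new[1:])]

print(old, old_gaps)
print(new, new_gaps)
print(same_order)
```

<span style="font-family: "verdana" , sans-serif;">Every gap in the second list is half the corresponding gap in the first, and the ranking is identical. Since SD(luck) and SD(talent) are both measured in points, both get cut in half too, so their ratio doesn't move.</span><br />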
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In real life, of course, the losing team doesn't get a point every time: only when it loses in overtime. Last season, that happened in about 11.6 percent of games, league-wide, or about 23.3 percent of losses.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If the loser point happened in *exactly* 23.3 percent of losses, for every team, with no variation, the situation would be the same as before -- the standings would get compressed, but the order wouldn't change. It would be as if, every loss, the loser got an extra 0.233 points. No team could pass any other team, since for every two points it was behind, it only gets 0.233 points to catch up. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But: what if you assume that it's completely random which losses become overtime losses? Now, the order can change. A 40-42 team can catch up to a 41-41 team if its losses randomly include two more overtime losses than its rival's. The chance of that happening is helped by the fact that the 40-42 team has one extra loss to try to randomly convert. It needs two random points to catch up, but it starts with an expected head start of 0.233 points.</span><br />
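<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To put a rough number on that, here's a sketch that treats each loss as an independent coin flip that becomes an OTL with probability 0.233. The independence and the identical probability for both teams are assumptions of this "completely random" scenario, not something observed in the data, and the function names are just for illustration.</span><br />

```python
from math import comb

P_OTL = 0.233  # share of losses that become overtime losses (from the text)

def binom_pmf(n, k, p):
    # Probability of exactly k successes in n independent trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_catch_up(losses_behind, losses_ahead, points_gap, p=P_OTL):
    # Chance the trailing team collects at least `points_gap` more
    # loser points than the team it is chasing.
    total = 0.0
    for x in range(losses_behind + 1):
        for y in range(losses_ahead + 1):
            if x - y >= points_gap:
                total += binom_pmf(losses_behind, x, p) * binom_pmf(losses_ahead, y, p)
    return total

# 40-42 team (80 points before loser points) chasing a 41-41 team (82 points).
p_tie_or_pass = prob_catch_up(42, 41, 2)
print(round(p_tie_or_pass, 3))
```

<span style="font-family: "verdana" , sans-serif;">The result is the probability that the 40-42 team at least ties the 41-41 team once loser points are added, under those assumptions.</span><br />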
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If losses became overtime losses in a random way, then, yes, the OTL would make luck more important, and my argument would be correct. But they don't. It turns out that better teams turn losses into OTL much more frequently than worse teams, on a loss-for-loss basis.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Which makes sense. Worse teams' losses are more likely to be blowouts, which means they're less likely to be close losses. That means fewer one-goal losses, proportionately. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In other words: </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">(a) bad teams have more losses, but </span><br />
<span style="font-family: "verdana" , sans-serif;">(b) those losses are less likely to result in an OTL. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Those two forces work in opposite directions. Which is stronger?</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Let's run the numbers from last year to find out.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">If we just gave two points for a win, and zero for a loss, we'd have: </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)=16.47</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(luck) = 9.06</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(talent) =13.76</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But in real life, which includes the OTL, the numbers are</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)=15.44</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(luck) = 8.48</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(talent) =12.90</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Converting so we can compare luck to talent:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>35.5 games until talent=luck (no OTL point)</b></span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;"><b>35.4 games until talent=luck (with OTL point)</b></span><br />
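<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For anyone who wants to check the arithmetic, here's a short sketch of the conversion. It assumes luck and talent are independent (so their variances add) and uses the rounded SDs shown above, so the last digit can differ slightly from the figures in the post.</span><br />

```python
from math import sqrt

GAMES = 82

def talent_sd(sd_observed, sd_luck):
    # var(observed) = var(talent) + var(luck), assuming independence.
    return sqrt(sd_observed**2 - sd_luck**2)

def games_until_equal(sd_observed, sd_luck, games=GAMES):
    # Number of games at which SD(talent) catches up to SD(luck).
    t = talent_sd(sd_observed, sd_luck)
    return games * (sd_luck / t) ** 2

no_otl = games_until_equal(16.47, 9.06)    # two points for a win, zero for a loss
with_otl = games_until_equal(15.44, 8.48)  # actual standings, including the OTL point

print(round(talent_sd(16.47, 9.06), 2), round(no_otl, 1))
print(round(talent_sd(15.44, 8.48), 2), round(with_otl, 1))
```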
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It turns out, the two factors almost exactly cancel out! Bad teams have more chances for an OTL point because they lose more -- but those losses are less likely to be OTL almost in exact proportion.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">And that's why I was wrong -- why the OTL point doesn't increase competitive balance, or make the standings less predictable. It just makes the NHL *look* more competitive, by making the point differences smaller.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0tag:blogger.com,1999:blog-31545676.post-5700564203581117292018-12-12T16:51:00.001-05:002018-12-12T16:53:03.843-05:002007-12 was an era of competitive balance in the NHL<span style="font-family: "verdana" , sans-serif;">Five years ago, I calculated that <a href="http://blog.philbirnbaum.com/2013/01/luck-vs-talent-in-nhl-standings.html">in the NHL</a>, it took 73 games until talent was as important as luck in determining the standings. But in a previous study, <a href="http://www.insidethebook.com/ee/index.php/site/comments/true_talent_levels_for_sports_leagues/">Tango found</a> that it took only 36 games. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Why the difference?</span><br />
<br />
<span style="font-family: "verdana" , sans-serif;">I think it's because the years for which I ran the study -- 2006-07 to 2011-12 -- were seasons in which the NHL was much more balanced than usual. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">For each of those six seasons, I went to <a href="https://www.hockey-reference.com/leagues/NHL_2008.html">hockey-reference</a> to find the SD of team standings points:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2006-07 16.14</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2007-08 10.43</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2008-09 13.82</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2009-10 12.95</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2010-11 13.27</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2011-12 11.73</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">average 13.18 (root mean square)</span><br />
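<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The average is a root mean square rather than a plain mean because it's the variances, not the SDs, that average sensibly across seasons. A quick sketch, using the six figures from the table:</span><br />

```python
from math import sqrt

sd_observed = {
    "2006-07": 16.14,
    "2007-08": 10.43,
    "2008-09": 13.82,
    "2009-10": 12.95,
    "2010-11": 13.27,
    "2011-12": 11.73,
}

def root_mean_square(values):
    # Average the squared SDs (variances), then take the square root.
    values = list(values)
    return sqrt(sum(v * v for v in values) / len(values))

rms_sd = root_mean_square(sd_observed.values())
plain_mean = sum(sd_observed.values()) / len(sd_observed)

print(round(rms_sd, 2), round(plain_mean, 2))
```

<span style="font-family: "verdana" , sans-serif;">The root mean square is always at least as large as the plain mean, so it comes out a little higher here.</span><br />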
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Tango's study was written in August, 2006. The previous season had a higher spread:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2005-06 16.52</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So, I think that's the answer. It just happened that the seasons I looked at had more competitive balance than the season or seasons Tango looked at.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But what's the right answer for today's NHL? Well, it looks like the standings spread in recent seasons has moved back closer to Tango's numbers:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2013-14 14.26</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2014-15 15.91</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2015-16 12.86</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2016-17 15.14</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2017-18 15.44</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">average 14.76 (root mean square)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">What does that mean for the "number of games" estimate? I'll do the calculation for last season, 2017-18.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">From the chart, SD(observed) is 15.44 points. SD(luck) is roughly the same for all years of the shootout era (although it varies very slightly with the number of overtime losses), so I'll use the old study's number of 8.44 points. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">As usual, </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">SD(talent)^2 = SD(observed)^2 - SD(luck)^2</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">SD(talent)^2 = 15.44^2 - 8.44^2</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">SD(talent) = 12.93</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">So last year, SD(talent) was 12.93. For the six seasons I looked at, it was 8.95. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(talent)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2006-12 8.95</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2017-18 12.93</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, let's convert to games.* </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif; font-size: x-small;">*Specifically, "luck as important as talent" means SD(luck)=SD(talent). Formula: using the numbers for a full season, divide SD(luck) by SD(talent), square it, and multiply by the number of games (82).</span><br />
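<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The formula comes from how the two SDs grow with games played: luck grows with the square root of the number of games, while talent differences grow in direct proportion. Here's a sketch that checks the formula against that scaling. The input SDs of 8, 16 and 10 are round numbers chosen for illustration, not figures from any season.</span><br />

```python
from math import sqrt, isclose

SEASON = 82

def luck_sd_after(n, sd_luck_season):
    # Luck accumulates like a random walk: SD grows with sqrt(games).
    return sd_luck_season * sqrt(n / SEASON)

def talent_sd_after(n, sd_talent_season):
    # Talent differences accumulate in proportion to games played.
    return sd_talent_season * (n / SEASON)

def games_until_equal(sd_luck_season, sd_talent_season):
    # Solve luck_sd_after(n) == talent_sd_after(n) for n.
    return SEASON * (sd_luck_season / sd_talent_season) ** 2

# Illustrative full-season SDs: talent twice as large as luck.
n = games_until_equal(8.0, 16.0)
print(n, luck_sd_after(n, 8.0), talent_sd_after(n, 16.0))

# If the two SDs are equal over a full season, the answer is the full season.
print(games_until_equal(10.0, 10.0))
```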
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">When SD(talent) is 8.95, like the seasons I looked at, it takes 73 games for luck and talent to even out. When SD(talent) = 12.93, like it was last year, it takes only ... 36 games.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Coincidentally, 36 games is exactly what Tango found in his own sample.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">talent=luck, after</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2006-12 73 games</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2017-18 36 games</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Two things we can conclude from this:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">1. Actual competitive balance (in terms of talent) does seem to change over time in non-random ways. The NHL from 2006-12 does actually seem to have been a more competitive league than from 2013-18. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">2. The "number of games" way of expressing the luck/talent balance is very sensitive to moderate changes in the observed range of the standings.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">--------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">To expand a bit on #2 ... </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">There must be significant random fluctuations in observed league balance. We mention that sometimes in passing, but I think we don't fully appreciate how big those random fluctuations can be.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here, again, is the SD(observed) for the seasons 2014-17:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2014-15 15.91</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2015-16 12.86</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2016-17 15.14</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">It seems unlikely that 2015-16 really had that much tighter a talent distribution than the surrounding seasons. What probably happened, in 2015-16, was just a fluke -- the lucky teams happened to be lower-talent, and the unlucky teams happened to be higher-talent. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In other words, the difference was probably mostly luck. </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A different kind of luck, though -- luck in how each individual team's "regular" luck correlated, league-wide, with their talent. When the better teams (in talent) are luckier than the worse teams, the standings spread goes up. When the worse teams are luckier, the standings get compressed.</span><br />
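<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">A small simulation shows how much SD(observed) can bounce around even when talent doesn't change at all. This is only a sketch: the 30 teams, the normal distributions, the talent SD of 12 points, and the seed are all assumptions chosen for illustration. Only the 8.44 luck SD comes from the text.</span><br />

```python
import random
from statistics import pstdev

def simulate_observed_sds(n_seasons, n_teams=30, sd_talent=12.0, sd_luck=8.44, seed=1):
    rng = random.Random(seed)
    # Talent is drawn once and held fixed across every simulated season.
    talent = [rng.gauss(0.0, sd_talent) for _ in range(n_teams)]
    results = []
    for _ in range(n_seasons):
        # Fresh, independent luck for each team in each season.
        observed = [t + rng.gauss(0.0, sd_luck) for t in talent]
        results.append(pstdev(observed))
    return results

sds = simulate_observed_sds(10)
print([round(s, 2) for s in sds])
print(round(min(sds), 2), round(max(sds), 2))
```

<span style="font-family: "verdana" , sans-serif;">Any spread between the smallest and largest values comes entirely from how luck happened to line up with talent, since the talent levels are identical in every simulated season.</span><br />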
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Anyway ... the drop in the chart from 15.91 to 12.86 doesn't seem that big. But it winds up looking bigger once you subtract out luck to get to talent:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(talent)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2014-15 13.49</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2015-16 9.70</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2016-17 12.57</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">The difference is more pronounced now. But, check out what happens when we convert to how many games it takes for luck and talent to even out:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">Talent=luck, after</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2014-15 32 games</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2015-16 62 games</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2016-17 37 games</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Now, the differences are too large to ignore. From 2014-15 to 2015-16, SD(observed) went down only 19 percent, but the "number of games" figure nearly doubled.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">And that's what I mean by #2 -- the "number of games" estimate is very sensitive to what seem like mild changes in standings variation. </span><br />
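<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Here's a sketch of why the estimate is so touchy, using the three SD(observed) figures above and the 8.44 luck SD. The function name is just for illustration. After rounding, it reproduces the 32, 62, and 37 games in the table.</span><br />

```python
GAMES = 82
SD_LUCK = 8.44

sd_observed = {"2014-15": 15.91, "2015-16": 12.86, "2016-17": 15.14}

def games_until_equal(sd_obs, sd_luck=SD_LUCK, games=GAMES):
    # Talent variance is what's left after subtracting luck variance.
    talent_var = sd_obs**2 - sd_luck**2
    return games * sd_luck**2 / talent_var

games = {season: games_until_equal(sd) for season, sd in sd_observed.items()}

# Relative change from 2014-15 to 2015-16 in each measure.
drop_in_sd = 1 - sd_observed["2015-16"] / sd_observed["2014-15"]
rise_in_games = games["2015-16"] / games["2014-15"]

for season, g in games.items():
    print(season, round(g))
print(round(drop_in_sd * 100), round(rise_in_games, 2))
```

<span style="font-family: "verdana" , sans-serif;">The denominator is the difference between two variances. As SD(observed) gets closer to SD(luck), that difference shrinks toward zero, so a modest drop in the standings spread produces a much larger jump in the games estimate.</span><br />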
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">-------</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">Just for fun, let's compare 2006-07, one of the most unbalanced seasons, to 2007-08, one of the most balanced. Just looking at the standings, there's already a big difference:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">SD(observed)</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">--------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2006-07 16.14</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2007-08 10.43</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">But it becomes *huge* when you express it in games: </span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">Talent=luck, after</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">------------------</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2006-07 31 games</span><br />
<span style="color: #990000; font-family: "courier new" , "courier" , monospace;">2007-08 156 games</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">In one year, our best estimate of how many games it takes for talent to exceed luck changed by a factor of *five*. And, I think, almost all that difference is itself just random luck.</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.com0