armchairgm
all sports, all you

Applying Sabermetrics To Tennis

by Tannenj
created July 06, 2008, last edited September 29, 2008

Cross-posted from my blog:

Sabermetrics (noun): the analysis of baseball through objective evidence, especially baseball statistics.

Sabermetrics, a collection of formulas and math-rooted ideas that allows interested parties to obtain an improved understanding of baseball, is a topic about which I've blogged sporadically. The Hardball Times, Fire Joe Morgan, Baseball Prospectus, and Baseball Think Factory, among a bunch of other web sites, are excellent sources for sabermetric perspective.

In an earlier blog entry, I discussed predictive baseball statistics. The CliffsNotes version: a team's adjusted Pythagorean record has better predictive value than its actual record, a hitter's predicted OPS (PrOPS) has better predictive value than his actual OPS, and a pitcher's FIP has better predictive value than his ERA. These statements hold because Pythagorean record, PrOPS, and FIP are less influenced by luck than won-loss record, OPS, and ERA. The idea is to isolate that of which players are in control and to remove that which is more or less random, because doing so yields information that's an improvement over traditional statistics with regard to predicting future performance.

For whatever reason, such thinking has yet to become popular in non-baseball circles. Sure, basketball and football each have their own version of sabermetrics, but neither is as advanced or as accepted as the baseball version (to my knowledge, work in the sabermetric vein has yet to be performed in hockey, soccer, etc.). This is probably the case because baseball is an inherently better match with math and statistics than other sports are, but who knows.

Anyway, I caught myself wondering, the other day, about sabermetrics and tennis. A Google search of "sabermetrics tennis" does yield a blog as its third result, but while semi-interesting, it's fluffy and doesn't offer much math.

I decided to do a little thinking on my own. The goal? To obtain an improved understanding of tennis (sound familiar?). Specifically, my aim was to manipulate tennis statistics in a way that caused them to have improved predictive value.

I mentioned Pythagorean record above. I asked myself, "Would it make sense to attempt to apply this concept to tennis?" At this point, I'm not completely sure of the answer to this question.

That said, I created a procedure to derive something resembling the tennis equivalent. Here are the steps I followed:

1. Find a player's profile at atptennis.com.

2. Open a new tab/window for the player's "YTD [year to date] match facts" link in the upper right portion of the screen.

3. Multiply the player's "Service Games Won %" number by his "Service Games Played" number.

4. Multiply the player's "Return Games Won %" number by his "Return Games Played" number.

5. Round the results of step three and step four to the nearest integers and sum these numbers.

6. Divide the result of step five by the sum of the player's "Service Games Played" number and his "Return Games Played" number.

7. Multiply the result of step six by the player's total number of matches (the sum of the player's wins [W] and the player's losses [L], as seen in his profile page).

8. Round the result of step seven to the nearest tenth in order to obtain a number to which I'll refer as Adjusted wins (Wa).

9. Multiply Wa by a multiplier (yikes, awkward use of language) of 1.309 (see below) in order to obtain a total for Pythagorean wins (Wp).

10. Subtract Wp from the player's total number of matches in order to obtain a total for Pythagorean losses (Lp; Pythagorean Record = Wp-Lp).

Example (Roger Federer):

1. Above

2. Above

3. (.88)(546) = 480.48

4. (.29)(533) = 154.57

5. 480 + 155 = 635

6. 635/(546 + 533) = .5885078777

7. (.5885078777)(37 + 8) = 26.48285449

8. 26.5 = Wa

9. (26.5)(1.309) = 34.7

10. (37 + 8) - 34.7 = 10.3; Pythagorean record = 34.7-10.3
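The ten steps above can be sketched in a few lines of Python. This is only a rough illustration, not an official tool: the function name and parameter names are made up for this example, and the four "match facts" numbers are assumed to be typed in by hand from the player's page. Run on the Federer figures, it reproduces the example (Wa = 26.5, Pythagorean record = 34.7-10.3).

```python
def pythagorean_record(serve_won_pct, serve_games, return_won_pct, return_games,
                       wins, losses, multiplier=1.309):
    """Return (Wa, Wp, Lp) following steps 3-10 of the procedure.

    All names here are hypothetical; 1.309 is the top 10 average
    multiplier derived in the article and would change with new data.
    """
    # Steps 3-5: estimate games won on serve and on return, round each
    # to the nearest integer, and sum them. (Python's round() sends exact
    # .5 ties to the nearest even integer, which may differ from rounding
    # by hand in rare cases.)
    games_won = round(serve_won_pct * serve_games) + round(return_won_pct * return_games)

    # Step 6: share of all games played that the player won.
    game_share = games_won / (serve_games + return_games)

    # Steps 7-8: scale by matches played and round to the nearest tenth (Wa).
    matches = wins + losses
    adjusted_wins = round(game_share * matches, 1)

    # Step 9: apply the multiplier to get Pythagorean wins (Wp).
    pyth_wins = round(adjusted_wins * multiplier, 1)

    # Step 10: Pythagorean losses (Lp) are the remaining matches.
    pyth_losses = round(matches - pyth_wins, 1)
    return adjusted_wins, pyth_wins, pyth_losses


# Federer's numbers from the example above.
wa, wp, lp = pythagorean_record(0.88, 546, 0.29, 533, wins=37, losses=8)
print(f"Wa = {wa}, Pythagorean record = {wp}-{lp}")
# prints: Wa = 26.5, Pythagorean record = 34.7-10.3
```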

Here are the results for the rest of the players currently ranked in the top 10 in the world as of Monday, June 23, 2008:


Observant viewers of the chart will understand my methodology. I divided each player's W-L percentage (1) by his Wa-La percentage (2) in order to obtain a value I called [3] (column eight). I added up each player's value for [3] and divided by 10 in order to derive a mean for [3], which turned out to be 1.309. All that this means is that so far in 2008, the average top 10 player has a W-L percentage of 1.309 times his Wa-La percentage.

Taking that into account, in theory, top 10 players who have a [3] of greater than 1.309 have been "lucky" (Federer, Nadal, Djokovic, Davydenko, Ferrer, Roddick, and Nalbandian fall into this category). Meanwhile, top 10 players who have a [3] of less than 1.309 have been "unlucky" (Blake, Wawrinka, and Gasquet have been "unlucky" so far). NOTE THAT MY METHODOLOGY COMPLETELY IGNORES THE BELIEF THAT CERTAIN PLAYERS ARE MORE "CLUTCH" THAN OTHER PLAYERS!

In column nine (the rightmost column), I took each player's result from column eight and subtracted the column eight average (1.309, the multiplier I used to derive Pythagorean record). The result was each player's "Luck Score," the difference between his "luck" and the "luck" of the average top 10 player thus far in 2008.
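For anyone who wants to redo columns eight and nine with fresh numbers, here is a minimal sketch of the arithmetic. The function names are invented for this illustration. Because W and Wa are both divided by the same number of matches, the ratio [3] reduces to W divided by Wa. Using the Federer figures from the worked example (37-8, Wa = 26.5), it gives a [3] of roughly 1.396 and a "Luck Score" of roughly +.087.

```python
def luck_ratio(wins, losses, adjusted_wins):
    """Column eight ([3]): W-L percentage divided by Wa-La percentage."""
    matches = wins + losses
    win_pct = wins / matches
    adjusted_pct = adjusted_wins / matches
    return win_pct / adjusted_pct


def mean_ratio(ratios):
    """The multiplier: the plain average of [3] over the players analyzed."""
    return sum(ratios) / len(ratios)


def luck_score(ratio, average=1.309):
    """Column nine: a player's [3] minus the group average of [3]."""
    return ratio - average


# Federer, using the worked example's numbers.
federer_ratio = luck_ratio(37, 8, 26.5)
print(round(federer_ratio, 3), round(luck_score(federer_ratio), 3))
# prints: 1.396 0.087
```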

Comments:

  • At the risk of stating the obvious, MY METHODOLOGY IS EXTREMELY ROUGH. My hope is not for this stuff to be put into practice, but for it to inspire a portion of those who understand it to put their brains in motion (and to attempt to improve it).
  • Richard Gasquet's [3] value is a nasty outlier and is skewing the results of a minuscule sample size of numbers. The absolute value of Gasquet's "Luck Score" (.346) is not only the biggest of the scores in question, it's almost three times as big as the next-biggest (Nadal's, .116). There's perhaps merit for redoing the analysis without Gasquet, who has the reputation of being mentally weak and of having a tendency to "choke" away matches (the removal of Gasquet would surely knock Novak Djokovic [+.021] and David Ferrer [+.020] into negative "Luck Score" territory). Additionally, there is surely merit to performing this analysis again A) at the end of the year, when the numbers I used to derive Wa-La are larger and B) for the top 100 players rather than the top 10. Note that all three of these ideas would change the multiplier of 1.309.
  • The order from "luckiest" to least "lucky" was Nadal (ranking: 2), Roddick (6), Nalbandian (7), Federer (1), Davydenko (4), Djokovic (3), Ferrer (5), Blake (8), Wawrinka (9), and Gasquet (10). Thus, while a quick glance at which numbers are green and which are red might indicate that there's a direct relationship between ranking points and "luck," "luck" actually seems to be somewhat "random" within this small sample. Two of the top three "luckiest" players (Roddick and Nalbandian) are out of the top five in the rankings. Furthermore, as I stated above, if not for Gasquet's ridiculously low "Luck Score," both Djokovic (3) and Ferrer (5) would have luck scores in the red.
  • The content of the above bullet noted, there's no doubt that to some extent, my methodology is an oversimplification. In measuring "luck," I have made the assumption that every player in the top 10 has the same chance to win a close match as every other player in the top 10; I strongly doubt that this is the case (the best players have the most reliable money shots). I should note that baseball's adjusted Pythagorean record is also an oversimplification. I'm comfortable, however, making the statement that my tennis methodology is a more flagrant example of it.
  • Certain players (like Andy Roddick and Pete Sampras) have mediocre break games but tend to win lots of 7-5 sets and lots of tiebreaks. Players like these will generally have Pythagorean records that are much worse than their actual W-L records and as a result will have inflated "Luck Scores" (this reminds me of knuckleball pitchers appearing "lucky" because they tend to allow very low batting averages on balls in play [see: the blog entry in which I discussed predictive baseball statistics, which is linked at the beginning of the article]). Given this, it would be reasonable to consider adjusting Pythagorean record further through the use of players' career tiebreak records. Of course, tennis statistics aren't even close to as widely maintained as baseball statistics, and I don't know where to find this information (the only page of tennis statistics of which I'm aware is this one).
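The claim that dropping Gasquet would push Djokovic and Ferrer into negative "Luck Score" territory can be checked from the numbers quoted in the comments alone. The sketch below is a rough check under stated assumptions: it uses only the three-decimal figures given in the text (mean [3] of 1.309 over 10 players, Gasquet at -.346, Djokovic at +.021, Ferrer at +.020), so the last digit may be off by rounding, and the variable and function names are invented for this illustration.

```python
MEAN_RATIO = 1.309   # average [3] over the top 10, from the article
N_PLAYERS = 10

# "Luck Scores" quoted in the comments (Gasquet's is negative: "unlucky").
luck_scores = {"Gasquet": -0.346, "Djokovic": 0.021, "Ferrer": 0.020}


def scores_without(excluded):
    """Recompute the mean [3] and the Luck Scores with one player removed."""
    excluded_ratio = MEAN_RATIO + luck_scores[excluded]
    # Total of all ten [3] values, minus the excluded player's, over nine players.
    new_mean = (MEAN_RATIO * N_PLAYERS - excluded_ratio) / (N_PLAYERS - 1)
    new_scores = {
        name: (MEAN_RATIO + score) - new_mean
        for name, score in luck_scores.items()
        if name != excluded
    }
    return new_mean, new_scores


new_mean, new_scores = scores_without("Gasquet")
print(round(new_mean, 3))                       # prints: 1.347
for name, score in new_scores.items():
    print(name, round(score, 3))
# prints: Djokovic -0.017
# prints: Ferrer -0.018
```

So without Gasquet, the multiplier would rise from 1.309 to about 1.347, and both Djokovic and Ferrer would indeed land slightly in the red.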


LASportsblogAAA-er
93 days ago
Score 2+-
That was a buttload of work, but well done and really well thought out. I'm glad you cross published and hope you contribute further here in the future.
Categories: Opinions | Opinions by User Tannenj | July 6, 2008 | July 2008 | Sabermetrics Opinions | Baseball Opinions | Tennis Opinions

Retrieved from "http://www.armchairgm.com/Article:Applying_Sabermetrics_To_Tennis"

This page was last modified 07:51, 6 July 2008. Content is available under the GFDL.
