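Exploring NBA Data

First, install the Goldsberry package. The project source is hosted at:

https://github.com/bradleyfay/py-Goldsberry

Install it with pip:

    pip install py-goldsberry

The package's interface is compatible with pandas, so its results can be passed straight to a pandas DataFrame.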
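To make the pandas compatibility concrete: table-like web responses are often delivered as a list of column names plus a list of rows. The sketch below assumes that shape (the `headers` / `rowSet` keys and the `to_records` helper are illustrative assumptions, not taken from the package source) and turns it into a list of dictionaries, which is a form that `pd.DataFrame` accepts directly. The sample values are two players that appear in this notebook.

```python
# Hypothetical payload in a "headers" + "rowSet" shape (an assumption about
# the upstream data, not something read from goldsberry's source code).
payload = {
    "headers": ["PERSON_ID", "DISPLAY_LAST_COMMA_FIRST", "TEAM_ABBREVIATION"],
    "rowSet": [
        [977, "Bryant, Kobe", "LAL"],
        [201939, "Curry, Stephen", "GSW"],
    ],
}


def to_records(table):
    """Pair every row with the column names, giving one dict per row."""
    return [dict(zip(table["headers"], row)) for row in table["rowSet"]]


records = to_records(payload)
print(records[0]["DISPLAY_LAST_COMMA_FIRST"])  # Bryant, Kobe
# With pandas installed, pd.DataFrame(records) would build the table.
```

Each dictionary becomes one row and each key becomes one column, which is why the later cells can wrap the package's return values in `pd.DataFrame(...)` without further conversion.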

In [1]:

    import goldsberry as gb
    import pandas as pd

The version used here is:

In [2]:

    gb.__version__

Out[2]:

    '0.8.0.1'

Player information

Get the list of players for the 2015-2016 season:

In [3]:

    players = gb.PlayerList().players()
    players = pd.DataFrame(players)

    players.head()

Out[3]:

   DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE     ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY      TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
0  Acy, Quincy               2012       Y                  203112     quincy_acy     1             SAC                Sacramento     kings      1610612758  Kings      2015
1  Adams, Jordan             2014       Y                  203919     jordan_adams   1             MEM                Memphis        grizzlies  1610612763  Grizzlies  2015
2  Adams, Steven             2013       Y                  203500     steven_adams   1             OKC                Oklahoma City  thunder    1610612760  Thunder    2015
3  Afflalo, Arron            2007       Y                  201167     arron_afflalo  1             NYK                New York       knicks     1610612752  Knicks     2015
4  Ajinca, Alexis            2008       Y                  201582     alexis_ajinca  1             NOP                New Orleans    pelicans   1610612740  Pelicans   2015

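The total number of players is:

In [4]:

    print(len(players))

    464

By filtering on a specific TEAM_ABBREVIATION, we can list a team's players for this season, for example the Golden State Warriors (GSW), champions of the 2014-2015 season:

In [5]:

    # .loc with a boolean mask keeps only the matching rows
    gsw_players = players.loc[players["TEAM_ABBREVIATION"] == "GSW"]

    gsw_players[["DISPLAY_LAST_COMMA_FIRST", "FROM_YEAR", "TEAM_ABBREVIATION", "TEAM_CITY", "TEAM_NAME", "PERSON_ID"]]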

Out[5]:

     DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  TEAM_ABBREVIATION  TEAM_CITY     TEAM_NAME  PERSON_ID
30   Barbosa, Leandro          2003       GSW                Golden State  Warriors   2571
33   Barnes, Harrison          2012       GSW                Golden State  Warriors   203084
52   Bogut, Andrew             2005       GSW                Golden State  Warriors   101106
86   Clark, Ian                2013       GSW                Golden State  Warriors   203546
103  Curry, Stephen            2009       GSW                Golden State  Warriors   201939
135  Ezeli, Festus             2012       GSW                Golden State  Warriors   203105
164  Green, Draymond           2012       GSW                Golden State  Warriors   203110
209  Iguodala, Andre           2004       GSW                Golden State  Warriors   2738
262  Livingston, Shaun         2004       GSW                Golden State  Warriors   2733
263  Looney, Kevon             2015       GSW                Golden State  Warriors   1626172
279  McAdoo, James Michael     2014       GSW                Golden State  Warriors   203949
377  Rush, Brandon             2008       GSW                Golden State  Warriors   201575
398  Speights, Marreese        2008       GSW                Golden State  Warriors   201578
414  Thompson, Jason           2008       GSW                Golden State  Warriors   201574
415  Thompson, Klay            2011       GSW                Golden State  Warriors   202691
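The selection above keeps the rows whose TEAM_ABBREVIATION equals "GSW". For readers less familiar with pandas boolean masks, here is a rough plain-Python equivalent on a few rows copied from the tables above; `players_on_team` is a hypothetical helper name used only for illustration.

```python
# A few rows copied from the player tables above.
sample_players = [
    {"DISPLAY_LAST_COMMA_FIRST": "Acy, Quincy", "TEAM_ABBREVIATION": "SAC", "PERSON_ID": 203112},
    {"DISPLAY_LAST_COMMA_FIRST": "Curry, Stephen", "TEAM_ABBREVIATION": "GSW", "PERSON_ID": 201939},
    {"DISPLAY_LAST_COMMA_FIRST": "Thompson, Klay", "TEAM_ABBREVIATION": "GSW", "PERSON_ID": 202691},
]


def players_on_team(rows, abbreviation):
    """Keep only the rows whose team abbreviation matches exactly."""
    return [row for row in rows if row["TEAM_ABBREVIATION"] == abbreviation]


gsw_sample = players_on_team(sample_players, "GSW")
for row in gsw_sample:
    print(row["DISPLAY_LAST_COMMA_FIRST"], row["PERSON_ID"])
```

As with the DataFrame version, the original order of the rows is preserved, and an abbreviation with no match simply yields an empty result.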

Player game data

Using DISPLAY_LAST_COMMA_FIRST, we look up Kobe Bryant (listed as "Bryant, Kobe"), who has announced that he will retire after this season:

In [6]:

    kobe = players.loc[players["DISPLAY_LAST_COMMA_FIRST"].str.contains("Kobe")]

    kobe

Out[6]:

    DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE   ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY    TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
64  Bryant, Kobe              1996       Y                  977        kobe_bryant  1             LAL                Los Angeles  lakers     1610612747  Lakers     2015
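Two details of this lookup are easy to miss: names are stored as "Last, First", and `str.contains` is case-sensitive by default, so searching for "kobe" would find nothing. The sketch below shows a case-insensitive substring search and a conversion to "First Last" order in plain Python; `find_players` and `display_name` are hypothetical helpers, not part of goldsberry or pandas.

```python
def display_name(last_comma_first):
    """Turn "Bryant, Kobe" into "Kobe Bryant"; names without a comma pass through."""
    last, sep, first = last_comma_first.partition(", ")
    return "{} {}".format(first, last) if sep else last_comma_first


def find_players(names, fragment):
    """Case-insensitive substring search over "Last, First" names."""
    fragment = fragment.lower()
    return [name for name in names if fragment in name.lower()]


names = ["Bryant, Kobe", "Curry, Stephen", "McAdoo, James Michael"]
matches = find_players(names, "kobe")
print([display_name(name) for name in matches])  # ['Kobe Bryant']
```

With pandas, passing `case=False` to `str.contains` gives the same case-insensitive behaviour.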

For convenience, store Kobe's ID in a variable:

In [7]:

    kobe_id = 977

Now look at Kobe's game log for this season:

In [8]:

    kobe_logs = gb.player.game_logs(kobe_id)

    kobe_logs = pd.DataFrame(kobe_logs.logs())

    # The five most recent games
    kobe_logs.head()

Out[8]:

   AST  BLK  DREB  FG3A  FG3M  FG3_PCT  FGA  FGM  FG_PCT  FTA  PF  PLUS_MINUS  PTS  Player_ID  REB  SEASON_ID  STL  TOV  VIDEO_AVAILABLE  WL
0  3    0    6     7     3     0.429    16   5    0.313   4    2   -19         17   977        6    22015      1    3    1                L
1  0    0    4     14    4     0.286    25   6    0.240   4    0   -6          19   977        5    22015      0    0    1                L
2  4    1    1     14    4     0.286    28   9    0.321   3    4   -2          25   977        2    22015      0    2    1                L
3  2    0    9     11    4     0.364    24   10   0.417   4    0   16          27   977        12   22015      2    1    1                W
4  5    0    3     11    7     0.636    21   10   0.476   12   3   6           38   977        5    22015      2    2    1                W

5 rows × 27 columns

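Up to the All-Star break, Kobe has played 45 games this season (the Game_ID listing below runs from index 0 to 44). We list the game IDs first and then compute his per-game averages:

In [9]:

    kobe_logs.Game_ID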

Out[9]:

    0     0021500795
    1     0021500776
    2     0021500767
    3     0021500747
    4     0021500734
    5     0021500720
    6     0021500697
    7     0021500662
    8     0021500653
    9     0021500638
    10    0021500614
    11    0021500608
    12    0021500592
    13    0021500576
    14    0021500549
    15    0021500539
    16    0021500476
    17    0021500458
    18    0021500455
    19    0021500440
    20    0021500435
    21    0021500422
    22    0021500385
    23    0021500370
    24    0021500349
    25    0021500342
    26    0021500325
    27    0021500308
    28    0021500301
    29    0021500286
    30    0021500269
    31    0021500263
    32    0021500253
    33    0021500244
    34    0021500214
    35    0021500201
    36    0021500188
    37    0021500151
    38    0021500135
    39    0021500095
    40    0021500077
    41    0021500059
    42    0021500045
    43    0021500031
    44    0021500017
    Name: Game_ID, dtype: object
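The IDs are stored as strings (dtype object), which preserves their leading zeros; converting them to integers would lose that. They also appear to follow a fixed layout. The sketch below assumes a convention that this notebook does not state and that should be treated as unverified: the first three characters are a game-type prefix, the next two the season's starting year, and the last five a running game number. `split_game_id` is a hypothetical helper, and the century pivot inside it is a further assumption.

```python
def split_game_id(game_id):
    """Split a ten-digit Game_ID string into (type_prefix, season_start_year, game_number)."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError("expected a ten-digit string, got %r" % (game_id,))
    type_prefix = game_id[:3]
    # Two-digit year: assume 19xx for 46 and above, 20xx otherwise
    # (an illustrative pivot, not a documented rule).
    yy = int(game_id[3:5])
    season_start = 1900 + yy if yy >= 46 else 2000 + yy
    game_number = int(game_id[5:])
    return type_prefix, season_start, game_number


print(split_game_id("0021500795"))  # ('002', 2015, 795)
```

Under this reading, every ID in the listing above shares the prefix 002 and the season year 15, which is at least consistent with the SEASON_ID value 22015 shown in the log table.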

In [10]:

    def show_avg_info(avg):
        # avg is a Series of per-game means indexed by stat name
        print("Points: {:.1f}".format(avg.loc["PTS"]))
        print("Rebounds: {:.1f}".format(avg.loc["REB"]))
        print("Assists: {:.1f}".format(avg.loc["AST"]))
        print("Blocks: {:.1f}".format(avg.loc["BLK"]))
        print("Minutes: {:.1f}".format(avg.loc["MIN"]))
        print("Steals: {:.1f}".format(avg.loc["STL"]))
        print("Turnovers: {:.1f}".format(avg.loc["TOV"]))
        print("Fouls: {:.1f}".format(avg.loc["PF"]))
        # Percentages: averaged makes divided by averaged attempts
        print("Field goals: {:.1f}%".format(avg.loc["FGM"] * 100 / avg.loc["FGA"]))
        print("Three-pointers: {:.1f}%".format(avg.loc["FG3M"] * 100 / avg.loc["FG3A"]))
        print("Free throws: {:.1f}%".format(avg.loc["FTM"] * 100 / avg.loc["FTA"]))
        print("Defensive rebounds: {:.1f}".format(avg.loc["DREB"]))
        print("Offensive rebounds: {:.1f}".format(avg.loc["OREB"]))
        print("Plus/minus: {:.1f}".format(avg.loc["PLUS_MINUS"]))

    # numeric_only=True skips text columns such as Game_ID
    show_avg_info(kobe_logs.mean(numeric_only=True))

    Points: 16.9
    Rebounds: 4.2
    Assists: 3.4
    Blocks: 0.2
    Minutes: 29.3
    Steals: 1.0
    Turnovers: 2.2
    Fouls: 1.9
    Field goals: 34.9%
    Three-pointers: 28.0%
    Free throws: 80.3%
    Defensive rebounds: 3.5
    Offensive rebounds: 0.7
    Plus/minus: -7.9
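Note that show_avg_info derives the shooting percentages from the averaged makes and attempts rather than by averaging the per-game FG_PCT column. Dividing mean makes by mean attempts equals dividing total makes by total attempts, so games with many shots weigh more. The sketch below checks the difference on the five games shown in the log table above (FGM and FGA columns); the variable names are illustrative.

```python
# (FGM, FGA) for the five most recent games in the log table above.
games = [(5, 16), (6, 25), (9, 28), (10, 24), (10, 21)]

made = [m for m, _ in games]
attempted = [a for _, a in games]

# What show_avg_info computes: mean(FGM) / mean(FGA) == sum(FGM) / sum(FGA).
pooled_pct = 100 * (sum(made) / len(games)) / (sum(attempted) / len(games))

# The alternative: average the per-game percentages, each game weighted equally.
per_game_pct = 100 * sum(m / a for m, a in games) / len(games)

print("pooled:   {:.1f}%".format(pooled_pct))    # pooled:   35.1%
print("per game: {:.1f}%".format(per_game_pct))  # per game: 35.3%
```

The pooled figure is the one conventionally reported as a season shooting percentage, which is presumably why the function is written this way.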

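Now look at Stephen Curry's per-game averages (don't ask why I watch him play on my knees):

In [11]:

    curry_id = 201939
    curry_logs = gb.player.game_logs(curry_id)
    curry_logs = pd.DataFrame(curry_logs.logs())

    # numeric_only=True skips text columns such as Game_ID
    show_avg_info(curry_logs.mean(numeric_only=True))

    Points: 29.8
    Rebounds: 5.3
    Assists: 6.6
    Blocks: 0.2
    Minutes: 33.9
    Steals: 2.1
    Turnovers: 3.3
    Fouls: 2.0
    Field goals: 50.8%
    Three-pointers: 45.4%
    Free throws: 91.2%
    Defensive rebounds: 4.5
    Offensive rebounds: 0.9
    Plus/minus: 15.5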
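To read the two sets of averages side by side, a small comparison can help. The sketch below copies a handful of the printed values for Kobe and Curry into dictionaries and reports the difference per stat; `compare` is a hypothetical helper, and the choice of stats is arbitrary.

```python
# Values copied from the two printed summaries above.
kobe_avg = {"PTS": 16.9, "REB": 4.2, "AST": 3.4, "STL": 1.0, "TOV": 2.2}
curry_avg = {"PTS": 29.8, "REB": 5.3, "AST": 6.6, "STL": 2.1, "TOV": 3.3}


def compare(left, right):
    """Return {stat: right - left}, rounded to one decimal, for shared stats."""
    return {stat: round(right[stat] - left[stat], 1) for stat in left if stat in right}


diff = compare(kobe_avg, curry_avg)
for stat, delta in diff.items():
    # Columns: stat, Kobe, Curry, signed difference (Curry minus Kobe)
    print("{:<4}{:>6.1f}{:>6.1f}{:>+6.1f}".format(stat, kobe_avg[stat], curry_avg[stat], delta))
```

Because the inputs are already rounded to one decimal, the differences are only accurate to about that precision.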

We can of course also compare their career numbers:

In [12]:

    kobe_career = gb.player.career_stats(kobe_id)
    curry_career = gb.player.career_stats(curry_id)

Career highs:

In [13]:

    def show_career_high(career):
        # Rows 0, 1 and 5 hold the points, rebounds and assists highs here
        career_high = pd.DataFrame(career.career_high()).loc[[0, 1, 5]]
        print(career_high[["GAME_DATE", "STAT", "STAT_VALUE", "VS_TEAM_CITY", "VS_TEAM_NAME"]])

    print("Kobe")
    show_career_high(kobe_career)

    print("Curry")
    show_career_high(curry_career)

    Kobe
       GAME_DATE    STAT  STAT_VALUE  VS_TEAM_CITY  VS_TEAM_NAME
    0  JAN 22 2006  PTS   81          Toronto       Raptors
    1  JAN 24 2010  REB   16          Toronto       Raptors
    5  JAN 15 2015  AST   17          Cleveland     Cavaliers
    Curry
       GAME_DATE    STAT  STAT_VALUE  VS_TEAM_CITY  VS_TEAM_NAME
    0  FEB 27 2013  PTS   54          New York      Knicks
    1  DEC 28 2015  REB   14          Sacramento    Kings
    5  DEC 27 2013  AST   16          Phoenix       Suns

Season highs:

In [14]:

    def show_season_high(career):
        # Same positional selection as for the career highs
        season_high = pd.DataFrame(career.season_high()).loc[[0, 1, 5]]
        print(season_high[["GAME_DATE", "STAT", "STAT_VALUE", "VS_TEAM_CITY", "VS_TEAM_NAME"]])

    print("Kobe")
    show_season_high(kobe_career)

    print("Curry")
    show_season_high(curry_career)

    Kobe
       GAME_DATE    STAT  STAT_VALUE  VS_TEAM_CITY  VS_TEAM_NAME
    0  FEB 02 2016  PTS   38          Minnesota     Timberwolves
    1  FEB 04 2016  REB   12          New Orleans   Pelicans
    5  NOV 15 2015  AST   9           Detroit       Pistons
    Curry
       GAME_DATE    STAT  STAT_VALUE  VS_TEAM_CITY  VS_TEAM_NAME
    0  OCT 31 2015  PTS   53          New Orleans   Pelicans
    1  DEC 28 2015  REB   14          Sacramento    Kings
    5  JAN 25 2016  STL   5           San Antonio   Spurs
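One caveat about selecting rows 0, 1 and 5 by position: in Curry's season-high output above, row 5 holds STL rather than AST, so the position of a given stat is evidently not fixed. Filtering on the STAT column itself is more robust. The sketch below does this in plain Python on the three Curry rows printed above; `pick_stats` is a hypothetical helper, and with pandas a mask such as `df[df["STAT"].isin(["PTS", "REB", "AST"])]` would play the same role.

```python
# The three Curry season-high rows printed above.
season_high_rows = [
    {"GAME_DATE": "OCT 31 2015", "STAT": "PTS", "STAT_VALUE": 53},
    {"GAME_DATE": "DEC 28 2015", "STAT": "REB", "STAT_VALUE": 14},
    {"GAME_DATE": "JAN 25 2016", "STAT": "STL", "STAT_VALUE": 5},
]


def pick_stats(rows, wanted=("PTS", "REB", "AST")):
    """Keep rows by stat name instead of by position."""
    return [row for row in rows if row["STAT"] in wanted]


picked = pick_stats(season_high_rows)
print([row["STAT"] for row in picked])  # ['PTS', 'REB']
```

The steals row is dropped because it was never requested, and the absence of an AST row in this three-row sample becomes visible instead of being silently replaced by another stat.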

Game information

In [15]:

    game_ids = gb.GameIDs()
    game_ids = pd.DataFrame(game_ids.game_list())

    game_ids.head()

Out[15]:

   AST  BLK  DREB  FG3A  FG3M  FG3_PCT  FGA  FGM  FG_PCT  FTA  PTS  REB  SEASON_ID  STL  TEAM_ABBREVIATION  TEAM_ID     TEAM_NAME              TOV  VIDEO_AVAILABLE  WL
0  28   4    45    29    8     0.276    124  56   0.452   46   147  64   22015      7    DET                1610612765  Detroit Pistons        11   1                W
1  30   2    36    23    9     0.391    87   53   0.609   34   142  46   22015      9    SAC                1610612758  Sacramento Kings       15   1                W
2  34   2    30    21    9     0.429    86   52   0.605   13   123  38   22015      10   SAS                1610612759  San Antonio Spurs      13   1                W
3  29   6    36    35    16    0.457    95   52   0.547   15   131  46   22015      10   GSW                1610612744  Golden State Warriors  15   1                W
4  34   8    38    31    8     0.258    104  52   0.500   16   122  46   22015      10   SAC                1610612758  Sacramento Kings       20   1                L

5 rows × 29 columns
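The percentage columns in this table look like simple ratios of the counting columns. As a quick sanity check, the sketch below recomputes FG_PCT from FGM and FGA for the five rows shown and compares it with the listed value; rounding to three decimals is an assumption based on how the values are displayed.

```python
# (FGM, FGA, FG_PCT as listed) for the five rows above.
team_rows = [
    (56, 124, 0.452),
    (53, 87, 0.609),
    (52, 86, 0.605),
    (52, 95, 0.547),
    (52, 104, 0.500),
]

# Recompute each percentage as made / attempted, rounded like the display.
recomputed = [round(fgm / fga, 3) for fgm, fga, _ in team_rows]
mismatches = [row for row, value in zip(team_rows, recomputed) if value != row[2]]

print(recomputed)
print("mismatches:", mismatches)
```

An empty mismatch list means the listed percentages agree with the ratio FGM / FGA at the displayed precision.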

Getting player headshots

In [16]:

    from IPython.display import Image

    Image("http://stats.nba.com/media/players/230x185/"+str(kobe_id)+".png")

Out[16]:

[Figure 1: Kobe Bryant headshot]

In [17]:

    Image("http://stats.nba.com/media/players/230x185/"+str(curry_id)+".png")

Out[17]:

[Figure 2: Stephen Curry headshot]
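Since the same URL pattern is used for every player, it can be wrapped in a small function. The sketch below only builds the string from the pattern used in the cells above; `headshot_url` and `HEADSHOT_BASE` are hypothetical names, and whether the address still serves an image is not checked here.

```python
HEADSHOT_BASE = "http://stats.nba.com/media/players/230x185/"


def headshot_url(player_id):
    """Build the headshot URL for a player ID using the pattern shown above."""
    # int() accepts plain ints, numeric strings and NumPy integer scalars
    return "{}{}.png".format(HEADSHOT_BASE, int(player_id))


print(headshot_url(977))
```

In the notebook, the result would be passed to `Image(...)` exactly as the concatenated string is.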

More

I fixed a bug in goldsberry\player\_Player.py so that it can query information about retired players. The modified file is in this folder; once it is copied into the installation directory, all of the code below runs:

In [18]:

    from goldsberry.player import _Player as pl_old

The player list for 1997:

In [19]:

    players_1997 = pl_old.PlayerList(1997)

    players_1997 = pd.DataFrame(players_1997)

Jordan's player ID:

In [20]:

    # Select the PERSON_ID of the row whose name matches, then take the first hit
    jordan_id = players_1997["PERSON_ID"].loc[players_1997["DISPLAY_LAST_COMMA_FIRST"].str.contains("Jordan, Michael")]
    jordan_id = jordan_id.iloc[0]
    jordan_id

Out[20]:

    893
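Taking the first element of the filtered result silently picks one player if several names match, and fails with an index error if none do. A stricter lookup can make both cases explicit. The sketch below shows the idea in plain Python; `find_unique_id` and the sample rows are illustrative, with the IDs taken from the outputs in this notebook.

```python
def find_unique_id(rows, name):
    """Return the PERSON_ID for an exact name match, insisting on exactly one hit."""
    ids = [row["PERSON_ID"] for row in rows if row["DISPLAY_LAST_COMMA_FIRST"] == name]
    if len(ids) != 1:
        raise LookupError("expected exactly one match for %r, found %d" % (name, len(ids)))
    return ids[0]


# Two sample rows; the IDs are the ones shown elsewhere in this notebook.
sample_1997 = [
    {"DISPLAY_LAST_COMMA_FIRST": "Jordan, Michael", "PERSON_ID": 893},
    {"DISPLAY_LAST_COMMA_FIRST": "Bryant, Kobe", "PERSON_ID": 977},
]

print(find_unique_id(sample_1997, "Jordan, Michael"))  # 893
```

Matching on the full "Last, First" string rather than a substring also avoids accidental hits on players who merely share part of a name.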

Jordan's regular-season performance in the 1997-1998 season:

In [21]:

    jordan_logs_1997 = pl_old.game_logs(jordan_id, season="1997")
    jordan_logs_1997 = pd.DataFrame(jordan_logs_1997.logs())

    # numeric_only=True skips text columns such as Game_ID
    show_avg_info(jordan_logs_1997.mean(numeric_only=True))

    Points: 28.7
    Rebounds: 5.8
    Assists: 3.5
    Blocks: 0.5
    Minutes: 38.9
    Steals: 1.7
    Turnovers: 2.3
    Fouls: 1.8
    Field goals: 46.5%
    Three-pointers: 23.8%
    Free throws: 78.4%
    Defensive rebounds: 4.2
    Offensive rebounds: 1.6
    Plus/minus: 7.3

Jordan's playoff performance in the 1997-1998 season:

In [22]:

    jordan_logs_1997 = pl_old.game_logs(jordan_id, season="1997", seasontype=2)
    jordan_logs_1997 = pd.DataFrame(jordan_logs_1997.logs())

    # numeric_only=True skips text columns such as Game_ID
    show_avg_info(jordan_logs_1997.mean(numeric_only=True))

    Points: 32.4
    Rebounds: 5.1
    Assists: 3.5
    Blocks: 0.6
    Minutes: 41.0
    Steals: 1.5
    Turnovers: 2.1
    Fouls: 2.2
    Field goals: 46.2%
    Three-pointers: 30.2%
    Free throws: 81.2%
    Defensive rebounds: 3.5
    Offensive rebounds: 1.6
    Plus/minus: 7.5
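In both calls above the season is named by its starting year, so season="1997" refers to the 1997-1998 season. When labelling results, a small helper can turn the starting year into a "1997-98" style label. This is only a formatting sketch: `season_label` is a hypothetical name, and nothing here implies that the package accepts or returns this label form.

```python
def season_label(start_year):
    """Format a season's starting year as e.g. "1997-98"."""
    start_year = int(start_year)
    # The second part is the last two digits of the following year.
    return "{}-{:02d}".format(start_year, (start_year + 1) % 100)


print(season_label("1997"))  # 1997-98
print(season_label(1999))    # 1999-00
```

The modulo keeps the label correct across the turn of the century, where the following year's two-digit form wraps around to 00.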

Headshot:

In [23]:

    Image("http://stats.nba.com/media/players/230x185/"+str(jordan_id)+".png")

Out[23]:

[Figure 3: Michael Jordan headshot]

Source: https://nbviewer.jupyter.org/github/lijin-THU/notes-python/blob/master/10-something-interesting/10.03-nba-data.ipynb