Chapter 3

Original: Chapter 3

Translator: 飞龙

License: CC BY-NC-SA 4.0

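    # The usual preamble
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Make the graphs a bit bigger and prettier.
    # (The 'display.mpl_style' option and the pylab figsize() helper used by
    # older versions are not available in current pandas / plain Python.)
    plt.style.use('ggplot')
    plt.rcParams['figure.figsize'] = (15, 5)

    # Always display all the columns
    pd.set_option('display.width', 5000)
    pd.set_option('display.max_columns', 60)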

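Let's continue with our NYC 311 service requests example.

    complaints = pd.read_csv('../data/311-service-requests.csv')

3.1 Selecting only noise complaints

I'd like to know which borough has the most noise complaints. First, let's take a look at the data to see what it looks like:

    complaints[:5]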
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Action Updated Date Community Board Borough X Coordinate (State Plane) Y Coordinate (State Plane) Park Facility Name Park Borough School Name School Number School Region School Code School Phone Number School Address School City School State School Zip School Not Found School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
0 26589651 10/31/2013 02:08:41 AM NaN NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11432 90-03 169 STREET 169 STREET 90 AVENUE 91 AVENUE NaN NaN ADDRESS JAMAICA NaN Precinct Assigned 10/31/2013 10:08:41 AM 10/31/2013 02:35:17 AM 12 QUEENS QUEENS 1042027 197389 Unspecified QUEENS Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.708275 -73.791604 (40.70827532593202, -73.79160395779721)
1 26593698 10/31/2013 02:01:04 AM NaN NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk 11378 58 AVENUE 58 AVENUE 58 PLACE 59 STREET NaN NaN BLOCKFACE MASPETH NaN Precinct Open 10/31/2013 10:01:04 AM NaN 05 QUEENS QUEENS 1009349 201984 Unspecified QUEENS Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.721041 -73.909453 (40.721040535628305, -73.90945306791765)
2 26594139 10/31/2013 02:00:24 AM 10/31/2013 02:40:32 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 10032 4060 BROADWAY BROADWAY WEST 171 STREET WEST 172 STREET NaN NaN ADDRESS NEW YORK NaN Precinct Closed 10/31/2013 10:00:24 AM 10/31/2013 02:39:42 AM 12 MANHATTAN MANHATTAN 1001088 246531 Unspecified MANHATTAN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.843330 -73.939144 (40.84332975466513, -73.93914371913482)
3 26595721 10/31/2013 01:56:23 AM 10/31/2013 02:21:48 AM NYPD New York City Police Department Noise - Vehicle Car/Truck Horn Street/Sidewalk 10023 WEST 72 STREET WEST 72 STREET COLUMBUS AVENUE AMSTERDAM AVENUE NaN NaN BLOCKFACE NEW YORK NaN Precinct Closed 10/31/2013 09:56:23 AM 10/31/2013 02:21:10 AM 07 MANHATTAN MANHATTAN 989730 222727 Unspecified MANHATTAN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.778009 -73.980213 (40.7780087446372, -73.98021349023975)
4 26590930 10/31/2013 01:53:44 AM NaN DOHMH Department of Health and Mental Hygiene Rodent Condition Attracting Rodents Vacant Lot 10027 WEST 124 STREET WEST 124 STREET LENOX AVENUE ADAM CLAYTON POWELL JR BOULEVARD NaN NaN BLOCKFACE NEW YORK NaN N/A Pending 11/30/2013 01:53:44 AM 10/31/2013 01:59:54 AM 10 MANHATTAN MANHATTAN 998815 233545 Unspecified MANHATTAN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.807691 -73.947387 (40.80769092704951, -73.94738703491433)

To get the noise complaints, we need to find the rows where the Complaint Type column is "Noise - Street/Sidewalk". I'll show you how to do that, and then explain what's going on.

    noise_complaints = complaints[complaints['Complaint Type'] == "Noise - Street/Sidewalk"]
    noise_complaints[:3]
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Action Updated Date Community Board Borough X Coordinate (State Plane) Y Coordinate (State Plane) Park Facility Name Park Borough School Name School Number School Region School Code School Phone Number School Address School City School State School Zip School Not Found School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
0 26589651 10/31/2013 02:08:41 AM NaN NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11432 90-03 169 STREET 169 STREET 90 AVENUE 91 AVENUE NaN NaN ADDRESS JAMAICA NaN Precinct Assigned 10/31/2013 10:08:41 AM 10/31/2013 02:35:17 AM 12 QUEENS QUEENS 1042027 197389 Unspecified QUEENS Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.708275 -73.791604 (40.70827532593202, -73.79160395779721)
16 26594086 10/31/2013 12:54:03 AM 10/31/2013 02:16:39 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 10310 173 CAMPBELL AVENUE CAMPBELL AVENUE HENDERSON AVENUE WINEGAR LANE NaN NaN ADDRESS STATEN ISLAND NaN Precinct Closed 10/31/2013 08:54:03 AM 10/31/2013 02:07:14 AM 01 STATEN ISLAND STATEN ISLAND 952013 171076 Unspecified STATEN ISLAND Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.636182 -74.116150 (40.63618202176914, -74.1161500428337)
25 26591573 10/31/2013 12:35:18 AM 10/31/2013 02:41:35 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 10312 24 PRINCETON LANE PRINCETON LANE HAMPTON GREEN DEAD END NaN NaN ADDRESS STATEN ISLAND NaN Precinct Closed 10/31/2013 08:35:18 AM 10/31/2013 01:45:17 AM 03 STATEN ISLAND STATEN ISLAND 929577 140964 Unspecified STATEN ISLAND Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.553421 -74.196743 (40.55342078716953, -74.19674315017886)

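If you look at noise_complaints, you'll see that this worked: it contains only complaints with the right complaint type. But how does it work? Let's break it down into two parts. The first part is the comparison on its own: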

    complaints['Complaint Type'] == "Noise - Street/Sidewalk"

    0          True
    1         False
    2         False
    3         False
    4         False
    5         False
    6         False
    7         False
    8         False
    9         False
    10        False
    11        False
    12        False
    13        False
    14        False
    ...
    111054     True
    111055    False
    111056    False
    111057    False
    111058    False
    111059     True
    111060    False
    111061    False
    111062    False
    111063    False
    111064    False
    111065    False
    111066     True
    111067    False
    111068    False
    Name: Complaint Type, Length: 111069, dtype: bool

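This is a big array of Trues and Falses (a boolean Series), one for each row in our DataFrame. The second part is the indexing: when we index our DataFrame with this array, we get back just the rows where it is True.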
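To see the two steps on something small enough to read at a glance, here is a minimal sketch. The `toy` DataFrame and its three rows are made up for illustration and are not taken from the 311 data.

```python
import pandas as pd

# Hypothetical toy data, not part of the 311 dataset
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk',
                       'Illegal Parking',
                       'Noise - Street/Sidewalk'],
    'Borough': ['QUEENS', 'QUEENS', 'BROOKLYN'],
})

# Step 1: comparing a column to a value gives a boolean Series,
# with one True/False entry per row
mask = toy['Complaint Type'] == 'Noise - Street/Sidewalk'
print(mask.tolist())            # [True, False, True]

# Step 2: indexing with the mask keeps only the rows where it is True;
# the original row labels are preserved
selected = toy[mask]
print(selected.index.tolist())  # [0, 2]
```

Note that the result keeps the original row labels (0 and 2 here), which is why the filtered 311 output shows index values such as 16 and 25 rather than 0, 1, 2.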

You can also combine more than one condition with the & operator, like this:

    is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
    in_brooklyn = complaints['Borough'] == "BROOKLYN"
    complaints[is_noise & in_brooklyn][:5]
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Action Updated Date Community Board Borough X Coordinate (State Plane) Y Coordinate (State Plane) Park Facility Name Park Borough School Name School Number School Region School Code School Phone Number School Address School City School State School Zip School Not Found School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
31 26595564 10/31/2013 12:30:36 AM NaN NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 11236 AVENUE J AVENUE J EAST 80 STREET EAST 81 STREET NaN NaN BLOCKFACE BROOKLYN NaN Precinct Open 10/31/2013 08:30:36 AM NaN 18 BROOKLYN BROOKLYN 1008937 170310 Unspecified BROOKLYN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.634104 -73.911055 (40.634103775951736, -73.91105541883589)
49 26595553 10/31/2013 12:05:10 AM 10/31/2013 02:43:43 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11225 25 LEFFERTS AVENUE LEFFERTS AVENUE WASHINGTON AVENUE BEDFORD AVENUE NaN NaN ADDRESS BROOKLYN NaN Precinct Closed 10/31/2013 08:05:10 AM 10/31/2013 01:29:29 AM 09 BROOKLYN BROOKLYN 995366 180388 Unspecified BROOKLYN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.661793 -73.959934 (40.6617931276793, -73.95993363978067)
109 26594653 10/30/2013 11:26:32 PM 10/31/2013 12:18:54 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 11222 NaN NaN NaN NaN DOBBIN STREET NORMAN STREET INTERSECTION BROOKLYN NaN Precinct Closed 10/31/2013 07:26:32 AM 10/31/2013 12:18:54 AM 01 BROOKLYN BROOKLYN 996925 203271 Unspecified BROOKLYN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.724600 -73.954271 (40.724599563793525, -73.95427134534344)
236 26591992 10/30/2013 10:02:58 PM 10/30/2013 10:23:20 PM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11218 DITMAS AVENUE DITMAS AVENUE NaN NaN NaN NaN LATLONG BROOKLYN NaN Precinct Closed 10/31/2013 06:02:58 AM 10/30/2013 10:23:20 PM 01 BROOKLYN BROOKLYN 991895 171051 Unspecified BROOKLYN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.636169 -73.972455 (40.63616876563881, -73.97245504682485)
370 26594167 10/30/2013 08:38:25 PM 10/30/2013 10:26:28 PM NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 11218 126 BEVERLY ROAD BEVERLY ROAD CHURCH AVENUE EAST 2 STREET NaN NaN ADDRESS BROOKLYN NaN Precinct Closed 10/31/2013 04:38:25 AM 10/30/2013 10:26:28 PM 12 BROOKLYN BROOKLYN 990144 173511 Unspecified BROOKLYN Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified Unspecified N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 40.642922 -73.978762 (40.6429222774404, -73.97876175474585)

Or if we just wanted a few columns:

    complaints[is_noise & in_brooklyn][['Complaint Type', 'Borough', 'Created Date', 'Descriptor']][:10]

          Complaint Type           Borough   Created Date            Descriptor
    31    Noise - Street/Sidewalk  BROOKLYN  10/31/2013 12:30:36 AM  Loud Music/Party
    49    Noise - Street/Sidewalk  BROOKLYN  10/31/2013 12:05:10 AM  Loud Talking
    109   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 11:26:32 PM  Loud Music/Party
    236   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 10:02:58 PM  Loud Talking
    370   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 08:38:25 PM  Loud Music/Party
    378   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 08:32:13 PM  Loud Talking
    656   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 06:07:39 PM  Loud Music/Party
    1251  Noise - Street/Sidewalk  BROOKLYN  10/30/2013 03:04:51 PM  Loud Talking
    5416  Noise - Street/Sidewalk  BROOKLYN  10/29/2013 10:07:02 PM  Loud Talking
    5584  Noise - Street/Sidewalk  BROOKLYN  10/29/2013 08:15:59 PM  Loud Music/Party
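Two details about combining conditions are easy to trip over, so here is a small sketch on made-up data (the `toy` frame is hypothetical). It shows that inline comparisons need parentheses around each condition, that `|` and `~` express "or" and "not", and that `.loc` can apply a row mask and a column list in one step instead of chaining two sets of brackets.

```python
import pandas as pd

# Hypothetical toy data, not part of the 311 dataset
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk', 'Noise - Street/Sidewalk',
                       'Illegal Parking', 'Noise - Commercial'],
    'Borough': ['BROOKLYN', 'QUEENS', 'BROOKLYN', 'BROOKLYN'],
    'Descriptor': ['Loud Talking', 'Loud Music/Party',
                   'Blocked Hydrant', 'Loud Music/Party'],
})

# When conditions are written inline, wrap each comparison in parentheses:
# & binds more tightly than ==, so leaving them out changes the meaning
both = (toy['Complaint Type'] == 'Noise - Street/Sidewalk') & (toy['Borough'] == 'BROOKLYN')

# .loc takes the row mask and the list of columns together
subset = toy.loc[both, ['Complaint Type', 'Borough']]
print(subset.index.tolist())   # [0]

# | means "or" and ~ means "not"
either = (toy['Borough'] == 'QUEENS') | ~toy['Complaint Type'].str.startswith('Noise')
print(either.tolist())         # [False, True, True, False]
```

Naming the masks first, as the chapter does with is_noise and in_brooklyn, sidesteps the parentheses issue entirely and tends to read better.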

3.2 A digression about numpy arrays

On the inside, the type of a column is pd.Series:

    pd.Series([1,2,3])

    0    1
    1    2
    2    3
    dtype: int64

And a pandas Series is, internally, a numpy array. If you add .values to the end of any Series, you'll get its internal numpy array.

    np.array([1,2,3])

    array([1, 2, 3])

    pd.Series([1,2,3]).values

    array([1, 2, 3])

So this boolean-array selection business is actually something that works with any numpy array:

    arr = np.array([1,2,3])
    arr != 2

    array([ True, False,  True], dtype=bool)

    arr[arr != 2]

    array([1, 3])
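As a small sketch of how the two layers relate, the snippet below runs the same comparison on a numpy array and on the Series it came from. It uses `Series.to_numpy()`, which current pandas documentation recommends over `.values`; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# to_numpy() returns the Series data as a numpy array, like .values does
arr = s.to_numpy()

mask = arr != 2     # elementwise comparison gives a boolean array
kept = arr[mask]    # boolean indexing keeps the positions that are True
print(kept.tolist())                 # [1, 3]

# True counts as 1, so summing a mask counts the matching elements
n_kept = int(mask.sum())
print(n_kept)                        # 2

# The same comparison works directly on the Series and keeps the index labels
kept_series = s[s != 2]
print(kept_series.index.tolist())    # [0, 2]
```

The main practical difference is that the plain array forgets the row labels, while the Series result still knows which rows survived.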

3.3 So, which borough has the most noise complaints?

    is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
    noise_complaints = complaints[is_noise]
    noise_complaints['Borough'].value_counts()

    MANHATTAN        917
    BROOKLYN         456
    BRONX            292
    QUEENS           226
    STATEN ISLAND     36
    Unspecified        1
    dtype: int64

It's Manhattan! But what if we wanted to divide by the total number of complaints in each borough, to make the comparison a bit more meaningful? That's easy too:

    noise_complaint_counts = noise_complaints['Borough'].value_counts()
    complaint_counts = complaints['Borough'].value_counts()
    noise_complaint_counts / complaint_counts

    BRONX            0
    BROOKLYN         0
    MANHATTAN        0
    QUEENS           0
    STATEN ISLAND    0
    Unspecified      0
    dtype: int64

Oops, why are they all zero? This is because of integer division in Python 2. Let's fix it by converting complaint_counts into an array of floats. (Under Python 3, / already performs true division, so the conversion is harmless but not required.)

    noise_complaint_counts / complaint_counts.astype(float)

    BRONX            0.014833
    BROOKLYN         0.013864
    MANHATTAN        0.037755
    QUEENS           0.010143
    STATEN ISLAND    0.007474
    Unspecified      0.000141
    dtype: float64

    (noise_complaint_counts / complaint_counts.astype(float)).plot(kind='bar')

    <matplotlib.axes.AxesSubplot at 0x75b7890>

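Chapter 3 - Figure 1

So Manhattan really does have more noise complaints than the other boroughs, even relative to its total number of complaints.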
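As an aside, the per-borough fraction can also be computed in one step, because the mean of a boolean column is simply the fraction of True values. The sketch below compares both routes on made-up data; the `toy` frame is hypothetical, and the code assumes Python 3, where / is true division.

```python
import pandas as pd

# Hypothetical toy data, not part of the 311 dataset
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk', 'Illegal Parking',
                       'Noise - Street/Sidewalk', 'Illegal Parking',
                       'Illegal Parking', 'Rodent'],
    'Borough': ['MANHATTAN', 'MANHATTAN',
                'BROOKLYN', 'BROOKLYN', 'BROOKLYN', 'BROOKLYN'],
})

is_noise = toy['Complaint Type'] == 'Noise - Street/Sidewalk'

# Two-step route, as in the text: noise counts divided by total counts.
# The division lines the two Series up by borough label, not by position.
noise_counts = toy[is_noise]['Borough'].value_counts()
all_counts = toy['Borough'].value_counts()
ratio = noise_counts / all_counts

# One-step route: group the boolean mask by borough and take the mean
fraction = is_noise.groupby(toy['Borough']).mean()

print(ratio['MANHATTAN'], fraction['MANHATTAN'])   # 0.5 0.5
print(ratio['BROOKLYN'], fraction['BROOKLYN'])     # 0.25 0.25
```

One difference worth knowing: a borough with no noise complaints at all would be missing from noise_counts and show up as NaN in the two-step ratio, whereas the grouped mean reports it as 0.0.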