In [1]:
import turicreate as tc
In [2]:
sales = tc.SFrame.read_csv('Philadelphia_Crime_Rate_noNA.csv')
sales
Finished parsing file /Users/Hunter/Documents/UW/416/resources/course-dev/demos/Philadelphia_Crime_Rate_noNA.csv
Parsing completed. Parsed 99 lines in 0.036217 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,float,float,float,float,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /Users/Hunter/Documents/UW/416/resources/course-dev/demos/Philadelphia_Crime_Rate_noNA.csv
Parsing completed. Parsed 99 lines in 0.009758 secs.
Out[2]:
HousePrice HsPrc ($10,000) CrimeRate MilesPhila PopChg Name County
140463 14.0463 29.7 10.0 -1.0 Abington Montgome
113033 11.3033 24.1 18.0 4.0 Ambler Montgome
124186 12.4186 19.5 25.0 8.0 Aston Delaware
110490 11.049 49.4 25.0 2.7 Bensalem Bucks
79124 7.9124 54.1 19.0 3.9 Bristol B. Bucks
92634 9.2634 48.6 20.0 0.6 Bristol T. Bucks
89246 8.9246 30.8 15.0 -2.6 Brookhaven Delaware
195145 19.5145 10.8 20.0 -3.5 Bryn Athyn Montgome
297342 29.7342 20.2 14.0 0.6 Bryn Mawr Montgome
264298 26.4298 20.4 26.0 6.0 Buckingham Bucks
[99 rows x 7 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [4]:
# To select a single column
sales['HousePrice']
Out[4]:
dtype: int
Rows: 99
[140463, 113033, 124186, 110490, 79124, 92634, 89246, 195145, 297342, 264298, 134342, 147600, 77370, 170822, 40642, 71359, 104923, 190317, 215512, 178105, 131025, 149844, 170556, 280969, 114233, 74502, 475112, 97167, 114572, 436348, 389302, 122392, 130436, 272790, 194435, 299621, 210884, 112471, 93738, 121024, 156035, 185404, 126160, 143072, 96769, 94014, 118214, 157446, 150283, 153842, 197214, 206127, 71981, 169401, 99843, 60000, 28000, 60000, 61800, 38000, 38000, 42000, 96200, 103087, 147720, 78175, 92215, 271804, 119566, 100231, 95831, 229711, 74308, 259506, 159573, 147176, 205732, 215783, 116710, 359112, 189959, 133198, 242821, 142811, 200498, 199065, 93648, 163001, 436348, 124478, 168276, 114157, 130088, 152624, 174232, 196515, 232714, 245920, 130953]
In [5]:
# An SArray is a single column of an SFrame
type(sales['HousePrice'])
Out[5]:
turicreate.data_structures.sarray.SArray
In [6]:
# Can also select multiple columns by giving several column names
sales['HousePrice', 'CrimeRate']
Out[6]:
HousePrice CrimeRate
140463 29.7
113033 24.1
124186 19.5
110490 49.4
79124 54.1
92634 48.6
89246 30.8
195145 10.8
297342 20.2
264298 20.4
[99 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [7]:
# If you select multiple columns, the result is an SFrame instead of an SArray
type(sales['HousePrice', 'CrimeRate'])
Out[7]:
turicreate.data_structures.sframe.SFrame
In [8]:
# To get the row with index 2
sales[2]
Out[8]:
{'County': 'Delaware',
 'CrimeRate': 19.5,
 'HousePrice': 124186,
 'HsPrc ($10,000)': 12.4186,
 'MilesPhila': 25.0,
 'Name': 'Aston',
 'PopChg': 8.0}
In [9]:
# A row is returned as a dictionary (for Java people: a Map with String keys and values of any type)
type(sales[2])
Out[9]:
dict
In [15]:
# To access the value of a specific column in that row
row = sales[2]
row['County']

# Can also do this in one line with
#     sales[2]['County']
Out[15]:
'Delaware'
In [10]:
# Can be fancy and select multiple rows with "slice" notation
# This selects indices from 0 (inclusive) to 10 (exclusive), counting by 2s,
#    i.e. the rows at indices 0, 2, 4, 6, 8
sales[0:10:2]
Out[10]:
HousePrice HsPrc ($10,000) CrimeRate MilesPhila PopChg Name County
140463 14.0463 29.7 10.0 -1.0 Abington Montgome
124186 12.4186 19.5 25.0 8.0 Aston Delaware
79124 7.9124 54.1 19.0 3.9 Bristol B. Bucks
89246 8.9246 30.8 15.0 -2.6 Brookhaven Delaware
297342 29.7342 20.2 14.0 0.6 Bryn Mawr Montgome
[5 rows x 7 columns]
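The start:stop:step slice above follows the same rules as a slice on a built-in Python list, so the selected indices can be checked without turicreate. The sketch below is plain Python only; `rows` is a hypothetical stand-in for the 99 row indices of `sales`, not an SFrame.

```python
# Plain-Python sketch of which indices the slice 0:10:2 selects.
rows = list(range(99))      # stand-in for the row indices 0..98 of `sales`
selected = rows[0:10:2]     # start at 0, stop before 10, step by 2
print(selected)

# slice.indices() resolves a slice against a given length,
# which makes the start/stop/step explicit.
start, stop, step = slice(0, 10, 2).indices(len(rows))
print(list(range(start, stop, step)))
```

Both prints list the same five indices, matching the five rows shown in Out[10].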
In [11]:
# To get all the rows that are in Bucks county
sales[sales['County'] == 'Bucks']
Out[11]:
HousePrice HsPrc ($10,000) CrimeRate MilesPhila PopChg Name County
110490 11.049 49.4 25.0 2.7 Bensalem Bucks
79124 7.9124 54.1 19.0 3.9 Bristol B. Bucks
92634 9.2634 48.6 20.0 0.6 Bristol T. Bucks
264298 26.4298 20.4 26.0 6.0 Buckingham Bucks
134342 13.4342 17.3 31.0 4.2 Chalfont Bucks
190317 19.0317 19.4 26.0 1.9 Doylestown Bucks
114233 11.4233 29.0 30.0 1.3 Falls Town Bucks
194435 19.4435 15.7 32.0 15.0 L. Makefield Bucks
143072 14.3072 40.1 23.0 1.6 Middletown Bucks
96769 9.6769 36.1 15.0 5.1 Morrisville Bucks
[? rows x 7 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [12]:
# Let's break up what this is doing
print('What are you passing into the array access?')
mask = sales['County'] == 'Bucks'
print(mask)
# This is a "mask" of 1s and 0s that indicates which rows should be selected. An entry is 1 if its row matches the condition `== 'Bucks'`
print('What happens if you use this list to access the sales?')
sales[mask]
What are you passing into the array access?
[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
What happens if you use this list to access the sales?
Out[12]:
HousePrice HsPrc ($10,000) CrimeRate MilesPhila PopChg Name County
110490 11.049 49.4 25.0 2.7 Bensalem Bucks
79124 7.9124 54.1 19.0 3.9 Bristol B. Bucks
92634 9.2634 48.6 20.0 0.6 Bristol T. Bucks
264298 26.4298 20.4 26.0 6.0 Buckingham Bucks
134342 13.4342 17.3 31.0 4.2 Chalfont Bucks
190317 19.0317 19.4 26.0 1.9 Doylestown Bucks
114233 11.4233 29.0 30.0 1.3 Falls Town Bucks
194435 19.4435 15.7 32.0 15.0 L. Makefield Bucks
143072 14.3072 40.1 23.0 1.6 Middletown Bucks
96769 9.6769 36.1 15.0 5.1 Morrisville Bucks
[? rows x 7 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
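To see what the mask does mechanically, here is a minimal plain-Python sketch using the ten rows printed at the top of the notebook. It only imitates the idea with lists; it is not how turicreate implements filtering, and the names `names`, `counties`, and `kept` are made up for the illustration.

```python
# Names and counties of the first ten rows shown in the notebook output.
names = ['Abington', 'Ambler', 'Aston', 'Bensalem', 'Bristol B.',
         'Bristol T.', 'Brookhaven', 'Bryn Athyn', 'Bryn Mawr', 'Buckingham']
counties = ['Montgome', 'Montgome', 'Delaware', 'Bucks', 'Bucks',
            'Bucks', 'Delaware', 'Montgome', 'Montgome', 'Bucks']

# Step 1: compare every entry to 'Bucks' to build a mask of 1s and 0s.
mask = [1 if county == 'Bucks' else 0 for county in counties]
print(mask)

# Step 2: keep only the rows whose mask entry is 1.
kept = [name for name, keep in zip(names, mask) if keep]
print(kept)
```

The mask printed here agrees with the first ten entries of the mask printed in the cell above.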
In [13]:
# What if you want every row that is in Bucks county and has a CrimeRate > 15?
sales[(sales['County'] == 'Bucks') & (sales['CrimeRate'] > 15)] # Note: Use `&` here, not the `and` you would use in regular Python
# Could also filter in two steps, which is a bit clunkier (the second mask must be built from the already-filtered SFrame)
#    bucks = sales[sales['County'] == 'Bucks']
#    bucks[bucks['CrimeRate'] > 15]
Out[13]:
HousePrice HsPrc ($10,000) CrimeRate MilesPhila PopChg Name County
110490 11.049 49.4 25.0 2.7 Bensalem Bucks
79124 7.9124 54.1 19.0 3.9 Bristol B. Bucks
92634 9.2634 48.6 20.0 0.6 Bristol T. Bucks
264298 26.4298 20.4 26.0 6.0 Buckingham Bucks
134342 13.4342 17.3 31.0 4.2 Chalfont Bucks
190317 19.0317 19.4 26.0 1.9 Doylestown Bucks
114233 11.4233 29.0 30.0 1.3 Falls Town Bucks
194435 19.4435 15.7 32.0 15.0 L. Makefield Bucks
143072 14.3072 40.1 23.0 1.6 Middletown Bucks
96769 9.6769 36.1 15.0 5.1 Morrisville Bucks
[? rows x 7 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
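Why `&` and not `and`? In Python, `and` cannot be overloaded: it only looks at the truthiness of its left operand as a whole. `&` calls the `__and__` method, which a class can define to combine two masks entry by entry. The sketch below uses a small hypothetical `Mask` class to show the difference; it is an illustration only and not turicreate's SArray. The data are the first ten rows printed at the top of the notebook.

```python
class Mask(object):
    """Hypothetical stand-in for a 0/1 mask (not turicreate's SArray)."""
    def __init__(self, bits):
        self.bits = list(bits)

    def __and__(self, other):
        # `a & b` combines the two masks one entry at a time.
        return Mask(x & y for x, y in zip(self.bits, other.bits))


counties = ['Montgome', 'Montgome', 'Delaware', 'Bucks', 'Bucks',
            'Bucks', 'Delaware', 'Montgome', 'Montgome', 'Bucks']
crime_rates = [29.7, 24.1, 19.5, 49.4, 54.1, 48.6, 30.8, 10.8, 20.2, 20.4]

in_bucks = Mask(int(county == 'Bucks') for county in counties)
high_crime = Mask(int(rate > 15) for rate in crime_rates)

both = in_bucks & high_crime    # elementwise: 1 only where both masks are 1
print(both.bits)

# `and` never calls __and__. Since `in_bucks` is truthy as an object,
# the expression simply evaluates to the right operand.
print((in_bucks and high_crime) is high_crime)
```

The parentheses in the notebook cell matter for a related reason: `&` binds more tightly than `==` and `>`, so each comparison has to be wrapped before the two masks are combined.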
In [14]:
# Also remember that in most cases in Python, if you want to find out the number of elements in a collection you use `len`
print('All sales %d' % len(sales))
print('Bucks sales %d' % len(sales[sales['County'] == 'Bucks']))
All sales 99
Bucks sales 19