Using the code set up in class, I created an aggregation function to compute a statistic about the data: the probability of arrest per precinct. The code loads the CSV line by line, converting each line into an array of field values. After grouping those records per precinct in a dictionary, I iterate through each precinct's records, check whether a record represents an arrest, and increment that precinct's arrest count. Finally, I compute the percentage of arrests over the total number of records for each precinct. I saved the computed values in a dictionary and dumped them to a JSON file for the chart in Step #2. For a link to the Python code, click here
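The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the actual project code: the file name and the column names (`precinct`, `arrest_made`) are assumptions, since the real dataset's headers may differ.

```python
import csv
import json
from collections import defaultdict

# Column names are assumptions -- the real dataset's headers may differ.
PRECINCT_COL = "precinct"
ARREST_COL = "arrest_made"  # assumed to hold "true"/"false"


def arrest_rate_per_precinct(csv_path):
    """Percentage of records that are arrests, keyed by precinct."""
    totals = defaultdict(int)   # total records seen per precinct
    arrests = defaultdict(int)  # arrest records seen per precinct
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            precinct = row[PRECINCT_COL]
            totals[precinct] += 1
            if row[ARREST_COL].strip().lower() == "true":
                arrests[precinct] += 1
    # Percentage of arrests over total records for each precinct.
    return {p: 100.0 * arrests[p] / totals[p] for p in totals}


# Tiny made-up sample to demonstrate the shape of the output.
sample = "precinct,arrest_made\n1,true\n1,false\n1,false\n1,false\n2,true\n2,true\n"
with open("sample_incidents.csv", "w") as f:
    f.write(sample)

rates = arrest_rate_per_precinct("sample_incidents.csv")
with open("arrest_rates.json", "w") as f:
    json.dump(rates, f, indent=2)  # consumed by the chart in Step #2
print(rates)  # {'1': 25.0, '2': 100.0}
```

The dumped JSON maps each precinct to its arrest percentage, which is exactly the shape the bar chart needs for its labels (keys) and data (values).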
For the custom chart, I created a bar graph depicting the probability of being arrested per precinct. I copied the JSON data into the HTML folder and loaded it dynamically into the chart, including both the data and the labels. The bar colors are generated at random.
The analysis reveals a few outliers in the data. Some precincts have arrest rates upwards of 25%, which is relatively high; these precincts may be located in more dangerous or more active communities. Other precincts have low arrest rates, which may suggest those areas have less police presence. There is one more interesting outlier: precinct '759' has no arrests on record. This could be a mishap in the data… or is it? For future research, I would ask, "Is there a geographical correlation between these numbers?" I could not investigate this in my project because the lat/lon data represents where the incident took place, not where the precinct station is. I would also ask, "What types of arrests are most common?" For example, do some precincts have more vehicle accidents that lead to more arrests? Future research will be needed to answer these questions.