Simply collecting the data for this story about the Berlin Police Department is more complicated than I first suspected.
It’s my first data journalism story, and I wanted something challenging, a project where I would learn, but still something doable. Studying my hometown police department’s daily blotter for the month of January seemed reasonable and interesting.
In last week’s post, I told you Google’s search engine turned up valid entry after valid entry in its results. At first, it was easy: I went from one .pdf to another, downloading the files to my computer. But after downloading the 13th .pdf, I realized Google had not brought up all the results.
I assumed the problem was on the police department’s end, so I waited a few days before running the search again. But when I ran the same search on Jan. 23, it returned the same results.
That’s when I decided to manipulate the URL of one of the documents I already had, in hopes of finding the ones Google hadn’t retrieved.
I started with the URL to the daily police blotter for Jan. 7: http://www.berlinpd.org/images/pdfs/DAILY%20BLOTTER%201-07-2014.pdf
Since the date is in the address, I simply changed it to the date of a document I didn’t have. The document loaded, so I downloaded it and kept changing the date until I got an error message.
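Changing the date by hand works for one month, but the same trick can be scripted. The sketch below is a minimal Python version, assuming every file follows the pattern of the Jan. 7 address above (month without a leading zero, two-digit day, four-digit year). That pattern is inferred from a single URL, and the function names are hypothetical.

```python
import datetime
import urllib.error
import urllib.request

# Pattern inferred from the single Jan. 7 URL; other dates may differ (assumption).
BASE = "http://www.berlinpd.org/images/pdfs/DAILY%20BLOTTER%20{m}-{d:02d}-{y}.pdf"


def blotter_url(day: datetime.date) -> str:
    """Build the guessed blotter URL for one date, e.g. 1-07-2014."""
    return BASE.format(m=day.month, d=day.day, y=day.year)


def urls_for_month(year: int, month: int) -> list[str]:
    """Return one guessed URL per calendar day in the given month."""
    urls = []
    day = datetime.date(year, month, 1)
    while day.month == month:
        urls.append(blotter_url(day))
        day += datetime.timedelta(days=1)
    return urls


def download_until_error(urls: list[str]) -> list[str]:
    """Download each URL in order and stop at the first HTTP error,
    mirroring the manual process. Returns the file names saved.
    Network failures other than HTTP errors are not caught."""
    saved = []
    for url in urls:
        filename = url.rsplit("/", 1)[-1].replace("%20", " ")
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = resp.read()
        except urllib.error.HTTPError:
            break  # e.g. a 404 for a date that hasn't been posted yet
        with open(filename, "wb") as fh:
            fh.write(data)
        saved.append(filename)
    return saved
```

Calling `download_until_error(urls_for_month(2014, 1))` would fetch January in date order. Stopping at the first error matches the manual approach; if the department ever skips a day, swapping `break` for `continue` would keep the loop going past the gap.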
Here’s the lesson: A search engine is a good starting point when looking for information, but it has limitations.
The next step was converting the data into something I could use.
You can’t easily work with data in .pdf format, because .pdfs are designed for reading and publishing, not analysis. You need it in a spreadsheet format such as .xls, something malleable so that it can be played with, measured and counted.
Reading about data journalism, I learned there are ways to convert .pdfs into something usable, but from the sound of it, you needed to know a bit of code.
Instead, I took the easy way out: I Googled “convert pdf to excel” and found a few websites that do it for free.
The only problem? The converter made each page of the .pdf into a separate sheet in the Excel workbook. After looking online for a quick solution today, I ended up copying and pasting each sheet into a “Master List” sheet in the workbook.
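Copying and pasting is fine for a handful of sheets; for a longer stretch of data, a short script could do the same merge. This is a minimal sketch, assuming each sheet starts with the same header row (the converter’s actual output may differ). `combine_sheets` and `load_workbook_rows` are hypothetical helper names, and the loader relies on the third-party openpyxl library, which reads .xlsx files but not the older .xls format.

```python
from typing import Any

Row = tuple[Any, ...]


def combine_sheets(sheets: dict[str, list[Row]]) -> list[Row]:
    """Stack every sheet into one master list.

    Keeps the first header row only, drops repeated headers on later
    sheets, skips completely empty rows, and appends the source sheet
    name so each row can be traced back to its page.
    """
    master: list[Row] = []
    header = None
    for name, rows in sheets.items():
        for row in rows:
            if all(cell in (None, "") for cell in row):
                continue  # blank rows the converter may leave behind
            if header is None:
                header = row
                master.append(row + ("Sheet",))
            elif row == header:
                continue  # repeated header from a later page
            else:
                master.append(row + (name,))
    return master


def load_workbook_rows(path: str) -> dict[str, list[Row]]:
    """Read every sheet of an .xlsx file into plain tuples (needs openpyxl)."""
    from openpyxl import load_workbook  # third-party, imported only when used

    wb = load_workbook(path, read_only=True)
    try:
        return {ws.title: list(ws.iter_rows(values_only=True)) for ws in wb.worksheets}
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

Used together, `combine_sheets(load_workbook_rows("blotter.xlsx"))` (a hypothetical file name) would return the same stacked rows a “Master List” sheet holds, plus a column recording which page each row came from.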
It probably needs copyediting, but I have three weeks’ worth of data that I can start exploring.
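Once the rows sit in one master list, counting them is a short loop. The sketch below assumes, hypothetically, that the list has a header row with a column named “Incident”; the blotter’s real column names aren’t known here, so treat both the column name and the `count_by_column` helper as placeholders.

```python
from collections import Counter
from typing import Any, Sequence


def count_by_column(rows: Sequence[Sequence[Any]], column: str) -> Counter:
    """Tally how often each value appears in one column.

    The first row is treated as the header; empty cells are skipped.
    Raises ValueError if the column name isn't in the header.
    """
    header, *data = rows
    idx = list(header).index(column)
    return Counter(row[idx] for row in data if row[idx] not in (None, ""))
```

For example, `count_by_column(master, "Incident").most_common(5)` would list the five most frequent entries in that column.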