At Swiss Public TV and Radio (SRF), we recently published an investigation into the “Collection #1-5” password leaks. In this post, I show how I searched through more than 900 GB of data with Spark and R.
In this post, I show how I used the kknn and ggplot2 packages, together with some parallel computation, to spatially interpolate several hundred thousand points.
Recently, I made the case for “true” RMarkdown reproducibility via checkpointed package versions. Shortly thereafter, I learned the hard way how crucial it is to use exactly the same R packages that were used when the script was initially written.
In this blog post, I explain step by step how I (eventually) achieved a nice thematic map with pure ggplot2 – from a very basic, useless, ugly, default map to the publication-ready and (in my opinion) highly aesthetic choropleth.
For me, 2015 was the year of R. The year I finally started to use R productively and on an almost daily basis (after years of learning and forgetting and learning all over again). In this post, I share my experiences and tell you why you should start using it for your next data journalism project in 2016.
Back when I was working at Tages-Anzeiger, I was asked to find a way to condense the content of several hundred PDF files into one spreadsheet. These PDFs contained indicator variables about the performance of nursing and retirement homes, and for some strange reason, they were only available as individual PDFs. I took it as an opportunity to learn new features of Node.js and it turned out to be a really good solution. In this post, I explain what I came up with.