At Swiss Public TV and Radio (SRF), we recently published an investigation of the “Collection #1-5” password leaks. In this post, I show how I searched through more than 900 GB of data with Spark and R.
Recently, I made the case for “true” RMarkdown reproducibility via checkpointed package versions. Shortly thereafter, I learned the hard way how crucial it is to use exactly the same R package versions that were used when the script was initially written.
For me, 2015 was the year of R: the year I finally started to use R productively and on an almost daily basis (after years of learning and forgetting and learning all over again). In this post, I share my experiences and tell you why you should start using R for your next data journalism project in 2016.