Recently I made the case for “true” RMarkdown reproducibility via checkpointed package versions. Shortly thereafter, I learned the hard way how crucial it is to use exactly the same versions of the R packages that were used when the script was initially written.
For more than two years, I have been preaching reproducibility and transparency in data journalism. My tool of choice: R and reproducible reports with RMarkdown.
But these reports aren’t really reproducible. A solution.
For me, 2015 was the year of R. The year I finally started to use R productively and on an almost daily basis (after years of learning and forgetting and learning all over again). In this post, I share my experiences and tell you why you should start using it for your next data journalism project in 2016.
Back when I was working at Tages-Anzeiger, I was asked to find a way to condense the content of several hundred PDF files into one spreadsheet. These PDFs contained indicator variables about the performance of nursing and retirement homes, and for some strange reason, they were only available as individual PDFs. I took it as an opportunity to learn new features of Node.js and it turned out to be a really good solution. In this post, I explain what I came up with.