Welcome back folks!
Thanks to everyone I know, personally and professionally, who has told me they are enjoying this newsletter. It genuinely means a lot and makes me feel less like a first-year US college student recording audiotapes to send to a family friend.
And an even bigger thanks to those of you who have responded with follow-up questions or comments. The most frequently asked question so far (n = 2) is one of the topics in this week’s FFF. Your follow-ups have inspired me to share some of my own thoughts (and I definitely have them) rather than just share links.
1 - Why R in the public sector? This is a good question. Like many people, I taught myself R because I thought it would be useful in my job. It was, but I honestly don’t know any better.
That said, when probed by one of my dear readers over a pre-lockdown dinner, I felt in good conscience that I could recommend R as the best analytical tool for people working with data and statistics in the public sector in Australia. Which is actually a lot of people when you think about it…
My reasons were the following:
it’s accessible to beginners/people from a non-computer science background through the {tidyverse}. The tidyverse is probably a whole topic in itself, but it is a collection of packages - I think of it as a dialect or accent of the R language - that are designed for data science. The reason it is more accessible to beginners is that it supports a more human-readable style of programming than base R and has some familiarity for people used to working with column/tabular data in Microsoft Excel.
existence of tools like {ggplot2} to make highly polished and publication-ready graphics and charts. Like it or not, and I like it, part of working with data is creating data visualisations that effectively communicate. A common objection to moving from tools like PowerPoint or Excel is that programming languages make ugly charts that don’t comply with the style guide. In short, {ggplot2} is the ultimate answer to these objections. Admittedly, the defaults in ggplot2 are a tad ugly, but it is eminently customisable and can be made to comply with style guides down to the precise RGB codes you require.
fantastic geospatial capabilities without using specialist GIS tools. As the saying goes, “people like maps”. Who doesn’t have that family member who loves poring over the Melways despite the existence of Google Maps? And R can not only make beautiful maps but also do fast geospatial analysis. If you want to learn more, I recommend starting with the online text Geocomputation with R.
an ecosystem of R packages that makes accessing public data easier. There is a growing number of R packages that make accessing public data in Australia easier. I say easier rather than easy, as most of the work that the packages do is wrangling data from a difficult format to a ‘tidy’ format. These packages include readabs, readrba and hildareadR.
increasing integration with Microsoft products that makes it easier to work with colleagues and the IT department. There is an increasing number of tools that make integration with Microsoft products from R a dream. Some of these tools are unofficial, but still powerful, e.g. readxl or officer. There is also a growing collection of R packages and tools produced by Microsoft, including tools to load data from SharePoint.
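To make the first two reasons a bit more concrete, here is a minimal sketch. The spending data, the agency names and the hex colour are all made up for illustration (they are assumptions, not real figures or a real style guide). It shows the same summary done in base R and in tidyverse style, followed by a {ggplot2} chart that uses an exact colour code.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data, invented purely for illustration
spending <- tibble(
  agency = c("A", "A", "B", "B", "C", "C"),
  year   = c(2019, 2020, 2019, 2020, 2019, 2020),
  amount = c(10, 12, 7, 9, 4, 3)
)

# Base R: total amount per agency
base_totals <- aggregate(amount ~ agency, data = spending, FUN = sum)

# tidyverse: the same summary, read top to bottom like a recipe
tidy_totals <- spending %>%
  group_by(agency) %>%
  summarise(amount = sum(amount))

# Hypothetical style-guide colour; swap in whatever your guide specifies
style_guide_blue <- "#1F4E79"

# A bar chart that uses the exact colour and a cleaner built-in theme
chart <- ggplot(tidy_totals, aes(x = agency, y = amount)) +
  geom_col(fill = style_guide_blue) +
  labs(title = "Total spending by agency", x = NULL, y = "Amount") +
  theme_minimal()
```

Both approaches give the same totals. The difference is mostly in how the code reads, which is a matter of taste, but in my experience the piped version is easier to explain to someone coming from Excel.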
Convinced? If not, please let me know in the comments.
There is also a range of separate reasons why you would choose a programming language rather than Microsoft Excel (e.g. automation, reproducibility, version control), but none of these are exclusive to R.
Also, the scope of this recommendation excludes use cases like AI or machine learning. R might still be the best tool but I don’t feel qualified to comment. That said, if that is where you are starting on your analytics journey, then I humbly suggest you might be starting in the wrong place.
2 - A webinar on “Scaling up Analytics in the Public Sector” - This is a shameless plug for a webinar that my colleagues Peter Ellis (Nous’ Chief Data Scientist) and Martin Burgess (Senior Data Scientist) are giving on some of the topics mentioned above, plus more!
3 - The Ten Most Misleading Charts During Donald Trump’s Presidency. It’s about a month since the inauguration of President Biden, so maybe it is time to reflect on what we can learn (about charts) from the Trump era? This is a great collection of misleading charts made during those years. In all seriousness, I think looking at bad data visualisations is probably more educative than admiring fancy ones.
4 - Causal Inference: the mixtape. Not only does this book have a cool cover that betrays/flaunts the author’s Gen Y status, this book is also a fantastic introduction to a topic of interest to most: causal inference, with accompanying R code written in tidyverse style. This review is a banger[1] and is actually written by someone who is past the second chapter. The online book is available free or you can purchase in other formats.
5 - Building a team of internal R packages. A fantastic blog post from the prolific Emily Riederer on building a “team of internal R packages”. This is definitely one for the people I just convinced in item 1 to file away for about six months’ time. It provides a great way of thinking about the different types of internal R packages that you might want to start building to make life easier for you and your colleagues.
Have a good weekend folks and see you in a fortnight!
[1] I think that is what the Gen Z’s would say.