The full article is here, but the headline points were: R programming’s use across disciplines, which fits well with multidisciplinary policy analysis teams; the greater reproducibility and transparency that written code provides; and the practical advantages of automating repetitive parts of policy analysis (such as reporting results across multiple scenarios).
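As a minimal sketch of the “multiple scenarios” point, assume a hypothetical flat payment whose cost depends only on take-up. The function name `cost_policy`, the eligible population and the payment amount are all made up for illustration. The costing logic is written once, and every scenario is reported in a loop instead of being copied and pasted by hand:

```r
# Hypothetical policy: a flat payment to eligible people, costed under
# three assumed take-up scenarios (all numbers are illustrative only)
scenarios <- data.frame(
  scenario = c("low", "central", "high"),
  take_up  = c(0.4, 0.6, 0.8)
)
eligible_people <- 100000  # assumed size of the eligible population
payment <- 500             # assumed payment per recipient ($)

# One function holds the costing logic, so every scenario uses the same rules
cost_policy <- function(take_up, eligible, payment) {
  eligible * take_up * payment
}

scenarios$cost <- cost_policy(scenarios$take_up, eligible_people, payment)

# Report every scenario in one go instead of copying and pasting results
for (i in seq_len(nrow(scenarios))) {
  cat(sprintf("%-8s scenario: $%s\n",
              scenarios$scenario[i],
              format(scenarios$cost[i], big.mark = ",", scientific = FALSE)))
}
```

Adding a fourth scenario means adding one row to the data frame; the reporting code does not change.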
While the article didn’t land me a book deal, it did result in a surprising number of messages from people who were just as passionate as I am about the potential of R in the public policy world. At the same time, a number of people asked me to put my money where my mouth is and show them how it’s useful by teaching them.
So when the Microsoft Reactor in Sydney offered me a space, I took up the challenge, throwing together a course based on what my experience as a consultant/economist/policy analyst suggested would be most useful.
Running for an hour a week over four weeks, it covered the basics of automating tasks, undertaking exploratory analysis, visualizing data and generating summary statistics, all in the context of answering questions as a policy advisor.
The course went well. So well, in fact, that the most common request in participants’ course evaluations was for future courses to be longer. I also found:
The grammar of the Tidyverse made learning the basics much faster: I learned base R through a series of online courses, but managed to learn the core principles of the Tidyverse in little more than a day. For policy analysts and consultants, the Tidyverse’s more intuitive grammar also simply made more sense.
People didn’t need a background in statistics to quickly pick up the basics: the course amounted to a little over a full day of material and covered a lot of ground, but everyone kept up. From past courses I’ve seen, this isn’t guaranteed, so it was a pleasant surprise.
Practice was preferred to theory: I wasn’t a straight-A student, so I get this. Everything was picked up more quickly when it was made relevant to participants’ daily lives rather than draped in a purely theoretical framework.
Pipes are confusing: This is contentious, but I remember feeling the same way when I first started learning R. I love pipes now, but people in my sessions preferred nested function calls, and introducing pipes so early was just distracting.
People loved data viz with ggplot: However, this was more because of ggplot’s ability to quickly segment and visualize data (such as by applying facets to demographic classifications) than because of the quality of what it could produce. This makes sense given that much of a policy analyst’s work is exploratory analysis used to inform written recommendations, rather than charts that get presented.
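To illustrate the pipes point, here is a small sketch comparing a nested call with a piped one. It uses base R’s native pipe `|>` (available from R 4.1) as a stand-in for the Tidyverse’s `%>%`, which behaves the same way for a simple chain like this; the income figures are hypothetical.

```r
# Hypothetical weekly incomes for a small sample (NA = missing response)
incomes <- c(520, 610, 480, 750, NA, 580)

# Nested version: read from the inside out
nested <- round(mean(log(incomes), na.rm = TRUE), 2)

# Piped version: the same steps, read left to right
piped <- incomes |> log() |> mean(na.rm = TRUE) |> round(2)

identical(nested, piped)  # TRUE: both give the same answer
```

Both lines do exactly the same work; the difference is only the reading order, which is why beginners who are used to nested spreadsheet formulas may find the first version more familiar.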
So where to from here? Well, aside from shamelessly rebranding my 2019 article for 2020, I’ve been convinced to develop a longer and more widely accessible online version of the free course, to meet demand from those who wanted to join but couldn’t because of time constraints or being in the wrong city or country.
That is the second reason I wanted to write this up: if you’re a fellow R/Python programmer in the policy/consulting space, I’d love to hear your thoughts on what you think is useful. So if that’s you, feel free to drop me a line via LinkedIn, Twitter or the contact form here.
And if you or someone you know is interested in signing up for the first run of the online crash course in R, you can do so via program4policy.com
With the rise of ‘Big Data’, ‘Machine Learning’ and the ‘Data Scientist’ has come an explosion in the popularity of using open-source programming tools for data analysis.
This article provides a short summary of some of the evidence that these tools are overtaking commercial alternatives, and of why, if you work with data, adding an open-source programming language like R or Python to your professional repertoire is likely to be a worthwhile career investment for 2019 and beyond.
Like most faithful public policy wonks, I’ve spent more hours than I can count dragging numbers across a screen to understand, analyse or predict whatever segment of the world I have data on.
New policy questions, new approaches to answer them and a fresh set of data.
Yet every silver lining has a cloud. In my experience with data, it’s often the need to scale a new learning curve to conform to legacy systems and satisfy an organization’s fetish for its statistical software of choice.
Excel, SAS, SPSS, EViews, Minitab, Stata and the list goes on.
Which is why I decided this article needed to be written. Not only am I tired of explaining to fellow analytical wonks why they’re limiting themselves by only being able to work on data in spreadsheets, but there are also distinct professional advantages to unshackling yourself from the tyranny of proprietary tools:
Open-Source Statistics Is Becoming the Global Standard
Firstly, if you haven’t been watching, the world is increasingly full of data. So much data that the world is chasing after nerds to analyse it. As a result, demand for a ‘jack of all trades’ data person, or “data scientist”, has been outstripping demand for a more vanilla-flavoured ‘statistician’:
% Job Advertisements with term “data scientist” vs. “statistician”
And although you might not aspire to work in what the Harvard Business Review called the ‘Sexiest Job of the 21st Century’, the data gold rush has had implications far beyond the sex appeal of nerds.
So much so that some of the best evidence suggests not only that demand for quants with R and Python skills is booming, but that the practical use of open-source statistical tools like R and Python is starting to eclipse their proprietary relatives.
Of course, I’m not here to argue that any particular piece of software is a ‘silver bullet’, only that something has happened in the world of data that the quantitatively inclined shouldn’t ignore: not only are R and Python becoming programming languages for the masses, but they’re increasingly proving themselves as powerful complements to more traditional business analysis tools like Excel and SAS.
But I’m going to sidestep the R-versus-Python question, as in my opinion much of what I say about R is increasingly applicable to Python.
For those of you unfamiliar with R, it’s in essence a programming language for getting computers to do stuff with numbers.
Enter “10*10” and it will tell you “100”.
Enter “print('Sup?')” and the computer will speak to you like that kid loitering on your lawn.
R was developed around 25 years ago, and the idea behind it was, in essence, to create a simpler, more open and extensible programming language for statisticians: something that gave you greater power and flexibility than a ‘point and click’ interface, but was quicker than punch cards or manually keying in 1s and 0s to tell the computer what to do.
The result: R, a free statistical tool whose sustained growth has made it one of the most flexible statistical tools in existence.
So much growth, in fact, that in 2014 the community added enough new functionality to R that “R added more functions/procs than the SAS Institute has written in its entire history.” And while it’s not the quantity of your software packages that counts, the speed of development is impressive and a good indication of the likely future trajectory of R’s functionality, particularly as heavy hitters like Microsoft, IBM and Google are already using R and making their own contributions to the ecosystem:
Using R for Analytics – Get in Before George Clooney Does:
Not only that, but with much of this growth driven by user contributions, it is also a great reminder of the active and supportive community you have access to as an R or Python user, which makes it easier to get help, access free resources and find example code to steal (ahem, base your analysis on).
One of the first things that motivated me to learn R was the observation that many of the most interesting questions I encountered went unanswered because they crossed disciplines, involved obscure analytical techniques, or were locked away in a long-forgotten format. It therefore seemed logical that if I could become a data analytics “MacGyver”, I’d have greater opportunities to work on interesting problems.
Which is how I discovered R. You see, as somebody who is interested in almost everything, I found R’s adoption by such a diverse range of fields nearly impossible to overlook. With freely available extensions for working with a wide variety of data formats (proprietary or otherwise) and applying a range of nerdy methods, R made a lot of sense.
Yet there is perhaps a subtler reason adopting R made sense: by being ‘discipline agnostic’, it’s well suited to multidisciplinary teams, applied multipotentialites and anyone uncertain about exactly where their career might take them.
4. R Helps Avoid Fitting the Problem to the Tool
As an economist, I love a good echo chamber. Not only does everybody speak my language and get my jokes, but my diagnosis of the problem is always spot-on. Unfortunately, thanks to the errors of others, I’m aware that such cosy teams of specialists aren’t always a good idea: homogeneous specialist teams risk developing solutions that aren’t fit for purpose by defining a problem too narrowly and misunderstanding the scope of the system it’s embedded in.
While good organizations are doing their best to address this, creating teams that are multidisciplinary and have more diverse networks can be a useful way to protect against these risks while also driving better performance. Which, of course, is another advantage of using more general statistical tools with a diverse user base like R: you can collaborate more fluidly across disciplines and are better able to pick the right technique for your problem, reducing the risk that everything looks like a nail merely because you have a hammer.
5. Programming Encourages Reproducibility
Yet programming languages also hold an additional advantage over more typical ‘point and click’ interfaces for conducting analysis: transparency and reproducibility.
For instance, because software like R encourages you to write down each step of your analysis, your work is more likely to be ‘reproducible’ than had it been done using more traditional ‘point and click’ solutions. Recording each step needed to reach the final result makes it easier for your colleagues to understand what the hell you’re doing, and increases the likelihood that you (or somebody else) will be able to reproduce the results when you need to.
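A minimal sketch of what “writing down each step” looks like in practice, using a small hypothetical survey extract (the column names and values are made up). Each cleaning decision is recorded in the script, so a colleague re-running it gets the same table:

```r
# Hypothetical raw data, as it might arrive from a survey export
raw <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  income = c("52,000", "61,000", "n/a", "48,000", "75,000")
)

# Step 1: convert income text to numbers (commas removed, "n/a" becomes NA)
raw$income_num <- suppressWarnings(as.numeric(gsub(",", "", raw$income)))

# Step 2: drop responses with missing income
clean <- raw[!is.na(raw$income_num), ]

# Step 3: summarise mean income by region
summary_by_region <- aggregate(income_num ~ region, data = clean, FUN = mean)
print(summary_by_region)
```

If someone later disagrees with dropping the missing response, that decision is visible in step 2 and can be changed and re-run in seconds, which is much harder to trace after a series of clicks.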
As well as being practically useful for tracing your journey through the data-analysis maze, for analytical teams this can also encourage collaboration by allowing others to more easily understand your work and replicate your results. It assists with organizational knowledge retention and provides an incentive to keep analysis accurate, often making it easier to spot errors before they affect your analysis or soil your reputation.
Finally, while scripting isn’t unique to open-source programming languages, R and Python come with an additional advantage by being free: if you decide to release your analysis, the potential audience is likely to be larger and more diverse than had it been written using proprietary software. This is why, in a world of the “Open Government Partnership”, open-source programming languages make a lot of sense, easing the transition towards governments publicly releasing their policy models.
6. R Helps Make Bytes Beautiful
As data-driven everything becomes all the rage, making data pretty is an increasingly important skill. R is great at this, with virtually unlimited options for unleashing your creativity on the world and communicating your results to the masses. Bar graphs, scatter diagrams, histograms and heat maps. Easy.
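As a minimal sketch of how little code a segmented scatter diagram takes, the example below assumes the ggplot2 package is installed and uses randomly generated, hypothetical survey data (the variable names are made up). `facet_wrap()` splits one chart into a panel per group:

```r
library(ggplot2)

# Hypothetical survey data: hours worked and weekly income by age group
set.seed(2019)
survey <- data.frame(
  age_group    = rep(c("18-34", "35-54", "55+"), each = 50),
  hours_worked = runif(150, min = 10, max = 50)
)
survey$weekly_income <- 25 * survey$hours_worked + rnorm(150, sd = 100)

# facet_wrap() draws one scatter diagram per age group, on shared axes
p <- ggplot(survey, aes(x = hours_worked, y = weekly_income)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ age_group) +
  labs(x = "Hours worked per week", y = "Weekly income ($)")

print(p)
```

Swapping `age_group` for another classification (region, sex, income band) is a one-word change, which is what makes this so handy for exploratory work.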
But R’s visualization tools don’t stop at your desk: the ‘Shiny’ package lets you take your pie graphs to the big time by publishing interactive dashboards on the web. Boss asking you to redo a graph 20 times a day? Outsource your work to the web by automating it through a dashboard, and send them a link while you sip cocktails at the beach.
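Here is a minimal sketch of such a dashboard, assuming the shiny package is installed. The budget figures and region names are hypothetical, and this is one simple way to structure an app, not the only one. The boss picks a sector from a drop-down and the chart redraws itself:

```r
library(shiny)

# Hypothetical spending by region (illustrative numbers only)
budget <- data.frame(
  region    = c("North", "South", "East", "West"),
  health    = c(120, 95, 80, 110),
  education = c(150, 130, 90, 140)
)

# UI: a drop-down to pick a sector, and a space for the chart
ui <- fluidPage(
  titlePanel("Budget explorer (sketch)"),
  selectInput("sector", "Sector", choices = c("health", "education")),
  plotOutput("bar")
)

# Server: redraw the bar chart whenever the selected sector changes
server <- function(input, output, session) {
  output$bar <- renderPlot({
    barplot(budget[[input$sector]],
            names.arg = budget$region,
            ylab = "Spending ($ millions)",
            main = input$sector)
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the dashboard locally
```

Running it locally opens the app in a browser; hosting it somewhere like shinyapps.io or a Shiny Server is what turns it into the link you send.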
7. R and Python Are Free, but the Cost of Ignoring the Trend Towards Open-Source Statistics Won’t Be
Finally, R and Python are free, meaning not only can you install them wherever you want, but you can also take them with you throughout your career:
Statistics lecturers prescribing textbooks that try to get you hooked on expensive software that likely won’t exist when you graduate? Tell them it’s not 1999 and send them a link to this.
Working for a not-for-profit organization that needs statistical software but can’t afford proprietary licences? Install R and show them how to install Swirl’s free interactive lessons.
Want to install it at home? No problem. You can even give a copy to your cat.
Got a promotion and been gifted a new team of statisticians? Swap the Christmas bonuses for the gift that keeps giving: R!
Put simply, I’d like to suggest that for all the immediate costs involved in learning an open-source programming language, whether R or Python, the long-term benefits are more than likely to outweigh them.
Not only that, but as a new generation of data scientists continues to push for the use of open-source tools, it’s reasonable to expect R and Python to become as pervasive a business tool as the spreadsheet, and as important to your career as laughing at your boss’s terrible jokes.
My latest op-ed was published in Monday’s edition of the Myanmar Times.
The article provides a brief summary of Myanmar’s democratic and economic reforms as they relate to the country’s management of its public finances. A summary of the article and a link to the full piece are provided below.
Catalysing transition through public financial management reform
By Giles Dickenson-Jones and Matthew Arnold
Public financial management reforms are central to Myanmar’s entire transition. Improvements to social services like garbage collection, investment in new roads and bridges, and raising standards of health and education are all premised on the government being able to raise more revenue and then effectively spend it achieving policy goals. In order for the National League for Democracy government to achieve its goals for economic and political reform, it is therefore a critical area for prioritisation.
The paper ‘Intergovernmental Fiscal Relations in Myanmar’ takes a look at how Myanmar’s State, Region and Union governments relate to each other as part of budget and planning processes.
Although it is targeted at a more general audience, it has been developed to provide greater clarity around the informal and formal processes that shape public budgeting and fiscal decentralization in Myanmar.
As somebody who started his career in government, I think perhaps one of the strengths of this report was that many of the initial findings were tested at the drafting stage in an interactive workshop the team held in Naypyitaw.
Yet another quiet couple of months on the blogging front can be explained by my feverish work on a number of projects as I reach my two-year anniversary in Myanmar. The latest of these has been the launch of the Open Myanmar Initiative’s Budget Dashboard, which is now available online here.
The website, which I helped develop using the open-source R language and the free Shiny library, provides the first user-friendly interface for exploring Myanmar’s budgets, both at the Union level and across all 14 States and Regions.
Although there is still a long way to go before citizens become genuinely engaged with the budget process, I think this is a significant first step in the right direction and will allow interested citizens, researchers and businesses to more easily examine where public money is spent, so that a conversation about where it should be spent can be had. I’m also encouraged to see public finances included in the National League for Democracy’s economic policies.
The budget dashboard is part of the Asia Foundation’s support for an open budget process in Myanmar, in partnership with the Open Myanmar Initiative (OMI). OMI’s Budget Explorer was developed by Ewan Keith, Loren Velasquez and Giles Dickenson-Jones with the help of Statistics Without Borders.
* Postscript: As an update, the original budget dashboard described in this post has since been taken over (and greatly improved!) by the locally based budget and parliamentary transparency organization ‘The Ananda’. The link has been updated to reflect this.
For the few of you who might be interested in knowing more about how Myanmar’s taxation system works outside of the union government, I’ve recently published a briefing note with a colleague on the topic. The note is available online here.
So no doubt you’ve all noticed I’ve been rather silent lately on ye olde interweb. Although there is of course no excuse for this, it’s predominantly the result of working rather intensely on a piece of research looking at Myanmar’s public finance system:
This paper focuses on understanding the role of state and region governments in relation to Myanmar’s public finances. This has been done to take stock of existing research, better understand the composition of subnational finances, and attempt to address whether, at this point in the fiscal decentralization process, state and region governments have sufficient resources to fulfil their constitutionally delegated responsibilities. Recognizing the complex and varied factors relevant to addressing these questions, a range of qualitative and quantitative approaches were employed, including semi-structured interviews of stakeholders, consultation with sector experts and analysis of published budget and socioeconomic data.
I was lucky enough to be invited to speak at a leadership conference about applying a ‘Bright Spots’ approach to tackling problems and have received a number of requests for further information around the idea.
At the outset, I should make it clear to everyone that I unfortunately did not come up with this idea. Rather, the approach was popularized by Chip and Dan Heath in their book ‘Switch’.
There I am, trying my best to grow a prize pumpkin so as to decimate my neighbour Jim in the annual harvest festival.
But lo and behold, after three months, six of the ten pumpkins have barely grown at all and another two appear to have ceased to live.
But I’m determined. After all Jim couldn’t be more deserving of a trouncing at the pumpkin festival.
So I begin to try and figure out the problem, checking the acidity of the soil, ensuring my automated watering system is working, my gate is locked to keep Jim out and ensuring there is sufficient horse manure to keep my infant pumpkins thriving past their awkward teenage years.
But here’s the problem: while I’m busy chastising my dog for the teeth marks on the watering system (which Jim assured me were not his), I’m diverting all my attention into solving pumpkin-related problems, rather than trying to replicate pumpkin-related success.
Put simply, I’m ignoring those two pumpkins which appear to be thriving.
And in a nutshell this is the idea behind the ‘Bright Spots’ approach: don’t solve problems, copy success.
Bright Spots and Fighting Child Malnutrition
It is also a helpful reminder in the world of international development where we can become obsessed with the process of solving problems, when the solutions might have already presented themselves through past success.
Now Jerry, although he knew very little about Vietnam, knew a lot about the causes of malnutrition: poor sanitation, poverty and a lack of clean water.
But how does a person make a dent in these problems in six months?
Taking the context as given, he started looking for ‘Bright Spots’.
He did this by touring village after village and looking for children who were less malnourished than their peers, despite facing the same context of poverty and poor sanitation.
From this he then started to build a picture of what the mothers of these children were doing differently.
What he found was striking. You see, the accepted wisdom was that for children to avoid malnutrition, their parents should feed them soft foods with clean rice twice a day.
Yet the mothers of the ‘Bright Spot’ children were doing something quite different.
Firstly, instead of feeding their children two times a day, they were feeding the same amount of food over four smaller meals, allowing more nutritional value to be taken from the same amount of food.
Secondly, they were supplementing the meals with locally available food (such as crabs and shrimp which lived in the rice paddies), which provided an additional source of protein and nutrients.
Armed with this knowledge, he started to implement cooking classes run by the ‘Bright Spot’ mothers to cement the knowledge.
The results?
Six months after Sternin had come to the Vietnamese village, 65% of the kids were better nourished and stayed that way.
The program was expanded and today reaches 2.2 million Vietnamese people in 265 villages (Source).
Explaining the Outliers
But the significance of this approach extends far beyond pumpkins and shrimp.
In fact, in the world of economics this idea couldn’t be more relevant, as we are often looking for general relationships. Take the relationship between wealth and how happy people say they are, shown in the figure below:
Life satisfaction tends to increase with GDP per capita
Now, for the many of you who have made it your life’s work to avoid the painful process of interpreting graphs, the key takeaway is that, as a general rule, individuals in wealthier countries have greater levels of life satisfaction.
Genius right?
But we can clearly see that this isn’t true for all countries. For instance, Argentina’s average income is as high as New Zealand’s, but Argentinians aren’t very satisfied.
On the other hand, China is much poorer than France, but has higher levels of satisfaction.
In the world of statistics we might call China and New Zealand ‘outliers’, as they’re countries which seem to be bucking the trend.
Now although this is not very surprising, given that we all know that money doesn’t buy happiness (although it helps), it does provide a great example of how we might look to use the approach, even in the (sometimes) boring world of economics.
Instead of trying to get more happiness through raising incomes, why not examine what makes people in New Zealand and China more satisfied to see if we can replicate it?
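For the analytically inclined, here is a minimal sketch of how you might hunt for ‘Bright Spots’ in data like this: fit the general trend, then look at which observations sit furthest above or below it. The six countries and their figures are made up for illustration (they are not the data in the chart), and a simple straight-line fit on income is an assumption chosen for brevity.

```r
# Hypothetical data for six made-up countries (not real statistics)
countries <- data.frame(
  country      = c("A", "B", "C", "D", "E", "F"),
  gdp_k        = c(5, 10, 15, 20, 25, 30),        # GDP per capita, $ thousands
  satisfaction = c(4.5, 6.5, 5.5, 6.0, 5.0, 7.0)  # life satisfaction, 0-10 scale
)

# Fit the general relationship: satisfaction rises with income
fit <- lm(satisfaction ~ gdp_k, data = countries)

# Residuals measure how far each country sits above or below the trend
countries$residual <- round(resid(fit), 2)

bright_spot <- countries$country[which.max(countries$residual)]
laggard     <- countries$country[which.min(countries$residual)]

print(countries)
cat("Most satisfied relative to income:", bright_spot, "\n")
cat("Least satisfied relative to income:", laggard, "\n")
```

In this toy data, country B is the ‘Bright Spot’ worth studying and country E is the one bucking the trend the other way; the residual simply formalizes the eyeballing you would do on the chart.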
Want to develop professionally?
Perhaps build on your strengths, rather than focusing on the identified weaknesses.
Making a new year’s resolution?
Focus on those you’ve managed to keep and nurture success.
Growing pumpkins?
Steal your neighbour’s pumpkin seeds, rather than sabotaging their watering system.