
One-day Workshop: Managing Demographic Big Data Analysis in R

Efficient data management and analysis of the Human Fertility Database and Human Mortality Database using tidyverse paradigms

The Sociological Research Methods Cluster of the Department of Sociology will host a one-day workshop on large-scale data analysis using the R statistical programming language on Monday, November 27.

Thanks to support from the UL Research Office, we can offer a limited number of places free of charge to current UL research students. Fees for all other UL-affiliated participants will be €50, and for participants with a non-UL academic affiliation, €75.

To apply, e-mail Anne McCarthy, anne.mccarthy@ul.ie.


Abstract

The Human Fertility Database (HFD) and Human Mortality Database (HMD) contain a wealth of high-quality data on population, mortality and fertility variables from dozens of countries, spanning many decades and, in some cases, centuries. When downloaded, the two databases comprise hundreds of separate files and total around half a gigabyte.

Using the tidy data and functional programming paradigms of the R 'tidyverse', this one-day course will present a generalisable workflow for working efficiently and effectively with these two databases. It will illustrate data management patterns that can be applied to many other sources of complex data. In the first part, the two paradigms will be used to build functional programming solutions for reading in and combining data from many separate files, producing two large datasets, each in a tidy data format. In the second part, these tidied and combined datasets will be explored non-programmatically using dplyr verbs and piping, which make R code much more readable and intelligible to humans.

Finally, the third part of the course will again use functional programming approaches to batch-produce both graphical analyses and derived data outputs for dozens of different populations. This allows outputs to be produced consistently in seconds rather than hours or days. If time permits, the course will conclude with additional bespoke analyses of HFD and HMD data and a discussion of how the workflow used throughout the course can be applied to other forms of data, such as the British Household Panel Survey.

Instructor

Dr Jon Minton is a research associate based at the College of Social Sciences, University of Glasgow. His research focuses on demographic data visualisation using Lexis surfaces and the 3D printing of demographic data. More recently, he has used insights from complex data visualisation to develop better models of demographic and epidemiological processes within an integrated workflow. Two recent examples of this work, including one on the Troubles in Northern Ireland, are available as pre-prints from the links below: