Guilherme Jacob edited this page Aug 12, 2024 · 3 revisions

1. TITLE OF PRESENTATION

Measurement of Poverty and Inequality with Publicly Available Microdata

https://guilhermejacob.github.io/context/

2. NAME(S) AND ADDRESS(ES) OF INSTRUCTOR(S)

List by presentation order. Email and office phone and fax numbers are to be included. It is essential that the Education Department at ASA is notified of any changes that occur between the time of submission and the time of presentation.

Guilherme Jacob & Anthony Damico

3. ABSTRACT

Provide an abstract not to exceed 200 words of the proposed course including the prerequisite for the anticipated audience. If the course is selected, this abstract will be used for advertising purposes in the registration material and on the JSM web site. Prerequisite knowledge or assumptions regarding the background of the attendees must be included in the abstract. If the abstract is more than 200 words, it will be edited by ASA.

Governments, NGOs, and other research institutes spend billions of dollars each year collecting demographic, economic, and health information about their populations. These efforts form the basis of many official reports, academic journal articles, and public health surveillance systems, each of which motivates public policy or informs the public to varying degrees. Depending on the sensitivity of the topic, these sponsoring organizations often publish household-level, person-level, or company-level datasets alongside their final summary report. This response-level data (commonly known as microdata) allows external researchers both to reproduce the original findings and to focus more deeply on segments of the population perhaps not discussed in the data products released by the authors of the original investigation. For example, the Census Bureau publishes an annual report, "Income and Poverty in the United States," with a series of tables, and also a database with one record per individual within each sampled household. While the Bureau helpfully provides many different measures of income dispersion in its results, an external researcher might find utility in this dataset by investigating other measures of poverty or inequality (such as the Laeken measures, to make comparisons between the United States and the European Union), and so the public microdata files allow continued research where it otherwise might end. The website https://guilhermejacob.github.io/context/ offers a wide range of poverty, inequality, and richness measures applicable to many publicly available datasets using the R language. This textbook contains three core components, each with step-by-step instructions: (1) Data preparation of major economic wellbeing surveys from the United States and Brazil; (2) Poverty Indices; (3) Inequality Measurement.

4. OUTLINE

Provide a detailed outline of the entire program. Describe what will occur during each segment. DO NOT INCLUDE chapters of an upcoming book. Provide a description of the target audience.

Target Audience:

Researchers interested in conducting poverty or inequality research with the rich and varied public data available. This course could be of interest to anyone hoping to learn more about quantitative research, economics, public policy, demography, or any other field reliant on social statistics to better understand the distribution of income and wealth in a nation, state, or population.

Both beginner and advanced R users are welcome; however, some understanding of R syntax will be helpful, depending on the complexity of the microdata chosen. The instructors will attempt to guide participants toward datasets appropriate for their coding skill level.

15 minute discussion of interest in this topic:

We are personally interested in this topic because we hope to equip more researchers with the ability to calculate a Gini coefficient and other measures of income and wealth distribution. Population surveys show that the public wants inequality to be lower than they believe it to be, and also that the public believes inequality to be lower than it actually is:

https://doi.org/10.1016/j.resglo.2023.100118

https://doi.org/10.1177/1745691610393524

Participant introductions and responses to any (or all) of these questions:

1. What are your professional goals and interest in the topic of poverty and inequality research?

2. What microdata sets do you have interest in or experience with?

3. What's your level of experience with the R language and the survey package?  (Zero experience is OK!)

15 minute illustrative example to calculate the Gini coefficient by hand:

1. 10 families, each has $10: draw the Lorenz curve - Gini = 0

2. 10 families, nine have $0 while one has $100: draw the curve - Gini = 0.9 (approaching 1 as the number of families grows)

3. 10 families, two have $0, two have $5, two have $10, two have $15, two have $20

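x <- c( 0 , 0 , 5 , 5 , 10 , 10 , 15 , 15 , 20 , 20 )  # incomes, sorted from lowest to highest

y <- sum( x * seq_along( x ) )  # each income multiplied by its rank, then summed

z <- 2 * y / sum( x ) - ( length( x ) + 1 )

gini_coefficient <- z / length( x )  # equals 0.4 for these ten families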
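The same arithmetic can be wrapped into a function and applied to all three family examples. This is a minimal sketch in base R: the names `gini_by_hand` and `lorenz_points` are made up for illustration, and the calculation ignores survey weights and sampling design entirely. Note that with ten families, the case where one family holds everything gives 0.9 rather than exactly 1, because this formula tops out at (n - 1) / n.

```r
# Gini coefficient of an unweighted vector, using the rank-based formula
gini_by_hand <- function( x ) {
  x <- sort( x )                     # the formula assumes ascending incomes
  n <- length( x )
  ( 2 * sum( x * seq_along( x ) ) / sum( x ) - ( n + 1 ) ) / n
}

# Lorenz curve points: cumulative income share at each cumulative population share
lorenz_points <- function( x ) {
  x <- sort( x )
  data.frame(
    pop_share = seq_along( x ) / length( x ) ,
    income_share = cumsum( x ) / sum( x )
  )
}

equal_case   <- rep( 10 , 10 )
extreme_case <- c( rep( 0 , 9 ) , 100 )
mixed_case   <- c( 0 , 0 , 5 , 5 , 10 , 10 , 15 , 15 , 20 , 20 )

gini_by_hand( equal_case )    # 0
gini_by_hand( extreme_case )  # 0.9
gini_by_hand( mixed_case )    # 0.4

# points to plot when drawing the third Lorenz curve by hand
lorenz_points( mixed_case )
```

Plotting `income_share` against `pop_share` alongside the 45-degree line shows the area that the Gini coefficient summarizes.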

15 minute lecture introducing complex sampling:

  1. The NHANES mobile examination center performs in-person dental examinations and blood labs on a representative sample of the country but not a simple random sample of the country

  2. Interviewers administer the Consumer Expenditure Survey and the American Time Use Survey by instructing respondents how to record every expenditure in a ledger and every ten minutes of activity in a journal, respectively. Both of these result in representative samples, but neither is a simple random sample.

  3. The American Housing Survey visits each selected housing unit, collecting information with Computer-Assisted Personal Interviewing on both occupied and unoccupied housing units. Again, this allows for a dataset that generalizes to the country without being a simple random sample.

Fundamentally, a complex sample survey aims to save money on interviewer transportation by sampling geographies first and then people (or businesses or structures) within those geographies. Instead of sampling individuals nationwide, a survey administrator samples twenty towns and cities across the country and then samples multiple individuals within each of those areas. Before any sampling occurs, everyone nationwide can still have the same probability of being sampled. Once the first stage is complete and the geographies have been selected, the residents of the sampled geographies have a much higher probability of inclusion, and everyone else's inclusion probability drops to zero. Now, instead of sending an in-person survey team to ten thousand different locations across the country, the survey only needs to send teams to twenty, and the interviewer transportation budget looks much nicer.
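A small simulation can make the travel savings concrete. The sketch below uses base R and an entirely made-up population (1,000 equally sized towns of 500 residents each, an assumption chosen so that inclusion probabilities stay equal); real surveys use far more elaborate designs.

```r
set.seed( 2024 )

# hypothetical population: 1,000 towns with 500 residents each
n_towns <- 1000
town_size <- 500
population <- data.frame(
  town = rep( seq_len( n_towns ) , each = town_size ) ,
  person = seq_len( n_towns * town_size )
)

# stage one: sample 20 towns
sampled_towns <- sample( n_towns , 20 )

# stage two: sample 50 residents within each sampled town
two_stage <- do.call( rbind , lapply( sampled_towns , function( this_town ) {
  residents <- population[ population$town == this_town , ]
  residents[ sample( nrow( residents ) , 50 ) , ]
} ) )

# simple random sample of the same size, for comparison
srs <- population[ sample( nrow( population ) , nrow( two_stage ) ) , ]

# before any sampling, each resident has the same inclusion probability
two_stage_probability <- ( 20 / n_towns ) * ( 50 / town_size )
srs_probability <- nrow( srs ) / nrow( population )

# but interviewers travel to far fewer towns under the two-stage design
towns_visited <- c(
  two_stage = length( unique( two_stage$town ) ) ,
  srs = length( unique( srs$town ) )
)
towns_visited
```

Both designs interview 1,000 people, but the two-stage design concentrates them in twenty towns while the simple random sample scatters them across hundreds.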

15 minute hands-on loading of the CPS-ASEC, PNAD-Contínua, and/or SCF:

In 2006, the Financial Times reported on a team of researchers finding that “Personal wealth is distributed so unevenly across the world that the richest two per cent of adults own more than 50 per cent of the world’s assets while the poorest half hold only 1 per cent of wealth.” Although the original publication presented a global estimate, we will begin the session by reproducing this calculation using nationally representative surveys from both the United States and Brazil. We reproduce this inequality statistic with a variety of surveys and levels of analysis to highlight how this software can be used not only to estimate a number but also to understand the uncertainty around that number.
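The arithmetic behind a "richest two per cent own X per cent" statement can be sketched in base R. The function name `top_share` and the wealth values below are invented for illustration; this computes only a point estimate from weights and does not produce the standard errors that a design-based analysis of the actual surveys would.

```r
# share of total wealth held by the wealthiest fraction `p` of the weighted population
top_share <- function( x , w = rep( 1 , length( x ) ) , p = 0.02 ) {
  ord <- order( x , decreasing = TRUE )   # richest first
  x <- x[ ord ]
  w <- w[ ord ]
  cum_w <- cumsum( w ) / sum( w )         # cumulative population share
  prev <- c( 0 , head( cum_w , -1 ) )
  # portion of each record's weight that falls inside the top `p`
  inside <- pmax( 0 , pmin( cum_w , p ) - prev ) * sum( w )
  sum( inside * x ) / sum( w * x )
}

# made-up example: 98 adults hold 1 unit of wealth each, 2 adults hold 51 each
wealth <- c( rep( 1 , 98 ) , 51 , 51 )

top_share( wealth )              # 0.51, so the richest 2% hold 51% of all wealth
top_share( wealth , p = 0.5 )    # share held by the richest half
```

With real microdata, `w` would be the survey weight variable, so that each record counts for the number of people it represents.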

Textbook sections 1.4, 1.5, and 1.6 offer these three datasets; http://asdfree.com/ entries offer the same.

Differences between income, earnings, net worth, and consumption!

30 minute hands-on review of Chapter 1.7 - Real World Examples

BREAK FOR SNACKS!

30 minute whole class review of Gini coefficient entry:

  1. Strengths and weaknesses

  2. Mathematics and commentary

  3. Replication example

  4. Real World Examples

30 minute hands-on session replicating svygini() within every state in the US or Brazil
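A state-by-state run might look roughly like the sketch below. It assumes the survey and convey packages are installed and uses simulated data: the state codes, cluster structure, weights, and incomes are all made up, and the design variables in real CPS-ASEC or PNAD-Contínua files are named and structured differently.

```r
library( survey )
library( convey )

set.seed( 2024 )

# hypothetical microdata: 3 states (strata) x 4 sampled clusters x 50 households
toy <- expand.grid(
  household = 1:50 ,
  cluster = 1:4 ,
  state = c( "AA" , "BB" , "CC" ) ,
  stringsAsFactors = FALSE
)
toy$weight <- 1000

# made-up incomes, more dispersed in some states than in others
dispersion <- c( AA = 0.4 , BB = 0.8 , CC = 1.2 )
toy$income <- rlnorm( nrow( toy ) , meanlog = 10 , sdlog = dispersion[ toy$state ] )

# declare the complex sample design (clusters nested within strata)
toy_design <- svydesign(
  ids = ~ cluster ,
  strata = ~ state ,
  weights = ~ weight ,
  data = toy ,
  nest = TRUE
)

# convey functions expect this preparation step on the design object
toy_design <- convey_prep( toy_design )

# nationwide Gini coefficient with its standard error
national_gini <- svygini( ~ income , toy_design )

# the same statistic within every state
state_gini <- svyby( ~ income , ~ state , toy_design , svygini )
state_gini
```

Each row of `state_gini` carries a point estimate and a standard error, which makes it possible to judge whether differences between states exceed sampling noise.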

15 minute review of flowchart

[remaining time] minute review of http://asdfree.com/ to discuss additional microdata

5. LEARNING OUTCOMES: The following must be included in your proposal:

(a) Learning outcomes (performance objectives): The proposal must include a clear and concise statement of intended learning outcomes for the course. Learning outcomes are statements that identify what knowledge, skills and/or attitudes attendees are expected to accomplish/demonstrate as a result of the course. The attainment of the stated learning outcomes will be assessed as part of the CE Course evaluation process at the conclusion of the course so it is imperative that the presenter teach to these objectives.

Participants will ideally complete this course able to explain why governments and other organizations fund complex sample surveys rather than drawing simple random samples. Participants will also feel confident reproducing the nationwide Gini coefficient for the United States and Brazil directly from publicly available microdata. And because every poverty index and inequality measure on https://guilhermejacob.github.io/context/ follows a highly similar structure, participants will ideally see quickly that once they can use any one of these entries, applying the same knowledge to the others is straightforward.

(b) Content and instructional methods: The presenter must include a description of course content and instructional strategies based on the learning outcomes (performance objectives).

This hands-on course will begin with a PowerPoint-free discussion of the motivations behind inequality research methodology, followed by a mix of discussions of the wide breadth of available indicators and time to test out the most frequently used measure with an instructor present and available for questions.

6. INSTRUCTOR(S)

Paragraph highlighting instructor’s background and experience with subject. DO NOT include resumes and/or curriculum vitae.

Guilherme Jacob is currently a PhD candidate at the National School of Statistical Sciences (ENCE/IBGE) in Rio de Janeiro, Brazil. He studied for a B.Sc. in Economics at the Federal University of Amazonas in Manaus, Brazil, and completed an M.Sc. in “Population, Territory and Public Statistics” at ENCE/IBGE. He was also a Leslie Kish Fellow at the University of Michigan’s 2020 Sampling Program for Survey Statisticians and the recipient of the 2021 Cochran-Hansen Prize, awarded by the International Association of Survey Statisticians.

Anthony Damico is an independent consultant who conducts data analysis for health care policy research. He has published in peer-reviewed policy and methods journals using the R, SAS, Stata, and SUDAAN statistical programming languages. Prior to becoming an independent consultant, he was with the Kaiser Family Foundation in Washington, D.C. Anthony holds a Bachelor’s degree in Mathematics from Oberlin College and a Master’s in Health Policy from Johns Hopkins University.

7. AUDIO-VISUAL EQUIPMENT

Each presentation will be provided with one screen, one data projector and one lavaliere microphone. A flip chart and second screen are available at no extra charge upon request. Presenters desiring additional AV equipment are responsible for additional equipment expense. Details are available upon request.

The presenters will need only a projector and wifi for a laptop.

Participants will need a laptop with wifi and R installed.