urbnindicators

Overview

urbnindicators aims to provide users with analysis-ready data from the American Community Survey (ACS).

With a single function call, you get:

Access to hundreds of standardized variables, such as percentages and the raw count variables used to produce them.
Margins of error for all variables, both those taken directly from the API and derived variables.
Meaningful, consistent variable names.
A codebook that describes how each variable is calculated.
The built-in capacity to pull data for multiple years and multiple states.
Supplemental measures, such as population density, that aren’t available from the ACS.
Built-in quality checks to help ensure that calculated variables and measures of error are accurate, plus some good, old-fashioned manual QC. That said, use at your own risk: we cannot and do not guarantee there aren’t bugs.

Installation

Install the development version of urbnindicators from GitHub with:

# install.packages("renv")
renv::install("UI-Research/urbnindicators")

You’ll want a Census API key, which you can request from the Census Bureau. Set it once with:

tidycensus::census_api_key("YOUR_KEY", install = TRUE)
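
Before pulling data, it can help to confirm that the key is visible to the current R session. The sketch below assumes the key is stored in the CENSUS_API_KEY environment variable, which is where tidycensus writes it when install = TRUE; the helper census_key_is_set() is hypothetical and is not part of urbnindicators or tidycensus.

```r
## hypothetical helper: TRUE when a non-empty key is available
## (assumes the key lives in the CENSUS_API_KEY environment variable)
census_key_is_set <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  nzchar(key)
}

if (!census_key_is_set()) {
  message("No Census API key found; set one with tidycensus::census_api_key().")
}
```

After installing a key with install = TRUE, you may need to restart R (or run readRenviron("~/.Renviron")) before the key appears in the session.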

Note that this package is under active development and is updated frequently, so check that you have the most recent version installed!

Use

Discover Available Data

library(urbnindicators)

list_tables() |> head(10)
#>  [1] "age"                    "computing_devices"      "cost_burden"           
#>  [4] "disability"             "educational_attainment" "employment"            
#>  [7] "gini"                   "health_insurance"       "household_size"        
#> [10] "income_quintiles"
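
Because the table names are returned as a plain character vector, you can search them with base R pattern matching. The sketch below hard-codes the ten names from the output above rather than calling the package, so it only illustrates the filtering step; the search pattern is an arbitrary example.

```r
## table names copied from the output above (first ten only)
table_names <- c(
  "age", "computing_devices", "cost_burden", "disability",
  "educational_attainment", "employment", "gini", "health_insurance",
  "household_size", "income_quintiles")

## keep names that mention income or cost
matching <- grep("income|cost", table_names, value = TRUE)
matching
```

In practice you would likely pass the result of list_tables() in place of the hard-coded vector.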

Obtain Data

A single call to compile_acs_data() returns analysis-ready data with pre-computed percentages, meaningful variable names, and margins of error:

library(dplyr)

df = compile_acs_data(
  tables = "race",
  years = c(2019, 2024),
  geography = "county",
  states = "NJ")

df %>%
  select(1:10) %>%
  glimpse()
#> Rows: 42
#> Columns: 10
#> $ data_source_year             <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019,…
#> $ GEOID                        <chr> "34025", "34037", "34013", "34015", "3403…
#> $ NAME                         <chr> "Monmouth County, New Jersey", "Sussex Co…
#> $ total_population_universe    <dbl> 621659, 141483, 795404, 291165, 503637, 9…
#> $ race_universe                <dbl> 621659, 141483, 795404, 291165, 503637, 9…
#> $ race_nonhispanic_allraces    <dbl> 554491, 129866, 612222, 273106, 294434, 7…
#> $ race_nonhispanic_white_alone <dbl> 467752, 122081, 242965, 228576, 208005, 5…
#> $ race_nonhispanic_black_alone <dbl> 41697, 2991, 305796, 28452, 52523, 49249,…
#> $ race_nonhispanic_aian_alone  <dbl> 440, 16, 1107, 204, 651, 1000, 123, 191, …
#> $ race_nonhispanic_asian_alone <dbl> 33451, 2887, 41976, 9002, 25732, 151090, …
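
To make the relationship between count variables and percentages concrete, here is a minimal sketch that derives a share by hand from the first two rows shown above. The counts are copied from the glimpse() output; the column name share_white_alone is made up for illustration and is not necessarily the name the package uses for its pre-computed percentage.

```r
## counts copied from the first two rows of the output above
counties <- data.frame(
  GEOID = c("34025", "34037"),
  race_universe = c(621659, 141483),
  race_nonhispanic_white_alone = c(467752, 122081))

## share = count / universe (illustrative column name)
counties$share_white_alone <-
  counties$race_nonhispanic_white_alone / counties$race_universe

round(counties$share_white_alone, 3)
```

The package computes percentages like this for you, along with their margins of error, so a manual step of this kind should only be needed as a sanity check.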

Visualize Data

compile_acs_data() makes it easy to pull multiple years and produce publication-ready visualizations:

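library(ggplot2)
library(stringr)
library(tidyr)
library(urbnthemes) ## provides palette_urbn_main and theme_urbn_print()

plot_data = df %>%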
  transmute(
    county_name = NAME %>% str_remove(" County, New Jersey"),
    race_personofcolor_percent,
    race_personofcolor_percent_M,
    data_source_year = factor(data_source_year))

state_averages = plot_data %>%
  summarize(
    .by = data_source_year,
    mean_pct = mean(race_personofcolor_percent)) %>%
  arrange(data_source_year) %>%
  pull(mean_pct)

## order counties by 2019 value for the dumbbell plot
county_order = plot_data %>%
  filter(data_source_year == "2019") %>%
  arrange(race_personofcolor_percent) %>%
  pull(county_name)

plot_data = plot_data %>%
  mutate(county_name = factor(county_name, levels = county_order))

dumbbell_data = plot_data %>%
  pivot_wider(
    id_cols = county_name,
    names_from = data_source_year,
    values_from = race_personofcolor_percent,
    names_prefix = "year_")

ggplot() +
  geom_segment(
    data = dumbbell_data,
    aes(
      x = county_name,
      y = year_2019,
      yend = year_2024),
    color = palette_urbn_main[7],
    linewidth = 1) +
  ggdist::stat_gradientinterval(
    data = plot_data,
    aes(
      x = county_name,
      ydist = distributional::dist_normal(
        race_personofcolor_percent,
        race_personofcolor_percent_M / 1.645),
      color = data_source_year),
    point_size = 2,
    .width = .95) +
  geom_hline(
    yintercept = state_averages[1],
    linetype = "dashed",
    color = palette_urbn_main[1]) +
  geom_hline(
    yintercept = state_averages[2],
    linetype = "dashed",
    color = palette_urbn_main[2]) +
  annotate(
    "text",
    y = state_averages[1] - .15,
    x = 21.5,
    label = "State mean (2019)",
    fontface = "bold.italic",
    color = palette_urbn_main[1],
    size = 9 / .pt,
    hjust = 0,
    nudge_y = .01) +
  annotate(
    "text",
    y = state_averages[2] + .01,
    x = 21.5,
    label = "State mean (2024)",
    fontface = "bold.italic",
    color = palette_urbn_main[2],
    size = 9 / .pt,
    hjust = 0,
    nudge_y = .01) +
  labs(
    title = "All NJ Counties Experienced Racial Diversification from 2019 to 2024",
    subtitle = paste0("Share of population who are people of color, by county, 2019-2024
Confidence intervals are presented around each point but are extremely small"),
    x = "",
    y = "Share of population who are people of color") +
  scale_x_discrete(expand = expansion(mult = c(.03, .04))) +
  scale_y_continuous(
    breaks = c(0, .25, .50, .75, 1.0),
    limits = c(0, .75),
    labels = scales::percent) +
  coord_flip() +
  theme_urbn_print()
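
The plot divides each margin of error by 1.645 because ACS margins of error are published at the 90 percent confidence level, so dividing by 1.645 recovers an approximate standard error. The sketch below shows that conversion and the resulting 95 percent interval under a normal approximation; moe_to_interval() is a hypothetical helper and the input values are invented for illustration.

```r
## hypothetical helper: convert a 90% margin of error to a standard error,
## then build a confidence interval at the requested level
moe_to_interval <- function(estimate, moe_90, level = 0.95) {
  se <- moe_90 / 1.645                  # 90% MOE -> standard error
  z <- qnorm(1 - (1 - level) / 2)       # about 1.96 for a 95% interval
  data.frame(
    estimate = estimate,
    se = se,
    lower = estimate - z * se,
    upper = estimate + z * se)
}

## invented estimates and margins of error
intervals <- moe_to_interval(
  estimate = c(0.25, 0.60),
  moe_90 = c(0.01645, 0.0329))
intervals
```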

Custom Geographies

ACS data are available for standard geographies (tracts, counties, states, etc.), but many analyses require non-standard areas like neighborhoods, school zones, or planning districts. calculate_custom_geographies() aggregates tract-level data to any user-defined geography, properly re-deriving percentages and propagating margins of error:

dc_tracts = compile_acs_data(
  tables = "snap",
  years = 2024,
  geography = "tract",
  states = "DC",
  spatial = TRUE)

## assign each tract to a quadrant based on its centroid
dc_tracts = dc_tracts %>%
  mutate(
    centroid = sf::st_centroid(geometry),
    lon = sf::st_coordinates(centroid)[, 1],
    lat = sf::st_coordinates(centroid)[, 2],
    quadrant = case_when(
      lon <  median(lon) & lat >= median(lat) ~ "NW",
      lon >= median(lon) & lat >= median(lat) ~ "NE",
      lon <  median(lon) & lat <  median(lat) ~ "SW",
      lon >= median(lon) & lat <  median(lat) ~ "SE")) %>%
  select(-centroid, -lon, -lat)

## aggregate tracts to quadrants
dc_quadrants = calculate_custom_geographies(
  .data = dc_tracts,
  group_id = "quadrant",
  spatial = TRUE)

dc_quadrants %>%
  sf::st_drop_geometry() %>%
  select(GEOID, snap_received_percent, snap_received_percent_M)
#>   GEOID snap_received_percent snap_received_percent_M
#> 1    NE            0.15951925             0.019448994
#> 2    NW            0.07036185             0.006889427
#> 3    SE            0.24445974             0.012073306
#> 4    SW            0.06525691             0.012003668

See vignette("custom-geographies") for more.
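
For intuition about what re-deriving percentages and propagating margins of error involves, here is a minimal base R sketch using the Census Bureau's standard approximation formulas for sums and proportions. It is not taken from the package's source and may differ from what calculate_custom_geographies() does internally; the tract values, column names, and helper functions are all hypothetical.

```r
## hypothetical tract-level counts and 90% margins of error (not real ACS values)
tracts <- data.frame(
  quadrant = c("NE", "NE", "NW", "NW"),
  received = c(120, 80, 30, 50),
  received_M = c(30, 40, 12, 16),
  universe = c(1000, 600, 900, 1100),
  universe_M = c(60, 80, 50, 120))

## MOE of a sum: square root of the sum of squared MOEs
moe_of_sum <- function(moe) sqrt(sum(moe^2))

## MOE of a proportion p = num / den; uses the ratio formula instead
## when the term under the square root would be negative
moe_of_prop <- function(num, den, num_moe, den_moe) {
  p <- num / den
  under_root <- num_moe^2 - p^2 * den_moe^2
  under_root <- ifelse(under_root < 0, num_moe^2 + p^2 * den_moe^2, under_root)
  sqrt(under_root) / den
}

## sum counts within each quadrant and combine their MOEs
quadrants <- do.call(rbind, lapply(split(tracts, tracts$quadrant), function(g) {
  data.frame(
    quadrant = g$quadrant[1],
    received = sum(g$received),
    received_M = moe_of_sum(g$received_M),
    universe = sum(g$universe),
    universe_M = moe_of_sum(g$universe_M))
}))

## re-derive the percentage from the aggregated counts, never by averaging tract percentages
quadrants$received_percent <- quadrants$received / quadrants$universe
quadrants$received_percent_M <- with(
  quadrants,
  moe_of_prop(received, universe, received_M, universe_M))
quadrants
```

The key design point is that the percentage is recomputed from summed numerators and denominators, which weights each tract by its population instead of treating all tracts equally.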

Custom Derived Variables

Beyond the package’s built-in tables, you can define your own derived variables using the define_*() helpers and pass them directly to compile_acs_data(). Your custom variables automatically get codebook entries and margins of error:

df = compile_acs_data(
  tables = list(
    "snap",
    define_percent(
      "snap_not_received_percent",
      numerator_variables = c("snap_universe"),
      numerator_subtract_variables = c("snap_received"),
      denominator_variables = c("snap_universe")),
    define_one_minus(
      "snap_received_complement",
      source_variable = "snap_received_percent")),
  years = 2024,
  geography = "county",
  states = "DC")

df %>%
  select(matches("snap.*percent")) %>%
  glimpse()
#> Rows: 1
#> Columns: 4
#> $ snap_received_percent       <dbl> 0.143
#> $ snap_not_received_percent   <dbl> 0.857
#> $ snap_received_percent_M     <dbl> 0.0064
#> $ snap_not_received_percent_M <dbl> 0.0071
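
The two custom definitions above describe the same quantity in two ways: subtracting the recipients from the universe before dividing, and taking the complement of the existing percentage. The sketch below checks that arithmetic with invented counts; it reflects what the argument names suggest and is not drawn from the package's internals.

```r
## invented counts for illustration
snap_universe <- 1000
snap_received <- 143

## numerator-minus-subtract over denominator, as in the define_percent() call
not_received_a <- (snap_universe - snap_received) / snap_universe

## complement of the existing percentage, as in the define_one_minus() call
not_received_b <- 1 - snap_received / snap_universe

all.equal(not_received_a, not_received_b)
```

The point estimates agree, but note that in the output above the two margins of error differ (0.0064 versus 0.0071), so the route you choose can affect the reported error.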

The available helpers are:

Helper	Use case
`define_percent()`	Ratio of a numerator to a denominator
`define_across_percent()`	Percentages for every column matching a regex
`define_across_sum()`	Sum paired columns (e.g., male + female counts)
`define_one_minus()`	Complement of an existing percentage (1 - x)
`define_metadata()`	Codebook-only entry for a non-computed variable

See vignette("custom-derived-variables") for detailed examples of each helper.

Learn More

Check out the vignettes for additional details:

A package overview to help users Get Started.
An interactive version of the package’s Codebook so that prospective users can know what to expect.
A brief description of the package’s Design Philosophy to clarify the use-cases that urbnindicators is built to support.
An illustration of how Quantifying Survey Error can improve inference.
A guide to re-creating your indicators and their measures of error for Custom Geographies. Neighborhoods? Unincorporated counties? Start here.
A guide to defining Custom Derived Variables using the define_*() helpers.

Credits

This package is built on top of and enormously indebted to library(tidycensus), which provides the core functionality for accessing the Census Bureau API. For users who want additional variables, library(tidycensus) exposes the entire range of pre-tabulated variables available from the ACS and provides access to ACS microdata and other Census Bureau datasets.

Learn more here: https://walker-data.com/tidycensus/index.html.
