Title: | Quickly Find, Extract, and Marginalize U.S. Census Tables |
---|---|
Description: | Extracting desired data using the proper Census variable names can be time-consuming. This package takes the pain out of that process by providing functions to quickly locate variables and download labeled tables from the Census APIs (<https://www.census.gov/data/developers/data-sets.html>). |
Authors: | Cory McCartan [aut, cre] |
Maintainer: | Cory McCartan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.1 |
Built: | 2024-10-31 18:36:39 UTC |
Source: | https://github.com/CoryMcCartan/easycensus |
Tries environment variables CENSUS_API_KEY and CENSUS_KEY, in that order.
If neither is found and R is running in interactive mode, prompts the user for a key.
cens_auth()
a Census API key
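The lookup order described above can be sketched in base R. This is only an illustration of the documented behavior, not the package's source: `find_census_key()` is a hypothetical helper name, and the interactive prompt is left out.

```r
# Hypothetical helper (not part of easycensus): return the first non-empty
# key found in the environment, checking CENSUS_API_KEY before CENSUS_KEY.
find_census_key <- function() {
  for (var in c("CENSUS_API_KEY", "CENSUS_KEY")) {
    key <- Sys.getenv(var, unset = "")
    if (nzchar(key)) return(key)
  }
  NULL  # nothing set; cens_auth() would prompt here in an interactive session
}
```

Setting either variable in `.Renviron` is one common way to avoid the prompt.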
This function uses fuzzy matching to help identify tables from the census which contain variables of interest. Matched table codes are printed out, along with the Census-provided table description, the parsed variable names, and example table cells. The website https://censusreporter.org/ may also be useful in finding variables.
cens_find(tables, ..., show = 4)
cens_find_dec(..., show = 2)
cens_find_acs(..., show = 4)
tables |
A list of |
... |
Variables to look for. These can be length-1 character vectors, or, for convenience, can be left unquoted (see examples). |
show |
How many matching tables to show. Increase this to show more possible matches, at the cost of more output. Negative values will be converted to positive but will suppress any printing. |
The codes for the top show tables, invisibly if show is positive.
cens_find_dec("sex", "age")
cens_find(tables_sf1, "sex", "age") # same as above
cens_find_dec(tenure, race)
cens_find_acs("income", "sex", show=3)
cens_find_acs("heath care", show=-1)
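To give a feel for what fuzzy matching does with a misspelled query such as "heath care", here is a minimal base R sketch using approximate string distance. The matching algorithm easycensus actually uses is not documented here, so `rank_labels()` and the three labels are made up for illustration.

```r
# Illustrative only: rank made-up table descriptions by edit distance to the
# query, allowing the query to match any substring of a description.
rank_labels <- function(query, labels) {
  d <- utils::adist(query, labels, partial = TRUE, ignore.case = TRUE)
  labels[order(d[1, ])]
}

labels <- c("health care coverage by age",
            "median household income",
            "tenure by race of householder")
rank_labels("heath care", labels)[1]
```

The misspelled query is one insertion away from a substring of the first label, so that label ranks first.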
Currently used mostly internally.
Builds a Census API-formatted specification of which geographies to download data for. State and county names (or postal abbreviations) are partially matched to existing tables, for ease of use. Other geographies should be specified with Census GEOIDs. The usgazetteer package, available with remotes::install_github("bhaskarvk/usgazetteer"), may be useful in finding GEOIDs for other geographies. Consult the "geography" sections of each API at https://www.census.gov/data/developers/data-sets.html for information on which geographic specifiers may be provided in combination with others.
cens_geo(geo = NULL, ..., check = TRUE, api = "acs/acs5", year = 2019)
geo |
The geographic level to return. One of the machine-readable or
human-readable names listed in the "Details" section. Will return all
matching geographies of this level, as filtered by the further arguments to
|
... |
Geographies to return, as supported by the Census API. Order
matters here—the first argument will be the geographic level to return
(i.e., it corresponds to the |
check |
If |
api |
A Census API programmatic name such as |
year |
The year for the data |
Supported geography arguments:
us
region
division
state
county
county_subdiv (County Subdivision)
subminor_civil_division (Subminor Civil Division)
place_remainder (Place/Remainder (Or Part))
tract_part (Tract (Or Part))
urban_rural (Urban Rural)
block_group_part (Block Group (Or Part))
block
tract
aian_area_part (American Indian Area/Alaska Native Area/Hawaiian Home Land (Or Part))
block_group (Block Group)
county_part (County (Or Part))
place_part (Place (Or Part))
place
consolidated_city (Consolidated City)
alaska_native_regional_corporation (Alaska Native Regional Corporation)
aian_area (American Indian Area/Alaska Native Area/Hawaiian Home Land)
tribal_subdiv (Tribal Subdivision/Remainder)
aian_reserve_stat (American Indian Area/Alaska Native Area (Reservation Or Statistical Entity Only))
ai_tribal_subdiv_part (American Indian Tribal Subdivision (Or Part))
ai_off_reserve_trust (American Indian Area (Off-Reservation Trust Land Only)/Hawaiian Home Land)
tribal_census_tract (Tribal Census Tract)
tribal_census_tract_part (Tribal Census Tract (Or Part))
tribal_block_group (Tribal Block Group)
state_part (State (Or Part))
county_subdiv_part (County Subdivision (Or Part))
tribal_subdiv_part (Tribal Subdivision/Remainder (Or Part))
aian_reserve_stat_part (American Indian Area/Alaska Native Area (Reservation Or Statistical Entity Only) (Or Part))
ai_off_reserve_trust_part (American Indian Area (Off-Reservation Trust Land Only)/Hawaiian Home Land (Or Part))
tribal_block_group_part (Tribal Block Group (Or Part))
msa (Metropolitan Statistical Area/Micropolitan Statistical Area)
principal_city_part (Principal City (Or Part))
metro_division (Metropolitan Division)
msa_part (Metropolitan Statistical Area/Micropolitan Statistical Area (Or Part))
metro_division_part (Metropolitan Division (Or Part))
combined_statistical_area (Combined Statistical Area)
combined_necta (Combined New England City And Town Area)
necta (New England City And Town Area)
combined_statistical_area_part (Combined Statistical Area (Or Part))
combined_necta_part (Combined New England City And Town Area (Or Part))
necta_part (New England City And Town Area (Or Part))
principal_city (Principal City)
necta_division (Necta Division)
necta_division_part (Necta Division (Or Part))
urban_area (Urban Area)
urban_area_part (Urban Area (Or Part))
consolidated_city_part (Consolidated City (Or Part))
cd (Congressional District)
sld_upper (State Legislative District (Upper Chamber))
sld_lower (State Legislative District (Lower Chamber))
alaska_native_regional_corporation_part (Alaska Native Regional Corporation (Or Part))
zcta (Zip Code Tabulation Area)
zcta_part (Zip Code Tabulation Area (Or Part))
school_district_elementary (School District (Elementary))
school_district_secondary (School District (Secondary))
school_district_unified (School District (Unified))
congressional_district_part (Congressional District (Or Part))
school_district_elementary_part (School District (Elementary) (Or Part))
school_district_secondary_part (School District (Secondary) (Or Part))
school_district_unified_part (School District (Unified) (Or Part))
voting_district_part (Voting District (Or Part))
subminor_civil_division_part (Subminor Civil Division (Or Part))
state_legislative_district_upper_chamber_part (State Legislative District (Upper Chamber) (Or Part))
state_legislative_district_lower_chamber_part (State Legislative District (Lower Chamber) (Or Part))
vtd (Voting District)
ai_tribal_subdiv (American Indian Tribal Subdivision)
puma (Public Use Microdata Area)
A list with two elements, region and regionin, which together specify a valid Census API geography argument.
cens_geo(state="WA")
cens_geo("county", state="WA") # equivalent to `cens_geo(county="all", state="WA")`
cens_geo(county="King", state="Wash")
cens_geo(zcta="02138", check=FALSE)
cens_geo(zcta=NA, state="WA", check=FALSE)
cens_geo("zcta", state="WA", check=FALSE)
cens_geo(cd="09", state="WA", check=FALSE)
cens_geo("county_part", state="WA", cd="09", check=FALSE)
Leverages censusapi::getCensus() to download tables of census data. Tables are returned in tidy format, with variables given tidy, human-readable names.
cens_get_dec(
  table,
  geo = NULL,
  ...,
  sumfile = "sf1",
  pop_group = NULL,
  check_geo = FALSE,
  drop_total = FALSE,
  show_call = FALSE
)

cens_get_acs(
  table,
  geo = NULL,
  ...,
  year = 2019,
  survey = c("acs5", "acs1"),
  check_geo = FALSE,
  drop_total = FALSE,
  show_call = FALSE
)

cens_get_raw(
  table,
  geo = NULL,
  ...,
  year = 2010,
  api = NULL,
  check_geo = FALSE,
  show_call = TRUE
)
table |
The table to download, either as a character vector or a table
object as produced by |
geo |
The geographic level to return. One of the machine-readable or
human-readable names listed in the "Details" section of |
... |
Geographies to return, as supported by the Census API. Order
matters here—the first argument will be the geographic level to return
(i.e., it corresponds to the |
sumfile |
For decennial data, the summary file to use. SF2 contains more detailed race and household info. |
pop_group |
For decennial data using summary file SF2, the population group to filter to. See https://www2.census.gov/programs-surveys/decennial/2010/technical-documentation/complete-tech-docs/summary-file/sf2.pdf#page=347. |
check_geo |
If |
drop_total |
Whether to filter out variables which are totals across another variable. Recommended only after inspection of the underlying table. |
show_call |
Whether to show the actual call to the Census API. May be useful for debugging. |
year |
For ACS data, the survey year to get data for. |
survey |
For ACS data, whether to use the one-year or
five-year survey (the default). Make sure to check availability using
|
api |
A Census API programmatic name such as |
A tibble of census data in tidy format, with columns GEOID, NAME, variable (containing the Census variable code), value (or estimate in the case of ACS tables), and additional factor columns specific to the table.
cens_get_dec(): Get decennial census data.
cens_get_acs(): Get American Community Survey (ACS) data.
cens_get_raw(): Get raw data from another Census Bureau API. Output will be minimally tidied but will likely require further manipulation.
## Not run:
cens_get_dec("P3", "state")
cens_get_dec(tables_sf1$H2, "state")
cens_get_dec("H2", "county", state="WA", drop_total=TRUE)
cens_get_acs("B09001", county="King", state="WA")
## End(Not run)
For ACS data, margins of error will be updated appropriately, using the functionality in estimate().
cens_margin_to(data, ...)
data |
The output of |
... |
The variables of interest, which will be kept. Remaining variables will be marginalized out. |
A new data frame that has had group_by() and summarize() applied.
## Not run:
d_cens = cens_get_acs("B25042", "state") # table code first, then geography
cens_margin_to(d_cens, bedrooms)
## End(Not run)
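Marginalizing amounts to summing cell values within each level of the variables that are kept. The base R sketch below shows the idea on a made-up data frame. The column names and numbers are hypothetical, and combining standard errors in quadrature assumes uncorrelated cells; the package itself does this through group_by(), summarize(), and estimate() arithmetic.

```r
# Made-up cell counts and standard errors for a tenure-by-bedrooms table.
d <- data.frame(
  tenure   = c("owner", "owner", "renter", "renter"),
  bedrooms = c("1", "2", "1", "2"),
  value    = c(10, 20, 30, 40),
  se       = c(3, 4, 6, 8)
)

# Sum values and variances within each level of the kept variable.
d$var <- d$se^2
marg <- aggregate(cbind(value, var) ~ bedrooms, data = d, FUN = sum)
marg$se <- sqrt(marg$var)  # standard errors add in quadrature
marg[, c("bedrooms", "value", "se")]
```

Here tenure has been marginalized out, leaving one row per bedroom count.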
Uses the same parsing code as that which generates tables_sf1 and tables_acs.
See https://www.census.gov/data/developers/data-sets.html for a list of APIs and corresponding years, or use censusapi::listCensusApis().
cens_parse_tables(api, year)
api |
A Census API programmatic name such as |
year |
The year for the data |
A list of cens_table objects, which are just lists with five elements:
concept, a human-readable name
tables, the constituent table codes
surveys, the supported surveys
dims, the parsed names of the dimensions of the tables
vars, a tibble with all of the parsed variable values
## Not run:
cens_parse_tables("dec/pl", 2020)
## End(Not run)
Proportions and percent-change-over-time calculations require different standard error calculations.
est_prop(x, y)
est_pct_chg(x, y)
x , y
|
An estimate vector. For |
An estimate vector.
x = estimate(1, 0.1)
y = estimate(1.5, 0.1)
est_prop(x, y)
est_pct_chg(x, y)
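For reference, the approximations the Census Bureau publishes for ACS users treat a proportion (numerator contained in the denominator) differently from a ratio or percent change. The base R sketch below writes those published formulas out; whether est_prop() and est_pct_chg() implement exactly these is an assumption, and `se_prop()` and `se_ratio()` are hypothetical names.

```r
# SE of a proportion p = x / y, where x is a subset of y.
se_prop <- function(x, se_x, y, se_y) {
  p <- x / y
  radicand <- se_x^2 - p^2 * se_y^2
  # Published guidance: if the radicand is negative, use the ratio formula.
  radicand <- ifelse(radicand < 0, se_x^2 + p^2 * se_y^2, radicand)
  sqrt(radicand) / y
}

# SE of a ratio r = x / y. A percent change (x - y) / y equals r - 1,
# so it has the same standard error.
se_ratio <- function(x, se_x, y, se_y) {
  r <- x / y
  sqrt(se_x^2 + r^2 * se_y^2) / y
}

se_prop(50, 5, 100, 6)   # sqrt(25 - 0.25 * 36) / 100 = 0.04
se_ratio(50, 5, 100, 6)  # sqrt(25 + 0.25 * 36) / 100
```

The only difference is the sign inside the square root, which is why the two cases need separate functions.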
A numeric vector that stores margin-of-error information along with it. The margin of error will update through basic arithmetic operations, using a first-order Taylor series approximation. The implicit assumption is that the errors in each value are uncorrelated. If in fact there is correlation, the margins of error could be wildly under- or over-estimated.
estimate(x, se = NULL, moe = NULL, conf = 0.9)
is_estimate(x)
as_estimate(x)
x |
A numeric vector containing the estimate(s). |
se |
A numeric vector containing the standard error(s) for the
estimate(s). Users should supply either |
moe |
A numeric vector containing the margin(s) of error. Users should
supply either |
conf |
The confidence level to use in converting the margin of error to a standard error. Defaults to 90%, which is what the Census Bureau uses for ACS estimates. |
An estimate vector.
estimate(5, 2) # 5 with std. error 2
estimate(15, moe=3) - estimate(5, moe=4)
estimate(1:4, 0.1) * estimate(1, 0.1)
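The first-order (delta method) rules behind this kind of error propagation are short enough to write out. The base R sketch below assumes uncorrelated inputs and a Normal-based conversion between margin of error and standard error; the helper names are hypothetical and this is not the package's internal code.

```r
# Convert a margin of error at confidence level `conf` to a standard error.
moe_to_se <- function(moe, conf = 0.9) {
  moe / stats::qnorm(1 - (1 - conf) / 2)  # z is about 1.645 at 90%
}

# Sum or difference of uncorrelated estimates: variances add.
se_add <- function(se_x, se_y) sqrt(se_x^2 + se_y^2)

# Product of uncorrelated estimates, to first order.
se_mult <- function(x, se_x, y, se_y) sqrt((y * se_x)^2 + (x * se_y)^2)

se_add(3, 4)             # 5
se_mult(2, 0.1, 3, 0.1)  # sqrt(0.09 + 0.04)
```

Because a margin of error is a fixed multiple of the standard error, the same quadrature rule applies to margins directly: under these formulas, subtracting an estimate with margin 4 from one with margin 3 gives a margin of 5.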
Format an estimate for pretty printing
## S3 method for class 'estimate'
format(x, conf = 0.9, digits = 2, trim = FALSE, ..., formatter = fmt_plain)
x |
An estimate vector |
conf |
The confidence level to use in converting the margin of error to a standard error. Defaults to 90%, which is what the Census Bureau uses for ACS estimates. |
digits |
The number of dig |
trim |
logical; if |
... |
Ignored. |
formatter |
the formatting function to use internally |
Getter functions for estimate() vectors.
The posterior::rvar class may be useful in handling standard errors for more complicated mathematical expressions. This function assumes a Normal distribution centered on the estimate, with standard deviation equal to the standard error of the estimate. The posterior package is required for this function.
get_est(x)
get_se(x)
get_moe(x, conf = 0.9)
to_rvar(x, n = 500)
x |
An estimate vector. |
conf |
The confidence level to use in constructing the margin of error. |
n |
How many samples to draw. |
An estimate vector.
A posterior::rvar vector.
x = estimate(1, 0.1)
get_est(x)
get_moe(x)

x = estimate(1, 0.1)
if (requireNamespace("posterior", quietly=TRUE)) {
    rv_x = to_rvar(x)
    (rv_x^2 / rv_x) - rv_x # std. errors zero (correct)
    x^2 / x - x # std. errors not zero
}
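The reason sampling handles an expression like x^2 / x - x correctly can be seen without the posterior package. The base R sketch below draws from the Normal distribution described above and evaluates the expression draw by draw; it only illustrates the idea and says nothing about how rvar stores its draws.

```r
set.seed(1)

# 500 draws from Normal(mean = estimate, sd = standard error).
draws <- stats::rnorm(500, mean = 1, sd = 0.1)

# Each draw is reused wherever x appears, so the terms cancel draw by draw.
result <- draws^2 / draws - draws

stats::sd(draws)   # close to the standard error of 0.1
stats::sd(result)  # zero up to floating-point rounding
```

First-order propagation on an estimate vector treats each occurrence of x as a separate, uncorrelated quantity, which is why it reports a non-zero standard error for the same expression.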
Contains parsed table information for the 2010 Decennial Summary File 1 and 2019 ACS 5-year and 1-year tables.
This parsed information is used internally in cens_find_dec(), cens_find_acs(), cens_get_dec(), and cens_get_acs().
For other sets of tables, try using cens_parse_tables().
tables_sf1
tables_acs
A list of cens_table objects, which are just lists with five elements:
concept, a human-readable name
tables, the constituent table codes
surveys, the supported surveys
dims, the parsed names of the dimensions of the tables
vars, a tibble with all of the parsed variable values
An object of class list of length 83.
An object of class list of length 848.
Some table labels are quite verbose, and users will often want to shorten them. These functions make tidying common types of labels easy. Most produce straightforward output, but there are several more generic tidiers:
tidy_simplify() attempts to simplify labels by removing words common to all labels.
tidy_parens() attempts to simplify labels by removing all terms in parentheses.
tidy_race_detailed() creates logical columns for each of the six racial categories.
tidy_race(x)
tidy_race_detailed(x, x2, x3)
tidy_ethnicity(x)
tidy_age(x)
tidy_age_bins(x, as_factor = FALSE)
tidy_income_bins(x, as_factor = FALSE)
tidy_simplify(x)
tidy_parens(x)
x |
A factor, which will be re-leveled. Character vectors will be converted to factors. |
x2 , x3
|
Additional character columns containing detailed information for certain variables (e.g. detailed race) |
as_factor |
if |
A re-leveled factor, except for tidy_age_bins(), which by default returns a data frame with columns age_from and age_to (inclusive).
ex_race_long = c("american indian and alaska native alone", "asian alone",
    "black or african american alone", "hispanic or latino",
    "native hawaiian and other pacific islander alone",
    "some other race alone", "total", "two or more races", "white alone",
    "white alone, not hispanic or latino")
tidy_race(ex_race_long)
tidy_age_bins(c("10 to 14 years", "21 years", "85 years and over"))
tidy_parens(c("label one (fake)", "label two (fake)"))
tidy_simplify(c("label one (fake)", "label two (fake)"))

## Not run:
# requires API key
d = cens_get_acs("B02003", "us", year=2019, survey="acs1")
dplyr::mutate(d, tidy_race_detailed(dtldr_1, dtldr_2, dtldr_3))
## End(Not run)
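To illustrate the kind of parsing involved in turning age labels into inclusive bounds, here is a rough base R sketch. `parse_age_bins()` is a hypothetical stand-in, not the package's tidy_age_bins(); in particular, using Inf for open-ended "and over" bins and 0 for "under" bins are assumptions made here.

```r
# Hypothetical stand-in: extract inclusive lower and upper ages from labels.
parse_age_bins <- function(x) {
  nums <- regmatches(x, gregexpr("[0-9]+", x))
  first_last <- function(n) {
    if (length(n) == 0) return(c(NA_real_, NA_real_))
    as.numeric(c(n[1], n[length(n)]))
  }
  bounds <- vapply(nums, first_last, numeric(2))  # 2 x length(x) matrix
  from <- bounds[1, ]
  to <- bounds[2, ]

  # Assumption: "N years and over" is open-ended above.
  to[grepl("and over", x, ignore.case = TRUE)] <- Inf
  # Assumption: "under N years" runs from 0 through N - 1.
  under <- grepl("^under", x, ignore.case = TRUE)
  from[under] <- 0
  to[under] <- to[under] - 1

  data.frame(age_from = from, age_to = to)
}

parse_age_bins(c("10 to 14 years", "21 years", "85 years and over"))
```

A single-age label such as "21 years" yields the same value for both bounds.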