DataFerrett Advanced Training Topics
- Data integration - dataset joining by common variable (Example 1)
- Multiple-variable recodes
- microdata (Example 2)
- aggregate data (Example 1)
- Multi-instance tabulations
- averaging across time (Example 2)
- time in spreadsheet (Example 2)
- Spreadsheet functions
- computational calculation (Example 1)
- summing calculation (Example 4)
- ranks (Example 4)
- if conditions (Example 4)
- using a single cell as part of the formula instead of entire row or column (Example 6)
- calculating medians (or any percentile) on-the-fly (Example 7)
- sorting (Example 2)
- changing order of operations (Example 4)
- create custom spanners (Example 4)
- clearing one dimension only (Example 3)
- advanced tables
- turning on and off totals/cross-variable explosion (Example 2)
- nesting / not nesting (multiple tables in one spreadsheet) (Example 2)
- hiding rows/columns (Example 4)
- viewing hidden rows/columns (Example 4)
- changing universes (modifying variables) from the spreadsheet (Example 3)
- view underlying records (Example 3)
- timeseries graph (Example 2)
- list mode (Example 5)
- navigation of hierarchical variables (e.g. naics. problem - LED and EWKS are embargoed. demonstration only)
- mapping (Example 3)
- points on a theme (Example 3)
- calculating medians on-the-fly (Example 7)
Data integration - dataset joining by common variable: Creating a Single Table using Mutiple Datasets
Other features illustrated: Aggregate data multi-variable recode, computational calculation
Background: Different datasets may have variables whose value sets are
defined alike(comparable). Examples include demographic variables (e.g.
gender, race, marital status) and geographic variables (e.g. FIPS
state, FIPS county). Co-tabulation allows you to tabulate information
from the different datsets side by side using the comparable variable
as a hinge (one of the dimensions).
Goals: 1) Estimate the number of people per gas station in each
county in Maryland. 2) Add the number of households without vehicles to
Multi-variable recode for microdata
Other features illustrated: Using percent buttons, sorting on a
column, creating a timeseries graph from a table cell(s), creating a
table that averages across time, creating a table that shows data over
time, turning off/on Automatic Totals, cross-variable explosion,
nesting vs. not nesting
Background: In microdata datasets, you can create new variables
with values that are based on the combinations of multiple variables'
Goals: 1) Determine the percentage of people with and without
health care coverage by age groups. 2) Examine health insurance
coverage numbers over time.
Mapping address points on a thematic map
Other features illustrated: Viewing underlying records, clearing one dimension, changing universe from the spreadsheet
Background: If a dataset has address point information that has
been defined correctly, points can be mapped on top of a thematic map
from a separate dataset.
Goals: 1) Show the locations of all the public schools in DC on top of
a thematic map showing the median household income by tracts from SF3.
2) Show how you can view the underlying records from the spreadsheet to
see all the information for the universe of the selected cell(s).
Spreadsheet functions - Ranking, weighted ranking, order of formula processing.
Other features illustrated: Summing calculation, hiding columns,
creating column spanners, viewing hidden columns, IF conditions
Background: This example shows how you can use ranking and
apply weights to those ranks in order to answer questions like "What is
the best county for ...?"
Goals: 1) Create a table that will rank the best county in the
US in which to live, based on our 3 criteria - commute time, median
household income, and median housing value. 2) Limit our rankings to
counties that meet our population size criteria - between 50,000 and
List mode - showing a list of only those records meeting a set criterion
Other features illustrated: Advanced SQL
Background: You can have your table only show results for the
records that meet your criteria. For example, if you want to see all
the counties within a select number of states that have a population
within a certain size, you can get those listed in a table and include
additional information from the dataset.
Goals: 1) Create a listing of all the counties in Alabama,
Georgia, and Florida that have a population of less than 75,000 people.
2) Also show the actual population estimate and the median household
income for those counties.
Using a single cell as part of a formula instead of the entire row or column
Other features illustrated: Column number formatting (decimals shown)
Background: The DataFerrett
spreadsheet typically does row and column calculations since
calculations normally apply to all cells in the row or column equally.
However, sometimes you want to create a calculation that uses one
specific cell as part of the formula and not the entire column or row.
Goal: Create a table that calculates the percentage of drivers within
commute time ranges for a set of counties, while still showing the
numbers of drivers within each commute time.
Calculating medians (or any percentile) on-the-fly
Other features illustrated: Column formulas and IF conditions
Background: The DataFerrett spreadsheet allows you to calculate medians (or any percentile)
on-the-fly by creating your own bins (microdata) or utilizing income
distribution counts (aggregate data), and using standard formulas to
calculate the linear percentile you have defined within a given
Goal: Create a table that calculates the median income for age
groups (for the population 15 and over) using the March CPS supplement
(Annual Social and Economic Supplement).