2. Background on Shapefiles and Surrogates

2.1 Shapefiles

Shapefiles contain geographic data objects in the form of points, lines, or polygons with associated informational attributes. Shapefiles are a GIS industry standard format developed by ESRI (see http://www.esri.com/) and are a primary type of input data set for the Spatial Allocator software.

Each shapefile consists of a group of files. The files that constitute the "shapefile" contain the coordinates of the shapes and the attributes associated with each shape (e.g., population, number of households, area). Each shapefile includes a dBASE format (.dbf) file for attributes, and a group of constituent files including .shp and .shx files. The .shp files store the coordinates of the shapes. Each constituent file and the .dbf file have the same file name up to the last "." (e.g., states.shp, states.dbf, states.shx).

Each shapefile stores data for a single feature class (i.e., points, lines, or polygons). Point shapes are single-coordinate features such as stacks, schools, churches, or monuments. Line shapes can be continuous lines, such as fault lines on a map; they can also be polylines, like branches of a river. Shapefiles have a nontopological vector data format. Since there are no topological relationships between lines in a shapefile, they are not associated with any points or polygons. Polygon shapes can be simple areas or multipart areas like states and counties. Polygon shapes can overlap, but the topological relationship of this overlap is not stored in the shapefile.

2.2 Spatial Surrogates

The programs that constitute the Spatial Allocator can generate spatial surrogates, convert shapefiles from one map projection to another, filter shapefiles, merge surrogates, perform gap filling of surrogates, and map data from one spatial resolution to another (e.g., from grids to counties). Spatial surrogates are most commonly used to map county-level emissions data onto the rectangular grid cells used by an Eulerian air quality model such as the Community Multiscale Air Quality (CMAQ) modeling system (see www.cmascenter.org). A spatial surrogate is a value greater than zero and less than or equal to one that specifies the fraction of the emissions in the county that should be allocated to a particular grid cell. Typically, some type of geographic attribute is used to "weight" the county attributes into grid cells in a manner more specific than a simple uniform spread over the county. Some examples of weights are points that represent ports, the population in a census tract polygon, and lines representing the location of roads or railroads. A single surrogate value for county C and grid cell GC is expressed as:

Surrogate fraction equation

where srg = surrogate, C = county, and GC = grid cell. Note that a surrogate does not have to be for a county; it could be for some other geographic region such as a state, nation, or census tract. These polygons are known in the program as the "data polygons."

Three types of surrogates can be created with this program: polygon-based, line-based, and point-based. Polygon-based surrogates use attribute information that is based on area (e.g., population in a census tract). The surrogate value is calculated as the ratio of the attribute value in the intersection of the county and the grid cell to the total value of the attribute in a specific "data polygon" (e.g., county, state, census tract). Examples of polygon-based weight attributes are area, population, number of households, and land use. The numerator (i.e., the value of the weight attribute in the intersection of the county C and grid cell GC) and the denominator (i.e., the total attribute value in the county or other data polygon) are calculated according to the following equations:

Numerator and denominator or the summation over weight polygons

where srg=surrogate, C = county (or other data polygon), GC = grid cell, WP = weight polygon, int. = intersection of, i = weight polygon, and n = the number of weight polygons.

For line-based surrogates, the length of the linear weight feature (e.g., railroad, river, road) replaces area in the above equations. For point-based surrogates, instead of using area, the software will allocate a value of 1 if the weight point falls within the region of interest or a value of 0 (zero) if it does not. In some cases, no special weight attribute is desired. Instead, the surrogate is based purely on the area of the polygon, the length of the polyline, or the count of the points. This mode is supported by the Spatial Allocator. In this case, the above equations simplify to the following:

Simplified numerator and denominator


To Section 3.1: Download and Installation