GIS Raster/Vector Data Structures, Database Management Systems, Database Organization

K Brooks

 


Data Structures

All spatial features are recorded as geographic primitives, each with its own primary characteristics:

Points (0-D, no length or width);

Lines (1-D, length, no width);

Polygons/areas (2-D, length and width / area and perimeter);

Surfaces (3-D, areas with a Z dimension);

There are two major types of spatial data representation for these primitives: raster and vector.

 

Raster/Grid/Image

Raster: matrix of cells (pixels) referenced by row/column, stored as a matrix or array;

Raster in a Cartesian Coordinate Graph

(Example from Getting Started with ArcGIS, ESRI 2002)

Raster: For geo-referenced rasters, every cell represents a given area on the ground; this cell size is the raster's resolution. The smaller the area each cell represents, the more cells are needed to cover a given area, and the larger the data set.
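As a rough illustration of row/column referencing and of how resolution drives data set size, here is a minimal Python sketch. It assumes a north-up raster whose upper-left corner sits at map coordinates (x0, y0) and whose square cells are cell_size map units wide; the function names are hypothetical, not an ArcGIS API.

```python
def cell_center(row, col, x0, y0, cell_size):
    """Map (x, y) of the center of cell (row, col); rows count downward from the top edge."""
    x = x0 + (col + 0.5) * cell_size
    y = y0 - (row + 0.5) * cell_size
    return x, y


def cell_count(width, height, cell_size):
    """Cells needed to cover a width x height extent, rounding partial cells up."""
    ncols = -(-width // cell_size)   # ceiling division
    nrows = -(-height // cell_size)
    return int(nrows * ncols)


# A 1 km x 1 km study area at three resolutions
for size in (30, 10, 5):
    print(f"{size} m cells: {cell_count(1000, 1000, size)}")

# Center of the upper-left cell of a raster anchored at (500000, 4000000)
print(cell_center(0, 0, 500000, 4000000, 10))
```

Halving the cell size from 10 m to 5 m quadruples the number of cells (10,000 to 40,000), which is why fine-resolution rasters grow so quickly.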

Raster: Cell values represent nominal, ordinal, or continuous data. Numbers in cells can be integer or floating point. 

Raster: The cell values (attributes) are the data set itself.

Raster examples: a color image; an infrared aerial image; elevation data.

(Examples from Getting Started with ArcGIS, ESRI 2002)

Raster: To the computer, an image like these consists of nothing more than a matrix of numeric cell values.

Raster: In the ArcGIS grid data model, data tables can store additional information about nominal/categorical data in the Value Attribute Table (VAT). VATs store information about the categories, not about individual cells:

(Example from Getting Started with ArcGIS, ESRI 2002)
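To show why a VAT has one record per category rather than one per cell, the following sketch builds a VAT-like (VALUE, COUNT) table from a small integer grid of land-use codes. VALUE and COUNT follow the fields of an ArcGIS grid VAT; the grid, the codes and the labels dictionary are hypothetical.

```python
from collections import Counter

# A 4 x 4 integer grid of land-use codes (nominal data)
grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [4, 4, 3, 3],
]

# Hypothetical category descriptions, stored once per value, not once per cell
labels = {1: "forest", 2: "pasture", 3: "wetland", 4: "urban"}


def build_vat(grid):
    """Return VAT-like rows (VALUE, COUNT), one per distinct cell value, sorted by value."""
    counts = Counter(v for row in grid for v in row)
    return [(value, counts[value]) for value in sorted(counts)]


for value, count in build_vat(grid):
    print(value, count, labels[value])
```

Sixteen cells collapse to four VAT records; any further description of a category is attached to its record, not repeated in every cell.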

Raster Coding/Representation of Geographic Primitives

Points: single cells, unique/known values;

Lines: strings of cells with common values;

Polygons/areas: groups of cells with common values;

Surfaces: cells represent real or virtual elevations;
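A small sketch of the raster coding above: a point, a line and a polygon are "burned" into a grid as cells carrying distinct values, and a feature is recovered as the set of cells sharing its value. The grid size, the values 9, 5 and 7, and the use of 0 as background are arbitrary assumptions; a surface would instead store an elevation in every cell.

```python
NROWS, NCOLS = 6, 6
grid = [[0] * NCOLS for _ in range(NROWS)]   # 0 = background

# Point: a single cell carrying a unique value
grid[0][5] = 9

# Line: a string of adjacent cells sharing one value (here a short diagonal)
for i in range(3):
    grid[i + 1][i] = 5

# Polygon: a contiguous block of cells sharing one value
for r in range(3, 6):
    for c in range(3, 6):
        grid[r][c] = 7


def cells_with_value(grid, value):
    """Recover a feature as the set of (row, col) cells that carry its value."""
    return {(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == value}


for row in grid:
    print(" ".join(str(v) for v in row))
```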


 

Vector

Vector: discrete Cartesian x,y coordinates;

Vector: vector objects on a Cartesian graph.

(Example from Getting Started with ArcGIS, ESRI 2002)

Vector: lines and areas vary in size, since they trace the shapes of the phenomena they represent.

Vector: data stored as pairs of x,y coordinates, usually with ID numbers; data typically stored in separate data tables.

Vector: In ArcGIS, the data tables contain exactly as many records as there are unique features in the data set; the exception is polygon coverages, which carry one extra record for the outside "universe" polygon.

   

(Examples from Getting Started with ArcGIS, ESRI 2002)

 

Vector Coding/representation of Geographic Primitives

Points: id, x, y;

Lines: id, x1,y1 ... xn, yn;

Polygons: id, x1,y1 ... xn, yn, where xn=x1, yn=y1 (closed);
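The id-plus-coordinates coding above can be sketched directly in Python. The three features and their coordinates are hypothetical; the functions show how length, ring closure (xn = x1, yn = y1), area (shoelace formula) and perimeter follow from nothing more than the stored x,y pairs.

```python
import math

# Hypothetical features coded as id followed by coordinates
point = (1, (2.0, 3.0))                                            # id, (x, y)
line = (2, [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)])                   # id, x1,y1 ... xn,yn
polygon = (3, [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)])


def length(coords):
    """Sum of the straight segments between successive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def is_closed(coords):
    """A polygon ring is closed when its last vertex repeats its first."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def ring_area(coords):
    """Planar area of a closed ring by the shoelace formula."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(coords, coords[1:]))
    return abs(s) / 2.0


_, ring = polygon
print("point", point[0], "at", point[1])
print("line length:", length(line[1]))
print("closed:", is_closed(ring), "area:", ring_area(ring), "perimeter:", length(ring))
```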

Surfaces: represented by Triangulated Irregular Networks (TINs):

(Example from Getting Started with ArcGIS, ESRI 2002)
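Within a TIN, the surface inside each triangle is commonly treated as a plane through its three vertices. The sketch below interpolates elevation at (x, y) with barycentric (area) weights, assuming the containing triangle has already been found; searching the network for that triangle is omitted, and tin_elevation is a hypothetical name.

```python
def tin_elevation(p, a, b, c):
    """Linearly interpolate z at p = (x, y) from triangle vertices a, b, c = (x, y, z)."""
    (x, y), (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    # Barycentric weights: each vertex's share of the point's position
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        raise ValueError("point lies outside this triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# The plane through these vertices is z = 100 + x + 2y
a, b, c = (0, 0, 100), (10, 0, 110), (0, 10, 120)
print(tin_elevation((2, 3), a, b, c))
```

For this triangle the result should be 108, up to floating-point rounding.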

 


Feature Data Formats and Characteristics in ArcGIS

In ArcGIS, spatial and attribute data may be stored in several formats. These formats have evolved alongside ESRI's GIS products. [Other data formats exist; these are discussed in a later topic.]

Coverage Data Model

Traditional/original Arc/Info data model.

Primary features are points, lines and polygons.

Special point types include: label points and nodes.

Composite features: Linear composites are routes/sections and polygonal composites are regions.

Secondary features: annotation, tics, and links.

Coverages are file based, consisting of multiple files housed in a system folder (subdirectory). The folder name is the coverage name.

Coverages exist in Arc/Info workspaces (subdirectories) which are characterized by the presence of an INFO folder. The examples below show system and ArcCatalog views of a workspace.

      

(Examples from Getting Started with ArcGIS, ESRI 2002)

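Handling Coverages: NEVER use standard Windows tools to copy a coverage; use ArcCatalog (or other ArcGIS tools) instead. A coverage's attribute tables are kept in the workspace's INFO folder rather than in the coverage folder, so copying the coverage folder alone leaves them behind. [If you are careful, you can copy an entire workspace with Windows commands.]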

 

Shapefile Data Model

The shapefile format was originally associated with the ArcView GIS software. 

Primary features are points, lines and polygons. These may be simple or multi-part.

Handling Shapefiles: A feature dataset in shapefile format consists of three or more files with the same name and different extensions; at minimum name.shp, name.shx and name.dbf. (See examples below showing system and ArcCatalog views.) If system commands are used to copy a shapefile, be sure to include all the files. ArcCatalog is the safest means of copying shapefiles.
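If you must copy a shapefile outside ArcCatalog, the "include all the files" rule can be automated. This is a minimal sketch, not a replacement for ArcCatalog: copy_shapefile is a hypothetical helper that refuses to run unless the three core files exist, then copies every file sharing the base name (.prj, .sbn, .shp.xml and so on, if present). It assumes lowercase extensions on a case-sensitive file system.

```python
import shutil
import tempfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")   # geometry, geometry index, dBASE attribute table


def copy_shapefile(src_shp, dest_dir):
    """Copy every file sharing the shapefile's base name (e.g. roads.*) into dest_dir."""
    src = Path(src_shp)
    missing = [ext for ext in REQUIRED if not src.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(f"{src.stem}: missing {', '.join(missing)}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.parent.glob(src.stem + ".*")):
        shutil.copy2(f, dest / f.name)   # copy2 also preserves timestamps
        copied.append(f.name)
    return copied


# Demo with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp, "data")
    src_dir.mkdir()
    for name in ("roads.shp", "roads.shx", "roads.dbf", "roads.prj", "rivers.shp"):
        (src_dir / name).touch()
    result = copy_shapefile(src_dir / "roads.shp", Path(tmp, "backup"))
    print(result)
```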

    

(Examples from Getting Started with ArcGIS, ESRI 2002)

Geodatabase Data Model

Geodatabases are the latest ESRI data format. They implement an "object-oriented" GIS data model.

In the geodatabase, each vector feature receives a row in a data table; the vector shape is stored in the shape field, and attributes are in other fields.  Each data table stores a feature class.
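The one-row-per-feature idea can be sketched with Python's built-in sqlite3 module. OBJECTID and SHAPE mirror field names ArcGIS uses, but the wells table, its other fields, and storing the shape as readable WKT text are assumptions for illustration; a real geodatabase stores geometry in a binary form and manages far more than this.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One table per feature class; one row per feature
con.execute("""
    CREATE TABLE wells (
        OBJECTID INTEGER PRIMARY KEY,  -- unique feature id
        SHAPE    TEXT NOT NULL,        -- the feature's geometry (WKT text here)
        OWNER    TEXT,                 -- ordinary attribute fields follow
        DEPTH_M  REAL
    )
""")
con.executemany(
    "INSERT INTO wells (SHAPE, OWNER, DEPTH_M) VALUES (?, ?, ?)",
    [("POINT (512300 4420100)", "County", 45.0),
     ("POINT (512950 4419800)", "Private", 30.5)],
)

# Attribute query: the geometry comes back with the row it belongs to
for row in con.execute("SELECT OBJECTID, SHAPE, DEPTH_M FROM wells WHERE DEPTH_M > 40"):
    print(row)
```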

Geodatabases store multiple features, rasters, other data tables and references to yet others.

A single geodatabase file can store multiple features and objects. Geodatabase files are Database Management System files.

Primary features are points, lines and polygons, simple and multipart. Special features (and relationships between features) can be designed.

Special features: In the point domain, points can carry special references or behaviors, or act as simple or complex network junctions. Lines can be traditional x,y coordinate lines or computed lines such as Bezier curves, and can also be simple or complex network "edges."

Polygons consist of the line types noted above and may also implement complex relationships or behaviors. Multiple representations can now be arranged thematically (e.g., points, lines and polygons describing local hydrological features). In effect, the design can be driven by the item being mapped rather than by the constraints of the GIS data model.

Handling geodatabases: These can be single DBMS files. The system shows the DBMS file name, while ArcCatalog shows the contents of the file:

 

(Examples from Getting Started with ArcGIS, ESRI 2002)


Topology

Vector data can represent spatial relationships explicitly through topological coding.

Coverage Topology in Arc/Info: In the coverage data model, 'Planar Topology' is strictly enforced.  Its characteristics:

  • Strings of points form lines;
  • Lines (arcs) must begin and end with Nodes;
  • Nodes must exist where any two lines cross;
  • Nodes are numbered and coded as from-nodes and to-nodes;
  • Arcs join at nodes: for networks, from-to coding allows us to describe connectivity;
  • Arcs are numbered and joined at nodes to create polygons;
  • From-to coding of the arcs allows coding of spatial relations between polygons (adjacency, enclosure).
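A sketch of how arc-node coding yields connectivity and adjacency. Each arc record carries from-node, to-node, left-polygon and right-polygon numbers, mirroring the FNODE#, TNODE#, LPOLY# and RPOLY# items of an Arc/Info arc attribute table; the three-arc layout (two squares side by side, polygon 1 being the outside "universe") and the function names are hypothetical.

```python
from collections import defaultdict

# arc id -> (from-node, to-node, left polygon, right polygon)
# Node 1 is at the bottom of the shared boundary, node 2 at its top.
arcs = {
    1: (1, 2, 1, 2),   # western boundary of polygon 2
    2: (2, 1, 3, 2),   # boundary shared by polygons 2 and 3
    3: (1, 2, 3, 1),   # eastern boundary of polygon 3
}


def arcs_at_nodes(arcs):
    """Connectivity: which arcs meet at each node, read from the from/to coding."""
    nodes = defaultdict(set)
    for arc_id, (fnode, tnode, _, _) in arcs.items():
        nodes[fnode].add(arc_id)
        nodes[tnode].add(arc_id)
    return dict(nodes)


def adjacent_polygons(arcs):
    """Adjacency: two polygons are neighbours if some arc has one on each side."""
    return {frozenset((lp, rp)) for _, _, lp, rp in arcs.values() if lp != rp}


def polygon_arcs(arcs, poly):
    """The arcs that bound a polygon: those with the polygon on either side."""
    return sorted(a for a, (_, _, lp, rp) in arcs.items() if poly in (lp, rp))


print(arcs_at_nodes(arcs))
print(adjacent_polygons(arcs))
print(polygon_arcs(arcs, 2), polygon_arcs(arcs, 3))
```

Nothing here is computed from coordinates: the relationships are read straight from the stored codes, which is what "explicit" topology means.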

Shapefile Topology in ArcView and ArcGIS 8+:

  • ArcView can represent topology by ordering the vertices of polygon rings in a known (clockwise) order (Theobald). This topology is computed as needed; it is not a permanent part of the data, as it is with coverages;
  • In shapefiles, planar topology is not enforced: watch for gaps, overlaps and similar errors, and use snapping, vertex editing and so on to eliminate those you do not intend.
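Ring order can be checked from coordinates alone. The ESRI Shapefile Technical Description specifies clockwise vertex order for outer rings and counterclockwise order for holes; the sketch below uses the sign of the shoelace sum to tell them apart (negative means clockwise when y increases northward). The rings are hypothetical.

```python
def signed_area(ring):
    """Shoelace sum / 2: negative for clockwise rings, positive for counterclockwise
    (map coordinates, with y increasing northward)."""
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2.0


def is_clockwise(ring):
    return signed_area(ring) < 0


def polygon_area(rings):
    """Net area of a polygon whose outer rings run clockwise and holes counterclockwise."""
    return -sum(signed_area(r) for r in rings)


outer = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]   # clockwise: outer boundary
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]        # counterclockwise: hole

print(is_clockwise(outer), is_clockwise(hole), polygon_area([outer, hole]))
```

Because orientation encodes "inside" versus "hole", the hole's area is subtracted automatically (100 - 4 = 96).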

Geodatabase Topology in ArcGIS 8+:

  • ArcGIS 8's new geodatabase model: "... allows you to define relationships between objects, together with rules for maintaining the referential integrity between objects" (ArcGIS Help).
  • Can represent shared geometry between features in a feature class or across feature classes.
  • Create planar topologies (areas/adjacency) or network topologies (lines/connectivity).
  • Consists of nodes, edges and faces:

(Example from Getting Started with ArcGIS, ESRI 2002)

 


Advantages/Disadvantages/Appropriate Use of Data Types
(PA Burrough 1986)

Vector:

+ Good representation of phenomenological data;

+ Compact;

+ Topology can be completely described;

+ 'Accurate' graphics;

+ Retrieval, update & generation of graphics possible;

- Complex structure;

- Combination of overlays creates difficulties;

- Simulation difficult (non-uniform sizes);

o Display & plotting can be expensive ($);

o Technology relatively expensive;

o Spatial analysis/filtering within polygons not possible.

Raster:

+ Good representation of continuous phenomena (e.g., elevations, reflectance);

+ Simple structure;

+ Overlay and combination with remotely sensed (RS) data are easy;

+ Various spatial analyses easy;

+ Simulation easy (uniform size);

+ Technology is inexpensive; dynamic development is easy;

- Data sets can be quite large;

- Use of large cells reduces precision and creates mixed (partial) cells;

- Crude raster maps not aesthetically pleasing;

- Network (topological) linkages difficult to establish;

- Spatial relationships implicit rather than explicit;

- Some operations CPU intensive (e.g., projection, re-sampling).
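Why raster overlay is "easy": when two grids share extent, cell size and row/column layout, overlay is just a rule applied to each pair of co-located cells. A minimal sketch with two hypothetical layers (slope class and soil suitability) combined into a site-suitability grid:

```python
# Two co-registered grids: same extent, cell size and row/column layout
slope_class = [   # 1 = gentle, 2 = steep
    [1, 1, 2],
    [1, 2, 2],
    [1, 1, 1],
]
soil_ok = [       # 1 = suitable soil, 0 = unsuitable
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]


def overlay(a, b, rule):
    """Cell-by-cell overlay: apply rule to the two values at each (row, col)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have identical dimensions")
    return [[rule(va, vb) for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Suitable sites: gentle slope AND suitable soil
suitable = overlay(slope_class, soil_ok, lambda s, k: int(s == 1 and k == 1))
for row in suitable:
    print(row)
```

The equivalent vector overlay would have to intersect polygon boundaries geometrically and rebuild topology, which is the "combination of overlays creates difficulties" point in the vector list.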


 

Comparison table of two data structures and their applications

Vector (inventory)             | Raster (analysis)
Lines real                     | Lines artificial (pixels)
Data known (pre-interpreted)   | Data probabilistic
Descriptive inquiries          | Prescriptive analysis
Computer mapping               | Spatial statistics
Spatial DBMS                   | Modeling

Adapted from J. Berry, 1993, Beyond Mapping, Table 2.1, page 11.


Which structure is best to use? The most appropriate is best. (J. Berry).

 

 


Database Management Systems

Computer-based systems for creation, manipulation, management and update of data;

 

Benefits & Liabilities of the Database Management Approach

+ Central Control;

+ Data independence;

+ Easier Implementation of New Applications;

+ Direct User Access;

+ Control Redundancy;

- $$$ (Expense);

- Complexity;

- Centralized Risk.

Modern GIS software links geography and attributes:

  • Arc/Info: links ARC (the geography) with attributes managed by the INFO (or other) DBMS.
  • ArcView: can access Arc/Info data, and can also use dBASE format files directly.
  • DBMSs are central to GIS operation, supporting graphical and logical queries, attribute creation and update, reporting, and statistics. This is even more true of ArcGIS and ArcView 8.

Basic Database Components

  • Field/variable/attribute/column: description of a spatial data item;
  • Fields contain instances of the possible attribute values;
  • The data in fields may be coded as integer (I), floating point (F), or character (A) (aka alphanumeric or string);
  • Record: a row in the database, representing a discrete spatial object; each object gets a row, and each row/record is composed of attributes (aka fields).

Types of Database Management Systems

  • Flat File: simplest, most common system type found in the PC environment;
  • Hierarchical: structured as a hierarchy; users must navigate up and down through the hierarchy;
  • Network: hierarchical, with links across levels;
  • Relational: the most common DBMS type used in GIS (INFO, Oracle and Sybase are examples). Normalization reduces redundancy and facilitates updates; successful implementation depends upon key codes, where each object (row) contains at least one unique identifier with a one-to-one relation to the geography.
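Key codes are what let a relational DBMS tie attributes back to the geography. The sketch below uses Python's built-in sqlite3 module with two hypothetical tables, one on the geography side and one holding attributes, joined on a shared unique PARCEL_ID. SQLite's INTEGER, REAL and TEXT stand in for the I, F and A field types; the table names, fields and values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Geography side: one row per polygon, keyed by a unique id
con.execute("CREATE TABLE parcel_geom (PARCEL_ID INTEGER PRIMARY KEY, AREA_M2 REAL)")
# Attribute side: a separate table linked by the same key (I, A, A fields)
con.execute("CREATE TABLE parcel_owner (PARCEL_ID INTEGER PRIMARY KEY, OWNER TEXT, ZONING TEXT)")

con.executemany("INSERT INTO parcel_geom VALUES (?, ?)",
                [(101, 5200.0), (102, 870.5), (103, 12040.0)])
con.executemany("INSERT INTO parcel_owner VALUES (?, ?, ?)",
                [(101, "Lee", "AG"), (102, "Ortiz", "R1"), (103, "Lee", "AG")])

# Logical query across both tables via the key code
rows = con.execute("""
    SELECT g.PARCEL_ID, g.AREA_M2, o.OWNER
    FROM parcel_geom AS g
    JOIN parcel_owner AS o ON g.PARCEL_ID = o.PARCEL_ID
    WHERE o.ZONING = 'AG'
    ORDER BY g.PARCEL_ID
""").fetchall()
print(rows)
```

If the key were not unique, or missing from one table, the join would duplicate or drop features, which is why the one-to-one key is the critical requirement.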

     


GIS Database Organization

Map/Coverage Organization

Arc/Info coverages are physically directories on the computer;

Each directory bears the name of the coverage and contains multiple files that together constitute the coverage;

Arc/Info coverages must reside in an Arc/Info workspace, characterized by the presence of an INFO subdirectory;

ArcView can access this data structure; ArcView's own format, the shapefile, consists of at least three files: name.shp and name.shx (the geometry and its index) and a dBASE file, name.dbf (the attributes);

Other GIS systems adopt similar strategies: most use more than one file to contain a coverage.

Database / Study Area / Multi-Coverage Organization

Depending upon the scale and extent of a study area, GIS databases may be organized either as a single, seamless coverage or as multiple, spatially integrated coverages;

Multiple maps for a single study area are sometimes termed tiles;

Digital map libraries may consist of multiple tiles, each containing multiple thematic layers;

Higher-end software (Arc/Info) has optional modules to assist in managing map libraries (Arc Librarian, Arc Storage Manager (STORM) and Arc SDE (Spatial Database Engine)), allowing high-end DBMS tools to be employed.
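With a tiled library, the first step in any query is finding which tile holds a location. A minimal sketch, assuming regular rectangular tiles laid out from a lower-left origin (x0, y0) with row 0 at the south; tile_of, tile_name and the naming scheme are hypothetical, and real map libraries may use irregular tiles or an index coverage instead.

```python
def tile_of(x, y, x0, y0, tile_w, tile_h, ncols, nrows):
    """Return (tile_row, tile_col) of the tile containing map point (x, y).

    Tiles form a regular grid whose lower-left corner is (x0, y0); row 0 is the
    southernmost row. Returns None if the point falls outside the library extent.
    """
    col = int((x - x0) // tile_w)
    row = int((y - y0) // tile_h)
    if 0 <= col < ncols and 0 <= row < nrows:
        return row, col
    return None


def tile_name(row, col):
    """Hypothetical naming scheme for each tile's folder or coverage, e.g. 'T_r01_c02'."""
    return f"T_r{row:02d}_c{col:02d}"


# A 4 x 4 library of 10 km tiles anchored at (500000, 4400000)
hit = tile_of(523500, 4412000, 500000, 4400000, 10000, 10000, 4, 4)
print(hit, tile_name(*hit))
```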


    Return to LA 467 page.