Modeling & Analyzing Massive Terrain Data Sets (STREAM Project) Pankaj K. Agarwal Workshop on Algorithms for Modern Massive Data Sets Diverse elevation point data: density, distribution, accuracy
Photogrammetry 0.76m v. accuracy (5ft contours) 1974 1995 Lidar 0.15m v. accuracy; altitude 700m and 2300m
1999 2001 1998 RTK-GPS 0.10m v. accuracy
2004 Constructing Digital Elevation Models Grid DEM: Elevation stored at uniform grid points TIN: Triangulation; elevation stored at vertices Contours Maps: Iso-contour lines at regular intervals
Natural feature extraction Maps of topo parameters: computed simultaneously with interpolation using partial Ma derivatives of the RST function, terrain features: combined thresholds of topo parameters Extracted foredunes 1999-2004 using profile curvature and elevation threshold
Tracking Evolving Features slip faces: slope > 30deg 1995 new slip face in 1999 2001
dune crests: profile curvature threshold how to automatically track features that change some of their properties over time? Terrain Analysis Flow Analysis
Watershed hierarchy Visibility Navigation Challenge: Massive Data Sets LIDAR NC Coastline: 200 million points over 7 GB
Neuse River basin (NC): 500 million points over 17 GB Applachian Range: 50GB-5TB Output is also large 10ft grid: 10GB 5ft grid: 40GB Approximation Algorithms
Exact computation expensive Many practical problems are intractable Multiple & often conflicting optimization criteria Suffices to find a near-optimal solution Tunable Approximation algorithms Tradeoff between accuracy and efficiency User specifies tolerance
I/O-Bottleneck Data resides in secondary memory Disk access is 106 times slower than main memory access Maximize useful data transferred with each access Amortize large access time by transferring large contiguous blocks of data
head I/O not track CPU is often read/write the bottleneck for processing massive data sets read/write arm magnetic surface
I/O-efficient Algorithms [AV88] Traditional algorithms optimize CPU computation Not sensitive to penalty of disk access Disk
I/O model Memory is finite Data is transferred in blocks (B: block size) Complexity: #disk blocks transferred (#I/Os) B RAM
CPU B ~ 2K-8K External Memory Algorithms [Vitter] M The TerraStream Modules
TIN DEM Level Sets, Contours Contour Maps Contour Maps
Usage of contour lines (also called iso-contours, isogons, etc) goes back to at least 17th century Philosophical Transactions of Royal Society of London, 1779 Computing Contour Map: Internal Memory Algorithm Find a seed point on each contour and traverse the triangulation to trace each contour
Use a simple data structure (e.g. contour trees) to compute seed points Query time: O(log N + T) T: #contour edges Contour map: O(Nlog N +T) T: #contour map edges For massive terrains I/O efficiency is bad: O(N+T) instead of O((N+T)/B)
Storing a TIN on a disk so that it can be traversed with as few I/Os possible Is there a good ordering of the triangles? Our results Ordering Theorem: A total ordering, called C-ordering, of triangles can be computed in O(Sort(N)) I/Os s.t. the subsequence of triangles
intersecting a contour appears along the contour and contours in a level set are broken in nested order. Individual contours can be retrieved in O(T/B)I/Os from this ordering Computing contour maps: O(Sort(N)+T/B) I/Os Answering contour queries Preprocessing Time: O(Sort(N)) I/Os Space: O(N/B) disk blocks Query: O(logBN+T/B)
Height Graph H*: Dual graph of H Critical Points maximum
saddle minimum regular Simple Terrains
Positive & Negative Saddles Cut Trees Surgery on Terrain Surgery & Contours
Nesting of Contours [ [ [
] [ ] ] ] Ongoing Work Compute C-ordering for general 2manifolds
Maintain C-ordering for hierarchical representation of terrains Denoising contours Collaborators Lars Arge
Helena Mitasova Andy Danner Thomas Molhave Sponsored by
Army Research Office W911NF-04-1-0278 Bardia Sadri Ke Yi I/O-Efficient Algorithms
Answering a contour query: Preprocessing O(NlogBN), Space: O(N/B) blocks Query: O(logBN+T/B) Front-ends