The Ninth Bay Area Discrete Math Day
Fall 2004
October 30, 2004, Time: 3:30-4:00PM, Location: 60 Evans Hall, UC Berkeley





Algorithms For Finding Optimal Partitions of Astronomical Data

Bradley Jackson

San Jose State University




Abstract

In this talk I will discuss some algorithms for analyzing the density of point data in n-dimensional space. Applications include time-series analysis, finding optimal histograms with unequal bin sizes allowed, cluster analysis, density estimation, signal processing, and image processing. We assume that the data space is partitioned into data cells (often the voronoi diagram of the data points). We partition the data cells into blocks (unions of data cells) and for each such partition consider a function that is constant on each block. Using Bayesian statistics, each partition of the data is assigned a value corresponding to the likelihood that the corresponding function fits the data. Our goal is to find the optimal partition of the data for which the maximum value is attained. The value functions we use satisfy the additive property, that is, the value of a partition is equal to the sum of the values of its blocks. Thus any subpartition of an optimal partition is an optimal partition of the region that it covers. This is known as the principle of optimality. Using the principle of optimality we are able to use dynamic programming to find the optimal partition of N data cells on an interval, in O(N^2) time. The value functions we use are also strictly convex. This leads to a property that we call the intermediate density property. Using this property we are able to find the optimal partition of N data cells into arbitrary blocks (not necessarily connected) also in O(N^2) time. This algorithm works for data in any dimension, by reducing the problem down to an equivalent 1-dimensional problem. Using this algorithm we obtain a branch and bound algorithm for finding the optimal partition of N data cells into connected blocks, which also works for data in any dimension. We don't yet know if there is an efficient algorithm for finding the optimal partition of a set of data cells into connected blocks that works for data in dimensions higher than 1.