Background for High Performance Astronomical Data Processing
Astronomical Data Reduction
The aperture synthesis technique makes it possible to
achieve high resolution without having to build radio
telescopes tens of kilometers in diameter. An array of many
antennas, interconnected pairwise as interferometers,
simultaneously measures a large number of spatial-frequency
Fourier components (visibilities); the radio image, whose
angular resolution is set by the maximum separation of the
antennas, is then formed in a digital computer by Fourier
transformation of the observed visibilities. Because the
image is formed in this way, the computer system is an
integral part of the synthesis array telescope: it is the
telescope's image-forming element. The computational
requirements are driven not solely by the single FFT of the
observed visibilities required to produce an image, but also
by algorithms developed to overcome two factors that greatly
degrade image quality. Sparse sampling of the aperture plane
results in strong sidelobes (i.e., large spurious responses
away from the principal maximum); this instrumental
signature may be removed by computationally intensive
non-linear deconvolution techniques. Time-varying systematic
errors due to instrumental and atmospheric instabilities also
degrade image quality; self-calibration, a powerful but
computationally intensive method, has been developed to
correct for these rapid atmospheric and instrumental
distortions.
Together, deconvolution and self-calibration techniques can
provide several orders of magnitude improvement in the
fidelity of images produced by radio synthesis arrays.
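As a minimal illustration of forming an image by Fourier transformation of visibilities, the sketch below computes a "dirty" image by direct summation for a simulated point source. It assumes the small-field (l, m) approximation and the sign convention V(u, v) = ∫ I(l, m) exp(-2πi(ul + vm)) dl dm; the function names and the random (u, v) coverage are hypothetical. A production system would grid the visibilities and use an FFT rather than this O(N_vis × N_pix) sum.

```python
import cmath
import math
import random


def point_source_visibilities(uv, flux, l0, m0):
    """Visibilities V(u, v) = S * exp(-2*pi*i*(u*l0 + v*m0)) of a point source."""
    return [flux * cmath.exp(-2j * math.pi * (u * l0 + v * m0)) for u, v in uv]


def dirty_image(uv, vis, ls, ms):
    """Direct Fourier inversion of visibilities onto an (l, m) grid.

    Returns image[row][col] for m = ms[row], l = ls[col], normalized so that a
    point source of flux S peaks at S. Keeping only the real part is
    equivalent to also using each conjugate sample V(-u, -v) = conj(V(u, v)),
    which holds because the sky brightness is real.
    """
    n = len(vis)
    image = []
    for m in ms:
        row = []
        for l in ls:
            acc = 0j
            for (u, v), vk in zip(uv, vis):
                acc += vk * cmath.exp(2j * math.pi * (u * l + v * m))
            row.append(acc.real / n)
        image.append(row)
    return image


# Hypothetical sparse (u, v) coverage in wavelengths (simulated, not real data).
rng = random.Random(0)
uv = [(rng.uniform(-500, 500), rng.uniform(-500, 500)) for _ in range(50)]
ls = ms = [-0.02, -0.01, 0.0, 0.01, 0.02]  # direction cosines (radians)
vis = point_source_visibilities(uv, flux=2.0, l0=0.01, m0=-0.02)
img = dirty_image(uv, vis, ls, ms)
```

The value at every other grid point is the point source's sidelobe response; with sparse sampling these sidelobes are exactly the spurious structure that deconvolution is meant to remove.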
Many of the most computationally intensive observations
are spectral line observations, where hundreds or thousands
of images of the same region are to be obtained
simultaneously, each at a slightly shifted frequency.
Because each of these images may be treated independently
of the others, the problem is completely parallelizable.
The solution is simply to have each of the
available processors handle the image construction for one
of the spectral line channels. If the number of processors
is less than the number of channels, a scheduling system can
keep processors supplied with new data until all spectral
channels have been processed. The separate results can then
be combined into a single data cube for visualization and
analysis.
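The scheduling idea in the paragraph above can be sketched with Python's standard concurrent.futures module. This is a minimal illustration, not IMAGER's scheduler: image_channel is a hypothetical placeholder for the per-channel pipeline, and a thread pool is used only to keep the example simple. In CPython, CPU-bound work would normally use ProcessPoolExecutor, which offers the same map() interface.

```python
from concurrent.futures import ThreadPoolExecutor


def image_channel(visibilities):
    """Hypothetical stand-in for the per-channel pipeline (grid, FFT, deconvolve).

    Here it just doubles each value so the result depends on its input.
    """
    return [2 * x for x in visibilities]


def image_all_channels(channel_data, n_workers):
    """Process every spectral channel independently and stack the results.

    The executor keeps a queue of pending channels and each idle worker takes
    the next one, so any number of channels can be handled by n_workers
    workers. map() returns results in channel order, ready to form a cube.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(image_channel, channel_data))


# Three channels, two workers: the third channel waits for a free worker.
cube = image_all_channels([[1, 2], [3], [4, 5, 6]], n_workers=2)
```

Because results come back in input order, cube[k] is the processed plane for channel k, regardless of which worker finished first.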
IMAGER: A Prototype Spectral Line Imaging System
The first step toward high performance astronomical image
processing was to develop a prototype system within SDE
(Software Development Environment), the radio synthesis
array software system. The IMAGER package executes the SDE
code in parallel.
IMAGER documentation (PostScript, 169 kB) is available.
Code for gridding the observed visibility data, Fourier
transforming the gridded data into the image plane, and
performing non-linear deconvolution has been implemented.
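The test problem described under Performance Testing uses CLEAN deconvolution. The following is a minimal one-dimensional sketch of the classic Högbom CLEAN loop, offered only to illustrate the idea; the function name, the 1-D simplification, and the gain and threshold defaults are assumptions, not SDE's implementation.

```python
def hogbom_clean(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """Minimal 1-D Hogbom CLEAN.

    dirty : dirty image (list of floats)
    psf   : dirty beam, odd length, peak value 1.0 at its centre
    Repeatedly finds the residual peak, records gain * peak as a point-source
    (clean) component at that pixel, and subtracts the correspondingly scaled
    and shifted dirty beam from the residual. Returns (components, residual).
    """
    residual = list(dirty)
    components = [0.0] * len(dirty)
    centre = len(psf) // 2
    for _ in range(max_iter):
        # Locate the strongest remaining feature (positive or negative).
        peak_idx = max(range(len(residual)), key=lambda i: abs(residual[i]))
        peak = residual[peak_idx]
        if abs(peak) < threshold:
            break
        step = gain * peak
        components[peak_idx] += step
        # Subtract the scaled beam centred on the peak, clipped at the edges.
        for j, b in enumerate(psf):
            k = peak_idx + j - centre
            if 0 <= k < len(residual):
                residual[k] -= step * b
    return components, residual


# A unit point source at pixel 5 observed through a simple dirty beam.
psf = [0.25, 0.5, 1.0, 0.5, 0.25]
dirty = [0.0] * 11
dirty[3:8] = psf
components, residual = hogbom_clean(dirty, psf)
```

A small loop gain makes the iteration converge slowly but robustly; the clean components are then usually convolved with an idealized beam and added back to the residual to form the final image.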
The IMAGER package uses the miriad user interface to drive
the available tasks, which are set up as a pipeline. The
pipeline of tasks is carried out separately for each
channel, and individual channels are sent to separate
processors for concurrent operation.
Some time is spent in a serial setup (usually done only
once); beyond that, the number-crunching tasks are
parallelized and speed up nearly linearly with the number
of processors. Currently, the IMAGER package is installed
on the NCSA SGI systems.
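The effect of a serial setup on otherwise parallel work is commonly estimated with Amdahl's law, S(p) = 1 / (s + (1 - s) / p), where s is the serial fraction of the single-processor run time. The sketch below is a generic model that ignores communication and load imbalance; it is not a measurement of IMAGER, and the serial fractions used are hypothetical.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Ideal speedup S(p) = 1 / (s + (1 - s) / p) from Amdahl's law.

    serial_fraction : share s of the one-processor run time that cannot be
                      parallelized (e.g. a serial setup step)
    n_procs         : number of processors p
    Communication and load-imbalance costs are ignored.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)
```

With s = 0.05, for example, 16 processors give a speedup of only about 9.1, which is why keeping the serial setup to a one-time step matters when the parallel steps are rerun.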
[Figure: The SGI Origin 2000 at NCSA]
The underlying two-dimensional data reduction programs that
carry out the processing of individual channels are from
SDE. SDE assumes that the visibility and image data fit into
virtual memory. IMAGER was created to run optimally on the
parallel SGI systems at NCSA. The total physical memory of
an Origin2000 is 4 to 64 GB, depending on the machine, so
almost all problems fit into physical memory and do not
require swapping to disk. SDE requires visibility data to be
in UVFITS format with each channel in a separate file, and
it additionally expects multiple-pointing data sets to have
all pointings in a special "mosaic" database. The IMAGER
package therefore includes data conversion functionality.
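A rough check of the in-memory assumption can be sketched as follows. It assumes 4-byte floating-point pixels and treats the number of cube-sized working arrays (n_copies) as a hypothetical parameter, so it gives a lower bound rather than SDE's actual memory requirement. For a 256 × 256 × 100 cube, the cube itself is 26,214,400 bytes (25 MiB).

```python
def cube_bytes(nx, ny, nchan, bytes_per_pixel=4):
    """Size in bytes of an nx x ny x nchan image cube (4-byte floats assumed)."""
    return nx * ny * nchan * bytes_per_pixel


def fits_in_memory(vis_bytes, nx, ny, nchan, mem_bytes, n_copies=1):
    """Rough check that the visibilities plus n_copies image cubes fit in RAM.

    n_copies is a hypothetical allowance for working arrays (dirty image,
    residual, model, ...); real programs also need extra scratch space.
    """
    return vis_bytes + n_copies * cube_bytes(nx, ny, nchan) <= mem_bytes
```

Even with several cube-sized working arrays, a data set of a few hundred megabytes stays far below 4 GB, consistent with the statement that almost all problems fit into physical memory.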
The IMAGER system has been used by astronomers at the
University of Illinois, and by visiting astronomers, to
analyze data from the VLA and BIMA telescopes. In every
project there is an initial overhead for converting the data
into the proper format. The worst case is multiple-field
spectral line data, where the data for all channels of a
single pointing initially reside together, with a separate
file for each pointing. The data conversion stage separates
each pointing by channel and then recombines the pointings,
producing a separate file for each channel. In the worst
case, this overhead may be 50% of the total execution time.
However, IMAGER is intended to support iterative data
reduction, so in practice the conversion is done only once,
while the subsequent steps (those optimized for parallel
execution) are carried out repeatedly.
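The regrouping performed by the conversion stage can be sketched in memory as below. Nested dictionaries stand in for the UVFITS files, and the function name and data layout are hypothetical; the real conversion also has to read and write the file formats.

```python
from collections import defaultdict


def regroup_by_channel(pointing_files):
    """Turn per-pointing data into per-channel data sets holding every pointing.

    pointing_files : {pointing_id: {channel: [visibility records]}}
                     (an in-memory stand-in for one file per pointing)
    Returns {channel: {pointing_id: [visibility records]}}, i.e. one
    "file" per channel containing all pointings.
    """
    per_channel = defaultdict(dict)
    for pointing, channels in pointing_files.items():
        for channel, records in channels.items():
            per_channel[channel][pointing] = records
    return dict(per_channel)
```

The one-off cost amortizes quickly: if conversion takes as long as a single processing pass (the 50% worst case), then after k processing passes it accounts for 1/(k + 1) of the total time.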
The IMAGER wrapper and SDE binaries were compiled on
the SGI Cray Origin2000 with the following
hardware/software.
SGI Cray Origin2000 hardware information:
  FPU: MIPS R10010 Floating Point Chip, Revision 0.0
  CPU: MIPS R10000 Processor Chip, Revision 2.6
  Processors: 32 IP27 processors at 195 MHz
  Main memory size: 4096 MB
  Instruction cache size: 32 KB
  Data cache size: 32 KB
  Secondary unified instruction/data cache size: 4 MB
Operating system: IRIX64 6.4
Compiler version: MIPSpro 7.1
Compiler flags: -Ofast=IP27 -mips4 -64 -static -Ofast -IPA
Performance Testing:
Our test problem was the imaging and CLEAN deconvolution of
a single-pointing observation of the molecular gas (the CS
line) associated with the "sickle" HII region near the
Galactic center. The data were acquired with the BIMA
interferometer. The visibility data set was about 300 MB,
and the output image cube was 256 x 256 pixels x 100
channels. All timing tests were carried out after the
initial data format conversion. The number of threads was
varied from 1 to 16. A speedup graph is available.
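Speedup and parallel efficiency are usually computed from wall-clock timings relative to the single-thread run: S(p) = T(1) / T(p) and E(p) = S(p) / p. A minimal sketch, assuming timings keyed by thread count; the function name and input format are hypothetical, and no measured IMAGER timings are included here.

```python
def speedup_table(timings):
    """Speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p.

    timings : {threads: wall-clock seconds}; must include a 1-thread run.
    Returns {threads: (speedup, efficiency)} in increasing thread order.
    """
    t1 = timings[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(timings.items())}
```

An efficiency close to 1.0 across thread counts corresponds to the nearly linear speedup described for the parallelized steps.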