NOAO is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under cooperative agreement with the National Science Foundation.
Draft date: November 6, 1996
This section presents a brief overview of the data reduction software for the NOAO CCD Mosaic. It concentrates on describing the general data flow and the main data reduction software components. The details of each component are given in later sections. The figure below illustrates the components and data flow with the data reduction components highlighted.
The data acquisition system (DAS) sends pixel and descriptive information down the message bus. Various components receive the information from the message bus. Two major components are the real time display (RTD) and the data capture agent (DCA). The DCA writes the information for the observation to a FITS Multiextension file (FITS-ME) in the Mosaic data format. It also sends a message to the data reduction agent (DRA) as each observation file is completed.
The data reduction agent operates on the observation files automatically or by user command. Basic calibrations are applied by CCDPROC, which removes detector biases and defects. Another basic calibration applies a world coordinate system (WCS) calibration prepared earlier from a standard astrometry field. This is done by MSCWCS. The basic WCS calibration defines a fairly accurate mapping between pixels and celestial coordinates. If a catalog of sources for the field of observation is available, MSCWCS finds the sources based on the WCS calibration and updates the WCS to correct small errors. The DRA may automatically obtain source information for the field from on-line catalogs.
Calibration observations, such as flat fields, generally include many exposures to minimize noise. The multiple exposures are combined by COMBINE to create master calibrations to be applied to the science observations. The DRA automatically senses sequences of calibration exposures and combines them once the sequence is finished. It also keeps track of the master calibrations and applies the appropriate one to new science exposures. Science exposures are generally not combined by the DRA since these are usually dithered or rastered.
In the basic processing the individual amplifiers and CCDs are kept separate except that multiple amplifiers from a single CCD may be merged together into a single extension for the CCD. The user may analyze the calibrated exposures keeping the readouts separate. This avoids any resampling of the pixel data. However, the user may wish to resample the elements of the Mosaic into a single large image. MSCIMAGE uses the WCS to resample the pixels into a uniform grid on the sky. This corrects alignment errors in the detector and optical distortions.
When multiple exposures of a field are taken the images from MSCIMAGE are combined with COMBINE. This would include offsets for dithering and raster patterns. To avoid resampling the data a second time MSCIMAGE produces images which are sampled on an even grid of pixels on the sky. This means that multiple exposures can be combined using integer shifts along the raster axes.
For this to work well the WCS used by MSCIMAGE must be consistent over the data set. There are two ways in which this is assured. One is if the WCS has been absolutely calibrated by MSCWCS using catalog sources. If one does not have a catalog of sources to determine an absolute WCS calibration then the objects in the images may be used to derive a self-consistent WCS over a set of overlapping images. The objects in each image are cataloged by an automatic detection algorithm. Each object will have a coordinate based on the approximate WCS calibration. The objects in the catalogs are matched using an automatic matching algorithm. The WCS coordinates of matched objects are adjusted to define a consistent WCS for all the images from the field. In effect this registers objects in the images so that when the images are combined the common objects will be aligned. The task that does this is MSCREGISTER.
In addition to calibrating, registering, and combining the pixel data the data reduction software also creates and maintains auxiliary data. This includes bad pixel masks, uncertainty arrays, and exposure maps.
The NOAO CCD Mosaic data format consists of a single FITS file for each observation. The FITS file contains a primary header with no associated data and a number of extensions. The primary header is used to describe the contents of the file and contains global keyword information applicable to all the image extensions. The extensions include the image data from each amplifier, pixel masks, uncertainty arrays, exposure maps, auxiliary tables, etc. The image extensions are always present while the other information is added at various stages during the reductions.
The following figure illustrates the data structure. The PDU stands for the primary data unit. The figure also shows how the inheritance convention defines the header for each image extension as the combination of the global keywords and the keywords for each individual image header.
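The inheritance convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `effective_header` and the keyword values are hypothetical, and the rule that an extension keyword overrides a global keyword of the same name is an assumption, since the text says only that the two sets of keywords are combined.

```python
def effective_header(primary, extension):
    """Return the header seen for one image extension under inheritance.

    Assumption: when a keyword appears in both headers, the extension
    value wins. The text only states that the two sets are combined.
    """
    merged = dict(primary)    # start from the global keywords
    merged.update(extension)  # add (and, by assumption, override with) extension keywords
    return merged


# Hypothetical keyword values, for illustration only.
primary = {"OBSERVAT": "KPNO", "EXPTIME": 300.0, "FILTER": "V"}
im1 = {"EXTNAME": "im1", "GAIN": 2.1}

header_im1 = effective_header(primary, im1)
print(header_im1["FILTER"], header_im1["GAIN"])
```

The global keywords are stored once in the primary header, yet every image extension can be treated as if it carried the full set.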
A detailed description of the NOAO image format including the keyword definitions is given in NOAO Image Data Structure Definitions.
Pixel masks assign a non-negative integer value to each pixel in an image. The meaning of a mask value depends on the purpose of the mask (more than one mask may be assigned to an image) and on the application that will use it. Because most pixels often have the same mask value, IRAF provides a special, very compact representation called a "pixel list". One of the issues still to be resolved for the Mosaic data format is how pixel lists will be stored as FITS extensions. Use of the IMAGE extension clearly defeats the purpose of the compact list.
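To show why a mask with long runs of identical values compresses so well, here is a minimal run-length encoding sketch. It is not the actual IRAF pixel list format, which is not described here; the function names are hypothetical and the encoding is just one simple scheme with the same property.

```python
from itertools import groupby


def encode_row(row):
    """Run-length encode one mask row as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def decode_row(runs):
    """Expand (value, run_length) pairs back into a full mask row."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# A 1024-pixel row with a single 3-pixel defect.
row = [0] * 500 + [1] * 3 + [0] * 521
runs = encode_row(row)
print(runs)  # [(0, 500), (1, 3), (0, 521)]
```

Three pairs replace 1024 stored values, which is the kind of saving that would be lost if the mask were written as a full IMAGE extension.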
Because the pixel list format is so compact it will also be used to represent some real values associated with the pixels. This is appropriate when there are only a few values that have large regions of constant value or when a range of values can be mapped to a set of discrete values with some desired precision. The mapping will often be linear, comparable to the FITS mapping of real values to integers using BSCALE and BZERO, though non-linear maps may also be used.
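The linear case can be sketched as follows, using the FITS relation physical = BZERO + BSCALE * stored. The function names and the choice of rounding to the nearest integer are assumptions made for illustration.

```python
def to_integer(value, bscale, bzero):
    """Quantize a real value to the nearest integer level."""
    return round((value - bzero) / bscale)


def to_physical(stored, bscale, bzero):
    """Recover the (quantized) real value from the stored integer."""
    return bzero + bscale * stored


# Hypothetical scaling: levels spaced 0.25 apart, starting at 0.
bscale, bzero = 0.25, 0.0
level = to_integer(3.1, bscale, bzero)
print(level, to_physical(level, bscale, bzero))  # 12 3.0
```

With this mapping the recovered value differs from the original by at most half of BSCALE, so BSCALE sets the "desired precision" mentioned above.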
The types of integer pixel masks being considered are:
The pixel masks that will definitely be associated with the NOAO CCD Mosaic data are bad pixel masks and uncertainty values. When combining multiple calibrations or dithered exposures there will also be an exposure map. Specific extension names will be defined for these associated data.
The bad pixel mask will identify good and bad pixels. The proposed values for the mask are:
A very important aspect of the image data is the uncertainties. Many of the concepts are reasonably well understood such as the characterization of the uncertainties in the raw CCD data in terms of a readout noise and Poisson statistics and how uncertainties are propagated when combining pixels with independent errors. Others are less well understood such as what happens with resampling. The biggest dilemma has been how to maintain the uncertainty information without doubling the data volume by using an associated data array of uncertainty values of the same size as the image data. The NOAO CCD Mosaic Software Project provides an opportunity to address the question of uncertainties. In terms of the data structure we need something that will be compact yet offer the flexibility to characterize the uncertainties of each pixel.
The model we propose for CCD uncertainties is
    V(i,j) = A + (B + I(i,j)) * f(U(i,j))    (1)
where V(i,j) is the variance (sigma squared), I(i,j) is the data, A and B are constants, U(i,j) is an array of values, and f is a mapping function. In order to provide a compact description U(i,j) is represented as a pixel list of integers which, hopefully, have large regions of constant value. The use of integers means that the variances will be quantized at some precision. The mapping function f can be defined to adjust the resolution at different levels. Note that there is already a mapping relative to the pixel sigmas because of the definition in terms of the variance.
This model allows easy propagation of errors in the common cases. The A value is a constant noise term. Typically this would be the CCD readout noise. When adding or subtracting two images the corresponding A terms add. The B term is used when adding or subtracting constant values from images. For raw CCD data this value is zero.
The usefulness and compactness of this model, that is how well the idea of largely constant areas in the U array will work in practice, still need to be investigated. Preliminary experiments show promise that this approach will work effectively.
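The propagation rules can be checked with a small numerical sketch, reading equation (1) as V = A + (B + I) * f(U). The numbers are hypothetical. The choice f(U) = 1/gain with A equal to the squared read noise in data units is an assumption corresponding to the usual raw CCD noise model, and the statement that the B terms also add follows from the model only when both images share the same f(U).

```python
def variance(data, a, b, f_u):
    """Variance of one pixel under the reading V = A + (B + I) * f(U)."""
    return a + (b + data) * f_u


# Assumed raw CCD values (hypothetical numbers): data in ADU,
# gain in electrons/ADU, read noise in electrons.
gain = 2.0
read_noise = 4.0
A = (read_noise / gain) ** 2  # constant read-noise term in ADU^2
B = 0.0                       # zero for raw CCD data
F_U = 1.0 / gain              # assumed f(U): Poisson scaling for this gain

raw = 1000.0
v_raw = variance(raw, A, B, F_U)

# Subtracting a constant level c changes the data but not the noise,
# so B grows by c to keep the Poisson term unchanged.
c = 100.0
v_shifted = variance(raw - c, A, B + c, F_U)

# Adding two images that share f(U): the variances add, which is the
# same as adding the A terms (and the B terms) of the two images.
other = 400.0
v_sum = variance(raw + other, A + A, B + B, F_U)

print(v_raw, v_shifted, v_sum)
```

In each case only the two constants change; the U pixel list itself is untouched, which is what makes the propagation cheap.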
The problem of the storage format for the U pixel list is essentially the same as that for pixel masks. As with the masks the format in a FITS file is still to be determined.
The data reduction agent (DRA) provides pipeline data handling of the observational data. Its functions are
The DRA is a continuously running event-driven process. The events which trigger the above functions are
The first case provides automatic processing and archiving. The second case allows the user to perform manual calibrations or initiate recalibrations of the automatic processing. Reprocessing would be done when additional or improved calibration data becomes available. For example, the automatic processing can proceed using calibration data from the start of the night and recalibration can be done after additional calibrations at the end of the night are obtained.
The pipeline calibration, reduction, and quality assessment are defined by "recipes" selected from a list of recipes. A recipe is basically a "macro" or "script" that is executed on a specified disk file or set of disk files.
The DRA is controlled by a graphical user interface. This interface provides
Pipeline calibration consists of the standard CCD calibration operations. These are
The zero level, dark count, and flat field calibrations are created by combining multiple individual calibration exposures. The combining provides
Series of calibration exposures will be automatically detected and combined by monitoring the exposure types. For example, when a flat field type is first seen the individual exposures will be logged to the calibration database and when the first exposure which is not a flat field is detected all the preceding flat field exposures will be combined into a master flat field.
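A minimal sketch of this detection logic is shown below. The class name `SequenceMonitor`, the exposure type strings, and the `combine` callback are all hypothetical; the real DRA would also log each exposure to the calibration database.

```python
CALIBRATION_TYPES = {"zero", "dark", "flat"}  # assumed exposure type names


class SequenceMonitor:
    """Collect consecutive calibration exposures and combine each finished run."""

    def __init__(self, combine):
        self.combine = combine      # callback: combine(exptype, list_of_names)
        self.current_type = None
        self.pending = []

    def new_exposure(self, name, exptype):
        # A change of exposure type ends the current sequence.
        if exptype != self.current_type:
            self.flush()
            self.current_type = exptype
        if exptype in CALIBRATION_TYPES:
            self.pending.append(name)

    def flush(self):
        # Combine whatever calibration exposures have accumulated.
        if self.pending:
            self.combine(self.current_type, list(self.pending))
        self.pending = []


masters = []
monitor = SequenceMonitor(lambda exptype, names: masters.append((exptype, names)))
for name, exptype in [("zero1", "zero"), ("zero2", "zero"),
                      ("flat1", "flat"), ("flat2", "flat"),
                      ("obj1", "object")]:
    monitor.new_exposure(name, exptype)
print(masters)
```

The flat field sequence is combined only when the first non-flat exposure arrives, matching the behavior described above.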
The automatic pipeline calibrations will use the closest calibration or master calibration in time. During initial automatic processing this will be the most recent previous calibration. When recalibration is done the nearest in time may be either before or after the exposure being calibrated. However, the DRA can be instructed which calibration to use if desired.
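The selection rule can be sketched as a small function. The name `select_calibration`, the `allow_later` flag, and the use of plain numbers for times are assumptions made for illustration.

```python
def select_calibration(exposure_time, calibration_times, allow_later=False):
    """Pick the calibration closest in time to the exposure.

    With allow_later=False (initial automatic processing) only
    calibrations taken at or before the exposure are considered.
    Returns None if no calibration qualifies.
    """
    candidates = [t for t in calibration_times
                  if allow_later or t <= exposure_time]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t - exposure_time))


# Hypothetical times in hours: calibrations at the start and end of the night.
calibrations = [1.0, 11.0]
print(select_calibration(8.0, calibrations))                    # 1.0
print(select_calibration(8.0, calibrations, allow_later=True))  # 11.0
```

An explicit user choice of calibration, which the DRA also allows, would simply bypass this function.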
The details of the pipeline calibrations are specified by selecting a recipe and parameters in the DRA. Normally there will be one standard calibration recipe which will be part of the initial implementation. Variations of the standard recipe would be for special modes of operation (e.g. drift scans) and for future types of detectors and data (e.g. IR detectors where the order of calibration steps is different).
Pipeline data reductions are those operations automatically performed after CCD calibration. Possible examples are spectral extraction and object cataloging. The pipeline data reductions are selected from a list of recipes in the GUI. In most cases there will be no data reductions performed. Those that might be performed would generally be quicklook reductions that are redone later by the investigators in a more interactive manner.
The DRA does not provide the data reduction recipes. It only provides the mechanisms for adding data reduction recipes. The initial version of the DRA will probably not include any data reduction recipes.
Pipeline data assessment is a special kind of data reduction. It does something to the calibrated (or possibly uncalibrated) data which results in one or a few numbers. Often the numbers will be related to the signal-to-noise of the observation. Examples of this are monitoring the aperture photometry of some object(s) in a series of exposures of the same field or computing the mean extracted counts in a spectrum.
The DRA provides for recipes that perform data assessment with the results viewed as graphs or text output.
When the DRA is notified of a completed observation it may queue the raw observation and, possibly, pipeline calibrated data to be archived and taped. The archiving would, at a minimum, be something like "save-the-bits". The archiving will include access control to prevent general users from avoiding observatory mandated archiving.
Basic CCD calibration processing is performed by the IRAF task CCDPROC. It provides the standard CCD calibrations for each of the amplifier/CCD readouts of the Mosaic.
The processing is performed on input data in the Mosaic data format. The output data is also in the Mosaic data format with the CCD image data calibrated and the associated pixel masks and uncertainties updated. The output data is created in a temporary file until the processing is successfully completed. Then the input data is renamed to a backup directory and the output file is renamed to the input name.
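The temporary file and rename sequence might look like the following sketch. The function name `replace_with_backup` and the file names are hypothetical, and the actual task would be implemented within IRAF rather than Python.

```python
import os
import shutil
import tempfile


def replace_with_backup(input_path, temp_output_path, backup_dir):
    """Move the input into backup_dir, then give the output the input's name."""
    os.makedirs(backup_dir, exist_ok=True)
    backup_path = os.path.join(backup_dir, os.path.basename(input_path))
    shutil.move(input_path, backup_path)       # keep the unprocessed data
    shutil.move(temp_output_path, input_path)  # output takes over the input name
    return backup_path


# Demonstration with small stand-in files in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    obs = os.path.join(workdir, "obs001.fits")
    tmp = os.path.join(workdir, "obs001.tmp")
    with open(obs, "w") as f:
        f.write("raw")
    with open(tmp, "w") as f:
        f.write("calibrated")
    backup = replace_with_backup(obs, tmp, os.path.join(workdir, "backup"))
    with open(obs) as f:
        result = f.read()
    with open(backup) as f:
        saved = f.read()
print(result, saved)  # calibrated raw
```

Because the renames happen only after processing succeeds, a failure partway through leaves the original observation file untouched.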
One change of data format is when there are multiple amplifiers from each CCD. The calibrated amplifier images are combined into a single image for the CCD. The output Mosaic format then consists of multiple extensions for the CCDs.
The Mosaic version of CCDPROC is actually a relatively simple task, possibly an IRAF script, that understands the details of the Mosaic data format. It extracts the individual amplifier images and associated data, such as pixel masks and uncertainties, and passes them to a lower level task to do the actual processing. It then takes the calibrated data and updated associated data and puts them back into the Mosaic data format. The lower level task is written to process individual images and associated pixel masks and uncertainties from an input to an output and has no knowledge of the details of the Mosaic data format.
The extraction from the Mosaic format to individual images and the reconstruction of the Mosaic format from the individual calibrated images do not actually involve extra copying of the data or intermediate files for the bulk CCD data. The FITS image kernel allows individual input images to be addressed directly in a multiextension FITS file and the output images to be appended to new extensions of a multiextension file. This is done so that an IRAF application does not need to know the disk structure of the data and can be written as simply reading and writing logically individual images. The Mosaic CCDPROC task controls the syntax to the FITS kernel image specification.
To illustrate how this works consider the following command sequence.
The first statement copies the input data global header to a new output FITS multiextension file. The second statement passes the image extension "im1" to the lower level CCDTOOL task as a single image and tells CCDTOOL to create a new output image "outdata[im1]", output pixel mask "tempmask", and output uncertainties "tempvar". The FITS kernel appends the calibrated data to "outdata" without CCDTOOL knowing that it is appending to an existing file. The next statement appends (sequentially) the pixel mask and uncertainty data from the temporary files to the output data file. Note how this avoids simultaneous access to the output image. Mask and uncertainty files are small and there is no significant overhead to using a temporary disk image. The final part of the example shows that other extensions in the input data can be copied by the task that knows about the data format without requiring something like CCDTOOL to know about the non-image extensions.
The combining of multiple calibration exposures from the Mosaic detector is performed by an IRAF task COMBINE. It combines the individual elements of the Mosaic matched by amplifier or CCD identification. The combining is done pixel-by-pixel within each amplifier/CCD image. It also propagates combined bad pixel masks, variance images, and exposure maps. The input and output data formats for the combining are the Mosaic data format.
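As an illustration of pixel-by-pixel combining with mask and exposure map propagation, here is a minimal sketch for one amplifier/CCD image. It is not the COMBINE algorithm: it uses a plain average, omits variance propagation, and assumes a mask convention of 0 for good and nonzero for bad pixels. The function name `combine_images` is hypothetical.

```python
def combine_images(images, masks, exptimes):
    """Average same-shaped images pixel by pixel, skipping bad pixels.

    images, masks: lists of 2D lists; a mask value of 0 means good
    (assumed convention). Returns (combined, out_mask, exposure_map).
    """
    nrows, ncols = len(images[0]), len(images[0][0])
    combined = [[0.0] * ncols for _ in range(nrows)]
    out_mask = [[0] * ncols for _ in range(nrows)]
    expmap = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            good = [k for k in range(len(images)) if masks[k][r][c] == 0]
            if good:
                combined[r][c] = sum(images[k][r][c] for k in good) / len(good)
                expmap[r][c] = sum(exptimes[k] for k in good)
            else:
                out_mask[r][c] = 1  # no good input pixel: flag the output
    return combined, out_mask, expmap


images = [[[10.0, 20.0, 30.0]], [[14.0, 99.0, 34.0]]]
masks = [[[0, 0, 1]], [[0, 1, 1]]]
combined, out_mask, expmap = combine_images(images, masks, [10.0, 20.0])
print(combined, out_mask, expmap)
```

The exposure map records how much exposure time actually contributed to each output pixel, which differs from pixel to pixel once bad pixels are excluded.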
The Mosaic version of COMBINE is a relatively simple task that understands the details of the Mosaic data format. It extracts the individual amplifier/CCD images and associated data, such as pixel masks and uncertainties, and passes them to a lower level task to do the actual combining. The calibrated data and updated associated data are then put back into the Mosaic data format.
The extraction from the Mosaic format to individual images and the reconstruction of the Mosaic format from the individual calibrated images do not actually involve extra copying of the data or intermediate files for the bulk CCD data. The FITS image kernel allows individual input images to be addressed directly in a multiextension FITS file and the output images to be appended to new extensions of a multiextension file. This is done so that an IRAF application does not need to know the disk structure of the data and can be written as simply reading and writing logically individual images. The Mosaic COMBINE task controls the syntax to the FITS kernel image specification.
COMBINE also will define how the image headers and non-image extensions are combined. In the initial implementation the output combined image will have the image header and non-image extensions from the first input image in the specified list of input images. This is the current approach in most IRAF tasks, such as IMCOMBINE and IMARITH, that produce an output from more than one input image.
The combining of calibration exposures will generally be controlled by the data reduction agent. It will detect sequences of calibrations and combine the sequence. Simple scripts layered on CCDPROC and COMBINE will be used and may also be used by the observer. These are ZEROCOMBINE, DARKCOMBINE, FLATCOMBINE, and COMPCOMBINE.
The combining of calibration exposures is straightforward in the sense that there does not need to be any interpolation, shifting, or coordinate manipulation. The combining of dithered or rastered science exposures is more complex, particularly with regard to coordinate systems. Such data are first resampled into a single image in a celestial coordinate system that can be shifted by integer amounts along both image axes before combining. This is done by MSCIMAGE. COMBINE uses the coordinate system produced by MSCIMAGE to shift and then combine dithered or rastered observations.
The Mosaic World Coordinate System (WCS) maps the image pixels to celestial coordinates on the sky. The mapping is stored in the headers for each amplifier/CCD image. The WCS is defined in two stages. The first stage applies a predetermined calibration. The second stage either adjusts this calibration based on a catalog of sources in the field of the exposure or registers the WCS in multiple overlapping exposures based on common objects in the images.
The WCS calibration file consists of "plate solutions" for each amplifier/CCD determined from calibration exposures. This is done using MSCMAPWCS. The plate solution is then applied to observations by adding the telescope pointing and, possibly, instrument position angle. In other words, the WCS is determined once at some telescope pointing reported by the telescope control system. This WCS is used for other telescope pointings with a zero point offset set by the difference in reported telescope coordinates between the calibration and the observation. If the detector may be rotated then the calibration also includes a rotation axis origin determination and uses the difference in instrument position angles to adjust the WCS.
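The zero point offset can be illustrated with a short sketch. It takes the simple difference of reported coordinates literally, as the text describes, and leaves out the rotation adjustment; the function name `offset_reference` and all coordinate values are hypothetical.

```python
def offset_reference(cal_reference, cal_pointing, obs_pointing):
    """Shift the calibration reference coordinate by the pointing difference.

    All arguments are (ra, dec) pairs in degrees. The RA result is
    wrapped into the range [0, 360).
    """
    d_ra = obs_pointing[0] - cal_pointing[0]
    d_dec = obs_pointing[1] - cal_pointing[1]
    return ((cal_reference[0] + d_ra) % 360.0, cal_reference[1] + d_dec)


# Hypothetical values: the calibration was taken at (150.0, 30.0) and one
# CCD's plate solution has its reference point at (150.25, 30.5).
new_reference = offset_reference((150.25, 30.5), (150.0, 30.0), (10.0, -5.0))
print(new_reference)  # (10.25, -4.5)
```

The rest of the plate solution (scale, orientation, distortion terms) is reused unchanged, which is why the calibration needs to be measured only once.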
The plate solution may be determined by an instrument support person at some point prior to the observer's run or by the observer at the beginning of a run. Hopefully the NOAO CCD Mosaic will have sufficient geometric stability that the calibration need be done only when major maintenance is done or when the detector is mounted on the telescope at the beginning of a block of observing time. Regardless of whether this is done by an instrument support person or the observer, some standard calibration fields with source catalogs will be prepared and a "cookbook" sequence documented.
A secondary calibration tool, MSCZERO, allows marking a single object in an exposure and entering a celestial coordinate to update the calibration file to "zero" the coordinates relative to the telescope pointing. For possible rotations two objects may be marked.
The first stage of setting the WCS for an observation using a calibration file and the telescope pointing is a basic calibration operation performed by MSCWCS. Note that if the WCS is set at an earlier stage by the data acquisition system or the data capture agent then this option of MSCWCS might not be needed.
The WCS set by the first stage is likely to be off by a small amount due to errors in the telescope pointing and instrument flexure. The second stage is to use objects in the image to adjust the WCS. This second stage may use many objects and a full astrometric catalog to make a new calibration. However, it is more likely that there are only a few objects and possibly no source catalog. In that case the few objects can be used to make small zero point and rotation adjustments in either an absolute sense if the objects have known celestial coordinates or a relative sense if common objects in multiple exposures are used to register the exposures.
The adjustment of the WCS using a catalog of sources in the field of observation is performed by the task MSCWCS. It assumes that the existing WCS is fairly close. It takes each source in an input source catalog and searches near the expected position in the image for an object. The object position is determined using a centering algorithm. Once a set of measured pixel positions and catalog celestial coordinates is determined the WCS can be adjusted for an offset and rotation or possibly a new plate solution can be computed.
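The offset and rotation adjustment can be sketched as a least-squares fit between two matched lists of positions in the pixel plane. This is a standard rigid-fit formula offered as an illustration, not necessarily the method MSCWCS uses; the function name `fit_shift_rotation` is hypothetical, and "predicted" here means the pixel position expected for a catalog source from the existing WCS.

```python
import math


def fit_shift_rotation(predicted, measured):
    """Least-squares rotation (radians) and shift taking predicted -> measured.

    Model: measured = R(theta) * predicted + (dx, dy).
    """
    n = len(predicted)
    pcx = sum(p[0] for p in predicted) / n
    pcy = sum(p[1] for p in predicted) / n
    mcx = sum(m[0] for m in measured) / n
    mcy = sum(m[1] for m in measured) / n
    s_dot = s_cross = 0.0
    for (px, py), (mx, my) in zip(predicted, measured):
        ax, ay = px - pcx, py - pcy   # centered predicted position
        bx, by = mx - mcx, my - mcy   # centered measured position
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx = mcx - (cos_t * pcx - sin_t * pcy)
    dy = mcy - (sin_t * pcx + cos_t * pcy)
    return theta, dx, dy


# Synthetic check: rotate and shift three points, then recover the values.
true_theta, true_dx, true_dy = 0.01, 1.5, -0.7
predicted = [(100.0, 200.0), (800.0, 150.0), (400.0, 900.0)]
measured = [(math.cos(true_theta) * x - math.sin(true_theta) * y + true_dx,
             math.sin(true_theta) * x + math.cos(true_theta) * y + true_dy)
            for x, y in predicted]
theta, dx, dy = fit_shift_rotation(predicted, measured)
print(theta, dx, dy)
```

With only a few sources this two-parameter-plus-shift fit is well constrained, whereas a full new plate solution would need many more.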
MSCWCS can be run automatically given a good first WCS and a catalog of sources. If the user supplies the source catalog or the data reduction agent can automatically obtain a catalog (say by using the telescope coordinates and a "catalog server") then this second stage WCS calibration performed by MSCWCS can be part of the basic calibration performed by the DRA.
MSCWCS applies both the initial calibration based on a calibration file and the telescope information and an observation correction based on a catalog of sources in the observation. Thus, while the logic is described as two steps, the DRA may do both operations at once with one call to MSCWCS. MSCWCS works as follows: if a calibration file is specified it does the first stage, and if a source catalog is specified it does the second stage. If both are specified in one execution then both stages are done.
The task MSCREGISTER uses objects in a set of Mosaic observations to adjust the world coordinate system (WCS) for each observation to best "register" the objects. This means that overlapping objects will have nearly the same coordinates subject to the limitations set by the form of the WCS description. The set of objects need not appear in all observations but there must be some reasonable overlap so that each observation has common objects with at least one other observation and all the observations form a single continuous region.
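The requirement that all observations form a single continuous region amounts to a connectivity check on a graph whose edges are pairs of observations sharing objects. The sketch below is only an illustration of that check; the function name `forms_single_region` and the input layout are hypothetical.

```python
def forms_single_region(n_observations, overlaps):
    """Return True if the observations are all linked through common objects.

    overlaps: iterable of (i, j) index pairs for observations that
    share at least one object.
    """
    if n_observations == 0:
        return True
    neighbors = {i: set() for i in range(n_observations)}
    for i, j in overlaps:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen = {0}
    stack = [0]
    while stack:  # depth-first walk from observation 0
        current = stack.pop()
        for other in neighbors[current]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == n_observations


print(forms_single_region(4, [(0, 1), (1, 2), (2, 3)]))  # True
print(forms_single_region(4, [(0, 1), (2, 3)]))          # False
```

In the second case every observation overlaps another one, yet the set splits into two groups that cannot be registered to each other.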
Several algorithms are required. The objects in each amplifier/CCD image must be cataloged. Then common objects between the many catalogs must be identified. Finally the set of WCS must be registered in some "best" way.
To simplify the problem the data are required to have some approximate world coordinate system that places common objects within some distance of each other. This is based on an astrometric calibration, offset by the position of the telescope, that takes the CCD alignments and optical distortions into account.
The FITS WCS descriptions for celestial coordinate systems are under development. The least certain area is the representation of the higher order terms of a plate solution. The initial implementation will measure the full plate solution but will set the image WCS using only one of the WCS representations described in the FITS WCS draft. Since each amplifier/CCD image has its own WCS the plate solution should be sufficiently accurate without higher order terms. The following needs to be considered.
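Given an approximate WCS, identifying common objects reduces to pairing catalog entries that fall within some tolerance of each other. The sketch below uses a simple greedy nearest-neighbor search on flat coordinates; the actual matching algorithm is not specified here, and the function name `match_objects` and the sample coordinates are hypothetical.

```python
import math


def match_objects(catalog_a, catalog_b, tolerance):
    """Pair each object in A with its nearest unused neighbor in B.

    Coordinates are (x, y) pairs in a common flat system (for example
    arcseconds on a tangent plane). Pairs farther apart than
    tolerance are rejected. Returns a list of (index_a, index_b).
    """
    matches = []
    used = set()
    for ia, (xa, ya) in enumerate(catalog_a):
        best, best_dist = None, tolerance
        for ib, (xb, yb) in enumerate(catalog_b):
            if ib in used:
                continue
            dist = math.hypot(xa - xb, ya - yb)
            if dist <= best_dist:
                best, best_dist = ib, dist
        if best is not None:
            used.add(best)
            matches.append((ia, best))
    return matches


catalog_a = [(0.0, 0.0), (100.0, 50.0), (300.0, 300.0)]
catalog_b = [(100.4, 49.8), (0.3, -0.2), (500.0, 500.0)]
print(match_objects(catalog_a, catalog_b, 1.0))  # [(0, 1), (1, 0)]
```

The better the approximate WCS, the smaller the tolerance can be and the fewer false matches occur in crowded fields.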
The individual amplifier/CCD pieces from a calibrated Mosaic exposure are put together to create a single image using the task MSCIMAGE. This operation
Basically, a uniform sky grid of equal sized pixels about some point in the sky is defined and the observed pixels are interpolated to this grid. By using the same grid for dithered or rastered sets of observations, the images can then be combined using only integer pixel shifts in the two image axes. The goal is to require only a single interpolation of the data.
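Once every image is sampled on the same grid, combining needs only integer bookkeeping. The following sketch assumes each image carries the integer grid position of its first pixel and averages overlapping pixels; the function name `combine_on_grid` and the data layout are hypothetical.

```python
def combine_on_grid(images):
    """Average images that share a common pixel grid.

    images: list of (x0, y0, data), where (x0, y0) is the integer grid
    position of data[0][0] and data is a 2D list of pixel values.
    Returns a dict mapping (x, y) grid positions to mean values.
    """
    total = {}
    count = {}
    for x0, y0, data in images:
        for r, row in enumerate(data):
            for c, value in enumerate(row):
                key = (x0 + c, y0 + r)  # integer shift only, no interpolation
                total[key] = total.get(key, 0.0) + value
                count[key] = count.get(key, 0) + 1
    return {key: total[key] / count[key] for key in total}


# Two one-row exposures offset by two pixels along the first axis.
first = (0, 0, [[1.0, 2.0, 3.0]])
second = (2, 0, [[5.0, 6.0, 7.0]])
mosaic = combine_on_grid([first, second])
print(sorted(mosaic.items()))
```

Only the pixel at grid position (2, 0) is covered by both exposures, so it alone is averaged; no pixel value is resampled a second time.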
The mapping between the coordinates of the input pixels and the output pixels is defined by the world coordinate system in the image headers. This is set during the calibration steps as described in another section.
While the default action of MSCIMAGE is to create a single resampled image from the elements of the Mosaic there is also an option to preserve the Mosaic data format by keeping the resampled elements as separate extensions.
The WCS for a transformed image made from the components of the Mosaic will be a Cartesian projection which allows simple shifts to register dithered and rastered images. This type of WCS is perfectly fine for the scales on which dithered and rastered observations will be done. It is an acceptable and defined WCS in the FITS draft standard. The point to note is that this type of WCS projection has not been considered common in FITS optical images. With the increased use of optical Mosaics with fields of a square degree or less this will likely become much more common because of its property of straightforward combining of rastered observations.