JWST General Calibration Pipeline Caveats
Unique features of the JWST Science Calibration Pipeline that are not necessarily instrument- or mode-specific, including caveats for users, are described in this article. Users should also refer to the other articles in this section for characteristics and caveats that are specific to their mode of interest. This information reflects the status of the jwst calibration pipeline package version 1.5.3, released with build 8.0.1.
The following sections highlight some aspects of the JWST calibration pipeline that may affect all or several modes and instruments. It is also important to note that early in the mission our understanding of the observatory performance is evolving quite rapidly, and changes to calibration procedures are expected.
Note: Check the "Latest Update" box at the bottom of each article to see when the last updates were made.
This page highlights some information about general data processing using the JWST science calibration pipeline software that may be helpful for observers. For a list of known issues, bugs, and updates for a particular software release, please visit the JWST Operational Pipeline Build Information page.
All stages of processing
Each stage of the JWST Science Calibration Pipeline uses a set of instrument-specific reference files that ensure the science calibration pipeline meets its accuracy requirements. Calibration reference files are stored in the Calibration Reference Data System (CRDS). CRDS is directly integrated with calibration steps and pipelines, and the reference file mappings are set by default to always access the most recently delivered reference files according to certain selection rules (for example, instrument and filter used for an observation).
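As a toy illustration of what a selection rule does, the sketch below picks the most recently delivered reference file that matches an observation's instrument and filter. This is not how CRDS is implemented, because CRDS uses its own mapping files and rules. The RefFile record, its delivered field, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RefFile:
    """Hypothetical record describing one delivered reference file."""
    name: str
    instrument: str
    filter_name: str
    delivered: date


def select_reference(candidates, instrument, filter_name):
    """Return the most recently delivered file matching instrument and filter.

    Toy selection rule for illustration only; not the CRDS algorithm.
    """
    matches = [
        ref for ref in candidates
        if ref.instrument == instrument and ref.filter_name == filter_name
    ]
    if not matches:
        raise LookupError(f"no reference file for {instrument}/{filter_name}")
    return max(matches, key=lambda ref: ref.delivered)


# Hypothetical candidate list; the names and dates are made up.
CANDIDATES = [
    RefFile("flat_nircam_f200w_v1.fits", "NIRCAM", "F200W", date(2021, 6, 1)),
    RefFile("flat_nircam_f200w_v2.fits", "NIRCAM", "F200W", date(2022, 7, 1)),
    RefFile("flat_nircam_f444w_v1.fits", "NIRCAM", "F444W", date(2021, 6, 1)),
]

if __name__ == "__main__":
    print(select_reference(CANDIDATES, "NIRCAM", "F200W").name)
```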
Before launch, instrument teams used dummy or ground test data to support development of the science calibration pipeline stages. Reference files built from these data can be identified by the PEDIGREE header keyword, which has a value of GROUND. After launch, instrument teams began using data taken during commissioning to produce higher quality reference files; these files have a PEDIGREE header keyword value of INFLIGHT. All reference files include the DATE header keyword, which indicates the UTC date the reference file was created. Teams are actively developing and delivering improved files, and will continue to do so as Cycle 1 calibration programs begin. Until enough in-flight data are available to replace them, some pre-flight reference files remain in use.
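A minimal sketch of sorting reference files by provenance follows. It assumes each header has already been read into a mapping, for example with astropy.io.fits.getheader, whose Header object supports .get. Any PEDIGREE value whose first word is INFLIGHT is treated as in-flight, because in-flight values may be followed by a date range. It also assumes DATE begins with an ISO YYYY-MM-DD string. The function names and example headers are hypothetical.

```python
from datetime import date


def classify_pedigree(header):
    """Return 'ground', 'inflight', or 'other' from a header's PEDIGREE value."""
    pedigree = str(header.get("PEDIGREE", "")).strip().upper()
    words = pedigree.split()
    first_word = words[0] if words else ""
    if first_word == "GROUND":
        return "ground"
    if first_word == "INFLIGHT":
        return "inflight"
    return "other"  # e.g. missing keyword or some other pedigree value


def creation_date(header):
    """Parse the DATE keyword (UTC creation date) into a datetime.date."""
    return date.fromisoformat(str(header["DATE"])[:10])


# Hypothetical headers standing in for real reference-file headers.
headers = {
    "flat_a.fits": {"PEDIGREE": "GROUND", "DATE": "2021-03-02T10:00:00"},
    "flat_b.fits": {"PEDIGREE": "INFLIGHT 2022-05-01 2022-06-15",
                    "DATE": "2022-07-01T08:30:00"},
}

for name, header in headers.items():
    print(name, classify_pedigree(header), creation_date(header))
```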
Observers have the flexibility to create their own reference file versions or to override the default reference files when running the calibration pipeline manually. More information about the reference files can be found in the software documentation.
The operational pipeline (the version of the JWST Science Calibration Pipeline used by the MAST Archive to calibrate all JWST data) was designed to process JWST data as optimally as possible for most instruments and modes, using a common set of calibration steps and parameters within those steps. The default set of parameters was chosen based on ground test data or simulations and will likely be updated as more data are obtained on orbit. The parameters are stored in parameter reference files that are included in CRDS along with the standard calibration reference files. For specific science cases or instrument modes, observers may be able to further improve the pipeline results by adjusting these parameters to change the standard data flow.
Most calibration steps include a set of parameters for processing that can be tweaked to improve the outputs of various stages of the pipeline when observers are running it manually. For example, in the jump detection step of stage 1 processing, which flags outliers and cosmic rays in the uncalibrated data, there are multiple parameters included in the algorithm. Observers may find that too many (or too few) cosmic rays are flagged in their data, and decide to rerun the step on their own with different settings. Available parameters for a step can be found in multiple ways:
- by looking at the parameter reference file for a step, which contains the default parameters used in the operational pipeline
- by visiting the "Arguments" section for the calibration step software documentation (e.g., for jump detection)
- by importing a calibration step in a Python session and inspecting its .spec attribute
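In a session with the jwst package installed, running from jwst.jump import JumpStep and then print(JumpStep.spec) displays a text specification. It contains one configobj-style line per parameter, in the form name = type(default=...), optionally followed by a comment. The standard-library sketch below parses text in that shape into a dictionary of defaults. The SPEC string is abbreviated and illustrative: the two example_ parameters are hypothetical, and the rejection_threshold default shown may not match your installed version.

```python
import ast
import re

# Illustrative, abbreviated spec text in the configobj style (placeholder values).
SPEC = """
    rejection_threshold = float(default=4.0, min=0)  # CR sigma rejection threshold
    example_cores = string(default='none')  # hypothetical parameter
    example_flag = boolean(default=True)  # hypothetical parameter
"""

# name = type(args)  # optional comment
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\(([^)]*)\)\s*(?:#\s*(.*))?$")
# default=<quoted string or bare token>
DEFAULT_RE = re.compile(r"default\s*=\s*('[^']*'|\"[^\"]*\"|[^,)]+)")


def parse_spec(spec_text):
    """Map each parameter name to (type name, default value, comment)."""
    params = {}
    for line in spec_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything not shaped like name = type(...)
        name, type_name, args, comment = match.groups()
        default = None
        default_match = DEFAULT_RE.search(args)
        if default_match:
            raw = default_match.group(1).strip()
            try:
                default = ast.literal_eval(raw)  # numbers, booleans, quoted strings
            except (ValueError, SyntaxError):
                default = raw  # keep unparseable defaults as plain text
        params[name] = (type_name, default, (comment or "").strip())
    return params


if __name__ == "__main__":
    for name, (type_name, default, comment) in parse_spec(SPEC).items():
        print(f"{name}: {type_name} = {default!r}  ({comment})")
```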
To learn more about how to edit parameters and run the calibration pipeline manually, video tutorials are available in JWST Data Video Tutorials.
Intermediate data products
The operational pipeline was designed to provide a specific set of data products to observers when retrieving data from MAST; however, there are many additional data products that can be produced by the calibration steps. When running the science calibration pipeline on their own, observers can opt to save the data output after each calibration step is completed, or only after the steps of interest. In either case, the additional data products are accessed by manually processing the data and changing the parameters for the step or pipeline.
For example, intermediate data products can be saved for a single step (such as the jump detection step) and also for the stage 1 pipeline (calwebb_detector1) by changing the parameters that control saving when the step or pipeline is run.
Calibration step precedence
The flow of data through the stages and steps of the calibration pipeline was intentionally designed to process the raw data to produce count rate (slope) images, calibrate the slope images, and then carry out any additional processing, including the creation of combined images and spectra. As such, observers should be careful when turning steps on and off, as subsequent steps may rely on a change to the data values, data structure and format, or header keywords that would have been made during a previous step.
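The toy sketch below shows why skipping a step can break a later one. A downstream step checks a status entry that only the earlier step writes, and it refuses to run without that entry. The real pipeline records step status in header keywords, but the step names, the status dictionary, and the check here are illustrative and do not reproduce any actual jwst step.

```python
def dq_init(model):
    """Toy step: attach a data-quality list and record that it ran."""
    model["dq"] = [0] * len(model["data"])
    model["meta"]["dq_init"] = "COMPLETE"
    return model


def flag_saturation(model, limit=1000.0):
    """Toy step that relies on the DQ list created by dq_init."""
    if model["meta"].get("dq_init") != "COMPLETE":
        raise RuntimeError("flag_saturation needs dq_init to run first")
    for i, value in enumerate(model["data"]):
        if value >= limit:
            model["dq"][i] |= 2  # toy 'saturated' bit
    model["meta"]["flag_saturation"] = "COMPLETE"
    return model


def run_pipeline(data, skip=()):
    """Run the toy steps in order, recording skipped steps as SKIPPED."""
    model = {"data": list(data), "meta": {}}
    for step in (dq_init, flag_saturation):
        if step.__name__ in skip:
            model["meta"][step.__name__] = "SKIPPED"
            continue
        model = step(model)
    return model


if __name__ == "__main__":
    print(run_pipeline([10.0, 2000.0]))
```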
Error arrays are initialized in stage 1 processing and are stored in the "ERR" extension of the data. The uncertainty from each step that contributes noise to the final measurement is calculated separately and propagated by various steps in the calibration pipeline using a noise model. Any time a step creates or updates variances, the total error array values are recomputed as the square root of the sum of all variances available at that point (i.e., the individual error terms are added in quadrature). Note that the "ERR" array values are always expressed as a standard deviation (i.e., the square root of the variance), and the variances are stored in the "VAR_POISSON" (variance due to Poisson noise) and "VAR_RNOISE" (variance due to read noise) arrays. In some cases, the variance arrays are only used internally within a given step.
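The combination rule above takes only a few lines. This sketch uses flat lists in place of 2-D arrays so that it needs only the standard library; with NumPy the same calculation would be numpy.sqrt(var_poisson + var_rnoise). The variance values are made up, and total_error is a hypothetical helper name.

```python
import math


def total_error(*variance_arrays):
    """Combine per-pixel variance arrays into a standard-deviation array.

    ERR is the square root of the sum of all variances available.
    """
    if not variance_arrays:
        raise ValueError("need at least one variance array")
    length = len(variance_arrays[0])
    if any(len(var) != length for var in variance_arrays):
        raise ValueError("variance arrays must have the same length")
    return [math.sqrt(sum(values)) for values in zip(*variance_arrays)]


# Hypothetical per-pixel variances (e.g. in (DN/s)^2).
var_poisson = [9.0, 16.0, 0.0]
var_rnoise = [16.0, 9.0, 4.0]
err = total_error(var_poisson, var_rnoise)  # [5.0, 5.0, 2.0]
```

Because ERR is a standard deviation, squaring it recovers the total variance, which is what should be summed when combining errors later.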
Different uncertainty sources behave in different ways. Some noise sources (e.g., photon noise) are independent between integrations, and others (e.g., flat field noise) are not. Additionally, the spatial covariance differs between sources. Because each term is propagated separately through the calibration pipeline, its treatment can be tailored to the processing. For example, the flat field noise term is handled differently for non-dithered and dithered observations. In non-dithered observations a source falls on the same pixels every time, so the flat field noise does not decrease as more integrations are added; in dithered observations the source samples different pixels, so it does.
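Here is a worked example of the dithered versus non-dithered distinction under a deliberately simplified model. In this model, n equally weighted measurements of one source are averaged and photon noise is independent between them. The flat field error is either identical every time (non-dithered, same pixels) or independent (dithered, different pixels). The function name and model are illustrative, not the pipeline's noise model.

```python
import math


def averaged_error(sigma_independent, sigma_flat, n, dithered):
    """Error on the mean of n equally weighted measurements of one source.

    sigma_independent: per-measurement noise independent between measurements.
    sigma_flat: per-measurement flat field error. Without dithering it is the
    same every time and does not average down; with dithering it is treated
    as independent between measurements.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if dithered:
        return math.sqrt((sigma_independent**2 + sigma_flat**2) / n)
    return math.sqrt(sigma_independent**2 / n + sigma_flat**2)
```

With sigma_independent = 4, sigma_flat = 3 and n = 4, the dithered case gives 2.5 while the non-dithered case gives sqrt(13), about 3.61; with n = 1 both give 5.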
In level 3 mosaic products, observers may find that the "WHT" (see the Data quality information section below) and "ERR" extensions are useful, since the "WHT" extension can be used for source detection (as it contains the background noise terms) and the "ERR" extension can be used for calculating photometric errors in an aperture (as it contains all the noise, i.e., background plus photon noise).
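A minimal sketch of using the "ERR" extension for aperture photometry sums the data within a circular aperture and adds the corresponding ERR values in quadrature. It assumes whole-pixel membership (no partial pixels) and independent pixel errors. Resampling correlates neighboring pixels, so the result may underestimate the true uncertainty. Plain nested lists stand in for image arrays, and the function name is hypothetical.

```python
import math


def aperture_sum_and_error(data, err, center, radius):
    """Sum data and combine err in quadrature inside a circular aperture.

    data and err are 2-D lists indexed [y][x]. A pixel is included when its
    center lies within radius of center = (x, y). Pixel errors are assumed
    independent.
    """
    cx, cy = center
    total = 0.0
    variance = 0.0
    for y, (data_row, err_row) in enumerate(zip(data, err)):
        for x, (value, sigma) in enumerate(zip(data_row, err_row)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius**2:
                total += value
                variance += sigma**2
    return total, math.sqrt(variance)
```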
Stage 1 processing
See also: Stage 1 Detector Processing (software documentation outside JDox)
Data quality information
The data quality (DQ) initialization step in the calibration pipeline populates the data quality mask for a dataset to flag any pixels that may be unreliable or unusable for a number of reasons, such as dead or hot pixels. These flags are carried through the steps of the pipeline and may affect how the calculations within a calibration step are performed for a pixel. Different instruments monitor different characteristics and hence may use different pixel flags; however, pixels excluded from calculations are always marked with the common flag name "DO_NOT_USE". Other unreliable or sub-optimal pixels may still be included in the calculations for a calibration step, so observers should keep this in mind when analyzing their data products. For a full list of data quality flags that may be used, refer to the software documentation.
Throughout stage 1 processing, this information is stored in the "PIXELDQ" and "GROUPDQ" extensions, which are replaced by a single "DQ" extension in the final stage 1 data products. More information on data quality flags for subsequent processing stages is provided below.
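Data quality values are bit fields, so a pixel can carry several flags at once, and each flag is tested with a bitwise AND. The sketch below assumes the bit values listed in the jwst data quality flag table as we understand them (DO_NOT_USE = 1, SATURATED = 2, JUMP_DET = 4, DEAD = 1024, HOT = 2048). Confirm them against the software documentation for your version. The helper names are hypothetical.

```python
# Assumed bit values; verify against the jwst data quality flag documentation.
FLAGS = {
    "DO_NOT_USE": 1 << 0,
    "SATURATED": 1 << 1,
    "JUMP_DET": 1 << 2,
    "DEAD": 1 << 10,
    "HOT": 1 << 11,
}


def has_flag(dq_value, flag_name):
    """True if the named flag's bit is set in the integer dq_value."""
    return (dq_value & FLAGS[flag_name]) != 0


def flag_names(dq_value):
    """List the names of the known flags set in dq_value."""
    return [name for name, bit in FLAGS.items() if dq_value & bit]


def usable_mask(dq_values):
    """True where a pixel is not marked DO_NOT_USE.

    Such pixels may still carry other flags (e.g. HOT) and still be used.
    """
    return [not has_flag(value, "DO_NOT_USE") for value in dq_values]
```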
The jump detection step in the calibration pipeline flags jumps in a ramp where the difference in ADU level between two consecutive groups is large relative to the differences between other consecutive pairs of groups. These ramp jumps are often caused by cosmic rays (CRs) that deposit large amounts of charge in a pixel. The detection threshold, expressed as a number of sigmas above the noise and called the rejection threshold, is set by a step parameter. The default value chosen for the operational pipeline was determined from pre-flight data, so it may not be optimal for some in-flight data.
Observers running the calibration pipeline manually may find that they need to increase or decrease the jump detection threshold depending on whether they notice over- or under-flagging of jumps in their data, or they may decide to increase the threshold to speed up the step calculations. This is done by updating the rejection_threshold parameter for the step (see the discussion of step parameters above). Bear in mind that a second pass at flagging outliers happens in stage 3 during the outlier detection step, which uses the overlapping regions observed in different exposures to catch cosmic rays missed by the jump step. Note that efficiency improvements are underway for a few long-running steps in the calibration pipeline, including the jump and outlier detection steps. Additionally, the jump step is skipped if the input data contain fewer than 3 groups per integration, but an update to the algorithm that allows CR flagging in 2-group integrations is in development.
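The sketch below illustrates the idea behind jump detection for a single pixel using two-point differences. It compares each group-to-group difference with the median difference, in units of an estimated noise sigma, and flags the worst outlier while it exceeds the rejection threshold. It is a simplified sketch, not the jwst algorithm. The noise model (read noise from two reads plus Poisson noise on the median difference, all in electrons) and the function name are assumptions. Like the step described above, it flags nothing when fewer than 3 groups are available.

```python
import math
import statistics


def flag_jumps(groups, read_noise, rejection_threshold=4.0):
    """Return indices of groups where a jump is detected.

    groups: accumulated signal (electrons) in successive groups of one pixel.
    read_noise: per-read noise in electrons (assumed known).
    """
    if len(groups) < 3:
        return []  # a single difference has nothing to be compared against
    diffs = [b - a for a, b in zip(groups, groups[1:])]
    remaining = set(range(len(diffs)))
    flagged = []
    while len(remaining) >= 2:
        median = statistics.median(diffs[i] for i in remaining)
        # noise of one difference: two reads of read noise plus Poisson noise
        sigma = math.sqrt(2 * read_noise**2 + max(median, 0.0))
        # ties go to the larger difference, since a cosmic ray adds charge
        worst = max(remaining, key=lambda i: (abs(diffs[i] - median), diffs[i]))
        if abs(diffs[worst] - median) / sigma <= rejection_threshold:
            break
        flagged.append(worst + 1)  # the jump appears in the later group
        remaining.remove(worst)
    return sorted(flagged)


if __name__ == "__main__":
    ramp = [0, 100, 200, 360, 460, 560]  # moderate jump into group 3
    print(flag_jumps(ramp, read_noise=10.0, rejection_threshold=3.0))
    print(flag_jumps(ramp, read_noise=10.0, rejection_threshold=4.0))
```

In the example ramp, the moderate jump is about 3.5 sigma above the median difference, so it is flagged with a threshold of 3 but not with a threshold of 4. This is the over- versus under-flagging trade-off described above.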
One particular type of CR that was seen in ground tests and has now been detected in flight is referred to as a snowball. While these phenomena, and how to correct for them, are still under discussion, snowballs seem to account for a very small fraction (possibly less than 0.1%) of the total CR population, but they can have a significant impact on the affected pixels. They generally appear round, with a heavily saturated "core" and an extended "halo" region, and are often accompanied by a shower of CRs of varying intensity and size in their immediate vicinity. Updates to the jump detection step to improve cosmic ray flagging for snowballs and to handle charge spilling into neighboring pixels are in progress.
Because this step flags jumps or outliers in the ramp, it is also possible that observations that have guide star instability or any movement of the image during the exposure may end up with false jump flagging in their data, since the calibration pipeline sees this change in the pixel values as a jump in the ramp. While this does not appear to be a common issue, observers who notice false flagging in cases like this can report them via the JWST Help Desk. Also note that jump detection for moving target observations has not been tested extensively using ground data, so the performance in flight for those types of observations will need to be evaluated.
Stage 2 and 3 processing
See also: Pipeline Stages (software documentation outside JDox)
Data quality information
At the end of stage 1 processing and throughout stage 2, the individual "PIXELDQ" and "GROUPDQ" extensions for a ramp are replaced by a single "DQ" extension, which is a data array containing DQ flags for each pixel, for each integration (or for averaged integrations, depending on the data product type). In stage 3 processing, the data is resampled based on the WCS and distortion information and then combined into a single undistorted product. Resampled data products contain "WHT" and "CON" extensions in place of the "DQ". These extensions provide observers with the 2-D weight image giving the relative weight of the output pixels (WHT) and the 2-D context image, which encodes information about which input images contribute to a specific output pixel (CON).
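The context ("CON") image stores a bit mask for each output pixel that records which input images contributed to it. A minimal decoding sketch follows, assuming the drizzle convention of 32-bit planes in which bit b of plane p stands for input image 32 * p + b. Please confirm the exact layout in the resample step documentation before relying on it. The function name is hypothetical.

```python
def contributing_inputs(context_planes):
    """Decode one output pixel's context values into input-image indices.

    context_planes: the integer value of this pixel in each context plane.
    Values are masked to 32 bits so that signed int32 values decode correctly.
    """
    indices = []
    for plane, value in enumerate(context_planes):
        value &= 0xFFFFFFFF
        for bit in range(32):
            if value & (1 << bit):
                indices.append(32 * plane + bit)
    return indices
```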
Information on how to run the calibration pipeline using association files is available at JWST Data Video Tutorials.