JWST Science Calibration Pipeline

Unlike prior observatories such as HST, JWST has a single science calibration pipeline for processing all observational data. This pipeline is designed to share code across instruments where practical (e.g., for the initial detector-level processing) and to be modular, allowing imaging and spectroscopy to run only the steps relevant to a given observing mode. Likewise, there are many optional steps, and different ways to configure these steps depending on the science needs of a given set of observations.


The pipeline depends upon a library of reference files managed through the Calibration Reference Data System (CRDS); these reference files are just as important to the quality of the final science data products as the pipeline software itself.

This article and linked pages provide a brief astronomer-level overview of the JWST Science Calibration Pipeline and CRDS reference file infrastructure. Links are provided where relevant to the more extensive technical information that is distributed via ReadTheDocs and updated with each release of the pipeline software.

Note that this article provides generic information about the pipeline across different release versions. If you're looking for specific calibration information about the latest JWST data products, see the JWST Calibration Status page, "What's New" in the latest build, what's "Coming Soon" in the next build, and the list of current known Issues with JWST data.



Installation


The JWST Science Calibration Pipeline (jwst) is open-source, written in Python (with some C bindings for speed), and is available on GitHub. Users are therefore encouraged to contribute to its development by submitting bug reports and pull requests.

Instructions for installing the pipeline software can be found in the GitHub pipeline package's README page. This page contains instructions for installing specific released versions as well as the latest development version. Most users who want to run the pipeline for themselves should install the latest released version given in the GitHub software build table. Stable releases of the jwst package are also registered at PyPI. If you want to reproduce data products downloaded from MAST, check the latest JWST Operations Pipeline Build Information to determine which pipeline version is currently being used at STScI to produce MAST data products.

Important: The jwst pipeline is tested and supported on the Linux and macOS platforms; Windows platforms are not currently supported.

If you have installed the jwst pipeline in a Conda environment (a good idea to make sure that it doesn't conflict with any other Python installations on your local machine), you must first activate that environment before you can use it. For instance, if you installed the pipeline in an environment called "jwst" you would have to activate that environment in your terminal window by typing either

conda activate jwst

or for some systems,

source activate jwst

Once this environment has been activated, you may also wish to use pip to install a variety of other packages not bundled with the jwst pipeline that can nonetheless be useful. For example:

pip install jupyter

Instructions for installing the CRDS reference files used by the pipeline can also be found in the GitHub pipeline package README. In brief, environment variables must be set to tell the pipeline software where to obtain reference files (export CRDS_SERVER_URL=https://jwst-crds.stsci.edu) and where to cache these reference files on your local machine (e.g., export CRDS_PATH=/Users/bilbo/crds_cache/jwst_ops). The CRDS environment variables need to be defined before any imports, including imports of the jwst or crds software packages.
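
For sessions where the variables are not already exported in the shell (a notebook, for example), the same setup can be sketched in Python. This is a minimal illustration: the helper name configure_crds is hypothetical and not part of the jwst or crds packages, and the cache location is only an example.

```python
import os


def configure_crds(cache_dir, server_url="https://jwst-crds.stsci.edu"):
    """Set the two CRDS environment variables for the current process.

    `configure_crds` is a hypothetical helper written for this example.
    """
    os.environ["CRDS_SERVER_URL"] = server_url
    os.environ["CRDS_PATH"] = os.path.expanduser(cache_dir)
    return {key: os.environ[key] for key in ("CRDS_SERVER_URL", "CRDS_PATH")}


# Call this first, before any jwst or crds import.
settings = configure_crds("~/crds_cache/jwst_ops")
print(settings)

# Only after the variables are set:
# import jwst
```

Setting the variables inside the Python process affects only that process, so the call has to be repeated (or the variables exported in the shell) for each new session.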



The four stages of the JWST Science Calibration Pipeline

The figure below shows the flow of data through the JWST calibration pipeline. This pipeline and the corresponding science data products that it produces can be divided into 4 main stages, depending on the degree of processing:

  • Stage 0: Produces uncalibrated raw data products from single exposures in units of total DN (e.g., "uncal.fits")
  • Stage 1: Produces data products that have been corrected for certain detector effects and converted to units of DN/s (e.g., "rate.fits")
  • Stage 2: Produces calibrated data products from single or multiple exposures with world coordinates and photometric information (e.g., "cal.fits")
  • Stage 3: Produces calibrated data products resulting from the combination of multiple exposures into a single integrated product (e.g., "i2d.fits", "s3d.fits", "x1d.fits")
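
The correspondence between file suffix and stage listed above can be written as a small lookup. This is a sketch that covers only the example suffixes quoted in the list; the function name and the file names are made up for illustration, real archives contain further suffixes, and a suffix alone may not always identify the stage uniquely.

```python
# Example suffixes taken from the stage list above; not an exhaustive table.
SUFFIX_TO_STAGE = {
    "uncal": 0,
    "rate": 1,
    "cal": 2,
    "i2d": 3,
    "s3d": 3,
    "x1d": 3,
}


def stage_from_filename(filename):
    """Return the pipeline stage implied by a product's suffix, or None."""
    stem = filename.rsplit(".", 1)[0]   # drop the ".fits" extension
    suffix = stem.rsplit("_", 1)[-1]    # text after the last underscore
    return SUFFIX_TO_STAGE.get(suffix)


for name in ("example_uncal.fits", "example_rate.fits", "example_i2d.fits"):
    print(name, "-> stage", stage_from_filename(name))
```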

Summaries of the algorithms used for individual pipeline stages can be found in the links below. For further information on the JWST science data products themselves, see either the JDox data products overview or the more extensive ReadTheDocs Data Product Types.

See also Key Differences for JWST Time Series Observations and Key Differences for JWST Moving Target Observations.

Figure 1. Overview of JWST pipeline stages



Stage 0 pipeline

The Science Data Processing (SDP) system performs pre-processing of the raw telemetry downloaded from the spacecraft prior to the start of the JWST Science Calibration Pipeline. SDP processing is the stage at which key header information is populated from a variety of sources including onboard telemetry, APT proposal inputs, JPL spacecraft ephemerides, etc. The SDP code is distinct from the JWST Science Calibration Pipeline, and general users will not interact with it, instead starting any offline processing with the uncalibrated data (i.e., "*uncal.fits" files) that it produces.

These "*uncal.fits" files are the inputs to stage 1 of the calibration pipeline and usually have 4 dimensions, since all JWST detectors use up-the-ramp readout (sometimes referred to as MULTIACCUM) in which pixel values increase between groups within a given integration. The first 2 dimensions are the column and row axes of the detector, while the 3rd dimension is determined by the number of groups per integration, and the 4th dimension by the number of integrations in the exposure. Note that all 4 dimensions will be used, even if Ngroup = 1 or Nint = 1.
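
The axis ordering described above can be made concrete with a short sketch. The function name and the 2048 x 2048 detector size are illustrative assumptions. FITS lists the fastest-varying axis first, so when the data are read into a Python array the same axes appear in reverse order.

```python
def uncal_shape(nints, ngroups, nrows, ncols):
    """Return the shape of an uncalibrated exposure in two conventions.

    FITS lists the fastest-varying axis first (NAXIS1), whereas array
    shapes in Python list it last, so the two orderings are reversed.
    """
    fits_axes = (ncols, nrows, ngroups, nints)     # NAXIS1..NAXIS4
    array_shape = (nints, ngroups, nrows, ncols)   # as seen from Python
    return fits_axes, array_shape


# A hypothetical exposure with a single group and a single integration
# still has four axes; the length-1 axes are kept rather than dropped.
fits_axes, array_shape = uncal_shape(nints=1, ngroups=1, nrows=2048, ncols=2048)
print(fits_axes)    # (2048, 2048, 1, 1)
print(array_shape)  # (1, 1, 2048, 2048)
```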

These data come from single exposures and are usually contained within a single FITS file. However, when the raw data volume for an individual exposure is large enough, as for time-series observations, the uncalibrated data can be broken into multiple segments of less than 2 GB each to keep file sizes manageable. Such segmented exposures usually include "segNNN" in the file names, where NNN is 1-indexed and always includes any leading zeros.
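
The "segNNN" convention can be handled with a few lines of Python. This is a sketch assuming exactly three digits, as the NNN notation suggests; the helper names and the example file names are hypothetical and do not reproduce the full JWST file naming scheme.

```python
import re

SEGMENT_PATTERN = re.compile(r"seg(\d{3})")


def segment_tag(index):
    """Format a 1-indexed segment number as 'segNNN' with leading zeros."""
    if index < 1:
        raise ValueError("segment numbers are 1-indexed")
    return f"seg{index:03d}"


def segment_number(filename):
    """Return the segment number in a file name, or None if unsegmented."""
    match = SEGMENT_PATTERN.search(filename)
    return int(match.group(1)) if match else None


print(segment_tag(1))                               # seg001
print(segment_number("example-seg012_uncal.fits"))  # 12
print(segment_number("example_uncal.fits"))         # None
```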

Stage 1 pipeline

Input: Uncalibrated ramp data ("*uncal.fits" files)

Output: Uncalibrated slope images ("*rate.fits" files, "*rateints.fits" files)

Step code overview: 

The first stage of the JWST Science Calibration Pipeline applies detector-level corrections to raw non-destructively read "ramps" from the uncalibrated data in order to produce 2-dimensional count rate images per exposure (or per integration for some modes). These steps include flagging of known bad pixels and corrections for dark current, count rate non-linearity, jumps in the ramps produced by cosmic rays, and many other effects. This stage consists of a single pipeline that is used for all JWST data; for more information about these steps see the calwebb_detector1 overview article.
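
To illustrate what turning a ramp into a count rate means, the sketch below fits a straight line to the group values of one pixel. It is a deliberately simplified illustration and not the pipeline's ramp-fitting algorithm, which also has to deal with jumps, saturation, and noise weighting; the function name, group time, and pixel values are invented.

```python
def ramp_slope(group_values, group_time):
    """Least-squares slope (DN/s) of one pixel's evenly sampled ramp."""
    n = len(group_values)
    if n < 2:
        raise ValueError("need at least two groups to fit a slope")
    times = [i * group_time for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(group_values) / n
    covariance = sum(
        (t - mean_t) * (v - mean_v) for t, v in zip(times, group_values)
    )
    variance = sum((t - mean_t) ** 2 for t in times)
    return covariance / variance


# Five groups read 10 s apart, accumulating 50 DN per group.
ramp_dn = [100.0, 150.0, 200.0, 250.0, 300.0]
rate = ramp_slope(ramp_dn, group_time=10.0)
print(rate)  # 5.0 DN/s
```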

Stage 2 pipeline

Input: Uncalibrated slope images ("*rate.fits" files, "*rateints.fits" files)

Output: Calibrated slope images ("*cal.fits" files)

Step code overview: 

The second stage of the JWST Science Calibration Pipeline calibrates the slope images provided by the first stage of the pipeline. Individual steps include pixel flat-fielding, derivation and attachment of world coordinate information mapping detector pixels to wavelengths and sky coordinates, application of spectrophotometric calibration factors, etc. The output of this stage is calibrated data from individual exposures (typically in units of MJy/sr) that is still on the native detector pixel grid. In this stage, processing splits into 2 main workflows, with the calwebb_image2 and calwebb_spec2 pipelines handling imaging and spectroscopic data, respectively.

Stage 3 pipeline

Input: Calibrated slope images ("*cal.fits") 

Output: Science-ready mosaics, IFU data cubes, and one-dimensional spectra (e.g., "*i2d.fits" files, "*s3d.fits" files, and "*x1d.fits" files).

Step code overview: 

  • calwebb_image3 (applied to MIRI, NIRCam, and NIRISS direct imaging data)
  • calwebb_coron3 (applied to MIRI and NIRCam coronagraphic imaging)
  • calwebb_ami3 (applied to NIRISS aperture masking interferometry data)
  • calwebb_spec3 (applied to MIRI MRS/LRS spectroscopy, NIRCam and NIRISS WFSS, and NIRSpec MOS/FS/IFU spectroscopy)
  • calwebb_tso3 (applied to all time-series observations, both photometry and spectroscopy)

The third and final stage of the JWST Science Calibration Pipeline takes the individual calibrated exposures provided by the second stage of the pipeline and combines them into final science-ready data products. For imaging modes, this typically involves combining individual dithered exposures onto a common regularly-sampled mosaic grid, in addition to producing basic source catalog information from this mosaic. For spectroscopic modes, this stage can involve resampling dithered observations into composite 3-dimensional data cubes (for NIRSpec and MIRI integral field spectroscopy) and/or producing final one-dimensional extracted spectra. Data products are typically provided in units of MJy/sr (for images and IFU data cubes) or Jy (for one-dimensional spectra). This stage involves the most mode-specific processing steps, as the final science data products can differ substantially between observing modes. As such, stage 3 processing uses different pipelines for direct imaging, coronagraphy, aperture masking interferometry, and spectroscopy. Additionally, observations obtained in time-series (TSO) mode have a different stage 3 pipeline tailored to the needs of the TSO community.
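
The split into separate stage 3 pipelines can be summarized as a lookup from observing mode to pipeline name. The sketch below uses the pipeline names from this section; the mode labels and the function itself are hypothetical shorthand for illustration and do not correspond to header keyword values or to how the pipeline selects its workflow internally.

```python
def stage3_pipeline(mode, is_tso=False):
    """Return the stage 3 pipeline name for a coarse observing-mode label.

    The mode labels are hypothetical shorthand for this example, not values
    of any header keyword.
    """
    if is_tso:
        return "calwebb_tso3"   # time series take their own pipeline
    pipelines = {
        "imaging": "calwebb_image3",
        "coronagraphy": "calwebb_coron3",
        "ami": "calwebb_ami3",
        "spectroscopy": "calwebb_spec3",
    }
    try:
        return pipelines[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


print(stage3_pipeline("imaging"))
print(stage3_pipeline("spectroscopy", is_tso=True))
```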



Error propagation and data quality arrays

The JWST pipeline includes both error estimates and data quality flags with all data products. For details, see JWST Science Data Overview.



CRDS reference files

ReadTheDocs documentation: ReadTheDocs Reference File Documentation

Each stage of the JWST Science Calibration Pipeline uses a set of instrument-specific reference files that ensures the pipeline meets its accuracy requirements. These reference files are stored and managed within the Calibration Reference Data System (CRDS), which is located at https://jwst-crds.stsci.edu/. CRDS is directly integrated with the pipeline via reference file mappings, also known as contexts, that are set by default to always access the best reference files for the data, according to certain selection rules (for example, instrument and filter used for an observation). Each context consists of a pmap (e.g., jwst_1188.pmap) that lists an imap for each instrument (e.g., jwst_miri_0380.imap for MIRI).  These imaps list the rmaps for each kind of reference file (e.g., jwst_miri_photom_0048.rmap for MIRI photometric calibration reference files).  The rmaps in turn encode the selection rules according to which specific reference files should be chosen, for example, based on the detector in use, the date of the observation, etc.
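
The pmap, imap, and rmap hierarchy can be pictured as nested mappings. The toy model below reuses the mapping names quoted above, but the selection rules, dates, and reference file names are invented placeholders, and the lookup function is only an assumption about how a rule based on detector and date might be applied; actual rmap matching is more elaborate.

```python
# Toy model of a context: pmap -> imap (per instrument) -> rmap (per type).
# Mapping names are the examples quoted in the text; the selection rules and
# reference file names below are invented placeholders.
CONTEXT = {
    "name": "jwst_1188.pmap",
    "imaps": {
        "miri": {
            "name": "jwst_miri_0380.imap",
            "rmaps": {
                "photom": {
                    "name": "jwst_miri_photom_0048.rmap",
                    "rules": [
                        # (detector, use-after date, reference file)
                        ("MIRIMAGE", "2022-01-01", "hypothetical_photom_a.fits"),
                        ("MIRIMAGE", "2023-06-01", "hypothetical_photom_b.fits"),
                    ],
                },
            },
        },
    },
}


def best_reference(context, instrument, reftype, detector, obs_date):
    """Pick the matching rule with the latest use-after date <= obs_date."""
    rules = context["imaps"][instrument]["rmaps"][reftype]["rules"]
    candidates = [
        (useafter, filename)
        for rule_detector, useafter, filename in rules
        if rule_detector == detector and useafter <= obs_date
    ]
    if not candidates:
        return None
    return max(candidates)[1]   # ISO dates sort correctly as strings


print(best_reference(CONTEXT, "miri", "photom", "MIRIMAGE", "2023-01-15"))
print(best_reference(CONTEXT, "miri", "photom", "MIRIMAGE", "2024-01-15"))
```

In this model, creating a new context amounts to publishing a new pmap whose imaps and rmaps point at updated rules, which is why the same observation can resolve to different reference files under different contexts.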

Reference files are continually being updated by the JWST instrument teams based on analysis of in-flight data, allowing the overall calibration quality of JWST data products to improve as the instrument performance becomes better characterized. New CRDS contexts are created to point to such updated files and specify in what cases they should be used by the calibration pipeline. Accessing the CRDS website (https://jwst-crds.stsci.edu/) allows users to browse the latest reference files that apply to particular observing modes. The website also provides tabular information for each reference file, such as the "ACTIVATION DATE" for when these files were activated for use in the operational pipeline (i.e., the install of the JWST Science Calibration Pipeline that produces the data products available via MAST), and their "PEDIGREE", indicating the type of data used to create these files. (To see these columns, click on an instrument, then on one of the reference files listed below.)

A detailed list of the required reference files for each pipeline step, and the corresponding format of those reference files, can be found in the ReadTheDocs Reference File Table.

Types of reference files

Broadly speaking, there are 2 kinds of reference files contained within CRDS.

Basic reference files contain calibration data necessary for the pipeline. Flat fields, bad pixel masks, photometric calibration vectors, etc., are all basic reference files called from within a given pipeline step.

Parameter reference files (see ReadTheDocs documentation) are ASDF/YAML-format text files that configure how the pipeline is run for a particular step. For example, there are distinct parameter reference files for the calwebb_spec2 pipeline (see, for example, jwst_miri_pars-spec2pipeline_0005.asdf) that configure it to run with a different set of default arguments for individual instrument modes. These parameter reference files can be used to skip steps or provide defaults for user-customizable input parameters. Such defaults may be provided either at the individual step level (e.g., jwst_miri_pars-outlierdetectionstep_0052.asdf) or for the stage within which that step is contained (e.g., jwst_miri_pars-spec2pipeline_0005.asdf).

Choosing a context

In general, users running the JWST Science Calibration Pipeline for themselves should always use the latest CRDS context, which will point to the best reference files available for the date of every observation. By default this happens automatically, as the pipeline checks for context updates every time it starts. However, it is also possible to override this default and choose a specific context if there is some need to do so; for instance, the pipeline could be commanded to use context jwst_1179.pmap by setting the environment variable CRDS_CONTEXT=jwst_1179.pmap (or, in a Jupyter notebook environment, %env CRDS_CONTEXT jwst_1179.pmap). Note that any context overrides should be set prior to importing JWST pipeline functions.
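
In a plain Python script the override can be set through os.environ. The sketch below adds a basic check on the context name; the helper pin_crds_context is hypothetical, and the name pattern is an assumption inferred from the examples in this article rather than a rule taken from the CRDS documentation.

```python
import os
import re

# Assumed pattern, inferred from names such as jwst_1179.pmap.
CONTEXT_NAME = re.compile(r"^jwst_\d+\.pmap$")


def pin_crds_context(context):
    """Set CRDS_CONTEXT after checking that the name looks like a pmap."""
    if not CONTEXT_NAME.match(context):
        raise ValueError(f"not a JWST pmap name: {context!r}")
    os.environ["CRDS_CONTEXT"] = context
    return context


pin_crds_context("jwst_1179.pmap")
print(os.environ["CRDS_CONTEXT"])

# Import pipeline modules only after the override is in place:
# from jwst.pipeline import Detector1Pipeline
```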

Since calibration files are updated frequently, it is critical to specify both the jwst software version and CRDS context in use when publishing scientific results based on JWST data. See Citing JWST Data for further details.
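
One way to collect this information is to read it from the header of a pipeline product. The sketch below assumes that the primary header carries the keywords CAL_VER (software version) and CRDS_CTX (CRDS context); a plain dictionary stands in for the header, the function name is hypothetical, and the version number is a made-up value.

```python
def provenance_note(header):
    """Build a one-line provenance statement from a header-like mapping.

    Assumes the keywords CAL_VER (software version) and CRDS_CTX (context)
    are present, as in the primary header of pipeline products.
    """
    missing = [key for key in ("CAL_VER", "CRDS_CTX") if key not in header]
    if missing:
        raise KeyError(f"header lacks {', '.join(missing)}")
    return (
        f"jwst pipeline version {header['CAL_VER']}, "
        f"CRDS context {header['CRDS_CTX']}"
    )


# A plain dict stands in for a FITS header; the version is a made-up value.
example_header = {"CAL_VER": "1.0.0", "CRDS_CTX": "jwst_1179.pmap"}
print(provenance_note(example_header))
```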

Subscribing to notifications about new reference files

Users can be notified about reference file updates by subscribing to the list server notification system for each science instrument. Subscribing is done by sending an email command to the list server maillist.stsci.edu. The list server "subscribe" command for each instrument is:

  • MIRI: miri_reffiles_upd-subscribe-request
  • NIRCAM: nircam_reffiles_upd-subscribe-request
  • NIRISS: niriss_reffiles_upd-subscribe-request
  • NIRSPEC: nirspec_reffiles_upd-subscribe-request
  • FGS: fgs_reffiles_upd-subscribe-request

To subscribe to notifications, send an email (with no subject and no body text) using the address format <subscribe_command>@maillist.stsci.edu. For example, to subscribe to MIRI reference file deliveries, send an email to miri_reffiles_upd-subscribe-request @ maillist.stsci.edu.

The Reference Data for Calibration and Tools Management (ReDCaT) team will authorize your request and you will receive an email with information about the mailing list and how to unsubscribe.

Whenever a delivery of reference files is activated for use with the JWST Operational Pipeline, the ReDCaT Team sends a notification indicating the type of files delivered, the modes affected, and the USEAFTER dates for the files. It also includes the full list of files, the reason for the delivery, and relevant links.

If you prefer to wait for your data to be reprocessed by the operational pipeline, use the MAST Portal to subscribe to observations of interest. You will receive notifications when recalibrated products appear in the Archive. See the Program Subscriptions and Notifications chapter of the Portal Guide for details.



Operations pipeline builds

Once per quarter a fixed build is produced corresponding to a tagged version of the JWST Science Calibration Pipeline and a particular CRDS reference file context. This pipeline build is then used by STScI to reprocess all archival JWST data as well as newly obtained observations.

For details on the pipeline build framework and historical differences between the builds, see JWST Operations Pipeline Build Information.



Why should I rerun the pipeline?

Standard calibration pipeline processing should produce publication-quality science data products that can be downloaded directly from the MAST archive (see Accessing JWST Data). However, this pipeline automatically processes data from sparse deep fields to ultra-bright Jovian planets and everything in between, and processing can usually be optimized by users making science-specific decisions for their own observing programs. The pipeline is therefore designed to be modular with multiple user-configurable options to support a variety of science cases. As an example, the MIRI MRS integral field spectrometer (like many mid-IR instruments) experiences significant spectral fringing, which can give rise to artifacts in the extracted spectra. Multiple pipeline steps exist to help remove these artifacts, but they should only be enabled if the science spectrum is not expected to contain regular periodic astrophysical features with a similar frequency.

Likewise, while STScI reprocesses all science data periodically with the latest software and reference file versions, users who do not wish to wait can reprocess the data themselves as soon as any updates are available.

In general, users are encouraged to familiarize themselves with the description of the JWST calibration status of each instrument mode of interest, along with the list of any known issues with JWST data that are relevant to their particular science. The Known Issues with JWST Data articles also provide guidance for each instrument mode on when reprocessing may be desirable, and how to set relevant parameters accordingly.



How do I rerun the pipeline?

See Running the JWST Science Calibration Pipeline and Tips and Tricks for working with the JWST Pipeline.



References

Bushouse, H., et al. 2023, Zenodo, JWST Calibration Pipeline.



