JWST has a single science calibration pipeline for processing all observational data, unlike prior observatories such as HST. This pipeline is designed to share code across instruments where practical (e.g., for the initial detector-level processing) and to be modular, allowing imaging and spectroscopy to run only the steps relevant to a given observing mode. Likewise, there are many optional steps, and different ways to configure these steps depending on the science needs of a given set of observations.

The pipeline depends upon a library of reference files managed through the Calibration Reference Data System (CRDS); these reference files are just as important to the quality of the final science data products as the pipeline software itself.

This article and linked pages provide a brief astronomer-level overview of the JWST Science Calibration Pipeline and CRDS reference file infrastructure. Links are provided where relevant to the more extensive technical information that is distributed via ReadTheDocs and updated with each release of the pipeline software.

Note that this article provides generic information about the pipeline across different release versions. If you're looking for specific calibration information about the latest JWST data products, see the JWST Calibration Status page, "What's New" in the latest build, what's "Coming Soon" in the next build, and the list of current known Issues with JWST data.

Pipeline builds

JWST pipeline development proceeds according to a quarterly build schedule. Each quarter, the active development version of the JWST Science Calibration Pipeline is frozen, combined with other dependent software (e.g., the JWST Science Data Processing system; SDP) and a particular CRDS reference file context, and tagged to produce a candidate build. This candidate build is then tested extensively prior to becoming an operations build installed by the Data Management System (DMS) at STScI and used to reprocess all previous JWST data as well as newly obtained observations. Once a build is no longer operational, it becomes archived.

There are thus four kinds of builds:

Development (Dev) build: Changing daily, uses the Latest CRDS context, not recommended for non-developers.
Candidate build: Development has finished, currently working on testing and documentation. Uses a fixed CRDS context. Available for experienced users.
Operations build: Installed at STScI and being used to produce data products available in MAST. Uses a fixed CRDS context. Default recommendation for users.
Archived build: Outdated operations build that is no longer recommended for general use (may still correspond to some MAST data products until reprocessing is completed).

For details on the pipeline build framework and historical differences between the builds, see JWST Operations Pipeline Build Information.

Installation

Words in bold are GUI menus/panels or data software packages; bold italics are buttons in GUI tools or package parameters.

The JWST Science Calibration Pipeline (jwst) is open-source, written in Python (with some C bindings for speed), and is available on GitHub. Users are therefore encouraged to contribute to its development by submitting bug reports and pull requests.

Installation of the jwst pipeline has two parts: installing the pipeline software and setting up paths to the CRDS calibration reference files used by the pipeline.

Instructions for installing the pipeline software can be found in the GitHub pipeline package's README page. This page contains instructions for installing specific released versions as well as the latest development version. Most users who want to run the pipeline themselves should install the latest released version given in the GitHub software build table. Stable releases of the jwst package are also registered at PyPI. If you want to reproduce data products downloaded from MAST, check the latest JWST Operations Pipeline Build Information to determine which pipeline version is currently being used at STScI to produce MAST data products. Note that the jupyter package is not bundled with the pipeline software, and you will need to use pip (or a similar method) to install it if you wish to use the JWST Pipeline Notebooks.

Instructions for setting up the CRDS reference files used by the pipeline can also be found in the GitHub pipeline package's README. In brief, the environment variable CRDS_SERVER_URL must be set to tell the pipeline software where to obtain reference files, and the environment variable CRDS_PATH must be set to specify where to cache these reference files on your local machine. These environment variables need to be defined before any imports, including importing the jwst or crds software packages.
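As a minimal sketch of this setup in Python, the two variables can be set through os.environ at the very top of a script or notebook. The server URL is the CRDS address given in this article; the helper name configure_crds and the cache directory are hypothetical choices made for illustration, and the README remains the authoritative source for the recommended values.

```python
import os

# CRDS server address as given in this article.
CRDS_SERVER_URL = "https://jwst-crds.stsci.edu"


def configure_crds(cache_dir):
    """Set CRDS_PATH and CRDS_SERVER_URL, returning the resolved cache path.

    Hypothetical helper: it only sets environment variables and does not
    contact the CRDS server or create the directory.
    """
    cache = os.path.expanduser(cache_dir)
    os.environ["CRDS_PATH"] = cache
    os.environ["CRDS_SERVER_URL"] = CRDS_SERVER_URL
    return cache


# Hypothetical cache location; any writable directory could be used.
crds_cache = configure_crds("~/crds_cache")

# Only after this point should jwst or crds be imported, for example:
# import jwst
```

Keeping the assignment in one small function makes it easy to place ahead of every other import in a processing script.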

Important: While the jwst pipeline is tested and supported on both the Linux and macOS platforms, Windows platforms are not currently supported.

The four stages of the JWST Science Calibration Pipeline

The figure below shows the flow of data through the JWST calibration pipeline. This pipeline and the corresponding science data products that it produces can be divided into 4 main stages, depending on the degree of processing:

Stage 0: Produces uncalibrated raw data products from single exposures in units of total DN (e.g., "uncal.fits")
Stage 1: Produces data products that have been corrected for certain detector effects and converted to units of DN/s (e.g., "rate.fits")
Stage 2: Produces calibrated data products from single or multiple exposures with world coordinates and photometric information (e.g., "cal.fits")
Stage 3: Produces calibrated data products resulting from the combination of multiple exposures into a single integrated product (e.g., "i2d.fits", "s3d.fits", "x1d.fits")
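When sorting a directory of downloaded products, the stage can be inferred from the file suffix. The lookup below is a sketch built only from the example suffixes named in this article; the function name stage_of is hypothetical, the table is not exhaustive, and some suffixes may also be produced at other stages in practice.

```python
# Suffix-to-stage lookup taken from the examples in this article only.
STAGE_BY_SUFFIX = {
    "uncal": 0,
    "rate": 1,
    "rateints": 1,
    "cal": 2,
    "calints": 2,
    "i2d": 3,
    "s3d": 3,
    "x1d": 3,
}


def stage_of(filename):
    """Return the stage implied by a product suffix, or None if unrecognized."""
    stem = filename.rsplit(".fits", 1)[0]  # drop the extension
    suffix = stem.rsplit("_", 1)[-1]       # text after the last underscore
    return STAGE_BY_SUFFIX.get(suffix)


# Hypothetical file names used purely for illustration.
for name in ("example_uncal.fits", "example_rate.fits", "example_i2d.fits"):
    print(name, "-> stage", stage_of(name))
```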

Summaries of the algorithms used for individual pipeline stages can be found in the links below. For further information on the JWST science data products themselves, see either the JDox data products overview or the more extensive ReadTheDocs Data Product Types.

Stage 0 pipeline

The Science Data Processing (SDP) system performs pre-processing of the raw telemetry downloaded from the spacecraft prior to the start of the JWST Science Calibration Pipeline. SDP processing is the stage at which key header information is populated from a variety of sources including onboard telemetry, APT proposal inputs, JPL spacecraft ephemerides, etc. The SDP code is distinct from the JWST Science Calibration Pipeline, and general users will not interact with it, instead starting any offline processing with the uncalibrated data (i.e., "*uncal.fits" files) that it produces.

These "*uncal.fits" files are the inputs to stage 1 of the calibration pipeline and usually have 4 dimensions since all JWST detectors use up-the-ramp readout (sometimes referred to as MULTIACCUM) in which pixel values increase between groups within a given integration. The first 2 dimensions are the column and row axes of the detector, while the 3rd dimension is determined by the number of groups per integration, and the 4th dimension by the number of integrations in the exposure. Note that all 4 dimensions will be used, even if N_group = 1 or N_int = 1.
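The axis order above follows the FITS convention, in which the first axis varies fastest. Python tools such as astropy typically present the same data with the axes reversed, so that an array is indexed as [integration, group, row, column]. The sketch below illustrates this with small nested lists rather than a real file; the function names and the numbers are hypothetical.

```python
def fits_to_array_shape(n_cols, n_rows, n_group, n_int):
    """Convert FITS axis order (fastest axis first) to row-major array order."""
    return (n_int, n_group, n_rows, n_cols)


def pixel_ramp(data, integration, row, col):
    """Return the up-the-ramp samples of one pixel from nested 4-D data."""
    return [group[row][col] for group in data[integration]]


# Hypothetical full-frame exposure: 2048 x 2048 pixels, 5 groups, 1 integration.
shape = fits_to_array_shape(2048, 2048, 5, 1)

# Tiny synthetic exposure: 1 integration, 3 groups, 2 x 2 pixels.
data = [
    [
        [[10, 11], [12, 13]],  # group 1
        [[20, 22], [24, 26]],  # group 2
        [[30, 33], [36, 39]],  # group 3
    ]
]
ramp = pixel_ramp(data, integration=0, row=1, col=0)
```

Here ramp holds the values of a single pixel across the three groups, which increase from one group to the next as expected for a non-destructive readout.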

These data come from single exposures and are usually contained within a single FITS file. However, when the raw data volume for an individual exposure is large enough, as for time-series observations, the uncalibrated data can be broken into multiple segments of less than 2 GB each, so as to keep total file sizes to a reasonable level. Such broken-up exposures usually include "segNNN" in the file names, where NNN is 1-indexed and always includes any leading zeros.
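The segment label can be generated or recovered with a few lines of Python. This is a sketch that assumes NNN is always exactly three digits, as the notation above suggests; the helper names and the file names are hypothetical and do not follow the full JWST naming scheme.

```python
import re

# Assumes a three-digit, zero-padded, 1-indexed segment number.
SEGMENT_RE = re.compile(r"seg(\d{3})")


def segment_label(index):
    """Format a 1-indexed segment number as 'segNNN' with leading zeros."""
    if index < 1:
        raise ValueError("segment numbers are 1-indexed")
    return f"seg{index:03d}"


def segment_index(filename):
    """Return the segment number found in a file name, or None if absent."""
    match = SEGMENT_RE.search(filename)
    return int(match.group(1)) if match else None


# Hypothetical file names used purely for illustration.
names = ["example-seg002_uncal.fits", "example-seg001_uncal.fits"]
ordered = sorted(names, key=segment_index)
```

Sorting on the parsed integer rather than on the raw string keeps segments in acquisition order even if the labels are ever compared alongside unpadded numbers.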

Stage 1 pipeline

Input: Uncalibrated ramp data ("*uncal.fits" files)

Output: Uncalibrated slope images ("*rate.fits" files, "*rateints.fits" files)

Step code overview:

calwebb_detector1 (applied for all data)

The first stage of the JWST Science Calibration Pipeline applies detector-level corrections to raw non-destructively read "ramps" from the uncalibrated data in order to produce 2-dimensional count rate images per exposure (or per integration for some modes). These steps include flagging of known bad pixels and corrections for dark current, count rate non-linearity, jumps in the ramps produced by cosmic rays, and many other effects. This stage consists of a single pipeline that is used for all JWST data; for more information about these steps see the calwebb_detector1 overview article.
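To illustrate what turning a ramp into a count rate means, the sketch below fits an unweighted straight line to the samples of a single pixel. This is a deliberate simplification and not the algorithm used by calwebb_detector1, which also handles jumps, saturation, and weighting; the function name, the equal group spacing, and the numbers are assumptions made for the example.

```python
def ramp_slope(ramp_dn, group_time_s):
    """Ordinary least-squares slope (DN/s) of one pixel's ramp.

    Assumes equally spaced groups and no jumps or saturation.
    """
    n = len(ramp_dn)
    if n < 2:
        raise ValueError("at least two groups are needed to fit a slope")
    times = [i * group_time_s for i in range(n)]
    t_mean = sum(times) / n
    dn_mean = sum(ramp_dn) / n
    numerator = sum((t - t_mean) * (d - dn_mean) for t, d in zip(times, ramp_dn))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


# Synthetic ramp: the signal grows by 30 DN per group with 10 s between groups.
ramp = [100.0, 130.0, 160.0, 190.0]
rate = ramp_slope(ramp, group_time_s=10.0)  # 3.0 DN/s
```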

Stage 2 pipeline

Input: Uncalibrated slope images ("*rate.fits" files, "*rateints.fits" files)

Output: Calibrated slope images ("*cal.fits", "*calints.fits")

Step code overview:

calwebb_image2 (applied to all imaging data, including coronagraphy and NIRISS AMI)
calwebb_spec2 (applied to all spectroscopic data)

The second stage of the JWST Science Calibration Pipeline calibrates the slope images provided by the first stage of the pipeline. Individual steps include pixel flat-fielding, derivation and attachment of world coordinate information mapping detector pixels to wavelengths and sky coordinates, application of spectrophotometric calibration factors, etc. The output of this stage is calibrated data from individual exposures (typically in units of MJy/sr) that are still on the native detector pixel grid. In this stage, processing splits into 2 main workflows, with the calwebb_image2 and calwebb_spec2 pipelines handling imaging and spectroscopic data, respectively. The "*cal.fits" files are the integration-combined data used for regular non-TSO mode observations, while TSO mode observations use "*calints.fits" files in which individual integrations are kept separate to preserve the time-series information.

Stage 3 pipeline

Input: Calibrated slope images ("*cal.fits" for non-TSO mode data, "*calints.fits" for TSO mode data)

Output: Science-ready mosaics, IFU data cubes, and one-dimensional spectra (e.g., "*i2d.fits" files, "*s3d.fits" files, and "*x1d.fits" files).

Step code overview:

calwebb_image3 (applied to MIRI, NIRCam, and NIRISS direct imaging data)
calwebb_coron3 (applied to MIRI and NIRCam coronagraphic imaging)
calwebb_ami3 (applied to NIRISS aperture masking interferometry data)
calwebb_spec3 (applied to MIRI MRS/LRS spectroscopy, NIRCam and NIRISS WFSS, and NIRSpec MOS/FS/IFU spectroscopy)
calwebb_tso3 (applied to all time-series observations, both photometry and spectroscopy)

The third and final stage of the JWST Science Calibration Pipeline takes the individual calibrated exposures provided by the second stage of the pipeline and combines them into final science-ready data products. For imaging modes, this typically involves combining individual dithered exposures onto a common regularly-sampled mosaic grid, in addition to producing basic source catalog information from this mosaic. For spectroscopic modes, this stage can involve resampling dithered observations into composite 3-dimensional data cubes (for NIRSpec and MIRI integral field spectroscopy) and/or producing final one-dimensional extracted spectra. Data products are typically provided in units of MJy/sr (for images and IFU data cubes) or Jy (for one-dimensional spectra). This stage involves the most mode-specific processing steps, as the final science data products can differ substantially between the JWST observing modes. As such, stage 3 processing uses different pipelines for direct imaging, coronagraphy, aperture masking interferometry, and spectroscopy. Additionally, observations obtained in time-series (TSO) mode have a different stage 3 pipeline tailored to the needs of the TSO community.
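The branching between stage 2 and stage 3 pipelines described in this article can be summarized as a small lookup. The mode labels ("imaging", "coronagraphy", "ami", "spectroscopy") and the function name are hypothetical names chosen for this sketch; only the pipeline names come from the article.

```python
# Mode labels are illustrative; pipeline names are those listed in this article.
STAGE2_PIPELINES = {
    "imaging": "calwebb_image2",
    "coronagraphy": "calwebb_image2",
    "ami": "calwebb_image2",
    "spectroscopy": "calwebb_spec2",
}

STAGE3_PIPELINES = {
    "imaging": "calwebb_image3",
    "coronagraphy": "calwebb_coron3",
    "ami": "calwebb_ami3",
    "spectroscopy": "calwebb_spec3",
}


def pipelines_for(mode, is_tso=False):
    """Return the (stage 2, stage 3) pipeline names for an observing mode."""
    stage2 = STAGE2_PIPELINES[mode]
    # Time-series observations use a dedicated stage 3 pipeline.
    stage3 = "calwebb_tso3" if is_tso else STAGE3_PIPELINES[mode]
    return stage2, stage3
```

For example, pipelines_for("spectroscopy", is_tso=True) returns ("calwebb_spec2", "calwebb_tso3"), reflecting that time-series data diverge from the other modes only at stage 3.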

Error propagation and data quality arrays

The JWST pipeline includes both error estimates and data quality flags with all data products. For details, see JWST Science Data Overview.

CRDS reference files

ReadTheDocs documentation: ReadTheDocs Reference File Documentation

Important: In October 2024 the CRDS interface was changed so that each version of the pipeline software uses a specific and frozen CRDS context. See details at CRDS Migration to Quarterly Calibration Updates.

Each stage of the JWST Science Calibration Pipeline uses a set of instrument-specific reference files that ensures the pipeline meets its accuracy requirements. These reference files are stored and managed within the Calibration Reference Data System (CRDS), which is located at https://jwst-crds.stsci.edu/. CRDS is directly integrated with the pipeline via reference file mappings, also known as contexts, that link to the appropriate set of reference files to use for a given pipeline version. Each context consists of a pmap (e.g., jwst_1188.pmap) that lists an imap for each instrument (e.g., jwst_miri_0380.imap for MIRI). These imaps list the rmaps for each kind of reference file (e.g., jwst_miri_photom_0048.rmap for MIRI photometric calibration reference files). The rmaps in turn encode the selection rules according to which specific reference files should be chosen, for example, based on the detector in use, the date of the observation, etc. The website also provides tabular information for each reference file, such as the "ACTIVATION DATE" for when these files were delivered, and their "PEDIGREE" indicating the type of data used to create these files. (To see these columns, click on an instrument, then on one of the reference files listed below.)
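The three mapping levels can be told apart from their file names alone. The sketch below parses the example names quoted above, assuming the pattern jwst[_instrument[_reftype]]_number.extension that those examples follow; the function is hypothetical, is not part of the crds package, and does not read the mapping contents.

```python
# Number of underscore-separated fields expected at each mapping level.
FIELDS_BY_KIND = {"pmap": 2, "imap": 3, "rmap": 4}


def parse_mapping(name):
    """Split a CRDS mapping file name into its level, instrument, type, and number."""
    stem, _, kind = name.rpartition(".")
    parts = stem.split("_")
    if kind not in FIELDS_BY_KIND or parts[0] != "jwst":
        raise ValueError(f"not a JWST mapping name: {name}")
    if len(parts) != FIELDS_BY_KIND[kind]:
        raise ValueError(f"unexpected number of fields in: {name}")
    return {
        "kind": kind,
        "instrument": parts[1] if len(parts) > 2 else None,
        "reftype": parts[2] if len(parts) > 3 else None,
        "number": int(parts[-1]),
    }


# Example names quoted in this article, from the top level down.
for name in ("jwst_1188.pmap", "jwst_miri_0380.imap", "jwst_miri_photom_0048.rmap"):
    print(parse_mapping(name))
```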

Reference files are continually being updated by the JWST instrument teams based on analysis of in-flight data, allowing the overall calibration quality of JWST data products to improve as the instrument performance becomes better characterized. New CRDS contexts are created to point to such updated files and will be used by the next pipeline quarterly release. See Choosing a Context below.

A detailed list of the required reference files for each pipeline step, and the corresponding format of those reference files, can be found in the ReadTheDocs Reference File Table.

Types of reference files

Broadly speaking, there are 2 kinds of reference files contained within CRDS.

Basic reference files contain calibration data necessary for the pipeline. Flat fields, bad pixel masks, photometric calibration vectors, etc., are all basic reference files called from within a given pipeline step.

Parameter reference files (see ReadTheDocs documentation) are ASDF/YAML-format text files that configure how the pipeline is run for a particular step. For example, there are distinct parameter reference files for the calwebb_spec2 pipeline stage (see, for example, jwst_miri_pars-spec2pipeline_0005.asdf) that configure the pipeline to run with a different set of default arguments for individual instrument modes. These parameter reference files can be used to skip steps or provide defaults for user-customizable input parameters. Such defaults may either be provided at the individual step level (e.g., jwst_miri_pars-outlierdetectionstep_0052.asdf) or for the stage within which that step is contained (e.g., jwst_miri_pars-spec2pipeline_0005.asdf).

Choosing a context

As of October 2024 (see CRDS Migration to Quarterly Calibration Updates), each version of the JWST calibration pipeline will automatically choose the appropriate default CRDS context consisting of the latest reference files guaranteed to be compatible with that pipeline version.

There may, however, be more recent reference files than provided by this default context, and users can thus select specific contexts if desired. The CRDS website (https://jwst-crds.stsci.edu/) provides a list of all of the available contexts. 'Build' contexts include all reference files required for a given released version of the software, and will be the most up-to-date context guaranteed to be compatible with that software version. The 'Latest' context will continue to be updated as new reference files are delivered, and provides a preview of the reference files that will be associated with the next software release. These 'Latest' reference files are not guaranteed to work well with older pipeline versions, however.

Users can override the context in use by setting (for example) the environment variable CRDS_CONTEXT=jwst_1179.pmap (or, in a Jupyter notebook environment, %env CRDS_CONTEXT jwst_1179.pmap). Examples of such overrides are provided in the JWST pipeline notebooks. Note that any context overrides should be set prior to importing JWST pipeline functions. If not using the default context, it is important to specify both the jwst software version and CRDS context in use when publishing scientific results based on JWST data. See Citing JWST Data for further details.
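In a plain Python script, the same override can be applied through os.environ. The sketch below also warns if the jwst or crds packages appear to have been imported already, since the override is meant to be set first; the helper name set_crds_context is hypothetical, and the context name is the example used above.

```python
import os
import sys
import warnings


def set_crds_context(pmap):
    """Set CRDS_CONTEXT, warning if pipeline packages were already imported."""
    if not pmap.endswith(".pmap"):
        raise ValueError("a CRDS context name is expected to end in .pmap")
    if "jwst" in sys.modules or "crds" in sys.modules:
        # The article advises setting overrides before these imports.
        warnings.warn("jwst or crds is already imported; set CRDS_CONTEXT earlier")
    os.environ["CRDS_CONTEXT"] = pmap
    return pmap


context_in_use = set_crds_context("jwst_1179.pmap")
```

Returning the context name makes it straightforward to record it next to the jwst software version for later citation.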

Subscribing to notifications about new reference files

Users can be notified about new reference file updates that will be used by the next pipeline build by subscribing to the list server notification system for each science instrument of interest. Subscribing is done by sending an email command to the list server maillist.stsci.edu. The list server "subscribe" command for each instrument is:

MIRI: miri_reffiles_upd-subscribe-request
NIRCAM: nircam_reffiles_upd-subscribe-request
NIRISS: niriss_reffiles_upd-subscribe-request
NIRSPEC: nirspec_reffiles_upd-subscribe-request
FGS: fgs_reffiles_upd-subscribe-request

To subscribe to notifications, send an email (with no subject and no body) using the address format <subscribe_command>@maillist.stsci.edu. For example, to subscribe to MIRI reference file deliveries, send an email to miri_reffiles_upd-subscribe-request @ maillist.stsci.edu.

The Reference Data for Calibration and Tools Management (ReDCaT) team will authorize your request and you will receive an email with information about the mailing list and how to unsubscribe.

Whenever a delivery of reference files is activated for use with the JWST Operational Pipeline, the ReDCaT Team sends a notification indicating the type of files delivered, the modes affected, and the USEAFTER dates for the files. It also includes the full list of files, the reason for the delivery, and relevant links.

If you prefer to wait for your data to be reprocessed by the operations pipeline, use the MAST Portal to subscribe to observations of interest. You will receive notifications when recalibrated products appear in the Archive. See the Program Subscriptions and Notifications chapter of the Portal Guide for details.

Why should I rerun the pipeline?

Standard calibration pipeline processing should produce publication quality science data products that can be downloaded directly from the MAST archive (see Accessing JWST Data). However, this pipeline automatically processes data from sparse deep fields to ultra-bright Jovian planets and everything in-between, and processing can usually be optimized by users making science-specific decisions for their own observing programs. The pipeline is therefore designed to be modular with multiple user-configurable options to support a variety of science cases. As an example, the MIRI MRS integral field spectrometer (like many mid-IR instruments) experiences significant spectral fringing which can give rise to artifacts in the extracted spectra. Multiple pipeline steps exist to help remove these artifacts, but should only be enabled if the science spectrum is not expected to contain regular periodic astrophysical features with a similar frequency.

Likewise, while STScI reprocesses all science data periodically with the latest software and reference file versions, users who do not wish to wait can reprocess the data themselves as soon as any updates are available.

In general, users are encouraged to familiarize themselves with the description of the JWST calibration status of each instrument mode of interest, along with the list of any known issues with JWST data of which they should be aware for their particular science. The Known Issues articles also help to provide guidance for each instrument mode on when reprocessing may be desirable, and how to set relevant parameters accordingly.

How do I rerun the pipeline?

See Running the JWST Science Calibration Pipeline and Tips and Tricks for working with the JWST Pipeline.
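As a rough sketch of what a rerun of stage 1 can look like in Python, the function below wraps a call to Detector1Pipeline from the jwst package. The import is deferred so that the CRDS environment variables can be set beforehand. The helper names, the file name, and the parameter value in the usage comment are hypothetical, and the linked articles remain the authoritative guide to the available options.

```python
def build_call_kwargs(output_dir, step_overrides=None):
    """Assemble keyword arguments for a pipeline call (hypothetical helper)."""
    kwargs = {"output_dir": output_dir, "save_results": True}
    if step_overrides:
        # Per-step parameters, e.g. {"jump": {"rejection_threshold": 5.0}}
        kwargs["steps"] = step_overrides
    return kwargs


def rerun_detector1(uncal_path, output_dir, step_overrides=None):
    """Run calwebb_detector1 on one uncalibrated file and save the results."""
    # Deferred import: requires the jwst package and a configured CRDS setup.
    from jwst.pipeline import Detector1Pipeline

    return Detector1Pipeline.call(
        uncal_path, **build_call_kwargs(output_dir, step_overrides)
    )


# Example usage with a hypothetical file name and an illustrative threshold:
# rerun_detector1("example_uncal.fits", "reprocessed",
#                 {"jump": {"rejection_threshold": 5.0}})
```

Collecting the step overrides in a dictionary keeps the science-specific choices in one place, which also makes them easy to report alongside the software version and CRDS context.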

References

Bushouse, H., et al. 2023, JWST Calibration Pipeline, Zenodo

Notable updates	10 Oct 2024 Updated CRDS information to reflect new quarterly calibration update model.
Originally published	15 Mar 2024