Running the JWST Science Calibration Pipeline
This article describes how to run the JWST Science Calibration Pipeline.
See also: Algorithm Documentation, Stages of JWST Data Processing, Understanding JWST Data Files, JWST Data Associations
Software documentation outside JDox: Software Documentation, Running the Pipeline, File Naming Conventions, Data Product Types, Science Product Structures and Extensions, Data File Associations
Standard calibration pipeline processing should produce publication-quality data products. However, your science case may require specialized processing with settings other than the defaults used in pipeline processing. Also, while bulk reprocessing will be performed by STScI as conditions and resources permit, you may wish to expedite reprocessing of data sets of interest to you when new calibration and reference files become available.
Most data have now been calibrated using on-orbit calibration files from commissioning; over time, these will be replaced with calibrations based on more recent on-orbit data from Cycle 1 and Cycle 2 calibration programs. Other conditions under which an observer may need to reprocess data will not be known until JWST has completed its Cycle 1 and 2 calibration activities and the instrument data are better characterized. However, as more calibration data become available, STScI will provide guidance on whether users should reprocess their data.
For information on citing your data reduction for publication, see the guidelines in How to Cite JWST Data Reductions and Reference Files.
Science calibration pipeline stages
There are 3 main calibration pipeline stages required to completely process a set of exposures for a given observation:
- Stage 1: Apply detector-level corrections to the raw data for individual exposures and produce count rate (slope) images from the "ramps" of non-destructive readouts
- Stage 2: Apply physical corrections (e.g., slit loss) and calibrations (e.g., absolute fluxes and wavelengths) to individual exposures
- Stage 3: Combine the fully calibrated data from multiple exposures
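As a concrete reference, these stages correspond to named pipeline modules in the jwst package. The mapping below is an illustrative sketch: the stage 2 and 3 entries depend on the observing mode (imaging vs. spectroscopy vs. time series), and additional mode-specific stage 3 pipelines exist that are not listed here.

```python
# Sketch: jwst pipeline module aliases for each processing stage.
# Stage 2/3 entries depend on the observing mode; this mapping is
# illustrative, not exhaustive (e.g., coronagraphy and AMI have
# their own stage 3 pipelines).
STAGE_PIPELINES = {
    1: ["calwebb_detector1"],
    2: ["calwebb_image2", "calwebb_spec2"],
    3: ["calwebb_image3", "calwebb_spec3", "calwebb_tso3"],
}

for stage, names in STAGE_PIPELINES.items():
    print(f"Stage {stage}: {', '.join(names)}")
```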
There are generally 2 types of input: science data files or associations, and reference files. Reference files are provided by the Calibration Reference Data System (CRDS) unless they are explicitly overridden.
Reproducing MAST data products
In some cases, observers may want to run the pipeline in exactly the same way as the operational pipeline used for data going into MAST. The syntax for this will depend on the method being used to run the pipeline. The available methods are outlined below.
Note: Observers must be sure to use the same pipeline and CRDS versions currently used in operations in order to reproduce the operational pipeline settings. Check the header of the MAST data product for the CAL_VER and CRDS_CTX keyword values, which provide the jwst pipeline software version and the CRDS context (on the operational server https://jwst-crds.stsci.edu/), respectively.
Using the .call() method will search for and use any parameter reference files that exist in CRDS, which can contain parameter overrides applied to different exposure types (e.g., TSO vs. non-TSO). The particular parameter reference files used during processing are recorded in the standard log but are not stored in the header, since that information is redundant with the CRDS context. As long as the matching context is used, all parameters will be set in the same way. This is how data are processed by the Data Management System (DMS) within the operational environment. The table below lists the pipeline modules to use for each stage of processing, depending on the mode:
After activating a pipeline environment, the 3 pipeline stages can be imported and run as follows in a Python session:
from jwst.pipeline import Detector1Pipeline
result = Detector1Pipeline.call("jwxxxxx_uncal.fits")
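In addition to parameter reference files from CRDS, per-step parameter overrides can be supplied directly to .call() via a steps dictionary. The sketch below only builds the dictionary; the commented lines show how it would be passed, assuming the jwst package is installed. The specific parameter values are hypothetical illustrations, not recommendations.

```python
# Hypothetical per-step parameter overrides for Detector1Pipeline.call().
# The jump-step threshold value is illustrative, not a recommendation.
overrides = {
    "jump": {"rejection_threshold": 5.0},
    "ramp_fit": {"save_opt": False},
}

# With the jwst package installed, these would be applied as:
# from jwst.pipeline import Detector1Pipeline
# result = Detector1Pipeline.call("jwxxxxx_uncal.fits", steps=overrides)
print(sorted(overrides))
```

Note that any parameters set this way take precedence over the CRDS parameter reference files, so using explicit overrides means the result may differ from the operational MAST products.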
Note that stages 2 and 3 pipelines will take either an individual file as input or an association table. See the documentation for more information about inputs and outputs for each pipeline.
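As an illustration of an association input, the sketch below writes a minimal stage 3-style association table. The file names, product name, and association type shown are placeholders; real associations generated by DMS contain additional bookkeeping fields.

```python
import json

# Minimal, illustrative stage 3-style association table.
# All names are placeholders; DMS-generated associations have more fields.
asn = {
    "asn_type": "image3",
    "products": [
        {
            "name": "my_combined_product",
            "members": [
                {"expname": "jwxxxxx_a_cal.fits", "exptype": "science"},
                {"expname": "jwxxxxx_b_cal.fits", "exptype": "science"},
            ],
        }
    ],
}

with open("my_asn.json", "w") as f:
    json.dump(asn, f, indent=2)

# This file could then be passed to a stage 3 pipeline, e.g.:
# from jwst.pipeline import Image3Pipeline
# result = Image3Pipeline.call("my_asn.json")
print(len(asn["products"][0]["members"]))
```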
strun command line
The strun command can also be used to run pipelines or individual pipeline steps from the command line, in the same way DMS processes the data. The table below lists the pipeline modules to use for each stage of processing, depending on the mode:
After activating a pipeline environment, the 3 pipeline stages can be run as follows from the command line:
$ strun <pipeline_name> <input_file>
The first argument to strun must be either a pipeline name, the Python class of the step or pipeline to be run, or the name of a parameter file for the desired step or pipeline (see Parameter Files). The second argument to strun is the name of the input data file to be processed. It is also possible to use the command line method from within a Python session.
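For instance, one way to invoke strun from a Python session is via the subprocess module. The sketch below only assembles and prints the command; the pipeline alias calwebb_detector1 is real, but the input file name is a placeholder, and the actual subprocess.run call is left commented out since it requires the jwst package to be installed.

```python
import shlex
import subprocess  # used only if the commented line below is enabled

# Placeholder input file; calwebb_detector1 is the stage 1 pipeline alias.
cmd = ["strun", "calwebb_detector1", "jwxxxxx_uncal.fits"]
print(shlex.join(cmd))

# With the jwst package installed, the command could be executed as:
# subprocess.run(cmd, check=True)
```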
The .run() method is the lowest-level method for executing a step or pipeline, so initialization and parameter settings are left entirely up to the user. This makes it relatively complicated to replicate the operational settings, so we do not recommend this method for reproducing data products coming from MAST. More information on this method is provided here: Running a Step in Python.
Documentation outside JDox: JWebbinars
While the pipeline software documentation offers a general description on how to run the pipeline, a number of intricacies exist in the way in which the various software and data products interact. Several Jupyter notebooks have been developed to help you understand your data or to highlight general science workflows that you may want to consider while reducing your own data.