Running the JWST Science Calibration Pipeline

The JWST Science Calibration Pipeline can be run in a variety of ways, each of which may be better suited to different workflows. This article provides a high-level overview of these different methods and links to more extensive documentation provided elsewhere. Note that before running the pipeline it is essential to make sure that the software is properly installed, that your system has been configured to point to the CRDS reference file server, and that you have activated any relevant conda environments (see installation instructions on the JWST Science Calibration Pipeline main page).


Input/output

Most stages of the pipeline can either be run on individual science data files or on associations that list multiple such files and the relationship between them. Low-level detector processing, for instance, would typically be done on individual files, while final combination of multiple different exposures into a single mosaic would require use of an association file.

For an introduction to pipeline input/output data files, associations, and when they should each be used, see the JWST Science Data Overview article.
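
For instance, an association file for Stage 3 processing can be generated programmatically with the asn_from_list helper. The sketch below follows the pattern documented on ReadTheDocs; the input file names and product name are hypothetical:

from jwst.associations.asn_from_list import asn_from_list
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base

# Hypothetical list of calibrated exposures to be combined in Stage 3 processing
members = ['jw01523003001_03102_00001_mirifulong_cal.fits',
           'jw01523003001_03102_00002_mirifulong_cal.fits']

# Build a Level 3 association and write it to disk as JSON
asn = asn_from_list(members, rule=DMS_Level3_Base, product_name='my_product')
name, serialized = asn.dump(format='json')
with open('l3asn.json', 'w') as outfile:
    outfile.write(serialized)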

Note that interested users can customize the logging configuration of the pipeline if they wish; see ReadTheDocs for details.
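
As an illustrative sketch, a logging configuration file (the name pipeline-log.cfg below is arbitrary) might redirect pipeline messages to a log file:

[*]
handler = file:pipeline.log
level = INFO

The configuration can then be passed to the pipeline at run time, e.g., strun calwebb_detector1 <input_file> --logcfg=pipeline-log.cfg.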



Running from the command line


The strun command can be used to run either a pipeline (e.g., calwebb_detector1) or individual pipeline steps from the command line:

$ strun <pipeline_name> <input_file>

The first argument to strun must be either a pipeline name (see Table 1), the Python class of the step or pipeline to be run, or the name of a parameter reference file for the desired step or pipeline. The second argument is the name of the input data file or association file to be processed. For instance, to process an example uncalibrated file through the calwebb_detector1 pipeline:

strun calwebb_detector1 jw01523003001_03102_00001_mirifulong_uncal.fits

Parameters can be set to non-default values by providing them after the filename. For instance,

strun calwebb_detector1 jw01523003001_03102_00001_mirifulong_uncal.fits --steps.jump.maximum_cores='half'

would enable the jump step to use multiprocessing with half of the available CPU cores instead of just a single core. To see all available parameters, use the -h option; for example:

strun calwebb_detector1 -h

It is also possible to use the command line method from within a Python session. See strun pages on ReadTheDocs for further details.
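
For instance, a minimal sketch of this approach using Step.from_cmdline, with the same arguments as the strun calls above (here the full Python class path is used in place of the pipeline alias):

from jwst.stpipe import Step

# Parses the strun-style arguments and runs the pipeline; returns the pipeline instance
pipeline = Step.from_cmdline(['jwst.pipeline.Detector1Pipeline',
                              'jw01523003001_03102_00001_mirifulong_uncal.fits',
                              '--steps.jump.maximum_cores=half'])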

Table 1. Pipeline names and Python classes

Mode                     Pipeline name        Python class
Stage 1 (All)            calwebb_detector1    Detector1Pipeline
Stage 2 (Imaging)        calwebb_image2       Image2Pipeline
Stage 2 (Spectroscopy)   calwebb_spec2        Spec2Pipeline
Stage 3 (AMI)            calwebb_ami3         Ami3Pipeline
Stage 3 (Coronagraphy)   calwebb_coron3       Coron3Pipeline
Stage 3 (Imaging)        calwebb_image3       Image3Pipeline
Stage 3 (Spectroscopy)   calwebb_spec3        Spec3Pipeline
Stage 3 (TSO)            calwebb_tso3         Tso3Pipeline



Running from within Python

The pipeline can also be called directly from within a Python environment by importing the relevant Python class (see Table 1); for example:

from jwst.pipeline import Detector1Pipeline

This pipeline can then be executed in two ways: using the .call() method or the .run() method. Which method is best depends on how you prefer to configure the pipeline.

Using the .call() method

The .call() method is designed to execute the pipeline as simply as possible, using the default parameter reference settings available in CRDS. This is the manner in which data are processed by the Data Management System (DMS) within the operational environment to populate the MAST database. For instance, to execute the calwebb_detector1 pipeline on an example file:

from jwst.pipeline import Detector1Pipeline  
result = Detector1Pipeline.call('jw01523003001_03102_00001_mirifulong_uncal.fits')

Note that this method does not save results to output files by default, as a user may simply wish to work with them in memory instead. To enable writing results to disk:

from jwst.pipeline import Detector1Pipeline
result = Detector1Pipeline.call('jw01523003001_03102_00001_mirifulong_uncal.fits', save_results=True)

Individual parameters can also be set to non-default values by passing a parameter dictionary to the .call() method. For example:

from jwst.pipeline import Detector1Pipeline
params = {"jump": {"maximum_cores": 'half'}} # Set up a Python dictionary with parameters to change
result = Detector1Pipeline.call('jw01523003001_03102_00001_mirifulong_uncal.fits', steps=params, save_results=True)

would enable the jump step to use multiprocessing with half of the available CPU cores, and write the results to a file. Setting additional parameters can be achieved by adding to the parameter dictionary.
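
For instance, a sketch of a dictionary configuring two steps at once (using the same maximum_cores parameter shown elsewhere in this article):

from jwst.pipeline import Detector1Pipeline

# Override parameters for both the jump and ramp fitting steps
params = {"jump": {"maximum_cores": 'half'},
          "ramp_fit": {"maximum_cores": 'half'}}
result = Detector1Pipeline.call('jw01523003001_03102_00001_mirifulong_uncal.fits', steps=params, save_results=True)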

More information on this method can be found in the ReadTheDocs .call() pages.

Using the .run() method

The .run() method is designed to provide the user with the maximum possible control over how individual steps are configured. No defaults are set from CRDS parameter reference files, and only user-specified values are adopted. As such, .run() can be dangerous, as it is easy to forget to set parameters that must be configured for a given observing mode. In practice, users of the .run() method should therefore first spend a few lines of setup pulling in the CRDS parameter defaults, after which only the parameters they wish to change need to be set by hand. For example:

from jwst.pipeline import Detector1Pipeline
inpfile = 'jw01523003001_03102_00001_mirifulong_uncal.fits'
crds_config = Detector1Pipeline.get_config_from_reference(inpfile) 
detector1 = Detector1Pipeline.from_config_section(crds_config)
detector1.jump.maximum_cores = 'half' # Set the jump step to use half of the available cores
detector1.ramp_fit.maximum_cores = 'half' # Set the ramp fitting step to use half of the available cores
detector1.save_results = True # Save results to disk
result = detector1.run(inpfile) # Note: call .run() on the configured instance, not the class

will set up an instance of the calwebb_detector1 pipeline, set the default parameters to those of the appropriate CRDS parameter reference files for the input file in question, configure both the jump and ramp fitting steps to use half of the available CPU cores, and write the results to a file when the pipeline is run on the input file.

Which method of executing the pipeline to use is thus a matter of personal preference. Note that if a pipeline instance is called directly without specifying either .call() or .run(), the .run() method will be used; this usage is deprecated and will be removed in an upcoming build.
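
A minimal sketch of this deprecated usage, shown only so that it can be recognized in older examples:

from jwst.pipeline import Detector1Pipeline

detector1 = Detector1Pipeline()
result = detector1('jw01523003001_03102_00001_mirifulong_uncal.fits') # Calling the instance directly invokes .run(); deprecated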

More information on this method can be found in the ReadTheDocs .run() pages.



Running from Jupyter notebooks

Multiple notebooks are available to illustrate how to run the JWST pipeline in a Jupyter notebook environment. See JWST Pipeline Notebooks.



Saving intermediate data products

By default, each stage of the pipeline passes data between individual steps in memory rather than writing intermediate files to disk. However, individual steps can also be configured to write their results to disk. For instance, to have the jump and ramp_fit steps of calwebb_detector1 write out their results using the .run() method, you could write:

from jwst.pipeline import Detector1Pipeline
inpfile = 'jw01523003001_03102_00001_mirifulong_uncal.fits'
crds_config = Detector1Pipeline.get_config_from_reference(inpfile) 
detector1 = Detector1Pipeline.from_config_section(crds_config)

detector1.jump.save_results = True # Save intermediate results to disk
detector1.ramp_fit.save_results = True # Save intermediate results to disk

detector1.save_results = True # Save final results to disk
result = detector1.run(inpfile)
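
The same intermediate products can be requested from the command line; a sketch using the strun parameter syntax shown above:

strun calwebb_detector1 jw01523003001_03102_00001_mirifulong_uncal.fits --steps.jump.save_results=true --steps.ramp_fit.save_results=true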



Full example for overriding reference files and setting optional parameters

When running the pipeline from within a Python environment, it can be particularly powerful to set up custom functions that contain a variety of optional switches, reference file overrides, etc. This example demonstrates a .run() implementation of a Python function to run the calwebb_spec3 pipeline on MIRI MRS IFU data, listing a variety of the available optional parameters and illustrating how to override various reference files (see ReadTheDocs: Reference File Types for a full list of reference files associated with each pipeline step). Many of these custom parameters are commented out, leaving the user to decide whether or not to change them. This particular example is drawn from the curated MIRI MRS notebook available on the JWST Pipeline Notebooks page.

from jwst.pipeline import Spec3Pipeline
spec3_dir = 'mydirectory/' # Custom output directory

# Define a function that will call the spec3 pipeline with our desired set of parameters
# This is designed to run on an association file
def runspec3(filename):
    # This initial setup is just to make sure that we get the latest parameter reference files
    # pulled in for our files.
    crds_config = Spec3Pipeline.get_config_from_reference(filename)
    spec3 = Spec3Pipeline.from_config_section(crds_config)
    
    spec3.output_dir = spec3_dir # Set custom output directory
    spec3.save_results = True # Save final results to disk

    # Overrides for whether or not certain steps should be skipped
    #spec3.assign_mtwcs.skip = False
    #spec3.master_background.skip = True
    spec3.outlier_detection.skip = False
    spec3.outlier_detection.kernel_size = '11 1'
    spec3.outlier_detection.threshold_percent = 99.5
    #spec3.mrs_imatch.skip = True
    #spec3.cube_build.skip = True
    #spec3.extract_1d.skip = False
    spec3.spectral_leak.skip = False # Apply the MRS spectral leak correction
    
    # Cube building configuration options
    # spec3.cube_build.output_file = 'mycube' # Custom output name
    spec3.cube_build.output_type = 'band' # 'band', 'channel', or 'multi' type cube output
    # spec3.cube_build.channel = '1' # Build everything from just channel 1 into a single cube (we could also choose '2','3','4', or 'ALL')
    # spec3.cube_build.weighting = 'drizzle' # 'emsm' or 'drizzle'
    # spec3.cube_build.coord_system = 'ifualign' # 'ifualign', 'skyalign', or 'internal_cal'
    # spec3.cube_build.scale1 = 0.5 # Output cube spaxel scale (arcsec) in dimension 1 if setting it by hand
    # spec3.cube_build.scale2 = 0.5 # Output cube spaxel scale (arcsec) in dimension 2 if setting it by hand
    # spec3.cube_build.scalew = 0.002 # Output cube spaxel size (microns) in dimension 3 if setting it by hand
    
    # Cubepar overrides
    #spec3.cube_build.override_cubepar = 'myfile.fits' # Override the cube_build step to use a custom cubepar reference file
        
    # Extract1D overrides and config options
    #spec3.extract_1d.override_extract1d = 'myfile.asdf' # Override the 1d extraction step to use a custom extract1d reference file
    #spec3.extract_1d.override_apcorr = 'myfile.asdf' # Override the 1d extraction step to use a custom aperture correction reference file
    spec3.extract_1d.ifu_autocen = True # Enable auto-centering of the extraction aperture
    #spec3.extract_1d.center_xy=(20,20) # Override aperture location if desired
    spec3.extract_1d.ifu_rfcorr = True # Turn on 1d residual fringe correction

    spec3.run(filename) # Run the pipeline with the parameters configured above

myasn = 'l3asn.json' # Define input association file
runspec3(myasn) # Run the custom pipeline function on the association file
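
If multiple associations need processing, the same function can be applied in a loop; a minimal sketch assuming the association files share a common naming pattern:

import glob

# Hypothetical batch usage: run the custom function on every association file in the working directory
for asnfile in sorted(glob.glob('*asn.json')):
    runspec3(asnfile)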



