STScI Reprocessing of JWST Data

JWST data stored in the Mikulski Archive for Space Telescopes (MAST) is reprocessed after new software is implemented or reference data becomes available. This article provides information about when, how, and why JWST data is reprocessed and re-archived in MAST.

Reprocessing triggers

MAST documentation outside JDox: Updates to JWST Data

When observations are received at STScI, they go through a series of processes: the telemetry is converted into readable files, the exposures are calibrated, and the products are archived in MAST. The generated data products, however, can go through one or more of these processes again when the JWST Operational Pipeline subsystems, or the reference data they use, change. The main changes that can trigger reprocessing for a given program are described in the following sections.

Delivery of new calibration reference files

Improved or updated calibration reference files are frequently delivered to the Calibration Reference Data System (CRDS). Only programs with data that are affected by these calibration changes are reprocessed to update the science products in MAST. Observers can subscribe to mailing lists to be notified about new reference files.

Operational pipeline updates

The quality of the science products can also be improved in other ways: for example, when more accurate engineering data become available, when new or better calibration algorithms are implemented, or when bugs in the science calibration pipeline are fixed. When these changes are pushed to the operational pipeline, programs whose instruments and observing modes are affected by the changes are reprocessed to take advantage of the improvements.
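
Both triggers reduce to the same selection question: which programs have data in a mode touched by a change? A minimal Python sketch of that step follows. It is an illustration under assumptions, not the operational implementation: the `Observation` and `Change` records and the rule of matching on (instrument, observing mode) pairs are hypothetical simplifications of how affected programs might be identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical archive record for one observation in a program."""
    program_id: str
    instrument: str
    mode: str


@dataclass(frozen=True)
class Change:
    """Hypothetical description of a reference file delivery or pipeline build.

    affected_modes holds the (instrument, mode) pairs the change applies to.
    """
    kind: str  # "reference_file" or "pipeline_build"
    affected_modes: frozenset


def programs_to_reprocess(observations, changes):
    """Return sorted program IDs that have at least one observation in an affected mode."""
    affected = set()
    for change in changes:
        affected |= change.affected_modes
    return sorted({obs.program_id for obs in observations
                   if (obs.instrument, obs.mode) in affected})


# Hypothetical programs and changes.
observations = [
    Observation("1234", "NIRCAM", "IMAGING"),
    Observation("1234", "NIRSPEC", "MOS"),
    Observation("2345", "MIRI", "IMAGING"),
    Observation("3456", "NIRISS", "SOSS"),
]
changes = [
    Change("reference_file", frozenset({("MIRI", "IMAGING")})),
    Change("pipeline_build", frozenset({("NIRSPEC", "MOS")})),
]
print(programs_to_reprocess(observations, changes))  # ['1234', '2345']
```

Program 3456 is left untouched because neither change applies to its mode, which mirrors the statement that only affected programs are reprocessed.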


Reprocessing schedule

MAST documentation outside JDox: Program Subscriptions and Notifications

To make reprocessing more efficient and less confusing for the community, the pipeline team implemented an orderly reprocessing schedule. The rationale follows from how reference files are produced. Instrument teams are responsible for analyzing calibration data and creating improved reference files, and each file is delivered to CRDS as soon as it is ready for use in the operational pipeline. If a particular mode received several reference files within a short period, reprocessing that mode every time a new file arrived would be an inefficient way to recalibrate the data. To avoid this, the operations team built a waiting period into the schedule, so that several reference files can be delivered and then applied together in a single pipeline run. As a result, reprocessing is scheduled twice a month: on the first business day of the month and again in the middle of the month.
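
The twice-monthly schedule and its batching effect can be sketched in Python. This is a hypothetical illustration, not the scheduler STScI uses. It assumes that a business day is Monday through Friday with holidays ignored, that "the middle of the month" means the first business day on or after the 15th, and that a delivery is picked up by the first scheduled date on or after the day it arrives. The function names are invented for this example.

```python
from datetime import date, timedelta


def next_business_day(day):
    """Return day if it falls Monday-Friday, else the following Monday.

    Assumption: holidays are ignored; only weekends are skipped.
    """
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def reprocessing_dates(year, month, mid_day=15):
    """Return the two nominal reprocessing dates for a month.

    Assumption: "middle of the month" is the first business day on or
    after mid_day (a hypothetical parameter, default 15).
    """
    first = next_business_day(date(year, month, 1))
    middle = next_business_day(date(year, month, mid_day))
    return first, middle


def next_reprocessing_date(delivered):
    """Return the first nominal reprocessing date on or after a delivery date."""
    year, month = delivered.year, delivered.month
    for _ in range(2):  # check the delivery month, then the following month
        for slot in reprocessing_dates(year, month):
            if slot >= delivered:
                return slot
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    raise AssertionError("unreachable: next month's first slot always qualifies")


# Three hypothetical deliveries in early June 2024 share one run.
deliveries = [date(2024, 6, 4), date(2024, 6, 10), date(2024, 6, 12)]
print(sorted({next_reprocessing_date(d) for d in deliveries}))
# [datetime.date(2024, 6, 17)]
```

Under these assumptions, the three deliveries are all applied in the single mid-June run rather than triggering three separate reprocessing passes, which is the inefficiency the waiting period is meant to avoid.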

Changes to the operational pipeline, which includes the science calibration pipeline, are generally made only every 3 months. The installation of these changes usually coincides with, or falls very close to, one of the scheduled reprocessing dates for reference file deliveries. As a result, reprocessing for both the science calibration pipeline changes and the reference file changes starts soon after the operational build is installed. Observers may elect to receive notifications from MAST when data from their programs are reprocessed. Note that if an observer attempts to download data from MAST that are being reprocessed, an alert pops up asking whether they want to proceed with the download.

Processing levels

Changes to the calibration of the science data often affect only a subset of the data and only specific stages of processing. To make reprocessing faster and more efficient, it starts at the earliest processing stage affected by the changes (for example, stage 1 processing of the raw ramps, or stage 2 calibration of the slope images). As a result, observers might find that different stages of processing for the same data were run with different software versions. However, all data products generated by the first updated stage and every later stage should show the same software version. The software versions used for processing are recorded in the data headers.
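
The stage-selection rule and the expected pattern of software versions can be sketched as follows. This is a simplified illustration under assumptions: it models processing as stages 1 to 3, treats each product's header as a plain mapping, and reads the `CAL_VER` keyword, which JWST products use to record the calibration software version; the version strings and function names are hypothetical. With astropy installed, a real header could be loaded with `astropy.io.fits.getheader(path)`, which returns a mapping-like `Header`.

```python
def stages_to_rerun(affected_stages, all_stages=(1, 2, 3)):
    """Return the stages to rerun: the earliest affected stage and every later one."""
    start = min(affected_stages)
    return [stage for stage in all_stages if stage >= start]


def downstream_versions_consistent(products, first_updated_stage):
    """Check that all products from first_updated_stage onward share one CAL_VER.

    products is a list of (stage, header) pairs; header can be any mapping,
    such as an astropy.io.fits Header.
    """
    versions = {header["CAL_VER"] for stage, header in products
                if stage >= first_updated_stage}
    return len(versions) <= 1


# Hypothetical products: stage 1 was not rerun, stages 2 and 3 were.
products = [
    (1, {"CAL_VER": "1.12.5"}),
    (2, {"CAL_VER": "1.14.0"}),
    (3, {"CAL_VER": "1.14.0"}),
]
print(stages_to_rerun({2, 3}))                      # [2, 3]
print(downstream_versions_consistent(products, 2))  # True
print(downstream_versions_consistent(products, 1))  # False: mixed versions expected here
```

A mixed set of versions across all stages is therefore normal after a partial reprocessing; only a mismatch from the first rerun stage onward would be unexpected.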

Ephemeris information

The Flight Dynamics Facility (FDF) group at Goddard evaluates the ephemeris (the three-dimensional position and velocity as a function of time) for JWST's orbit and releases a weekly update. Each update includes the actual (definitive) ephemeris up to a specific date, and a predicted ephemeris that is used for planning and executing upcoming observations. Both are stored in a database within minutes of FDF delivering the files, and level 1 data processing in the operational pipeline reads from this database. Each subsequent weekly FDF delivery updates the predicted ephemeris values in the database. To populate the ephemeris header keywords in the JWST data products, the JWST Data Management System (DMS) uses the definitive ephemeris values or, if those are not yet available, the predicted values. Most data are initially processed with the predicted ephemeris, so the first time DMS reprocesses data for a program, it starts at level 1a to retrieve the definitive values and include them in the headers. The header keyword EPH_TYPE records whether the Definitive or Predicted ephemeris was used. The ephemeris files (both definitive and predicted) are also ingested into MAST. Because the ephemeris is used for the aberration correction in the WCS parameters of the science headers, all science data types are affected.
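
The ephemeris selection rule, the EPH_TYPE check, and the reason the spacecraft velocity matters can be sketched in Python. This is an illustration under stated assumptions, not DMS code: `definitive_end` is a hypothetical stand-in for the end of definitive coverage in the latest FDF delivery, the function names are invented, and the aberration helper uses the first-order approximation (an apparent shift of up to v/c radians) rather than the correction the pipeline actually applies.

```python
import math
from datetime import datetime

# Speed of light in km/s.
C_KM_S = 299792.458


def choose_ephemeris(obs_time, definitive_end):
    """Return the EPH_TYPE value that would apply to an exposure at obs_time.

    Hypothetical rule: definitive values exist only up to definitive_end;
    later times fall back to the predicted ephemeris.
    """
    return "Definitive" if obs_time <= definitive_end else "Predicted"


def still_predicted(header):
    """True if a product's header records that the predicted ephemeris was used."""
    return str(header.get("EPH_TYPE", "")).strip().lower() == "predicted"


def first_order_aberration_arcsec(speed_km_s):
    """Maximum first-order aberration shift (v/c radians), in arcseconds."""
    return math.degrees(speed_km_s / C_KM_S) * 3600.0


definitive_end = datetime(2024, 6, 10)  # hypothetical end of definitive coverage
print(choose_ephemeris(datetime(2024, 6, 8, 12, 0), definitive_end))  # Definitive
print(choose_ephemeris(datetime(2024, 6, 12, 3, 0), definitive_end))  # Predicted
print(still_predicted({"EPH_TYPE": "Predicted"}))                     # True
print(round(first_order_aberration_arcsec(30.0), 1))                  # 20.6
```

At roughly Earth's orbital speed of about 30 km/s, which JWST shares as it orbits the Sun near L2, the apparent direction of a source can shift by up to about 20.6 arcseconds, far larger than a detector pixel. This only illustrates the size of the effect the velocity drives; it is not the size of the correction applied to any particular product.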

Unprocessed data

Based on the reprocessing schedule, most data in MAST should have been processed with the best available reference data and the latest software. If observers believe that their data were affected by a reference file delivery or a calibration pipeline build but have not been reprocessed within the scheduled time frame, they can submit a ticket to the MAST Archive Help Desk. Alternatively, observers can always install the public version of the science calibration pipeline and run it on their own platform with the most recent reference data.
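
Before submitting a ticket, observers may want to check which reference data context their products were processed with. The sketch below compares the `CRDS_CTX` header keyword, which records the CRDS context (`.pmap`) used for processing, against the context in which a relevant reference file was delivered. It assumes context names of the form `jwst_<number>.pmap` and that a higher number means a newer context. The function names and context numbers are hypothetical, and a product that predates a context is only a candidate for reprocessing, not proof that it was affected.

```python
import re

# Assumed CRDS context naming pattern, e.g. "jwst_1100.pmap".
_PMAP_PATTERN = re.compile(r"jwst_(\d+)\.pmap$")


def context_number(crds_ctx):
    """Return the serial number of a CRDS context name such as 'jwst_1100.pmap'."""
    match = _PMAP_PATTERN.search(crds_ctx.strip())
    if match is None:
        raise ValueError(f"unrecognized CRDS context: {crds_ctx!r}")
    return int(match.group(1))


def predates_context(header, delivery_ctx):
    """True if the product was processed with an older context than delivery_ctx.

    Heuristic only: it assumes context numbers increase over time.
    """
    return context_number(header["CRDS_CTX"]) < context_number(delivery_ctx)


# Hypothetical header values and context numbers.
print(predates_context({"CRDS_CTX": "jwst_1100.pmap"}, "jwst_1140.pmap"))  # True
print(predates_context({"CRDS_CTX": "jwst_1140.pmap"}, "jwst_1140.pmap"))  # False
```

With astropy installed, the header argument could come from `astropy.io.fits.getheader(path)` for a downloaded product.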
