JWST Data Structures

JWST data products generally share a basic structure, with slight variations that depend on the observing mode and instrument used. Their particular format depends on the stage of the JWST Science Calibration Pipeline in which they were created.

See also: Data File Naming Conventions, Associations, JWST Data Calibration Considerations

Knowing how the science data are constructed is an important part of understanding JWST science data products. Once the telemetry for the science and engineering data is received, the science files are constructed and header keywords are populated. Header keywords provide relevant information about the observations and are populated with information from different operational subsystems. The science data are then processed by the JWST Science Calibration Pipeline. Both uncalibrated and calibrated JWST data are archived in the Mikulski Archive for Space Telescopes (MAST). All these operations run automatically within different subsystems of the JWST Operations Pipeline.

Making the science data files

Once JWST data are obtained, the telemetry data are processed to extract the science data and relevant detector and exposure information. The Science Data Processing (SDP) subsystem transforms the science data from detector to science coordinates and generates uncalibrated files with the pixel data organized by detector and exposure. SDP formats the science data into a 4-D data array, with keywords NAXIS1 = number of columns, NAXIS2 = number of rows, NAXIS3 = Ngroup, and NAXIS4 = Nint. See Understanding Exposure Times for the meaning of these parameters.

Figure 1. Science exposure data cube

The first 2 dimensions hold the 2-D science images. The length of the 3rd axis is set by the number of groups in each integration, and the length of the 4th axis by the number of integrations. All 4 dimensions are used, even if Ngroup = 1 or Nint = 1. Each integration within an exposure must have the same number of groups. The standard readout sampling for all JWST detectors is up-the-ramp readout (sometimes referred to as MULTIACCUM), meaning that pixel values increase from group to group within an integration.
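The array layout described above can be sketched with a small synthetic example. This is only an illustration: the dimensions and count rate are made-up values, and no real JWST data are read. The sketch assumes the usual NumPy convention, in which the axis order is the reverse of the FITS NAXISn order, so the array shape is (Nint, Ngroup, rows, columns).

```python
import numpy as np

# Hypothetical exposure dimensions, chosen only for illustration.
ncols, nrows, ngroups, nints = 4, 3, 5, 2

# FITS lists axes fastest-first (NAXIS1 = columns ... NAXIS4 = integrations);
# NumPy lists them slowest-first, so the shape is reversed.
shape = (nints, ngroups, nrows, ncols)

# Synthetic up-the-ramp signal: a constant, arbitrary count rate
# accumulating from one group to the next within each integration.
rate = 10.0
ramp = rate * np.arange(1, ngroups + 1, dtype=float)  # shape (ngroups,)
data = np.broadcast_to(ramp[None, :, None, None], shape).copy()

# Differences between consecutive groups along the group axis (axis 1).
group_diffs = np.diff(data, axis=1)

print(data.shape)                     # (2, 5, 3, 4)
print(bool(np.all(group_diffs > 0)))  # True: values increase between groups
```

Even with ngroups = 1 or nints = 1, the array in this sketch keeps all 4 axes, matching the statement that all 4 dimensions are always used.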

Populating the header keywords

The FITS headers of all JWST data contain keywords required by the FITS standard and keywords relevant to the observation. Knowledge of these keywords is an important first step in understanding JWST data. By examining the file header using tools such as Astropy, observers can find detailed information about the data, including:

  • Coordinates of the target, program number, and other observation identifiers
  • Date and time of the observation including start, end, and mid-exposure times
  • Exposure parameter information, such as the instrument configuration (DETECTOR, FILTER, SUBARRAY)
  • Readout definition parameters (READPATT, NINTS, NGROUPS, and GROUPGAP)
  • Exposure-specific information, such as detailed timing and world coordinate system information
  • Calibration information, such as the calibration switches and reference files used by the pipeline

Header keywords related to a particular topic are kept together logically, such as the program information or target information. This sample data header shows some of the keywords and groupings. The full sample of schematic headers for all the JWST modes can also be found in MAST. The JWST Keyword Dictionary in the MAST documentation contains the complete list of standard JWST header keywords, the FITS header extension where they can be found, where the information comes from, and their valid values. 
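As a hedged illustration of examining a header with Astropy, the sketch below builds a tiny synthetic primary header in memory and reads back a few of the keywords named above. The keyword values are placeholders, not taken from a real observation, and the in-memory buffer stands in for a real file path.

```python
import io
from astropy.io import fits

# Build a small synthetic primary HDU so the example runs without a JWST file.
# All values below are placeholders chosen for illustration.
primary = fits.PrimaryHDU()
primary.header["DETECTOR"] = "NRCA1"
primary.header["FILTER"] = "F200W"
primary.header["SUBARRAY"] = "FULL"
primary.header["READPATT"] = "RAPID"
primary.header["NINTS"] = 2
primary.header["NGROUPS"] = 5

# Write to an in-memory buffer instead of disk, then reopen it.
buffer = io.BytesIO()
primary.writeto(buffer)
buffer.seek(0)

wanted = ("DETECTOR", "FILTER", "SUBARRAY", "READPATT", "NINTS", "NGROUPS")
with fits.open(buffer) as hdul:
    header = hdul[0].header  # primary header
    summary = {key: header[key] for key in wanted}

for key, value in summary.items():
    print(f"{key:8s} = {value}")
```

With a downloaded product, the path to the FITS file would be passed to fits.open in place of the buffer.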

Figure 2 shows a schematic representation of how data from many sources are used by the Science Data Processing (SDP) subsystem to populate the uncalibrated (stage 0) FITS files of the science data. These files contain keywords required by the FITS standard and keywords relevant to the observation that are extracted from the telemetry packet headers and science image header. At this stage, the science data are also transformed from detector positions to a world coordinate frame (ICRS and wavelength) via distortion and spectral models. After the data go through SDP, the stage 0 data file is ready for calibration.

Figure 2. Source and usage of information by the SDP system to populate the headers of science data

Following FITS conventions, each keyword name is no longer than 8 characters, and its value can be an integer, a real (floating-point) number, or a character string. Several keywords are common to all JWST data, and others are instrument-specific.
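The keyword constraints above can be expressed as a short check. This is a minimal sketch using only the Python standard library. The allowed character set (uppercase letters, digits, hyphen, underscore) and the logical value type come from the FITS standard rather than from this page, the function names are hypothetical, and the example values are placeholders.

```python
import re

# Keyword names: 1 to 8 characters from A-Z, 0-9, hyphen, and underscore
# (character set per the FITS standard, not stated in the text above).
KEYWORD_PATTERN = re.compile(r"[A-Z0-9_-]{1,8}")


def is_valid_keyword(name):
    """Return True if the name fits the FITS keyword-name constraints."""
    return KEYWORD_PATTERN.fullmatch(name) is not None


def value_kind(value):
    """Classify a Python value by the FITS value type it would map to."""
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "logical"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "real"
    if isinstance(value, str):
        return "string"
    return "other"


# Placeholder values for illustration only.
for name, value in [("NGROUPS", 5), ("EFFEXPTM", 53.68), ("READPATT", "RAPID")]:
    print(name, is_valid_keyword(name), value_kind(value))
```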

JWST Operations Pipeline data flow

The uncalibrated JWST data are then processed through the different stages of the science calibration pipeline and archived in MAST.

Figure 3 shows the flow of the data through the JWST Operations Pipeline, from the original telemetry packages to the combined set of data. Uncalibrated data (stage 0 data files) and all science data processed in stage 1 and stage 2 of the JWST Science Calibration Pipeline are FITS files that contain the pixel values for a single exposure from a single detector. Stage 3 of that pipeline associates and combines a set of these files into a single integrated product. A JSON association file provides the list of data to be associated into a single stage 3 product. Any catalogs generated by the science calibration pipeline are ECSV files. As data travel through the stages of processing, their structure may change.
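To make the role of the JSON association file more concrete, the sketch below parses a hand-written, heavily simplified association with the Python standard library. The key names are assumed from typical pipeline associations and should be checked against the Associations documentation. The identifiers and file names are hypothetical placeholders.

```python
import json

# Hand-written, simplified association. Key names are an assumption about
# the typical layout; all names and identifiers are hypothetical.
asn_text = """
{
  "asn_type": "image3",
  "asn_id": "a3001",
  "products": [
    {
      "name": "example_combined_product",
      "members": [
        {"expname": "exposure_1_cal.fits", "exptype": "science"},
        {"expname": "exposure_2_cal.fits", "exptype": "science"}
      ]
    }
  ]
}
"""

asn = json.loads(asn_text)

# Map each stage 3 product name to the science exposures combined into it.
science_members = {
    product["name"]: [
        member["expname"]
        for member in product["members"]
        if member["exptype"] == "science"
    ]
    for product in asn["products"]
}

for name, files in science_members.items():
    print(name, "<-", files)
```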

Figure 3. Flow of data from the telemetry packages to high-level data products

Calibrated data products

All JWST FITS calibrated science data products have a few common features in their structure and organization:

  1. The FITS primary header-data unit (HDU) contains only header information, in the form of keyword records, with an empty data array, which is indicated by NAXIS = 0 in the primary header. Metadata that pertains to the entire product is stored in keywords in the primary header. Metadata related to specific extensions in the data products is stored in keywords in the headers of each extension.

  2. All data related to the product are contained in one or more FITS image ("IMAGE") or binary table ("BINTABLE") extensions. The header of each extension may contain keywords that pertain uniquely to that extension.

The default and optional output files for each stage of processing (stages 1–3) are listed in the table of data products. The number and type of extensions depend on the data product, as shown in these science product tables, which also describe the format, structure, and content of each extension.
