Understanding Data Files

JWST data files have unique names that map to the original proposal, observation, visit, instrument, and detector used. Their particular format depends on the stage of the JWST Data Reduction Pipeline where they were created.


General overview

See also: File Naming Conventions and Data Products, Data Structure, Associations, Data Processing and Calibration Files

JWST data generally shares a basic structure, with slight variations that depend on the observing mode or instrument used. Working with JWST data requires an understanding of JWST FITS files, Advanced Scientific Data Format (ASDF) files, and JSON files. FITS files contain the science data pixels, ASDF files contain the world coordinate system information, and JSON files describe how the science data is associated. JWST data is generated by the Data Management System (DMS) via Science Data Processing (SDP) and the Calibration Pipeline before it is archived in MAST. The recorded science telemetry received by DMS is in the same binary format in which it is stored on the JWST Solid State Recorder (SSR). These files arrive as compressed packets, which SDP reads to extract the science data and the relevant detector and exposure information.

The initial FITS header will contain the keywords required by the FITS standard, along with those required for identification, file naming, data structure definition, and calibration of the science data by the calibration pipeline. The keywords are populated from telemetry packet headers and science image headers; proposal, planning, and scheduling information; spacecraft position; time conversions; pointing information; and select engineering parameters. The corresponding transformation from detector positions to a world coordinate frame (ICRS and wavelength) for the science data is provided via distortion and spectral models stored in ASDF-format extensions of the FITS file.

The JSON file provides the list of data to be associated. JWST data products can be divided into two main types: data products from single exposures, produced during stage 1 and stage 2 of the calibration pipeline, and data products that result from combining these exposures into a single product in stage 3. Within the two categories, there are different ways in which a set or subset of exposures is combined, and each of these corresponds to a unique association. The way exposures are combined is determined by the information from the Astronomer's Proposal Tool (APT).
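As a rough illustration of how an association lists the data to be combined, the sketch below parses a small JSON document and collects the member exposures for each output product. It assumes the general layout used by calibration pipeline association files (an `asn_type`, an `asn_id`, and a `products` list whose entries carry a `name` and a list of `members`). The program number, product name, and exposure file names are hypothetical, and `list_members` is a helper introduced only for this example.

```python
import json

# Hypothetical association: one stage 3 product built from two exposures.
asn_text = """
{
  "asn_type": "image3",
  "asn_id": "a3001",
  "program": "01234",
  "products": [
    {
      "name": "jw01234-a3001_t001_nircam_f200w",
      "members": [
        {"expname": "jw01234001001_01101_00001_nrca1_cal.fits", "exptype": "science"},
        {"expname": "jw01234001001_01101_00002_nrca1_cal.fits", "exptype": "science"}
      ]
    }
  ]
}
"""


def list_members(asn):
    """Return a mapping of product name to its member exposure file names."""
    return {
        product["name"]: [member["expname"] for member in product["members"]]
        for product in asn["products"]
    }


association = json.loads(asn_text)
members = list_members(association)

for name, exposures in members.items():
    print(f"{name}: {len(exposures)} exposures")
    for exposure in exposures:
        print("   ", exposure)
```

Real association files carry additional metadata, so this should be read as a picture of the nesting rather than a complete schema.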



File naming conventions and data products

See also: Data Structure, Associations, Data Processing and Calibration Files

Science data file names

JWST science data files are "exposure-based" during their first stages of processing, meaning they contain only the values from a single exposure for a single detector. "Exposure-based" file names are constructed using information from the observation in the corresponding APT files, according to the stage 0-2 naming schemes. These files are processed and combined with other exposures in stage 3 of the calibration pipeline, where products become "source-based." These are also known as "Associations," and their naming convention includes information such as target and source ID, instrument, and optical element, according to the stage 3 naming schemes.
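To show how an "exposure-based" name maps back to the proposal, observation, visit, and detector, here is a minimal sketch that splits such a name into its fields. It assumes the basic stage 0-2 pattern `jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<prodType>.fits` with an optional `-segNNN` segment marker. The regular expression and the `parse_exposure_name` helper are written for this example only and do not cover every variant of the naming schemes.

```python
import re

# Assumed basic form of a stage 0-2 exposure-based file name.
EXPOSURE_NAME = re.compile(
    r"^jw(?P<program>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})"
    r"_(?P<visit_group>\d{2})(?P<parallel_seq>\d)(?P<activity>[0-9a-z]{2})"
    r"_(?P<exposure>\d{5})(?:-seg(?P<segment>\d{3}))?"
    r"_(?P<detector>[a-z0-9]+)"
    r"_(?P<product_type>[a-z0-9]+)\.fits$"
)


def parse_exposure_name(filename):
    """Return the name fields as a dict, or None if the name does not match."""
    match = EXPOSURE_NAME.match(filename)
    return match.groupdict() if match else None


info = parse_exposure_name("jw93065002001_02101_00001_nrca1_rate.fits")
for field, value in info.items():
    print(f"{field:>13}: {value}")
```

The last field is the product-type suffix that indicates the level of processing, for example `uncal` for uncalibrated data or `rate` for a stage 1 count-rate image.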

Each stage of processing creates corresponding intermediate and final data products depending on the type of data being calibrated, and the level of processing is indicated by the suffix of a file. The data product table contains the full list of all "exposure-based" and "source-based" product types for each stage of processing, including active links to more detailed descriptions of each kind of data product in the input and output columns.

Association file names

JWST association files follow a similar naming convention. Associations capture the relationship between exposures that are to be combined by design to form a single product. Association data products result from the combination of multiple exposures like dithers or mosaics, and are generated by stage 3 of the calibration pipeline. These products will have a file name that includes the astronomical source and observation parameters that will help users map them to the proposed observation. The software documentation contains the latest information on naming conventions for association files for the different association types.



Data structure

See also: File Naming Conventions and Data Products, Associations, Data Processing and Calibration Files

Uncalibrated data (stage 0 data files) and all science data processed in stage 1 and stage 2 of the calibration pipeline are FITS files that contain the pixel values for a single exposure from a single detector. Stage 3 of the calibration pipeline associates and combines a set of these files into a single unit. Any catalogs generated by the calibration pipeline are ASCII ECSV (Extended Comma-Separated Value) files. Figure 1 shows the flow of the data from the telemetry packages to the combined set of data. As data travels through the stages of processing, changes to the structure may occur.

Figure 1. Flow of data from the telemetry packages to high-level data products


Multi-extension FITS format

Flexible Image Transport System (FITS) is a standard format for exchanging astronomical data, independent of the hardware platform and software environment. FITS format files consist of a series of Header Data Units (HDUs), each containing two components: an ASCII text header and binary data. The header contains a series of keywords that describe the data in a particular HDU; the data component may immediately follow the header.

For JWST FITS data, the first HDU, or primary header, only contains header information in the form of keyword records with an empty data array, which is indicated by the occurrence of NAXIS=0 in the primary header. The primary header may be followed by one or more HDUs called extensions, which may take the form of images, binary tables, ASDF files, or ASCII text tables. The data type for each extension is recorded in the XTENSION header keyword.
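The header layout described above can be sketched at the byte level. A FITS header is a sequence of 80-character ASCII keyword records, terminated by an `END` record and padded with spaces to a multiple of 2880 bytes. The example below builds a minimal primary header with `NAXIS = 0` and reads it back. The helper names are hypothetical and the parser is deliberately simplified: it keeps values as strings and does not handle quoted strings that contain a slash or long-string continuation records. In practice a library such as astropy does this work.

```python
CARD_LEN = 80      # length of one keyword record, in characters
BLOCK_LEN = 2880   # FITS files are organized in blocks of this many bytes


def make_card(keyword, value):
    """Format a fixed-format record: keyword in columns 1-8, value right-justified to column 30."""
    return f"{keyword:<8}= {value:>20}".ljust(CARD_LEN)


def build_header(cards):
    """Join the records, add the END record, and pad with spaces to a full block."""
    text = "".join(make_card(k, v) for k, v in cards) + "END".ljust(CARD_LEN)
    padding = -len(text) % BLOCK_LEN
    return (text + " " * padding).encode("ascii")


def parse_header(raw):
    """Read keyword/value pairs (values kept as strings) up to the END record."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Drop any trailing "/ comment" and surrounding blanks.
            header[keyword] = card[10:].split("/", 1)[0].strip()
    return header


# A primary header with no data array, as in JWST products.
primary = build_header([("SIMPLE", "T"), ("BITPIX", 8), ("NAXIS", 0), ("EXTEND", "T")])
header = parse_header(primary)

print(len(primary), "bytes")
print(header)
```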

Header keywords and relationships

Most JWST data products are FITS files. The FITS header keywords in the files contain important information that characterizes the observation, telemetry received during the observations, and information related to post-observation processing of the data. Knowledge of the keywords and their location is an important first step to understanding JWST data. By examining the file header using tools such as astropy, observers can find detailed information about the data, including:

  • Coordinates of the target, program number, and other observation identifiers
  • Date and time of the observation including start, end, and mid-exposure times
  • Exposure parameter information, such as the instrument configuration (DETECTOR, FILTER, SUBARRAY)
  • Readout definition parameters (READPATT, NINTS, NGROUPS, and GROUPGAP)
  • Exposure-specific information, such as detailed timing and world coordinate system information
  • Calibration information, such as the calibration switches and reference files used by the pipeline

Following FITS conventions, each keyword is no longer than eight characters, and its value can be an integer, a real (floating-point) number, or a character string. Several keywords are common to all JWST data, and others are instrument-specific. The JWST Keyword Dictionary contains the complete list of standard JWST header keywords.
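As a sketch of examining headers with astropy, the example below writes a small stand-in file to a temporary directory and reads a few of the keywords named above back from the primary header. It assumes astropy and numpy are installed. The file name and all keyword values are illustrative only, and a real JWST product contains many more keywords and extensions.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# Build a stand-in product: an empty primary HDU plus one image extension.
primary = fits.PrimaryHDU()
primary.header["DETECTOR"] = ("NRCA1", "illustrative value")
primary.header["FILTER"] = ("F200W", "illustrative value")
primary.header["SUBARRAY"] = ("FULL", "illustrative value")
primary.header["READPATT"] = ("RAPID", "illustrative value")
primary.header["NINTS"] = (2, "illustrative value")
primary.header["NGROUPS"] = (5, "illustrative value")
sci = fits.ImageHDU(data=np.zeros((2, 2), dtype="float32"), name="SCI")

keys = ("DETECTOR", "FILTER", "SUBARRAY", "READPATT", "NINTS", "NGROUPS")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_rate.fits")  # hypothetical file name
    fits.HDUList([primary, sci]).writeto(path)

    with fits.open(path) as hdul:
        hdul.info()                                  # one line per HDU
        header = hdul[0].header                      # primary header
        summary = {key: header[key] for key in keys}
        naxis_primary = header["NAXIS"]              # 0: no primary data array
        sci_type = hdul["SCI"].header["XTENSION"]    # extension type

for key, value in summary.items():
    print(f"{key:<8} = {value!r}")
print("Primary NAXIS:", naxis_primary, "| SCI XTENSION:", sci_type)
```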

Header keywords related to a particular topic are kept together logically, such as the program information or target information. The sample data header shows some of the keywords and groupings. These keywords can originate from the Proposal and Planning System (PPS), Observatory Status File (OSF), Science and Operations Center Project Reference Database (PRD), Science Data Processing (SDP), or Calibration Software. The JWST Keyword Dictionary provides information about the source of keywords, the FITS header extension where they can be found, and valid values.

Uncalibrated data

After telemetry conversion and coordinate transformation, Science Data Processing (SDP) will generate uncalibrated (stage 0) FITS files with the pixel data organized by detector and exposure. SDP will format the science data into a 4D data array, with NAXIS1 = column, NAXIS2 = row, NAXIS3 = NGROUPS, NAXIS4 = NINTS.

Figure 2. Science exposure data cube


The first two dimensions are the 2D science images, the dimension of the third axis is determined by the number of groups in the exposure, and the fourth axis corresponds to the number of integrations. All four dimensions will be used, even if NGROUPS = 1 or NINTS = 1. Each integration within an exposure must have the same number of groups. The standard readout sampling for all JWST detectors is up-the-ramp readout (sometimes referred to as MULTIACCUM), meaning pixel values increase between groups within an integration. The first frame of an integration may be read out separately, even if that frame is also averaged into the first group of the integration. A zero frame readout is identified with Group ID = 0. If there is a zero frame readout, there will be one for each integration. The zero frame will be stored in the data cube as the first image in each integration.
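The axis ordering can be sketched as follows. FITS lists the fastest-varying axis first (NAXIS1), whereas row-major array libraries such as NumPy list it last, so a stage 0 array read into Python is indexed as (integration, group, row, column). The header values, the helper names, and the idealized noise-free ramp below are assumptions made for illustration.

```python
def array_shape(header):
    """Row-major array shape implied by the NAXISn keywords (reverse of FITS order)."""
    return tuple(header[f"NAXIS{n}"] for n in range(header["NAXIS"], 0, -1))


def ideal_ramp(ngroups, rate, bias=0.0):
    """Idealized pixel value in each group: a bias level plus linearly accumulating signal."""
    return [bias + rate * (group + 1) for group in range(ngroups)]


# Illustrative header: full-frame image, 5 groups per integration, 2 integrations.
header = {"NAXIS": 4, "NAXIS1": 2048, "NAXIS2": 2048, "NAXIS3": 5, "NAXIS4": 2}

shape = array_shape(header)
nints, ngroups, nrows, ncols = shape
print("array shape (NINTS, NGROUPS, rows, columns):", shape)

# One pixel sampled up the ramp within a single integration.
ramp = ideal_ramp(ngroups, rate=12.5, bias=100.0)
is_up_the_ramp = all(later > earlier for earlier, later in zip(ramp, ramp[1:]))
print("ramp:", ramp, "| increasing:", is_up_the_ramp)
```

Real ramps include noise, so individual pixels need not increase strictly from one group to the next.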

The FITS header of stage 0 files will have keywords required by the FITS standard and those extracted from the telemetry packet headers and science image headers. The second phase of stage 0 of the processing of these files will add additional keywords to the files. Figure 3 shows a schematic representation of how the data from many sources is used by the Science Data Processing system to populate the FITS headers of the science data. After the completion of science data processing, the stage 0 data file will be ready for calibration.

Figure 3. Source and usage of information by the SDP system to populate the headers of science data


Calibrated data

All JWST FITS data products have a few common features in their structure and organization:

  1. The FITS primary Header Data Unit (HDU) only contains header information, in the form of keyword records, with an empty data array, which is indicated by the occurrence of NAXIS=0 in the primary header. Metadata that pertains to the entire product is stored in keywords in the primary header. Metadata related to specific extensions in the data products is stored in keywords in the headers of each extension.

  2. All data related to the product are contained in one or more FITS IMAGE or BINTABLE extensions. The header of each extension may contain keywords that pertain uniquely to that extension.

The default and optional output files for each stage of processing (stages 1 - 3) are listed in the table of data products. The number and type of extensions depend on the data product, as shown in the Science Product tables. The Science Product tables also provide more details about the structure and type of information contained in the extensions of the data.

ASDF format

The Advanced Scientific Data Format (ASDF) is a next-generation format that combines a human-readable, hierarchical metadata structure, made up of basic dynamic data types such as strings, numbers, lists, and mappings, with data saved as binary arrays. It is primarily intended as an interchange format for delivering products from instruments to scientists or between scientists, or, for example, between stages of the calibration pipeline. ASDF files are added to certain JWST calibration pipeline products and are part of the calibration pipeline reference data used by the software. As an example, the distortion and spectral models needed to transform detector positions to a world coordinate frame are in ASDF format.



References

Pence, W. D., et al. 2010, A&A, 524, A42
Definition of the Flexible Image Transport System (FITS), version 3.0

Swade, D., 2014, JWST-STScI-004078
Design of Imaging Associations

Swade, D., Greenfield, P., Jedrzejewski, R., and Valenti, J., 2010, JWST-STScI-002111
DMS Level 1 and 2 Data Product Design


