JWST Data Volume and Data Excess
Downlink and onboard data storage capacity are limited resources that must be managed. Observers can reduce data volume and data excess by using readout patterns that combine more frames or by allowing interruptions between visits. Visits that generate data volume significantly above the nominal allocation will be more difficult to schedule, potentially delaying execution.
On average, JWST downlinks data to Earth every 12 hours, with each contact or "downlink" nominally lasting 4 hours. JWST uses a Solid-State Recorder (SSR) to stage data for downlink. The SSR can store approximately 65 GB of science data. A sequence of scheduled visits generally will not fill the SSR, even if one downlink contact is missed.
Factors that affect downlink and SSR usage
Data volume for a visit is the sum of data volume for each exposure in the visit plus a small contribution from the guider.
Data volume for an exposure depends mainly on number of detectors, number of outputs per detector, readout pattern, number of groups per integration, and number of integrations. Number of detectors includes both prime and parallel instruments. Number of outputs per detector is 1, 4, or 5, depending on data mode. Readout patterns reduce data volume by combining multiple frames into a single downlinked group and by dropping frames between groups in an integration. Subarray size has a relatively small effect on data volume because each detector output generates a sample every 10 µs regardless of subarray size, except for small dead time between rows and between frames.
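As a rough illustration of how these factors combine, the sketch below estimates the data volume of a full-frame exposure. It assumes 2 bytes per pixel (which matches the ~8.4 MB per 2048 × 2048 group quoted later in this article) and ignores the guider contribution, reference data, and all overheads. The function name and example values are hypothetical. APT reports the authoritative number.

```python
# Assumption: each pixel in a downlinked group occupies 2 bytes (16-bit samples).
BYTES_PER_PIXEL = 2


def exposure_volume_mb(n_detectors, n_groups, n_integrations, rows=2048, cols=2048):
    """Approximate data volume (MB, 1 MB = 1e6 bytes) of one exposure.

    Counts only the groups written to the SSR; overheads are neglected.
    """
    group_bytes = rows * cols * BYTES_PER_PIXEL
    return n_detectors * n_groups * n_integrations * group_bytes / 1e6


# Hypothetical exposure: 10 detectors, 5 groups per integration, 3 integrations.
print(f"{exposure_volume_mb(10, 5, 3):.1f} MB")
```

Because the readout pattern sets how many frames are combined or dropped per group, it changes how long each group takes to acquire rather than how many bytes each group occupies.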
Sustainable downlink rate is a statistical estimate of the average downlink rate achievable over a cycle, accounting for the fraction of time JWST downlinks data (about 30%), data overheads associated with the downlink protocol, and occasional missed contacts. The nominal data allocation for a visit is the sustainable downlink rate (0.87 MB/s) times visit duration, where visit duration is slew time plus scheduling duration. Data excess for a visit is data volume minus the nominal data allocation. Data excess is negative when data volume is less than the nominal allocation, which is good from a resource management perspective.
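The definitions above reduce to two lines of arithmetic. The following minimal sketch uses the sustainable downlink rate quoted in the text. The function names and the example visit are hypothetical.

```python
SUSTAINABLE_RATE_MB_S = 0.87  # sustainable downlink rate quoted in the text


def nominal_allocation_mb(slew_s, scheduling_s):
    """Nominal data allocation: sustainable rate times visit duration."""
    visit_duration_s = slew_s + scheduling_s
    return SUSTAINABLE_RATE_MB_S * visit_duration_s


def data_excess_mb(data_volume_mb, slew_s, scheduling_s):
    """Data excess: data volume minus the nominal allocation (may be negative)."""
    return data_volume_mb - nominal_allocation_mb(slew_s, scheduling_s)


# Hypothetical visit: 1,800 s slew, 9,000 s scheduling duration, 20,000 MB of data.
print(f"{data_excess_mb(20_000, 1_800, 9_000):.0f} MB")
```

A visit with the same duration but only 5,000 MB of data would have a negative data excess, which helps drain the SSR.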
The ensemble of all JWST visits generates data at a rate slightly below the sustainable downlink rate. If a visit generates a substantial data excess on the SSR, then it must be followed by visits with negative data excess to give downlinks an opportunity to empty the SSR. If visits with a data deficit cannot schedule after the visit with a substantial excess, then the visit with an excess will not be scheduled at that time. Increasing data excess in one visit for incremental science gain may negatively impact science yield in the ensemble of all JWST visits. For this reason, observers must adequately justify data excess above a certain threshold during the program review process.
Non-interruptible set of visits
See also: APT Special Requirements
In APT, users may add the NON-INTERRUPTIBLE constraint to a SEQUENCE OBSERVATIONS¹, GROUP OBSERVATIONS, or GROUP VISITS special requirement, which forces constituent visits to execute without interruption. In all of these cases, data excess accumulates as each visit in the set executes. APT calculates the sum of data excesses for visits that must execute without interruption and compares that sum with the same thresholds that apply to individual visits.
¹ Bold italics style indicates words that are also parameters or buttons in software tools (like the APT and ETC). Similarly, a bold style represents menu items and panels.
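The summation for a non-interruptible set can be sketched as follows. The per-visit values are invented for illustration, and the 15,000 MB middle threshold is the one given in this article. This is not APT's actual implementation.

```python
from itertools import accumulate

MIDDLE_THRESHOLD_MB = 15_000  # middle threshold stated in this article

# Hypothetical data excess (MB) for three visits that must run back to back.
visit_excess_mb = [9_000.0, 8_500.0, -1_200.0]

# Running total shows how excess builds up on the SSR as the set executes.
running_total_mb = list(accumulate(visit_excess_mb))

# The sum over the set is the value compared with the thresholds.
set_excess_mb = sum(visit_excess_mb)

print(running_total_mb)
print(set_excess_mb > MIDDLE_THRESHOLD_MB)
```

In this example every visit is individually below the middle threshold, yet the set as a whole exceeds it.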
As of APT 2021.2, APT reports data volume and data excess for each visit. To examine an individual visit, select Form Editor in the top tool bar, open the parent observation in the tree editor (left sidebar; click the triangle to expand each container), and select the visit of interest. To examine all visits, select Spreadsheet Editor in the top tool bar, select Observations in the tree editor, and choose Visit in the Show dropdown menu at the top of the spreadsheet editor. See the APT GUI Overview video tutorial for a general introduction to components of the APT user interface.
As of APT 2021.2, APT issues a warning or error if data excess exceeds the lower, middle, or upper threshold, as illustrated in Figure 1. Table 1 summarizes APT behavior and potential user actions. User action is required if data excess is above the middle threshold (15,000 MB). The user must either reduce data excess or justify (during the review of accepted programs) why a large data excess is required to achieve science goals of the program. Small improvements in efficiency or S/N are not sufficient to justify data excess above the middle threshold. Below the middle threshold, users are not required to reduce data excess, but doing so may enable a visit to schedule earlier. This is true even for visits with data excess below the lower threshold (5,000 MB).
Table 1. APT diagnostics and user actions depending on value of data excess
|Data excess||APT diagnostic||User action|
|< 5,000 MB||None||Consider reducing data excess to facilitate scheduling|
|5,000–15,000 MB||Warning: Data Excess over lower threshold||Consider reducing data excess to facilitate scheduling|
|15,000–30,000 MB||Warning: Data Excess over middle threshold||Reduce data excess or justify excess scientifically|
|> 30,000 MB||Error: Data Excess over upper threshold||Reduce data excess or contact Help Desk|
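The mapping in Table 1 can be expressed as a small lookup. The sketch below is illustrative only. The function name is hypothetical, and how APT treats a value exactly equal to a threshold is an assumption (here it falls into the lower row).

```python
# Thresholds (MB) from Table 1.
LOWER_MB = 5_000
MIDDLE_MB = 15_000
UPPER_MB = 30_000


def apt_diagnostic(data_excess_mb):
    """Return (diagnostic, suggested user action) for a data excess in MB."""
    # Assumption: a value exactly on a threshold is treated as not exceeding it.
    if data_excess_mb > UPPER_MB:
        return ("Error: Data Excess over upper threshold",
                "Reduce data excess or contact Help Desk")
    if data_excess_mb > MIDDLE_MB:
        return ("Warning: Data Excess over middle threshold",
                "Reduce data excess or justify excess scientifically")
    if data_excess_mb > LOWER_MB:
        return ("Warning: Data Excess over lower threshold",
                "Consider reducing data excess to facilitate scheduling")
    return ("None", "Consider reducing data excess to facilitate scheduling")


for excess in (-2_000, 10_604, 16_300, 42_000):
    print(excess, apt_diagnostic(excess)[0])
```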
Dependency on smart accounting
For programs near a data excess threshold, the APT diagnostic may become more severe when Smart Accounting is run. The diagnostic may then revert if a program change invalidates the previous Smart Accounting analysis. Smart Accounting may reduce slew time and hence total visit duration. A shorter duration means a smaller nominal data allocation, but data volume remains the same. Thus, data excess increases when Smart Accounting reduces slew time. Make sure Smart Accounting is up to date when assessing whether all data excess diagnostics have been addressed.
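A worked example makes the effect concrete. The visit below is hypothetical. Only the 0.87 MB/s rate and the 15,000 MB middle threshold come from this article.

```python
SUSTAINABLE_RATE_MB_S = 0.87
MIDDLE_THRESHOLD_MB = 15_000


def data_excess_mb(data_volume_mb, slew_s, scheduling_s):
    """Data volume minus the nominal allocation for the visit."""
    return data_volume_mb - SUSTAINABLE_RATE_MB_S * (slew_s + scheduling_s)


# Hypothetical visit: 45,000 MB of data, 33,000 s scheduling duration.
# Assume Smart Accounting trims the charged slew from 1,800 s to 300 s.
before_mb = data_excess_mb(45_000, slew_s=1_800, scheduling_s=33_000)
after_mb = data_excess_mb(45_000, slew_s=300, scheduling_s=33_000)

print(f"before: {before_mb:.0f} MB, over middle: {before_mb > MIDDLE_THRESHOLD_MB}")
print(f"after:  {after_mb:.0f} MB, over middle: {after_mb > MIDDLE_THRESHOLD_MB}")
```

The data volume is unchanged, but the 1,500 s reduction in slew removes about 1,305 MB of allocation, which is enough to push this visit over the middle threshold.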
How to reduce data excess
Increasing group cadence
See also: Understanding Exposure Times
For individual visits, a common way to reduce data excess is to increase the group cadence (the time between successive groups in an integration) by switching to a detector readout pattern that combines and/or drops more frames. Only groups are written to the SSR and downlinked to the ground, so a longer group cadence decreases SSR and downlink usage. Each instrument has its own set of readout patterns (MIRI, NIRCam, NIRISS, NIRSpec).
Increasing group cadence exacerbates saturation issues, reducing the brightness threshold where saturation occurs in the second group. For NIRCam only, frame 0 may provide flux information for pixels that saturate in the second group. If saturation is not a concern and an exposure has multiple integrations, then reducing the number of integrations while keeping the number of groups constant can reduce data excess with minimal impact on science. In rare cases, using a smaller subarray while keeping the number of groups constant can reduce data excess with minimal impact on science.
Sometimes decreasing the number of groups per integration is necessary to reduce data excess, even though S/N is impacted. However, avoid reducing the number of groups below 2 for NIR instruments or below 5 for MIRI because calibration accuracy may be degraded. Fewer groups yield lower S/N because each cosmic ray affects a larger fraction of the exposure time and detector read noise is slightly higher. Use the ETC to assess S/N impact. Despite the S/N impact, if data excess is above the middle threshold, the observer must reduce data excess, unless doing so would compromise the primary science goal of a program. Exceptions must be justified by the observer and approved by all instrument scientists reviewing the program.
Executing visits independently
For a non-interruptible set of visits, another way to reduce data excess is to remove the constraint that visits must execute without interruption. In APT, remove the NON-INTERRUPTIBLE constraint from special requirements for relevant observations.
The NON-INTERRUPTIBLE constraint is intended to address temporal variability in targets or observatory performance. The constraint is appropriate when continuous monitoring spans multiple visits. The constraint may be appropriate when multiple visits require the target and/or observatory to be in a consistent state. Compare the timescale of the visits with the expected timescale of variability. If the timescales are comparable, the constraint is appropriate. If the timescale of the expected variability is substantially longer, use the WITHIN constraint instead of the NON-INTERRUPTIBLE constraint. This gives schedulers flexibility to interleave visits with a data deficit. If the WITHIN time interval is too short, then the linked visits can only schedule without interruption, and policies for NON-INTERRUPTIBLE apply.
Improving efficiency is not a valid scientific use of the NON-INTERRUPTIBLE constraint. Optimization occurs during planning and scheduling, taking into account constraints on the entire pool of visits that must be executed. Executing nearby visits in sequence is best from an efficiency perspective, but is not always possible given numerous other observing constraints. Constraining a set of visits to execute without interruption makes that sequence harder to schedule and more likely to be delayed.
Use of the NON-INTERRUPTIBLE special requirement must be justified by the observer and approved during the instrument scientist review. This is true in general, but the bar will be higher when data excess is above the middle threshold.
Instrument-specific ways to reduce data excess
There are known scenarios for each instrument that may exceed the data excess limit. Some of these scenarios and suggested solutions are described below.
See also: NIRCam Detector Readout Patterns
A group with 2048 × 2048 pixels occupies ~8.4 MB per detector on the SSR. Neglecting all overheads and assuming continuous data acquisition with all 10 NIRCam detectors (both short- and long-wave channels), NIRCam readout patterns accumulate data excess at the following rates.
Table 2. Data excess accumulated by NIRCam using all 10 detectors
|Readout pattern||Group cadence||Data rate||Accumulated data excess||Time to exceed middle threshold|
|DEEP2, DEEP8||~200 s||~0.4 MB/s||-1.6 GB/hour||N/A|
|MEDIUM2, MEDIUM8||~100 s||~0.8 MB/s||-0.1 GB/hour||N/A|
|SHALLOW2, SHALLOW4||~50 s||~1.7 MB/s||2.9 GB/hour||5.1 hours|
|BRIGHT1, BRIGHT2||~20 s||~4.2 MB/s||12 GB/hour||1.3 hours|
|RAPID||~10 s||~8.4 MB/s||27 GB/hour||0.6 hours|
For visits longer than about 6 hours, data excess remains below the middle threshold (15,000 MB) only for the MEDIUM or DEEP readout patterns.
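The NIRCam rates tabulated above follow from the per-group size, the group cadence, and the sustainable downlink rate. The sketch below reproduces that arithmetic using the approximate cadences from the table. It neglects all overheads, as the table does, and the function name is hypothetical.

```python
GROUP_MB = 8.4                 # one 2048 x 2048 group per detector
N_DETECTORS = 10               # both NIRCam modules, short- and long-wave channels
SUSTAINABLE_RATE_MB_S = 0.87
MIDDLE_THRESHOLD_GB = 15.0


def nircam_excess(cadence_s):
    """Return (data rate MB/s, excess GB/hour, hours to middle threshold or None)."""
    rate_mb_s = GROUP_MB * N_DETECTORS / cadence_s
    excess_gb_hr = (rate_mb_s - SUSTAINABLE_RATE_MB_S) * 3600 / 1000
    # The threshold is never reached if excess does not accumulate.
    hours = MIDDLE_THRESHOLD_GB / excess_gb_hr if excess_gb_hr > 0 else None
    return rate_mb_s, excess_gb_hr, hours


for name, cadence_s in [("DEEP", 200), ("MEDIUM", 100), ("SHALLOW", 50),
                        ("BRIGHT", 20), ("RAPID", 10)]:
    rate, excess, hours = nircam_excess(cadence_s)
    limit = "N/A" if hours is None else f"{hours:.1f} h"
    print(f"{name:8s} {rate:5.2f} MB/s {excess:7.2f} GB/hour {limit}")
```

The same function can be used to explore fewer detectors by changing `N_DETECTORS`, for example 5 when only one module is selected.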
In addition to selecting a readout pattern with a longer group cadence, NIRCam observers may use the following options to reduce data excess.
- Use only one NIRCam module: Selecting a single module in APT rather than ALL halves the number of detectors and hence the data volume. The module options in APT vary with observing mode (e.g., module A for coronagraphy or module B for imaging).
- Change the number of outputs: In the case of grism time-series observations, a proposer may want to change the number of output amplifiers. Readout of the full NIRCam detector (2048 × 2048 pixels) is performed with 4 outputs simultaneously (Noutputs = 4), each delivering a stripe of data (2048 pixel rows × 512 pixel columns), and takes 10.7 s altogether. Smaller subarrays are read out more quickly, and most are read out through a single output (Noutputs = 1). Noutputs is pre-defined for most subarrays, but observers are given a choice between Noutputs = 1 or 4 in the grism time-series observing mode. Choosing one output reduces the data rate, and therefore data volume, by about a factor of 4.
The Mid-Infrared Instrument (MIRI) has only 3 detectors, but obtaining simultaneous imaging with both the imager and spectrograph can potentially exceed the data excess thresholds. MIRI observers should consider the general advice described earlier in the article if their planned observations exceed a data excess threshold.
The Near Infrared Spectrograph (NIRSpec) has 2 detectors and four science readout patterns divided into two readout modes, IRS2 and traditional. NRSIRS2RAPID (IRS2 mode) results in a higher data rate (~1.95 MB/s) than NRSRAPID (traditional mode; ~1.56 MB/s) because the digitized reference output is also saved. Despite the higher data volume, the IRS2 patterns are generally preferred over the traditional patterns because they reduce correlated read noise.
NIRSpec observing options that could result in APT data excess errors include:
- Deep, full frame FS/IFU/MOS long exposures with NRSRAPID or NRSIRS2RAPID readout (with exposure times beyond about 500 s).
- NIRSpec MOS + NIRCam parallels (data volume mostly driven by NIRCam).
The general solution is to use one of the frame-averaging patterns, NRS or NRSIRS2. Both the traditional and IRS2 patterns that average 4 (NRS) and 5 (NRSIRS2) frames per group generate significantly smaller data excess and should be within limits.
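To see why the frame-averaging patterns stay within limits, the sketch below divides the RAPID data rates quoted above by the number of frames averaged per group and compares the result with the 0.87 MB/s sustainable downlink rate. Treating the averaged rate as the RAPID rate divided by the frames per group is a simplifying assumption that ignores overheads.

```python
SUSTAINABLE_RATE_MB_S = 0.87

# Data rates (MB/s, both detectors) quoted in this article for the RAPID patterns.
RAPID_RATE_MB_S = {"NRSRAPID": 1.56, "NRSIRS2RAPID": 1.95}

# Frame-averaging patterns: (parent RAPID pattern, frames averaged per group).
AVERAGED = {"NRS": ("NRSRAPID", 4), "NRSIRS2": ("NRSIRS2RAPID", 5)}


def data_rate_mb_s(pattern):
    """Approximate data rate for a NIRSpec readout pattern, overheads neglected."""
    if pattern in RAPID_RATE_MB_S:
        return RAPID_RATE_MB_S[pattern]
    parent, frames_per_group = AVERAGED[pattern]
    return RAPID_RATE_MB_S[parent] / frames_per_group


for pattern in ("NRSRAPID", "NRSIRS2RAPID", "NRS", "NRSIRS2"):
    rate = data_rate_mb_s(pattern)
    excess_rate = rate - SUSTAINABLE_RATE_MB_S
    print(f"{pattern:13s} {rate:.2f} MB/s, excess rate {excess_rate:+.2f} MB/s")
```

Under this approximation both averaged patterns produce data below the sustainable rate, so data excess decreases rather than accumulates while they run.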
See also: NIRISS Detector Overview
Since the Near Infrared Imager and Slitless Spectrograph (NIRISS) has only one detector, data excess should not be an issue when it is used as the prime instrument. The wide field slitless spectroscopy or imaging modes of NIRISS can be used in parallel with other instruments. In these cases, the full frame readout format will generally be used with the NIS readout pattern to produce data at a rate of 0.195 MB/s. In rare cases (e.g., with bright targets), the NISRAPID readout pattern could also be used, producing 0.782 MB/s. These rates and the accumulated data volumes are typically small compared with the other instruments, but must still be considered.
This article uses the S.I. definitions of gigabyte and megabyte: 1 Gbyte = 1 GB = 10^9 bytes, and 1 Mbyte = 1 MB = 10^6 bytes.