Monitoring data quality and detector status

We’ve come a long way…

Experiments are only as good as their data. Online monitoring (OM) and prompt processing systems will let the ProtoDUNEs, and eventually DUNE, track their data in near real time and fix problems quickly, leaving the detectors more time to accumulate valuable, high-quality data.

“In the OM we want to make basic displays, such as raw ADC distributions, so that we can see, for example, dead or noisy channels, average noise levels, pedestals, and any data problems like stuck ADC bits,” said Voica Radescu, of CERN, who focuses on choosing the metrics that show whether the detector systems are operating properly and delivering quality data.
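To make these metrics concrete, here is a minimal sketch of how per-channel quantities like the ones Radescu lists could be computed from a block of raw ADC samples. It is an illustration only, not the ProtoDUNE OM code: the function name `channel_summary`, the thresholds `dead_rms` and `noisy_rms`, and the choice to test only the lowest four bits for "stuck" behavior are all hypothetical assumptions.

```python
import numpy as np

def channel_summary(adc, dead_rms=0.5, noisy_rms=20.0, low_bits=4):
    """Per-channel quality metrics (hypothetical thresholds, illustration only).

    adc: integer array of shape (n_channels, n_ticks) holding raw ADC counts.
    """
    adc = np.asarray(adc, dtype=np.int64)
    pedestal = np.median(adc, axis=1)   # baseline level of each channel
    rms = np.std(adc, axis=1)           # noise level around that baseline
    dead = rms < dead_rms               # no fluctuation at all: likely dead
    noisy = rms > noisy_rms             # far more fluctuation than expected

    # Assumption: ordinary noise should toggle the lowest few bits, so a
    # low-order bit that never changes on a live channel is flagged as stuck.
    stuck = np.zeros((adc.shape[0], low_bits), dtype=bool)
    for b in range(low_bits):
        bit = (adc >> b) & 1
        stuck[:, b] = bit.min(axis=1) == bit.max(axis=1)
    stuck_bit = stuck.any(axis=1) & ~dead   # ignore channels already marked dead

    return {"pedestal": pedestal, "rms": rms, "dead": dead,
            "noisy": noisy, "stuck_bit": stuck_bit}
```

In a real system the thresholds would come from the reference runs the team is collecting rather than being fixed constants.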

The OM will assess channel status for each run number and store the results in a database for quick reference. The team is testing the monitored parameters under different beam conditions and after any change to detector components. Compared against reference histograms, the displays can reveal anomalous behavior.
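One common way to compare a monitoring histogram with a reference is a chi-square homogeneity test for two unweighted histograms, which works even when the two have different total counts. The sketch below assumes that technique; the article does not say which comparison the OM uses, and the function names and the `max_chi2_per_ndf` cut are hypothetical.

```python
import numpy as np

def chi2_vs_reference(counts, ref_counts):
    """Chi-square homogeneity test between two unweighted histograms.

    chi2 = (1 / (N*M)) * sum_i (M*n_i - N*m_i)**2 / (n_i + m_i),
    with N, M the total entries; ndf = (number of non-empty bins) - 1.
    """
    n = np.asarray(counts, dtype=float)
    m = np.asarray(ref_counts, dtype=float)
    N, M = n.sum(), m.sum()
    used = (n + m) > 0                      # skip bins empty in both
    chi2 = np.sum((M * n[used] - N * m[used]) ** 2 / (n[used] + m[used])) / (N * M)
    ndf = int(used.sum()) - 1
    return chi2, ndf

def is_anomalous(counts, ref_counts, max_chi2_per_ndf=3.0):
    """Flag a histogram whose shape differs strongly from the reference."""
    chi2, ndf = chi2_vs_reference(counts, ref_counts)
    return ndf > 0 and chi2 / ndf > max_chi2_per_ndf
```

Because the test compares shapes rather than absolute counts, a run with twice the statistics but the same ADC distribution is not flagged.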

If enough CPU capacity is available in the DAQ, where the OM runs, the OM may even be able to run fast reconstruction algorithms and produce more sophisticated results.

ProtoDUNE data flow first through the DAQ for event building and through the OM for initial monitoring, and are then written to buffer storage. Next, they are transferred over a dedicated network link to CERN's large, distributed EOS storage system. From there, a subset of the data undergoes more extensive prompt processing, and a copy of the full data set goes to Fermilab. Courtesy B. Viren

Closely related to OM is prompt processing. Unlike OM, prompt processing relies on a special prioritized batch system, the ProtoDUNE Prompt Processing System (p3s), to execute more intensive processing on fewer events. Maxim Potekhin, of BNL, is developing p3s, which will run on a separate set of computers from the OM.

“The exact line as to which monitoring algorithms run on OM or prompt processing will be drawn based on how much CPU any given algorithm needs, how much data the algorithm needs to crunch in order to be meaningful, and how much CPU is available on each of the OM and prompt processing computers,” said Brett Viren, of BNL, who contributes to the high-level design. “The goal is to have prompt processing results visible on an interactive web display within ten minutes of a trigger so that any needed action can be taken quickly.”
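Viren's three criteria can be turned into a simple placement rule. The sketch below is a hypothetical illustration, not the project's actual policy: the `Payload` fields, the per-cycle OM budget `om_budget_s`, and the assumption that work splits perfectly across cores are all invented for the example. Only the ten-minute latency goal comes from the quote above.

```python
from dataclasses import dataclass

LATENCY_GOAL_S = 600.0  # results visible within ten minutes of a trigger

@dataclass
class Payload:
    name: str
    cpu_s_per_event: float  # CPU seconds to process one event (hypothetical)
    events_needed: int      # events needed for a meaningful result

def place(payload, om_cores, p3s_cores, om_budget_s=60.0):
    """Decide where a monitoring payload runs (hypothetical rule).

    Assumes the total work divides perfectly across the available cores.
    """
    work = payload.cpu_s_per_event * payload.events_needed  # total CPU seconds
    if work / om_cores <= om_budget_s:      # cheap enough to share the DAQ CPUs
        return "OM"
    if work / p3s_cores <= LATENCY_GOAL_S:  # fits the prompt-processing latency goal
        return "p3s"
    return "offline"                        # too heavy for near-real-time feedback
```

Under these assumptions, raw ADC histograms would land on the OM, a few hundred reconstructed events would go to p3s, and full reconstruction of large samples would be left for later processing.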

So far, p3s can run trivial test jobs and has an initial web display. Viren hopes the system will be mature enough to support the “vertical slice” test at CERN in early 2017, and ready for production running that spring, in time for testing the first APA for ProtoDUNE-SP.

The DAQ, OM and prompt processing groups are working together to define the requirements for the systems as well as the payloads that produce the desired monitoring results.