Starting DAQ Software
The DAQ is run by a process called DAQInterface, which is started by a single script. It runs in the current terminal until it receives an interrupt signal (CTRL-C). Run everything on the server lariat-daq04 under the sbnd account.
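A quick pre-flight check can catch a wrong host or account before anything is started. The sketch below is only an illustration: the function name `check_daq_login` is hypothetical and not part of the DAQ tooling. The expected user and host are the ones named above.

```shell
# Hypothetical helper (not part of the DAQ tools): verify account and host.
# Usage: check_daq_login USER HOST
check_daq_login() {
    local user="$1" host="$2"
    # Strip any domain suffix so a fully qualified host name also matches.
    host="${host%%.*}"
    if [ "$user" = "sbnd" ] && [ "$host" = "lariat-daq04" ]; then
        echo "ok"
    else
        echo "wrong login: $user@$host (expected sbnd@lariat-daq04)"
        return 1
    fi
}

# Live use: check_daq_login "$(whoami)" "$(hostname)"
```

The function takes the user and host as arguments rather than reading them itself, so it can be tried out on any machine.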
While logged in as the user sbnd on the server lariat-daq04, run (from any directory):
Stuff to Watch for on Initialization
After the start script is run, DAQInterface will go through three states to initialize: BOOT, CONFIG, and START.
A message viewer should pop up on BOOT. If it does not, or if the viewer that pops up is just a white screen, close the viewer and run the command start_messageviewer in a different terminal. This will restart the message viewer. There is no need to restart the DAQ.
There are a few possible errors on initialization if the previous run of DAQInterface was not exited correctly. If such an error occurs, the start_daq_interface script will complain and then exit.
If the error is about failing to create a shared memory segment, run the command kill_ipcs. Only do this after DAQInterface has stopped running. If this does not fix the issue, there are likely too many background processes on the server, and you should kill them. Do not kill processes run by the sbnddqm account, as those are important background DAQ processes; pretty much everything else is fair game.
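To see whether shared memory segments are actually left over, the standard `ipcs -m` command lists them. The sketch below counts the segments owned by one account. It assumes the column layout printed by the util-linux version of `ipcs` (key, shmid, owner, ...), and the function name `count_shm_segments` is hypothetical. It only reads and does not remove anything.

```shell
# Hypothetical helper: count shared memory segments owned by a user,
# reading `ipcs -m` output on stdin. Assumes the util-linux column layout:
#   key  shmid  owner  perms  bytes  nattch  status
count_shm_segments() {
    # Data rows start with a hex key (0x...); the owner is the third column.
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { n++ } END { print n + 0 }'
}

# Live use on the DAQ server: ipcs -m | count_shm_segments sbnd
```

A non-zero count while DAQInterface is stopped suggests leftover segments from a previous run.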
If the error is about failing to start a process on a certain port (usually 7200), the DAQ most likely did not shut down cleanly last time. See "Unclean Crashes".
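One way to confirm that something is still holding the port is to look at the listening TCP sockets with `ss -ltn` (from iproute2). The sketch below checks that output for a given port. The function name `port_listening` is hypothetical, and the parsing assumes the usual `ss` layout where the local address and port form the fourth column.

```shell
# Hypothetical helper: succeed if some socket is listening on the given port.
# Reads `ss -ltn` output on stdin; column 4 is assumed to be "address:port".
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Live use on the DAQ server:
#   ss -ltn | port_listening 7200 && echo "port 7200 is still in use"
```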
After running these fixes, try re-starting the DAQ by running startDaqInterface again. If this does not work, contact an expert.
Stopping DAQ Software
To stop DAQInterface, press CTRL-C once in the terminal where it is running. Shutdown takes a few seconds. Do not press CTRL-C a second time, as this will result in an improper shutdown of the process (see "Unclean Crashes").
Monitoring the DQM
During your shift, keep an eye on the DQM website: https://sbn-online.fnal.gov/cgi-bin/minargon/minargon.wsgi/ (available through fgz and/or the Fermilab proxy). The website has information on (among other things) noise RMS, pulse heights, and waveforms. Check these regularly to make sure they stay reasonable: a bad state of the electronics will often be accompanied by very large noise RMS values and non-physical looking waveforms.
Also, look at the "Analysis alive time" and "Snapshot alive time" values. These report the last time that each of the two online monitoring processes ran. If these values fall behind (by more than about 5 minutes) and the data stops updating, the DAQ is likely having an issue (see "Monitoring for Crashes"). Before checking the DAQ, though, refresh the page and re-check the "alive time" values to make sure the webpage has not gone stale.
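The 5-minute rule of thumb amounts to a simple subtraction. The sketch below makes it explicit. The function name `alive_time_status` is hypothetical, and it assumes both times have already been converted to seconds (how the website formats its timestamps is not covered here).

```shell
# Hypothetical helper: decide whether an "alive time" has fallen behind.
# Arguments: last update and current time, both in seconds, plus an
# optional limit. The 300 s default mirrors the "about 5 minutes" rule.
alive_time_status() {
    local last="$1" now="$2" limit="${3:-300}"
    local lag=$(( now - last ))
    if [ "$lag" -gt "$limit" ]; then
        echo "stale (${lag}s behind)"
    else
        echo "ok (${lag}s behind)"
    fi
}

# Example: last update at 12:00:00, checked at 12:07:30 on the same day
# (both given as seconds since midnight).
alive_time_status 43200 43650   # prints "stale (450s behind)"
```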
Monitoring Event Displays
It can also be helpful to look at event displays to identify features such as tracks. Running the command:
evd_vst RUN SUBRUN
on sbnd@lariat-daq04, where RUN is the run number and SUBRUN is the subrun number, will start the LArSoft event display for the events in the specified subrun. You can identify recent runs/subruns by looking at the online monitoring.
If the event display does not work, it may be that the digits file was not created for the associated subrun. This may happen for isolated subruns, but if it happens consistently there is likely a problem. For example, if the online monitoring is not updating and digits files are also unavailable, the data output by the DAQ may be corrupt.
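As a small guard against typos, the arguments can be checked before the display is launched. The sketch below only builds and prints the command line. The function name `evd_command` is hypothetical, and the assumption that RUN and SUBRUN are plain non-negative integers is ours.

```shell
# Hypothetical helper: print the evd_vst command line after checking that
# both arguments are non-negative integers. It does not start the display.
evd_command() {
    local run="$1" subrun="$2" arg
    for arg in "$run" "$subrun"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "usage: evd_command RUN SUBRUN (integers)" >&2
                return 1
                ;;
        esac
    done
    echo "evd_vst $run $subrun"
}

# Placeholder numbers, not a real run: evd_command 1234 5
```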
Monitoring for Crashes
On most crashes, DAQInterface will note the crash and write a message to the terminal. However, sometimes the DAQ will crash silently. In this case, you will notice the crash because the DQM will stop updating. The DQM itself does not crash, so a DQM outage means the DAQ is down.
It is also possible that the DAQ is running fine without crashing but the DQM is not updating. In this case, most likely either the LArIAT DAQ is not on (so the SBND DAQ is not getting any triggers) or the BNL electronics have not been configured (they should be configured after a power cycle). Check the other terminal to see if the LArIAT DAQ is running. If the cold electronics were recently powered on, check with an expert (perhaps on the hangout) whether they have been configured.
If you notice the DAQ is down, restart the DAQ. However, after a silent crash DAQInterface may not have shut down all DAQ processes successfully. See "Unclean Crashes".
If DAQInterface crashes cleanly, it will tell you the process that caused the crash. There are four possibilities:
- DataLogger => Restart the DAQ. If the DataLogger is consistently causing crashes, the HOLD OFF in the readout electronics needs to be tuned up (Bill knows how to do this).
- BoardReader => Restart the DAQ. If the BoardReader is consistently causing crashes, the Nevis electronics may have gone into a bad state. Call Davio and/or Jose and follow their instructions, including possibly power cycling. Do not power cycle without a sign-off from Jose or Davio.
- EventBuilder => Restart the DAQ.
- Dispatcher => Restart the DAQ. Alert an expert from artdaq, as this really should not happen.
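The list above can be read as a simple lookup from the crashing process to the response. The sketch below restates it as a `case` statement. The function name `crash_action` is hypothetical, and the fallback for an unlisted process name is our assumption rather than part of the procedure.

```shell
# Sketch of the triage list as a lookup; not part of the DAQ tools.
crash_action() {
    case "$1" in
        DataLogger)
            echo "Restart the DAQ; if recurring, the readout HOLD OFF needs tuning" ;;
        BoardReader)
            echo "Restart the DAQ; if recurring, call the electronics experts before any power cycle" ;;
        EventBuilder)
            echo "Restart the DAQ" ;;
        Dispatcher)
            echo "Restart the DAQ; alert an artdaq expert" ;;
        *)
            # Assumption: anything not on the list goes to an expert.
            echo "Unknown process: contact an expert"
            return 1 ;;
    esac
}
```

In every listed case the first step is the same (restart the DAQ); the entries differ only in who to involve if the crash keeps recurring.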
In addition, if you see a number of Kerberos messages preceding a crash, this means that the sbnd kticket expired; the DAQ did not necessarily crash. You should still stop the DAQ by pressing CTRL-C, but the DAQ will have stopped in a bad state (see "Unclean Crashes").
Unclean Crashes
Sometimes, DAQInterface will stop and/or crash without cleaning up its processes successfully. This can happen if the shifter pressed CTRL-C multiple times on shutdown, or if the crash was caused by a kticket issue. In this case, the remaining DAQ processes from the last run need to be cleaned up. This can be done by killing every process on the sbnd account that is not named "bash" or "sshd". You can list the processes owned by the sbnd account by running "ps u".
After doing this, run the command "kill_ipcs" to clean up any leftover shared memory. Then, you can start the DAQ.
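To find the leftover processes described above, the process list can be filtered so that only names other than "bash" and "sshd" remain. The sketch below is a minimal filter that only lists candidates and kills nothing. The function name `leftover_processes` is hypothetical, and it assumes input in the two-column form produced by `ps -u sbnd -o pid= -o comm=`.

```shell
# Hypothetical helper: from "PID COMMAND" lines on stdin, print every
# process that is not named bash or sshd. Listing only; nothing is killed.
leftover_processes() {
    awk '$2 != "bash" && $2 != "sshd" { print $1, $2 }'
}

# Live use on the DAQ server:
#   ps -u sbnd -o pid= -o comm= | leftover_processes
```

Review the printed list before killing anything, so that a process you still need is not removed by mistake.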