Last week the following situation appeared on our Mark V:
<C> core (A4 state):
- BAD EEPROM CSUM
- 3145 DCC ERRORS (number keeps increasing)
- DCC TO RESET
- DEFAULT SYS
<R> core (A4 state):
- 1 DCC ERRORS (constant)
- DEFAULT SYS
<S> core (A7 state): no alarms shown
<T> core (A7 state):
- 1786 DCC ERRORS (constant)
- NO BMS MEM BUFFS
- QST DPM TIMEOUT
Suspecting an issue with the <C> core EEPROM, we replaced it. After downloading ALL to the <C> and <R> cores, all four processors went to A7 status and no alarms were shown on the screen.
The next day, many diagnostic alarms started coming from the Mark V (although all four cores are still in A7 status), and the LCC screens show the following alarms:
<R>, <S>, <T> cores:
- 45 DCC ERRORS
- QST DPM NO DEST
<C> core: no alarms
Below are some of the diagnostic alarms that keep appearing quite frequently:
- <C> DCC DPM: Invalid destination address
- <C> Sequencing execution overflow
- <C> LCC: Processor rebooted independently
- <Q> COMMON IO communication loss
- <C> DCC BMS: out of memory
- <C> DCC DPM: no BMS memory for isr
- <C> DCC UDM: no BMS memory buffer available
- <C> TCD1 Relays dropped due to IONET failure
In particular, the independent reboot of the <C> core occurs on average twice a day or more, which is very concerning for us.
This Mark V has been running for 20 years with very few alarms.
For information: an upgrade of the DCS is currently being performed, and the issues appeared at the same time.
Can anyone advise which actions should be taken to tackle the issue?
Thank you in advance,
As you have learnt, it's possible to have Diagnostic Alarms (on the LCC/SLCC Display) even when the processors are in I/O State A7. A card (or memory) can have problems without changing the I/O State of the entire processor.
You provided very good information! I'm going to bet the Mark V operator interfaces in use are GE Mark V HMIs, running some version of MS-Windows and CIMPLICITY, AND that the DCS communicates with (receives data from and possibly sends commands to) the Mark V (through the operator interface!) using either MODBUS or GSM (GE Standard Message protocol). If this is true, it's most likely that there is some serious problem with the format of the request(s) for data from the Mark V and/or the format of the command(s) to the Mark V, and this is causing the memory problems (and re-boots) of <C>. Alternatively, something changed in the Mark V operator interface during this DCS upgrade may be causing the <C> problems. If you have multiple Mark V operator interfaces (especially GE Mark V HMIs) that are also requesting data as normal in addition to the DCS information, the ARCnet-based StageLink just gets overwhelmed, as does <C>. And <R>, as the designated voter, can also get overwhelmed, or confused (if the communications requests/commands are not properly formatted).
I would suggest a test, if possible: shut off the DCS communication with the Mark V, and reset all the Diagnostic Alarms on the Mark V (for this test it might be a good idea to re-boot <C>). Then wait a while to see whether the Diagnostic Alarms re-occur. If no changes were made to the Mark V operator interface as part of this DCS upgrade AND the Diagnostic Alarms do not start again, then something is amiss with the format of the data/commands from the DCS to the Mark V (through the Mark V operator interface). And if the Diagnostic Alarms start happening again when you re-start the DCS communications, then you've isolated the problems to the DCS communications.
CIMPLICITY, as used in GE Mark V HMIs, is not a very well-behaved application. It asks for data for EVERY point defined in the CIMPLICITY project at least once per second. If the Mark V operator interface is ALSO requesting data for DCS MODBUS or GSM communications, AND the requests/commands from the DCS are not formatted correctly, then all manner of problems can occur, especially <C> memory problems like the ones you are experiencing.
By "bad format" I mean mis-spelled signal names, requests for signals that don't exist, or requests for signals that exist at different memory locations than the ones specified in the request/command. Things like that.
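To make "bad format" a little more concrete, here is a minimal Python sketch of a pre-flight check that compares the DCS mapping against the signal database. It assumes you can export the Mark V signal names/addresses and the DCS request list into simple dictionaries; the function name, signal names and addresses are all hypothetical placeholders, not anything from GE or the DCS vendor.

```python
# Minimal sketch (hypothetical names/addresses): flag DCS requests that refer
# to signals that don't exist in the Mark V database, or that use a different
# address than the one the signal actually has.

def find_bad_requests(requested, defined):
    """Return (unknown, mismatched) lists of signal names.

    requested: dict mapping signal name -> address used by the DCS
    defined:   dict mapping signal name -> address in the Mark V database
    """
    # Names the DCS asks for that are not defined at all (e.g. typos).
    unknown = sorted(name for name in requested if name not in defined)
    # Names that exist, but at a different address than the DCS expects.
    mismatched = sorted(
        name for name, addr in requested.items()
        if name in defined and defined[name] != addr
    )
    return unknown, mismatched


# Hypothetical example data.
defined = {"SIG_SPEED": 1001, "SIG_LOAD": 1002, "SIG_TEMP": 1003}
requested = {"SIG_SPEED": 1001, "SIG_LAOD": 1002, "SIG_TEMP": 1005}

unknown, mismatched = find_bad_requests(requested, defined)
print("unknown:", unknown)        # unknown: ['SIG_LAOD']
print("mismatched:", mismatched)  # mismatched: ['SIG_TEMP']
```

Running a check like this whenever the DCS mapping is changed is cheaper than finding the mismatch through <C> memory alarms.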
The same <C> memory problems can also occur when the DCS initiates data requests, or sends commands, and then can't process the requested data or the command confirmation in time for the next set of data or command confirmation requests.
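One way to picture this: if requests keep arriving faster than the other side completes them, the pending requests accumulate until a fixed pool of buffers runs out, which is one way messages like "no BMS memory buffer available" could arise. The Python sketch below is a generic queueing illustration with made-up numbers; it is an assumption-level model, not a description of how the Mark V actually allocates its BMS memory.

```python
from typing import Optional

# Generic illustration (NOT a model of the real Mark V BMS memory):
# each cycle some requests arrive and some pending ones are completed.
# If completions lag arrivals, the backlog grows until the buffers overflow.

def cycles_until_overflow(arrivals_per_cycle: int, completed_per_cycle: int,
                          buffer_slots: int, max_cycles: int = 10_000) -> Optional[int]:
    """Return the first cycle in which pending requests exceed buffer_slots,
    or None if no overflow happens within max_cycles."""
    pending = 0
    for cycle in range(1, max_cycles + 1):
        pending += arrivals_per_cycle                         # new requests queued
        pending = max(0, pending - completed_per_cycle)       # some get completed
        if pending > buffer_slots:
            return cycle
    return None


# Hypothetical numbers: 5 requests per cycle, only 4 completed per cycle.
print(cycles_until_overflow(5, 4, buffer_slots=50))  # 51
print(cycles_until_overflow(5, 5, buffer_slots=50))  # None
```

The point is that a small, steady shortfall is enough: nothing fails immediately, but the backlog grows every cycle until the buffers are exhausted.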
But, you have probably already identified the source of the problem: something amiss with the DCS upgrade, specifically the Mark V communications.
Hope this helps! Please write back to let us know what you find and how you resolve the problem. Also, you were very observant to note--and tell us--that the problem began at the same time as the DCS upgrade; that was very good troubleshooting!
Thank you very much for your valuable advice, and I apologize for my late feedback.
In this power plant there are actually two Mark Vs and two IDOS <I> operator interfaces, each <I> communicating with both Mark Vs.
First, we ran a couple of diagnostic tools on IDOS (PRT_STAT and GBL2FILE) to analyse the Modbus communication. The results showed that the DCS was trying to write 27 Mark V registers at once, or at least within 1 second, which as I understand it is far more than GE recommends (max 10 registers per second).
Speaking with the DCS guys, we learned that they were actually trying to write 27 registers every 0.100 s, so the overloading issue was very clear.
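A check like the one above can be sketched in Python as a sliding-window count of registers written per second. This assumes the write requests have already been extracted from the PRT_STAT/GBL2FILE output into (timestamp in ms, number of registers) pairs; that extraction step and the actual output format of those tools are not shown here. The 10 registers/s figure is the GE recommendation as quoted above.

```python
from collections import deque
from typing import Iterable, Tuple

def max_registers_in_window(events: Iterable[Tuple[int, int]],
                            window_ms: int = 1000) -> int:
    """events: (timestamp_ms, registers_written) pairs sorted by time.
    Return the largest number of registers written in any window
    (t - window_ms, t] ending at an event."""
    window = deque()
    total = 0
    best = 0
    for t, n in events:
        window.append((t, n))
        total += n
        # Drop events that are older than the window ending at time t.
        while window[0][0] <= t - window_ms:
            _, old_n = window.popleft()
            total -= old_n
        best = max(best, total)
    return best


LIMIT_PER_S = 10  # GE recommendation as quoted in this thread

# Hypothetical log: all 27 registers written in one request, once per second.
events = [(0, 27), (1000, 27), (2000, 27)]
peak = max_registers_in_window(events)
print(peak, peak > LIMIT_PER_S)  # 27 True
```

Integer milliseconds are used on purpose so the window boundary is exact and not affected by floating-point rounding.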
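The arithmetic behind "very clear", taking the 10 registers/s figure quoted above as the limit:

```latex
\frac{27\ \text{registers}}{0.100\ \text{s}} = 270\ \text{registers/s},
\qquad
\frac{270\ \text{registers/s}}{10\ \text{registers/s}} = 27
```

That is about 27 times the recommended rate, and ten times worse than writing all 27 registers once per second (which would already be 2.7 times the recommendation).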
So, as you proposed, we disconnected the serial Modbus cable coming from the DCS from the <I> and observed the situation for a few days. During this period, no diagnostic alarms or warning messages appeared on either Mark V, which confirmed our hypothesis of communication overload. Also, the idle time of each core increased by about 2-3%.
I forgot to mention another abnormal behaviour: some strange logic forcing appeared automatically after the independent reboots of the <C> core. This behaviour was also no longer observed after the Modbus was disconnected.
Therefore, we agreed with the DCS guys to split the 27 registers into 3 packets of 9 registers, with the packets sent at 1-second intervals, which should be within GE specs.
After reconnecting the Modbus, the situation looks much better; only occasionally does a diagnostic alarm appear because the <C> core loses ARCnet communication with the <R> core for a few seconds. This is probably because the communication load is still very close to the limit stated by GE, and the <R> core, which has the least idle time, might sometimes be overloaded. We will keep monitoring the situation; if you have any further advice, please feel free to share it. In the meantime, many thanks for your assistance!
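The agreed schedule can be sketched in Python as below, assuming one 9-register packet is sent each second (so the full set of 27 registers is refreshed every 3 seconds). The register addresses are hypothetical, and the actual Modbus write call is deliberately left out; this only shows the pacing logic.

```python
from typing import Iterator, List, Sequence, Tuple

def split_into_packets(registers: Sequence[int], packet_size: int) -> List[List[int]]:
    """Split the register list into consecutive packets of packet_size."""
    return [list(registers[i:i + packet_size])
            for i in range(0, len(registers), packet_size)]

def write_schedule(registers: Sequence[int], packet_size: int, interval_s: float,
                   cycles: int = 1) -> Iterator[Tuple[float, List[int]]]:
    """Yield (send_time_s, packet) pairs: one packet per interval_s,
    repeating the whole register set `cycles` times."""
    packets = split_into_packets(registers, packet_size)
    slot = 0
    for _ in range(cycles):
        for packet in packets:
            yield slot * interval_s, packet
            slot += 1


registers = list(range(40001, 40028))  # 27 hypothetical holding registers
for t, packet in write_schedule(registers, packet_size=9, interval_s=1.0):
    print(f"t={t:.0f}s  {len(packet)} registers starting at {packet[0]}")
# t=0s  9 registers starting at 40001
# t=1s  9 registers starting at 40010
# t=2s  9 registers starting at 40019
```

With this pacing the write rate is 9 registers / 1 s = 9 registers/s, just under the 10 registers/s recommendation, which fits the observation that the load is still close to the limit.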