Modbus slaves stop listening on ports

Interesting. So are the 3 Maxcess units that use the FL BT EPA bridges the ones you mentioned in your original post that drop out after a few days and the one that lasts a week or two is the one without the FL BT EPA bridge?
but after a few days, 3 of the 4 stop listening on ports 80 & 502, eventually after a week or two, the last one also does the same
Do any other devices on this network use a FL BT EPA bridge?

Perhaps the Wireshark captures will shed more light on what effect the FL BT EPA bridges are having on the communications that would cause the ProSoft to create multiple connections.
 
Exactly, the 3 devices which use the BT bridges fall over very quickly with multiple TCP connections, to the point the little MCU can't cope. The one which has no BT bridge lasts for over a week, and now that I have its switch port set to device for automation rather than multiport, it may last indefinitely.

But if I use only a standard PC Modbus client, none fall over, with or without a BT bridge. So it seems the ProSoft card and the BT bridge are a bad combination. But why? I will hook up to a hub and see if I can capture packets with Wireshark, but I'm not its biggest fan. Lots of people swear by it; I find it tedious and time consuming to use in any serious way. I have other stuff to do as well as this carry on.

Nothing else in the whole plant uses these bridges except this one machine. Was meant to be an upgrade to measure product tension but it's not reliable so that's a problem I do need to solve.
 
Ok, one more observation: it isn't anything to do with the Bluetooth modules, it's when the ProSoft write function is enabled. I rebooted them all and this time the hard wired unit failed first, with multiple TCP connections to it from the ProSoft card.

Next, I reset all of the Maxcess units and enabled only the ProSoft read function (FC 4), not the write function (FC 16). We do need to send a setpoint to these units eventually, but for now, while isolating the issue, I tried reading only. So far, only one Modbus TCP connection has been made to each unit by the PS card. No dropouts or shutdowns noted.

Oddly, while connected to a hub port between the Maxcess units and the managed switch, with the units and the managed switch port both connected to the hub, I cannot trap any TCP 502 packets on the wire with Wireshark, despite seeing the connections appear on the units in their netstats using PuTTY. I expected I should be able to see those packets? Using "tcp.port == 502" there is nothing found, even in 3 million packets.
 
Good observations. So it seems that the Prosoft may be creating new TCP connections for writes. Have you asked Prosoft about this behavior? Is this expected?

Unless your "hub" is 20 years old or is a very specialized modern device, it is more than likely a standard switch and will not send all packets to all ports. That's probably why you're not seeing anything in Wireshark for TCP port 502. If your hub device has a port mirroring function, you will need to enable that. Otherwise you will need to use another switch that does support port mirroring or use a dedicated Ethernet tap device.
 
It's a new TP-Link desktop switch but unmanaged. I assumed that, like an old hub, it would send all packets to all ports. Is that not the case? Does it build an ARP table and manage the ports that way, and if so, how does that table ever get cleared, with no management interface and no dedicated IP for the switch?

Have raised a ticket with ProSoft and am waiting on escalated tech support.
 
A new, unmanaged switch will not function like an old hub. It will only send packets to the specific port(s) necessary. You cannot perform capturing using Wireshark with an unmanaged switch, since you cannot enable port mirroring. You need to use a managed switch that supports port mirroring or use a dedicated Ethernet tap device (such as one of these https://www.midbittech.com/index.html).
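As a rough illustration of the difference, here is a minimal Python sketch of hub versus switch forwarding. It is a toy model, not any vendor's implementation: the class names, port numbers and the 300 second aging time are assumptions chosen for the example. The table a switch builds is keyed on MAC addresses (it is a MAC address table rather than an ARP table), and entries simply age out when no traffic has been seen from that address for a while.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, num_ports):
        self.ports = range(num_ports)

    def forward(self, in_port, src_mac, dst_mac, now=0.0):
        return [p for p in self.ports if p != in_port]


class LearningSwitch:
    """Toy model of an unmanaged switch with a MAC address table."""

    def __init__(self, num_ports, aging_s=300.0):  # aging time is an assumed value
        self.ports = range(num_ports)
        self.aging_s = aging_s
        self.table = {}  # mac -> (port, time last seen)

    def forward(self, in_port, src_mac, dst_mac, now=0.0):
        # Drop entries that have not been refreshed within the aging time.
        self.table = {m: (p, t) for m, (p, t) in self.table.items()
                      if now - t < self.aging_s}
        # Learn (or refresh) where the sender lives.
        self.table[src_mac] = (in_port, now)
        entry = self.table.get(dst_mac)
        if entry is None:
            # Unknown destination: flood, like a hub would.
            return [p for p in self.ports if p != in_port]
        port = entry[0]
        return [] if port == in_port else [port]


# Hypothetical layout: PLC on port 0, slave on port 1, capture PC on port 2.
sw = LearningSwitch(3)
first = sw.forward(0, "plc", "slave", now=0.0)    # slave not learned yet: flooded
reply = sw.forward(1, "slave", "plc", now=1.0)    # PLC already learned
later = sw.forward(0, "plc", "slave", now=2.0)    # both learned
hub_ports = Hub(3).forward(0, "plc", "slave")
aged = sw.forward(0, "plc", "slave", now=1000.0)  # entries aged out: flooded again
print(first, reply, later, hub_ports, aged)
```

Once both ends have been learned, the capture PC on port 2 receives none of the PLC-to-slave traffic, which would be consistent with an empty "tcp.port == 502" filter result.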
 
Ok, so the issue appears to be solved. After spending some time in a video call with a senior engineer from ProSoft, we finally got to the bottom of it. It's a bit of interesting info for anyone using a ProSoft MVI56E-MNETC card to write to a slave. If you have the write function turned on, say FC 16 (writing a float or real value), then the options for the "enable" section of the command are: "No" (disabled), "Yes" (enabled, which is what I was using) and "Conditional".

This surprised me, I thought it would send a value only when I did, from the PLC. Nope, in "Yes" mode it fires a write connection to the slave every cycle, with a "null" value unless I send a specific value, so in the little Maxcess slave unit it was causing multiple TCP connections to be created, crashing the slave when it hit the max number it can cope with.

What we think is that the slave should probably have gracefully ACKd the "null" transmission and closed the connection, but it seems not to understand the "null" write, and held the connection open waiting on a value or something to arrive, which it won't. The mnetc then tried repeatedly, which creates another, and another etc.

Switching the FC 16 command to "Conditional" means it does not write anything, not even a null, unless the PLC triggers a write (by writing to the address space in the PLC reserved for that slot in the mnetc card, of course). Once the write is correctly ACKd, it does nothing until the PLC writes again. The value must be different too, or the write is ignored by the mnetc. That's fine: it wrote the value the first time, so there is no need to repeat it unless it changes.

So, solved.

ProSoft were surprised at the behaviour of the card setting up 7 TCP connections when using the "Yes" enable setting, which shouldn't occur, but suggested trying the "Conditional" and see what happens. We did, it works perfectly now. We both learned something there.

They can't have firmware updates to suit every single slave brand out there and sometimes an incompatibility occurs. Luckily there was a good workaround which seems to be solid.
 
It all makes sense, but what constitutes a "null" write command that the slave would not acknowledge or return an exception code for? Does a null write mean the data field in the write command is the number zero: 0000?

I'm drawing a blank on a "null write command".
 
ProSoft say it "writes zero" every second when no value is sent from the PLC. I say "null" because if it actually wrote zero, it would overwrite my previous setpoint, which it doesn't, it just creates 7 TCP connections trying, and the Maxcess slave can't cope with that many. I expect it creates a connection, calls FC 16 on the slave but provides no value, which is not correctly interpreted by the slave and results in a hanging connection.

Have you ever written software in C#? The null concept would be familiar. It's not zero, it's not a number. It's not a blank, it's a null. No value. A variable which is created but not explicitly given a value, still exists with a pointer to it in memory but it has no value. It is assigned a "null" value at creation. It exists but holds nothing yet.

I think the master appears to create a connection without completing the transaction, and the slave keeps the connection open waiting on a value. If you write software to make a TCP connection, all you need is the IP of the intended device and the port, then a connection can be made without actually sending data. It's down to the master to close the TCP connection and the slave will then drop it. I think that's what's not happening.
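To make that concrete, here is a small Python sketch using only the standard socket module. It runs against a loopback listener standing in for the slave (an assumption for the example, not the Maxcess unit or the ProSoft card). It shows that a TCP connection can be fully established with no application data ever sent, and that the server side only learns the session is over when the client closes.

```python
import socket

# Loopback stand-in for the slave; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# The "master" connects but sends nothing.
client = socket.create_connection(("127.0.0.1", port), timeout=2.0)
conn, peer = server.accept()  # the server now holds an open socket

# The server waits briefly for data: nothing arrives, yet the link stays open.
conn.settimeout(0.2)
try:
    conn.recv(1024)
    got_data = True
except socket.timeout:
    got_data = False

# Only when the client closes does the server see end-of-stream (empty read).
client.close()
conn.settimeout(2.0)
closed_by_client = conn.recv(1024) == b""

conn.close()
server.close()
print(got_data, closed_by_client)
```

If the client never closes and the server has no idle timeout of its own, the accepted socket stays allocated on the server side.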
 
I remember several instances where inexpensive single loop controllers running Modbus RTU could be choked by bombarding them with 10-20 Modbus read commands per second. In one case the master ran Modbus/TCP through a gateway and polled the poor slave at 20Hz, never waiting for reply. Slowing down the polling to once every second or two allowed the controller to process the command and reply.
 
I initially thought it was a speed issue and set polling to once per second. That made no difference but I can see how it might have. We have other slaves which hate being polled faster than that.
 
While the overall concept of the solution you've presented makes sense, there is a lot of seemingly incorrect information here.

in "Yes" mode it fires a write connection to the slave every cycle, with a "null" value unless I send a specific value, so in the little Maxcess slave unit it was causing multiple TCP connections to be created, crashing the slave when it hit the max number it can cope with.
I know you and David_2 already talked about this, but again, there is no such thing as "null" in Modbus. Also, as you stated, the ProSoft can't be writing a zero value, otherwise the value in the slave would have changed. My only guess is that the ProSoft is simply establishing a TCP connection, but then not sending a Modbus packet at all. Wireshark would show exactly what the ProSoft is doing.
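For reference, this is roughly what an FC 16 (Write Multiple Registers) request looks like on the wire in Modbus/TCP, sketched in Python with the standard struct module. The function name, the transaction and unit IDs, the start address and the big-endian word order for the float are all assumptions for illustration; the actual register mapping depends on the slave. The point is that the request always carries a register count, a byte count and the data bytes themselves, so there is no way to encode "no value".

```python
import struct


def build_fc16_float(transaction_id, unit_id, start_addr, value):
    """Build a Modbus/TCP FC 16 request writing one 32-bit float (two registers)."""
    # Word and byte order are an assumption; real devices vary.
    regs = struct.pack(">f", value)
    quantity = len(regs) // 2  # number of 16-bit registers
    # PDU: function code, start address, register count, byte count, data.
    pdu = struct.pack(">BHHB", 0x10, start_addr, quantity, len(regs)) + regs
    # MBAP header: transaction id, protocol id (0), length of what follows, unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


frame = build_fc16_float(transaction_id=1, unit_id=1, start_addr=0, value=0.0)
print(len(frame), frame.hex())
```

Writing 0.0 still produces a complete frame with four explicit data bytes, all zero. A slave that receives it would overwrite its setpoint, which matches the observation that no such write was arriving.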

What we think is that the slave should probably have gracefully ACKd the "null" transmission and closed the connection, but it seems not to understand the "null" write, and held the connection open waiting on a value or something to arrive, which it won't. The mnetc then tried repeatedly, which creates another, and another etc.
A slave does not close a TCP connection. Since the ProSoft initiated the TCP connection, it is responsible for closing the connection. Even if the ProSoft isn't sending a Modbus packet and just opens a TCP connection, the ProSoft should also gracefully close the TCP connection. What may be happening is the ProSoft is not closing the TCP connections it is creating and the slave eventually runs out of sockets (Modbus/TCP servers have a fixed number of TCP connections they can accept). As I said before, after some time, the slave is supposed to recycle abandoned sockets and then be able to accept new connections. It may be doing that, but then the ProSoft is immediately creating yet another new connection, again using up all of the slave's sockets. Again, there's no need to guess what's happening. A Wireshark capture will show you exactly what is happening.
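The socket exhaustion scenario can be sketched as a small Python simulation. This is a toy model only: the class and function names, the limit of 6 sockets, the one-connection-per-second rate and the idle timeouts are all parameters picked for illustration, not measured behaviour of either device.

```python
class SlaveSocketPool:
    """Toy model of a Modbus/TCP server with a fixed number of sockets."""

    def __init__(self, max_sockets=6, idle_timeout_s=None):
        self.max_sockets = max_sockets
        self.idle_timeout_s = idle_timeout_s  # None means never recycle
        self.last_activity = {}               # connection id -> time last used

    def _reclaim(self, now):
        if self.idle_timeout_s is None:
            return
        stale = [c for c, t in self.last_activity.items()
                 if now - t >= self.idle_timeout_s]
        for c in stale:
            del self.last_activity[c]

    def accept(self, conn_id, now):
        """Return True if a socket was available for the new connection."""
        self._reclaim(now)
        if len(self.last_activity) >= self.max_sockets:
            return False
        self.last_activity[conn_id] = now
        return True


def first_refused(pool, interval_s=1.0, attempts=100):
    """Master opens a new connection every interval_s and never closes any.

    Returns the index of the first refused attempt, or None if all succeed.
    """
    for i in range(attempts):
        if not pool.accept(i, now=i * interval_s):
            return i
    return None


no_recycling = first_refused(SlaveSocketPool())
fast_recycling = first_refused(SlaveSocketPool(idle_timeout_s=5.0))
slow_recycling = first_refused(SlaveSocketPool(idle_timeout_s=10.0))
print(no_recycling, fast_recycling, slow_recycling)
```

With no recycling the seventh connection is refused. Recycling only helps if abandoned sockets are reclaimed faster than the master creates new ones, which is the race described above.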

ProSoft were surprised at the behaviour of the card setting up 7 TCP connections when using the "Yes" enable setting, which shouldn't occur, but suggested trying the "Conditional" and see what happens. We did, it works perfectly now. We both learned something there.
Typically if a vendor is "surprised" by something, this warrants more investigation. Personally, I wouldn't trust that an issue is actually fixed until the root cause is definitively determined and addressed. The solution you've found may simply be a temporary workaround, and the issue may still occur, just after a longer period of time.

It is, of course, your decision whether or not to spend additional time on this, but I still would recommend getting a Wireshark capture of the behavior when the ProSoft's write enable is set to "Yes".
 
We now have a full Wireshark capture of all the transactions before and after the "crash", totalling over 300 MB. The slave is not recycling abandoned sockets. It leaves them open indefinitely. The ProSoft should not be creating additional sockets; all transactions should go through the same original socket and no new ones need to be made, as the "read" socket should be the only one, then also used by writes too.

That is what puzzles Prosoft, they see no reason why this should be happening. The Wireshark captures have been shared with both OEMs.

We can see in the packets where the master writes to the slave. If the slave does not answer within 100 ms, the master breaks the connection, and the slave eventually runs out of sockets (since it does not close or reuse an abandoned socket). Initially it answers fine, but as time goes on it slows down, and that's when the problem starts. It just seems to run out of steam. When the master is set to write only on update from the PLC, if the value has changed in the memory map, all is well: only one connection is created and the slave always answers in time. Changing the timeout is easy, and that also potentially makes the communication more stable by allowing the slave ample time to reply.
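The effect of the response timeout can be illustrated with a generic TCP sketch in Python. This is not Modbus and not the ProSoft configuration: the loopback echo server, its 0.3 s reply delay and the 2 s "long" timeout are assumptions for the example. Only the 100 ms figure comes from the capture described above.

```python
import socket
import threading
import time


def slow_echo_server(listener, delay_s):
    """Accept connections one at a time and echo each request after delay_s."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        with conn:
            while True:
                try:
                    data = conn.recv(1024)
                    if not data:
                        break  # client closed the connection
                    time.sleep(delay_s)  # simulate a slave that is slow to answer
                    conn.sendall(data)
                except OSError:
                    break  # client gave up and dropped the connection


def request(address, payload, timeout_s):
    """Send one request and return the reply, or None if it timed out."""
    with socket.create_connection(address, timeout=timeout_s) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(1024)
        except socket.timeout:
            return None


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
address = listener.getsockname()
threading.Thread(target=slow_echo_server, args=(listener, 0.3), daemon=True).start()

short_timeout_reply = request(address, b"ping", timeout_s=0.1)  # gives up early
long_timeout_reply = request(address, b"ping", timeout_s=2.0)   # waits long enough
print(short_timeout_reply, long_timeout_reply)
```

With the short timeout the client abandons the connection before the reply is sent; with the longer one the same slow server answers in time on a single connection.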

The issue of time for further trials is one I can't push; the machine is a production system required to make product so now I have a working solution, I need to hand it back and let them get on with using it, no time left to play with it.

The OEMs may come back with a firmware update, they have the information now to investigate further and are doing that.
 
I wanted to say, thank you to all who contributed on the thread, helping me to get to an understanding of what was going on. I am truly grateful for your insight and time. Thank you all.
 
An update for anyone who was following this or has similar issues.

The ProSoft MVI56E-MNETC card can read from, or write to, any configured Modbus device (server), but the write mode has three possible selections: enabled, disabled and conditional.

We had the default, which is "enabled", but this causes the card to write the setpoint to the slave (in this case the Spyder module) every scan cycle. That created the problem with multiple TCP connections in the slave, crashing it when it runs out of the internal connections it can support, which is limited to 6 (one more and it will crash).

So the fix is to set the write function in the mnetc to "conditional" which means it only sends the value in the memory location if and when it changes. Also in this mode, the mnetc closes the connection when the value is sent.
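The write-on-change behaviour described for "conditional" mode can be sketched in a few lines of Python. This is only an illustration of the logic, not ProSoft's firmware; the class name and the setpoint values are made up for the example.

```python
class ConditionalWriter:
    """Send a value only when it differs from the last value sent."""

    def __init__(self, send):
        self._send = send        # callable that performs the actual write
        self._last_sent = None

    def update(self, value):
        """Call once per scan; returns True if a write was issued."""
        if value == self._last_sent:
            return False         # unchanged: nothing goes on the wire
        self._send(value)
        self._last_sent = value
        return True


sent = []
writer = ConditionalWriter(sent.append)
for setpoint in [12.5, 12.5, 12.5, 14.0, 14.0]:  # one value per scan cycle
    writer.update(setpoint)
print(sent)  # [12.5, 14.0]
```

Five scans produce two writes here, whereas writing on every scan would produce five, each one a chance for a slow slave to miss the response timeout.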

This works perfectly and stops the slave devices getting swamped with TCP connections - now all is well.
 