Error Sending Event: Read Timed Out

Grant_Slender · July 26, 2021, 7:16am

I’m getting these too… actually happens regularly, and to the point that events from devices commonly fail to show correctly in the dashboard.

josh · July 26, 2021, 2:42pm

As a first step, I would try rebooting the hub as mentioned above.

If you have custom code running on the hub, the Hubitat team usually recommends temporarily disabling this custom code to see if the issue is resolved, then re-enabling it one piece at a time to identify the culprit.

If you’re seeing this regularly, it could be indicative of a larger issue on the hub (or intermittent network connectivity).

Grant_Slender · July 28, 2021, 7:49am

I don’t believe it is the hub or network, but I’ll let you decide if there is anything else that can be logged or reported in the app that might cause these types of log events…

Grant_Slender · July 28, 2021, 7:50am

Grant_Slender · July 28, 2021, 7:56am

Also, as an example of odd event behaviour … the following sensor has windGustMaxDaily that is showing in Hubitat as being 20.2, but in SharpTools it’s only 13 and was last updated a few days ago!? And yet other attributes are updating fine in the same device???

BTW - the hub was rebooted today, so that’s not fixing it

EDIT - 3 hours later

…and yet now it’s decided to update!? Now it’s correctly reporting the right attribute value.

Something is wrong and I’m not convinced it’s the hub or my network - nothing else has any issues and without any more logging details, it isn’t obvious what’s causing it

josh · July 28, 2021, 3:30pm

The ‘Read timed out’ part is the error message that the hub is reporting. It’s basically a try/catch block around the HTTP request and thus is at the mercy of what the hub reports.
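As a rough illustration of that pattern (not the actual SharpTools app code, which runs as Groovy on the hub), here is a minimal Python sketch of an HTTP request wrapped in a catch-all handler. The function name `send_event`, the `opener` parameter, and the log wording are all hypothetical. The point is that the log line can only repeat whatever message the lower layer raised.

```python
import json
import urllib.request


def send_event(url, event, timeout=10, opener=urllib.request.urlopen):
    """POST one event; return a log line instead of raising on failure."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            return f"Event sent: HTTP {response.status}"
    except Exception as exc:
        # The wrapper knows nothing beyond what the lower layer raised,
        # so the log line simply repeats that message.
        return f"Error sending event: {exc}"
```

If the underlying call raises a timeout whose message is "Read timed out", the returned line is "Error sending event: Read timed out", with no further detail about why the response was late.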

I’ve split this out from the original post since they are two different messages. The message in the other thread was that the hub wasn’t authorized to post data to SharpTools – a much more clear and specific error.

Unfortunately, the “Read timed out” message is a really generic error message from the Groovy/Java stack. I’ve only seen rare reports of this with the Hubitat-SharpTools integration and in each of those cases it’s ended up being intermittent hub slow downs or network latency.

When these types of hard-to-troubleshoot issues come up, I’ve seen the Hubitat team recommend temporarily disabling custom code to help rule out the impact of those apps/devices.

–

Hubitat also has some tools for troubleshooting hub load, but even Hubitat doesn’t have these “officially” documented. There are some unpublished endpoints for getting free memory and things of that nature, but my gut feeling is that’s too deep for a first troubleshooting step.

It might be worth at least looking at the Runtime Stats for Devices/Apps from the hub – especially if you’re running custom code (though temporarily disabling custom code as noted above is a recommended troubleshooting step).

Grant_Slender · July 28, 2021, 7:29pm

Hi Josh,

With all respect, I believe you’re wrong.

I’ve done a lot of network socket development in Java, and the cause rarely, if ever, ends up being mysterious performance of the host or network stability. TCP is a very robust protocol, and it requires a significant network or host error to cause failure. A read timeout is due to the remote end (your end) failing to send any bytes when the near end (my end) is left expecting a byte (or bytes). This can occur for a few reasons - the most common being that your HE app’s logic is broken, or the remote server logic is broken. A lost or missing packet isn’t possible with TCP - retransmission of the packets is built into the stack to handle CPU or network congestion.

Up to you what action you take, but I’m growing increasingly reluctant to stick with SharpTools because occasional update failures seem to leave the dashboard in an inconsistent state compared to the hub. You wouldn’t be the first app developer who doesn’t have a solid understanding of the network stack, and I’ve encountered this line of thinking before - ie it can’t be my code because I write good code, but the network is mysterious and so I’m sure that’s the cause of any network bugs.

I firmly believe that something isn’t okay and you’re casually dismissing a core bug.

I’d be happy to help you troubleshoot - can you outline the architecture of how the hub gets data into your servers? Is it HTTP? If so, is it a POST, and is it polling regularly or what? Is it one request per event with a fresh session, or does it leave a lingering HTTP session that is updated with new event data as it arrives?

G

josh · July 28, 2021, 9:35pm

Sorry to hear that things aren’t working as expected - I can understand how frustrating that can be.

I’m sure we would see a lot more posts about it on this community if it wasn’t a rare or one-off occurrence. We have seen very few reports like this in the past and in each case it ended up being a “canary in the coalmine” for larger issues brooding with the hub which is why I’ve suggested the troubleshooting steps above.

Have you tried those troubleshooting steps or had a chance to check the metrics?

Grant_Slender · July 28, 2021, 10:03pm

Yes. There are no other app, service or driver issues occurring across any other app or system. Even Homebridge is working flawlessly. The only “system” that is having issues is the SharpTools app. If I were to remove all of the custom code there wouldn’t be much left to test, leaving the point of a robust dashboard solution irrelevant - that’s seriously a lame troubleshooting response. Removing anything non-standard isn’t an okay response in an ecosystem chock full of custom and community-driven solutions.

How thorough is the canvassing of the user community with regards to reporting of this issue? I’m only talking about occasional lost events - most users would be unlikely to be checking logs, so I’m not sure that waiting for a higher number of support issues is the only valid way to determine a problem.

Riddle me this - why, technically, do you believe the read timeout is occurring? What bytes is it expecting from the remote end that it is totally okay to have mysteriously lost when running on a guaranteed network protocol like TCP? Isn’t it more likely that some app logic has failed to consider a condition that occurs in some rare situation?

Armand_Welsh · July 28, 2021, 10:10pm

Actually no. A timeout is due to the client-side code not receiving the response in a timely manner. Not receiving data at all is a totally different thing. We do not know what the timeout is set to in the HE drivers. I cannot say if they are using httpGet or asyncHttpGet.

The timeout is a client-set parameter to prevent an infinite wait loop. I have seen very odd behaviors on Hubitat with respect to the handling of messages. When the hub slows down due to heavy loads, the time it takes for the httpGet to receive the inbound data increases. Here is the method they use (if httpGet):

httpGet

Send an http GET request. Any response from the call will be passed to the closure.

Signature

void httpGet(String uri, Closure closure)

void httpGet(Map params, Closure closure)

Parameters

uri - The full uri to send the request to.

params - the parameters to use to build the http GET call. Possible values:

uri - The uri to send the request to

queryString - The raw, already-escaped query string.

query - Add these parameters to the existing query string. If any of the parameters already exist in the query, these values will not replace them. Multiple values for the same query parameter may be added by putting them in a list.

headers - Request headers

path - The path component of this request. The value may be absolute or relative to the current path.

contentType - The content-type used for any data in the request body, as well as the Accept content-type that will be used for parsing the response.

requestContentType - Assign a different content-type for the request than is expected for the response.

timeout (since 2.0.9) - timeout in seconds for the request, max timeout is 300

textParser (since 2.1.1) - possible values: true, false. If set to true, the response will be parsed as plain text, if false the system will attempt to determine the content type and parse the response into an object. Defaults to false.

ignoreSSLIssues (since 2.1.8) - possible values: true, false. Ignores certificate issues for SSL connections. Cert does not have to be from a trusted authority and the hostname does not need to be verified. This is primarily for dev situations that make use of localhost, build, and test servers. Defaults to false.

closure - code to handle a successful HTTP response. an object of type HttpResponseDecorator is passed to this code.

On my LAN, my TVs typically send me all my data in less than 1 second full round trip, so I set my timeout to 3 seconds, which is way more than enough time. Turns out, for a LAN-to-LAN transmission over wired ethernet, while the hub is under load, some responses were timing out. I traced this down to the response taking more than 10 seconds. Meanwhile, my desktop could pull the data in millisecond time via my REST client tool.

It is not fair to put the timeout of the HE client on the sharptools services when no-one but HE developers even knows what they have set for their timeout. In addition, none of this tells us what your network queuing looks like. If you have QOS, these packets could be discardable. If you are streaming movies, maybe there is a lot of inbound congestion, and the queues are backed up at the ISP? The point is that the timeout is defined in the client, and this can only be tuned in the client.
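A minimal Python sketch of how a read timeout arises (plain sockets on localhost, not Hubitat or SharpTools code; the helper names `start_slow_server` and `read_with_timeout` are made up for illustration). The same slow responder produces either a timeout or a normal reply depending only on how long the client is willing to wait. The sketch also shows that the delay itself originates on the responding side.

```python
import socket
import threading
import time


def start_slow_server(delay_seconds):
    """Start a one-shot local TCP server that replies after a delay."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn:
            time.sleep(delay_seconds)  # simulate a slow or busy responder
            try:
                conn.sendall(b"ok")
            except OSError:
                pass  # the client may already have given up
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def read_with_timeout(port, timeout_seconds):
    """Connect, then wait up to timeout_seconds for the first bytes."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.settimeout(timeout_seconds)
        try:
            return client.recv(16)
        except socket.timeout:
            return None  # the read timed out; nothing arrived in time
```

With a server delay of 0.5 seconds, `read_with_timeout(port, 0.1)` gives up and returns `None`, while a client allowing 2 seconds receives the reply. No packets are lost in either case; the bytes just arrive later than the shorter limit allows.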

I would recommend asking HE developers about the timeout setting, and seeing if it can be increased somehow.

Grant_Slender · July 28, 2021, 10:29pm

…and that condition can be generated by a server failing to respond to a client request. You can’t always blame client side for a client-server application. I see this flawed logic from app developers all the time.

A read timeout is just that. A client was waiting on more bytes that were never received. It isn’t always due to the network or host CPU, and if the app logic expects guaranteed delivery, and you don’t handle this condition gracefully then your logic is flawed.

Network stacks will eventually deliver packets. The client or host ends need to handle conditions where bytes may or may not be delivered in expected timeframes because TCP is guaranteed delivery but not in a specific timeframe. Even QoS doesn’t guarantee a time for delivery - it’s just a priority queue.

SMTP has this logic built into it and we don’t blame end user devices for the failed email delivery - the application logic handles re-delivery and notifications after an extended period of retries fails.
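For illustration, a minimal Python sketch of that kind of application-level retry (the function `deliver_with_retry` and the `send` callable are hypothetical, and nothing in this thread confirms how the SharpTools app actually handles failures). Each event is retried with exponential backoff, and anything still failing is returned so it can be reported rather than silently dropped.

```python
import time
from collections import deque


def deliver_with_retry(events, send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Send events in order, retrying each one with exponential backoff.

    `send` is a hypothetical callable that raises TimeoutError on failure.
    Returns the events that still failed after max_attempts tries.
    """
    pending = deque(events)
    undelivered = []
    while pending:
        event = pending.popleft()
        for attempt in range(max_attempts):
            try:
                send(event)
                break  # delivered; move on to the next event
            except TimeoutError:
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)  # 0.5s, then 1s, ...
        else:
            undelivered.append(event)  # every attempt failed; report it
    return undelivered
```

One caveat with retrying after a read timeout: the server may have received and processed the request even though the response never arrived, so the receiving side needs to tolerate duplicate events.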

The SharpTools app is clearly lacking if a single HTTP Get is used and failure results in a missed set of events that never get delivered. That’s poor if that’s the approach taken here.

Armand_Welsh · August 4, 2021, 1:18am

The problem is that the timeout setting on the Hubitat app for SharpTools is set too low. I don’t know what it is set to, but if a timeout is being experienced, then it is too low for your network. Most of us don’t have this problem, but the timeout should be user configurable so that when it is a problem, it can be tuned.

The most likely cause of network timeouts is an overloaded hub. If the hub has too many I/O or CPU bound operations going on, it takes longer for the main message handler to receive the message and deliver it to the client app thread, which results in a timeout situation.

Have you considered looking at the performance metrics of your hub? You can see in the “Runtime Stats” what items are impacting the hub performance. Anything in red is excessive and should be looked at. Also, you can use the % of total and % of busy to see who the biggest offenders are of CPU resources. % of busy should always total 100%, since idle is not in there. But this can be used to see how much each device or app impacts the total load on the system.
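To make the two percentages concrete, here is a small Python sketch with made-up numbers. It assumes that "% of busy" is each item's share of the summed busy time and "% of total" is its share of hub uptime; that reading is an assumption drawn from the description above, not from Hubitat documentation, and the function name `load_shares` is hypothetical.

```python
def load_shares(busy_ms_by_name, uptime_ms):
    """Return each item's share of busy time and of total uptime, in percent."""
    total_busy = sum(busy_ms_by_name.values())
    return {
        name: {
            "pct_of_busy": 100 * busy / total_busy,  # shares sum to 100
            "pct_of_total": 100 * busy / uptime_ms,  # idle time is included
        }
        for name, busy in busy_ms_by_name.items()
    }
```

For example, with hypothetical busy times of 3000 ms and 1000 ms over 100000 ms of uptime, the first item is 75% of busy but only 3% of total. A large share of busy time matters mostly when the overall busy fraction is also high.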

Justin_Leonard · May 28, 2022, 2:02pm

I’m experiencing the read timeout errors now with alarming regularity. I haven’t looked into the culprit yet. And I’m definitely running a lot of custom code. Looks like I’ve got my work cut out for me to figure this out