Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error enabling device tracker #22

Closed
smar000 opened this issue Jan 1, 2022 · 54 comments
Closed

Error enabling device tracker #22

smar000 opened this issue Jan 1, 2022 · 54 comments

Comments

@smar000
Copy link

smar000 commented Jan 1, 2022

Hi @travisghansen

Many thanks for this plugin. I have installed it (along with the corresponding opnsense plugin) and am getting the non device_tracker entities coming through, as expected.

However, wen trying to save after clicking "Enable Device Trackers" in the configuration options, I get a socket timeout error:

2022-01-01 06:25:59 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 435, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_middlewares.py", line 117, in impl
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/security_filter.py", line 60, in security_filter_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/forwarded.py", line 98, in forwarded_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/request_context.py", line 28, in request_context_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/ban.py", line 78, in ban_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/auth.py", line 181, in auth_middleware
    return await handler(request)
  File "/usr/src/homeassistant/homeassistant/components/http/view.py", line 137, in handle
    result = await result
  File "/usr/src/homeassistant/homeassistant/components/config/config_entries.py", line 215, in post
    return await super().post(request, flow_id)
  File "/usr/src/homeassistant/homeassistant/components/http/data_validator.py", line 62, in wrapper
    result = await method(view, request, *args, **kwargs)
  File "/usr/src/homeassistant/homeassistant/helpers/data_entry_flow.py", line 110, in post
    result = await self._flow_mgr.async_configure(flow_id, data)
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 252, in async_configure
    result = await self._async_handle_step(flow, cur_step["step_id"], user_input)
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 325, in _async_handle_step
    result: FlowResult = await getattr(flow, method)(user_input)
  File "/config/custom_components/opnsense/config_flow.py", line 195, in async_step_init
    return await self.async_step_device_tracker()
  File "/config/custom_components/opnsense/config_flow.py", line 237, in async_step_device_tracker
    arp_table := await self.hass.async_add_executor_job(
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 489, in get_arp_table
    response = self._exec_php(script)
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 64, in inner
    response = func(*args, **kwargs)
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 97, in _exec_php
    response = self._get_proxy().opnsense.exec_php(script)
  File "/usr/local/lib/python3.9/xmlrpc/client.py", line 1116, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/lib/python3.9/xmlrpc/client.py", line 1458, in __request
    response = self.__transport.request(
  File "/usr/local/lib/python3.9/xmlrpc/client.py", line 1160, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/local/lib/python3.9/xmlrpc/client.py", line 1173, in single_request
    resp = http_conn.getresponse()
  File "/usr/local/lib/python3.9/http/client.py", line 1371, in getresponse
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 280, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

Saving other configuration options (e.g. Scan Interval) works fine. The issue only appears with enabling device trackers.

I have tried repeatedly (including restarting opnsense/home assistant etc), but the error persists. I am using OPNSense version 21.7.7 and Home Assistant 2021.12.7.

Can you help by pointing me in the right direction to enable device trackers.

Many thanks!

EDIT: I am using root user to login to OPNsense.

@travisghansen
Copy link
Owner

Welcome! How many devices roughly would you say are on your network?

@smar000
Copy link
Author

smar000 commented Jan 1, 2022

Just did a quick check of current dhcp leases, and it is showing about 180.

@travisghansen
Copy link
Owner

Ok. Are you comfortable manually editing files of the integration by chance?

@smar000
Copy link
Author

smar000 commented Jan 1, 2022

Are you comfortable manually editing files of the integration by chance?

Sure!

@travisghansen
Copy link
Owner

https://github.com/travisghansen/hass-opnsense/blob/main/custom_components/opnsense/config_flow.py#L238

Change True to False then restart hass and try again to see if it’s any better.

@smar000
Copy link
Author

smar000 commented Jan 1, 2022

Worked!!

Thank you so much for the quick response, especially on new years' day!

Wishing you a very happy new year!

@smar000
Copy link
Author

smar000 commented Jan 1, 2022

One small feature request - would it be possible to sort the devices to track list by either the mac address or (preferably) the IP? As I have so many devices, it would make it easier to find the required devices. Not a big issue, so only if you have time.

Thanks again!

@smar000
Copy link
Author

smar000 commented Jan 1, 2022

Hi again

Unfortunately I seem to be having another issue. After adding 1 device in the device tracker list, the
HASS Integrations screen widget is showing an exclamation mark and Retrying Setup and Check the logs. The logs seem to only have the following warning:
2022-01-01 08:03:35 WARNING (MainThread) [homeassistant.config_entries] Config entry 'OPNsense.smar.co.uk' for opnsense integration not ready yet; Retrying in background

There are other debug messages in the log from opnsense that seem normal, e.g.
2022-01-01 08:03:20 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event entity_registry_updated[L]: action=update, entity_id=binary_sensor.opnsense_smar_co_uk_pending_notices_present, changes=disabled_by=config_entry>

Appreciate your help again.

Thanks.

EDIT: All the entities (including the non-device_tracker ones) are now also showing as Unavailable. Restart does not appear to help.
I also noticed the following in the debug logs:
2022-01-01 10:22:22 DEBUG (MainThread) [custom_components.opnsense] Finished fetching OPNsense.smar.co.uk OPNsense device tracker state data in 10.001 seconds (success: False)
Finally, disabling device tracker in the configuration returns everything back to normal (i.e. it all the non-device tracker entities are back online etc).

@travisghansen
Copy link
Owner

Yeah that was somewhat expected. Change this to False as well..

https://github.com/travisghansen/hass-opnsense/blob/main/custom_components/opnsense/__init__.py#L286

@smar000
Copy link
Author

smar000 commented Jan 1, 2022

That seems to have done the trick! Thanks again.

@travisghansen
Copy link
Owner

That flag tells the api to not resolve hostnames with arp table entries. So something about that in your setup is adding significant delay to the process as a whole.

@smar000
Copy link
Author

smar000 commented Jan 2, 2022

That flag tells the api to not resolve hostnames with arp table entries.

Yes, I had figured that that was the case. Do you have any ideas as to what might be causing this delay? I'm assuming it is not normal...?

@travisghansen
Copy link
Owner

It’s expected to take a little extra but it seems in some cases something is hanging somewhere. Others have experienced the same issue so you aren’t alone.

@smar000
Copy link
Author

smar000 commented Jan 3, 2022

Ok, good to hear that it is not just me! Appreciate all your help.

@smar000 smar000 closed this as completed Jan 3, 2022
@travisghansen travisghansen reopened this Jan 3, 2022
@travisghansen
Copy link
Owner

Want to leave this open for a bit. Can you try these 2 commands at the cli of opnsense and see how long they take?

time /usr/sbin/arp --libxo json -a
time /usr/sbin/arp --libxo json -an

@smar000
Copy link
Author

smar000 commented Jan 3, 2022

Sure.

time /usr/sbin/arp --libxo json -a comes back with 0.005u 0.025s 0:15.20 0.1% 140+1810k 0+0io 0pf+0w
and
time /usr/sbin/arp --libxo json -an responds almost instantly with 0.000u 0.006s 0:00.00 0.0% 0+0k 0+0io 0pf+0w

@travisghansen
Copy link
Owner

Well that's a bit crazy, the 1st is what gets executed when the value is set to False and the 2nd is what gets executed when the value is set to True. I wonder why it's takes so much time using the api..

@smar000
Copy link
Author

smar000 commented Jan 3, 2022

the 1st is what gets executed when the value is set to False and the 2nd is what gets executed when the value is set to True

Are you sure about that? With the first one, I am seeing hostnames in the json (e.g {"hostname":"esp32.smar.co.uk","ip-address":"172.16...), but with the second one I only see a question mark for hostname (e.g. {"hostname":"?","ip-address":"172.16.3.32"...).

@travisghansen
Copy link
Owner

That is correct, when I glanced at the code I missed the ! :(

In any case, both seem to be returning incredibly fast, I wonder why it would cause such a headache when trying to use that option via the api..

@alexdelprete
Copy link
Contributor

I wonder why it would cause such a headache when trying to use that option via the api..

Can we benchmark that through time+curl just to see the delay?

@travisghansen
Copy link
Owner

Not very easy no, but plopping a file on the hass filesystem shouldn’t be terrible. I’ll send an example when I get back to my desk.

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

In any case, both seem to be returning incredibly fast

If it helps, the first one (with the hostnames) is taking about 15 seconds in real life (even though 'real' value is showing 0.005u). The data comes in spurts - i.e. about a third of the data comes through, then it pauses for a few seconds, then another splurge of date, pause etc.

Is something timing out on the pauses?

@travisghansen
Copy link
Owner

Ah ok, so the real time is the 3rd value (15.20)?

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

Yes, I think so. Reading the FreeBSD man page, the 3rd value is apparently the system time (the 1st being 'real' time), so not quite sure how to interpret this! I used a separate stopwatch to reconfirm though, and yes, the actual elapsed time on the stopwatch appears to be the 3rd value (varies from 10 seconds to 15 seconds per run).

@travisghansen
Copy link
Owner

Ok well name resolution is done via standard nsswitch semantics from what I can tell so it sounds like some 5s timeout is getting hit 2 or 3 times. We may need to strace the call or something to see where it’s getting stuck.

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

I just realised what the cause of the slow down is.... I have Adguard in the middle which then forwards on to Unbound. Taking Adguard out of the equation, the same command now returns all the hostnames almost instantly: 0.016u 0.024s 0:00.35!

Apologies for wasting your time on this...

@alexdelprete
Copy link
Contributor

alexdelprete commented Jan 4, 2022

I have the same setup...AGH then unbound. We're talking about reverse lookups, right? Unless you have many filters/lists, it shouldn't take 15s. You could try disabling some lists/filters until you find the cause of the delay.

@travisghansen
Copy link
Owner

Definitely not wasted time! This is great debugging going on so thanks for the help! Others have reported the same or similar issues so this has been extremely helpful.

Let’s see if we can figure out what exactly in adguard is causing the delay and perhaps we can mention this issue in the README and possibly give advice on configuring adguard in an agreeable manner.

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

I have the same setup...AGH then unbound. We're talking about reverse lookups, right? Unless you have many filters/lists, it shouldn't take 15s. You could try disabling some lists/filters until you find the cause of the delay.

Thanks for confirming that you have the same set up and yours is working fine. This being the case, I just went through trying each option on Adguard (I didn't have many filter running earlier) and have found the culprit!

Let’s see if we can figure out what exactly in adguard is causing the delay and perhaps we can mention this issue in the README and possibly give advice on configuring adguard in an agreeable manner.

It was not the number of filters, but the default rate limit on Adguard. This was set to 20. Disabling it by setting to 0 solves the bottleneck and we are getting results returned in fractions of a second (0.029u 0.019s 0:00.19).

All good now :)

Thanks again.

@travisghansen
Copy link
Owner

I was wondering about exactly that given the 5s intervals! Very good! We’ll document.

In the mean time I’ll reach out to others who are potentially seeing the same issue and see if we have some matches.

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

It will be interesting to hear if theirs is the same issue.

Would you like me to close this issue now, or leave it open a bit longer?

@alexdelprete
Copy link
Contributor

alexdelprete commented Jan 4, 2022

It was not the number of filters, but the default rate limit on Adguard. This was set to 20. Disabling it by setting to 0 solves the bottleneck

Defaults of a critical service like AGH should be carefully chosen by devs. I remember it was one of the first parameters whose defaults I didn't like...

image

Glad you sorted this out. It's a good catch in case someone else has the same issue.

@travisghansen
Copy link
Owner

Let’s leave it open until it’s been documented etc.

@alexdelprete
Copy link
Contributor

I was wondering about exactly that given the 5s intervals! Very good! We’ll document.

When you see a fixed pattern, it's not a bug, it's by design. It was basically a throttle. :)

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

When you see a fixed pattern, it's not a bug, it's by design. It was basically a throttle. :)

Yes, that's what it felt like.

Let’s leave it open until it’s been documented etc.

Sure.

@alexdelprete
Copy link
Contributor

This debugging experience is the main reason why I love the open-source philosophy/approach. Great teamwork of people giving their time/experience for free with the sole scope of solving issues and learning new things while doing it...

I was not an advocate of open-source a few years ago...I was totally wrong...obviously...

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

This debugging experience is the main reason why I love the open-source philosophy/approach. Great teamwork of people giving their time/experience for free with the sole scope of solving issues and learning new things while doing it...

Yes, absolutely!

And a snapshot of the final result in the HA integration:
Webp net-resizeimage

@alexdelprete
Copy link
Contributor

If it works so good, I might enable device_tracker too. The ones I tried in the past didn't leave me confident, but Travis is a very good developer, I might give it another shot. :)

@smar000
Copy link
Author

smar000 commented Jan 4, 2022

It seems to be working well on my system (even without the hostnames earlier).

Travis is a very good developer,

Yes, very much so!

@travisghansen
Copy link
Owner

Everyone helps! The device tracker integration in this integration is quite unique in that I have direct control over the arp table and remove the entries. As long as you set the scan interval and consider home interval to sane values you should get very consistent and good results.

@alexdelprete
Copy link
Contributor

alexdelprete commented Jan 4, 2022

As long as you set the scan interval and consider home interval to sane values you should get very consistent and good results.

Could be worth a note about recommended values/defaults in the docs. I'll definitely follow your recommendations.

BTW Travis, I opened an issue yesterday, don't know if you saw it. No urgency obviously...

@travisghansen
Copy link
Owner

The values are personal preference for sure, but 30s and 300s (scan ever 30s and consider home for 5 minutes) or 60s and 600s seems like decent starting points.

@alexdelprete
Copy link
Contributor

alexdelprete commented Jan 4, 2022

Thanks. I'll try those...the scanning frequency is in addition to the component's polling for status info, correct?

@travisghansen
Copy link
Owner

The device tracker scans are in addition and are on an entirely separate scan interval from the main scans. Both loops collect distinct data points however.

@alexdelprete
Copy link
Contributor

Got it. Thanks.

@alexdelprete
Copy link
Contributor

alexdelprete commented Jun 5, 2022

@smar000 this issue can be closed right? :)

or maybe @travisghansen wants to keep it open for others with similar issues...

@smar000
Copy link
Author

smar000 commented Jun 5, 2022

@smar000 this issue can be closed right? :)

or maybe @travisghansen wants to keep it open for others with similar issues...

I had closed it earlier but @travisghansen wanted it to keep it open for a bit.

@travisghansen
Copy link
Owner

Yeah I’ll add some notes in the docs as discussed.

@travisghansen
Copy link
Owner

Note has been added to README.md about AdGuard.

@Iceman248
Copy link

I would recommend adding "Home" to end of "AdGuard" in the readme. I had to read through this to figure out it was AGH, not one of the other apps/services that are called AdGuard something or simply just AdGuard.

@travisghansen travisghansen reopened this Aug 17, 2022
@travisghansen
Copy link
Owner

updated README

@dayt47
Copy link

dayt47 commented Jan 18, 2023

Hey @travisghansen,

I got an error too when trying to activate the device tracker feature. I have Adguard Home installed on the opnsense and your opnsense plugin. My unbound listens at the port 5353 and adguard on 53.

When I disable adguard and set the unbound port to 53 (so unbound works on its own, standalone) and I activate the device tracker in opnsense configuration it works instantly without errors. But when trying to do the same with adguard activated AND rate limit is set to 0 it does not work unfortunately.

I tried your fix #22 (comment)
and it worked for me too. I don't know where the problem exactly is, but it seems so be still an adguard home problem. I also tried to activate device tracker with one small block list at AGH as suggested, didn't work either.

@travisghansen
Copy link
Owner

The trouble is when it resolves hostnames it does so for every entry in the arp table. If the DNS system used doesn't like the rate or chokes it will fail. I'm not super-excited about disabling the hostname lookup as the user experience for that generally is not great :(

Does it log errors to the console by chance? If so send them over.

@dayt47
Copy link

dayt47 commented Jan 18, 2023

Here is the output from the HA logs:

`This error originated from a custom integration.

Logger: aiohttp.server
Source: custom_components/opnsense/pyopnsense/init.py:121
Integration: OPNsense (documentation, issues)
First occurred: 20:47:32 (1 occurrences)
Last logged: 20:47:32

Error handling request
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 435, in _handle_request
resp = await request_handler(request)
File "/usr/local/lib/python3.10/site-packages/aiohttp/web_app.py", line 504, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.10/site-packages/aiohttp/web_middlewares.py", line 117, in impl
return await handler(request)
File "/usr/src/homeassistant/homeassistant/components/http/security_filter.py", line 60, in security_filter_middleware
return await handler(request)
File "/usr/src/homeassistant/homeassistant/components/http/forwarded.py", line 100, in forwarded_middleware
return await handler(request)
File "/usr/src/homeassistant/homeassistant/components/http/request_context.py", line 28, in request_context_middleware
return await handler(request)
File "/usr/src/homeassistant/homeassistant/components/http/ban.py", line 81, in ban_middleware
return await handler(request)
File "/usr/src/homeassistant/homeassistant/components/http/auth.py", line 236, in auth_middleware
return await handler(request)
File "/usr/src/homeassistant/homeassistant/components/http/view.py", line 136, in handle
result = await result
File "/usr/src/homeassistant/homeassistant/components/config/config_entries.py", line 241, in post
return await super().post(request, flow_id)
File "/usr/src/homeassistant/homeassistant/components/http/data_validator.py", line 73, in wrapper
result = await method(view, request, data, *args, **kwargs)
File "/usr/src/homeassistant/homeassistant/helpers/data_entry_flow.py", line 110, in post
result = await self._flow_mgr.async_configure(flow_id, data)
File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 286, in async_configure
result = await self._async_handle_step(
File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 374, in _async_handle_step
result: FlowResult = await getattr(flow, method)(user_input)
File "/config/custom_components/opnsense/config_flow.py", line 197, in async_step_init
return await self.async_step_device_tracker()
File "/config/custom_components/opnsense/config_flow.py", line 239, in async_step_device_tracker
arp_table := await self.hass.async_add_executor_job(
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/config/custom_components/opnsense/pyopnsense/init.py", line 545, in get_arp_table
response = self._exec_php(script)
File "/config/custom_components/opnsense/pyopnsense/init.py", line 88, in inner
response = func(*args, **kwargs)
File "/config/custom_components/opnsense/pyopnsense/init.py", line 121, in _exec_php
response = self._get_proxy().opnsense.exec_php(script)
File "/usr/local/lib/python3.10/xmlrpc/client.py", line 1122, in call
return self.__send(self.__name, args)
File "/usr/local/lib/python3.10/xmlrpc/client.py", line 1464, in __request
response = self.__transport.request(
File "/usr/local/lib/python3.10/xmlrpc/client.py", line 1166, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/local/lib/python3.10/xmlrpc/client.py", line 1179, in single_request
resp = http_conn.getresponse()
File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
response.begin()
File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.10/http/client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.10/ssl.py", line 1274, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/lib/python3.10/ssl.py", line 1130, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
`

EDIT: When using dnsmasq DNS I get the same error when trying to enable device tracker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants