Tracking issue: UISP integration and complex setups, relative to BracketQoS #140

Closed
thebracket opened this issue Oct 21, 2022 · 12 comments

@thebracket
Collaborator

thebracket commented Oct 21, 2022

Setting this up as a tracking issue while I poke at UISP integration a bit. My intent is to gradually fix these issues and offer them up to LibreQoS (rather than just handing out my Rust-based tool and keeping it updated separately).

My "playpen" for working on this is here: https://github.com/thebracket/LibreQoS/tree/uisp-integration . My intent is to tackle issues there, and then turn them into merge-able PRs for Libre. Oh, and I grabbed a copy of this book from my publisher (I get e-books super-cheap since I work for them) and re-learned Python. :-)

One of the areas that took a lot of work with BracketQoS was getting our UISP setup to work with it. We run a mixed vendor network, with UISP handling billing/CRM even for the parts that are running Cambium, Mimosa, Mikrotik and a few others. When I run the 1.3 UISP integration script against our network:

Once those were out of the way, I still ran into some issues:

  • ShapedDevices.csv only contains 59 devices. It should contain several hundred. (See main...thebracket:LibreQoS:uisp-integration for work in progress)
  • network.json shows a single node. The site it picked is one that doesn't contain any devices and isn't connected to the rest of the tree (long story short, someone jumped the gun and added it before we install the hardware in the coming weeks).
  • Many of our devices have more than one IP, but the current integration only looks at the one UISP picks as "management". That's particularly problematic for us, since we have a LOT of CPEs with separate management and traffic IPs.
  • There's no subnet support, which may be tricky and require some manual intervention. We have a few spots where a customer has an entire subnet, and we shape them collectively with a single speed limit in a single queue. For example, we provide lobby WiFi at a public housing facility. The lobby is handing out IPs in the range 100.64.20.0/24. I've been using the Trie support on our setup to lump them together (so we don't run NAT at the site; it goes to our egress NAT, making for a much faster/happier router and better queueing).
  • It looks like allowedSubnets is processed, but ignoredSubnets is not. We'll need that if we start processing all of the IPs found on devices, since we have about 600 different 192.168.15.0/24 subnets NATed at the customer. (fixed in Add integrationCommon.py - shared functionality #142 )
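To make the allowedSubnets/ignoredSubnets point concrete, here is a minimal sketch of filtering every IP found on a device rather than only the management address. The names `allowedSubnets` and `ignoredSubnets` come from the issue; the helper functions, their signatures and the sample addresses are assumptions for illustration, not the code in integrationCommon.py.

```python
import ipaddress


def is_shapable(ip, allowed_subnets, ignored_subnets):
    """Return True if ip is inside an allowed subnet and outside every ignored one."""
    addr = ipaddress.ip_address(ip)
    allowed = any(addr in ipaddress.ip_network(s) for s in allowed_subnets)
    ignored = any(addr in ipaddress.ip_network(s) for s in ignored_subnets)
    return allowed and not ignored


def shapable_ips(device_ips, allowed_subnets, ignored_subnets):
    """Keep every address on a device that passes the filter (hypothetical helper)."""
    return [ip for ip in device_ips
            if is_shapable(ip, allowed_subnets, ignored_subnets)]


# Hypothetical CPE: a customer-side NAT address plus a CGNAT traffic address.
ips = ["192.168.15.1", "100.64.20.7"]
allowed = ["100.64.0.0/10", "192.168.0.0/16"]
ignored = ["192.168.15.0/24"]
print(shapable_ips(ips, allowed, ignored))  # ['100.64.20.7']
```

Checking the ignored list after the allowed list means a broad allowed range can still have a reused customer-side subnet carved out of it.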

Niceties I'd like to try and arrange:

  • Some choice of topology. Bracket lets you pick "flat" (every customer parented off the root), "AP only" (APs are a top layer), "Site only" (sites are top level entries and every customer feeds off of the site) and "full" (which builds a complete topology graph between sites and maps the entire network).
  • The bane of my existence, relays always break topology. (A "relay" being a customer fed via another customer). BracketQoS occasionally fails on these. I swear my colleagues come up with new and interesting topologies to install every time I take a day off.
  • Suspended customers. One thing we found useful with Preseem - and ported over to our version of BracketQoS - was the ability to set a "suspended customers get this much Internet" option. We'd pick a low number, so their service sucked rather than being off altogether (helpful if they have VoIP and you don't want to cut off 911, and if your "pay your bill!" page is offsite)
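The topology choices above could be sketched roughly as follows. This is an illustration under stated assumptions, not Bracket's or LibreQoS's implementation: the mode strings, the `site`/`ap`/`name` keys and the `_customers` bucket are all hypothetical, and "full" is simplified to site → AP rather than a complete site-to-site graph.

```python
def build_tree(customers, mode):
    """Nest customers under a path chosen by the topology mode (hypothetical sketch)."""
    tree = {}
    for customer in customers:
        if mode == "flat":
            path = []                                   # every customer off the root
        elif mode == "ap_only":
            path = [customer["ap"]]                     # APs are the top layer
        elif mode == "site_only":
            path = [customer["site"]]                   # sites are the top layer
        elif mode == "full":
            path = [customer["site"], customer["ap"]]   # simplified: site, then AP
        else:
            raise ValueError(f"unknown topology mode: {mode}")
        node = tree
        for name in path:
            node = node.setdefault(name, {})
        node.setdefault("_customers", []).append(customer["name"])
    return tree


customers = [{"name": "cust1", "site": "TowerA", "ap": "AP1"}]
print(build_tree(customers, "site_only"))  # {'TowerA': {'_customers': ['cust1']}}
```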
@thebracket thebracket added the enhancement New feature or request label Oct 21, 2022
@thebracket thebracket self-assigned this Oct 21, 2022
@rchac
Member

rchac commented Oct 21, 2022

> My intent is to gradually fix these issues and offer them up to LibreQoS (rather than just handing out my Rust-based tool and keeping it updated separately).

Thank you!

> Some choice of topology. Bracket lets you pick "flat" (every customer parented off the root), "AP only" (APs are a top layer), "Site only" (sites are top level entries and every customer feeds off of the site) and "full" (which builds a complete topology graph between sites and maps the entire network).

That makes sense. I think adding a "flat" option would be great.

> The bane of my existence, relays always break topology. (A "relay" being a customer fed via another customer). BracketQoS occasionally fails on these. I swear my colleagues come up with new and interesting topologies to install every time I take a day off.

My solution has been to create a UISP site for each repeater PoP and have the host household as a client of that site. It's flexible and allows operators to have complex relays with multiple APs and such. Is this a reasonable workaround? If not we can try to have it better accommodate these relay site cases.


> Suspended customers. One thing we found useful with Preseem - and ported over to our version of BracketQoS - was the ability to set a "suspended customers get this much Internet" option. We'd pick a low number, so their service sucked rather than being off altogether (helpful if they have VoIP and you don't want to cut off 911, and if your "pay your bill!" page is offsite)

Hm, I just assumed suspension would be handled separately (we do redirect to payment portal via MikroTik) so I excluded suspended subscribers from even being shaped. This makes sense and wouldn't be that hard to implement. I think this is a good idea.
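A minimal sketch of how a "suspended customers get this much Internet" option might sit next to the current behaviour of not shaping suspended subscribers at all. The function name, parameters and units are assumptions for illustration; nothing here is taken from the LibreQoS code.

```python
def effective_rates(plan_down_mbps, plan_up_mbps, suspended, suspended_mbps=None):
    """Return the (download, upload) pair to shape at, or None to skip the subscriber.

    suspended_mbps is a hypothetical setting: when it is None, suspended
    subscribers are left out of shaping entirely, as described above.
    """
    if not suspended:
        return (plan_down_mbps, plan_up_mbps)
    if suspended_mbps is None:
        return None
    # Never give a suspended subscriber more than their normal plan.
    return (min(plan_down_mbps, suspended_mbps),
            min(plan_up_mbps, suspended_mbps))


print(effective_rates(100, 20, suspended=True, suspended_mbps=1))  # (1, 1)
```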

thebracket added a commit to thebracket/LibreQoS that referenced this issue Oct 21, 2022
Introduce new function, `isDeviceRoleValid` that accepts most device
types (and rejects "ap"). Use this instead of only accepting "router" as
a valid device role for end-points.

This tackles several corner cases in our UISP setup:

* We often have customers on non-Ubiquiti gear, with their IP addresses
  tracked by "other devices". For example, a Mimosa C5x (which only does
  bridge mode) might have an "Other" device for the CPE itself and a
  "Service IP" record tracking the customer's bridged address. (In our
  setup, there's some DHCP Option 82 magic to make this happen).
* We frequently track Cambium devices in a similar manner, but they are
  often in router/NAT mode - so we just need the device IP for shaping.
* Many of the older M5 devices show "station" in this field even though
  they are in "router" mode. I'm not sure if that's a bug in UISP.

This upped the number of discovered devices from 59 to a more realistic
539 when run against our WISP's UISP setup.

References: LibreQoE#140

I'm pretty sure this needs a bit more work before turning it into an
upstream patch.

Signed-off-by: Herbert Wolverson <[email protected]>
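
For readers following along, the role check described in the commit message might look something like the sketch below. Only the name `isDeviceRoleValid` and the "reject ap" rule come from the commit message; the body, the case-insensitive comparison and the treatment of a missing role are assumptions, and the commit's actual code may differ.

```python
def isDeviceRoleValid(role):
    """Accept any device role except access points (hypothetical sketch).

    Treating a missing or empty role as invalid is an assumption.
    """
    if not role:
        return False
    return role.strip().lower() != "ap"


for role in ("router", "station", "ap"):
    print(role, isDeviceRoleValid(role))
```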
@dtaht
Collaborator

dtaht commented Oct 21, 2022

Hilariously, I run out of bandwidth on cellular all the time; they actually rate limit it to about 2 Mbit/s with sane buffering,
and with cake in the way on my USB tether, I hardly notice. Videoconferences still "just work", web pages get slow, but I don't use the web much.

@thebracket
Collaborator Author

thebracket commented Oct 21, 2022

Suspension is an odd one. We work with a third party who provides VoIP to some of our customers, and they were pretty insistent on allowing 911 calls even if the Internet service is suspended. So we do the redirect also, but only for web traffic. (@dtaht would be able to do most things that weren't the web, and is smart enough to open a VPN... we don't block that right now, so he'd have free service until our installer shows up for the gear... it's not perfect, but it's working)

The "site" model for relays is how you should do it, and we used to do it that way. We have something like 75 site-to-site relays now, and it became really unwieldy. So we have a bunch of client sites linked to other client sites. It's pretty ridiculous, but if I don't support it I get grumbles from down the hall...

A funny one. So a non-profit gets a big circuit from us. Easy - client site off of a tower. They realize that they really should be two non-profits and put up a building on the same site - which just happens to be inaccessible due to terrain. So now there's a relay from charity 1 to charity 2. Initially in the same client site because Charity 1 wanted to pay for it all. Of course, time passes and Charity 1 is complaining that Charity 2 are using all their bandwidth so they've agreed to pay for their own. No biggie, now Charity 2 is a client site - with its own bandwidth tracking. Another charity (they tend to cluster) sets up shop next to Charity 2, and want a relay too. So now Charity 1 has a site with 3 client sites coming off of it. And it just keeps going. There's something like 5 charities, 2 of the manager's houses, a church and a barn all linked up - sometimes daisy chained. Ugh.

Edit: forgot to mention that they are all in a bowl-shaped valley with conservation department rules prohibiting tower construction.
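
Client sites chained off other client sites amount to a parent-pointer tree, and the usual failure modes are missing parents and accidental loops. A minimal sketch of walking such a chain is below; the `parent_of` mapping and the site names are hypothetical, not a UISP data structure.

```python
def chain_to_root(site, parent_of):
    """Follow parent links from site up to the root and return the path.

    parent_of maps a site name to its parent's name (None for the root).
    Raises ValueError if the links loop back on themselves.
    """
    path = [site]
    seen = {site}
    while parent_of.get(path[-1]) is not None:
        parent = parent_of[path[-1]]
        if parent in seen:
            raise ValueError(f"loop in relay chain at {parent}")
        seen.add(parent)
        path.append(parent)
    return path


# Hypothetical daisy chain: relay-c is fed by relay-b, which is fed by relay-a.
parent_of = {"relay-c": "relay-b", "relay-b": "relay-a", "relay-a": "tower", "tower": None}
print(chain_to_root("relay-c", parent_of))  # ['relay-c', 'relay-b', 'relay-a', 'tower']
```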

@rchac
Member

rchac commented Oct 21, 2022

I feel you there, building towers is pretty much a no-go where we are thanks to zoning, though we are considering OTARD hub towers to skirt around that. Tower construction limitations make these complex repeater setups inevitable. Given how many existing sites are already set up in UISP like that for your network, let's accommodate them going forward. =)

@dtaht
Collaborator

dtaht commented Oct 21, 2022

I think cake so saves your bacon on each hop here... but I imagine it is all NAT hell?

@thebracket
Collaborator Author

Not really NAT hell. There's a router at each site with links to other sites, with a "customer" port that provides connectivity to the customer. The routers relay DHCP requests from each router (adding option 82 data on the way) to ensure that whatever gets plugged into the customer's port receives the correct public IP.

I really should open source our "make option 82 work with UISP" setup, one day. In any client site, we set up an "other" device with a MAC address (equal to the port providing service's MAC), the name "Service IP" and the intended IP address as the device's address. A program periodically reads UISP and builds a DHCP configuration (ye olde isc-dhcpd) and hot-reloads it when it changes. Combine that with Bracket assigning queues to the customer and it's really seamless. Whatever the customer plugs in gets the right IP, and is shaped appropriately. There's even a small pool of IPs for each area into which "we've no record of you existing" devices get dumped (with short lease times) and redirect to a page reminding our installer to finish the process.
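
The "rebuild the config and only reload when it changes" loop described above can be sketched as follows. This is not the unpublished tool: the device dictionary keys, the function names and the one-line-per-record output are assumptions, and the rendering is a stand-in for real isc-dhcpd stanzas, which are deliberately not reproduced here.

```python
import hashlib


def service_ip_records(devices):
    """Collect sorted (mac, ip) pairs from devices named "Service IP"."""
    return sorted((d["mac"].lower(), d["ip"])
                  for d in devices if d.get("name") == "Service IP")


def render(records):
    """Render one 'mac ip' line per record (placeholder for dhcpd config syntax)."""
    return "".join(f"{mac} {ip}\n" for mac, ip in records)


def needs_reload(new_text, old_digest):
    """Return (changed, digest) for new_text against the last applied digest."""
    digest = hashlib.sha256(new_text.encode()).hexdigest()
    return digest != old_digest, digest


# Hypothetical device list as it might be read from UISP.
devices = [
    {"name": "Service IP", "mac": "AA:BB:CC:00:00:01", "ip": "203.0.113.10"},
    {"name": "CPE", "mac": "AA:BB:CC:00:00:02", "ip": "10.0.0.2"},
]
text = render(service_ip_records(devices))
changed, digest = needs_reload(text, old_digest=None)
print(changed)  # True on the first run; False if run again with the same digest
```

Sorting the records before rendering keeps the output stable, so a reload is only triggered by a real change rather than by the order in which devices happen to be returned.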

@dtaht
Collaborator

dtaht commented Oct 21, 2022

What y'all do is so different from my second-generation attempt in 2008. I wish I'd published it. I had had great pain with PPPoE in my first-generation network, and said screw it, used static IPv6 /48s as my underlying transport, allowed service or not based on the underlying radio MAC address, tunneled IPv4 under that, and split bandwidth up evenly (or so I thought) via SFQ. It was a minimum amount of service (5 Mbit) up to whatever was available, flat rate (well, I soaked the gringos and intended to subsidize the schools).

Was all you can eat, no complicated shaping needed. The CPE did their own DHCP for IPv4. Of course, no billing systems or decent shaping systems existed at the time either!

@thebracket
Collaborator Author

In my testbed, commit thebracket@5b57b9a contains a bit more work on this:

  • I've got ignoredIPs doing something.
  • I've added some wrappers to make it easier for me to reason about what's going on.
  • A good start on a flat topology.

@dtaht
Collaborator

dtaht commented Mar 19, 2023

@rchac @thebracket it looks like you have covered most of this. What haven't you covered?

@thebracket
Collaborator Author

thebracket commented Mar 19, 2023 via email

@thebracket
Collaborator Author

"Infrastructure" items (which may or may not be a good idea) and a good support-oriented long-term stats retention are the only remaining items on this. I don't think either is a 1.4 issue, changing the milestone.

@thebracket thebracket removed this from the v1.4 milestone Mar 23, 2023
@bile0026
Contributor

+1 for "suspension" feature. Must have for my network.
