Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: add network addresses to the network port objects #21

Open
pcmoore opened this issue Nov 18, 2016 · 9 comments
Open

RFE: add network addresses to the network port objects #21

pcmoore opened this issue Nov 18, 2016 · 9 comments

Comments

@pcmoore
Copy link
Member

pcmoore commented Nov 18, 2016

At present the SELinux network port objects only include the protocol/port information and not the traditional address/protocol/port information traditionally used to specify a network communication endpoint. This can be problematic for policy writers who wish to differentiate between the same port with different addresses; external networks vs localhost is a common example.

pcmoore pushed a commit that referenced this issue Dec 15, 2016
When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.

irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21

Fixes: db89347 ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
pcmoore pushed a commit that referenced this issue May 17, 2017
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
@pcmoore
Copy link
Member Author

pcmoore commented Jun 15, 2017

@wrabcak @mgrepl @rhatdan can any of you remind me of the use case for this? I vaguely remember that you needed this ability to control access to Apache's management API?

@mgrepl
Copy link

mgrepl commented Jun 19, 2017

I believe we are talking about https://bugzilla.redhat.com/show_bug.cgi?id=1168044.

@stephensmalley
Copy link
Member

Seems like a job for secmark; with that, you can already control based on both port and IP address, yes?
name_connect is essentially a legacy check predating the introduction of secmark; it is strictly less expressive/flexible.
In any event, even if we added support for (IP address, port number) tuples for portcon statements, it sounds like you'd end up allowing this by default anyway because httpd wants to be able to connect to any IP address that it binds, and we can't know that a priori in the default policy.

@wrabcak
Copy link

wrabcak commented Oct 3, 2017

It looks like apache doesn't need httpd_graceful_shutdown anymore. I cannot reproduce it on my Rawhide machine. I'll contact apache developers and ask for more info but, for now We can fix BZ(1168044) by turning this boolean OFF by default.

@wrabcak
Copy link

wrabcak commented Oct 3, 2017

Apache maintainers confirmed that this is no longer needed in Fedora27+. I'll change default value of boolean and @pcmoore I leave decision on you if we close this issue or not.

@rhatdan
Copy link

rhatdan commented Oct 3, 2017

Well even though the primary need for this is gone, I still believe having the ability to control network addresses + ports would be a big improvement. If I could allow confined domains to connect to network ports but only over localhost...

@stephensmalley
Copy link
Member

Can already be done using secmark, so maybe we ought to actually use that for once? We have all of that infrastructure sitting around essentially unused except by custom policies.

@pcmoore
Copy link
Member Author

pcmoore commented Oct 3, 2017

Using secmark is messy, and definitely not as clean as a policy-only approach.

I'm also still of the opinion that port objects should be a addr/proto/port tuple, it is much more consistent with how we networking works.

I'm happy to hear that we've resolved the Apache issue, but I'm in no hurry to close this at the moment.

@Gunni
Copy link

Gunni commented Jun 15, 2018

This might be related although i have not tracked down the underlying code that causes that message.

caddyserver/caddy#2203

pcmoore pushed a commit that referenced this issue Oct 22, 2018
pid_task() dereferences rcu protected tasks array.
But there is no rcu_read_lock() in shutdown_umh() routine so that
rcu_read_lock() is needed.
get_pid_task() is wrapper function of pid_task. it holds rcu_read_lock()
then calls pid_task(). if task isn't NULL, it increases reference count
of task.

test commands:
   %modprobe bpfilter
   %modprobe -rv bpfilter

splat looks like:
[15102.030932] =============================
[15102.030957] WARNING: suspicious RCU usage
[15102.030985] 4.19.0-rc7+ #21 Not tainted
[15102.031010] -----------------------------
[15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
[15102.031063]
	       other info that might help us debug this:

[15102.031332]
	       rcu_scheduler_active = 2, debug_locks = 1
[15102.031363] 1 lock held by modprobe/1570:
[15102.031389]  #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
[15102.031552]
               stack backtrace:
[15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
[15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[15102.031628] Call Trace:
[15102.031676]  dump_stack+0xc9/0x16b
[15102.031723]  ? show_regs_print_info+0x5/0x5
[15102.031801]  ? lockdep_rcu_suspicious+0x117/0x160
[15102.031855]  pid_task+0x134/0x160
[15102.031900]  ? find_vpid+0xf0/0xf0
[15102.032017]  shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
[15102.032055]  stop_umh+0x46/0x52 [bpfilter]
[15102.032092]  __x64_sys_delete_module+0x47e/0x570
[ ... ]

Fixes: d2ba09c ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
pcmoore pushed a commit that referenced this issue Jan 27, 2020
Some QPs (e.g. XRC QP) are not tracked in kernel, in this case they have
an invalid res and should not be bound to any dynamically-allocated
counter in auto mode.

This fixes below call trace:
BUG: kernel NULL pointer dereference, address: 0000000000000390
PGD 80000001a7233067 P4D 80000001a7233067 PUD 1a7215067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 24822 Comm: ibv_xsrq_pingpo Not tainted 5.4.0-rc5+ #21
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
RIP: 0010:rdma_counter_bind_qp_auto+0x142/0x270 [ib_core]
Code: e1 48 85 c0 48 89 c2 0f 84 bc 00 00 00 49 8b 06 48 39 42 48 75 d6 40 3a aa 90 00 00 00 75 cd 49 8b 86 00 01 00 00 48 8b 4a 28 <8b> 80 90 03 00 00 39 81 90 03 00 00 75 b4 85 c0 74 b0 48 8b 04 24
RSP: 0018:ffffc900003f39c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88820020ec00 RSI: 0000000000000004 RDI: ffffffffffffffc0
RBP: 0000000000000001 R08: ffff888224149ff0 R09: ffffc900003f3968
R10: ffffffffffffffff R11: ffff8882249c5848 R12: ffffffffffffffff
R13: ffff88821d5aca50 R14: ffff8881f7690800 R15: ffff8881ff890000
FS:  00007fe53a3e1740(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000390 CR3: 00000001a7292006 CR4: 00000000003606a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 _ib_modify_qp+0x3a4/0x3f0 [ib_core]
 ? lookup_get_idr_uobject.part.8+0x23/0x40 [ib_uverbs]
 modify_qp+0x322/0x3e0 [ib_uverbs]
 ib_uverbs_modify_qp+0x43/0x70 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs]
 ib_uverbs_run_method+0x6be/0x760 [ib_uverbs]
 ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x18d/0x3a0 [ib_uverbs]
 ? get_acl+0x1a/0x120
 ? __alloc_pages_nodemask+0x15d/0x2c0
 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs]
 do_vfs_ioctl+0xa5/0x610
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x48/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 99fa331 ("RDMA/counter: Add "auto" configuration mode support")
Signed-off-by: Mark Zhang <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Reviewed-by: Ido Kalir <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Doug Ledford <[email protected]>
pcmoore pushed a commit that referenced this issue Feb 2, 2022
…enable

With HW tag-based kasan enable, We will get the warning when we free
object whose address starts with 0xFF.

It is because kmemleak rbtree stores tagged object and this freeing
object's tag does not match with rbtree object.

In the example below, kmemleak rbtree stores the tagged object in the
kmalloc(), and kfree() gets the pointer with 0xFF tag.

Call sequence:
    ptr = kmalloc(size, GFP_KERNEL);
    page = virt_to_page(ptr);
    offset = offset_in_page(ptr);
    kfree(page_address(page) + offset);
    ptr = kmalloc(size, GFP_KERNEL);

A sequence like that may cause the warning as following:

 1) Freeing unknown object:

    In kfree(), we will get free unknown object warning in
    kmemleak_free(). Because object(0xFx) in kmemleak rbtree and
    pointer(0xFF) in kfree() have different tag.

 2) Overlap existing:

    When we allocate that object with the same hw-tag again, we will
    find the overlap in the kmemleak rbtree and kmemleak thread will be
    killed.

	kmemleak: Freeing unknown object at 0xffff000003f88000
	CPU: 5 PID: 177 Comm: cat Not tainted 5.16.0-rc1-dirty #21
	Hardware name: linux,dummy-virt (DT)
	Call trace:
	 dump_backtrace+0x0/0x1ac
	 show_stack+0x1c/0x30
	 dump_stack_lvl+0x68/0x84
	 dump_stack+0x1c/0x38
	 kmemleak_free+0x6c/0x70
	 slab_free_freelist_hook+0x104/0x200
	 kmem_cache_free+0xa8/0x3d4
	 test_version_show+0x270/0x3a0
	 module_attr_show+0x28/0x40
	 sysfs_kf_seq_show+0xb0/0x130
	 kernfs_seq_show+0x30/0x40
	 seq_read_iter+0x1bc/0x4b0
	 seq_read_iter+0x1bc/0x4b0
	 kernfs_fop_read_iter+0x144/0x1c0
	 generic_file_splice_read+0xd0/0x184
	 do_splice_to+0x90/0xe0
	 splice_direct_to_actor+0xb8/0x250
	 do_splice_direct+0x88/0xd4
	 do_sendfile+0x2b0/0x344
	 __arm64_sys_sendfile64+0x164/0x16c
	 invoke_syscall+0x48/0x114
	 el0_svc_common.constprop.0+0x44/0xec
	 do_el0_svc+0x74/0x90
	 el0_svc+0x20/0x80
	 el0t_64_sync_handler+0x1a8/0x1b0
	 el0t_64_sync+0x1ac/0x1b0
	...
	kmemleak: Cannot insert 0xf2ff000003f88000 into the object search tree (overlaps existing)
	CPU: 5 PID: 178 Comm: cat Not tainted 5.16.0-rc1-dirty #21
	Hardware name: linux,dummy-virt (DT)
	Call trace:
	 dump_backtrace+0x0/0x1ac
	 show_stack+0x1c/0x30
	 dump_stack_lvl+0x68/0x84
	 dump_stack+0x1c/0x38
	 create_object.isra.0+0x2d8/0x2fc
	 kmemleak_alloc+0x34/0x40
	 kmem_cache_alloc+0x23c/0x2f0
	 test_version_show+0x1fc/0x3a0
	 module_attr_show+0x28/0x40
	 sysfs_kf_seq_show+0xb0/0x130
	 kernfs_seq_show+0x30/0x40
	 seq_read_iter+0x1bc/0x4b0
	 kernfs_fop_read_iter+0x144/0x1c0
	 generic_file_splice_read+0xd0/0x184
	 do_splice_to+0x90/0xe0
	 splice_direct_to_actor+0xb8/0x250
	 do_splice_direct+0x88/0xd4
	 do_sendfile+0x2b0/0x344
	 __arm64_sys_sendfile64+0x164/0x16c
	 invoke_syscall+0x48/0x114
	 el0_svc_common.constprop.0+0x44/0xec
	 do_el0_svc+0x74/0x90
	 el0_svc+0x20/0x80
	 el0t_64_sync_handler+0x1a8/0x1b0
	 el0t_64_sync+0x1ac/0x1b0
	kmemleak: Kernel memory leak detector disabled
	kmemleak: Object 0xf2ff000003f88000 (size 128):
	kmemleak:   comm "cat", pid 177, jiffies 4294921177
	kmemleak:   min_count = 1
	kmemleak:   count = 0
	kmemleak:   flags = 0x1
	kmemleak:   checksum = 0
	kmemleak:   backtrace:
	     kmem_cache_alloc+0x23c/0x2f0
	     test_version_show+0x1fc/0x3a0
	     module_attr_show+0x28/0x40
	     sysfs_kf_seq_show+0xb0/0x130
	     kernfs_seq_show+0x30/0x40
	     seq_read_iter+0x1bc/0x4b0
	     kernfs_fop_read_iter+0x144/0x1c0
	     generic_file_splice_read+0xd0/0x184
	     do_splice_to+0x90/0xe0
	     splice_direct_to_actor+0xb8/0x250
	     do_splice_direct+0x88/0xd4
	     do_sendfile+0x2b0/0x344
	     __arm64_sys_sendfile64+0x164/0x16c
	     invoke_syscall+0x48/0x114
	     el0_svc_common.constprop.0+0x44/0xec
	     do_el0_svc+0x74/0x90
	kmemleak: Automatic memory scanning thread ended

[[email protected]: whitespace tweak]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants