Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result while running biolatency.py with flags option on kernel 4.9.266 #3587

Open
ismhong opened this issue Aug 19, 2021 · 4 comments

Comments

@ismhong
Copy link
Contributor

ismhong commented Aug 19, 2021

Environment: Linux kernel 4.9.266 on aarch64 platform
Issue: There is no "Write" operation while running biolatency.py with flags option on kernel 4.9.266, the workload is dd if=/dev/zero of=/mnt/sda1 bs=1M count=64, but it works fine with Linux kernel 5.10.46 on the same platform

The incorrect result attached at the end.

I have made some investigation and find the root cause.
Linux kernel 4.10 and later, the definition of cmd_flags of struct request in include/linux/blkdev.h is changed by patchset.

╔════════════════════════════════════╗                                      
║4.9.x and earlier struct request -> ║                                      
║             cmd_flags              ║                                      
╚════════════════════════════════════╝                                      
┌────────┬─────────────────────────────────────────────────────────────────┐
│ req_op │                            req_flag                             │
└────────┴─────────────────────────────────────────────────────────────────┘
 ▶──────◀ ▶───────────────────────────────────────────────────────────────◀ 
  3 bits                               61 bits                              
                                                                            
                                                                            
╔═════════════════════════════════════════════════════════════╗             
║                      4.10.x and later                       ║             
║                 struct request -> cmd_flags                 ║             
╚═════════════════════════════════════════════════════════════╝             
┌────────────────────────────────────────────────────────┬─────────────────┐
│                        req_flag                        │     req_op      │
└────────────────────────────────────────────────────────┴─────────────────┘
 ▶──────────────────────────────────────────────────────◀ ▶───────────────◀ 
                          56 bits                              8 bits       
                                                                            

There is a fix (#889) for bcc tool (biosnoop, biotop) related to this kernel changes.
The implementation of biolatency is passing the cmd_flags of struct reqeust back to user space, and Python part (https://github.com/iovisor/bcc/blob/master/tools/biolatency.py#L183) to analyze the meaning of cmd_flags.

I have made a local changes for biolatency to fit the data layout of cmd_flags for Linux kernel 4.9, and everything works correctly, but I am not sure how to make a coexist changes to fit both different kernel data layout.

For req_op, maybe we can reference to the changes of biosnoop/biotop to extract req_op in BPF program, but the definition of req_flags is totally different between two data layout.

I am wondering there is any more correct way to make a fix for this issue. Could you please give me some advice for it? Thank you.

4.9.266

(bcc)root@OpenWrt:/# biolatency -e -T -F
Tracing block device I/O... Hit Ctrl-C to end.
^C
15:58:50

flags = Idle-Metadata-Read
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 29       |************************************    |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 32       |****************************************|
      2048 -> 4095       : 12       |***************                         |
      4096 -> 8191       : 2        |**                                      |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 0        |                                        |
     65536 -> 131071     : 1        |*                                       |

flags = Read
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 7        |**                                      |
       128 -> 255        : 24       |*********                               |
       256 -> 511        : 49       |*******************                     |
       512 -> 1023       : 99       |****************************************|
      1024 -> 2047       : 11       |****                                    |

flags = Idle-NoMerge-Metadata-Read
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 2        |*                                       |
     65536 -> 131071     : 61       |****************************************|

flags = NoWait-Background-Idle-Priority-Read
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 4        |****************************************|
       128 -> 255        : 2        |********************                    |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1        |**********                              |

avg = 18477 usecs, total: 6208330 usecs, count: 336
@chenhengqi
Copy link
Collaborator

I have made a local changes for biolatency to fit the data layout of cmd_flags for Linux kernel 4.9, and everything works correctly, but I am not sure how to make a coexist changes to fit both different kernel data layout.

See #3540

@ismhong
Copy link
Contributor Author

ismhong commented Aug 19, 2021

Hi @chenhengqi
Thank you for getting back to me.

As far as I know, the KERNEL_VERSION() macro is used in BPF program.
The meaning of req_flag is parsed in user space Python part (https://github.com/iovisor/bcc/blob/master/tools/biolatency.py#L183).
How do we identify the kernel version in user space Python code?

In the meantime, the definition of rq_flag_bits is pretty different from 4.9.x and latest kernel.
Should we define two set of flag string in Python parts or is there any elegant solution for this? Thank you.

// 4.9.266
enum rq_flag_bits {
	/* common flags */
	__REQ_FAILFAST_DEV,	/* no driver retries of device errors */
	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */

	__REQ_SYNC,		/* request is sync (sync write or read) */
	__REQ_META,		/* metadata io request */
	__REQ_PRIO,		/* boost priority in cfq */

	__REQ_NOIDLE,		/* don't anticipate more IO after this one */
	__REQ_INTEGRITY,	/* I/O includes block integrity payload */
	__REQ_FUA,		/* forced unit access */
	__REQ_PREFLUSH,		/* request for cache flush */

	/* bio only flags */
	__REQ_RAHEAD,		/* read ahead, can fail anytime */
	__REQ_THROTTLED,	/* This bio has already been subjected to
				 * throttling rules. Don't do it again. */

	/* request only flags */
	__REQ_SORTED,		/* elevator knows about this request */
	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
	__REQ_NOMERGE,		/* don't touch this for merging */
	__REQ_STARTED,		/* drive already may have started this one */
	__REQ_DONTPREP,		/* don't call prep for this one */
	__REQ_QUEUED,		/* uses queueing */
	__REQ_ELVPRIV,		/* elevator private data attached */
	__REQ_FAILED,		/* set if the request failed */
	__REQ_QUIET,		/* don't worry about errors */
	__REQ_PREEMPT,		/* set for "ide_preempt" requests and also
				   for requests for which the SCSI "quiesce"
				   state must be ignored. */
	__REQ_ALLOCED,		/* request came from our alloc pool */
	__REQ_COPY_USER,	/* contains copies of user pages */
	__REQ_FLUSH_SEQ,	/* request for flush sequence */
	__REQ_IO_STAT,		/* account I/O stat */
	__REQ_MIXED_MERGE,	/* merge of different types, fail separately */
	__REQ_PM,		/* runtime pm request */
	__REQ_HASHED,		/* on IO scheduler merge hash */
	__REQ_MQ_INFLIGHT,	/* track inflight for MQ */
	__REQ_NR_BITS,		/* stops here */
};
// 5.13
enum req_flag_bits {
	__REQ_FAILFAST_DEV =	/* no driver retries of device errors */
		REQ_OP_BITS,
	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */
	__REQ_SYNC,		/* request is sync (sync write or read) */
	__REQ_META,		/* metadata io request */
	__REQ_PRIO,		/* boost priority in cfq */
	__REQ_NOMERGE,		/* don't touch this for merging */
	__REQ_IDLE,		/* anticipate more IO after this one */
	__REQ_INTEGRITY,	/* I/O includes block integrity payload */
	__REQ_FUA,		/* forced unit access */
	__REQ_PREFLUSH,		/* request for cache flush */
	__REQ_RAHEAD,		/* read ahead, can fail anytime */
	__REQ_BACKGROUND,	/* background IO */
	__REQ_NOWAIT,           /* Don't wait if request will block */
	/*
	 * When a shared kthread needs to issue a bio for a cgroup, doing
	 * so synchronously can lead to priority inversions as the kthread
	 * can be trapped waiting for that cgroup.  CGROUP_PUNT flag makes
	 * submit_bio() punt the actual issuing to a dedicated per-blkcg
	 * work item to avoid such priority inversions.
	 */
	__REQ_CGROUP_PUNT,

	/* command specific flags for REQ_OP_WRITE_ZEROES: */
	__REQ_NOUNMAP,		/* do not free blocks when zeroing */

	__REQ_HIPRI,

	/* for driver use */
	__REQ_DRV,
	__REQ_SWAP,		/* swapping request. */
	__REQ_NR_BITS,		/* stops here */
};

@chenhengqi
Copy link
Collaborator

How about making this change to https://github.com/iovisor/bcc/tree/master/tools/old ?

@yonghong-song
Copy link
Collaborator

@ismhong As @chenhengqi suggested, let us just put the tools supporting the old kernel into old directory with some comments on it. Otherwise, tools may become too complicated and hard to maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants