Added documentation regarding bkpt and related registers #38

Open · wants to merge 1 commit into master
Conversation

LoekHabets (Contributor)

I have done some research into the breakpoint instruction and some of the undocumented registers intended to be used with it. I figured I should put my findings in the Addendum.

I'd like to bring some extra attention to this paragraph:

In case you want to debug using breakpoints and the maximum ioctl timeout of 1 second is too short, it is recommended to use the V3D QPU scheduler registers to schedule threads directly, with unlimited run time. Keep in mind that this somewhat breaks interrupts: although the V3D_DBQITC register works as intended, it is quickly reset by the Linux kernel, so reliable use of the host interrupt peripheral requires polling this register at a high rate. Luckily, interrupts are no longer needed when the mailbox interface is not used.

I am not sure this should be included at all. After all, this is a Linux-specific problem, and as a user-space program we do not have access to hardware interrupts anyway. However, it raises the question of whether direct scheduling could be added to vcio2 in the future, and whether IRQ functionality could then be brought back while retaining the option to run programs asynchronously or with very long timeouts.
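For context, the direct scheduling mentioned above boils down to writing each thread's uniforms address and program counter to the V3D user program request registers, as the hello_fft example does. A minimal sketch, assuming the V3D register block (0x20c00000 on BCM2835) has already been mmap'ed to v3d, and with register offsets taken from the VideoCore IV 3D Architecture Reference Guide:

```c
#include <stdint.h>

#define V3D_SRQPC  (0x430 / 4)  /* user program request: program address */
#define V3D_SRQUA  (0x434 / 4)  /* user program request: uniforms address */
#define V3D_SRQCS  (0x43c / 4)  /* user program request control/status */

static void run_direct(volatile uint32_t *v3d, unsigned num_qpus,
                       uint32_t code_bus, const uint32_t *unif_bus)
{
    /* Reset the error flag and the request/completion counters. */
    v3d[V3D_SRQCS] = (1 << 7) | (1 << 8) | (1 << 16);

    for (unsigned q = 0; q < num_qpus; ++q) {
        v3d[V3D_SRQUA] = unif_bus[q];  /* uniforms for this instance */
        v3d[V3D_SRQPC] = code_bus;     /* writing the PC enqueues the thread */
    }

    /* Poll the "requests completed" count in bits 16..23 of SRQCS.
     * There is no timeout here, which is exactly what makes this
     * attractive for breakpoint debugging. */
    while (((v3d[V3D_SRQCS] >> 16) & 0xff) != num_qpus)
        ;
}
```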

maazl (Owner) commented Jul 9, 2024

Hmm, direct scheduling has the major disadvantage that it may cause race conditions with other code accessing the hardware registers. Furthermore, polling hogs the ARM CPU. On the other hand, the mailbox interface of the firmware adds a considerable delay.

Maybe another approach would work as well.
First of all, you can specify the timeout when calling exec_qpu. But of course it makes no sense to block the mailbox channel for a long time.
There is another option, though: while an expired timeout means the call no longer waits for the interrupt, it does not stop the QPU from executing the code unless you switch the QPU power off. So this could be used for debugging. You just need to avoid the call to qpu_enable(false), or, in the case of vcio2, not close the driver handle.
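Roughly like this, as a sketch assuming the mailbox.c helpers that ship with the Raspberry Pi hello_fft example (mbox_open, qpu_enable, execute_qpu); with vcio2 the corresponding ioctl would be used instead:

```c
#include "mailbox.h"

/* `control` is the bus address of the num_qpus x (uniforms address,
 * code address) pairs, as usual for the execute_qpu mailbox call. */
void debug_run(unsigned num_qpus, unsigned control)
{
    int mb = mbox_open();
    qpu_enable(mb, 1);

    /* Let the call time out after 1000 ms; the QPU keeps executing
     * (and sitting at breakpoints) regardless. */
    execute_qpu(mb, num_qpus, control, 1 /* noflush */, 1000 /* timeout, ms */);

    /* Crucially: no qpu_enable(mb, 0) here. Powering the QPU off is
     * what would actually kill the still-running threads; with vcio2,
     * the equivalent is keeping the driver handle open. */
}
```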

I did not check whether a race-condition-free check for running code is possible. If it is, polling (at a low rate) could be implemented as an ioctl in vcio2 for long-running QPU code. Unfortunately, it might give false negatives when another QPU program starts within the polling interval.
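To illustrate, such a poll could compare the two wrapping 8-bit counters in V3D_SRQCS. This is only a sketch: the register layout is taken from the VideoCore IV 3D Architecture Reference Guide, and it assumes all user programs go through the SRQ queue:

```c
#include <stdint.h>

#define V3D_SRQCS  (0x43c / 4)  /* user program request control/status */

/* SRQCS bits 8..15 count requests made, bits 16..23 count requests
 * completed; if they differ, a user program is still in flight. */
static int qpu_busy(volatile uint32_t *v3d)
{
    uint32_t srqcs = v3d[V3D_SRQCS];
    uint8_t made      = (srqcs >> 8)  & 0xff;
    uint8_t completed = (srqcs >> 16) & 0xff;

    /* Note the counters are global: if another program enqueues work
     * between two polls, you cannot tell whose request completed,
     * which is the false-negative risk mentioned above. */
    return made != completed;
}
```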

LoekHabets (Contributor, Author) commented Jul 11, 2024

Okay, in that case I suppose it is better to remove that paragraph until there is a safer solution. Do you agree?

I am out of my depth when it comes to writing Linux kernel drivers, but do you think it is possible to route direct register access through vcio2 and guard the hardware registers with a global mutex? As far as I can tell, it is not necessary to poll the DBQITC register from the kernel driver, since it can handle the IRQ itself. Low-rate polling of the HLT register is fine, because that register never changes state until it (or RUN) is written to anyway. We just need to know which QPUs got which threads.
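Loosely what I have in mind, as a kernel-side sketch (vcio2_schedule_direct and the lock name are purely hypothetical):

```c
#include <linux/mutex.h>
#include <linux/io.h>

/* One global mutex that every path touching the V3D user program
 * request registers must hold. */
static DEFINE_MUTEX(v3d_reg_lock);

static int vcio2_schedule_direct(void __iomem *v3d,
                                 u32 unif_bus, u32 code_bus)
{
    mutex_lock(&v3d_reg_lock);
    writel(unif_bus, v3d + 0x434);  /* V3D_SRQUA: uniforms address */
    writel(code_bus, v3d + 0x430);  /* V3D_SRQPC: enqueues the thread */
    mutex_unlock(&v3d_reg_lock);
    return 0;
}
```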

Unfortunately, I have not yet found a way to influence, or even retrieve, which QPU a new thread is scheduled onto. Oddly enough, using the mailbox appears to schedule the QPUs in ascending order, while direct scheduling does so in descending order. Perhaps it prefers to schedule new threads onto unhalted QPUs first? I need to check this.

One point we are at risk of missing here is that by supporting the scheduling of new QPU threads while others are still running, we expose the fundamentally unsafe nature of the GPU's memory model: you must use the VPM to write data back into RAM. As far as I am aware, it is not possible to allocate sections of the VPM to individual QPUs in hardware beyond "all user programs share these 4 KiB". Different code may handle VPM safety in different ways, so hybrid scheduling is risky business unless there is some standardized way to do this. I have been thinking about writing a shared .qinc file containing macros that "safe" programs can use, but that is beyond the scope of this discussion for now.
