Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active Display by Mouse Location #93

Merged

Conversation

cparish312
Copy link
Contributor

This PR is addressing #91

The issue was that when a window in in full screen mode on an external displayNSScreen.main is set to 1 (which is the main monitor) even if the active application is on the external monitor. Since this was method used to determine which display to record, an external monitor in fullscreen mode was never able to be recorded.

I attempted to use the active window frame to identify the active display ID but things got strange with the returned frame dimensions, and I could not identify a consistent way to get the correct display ID. This got me concerned about the onlyOCRFrontmostWindow functionality as it utilizes the window.frame to determine the region to crop. Two weird things occurred:

  1. Sometimes the height of the frame would be a number below 100 despite the window being much larger (actually happened most when the window was fullscreen)
  2. Moving the frame around and resizing it did not change the window.frame dimensions 😬

Also, since the window.frame will contain negative values when the window is on an external monitor, the cropImage functionality fails causing a fallback to full image OCR.

Sub ideal solution:
Use the mouse location to identify the active display.
Main considerations:

  1. Now the active application stored is not necessarily the one in the given frame. This makes the searching by application less functional. Looking at rewind.ai's product, I noticed that the active application is not always the image being shown on the timeline, and I'm assuming this is the reason why.
  2. To avoid cropping the wrong region when the active window in on the main monitor, but the active display is not the main monitor, cropping before OCR will only occur when the displayID is 1. Although not very clean, having the active window on an external monitor and the active display as the main monitor is handled already since cropImage will fail with the negative window frame values.

P.S. Maybe this is not an issue when there are no external monitors, but the funky window.frame results (even when the window was on the main display) sketched me out, so I will personally be turning off onlyOCRFrontmostWindow 🤷. Hopefully can find a good solution, but a small amount of research leads me to believe using the accessibility API may be the only robust option.

rem/remApp.swift Outdated Show resolved Hide resolved
@jasonjmcghee
Copy link
Owner

jasonjmcghee commented Apr 30, 2024

@cparish312 weird frame dimensions were probably due to scaling factors... which get weird and tricky when different monitors have different scaling factors - backingScaleFactor.

Also - what exactly does DisplayID being 1 mean?

@cparish312
Copy link
Contributor Author

cparish312 commented Apr 30, 2024

Okay cool I can look into the backingScaleFactor! Here is an example where I had all 3 windows on fullscreen:

App: Xcode
window.frame: (712.0, -1056.0, 1920.0, 54.0)
Screen: VG245, Display ID: 5, Coordinates: x=712.0, y=982.0, Width=1920.0, Height=1080.0

App: Firefox
window.frame: (-1208.0, -1080.0, 1920.0, 72.0)
Screen: HP 24mh, Display ID: 3, Coordinates: x=-1208.0, y=982.0, Width=1920.0, Height=1080.0

App: Messages
window.frame: (0.0, 37.0, 1512.0, 68.0)
Screen: Built-in Retina Display, Display ID: 1, Coordinates: x=0.0, y=0.0, Width=1512.0, Height=982.0

From my understanding (at least for my device and settings), DisplayID being 1 means that the active display is the main display of the computer.

Something else I came across is the Screens have SeparateSpaces setting that could impact a bunch of these APIs. I have mine set to true.

@cparish312
Copy link
Contributor Author

How do you feel about using the mouse for tracking the active display?

@jasonjmcghee
Copy link
Owner

jasonjmcghee commented Apr 30, 2024

I'm too ignorant of mac apis to be able to say it confidently, but i'd be very hesitant to rely on a specific display id to determine anything (displayId == 1) - i'd use == CGMainDisplayID()

re using mouse to track the active display - feels relatively (or even quite) sketchy. if i left my mouse on my external monitor, and used cmd + tab to swap over to my ide and start typing, i'd sure hope rem would record wherever i am "active", not necessarily where my mouse happens to be

@jasonjmcghee
Copy link
Owner

Could we do something like NSApplication.shared.keyWindow?.screen

@cparish312
Copy link
Contributor Author

I agree the mouse tracking is not ideal but I wonder if there's a good reason rewind decided to use it. My guess would be robustness given how fickle these API window approaches seem to be.

Just tried NSApplication.shared.keyWindow?.screen. NSApplication.shared.keyWindow is returning nil and I think it is because this only accesses the window within your application that is receiving keyboard events.

@cparish312
Copy link
Contributor Author

That said I am currently using my mouse for scrolling even though my active window is on a different monitor so the trade off seems to go both ways (for me at least)

@jasonjmcghee
Copy link
Owner

Maybe mouse event steals focus to wherever it is and keyboard event falls back to active window?

@jasonjmcghee
Copy link
Owner

We have global listeners for both that could trigger

@cparish312
Copy link
Contributor Author

Unfortunately, not seeing any way to get the active window from the actual keyboard event listener.

I'm a bit worried about this approach bringing back the original bug. Imagine you scroll over to an external monitor that is in full screen mode (start capturing on that monitor). Then, you click and start typing (starts recording your main monitor).

Really wish NSScreen.main didn't have this weird functionality (/bug)!

@jasonjmcghee
Copy link
Owner

Does CGMainDisplayID have the same problem?

@jasonjmcghee
Copy link
Owner

I think NSScreen.main is "the screen with the menu bar"

@cparish312
Copy link
Contributor Author

CGMainDisplayID() just returns 1. It's just the display ID of the main display

@cparish312
Copy link
Contributor Author

They should make a NSScreen.menuBar for that 😂

@cparish312
Copy link
Contributor Author

I mean another option is to completely rethink the multiple monitor recording where one possibility is just recording the displays with changes, but this could obviously get more tricky than it is worth at this stage (unless we came up with some elegant way to achieve this). Also this assumes the capturing and image comparison steps are relatively cheap computationally.

@cparish312
Copy link
Contributor Author

Overall I'm leaning towards mouse location unless you're strongly opposed (or we find the "real" NSScreen.main without having to use the Accessibility API)

@jasonjmcghee
Copy link
Owner

Can we make a setting that says "always record window with mouse"?

@cparish312
Copy link
Contributor Author

Yeah sure! And then otherwise it defaults to NSScreen.main?

@jasonjmcghee
Copy link
Owner

Yeah until we find a better solution!

let mouseLocation = NSEvent.mouseLocation
if let screen = NSScreen.screens.first(where: {$0.frame.contains(mouseLocation)}) {
if let screenID = screen.deviceDescription[NSDeviceDescriptionKey("NSScreenNumber")] as? NSNumber {
displayID = CGDirectDisplayID(screenID.uint32Value)
Copy link
Contributor Author

@cparish312 cparish312 May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put mouse first since it sometimes results a nil displayID when computer wakes up

guard displayID != nil else { return }

guard displayID != nil else {
logger.debug("DisplayID is nil")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some instances where recording will randomly stop so added these debugs

@cparish312
Copy link
Contributor Author

This is a great compromise! It is probably worth putting this in the docs somewhere as it would be frustrating to realize rem was never recording your external monitors in fullscreen.


let frameId = DatabaseManager.shared.insertFrame(activeApplicationName: activeApplicationName)

if settingsManager.settings.onlyOCRFrontmostWindow {
if settingsManager.settings.onlyOCRFrontmostWindow && displayID == CGMainDisplayID() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still applicable?

Copy link
Contributor Author

@cparish312 cparish312 May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so for mouse mode it is to make sure the image isn't cropped when the active window is on the main display but the recorded display is not the main display.

For active window mode it doesn't matter (for my set up at least) since the cropping step will fail because the window.frame will have negative indexes (Also the frame would be wrong either way). I briefly tried to recover this functionality for the external monitors but ran into the funky window.frame behaviors. Did you think that the cropping was working for external monitors previously?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funky window frame behaviors

Is the backingScalingFactor being properly applied to the window frames? that's usually the culprit.

I assume the window frames need to be repositioned based on the monitor.

I think there's a larger issue here.

So - should checking the active mouse mode uncheck the only ocr frontmost window setting?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Seems like they can't be used together)

Copy link
Contributor Author

@cparish312 cparish312 May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I mean theoretically it could work when both the mouse and active window are on display but would require some more complex handling (that we can't even currently do without consistently identifying the display of the active window).

Also, I think the reality is for people with multiple monitors most of the time the front window will be the entire display so I don't think the feature is a big save. I'm going to look at the window.frame problem with some fresh eyes and try the backingScalingFactor again but assuming that doesn't work one option could be ignoring settingsManager.settings.onlyOCRFrontmostWindow when there are multiple displays. Otherwise the whole thing gets a bit funky since the window is only ever cropped on the main display.

Copy link
Contributor Author

@cparish312 cparish312 May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With every window.frame exploration the mystery deepens. So It seems like it is possible to match up a window.frame to a screen location by making the y coordinate of the screen negative. However, I would be very hesitant to assume this is the case for all set ups.

Here is an example where I had all 3 windows on fullscreen:

App: Xcode
window.frame: (712.0, -1056.0, 1920.0, 54.0)
Screen: VG245, Display ID: 5, Coordinates: x=712.0, y=982.0, Width=1920.0, Height=1080.0
backingScaleFactor: 1.0

App: Firefox
window.frame: (-1208.0, -1080.0, 1920.0, 72.0)
Screen: HP 24mh, Display ID: 3, Coordinates: x=-1208.0, y=982.0, Width=1920.0, Height=1080.0
backingScaleFactor: 1.0

App: Messages
window.frame: (0.0, 37.0, 1512.0, 68.0)
Screen: Built-in Retina Display, Display ID: 1, Coordinates: x=0.0, y=0.0, Width=1512.0, Height=982.0
backingScaleFactor: 2.0

"Funky" behavior:

  1. Weird very small heights (sometimes) when in full screen. Ran two back to back tests without changing the code or moving the window and one had a height of 72 and one had 1080. Strange the widths always seem to be reasonable 🤷
  2. The window.frame doesn't change when the window is manipulated. I tried both moving it around and resizing it.
  3. I have a terminal open on each monitor and it seems to always return the window.frame of the terminal on the main monitor even if I am actively using the terminal on an external monitor.

From my perspective, the cleanest solution is to ignore the settingsManager.settings.onlyOCRFrontmostWindow setting when there are multiple displays. Based on some online research and my belief that rewind just uses mouse location and no window based OCR optimization, I'm not entirely sure a robust solution exists given the current APIs (without using the Accessibility API). unlost uses the Accessibility API to do window based OCR but it causes this annoying icon to appear on the active window. Also for most users with multiple monitors, I doubt it will be much of a compute save anyways.

What do you think?

Copy link
Owner

@jasonjmcghee jasonjmcghee May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only running ocr on the current window can be a pretty large savings especially for people using multiple monitors. if i have a 4k monitor, i can easily have 4-5 open windows with text. only doing 1 is a big deal. running rem 24/7 could/can be expensive from an energy perspective, so always working to reduce it as much as possible.

if you're using external monitors though - probably plugged in.

@cparish312
Copy link
Contributor Author

cparish312 commented May 2, 2024

One other thing worth noting is if I wake up and my mouse is on my one monitor that takes a second to get an active display (or if a monitor is unplugged), the shareableContent.displays.first(where: { $0.displayID == displayID }) will fail causing the screenshot loop to be exited. For wake up a 4 second or so time delay could fix this. I'm unsure how to elegantly handle the unplugging. One solution I thought of is when the display can't be retrieved instead of returning schedule another screenshot. Then you could add something to make sure it can only try 3 times in a row or so and after that it will return. Thoughts?

Also, thanks for all of the support with these updates!

@jasonjmcghee
Copy link
Owner

What do you mean by "the unplugging"? I'd say retry a few times and then stop remembering seems like a fine solution here!

I want to enable you to get this over the line, so is this an existing issue? Or does it cause problems as a result of this change set?

If the former, happy to get this in as is!

@jasonjmcghee jasonjmcghee merged commit 5449bcc into jasonjmcghee:main May 3, 2024
@cparish312 cparish312 mentioned this pull request May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants