Raw file colors are being clipped due to Rawpy postprocessing auto brightness #86

Closed
atadams opened this issue Jun 3, 2024 · 9 comments

Comments

@atadams

atadams commented Jun 3, 2024

UPDATE: @MediaTruth opened an issue and is commenting on this issue. I want it to be known that he has blocked me even though I've never had any interaction with him. The block prevents me from commenting on his issue. I don't believe he is acting in good faith. I also suspect that he is actually @Ray9T using an alternate account.


When using Sherloq on a raw file, I noticed some questionable results, e.g., the same PCA mean vector values, and the results of the Signal Separation showed areas that appeared to have no noise. When looking at the Channel Histogram, the upper end of one channel was clipped.

This clipping appears to be due to Rawpy’s postprocessing automatically increasing the brightness (no_auto_bright=False is the default) and the ratio of clipped pixels (auto_bright_thr) defaulting to 0.01 (1%).

At the very least, shouldn’t auto_bright_thr be set to 0 on line 204 of utility.py so no pixels are clipped?
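As a rough way to quantify the clipping instead of reading it off the histogram, the fraction of saturated pixels per channel can be counted directly. This is only a minimal sketch, not Sherloq code: `clipped_fraction` and the synthetic `demo` array are hypothetical names, and it assumes an 8-bit image whose maximum value is 255.

```python
import numpy as np


def clipped_fraction(image, max_value=255):
    """Return, per channel, the fraction of pixels at or above max_value."""
    arr = np.asarray(image)
    pixels = arr.reshape(-1, arr.shape[-1])
    return (pixels >= max_value).mean(axis=0)


# Synthetic 8-bit image (hypothetical data): values stay below 200,
# then 2% of the first channel is forced to saturation.
rng = np.random.default_rng(0)
demo = rng.integers(0, 200, size=(100, 100, 3), dtype=np.uint8)
demo[:2, :, 0] = 255

fractions = clipped_fraction(demo)
print(fractions)
```

On a postprocessed raw image, a per-channel fraction close to the 1% allowed by the default auto_bright_thr would be consistent with the clipping described above, while a value of 0 would indicate that no pixels were clipped.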

Link to file in question

@atadams
Author

atadams commented Jun 10, 2024

To illustrate the issue, these are the results for a .CR2 raw file that is being clipped. Shown are the results with the default settings and when the auto_bright_thr parameter (ratio of clipped pixels) of the rawpy postprocess function is set to 0 in line 204 of utility.py.

204: image = cv.cvtColor(raw.postprocess(use_auto_wb=True, auto_bright_thr=0), cv.COLOR_RGB2BGR)

Histogram with default settings (red channel clipped)

default-histogram

Histogram with auto_bright_thr=0 (no clipping)

auto_bright_thr0-histogram

Signal Separation with default settings

default-signal-sep

The clipped areas are standing out.

Signal Separation with auto_bright_thr=0

auto_bright_thr0-signal-sep

The image appears more homogeneous.

Please let me know if there is anything else I can provide to help illustrate this issue, and thank you for your work on this app.

@atadams atadams closed this as completed Jun 10, 2024
@atadams atadams reopened this Jun 10, 2024
@Ray9T

Ray9T commented Jun 20, 2024

I checked, and it looks like the problem is with your images, atadams.
Sherloq does not clip the data; it's calling the standard libraries.

@Ray9T

Ray9T commented Jun 20, 2024

Your images do not look authentic. Have you checked your images before blaming the tools?
I looked at your image and it looks compressed, with JPEG compression artifacts.

The area with noise is how the pixel values are being read. It's likely the images were edited using tools like Photoshop. We often see this in edited images.

@GuidoBartoli
Owner

GuidoBartoli commented Jun 20, 2024

When using Sherloq on a raw file, I noticed some questionable results, e.g., the same PCA mean vector values, and the results of the Signal Separation showed areas that appeared to have no noise. When looking at the Channel Histogram, the upper end of one channel was clipped.

This clipping appears to be due to Rawpy’s postprocessing automatically increasing the brightness (no_auto_bright=False is the default) and the ratio of clipped pixels (auto_bright_thr) defaulting to 0.01 (1%).

At the very least, shouldn’t auto_bright_thr be set to 0 on line 204 of utility.py so no pixels are clipped?

Link to file in question

Hi @atadams, I also noticed a strange behavior in PCA average values in RAW images, thanks for reporting that. Maybe it is a bug, I will investigate it and let you know my findings!

GuidoBartoli added a commit that referenced this issue Jun 20, 2024
… 'PIL.JpegImagePlugin'` error (#72)

- Avoid clipping when importing RAW files (#86)
- Added Image Resampling tool (#88)
- Updated README.md
@GuidoBartoli
Owner

Should be solved by 37eb1c5.

@atadams
Author

atadams commented Jun 20, 2024

@GuidoBartoli, thanks for addressing this issue. This does resolve the clipping issues. But, after further investigation, I believe there is still one problem. The issue with the identical PCA means looked related to auto brightness, but is actually due to the image being cast to np.float32 in line 40 of pca.py.

x = np.reshape(image, (rows * cols, chans)).astype(np.float32)

When the mean is calculated, the float32 accumulator caps the sum of the pixel values at 2^32: at that magnitude adjacent float32 values are 512 apart, so adding an 8-bit pixel value (at most 255) no longer changes the sum. That capped sum is then divided by the number of pixels, which gives the erroneous mean. This isn't limited to raw files. It depends on the image size and the pixel values, i.e., the sums of larger and brighter (higher pixel values) images are more likely to reach the 2^32 limit.

You can test this by creating a large, solid light-colored image and noting its RGB values (I used 180, 220, 241).

image

When you run the PCA tool on the image, the means should equal the RGB values of the solid color, but instead all three come out as the same wrong value. To further illustrate the issue, if you divide 2^32 by the number of pixels, you will get the displayed erroneous mean.

(2^32)/(4500*6000) = 159.07286

Changing the np.float32 to np.float64 will resolve the same PCA means issue as shown by the displayed means matching the image's RGB values.

image
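The rounding behavior behind this can be checked without a 27-megapixel image. The snippet below is only a sketch of the float32 arithmetic, assuming the sum is accumulated one pixel at a time in float32; it does not reproduce Sherloq's actual PCA code, and the names `stalled`, `n_pixels`, and `stalled_mean` are made up for illustration.

```python
import numpy as np

# A float32 running sum that has reached 2^32.
stalled = np.float32(2**32)

# Distance to the next representable float32 value at this magnitude.
gap = np.spacing(stalled)
print(gap)  # 512.0

# Adding any of the test image's 8-bit channel values is rounded away,
# so the float32 sum stays at 2^32.
for value in (180, 220, 241):
    assert stalled + np.float32(value) == stalled

# A float64 accumulator still registers the increment exactly.
assert np.float64(2**32) + 241 == 2**32 + 241

# Mean reported once the sum has stalled, for a 4500 x 6000 image.
n_pixels = 4500 * 6000
stalled_mean = float(stalled) / n_pixels
print(round(stalled_mean, 5))  # 159.07286
```

Because every channel's sum stalls at the same ceiling, all three means end up at the same value regardless of the true channel averages.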

I believe we mistook this for a raw file issue because raw files are typically large and because rawpy brightens and white balances the image when postprocessing is done. This increases the pixel values, making it more likely that the 2^32 limit will be reached when summed for the mean calculation.

@MediaTruth

MediaTruth commented Jun 20, 2024

@GuidoBartoli for context,
The image shared by Adam is part of a larger collection of synthetic (fake) raw images that circulate on social media to disseminate disinformation. These images remain ‘unverified’ by professionals, and Canon has determined that some CR2 files are manipulated and presented as legitimate CR2 RAW images.

Sherloq serves a dual purpose: it helps communities identify fabrications in images while educating them about image manipulation on social media. Additionally, it also perfectly corroborates issues related to images found by tools like Darktable.

Interestingly, alias ‘atatams’ is part of the group of individuals involved in this disinformation campaign. They have targeted Sherloq, its volunteers, and its creator on Twitter spaces, and are now propagating further disinformation by fabricating bugs.

My intention is to provide context on these images, and their plan to inundate with fake bugs to discredit the application.

GL9pZZxbIAEzM1f

@under-score

Thank you, @MediaTruth. AFAIK, Canon CR2 RAW can be easily faked, while I have never heard of a Nikon NEF hack. Maybe C2PA content credentials could be included in Sherloq if it is more widely distributed.

@GuidoBartoli
Owner

@GuidoBartoli for context, The image shared by Adam is part of a larger collection of synthetic (fake) raw images that circulate on social media to disseminate disinformation. These images remain ‘unverified’ by professionals, and Canon has determined that some CR2 files are manipulated and presented as legitimate CR2 RAW images.

Sherloq serves a dual purpose: it helps communities identify fabrications in images while educating them about image manipulation on social media. Additionally, it also perfectly corroborates issues related to images found by tools like Darktable.

Interestingly, alias ‘atatams’ is part of the group of individuals involved in this disinformation campaign. They have targeted Sherloq, its volunteers, and its creator on Twitter spaces, and are now propagating further disinformation by fabricating bugs.

My intention is to provide context on these images, and their plan to inundate with fake bugs to discredit the application.

I don't care about @atadams's social background or the politics that may be behind what you are talking about. I evaluate issues purely from a technical point of view, and I can confirm that I find his remarks sensible. I have modified the code to disable the clipping introduced by rawpy's default options and increased the accuracy of the PCA calculation, so I consider this issue closed.


5 participants