CCC: Limited false flagging of non-corrupt JPG files
Open, Unbreak Now!, Public

Description

There have been two reports so far of TSB ccc tagging and logging non-corrupt files as corrupt. It also appears to be storing the wrong hash value for the images, even for their thumbnail sizes. So far, File:History of Carroll County, Indiana - DPLA - a7b297e0565accecf77382ce750032cf (page 457).jpg and File:Track near Tobelbach 2020-03-11 16.jpg have been affected.

Event Timeline

TheSandDoctor triaged this task as Unbreak Now! priority. May 7 2020, 6:14 PM
TheSandDoctor created this task.
TheSandDoctor created this object with visibility "Public (No Login Required)".

The first step in triage is to revert the use of thumbnails for large image sizes.

It appears that the script is somehow saving the wrong hash values for every image. Investigating.

TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 6:45 PM
TheSandDoctor added a commit: Restricted Diffusion Commit.
TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 8:18 PM
TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 8:22 PM
TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 8:36 PM
TheSandDoctor added a commit: Restricted Diffusion Commit.
TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 8:41 PM

It turns out that the hashes weren't an issue: the stored values were actually the change hash. So that isn't the problem here, by the looks of it.

Restored activity. I am going to add more logging for corrupt images, since I want to know the type of errors being caught.

TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 9:01 PM

I added logging for the specifics of the OSError triggered when corrupt images are identified. I will watch the logs closely to ensure they capture the sort of information I am looking for. Hopefully this leads to some useful insight if another false positive is detected.
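For illustration, a minimal sketch of what this kind of OSError logging could look like. This is not the code from the restricted commits. The logger name "ccc", the function check_image, and the decode callable are all hypothetical; decode stands in for whatever image-decoding call the script uses and is assumed to raise OSError on corrupt data.

```python
import logging

logger = logging.getLogger("ccc")  # hypothetical logger name


def check_image(path, decode):
    """Return True if decode(path) succeeds, False if it raises OSError.

    `decode` is a hypothetical stand-in for the script's image-decoding
    call; it is assumed to raise OSError when the data is corrupt.
    """
    try:
        decode(path)
    except OSError as exc:
        # Record the exception class, errno (often None for decoder
        # errors) and message, plus the traceback, so that different
        # failure types can be told apart when reviewing the logs.
        logger.warning(
            "corrupt image flagged: %s (%s, errno=%s): %s",
            path, type(exc).__name__, exc.errno, exc,
            exc_info=True,
        )
        return False
    return True
```

Only OSError is caught here, so any other exception type still propagates. Logging the exception class and message separately should make it easier to see whether a false positive came from a genuine decode failure or from something unrelated, such as a failed download or a truncated read.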

TheSandDoctor added a commit: Restricted Diffusion Commit. May 7 2020, 9:13 PM