Commons-Corruption-Checker (Project)
Active · Public

Recent Activity

May 8 2020

TheSandDoctor closed T138: CCC: Followup.py needs to handle deleted images automatically as Resolved by committing Restricted Diffusion Commit.
May 8 2020, 9:00 PM · Commons-Corruption-Checker
TheSandDoctor triaged T138: CCC: Followup.py needs to handle deleted images automatically as Normal priority.
May 8 2020, 8:58 PM · Commons-Corruption-Checker

May 7 2020

TheSandDoctor added a comment to T136: CCC: Limited false flagging of non-corrupt JPG files.

I added logging for the specifics of the OSError triggered when corrupt images are identified. I will watch the logs closely to ensure they capture the sort of information that I am looking for. Hopefully this leads to some useful insight if another false positive is detected.

May 7 2020, 9:08 PM · Commons-Corruption-Checker
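
The kind of logging described in the comment above might look roughly like the sketch below. It is not the bot's actual code: `check_with_logging`, the logger name, and the injected `verify` callable are all hypothetical, and the real check (for example an image library call that raises OSError on a bad file) is assumed rather than shown.

```python
import logging

logger = logging.getLogger("ccc.sketch")  # hypothetical logger name


def check_with_logging(title, verify):
    """Run verify() and log the details of any OSError it raises.

    `verify` is a placeholder for whatever call decodes the image.
    Returns True when verify() completes, False when it raises OSError.
    """
    try:
        verify()
    except OSError as exc:
        # Record the concrete exception class and its arguments so that a
        # later false positive can be told apart from real corruption.
        logger.warning(
            "%s flagged corrupt: %s args=%r errno=%r",
            title, type(exc).__name__, exc.args, exc.errno,
        )
        return False
    return True
```

Logging the exception class as well as its message matters here because subclasses of OSError (timeouts, permission errors) would otherwise be indistinguishable from a genuine decode failure.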
TheSandDoctor added a comment to T136: CCC: Limited false flagging of non-corrupt JPG files.

Restored activity. Going to add more logging for corrupt images, since I want to know the type of errors being caught.

May 7 2020, 8:54 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T136: CCC: Limited false flagging of non-corrupt JPG files.

Turns out the hashes weren't an issue and were actually the change hash, so that isn't the issue here by the looks of it.

May 7 2020, 8:42 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T136: CCC: Limited false flagging of non-corrupt JPG files.

Bot shut down in the meantime.

May 7 2020, 6:30 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T136: CCC: Limited false flagging of non-corrupt JPG files.

It appears that the script is somehow saving the wrong hash values for every image. Investigating.

May 7 2020, 6:27 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T136: CCC: Limited false flagging of non-corrupt JPG files.

First step in triage is to revert away from using thumbnails for large image sizes.

May 7 2020, 6:16 PM · Commons-Corruption-Checker
TheSandDoctor triaged T136: CCC: Limited false flagging of non-corrupt JPG files as Unbreak Now! priority.
May 7 2020, 6:14 PM · Commons-Corruption-Checker
TheSandDoctor created T136: CCC: Limited false flagging of non-corrupt JPG files.
May 7 2020, 6:14 PM · Commons-Corruption-Checker

Apr 21 2020

TheSandDoctor changed the visibility for T135: CCC: TIF images should be avoided by TSB.
Apr 21 2020, 10:25 AM · Commons-Corruption-Checker
TheSandDoctor closed T135: CCC: TIF images should be avoided by TSB as Resolved by committing Restricted Diffusion Commit.
Apr 21 2020, 10:09 AM · Commons-Corruption-Checker
TheSandDoctor triaged T135: CCC: TIF images should be avoided by TSB as High priority.
Apr 21 2020, 9:55 AM · Commons-Corruption-Checker

Apr 8 2020

TheSandDoctor changed the visibility for T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg.
Apr 8 2020, 5:04 PM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor closed T117: CCC: Start second instance of rcworker.py, a subtask of T114: CCC: Identify processes to increase speed and files checked per minute, as Resolved.
Apr 8 2020, 5:04 PM · Commons-Corruption-Checker
TheSandDoctor closed T117: CCC: Start second instance of rcworker.py as Resolved.
Apr 8 2020, 5:04 PM · Commons-Corruption-Checker
TheSandDoctor closed T104: CCC: Rcworker.py time data mismatch value error as Resolved.

Rcworker has since been rebuilt and this has not re-emerged since. Closing.

Apr 8 2020, 5:03 PM · Commons-Corruption-Checker

Mar 8 2020

TheSandDoctor closed T130: CCC: Tagging fixed images as corrupt as Resolved.
Mar 8 2020, 5:54 PM · Commons-Corruption-Checker

Mar 5 2020

TheSandDoctor added a comment to T130: CCC: Tagging fixed images as corrupt.

Single (new build) worker started on the backlog. Seeing if issues re-emerge before starting feeder & other workers. With a backlog of 490,252, I don't need to worry about it running out of stock at the moment.

Mar 5 2020, 10:42 AM · Commons-Corruption-Checker

Mar 3 2020

TheSandDoctor updated the task description for T130: CCC: Tagging fixed images as corrupt.
Mar 3 2020, 8:44 AM · Commons-Corruption-Checker
TheSandDoctor triaged T130: CCC: Tagging fixed images as corrupt as Unbreak Now! priority.
Mar 3 2020, 8:44 AM · Commons-Corruption-Checker

Feb 28 2020

TheSandDoctor triaged T129: CCC: Convert to proper structured project as Normal priority.
Feb 28 2020, 1:58 PM · Commons-Corruption-Checker

Feb 27 2020

TheSandDoctor updated the task description for T128: Update sseclient direct from GitHub repo.
Feb 27 2020, 10:01 AM · Commons-Corruption-Checker
TheSandDoctor closed T128: Update sseclient direct from GitHub repo, a subtask of T118: CCC: rcwatcher.py trouble processing char, as Resolved.
Feb 27 2020, 9:59 AM · Commons-Corruption-Checker
TheSandDoctor closed T128: Update sseclient direct from GitHub repo as Resolved.

That was surprisingly quick. Upgraded and feeder/workers restarted. Total downtime for feeder (component that really matters for keeping up to date) was around 1 second (time it took me to Ctrl + C and then up arrow and hit enter).

Feb 27 2020, 9:59 AM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T128: Update sseclient direct from GitHub repo.
Feb 27 2020, 9:57 AM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T128: Update sseclient direct from GitHub repo.
Feb 27 2020, 9:51 AM · Commons-Corruption-Checker
TheSandDoctor triaged T128: Update sseclient direct from GitHub repo as High priority.
Feb 27 2020, 9:51 AM · Commons-Corruption-Checker

Feb 24 2020

TheSandDoctor added a comment to T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg.
2020-02-24 08:42:16,590 __main__    : INFO File:�রপক্ষ মন্দির থেকে ভক্তদের বের হওয়াjpgশ্য..
Traceback (most recent call last):
  File "rcworker.py", line 221, in <module>
    main()
  File "rcworker.py", line 214, in main
    run_worker()
  File "rcworker.py", line 63, in run_worker
    file_page = pywikibot.FilePage(site, change.title)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2478, in __init__
    super(FilePage, self).__init__(source, title, 6)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2327, in __init__
    super(Page, self).__init__(source, title, ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 200, in __init__
    self._link = Link(title, source=source, default_namespace=ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 6029, in __init__
    raise pywikibot.Error(
pywikibot.exceptions.Error: Title contains illegal char (\uFFFD 'REPLACEMENT CHARACTER')
Feb 24 2020, 10:44 PM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor closed T127: CCC: Standardize test file names as Resolved.
Feb 24 2020, 10:41 PM · Commons-Corruption-Checker
TheSandDoctor triaged T127: CCC: Standardize test file names as Low priority.
Feb 24 2020, 1:28 PM · Commons-Corruption-Checker

Feb 19 2020

TheSandDoctor updated subscribers of T118: CCC: rcwatcher.py trouble processing char.

@AntiCompositeNumber it doesn't look like it caught the image in question, but it was whichever one was uploaded directly after File:PICT0430 - 301032 - onroerenderfgoed.jpg.

2020-02-18 18:45:16,662 __main__    : INFO File:PICT0430 - 301032 - onroerenderfgoed.jpg :Not corrupt. Stored
Traceback (most recent call last):
  File "rcworker.py", line 221, in <module>
    main()
  File "rcworker.py", line 214, in main
    run_worker()
  File "rcworker.py", line 61, in run_worker
    file_page = pywikibot.FilePage(site, change.title)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2478, in __init__
    super(FilePage, self).__init__(source, title, 6)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2327, in __init__
    super(Page, self).__init__(source, title, ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 200, in __init__
    self._link = Link(title, source=source, default_namespace=ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 6029, in __init__
    raise pywikibot.Error(
pywikibot.exceptions.Error: Title contains illegal char (\uFFFD 'REPLACEMENT CHARACTER')
Feb 19 2020, 10:39 PM · Commons-Corruption-Checker
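
The traceback above ends with pywikibot rejecting a title that contains U+FFFD (REPLACEMENT CHARACTER). One possible way for a worker to avoid crashing on such entries is to filter them out before the page object is built. This is only a sketch under that assumption: `has_undecodable_title` and `usable_titles` are hypothetical helpers, not part of pywikibot or of rcworker.py.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, named in the pywikibot error message


def has_undecodable_title(title):
    """Return True if the title contains the Unicode replacement character."""
    return REPLACEMENT_CHAR in title


def usable_titles(titles):
    """Yield only the titles that do not contain U+FFFD.

    A real worker would probably also log the skipped titles so the
    upstream decoding problem can still be investigated.
    """
    for title in titles:
        if has_undecodable_title(title):
            continue
        yield title
```

Skipping only hides the symptom. The replacement character suggests the title was already mangled while the change stream was being decoded, which is consistent with the sseclient upgrade tracked in T128.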

Feb 16 2020

TheSandDoctor moved T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg from Backlog to Upstream on the Commons-Corruption-Checker board.
Feb 16 2020, 3:24 PM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor added a project to T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg: Restricted Project.
Feb 16 2020, 3:24 PM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor added a comment to T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg.

Filed upstream.

Feb 16 2020, 3:19 PM · Restricted Project, Commons-Corruption-Checker
AntiCompositeNumber added a comment to T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg.

I got pywikibot to fail attempting to access

Feb 16 2020, 2:33 PM · Restricted Project, Commons-Corruption-Checker

Feb 15 2020

TheSandDoctor closed T121: CCC: Examine files by Geagea flagged corrupt as Resolved.
Feb 15 2020, 6:33 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T125: CCC: Should not tag redirects to files..

Pushed to prod and workers/catalog scanner restarted with change.

Feb 15 2020, 10:09 AM · Commons-Corruption-Checker
TheSandDoctor closed T125: CCC: Should not tag redirects to files. as Resolved by committing Restricted Diffusion Commit.
Feb 15 2020, 10:07 AM · Commons-Corruption-Checker
TheSandDoctor triaged T125: CCC: Should not tag redirects to files. as High priority.
Feb 15 2020, 9:53 AM · Commons-Corruption-Checker

Feb 14 2020

TheSandDoctor added a comment to T118: CCC: rcwatcher.py trouble processing char.

Happened again, crashing 2 or 3 of the 5 workers.

Feb 14 2020, 1:17 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg.

Continuously gets stuck at that entry. Further work needed to identify and address.

Feb 14 2020, 12:50 PM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor triaged T124: CCC : Timeout error File:JR久慈 駅 - panoramio (1).jpg as High priority.
Feb 14 2020, 12:49 PM · Restricted Project, Commons-Corruption-Checker

Feb 13 2020

TheSandDoctor triaged T121: CCC: Examine files by Geagea flagged corrupt as Normal priority.
Feb 13 2020, 12:05 PM · Commons-Corruption-Checker
TheSandDoctor moved T121: CCC: Examine files by Geagea flagged corrupt from Backlog to Corruption verifications on the Commons-Corruption-Checker board.
Feb 13 2020, 11:55 AM · Commons-Corruption-Checker
TheSandDoctor moved T123: CCC: Investigate File:Tarnow Park Strzelecki wiewiorka 5.jpg corruption from Backlog to Corruption verifications on the Commons-Corruption-Checker board.
Feb 13 2020, 11:54 AM · Commons-Corruption-Checker
TheSandDoctor closed T123: CCC: Investigate File:Tarnow Park Strzelecki wiewiorka 5.jpg corruption as Resolved.

Updated in database to reflect the fact that it is not corrupt. The tag on the file page itself has been reverted and the uploader notified.

Feb 13 2020, 11:54 AM · Commons-Corruption-Checker
TheSandDoctor added a comment to T123: CCC: Investigate File:Tarnow Park Strzelecki wiewiorka 5.jpg corruption.

I am not 100% sure of the cause, but my suspicion is that the file upload that was checked had been stuck in the Redis queue, awaiting processing, on the old version. The queue length is a known issue and is being investigated per T104, T117, T118, T122.

Feb 13 2020, 11:47 AM · Commons-Corruption-Checker
TheSandDoctor added a comment to T117: CCC: Start second instance of rcworker.py.

Started 5 total workers to process the queue faster and am looking into automating their deployment dependent on queue size.

Feb 13 2020, 11:45 AM · Commons-Corruption-Checker
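
Scaling workers by queue size, as mentioned in the comment above, could be sketched as a small function that maps a queue length to a worker count. Everything here is an assumption for illustration: `desired_workers`, the `per_worker` threshold of 1000 items, and the cap of 5 (chosen only to mirror the five workers mentioned) are hypothetical, and reading the queue length (for example from Redis) is left out.

```python
import math


def desired_workers(queue_length, per_worker=1000, min_workers=1, max_workers=5):
    """Return how many workers to run for a given queue length.

    One worker is requested per `per_worker` queued items, rounded up,
    then clamped to the range [min_workers, max_workers].
    """
    if queue_length < 0:
        raise ValueError("queue_length must be non-negative")
    needed = math.ceil(queue_length / per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these illustrative defaults, an empty queue keeps one worker alive and any large backlog is capped at five.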
TheSandDoctor closed T115: CCC: Run CCC script a second time, but in reverse order?, a subtask of T114: CCC: Identify processes to increase speed and files checked per minute, as Resolved.
Feb 13 2020, 11:45 AM · Commons-Corruption-Checker