
Feb 9 2020

TheSandDoctor added a project to T114: CCC: Identify processes to increase speed and files checked per minute: Commons-Corruption-Checker.
Feb 9 2020, 11:19 PM · Commons-Corruption-Checker
TheSandDoctor renamed T115: CCC: Run CCC script a second time, but in reverse order? from CCC: Run CCC corrupt.py script a second time, but in reverse order? to CCC: Run CCC script a second time, but in reverse order?.
Feb 9 2020, 11:18 PM · Commons-Corruption-Checker
TheSandDoctor renamed T115: CCC: Run CCC script a second time, but in reverse order? from CCC: Run CCC script a second time, but in reverse order? to CCC: Run CCC corrupt.py script a second time, but in reverse order?.
Feb 9 2020, 10:29 PM · Commons-Corruption-Checker
TheSandDoctor changed the visibility for T114: CCC: Identify processes to increase speed and files checked per minute.
Feb 9 2020, 10:28 PM · Commons-Corruption-Checker
TheSandDoctor changed the visibility for T115: CCC: Run CCC script a second time, but in reverse order?.
Feb 9 2020, 10:28 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T115: CCC: Run CCC script a second time, but in reverse order?.

Per the pywikibot docs and signature in pwb_wrappers.py (which is a wrapper function around the direct pwb call), it appears to be possible to go backwards through the catalog.

Feb 9 2020, 10:28 PM · Commons-Corruption-Checker
TheSandDoctor triaged T113: CCC: corrupt.py KeyError, image unidentified as Normal priority.
Feb 9 2020, 6:48 PM · Commons-Corruption-Checker
TheSandDoctor triaged T111: CCC: corrupt.py always restarts at the beginning as Normal priority.
Feb 9 2020, 11:49 AM · Commons-Corruption-Checker

Feb 8 2020

TheSandDoctor closed T109: CCC: have_seen_image UnboundLocalError as Resolved by committing Restricted Diffusion Commit.
Feb 8 2020, 5:56 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T109: CCC: have_seen_image UnboundLocalError.

Return needs work.

Feb 8 2020, 5:52 PM · Commons-Corruption-Checker
TheSandDoctor triaged T109: CCC: have_seen_image UnboundLocalError as High priority.
Feb 8 2020, 5:49 PM · Commons-Corruption-Checker
TheSandDoctor closed T108: CCC: Remove login code from corrupt.py as Resolved by committing Restricted Diffusion Commit.
Feb 8 2020, 5:33 PM · Commons-Corruption-Checker, Restricted Project
TheSandDoctor triaged T108: CCC: Remove login code from corrupt.py as Normal priority.
Feb 8 2020, 5:30 PM · Commons-Corruption-Checker, Restricted Project
TheSandDoctor renamed T106: CCC: Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt from Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt to CCC: Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt.
Feb 8 2020, 11:40 AM · Commons-Corruption-Checker
TheSandDoctor triaged T107: CCC: Use /tmp in corrupt.py as Normal priority.
Feb 8 2020, 11:39 AM · Commons-Corruption-Checker
TheSandDoctor added a comment to T106: CCC: Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt.

The other images affected were File:Paar handschoenen, objectnr KA 15683.12.tif and Schild,_objectnr_KA_15728.tif. All have now passed tests and have been manually updated in the database to reflect the fact that they are not corrupt. I have also reverted TSB's edits relating to these images and notified the uploader that it has been fixed.

Feb 8 2020, 11:31 AM · Commons-Corruption-Checker
TheSandDoctor closed T106: CCC: Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt as Resolved by committing Restricted Diffusion Commit.
Feb 8 2020, 10:43 AM · Commons-Corruption-Checker
TheSandDoctor added a comment to T106: CCC: Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt.

Updating the server's version of my PIL flavour appears to have resolved some of these issues. Still investigating more.

Feb 8 2020, 10:29 AM · Commons-Corruption-Checker
TheSandDoctor triaged T106: CCC: Falsely identifies Handschoenen_(paar)_en_beschrijving_op_papier._objectnr_KA_15684.13.tif as corrupt as High priority.
Feb 8 2020, 10:05 AM · Commons-Corruption-Checker
TheSandDoctor added a comment to T104: CCC: Rcworker.py time data mismatch value error.
2020-02-08 07:54:06,423 __main__    : INFO File:Yackety yack (1901) - DPLA - 30e4078df08e452cbcded5fd72443251 (page 246).jpg :Not corrupt. Stored
2020-02-08 07:54:06,426 __main__    : INFO File:Regina Doherty 2015.jpg
2020-02-08 07:54:06,666 Image       : WARNING KeyError1 has occurred
Traceback (most recent call last):
  File "/home/ccc/Commons-image-corruption-detector/Image.py", line 29, in getRevision
    revision = file_page.get_file_history()[pywikibot.Timestamp.fromtimestampformat(self.log_timestamp)]
KeyError: Timestamp(2020, 2, 8, 7, 44, 20)
Feb 8 2020, 12:33 AM · Commons-Corruption-Checker
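The KeyError in the log above comes from subscripting the file history with a timestamp that is not one of its keys. A minimal sketch of a tolerant lookup follows, assuming the history behaves like a mapping keyed by timestamp (as the subscripting in the traceback suggests). `find_revision` and the sample data are hypothetical and are not taken from Image.py.

```python
import logging
from datetime import datetime
from typing import Mapping, Optional, TypeVar

Revision = TypeVar("Revision")
log = logging.getLogger("Image")


def find_revision(
    history: Mapping[datetime, Revision], uploaded_at: datetime
) -> Optional[Revision]:
    """Hypothetical helper: return the revision stored at `uploaded_at`, or None."""
    revision = history.get(uploaded_at)
    if revision is None:
        # Warn and let the caller decide whether to skip or retry the file.
        log.warning("No file revision recorded at %s", uploaded_at)
    return revision


# Made-up history whose only key is one second earlier than the lookup time.
history = {datetime(2020, 2, 8, 7, 44, 19): "revision-a"}
missing = find_revision(history, datetime(2020, 2, 8, 7, 44, 20))
```

Returning None instead of raising leaves the worker free to skip the file and continue with the next one.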

Feb 7 2020

TheSandDoctor added a comment to T104: CCC: Rcworker.py time data mismatch value error.

Just happened again at random.

2020-02-08 01:32:01,153 __main__    : INFO File:AIIPlogo3d.png
Traceback (most recent call last):
  File "rcworker.py", line 216, in <module>
    main()
  File "rcworker.py", line 209, in main
    run_worker()
  File "rcworker.py", line 105, in run_worker
    revision = change.getRevision(file_page)
  File "/home/ccc/Commons-image-corruption-detector/Image.py", line 24, in getRevision
    revision = file_page.get_file_history()[pywikibot.Timestamp.fromtimestampformat(self.log_timestamp)]
  File "/usr/local/lib/python3.8/site-packages/pywikibot/__init__.py", line 210, in fromtimestampformat
    return cls.strptime(ts, cls.mediawikiTSFormat)
  File "/usr/local/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2020-01-17T13:54:19Z' does not match format '%Y%m%d%H%M%S'
Feb 7 2020, 5:32 PM · Commons-Corruption-Checker
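The ValueError above shows an ISO 8601 string ('2020-01-17T13:54:19Z') being parsed with the compact '%Y%m%d%H%M%S' format. One possible workaround, sketched here with the standard library only, is to try both formats before giving up. `parse_log_timestamp` is a hypothetical helper, not part of rcworker.py or pywikibot.

```python
from datetime import datetime

# The two formats that appear in the traceback above.
COMPACT_TS_FORMAT = "%Y%m%d%H%M%S"      # e.g. 20200117135419
ISO_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"    # e.g. 2020-01-17T13:54:19Z


def parse_log_timestamp(ts: str) -> datetime:
    """Hypothetical helper: parse a timestamp given in either format."""
    for fmt in (COMPACT_TS_FORMAT, ISO_TS_FORMAT):
        try:
            return datetime.strptime(ts, fmt)
        except ValueError:
            continue  # Not this format; try the next one.
    raise ValueError(f"unrecognised timestamp format: {ts!r}")
```

Both `parse_log_timestamp("2020-01-17T13:54:19Z")` and `parse_log_timestamp("20200117135419")` yield the same naive datetime. Whether the worker should treat it as UTC is left open here.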
TheSandDoctor added a comment to T104: CCC: Rcworker.py time data mismatch value error.

Cannot reproduce locally or on the server.

Feb 7 2020, 5:27 PM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T104: CCC: Rcworker.py time data mismatch value error.
Feb 7 2020, 5:27 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T104: CCC: Rcworker.py time data mismatch value error.

Initial tests have not been able to reproduce this locally.

Feb 7 2020, 5:16 PM · Commons-Corruption-Checker
TheSandDoctor triaged T104: CCC: Rcworker.py time data mismatch value error as Normal priority.
Feb 7 2020, 4:55 PM · Commons-Corruption-Checker
TheSandDoctor closed T103: CCC: Increase max log file size, a subtask of T101: CCC: Implement proper logging, as Resolved.
Feb 7 2020, 4:46 PM · Commons-Corruption-Checker
TheSandDoctor closed T103: CCC: Increase max log file size as Resolved by committing Restricted Diffusion Commit.
Feb 7 2020, 4:46 PM · Commons-Corruption-Checker
TheSandDoctor triaged T103: CCC: Increase max log file size as Normal priority.
Feb 7 2020, 4:46 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T101: CCC: Implement proper logging.

The logging ini is based on this Stack Overflow post, with influence from RealPython.com.

Feb 7 2020, 4:41 PM · Commons-Corruption-Checker
TheSandDoctor triaged T102: CCC: Catch KeyboardInterrupt as Normal priority.
Feb 7 2020, 4:02 PM · Commons-Corruption-Checker
TheSandDoctor triaged T101: CCC: Implement proper logging as Normal priority.
Feb 7 2020, 4:00 PM · Commons-Corruption-Checker

Feb 6 2020

TheSandDoctor triaged T100: CCC: rcworker.py FileNotFoundError as Normal priority.
Feb 6 2020, 10:51 PM · Commons-Corruption-Checker
TheSandDoctor closed T99: CCC: redis.exceptions.DataError as Resolved.

Now works. :)

Feb 6 2020, 10:10 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T99: CCC: redis.exceptions.DataError.
Traceback (most recent call last):
  File "rcworker.py", line 188, in <module>
    main()
  File "rcworker.py", line 183, in main
    run_worker()
  File "rcworker.py", line 61, in run_worker
    change = pickle.loads(pickle) # Need to unpickle and build object once more - T99
AttributeError: 'bytes' object has no attribute 'loads'
CRITICAL: Exiting due to uncaught exception <class 'AttributeError'>
Feb 6 2020, 10:08 PM · Commons-Corruption-Checker
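The AttributeError above is consistent with a local variable named `pickle` (holding the raw bytes) shadowing the `pickle` module, so `pickle.loads` is looked up on a bytes object. A minimal sketch of the likely fix is to give the payload its own name. `decode_change` and the sample payload are hypothetical, and the redis read itself is left out.

```python
import pickle


def decode_change(payload: bytes):
    """Hypothetical helper: rebuild the queued object from its pickled bytes."""
    # `payload` no longer shadows the module, so pickle.loads resolves correctly.
    return pickle.loads(payload)


# Round trip with a made-up payload standing in for bytes read from the queue.
queued = pickle.dumps({"title": "File:Example.jpg"})
change = decode_change(queued)
```

Unpickling executes code embedded in the payload, so this is only appropriate when the queue contents are trusted.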
TheSandDoctor updated subscribers of T99: CCC: redis.exceptions.DataError.
Feb 6 2020, 10:04 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T99: CCC: redis.exceptions.DataError.

Useful Stack Overflow post for this.

Feb 6 2020, 10:04 PM · Commons-Corruption-Checker
TheSandDoctor triaged T99: CCC: redis.exceptions.DataError as High priority.
Feb 6 2020, 9:53 PM · Commons-Corruption-Checker
TheSandDoctor closed T98: CCC: Add tracking for if/when image fixed as Resolved by committing Restricted Diffusion Commit.
Feb 6 2020, 9:37 PM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T98: CCC: Add tracking for if/when image fixed.
Feb 6 2020, 9:37 PM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T98: CCC: Add tracking for if/when image fixed.
Feb 6 2020, 9:36 PM · Commons-Corruption-Checker
TheSandDoctor triaged T98: CCC: Add tracking for if/when image fixed as Normal priority.
Feb 6 2020, 9:36 PM · Commons-Corruption-Checker
TheSandDoctor closed T96: CCC: Add catch for LockedPage to all retry_apierror() save edit calls as Resolved by committing Restricted Diffusion Commit.
Feb 6 2020, 9:13 PM · Commons-Corruption-Checker
TheSandDoctor triaged T96: CCC: Add catch for LockedPage to all retry_apierror() save edit calls as Normal priority.
Feb 6 2020, 9:10 PM · Commons-Corruption-Checker
TheSandDoctor closed T95: CCC: ImageObj string indices must be integers, a subtask of T91: CCC: Test rcwatcher.py (dry run), as Resolved.
Feb 6 2020, 7:29 PM · Commons-Corruption-Checker
TheSandDoctor closed T95: CCC: ImageObj string indices must be integers as Resolved by committing Restricted Diffusion Commit.
Feb 6 2020, 7:29 PM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T95: CCC: ImageObj string indices must be integers.
Feb 6 2020, 7:14 PM · Commons-Corruption-Checker
TheSandDoctor updated the task description for T95: CCC: ImageObj string indices must be integers.
Feb 6 2020, 7:14 PM · Commons-Corruption-Checker
TheSandDoctor triaged T95: CCC: ImageObj string indices must be integers as Normal priority.
Feb 6 2020, 7:13 PM · Commons-Corruption-Checker
TheSandDoctor closed T94: CCC: ImageObj fed wrong format as Resolved by committing Restricted Diffusion Commit.
Feb 6 2020, 7:13 PM · Commons-Corruption-Checker
TheSandDoctor closed T94: CCC: ImageObj fed wrong format, a subtask of T91: CCC: Test rcwatcher.py (dry run), as Resolved.
Feb 6 2020, 7:13 PM · Commons-Corruption-Checker
TheSandDoctor triaged T94: CCC: ImageObj fed wrong format as Normal priority.
Feb 6 2020, 7:10 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T93: CCC: Screen out "WARNING: Empty message found" prior to pushing to redis.

I have filed a task/ticket on Wikimedia's Phabricator install regarding this. Cannot seem to screen it out and have asked for advice there.

Feb 6 2020, 10:47 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor triaged T93: CCC: Screen out "WARNING: Empty message found" prior to pushing to redis as Normal priority.
Feb 6 2020, 6:52 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor closed T92: VPS: Install SSEclient as Resolved.
Feb 6 2020, 6:50 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor closed T92: VPS: Install SSEclient, a subtask of T91: CCC: Test rcwatcher.py (dry run), as Resolved.
Feb 6 2020, 6:50 AM · Commons-Corruption-Checker
TheSandDoctor added a comment to T92: VPS: Install SSEclient.
Collecting sseclient
  Downloading sseclient-0.0.24.tar.gz (6.7 kB)
Requirement already satisfied: requests>=2.9 in /usr/local/lib/python3.8/site-packages (from sseclient) (2.22.0)
Requirement already satisfied: six in /usr/local/lib/python3.8/site-packages (from sseclient) (1.14.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.8/site-packages (from requests>=2.9->sseclient) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/site-packages (from requests>=2.9->sseclient) (2019.11.28)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.8/site-packages (from requests>=2.9->sseclient) (2.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.8/site-packages (from requests>=2.9->sseclient) (1.25.7)
Installing collected packages: sseclient
    Running setup.py install for sseclient ... done
Successfully installed sseclient-0.0.24
Feb 6 2020, 6:50 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor triaged T92: VPS: Install SSEclient as Normal priority.
Feb 6 2020, 6:48 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor triaged T91: CCC: Test rcwatcher.py (dry run) as Normal priority.
Feb 6 2020, 6:45 AM · Commons-Corruption-Checker

Feb 5 2020

TheSandDoctor moved T90: CCC: PIL submodule link needs updating from Backlog to Done on the Commons-Corruption-Checker board.
Feb 5 2020, 8:51 PM · Commons-Corruption-Checker, PIL submodule
TheSandDoctor closed T90: CCC: PIL submodule link needs updating as Resolved by committing Restricted Diffusion Commit.
Feb 5 2020, 8:51 PM · Commons-Corruption-Checker, PIL submodule
TheSandDoctor moved T82: CCC: Implement time durations as constant from Backlog to Deployment prep on the Commons-Corruption-Checker board.
Feb 5 2020, 8:50 PM · Commons-Corruption-Checker
TheSandDoctor triaged T90: CCC: PIL submodule link needs updating as Low priority.
Feb 5 2020, 8:49 PM · Commons-Corruption-Checker, PIL submodule

Feb 4 2020

TheSandDoctor added a comment to T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date.

Confirmed working

Feb 4 2020, 10:59 PM · Commons-Corruption-Checker
TheSandDoctor closed T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date, a subtask of T82: CCC: Implement time durations as constant, as Resolved.
Feb 4 2020, 10:57 PM · Commons-Corruption-Checker
TheSandDoctor closed T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date as Resolved.

Resolved.

Feb 4 2020, 10:57 PM · Commons-Corruption-Checker
TheSandDoctor closed T87: Install mwparserfromhell for python as Resolved.

Installed.

Feb 4 2020, 10:50 PM · Restricted Project
TheSandDoctor triaged T87: Install mwparserfromhell for python as Unbreak Now! priority.
Feb 4 2020, 10:48 PM · Restricted Project
TheSandDoctor closed T69: CCC Create tests for notifying users as Resolved.

Looks like notification tests are done for now and working as expected. :)

Feb 4 2020, 1:20 PM · Commons-Corruption-Checker
TheSandDoctor closed T69: CCC Create tests for notifying users, a subtask of T39: CCC: Trials, as Resolved.
Feb 4 2020, 1:20 PM · Commons-Corruption-Checker
TheSandDoctor closed T85: CCC: Tests experiencing action delay, a subtask of T39: CCC: Trials, as Resolved.
Feb 4 2020, 1:17 PM · Commons-Corruption-Checker
TheSandDoctor closed T85: CCC: Tests experiencing action delay as Resolved.

Set put_throttle to 3 (from 10) in user-config.py, which is not tracked in the repo. This appears to have sped things up appropriately. Neither maxthrottle nor maxlag was changed; both remain at their default values.

Feb 4 2020, 1:17 PM · Commons-Corruption-Checker
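For reference, the change described above amounts to a single line in user-config.py. This fragment is a sketch of that one setting only; the rest of the file is not shown in the ticket and is not reproduced here.

```python
# user-config.py (local pywikibot configuration, not tracked in the repo)

# Minimum number of seconds between page saves; lowered from 10 per the comment above.
put_throttle = 3
```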
TheSandDoctor reopened T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date, a subtask of T82: CCC: Implement time durations as constant, as Open.
Feb 4 2020, 1:16 PM · Commons-Corruption-Checker
TheSandDoctor reopened T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date as "Open".

Wrong ticket. Oops

Feb 4 2020, 1:16 PM · Commons-Corruption-Checker
TheSandDoctor closed T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date as Resolved.
Feb 4 2020, 1:16 PM · Commons-Corruption-Checker
TheSandDoctor closed T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date, a subtask of T82: CCC: Implement time durations as constant, as Resolved.
Feb 4 2020, 1:16 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date.
Feb 4 2020, 1:16 PM · Commons-Corruption-Checker
TheSandDoctor closed T78: CCC : Make notifications non-minor edits as Resolved.

Confirmed with this edit. Closing.

Feb 4 2020, 1:04 PM · Commons-Corruption-Checker
TheSandDoctor closed T78: CCC : Make notifications non-minor edits, a subtask of T69: CCC Create tests for notifying users, as Resolved.
Feb 4 2020, 1:04 PM · Commons-Corruption-Checker
TheSandDoctor triaged T85: CCC: Tests experiencing action delay as Normal priority.
Feb 4 2020, 1:02 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T78: CCC : Make notifications non-minor edits.

Will close once notifications are verified as working.

Feb 4 2020, 12:23 PM · Commons-Corruption-Checker
TheSandDoctor added a comment to T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date.

Resolved, though a better solution that doesn't require .value calls everywhere should be implemented.

Feb 4 2020, 12:21 PM · Commons-Corruption-Checker
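One way to avoid scattering .value calls is to override `__str__` on the enum so that printing or formatting a member yields its value. This is only a sketch: the member name comes from the ticket title, but the value 30 and the class body are assumptions, not the project's actual definition.

```python
from enum import Enum


class EDayCount(Enum):
    """Hypothetical reconstruction; the real enum may have more members."""

    DAYS_30 = 30  # assumed value, inferred from the member name

    def __str__(self) -> str:
        # Render the underlying value instead of "EDayCount.DAYS_30".
        return str(self.value)


# str() and f-strings both go through __str__ for a plain Enum.
message = f"Tagged for {EDayCount.DAYS_30} days"
```

`repr()` still shows the member name, which keeps debugging output unambiguous.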
TheSandDoctor added a comment to T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date.

More work is still needed: the output should show the value rather than the literal enum name. Will look further when I have more time.

Feb 4 2020, 12:17 PM · Commons-Corruption-Checker
TheSandDoctor triaged T83: CCC: Now printing EDayCount.DAYS_30 instead of actual date as High priority.
Feb 4 2020, 12:11 PM · Commons-Corruption-Checker
TheSandDoctor closed T80: CCC: Standardize function signatures in image_corruption_utils.py as Resolved by committing Restricted Diffusion Commit.
Feb 4 2020, 12:01 PM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor closed T81: CCC: rcworker.py wrong days tag duration as Resolved by committing Restricted Diffusion Commit.
Feb 4 2020, 12:01 PM · Commons-Corruption-Checker
TheSandDoctor triaged T82: CCC: Implement time durations as constant as Normal priority.
Feb 4 2020, 11:55 AM · Commons-Corruption-Checker
TheSandDoctor renamed T81: CCC: rcworker.py wrong days tag duration from CCC: rcworker.py wrong days tag to CCC: rcworker.py wrong days tag duration.
Feb 4 2020, 11:53 AM · Commons-Corruption-Checker
TheSandDoctor triaged T81: CCC: rcworker.py wrong days tag duration as High priority.
Feb 4 2020, 11:53 AM · Commons-Corruption-Checker
TheSandDoctor claimed T80: CCC: Standardize function signatures in image_corruption_utils.py.
Feb 4 2020, 11:39 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor triaged T80: CCC: Standardize function signatures in image_corruption_utils.py as Low priority.
Feb 4 2020, 11:38 AM · Restricted Project, Commons-Corruption-Checker
TheSandDoctor triaged T78: CCC : Make notifications non-minor edits as Normal priority.
Feb 4 2020, 11:34 AM · Commons-Corruption-Checker
TheSandDoctor closed T77: CCC: TSB image identified corrupt should be fed word versions of date as Resolved.

Works now. Thanks @AntiCompositeNumber ! (Also thanks for adding the backwards compatibility.)

Feb 4 2020, 9:25 AM · Commons-Corruption-Checker
TheSandDoctor closed T77: CCC: TSB image identified corrupt should be fed word versions of date, a subtask of T69: CCC Create tests for notifying users, as Resolved.
Feb 4 2020, 9:25 AM · Commons-Corruption-Checker

Feb 3 2020

TheSandDoctor reassigned T76: CCC TSB corruption notification template not populating parameters from TheSandDoctor to AntiCompositeNumber.

Aaannnd account created :) @AntiCompositeNumber. Re-assigned.

Feb 3 2020, 9:47 PM · Commons-Corruption-Checker
TheSandDoctor closed T76: CCC TSB corruption notification template not populating parameters as Resolved.

Resolved by AntiCompositeNumber (would assign to them, but they do not have an account on here).

Feb 3 2020, 9:34 PM · Commons-Corruption-Checker
TheSandDoctor closed T76: CCC TSB corruption notification template not populating parameters, a subtask of T39: CCC: Trials, as Resolved.
Feb 3 2020, 9:34 PM · Commons-Corruption-Checker
TheSandDoctor closed T76: CCC TSB corruption notification template not populating parameters, a subtask of T69: CCC Create tests for notifying users, as Resolved.
Feb 3 2020, 9:34 PM · Commons-Corruption-Checker
TheSandDoctor triaged T77: CCC: TSB image identified corrupt should be fed word versions of date as Normal priority.
Feb 3 2020, 9:17 PM · Commons-Corruption-Checker
TheSandDoctor added a parent task for T76: CCC TSB corruption notification template not populating parameters: T69: CCC Create tests for notifying users.
Feb 3 2020, 9:13 PM · Commons-Corruption-Checker
TheSandDoctor added a subtask for T69: CCC Create tests for notifying users: T76: CCC TSB corruption notification template not populating parameters.
Feb 3 2020, 9:13 PM · Commons-Corruption-Checker