I added logging for the specifics of the OSError raised when an image is identified as corrupt. I will watch the logs closely to make sure it captures the sort of information I am looking for. Hopefully this leads to some useful insight if another false positive is detected.
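A minimal sketch of what logging the specifics of an OSError could look like, using only the standard library. The names `describe_oserror`, `is_corrupt`, and the `decode` callable are hypothetical and are not taken from the bot's code; `decode` stands in for whatever call fully decodes the image (the test script elsewhere in this log uses `image.tobytes()`).

```python
import logging

logger = logging.getLogger("corrupt_check")  # hypothetical logger name


def describe_oserror(exc: OSError) -> dict:
    """Collect the OSError fields that help tell different failures apart."""
    return {
        "type": type(exc).__name__,  # subclasses such as FileNotFoundError show up here
        "errno": exc.errno,          # None when the error was raised with a message only
        "message": str(exc),
        "filename": exc.filename,
    }


def is_corrupt(title, decode):
    """Return True if decode() raises OSError, logging the details.

    `decode` is an assumed zero-argument callable that fully decodes the image.
    """
    try:
        decode()
    except OSError as exc:
        logger.warning("%s flagged as corrupt: %r", title, describe_oserror(exc))
        return True
    return False
```

Logging the exception type and message separately should make it easier to see later whether a false positive came from a truncated file or from some unrelated I/O failure.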
May 8 2020
May 7 2020
Restored activity. I am going to add more logging for corrupt images, since I want to know what type of errors are being caught.
Turns out the hashes weren't the problem: the stored values were actually the change hashes, so by the looks of it that isn't the issue here.
Bot shut down in the meantime.
It appears that the script is somehow saving the wrong hash values for every image. Investigating.
First step in triage is to revert away from using thumbnails for large image sizes.
Apr 21 2020
Apr 8 2020
Rcworker has since been rebuilt and this has not re-emerged since. Closing.
Mar 8 2020
Mar 5 2020
Single (new build) worker started on the backlog. Seeing if issues re-emerge before starting feeder & other workers. With a backlog of 490,252, I don't need to worry about it running out of stock at the moment.
Mar 3 2020
Feb 28 2020
Feb 27 2020
That was surprisingly quick. Upgraded and feeder/workers restarted. Total downtime for feeder (component that really matters for keeping up to date) was around 1 second (time it took me to Ctrl + C and then up arrow and hit enter).
Feb 24 2020
2020-02-24 08:42:16,590 __main__ : INFO File:�রপক্ষ মন্দির থেকে ভক্তদের বের হওয়াjpgশ্য..
Traceback (most recent call last):
  File "rcworker.py", line 221, in <module>
    main()
  File "rcworker.py", line 214, in main
    run_worker()
  File "rcworker.py", line 63, in run_worker
    file_page = pywikibot.FilePage(site, change.title)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2478, in __init__
    super(FilePage, self).__init__(source, title, 6)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2478, in __init__
    super(FilePage, self).__init__(source, title, 6)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2327, in __init__
    super(Page, self).__init__(source, title, ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 200, in __init__
    self._link = Link(title, source=source, default_namespace=ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 6029, in __init__
    raise pywikibot.Error(
pywikibot.exceptions.Error: Title contains illegal char (\uFFFD 'REPLACEMENT CHARACTER')
Feb 19 2020
@AntiCompositeNumber it doesn't look like it caught the image in question, but it was whichever one was uploaded directly after File:PICT0430 - 301032 - onroerenderfgoed.jpg.
2020-02-18 18:45:16,662 __main__ : INFO File:PICT0430 - 301032 - onroerenderfgoed.jpg :Not corrupt. Stored
Traceback (most recent call last):
  File "rcworker.py", line 221, in <module>
    main()
  File "rcworker.py", line 214, in main
    run_worker()
  File "rcworker.py", line 61, in run_worker
    file_page = pywikibot.FilePage(site, change.title)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2478, in __init__
    super(FilePage, self).__init__(source, title, 6)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2327, in __init__
    super(Page, self).__init__(source, title, ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 200, in __init__
    self._link = Link(title, source=source, default_namespace=ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 6029, in __init__
    raise pywikibot.Error(
pywikibot.exceptions.Error: Title contains illegal char (\uFFFD 'REPLACEMENT CHARACTER')
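The traceback above shows pywikibot rejecting a title that contains U+FFFD (the replacement character). One possible guard, sketched here with the standard library only, is to check the title before it is handed to `pywikibot.FilePage`. The helper names `has_replacement_char` and `split_titles` are hypothetical, and whether such titles should be skipped, logged, or re-fetched is not something this log settles.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the character pywikibot reports as illegal


def has_replacement_char(title: str) -> bool:
    """Return True if the title contains U+FFFD and so cannot be used as-is."""
    return REPLACEMENT_CHAR in title


def split_titles(titles):
    """Separate usable titles from ones that contain U+FFFD.

    Returns (usable, rejected) so the rejected titles can be logged
    or retried instead of crashing a worker.
    """
    usable, rejected = [], []
    for title in titles:
        (rejected if has_replacement_char(title) else usable).append(title)
    return usable, rejected
```

A replacement character usually means the title was already damaged when it was decoded somewhere upstream, so this check only avoids the crash; it does not recover the original title.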
Feb 16 2020
I got pywikibot to fail attempting to access
Feb 15 2020
Pushed to prod and workers/catalog scanner restarted with change.
Feb 14 2020
Happened again, crashing 2 or 3 of the 5 workers.
Continuously gets stuck at that entry. Further work needed to identify and address.
Feb 13 2020
Started 5 total workers to process the queue faster and am looking into automating their deployment dependent on queue size.
Implemented. Logs accidentally added to T117 instead.
Server-side also works
Local test identifies the image as not being corrupt.
Feb 12 2020
e 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2478, in __init__
    super(FilePage, self).__init__(source, title, 6)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 2327, in __init__
    super(Page, self).__init__(source, title, ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 200, in __init__
    self._link = Link(title, source=source, default_namespace=ns)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/tools/__init__.py", line 1744, in wrapper
    return obj(*__args, **__kw)
  File "/usr/local/lib/python3.8/site-packages/pywikibot/page.py", line 6029, in __init__
    raise pywikibot.Error(
pywikibot.exceptions.Error: Title contains illegal char (\uFFFD 'REPLACEMENT CHARACTER')
Now going through both forwards and backwards :)
Forward works, but backwards appears to get an empty page generator. Further investigation required.
File:Haim Laskov - Yackov Dori - David Ben-Gurion - Hagana Pin 1958.jpg (server side)
Traceback (most recent call last):
  File "test_fileerror.py", line 40, in <module>
    test("./Haim_Laskov_-_Yackov_Dori_-_David_Ben-Gurion_-_Hagana_Pin_1958.jpg")
  File "test_fileerror.py", line 9, in test
    image.tobytes()
  File "/usr/local/lib/python3.8/site-packages/Pillow-7.1.0.dev0-py3.8-linux-x86_64.egg/PIL/Image.py", line 711, in tobytes
  File "/usr/local/lib/python3.8/site-packages/Pillow-7.1.0.dev0-py3.8-linux-x86_64.egg/PIL/ImageFile.py", line 245, in load
OSError: image file is truncated (41 bytes not processed)
Locally
Also confirmed using vanilla pillow
David is truncated early.
On server:
Both rcwatcher and rcworker have been restarted with the latest patch.
Another crash at some point last night. One commit pushed/pulled, but I am too busy to continue at the moment. Will work more on this tonight most likely.
Feb 11 2020
Resolved by making the pageID -1 (no actual IDs are or can be -1) in the event that it cannot be found. At that point, it can manually be fixed as the number of incidents is fairly low overall.
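A minimal sketch of the sentinel approach described above, assuming a lookup that either raises `LookupError` or returns `None` when the page ID cannot be found. The names `MISSING_PAGE_ID`, `resolve_page_id`, and `lookup` are hypothetical and not taken from the bot's code.

```python
MISSING_PAGE_ID = -1  # sentinel: real page IDs are never -1


def resolve_page_id(lookup, title):
    """Return the page ID for title, or MISSING_PAGE_ID if it cannot be found.

    `lookup` is an assumed callable that maps a title to a page ID and
    either raises LookupError or returns None when there is no match.
    """
    try:
        page_id = lookup(title)
    except LookupError:
        return MISSING_PAGE_ID
    return MISSING_PAGE_ID if page_id is None else page_id
```

Rows stored with the sentinel can later be found with a simple equality check on -1 and fixed by hand.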
Should catch error, log locally, and continue to the next image. Last night it stalled out at some point due to this, after having run for over a day without issues.
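The catch, log, and continue behaviour described above could be sketched as follows. `process_all` and `process_one` are hypothetical names, and catching `Exception` broadly is an assumption made here so that one bad image cannot stall the whole run; the real worker may want a narrower set of exceptions.

```python
import logging

logger = logging.getLogger("rcworker_sketch")  # hypothetical logger name


def process_all(titles, process_one):
    """Run process_one on every title, logging failures instead of stopping.

    Returns the titles that failed so they can be reviewed afterwards.
    """
    failed = []
    for title in titles:
        try:
            process_one(title)
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception("Skipping %s after an unexpected error", title)
            failed.append(title)
            continue
    return failed
```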
Trials done for now.
Now running live.
Resolved upstream.
Feb 10 2020
It's hard to know exactly what is slowing things down without a profile. @TheSandDoctor, could you run corrupt.py through cProfile? I'd expect the largest source of delay to come from three places:
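For reference, a minimal sketch of collecting such a profile with the standard library's `cProfile` and `pstats`. The helper `profile_call` is hypothetical; the same data can also be gathered for a whole script with `python -m cProfile -o corrupt.prof corrupt.py` (the output filename is an arbitrary choice).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time, which
    is usually the quickest way to spot where the delay comes from.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Sorting by cumulative time puts the slow call chains, such as network fetches or image decoding, at the top of the report.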
Feb 9 2020
Feb 8 2020
Return needs work.