User talk:ClueBot Commons/Archives/2013/December
This is an archive of past discussions with User:ClueBot Commons. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
False positive report
I very much want to report this false positive in the right place, but the link is currently dead. This is regarding edit 1581502 (diff). While the reverted edits constituted a clumsy attempt to convert a redirect into an article, they were in no way vandalism. Thanks, BDD (talk) 19:17, 4 December 2013 (UTC)
A barnstar for you!
Where's ClueBot?
Anyone know why ClueBot NG is down? How can we get in touch with the operators? I've been through this before and ended up finally getting a hold of someone on IRC (not cluebotng room on cluenet), and they got in touch with one of the operators on Twitter. He was then easily able to address the issue and get ClueBot back in working order. CB has been down for 5 days now, and consequently a great deal of vandalism has gone unnoticed. Perhaps we should enlist more operators to maintain the bot and minimize downtime? — MusikAnimal talk 16:24, 2 December 2013 (UTC)
- For the life of me I cannot remember who it is, but one of the operators was out at a wedding recently and unable to attend to the problem. There are people on IRC who know more. --Jprg1966 (talk) 17:28, 3 December 2013 (UTC)
- There is some issue related to labs. That's all I know so far. Also, Damianz (one of the devs) is working on it. Petrb (talk) 22:06, 3 December 2013 (UTC)
- Yes, Damian was the one at the wedding. --Jprg1966 (talk) 22:59, 3 December 2013 (UTC)
- Now the amount of vandalism is rising. Jianhui67 talk★contribs 15:12, 4 December 2013 (UTC)
- Yeah that's the problem. Yesterday we got the red alert from VoxelBot, I've never seen it go above yellow. Hopefully they'll get all of this fixed soon. I just wish we could get a timeframe or an update... — MusikAnimal talk 15:20, 4 December 2013 (UTC)
- Someone please come and help me to revert vandalism please? :( Jianhui67 talk★contribs 15:26, 4 December 2013 (UTC)
- I've asked Damian about the issue, and he said he would probably get it fixed soon. That was nearly 5 days ago. Without ClueBot NG, the amount of vandalism on Wikipedia has risen sharply. This issue should definitely be fixed ASAP. K6ka (talk) 14:04, 6 December 2013 (UTC)
- I don't think that the amount of vandalism has risen at all. It is the same according to Huggle stats; however, the amount of reverted vandalism has dropped a bit. That means there is a higher number of edits that are vandalism but get overlooked or not reverted. Petrb (talk) 15:14, 6 December 2013 (UTC)
- Yes, most likely the case. It would make sense that Huggle stats are the same, if anything higher, since ClueBot is down and more people are trying to compensate for that. Other tools like STiki, however, that rely on ClueBot data are now not as effective. Moreover, patrollers make maybe at most 150 edits per minute with semi-automated tools, compared to ClueBot's 9000+, and we have to go to sleep sometime; ClueBot does not. So at any rate, we do have a fairly serious problem right now. — MusikAnimal talk 19:37, 6 December 2013 (UTC)
- Does anybody know when the bot will be back to cleaning up vandalism again? I'm getting fairly anxious. K6ka (talk) 03:53, 7 December 2013 (UTC)
- Good news - CBNG is back up and zapping vandals from behind! Wikipedia has been saved! :) K6ka (talk) 19:32, 8 December 2013 (UTC)
Is ClueBot NG the state-of-the-art detection algorithm? I'm thinking of doing a project on learning from the Wikipedia dataset but it seems like vandalism detection has been taken. Are there any other bots/papers I should look at? Shuhao (talk) 16:15, 4 December 2013 (UTC)
ClueBot III (Archiving) Bug: parses and acts on a ClueBot III template enclosed in <nowiki></nowiki> tags
The section #ClueBot III (Archiving) RFE: Relative addressing for archiveprefix argument originally had a blank ClueBot III template enclosed in <nowiki></nowiki> tags. When that was the case, ClueBot III archived all threads on this page within minutes of the page being saved. This was in spite of the fact that the ClueBot III template existed nowhere else on this page. Makyen (talk) 02:50, 7 December 2013 (UTC)
ClueBot III (Archiving) RFE: Relative addressing for archiveprefix argument
RFE CB3_RA: Enable the use of relative addressing for the archiveprefix argument. (e.g.
archiveprefix=/Archives/ |format=Y/F
or
archiveprefix=/Archive |format= %%i )
Forcing the use of absolute addressing results in additional work and confusion when setting up archiving, or when a page is moved. The vast majority of the time, the Archive pages are sub-pages of the page being archived. It is a rare occurrence that the page is archived to something other than a sub-page of the page being archived. Permitting the use of relative addressing would make configuring archiving easier and less prone to error. This would reduce the time consumed answering questions from editors who have misconfigured ClueBot III. It would also allow the page to be moved without the need to edit the ClueBot III template. Makyen (talk) 23:05, 6 December 2013 (UTC)
- I withdraw this request. While it makes page moves easier, they are rare. Having this feature leaves open a possible case where CB3 creates pages under pages under pages if the CB3 config is in the section being archived. Given that I have already seen this happen with the current logic, we don't need another way for it to happen. Makyen (talk) 15:02, 11 December 2013 (UTC)
ClueBot III (Archiving): No edits in a day and a half
I know a day and a half is not that much time. However, ClueBot III usually has a couple/few hundred edits every day. In case you were not aware, I just wanted to bring it to your attention that the bot was not active. Makyen (talk) 10:28, 10 December 2013 (UTC)
- Either I wrote the above a bit soon, or you got it working again. ClueBot III was making edits about an hour and half after I wrote the above comment. Thanks. Makyen (talk) 13:44, 11 December 2013 (UTC)
Archive not displaying
I added User:MiszaBot's archiving to my talk page last week, which ran once and sent old discussions to User talk:CR4ZE/Archive. However I didn't understand how to get the archival box at the top of my talk page to work, so I switched over to User:ClueBot III. I'm not sure if I've set the bot up correctly, because the archival box is there at my talk page but not linking properly to the article. It seems there are problems with the bot at the moment, given the above comments here, but I was hoping somebody could check to see that I've set everything up correctly. CR4ZE (t) 12:04, 11 December 2013 (UTC)
- (talk page stalker)Your ClueBot III (CB3) config was functional, but probably not the way you wanted it. CB3 configs use hours where MiszaBot (MB) uses days. You had your CB3 config set to 24 hours. I changed it to 744 which is 31 days (the same as you had your MB config). If you actually want 24 hours, change it back. I changed it because it is easier for you to change the number in the config back than to have to fix archives once they are moved.
- You do not see any archives listed in the archive box because CB3 has not run yet. Once CB3 runs this will be updated and the archives will be listed.
- You will have one ongoing inconvenience, unless fixed. Your MB config did not specify a counter in the archive name. This resulted in your archive being called "Archive". Currently, your CB3 config will result in a name of "Archive 1" and have the numbers increased as needed. This will leave both "Archive" and "Archive 1" in existence. If it was me, I would do one of the following:
- A) "Undo" the edit by Lowercase sigmabot III on your main User talk page (you will have to do it by hand) and let CB3 create a new "Archive 1" page with all the content. I would then put a speedy deletion template:
{{db-g6|rationale=reason}}
at the top of the "Archive" page with the reason explaining that all the content was reverted back onto your talk page and that the page was created in error. The page will probably be deleted in an hour or two and you will be good to go.
- B) Move (using a page move) "Archive" to "Archive 1". Then replace the redirect, created on the "Archive" page by the move, with a
{{db-g6|rationale=reason}}
at the top of the page. I would give a reason describing that the page was made in error as a result of a MB configuration error and that you moved the page to the correct place.
- I would offer to do it for you, but it is easier to have the deletions go through if it is you asking for a deletion in your own User talk space when there are no other editors, other than "Lowercase sigmabot III", in the page history. Makyen (talk) 14:40, 11 December 2013 (UTC)
- I went down option B because there were intermediate revisions after MBot. Thanks so much for your help. CR4ZE (t) 15:15, 11 December 2013 (UTC)
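The days-versus-hours difference described in this thread is easy to miss when moving a config from MiszaBot to ClueBot III. A minimal sketch of the conversion follows; the helper name `days_to_cb3_hours` is an assumption made for illustration and is not part of either bot.

```python
HOURS_PER_DAY = 24


def days_to_cb3_hours(days: int) -> int:
    """Convert an age given in days (MiszaBot style) to hours (ClueBot III style).

    Hypothetical helper for illustration only.
    """
    if days < 0:
        raise ValueError("age must be non-negative")
    return days * HOURS_PER_DAY


# The 31-day setting mentioned above corresponds to 744 hours.
print(days_to_cb3_hours(31))
```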
ClueBot III (Archiving) Bug: Assignment of default archiveprefix by CB3 can cause creation of recursive pages
If the archiveprefix is found to be invalid, CB3 uses {{FULLPAGENAME}}/Archives/. Instead it should fail and do nothing, and/or report the error.
The problem occurs when the CB3 config is in a section of the document that is moved into the archive file. If that is the case, CB3 will continue to create pages deeper and deeper and deeper. An example of this occurring is that the {{User:ClueBot III/ArchiveThis}} was in a section of User talk:Speedy Editing that was archived. In a few days, it resulted in the creation of User talk:Speedy Editing/Archives//Archives//Archives//Archives//Archives//Archives//Archives//Archives//Archives//Archives//Archives//Archives/. This happened between November 30 and December 10 of this year. There are at least three other examples I have seen in the relatively short time I have been paying more attention to archiving.
If the CB3 config, even if correct in all respects except location on the page, is in a section that is archived, then once the config is in the archive the archiveprefix will be invalid the next time CB3 runs on it. Because archiveprefix will not be valid (does not point to a subpage of the current one), CB3 will assign {{FULLPAGENAME}}/Archives/ to archiveprefix and the descent into recursive pages will begin. This type of problem would be eliminated if CB3 does nothing when an invalid archiveprefix is found instead of assigning the value of {{FULLPAGENAME}}/Archives/ to archiveprefix. Obviously, the key value should still be respected. Makyen (talk) 15:52, 11 December 2013 (UTC)
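The difference between the reported fallback and the requested behavior can be sketched as follows. This is not ClueBot III's actual code: the function names and the page title "User talk:Example" are assumptions for illustration, and "valid" is taken to mean only what the report says, namely that archiveprefix points to a subpage of the current page.

```python
from typing import Optional


def resolve_archiveprefix(page: str, archiveprefix: str) -> Optional[str]:
    """Requested behavior: return the prefix if it names a subpage of `page`.

    Returning None stands for "do nothing and/or report the error".
    Hypothetical helper, not taken from the bot's source.
    """
    if archiveprefix.startswith(page + "/"):
        return archiveprefix
    return None


def reported_fallback(page: str, archiveprefix: str) -> str:
    """Behavior described in the report: default to '<page>/Archives/'."""
    valid = resolve_archiveprefix(page, archiveprefix)
    return valid if valid is not None else page + "/Archives/"


# A config written for the talk page that has since been moved into an archive page.
configured = "User talk:Example/Archives/"
archive_page = "User talk:Example/Archives/2013/December"

# The fallback nests a new archive tree under the archive page itself.
print(reported_fallback(archive_page, configured))
# The requested behavior declines to archive.
print(resolve_archiveprefix(archive_page, configured))
```

Under these assumptions, each later run on the newly created page would repeat the fallback one level deeper, which matches the nesting described above.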
ClueBot III (Archiving) RFE: Single archive action exceeded maximum archive size limit
When changing from MiszaBot to ClueBot III on the Talk:HIV/AIDS page: The first time ClueBot III ran the bot appended 101k to an archive which was already 92k long when the maximum archive size was set to "maxarchsize=100000". This resulted in an archive size of 193k, which greatly exceeded the specified maxarchsize. Makyen (talk) 06:46, 6 December 2013 (UTC)
- ClueBot III evaluates maxarchsize as a target, and makes a decision whether or not to make a new archive. It then pulls all of the sections and appends them to the archive it has decided to use. It does not break up an archive run into multiple archives. This is mainly because when the bot runs regularly (as it should) this should never be a problem, since it will be only appending a few sections to an archive at a time.
- If on the other hand, ClueBot III split archives and used maxarchsize as a hard limit, there are numerous issues that need to be resolved. For example, assume a page with 5 sections (a, b, c, d, e) of these sizes: a=500b; b=100,184b; c=5,185b; d=212,185b; e=300b. What is the "correct" split? Is it one archive per section? Or should it be a + b; c; d + e; because of the tiny size of those? Should b and d even be archivable since they are so large? Or should it be a + c + e; b; d; even though that is re-ordering the sections? And how do we decide this in the general case?
- ClueBot III's solution works, is easy to understand, and in the vast majority of cases is the correct way. -- Cobi(t|c|b) 23:43, 8 December 2013 (UTC)
- Change to RFE.
- When/if you make other changes to the bot, I would suggest a relatively simple additional complexity. It requires making the choice to use the file, or not, after you know the size of the data you want to append to the file. Effectively, it is to choose to use the file if the resulting file size is closer to maxarchsize than the current size. It would be:
- If (curfilesize > maxarchsize) use_new_file
- elseif ( ((curfilesize + sizetoarchive - maxarchsize) > (maxarchsize - curfilesize)) and (curfilesize > 0.75*maxarchsize) ) use_new_file
- else use_current_file
- This will result in the current file being used if it will be closer to maxarchsize when used for this archive task than without being used, but this will only be checked if the current file size is more than 75 percent of maxarchsize. However, the reality is that the algorithm you are using is OK. Assuming you are not making other changes to the bot, making just this change probably does not justify the time for testing.
- Meanwhile, I added text to the documentation explaining that maxarchsize is a target, not a limit, which will almost certainly be exceeded. Makyen (talk) 02:47, 15 December 2013 (UTC)
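The rule suggested in this thread can be written out as a small function. This is a sketch of the suggestion only, not of how ClueBot III currently decides; the function and parameter names are assumptions, and the byte counts in the example are round-number stand-ins for the "92k" and "101k" figures from the Talk:HIV/AIDS report.

```python
def use_new_archive(cur_size: int, size_to_archive: int, max_size: int) -> bool:
    """Return True if the suggested rule would start a new archive file.

    cur_size        -- current size of the archive file, in bytes
    size_to_archive -- size of the sections about to be appended, in bytes
    max_size        -- the maxarchsize target, in bytes
    """
    # The current file is already past the target.
    if cur_size > max_size:
        return True
    # How far past the target the file would end up if it were used...
    overshoot = cur_size + size_to_archive - max_size
    # ...versus how far short of the target it is if left alone.
    undershoot = max_size - cur_size
    # Only consider switching when the file is already more than 75% full.
    return overshoot > undershoot and cur_size > 0.75 * max_size


# Roughly the Talk:HIV/AIDS case: a new file would have been chosen.
print(use_new_archive(92_000, 101_000, 100_000))
# A routine small append to the same file keeps using it.
print(use_new_archive(92_000, 5_000, 100_000))
```

Note that a large append to a file that is less than 75 percent full still goes into the current file, so this rule narrows the overshoot but does not turn maxarchsize into a hard limit.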
ClueBot III (Archiving) RFE: Archive indexing
Legobot (Task 15) has not performed archive indexing in many months. This task is effectively abandoned. ClueBot III already automatically indexes (User:ClueBot III/Detailed Indices) all Archives on pages where it runs with more information than was provided by LegoBot in an Archive index. It would be a relatively simple addition to consolidate all of the separate Detailed Indices/ for a particular page into a complete Archive index page.
RFE CB3_AI1: Add an option to the {{User:ClueBot III/ArchiveThis}} template to specify an Archive index page into which ClueBot III consolidates the Detailed Indices/ as an Archive index page (e.g. "archiveindex=Talk:YourPage/Archive index").
RFE CB3_AI2: Either also automatically create a Detailed Indices/ sub-page for the primary page being archived, or an option to have this done and thus included in the Archive index in RFE AI1.
Makyen (talk) 22:44, 6 December 2013 (UTC)
- After a bit of looking around, I found that CB3 does create a single page with indices from all of the archive pages, but not the main page being archived. It then stores it in a page under User:ClueBot III/Master Detailed Indices/. So, an option to add the page being archived to the Detailed Indices and an option to have ClueBot III copy the index to a specified file would be quite useful.
- The current alternatives to the proposed option to have CB3 copy the Master Detailed Indices is to either redirect the Talk:YourPage/Archive index page, or transclude the Master Detailed Indices page onto that page with something to the effect of:
- {{User:ClueBot III/Master Detailed Indices/{{NAMESPACE}}:{{BASEPAGENAME}}}}
- Makyen (talk) 12:27, 8 December 2013 (UTC)
- So you want it to index the main page as well? Isn't that what the TOC is for? -- Cobi(t|c|b) 23:45, 8 December 2013 (UTC)
- Yes, the TOC is, in part, for that. In general, I would normally expect an index to include all threads. It is quite possible for someone to overlook an entry they are looking for in the TOC even when it exists.
- [Note: Legobot ran on December 10th.] Makyen (talk) 02:57, 15 December 2013 (UTC)
ClueBot III going forever
Is CB3 going to go for good? It's the only archive bot I can use. Adamdaley (talk) 06:55, 19 December 2013 (UTC)
CB3 adding millions of bytes to talk archive
CB3 is repeatedly adding the same cut/paste to the talk archive every few days, totaling over 2 million bytes. Very strange. Please advise. Rgrds. --64.85.216.158 (talk) 18:08, 19 December 2013 (UTC)
- (talk page stalker) I am working on putting your archives in the right place. There is a primary problem with your config which I will fix once all the archives are in smaller files. This config problem resulted in encountering what is probably a CB3/WP interaction/bug causing the growth of your archive beyond ~500k. Makyen (talk) 21:30, 19 December 2013 (UTC)
- (talk page stalker) From March 2, 2013, CB3 had repeatedly added the same threads to Archive 8. This caused it to grow from 530,026 bytes to 2,016,373 bytes. There were no changes to your CB3 config which would account for the behavior changing. There does not appear to have been a change to CB3 at that time (no change in the source code). Other than the small change in the size of the file from the previous functional run (521,483 -> 530,026), there does not appear, at first glance, to be any other obvious difference to cause the issue of not deleting from the main template talk page. This change in file size moves the file from under 512KiB (524,288 bytes) to over 512KiB. It is possible that there is something in CB3, or WP, or their interaction that causes this issue. The CB3 contributions at the time of the first and second erroneous actions show CB3 writing the archive, but not the main template talk page which would normally be written after the write to the archive. This implies the most likely cause is a response from the write which is not properly handled. It, of course, could be something else.
- However, we generally try to keep archive files well below 512KiB long. The cause of Archive 8 getting to 500k+ to begin with was a CB3 config that assigned a static archive page name and specified no maximum archive file size. This is how your configuration has been since at least 2010. I did not go back further than that, but it appeared to be your page's normal practice to manually edit the archive file name prefix (archiveprefix) to specify the complete file name without any automatic changes by CB3 to increment the file name when the archive got too big. It was static at least back to Archive 3.
- I have split Archive 8 into Archive 8, 9, 10, 11 and 12. Each file is about 100k. Your CB3 config has been changed such that CB3 will automatically start with Archive 13 and move to a new archive file when the current one exceeds 100k. I added a link to provide access to your archive index and a few other changes to your archive box, etc. Makyen (talk) 23:57, 19 December 2013 (UTC)
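The size figures quoted above can be checked against the 512 KiB boundary directly. The sketch below only verifies the arithmetic; it does not establish that crossing the boundary caused the repeated archiving, and the function name is an assumption for illustration.

```python
KIB = 1024
THRESHOLD = 512 * KIB  # 512 KiB expressed in bytes


def crosses_threshold(before: int, after: int, threshold: int = THRESHOLD) -> bool:
    """Return True if a file grew from below the threshold to at or above it."""
    return before < threshold <= after


print(THRESHOLD)
# Last functional run -> first erroneous run, using the sizes quoted above.
print(crosses_threshold(521_483, 530_026))
```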
Question
What does NG in ClueBot NG stand for? Speling12345 (talk) 9:11, 19 December 2013 (UTC)
- I assume Next Generation, but maybe that says more about me ... Yngvadottir (talk) 21:44, 19 December 2013 (UTC)
- If I saw that, maybe. But, it is Next Generation. Oh. It is a nice report. Speling12345 (talk) 11:10, 19 December 2013 (UTC)
Checking edits that modify a file reference
I recently spent an idle hour looking at some of the articles listed here:
http://en.wikipedia.org/wiki/Category:Articles_with_missing_files
and found that several of the redlinks to images were caused by undetected vandalism where, for instance, a single character of the file name was altered. Would it be feasible to detect any edit that alters a file reference, and check whether that change creates an invalid reference? Might help cut down on the backlog in that category (although I have no idea what portion of that backlog is caused by said alterations, malicious or otherwise). Here are a couple of examples:
http://en.wikipedia.org/w/index.php?title=Novel&diff=583477215&oldid=583096745
http://en.wikipedia.org/w/index.php?title=Demographic_transition&diff=586727611&oldid=585902571
The second may also be interesting as a case of extensive vandalism that went undetected (and was edited over by a good-faith attempt to replace a few of the >500 deleted characters, preventing a simple undo). 172.12.225.38 (talk) 22:14, 19 December 2013 (UTC)
Here are two more examples of good-faith and malicious single-character file reference breakage:
http://en.wikipedia.org/w/index.php?title=The_Pickwick_Papers&diff=next&oldid=581548385
http://en.wikipedia.org/w/index.php?title=Piedmont,_California&diff=572813449&oldid=572129458
172.12.225.38 (talk) 02:19, 20 December 2013 (UTC)
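One way the check proposed above might look is sketched here, under several assumptions. It only recognizes `[[File:...]]` and `[[Image:...]]` link syntax (file names passed as template parameters are not covered), the file names in the example are invented, and `exists` is a hypothetical callable standing in for a real lookup of whether a file exists, which is not shown.

```python
import re
from typing import Callable, Set

# Matches the file name in [[File:Name.jpg|...]] or [[Image:Name.jpg|...]].
FILE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)", re.IGNORECASE)


def file_refs(wikitext: str) -> Set[str]:
    """Collect the file names referenced by file links in the wikitext."""
    return {m.group(1).strip() for m in FILE_LINK.finditer(wikitext)}


def newly_broken_refs(old: str, new: str, exists: Callable[[str], bool]) -> Set[str]:
    """Return file names introduced by an edit that do not resolve to a file."""
    added = file_refs(new) - file_refs(old)
    return {name for name in added if not exists(name)}


# Invented example: a single character of the file name is dropped.
old_text = "Intro text. [[File:Example portrait.jpg|thumb|Caption]]"
new_text = "Intro text. [[File:Example portrat.jpg|thumb|Caption]]"
known_files = {"Example portrait.jpg"}

print(newly_broken_refs(old_text, new_text, known_files.__contains__))
```

An edit flagged this way could be malicious or a good-faith slip, as the examples above show, so the result is better treated as a signal for review than as proof of vandalism.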
Collaboration with the Axis Powers
Why do you keep reverting useful information? I AM ADDING references one by one. Don't use the label vandalism so recklessly. — Preceding unsigned comment added by 114.79.52.10 (talk) 04:35, 20 December 2013 (UTC)
- I saw your question, and although I don't run ClueBot, I thought I would jump in and try to answer it for you. You probably triggered the bot by deleting so much information as you did here. You can't just remove 2/3 of the article like that, it is not constructive. I see that you added a single reference for one change, but any time you make changes, you need to cite a reference for each change - at the time of the change. Hope this helps. Josh3580talk/hist 04:40, 20 December 2013 (UTC)
- OK, thank you for your response. One clarification regarding the giant deletion is that it was not my intention to delete so much; it was meant to be a minor edit as stated in the reason of the edit. It only turned into a major deletion due to computer error (I was using a touch-screen rather than a mouse, and it must have been wrongly moved before my last edit so it selected a large part of the article for deletion). I would surely have corrected it had the bot not done it before me. For the sources, I am searching to add more for each edit paragraph, but the bot was moving so fast that it deleted all before I could add more. — Preceding unsigned comment added by 114.79.51.217 (talk) 13:49, 22 December 2013 (UTC)
No one at the controls?
The ClueBot false positives report page leads to this page, for which the registration has expired. User:Cobi (the bot owner) appears to be only marginally active.
ClueBot is very nice, but it does throw false positives from time to time, like here, and these need to have a live human being looking at them, figuring out what went wrong, and addressing the issue. If there's no one at the throttle it's likely to run off the rails more and more, I'd think. Can User:Cobi be persuaded to turn this bot over to someone else if he's burnt out on it? Herostratus (talk) 03:34, 21 December 2013 (UTC)
- There's a sensitivity / specificity trade-off which essentially means the bot is designed to make a small amount of false positives, don't think of it as something going wrong that can be investigated and fixed (and certainly not as "running off the rails"). That said, it's been known for a long time that remaining developers are largely inactive, and the detection could be improved using more-modern machine learning methods, as to what we do about that… edit: Also that line had a leading space, this is the version you reverted to benmoore 14:51, 21 December 2013 (UTC)
- Hmmm. I would suppose that the bot is not designed to make a small amount of false positives but rather expected to. Naturally nothing's perfect and ClueBot, which is amazingly good, isn't either. However, we have to keep in mind that each false positive falsely accuses the person of vandalism, which is a pretty serious charge. Given that, I would expect that each false positive would be looked at by a human person, who would address that. There're a number of ways to address false positives I guess. I'm not a programmer, and the stuff about machine-learning is way over my pay grade, but even if the program writes and corrects its own rules and algorithms, at the end of the day some human person must have written the code that allows it to do that, and so if that's working suboptimally it ultimately falls to some human person to make any necessary corrections, I would think. Anyway, it's not clear to me how the robot can correct itself if it's not accepting error reports anymore. Hoping this is not too naive, I suppose some ways you can address a false positive would be:
- 1. Find out why it occurred and change the code so it doesn't happen anymore, or not as often.
- 2. Log it, look for patterns, analyze the underlying cause of the problems, and at some future time change the code to reduce the occurrence of certain classes of problems.
- 3. Try to find out why it occurred and be unable to do so, which happens of course.
- 4. Find out why it occurred, conclude that it's not possible to fix it, either at all or without an unreasonable amount of effort and/or without degrading the tool in some worse way, and accept that.
- 5. Do nothing, but make general-purpose soothing noises in the manner of "your complaint is important to us, be assured that our top people are on the case" or whatever.
- However, there's apparently no one with sufficient interest to even do #5, which is troubling. It would therefore be an increase in message accuracy, and therefore a user interface upgrade, to replace the message "False positive? Report it here" with "False positive? Sucks to be you" or some more formal equivalent; such a boldly stated but accurate message would possibly cause a political problem for ClueBot, so we do have a problem here I think.
- ClueBot is very good, but very busy and powerful, and shouldn't run unattended such that false positives are accepted with no attempt to prevent or reduce them in future, or at least pretend to, I think. I wonder if the Foundation with its full-time software developers could absorb it into the base software, as an adjunct feature or something? I'm inclined to propose that (it probably won't go anywhere, but you never know); is there any objection to my doing that? Herostratus (talk) 17:15, 21 December 2013 (UTC)
- Oh please no, do not suggest having the Foundation with their execrable programming record take over ClueBot. I'm sure the registration will be renewed quickly; this kind of embarrassing slip-up in paperwork happens to major corporations, and although it's problematic when false positives can't be reported, the bot has remarkably few false positives. I'll go jump up and down on IRC. Yngvadottir (talk) 17:37, 21 December 2013 (UTC)
- If you review the NG Userpage you'll understand why what you suggest is not feasible, it's not the simple rule-based system many people on this page imagine (i.e., there's not a system of interpretable rules and thresholds that can be adjusted, it's a neural network which "learns" its rules from the training set, making it non-trivial to reverse-engineer its output). And when I say "designed to" make false positives, that's really what I mean—to understand, consider a receiver operating characteristic curve: it's possible to limit Cluebot's false positives to a level where we almost never see them, but in doing so we reduce our true positive rate also and a lot of vandalism goes unreverted.
- Regarding the WMF taking hold of a vandalism bot, this is something I've assumed (/hoped) they aim to do anyway. This issue has been discussed here and elsewhere each time a problem comes up and it takes time to find any knowledgeable party; something needs to be done sooner or later for sure. benmoore 18:26, 21 December 2013 (UTC)
- Yes OK sure, I won't get involved. Hmmm yes ben, I see what you're saying (without, of course, actually understanding it). However, if that's true, I wonder what is the function of the "here" in "False positive, report it here". I gather that it may be that reporting false positives doesn't do anything useful, it's just an alternative to "False positive? Don't worry about it, all this is way over your head, just go about your business" which would rub people the wrong way. Which, you know, is actually reasonable.
- Also -- again, I'm not trying to cause trouble or denigrate ClueBot, just making a point -- over at Wikipedia:WikiProject Editor Retention the lament was made that new editors' early contributions are reverted more (and more ruthlessly) than in earlier days, and this hurts editor recruitment. The main cause of this by far (or if it even is a problem -- it's complicated) is other human editors, but ClueBot false positives don't help. The decision of what level of false positives to tolerate is ultimately a business decision and has to be understood in total context. That said, ClueBot is much better than a gawping civilian like me would have ever expected and so there's no action item regarding the actual level of false positives at this time. Herostratus (talk) 19:17, 21 December 2013 (UTC)
- I can't profess to knowing the ins and outs of the "report false positive" process (Damian?) but I expect it will add the edge case to the training set (after review), which to some degree will sway the network away from making the same mistake (though the training set has thousands of ham/spam cases). So I don't believe it's a pointless endeavour, but it is presumably more of a fine-tuning instrument than a hard-and-fast "don't do that again". Regarding your second paragraph, I agree, and if the bot were under the jurisdiction of the WMF, that would presumably make that kind of decision more transparent and amenable to community input. benmoore 20:17, 21 December 2013 (UTC)
- The report interface feeds those edits, along with a bunch of other random edits, into the review interface, where people can review edits to generate a corpus of edits to be fed to the bot the next time the bot is trained. While any specific instance makes ever so small changes (there are tens to hundreds of thousands of edits in the corpus already), it is a useful endeavor. As to the note about more modern machine learning methods, there hasn't been much improvement in that area of research that I am aware of since ClueBot NG was written. There are some minor things that could be improved, like understanding more wikitext, but that is not in the realm of machine learning, but rather just more inputs to the existing machine learning system. Furthermore, wikitext is not an easily parsable language, so it would take a significant amount of time to implement parsing of wikitext. The main thing that will help it is a larger corpus. And that takes man-hours to categorize huge amounts of edits -- both good and bad edits in about the correct proportion that Wikipedia gets. -- Cobi(t|c|b) 21:52, 21 December 2013 (UTC)
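- For readers following along, the flow described above (reported edits plus random edits go to human review, and reviewed edits join the training corpus) can be sketched roughly as below. This is only an illustration under stated assumptions, not ClueBot NG's actual code: the `Edit` type, the function names, and the label strings are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """Hypothetical record for one edit; not ClueBot NG's real data model."""
    edit_id: int
    reported: bool = False       # True if it came in as a false-positive report
    label: Optional[str] = None  # set by a human reviewer: "vandalism" or "constructive"


def build_review_queue(reported: List[Edit], random_pool: List[Edit],
                       n_random: int, seed: int = 0) -> List[Edit]:
    """Mix reported edits with a random sample of other edits for human review."""
    rng = random.Random(seed)
    sample = rng.sample(random_pool, min(n_random, len(random_pool)))
    queue = list(reported) + sample
    rng.shuffle(queue)  # reviewers should not see reported edits grouped together
    return queue


def add_reviewed_to_corpus(corpus: List[Edit], reviewed: List[Edit]) -> List[Edit]:
    """Append edits that have a human label, skipping unlabeled ones and duplicates."""
    known = {e.edit_id for e in corpus}
    for e in reviewed:
        if e.label is not None and e.edit_id not in known:
            corpus.append(e)
            known.add(e.edit_id)
    return corpus


def vandalism_share(corpus: List[Edit]) -> float:
    """Fraction of the corpus labeled as vandalism, to compare against the live rate."""
    if not corpus:
        return 0.0
    return sum(e.label == "vandalism" for e in corpus) / len(corpus)
```

Checking `vandalism_share` against the proportion seen in live edits is one simple way to watch the "about the correct proportion" point; how the real system balances its corpus is not stated in this thread.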
Reporting error
Yea, this robot is so retarded! I undo vandalism reverts like on jizz and it stupidly reverts me. Since WP:CIVIL and WP:NPA doesnt protect bots from policy, u r the stupidest invention I have EVER seen. Dragonron (talk) —Preceding undated comment added 18:03, 24 December 2013 (UTC)
- Responded on your talk page; the bot was right. Yngvadottir (talk) 21:19, 24 December 2013 (UTC)
Dehumanizing
I rarely noticed vandalism before. Just another toy for the Wikileetists to wield over the "anyone" that is SUPPOSED to be able to edit wikipedia. If you want to make it your own private encyclopedia then do so but stop trying to pretend that "anyone" can edit it because between the bots and their wiki-nazi overlords us commoners can't. — Preceding unsigned comment added by 75.164.232.68 (talk) 22:14, 24 December 2013 (UTC)
A barnstar for you!