User talk:ClueBot Commons
The current status of ClueBot NG is: Running
The current status of ClueBot III is: Running
Praise should go on the praise page. Barnstars and other awards should go on the awards page.
Use the "new section" button at the top of this page to add a new section. Use the [edit] link above each section to edit that section.
This page is automatically archived by ClueBot III.
The ClueBots' owner or someone else who knows the answer to your question will reply on this page.
ClueBots |
---|
ClueBot NG/Anti-vandalism · ClueBot II/ClueBot Script |
ClueBot III/Archive · Talk page for all ClueBots |
Beware! This user's talk page is monitored by talk page watchers. Some of them even talk back.
Cluebot III: ideas to improve the new archivist experience
Thank you for ClueBot III, which I just successfully used to set up archiving for Talk:Kat Abughazaleh! As I was learning how to use it, I had a few thoughts on what would have made it easier to start using it.
I really appreciate the documentation and easy "getting started" templates I could copy and paste. But, once I added the templates to the talk page (including configuration to "archivenow" a few "resolved" topics), I wasn't sure how long I should expect to wait before archiving would begin.
So I have a few questions/ideas:
- It would be great to have some kind of easy-to-use web-based tool to test: "will this configuration successfully do what I want?" Like: "given this set of arguments to the template, and this talk page, what will happen when ClueBot III runs?" I'm imagining something like RegExr or shellcheck, or something connected to Template sandbox and test cases.
- As @Joy asked a little while back: if we could check publicly available logs of the bot run, that would help us troubleshoot. Another idea: an auto-updated spot on User:ClueBot III or at https://cluebot3.toolforge.org/ that says when the most recent run started, and whether it's still in progress or, if not, when it ended.
- I checked the bot's source code, and recent comments by maintainers, to update the documentation with expectations on how long to wait: "Cluebot III runs every 6 hours, on the Wikimedia Toolforge infrastructure. After you initially set up a page with the appropriate templates to invoke Cluebot III, it may take 24 or 48 hours for Cluebot III to execute archiving on that page for the first time." Is that accurate?
- This is a more minor idea, but: When figuring out "is the bot in the middle of a run?", a user might try to figure out what sequence ClueBot III uses to process pages. But, if I read https://www.mediawiki.org/wiki/API:Query correctly, the sequence is not really predictable, because ClueBot III is getting the list of pages from an API call that doesn't guarantee any particular ordering. If that's right and you confirm, I'll update the relevant documentation.
I noticed @NaomiAmethyst has recently taken a fresh look at the code so I hope these ideas are helpful now! Sumana Harihareswara 15:47, 13 May 2025 (UTC)
Improving ClueBot NG's algorithm
I'm looking into User:Cluebot NG#Vandalism Detection Algorithm, and this is my understanding of the algorithm:
1. For each word and pair of adjacent words that was added in the edit, add its score (which is determined from training data) to a counter.
2. Compute a few other statistics, such as length of text added, etc. and normalize them to prepare as inputs to the neural network.
3. Run the neural network and get the score.
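To make step 1 concrete, here is a minimal sketch of scoring single words and adjacent word pairs. It is only an illustration of the description above, not ClueBot NG's actual code: the score tables, their values, and the function name `score_added_words` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical scores. In the real bot these would be derived from training data.
WORD_SCORES: Dict[str, float] = {"idiot": 0.9}
PAIR_SCORES: Dict[Tuple[str, str], float] = {("is", "stupid"): 0.8}


def score_added_words(added_words: List[str]) -> float:
    """Sum the scores of every added word and every pair of adjacent added words."""
    words = [w.lower() for w in added_words]
    total = 0.0
    # Single words (window of 1).
    for word in words:
        total += WORD_SCORES.get(word, 0.0)
    # Adjacent pairs (window of 2); unknown pairs contribute nothing.
    for pair in zip(words, words[1:]):
        total += PAIR_SCORES.get(pair, 0.0)
    return total


# The pair ("is", "stupid") is matched only when the two words are adjacent.
adjacent = score_added_words(["He", "is", "stupid"])
interspersed = score_added_words(["He", "is", "so", "stupid"])
```

With these made-up tables, `adjacent` picks up the pair score while `interspersed` does not, because the extra word splits the pair.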
Clearly, the algorithm works just fine (just look at ClueBot NG's contributions page). However, there are some areas that could still be improved further. For example, the size of the window for the Bayesian classifiers is just 2, meaning a vandalism edit with a phrase of 3 or more words (or with extra words interspersed between them) might get ignored. In fact, it may be better to use something like a Transformer (deep learning architecture) to more accurately obtain the meaning of the edit.
As far as I know, the principal maintainer of the bot (User:Crispy1989) has been inactive since 2011. Also pinging User:DamianZaremba since he seems to be active on the GitHub repo. If I could, I would be excited to help improve the bot. Sungodtemple (talk • contribs) 01:04, 14 May 2025 (UTC)
Empty index??
User:ClueBot III/Master Detailed Indices/Talk:Transgender health care misinformation: how does that happen? Aaron Liu (talk) 17:18, 15 May 2025 (UTC)
- ClueBot was never actually used to archive that page. It got added in Special:Diff/1269259214 and then removed in Special:Diff/1275633399 before it ever got to archive anything, so it makes sense that the index was empty at the time and hasn't updated since. Aidan9382 (talk) 17:55, 15 May 2025 (UTC)