Wikipedia:Bots/Requests for approval/ChristieBot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Approved.
Operator: Mike Christie
Time filed: 22:34, Sunday, October 23, 2022 (UTC)
Function overview: Replace WP:GAN, currently written by Legobot. Create different versions of the GAN list in project space or userspace per user requests.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: User:ChristieBot/Source code
Links to relevant discussions (where appropriate): Discussion ongoing at WT:GAN, but an RfC will be started in addition for the proposal to take over writing WP:GAN.
Edit period(s): No more than every 20 minutes per target page.
Estimated number of pages affected: Probably single-digits -- WP:GAN plus perhaps a couple of differently formatted outputs such as User:ChristieBot/GANoms Single Table, depending on what is requested at WT:GAN or by individual users.
Namespace(s): Wikipedia and possibly User.
Exclusion compliant (Yes/No): No -- not relevant; only updates pages it has been specifically requested to update.
Function details: The bot iterates over Category:Good article nominees and looks up three further data items for each nomination: the nominator's edit count, via the User class in pywikibot; the reviews performed by the nominator, via the User:GA bot/Stats page; and the GAs promoted for the nominator, via the database behind WP:WBGAN, which is currently bot-maintained. The resulting data and some calculated fields are then written to a page. The first approval for this bot only mentioned writing to User:ChristieBot/SortableGANoms; I would like to expand that to two more functions.
The first is to write WP:GAN instead of Legobot. This would only happen if there were a successful RfC. I have posted a note to Legoktm's talk page asking for comment on the proposed change. It might also be suggested at the RfC (which I have started drafting) that the bot should write more than one page, since there might be demand for more than one way to sort the list. I could imagine a request to write something like User:ChristieBot/GANoms Single Table on the same schedule, for example.
The second function would be to write a personalized list to the userspace of any user who wants one. I would propose to update such pages less often, probably daily. I doubt many users, if any, would want to take advantage, but if they did I don't want to have to come back here for further approvals. Note that if you wish to see what alternate formats a user might request, the code also runs a tool which allows output of HTML or wikitext; it's the wikitext that would be generated for the user pages, per whatever filters they select.
In case it's relevant, the first BRFA envisaged scanning the GA nominations page, but that would have built in a reliance on Legobot. Scanning the GA nominee category seemed more sensible.
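As a rough illustration of the flow described in the function details (this is not the bot's actual source, which is at User:ChristieBot/Source code), the sketch below takes nomination records that have already been fetched and renders them as a sortable wikitext table. The `Nomination` fields, the reviews-per-GA column, and the column headings are assumptions made for the example; the category lookup is left out so the code runs without pywikibot.

```python
from dataclasses import dataclass


@dataclass
class Nomination:
    """One GA nomination plus the three looked-up data items (hypothetical shape)."""
    title: str
    nominator: str
    edit_count: int  # nominator's edit count
    reviews: int     # GA reviews performed by the nominator
    gas: int         # GAs promoted for the nominator


def review_ratio(nom: Nomination) -> float:
    """Example calculated field: reviews per promoted GA (assumed, not the bot's formula)."""
    return nom.reviews / nom.gas if nom.gas else float(nom.reviews)


def render_table(noms: list[Nomination]) -> str:
    """Render nominations as a sortable wikitext table, ordered by article title."""
    lines = [
        '{| class="wikitable sortable"',
        "! Article !! Nominator !! Edits !! Reviews !! GAs !! R/G",
    ]
    for nom in sorted(noms, key=lambda n: n.title):
        lines.append("|-")
        lines.append(
            f"| [[{nom.title}]] || [[User:{nom.nominator}|{nom.nominator}]] "
            f"|| {nom.edit_count} || {nom.reviews} || {nom.gas} "
            f"|| {review_ratio(nom):.2f}"
        )
    lines.append("|}")
    return "\n".join(lines)
```

A differently formatted output page would then be a matter of swapping `render_table` for another renderer over the same records.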
Discussion
This might be a dumb question, but at any point was Legoktm brought into this discussion? I don't know if it necessarily will impact the BRFA itself, but it seems like "I'm taking over this space without telling the primary bot operator" is a bit... unusual? (please do not ping on reply) Primefac (talk) 08:01, 25 October 2022 (UTC)
- I left a note on Legoktm's talk page just before posting this. I hadn't talked to them before because the original intention was not to replace the page -- I had intended the bot to read GAN, not write it. When I realized that reading the templates meant I had no reliance on Legobot I rewrote the code to do that, and it was when I got that working that I posted to Legobot's talk page and created this BRFA. My understanding of Legobot's involvement with GAN is that Legoktm has said for years they can no longer maintain or improve the GAN functionality of the bot, and there have been at least a couple of requests posted for bot writers to take over the functionality. Pinging BlueMoonset, who can probably cite chapter and verse on the attempts to find another bot. So when I realized that what I'd written could replace one piece of Legobot's functionality, I thought it would be regarded as a positive step.
- Legobot also updates the User:GA bot/Stats page, and adds informative edit summaries to the GAN updates, neither of which I can do with my current set up. One of the things I asked Legoktm on their talk page was whether taking over writing GAN would harm those other functions. I would need to know the answer to that before going ahead with writing GAN, of course. For me to take over those functions probably would require that I create a database to capture the state of the page; I could then, with each pass, compare the previous and current states and that would let me add the edit summary functionality and the stats page updates. I would only want to do that if Legoktm says the functionality is all in one piece and can't be separated. Otherwise I would prefer to write the GAN page and look into taking over the rest of the GAN functionality as a possible future project. Mike Christie (talk - contribs - library) 11:27, 25 October 2022 (UTC)[reply]
- I've been looking for someone to take on the GAN functionality for a few years now, multiple people expressed interest but it never went anywhere so this is great.
- On a coordination note, Legobot *does* currently read from WP:GAN to know what state reviews are currently in. I think we can just move Legobot's updates to a user subpage, and let ChristieBot take over the real WP:GAN page. Legoktm (talk) 20:36, 25 October 2022 (UTC)[reply]
- That would certainly work, but it means none of Legobot's GAN functions go away immediately. I will look into replacing the other functions after this phase. Legoktm, it looks like there are four GAN-related tasks: update GAN with the current state of the nominations from templates, using edit summaries drawn from the difference with the previous state of the page; update User:GA bot/Stats, again using the diff to understand what the update should be; post user talk page notifications to let users know their GAN has changed status; update the topic lists with the nomination information. I can see how to handle almost all of that, but how do you get the reviewer user name? By parsing the GA subpage? Or is there a better way? And a separate question, if you have a moment; I'm inclined to store the state of the GAN page in a database table with a timestamp, and use that to diff against the current state, to avoid having to read the page. Do you see any problem with that approach? Thanks -- Mike Christie (talk - contribs - library) 21:54, 25 October 2022 (UTC)[reply]
- I've been pinged, but I honestly can't remember who has expressed interest and/or been actually working on the Legobot replacement, only that no one has ever gotten to the point of trying out code. Someone could probably search the WT:GAN archives. However, there are two things that I'd like to point out. The first is that the Reports page is generated by WugBot (pinging Wugapodes) by parsing the currently existing WP:GAN page, so that any change in the page—or a relocation of it in the current format—will need to be coordinated with Wugapodes so we don't lose the Reports functionality. The second is that the new GAN page appears to remove the parts of the page that give valuable review information: reviewer name and the date when the review began. That information needs to be easily available and on a GAN page: when we have an issue with a missing or problematic reviewer, a quick search of the GAN page shows us each of their active reviews (and nominations). Mike, can the current sample pages be updated with a key? I'm only guessing at what some of the columns really mean. BlueMoonset (talk) 03:26, 26 October 2022 (UTC)[reply]
- Per Legoktm above, initially we'll have Legobot write the current GAN page to a subpage, because it needs to do that to provide the edit summaries and to know what has passed or failed, so it can post talk page messages. So WugBot can keep reading Legobot's output to produce that data. For the reviewer: I don't currently have a way to extract the reviewer name but can look at doing that and adding the name as a column. One thing that's come up in the WT:GAN discussions is that we don't want too many columns as the table quickly gets too wide for many screens to easily navigate. We could add the reviewer name, perhaps inside the notes column; we could also have a separate page that includes different columns where you could sort by reviewer. And for now, at least, we could still use the Legobot written page, which won't be going away just yet, just moving to a subpage.
- It's becoming clearer to me that it would make the most sense if ChristieBot took over all GAN functions, plus WugBot's updates to the GA report. I think it makes the most sense to go ahead with this BRFA which only asks to rewrite the GAN page and a few other pages on request, and doesn't take over Legobot or WugBot. Once that's in place and running smoothly, with Legobot's page still available as a fallback in case ChristieBot runs into trouble, I will work on designing the back end changes that I would need to make to perform the other tasks.
- To your last point: the column headings were abbreviated for space, but there are tooltips if you hover your mouse. I can easily add whatever explanatory material we want at the top of the page, along with the usual boilerplate for the GAN page. Mike Christie (talk - contribs - library) 10:59, 26 October 2022 (UTC)[reply]
- One more point: WugBot has been tested with the ChristieBot version of the existing GAN format and is working correctly. Mike Christie (talk - contribs - library) 12:04, 2 November 2022 (UTC)[reply]
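The idea raised above of storing the state of the GAN page in a database table with a timestamp, and diffing it against the current state to build edit summaries, could be sketched roughly as follows. The table name, the status strings, and the summary wording are all hypothetical, and an in-memory SQLite database stands in for whatever database the bot would really use.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the snapshot store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gan_state ("
        "saved_at TEXT NOT NULL, title TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def save_snapshot(conn: sqlite3.Connection, state: dict[str, str]) -> str:
    """Store one {article title: status} snapshot under a UTC timestamp."""
    saved_at = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    conn.executemany(
        "INSERT INTO gan_state (saved_at, title, status) VALUES (?, ?, ?)",
        [(saved_at, title, status) for title, status in state.items()],
    )
    conn.commit()
    return saved_at


def load_latest(conn: sqlite3.Connection) -> dict[str, str]:
    """Return the most recently saved snapshot, or {} if none exists."""
    rows = conn.execute(
        "SELECT title, status FROM gan_state "
        "WHERE saved_at = (SELECT MAX(saved_at) FROM gan_state)"
    ).fetchall()
    return dict(rows)


def summarize_changes(previous: dict[str, str], current: dict[str, str]) -> str:
    """Compare two snapshots and build a human-readable edit summary."""
    new = sorted(t for t in current if t not in previous)
    gone = sorted(t for t in previous if t not in current)
    changed = sorted(
        t for t in current if t in previous and current[t] != previous[t]
    )
    parts = []
    if new:
        parts.append("New: " + ", ".join(new))
    if changed:
        parts.append(
            "Status changed: "
            + ", ".join(f"{t} ({previous[t]} -> {current[t]})" for t in changed)
        )
    if gone:
        parts.append("Removed: " + ", ".join(gone))
    return "; ".join(parts) if parts else "No changes"
```

Note that a nomination that disappears between snapshots may have passed or failed; this sketch only reports that it is gone, so telling the two apart would need a further look at the article's talk page.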
- Approved for trial (30 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 12:19, 26 October 2022 (UTC)
Primefac, since it's apparent I need to replicate Legobot's layout of the main GAN page unless I take over the other functions, and since Legoktm confirms above that he wants someone to take over the GAN functions of Legobot, I've gone ahead and written the code for those other functions too. I would like to change the stated purpose of this BRFA to be: take over all GAN functionality from Legobot, and also add the ability to create new GAN pages per the BRFA description above. That way if this is successful and ChristieBot takes over, I would not have to come back here for a subsequent RfC regarding changing the format of GAN, even though that might mean writing several new pages. Re progress: the bot is creating copies of the various Legobot pages and I am finding and fixing issues that have arisen; I'll report back here when I've had several consecutive days of trouble-free running. Mike Christie (talk - contribs - library) 12:05, 2 November 2022 (UTC)
- I figured that was already what you were doing. Go for it. Primefac (talk) 12:08, 2 November 2022 (UTC)
- OK, thanks. Sorry if I should have checked in here before writing that code; not very familiar with bot approval protocol yet. By the way, a related question: I have thought of a way to rebuild some GA history in the back end database for the bot which would require iterating over all current and past GAs. No edits to Wikipedia would be required but it would be a costly job. If I write it, I would make sure it ran slowly so as not to hog resources, but for that kind of task do I need to come to BRFA? Or is it just "be sensible about toolforge resources"? Mike Christie (talk - contribs - library) 12:41, 2 November 2022 (UTC)[reply]
- I'd say the latter ("be sensible about toolforge resources"). In general, tasks which are not visible to other editors in terms of edits/actions (i.e. you're only querying things) do not require a BRFA. BRFA mostly ensures tasks have consensus, are operated by a suitable operator, and aren't buggy to the point of being disruptive. System performance is the domain of Wikimedia/Toolforge sysadmins, who I guess would get in touch with you if your project was using too many resources. #wikimedia-cloud on IRC is usually a good place to chat with them if you want thoughts in advance.
- BTW I don't think such a job would be costly, especially if you only query articles matching some characteristic of a GA (e.g. Special:WhatLinksHere/Template:Good article). ProcrastinatingReader (talk) 16:27, 2 November 2022 (UTC)
- Thanks; that's what I thought but wanted to check. Re the job, it would iterate over all present and past GAs, using either a category or what-links-here or possibly the publicly accessible database behind WP:WBGAN, and then scan the revisions of the talk page for GA-related events -- nomination, change of nominee status, pass or fail, delisting. Say 70,000 pages and probably an average of 50-100 talk page revisions. And I would expect clean data no more than, say, 50% of the time, so I would have to keep rerunning it with fixes for the ones that didn't come up clean. I think it could be expensive. But it's just a plan at the moment; I haven't tried looking at any rev histories to confirm it could be done yet. Mike Christie (talk - contribs - library) 16:57, 2 November 2022 (UTC)[reply]
- That sounds quite similar to how the WBGAN database was initially generated – see processArticle() in https://github.com/siddharthvp/SDZeroBot/blob/master/most-gans/model.ts. – SD0001 (talk) 19:39, 4 November 2022 (UTC)[reply]
- Thanks! I'm not an experienced Python programmer but looking through I think I see what's going on. Am I right in thinking you don't keep a record in the nominators table in WBGAN if it loses GA status? That is, I can use nominators as a starting point for current GAs, but I would need a regeneration process to find and record all ex-GAs? Mike Christie (talk - contribs - library) 20:52, 4 November 2022 (UTC)[reply]
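For a sense of scale, 70,000 pages at 50 to 100 talk page revisions each comes to roughly 3.5 to 7 million revisions to scan. The scanning step itself might look something like the sketch below, which walks a talk page's revisions from oldest to newest and reports each change in GA state. This is only an illustration under simplifying assumptions: the regular expressions cover the plain {{GA nominee}}, {{GA}}, {{FailedGA}} and {{DelistedGA}} templates and ignore {{Article history}} and template redirects, the state names are made up, and fetching the revisions is left out.

```python
import re

# Simplified patterns (assumption): real talk pages are messier than this.
NOMINEE = re.compile(r"\{\{\s*GA ?nominee\b[^}]*\}\}", re.IGNORECASE)
STATUS = re.compile(r"\|\s*status\s*=\s*([^|}\s]*)", re.IGNORECASE)
OUTCOMES = {
    "passed": re.compile(r"\{\{\s*GA\s*[|}]", re.IGNORECASE),
    "failed": re.compile(r"\{\{\s*FailedGA\b", re.IGNORECASE),
    "delisted": re.compile(r"\{\{\s*DelistedGA\b", re.IGNORECASE),
}


def classify(text: str) -> str:
    """Return the GA state implied by the wikitext of one talk page revision."""
    nominee = NOMINEE.search(text)
    if nominee:
        status = STATUS.search(nominee.group(0))
        value = status.group(1).lower() if status else ""
        # An absent or empty status parameter is treated as a fresh nomination.
        return "nominated:" + (value or "new")
    for state, pattern in OUTCOMES.items():
        if pattern.search(text):
            return state
    return "none"


def ga_events(revisions):
    """Yield (timestamp, state) each time the GA state changes.

    `revisions` is an iterable of (timestamp, wikitext) pairs, oldest first.
    """
    previous = "none"
    for timestamp, text in revisions:
        state = classify(text)
        if state != previous:
            yield timestamp, state
            previous = state
```

Pages where the event sequence makes no sense (for example a pass with no preceding nomination) would be the ones that "didn't come up clean" and need a rerun with fixes.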
"I need to replicate Legobot's layout of the main GAN page unless I take over the other functions"
I don't think this is the case, I can just have Legobot write and parse the page as it wants in userspace rather than from WP:GAN. But if you're getting set to take all the GAN stuff over, that would be even better! Legoktm (talk) 01:50, 5 November 2022 (UTC)
- Good to know, but I hope I can take over all the functions. I think I have it now doing everything live, at least to test pages, except for the transclusions, which are just commented out. I fixed an issue with the reviewing stats today so I’d like to let it run for another few days and see if anything else comes up as an issue. Mike Christie (talk - contribs - library) 02:24, 5 November 2022 (UTC)
Update
The bot has now been running trouble-free for several days. I noticed a couple of days ago that there was one more function that Legobot does that I hadn't written: adding the oldid to a new GA's talk page. I've added that function and tested it as far as is possible; I can't really test it fully till the bot takes over from Legobot. The only remaining thing I can think of that could be a problem is that there might be some other bots that depend on the current GAN page format. WugBot does depend on it, but works fine with the new format. SDZeroBot does not depend on the format of the page. The only other bot I know of that cares about GANs is AAlertBot; Hellknowz/Headbomb, would it matter to AAlertBot if the format of the GAN page changed from WP:GAN to include User:ChristieBot/GAN existing format as the body instead? The head and tail text would be unchanged. Or are you using page categories to find GANs instead? In which case there would be no issue. Mike Christie (talk - contribs - library) 03:24, 11 November 2022 (UTC)
- @Mike Christie AAlertBot uses GAN categories, so it shouldn't matter what happens to the WP:GAN page. — HELLKNOWZ ∣ TALK 10:49, 11 November 2022 (UTC)
- OK, thanks; glad to hear it. Mike Christie (talk - contribs - library) 10:55, 11 November 2022 (UTC)
Given the answer from Hellknowz above, I think ChristieBot is ready to take over from Legobot whenever this is approved. Is it usual to wait till the end of the 30-day trial period regardless? Mike Christie (talk - contribs - library) 10:55, 11 November 2022 (UTC)
- I think that'd be ok and the trial results seem fine to me; Primefac?
- I'm assuming you'd need to coordinate the swap-over with Legoktm? ProcrastinatingReader (talk) 17:18, 13 November 2022 (UTC)
- Yes, definitely. Legoktm, let me know when would be a good time for you. I'd like to do it no later than tomorrow afternoon (east coast time), or else wait till Thursday morning (11/17), as I will be unable to do anything about any issues that might come up for most of Tuesday and Wednesday. Mike Christie (talk - contribs - library) 17:21, 13 November 2022 (UTC)
- Thursday morning works, I'll probably just turn it off late Wednesday night then. Legoktm (talk) 01:49, 14 November 2022 (UTC)
- Legoktm, any time from now till tomorrow mid-afternoon would also work. If/when you do shut it off please let me know when you do so I can change the cron job and page name targets for ChristieBot right afterwards. ProcrastinatingReader, Primefac, can ChristieBot have the rapid editing flag turned on? It’s not necessary but it would sure speed up any debugging I might need to do. Mike Christie (talk - contribs - library) 02:24, 14 November 2022 (UTC)
- I'm not sure what you mean by the rapid editing flag? I think the bot flag (which ChristieBot currently has due to Wikipedia:Bots/Requests for approval/ChristieBot) should already give you noratelimit, if that's what you mean? ProcrastinatingReader (talk) 23:00, 16 November 2022 (UTC)
- I assumed the bot was rate limited because when I run it in ssh I get a "sleeping for 9.2 second" message every time I write a page. If that's not rate limiting, what causes that? Mike Christie (talk - contribs - library) 23:11, 16 November 2022 (UTC)
- You're using pywikibot it seems? I'm completely guessing but the library might have support for maxlag. Basically the API returns the load the servers are under, and clients can choose to look at that load value and act accordingly. Guidance is to not run high-priority bot tasks in high-load periods. So pywikibot may under the hood be adhering to that by rate limiting in times of high load. There may also be built in throttling in the library to stop too many requests being sent in short spaces of time.
- In any case, I'm pretty sure that behaviour is a function of the library you're using, not the site blocking the request. You may be able to disable this logic. Looking at mw:Manual:Pywikibot/Global Options maybe setting appropriate values for maxlag and put_throttle will do the trick. ProcrastinatingReader (talk) 02:59, 17 November 2022 (UTC)
- Thanks -- I'll look into that; I appreciate the pointer. With any luck the bot will work perfectly and will need no debugging so it won't be an issue.... Mike Christie (talk - contribs - library) 03:54, 17 November 2022 (UTC)
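To make the client-side throttling described above concrete, here is a minimal sketch of a write throttle that enforces a minimum gap between consecutive page saves. It is not pywikibot's implementation; the `WriteThrottle` class and its parameters are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without actually waiting. A message like "sleeping for 9.2 second" would be consistent with this kind of logic if, say, the minimum interval were about 10 seconds and roughly 0.8 seconds had elapsed since the previous write.

```python
import time


class WriteThrottle:
    """Enforce a minimum number of seconds between consecutive writes."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_write = None  # time of the previous write, if any

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_write is not None:
            elapsed = now - self._last_write
            delay = max(0.0, self.min_interval - elapsed)
        if delay:
            self._sleep(delay)
        # Record when the write is allowed to proceed.
        self._last_write = now + delay
        return delay
```

Calling `wait()` immediately before each save is the whole usage. In pywikibot itself the corresponding knobs are the maxlag and put_throttle settings mentioned in the comment above, documented at mw:Manual:Pywikibot/Global Options.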
I've turned off Legobot's GA tasks. Legoktm (talk) 19:55, 17 November 2022 (UTC)
- ChristieBot is now running every 20 minutes. There was a bug that led to some duplicate GA review transclusions, which is now fixed. It has gone through a couple of times cleanly now, and I've fixed another very minor formatting issue. I'll update here only if there are major issues. Mike Christie (talk - contribs - library) 23:43, 17 November 2022 (UTC)
- Approved. Looks like all is running well and the switch-over has completed. So I'm going to go ahead and approve this, in line with the amended scope ("take over all GAN functionality from Legobot, and also add the ability to create new GAN pages per the BRFA description above"). Feel free to get in touch if you have any questions or need to seek an amendment to the task. ProcrastinatingReader (talk) 14:01, 19 November 2022 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.