Wikipedia:Bots/Requests for approval/IndentBot: Difference between revisions

Hi I'm back from break. If people feel this bot will be useful let's continue. Only thing I'm worried about is if people think it's a nuisance relative to the number of people it helps. [[User:Notacardoor|Winston]] ([[User talk:Notacardoor|talk]]) 20:01, 27 February 2022 (UTC)
*{{U|David Eppstein}}, sorry to put this on you, but you pursued the issues here farther than I did. Are your (our) concerns all addressed? [[User:EEng#s|<b style="color:red;">E</b>]][[User talk:EEng#s|<b style="color:blue;">Eng</b>]] 20:21, 27 February 2022 (UTC)
**I think my concerns are addressed by {{tq|"I have limited the bot to listgap and non-final-character indentmix fixes only. Indentation levels and final indentation characters are not changed"}}. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 20:26, 27 February 2022 (UTC)

Revision as of 20:26, 27 February 2022


Operator: Notsniwiast (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:20, Friday, October 15, 2021 (UTC)

Function overview: Adjust indentation on discussion pages.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python, pywikibot

Source code available: On Github

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests/Archive_83#Bot_to_fix_indents

Edit period(s): Continuous (tracking recent changes on a delay)

Estimated number of pages affected: Depends on parameters. With a delay of 10 minutes, around 20-30 pages are checked every 10 minutes (see function details below). Initially, most pages with substantial content will be edited, but since the bot processes the entire page, this number will fall over time as the bot covers more ground.

Namespace(s): All talk namespaces, and the project namespace. Not sure if any other namespaces have discussion pages.

Exclusion compliant (Yes/No): Yes, uses pywikibot's save function.

Function details: First, the wikitext is partitioned into lines in the usual manner using \n as a delimiter, except that certain newlines, such as those immediately preceding a table, template, or tag (as detected by WikiTextParser), are not considered the end of a line. Then we apply fix_gaps, fix_extra_indents, and fix_indent_style, in that order, to the sequence of lines.

Definitions

  • The indentation characters are *, :, and #.
  • Given a line X, we denote the indentation characters of the line by indent_text(X), and we denote the indentation level by lvl(X). In particular, if X is not indented then lvl(X) == 0.
  • A blank line is a line consisting of whitespace only.
  • A gap is a nonempty contiguous sequence of blank lines sandwiched between two indented lines, which are called the opening line and closing line.
  • The length of a gap is the length of the sequence of blank lines.
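The definitions above can be made concrete with a small sketch. This is an illustration only, not the bot's actual source: the helper names is_blank and find_gaps are hypothetical, and lines are split naively, ignoring the special newline handling described under Function details.

```python
import re

def indent_text(line: str) -> str:
    """Return the leading run of indentation characters (*, :, #)."""
    return re.match(r"[*:#]*", line).group()

def lvl(line: str) -> int:
    """Indentation level: number of leading indentation characters."""
    return len(indent_text(line))

def is_blank(line: str) -> bool:
    """A blank line consists of whitespace only."""
    return line.strip() == ""

def find_gaps(lines):
    """Return (start, end) index pairs, end exclusive, of each gap:
    a run of blank lines sandwiched between two indented lines."""
    gaps = []
    i, n = 0, len(lines)
    while i < n:
        if not is_blank(lines[i]):
            i += 1
            continue
        j = i
        while j < n and is_blank(lines[j]):
            j += 1
        # lines[i-1] is the opening line, lines[j] the closing line.
        if i > 0 and j < n and lvl(lines[i - 1]) > 0 and lvl(lines[j]) > 0:
            gaps.append((i, j))
        i = j
    return gaps
```

With this sketch, the length of a gap (start, end) is simply end - start.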

Fixes

  1. fix_gaps: This fix has many variations. Let A and B be the opening and closing lines, respectively. No gap with an opening or closing line beginning with # is removed. Otherwise, all length 1 gaps are removed, and longer gaps are removed only if lvl(B) > 1.
  2. fix_extra_indents: We iterate over the lines from beginning to end. If we encounter a line A followed by a line B such that lvl(B) > lvl(A) + 1, then the subsequent chunk of lines which have indentation level greater than or equal to lvl(B), beginning with B, is shifted to the left by lvl(B) - lvl(A) - 1 positions. This is done by stripping out indent_text[lvl(A):lvl(B)-1] (in Python notation) from these lines.
  3. fix_indent_style: We iterate over the lines from beginning to end and adjust the indent_text of each line to use corresponding characters from the closest previous line with the same or smaller level, except that # characters are not removed from, introduced to, or shifted inside a line.

The above description leaves out some details (namely some exceptions for edge cases). The fixes are repeatedly applied in the above order until another round won't alter the page (one round is almost always enough).

It's basically impossible to handle all edge cases and it's not difficult to come up with some of them, especially when you use ordered lists and combinations of possible mistakes. The hope is these are rare enough to be acceptable.

The bot tracks recent changes on a delay of delay minutes, in chunks of chunk minutes (delay and chunk are parameters), checking for non-minor, non-bot edits that add a user signature and that have not been superseded in the most recent delay minutes. The effect of this is that IndentBot is activated by signature-adding edits only, and does not edit any page which has had a signature-adding edit in the most recent delay minutes. I believe delay should be set to 10 to 30 minutes. Too long a delay results in editors manually fixing indentation in active discussions, partially defeating the purpose of the bot. Non-talk pages must have at least 3 signatures to be edited, ensuring that a single accidental signature on a non-discussion page doesn't trigger the bot. Most sandboxes are avoided.
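The selection rule above can be sketched in plain Python, independent of pywikibot. Everything here is an assumption made for illustration: the Edit record, its field names, and the pages_to_check function are hypothetical, and the sketch treats only non-minor, non-bot signature-adding edits as superseding. The 3-signature rule for non-talk pages and the sandbox exclusion are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Edit:
    # Hypothetical record of one recent change.
    title: str
    timestamp: datetime
    minor: bool
    bot: bool
    adds_signature: bool

def triggers(edit: Edit) -> bool:
    """Only non-minor, non-bot edits that add a signature count."""
    return edit.adds_signature and not edit.minor and not edit.bot

def pages_to_check(edits, now, delay, chunk):
    """Titles with a triggering edit in the chunk-minute window ending
    delay minutes ago, and no triggering edit since that window."""
    window_end = now - timedelta(minutes=delay)
    window_start = window_end - timedelta(minutes=chunk)
    candidates = {e.title for e in edits
                  if triggers(e) and window_start <= e.timestamp < window_end}
    still_active = {e.title for e in edits
                    if triggers(e) and e.timestamp >= window_end}
    return candidates - still_active
```

Run every chunk minutes, a filter like this visits each quiet discussion once, roughly delay minutes after its last signed comment.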

Discussion

  • Also, can someone make IndentBot a confirmed user so that it can bypass CAPTCHAs? Winston (talk) 04:01, 15 October 2021 (UTC)[reply]
    • Nevermind, now autoconfirmed. Winston (talk) 22:54, 15 October 2021 (UTC)[reply]
  • Does anyone know why I still see some bots when filtering recent changes for human edits only? Winston (talk) 08:25, 17 October 2021 (UTC)[reply]
    Answered here. Winston (talk) 01:51, 18 October 2021 (UTC)[reply]
  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 23:07, 18 October 2021 (UTC)[reply]
    Sorry, ran the wrong function once. Winston (talk) 01:42, 19 October 2021 (UTC)[reply]
  • Thanks for working on this. In response to Not sure if any other namespaces have discussion pages, DYK noms are the odd example that always comes to mind, e.g. Template:Did you know nominations/La Folia Barockorchester. It's probably fine if we skip these to keep things simple, though. — The Earwig (talk) 03:39, 20 October 2021 (UTC)[reply]
    If it's only a couple cases like the DYK noms, then it's pretty easy to handle them with a quick title prefix check. Winston (talk) 03:53, 20 October 2021 (UTC)[reply]
  • The code has pretty much settled and the bot is ready for a short trial if the example diffs given look good. Winston (talk) 04:07, 20 October 2021 (UTC)[reply]

Approved for trial (50 edits or 7 days, whichever happens first). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 07:35, 20 October 2021 (UTC)[reply]

@Primefac Should edits be minor? Winston (talk) 07:36, 20 October 2021 (UTC)[reply]
For the trial, let's go with "no" so that it receives a bit more scrutiny. I think if this goes through, marking as minor would match similar bots. Primefac (talk) 09:06, 20 October 2021 (UTC)[reply]

Trial complete. See the diffs here.

  • Haven't looked too carefully yet, but one edge case I saw was Line 80 in the diff Wikipedia:Arbitration enforcement log/2021 involving {{Div col}}. It would be fine if {{Div col}} started on the same line as the comment. Winston (talk) 12:56, 20 October 2021 (UTC)[reply]
    Possible fix is to not adjust style if the previous line contains an exceptional newline character, in this case the exceptional newline is the one just before {{Div col}} (since newlines just before templates do not count as delimiters in the line partitioning phase). Winston (talk) 13:03, 20 October 2021 (UTC)[reply]
  • Note: I suspect the easiest way to handle edge cases as they come up is to simply prevent the bot from making certain edits to certain lines, rather than trying to handle every case correctly. Winston (talk) 13:07, 20 October 2021 (UTC)[reply]
  • Not sure if this is open to all comments, feel free to remove if not. I came here after seeing the edit at Talk:List of Ayatollahs and I was interested. If you look at the edit it made, it didn't manage to get it correct. Although it starts well, it fails at the signature section starting "please study the answers", which should have been indented. Because it missed this, all the edits made afterwards are wrong.
    Also the messages it changed are over a decade old, will it be normal practice for it to change messages that are that old or was this part of the test? ActivelyDisinterested (talk) 23:34, 20 October 2021 (UTC)[reply]
    The edit looks better formatted to me and I don't see unintended edge cases, though I'm interested in others' opinions. Be sure to check out the links at User:IndentBot#IndentBot to understand why and how the indentation is being adjusted.
    As for old messages, the bot does not take that into account. It adjusts indentation on the entire page at once. For more active talk pages, old discussions are often stored in archives. Since the bot is only activated by a recent edit with signature, archived pages shouldn't be touched. Winston (talk) 00:07, 21 October 2021 (UTC)[reply]
    It's indented part of a message, and left the last section unindented (the second grey unedited section). That's definitely not right. ActivelyDisinterested (talk) 00:46, 21 October 2021 (UTC)[reply]
    Could you partially quote the lines you are referring to? Note that the bot does not fix indentation completely—in particular, it does not add extra indentation (so unindented lines will remain unindented). It only changes indentation characters, removes blank lines, and reduces over-indentation. Winston (talk) 00:56, 21 October 2021 (UTC)[reply]
    Ah! I see what happened. The section at issue in full is:
    please study the answers in his discussion pages in different languages. Academycanada (talk) 03:21, 24 November 2009 (UTC)[reply]
    This is the end of the message that began in full:
    :By a simple search, you can find the sources such as Islamic organizations, independent websites and academic institutions which introduced him as one of Marjas and Grand Ayatollahs. Here are some of them in different languages:. note the message starts with an indent
    The start of the message was indented, the bot correctly indented the middle lines of the message, but the end section was not originally indented and so the bot ignored it. As you said unindented lines remain unindented, but that does leave one message with two levels of indents. ActivelyDisinterested (talk) 01:08, 21 October 2021 (UTC)[reply]
    Yeah, this bot isn't smart enough to fix all errors. It would have to be way more advanced to tackle issues at a "per message" level of detail, and even then there are too many edge cases. Winston (talk) 01:14, 21 October 2021 (UTC)[reply]
  • Made minor improvements to the line partitioning. Also, fix_indent_style now resets its "memory" after list-breaking newlines. This behavior makes more sense and is more faithful to the original indentation. It solves quite a few bugs including the {{Div col}} one I mentioned earlier. Winston (talk) 05:08, 22 October 2021 (UTC)[reply]
  • @Primefac: Can I do another trial to draw more scrutiny? (Also want to test it on Toolforge this time.) Winston (talk) 08:42, 22 October 2021 (UTC)[reply]
  • Regarding [1]: this is not the bot's fault, but on AFDs it's convention to use bullets for voting, while delsort notices use colon indents. In this case, the bot changed all bullets to colons after the first delsort notice – and this would happen on literally every AFD. Can something be done about this?
    Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I would suggest citing a policy/information page in the edit summary. Also, consider using minor edit flag for user talk pages even in trial as otherwise the users would get new messages alert. – SD0001 (talk) 12:24, 24 October 2021 (UTC)[reply]
    Ok, edits to user talk pages will get minor edits. Also, I can add a simple exception for comments beginning with <small class="delsort-notice". Added link to MOS:INDENTMIX to the edit summary. Winston (talk) 12:31, 24 October 2021 (UTC)[reply]
    @SD0001: Actually, how are these delsort notices inserted? Are they manually typed out or is there some automation involved? I noticed one of them just used <small> without the "class" attribute. Winston (talk) 12:48, 24 October 2021 (UTC)[reply]
    Found it. It is {{Deletion sorting}}. But I guess every now and then someone adds it manually. I'll just use a regex for a small tag followed by "Note:". Winston (talk) 12:57, 24 October 2021 (UTC)[reply]
    Yes, that template is substed by a couple of tools – MediaWiki:Gadget-twinklexfd.js and User:Enterprisey/delsort.js being the two common ones. – SD0001 (talk) 12:58, 24 October 2021 (UTC)[reply]
    @Notsniwiast Actually, I went ahead and boldly edited that template to use a bullet instead. If no one reverts my edit, then an exception would be unnecessary. – SD0001 (talk) 13:02, 24 October 2021 (UTC)[reply]
    @SD0001 Do you still want me to add this exception for the trial, then maybe remove it later? Winston (talk) 13:05, 24 October 2021 (UTC)[reply]
    Yes that would be better, as the bot would be touching many pages that already have colon-indented delsort notices. – SD0001 (talk) 13:06, 24 October 2021 (UTC)[reply]

Trial complete. See the contributions here, or see the diffs in alphabetical order here. Winston (talk) 14:50, 24 October 2021 (UTC)[reply]

  • It seems Wikipedia:Categories for discussion also uses some templates which trigger the bot. Winston (talk) 14:53, 24 October 2021 (UTC)[reply]
    See Template talk:Cfd2#Remove leading colons regarding those templates. – SD0001 (talk) 16:58, 24 October 2021 (UTC)[reply]
  • I think this edit should not have been made. It divided User:Salimfadhley's comment into 3 bullet points, when it looks like they intended to create an effect similar to parabreaks. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 15:53, 24 October 2021 (UTC)[reply]
  • Special:Diff/1051601137 is a worry for me. Not necessarily because the bot shouldn't have made the edit, but because those entries were all made by the default templates. Either we change the template, exclude the PERM pages from the bot, or accept the fact that every time someone requests a permission the bot will follow behind and fix it. I think option 1 (changing the pre-set layout) is likely best but that will likely require further discussion and/or consensus, especially since there's a bot that needs to clerk (not sure how that will affect it). Primefac (talk) 17:08, 24 October 2021 (UTC) sorry for the no-show today, dealing with a rather heavy headache for some reason[reply]
    • Yeah it seems there's a couple of these templates around. I guess the plan right now is to exclude the relevant pages, and include them later if the templates are changed. But I'm still not sure if all the relevant entries are made using templates. I see some variation in the delsort notices, e.g. <small class="delsort-notice"> versus just <small>, so unless there's more than one version of the templates or editors are doing it manually, there might be some other tools involved (I don't know anything about assisted editing tools). Another example is Wikipedia:Articles_for_deletion/Metropolitan_Gazette_(2nd_nomination) where the delsort notices still use : even though it was made after SD0001's edit. For now, I will skip "Wikipedia:Requests_for_permissions/" and "Wikipedia:Categories for discussion/". The notices using <small> tags were already handled for the trial. Winston (talk) 02:16, 25 October 2021 (UTC)[reply]
    I ended up asking one editor why their delsort notices didn't have the class attribute, and apparently they were just doing it manually. So I think it's likely that variations are due to manual edits. Winston (talk) 11:14, 25 October 2021 (UTC)[reply]
  • Note: (This is in reply to ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ's comment, but I'm posting it here since it's more generally relevant.) Unfortunately, there’s not much to do in these inevitable cases. SD0001 brought up a similar example before. The nature of the problem requires that the bot operate on entire discussions at once. As a result, anything more than a single minor “violation” in a discussion makes it impossible to create a consistent and accessible list without sometimes changing an editor’s indentation visually. Making exceptions leaves broken lists/markup, and often just shifts the issue to a different part of the list. Since the change is usually minor and doesn’t alter core content, I hope this is acceptable. I also hope the bot’s work will increase awareness of templates such as {{pb}} and {{HTML lists}} which address the most common reasons (that I’ve seen) for incorrect markup. I have links to these templates and other guidelines on the bot's user page. Winston (talk) 02:26, 25 October 2021 (UTC)[reply]
  • I have noticed the bot doing useless edits removing blank lines, which is not needed. In fact everything listed for this bot to do is useless. I will deliberately indent more or change style of indent, so it looks as if this will try to undo that. Looks like this bot is trying to fix a non-problem. Surely there are more useful things to do with bots around here. Graeme Bartlett (talk) 10:24, 26 October 2021 (UTC)[reply]
    @Graeme Bartlett Could you provide an example of your using over-indentation or changing indent style, for which normal indentation would be inadequate and for which an accessible solution is impractical? From what I've seen, this is quite rare, but it could be an edge case that can be avoided. Winston (talk) 02:46, 27 October 2021 (UTC)[reply]
 – Winston (talk) 11:10, 26 October 2021 (UTC) First time moving a discussion, tell me if I did it incorrectly.[reply]

This ?bot? made a useless edit here: https://en.wikipedia.org/w/index.php?title=Talk:Bicarbonate&curid=1450293&diff=1051599806&oldid=1051598562

which has no effect on the output we see. I thought that bots were not permitted to make cosmetic-only changes. Even if the extra blank line is redundant, there is no need to remove it! Graeme Bartlett (talk) 10:18, 26 October 2021 (UTC)[reply]

Break

What's the status of this? Not sure where to go from here. I've noticed that on mobile, bulleted and unbulleted comments don't line up (check here for example), so the bot is even more effective there. Winston (talk) 01:06, 29 October 2021 (UTC)[reply]

{{BAG assistance needed}} Winston (talk) 09:17, 31 October 2021 (UTC)[reply]
I think this needs another round of trial, this time a larger one. The CfD templates have been fixed per talkpage note, and I see you've edited the PERM template too. As for WP:RFUD, which is where I assume @Graeme Bartlett is coming from, the issue seems to be that {{UND}} when substed produces a bullet indent, but most users haven't noticed this and are anyway adding an indent character of their own.
Also, I think the issue of changing the final indent character should be discussed. I don't have any preferences, but I think changing a visible bullet to no bullet (or vice versa, see several cases in [2]) can be seen as intrusive. Would like to hear others' thoughts on this. – SD0001 (talk) 12:59, 31 October 2021 (UTC)[reply]
Apologies for radio silence on this one, it's relatively low-priority at this point in my life, but I do agree based on a read-through here that a further trial would probably be good. Primefac (talk) 13:03, 31 October 2021 (UTC)[reply]
I did realize that changing the final (and hence visual) character could be annoying, but the point is that mixing characters shouldn't happen in the first place. So if the final indent character is not changed, it neuters a large portion of the fixes. Even a simple single-level list such as
* Comment 1.
: Comment 2.
* Comment 3.
: Comment 4.
would be left as four separate lists in HTML and to screen readers. Let me see if I can compute approximately what fraction of indentation style fixes occur in the final character. Winston (talk) 13:17, 31 October 2021 (UTC)[reply]
In Category:Non-talk pages that are automatically signed (just using this to get a quick collection of pages), 2770 lines would have indentation characters altered, and 839 of those lines would have an altered final character. Each altered character represents (almost always) a new list being started where there shouldn't be. Winston (talk) 13:31, 31 October 2021 (UTC)[reply]
@SD0001 I'm confused about the {{UND}} template. When I substed it into my sandbox I didn't see a bullet point, and the template's doc doesn't show bullet points either. I believe Graeme Bartlett noticed the bot through the diff they linked. Winston (talk) 14:26, 31 October 2021 (UTC)[reply]
Indeed it doesn't. I assumed that was the reason why so many of the RFUD comments were over-indented ([3]). – SD0001 (talk) 14:36, 31 October 2021 (UTC)[reply]

@SD0001 I reviewed the "last character issue" and I see how it can be intrusive when, for example, the first comment in a level is unbulleted, but the following comments are all or mostly bulleted which then get changed by the bot. Two examples are in the sections "Unban request for Soumya-8974" and "SoyokoAnis unban appeal" in this diff. Perhaps I could implement a compromise where the bot first computes which type (bulleted or unbulleted) is more common for each level, then when it encounters an INDENTMIX violation, it uses the more common type. Winston (talk) 10:10, 1 November 2021 (UTC)[reply]

With this strategy, the number of lines with altered final character gets reduced by 25% to 630. Winston (talk) 13:27, 1 November 2021 (UTC)[reply]
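The compromise described above (prefer whichever final character, bulleted or unbulleted, is more common at each level) might be sketched as follows. This is only an illustration under stated assumptions: the function name preferred_final_chars is hypothetical, # is left out because the bot does not alter it, and the bot's real tie-breaking and scope (per page or per discussion) are not specified in this thread.

```python
import re
from collections import Counter

def indent_text(line):
    """Leading run of indentation characters (*, :, #)."""
    return re.match(r"[*:#]*", line).group()

def preferred_final_chars(lines):
    """Map each indentation level to the more common final indentation
    character ('*' for bulleted, ':' for unbulleted) at that level."""
    counts = {}
    for line in lines:
        ind = indent_text(line)
        # Lines ending their indent in '#' are ignored entirely.
        if ind and ind[-1] in "*:":
            counts.setdefault(len(ind), Counter())[ind[-1]] += 1
    return {level: c.most_common(1)[0][0] for level, c in counts.items()}
```

On an INDENTMIX violation at a given level, the fix would then use the character returned for that level rather than copying the earlier line's final character.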

I've made a number of slight improvements to each of the three fixes and I think the bot is ready for a third trial. I don't think the final character issue can be mitigated any more without simply ignoring final character INDENTMIX violations. I guess we can see whether anyone complains during/after the trial. I'll continue the non-minor edit policy except for user talk pages for the trial to draw more scrutiny. Winston (talk) 13:56, 2 November 2021 (UTC)[reply]

@SD0001: I realized the bot wasn't conservative enough and would sometimes make the text harder to understand by not preserving the original editor's indentation style. After some brainstorming and trial and error, I've managed to make the bot respect the original indentation much more while sacrificing a bit of accessibility, i.e. it defers to the original text for certain INDENTMIX violations. The number of final indentation characters changed has been reduced a further 48%. Can we start a third trial? Winston (talk) 11:04, 4 November 2021 (UTC)[reply]

Sure go ahead. Approved for extended trial (200 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. – SD0001 (talk) 06:48, 5 November 2021 (UTC)[reply]
An indent-bot is definitely required. Some editors make mistakes with their indents. Some simply don't know how to indent. Most frustrating? some deliberately mis-indent (usually after their mistakes have been pointed out) & when they 'continue' to deliberate mis-indent? it's basically their way of giving you (the adviser) the figurative 'middle finger'. GoodDay (talk) 17:47, 5 November 2021 (UTC)[reply]

Trial feedback

Examples
  • I just reverted this massive refactoring when I saw this bot editing my discussion; I chose bullets on purpose to break that section apart. — xaosflux Talk 13:53, 5 November 2021 (UTC)[reply]
  • Here is another example: diff - this doesn't make sense, that first line was clearly not intended to be part of the "discussion" - so was stylized differently. — xaosflux Talk 14:01, 5 November 2021 (UTC)[reply]
  • More bad edits (already reverted by another editor). — xaosflux Talk 14:03, 5 November 2021 (UTC)[reply]
  • Lets not chase around another bot example that I can assume was specifically programmed to edit one way already. — xaosflux Talk 14:09, 5 November 2021 (UTC)[reply]
  • Another example diff that made the new list worse, see the section around "Person who is autistic" - where this bot has introduced double bullets. — xaosflux Talk 14:54, 5 November 2021 (UTC)[reply]
Discuss
  • I think this task is going to need a much larger discussion before being released on all edits, all the the time; I expect it will continue to make contentious edits that don't have a policy to support them (i.e. a policy that only certain indentation or list styles are allowed to be used). — xaosflux Talk 13:58, 5 November 2021 (UTC)[reply]
  • The more I look at these edits, the more fundamentally broken I think this is. Perhaps as an OPT-IN-ONLY on certain pages it could be useful? — xaosflux Talk 14:04, 5 November 2021 (UTC)[reply]
    I guess I should disable altering the final indent character completely for now. Too many edge cases. Sorry about that. I'll review the diffs you posted and see if the bot would still have made those edits after disabling this behavior. The final character issue was brought up before, but I underestimated the problem. Winston (talk) 14:09, 5 November 2021 (UTC)[reply]
    Yes, I'd suggest not changing the final character indents at all. They're not much of an accessibility issue in practise I believe, and fixing them is clearly looking like more trouble than it's worth. – SD0001 (talk) 14:47, 5 November 2021 (UTC)[reply]
  • I don't know what you people were thinking when you approved this thing, but it's completely screwing up existing discussions [4]. And BTW, according to a friend who actually uses a screen reader, the whole idea that indenting patterns are this big deal is a myth. This has the potential to make literally hundreds of thousands of discussions and posts unintelligible. Cut it out RIGHT NOW. EEng 14:10, 5 November 2021 (UTC)[reply]
    EEng, this is a trial run, which is done specifically to see if these sorts of issues arise. Clearly, there are major concerns, and based on the last few posts here I'm starting to think that this bot will not be approved without significant overhaul. Primefac (talk) 14:12, 5 November 2021 (UTC)[reply]
    Ya think? How can you possibly have ever thought this could fly? Above I read I've managed to make the bot respect the original indentation much more -- oh, he's respecting the indentation used by discussants, which is critical to following the flow of the discussion, much more? You mean, like, you guys are willing to compromise on that and only make discussions somewhat impossible to follow? EEng 14:19, 5 November 2021 (UTC)[reply]
    Maybe it would be better not to do a trial run on userpages without pre-approval by the users involved? I've reverted the bot at EEng's talkpage, just because it seemed really hard to believe EEng would like the effect. Remember, his talkpage can be seen from space. Bishonen | tålk 14:23, 5 November 2021 (UTC).[reply]
    You're right, I've removed the user talk namespace. Winston (talk) 14:25, 5 November 2021 (UTC)[reply]
    You've removed the user talk namespace, so you're only going to fuck up article talk pages and project guideline talk pages? Well, I guess that's a start.
    You're not getting this. There is no possible way to do what you're doing without screwing up existing pages, because there's a fundamental conflict between the assertions in INDENT (or wherever) and the way people actually format their discussions. What you're trying to do inevitably changes the formatting of existing discussion so that the meaning of editors' comments is changed. You're trying to square the circle, and need to give it up completely. EEng 14:56, 5 November 2021 (UTC) P.S. I just noticed above that the plan is to fuck up project pages (e.g. actual guidelines and policies, not just the talk pages) as well. The lunatics have clearly taken over the asylum.[reply]
    There were 2 trials already done (50 + 50 = 100 edits) which drew basically no negative feedback, which was why this was approved for extended trial of 200 edits. Something looks to have regressed in the newer code that's causing the issues. It looks like @Notsniwiast has stopped the bot now. – SD0001 (talk) 14:25, 5 November 2021 (UTC)[reply]
  • Agree. This thing is jacking up the formatting on talk pages. Sometimes the formatting is there intentionally. Just undid the bot at Talk:Stanley Kubrick for an example. Jip Orlando (talk) 14:14, 5 November 2021 (UTC)[reply]
    @Jip Orlando Sorry about that, was the issue the swapping from bullet/no bullet issue? If the issue was isolated, could you mention which part? Winston (talk) 14:27, 5 November 2021 (UTC)[reply]
    [5] here, it looks like it's tweaking the replyto stuff by moving the discussions to the left and adding bullets where colons were. I understand that it is making the formatting appear consistent, but it is undoing what appears to have been done intentionally. I see the bullets as used for making a salient point and the indents as a reply to the point. Maybe I'm being nitpicky, but having a sudden sea of bullets doesn't make things look organized. Jip Orlando (talk) 14:38, 5 November 2021 (UTC)[reply]
  • Not a fan of having an alert pop up wrt my account talkpage, only to find out it's a semantically void whitespace twiddle. If it had been another person doing the same thing I'd be miffed. More so when it's a mindless thing. ☆ Bri (talk) 14:21, 5 November 2021 (UTC)[reply]
    Yeah sorry, I should have respected the user talk space more. It's been removed from the bot for now. Winston (talk) 14:29, 5 November 2021 (UTC)[reply]
    For the record, user talk page notifications are suppressed when edits are marked as minor edit + bot edit + account has bot flag. I believe you need to add bot=True near here. Also the bot flag has expired. – SD0001 (talk) 14:39, 5 November 2021 (UTC)[reply]
    I forgot the bot flag expired. When the bot flag is on, the edits are automatically marked as a bot edit. Winston (talk) 14:41, 5 November 2021 (UTC)[reply]
  • USER TALK testing should not happen unless this has a bot flag. By combing +bot and +minor attributes this could make use of the (nominornewtalk) feature to not trigger the new message notifications. (This is not an endorsement that this should be currently tested). — xaosflux Talk 14:39, 5 November 2021 (UTC)[reply]
  • @SD0001: I suggest that the operator, @Notsniwiast: needs to go manually review every edit they just made and revert anything that possibly made the page worse. — xaosflux Talk 14:56, 5 November 2021 (UTC)[reply]
    I don't think he can be trusted to do that. What he needs to do is revert everything immediately, and where a page has been edited subsequent to the bot's edit, post a message or something warning watchers to take a look themselves. This is really serious. I cannot believe this got anywhere at all. EEng 14:59, 5 November 2021 (UTC)[reply]
    Yeah I'm reverting right now. Winston (talk) 15:00, 5 November 2021 (UTC)[reply]
  • From my watchlist: I don't want to pile on, but [6] this changed the placement (and hence meaning) of, at least, AlwaysInRed's message. Many of the diffs have large changes, so it is hard to figure out which are problematic. Urve (talk) 15:51, 5 November 2021 (UTC)[reply]
    @Urve Could you partially quote the message so I can find it? Is it "I am the lead-moderator and one"? Winston (talk) 15:53, 5 November 2021 (UTC)[reply]
    Yes. Even if that was an unintentional error, people do purposefully comment in this way (several indents after an outdent), to continue reply to the comment that isn't outdented. Why people do this (instead of a message directly underneath what they wish to reply to), I'm not sure. But the problem is that the meaning is changed if these are all outdented without regard to what they're replying to. Urve (talk) 15:57, 5 November 2021 (UTC)[reply]
  • I think this bot is doomed to failure, should not be approved for ongoing use, and should never have been approved even for the limited testing runs it made. The first thing I saw here was several diffs of completely wrong talk page refactorings and the diff-posters were correct that they were completely wrong. People on discussion pages use indentation to mean different things, that cannot adequately be guessed by a bot, because the meaning of the indentation is in the semantics of what they're saying rather than in the syntax of their comment. Just to pick an easy example, people will choose the indentation level of a comment (among several different indentation levels for a comment placed in the exact same place in the discussion) to indicate to whom they are replying; unless the bot can understand that part of the back-and-forth (and it can't) it cannot correctly adjust the indentation. People will sometimes deliberately choose between *-indentation of their comments or :-indentation of their comments according to how prominent they want that comment to be, and will use both *-indentation and :-indentation for sub-elements within comments as well as for whole comments. Additionally, editors often take significant offense even at careful human refactoring of their comments. This is not a task that can be solved without full human-level AI, which does not exist, and even then is of dubious value. A bot rampage that changes what is meant is a bad thing, and completely unnecessary. We do not need our talk pages to be well structured according to some spec. We need them to communicate with each other. —David Eppstein (talk) 16:16, 5 November 2021 (UTC)[reply]
    • @David Eppstein: What if only edits with no visual difference were made? That is, edits of the sort
      * One.
      :: Two.
      
      to
      * One.
      *: Two.
      
      Winston (talk) 16:25, 5 November 2021 (UTC)[reply]
      • Look at your example above. Look at the wikicode. If you ran your bot on this very page, it would "fix" your first example, rendering your message meaningless. That is the inherent problem here: you can't write logic that will know that this particular instance of "*" followed by "::" should remain because it's an intentional example of an error. You need a human brain for that. Levivich 16:33, 5 November 2021 (UTC)[reply]
        The bot wouldn't fix the first example since it is inside a "syntaxhighlight" tag. Winston (talk) 16:34, 5 November 2021 (UTC)[reply]
        I realized that as soon as I hit publish :-) But most editors wouldn't know to use such a tag. Anyway, what can the bot do about this:
        * One.
        : Two.
        
        Can it tell if "Two" is a new comment or the second paragraph of "One"? What if One were unsigned? Etc. Levivich 16:40, 5 November 2021 (UTC)[reply]
        It would do nothing, since final indentation characters would no longer be altered at all since that would change the visual appearance. Winston (talk) 16:43, 5 November 2021 (UTC)[reply]
        OK, then what about this:
        * One.
        : Two.
        * Three.
        : Four.
        * Five.
        
        Should Two and Four be bullets? Or, alternatively:
        : One.
        * Two.
        : Three.
        * Four.
        : Five.
        
        Is this all one comment with two bullet lists in it, or five different comments? Are we changing colons to bullets or bullets to colons or nothing? Levivich 16:45, 5 November 2021 (UTC)[reply]
        Nothing would change. Sorry I should clarify, by final indentation character I mean the last indentation character for a line. So for *: it would be :. Winston (talk) 16:48, 5 November 2021 (UTC)[reply]
        Only basic list gaps and non-final characters could be altered. So indentation levels and final bullet/no bullet would not be changed at all. Winston (talk) 16:49, 5 November 2021 (UTC)[reply]
        Heh, you anticipated my next question about indentation levels :-) So these two changes (no change to indentation level, no change to final character) would be two things that are different from the last trial run that was just run? Levivich 16:56, 5 November 2021 (UTC)[reply]
        Correct. The visuals should not change. Winston (talk) 16:57, 5 November 2021 (UTC)[reply]
        Well, technically the visual would change if there was something like ::: followed by ***, since the bot would change the latter to ::*. Winston (talk) 17:01, 5 November 2021 (UTC)[reply]
        Yeah, but it seems like that particular example (::: followed by ***) is just a flat-out mistake, so the change would be for the better for both sighted and non-sighted readers. I think you're right that not changing the indentation level, and not changing the final character, are key to not making a visual change. I'm not a BAG member or anything, but it seems reasonable to me to do another trial run with those modifications you've suggested (and limiting the namespaces for the trial, etc.). It does seem like limiting the bot as you're describing would make the changes invisible to sighted readers. I recognize it won't totally fix the problem that you're setting out to fix (which can't be fixed, because editing text files and using indentation to separate one comment from another is downright stone-age archaic, we might as well use vacuum tubes), but it could improve things without pissing editors off. :-D Levivich 17:05, 5 November 2021 (UTC)[reply]
        Yeah I feel bad for angering/annoying a bunch of people. I was overzealous. Winston (talk) 17:11, 5 November 2021 (UTC)[reply]
        No worries. Heck, many of us annoy EEng just for sport. Levivich 17:18, 5 November 2021 (UTC)[reply]
    • (edit conflict) This would run afoul of WP:COSMETICBOT. Jip Orlando (talk) 16:35, 5 November 2021 (UTC)[reply]
      • Never mind, this is an exception. Either way, you'll have a horde of mad users cluttering their watchlists. Jip Orlando (talk) 16:37, 5 November 2021 (UTC)[reply]
        • The COSMETICBOT argument is compelling to me but there's more to it than that. If you think that changing talk pages to normalize indentation coding without changing the appearance is helpful as a way to produce semantically clean wikimarkup, you're deluded. :-indentation is never semantically clean. :-formatting is only proper within definition lists, where its actual purpose is to delimit the body of a definition and the indentation is merely a side effect of how this kind of list is formatted. Its use on talk pages for indentation is a hack. As such, the bot's task would be to fill our watchlists with edits while polishing a hack rather than accomplishing anything useful. —David Eppstein (talk) 17:56, 5 November 2021 (UTC)[reply]
          It's not about the semantics. The changes are to help screen readers. I was simply overzealous with the bot, and unfortunately it took until this trial to become apparent. The limited version described above in the comment chain with Levivich should be much better. If you use macOS, you can try reading a list with gaps and/or mixed indents with VoiceOver to see how screen readers are affected. Winston (talk) 18:39, 5 November 2021 (UTC)[reply]
  • This would be better as a script than a bot. Preferably, a script that worked on just one section of a talk page. As a script, editors could manually review/correct mistakes before publishing. Levivich 16:31, 5 November 2021 (UTC)[reply]
  • Look, Winston, I know you're trying to help, but you have only 1800 edits to Wikipedia, and only a handful of those are to talk pages. You don't have the experience to even begin to understand the subtleties of what you're getting into. It's like having someone who's never driven a car start redesigning the highways. EEng 16:44, 5 November 2021 (UTC)[reply]
  • We most certainly do need an Indent Bot. Some editors don't know how to indent, or make human mistakes or simply refuse to, after being given advice. GoodDay (talk) 17:49, 5 November 2021 (UTC)[reply]
    • That's not something a bot is capable of fixing. —David Eppstein (talk) 17:56, 5 November 2021 (UTC)[reply]
      • Wish one could be created, that was capable of doing so. Frustrating, when you read long drawn out discussions, with mis-indents. Throws you off, as to who's responding to who. GoodDay (talk) 18:15, 5 November 2021 (UTC)[reply]
  • It's a good idea and I'm sure there's some sort of bot task that could be approved someday. Editing the wikitext of discussions happens to be just about the hardest bot task I can think of to do correctly. I think starting with the LISTGAP change or otherwise trying to limit the amount of change the bot does would be a good idea. Please ask me if you have any questions; I (unfortunately? lol) have a few years of experience with manipulating discussion wikitext. Enterprisey (talk!) 21:15, 5 November 2021 (UTC)[reply]
    Basically I caught feature creep. I've pared the bot back to simple LISTGAP and non-final-indentation-character INDENTMIX changes. Winston (talk) 21:37, 5 November 2021 (UTC)[reply]

Changes to bot

In the original bot request for a bot to fix indentation, two examples were given. The first example was the removal of a single extra indent (a general fix), and the second was a non-final-indent-character indentmix fix (an accessibility fix). I decided to tackle this request, but caught feature creep and took the idea too far. This ended up making some "fixes" the very opposite, as the last trial demonstrated. I believe the issues brought up (other than procedural issues like editing user talks and missing the bot flag) were due to the features I implemented beyond the original request, and I apologize.

I have limited the bot to listgap and non-final-character indentmix fixes only. Indentation levels and final indentation characters are not changed (so the first example in the original bot request would actually be left alone). Here are some sandbox diffs. These are accessibility changes, and the only noticeable change for sighted readers should be the hiding of "floating bullets", i.e. bullet points that are not the last indent character on their line. For example,

Markup:
:One.
*: Two
*** Three.

Renders as:
One.
  • Two
      • Three.

would become

Markup:
:One.
:: Two
::* Three.

Renders as:
One.
Two
  • Three.

Winston (talk) 10:56, 7 November 2021 (UTC)[reply]
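For illustration only, the non-final-character indentmix rule described above can be sketched in Python. This is not the bot's actual source: the function name `fix_indentmix` is hypothetical, and the sketch assumes that only `*` and `:` count as indent characters (it ignores `#`, `;`, templates, tables, and tags such as syntaxhighlight, all of which a real bot would have to handle). Each line's non-final indent characters are rewritten to match the line before it, while the indentation level and the final indent character are left alone.

```python
import re

# Assumption: only "*" and ":" are treated as indent characters in this sketch.
INDENT_RE = re.compile(r"[*:]*")


def fix_indentmix(wikitext: str) -> str:
    """Make non-final indent characters agree with the previous line.

    The length of each prefix (indentation level) and its last character
    (bullet vs. no bullet) are never changed.
    """
    fixed_lines = []
    prev = ""  # indent prefix of the previous line, after fixing
    for line in wikitext.split("\n"):
        prefix = INDENT_RE.match(line).group()
        body = line[len(prefix):]
        chars = list(prefix)
        # Positions 0 .. len(prefix) - 2 are "non-final"; copy them from the
        # previous line wherever the previous line is deep enough.
        for i in range(min(len(prev), len(chars) - 1)):
            chars[i] = prev[i]
        prev = "".join(chars)
        fixed_lines.append(prev + body)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    print(fix_indentmix(":One.\n*: Two\n*** Three."))
```

Run on the markup from the example above, this turns `*: Two` into `:: Two` and `*** Three.` into `::* Three.`; an alternating `*` / `:` list at a single level is returned unchanged, because every indent character in it is a final one.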

@Notsniwiast: here is just a sample mixed up list - what, if anything would you do to it?
  • A
    • A
      • A
        A
        A
        • A
          1. A
          2. A
          3. A
        • A
          1. A
          2. A
          3. A
        A
      • A
      • A
    • A
  • A
xaosflux Talk 13:05, 7 November 2021 (UTC)[reply]
I've just tested it on this list. It does nothing. Winston (talk) 13:09, 7 November 2021 (UTC)[reply]
@Xaosflux: Please see the above. --TheSandDoctor Talk 07:39, 29 December 2021 (UTC)[reply]
  • Trial complete. Closing out the previous trial which was aborted. Winston (talk) 06:29, 5 January 2022 (UTC)[reply]
  • {{BAGAssistanceNeeded}} I'd like to try out the limited version as described above. To recap, there shall be no changes to indentation levels and no changes to the final indent character. The only noticeable visual difference should be the hiding of "floating" bullet points. The other changes are reductions in the number of list gaps and amount of indentation-style mixing, which should not be visually noticeable. Here are some fresh diff examples. I can do more sandboxed runs if we're still wary of a trial on the live wiki. Winston (talk) 06:29, 5 January 2022 (UTC)[reply]
    Btw, the ordered pair in the edit summaries represents (# of blank lines removed, # of lines with at least one altered indent character). Winston (talk) 06:51, 5 January 2022 (UTC)[reply]
  • ...Sure. Approved for extended trial (200 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I totally support approving this task in some form. But: although I'd rather not say this, recognizing that people really don't like discussions getting messed up (as you can see above), I must warn you that any more edits that change the meaning of discussions (even in the most insignificant way) or mistakes aren't going to look too good for the request. I'd err on the side of being cautious. From my experience developing reply-link, in the land of Wikipedia talk pages, even if something looks like a mistake, there's a decent chance that it's intentional. Enterprisey (talk!) 07:02, 5 January 2022 (UTC)[reply]
    Understood. If we find a legitimate use of floating bullets affecting meaning, then I can simply prevent the bot from changing * to : , thus preventing bullets from disappearing whether floating or not. I'll do this trial in smaller chunks, posting the diffs for each chunk after I review them and point out diffs where bullet points have been removed. Winston (talk) 07:19, 5 January 2022 (UTC)[reply]
    @Enterprisey Before starting, should the bot be given a (temp) bot flag? Also, minor edits or not minor? Winston (talk) 07:22, 5 January 2022 (UTC)[reply]
    If the previous trials weren't flagged, I wouldn't think this one should be; and I'd mark them as not minor (even though the distinction isn't very important these days) because it's a trial and I think people would be slightly more likely to pay attention to non-minor edits. Enterprisey (talk!) 07:34, 5 January 2022 (UTC)[reply]
    @Notsniwiast, I'd even recommend, to be extra cautious, making sure that the edits don't change the visual appearance of the page (besides removing the "double bullets" error); we can always add more tasks to the bot later. Not sure if you were doing that already (I didn't check); just making a note. Enterprisey (talk!) 08:12, 5 January 2022 (UTC)[reply]
    Yes, only bullets which are not the final indent character get removed. I will pause after 20 edits to show the diffs, pointing out the ones where a bullet has been hidden. Winston (talk) 08:16, 5 January 2022 (UTC)[reply]
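As a rough sketch of the list-gap half of the task (again, not the bot's actual code), the following Python function removes a single blank line that sits directly between two list lines and reports how many it removed, which corresponds to the first number of the ordered pair in the edit summaries. The name `remove_list_gaps` is hypothetical, and the rule is a deliberate simplification: it assumes a list line is any line starting with `*` or `:`, and it leaves runs of two or more blank lines untouched.

```python
def is_list_line(line: str) -> bool:
    # Assumption: only "*" and ":" start a list line in this sketch.
    return line.startswith(("*", ":"))


def remove_list_gaps(wikitext: str) -> tuple[str, int]:
    """Drop single blank lines between two list lines.

    Returns the new text and the number of blank lines removed.
    """
    lines = wikitext.split("\n")
    kept = []
    removed = 0
    for i, line in enumerate(lines):
        is_blank = line.strip() == ""
        has_next = i + 1 < len(lines)
        if (
            is_blank
            and kept
            and has_next
            and is_list_line(kept[-1])
            and is_list_line(lines[i + 1])
        ):
            removed += 1
            continue  # skip the gap so the two list items stay in one list
        kept.append(line)
    return "\n".join(kept), removed


if __name__ == "__main__":
    text, count = remove_list_gaps("* A\n\n** B\nText\n\nMore")
    print(count)
    print(text)
```

Blank lines between ordinary paragraphs are kept, so only the gap between `* A` and `** B` is removed in the usage example.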

Chunk 1 (20 diffs)

  • See here. I'm pausing here to see if any concerns are raised. Winston (talk) 08:54, 5 January 2022 (UTC)[reply]
    The few I checked look fine. If nobody objects in the next day or two, feel free to keep going. Maybe pause again after 100 edits have been made? Enterprisey (talk!) 01:48, 6 January 2022 (UTC)[reply]
    @Notsniwiast, I notice the bot is currently editing your sandbox. If the sandbox edits aren't part of the trial, keep going and ignore this message. However, since you linked to one of them just above, I'm assuming you're counting the sandbox edits as the trial. Since the bot task is for editing actual discussion pages, the trial should be as similar to that usage as possible. That means the bot should edit the actual pages, not just its sandbox, for this trial. Part of the trial, in my view, is making sure that people won't object to the edits, and they won't have the opportunity to object if the edits are made to the sandbox. Enterprisey (talk!) 08:10, 6 January 2022 (UTC)[reply]
    Yup the sandbox edits aren't part of the trial. Not sure which link you're referring to, but when I link to the actual trial diffs I use a permanent url to a revision of my sandbox where I put the diffs, so as not to clutter up this page. Winston (talk) 08:21, 6 January 2022 (UTC)[reply]
    Sounds good. My bad; misread. Enterprisey (talk!) 08:41, 6 January 2022 (UTC)[reply]

Chunk 2 (50 diffs)

  • See here. Winston (talk) 10:07, 6 January 2022 (UTC)[reply]
  • This is the first I'm aware of this bot, and I've not read all the text above so sorry if this has been addressed before, but could the edit summaries be improved please: e.g. "Adjusted indentation per MOS:ACCESS#Lists. Trial edit. (1, 10)" has three parts:
    • "Adjusted indentation..." is sort of OK, but could imply that it is changing the indentation level (which it isn't), "Fixing indentation markup" would be better imo.
    • "Trial edit." is entirely unproblematic
    • "(1, 10)" is cryptic and while potentially useful to the operator for debugging is just confusing for editors who aren't intimately familiar with the bot.
    • The edit summary does not mention that it removed multiple blank lines from lists, or why. Personally I know that this is per MOS:LISTGAP, but not everybody will. I recommend including it in the summary as (a) noting what the bot has done, and (b) noting why it has done it so that people aren't tempted to revert the bot and also learn why they shouldn't leave blank lines in the first place. I do a bit of fixing of lists, and Redrose64 does even more; both of us mention LISTGAP in edit summaries and I've seen positive responses to that. Thryduulf (talk) 13:02, 6 January 2022 (UTC)[reply]
      • Good points. I changed the edit summaries to "Adjusting indentation markup per MOS:LISTGAP and MOS:INDENTMIX. X blank lines removed. Y adjustments of indent markup. Trial edit." Winston (talk) 15:18, 6 January 2022 (UTC)[reply]
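A minimal sketch of how a summary in the revised format could be assembled, using the wording quoted above. The function name `build_edit_summary` and its parameters are hypothetical and are not taken from the bot's source.

```python
def build_edit_summary(
    blank_lines_removed: int,
    indent_adjustments: int,
    trial: bool = False,
) -> str:
    """Build an edit summary in the format quoted in the discussion."""
    parts = [
        "Adjusting indentation markup per MOS:LISTGAP and MOS:INDENTMIX.",
        f"{blank_lines_removed} blank lines removed.",
        f"{indent_adjustments} adjustments of indent markup.",
    ]
    if trial:
        # Only appended during trial runs.
        parts.append("Trial edit.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_edit_summary(1, 10, trial=True))
```

Spelling out both counts in words replaces the bare "(1, 10)" pair that was flagged as cryptic.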

Chunk 3 (60 diffs)

  • See here. I am stuck figuring out an error in this one: Diff for Talk:Mass killings under communist regimes. The bot apparently introduced floating bullets to the line beginning with "I don't think it is fair". But when I copy the wikitext into my sandbox here, it looks fine (can anyone confirm this?). It also looks fine in the edit preview of Talk:Mass killings under communist regimes. Is wikitext displayed differently in User talk vs Talk or something? I can't reproduce the error (though I haven't tried reproducing it in actual Talk pages and there doesn't seem to be a sandbox Talk page). Winston (talk) 20:22, 7 January 2022 (UTC)[reply]
    Ok so I copied the entire wikitext rather than just the section, and indeed the floating bullets showed up. So I tried to produce a minimal reproducible example (the original talk page is over 600k bytes), and have discovered that it has something to do with links. Consider this revision (excuse the gibberish, I did some transformations to reduce the page size). The line we are interested in is the one containing "conclusions were rejected". Notice the floating bullets. Now edit the page and delete the wikilink to water at the end of the wikitext (deleting some other wikilink may work too). Notice how the floating bullets are gone. Instead of deleting a wikilink, you can delete the first template on the page and the floating bullets also disappear... Not sure what's going on. Winston (talk) 01:42, 8 January 2022 (UTC)[reply]
    • From a discussion at WP:VPT, this is probably a GIGO issue due to a newline inside a wikilink. I had thought that such newlines were allowed since they seemed to work ok, but apparently not. To keep the fix simple and to be extra conservative, I'm having the bot simply refuse to perform any indentmix fix on the page at all if it encounters a wikilink containing \n.
      I did not see anything unexpected in the other edits for this chunk. Winston (talk) 00:13, 9 January 2022 (UTC)[reply]
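The conservative guard described above (skip every indentmix fix on a page if any wikilink contains a newline) might look roughly like this in Python. This is a sketch under stated assumptions, not the bot's real check: the name `has_multiline_wikilink` is hypothetical, and the regular expression only recognises simple, non-nested `[[...]]` links, so links that contain other brackets are not detected.

```python
import re

# Assumption: a wikilink is "[[", then no square brackets, then "]]".
# The negated character class also matches newlines, so this finds links
# whose target or label is split across lines.
MULTILINE_LINK_RE = re.compile(r"\[\[[^\[\]]*\n[^\[\]]*\]\]")


def has_multiline_wikilink(wikitext: str) -> bool:
    """Return True if some simple wikilink contains a newline."""
    return MULTILINE_LINK_RE.search(wikitext) is not None


if __name__ == "__main__":
    page = "* See [[Water|wa\nter]] for details.\n*: Reply."
    if has_multiline_wikilink(page):
        print("Skipping indentmix fixes on this page.")
```

Refusing to touch the whole page trades coverage for safety, which fits the request to err on the side of caution during the trial.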

Chunk 4 (70 diffs)

  • See here. There was only one error here where a bullet point was introduced. This was due to a template creating a table which the bot did not anticipate. The bot now expands templates to check for tables. Trial complete. Winston (talk) 06:59, 10 January 2022 (UTC)[reply]

Gonna take a break. Code is still available. Withdrawing this request. {{BotWithdrawn}} Notsniwiast (talk) 05:15, 13 January 2022 (UTC)[reply]

Well, as the trial has been completed, assuming no issues are found, you don't have to do anything more as of now. If this is approved, you can start running the bot whenever you return. – SD0001 (talk) 09:51, 13 January 2022 (UTC)[reply]
SD0001, are you approving this request, or are you accepting their withdrawal? Primefac (talk) 15:10, 23 January 2022 (UTC)[reply]
@SD0001? theleekycauldron (talkcontribs) (she/they) 10:10, 13 February 2022 (UTC)[reply]
Left a note at their talk page. Primefac (talk) 14:18, 27 February 2022 (UTC)[reply]

Back

Hi I'm back from break. If people feel this bot will be useful let's continue. Only thing I'm worried about is if people think it's a nuisance relative to the number of people it helps. Winston (talk) 20:01, 27 February 2022 (UTC)[reply]

  • David Eppstein, sorry to put this on you, but you pursued the issues here farther than I did. Are your (our) concerns all addressed? EEng 20:21, 27 February 2022 (UTC)[reply]
    • I think my concerns are addressed by "I have limited the bot to listgap and non-final-character indentmix fixes only. Indentation levels and final indentation characters are not changed". —David Eppstein (talk) 20:26, 27 February 2022 (UTC)[reply]