User talk:Headbomb/unreliable/Archive 1

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Lowercase sigmabot III (talk | contribs) at 07:07, 4 August 2021 (Archiving 1 discussion(s) from User talk:Headbomb/unreliable) (bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Improved version

At User:SD0001/unreliabe.js.

@Headbomb: You can copy over the changes.

  • When I initially wrote this, I mistakenly assumed the code will highlight the whole citation (rather than just the link). Today when I tested this for the first time, I saw this isn't the case. I've made it so now. I think this is desirable?
  • I've tweaked the entire structure (clubbed together the regexes with the corresponding CSS styles) which should make this easier to maintain.
  • Fixed the bug with "10\.1011\/\d+"
  • Now text content of all list items will also be checked. This should be good enough to catch stuff in further reading, bibliography sections. I know you asked for searching the whole page text, but I am not sure whether that is necessary. That is quite complicated to write, and by checking each and every text node on the page with each of the dozen-odd regexes, some of which are huge, I suspect there will be an impact on performance.

I haven't tested this exhaustively so let me know if anything isn't working.

SD0001 (talk) 10:05, 16 February 2020 (UTC)

@SD0001: I'll test and give some feedback shortly. Headbomb {t · c · p · b} 10:07, 16 February 2020 (UTC)
I prefer the original of only highlighting links, mostly because this can allow for more nuance when a reference has multiple links in it, but also because not everything is within <ref></ref> tags. See User:Headbomb/unreliable#Common_non-problematic_cases for example, but you could also have something like citation with url link to Scribd and a DOI. But here only the url to Scribd is problematic. The 1101 thing seems to work, which is weird because I thought I had tried that myself. Either way, very useful to finally have that fixed. Headbomb {t · c · p · b} 10:10, 16 February 2020 (UTC)
Testing list content might be a good compromise vs whole page however (whole page would still be preferable though). It doesn't work on the last column of User:Headbomb/unreliable/testcases, but it does work in other lists elsewhere. Headbomb {t · c · p · b} 10:45, 16 February 2020 (UTC)
@Headbomb: should work now. SD0001 (talk) 10:47, 16 February 2020 (UTC)
Ok, not on that last column. But if you pull that out of the table, it will work. SD0001 (talk) 10:49, 16 February 2020 (UTC)
That leads to another bug. We need a different regex for matching "/" when it's in a link vs when it's in text. Looking into that now. SD0001 (talk) 10:52, 16 February 2020 (UTC)
fixed. SD0001 (talk) 10:53, 16 February 2020 (UTC)
@SD0001: Not in the column. Headbomb {t · c · p · b} 10:54, 16 February 2020 (UTC)
Ok, that's also fixed. SD0001 (talk) 11:00, 16 February 2020 (UTC)

@SD0001: Alright, so that's possibly a framework for some Frankenstein solution. What it would need now is some sort of logic that if there's a link you check the link, since that's better and more targeted. But if there isn't, then check the list. Because right now, it will highlight whole citations that aren't problematic [1], instead of just the link [2]. And this should really only be done as a fallback. Headbomb {t · c · p · b} 11:03, 16 February 2020 (UTC)

@Headbomb: Done. SD0001 (talk) 11:15, 16 February 2020 (UTC)
@SD0001: That looks really, really promising. Headbomb {t · c · p · b} 11:17, 16 February 2020 (UTC)
@SD0001: works well, but numbered lists don't seem to work. Headbomb {t · c · p · b} 16:59, 16 February 2020 (UTC)
@Headbomb: done SD0001 (talk) 18:03, 16 February 2020 (UTC)
@SD0001: yup. And with this, I believe we have a stable script. Many many thanks. Headbomb {t · c · p · b} 18:05, 16 February 2020 (UTC)
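
The link-first, list-item-fallback behaviour settled on in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in User:SD0001/unreliable.js: the `exampleRules` array, the `matchRule` and `pickTargets` helpers, the CSS class name, and the plain-object stand-in for a list item are all hypothetical. Scribd is used only because it is the example given above.

```javascript
// Hypothetical rule list; each regex is kept together with its styling.
const exampleRules = [
  { css: "unreliable-example", regex: /\bscribd\.com\b/i },
];

// Return the css value of the first rule matching `text`, or null.
function matchRule(text) {
  const rule = exampleRules.find((r) => r.regex.test(text));
  return rule ? rule.css : null;
}

// `item` stands in for a list element: { text: string, links: string[] }.
function pickTargets(item) {
  if (item.links.length > 0) {
    // Links are present: check each link on its own, never the whole item.
    return item.links
      .map((href) => ({ target: href, css: matchRule(href) }))
      .filter((hit) => hit.css !== null);
  }
  // Fallback: no links at all, so check the item's text as a whole.
  const css = matchRule(item.text);
  return css ? [{ target: "whole item", css: css }] : [];
}
```

With a citation that carries both a Scribd URL and a DOI link, only the Scribd URL is returned. An unlinked list item that mentions scribd.com is returned as a whole.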

How to use?

I saw your recent post at WT:MED. I followed the first 3 steps of installation, but now what are the steps to actually using the script on an article? Thanks! Sorry if I missed that somewhere. I scanned things, and it did not appear obvious. I tried a couple different things to bypass the cache (I'm currently using google chrome on a chromebook), so I hope that's not the issue. Also, I'm in the middle of rewriting deep vein thrombosis (DVT). Might you be able to scan DVT for any of the potentially questionable sources / journals / publishers you posted about at WT:MED and post any hits on the DVT talk page to make sure I'm providing high-quality information? Thank you for your efforts here. Biosthmors (talk) 15:29, 14 February 2020 (UTC)

@Biosthmors: basically, you just read the page and it should work right away. You can check Science Publishing Group and you should see 3 red highlighted links or User:Headbomb/unreliable/testcases and you should see things working (at least on some of them, I'm still tweaking to expand things to cover more). Headbomb {t · c · p · b} 15:32, 14 February 2020 (UTC)
Thanks. It's working. I saw 3 pink-highlighted links at Science Publishing Group. Biosthmors (talk) 15:39, 14 February 2020 (UTC)
@Biosthmors: In deep vein thrombosis, you should see DOIs that start with 10.2147 (Dove) / 10.4103 (Medknow) / 10.1155 (Hindawi) highlighted in yellow. Keep in mind that those are more grey-area and might even not be problematic to begin with. Headbomb {t · c · p · b} 15:40, 14 February 2020 (UTC)
Thanks. I like the yellow color. It's subtle, and I missed the highlighting at first because I was scanning for something more obvious. But I like it because it pairs with the level of concern I have about the sources. (I saw 3 highlighted in yellow at DVT.) I don't have any plans to remove them or recite this material at the present, but thanks again for helping me check this out. Biosthmors (talk) 15:53, 14 February 2020 (UTC)
Totally and utterly awesome. Thank you, Headbomb! SandyGeorgia (Talk) 15:18, 19 February 2020 (UTC)
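
For readers curious how the DOI-prefix highlighting mentioned above might work, here is a minimal sketch. It is not the script's actual code: the `doiPublishers` map and the `publisherForDoi` helper are hypothetical names, and only the three prefixes named in this thread are included.

```javascript
// DOI prefixes named in the thread, mapped to the publisher they belong to.
const doiPublishers = new Map([
  ["10.2147", "Dove"],
  ["10.4103", "Medknow"],
  ["10.1155", "Hindawi"],
]);

// Return the publisher for a DOI, or null if the prefix is not listed.
function publisherForDoi(doi) {
  const prefix = doi.trim().split("/")[0]; // everything before the first slash
  return doiPublishers.get(prefix) || null;
}
```

A DOI beginning with 10.2147/ would be attributed to Dove, while one beginning with 10.1000/ would return null and stay unhighlighted.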

JSON?

@SD0001: would there be a way to make use of a JSON-like structure somewhere? Perhaps as a separate subpage?

{
  "Name":    "Publisher",                          // Name of the publisher/journal/website
  "Domains": ["Publisher\.org", "Publisher\.com"], // Regex for URL matches
  "DOI":     ["10\.1234", "10\.4321"],             // Regex for DOI matches
  "Class":   "MEDRS",                              // BL, GUN, PRED, ... to set the CSS classes/colors
  "Note":    "See [[WP:RSPSOURCES#Publisher]]"    // Comment to understand why something is listed, possibly to be displayed as a tooltip
}

(not pretending the above is syntactically correct, treat this as pseudocode) Headbomb {t · c · p · b} 13:33, 20 February 2020 (UTC)

Note that I can build that JSON file myself once I have a working example/the syntax for it. Headbomb {t · c · p · b} 00:46, 21 February 2020 (UTC)
I was initially wanting this to simply make things easier to manage, but it would open up certain possibilities. Headbomb {t · c · p · b} 13:43, 20 February 2020 (UTC)
Headbomb, Oh, I would love that. I already have a JSON file that I use to quickly check on all the online sources in an article. If I could just merge in something like the above instead of maintaining it myself that would be super! Vexations (talk) 14:38, 20 February 2020 (UTC)
Prototype JSON file at User:Headbomb/unreliable.json. Headbomb {t · c · p · b} 11:01, 21 February 2020 (UTC)
This largely sounds like a good idea. Regarding the schema you have above, there's no need to put the backslashes before periods (in the domain and DOI fields). These can be inserted by the JS script while constructing the regex expression. Also, I'd say keep the domain and doi fields as arrays only if there are multiple of them (see User:SD0001/unreliable.json). This cuts down on the size of the page, and client scripts can easily convert the single-item to an array for processing should they need it. That being said, I'm no expert in writing good JSON schemas. It'd be useful to get advice from more editors.
The text of the "Note" field can probably be displayed as a hover tooltip when the user mouses over the highlighted text. SD0001 (talk) 05:00, 25 February 2020 (UTC)
Well, concerning the backslashes, the software keeps throwing "bad strings" errors at me when they aren't there. For the arrays, 2 extra bits seem pretty cheap for code consistency. No real opinion on the rest, as long as it works. Headbomb {t · c · p · b} 08:14, 25 February 2020 (UTC)
@SD0001: any progress? Headbomb {t · c · p · b} 21:26, 26 February 2020 (UTC)
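
The idea of storing plain domains and DOI prefixes in the JSON and letting the script add the escaping could look roughly like this. It is a sketch under assumptions, not the implementation: `escapeRegex`, `toArray` and `buildRegex` are hypothetical helper names, and the entry simply reuses the placeholder values from the pseudocode above, written as valid JSON without comments.

```javascript
// Escape regex metacharacters so "publisher.org" matches a literal dot.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Accept either a single string or an array of strings.
function toArray(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Build one case-insensitive regex from the Domains and DOI fields.
function buildRegex(entry) {
  const parts = toArray(entry.Domains).concat(toArray(entry.DOI)).map(escapeRegex);
  return parts.length ? new RegExp("\\b(?:" + parts.join("|") + ")\\b", "i") : null;
}

// Placeholder entry: a single-string Domains field and an array DOI field.
const sampleEntry = JSON.parse(
  '{"Name": "Publisher", "Domains": "publisher.org", "DOI": ["10.1234", "10.4321"], "Class": "MEDRS"}'
);
const sampleRegex = buildRegex(sampleEntry);
```

Because the dots are escaped when the regex is built, `sampleRegex` matches `https://publisher.org/article` but not `https://publisherXorg/article`. `toArray` lets an entry use either a single string or an array for each field.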

Suggestion

This looks like a very nice script. I noticed, though, that there is a little functionality overlap with User:SuperHamster/CiteUnseen.js, which uses a really comprehensive list of conspiracy sites, fake news, biased sites, etc. That same list also has a list of generally reliable sources. Just my two cents, maybe we might also want to ping SuperHamster about this idea as well, since he created the list. epicgenius (talk) 03:50, 27 February 2020 (UTC)

Definitely something to look at. Thanks for the suggestion. Headbomb {t · c · p · b} 03:55, 27 February 2020 (UTC)
Thanks for the ping, epicgenius. Headbomb - thrilled to see more work being done in this space! I'm also going to ping @Newslinger: - we've been looking into taking Wikipedia's perennial sources list and creating a structured data format for it all, and then incorporating it into Cite Unseen (and could be used for other tools as well, of course). ~SuperHamster Talk Contribs 04:03, 27 February 2020 (UTC)
P.S. full docs / info about Cite Unseen can be found at m:Cite Unseen. ~SuperHamster Talk Contribs 04:04, 27 February 2020 (UTC)

Colour suggestion

I saw your post at the DYK talk page; this looks very useful, and I will install it tonight. Would it be possible to tweak the colour used for "marginally reliable" sources slightly; it's just because the pale yellow is very similar to the colour used in another commonly used script to signify Disambiguation pages, and I must admit I've used it for so long I completely associate that colour with Disambiguation pages now! Cheers, Hassocks5489 (Floreat Hova!) 15:02, 21 February 2020 (UTC)

@Hassocks5489: I use the same script, but since this doesn't highlight a wikilink, but rather external links and list elements, there's no overlap between the two. There might be customizable colours down the road, but for now this is beyond my coding skills. Headbomb {t · c · p · b} 15:51, 21 February 2020 (UTC)
Thanks for your reply, and no worries; it won't take long to get used to. Hassocks5489 (Floreat Hova!) 16:07, 21 February 2020 (UTC)
I second the request for color customization. :-) In my case, the yellow is nearly invisible on my screen. Sunrise (talk) 06:48, 27 February 2020 (UTC)

Summary tools?

In response to the request for suggestions, I think it would be valuable to be able to summarize the results we get from the script. In particular, the things that come to mind (if they're possible) are counting the number of links of each type on a particular page, and the ability to categorize pages based on those links. For instance, I'm hoping this could be used to create Category:Articles using predatory journals as sources and equivalents. Sunrise (talk) 06:48, 27 February 2020 (UTC)

@Sunrise:, see {{Predatory publisher}}, which could probably be updated to add categories similar to {{citation needed}}. Headbomb {t · c · p · b} 07:02, 27 February 2020 (UTC)
With regards to the category idea, I think I wasn't clear enough (for that matter, come to think of it I suppose such a category might already exist!) I was imagining it as something populated automatically instead of by tagging, but I don't know what that would actually involve. I suppose creating and maintaining it would be different issues, but ideally it might create something like a list of all articles (of a particular category?) sorted by number of hits per article, or something along those lines. That said, I think just a count would be useful as well. Sunrise (talk) 07:38, 27 February 2020 (UTC)
Well populating automatically can't really be done. A bot report of such sources could be made, similar to WP:CITEWATCH (see a semi-related botreq). Headbomb {t · c · p · b} 07:58, 27 February 2020 (UTC)
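
The per-page count suggested above could be sketched along these lines. This is an illustration only and not a feature of the script: `summaryRules` and `countHits` are hypothetical names, and `predatory-example.org` is a made-up placeholder domain.

```javascript
// Hypothetical rules: a human-readable label plus the regex that triggers it.
const summaryRules = [
  { comment: "Predatory journal", regex: /\bpredatory-example\.org\b/i },
  { comment: "Generally unreliable source", regex: /\bfb\.com\b/i },
];

// Count how many links fall under each label; links matching no rule are ignored.
function countHits(links) {
  const counts = {};
  for (const link of links) {
    const rule = summaryRules.find((r) => r.regex.test(link));
    if (rule) {
      counts[rule.comment] = (counts[rule.comment] || 0) + 1;
    }
  }
  return counts;
}
```

Given two fb.com links, one predatory-example.org link and one unrelated link, `countHits` returns a count of 2 for "Generally unreliable source" and 1 for "Predatory journal".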

Bullet points

  • If you use bullet points while discussing a source in regular text, the script will act on the entire paragraph (as demonstrated here, as long as I mention a source like fb.com). I'm not sure if this is intended, since the documentation mentions checking list items. However, in the case of blacklisted sources the script will create multiple nested boxes if the comment is indented using multiple asterisks. Sunrise (talk) 06:48, 27 February 2020 (UTC)
@Sunrise: It's intended, but it's not the cutest implementation ever. For instance your above comment is highlighted, because you mention fb.com and have started your comment with a bullet (which makes it a list element), but not this one (because it starts with a colon, meaning it's not in list form). A side effect is that discussion pages can be a bit weird because sources are often discussed and comments often made in list form, but the script does highlight what it's meant to highlight: a list item that mentions a potentially problematic source. It can be weird to see this on talk pages, but I have no good solution to the weirdness. Disabling it on talk pages seems not ideal, given you could be having a discussion about making content based on problematic sources, and then this warns you about it, at least when in external links or bulleted comments.
For the blacklisted source issue, I'd have to see an example. Headbomb {t · c · p · b} 06:59, 27 February 2020 (UTC)
      • Thanks for the reply! Mentioning 112.ua with multi-asterisk indenting. Sunrise (talk) 07:38, 27 February 2020 (UTC)
The technical cause of the issue is that each list element does contain the problematic 112.ua, and the script can't (at the moment) figure out how to apply things to only the last one. But the weirdness here is also caused by a 2px border being applied 3 times to different elements. I'll be doing a bit of thinking. Removing the border should be enough, but that's a bit hackish. Headbomb {t · c · p · b} 08:06, 27 February 2020 (UTC)
@Sunrise: see the new look. It will still color all bullet levels, but the underline instead of borders should make things better. Headbomb {t · c · p · b} 08:07, 27 February 2020 (UTC)
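
The nesting problem described above can be modelled with plain objects. The sketch below is not what the script does (the thread settled on underlines instead of borders); it shows one possible way to flag only the innermost list item, by testing each item's own text instead of its full text. The `{ ownText, children }` shape and the helper names are hypothetical.

```javascript
const nestedPattern = /\b112\.ua\b/i;

// Full text of an item, the way a DOM textContent would report it:
// the item's own text plus the text of every nested item.
function fullText(item) {
  return item.ownText + item.children.map(fullText).join("");
}

// Collect only the items whose own text (excluding nested lists) matches.
function itemsToHighlight(item, hits) {
  hits = hits || [];
  if (nestedPattern.test(item.ownText)) hits.push(item);
  item.children.forEach(function (child) { itemsToHighlight(child, hits); });
  return hits;
}

// Three levels of nesting, as produced by a comment indented with "***".
const nestedComment = {
  ownText: "",
  children: [{ ownText: "", children: [{ ownText: "Mentioning 112.ua", children: [] }] }],
};
```

Here `fullText` matches at all three levels, which is why all three elements were styled, whereas `itemsToHighlight(nestedComment)` returns only the innermost item.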

Forbes = marginally reliable?

See #488 and several others at 2019–20_coronavirus_outbreak#References --valereee (talk) 21:25, 27 February 2020 (UTC)

@Valereee: See WP:RSPSOURCES#Forbes and WP:RSPSOURCES#Forbes contributors. Headbomb {t · c · p · b} 21:33, 27 February 2020 (UTC)
Headbomb, hm...Forbes is good, but Forbes.com is sometimes suspect. But of course most of what is covered in Forbes is going to get released on Forbes.com, too, and the script doesn't know how to tell them apart. How annoying is that! :D --valereee (talk) 13:53, 28 February 2020 (UTC)

academia.edu?

Hey, Headbomb! See Sentientist Politics bibliography section Johannsen -- it's getting a highlight from the script, but it's an academic journal? --valereee (talk) 11:36, 3 March 2020 (UTC)

@Valereee: See User:Headbomb/unreliable#Common cleanup and non-problematic cases, general repositories. In this case, the linked version happens to be a preprint. Not really super problematic, but if it's used as a source, you'll want to confirm that it's not substantially different from the published version. Headbomb {t · c · p · b} 15:07, 3 March 2020 (UTC)
Headbomb, I am learning quite a lot from this script and you! :D --valereee (talk) 19:03, 3 March 2020 (UTC)

IMDb

It is not very reliable as source, but most of the time I see it in articles it's the entry of that movie/person/whatever in the weblinks, where the link is perfectly fine. Is it possible to filter that out somehow? --mfb (talk) 09:32, 6 March 2020 (UTC)

Mfb, you mean in an External links section? I actually find the highlighting useful even there, as if there are multiple external links, sometimes the IMDb link represents excessive external linking and can be culled if the other more reliable sites provide essentially the same information. --valereee (talk) 12:11, 6 March 2020 (UTC)
Yes, in that section. If IMDb has an entry about that person/movie... then it's good to have that in the article. --mfb (talk) 19:19, 6 March 2020 (UTC)
I can't filter by section. Also see the bit about external links in User:Headbomb/unreliable#Common cleanup and non-problematic cases. Headbomb {t · c · p · b} 19:22, 6 March 2020 (UTC)

Query re two sources

I was surprised at a recent GA review (Coronariae) to see two sources that I consider reliable, flagged.

  • ResearchGate - whatever you think of their practices, the research papers they provide access to are clearly reliable
  • Zenodo, an open access project of CERN

I can't see how either of these could be considered unreliable - should they be excluded? --Michael Goodyear   21:32, 13 March 2020 (UTC)

I was surprised by this, too. - Dank (push to talk) 21:34, 13 March 2020 (UTC)
Oh I see you have already responded on the talk page, thanks --Michael Goodyear   21:40, 13 March 2020 (UTC)

I'll copy-paste my reply from Talk:Coronariae/GA1 here so others with the same question can find this.

See WP:UPSD#Common cleanup and non-problematic cases, "General repositories" for your answer. Basically RG and Zenodo are user-uploaded and have no filtering system, so they will often host preprints and articles from predatory journals. Hence why the links are in pale yellow (meaning double check, rather than probably problematic).

Headbomb {t · c · p · b} 21:49, 13 March 2020 (UTC)

Makes sense, thanks. - Dank (push to talk) 21:51, 13 March 2020 (UTC)
@Dank and Michael Goodyear: doing a deeper review, I did find some predatory publishers on Coronariae though (http://www.cibtech.org/index.htm), so you might want to update the article (should be trivial, since two refs back the same fact, just use the non-predatory one and remove the other). The script will now pick that one up. Headbomb {t · c · p · b} 21:56, 13 March 2020 (UTC)

Update

Headbomb, please sync with User:SD0001/unreliable.js.

Improvement made: now when you hover over any link/list item highlighted by the script, you are told why the highlighting has been done ("Deprecated source", "Blacklisted source", "Source that traditionally fails WP:MEDRS, but could be used for other more routine claims", etc). Should alleviate the concerns raised in the previous section. Also, now users will not need to memorise the color codes (or refer to the documentation too often).

SD0001 (talk) 11:48, 14 March 2020 (UTC)

@SD0001: Done, thanks. Opens up a lot of possibilities for more categories and better advice. Headbomb {t · c · p · b} 18:14, 16 March 2020 (UTC)

Next steps with deprecated sources

I really like how your script works Headbomb as it finds sources hidden in plain sight that are not reliable. I want to ask a logistics question -- is there guidance for how to remove sources or claims or text that cites these sorts of sources that are blacklisted or deprecated? I imagine this has to be done on a case by case basis, but am wondering if there is a checklist or other steps that are recommended for dealing with those that may be used. Thanks. --- FULBERT (talk) 13:15, 18 March 2020 (UTC)

@FULBERT: I'm not aware of any specific guidance. I mostly use common sense. If the claim is plausible, I simply replace the source with [citation needed] [unless it's part of multiple sources for the same thing, in which case I just remove it]. If it's MEDRS/BLP related, I remove the passage entirely. If it's one of those indiscriminate lists of publications, I usually remove the entire list, because those are typically WP:INDISCRIMINATE/WP:NOTCV violations anyway. Headbomb {t · c · p · b} 13:34, 18 March 2020 (UTC)
Headbomb, Thanks; this is an area of editing I have not done much with so will follow your guidance the next time I see something red or black in this way. Appreciate the suggestions. FULBERT (talk) 13:40, 18 March 2020 (UTC)

"Misleading journal metric"—oaji.net

Sources such as the following paper hosted at oaji.net are flagged as "Misleading journal metric"—red, but a different shade than generally unreliable. I am having trouble finding out what it means in terms of reliability. Is it a synonym of predatory journal, and if so, can we have an explanation on User:Headbomb/unreliable#What it does?

  • Šmigeľ, Michal (2017). "Anti-Semitism in Slovakia in Post-War Years 1945 – 1948: A Period of "Common People's Anti-Semitism"" (PDF). Population Processes. 2 (1). doi:10.13187/popul.2017.2.35. {{cite journal}}: Cite has empty unknown parameter: |1= (help)

(The paper is listed as further reading at Partisan Congress riots—it duplicates information that is available in a less dodgy Slovak-language publication by the same author.) buidhe 14:29, 17 March 2020 (UTC)

@Buidhe: http://oaji.net/ is the Open Academic Journal Index, which mostly exists to be an indiscriminate repository of journals which publishes bunk journal metrics (e.g. fake impact factors or similar). It's not a good sign when a journal is in that index, but the issue flagged here is the index itself. The journal may be problematic too, but I'll dig around some more to see if there's a way to tweak things. Headbomb {t · c · p · b} 13:40, 18 March 2020 (UTC)
I've replaced the link with [3] instead. Headbomb {t · c · p · b} 13:47, 18 March 2020 (UTC)

request to add

I've just been reminded of a "publisher" of public domain texts that sells them on Amazon etc. They are not reliable sources; anyone can cobble together unvetted PD text and try to sell it. With no editorial oversight to speak of, they are "self-published". So I'm wondering if you can add "Delphi Classics" as an unreliable publisher. Here is a blog I found that details some issues.[4] And yeah, there are some uses of these books on Wikipedia (search). Thanks! Outriggr (talk) 05:00, 15 March 2020 (UTC)

@Outriggr: I've added delphiclassics.com, but keep in mind that it won't find |publisher=Delphi Classics, but rather only the ones that have a link to the website. Headbomb {t · c · p · b} 07:31, 15 March 2020 (UTC)
@Outriggr: Searches will find you a bunch of them however. See Delphi Classics. Headbomb {t · c · p · b} 19:43, 18 March 2020 (UTC)

Malfunction?

Hey Headbomb, first off thank you for this script, it is very useful. However, it looks like there might be a malfunction. I'm seeing the entire Main Page highlighted red, including TFA, ITN, and On This Day. Just wanted to make sure you're aware. -- LuK3 (Talk) 18:00, 16 March 2020 (UTC)

  • I just came to mention this, too - as of a few minutes ago, anything in a bullet list or within certain templates (including list and all ref templates) is now in red. Kingsif (talk) 18:01, 16 March 2020 (UTC)
@LuK3 and Kingsif: sorry I missed this and didn't reply earlier. You just happened to view Wikipedia in a 2 minute window where I screwed up the script. Headbomb {t · c · p · b} 10:04, 23 March 2020 (UTC)

Always on?

I have a couple of text highlighting scripts now that are always on, is there a way to turn it off/on via the tools sidebar now? I didn't see an option at a glance (haven't installed it yet). I would love if I could easily flip it on when I am source reviewing and have it off in general, in case that is not a current feature, and in case that is easy to install, but I get it if others don't want it to function like that :). Kees08 (Talk) 16:52, 24 March 2020 (UTC)

I bet most people would not want this feature, and I can't think of any easy way to implement it, but I will leave the request in case you are more creative than me. Kees08 (Talk) 17:00, 24 March 2020 (UTC)
Not that I'm aware of, but maybe SD0001 (talk · contribs) has an idea here. Maybe a sort of Ctrl+Shift+U key press combo could toggle on and off, but I'm not sure it's possible. As a note, I have a few text highlighting scripts myself (including Anomie's link classifier), and there doesn't seem to be any problem using many at once here. Headbomb {t · c · p · b} 21:11, 24 March 2020 (UTC)
You can set up the script to load on-demand:
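// Wait for the mediawiki.util module and for the DOM to be ready.
$.when(mw.loader.using('mediawiki.util'), $.ready).then(function () {
    // Add a "UPSD" entry to the Tools sidebar (portlet 'p-tb'), with access key 'U'.
    var link = mw.util.addPortletLink('p-tb', '#', 'UPSD', 't-upsd', 'Highlight unreliable sources', 'U');
    if (!link) {
        return; // portlet not found, so there is nothing to attach to
    }
    link.addEventListener('click', function (e) {
        e.preventDefault(); // don't follow the '#' href
        // Load the highlighter only when the link is clicked.
        importScript('User:Headbomb/unreliable.js');
    });
});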
Maybe we should offer this as an option in the core script, but until then you can put the above in your common.js, instead of the standard importScript('User:Headbomb/unreliable.js'); line. Clicking on the "UPSD" option (or using the keypress Alt+Shift+U or Alt+U) turns on the highlighting. To turn it off, just reload the page and it will go. SD0001 (talk) 15:27, 25 March 2020 (UTC)
How splendid, this does fine for me. I will add this to my article review toolkit. If it ever becomes a feature just hit me with a ping if you remember, otherwise this works for me. Kees08 (Talk) 14:55, 3 April 2020 (UTC)

False positive on Wikipedia:Citation expander?

Hello, on Wikipedia:Citation expander the whole section "1." explaining the 2 methods that can be used is marked in bright red as a predatory journal. Redalert2fan (talk) 16:23, 3 May 2020 (UTC)

@Redalert2fan: From testing the page with the Preview function, it looks like the script is picking up the example DOI parameter:
  • {{cite journal |doi=10.1234/ABC123}}
I would guess that this specific DOI corresponds to a flagged source, so it would need to be replaced with a different example. I changed it to 10.1000/ABC123 and it stopped the script from activating. Sunrise (talk) 23:29, 5 May 2020 (UTC)
Sunrise, Yes that seems to have been the problem, Thanks for changing it! Redalert2fan (talk) 08:34, 6 May 2020 (UTC)

Nice script!

Thanks for making this! I just noticed that Facebook was not listed as an unreliable source. buidhe 14:32, 14 February 2020 (UTC)

@Buidhe: that's mostly because it wasn't listed in WP:RSPSOURCES. But it's clearly twitter-like, so I'll add it to the generally unreliable. Headbomb {t · c · p · b} 14:46, 14 February 2020 (UTC)
 Done [5] Headbomb {t · c · p · b} 14:48, 14 February 2020 (UTC)
Likewise, YouTube is marked as unreliable but similar sites Vimeo, twitch.tv, and Dailymotion are not. Could they also be added? buidhe 03:58, 21 February 2020 (UTC)
Yup, easily.  Done Headbomb {t · c · p · b} 04:34, 21 February 2020 (UTC)
  • Be still, my beating heart. If they won't build it, happy to use this instead. Thank you! czar 23:47, 16 May 2020 (UTC)

Some more entries for the "generally unreliable" section

Two different regexes (the first is from COIBot, the second from edit filter 1045) to catch blogs and self-published websites. I thought it might be worth tweaking/combining them and adding them to the generally-unreliable list:

  • \bblog(?:cu|fa|harbor|mybrain|post|savy|spot|townhall)?\.(com|in)\b
  • \b(angelfire|blogger|blogspot|geocities|livejournal|rootsweb|wordpress)\.\w{2,3}

Maybe rearrange them to something like this:

  • \bblog(?:cu|fa|harbor|mybrain|post|savy|spot|townhall|ger)?\.\w{2,3}
  • \b(angelfire|geocities|livejournal|rootsweb|wordpress)\.\w{2,3} already  Done

creffett (talk) 19:47, 17 April 2020 (UTC)

@Creffett: Sorry I somehow missed this for a month+. I think most of those are already covered, but I'll double check to make sure I haven't missed a few. Headbomb {t · c · p · b} 23:31, 17 May 2020 (UTC)
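
A quick way to see what the rearranged blog pattern above would catch is to run it against a few hostnames. The sketch below compares it with a variant that keeps a trailing `\b`, as the COIBot original does. The variant and the sample hostnames are assumptions added for illustration, not part of the proposal.

```javascript
// The rearranged pattern exactly as proposed above.
const proposedBlogRegex = /\bblog(?:cu|fa|harbor|mybrain|post|savy|spot|townhall|ger)?\.\w{2,3}/;
// Hypothetical variant: the same pattern with a trailing word boundary.
const anchoredBlogRegex = /\bblog(?:cu|fa|harbor|mybrain|post|savy|spot|townhall|ger)?\.\w{2,3}\b/;

const sampleHosts = [
  "example.blogspot.com",
  "blogger.com",
  "blog.example.org",
  "weblog.example.org",
];

// One row per hostname, recording whether each pattern matches it.
const blogResults = sampleHosts.map(function (host) {
  return {
    host: host,
    proposed: proposedBlogRegex.test(host),
    anchored: anchoredBlogRegex.test(host),
  };
});
```

Both patterns match `example.blogspot.com` and `blogger.com`, and neither matches `weblog.example.org`. They differ on `blog.example.org`: the proposed pattern matches it (via `blog.exa`), while the anchored variant does not, so the trailing boundary may be worth keeping if `blog.` subdomains of otherwise unlisted sites should not be flagged.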

Custom rules?

Hey Headbomb (and other watchers) - how practical would it be to add support for user-added custom rules? JS isn't a language I'm great at (and I don't know the MediaWiki API), otherwise I'd suggest specific changes, but I'd imagine it working something like this:

  • User creates User:USERNAME/unreliable-rules.js, which would contain JSON formatted the same as the rules var. Handmade for now, could eventually create a script to help with it.
  • Script checks for existence of User:USERNAME/unreliable-rules.js, if it exists append the unreliable-rules.js rules to the rules var (this way the built-ins take precedence in case you have a custom entry which later gets added to the main module).
    • Maybe also have some kind of sanity checking when adding the rules? i.e. make sure that the comment, css, and regex fields exist before pushing the rule into the rules list.
  • Everything else should be plug-and-play.

Thoughts? creffett (talk) 19:56, 17 May 2020 (UTC)

Probably a question more for @SD0001: than me. Although if you have specific sources that are crap, it's probably a good idea to let me know here (or WP:RSN) so everyone can benefit from them. Headbomb {t · c · p · b} 23:30, 17 May 2020 (UTC)
I actually went ahead and put something together (apparently my JS isn't quite as bad as I thought...just don't look too much at the number of revisions I went through). You can see my changes at User:Creffett/unreliable.js - basically it looks for User:USERNAME/unreliable-rules.js and tries to load an array named unreliableCustomRules (formatted the same way as the existing rules var) and merge that into the rules list. Did some basic testing on it and it seems to work as expected. creffett (talk) 17:21, 18 May 2020 (UTC)
@Creffett: synced. Please add instructions somewhere in User:Headbomb/unreliable for how to add custom rules. Headbomb {t · c · p · b} 17:33, 18 May 2020 (UTC)
There were a couple of small issues with the javascript. I've fixed these up in User:SD0001/unreliable.js. Please sync it. The explanations are in the latest few edit summaries should Creffett be interested. SD0001 (talk) 08:17, 20 May 2020 (UTC)
Done. Thanks. Headbomb {t · c · p · b} 10:29, 20 May 2020 (UTC)
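
The sanity check and merge order described in this thread might be sketched as follows. This is not the code that was synced: `isValidRule` and `mergeRules` are hypothetical names, the exact shape of the `css` field is assumed, and built-ins only take precedence here on the assumption that the first matching rule wins.

```javascript
// A rule is usable only if it has a comment, a css value and a RegExp.
function isValidRule(rule) {
  return Boolean(rule) &&
    typeof rule.comment === "string" &&
    rule.css != null &&
    rule.regex instanceof RegExp;
}

// Append valid custom rules after the built-ins, so that a built-in rule
// is tried first when rules are checked in order (assumption: first match wins).
function mergeRules(builtIn, custom) {
  const extras = Array.isArray(custom) ? custom.filter(isValidRule) : [];
  return builtIn.concat(extras);
}
```

A custom list containing one well-formed rule, one entry missing its `css` and `regex` fields, and a stray `null` would contribute only the well-formed rule. A missing custom list leaves the built-ins unchanged.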

Source without URL and DOI

Can this handy tool identify any source without URL and DOI to see if it is a reliable source? For example, Adaptive_behavior#cite_ref-Heward_1-0? If such a feature has not been added yet, I would recommend that folks consider adding it! Thank you. --I am pleased to meet you (talk) 17:25, 21 May 2020 (UTC)

In theory it's possible. In practice you need a very rigorous string (series of characters) that won't match everything else. For example, if you just quote with "Heward" to find that author, you'll find every other author named "Heward". So that would be an example of what a bad pattern would be. It's also something that could grow to be very very hard to maintain in the long run. Headbomb {t · c · p · b} 18:49, 21 May 2020 (UTC)
In practice you need a very rigorous string (series of characters) that won't match everything else. "Title" seems to be viable? In this case, the title is Exceptional Children. I am pleased to meet you (talk) 19:21, 21 May 2020 (UTC)
Exceptional Children is a journal, not even an article's title. That source must be problematic. I'm going to take it down. I am pleased to meet you (talk) 19:25, 21 May 2020 (UTC)
This is a string that appears in several articles [6] and in several unrelated citations/journals/titles, like Teaching Exceptional Children. So that would be an example of a bad string. It's very hard to match on generic sounding titles without causing more issues than you solve. Headbomb {t · c · p · b} 19:28, 21 May 2020 (UTC)
Ahh, I see. Thank you for your detailed explanation. Also, I am impressed by your forethought. Cheers! I am pleased to meet you (talk) 19:36, 21 May 2020 (UTC)
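
The "bad string" problem discussed above is easy to reproduce. In this minimal sketch, a pattern built from the generic title matches both journal names mentioned in the thread. The variable names are hypothetical.

```javascript
// A pattern built naively from a generic-sounding title.
const naiveTitlePattern = /Exceptional Children/;

// The two journal names mentioned in this thread.
const journalTitles = ["Exceptional Children", "Teaching Exceptional Children"];

// Both titles match, so the pattern cannot tell the two journals apart.
const matchedTitles = journalTitles.filter(function (title) {
  return naiveTitlePattern.test(title);
});
```

`matchedTitles` ends up containing both entries, which is the kind of over-matching that makes title-only rules hard to maintain.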

Thanks

Thanks for this script, and thanks also to User:SD0001, User:Jorm, and User:creffett. It is very handy. -- Colin°Talk 20:05, 26 May 2020 (UTC)

Record charts to avoid

Hello. I was wondering if you could add the websites listed at Wikipedia:Record_charts#Websites_to_avoid. Thanks! --MrLinkinPark333 (talk) 23:55, 29 May 2020 (UTC)

@MrLinkinPark333: Could... at what level should those be highlighted? Headbomb {t · c · p · b} 21:48, 1 June 2020 (UTC)
I would say Generally unreliable but I didn't create the list. --MrLinkinPark333 (talk) 22:27, 1 June 2020 (UTC)

Strange coloration of AfD comment with no link

In Wikipedia:Articles for deletion/Steven L. Tuck, the first comment (by P Aculeius) is colored entirely pink for me, despite no markup to that effect and despite no external links within it. I think it must be this script, because it doesn't happen not-logged-in. Any idea what might be triggering this? —David Eppstein (talk) 18:15, 12 June 2020 (UTC)

It sees "Amazon.com" Headbomb {t · c · p · b} 18:17, 12 June 2020 (UTC)
Should we really be highlighting that when it is not part of a url? —David Eppstein (talk) 18:18, 12 June 2020 (UTC)
See User:Headbomb/unreliable#Limitations, with the second Deprecated.com example (and third bullet in the nutshell banner). Not highlighting it would mean missing out on many 'manual' citations to crap sources. Headbomb {t · c · p · b} 18:22, 12 June 2020 (UTC)
I'd also argue that the script is working as intended here, given the comment is "there are some blurbs from reviews of his History of Roman Art on Amazon.com" and Wikipedia:Reliable sources/Perennial sources#Generally unreliable:Amazon specifically mentions user reviews as being unreliable. Headbomb {t · c · p · b} 18:29, 12 June 2020 (UTC)
The intent would have been a lot clearer if it highlighted only the "Amazon.com" text and not the whole paragraph. —David Eppstein (talk) 20:30, 12 June 2020 (UTC)
For sure, but apparently that's not really feasible for technical reasons I don't understand. Headbomb {t · c · p · b} 21:52, 12 June 2020 (UTC)
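
A rough sketch of why a plain-text mention can end up colouring a whole comment: a link gives the script a specific element to style, while bare text does not, so the nearest container gets highlighted instead. This is an assumption about the general approach, not the script's actual code; the rule, the item objects, the example URL, and `highlightTarget` are all hypothetical.

```javascript
// Hypothetical rule; the real script's patterns and styles may differ.
const rule = { name: 'generally-unreliable', regex: /\bamazon\.com\b/i };

// Plain-object stand-ins for list items on a page (no DOM needed).
const items = [
  { text: 'Some blurbs from reviews on Amazon.com', links: [] },
  { text: 'Buy it here', links: ['https://www.amazon.com/dp/000'] },
  { text: 'A comment with no flagged source', links: [] },
];

// Decide what would be highlighted for one item:
// matching links first, otherwise the whole item if its text matches.
function highlightTarget(item) {
  const badLinks = item.links.filter((href) => rule.regex.test(href));
  if (badLinks.length > 0) return { scope: 'link', targets: badLinks };
  if (rule.regex.test(item.text)) return { scope: 'whole item', targets: [item.text] };
  return { scope: 'none', targets: [] };
}

const results = items.map(highlightTarget);
console.log(results.map((r) => r.scope)); // [ 'whole item', 'link', 'none' ]
```

Highlighting only the words "Amazon.com" would require splitting the text and wrapping the match in a new element, which is more involved than styling an element that already exists.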

Add

Hi, can you add rt.com per WP:RSP#RT (or what's your procedure for choosing what to add)? czar 02:52, 16 June 2020 (UTC)

It's already there? Headbomb {t · c · p · b} 12:35, 16 June 2020 (UTC)
I'll move it from generally unreliable to deprecated however. Headbomb {t · c · p · b} 12:37, 16 June 2020 (UTC)
Huh, sorry about that. I might have been looking at a diff, which appears to not load the script czar 22:21, 16 June 2020 (UTC)
@Czar: Works for me there. Headbomb {t · c · p · b} 19:34, 19 June 2020 (UTC)
Huh. Strange. Working for me too but it definitely wasn't before... Thanks czar 20:46, 19 June 2020 (UTC)
Resolved

Some more unreliable sources please

royalark.net, thepeerage.com, worldstatesmen.org - all now deprecated. Also 4dw.net/royalark, which is the actual host for royalark. Thanks! Guy (help!) 18:56, 18 June 2020 (UTC)

@JzG: done. Headbomb {t · c · p · b} 19:37, 19 June 2020 (UTC)
Headbomb, thanks Guy (help!) 21:22, 19 June 2020 (UTC)

A couple updates

I've noticed that RSP currently lists Jezebel as marginally reliable and has upgraded PinkNews to generally reliable (as of yesterday). Both are flagged by this script as generally unreliable; could this be updated? Armadillopteryx 14:36, 16 August 2020 (UTC)

PinkNews just went through RSN, and was closed and archived yesterday. Yes, this would need updating. Normal Op (talk) 15:15, 16 August 2020 (UTC)
Done. Headbomb {t · c · p · b} 17:07, 19 August 2020 (UTC)
Thanks! Armadillopteryx 18:18, 19 August 2020 (UTC)

blogspot.com

blogspot.com (Blogger) should be added as generally unreliable per WP:RSP. I see that the script includes blogger.com but not blogspot.com. Most blogs on the site use the blogspot.com domain I think. SD0001 (talk) 05:59, 30 August 2020 (UTC)

@SD0001: done. Headbomb {t · c · p · b} 15:31, 30 August 2020 (UTC)

Weebly

I have on several occasions missed that a source was a *.weebly.com domain, which is a free website creator cited both by well meaning editors (it often hosts school projects) and up-and-coming rappers. Would be helpful to add. (I may also have begun relying too much on this script.) – Thjarkur (talk) 22:15, 12 September 2020 (UTC)

@Þjarkur: Sure, I'll add it. Marked as generally unreliable since it's equivalent to a bloghost. Feel free to bring up to the RSN though. Headbomb {t · c · p · b} 02:03, 22 September 2020 (UTC)

Los Angeles Times

I am seeing https://www.latimes.com/ links in red. --- C&C (Coffeeandcrumbs) 02:34, 3 October 2020 (UTC)

I'm not... so... do you have a link to a page where this happens? Headbomb {t · c · p · b} 02:41, 3 October 2020 (UTC)
If you mean Talk:Andy Ngo, it's highlighting a mention of Quillette.com. See WP:UPSD#Limitations for an explanation and workaround. Basically, use Quillette[dot]com in the quote. Headbomb {t · c · p · b} 02:43, 3 October 2020 (UTC)
Thank you. Great script by the way! --- C&C (Coffeeandcrumbs) 02:50, 3 October 2020 (UTC)
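
The [dot] workaround mentioned above can be checked with a short sketch. The pattern is a hypothetical one written in the style of a domain rule, not copied from the script.

```javascript
// Hypothetical pattern in the style of the script's domain rules.
const quillette = /quillette\.com/i;

const plainMention = 'According to Quillette.com, ...';
const workaround = 'According to Quillette[dot]com, ...';

// The literal domain matches; the [dot] spelling does not,
// because the pattern requires an actual "." before "com".
const flaggedPlain = quillette.test(plainMention);
const flaggedWorkaround = quillette.test(workaround);

console.log(flaggedPlain);      // true
console.log(flaggedWorkaround); // false
```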

Externalize the css?

@SD0001 and Oshwah:, is there a way to externalize the CSS rules, with mine being default, but so someone was able to override them with their own? That would prevent forks like User:Oshwah/UnreliableSourceHighlighter.js, which haven't kept up with the most recent sources covered. Headbomb {t · c · p · b} 20:09, 27 September 2020 (UTC)

Hi Headbomb! The reason I forked this script wasn't to steal any code or steal credit or anything of that sort. :-) I did this so that I could customize the colors in a way where I could register different levels of unreliability of the sources without having to look at the documentation each time. Have there been changes to the code that I need to be aware of? ~Oshwah~(talk) (contribs) 04:26, 9 October 2020 (UTC)
@Oshwah: Well, that's a bit the point of externalization; it would let you customize the colours as you want, without having to update the sources covered every time I did. For the changes, look at [7], which is the difference between your version and mine. You can copy-paste over any line starting with regex: that changed, and we'll be in sync. Until the next time I update the sources. Headbomb {t · c · p · b} 04:45, 9 October 2020 (UTC)
Ah I see! You want to make variables that users can set in their common.js, vector.js, etc in order to customize those colors! I think that's a great idea! I'd be totally on board! :-) ~Oshwah~(talk) (contribs) 07:10, 9 October 2020 (UTC)
It's possible of course. We should probably have three different files: a JSON (for storing the data in a very easy-to-edit form), a CSS (for holding the styles) and a JS (which provides the core logic and binds everything together). – SD0001 (talk) 12:16, 15 October 2020 (UTC)
In theory a JSON sounds nice... and allow for things to be re-used in other scripts. In practice, it would likely explode the size of the script because JSON is really not an efficient format. Would be open to exploring it though. Headbomb {t · c · p · b} 14:10, 15 October 2020 (UTC)
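
One possible shape for the override idea discussed in this thread, as a minimal sketch: the script ships default styles, and a user-supplied object replaces only the levels the user cares about. The level names, colour values, the `unreliable-` class prefix, and both helper functions are assumptions for illustration, not the script's actual identifiers.

```javascript
// Default styles shipped with the script (names and values are illustrative).
const defaultStyles = {
  deprecated: 'background-color: #ff9999;',
  unreliable: 'background-color: #ffcccc;',
  borderline: 'background-color: #ffffcc;',
};

// Hypothetical user overrides, e.g. defined in a user's common.js.
const userStyles = {
  borderline: 'background-color: #ddeeff;',
};

// Overrides win; any level the user does not mention keeps its default.
function resolveStyles(defaults, overrides = {}) {
  return { ...defaults, ...overrides };
}

// Turn the resolved styles into CSS text, one rule per level.
function toCss(styles) {
  return Object.entries(styles)
    .map(([level, declarations]) => `.unreliable-${level} { ${declarations} }`)
    .join('\n');
}

const resolved = resolveStyles(defaultStyles, userStyles);
console.log(toCss(resolved));
```

With this split, a fork is no longer needed just to change colours: the list of sources stays in one place and only the small override object differs per user.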

Ultimate Guitar

The website, Ultimate Guitar should be highlighted yellow. The music source list lists Ultimate Guitar as a reliable source but cautions editors to only cite articles that are either written by the UG Team, one of their staff members, or authors with credentials to other reliable sources like Rolling Stone. That would make the website only marginally reliable. Lazman321 (talk) 02:44, 22 October 2020 (UTC)

@Lazman321: Since I'm not really familiar with the topic, I'd rather be able to point out to an RSN discussion with consensus it's problematic enough to be worth highlighting. Headbomb {t · c · p · b} 02:02, 29 October 2020 (UTC)
@Headbomb: Okay. This discussion reached a consensus to only cite articles written by the UG Team or members of it. This discussion reached a consensus that if the author isn't a staff member, but has credentials to other reliable sources, the article can be cited. Lazman321 (talk) 17:39, 29 October 2020 (UTC)

Free hosting services

Headbomb, maybe it'd be a good idea to add Appspot, Heroku, and Google Sites domains to unreliable's rules? They're all free-to-use web hosts, and the chances of them containing anything good is about zero. —moonythedwarf (Braden N.) 15:10, 3 December 2020 (UTC)

@Moonythedwarf: There's often good stuff on those, but I agree they should be flagged in yellow at the very least. I'm a bit busy today, but find me a list of domains and I'll add it to the script. Headbomb {t · c · p · b} 15:30, 3 December 2020 (UTC)

I was doing a bit of cleanup and discovered that Physics Essays has published papers claiming to derive spacetime from consciousness, that energy conservation refutes relativity, that relativistic length contraction is a logical contradiction, and so forth. It should probably be marked as an unreliable journal. XOR'easter (talk) 15:52, 6 February 2021 (UTC)

I'd like to second that request. I just removed a section [8] from Double-slit experiment claiming that collapse was caused by consciousness, which was based on a paper published in Physics Essays. Tercer (talk) 08:13, 9 February 2021 (UTC)
I also support this request. Such absurd claims are hampering the progress of science.Guswen (talk) 09:19, 9 February 2021 (UTC)
I added PhysicsEssays.org as a 'borderline source', but since there's no DOI for them (AFAICT), the script will miss a lot of it. Headbomb {t · c · p · b} 11:56, 9 February 2021 (UTC)
Upgraded to Generally Unreliable. No way these papers could ever have passed any meaningful peer review process. Headbomb {t · c · p · b} 12:04, 9 February 2021 (UTC)
Great, thanks. The papers do have DOIs, although they are well-hidden. For example [9]. Tercer (talk) 12:07, 9 February 2021 (UTC)
(EC) I also found their {{doi|10.4006}} prefix, and added them to WP:CITEWATCH (they'll show up tomorrow). Headbomb {t · c · p · b} 12:08, 9 February 2021 (UTC)
Another winner: A parallel nonphysical universe containing dreams, thoughts, emotions, and memories [...] based on dark matter [10]. XOR'easter (talk) 18:00, 13 February 2021 (UTC)

Interesting similar script. – SD0001 (talk) 06:48, 1 March 2021 (UTC)

Yes, although I feel that one is fundamentally ... misguided is too harsh a word, but something along those veins. Going "New York Times? That's green, therefore reliable!" forgets all the times NYT isn't reliable. Headbomb {t · c · p · b} 07:10, 1 March 2021 (UTC)

sptnkne.ws

Headbomb, could you please add sptnkne.ws to the list of deprecated sources? It's a URL shortener for sputniknews.com. Kleinpecan (talk) 17:01, 15 April 2021 (UTC)

 Done Sure. Headbomb {t · c · p · b} 20:15, 15 April 2021 (UTC)

Specific WikiProjects unreliable sources inquiry

Hello. I was wondering if you could add the unreliable sources to your script from Wikipedia:WikiProject_Albums/Sources#Unreliable_sources and Wikipedia:WikiProject_Video_games/Sources#Unreliable_sources. The albums one has more details in that section explaining why several of the sources aren't reliable. With About.com (now Dotdash), it's selective critics that are not reliable, but some critics from that site are fine to use. See Wikipedia:WikiProject Albums/Sources/About.com Critics Table. Thanks! --MrLinkinPark333 (talk) 02:15, 22 January 2021 (UTC)

It's definitely possible, the main issue I have is how wide is the consensus that those are unreliable (across Wikipedia, not just for videogame content)? But also that Wikipedia:WikiProject_Albums/Sources#Generally unreliable sources doesn't have the websites listed, meaning it would take me a few hours to track everything, and compare against WP:RSPSOURCES. Headbomb {t · c · p · b} 19:35, 25 January 2021 (UTC)
There's some overlap with RSP (i.e. IMDB, HuffPost contributors, Forbes.com contributors) etc. Maybe a separate section would be more suitable like Unreliable sources per Wikiproject? --MrLinkinPark333 (talk) 19:56, 25 January 2021 (UTC)
That could be a thing. All someone needs to do is to make use of this to add their own specific rules. This could be maintained by someone from the project, and then shared with other project members. Headbomb {t · c · p · b} 20:04, 25 January 2021 (UTC)
The edit summary does not indicate why, but GameSpot, which is on the VGRS list as reliable, was recently added as generally unreliable.
Please fix. Izno (talk) 14:45, 19 April 2021 (UTC)
I'll tweak to yellow later tonight. It was added because there is a GameSpot-hosted database where information is user-submitted. Headbomb {t · c · p · b} 16:21, 19 April 2021 (UTC)
Tweaked, it should only match gamefaqs.gamespot.com now. gamespot.com will be in yellow, to double-check that the article is staff-authored, and not user-submitted. Headbomb {t · c · p · b} 19:09, 19 April 2021 (UTC)
WFM. Izno (talk) 20:28, 19 April 2021 (UTC)
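
The split described above (gamefaqs.gamespot.com flagged more strongly than the rest of gamespot.com) can be sketched as an ordered rule list where the more specific pattern is tried first. The rule objects, level labels, example URLs, and the `classify` helper are hypothetical, not the script's actual structure.

```javascript
// Ordered rules: the more specific subdomain comes before the general domain.
// Patterns and labels are illustrative, not copied from the script.
const rules = [
  { level: 'generally unreliable', regex: /\bgamefaqs\.gamespot\.com\b/i },
  { level: 'yellow (double-check)', regex: /\bgamespot\.com\b/i },
];

// Return the level of the first rule that matches, or null if none does.
function classify(url) {
  const hit = rules.find((r) => r.regex.test(url));
  return hit ? hit.level : null;
}

const userSubmitted = classify('https://gamefaqs.gamespot.com/some-page');
const staffSite = classify('https://www.gamespot.com/reviews/some-review');
const unrelated = classify('https://example.org/article');

console.log(userSubmitted); // generally unreliable
console.log(staffSite);     // yellow (double-check)
console.log(unrelated);     // null
```

Order matters here: if the general gamespot.com rule came first, it would also catch the gamefaqs subdomain and the stronger warning would never be reached.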

Headbomb, could you remove Film Music Reporter from the script? A recent discussion at the reliable sources noticeboard has found the website to be reliable, as they appear to simply state press releases from industry insiders. Some Dude From North Carolina (talk) 21:57, 20 April 2021 (UTC)

I wouldn't say they found the website reliable. It's still a self-published blog. See also Wikipedia:New page patrol source guide's comments about it. But I'll mark in yellow instead. Headbomb {t · c · p · b} 22:37, 20 April 2021 (UTC)

List of sources?

Nice tool! Something like this should be standard for every new editor to help with identifying reliable sources. The only thing that isn't very transparent is the source for the lists of sources you use. You state it's WP:RSPSOURCES, Predatory open access source list, and WP:CITEWATCH, with some minor differences. If the minor differences could be removed/clarified and direct use of those lists could be achieved then it would be reasonable to propose a wider adoption of this tool as it would reflect exactly the consensus of the community without any kind of filter. It would save a lot of editor time in wasted arguments due to editors not knowing previous community consensus. -- {{u|Gtoffoletto}}talk 19:16, 27 April 2021 (UTC)

@Gtoffoletto: The sources are mostly taken from WP:RSP, WP:NPPSG, Beall's list (not wholesale however), various WikiProject/WP:RSN discussions, and obvious nonsense (like satirical sites). The minor differences amount to classifying in yellow vs red or similar. For example, ResearchGate is in red at WP:RSP, but highlighted in yellow by the script, because the vast majority of the time, these are simply convenience links to published journal articles, which happen to be hosted on ResearchGate, and not to some original scholarship published on ResearchGate by a random person. Yellow is a better reminder to double-check, rather than a stronger warning of confirmed crap. Likewise, Forbes is both Green/Red at WP:RSP, but here we put yellow (depends on topic), since we cannot detect whether or not a piece was written by staff (green), or by an external contributor (red).
The sources themselves can be found on User:Headbomb/unreliable.js. I edit the script based on my best determination of what the Wikipedian consensus is. Anyone is free to challenge specific sources, or request changes to a source at RSN, and I'll happily update the script accordingly. Headbomb {t · c · p · b} 19:37, 27 April 2021 (UTC)
Thank you for clarifying. If this could be automated based on those lists and the lists followed a more stringent criteria (yellow rather than green/red for forbes for example) we could disseminate the consensus of the community more widely in a more efficient/automated way. Every time a user tried to use a deprecated source he could get a warning for example. Just thinking out loud here but it could be an interesting proposal. -- {{u|Gtoffoletto}}talk 20:16, 27 April 2021 (UTC)
The WP:RSP lists do follow "stringent criteria". The issue here is that, for example, Forbes.com or ResearchGate or what have you hosts several types of content, and the script cannot discriminate between said types of content. As for being warned, that's what WP:Edit Filters are for, but for editing warnings, the concerns about specific sources need to be substantially higher than 'you probably want to double check this/are you sure this is reliable?'. Deprecated sources would likely meet the criteria for an edit filter though. Headbomb {t · c · p · b} 00:24, 28 April 2021 (UTC)
Yes absolutely. This should just be a warning to editors. Something like: "The community regards this source as questionable (see discussion here). Please make sure the standards for reliable sourcing are being met or your edit might be reverted". It wouldn't block the user from using it. Just warn him of the previous consensus. -- {{u|Gtoffoletto}}talk 20:17, 28 April 2021 (UTC)
The warning could probably do a simple URL match in the visual editor cite tool. That would be great for both old and new users. How about we take this to WP:VPIL Headbomb? -- {{u|Gtoffoletto}}talk 20:21, 28 April 2021 (UTC)
Feel free to take it to WP:VPIL, but I'm fairly wary of making an opt-in script with several warnings on how to properly use it to be the default for masses that maybe don't understand the nuances of it. Headbomb {t · c · p · b} 20:27, 28 April 2021 (UTC)
No, the script as is is definitely not appropriate for the masses. Either your idea of edit filters (at least for deprecated sources - that would not require any new UI) or something lighter integrated into the cite UI of the visual editor when you input the URL. But I have no idea of how features like that are implemented on Wiki. -- {{u|Gtoffoletto}}talk 20:34, 28 April 2021 (UTC)

www.ekushey-tv.com

User:Headbomb/unreliable.js marks a reference with www.ekushey-tv.com in its URL as a "Generally unreliable source". Presumably this is because of the pattern the code uses to match the user-generated tv.com. Ekushey Television is a reliable source for the sorts of things TV news is usually reliable for, so it would be nice if this could be fixed. Ekushey TV isn't cited very often, fewer than 100 times, so if fixing it isn't feasible, I understand. --Worldbruce (talk) 03:02, 2 May 2021 (UTC)

I'll take a look. Headbomb {t · c · p · b} 20:35, 5 May 2021 (UTC)
@Worldbruce: should be fixed. Headbomb {t · c · p · b} 20:38, 5 May 2021 (UTC)
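
A sketch of how this kind of false positive can arise and one way to avoid it. It is not known from this thread how the script was actually fixed; the three patterns below are hypothetical. Note that a plain `\b` does not help here, because the hyphen in "ekushey-tv.com" already counts as a word boundary.

```javascript
// Three hypothetical ways to write a rule for tv.com.
const naive = /tv\.com/i;           // matches anywhere in the string
const wordBoundary = /\btv\.com/i;  // "-" is a non-word character, so \b still matches after it
const anchored = /(?:^|[\/.])tv\.com(?:[\/:?#]|$)/i; // must follow start, "/" or "."

const ekushey = 'https://www.ekushey-tv.com/news';
const tvCom = 'https://www.tv.com/shows';

const outcome = {
  naive: [naive.test(ekushey), naive.test(tvCom)],
  wordBoundary: [wordBoundary.test(ekushey), wordBoundary.test(tvCom)],
  anchored: [anchored.test(ekushey), anchored.test(tvCom)],
};

console.log(outcome.naive);        // [ true, true ]
console.log(outcome.wordBoundary); // [ true, true ]
console.log(outcome.anchored);     // [ false, true ]
```

Only the anchored form tells the two domains apart, since it requires "tv.com" to be a full host label rather than the tail of a longer hyphenated name.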