Wikipedia talk:WikiProject Spam

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Hu12 (talk | contribs) at 20:11, 15 August 2012 (+). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    When reporting spam, please use the appropriate template(s):
    As a courtesy, please consider informing other editors if their actions are being discussed.
    {{Link summary|example.com}} -- do not use "subst:" with this template - Do not include the "http://www." portion of the URL inside this template
    • {{IP summary}} - to report anonymous editors suspected of spamming:
    {{IP summary|127.0.0.1}} --- do not use "subst:" with this template
    • {{User summary}} - to report registered users suspected of spamming:
    {{User summary|Username}} -- do not use "subst:" with this template

    Also, please include links ("diffs") to sample spam edits.

    Indicators
    Reports completed:
     Done
     No action
     Stale
    Defer discussion:
     Defer to XLinkBot
     Defer to Local blacklist
     Defer to Global blacklist
     Defer to Abuse filter
    Information:
     Additional information needed
     Note:

    Questions about "Chinese Knockoff Spam"

    Continuing earlier research about link spam has led me to intensive archive browsing. It is impossible not to notice the many iterations of "Generic Chinese Knockoff Spam" that appear throughout 2010 and 2011 (now inactive?). That this activity is so far in the past inhibits my investigation a bit. Moreover, it seems the team here usually blacklisted it without much discussion or fanfare. For these reasons, I'd like to ask a few questions of those who were active at that time:

    • Was there a particular modus operandi for their actions?
    • How were "related domains" found? I notice that usually only a few domains were actually spammed, but many more were pro-actively blacklisted on some basis? Something to do with WHOIS registrations?
    • What was actually at the destinations? Were there just a couple of core sites that they endlessly purchased new domains for?
    • How do you think they got all those IPs? Do you think this was a single person or a team? Were they utilizing proxy servers, or something more powerful?
    • Do you have any evidence that these spam links were being used elsewhere? blog/forum spam?
    • Do you think these links were added by a human acting quickly, or were there automation scripts?
    • Has any attacker near this scale/complexity been seen before?

    Thanks for all your help, West.andrew.g (talk) 21:16, 28 July 2012 (UTC)[reply]

    From top to bottom:
    • Profit.
    • Web search for various unique Chinglish phrases.
    • They usually aren't redirects. Chinese knockoff domains are copypasta/reskins.
    • I don't know, but there is definitely more than one person doing it.
    • Google is your friend. Something like "louis vuitton cheap comment" will do. This is an internet wide problem.
    • Most likely human sweatshop spam.
    • No. This is organized, as in organized crime.
    MER-C 03:03, 29 July 2012 (UTC)[reply]
    In case people are not aware, Andrew is probably not just curious—he is able to monitor link additions and perform complex processing. If there were patterns that might be detected (article categories, timing of link additions, ranges of IPs, and more), his processing may be of assistance (potentially, it could do more background checking than XLinkBot). The copypaste/reskins observation is of interest—it would be difficult, but it might be possible to guess that the target of an added link is a reskin. Another clue might be the whois registrant for the domain, although whois lookups can become expensive. Johnuniq (talk) 03:22, 29 July 2012 (UTC)[reply]
    Johnuniq is correct. I have several corpora of Wikipedia link spam, run a link processing engine, and have worked heavily in anti-vandalism development (see WP:STiki). No offense, but I found MER-C's answers a bit sarcastic/brief, whereas I was hoping to have a more technical discussion on the topic of Generic Chinese Knockoff Spam (GCKS). If anyone seems to be persistent and technically sophisticated in profiting off of wiki via link spam, it would seem to be those folks. Understanding their attack vectors would seem key to understanding Wikipedia's spam weaknesses. So to go top to bottom again, with a little more detail:
    • Yes, profit is obviously the over-arching MO. However, the actions are just puzzling. Sometimes they blank articles with links. Sometimes they insert 5-10 at a time in the middle of an article. Sometimes they post to talk pages. They obviously aren't trying to blend in and evade the first wave of anti-vandal checks. With this behavior so obvious to patrollers, it's difficult to imagine how the links survive long enough to extract any utility. It's also a bit odd that they had to know the accounts/IPs would be blocked, but didn't seem to use accounts to their full capacity (i.e., spamming at speed until blocked). Real spam outfits know how things work and have technical sophistication; these folks just seemed dumb and random at times (which made me wonder about automation).
    • I don't imagine you actually searched Chinglish phrases. I imagine the related domains were sitting on the same server?
    • The destination sites were actually selling products, I assume? With this complexity I imagined they were probably part of a larger affiliate program into the actual goods, but probably using copypaste/reskins themselves - i.e., some central backend was fulfilling any "orders"
    • Looking at the accounts used was a bit perplexing. For all their poor understanding of the interface and norms (i.e., spamming talk pages), they still managed to register some accounts. As far as the IPs involved, they seem all over the place. This isn't local IP hopping. They are geographically distributed. This screams proxy servers to me. In the worst case, it could be a botnet (but these attacks simply don't seem large enough for that). Why do you believe it is more than one person? The spam never really happened so quickly that it couldn't be a single person.
    • I understand that GCKS is a widespread Internet problem, but there are certainly many outfits involved in it globally. I was curious about the domains Wikipedia was dealing with in particular. They've paid for 1000s of domain registrations, so if they weren't getting used on wiki, they had to be showing up elsewhere. Strange, then, why they didn't show up on wiki? 50 or so pockets of spam over 1.5 years isn't *that* much for an outfit that seems to have these capabilities.
    • -skip-
    • I understand the organized crime ramifications of spam outfits. What I was asking is whether anyone/anything had come close to this in your experience *on Wikipedia*. My interest here is confined to wiki/Wikipedia experience.
    Additional questions:
    • Were all/most of these blacklisted? Over at meta? Including the "related but not yet seen on wiki" set? MER-C always seemed to lodge the reports, but the blacklist treatment of these guys wasn't clear.
    Thanks for your help. More than a casual passer-by, I am trying to develop tools/algorithms that aid the project. I'd be happy to take this up on IRC if that is easier for anyone involved. Thanks, West.andrew.g (talk) 04:04, 29 July 2012 (UTC)[reply]
    P.S. Why when I do an archive search for "Generic Chinese" or any variant thereof I only get 24 results? I have more luck searching for "2.0", "3.0", "4.0", but this only gets me one at a time and doesn't help with the odd ones, i.e., "8.5". Thanks, West.andrew.g (talk) 04:36, 29 July 2012 (UTC)[reply]
    Thanks Andrew, however this page is not a good place for discussion because it is subject to a lot of churning as new reports come in.
    @MER-C: Do you think there is value in continuing this elsewhere? A subpage dedicated to the topic? I don't think IRC is desirable as complex statements there are difficult—a wiki page would help with development of any ideas, and may be useful for future discussions. I suggest moving this section to a subpage, with a link here. Johnuniq (talk) 04:42, 29 July 2012 (UTC)[reply]

    I think that this is very well within the capabilities of LiWa3/XLinkBot. The problem is that XLinkBot only reverts; spammers will re-insert ad nauseam, they don't care when accounts get blocked (they just move to another IP), and they don't even care when their links get blocked (they just copypaste or reskin to a new domain). It is how they make money. I think that it is therefore valuable to have these links blanket-blacklisted. Preemptively, it saves us, and XLinkBot, a lot of work. If the spammers found out Wikipedia is a target (and they obviously did, since some of the links were spammed to Wikipedia), then the fact that so few links were actually added (while the internet at large was targeted) may actually show MER-C's efficiency here: either they did not spam because their domains were already blacklisted, or the sweatshop workers were told not to make a priority of Wikipedia, as it was too effective too fast.

    Andrew, I'd like to hear more about your detection systems, maybe there are things that can be incorporated into LiWa3 so we can work in real time and ask COIBot to make reports. Indeed, maybe a dedicated subpage is better. --Dirk Beetstra T C 05:09, 29 July 2012 (UTC)[reply]

    Regarding IRC, MER-C and I generally hang around in #wikipedia-spam-t (the main 'command center' for XLinkBot, though it , you may want to join us there. Regarding doing everything on-wiki - I'd like to put WP:BEANS into consideration there. --Dirk Beetstra T C 05:12, 29 July 2012 (UTC)[reply]

    "However, the actions are just puzzling. Sometimes they blank articles with links. Sometimes they insert 5-10 at a time in the middle of an article. Sometimes they post to talk pages. They obviously aren't trying to blend in and evade the first wave of anti-vandal checks."
    These guys (1) probably don't speak English too well and (2) are targeting the entire interwebz. This isn't "pay $X to get your link into Wikipedia" spam -- these spammers don't specialize and are paid piecewise per comment => quantity over quality. They do not target Wikipedia specifically -- we're just another website to be spammed. Something like this, I guess. (I have read a more detailed article on this subject but the link escapes me). Usernames are often the same across websites. Messages are usually copypasta. Automation (if used) is usually achieved with XRumer. If there are bots, they pass the Turing test (only because the humans are so dumb).
    "I don't imagine you actually searched Chinglish phrases."
    Yes, I actually did. Examine the spam reports here closely and you will find the search phrases I used.
    "The destination sites were actually selling products, I assume?"
    Mostly. There are also splogs for the knockoff domains.
    "Something to do with WHOIS registrations?"
    WHOIS data isn't really helpful for knockoffs. Come to think of it, this might be an example of smurfing. Re: copy and paste reskins -- they look the same, and many use the same server-side software. (So do many other sites).
    "blacklist treatment of these guys"
    Blacklisted on sight, uncontroversially.
    "Why do you believe it is more than one person? The spam never really happened so quickly that it couldn't be a single person."
    Remember, it's the entire internet being targeted. What you see here is the tip of the iceberg.
    "Do you think there is value in continuing this elsewhere?"
    I'll "sticky" this by adding an appropriate timestamp. Things tend to die on subpages.
    I apologise for my brevity, but the OP should have put a little more effort into asking the initial questions. MER-C 09:46, 29 July 2012 (UTC)[reply]
    Sticky. MER-C 09:36, 29 August 2012 (UTC)[reply]
    FWIW, I've compiled all the reports in archives into a single page @ User:West.andrew.g/GCKS
    Continuing to update, I geolocated all the IP addresses found in the GCKS campaigns. 99% trace back to a single Chinese province. This doesn't seem to be at botnet scope and too much time has passed to determine the open proxy status of the IPs. This seems to further support the "distributed sweatshop" thinking someone put forth earlier. Out of curiosity, what do the experts around here think is the second most prevalent or determined campaign/faction/attempt they've seen? Does it even begin to approach the scale of GCKS? Thanks, West.andrew.g (talk) 17:55, 4 August 2012 (UTC)[reply]
    The 'internet brands' stuff? Some specialists have managed to circumvent our detection for a long, long time. --Dirk Beetstra T C 04:01, 6 August 2012 (UTC)[reply]
    The "latestmoviez.com" crap (see below) is getting there as well. MER-C 09:36, 7 August 2012 (UTC)[reply]
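    The geolocation tally West.andrew.g describes above (what share of the campaign's IPs trace back to one region) can be sketched roughly as follows. This is only an illustration: the function name, the sample addresses (reserved documentation ranges), and the region labels are all hypothetical, and the IP-to-region mapping is assumed to come from some external geolocation database that is not shown here.

```python
from collections import Counter


def region_shares(ip_to_region):
    """Return {region: fraction of IPs} for a mapping of IP -> region name."""
    counts = Counter(ip_to_region.values())
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


# Hypothetical lookup results; real data would come from a geolocation database.
sample = {
    "192.0.2.1": "Province A",
    "192.0.2.2": "Province A",
    "192.0.2.3": "Province A",
    "198.51.100.7": "Province B",
}
shares = region_shares(sample)
```

    A single region holding nearly all of the share would be consistent with the "distributed sweatshop" reading; a flat distribution would point more toward proxies or a botnet.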

    xt3.com

    Continuous pushing, see ANI thread Wikipedia:Administrators'_noticeboard/Incidents#Mass_spamming_from_multiple_socks.

    Final warning given. Creating this report 'for the record'. --Dirk Beetstra T C 07:08, 9 August 2012 (UTC)[reply]

    Long term mdct.com.au Spamming

    Accounts

    Long-term (2007) persistent spamming, including link vandalism.[1][2] --Hu12 (talk) 13:43, 9 August 2012 (UTC)[reply]

    R To Z Media Spamming

    Adsense google_ad_client = pub-7088803035940952 (Track - Report - reverseinternet.com • meta: Track - Report)
    Google Analytics ID: UA-7072263 - (Track - Report - reverseinternet.com • Meta: Track - Report)
    Accounts

    Randi Zuckerberg's company. --Hu12 (talk) 13:43, 10 August 2012 (UTC)[reply]
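    Reports like this one list AdSense and Google Analytics IDs because sites that share an ID are plausibly run by the same operator. A minimal sketch of that grouping idea follows, assuming the page HTML has already been fetched; the `example.*` domains, the page snippets, and the function name are placeholders for illustration, not the tooling actually used for these reports.

```python
import re
from collections import defaultdict

# Matches the two ID styles quoted in the report: "UA-<digits>" and "pub-<16 digits>".
TRACKING_ID = re.compile(r"\b(UA-\d{4,10}|pub-\d{16})\b")


def group_by_tracking_id(pages):
    """Map each tracking ID to the sorted list of domains whose HTML contains it."""
    groups = defaultdict(set)
    for domain, html in pages.items():
        for tracking_id in TRACKING_ID.findall(html):
            groups[tracking_id].add(domain)
    return {tid: sorted(domains) for tid, domains in groups.items()}


# Hypothetical page sources; the example.* domains are placeholders.
pages = {
    "example.com": "<script>_gaq.push(['_setAccount', 'UA-7072263-1']);</script>",
    "example.net": "<script>_gaq.push(['_setAccount', 'UA-7072263-2']);</script>",
    "example.org": '<script>google_ad_client = "pub-7088803035940952";</script>',
}
related = group_by_tracking_id(pages)
```

    The pattern deliberately drops the trailing property suffix (`-1`, `-2`), so different properties under one Analytics account land in the same group.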

    DVBViewer promotion

    Articles
    Accounts

    En rago (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    CHackbart (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Oliver Sc (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    91.20.211.242 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    31.17.173.130 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    --Hu12 (talk) 21:35, 10 August 2012 (UTC)[reply]

    More showed up in the Wikipedia:Articles for deletion/DVBViewer debate
    Ezekial 9 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Patrick 1bc0 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)

    --Hu12 (talk) 14:40, 14 August 2012 (UTC)[reply]

    It appears that CHackbart (talk · contribs) is the re-creator of the articles (as evidenced by this post) and an administrator named hackbart on the dvbviewer.tv forums.--Hu12 (talk) 19:57, 15 August 2012 (UTC)[reply]

    92.251.229.192

    92.251.229.192 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)


    freerice.com: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com

    IP adding spam link to talk pages. --Nathan2055talk - contribs 01:14, 12 August 2012 (UTC)[reply]

    Triplett, J. M WP:BOOKSPAM

    Accounts

    Dohaschmoha (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    24.166.34.106 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    --Hu12 (talk) 00:38, 13 August 2012 (UTC)[reply]

    Bertrand Moingeon WP:BOOKSPAM

    Articles
    Accounts

    Management1 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Bmpub (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Innovation91 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    79.85.198.179 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    80.214.5.61 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    --Hu12 (talk) 13:12, 13 August 2012 (UTC)[reply]

    HospitalGlobal

    Google Analytics ID: UA-33643851 - (Track - Report - reverseinternet.com • Meta: Track - Report)
    Accounts

    Medicworld (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Clear promotion additions (domain created: 7/24/2012) --Hu12 (talk) 14:10, 13 August 2012 (UTC)[reply]

    Oxford Bibliographies WP:SOCK Spamming

    Accounts

    Rwarden87 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Christina1012 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Naar0711 (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    Scisw (talk · contribs · deleted contribs · blacklist hits · AbuseLog · what links to user page · count · COIBot · Spamcheck · user page logs · x-wiki · status · Edit filter search · Google · StopForumSpam)
    12.182.77.130 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot) OXFORD UNIVERSITY PRESS OXFORD-U61-77-128
    --Hu12 (talk) 17:15, 13 August 2012 (UTC)[reply]

    Spammer(s) blocked--Hu12 (talk) 17:24, 13 August 2012 (UTC)[reply]

    Long term spamming by logos.com

    216.57.209.252 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    User has been spamming links for logos.com Bible software products for at least three years, continuing in spite of eight warnings over the years and a final warning four days ago. To add to the cynicism of this 'user,' they are spamming from the logos.com-owned email server,[3] which also geolocates to their corporate headquarters in Bellingham, Washington. A longer than typical block is warranted in my opinion. First Light (talk) 00:13, 15 August 2012 (UTC)[reply]

    Tracking. MER-C 01:43, 15 August 2012 (UTC)[reply]

    hairsaloninclearwater.com

    Google Analytics ID: UA-25028809 - (Track - Report - reverseinternet.com • Meta: Track - Report)

    Spammers

    MER-C 12:10, 15 August 2012 (UTC)[reply]

    Tarai Dooars Tour & Travels Spam

    Google Analytics ID: UA-25929093 - (Track - Report - reverseinternet.com • Meta: Track - Report)
    Accounts

    117.201.115.177 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    117.201.125.100 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    117.201.118.197 (talk • contribs • deleted contribs • blacklist hits • AbuseLog • what links to user page • COIBot • Spamcheck • count • block log • x-wiki • Edit filter search • WHOIS • RDNS • tracert • robtex.com • StopForumSpam • Google • AboutUs • Project HoneyPot)
    --Hu12 (talk) 13:14, 15 August 2012 (UTC)[reply]
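    The three addresses in this report sit close together, so anyone assessing a range might first compute the smallest CIDR block that covers them all. The sketch below does that with Python's standard `ipaddress` module; `covering_network` is a hypothetical helper name, and nothing here implies that a block of this size was actually considered or applied.

```python
import ipaddress


def covering_network(addresses):
    """Return the smallest single network containing every address given."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    if len({ip.version for ip in ips}) != 1:
        raise ValueError("addresses must all be IPv4 or all be IPv6")
    # Start from a single-host network and widen it one bit at a time.
    net = ipaddress.ip_network(f"{ips[0]}/{ips[0].max_prefixlen}")
    while not all(ip in net for ip in ips):
        net = net.supernet()
    return net


# The three IPs listed in the report above.
reported = ["117.201.115.177", "117.201.125.100", "117.201.118.197"]
block = covering_network(reported)
```

    For these three addresses the result is `117.201.112.0/20`, which spans 4,096 addresses, so collateral impact would need weighing before treating it as anything more than a search range.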

    Return of Long term doabaheadlines.co.in Citespam

    Articles

    Accounts

    Recent WP:CITESPAMming involves replacing established reliable sources with this unreliable spam. --Hu12 (talk) 13:50, 15 August 2012 (UTC)[reply]