User talk:West.andrew.g/Archive 2

From Wikipedia, the free encyclopedia


Researcher flag

Hi! I've seen User:DarTar is the (only!) researcher (see http://en.wikipedia.org/w/index.php?title=Special:ListUsers&group=researcher ) and I've seen you asked how to get the flag. Did you get a clue? Thanks! --phauly (talk) 10:27, 25 August 2010 (UTC)

Hi Pauley. I believe the Wikipedia Research Committee (Read-only Mailing List) was created to discuss this matter (among others pertaining to research). At current, I don't believe there is a formal process -- but you may be able to learn a little bit by going through the archives? Thanks, West.andrew.g (talk) 16:23, 1 September 2010 (UTC)

Using STiki in other Wikipedias

Hello. I'm a sysop on Turkish Wikipedia, and I'm currently operating a pywikipedia bot, Khutuck Bot, there. Is it possible to run STiki on tr.wiki with changes to the code you have released so far? Turkish Wikipedia has been running low on RC patrollers lately, and a tool like STiki would be a great aid for us. Please reply to me at tr:User:Khutuck, tr:User:Khutuck Bot or User:Khutuck Bot. Thank you for the lovely tool. Khutuck Bot (talk) 22:44, 27 August 2010 (UTC)

Hi Khutuck, and sorry for the slow response. It is not difficult to implement STiki for different projects, but it is not trivial, either. First, a server is required to host the back-end component (which will need a static IP address). Secondly, there must be some way to identify some portion of vandalism in an ex-post facto fashion. For en.STiki, I use the common format of rollback strings for this purpose. Third, there would need to be language changes in the interface, "bad word" regexes, and perhaps some of the parsing. Fourth, I am willing to support anyone in such a venture -- but they need to have the coding (Java) skills to understand what is going on. If you are still interested, please let me know. Thanks, West.andrew.g (talk) 16:34, 1 September 2010 (UTC)
Thank you for the detailed explanation. I'll read the code again to better understand these four issues. Sadly I have only basic coding knowledge, but I've been trying to learn Java lately. Is it possible to run the back-end component on my own PC, for myself only, with a dynamic IP and minor coding changes? If it's possible, STiki will be a multi-language Wiki tool :) Khutuck Bot (talk) 20:26, 1 September 2010 (UTC)

Competition on Wikipedia Vandalism Detection

Please take a look at my write-up on a recent competition in detecting Wikipedia vandalism. I'm interested in your thoughts on the nine tools submitted. Would it be possible to run the sample data set against STiki and see how it stacks up? Even if STiki doesn't come out #1, it has a huge advantage in already having a real-time Wikipedia implementation. As you can see at the top of the linked page, my goal is to use such a tool to flag high risk changes for review as a pending change. This will obviously be more sophisticated than my original edit filter proposal, but the concept is the same. If you're not aware, there has been massive interest and concern about the new pending changes feature on Wikipedia. Thanks! —UncleDouggie (talk) 06:29, 27 September 2010 (UTC)

Hello UncleDouggie. It was actually my intention to enter that PAN-CLEF competition, but short notice and some realities got the better of me. Since that time, I have run STiki against the test set and it would have finished in second place in the competition -- so STiki is indeed a competitive tool. Further, I have worked closely with Bo and Luca (authors of WikiTrust, who *did* finish second in the competition). When our feature-sets are used in combination, we do extremely well together, and would comfortably have taken first place. We're working to make APIs available so STiki can integrate that logic.
I've read the technical paper summarizing the PAN-CLEF competition. I've heard the first place individual basically implemented the work of the competition's author, Martin Potthast ("Automatic Vandalism Detection for Wikipedia...") -- which used Natural Language Processing (NLP). STiki has some trivial NLP features -- so the infrastructure is already in place to use them -- so I could easily implement what these individuals found to be the best ones (Though I also suspect that ClueBot et al. might be taking care of some of the low-hanging fruit on the NLP front). If STiki were going to be used in a more official capacity -- this would certainly be enough motivation for me to get something like this done. Thanks, West.andrew.g (talk) 15:55, 27 September 2010 (UTC)
Thanks for the information, West.andrew.g. There has been quite a bit of traffic over at my proposal page, including users who support trying something like this in the next Pending Changes trial. We already have Jimbo's support and I think if we could show him a realistic plan he would be willing to try it. We don't need a perfect version right now; it seems that what you have today is good enough for a trial of the concept. People understand that the detection algorithms can be improved later.
I would like to understand the performance results a bit better. I believe that the competition used edits that had made it through the edit filter without any adjustment for later reverts by ClueBot or Huggle users. The edit filter introduces bias itself because of how it gives instant feedback to vandals on how to tone down their edits. Here is an example from my proposal: "Many edits trigger multiple filters as shown here. Note that the edit at :48 was saved and subsequently rejected by a PC reviewer." ClueBot can also distort the performance in a different way. Let's say that we tune STiki for a TP rate of 90% using the competition data and in reality ClueBot is already reverting 50% of vandalism as blatant, with a low FP rate. On the real wiki, STiki will now have a TP rate of 80% after ClueBot has picked the low-hanging fruit. I'm not trying to downplay the possibilities for STiki in any way here; I only want to make sure that we capture meaningful stats and fully understand what the stats mean.
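The 90% / 50% / 80% figures above can be checked with a short sketch. It assumes, as the example implicitly does, that every edit ClueBot reverts is one STiki would also have flagged; the class and method names are hypothetical and not part of either tool.

```java
// Hypothetical helper for the arithmetic only; not part of STiki or ClueBot.
class ResidualTpRate {
    /**
     * @param tpRate       fraction of all vandalism the classifier flags (e.g. 0.90)
     * @param removedByBot fraction of all vandalism already reverted upstream (e.g. 0.50),
     *                     assumed to be a subset of what the classifier would flag
     * @return TP rate measured only over the vandalism the bot left behind
     */
    static double residual(double tpRate, double removedByBot) {
        if (removedByBot >= 1.0 || removedByBot > tpRate) {
            throw new IllegalArgumentException("removedByBot must be < 1 and <= tpRate");
        }
        // Caught-and-remaining vandalism divided by all remaining vandalism.
        return (tpRate - removedByBot) / (1.0 - removedByBot);
    }

    public static void main(String[] args) {
        // Of every 100 vandal edits, 90 are flagged and 50 of those are gone already:
        // 40 flagged out of 50 remaining.
        System.out.println(residual(0.90, 0.50));
    }
}
```

If the overlap assumption is relaxed (ClueBot also reverts some edits STiki would have missed), the residual rate comes out higher than this figure, which is one reason to measure it on the live wiki.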
I'm also interested in the modularity of the rules and if you compute an internal confidence that we could use to apply a different threshold for different types of articles. For example, any insert of the word "death" into a BLP is a high risk change, while the risk is lower for non-BLPs. In another case, it may be desirable for an admin to place a particularly problematic article into a category for which a lower threshold is used to improve the TP rate at the expense of more FPs for that one article. In essence, tuning by class of article. Thanks! —UncleDouggie (talk) 09:52, 28 September 2010 (UTC)
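A minimal sketch of the "tuning by class of article" idea, assuming the tool exposes a numeric score per edit. The article classes and threshold values below are invented for illustration; real cut-offs would have to be chosen empirically.

```java
import java.util.Map;

// Hypothetical sketch: classes and threshold values are placeholders, not STiki settings.
class ClassThresholds {
    enum ArticleClass { DEFAULT, BLP, WATCHED }

    // A lower threshold flags more edits: higher TP rate at the cost of more FPs.
    static final Map<ArticleClass, Double> THRESHOLDS = Map.of(
        ArticleClass.DEFAULT, 0.80,
        ArticleClass.BLP, 0.60,
        ArticleClass.WATCHED, 0.40);

    /** Flags an edit for review when its score meets the threshold of its article class. */
    static boolean flagForReview(double score, ArticleClass cls) {
        return score >= THRESHOLDS.get(cls);
    }
}
```

With these placeholder values, an edit scoring 0.7 would be held for review on a BLP but pass through on an ordinary article.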
I believe you are correct in your assessment that Potthast's corpus is just a random selection of committed edits (including those reverted by ClueBot seconds later). Given ClueBot's open-source nature, it should be possible to run it against the corpus to see how much "low-hanging fruit" is in there. Perhaps we could contact the author to this effect? In this manner, we can have a more concrete idea of tuning parameters. West.andrew.g (talk) 14:55, 28 September 2010 (UTC)
AGW, does your badwords list include the cluebot list or Lupin's more extensive list from the Filter Recent Changes option on his Anti-Vandal Tool? I think the latter could boost performance, and in combination with your other metadata could reduce the fairly high false positives from that filter by itself.
Also, I think I asked you before, but are you considering sub-testing individual bad-words? A program like Lupin's or possibly Huggle has seen hundreds of thousands of edits and all of those rollbacks might be recorded; if either had a database which could be mined for correlations or cross-listed against the bad-words list, it could help create a ranking scheme for which badwords are most indicative of vandalism, refining the bad-words filters. Along those lines, Huggle and Lupin's also have pretty extensive user blacklists from prior reverts. Any thought of including a feature based on that?
Overall, it sounds like between your meta-data, the trusted article/user work, some NLP enhancements, and possibly the textual analysis coming from UIowa, a pretty sophisticated and thorough filter could run on the whole project. In combination with Uncle Douggie's ideas about flagging edits for review through pending changes, this could be very interesting. Ocaasi 12:30, 28 September 2010 (UTC)
The "bad word" list I use (and the resulting feature I calculate) is most similar to that employed by ClueBot. However, I was un-aware of the Lupin list until you provided that pointer, and it is impressive in its magnitude and scope. I will investigate adding that new list into STiki. The UIOWA folks also point out [1] and [2] as two resources for this purpose.
At present, I do not test for "individual" bad words. Instead, it just processes the diff and increments a "score" for every regular expression which is matched. I myself have a pretty extensive tagged vandalism collection, based on the detection of the 'rollback' action (collecting vandalism is easy -- it's collecting confirmed *good* edits that is much harder). I'd be willing to export it for anyone willing to investigate which of the bad words are actually getting used the most in vandalism.
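The "increment a score per matched regular expression" feature described above might look roughly like the following. This is a sketch only: the three patterns are placeholders, not STiki's or ClueBot's actual list, and the class name is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: the patterns below are placeholders, not any tool's real list.
class BadWordScore {
    static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("\\bstupid\\b", Pattern.CASE_INSENSITIVE),
        Pattern.compile("\\bpoop\\w*", Pattern.CASE_INSENSITIVE),
        Pattern.compile("(.)\\1{4,}"));   // any character repeated five or more times

    /** Adds one to the score for every pattern that matches somewhere in the added text. */
    static int score(String addedText) {
        int score = 0;
        for (Pattern p : PATTERNS) {
            if (p.matcher(addedText).find()) {
                score++;
            }
        }
        return score;
    }
}
```

Because each pattern contributes at most one point, this scheme cannot tell which individual words are most indicative of vandalism; that would need per-pattern counts checked against a labelled corpus, as suggested earlier in the thread.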
I'm willing to extend STiki with trivial features, but the work of UIOWA is pretty intense and involved. The bulk of their novelty is based on n-gram probabilities calculated over domain-specific corpora pulled from web-queries. The scalability/latency of such a method may not be appropriate for operation at en.wiki scale.
What the project really needs is a meta-classifier built from the APIs of all the anti-vandal tools. ClueBot is calculating a meaningful score for every edit and throwing it away if it doesn't exceed some threshold. The WikiTrust people are doing their thing. I have an API and magic numbers based on metadata. There is a ton of disjoint edit processing going on (some which I am probably not aware of). STiki is slowly evolving into a general-purpose tool (from its initial academic and strict-metadata approach). While I'm fine with that, it would seem so much more logical to have people perfect the components they are best at (for example, I am no NLP expert). The meta-classifier Potthast produces in the competition summary shows the viability and increased performance of such an approach -- although not all those methods have been implemented in a live fashion. West.andrew.g (talk) 02:45, 30 September 2010 (UTC)

After using STiki for many hours, I find myself wondering how it is scoring things. It just gave me an edit that was 30 days old followed by one that was one hour old. Both were vandalism. However, many others have not been. Since my goal is to evaluate the potential of STiki for broader usage, I find myself wanting to know how sure STiki is about the edits it shows me. It would be really nice to show some scoring information in the edit browser. If you are concerned about the information being used to skirt the tool, perhaps it could be an undocumented feature just for research purposes.

Please note that I added several feature requests/bugs to the tool talk page. These are all issues with the client, and while they would be nice to have, they have no real value to my current research on the back-end algorithms. —UncleDouggie (talk) 04:22, 4 October 2010 (UTC)

"I find myself wanting to know how sure STiki is about the edits it shows me" -- For each edit, STiki runs metadata through a pre-computed model and produces a "vandalism score." Low-scores are indicative of innocent edits, and high-scores suggest vandalism -- but they have no absolute interpretation. Of course, for the application you discuss/propose, we would determine a tagging threshold for these scores based on empirical evidence.
In terms of client operation -- these scores are the priority for insertion into a queue. The highest scoring edits are the first to be presented to users (assuming they are still the most recent on the page). Thus, a really old edit can still exist in the queue if it is high scoring and still the most recent. However, many factors affect the perceived performance by a STiki client. First, it depends on how many people are using STiki, or have used it recently. The deeper one gets into the queue, obviously, the less likely it is to see vandalism. When STiki was new (i.e., no one knew about it), I'd be able to use it once a night and have incredible perceived success as everything stacked up and was waiting for me. Now that it is a bit more popular, the "low hanging fruit" doesn't hang around as long. Second, there are competing tools. I'd imagine STiki is scoring a ton of vandalism very highly. However, users of Huggle come along and take care of things, so the high-score edit gets de-queued, and STiki users never get a chance to get it (though it was well-handled on the back-end).
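The queueing behaviour described above (highest score first, entries dropped once they are no longer the latest revision on their page) can be modelled with a priority queue. This is a hypothetical sketch; the class, field, and method names are assumptions and do not reflect STiki's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical model of the client queue; all names here are assumptions.
class EditQueue {
    static final class ScoredEdit {
        final long revId;
        final String page;
        final double score;
        ScoredEdit(long revId, String page, double score) {
            this.revId = revId;
            this.page = page;
            this.score = score;
        }
    }

    // Orders edits so that the highest vandalism score is polled first.
    private final PriorityQueue<ScoredEdit> queue =
        new PriorityQueue<>((a, b) -> Double.compare(b.score, a.score));
    // Most recent revision ID seen for each page.
    private final Map<String, Long> latestRev = new HashMap<>();

    /** Records a newly scored edit as the latest revision of its page. */
    void enqueue(ScoredEdit e) {
        latestRev.put(e.page, e.revId);
        queue.add(e);
    }

    /** Notes that the page was edited or reverted elsewhere (e.g. by a Huggle user). */
    void pageChanged(String page, long newRevId) {
        latestRev.put(page, newRevId);
    }

    /** Returns the highest-scoring edit that is still the latest on its page, or null. */
    ScoredEdit next() {
        ScoredEdit e;
        while ((e = queue.poll()) != null) {
            Long latest = latestRev.get(e.page);
            if (latest != null && latest == e.revId) {
                return e;
            }
            // Otherwise the edit is stale and is silently discarded.
        }
        return null;
    }
}
```

Note that age plays no part in the ordering here, which matches the observation that a 30-day-old edit can appear next to a one-hour-old one.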
I can work on a feature that displays these raw scores to an end user. In the meantime, my API [3] does make them available. But given their relative interpretation, I'm not sure how helpful this would be.
Regarding the client issues/suggestions - I will address those on the STiki talk page. But, thank you for interest and feedback. West.andrew.g (talk) 05:50, 4 October 2010 (UTC)
The API is cool, but rather hard to use since the client doesn't support copy of the revid from the diff browser screen. —UncleDouggie (talk) 11:11, 5 October 2010 (UTC)

Back in August, you promised to return and engage those who have contacted you. I contacted you via this talkpage on August 17th and have not received a reply. What's going on? —Stepheng3 (talk) 18:39, 3 October 2010 (UTC)

Stephen, matters pertaining to the Signpost article are currently being discussed with MediaWiki developers and Foundation staff. I intend to decline comment until everything has been resolved through those channels. Do notice that my edits have been nothing but constructive since the ArbCom resolution. Thanks, West.andrew.g (talk) 06:54, 5 October 2010 (UTC)

Overzealous AIV reporting?

STiki and I automatically reported an IP to AIV after the IP made a vandal edit as their first edit in ten days (previous edits had been vandalism also; IP talk page had the full gamut of warning levels). This was declined at AIV as "insufficient recent activity to warrant a block". This appears to be something along the lines of a false positive for STiki, reporting a user whose older warnings had gone stale for AIV purposes. keɪɑtɪk flʌfi (talk) 18:34, 28 October 2010 (UTC)

Hi there. If my memory serves me correctly, STiki considers the "current month" to be the window of examination. It finds the highest vandalism warning issued in the current month and issues the next most severe. Thus the behavior you encountered was "expected", but is open to interpretation on "correctness." I agree that the case you mentioned seems a little inappropriate for AIV reporting. What interval do you think the AIV folks would consider appropriate? (I also need to allow this interval to span multiple calendar months.) Thanks, West.andrew.g (talk) 22:30, 1 November 2010 (UTC)
Yeah I figured it had some internal guideline it was following and that it had just maybe gotten out of step with AIV's feelings. My rough feeling is about a week as the window - that's the interval after which I tend to restart warnings from level one with an IP when I'm doing them by hand, although I have no particular knowledge of what the AIV admins use as any private guidelines. Thanks for addressing this! keɪɑtɪk flʌfi (talk) 23:04, 1 November 2010 (UTC)
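A rolling window, as opposed to the calendar-month window, might be sketched as follows. The one-week length is only the rough figure suggested above, and the class, the `Warning` type, and the "0 means report to AIV" convention are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical sketch of a rolling warning window; not STiki's actual code.
class WarningEscalation {
    static final int MAX_LEVEL = 4;

    static final class Warning {
        final int level;          // 1..4
        final Instant issuedAt;
        Warning(int level, Instant issuedAt) {
            this.level = level;
            this.issuedAt = issuedAt;
        }
    }

    /**
     * Returns the warning level to issue next (1..4), or 0 to mean "report to AIV"
     * because a level-4 warning already exists inside the window.
     */
    static int nextAction(List<Warning> history, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        int highest = 0;
        for (Warning w : history) {
            // Warnings older than the cutoff are treated as stale and ignored.
            if (!w.issuedAt.isBefore(cutoff)) {
                highest = Math.max(highest, w.level);
            }
        }
        return highest >= MAX_LEVEL ? 0 : highest + 1;
    }
}
```

Because the cutoff is computed from the current time rather than from the first of the month, a window evaluated on 1 November still sees warnings issued in late October.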

Another idea?

I'm sure you're busy with technical academic stuff; I just wanted to throw something in your direction. I've been playing around on some crowdsourcing sites and found a lot of overlap in the reliability/spam issues there as on Wiki. Not surprising, since both are startling new models of open communities with high rates of freeloaders that raise shared costs to the open community.

This guy, Panos Ipeirotis, is a Computer Science researcher at NYU's business school and he specializes in studying these communities and modeling trust research about their users. Here's a blog that gets a little at his scope. Maybe a) you'd have some areas of research in common b) there's a role for meta-data in crowdsource data qualification c) you can actually get paid for your skillz one day.

Obviously, I have some interest in this stuff, but my technical background is not quite sufficient to do more than pass stuff on to specialists. So.... here. Ocaasi (talk) 07:29, 1 November 2010 (UTC)

Thanks Ocaasi, I'll take a look at the pointers you provided.
I have been a little busy lately with "academic stuff" -- but plan to clear a hurdle in mid-November, at which point I plan to dedicate some time to STiki -- implementing some of the feature requests and squashing some bugs I've been promising for weeks/months. Along the same front, I'm in some collaboration with the WikiTrust folks and a team of natural-language processing experts. Together our methods encompass the major vandalism detection strategies, and early results show a classifier built from our combined signals has some impressive performance. Thanks again, West.andrew.g (talk) 22:37, 1 November 2010 (UTC)

STiki on Linux

How should I run STiki on Linux? I downloaded the .jar file, but have no clue what to do with it. This wouldn't require compiling, would it? Any help would be much appreciated. RadManCF open frequency 21:43, 15 July 2010 (UTC)

Hi there RadMan. If you download the "GUI/executable" version of STiki (as opposed to the "source") -- you won't need to compile. Since you have a *.JAR file -- you certainly got the "executable" version. All you need to do is issue the command "java -jar /path_to/STiki_exec...jar" in a terminal (fill in the requisite portions as it applies to your system), and the STiki GUI should display. There are a few details along these lines in the README file in the *.ZIP you downloaded. Thanks, and let me know if you have any other questions. West.andrew.g (talk) 17:41, 17 July 2010 (UTC)

Vandalism reversion mistake

Hi Andrew, 193.130.87.54 made two vandalism edits to the sky article, but you only reverted one of them. I've just cleaned the damage up. You might want to modify STiki to take into account consecutive vandalism edits, or do something to minimise the risk of this problem. Graham87 03:21, 16 September 2010 (UTC)

Thanks Graham (also for the grammar nit on STiki's homepage)! Incorporating "rollback" in place of "revert" is on my TODO list for STiki. It's easy for those who have the rollbacker right, but it's a little more complicated to build "in-software rollback" for those who don't have it. I figure this functionality will take care of most multi-edit vandalism. Either way, it's my own clumsiness that I didn't notice. Thanks, West.andrew.g (talk) 04:56, 16 September 2010 (UTC)
On reverting consecutive edits without rollback, see this thread at the technical village pump from April 2007. I'm not sure if the problem still exists; I subsequently encountered it at demographic transition, but I don't know of any more recent cases. It seems to require a vandal and a user to be editing two sections of an article at exactly the same time. Graham87 14:22, 16 September 2010 (UTC)
Rollback is now implemented in STiki, which should eliminate many issues relating to consecutive vandalisms. See the notes about the 2010/11/28 release. Thanks, West.andrew.g (talk) 03:34, 28 November 2010 (UTC)

STiki

I just came across a case of an editor issuing a level 1 warning to a vandal who had only just received a 4im. I blocked the vandal and then kind of took that editor's head off, only to receive an apologetic message that STiki had issued the warning. Can you please get it to report to AIV after a level 4 or 4im warning to avoid making a mockery of the previous warning? Thanks. HJ Mitchell | Penny for your thoughts? 21:01, 14 November 2010 (UTC)

STiki does report to AIV after detecting a level 4. Can you send me the diffs/edits in question so I can look at the particulars of what happened and see if I might be able to give you a better explanation? Thanks, West.andrew.g (talk) 22:23, 14 November 2010 (UTC)
This is the edit in question. STiki seems to have placed a {{uw-vandal1}} after a {{uw-defam4im}} issued an hour and a half earlier. Thanks for looking into it. HJ Mitchell | Penny for your thoughts? 22:39, 14 November 2010 (UTC)
Ahh, I see. Had the previous warning been a {{uw-vandalism4im}} then STiki *would* have reported at AIV. However, I was unaware of the "defamatory" set of warning templates. However, I'll note that STiki currently scans *only* for vandalism templates. I was unsure if it would be appropriate for STiki to report at AIV in such situations. A situation could exist where someone has a history of some problem (with warnings), which they fix -- but then get caught vandalizing once and end up blocked. Can you think of a rule by which to handle these situations? Along those lines, what other warning templates are there out there that might be leveraged like this? (obviously there are vandalism and spam). West.andrew.g (talk) 23:32, 14 November 2010 (UTC)
I see. That makes some sense. To answer your last question first, the full list of 4ims is here, they cover vandalism, spam (adverts and links), BLP (unsourced controversial statements), libel (deliberate defamation), personal attacks and a few other things. Personally, I would say that it's usually best to report to AIV after a 4im warning. I patrol AIV frequently, and I and the other admins that do recognise the reports that are generated automatically (Huggle, Igloo and STiki reports are very recognisable) and will weed out the reports based on incorrect warnings, but, in general, if they've received a 4im (even for personal attacks etc) and then vandalised, it's worth bringing to the attention of admins. Huggle automatically notes in its AIV reports that the last warning was a 4im, so that might be something to consider if it's not too complicated. That would prompt admins to investigate the warning and its preceding edit more thoroughly. HJ Mitchell | Penny for your thoughts? 00:07, 15 November 2010 (UTC)
Hi everyone. Per STiki's 2010/11/28 release, a solution has now been implemented. A user caught vandalizing by STiki who last received *any* 4im will be reported to AIV. The post to AIV will indicate the nature of this request. Thanks, West.andrew.g (talk) 03:42, 28 November 2010 (UTC)
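The rule described above, "report when the last warning received was any 4im", could be expressed along these lines. The sketch assumes warnings are available as a list of template names such as `uw-vandalism4im` or `uw-defam4im`; how the real tool parses talk pages is not stated here, and the class name is hypothetical.

```java
import java.util.List;

// Hypothetical sketch; template detection in the real tool may differ.
class FourImRule {
    /** True when the most recent warning template is any "4im" variant, e.g. uw-defam4im. */
    static boolean shouldReportToAiv(List<String> templatesOldestFirst) {
        if (templatesOldestFirst.isEmpty()) {
            return false;
        }
        String last = templatesOldestFirst.get(templatesOldestFirst.size() - 1);
        return last.startsWith("uw-") && last.endsWith("4im");
    }
}
```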

Anti-Vandal Bot Census

Thought this might be of interest to you. It's a nice roundup of all the different Bots doing anti-vandal work. Not sure where he's going with it, but it seems right in your area of expertise. Link: http://en.wikipedia.org/wiki/User:Emijrp/Anti-vandalism_bot_census. Ocaasi (talk) 10:45, 7 December 2010 (UTC)

Thanks again for a pointer, Ocaasi. I am unsure where this project is going, either, but I heard about it on Wiki-Research-L mailing list and contributed some links/papers. Since you seem to take an interest in things of this nature, I thought I'd point out Cluebot-NG's review system. They seem to be amassing a labelled corpus for anti-vandalism purposes. How this differs from (1) my own labelling efforts, and (2) the PAN corpus of Potthast -- isn't immediately apparent. Thanks, West.andrew.g (talk) 22:26, 7 December 2010 (UTC)
Ok, makes sense that you had been there since it had some of your stuff on it. I signed up for Cluebot's review system. Is that different from your actual STiki data... isn't STiki just a constant review system, or is there a difference between a reviewed corpus and the thousands of actual diffs? I did notice the STiki 'down days' and couldn't figure out if it had just gotten so popular that all of the vandalism was caught. The new version otherwise looks like it's working well. Cheers, Ocaasi (talk) 11:26, 8 December 2010 (UTC)
STiki is a constant and live review system that produces corpus labels -- and then retrains over them to produce improved models. I don't think my efforts are terribly different from what the ClueBot-NG folks are doing. One possible difference, from what I could glean, is that ClueBot-NG has multiple individuals classify the *same* edit in order to make sure poor labellers are not screwing up the corpus. STiki already has some 130,000 human labels, and probably another 200,000 vandalisms tagged by detecting "rollback" use. I'm going to contact the ClueBot folks and see if some cooperation might be advantageous. Thanks, West.andrew.g (talk) 17:02, 8 December 2010 (UTC)
That would be great. I have the feeling that between everyone already working on vandalism the solutions are just waiting to be put together. Would you learn anything from running all of their edits through STiki, or vice/versa? Ocaasi (talk) 17:19, 8 December 2010 (UTC)
First, see my posting over at Cluebot's talk page. Second, multiple solutions *have* been put together in an academic sense. I have cooperated with some of the top performers from the PAN competition (an NLP technique, and WikiTrust) -- and a paper is in submission about the effectiveness of the techniques in combination (I'll give you a hint -- it works really well). We are now in an effort to glue some APIs together and get the technique working in a live fashion for Wikipedia. The STiki GUI will still be the front-end, and I'll incorporate the other authors' techniques into the scoring engine. In fact, the WikiTrust portion should be integrated by sometime next week. Thanks, West.andrew.g (talk) 17:34, 8 December 2010 (UTC)
Awesome. I'm curious how it comes off. Have you considered throwing that RegEx badwords list into the mix? That would be meta-data, badwords, wikitrust... all together. I don't know if there's anything else, even. It's great that you've been able to link up your academic work with these practical solutions. Not all academics find themselves so useful. Was that part of the plan? Ocaasi (talk) 17:55, 8 December 2010 (UTC)
The PAN competition winner (Santiago Mola) who I am collaborating with includes a rather extensive list of "bad" words, as well as other lists that aren't necessarily "bad" but are highly indicative of bias, etc. Thanks, West.andrew.g (talk) 19:29, 8 December 2010 (UTC)

(outdent) Gotcha. One last idea and one question. Since WP:Pending changes has been running, about 1000 articles have seen 3-4 months of daily edit-by-edit monitoring. Only obvious vandalism is supposed to be reverted. I don't know if you need any more diffs, but maybe there's a way to collect those, or to see how editor judgment compares to STiki's guesses. Last, can you explain the difference (if there is one) between an alternating decision tree method and a neural network? Or point me to a link/article that does? I'm trying to piece together what's going on inside these things... Cheers. Ocaasi (talk) 12:26, 9 December 2010 (UTC)

The pending changes reviews might be interesting -- but I am pretty comfortable with the number of labelled diffs I have at the current point in time. More diffs aren't going to make dramatic improvements to STiki's performance -- which is why I am looking towards different feature methodologies (WikiTrust, NLP). As far as ADTrees vs. Neural Networks -- I am probably not the best person to ask. Of course, the articles themselves would probably be a good starting point. Thanks, West.andrew.g (talk) 20:56, 9 December 2010 (UTC)
Thanks... Wiki's machine learning coverage is still a bit rough, but I'll give them another look. I was more curious if there was a great overview in the literature or an AI introductory text that you had found particularly useful. For an interested non-coder who stopped at single-variable calculus, perhaps? Either way, looking forward to the STiki updates. Ocaasi (talk) 11:47, 10 December 2010 (UTC)

Regarding ClueBot NG

In regards to ClueBot-NG using your dataset of human-classified edits for training ClueBot-NG:

Considering that your dataset consists of edits that have passed your mechanism's filter, even if they are classified correctly, the set is sufficiently nonrandom and biased as to cause the bot significant problems. I don't think it would be very useful to us at all.

This is fine, but if you'd ever like these RIDs/labels, you are more than welcome to contact me. West.andrew.g (talk) 17:56, 16 December 2010 (UTC)

In regards to you using our dataset to train your algorithm:

We'd be happy to allow you access to our dataset. It's stored in a public MySQL database, indexed by (among other things) source of the edit. Since our dataset is a conglomeration of edits from multiple sources, you may wish to use only some of the sources available. Let me know if you'd like to follow up on this.

Not at the time being, but if the number of reviews grows significantly large, this is something I might be interested in. West.andrew.g (talk) 17:56, 16 December 2010 (UTC)

In regards to you using ClueBot-NG's scores for the STiki tool:

It should be possible for us to set up a live feed of all ClueBot-NG scores and actions, not just the ones above threshold. If you think this would be useful to you, let us know, and we can discuss a format for such a live feed. On IRC, you expressed interest in using the ClueBot-NG score as one input to your learning algorithm, and combining it with inputs from other algorithms - I'd caution you against this, because when combining multiple black-box outputs from different algorithms, there are really no statistical trends between the outputs that would help a machine-learning algorithm. It would end up just finding the most accurate one (with some error), and using that. I should also add that CBNG takes into account the "meta-data" that STiki does, as well as many statistics on content, and data from NLP. All are combined using a neural network to find optimal relationships between them. If you're still interested in having access to a complete feed of CBNG scores, let us know.

This is perhaps the point I am most interested in. Instead of combining black-box outputs, I could provide a separate edit queue entirely for Cluebot-NG non-reverts. Then GUI users could choose to pull from either the (a) STiki queue, or (b) Cluebot-NG queue. This would seem to be something your users might appreciate as well, especially given that your tolerance for false-positives is so low. Let's discuss making this happen? West.andrew.g (talk) 17:56, 16 December 2010 (UTC)

In regards to proper dataset generation:

Both of our tools would benefit from a proper, random dataset. We are trying to accomplish this with our review interface, which is focused on making sure edits are completely random, and edit classifications are completely correct. While this method will generate a very good dataset, it is quite slow. But I have an idea that may be able to generate a reasonable dataset much faster.

If the STiki userbase is large enough, and users patrolling using STiki are accurate enough, then human classified edits from STiki could be used, as long as they were sufficiently random. It's not practical (and would largely negate the purpose of STiki) to simply remove the edit pre-filter. But perhaps it could be disabled for a fraction of the edits. Say, one out of every four edits presented to the user would be random and not subject to the pre-filter. These 1/4 of edits would be marked as random, and could then contribute to a random dataset. Is this possible with your current architecture, and do you think STiki has a sufficiently large userbase to make this worthwhile? Crispy1989 (talk) 06:09, 16 December 2010 (UTC)

This is something I'll give consideration. STiki was officially rolled out in June and has since seen the manual inspection of 130,000 edits. While the user-base isn't necessarily huge -- we would have been able to collect some 30,000 annotations had we done this from the start. The technical details aren't too complex -- but I would like it if users could opt-in/opt-out of the random edits if they just wanted to use STiki in its pure form. West.andrew.g (talk) 17:56, 16 December 2010 (UTC)
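The sampling proposal and the opt-in condition discussed above could be combined roughly as follows. This is a sketch under assumptions: the class name, the constructor parameters, and the choice of a probabilistic draw (rather than a strict every-fourth-edit counter) are all illustrative.

```java
import java.util.Random;

// Hypothetical sketch of mixing unfiltered edits into a user's queue; names are assumptions.
class RandomSampleMixer {
    private final Random rng;
    private final double randomFraction;   // e.g. 0.25 for "one out of every four"
    private final boolean optedIn;         // users who want the pure queue set this to false

    RandomSampleMixer(Random rng, double randomFraction, boolean optedIn) {
        this.rng = rng;
        this.randomFraction = randomFraction;
        this.optedIn = optedIn;
    }

    /** True when the next edit shown should bypass the pre-filter and be marked random. */
    boolean nextIsRandom() {
        return optedIn && rng.nextDouble() < randomFraction;
    }
}
```

At a fraction of 0.25, roughly a quarter of the 130,000 inspections mentioned above would have been random samples, which is in line with the "some 30,000 annotations" estimate.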
Really interesting to try and follow this, and glad there's possibility for collaboration. Perhaps there'd be benefit in reviewing only the Cluebot NG edits that are below threshold; the ones Cluebot would revert if higher false positives were tolerable--not the ones which are already covered by the bot. What would be the point of having a live review of edits that Cluebot has already dealt with? Ocaasi (talk) 19:44, 16 December 2010 (UTC)
Ocaasi, that is precisely the set I am interested in. At current, the only feed the ClueBot-NG folks have is one that says what their bot does. We are discussing setting up the inverse of that feed for STiki's purposes. Thanks, West.andrew.g (talk) 20:25, 16 December 2010 (UTC)

So it sounds like this could work. I've looked at STiki, and a few things strike me. First of all, it's a pretty nice GUI. The lack of a navigable browser is a small deficiency, but certainly not a deal-breaker. Although Java is not my language of choice (I try to squeeze every last bit of efficiency out of a problem), it is significantly better than existing alternatives such as Huggle (Huggle is Windows-specific, closed source, etc). Essentially, the interface is near ideal for live vandalism review and reverts.

I don't fully understand how each STiki client interacts with the central server (I can't find any such documentation), but I assume that the server is what scores the edits, and controls prioritization on the clients. I'm also guessing that this prioritization involves both "age" of the edit and score from the learning algorithms.

Looking at your algorithms for scoring edits, they don't look very effective. They do look significantly more effective than basic heuristic filters that existing alternatives use, but plugging it into a more advanced core engine would have a marked effect. ClueBot-NG, using optimal settings, uses both the statistics you use, and NLP, and as such, is highly accurate (using optimal settings for total accuracy, it's even more accurate than the PAN winner).

Combining your GUI (and revert/warn/etc logic) and our core engine could result in an extremely effective live anti-vandalism tool with the potential to quickly eclipse alternatives. I have a few ideas of how to effectively combine them. Some of these, you may have already implemented, but I'll mention them anyway.

One major advantage STiki could have over, say, Huggle, is that STiki pulls its TODO from a central repository, presumably removing duplicate efforts, as I'm sure is common with Huggle. One of the key concerns is how to prioritize edits handed to the clients based on both score and timestamp. I propose an approach for this involving a sliding window over the time dimension. Ie, hold the last 60 seconds (or so) worth of edits in current consideration. Within this sliding window, prioritize by score. If the entire current sliding window is exhausted by users, then re-review edits that were not originally classified as vandalism within the sliding window. An additional useful feature would be to monitor the RC feed for reverts, and if an edit is reverted by another user while it's in the sliding window, it should be removed from the sliding window. This could even be extended to send a signal to a client currently reviewing an edit, if the edit is reverted by another user while it's being reviewed by STiki.
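A minimal sketch of the sliding-window idea described above, written in Python for brevity rather than STiki's Java. The class, method, and field names are hypothetical and not taken from STiki or ClueBot-NG; the 60-second window is the figure suggested in the text.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    rev_id: int
    score: float      # classifier score; higher means more likely vandalism
    timestamp: float  # seconds since the epoch


class SlidingWindowQueue:
    """Hypothetical server-side queue: newest edits only, highest score first."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.pending = {}  # rev_id -> Edit, not yet shown to any reviewer
        self.passed = {}   # rev_id -> Edit, reviewed once and not called vandalism

    def add(self, edit):
        self.pending[edit.rev_id] = edit

    def remove_reverted(self, rev_id):
        # Called when the RC feed shows someone else already reverted the edit.
        self.pending.pop(rev_id, None)
        self.passed.pop(rev_id, None)

    def mark_not_vandalism(self, edit):
        # Keep the edit around so it can be re-reviewed if the window runs dry.
        self.passed[edit.rev_id] = edit

    def _expire(self, now):
        cutoff = now - self.window_seconds
        for pool in (self.pending, self.passed):
            for rev_id in [r for r, e in pool.items() if e.timestamp < cutoff]:
                del pool[rev_id]

    def next_edit(self, now):
        """Return the highest-scored edit in the window, or None if it is empty."""
        self._expire(now)
        # Unreviewed edits come first; fall back to a second look at passed ones.
        pool = self.pending or self.passed
        if not pool:
            return None
        best = max(pool.values(), key=lambda e: e.score)
        del pool[best.rev_id]
        return best
```

In this sketch an edit that was passed once gets at most one second look; whether repeat reviews should be capped that way is a design choice the discussion leaves open.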

Another aspect that I previously mentioned is the ability to insert random edits, not subject to prioritization, into the queue to be reviewed by clients. You expressed an interest in implementing this on an opt-in basis with the clients. Potentially you could allow the client to set a "Percent Random Edits" that it receives. A good default could be 30%. "Random" edits would simply be picked from the sliding window at random without considering prioritization, and would be recorded as random, so they could be separated out for dataset use.
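The "Percent Random Edits" setting could be sketched roughly as follows. This is an illustration only: `pick_next` and its arguments are hypothetical names, and the window is simplified to a list of `(rev_id, score)` pairs.

```python
import random


def pick_next(window, random_fraction=0.30, rng=random):
    """Pick the next edit from the window.

    window: list of (rev_id, score) tuples; the chosen entry is removed.
    Returns (rev_id, is_random), or None if the window is empty.
    """
    if not window:
        return None
    if rng.random() < random_fraction:
        # Ignore prioritization entirely so the sample stays unbiased.
        choice = rng.choice(window)
        is_random = True
    else:
        # Normal path: highest score first.
        choice = max(window, key=lambda entry: entry[1])
        is_random = False
    window.remove(choice)
    return choice[0], is_random
```

Only edits returned with `is_random` set would be kept for the random dataset; the prioritized ones are biased toward high scores by construction.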

If you would like to use this method (which involves exclusively using CBNG's scores for prioritizing the sliding window), we can set up a full CBNG IRC feed for you, and that would be all you have to connect to. The feed would list edit IDs along with score and action taken by CBNG. CBNG typically operates at a set threshold, but sometimes, edits above threshold aren't reverted due to post-filters, the most significant of which is 1RR. Approximately a third of edits CBNG would have reverted are not reverted due to this rule - and the vast majority of them are indeed vandalism. STiki could present these along with below-threshold edits.

Let me know if you'd like to pursue this, so we can discuss specifics. Crispy1989 (talk) 22:05, 16 December 2010 (UTC)

There are more points raised here than I want to take a crack at in-line. But to summarize, I'd be interested in starting with a feed of RIDS/scores for precisely those edits which CBNG does *not* revert. I'd read them off of IRC and store them in a database-based queue which is identical to the style used by STiki's back-end. Then, I'll add a menu that allows clients to select if they want the "CBNG queue" or the "STiki queue." This seems like a reasonable starting point before we delve into some of the other functionality you discuss above. Thoughts? Thanks, West.andrew.g (talk) 23:38, 16 December 2010 (UTC)
I'd be interested in knowing how your system works right now. We're working on such a feed, but I think we could collaborate in more ways than just sharing the feed. There is indeed a lot to discuss, and it's more easily done on IRC than a forum-style talk page. I'll be there myself for the next several hours, if you're available. We could discuss everything in detail. Crispy1989 (talk) 23:44, 16 December 2010 (UTC)

Secure Web-Service Authentication Suggestion

I've been thinking about our discussion on IRC regarding converting STiki's direct db access to a web service, and ways to expand it. It seems to me that a key issue in the long run will be to authenticate users to the STiki server. Even with ClueBot NG's review interface, where we manually authorize accounts, we've already had several instances of vandals trying to gain access and corrupt the system with false results.

User authentication may be useful for more than just verifying classifications as well. If STiki becomes the predominant review software, issues involving vandals intentionally dequeueing edits to avoid review could arise. This could be mostly avoided by rate-limiting unauthenticated users (or users below a requisite edit count) on the server - but to do this effectively, the server needs a way to verify the client's identity.

This is an interesting problem, because there are no apparent good solutions. The obvious is to have the client send the user's username and password to the server, and have the server try to authenticate against it - but I strongly recommend against this. Transmitting user credentials to an external server opens up a whole can of worms, and requires substantial security lockdown on the server. We explicitly decided against this method for the CBNG review interface.

One solution I can think of:

  1. Client requests authentication from server (via web service).
  2. Server generates secure random number (private identifier for the currently unauthenticated client), hereafter referred to as Session ID.
  3. Server generates a cryptographic hash (eg. SHA) of the Session ID, timestamp, and a secret key.
  4. Server sends hash, raw Session ID, and timestamp to the client.
  5. Client edits a Wikipedia page (perhaps a subpage of their own userpage, or maybe a page in STiki space) and adds the hash but NOT Session ID.
  6. Server watches RC feed, sees the edit, and records it in a database table containing username, hash, and edit timestamp. Duplicate usernames should be overwritten.
  7. Client contacts server via web service and gives the server the Session ID and the original timestamp that was sent to the client.
  8. Server regenerates the hash from the given Session ID, the given timestamp, and its secret key, and does a database lookup to see if that hash has been placed on a page on Wikipedia.
  9. If the hash is not found in the database, or the timestamp is too old (more than a few minutes), return an error. Otherwise, if it was found in the database, the client is authenticated as the user in the database, and the database entry containing the hash and username should be removed.
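The server side of the steps above might look roughly like this in Python. It is a sketch under assumptions: HMAC-SHA256 is swapped in for the plain "hash of Session ID, timestamp, and secret key", the database table is replaced by an in-memory dict, and every name (`issue_challenge`, `record_wiki_edit`, `authenticate`, `SECRET_KEY`) is hypothetical. The five-minute limit stands in for "more than a few minutes".

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = b"replace-with-a-real-server-secret"  # hypothetical placeholder
MAX_AGE_SECONDS = 300  # assumed value for "a few minutes"

# Stand-in for the database table of step 6: posted hash -> username.
posted_hashes = {}


def _challenge_hash(session_id, timestamp):
    message = f"{session_id}|{timestamp}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(now=None):
    """Steps 2-4: create a Session ID and the hash the client must post."""
    timestamp = int(time.time() if now is None else now)
    session_id = secrets.token_hex(16)
    return session_id, timestamp, _challenge_hash(session_id, timestamp)


def record_wiki_edit(username, posted_hash):
    """Step 6: called when the RC feed shows `username` posting a hash."""
    for old in [h for h, u in posted_hashes.items() if u == username]:
        del posted_hashes[old]  # duplicate usernames are overwritten
    posted_hashes[posted_hash] = username


def authenticate(session_id, timestamp, now=None):
    """Steps 7-9: return the verified username, or None on failure."""
    now = time.time() if now is None else now
    if now - timestamp > MAX_AGE_SECONDS:
        return None
    # pop() also removes the entry, so a challenge cannot be reused.
    return posted_hashes.pop(_challenge_hash(session_id, timestamp), None)
```

The point of the design is that the wiki page only ever shows the hash; the Session ID needed to redeem it stays between the client and the server.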

After the user is authenticated, you need some way to preserve the authentication across multiple requests to the web services:

  1. Using a second secret key (separate from the one used for the Session ID hash), generate a hash of the user's username, a timestamp, and that key. Append to this hash the raw username and timestamp. This set of information (hash, user, timestamp) is hereafter referred to as Authentication Token.
  2. The client supplies the authentication token to every future request to the web services. To verify the token, regenerate the hash using the username and timestamp supplied in the token. If the hash matches the hash given in the token, then the token is verified. Also make sure the timestamp is new enough (a token should only be valid for a day or two).
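A sketch of the Authentication Token in Python, again using HMAC-SHA256 as an assumed stand-in for "a hash of the username, a timestamp and a secret key". `TOKEN_KEY`, `make_token`, and `verify_token` are hypothetical names, and the two-day lifetime is taken from "a day or two".

```python
import hashlib
import hmac
import time

TOKEN_KEY = b"replace-with-a-second-server-secret"  # hypothetical placeholder
TOKEN_LIFETIME = 2 * 24 * 3600  # assumed: two days


def _token_mac(username, timestamp):
    message = f"{username}|{timestamp}".encode()
    return hmac.new(TOKEN_KEY, message, hashlib.sha256).hexdigest()


def make_token(username, now=None):
    """Return (hash, username, timestamp) for an already authenticated user."""
    timestamp = int(time.time() if now is None else now)
    return _token_mac(username, timestamp), username, timestamp


def verify_token(token, now=None):
    """Return the username if the token is genuine and fresh, else None."""
    mac, username, timestamp = token
    if not isinstance(timestamp, int):
        return None
    now = time.time() if now is None else now
    # Constant-time comparison avoids leaking how much of the hash matched.
    if not hmac.compare_digest(mac, _token_mac(username, timestamp)):
        return None
    if now - timestamp > TOKEN_LIFETIME:
        return None
    return username
```

Because the server can recompute the hash from the token's own contents, it needs no per-session storage to verify later requests.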

This method is not only completely secure, but also does not require handling users' Wikipedia credentials. It is also simple to implement, if you do so in PHP (as you suggested). PHP has built-in functions for hash generation and HTTP requests.

Let me know how this looks to you. Crispy1989 (talk) 02:24, 18 December 2010 (UTC)

STiki Improvements

Per IRC, here are the improvements that can be made to STiki:

  1. "Post to AIV regarding frequent vandal found using STiki" Stating the obvious there, considering that's the edit summary on the page for that. Suggestion was to change to "Reporting [[User_talk:#u#|#u#]] for repeated vandalism." and replace User_talk: with Special:Contributions/ for an IP.
    1. Optional link on the word vandalism, to the last edit made that was reverted by STiki.
  2. All reports use {{Vandal}}; IPs should use {{IPvandal}}.
  3. The report states "Vandalized after recently receiving last vandalism warning ~~~~" While true, it doesn't give much for a blocking admin to go on. Certainly link the most recent vandalism, and perhaps try to scrape some other diffs from the vandals' warn templates.
  4. I change the edit summary, but when I reopen, I need to type it again. Can you make it save it somewhere, please?

I have some more, but am a bit short on time. Talk on IRC. 930913 (Congratulate/Complaints) 06:01, 23 December 2010 (UTC)

Items (1) and (3) have been completed. Item (2) was already implemented, but a mishandled boolean kept it from working. Item (4) will come sometime in the near future. I plan to develop a full-fledged config.ini file to permit persistent user settings. I'll push these changes at the next release, which should include the CBNG edit queue. Look for it in the next few days. Thanks, West.andrew.g (talk) 17:41, 24 December 2010 (UTC)

Dec. 26 -- STiki going down for maintenance

Happy holidays, STiki users. STiki will be taken off-line for several hours today for maintenance -- I am aware of this down time. When it comes back online, old versions of the client will not work -- but there will be a new version available for download.

There is at least one big change in the new version which should make the trouble worth it -- MULTIPLE "revision queues." Whereas STiki's metadata-driven scoring system was the *only* one used in previous versions, "Cluebot" and "WikiTrust" now contribute their scores. I'll post the full CHANGELOG here when everything is online and operating. Thanks, West.andrew.g (talk) 15:52, 26 December 2010 (UTC)

New version available for download and servers are back up. Let the bug reports start flowing (*sigh*). I'll change the main STiki page here shortly and post the CHANGELOG to the talk page. In the meantime, the documentation internal to STiki has been updated, or really, just look at the "Queues" menu. Thanks, West.andrew.g (talk) 19:46, 26 December 2010 (UTC)
For those interested, the CHANGELOG has been posted over at STiki's talk page. Thanks, West.andrew.g (talk) 03:41, 29 December 2010 (UTC)

IRC chat

Hey, I was on the cluebot irc channel and a few questions came up that I wanted to pass along. Crispy and Cobi didn't know that the new STiki version with the CBNG feed had come out, so I told them about it. I couldn't say much, except that it looks good. They weren't aware you were even pulling from the IRC spam feed to get the CBNG queue, since the IRC room didn't have any listeners. Crispy recommended that if you are pulling from the feed only periodically, that it'd be better to listen constantly and truncate the lowest scores if there's a database issue. They also said they're not going to incorporate the STiki data into the ANN until the STiki server security issue is worked out...? I don't know the details, but I believe Crispy posted about them a few sections up this page. It's interesting to see this come together, and I'm curious how the different feeds will compare in terms of their vandalism rates, etc. Let me know if there's any marginal, non-technical work I can do to help (so probably nothing). Anyway, thanks for the new version, and happy holidays/merry christmas. Ocaasi (talk) 04:26, 27 December 2010 (UTC)

Hi there, I'm a bit busy so I'll respond to this in brief. Yes, the new version is out. I was trying to roll out a bit quietly for now, because I'd like to enjoy some holiday time before getting too many bug reports, haha. STiki does intend to listen to CBNG all the time. However, when I restarted everything earlier today, I noticed the CBNG feed was not up (-Damian- said it was due to a power failure). If everything is back up over there, I'll reattach my listener. Thanks, West.andrew.g (talk) 04:32, 27 December 2010 (UTC)
Sorry to blow your holiday cover. All things in due course. Cheers, Ocaasi (talk) 04:44, 27 December 2010 (UTC)

I see you're very good on the computer

I see you're very good on the computer. I was wondering if you could look at my messages here http://en.wikipedia.org/wiki/User_talk:TucsonDavid#Archiving_talk_pages and fix the archive box problem, as I don't know how to. I only need one good box, but I want the bot to auto-archive. If you can help, that would be awesome. HAPPY YEAR. TucsonDavid GOD BLESS THE U.S.A. 05:17, 1 January 2011 (UTC)

Someone took care of it... User_talk:TucsonDavid#Archiving_talk_pages. Ocaasi (talk) 06:00, 1 January 2011 (UTC)
Thanks. TucsonDavid GOD BLESS THE U.S.A. 10:23, 1 January 2011 (UTC)

Customizing edit summary in STiki?

Hi. I tried customizing the edit summary message (to say something a bit less strong than "test/vandalism"). But although the edit summary in the article revert was worded per my customization, the warning posted to the user's talk page still said "test/vandalism". I guess I'll restrict my use of STiki to those cases that are so absolutely "cut and dried" that no sane person could possibly think it was anything other than intentional, malicious vandalism. Richwales (talk · contribs) 06:40, 4 January 2011 (UTC)

Hi there. STiki's future plans include the addition of a fourth "good faith revert" button so that STiki users can clean up obviously unproductive things that don't meet the bar of "malicious intent." Of course if you *really* want, you could always open the source and re-compile with a new default comment of your choosing. Seems like a lot of work for something that shouldn't be too controversial, though. Thanks for your use of my tool, West.andrew.g (talk) 15:52, 4 January 2011 (UTC)
"Seems like a lot of work for something that shouldn't be too controversial, though." Actually, I've run into more than a few people who consider it a serious breach of WP:AGF to call anything "vandalism" that could possibly have been the product of good-faith cluelessness. I've come close to resolving never to use the terms "vandalism" or "rvv" again, ever — except that doing that would raise the ire of people who think I can't identify true vandalism.
This also makes me hesitant to use a tool that explicitly labels as "good faith" editing which very possibly was nothing of the sort. Perhaps a more neutral term, such as "unconstructive", would be a better choice in the edit summaries. That is, "identified as unconstructive" in the article revert's edit summary, and "user warning for unconstructive edit" in the user talk page edit summary. Richwales (talk · contribs) 17:57, 4 January 2011 (UTC)

STiki userbox

I couldn't find a userbox for STiki, so I made one (User:Richwales/Userboxes/STiki). Richwales (talk · contribs) 06:46, 4 January 2011 (UTC)

nice! Ocaasi (talk) 09:22, 4 January 2011 (UTC)
Cool deal! I also found User:Usb10/Userboxes/STiki floating out there. I'll add these both to STiki's main page shortly. Thanks, West.andrew.g (talk) 15:57, 4 January 2011 (UTC)

Talkback

Hello, West.andrew.g. You have new messages at Usb10's talk page.
Message added 02:12, 8 January 2011 (UTC). You can remove this notice at any time by removing the {{Talkback}} or {{Tb}} template.

Usb10 plug me in 02:12, 8 January 2011 (UTC)

Upgrades

Hi, I haven't seen you on IRC for a while.

I have a few days of almost-free time, and I'd like to use them to help you convert STiki into a proper web service architecture instead of its current state, so it's ready for mass use.

I've already written an authentication system for you. We can discuss the specifics on IRC. Since you're unfamiliar with PHP, Cobi and I can write most or all of it.

Converting it into this architecture will greatly enhance security and reliability, and minimize the chance that any malicious person could have an effect on results. The design I have in mind will also drastically increase flexibility with edit queue sources, hopefully without too much effort.

The authentication system is already working, and will allow the server to gauge the reliability of the user making the classifications without actually needing the user's credentials.

After implementing these improvements, it's my belief that STiki will be ready for mass use, and we (the ClueBot NG team) will be able to use it as dataset supplement, and we can officially refer users to STiki as their primary tool.

Please join us on IRC so we can discuss it. Crispy1989 (talk) 11:42, 14 January 2011 (UTC)

If you can't find time soon to join us on IRC, please provide me with a copy of your database schema, preferably with comments explaining what each table is for, and possibly non-obvious columns. I'd like to start writing the server-side PHP code for the edit pulling, and it will need to interface with the database. Crispy1989 (talk) 15:14, 14 January 2011 (UTC)

If you download the full source, then the [*server*] directory (something matching that regexp) contains a number of *.sql files. These are the stored procedures and the only calls made to the server. Shouldn't this be all that needs converted over to PHP? Thanks, West.andrew.g (talk) 15:19, 14 January 2011 (UTC)
Alternatively, couldn't we just make the PHP code call into the existing stored procedures? Heck, even I could probably code that up in PHP -- just straightforward parameter passing. Thanks, West.andrew.g (talk) 16:17, 14 January 2011 (UTC)
Um, you guys freakin rock. Glad you're both working on this! Ocaasi (talk) 21:36, 14 January 2011 (UTC)
One option would possibly be to just make the PHP code call the stored procedures, but this would only cause limited improvement. See section below and I'll detail my ideas.

Improvement ideas

The Problems

There are four areas I can think of that could use improvement:

  1. System architecture
  2. Server security
  3. Classification reliability
  4. Flexibility

I'll address each of these.

System architecture and flexibility kind-of go together. Currently, your use of stored procedures for everything server-side can work to some degree, but it has a number of problems. It's considered very bad practice to allow client applications to connect directly to a database.

In addition to the problems with flexibility and extensibility, you're also entirely removing abstraction from the equation. If you wanted to make a change and add an argument to a stored procedure, or perhaps switch over to a different SQL implementation, it would require universal client updates, or things would break. Also, direct connections to a database are very frequently blocked by firewalls, substantially limiting potential userbase. Using a normal web service, you could even make it into an applet that people could use from school or work in their free time. Another issue with the direct database connection is that, although not impossible, it's difficult to fully account for all possible security holes - particularly if someone decides to mount a denial-of-service-style attack by overloading the database with queries.

There are two aspects to server security. One, mentioned above, is the security of the database itself. This can be solved by the architecture changes. The other aspect is making sure people don't "game" the system. I.e., it would be very easy for someone to artificially drain the queue of all high-scored entries, allowing a lot of vandalism to get through. While STiki has a comparatively small userbase, this isn't a huge problem (though it is still a problem), but if STiki became the predominant tool, it could cause issues. So, there needs to be some method of preventing malicious users from just draining the queue.

Classification reliability refers to the usability of classifications as reliable machine learning training data. Training sets must be accurate. Right now, both novices making honest mistakes and malicious people could pollute the dataset. Such malicious people with this intent do exist - we get them all the time for the CBNG review interface. So there needs to be some way of gauging classification reliability and throwing out (not using in the dataset) classifications which are not sufficiently reliable.

There are a number of things that could be improved with flexibility, but one of the main ones in my opinion is that it needs to be easier to add edit queue sources. Right now they're just hardcoded as database connections. Another possible improvement (I'm not sure if you've done this already) is to have a mechanism to remove an edit from a client's cached queue if the edit is reverted or updated while it's queued.

My Suggested Solutions

As far as system architecture goes, it's necessary for any of these other improvements to change it from a direct database connection to a more standard web service architecture (or similar). Nothing should need a direct database connection.

For both server security and classification reliability, there must be some mechanism to gauge how trustworthy the user is. This is a relatively minor aspect to server security, but is extremely important for classification reliability. There are a number of ways to effectively gauge user trustworthiness (WikiTrust, edit count vs warning count, etc), but all of these hinge on making sure the user is who they say they are. Currently, the Wikipedia credentials are only used on the client to log into Wikipedia, and the server has no way of verifying the user's Wikipedia username (ie, authentication). The easy solution is to just send the user's username and password to the server, but this could potentially cause a ton of problems relating to transmitting user credentials to any non-wikimedia server.

I have devised (and already implemented) a system that allows the client to authenticate itself to the server without transmitting the credentials. I call it WAT or WATS, for Wikipedia Authentication Token System. You can read about the steps for a client to implement it here. The main source code and verification source code are also posted. The end result is that the client gets an opaque token that it can use to authenticate to any non-wikimedia server without transmitting credentials. The server performing the authentication doesn't even need a secret key.

Once you have a reliable method of authenticating users, the following improvements are possible:

  1. Rate limiting queue removals and/or classifications per-user to avoid malicious emptying of the queue.
  2. Gauging user trustworthiness for classification reliability.

One of the largest architectural changes I have in mind to improve flexibility is to abstract edit queue sources to the point that it's trivial to add a new one - without even updating the STiki distribution. The idea is that each edit source is associated with the URL of a web service that can be called to pull edits from a queue or submit classifications. A predefined list of common ones could be distributed with STiki, but users would have the option of adding their own as well. Think about what this could mean:

  • Different queue sources could be hosted on different servers to prevent a single point of failure.
  • Different people could maintain their own queue sources easily. Ie, you could maintain the WikiTrust and your machine learning queues, and we could maintain the CBNG queue, updating it ourselves as necessary to comply with changes and improvements.
  • Non-database-driven queues could be added. For example, a simple queue that just picks a random edit out of the recent changes list could be extremely useful (even critical) to generating a random dataset.
  • Plugins to external systems could be added. For example, the CBNG team could write a queue source that uses its review interface queue as a backend, for non-live classifications.

I'm sure there are more possibilities I'm not thinking of.

I've already drawn up a specification for how such a web service could work. It's not final, but I believe it could work quite well. The first thing to implement would be a compatibility layer with your existing database. Then, from there, more and different queue sources could easily be added and implemented.

Another improvement related to flexibility that's entirely client-side should be the ability to select multiple edit queue sources at the same time, and assign a percentage to each. Like, by default, 70% of edits could come from the CBNG feed (it seems to be the most reliable feed), and the remaining 30% could be divided among your old algorithm (for your research), the CBNG review interface (for offline reviews), and random edits from the RC feed (for a randomized dataset). It might also be nice to have an indicator somewhere of which queue source an edit came from.
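Client-side mixing of queue sources by percentage could be sketched like this. The source names and the even three-way split of the remaining 30% are illustrative assumptions; `choose_source` is a hypothetical helper.

```python
import random

# Hypothetical default mix: 70% CBNG, remaining 30% split three ways.
DEFAULT_MIX = {
    "cbng_live": 70,
    "stiki_metadata": 10,
    "cbng_review": 10,
    "random_rc": 10,
}


def choose_source(mix, rng=random):
    """Pick the queue source for the next edit, weighted by its share."""
    names = list(mix)
    weights = [mix[name] for name in names]
    # random.choices performs weighted sampling with replacement.
    return rng.choices(names, weights=weights, k=1)[0]
```

A real client would also need a fallback for when the chosen source has nothing to offer, for example by re-drawing among the remaining sources.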

About using your existing stored procedures - yes, it would be possible for the initial compatibility layer, just to get the system off the ground and running. But additions such as rate limiting and user reliability estimations would require potentially substantial schema modification. The beauty of it though, is that these modifications (after the web services are implemented in the client) would not require any client updates at all. That's the great thing about abstraction.

Let's see if we can get this off the ground pretty soon. Crispy1989 (talk) 06:57, 15 January 2011 (UTC)

One very small comment to a mass of great stuff. I'm not sure that 'an indicator somewhere of which queue source an edit came from' is a good idea, since it could lead to feedback loops (e.g. Oh CBNG is reliable and this is from CBNG so it's probably vandalism...). Maybe that feature could be opt-in or require digging into the advanced settings so that default users wouldn't perpetuate any bias they have. Ocaasi (talk) 07:50, 15 January 2011 (UTC)
Good point. Crispy1989 (talk) 13:52, 16 January 2011 (UTC)
The solutions all sound good to me. I'm not worried about queue bias; all queues have plenty of false positives to prevent anyone from getting into blind revert mode. If the queues ever do get that good, we can dispense with the reviewers entirely! Some further ideas:
  1. Test whether the user is blocked. We don't want blocked users to be able to drain a queue.
  2. Add a kill switch per user the way Huggle does so that admins can block a given user's access to STiki.
  3. Use edit count as part of the user trust determination. Any unblocked user with over 1000 edits probably isn't running STiki to drain the queue or mess up the classification scheme. They may still make bad edits, but they'll get blocked eventually. Block history may also be useful.
  4. Activate the option to not just consider the most recent edits. I've seen a lot of double edits where the purpose of the second edit is to hide the damage done by the first. There often is enough context to figure it out, but those are of course only the cases that I caught. Three modes would be ideal: Most recent only, any edit by the most recent editor, and any live edit. Determining a live edit can be difficult since I've seen cases of users adding extra vandalism on an undo operation, or not really undoing the whole thing. It would be nice if MediaWiki had a way to flag an undo action that doesn't exactly undo the original change.
  5. Integrate the classification results into the real-time CBNG processing so that if a reviewer flags an edit as vandalism, CBNG will disable 1RR for that specific user/article combination for 24 hours. I know there could be issues with this, including reduced accuracy or delays in reports to AIV, but it's something to think about to deal with persistent vandalism and to reduce the lifetime of vandal edits. Vandals will always attack the weakest link and 1RR is certainly a weak point.
UncleDouggie (talk) 08:11, 27 January 2011 (UTC)
Here's a few more:
  1. Add an option to ignore edits within the last x minutes. This will reduce contention with other tools that only work off the RC feed. One of the best capabilities of STiki is finding old vandalism. To the extent that we waste our time fighting with Huggle users for a revert, we're increasing the lifetime of older vandalism.
  2. Prioritize edits to any article that I've edited within the last 24 hours, except for my own edits of course. These edits should be exempt from the ignore recent edits described above. I realize that this is hard from a DB standpoint, but that's how I'd like it to work.
UncleDouggie (talk) 07:02, 28 January 2011 (UTC)

Stiki problem

I keep getting a funny error saying unable to connect to Back End, program will now disconnect. The only problem is it's saying I'm not connected to the internet, but I am. Any help would be appreciated. TucsonDavidU.S.A. 03:27, 29 January 2011 (UTC)

Server down, per one or two threads above. Working hard to fix it now. A clearer error message will replace that one in the future. Thanks, West.andrew.g (talk) 03:33, 29 January 2011 (UTC)
And as a note to all. Even when it does come back up -- things could be a little spotty as I work out the kinks. So let's hold off on any bug reports for a little while. Thanks, West.andrew.g (talk) 03:36, 29 January 2011 (UTC)
Should be back up. I have most things stabilized. Thanks, West.andrew.g (talk) 21:36, 29 January 2011 (UTC)

Database Access

Examining STiki's current stored procedures, it looks like the database should be completely redesigned - it looks like it got so messy mainly due to feature creep (ie, not designing it in the first place with all possible future features in mind). Over-reliance on overly complex stored procedures for critical application logic is also an issue.

I've finished making a redesigned database specification, and I'd like to write import scripts so data in the existing database can be used transparently without scrapping it. Can you give me full read access to your current database so I can work on an import script? After import scripts are done, I'll work on making a web service to duplicate current functionality. Crispy1989 (talk) 18:43, 16 January 2011 (UTC)

To Crispy and the other CBNG parties, I apologize for my recent latency. Real life has dominated and I have had no time to address these improvement issues. To be succinct -- a complete database re-design is problematic. What STiki uses is only a small portion of my database -- and I have tens of additional scripts that are dependent on the current structure (which is some of the reason why it is not STiki optimized). Moreover, the current schema still scales acceptably (computing is cheap, my time is probably less so). I'd rather concentrate on the more immediate issues of (1) authentication to thwart attacks, and (2) an http interface, to free those firewalled on [port:mysql]. While my timetable on this is probably a bit broader than your own, I am still heading in this direction. Thanks, West.andrew.g (talk) 07:06, 1 February 2011 (UTC)

Classification question

Just how messed up do the classification algorithms become when I just legitimately classified this lovely edit as Innocent? —UncleDouggie (talk) 11:06, 27 January 2011 (UTC)

Let it be an academic question, since the redirect is actually correct! The name of Eminem's album was actually Just Don't Give a Fuck. Ocaasi (talk) 11:28, 27 January 2011 (UTC)
That's why I didn't revert it. —UncleDouggie (talk) 01:08, 28 January 2011 (UTC)
Oh, you meant do the 'good fucks' throw the algorithm? That makes sense. I thought you thought you'd just had a false negative. Ocaasi (talk) 08:15, 28 January 2011 (UTC)
Given that STiki users have classified 150,000+ edits, this isn't the type of thing that makes a significant change in scoring models. On the contrary, these classic "false positive" cases can be extremely useful in improving the classifiers. Thanks, West.andrew.g (talk) 14:16, 27 January 2011 (UTC)

STiki Release -- Jan. 31, 2011

A new version of STiki has been released. See the STiki project page to download, or visit release notes for a listing of the improvements and bug-fixes. Thanks, West.andrew.g (talk) 08:35, 31 January 2011 (UTC)