Wikipedia:Large language models

Revision as of 12:41, 21 February 2023

Large language models (LLMs) are computer programs for natural language processing that use deep learning and neural networks, such as GPT-3. This policy covers how LLMs may and may not be used on Wikipedia to generate new text or modify existing text. Potential problems include that generated content may be biased, unverifiable, constitute original research, or violate copyrights. Because of this, LLMs should only be used for tasks in which the editor has substantial experience, and their outputs must be rigorously scrutinized for compliance with all applicable policies. Furthermore, LLM use must be declared in the edit summary and, depending on the provider, in-text attribution may need to be given. Editors retain full responsibility for their LLM-assisted edits.

LLM risks and pitfalls

  "Large language models have limited reliability, limited understanding, limited range, and hence need human supervision."
  – Michael Osborne, Professor of Machine Learning in the Dept. of Engineering Science, University of Oxford, January 25, 2023[1]

Writing article prose

The use of LLMs to produce encyclopedic content on Wikipedia carries particularly strong risks. This section clarifies how key policies apply to LLM use on the project, i.e. how such use generally raises problems under those policies. Note that this policy applies to all uses of LLMs, regardless of whether a provider or user of an LLM claims that, due to technological advances, it automatically complies with Wikipedia policies and guidelines.

  • Copyrights
         Further: Wikipedia:Large language models and copyright
    An LLM can generate copyright-violating material. Generated text may include verbatim non-free content or be a derivative work. In addition, using LLMs to summarize copyrighted content (like news articles) may produce excessively close paraphrases. The copyright status of LLMs trained on copyrighted material is not yet fully understood, and their output may not be compatible with the CC BY-SA license and the GNU Free Documentation License used for text published on Wikipedia.
  • Verifiability
    LLMs do not follow Wikipedia's policies on verifiability and reliable sourcing. They generate text by outputting the words most likely to come after the previous ones. If asked to write an article on the benefits of eating crushed glass, they will sometimes do so. LLMs can completely make things up. When they generate citations, those may be inappropriate or fictitious and can include unreliable sources such as Wikipedia itself.
  • Neutral point of view
    LLMs may produce content that is neutral-seeming in tone, but not necessarily in substance. This concern is especially strong for biographies of living persons.
  • No original research
    While LLMs may give accurate answers to some questions, they may also generate interpretations that are biased or false, sometimes in subtle ways. Asking them about obscure subjects, posing complicated questions, or assigning them tasks they are not suited to (e.g. tasks that require extensive knowledge or analysis) makes these errors much more likely.

Other applications

The same policy considerations apply to the potential LLM applications outlined here; the more specific risks of each use case are noted below.

Greater risk

  • Templates, modules and external software. LLMs can write code that works great, often without any subsequent modification. As with any code (including stuff you found on Stack Exchange), you should make sure you understand what it's doing before you execute it: bugs and errors can cause unintended behavior. Common sense is required; as with all programming, you should not put large chunks of code into production if you haven't tested them beforehand, don't understand how they work, or aren't prepared to quickly reverse your changes.
  • Copyediting existing article text. The same pitfalls as with LLM-assisted content creation exist here: instead of purely copyediting, LLMs may change the meaning and introduce errors. Experienced editors may ask an LLM to improve the grammar, flow, or tone of pre-existing article text. Rather than taking the output and pasting it directly into Wikipedia, you must compare the LLM's suggestions with the original text and thoroughly review each change for correctness, accuracy, and neutrality.
  • Summarizing a reliable source. This is inherently risky, due to the likelihood of an LLM introducing original research or bias that was not present in the source, as well as the risk that the summary may be an excessively close paraphrase, which would constitute plagiarism. You must proactively ensure such a summary complies with all policies.
  • Summarizing the article itself (lead expansion). Lead sections are concise overviews, i.e. summaries, of article body content, and text summarization is one of the primary capabilities LLMs were designed for. However, pasting LLM output to expand the lead is still inherently risky, because it may introduce errors and bias not present in the body.[a] It is better to use an LLM only to generate ideas for lead expansion, and to write the actual improvements yourself.

Lesser risk

Despite the aforementioned limitations of LLMs, it is assumed that experienced editors can offset LLM deficiencies with a reasonable amount of effort and create compliant edits in some scenarios:

  • Tables and HTML. Because LLMs' training data includes large amounts of computer code (including wikitext and HTML), they can do things like modify tables, even correctly translating verbal descriptions of color schemes into a reasonable set of HTML color codes in fully formatted tables (see the sketch after this list). If you do this, care should be exercised to make sure that the code you get actually renders a working table, template, or whatever else you asked for.
  • Generating ideas for article expansion. When asked "what would an encyclopedia entry on XYZ include?", LLMs can come up with subtopics that an article is not currently covering. Not all of these ideas will be valid or have sufficient prominence for inclusion, so thoughtful judgment is required. As stated above, LLM outputs should not be used verbatim to expand an article.
  • Asking an LLM for feedback on an existing article. Such feedback should never be taken at face value: just because an LLM says something does not make it true. But such feedback may be helpful if you apply your own judgment to each suggestion.
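
As a hedged illustration of the "Tables and HTML" use case above, the following is a minimal sketch of the kind of wikitext an editor might ask an LLM to produce or adjust; the column names, row contents, and color values are hypothetical, and the result should always be previewed to confirm it renders correctly before saving:

    {| class="wikitable"
    ! Status !! Meaning
    |-
    | style="background:#d4f7d4;" | Green || On track
    |-
    | style="background:#f7d4d4;" | Red || Needs attention
    |}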

Using LLMs

LLMs are assistive tools, and cannot replace human judgment. If you are using LLMs to edit Wikipedia, you must be aware of and then overcome their inherent limitations, and ensure your edits comply with relevant guidelines and policies.

The riskier use cases are tolerated, not recommended. They are reserved for experienced editors, who take full responsibility for their edits' compliance with Wikipedia policies.

Writing articles

LLMs are likely to make false claims. Their output is only a starting point and must be considered inaccurate until proven otherwise. You must not publish the output of an LLM directly into a Wikipedia article without rigorously scrutinizing it for verifiability, neutrality, absence of original research, compliance with copyright, and compliance with all other applicable policies. If an LLM generates citations, you must personally check that they exist and that they properly verify each statement. The use of language models must be clearly disclosed in your edit summary.

Even if you find reliable sources for every statement, you should still ensure that your additions do not give undue prominence to irrelevant details or minority viewpoints. You should ensure that your LLM-assisted edits reflect the weight placed by reliable sources on each aspect of a subject. You are encouraged to check what the most reliable sources have to say about a subject, and to ensure your edit follows their tone and balance.

Especially with respect to copyrights, editors should use extreme caution when adding significant portions of AI-generated text, whether verbatim or user-revised. It is their responsibility to ensure that their addition does not infringe anyone's copyright, and they must familiarize themselves with both the copyright and the content-sharing policies of their AI provider.

Drafts

If an LLM is used to create the initial version of a draft or userspace draft, the user that created the draft must bring it into compliance with all applicable Wikipedia policies, add reliable sourcing, and rigorously check the draft's accuracy prior to submitting the draft for review. If such a draft is submitted for review without having been brought into compliance, it should be declined. Repeated submissions of unaltered (or insufficiently altered) LLM outputs may lead to a revocation of draft privileges.

Talk pages

While you may include an LLM's raw output in your talk page comments for the purposes of discussion, you should not use LLMs to "argue your case for you" in talk page discussions. Wikipedia editors want to interact with other humans, not with large language models.

Be constructive

Wikipedia relies on volunteer efforts to review new content for compliance with our core content policies. This is often time consuming. The informal social contract on Wikipedia is that editors will put significant effort into their contributions, so that other editors do not need to "clean up after them". Editors must ensure that their LLM-assisted edits are a net positive to the encyclopedia, and do not increase the maintenance burden on other volunteers. Repeated violations form a pattern of disruptive editing, and may lead to a block or ban.

Do not, under any circumstances, use LLMs to generate hoaxes or disinformation. This includes knowingly adding false information to test our ability to detect and remove it. Repeated misuse of LLMs may be considered disruptive and lead to a block or ban.

Wikipedia is not a testing ground for LLM development. Entities and people associated with LLM development are prohibited from running experiments or trials on Wikipedia. Edits to Wikipedia are made to advance the encyclopedia, not a technology. This is not meant to prohibit editors from responsibly experimenting with LLMs in their userspace for the purposes of improving Wikipedia.

Declare LLM use

Every edit which incorporates LLM output must be marked as LLM-assisted in the edit summary. This applies to all namespaces. For content added to articles and drafts, in-text attribution is necessary. If an LLM by OpenAI was used, this can be achieved by adding the following template to the bottom of the article: {{OpenAI|[GPT-3, ChatGPT etc.]}}. Additionally, the template {{AI generated notification}} may be added to the talk page of the article.
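
As a hedged illustration of what such a declaration might look like (the edit summary wording and the choice of ChatGPT are hypothetical; the templates are those named above):

    Edit summary: Expanded the history section; first draft generated with ChatGPT, then checked against the cited sources
    At the bottom of the article wikitext: {{OpenAI|ChatGPT}}
    Optionally, on the article's talk page: {{AI generated notification}}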

Experience is required

LLM-assisted edits should comply with Wikipedia policies. Before using an LLM, editors should have substantial prior experience doing the same or a more advanced task without LLM assistance.[b] Editors are expected to familiarize themselves with a given LLM's limitations, and to use careful judgment to determine whether that LLM is appropriate for a given purpose. Inexperienced editors should be especially careful when using these tools; if needed, do not hesitate to ask for help at the Wikipedia:Teahouse.

Editors should have enough familiarity with the subject matter to recognize when an LLM is providing false information. If an LLM is asked to paraphrase something (e.g. source material or existing article content), editors should not assume that it will retain the meaning.

High-speed editing

Human editors are expected to pay attention to the edits they make, and ensure that they do not sacrifice quality in the pursuit of speed or quantity. For the purpose of dispute resolution, it is irrelevant whether high-speed or large-scale edits that a) are contrary to consensus or b) cause errors an attentive human would not make are actually being performed by a bot, by a human assisted by a script, or even by a human without any programmatic assistance. No matter the method, the disruptive editing must stop or the user may end up blocked. However, merely editing quickly, particularly for a short time, is not by itself disruptive. Consequently, if you are using LLMs to edit Wikipedia, you must do so in a manner that complies with Wikipedia:Bot policy, specifically WP:MEATBOT.

Handling suspected LLM-generated content

Identification and tagging

Editors who identify LLM-originated content that does not comply with our core content policies should consider placing {{AI-generated|date=November 2024}} at the top of the affected article or draft, unless they are capable of immediately resolving the identified issues themselves.

This template should not be used in biographies of living persons. In BLPs, such non-compliant content should be removed immediately and without waiting for discussion.
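
As a minimal sketch of the placement described above (the comment line is a placeholder; only the template call comes from this policy):

    {{AI-generated|date=November 2024}}
    <!-- the existing article or draft text follows unchanged -->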

Verification

All known or suspected LLM output must be checked for accuracy and is assumed to be fabricated until proven otherwise. LLMs are known to falsify sources such as books, journal articles and web URLs, so be sure to first check that the referenced work actually exists. All factual claims must then be verified against the provided sources. LLM-originated content that is contentious or fails verification must be removed immediately.

Deletion

If removal as described above would result in deletion of the entire contents of the article, it then becomes a candidate for deletion. If the entire article appears to be factually incorrect or relies on fabricated sources, speedy deletion via WP:G3 (Pure vandalism and blatant hoaxes) may be appropriate.
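
In practice, such a nomination is usually made by placing a speedy-deletion tag at the top of the page. A minimal sketch, assuming the standard G3 template {{db-g3}} (not named in this policy) is the appropriate tag:

    {{db-g3}}
    <!-- the rest of the suspected hoax article follows; an administrator will review the nomination -->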

Citing LLM-generated content

For the purposes of sourcing, any LLM-generated material is presumed unreliable unless it is clear from the circumstances of publication that the work is substantially a human one, i.e. that an entity with a reputation for fact-checking and accuracy took care to modify the output in every way needed to ensure the work meets its usual high standard.

Any source (work) originating from an entity (a news organization etc.) known to generally produce content using LLMs should be treated as unreliable when there is no clear indication of whether or not humans were involved. This applies especially to publications that attempt to deceive readers by crediting content that appears to be primarily LLM-generated to human authors (named, unnamed, or fictitious).

See also

An application of the transformer model, and therefore part of deep learning, LLMs also only partially intersect with artificial intelligence.

Demonstrations

Notes

  1. ^ It should especially not be assumed that prompting an LLM to "write/expand the lead section of X Wikipedia article" will generate a genuine summary. LLM-based applications that cannot look things up on the internet (the norm as of early 2023) may not know the exact content of the article they are asked to summarize, and even if the article was part of their training corpus, they do not appear to be able to isolate it from the rest of the corpus and derive their output exclusively from that one article's content.
  2. ^ For example, someone skilled at dealing with vandalism but doing very little article work should probably not start creating articles using LLMs before gaining actual experience at article creation without the assistance of these models; the same logic applies to creating modules and templates, using talk pages, etc.

References

  1. ^ Smith, Adam (2023-01-25). "What is ChatGPT? And will it steal our jobs?". www.context.news. Thomson Reuters Foundation. Retrieved 2023-01-27.