Pywikipedia

From MythTV Official Wiki
Revision as of 02:21, 31 January 2006 by Gregturn (talk | contribs)


pywikipedia is a set of Python tools used to maintain MediaWiki sites.

Description

You can write standalone scripts, or work from an interactive Python shell, to perform maintenance activities. This has been used in many ways to help migrate this site, for example by cleaning up Special:Wantedpages.

This library supports scanning all pages, reading the wikitext, and making updates. It also has a throttling mechanism built in so that robot operations won't tax the wiki web server.
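The throttle is normally tuned in the framework's user-config.py rather than in your scripts. A minimal sketch, assuming the standard put_throttle setting (the family name below is hypothetical; check the option names against your version of the library):

```python
# user-config.py (sketch)
mylang = 'en'
family = 'mythtv'   # hypothetical family name for this wiki
put_throttle = 10   # wait at least 10 seconds between page writes
```

A higher put_throttle makes a long batch run slower but keeps the bot from monopolizing the web server.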

Examples

The following section lists various samples of how pywikipedia has been used to maintain this site.

Example 1

The following batch job was kicked off to move every page that referenced Category Homepage into Category:MythPeople, and thus retire the wanted article Category Homepage.

% python
>>> import re
>>> from pywikipedia import wikipedia
>>> site = wikipedia.getSite()
>>> page = wikipedia.Page(site=site, title='Category Homepage')
>>> links = page.getReferences()
Getting references to [[Category Homepage]]
>>> for eachPage in links:
...     wikitext = eachPage.get()
...     newtext = re.compile("Category Homepage").sub("Category:MythPeople", wikitext)
...     eachPage.put(newtext=newtext, \
...        comment='[[pywikipedia]] assisted cleanup -> Moving Category Homepage to [[:Category:MythPeople]]', \
...        minorEdit=True)
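The substitution itself can be checked offline before pointing the bot at live pages. A minimal sketch using made-up wikitext (the page content here is hypothetical):

```python
import re

# Hypothetical wikitext containing the old category link
wikitext = "A developer page.\n[[Category Homepage]]\n"

# Same substitution the batch job performs
newtext = re.sub("Category Homepage", "Category:MythPeople", wikitext)
print(newtext)
```

Because the replacement only rewrites the title text, the surrounding [[...]] brackets are preserved and the result is a valid link to Category:MythPeople.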

Example 2

The following batch job was kicked off to edit every article that linked to Mail To, replacing the link with the mailto: prefix.

>>> page = wikipedia.Page(site=site, title='Mail To')
>>> links = page.getReferences()
>>> for eachPage in links:
...     wikitext = eachPage.get()
...     newtext = re.compile(r"\[\[Mail To\]\]").sub("mailto:", wikitext)
...     eachPage.put(newtext=newtext, comment='pywikipedia update -> Replacing Mail To with mailto:', minorEdit=True)
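Note that unlike Example 1, this pattern matches the brackets too, so the whole link is replaced outright. A minimal sketch with hypothetical wikitext:

```python
import re

# Hypothetical article text: the bracketed link is removed entirely,
# leaving the plain mailto: prefix rather than a wiki link.
wikitext = "Contact: [[Mail To]]dev@example.org"
newtext = re.sub(r"\[\[Mail To\]\]", "mailto:", wikitext)
print(newtext)
```

The raw string (r"...") keeps the backslash escapes for the literal [[ and ]] intact.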

Example 3 - relocate.py

A common pattern is emerging: many page links appear as Foo Bar when the article is actually titled FooBar. That suggests writing a script, to make the cleanup repeatable.

import re, sys
from pywikipedia import wikipedia

oldname = sys.argv[1]
newname = sys.argv[2]

site = wikipedia.getSite()
page = wikipedia.Page(site=site, title=oldname)
links = page.getReferences()
for eachPage in links:
    wikitext = eachPage.get()
    # re.escape protects titles that contain regex metacharacters
    newtext = re.compile(r"\[\[" + re.escape(oldname) + r"\]\]").sub(
        "[[" + newname + "]]", wikitext)
    eachPage.put(newtext=newtext,
        comment='[[pywikipedia]] assisted cleanup -> replacing ' +
            oldname + ' with [[' + newname + ']]', minorEdit=True)

To run the script, pass it two arguments. (Use quotation marks so the shell passes multi-word titles as single arguments.)

% python relocate.py "Foo Bar" "FooBar"

It will find every article that links to Foo Bar, substitute FooBar, and then save the page as a minor edit. This helps reduce the number of entries on Special:Wantedpages.
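Escaping the old title matters because page names can contain regex metacharacters. A minimal sketch with a hypothetical title showing what re.escape buys you:

```python
import re

# Hypothetical title containing regex metacharacters (+, parentheses)
oldname = "C++ Tips (Draft)"
newname = "CppTips"

# re.escape turns the title into a literal match; without it,
# "+" and "(" would be interpreted as regex operators.
pattern = re.compile(r"\[\[" + re.escape(oldname) + r"\]\]")
text = "See [[C++ Tips (Draft)]] for details."
newtext = pattern.sub("[[" + newname + "]]", text)
print(newtext)
```

Without re.escape, the unescaped "(" would raise a re.error (unbalanced parenthesis) at compile time, so the script would fail before touching any page.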

External References