I was recently looking for a way to put Wikipedia on my notebook so that I could access it without an Internet connection. A search on Wikipedia revealed that there is an up-to-date German DVD version but no English one, which I find somewhat strange considering that the English section of Wikipedia has more users than the German section.
I did, however, find a distribution called Schools Wikipedia, which offers a selection of Wikipedia articles for schools as CD and DVD images. The only difference between the two is that the CD version lacks full-sized images. The CD version can be downloaded directly from the project's website, while the DVD version is distributed via BitTorrent.
All submitted articles then went through a clean-up script to remove Fair Use images, all sentences whose only purpose was to link to unincluded articles (e.g. "see also"), stubs and editorial content, and sections containing material unsuitable for children and external links. Where articles had been vandalised or contained questionable material the most recent good version was used. The resulting 2007 Selection is browsable at http://schools-wikipedia.org and was fixed in terms of article selection on 17 May 2007.
This is not the complete archive, but it is the only processed release currently available. The only way of obtaining a complete copy of Wikipedia is to download the XML database and generate the pages from that database file, either dynamically or statically.
If you want to give this a go, I can recommend WikiFilter, which is both a text parser and a web filter. You need an XML database file and an Apache or Windows server for it.
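To give a rough idea of what working with the XML database involves, here is a minimal Python sketch that streams through a dump and pulls out each page's title and raw wiki text. It is not how WikiFilter works. It assumes the usual MediaWiki export layout (page elements containing a title and a revision with a text element), and the names iter_pages, local_name and SAMPLE are made up for this illustration. Turning the wiki markup into readable HTML pages is a separate and much bigger job.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time.

    `source` may be a file name or a binary file object. The element layout
    is an assumption about the dump format, not something checked here.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the page's children so memory use stays low


# Tiny stand-in for a real dump, which would be many gigabytes in size.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Some '''wiki''' markup.</text></revision>
  </page>
  <page>
    <title>Second</title>
    <revision><text>More text.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

With a real dump you would pass the path of the downloaded XML file to iter_pages instead of the sample data. Reading the file as a stream matters here because the full database is far too large to load into memory at once.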
Since this is something most users will have trouble with, I suggest downloading Schools Wikipedia instead if you want a local copy. You can browse the contents here.
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.