Schools Wikipedia CD and DVD image

Martin Brinkmann
Jan 5, 2008
Updated • Dec 15, 2012
Internet
|
2

I was recently looking for a way to put Wikipedia on my notebook to be able to access it when I have no Internet connection. A research on Wikipedia revealed that there was an up to date German DVD version but no English version which I think is somewhat strange considering that the English section of Wikipedia has more users than the German section.

I was able to spot a distribution called Schools Wikipedia however that offered parts of Wikipedia for schools as CD and DVD image. The only difference between both distributions is the lack of full sized images in the CD version. The CD version can be downloaded directly from the projects website while the DVD version is supplied via Bittorrent.

All submitted articles then went through a clean-up script to remove Fair Use images, all sentences whose only purpose was to link to unincluded articles (e.g. "see also"), stubs and editorial content, and sections containing material unsuitable for children and external links. Where articles had been vandalised or contained questionable material the most recent good version was used. The resulting 2007 Selection is browsable at http://schools-wikipedia.org and was fixed in terms of article selection on 17 May 2007

This is not the complete archive but it's the only release currently available that has been processed. The only other way of obtaining a complete copy of Wikipedia is to download the XML database and create the pages either dynamically or static using that database file.

If you want to give this a go I can recommend WikiFilter which is both a text parser and web filter. You need a XML database file, and an Apache or Windows server for it.

Since this is something that most users will have troubles with I suggest to download the Schools Wikipedia instead of if you want a local copy. You can browse the contents here.

Advertisement

Tutorials & Tips


Previous Post: «
Next Post: «

Comments

  1. ilev said on August 4, 2012 at 7:53 pm
    Reply

    Doesn’t Windows 8 know that www. or http:// are passe ?

    1. Martin Brinkmann said on August 4, 2012 at 7:57 pm
      Reply

      Well it is a bit difficulty to distinguish between name.com domains and files for instance.

    2. Leonidas Burton said on September 4, 2023 at 4:51 am
      Reply

      I know a service made by google that is similar to Google bookmarks.
      http://www.google.com/saved

  2. VioletMoon said on August 16, 2023 at 5:26 pm
    Reply

    @Ashwin–Thankful you delighted my comment; who knows how many “gamers” would have disagreed!

  3. Karl said on August 17, 2023 at 10:36 pm
    Reply

    @Martin

    The comments section under this very article (3 comments) is identical to the comments section found under the following article:
    https://www.ghacks.net/2023/08/15/netflix-is-testing-game-streaming-on-tvs-and-computers/

    Not sure what the issue is, but have seen this issue under some other articles recently but did not report it back then.

  4. Anonymous said on August 25, 2023 at 11:44 am
    Reply

    Omg a badge!!!
    Some tangible reward lmao.

    It sucks that redditors are going to love the fuck out of it too.

  5. Scroogled said on August 25, 2023 at 10:57 pm
    Reply

    With the cloud, there is no such thing as unlimited storage or privacy. Stop relying on these tech scums. Purchase your own hardware and develop your own solutions.

    1. lollmaoeven said on August 27, 2023 at 6:24 am
      Reply

      This is a certified reddit cringe moment. Hilarious how the article’s author tries to dress it up like it’s anything more than a png for doing the reddit corporation’s moderation work for free (or for bribes from companies and political groups)

  6. El Duderino said on August 25, 2023 at 11:14 pm
    Reply

    Almost al unlmited services have a real limit.

    And this comment is written on the dropbox article from August 25, 2023.

  7. John G. said on August 26, 2023 at 1:29 am
    Reply

    First comment > @ilev said on August 4, 2012 at 7:53 pm

    For the God’s sake, fix the comments soon please! :[

  8. Kalmly said on August 26, 2023 at 4:42 pm
    Reply

    Yes. Please. Fix the comments.

  9. Kim Schmidt said on September 3, 2023 at 3:42 pm
    Reply

    With Google Chrome, it’s only been 1,500 for some time now.

    Anyone who wants to force me in such a way into buying something that I can get elsewhere for free will certainly never see a single dime from my side. I don’t even know how stupid their marketing department is to impose these limits on users instead of offering a valuable product to the paying faction. But they don’t. Even if you pay, you get something that is also available for free elsewhere.

    The algorithm has also become less and less savvy in terms of e.g. English/German translations. It used to be that the bot could sort of sense what you were trying to say and put it into different colloquialisms, which was even fun because it was like, “I know what you’re trying to say here, how about…” Now it’s in parts too stupid to translate the simplest sentences correctly, and the suggestions it makes are at times as moronic as those made by Google Translations.

    If this is a deep-learning AI that learns from users’ translations and the phrases they choose most often – which, by the way, is a valuable, moneys worthwhile contribution of every free user to this project: They invest their time and texts, thereby providing the necessary data for the AI to do the thing as nicely as they brag about it in the first place – alas, the more unprofessional users discovered the translator, the worse the language of this deep-learning bot has become, the greater the aggregate of linguistically illiterate users has become, and the worse the language of this deep-learning bot has become, as it now learns the drivel of every Tom, Dick and Harry out there, which is why I now get their Mickey Mouse language as suggestions: the inane language of people who can barely spell the alphabet, it seems.

    And as a thank you for our time and effort in helping them and their AI learn, they’ve lowered the limit from what was once 5,000 to now 1,500…? A big “fuck off” from here for that! Not a brass farthing from me for this attitude and behaviour, not in a hundred years.

  10. Anonymous said on September 28, 2023 at 8:19 am
    Reply

    When will you put an end to the mess in the comments?

  11. RIP said on September 28, 2023 at 9:36 am
    Reply

    Ghacks comments have been broken for too long. What article did you see this comment on? Reply below. If we get to 20 different articles we should all stop using the site in protest.

    I posted this on [https://www.ghacks.net/2023/09/28/reddit-enforces-user-activity-tracking-on-site-to-push-advertising-revenue/] so please reply if you see it on a different article.

    1. RIP said on September 28, 2023 at 11:01 am
      Reply

      Comment redirected me to [https://www.ghacks.net/2012/08/04/add-search-the-internet-to-the-windows-start-menu/] which seems to be the ‘real’ article it is attached to

  12. RIP said on September 28, 2023 at 10:48 am
    Reply

    Comment redirected me to [https://www.ghacks.net/2012/08/04/add-search-the-internet-to-the-windows-start-menu/] which seems to be the ‘real’ article it is attached to

  13. Mystique said on September 28, 2023 at 12:13 pm
    Reply

    Article Title: Reddit enforces user activity tracking on site to push advertising revenue
    Article URL: https://www.ghacks.net/2023/09/28/reddit-enforces-user-activity-tracking-on-site-to-push-advertising-revenue/

    No surprises here. This is just the beginning really. I cannot see a valid reason as to why anyone would continue to use the platform anymore when there are enough alternatives fill that void.

  14. justputthispostanywhere said on September 29, 2023 at 3:59 am
    Reply

    I’m not sure if there is a point in commenting given that comments seem to appear under random posts now, but I’ll try… this comment is for https://www.ghacks.net/2023/09/28/reddit-enforces-user-activity-tracking-on-site-to-push-advertising-revenue/

    My temporary “solution”, if you can call it that, is to use a VPN (Mullvad in my case) to sign up for and access Reddit via a European connection. I’m doing that with pretty much everything now, at least until the rest of the world catches up with GDPR. I don’t think GDPR is a magical privacy solution but it’s at least a first step.

Leave a Reply

Check the box to consent to your data being stored in line with the guidelines set out in our privacy policy

We love comments and welcome thoughtful and civilized discussion. Rudeness and personal attacks will not be tolerated. Please stay on-topic.
Please note that your comment may not appear immediately after you post it.