wkhtmltopdf is an open source command-line tool that can save web pages as a PDF or an image - gHacks Tech News

SingleFile, its fork SingleFileZ and Save Page WE are excellent options for saving entire web pages as a single HTML file. But sometimes you may want the page to be more easily accessible. For example, you may want your study materials or research papers on both your phone and your computer. Converting the HTML to a PDF is a good way to ensure cross-device compatibility.

wkhtmltopdf

Wkhtmltopdf is an open source and cross-platform tool that can convert HTML pages to PDF. This is a command-line utility, but don't worry, it's pretty easy to use.

Install the application or extract the portable archive to a folder. Open a command prompt window.

The syntax is fairly simple: enter the name wkhtmltopdf, followed by the URL of the web page and the name of the PDF that you want to create, like so.

wkhtmltopdf URL filename.pdf

Let's say you want to save a copy of a web page. This is what the command will look like.

wkhtmltopdf https://www.example.com/page example.pdf

That wasn't difficult now, was it? But I would recommend using a slightly different command.

wkhtmltopdf -n https://www.example.com/page example.pdf

The only difference here is the -n switch, which disables JavaScript on the page and usually speeds up PDF creation. You can also use --disable-javascript for the same effect, but -n is shorter. If a page has a lot of scripts, the conversion process might get stuck, so -n is your best option for quick processing. The trade-off is that pages which rely on scripts to render may look different in the output file; if the output is garbled, run the command again without the parameter.

When the command is passed to the program, it loads the web page in the background (without opening your browser) and begins the conversion process. This may take a minute or two depending on the page's content, but it does not require any interaction, so just wait for it to complete and your PDF is ready to use. Unless you specify a full path, wkhtmltopdf saves the PDF in the folder the command is run from (its own folder, if that is where you opened the command prompt). Pay attention to the PDF name that you use, because the program will overwrite any existing file with that name without warning.
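
If the silent overwrite worries you, a small wrapper can check for the file first. Here's a minimal sketch for a POSIX shell (Linux, macOS, or something like Git Bash on Windows); save_pdf and the WKHTMLTOPDF variable are names made up for this example, not part of the tool.

```shell
#!/bin/sh
# save_pdf URL OUTPUT
# Hypothetical helper (not part of wkhtmltopdf): converts URL to OUTPUT,
# but stops if OUTPUT already exists instead of overwriting it.
save_pdf() {
    url=$1
    out=$2
    if [ -e "$out" ]; then
        echo "Refusing to overwrite existing file: $out" >&2
        return 1
    fi
    # WKHTMLTOPDF is an assumed variable for pointing at a portable copy;
    # it falls back to whatever wkhtmltopdf is found on the PATH.
    "${WKHTMLTOPDF:-wkhtmltopdf}" -n "$url" "$out"
}
```

Once the function is loaded into your shell, save_pdf https://www.example.com/page example.pdf runs the same conversion as the -n command above, except that it stops with an error if example.pdf is already there.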

Here's a screenshot of a web page opened in Firefox.

[Screenshot: the original web page in Firefox]

Let's use the tool.

wkhtmltopdf https://en.wikipedia.org/wiki/Computer wiki.pdf

And here's what the PDF version of it (created using wkhtmltopdf) looks like.

[Screenshot: the wkhtmltopdf version of the web page]

As you can see from the picture, wkhtmltopdf extracted the page perfectly. It also preserves links on pages.

There are a ton of other options that you can use. You can view them with the built-in help command.

wkhtmltopdf -h

If you only want the text from the webpage, you can add --no-images to the command. Don't want links to other pages to be included? Use --disable-external-links. To drop links that point within the same document as well, add --disable-internal-links.
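
As a rough sketch, the switches for dropping images and links can be combined into a small shell helper for text-only copies. The function name text_only_pdf and the WKHTMLTOPDF variable are assumptions made for this example. The switches themselves come from the tool's help output, where --disable-external-links is described as not making links to remote web pages and --disable-internal-links as not making local links.

```shell
#!/bin/sh
# text_only_pdf URL OUTPUT
# Hypothetical helper: saves a page without images, without JavaScript
# and without clickable links to remote pages.
text_only_pdf() {
    # WKHTMLTOPDF is an assumed override; the default is the wkhtmltopdf on PATH.
    "${WKHTMLTOPDF:-wkhtmltopdf}" -n --no-images --disable-external-links "$1" "$2"
}
```

Calling text_only_pdf https://www.example.com/page text.pdf should give you a lighter PDF that is mostly text. Add --disable-internal-links inside the function if you also want in-document links removed.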

What if you want to save the web page as an image? You can do that with the help of wkhtmltoimage, a utility that is included in the wkhtmltox package. It accepts the same basic commands that you used for the PDF tool.

wkhtmltoimage

wkhtmltoimage https://en.wikipedia.org/wiki/Computer wiki.jpg

It also supports other image formats such as PNG and BMP, but these end up with really large file sizes (100+ MB). JPG has the best compression level.
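
If file size matters, it may help to ask for JPG explicitly and lower the quality a little. The sketch below assumes a POSIX shell; save_jpg and the WKHTMLTOIMAGE variable are hypothetical names, and the fallback of 75 is just a value picked for this example (the tool's own default quality is 94, on a scale of 0 to 100).

```shell
#!/bin/sh
# save_jpg URL OUTPUT [QUALITY]
# Hypothetical helper: saves a page as a JPG with a chosen quality level.
save_jpg() {
    # $3 is optional; 75 is an arbitrary example value, not a tool default.
    quality=${3:-75}
    # WKHTMLTOIMAGE is an assumed override; the default is the wkhtmltoimage on PATH.
    "${WKHTMLTOIMAGE:-wkhtmltoimage}" -n --format jpg --quality "$quality" "$1" "$2"
}
```

For example, save_jpg https://en.wikipedia.org/wiki/Computer wiki.jpg 60 trades some sharpness for a smaller file. How much smaller depends on the page, so it is worth trying a few values.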

[Screenshot: wkhtmltoimage output file sizes]

Wkhtmltopdf and wkhtmltoimage are available for Windows, Mac and Linux. The tool comes in portable versions for 32-bit and 64-bit systems.

Comments

  1. Tom Hawack said on December 9, 2020 at 4:44 pm

    Far too cumbersome, IMO. Might be fine for occasional conversions but otherwise needs a front-end to get things done quickly.
    Personally I use ‘SingleFile’ as mentioned in the article when I wish to keep a page exactly as it appears in the browser; most of the time my aim is to backup only the content in which case I use a bookmarklet to send the page’s url to the ‘PrintFriendly’ site. I save quite a lot, can’t imagine having to go through a lengthy process each time. Generally speaking apps working with the command-line get on my nerves : why doesn’t the developer include a front-end? Some other developers happen to contribute to finish the work with a simple front-end for such half-baked code.

    1. Anonymous said on December 9, 2020 at 9:43 pm

      So command line apps are all “half baked code” according to you? You are showing your ignorance in that comment, have you ever heard of automation? command line apps are actually much more useful.

    2. nothing said on December 9, 2020 at 9:45 pm

      The power of the command line comes from automation. You can write scripts and templates, and the process becomes easier and faster than using a GUI.

    3. Cigologic said on December 10, 2020 at 7:41 am

      > @Tom Hawack: “Personally I use ‘SingleFile’ as mentioned in the article when I wish to keep a page exactly as it appears in the browser”

      In which case, you might be interested in the portable, standalone Monolith, which works the same way as the SingleFile addon to create a single, complete HTML file-replica of how the content appears in the browser.

      Even though Monolith is a commandline tool, it has very easy-to-use options. The advantage is that it is independent of a web browser & won’t be broken by browser updates. More info at:
      https://www.ghacks.net/2020/12/09/wkhtmltopdf-is-an-open-source-command-line-tool-that-can-save-web-pages-as-a-pdf-or-an-image/#comment-4480238

      > “why doesn’t the developer include a front-end?”

      Perhaps neither the new maintainer nor the original developer is very skilled at GUI design. Or maybe nobody has ever indicated any demand for a GUI build to the devs.

      Although the program seems to be still in beta development (v0.12.6.1, released: 11 Jun 2020), I wouldn’t go as far as call it “half-baked code” though. The application has many options (see: https://wkhtmltopdf.org/usage/wkhtmltopdf.txt), which is where a GUI would definitely prove useful. Since the project is open-source, maybe a UX talent could contribute a front-end build.

      Like Monolith, the advantage of wkHTMLtoPDF & the companion wkHTMLtoImage over the SingleFile addon is that the former are independent of any browser. One can also convert a *local/ self-created* HTML file to PDF or image — just specify the path of the local HTML file, as opposed to a URL.

      However, the turn-off (for me) is the huge size of both the wkHTMLtoPDF & wkHTMLtoImage binaries, as well as wkHTMLtoImage’s output image files.

      An alternative is to install a system-wide PDF “printer” (eg. Bullzip PDF Printer), which will allow one to save basically anything (that offers the ‘Save as’ dialog) as a PDF file. Windows 10 comes with the built-in “Microsoft Print to PDF” in its print dialog, which functions the same way.

      Also, I recall that the Fireshot screen-capture addon is able to save a scrolling webpage as PDF (& of course, image) as well.

    4. Tom Hawack said on December 10, 2020 at 9:25 am

      @Anonymous, @nothing, @Cigologic … OK, guys, I guess you’re right. I’ll admit that disliking command-line applications made me slip into trying to justify this dislike with the wrong reasons: it’s not that the code is half-baked, it’s only that it’s another approach. But is it not an approach favored by geeks more than by the common user? If so, this still wouldn’t mean it deserves a common user’s harsh criticism.
      I won’t beg your pardon because I never beg but you get the idea :=)

  2. Olivier said on December 9, 2020 at 6:32 pm

    Thanks, very useful. I will definitely use it to quickly save a web page as an image.

  3. nothing said on December 9, 2020 at 9:42 pm

    FoxyTab firefox extension can save pages as PDFs, it’s clunky as hell though.

  4. tester said on December 9, 2020 at 10:15 pm

    Hi Tom,

    You make a good point.
    Could you pls include the bookmarklet JS code
    for the ‘PrintFriendly’ site,
    that makes your life easier?.

    Thanks for your help Tom!
    SFer
    using:
    latest chrome browser w/Ubuntu LINUX 20.04 here…

    1. Tom Hawack said on December 10, 2020 at 9:31 am

      Hi Tester,

      The bookmarklet is more-simple-than-that-you-win-the-first-prize :

      Save Page with PrintFriendly:
      [ javascript:void(open('https://www.printfriendly.com/print/?url='+location.href)) ]

      Without the [ ] of course, you know that but not sure everyone does.

      There’s also a dedicated Firefox extension but I see you’re running Chrome …

      1. tester said on December 10, 2020 at 2:52 pm

        Thanks TOM HAWAK
        for your frequently good and clear advice!.

        Applying your practical bookmarklet now…

      2. tester said on December 10, 2020 at 2:55 pm

        @TOM HAWAK
        Thanks for your (always) clear
        and practical advice, Tom.
        Using your bookmarklet now, in my Chrome browser.

        Perfect!…

      3. VioletMoon said on December 10, 2020 at 5:28 pm

        There is an extension for all major browsers, including Chrome. The Bookmarklet can be found under Other at the home site. It’s really the original Print Friendly because early on in development there wasn’t an add-on.

        https://www.printfriendly.com/extensions/chrome

        May want to try the API

  5. Peterc said on December 9, 2020 at 10:31 pm

    I don’t save webpages as PDFs very often, but when I have in the past, I’ve used CutePDF Writer (free), which basically installs a “PDF printer” on your system. When you want to save a webpage as a PDF, you print it and choose “CutePDF Writer” as the printer. If I’m remembering correctly, the results weren’t always 100% beautiful, but they were *easily* “good enough.”

    I haven’t printed anything via CutePDF Writer on my Windows 10 system yet, but when I went through the printing step just now to refresh my recollection, I noticed there was already a “Microsoft Print to PDF” printer installed. I gather from a quick websearch that it’s a built-in Windows 10 feature. I’ve never used it, so I can’t say how it compares to CutePDF Writer. [UPDATE: I just printed the same Wikipedia article using both printers. The results were fine and almost exactly the same, so far as I could tell.]

    At any rate, both CutePDF Writer and Microsoft Print to PDF use your system’s standard “print routine” GUI, so no command-line stuff is required. From a privacy perspective, I *believe* all of the conversion via both printers is done locally, though I can’t guarantee that nothing at all ever gets “phoned home.” (In the past, CutePDF got a bad rep for questionable bundleware in its installers. I always managed to dodge it with the help of Unchecky, and it doesn’t seem to be present in the current version.)

    Long story short, if you’re running Windows 10 and you want to make PDFs, Microsoft Print to PDF should already be on your system and seems to be a lot easier to use than wkhtmltopdf. If you’re running an earlier version of Windows, CutePDF Writer should do the trick. If you want to make JPGs and prefer to avoid online conversion services, then maybe wkhtmltoimage is viable. As for Mac and Linux, well … I haven’t used a Mac for 25 years and I’ve never tried to save a webpage as a PDF or image in Linux! Someone else will have to point to easier local alternatives for those OSes. Still, wkhtmltopdf and wkhtmltoimage get at least one thumb up from me simply for being open-source and cross-platform.

    PS: Just out of curiosity, I opened the PDF I generated using Microsoft Print to PDF in IrfanView and saved it as a JPG. It only saved the first out of three pages (perfectly, by the way), and I didn’t see any easy way around viewing and saving each page as a JPG, one at a time — at least not using the GUI.

  6. VioletMoon said on December 9, 2020 at 11:46 pm

    Not exactly what I would be looking for for saving a web page. I like Print Friendly. It allows some great editing of pages before saving and is ridiculously fast.

    Print Edit WE
    The Printliminator

    Papercut Mobility Print will fill the vacuum of Google Cloud Print if users don’t have a remote printing feature on their printers.

  7. Anonymous said on December 10, 2020 at 12:45 am

    Print -> Save to PDF. What’s so hard about it?

    1. VioletMoon said on December 10, 2020 at 5:22 pm

      Yes, the easiest, but one may want to delete something–a banner, e.g.

  8. Jim said on December 10, 2020 at 2:41 am

    love to open up command line and type out word salad to get a screenshot that an extension can give in two or three clicks

  9. good_find said on December 10, 2020 at 5:34 am

    This is a good find and a nicely put article. Don’t mind the moaning in some “smart” or “I can do it better” comments from old regular posters here.

  10. Cigologic said on December 10, 2020 at 6:58 am

    > From post: “wkhtmltopdf saves the PDF in its own folder.”

    To write the file at another location, one can optionally specify the output file path, e.g.:

    > wkhtmltopdf -n "URL or InputFilePath" "X:\Test\Output.pdf"
    > wkhtmltoimage -n "URL or InputFilePath" "X:\Test\Output.jpg"

    Based on my testing so far, it appears that only wkHTMLtoPDF.exe & wkHTMLtoImage.exe are required for PDF/image conversion respectively. Just move the 2 EXE files to the main folder (or whichever location desired) for easier access.

    Everything else (as shown below) can be discarded to reduce the package’s overall filesize footprint:

    ▶ \wkhtmltox\bin\libwkhtmltox.a
    ▶ \wkhtmltox\bin\wkhtmltox.dll
    ▶ \wkhtmltox\include\wkhtmltox\ [everything in the \include\ folder]

  11. Cigologic said on December 10, 2020 at 7:21 am

    Faster lightweight alternative … If one simply wishes to save a webpage as a **single, self-contained HTML file** that can be opened with any web browser or HTML editor, one can use the commandline tool called Monolith:

    ▶ Download: https://github.com/Y2Z/monolith/releases
    ▶ Usage Parameters: https://github.com/Y2Z/monolith#options

    Monolith has a quicker processing time, & the single 64-bit EXE binary is less than 5 MB — compared to the hefty wkHTMLtoPDF.exe or wkHTMLtoImage.exe (x64, almost 40 MB each, excluding several redundant files in the package).

    Comparison tests carried out on the same sample webpage:

    ❶ > monolith "https://en.wikipedia.org/wiki/Computer" -j -o "Output.html"

    ▶ Binary Filesize (monolith.exe): 4.87 MB
    ▶ Processing Time: 1 sec
    ▶ Output Filesize: 3.29 MB (HTML: single self-contained, with images & hyperlinks)
    ▶ Parameters: -j (disable Javascript); -o (specify output file name/path)

    ❷ > wkHTMLtoPDF -n "https://en.wikipedia.org/wiki/Computer" "Output.pdf"

    ▶ Binary Filesize (wkHTMLtoPDF.exe): 39.6 MB
    ▶ Processing Time: 5 sec
    ▶ Output Filesize: 1.23 MB (PDF)
    ▶ Parameters: -n (disable Javascript)

    ❸ > wkHTMLtoImage -n "https://en.wikipedia.org/wiki/Computer" "Output.jpg"

    ▶ Binary Filesize (wkHTMLtoImage.exe): 39.5 MB
    ▶ Processing Time: 5 sec
    ▶ Output Filesize: 10.7 MB (JPG: 1080 x 31904 px, 120 dpi)
    ▶ Parameters: -n (disable Javascript); --quality (image quality: 94 [default], available: 0-100)
    👎 Note: The output image — despite its (unreasonably) large filesize — is quite low-res & not very scaleable w/o becoming pixellated, even at the default save image quality of 94.

  12. Klaas Vaak said on December 10, 2020 at 12:21 pm

    It is unfortunate that Ashwin compared wkhtmltopdf with Single File and Save Page WE only. While the comparison is a legitimate one, wkhtmltopdf can also be used in apps, such as e.g. in the note-taking app VNote, which is where I came across it for the 1st time.

    In VNote wkhtmltopdf is “baked into” the app, and is part of the export settings. There you can specify export requirements.

    FWIW, I can confirm that wkhtmltopdf is an excellent tool.

  13. DirCompUser said on December 10, 2020 at 6:22 pm

    I’ve been using wkhtmltopdf for about five years up to Windows 8.1 and use Nirsoft’s Advanced Run as a gui of sorts (force UAC Elevation).
    On Windows the library and executable that I have total under 40MB in all (assuming nothing was dumped elsewhere than its program directory on install) so I didn’t understand that remark about bloat unless things have expanded massively in the last few years.
    As for the pdf printers and other means of preserving a webpage suggested by commenters, how many of them actually save hyperlinks with text showing the URL address rather than just the label which does not work as a link to a url? Inevitably you go back some time later to a web page saved as PDF and click on a link label only to find the url address wasn’t saved; example PDFCreator. Conversely wkhtmltopdf does save the url address under link labels as does PrintFriendly but the latter is a second record of your browsing history if you’re bothered (I’m not and use it also now and then).
    wkhtmltopdf has similar (but more flexible) functionality to the save as pdf function in Chromium, Opera etc which browsers presumably derived their versions from it. A longstanding criticism of Firefox imo remains its non-adoption of a similar save to / print to pdf function, although its simplify page in print preview is useful sometimes.

  14. Peterc said on December 10, 2020 at 10:47 pm

    @DirCompUser:

    “As for the pdf printers and other means of preserving a webpage suggested by commenters, how many of them actually save hyperlinks with text showing the URL address rather than just the label which does not work as a link to a url?”

    Excellent point. I just retrieved the test PDFs I made using CutePDF Writer and Microsoft Print to PDF from my Recycle Bin, and none of the links worked. (Good thing I hardly ever save webpages as PDFs.)

    When I want to save a fully functioning local copy of a webpage for my own use, I save it as a MAFF archive using the MozArchiver extension in Pale Moon. The results are great — perfect, really — but using a browser/extension-specific file format is obviously not a very future-proof or sharing-friendly strategy.

    Monolith’s more universal file format sounds like a generally safer bet, and it looks like there’s a Monolith extension for Google Chrome. On the other hand, based on what I read, it doesn’t appear to do as complete and perfect a job as MozArchiver does (e.g., it doesn’t pull in embedded video).

    At any rate, thanks for pointing out the potential problem with links on PDF printers. I’ll be keeping it in mind the next time I need to save a webpage as a PDF. In fact, I’m downloading wkhtmltopdf right now…

    1. Cigologic said on December 31, 2020 at 12:10 pm

      > @Peterc: “Monolith’s more universal file format sounds like a generally safer bet, and it looks like there’s a Monolith extension for Google Chrome. On the other hand, based on what I read, it doesn’t appear to do as complete and perfect a job as MozArchiver does (e.g., it doesn’t pull in embedded video).”

      The latest version of Monolith v2.4.0 (26 Dec 2020) saves embedded audio & video by default — unless specified otherwise by user via the parameters “-a” (exclude audio assets) & “-v” (exclude video assets).

      https://github.com/Y2Z/monolith/releases/tag/v2.4.0

      I tried it on a dictionary/ language-learning webpage with embedded HTML5 MP3 audio. The saved HTML file is complete & works fine offline in a web browser.

  15. VivvaldiViewer said on December 11, 2020 at 1:55 am

    There is a most alarming warning on the website https://wkhtmltopdf.org/downloads.html
    * Do not use wkhtmltopdf with any untrusted HTML – be sure to sanitize any user-supplied HTML/JS, otherwise it can lead to complete takeover of the server it is running on! Please read the project status for the gory details.

    This is amplified at the bottom of the webpage https://wkhtmltopdf.org/status.html

    PLEASE EXPLAIN: As a non-techie, I have no idea what this all means, nor the suggested remedies. Can anyone explain, first whether the problems are serious for a normal website, and secondly, how to implement the remedies.
    – – – – – – – –

    My constant problem with the web is that I cannot reliably PDF-save webpages with active links. It’s extraordinary that routine solutions were not produced 20 years ago. Despite Tom’s initial post, I have no problem linking a command-line utility to a shortcut or button.

  16. Allwynd said on December 11, 2020 at 2:01 pm

    They should change the name of the program to something more memorable, like “l897h3vj43gf”

  17. Anonymous said on December 14, 2020 at 11:04 pm

    Unfortunately it is not maintained and will fail to properly print HTML5 websites.
    Any recent alternative to suggest?

  18. DirCompUser said on December 16, 2020 at 5:58 pm

    @Anonymous December 14, 2020 at 11:04 pm

    https://github.com/wkhtmltopdf/wkhtmltopdf/pulse

    @VivvaldiViewer
    Why not ask the developer on github or in the Groups or otherwise as indicated at
    https://wkhtmltopdf.org/support.html
    and then report back here?

    I’ve always assumed the warning was for wkhtmltopdf running on a server from which the pdf is downloaded, whereas on a client machine the pdf is compiled on the client machine, but what do I know, check it out for yourself.

    @Allwynd
    wk html to pdf seems fairly mnemonic to this user who credits himself with average-normal pattern recognition capability.
