#img2pdf — Public Fediverse posts on home.social

When you find a webpage that offers you a book but you can't download it, and you can't right-click to save the images of its pages, well – the page has loaded the images. Therefore the images are somewhere in your browser. What to do?

Knowing a bit of how web pages are structured and built helps make the most of what you see online.

1. In your browser, open the developer tools (press F12).

2. Go to the "Network" tab and restrict the view to "Images" and "Media" (see the upper right side).

3. Zoom into the book to ensure pages are of high resolution, then page through it so the browser loads each page image.

4. You will notice new rows appearing in the table of the "Network" tab of the Developer Tools.

5. Now hover over a row and the browser may even show you a preview of the image; in any case, just right-click the row and save the image.

There are scripts online to automate this, but if all you are after is a few pages, this suffices.

To assemble the pages into a PDF, use e.g.:

$ img2pdf *jpg -o book.pdf
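One caveat: the shell expands the glob in plain alphabetical order, so a file called page10.jpg would land before page2.jpg. A minimal sketch of a workaround, assuming GNU sort (for its -V "version sort" option) and file names without newlines; the function name list_pages and the page*.jpg naming are only illustrations, not anything img2pdf requires:

```shell
# Print the saved page images in natural (version) order, one per line,
# so that page10.jpg comes after page2.jpg.
# Assumes GNU sort (-V) and file names that contain no newlines.
list_pages() {
    # $1: directory holding the saved images (defaults to the current one)
    find "${1:-.}" -maxdepth 1 -type f -name '*.jpg' | sort -V
}

# Possible use, assuming img2pdf is installed:
#   list_pages . | tr '\n' '\0' | xargs -0 img2pdf -o book.pdf
```

If your saved files are already zero-padded (page001.jpg, page002.jpg, ...), the plain glob sorts them correctly and none of this is needed.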

... and even OCR them if you like:

$ ocrmypdf book.pdf book-OCR.pdf
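If the book is not in English, ocrmypdf can be told which Tesseract language to use with its -l option. A small sketch wrapping that; the function name ocr_book and the default of "eng" are my own choices for illustration, and the matching Tesseract language data (e.g. the tesseract-ocr-deu package for German on Debian-like systems) is assumed to be installed:

```shell
# Add an OCR text layer to a PDF, writing NAME-OCR.pdf next to the input.
# $1: input PDF
# $2: Tesseract language code(s), e.g. "eng" or "eng+deu" (defaults to "eng")
ocr_book() {
    in=$1
    lang=${2:-eng}
    out="${in%.pdf}-OCR.pdf"
    # Print the output file name only if ocrmypdf succeeded.
    ocrmypdf -l "$lang" "$in" "$out" && printf '%s\n' "$out"
}

# Possible use:
#   ocr_book book.pdf deu
```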

Both programs can be installed with:

$ sudo apt-get install img2pdf ocrmypdf

... in Ubuntu, Debian, and the like.

Or, import each into a page of a multi-page #Inkscape document and save it as a PDF.

#img2pdf #ocrmypdf