Before I start, I want to point out that I'm not agitating for change. I just noticed something neat and figured some of the rest of you might want to poke around and experiment, too. That said...
Has anybody out there (I guess scanners, primarily) done any work with the DjVu format? It was shopped around by AT&T when PDF was looking terrible (err...more terrible), but never really went anywhere except the Internet Archive. See more at http://djvu.org, particularly the page of resources.
It looks to me like there aren't particularly good tools--what I found for free was DjVu Solo, which is discontinued, clunky, and very slow--but the format shows some promise with respect to comics.
First, there's the size. For experimental purposes, I took a few recent downloads, unzipped the archive, then stuffed the images into a DjVu file. I was actually kind of shocked at the results.
Triumph Adventure Comics #1, as a RAR archive, clocks in at 12.2MB. When I told DjVu Solo that the images were photographs (which I assume is the best-quality setting) at 300dpi, the file dropped to 8.3MB, roughly a third smaller. Willing to lose a bit of fidelity? Going to a black-and-white image gives us lots of jaggies, but the file is less than a megabyte and still mostly readable!
Phantom Lady #17 went from 24.9MB to 10.3MB, again as a "photograph." That's a more dramatic reduction, nearly 60%. And I can't tell the difference between the compressed pages and the originals, except that the DjVu pages render progressively. That is, a blocky image shows up immediately and is then refined over time, like in old web browsers.
Likewise, Wham Comics #2 went from 49.1MB down to 20.2MB. The pages can also be saved as "scans" instead of "photographs." That sacrifices a fair amount of quality (some regions are weirdly blurry, whereas some lines are absurdly sharp--I guess that's the wavelet compression), but the result is still nicely readable, and the file drops to...3.5MB. Feel free to do a double-take, because that's about 7% of the original size!
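For anyone who wants to check my arithmetic, here's a quick Python sketch that recomputes the reductions from the sizes quoted above. The sizes are the ones I reported; the function name and the table layout are just made up for illustration.

```python
# Sizes in MB as quoted in this post: (original archive, DjVu file).
SIZES_MB = {
    "Triumph Adventure Comics #1 (photograph)": (12.2, 8.3),
    "Phantom Lady #17 (photograph)": (24.9, 10.3),
    "Wham Comics #2 (photograph)": (49.1, 20.2),
    "Wham Comics #2 (scan)": (49.1, 3.5),
}


def reduction_percent(original_mb, compressed_mb):
    """Return how much smaller the compressed file is, as a percentage."""
    return (1 - compressed_mb / original_mb) * 100


if __name__ == "__main__":
    for title, (original, compressed) in SIZES_MB.items():
        saved = reduction_percent(original, compressed)
        remaining = 100 - saved
        print(f"{title}: {original}MB -> {compressed}MB "
              f"({saved:.0f}% smaller, {remaining:.0f}% of the original)")
```

The "scan" version of Wham Comics #2 is the one that lands at about 7% of the original size.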
And different pages can actually be compressed differently, then merged, so the tradeoffs could be made fairly intelligently, I think.
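I did all of this by hand in DjVu Solo, but here's a rough sketch of how the per-page idea might be scripted. It assumes the open-source DjVuLibre command-line tools instead (c44 for wavelet "photograph" pages, cjb2 for black-and-white pages, djvm to bundle the pages), which I haven't tried on these files. The function name, the page file names, and the "photo"/"bitonal" labels are made up for illustration, and it doesn't cover DjVu Solo's "scan" mode. The script only builds the command lines; it doesn't run anything.

```python
def build_commands(pages, output, dpi=300):
    """Build DjVuLibre command lines for a mixed-mode document.

    pages  -- list of (image_path, mode) pairs; mode is "photo" or "bitonal"
              (hypothetical labels, not DjVu Solo's own terms)
    output -- name of the bundled multi-page DjVu file
    Assumes c44 input is PPM/PGM/JPEG and cjb2 input is a bitonal PBM/TIFF.
    """
    commands = []
    page_files = []
    for index, (image, mode) in enumerate(pages, start=1):
        page_file = f"page{index:03d}.djvu"
        if mode == "photo":
            # c44: wavelet (IW44) encoder for continuous-tone images
            commands.append(["c44", "-dpi", str(dpi), image, page_file])
        elif mode == "bitonal":
            # cjb2: encoder for black-and-white images
            commands.append(["cjb2", "-dpi", str(dpi), image, page_file])
        else:
            raise ValueError(f"unknown mode for {image}: {mode}")
        page_files.append(page_file)
    # djvm -c: create one bundled document from the single-page files
    commands.append(["djvm", "-c", output, *page_files])
    return commands


if __name__ == "__main__":
    example_pages = [("cover.jpg", "photo"), ("page02.pbm", "bitonal")]
    for command in build_commands(example_pages, "comic.djvu"):
        print(" ".join(command))
```

To actually run the commands, each list could be passed to subprocess.run(command, check=True), assuming the tools are installed and on the PATH.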
(There's also a "clean" mode for saving files, but my poor little machine runs out of memory whenever I try it, so I don't know how useful it might be. Or it could be that it chokes on calling a JPEG image "clean," for all I know. It sounds promising for scans of line art, though.)
The other feature impressing me, at least conceptually, is that it's apparently possible to associate the images of text on the page with actual text for screen readers, translations, or copy-and-paste operations. I wasn't able to figure out HOW, mind you, with the tools at hand, but there's an article on making it happen with handwritten pages and the Internet Archive files are all copy-able, so it's presumably workable.
I might continue messing around, so if anybody has advice on software, processes, and so forth, or warnings about why nobody should ever be using such a thing, I'd like to hear it. Oh, and if anybody wants copies of those files for comparison purposes, I'll post them up somewhere.