Since little things like this percolate in the back of my mind all the time, here's what little thinking I've actually done, which you can feel free to leverage, work around, or ignore as you see fit, Robert (if you'll pardon the familiarity).
The big problem is overcoming a standard (CBR, CBZ) that's "good enough." For the time being, nobody cares about anything other than seeing pages, so all the extra work is adding points of failure without a significant selling point. Add in thousands of existing comics in that format, requiring enormous time and bandwidth to update them, and a project like this is going to sound like "hey, let's all convert our e-mail to Microsoft Word 2015!" to a lot of people.
You also have a bunch of "secondary formats" waiting for their day in the sun, with PDF being obvious and DjVu having a lot of unrealized potential.
That said, before those issues convinced me that I probably wasn't going to change the world, there were a few things that I thought were of some importance and might make it worth converting ten thousand (public domain) comics...eventually.
- Some unique token, like an MD5 hash, that tells me whether two files claim to be the same comic, and that can be compared with the actual contents to determine whether the file has been modified and to authenticate which is the "official scan." That would also make it easier to build a registry of scans, so that there's no mistaking a file's provenance. It would also make it easy to check if the user has already read a particular file, no matter how many times it's downloaded, copied, moved, or renamed.
- Database support, so that, if the XML file hasn't been created, the data can be populated as well as possible from open-access databases like the GCD.
- SVG support, since I imagine comic publication is eventually (eventually!) going to go vector-based; I mean, raster graphics are insane when you literally can't guess at all the screen resolutions your book is going to be read at and pixels can be anything from near-microscopic to the size of a fist, if you zoom in.
- Likewise, a layered image format would also help future-proofing, allowing a publisher to isolate pencils, inks, colors, and so forth, and allow access to each independently. A similar kind of "swapping" would also be nice when pages are repaired by later editors, so that both can be included without breaking the flow of reading, by only showing the version of the page (original or latest) that the reader wants but storing both.
- For small screens, optional panel transitions would be an excellent addition. DC's (at least) digital viewer does this, and it's a nice idea. The Inkscape presentation plug-in Sozi does something like this for vector art; it's obviously not quite enough to steal code or anything, but shows the sort of direction.
http://sozi.baierouge.fr/wiki/en:welcome
- A step further, it might be worth supporting a native equivalent to the "motion comics," which right now are full movies that happen to just be showing comics, which is silly.
- Selectable text without requiring it to be rendered apart from the image, like PDF and DjVu do. OCR probably isn't worth pursuing in this respect (in case nobody typed it in), but is an interesting thought.
- When reading scans of microfiche, one thing I can't do without (that's easy to forget) is a gamma adjustment. I'm sure it's useful in other cases, too, but a lot of the old fiche scans are so dark that they're unreadable without being able to change it.
- Most important is a toolchain that helps the scanner or publisher put the books together. As things stand today, the minimal setup is a scanner, scanning software, and a ZIP program that's probably already sitting on your hard drive. With anything more complicated, we're not teaching anybody to edit XML or measure out the panel shapes. To catch on, it has to be trivial, or it won't get done except in the examples.
- I'd also like to see the tools help the scanner/publisher work with the open databases, reading initial content from them and assisting in correcting or uploading the data where needed. After all, if the work's going to be done, it shouldn't be done more frequently than absolutely necessary.
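To make the token idea from the first item above a little more concrete, here's a rough sketch, not a proposal for the actual format. It assumes the comic is a CBZ (plain ZIP), that pages are identified by an extension list I made up, and it swaps in SHA-256 where I said MD5. The function name `comic_token` is hypothetical. Only the page images are hashed, in name order, so renaming the archive or adding a metadata file leaves the token unchanged.

```python
import hashlib
import zipfile

# Assumed set of page image types; a real format would define its own list.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def comic_token(archive) -> str:
    """Return a hex digest over the page images in a CBZ (ZIP) archive.

    `archive` may be a path or a binary file-like object.
    """
    digest = hashlib.sha256()
    with zipfile.ZipFile(archive) as cbz:
        # Sort by name so the token does not depend on storage order.
        pages = sorted(
            name for name in cbz.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
        for name in pages:
            data = cbz.read(name)
            # Length-prefix each page so page boundaries affect the digest.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

A reader could store the token it computes alongside its "already read" list, and compare it with whatever token the file itself claims in order to spot a modified copy.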
(On that note, one lost-cause wish I have for the future is that archives would start shifting to something like TAR format. Images don't compress well, so there's no gain from using ZIP or RAR; RAR is still proprietary, and ZIP doesn't handle corruption well. Since TAR is just concatenating files, by contrast, it's basically idiot-proof to find all the images that weren't damaged.)
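As a sketch of what I mean by finding the undamaged images: a TAR file is a run of 512-byte blocks, with a checksummed header in front of each file's data. The code below walks the blocks, keeps any regular file whose header checksum still matches, and steps over anything that doesn't. The function names are hypothetical, and it assumes that damage overwrites bytes in place (so the 512-byte alignment survives) and that sizes are stored as plain octal. It does not check the image data itself.

```python
BLOCK = 512  # TAR archives are made of 512-byte blocks


def _parse_octal(field: bytes) -> int:
    """Parse a NUL/space-padded octal header field."""
    return int(field.strip(b"\0 ") or b"0", 8)


def _valid_header(block: bytes) -> bool:
    """True if the block's stored checksum matches its contents."""
    if len(block) < BLOCK:
        return False
    try:
        stored = _parse_octal(block[148:156])
    except ValueError:
        return False
    # The checksum is computed with its own field read as eight spaces.
    computed = sum(block[:148]) + 8 * 32 + sum(block[156:BLOCK])
    return stored == computed


def recover_files(raw: bytes) -> dict[str, bytes]:
    """Return {name: data} for every regular file with an intact header."""
    recovered = {}
    offset = 0
    while offset + BLOCK <= len(raw):
        block = raw[offset:offset + BLOCK]
        if not _valid_header(block):
            offset += BLOCK  # damaged block or padding: try the next one
            continue
        try:
            size = _parse_octal(block[124:136])
        except ValueError:
            offset += BLOCK
            continue
        data_start = offset + BLOCK
        # Type flag "0" (or NUL in old archives) marks a regular file.
        if block[156:157] in (b"0", b"\0"):
            name = block[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
            recovered[name] = raw[data_start:data_start + size]
        # File data is padded up to a whole number of blocks.
        offset = data_start + -(-size // BLOCK) * BLOCK
    return recovered
```

Python's own `tarfile` module has an `ignore_zeros` option that does much the same job; the hand-rolled version is only here to show how little there is to it.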
Heh. And I agree with Jim. As the name of an example file, "Craphound" is...inauspicious.
As I said, take what you'd like from that, and where it's too much, just skip past.