
Author Topic: Digital Comic Metadata Tagging Tool  (Read 11943 times)

Offline ComicTagger

  • DCM Member
  • Posts: 4
  • Karma: 0
Digital Comic Metadata Tagging Tool
« on: April 21, 2013, 10:23:49 PM »
For those who might be interested:

http://code.google.com/p/comictagger/

Quote
ComicTagger is a free, open-source, and multi-platform app for writing metadata to comic archives, written in Python and PyQt.

Features:
 
  • Runs on Mac OSX, Microsoft Windows, and Linux systems
  • Communicates with an online database (Comic Vine) for acquiring metadata
  • Uses image processing to automatically match a given archive with the correct issue data
  • Batch processing in the GUI for tagging hundreds or more comics at a time
  • Reads and writes multiple tagging schemes ( ComicBookLover and ComicRack, with more planned).
  • Reads and writes RAR, Zip, and folder archives (free external tools needed for writing RAR)
  • Rename files based on tag info
  • Convert CBR/RAR files to CBZ/ZIP format
  • Command line interface (CLI) on all platforms, which supports batch operations, and which can be used in native scripts for complex operations.

https://lh3.googleusercontent.com/-ZjqcxDiXZvE/UQCDHvX84jI/AAAAAAAAAE4/11Jjp2NnDCI/s1152/mac1.png

Offline Whale

  • DCM Member
  • Posts: 19
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #1 on: April 22, 2013, 02:10:06 AM »
It would be great if you added support for creating ACBF metadata files inside archives. For more information about the format: https://launchpad.net/acbf
Advanced Comic Book Format project - a file format for electronic comic books. Any suggestions and help appreciated.
ACBF Wiki

Offline ComicTagger

  • DCM Member
  • Posts: 4
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #2 on: April 22, 2013, 10:40:21 AM »
It would indeed be cool. I looked at ACBF in the past, but was a bit overwhelmed by the scope, since it's a different container format as well as containing more than just the standard metadata. I see now (I missed this before) that an ACBF XML file can be embedded in a legacy CBZ file, which is definitely easier to deal with, at least initially.

I had a glance through the XSD, and have a few questions:

1) Since I'm coming at this from the perspective of American-style monthly issue comics, I am looking for fields for a distinct series name (e.g. "Captain Marvel Adventures") vs. the title of a particular issue, and also issue number, volume number and series start date. Is there a place for this info? Or would it all get clumped together in the title?

2) On the flip side, any author information I am starting with may not be broken up into component parts. For example, with "Kelly Sue DeConnick", "Fred Van Lente", "Jock", or "Brian Michael Bendis" it would be impossible to decide programmatically which parts go into the first, middle, and last names. If I were converting from a single string, should it all go into the "nickname" sub-field?

3) For the body section, could I just get away with something like this?

Code: [Select]
<body>
  <page><image href="02.jpg"/></page>
  <page><image href="03.jpg"/></page>
  <page><image href="04.jpg"/></page>
  <page><image href="05.jpg"/></page>
</body>

Offline Whale

  • DCM Member
  • Posts: 19
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #3 on: April 22, 2013, 02:05:39 PM »
Quote
1) Since I'm coming at this from the perspective of American-style monthly issue comics, I am looking for fields for a distinct series name (e.g. "Captain Marvel Adventures") vs. the title of a particular issue, and also issue number, volume number and series start date. Is there a place for this info? Or would it all get clumped together in the title?

The series name goes into the <sequence> tag. One comic book may be part of more than one sequence:
Code: [Select]
<sequence title="Captain Marvel Adventures">1</sequence>
<sequence title="Top 10 Comics in 1941">7</sequence>

There's no series start date, only <publish-date> for a particular comic book. Regarding volumes, you may put the volume (if there is one) into the title, with the issue number inside the element, like:
Code: [Select]
<sequence title="Captain Marvel Adventures Volume 1">1</sequence>
But I usually make one bigger CBZ file which contains all issues from a certain volume, and then I use the page <title> element to create a table of contents which lists all issues inside. That is just my personal preference; I don't like having a lot of separate comic books with just several pages. However, I don't think that the database you are querying organizes comics this way :-)
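
To make the two <sequence> variants above concrete, here is a minimal Python sketch that builds such an element with the standard library. The helper name make_sequence and its volume parameter are hypothetical, and folding the volume into the title text is simply the workaround described above, not something the format requires.

```python
import xml.etree.ElementTree as ET


def make_sequence(series_title, issue, volume=None):
    """Build one <sequence> element (hypothetical helper, not part of any tool)."""
    # The volume has no field of its own here, so it is appended to the title.
    title = series_title if volume is None else f"{series_title} Volume {volume}"
    elem = ET.Element("sequence", {"title": title})
    # The issue "number" is stored as plain text, so "5AU" works as well as 1.
    elem.text = str(issue)
    return elem


seq = make_sequence("Captain Marvel Adventures", 1, volume=1)
print(ET.tostring(seq, encoding="unicode"))
```

A book that belongs to several sequences would simply get one such element per sequence.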

Quote
2) On the flip side, any author information I am starting with may not be broken up into component parts. For example, with "Kelly Sue DeConnick", "Fred Van Lente", "Jock", or "Brian Michael Bendis" it would be impossible to decide programmatically which parts go into the first, middle, and last names. If I were converting from a single string, should it all go into the "nickname" sub-field?

I would rather put name.split(' ')[0] into <first-name>, name.split(' ')[-1] into <last-name> and anything in between into <middle-name>. That would be the best guess, I think. I wouldn't use <nickname> for this.
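
Spelled out as a small Python sketch, the first/last/in-between guess looks like this. The function name split_author_name is hypothetical, and leaving the last name empty for a single-word name such as "Jock" is an assumption of this sketch (the literal split would put the same word into both fields).

```python
def split_author_name(full_name):
    """Guess (first, middle, last) from one string; a best guess only."""
    parts = full_name.split()
    if not parts:
        return ("", "", "")
    if len(parts) == 1:
        # Assumption: a single-word name goes into the first name only.
        return (parts[0], "", "")
    # First word, everything in between, last word.
    return (parts[0], " ".join(parts[1:-1]), parts[-1])


for name in ["Kelly Sue DeConnick", "Fred Van Lente", "Jock", "Brian Michael Bendis"]:
    print(name, "->", split_author_name(name))
```

Note that "Fred Van Lente" comes out with "Van" as a middle name, so the result still needs a manual check.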

Quote
3) For the body section, could I just get away with something like this?

Code: [Select]
<body>
  <page><image href="02.jpg"/></page>
  <page><image href="03.jpg"/></page>
  <page><image href="04.jpg"/></page>
  <page><image href="05.jpg"/></page>
</body>

That is correct. The "href" attribute may also contain directories in the path, if there are directories in the CBZ file, like:
Code: [Select]
<body>
  <page><image href="Chapter One/02.jpg"/></page>
  <page><image href="Chapter One/03.jpg"/></page>
  <page><image href="Chapter One/04.jpg"/></page>
  <page><image href="Chapter One/05.jpg"/></page>
  <page><image href="Chapter Two/01.jpg"/></page>
  <page><image href="Chapter Two/02.jpg"/></page>
</body>

You may even consider putting a <title> element inside the first page in each directory. The <title> tag is used to create the table of contents:
Code: [Select]
<body>
  <page><title>Chapter One</title><image href="Chapter One/02.jpg"/></page>
  <page><image href="Chapter One/03.jpg"/></page>
  <page><image href="Chapter One/04.jpg"/></page>
  <page><image href="Chapter One/05.jpg"/></page>
  <page><title>Chapter Two</title><image href="Chapter Two/01.jpg"/></page>
  <page><image href="Chapter Two/02.jpg"/></page>
</body>

Take a look at the NYC2123-Dayender comic, which makes use of directories: http://fictionbook-lib.org/showbook.php?id=219
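
A body like the ones above could be generated from the list of image paths in an archive. The following Python sketch assumes the paths are already sorted in reading order and use "/" as the separator (as zipfile.ZipFile.namelist() returns them); the function name build_body is hypothetical, and using the directory name as the <title> text is just one possible choice.

```python
import posixpath
import xml.etree.ElementTree as ET


def build_body(image_paths):
    """Build a <body> with one <page> per image, in the given order."""
    body = ET.Element("body")
    previous_dir = None
    for path in image_paths:
        page = ET.SubElement(body, "page")
        directory = posixpath.dirname(path)
        # Add a <title> to the first page of each directory so a reader
        # can build a table of contents from it.
        if directory and directory != previous_dir:
            ET.SubElement(page, "title").text = directory
        previous_dir = directory
        ET.SubElement(page, "image", {"href": path})
    return body


paths = [
    "Chapter One/02.jpg",
    "Chapter One/03.jpg",
    "Chapter Two/01.jpg",
    "Chapter Two/02.jpg",
]
print(ET.tostring(build_body(paths), encoding="unicode"))
```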

Offline ComicTagger

  • DCM Member
  • Posts: 4
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #4 on: April 23, 2013, 11:09:20 AM »
Thanks for the info.

Quote
Series name goes into <sequence> tag.

1. I see the sequence tag now. Very clever! Are non-numerics a problem there? There can often be some very strange issue "numbers", as in the recent "Fantastic Four #5AU" (http://www.comicvine.com/fantastic-four-5au-the-death-of-the-family-richard/4000-395247/)

2. Regarding volume number, it's true that Comic Vine doesn't store them, but it's info that the indicia of serialized comics commonly had in the past, such as "Avengers No. 4 Vol. 3". They're hard to rely on in cataloging databases, since they're not consistently used, and I guess that's why Comic Vine and GCD rely heavily on start year instead. Regardless, volume numbers often show up in filenames, and the ComicTagger app tries to parse that info out.

(As a feature request for ACBF: It might be nice to see optional attributes of the sequence tag to allow for "volume_number" and "start_year".  Also maybe a boolean attribute "primary"?)
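
As a rough illustration of pulling a volume number out of a filename, here is a small Python sketch. It is not ComicTagger's actual parser; the pattern, the function name guess_volume, and the example filenames are all assumptions made for illustration, and real-world filenames are far less regular.

```python
import re

# Matches "Vol. 3", "Vol 3", "Volume 3" or "v3" (case-insensitive).
VOLUME_PATTERN = re.compile(r"\b(?:vol(?:ume)?\.?|v)\s*(\d+)\b", re.IGNORECASE)


def guess_volume(filename):
    """Return the volume number found in a filename, or None if there is none."""
    match = VOLUME_PATTERN.search(filename)
    return int(match.group(1)) if match else None


for name in ["Avengers Vol. 3 No. 4.cbz", "Avengers v3 004.cbz",
             "Captain Marvel Adventures 001.cbz"]:
    print(name, "->", guess_volume(name))
```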

3. Splitting names like you suggest is possible, but it essentially creates information about the name boundaries and can often be wrong. For example, in the cases I mentioned, "Van Lente" is the last name, and (I think) "Kelly Sue" is a first name. I guess this is not so much of an issue for ComicTagger when converting from ACBF to other tag formats, as it all flattens out. But going the other way, if the ACBF reader is using the "last name" tag for sorting, it could run into problems. (I like how the Calibre app has a field for author name, and then a transformed version for sorting, e.g. "Fred Van Lente" and "Van Lente, Fred". It's a bit redundant, but no info is lost if a programmatically generated sorting name is wrong.)

I probably won't get around to adding this soon, but I definitely have ACBF on the radar now. I imagine that in the case of a non-existent ACBF file, CT would create a minimal version with only a metadata section and a very basic body section, listing only the order of images. If the file already exists, trying to preserve all other data will present a small challenge with the current design, as the other formats it knows about are fully converted internally, and I don't know if that would be worth it for ACBF. Nothing insurmountable, but worth taking slowly, when I get around to it.

-------------------
On a related topic, I know it's a chicken-and-egg thing trying to get a standard in use. What sort of acceptance/use of ACBF is there now? Are there any third-party readers or third-party content generation?
« Last Edit: April 23, 2013, 11:13:15 AM by ComicTagger »

Offline Whale

  • DCM Member
  • Posts: 19
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #5 on: April 23, 2013, 12:56:44 PM »
Quote
1. I see the sequence tag now. Very clever! Are non-numerics a problem there? There can often be some very strange issue "numbers", as in the recent "Fantastic Four #5AU" (http://www.comicvine.com/fantastic-four-5au-the-death-of-the-family-richard/4000-395247/)

Non-numeric is OK. There's no constraint for data type.

Quote
2. Regarding volume number, it's true that Comic Vine doesn't store them, but it's info that the indicia of serialized comics commonly had in the past, such as "Avengers No. 4 Vol. 3". They're hard to rely on in cataloging databases, since they're not consistently used, and I guess that's why Comic Vine and GCD rely heavily on start year instead. Regardless, volume numbers often show up in filenames, and the ComicTagger app tries to parse that info out.

(As a feature request for ACBF: It might be nice to see optional attributes of the sequence tag to allow for "volume_number" and "start_year".  Also maybe a boolean attribute "primary"?)

Good idea. I may add that in the future.

Quote
3. Splitting names like you suggest is possible, but it essentially creates information about the name boundaries and can often be wrong. For example, in the cases I mentioned, "Van Lente" is the last name, and (I think) "Kelly Sue" is a first name. I guess this is not so much of an issue for ComicTagger when converting from ACBF to other tag formats, as it all flattens out. But going the other way, if the ACBF reader is using the "last name" tag for sorting, it could run into problems. (I like how the Calibre app has a field for author name, and then a transformed version for sorting, e.g. "Fred Van Lente" and "Van Lente, Fred". It's a bit redundant, but no info is lost if a programmatically generated sorting name is wrong.)

ACBF Viewer internally uses the whole name (first_name + middle_name + last_name) in its library for searching and sorting, so it doesn't really matter to it if the name is split incorrectly. But some future program may use these elements in other ways. If the external database does not distinguish the parts of author names, then what I proposed above may be the best guess. The user may correct this manually later, and there are not many authors with middle names to complicate things much. You may also add some rules, e.g. assume that "Van" in the middle is always part of the last name.
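
A rule like the "Van" one could be sketched in Python as follows. The particle list and the function name split_with_particles are assumptions for illustration only; the list is certainly incomplete, and names such as "Kelly Sue DeConnick" are still split the same way as before.

```python
# Hypothetical, incomplete list of words treated as part of a last name.
LAST_NAME_PARTICLES = {"van", "von", "de", "der", "den", "la", "le", "di", "da"}


def split_with_particles(full_name):
    """Guess (first, middle, last), keeping particles with the last name."""
    parts = full_name.split()
    if len(parts) < 2:
        return (full_name.strip(), "", "")
    first, rest = parts[0], parts[1:]
    # Start with the final word as the last name, then walk backwards
    # while the preceding word is a known particle.
    last_start = len(rest) - 1
    while last_start > 0 and rest[last_start - 1].lower() in LAST_NAME_PARTICLES:
        last_start -= 1
    middle = " ".join(rest[:last_start])
    last = " ".join(rest[last_start:])
    return (first, middle, last)


print(split_with_particles("Fred Van Lente"))
print(split_with_particles("Brian Michael Bendis"))
```

With this rule "Fred Van Lente" gets "Van Lente" as the last name, while names without a listed particle are split exactly as in the plain first/middle/last guess.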

Quote
I probably won't get around to adding this soon, but I definitely have ACBF on the radar now. I imagine that in the case of a non-existent ACBF file, CT would create a minimal version with only a metadata section and a very basic body section, listing only the order of images. If the file already exists, trying to preserve all other data will present a small challenge with the current design, as the other formats it knows about are fully converted internally, and I don't know if that would be worth it for ACBF. Nothing insurmountable, but worth taking slowly, when I get around to it.

Ah, OK. If you convert everything to an internal format first, then it will be a bit of a problem. But loading the whole existing XML tree and modifying some elements in it shouldn't be a big deal.
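
A minimal Python sketch of that "load the tree, change a few elements, write it back" approach, using only the standard library. The function name set_issue_number is hypothetical, and the sample document in the demo is a made-up fragment rather than a complete ACBF file. It matches elements by local name so it works whether or not the root declares a default namespace. Note that ElementTree drops XML comments by default, so this preserves elements and attributes, not every byte of the original.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def set_issue_number(xml_text, series_title, issue):
    """Update the text of the matching <sequence>, leaving everything else alone."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # Keep the root's namespace as the default one on output instead of
        # an auto-generated "ns0:" prefix (this changes global ET state).
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for elem in root.iter():
        if local_name(elem.tag) == "sequence" and elem.get("title") == series_title:
            elem.text = str(issue)
    return ET.tostring(root, encoding="unicode")


# Made-up fragment for illustration only.
sample = '<doc><sequence title="Captain Marvel Adventures">1</sequence><other>keep me</other></doc>'
print(set_issue_number(sample, "Captain Marvel Adventures", 2))
```

Reading the XML out of a CBZ and writing it back could be done with the zipfile module around this function.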

Quote
On a related topic, I know it's a chicken-and-egg thing trying to get a standard in use. What sort of acceptance/use of ACBF is there now? Are there any third-party readers or third-party content generation?

Officially, no. I know of some people using it. It was even quoted in a scientific paper :-) One guy joined the project some time ago and made a Linux/Unix command-line tool to create ACBF files. I started to work on ACBF in December 2011, so it's been just about a year and a half. Since then I have converted some Creative Commons comics to ACBF and have been working on ACBF Viewer. From what I know, ACBF Viewer is currently only in the Arch Linux and Gentoo repositories (I'm using Ubuntu :-). I also created the ACBF Wiki only a couple of days ago: http://acbf.wikia.com
I have plans for ACBF Editor (the basic program structure is already in the repository), but I haven't had much time to work on it, and now that I've found out about your program I thought it would be easier and quicker to add ACBF support to it. For ACBF Editor I wanted functionality similar to your ComicTagger, plus some ACBF-specific features like adding a table of contents, importing panel and text layer definitions from SVG files, etc. Currently, for creating ACBF metadata files I'm using a simple spreadsheet where I have my comic books catalogued, with a special column containing ACBF XML created from the other fields (http://acbf.wikia.com/wiki/ComicsList). But having a tool like ComicTagger would be much better :-)
« Last Edit: April 23, 2013, 01:27:38 PM by Whale »

Offline ComicTagger

  • DCM Member
  • Posts: 4
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #6 on: April 25, 2013, 12:54:40 PM »
I feel like I'm going to take a break from working on CT for a while myself. I don't know if I want to add full ACBF support to it, as even the metadata addition adds complexity: properly supporting ACBF means handling newer tags with different structures from the others (e.g. multiple series/sequences). I will get to it eventually, though.

Feel free to plunder any parts of ComicTagger you want for your editor. 

Offline Whale

  • DCM Member
  • Posts: 19
  • Karma: 0
Re: Digital Comic Metadata Tagging Tool
« Reply #7 on: April 26, 2013, 01:11:17 AM »
Quote
Feel free to plunder any parts of ComicTagger you want for your editor.

OK, thanks. It looks like the Apache license is compatible with the GNU GPL version 3 used in ACBF Viewer and Editor: http://www.apache.org/licenses/GPL-compatibility.html
I will eventually use some of your code in ACBF Editor, as it is written in Python as well.