My system to archive images (photos of people)

keywords: Gauche Scheme, PostgreSQL, ImageMagick, GTK+

download: in my download area get

summary: I wrote this system to keep photos both on the file system (FS) and in a database (DB). The aim is to put as much information as possible into a fast DB (using triggers, for example) and to have the DB invoke operations on the FS part, while at the same time keeping the photos accessible as files categorized in a tree of directories, with flexible file names. Synchronization is optimized.

Why I don't trust putting the photos themselves in the DB

techniques which I adopted ....

main components

How to synchronize 2 sets of objects (a set of files and a set of records/tuples in DB)

assumption: we have 2 sets A and B, which once had a bijection between them. In fact, 2 bijections were (and are) maintained: one by contents (file contents vs. one part of the tuple) and one by name (filename vs. another part of the tuple).

Problem to solve: files have been added to A, removed from A, and permuted within A (renamed). Now apply the same modifications to B.

First make a mapping between the 2 sets: (by contents)

This can be done lazily (think of topology): you want to distinguish between the objects in set A only as much as necessary. Looking up the image dimensions in a file costs less than computing the file's MD5 hash. Also, if we keep an image hash in the DB as well as the MD5, a file that has the matching MD5 also has that image hash, so the image hash need not be recomputed.
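A minimal sketch of the lazy matching idea, in Python rather than the Scheme of the real programs. Everything here is hypothetical illustration: the dict fields ("side", "name", "data", "id", "size", "md5"), the function names, and the use of byte length as a stand-in for the image dimensions. It also assumes that a group left with exactly one file and one record is accepted as a match without further checks.

```python
import hashlib
from collections import defaultdict

calls = {"md5": 0}  # counts how often the expensive key is computed for a file


def cheap_key(item):
    """Cheap discriminator: byte length (stand-in for image dimensions)."""
    return item["size"] if item["side"] == "db" else len(item["data"])


def md5_key(item):
    """Expensive discriminator: MD5 of the contents (stored value for DB records)."""
    if item["side"] == "db":
        return item["md5"]
    calls["md5"] += 1
    return hashlib.md5(item["data"]).hexdigest()


def match_lazily(files, records, keys):
    """Pair files with records, applying keys from cheapest to most expensive.

    A key is evaluated only inside groups that the cheaper keys could not
    resolve, i.e. groups holding files and records but more than one pair.
    """
    groups = [list(files) + list(records)]
    for key in keys:
        refined = []
        for group in groups:
            n_files = sum(1 for it in group if it["side"] == "file")
            n_records = len(group) - n_files
            if n_files == 0 or n_records == 0 or len(group) == 2:
                refined.append(group)  # nothing left to disambiguate
                continue
            buckets = defaultdict(list)
            for it in group:
                buckets[key(it)].append(it)
            refined.extend(buckets.values())
        groups = refined
    pairs = {}
    for group in groups:
        fs = [it for it in group if it["side"] == "file"]
        rs = [it for it in group if it["side"] == "db"]
        if len(fs) == 1 and len(rs) == 1:
            pairs[fs[0]["name"]] = rs[0]["id"]
    return pairs
```

Called as match_lazily(files, records, [cheap_key, md5_key]), a file whose size is unique among the records is paired without ever being hashed; only files that collide on the cheap key pay for an MD5.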

Once we have a mapping (files -> tuples, or vice versa) and some (canonical) inverse, their composition gives us a permutation of an extended set.

The extended set is the union of set A and the image of the canonical mapping from B into the "domain" of set A (filenames).

To do the final movement, we decompose the permutation into cycles, and the cycles into transpositions (with the help of a temporary location).
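The movement step could look roughly like this in Python. This is a sketch under assumptions, not the code of sync.scm: plan_renames, apply_renames and the temporary name "__tmp__" are made up for illustration, and the "file system" is just a dict of name -> contents. Instead of splitting each cycle into explicit transpositions, the sketch rotates each cycle through the single temporary name, which has the same effect with fewer renames. Names that are only targets (the part of the extended set outside A) end chains rather than cycles, so chains are handled first, starting from their free end.

```python
def plan_renames(moves, temp="__tmp__"):
    """Return an ordered list of (src, dst) renames that realises `moves`.

    `moves` maps current name -> desired name and must be injective.
    No step ever overwrites a name that is still in use.
    """
    pending = {s: d for s, d in moves.items() if s != d}
    if len(set(pending.values())) != len(pending):
        raise ValueError("two files map to the same target name")
    source_of = {d: s for s, d in pending.items()}
    steps = []

    # Chains: a target that no pending file occupies is free right away.
    for free in [d for d in pending.values() if d not in pending]:
        while free in source_of:
            src = source_of.pop(free)
            del pending[src]
            steps.append((src, free))
            free = src  # the name just vacated becomes the next free slot

    # Whatever is left consists of pure cycles; open each one with `temp`.
    while pending:
        start = next(iter(pending))
        steps.append((start, temp))
        hole = start
        while source_of[hole] != start:
            src = source_of.pop(hole)
            del pending[src]
            steps.append((src, hole))
            hole = src
        del source_of[hole]
        del pending[start]
        steps.append((temp, hole))  # close the cycle
    return steps


def apply_renames(fs, steps):
    """Replay the steps on a dict standing in for the directory."""
    for src, dst in steps:
        if dst in fs:
            raise RuntimeError("rename would overwrite " + dst)
        fs[dst] = fs.pop(src)
    return fs
```

With real files, each (src, dst) pair would become one rename call, and the temporary name would have to be chosen so that it cannot collide with any archived file.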

Keeping backups

Once again, we construct mappings, then decompose them into transpositions, additions and deletions.
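As an illustration only, the split into additions, deletions and moves can be sketched in Python as below. The function name diff_by_content and the representation (a dict from a content key, such as an MD5, to a file name on each side) are assumptions of this sketch, not the actual backup code.

```python
def diff_by_content(source, backup):
    """Compare two dicts mapping content key -> file name.

    Returns (additions, deletions, moves):
      additions: content key -> name, present in source but not in backup
      deletions: content key -> name, present in backup but not in source
      moves:     old backup name -> new name, same contents, different name
    """
    additions = {k: n for k, n in source.items() if k not in backup}
    deletions = {k: n for k, n in backup.items() if k not in source}
    moves = {
        backup[k]: source[k]
        for k in source.keys() & backup.keys()
        if backup[k] != source[k]
    }
    return additions, deletions, moves
```

One plausible order on the backup side is deletions first, then the moves (which form a partial permutation of names and may contain cycles), then the additions, so that no step runs into a name that is still occupied.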

---

implementation details:

FS layout

{root}/ ..../{person-number}/{category}/{number}{file name}.{mime extension}
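To make the layout concrete, here is a hedged Python sketch that reads the coordinates back from the last three path components. The names PhotoCoordinates and parse_photo_path are invented for this example, the part of the path between the root and the person number is deliberately ignored, and the sketch assumes that {number} is the leading run of digits in the file name (so a free-form name that itself starts with a digit would be ambiguous).

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class PhotoCoordinates(NamedTuple):
    person: int
    category: str
    number: int
    name: str
    extension: str


# {number}{file name}.{mime extension}; the number is assumed to be the leading digits
_FILE_RE = re.compile(r"(\d+)(.*)\.([^.]+)")


def parse_photo_path(path) -> Optional[PhotoCoordinates]:
    """Return the coordinates encoded in a path, or None if it does not fit."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        return None
    person, category, filename = parts[-3:]
    match = _FILE_RE.fullmatch(filename)
    if not person.isdecimal() or match is None:
        return None
    return PhotoCoordinates(
        person=int(person),
        category=category,
        number=int(match.group(1)),
        name=match.group(2),
        extension=match.group(3),
    )
```

Paths that do not fit the pattern give None, which a tree walker could use to skip files that do not belong to the archive.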

DB layout

what we keep about each file:

commands:

common options

For walking the FS tree, we have this set of arguments:

-r root

to limit the person-id to an interval, specify minimum and/or maximum:

--from -f {number}

--to -t {number}

or, in terms of thousands:

-F {thousand}

-T {thousand}

For executing a query:

-q "select ....;"

examples:

programs:

refresh_thumbnails.scm
Checks whether 'derived' files need updating. Derived files are various types of thumbnails.
sync.scm
Synchronizes FS and DB for a given person.
import.scm
Imports new files into the FS+DB archive. This is done either from an external tree (similar to the one described in FS layout), or from suitably named files, or from any file when we specify the coordinates as command options.

alternatives to ImageMagick:

http://www.graphicsmagick.org/

>> Open Source Initiative and is compatible with the GPL. GraphicsMagick is originally derived from ImageMagick 5.5.2. Since the branch from ImageMagick, many improvements have been made (see news) by many authors using an open development model.

The RAW Flaw

http://www.luminous-landscape.com/essays/raw-flaw.shtml

Nikon D70 under Linux

How to compare (non-identical) images:

usage: compare 2 videos .avi and .mpeg

making a fast image browser (viewer): see image-viewers