A lightning overview.
| 1. Scan the photographs |
| 2. Generate XML from the file list of images |
| 3. Generate an html helper file to run some javascript to ... |
| 4. Generate a title and name the people in each image |
| 5. Generate a list of people shown (brief names) |
| 6. Name each person fully from the brief names |
| 7. List the size of each image |
| 8. Generate an html file of all the photographs |
| 9. Generate an html file listing each person and the html files in which they are shown |
Easy as that!
1. Start by scanning the photographs.
Generally photographs are in albums, envelopes or other containers. I found it helpful to retain such a structure, so each time I started a new container I named a directory after that container (no spaces in the directory name, that just makes it harder to process) and scanned all that group of photos into that directory.
For each photograph, I used a naming convention based on names. For photographs containing mainly people, I named the files according to the people shown, so I might have a file called 'catherineMichaelDavid.jpg'. Note the use of capital letters, sometimes called camelCase, which aids a little when viewing a list of files. Again, no spaces please. For photographs showing places I similarly named the place (or object) shown, so I might have lumbInRossendale.jpg as a filename. For files which need a little more information I sometimes extend this model by using a full stop as a separator. So I might have a file bill.mates.jpg, being a photo of Bill (known) and his mates (unknown). Again, make your rules and stick to them; you'll find it helpful later.
The really helpful rule is to be consistent in how you name them and where you store them. The rule that could break this process is creating files and directories with odd characters or spaces in the name. "Fred and William & their dog.jpg" is not a lot of use at all for automated processing. The only other advice, mainly due to a limitation of the software I've written, is to start with a clean directory and store nothing else (other than the scanned images) in it (or them).
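The naming rule can be checked mechanically before going any further. The sketch below is not part of the supplied software; it assumes a rule of letters, digits and full stops only, followed by a .jpg or .png extension, and the names SAFE_NAME, is_safe_name and unsafe_names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed rule (not from the supplied software): letters and digits,
# optionally separated by full stops, then a .jpg or .png extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9]+(\.[A-Za-z0-9]+)*\.(jpg|png)$")


def is_safe_name(name):
    """Return True if a file name follows the assumed naming rule."""
    return bool(SAFE_NAME.match(name))


def unsafe_names(directory):
    """List the files under `directory` (recursively) that break the rule."""
    return sorted(
        str(path)
        for path in Path(directory).rglob("*")
        if path.is_file() and not is_safe_name(path.name)
    )
```

Calling unsafe_names(".") from the top directory should return an empty list if every scanned file is safe to process.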
I've shown jpeg being used when scanning the photographs. My scanner and software let me store files in various formats. Choose one that can be handled by your readers' browsers. If you don't know/care, choose jpg. It makes for slightly smaller file sizes than PNG format. Your choice. Again, whether you use png or jpg file types, stick to it. Don't mix image types in the same folder, they will be missed out!
Scanning parameters. Again dependent on your scanner and software. I chose 300 dots per inch. Mainly because it is the default for the software I use. Again, your choice.
Develop the directory full of graphics files (.jpg) to a directory full of xml files.
I have a program which lists each file in a directory and creates an xml structure from that list. It recurses down into all sub-directories, so this is another reason to start off with a clean directory and load your scanned images into it. The output might look like Example 1.1
Example 1.1. Initial XML
<?xml version="1.0" encoding="UTF-8"?>
<pics base="/photos/pawson/old">
<file name="emily.html"/>
<file name="larkfieldTerrace.html"/>
<pics base="/photos/pawson/old/album1">
<file name="billCarrTeam.jpg"/>
<file name="bill.mates.jpg"/>
....
The software writes in the name of the directory from which the files originate, but later manual processing can easily remove this.
The program is called dir2xml.py and requires two input parameters
-d the input directory
-o the output file name
For example
>python utils/dir2xml.py -d . -o pawson.xml
This creates a file such as the one above (name it as you like - mine contains photographs of the Pawsons going back a hundred years or so, hence the name). The emphasised line shows the basis of what will become a full entry (as in Example 1.2). For each photograph, we now need to give it a title and other comments, then identify the people (that we know) in each one.
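For readers curious about what the directory listing step involves, here is a minimal sketch. It is not the dir2xml.py supplied with the software; the function names are hypothetical, and it assumes that each sub-directory becomes a pics element nested inside its parent.

```python
import os
import xml.etree.ElementTree as ET


def dir_to_xml(directory, extensions=(".jpg",)):
    """Build a <pics> element for `directory`, recursing into sub-directories.

    Assumption: sub-directories are nested inside the parent <pics> element.
    """
    pics = ET.Element("pics", base=directory)
    entries = sorted(os.listdir(directory))
    # First the image files in this directory ...
    for name in entries:
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(extensions):
            ET.SubElement(pics, "file", name=name)
    # ... then one nested <pics> element per sub-directory.
    for name in entries:
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            pics.append(dir_to_xml(path, extensions))
    return pics


def write_listing(directory, output):
    """Write the listing for `directory` to the file `output`."""
    tree = ET.ElementTree(dir_to_xml(directory))
    tree.write(output, encoding="UTF-8", xml_declaration=True)
```

For example, write_listing(".", "pawson.xml") would list the current directory. Anything that is not an image is skipped, but a clean directory is still the safer starting point.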
Example 1.2. The objective
<file name="IreneFredCatherine.jpg">
<title>Irene &amp; Fred with Catherine </title>
<person nm="irenesmith" coords="97,175,125,202"/>
<person nm="fredemmett" coords="127,135,162,175"/>
<person nm="catherinejackson" coords="180,165,210,197"/>
<desc><para>The back of the house</para></desc>
<place>Larkfield Terrace. Keighley</place>
<date>C 1950</date>
</file>
This shows a completed entry. What has yet to be added are the photograph title (for readers to view), a list of people within the photograph plus any ancillary information about the image (description, place and date, if known). The next stage helps in the creation of this information
First a little detail about the elements being created.
The <title> element is the title that will be shown above the image in the finished work. Keep it moderately brief.
The person element holds two attributes: the name of
the person and the coordinates of that person within the
photograph. The 'name' attribute (nm) is
the name of a person shown in the photograph (where known - no need to
identify unknown people). Again I used a convention here. The basics
are to create a single word 'identifier' for a person, e.g. johnsmith,
marywhitehouse, johnsmith.bradford etc. The names must be unique
across all photographs, and you should always use the same name for the same
person.
Always use maiden names where known, simply for consistency.
Use the married name if that is all that's known.
Avoid non alpha characters if you can.
Use all lowercase (reduces errors, easier to remember)
No spaces in names!
Fallback value is whatever name you know the person by! nm='billX'
Resolve duplicates using common sense. johnsmith.london and johnsmith.glasgow etc.
Note: For my family album, I 'identified' faces that I thought I
would be able to identify (from relatives etc). I created two
'generic' unknown people. I called them
The reason for this is simply ease of processing. This is an 'identifier' for that person. A later stage will allow you to provide the full name which is accessed using this key or identifier.
As examples, both good and bad (for use in processing that is)
| davepawson OK |
| dave pawson Not OK (contains space) |
| DavePawson Not OK... just harder to remember when you come across this face again |
| DavePawson. Not OK. You'll need to remember the period! |
| marysmith. Could be OK. Is this the lady's maiden name (if known)? Just a way of ensuring consistent usage, so that I didn't find both Mary Smith and Mary Jones as two separate people in the list! Of course if you don't know Mary's maiden name then there is no room for confusion |
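The good and bad examples above can be turned into a small check. This is only a sketch under an assumed rule (lowercase letters, with an optional full stop followed by a qualifier such as a town); it is stricter than the fallback advice, so a name like billX would be flagged. The names IDENTIFIER and is_valid_identifier are invented for this illustration.

```python
import re

# Assumed rule: lowercase letters only, optionally followed by one or more
# ".qualifier" parts (e.g. johnsmith.london). No spaces, no trailing full stop.
IDENTIFIER = re.compile(r"^[a-z]+(\.[a-z]+)*$")


def is_valid_identifier(nm):
    """Return True if `nm` follows the assumed identifier convention."""
    return bool(IDENTIFIER.match(nm))


if __name__ == "__main__":
    for candidate in ["davepawson", "dave pawson", "DavePawson", "johnsmith.london"]:
        verdict = "OK" if is_valid_identifier(candidate) else "Not OK"
        print(candidate, verdict)
```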
<desc> is for any descriptive information about the photograph. This can be captured in multiple <para> elements or simply entered as plain text. <desc> comments are optional.
<place> describes a location. If you use a regular form for places (say just town names) then it will be possible to list all the places contained in the photographs. <place> comments are optional.
<date> is the date (if known) at which the photograph was taken. Be as accurate with dates as you need. As with places, if you use a regular format (2008 July 31 or simply 1904) then that data will be available for sorting. It may be worthwhile thinking about this before you start the laborious work.
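To show why a regular date format pays off, here is a sketch of a sort key for the two formats mentioned (year month day, or just a year). It also assumes that a leading "C" marks an approximate date, as in "C 1950". None of this is part of the supplied software, and date_sort_key is a made-up name.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def date_sort_key(text):
    """Turn '2008 July 31', '1904' or 'C 1950' into a sortable tuple.

    Missing month or day values sort first, as 0.
    """
    parts = text.split()
    # Assumption: a leading 'C' (circa) marks an approximate date.
    if parts and parts[0].lower() in ("c", "c."):
        parts = parts[1:]
    year = int(parts[0])
    month = MONTHS.index(parts[1].lower()) + 1 if len(parts) > 1 else 0
    day = int(parts[2]) if len(parts) > 2 else 0
    return (year, month, day)
```

With dates in a regular form, sorted(dates, key=date_sort_key) puts them in chronological order. A free-form date such as "sometime after the war" would raise an error here, which is the point of agreeing a format first.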
Now it is ready to be expanded to contain all the detail. This is described below
Just one thing to do before adding detail. You'll notice in
pawson.xml that the base attribute on the pics element is
absolute. E.g. /photos/pawson/.... etc. This is not good if you intend
to give the photographs on a CD to another person. To manage this all
the directory names need to be made relative to some basic location on
disk. In the example above, I chose
/photos/pawson as the base. All that is needed is
to make all the directories relative to this. With a text editor
simply search for "/photos/pawson/" and replace it with an empty
string. The file will look like this afterwards
<?xml version="1.0" encoding="UTF-8"?>
<pics base="old">
<file name="emily.html"/>
<file name="larkfieldTerrace.html"/>
<pics base="old/album1">
<file name="billCarrTeam.jpg"/>
<file name="bill.mates.jpg"/>
which is just what is wanted to stop me getting into trouble with my sister when the 'file not found' error message shows!
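If you would rather not do the search and replace by hand, it can be scripted. The sketch below only touches base attributes that start with the given prefix; the function names are hypothetical and the prefix is the one from my example, so change it to suit your own layout.

```python
def make_relative(xml_text, prefix="/photos/pawson/"):
    """Strip `prefix` from the start of every base="..." attribute."""
    return xml_text.replace('base="' + prefix, 'base="')


def make_file_relative(filename, prefix="/photos/pawson/"):
    """Rewrite `filename` in place with relative base attributes."""
    with open(filename, encoding="utf-8") as handle:
        text = handle.read()
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(make_relative(text, prefix))
```

Keep a copy of the xml file before running make_file_relative("pawson.xml"), since it overwrites the original.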
This is to add the data to Example 1.1 to generate Example 1.2. Two steps are involved.
1. identify the people in the photograph and provide a title for the photograph.
2. Give the full name to the people based on the Identifiers that you use.
It's easy after that. As I developed this I actually worked on each photograph a number of times. I'm trying to reduce that this time. Clearly we can't provide full names until we know who is in the photographs, so that is the first process step.
If you look in the zipped up software, you'll see identify.html which is there for you to practice, if you need it, prior to working on your own XML file
To see how this is used, try the resulting HTML produced on the output side! This is in file whois.html. Also provided here as a working example
It's a partially automated process. In outline:
| 1. change directory to each of the directories containing photographs, in turn. |
| 2. Run a small program (identify.sh) which generates an html file for each photograph in the current directory, containing links to the image with the same filename (but different extension). |
| 3. Open each html file in turn in the browser to show the image |
| 4. In the text box, enter a title for the photograph and optionally description and place, if known |
| 5. Pick out each person in the photograph, using the mouse |
| 6. Add the names of all the people you want to identify (in the way mentioned in Identifiers above) in the appropriate box. Repeat until finished. |
| 7. Copy and paste the title, names etc into the xml file. |
| 8. When all the photographs in this directory are done, delete all the html files (no longer needed) in this directory |
| 9. Repeat for the other directories containing images. |
Yes, quite a bit to do and quite tedious. Wait until you're feeling calm before you do this! It helps if you are able to identify the people or have names for the people in the photographs.
To help you along, there is an html file identify.html and an accompanying photograph (3children.jpg) to use for practice
Take each step at a time. For the purposes of this document I will assume a few directories underneath a root directory
Select a directory.
> cd /photos/pawson/old (or whichever directory you have used)
3.1.2. Generate the html files to work on
>identify.sh
This bash shell script creates, for each jpeg image, an html file of the same name with a .html extension. Each file uses javascript to help collect information about the photograph.
Enter the title for the photograph in the 'title' box, then, for each person in the photograph,
1. Enter their name in the 'name' box (as outlined in 2.2 above)
2. Pick out their head, by making a box round it - just top left and bottom right is enough. These details will then appear in the text box.
Warning: The javascript is expecting pairs of points (top left, bottom right) for each person. If you click the mouse once more, it will mess up the processing. If that happens, just re-load the file into the browser and start again.
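The pairing of clicks can be pictured with a short sketch. This is written in Python rather than javascript and is not the code in the helper page. It assumes each person is marked by exactly two clicks, and it sorts the corners so that the order of the two clicks does not matter (the real page may not do that). The function names are invented.

```python
def clicks_to_coords(clicks):
    """Pair up (x, y) clicks into 'left,top,right,bottom' strings."""
    if len(clicks) % 2 != 0:
        raise ValueError("expected two clicks per person")
    boxes = []
    for (x1, y1), (x2, y2) in zip(clicks[0::2], clicks[1::2]):
        # Sorting the corners is an assumption, made for robustness.
        boxes.append("%d,%d,%d,%d" % (min(x1, x2), min(y1, y2),
                                      max(x1, x2), max(y1, y2)))
    return boxes


def person_element(nm, coords):
    """Format one person entry for pasting into the xml file."""
    return '<person nm="%s" coords="%s"/>' % (nm, coords)
```

An odd number of clicks raises an error here, which mirrors the reason for reloading the page after a stray click.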
Keep on adding names and the 'output' will grow.
When done, copy all the information from the text box. It will look something like
<title>Irene &amp; Fred with Catherine </title>
<person nm="irenesmith" coords="97,175,125,202"/>
<person nm="fredemmett" coords="127,135,162,175"/>
<person nm="catherinejackson" coords="180,165,210,197"/>
<desc></desc>
Yes, there is a twist there. Two characters you must not type in 'directly' are & (ampersand) and < (the less than sign). If you want to enter these, they need to be 'escaped', as it's called. If you want &, enter &amp; (yes, I know it includes the ampersand!). If you want <, enter &lt;. This is simply because it's XML.
Then copy it into the appropriate place in the xml file generated above in section 2. That might make the part of the file look like this
<pics base="/photos/pawson/old/album1">
<file name="ireneFredCatherine.jpg">
<title>Irene &amp; Fred with Catherine </title>
<person nm="irenesmith" coords="97,175,125,202"/>
<person nm="fredemmett" coords="127,135,162,175"/>
<person nm="catherinejackson" coords="180,165,210,197"/>
<desc>On holiday in Brighton</desc>
</file>
<file name="bill.mates.jpg"/>
Now is the time to add any extra descriptive text about the
photograph in the desc content, as above.
Note that the end result is that I have converted
<file name="ireneFredCatherine.jpg"/>
into
<file name="ireneFredCatherine.jpg">
<title>Irene &amp; Fred with Catherine </title>
<person nm="irenesmith" coords="97,175,125,202"/>
<person nm="fredemmett" coords="127,135,162,175"/>
<person nm="catherinejackson" coords="180,165,210,197"/>
<desc>On holiday in Brighton</desc>
</file>
This makes it well formed XML, essential for further processing.
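Well-formedness is easy to check as you go. The sketch below uses Python's standard library to escape a title and to test whether a piece of XML parses. It is an illustration only, with made-up function names, and it is not a substitute for validating against the schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def title_element(text):
    """Wrap `text` in a title element, escaping & and < on the way."""
    return "<title>" + escape(text) + "</title>"


def is_well_formed(xml_text):
    """Return True if `xml_text` parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

A bare ampersand in a title is the usual culprit: is_well_formed("<title>Irene & Fred</title>") is False, while the escaped form produced by title_element parses.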
Next, close that html file and open the next one, repeat the operation until all the html files have been processed in this directory... then move on to the next directory.
Just don't forget to save the xml file every so often!
Finally, at any time during this process, you can check the
resulting file (pawson.xml in this example) for
validity using the schema provided, pics.rng. The command is
>java -classpath jing.jar:xercesImpl.jar \
-Dorg.apache.xerces.xni.parser.XMLParserConfiguration=\
org.apache.xerces.parsers.XIncludeParserConfiguration \
com.thaiopensource.relaxng.util.Driver pics.rng pawson.xml
The command should be all on one line (remove the backslash
character '\'). The 'Jing' validator is available from its project site
and Xerces from the Apache project.
Step 5 in Process steps. A two-step process.
Firstly collect the names out of the xml file listing the images
(pawson.xml in this example) and generate a list
of names (with all duplicates removed).
Next. Give a full name to each person shown
This converts the Identifiers (e.g. johnsmith.bradford) to a full name (e.g. John Smith (born Bradford)). A stylesheet (makeEmptynames.xsl) is available to create the 'empty' file containing just the list of identifiers, pulling all the names from the photographs xml file (pawson.xml in the example above). This program creates a 'shell' xml file (fullnames.xml) ready for you to fill out the details.
Run the program (an XSLT stylesheet) as follows
>java -jar saxon9.jar -o fullnames.xml pawson.xml makeEmptynames.xsl
This writes the file 'fullnames.xml' which looks like this.
<names >
<persons>
<person nm="bobwaterhouse" full=""/>
....
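The stylesheet does the real work, but the idea is simple enough to sketch in Python: collect every nm value, drop the duplicates and write one empty entry per person. The function name empty_fullnames is hypothetical, and the output layout is only assumed to match the fragment shown above.

```python
import xml.etree.ElementTree as ET


def empty_fullnames(pics_xml_text):
    """Return a 'shell' names document listing each identifier once."""
    root = ET.fromstring(pics_xml_text)
    # A set removes duplicates; sorting keeps the output stable.
    identifiers = sorted({p.get("nm") for p in root.iter("person") if p.get("nm")})
    names = ET.Element("names")
    persons = ET.SubElement(names, "persons")
    for nm in identifiers:
        ET.SubElement(persons, "person", nm=nm, full="")
    return ET.tostring(names, encoding="unicode")
```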
Now, whenever a 'brief' name is used, the software can replace it
with the full name that you have provided. The only thing to
remember is that each time you add a photograph, you must fill in
the details in two places. Firstly the photographs xml file
(pawson.xml in my case) to 'register' the photograph,
secondly if you have added another person his or her name must be
added to the 'fullnames.xml' file.
Now you need to crawl through this and fill in the full names, such that the above might become
<names >
<persons>
<person nm="bobwaterhouse" full="Bob (Robert) Waterhouse"/>
....
I found that it helps others if you add married names where appropriate for the ladies.
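Since details live in two places, it is easy to forget one of them. The following sketch (not part of the supplied software, and with a made-up function name) lists the identifiers used in the photographs xml file that have no full name filled in yet.

```python
import xml.etree.ElementTree as ET


def missing_full_names(pics_xml_text, fullnames_xml_text):
    """Return identifiers used in photographs but lacking a full name."""
    used = {p.get("nm") for p in ET.fromstring(pics_xml_text).iter("person")}
    named = {
        p.get("nm")
        for p in ET.fromstring(fullnames_xml_text).iter("person")
        if p.get("full")  # an empty full="" does not count
    }
    return sorted(used - named)
```

An empty list means every person shown has a full name recorded.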
Next build the persons.xml file from (in my
case) pawson.xml. This is done with a stylesheet called
persons.xsl
$java -cp saxon9.jar:xercesImpl.jar net.sf.saxon.Transform \
-x:org.apache.xerces.parsers.SAXParser \
-o persons.xml pawson.xml utils/persons.xsl
Note the long lines are broken for clarity. All this should be on one line.
This provides a file which links the 'token' names to a person's full name. It looks something like
<persons>
<person nm="mrwood" full=""/>
<person nm="nathanwood" full=""/>
....
</persons>
That's all the manual work done. A build script (build) is provided which helps in running all the necessary steps from now on, once the manual process has been completed.
A necessary step (7 in Process steps) is to determine the size of each image so that, when the HTML files are generated, the images aren't too big or small for the page. An optional precursor to this step is to resize or trim any images so that they don't have huge white borders, or to do any other processing you feel necessary. The changes I need generally include rotating them a degree or so (when I didn't get them square on the scanner platen) and cutting them to size to remove unwanted borders. I use GIMP for this; choose and use whatever tools you have available.
Having processed them, we need to generate an XML file, within each directory containing images, which indicates the image size. This looks something like
.. <f nm="marysuemark.jpg" width="408" height="514"/> ..
I.e. each file has its size identified. This is done by a shell script, sizeXML.sh, which uses an ImageMagick command line tool. cd into each directory and run it as shown
$../utils/sizeXML.sh
This creates a small xml file called sz.xml in the current
directory. Repeat for any other directories containing images. That's it!
Note: You will need to repeat this step if you make any subsequent changes to image sizes.
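For anyone without a bash shell, the size file can be sketched in Python. This is not sizeXML.sh. It assumes ImageMagick's identify command is installed and on the path, and the root element name "sizes" is a guess, since only the f entries are shown above. The function names are invented.

```python
import subprocess
import xml.etree.ElementTree as ET


def identify_size(path):
    """Ask ImageMagick's identify for (width, height); assumes it is installed."""
    result = subprocess.run(
        ["identify", "-format", "%w %h", path],
        capture_output=True, text=True, check=True,
    )
    width, height = result.stdout.split()[:2]
    return int(width), int(height)


def size_xml(filenames, get_size=identify_size):
    """Build the size listing; the root element name is an assumption."""
    root = ET.Element("sizes")
    for name in sorted(filenames):
        width, height = get_size(name)
        ET.SubElement(root, "f", nm=name, width=str(width), height=str(height))
    return root
```

The get_size parameter lets you swap in another way of measuring images if ImageMagick is not available.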
Next start to build the html files. Steps 8 and 9 combined in Process steps.
A file, imaginatively called 'build' is available, to be tailored to fit with the files you've used in the directories you chose. Note that this is a Linux shell script (bash to be precise)
Overview
One output is a zip file containing all the html files and the images, ready to mail to someone. Could be a bit on the large side, so you might want to use a service such as sendthisfile or mailbigfile
The other output is a 'website' (just a whole bunch of html files containing images) of the images and the descriptions added earlier.
Taking the build file one piece at a time
rm -rf html/*
rm ../pawson.zip
cp utils/names.js html
cp utils/pawson.css html
All the html files are kept in the html directory. The zip file output (in my case 'pawson.zip') is kept in the directory above where I keep all the files. A directory called utils is used to keep all the working files, xslt scripts etc. The main files are (in my case) pawson.xml, which is the xml file containing all the photograph information; fullnames.xml, which maps brief names to full ones; and persons.xml (derived from pawson.xml), which lists all the people and links to the photographs of that person. The two files index.html and persons.html are derived from the xml using xslt stylesheets. Then finally there are the directories containing all the scanned images (album1, album2 etc). So the 'working directory' layout is something like
/pawson
    /utils
    /html
    fullnames.xml
    pawson.xml
    persons.xml
    index.html
    persons.html
    /album1
    /album2
    /album3
The next task is to generate the 'website', that is all the html files. Another xslt stylesheet.
$java -cp saxon9.jar:xercesImpl.jar net.sf.saxon.Transform \
-x:org.apache.xerces.parsers.SAXParser \
-o index.html pawson.xml utils/pics2html.xsl
This creates index.html in the top level, it also creates one file per photograph in the html directory.
Next, generate the 'persons.html' page.
$java -cp saxon9.jar:xercesImpl.jar net.sf.saxon.Transform \
-x:org.apache.xerces.parsers.SAXParser \
-o persons.html persons.xml utils/persons.xsl
This uses the 'persons.xml' file (list of all the people mentioned) and creates the 'persons.html' file.
Finally, (if you want it), create the zip file to send to your relatives or other people in the photographs.
$zip -q -r ../pawson.zip . -x utils/\*
This creates the zip file in the directory above the current one, excluding all the working files in the utils directory. And that's it!
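If the zip command is not available, Python's standard library can do a similar job. This is a sketch rather than a replacement for the build script; it assumes you want everything under the working directory except the top level utils directory, and make_zip is a made-up name.

```python
import os
import zipfile


def make_zip(root, zip_path, exclude=("utils",)):
    """Zip everything under `root` except the top level `exclude` directories."""
    root = os.path.abspath(root)
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, dirs, files in os.walk(root):
            # Prune excluded directories at the top level only.
            dirs[:] = sorted(d for d in dirs
                             if not (folder == root and d in exclude))
            for name in sorted(files):
                full = os.path.join(folder, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # never add the archive to itself
                archive.write(full, os.path.relpath(full, root))
```

For the layout above, make_zip(".", "../pawson.zip") run from the working directory would write the archive one level up.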