More on geotagging

Some good comments came up in the last post on georeferencing. I thought a followup post was
merited.

The itch I’m trying to scratch here is that I want to be able to georeference just about any kind of data,
and I want to be able to embed the georeference information directly in the data file, whether it’s a
graphic, or audio, or video, or gene sequence data, or anything else. I want to have a standard form for tagging any of these files. And I don’t want to store the location metadata in a separate file.

What I think I need, then, is a standard, simple way of making geographic notations in a terse, concise format that is both easily parsed by and readily recognizeable to a computer, is reasonably human readable, and can be made to fit just about anywhere that arbitrary text is allowed.

Right now, there are only two types of files that have some way of embedding geographic information into them that I know of. The obvious one is that EXIF data in JPEG files can contain “GPS” tags. For hardcore GIS people, GeoTIFF is the other one. Both are for photographs or other still-image data only. What about the rest?

A variation of one of the current geotagging XML formats like the W3C (“<geo:lat>41.4354840</geo:lat><geo:lon>-112.6660845</geo:lon>”) or GeoRSS is an obvious possibility. XML has two potential problems though, as I see it. First, it’s not very terse – the markup substantially increases the amount of space the information takes up. I think in most cases that wouldn’t necessarily be a problem, but I suspect there are a few file formats out there with only comparatively small spaces set aside for a “comment” or “description” field.

The second potential “problem” is something odd that occurred to me today: it’s hard to pronounce out loud. There are some popular audio formats (e.g. “.wav”) that as far as I know have no space whatsoever for arbitrary text…but if my little standard was something that could be distinctly spoken, someone making a recording could literally speak the metadata in a format that a speech-to-text engine (like Sphinx) might be able to recognize and convert to a compatible string of text which could be parsed just like data from anywhere else. This is something of a corner case, I admit, but I think it’s at least worth considering.

Another good point that came up was what you do if your data extends beyond a single point. For example, if I want to georeference an audio recording I might make while narrating what I’m seeing out the window of a speeding train, it makes good sense to at least try to store line segments rather than just a point. That way, if someone wants to find the spot within a several-mile stretch where I suddenly exclaim “Hey, wow, look at that!” they can. The ability to define areas with a polygon or a point-and-radius seems like it would be handy, too, though obviously much more optional.

So, let’s see, I’m looking for a format with minimal markup, but which is easily recognized, is made of plain text which could be crammed into, say, a PNG tEXt chunk, an mp3 comment frame, a Genbank “Source” field, or any other field which allows arbitrary text. I want a form that’s minimally objectionable to anyone else who might be willing to use it. And I think I want it to be able handle points consisting of at least latitude, longitude, optional elevation, optional timestamp, and possibly even an optional heading and angle, and can handle more than one point per file (for the case of lines). Am I forgetting anything?

Besides “going to bed before 3am”?

Published by

Epicanis

The Author is (currently) an autodidactic student of Industrial and Environmental microbiology, who is sick of people assuming all microbiology should be medical in nature, and who would really like to be allowed to go to graduate school one of these days now that he's finished his BS in Microbiology (with a bonus AS in Chemistry). He also enjoys exploring the Big Room (the one with the really high blue ceiling and big light that tracks from one side to the other every day) and looking at its contents from unusual mental angles.

Leave a Reply