More on geotagging

Some good comments came up in the last post on georeferencing. I thought a followup post was
merited.

The itch I’m trying to scratch here is that I want to be able to georeference just about any kind of data,
and I want to be able to embed the georeference information directly in the data file, whether it’s a
graphic, or audio, or video, or gene sequence data, or anything else. I want to have a standard form for tagging any of these files. And I don’t want to store the location metadata in a separate file.

What I think I need, then, is a simple, standard way of making geographic notations in a terse format that is easily parsed by and readily recognizable to a computer, is reasonably human readable, and can be made to fit just about anywhere that arbitrary text is allowed.

Right now, there are only two types of files that have some way of embedding geographic information into them that I know of. The obvious one is that EXIF data in JPEG files can contain “GPS” tags. For hardcore GIS people, GeoTIFF is the other one. Both are for photographs or other still-image data only. What about the rest?

A variation of one of the current geotagging XML formats like the W3C (“<geo:lat>41.4354840</geo:lat><geo:lon>-112.6660845</geo:lon>”) or GeoRSS is an obvious possibility. XML has two potential problems though, as I see it. First, it’s not very terse – the markup substantially increases the amount of space the information takes up. I think in most cases that wouldn’t necessarily be a problem, but I suspect there are a few file formats out there with only comparatively small spaces set aside for a “comment” or “description” field.

The second potential “problem” is something odd that occurred to me today: it’s hard to pronounce out loud. There are some popular audio formats (e.g. “.wav”) that as far as I know have no space whatsoever for arbitrary text…but if my little standard was something that could be distinctly spoken, someone making a recording could literally speak the metadata in a format that a speech-to-text engine (like Sphinx) might be able to recognize and convert to a compatible string of text which could be parsed just like data from anywhere else. This is something of a corner case, I admit, but I think it’s at least worth considering.

Another good point that came up was what you do if your data extends beyond a single point. For example, if I want to georeference an audio recording I might make while narrating what I’m seeing out the window of a speeding train, it makes good sense to at least try to store line segments rather than just a point. That way, if someone wants to find the spot within a several-mile stretch where I suddenly exclaim “Hey, wow, look at that!” they can. The ability to define areas with a polygon or a point-and-radius seems like it would be handy, too, though obviously much more optional.

So, let’s see, I’m looking for a format with minimal markup, but which is easily recognized, is made of plain text which could be crammed into, say, a PNG tEXt chunk, an mp3 comment frame, a Genbank “Source” field, or any other field which allows arbitrary text. I want a form that’s minimally objectionable to anyone else who might be willing to use it. And I think I want it to be able to handle points consisting of at least latitude, longitude, optional elevation, optional timestamp, and possibly even an optional heading and angle, and to handle more than one point per file (for the case of lines). Am I forgetting anything?
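To make that wish list a little more concrete, here is a rough Python sketch of what parsing such a tag might look like. The bracketed `[geo LAT LON ELEVm TIMESTAMP]` syntax is purely hypothetical, invented here for illustration and not any existing standard, and the names `GeoPoint`, `GEOTAG_RE`, and `parse_geotags` are likewise made up. Heading and angle are left out to keep it short.

```python
import re
from typing import List, NamedTuple, Optional


class GeoPoint(NamedTuple):
    lat: float
    lon: float
    elevation_m: Optional[float] = None
    timestamp: Optional[str] = None  # kept as text, e.g. "2007-12-01T15:04:05Z"


# Hypothetical syntax (an assumption, not a standard):
#   [geo LAT LON]  optionally followed by " ELEVm" and/or " YYYY-MM-DDThh:mm:ssZ"
GEOTAG_RE = re.compile(
    r"\[geo\s+(?P<lat>[-+]?\d+(?:\.\d+)?)"
    r"\s+(?P<lon>[-+]?\d+(?:\.\d+)?)"
    r"(?:\s+(?P<elev>[-+]?\d+(?:\.\d+)?)m)?"
    r"(?:\s+(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z))?"
    r"\]"
)


def parse_geotags(text: str) -> List[GeoPoint]:
    """Return every well-formed, in-range tag found in a block of free text."""
    points = []
    for match in GEOTAG_RE.finditer(text):
        lat = float(match.group("lat"))
        lon = float(match.group("lon"))
        # Skip anything that cannot be a real latitude/longitude pair.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        elev = match.group("elev")
        points.append(
            GeoPoint(lat, lon, float(elev) if elev is not None else None,
                     match.group("time"))
        )
    return points


if __name__ == "__main__":
    # The elevation and timestamp below are illustrative values only.
    comment = ("Narration from the train "
               "[geo 41.4354840 -112.6660845 1290m 2007-12-01T15:04:05Z] "
               "[geo 41.5 -112.7]")
    for point in parse_geotags(comment):
        print(point)
```

Because the tags are just bracketed plain text, several of them can sit in one comment field in order, which would cover the "line segments" case without any extra markup.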

Besides “going to bed before 3am”?

I want to geotag something besides photographs!

Cornelia - Queen of the Snow!

For no particular reason, here is a picture of The Dog in her natural habitat. This picture really has nothing to do with today’s blog post, but since this is supposed to be a happy time of year, I suppose a happy picture is in order.

In case anyone is wondering if I’ve forgotten the supposed microbiological emphasis on this blog, the answer is no. In fact, I’ve got a post on amateur yeast culture brewing, but I’m still researching it a bit.

Meanwhile, it seems reasonable to post about geolocation, which after all is an important and useful trick for associating information with its place in The Big Room.

Geolocation of photographs is well established, at least for JPEG images. There are standard ways of tagging a JPEG file with an ICBM address, and I’ve been having a lot of fun doing this with my own pictures. (If you’re bored, you can browse them on Panoramio, and perhaps in a few weeks may stumble on some of them in Google Earth.)

There doesn’t appear to be any standard way of tagging other forms of media files, though. What if I want to geotag an .mp3 or OGG/Vorbis audio file recorded at a particular spot? Or a “DivX/Xvid” or OGG/Theora video?

Irritatingly, it seems as though a few people have mused about it, but nobody seems to have addressed it. There are projects like The Freesound Project which do geolocate sounds, but the geographic information is not actually embedded into the sound files in any way. As far as I can tell, the location is tracked in their own server’s database only. A Google search turned up a post on the “Random Connections” Blog musing about this, but the only application mentioned is adding GeoRSS tags to the RSS for a podcast feed, not to the podcast’s audio file itself. Even the otherwise excellent Mapping Hacks book (written before O’Reilly’s current decline into yet another “Proprietary Product® How-To Guides” publisher over the last couple of years) mentions the topic in Hack #59, but disappointingly appears to have really had nothing to do with tagging files so much as “interpolating a position from a GPS track, given a timestamp”.

This all comes up because we’re about to go on a roadtrip to check out a part of the country where we seem likely to end up living next year. I’ve been told I’ve got a pretty good voice, so I was considering generating a travelogue series along the way. It appears to be relatively easy to generate a “narrated picture” as a standard mp3 file, the picture being loaded as though it were “album art”. The only aspect of the whole thing that’s missing is geolocation. For now, just being able to easily obtain the ICBM address associated with the file while playing it, so that one could plug the coordinates into Google Maps to see where the recording was done, would be enough, but ideally I’d like to do it in a way that could be considered standardized, so that later on people might be encouraged to add geolocalization plugins to their media-playing software.

Sure, I can just generate a .kml file with a track of where we were, with markers containing picture and audio links. In fact, I probably will, but I don’t want people to have to use Google Maps or Google Earth to make use of the geolocation information associated with the audio.

Any suggestions, anyone?

I’m having too much fun with this.

I finally managed to get Hugin to work, as you can see from the picture of the Dead Fish Museum above.

Okay, it’s the visitor’s center at the Fossil Butte National Monument, but it really is a museum of dead fish. And other fossils. If you click the image to get to the Panoramio page, you can even see where it is on the map: in fact if you zoom in, the building itself is visible in the aerial photo imagery.

Between digiKam’s ability to handle geocorrelation with tracks from my GPS, Panoramio’s support for geolocation and mapping (and connection to Google Earth…), playing with High Dynamic Range digital photography, and now panoramas, I’m beginning to develop an increased urge to travel around and take pictures again…

Nerd Photography in the Big Room

Readers may have noticed by now that I have a cheap but serviceable digital camera that I’ve been using to take pictures which occasionally show up here on the blog. (Hey, there’s another thing that the External Deliverer, in Its benevolence, might bring me: a nicer digital camera.)

I’ve been playing with geolocation for a while now. Just recently, I started also doing some crude playing with High Dynamic Range digital photography. It’s obviously going to take me some work to get it figured out and get better results, but what I’m getting so far doesn’t look too bad, at least in my own opinion. Kind of surreal, like Mars Rover pictures…

I’ve discovered that my Handy-Dandy Linux box has access to a couple of tools that make these easy.

I noticed a few days ago that digiKam is actually able to read .gpx format files downloaded from my GPS and then correlate the track from the GPS with the timestamps on the photos automatically, so in what little spare time I have I’ve been going back through my archives of GPS tracks and timestamped photos and trying to find as many to correlate as I can. I managed to get geolocation tagged into pictures from as long ago as three years or so. I also tagged this more recent one. I saw this place half a decade ago and had been wondering if it was still there. Last week we finally had a chance to visit and sure enough, it was there. If you were wondering where one could go to learn to do the Squirrel Dance, here it is.
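I don’t know exactly how digiKam does its correlation internally, but the general idea of matching a photo’s timestamp against a GPS track can be sketched in a few lines of Python. This is a minimal illustration assuming simple linear interpolation between the two nearest track points. The function name `interpolate_position` and the track values are hypothetical, and it ignores details like camera clock offsets, time zones, and tracks that cross the antimeridian.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional, Tuple

# One track point: (time, latitude, longitude)
TrackPoint = Tuple[datetime, float, float]


def interpolate_position(track: List[TrackPoint],
                         when: datetime) -> Optional[Tuple[float, float]]:
    """Estimate (lat, lon) at `when` from a track sorted by time.

    Returns None if `when` falls outside the recorded track.
    """
    if not track or when < track[0][0] or when > track[-1][0]:
        return None
    times = [t for t, _, _ in track]
    i = bisect_left(times, when)
    if times[i] == when:
        # The photo was taken exactly at a recorded track point.
        return track[i][1], track[i][2]
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    # Fraction of the way from the earlier point to the later one.
    frac = (when - t0) / (t1 - t0)
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)


if __name__ == "__main__":
    # Made-up track points, one minute apart.
    track = [
        (datetime(2007, 12, 1, 12, 0, 0), 41.0, -112.0),
        (datetime(2007, 12, 1, 12, 1, 0), 41.2, -112.4),
    ]
    photo_time = datetime(2007, 12, 1, 12, 0, 15)
    print(interpolate_position(track, photo_time))
```

In practice the track points would come from the `.gpx` file and the timestamp from the photo’s EXIF data; both are left out here to keep the sketch self-contained.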

Landscape and Sign: Don't Trespass on the 'I'

Today after classes I trudged up to the top of the hill at one corner of the campus with my trusty GPS in hand and took a few pictures, as you can tell. Since Google Earth seems to get most of its photos from Panoramio, I’ve started uploading them there. I may also get around to uploading them to flickr one of these days. I kind of need some pleasant distraction – I’m starting to hit the “Am I there yet???” phase of the semester. Just another week-and-a-half of classes, then finals, then I’m finally done. At least with the undergraduate stuff.

If you’re bored, there are a couple of additional pictures on the Panoramio site, here. You can also get the ICBM address there, and a .kml file for Google Earth so my pictures will pop up if you happen to run past an area where one of them is while you’re browsing the globe.

More Search Amusements. (p.s. I Ain’t Dead Yet.)

A bit longer of a delay between posts than I’d like, but here you go:

+ =?????

I am often amused (and regularly baffled) by the kinds of search queries that lead people to this blog.

I wrote a sloppy little script to parse the server’s access logs and figure out who’s searching for what, where. Since I added the ability to recognize Google Image Searches, it’s gotten even stranger.
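For the curious, the core of that kind of script looks roughly like the Python sketch below. It is not my actual script, just a minimal illustration of the idea: take the referrer URL from a log line and pull the search terms out of it. It assumes that ordinary Google searches carry the terms in a `q` parameter, and that image-search referrers carry them inside a URL-encoded `prev` parameter; both are assumptions about the referrer format, and the function name `search_query_from_referrer` is made up.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def search_query_from_referrer(referrer: str) -> Optional[Tuple[str, str]]:
    """Return (kind, query) for a Google referrer, or None for anything else.

    `kind` is "web" or "image".
    """
    parts = urlsplit(referrer)
    host = (parts.hostname or "").lower()
    # Accept national Google domains too (google.de, google.se, ...).
    if not host.startswith(("www.google.", "images.google.", "google.")):
        return None
    params = parse_qs(parts.query)
    if "q" in params:
        kind = "image" if host.startswith("images.") else "web"
        return kind, params["q"][0]
    if "prev" in params:
        # Assumed image-search form: prev=/images?q=... (URL-encoded).
        inner = parse_qs(urlsplit(params["prev"][0]).query)
        if "q" in inner:
            return "image", inner["q"][0]
    return None


if __name__ == "__main__":
    examples = [
        "http://www.google.com/search?q=heat-fixing+slides&hl=en",
        "http://images.google.com/imgres?imgurl=http://example.org/x.jpg"
        "&prev=/images%3Fq%3Dcarbonated%2Bleprechaun%26hl%3Den",
        "http://example.org/?q=not+a+search+engine",
    ]
    for ref in examples:
        print(search_query_from_referrer(ref))
```

Extracting the referrer field from each log line depends on the server’s log format, so that part is left out.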

I do get a lot of perfectly understandable hits – people looking for information about “heat-fixing slides”, expired jello, and looking for pictures of lactic-acid bacteria or whatnot. Some of them are pretty interesting questions…but first, some oddities.

At the top of my current weird-o-meter: “carbonated leprechaun”…what??? What’s funnier is that this was a Google Image search – someone doesn’t just want information ABOUT carbonation of leprechauns, they want pictures. Now I can’t stop imagining a mash-up of “Darkman” and Leprechaun. Thanks a lot, whoever you are…”I needs me gold! ARGH! SUNLIGHT! [bubblebubblebubble…]”

Another recent one was just a search for the phrase “new england sucks”. As another Image search. Somebody not only doesn’t like New England, but they want pictures of “new england sucks”?…

Less risible but still kind of funny are searches influenced by unfamiliarity with the English language. I have no idea what the search for pictures related to “useful of DNA” was hoping to find. (Uses of DNA? How to “use” [work with] DNA? Diagrams of genetic processes?). I also see a number of searches just based on the name of the blog – people looking for information about furnishing “big rooms”. I have no idea what the search for “name of thing in room” was expected to turn up. This one’s another language issue, but even taking that into account I’m still baffled about this one. I wouldn’t expect google.de to return any useful information for “Sache im Zimmer” (the original search was actually from a Spanish-speaking area, but No Entiendo Espanol, so I’ll use a German analogy instead.)

Or from Sweden: “Aerobic Oxygen fraud”. Somebody’s figured out that we don’t actually need to breathe and that it’s all a ploy by the Oxygen Lobby to enslave us, I guess.

Maybe just because “chemicals” get mentioned here from time to time, I get the occasional hit from someone looking for illegal drug information (either technical or just news of drug busts or whatever). Note to “HILLBILLY METH” searcher: Hillbillies do moonshine. Meth comes from Rednecks. Jeez, doesn’t everyone have to do a semester of Rural Population Stereotype Taxonomy in college anymore?

There are some more relevant and interesting questions that show up here, too.

Oreo Cookie

I guess someone in southern California used an interesting analogy in their microbiology class, because I recently got a couple of searches from there looking for why the cell membrane is not like an Oreo® cookie. The answer: There’s no “creme” filling. No seriously – the membrane is two layers of the same kind of molecule stuck together. The phrase you’re looking for is “Phospholipid bilayer”. In a way, the molecules are a lot like detergents – they’ve got one end that “likes” water, and a long tail at the other end that doesn’t (much as oil doesn’t). Since the cell is surrounded by and full of water, you end up with one layer with all its hydrophilic ends touching the water outside the cell, and the other layer with its hydrophilic ends on the inside of the membrane touching the water inside the cell, and the hydrophobic ends of both layers all tangled up together in the middle – without anything between them. See? Not like an oreo cookie at all. Aside from this, cell membranes are also squishier and not chocolate flavored most of the time.

I’ll deal with “does beer and ice cream make gas” in another post later…

It’s over!

No you can't have $10,000.  Not yours.

I am proud to announce that I am 5th Loser in this 2007 College Blogging Scholarship competition!

Lacking the emotional appeal and/or existing promotional network of the top scorers, I was pretty much up the creek without a plunger. Given the popularity contest format of the competition, I’m actually pretty pleased with how I did. My regular readers (judging by the hits to the RSS feed) have approximately tripled or quadrupled, and I did get a small but useful amount of feedback to help improve things. Oh, and hey, I seem to have readers in Berlin and somewhere in Chile, among other places, so now I can say I’m “world famous™”. Though the proportion of voters who actually did check out all of the blogs was pitifully low, it does still look like it was around 1-2% of the voters, which is actually higher than I would have predicted.

I get the impression that some of us running less well known blogs were a little disappointed about the format of the competition, but there’s really no reason to be. All it means is that rather than being a contest for “highest quality” blog, it was a contest for “most effective” blog. Certainly, being able to get your “vote for me” message out to a larger range of people is a valid measure of effectiveness, so the results seem reasonable to me. And I wasn’t the bottom scorer. Judging by the way my score moved, at least some portion of the people who were examining all of the blogs actually did like what they saw here as I was getting a couple of votes a day on average, so I’m doing something right at least.

The only complaint I really have about the “popularity contest” format is this: I think one of the major benefits to humanity of “blogging” is the fact that unlike mainstream media, a blogger can afford to present unusual, less broadly popular content which otherwise would never be made available. Not having to worry about the internet equivalent of “Nielsen Ratings”, we can afford to put up obscure or strange things that only a fraction of the world might be interested in, which is why if you poke around the internet, you can find something that isn’t the latest celebrity crap or badly-reported political scandal. I actually don’t know how much of a role it played in this particular competition, but this sort of approach in general strikes me as something that would be strongly biased towards “mainstream” content. I think a little more love for all of us off-center folks would be in order.

I also hope they’re offering runner-up prizes again this year. Even if *I* don’t win, at least one of “my people” (nerds, that is – hey, you don’t go for a PhD in Neuroscience without being at least a little bit of a nerd…) would get something again this year if they do.

This does mean, though, that I won’t have $10,000 to buy a microscope with. Woe is me. On the other hand, that means I’ve got no excuse not to try begging in front of scientific conferences. I figure that ought to be worth some entertainment, once I get some time to try it. Perhaps by this time next year, I’ll have a bit more fame and popularity and have a better shot at the prize.

Hey, scienceblogs.com, if you want to promote my blog next year when I’m (hopefully) in graduate school, I may have a shot at the prize next time around… (UPDATE: It may not be obvious, but this should be read as good-natured jealousy, not some kind of complaint or accusation…)

And now that all that’s over, we’ll be returning once again to my usual nerdity. Stay tuned (some more).

Hello, College Blogging Scholarship reviewer and other casual viewers

I see the hits from people examining the finalist blogs (including this one) at the 2007 College Blogging Scholarship are up, presumably since this is the last weekend of voting (insert obligatory “please vote for me” plea here).

I have a favor to ask of you, and everyone else who happens to stumble on this blog one way or another (including my regular readers): Please tell me something about your impression of this blog. Even if all you have time for is a quick one-sentence comment, praise for something you like or thoughtful criticism of something you don’t like, or just something that you thought was noteworthy, it will help me improve the blog. No registration is necessary to comment.

If you have time for a more detailed comment, some opinions as to what else you might be interested in seeing here would be helpful. For example, I’m considering trying to do a regular or semi-regular podcast. Would that be of interest? More pictures? More detailed discussions of scientific matters? Naked pictures of myself? (Okay, almost none of you would really want the latter…)

I’m actually more interested in your opinions than your votes, though if I can have both I would obviously be grateful…

I shall return again to the science nerdity intended for a broad not-necessarily-nerdy audience shortly. Thank you.

#1 on Google!

Over on scienceblogs.com’s The World’s Fair, the author has started an amusing meme.

It goes like this: the challenge is to find 5 sets of search terms for which your own blog or site is the #1 hit on a Google search. Note that it is acceptable to quote specific phrases but of course it’s more impressive if you don’t. Here are 8 for which (as I type this) this blog is the #1 hit (links go to the blog address that is the hit):

There was at least one other which I’m having trouble remembering at the moment. Perhaps I’ll update later if I remember what it was.

My server’s going to walk funny for a week after this…

Someone posted a link to the “No, You Can’t Have A Cookie” image I put together a while back.

On Fark.com.

In a comment thread for an article that seems to involve the suggestion of a nude college girl.

The server logs have been scrolling by rapidly for quite a while now. Ow.

Incidentally, if you’re coming here from Fark.com, do me a favor and click on the “vote for me” image thing below. I promise if I win, I’ll use some of the money to get a TotalFark account…

UPDATE: The thread seems to be FINALLY winding down. Thank you, Farkers who noticed my plea and voted for me, I’m rapidly gaining on the 4th Loser position!

Hey, it beats last…and voting goes through Sunday this week, as I recall…

Saturday Night Meta-Blogging

I’d like to thank (profusely and repeatedly) everyone who has been coming to check out my blog via the College Blogging Scholarship site. Whether you’re deciding to vote for me or not, I hope you’ll keep coming back.

First, the bad news. We all know how this goes: the finalists are announced, and they all go ask all their friends and associates to vote for them, and to pass on the message to go vote for them. When someone, given the “Go vote for Sean” (or whoever) message, lands on the voting page, they’re confronted by a list of 19 other blogs besides the one they went there to vote for. How many of them REALLY think “Gosh, I should probably check out all of the other blogs to see which one really deserves my vote!”? The vast majority, I’m sure we all realize, just click on whoever they came to vote for and leave. Obviously whoever has the largest network of affiliated bloggers to send out repeated “vote for my buddy” posts every day has a huge advantage. Checking my server logs, I see that this is definitely true. (I’m lookin’ at YOU, scienceblogs.com! Unfair? No, actually, not at all. Jealous? You bet your sweet bippy I’m jealous!)

Since the start of the competition, I’ve gotten a total of 465 unique visits referred from the scholarship finalists announcement page. At present, there are 19,740 votes that have been cast – so less than 2.5% of the voters have bothered to check out the other blogs.

Now, the good news: This is actually much higher than I would have guessed. Given the comparative obscurity of my blog (at the moment), the fact that more than one in fifty voters have at least looked over the first page of my blog makes me very happy. So, again, thanks, and I hope you’ll keep coming back.

In other news, Josh Charles of “[website]$sudo life” suggests that my post suggesting a “War on Science” could be a good thing for science would make a good basis for an entertaining spoof documentary. I’m kind of liking that idea…

Enough of that for now, though. I’ll be getting back to my science communication, amateur science, and microbiology roots in the next few posts. Stay tuned…