Well, a PHP example, anyway

Once I dove in and started messing around, I only had to fix two typos, and the example I was working on now seems to work correctly, at least to the extent that I’ve tested it. I now have what appears to be a working example of Geostring parsing in PHP. In this case, the example reads my feed from the Twitter website, sifts out any geostring tags it finds, then generates Google Maps links for each one found. As I write this, there are two geostring tags on that page, representing places (and times) that I have actually been, and it seems to work.
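For readers who just want the gist, here is a minimal sketch of that kind of parsing. It is not the linked script: the function names `find_geostr_tags` and `google_maps_link` are hypothetical, the Google Maps URL pattern is an assumption, and it only handles the short `geostr:lat,lon,elev:timestamp:geostr` form used in the generator snippet in this post.

```php
<?php
// Hypothetical helper: find short-form geostr tags in a block of text.
function find_geostr_tags(string $text): array
{
    $points = [];
    // Lazily capture everything between the "geostr:" and ":geostr" markers.
    preg_match_all('/geostr:(\S+?):geostr/', $text, $matches);
    foreach ($matches[1] as $payload) {
        $fields = explode(':', $payload);   // [ "lat,lon[,elev]", "timestamp", ... ]
        $coords = explode(',', $fields[0]);
        // Latitude and longitude are the only required values.
        if (count($coords) < 2 || !is_numeric($coords[0]) || !is_numeric($coords[1])) {
            continue;
        }
        // Keep the values as strings so they are reproduced exactly as written.
        $points[] = [
            'lat'       => $coords[0],
            'lon'       => $coords[1],
            'elev'      => $coords[2] ?? '',
            'timestamp' => $fields[1] ?? '',
        ];
    }
    return $points;
}

// Hypothetical helper: the URL pattern below is an assumption, not taken from the script.
function google_maps_link(array $point): string
{
    return 'https://maps.google.com/maps?q=' . $point['lat'] . ',' . $point['lon'];
}
```

Calling `find_geostr_tags()` on a page's text and passing each result to `google_maps_link()` gives one link per tag found.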

You can take a look at the source code for the example here, or see it in action here.

Feel free to grab a copy to play with if you’d like (or write one yourself that isn’t so messy – hey, as someone who doesn’t consider himself a professional “coder”, I’m just happy that it did exactly what I wanted it to do on the first try…). You should only need to worry about two things – changing the $text_to_read, and whether or not your web server (or CLI) has fopen wrappers turned on so the script can read another web page if you use a web page as your text to parse rather than a local file.
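The fopen-wrapper caveat can be checked up front rather than discovered through a failed read. The following is only a sketch under assumptions: `load_text_to_read` is a hypothetical helper (named after the `$text_to_read` variable), and it treats anything starting with `http://` or `https://` as a remote page.

```php
<?php
// Hypothetical helper: load the text to parse from a local path or a URL.
function load_text_to_read(string $source): string
{
    $is_url = preg_match('#^https?://#i', $source) === 1;
    // Remote reads through file_get_contents() need allow_url_fopen enabled.
    if ($is_url && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is off; cannot read ' . $source);
    }
    // Suppress the warning and report failure through an exception instead.
    $text = @file_get_contents($source);
    if ($text === false) {
        throw new RuntimeException('Could not read ' . $source);
    }
    return $text;
}
```

With something like this in place, a script fails with a clear message when the server has remote reads disabled, and local files work either way.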

Since generating a geostring tag is trivial, I didn’t bother trying to incorporate that into this example. If you want one, then here:

<?php
//generate a geostr tag with the most typical information only
//point not part of a track nor including heading or angle
$lat=44.027168;
$lon=-111.297892;
$elev="1711.9m"; //could leave off the "m" and treat as float, since it defaults to "meters"
$timestamp="20071125T123438-06"; //6 hours behind UTC

print("geostr:$lat,$lon,$elev:$timestamp:geostr");
//"full" version: print("geostr:$lat,$lon,$elev:$timestamp,:,:geostr");
//completely unnecessary, but legal
?>

As always, comments and suggestions are welcome.

New toy: “Twitter”

Wow – Celestron takes 8 business days to get me a terse one-sentence answer. BigC responds in one. Impressive. Apparently their technical people are all at trade shows at the moment so my bigger question will have to wait until they get back, but they were at least able to answer my question about their “tabletop” digital microscopes’ magnification (answer: the “600x” really is optical magnification, not digital.)

Another digital microscopy WANT/DO NOT WANT post to follow when I get the followup reply. Meanwhile, after hearing about it on the This Week in Tech podcast for a while, I finally talked myself into signing up to play with the coincidentally named Twitter system.

It sounds like a really stupid idea – “Oh, goodie, now I can broadcast ‘text messages’ no more than 140 characters long about trivial events in my life to the whole world! Whoopee!” “Wow! I can find out when random strangers are drinking coffee AS IT HAPPENS!” Thrills! Excitement! Adventure!…

On the other hand, having the messaging system watch for particular words might be a handy way of monitoring current events. Plus, there seems to be a lot of potential for fun, off-the-wall uses, even if many of them are kind of silly.

It DOES seem like kind of an ideal context to play with that “geostrings” concept I’ve been toying with. A terse, easily-machine-parsed format for geotag data that can fit into a “twitter” post and still leave room for a sentence or two to go with the geographic information seems like it might be useful. If you’re so incredibly bored that you want to see some examples, you can check out my own Twitter posts, several of which I’ve embedded geostrings into.

I has a books.

I also has a bad grammar (curse you, internet!)

[Image: the front cover of “Wine Microbiology - Practical Applications and Procedures”]

It’s slow going trying to get the mess up here in Idaho organized in preparation for the move to Texas, but I did manage to sacrifice a large number of my old books that I no longer need. Trading them in at the local representative of the “Hastings” bookstore chain got me a decent amount of store credit, and I was able to special-order this wine microbiology book I’ve been lusting after for months. It showed up a couple of days ago.

Very interesting so far, but I’m only a little ways into it. I’m still in the theory sections, so I can’t say if it covers yeast-mating or not (see previous two posts on this blog…)

[Image: the front cover of “Wildbrews: Beer Beyond the Influence of Brewer's Yeast”]

Prior to that, I picked up a book I found at the local brewing-supply place in The Woodlands, Texas. It’s an entire book on the subject of Belgian and “Belgian-style” beers (like Lambic) fermented with “wild” yeasts and bacteria. It’s an excellent mix of history, science, travelogue, and “how-to”. I highly recommend it.

I noted with particularly nerdly glee that there are several breweries here in the U.S. doing non-traditional brewing cultures. At least one was brewing entirely with Brettanomyces yeasts! (Most traditional brewers and vintners shriek in horror at the thought of Brettanomyces in their brew instead of the standard Saccharomyces yeasts, blaming Brettanomyces for – you guessed it – “off-flavors”.)

That is so amazingly spiffy I can hardly stand it. I note that one of them appears to be only a few hours from the area we’re moving to. And two of them are in Colorado, more or less on the road between Idaho and Texas, so on my next trip down, which is likely to be as early as next week, I may have to try to arrange to visit at least one of them and see if I can get a tour.

YEAST HERPES!

After my previous post, there are bound to be a few wiseguys/wisegals with dirty minds who couldn’t resist chuckling and wondering “yeah, well if yeast have sex, they must get STDs too, right? Ha ha!”

Yeah, well, very funny.

Of course they do.

In fact, that bottle of hefeweizen you may have consumed at one time or another was almost certainly full of Yeast Herpes!

Alert readers will be wondering how I can have said “there don’t seem to be any viruses of yeast” in the last post and now be telling you you’ve been eating and drinking yeast-herpes all your life.

Here’s the deal: Generally when we think of viruses we’re thinking of little protein-wrapped packages of genetic material floating around freely, which can ultimately attach to and infect some cell, forcing the cell to make more copies of the virus which are released one way or another to continue the cycle.

Fungi, including yeasts, don’t seem to have any viruses that infect their cells from outside. They do, however, have “virus-like particles”, which seem like they were probably once more traditional types of virus, whose populations have lost whatever genes were necessary to be released from one yeast cell and infect another. Without this ability, there’s only one good way for the virus to spread from an infected cell to an uninfected one: sex.

It would seem that there is so much yeast-sex going on that it ends up being a much more efficient way for the viral particles to spread. As a result, despite the fact that only the cell fusion of yeast-sex can spread the particles, there are very few known yeast strains that don’t carry double-stranded RNA virus particles (“L-A”, “L-BC”, “M1”, “M2”, and possibly some others), and there don’t seem to be any known yeast strains that aren’t infected with yeast-herpes.

It’s not actually “herpes” of course (herpesviruses are DNA viruses), but it does behave like a retrovirus: it is merged into the yeast’s own DNA strands and is then transcribed into RNA to make virus particles. These in turn get converted back to DNA by reverse transcriptase and integrated into the infected cell’s genome. The review from which I got all of this information[1] mentions three versions of these “retrotransposons”, designated “Ty1”, “Ty2”, and “Ty3”. (I assume that’s “Transposon, yeast”.)

If anyone stares at you when you yell “Yeast herpes! NOOOOO!!!!” and run screaming from the room next time someone offers you a beer, feel free to point them to this post for an explanation.

POSTSCRIPT: My previous post made it sound like yeast cells were normally haploid. The review paper I’m citing in this post makes an interesting assertion though: it states that in the wild, yeast cells are usually diploid, and haploid cells normally only show up as a result of environmental stresses. This is somewhat at odds with, for example, a more recent genetics textbook[2] that I have in my possession, which explicitly states that once the two haploid mating cells merge to form a diploid cell, it “promptly undergoes meiosis to produce four haploid ascospores”. This may perhaps be a case of a difference between growth in laboratory conditions versus normal environmental conditions. Perhaps in the natural environment, which has not been carefully formulated to specifically promote yeast growth, diploid yeast cells persist until particular conditions induce meiosis. Hopefully the spiffy new book I have on order will show up one of these days and will have some discussion of the topic.

[1] Wickner RB: “Yeast virology.” FASEB J. 1989 Sep;3(11):2257-65.
[2] Snustad DP, Simmons MJ: “Principles of Genetics (3rd Edition)”; 2003; John Wiley & Sons, Hoboken NJ [ISBN: 0471441805], pp 42-43

I should be getting more done…

In the name of the Noodle Monster! It’s been over a week since my last post!

“Someone” seems to have located a replacement original disk of a game I had many years ago (but lost when I loaned it to someone) and bought it for me. Now, in addition to a variety of issues I need to deal with related to moving over the next few months, I have this delightfully surreal old computer game beckoning at me. ARGH! MAKE IT STOP!

Meanwhile, I’ve been trying to put together topics for next week’s “Just Science 2008”. We’ll find out who, besides me, is interested in fermentation once it starts. I think I’ll have to start off the series with a post on evolution, however, since it really does play a fundamental role when it comes to yeast culture. I also think I may be able to work JellO® into at least one of the posts, too…

Internet connection will be spotty the rest of this week as we travel towards the area that is to be our New Home, but I should have posts assembled in time for next week.

If I get a chance, there will hopefully be at least one more Geostrings post, possibly with a sample mp3 and/or Ogg/Vorbis audio file.

My “geostrings” project, and coming attractions.

I have set up a more permanent “page” for my little project to come up with a way to embed geotags in things like mp3, Ogg/Vorbis, video files, text documents, image formats besides jpeg and geotiff, and so forth. I’ve got a definition of the format and a basic description of the parsing algorithm for it up there. Embedding and decoding examples and so forth will follow soon, though I’m hoping for some comments before I get too deep into assuming I’ve got the format finalized.

Meanwhile, I’ve signed on for this year’s “Just Science” week, so I’ve got to get together at least five consecutive days’ worth of science posts to go up between February 4th and 8th. Fortunately, I think I can fill most if not all of it with the brewing science (and yeast culture in particular) stuff I’ve been researching. I’d still like to get my hands on at least one more paper which isn’t readily available to me (Gasent-Ramírez JM, Castrejón F, Querol A, Ramón D, Benítez T.: “Genomic stability of Saccharomyces cerevisiae baker’s yeasts.”; Syst Appl Microbiol. 1999 Sep;22(3):329-40.) but I do have quite a few others that I’m going over.

Proposed format(s) for geotagging arbitrary types of media

Yet more thoughts on geotagging – here’s what I’ve come up with so far.

The format needs to handle only two fundamental data types – points and polygons. It also obviously needs to handle “lines” or tracks, but those are made of “points”. Polygon, for my purposes, might be unnecessary and I’m not sure if I should leave it in. I’m reluctant to leave it out – that way you could easily georeference media to a building or field’s outline, for example. On the other hand, I’m trying to keep this format terse and concise – I’m not trying to merely embed .gpx or .kml files in things.

A “point”, as I am thinking of defining it here, is made of up to seven attributes (more or less in order of importance): a latitude/longitude pair, elevation, timestamp, track-ID, heading, and angle. A polygon is the same, except that it contains a list of at least three lat/lon/optional-elevation sets. It still only has a single timestamp, though, just like a “point”. I suppose in some odd cases one could even define a track as a series of polygons – defining the field of view in a video taken from the bottom of an airplane that’s taking off, for example.
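Since a track is nothing more than points that share a track ID, grouping already-parsed points takes only a few lines. This is a sketch only: `group_into_tracks` is a hypothetical name, and it assumes each point is an associative array with an optional `trackid` key, kept in the order the points were found in the file.

```php
<?php
// Hypothetical helper: split parsed points into tracks and standalone points.
function group_into_tracks(array $points): array
{
    $tracks = [];
    $standalone = [];
    foreach ($points as $point) {
        $id = $point['trackid'] ?? '';
        if ($id === '') {
            // No track ID: the point is unrelated to any other point.
            $standalone[] = $point;
        } else {
            // Same track ID: append to that track, preserving file order.
            $tracks[$id][] = $point;
        }
    }
    return ['tracks' => $tracks, 'standalone' => $standalone];
}
```

Ordering the points within a track by timestamp would be a natural next step, but it needs real timestamp parsing once UTC offsets differ, so it is left out here.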

Leaving aside the question of polygons for now, I’m envisioning two possible formats which I will arbitrarily name “geotag” (XML-type) and “geostring” (simple text) for the moment.

I picture a geotag entry looking something like this:

<geotag:point lat="41.228063" lon="-115.058119" elev="1720.901m" datetime="20071115T143000-06" trackid="1" heading="340" angle="-5.0">Metropolis Hotel</geotag:point>

In this format, the optional description of the point is between the opening and closing tags there. “lat” and “lon” might be better as a single “latlon” or “coord” attribute, with the latitude and longitude separated by commas (i.e. <geotag:point coord="41.228063,-115.058119">:</geotag:point>)
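Generating such an element is mostly string assembly plus escaping. The sketch below is an illustration under assumptions: `build_geotag_point` is a hypothetical name, attributes are emitted in whatever order they are passed in, and no check is made that the attribute names are ones the format would define.

```php
<?php
// Hypothetical helper: build a <geotag:point> element from attribute pairs.
function build_geotag_point(array $attributes, string $description = ''): string
{
    $xml = '<geotag:point';
    foreach ($attributes as $name => $value) {
        // Escape quotes and markup characters inside attribute values.
        $xml .= ' ' . $name . '="' . htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1) . '"';
    }
    // The optional description sits between the opening and closing tags.
    return $xml . '>' . htmlspecialchars($description, ENT_NOQUOTES | ENT_XML1) . '</geotag:point>';
}
```

Passing the seven attributes from the example above along with the description "Metropolis Hotel" reproduces that example line.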

A “geostring” point might look something like this instead:

geostring:point:41.228063:-115.058119:1720.901m:20071115T143000-06:1:340:-5.0:geostring

Not sure if the closing “geostring” is really necessary here, but it would make backwards-compatibility easier if fields were added to future revisions. As with the geotag, it might be better to treat the lat/lon pair (the only mandatory information for a minimal “point” definition) as a single field, so the minimal “geotag” example above done as a “geostring” would look something like: geostring:41.228063,-115.058119::::::geostring
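Here is a rough sketch of building the combined-coordinate variant, with empty optional fields left in place so the field count stays fixed. `build_geostring` and its array keys are hypothetical names, and the field order (elevation, timestamp, track ID, heading, angle) is taken from the examples above.

```php
<?php
// Hypothetical helper: build a geostring with lat/lon combined into one field.
function build_geostring(string $lat, string $lon, array $optional = []): string
{
    // Optional fields, in the order used by the examples in this post.
    $order = ['elev', 'timestamp', 'trackid', 'heading', 'angle'];
    $fields = [$lat . ',' . $lon];
    foreach ($order as $name) {
        // A missing field is written as an empty string between colons.
        $fields[] = $optional[$name] ?? '';
    }
    return 'geostring:' . implode(':', $fields) . ':geostring';
}
```

Called with only a latitude and longitude, this yields the minimal form shown above, six colons and all.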

Even as I write this, I find myself leaning towards combining the latitude and longitude into a single field, if for no other reason than it means each point only has one required field. Either way, I currently think the fields ought to be defined thus:

  • latitude and longitude are decimal degrees. Either may be prefixed by a + or - (lat: + = “Northern Hemisphere”, - = “Southern Hemisphere”; lon: + = East, - = West) – if neither is there, + will be assumed. Latitude and longitude are required for every point.
  • Elevation may be suffixed by “m” or “f” (for “meters” or “feet”). If neither is specified, meters are assumed.
  • Timestamp is in the ISO 8601 “basic format”. If neither “Z” nor an offset from UTC is specified, “the viewer’s local time” should be assumed (which is kind of silly, but it still would allow one to synchronize a track with, say, an audio recording or video.)
  • trackid is any arbitrary alphanumeric term with a maximum of, say, 16 characters (is that enough?) Any points with the same trackid are assumed to be part of the same track. If unspecified, the point is assumed to be unrelated to any other points (if any exist) that may be in the same file.
  • Heading is in decimal degrees from 0 to 360. This represents facing a particular (horizontal) direction from the point in question. “Which direction the camera was pointing” in the case of a photograph.
  • Angle is in decimal degrees from -90 to 90. This represents an angle above or below the current elevation at that point (for a picture, this would represent the upward or downward angle that the camera was pointing when the picture was taken.)
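The field rules above translate fairly directly into checks. What follows is a sketch, not a reference validator: all function names are hypothetical, the latitude and longitude range limits are the usual ones rather than anything stated in the list, the timestamp check only tests the shape of the string, and the 16-character track ID limit is the tentative figure from the list.

```php
<?php
// Plain decimal number with an optional leading sign.
const DECIMAL_PATTERN = '/^[+-]?\d+(?:\.\d+)?$/';

function valid_latlon(string $lat, string $lon): bool
{
    // Range limits are an assumption (the usual -90..90 and -180..180).
    return preg_match(DECIMAL_PATTERN, $lat) === 1
        && preg_match(DECIMAL_PATTERN, $lon) === 1
        && abs((float) $lat) <= 90
        && abs((float) $lon) <= 180;
}

// Returns the elevation in meters, or null if the field is malformed.
function elevation_in_meters(string $elev): ?float
{
    if (preg_match('/^([+-]?\d+(?:\.\d+)?)([mf]?)$/', $elev, $m) !== 1) {
        return null;
    }
    $value = (float) $m[1];
    // "f" means feet; "m" or no suffix means meters.
    return $m[2] === 'f' ? $value * 0.3048 : $value;
}

function valid_timestamp(string $ts): bool
{
    // ISO 8601 basic format: YYYYMMDDThhmmss, then nothing, "Z", or +/-hh[mm].
    return preg_match('/^\d{8}T\d{6}(?:Z|[+-]\d{2}(?:\d{2})?)?$/', $ts) === 1;
}

function valid_trackid(string $id): bool
{
    // Alphanumeric, at most 16 characters.
    return preg_match('/^[A-Za-z0-9]{1,16}$/', $id) === 1;
}

function valid_heading(string $heading): bool
{
    return preg_match(DECIMAL_PATTERN, $heading) === 1
        && (float) $heading >= 0 && (float) $heading <= 360;
}

function valid_angle(string $angle): bool
{
    return preg_match(DECIMAL_PATTERN, $angle) === 1
        && abs((float) $angle) <= 90;
}
```

Normalizing elevation to meters at parse time means nothing downstream has to care which suffix the tag used.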

Hmmm, if I shorten “geostring” to “geostr” and either eliminate the “data type” field (“point”) or just reduce it to a single letter, that entire and complete “geostring” example would fit even into a single tiny 64-character comment field, if there are any file formats still floating around limited to that kind of small metadata size.

My main goal here is to make it easy to create files tagged with this information. So long as it’s easily read and not likely to get separated from the file it describes, using the data for anything ought to be easy, even if one has to do it “by hand”. As was mentioned on the “Into the Pudding” blog (found via the GeoRSS blog), having applications that can read metadata is useless if nobody’s putting the metadata in their files to begin with. If an acceptable format can be worked out, I intend to start making as much georeferenced information available as possible.

Who’s with me? Comments, suggestions, offers of patronage, anyone?

More on geotagging

Some good comments came up in the last post on georeferencing. I thought a followup post was merited.

The itch I’m trying to scratch here is that I want to be able to georeference just about any kind of data, and I want to be able to embed the georeference information directly in the data file, whether it’s a graphic, or audio, or video, or gene sequence data, or anything else. I want to have a standard form for tagging any of these files. And I don’t want to store the location metadata in a separate file.

What I think I need, then, is a standard, simple way of making geographic notations in a terse, concise format that is both easily parsed by and readily recognizable to a computer, is reasonably human readable, and can be made to fit just about anywhere that arbitrary text is allowed.

Right now, there are only two types of files that have some way of embedding geographic information into them that I know of. The obvious one is that EXIF data in JPEG files can contain “GPS” tags. For hardcore GIS people, GeoTIFF is the other one. Both are for photographs or other still-image data only. What about the rest?

A variation of one of the current geotagging XML formats like the W3C (“<geo:lat>41.4354840</geo:lat><geo:long>-112.6660845</geo:long>”) or GeoRSS is an obvious possibility. XML has two potential problems though, as I see it. First, it’s not very terse – the markup substantially increases the amount of space the information takes up. I think in most cases that wouldn’t necessarily be a problem, but I suspect there are a few file formats out there with only comparatively small spaces set aside for a “comment” or “description” field.

The second potential “problem” is something odd that occurred to me today: it’s hard to pronounce out loud. There are some popular audio formats (e.g. “.wav”) that as far as I know have no space whatsoever for arbitrary text…but if my little standard was something that could be distinctly spoken, someone making a recording could literally speak the metadata in a format that a speech-to-text engine (like Sphinx) might be able to recognize and convert to a compatible string of text which could be parsed just like data from anywhere else. This is something of a corner case, I admit, but I think it’s at least worth considering.

Another good point that came up was what you do if your data extends beyond a single point. For example, if I want to georeference an audio recording I might make while narrating what I’m seeing out the window of a speeding train, it makes good sense to at least try to store line segments rather than just a point. That way, if someone wants to find the spot within a several-mile stretch where I suddenly exclaim “Hey, wow, look at that!” they can. The ability to define areas with a polygon or a point-and-radius seems like it would be handy, too, though obviously much more optional.

So, let’s see, I’m looking for a format with minimal markup, but which is easily recognized, is made of plain text which could be crammed into, say, a PNG tEXt chunk, an mp3 comment frame, a GenBank “Source” field, or any other field which allows arbitrary text. I want a form that’s minimally objectionable to anyone else who might be willing to use it. And I think I want it to be able to handle points consisting of at least latitude, longitude, optional elevation, optional timestamp, and possibly even an optional heading and angle, and to handle more than one point per file (for the case of lines). Am I forgetting anything?

Besides “going to bed before 3am”?

I want to geotag something besides photographs!

[Image: Cornelia - Queen of the Snow!]

For no particular reason, here is a picture of The Dog in her natural habitat. This picture really has nothing to do with today’s blog post, but since this is supposed to be a happy time of year, I suppose a happy picture is in order.

In case anyone is wondering if I’ve forgotten the supposed microbiological emphasis on this blog, the answer is no. In fact, I’ve got a post on amateur yeast culture brewing, but I’m still researching it a bit.

Meanwhile, it seems reasonable to post about geolocation, which after all is an important and useful trick for associating information with its place in The Big Room.

Geolocation of photographs is well established, at least for JPEG images. There are standard ways of tagging a JPEG file with an ICBM address, and I’ve been having a lot of fun doing this with my own pictures. (If you’re bored, you can browse them on Panoramio, and perhaps in a few weeks may stumble on some of them in Google Earth.)

There doesn’t appear to be any standard way of tagging other forms of media files, though. What if I want to geotag an .mp3 or OGG/Vorbis audio file recorded at a particular spot? Or a “DivX/Xvid” or OGG/Theora video?

Irritatingly, it seems as though a few people have mused about it, but nobody seems to have addressed it. There are projects, like The Freesound Project, which do geolocate sounds, but the geographic information is not actually embedded into the sound files in any way. As far as I can tell, the location is tracked in their own server’s database only. A Google search turned up a post on the “Random Connections” Blog musing about this, but the only application mentioned is adding georss tags to the RSS for a podcast feed, not to the podcast’s audio file itself. Even the otherwise excellent Mapping Hacks book (written before O’Reilly’s current decline into yet another “Proprietary Product® How-To Guides” publisher over the last couple of years) mentions the topic in Hack #59, but that hack disappointingly turns out to have less to do with tagging files than with “interpolating a position from a GPS track, given a timestamp”.

This all comes up because we’re about to go on a roadtrip to check out a part of the country where we seem likely to end up living next year. I’ve been told I’ve got a pretty good voice, so I was considering generating a travelogue series along the way. It appears to be relatively easy to generate a “narrated picture” as a standard mp3 file, the picture being loaded as though it were “album art”. The only aspect of the whole thing that’s missing is geolocation. For now, just being able to easily obtain the ICBM address associated with the file while playing it, so that one could plug the coordinates into Google Maps to see where the recording was done, would be enough. Ideally, though, I’d like to do it in a way that could be considered standardized, so that later on people might be encouraged to add geolocalization plugins to their media-playing software.

Sure, I can just generate a .kml file with a track of where we were, with markers containing picture and audio links. In fact, I probably will, but I don’t want people to have to use Google Maps or Google Earth to make use of the geolocation information associated with the audio.

Any suggestions, anyone?

I’m having too much fun with this.

I finally managed to get Hugin to work, as you can see from the picture of the Dead Fish Museum above.

Okay, it’s the visitor’s center at the Fossil Butte National Monument, but it really is a museum of dead fish. And other fossils. If you click the image to get to the Panoramio page, you can even see where it is on the map: in fact if you zoom in, the building itself is visible in the aerial photo imagery.

Between digiKam’s ability to handle geocorrelation with tracks from my GPS, Panoramio’s support for geolocation and mapping (and connection to Google Earth…), playing with High Dynamic Range digital photography, and now panoramas, I’m beginning to develop an increased urge to travel around and take pictures again…