Well, a PHP example, anyway

Once I dove in and started messing around, I only had to fix two typos, and the example I was working on now seems to work correctly, at least to the extent that I’ve tested it. I now have what appears to be a working example of geostring parsing in PHP. The example reads my feed from the Twitter website, sifts out any geostring tags it finds, then generates a Google Maps link for each one. As I write this, there are two geostring tags on that page, representing places (and times) that I have actually been, and it seems to work.
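For comparison, here is roughly what the "sift out the tags, make a map link for each" step might look like in JavaScript. This is only a sketch: the function name geostrMapLinks, the regular expression, and the Google Maps URL form are all assumptions made for illustration, and none of it is taken from the PHP example itself.

```javascript
// Hypothetical sketch: find geostr tags in a block of text and build a map
// link for each one. The function name and the Google Maps URL form are
// assumptions for illustration, not taken from the PHP example.
function geostrMapLinks(text) {
  const isNumber = (s) =>
    s !== undefined && s.trim() !== "" && Number.isFinite(Number(s));
  const links = [];
  // Capture everything between the "geostr:" and ":geostr" markers.
  for (const match of text.matchAll(/geostr:(\S+?):geostr/g)) {
    // The first colon-separated field holds "lat,lon[,elevation]".
    const [lat, lon] = match[1].split(":")[0].split(",");
    if (!isNumber(lat) || !isNumber(lon)) continue; // skip malformed tags
    links.push(
      "https://www.google.com/maps/search/?api=1&query=" +
        encodeURIComponent(`${lat},${lon}`)
    );
  }
  return links;
}
```

Tags whose latitude or longitude is missing or not numeric are silently skipped, which seems like the friendliest behavior when scanning free-form text.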

You can take a look at the source code for the example here, or see it in action here.

Feel free to grab a copy to play with if you’d like (or write one yourself that isn’t so messy – hey, as someone who doesn’t consider himself a professional “coder”, I’m just happy that it did exactly what I wanted it to do on the first try…). You should only need to worry about two things – changing the $text_to_read, and whether or not your web server (or CLI) has fopen wrappers turned on so the script can read another web page if you use a web page as your text to parse rather than a local file.

Since generating a geostring tag is trivial, I didn’t bother trying to incorporate that into this example. If you want one, then here:

<?php
//generate a geostr tag with the most typical information only
//point not part of a track nor including heading or angle
$lat=44.027168;
$lon=-111.297892;
$elev="1711.9m"; //could leave off the "m" and treat as float, since it defaults to "meters"
$timestamp="20071125T123438-06"; //6 hours behind UTC

print("geostr:$lat,$lon,$elev:$timestamp:geostr");
//"full" version: print("geostr:$lat,$lon,$elev:$timestamp,:,:geostr");
//completely unnecessary, but legal
?>

As always, comments and suggestions are welcome.

Off-Topic and Back Again: “Framing”, Cluetrain Manifesto, and Twitter

“Framing” came up briefly on one of the other small independent blogs I follow. I’d link to the post but it’s gone now. I sincerely hope its disappearance wasn’t related to the comment I posted there, unless it was just because of the “don’t feed the trolls” part of it – (in which case excuse me for a moment while I tell myself what an amazing fountain of useful advice I am and feel self-important for about 15 seconds before I return to reality…). I’m guessing the poster just decided he didn’t want to keep the post, but I won’t let that spoil my brief ego-feeding fantasy.

For those lucky enough to have missed it so far, here’s my flippant and extremely brief explanation of my understanding of how the “framing” thing goes. An assistant professor of communications popped up among the science blogs one day with what seemed to begin as a couple of reminders of the obvious (mainly because it occasionally seems that people have forgotten). Namely, that if you want someone to understand what you are trying to communicate (particularly scientific matters) and agree with you, you are more likely to succeed if you can connect what you are discussing to something that your audience already cares about, and you are less likely to succeed if you are, shall we say, unfriendly to them as you present your subject.

From there, “framing” seems to have grown into something resembling the brand name of some kind of mass-market “self-help” product line. Its primary proponent, from the distant vantage point whence I occasionally catch a glimpse of the fight, starts to seem like the Vice President of Communications for Science, Incorporated, whose office issues angry memos denouncing the insubordinate “screechy monkeys” who insist on deviating from the approved language when discussing Science, Inc.’s Mission Statement. The fact that science is a conversation among people rather than a corporation probably explains why so much of the response has been not “Oh, crap, we’d better behave ourselves or we’ll get in trouble” but “Who the heck are you, and why are you telling me what I can say and how I can say it?” And that, I think, is all that needs to be said. (Anyone who stumbles upon my little blog and disagrees is welcome to say so in the comments.)

Actually, it’s probably more than needs to be said, and I wouldn’t have even mentioned it except that the problem of trying to apply this sort of approved “Command and Control” approach towards information in the Internet age reminded me of something else. The Cluetrain Manifesto was published so long ago that AOL was still considered a successful and valuable operation at the time, but it still seems to be relevant. (It’s free to read online – follow the link if you want to do so). Its central thesis seems to be that the “Command and Control” approach to information management favored by corporate and political entities is effectively broken now because of the two-way communication made possible by a ubiquitous internet. In essence, “the market” is no longer made of isolated individuals passively sitting on the couch “consuming” the approved messages coming through the television, but a “conversation” of people who can easily tell the difference between a corporate “message” and authentic human conversation. Here’s a relevant passage:

“Imagine for a moment: millions of people sitting in their shuttered homes at night, bathed in that ghostly blue television aura. They’re passive, yeah, but more than that: they’re isolated from each other.

Now imagine another magic wire strung from house to house, hooking all these poor bastards up. They’re still watching the same old crap. Then, during the touching love scene, some joker lobs an off-color aside — and everybody hears it. Whoa! What was that? People are rolling on the floor laughing. And it begins to happen so often, it gets abbreviated: ROTFL. The audience is suddenly connected to itself.

What was once The Show, the hypnotic focus and tee-vee advertising carrier wave, becomes in the context of the Internet a sort of reverse new-media McGuffin — an excuse to get together rather than an excuse not to. Think of Joel and the ‘bots on Mystery Science Theater 3000. The point is not to watch the film, but to outdo each other making fun of it.”

And now we take one more step towards on-topicness: One current set of the metaphorical wires described in that passage is Twitter. Twitter is kind of like a gigantic lobby at a convention center where some huge conference is going on. The lobby is filled with little groups of people, collectively discussing with each other all kinds of little thoughts, observations, and events that each person there has encountered. You can easily wander through the lobby for hours, listening for snippets of conversation that relate to your own interests. Sure, being a raw, natural, human group of discussions, Sturgeon’s Law (“90% of Everything is Crap”) is in full effect. Sometimes literally: On Twitter I’m tracking the term “brewing” which seems to pick up more metaphorical uses of the word than literal, and a recent “Tweet” that popped up was somebody commenting that someone didn’t flush the toilet (“someone’s been brewing up a 1.6 gallon pot of turd stew.”)

So why bother? Because I think the remaining 10% has enough potential value to make a little mental effort to sift through the stream of messages worthwhile. I’d say a majority of the messages that come through are related to events happening at that moment. Twitter seems to get a lot of use as a back-channel for commenting on things that are happening, and for organizing impromptu gatherings. In most of these cases I think location information would be a valuable addition…and now I’m finally back to “on-topic”.

I think it’d be exceedingly nifty to be able to map Twitter messages in real-time. If I can convince anyone else that my “geostrings” idea is worth using, and then if one were to track “geostr”, any “tweet” with parseable location information would automatically show up. A small tag containing precise location information would make it possible for your computer to automatically alert you if a post was describing something anywhere near where you are. Imagine the case of posts like “I just saw a tornado touch down, I’m going down to the basement now”. Or, say, “Who wants to try the homebrew I’m about to bottle?”
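To give a feel for the "alert me if it's nearby" part, here is a rough JavaScript sketch of a distance check between where you are and where a tagged post came from. It uses the haversine great-circle formula with a mean Earth radius of 6371 km, which is an approximation; the function names distanceKm and isNearby and the idea of a fixed alert radius are assumptions for illustration only.

```javascript
// Hypothetical sketch of a "is this tagged post near me?" check.
// Haversine distance on a sphere; good enough for alerting, not for surveying.
const EARTH_RADIUS_KM = 6371; // mean Earth radius (approximation)

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in kilometers between two lat/lon points
// given in decimal degrees.
function distanceKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// True when the tagged location is within radiusKm of the reader's location.
function isNearby(here, tagged, radiusKm) {
  return distanceKm(here.lat, here.lon, tagged.lat, tagged.lon) <= radiusKm;
}
```

A client tracking "geostr" could run isNearby on each incoming post's coordinates and only raise an alert when it returns true for whatever radius the reader cares about.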

Example code in Javascript and PHP for picking out and parsing geostrings to follow soon. I’ll get back to yeast again shortly thereafter, though.

New toy: “Twitter”

Wow – Celestron takes 8 business days to get me a terse one-sentence answer. BigC responds in one. Impressive. Apparently their technical people are all at trade-shows at the moment so my bigger question will have to wait until they get back, but they were at least able to answer my question about their “tabletop” digital microscopes’ magnification (answer: the “600x” really is optical magnification, not digital.)

Another digital microscopy WANT/DO NOT WANT post to follow when I get the followup reply. Meanwhile, after hearing about it on the This Week in Tech podcast for a while, I finally talked myself into signing up to play with the coincidentally named Twitter system.

It sounds like a really stupid idea – “Oh, goodie, now I can broadcast ‘text messages’ no more than 140 characters long about trivial events in my life to the whole world! Whoopee!” “Wow! I can find out when random strangers are drinking coffee AS IT HAPPENS!” Thrills! Excitement! Adventure!…

On the other hand, having the messaging system watch for particular words might be a handy way of monitoring current events. Plus, there seems to be a lot of potential for fun, off-the-wall uses, even if many of them are kind of silly.

It DOES seem like kind of an ideal context to play with that “geostrings” concept I’ve been toying with. A terse, easily-machine-parsed format for geotag data that can fit into a “twitter” post and still leave room for a sentence or two to go with the geographic information seems like it might be useful. If you’re so incredibly bored that you want to see some examples, you can check out my own Twitter posts, several of which I’ve embedded geostrings into.

I should be getting more done…

In the name of the Noodle Monster! It’s been over a week since my last post!

“Someone” seems to have located a replacement original disk of a game I had many years ago (but lost when I loaned it to someone) and bought it for me. Now, in addition to a variety of issues I need to deal with related to moving over the next few months, I have this delightfully surreal old computer game beckoning at me. ARGH! MAKE IT STOP!

Meanwhile, I’ve been trying to put together topics for next week’s “Just Science 2008”. We’ll find out who, besides me, is interested in fermentation once it starts. I think I’ll have to start off the series with a post on evolution, however, since it really does play a fundamental role when it comes to yeast culture. I also think I may be able to work JellO® into at least one of the posts, too…

Internet connection will be spotty the rest of this week as we travel towards the area that is to be our New Home, but I should have posts assembled in time for next week.

If I get a chance, there will hopefully be at least one more Geostrings post, possibly with a sample mp3 and/or Ogg/Vorbis audio file.

My “geostrings” project, and coming attractions.

I have set up a more permanent “page” for my little project to come up with a way to embed geotags in things like mp3, Ogg/Vorbis, video files, text documents, image formats besides jpeg and geotiff, and so forth. I’ve got a definition of the format and a basic description of the parsing algorithm for it up there. Embedding and decoding examples and so forth will follow soon, though I’m hoping for some comments before I get too deep into assuming I’ve got the format finalized.

Meanwhile, I’ve signed on for this year’s “Just Science” week, so I’ve got to get together at least five consecutive days’ worth of science posts to go up between February 4th and 8th. Fortunately, I think I can fill most if not all of it with the brewing science (and yeast culture in particular) stuff I’ve been researching. I’d still like to get my hands on at least one more paper which isn’t readily available to me (Gasent-Ramírez JM, Castrejón F, Querol A, Ramón D, Benítez T.: “Genomic stability of Saccharomyces cerevisiae baker’s yeasts.”; Syst Appl Microbiol. 1999 Sep;22(3):329-40.) but I do have quite a few others that I’m going over.

Gather and harken unto my tale of woe!

Well, this roadtrip has been rather difficult so far. Not necessarily bad, but definitely difficult.

It was about a week before the weather would let up enough for us to even escape our home state. I came down with a cold as we were leaving. The campground we were originally going to be staying at on the second night was mysteriously closed for the season despite supposedly being open year-round. Panoramio appears to have forgotten that I exist and won’t let me log in to upload more of my photos (and I’ve not yet heard back from the email contacts there about getting back in). And then on the third night, neither of the truckstops next to our campground had sour cream. And the following morning, after stopping briefly to pick up some food for breakfast, the truck sputtered and died on the way up the onramp to continue the trip. And then we had some stress and confusion getting things worked out initially with the RV’ers organization to get towed to a repair shop and a campground. And then my wife has apparently picked up the cold that I’m getting over now. And then someone took a doody in my sandbox…oh, wait. That was just a “song” on one of my CDs. Never mind.

On the other hand, we did manage to finally escape our home state, we did find a replacement campground for the second night, we did get everything worked out okay, and our truck’s problem turned out to be a relatively minor issue with the distributor though it evidently took a fair amount of labor to extract, fix, reassemble, and reinstall it, and we should be able to get back on the road in the morning. So, enough whining from me for now.

Meanwhile, I’ve thought about my “geotagging arbitrary files” issue a bit more. At this point I’m favoring the “geostrings” approach, split into what I’m calling “Where, When, and Whither” fields, which is to say, a field containing location (latitude, longitude, elevation), a field containing time-related information (timestamp, track-id), and a field containing direction (heading and angle) information. I’ve actually started putting geostrings in this form into some of the pictures I’ve been taking, just to get a feel for how easy or hard they are to work with. An example containing all information including the optional stuff would look like this:

geostr:35.068531033,-106.5019369,1716.0905m:20080104T122418-06,track01:60,20:geostr

The “where” field is latitude, longitude, and elevation, separated by commas. The “when” is a timestamp in the ISO 8601 basic format and a track ID, and the “whither” indicates a heading of 60° and an upward angle of 20°. The colon-separated fields and the comma-separated data within each field are in order from (as I perceive it) most important to least important. Aside from the latitude and longitude, and the “geostr” markers on either side, everything is optional.
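Here is a rough JavaScript sketch of how a tag in this form might be pulled apart into its "Where, When, and Whither" pieces. It is only one possible reading of the format as described above: the function name parseGeostr, the regular expression, and the shape of the returned object are assumptions for illustration, and missing optional values simply come back as null.

```javascript
// Hypothetical sketch: parse the first "geostr:...:geostr" tag found in text.
// Field order follows the post: where (lat,lon,elev) : when (timestamp,track)
// : whither (heading,angle). Names and return shape are assumptions.
function parseGeostr(text) {
  const match = /geostr:(\S+?):geostr/.exec(text);
  if (!match) return null;

  // Split into the colon-separated fields; absent fields default to "".
  const [where = "", when = "", whither = ""] = match[1].split(":");
  const [lat, lon, elevation] = where.split(",");
  const [timestamp, trackId] = when.split(",");
  const [heading, angle] = whither.split(",");

  const orNull = (value) => (value === undefined || value === "" ? null : value);
  const numberOrNull = (value) => (orNull(value) === null ? null : Number(value));

  // Latitude and longitude are the only required values.
  const latNum = numberOrNull(lat);
  const lonNum = numberOrNull(lon);
  if (latNum === null || lonNum === null) return null;
  if (!Number.isFinite(latNum) || !Number.isFinite(lonNum)) return null;

  return {
    lat: latNum,
    lon: lonNum,
    elevation: orNull(elevation), // kept as text, e.g. "1716.0905m"
    timestamp: orNull(timestamp), // kept as text, ISO 8601 basic format
    trackId: orNull(trackId),
    heading: numberOrNull(heading),
    angle: numberOrNull(angle),
  };
}
```

Because the basic-format timestamp contains no colons, splitting the tag body on ":" is enough to separate the three fields, which is part of what makes the format easy to handle even by hand.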

Comments?

Linking this more-relevant latter portion of the post to the whining at the beginning is the fact that the cold I’m now getting over has messed up my voice. I did bring microphones and both my computer and some cheap portable recording gadgets, so at some point along the way I still want to do at least one short audio recording, geotagged and including an embedded image to go with it. I just need to wait for my voice to properly return (and to spot something about which I feel an urge to inflict people with my blabbering.)

Proposed format(s) for geotagging arbitrary types of media

Yet more thoughts on geotagging – here’s what I’ve come up with so far.

The format needs to handle only two fundamental data types – points and polygons. It also obviously needs to handle “lines” or tracks, but those are made of “points”. Polygon, for my purposes, might be unnecessary and I’m not sure if I should leave it in. I’m reluctant to leave it out – that way you could easily georeference media to a building or field’s outline, for example. On the other hand, I’m trying to keep this format terse and concise – I’m not trying to merely embed .gpx or .kml files in things.

A “point”, as I am thinking of defining it here, is made of up to seven attributes (more or less in order of importance): a latitude/longitude pair, elevation, timestamp, track-ID, heading, and angle. A polygon is the same, except that it contains a list of at least three lat/lon/optional-elevation sets. It still only has a single timestamp, though, just like a “point”. I suppose in some odd cases one could even define a track as a series of polygons – defining the field of view in a video taken from the bottom of an airplane that’s taking off, for example.

Leaving aside the question of polygons for now, I’m envisioning two possible formats which I will arbitrarily name “geotag” (XML-type) and “geostring” (simple text) for the moment.

I picture a geotag entry looking something like this:

<geotag:point lat="41.228063" lon="-115.058119" elev="1720.901m" datetime="20071115T143000-06" trackid="1" heading="340" angle="-5.0">Metropolis Hotel</geotag:point>

In this format, the optional description of the point is between the opening and closing tags there. “lat” and “lon” might be better as a single “latlon” or “coord” attribute, with the latitude and longitude separated by a comma (i.e. <geotag:point coord="41.228063,-115.058119">:</geotag:point>).

A “geostring” point might look something like this instead:

geostring:point:41.228063:-115.058119:1720.901m:20071115T143000-06:1:340:-5.0:geostring

Not sure if the closing “geostring” is really necessary here, but it would make backwards-compatibility easier if fields were added to future revisions. As with the geotag, it might be better to treat the lat/lon pair (the only mandatory information for a minimal “point” definition) as a single field, so the minimal “geotag” example above done as a “geostring” would look something like: geostring:41.228063,-115.058119::::::geostring
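As a rough illustration of the combined lat/lon variant, here is a JavaScript sketch that writes out such a string, leaving the optional fields empty when they are not supplied. The function name buildGeostring and the property names on the point object are assumptions for illustration; values are passed as text so that something like "-5.0" is written exactly as given.

```javascript
// Hypothetical sketch: build a colon-separated "geostring" with lat/lon
// combined into one field. Optional fields that are missing are written as
// empty strings so the field positions stay fixed.
function buildGeostring(point) {
  const optional = [
    point.elevation,
    point.timestamp,
    point.trackId,
    point.heading,
    point.angle,
  ].map((value) => (value === undefined || value === null ? "" : String(value)));

  return ["geostring", `${point.lat},${point.lon}`, ...optional, "geostring"].join(":");
}
```

Keeping the empty fields in place means a reader can always find a value by position, at the cost of a run of colons in the minimal case.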

Even as I write this, I find myself leaning towards combining the latitude and longitude into a single field, if for no other reason than it means each point only has one required field. Either way, I currently think the fields ought to be defined thus:

  • latitude and longitude are decimal degrees. Either may be prefixed by a + or - (lat: + = “Northern Hemisphere”, - = “Southern Hemisphere”; lon: + = East, - = West); if no sign is given, + is assumed. Latitude and longitude are required for every point.
  • Elevation may be suffixed by “m” or “f” (for “meters” or “feet”). If neither is specified, meters are assumed.
  • Timestamp is in the ISO 8601 “basic format”. If neither “Z” nor an offset from UTC is specified, “the viewer’s local time” should be assumed (which is kind of silly, but it still would allow one to synchronize a track with, say, an audio recording or video.)
  • trackid is any arbitrary alphanumeric term with a maximum of, say, 16 characters (is that enough?) Any points with the same trackid are assumed to be part of the same track. If unspecified, the point is assumed to be unrelated to any other points (if any exist) that may be in the same file.
  • Heading is in decimal degrees from 0 to 360. This represents facing a particular (horizontal) direction from the point in question. “Which direction the camera was pointing” in the case of a photograph.
  • Angle is in decimal degrees from -90 to 90. This represents an angle above or below the current elevation at that point (for a picture, this would represent the upward or downward angle that the camera was pointing when the picture was taken.)
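The field rules above could be checked with something like the following JavaScript sketch. All of the function names are hypothetical, the latitude and longitude limits (±90 and ±180) are my assumption since the list does not state them, and a timestamp without “Z” or an offset is handed to JavaScript's Date as-is, which treats it as local time.

```javascript
// Hypothetical helpers for the field definitions listed above.
const METERS_PER_FOOT = 0.3048;

// "1711.9m", "5616f", or "1711.9" (meters assumed) -> meters, or null if malformed.
function elevationToMeters(elevation) {
  const match = /^([+-]?\d+(?:\.\d+)?)([mf])?$/.exec(elevation);
  if (!match) return null;
  const value = Number(match[1]);
  return match[2] === "f" ? value * METERS_PER_FOOT : value;
}

// ISO 8601 basic format ("20071115T143000-06") -> Date, or null if malformed.
function parseBasicTimestamp(timestamp) {
  const pattern = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})(Z|[+-]\d{2}(?:\d{2})?)?$/;
  const match = pattern.exec(timestamp);
  if (!match) return null;
  const [, year, month, day, hour, minute, second, zone = ""] = match;
  let offset = zone; // "" (local time) or "Z" pass through unchanged
  if (zone.length === 3) offset = `${zone}:00`; // "-06" -> "-06:00"
  if (zone.length === 5) offset = `${zone.slice(0, 3)}:${zone.slice(3)}`; // "-0600" -> "-06:00"
  const date = new Date(`${year}-${month}-${day}T${hour}:${minute}:${second}${offset}`);
  return Number.isNaN(date.getTime()) ? null : date;
}

// Range checks; heading and angle are optional and may be null.
function isValidPoint({ lat, lon, heading = null, angle = null }) {
  const inRange = (v, min, max) => Number.isFinite(v) && v >= min && v <= max;
  if (!inRange(lat, -90, 90) || !inRange(lon, -180, 180)) return false; // assumed limits
  if (heading !== null && !inRange(heading, 0, 360)) return false;
  if (angle !== null && !inRange(angle, -90, 90)) return false;
  return true;
}
```

Converting elevation to a single unit and the timestamp to a real date up front keeps later steps, such as sorting the points of a track by time, free of format details.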

Hmmm, if I shorten “geostring” to “geostr” and either eliminate the “data type” field (“point”) or just reduce it to a single letter, that entire and complete “geostring” example would fit even into a single tiny 64-character comment field, if there are any file formats still floating around limited to that kind of small metadata size.

My main goal here is to make it easy to create files tagged with this information. So long as it’s easily read and not likely to get separated from the file it describes, using the data for anything ought to be easy, even if one has to do it “by hand”. As was mentioned on the “Into the Pudding” blog (found via the GeoRSS blog), having applications that can read metadata is useless if nobody’s putting the metadata in their files to begin with. If an acceptable format can be worked out, I intend to start making as much georeferenced information available as possible.

Who’s with me? Comments, suggestions, offers of patronage, anyone?