It’s harder than it looks

Imagine that you have a web site that, among other things, allows your users to search for media (audio and video) using a simple query language.  So, if you want to find Britney Spears videos, you’d just type britney spears in the search box and click the Search button.  Simple, right?

Disclaimer:

The examples below mention particular artists whose content appears legitimately on YouTube and other media sites, and can be legally obtained with the blessing of the copyright holders.

Although it’s possible that content from these artists can also be obtained illegally from other sites, I do not advocate that practice.  I do not support the use of any Internet search technology to obtain music, video, or other electronic media illegally. 

Companies that operate search engines do not knowingly index such illegal content.  Reputable companies remove links to illegal content as required by the Digital Millennium Copyright Act (DMCA), when the existence of that content is made known in accordance with the DMCA’s notification procedures.

Except it turns out that britney and spears are pretty common spam terms in metadata (the keywords and description fields of YouTube videos, for example).  People will upload all manner of stuff to YouTube and put bogus terms in the description in an attempt to get people to watch the video.  To reduce the number of irrelevant or inappropriate results returned (it’s probably impossible to eliminate irrelevant content), you decide to index the metadata by field and allow the user to say which fields are searched.  So, if they want just those videos that have “Britney” and “Spears” in the title field, they would type britney spears IN Title.  That doesn’t eliminate all of the spam, but it reduces it quite a bit.
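
To make the field restriction concrete, here is a minimal sketch in Python. It assumes each video's metadata is a dictionary that maps field name to text, and that a term matches if it appears as a whole word (ignoring case) in any of the chosen fields. The `matches` function and the sample record are made up for illustration. A real site would query a per-field index rather than scan every record.

```python
from typing import Dict, List, Optional


def matches(record: Dict[str, str], terms: List[str],
            fields: Optional[List[str]] = None) -> bool:
    """Return True if every term appears as a word in at least one chosen field.

    fields=None means "search all fields", which is what a plain
    `britney spears` query does.
    """
    chosen = list(record) if fields is None else fields
    words = set()
    for name in chosen:
        # A field the record doesn't have counts as empty text.
        words.update(record.get(name, "").lower().split())
    return all(term.lower() in words for term in terms)


# A made-up spam upload: the artist's name appears only in the description.
video = {
    "Title": "My cat playing the piano",
    "Description": "britney spears lady gaga funny cat video",
}

print(matches(video, ["britney", "spears"]))             # True: found in Description
print(matches(video, ["britney", "spears"], ["Title"]))  # False: not in Title
```

Restricting the search to Title is what filters out the spam upload here. The unrestricted search still finds it.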

It turns out that you have to make the IN keyword case-sensitive.  Otherwise you'd never be able to search for the word "in" in any metadata.  The same is true for any word that you use in your query language.  For example, if we wanted all the videos that contain "Britney" or "Spears" in the title, we'd write britney OR spears IN Title.
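
A case-sensitive keyword check is easy to express in the tokenizer. This sketch assumes that the only keywords are `IN` and `OR` and that tokens are separated by whitespace. The `tokenize` function is hypothetical and is not taken from any particular search engine. Only the exact upper-case spellings become keywords, so `in` and `Or` stay ordinary search terms.

```python
from typing import List, Tuple

# Assumed keyword set for this sketch; only these exact spellings are operators.
KEYWORDS = {"IN", "OR"}


def tokenize(query: str) -> List[Tuple[str, str]]:
    """Split a query on whitespace into (kind, text) pairs.

    The keyword test is case-sensitive: "OR" is an operator, but "or"
    and "Or" are search terms like any other word. Deciding that the
    word after IN is a field name is left to the parser.
    """
    return [("KEYWORD" if word in KEYWORDS else "TERM", word)
            for word in query.split()]


print(tokenize("britney OR spears IN Title"))
print(tokenize("songs in the key of life"))  # every word is a TERM
```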

Still, not too hard, right?  But what if you want to search the Title field and the Description field?  At first you’d think you could write:  britney spears IN Title OR Description.  You could make that work until you take into account the possibility of more complex query expressions.  For example, let’s say you wanted a list of all videos that claim to be a Led Zeppelin song, or some version of Stairway to Heaven.  One possible query would be:

led zeppelin IN Artist OR Description OR stairway heaven IN Title

Although that query might look reasonable to a non-programmer, writing a computer program to properly handle the general case of queries like that is non-trivial.  The query can be parsed in several different ways.  Here are three of them:

(led zeppelin IN Artist OR Description) OR (stairway heaven IN Title)
(led zeppelin IN Artist) OR ((Description OR stairway) heaven IN Title)
(led zeppelin IN Artist) OR (Description OR (stairway heaven) IN Title)

All three of those interpretations are perfectly valid.  Operator precedence rules can disambiguate some of the cases, but if you work through the exercise you'll find that IN has to have lower precedence than OR.  Do that, and you end up with:

(led zeppelin IN Artist OR (Description OR stairway heaven)) IN Title

You end up having to either decorate the field names (e.g. "@Artist") or group them with brackets or parentheses (e.g. IN [Artist OR Description]).
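
Here is a rough sketch of the bracketed approach in Python. It rests on a few assumptions: adjacent terms are AND-ed, OR separates clauses, and each clause may end with one `IN [...]` field list whose names are separated by `OR`. The `Clause` type and the `parse` function are hypothetical names. This is one grammar that works, not the only one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clause:
    terms: List[str]                                  # implicit AND: every term must match
    fields: List[str] = field(default_factory=list)   # empty list = search all fields


def parse(query: str) -> List[Clause]:
    """Parse a query into clauses that are OR-ed together.

    Assumed grammar:
        query  := clause ("OR" clause)*
        clause := term+ ("IN" "[" field ("OR" field)* "]")?
    """
    # Pad the brackets with spaces so "[Title]" splits into "[", "Title", "]".
    tokens = query.replace("[", " [ ").replace("]", " ] ").split()
    clauses: List[Clause] = []
    current = Clause(terms=[])
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "OR":
            if not current.terms:
                raise ValueError("OR must follow at least one search term")
            clauses.append(current)
            current = Clause(terms=[])
            i += 1
        elif tok == "IN":
            if not current.terms:
                raise ValueError("IN must follow at least one search term")
            if current.fields:
                raise ValueError("a clause can have only one IN")
            if i + 1 >= len(tokens) or tokens[i + 1] != "[":
                raise ValueError("IN must be followed by [field ...]")
            i += 2  # skip "IN" and "["
            while i < len(tokens) and tokens[i] != "]":
                if tokens[i] in ("IN", "["):
                    raise ValueError(f"unexpected {tokens[i]!r} in field list")
                if tokens[i] != "OR":  # inside brackets, OR only separates field names
                    current.fields.append(tokens[i])
                i += 1
            if i == len(tokens):
                raise ValueError("missing ]")
            if not current.fields:
                raise ValueError("empty field list")
            i += 1  # skip "]"
        elif tok in ("[", "]"):
            raise ValueError(f"unexpected {tok!r}")
        else:
            if current.fields:
                raise ValueError("use OR to start a new clause after a field list")
            current.terms.append(tok)
            i += 1
    if not current.terms:
        raise ValueError("query is empty or ends with OR")
    clauses.append(current)
    return clauses


for clause in parse("led zeppelin IN [Artist OR Description] OR stairway heaven IN [Title]"):
    print(clause)
```

The closing bracket tells the parser exactly where the field names stop, which is the ambiguity that the unbracketed query couldn't resolve. The unbracketed britney spears IN Title OR Description is rejected with an error instead of being guessed at. Note that in this sketch britney OR spears IN [Title] means "britney anywhere, or spears in Title". Supporting parenthesized groups of terms would be the next step.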

All of this is doable, and not especially heavy lifting as far as parsing is concerned.  But then you have to explain it to a non-technical user and make it easy for the non-technical user to use.  Otherwise, only programmers will want to (or even be able to) use it.

I've heard many a programmer (myself included, come to think of it) complain about a search facility that doesn't allow complex queries.  We look at it from a programmer's perspective and think it'd be trivial to implement a comprehensive query facility.  And in most cases we're probably right.  You could develop a query system that anybody with a couple of years of programming experience could use without trouble and get exact results.  And when you flipped the switch to turn it on, you'd hear crickets.  Most users don't understand Boolean algebra or the difference in precedence between AND and OR.  Trust me, people will go somewhere else to get their information rather than have to think about how to ask for it.

What users really want is a DWIM mode:  Do What I Mean.  They want to type word soup into the search and get back exactly what they were looking for, with no false hits (e.g. asking for beatles the music group and getting back something about dung beetles because somebody misspelled "beetle").

But DWIM doesn’t exist.  Not today, and not for a long time (perhaps ever) in the future.  As a result, we have to restrict what the user can type and very carefully specify how things will be interpreted.  We have to make it easy for the most common cases, but able to do moderately complex and powerful things.  That balance is difficult to achieve, and no matter what you come up with, somebody will complain.  You can only hope that the number of users you delight will vastly outweigh those whom you annoy.

Plane crash in Austin

Updates and corrections (full story below):

The pilot’s name is Joseph (Joe) Stack.  He was a software engineer from Austin.  The airplane, a Piper Dakota tail number N2889D, was registered in his name.  He posted a suicide note on his web site at about 9:15, drove to the Georgetown airport and took off about 9:40.  He crashed into the office building shortly thereafter.

The ISP that hosted his web site took the site offline in response to a request by the FBI.  Thanks to the Internet, his suicide note (some are calling it a “manifesto”) will live on.

As of 4:30 PM, there are two reported injured and one still unaccounted for.  I don’t know if that unaccounted-for person is the pilot himself, or somebody who was supposed to be in the building.

My original report:

Around 10:00 this morning, I heard a report on the radio of “something happening” near a major highway intersection here in Austin.  I soon learned that a small plane had hit a building.

The crash started a very big fire, and the building is engulfed in flames.  One report I saw said that the building is likely to be completely destroyed.

It's been about two hours since the incident.  The NTSB is investigating it as an intentional act.  Early reports indicate that the pilot set fire to his own house, then stole an airplane and intentionally flew it into the building.  I haven't yet seen any reports of a motive, and there's still a lot of speculation.

One person from the building is still unaccounted for.  There are reports of two people being transported to the hospital, the extent of their injuries unknown.  There are no deaths reported.

The conspiracy theorists have already jumped on it.  I’ve seen several posts questioning whether “a little plane” could start such a large fire and cause so much damage to the building.  Another bunch of posters are accusing the Obama administration of leaning on authorities and media to prevent the incident from being described as terrorism.   I’d laugh, but it scares me that there are those who take these guys seriously.

Carving Simple Simon the Penguin

I have other projects on my bench at the moment or I would have tried this one already.  A great beginner project, or a quick and fun little project for the more experienced carver, is Simple Simon the Penguin.

Dave Brock presents a three-part video series that will walk you through it step-by-step.  All you need is a piece of basswood that’s approximately 1″x1″x6″.  The penguin itself is only 2-3/4″ long, but you’ll want the extra length to hold on to.  Carve one, flip the stick over and carve the other, and then cut the two penguins apart.

Simple Simon, Part 1
Simple Simon, Part 2
Simple Simon, Part 3

Spalted Maple Dog

Spalting is discoloration of wood caused by fungus, most often during decay.  It can happen to diseased or stressed trees, and rarely in live, healthy trees.  Spalting can create some very beautiful colorations in the wood, as it did in the piece of maple where I found this little dog hiding.


As I said, spalting occurs during decay.  Another side effect of decay is that the wood often becomes softer (sometimes a good thing) and more likely to splinter (not a good thing).  This piece was quite prone to splintering, which cost me the tail, half of the left foot, and part of an ear.  Still, I love the color and I think this is the best face I've done yet.

Out of Control

The President unveiled his new budget today: 3.83 trillion dollars. The numbers fairly boggle the mind. The total budget works out to just about $12,500 per person in the United States, or about $47,500 per family. Or $34,800 for each of the 110 million taxpayers in the country. Of course, 41% (about $1.56 trillion) is deficit spending, meaning that 41 cents of every dollar the government spends in fiscal year 2011 will be paid for (supposedly) in the future. But with an existing debt of $12.5 trillion, this year’s budget will push the accumulated national debt past $14 trillion: about the same as the U.S. Gross Domestic Product. Interest on the debt alone amounts to about $175 billion per year, or about $2,200 per family, 25% of which ends up in the treasuries of other countries that hold U.S. debt securities.

This year, total government debt will exceed total income for the entire country. The White House budget office says that debt will remain at that level through 2019 (that is, debt will roughly equal GDP), but those projections rely on GDP growing faster than most analysts say it can. At $14 trillion, national debt is almost 20% of all household and business assets in the entire country. If government spending continues at this rate, the accumulated federal debt alone will exceed total assets in 20 years or so. That doesn’t include the approximately $40 trillion (currently) in debt owed by local and state governments, corporations, and individuals.

I won’t try to lay the blame for this situation on the President. Not on the current President, and not on the former Presidents. Undoubtedly, they all have contributed to it by proposing budgets that fund pet projects or further their own agendas, but that’s to be expected. No, the real blame lies with Congress for approving such outrageous spending over the decades, and with us–the American voter and taxpayer–for continuing to allow it.

The President on Wednesday announced a proposed spending freeze on domestic discretionary spending as a way of trying to get the deficit under control. As laudable as that is (any freeze or decrease in government spending gets my vote), it's pretty difficult to take it seriously. He's talking about a projected "savings" of about $250 billion over the next 10 years. That's less than 3% of the total debt expected to accumulate over that period, or about 1% of total spending. And it's highly unlikely that Congress will approve even that minuscule spending reduction.

The President is in a tough spot because there are programs he positively cannot touch. Even if he were willing to forgo re-election, there's no way Congress would approve cuts in those programs. Doing so is tantamount to political suicide. What programs? I'm so glad you asked.

The following numbers are from the FY 2010 budget:

  • Social Security is 19.63% of the budget. 13% of the people in this country are over 65 years of age, and a very large percentage of them vote. Need I say more?
  • Medicare is 12.79% of the budget. See above.
  • Unemployment, welfare, and other “mandatory spending” is 16.13% of the budget. Almost untouchable, regardless of which party controls Congress.
  • Medicaid and associated programs: 8.19%. Ditto.
  • Interest on the national debt: 4.63%.  Can’t have us defaulting on our debt.

When you throw in the Department of Defense share of 18.74%, the total comes to 80.11% of the budget that the President has almost no control over. The budget is 20% over revenue before the President even gets to attempt spending reduction. Think of that: if you cut out military and all government spending other than the programs I mentioned above, we’d still have a budget deficit.

This is nothing new, by the way. I remember the same math being presented to me in 1981 or 1982. If anything, the President has fewer options today than Reagan did back then.

I see three ways out of this mess: Reduce spending, raise taxes, or somehow increase GDP by about 50% so that current tax rates will cover the deficit. In the current climate, spending reductions and tax increases are political suicide, and a 50% increase in GDP is impossible. Tax increases are less suicidal in most cases, and they have the “benefit” (in political terms) of pissing off fewer people, so that’s the route Congress will likely take in an attempt to prevent the inevitable. Even so, there’s no way they can make up a 40% budget deficit (or even a 20% deficit) with tax increases.

No. I guess I don’t see any way out of this mess. Our spending is out of control and there isn’t anybody in a position to slow or stop it.  It’s a frightening thought.
