Jim’s Random Notes

March 9th, 2010

What is dragonwood?

It’s rare that I’m stumped when I try to find something on Google, but this one beat me.  Somebody on the woodcarving forum asked about “dragonwood.”  Always curious, I thought I’d look it up.

Dragonwood appears to be very commonly used for the trunks and larger branches of artificial (silk) trees.  It’s also commonly used to make perches for pet birds, and I gather somewhat less commonly used to make cat trees and cheap furniture.  That’s all interesting, but I couldn’t find a picture of a dragonwood tree or anything that gave me the botanical name of the silly thing.  The best I could find is that it grows in Florida.

Somebody else on the forum posted an answer this afternoon, identifying the wood as Lyonia ferruginea (rusty staggerbush), a shrub or small tree that grows in Florida, Georgia, and South Carolina.  In case you’re interested, that person also indicated that it’s good carving wood.

I’m really surprised that this one stumped me.  The common name dragonwood (less often, “dragon wood”) is used in a lot of places, but I was unable to find any reference that showed its botanical name.  I figured I could find it just like I can type “bottle brush tree” and get the botanical name.  No such luck.

One resource said that “dragonwood” was a corruption of the original “draggin’ wood”, which describes how they get the wood out of the thicket after it’s cut.

Hopefully anybody else looking for a description of dragonwood will find this post and not have to wade through a few dozen pages of links to fake plants and parrot cage goodies.

October 21st, 2009

Sniffing network traffic

My latest crawler modifications require me to scrape Web pages that host videos so that I can obtain metadata (title, description, date posted, etc.) that we place in our index.  Unfortunately, there’s no standard way for sites to present such information.  ESPN and Vimeo have HTML <meta> tags that provide some info, but I have to go parsing through the body of the document to find the date.  (And yes, I’m aware that Vimeo has an API that will make this a moot point.  I’ll be investigating that soon.)
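For sites that do expose metadata in <meta> tags, pulling it out takes only a few lines. The following is a minimal sketch using Python's standard html.parser module, not the crawler's actual code. The sample page and the tag names in it ("title", "description") are made up for illustration; real sites use whatever names they like, and the date often still has to be dug out of the body.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/property -> content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Sites differ on whether they use name= or property=.
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content


def extract_meta(html_text):
    """Return a dict of the <meta> name/content pairs found in html_text."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """
<html><head>
<title>Example video page</title>
<meta name="title" content="Example clip">
<meta name="description" content="A made-up description.">
</head><body></body></html>
"""

metadata = extract_meta(SAMPLE_PAGE)
print(metadata.get("title"))
print(metadata.get("description"))
```

Anything the page doesn't put in a <meta> tag simply won't be in the dictionary, which is exactly the problem with the sites described next.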

Other sites are much worse in that they provide no metadata in the HTML.  For example, one site’s video page is very code-heavy.  Requiring that the page be reloaded every time you request a new video would require a lot of network traffic.  Their design instead uses JavaScript to request a particular video’s metadata from a server.  Loading a new video involves downloading just a few kilobytes of data.

I spent some time this afternoon searching through a video page’s HTML and the associated JavaScript, looking for the magic incantation that would get me the data I’m looking for.  The amount of code involved is staggering, and I quickly went cross-eyed trying to decipher it before I hit on the idea of hooking up a sniffer to see if I could identify the HTTP request that gets the data.

It took me all of five minutes to download and install Free Http Sniffer, request a video from the site in question, and locate the magic line in the 230 or so requests that the page makes when it loads.  Problem solved.  Now all I have to do is write code that’ll transform a video page URL into a request for the metadata, and I’m set.
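The transformation itself is usually simple once you know what the request looks like. Here is a rough sketch in Python of the general idea. Everything specific in it is an assumption made up for illustration: the host name, the notion that the video ID is the last path segment, and the /api/metadata endpoint with its id parameter. The real site's request will look different.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def metadata_url_for(page_url):
    """Build a (hypothetical) metadata request URL from a video page URL.

    Assumes the video ID is the last segment of the page URL's path and
    that the site answers metadata requests at /api/metadata?id=<video id>.
    Both are illustrative assumptions, not any real site's API.
    """
    parts = urlsplit(page_url)
    video_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not video_id:
        raise ValueError("no video id found in " + page_url)
    query = urlencode({"id": video_id})
    return urlunsplit((parts.scheme, parts.netloc, "/api/metadata", query, ""))


print(metadata_url_for("http://video.example.com/watch/12345"))
```

The sniffer supplies the one piece the code can't: what the real endpoint and parameters actually are.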

I have no idea why I didn’t think of the sniffer earlier.  I’d used one before for a similar purpose.  I suspect I’ll be making heavy use of it in the near future as I expand the number of sites that we crawl for media.

September 1st, 2009

Facebook is broke!

Checking Facebook tonight, I got a notification that I had hidden some applications from my news feed.  Thank you very much, Facebook, but I didn’t need reminding.

So I canceled the notification and clicked on something else.  It reminded me again.  Okay, so I added those applications back to my feed.  Won’t let a little bug stop me from posting a comment on a friend’s wall.

Except now I keep getting this:

[Image: facebook]

I can still post comments (the dialog is not modal), but it’s just … weird.

Later:  Closing the Facebook tab in my browser and re-loading fixed the problem.

August 18th, 2009

Yahoo! Mail customer care stinks

I am currently engaged in a week-long struggle with Yahoo! Mail’s “customer care” about an email that they’re blocking.  Since the beginning of the year, my server has been sending a daily report about crawler performance to me and to my coworkers.  The email consists of a single HTML file, inside of which are some internal links and hundreds of text URLs, but no actual links to those URLs.  Our company’s email is hosted by Yahoo.

Until 10 days ago, that report was delivered daily, without fail.  But since we changed our colocation setup and got new IP addresses, the mail has been sporadic:  bouncing eight of the last ten times I’ve tried to send it.  The error message I get back from Yahoo’s server says that the message is rejected “for policy reasons.”  Digging deeper, I find that Yahoo’s filter seems to think that there are “links to potentially objectionable material or malicious software.”

When contacting Yahoo, I gave them considerable detail, including the text of the message, a full description of the problem, and the reason why I thought that their filter was being overzealous.  Their responses have been canned boilerplate paragraphs, first asking for information that is not relevant or that I’ve already supplied, then explaining the policy:  the same policy that’s on their Web site and that I told them I already understood.  I have yet to receive a response from Yahoo to indicate that they’ve actually read and understood any of the information that I’ve sent to them.  I’m convinced that if I sent a message requesting a ham and swiss on rye, they’d reply by asking me to forward the full headers from the email in question.

I would strongly discourage anybody from using Yahoo for their business email.  Their response to this simple request has convinced me that Yahoo’s incompetence is not limited to search (which they’ve finally agreed to farm out to Microsoft), but permeates the entire organization.  If you want reliable email and intelligent, helpful support, find somebody other than Yahoo to host it.

July 11th, 2009

Getting attention the new way

So let’s say that you’re a musician on your way by airplane from Point A to Point C with a stop at Point B. Trusting the airlines to handle your luggage, you check your guitar. While sitting in the airplane at Point B you see the luggage handlers treating your guitar roughly, and when you arrive at Point C you learn that the guitar has been broken.

So you spend a year trying to convince the airline that they should make things right.  When your efforts fail and the airline says that their final response is “No,” you decide on a different plan of action.

Dave Carroll posted his video, “United Breaks Guitars,” on July 6.  CNN reported on it two days later.  Since then, it’s been reported on several other major news networks and countless blogs.  Today, five days after the video was posted, it has over 2 million views.

As a coworker said, “never piss off a musician.”  I’m betting United Airlines wishes they had handled this differently.

June 23rd, 2009

Web video: Searching for standards

Web video is all the rage these days, with seemingly everybody getting into the action.  The 900 pound gorilla, of course, is YouTube.  Estimates of YouTube’s size vary from 100 million to 250 million videos.   My suspicion is that it’s towards the top end of that range.  But even 100 million videos is more than all the other providers combined.

Yes, there are video sites other than YouTube.  And, no, they’re not all porn sites, although there certainly is a healthy number of those.  Other video sites include Vimeo, CNN, ESPN, LiveLeak, Fox News, Hulu, MTV, Newsweek, YouKu, and at least two dozen more that I’m too lazy to list.  All the major networks have video sites.  Microsoft, AOL, Yahoo, and even Google have videos.  Yes, Google video competes with YouTube.  Video is big.

Sorry.  You’ll have to locate the porn sites yourself if you’re so inclined.

That’s a Good Thing.  Except …  Except that every site has its own video player.  In order to play a video on the Web, you have to download the player and then stream the video through it.  It reminds me of the early days of video stores, when you had to rent a movie and a VCR.  Will that be VHS or Beta?

Users don’t see this as a problem.  Yet.  Considering that many users see YouTube as the only place for video on the web, that’s no surprise.  But Web developers who want to embed videos from many different sources notice this problem in a big way.  Every player requires different embed code.  Every player looks different, with controls in different places and sometimes garish branding so that you know, without any doubt, where that video came from.  Some players have a JavaScript API that lets the embedding page control it, and others don’t.

The result is a horrible mishmash of wonky controls and hacked Web pages trying to get embedded video to work well.  Developers have to choose between excluding a particular video source, or including it with the understanding that those videos will look and work differently, and possibly cause their pages to crash, hang, or otherwise misbehave.  We don’t have just two formats to worry about, but 30 or more.  And for each one we have to know the magic incantation for obtaining the player, displaying it in a Web page, making it play a video and, if we’re lucky, controlling playback with a common set of user controls.  It’s maddening.

As it stands now, playing a video in a Web page is a heavyweight operation.  Web video is exploding, and this problem will only get worse unless the major players get together and standardize on a single video player.  Or at least a standard for embedded video player behavior and a standard API so that developers can concentrate on delivering the content that video providers want us to deliver.

And therein lies the problem.  It’s almost a certainty that YouTube will thumb its nose at the crowd and go its own way.  If we were extremely lucky YouTube would make their player available to the community, but that’s highly unlikely.  The better and more likely (although still not very likely) option is that the second tier of providers get together and create a standard.  Then, at least, developers would only have to worry about two players:  YouTube and everybody else.

I honestly don’t know what to expect here.  If my experience with MP3 music files on the Web is any indication, I probably shouldn’t hope for too much.  Although it’s true that the vast majority (well over 90%, based on six months’ crawling for different formats) of audio on the Web is MP3, there’s no standard player for streaming, and no standard API for controlling the disparate players.  And don’t even get me started on the pain of playing naked MP3 files.  Video will be much bigger than Web audio ever was.  I shudder at the thought of trying to handle 300 different providers rather than just the few dozen we have now.

April 4th, 2009

Broadband and the business of media

Earlier this week, Time Warner Cable announced plans to expand their use of metered broadband:  charging customers by the gigabyte rather than providing the all-you-can-slurp service we’ve come to know and love.  As a TWC customer, I have mixed feelings about this.  I realize that they’re in business to make money.  But as a government-protected monopoly (a practice with which I strongly disagree), they should have a moral responsibility—if not a legal responsibility—to upgrade their networks in order to provide the best possible service.  It’s not like the explosion of media and online gaming came as any surprise.  If they can’t provide adequate service at reasonable prices, then they should lose their monopoly protection.

TWC is also harassing business customers who actually try to use the bandwidth they’ve contracted for.  Customers who have been guaranteed 10 Mbps service, for example, get phone calls when their usage approaches 75% of that.  As a TWC business customer, my feelings on this practice are not mixed at all.  They contracted to provide a certain level of service, and they should honor that commitment.

J:Com, the largest cable company in Japan, offers 160 Mbps service for about $60 per month.  They had to invest about $20 per home in order to upgrade their systems to support it.  Seems to me that, rather than trying to charge more for less, U.S. companies should learn from J:Com.  They’d make more money and have happier customers.

Cable companies, by the way, are in danger of becoming no more than utility providers.  With sites like Hulu and programs like Boxee, you can view TV programs and movies from your Internet connection.  Why pay a middleman who wants to sell you two good channels and a package of crap when you can get everything for free? 

This conflict of interest could be a major reason why U.S. cable operators are reluctant to provide inexpensive high-bandwidth connections.  Doing so would further cannibalize their cable television operations.

Content creators, too, need to start thinking about how they’ll get paid.  Without the cable company packaging model, pay channels will have to sign up individual subscribers.  Ad-supported channels will likely have to find a format other than the traditional commercial break every 10 minutes.

I think motion picture studios are okay for a while. People are still willing to pay $10.00 or more for the privilege of watching movies in a theatre on opening weekend. However, I think DVD sales will begin to decline as more people move to on-demand video services.

Since its inception, the whole business of media has been based on the scarcity of content.  Today, Internet users have almost infinite choice.  A large portion of what’s available is dreck, but there’s a lot of really good stuff out there, too.  Big record labels are in their final death throes due in large part to the ubiquity of good music available for free or by direct purchase from the artist.  As bandwidth and storage become cheaper, cable companies and video media providers will find it increasingly difficult to make money in the old way.

Companies like Time Warner Cable, who cling to the old idea of scarcity as the way to make money, will find themselves losing customers to companies who make it their business to provide high quality service at reasonable rates.  As far as I’m concerned, it can’t happen soon enough.

March 25th, 2009

Stack Overflow

For most of the ’90s, I was a part of TeamB—a group of volunteers who helped answer questions on Borland’s CompuServe forums.  I met a bunch of really great people doing that, got some free CompuServe time, a few trips to the Bay Area, and lots of Borland products.  But mostly, I learned a heck of a lot by helping to answer users’ questions.

When Borland, Microsoft, and other development tool companies moved their online technical support to the Internet, their support was mostly done through newsgroups, and I found the signal-to-noise ratio there almost unbearable.  Except for the moderated newsgroups, which were few and far between, asking a question was like talking to a wall.  Worse, even, because a wall won’t give you wrong answers or call you stupid for doing something different.  Even with the advent of forums rather than newsgroups, online technical help was virtually non-existent for a number of years and I just stopped trying.

Enter Stack Overflow, a free programming Q&A site where you can ask questions, share your expertise, or just browse for nuggets of programming wisdom.  Stack Overflow works.  In many ways it works better than the old Borland CompuServe forums that I enjoyed so much.

Why it works is simple: they’ve found a way to reward people for supplying good answers and, to a lesser extent, asking good questions.  It all has to do with reputation:  ego.  You gain reputation points for supplying good answers and asking good questions.  “Good” is determined by simple up- or down-votes from site users.  As you gain reputation points, you gain the ability to help moderate the site: re-tag questions, vote to close, edit questions, etc.  And your current reputation is prominently displayed beside your name.  There are also awards (“Badges”) given for a number of different things.

If you don’t care about reputation, that’s fine.  You can use the site anonymously and still ask, answer, and comment on questions.  But Stack Overflow works because a whole lot of people there do care about their reputations.  Giving more experienced users the ability to help moderate keeps the flaming and other invective to a minimum, and the constant peer review ensures that (in general) the higher-rated answers really are the best.

My only real complaint with Stack Overflow (and it’s not huge) is that the format doesn’t encourage an ongoing threaded discussion as was available on the CompuServe forums.  That’s not a problem in most cases, but there are times when arriving at a satisfactory answer requires much back-and-forth, and it’d be nice to see questions and answers displayed in threaded newsgroup fashion.  The ability to see answers ordered by date helps a lot, as does the comments feature, and I suspect that adding a threaded view would be of only limited additional help.

Despite a few nitpicks, I’m seriously impressed with Stack Overflow.  If you have a programming question on any topic, you should search for the answer there.  And if you don’t find it, ask.  You’ll probably be surprised at the speed and the quality of the answers you get.

March 9th, 2009

Music everywhere

One of the benefits of what I’m doing for work (we’re building a media search and discovery site) is that I find all kinds of different music all over the Web.  Sure, there’s lots of commercial music out there that shouldn’t be, but it’s a relatively small part of what’s there.  The crawler’s incredible breadth has allowed me to find lots of new (to me) music from many independent artists who post samples or full songs on their Web sites.  They know that their biggest problem is getting people to discover them.  Piracy is a problem only for hugely popular artists.  Small artists’ biggest enemy is obscurity.

A good example is guitarist and composer Randy Ellefson, whose music I discovered on an instrumental podcast.  I’ve become quite a fan of his music (I like early ’70s rock, which his music resembles), and I’m impressed by the way he’s making his music available.  He allows podcasters (with permission) to feature his songs in their podcasts, and he also makes some of his songs available on his Web site.  For example, he’s released two albums.  On his main page, you can listen to four full songs from each album.  There are also links where you can download a half dozen songs:  three from each album.

Randy Ellefson, like many independent artists, understands that giving away a few full tracks encourages people to buy the rest.  He also accepts PayPal as well as credit cards, so purchasing his music is incredibly convenient.  If more artists made their music as easy to find and buy, we wouldn’t need the big record companies.

August 20th, 2008

Should VPN be this hard?

Last week we moved the crawlers from our office to a real data center where we can get more, and more reliable, bandwidth.  Getting everything installed and working wasn’t too much trouble, although the next time I have to do something like that I’m going to do a lot more pre-installation work here at the office before taking the machines to the data center.  Installing and configuring 10 machines while standing in the cold, noisy data center isn’t my idea of a good time.

Having machines at the data center means that we need some way to log in and check on them.  Not a problem, as the Cisco security appliance we bought supports VPN.  And configuring the Cisco IPSec VPN was quite simple.  I was pretty happy when, with just an hour of looking at the documentation and fiddling with the configuration, I was able to log in to the VPN from my laptop.  I packed up my stuff and headed back here to get everybody set up to use the VPN.

And then I found out that Cisco’s IPSec VPN client won’t run on 64-bit versions of Windows.  Nor does Cisco have any plans to upgrade it.  Since I’m not willing to create a 32-bit virtual machine just for running the VPN client, that leaves me with the option of configuring the router for some other type of VPN.  And there things get difficult.  The documentation that came with the router doesn’t discuss any type of VPN configuration other than IPSec, and the online documentation I’ve seen makes the assumption that I understand everything there is to know about VPN.  It gets confusing in a real hurry.

There are VPN standards.  There are so many, in fact, that no mere mortal can begin to understand them.  It might as well be a free-for-all with all those competing protocols.  Just the acronyms are enough to push a questionably sane person such as myself over the edge into babbling lunacy.  I’ve yet to find a document that explains, in terms a reasonably bright person who hasn’t passed Cisco’s certification can understand, how to configure the VPN.  I can’t even find a good discussion of the benefits and drawbacks of the different VPN technologies:  IPSec, L2TP, or SSL.

I also need to configure VPN on our pfSense box here at the office.  That looks almost as daunting as the Cisco’s configuration and the documentation is, if you can imagine, even worse.

I realize that much of my frustration stems from my lack of expertise in this area.  I’m a programmer, not a network admin.  But I have to think that VPN just doesn’t need to be this hard.

I can find lots of “how VPN works” types of discussions online, but they’re presented at a very high level.  There also is plenty of detailed documentation about VPN configurations for very specific situations.  But I’ve found nothing in the middle.  Something like “Simple VPN configuration for people who don’t live and breathe this stuff.”

Pointers to good discussions of the different types of VPN, and good tutorials about configuring VPN on the Cisco ASA or pfSense would be greatly appreciated…

June 16th, 2008

One more time: the Internet is public

[Note:  As Michael Covington pointed out, there's plenty of privacy on the Internet--just not on the World Wide Web.]

I know I’ve mentioned this before, but I keep running across people who don’t understand that there is no privacy on the Internet.  If you’ve uploaded something to your Web site, it’s highly likely that Google, MSN, Yahoo, or any (or all) of the many other search engines out there has found it.  Even our Web crawler–a small-scale operation–finds things in hidden nooks and crannies of the Web that most people with browsers would never stumble upon.

For example, the other day a coworker was spot-checking some of the crawler’s latest finds and stumbled upon a site where the owner had uploaded what looks like (from examining the file names) a bunch of very private stuff.  This all in an unprotected directory.  A person with a browser could go to that URL, get a listing of all files, and then browse to his heart’s content.  Although it’s unlikely that a person browsing would stumble upon the directory, a crawler almost certainly will.  Eventually.
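Spotting an open directory is not hard for a program, either. As a rough illustration (a heuristic sketch, not what our crawler actually does), a page whose title begins with "Index of /" is very likely an auto-generated listing of the kind Apache produces by default. The function name and sample pages below are made up for the example, and other servers format their listings differently.

```python
import re

# Apache-style auto-generated listings carry a title such as "Index of /private".
# Matching on that title is a heuristic and will miss other servers' formats.
_INDEX_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html_text):
    """Return True if the page looks like an auto-generated directory index."""
    return bool(_INDEX_TITLE.search(html_text))


# Hypothetical pages, invented for this example.
listing_page = "<html><head><title>Index of /private</title></head><body></body></html>"
ordinary_page = "<html><head><title>My vacation photos</title></head><body></body></html>"

print(looks_like_directory_listing(listing_page))
print(looks_like_directory_listing(ordinary_page))
```

A check like this only tells you that a listing exists. What to do about it is a separate question.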

When we run across something like that, we don’t actually browse, but rather find out how to contact the site owner and send him a very nice email suggesting that he either protect the directory or not upload that information.

The day after discovering the site I mentioned above, we ran across the story of Alex Kozinski, a judge in the 9th Circuit whose personal porn stash was found publicly accessible online:

Kozinski, 57, said that he thought the site was for his private storage and that he was not aware the images could be seen by the public, although he also said he had shared some material on the site with friends. After the interview Tuesday evening, he blocked public access to the site.

Of particular interest in this case is that the judge was presiding over an obscenity trial (now postponed) that involves material that’s apparently similar to some of the material on the judge’s site.  The judge also had some copyrighted music on the site, opening up the possibility of copyright violation.

No matter how far out in the country you live, if you stand naked in front of an uncovered window, somebody will eventually see you.  Similarly, if you upload something to your Web site and don’t take active measures to prevent access, it will be found.  Do not assume that it can’t be found because you never told anybody about it.  That’s like putting a key under the doormat and figuring it’s safe because only you know it’s there.
