Codesion free Subversion hosting: Fail

I thought I’d try cloud hosting for source code version control. I’ve been using version control on my local box, but figured it’d be better to have that stuff stored offsite so that I can get to it from wherever I am.

Codesion came highly recommended, and I was pleased with the ease of setting up a new trial account. At least, I was impressed until I tried to access my new repository. If I try to access it from my Subversion command line client, I get this message:

svn: E175002: Unable to connect to a repository at URL 'https://jimmischel.svn.cvsdude.com/jimsstuff'
svn: E175002: The OPTIONS request returned invalid XML in the response: XML parse error at line 1: no element found (https://jimmischel.svn.cvsdude.com/jimsstuff)

I get a similar error if I try to access it with the TortoiseSVN GUI client.

So I thought maybe I misread the terms of the free trial. I went to the Codesion site and tried to view the repository with their ViewVC browser. The result is a browser window containing this information:

An Exception Has Occurred

Python Traceback

Traceback (most recent call last):
  File "/services/viewvc_template/lib/viewvc.py", line 4396, in main
    request.run_viewvc()
  File "/services/viewvc_template/lib/viewvc.py", line 268, in run_viewvc
    self.repos.open()
  File "/services/viewvc_template/lib/vclib/svn/svn_ra.py", line 204, in open
    self.ctx.config)
  File "/usr/local/python/lib/python2.5/site-packages/libsvn/ra.py", line 518, in svn_ra_open
    return apply(_ra.svn_ra_open, args)
SubversionException: 175002 - Unable to connect to a repository at URL 'http://10.36.235.136/svn/jimmischel/jimsstuff'
175002 - The OPTIONS request returned invalid XML in the response: XML parse error at line 1: no element found (http://10.36.235.136/svn/jimmischel/jimsstuff)

Checking their online forums, it looks as though others have had similar issues in the past few weeks. Sorry, Codesion. If you can’t get a simple free trial to work, I’m not going to trust you with my source code.

There are dozens of sites that offer free or inexpensive Subversion hosting. I don’t have the time or inclination to try every one of them. Anybody have a recommendation?

And, no, I’m not interested in using GIT. Please don’t suggest it.

Writing a web crawler: Introduction

This is the first of a series of posts about writing a custom Web crawler. It assumes some knowledge of what a Web crawler is, and perhaps what crawlers are typically used for. I don’t know how many posts this subject will require, but it could be a rather long series. It turns out that Web crawlers are much more complicated than they look at first.

Articles in this series:

Crawling Models
Politeness
Queue Management: Part 1

When you hear the term “Web crawler,” it’s likely that your first thought is Google. Certainly, Google’s crawler is better known than any other. It’s probably the largest, as well. There are many other crawlers, though: Microsoft’s Bing search runs one, as do Blekko and other search companies (Yandex, Baidu, Internet Archive, etc.). In addition, there are a few open source crawlers such as Nutch, Heritrix, and others. There are also commercially licensed crawlers available.

The other thing that typically comes to mind when you think of Web crawlers is search engines. Again, Google’s crawler is used primarily to gather data for the Google search engine. When speaking of crawlers, the big search engines get all the press. After all, they’re solving big problems, and their solutions are impressive for their sheer scale alone. But that’s not all that crawlers are good for.

The big search engines run what we call “general coverage” crawlers. Their intent is to index a very large part of the Web to support searching for “anything.” But there is a huge number of smaller crawlers: those that are designed to crawl a single site, for example, or those that crawl a relatively small number of sites. And there are crawlers that try to scour the entire Web to find specific information. All of these smaller crawlers are generally lumped together into a category called focused crawlers. The truth is that even the largest crawlers are focused in some way–some more tightly than others.

Smaller focused crawlers might support smaller, targeted, search engines, or they might be used to find particular information for a multitude of purposes. Perhaps a cancer researcher is using a crawler to keep abreast of advances in his field. Or a business is looking for information about competitors. Or a government agency is looking for information about terrorists. I know of projects that do all that and more.

Another class of program that automatically reads Web pages is the screen scraper. In general, the difference between a screen scraper and a crawler is that a screen scraper is typically looking for specific information on specific Web pages. For example, a program that reads the HTML page for your location from weather.com and extracts the current forecast would be a screen scraper. Typically, a scraper is a custom piece of software that is written to parse a specific page or small set of pages for very specific information. A crawler is typically more general. Crawlers are designed to traverse the Web graph (follow links from one HTML page to another). Many programs share some attributes of both crawlers and screen scrapers, so there is no clear dividing line between the two. But, in general, a crawler is an explorer that’s given a starting place and told to wander and find new things. A scraper is directed to specific pages from which it extracts very specific information.
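
To make that distinction concrete, here’s the core loop that nearly every crawler shares, stripped of everything that makes real crawling hard. This is just an illustrative sketch in Python (the process function is a stand-in for whatever you actually do with each page); it ignores robots.txt, politeness delays, retry logic, and URL canonicalization, which is exactly the territory the rest of this series covers.

# A deliberately naive crawler core: fetch a page, extract links, enqueue
# the ones we haven't seen. Everything a real crawler needs (politeness,
# robots.txt, error classification, deduplication) is left out on purpose.
import re
import urllib.request
from collections import deque
from urllib.parse import urljoin

LINK_RE = re.compile(r'href="(.*?)"', re.IGNORECASE)

def process(url, html):
    # Stand-in for whatever the crawler is actually for.
    print(url, len(html))

def crawl(seed_urls, max_pages=100):
    queue = deque(seed_urls)
    seen = set(seed_urls)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode('utf-8', 'replace')
        except Exception:
            continue                      # a real crawler records the failure
        fetched += 1
        process(url, html)
        for link in LINK_RE.findall(html):
            absolute = urljoin(url, link)
            if absolute.startswith('http') and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

crawl(['http://example.com/'])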

My credentials

I’ve spent a good part of the last five years writing and maintaining a Web crawler (MLBot) that examines somewhere in the neighborhood of 40 million URLs every day. The crawler’s primary purpose is to locate and extract information from media (video and audio) files. The crawler’s basic design was pretty well fixed after three months of development, but it took about a year before we had “figured it out.” Even then, it took another year of tweaking, refactoring, and even some major architectural changes before we were happy with it. Since that time (the last three years), we’ve mostly made small incremental changes and added some features to recognize and specially process particular types of URLs that either didn’t exist when we started crawling, or that have become more important to us over time.

That said, I won’t claim to be an “expert” on crawling the Web. If there’s one thing my experiences have taught me, it’s that there’s a whole lot about the Web and how to crawl it that I just don’t know. As a small (four people) startup, we all wear many hats, and there are many things to be done. Although we know there are things our crawler could do better, we just don’t have the time to make those changes. We still discuss possible modifications to the crawler, and in many cases we have a very good idea of what changes to make in order to solve particular problems. But there are still hard problems that we know would take significant research work to solve. It’s unfortunate that we don’t have the resources to make those changes. For now, the crawler does a very good job of finding the information we want.

I have to point out here that, although I wrote almost all the code that makes up the crawler, I could not have done it without the help of my business partners. My understanding of the many challenges associated with crawling the Web, and the solutions that I implemented in code are the result of many hours spent sitting on the beanbags in front of the white board, tossing around ideas with my co-workers. In addition, major components of the crawler and some of the supporting infrastructure were contributed by David and Joe, again as a result of those long brainstorming sessions.

Based on the above, I can say with some confidence that I know a few things about crawling the Web. Again, I won’t claim to be an expert. I do, however, have a crawler that finds, on average, close to a million new videos every day from all over the Web, using a surprisingly small amount of bandwidth in the process.

Why write a Web crawler?

As I pointed out above, there are many open source and commercially licensed Web crawlers available. All can be configured to some degree through configuration files, add-ons or plug-ins, or by directly modifying the source code. But all of the crawlers you come across impose a particular crawling model, and most make assumptions about what you want to do with the data the crawler finds. Most of the crawlers I’ve seen assume some kind of search engine. Whereas it’s often possible to configure or modify those crawlers to do non-traditional things with the data, doing so is not necessarily easy. In addition, many of the crawlers assume a particular back-end data store and, again, whereas it’s possible to use a different data store, the assumptions are often deeply rooted in the code and in the crawler’s operation. It’s often more difficult to modify the existing crawler to do something non-traditional than it is to just write what you want from scratch.

For us, the decision to build our own was not a hard one at all, simply because five years ago the available crawlers were not up to the task that we envisioned; modifying any of them to do what we needed just wasn’t possible. That might be possible today, although the research I’ve done leads me to doubt it. Certainly, if I could make one of the existing crawlers work as we need it to, the result would require more servers and a lot more disk space than what we currently use.

If you’re thinking that you need a Web crawler, it’s definitely a good idea to look at existing solutions to see if they will meet your requirements. But there are still legitimate reasons to build your own, especially if your needs fall far outside the lines of a traditional search engine.

My next post in this series will talk about different crawling models and why the traditional model that’s used in most crawler implementations, although it works well, is not necessarily the most effective way to crawl the Web.

Sam Sheepdog and Ralph E. Wolf

Sam and Ralph were among my favorite Looney Tunes characters. I thought of them this morning when I said hello to my co-worker as I came into the office. So I thought I’d look them up on YouTube. Google is great. I searched for [wolf sheepdog looney tunes], and got the Wikipedia article.

All told, there were seven episodes that starred these two. The first was made in 1953, and the last in 1963. I was able to find all but the first (Don’t Give Up the Sheep) on YouTube. There are some clips of the first episode, with an alternate sound track, but I couldn’t find the original.

I did find Don’t Give Up the Sheep on mojvideo, but didn’t see a way to embed it.

Here are the others, from YouTube:

http://www.youtube.com/watch?v=IEQ1tA5QcPY

http://www.youtube.com/watch?v=WQ5ZVrHdtUY

http://www.youtube.com/watch?v=-k6s1_5BMGY

http://www.youtube.com/watch?v=Up0leZe1Un0

http://www.youtube.com/watch?v=yZKvuSYIykY

There also was a cartoon in which Taz takes the place of Ralph.

https://youtube.com/watch?v=s84jETMjCFQ

I didn’t much like the voice of Sam in that episode.

YouTube has a lot of the old Looney Tunes and Merrie Melodies cartoons. Be careful. You could spend the whole day laughing.

If you find Don’t Give Up the Sheep on YouTube, please let me know so that I can include it here.

Getting videos from YouTube

YouTube has a very rich API that developers can call to get information about videos, post new videos, etc. Lots of Web sites use this API to do all kinds of wonderful things with YouTube videos.

So suppose you wanted to create a little application that works like an “all news” channel. But instead of just showing you CNN or MSNBC or whatever channel, it gathers news from lots of different channels and aggregates them. You then have an “all news all the time” channel that shows lots of different stories and will give you different viewpoints on the same story. Fox News, for example, will have a much different take on a story than will CNN or Al Jazeera.

The YouTube API gives you at least three ways to get new videos from a particular YouTube user. The simplest is to do a search and filter it by the YouTube author name. For example, this query:

http://gdata.youtube.com/feeds/api/videos?orderby=published&max-results=50&author=AssociatedPress&v=2

will give you the 50 most recent videos that were posted by the YouTube user “AssociatedPress”.
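
For the curious, here’s a quick sketch (Python, standard library only) of what consuming that feed looks like. The feed is plain Atom, so pulling out titles and publish dates is straightforward; a real application would more likely use one of the YouTube client libraries.

# Fetch the 50 most recent AssociatedPress uploads from the search feed
# and print when each was published, along with its title.
import urllib.request
import xml.etree.ElementTree as ET

FEED = ('http://gdata.youtube.com/feeds/api/videos'
        '?orderby=published&max-results=50&author=AssociatedPress&v=2')
ATOM = '{http://www.w3.org/2005/Atom}'

root = ET.fromstring(urllib.request.urlopen(FEED).read())
for entry in root.findall(ATOM + 'entry'):
    published = entry.find(ATOM + 'published').text
    title = entry.find(ATOM + 'title').text
    print(published, title)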

There’s a problem, though: the results in that feed will sometimes be delayed by more than 15 minutes. If I go to the AssociatedPress page on YouTube, I’ll often see videos there that do not show up in the feed returned by that query.

You can get up-to-date results by querying the individual user:

http://gdata.youtube.com/feeds/api/users/AssociatedPress/uploads?orderby=published&max-results=50&v=2

With that query, I’m able to get the most recent videos. Anything that shows up in the YouTube page for that user also shows up in the results I get back from the query.

But that’s just one source! If I want my news channel to get data from 50 different sources, I’d have to poll each one individually. That’s not terribly difficult, but it takes time. YouTube places limits on how often I can poll for videos. To keep from being throttled, you have to wait at least 15 seconds (a minute or longer if you don’t have a developer API key) between requests to the API. So getting the latest videos from all 50 sources will take somewhere between 12 and 50 minutes. Perhaps longer.

Not many sources post a video every 12 minutes, so there’s a lot of waste involved in polling all the time. And 50 minutes is way too long to wait for an update. 50 minutes? It’s not news anymore!

There is a third way. Go to your YouTube account and subscribe to the sources that you’re interested in. According to the documentation, you can subscribe to up to 2,000 different sources, and whenever you go to your subscriptions page you’ll see the most recent videos from those sources.

The really nice thing is that you can then do a single query to get the most recent videos from all your sources:

http://gdata.youtube.com/feeds/api/users/YourUserName/newsubscriptionvideos?orderby=published&max-results=50&v=2

Replace YourUserName in the URL above with the name of the user who subscribed to the sources you want to see.

With this, you can poll YouTube once every five minutes and get the most recent videos from all of your sources.
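
A polling loop built on that feed is only a few lines. This is a sketch, not production code: YourUserName is a placeholder, and a real application would handle network errors and remember what it has already seen across restarts.

# Poll the subscription feed every five minutes and report videos
# we haven't seen before. "YourUserName" is a placeholder.
import time
import urllib.request
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'
FEED = ('http://gdata.youtube.com/feeds/api/users/YourUserName/'
        'newsubscriptionvideos?orderby=published&max-results=50&v=2')

seen = set()
while True:
    root = ET.fromstring(urllib.request.urlopen(FEED).read())
    for entry in root.findall(ATOM + 'entry'):
        video_id = entry.find(ATOM + 'id').text
        if video_id not in seen:
            seen.add(video_id)
            print('new video:', entry.find(ATOM + 'title').text)
    time.sleep(5 * 60)    # one request every five minutes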

Another benefit is that you don’t have to change your program if you want to add or remove sources. All you have to do is log in to YouTube and edit your subscriptions.

Reduce your bandwidth usage and get your videos in a more timely fashion. Sounds like a great deal to me.

MSDN “Cannot service request” error

As I noted in my previous post, I’ve been having some trouble accessing MSDN information from my browser. I thought I’d solved the problem, but it came back this morning. I had been viewing pages on MSDN for a couple of hours while working, and at some point I clicked on a link and got the “Cannot Service Request” page again.

This happened on Chrome, and when I went to Internet Explorer to see if I could get to the page, IE wouldn’t display it either. Firefox, again, had no trouble.

My solution for IE, then, was to go into Tools -> Options, and delete the browsing history. I just deleted everything. Restarted the browser, and I was able to visit MSDN again.

With Chrome, I thought I’d try to narrow down the problem. I first went into Options and selected “Empty the cache.” That didn’t solve the problem. So I checked “Empty the cache” and “Delete cookies and other site and plug-in data.” That worked!

Apparently, something in the MSDN site is setting a cookie or storing other site-specific data that at some point causes the JavaScript to throw up its hands.

This is almost certainly a bug with the MSDN JavaScript. But at least now I know how to treat the symptom. It ticks me off, though, that I have to delete all my cookies and site-specific data in order to keep using the site. Maybe next time the problem occurs, I’ll see if I can delete just the MSDN-specific stuff.

Can’t get help!

09/16/2011 – Problem solved. See below.

I don’t know why, but about a week ago I started getting a page that says “Unable to service request” whenever I try to get to Microsoft’s online documentation. At first it was just the MSDN documentation. Now it’s most any page that starts with http://msdn.microsoft.com/.

This problem is very strange. If I try to visit the site with Google Chrome or Internet Explorer, I get the error. No problem with Firefox, though. It’s interesting to note that I haven’t fired up Firefox in many months. I wonder if my Chrome and IE installations are somehow corrupt?

I can access MSDN from other computers in the office, using IE or Chrome. And I can access it from my machine using curl, wget, and my own custom download program. It’s just IE or Chrome on my computer.

Any insight into the problem? I’m stumped. I suppose I could reinstall Chrome and see if that solves the problem.

Update 09/14: For Internet Explorer 8, I just had to turn on compatibility view. Perhaps it’s time to upgrade to IE 9? Firefox 3.6.18 (latest in the 3 series) works just fine.

Still no joy with Google Chrome. Others who want to view “old” sites with Chrome resort to silliness like running IE in a Chrome browser window. Whereas I’ll bow to the brilliance of the hack that makes such a thing possible, I can’t imagine why it should be necessary.

I wonder if this has been a problem for a long time, but has been hidden from me because the browser caches scripts. And, yes, I’m pretty convinced that the problem isn’t with the HTML so much as it is with scripts. If the MSDN site updated their scripts but didn’t change the version number, then pages could continue to work for a very long time.

I first experienced this problem on September 6. For several days prior to that I was doing a lot of JavaScript debugging, and had cleared the browser caches on IE and on Chrome in the process. I strongly suspect that doing so triggered this behavior. There’s no telling how long I was running with old scripts.

Update 09/16: After verifying that I could visit the site from my home computer, also running Chrome, and verifying that I was running the same version of Chrome at both sites, I had to conclude that it wasn’t the browser version. So here at the office this morning, I closed all of my Chrome windows and started up one. Then I went to Options and cleared all browsing data. Everything. Shut down Chrome, restarted it, and now I can see MSDN again.

So, if you run into this problem, Clear your browser cache! I thought I had done that. I might have cleared some of the data from the cache, but not all of it. So I just told it to delete all of the browsing data.

Most tweeted stocks

The Twitter convention for referencing a stock is to put a dollar sign in front of the ticker symbol. For example, a tweet that contains $MSFT is talking about the stock for Microsoft Corporation. The tweet likely contains a link to an article about the company’s performance, or perhaps somebody’s sentiment about the stock. I got to wondering which stocks get the most tweets.

Twitter has a very easy to use API, and fairly liberal usage restrictions. It was a matter of a few minutes to get a list of the stocks on the S&P 500 and write a simple program that does a Twitter search for each stock and computes a “tweets per hour” figure for each stock. From there, it’s a simple matter of sorting to come up with the most-tweeted stocks.
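
The tweets-per-hour calculation is nothing fancy: take the most recent page of search results for a symbol and divide the number of tweets by the span of time they cover. Here’s a rough sketch in Python; the endpoint and JSON field names follow the old version 1 search API, so treat them as illustrative rather than gospel.

# Estimate tweets per hour for each ticker symbol from one page of
# Twitter search results, then list symbols from most to least tweeted.
# Endpoint and field names follow the old v1 search API.
import json
import urllib.request
from urllib.parse import quote
from email.utils import parsedate_to_datetime

def tweets_per_hour(symbol):
    url = 'http://search.twitter.com/search.json?rpp=100&q=' + quote('$' + symbol)
    data = json.load(urllib.request.urlopen(url))
    results = data.get('results', [])
    if len(results) < 2:
        return 0.0
    newest = parsedate_to_datetime(results[0]['created_at'])
    oldest = parsedate_to_datetime(results[-1]['created_at'])
    hours = (newest - oldest).total_seconds() / 3600.0
    return len(results) / hours if hours > 0 else 0.0

symbols = ['AAPL', 'MSFT', 'GOOG', 'GS', 'BAC']   # the full S&P 500 list in practice
rates = {sym: tweets_per_hour(sym) for sym in symbols}
for sym, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print('%-6s %6.2f tweets/hour' % (sym, rate))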

If you assume that Twitter is a reliable proxy for overall chatter about stocks, then you can say that the stocks below are the 20 most talked about stocks. I don’t know enough about the market to say with any certainty that the assumption holds true, but the list below does include more than just technology companies. So I suspect there is at least some correlation between tweets and overall market sentiment.

Symbol   Company                       Tweets / hour
AAPL     Apple Inc.                        44.12
MSFT     Microsoft Corp.                   40.64
GOOG     Google Inc.                       24.61
GS       Goldman Sachs Group               17.74
BAC      Bank of America Corp              17.74
WAG      Walgreen Co.                      17.09
T        AT&T Inc                          13.35
MS       Morgan Stanley                    13.08
CSCO     Cisco Systems                     12.95
A        Agilent Technologies Inc          12.40
S        Sprint Nextel Corp.               11.69
NFLX     NetFlix Inc.                      11.62
C        Citigroup Inc.                    11.61
PEP      PepsiCo Inc.                      10.79
FCX      Freeport-McMoran Cp & Gld         10.49
INTC     Intel Corp.                       10.02
ESRX     Express Scripts                    9.98
AMZN     Amazon.com Inc                     9.85
MHS      Medco Health Solutions Inc.        9.85
X        United States Steel Corp.          9.72

I was somewhat surprised to find Walgreen (WAG) in the top 10. PEP, FCX, MHS, and X were a bit surprising, too. The most likely reason for Express Scripts (ESRX) being on the list is that they announced a merger with Medco yesterday. Or was it Wednesday? Either way, Twitter reflects the recent buzz about ESRX.

Note that the list above is just a snapshot: I pulled the recent tweets for each stock at one particular time. There are all kinds of things that could skew the data in a single snapshot. You would need several snapshots per day over a few days to get a more reliable list. For my purposes, though, a single snapshot is just fine.

WiFi booster experiment

Debra and I bought a new TV last week, along with a computer and other accessories required to launch us into the wide world of Internet TV. Installation went pretty much as expected, with only minor problems, and now we can get all the online offerings on our new 42″ LCD television. Including, I might add, Podly TV. I’m working on a series of blog entries that outlines what we got and how we set it up. I’ll post here when it’s ready.

The biggest problem we had was with the Internet connection. It would be exceedingly difficult to get an Ethernet cable from the router to the living room where the television is. I figured that wouldn’t be a problem, as we do have wireless and we’ve been able to use it in the living room. And although I did get a signal when I hooked everything up, it was very weak and unreliable.

To solve the problem, I first went and bought a new wireless router, since somebody convinced me that it would produce a better signal than the old Linksys WRT54G that I’ve had for years. It didn’t. And although I’d heard that it’s possible to use the Linksys as a repeater, I certainly couldn’t figure out how to do it. Besides, I don’t know how well that would work with such low signal strength, and there wasn’t a good place in between to place the repeater.

So I took it back to Fry’s and exchanged it for a Hawking HAWNU2 adapter, which I connected to the computer in the living room. The adapter is your basic USB network adapter with a directional antenna (they claim 13dBi gain) and a built-in power amplifier. That thing works like magic. My signal strength went from almost nothing (I didn’t have a measuring tool at the time) to three or four bars (out of five). The wireless signal was strong and reliable.

This morning, I learned of cheap wireless extenders; specifically, how to build a parabolic reflector that dramatically improves wireless signals. The Linksys has two little “rubber duck” antennas that are omnidirectional. The joke in the ham radio community is that vertical antennas (which these are) radiate equally poorly in all directions. The idea of the reflector is to turn those vertical antennas into directional antennas, increasing their effectiveness by directing more of the signal to where we want it. In my case, that means redirecting the signal that would otherwise go across the street to the new houses being built so that it goes to my living room instead.

If you can download and print an image, and have even a little skill with a glue stick and a knife or scissors, you can make one of these antennas. Or even two or three, if your router has more than one antenna on it.

Directions for building the Windsurfer. That includes the template image that you need to download and print.

I downloaded the image, printed two copies, and glued them to a manila folder.

Then, using my trusty Stanley No. 199 utility knife (also makes a great wood carving tool, by the way), I carefully cut out the pieces.

A couple of notes here. First, I didn’t do the cutting on that nice laminate; I used a scrap piece of wood for backing. Also, the six horizontal lines on the bottom piece are intended to be cut so that the tabs on the other piece can fit into them. Just slice along the center of each line; don’t actually cut out a hole, because then the tabs won’t stay put. It’s probably best to wait until after the next step before cutting those slits. Finally, cut vertically and horizontally on the “+” signs on the top piece and then push a pencil through them to open the holes. The antenna will slide through those holes.

More fun with the glue stick. Cover the back side of the lower piece with glue and then attach a piece of aluminum foil. Cut along the outline and slice the horizontal lines:

Then, fold the top piece and fit the six tabs into the six slots you cut in the rectangular piece. As an added precaution, I used some packing tape on the back to hold the tabs down. Here are the two reflectors, ready for installation:

The Linksys router sits on the top of a bookshelf, about six feet off the ground. Here are pictures before and after installing the reflectors.

So how well does it work? I measured the signal strength in the living room with NetSurveyor before I installed the reflectors. The beacon quality fluctuated between 48% and 50%, with a beacon strength of between -61 and -59 dBm; NetSurveyor rated the signal quality as “Very Good.” With the reflectors installed, beacon quality is between 52% and 55%, with a beacon strength of -59 to -55 dBm. Again, signal quality is “Very Good.” So the reflectors seem to work, but the change isn’t terribly large.

Just for comparison, beacon strength here in my office, with the router just a few feet away, is -46 dBm, quality is 70%, or “Excellent.”

The measurements above were taken with the Hawking directional antenna installed.

If I disable the directional antenna and remove the reflectors from the router, the signal ranges from non-existent to “Poor.” With the reflectors installed, signal strength was consistently “Very Low,” but reliable, at about -87 dBm.

Conclusion: the reflectors work, but not nearly as well as I had hoped. They do make the difference between no signal and a weak but usable signal, but I was hoping for something better. I think more experimentation is required: either two slightly larger reflectors, or one much larger reflector that sits behind the router rather than attaching it to the antennas. Until then, the USB directional antenna remains connected to the computer in the living room.

Spam problem found, but solution questionable

A few months ago I noticed a marked increase in the amount of spam that I was receiving.  At the time it was a minor inconvenience and I just dealt with the problem the old-fashioned way:  I deleted the offending messages.  But a week or two ago Debra started noticing a large increase.  And then we were gone over the weekend, and when I came back I had to trash over 200 messages.  Time to do something about the problem.

I get my email through my ISP, who has SpamAssassin installed.  I checked my settings again, just to be sure I had it configured correctly, and then sent a message to my ISP’s support through their exceedingly user-unfriendly help desk software.  After a short exchange of messages I got their answer:  1) lower the spam threshold in my SpamAssassin configuration to 3; 2) train SpamAssassin.

Fine.  Except.

1) According to the SpamAssassin configuration information, a setting of 5.0 is “fairly aggressive”, and should be sufficient for one or just a handful of users.  The instructions caution that a larger installation would want to increase that threshold.  It doesn’t say what lower numbers would do, but since several of the obviously spam messages I’ve examined have scores over 2.0, I hesitate to reduce the setting to 3; if I do, I’ll start getting false positives.  (The threshold itself is a one-line setting in SpamAssassin’s per-user preferences file; there’s a sample below, after point 2.)

2) Their method of training SpamAssassin involves me installing a Perl script (written by a user who has no official connection to the ISP and that is not officially supported), forwarding good messages to a private email address (that I control), and having the Perl script examine those messages so that it can update the tables that SpamAssassin uses.
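
For what it’s worth, the threshold that their first suggestion refers to is a one-line change in SpamAssassin’s per-user preferences file (typically ~/.spamassassin/user_prefs on the mail server). Something like this:

# Default is 5.0. Lowering it flags more messages as spam,
# at the cost of more false positives.
required_score 3.0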

That’s ridiculous!

First, there’s no explanation why my spam count went from almost zero to 50 or more per day almost overnight.  Second, they expect me to have the knowledge, time, and inclination to install and run that script.  Oh, and if I want to make sure that Debra’s mail is filtered correctly, I should have her forward her good emails to that private email address, too.  “I promise I won’t look at them.”  I wouldn’t, and it’s unlikely that there’s anything she’d want to hide from me anyway, but I can imagine that others who share my predicament would have users who are reluctant to forward their emails.

Honestly, it’s among the most ridiculous things I’ve ever heard.  Why don’t they have a reasonable Web client that has a “mark as spam” button?  Why, after 10 years of dealing with spam, is there no informal standard for notifying your ISP that a message you received in your email client is spam?  Why should I even have to think about spam anymore?  Shouldn’t the ISP’s frontline filters catch the obvious garbage that’s been clogging my mailbox?

I think I need a new ISP.  Or at least a better way to get my mail:  something that will filter the spam for me after downloading from my ISP.  But it has to be Web based.  I like using a Web mail client, because I regularly check my email from multiple locations.  Any suggestions on Web-based email services that can do this for me?

But isn’t that what the web is for?

The Terms of Use for the site yobi.tv includes the following (the emphasis is mine):

8. RESTRICTIONS ON USE

You may use this Site only for purposes expressly permitted by this Site. You may not use this Site for any other purpose, including any commercial purpose, without YOBI’s express prior written consent. For example, you may not (and may not authorize any other party to) (i) co-brand this Site, or (ii) frame this Site, or (iii) hyper-link to this Site, without the express prior written permission of an authorized representative of YOBI. For purposes of these Terms of Use, “co-branding” means to display a name, logo, trademark, or other means of attribution or identification of any party in such a manner as is reasonably likely to give a user the impression that such other party has the right to display, publish, or distribute this Site or content accessible within this Site. You agree to cooperate with YOBI in causing any unauthorized co-branding, framing or hyper-linking immediately to cease.

Far be it from me to violate their Terms, which is why the name of their site, above, is not hyperlinked.

I thought this particular idiocy had been eliminated years ago.  If you don’t want people to link to you, why the heck are you on the Web at all?  I think somebody needs to rein in the lawyers again.