This word map graphic was generated from the text on this web page by Jonathan Feinberg's Wordle applet.

The Bearcave Links Page

The Exponential Trajectory of Knowledge

Human knowledge has become staggeringly huge. As my growing pile of unread books shows, we can't keep up even in areas we are interested in. I constantly find that curiosity leads me from one thing to the next (Richard I to the Knights Templar, Wavelets to signal processing to Kolmogorov complexity). I will not live long enough to delve into all of the areas of computer science and mathematics that I am interested in. This is one of the saddest parts of mortality.

The web is wonderful because it allows me to reference work that I would otherwise have a hard time getting access to or even learning about. The web also makes things worse in some ways, as this web page demonstrates. I frequently find information that I might want to reference at some point in the future, so I make a note of it on this Web page.

This web page was started soon after www.bearcave.com came on-line, in June of 1995. Back then, everyone had a links page, so we created a links page too. As with the rest of bearcave.com, this links page has evolved. Many of the links on this page I have added for myself, although the narrative is usually written for others. But as this Web page expands I feel a bit like Ted Nelson and his files (see The Curse of Xanadu by Gary Wolf, Wired Magazine, June 1995). There is a real danger that any system of notes and annotations will become useless as it grows to the point where finding material becomes difficult (then you need an index to the index). While this web page does not have an index (other than Google), it has acquired a disturbingly long table of contents (see Links to the Links, below). In fact, it is starting to resemble a sort of demented "blog".

Without better tools (see the links on Cyc, below) there may be a limit to human knowledge, not because there are no new things to discover, but because the human knowledge base becomes too unwieldy. Our lifetimes are limited, as is our brain capacity. From the point of view of intellectual work, upgrading the life span would not be very useful without upgrading our brain storage and retrieval capacity. As human knowledge grows there may come a time when people become so narrowly specialized that progress stops, or when a person must spend their lifetime simply mastering what is known in a given field, without contributing anything new.

Links to the Links
The ANTLR Parser Generator
Bears Hate SPAM
The Bear Products International Spam Filter
Other Bear Web sites of various flavors
More Ursine Web Pages
Stuff (cool material items)
Companies Involved with Semantic Graph Software
Graphs and Semantic Graphs
PuTTY: An alternative to Windows Telnet
Subversive Software Developers
Snort: Protection from Subversive Software Developers
BSD UNIX: live free or die
CD-ROM and DVD-ROM Software for Linux
Miscellaneous Software Applications
Java Links
XML and Related Technologies
What is Microsoft .NET?
Computer Science Links
Art and Computer Science
Software
Computer Hardware
Programming Languages
C++ Development and Debugging
Mathematics and Statistics Links
Agent Based Software and Modeling
Artificial Intelligence
Source Code Documentation Generators
Financial Literature
Market Information
Market Intra-day Data and Trade Execution
Time series data base support and the K language
Geospatial Information Systems
ISP's for UNIX Software Engineers
Literary Agents for Software Engineers and Computer Scientists
On-Line Libraries and literature sources
Local web site search engines
Google's Successor (fugetaboutit)
Some of our friends (hmm, not many friends)
Web Magazines and Periodicals
A Few Interesting People
Miscellaneous Links
Art Links
Weird Stuff

The ANTLR Parser Generator

I have used the ANTLR parser generator to develop the Bear Products International Java compiler front end. I have also written a fair amount about ANTLR on the bearcave.com web pages.

ANTLR has been used to create a wide variety of software tools. One interesting example is Ephedra, a C/C++ to Java translator developed by Johannes Martin in his PhD dissertation.

Java has references, which are pointers by another name. However, Java does not have references to references, which could be used to model pointers to pointers in C++. This is a problem when it comes to translating some C++ algorithms to Java, for example the algorithm that supports binary tree delete.
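To make the problem concrete: a C++ binary tree delete is often written with a pointer to a pointer, so that the routine can overwrite the parent's link in place. In a language that has only plain object references, one common workaround is to have the recursive delete return the (possibly new) root of the subtree and let the caller reassign the link. The following is a minimal sketch of that workaround in Python, which, like Java, has object references but no references to references. The names Node, insert, delete and in_order are illustrative only; this is not taken from Ephedra or from any bearcave.com code.

```python
class Node:
    """A binary search tree node holding a single key."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of this subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Delete key and return the (possibly new) root of this subtree.

    A C++ version would typically take a Node** and overwrite the
    parent's link in place. Here the caller reassigns the link from
    the return value instead.
    """
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Zero or one child: the surviving child (or None) replaces this node.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the in-order successor's key into this node,
        # then delete the successor from the right subtree.
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def in_order(root):
    """Return the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)

# The caller must reassign root, because the root itself may be replaced.
root = delete(root, 50)
print(in_order(root))
```

The cost of this style is that every call site has to remember to store the returned reference, which is exactly the bookkeeping that a pointer to a pointer hides in C++.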

Bears Hate SPAM

Boycott Internet spam!

SPAM is the term applied to unsolicited e-mail sent out by mass junk e-mail advertisers. SPAM threatens to flood our e-mail accounts with junk mail, making them unusable. I started writing about SPAM here years ago, when the problem first started to appear. Here is something I wrote about the infamous Stanford Wallace, an "innovator" in mass spamming:

One of the most despicable SPAMers is Stanford Wallace (known as SPAMford). Mr. Wallace runs a company called Cyberpromotions. Wallace believes that the more controversy he can stir up, the more attention he will get and that his business will increase as a result. Recently Wallace hosted a site registered with interNIC as godhatesfags.com. This is a "Christian" hate site mainly aimed at gay men and lesbian women. The site is also antisemitic and states that Jews are in league with gays and are damned as well.

Wallace hosted this fountain of pathological hate in the hope that it would attract media attention. Wallace is a good example of the kind of scum that take part in SPAM "marketing". Fight SPAM and don't do business with anyone who uses SPAM marketing. Don't do business with an ISP (Internet Service Provider) that allows their site to be a SPAM platform.

Things change fast in Internet time. SPAMford Wallace has been driven off the Internet. He lost several lawsuits and has been looking for some other way to scam bucks. Those wonderful "Christians" at godhatesfags.com were hosted by www.L7.net at the time of this writing. As a strong believer in the first amendment, I have to support their right to post their spew.

Ever the master of reinvention, Spamford Wallace went legit. He opened a nightclub, complete with go-go dancers. But as Wallace points out at the end of a Wired News article, the spam continues. Wallace's night club, Plum Crazy, filed for bankruptcy in 2004. Ever the master of the questionable business, Spamford apparently went back to his roots:

Spam King must step down: This might make you feel a little better the next time you have to close dozens of pop-up windows or spend an hour removing "spyware" or "mal-ware" from your computer: A federal judge has issued a temporary restraining order against Stanford Wallace, a man known as the "Spam King," which will force him to disable most of his software. Mr. Wallace's case is the first action launched by the U.S. Federal Trade Commission in its crackdown on ad-ware and spam. He has also allegedly been selling software called "Spy Wiper" and "Spy Deleter" that the FTC says doesn't work.
The Globe and Mail, Mathew Ingram's Globe and Mail Update, October 25, 2004

You don't have to be very smart to be a spammer, just totally lacking in morality. A great site that can be used to track SPAMmers down to their ISP or web provider is Sam Spade (dot org). I use this site frequently and am grateful to steve@blighty.com who hosts this site and keeps its software running in the face of outraged spammers.

At the time I wrote this, some people estimated that the industry wide cost for SPAM was about $10 billion (US). The SPAM problem has been growing, day by day, week by week, month by month. We definitely see this at bearcave.com. Yet there are no laws with any teeth that attack the SPAM problem nationwide in the United States. Why is this? An excellent answer is provided by Keith H. Hammonds in his Fast Company article The Dirty Little Secret About Spam (Fast Company, August 2003, Issue 73, Pg. 84). The sub-title is: What J.P. Morgan Chase and Kraft want is exactly what the guys peddling porn and gambling want: free access to your inbox. That's why there's no easy solution to a problem that could soon make the world's email system crash and burn. The point made by Mr. Hammonds is that laws have not been passed because "direct marketing" companies and large corporations have fought any laws that might stop them from SPAMming.

Banks, mortgage companies and other large corporations, like Kraft, which sells the widely spammed "Gevalia Kaffe", pay for leads or affiliates that bring them buyers. Spammers provide these leads, sometimes through multiple "cutouts". In Who profits from spam? Surprise: Many companies with names you know are benefiting (MSNBC, August 8, 2003), Bob Sullivan followed the path from spammer to the company that benefited (and ultimately paid the spammer). These companies all denied that they had anything to do with spam. But the system was set up so that they did not know who provided the "leads" or affiliate links.

Online references discussing spam

One company arms both sides in spam war, By Saul Hansell, November 25, 2003, The New York Times, republished on news.com

This article discusses a company called IronPort, which makes a specialized computer system described by some as a "spam cannon". Apparently this system is designed to rapidly send vast amounts of e-mail. Some believe that IronPort's customers are spammers. IronPort claims that their products are used to send e-mail only to those who have "opted-in".

Ironically, IronPort has recently purchased SpamCop. SpamCop runs a spammer blacklist which includes IronPort's customers, according to this article. After being repeatedly attacked by spammers, Julian Haight, who runs SpamCop, was forced to look for a buyer.
Report: A third of spam spread by RAT-infested PCs By Munir Kotadia, December 3, 2003, CNET News.com

RAT is the acronym for Remote Access Trojan. This article discusses an alarming trend where spammers are using viruses and other computer security attacks in order to take over computers so that they can be used to send SPAM, or in some cases Distributed Denial of Service attacks (DDOS). Such a DDOS attack was recently launched at the anti-spam site run by the Spamhaus Project (see reference below).
New e-mail worm targets antispammers, By Reuters, December 3, 2003 (published on news.com)

This article discusses a distributed denial of service attack that targeted the Spamhaus Project. This attack was launched from virus infected computers.

Don't send spam to people claiming you are a technical "guru"

Sending someone spam is the quickest way possible of demonstrating that you are a clueless Internet user who knows little about technology. In particular sending spam to people who, at one time, had e-mail addresses that had "!" characters in them is a quick way to become very unpopular. These people remember an Internet without spam. Spamming people like this with your resume is the quickest way to get identified as someone they don't want to work with, as a gentleman named Bernard Shifman demonstrates. Finally, as Mr. Shifman should have known, the Internet has a long memory (sometimes disturbingly so given Google's USENET archives). It is not a good idea to send anyone you are not intimate friends with e-mail that you would not want posted on a company bulletin board (you know, the old kind, made of cork).

The mass layoffs that resulted from the 2000/2001 "dotcom dieoff" have led more desperate people than Mr. Shifman to send out their resume in spam e-mail. I suppose that it is now a phenomenon, since there is a Washington Post article discussing resume spam.

There are a variety of reasons that bearcave.com exists, but one of them is to open professional doors for the humble author of this site. The lesson here is, content, not spam.

The Bear Products International Spam Filter

Since I registered bearcave.com, I have fought spam by tracking down those who sent my account spam and getting their accounts and web pages canceled. My efforts and the efforts of others in the anti-spam Jihad have had little effect. The tide of spam keeps rising. Currently I get between 50 and 100 spam e-mails a day. So I finally threw in the towel and wrote an e-mail filter for my UNIX shell account. The C++ source code for the mail filter, along with its documentation is published here. This mail filter not only gets rid of the vast majority of the spam, but it also gets rid of stupid bear mail (see below).

Send it to Gmail: a solution to the spam avalanche

The spam filter above allowed me to continue using my iank at bearcave.com email address. However, I still got vast amounts of spam: a few hundred junk emails in my junk "folder". My spam filter is pretty accurate, but once in a while it would throw valid email into the junk folder. When my junk email got into the hundreds of emails, I would frequently delete it. On a few occasions I lost emails that I would have wanted to read. These problems made it clear that yet another version of this spam filter would be needed. I was happy to discover a Google Gmail based solution.

Google's Gmail provides directions for setting up a Gmail account so that it handles domain email. For example, I have routed email to iank at bearcave.com to my Gmail account. The Gmail directions include the MX record settings to give your ISP so that it can forward your email.

References on filtering out SPAM

Paul Graham's Web pages on filtering out spam

Paul Graham is the author of at least two books on Lisp and some elegantly written and thoughtful essays. One of my favorites is The Hundred-Year Language, which is a thoughtful essay on the evolution of languages for designing software. Paul also wrote this Web page which consists of links to his work on filtering spam, particularly using Bayesian techniques.
SpamBayes: A Bayesian anti-spam classifier written in Python

SpamBayes is an open source Python implementation that started with Paul Graham's Bayesian spam filtering algorithm. They claim that they have improved on this algorithm. The spam filter is available as a Microsoft Outlook plugin, a POP proxy filter, or a procmail filter.
Gary Robinson's Rants: SPAM Detection

This is a link-rich web page which discusses the technical details of various spam filtering techniques.
Field Guide to SPAM by John Graham-Cumming

Compiled by Dr. John Graham-Cumming, a leading anti-spam researcher and member of the ActiveState Anti-Spam Task Force, the ActiveState Field Guide to Spam is a selection of the tricks spammers use to hide their messages from filters, providing examples taken from real-world spam messages.

From the Field Guide to SPAM web page
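The Bayesian filtering idea that several of the references above discuss can be sketched in a few lines. What follows is a minimal illustration only, under stated assumptions: it is not Paul Graham's exact algorithm, not the SpamBayes code, and not the C++ filter published on bearcave.com. The class name TinyBayesFilter, the add-one smoothing, the equal priors, the cutoff of 15 tokens and the tiny training set are all choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayesFilter:
    """Toy Bayesian spam scorer; names and constants are illustrative."""

    def __init__(self):
        self.spam_counts = Counter()  # messages containing each token, spam
        self.ham_counts = Counter()   # messages containing each token, non-spam
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        """Estimate P(spam | token), with add-one smoothing and equal priors."""
        s = (self.spam_counts[token] + 1) / (self.n_spam + 2)
        h = (self.ham_counts[token] + 1) / (self.n_ham + 2)
        return s / (s + h)

    def spam_probability(self, text, top_n=15):
        """Combine the most telling token probabilities into one score."""
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        # Keep the tokens whose probability is farthest from a neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:top_n]
        # Work in log space so that long messages do not underflow.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1.0 - p) for p in probs)
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Made-up training messages, for illustration only.
spam_filter = TinyBayesFilter()
spam_filter.train("Buy cheap pills now!!!", is_spam=True)
spam_filter.train("Cheap mortgage rates, click now", is_spam=True)
spam_filter.train("Meeting notes for the wavelet paper", is_spam=False)
spam_filter.train("Lunch tomorrow? Bring the ANTLR grammar", is_spam=False)

print(spam_filter.spam_probability("cheap pills, click now"))
print(spam_filter.spam_probability("wavelet grammar notes"))
```

A real filter needs far more training mail and a much more careful tokenizer (headers, URLs, and the obfuscation tricks catalogued in the Field Guide to Spam), but the scoring step has this general shape.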

Other Bear Web sites of various flavors

Resources for Bears.

I sort of lumber around and take life at a slow considered pace. I am getting hairier by the day and I'm a fairly big guy (6'1"). So, in short, I'm just an ursine sort of person. So when I got my domain in 1995, the name bearcave seemed to make sense. As it turns out, there are other people who consider themselves "bears". While browsing around on the Web, I met some really nice people who consider themselves Bears also (there is even an IRC channel named #bearcave, which is totally independent of this Web site).
The most common definition of a "bear" is a homosexual or bisexual man who is hairy, has facial hair, and a cuddly body. However, the word "Bear" means many things to different people, even within the bear movement. Many men who do not have one or all of these characteristics define themselves as bears, making the term a very loose one. Suffice it to say, "bear" is often defined as more of an attitude than anything else - a sense of comfort with our natural masculinity and bodies that is not slavish to the vogues of male attractiveness that is so common in gay circles and the culture at large.
Resources for Bears FAQ

I corresponded a bit with a couple of Bears (Scott and Kevyn), who seem like really nice people. There is little enough love and warmth in this world and these Bears seem to be loving and warm people. Although I happen to be straight, I would be honored to be considered a Bear.

The essayist and sometimes pundit Andrew Sullivan wrote a good essay on bears (of the gay variety), published in Salon.

Stupid Bear Mail

OK, some bears are way cool. Having said this, let me take a moment to comment on bearcave.org. This now defunct "bear" site once offered free Web e-mail accounts. Back then the ISP that hosted bearcave.com routed all of the email that went to bearcave.com to my email account. As it turned out, some of the bearcave.ORG users would enter .com when they meant .org. A few others intended to send their email to bearcave.net, the more intelligent "bear" site that at one time hosted several users (currently it seems to be exclusively Brian's web site).

There are those who don't really understand what domains are and think that it would be cool to have an address like clueless_dweeb@bearcave.com. One misdirected note had an enclosed picture of the guy's cock and asked the addressee to send him a picture of his. My only comment is that being gay does not excuse acting like a teenage boy. If you're sending e-mail like this out you should grow up or move to Palm Springs. For a selection of e-mail from bears trolling for other bears, click here.

In order to use my e-mail in the face of the massive stream of spam that gets directed to anyone with web pages and published e-mail addresses, I implemented a spam filter which put the misdirected "bear" mail in my junk "folder", along with the spam.

More Ursine Web Pages

The Evolution of Bears by Don Middleton, published on The Bear Den Web page. The Bear Den Web page has lots of information on bears (the real ones, not the human kind) and links to other bear sites on the Web.
The impact of mankind on our planet is causing one of the largest species die outs in geologic history. Some species of bear, like the Giant Panda Bear, are near extinction. Gary Coulbourne and Phil Pollard have created the www.bears.org Web site "dedicated to the preservation of the accurate Bear beliefs". Gary writes:

Many of the species of bears in the world are slowly dwindling. We are not alone on the Earth. Thousands of other forms of life live here too. If we are to survive, we must do it together. Not just humans and bears, but humans and everything that lives

Among the pages on this site is an interesting set of pages on the various species of bear. When I looked at the Web page, it was still growing. Gary and Phil's page looks like it will be an important contribution to ursine information on the Web.

Stuff (cool material items)

Beautiful Kitchen Knives

At the Bearcave we love good food (and other sensual experiences). I've been cooking since I was a little kid (my sister was given a Susie Homemaker oven, which I quickly appropriated; hey, I shared the cakes with her). Good knives are important tools for any chef. I had been using a terrible Henckels knife that my mother gave me when I went off to college. The knife would not hold an edge and I had to sharpen it all the time. After spending years suffering with this knife I decided to buy a new 6-inch kitchen knife.

Some of the kitchen knives that caught my eye are made in Japan, using techniques similar to those used to make samurai swords. The web site Japanese Woodworking Tools has an amazing selection of beautiful knives, in a range of prices (from about $70 US to over $1000 US). The 4" Ryusen Damascus paring knife and the 6" Ryusen Damascus small slicing knife are shown below:

I own both of these knives and I absolutely love them. They are not cheap, but they should last a lifetime and then can be passed on to your heirs. The folded steel construction makes the blades very strong, so the knives can be remarkably thin. They hold a very sharp edge. In fact, they are so sharp that you really need to treat these knives with respect (we never leave them in the sink or the dish drainer, to avoid any accidents).

One word of caution about these knives: the thinness of the blades makes them more fragile than thicker chef's knives. I was slicing down the length of some snow crab legs with my 6" knife and I shattered parts of the edge. Fortunately I was able to repair it with a Shapton Water Stone (also sold by The Japan Woodworker Catalog). So if you have these knives, it's useful to have a thick chef's knife as well.

An article on Japanese knives and the chefs who love them can be found in This blade slices, it dices by Harris Salat, February 1, 2008, Salon.com
La Quercia Artisan Cured Meats (pronounced La Kwair-cha)

I have gone through some evolution when it comes to cured meats like prosciutto. Cured pork is not cooked and the idea of eating uncooked pork bothered me for some time. After visiting Italy twice, I could not escape the fact that cured pork tastes pretty good. To reconcile the two conflicting ideas (cured pork tastes good and eating uncooked pork is bad), I came up with the following train of thought. Cooking meat denatures the proteins, which makes the proteins more difficult for bacteria to digest (but makes it easier for humans to digest). This is why cooked meat keeps longer than raw meat. Curing meat also denatures the proteins in the meat. Cured meat also contains preservatives like salt and sometimes smoke. The process of curing meat is similar to cooking. Ergo, cured meat is similar to cooked meat, so it's OK to eat cured pork. That's my story and I'm sticking to it.

After returning from Italy I started buying Italian prosciutto from Costco. I read about La Quercia Artisan Cured Meats in the New York Times. I like to support US artisans, so I ordered the La Quercia kitchen sampler. The prosciutto is fantastic, better than the Italian prosciutto. The sampler comes with Prosciutto Americano crumble and it took me weeks to go through it. A couple of teaspoons adds wonderful flavor to an omelette with aged cheddar cheese.

The pigs that become the La Quercia meat are all free range and are raised humanely:

All of the pork we use comes from suppliers who subscribe to humane practices. To us this means that the animals have access to the out of doors, have room to move around and socially congregate, and root in deep bedding. We do not use meat from animals that have been given subtherapeutic doses of antibiotics or kept in large animal confinement facilities.

The meat for our Prosciutto Americano is all from antibiotic free, animal by product free, hormone free pork.

For La Quercia Rossa, our Heirloom Breed Culaccia, we use Berkshire meat from animals that have not had subtherapeutic antibiotics and have had no antibiotics at all for at least 100 days prior to harvest.

Americans eat too much meat and we'd be better off if we ate half as much meat and paid twice as much for animals that are raised well.

Companies Involved with Semantic Graph Software

Visual Analytics

At least on paper this system is as close as any commercial system I've seen to the Nebraska/ADVISE Distributed Semantic Graph System Database System I worked on at Lawrence Livermore. They seem to be thinking about the important issues, like entity disambiguation and security. They also seem to be trying to integrate unstructured data input via Attensity (good luck with that). Visual Analytics also claims to be "partnering" with TARGUSInfo, which is a consumer data mining company.
Investigative Analysis Software

Investigative Analysis Software is a British company that has developed Analyst's Notebook and Analyst's Workstation. They have also purchased a small company which used to be on this list called Anacubis, which makes software that allows graph browsing and construction of networks of information. Interestingly, this product is aimed at "commercial intelligence". Obviously a tool like this has applications in other areas (assuming that it provides some level of power).
Cogito

Cogito was named for one of the few propositions that are provable without reference to the senses, Cogito Ergo Sum (I think, therefore I am). Cogito has developed a database system based on semantic graph technology.
IBM Entity Analytics Solutions. IBM purchased Jeff Jonas' company SRD (Systems Research and Development), which made a product called NORA. Apparently Jonas' company lives on in Nevada as a division of IBM, with Jeff Jonas becoming an IBM distinguished engineer and the chief scientist of IBM's Entity Analytics division. See also IBM Beefs Up Criminal Detection/Analytics Software (news.yahoo.com)

SRD's original web pages stated:

NORA is a software solution that uses SRD's unique Entity Resolution technology to cross-reference databases and identify potentially alarming non-obvious relationships among and between individuals and companies. Companies use the generated NORA Intelligence Reports and alerts to focus investigative and audit resources on areas of real concern.

NORA is apparently used by gambling casinos to identify card counters and cheaters (casinos seem to view these groups as roughly similar, since both groups are engaged in taking the casino's money, rather than the casino taking theirs).

For more on Systems Research and Development:
- Entrepreneur Offers a Solution for Security-Privacy Clash by Don Clark, The Wall Street Journal, March 11, 2004.
- Geek War on Terror, by Steven Levy, Newsweek, March 22, 2004.
IBM semantic graph related research projects:
- Web Fountain
  
  Web Fountain is a project at the IBM Almaden Research Center involved with data mining the Web. According to the New York Times (Entrepreneurs see a web guided by Common Sense by John Markoff, November 12, 2006), Web Fountain "has been used to determine the attitudes of young people on death for an insurance company and was able to choose between the terms "utility computing" and "grid computing" for an I.B.M. branding effort." IBM has also used Web Fountain "to do market research for television networks on the popularity of shows by mining a popular online community site" and "mining the 'buzz' on college music Web sites". In this last case "researchers were able to predict songs that would hit the top of the pop charts in the next two weeks." The IBM researcher mentioned in reference to Web Fountain is Daniel Gruhl.
Visible Path

From the Visible Path web pages:

The Visible Path platform applies the science of social network analysis to allow professionals to access the entire enterprise's trusted relationship network without invading privacy or compromising relationships. The platform integrates tightly with corporate SFA, CRM and business intelligence applications to measurably accelerate sales cycles, increase close rates, and reduce the cost of lead generation and customer acquisition.

According to a December 4, 2007 CNET article, Visible Path is being acquired:

Visible Path, which makes social-networking tools for business users, is set to be acquired.

A company representative on Tuesday said that Visible Path has signed a term sheet with a multibillion international company to sell the firm. She said the service is expected to continue operating.

Visible Path has confirmed the buyout, but the identity of those clueless enough to purchase this company remains in question as I write this (Dec. 7, 2007).

Apparently the Visible Path management "will not be continuing on" after the acquisition. I have heard (but not confirmed) that the Visible Path engineering group has been entirely offshored to India.

Given the meager Visible Path technology, Kleiner Perkins will be lucky to get their money back, much less the factor of ten return that Venture Capitalists like to get.
Radar Networks

One of the founders of Radar Networks is Nova Spivack. His Minding the Planet blog can be found here (apparently also at www.mindingtheplanet.net). According to a New York Times article (Entrepreneurs see a web guided by Common Sense by John Markoff, November 12, 2006):

Radar Networks, for example, is one of several working to exploit the content of social computing sites, which allow users to collaborate in gathering and adding their thoughts to a wide array of content, from travel to movies.

Radar's technology is based on a next-generation database system that stores associations, such as one person's relationship to another (colleague, friend, brother), rather than specific items like text or numbers.

Radar Networks has a product called Twine which builds semantic content on the Twine site. An hour-long demo with the Radar Networks founder, Nova Spivack, can be found on Robert Scoble's blog here. I tried to add a comment, which didn't seem to get posted. I was able to grab it from the web page and I've pasted it in below:

One of the features that is very powerful with the Web and the Internet that it is built on is that it is decentralized. Short of destroying technological civilization there is no way to destroy the Internet and the Web. Nor can the web be controlled. It is distributed on vast numbers of computer systems that are scattered around the world. Sites like Google do mirror much of the web, but the data exists elsewhere and is just mirrored on the Google mega server farm.

Currently Twine is entirely centralized. This creates a variety of problems. Twine has control of your Twine content and it exists in one logical place, the Twine server farm. Twine is not distributed through the Internet. Even if Twine opens their API, everything is still on Twine. I suppose that you could suck your graph off in something like RDF and put it on another system, but these systems don't exist yet.

It could be argued that FaceBook, LinkedIn and MySpace also control all content. At least some of these sites have probably also had to manage exponential growth. But Twine is not Semantic Web. Twine is a site that builds semantic graphs and presumably supports some of the semantic web standards. Perhaps the idea is that Twine is a seed from which the Semantic Web will grow. This remains to be seen.

There are also some serious scalability problems, which I, at least, have no idea how to solve. As the size of the Twine graph grows, the link structure of the graph will grow as well. Exponential growth is easy to imagine with Twine data and their link structure. This will require a similar increase in their hardware support. Even if they purchase the necessary hardware infrastructure, some kinds of queries will become more difficult. One example is path traversal between two points (e.g., the connection in the graph between Ian Kaplan and Nova Spivack). As the size of the graph increases, each "hop" in that connection can bring in more linked entities. This problem could be especially bad because Twine seeks to forge topic links automatically throughout the graph.

On the hype end of things: automatic entity extraction, the automatic recognition of names, places and organizations, is error prone. We can argue about how error prone, but I have yet to see an entity extraction system that does not mark "bank of cooling towers" as an organization (perhaps a financial organization). If these errors are not filtered out by a human, there will be some amount of "cruft" build-up of false links.

Link extraction is even more difficult. For example, the link between Paul Wolfowitz and Iraq. In Twine links are simply links through the entities (e.g., I have a reference to Wolfowitz and it becomes linked to other content with the same word).

Semantic Web has not been very promising so far because there was no motivation to use it and no tools to automatically build content. Twine is interesting because it is actually a step toward motivating semantic graph use. But it still seems like an early step.
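The path traversal concern raised in the comment above can be illustrated with a small breadth-first search. This is a sketch under stated assumptions, not a description of how Twine or any other product is implemented: the function name shortest_path, the adjacency dictionary representation and the toy graph (with made-up node labels such as person_a and topic_1) are all hypothetical. The search returns the shortest path along with the number of entities pulled in at each hop, which is the quantity that tends to blow up as a graph becomes more densely linked.

```python
def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency dictionary.

    Returns (path, frontier_sizes), where frontier_sizes[i] is the number
    of newly reached nodes that were i hops away from start when the
    search stopped. path is None if goal is unreachable.
    """
    if start == goal:
        return [start], [1]
    parent = {start: None}   # also serves as the "visited" set
    frontier = [start]
    frontier_sizes = [1]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor in parent:
                    continue
                parent[neighbor] = node
                if neighbor == goal:
                    # Walk the parent links back to the start.
                    path = [goal]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    path.reverse()
                    return path, frontier_sizes
                next_frontier.append(neighbor)
        frontier = next_frontier
        frontier_sizes.append(len(frontier))
    return None, frontier_sizes


# A made-up toy graph: two people connected through topics and documents.
graph = {
    "person_a": ["topic_1", "topic_2"],
    "topic_1": ["person_a", "doc_1", "doc_2", "doc_3"],
    "topic_2": ["person_a", "doc_3", "doc_4"],
    "doc_1": ["topic_1"],
    "doc_2": ["topic_1"],
    "doc_3": ["topic_1", "topic_2", "person_b"],
    "doc_4": ["topic_2"],
    "person_b": ["doc_3"],
}

path, sizes = shortest_path(graph, "person_a", "person_b")
print(path)
print(sizes)
```

Even in this tiny graph the frontier doubles with each hop. When topic links are forged automatically, popular topics become hubs, and a single hop through a hub can bring in a large fraction of the graph.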
21st Century Technologies

21st Century Technologies appears to be a small software consulting and product company (with a very impressive staff). They provide a variety of services based on the background of their staff. 21st Century Technologies is on this list because they do semantic graph data mining and analysis. In fact, reading their description of their Lynxeon system was like reading the description of the ADVISE system I worked on. It is not clear from the web site that Lynxeon is a core focus of their company, however. Building large scale semantic graph systems takes a lot of resources. Whether the Lynxeon system lives up to their description remains to be seen.
Metaweb Technologies

Danny Hillis, someone who keeps turning up in my list of interests, is one of the founders (or the founder) of Metaweb Technologies. Metaweb is, apparently, doing something with semantic graphs or social networks. At the time this was written they were in "stealth" mode and what they are actually doing has not been publicly disclosed.

Despite the fact that few people have probably heard of Metaweb, they apparently think that they are Google (or the next Google). Perhaps Hillis thinks his past record with startup companies justifies this view. Or perhaps it is the amazing quality of Hillis' PhD thesis (which became the book The Connection Machine) that justifies this view. Or perhaps Hillis is a legend in his own mind. Whatever the case, in order to apply for a job as a database engineer they want you to answer the following questions (which I'm quoting here under the copyright doctrine of fair use):

To apply, please respond to the following four questions in your cover letter. Brevity is the soul of wit.

1. Programming Languages have changed very little in the past 30 years: OO (Smalltalk) dates from the mid seventies. Closures and continuations (Scheme) were invented in the late seventies. List comprehensions, and other lazy functional constructs date from the early eighties. Is this vocabulary "it" for programming?

2. Have you ever built something you could have bought? If so, what and why?

3. What is your favorite time of day?

4. Imagine a graph that consists of directional links between nodes identified by small non-negative integers < 2**16. We define a "cycle" in the graph as a nonempty set of links that connect a node to itself.

Imagine an application that allows insertion of links, but wants to prevent insertion of links that close cycles in the graph.

For example, starting from an empty graph, inserting links 1 2 and 2 3 would succeed; but inserting a third link 3 1 would fail, since it would close the cycle 1 2 3 1. However, inserting a link 1 3 instead would succeed.

In your favorite programming language, declare data structures to represent your graph, and give code for an "insert link" function that fails if a new link would close a cycle. What, roughly, is the space- and time-complexity of your solution?
Instructions
Please submit cover letters and resumes in plain text or HTML only to (their email address) and include your answers to the questions above.

Perhaps the job market in the computer industry is so bad now that people will actually respond to all of these questions just for the chance of getting a job at an unknown startup company. This kind of game, that employers play with prospective employees, is what makes me glad that I have a job that I'm currently happy with.
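For what it is worth, question 4 has a fairly short answer. The sketch below is one possible approach, not Metaweb's expected solution: keep an adjacency map and, before inserting a link from src to dst, search from dst to see whether src is already reachable. The class and method names are made up for illustration, and the sketch does not enforce the 2**16 limit on node numbers.

```python
class AcyclicGraph:
    """Directed graph that refuses any link that would close a cycle."""

    def __init__(self):
        self.successors = {}  # node -> set of nodes it links to

    def _reaches(self, source, target):
        """Depth-first search: can target be reached from source?"""
        stack = [source]
        seen = {source}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for nxt in self.successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def insert_link(self, src, dst):
        """Insert src -> dst. Returns False (and changes nothing) on a cycle."""
        # The new link closes a cycle exactly when dst already reaches src.
        # A self link (src == dst) is rejected by the same test.
        if self._reaches(dst, src):
            return False
        self.successors.setdefault(src, set()).add(dst)
        return True


g = AcyclicGraph()
print(g.insert_link(1, 2), g.insert_link(2, 3), g.insert_link(3, 1), g.insert_link(1, 3))
```

Space is proportional to the number of nodes plus links. Each insert costs, in the worst case, one traversal of the whole graph, so it is also proportional to nodes plus links. Faster incremental schemes exist, but this is the version that fits in a cover letter.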

How much you are willing to tolerate these little job interview questions probably depends on how desperate you are to get the job and whether you find the question interesting. I will confess that I did find one of MetaWeb's questions (for a "Semantic Tools Engineer") interesting:

Mark V. Shaney is an ancient Usenet bot that generated realistic (for some value of reality) prose that fooled many educated people into thinking a human was the author. (See http://groups.google.com/group/net.singles/msg/531b9a2ef72fe58 for an example.) Describe succinctly an approach, algorithm, or technique you would use to automatically distinguish Mark's prose from human prose, assuming you don't have access to his compiled program or source code.

According to Wikipedia the "Mark V. Shaney" text was created with a Markov model which mirrors the statistical distribution of words in English text. So it can be assumed that simple statistical tests on the text will not distinguish it from human-written text. The text also appears to have relatively correct grammar structure, so simple grammar parsing does not look like it would be a good test either. The irony is that the software to differentiate such synthetic text from real text may be much more complicated than the software that created the text.
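To show how little software is needed on the generating side, here is a minimal sketch of a word-level Markov text generator. It is not the Mark V. Shaney source code. The function names, the choice of two-word states (`order=2`) and the sample sentence are assumptions made for the illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each run of `order` consecutive words to the words that follow it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model


def generate(model, length, seed=None):
    """Random walk through the model, emitting at most `length` words.

    Assumes the model is non-empty and length is at least the model order.
    """
    rng = random.Random(seed)
    key = rng.choice(sorted(model))  # sorted so a given seed is repeatable
    output = list(key)
    while len(output) < length:
        followers = model.get(key)
        if not followers:
            break  # dead end: this state only occurs at the end of the source
        output.append(rng.choice(followers))
        key = tuple(output[-len(key):])
    return " ".join(output)


sample = "the cat sat on the mat and the cat ran"
model = build_model(sample)
print(generate(model, 8, seed=1))
```

Every three-word window in the output also occurs somewhere in the source text, which is why local word statistics look human. Telling the two apart requires looking at structure over a longer range than the model's state.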

As of May 2007 MetaWeb has disclosed a bit more about what they do, although their business model is still obscure to ordinary mortals like myself. MetaWeb has spun off a site called Freebase. Freebase is a term most often associated with hard core cocaine use, but in this case refers to free + database = freebase according to the web site. Freebase has been created with MetaWeb's technology. MetaWeb is looking for engineers with experience with the Semantic Web, so perhaps MetaWeb provides semantic mark up and semantic graph technology. As I noted, my work and interests have an odd intersection with Hillis'.

What is Freebase?

Freebase.com is home to a global knowledge base: a structured, searchable, writeable and editable database built by a community of contributors, and open to everyone. It could be described as a data commons. Freebase.com is enabled by the technology of Metaweb, which is described at www.metaweb.com.

Obviously this sounds a great deal like Wikipedia. To this Freebase provides the cute and content-free answer:

How is Freebase different than the Wikipedia?

It's an apple versus an orange: each is deliciously different. Wikipedia is an encyclopedia with information arranged in the form of articles. Freebase is more of an almanac, organized like a database, and readable by people or software. Wikipedia and Freebase both appeal to people who love to use and organize information. In fact, many of the founding contributors to Freebase are also active in the Wikipedia community. Whenever Freebase and Wikipedia cover the same topic, Freebase will link to the Wikipedia article to make it easy for users to access the best of both sites.

Given the vast and growing size of Wikipedia, I'm not sure what content Freebase thinks to publish that is not on Wikipedia. Wikipedia attempts to avoid gross bias and overt commercialism; perhaps this will be Freebase's forte.
Saffron Technology

This North Carolina based company seems to build learning algorithms on top of graphs. It's not clear whether these are semantic graphs or not. The company was founded by people with experience in neural nets and learning systems.
Analytic Technologies

Analytic Technologies is a small software company run by Steve Borgatti and Roberta Chase. Steve is a professor at the University of Kentucky where he teaches social network theory and analysis. The social analysis software is targeted at small graphs and, with refreshing candor, Prof. Borgatti writes that it goes slow around 10K vertices.

Graphs and Semantic Graphs

(Graphs are also frequently called networks)

The idea of representing relationships as graphs is both very old and very new. The idea has been around for a long time, but people are just starting to structure database systems in this way. See my long review of the book Linked, which discusses self-organizing complexity and networks. This review includes links to related literature and links to graph (network) visualization software.

Simile: Semantic Interoperability of Metadata and Information in unLike Environments

Simile is an MIT group that is working on a variety of useful Web software which can be used to build and process semantic web content. Simile includes tools for RDF, web page processing ("screen scraping") and building semantic content from other web pages.
InFlow Software from orgnet.com

Orgnet.com is a consulting company that appears to also sell a software tool that they claim you can use to analyze your organization to understand the self-organizing network relations.
The Semantic Web and the Resource Description Framework (RDF)

Arguably the best known graph research project is the Semantic Web, which is sponsored by W3C (the World Wide Web Consortium). Tim Berners-Lee is the Semantic Web's best known proponent.
- The Resource Description Framework is apparently a language and software framework for publishing and discovering semantic web information. A very "link rich" review, written by Brian Donovan, of the book Practical RDF: Solving Problems with the Resource Description Framework by Shelley Powers, O'Reilly, July 2003, was published on Slashdot.
- An Introduction to RDF by Ian Davis
Uncloaking Terrorist Networks by Valdis E. Krebs, First Monday, Vol. 7, No. 4, April 2002

Valdis Krebs is involved with (or perhaps the principal behind) www.orgnet.com, mentioned above.
TouchGraph

This is a somewhat primitive tool that allows graphs to be constructed from Google results and other data.
New Social-Network Mapping Tools Compared on slashdot.org.
WebGraph Department of Science and Information, Universita degli Studi di Milano

From the abstract of the paper The WebGraph framework I: Compression techniques by Paolo Boldi and Sebastiano Vigna, Technical Report 293-03, Università degli Studi di Milano, Dipartimento di Scienze dell'Informazione, 2003:

Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about various techniques that allow to store [sic] a web graph in memory in a limited space, exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This papers presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other). WebGraph can compress the WebBase graph (118 Mnodes, 1 Glinks) in as little as 3.08 bits per link, and its transposed version in as little as 2.89 bits per link.

PuTTY: An alternative to Windows Telnet

PuTTY: an Open Source Telnet by Simon Tatham

At the Bearcave we do not use Microsoft's Outlook Express. This software is simply an invitation for viral infection (for some examples see my web page Why are there still e-mail viruses?). If we really, really need a Windows based e-mail program we use Eudora. Interestingly, Eudora is also the choice for the email program used at the Lawrence Livermore National Labs., where security is taken very seriously.

In general we read our e-mail on our shell accounts on Idiom. This means that we use Telnet a lot. This was no problem when we lived in California with its high Internet router density, but it became a problem when we moved to New Mexico, where there can be significant lag over the Internet. Telnet is particularly sensitive to this lag and the Microsoft Windows Telnet program has a habit of hanging up the connection while we are in the middle of an edit. Obviously this is pretty irritating, so I looked around for an alternative. One of my colleagues pointed me to PuTTY. I just started using this program. It seems to be slower than Windows Telnet. On a 56K baud connection I can type slightly faster than PuTTY sends and echoes characters back. It handles window resizing properly, sending the window size information to the UNIX system.

Unfortunately it turned out that PuTTY was no more tolerant of network lag (timeout) than Microsoft's telnet. However, PuTTY supports a number of nice features, including SSH (encrypted) login. This makes it a good choice for network links that are monitored.

Subversive Software Developers

I think that breaking into computer systems and destroying information is both juvenile and criminal. I never cease to be amazed that the people who do this don't have something better to do with their time and talents. However, writing btp gave me an appreciation for client-server software. I'm also very irritated with Microsoft, which seems to take a very cavalier attitude toward system security. So I admire the work of the Cult of the Dead Cow. They have written a client-server program called Back Orifice 2000, which can be used to remotely control a Windows PC over the Internet. This control can optionally be done covertly, which has led to the controversy surrounding the Cult of the Dead Cow. Back Orifice is available as Free Source, which is the only way I would download it since the Cult of the Dead Cow notes:

1999-07-13:

BO2K Defcon CDs accidentally tainted with Chernobyl virus. We did it, it's our mistake. If you got a BO2K CD or downloaded the software from a 3rd party mirror, you need to scan your system for CIH v1.2 TTIT.

So look at the source code before you compile the program. There have been a number of cases where trojan horse code has shown up in open source software (for example, in DNS support code). Despite all this, the sophisticated features and Open Source make Back Orifice a nice program to learn from. Programs like Back Orifice would be much less dangerous if Microsoft paid any real attention to security. The fact that executable e-mail enclosures can install a trojan horse like Back Orifice on a Microsoft system is inexcusable on Microsoft's part. Given the current state of security on Windows NT, I would never use a Windows NT system as an Internet front end.

Update: January 13, 2000

The Cult of the Dead Cow web page seems to be a dead end. All that is seen is something that says "filling ditches with snitches, head up cDc 2000". I did not find any links to click elsewhere. The cDc folk are also associated with L0pht Heavy Industries. The L0pht folk have recently formed a venture funded company known as @Stake, which specializes in computer security issues.

Update: September 29, 2003

The company @Stake does not resemble the sort of company that I would have expected the cDc/L0pht people to be involved with. @Stake recently announced that they fired their Chief Technology Officer, Daniel Geer, for co-authoring a report critical of Microsoft. Given that the L0pht folk wrote Back Orifice, this is a strange turn of events. I have moved my notes on Daniel Geer's firing and @Stake off this web page, since this web page is already so long. These notes can be found here.

Update: August 7, 2005

The Cult of the Dead Cow web page is back. I'm trying to get them to sell Cult of the Dead Cow teeshirts with the way cool graphic from the front page (La Vaca es Muerto).

Snort: Protection from Subversive Software Developers

Snort is an open source packet sniffer and intrusion detection package. It is described in the article Snort - Lightweight Intrusion Detection for Networks by Martin Roesch. Snort runs on both Unix systems and Windows systems.

Ethereal is another network sniffing program. Like many tools, Ethereal can be used in a number of ways. I've used it to track down network problems on my local network.

BSD UNIX: live free or die

Those who do not know history are doomed to repeat it.

The study of an academic discipline like computer science is, in large part, the study of history. We learn what others have done before. With Linux there seems to be an entire generation that is almost willfully ignorant of history. Linux has a monolithic kernel which is now so big that it is a struggle for anyone to understand it. Before Linux other operating systems like Mach, BSD Unix and even Windows NT were moving away from a monolithic structure into an operating system composed of components.

UNIX has a long history. The first version of UNIX that I used did not have virtual memory (e.g., System III). Once UNIX escaped from Bell Labs different versions started to develop. There was AT&T's version, System V, Sun's version (called SunOS back then), and the Berkeley version. The Berkeley version of UNIX had TCP/IP networking first and had a huge influence on the course of UNIX development. Once the AT&T code was removed from UNIX, an open source version of the Berkeley Software Distribution (BSD) release was born. Versions of BSD calved off as well. James Howard has written a great article titled The BSD Family Tree which describes the various BSD versions and other BSD derived UNIXes.

CD-ROM and DVD-ROM Software for Linux

Whatever the virtues of BSD UNIX, Linux has now become the UNIX standard. There are a number of reasons I like to use Linux for development. The operating system is transparent and easier to understand than Windows (NT, 2000, XP or whatever Microsoft calls their latest release). But despite what the Slashdot crowd claims, all is not wonderful on Linux. There are not many options when it comes to software to burn CD-ROMs and especially DVD-ROMs. I purchased a Plextor DVD-RW drive so that I could back up my hard drives. Here are some notes on CD/DVD-ROM software for Linux:

X-CD-Roast

Most people use X-CD-Roast on Linux. While I've succeeded in using this software to make copies of my music CDs (for personal use at work, I should add), I have never gotten it to work with a DVD (although I've destroyed several DVDs in the process). I find the user interface difficult and the documentation inadequate.
K3b CD & DVD Burning

This software, for the Linux KDE Window system, looks really promising. I have not tried it yet, but I have hopes that it will be better than X-CD-Roast. One point of concern is that the documentation link does not point to anything but links to discussion groups. There is, apparently, a book in German on the software. So I will hope for an English translation.
Dvdrtools dvdrecord - dvd-rw/dvd-r made easy and free

Let's see, X-CD-Roast uses cdrecord. The cdrecord software is open source and this is a branch off of the cdrecord tree. I guess my comment is "free, perhaps, but easy?"
DVD+RW/+R/-R[W] for Linux by Andy Polyakov

Some notes by Andy Polyakov on CD/DVD tools on Linux. Some of this material may be a bit dated.
A Slashdot discussion: Free DVD Recording Tool for Linux?

A Slashdot discussion on DVD recording tools. As I recall the summary was that X-CD-Roast is about the best there is.

Miscellaneous Software Applications

This section includes links to various software applications that I've found useful.

Audiograbber: CD-ripper for Windows

I am preparing to move my office at work into an area where personal electronics and music CDs are not allowed. MP3 files are allowed, however. So I looked around for a music CD "ripper" that I could use to "rip" my music CDs to MP3 files and store them on my computer's hard disk. I have been very happy to find the freeware Audiograbber application. Audiograbber is very easy to use, has a small disk footprint and is not infected with spyware. I am also pleasantly surprised at how small the compressed MP3 files are. I'm sure that I'm losing some quality compared to a music CD, but I'm playing the music on cheap computer speakers, so I have not noticed any difference.

The evil RIAA stalks the earth looking for grandmothers who might be downloading rock oldies without paying the members of the RIAA (the horror, the horror). I'm sure that the RIAA dislikes programs like Audiograbber because this software makes it easy to convert music CDs into MP3 files. However, despite what the RIAA would like to think, it is still legal to make a personal copy, for your own use, of the music that you have purchased. That is exactly what I have done.

While I despise the heavy handed tactics of the RIAA and the general stupidity of the major music publishers (suing your customers, there's a good business model), I strongly believe that people should be paid for creating music (or books or software). So I would never publish music that I've ripped as a matter of principle.

Java Links

At Bear Products International we are involved in compiler development, including software tools and compilers for Java. Compared to, for example, the IEEE Verilog standard, the Java Language Specification and the Java Virtual Machine Specification are models of clarity. However, there are still some holes. Some of these issues are discussed in The Java Spec Report. There is a lot of good material here, but its last update is listed as September 1998.
The www.bmsi.com Java Web page by Stuart D. Gathman publishes several interesting sources, including a Java class dependence analyzer. Java dependence analysis is interesting for a Java compiler because the compiler must compile classes that are referenced by the class being compiled. Clearly this is a recursive process.
Bill Venners, author of Inside the Java 2 Virtual Machine has a great web site with lots of information on the JVM, Java and Jini (the coolest Java technology yet released). The site is named after Venners' consulting company, Artima Software. You can click on the icon below to go to the Artima site.
Other bearcave.com Java related link pages:
- Links to Web Pages Related to Compiling Java (to native and JVM byte code)
- Sources for the Java Virtual Machine (JVM). This list concentrates on open source JVMs and JVMs for embedded systems. I have not included obvious sources like Sun and IBM.
- Compiler Front End and Infrastructure Software
- Miscellaneous Java issues
- Compiler related links
  
  This web page provides a limited set of links to compiler related resources on the Web.

XML and Related Technologies

Castor (from Exolab.org)

Castor supports serialization from Java to XML and to SQL (to update a database to provide persistence). The Castor software reads an XML Schema and generates Java classes which include code to "marshal" (write) to XML and "unmarshal" (read from) XML into the Java class.

I found Castor's approach somewhat unexpected, at least for Java. Using Java introspection it is possible for Java software to discover the structure of an object. The object can then be written to or read from XML. In a non-XML context this is the operation defined by the Serializable interface.
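The introspection approach can be sketched in a few lines. The example below uses Python rather than Java, so it illustrates the idea and is not Castor's API or the Java reflection API. The `Book` class and the `marshal` and `unmarshal` function names are hypothetical, and the sketch only handles flat objects whose fields are simple types such as `str` and `int`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Book:  # hypothetical example class
    title: str
    year: int


def marshal(obj):
    """Discover the object's fields at run time and write each as an element."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        child = ET.SubElement(root, f.name)
        child.text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def unmarshal(cls, xml_text):
    """Rebuild an instance by converting each element using the field's type."""
    root = ET.fromstring(xml_text)
    hints = get_type_hints(cls)  # field name -> declared type
    kwargs = {f.name: hints[f.name](root.findtext(f.name)) for f in fields(cls)}
    return cls(**kwargs)


xml_text = marshal(Book("Linked", 2002))
print(xml_text)
print(unmarshal(Book, xml_text))
```

No code is generated ahead of time here: the structure of the class is discovered when the object is written or read. Castor's schema-first approach goes the other way, generating the classes from the XML Schema.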

Sun's Java 1.4 release (and later releases) includes an XMLEncoder and XMLDecoder which will encode and decode a Java class into XML form. John Zukowski, an active voice in the ANTLR community, wrote a brief tutorial on using the XMLEncoder and XMLDecoder classes. These XML serialization classes presumably make use of the Java reflection API.
Rogue Wave XML Object Link

The Rogue Wave XML Object Link is a tool that reads XML schemas and generates C++ classes that include the marshalling and unmarshalling code. Although I like Rogue Wave software, I've always found it too expensive for use on my personal software projects, so I expect that this is another example of expensive Rogue Wave software. Perhaps it's time to create an open source version.
Gnome libxml: the XML C parser and toolkit

This is an open source (MIT software license) C toolkit for parsing and validating XML. It apparently supports both "push" SAX type parsers, where you supply callback functions, and "pull" parsers, where the parser requests the next token (this is a great feature, since many applications fall into this category). The parser can also generate DOM objects. All-in-all, it looks pretty good.
The Expat XML Parser developed by James Clark (no, not the Netscape guy).

Expat is an XML parser for C (or C++, using C linkage I assume). The main claim to fame, at least originally, for Expat, was speed. I don't know if this remains true compared to later versions of Xerces. However, this is the parser that was used for the Mozilla browser. Expat is probably smaller and easier to understand than Xerces, which can be a big plus.

James Clark has a software company in Bangkok, Thailand called Thai Open Source. Perhaps the name Expat is inspired by Mr. Clark's experience.
- Building a Data Structure with Expat by David M. Howard (the link is on this web page, under publications).
  
  David M. Howard's web page Building a Data Structure with Expat describes a limited version of Rogue Wave's XML Object Link. That is, a software tool that reads XML Schemas and generates C structs.
XP: a high-performance XML parser for Java

The Expat parser is targeted at C (or at least C linkage). XP is for Java. What makes XP interesting is that it seems to be designed for speed and it does "pull" style (or demand style) parsing, not SAX callbacks. In demand style parsing, the program doing XML semantics asks for the next token, which is delivered by the XML parser. In the SAX style the parser calls the semantic routines, which makes many applications difficult to implement. This is why people still tend to use DOM parsers.

The XP link above is from James Clark's XML Resources web page. This includes links to a number of tools that James Clark has developed for XML processing.
XML Pull

XML Pull is a fast Java XML parser with a very simple interface. Unlike SAX parsing, which produces events as it parses an XML document, XML pull allows a parser to be written that requests the next token. The XML Pull web site includes a brief tutorial and a discussion of how to use XML Pull with the Java XmlSerializer interface. I have also written some web pages on how XML Pull can be used and why it is better than SEX, ah SAX.
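The difference between the two styles is easier to see side by side. The sketch below uses Python's standard library rather than Java, so it is an analogy and not the XML Pull or XP API: `xml.sax` plays the push role and `ElementTree.iterparse` is the closest standard analog of a pull parser. The sample document, the `ItemHandler` class and both function names are made up for the illustration.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

DOC = b"<order><item>tea</item><item>milk</item></order>"


class ItemHandler(xml.sax.ContentHandler):
    """Push style: the parser calls us, so parse state lives in instance fields."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._chunks = []

    def characters(self, content):
        if self._in_item:
            self._chunks.append(content)  # text may arrive in several pieces

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._chunks))
            self._in_item = False


def items_push(data):
    handler = ItemHandler()
    xml.sax.parseString(data, handler)
    return handler.items


def items_pull(data):
    """Pull style: the application drives the loop and asks for the next event."""
    items = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            items.append(elem.text)
    return items


print(items_push(DOC), items_pull(DOC))
```

Both functions return the same list. In the push version the control flow belongs to the parser and the application has to remember where it is between callbacks. In the pull version the application keeps its state in ordinary local variables.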
XPA: XML Processing for ANTLR, developed by Oliver Zeigermann

XPA allows SAX to be integrated with ANTLR. Oliver writes in the antlr-interest Yahoo discussion group:

It allows you to feed XML SAX events into ANTLR parsers as token streams. Optionally, if you do not care for space, you can create an AST from a SAX parser and transform it using ANTLR tree parsers.
The Resin XML Application Server from Caucho Technology

The Simple Object Access Protocol (SOAP) for XML is a way to pass data to and from Java servlets. The Resin XML application server is very fast and light weight. It is available for developing without fee and can be deployed in applications where no fee is charged (e.g., open source) without a license fee.

I've published some notes I wrote while installing and using Resin. These notes can be found here.
JBoss Java Application Server

Unlike Resin, JBoss is an open source project. However, JBoss is definitely a commercial endeavor. Most open source projects are described in purely engineering terms. The project has these objectives, the software currently provides these features. One of the strange things about reading the JBoss web site is that it departs from simple concise engineering description, toward the language used by marketing people. At times the buzzwords start to get out of hand as well. For example:

The Aspect-Oritented Programming architecture of JBoss 4.0 enables it to provide a wide range of services, including object persistence, caching, replication, acidity, remoteness, transactions and security. The framework allows developers to write plain Java objects and apply these enterprise-type services later on in the development cycle -- without changing a line of Java code. This new concept of programming provides a clean separation between the system architect and the application developer. The iterative development process becomes more fluid as architectural design decisions can be made later on in the development process without changing any of your Java code. Entirely unique among Java-based application servers today, this architecture combines the simplicity of standard Java with the power of J2EE.

JBoss 4.0 brings Aspect-Oriented technology to Java through a pure 100% Java interface. Base on the new JBoss.org project Javassist, JBoss-AOP allows you to apply interceptor technology and patterns to plain Java classes and Dynamic Proxies.

From JBoss Aspect Oriented Programming

JBoss has sometimes been given as an example of an open source project which actually brings in revenue. Does this mean that marketing and marketing driven exposition are necessary for a commercial enterprise?
Amazon has a SOAP/XML over HTTP interface to their software. They call this the Amazon Web Service.
IBM's Emerging Technologies Toolkit (ETTK)

The ETTK is based on SOAP and Apache AXIS. This is a package to support distributed computing and what IBM has come to call "autonomic technologies". The link above is under IBM's alphaWorks. In many cases projects move from alphaWorks to either an IBM product or an open source project.
Sun Microsystems' Java IDL

Before there was XML, there was IDL (the Interface Definition Language). Sun's Java platform also supports IDL and CORBA. As the above link entries show, there is a vast and rich set of software to support XML. However, from a simple language point of view it is not clear to me what advantages XML provides above and beyond what IDL provided.

There was a theory that humans would not actually read XML. It would be created and consumed by software. XML's structure certainly reflects this. However, humans do read and write XML Schemas. XML is, arguably, more difficult to read than IDL.
XML related web pages on bearcave.com:
- Notes on Resin, Axis, Servlets and SOAP.
  
  This web page consists of my notes on installing the Resin HTTP server, AXIS SOAP support and Apache SOAP services.
- Visiting XML
  
  This web page is intended to be a commentary on XML. Right now it is a set of annotated links. I plan to eventually finish this commentary and add the links above to this web page.

What is Microsoft .NET?

Somehow What is the Matrix is a more interesting question. But, like everything the evil empire foists on us (remember OLE, COM, DCOM and ActiveX), .NET is here whether we like it or not.

Microsoft .NET by DrPizza in Ars Technica, February 2002

Once Microsoft marketing gets ahold of something, it becomes entirely obscured by smoke, mirrors and the grand vision that Microsoft wants to sell you (which, of course, is only available on the Microsoft platform). This was certainly true of a now fading object technology called OLE. It is even more true of Microsoft's ".NET". As DrPizza (Peter Bright) writes at the start of his excellent and extensive article on Microsoft's .NET:

In a remarkable feat of journalistic sleight-of-hand, thousands of column inches in many "reputable" on-line publications have talked at length about .NET whilst remaining largely ignorant of its nature, purpose, and implementation. Ask what .NET is, and you'll receive a wide range of answers, few of them accurate, all of them conflicting. Confusion amongst the press is rampant.

The more common claims made of .NET are that it's a Java rip-off, or that it's subscription software. The truth is somewhat different.

What is impressive is that Mr. Bright seems to have written this article while a first year undergrad at Britain's Imperial College.

Microsoft's .NET seems to consist of two components: the Common Language Infrastructure (CLI), which is a compiler intermediate/virtual machine instruction set, and Web services. The Web services are based on Microsoft's C# language and their web server technology. The real "point of the spear" as far as Microsoft is concerned is Web services, since this sells Microsoft operating systems and application software.

Microsoft .NET has not exactly taken the world by storm. There are several reasons for this. .NET is platform specific, although some components, like the CLI and C#, are public standards. If you want to make use of .NET technology you pretty much have to do it on a Windows platform. However, a significant fraction of the web servers in the world use Apache or other non-Microsoft software. In most cases these non-Microsoft web servers run on UNIX or Linux. Java is also increasingly being used for Web services. Java has the advantage of running on UNIX, Linux and Windows. This appears to be a case where Microsoft attempted to define a standard on the Windows platform and failed. .Net: 3 Years of the 'Vision' Thing by Peter Galli, eWeek, July 7, 2003, provides a brief discussion of the tepid adoption of .NET.

Computer Science Links

In Pursuit of Simplicity: the manuscripts of Edsger W. Dijkstra, University of Texas

Computer science and computer engineering are now getting old enough that the pioneers are starting to pass from us. Seymour Cray is gone, as is Edsger Dijkstra. I have read some of the essays published on this web site in Dijkstra's book Selected Writings on Computing: A Personal Perspective. Dijkstra came at computer science from an applied mathematics perspective. His computer science essays are interesting and educational. He was also a man of strong opinions and this makes his essays amusing. I am not convinced that viewing a 200,000 line compiler as a piece of mathematics is the right approach. Rather I'd say that the compiler is a set of data structures and transformations, with some mathematical algorithms like register coloring. So Dijkstra is certainly not the final word on software engineering and computer science. But he is someone who greatly influenced the field and this influence is felt today (Java has no GOTO).
Paul Graham's Web Site

I placed this link near the top of the computer science section because Paul Graham represents many of the things that I love about computer science. In Paul's essay Hackers and Painters he writes

When I finished grad school in computer science I went to art school to study painting. A lot of people seemed surprised that someone interested in computers would also be interested in painting. They seemed to think that hacking and painting were very different kinds of work-- that hacking was cold, precise, and methodical, and that painting was the frenzied expression of some primal urge.

Both of these images are wrong. Hacking and painting have a lot in common. In fact, of all the different types of people I've known, hackers and painters are among the most alike.

Like Don Knuth, Paul views computer science as both a science and an art. Sometimes I feel that this view is dying in the world around me. Computer science does not seem much valued anymore. Software engineering jobs have become hard to come by and employers seem to care most about whether you have mastered the latest buzzwords, not whether you have a deep background in computer science and software engineering. Job interviews frequently degenerate into inquisitions where the interviewee is asked to write a series of minor algorithms on a whiteboard. Fewer and fewer people seem interested in whether you can solve complex engineering problems. Like Knuth before him, Graham shows that there is more to computer science than whether you can pick the right Java class library component.
A Slashdot discussion of Robin Milner's work on algorithmic proofs and proof of correctness.

This Slashdot posting includes a number of interesting links to Robin Milner's work and related work by others including Peter Lee at CMU.

I've always been fascinated by the idea of applying theorem proving techniques to software. These techniques have been used in VLSI logic design tools to show that two designs are equivalent.

However, the application to software is more problematic. The problem in software is that one would like to assure that the software "does the right thing" or at least does not suffer from catastrophic defects. In theory you can show that a body of software implements a specification (which must, in turn, be specified in some formal language). But there is no way to prove that the formal specification is without catastrophic error in the context of the application.
Concepts, Techniques and Models of Computer Programming by Peter Van Roy and Seif Haridi

The authors write:
This textbook brings the computer science student a comprehensive and up-to-date presentation of all major programming concepts, techniques, and paradigms. It is designed for second-year to graduate courses in computer programming. It has the following notable features:
- Concurrency: the broadest presentation of practical concurrent programming available anywhere. All important paradigms are presented, including the three most practical ones: declarative concurrency, message-passing concurrency, and shared-state concurrency.
- Practicality: all examples can be run on the accompanying software development platform, the Mozart Programming System.
- Programming paradigms: the most complete integration of programming paradigms available anywhere.
- Formal semantics: a complete and simple formal semantics that lets practicing programmers predict behavior, execution time, and memory usage.
The book is organized around programming concepts. It starts with a small language containing just a few concepts. It shows how to design, write programs, and reason in this language. It then adds concepts one by one to the language to overcome limitations in expressiveness. In this way, it situates most well-known programming paradigms in a uniform framework. More than twenty paradigms are given, all represented as subsets of the multiparadigm language Oz.
Parsing Techniques - A Practical Guide by Dick Grune and Ceriel J.H. Jacobs

Parsing Techniques was originally published as a book. The authors write:

The latest publisher, Prentice Hall, claims their stock has run out. Specialized book shops may still have a copy or two. Prentice Hall has indicated that they will not reprint. Copyright of the book has been returned to us, and we are now (Spring 2003) working on a second edition, updated with recent developments. At the moment we are looking for a publisher.

I have not been able to find this book on any of the used book sites (e.g., abebooks.com, amazon.com used book sellers or alibris.com). Fortunately, Parsing Techniques is now published on-line by the authors. This is an extensive, practical, well regarded discussion on parsing.
ISO 14977 EBNF Standard (PDF)

Extended Backus-Naur Form (EBNF) extends Backus-Naur Form, one of the oldest notations used to define grammars. Apparently there is an ISO standard (ISO 14977). The above link is a copy of the PDF in someone's home directory, so it is possible that this link will disappear.
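
To give a feel for how an EBNF grammar maps onto code, here is a minimal sketch. The three-rule grammar in the comment is an illustration written in the ISO 14977 style (comma for concatenation, braces for repetition, semicolon to end a rule); it is not taken from the standard. The Parser class and the evaluate function are hypothetical names, with one method per grammar rule.

```python
import re

# Illustrative grammar in ISO 14977 style EBNF (an assumption, not from the standard):
#   expr   = term, { ("+" | "-"), term } ;
#   term   = factor, { ("*" | "/"), factor } ;
#   factor = number | "(", expr, ")" ;

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent parser: each method mirrors one grammar rule."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):      # the { ... } repetition
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value //= self.factor()       # integer division keeps the sketch simple
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

Calling evaluate("2 + 3 * (4 - 1)") walks the three rules in precedence order. The repetition braces in the grammar become the while loops in expr and term.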
Stratego: Strategies for Program Transformation

In a formal sense, the process of correctly compiling a program in a language like Java or C++ into byte code or processor assembly language is a process of rewriting the program from the input language into the lower level language. The distance between the input language (C++) and the target language (assembly language) means that the rewrite process takes many steps.
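
The idea of compilation as a chain of rewrites can be sketched in a few lines. This is not Stratego and makes no attempt to mimic its syntax. It is a toy under stated assumptions: terms are nested tuples, the rule set (constant folding plus two identities) is made up for illustration, and the names simplify_once, rewrite and lower are hypothetical.

```python
# Terms are nested tuples (an assumed representation):
#   ("const", n), ("var", name), ("add", left, right), ("mul", left, right)


def simplify_once(term):
    """Try the local rewrite rules at the root of a term; return it unchanged if none match."""
    op = term[0]
    if op in ("add", "mul"):
        _, left, right = term
        if left[0] == "const" and right[0] == "const":
            # Rule: fold constants.
            value = left[1] + right[1] if op == "add" else left[1] * right[1]
            return ("const", value)
        identity = ("const", 0) if op == "add" else ("const", 1)
        if right == identity:       # Rule: x + 0 -> x,  x * 1 -> x
            return left
        if left == identity:        # Rule: 0 + x -> x,  1 * x -> x
            return right
    return term


def rewrite(term):
    """Bottom-up strategy: rewrite the children, then the root, until nothing changes."""
    if term[0] in ("add", "mul"):
        term = (term[0], rewrite(term[1]), rewrite(term[2]))
    result = simplify_once(term)
    return result if result == term else rewrite(result)


def lower(term):
    """Final rewrite step: turn the tree into instructions for a toy stack machine."""
    op = term[0]
    if op == "const":
        return [("PUSH", term[1])]
    if op == "var":
        return [("LOAD", term[1])]
    return lower(term[1]) + lower(term[2]) + [(op.upper(),)]


# (x * 1) + (2 + 3)
expr = ("add", ("mul", ("var", "x"), ("const", 1)),
               ("add", ("const", 2), ("const", 3)))
simplified = rewrite(expr)
code = lower(simplified)
```

The separation between the rules (simplify_once) and the order in which they are applied (rewrite) is the part that a strategy language makes explicit. Here it is hard-wired as a single bottom-up traversal.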

Stratego is described as:

Stratego is a modular language for the specification of fully automatic program transformation systems based on the paradigm of rewriting strategies. The construction of transformation systems with Stratego is supported by the XT bundle of transformation tools. The Stratego/XT distribution integrates Stratego and XT.

The Stratego compiler is, interestingly enough, implemented in Stratego (via bootstrapping).

I believe, by the way, that stratego comes from strategos, the Greek term for a general.
The Elegant compiler generator tool kit from Philips Research (released under the GNU copyleft).

From the Elegant web page:

What is elegant?

Elegant started as a compiler generator based on attributed grammars (the name stands for Exploiting Lazy Evaluation for the Grammar Attributes of Non-Terminals) and has grown into a full programming language. Although it has been inspired by the abstraction mechanisms found in modern functional languages, Elegant is an imperative language that does not discourage side-effects.

Elegant is written in Elegant. (Beware of any language implementation not written in that same language!) and has been used for internal use within Philips for about 15 years now. In this period, dozens of compilers have been built with Elegant. Elegant release 7 is distributed under the Gnu General Public License.

Front is a front-end generator for Elegant. It will generate an Elegant attribute, a scanner and an implementation for the abstract syntax tree from one BNF-like specification. Front can alternatively generate a front-end in C, using Bison and Flex. Click below for more details on the C-version of Front

My colleagues and I have been working on a query language which has grown to be fairly complex. One of the tools that is provided with Elegant is an EBNF to railroad diagram translator. Having syntax railroad diagrams would make our documentation easier to understand. But we have been unable to get this software to build and execute properly.

Like many large corporations, Philips has made big cutbacks in their research groups. I'm not sure that Elegant is still alive.

For other EBNF to graphic format (e.g., PostScript) translators, see:
- Ebnf2ps: Peter's Syntax Diagram Drawing Tool
  
  Unfortunately this software is written in Haskell98, which makes it a bit less portable than it would be if it were written in Java or C++.
LLVM Compiler Infrastructure

Quoting from the comp.compilers posting that announced the availability of LLVM:
LLVM is a new infrastructure designed for compile-time, link-time, runtime, and "idle-time" optimization of programs from arbitrary programming languages.

LLVM uses a low-level, RISC-like, language-independent representation to analyze and optimize programs. Key features include explicit control flow, dataflow (SSA), and a language-independent type system that can capture the _operational behavior_ of high-level languages. The LLVM representation is low-level enough to represent arbitrary application and system code, yet is powerful enough to support aggressive "high-level" transformations. The LLVM infrastructure uses this representation to allow these optimizations to occur at compile-time, link-time and runtime.

Release 1.0 is intended to be a fully functional release of our compiler system for C and C++. As such, it includes the following:
- Front-ends for C and C++ based on GCC 3.4, supporting the full ANSI-standard C and C++ languages, plus many GCC extensions.
- A wide range of global scalar optimizations
- A link-time interprocedural optimization framework, with a rich set of analyses and transformations, including sophisticated whole-program pointer analysis and call graph construction.
- Native code generators for x86 and Sparc
- A JIT code generation system for x86 and Sparc
- A C back-end, useful for testing and to support other targets
- A test framework with a number of benchmark codes and some applications
- APIs and debugging tools to simplify rapid development
WebKit's SquirrelFish JavaScript Interpreter

SquirrelFish is a JavaScript byte code interpreter: it compiles JavaScript to byte code and interprets the byte code, which is, the authors write, much faster than simply interpreting the JavaScript syntax tree. This is not much of a surprise, although the authors seem to have found it a surprise. SquirrelFish is part of WebKit, which is an open source web browser engine. WebKit seems to be aimed at Apple's Safari browser.
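
The difference between walking a syntax tree and running byte code can be shown with a toy expression language. The sketch below is not SquirrelFish's design. The tuple-based tree, the four opcodes and the stack machine are all assumptions chosen because they are short to write. The tree walker re-dispatches on node types on every evaluation, while the byte code version pays that cost once, at compile time, and then runs a flat loop.

```python
def eval_tree(node, env):
    """Tree-walking evaluation: recurse over the syntax tree on every run."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left = eval_tree(node[1], env)
    right = eval_tree(node[2], env)
    return left + right if kind == "add" else left * right


# Hypothetical opcodes for a toy stack machine.
PUSH, LOAD, ADD, MUL = range(4)


def compile_tree(node, code=None):
    """Flatten the tree into a linear list of (opcode, argument) pairs."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append((PUSH, node[1]))
    elif kind == "var":
        code.append((LOAD, node[1]))
    else:
        compile_tree(node[1], code)
        compile_tree(node[2], code)
        code.append((ADD if kind == "add" else MUL, None))
    return code


def run(code, env):
    """Byte code interpretation: one flat loop over the instructions, no recursion."""
    stack = []
    for opcode, arg in code:
        if opcode == PUSH:
            stack.append(arg)
        elif opcode == LOAD:
            stack.append(env[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == ADD else left * right)
    return stack.pop()


# 2 + x * 3
tree = ("add", ("num", 2), ("mul", ("var", "x"), ("num", 3)))
code = compile_tree(tree)
```

Both eval_tree(tree, {"x": 4}) and run(code, {"x": 4}) compute the same value. No timing claims are made for this Python toy; it only shows the shape of the two approaches.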
The Scheme Dialect of Lisp
- How to Design Programs: An Introduction to Computing and Programming, by Matthias Felleisen, Robert Bruce Findler, Matthew Flatt and Shriram Krishnamurthi, MIT Press (paper book edition, 2001, on-line edition, 2003)
- The DrScheme Programming Environment. The DrScheme Programming Environment runs on most platforms, including Windows and Linux.
- The TeachScheme! Project. A guide to teaching programming, via Scheme and How to Design Programs.
For some people the answer to "How to Design Programs" is "in LISP" (or perhaps, Scheme). I have to confess that while I have worked in LISP and Scheme, I've never been very good. Not like I am in C++ or Java. Perhaps this is a result of having my mind warped by Pascal at an early age.

The book How to Design Programs appears to be a more approachable version of The Structure and Interpretation of Computer Programs (see below). This book seems to be an attempt at an everyperson's tutorial on programming. At least where "everyperson" has taken high school algebra. The DrScheme programming environment looks promising.
The Structure and Interpretation of Computer Programs

The Structure and Interpretation of Computer Programs is a classic computer science text book. I read somewhere that MIT is revising its CS and EE courses, but for many years this book was the text used by the MIT freshman core computer science course. This is an amazingly deep and wide ranging book. It is available via the above link on-line, in HTML form.
Partial Evaluation and Automatic Program Generation by N.D. Jones, C.K. Gomard, and P. Sestoft, with chapters by L.O. Andersen and T. Mogensen, 1993. In the preface the authors write:

This book is about partial evaluation, a program optimization technique also known as program specialization.

It presents general principles for constructing partial evaluators for a variety of programming languages, and it gives examples of applications and numerous references to the literature.

This web page publishes the complete text of this book in postscript and PDF format.
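
As a rough illustration of what program specialization means, consider the textbook case of a power function whose exponent is known ahead of time. The sketch below is not taken from the book. The function names are hypothetical, and generating Python source as text is just the simplest way to show a residual program. The specializer executes everything that depends only on the exponent (the loop and the parity tests) and emits code only for the operations that depend on x.

```python
def power(x, n):
    """General program: both x and n are supplied at run time."""
    result = 1
    while n > 0:
        if n % 2 == 1:
            result *= x
        x *= x
        n //= 2
    return result


def specialize_power(n):
    """Specialize power() for a known n: return the residual source and the compiled function."""
    name = f"power_{n}"
    lines = [f"def {name}(x):", "    result = 1"]
    while n > 0:                      # static: runs now, in the specializer
        if n % 2 == 1:                # static test on the known exponent
            lines.append("    result *= x")   # dynamic: emitted into the residual program
        n //= 2
        if n > 0:
            lines.append("    x *= x")        # dynamic: emitted into the residual program
    lines.append("    return result")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)           # turn the generated text into a callable
    return source, namespace[name]


source_5, power_5 = specialize_power(5)
```

The residual program in source_5 is straight-line code with no loop and no tests on the exponent. A real partial evaluator does this kind of static/dynamic separation automatically for arbitrary programs rather than for one hand-picked function.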
Rel - An Implementation of Date and Darwen's "Tutorial D"

The SQL expression language that is used to access and modify relational databases has been criticized (by C. J. Date, Hugh Darwen and Fabian Pascal) as not being faithful to the relational model. The critics of SQL claim that the shortcomings of SQL make it more difficult to use and to understand.

Rel is an experimental implementation of a language that is supposed to avoid the pitfalls of SQL. Rel is based on a language originally described in the book Foundation for Object/Relational Databases -- The Third Manifesto by C. J. Date and Hugh Darwen (Addison-Wesley, 1998, and published as a second edition in 2000). This book contains a section titled "Tutorial D" which describes the language.
Beyond3D Web Site

The Beyond3D Web site has excellent articles on 3D graphics, hardware graphics accelerator architecture, products and much more. This site provides the kind of clear and detailed discussion of 3D graphics hardware that Ars Technica has been providing for processor architectures.
ReiserFS

ReiserFS seems pretty dead right now. As I write this, Hans Reiser is on trial for the murder of his wife. This case has had many twists, including the revelation that Reiser's wife was involved with a man who claims to be a serial killer. Interestingly, the police have not arrested this individual.

Even if it were not for the tragic events surrounding Hans Reiser and his family, work in the Linux community on file systems seems to have eclipsed ReiserFS.
June 5, 2007

The ReiserFS is a file system for Linux. It does "journaling", which means that, like database transactions, a ReiserFS file system transaction is either complete or in a known state of process. If it is in a known state of process it can be either completed or removed. This avoids the horrible UNIX fsck (f-suck) which has been done for twenty years when UNIX systems boot.
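
The journaling idea itself fits in a small sketch. This is not how ReiserFS is implemented. It is an in-memory toy, and the JournaledStore class and its methods are hypothetical names. Every change is first appended to a log; after a crash, recovery replays the log, applying the writes of transactions that reached their commit record and dropping the rest.

```python
class JournaledStore:
    """Toy journal: a dict stands in for the disk, a list for the on-disk log."""

    def __init__(self):
        self.data = {}       # the "file system" state
        self.journal = []    # append-only log of records

    def begin(self, txid):
        self.journal.append(("begin", txid))

    def write(self, txid, key, value):
        # Log the intended change; the main state is not touched yet.
        self.journal.append(("write", txid, key, value))

    def commit(self, txid):
        self.journal.append(("commit", txid))

    def recover(self):
        """Replay the journal: complete committed transactions, remove incomplete ones."""
        committed = {record[1] for record in self.journal if record[0] == "commit"}
        for record in self.journal:
            if record[0] == "write" and record[1] in committed:
                _, _, key, value = record
                self.data[key] = value
        self.journal.clear()


store = JournaledStore()
store.begin(1)
store.write(1, "a", "x")
store.commit(1)
store.begin(2)
store.write(2, "a", "y")
store.write(2, "b", "z")
# Simulated crash here: transaction 2 never wrote its commit record.
store.recover()
```

After recovery only transaction 1 is visible. The recovery cost depends on the length of the journal, not on the size of the whole file system, which is why a journaling file system does not need a full fsck scan after a crash.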

The ReiserFS uses "fast balanced trees" and they claim that this gives it much better worst case performance.

The file system supports "plug-ins" so one can, for example, plug in an encryption module. Slashdot reported that the company that is developing ReiserFS (which is an open source project), Namesys, received $600K from DARPA to develop encryption plug-ins for the file system.

The ReiserFS web pages include documentation on high performance file system data structures. One of the great things about the ReiserFS project is that it is dedicated to sharing the computer science information behind ReiserFS, rather than keeping it secret.
The Journey Operating System

The Journey Operating System has some interesting features. It apparently has a journaling file system (i.e., a file system that acts like a database, where operations are either committed or rolled back). Journey OS also includes something called a HyperQueue for interprocess communication and communication with the operating system. HyperQueue apparently avoids most context switches on what would usually be operating system calls.

Apparently Journey OS is largely or completely the work of J. Charles Kain. Sadly he has written very little about the design objectives of Journey OS or the architecture of the software components he has published so far. Perhaps looking at the example of Linux he believes that if you publish the software, they will come, even though there is little in the way of documentation beyond an overview. After all, getting the source code out there is the important thing, right? And Journey OS has some way cool icons, so what more does it need?

The popularity of Linux is to some degree a historical accident. Linux is not popular because of technical excellence. Compared to operating systems like Mach, which evolved into the NeXT operating system, which evolved into Apple's OS X, Linux has been, historically, primitive. In fact, the BSD based operating systems (FreeBSD, OpenBSD and NetBSD) have been, through much of Linux's existence, better designed and more stable. So it is unlikely that Journey OS is going to take the world by storm because it is "way cool".

This is not the appropriate place to publish a long rant on software documentation. Such a rant can be found elsewhere on these web pages. But it is probably worth quoting something I read on the web page of a computer science professor at Carnegie Mellon University. This professor wrote that if you could not write well, don't bother applying for a graduate student position in his group. It does not matter how brilliant you are, he wrote, if no one understands what you have done or why you did it.

Some University projects may suffer from the opposite problem of projects like Journey OS: they publish lots of paper and don't build anything that works. This does not change the fact that software source code is not a medium for communicating ideas to humans. Programming languages are designed to represent algorithms in a form that can be efficiently compiled for execution on computer hardware. Anyone who claims that a large body of software is "self documenting" either can't write clearly or is simply lazy. People who do not document their source code are pushing work that they should have done onto someone else who will have to maintain the code at a later time. These poor souls will have the unenviable job of trying to understand a large body of undocumented source code so that it can be fixed or modified. In the case of the Journey OS, it seems likely that most people will not bother to wade through some obscure project whose motives and design are unstated. Regardless of how cool and brilliant it is.
Statistical Techniques in Language Translation: Franz Josef Och's web site at the University of Southern California's Information Sciences Institute

See also my web page on natural language processing information extraction.
Scott R. Ladd's Coyote Gulch Web site

Scott Ladd is a software engineer and author of a number of books including books on C++ and genetic algorithms. His web site has some interesting material, including some benchmarks that compare C++, Fortran and Java (Linux Number Crunching). Although this remains controversial for some people, it was no surprise to me that Scott concludes that Java is a poor choice for numerical simulation.
Large Limits to Software Estimation (PDF) by J.P. Lewis, July 2001

Large Limits to Software Estimation is an interesting paper. The paper shows that there are theoretical limits to estimating software complexity. Since software complexity cannot be exactly estimated, schedules for constructing software are sure to be off.

One has to be careful with mathematical proofs like this. They can be both true and false in practice. For example, perfect register allocation is impossible to solve in a reasonable amount of time (e.g., the time in which a compiler should compile a piece of code). So on one hand we can say that optimal register allocation is an intractable problem. But a solution that is 80% of the perfect solution will be good enough, in practice. So if we could estimate the complexity of software with 80% accuracy, this would be good enough in practice. Of course it's not clear that we can do this and the history of software development projects does not encourage optimism.

This article was discussed at length on Slashdot. I did not actually see anyone address the content of J.P. Lewis' article. They simply blathered on about their own experience and how their methodology worked or did not work in estimating software projects. Perhaps this is part of the problem in software estimation too.

J.P. Lewis' paper caused a stir in the software engineering community, probably as a result of its clarity and its negative implications. A subsequent paper, elaborating on the July 2001 paper, can be found here.
Are there limits to software estimation?, a response to J.P. Lewis' article by Charles Connell, published on slashdot.org, January 11, 2001

One of the points raised by Lewis in his original article is that we are more likely to be right in our estimates for the time to complete a project if we have implemented similar software before. For software where we have no previous experience, the estimates become doubtful.

Extensive design and implementation documentation is, to some degree, a dry run for the software implementation. If this documentation is complete the author has, at least in thought, completed a prototype. As a result, the time estimates are likely to be more accurate.

I like Lewis' paper and I think he has some valid points. I certainly prefer Lewis' approach to the hype and fuzzy thinking in some of the software engineering literature (e.g., if methodology X is followed all of the problems previously encountered in large software projects will be avoided). But as Charles Connell points out in his critique, software estimation is not aimed at finding a perfect time estimate:

The real-world problem of software estimation is much less strict than Lewis states. We are just trying to get somewhere close a reasonable percentage of the time!

One of the points that Lewis raises is that the process tends to be inherently subjective and claims at objective methodologies are likely to be wrong.

Complexity, in terms of a software project, is not reducible to Kolmogorov complexity. For example, I have been working on wavelet algorithms for a year. The actual algorithms, once they are developed, are relatively small and simple (a.k.a. elegant). These algorithms should have a relatively low Kolmogorov complexity. In fact, they can be described largely using matrices, which ties directly into the Kolmogorov description.

The ideas behind wavelet algorithms are complex. In particular, some wavelet algorithms are only reversible for infinite data sets. In the case of finite data sets, errors are introduced at the edges of the data. Techniques like Gram-Schmidt orthogonalization have been proposed to deal with this problem, but this technique does not always work in practice. Other wavelet algorithms (e.g., some lifting scheme algorithms) do not have these problems, but developing the wavelet and the scaling functions takes a lot of insight.

As these issues demonstrate, Kolmogorov complexity will provide no measure for the "conceptual complexity" behind an algorithm. Yet conceptual complexity is of critical importance if the task at hand is developing software for wavelet compression of images.
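
The point that the finished algorithm is small even when the ideas behind it are not can be seen in the simplest lifting scheme wavelet, the Haar transform. The sketch below is an illustration, not the wavelet code discussed above. It assumes integer input of even length and uses the integer (floor division) form of the predict and update steps so that reconstruction is exact; the function names are hypothetical.

```python
def haar_lift_forward(signal):
    """One level of the Haar wavelet transform written as lifting steps."""
    if len(signal) % 2 != 0:
        raise ValueError("this sketch needs an even number of samples")
    evens = signal[0::2]
    odds = signal[1::2]
    # Predict: each odd sample is predicted by its even neighbor; keep the error.
    details = [odd - even for even, odd in zip(evens, odds)]
    # Update: adjust the even samples so they carry the (floored) pair average.
    smooth = [even + detail // 2 for even, detail in zip(evens, details)]
    return smooth, details


def haar_lift_inverse(smooth, details):
    """Undo the lifting steps in reverse order with the signs flipped."""
    evens = [s - d // 2 for s, d in zip(smooth, details)]
    odds = [d + even for even, d in zip(evens, details)]
    signal = []
    for even, odd in zip(evens, odds):
        signal.extend((even, odd))
    return signal


data = [5, 7, 3, 0, 9, 9, -4, 1]
smooth, details = haar_lift_forward(data)
restored = haar_lift_inverse(smooth, details)
```

Because the Haar steps only look within each pair of samples, nothing special has to happen at the ends of a finite data set, and the inverse is obtained mechanically by running the steps backwards. Wavelets with longer predict and update steps need more care at the edges, and choosing those steps is where the insight goes.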
Fast Algorithms for Sorting and Searching Strings by Jon L. Bentley and Robert Sedgewick (in pdf format), January, 1997

The related web page on string algorithms includes the article Ternary Search Trees (also by Bentley and Sedgewick)
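
For readers who have not met the data structure, a ternary search tree stores one character per node with three children: lower, equal and higher. The following is a minimal sketch of insertion and lookup written from the general description of the structure. It is not the authors' code, and the class and attribute names are hypothetical.

```python
class Node:
    __slots__ = ("char", "lo", "eq", "hi", "is_end")

    def __init__(self, char):
        self.char = char
        self.lo = self.eq = self.hi = None
        self.is_end = False   # True if a stored word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if not word:
            raise ValueError("empty strings are not supported in this sketch")
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.char:
            node.lo = self._insert(node.lo, word, i)        # same character, left subtree
        elif ch > node.char:
            node.hi = self._insert(node.hi, word, i)        # same character, right subtree
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)    # matched: move to next character
        else:
            node.is_end = True
        return node

    def __contains__(self, word):
        if not word:
            return False
        node, i = self.root, 0
        while node is not None:
            ch = word[i]
            if ch < node.char:
                node = node.lo
            elif ch > node.char:
                node = node.hi
            elif i + 1 < len(word):
                node = node.eq
                i += 1
            else:
                return node.is_end
        return False


tree = TernarySearchTree()
for word in ("cat", "cap", "car", "dog"):
    tree.insert(word)
```

A lookup such as "cap" in tree compares one character at a time, so words sharing a prefix share nodes along the equal links. This sketch does no balancing; insertion order determines the shape of the tree.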
Cotse Security

Cotse Security is a web site run by Stephen K. Gielda, who does computer security consulting work. Stephen's web site discusses issues concerning computer security, privacy, anonymity and freedom of expression. As with many other people, Stephen has found that there tends to be a conflict between anonymity and accountability.
Developer's Resource Guide. This site contains articles and links for software and Web content developers.
ENCORE! The Hitch-Hiker's Guide to Evolutionary Computation. This web page is written and compiled by Jörg Heitkötter, whose name can't easily be rendered in the English character set. He goes by Joke, presumably because he's tired of people messing up the spelling of his name.
John Walker's Index Librorum Liberorum. John Walker was the founder and chief software engineer at Autodesk, the developers of AutoCAD and 3D Studio. His home page contains a collection of his writings (including his 1994 history of Autodesk and The Hacker's Diet) and various interesting and useful pieces of software. When I grow up I want to be John Walker (e.g., retired with a large powerful computer network, still developing software). John is a fan of Lord Kelvin.

Art and Computer Science

Complexification.net: Gallery of Computation

This web page showcases the work of Jared Tarbell. Jared's work is truly the intersection of art and computer science. He seems to use as a starting point a computer algorithm, from which he develops striking dynamically evolving art.

I first saw Jared's work Substrate running as a screen saver on a colleague's computer. Substrate looks like a cross between a city generated by a fractal or cellular automaton and impressionist art.
- Levitated Design and Code
  
  This is Jared Tarbell's studio/company. This has links to Jared's publications and more of his images.

Software

This section includes links to software libraries, software frameworks, system tools (e.g., compilers) and software that I find interesting from a computer science perspective.

Boost C++ Library (boost.org)

The Boost library publishes a set of library functions (in source form) that are compatible with the C++ Standard Template Library (STL). There are some interesting functions published in the Boost library. These include classes to support graphs and graph manipulation. However, unlike STL, there does not seem to be any guiding direction for what is included. For example, the math library section includes quaternion functions, which are used by people doing 3-D programming and tracking applications. This is cool stuff, but certainly not as generally useful as the string class or the vector template in the C++ STL.
POSIX Threads
- Pthreads Win32: Open Source POSIX Threads for Win32
  
  POSIX Threads (known as Pthreads) have kernel level support on several UNIX systems and provide a platform independent way to implement threading. Pthreads have no native support on Windows (Win32). The Pthreads Win32 package from RedHat provides a Pthreads interface to the Win32 native threads.
  
  The Boost library also has a threads package. I can't see why anyone would use this package if Pthreads is available. Pthreads is well documented and is a POSIX standard. This means that your code is more likely to be portable between platforms.
- POSIX Threads Programming: a tutorial from the Lawrence Livermore National Laboratory. This is part of the Livermore Computing Training. There are additional tutorials on Python, OpenMP, MPI and other topics.
OpenTop

OpenTop is an implementation of some of the core components of the Java class library, for C++. Compared to Java, C++ has a fairly limited class library (consisting of at most the Standard Template Library, Boost and a few others). OpenTop provides many of the core Java classes, which is a big step forward. An added advantage is that these classes will also be familiar to any Java programmer.

OpenTop is available as GPLed Open Source and as a commercial product (presumably the commercial product is a different source base).
Savannah, at nongnu.org

This appears to be a Free Software project repository for projects that are not an official part of the GNU Project. This software is described as "Free Software" for free operating systems (i.e., not Windows).
Digital Mars

Digital Mars is Walter Bright's company and publishing site for his compilers. Walter was the author of the Zortech C and later C++ compilers. The Zortech C++ compiler was one of the first C++ compilers available for the IBM PC (this was the version of C++ based on The Annotated C++ Reference Manual by Ellis and Stroustrup). The Zortech compiler was also one of the most reasonably priced and was the first C++ compiler I purchased. Zortech was purchased by Symantec, before they got into anti-virus software. Walter apparently also wrote the Symantec Java compiler.

The Digital Mars web site publishes compilers for Win32 and DOS. The Digital Mars compilers and runtime libraries are available on CD-ROM for around $30, which as far as I'm concerned is just about free. You can also download the compilers if the CD is too expensive for you (go on, buy the CD).

There are also some interesting libraries published on the site, including the C++ STL and Boehm's garbage collector ported for the Digital Mars compiler/runtime environment.

Walter Bright continues an amazingly productive and innovative career. He has designed the D Programming Language. He has apparently written a compiler for D as well. As the name suggests, D is meant to be a successor to C/C++. Unlike C++, D does not have lots of compatibility baggage.

Walter has not only written a number of optimizing compilers for various languages, he has also written a multiplayer strategy game called Empire.
BrookGPU: Brook for GPUs is a compiler and runtime implementation of the Brook stream program language for modern graphics hardware.

Modern graphics processors usually have at least four floating point arithmetic units. This makes graphics processors (GPUs) the fastest numeric engines available for microprocessor prices. The BrookGPU project is based at Stanford and is sponsored in part by DARPA.
Mono: a Free Software implementation of .NET

The Mono project is an attempt to implement versions of Microsoft's standardized software components on Linux. Currently these are the C# language and the Microsoft CLI (Common Language Infrastructure). Other, non-standardized components will be implemented as well.

I have mixed feelings about the Mono project. The C# language and the .NET framework can be seen as yet another attempt by Microsoft to control the computing platform.

My perception, at the time this was written, is that .NET is losing ground to Enterprise Java. Enterprise Java is not simply Sun Microsystems, but includes a number of other companies like IBM, Oracle and Weblogic. Potentially this gives enterprise Java a development base that is so large that even Microsoft cannot compete with it. One advantage of enterprise Java is that it runs on both Windows and Linux. Portability and the large developer and user base make enterprise Java a huge threat to Microsoft's attempt to dominate enterprise software.

The Mono project brings portability to .NET, removing one of the advantages of enterprise Java. To the extent that Mono is usable (currently an open question), this could help Microsoft gain acceptance of .NET. Mono is an open source project and most developers are donating their development time. To the extent that developing Mono helps Microsoft, developers have to ask themselves whether this is how they want to spend their free time.

Some writers have pointed out that a successful Mono project which becomes widely adopted on Linux could be destroyed by Microsoft, threatening any project that relied on Mono. See Mono-culture and the .NETwork effect posted on Librenix, October 13, 2003.
The Hoard Memory allocator

From www.hoard.org:

Hoard is a fast, scalable and memory-efficient allocator for multiprocessors. Hoard solves the heap contention problem caused when multiple threads call dynamic memory allocation functions like malloc() and free() (or new and delete). Hoard can dramatically improve the performance of multithreaded programs running on multiprocessors.
Valgrind: an open-source memory debugger for x86-GNU/Linux

I have implemented a C++ reference counted String container. As it turns out, this software exposes much of the complexity of C++. You can only assure proper function with a fairly large test suite and with a software tool to verify memory use (this is why people like Java).

I've used Purify, a commercial software tool that runs on the Sun Microsystems workstations, for verifying memory use in the past. For a cave based software developer like me Purify has a substantial license fee. This license fee can definitely be justified for commercial software, but it is harder to justify for software that I give away. Purify was originally developed for the Sun platform and has a good reputation on this system. At least at the time of this writing, Purify's reputation is not as strong on other systems, especially Linux.

Valgrind is an excellent open source alternative to Purify. Object code that is processed by Valgrind does not seem to suffer a huge performance penalty. On Linux I was able to use Valgrind to find some reference before definition errors in my String container test suite. The results produced by Valgrind are not as easy to understand as those produced by Purify. Also, Valgrind does not have Purify's nice GUI interface.
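
For readers who have not seen the technique, here is a minimal sketch of the bookkeeping behind a reference counted, copy-on-write string. It is written in Python for brevity and is not the C++ String container discussed above; the names `SharedBuffer`, `RCString`, `copy` and `set_char` are made up for illustration. Python manages memory itself, so the sketch only shows the counting logic. In C++ a mistake in this counting turns into a leak or a reference to freed memory, which is what tools like Valgrind and Purify are used to catch.

```python
class SharedBuffer:
    """Character storage shared by one or more strings (hypothetical name)."""

    def __init__(self, chars):
        self.chars = list(chars)  # private copy of the characters
        self.refs = 1             # number of RCString objects using this buffer


class RCString:
    """A string that shares its buffer until one copy is modified."""

    def __init__(self, text=""):
        self._buf = SharedBuffer(text)

    def copy(self):
        # A copy is cheap: share the buffer and bump the reference count.
        other = RCString.__new__(RCString)
        other._buf = self._buf
        self._buf.refs += 1
        return other

    def set_char(self, index, ch):
        # Copy-on-write: detach from the shared buffer before modifying it.
        if self._buf.refs > 1:
            self._buf.refs -= 1
            self._buf = SharedBuffer(self._buf.chars)
        self._buf.chars[index] = ch

    def __str__(self):
        return "".join(self._buf.chars)


a = RCString("cave")
b = a.copy()          # a and b share one buffer, refs == 2
b.set_char(0, "w")    # b gets its own buffer; a is untouched
print(a, b)           # cave wave
```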
Commercial C++ Memory Debugging Tools

The complexity of some C++ code means that it can be difficult to tell whether there are memory errors. Memory errors take several forms, including references to memory that has been deallocated and failure to free allocated memory. In addition to the open source Valgrind tool mentioned above, there are a few commercial tools. One of the first commercial tools to provide sophisticated memory reference debugging was Purify, which is currently sold by IBM/Rational.

As noted above, Purify is an excellent tool and is very easy to use. I have used it to verify a number of large software systems on Sun's Solaris version of UNIX. From my point of view, the problem with Purify is its cost, which at the time of this writing is over $1,000 for a single license.

Parasoft sells a tool called Insure++, which does many of the same things that Purify does. However, like Purify, the Insure++ software tool is expensive (I've seen quotes around $1,200 per license).

A company named Software Verification sells a tool called Memory Validator. This is more affordable. At the time of this writing their web site states that it is available at an "introductory price" of $299 and is usually priced at $399. The authors of Memory Validator discuss it in Software Verification's Memory Validator.
TOM: A Pattern Matching Compiler for Multiple Target Languages

TOM is a tool for doing pattern matching on trees, for example, the abstract syntax trees generated by a compiler. TOM has also been used for pattern matching in rule based systems. This last application particularly caught my eye. Perhaps it is possible to use TOM for pattern matching in information extraction applications in natural language processing.
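
To give a flavor of what pattern matching on trees means, here is a minimal sketch in Python. It does not use TOM or TOM's syntax. The tuple encoding of the tree, the `?x` convention for pattern variables and the function names `match` and `simplify` are all assumptions made for this illustration.

```python
def match(pattern, tree, bindings=None):
    """Return a dict of variable bindings if tree matches pattern, else None."""
    if bindings is None:
        bindings = {}
    # A string starting with "?" is a pattern variable that matches any subtree.
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == tree else None
        extended = dict(bindings)
        extended[pattern] = tree
        return extended
    # Interior nodes are tuples of the form (operator, child, child, ...).
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    # Leaves (operator names, identifiers, constants) must be equal.
    return bindings if pattern == tree else None


def simplify(tree):
    """Rewrite x + 0 to x and x * 1 to x, working from the leaves up."""
    if isinstance(tree, tuple):
        tree = (tree[0],) + tuple(simplify(child) for child in tree[1:])
    for pattern in (("+", "?x", 0), ("*", "?x", 1)):
        found = match(pattern, tree)
        if found is not None:
            return found["?x"]
    return tree


ast = ("+", ("*", "a", 1), 0)   # the expression (a * 1) + 0
print(simplify(ast))            # a
```

A rule based system follows the same shape: each rule is a pattern plus an action, and the matcher decides which rules apply to a given tree.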
Fnorb: An Object Request Broker for Python

As I write this, XML technologies seem to be all the rage. Or at least they seem like a disease that can't be eradicated. Before there were XML Schemas (XSDs), XSLT and XPath, there was IDL, CORBA and related object marshalling and unmarshalling technologies. Like XML, these offered platform independent ways to distribute data. Fnorb is an Object Request Broker platform for Python.
Zope

Zope is yet another Web application framework. The web site humbly describes Zope as:

Zope is a unique software system: a high-performance application server, a web server, and a content management system. Straight out of the box, it is a complete and self-contained solution, that includes a robust, scalable object database, web services architecture, and powerful programming capabilities.

Apparently Zope is closely integrated with the Python language, originally developed by Guido van Rossum. On a side note: although object databases have been around for some time, they do not seem to have made much of a dent in the relational database model.
- Plone
  
  Zope is referred to as a "content management framework". Apparently Plone builds on top of Zope to provide a "content management system". The main goals of CMS are to allow easy creation, publishing and retrieval of content to fit a business needs. (from the plone.org web pages). I guess if you already know what content management frameworks and content management systems are, this is probably all crystal clear. Obviously bearcave.com could use something to help beat it into a more organized whole.
The Visualization Toolkit: an open source toolkit for graphics visualization

This is a 2D and 3D graphics visualization toolkit, which seems to run on all major platforms. Unlike many open source projects, the Visualization Toolkit is extensively documented in a user guide and a text book.
Stegdetect

Steganography is the art and science of hiding information within other information. For example, hiding a message within an image. The Stegdetect program claims to find such messages. I think that the proper way to refer to this is that it claims to find some such messages.

One of the problems with steganography and internet images is that most images are encoded in .jpg (JPEG) format. In general JPEG compression uses lossy compression (information is lost when the image is compressed). The human eye usually does not notice the information that is lost, so the "lossyness" of the compression scheme does not affect the apparent image quality. However, if information is buried in the image "noise", JPEG compression may destroy the hidden message.

The problem with JPEG compression aside, one interesting way to hide information is to apply a wavelet noise filter to the image. This will remove noise from the image without affecting its quality. An encrypted message (which is similar to noise) can then be plugged into the "holes" left by the noise filter. If the message appears in a particular wavelet spectrum, it can be recovered by applying a wavelet transform. Assuming that such a technique worked (which is an open question), statistical tests might not work well in detecting such an image.
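
The point about lossy compression destroying a hidden message can be illustrated with a small Python sketch. It uses the simplest hiding scheme (one message bit in the least significant bit of each pixel), not the wavelet scheme speculated about above, and the `lossy` function is only a crude stand-in for JPEG: it rounds each pixel down to a multiple of 8. The pixel values and the function names are invented for the example.

```python
def embed(pixels, bits):
    """Hide one message bit in the least significant bit of each pixel."""
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out


def extract(pixels, count):
    """Read back the least significant bit of the first count pixels."""
    return [p & 1 for p in pixels[:count]]


def lossy(pixels, step=8):
    """Stand-in for lossy compression: round each pixel down to a multiple of step."""
    return [step * (p // step) for p in pixels]


pixels = [200, 13, 77, 54, 129, 90, 31, 240]   # made-up 8-bit gray values
message = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed(pixels, message)
print(extract(stego, len(message)))          # [1, 0, 1, 1, 0, 0, 1, 0]
print(extract(lossy(stego), len(message)))   # [0, 0, 0, 0, 0, 0, 0, 0]
```

No pixel changes by more than one gray level when the message is embedded, which is why the eye does not notice it. The same property makes the message fragile: anything that discards low-order detail discards the message too.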
Readerware

At the Bearcave we have a large library of books. Readerware supports book indexing and cataloging. I first found out about it at a bookstore in Barcelona (Hibernian Books) which used Readerware to track their inventory.
Misinterpretation, yet another excellent article by "Robert X. Cringely" on code obfuscation (as it applies to Java byte code, .NET code and native code).

The article mentions a small software company called PreEmptive Solutions, which makes obfuscation software for Java and .NET object code. They have developed a technique that they call Program State Code Protection which could be applied to native code as well.
- Arxan is another company involved in this area.
On February 17, 2004 I attended an unclassified talk at LLNL given by John Grosh, who is the associate Director for Advanced Computing, Information Systems Directorate at the Department of Defense. He was talking about the various DoD computing initiatives. One of these involved issues of code obfuscation to protect codes with national security importance. The thinking is that the US can no longer control the export of high performance computers, but it may be able to control the export of important software and the associated algorithms.

While DoD may be concerned with byte code based codes, my impression was that the primary area of concern was native (microprocessor) code. While there are several companies that sell "obfuscation" products for Java byte code, I have not heard of a company that does this for microprocessor instruction sets.

There are a number of interesting questions here. If you have highly optimized code for a modern processor, how well can you do at disassembling it back into the source language? I am not sure of the answer to this question. But at MasPar Computer Corp. we worked on a debugger for optimized code. It was a difficult problem and in many cases the debugger got confused. My impression is that the general feeling is that disassembly of machine code does not do very well.

Another problem, pointed out by one of my colleagues, involves embedded system code. Modern weapons platforms (e.g., planes, tanks, helicopters) have increasing amounts of embedded software. These platforms are captured every once in a while (like the signal intelligence plane captured three years ago by China). If the computer systems are not destroyed, the possessor of the platform can understand the system by examining the software. So DoD is probably concerned on a number of levels about protecting code.
GnuPG: the GNU Privacy Guard

Network Associates bought Phil Zimmermann's PGP and commercialized it. Apparently it never made money, perhaps because a free version existed in parallel. In early March Network Associates announced that they had been unable to find a buyer for their PGP division. They subsequently fired 18 members of the staff and announced that there would be no new development. GnuPG offers a complete replacement:

GnuPG is a complete and free replacement for PGP. Because it does not use the patented IDEA algorithm, it can be used without any restrictions.

GnuPG is, of course, open source.
CDex: CD-ROM to .wav, MPEG, etc

A great source of 1-D data for signal processing via wavelets or Fourier transforms is music stored on CDs. The problem is getting the music in digital form off the CD so it can be fed to signal processing code. CDex is an open source program that reads a track off a CD-ROM into a .wav file (which is more or less a pure bit-stream). It runs on Windows NT, which is the platform I use for developing a lot of my signal processing software. It apparently needs Adaptec's ASPI manager, whatever that is (I have not installed this software yet, but I do have an Adaptec UltraSCSI CD-ROM/Disk interface).
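
Once a track is in a .wav file, getting the samples into signal processing code is straightforward. The following is a minimal sketch using Python's standard `wave` and `struct` modules. It assumes 16-bit PCM data (the format used on audio CDs) and, so that it runs without a CD, it reads a short synthetic tone built in memory by the hypothetical helper `make_test_wav` rather than a real track.

```python
import io
import math
import struct
import wave


def read_wav(source):
    """Return (sample_rate, channels) where channels is a list of sample lists."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch handles only 16-bit PCM")
        channel_count = w.getnchannels()
        rate = w.getframerate()
        frames = w.getnframes()
        raw = w.readframes(frames)
    # Samples are little-endian signed 16-bit values, interleaved by channel.
    interleaved = struct.unpack("<%dh" % (frames * channel_count), raw)
    return rate, [list(interleaved[c::channel_count]) for c in range(channel_count)]


def make_test_wav(rate=8000, freq=440.0, count=800):
    """Build a small mono .wav file in memory (a stand-in for a ripped track)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(
            struct.pack("<h", int(10000 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(count)))
    buf.seek(0)
    return buf


rate, channels = read_wav(make_test_wav())
print(rate, len(channels), len(channels[0]))   # 8000 1 800
```

For a real file, pass the file name to `read_wav` instead of the in-memory buffer. A track ripped from a CD would normally have two channels at 44,100 samples per second.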
Backplane: An Open Source Distributed Database

The Backplane web page provides the following description:

The Backplane Open Source Database is a replicated, transactional, fault-tolerant relational database core. Currently supported on Linux and FreeBSD, Backplane is designed to run on a large number of small servers rather than a small number of large servers. With Backplane, it is possible to spread the database nodes widely, allowing database operations to work efficiently over WAN latencies while maintaining full transactional coherency across the entire replication group.

Backplane's native quorum-based replication makes it easy to increase the database capacity (by simply adding a new node), re-synch a crashed server, or take down multiple nodes for maintenance (such as an entire co-location facility) - all without affecting the database availability.
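
The general idea behind quorum-based replication can be sketched in a few lines of Python. This is a generic majority quorum, not Backplane's actual protocol, and the `Node` class and the `write` and `read` functions are invented for the illustration. A write succeeds only if a majority of the nodes are reachable, and a read consults a majority and takes the newest version it sees. Since any two majorities share at least one node, a read always sees the latest successful write, even when different nodes are down at different times.

```python
class Node:
    """One replica: a value plus the version number of the write that set it."""

    def __init__(self):
        self.up = True
        self.version = 0
        self.value = None


def quorum(nodes):
    """Smallest number of nodes that forms a majority."""
    return len(nodes) // 2 + 1


def write(nodes, value):
    """Apply a write to every reachable node; fail if they are not a majority."""
    live = [n for n in nodes if n.up]
    if len(live) < quorum(nodes):
        return False
    version = max(n.version for n in live) + 1
    for n in live:
        n.version, n.value = version, value
    return True


def read(nodes):
    """Return the newest value held by a majority of reachable nodes."""
    live = [n for n in nodes if n.up]
    if len(live) < quorum(nodes):
        raise RuntimeError("no quorum")
    return max(live, key=lambda n: n.version).value


nodes = [Node() for _ in range(5)]
write(nodes, "a")
nodes[0].up = nodes[1].up = False   # two nodes down for maintenance
write(nodes, "b")                   # still succeeds: 3 of 5 is a majority
nodes[0].up = True                  # a stale node comes back...
nodes[2].up = False                 # ...and an up-to-date one goes down
print(read(nodes))                  # b
```

A real system would also bring the stale node back up to date (the "re-synch" mentioned in the quote); the sketch leaves that out.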
Rekall: An Open Source Database Front End

Rekall is a database agnostic frontend which is implemented in Python. The GUI is implemented in Qt.

Rekall supports the creation of custom forms and data retrieval displays. These forms are aimed at end users, rather than database developers. This seems different from tools like TOAD or TORA which are database frontends for developers. Rekall includes Python debug capability for forms and table display development.
PostgreSQL Manager from Electronic Microsystems

While I have studied database system architecture and theory, I only started using databases relatively recently. When I first started using Oracle, I did not know that database management and administration tools existed. I used Oracle's shell interface (sql*plus). Then I discovered the TOAD Database Manager for Oracle. TOAD provides a very nice interface for entering SQL statements. It provides an excellent schema browser which shows the database tables and the table structure. Life was much easier. Now I can't imagine living without a tool like this.

Most of the database management tools are aimed at commercial databases, especially Oracle. I was very happy to Google into the PostgreSQL Manager. It runs on both Windows and Linux and has a very reasonable license cost.

The only problem is that I've been unable to get the evaluation version to connect to PostgreSQL running on my local Linux system...

The company has no contact address. When I first sent mail to support@pgsqlmanager.com I got a bounce from a Russian email address:
```
... while talking to ems.ems.ru.:
>>> RCPT To:
<<< 550 : Sender address rejected: undeliverable address: host shell2.webquarry.com[67.131.250.77] said: 553 5.3.0 
550 5.1.1 support@pgsqlmanager.com... User unknown
```
I've since submitted a problem report through their support web page. We'll see what comes of it.
SQuirreL SQL Client

SQuirreL SQL Client is a graphical Java program that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands etc.
Icarus Verilog HDL Simulator and Synthesis

This is a free (as in beer) Verilog simulator and synthesis tool. This software has been developed by Stephen Williams and he holds the copyright to the software (e.g., it is not GPLed).

From poking around the web pages I'm not sure how much of the Verilog standard is supported. Verilog is a huge language, with complex simulation semantics. Implementing a full simulator for Verilog is a major task. Implementing Verilog logic synthesis on top of this seems like a task that is much too large for a single person, even if that person has a trust fund and does not have to work for a living (which is not the case for Mr. Williams).
OVM Project

To quote from the OVM Project main web page:

The goal of the Ovm project is to develop an open source framework for building programming language runtime systems. Ovm is a DARPA funded collaborative effort between Purdue University, SUNY Oswego, University of Maryland, and DLTech. The current emphasis for Ovm is to produce a Java Virtual Machine compliant with the Real-Time Specification for Java.

Quickly glancing through this site and the, at the time of this writing, sparse publications, it is not clear to me how this project attracted DARPA funding. DARPA projects are usually either aimed at pushing the state of the art or at providing technology that is of interest to the US military. This project does not seem to meet either of these criteria. There are some existing open source runtime systems for Java. There does not seem to be huge demand for building a myriad of different runtime systems.
Mutt E-Mail Client, an email client with an Elm-like interface.

While I am a follower of the one true religion (Emacs, you dolt) and use Emacs Rmail, my beloved, while also an Emacs user, uses Elm. I'm planning on moving to a new ISP and I wanted to do what I could to make sure that my beloved has a familiar e-mail client. Mutt looks like it may "fill the bill".
Disk Backup Software

Backing up a hard disk to preserve data in the event of a system crash has always been a tedious process. For years the best alternative was DAT tape. However, DAT tape drives were expensive as were the tapes themselves. An alternative was the "Travan" tape drives. My experience with these tape drives was not good. They seemed to be sensitive to dust and I had one tape drive fail after only limited use. Even when the Travan tape drives worked, they were very slow.

The availability of write once DVD burners has been a great step forward. A DVD can hold about 4.7 gigabytes. Unfortunately many of the DVD data applications simply back up directories and files. This is not much help in the event of a disk crash where it would be nice to restore the disk from a disk image stored on one or more DVDs.

Slashdot had an interesting, and for me, timely discussion on disk imaging software. See Experiences w/Drive Imaging Software?

I wish I had seen this discussion before I purchased Paragon Software's Drive Backup application. This application did not recognize my Pioneer Electronics DVD-104 DVD burner. Paragon is a German company and there is no phone contact in the United States. The only way to contact Paragon is via their support e-mail address. They ignored the first two e-mails I sent. Fortunately I paid via a credit card. When I sent Paragon e-mail telling them that I was going to contact my credit card company and dispute the charge, I finally got a reply. In the e-mail that they were replying to I explained that my computer was running Windows XP and I was having problems with the Pioneer DVD-104. I got an obscure note back suggesting that I download the Linux version.

Disk backup software is the last line of defense and the reliability of this software is critical. My experience with Paragon did not lead me to believe that I could trust their software. I have removed this software from my system and written my credit card company asking them to back out the charge for the Paragon software (later: eventually, I did manage to get a refund). I will never purchase another Paragon Software product.

A better alternative seems to be the Symantec Ghost backup program. A number of posters on Slashdot used this program. Apparently some versions of this software had problems with Microsoft copy protection. From looking at the Microsoft knowledge base article on these problems it appears that later versions (e.g., the 2003 version) may have corrected these problems. I currently have this on order. I'll update this section when I receive it and have had a chance to use it.

Computer Hardware

Net Express, the hardware vendor for The Bear Cave

At the Bear Cave we do not buy off-the-shelf computer systems made by companies like HP, IBM or Dell (the exception is laptop systems). We place a high premium on reliability and performance. For example, we have been using SCSI disks for many years because they tend to be more reliable and are faster. Companies like HP and Dell compete heavily on price. The systems they sell tend to cut corners to save money, which results in less reliability and lower performance. For example, a system with a high speed processor can be crippled if it is not built with high performance memory.

One of the best system integrators we've encountered is Net Express. They have a vast knowledge of the currently available hardware. You can tell them the characteristics of the system that are important to you and they will build a system to these specifications.

When it came time for the Bear Cave to purchase a Linux system, Net Express was invaluable. Operating system installation and configuration are included with the system purchase. Net Express understands Linux construction and configuration. They will make sure that all the proper drivers are installed for the hardware you've chosen. We've been extremely happy with the system we purchased and the support we've gotten. In short, Net Express Rocks!
The Shuttle Cube

The Shuttle Cube is a very compact, fully functional computer system. At the time this was written it is largely for people who are willing to build their own system or for system integrators. A fully configured system (1Gb of memory, hard disk, DVD/CD ROM, processor) runs about $1,000 US. The catch is that you have to put it together. Parts are available from electronic retailers like Fry's. Assembled Shuttle Cube systems are available from The Book PC. Fully configured systems run about $1,400 (e.g., latest AMD processor, 1Gb RAM, DVD-R/W, 80Gb 7,200 RPM disk, etc...). Note that this is without a monitor/LCD display.

Programming Languages

Ruby

Ruby is yet another scripting language, joining the world's most popular write only language Perl (i.e., Perl is difficult to read), Python and PHP, among others. Perl has been growing over the years like some demented spawn from the deep and has become "object oriented" (if such an idea can really be applied to Perl). Ruby was designed from the beginning to be object oriented.

Major programming languages have converged around C style syntax (e.g., C++, Java and C#). One view of this was that language design was largely dead. This certainly does not seem to be the case with interpreted or scripting languages.
The F# Programming Language

The F# programming language is an implementation of ML targeted at Microsoft's .NET programming environment (F# has been implemented by Evil Empire, I mean, Microsoft Research). Despite the research nature of this project, it does not appear to be "open source". Given Microsoft's claim that open source/Free Software is akin to a communist plot (i.e., anti-capitalist), I suppose that this is not surprising.

The Microsoft F# language appears to be a new implementation of Caml, which was developed by a project at INRIA in France (INRIA is the French National Institute for Research in Computer Science and Control). I assume that Microsoft's version is called F# rather than simply Caml targeted at .NET because it was not developed from the INRIA source base. As an alternative to the closed source Microsoft version, which is targeted at Microsoft's .NET (which as far as I know only runs on the Windows platform), there is also INRIA's Caml. I have not looked at the distribution license for Caml, but it appears to be distributed in both source and object releases. Caml also has a consortium of users.

Here are some additional F# references:
- Inside Microsoft's New F# Language, May 22, 2003 by Robyn Peterson, ExtremeTech
The Eiffel Programming Language

Although C++ is powerful, it takes years to learn. It is also easy to make mistakes in C++. For example, it is dangerous to ship any product implemented in C++ which has not been checked with a memory use verification tool like Purify (from IBM) or the open source Valgrind. The standard C++ class libraries (e.g., the Standard Template Library) are tiny compared to Java's huge class library.

The Java language is not as powerful, but Java supports garbage collection, so memory reference errors are largely limited to null pointer references. The Java class library can save huge amounts of time. However, Java is slow.

Some people have called the Eiffel programming Language the best professional language available. Ignoring the issues of class library support (a not inconsiderable issue), Eiffel seems like a promising alternative to Java and C++. So where can you get development tools for Eiffel?
- SmartEiffel (GNU Eiffel)
  
  SmartEiffel currently supports an Eiffel to C compiler and an Eiffel to Java byte-code compiler, along with an Eiffel debugger. This project is apparently sponsored by the French state sponsored research organization INRIA. This software is published under the Free Software Foundation Copyleft.
  
  The fact that Eiffel can be compiled into Java byte-code seems to imply that the vast Java library would be available from Eiffel.
- Eiffel Software.
  
  Eiffel Software was founded by the inventor of Eiffel, Bertrand Meyer, so one might think that this would be the natural source for Eiffel development tools. At least in my case this is not true. Eiffel Software has apparently been taken over by marketing people and managers without recent software development experience. You can't find out the cost of an EiffelStudio license without filling out a silly form. After filling out the form I gasped at the license cost. As of July 2003, Windows and Linux licenses for EiffelStudio were listed at $4,799. An "Enterprise" UNIX license was $7,999. In contrast, a Microsoft Visual C++ .NET license at full price is under $1,000 (which I think is expensive).
  
  I'm a software engineer. I like to be paid for my work. I'm sure that others do too. So I don't find paying license fees for software unreasonable. But there is no way I can afford these fees. I have not purchased a copy of MatLab because I could not justify its cost and MatLab is relatively cheap compared to Eiffel.
- Object Tools
  
  The Object Tools web site looks like it was built by developers for developers. Object Tools sells Eiffel development tools for all the major platforms and their license fees are much more affordable. As of July 2003 a "Personal" Eiffel license was $100 and a "Professional" Eiffel license was $450. These are license fees I can afford. The Object Tools compiler produces COFF (Common Object File Format), which can be linked with C.
I was only able to find the Eiffel software tool vendors listed above. So the language does not seem to have taken the world by storm. However Eiffel can be linked with C. Eiffel can also be called via the Java Native Interface for native compiled Eiffel or directly in the case of the Java byte-code compiled Eiffel. As long as reasonably priced development tools are available, it may not matter how popular Eiffel is.
AMPL: A Modeling Language for Mathematical Programming

AMPL is a comprehensive and powerful algebraic modeling language for linear and nonlinear optimization problems, in discrete or continuous variables.

Developed at Bell Laboratories, AMPL lets you use common notation and familiar concepts to formulate optimization models and examine solutions, while the computer manages communication with an appropriate solver.

AMPL's flexibility and convenience render it ideal for rapid prototyping and model development, while its speed and control options make it an especially efficient choice for repeated production runs.

From the AMPL web site

The OpenGL Shading Language

The link above is on the www.opengl.org web site. The abstract for the OpenGL Shading Language draft specification summarizes the objectives of the shading language as:

The recent trend in graphics hardware has been to replace fixed functionality with programmability in areas that have grown exceedingly complex (e.g., vertex processing and fragment processing). The OpenGL Shading Language has been designed to allow application programmers to express the processing that occurs at those programmable points of the OpenGL pipeline. Independently compilable units that are written in this language are called shaders. A program is a set of shaders that are compiled and linked together.

The aim of this document is to thoroughly specify the programming language. The OpenGL entry points that are used to manipulate and communicate with programs and shaders are defined separately from this language specification.

The OpenGL Shading Language is based on ANSI C and many of the features have been retained except when they conflict with performance or ease of implementation. C has been extended with vector and matrix types (with hardware based qualifiers) to make it more concise for the typical operations carried out in 3D graphics. Some mechanisms from C++ have also been borrowed, such as overloading functions based on argument types, and ability to declare variables where they are first needed instead of at the beginning of blocks.

The OpenGL Shading Language seems to be in direct competition with the Cg language developed by NVidia and Microsoft (see The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics by Randima Fernando and Mark J. Kilgard, Addison Wesley, 2003).
Steve Ramsay's Guide to Regular Expressions, Electronic Text Center, University of Virginia

I've been using UNIX longer than I'd like to admit (let's just say that the first UNIX system I used did not have virtual memory). In all this time I've never learned to love "regular expression" notation which is used to define text matching sequences for software like egrep and Perl. While Steve Ramsay's Guide to Regular Expressions has not persuaded me to love regular expressions, it provides an excellent reference guide.
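
As a reminder of what the notation looks like, here is a small example using Python's standard `re` module, whose syntax is close to that of egrep and Perl. The three "tool output" lines are invented for the example. The pattern combines the pieces that come up most often: a character class with repetition (`\d+`), grouping with alternation (`(error|warning)`) and an optional character (`s?`).

```python
import re

# Made-up lines of tool output to search through.
lines = [
    "valgrind: 3 errors from 2 contexts",
    "purify: no problems found",
    "insure++: 12 warnings",
]

# One or more digits, a space, then "error" or "warning", optionally plural.
pattern = re.compile(r"(\d+) (error|warning)s?")

hits = []
for line in lines:
    found = pattern.search(line)
    if found:
        # group(1) is the count, group(2) is which word matched.
        hits.append((int(found.group(1)), found.group(2)))

print(hits)   # [(3, 'error'), (12, 'warning')]
```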

C++ Development and Debugging

Let's face it: for many applications C++ sucks compared to Java.

The class libraries that are available for the Java programming language from Sun Microsystems, the Apache project and other sources are the largest collection of reusable software in existence. The widely used Eclipse development environment supports a common development platform on both Windows and Linux. I will not go into Eclipse's features and virtues here, except to say that Eclipse is an excellent environment for creating, debugging and managing large software projects.

Then there's C++. The language is a crufty offshoot of C from the early days of object oriented programming languages. When it comes to class libraries that are available without a license fee, it pales beside Java. There is the Standard Template Library and the Boost library and perhaps a few others. There are, however, many situations where Java is not an appropriate choice as an implementation language and C++ is the only reasonable choice. In this case, what tools exist for developing large C++ applications and for understanding large C++ software sources?

Microsoft's Visual C++ is an excellent development and debugging platform as long as you only want to develop on Windows. Eclipse has a C++ environment, but at the time I wrote this it was not as good as the Java environment and was buggy.

There was an informative short post on Slashdot answering a question about what tools to use for making sense of a large C++ source base.

I had a pile of C++ dropped in my lap 2 years ago by Richard Steiner

My main tool for figuring it all out was to use exuberant ctags to create a tags file, and Nedit to navigate through the source under Solaris, with a little grep thrown in. I also used gdb with the DDD front-end to do a little real-time snooping.

I've since added both cscope and freescope, as well as the old Red Hat Source Navigator for good measure.

Of course Richard is wrong about Nedit. There is one true editor and its name is Emacs and Richard Stallman is its prophet.

Mathematics and Statistics Links

The GNU Scientific Library

The GNU Scientific Library (GSL) is an excellent, portable source for a fairly wide range of numerical software, including random number generators, basic statistics code and linear algebra codes. The GSL is available on UNIX platforms and a self-installing release is available for Windows. I have used the GSL random number functions for developing random walk generators, which I used to test my Hurst exponent estimation code.
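
A random walk generator of the kind mentioned above takes only a few lines. The sketch below is in Python and uses the standard `random` module rather than the GSL, so it shows the idea and not the GSL interface; the function name `random_walk` and its parameters are invented here. Each step adds an independent Gaussian value to the running total. A walk built this way has no long-term memory, so a Hurst exponent estimator run on it should report a value close to 0.5, which makes it a useful test case.

```python
import random


def random_walk(steps, seed=None, sigma=1.0):
    """Return a list of steps + 1 positions, starting at 0.0."""
    rng = random.Random(seed)   # private generator, so a seed gives a repeatable walk
    walk = [0.0]
    for _ in range(steps):
        # Each step is an independent draw from a Gaussian with mean 0.
        walk.append(walk[-1] + rng.gauss(0.0, sigma))
    return walk


walk = random_walk(1000, seed=42)
print(len(walk), walk[0])   # 1001 0.0
```

Fixing the seed matters for testing: the same seed reproduces the same walk, so a change in the estimator's output can be traced to the estimator and not to the data.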
MathType

HTML supports a few simple features for mathematical notation, for example, subscripts and superscripts. However, when it comes to even mildly complex equations like (s_i + s_{i+1})/2, HTML is pretty awkward. Someday there is supposed to be something called MathML, which is an HTML extension to support mathematical notation. At the time of this writing, it is not there yet.

I have used MathType to "type set" all of the equations in my wavelet web pages. The MathType package allows complex mathematical notation to be defined and the resulting equation saved as a GIF. I would have preferred JPEG, but GIF can be converted to JPEG by a number of tools.
Eric Weisstein's World of Mathematics, hosted by Wolfram Research.

My interest in digital signal processing, wavelets and image processing led to an interest in mathematics (which previously existed as a college graduation requirement).

Computer science is a relatively young field. The formal study of mathematics goes back over two thousand years and spans all human societies. Mathematics is a vastly larger area than computer science, although there is, obviously, some significant overlap. I am always in search of mathematics reference material, since I've found that many authors write for those who are already knowledgeable in the field and do not always define their terms. Eric Weisstein's World of Mathematics looks like a very useful reference.
Slashdot book review: Five Free (Online) Calculus Text Books by Ben Crowell, March 8, 2004
Sean's Applied Math Book by Sean Mauch

Sean Mauch is an applied math grad student at Caltech. Unlike many people in academia, Sean seems to have a strong interest in and talent for teaching. Of course this may mean that his academic career is in danger since teaching is not valued at most universities.

Agent Based Software and Modeling

Agent Building and Learning Environment (ABLE) from IBM

ABLE provides a set of Java "beans" which allow intelligent "agents" to be constructed. A fairly wide variety of agents are supplied (e.g., agents to read time series, access relational databases, perform neural net computation).
And speaking of neural nets...

Stuttgart Neural Network Simulator (SNNS)

SNNS includes a rather rich environment for building neural nets and similar learning systems. These include Kohonen maps (self-organizing maps), along with a wide variety of neural nets.
Agent-based computational economics at Iowa State University Department of Economics.

Agent based modeling of non-linear systems, like financial markets, has a natural attraction. The only problem with these models is that they can behave in unpredictable fashions. If you cannot predict your profit and loss bands in a model then the model is dangerous when it comes to trading real money, in real markets.
- How Economists Can Get ALife: Abbreviated Version
  
  This is another Iowa State University economics department web page.
Cougaar: Cognitive Agent Architecture

From the Cougaar index Web page:

Cougaar is a Java-based architecture for the construction of large-scale distributed agent-based applications. It is a product of two consecutive, multi-year DARPA research programs into large-scale agent systems spanning eight years of effort. The first program conclusively demonstrated the feasibility of using advanced agent-based technology to conduct rapid, large scale, distributed logistics planning and replanning. The second program is developing information technologies to enhance the survivability of these distributed agent-based systems operating in extremely chaotic environments. The resultant architecture, Cougaar, provides developers with a framework to implement large-scale distributed agent applications with minimal consideration for the underlying architecture and infrastructure. The Cougaar architecture uses the latest in agent-oriented component-based design and has a long list of powerful features.
D-OMAR: Distributed Operator Model Architecture

Apparently if you're building Agent based systems (and funded by DARPA) you've got to use that word architecture. D-OMAR is available in both Java and Lisp (OmarJ and OmarL, respectively). The OmarJ software is described as:

OmarJ is an agent development environment that provides tools for creating and managing systems of agents operating in a distributed computing environment. OmarJ provides most of the features of OmarL with significant enhancements, including a much improved external communication layer that uses Jini for inter-node communication, and the ability to break out of simulation mode and run agents in a non-time-controlled environment. Simulation mode uses a Java implementation of the OmarL event-based simulator.

Artificial Intelligence

The Cyc Artificial Intelligence System

Note: Cycorp as a commercial (or semi-commercial) entity has been replaced by Cyc Foundation and the OpenCyc (OpenCyc.org) open source project.

Since its inception in the 1950s, the field of AI has been characterized by massive overstatement and hubris. So I take the statements made by people in this field with a bit more than a grain of salt. Although many of the inflated claims and fantastically overambitious milestones have proven to be false, there has been solid, steady progress in Artificial Intelligence (AI). For example, IBM now has chess software running on their parallel processors that plays at the Grand Master level.

The Cyc system, produced by Doug Lenat and his colleagues at Cycorp in Austin, Texas, looks like it might be an AI milestone. I first read about Cyc in this LA Times article. Lenat observes that intelligent behavior is based on a vast storehouse of "common sense" information about the world. Until this common sense background is built up it does not make sense to attempt to build expert knowledge, since the expert knowledge will be misapplied. Cycorp has spent what they estimate as 500 person years building up such a knowledge base and its related infrastructure (e.g., input and internal knowledge representation languages and AI software to use the knowledge base). At the time I created this link, Lenat claimed that Cyc was near the point where it could start reading in text and building up its own knowledge base, although it would have to ask for clarification in some cases.

According to the Cycorp press release some of the Cyc software and knowledge base are being made publicly available:

Douglas B. Lenat Ph.D., founder and President of Cycorp, Inc. announced that a greatly expanded version of the Cyc Common Sense Knowledge Base will be made available in open access form under the name OpenCyc. In addition, Cycorp will, for the first time, provide the Cyc Inference Engine and a suite of tools for creating knowledge-based applications. OpenCyc 1.0 will be released on July 1, 2001

On the Web page discussing Cyc applications they describe how Cyc can selectively access on-line information sources like the Internet Movie Database or the CIA World Factbook. Of course the Web contains both relatively reliable sources, like these, and far less reliable ones. Humans frequently make errors of judgment, embracing complex conspiracy theories for conspiracies that do not exist. The common sense that Lenat discusses as the basis for Cyc is not quite so simple. If Cyc is at some point able to build its database from written material and it were turned loose on the Web, its knowledge base would be poisoned, mixing solid information with bizarre "facts" published by cranks.
Machines in the Myth: the state of artificial intelligence by DeAnn DeWitt, chipcenter.com, August 2001

This article discusses Cyc and provides a brief survey of artificial intelligence.
Artificial Stupidity, Part 1. The saga of Hugh Loebner and his search for an intelligent bot has almost everything: Sex, lawsuits and feuding computer scientists. There's only one thing missing: Smart machines. By John Sundman, Salon, February 26, 2003
Artificial stupidity, Part 2 Can chatterbots be as dumb as a box of hammers and still pass the Turing test? Go ask ALICE, she might know. By John Sundman, Salon, February 27, 2003.
John Sundman's web site wetmachine.com. This web site includes links to sample chapters of John Sundman's book Acts of the Apostles and Cheap Complex Devices.

The Loebner prize is a prize for, in essence, passing a version of the Turing Test (originally proposed by Alan Turing). Apparently the Loebner prize and the Turing Test are now considered anathema in the artificial intelligence community. These two articles discuss the rather eccentric Hugh Loebner, the prize and the artificial intelligence community.

Machine intelligence and the Turing Test, Technical Forum, IBM Systems Journal, Vol 41, No. 3, 2002. This web page contains two articles by artificial intelligence researchers, some of them quite famous (e.g., McCarthy and Minsky).
Why the future doesn't need us by Bill Joy, Wired Magazine, April 2000.

Bill Joy is one of the founders of Sun Microsystems and is pretty famous within the computer industry. Joy has been involved in some very impressive developments, which include the Java language and JINI. He has also been involved in some not so impressive developments, like the UNIX C shell. And Joy has been involved in some which are simply stupid, like his article, published in Wired, which warns of technological peril from artificial intelligence and nanotechnology. As the articles on the Loebner prize point out, AI has a long way to go before it becomes a threat:

Spend a few minutes chatting with even the best of the bots, and you will cease to be threatened by their imminent eclipsing of humanity. Their performance is, in Loebner's own word, gruesome.
John Sundman, Artificial stupidity, Part 2

Somewhere out on the far horizon there may lurk David Zindell's silicon God, an artificial intelligence of great power that is a threat to humanity. But we are no closer to achieving this than we are to building Zindell's diamond hulled starships that can cross between star systems in the "manifold".
AI Founder Blasts Modern Research by Mark Baard, Wired News, May 13, 2003

Among the reasons that artificial intelligence researchers do not like the Turing Test is that it does not necessarily demonstrate learning. All it does is provide a simulation of a person. The core challenge for AI is learning and the ability of the system to display "common sense". That is, to know that red spots on a car are an example of rust, not measles. This is the challenge that Cyc attempts to address.

This Wired News article discusses a speech that Marvin Minsky gave at Boston University criticizing current progress and approaches in AI. He is particularly critical of the autonomous robot work, which requires nuts and bolts work like mechanical design and soldering. Apparently Minsky feels that this is a waste of time compared to developing heuristic AI algorithms. Part of the problem, however, is that the US Dept. of Defense is willing to fund autonomous vehicle/robot work and funding influences the research that people actually do.
The Aims of Artificial Intelligence: A Science Fiction View Ian Watson, published on the IEEE Computer Society Web site.

This article is an overview of science fiction speculation on artificial intelligence. Ian Watson is a British science fiction writer. In this article he covers many of the classic speculations, from Frank Herbert's Destination: Void to The Matrix. Watson does not mention David Zindell's work, which includes some of my favorite speculation on AI and the paths that intelligences may seek to take. But Zindell is even less well known than Watson.

Source Code Documentation Generators

Most software engineers do not document their source code. This makes the reasoning behind the implementation of the source code difficult to understand and the code harder to maintain. I find it interesting that "open source" projects, on which many people work, do not place an emphasis on documentation. The coding standards for open software like Mozilla only mention documentation in passing (or at least this was the case last time I looked).

For the minority of software engineers who do document their code, documentation generators are great tools, at least in concept. Documentation generators read source code and produce HTML web pages. Here is a partial list of documentation generators:

javadoc.

Javadoc reads the source code for a set of Java classes and generates web pages. These are primarily useful for understanding the class interfaces. Javadoc is part of the standard Sun Microsystems Java release. See http://java.sun.com. For an example of Web pages generated with Javadoc, see my documentation for javad, a class file disassembler.
doxygen

The doxygen program is a documentation generator for C++. It allows source code to be included with the generated documentation, which makes it better for generating source code base documentation. Doxygen can be found at http://www.doxygen.org. An example of doxygen generated web pages for a SPAM filter can be found here.

Doxygen can use the dot program to generate cool diagrams of the software structure. Dot is part of AT&T Labs' Graphviz.
Literate Programming Tools

Donald Knuth, the author of the classic books in The Art of Computer Programming series, has written some articles on "Literate Programming". The idea is to join algorithms and documentation to produce elegant documents for elegant algorithms. A software tool, called cweb, reads the annotated software source and generates the documentation. While I love the idea, I've found that in practice, the cweb annotation tends to obscure the source code, making it difficult to read. In any case, here are some "Literate Programming" tools:
- The cweb tool can be downloaded from http://www.literateprogramming.com. This web site also includes links to other tools, articles and examples.
- FunnelWeb, written by Ross N. Williams.
- Noweb -- A simple, extensible tool for literate programming

Financial Literature

Elsewhere on these web pages I've written about statistics and wavelet techniques applied to financial data. Working at Prediction Company left me with an interest in quantitative finance and what might be classified as the microeconomics of markets. Here are some links for finance, quantitative finance and microeconomics (e.g., the low level economics of markets):

John P. Scordo has an interesting web site, www.research-finance.com which includes useful links to the finance literature. This literature is huge and includes not only journals from finance and economics, but more recently journals like Physica A, from physics (e.g., econophysics), so the links are necessarily selective.
Peter Ponzo's tutorials on finance (and other topics)

Peter Ponzo is a retired math professor who taught at the University of Waterloo in Canada. He writes that he retired early, which resulted in a smaller pension payout. He took his payout and put it in a self-directed investment fund. He then started studying investment, finance and economics. Apparently when he started discussing some of these issues on an investment discussion group he was accused of "gumming things up with mathematics". I guess that those are fight'n words for a math prof. So Prof. Ponzo started his Gummy Stuff web pages. The result is a set of wonderful tutorials and essays, including topics like Ito Calculus (presented in a very approachable form). If these essays are anything to go by, Peter Ponzo was a wonderful teacher, a practice which he continues to this day.

In addition to Prof. Ponzo's wonderful essays and tutorials there are recipes from his wife, Heidi and much more. Finance, statistics, food, what could be more fun (well, there is one more thing...)
Prof. Didier Sornette's home page at UCLA

Prof. Sornette wrote a very interesting book titled Why Stock Markets Crash?. This web page contains an interesting discussion of the work that Prof. Sornette's group has been doing on market dynamics, along with links to various papers they have authored. It seems wise to start with the book and then move on to the papers.
Exchange traded options, terrorism and Sept. 11

After 9/11, there was talk that terrorists or their ilk had profited by taking bearish positions in the options market ahead of the attacks. Even the evening news programs wondered aloud about Osama bin Laden's supposed insider dealings. That talk was soon relegated to the realm of urban myth.

Allen Poteshman, assistant professor of finance at the University of Illinois at Urbana-Champaign, dusted off the rumor and parsed the data again. His findings are, at the least, disturbing.

In a paper under review for publication by the Journal of Business and available on his Web site (www.business.uiuc.edu/poteshma), Prof. Poteshman runs through already discussed evidence, such as options dealings in airline stocks in the days leading up to Sept. 11, 2001, among other findings. He finds that most of the evidence based on options-ratio statistics is indeed inconclusive.

Coming at the matter in another, more specific way, the professor discovered that an inordinately high number of put options were purchased during the four sessions preceding the attacks. In other words, the options market suggested an unusually high level of bearishness about stocks.

The professor drew on data he had been trying to get from the Chicago Board of Options Exchange even before the attacks. He says the CBOE provided him with the data broken down by numbers of purchases and sales of non-market makers, along with whether these were new positions or traders closing out existing positions.

Rather than use the data to look at put/call ratios -- which would reflect buying as well as selling -- for a few days before the attacks, Prof. Poteshman built up a distribution of put buys and compared the week before Sept. 11, 2001, to many weeks before hand. When he did that, the sessions preceding the attacks looked unusually busy for put buying -- about the 95th percentile, meaning that more than 95 out of 100 times, there is a smaller volume of put buying than what he saw right before the attacks.

People can draw whatever conclusions they want, the professor said, but in economics, an event that is in the 95th percentile generally is regarded as an unusual occurrence.

The Wall Street Journal, April 30, 2004, Pg. C4

The paper discussed above is Unusual Option Market Activity and the Terrorist Attacks of September 11, 2001 by Allen M. Poteshman, Draft of March 10, 2004 (PDF format)

Professor Poteshman has published a number of other interesting papers, a number of which involve the exchange traded options market. These can be found on his publications web page.
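
The percentile comparison described in the Wall Street Journal excerpt above can be sketched as a percentile rank: collect the put-buying volume for many earlier windows and ask what fraction of them fall below the window of interest. The numbers below are made up purely for illustration and are not taken from Prof. Poteshman's paper. The function name percentile_rank is hypothetical.

```python
def percentile_rank(history, observed):
    """Percentage of historical values strictly below the observed value."""
    if not history:
        raise ValueError("history must not be empty")
    below = sum(1 for value in history if value < observed)
    return 100.0 * below / len(history)


if __name__ == "__main__":
    # Made-up put-buying volumes for 20 earlier windows (illustrative only).
    past_volumes = [100 + 5 * i for i in range(20)]  # 100, 105, ..., 195
    observed_volume = 193  # hypothetical window of interest
    print(percentile_rank(past_volumes, observed_volume))
```

A high rank only says that the observed window was unusual relative to the chosen history. It says nothing by itself about the cause.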
Prof. Gene Stanley's research group at Boston University

Prof. Stanley's group does work in a remarkable variety of areas, from polymers and glasses to complex networks and financial markets. Prof. Stanley and his group have published some interesting papers on power-law behavior in financial markets. Among the many interesting papers referenced on Prof. Stanley's publications web page is Quantifying Signals with Power-Law Correlations: A Comparative Study of Detrending and Moving Average Techniques (PDF). Market trading models must deal with a great deal of noise in their input data. So filtering techniques are of great interest.

Market Information

Information may want to be free, but market data (e.g., historical and current stock open and close price, volume, etc...) costs money. Given the growth in the Internet and in networking in general one might think that market information would be easy to find. This is true only if you can spend thousands of dollars a year for market information. I've tried to gather here a list of market information providers who provide data at prices that are affordable by individuals. The fact that a company is listed here is not necessarily an endorsement.

A number of sites sell daily open and close price information. The prices are frequently adjusted for splits, but in many cases they don't also provide information on dividends. A stock will frequently drop after a dividend is issued, since the company is worth less (since part of the corporate value has been paid out to shareholders). Without dividend information a market model may be misled by a change in stock price that can be explained by the issuing of the dividend.

The ideal case is actually to have raw market data (the actual unadjusted market prices) along with time series for splits and dividends. This allows software associated with a model to create an adjusted time series that can be customized for the application.
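
A minimal sketch of how software might build an adjusted series from raw closes plus separate split and dividend series. This is one common back-adjustment convention, not the method of any particular data vendor, and the function name adjust_closes and its argument layout are assumptions made for illustration. Prices before a split are divided by the split ratio, and prices before an ex-dividend date are scaled by (1 - dividend / previous close). The sketch assumes a split and a dividend never fall on the same date.

```python
def adjust_closes(closes, splits=None, dividends=None):
    """Back-adjust raw close prices for splits and cash dividends.

    closes    -- list of (date, raw_close) pairs, sorted oldest to newest
    splits    -- dict mapping effective date -> ratio (2.0 means 2-for-1)
    dividends -- dict mapping ex-dividend date -> cash amount per share
    """
    splits = splits or {}
    dividends = dividends or {}
    factor = 1.0  # cumulative adjustment applied to older prices
    adjusted = []
    # Walk from the newest price back to the oldest.
    for i in range(len(closes) - 1, -1, -1):
        date, close = closes[i]
        adjusted.append((date, close * factor))
        # An event on this date changes every price that precedes it.
        if date in splits:
            factor /= splits[date]
        if date in dividends and i > 0:
            prev_close = closes[i - 1][1]
            factor *= 1.0 - dividends[date] / prev_close
    adjusted.reverse()  # restore oldest-to-newest order
    return adjusted


if __name__ == "__main__":
    # Made-up prices: a 2-for-1 split on 2004-01-06, a $1 dividend on 2004-01-08.
    raw = [("2004-01-02", 100.0), ("2004-01-05", 100.0), ("2004-01-06", 50.0),
           ("2004-01-07", 50.0), ("2004-01-08", 49.0), ("2004-01-09", 49.0)]
    for row in adjust_closes(raw, {"2004-01-06": 2.0}, {"2004-01-08": 1.0}):
        print(row)
```

In the made-up example the raw series drops from 100 to 50 to 49, but every move is explained by the split or the dividend, so the adjusted series comes out flat. Keeping the raw data and the event series separate lets an application choose a different convention, for example adjusting for splits only.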

finance.yahoo.com

Yahoo's finance site is an excellent source of split and dividend adjusted close price data. Of course it is only available one stock at a time, but it should be possible to write software that would "mine" Yahoo.
Market Source On-line: historical end of day (close price) data.

The pricing seems very reasonable. They provide stock, mutual fund and index end of day data. They also include split information, which allows historical data to be adjusted. They don't mention dividend data, which is equally important. In the case of corporate spin-offs or special dividends, the dividend issue can make a big difference in the stock price. If the stock price is not adjusted by the dividend amount it will appear that the stock took a big dip the day after the dividend was issued. This will mislead any models that use this data.
TickData.com

TickData sells historical intra-day "tick data" for all US equities. The volume of this data is huge. The cost is about $10 to $18 per stock. They sell the entire database for $35,000. One would probably need a substantial RAID array just to store this data.
- Some market models model clusters of stocks together (for example, models that look for "reversion to the mean"). Barra supplies risk and cluster information, but it is expensive. This site provides a Microsoft Excel formatted listing of NYSE stocks by industry cluster.
Prophet Finance

Prophet Finance sells historical market data on CD-ROM and sells current market data update information by subscription. Apparently this data can be downloaded daily in several ways, including FTP. Prophet Data claims that they put a lot of effort into cleaning up their data, which is important, since raw data can contain errors. Using "dirty" data in models can destroy meaningful results.
www.quotes-plus.com (this is the same as www.qp2.com)

This company seems to sell stock market data and analysis software. When I looked at the web page it was long on hype and short on information. Apparently you get a CDROM of historical data. They then sell you live updates to this database (e.g., current market data). I found it difficult to understand what data you got along with the various options and what format this data was included in. For example, they provide daily stock market data (close price, volume, etc), they also provide cluster information (e.g., a cluster being a group of stocks, like banks). They also provide analyst recommendation information. But I had a difficult time understanding which packages include which data set.

Market Intra-day Data and Trade Execution

Here is a list of sites that provide intra-day "tick-data" (at least for the NYSE) or tick-data plus trade execution. A listing here does not imply any kind of endorsement.

Interactive Brokers

Interactive Brokers looks very interesting. According to their web site they provide low cost trade execution. Interactive Brokers also provides a software API that provides access to market information and trade execution. Apparently this interface is to their "Trader WorkStation" (TWS). TWS can be hosted on Windoz or UNIX (which I assume means Linux).
DTN Market Access

DTN Market Access offers tick-data at a reasonable price. They also sell a Windows hosted software package that allows interface via an API.

Time series data base support and the K language

Time series are data sets where each data point is associated with a time value. Examples include market close prices and intra-day bid/ask/volume data. Relational database vendors like Oracle and Informix offer packages that support time series storage and retrieval. However, the nature of time series storage and retrieval does not fit well into the classic relational model. As a result of this mismatch, the performance of relational database systems tends to be relatively poor for time series data. One exception is the database from Kx Systems which is designed to support time series data and real time time series feeds.

When I looked at Kx Systems' products it was only from a database point of view. The company I was working for had their own in house language for doing time series processing. It turns out that Kx Systems also supports a language called "K", which is vaguely reminiscent of APL. I read about "K" in A Shallow Introduction to the K Programming Language, November 14, 2002, published in Kuro5hin.

Large amounts of time series data are collected in many areas of science. As a result, it is surprising that time series databases are difficult to find, compared to relational databases. One of the few open source time series databases is RRDtool, which is available under the GNU General Public License.

The Orla data flow programming system has also been proposed for processing large time series data bases. Orla is described in Orla: A data flow programming system for very large time series by Gilles Zumbach and Adrian Trapletti in Proceedings of the 2nd International Workshop on Distributed Statistical Computing, March 15-17, 2001.
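
To make the mismatch with the relational model described above a little more concrete: time series queries are nearly always either a range of time or an "as of" lookup, so storage that is kept in time order can answer them with a binary search instead of a general table scan or index join. The sketch below is only an illustration of that idea. It is not how Kx Systems, RRDtool or Orla work, and the TimeSeries class and its method names are hypothetical.

```python
import bisect


class TimeSeries:
    """Append-only series stored as two parallel, time-ordered lists."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp, value):
        """Add a point; timestamps must arrive in non-decreasing order."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("timestamps must be appended in non-decreasing order")
        self._times.append(timestamp)
        self._values.append(value)

    def between(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def as_of(self, timestamp):
        """Return the most recent value at or before timestamp, or None."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._values[i - 1] if i else None


if __name__ == "__main__":
    prices = TimeSeries()
    for t, p in [(1, 10.0), (3, 10.5), (5, 11.0), (8, 10.8)]:
        prices.append(t, p)
    print(prices.between(3, 8))
    print(prices.as_of(4))
```

The "as of" lookup is the kind of query that is awkward to express in classic SQL, since it asks for the latest row before a given time rather than a row matching a key.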

Geospatial Information Systems

GRASS

GRASS is a free, open source software project (covered under the GNU General Public License). It was originally developed under a US Army Corps of Engineers contract. From the GRASS web page:

Commonly referred to as GRASS, this is a Geographic Information System (GIS) used for geospatial data management and analysis, image processing, graphics/maps production, spatial modeling, and visualization. GRASS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies.

ISP's for UNIX Software Engineers

These are some links that I created when I was searching for a host for bearcave.com (this web site). As I note below, I finally settled on WebQuarry and I have been very happy with their services and support.

WebQuarry. This looks very promising. They support Linux shells and provide bandwidth at a reasonable cost. (This was the ISP I eventually chose and that currently hosts bearcave.com).
Hurricane Electric. This is a Linux host. They support virtual hosting. The problem with Hurricane Electric from my point of view is that you cannot get multiple UNIX shells.
Aktiom Networks Colocation. This is a colocation only site, but they offer colocation for $60/month, with a full server and 30Gb/month of bandwidth. Sounds pretty interesting. The only thing that worries me is dealing with security issues. I spend enough time on bearcave.com as it is, without having to dive into all the security issues.
Focal Hosting Solutions

freeBSD shell (via SSH) - yeah baby! Their "experienced" user account includes 4Gb transfer per month, lots of disk, Java Servlets, JSP and C/C++ compilers.
AssortedInternet.com

This ISP is interesting because they provide shell access, databases (mySQL and PostgreSQL) and Java web services support (servlets and JSP). Their fees for bandwidth seem reasonable.
JavaServletHosting.com

Linux based Java servlet hosting (using Apache Tomcat). Shell access via SSH, plus development tools (Java, C++). Very reasonable prices (in fact, the hosting prices seem too cheap).

Literary Agents for Software Engineers and Computer Scientists

Once upon a time, during a long dead age, I wrote a book, Programming in Modula-2. I think that it is safe to say that many more people have read the web pages on bearcave.com than ever read this book. Sometimes I think about writing another book. One thing I learned is that it is worth the cost to have an agent. You get a much better deal from the publisher and you have someone on your side who understands the ins and outs of publishing contracts.

Waterside Productions, Inc is a literary agency that specializes in technical and non-fiction titles.

On-Line Libraries and literature sources

Voidspace - Fiction and Cyberpunk

This UK site publishes a number of science fiction and so called cyberpunk works. These include:
- Halo by Tom Maddox
- Metrophage by Richard Kadrey
- The Works of William Gibson
  
  This seems to include on-line versions of all of William Gibson's books. I guess that I'm of two minds on this: Gibson is an artist of unique talent who makes his living from the royalty paid on his books. The contract between reader and author is necessarily that we buy their books. When you read a work on-line you may be violating this contract. So if you like William Gibson's books, go out and buy them.
  
  On the other hand, books are expensive, even paperbacks. Just as on-line MP3s are a great way to understand if you like the work of a band or a musician, on-line books give you exposure to a writer. William Gibson is one of the seminal writers of the twentieth century. So read some of the work published via the link above. If you like it, buy Gibson's books.
CiteSeer: Scientific Literature Digital Library

This is a library of journal articles and scientific and engineering publications. The CiteSeer library is one of the great resources available on the Internet. The ability to rapidly access technical publications and the references that they cite drastically reduces the difficulty of doing research. As a result, this database has greatly increased the speed at which human knowledge is disseminated. In summary, this is simply a fantastic resource.
Internet Archive

This site links together the WayBackMachine, Project Gutenberg, Open Source Books and many other Internet information sources. The archive seems to contain incomplete works in at least one case. I took a look at Andrew S. Grove's High Output Management which is archived as photographs. Only a few pages seem to be available and the link that is labeled "PDF" is broken.

The bearcave.com web site is one of the major works of my life and I'm happy to see that it is archived by the WayBackMachine. Currently this archive is about a year and a half out of date. I find it comforting to think that my work may live beyond my lifetime (although it is an open question as to who would be interested in bearcave.com a century hence).
MIT's library of classic literature. This is a huge collection of classic Greek and Roman literature ranging from the writings of Aeschines to Xenophon.
Baen Free Library

Baen Books is a publisher of science fiction books. This on-line library publishes complete science fiction novels, reserving only commercial rights.
Remittance Girl

Remittance Girl is a wonderful writer and a gifted graphic artist. Her web site publishes her writing, including her on-line novels The Waiting Room and The Mistress of Dakao. Remittance Girl's prose is vivid and elegant. It is also erotic. So if this offends you, this may not be the site for you.
Broken Saints

Graphic novels are a form of literature, intermixed with graphic art. Broken Saints is an on-line graphic novel/anime. However, unlike static art, this is done in "Flash". The art is beautiful and some of the words are haunting. They have released a DVD set which publishes a version of the anime on the Broken Saints site.
Bartleby.com: Great Books Online

This web site started out as Columbia University's Project Bartleby. The Columbia University project placed a large body of classic literature (to which copyright had expired) on the Web. What started out as an academic resource has become a commercial site, with ads and annoying pop-up windows. There is even a Bartleby.com CEO named Steven H. van Leeuwen.

Given the costs I pay each month to support bearcave.com, I certainly understand the cost of bandwidth. However, Universities usually buy a fair amount of bandwidth. I suspect that if Columbia cracked down on MP3 downloads, they could probably support Project Bartleby without noticing. I'm not sure what prompted the conversion of an academic resource into a rather bogus commercial site.
William H. Calvin's Books and Articles Prof. Calvin teaches at the University of Washington at Seattle. He writes on topics ranging from neurobiology to the Anasazi (a Native American tribe).

Local web site search engines

Bearcave.com has been on-line since 1995. I have steadily added content. As the site grows, even I forget what I've placed on bearcave.com or where it is. I know that for visitors it's even worse. I find that it is a constant struggle to organize the hierarchy from the main web page. So it may be time to add a local search engine. One that was recommended to me is http://www.mnogosearch.org. The mnogosearch.org search engine is open source and covered by the GNU General Public License. The user can control the index generation and it apparently runs via a CGI script with C, PHP and Perl search front ends. Idiom.com, the superlatively wonderful host of bearcave.com, supports PHP.

Google's Successor (fugetaboutit)

According to an August 14, 2001 article in WiredNews (Searching for Google's Successor), there are some competitors to Google emerging (of course how any of these sites make money is still an open question). I expect that most of these sites will be weeded out, unless they succeed in developing a dramatic and lasting advantage over Google. This seems unlikely, since Google has resisted becoming distracted by other issues (e.g., being a "portal") and concentrates on being a great search engine. One search engine which was not listed in the WiredNews list is alltheweb.com. This is a very good search engine, with a huge database. While alltheweb.com is good, I don't see it offering enough advantages to displace Google. Google's success means that they have money to spend on constant improvement.

Anyway, these "upstarts" are listed below. Some of them have already died since the WiredNews article appeared (www.lasoo.com and the CURE research database).

Then again... MIT Technology Review published an excellent survey article on emerging search engine technology (Search Beyond Google by Wade Roush, March 2004). Among those mentioned is Teoma (listed below).

WiseNut

This site uses link references and content to rank pages. They claim to be able to do searching and ranking on less hardware.
Teoma

Teoma uses link ranking based on pages with similar content. For example, if my Web page on wavelets points to someone else's web page on wavelets, that gives them an increment in ranking.
Vivisimo

This is a search engine that categorizes the results of other search engines (e.g., Google, Alta Vista and HotBot). The clustering and categorization of results is a very nice feature of this search engine.
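
Going back to the Teoma description above, the idea that a link only counts when it comes from a page on the same subject can be sketched in a few lines of Python. This is a toy illustration and not Teoma's actual algorithm: the page names, the hand-assigned topic labels and the function topic_link_rank are all hypothetical, and a real engine would have to infer topics from page content.

```python
# Hypothetical toy web: each page has a hand-assigned topic and outgoing links.
pages = {
    "bearcave/wavelets": {"topic": "wavelets", "links": ["example/wavelets", "example/news"]},
    "example/wavelets": {"topic": "wavelets", "links": ["bearcave/wavelets"]},
    "example/news": {"topic": "news", "links": ["example/wavelets", "bearcave/wavelets"]},
    "example/dsp": {"topic": "wavelets", "links": ["example/wavelets"]},
}


def topic_link_rank(pages):
    """Count, for each page, the incoming links from other pages on the same topic."""
    rank = {name: 0 for name in pages}
    for source, info in pages.items():
        for target in info["links"]:
            if target == source or target not in pages:
                continue  # ignore self links and links that leave the toy web
            if pages[target]["topic"] == info["topic"]:
                rank[target] += 1  # same-topic link: an increment in ranking
    return rank


ranks = topic_link_rank(pages)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(score, name)
```

In this toy web the links from the news page add nothing to the wavelet pages, while the links between the wavelet pages do.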

When I tried out these search engines, using bearcave.com words as a target (e.g., "ian kaplan" templar for my material on the Knights Templar), I got hits on Google, but nothing on these other pages. I assume that this means that these search engines are still building up their web data base.

Even with the "dotcom" crash of 2001 I assume that the data on the Web is growing at an exponential pace. So building up a local approximation of the web is a difficult task.

Some of our friends (hmm, not many friends)

Christopher Glaeser's Nullstone Compiler Test Suite page. The Nullstone test suite is one of the best ways to test compiler optimizations. Christopher also has links to a number of other compiler related resources.
The lovely and talented Anne Kelly. Anne currently works at the radio station 98.5 The Fox

Web Magazines and Periodicals

Back in 1995 there was a boom in Web content companies. There was only a fuzzy notion of how these companies would make money. There was the general idea that exponentially increasing numbers of people were gaining access to the Web and that these people would want content. How this content would pay for itself was left for the future (lots of advertising or something). In the dark days of the "dotcom" bust of 2001 web content looks like it's in trouble. Sites like Feed have gone out of business. The remaining sites, like Salon, have announced still more staff cuts, which will definitely have an effect on content. I can only hope that this does not become a collection of graveyard markers.

Salon. Some people call Salon the New Yorker of the World Wide Web. Salon definitely has an intellectual tone, but it does not have the New Yorker's literature and journalism "from on high" attitude (the New Yorker only grudgingly publishes letters from readers, for example). I also read Salon regularly.
Guardian Unlimited

Guardian Unlimited is the on-line publication of the Guardian newspaper in the United Kingdom. The Guardian is a UK leftist intellectual publication. However, unlike The Nation and Mother Jones in the United States, the Guardian has wide circulation in the UK.

After the September 11, 2001 terrorist attacks on the World Trade Center the US press largely acted as the propaganda arm of the Bush II administration. This continued during the run-up to the war with Iraq, when few in the press criticized the obvious lies that Bush II and others in his administration were telling about the threat that Iraq presented to the United States. An alternate and, as history has shown, more accurate account of events was available from the Guardian Unlimited.

In the United States there is a deep suspicion of intellectuals and the educated. The remarkable popularity of Bush II rests in part on the fact that he was untouched by his education and is as far from an intellectual as you can get without being mentally impaired. The Guardian presents an interesting contrast with many US publications. The Guardian book reviews are excellent and have a definite intellectual cast without moving toward unreadable academic works.
Ars Technica is translated, according to the "who we ars" web page, as "the art of technology". They have some great to adequate material on some fairly hard core technology issues, like modern processor architecture and performance. I'd place Ars Technica somewhere between a news.com article and a journal article. Definitely more detail and depth than a technology news article, but they do not assume the level of expertise that journal articles do.
ACM Queue

ACM's version of IEEE Spectrum: technical, but readable by non-specialists.
Nerve Magazine. This magazine came on-line around June 1997. Nerve is sort of a Salon of sex. The writing is excellent and intelligent. Nerve is a publication that shows how important the defeat of the Communications Decency Act was. By making the Web as free as print media we have publications like Nerve. As the CDA shows, America is obsessed with sex and can rarely come up with anything intelligent on the topic. Nerve is a wonderful exception. Nerve's articles are well written, intelligent and, in some cases, very erotic.
Clean Sheets. This is a literary erotica site, somewhat along the lines of Nerve. They publish some very good (and very hot) material on occasion. Clean Sheets, Salon and Nerve are all having problems supporting free content with advertising. At this time it remains unclear how these sites will survive in the long run.
FEED is another great on-line publication. FEED is published by a small staff, so it is not updated that often. But its content and layout are excellent.

Sadly as of June, 2001 FEED has shut down. They were a site that tried to keep within a modest budget and meet their expenses. Unfortunately this approach was not successful and they ran out of money.

As we at bearcave.com well know, bandwidth is not free. Someone has to pay for bandwidth, disk space and computer fees for material to be published on the web. The FEED web site at www.feedmag.com no longer publishes the FEED content. This is tragic since FEED published some excellent material. FEED can still be found in the Internet Web archive, at www.archive.org.

The Internet contains an increasing amount of the knowledge and intellectual creations of the human race. The disappearance of FEED's content is an example of a serious problem for the Internet as a repository of information. The archive.org site currently provides a last resort for internet information. Archive.org also requires a constant flow of funding. Tragically, there is no guarantee that archive.org will be permanent.
HotWired was Wired magazine's on-line publication. It used to be very good, but as the money started to run out it went downhill fast. In many ways, the decline of HotWired was the harbinger of the "dot-com" bust. The HotWired site is only good as a reference on HTML.
CyberWire Dispatch, by Brock N. Meeks. Brock used to write a column titled Muckraker for HotWired and he has published in WIRED magazine as well. I think that Brock was one of the best political commentators writing today. He has since moved on to become an editor at MSNBC and only occasionally writes articles on Washington politics. His work is now "balanced". I miss those angry articles.
San Jose Mercury News

Silicon Valley's newspaper and one of the best newspapers in California.
The Nation Digital Edition

OK, I read Marx in my youth, when I did not know any better. There are still parts of Das Kapital I like (Marx's discussion of capital formation through the excess value created by labor). But I think that Marx ignored human nature. And dictatorships always go bad. But I still have a few liberal leanings which I exercise by reading The Nation.
Mother Jones Magazine

Another liberal magazine.
WWW.WHITEHOUSE.ORG

The Nation, Mother Jones... There is a pattern here. Yeah, I'm a liberal intellectual. As a member of this rare and endangered species, I'm happy to add WWW.WHITEHOUSE.ORG to my links page. WWW.WHITEHOUSE.ORG is the parody web site that gets fan mail from Dick Cheney thanking them for their biographical sketch of his beloved wife, Lynne.

In the dark times of terrorism, war, fear and unemployment that have been the hallmarks of the Bush II Presidency, there is hope while sites like WWW.WHITEHOUSE.ORG are published. Dissent and the free press still exist in the United States.

Liberal/Progressive Web Sites and Blogs

Before George W. Bush gained the Presidency in 2000 I was not terribly concerned with politics. I come from a long line of Democrats (my Grandfather on my Mother's side worked for Franklin Roosevelt's administration). I've always voted for Democrats. But I did not have a lot of political passion. I believed to some degree the claim that there was not that much difference between the Democrats and the Republicans. Both sides sucked down money from large corporations and neither side seemed terribly interested in working to represent the voters.

George W. Bush changed all that. Bush has been a disaster for the United States on almost any level that can be named. On the economic front the Bush administration has been nothing more than tax cut and spend big government Republicans. Bush has never vetoed a spending bill and the Republican Congress has spent money with abandon. Bush and his administration started a war with Iraq, which we now know posed no danger to the United States. This war has cost well over 200 billion dollars and, at this writing, the lives of over 1800 US service people and tens of thousands of Iraqis. The major achievement of the Bush administration in Iraq has been to install an Islamic government that is closely allied with Iran, a country that is no friend of the United States.

The Bush administration is anything but conservative. They are radicals bent on restructuring the United States and other areas of the world. The problem is that they seem incapable of formulating policy and everything they have done has been done incompetently. Whether you are a Liberal or a Conservative, you should oppose this administration.

The Web has been a great resource for spreading political information and for organizing opposition to the Bush administration's policies. Here are some of the web sites and Blogs that I read regularly:

Talking Points Memo by Joshua Micah Marshall
Washington Monthly Political Animal
The Washington Note by Steve Clemons
Daily Kos
The Carpetbagger Report
The Rude Pundit

The Rude Pundit writes with the flair of Hunter S. Thompson before all of the drugs Thompson took fried his synapses. The Rude Pundit is rude, crude and entirely partisan. Don't read this blog if you're easily offended.

Blogs on the Iraq War

Baghdad Burning

This blog is by Riverbend, an Iraqi woman and computer scientist. Riverbend's English is excellent and she writes movingly about what it is like being an Iraqi in these times. Riverbend's writing has been reprinted in the book Baghdad Burning: Girl Blog from Iraq.
Informed Comment by Professor Juan Cole

Professor Cole is an Arabic speaker and professor at the University of Michigan. His daily entries cover news about Iraq that is frequently either missing or buried in the US press.

A Few Interesting People

I'm a rather boring person. My life is work, computer science, my beloved wife, books, a bit of cooking and not much else. But there are lots of interesting people, many more than I could ever meet, read or hear of. If you're not included here, my apologies. I'm sure you're more interesting than I am. As with everything else on this web page, this is a random collection driven by serendipity and the power of Google.

Mari-Ann Herloff

Most people who use a search engine like Google appreciate the Web for the massive amount of information (both true and less true) that it now contains. If the Web is not already the largest repository of human information, it will be soon.

What people may not appreciate as much is the function of the Web as a serendipity engine. The Web is a function of links created by humans with different interests. These links lead the reader to unexpected places. Sometimes this is just information, but in a few cases when one is really lucky, you can find very special people.

Such a fortuitous accident led me to Mari-Ann Herloff's writing. Mari-Ann lives in Denmark, so I have never had the opportunity to meet her in person. But in a small way I feel that I have had the good fortune to know a little part of her through her writing.
Vince Cate

Vince Cate runs a small ISP and co-location provider on the Caribbean island of Anguilla. Running an ISP on a Caribbean island is probably enough to qualify anyone as interesting, but Vince is an interesting person for a number of other reasons.

Vince is a computer scientist (OK, this probably qualifies one as uninteresting) who, as a graduate student at Carnegie Mellon, worked on the Alex Filesystem project. He moved to Anguilla just as cryptography was starting to become publicly accessible (that is, cryptography was not just the domain of the NSA, but also university researchers and computer scientists building financial transaction systems). At that time the US government and other governments like the UK attempted to suppress cryptography. The attempt to suppress software that implements strong cryptographic algorithms (encryption which cannot be broken using a realistic amount of computer processing power) failed. Strong cryptographic software is now widely available and this seems to have been accepted, however reluctantly, by the US and other governments.
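
To give a feel for what "a realistic amount of computer processing power" means, here is a back-of-the-envelope calculation in Python. It is only a sketch: the trial rate of one trillion keys per second is an assumption picked for illustration, the key lengths are examples, and the function name years_to_search is mine. It also considers only exhaustive key search, not cleverer attacks on a particular algorithm.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumption for illustration only: an attacker who can try 10**12 keys per second.
ASSUMED_RATE = 1e12


def years_to_search(key_bits, keys_per_second):
    """Years needed to try every key of the given length at the given rate."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


for bits in (40, 56, 128):
    years = years_to_search(bits, ASSUMED_RATE)
    print(f"{bits}-bit key: about {years:.3g} years to try every key")
```

At the assumed rate a 40-bit key space is exhausted in about a second and a 56-bit key space in under a day, while a 128-bit key space would take on the order of 10^19 years. Each added bit doubles the work, which is why the longer key lengths are called strong.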

But back in the 1990s this was not the situation. Under the US arms export laws it was illegal for a US citizen to export cryptographic software or to take part in the development of cryptographic software which would be exported. In response to these government restrictions the cypherpunk movement was born. In the US the First Amendment provides broad protection for information published on paper. The source code for cryptographic algorithms was published in book form, since publishing these same algorithms as computer data (e.g., computer readable ASCII text) or executables was illegal. Companies like Sun Microsystems contracted with Russian software engineers and cryptographers to implement algorithms which could be legally exported (since they were not developed in the US).

During this era Vince Cate moved to Anguilla to implement strong cryptographic software for financial transactions. Since his status as a US citizen made this illegal for him, he gave up his US citizenship (see Encryption Expert Says U.S. Laws Led to Renouncing of Citizenship by Peter Wayner, The New York Times, Sunday, September 6, 1998).

Anguilla uses a consumption tax (I assume that this means a sales tax), and has no income tax. By giving up his US citizenship he also avoided US tax law, which, for someone who seems to have Libertarian and Objectivist (Ayn Rand's philosophy) sympathies, was also attractive.

Vince Cate seems to have been one of the organizers of the Financial Cryptography conferences which were held in the late 1990s (apparently 1997 through 2000).

Perhaps it is just an emotional connection, but I believe that there are many advantages to US citizenship and I treasure the fact that I am a US citizen. So from my point of view Vince Cate gave up this valuable asset over what turned out to be a short term issue (e.g., the control of cryptography). At most he is now a tax refugee and not even a wealthy one like Kenneth Dart.
Rachel Chalmers

I found out about the University of Texas' Web based collection of Edsger Dijkstra's work through Rachel Chalmers' lovely essay on Dijkstra GOTO Considered Joyful, Salon, July 9, 2003

I wondered who Rachel Chalmers is. You don't find that many people who both write well and understand the work of Dijkstra. The awesome power of Google led me to a few places, listed below. According to a biographical sketch posted on the(451) web site Ms. Chalmers "has a degree in English literature from the University of Sydney and a Master's degree in Anglo-Irish literature from Trinity College, Dublin."
- the(451). This is a web site and company that publishes reports on technologies and companies.
- The author "blurb" on the Salon article mentions that Ms. Chalmers is from Australia. So I assume that the essay Beyond the Bay, which is about people she knew in Australia, is hers. (These web pages would not display using Netscape. I could only get them to display using Microsoft's Internet Explorer).
- Rachel Chalmers interviews Jim Gleason. Jim Gleason is/was the president of the New York Linux Users Group and the interview discusses Jon Johansen, the Norwegian teenager who published some simple code to break DVD encryption, allowing DVDs to be played on a computer running Linux.
- The unknown Hackers: Bill and Lynne Jolitz may be the most famous programmers you've never heard of by Rachel Chalmers, Salon, May 17, 2000
Cosma Rohilla Shalizi

Cosma Shalizi is a researcher on complex systems at the Santa Fe Institute and the University of Michigan. He worked with James Crutchfield and knows Doyne Farmer. Farmer founded the Prediction Company where I spent a wonderful and horrible two years. I quoted some lovely sentences from the paper he co-authored An Algorithm for Pattern Discovery in Time Series on my web page which discusses using wavelet compression in time series prediction. Cosma Shalizi's web page included links to some interesting book reviews and various technical papers. Perhaps in satire someone labeled Cosma's web page as one of the worst ever. I most certainly disagree.
Alan Donovan's web site at MIT

Alan Donovan implemented an Emacs plug-in for Eclipse. As a diehard Emacs user and someone who does all of their Java development (and perhaps C++ development in the future) with Eclipse, I appreciate this plug-in.

Alan's web site at MIT has a number of interesting web pages, including the Emacs plug-in. He has a page on the Newton-Raphson approximation for roots (e.g., square root) as a fractal function, which is a feature of this function that I never knew about. Other interesting "hacks" include a web based Othello program and 3D graph plotting code.
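
The fractal behavior of Newton-Raphson is easy to reproduce. The sketch below is mine, not code from Alan's page. It assumes the standard textbook example, the cube roots of one (the solutions of z^3 - 1 = 0), rather than whatever function his page uses, and the names newton_basin and render are hypothetical. Each starting point in the complex plane is marked with the root that the iteration ends up at, and the boundaries between the three regions are where the fractal structure shows up.

```python
import cmath

# The three cube roots of 1, i.e. the solutions of z**3 - 1 = 0.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def newton_basin(z, max_iter=50, tol=1e-9):
    """Return the index of the root Newton-Raphson reaches from z, or -1 if none."""
    for _ in range(max_iter):
        if abs(z) < 1e-12:
            return -1  # derivative 3*z**2 is (nearly) zero, so the step is undefined
        z = z - (z ** 3 - 1) / (3 * z ** 2)  # Newton-Raphson step for f(z) = z**3 - 1
        for i, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return i
    return -1


def render(size=31, extent=1.5):
    """Draw the basins of attraction as text, one character per starting point."""
    symbols = ".ox "  # index -1 (no convergence) selects the trailing blank
    rows = []
    for r in range(size):
        y = extent - 2 * extent * r / (size - 1)
        row = ""
        for c in range(size):
            x = -extent + 2 * extent * c / (size - 1)
            row += symbols[newton_basin(complex(x, y))]
        rows.append(row)
    return "\n".join(rows)


print(render())
```

Raising the size argument gives a finer picture. Near the boundaries between the three regions, starting points that are very close together end up at different roots.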

On the web page, Alan notes that he has taken a leave from the MIT graduate program to work for Google in New York. Other than Google's ubiquitous search engine and advertising model, Google has not branched out into other areas successfully at the time of this writing. In fact, as time goes on, Google looks more like an advertising/media company than a software company. Still, if sheer numbers of talented computer scientists are any indication, Google seems to have a bright future as a software company.
Mike Benham's www.ThoughtCrime.org web site

I'm not sure what to make of Mike Benham's web site. It has a large Big Brother Is Watching You picture from the 1984 movie. It seems to be a sort of techno-anarchist web site. Perhaps the fact that Mike seems to be a fan of Jack Kerouac says something in itself.

By the way, I don't have much sympathy for anarchism. Anyone who wants to know what anarchy is like should think of Baghdad, right after the US invasion, when all of Saddam's government took off. Anarchy seems to be a philosophy that is most attractive to teenagers and those close to them in age or world view. The same goes for Libertarianism, which seems to me to be anarchy with a capitalist guise.

Whatever the case is with www.thoughtcrime.org, I find Mike a lyrical writer and his stories section is pretty interesting. In the savage world that the computer industry has turned into after the dot-com bust, it can sometimes be difficult to remember what it is that we loved which originally motivated us to go into this line of work. Mike writes:

[...] it should seem reasonable that I should no longer refer to myself as a programmer but rather as a poet. The vast difference between societies perception of a poet and a programmer is the result of many wrong turns in logic and understanding. Programming is not an exact science. Programming is an art. Programmers are constantly searching for the elegant solution to a problem. For any given problem there exists an infinite number of solutions. No solution is "correct" or "incorrect," so programmers try to create a solution that is as elegant as possible. Arriving at a truly elegant solution is an art.

I don't think that Don Knuth could have put it better.

The only problem with software as poetry is that only other software engineers can understand what we have created. There is another problem too, which is not limited to software. People no more agree on what is elegant and beautiful in software than they do about these qualities in English or architecture.

Mike also started the Distributed Library Project, which is an interesting experiment in creating communities through shared interests in books. The idea is that one might lend books to other members of the Distributed Library Project. This is not required of Distributed Library Project members, however. There is an eBay style feedback system to rate borrowers and a slashdot collaborative filtering system for discussions.

There are other web sites for book discussion and the whole idea of lending out books from my personal library makes me nervous. My personal library is my most prized possession and I rarely lend books. So while I'm not sure that the Distributed Library Project will fly, Mike gets kudos for stepping up and giving it a try.

My first reaction to the Distributed Library Project was that it was an attractive idea. I live in the East Bay (of the San Francisco Bay Area) and outside of Berkeley, it seems to be pretty much of an intellectual wasteland compared to the San Francisco Peninsula (e.g., Menlo Park, Palo Alto). Although I work with lots of PhDs, many of them don't seem to have a wide variety of interests, nor are they necessarily well read (somewhat to my surprise). So the idea of community was attractive, but I have to also realize that I have little time beyond my work and my beloved wife. So much for community...
Frank Atanassow

Frank is a PhD student at Utrecht University's Informatica department. He works on programming languages, which is related to my interest in compiler design and implementation. He was born in Germany, raised in Los Angeles, got his undergrad degree at Cornell and worked in Japan. Clearly Frank is an interesting person and his web page has some great links.
Joel Spolsky's Joel on Software web page

Apparently Joel Spolsky is an ex-Microsoft manager and software developer. It also appears that Joel got rich at Microsoft. Joel seems to have the common Microsoft perception that he is a Master of the Universe (after all, that's why he is rich). For example, this is what Joel writes about the book A Random Walk Down Wall Street

If you spend enough time in this industry it's almost impossible to avoid suddenly finding yourself with a big pile of money that you are going to have to manage somehow.

Perhaps the real truth is that Joel is rich as a result of fortuna. Nassim Taleb's excellent book Fooled by Randomness has some wonderful commentary on how people commonly attribute good fortune to skill, hard work and other virtues.

There is one thing we know for sure: Microsoft employees did not get rich as a result of the fine software they produced. Microsoft brought us MS-DOS. Microsoft brought us Windows 3.1 when they could have been running UNIX. It was not until Windows NT 4.0 that they actually clawed their way up to something resembling a stable modern operating system. And don't get me started on Word, PowerPoint and other fine Microsoft products.

What Microsoft people tend to forget is that they got rich because Microsoft is a monopoly. It has nothing to do with whether their products were better (or whether they "innovated for the customer"). For some of the economic theory behind Microsoft's success see Brian Arthur's writing.

Mr. Spolsky now runs a software company called Fog Creek Software. Obviously I find Joel irritating. Joel thinks that Bjarne Stroustrup is a genius. In contrast I hope that Stroustrup will find interests beyond adding new features to the already massively bloated and hacked up C++ language. I've included Joel here because sometimes he does have some valuable and profound things to say. Also, just because someone irritates you does not mean that you should not listen to them. It is also a good thing to reflect on why someone irritates you.
Colin Fulton

Early in my career I was fortunate to work for one of the best managers I've ever had, Colin Fulton.
John VanZandt and CEO Consultancy

Long ago in a world that now seems long gone, I spent about a decade working on high performance parallel processors. The first of these was called the LDF-100 (I later worked on a heterogeneous processor project sponsored by DARPA and the MasPar massively parallel supercomputer). The LDF-100 was a dataflow parallel processor which we designed for signal processing and other numeric applications. I was part of a small group at Loral Instrumentation which developed the LDF-100. The group lead was John VanZandt, who now heads a consulting company called CEO Consultancy.

CEO Consultancy does both "onshore" and "offshore" consulting and software development. Their offshore team is in Malaysia.

Malaysia is in the news in the West every once in a while as a result of the pronouncements and actions of its Prime Minister Mahathir Mohamad. In the 1990s Mahathir was famous for his predictions that Malaysia would be a high tech center. Before the 1997 currency crash (which Mahathir blamed on foreign, especially Jewish, speculators) the Malaysian government was planning large investments in high speed networking infrastructure and technology research. More recently Mahathir made the news for stating that the Jews run the world by proxy. Mahathir is also famous for jailing Anwar Ibrahim, a Malaysian politician who was widely seen as Mahathir's political successor. Anwar was jailed on charges that are believed to be false by many foreign observers. Apparently there were political factions in Malaysia, particularly Malaysian business people of Chinese heritage, who feared Anwar's Islamic platform. Jailing Anwar forced him out of politics so he is, in theory, no longer a threat.

Miscellaneous Links

University of California Benefits Web Page
https://netbenefits.fidelity.com
Electronic Voting, computer security and electoral fraud
- Black Box Voting by Bev Harris, from Plan Nine Publishing
  
  After the 2000 presidential election recount in Florida there has been a move in many states away from punch card ballots toward computerized voting machines. At least in broad concept this seems like a good idea. However, the implementation has been criticized by the computer science and computer security community. The current voting machines have no paper trail or any other kind of audit trail (e.g., write-once CD-ROM). Without an audit trail to trace the actual count back to the votes that voters cast, there is grave concern that computerized voting systems will allow elections, especially close elections, to be stolen. Without an audit trail there is no way, in fact, to prove whether the election was stolen or not.
  
  Bev Harris' book Black Box Voting discusses the security issues surrounding the current voting machines. Diebold, the manufacturer of these systems, has been suing Ms. Harris in an attempt to suppress the publication of internal memos that Diebold finds embarrassing.
  
  Black Box Voting will be available in both trade paperback form and in electronic form (in PDF format).
  
  In April 2004 a California State committee recommended that Diebold be banned from selling electronic voting machines to the State of California. They also recommended that the State Attorney General pursue civil and possibly criminal action against Diebold.
- VoteScam: the Stealing of America by James Collier, Victoria House Press, 1992
  
  The United States may have a better record of electoral honesty than a South American "banana republic", but elections in the US have never been entirely honest. Under the first Mayor Daley, Chicago was famous for the "ward bosses" who could deliver blocks of votes.
  
  The presidential election between Nixon and Kennedy was close, although Kennedy did appear to win the popular vote and the electoral vote. However, Nixon believed that the election was stolen from him with the help of Lyndon Johnson, Kennedy's Vice President. Nixon considered contesting the election, but was told not to rock the boat by the Republican party bosses. They probably feared exposing the system. Johnson had long experience rigging elections in Texas:
  
  Some votes for Lyndon Johnson were being purchased more directly, for it was not only the size of Texas that made campaigning so expensive but the ethics that pervaded politics in entire sections of it.
  
  One of these sections was San Antonio, and the area south of it to the Rio Grande. "The way to play politics in San Antonio," as John Gunther was to write, "is to buy, or try to buy, the Mexican vote, which is decisive."
  
  The Years of Lyndon Johnson: The Path To Power by Robert A. Caro
  
  According to Caro, Johnson would talk of an election he lost, describing the critical mistake he made in announcing his vote count first. This gave the opposition a number to beat, which they did, presumably by rigging the voting box and buying voting blocks.
  
  Rereading these accounts of dirty Texas politics reminded me of the political environment from which Governor Bush and Tom DeLay arose. Perhaps this is where Bush II learned to use ugly political strategies when the need arises.
  
  I have not read the book Votescam. Apparently it discusses voting fraud and the web site has links to a number of articles discussing this topic. The chapters that are published on-line are not particularly well written. Like any good conspiracy story, the CIA is mentioned. While the CIA certainly helped rig elections in other countries, I find it unlikely that they did this in the US.
  
  Election fraud has no place in a Republic and no place in the United States. Especially with modern technology it should be possible to hold verifiable, precisely correct elections. Having said this, breathless claims that voting fraud is something new are obviously wrong. What we can hope is that election fraud will be something that is consigned to the past.
Shooting Ourselves in the Foot: Grandiose Schemes for Electronic Eavesdropping may Hurt More Than They Help, "Robert X. Cringely", July 10, 2003, Public Broadcasting Service written editorial

I don't have a links sub-section directly dealing with security and privacy (I probably should, since there are lots of dimensions of these topics). So for the time being, this link goes under "miscellaneous".

This is a rather amazing article on the computers that support the Communications Assistance for Law Enforcement Act (CALEA). These Sun Workstations apparently sit at various phone centers and support wiretapping of the digital phone system. According to "Cringely" these systems sit directly on the Internet.

History has shown that no computer system is totally secure (e.g., hackproof). For this reason, government classified computer systems are never connected to the public internet or to the phone system. Not only are the CALEA systems connected to the Internet, but apparently they have less than state-of-the-art security. Cringely claims that the CALEA systems have been hacked by a variety of people, both inside the US and outside (e.g., foreign intelligence). Once taken over, these systems can be used for unauthorized taps. Perhaps almost as bad, there is no logging, so it is possible for recorded information to be "lost". Conversations that could show innocence might be removed or never turned in by unethical members of the police or the prosecutor's office.

The original link to this article was posted on Slashdot. Although some of the Slashdot response was "oh, everyone knows that, it's old news", in a quick Google search I was unable to find other articles that verified Cringely's article. If what Cringely claims is true, it's pretty horrifying.
Government Holds Back Scientist's Book by Richard Benke, Federation of American Scientists, May 29, 2001

This brief article is about the United States government suppressing Dan Stillman's book Inside China's Nuclear Weapons Program. Stillman was a scientist and weapons designer at Los Alamos National Labs for 29 years. After retiring he was invited to tour the Chinese nuclear weapons labs. This was after a Chinese American scientist, Wen Ho Lee, was accused of passing atomic secrets to the Chinese. There are those who believe that the government is suppressing the book to cover its embarrassment over the evidence in the book that the Chinese developed a new generation of nuclear weapons without foreign information.

A fascinating account of Dan Stillman's visits to Chinese weapons complex facilities and associated universities can be found in Thomas C. Reed's article The Chinese nuclear tests, 1964-1996, published in Physics Today. Thomas Reed was at one time a nuclear weapons designer. He is collaborating on a book with Dan Stillman. Along with this article, he provides a list of Chinese nuclear weapons tests that was provided to Stillman by the Chinese.

Other articles on Dan Stillman's attempt to get his work published:
- Our Secretive Government, the Cincinnati Post, June 12, 2002
The book Inside China's Nuclear Weapons Program by Dan Stillman was supposed to be 500 pages in length. Stillman apparently won all of the court cases I've seen referenced on the Web. The last reference I've seen, however, was in 2002. As of January of 2006 Stillman's book still has not been published, nor have I seen any recent references to its status.
Society for Professional Scientists and Engineers (SPSE)

And speaking of The Labs, the SPSE is a professional organization for Lab employees.
Interviewing With An Intelligence Agency (or, A Funny Thing Happened On The Way To Fort Meade) By Ralph J. Perro (a pseudonym), November 2003 (Federation of American Scientists (FAS) web site).

And speaking of security... This is a rather disturbing account of interviewing with the NSA. I write that this is a disturbing account because the interview process seems to weed out many potential employees who would be an asset to the NSA and our country, while not being much use in protecting government secrets from people who might disclose them.

This essay is published as part of the FAS Intelligence Resource Program web page.
And, speaking of Intelligence: Selections from the Senate Committee Report on Drugs, Law Enforcement and Foreign Policy chaired by Senator John F. Kerry, December 1988

As I write this, Senator John F. Kerry is much in the news as the Democratic nominee to run against George W. Bush. During the primaries a number of Kerry's Democratic rivals accused him of being little more than a slightly liberal version of G.W. Bush. After the 2002 mid-term elections many in the Democratic party criticized the party leadership as spineless. Kerry has been tarred with this brush as well.

Contradicting these views of Kerry is his past. His investigation of the Contra war was not terribly popular with either Democrats or Republicans. Yet what his committee found turns out to have been largely true.
Final Report Of The Independent Counsel For Iran/Contra Matters

This is Special Prosecutor Lawrence Walsh's report on the Iran/Contra scandal which ended in part when George H.W. Bush pardoned many of those involved.
The BCCI Affair A Report to the Committee on Foreign Relations United States Senate by Senator John Kerry and Senator Hank Brown December 1992

Kerry also investigated the BCCI bank, which had a number of notorious connections. This was even more unpopular than the Contra investigation, since some of those involved in this massive fraud were powers in the Democratic party.
The Man Who Knew, A Public Broadcasting Service FRONTLINE Documentary

From the PBS web page on this documentary:

For six years, John O'Neill was the FBI's leading expert on Al Qaeda. He warned of its reach. He warned of its threat to the U.S. But to the people at FBI headquarters, O'Neill was too much of a maverick, and they stopped listening to him. He left the FBI in the summer of 2001 and took a new job as head of security at the World Trade Center.

John O'Neill died in the collapse of the World Trade Center, which was destroyed by the Sept. 11, 2001 terrorist attacks. He was working to get people out of the buildings.
"We don't support that": We're not here to help fix your computer. We just want to get you off the phone.
A tech-support slave tells his hellish tale. By Kyle Killen, February 23, 2004, Salon.com

Four years after I graduated from college I went to work for a division of a defense contractor that made telemetry instruments (if you really want to know who this was, you can look at my resume). Many of the hardware engineers at this company were hardware hackers. They did not so much design hardware as hack it until it worked. Many of their initial designs had serious timing problems. The lack of real design coupled with aggressive schedules meant that on some occasions there was no working hardware to ship when a deadline arrived. However, this company shipped it anyway.
One of the engineers who worked on my project had started out in customer support. I asked her how this company got away with shipping hardware they knew did not work. She said that simply having a sympathetic voice on the other end of the phone who promised to fix the problem worked wonders. In many cases they were able to mollify the customers and ship something resembling working hardware a few weeks later.

Apparently the telephone support at computer manufacturers has dispensed with the sympathetic voice. Or, more accurately, the telephone support at the outsourcing company hired by the computer manufacturer has. According to this account from "support Hell", the people answering the phone know little, if anything, about computers. In fact, in many cases the support people know less than the person calling them. The job of telephone support is simply to get the caller off the phone in under 12 minutes. All the outsourcing company cares about is call volume, since this is what they're paid for. So it is not much of a surprise that there is a move to outsource these knowledge-less telephone support jobs to India.

Of course, after reading this article I'm even more confirmed in my practice of buying computer systems from local integrators, not Dell, HP or IBM.
Julius Caesar: The Last Dictator, A Biography of Caesar and Rome, 100 - 44 BC by Suzanne Cross

This is a well written, interesting and beautifully produced biography of Julius Caesar. A bibliography is also provided.

Ms. Cross has also written a set of Web pages on the women of classical Rome: Feminae Romanae: The Women of Ancient Rome.
Graphic Novelist Joe Sacco on the power and ignorance of the United States (salon.com)

I think the American population should be sent to The Hague to be judged. This is a country that has an enormous impact around the world. What is decided in Washington, D.C., when George [W.] Bush lifts his little finger -- someone around the world is going to feel it. To me it seems almost criminal that the people who live here, who elect someone like that -- if they really knew how other people's lives are affected by American policies, maybe they would pay more attention. It's appalling the amount of ignorance here about world events.

Sacco is a citizen of the Island of Malta, off the coast of Italy. What is fascinating is that despite the faults that Sacco sees in the United States, he still seems to feel that the US holds out hope and promise. Sacco is currently a permanent resident of the US (i.e., he has a "green card") and is applying to become a citizen. When asked if his criticism of the United States is in contradiction with his plan to become a citizen he responds:

I have a deep affection for this country [the United States], and in many ways living here and deciding to seek citizenship is my little way of taking some personal responsibility for how it acts. So I don't see a contradiction at all. I see a duty.
Offshore Data Centers and Offshore Banking

First, to avoid any misunderstanding, unlike some wealthy citizens of the United States, I believe that paying taxes is part of supporting the infrastructure of the country. So I'm not one of these "move your money offshore and avoid taxes" people.

However, I have noticed that many wealthy individuals and large corporations do not share my views about the obligation to support the country that provided them the environment where they could amass their wealth. There seems to be a large, but little discussed, offshore banking system. Many large banks, like the Royal Bank of Canada, have offshore subsidiaries. Many US corporations have also located offshore or set up offshore companies. One of the most famous examples is Rupert Murdoch's News Corp, which pays some of the lowest corporate tax rates in the world by using complex offshore corporate structures.

Many large transactions are also handled offshore for tax reasons. These include commodity sales and the sale of oil tankers. Finally there are illegal or semi-legal transactions, like drug money laundering and arms sales.

There is no point in an offshore transaction if a judge in the United States or a European country can force the disclosure of financial records. This means that these records must exist offshore as well. Given the complexity and size of offshore money flows, these transactions have long passed the paper ledger stage and are handled by computer. These computer systems must be located offshore as well. However, what is odd is that it is difficult to locate much in the way of an offshore data center infrastructure or hiring of information technology people to manage this infrastructure.

In the case of a large bank, like the Royal Bank of Canada, there is an existing information technology staff which has the knowledge to set up an offshore data center. Once the physical infrastructure is in place the bank can use its existing software (perhaps with additional support for cryptography). However, it might also make sense to use a co-location facility, where security, local power supplies and hardening against natural disasters like hurricanes can be amortized among a group of customers. I've found a few footprints of such data centers, which are listed below:
- Secure Hosting Ltd, Nassau, Bahamas
- IT Gains Permanent Residency in The Bahamas by Joseph J. Euteneurer, January/February 2000, VAULT Magazine
The Blogshares Listing for Bearcave.com (referred to on Blogshares as The Bear Cave)

The Blogshares main page describes Blogshares as:

BlogShares is a fantasy stock market for weblogs. Players get to invest a fictional $500, and blogs are valued by incoming links.

As the astute reader may have noticed, bearcave.com is not, exactly, a blog. But bearcave.com is my own personal soapbox, so perhaps I'm not stretching the truth too far.
Star Bridge Systems

Star Bridge Systems makes a programmable FPGA based "parallel processor". Apparently it has been used by NASA Langley in some real applications. The big problem with FPGA programming is expressing the algorithm. While C can be converted to synthesizable logic, the result is usually very slow. However, hardware design, which results in good FPGA based algorithms, is very time consuming. Languages like Handel-C attempt to bridge the gap between algorithms and their hardware implementation. Star Bridge claims to offer a software development environment that helps here as well.

When looking at the background of the founders of Star Bridge, you find people with backgrounds in small technology companies. But they don't seem to be technology heavyweights. For example, the company does not seem to be a University of Utah spinoff. This is not to say that people without hard core technology backgrounds cannot deliver innovative solutions, but their claims do require a closer look to make sure that they are not hype and that the founders really understand the issues.

The Forbes.com article Super-Cheap Supercomputing by Daniel Lyons, March 25, 2003, provides the following description of the backgrounds of the Star Bridge founders:

The company's founder and chief technologist, Kent Gilson, is a 37-year-old high-school dropout who for years has been derided by computer scientists as, well, a bit of a fringe character. The company's chief executive, Daniel Oswald, joined Star Bridge four months ago after running a foundation that dealt with ancient religious texts and Mormon studies.

The company seems to have had some rocky moments, according to the Forbes.com article:

Gilson says he understands that people will be skeptical about the claims he is making. It doesn't help that two years ago one of Star Bridge's first customers sued the company, claiming he made payments totaling $200,000 to Star Bridge and never got a workable computer. A few months ago Star Bridge settled the dispute by giving the customer shares in Star Bridge in exchange for his $200,000.

Gilson insists his dream machine actually works. "I live in the future," he says. "Most people are pessimists who live in the present or the past."
But in the end, they're still nothing more than video games by Jewels, in Jive Magazine

This is a discussion and account of video game addiction by a writer who started investigating massively multiplayer on-line games, like Everquest.

"Addiction" is a term that has been massively over used. The implication in some cases has been "It was not my fault, I'm an addict". It's a disease, not a moral failing. As I've noted above, we all have a choice about whether we walk through the door or pick up the straw.

Even the nature of "addiction" is wonderfully undefined. My definition for addiction is a behavior that interferes with the functioning of your life in a harmful way. For example, it harms your relationships, your ability to earn money to pay your bills or has a negative impact on your health. Lots of things can fall into this category: gambling (as discussed above), sexual obsession, overeating (think Paul Prudhomme) and, as in this article, video games. One of the problems with the whole addiction label is the question of where the line is between passion and obsession and addiction. Their work can consume most of the lives of artists, writers and scientists, for example. Is Everquest art?
An Engineer's View of Venture Capitalists, by Nick Tredennick

I'm not sure what the history of the name Kuro5hin is, but it's a good site. In addition to publishing A Casino Odyssey, they published a link to Nick Tredennick's stories from the start-up wars, which is published in IEEE Spectrum. Tredennick's article does an excellent job describing the problems that engineers face in venture capital funded startups. I've covered some of these issues on my web page Venture Capital and Start-up Companies.

Sadly, I think that Tredennick's solutions are naive. He suggests that if engineers funded start-up companies, they would give a larger share to the engineering talent that makes the company possible. I think that the truth is that when engineers start thinking like venture capitalists they will act like venture capitalists. Engineers will be just as willing to exploit the engineering staff as VCs are.
'If he could get away with it here, no lock in the world is safe' by Oliver Burkeman, July 17, 2003, The Guardian UK

Art and jewel thieves are romantic figures. They are the aristocracy of the criminal world, using charm and intellect to steal from the very rich. This Guardian story is of a real jewel heist in the London diamond district. The heist took place in a heavily guarded safety deposit vault. The person who committed the theft became a regular in the community, allowing access to the vault without arousing suspicion. Those interviewed for the article said that known destructive techniques (drilling, explosives, etc.) would not have worked because they would have attracted attention. Now if only Sean Connery will still be available to play the thief in the movie that is sure to follow.
Unofficial, on the ground views, of the US Military in Iraq:
- Soldiers for the Truth
- Hackworth.com (author and retired Colonel David H. Hackworth's web site)
Passport to the Pub: A guide to British pub etiquette (PDF) by Kate Fox, Social Issues Research Centre

Apparently this was commissioned by the British Brewers and Licenced Retailers Association. From Passport to the Pub:

Pub-going is by far the most popular native pastime. The 61,000 pubs in Britain have over 25 million loyal customers. Over three-quarters of the adult population go to pubs, and over a third are "regulars", visiting the pub at least once a week. The pub is a central part of British life and culture. If you haven't been to a pub, you haven't seen Britain.

Visitors to Britain are bewitched by our pubs, but they are often bothered and bewildered by the unwritten rules of pub etiquette. This is not surprising: the variety and complexity of pub customs and rituals can be equally daunting for inexperienced British pubgoers.
[...]
In 1995, for Passport to the Pub, the SIRC Research team, led by Research Manager Joe McCann and Senior Researcher John Middleton, embarked on yet another six-month anthropological pub-crawl. In total, the research on which this book is based has involved observation work in over 800 pubs, consultations with over 500 publicans and bar staff and interviews with over 1000 pubgoers, both natives and tourists.

Our first task in the preliminary research for this project was to find out how much tourists knew about pub etiquette. Not surprisingly, given the lack of information available, we found that what tourists didn't know about pub etiquette would fill a book. This is the book.
Hunter S. Thompson is dead (2/20/2005)

Hunter S. Thompson took his own life on Sunday. In a spectacular eulogy to Thompson, The Rude Pundit writes of the motivation for Thompson's act: "Chances are it's the same old story - depression, disease, drugs, or some combination thereof."

Thompson's notorious, long-running substance abuse took a toll years ago. Someone I knew worked at the San Francisco Examiner when Thompson was writing a weekly column for them. Thompson would phone in the column, which was largely incoherent, and his assistant would write some kind of riff on Thompson's demented, disordered dialog.

The toll that drugs and alcohol took on Thompson in his later years should not distract from his stature as a writer. Books like Fear and Loathing in Las Vegas, Fear and Loathing on the Campaign Trail and The Curse of Lono are American classics. Even near the end, Thompson could still rise up and recognize evil in its embodiment in Bush II and his cronies. In an interview before the 2004 election Thompson said "This is the darkest hour that I have seen in my long experience as an American. This is evil." And so it is, Doc, so it is. Thompson was a long time gambler on sports. Like many of us, sadly, Dr. Gonzo called the 2004 election wrong in his 2004 Rolling Stone article on the election.

Thompson's obituary for Richard Nixon was titled He was a Crook. Reading this bitter and biting remembrance of Nixon, once viewed by many as the worst president of the modern era, it is difficult to avoid thinking of our current era and Bush II, whom Thompson viewed as worse than Nixon.

Art Links

We also like some of the photorealists, in particular the late American artist John Kacere. We have Kacere prints from the Ro Gallery in New York.

Sometimes an artist will do a work we like, even though we are not wild about all their work. For us this was true of Jack Henslee. We have one of his prints as well. His work can be found at The Painted Lady web site.

The Web forges the strangest and most surprising connections. The Web is an engine of serendipity. I went from a book review written by Felicia Sullivan published on Amazon to the Web magazine she publishes, Small Spiral Notebook. I found the painting at the top of the Web page of Small Spiral Notebook haunting. It is one of those pictures you see and don't forget (for different reasons than John Kacere's work). I noticed in the fine print that the "images" were from Michael Paige-Glover. Google led me to Michael's web site (PaigeGlover.com). Michael is a very talented painter and I hope to see more of his work.

Weird Stuff

Many of the links above are examples of how the Internet has changed the flow of human information. To those doing research the Internet gives rapid access to a vast body of literature and to recent research results.

Literary, mathematical and scientific information represents the best of what the Internet has to offer. Then there is the weird stuff. A compendium of weird links would be vast and would grow all the time. I've included a few here that I've run into over the years. I hope that it is obvious that I don't endorse any of these web sites.

The codes to prevent unauthorized launch of nuclear missiles were all set to 00000000. No, I'm not making it up! See Keeping Presidents in the Nuclear Dark (Episode #1: The Case of the Missing "Permissive Action Links") Bruce G. Blair, Ph.D, Feb. 11, 2004

This is not weird the way stories of space aliens or bizarre conspiracy theories are. This is weird in the sense that if a thriller writer wrote this, no one would believe them. It may also be weird because it suggests that our survival is, to a degree, a matter of fortune that no terrorist or madman seized control of a nuclear weapons installation.

For anyone who has seen the movie Dr. Strangelove this is a very scary idea. It is also ironic that after the movie came out the US Air Force denied that an unauthorized nuclear launch, of the type depicted in the movie, could ever take place.

The Strategic Air Command (SAC) in Omaha quietly decided to set the "locks" to all zeros in order to circumvent this safeguard. During the early to mid-1970s, during my stint as a Minuteman launch officer, they still had not been changed. Our launch checklist in fact instructed us, the firing crew, to double-check the locking panel in our underground launch bunker to ensure that no digits other than zero had been inadvertently dialed into the panel. SAC remained far less concerned about unauthorized launches than about the potential of these safeguards to interfere with the implementation of wartime launch orders. And so the "secret unlock code" during the height of the nuclear crises of the Cold War remained constant at OOOOOOOO.
Planetary Activation Organization

Sheldon Nidle's Planetary Activation Organization web site describes the galactic organization that has arrived in our solar system. At one time Mr. Nidle announced the date of contact with our galactic brethren. As is so often the case when someone is foolish enough to announce a date (e.g., the date of the end of the world), the date for contact with the galactic others has come and gone. No obvious aliens around. Of course it's possible that George Bush Jr. is an alien android. I've always assumed that he's simply an intellectually limited individual controlled by his father's old cronies and the moneyed faction of the Republican Party. But it's possible he's one of those aliens Mr. Nidle writes about.

And later...

According to Mr. Nidle, the Aliens have arrived. The Galactic Federation is here and they have contacted him. Perhaps a strange choice, but maybe it was Mr. Nidle or George W. Bush. I guess that if I had a choice between the two, I'd choose Mr. Nidle too.

Whatever the case, the aliens are here, according to Mr. Nidle, and his web page even has descriptions of some select members of the Galactic Federation. Mr. Nidle continues to work to bring the Earth to global consciousness or light or whatever. To help aid Mr. Nidle in his hard work, you can now donate money via credit card.
Trance Formation of America

I found out about this site through a massive spam on Usenet. The site claims that they had nothing to do with this spam. Personally, I don't lend a lot of credence to statements made by those who believe in "mind control" and radio voices. Perhaps they did it under mind control and don't remember.

Treatment of mental illness is still primitive, but the drugs are getting better. The authors of this site might find happier lives with the aid of these medications.

The definition of sanity depends on the current culture. These days we'd put Joan of Arc on drugs to help her control her "voice". The French would speak English and the height of cuisine would be bangers and mash.

Here, without (much) further comment on my part, is the description from the Trance Formation web site. I particularly enjoyed the part about "White House sex slave". Did Ms. O'Brien have to interview for this job, or was she recruited? What would Congressmen Bob Barr and Dan Burton (the arch nemeses of President Clinton) have to say about this executive privilege?

TRANCE Formation of America is the first documented autobiography of a victim of government mind control. Cathy O'Brien is the only vocal and recovered survivor of the Central Intelligence Agency's MK-Ultra Project Monarch operation. Tracing her path from child pornography and recruitment into the program to serving as a top-level intelligence agent and White House sex slave, TRANCE Formation of America is a definitive eye-witness account of government corruption that implicates some of the most prominent figures in U.S. politics.
Navahoax by Matthew Fleischer, LA Weekly, January 25, 2006

Ian Kaplan