FB Doug Meet

August 20, 2009

Wolfram|Alpha Blog : What We’ve Been Doing This Summer

What We’ve Been Doing This Summer

So what’s been happening with Wolfram|Alpha this summer? A lot!

At first glance, the website looks pretty much as it did when it first launched, with its straightforward input field. But inside that simple exterior an incredible amount has happened. Our development organization has been buzzing with activity all summer. In fact, it’s clear from the metrics that the intensity is steadily rising, with things being added at an ever-increasing rate.

Wolfram|Alpha was always planned to be a very long-term project, and paced accordingly. We pushed very hard to get it launched before the summer so that we could spend the “quiet time” of our first summer steadily enhancing it, before more people start using it more intently in the fall.

Two really great things have happened as a result of actually getting Wolfram|Alpha launched. The first is that we’ve discovered that there’s a huge community of people out there who want to help the mission of Wolfram|Alpha. And we’re steadily ramping up our mechanisms for those people to contribute to the project.

The second thing is that we’ve now got actual examples of what people want to do with Wolfram|Alpha—hundreds of millions of them. And it’s terrific to see that so many of them work so nicely. But for us now the most valuable thing is seeing what doesn’t work yet. Because that shows us what we need to add to Wolfram|Alpha.

There are several components. One is knowledge domains. Things people want that Wolfram|Alpha doesn’t yet know. The good news is that there’s been very little that’s come through that wasn’t already somewhere on our to-do lists. They’re long lists. But we can now be confident that they’re good lists.

A second big component is linguistics. Close to half the time that Wolfram|Alpha doesn’t give a result, it’s not because it doesn’t have the necessary knowledge, or can’t do the necessary computation. It’s because it doesn’t understand what’s being asked.

It’s very interesting to see the kinds of queries that come into Wolfram|Alpha, and how they’re phrased. We’re really seeing a new human language. Based on ordinary language, but without a lot of its niceties. Probably closer to the way people think internally.

Wolfram|Alpha is a bit like a child: it’s being exposed to a new language, and it’s got to learn from examples how to understand it. The good news, though, is that Wolfram|Alpha is getting a lot of examples. Already a couple of orders of magnitude more than a child ever gets.

One of our big activities this summer has been inventing new techniques to take advantage of all this. It’s very interesting science. Much of it based on NKS. We’ve made some great advances, which we’re steadily implementing in the Wolfram|Alpha system.

The results so far are quite encouraging. In just a couple of months, we’ve reduced the “fall-through rate” of queries we don’t understand by 10%. And this is just the beginning. The techniques we’ve invented can clearly go a lot further. And we have all sorts of ideas for completely new techniques.

One of the fascinating things for me about the Wolfram|Alpha project is the way it mixes deep theoretical ideas with very practical implementation.

And one of the great achievements this summer has been streamlining the implementation. New data comes into Wolfram|Alpha all the time. But we had a plan that once a week we would update the underlying code of Wolfram|Alpha.

Some people in our development team thought this was impossible. But working on Mathematica for the past 20+ years, we’ve come up with some pretty good software engineering techniques—particularly making use of Mathematica itself to do system building, testing, and deployment.

Well, I’m happy to report that we have indeed managed to make the idea of one code update per week for Wolfram|Alpha work. In fact, it’s been working every week for the past 13 weeks!

So what’s been in all those updates?

I should explain that through the course of the summer we’ve been steadily expanding the Wolfram|Alpha development team, adding a lot of very talented people from around the world.

But in writing this blog post, I just looked up what’s actually happened to the Wolfram|Alpha codebase since launch. And I have to say that I’m quite astonished: it’s grown by a staggering 52%—adding well over 2 million lines of Mathematica code.
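As a rough back-of-the-envelope check, those two figures together imply a size for the codebase at launch and now. The sketch below is only illustrative: the function name is made up for this example, and it treats "well over 2 million" as exactly 2 million, so the results are lower bounds rather than reported numbers.

```python
def implied_sizes(added_lines: float, growth_fraction: float) -> tuple[float, float]:
    """Return (size_at_launch, size_now) implied by a growth figure.

    If the codebase grew by `growth_fraction` and that growth amounts to
    `added_lines`, then size_at_launch * growth_fraction == added_lines.
    """
    at_launch = added_lines / growth_fraction
    return at_launch, at_launch + added_lines


# Assumption: exactly 2 million lines added (the post says "well over"),
# so both values are lower bounds.
launch, now = implied_sizes(2_000_000, 0.52)
print(f"at launch: at least {launch / 1e6:.2f} million lines")
print(f"now:       at least {now / 1e6:.2f} million lines")
```

Under that assumption, the codebase would have been at least about 3.85 million lines at launch and at least about 5.85 million lines now.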

There have also been nearly 50,000 manual groups of changes to our data repositories over the past 3 months.

It’s hard to have a good metric for how many completely new knowledge domains we’ve added. But based on new source files, and new underlying databases, I think it’s been between 10% and 15%. (There’ll be other blog posts talking about the specifics—though we tend to be a bit bashful about new domains when they’re first added; they usually take a little while to reach maturity, and by then they don’t seem as new to us.)

One of the most difficult things about keeping our weekly update schedule is getting testing done.

We test Wolfram|Alpha at many levels. Its data, both static and real-time. Its underlying computation. Its linguistic processing. Its presentation layer. And its web operation.

Continually through each day we’re building new versions of the Wolfram|Alpha system, and doing automated tests. Over the course of the summer, we’ve dramatically increased the number and types of tests we have, both custom-built and derived from actual query streams.

Of course these tests find bugs, which we’re continually fixing. (Each week, Monday and Tuesday are bug-fixing days for all our developers.)

But what’s really great is how many users of Wolfram|Alpha send in helpful bug reports and suggestions. In fact, it’s been a big effort just to keep up with all of them.

Of all the feedback we’ve received so far, we’ve classified 54,233 items as bugs or suggestions. Of these, 31,006 are now in our implementation queue, boiled down to about 5,800 to-do items.

At the beginning of the summer, we were taking care of about 250 to-do items from all sources per week. Now it’s up to nearly 600 per week.

And so far we’ve been able to tell 3,907 people that the bugs they reported have been fixed.

It’s really very exciting watching Wolfram|Alpha develop. Every day there are zillions of little changes and fixes that get made (“add an extra name for a type of spider”; “fix the timezone for an outlying settlement”; etc.), while major new domains and frameworks are getting built up.

There’s also infrastructure development. Making Wolfram|Alpha run well on more web browsers. Optimizing performance. People may have noticed recently that there are no longer URLs like www12.wolframalpha.com; it’s always just www.wolframalpha.com. That seemingly minor change reflects a large engineering effort to optimize load balancing between our colocation facilities.

In addition to new content, we’ve been working very hard on new delivery and interface mechanisms for Wolfram|Alpha, which we expect to be able to announce quite soon.

It’s been a great first summer for Wolfram|Alpha. It was a mad dash to launch Wolfram|Alpha when we did. But we’ve actually built up over the summer to an even greater development intensity, though now with a progressively larger team and increasingly streamlined development systems.

These are exciting times. The vision of Wolfram|Alpha is really working! With every day bringing new advances. Progressively building up the largest coherent repository of human knowledge ever assembled.

And we’re now getting it ready for its “fall traffic”…
