I wrote an interactive visualization for Gaussian mixtures and some probability laws, using the excellent Protovis library.  It helped me build intuition for the law of total variance.
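For readers without the visualization at hand, the law of total variance for a Gaussian mixture can be checked numerically. The sketch below is only an illustration: the weights, means, and standard deviations are made-up values, and the function names are hypothetical. It splits the mixture variance into a within-component part, E[Var(X|Z)], and a between-component part, Var(E[X|Z]), then compares the sum against a sample estimate.

```python
import random

def mixture_variance(weights, means, sds):
    """Decompose the variance of a Gaussian mixture (law of total variance)."""
    overall_mean = sum(w * m for w, m in zip(weights, means))
    # E[Var(X | Z)]: average of the component variances
    within = sum(w * s ** 2 for w, s in zip(weights, sds))
    # Var(E[X | Z]): spread of the component means around the overall mean
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return within, between, within + between

def sample_mixture(weights, means, sds, n, seed=0):
    """Draw n samples: pick a component, then draw from its Gaussian."""
    rng = random.Random(seed)
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[k], sds[k]) for k in components]

# Illustrative (assumed) mixture parameters
weights, means, sds = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]
within, between, total = mixture_variance(weights, means, sds)

xs = sample_mixture(weights, means, sds, 100_000)
xbar = sum(xs) / len(xs)
empirical = sum((x - xbar) ** 2 for x in xs) / len(xs)
print(within, between, total, empirical)
```

The between-component term is what makes a mixture wider than any of its parts: moving the component means apart increases the total variance even when each component stays narrow.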

Hal Varian writes:

You might find this a useful tool for cleaning data.

I haven't tried it out yet, but data cleaning is a hugely important topic and so this could be a big deal.

via Machine Learning (Theory) by jl on 5/20/10

Slashdot points out Google Predict. I’m not privy to the details, but this has the potential to be extremely useful, as in many applications simply having an easy mechanism to apply existing learning algorithms can be extremely helpful. This differs goalwise from MLcomp: instead of public comparisons for research purposes, it’s about private utilization of good existing algorithms. It also differs infrastructurally, since a system designed to do this is much less awkward than using Amazon’s cloud computing. The latter implies that datasets several orders of magnitude larger can be handled, up to limits imposed by network and storage.

via Slashdot by timothy on 4/21/10
Sara Chan writes "In a landmark ruling, the UK's Information Commissioner's Office has decided that researchers at a university must make all their data available to the public. The decision follows from a three-year battle by mathematician Douglas J. Keenan, who wants the data to do his own analysis on it. The university researchers have had the data for many years, and have published several papers using the data, but had refused to make the data available. The data in this case pertains to global warming, but the decision is believed to apply to any field: scientists at universities, which are all public in the UK, can now not claim data from publicly-funded research as their private property." There's more at the BBC, at Nature Climate Feedback, and at Keenan's site.


via Engadget by Nilay Patel on 3/5/10
We've been dying to know more about Microsoft's Courier tablet / e-book device ever since we first caught wind of it last September, and while our entreaties to Mr. Ballmer went unanswered, we just learned some very interesting information from an extremely trusted source. We're told Courier will function as a "digital journal," and it's designed to be seriously portable: it's under an inch thick, weighs a little over a pound, and isn't much bigger than a 5x7 photo when closed. That's a lot smaller than we expected -- this new picture really puts it into perspective -- and the internals apparently reflect that emphasis on mobility: rather than Windows 7, we're told the Courier is built on Tegra 2 and runs on the same OS as the Zune HD, Pink, and Windows Mobile 7 Series, which we're taking to mean Windows CE 6.

As we've heard, the interface appears to be pen-based and centered around drawing and writing, with built-in handwriting recognition and a corresponding web site that allows access to everything entered into the device in a blog-like format complete with comments. We're also hearing that there will be a built-in camera, and there's a headphone jack for media playback. Most interestingly, it looks like the Courier will also serve as Microsoft's e-book device, with a dedicated ecosystem centered around reading. It all sounds spectacular, but all we have for a launch date is "Q3 / Q4", and we have no idea how much it's going to cost, so we're trying to maintain a healthy skepticism until any of this gets official -- call us any time, Microsoft. One more pic showing the interface after the break.

Update: We've added a gallery of user interface shots -- some of which we've seen and some of which are new.

Update 2: We've just gotten two full-length HD videos of the interface in action. We've seen parts of these before, but there's some new stuff here that's quite interesting. Check it below.


Microsoft's Courier 'digital journal': exclusive pictures and details (update: video!) originally appeared on Engadget on Fri, 05 Mar 2010 13:44:00 EST.


via AI and Social Science - Brendan O'Connor by brendano on 12/30/09

There are an increasing number of systems that attempt to allow the user to specify a probabilistic model in a high-level language — for example, declare a (Bayesian) generative model as a hierarchy of various distributions — then automatically run training and inference algorithms on a data set. Now, you could always learn a good math library, and implement every model from scratch, but the motivation for this approach is you’ll avoid doing lots of repetitive and error-prone programming. I’m not yet convinced that any of them completely achieve this goal, but it would be great if they succeeded and we could use high-level frameworks for everything.

Everyone seems to know about only a few of them, so here’s a meager attempt to list together a bunch that can be freely downloaded. There is one package that is far more mature and has been around much longer than the rest, so let’s start with:

  • BUGS – Bayesian inference Using Gibbs Sampling. Specify a generative model, then it does inference with a Gibbs sampler, thus being able to handle a wide variety of different sorts of models. The classic version has extensive GUI diagnostics for convergence and the like. BUGS can also be used from R. (The model definition language itself is R-like but not actually R.)

    BUGS has had many users from a variety of fields. There are many books and zillions of courses and other resources showing how to use it to do Bayesian statistical data analysis. BUGS is supposed to be too slow once you get to thousands of parameters. The original implementation, WinBUGS, is written in Delphi, a variant of Pascal (!); its first release was in 1996. There are also two alternative open-source implementations (OpenBUGS, JAGS).

    This is clearly very mature and successful software. Any attempt to build something new should be compared against BUGS.
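To make concrete what a BUGS-style system automates, here is a minimal hand-written Gibbs sampler. It is a sketch, not BUGS code or its algorithm: the model (normal data with unknown mean and precision), the vague priors, and all names are assumptions chosen for illustration. In BUGS you would only declare the model; the two conditional updates below are what the system would derive and run for you.

```python
import random

def gibbs_normal(data, n_iter=2000, burn_in=500, seed=0,
                 mu0=0.0, kappa0=1e-3, a0=1e-3, b0=1e-3):
    """Gibbs sampler for x_i ~ Normal(mu, 1/tau).

    Assumed priors: mu ~ Normal(mu0, 1/kappa0), tau ~ Gamma(a0, rate=b0).
    Returns the (mu, tau) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    mu, tau = total / n, 1.0
    samples = []
    for it in range(n_iter):
        # mu | tau, data is Normal with precision kappa0 + n * tau
        prec = kappa0 + n * tau
        mean = (kappa0 * mu0 + tau * total) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        # tau | mu, data is Gamma(a0 + n/2, rate = b0 + SSE/2);
        # random.gammavariate takes a scale, hence the reciprocal
        sse = sum((x - mu) ** 2 for x in data)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / (b0 + sse / 2))
        if it >= burn_in:
            samples.append((mu, tau))
    return samples

# Synthetic data with true mean 5 and standard deviation 2
data_rng = random.Random(42)
data = [data_rng.gauss(5.0, 2.0) for _ in range(500)]
draws = gibbs_normal(data)
mu_hat = sum(m for m, _ in draws) / len(draws)
sd_hat = sum(t ** -0.5 for _, t in draws) / len(draws)
print(mu_hat, sd_hat)
```

Even for this two-parameter model, the conditionals have to be derived by hand and coded carefully, which is the repetitive, error-prone work these frameworks aim to remove.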

Next are systems that are much newer, generally only a few years old. Their languages all fall broadly into the category of probabilistic graphical models, but there are plenty of differences, specializations, and assumptions that would be a project in themselves to understand. In lieu of doing a real synthesis, I’ll just list them with brief explanations.

  • Factorie focuses on factor graphs and discriminative undirected models. Claims to scale to millions of parameters. Written in Scala. New as of 2009. Its introductory paper is interesting. From Andrew McCallum’s group at UMass Amherst.

  • Infer.NET. I only just learned of it. New as of 2008. Focuses on message-passing inference. Written in C#. From MSR Cambridge. I actually can’t tell whether you get its source code in the download. All other systems here are clearly open source (except WinBUGS, but OpenBUGS is a real alternative).

  • Church. Very new (as of 2009?), without much written about it yet. Focuses on generative models. Seems small/limited compared to the first three. Written in Scheme. From MIT.

  • PMTK – Probabilistic Modeling Toolkit. I actually have no idea whether it does model specification-driven inference, but the author’s previous similar-looking toolkit (BNT) is fairly well-known, so it’s in this list. Written in Matlab. From Kevin Murphy.

  • HBC – Hierarchical Bayesian Compiler. Similar idea to BUGS, though see the webpage for a clear statement of its somewhat different goals. It compiles the Gibbs sampler to C, so it’s much faster. Seems to be unmaintained. Written in Haskell. From Hal Daumé.

Finally, there are a few systems that seem to be more specialized. I certainly haven’t listed all of them; see the Factorie paper for a list of a few others.

  • Alchemy – an implementation of the Markov Logic Network formalism, an undirected graphical model over log-linear-weighted first-order logic. So, unlike BUGS and the above systems, there are no customized probability distributions for anything; everything is a Boltzmann (log-linear) distribution. At least, that’s how I understood it from the original paper. The FOL is essentially a language the user uses to define log-linear features. Alchemy then runs training algorithms to fit their weights to data.

    From Pedro Domingos’ group at UWashington. Written in C++. I’ve heard people complain that Alchemy is too slow. But in fairness, all these systems are slower than customized implementations.

  • Dyna is specialized for dynamic programming. The formalism is weighted Horn clauses (weighted Prolog). Implements agenda-based training/inference algorithms that generalize Baum-Welch, PCFG chart parsers, and the like. Written in C++, compiles to C++. Seems unmaintained. From Jason Eisner’s group at Johns Hopkins.

    Since it only does dynamic programs, Dyna usefully supports a much more limited set of models than the above systems. But I expect that means it can train and infer with models that the above would be hopeless to handle, since dynamic programming gives you big-O efficiency gains over more general algorithms. (But on the other hand, even dynamic programming can be too generic and slow compared to direct, customized implementations. That’s the danger of all these systems, of course.)

  • BLOG – first-order logic with probability, though a fairly different formalism than MLNs. Focuses on problems with unknown objects and unknown numbers of objects. I personally don’t understand the use case very well. Its name stands for “Bayesian logic,” which seems like an unfairly broad characterization given all the other work in this area. From Brian Milch. Seems unmaintained? Written in Java.
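The big-O argument made for Dyna above can be seen on a tiny hidden Markov model. The sketch below is plain Python, not Dyna code, and the model parameters are invented for illustration. It computes the likelihood of an observation sequence twice: by enumerating every hidden state path (exponential in sequence length) and with the forward recursion, the dynamic program that Baum-Welch builds on (linear in sequence length).

```python
from itertools import product

def brute_force_likelihood(init, trans, emit, obs):
    """Sum the joint probability over all S**T hidden state paths."""
    n_states = len(init)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

def forward_likelihood(init, trans, emit, obs):
    """Forward algorithm: O(T * S**2) by reusing partial sums."""
    n_states = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Illustrative (assumed) two-state HMM over a binary alphabet
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
obs = [0, 1, 1, 0, 0, 1, 0, 1]

slow = brute_force_likelihood(init, trans, emit, obs)
fast = forward_likelihood(init, trans, emit, obs)
print(slow, fast)
```

With two states and eight observations the brute-force version already visits 256 paths; at a few hundred observations only the dynamic program remains feasible.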

An interesting axis of variation of all these is whether the model specification language is Turing-complete or not, and to what extent training and inference can be combined with external code.

  • Turing-complete: Factorie, Infer.NET, Church, and Dyna are all Turing-complete. The modeling languages of the first three are embedded in general-purpose programming languages (Scala, C#, and Scheme respectively). Dyna is Turing-complete in two different ways: it has a complete Prolog-ish engine, which is technically Turing-complete but is gonna be a total pain to do anything non-Prolog-y in; but also, it compiles to C++.
  • Not Turing-complete: BUGS, HBC, Alchemy/MLN, and BLOG use specialized mini-languages. BUGS’ and HBC’s languages are essentially the same as standard probabilistic model notation, though BUGS is imperative. Alchemy and BLOG are logic variants.
  • Compiles to Turing-complete: HBC compiles to C, and Dyna compiles to C++, which are then intended to be hacked up and/or embedded in larger programs. I imagine this is a maintainability nightmare, but could be fine for one-off projects.

Another interesting variation is to what extent the systems handle probabilistic relations. BUGS and HBC don’t really try at all beyond plates; Alchemy, BLOG, and Factorie basically specialize in this; Dyna kind of does in a way; and the rest I can’t tell.

In summary, lots of interesting variation here. Given how many of these things are new and changing, this area will probably look much different in a few years.

via "Papers" via Brandon in Google Reader by Rickert, J., Riehle, A., Aertsen, A., Rotter, S., Nawrot, M. P. on 11/4/09

When we perform a skilled movement such as reaching for an object, we can make use of prior information, for example about the location of the object in space. This helps us to prepare the movement, and we gain improved accuracy and speed during movement execution. Here, we investigate how prior information affects the motor cortical representation of movements during preparation and execution. We trained two monkeys in a delayed reaching task and provided a varying degree of prior information about the final target location. We decoded movement direction from multiple single-unit activity recorded from M1 (primary motor cortex) in one monkey and from PMd (dorsal premotor cortex) in a second monkey. Our results demonstrate that motor cortical cells in both areas exhibit individual encoding characteristics that change dynamically in time and dependent on prior information. On the population level, the information about movement direction is at any point in time accurately represented in a neuronal ensemble of time-varying composition. We conclude that movement representation in the motor cortex is not a static one, but one in which neurons dynamically allocate their computational resources to meet the demands defined by the movement task and the context of the movement. Consequently, we find that the decoding accuracy decreases if the precise task time, or the previous information that was available to the monkey, were disregarded in the decoding process. An optimal strategy for the readout of movement parameters from motor cortex should therefore take into account time and contextual parameters.

via FZ Blogs by Emre Sevinc on 9/6/09



Well, what he actually says is the ‘phase transition’ in computer science. Two things make that possible: 1- too much data and 2- processing speed.

One of the nicest examples he gives is that learning algorithm X is the best with 1 million examples while another algorithm Y only ranks third, but when the same algorithms are run on a data set of 1 billion examples, Y becomes the best one.

Another good example is scene completion, where the algorithm did not provide meaningful results with 10,000 images; the researchers kept on trying with 100,000 images, again with no good results, then with 1,000,000 images, again nothing, but then with 10,000,000 images it worked very well! So there’s some kind of phase transition, or a quantum leap, going on here. The situation is similar to Google Image Search, where they were trying to find the canonical images, e.g. the image that best represents ‘Mona Lisa’ and not some variation of it. By taking pairs of images, doing a feature comparison, calculating a distance, arranging the data as a graph, and running a PageRank-like algorithm on the graph, they were able to find the images that best represent the given set of keywords.
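The canonical-image pipeline described above (pairwise distance, graph, PageRank-like ranking) can be sketched in a few lines. This is only a toy reconstruction under stated assumptions: the two-dimensional "feature vectors", the exp(-distance) similarity, the damping factor, and the function name are all hypothetical, and nothing here reflects how Google's actual system works.

```python
import math

def canonical_item(features, damping=0.85, n_iter=100):
    """Rank items on a similarity graph; return (best index, ranks)."""
    n = len(features)
    # Edge weight: similarity decays with Euclidean feature distance
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i][j] = math.exp(-math.dist(features[i], features[j]))
    out = [sum(row) for row in sim]  # total outgoing weight per node
    rank = [1.0 / n] * n
    for _ in range(n_iter):
        # Each node passes its rank to neighbours in proportion to similarity
        rank = [
            (1 - damping) / n
            + damping * sum(rank[j] * sim[j][i] / out[j] for j in range(n))
            for i in range(n)
        ]
    best = max(range(n), key=rank.__getitem__)
    return best, rank

# Toy data: four variations around a central item (index 4), plus an outlier
features = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0), (6, 6)]
best, rank = canonical_item(features)
print(best, [round(r, 3) for r in rank])
```

The item that is similar to the most other items collects the most rank, which matches the intuition that the canonical ‘Mona Lisa’ is the one most variations resemble.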

It is always fun and revealing to listen to Norvig. If you are interested in cutting-edge research in machine learning, pattern recognition, and machine translation, I recommend this video enthusiastically. Especially the parts where Norvig shows some single-page Python source code for word segmentation and typo checking programs (the first is about 97% correct, running on a laptop with a data set of about 1.7 billion words; the second is about 75% correct, again running on a not-very-high-end laptop). He also mentions the MapReduce programming paradigm and some wrong claims about the model, showing how it helps to do parallel programming for very large amounts of data.
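To give a feel for how a single page of Python can segment words, here is a minimal sketch of the general idea: choose the split that maximizes the product of unigram word probabilities, with memoization. It is not Norvig's code. The tiny word-count table, the unknown-word penalty, and the names are assumptions for illustration, whereas the real program draws its counts from a corpus of over a billion words.

```python
import math
from functools import lru_cache

# Hypothetical toy counts; a real model would load corpus counts
COUNTS = {"the": 1000, "a": 500, "in": 400, "he": 300,
          "choose": 50, "pain": 40, "spain": 30, "chooses": 5}
TOTAL = sum(COUNTS.values())
MAX_LEN = 12  # longest candidate word considered

def log_prob(word):
    """Unigram log-probability; unknown words get a length-based penalty."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))

@lru_cache(maxsize=None)
def segment(text):
    """Return (log-probability, words) of the best segmentation of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, min(len(text), MAX_LEN) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((log_prob(first) + rest_score, (first,) + rest_words))
    return max(candidates)

print(segment("choosespain")[1])
print(segment("thepain")[1])
```

With these counts, "choose spain" beats "chooses pain" because the product of the word frequencies is higher, which is exactly the kind of decision that gets better as the data set grows.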

via www.decisionstats.com on 8/10/09
Erk Subasi:
 
R language on the GPU