hydrat development blog

Saturday, October 30, 2010

storage of task weights

Several interfaces have undergone some changes to support the storage of task weights.

Interfaces affected:
Store - new extend_Weights method, and the get_TaskSet and get_Task methods now support an optional list of weights to load.
Task - now has an associated dictionary mapping weight names to weights.
taskset_transform - When applying a transform, we first try to load any weights required from the store. After the transform has been applied, newly-computed weights are copied back into the store.
Transformer - transformers now all have a dictionary of weights associated with them. This interface doesn't feel very clean at the moment, and as such is likely to change soon.
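The load-transform-store round trip described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: MiniStore, MiniTask, MiniTransformer and apply_transform are hypothetical names, and extend_weights is only loosely modeled on Store.extend_Weights. None of these are hydrat's actual classes or signatures.

```python
# Hypothetical sketch of the weight round trip; not hydrat's real API.

class MiniStore:
    """Keeps weights per task, keyed by weight name."""

    def __init__(self):
        self._weights = {}  # task_id -> {weight_name: weight}

    def extend_weights(self, task_id, weights):
        # Add newly-computed weights without discarding existing ones.
        self._weights.setdefault(task_id, {}).update(weights)

    def get_task(self, task_id, weights=()):
        # Load only the requested weights that are actually stored.
        stored = self._weights.get(task_id, {})
        return MiniTask(task_id, {n: stored[n] for n in weights if n in stored})


class MiniTask:
    def __init__(self, task_id, weights=None):
        self.task_id = task_id
        self.weights = dict(weights or {})  # weight name -> weight


class MiniTransformer:
    """Toy transformer that needs one weight and computes it only if absent."""

    weight_names = ("feature_scale",)

    def __init__(self):
        self.weights = {}

    def apply(self, rows):
        if "feature_scale" not in self.weights:
            # Would be expensive in a real transformer; trivial here.
            self.weights["feature_scale"] = max(max(r) for r in rows) or 1
        scale = self.weights["feature_scale"]
        return [[v / scale for v in r] for r in rows]


def apply_transform(store, task_id, transformer, rows):
    # 1. Try to load any required weights from the store.
    task = store.get_task(task_id, transformer.weight_names)
    transformer.weights = dict(task.weights)
    before = set(transformer.weights)
    # 2. Apply the transform.
    out = transformer.apply(rows)
    # 3. Copy newly-computed weights back into the store.
    new = {k: v for k, v in transformer.weights.items() if k not in before}
    if new:
        store.extend_weights(task_id, new)
    return out
```

On a second call for the same task, the stored weight is reused rather than recomputed, which is the point of persisting it.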

Friday, October 29, 2010

key projection via browser_config

The hydrat browser is configured via browser_config. Key projection has been implemented in result_summary_table: a compound key of the form key:subkey:subsubkey projects out the metadata value summary[key][subkey][subsubkey], or None if any of the key lookups fail. This replaces the functionality originally provided by summary.ProjectMetadata.
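The lookup rule for compound keys can be sketched as a small function. The name project and the sep parameter are hypothetical; this illustrates the behaviour described above rather than reproducing the code in result_summary_table.

```python
def project(summary, compound_key, sep=":"):
    """Resolve 'key:subkey:subsubkey' to summary[key][subkey][subsubkey].

    Returns None if any lookup along the way fails (hypothetical helper).
    """
    value = summary
    for part in compound_key.split(sep):
        try:
            value = value[part]
        except (KeyError, IndexError, TypeError):
            # Missing key, or the current value is not subscriptable by part.
            return None
    return value
```

For example, with summary = {"metadata": {"learner": {"name": "knn"}}}, project(summary, "metadata:learner:name") gives "knn", while project(summary, "metadata:missing:name") gives None.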

Updated Stacking Metaclassifier

Since the reworking of hydrat that generalized 'CrossValidation' and 'TrainTest' into a DataSet-level declaration of a split, the Stacking metaclassifier had been unusable because it relied on the CrossValidation TaskSet subclass. This has now been fixed: Stacking now performs its crossvalidation using the same machinery that is used to generate a crossvalidation taskset from a multi-fold split. The notion of crossvalidation is currently hardcoded into the stacking metaclassifier, but it may be possible to generalize this further by implementing functionality similar to splits.
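Stacking with an internal crossvalidation can be sketched in plain Python: each base learner is trained on k-1 folds and predicts the held-out fold, and these out-of-fold predictions become the features for the meta learner. This is a minimal sketch, not hydrat's Stacking class. kfold_indices, stacking_features, train_stacking and the two toy learners are hypothetical, fold assignment is simple round-robin, and labels are assumed to be integers so that they can serve directly as meta features.

```python
from collections import Counter


def kfold_indices(n, k):
    """Round-robin fold assignment; returns a (train, test) index pair per fold."""
    folds = [list(range(f, n, k)) for f in range(k)]
    return [
        ([i for j, fold in enumerate(folds) if j != f for i in fold], folds[f])
        for f in range(k)
    ]


def majority_learner(X, y):
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_learner(X, y):
    def predict(x):
        # Index of the training row with the smallest squared Euclidean distance.
        best = min(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[best]
    return predict


def stacking_features(X, y, learners, k=3):
    """Out-of-fold predictions: one row per instance, one column per learner."""
    meta = [[None] * len(learners) for _ in X]
    for train, test in kfold_indices(len(X), k):
        Xtr = [X[i] for i in train]
        ytr = [y[i] for i in train]
        for c, learn in enumerate(learners):
            predict = learn(Xtr, ytr)
            for i in test:
                meta[i][c] = predict(X[i])
    return meta


def train_stacking(X, y, learners, meta_learner, k=3):
    meta_predict = meta_learner(stacking_features(X, y, learners, k), y)
    # Base learners are retrained on all the data for use at test time.
    base = [learn(X, y) for learn in learners]
    return lambda x: meta_predict([predict(x) for predict in base])
```

Using out-of-fold predictions keeps the meta learner from being trained on base predictions for instances the base learners have already seen.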

The new Stacking is also not sequence-aware. More work is required to determine exactly how sequence information should interact with the stacking. For example, the crossvalidation would need to respect sequence boundaries.
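One possible way for a crossvalidation to respect sequence boundaries is to assign whole sequences, rather than individual instances, to folds. The sketch below assumes sequences are given as lists of instance indices; sequence_folds and its greedy balancing strategy are hypothetical and not how hydrat currently handles sequences.

```python
def sequence_folds(sequences, k):
    """Assign whole sequences to k folds so no sequence straddles a boundary.

    sequences: list of lists of instance indices, one inner list per sequence.
    Returns k lists of instance indices.
    """
    folds = [[] for _ in range(k)]
    # Place the largest sequences first, each into the currently smallest fold,
    # to keep fold sizes roughly balanced.
    for seq in sorted(sequences, key=len, reverse=True):
        min(folds, key=len).extend(seq)
    return folds
```

For example, sequence_folds([[0, 1, 2], [3, 4], [5], [6, 7, 8, 9]], 2) keeps each sequence intact within a single fold.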

Friday, October 22, 2010

New interfaces to two machine learning libraries

Today I added interfaces to FLANN and scikits.learn. They have been added to hydrat.classifier. FLANN provides fast approximate nearest-neighbor search, and scikits.learn provides bindings to LIBSVM as well as implementations of a collection of ML algorithms. Examples of the usage of both have been added to examples/dummy_singleclass.py.
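FLANN and scikits.learn themselves are not shown here. Instead, this is a sketch of how a nearest-neighbor learner might be wrapped so that the search backend is pluggable. An exact brute-force search is swapped in where FLANN's approximate search would go. The KNNLearner and KNNClassifier classes, their learn/classify methods and the boolean class-map layout (one row per instance, one True per row in the single-class case) are assumptions, not hydrat.classifier's actual interface.

```python
from collections import Counter


def brute_force_nn(train, queries, k):
    """Exact k-nearest-neighbor search; stands in for an approximate backend."""
    result = []
    for q in queries:
        dists = [sum((a - b) ** 2 for a, b in zip(row, q)) for row in train]
        result.append(sorted(range(len(train)), key=dists.__getitem__)[:k])
    return result


class KNNLearner:
    """Hypothetical learner; 'search' is any function with brute_force_nn's signature."""

    def __init__(self, k=1, search=brute_force_nn):
        self.k = k
        self.search = search

    def learn(self, feature_map, class_map):
        return KNNClassifier(feature_map, class_map, self.k, self.search)


class KNNClassifier:
    def __init__(self, feature_map, class_map, k, search):
        self.feature_map = feature_map
        # Single-class setting: exactly one True per class-map row.
        self.labels = [row.index(True) for row in class_map]
        self.num_classes = len(class_map[0])
        self.k = k
        self.search = search

    def classify(self, feature_map):
        out = []
        for neighbors in self.search(self.feature_map, feature_map, self.k):
            # Majority vote among the k nearest training instances.
            winner = Counter(self.labels[i] for i in neighbors).most_common(1)[0][0]
            out.append([c == winner for c in range(self.num_classes)])
        return out
```

Keeping the search function as a parameter means an approximate library could replace brute_force_nn without changing the learner or classifier.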

Installing FLANN 1.5 in Ubuntu 10.04 (Lucid)

FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset.

FLANN is written in C++ and contains bindings for the following languages: C, MATLAB and Python.

source: http://people.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN

To install FLANN in Ubuntu Lucid, you will need the following packages:

cmake
libhdf5-serial-dev
python-numpy (for python bindings)
python-h5py

Obtain the FLANN-1.5 source package from the author's homepage.

wget http://people.cs.ubc.ca/~mariusm/uploads/FLANN/flann-1.5-src.zip
unzip flann-1.5-src.zip
cd flann-1.5
make
sudo make install # to install system-wide

Note that the Python modules are installed to /usr/local/python by default, so you may have to add that directory to your PYTHONPATH.

You should now be able to import pyflann.
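Setting PYTHONPATH in your shell is the usual fix. As an alternative, a script can add the install directory itself and check whether pyflann can be found. This is a minimal Python 3 sketch. The /usr/local/python path comes from the note above, while ensure_on_path and pyflann_available are hypothetical helper names.

```python
import importlib.util
import os
import sys

FLANN_PYTHON_DIR = "/usr/local/python"  # default install location noted above


def ensure_on_path(directory):
    """Append directory to sys.path if it exists and is not already listed."""
    if os.path.isdir(directory) and directory not in sys.path:
        sys.path.append(directory)
        return True
    return False


def pyflann_available():
    """True if 'import pyflann' would find the module on the current path."""
    return importlib.util.find_spec("pyflann") is not None


if __name__ == "__main__":
    ensure_on_path(FLANN_PYTHON_DIR)
    print("pyflann importable:", pyflann_available())
```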

First post!

Welcome to hydrat's development blog. I will work towards keeping this up to date with new ideas and developments in hydrat.