TinySVM: Support Vector Machines
Last update: $Date: 2002/08/20 06:14:22 $
TinySVM
TinySVM is an implementation of Support Vector Machines (SVMs) [Vapnik 95], [Vapnik 98]
for the problem of pattern recognition. Support Vector Machines are a new generation of
learning algorithms based on recent advances in statistical learning theory. They have been
applied to a large number of real-world problems, such as text categorization and
hand-written character recognition.
What's New
- 2002-08-20: TinySVM 0.09 Released
- Fix a bug in the -W option
- Support the Intel C++ Compiler for Linux 6.0 (e.g. % env CC=icc CXX=icc ./configure --disable-shared; make)
- Support Borland C++ (use src/Makefile.bcc32)
- Support Microsoft Visual Studio .NET/C++ (use src/Makefile.msvc)
- Change the extension of source files from .cc to .cpp
- 2002-03-08
- 2002-01-05
- Fix a fatal bug in cache.h. (Thanks to Hideki Isozaki)
- 2001-12-07
- Support one-class SVM (experimental); use the -s option.
- 2001-09-03
- Support RBF, Neural, and ANOVA kernels
- Perl/Ruby bindings are rewritten with SWIG
- Python and Java interfaces are available
- 2001-08-25: 0.04
- Fix a memory leak bug in Classify::classify().
- Implement a simple garbage collector (reference counting) to handle training data efficiently.
- The following new API functions are added:
- BaseExample::readSVindex()
- BaseExample::writeSVindex()
- BaseExample::set()
- Example::rebuildSVindex()
- Model::compress()
- Add new API functions to the Perl/Ruby interfaces, each of which
corresponds to a new C++ API function above.
- 2001-06-29 0.03
- Delete the -t option from svm_classify.
- Fix a bug in the calculation of the VC dimension.
- 2001-01-17 0.02
- Add support for Support Vector Regression (SVR)
- A Ruby module is available
Features
- Supports standard C-SVM and C-SVR.
- Uses a sparse vector representation (see the sketch after this list).
- Handles tens of thousands of training examples and
hundreds of thousands of feature dimensions.
- Fast optimization algorithm stemming from SVM_light [Joachims 99a]:
- Working set selection based on steepest feasible descent.
- "Shrinking", an effective heuristic that dynamically reduces
the working set [Joachims 99a].
- Uses an LRU cache to store the Gram matrix.
- Optimized handling of binary features; roughly twice as fast as SVM_light.
- Provides well-known model selection criteria such as the Leave-One-Out
bound, VC dimension, and Xi-Alpha estimator.
- Written in C++ in an object-oriented style; provides a useful class library.
- Provides Perl/Ruby/Python/Java modules.
- Multi-platform: Unix and Windows.
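The sparse representation and the binary-feature optimization can be sketched as follows. This is not TinySVM's actual code, just a minimal illustration: a vector is stored as sorted (index, value) pairs, the dot product is a merge over the two index lists, and for binary features the values are implicitly 1, so the product degenerates to counting shared indices. The names Feature, dot, and dot_binary are made up for this sketch.

#include <cstddef>
#include <vector>

// One non-zero entry of a sparse feature vector (illustrative type).
struct Feature {
  int    index;   // feature id (>= 1), kept in ascending order
  double value;   // feature value
};

// Dot product of two sparse vectors: a merge over sorted indices, O(|a|+|b|).
double dot(const std::vector<Feature>& a, const std::vector<Feature>& b) {
  double sum = 0.0;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if      (a[i].index < b[j].index) ++i;
    else if (a[i].index > b[j].index) ++j;
    else { sum += a[i].value * b[j].value; ++i; ++j; }
  }
  return sum;
}

// With binary features every value is 1, so the dot product is just the
// number of shared indices -- no multiplications and no value storage,
// the kind of specialization behind the "twice as fast" claim above.
int dot_binary(const std::vector<int>& a, const std::vector<int>& b) {
  int count = 0;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if      (a[i] < b[j]) ++i;
    else if (a[i] > b[j]) ++j;
    else { ++count; ++i; ++j; }
  }
  return count;
}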
Format of training data
TinySVM accepts the same training data representation as SVM_light.
This format is suitable for large, sparse feature vectors.
The format of a training or test data file is
(in BNF-like notation):
<class> .=. +1 | -1
<feature> .=. integer (>=1)
<value> .=. real
<line> .=. <class> <feature>:<value> <feature>:<value> ... <feature>:<value>
Example (SVM)
+1 201:1.2 3148:1.8 3983:1 4882:1
-1 874:0.3 3652:1.1 3963:1 6179:1
+1 1168:1.2 3318:1.2 3938:1.8 4481:1
+1 350:1 3082:1.5 3965:1 6122:0.2
-1 99:1 3057:1 3957:1 5838:0.3
See tests/train.svmdata and tests/test.svmdata in the package.
For SVR, you can give a real value as the class label.
Example (SVR)
0.23 201:1.2 3148:1.8 3983:1 4882:1
0.33 874:0.3 3652:1.1 3963:1 6179:1
-0.12 1168:1.2 3318:1.2 3938:1.8 4481:1
See tests/train.svrdata in the package.
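For illustration, a minimal stand-alone reader for this format in C++ could look like the following (a sketch only; TinySVM ships its own parser, and the names LabeledExample and read_examples are made up here). Because the label is read as a real number, the same code covers both the +1/-1 SVM case and the SVR case.

#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Feature { int index; double value; };   // one "index:value" pair

struct LabeledExample {
  double               label;     // +1/-1 for SVM, any real value for SVR
  std::vector<Feature> features;  // sparse features, as listed on the line
};

// Parse lines of the form "<class> <feature>:<value> <feature>:<value> ...".
std::vector<LabeledExample> read_examples(const std::string& path) {
  std::vector<LabeledExample> examples;
  std::ifstream in(path.c_str());
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;            // skip blank lines
    std::istringstream ss(line);
    LabeledExample ex;
    ss >> ex.label;                        // the class label comes first
    std::string pair;
    while (ss >> pair) {                   // remaining tokens: "index:value"
      std::size_t colon = pair.find(':');
      if (colon == std::string::npos) continue;
      Feature f;
      f.index = std::stoi(pair.substr(0, colon));
      f.value = std::stod(pair.substr(colon + 1));
      ex.features.push_back(f);
    }
    examples.push_back(ex);
  }
  return examples;
}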
"svm_learn" accepts two arguments --- file name of training data
and model file in which the SVs and their weights (alpha) are
stored after training.
Try --help option for finding out other options.
% svm_learn -t 1 -d 2 -c 1 train.svmdata model
TinySVM - tiny SVM package
Copyright (C) 2000 Taku Kudoh All rights reserved.
1000 examples, cache size: 1524
.................... 1000 1000 0.0165 1286/ 64.3% 1286/ 64.3%
............
Checking optimality of inactive variables re-activated: 0
Done! 1627 iterations
Number of SVs (BSVs) 719 (4)
Empirical Risk: 0.002 (2/1000)
L1 Loss: 4.22975
CPU Time: 0:00:01
% ls -al model
-rw-r--r-- 1 taku-ku is-stude 26455 Dec 7 13:40 model
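In the command above, -t 1 -d 2 presumably selects a degree-2 polynomial kernel, following SVM_light-style conventions (an assumption; check svm_learn --help for the exact flag meanings). On top of a sparse dot product like dot() sketched earlier, the usual kernels are one-liners; the parameter names s, r, d, and gamma are illustrative only:

#include <cmath>

// Common SVM kernels expressed via precomputed dot products.
// xy = x.y; the RBF kernel also needs xx = x.x and yy = y.y.
double kernel_linear(double xy) { return xy; }

double kernel_poly(double xy, double s, double r, int d) {
  return std::pow(s * xy + r, d);                 // e.g. d = 2 for "-d 2"
}

double kernel_rbf(double xx, double yy, double xy, double gamma) {
  return std::exp(-gamma * (xx - 2.0 * xy + yy)); // ||x - y||^2 expanded
}

double kernel_neural(double xy, double s, double r) {
  return std::tanh(s * xy + r);                   // the "Neural" (sigmoid) kernel
}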
During learning, TinySVM prints progress messages like the following:
.................... 1000 15865 2412 1.6001 33.2% 33.2%
.................... 2000 15864 2412 1.3847 39.5% 36.4%
- 1st column: One "." means 50 iterations.
- 2nd column: Total number of iterations processed.
- 3rd column: Size of the current working set. It becomes
smaller as shrinking proceeds.
- 4th column: Current cache size.
- 5th column: Maximum KKT violation. When this value reaches the
termination criterion (default 0.001), the first stage of
the learning process is complete.
- 6th column: Cache hit rate during the last 1000 iterations.
- 7th column: Cache hit rate during all iterations.
"svm_classify" accepts two arguments --- file name of test data and model file generated by svm_learn.
"svm_classify" simply displays the accuracy of given test data.
You can also employ interactive classification by giving "-" as file name of test example.
Try --help option for finding out other options.
% svm_classify test.svmdata model
Accuracy: 77.80000% (389/500)
Precision: 66.82927% (137/205)
Recall: 76.11111% (137/180)
System/Answer p/p p/n n/p n/n: 137 68 43 252
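The last line is a flattened 2x2 confusion matrix: "p/n" means the system answered positive while the true answer was negative, and so on. The scores above follow from it by the usual definitions; in LaTeX notation:

\mathrm{Precision} = \frac{137}{137 + 68} = \frac{137}{205} \approx 66.83\%, \qquad
\mathrm{Recall} = \frac{137}{137 + 43} = \frac{137}{180} \approx 76.11\%, \qquad
\mathrm{Accuracy} = \frac{137 + 252}{500} = 77.80\%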
% svm_classify -V test.svmdata model
-1 -1.04404
-1 -1.26626
-1 -0.545195
.. snip
Accuracy: 77.80000% (389/500)
"svm_model" displays the estimated margin, VC dimension and number of SVs
of given some model file.
Try --help option for finding out other options.
% svm_model model
File Name: model
Margin: 0.181666
Number of SVs: 719
Number of BSVs: 4
Size of training data: 1000
L1 Loss (Empirical Risk): 4.16917
Estimated VC dimension: 728.219
Estimated xi-alpha(2): 573
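The estimated VC dimension is presumably obtained from the margin via the standard radius/margin bound of [Vapnik 95] (an assumption; the exact formula svm_model evaluates is not stated here). With \rho the margin and R the radius of the smallest sphere enclosing the training data in feature space:

h \;\le\; \frac{R^2}{\rho^2} + 1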
Download
TinySVM is free software distributed under the GNU Lesser General Public License.
- TinySVM-0.09.tar.gz (HTTP)
- RedHat 6.x i386 RPM (HTTP)
- RedHat 6.x SRPMS (HTTP)
- RedHat 7.x i386 RPM (HTTP)
- RedHat 7.x SRPMS (HTTP)
TinySVM is developed with CVS, so the latest development version is
available from the CVS repository.
You are welcome to join the CVS-based development.
% cvs -d :pserver:anonymous@chasen.aist-nara.ac.jp:/cvsroot login
CVS password: # Just hit return/enter.
% cvs -d :pserver:anonymous@chasen.aist-nara.ac.jp:/cvsroot co TinySVM
% ./configure
% make
% make check
% su
# make install
You can change the default install path with the --prefix option of the
configure script, as in the example below.
Try the --help option to find out about other options.
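For instance, to install under your home directory (the path is only an illustration; --prefix is standard configure behavior):
% ./configure --prefix=$HOME/tinysvm
% make
% make install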
Language bindings:
- Perl: see the perl/README file.
- Ruby: see the ruby/README file.
- Python: see the python/README file.
- Java: see the java/README file.
If you find bugs or have any questions,
please contact me by email at taku-ku@is.aist-nara.ac.jp.
(Email in Japanese is also, in fact more, welcome.)
To Do
- Multi-class SVM (one-vs-the-rest, pairwise)
- nu-SVM and nu-SVR [Schölkopf 1998]
- Transductive SVM [Vapnik 98], [Joachims 99c]
- Span SVM [Vapnik 2000]
- Ordinal SVR [Herbrich 2000]
- Provide a wrapper class to handle string features.
- Provide DLLs (Dynamic Link Libraries) for the Windows environment.
- Provide an API for user-customizable kernel functions.
References
- [Joachims 99a] T. Joachims,
Making Large-Scale SVM Learning Practical.
In: Advances in Kernel Methods - Support Vector Learning, Chapter 11, pp. 169-, MIT Press, 1999.
- [Vapnik 95] Vladimir N. Vapnik,
The Nature of Statistical Learning Theory. Springer, 1995.
- [Vapnik 98] Vladimir N. Vapnik,
Statistical Learning Theory. Wiley, 1998.
- [Joachims 99c] T. Joachims,
Transductive Inference for Text Classification using Support Vector Machines.
In: Proceedings of the International Conference on Machine Learning (ICML), 1999.
- [Vapnik 2000] Vladimir N. Vapnik,
Bounds on Error Expectation for SVM.
In: Advances in Large Margin Classifiers, Chapter 14, pp. 261-, MIT Press, 2000.
- [Herbrich 2000] Ralf Herbrich et al.,
Large Margin Rank Boundaries for Ordinal Regression.
In: Advances in Large Margin Classifiers, Chapter 7, pp. 116-, MIT Press, 2000.
- [Schölkopf 1998] B. Schölkopf, A. Smola, R. Williamson, and P. L. Bartlett,
New Support Vector Algorithms.
NeuroCOLT Technical Report NC-TR-98-031, Royal Holloway College,
University of London, UK, 1998. To appear in Neural Computation.
$Id: index.html,v 1.26 2002/08/20 06:14:22 taku-ku Exp $
taku-ku@is.aist-nara.ac.jp