TinySVM: Support Vector Machines
Last update: $Date: 2002/08/20 06:14:22 $
TinySVM
TinySVM is an implementation of Support Vector Machines (SVMs) [Vapnik 95], [Vapnik 98]
for the problem of pattern recognition. Support Vector Machines are a new generation of
learning algorithms based on recent advances in statistical learning theory. They have been
applied to a large number of real-world problems, such as text categorization and
hand-written character recognition.
What's New
- 2002-08-20: TinySVM 0.09 Released
- Fix a bug in the -W option
- Support the Intel C++ Compiler for Linux 6.0 (e.g. % env CC=icc CXX=icc ./configure --disable-shared; make)
- Support Borland C++ (use src/Makefile.bcc32)
- Support Microsoft Visual Studio .NET/C++ (use src/Makefile.msvc)
- Change the extension of source files from .cc to .cpp
- 2002-03-08
- 2002-01-05
- Fix a fatal bug in cache.h. (Thanks to Hideki Isozaki)
- 2001-12-07
- Support one-class SVM (experimental); use the -s option.
- 2001-09-03
- Support RBF, Neural, and ANOVA kernels
- Perl/Ruby bindings are rewritten with SWIG
- Python and Java interfaces are available
- 2001-08-25: 0.04
- Fix a memory leak bug in Classify::classify().
- Implement a simple garbage collector (reference counting) to handle training data efficiently.
- The following new API functions are added:
- BaseExample::readSVindex()
- BaseExample::writeSVindex()
- BaseExample::set()
- Example::rebuildSVindex()
- Model::compress()
- Add new API functions to the Perl/Ruby interfaces, each of which
corresponds to a new C++ API function above.
- 2001-06-29 0.03
- Delete the -t option from svm_classify.
- Fix a bug in the calculation of the VC dimension.
- 2001-01-17 0.02
- Add support for Support Vector Regression (SVR)
- A Ruby module is available
Features
- Supports standard C-SVM and C-SVR.
- Uses a sparse vector representation (see the sketch after this list).
- Handles tens of thousands of training examples and
hundreds of thousands of feature dimensions.
- Fast optimization algorithm stemming from SVM_light [Joachims 99a]:
- Working set selection based on steepest feasible descent.
- "Shrinking", an effective heuristic that dynamically reduces
the working set [Joachims 99a].
- Uses an LRU cache to store the Gram matrix.
- Optimized handling of binary features; roughly twice as fast as SVM_light.
- Provides well-known model selection criteria such as the Leave-One-Out
bound, VC dimension, and Xi-Alpha estimator.
- Written in C++ in an object-oriented style; provides a useful class library.
- Provides Perl/Ruby/Python/Java modules.
- Multi-platform: Unix and Windows.
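The sparse representation and the binary-feature optimization can be sketched as follows. This is not TinySVM's actual code, just a minimal illustration: a vector is stored as sorted (index, value) pairs, the dot product is a merge over the two index lists, and for binary features the values are implicitly 1, so the product degenerates to counting shared indices. The names Feature, dot, and dot_binary are made up for this sketch.

#include <cstddef>
#include <vector>

// One non-zero entry of a sparse feature vector (illustrative type).
struct Feature {
  int    index;   // feature id (>= 1), kept in ascending order
  double value;   // feature value
};

// Dot product of two sparse vectors: a merge over sorted indices, O(|a|+|b|).
double dot(const std::vector<Feature>& a, const std::vector<Feature>& b) {
  double sum = 0.0;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if      (a[i].index < b[j].index) ++i;
    else if (a[i].index > b[j].index) ++j;
    else { sum += a[i].value * b[j].value; ++i; ++j; }
  }
  return sum;
}

// With binary features every value is 1, so the dot product is just the
// number of shared indices -- no multiplications and no value storage,
// the kind of specialization behind the "twice as fast" claim above.
int dot_binary(const std::vector<int>& a, const std::vector<int>& b) {
  int count = 0;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if      (a[i] < b[j]) ++i;
    else if (a[i] > b[j]) ++j;
    else { ++count; ++i; ++j; }
  }
  return count;
}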
Format of training data
TinySVM accepts the same training data representation as SVM_light.
This format is suitable for large, sparse feature vectors.
The format of a training or test data file is
(in BNF-like notation):
<class> .=. +1 | -1
<feature> .=. integer (>=1)
<value> .=. real
<line> .=. <class> <feature>:<value> <feature>:<value> ... <feature>:<value>
Example (SVM)
+1 201:1.2 3148:1.8 3983:1 4882:1
-1 874:0.3 3652:1.1 3963:1 6179:1
+1 1168:1.2 3318:1.2 3938:1.8 4481:1
+1 350:1 3082:1.5 3965:1 6122:0.2
-1 99:1 3057:1 3957:1 5838:0.3
See tests/train.svmdata and tests/test.svmdata in the package.
For SVR, you can give a real value as the class label.
Example (SVR)
0.23 201:1.2 3148:1.8 3983:1 4882:1
0.33 874:0.3 3652:1.1 3963:1 6179:1
-0.12 1168:1.2 3318:1.2 3938:1.8 4481:1
See tests/train.svrdata in the package.
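For illustration, a minimal stand-alone reader for this format in C++ could look like the following (a sketch only; TinySVM ships its own parser, and the names LabeledExample and read_examples are made up here). Because the label is read as a real number, the same code covers both the +1/-1 SVM case and the SVR case.

#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Feature { int index; double value; };   // one "index:value" pair

struct LabeledExample {
  double               label;     // +1/-1 for SVM, any real value for SVR
  std::vector<Feature> features;  // sparse features, as listed on the line
};

// Parse lines of the form "<class> <feature>:<value> <feature>:<value> ...".
std::vector<LabeledExample> read_examples(const std::string& path) {
  std::vector<LabeledExample> examples;
  std::ifstream in(path.c_str());
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;            // skip blank lines
    std::istringstream ss(line);
    LabeledExample ex;
    ss >> ex.label;                        // the class label comes first
    std::string pair;
    while (ss >> pair) {                   // remaining tokens: "index:value"
      std::size_t colon = pair.find(':');
      if (colon == std::string::npos) continue;
      Feature f;
      f.index = std::stoi(pair.substr(0, colon));
      f.value = std::stod(pair.substr(colon + 1));
      ex.features.push_back(f);
    }
    examples.push_back(ex);
  }
  return examples;
}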
"svm_learn" accepts two arguments --- file name of training data
and model file in which the SVs and their weights (alpha) are
stored after training.
Try --help option for finding out other options.
% svm_learn -t 1 -d 2 -c 1 train.svmdata model
TinySVM - tiny SVM package
Copyright (C) 2000 Taku Kudoh All rights reserved.
1000 examples, cache size: 1524
.................... 1000 1000 0.0165 1286/ 64.3% 1286/ 64.3%
............
Checking optimality of inactive variables re-activated: 0
Done! 1627 iterations
Number of SVs (BSVs) 719 (4)
Empirical Risk: 0.002 (2/1000)
L1 Loss: 4.22975
CPU Time: 0:00:01
% ls -al model
-rw-r--r-- 1 taku-ku is-stude 26455 Dec 7 13:40 model
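In the command above, -t 1 -d 2 presumably selects a degree-2 polynomial kernel, following SVM_light-style conventions (an assumption; check svm_learn --help for the exact flag meanings). On top of a sparse dot product like dot() sketched earlier, the usual kernels are one-liners; the parameter names s, r, d, and gamma are illustrative only:

#include <cmath>

// Common SVM kernels expressed via precomputed dot products.
// xy = x.y; the RBF kernel also needs xx = x.x and yy = y.y.
double kernel_linear(double xy) { return xy; }

double kernel_poly(double xy, double s, double r, int d) {
  return std::pow(s * xy + r, d);                 // e.g. d = 2 for "-d 2"
}

double kernel_rbf(double xx, double yy, double xy, double gamma) {
  return std::exp(-gamma * (xx - 2.0 * xy + yy)); // ||x - y||^2 expanded
}

double kernel_neural(double xy, double s, double r) {
  return std::tanh(s * xy + r);                   // the "Neural" (sigmoid) kernel
}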
During learning, TinySVM prints progress messages like the following:
.................... 1000 15865 2412 1.6001 33.2% 33.2%
.................... 2000 15864 2412 1.3847 39.5% 36.4%
- 1st column: One "." means 50 iterations.
- 2nd column: Total number of iterations processed.
- 3rd column: Size of the current working set. It becomes
smaller as shrinking proceeds.
- 4th column: Current cache size.
- 5th column: Maximum KKT violation. When this value reaches the
termination criterion (default 0.001), the first stage of
the learning process is complete.
- 6th column: Cache hit rate during the last 1000 iterations.
- 7th column: Cache hit rate during all iterations.
"svm_classify" accepts two arguments --- file name of test data and model file generated by svm_learn.
"svm_classify" simply displays the accuracy of given test data.
You can also employ interactive classification by giving "-" as file name of test example.
Try --help option for finding out other options.
% svm_classify test.svmdata model
Accuracy: 77.80000% (389/500)
Precision: 66.82927% (137/205)
Recall: 76.11111% (137/180)
System/Answer p/p p/n n/p n/n: 137 68 43 252
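The last line is a flattened 2x2 confusion matrix: "p/n" means the system answered positive while the true answer was negative, and so on. The scores above follow from it by the usual definitions; in LaTeX notation:

\mathrm{Precision} = \frac{137}{137 + 68} = \frac{137}{205} \approx 66.83\%, \qquad
\mathrm{Recall} = \frac{137}{137 + 43} = \frac{137}{180} \approx 76.11\%, \qquad
\mathrm{Accuracy} = \frac{137 + 252}{500} = 77.80\%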
% svm_classify -V test.svmdata model
-1 -1.04404
-1 -1.26626
-1 -0.545195
.. snip
Accuracy: 77.80000% (389/500)
"svm_model" displays the estimated margin, VC dimension and number of SVs
of given some model file.
Try --help option for finding out other options.
% svm_model model
File Name: model
Margin: 0.181666
Number of SVs: 719
Number of BSVs: 4
Size of training data: 1000
L1 Loss (Empirical Risk): 4.16917
Estimated VC dimension: 728.219
Estimated xi-alpha(2): 573
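The estimated VC dimension is presumably obtained from the margin via the standard radius/margin bound of [Vapnik 95] (an assumption; the exact formula svm_model evaluates is not stated here). With \rho the margin and R the radius of the smallest sphere enclosing the training data in feature space:

h \;\le\; \frac{R^2}{\rho^2} + 1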
Download
TinySVM is free software distributed under the GNU Lesser General Public License.
- TinySVM-0.09.tar.gz (HTTP)
- RedHat 6.x i386 RPM (HTTP)
- RedHat 6.x SRPMS (HTTP)
- RedHat 7.x i386 RPM (HTTP)
- RedHat 7.x SRPMS (HTTP)
TinySVM is developed with CVS, so the latest development version is
available from the CVS repository.
You are welcome to join the CVS-based development.
% cvs -d :pserver:anonymous@chasen.aist-nara.ac.jp:/cvsroot login
CVS password: # Just hit return/enter.
% cvs -d :pserver:anonymous@chasen.aist-nara.ac.jp:/cvsroot co TinySVM
% ./configure
% make
% make check
% su
# make install
You can change the default install path with the --prefix option of the
configure script, as in the example below.
Try the --help option to find out about other options.
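For instance, to install under your home directory (the path is only an illustration; --prefix is standard configure behavior):
% ./configure --prefix=$HOME/tinysvm
% make
% make install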
Language bindings:
- Perl: see the perl/README file.
- Ruby: see the ruby/README file.
- Python: see the python/README file.
- Java: see the java/README file.
If you find bugs or have any questions,
please contact me by email at taku-ku@is.aist-nara.ac.jp.
(Email in Japanese is also, in fact more, welcome.)
To Do
- Multi-class SVM (one-vs-the-rest, pairwise)
- nu-SVM and nu-SVR [Schölkopf 1998]
- Transductive SVM [Vapnik 98], [Joachims 99c]
- Span SVM [Vapnik 2000]
- Ordinal SVR [Herbrich 2000]
- Provide a wrapper class to handle string features.
- Provide DLLs (Dynamic Link Libraries) for the Windows environment.
- Provide an API for user-customizable kernel functions.
References
- [Joachims 99a] T. Joachims,
Making Large-Scale SVM Learning Practical.
In: Advances in Kernel Methods - Support Vector Learning, Chapter 11, pp. 169-, MIT Press, 1999.
- [Vapnik 95] Vladimir N. Vapnik,
The Nature of Statistical Learning Theory. Springer, 1995.
- [Vapnik 98] Vladimir N. Vapnik,
Statistical Learning Theory. Wiley, 1998.
- [Joachims 99c] T. Joachims,
Transductive Inference for Text Classification using Support Vector Machines.
In: Proceedings of the International Conference on Machine Learning (ICML), 1999.
- [Vapnik 2000] Vladimir N. Vapnik,
Bounds on Error Expectation for SVM.
In: Advances in Large Margin Classifiers, Chapter 14, pp. 261-, MIT Press, 2000.
- [Herbrich 2000] Ralf Herbrich et al.,
Large Margin Rank Boundaries for Ordinal Regression.
In: Advances in Large Margin Classifiers, Chapter 7, pp. 116-, MIT Press, 2000.
- [Schölkopf 1998] B. Schölkopf, A. Smola, R. Williamson, and P. L. Bartlett,
New Support Vector Algorithms.
NeuroCOLT Technical Report NC-TR-98-031, Royal Holloway College,
University of London, UK, 1998. To appear in Neural Computation.
$Id: index.html,v 1.26 2002/08/20 06:14:22 taku-ku Exp $
taku-ku@is.aist-nara.ac.jp