--- /dev/null
+++ b/README
@@ -0,0 +1,148 @@
+KlustaKwik version 2.01
+-----------------------
+
+KlustaKwik is a program for unsupervised classification of multidimensional
+continuous data. It arose from a specific need - automatic sorting of neuronal
+action potential waveforms (see KD Harris et al, Journal of Neurophysiology
+84:401-414,2000), but works for any type of data. We needed a program that
+would:
+
+1) Fit a mixture of Gaussians with unconstrained covariance matrices
+2) Automatically choose the number of mixture components
+3) Be robust against noise
+4) Reduce the problem of local minima
+5) Run fast on large data sets (up to 100000 points, 48 dimensions)
+
+Speed in particular was essential. KlustaKwik is based on the CEM algorithm of
+Celeux and Govaert (which is faster than the standard EM algorithm), and also
+uses several tricks to improve execution speed while maintaining good
+performance. On our data, it runs at least 10 times faster than Autoclass.
+
+Cluster splitting and deletion
+------------------------------
+
+The main improvement in version 1.5 is a cluster splitting feature. KlustaKwik
+allows for a variable number of clusters to be fit, penalized by AIC. The
+program periodically checks if splitting any cluster would improve the overall
+score. It also checks whether deleting any cluster and reallocating its
+points would improve the overall score. Splitting and deletion often let
+the program escape from local minima, reducing sensitivity to the
+initial number of clusters, and reducing the total number of starts needed for
+a data set.
+
+
+Compilation
+-----------
+
+The program is written in C++. To compile under Unix, extract all files to a
+single directory and type "make". That should be all you need to do. If it
+doesn't work, change the makefile to replace g++ with the name of your C++
+compiler.
+
+To check that it compiled properly, type "KlustaKwik test 1 -MinClusters 2" to
+run the program on the supplied test file.
+
+Usage
+-----
+
+The program takes a "feature file" as input and produces two output files: the
+"cluster file" and a log file. The file formats and conventions may seem
+slightly strange. This is for historical reasons. If you want to change the
+code, go ahead: this is open source software.
+
+The feature file should have a name like FILE.fet.n, where FILE is any string,
+and n is a number. The program is invoked by running "KlustaKwik FILE n", and
+will create a cluster file FILE.clu.n and a log file FILE.klg.n. The number n
+doesn't serve any purpose other than to let you have several files with the same
+file base.
+
+The first line of the feature file should be the number of input dimensions.
+The following lines are the data, with each line being one data instance,
+consisting of a list of numbers separated by spaces. An example file test.fet.1
+is provided.
+
+The first line of the cluster file will be the number of classes that the
+program chose. The following lines will be the classes assigned to the data
+points. Class 1 is a "noise cluster" modelled by a uniform distribution, which
+should contain outliers, if there are any.
+
+
+Parameters
+----------
+
+It is possible to pass the program parameters by running "KlustaKwik FILE n
+params" etc. All parameters have default values. Here are the parameters you can
+use:
+
+-help
+Prints a short message and then the default parameter values.
+
+-MinClusters n (default 20)
+The random initial assignment will have no fewer than n clusters. The final
+number may be different, since clusters can be split or deleted during the
+course of the algorithm.
+
+-MaxClusters n (default 30)
+The random initial assignment will have no more than n clusters.
+
+-nStarts n (default 1)
+The algorithm will be started n times for each initial cluster count between
+MinClusters and MaxClusters.
+
+-SplitEvery n (default 50)
+Test to see if any clusters should be split every n steps. 0 means don't split.
+
+-MaxPossibleClusters n (default 100)
+Cluster splitting can produce no more than n clusters.
+
+-RandomSeed n (default 1)
+Specifies a seed for the random number generator.
+
+-UseFeatures STRING (default 11111111111100001)
+Specifies a subset of the input features to use. STRING should consist of 1s
+and 0s with a 1 indicating to use the feature and a 0 to leave it out. NB The
+default value for this parameter is 11111111111100001 (because this is what we
+use in the lab) - so if you have more than 12 dimensions you will need to change
+it.
+
+-StartCluFile STRING (default "")
+Treats the specified cluster file as a "gold standard". If it can't find a
+better cluster assignment, it will output this.
+
+-DistThresh d (default 6.907755)
+Time-saving parameter. If a point has a log likelihood more than d worse for a
+given class than for the best class, the log likelihood for that class is not
+recalculated. This saves an awful lot of time.
+
+-FullStepEvery n (default 10)
+All log-likelihoods are recalculated every n steps (see DistThresh).
+
+-ChangedThresh f (default 0.05)
+All log-likelihoods are recalculated if the fraction of instances changing class
+exceeds f (see DistThresh).
+
+-MaxIter n (default 500)
+Don't try more than n iterations from any starting point.
+
+-Log (default 1)
+
+Produces the .klg log file (on by default; use -Log 0 to switch it off).
+
+-Screen (default 1)
+
+Prints parameters and progress information on the console. Set to 0 to suppress
+this output, for example when running batch jobs.
+
+-Debug (default 0)
+Miscellaneous debugging information (not recommended).
+
+-DistDump (default 0)
+Outputs a ridiculous amount of debugging information (definitely not recommended).
+
+
+Contact Information
+-------------------
+
+This program is copyright Ken Harris (harris@axon.rutgers.edu), 2000-2002. It
+is distributed under the GNU General Public License (www.gnu.org). If you make
+any changes or improvements, please let me know.
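
Note on the README's Parameters section: the default -UseFeatures string is
tuned to the author's lab data, so runs on other data will usually need an
explicit string. The following Python sketch builds a command line that selects
every feature. It is not part of KlustaKwik. The helper name build_command is
hypothetical, and it assumes one character per input dimension; the README does
not say how a string of a different length is handled.

```python
import shlex


def build_command(file_base, n, num_dims, extra=None):
    """Return an argv list for a KlustaKwik run that uses every feature."""
    # Assumption: one '1' per input dimension selects all features.
    use_features = "1" * num_dims
    argv = ["KlustaKwik", file_base, str(n), "-UseFeatures", use_features]
    # Extra parameters, e.g. {"MinClusters": 2}, become "-Name value" pairs.
    for name, value in (extra or {}).items():
        argv += ["-" + name, str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_command("test", 1, num_dims=2, extra={"MinClusters": 2})
    # Print a shell-quoted form that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run if the KlustaKwik binary
is on the PATH.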
--- /dev/null
+++ b/test.fet.1
@@ -0,0 +1,202 @@
+2
+-4326 -1834
+-2437 -3718
+-3642 -2409
+-2392 -3417
+-2483 -3470
+-1751 -4523
+-4094 -1892
+-3774 -2010
+-2635 -3306
+-4117 -1770
+-3669 -2095
+-3085 -2993
+-3290 -2744
+-2238 -3799
+-3704 -2294
+-2491 -3533
+-3597 -2386
+-3966 -1797
+-1339 -4910
+-3095 -3061
+-3162 -2953
+-3620 -2456
+-3407 -2760
+-1948 -4340
+-3721 -2314
+-3898 -2204
+-3407 -2588
+-4588 -1501
+-2688 -3253
+-3666 -2507
+-761 -5480
+-2184 -3818
+-4029 -1732
+-995 -5200
+-2979 -3036
+-3643 -2197
+-3755 -2309
+-2870 -2956
+-3072 -2963
+-2109 -3610
+-2920 -3521
+-2860 -3409
+-4234 -1824
+-3813 -2090
+-3447 -2357
+-1362 -4430
+-4773 -973
+-4041 -1688
+-3409 -2426
+-3256 -2679
+-3367 -2793
+-4368 -1488
+-503 -5354
+-1968 -4362
+-4979 -1032
+-3115 -2816
+-1196 -4717
+-2486 -3729
+-2642 -3450
+-2460 -3424
+-3120 -2823
+-3965 -2088
+-2232 -3793
+-665 -5335
+-4442 -1923
+-2697 -3232
+-2417 -3317
+-1995 -4416
+-2891 -3090
+-2306 -3696
+-890 -4959
+-2857 -3257
+-4396 -1656
+-4724 -1194
+-3795 -2216
+-2349 -3429
+-2352 -3380
+-2216 -3863
+-3392 -2511
+-4628 -1471
+-1961 -3789
+-2783 -3583
+-3486 -2652
+-2084 -3307
+-2361 -3520
+-3568 -2269
+-2428 -3390
+-2731 -3322
+-3008 -3067
+-5142 -808
+-4021 -1913
+-3600 -2423
+-1879 -4507
+-2902 -2907
+-2790 -3325
+-3749 -2061
+-4278 -1728
+-2407 -3597
+-2347 -3766
+-3671 -2647
+2918 4029
+3185 4445
+2637 3126
+3162 3983
+3033 4518
+2898 4001
+3118 3818
+2854 1385
+2818 4377
+2957 2300
+3094 3119
+3040 2649
+3079 2914
+3347 1310
+3232 2064
+3383 2916
+3527 3780
+2667 4199
+2860 2151
+2995 3161
+3298 4057
+3163 2902
+3318 3694
+3107 1996
+2853 2448
+2927 4169
+2954 1052
+2893 2598
+2939 3064
+2993 1013
+2792 3996
+3211 3226
+3076 3885
+2943 2748
+2928 3930
+2953 3012
+3039 1962
+3140 2110
+2991 3878
+2930 3650
+2873 3107
+2897 2983
+3107 2813
+3223 2366
+3246 4391
+2869 3684
+2706 1623
+3263 1425
+3007 3931
+3244 3060
+3142 2632
+3218 3530
+3058 1120
+2879 2784
+2285 3624
+2871 921
+2981 3129
+2725 2852
+2884 1657
+2891 1722
+3089 4797
+2984 1936
+3443 3679
+3165 3726
+2875 4545
+2865 2137
+3115 2169
+3012 1031
+3148 1722
+3142 2500
+2830 3383
+3084 3545
+3120 2423
+2765 2456
+2984 1631
+2981 3797
+2407 3704
+2885 3240
+3189 4081
+2653 3172
+2993 5084
+2940 3365
+2891 4528
+2677 4228
+3044 5899
+3124 426
+3213 1975
+2929 4583
+3164 4701
+3100 2755
+2951 -356
+3174 2629
+3129 4391
+2965 2793
+2527 4671
+3327 4180
+3187 2113
+3142 1422
+2904 3945
+2909 4102
+0 0
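
Note on test.fet.1: the layout above follows the README's description of a
feature file (first line: number of input dimensions; each later line: one data
point as space-separated numbers). A minimal Python sketch of a reader for that
layout is shown below. It is an illustration only, not code from KlustaKwik;
the name read_fet is hypothetical, and skipping blank lines is an assumption.

```python
import io


def read_fet(stream):
    """Parse feature-file text: dimension count, then one point per line."""
    num_dims = int(stream.readline())
    points = []
    # Data starts on line 2 of the file; keep the number for error messages.
    for lineno, line in enumerate(stream, start=2):
        if not line.strip():
            continue  # assumption: blank lines are tolerated and skipped
        values = [float(tok) for tok in line.split()]
        if len(values) != num_dims:
            raise ValueError(
                f"line {lineno}: expected {num_dims} values, got {len(values)}"
            )
        points.append(values)
    return num_dims, points


if __name__ == "__main__":
    # Three rows copied from test.fet.1, preceded by its header line.
    sample = io.StringIO("2\n-4326 -1834\n2918 4029\n0 0\n")
    dims, pts = read_fet(sample)
    print(dims, len(pts))
```

To read the real file, pass an open file object, for example
open("test.fet.1"), instead of the in-memory sample.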
--- /dev/null
+++ b/test_res.clu.1
@@ -0,0 +1,202 @@
+3
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+1
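
Note on test_res.clu.1: per the README, the first line of a cluster file is the
number of classes and each later line is the class assigned to the
corresponding data point, with class 1 reserved for the noise cluster. The
Python sketch below reads that layout and counts the points per class. It is an
illustration under those stated conventions, not code from KlustaKwik; the
names read_clu and class_sizes are hypothetical.

```python
import io
from collections import Counter

NOISE_CLASS = 1  # per the README, class 1 is the uniform "noise cluster"


def read_clu(stream):
    """Parse cluster-file text: class count, then one label per line."""
    num_classes = int(stream.readline())
    labels = [int(line) for line in stream if line.strip()]
    return num_classes, labels


def class_sizes(labels):
    """Count how many points were assigned to each class."""
    return Counter(labels)


if __name__ == "__main__":
    # A shortened stand-in for a cluster file: header, then four labels.
    sample = io.StringIO("3\n2\n2\n3\n1\n")
    num_classes, labels = read_clu(sample)
    sizes = class_sizes(labels)
    print("classes:", num_classes)
    print("noise points:", sizes[NOISE_CLASS])
```

In the files above, the last data point of test.fet.1 (0 0) corresponds to the
last label of test_res.clu.1, which is 1, so it was placed in the noise cluster.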