···
authors: "Francois Berenger"
maintainer: "unixjunkie@sdf.org"
homepage: "https://github.com/UnixJunkie/linwrap"
bug-reports: "https://github.com/UnixJunkie/linwrap/issues"
dev-repo: "git+https://github.com/UnixJunkie/linwrap.git"
license: "BSD-3-Clause"
build: ["dune" "build" "-p" name "-j" jobs]
install: ["cp" "bin/ecfp6.py" "%{bin}%/linwrap_ecfp6.py"]
"batteries" {>= "3.3.0"}
"conf-liblinear-tools"
"dokeysto" # possible perf. regr.: dokeysto_camltc -> dokeysto
"ocaml" {>= "5.0.0"} # because camltc not yet ready for ocaml>=5.0.0
"dolog" {>= "6.0.0"}
"minicli" {>= "5.0.0"}
"parany" {>= "11.0.0"}
# the software can compile and install without the depopts.
# however, some tools and options will not work anymore at run-time
synopsis: "Wrapper on top of liblinear-tools"
Linwrap can be used to train an L2-regularized logistic regression classifier
or a linear Support Vector Regressor.
You can optimize C (the L2 regularization parameter), w (the class weight)
or k (the number of bags, i.e. use bagging).
You can also find the optimal classification threshold using MCC maximization,
use k-fold cross-validation, parallelization, etc.
In the regression case, you can only optimize C and epsilon.

When using bagging, each model is trained on balanced bootstraps
from the training set (one bootstrap for the positive class,
one for the negative class).
The size of the bootstrap is the size of the smallest (under-represented)
-i <filename>: training set or DB to screen
[-o <filename>]: predictions output file
[-e <float>]: fix epsilon (for SVR);
  (0 <= epsilon <= max_i(|y_i|))
[-w <float>]: fix w1
[--no-plot]: no gnuplot
[-k <int>]: number of bags for bagging (default=off)
[{-n|--NxCV} <int>]: folds of cross validation
[--mcc-scan]: MCC scan for a trained model (requires n>1)
  also requires (c, w, k) to be known
[--seed <int>]: fix random seed
[-p <float>]: training set portion (in [0.0:1.0])
[--pairs]: read from .AP files (atom pairs; will offset feat. indexes by 1)
[--train <train.liblin>]: training set (overrides -p)
[--valid <valid.liblin>]: validation set (overrides -p)
[--test <test.liblin>]: test set (overrides -p)
[{-l|--load} <filename>]: prod. mode; use trained models
[{-s|--save} <filename>]: train. mode; save trained models
[-f]: force overwriting existing model file
[--scan-c]: scan for best C
[--scan-e <int>]: epsilon scan #steps for SVR
[--regr]: regression (SVR); also, implied by -e and --scan-e
[--scan-w]: scan weight to counter class imbalance
[--w-range <float>:<int>:<float>]: specific range for w
  (semantic=start:nsteps:stop)
[--e-range <float>:<int>:<float>]: specific range for e
  (semantic=start:nsteps:stop)
[--c-range <float,float,...>] explicit scan range for C
  (example='0.01,0.02,0.03')
[--k-range <int,int,...>] explicit scan range for k
  (example='1,2,3,5,10')
[--scan-k]: scan number of bags (advice: optim. k rather than w)
src: "https://github.com/UnixJunkie/linwrap/archive/v9.1.5.tar.gz"
checksum: "md5=f59e8b0452a5bb33f0fe239e524b5b40"
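
The package description mentions that, with bagging, each model is trained on balanced bootstraps (one for the positive class, one for the negative class). The following is a minimal Python sketch of that sampling idea only; it is not linwrap's OCaml implementation. The function names `balanced_bootstrap` and `make_bags` are hypothetical, and the choice of the smaller class size as the per-class sample size is an assumption drawn from the wording of the description.

```python
import random


def balanced_bootstrap(positives, negatives, rng):
    """Draw one balanced bootstrap from a two-class training set.

    Assumption: each class is sampled with replacement, and both samples
    have the size of the smaller (under-represented) class.
    """
    n = min(len(positives), len(negatives))
    pos_sample = [rng.choice(positives) for _ in range(n)]
    neg_sample = [rng.choice(negatives) for _ in range(n)]
    return pos_sample, neg_sample


def make_bags(positives, negatives, k, seed=0):
    """Build k balanced bootstraps; one model would be trained per bag."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [balanced_bootstrap(positives, negatives, rng) for _ in range(k)]


if __name__ == "__main__":
    # Hypothetical toy data: 3 positive and 20 negative example identifiers.
    pos = ["p0", "p1", "p2"]
    neg = ["n%d" % i for i in range(20)]
    for pos_bag, neg_bag in make_bags(pos, neg, k=5, seed=42):
        print(len(pos_bag), len(neg_bag))
```

In this sketch, every bag holds as many negatives as positives, so a classifier trained on a bag sees a balanced set even when the full training set is heavily imbalanced.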