opam-version: "2.0"
authors: "Francois Berenger"
maintainer: "unixjunkie@sdf.org"
homepage: "https://github.com/UnixJunkie/linwrap"
bug-reports: "https://github.com/UnixJunkie/linwrap/issues"
dev-repo: "git+https://github.com/UnixJunkie/linwrap.git"
license: "BSD-3-Clause"
build: ["dune" "build" "-p" name "-j" jobs]
install: ["cp" "bin/ecfp6.py" "%{bin}%/linwrap_ecfp6.py"]
depends: [
  "base-unix"
  "batteries" {>= "3.3.0"}
  "bst"
  "conf-liblinear-tools"
  "cpm" {>= "11.0.0"}
  "dokeysto" # possible perf. regr.: dokeysto_camltc -> dokeysto
  "ocaml" {>= "5.0.0"} # because camltc not yet ready for ocaml>=5.0.0
  "dolog" {>= "6.0.0"}
  "dune" {>= "1.10"}
  "minicli" {>= "5.0.0"}
  "molenc"
  "parany" {>= "11.0.0"}
]
# The software compiles and installs without the depopts;
# however, some tools and options will not work at run-time.
depopts: [
  "conf-gnuplot"
  "conf-python-3"
  "conf-rdkit"
]
synopsis: "Wrapper on top of liblinear-tools"
description: """
Linwrap can be used to train an L2-regularized logistic regression classifier
or a linear Support Vector Regressor (SVR).
You can optimize C (the L2 regularization parameter), w (the class weight)
or k (the number of bags, i.e. use bagging).
You can also find the optimal classification threshold by MCC maximization,
use k-fold cross-validation, parallelization, etc.
In the regression case, only C and epsilon can be optimized.

When bagging is used, each model is trained on balanced bootstraps
drawn from the training set (one bootstrap for the positive class,
one for the negative class).
Each bootstrap has the size of the smaller (under-represented) class.
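As a sketch, a bagged training run might look as follows; the file names
are placeholders, and the flags are those documented in the usage below:

```shell
# Hypothetical invocation: train 10 bagged models with 5-fold
# cross-validation on 8 cores, saving the trained models to a file.
linwrap -i train.liblin -k 10 -n 5 -np 8 -s models.bin
```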
usage: linwrap
  -i <filename>: training set or DB to screen
  [-o <filename>]: predictions output file
  [-np <int>]: ncores
  [-c <float>]: fix C
  [-e <float>]: fix epsilon (for SVR);
                (0 <= epsilon <= max_i(|y_i|))
  [--iwn]: turn ON instance-wise-normalization
  [-w <float>]: fix w1
  [--no-plot]: no gnuplot
  [-k <int>]: number of bags for bagging (default=off)
  [{-n|--NxCV} <int>]: folds of cross validation
  [--mcc-scan]: MCC scan for a trained model (requires n>1);
                also requires (c, w, k) to be known
  [-q]: quiet liblinear
  [--seed <int>]: fix random seed
  [-p <float>]: training set portion (in [0.0:1.0])
  [-pr]: optimize PR_AUC (default=ROC_AUC)
  [--pairs]: read from .AP files (atom pairs; will offset feat. indexes by 1)
  [--train <train.liblin>]: training set (overrides -p)
  [--valid <valid.liblin>]: validation set (overrides -p)
  [--test <test.liblin>]: test set (overrides -p)
  [{-l|--load} <filename>]: prod. mode; use trained models
  [{-s|--save} <filename>]: train. mode; save trained models
  [-f]: force overwriting existing model file
  [--scan-c]: scan for best C
  [--scan-e <int>]: epsilon scan #steps for SVR
  [--regr]: regression (SVR); also implied by -e and --scan-e
  [--scan-w]: scan weight to counter class imbalance
  [--w-range <float>:<int>:<float>]: specific range for w
                                     (semantic=start:nsteps:stop)
  [--e-range <float>:<int>:<float>]: specific range for e
                                     (semantic=start:nsteps:stop)
  [--c-range <float,float,...>]: explicit scan range for C
                                 (example='0.01,0.02,0.03')
  [--k-range <int,int,...>]: explicit scan range for k
                             (example='1,2,3,5,10')
  [--scan-k]: scan number of bags (advice: optimize k rather than w)
  [--dump-AD <filename>]: dump AD points to file
                          (also requires --regr, --pairs and n>1)
"""
url {
  src: "https://github.com/UnixJunkie/linwrap/archive/v9.2.0.tar.gz"
  checksum: [
    "sha256=93e4bb71116b5ba3bd0a4baa62ca6521c8b17ade0848299778e0f18ffbd6005a"
    "md5=a61342684e0ba7db2757c7aa60c84744"
  ]
}