CorelsClassifier

This class includes functionality for training and predicting using the CORELS algorithm.

class corels.CorelsClassifier(c=0.01, n_iter=10000, map_type='prefix', policy='lower_bound', verbosity=['rulelist'], ablation=0, max_card=2, min_support=0.01)[source]

Certifiably Optimal RulE ListS classifier.

This class implements the CORELS algorithm, designed to produce human-interpretable, optimal rulelists for binary feature data and binary classification. As an alternative to tree-based algorithms such as CART, CORELS provides a certificate of optimality for its rulelist given a training set, leveraging multiple algorithmic bounds to do so.

To run the algorithm, create an instance of the CorelsClassifier class, passing any necessary parameters to its constructor, and then call fit to generate a rulelist. printrl prints the generated rulelist, while predict returns classification predictions for a separate test dataset with the same features. To measure the model’s accuracy, run score on a labeled evaluation dataset. To save a generated rulelist to a file, call save; to load it back from the file, call load.

c

Regularization parameter. Higher values penalize longer rulelists.

Type:float, optional (default=0.01)
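
To make the effect of c concrete: in the CORELS paper, the objective of a rulelist is its misclassification rate plus c times the number of rules it contains. The sketch below assumes that form; the helper name objective and the error counts are hypothetical and not part of the corels package.

```python
def objective(n_errors, n_samples, n_rules, c=0.01):
    """Misclassification rate plus a penalty of c for every rule in the list."""
    return n_errors / n_samples + c * n_rules


# A 3-rule list with 5 errors versus a 1-rule list with 10 errors, on 100 samples.
long_small_c = objective(n_errors=5, n_samples=100, n_rules=3, c=0.01)
short_small_c = objective(n_errors=10, n_samples=100, n_rules=1, c=0.01)
print(long_small_c < short_small_c)   # with a small c, the longer list scores better

long_big_c = objective(n_errors=5, n_samples=100, n_rules=3, c=0.05)
short_big_c = objective(n_errors=10, n_samples=100, n_rules=1, c=0.05)
print(long_big_c > short_big_c)       # with a larger c, the shorter list scores better
```

Lower objective values are better, so raising c shifts the optimum toward shorter rulelists.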
n_iter

Maximum number of nodes (rulelists) to search before exiting.

Type:int, optional (default=10000)
map_type

The type of prefix map to use. Supported maps are “none” for no map, “prefix” for a map that uses rule prefixes as keys, and “captured” for a map that uses a prefix’s captured vector as keys.

Type:str, optional (default=“prefix”)
policy

The search policy for traversing the tree (i.e. the criterion with which to order nodes in the queue). Supported criteria are “bfs”, for breadth-first search; “curious”, which attempts to find the most promising node; “lower_bound” which is the objective function evaluated with that rulelist minus the default prediction error; “objective” for the objective function evaluated at that rulelist; and “dfs” for depth-first search.

Type:str, optional (default=“lower_bound”)
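
One way to picture the policy is as the sort key of a priority queue of candidate prefixes. The sketch below is a simplified illustration using Python's heapq, not the library's implementation: the node fields, the key functions, and the helper exploration_order are assumptions. The “curious” policy is left out because its formula is not given here, and a real search would add child nodes as it goes rather than order a fixed list.

```python
import heapq

# Hypothetical search nodes, each holding a rule prefix and two scores.
nodes = [
    {"prefix": ("r1",), "lower_bound": 0.12, "objective": 0.30},
    {"prefix": ("r2", "r3"), "lower_bound": 0.05, "objective": 0.35},
    {"prefix": ("r4", "r5", "r6"), "lower_bound": 0.20, "objective": 0.22},
]

# A smaller key means the node is explored earlier.
PRIORITY = {
    "bfs": lambda n: len(n["prefix"]),        # shallow prefixes first
    "dfs": lambda n: -len(n["prefix"]),       # deep prefixes first
    "lower_bound": lambda n: n["lower_bound"],
    "objective": lambda n: n["objective"],
}


def exploration_order(nodes, policy):
    """Return the prefixes in the order a queue with this policy would pop them."""
    key = PRIORITY[policy]
    # The index i breaks ties so that the dicts themselves are never compared.
    heap = [(key(n), i, n) for i, n in enumerate(nodes)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, node = heapq.heappop(heap)
        order.append(node["prefix"])
    return order


for policy in PRIORITY:
    print(policy, exploration_order(nodes, policy))
```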
verbosity

The verbosity levels required. A list of strings, it can contain any subset of [“rulelist”, “rule”, “label”, “minor”, “samples”, “progress”, “mine”, “loud”]. An empty list ([]) indicates ‘silent’ mode.

  • “rulelist” prints the generated rulelist at the end.
  • “rule” prints a summary of each rule generated.
  • “label” prints a summary of the class labels.
  • “minor” prints a summary of the minority bound.
  • “samples” produces a complete dump of the rules, label, and/or minor data. You must also provide at least one of “rule”, “label”, or “minor” to specify which data you want to dump, or “loud” for all data. The “samples” option often spits out a lot of output.
  • “progress” prints periodic messages as corels runs.
  • “mine” prints debug information while mining rules, including each rule as it is generated.
  • “loud” is the equivalent of [“progress”, “label”, “rule”, “mine”, “minor”].
Type:list, optional (default=[“rulelist”])
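
The constraints on verbosity described above can be checked before constructing the classifier. The helper below is a hypothetical sketch (check_verbosity is not part of corels) that encodes only the rules stated in this section: every entry must be a known level, and “samples” needs a companion level.

```python
ALLOWED = {"rulelist", "rule", "label", "minor", "samples", "progress", "mine", "loud"}


def check_verbosity(verbosity):
    """Return True if the list satisfies the constraints documented for verbosity."""
    levels = set(verbosity)
    # Every entry must be one of the documented level names.
    if not levels <= ALLOWED:
        return False
    # "samples" needs "rule", "label", "minor", or "loud" to say what to dump.
    if "samples" in levels and not levels & {"rule", "label", "minor", "loud"}:
        return False
    return True


print(check_verbosity([]))                   # silent mode
print(check_verbosity(["samples"]))          # missing a companion level
print(check_verbosity(["samples", "rule"]))
```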
ablation

Specifies additional parameters for the bounds used while searching. Accepted values are 0 (all bounds), 1 (no antecedent support bound), and 2 (no lookahead bound).

Type:int, optional (default=0)
max_card

Maximum cardinality allowed when mining rules. Can be any value greater than or equal to 1. For instance, a value of 2 would only allow rules that combine at most two features in their antecedents.

Type:int, optional (default=2)
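
To get a feel for how max_card grows the pool of candidate rules, the sketch below enumerates feature conjunctions of size 1 up to max_card with itertools.combinations. It is only an illustration: candidate_antecedents is a hypothetical helper, and the actual miner may differ, for example by also considering negated features or by discarding rules based on support.

```python
from itertools import combinations


def candidate_antecedents(feature_names, max_card=2):
    """Enumerate conjunctions of 1..max_card distinct features."""
    antecedents = []
    for k in range(1, max_card + 1):
        antecedents.extend(combinations(feature_names, k))
    return antecedents


features = ["a", "b", "c", "d"]
print(len(candidate_antecedents(features, max_card=1)))  # 4 single features
print(len(candidate_antecedents(features, max_card=2)))  # 4 singles + 6 pairs = 10
print(len(candidate_antecedents(features, max_card=3)))  # 10 + 4 triples = 14
```

Because the count grows combinatorially with the number of features, larger values of max_card can make both mining and search noticeably slower.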
min_support

The fraction of samples that a rule must capture in order to be used. 1 minus this value is also the maximum fraction of samples a rule can capture. Can be any value between 0.0 and 0.5.

Type:float, optional (default=0.01)
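
The support window can be illustrated with a small check: a rule is kept only if the fraction of samples it captures lies between min_support and 1 minus min_support. The helper within_support is hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
def within_support(captured, min_support=0.01):
    """captured holds one 0/1 flag per sample: 1 if the rule's antecedent matches."""
    fraction = sum(captured) / len(captured)
    # Assumption: both ends of the window are inclusive.
    return min_support <= fraction <= 1 - min_support


rare_rule = [1] + [0] * 9        # captures 10% of samples
balanced_rule = [1] * 5 + [0] * 5  # captures 50% of samples
broad_rule = [1] * 9 + [0]       # captures 90% of samples

print(within_support(rare_rule, min_support=0.2))      # False: too few samples
print(within_support(balanced_rule, min_support=0.2))  # True
print(within_support(broad_rule, min_support=0.2))     # False: too many samples
```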

References

Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. Learning Certifiably Optimal Rule Lists for Categorical Data. KDD 2017. Journal of Machine Learning Research, 2018; 19: 1-77. arXiv:1704.01701, 2017

Examples

>>> import numpy as np
>>> from corels import CorelsClassifier
>>> X = np.array([ [1, 0, 1], [0, 1, 0], [1, 1, 1] ])
>>> y = np.array([ 1, 0, 1])
>>> c = CorelsClassifier(verbosity=[])
>>> c.fit(X, y)
...
>>> print(c.predict(X))
[ True False  True]
fit(X, y, features=[], prediction_name='prediction')[source]

Build a CORELS classifier from the training set (X, y).

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – The training input samples. All features must be binary, and the matrix is internally converted to dtype=np.uint8.
  • y (array-like, shape = [n_samples]) – The target values for the training input. Must be binary.
  • features (list, optional(default=[])) – A list of strings of length n_features. Specifies the names of each of the features. If an empty list is provided, the feature names are set to the default of [“feature1”, “feature2”… ].
  • prediction_name (string, optional(default="prediction")) – The name of the feature that is being predicted.
Returns:

self

Return type:

obj
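
Since fit requires binary features, categorical data has to be encoded first. The sketch below shows one common way to do that, one-hot encoding, in plain Python. The helper one_hot and the sample data are hypothetical; the resulting X and feature names are shaped the way the X and features parameters above describe.

```python
def one_hot(rows, column_names):
    """Turn categorical rows into 0/1 rows plus a matching list of feature names."""
    # Distinct values seen in each column, in sorted order for reproducibility.
    values = [sorted({row[j] for row in rows}) for j in range(len(column_names))]
    features = [
        f"{name}={value}"
        for name, column_values in zip(column_names, values)
        for value in column_values
    ]
    X = [
        [int(row[j] == value) for j, column_values in enumerate(values) for value in column_values]
        for row in rows
    ]
    return X, features


rows = [("red", "small"), ("blue", "small"), ("red", "large")]
X, features = one_hot(rows, ["color", "size"])
print(features)  # ['color=blue', 'color=red', 'size=large', 'size=small']
print(X)         # [[0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
```

Passing the generated names through the features argument should make the printed rulelist refer to readable conditions such as color=red rather than the default feature names.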

get_params()[source]

Get all of the model’s parameters as a dictionary.

Returns:params – Dictionary of all parameters, with the names of the parameters as the keys
Return type:dict
load(fname)[source]

Load a model from a file, using python’s pickle module.

Parameters:fname (string) – File name to load the model from
Returns:self
Return type:obj
predict(X)[source]

Predict classifications of the input samples X.

Parameters:X (array-like, shape = [n_samples, n_features]) – The input samples to classify. All features must be binary, and the matrix is internally converted to dtype=np.uint8. The features must be the same as those of the data used to train the model.
Returns:p – The classifications of the input samples.
Return type:array of shape = [n_samples]
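
Conceptually, a rulelist classifies a sample by checking its rules in order and returning the label of the first rule whose antecedent matches, falling back to a default label otherwise. The sketch below illustrates that idea in plain Python; the rule representation and the helper predict_one are assumptions made for illustration and do not reflect the library's internal data structures.

```python
def predict_one(sample, rules, default):
    """Return the label of the first matching rule, or the default if none match.

    rules is a list of (feature_indices, label) pairs; a rule matches when
    every listed feature equals 1 in the sample.
    """
    for feature_indices, label in rules:
        if all(sample[i] == 1 for i in feature_indices):
            return label
    return default


# "if feature 0 and feature 2 then True, else if feature 1 then False, else True"
rules = [((0, 2), True), ((1,), False)]
X = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
preds = [predict_one(x, rules, default=True) for x in X]
print(preds)  # [True, False, True]
```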
rl(set_val=None)[source]

Return or set the learned rulelist.

Parameters:set_val (RuleList, optional) – Rulelist to set the model to
Returns:rl – The model’s rulelist
Return type:obj
save(fname)[source]

Save the model to a file, using python’s pickle module.

Parameters:fname (string) – File name to store the model in
Returns:self
Return type:obj
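
Because save and load rely on pickle, the round trip behaves like any other pickle dump and load. The sketch below shows that mechanism with a plain dictionary standing in for a model; the dictionary contents and the file name are hypothetical. As with any pickle file, only load files from sources you trust, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

# A stand-in for a trained model's state (not the real object layout).
model_state = {"c": 0.01, "rules": [((0, 2), True)], "default": False}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Roughly what saving does: serialize the object to a binary file.
with open(path, "wb") as f:
    pickle.dump(model_state, f)

# Roughly what loading does: read the object back from that file.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_state)
```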
score(X, y)[source]

Score the algorithm on the input samples X with the labels y. Alternatively, score the predictions X against the labels y (where X has been generated by predict or something similar).

Parameters:
  • X (array-like, shape = [n_samples, n_features] OR shape = [n_samples]) – The input samples, or the sample predictions. All features must be binary.
  • y (array-like, shape = [n_samples]) – The input labels. All labels must be binary.
Returns:

a – The accuracy, from 0.0 to 1.0, of the rulelist predictions

Return type:

float
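
The accuracy returned here is the fraction of predictions that match the labels. A minimal sketch of that computation for the second calling form, where predictions are supplied directly, is shown below; the helper accuracy is hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that agree with the labels, from 0.0 to 1.0."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    # Compare as booleans so that True/False and 1/0 are treated alike.
    correct = sum(bool(p) == bool(t) for p, t in zip(predictions, labels))
    return correct / len(labels)


print(accuracy([True, False, True], [1, 0, 1]))  # 1.0
print(accuracy([True, False, True], [1, 0, 0]))  # two of three correct
```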

set_params(**params)[source]

Set model parameters. Takes an arbitrary number of keyword parameters, all of which must be valid parameter names (i.e. must be included in those returned by get_params).

Returns:self
Return type:obj