Class weka.filters.DiscretizeFilter

java.lang.Object
    |
    +----weka.filters.Filter
            |
            +----weka.filters.DiscretizeFilter

public class DiscretizeFilter
extends Filter
implements OptionHandler, WeightedInstancesHandler
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization can be either by simple binning, or by Fayyad & Irani's MDL method (the default).

Valid filter-specific options are:

-B num
Specify the (maximum) number of bins to divide numeric attributes into. (default class-based discretisation).

-O
Optimizes the number of bins using a leave-one-out estimate of the entropy.

-R col1,col2-col4,...
Specify list of columns to discretize. First and last are valid indexes. (default none)

-V
Invert matching sense.

-D
Make binary nominal attributes.

-E
Use better encoding of split point for MDL.

-K
Use Kononenko's MDL criterion.
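
The setOptions documentation describes the bins selected with -B as equal-width. As a rough illustration only, the following sketch computes equal-width cut points for a single numeric attribute. The class EqualWidthBinning and its method are hypothetical names and are not part of WEKA; the filter's own handling of ties, missing values and instance weights is not shown and may differ.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of WEKA: equal-width cut points for one numeric attribute. */
final class EqualWidthBinning {

    /** Returns the numBins - 1 interior boundaries that split [min, max] into equal-width bins. */
    static double[] cutPoints(double[] values, int numBins) {
        if (values.length == 0 || numBins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        if (numBins == 1 || max == min) {
            return new double[0]; // a single bin needs no cut points
        }
        double width = (max - min) / numBins;
        double[] cuts = new double[numBins - 1];
        for (int i = 0; i < cuts.length; i++) {
            cuts[i] = min + width * (i + 1);
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] values = {0.0, 2.5, 4.0, 7.5, 10.0};
        // Four bins over [0, 10] give three interior boundaries.
        System.out.println(Arrays.toString(cutPoints(values, 4))); // prints [2.5, 5.0, 7.5]
    }
}
```

Returning only the interior boundaries mirrors the shape of the array described for getCutPoints(int): k bins are described by k - 1 cut points.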

Version:
$Revision: 1.15 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz)
Author:
Eibe Frank (eibe@cs.waikato.ac.nz) (Fayyad and Irani's method)

Constructor Index

 o DiscretizeFilter()
Constructor - initialises the filter

Method Index

 o attributeIndicesTipText()
Returns the tip text for this property
 o batchFinished()
Signifies that this batch of input to the filter is finished.
 o binsTipText()
Returns the tip text for this property
 o findNumBinsTipText()
Returns the tip text for this property
 o getAttributeIndices()
Gets the current range selection
 o getBins()
Gets the number of bins numeric attributes will be divided into
 o getCutPoints(int)
Gets the cut points for an attribute
 o getFindNumBins()
Gets whether the number of bins is optimized using a leave-one-out estimate of the entropy.
 o getInvertSelection()
Gets whether the attribute selection is inverted
 o getMakeBinary()
Gets whether binary attributes should be made for discretized ones.
 o getOptions()
Gets the current settings of the filter.
 o getUseBetterEncoding()
Gets whether better encoding is to be used for MDL.
 o getUseKononenko()
Gets whether Kononenko's MDL criterion is to be used.
 o getUseMDL()
Gets whether MDL will be used as the discretisation method.
 o globalInfo()
Returns a string describing this filter
 o input(Instance)
Input an instance for filtering.
 o invertSelectionTipText()
Returns the tip text for this property
 o listOptions()
Gets an enumeration describing the available options
 o main(String[])
Main method for testing this class.
 o makeBinaryTipText()
Returns the tip text for this property
 o setAttributeIndices(String)
Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
 o setAttributeIndicesArray(int[])
Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
 o setBins(int)
Sets the number of bins to divide each selected numeric attribute into
 o setFindNumBins(boolean)
Sets whether the number of bins is optimized using a leave-one-out estimate of the entropy.
 o setInputFormat(Instances)
Sets the format of the input instances.
 o setInvertSelection(boolean)
Sets whether the attribute selection is inverted.
 o setMakeBinary(boolean)
Sets whether binary attributes should be made for discretized ones.
 o setOptions(String[])
Parses the options for this object.
 o setUseBetterEncoding(boolean)
Sets whether better encoding is to be used for MDL.
 o setUseKononenko(boolean)
Sets whether Kononenko's MDL criterion is to be used.
 o setUseMDL(boolean)
Sets whether MDL will be used as the discretisation method.
 o useBetterEncodingTipText()
Returns the tip text for this property
 o useKononenkoTipText()
Returns the tip text for this property
 o useMDLTipText()
Returns the tip text for this property

Constructor Detail

 o DiscretizeFilter
public DiscretizeFilter()
          Constructor - initialises the filter

Method Detail

 o listOptions
public java.util.Enumeration listOptions()
          Gets an enumeration describing the available options
Returns:
an enumeration of all the available options
 o setOptions
public void setOptions(java.lang.String options[]) throws java.lang.Exception
          Parses the options for this object. Valid options are:

-B num
Specify the (maximum) number of equal-width bins to divide numeric attributes into. (default class-based discretization).

-O
Optimizes the number of bins using a leave-one-out estimate of the entropy.

-R col1,col2-col4,...
Specify list of columns to discretize. First and last are valid indexes. (default none)

-V
Invert matching sense.

-D
Make binary nominal attributes.

-E
Use better encoding of split point for MDL.

-K
Use Kononenko's MDL criterion.

Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
 o getOptions
public java.lang.String[] getOptions()
          Gets the current settings of the filter.
Returns:
an array of strings suitable for passing to setOptions
 o setInputFormat
public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
          Sets the format of the input instances.
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true if the outputFormat may be collected immediately
Throws:
java.lang.Exception - if the input format can't be set successfully
Overrides:
setInputFormat in class Filter
 o input
public boolean input(Instance instance)
          Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.
Parameters:
instance - the input instance
Returns:
true if the filtered instance may now be collected with output().
Throws:
java.lang.IllegalStateException - if no input format has been defined.
Overrides:
input in class Filter
 o batchFinished
public boolean batchFinished()
          Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
Returns:
true if there are instances pending output
Throws:
java.lang.IllegalStateException - if no input structure has been defined
Overrides:
batchFinished in class Filter
 o globalInfo
public java.lang.String globalInfo()
          Returns a string describing this filter
Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui
 o findNumBinsTipText
public java.lang.String findNumBinsTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getFindNumBins
public boolean getFindNumBins()
          Gets whether the number of bins is optimized using a leave-one-out estimate of the entropy.
Returns:
true if the number of bins will be optimized
 o setFindNumBins
public void setFindNumBins(boolean newFindNumBins)
          Sets whether the number of bins is optimized using a leave-one-out estimate of the entropy.
Parameters:
newFindNumBins - true if the number of bins should be optimized
 o makeBinaryTipText
public java.lang.String makeBinaryTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getMakeBinary
public boolean getMakeBinary()
          Gets whether binary attributes should be made for discretized ones.
Returns:
true if attributes will be binarized
 o setMakeBinary
public void setMakeBinary(boolean makeBinary)
          Sets whether binary attributes should be made for discretized ones.
Parameters:
makeBinary - true if binary attributes are to be made
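
This page does not say how the binary attributes are derived from the cut points. One plausible encoding, shown here purely as an assumption, produces one binary attribute per cut point, set to 1 when the value lies above that cut point. BinaryEncodingSketch is a hypothetical class, not part of WEKA.

```java
/** Hypothetical sketch, not WEKA code: one binary indicator per cut point. */
final class BinaryEncodingSketch {

    /**
     * Encodes a numeric value as cutPoints.length binary indicators.
     * Assumption: indicator j is 1 when value > cutPoints[j], else 0.
     */
    static int[] encode(double[] cutPoints, double value) {
        int[] bits = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            bits[j] = value > cutPoints[j] ? 1 : 0;
        }
        return bits;
    }
}
```

Under this assumed encoding, an attribute with k bins becomes k - 1 binary attributes, and the ordering of the bins is preserved in the pattern of ones.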
 o useMDLTipText
public java.lang.String useMDLTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getUseMDL
public boolean getUseMDL()
          Gets whether MDL will be used as the discretisation method.
Returns:
true if so, false if fixed bins should be used.
 o setUseMDL
public void setUseMDL(boolean useMDL)
          Sets whether MDL will be used as the discretisation method.
Parameters:
useMDL - true if MDL should be used, false if fixed bins should be used.
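
For orientation, the sketch below shows a single step of entropy-based discretization with the MDL acceptance test as published by Fayyad and Irani: choose the boundary with the highest information gain, then accept it only if the gain exceeds (log2(N - 1) + delta) / N. It is a minimal illustration, not the filter's implementation. MdlSplitSketch and its methods are hypothetical names, the cut point is assumed to be the midpoint between neighbouring values, instance weights are ignored, and the full method applies this step recursively to each resulting interval.

```java
/** Hypothetical sketch, not WEKA code: one entropy-based split with an MDL acceptance test. */
final class MdlSplitSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Shannon entropy, in bits, of a vector of class counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Number of classes that actually occur in the counts. */
    static int classesPresent(int[] counts) {
        int k = 0;
        for (int c : counts) if (c > 0) k++;
        return k;
    }

    /**
     * Values must be sorted ascending, with labels in [0, numClasses).
     * Returns the accepted cut point, or Double.NaN if no split passes the MDL test.
     */
    static double bestCut(double[] sortedValues, int[] labels, int numClasses) {
        int n = sortedValues.length;
        int[] all = new int[numClasses];
        for (int label : labels) all[label]++;
        double hAll = entropy(all);

        int[] left = new int[numClasses];
        int[] bestLeft = null;
        int[] bestRight = null;
        double bestGain = Double.NEGATIVE_INFINITY;
        double cut = Double.NaN;
        for (int i = 0; i < n - 1; i++) {
            left[labels[i]]++;
            if (sortedValues[i] == sortedValues[i + 1]) continue; // no boundary between equal values
            int[] right = new int[numClasses];
            for (int c = 0; c < numClasses; c++) right[c] = all[c] - left[c];
            int nLeft = i + 1;
            double hSplit = (nLeft * entropy(left) + (n - nLeft) * entropy(right)) / n;
            double gain = hAll - hSplit;
            if (gain > bestGain) {
                bestGain = gain;
                cut = (sortedValues[i] + sortedValues[i + 1]) / 2.0; // assumed: midpoint
                bestLeft = left.clone();
                bestRight = right;
            }
        }
        if (bestLeft == null) return Double.NaN; // fewer than two distinct values

        int k = classesPresent(all);
        double delta = log2(Math.pow(3, k) - 2)
                - (k * hAll
                   - classesPresent(bestLeft) * entropy(bestLeft)
                   - classesPresent(bestRight) * entropy(bestRight));
        double threshold = (log2(n - 1) + delta) / n;
        return bestGain > threshold ? cut : Double.NaN;
    }
}
```

With values 1, 2, 3, 4, 10, 11, 12, 13 and labels 0, 0, 0, 0, 1, 1, 1, 1 the sketch accepts a cut at 7.0; with values 1, 2, 3, 4 and alternating labels no split passes the test and the attribute would stay in a single bin.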
 o useKononenkoTipText
public java.lang.String useKononenkoTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getUseKononenko
public boolean getUseKononenko()
          Gets whether Kononenko's MDL criterion is to be used.
Returns:
true if Kononenko's criterion will be used.
 o setUseKononenko
public void setUseKononenko(boolean useKon)
          Sets whether Kononenko's MDL criterion is to be used.
Parameters:
useKon - true if Kononenko's criterion is to be used
 o useBetterEncodingTipText
public java.lang.String useBetterEncodingTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getUseBetterEncoding
public boolean getUseBetterEncoding()
          Gets whether better encoding is to be used for MDL.
Returns:
true if the better MDL encoding will be used
 o setUseBetterEncoding
public void setUseBetterEncoding(boolean useBetterEncoding)
          Sets whether better encoding is to be used for MDL.
Parameters:
useBetterEncoding - true if better encoding is to be used.
 o binsTipText
public java.lang.String binsTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getBins
public int getBins()
          Gets the number of bins numeric attributes will be divided into
Returns:
the number of bins.
 o setBins
public void setBins(int numBins)
          Sets the number of bins to divide each selected numeric attribute into
Parameters:
numBins - the number of bins
 o invertSelectionTipText
public java.lang.String invertSelectionTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getInvertSelection
public boolean getInvertSelection()
          Gets whether the attribute selection is inverted
Returns:
true if the attributes not in the supplied selection will be discretized
 o setInvertSelection
public void setInvertSelection(boolean invert)
          Sets whether the attribute selection is inverted. If true, the numeric attributes not in the supplied selection are discretized. If false, the numeric attributes in the supplied selection are discretized.
Parameters:
invert - the new invert setting
 o attributeIndicesTipText
public java.lang.String attributeIndicesTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui
 o getAttributeIndices
public java.lang.String getAttributeIndices()
          Gets the current range selection
Returns:
a string containing a comma separated list of ranges
 o setAttributeIndices
public void setAttributeIndices(java.lang.String rangeList)
          Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
Parameters:
rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
e.g. first-3,5,6-last
Throws:
java.lang.IllegalArgumentException - if an invalid range list is supplied
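
To make the 1-based range syntax concrete, the following sketch expands a range list such as first-3,5,6-last into 0-based attribute indices. RangeListSketch is a hypothetical class written for this page and is not WEKA's own range parser; details such as how duplicates or descending ranges are treated are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not WEKA code: expands a 1-based range list into 0-based indices. */
final class RangeListSketch {

    /** Parses e.g. "first-3,5,6-last" for a dataset with numAttributes attributes. */
    static int[] parse(String rangeList, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A token without a dash is a single index, i.e. a range of length one.
            int lo = toIndex(dash < 0 ? token : token.substring(0, dash), numAttributes);
            int hi = toIndex(dash < 0 ? token : token.substring(dash + 1), numAttributes);
            if (lo > hi) {
                // Assumption: descending ranges are rejected rather than reversed.
                throw new IllegalArgumentException("Invalid range: " + token);
            }
            for (int i = lo; i <= hi; i++) {
                result.add(i); // duplicates are kept as written
            }
        }
        int[] indices = new int[result.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = result.get(i);
        }
        return indices;
    }

    /** Converts "first", "last" or a 1-based number into a 0-based index. */
    private static int toIndex(String token, int numAttributes) {
        String t = token.trim().toLowerCase();
        int oneBased;
        if (t.equals("first")) {
            oneBased = 1;
        } else if (t.equals("last")) {
            oneBased = numAttributes;
        } else {
            try {
                oneBased = Integer.parseInt(t);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid index: " + token, e);
            }
        }
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }
}
```

For a dataset with 8 attributes, parse("first-3,5,6-last", 8) yields the 0-based indices 0, 1, 2, 4, 5, 6, 7, which is the form setAttributeIndicesArray(int[]) is documented to expect.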
 o setAttributeIndicesArray
public void setAttributeIndicesArray(int attributes[])
          Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
Parameters:
attributes - an array containing indexes of attributes to discretize. Since the array will typically come from a program, attributes are indexed from 0.
Throws:
java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
 o getCutPoints
public double[] getCutPoints(int attributeIndex)
          Gets the cut points for an attribute
Parameters:
attributeIndex - the index (from 0) of the attribute to get the cut points of
Returns:
an array containing the cut points (or null if the attribute requested isn't being discretized)
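
A cut-point array of the kind returned here can be used to look up the bin a value falls into. The sketch below assumes the cut points are sorted ascending and that each bin is closed on its upper boundary, so a value equal to a cut point falls into the lower bin; that convention is an assumption, not something this page states. CutPointLookup is a hypothetical class, not part of WEKA.

```java
/** Hypothetical sketch, not WEKA code: maps a value to a bin index given sorted cut points. */
final class CutPointLookup {

    /**
     * Returns the 0-based bin index for value.
     * Assumption: bin i covers (cutPoints[i - 1], cutPoints[i]], so k cut points give k + 1 bins.
     */
    static int binIndex(double[] cutPoints, double value) {
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }
}
```

With cut points 2.5, 5.0 and 7.5, the value 1.0 maps to bin 0, 2.5 also maps to bin 0, 2.6 maps to bin 1, and 9.0 maps to bin 3.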
 o main
public static void main(java.lang.String argv[])
          Main method for testing this class.
Parameters:
argv - should contain arguments to the filter: use -h for help
