java.lang.Object
  org.apache.mahout.clustering.kmeans.KMeansClusterer
public class KMeansClusterer
This class implements the k-means clustering algorithm, using Cluster as the cluster
representation. It can be used as part of a clustering job that is started as a map/reduce job.
| Constructor Summary | |
|---|---|
| KMeansClusterer(DistanceMeasure measure) | Initializes the k-means clusterer with the distance measure to use for comparison. |
| Method Summary | | |
|---|---|---|
| protected void | addPointToNearestCluster(Vector point, Iterable<Cluster> clusters) | Sequential implementation that adds a point to the nearest cluster. |
| static List<List<Cluster>> | clusterPoints(Iterable<Vector> points, List<Cluster> clusters, DistanceMeasure measure, int maxIter, double distanceThreshold) | This is the reference k-means implementation. |
| boolean | computeConvergence(Cluster cluster, double distanceThreshold) | |
| void | emitPointToNearestCluster(Vector point, Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) | Iterates over all clusters and identifies the one closest to the given point. |
| protected void | emitPointToNearestCluster(Vector point, Iterable<Cluster> clusters, org.apache.hadoop.io.SequenceFile.Writer writer) | Iterates over all clusters and identifies the one closest to the given point. |
| void | outputPointWithClusterInfo(Vector vector, Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) | |
| protected static boolean | runKMeansIteration(Iterable<Vector> points, Iterable<Cluster> clusters, DistanceMeasure measure, double distanceThreshold) | Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete. |
| protected boolean | testConvergence(Iterable<Cluster> clusters, double distanceThreshold) | Sequential implementation that tests convergence and updates cluster centers. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public KMeansClusterer(DistanceMeasure measure)
Parameters:
measure - The distance measure to use for comparing clusters against points.
| Method Detail |
|---|
public void emitPointToNearestCluster(Vector point,
Iterable<Cluster> clusters,
org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException,
InterruptedException
Parameters:
point - a point to find a cluster for.
clusters - a List
Throws:
IOException
InterruptedException
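The nearest-cluster search described for this method can be illustrated without Hadoop or Mahout types. The sketch below is not Mahout code: `NearestClusterSketch`, `Distance`, `EUCLIDEAN`, and `nearestIndex` are hypothetical names, and plain `double[]` arrays stand in for `Vector` and for cluster centers. It only shows the idea of scanning every cluster and keeping the one with the smallest distance.

```java
import java.util.List;

/** Hypothetical illustration of a nearest-cluster scan; not part of Mahout. */
class NearestClusterSketch {

    /** Stand-in for a distance measure: any function of two equal-length vectors. */
    interface Distance {
        double between(double[] a, double[] b);
    }

    /** Euclidean distance, used here only as an example measure. */
    static final Distance EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** Returns the index of the center closest to point, or -1 if centers is empty. */
    static int nearestIndex(double[] point, List<double[]> centers, Distance measure) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = measure.between(point, centers.get(i));
            if (d < bestDistance) { // strict comparison: ties keep the earlier center
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = List.of(new double[] {0, 0}, new double[] {10, 10});
        System.out.println(nearestIndex(new double[] {9, 8}, centers, EUCLIDEAN));
    }
}
```

In a map/reduce setting the chosen cluster would typically become the output key and the point the output value, but how this class writes to the `Context` is not stated on this page.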
protected void addPointToNearestCluster(Vector point,
Iterable<Cluster> clusters)
Parameters:
point -
clusters -
protected boolean testConvergence(Iterable<Cluster> clusters,
double distanceThreshold)
public void outputPointWithClusterInfo(Vector vector,
Iterable<Cluster> clusters,
org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException,
InterruptedException
Throws:
IOException
InterruptedException
protected void emitPointToNearestCluster(Vector point,
Iterable<Cluster> clusters,
org.apache.hadoop.io.SequenceFile.Writer writer)
throws IOException
Parameters:
point - a point to find a cluster for.
clusters - a List
Throws:
IOException
public static List<List<Cluster>> clusterPoints(Iterable<Vector> points,
List<Cluster> clusters,
DistanceMeasure measure,
int maxIter,
double distanceThreshold)
Parameters:
points - the input List
clusters - the List
measure - the DistanceMeasure to use
maxIter - the maximum number of iterations
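The outer loop implied by `maxIter` and the nested `List<List<Cluster>>` return type can be sketched as follows. This is an assumption-laden illustration, not the Mahout implementation: the page does not say what the nested list contains, so the sketch assumes one snapshot of the cluster centers per iteration, with the initial centers first. `IterationDriverSketch`, `Step`, `iterate`, and `copy` are hypothetical names, and `double[]` stands in for a cluster center.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver loop that records one snapshot per iteration; not part of Mahout. */
class IterationDriverSketch {

    /** Hypothetical single-iteration step: updates centers in place, returns true once converged. */
    interface Step {
        boolean run(List<double[]> centers);
    }

    /**
     * Runs step until it reports convergence or maxIter iterations have been done.
     * The result holds the initial centers followed by the centers after each iteration.
     */
    static List<List<double[]>> iterate(List<double[]> initial, Step step, int maxIter) {
        List<List<double[]>> history = new ArrayList<>();
        List<double[]> current = copy(initial); // leave the caller's list untouched
        history.add(copy(current));
        boolean converged = false;
        for (int i = 0; i < maxIter && !converged; i++) {
            converged = step.run(current);
            history.add(copy(current)); // deep copy so later iterations do not alter the snapshot
        }
        return history;
    }

    /** Deep-copies a list of centers. */
    static List<double[]> copy(List<double[]> centers) {
        List<double[]> out = new ArrayList<>();
        for (double[] c : centers) {
            out.add(c.clone());
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> initial = new ArrayList<>();
        initial.add(new double[] {8.0});
        // Toy step: halve the single center until it is at most 1.0.
        Step halve = cs -> {
            cs.get(0)[0] /= 2;
            return cs.get(0)[0] <= 1.0;
        };
        System.out.println(iterate(initial, halve, 10).size());
    }
}
```

Keeping the step behind an interface separates the stopping rule (iteration cap or convergence) from the k-means update itself.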
protected static boolean runKMeansIteration(Iterable<Vector> points,
Iterable<Cluster> clusters,
DistanceMeasure measure,
double distanceThreshold)
Parameters:
points - the List
clusters - the List
measure - a DistanceMeasure to use
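A single k-means iteration, as summarized for this method, consists of assigning each point to its nearest cluster and then recomputing the centers. The sketch below shows one textbook way to do that in plain Java. It is an illustration under assumptions, not Mahout's code: `SingleIterationSketch` and `runIteration` are hypothetical names, `double[]` replaces `Vector` and `Cluster`, and convergence is assumed to mean that no center moved farther than `distanceThreshold`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical single k-means iteration over plain arrays; not part of Mahout. */
class SingleIterationSketch {

    /**
     * Assigns points to the nearest center, moves each center to the mean of its points,
     * and returns true if no center moved farther than distanceThreshold.
     * Assumes centers is a non-empty, modifiable list.
     */
    static boolean runIteration(List<double[]> points, List<double[]> centers,
                                ToDoubleBiFunction<double[], double[]> measure,
                                double distanceThreshold) {
        int k = centers.size();
        int dim = centers.get(0).length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];

        // Assignment step: accumulate each point into its nearest center's running sum.
        for (double[] p : points) {
            int best = 0;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                double d = measure.applyAsDouble(p, centers.get(c));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = c;
                }
            }
            counts[best]++;
            for (int j = 0; j < dim; j++) {
                sums[best][j] += p[j];
            }
        }

        // Update step: replace each center with its centroid and track how far it moved.
        boolean converged = true;
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                continue; // an empty cluster keeps its previous center
            }
            double[] centroid = new double[dim];
            for (int j = 0; j < dim; j++) {
                centroid[j] = sums[c][j] / counts[c];
            }
            if (measure.applyAsDouble(centers.get(c), centroid) > distanceThreshold) {
                converged = false;
            }
            centers.set(c, centroid);
        }
        return converged;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(new double[] {0}, new double[] {2},
                                        new double[] {10}, new double[] {12});
        List<double[]> centers = new ArrayList<>(List.of(new double[] {0}, new double[] {10}));
        ToDoubleBiFunction<double[], double[]> absDiff = (a, b) -> Math.abs(a[0] - b[0]);
        System.out.println(runIteration(points, centers, absDiff, 0.5));
    }
}
```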
public boolean computeConvergence(Cluster cluster,
double distanceThreshold)
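This page gives no description for `computeConvergence`. A common per-cluster convergence test in k-means compares the current center with the centroid of the points assigned during the iteration. The sketch below assumes that interpretation and should not be read as Mahout's `Cluster` class: `ClusterSketch`, `addPoint`, `centroid`, `hasConverged`, and `recomputeCenter` are hypothetical names, and Euclidean distance is hard-coded for brevity.

```java
import java.util.Arrays;

/** Hypothetical cluster representation with a per-cluster convergence test; not Mahout's Cluster. */
class ClusterSketch {
    private double[] center;
    private final double[] sum; // running sum of points added since the last recompute
    private int count;          // number of points added since the last recompute

    ClusterSketch(double[] center) {
        this.center = center.clone();
        this.sum = new double[center.length];
    }

    /** Accumulates a point assigned to this cluster. */
    void addPoint(double[] p) {
        for (int i = 0; i < sum.length; i++) {
            sum[i] += p[i];
        }
        count++;
    }

    /** Mean of the accumulated points, or the current center if none were added. */
    double[] centroid() {
        if (count == 0) {
            return center.clone();
        }
        double[] c = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            c[i] = sum[i] / count;
        }
        return c;
    }

    /** True if the centroid lies within distanceThreshold (Euclidean) of the current center. */
    boolean hasConverged(double distanceThreshold) {
        double[] c = centroid();
        double squared = 0.0;
        for (int i = 0; i < c.length; i++) {
            double d = c[i] - center[i];
            squared += d * d;
        }
        return Math.sqrt(squared) <= distanceThreshold;
    }

    /** Moves the center to the centroid and resets the accumulators for the next iteration. */
    void recomputeCenter() {
        center = centroid();
        Arrays.fill(sum, 0.0);
        count = 0;
    }

    double[] getCenter() {
        return center.clone();
    }

    public static void main(String[] args) {
        ClusterSketch cluster = new ClusterSketch(new double[] {0, 0});
        cluster.addPoint(new double[] {2, 0});
        cluster.addPoint(new double[] {4, 0});
        System.out.println(cluster.hasConverged(1.0));
    }
}
```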