Thursday, December 15, 2005

 

JSR-000247 Data Mining 2.0 - Early Draft Review

The first release of Java Data Mining (JSR-73) has been available for over a year. It has
seen several commercial implementations and is in use at companies deploying data mining
functionality internally. We have also seen interest from the academic realm. For the
first release, the expert group was careful not to overextend its reach in the effort to
produce version 1. However, we realized there were still many areas in data mining that
deserved attention.
This Early Draft Review, as specified by the Java Community Process 2.6, gives a
broader audience the opportunity to provide feedback to the expert group on the
evolving JSR-247 specification for JDM 2.0.
The various enhancements to the standard include:
Transformations - a much-requested and difficult subject for data mining in general.
JDM 2.0 puts in place a general framework for performing commonly used transformations
as well as open-ended transformations through the use of language-specific expressions.
Time Series - this mining function expands the mining functions supported by JDM and
provides an important capability for supporting forecasting and series data analysis.
Apply for Association - this completes the association mining function, making the prediction
of cross-sell items easier.
Multi-record real-time scoring - enables scoring of multiple records in the record apply
task as a performance optimization for applications.
Multi-target models - enables the specification of multiple targets for supervised models
as a model performance and representation optimization. This also enables a performance
optimization for processing common predictor data more efficiently.
Multivariate statistics - provides the ability to conveniently compute multivariate statistics
such as the F and t tests and the K-S and M-W tests, among others. It also offers an extensible
framework for additional statistics. As with univariate statistics, models that produce
multivariate statistics as a by-product can associate these with the model itself.
Text Mining - this is an initial extension for supporting text mining in JDM. It allows vendors
to automate the term extraction process for users wanting to include unstructured text
data in the building of their data mining models.
Task dependencies and scheduling - this extension allows programs to set up multiple
tasks, where one depends on another, for automatic sequential execution without control
from the client application. In addition, tasks can be scheduled to commence at some
future time.
Anomaly detection - this mining function expands the mining functions supported by
JDM and provides an important capability for supporting the detection of unusual events.
