Methodological Training in Statistical Data Mining, September 7-9, 2009, Baden, Switzerland
Use of Design of Experiments (Utilisation des plans d'expériences), 9-11 September 2009, Yverdon-les-Bains, Switzerland
Methodological Training in Statistical Data Mining Related to Drug Development, October 19-21, 2009, Basel, Switzerland
What is Data Mining?
`We are drowning in information but starved for knowledge.'
John Naisbitt
Data mining is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns, structures, models or trends in data in order to make crucial decisions.
What is meant by these terms?
- `Non-trivial': it is not a straightforward computation of predefined quantities, such as computing the average value of a set of numbers.
- `Valid': the patterns hold in general, i.e. they remain valid on new data in the face of uncertainty.
- `Novel': the patterns were not known beforehand.
- `Potentially useful': the patterns lead to some benefit for the user.
- `Understandable': the patterns are interpretable and comprehensible.
Is data mining `statistical déjà vu'?
Statistics is the science of learning from data, or of turning data into knowledge.
Like statistical thinking and statistics, data mining is neither just modelling and prediction nor a product that can be bought; it is a whole iterative problem-solving cycle (process) that must be mastered through team effort.
`Coming together is a beginning. Keeping together is progress. Working together is success.'
Henry Ford
What distinguishes data mining from statistics?
Statistics has traditionally been concerned with analysing primary (e.g. experimental) data that have been collected to check specific hypotheses (ideas). As such, statistics is `primary data analysis', top-down (confirmatory) analysis or `hypothesis evaluation or testing'.
Data mining, on the other hand, is typically concerned with analysing secondary (e.g. observational) data that have been collected for other reasons. As such, data mining is `secondary data analysis', bottom-up (exploratory) analysis, `hypothesis generation' or `knowledge discovery'.
These two approaches to learning from data, or turning data into knowledge, are complementary.
- The information obtained from a bottom-up analysis, which identifies important relations and tendencies, cannot explain why these discoveries are useful and to what extent they are valid. The confirmatory tools of top-down analysis can be used to confirm the discoveries and to evaluate the quality of decisions based on them.
- In a top-down analysis, we think up possible explanations for the observed behaviour and let those hypotheses dictate the data to be analysed. Then, in a bottom-up analysis, we let the data suggest new hypotheses to test.
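To make this interplay concrete, here is a minimal sketch in Python. The simulated data, the split-sample design and all function names (`pearson`, `explore`, `confirm`, `split`) are our own illustrative assumptions, not a specific Statoo method. The data are split into an exploration half, where a bottom-up step lets the data suggest the candidate variable most correlated with the outcome, and a confirmation half, where a top-down permutation test evaluates that single hypothesis on data that played no part in generating it.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split(data, seed=0):
    """Randomly split the rows into an exploration half and a confirmation half."""
    n = len(next(iter(data.values())))
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half = n // 2

    def pick(rows):
        return {k: [v[i] for i in rows] for k, v in data.items()}

    return pick(idx[:half]), pick(idx[half:])


def explore(data, candidates, outcome):
    """Bottom-up step: let the data suggest the most promising hypothesis."""
    return max(candidates, key=lambda c: abs(pearson(data[c], data[outcome])))


def confirm(data, candidate, outcome, n_perm=999, seed=0):
    """Top-down step: permutation test of one pre-specified hypothesis."""
    rng = random.Random(seed)
    x, y = data[candidate], list(data[outcome])  # copy y so shuffling is local
    observed = abs(pearson(x, y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)  # break any real link between x and y
        if abs(pearson(x, y)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical observational data: five candidate variables, only x2 drives y.
rng = random.Random(42)
n = 200
data = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(1, 6)}
data["y"] = [2 * a + rng.gauss(0, 1) for a in data["x2"]]

exploration, confirmation = split(data, seed=1)
candidates = [c for c in data if c != "y"]
hypothesis = explore(exploration, candidates, "y")      # hypothesis generation
r, p = confirm(confirmation, hypothesis, "y")           # hypothesis testing
print(hypothesis, round(r, 2), p)
```

Keeping the confirmation half aside is the point of the design: testing a hypothesis on the same data that suggested it would overstate how valid it is on new data.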
Want to know more about the relation between data mining and statistics?
Check out our paper entitled `Is Data Mining for Gold
"Statistical déjà vu"?' or additional papers in our `Publications' section.
Want to know more about the relation between bioinformatics, data mining and statistics? Check out our paper entitled `Challenges in Bioinformatics for Statistical Data Miners' or additional papers in our `Publications' section.
Interested in our data mining services?
Are you drowning in uncertainty and starving for knowledge? Interested in getting Statooed?
Have a question about our data mining services? Contact us and let us help you.