Stanford researchers have developed a patented algorithm for general supervised learning.

About

Summary

Stanford researchers have developed a patented algorithm for general supervised learning. It is initialized with a learning sample in which both features and outcomes are given (missing values are allowed in the predictors but not in the outcomes). Once the decision tree is built, it can be applied to cases where the features are given but the outcome is not.

The algorithm is particularly applicable to complex data sets where multiple factors, especially SNPs (single nucleotide polymorphisms) in a genetic scenario, combine to determine the outcome. These factors may have complicated and influential interactions even though their individual contributions are insignificant. Such cases are common in the real world. One good example is the association between features such as genetic and environmental risk factors on the one hand, and complex disease on the other. Traditional approaches that focus on individual effects have proven difficult to apply in this setting.

The FlexTree approach, by contrast, treats all risk factors together and considers suitably chosen interactions and main effects simultaneously. The technique stems from well-known binary classification tree methods. It uses the tree as a framework and employs penalized linear regression on suitably transformed features to define the partitioning rule at each split. This approach allows optimally chosen, complicated interactions to be considered while keeping the resulting model simple and easy to interpret. A variable selection procedure embedded in the algorithm improves its predictive power and robustness. In some applications, FlexTree demonstrates a substantial improvement in performance over several other cutting-edge technologies.

Stage of Research

FlexTree has been successfully applied to finding genetic and environmental interactions that predispose Chinese women to hypertension. The technology is also being used to find genotypes and interactions that are predictive of (a subset of) cardiovascular disease.
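To make the partitioning idea from the Summary more concrete, the following is a minimal toy sketch of a binary tree whose splits are defined by a penalized linear regression over all features at once, rather than by a single feature. It is not the patented FlexTree algorithm: the function names (`fit_ridge`, `build_tree`, `predict`), the ridge penalty, the 0.5 split threshold, the additive 0/1/2 genotype coding, and the synthetic data are all assumptions made for illustration, and the sketch omits FlexTree's feature transformations, variable selection, and handling of missing predictor values.

```python
from itertools import product

def fit_ridge(X, y, lam=0.1, lr=0.05, steps=2000):
    """Ridge-penalized least squares fitted by gradient descent.
    Returns (weights, bias); the bias term is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (gw[j] / n + lam * wj) for j, wj in enumerate(w)]
    return w, b

def score(w, b, x):
    """Linear score of one case under a fitted split rule."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def build_tree(X, y, depth=0, max_depth=2, min_size=5):
    """Recursively grow a binary tree; each internal node splits on
    whether the ridge score of ALL features is at least 0.5."""
    majority = int(2 * sum(y) >= len(y))
    if depth == max_depth or len(y) < min_size or len(set(y)) == 1:
        return {"leaf": majority}
    w, b = fit_ridge(X, y)
    left = [(x, t) for x, t in zip(X, y) if score(w, b, x) < 0.5]
    right = [(x, t) for x, t in zip(X, y) if score(w, b, x) >= 0.5]
    if not left or not right:  # the rule failed to separate anything
        return {"leaf": majority}
    return {
        "w": w, "b": b,
        "left": build_tree([x for x, _ in left], [t for _, t in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([x for x, _ in right], [t for _, t in right],
                            depth + 1, max_depth, min_size),
    }

def predict(tree, x):
    """Apply the built tree to a case with features but no outcome."""
    while "leaf" not in tree:
        go_right = score(tree["w"], tree["b"], x) >= 0.5
        tree = tree["right"] if go_right else tree["left"]
    return tree["leaf"]

# Synthetic learning sample (hypothetical): three SNPs coded 0/1/2.
# The outcome depends on SNP 1 and SNP 2 jointly; SNP 3 is irrelevant.
X = [list(g) for g in product([0, 1, 2], repeat=3)]
y = [int(g[0] + g[1] >= 3) for g in X]

tree = build_tree(X, y)
print(predict(tree, [2, 2, 0]), predict(tree, [0, 2, 1]))
```

In this toy data no threshold on a single SNP separates the two outcome classes, while one split on the combined linear score does, which is the motivation for letting the partitioning rule use several features together.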
Applications

  • Data mining
  • Life sciences research - data analysis for: SNPs, bioinformatics, mass spectroscopy, defining risk groups, statistics

Advantages

  • Powerful: considers both interactions and combined effects simultaneously
  • Provides a simple, easy-to-interpret model
  • Robust
