Machine Learning

The new statistics – when the number of features moves to the thousands and the number of records moves to the billions and beyond…

The flaw in standard statistics is the belief that there will EVER be an experiment with enough resources to collect

  • all the features of interest
  • with enough information on each pattern of features
  • to adequately estimate how one thing truly impacts another

If you want to compare two drugs to each other for a medical application – well, things that probably matter include:

  1. gender (male, female, trans male, trans female… 4 or more)
  2. age (newborn, infant, child, teen, early adult, adult, senior, … 7 or more)
  3. racial background (Asian, Black, Caucasian, Latino… 8 or more)
  4. past medical history (measles, mumps, AIDS, hip replacements, flu, … thousands)
  5. cultural background (… probably hundreds)
  6. dosage of the medication (several levels to test)

which means that 4 × 7 × 8 × 1000 × 100 × 5 = 112,000,000 groups are needed, each with at least 5 members (can we even find them…), and that still ignores every effector not in this short list…
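A quick back-of-the-envelope sketch in Python makes the explosion concrete. The level counts below are the (hypothetical) ones from the list above; real studies would have far more categories per feature.

```python
from math import prod

# Hypothetical level counts for each feature, taken from the list above
levels = {
    "gender": 4,
    "age group": 7,
    "racial background": 8,
    "past medical history": 1000,
    "cultural background": 100,
    "dosage": 5,
}

# Every combination of levels needs its own experimental group
groups = prod(levels.values())

# ...and we wanted at least 5 subjects per group
min_subjects = groups * 5

print(f"{groups:,} groups, {min_subjects:,} subjects minimum")
# → 112,000,000 groups, 560,000,000 subjects minimum
```

Adding even one more modest feature (say, 10 levels of comorbidity) multiplies that by another factor of 10 — the cost grows multiplicatively with each feature, which is why fully crossed designs collapse so quickly.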

whew… so how do we proceed past this??

