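How can I solve the multiple-instance problem? I have the musk dataset and would appreciate expert guidance, including the algorithmic idea and a description of the algorithm.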

Source: Baidu Wenku  Editor: 高校问答  Time: 2024/05/06 19:59:38
This dataset describes a set of 92 molecules of which 47 are judged by human experts to be musks and the remaining 45 molecules are judged to be non-musks. The goal is to learn to predict whether new molecules will be musks or non-musks. However, the 166 features that describe these molecules depend upon the exact shape, or conformation, of the molecule. Because bonds can rotate, a single molecule can adopt many different shapes. To generate this data set, the low-energy conformations of the molecules were generated and then filtered to remove highly similar conformations. This left 476 conformations. Then, a feature vector was extracted that describes each conformation.
This many-to-one relationship between feature vectors and molecules is called the "multiple instance problem". When learning a classifier for this data, the classifier should classify a molecule as "musk" if ANY of its conformations is classified as a musk. A molecule should be classified as "non-musk" if NONE of its conformations is classified as a musk.
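The ANY/NONE rule above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete learning algorithm: the function name `predict_bags`, the `(molecule_id, feature_vector)` input layout, and the toy threshold classifier are all assumptions made for the example, and a real instance-level classifier would still have to be trained separately.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Sequence, Tuple

# Hypothetical layout: one (molecule_id, feature_vector) pair per conformation.
Instance = Tuple[Hashable, Sequence[float]]


def predict_bags(
    instances: Iterable[Instance],
    instance_is_musk: Callable[[Sequence[float]], bool],
) -> Dict[Hashable, bool]:
    """Label each molecule (bag) as musk if ANY of its conformations is a musk."""
    bag_labels: Dict[Hashable, bool] = defaultdict(bool)  # default: non-musk
    for molecule_id, features in instances:
        # Logical OR over all conformations that belong to the same molecule.
        bag_labels[molecule_id] = bag_labels[molecule_id] or bool(
            instance_is_musk(features)
        )
    return dict(bag_labels)


# Toy data (made up for illustration): two molecules, two conformations each,
# with a single feature instead of the 166 features in the real dataset.
toy_instances = [
    ("mol_a", [0.1]),
    ("mol_a", [0.9]),
    ("mol_b", [0.2]),
    ("mol_b", [0.3]),
]

# Stand-in instance classifier: an arbitrary threshold on the first feature.
toy_predictions = predict_bags(toy_instances, lambda x: x[0] > 0.5)
print(toy_predictions)
```

Here `mol_a` is labeled a musk because one of its two conformations passes the stand-in classifier, while `mol_b` is labeled a non-musk because none of its conformations does.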
The dataset is too large to include here. I would appreciate as detailed a description of the algorithm as possible.