mirror of https://github.com/01-edu/public.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
73 lines
3.1 KiB
73 lines
3.1 KiB
Citation Request: |
|
This breast cancer domain was obtained from the University Medical Centre, |
|
Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and |
|
M. Soklic for providing the data. Please include this citation if you plan |
|
to use this database. |
|
|
|
1. Title: Breast cancer data (Michalski has used this) |
|
|
|
2. Sources: |
|
-- Matjaz Zwitter & Milan Soklic (physicians) |
|
Institute of Oncology |
|
University Medical Center |
|
Ljubljana, Yugoslavia |
|
-- Donors: Ming Tan and Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu) |
|
-- Date: 11 July 1988 |
|
|
|
3. Past Usage: (Several: here are some) |
|
-- Michalski,R.S., Mozetic,I., Hong,J., & Lavrac,N. (1986). The |
|
Multi-Purpose Incremental Learning System AQ15 and its Testing |
|
Application to Three Medical Domains. In Proceedings of the |
|
Fifth National Conference on Artificial Intelligence, 1041-1045, |
|
Philadelphia, PA: Morgan Kaufmann. |
|
-- accuracy range: 66%-72% |
|
-- Clark,P. & Niblett,T. (1987). Induction in Noisy Domains. In |
|
Progress in Machine Learning (from the Proceedings of the 2nd |
|
European Working Session on Learning), 11-30, Bled, |
|
Yugoslavia: Sigma Press. |
|
-- 8 test results given: 65%-72% accuracy range |
|
-- Tan, M., & Eshelman, L. (1988). Using weighted networks to |
|
represent classification knowledge in noisy domains. Proceedings |
|
of the Fifth International Conference on Machine Learning, 121-134, |
|
Ann Arbor, MI. |
|
-- 4 systems tested: accuracy range was 68%-73.5% |
|
-- Cestnik,G., Konenenko,I, & Bratko,I. (1987). Assistant-86: A |
|
Knowledge-Elicitation Tool for Sophisticated Users. In I.Bratko |
|
& N.Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press. |
|
-- Assistant-86: 78% accuracy |
|
|
|
4. Relevant Information: |
|
This is one of three domains provided by the Oncology Institute |
|
that has repeatedly appeared in the machine learning literature. |
|
(See also lymphography and primary-tumor.) |
|
|
|
This data set includes 201 instances of one class and 85 instances of |
|
another class. The instances are described by 9 attributes, some of |
|
which are linear and some are nominal. |
|
|
|
5. Number of Instances: 286 |
|
|
|
6. Number of Attributes: 9 + the class attribute |
|
|
|
7. Attribute Information: |
|
1. Class: no-recurrence-events, recurrence-events |
|
2. age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99. |
|
3. menopause: lt40, ge40, premeno. |
|
4. tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, |
|
45-49, 50-54, 55-59. |
|
5. inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, |
|
27-29, 30-32, 33-35, 36-39. |
|
6. node-caps: yes, no. |
|
7. deg-malig: 1, 2, 3. |
|
8. breast: left, right. |
|
9. breast-quad: left-up, left-low, right-up, right-low, central. |
|
10. irradiat: yes, no. |
|
|
|
8. Missing Attribute Values: (denoted by "?") |
|
Attribute #: Number of instances with missing values: |
|
6. 8 |
|
9. 1. |
|
|
|
9. Class Distribution: |
|
1. no-recurrence-events: 201 instances |
|
2. recurrence-events: 85 instances |