Taken from here.
The database was collected at Chiba University Hospital. Each patient came to the hospital's outpatient clinic for collagen diseases on the recommendation of a home doctor or a general physician at a local hospital.
Collagen diseases are autoimmune diseases: patients generate antibodies that attack their own bodies. For example, if a patient generates antibodies against the lungs, he or she will chronically lose respiratory function and eventually die. The disease mechanisms are only partially known, and their classification is still fuzzy. Some patients may generate many kinds of antibodies, and their manifestations may include all the characteristics of collagen diseases.
In collagen diseases, thrombosis is one of the most important and severe complications and one of the major causes of death. Thrombosis is increased coagulation of the blood that clogs blood vessels. An episode usually lasts several hours and can recur over time. Thrombosis can arise from different collagen diseases. This complication has been found to be closely related to anti-cardiolipin antibodies; the relationship was discovered by physicians, one of whom donated the datasets for the discovery challenge.
Thrombosis must be treated as an emergency, so it is important to detect it and to predict how likely it is to occur. However, no immunology experts have yet performed this kind of database analysis, and domain experts are very interested in discovering regularities behind patients' observations.
Basic information about patients (entered by doctors). This table includes all patients (about 1,000 records).
| item | meaning | remark |
|---|---|---|
| ID | identification of the patient | |
| Sex | | |
| Birthday | YYYY/M/D | |
| Description date | the first date on which the patient's data was recorded | YY.MM.DD |
| First date | the date on which the patient first came to the hospital | YY.MM.DD |
| Admission | patient was admitted to the hospital (+) or followed at the outpatient clinic (-) | |
| Diagnosis | disease names | multivalued attribute |
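The tables in this dataset mix several date formats (YYYY/M/D, YY.MM.DD, and YYMMDD in the hospital laboratory table), and Diagnosis holds several disease names in one field. A minimal sketch of normalizing them with the standard library follows. It assumes that every two-digit year falls in the 1900s (the hospital records end in March 1999) and that multivalued fields are comma-separated; both are assumptions to check against the raw files, and all function names are hypothetical.

```python
from datetime import date

# Assumption: two-digit years all belong to the 1900s, since the
# hospital records end in March 1999.
CENTURY = 1900


def parse_slash_date(s):
    """'YYYY/M/D' (Birthday) or 'YYYY/MM/DD' (Examination Date) -> date."""
    year, month, day = (int(p) for p in s.strip().split("/"))
    return date(year, month, day)


def parse_dotted_date(s):
    """'YY.MM.DD' (Description date, First date) -> date."""
    yy, mm, dd = (int(p) for p in s.strip().split("."))
    return date(CENTURY + yy, mm, dd)


def parse_yymmdd(s):
    """'YYMMDD' (Date column of the hospital laboratory table) -> date."""
    s = s.strip()
    return date(CENTURY + int(s[0:2]), int(s[2:4]), int(s[4:6]))


def split_multivalued(s, sep=","):
    """Split a multivalued attribute such as Diagnosis into clean names.

    Assumption: values are separated by commas; inspect the raw file.
    Empty pieces (e.g. from a trailing separator) are dropped.
    """
    return [part.strip() for part in s.split(sep) if part.strip()]
```

Converting every date column to `datetime.date` early makes the formats comparable, for example when computing a patient's age at the first visit or ordering laboratory results in time.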
Special laboratory examinations (entered by doctors; measured by the Laboratory on Collagen Diseases). This table does not include all patients, only those who underwent these special tests.
| item | meaning | remark |
|---|---|---|
| ID | identification of the patient | |
| Examination Date | date of the test | YYYY/MM/DD |
| aCL IgG | anti-Cardiolipin antibody (IgG) concentration | |
| aCL IgM | anti-Cardiolipin antibody (IgM) concentration | |
| ANA | antinuclear antibody concentration | |
| ANA Pattern | pattern observed on the ANA examination sheet | |
| aCL IgA | anti-Cardiolipin antibody (IgA) concentration | |
| Diagnosis | disease names | multivalued attribute |
| KCT | measure of the degree of coagulation | |
| RVVT | measure of the degree of coagulation | |
| LAC | measure of the degree of coagulation | |
| Symptoms | other symptoms observed | multivalued attribute |
| Thrombosis | degree of thrombosis | 0: negative (no thrombosis); 1: positive (most severe); 2: positive (severe); 3: positive (mild) |
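The Thrombosis codes are not ordered by severity: 1 is the most severe positive grade and 3 the mildest, so sorting or thresholding the raw code gives misleading results. The sketch below, which assumes the codes have already been converted to `int`, derives a binary target and a hypothetical ordinal rank in which a larger number means a more severe case. The names are illustrative, not part of the dataset.

```python
# Codes and meanings from the Thrombosis column of the special
# laboratory examinations table.
THROMBOSIS_LABELS = {
    0: "negative (no thrombosis)",
    1: "positive (most severe)",
    2: "positive (severe)",
    3: "positive (mild)",
}

# Hypothetical re-encoding: 0 = none, 1 = mild, 2 = severe, 3 = most severe.
_SEVERITY_RANK = {0: 0, 3: 1, 2: 2, 1: 3}


def thrombosis_positive(code):
    """Binary target: True for any positive grade (codes 1, 2 or 3)."""
    if code not in THROMBOSIS_LABELS:
        raise ValueError(f"unknown Thrombosis code: {code!r}")
    return code != 0


def severity_rank(code):
    """Ordinal rank where a larger value means more severe thrombosis."""
    if code not in _SEVERITY_RANK:
        raise ValueError(f"unknown Thrombosis code: {code!r}")
    return _SEVERITY_RANK[code]
```

For example, `sorted([3, 0, 1, 2], key=severity_rank)` orders the codes from no thrombosis to the most severe case: `[0, 3, 2, 1]`.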
Laboratory examinations stored in the hospital information system (from 1980 to March 1999). All records are ordinary laboratory examinations with timestamps; the tests are not necessarily related to thrombosis.
| item | meaning | normal range |
|---|---|---|
| ID | identification of the patient | |
| Date | Date of the laboratory tests (YYMMDD) | |
| GOT | AST (glutamic oxaloacetic transaminase) | N < 60 |
| GPT | ALT (glutamic pyruvic transaminase) | N < 60 |
| LDH | lactate dehydrogenase | N < 500 |
| ALP | alkaline phosphatase | N < 300 |
| TP | total protein | 6.0 < N < 8.5 |
| ALB | albumin | 3.5 < N < 5.5 |
| UA | uric acid | N > 8.0 (Male) N > 6.5 (Female) |
| UN | urea nitrogen | N < 30 |
| CRE | creatinine | N < 1.5 |
| T-BIL | total bilirubin | N < 2.0 |
| T-CHO | total cholesterol | N < 250 |
| TG | triglyceride | N < 200 |
| CPK | creatine phosphokinase | N < 250 |
| GLU | blood glucose | N < 180 |
| WBC | White blood cell | 3.5 < N < 9.0 |
| RBC | Red blood cell | 3.5 < N < 6.0 |
| HGB | Hemoglobin | 10 < N < 17 |
| HCT | Hematocrit | 29 < N < 52 |
| PLT | platelet | 100 < N < 400 |
| PT | prothrombin time | N < 14 |
| Note | comment on the PT test | |
| APTT | activated partial thromboplastin time | N < 45 |
| FG | fibrinogen | 150 < N < 450 |
| AT3 | antithrombin III; marker of DIC (disseminated intravascular coagulation), one of the most important complications of collagen diseases | 70 < N < 130 |
| A2PI | alpha-2 plasmin inhibitor; marker of DIC | 70 < N < 130 |
| U-PRO | proteinuria | 0 < N < 30 |
| IGG | IgG | 900 < N < 2000 |
| IGA | IgA | 80 < N < 500 |
| IGM | IgM | 40 < N < 400 |
| CRP | C-reactive protein | N= -, +-, or N < 1.0 |
| RA | Rheumatoid factor | N= -, +- |
| RF | RAHA | N < 20 |
| C3 | complement 3 | N > 35 |
| C4 | complement 4 | N > 10 |
| RNP | anti-ribonucleoprotein | N= -, +- |
| SM | anti-Sm | N= -, +- |
| SCl70 | anti-scl70 | N= -, +- |
| SSA | anti-SSA | N= -, +- |
| SSB | anti-SSB | N= -, +- |
| CENTROMEA | anti-centromere | N= -, +- |
| DNA | anti-DNA | N < 8 |
| DNA-II | anti-DNA | N < 8 |
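Because the normal ranges above follow a small number of patterns (an upper bound, a lower bound, both, or a qualitative "-"/"+-" result), they can be stored as data and used to flag each measurement. A minimal sketch follows, assuming raw values arrive as strings. Bounds are treated as strict, exactly as the table writes them; CRP accepts either a qualitative or a numeric result; UA (sex-dependent) and U-PRO are deliberately left out. The function and dictionary names are hypothetical.

```python
# Exclusive (lower, upper) bounds transcribed from the "normal range"
# column; None means the table gives no bound on that side.
# UA (sex-dependent) and U-PRO are intentionally not covered here.
NUMERIC_RANGES = {
    "GOT": (None, 60), "GPT": (None, 60), "LDH": (None, 500),
    "ALP": (None, 300), "TP": (6.0, 8.5), "ALB": (3.5, 5.5),
    "UN": (None, 30), "CRE": (None, 1.5), "T-BIL": (None, 2.0),
    "T-CHO": (None, 250), "TG": (None, 200), "CPK": (None, 250),
    "GLU": (None, 180), "WBC": (3.5, 9.0), "RBC": (3.5, 6.0),
    "HGB": (10, 17), "HCT": (29, 52), "PLT": (100, 400),
    "PT": (None, 14), "APTT": (None, 45), "FG": (150, 450),
    "AT3": (70, 130), "A2PI": (70, 130), "IGG": (900, 2000),
    "IGA": (80, 500), "IGM": (40, 400), "RF": (None, 20),
    "C3": (35, None), "C4": (10, None), "DNA": (None, 8),
    "DNA-II": (None, 8),
}

# Items whose normal result is written "N= -, +-" in the table.
QUALITATIVE_ITEMS = {"RA", "RNP", "SM", "SCl70", "SSA", "SSB", "CENTROMEA"}
NORMAL_QUALITATIVE = {"-", "+-"}


def is_normal(item, raw):
    """Return True if `raw` is inside the normal range for `item`, False if not.

    Returns None when the item is not covered by this sketch, or when the
    value is empty or cannot be read as a number.
    """
    raw = raw.strip()
    if not raw:
        return None
    if item in QUALITATIVE_ITEMS:
        return raw in NORMAL_QUALITATIVE
    if item == "CRP":
        # CRP is normal if reported as "-" or "+-", or numerically below 1.0.
        if raw in NORMAL_QUALITATIVE:
            return True
        lower, upper = None, 1.0
    elif item in NUMERIC_RANGES:
        lower, upper = NUMERIC_RANGES[item]
    else:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    return (lower is None or value > lower) and (upper is None or value < upper)
```

Returning `None` instead of `False` for unknown items and unparseable values keeps "abnormal" distinct from "could not be judged", which matters when counting abnormal results per patient.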
Load the data into RDDs, register each RDD as a temporary table, and run SQL queries to check that everything loaded correctly. Can you think of another way to do it?
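One possible approach to the loading check, sketched here with Python's built-in `sqlite3` module instead of Spark so that it runs without a cluster. This is a swapped-in technique, not the notebook's intended Spark solution; in PySpark, a comparable route would be to turn the RDD into a DataFrame with `spark.createDataFrame(...)`, register it with `createOrReplaceTempView(...)`, and query it with `spark.sql(...)`. The sketch loads each CSV into a temporary table with all columns kept as text, then runs a few sanity queries: row count, distinct patient IDs, empty IDs, and laboratory rows whose ID has no patient record. All function names are hypothetical.

```python
import csv
import sqlite3


def load_temp_table(conn, table, columns, rows):
    """Create a TEMP table with one TEXT column per header name and fill it.

    Values stay as text so nothing is silently coerced while loading;
    a row with the wrong number of fields makes sqlite3 raise an error.
    Table and column names are interpolated into the SQL, so they must
    come from trusted input (placeholders cannot bind identifiers).
    """
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TEMP TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


def load_csv_lines(conn, table, lines):
    """Load CSV text (header row first) from any iterable of lines."""
    reader = csv.reader(lines)
    columns = next(reader)
    load_temp_table(conn, table, columns, reader)
    return columns


def _scalar(conn, sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


def sanity_report(conn, table, id_column="ID"):
    """Row count, number of distinct IDs and number of rows with an empty ID."""
    return {
        "rows": _scalar(conn, f'SELECT COUNT(*) FROM "{table}"'),
        "distinct_ids": _scalar(
            conn, f'SELECT COUNT(DISTINCT "{id_column}") FROM "{table}"'
        ),
        "missing_ids": _scalar(
            conn,
            f'SELECT COUNT(*) FROM "{table}" '
            f"WHERE \"{id_column}\" IS NULL OR \"{id_column}\" = ''",
        ),
    }


def orphan_rows(conn, child, parent, id_column="ID"):
    """Count rows in `child` whose ID never appears in `parent`."""
    return _scalar(
        conn,
        f'SELECT COUNT(*) FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{id_column}" = p."{id_column}" '
        f'WHERE p."{id_column}" IS NULL',
    )
```

To use it on a real file, open it with `newline=""` (as the `csv` module recommends) and pass the file object to `load_csv_lines`; the file's encoding is not stated in this description, so check it before loading. Comparing `sanity_report` on the patient table with the "about 1,000 records" stated above, and checking that `orphan_rows` is zero for both examination tables, gives a quick signal that the files loaded completely.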