Paper 1 (Golub et al.) presents two datasets: a training set and a test set. The training set consists of 38 bone marrow samples (27 ALL, 11 AML) obtained from acute leukemia patients at the time of diagnosis. The arrays carry 7129 probes covering 6817 genes, so each sample has 7129 expression values for 6817 genes. The test set is an independent collection of 34 leukemia samples (24 bone marrow and 10 peripheral blood samples). Of these, 20 are ALL and the remaining 14 are AML. More details about the dataset can be found in paper 1 or in the linked description golubEsets.
Because the gene expression values span a wide range and many of them are negative, several transformations are usually applied before building a classifier. In paper 2, the authors floored the values at a positive threshold and then applied a log transformation. Paper 9 proposed a preprocessing procedure that has since been widely adopted by other researchers. It has three steps: floor and ceiling thresholding, filtering, and a base-10 logarithmic transformation. Applied to the combined training and test data, it leaves only 3571 predictors (dataset). However, when we apply the same procedure, we are left with 3051 predictors; the resulting dataset is available at library/package.
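A minimal sketch of the three-step procedure follows. It assumes the raw data sit in a NumPy array with genes as rows and samples as columns. The default cutoffs (floor 100, ceiling 16,000, max/min ratio above 5, max minus min above 500) are the values commonly quoted for this procedure; treat them as assumptions and check them against paper 9. The function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(X, floor=100.0, ceiling=16000.0, min_fold=5.0, min_range=500.0):
    """Threshold, filter and log10-transform a genes x samples array.

    The default cutoffs are assumptions (commonly quoted values for this
    procedure); verify them against paper 9 before relying on them.
    Returns the transformed array and the boolean mask of kept genes.
    """
    X = np.asarray(X, dtype=float)
    # Step 1, thresholding: clip every value into [floor, ceiling],
    # which also removes the negative values.
    Xc = np.clip(X, floor, ceiling)
    # Step 2, filtering: keep a gene only if its max/min ratio and its
    # max - min range both exceed the cutoffs across all samples.
    gmax = Xc.max(axis=1)
    gmin = Xc.min(axis=1)
    keep = (gmax / gmin > min_fold) & (gmax - gmin > min_range)
    # Step 3, base-10 log transform of the surviving genes.
    return np.log10(Xc[keep]), keep


if __name__ == "__main__":
    # Toy data: 3 genes x 3 samples (made up for illustration only).
    X = np.array([[-50.0, 200.0, 20000.0],
                  [150.0, 160.0, 170.0],
                  [100.0, 400.0, 700.0]])
    Xt, keep = preprocess(X)
    print(keep)
    print(Xt)
```

Because the filter is computed on whichever samples are passed in, the number of surviving genes depends on whether the training and test data are filtered together or separately.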
Since the dataset has far more predictors than observations, research on it focuses not only on finding an effective classifier but also on choosing a feature-selection criterion. In the original paper 1, the authors used a correlation measure to select 50 genes for training the classifier.
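As a hedged illustration of correlation-based gene selection, the sketch below scores each gene with the signal-to-noise statistic P(g, c) = (μ1 − μ2) / (σ1 + σ2), which paper 1 uses as its measure of correlation with the class distinction. It then keeps the n/2 genes with the largest positive scores and the n/2 with the most negative scores (25 + 25 for n = 50, following our reading of paper 1). The helper names, the genes x samples layout and the use of the sample standard deviation (ddof=1) are assumptions made for this sketch.

```python
import numpy as np


def signal_to_noise(X, y):
    """Per-gene score (mu1 - mu2) / (sigma1 + sigma2).

    X: genes x samples array; y: boolean array, True for class 1 (e.g. ALL).
    Using ddof=1 (sample standard deviation) is an assumption.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    mu1, mu2 = X[:, y].mean(axis=1), X[:, ~y].mean(axis=1)
    s1, s2 = X[:, y].std(axis=1, ddof=1), X[:, ~y].std(axis=1, ddof=1)
    return (mu1 - mu2) / (s1 + s2)


def select_genes(X, y, n=50):
    """Hypothetical helper: indices of n genes, half per class direction."""
    p = signal_to_noise(X, y)
    order = np.argsort(p)  # ascending score
    half = n // 2
    # Most negative scores: genes higher in class 2 (e.g. AML);
    # most positive scores: genes higher in class 1 (e.g. ALL).
    return np.concatenate([order[:half], order[len(order) - half:]])


if __name__ == "__main__":
    # Toy data: 4 genes x 4 samples, first two samples in class 1.
    X = np.array([[5, 6, 1, 2],
                  [1, 2, 5, 6],
                  [3, 4, 3, 4],
                  [4, 5, 2, 3]])
    y = np.array([True, True, False, False])
    print(signal_to_noise(X, y))
    print(select_genes(X, y, n=2))
```

Selecting genes from both ends of the ranking keeps markers for each class; ranking by absolute score alone could return genes that all favour one class.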