Source of data: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh
In [1]:
    
import pandas as pd
import pandas_profiling
import numpy as np
    
In [2]:
    
df=pd.read_csv("examples/Meteorite_Landings.csv", parse_dates=['year'], encoding='UTF-8')
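# Note: pandas Timestamps default to nanosecond resolution and cannot represent dates before
# 1677-09-21 or after 2262-04-11; errors='coerce' turns such values into NaT, so they are ignored here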
df['year'] = pd.to_datetime(df['year'], errors='coerce')
# Example: Constant variable
df['source'] = "NASA"
# Example: Boolean variable
df['boolean'] = np.random.choice([True, False], df.shape[0])
# Example: Mixed with base types (NumPy coerces [1, "A"] to a string array, so the values are "1" and "A")
df['mixed'] = np.random.choice([1, "A"], df.shape[0])
# Example: Highly correlated variables
df['reclat_city'] = df['reclat'] + np.random.normal(scale=5,size=(len(df)))
# Example: Duplicate observations
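# .copy() makes the slice independent of df before it is modified
duplicates_to_add = df.iloc[0:10].copy()
duplicates_to_add['name'] = duplicates_to_add['name'] + " copy"
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, duplicates_to_add], ignore_index=True)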
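
The date limits mentioned in the note above follow from storing timestamps as signed 64-bit nanosecond counts from the Unix epoch, which is the pandas default. The sketch below reproduces those bounds with the standard library only. It illustrates the arithmetic and is not pandas code; the names `EPOCH` and `INT64_MAX_NS` are made up for this example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1  # largest nanosecond offset a signed 64-bit integer can hold

# datetime only resolves microseconds, so drop the last three digits
span = timedelta(microseconds=INT64_MAX_NS // 1000)

earliest = EPOCH - span
latest = EPOCH + span
print(earliest.date(), latest.date())
```

Years outside this window are the ones that `errors='coerce'` converts to `NaT`.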
    
In [3]:
    
pandas_profiling.ProfileReport(df)
    
    Out[3]:
    
        Overview

        Dataset info
            Number of variables: 14
            Number of observations: 45726
            Total Missing (%): 3.5%
            Total size in memory: 4.6 MiB
            Average record size in memory: 105.0 B

        Variables types
            Numeric: 4
            Categorical: 5
            Boolean: 1
            Date: 1
            Text (Unique): 1
            Rejected: 2
            Unsupported: 0
             
            
        
    
    
        
        Warnings
            GeoLocation has 7315 / 16.0% missing values [Missing]
            GeoLocation has a high cardinality: 17101 distinct values [Warning]
            mass (g) is highly skewed (γ1 = 76.918) [Skewed]
            recclass has a high cardinality: 466 distinct values [Warning]
            reclat has 6438 / 14.1% zeros [Zeros]
            reclat has 7315 / 16.0% missing values [Missing]
            reclat_city is highly correlated with reclat (ρ = 0.99424) [Rejected]
            reclong has 6214 / 13.6% zeros [Zeros]
            reclong has 7315 / 16.0% missing values [Missing]
            source has constant value NASA [Rejected]
    
    
        Variables
    
    
    
        GeoLocation
            Categorical

            Distinct count: 17101
            Unique (%): 37.4%
            Missing (%): 16.0%
            Missing (n): 7315

            Value                       Count    Frequency (%)
            (0.000000, 0.000000)        6214     13.6%
            (-71.500000, 35.666670)     4761     10.4%
            (-84.000000, 168.000000)    3040     6.6%
            (-72.000000, 26.000000)     1505     3.3%
            (-79.683330, 159.750000)    657      1.4%
            (-76.716670, 159.666670)    637      1.4%
            (-76.183330, 157.166670)    539      1.2%
            (-79.683330, 155.750000)    473      1.0%
            (-84.216670, 160.500000)    263      0.6%
            (-86.366670, -70.000000)    226      0.5%
            Other values (17090)        20096    43.9%
            (Missing)                   7315     16.0%
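
The percentages in this table appear to be simple ratios against the 45726 observations. Here is a quick check using the figures reported above. The helper `pct` is hypothetical, and the formulas are inferred from the numbers rather than taken from the pandas_profiling source.

```python
N_OBSERVATIONS = 45726  # "Number of observations" from the overview

def pct(count, total=N_OBSERVATIONS):
    """Format a count as a percentage of the total, to one decimal place."""
    return f"{100 * count / total:.1f}%"

unique_pct = pct(17101)     # distinct GeoLocation values
missing_pct = pct(7315)     # rows with no GeoLocation
top_value_pct = pct(6214)   # rows located at (0.000000, 0.000000)
print(unique_pct, missing_pct, top_value_pct)
```

The three results agree with the "Unique (%)", "Missing (%)" and first "Frequency (%)" entries for GeoLocation.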
        
             
         
 
    
        boolean
            Boolean

            Distinct count: 2
            Unique (%): 0.0%
            Missing (%): 0.0%
            Missing (n): 0
            Mean: 0.49821

            Value        Count    Frequency (%)
            True         22781    49.8%
            (Missing)    22945    50.2%
        
             
         
 
    
        fall
            Categorical

            Distinct count: 2
            Unique (%): 0.0%
            Missing (%): 0.0%
            Missing (n): 0

            Value    Count    Frequency (%)
            Found    44609    97.6%
            Fell     1117     2.4%
        
             
         
 
    
        id
            Numeric

            Distinct count: 45716
            Unique (%): 100.0%
            Missing (%): 0.0%
            Missing (n): 0
            Infinite (%): 0.0%
            Infinite (n): 0

            Mean: 26884
            Minimum: 1
            Maximum: 57458
            Zeros (%): 0.0%

            Quantile statistics
                Minimum: 1
                5-th percentile: 2388.8
                Q1: 12681
                Median: 24256
                Q3: 40654
                95-th percentile: 54891
                Maximum: 57458
                Range: 57457
                Interquartile range: 27972

            Descriptive statistics
                Standard deviation: 16863
                Coef of variation: 0.62727
                Kurtosis: -1.1601
                Mean: 26884
                MAD: 14490
                Skewness: 0.26653
                Sum: 1229293495
                Variance: 284380000
                Memory size: 357.3 KiB

            Value                   Count    Frequency (%)
            417                     2        0.0%
            398                     2        0.0%
            1                       2        0.0%
            6                       2        0.0%
            392                     2        0.0%
            370                     2        0.0%
            379                     2        0.0%
            2                       2        0.0%
            390                     2        0.0%
            10                      2        0.0%
            Other values (45706)    45706    100.0%

            Minimum 5 values

            Value    Count    Frequency (%)
            1        2        0.0%
            2        2        0.0%
            4        1        0.0%
            5        1        0.0%
            6        2        0.0%

            Maximum 5 values

            Value    Count    Frequency (%)
            57454    1        0.0%
            57455    1        0.0%
            57456    1        0.0%
            57457    1        0.0%
            57458    1        0.0%
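
Several of the derived statistics for `id` can be cross-checked against the others. The sketch below uses the rounded values from the report and assumes the standard textbook definitions, so small differences from the reported figures come from rounding.

```python
# Rounded values as reported for the id column
mean, std = 26884, 16863
minimum, maximum = 1, 57458
q1, q3 = 12681, 40654

coef_of_variation = std / mean    # spread relative to the mean
value_range = maximum - minimum   # "Range"
iqr = q3 - q1                     # "Interquartile range"
print(round(coef_of_variation, 3), value_range, iqr)
```

The coefficient of variation and the range agree with the report, and the interquartile range is within one unit of the reported 27972, which is consistent with the quartiles having been rounded for display.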
        
             
         
 
        
    
    
        mass (g)
            Numeric

            Distinct count: 12577
            Unique (%): 27.5%
            Missing (%): 0.3%
            Missing (n): 131
            Infinite (%): 0.0%
            Infinite (n): 0

            Mean: 13278
            Minimum: 0
            Maximum: 60000000
            Zeros (%): 0.0%

            Quantile statistics
                Minimum: 0
                5-th percentile: 1.1
                Q1: 7.2
                Median: 32.61
                Q3: 202.9
                95-th percentile: 4000
                Maximum: 60000000
                Range: 60000000
                Interquartile range: 195.7

            Descriptive statistics
                Standard deviation: 574930
                Coef of variation: 43.298
                Kurtosis: 6798.4
                Mean: 13278
                MAD: 25113
                Skewness: 76.918
                Sum: 605430000
                Variance: 330540000000
                Memory size: 357.3 KiB

            Value                   Count    Frequency (%)
            1.3                     171      0.4%
            1.2                     140      0.3%
            1.4                     138      0.3%
            2.1                     130      0.3%
            2.4                     126      0.3%
            1.6                     120      0.3%
            0.5                     119      0.3%
            1.1                     116      0.3%
            3.8                     114      0.2%
            1.5                     111      0.2%
            Other values (12566)    44310    96.9%
            (Missing)               131      0.3%

            Minimum 5 values

            Value                   Count    Frequency (%)
            0.0                     19       0.0%
            0.01                    2        0.0%
            0.013000000000000001    1        0.0%
            0.02                    1        0.0%
            0.03                    1        0.0%

            Maximum 5 values

            Value         Count    Frequency (%)
            28000000.0    1        0.0%
            30000000.0    1        0.0%
            50000000.0    1        0.0%
            58200000.0    1        0.0%
            60000000.0    1        0.0%
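
The `Skewed` warning for `mass (g)` is driven by the skewness coefficient γ1. As a rough illustration of what that number measures, here is a standard-library sketch of the population (biased) skewness. pandas reports a bias-adjusted estimate, so its values differ somewhat on small samples. The function name `skewness` and the sample lists are invented for the example.

```python
def skewness(values):
    """Population skewness: third central moment divided by the
    second central moment raised to the power 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
heavy_tail = [1, 2, 3, 4, 1000]  # one huge value among small ones

print(skewness(symmetric), skewness(heavy_tail))
```

A symmetric sample gives zero, while a single very large value pushes the coefficient well above zero. That is the pattern in the mass column, where the median is 32.61 but the maximum is 60000000.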
        
             
         
 
        
    
    
        mixed
            Categorical

            Distinct count: 2
            Unique (%): 0.0%
            Missing (%): 0.0%
            Missing (n): 0

            Value    Count    Frequency (%)
            1        22987    50.3%
            A        22739    49.7%
        
             
         
 
    
        name
            Categorical, Unique

            First 3 values
                Dominion Range 10049
                Yamato 74391
                Miller Range 090500

            Last 3 values
                Roberts Massif 04129
                Lewis Cliff 87087
                Northwest Africa 6079

            First 10 values

            Value          Count    Frequency (%)
            Aachen         1        0.0%
            Aachen copy    1        0.0%
            Aarhus         1        0.0%
            Aarhus copy    1        0.0%
            Abajo          1        0.0%

            Last 10 values

            Value             Count    Frequency (%)
            Österplana 062    1        0.0%
            Österplana 063    1        0.0%
            Österplana 064    1        0.0%
            Łowicz            1        0.0%
            Święcany          1        0.0%
        
             
         
 
    
        nametype
            Categorical

            Distinct count: 2
            Unique (%): 0.0%
            Missing (%): 0.0%
            Missing (n): 0

            Value     Count    Frequency (%)
            Valid     45651    99.8%
            Relict    75       0.2%
        
             
         
 
    
        recclass
            Categorical

            Distinct count: 466
            Unique (%): 1.0%
            Missing (%): 0.0%
            Missing (n): 0

            Value                 Count    Frequency (%)
            L6                    8287     18.1%
            H5                    7143     15.6%
            L5                    4797     10.5%
            H6                    4529     9.9%
            H4                    4211     9.2%
            LL5                   2766     6.0%
            LL6                   2043     4.5%
            L4                    1253     2.7%
            H4/5                  428      0.9%
            CM2                   416      0.9%
            Other values (456)    9853     21.5%
        
             
         
 
    
        reclat
            Numeric

            Distinct count: 12739
            Unique (%): 27.9%
            Missing (%): 16.0%
            Missing (n): 7315
            Infinite (%): 0.0%
            Infinite (n): 0

            Mean: -39.107
            Minimum: -87.367
            Maximum: 81.167
            Zeros (%): 14.1%

            Quantile statistics
                Minimum: -87.367
                5-th percentile: -84.355
                Q1: -76.714
                Median: -71.5
                Q3: 0
                95-th percentile: 34.494
                Maximum: 81.167
                Range: 168.53
                Interquartile range: 76.714

            Descriptive statistics
                Standard deviation: 46.386
                Coef of variation: -1.1861
                Kurtosis: -1.4769
                Mean: -39.107
                MAD: 43.937
                Skewness: 0.49132
                Sum: -1502100
                Variance: 2151.7
                Memory size: 357.3 KiB

            Value                   Count    Frequency (%)
            0.0                     6438     14.1%
            -71.5                   4761     10.4%
            -84.0                   3040     6.6%
            -72.0                   1506     3.3%
            -79.68333               1130     2.5%
            -76.71667               680      1.5%
            -76.18333               539      1.2%
            -84.21667               263      0.6%
            -86.36667               226      0.5%
            -86.71667               217      0.5%
            Other values (12728)    19611    42.9%
            (Missing)               7315     16.0%

            Minimum 5 values

            Value        Count    Frequency (%)
            -87.36667    4        0.0%
            -87.03333    3        0.0%
            -86.93333    3        0.0%
            -86.71667    217      0.5%
            -86.56667    17       0.0%

            Maximum 5 values

            Value       Count    Frequency (%)
            72.68333    1        0.0%
            72.88333    1        0.0%
            76.13333    1        0.0%
            76.53333    1        0.0%
            81.16667    1        0.0%
        
             
         
 
        
    
    
        reclat_city
            Highly correlated
        
    
    This variable is highly correlated with reclat and should be ignored for analysis
    
        
            Correlation: 0.99424
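
The reported correlation can be sanity-checked analytically. If independent noise with standard deviation σ_ε is added to a variable with standard deviation σ_x, the Pearson correlation between the original and the noisy variable is σ_x / sqrt(σ_x² + σ_ε²). The sketch below plugs in the standard deviation reported for `reclat` and the noise scale from the setup cell. It assumes the noise is independent of `reclat` and ignores sampling error.

```python
import math

sigma_reclat = 46.386  # standard deviation reported for reclat
sigma_noise = 5.0      # scale passed to np.random.normal in the setup cell

# cov(x, x + e) = var(x) and var(x + e) = var(x) + var(e) for independent e
expected_rho = sigma_reclat / math.sqrt(sigma_reclat**2 + sigma_noise**2)
print(round(expected_rho, 4))
```

The result is about 0.9942, in line with the correlation shown in the report.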
         
    
    
        reclong
            Numeric
        
    
    
        
            
                
                    Distinct count 
                    14641 
                 
                
                    Unique (%) 
                    32.0% 
                 
                
                    Missing (%) 
                    16.0% 
                 
                
                    Missing (n) 
                    7315 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Mean 
                    61.053 
                 
                
                    Minimum 
                    -165.43 
                 
                
                    Maximum 
                    354.47 
                 
                
                    Zeros (%) 
                    13.6% 
                 
            
        
    
    
 
    
    
        
            
                Quantile statistics
                
                    
                        Minimum 
                        -165.43 
                     
                    
                        5-th percentile 
                        -90.427 
                     
                    
                        Q1 
                        0 
                     
                    
                        Median 
                        35.667 
                     
                    
                        Q3 
                        157.17 
                     
                    
                        95-th percentile 
                        168 
                     
                    
                        Maximum 
                        354.47 
                     
                    
                        Range 
                        519.91 
                     
                    
                        Interquartile range 
                        157.17 
                     
                
            
            
                Descriptive statistics
                
                    
                        Standard deviation 
                        80.655 
                     
                    
                        Coef of variation 
                        1.3211 
                     
                    
                        Kurtosis 
                        -0.73139 
                     
                    
                        Mean 
                        61.053 
                     
                    
                        MAD 
                        67.606 
                     
                    
                        Skewness 
                        -0.17438 
                     
                    
                        Sum 
                        2345100 
                     
                    
                        Variance 
                        6505.3 
                     
                    
                        Memory size 
                        357.3 KiB 
                     
                
            
        
        
             
        
        
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        0.0 
        6214 
        13.6% 
        
             
         
 
        35.66667 
        4985 
        10.9% 
        
             
         
 
        168.0 
        3040 
        6.6% 
        
             
         
 
        26.0 
        1506 
        3.3% 
        
             
         
 
        159.75 
        657 
        1.4% 
        
             
         
 
        159.66666999999998 
        637 
        1.4% 
        
             
         
 
        157.16666999999998 
        542 
        1.2% 
        
             
         
 
        155.75 
        473 
        1.0% 
        
             
         
 
        160.5 
        263 
        0.6% 
        
             
         
 
        -70.0 
        228 
        0.5% 
        
             
         
 
        Other values (14630) 
        19866 
        43.4% 
        
             
         
 
        (Missing) 
        7315 
        16.0% 
        
             
         
 
        
        
            Minimum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        -165.43333 
        9 
        0.0% 
        
             
         
 
        -165.11667 
        17 
        0.0% 
        
             
         
 
        -163.16667 
        1 
        0.0% 
        
             
         
 
        -162.55 
        1 
        0.0% 
        
             
         
 
        -157.86667 
        1 
        0.0% 
        
             
         
 
            Maximum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        175.13333 
        1 
        0.0% 
        
             
         
 
        175.73028 
        1 
        0.0% 
        
             
         
 
        178.08333 
        1 
        0.0% 
        
             
         
 
        178.2 
        1 
        0.0% 
        
             
         
 
        354.47333 
        1 
        0.0% 
        
             
         
 
        
    
    
        source
            Constant
        
    
    This variable is constant and should be ignored for analysis
    
        
            Constant value 
            NASA 
         
    
    
        year
            Date
        
    
    
        
            
                
                    Distinct count 
                    246 
                 
                
                    Unique (%) 
                    0.5% 
                 
                
                    Missing (%) 
                    0.7% 
                 
                
                    Missing (n) 
                    312 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Minimum 
                    1688-01-01 00:00:00 
                 
                
                    Maximum 
                    2101-01-01 00:00:00 
                 
            
        
    
    
 
    
 
    
        Correlations
    
    
    
    
  
    
        Sample
    
    
    
        
  
    
       
           name   id  nametype     recclass  mass (g)  fall        year     reclat     reclong               GeoLocation  source  boolean  mixed  reclat_city
    0    Aachen    1     Valid           L5      21.0  Fell  1880-01-01   50.77500     6.08333     (50.775000, 6.083330)    NASA     True      A    53.104124
    1    Aarhus    2     Valid           H6     720.0  Fell  1951-01-01   56.18333    10.23333    (56.183330, 10.233330)    NASA     True      1    58.838867
    2      Abee    6     Valid          EH4  107000.0  Fell  1952-01-01   54.21667  -113.00000  (54.216670, -113.000000)    NASA     True      1    59.307067
    3  Acapulco   10     Valid  Acapulcoite    1914.0  Fell  1976-01-01   16.88333   -99.90000   (16.883330, -99.900000)    NASA    False      A    23.087539
    4   Achiras  370     Valid           L6     780.0  Fell  1902-01-01  -33.16667   -64.95000  (-33.166670, -64.950000)    NASA     True      1   -34.589431
  
    
In [4]:
    
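# Build the report once and keep a reference so it can be saved and displayed
pfr = pandas_profiling.ProfileReport(df)
# Write the report to a standalone HTML file
pfr.to_file("/tmp/example.html")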
    
In [5]:
    
pfr
    
    Out[5]:
    
        Overview
    
    
    
        Dataset info
        
            
            
                Number of variables 
                14  
             
            
                Number of observations 
                45726  
             
            
                Total Missing (%) 
                3.5%  
             
            
                Total size in memory 
                4.6 MiB  
             
            
                Average record size in memory 
                105.0 B  
             
            
        
    
    
        Variables types
        
            
            
                Numeric 
                4  
             
            
                Categorical 
                5  
             
            
                Boolean 
                1  
             
            
                Date 
                1  
             
            
                Text (Unique) 
                1  
             
            
                Rejected 
                2  
             
            
                Unsupported 
                0  
             
            
        
    
    
        
        Warnings
        GeoLocation has 7315 / 16.0% missing values [Missing]
        GeoLocation has a high cardinality: 17101 distinct values [Warning]
        mass (g) is highly skewed (γ1 = 76.918) [Skewed]
        recclass has a high cardinality: 466 distinct values [Warning]
        reclat has 6438 / 14.1% zeros [Zeros]
        reclat has 7315 / 16.0% missing values [Missing]
        reclat_city is highly correlated with reclat (ρ = 0.99424) [Rejected]
        reclong has 6214 / 13.6% zeros [Zeros]
        reclong has 7315 / 16.0% missing values [Missing]
        source has constant value NASA [Rejected]
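
The percentages in these warnings follow from the raw counts and the 45726 observations listed in the overview. As a rough check, the sketch below recomputes them in plain Python. The helper name `share` is hypothetical and not part of pandas_profiling; the counts are copied from the warnings.

```python
# Hypothetical helper for illustration only; not part of pandas_profiling.
N_OBSERVATIONS = 45726  # "Number of observations" from the overview


def share(count, total=N_OBSERVATIONS):
    """Return count as a percentage of total, formatted with one decimal."""
    return f"{100 * count / total:.1f}%"


missing_geo = share(7315)    # GeoLocation, reclat and reclong missing values
zeros_reclat = share(6438)   # reclat zeros
zeros_reclong = share(6214)  # reclong zeros
print(missing_geo, zeros_reclat, zeros_reclong)
```

The three results agree with the 16.0%, 14.1% and 13.6% shown in the warnings.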
    
    
        Variables
    
    
    
        GeoLocation
            Categorical
        
    
    
        
            Distinct count 
            17101 
         
        
            Unique (%) 
            37.4% 
         
        
            Missing (%) 
            16.0% 
         
        
            Missing (n) 
            7315 
         
    
    
        
    (0.000000, 0.000000) 
    
        
            6214
        
        
     
 
    (-71.500000, 35.666670) 
    
        
             
        
        4761
     
 
    (-84.000000, 168.000000) 
    
        
             
        
        3040
     
 
    Other values (17097) 
    
        
            24396
        
        
     
 
    (Missing) 
    
        
            7315
        
        
     
 
    
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        (0.000000, 0.000000) 
        6214 
        13.6% 
        
             
         
 
        (-71.500000, 35.666670) 
        4761 
        10.4% 
        
             
         
 
        (-84.000000, 168.000000) 
        3040 
        6.6% 
        
             
         
 
        (-72.000000, 26.000000) 
        1505 
        3.3% 
        
             
         
 
        (-79.683330, 159.750000) 
        657 
        1.4% 
        
             
         
 
        (-76.716670, 159.666670) 
        637 
        1.4% 
        
             
         
 
        (-76.183330, 157.166670) 
        539 
        1.2% 
        
             
         
 
        (-79.683330, 155.750000) 
        473 
        1.0% 
        
             
         
 
        (-84.216670, 160.500000) 
        263 
        0.6% 
        
             
         
 
        (-86.366670, -70.000000) 
        226 
        0.5% 
        
             
         
 
        Other values (17090) 
        20096 
        43.9% 
        
             
         
 
        (Missing) 
        7315 
        16.0% 
        
             
         
 
    
        boolean
            Boolean
        
    
    
        
            
                
                    Distinct count 
                    2 
                 
                
                    Unique (%) 
                    0.0% 
                 
                
                    Missing (%) 
                    0.0% 
                 
                
                    Missing (n) 
                    0 
                 
            
        
        
            
                
                    Mean 
                    0.49821 
                 
            
        
    
    
        
    True 
    
        
            22781
        
        
     
 
    False 
    
        
            22945
        
        
     
 
    
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        True 
        22781 
        49.8% 
        
             
         
 
        False 
        22945 
        50.2% 
        
             
         
 
    
        fall
            Categorical
        
    
    
        
            Distinct count 
            2 
         
        
            Unique (%) 
            0.0% 
         
        
            Missing (%) 
            0.0% 
         
        
            Missing (n) 
            0 
         
    
    
        
    Found 
    
        
            44609
        
        
     
 
    Fell 
    
        
             
        
        1117
     
 
    
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        Found 
        44609 
        97.6% 
        
             
         
 
        Fell 
        1117 
        2.4% 
        
             
         
 
    
        id
            Numeric
        
    
    
        
            
                
                    Distinct count 
                    45716 
                 
                
                    Unique (%) 
                    100.0% 
                 
                
                    Missing (%) 
                    0.0% 
                 
                
                    Missing (n) 
                    0 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Mean 
                    26884 
                 
                
                    Minimum 
                    1 
                 
                
                    Maximum 
                    57458 
                 
                
                    Zeros (%) 
                    0.0% 
                 
            
        
    
    
 
    
    
        
            
                Quantile statistics
                
                    
                        Minimum 
                        1 
                     
                    
                        5-th percentile 
                        2388.8 
                     
                    
                        Q1 
                        12681 
                     
                    
                        Median 
                        24256 
                     
                    
                        Q3 
                        40654 
                     
                    
                        95-th percentile 
                        54891 
                     
                    
                        Maximum 
                        57458 
                     
                    
                        Range 
                        57457 
                     
                    
                        Interquartile range 
                        27972 
                     
                
            
            
                Descriptive statistics
                
                    
                        Standard deviation 
                        16863 
                     
                    
                        Coef of variation 
                        0.62727 
                     
                    
                        Kurtosis 
                        -1.1601 
                     
                    
                        Mean 
                        26884 
                     
                    
                        MAD 
                        14490 
                     
                    
                        Skewness 
                        0.26653 
                     
                    
                        Sum 
                        1229293495 
                     
                    
                        Variance 
                        284380000 
                     
                    
                        Memory size 
                        357.3 KiB 
                     
                
            
        
        
             
        
        
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        417 
        2 
        0.0% 
        
             
         
 
        398 
        2 
        0.0% 
        
             
         
 
        1 
        2 
        0.0% 
        
             
         
 
        6 
        2 
        0.0% 
        
             
         
 
        392 
        2 
        0.0% 
        
             
         
 
        370 
        2 
        0.0% 
        
             
         
 
        379 
        2 
        0.0% 
        
             
         
 
        2 
        2 
        0.0% 
        
             
         
 
        390 
        2 
        0.0% 
        
             
         
 
        10 
        2 
        0.0% 
        
             
         
 
        Other values (45706) 
        45706 
        100.0% 
        
             
         
 
        
        
            Minimum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        1 
        2 
        0.0% 
        
             
         
 
        2 
        2 
        0.0% 
        
             
         
 
        4 
        1 
        0.0% 
        
             
         
 
        5 
        1 
        0.0% 
        
             
         
 
        6 
        2 
        0.0% 
        
             
         
 
            Maximum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        57454 
        1 
        0.0% 
        
             
         
 
        57455 
        1 
        0.0% 
        
             
         
 
        57456 
        1 
        0.0% 
        
             
         
 
        57457 
        1 
        0.0% 
        
             
         
 
        57458 
        1 
        0.0% 
        
             
         
 
        
    
    
        mass (g)
            Numeric
        
    
    
        
            
                
                    Distinct count 
                    12577 
                 
                
                    Unique (%) 
                    27.5% 
                 
                
                    Missing (%) 
                    0.3% 
                 
                
                    Missing (n) 
                    131 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Mean 
                    13278 
                 
                
                    Minimum 
                    0 
                 
                
                    Maximum 
                    60000000 
                 
                
                    Zeros (%) 
                    0.0% 
                 
            
        
    
    
 
    
    
        
            
                Quantile statistics
                
                    
                        Minimum 
                        0 
                     
                    
                        5-th percentile 
                        1.1 
                     
                    
                        Q1 
                        7.2 
                     
                    
                        Median 
                        32.61 
                     
                    
                        Q3 
                        202.9 
                     
                    
                        95-th percentile 
                        4000 
                     
                    
                        Maximum 
                        60000000 
                     
                    
                        Range 
                        60000000 
                     
                    
                        Interquartile range 
                        195.7 
                     
                
            
            
                Descriptive statistics
                
                    
                        Standard deviation 
                        574930 
                     
                    
                        Coef of variation 
                        43.298 
                     
                    
                        Kurtosis 
                        6798.4 
                     
                    
                        Mean 
                        13278 
                     
                    
                        MAD 
                        25113 
                     
                    
                        Skewness 
                        76.918 
                     
                    
                        Sum 
                        605430000 
                     
                    
                        Variance 
                        330540000000 
                     
                    
                        Memory size 
                        357.3 KiB 
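
Several of the `mass (g)` figures above are derived from one another. The sketch below recomputes the interquartile range, range, coefficient of variation and variance from the displayed values. It assumes only the rounded numbers shown in the report, so small deviations from the reported figures are expected.

```python
# Values copied from the "mass (g)" tables above (already rounded by the report).
q1, q3 = 7.2, 202.9
minimum, maximum = 0.0, 60000000.0
mean, std = 13278.0, 574930.0

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
coef_variation = std / mean      # standard deviation relative to the mean
variance = std ** 2              # variance is the squared standard deviation

print(round(iqr, 1), value_range, round(coef_variation, 2), variance)
```

A coefficient of variation far above 1, as here, means the spread dwarfs the mean, which is consistent with the skewness warning for this variable.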
                     
                
            
        
        
             
        
        
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        1.3 
        171 
        0.4% 
        
             
         
 
        1.2 
        140 
        0.3% 
        
             
         
 
        1.4 
        138 
        0.3% 
        
             
         
 
        2.1 
        130 
        0.3% 
        
             
         
 
        2.4 
        126 
        0.3% 
        
             
         
 
        1.6 
        120 
        0.3% 
        
             
         
 
        0.5 
        119 
        0.3% 
        
             
         
 
        1.1 
        116 
        0.3% 
        
             
         
 
        3.8 
        114 
        0.2% 
        
             
         
 
        1.5 
        111 
        0.2% 
        
             
         
 
        Other values (12566) 
        44310 
        96.9% 
        
             
         
 
        (Missing) 
        131 
        0.3% 
        
             
         
 
        
        
            Minimum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        0.0 
        19 
        0.0% 
        
             
         
 
        0.01 
        2 
        0.0% 
        
             
         
 
        0.013 
        1 
        0.0% 
        
             
         
 
        0.02 
        1 
        0.0% 
        
             
         
 
        0.03 
        1 
        0.0% 
        
             
         
 
            Maximum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        28000000.0 
        1 
        0.0% 
        
             
         
 
        30000000.0 
        1 
        0.0% 
        
             
         
 
        50000000.0 
        1 
        0.0% 
        
             
         
 
        58200000.0 
        1 
        0.0% 
        
             
         
 
        60000000.0 
        1 
        0.0% 
        
             
         
 
        
    
    
        mixed
            Categorical
        
    
    
        
            Distinct count 
            2 
         
        
            Unique (%) 
            0.0% 
         
        
            Missing (%) 
            0.0% 
         
        
            Missing (n) 
            0 
         
    
    
        
    1 
    
        
            22987
        
        
     
 
    A 
    
        
            22739
        
        
     
 
    
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        1 
        22987 
        50.3% 
        
             
         
 
        A 
        22739 
        49.7% 
        
             
         
 
    
        name
            Categorical, Unique
        
    
  
    
      First 3 values 
     
  
  
    
      Dominion Range 10049 
     
    
      Yamato 74391 
     
    
      Miller Range 090500 
     
  
  
    
      Last 3 values 
     
  
  
    
      Roberts Massif 04129 
     
    
      Lewis Cliff 87087 
     
    
      Northwest Africa 6079 
     
  
    First 10 values
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        Aachen 
        1 
        0.0% 
        
             
         
 
        Aachen copy 
        1 
        0.0% 
        
             
         
 
        Aarhus 
        1 
        0.0% 
        
             
         
 
        Aarhus copy 
        1 
        0.0% 
        
             
         
 
        Abajo 
        1 
        0.0% 
        
             
         
 
    Last 10 values
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        Österplana 062 
        1 
        0.0% 
        
             
         
 
        Österplana 063 
        1 
        0.0% 
        
             
         
 
        Österplana 064 
        1 
        0.0% 
        
             
         
 
        Łowicz 
        1 
        0.0% 
        
             
         
 
        Święcany 
        1 
        0.0% 
        
             
         
 
    
        nametype
            Categorical
        
    
    
        
            Distinct count 
            2 
         
        
            Unique (%) 
            0.0% 
         
        
            Missing (%) 
            0.0% 
         
        
            Missing (n) 
            0 
         
    
    
        
    Valid 
    
        
            45651
        
        
     
 
    Relict 
    
        
             
        
        75
     
 
    
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        Valid 
        45651 
        99.8% 
        
             
         
 
        Relict 
        75 
        0.2% 
        
             
         
 
    
        recclass
            Categorical
        
    
    
        
            Distinct count 
            466 
         
        
            Unique (%) 
            1.0% 
         
        
            Missing (%) 
            0.0% 
         
        
            Missing (n) 
            0 
         
    
    
        
    L6 
    
        
            8287
        
        
     
 
    H5 
    
        
            7143
        
        
     
 
    L5 
    
        
             
        
        4797
     
 
    Other values (463) 
    
        
            25499
        
        
     
 
    
    
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        L6 
        8287 
        18.1% 
        
             
         
 
        H5 
        7143 
        15.6% 
        
             
         
 
        L5 
        4797 
        10.5% 
        
             
         
 
        H6 
        4529 
        9.9% 
        
             
         
 
        H4 
        4211 
        9.2% 
        
             
         
 
        LL5 
        2766 
        6.0% 
        
             
         
 
        LL6 
        2043 
        4.5% 
        
             
         
 
        L4 
        1253 
        2.7% 
        
             
         
 
        H4/5 
        428 
        0.9% 
        
             
         
 
        CM2 
        416 
        0.9% 
        
             
         
 
        Other values (456) 
        9853 
        21.5% 
        
             
         
 
    
        reclat
            Numeric
        
    
    
        
            
                
                    Distinct count 
                    12739 
                 
                
                    Unique (%) 
                    27.9% 
                 
                
                    Missing (%) 
                    16.0% 
                 
                
                    Missing (n) 
                    7315 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Mean 
                    -39.107 
                 
                
                    Minimum 
                    -87.367 
                 
                
                    Maximum 
                    81.167 
                 
                
                    Zeros (%) 
                    14.1% 
                 
            
        
    
    
 
    
    
        
            
                Quantile statistics
                
                    
                        Minimum 
                        -87.367 
                     
                    
                        5-th percentile 
                        -84.355 
                     
                    
                        Q1 
                        -76.714 
                     
                    
                        Median 
                        -71.5 
                     
                    
                        Q3 
                        0 
                     
                    
                        95-th percentile 
                        34.494 
                     
                    
                        Maximum 
                        81.167 
                     
                    
                        Range 
                        168.53 
                     
                    
                        Interquartile range 
                        76.714 
                     
                
            
            
                Descriptive statistics
                
                    
                        Standard deviation 
                        46.386 
                     
                    
                        Coef of variation 
                        -1.1861 
                     
                    
                        Kurtosis 
                        -1.4769 
                     
                    
                        Mean 
                        -39.107 
                     
                    
                        MAD 
                        43.937 
                     
                    
                        Skewness 
                        0.49132 
                     
                    
                        Sum 
                        -1502100 
                     
                    
                        Variance 
                        2151.7 
                     
                    
                        Memory size 
                        357.3 KiB 
                     
                
            
        
        
             
        
        
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        0.0 
        6438 
        14.1% 
        
             
         
 
        -71.5 
        4761 
        10.4% 
        
             
         
 
        -84.0 
        3040 
        6.6% 
        
             
         
 
        -72.0 
        1506 
        3.3% 
        
             
         
 
        -79.68333 
        1130 
        2.5% 
        
             
         
 
        -76.71667 
        680 
        1.5% 
        
             
         
 
        -76.18333 
        539 
        1.2% 
        
             
         
 
        -84.21667 
        263 
        0.6% 
        
             
         
 
        -86.36667 
        226 
        0.5% 
        
             
         
 
        -86.71667 
        217 
        0.5% 
        
             
         
 
        Other values (12728) 
        19611 
        42.9% 
        
             
         
 
        (Missing) 
        7315 
        16.0% 
        
             
         
 
        
        
            Minimum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        -87.36667 
        4 
        0.0% 
        
             
         
 
        -87.03333 
        3 
        0.0% 
        
             
         
 
        -86.93333 
        3 
        0.0% 
        
             
         
 
        -86.71667 
        217 
        0.5% 
        
             
         
 
        -86.56667 
        17 
        0.0% 
        
             
         
 
            Maximum 5 values
            
    
    
        Value 
        Count 
        Frequency (%) 
          
     
    
    
        72.68333 
        1 
        0.0% 
        
             
         
 
        72.88333 
        1 
        0.0% 
        
             
         
 
        76.13333 
        1 
        0.0% 
        
             
         
 
        76.53333 
        1 
        0.0% 
        
             
         
 
        81.16667 
        1 
        0.0% 
        
             
         
 
        
    
    
        reclat_city
            Highly correlated
        
    
    This variable is highly correlated with reclat and should be ignored for analysis
    
        
            Correlation 
            0.99424 
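
The reported correlation can be sanity-checked analytically. In the setup cell (In [2]), `reclat_city` is `reclat` plus normal noise with scale 5. Assuming the noise is independent of `reclat`, the Pearson correlation between x and x + e is sqrt(Var(x) / (Var(x) + 5²)). A minimal sketch using the variance reported for `reclat` above (the function name is illustrative, not a pandas_profiling API):

```python
import math


def expected_correlation(signal_variance, noise_scale):
    """Pearson correlation between x and x + e, for noise e independent of x."""
    return math.sqrt(signal_variance / (signal_variance + noise_scale ** 2))


# Variance of reclat as reported above; noise scale taken from the setup cell.
rho = expected_correlation(2151.7, 5)
print(round(rho, 5))
```

The result comes out at about 0.9942, in line with the ρ shown in the report. The exact empirical value depends on the random noise drawn in that run.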
         
    
    
        reclong
            Numeric
        
    
    
        
            
                
                    Distinct count 
                    14641 
                 
                
                    Unique (%) 
                    32.0% 
                 
                
                    Missing (%) 
                    16.0% 
                 
                
                    Missing (n) 
                    7315 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Mean 
                    61.053 
                 
                
                    Minimum 
                    -165.43 
                 
                
                    Maximum 
                    354.47 
                 
                
                    Zeros (%) 
                    13.6% 
                 
            
        
    
    
 
    
    
        
            
                Quantile statistics
                
                    
                        Minimum 
                        -165.43 
                     
                    
                        5-th percentile 
                        -90.427 
                     
                    
                        Q1 
                        0 
                     
                    
                        Median 
                        35.667 
                     
                    
                        Q3 
                        157.17 
                     
                    
                        95-th percentile 
                        168 
                     
                    
                        Maximum 
                        354.47 
                     
                    
                        Range 
                        519.91 
                     
                    
                        Interquartile range 
                        157.17 
                     
                
            
            
                Descriptive statistics
                
                    
                        Standard deviation 
                        80.655 
                     
                    
                        Coef of variation 
                        1.3211 
                     
                    
                        Kurtosis 
                        -0.73139 
                     
                    
                        Mean 
                        61.053 
                     
                    
                        MAD 
                        67.606 
                     
                    
                        Skewness 
                        -0.17438 
                     
                    
                        Sum 
                        2345100 
                     
                    
                        Variance 
                        6505.3 
                     
                    
                        Memory size 
                        357.3 KiB 
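
The quantile and descriptive statistics above can be approximated directly with pandas. The following is a minimal sketch, not the report's own implementation: the series `s` is a small stand-in for the numeric column, `summarize` is a hypothetical helper, and it assumes that "MAD" means mean absolute deviation and that "Zeros (%)" is measured against all rows, including missing ones.

```python
import pandas as pd

# Small stand-in series (illustrative values only); in the notebook this
# would be one numeric column of df.
s = pd.Series([-165.43, 0.0, 0.0, 35.667, 157.17, 168.0, 354.47, None])


def summarize(series: pd.Series) -> dict:
    """Recompute a few of the report's statistics; NaN values are skipped."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    mean = series.mean()
    return {
        "missing_n": int(series.isna().sum()),
        # Share of all rows (missing included) that are exactly zero
        "zeros_pct": float((series == 0).mean() * 100),
        "range": float(series.max() - series.min()),
        "iqr": float(q3 - q1),
        # Coefficient of variation: standard deviation relative to the mean
        "cv": float(series.std() / mean),
        # Assumption: MAD here is the mean absolute deviation around the mean
        "mad": float((series - mean).abs().mean()),
        "skewness": float(series.skew()),
        # pandas reports excess kurtosis (a normal distribution gives 0)
        "kurtosis": float(series.kurt()),
    }


stats = summarize(s)
```

Calling `summarize(df['reclong'])` on the real data should give values close to the ones reported, although small differences are possible if the report uses different estimators.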
                     
                
            
        
        
             
        
        
            
    
    
        Value                  Count   Frequency (%)
        0.0                    6214    13.6%
        35.66667               4985    10.9%
        168.0                  3040    6.6%
        26.0                   1506    3.3%
        159.75                 657     1.4%
        159.66667              637     1.4%
        157.16667              542     1.2%
        155.75                 473     1.0%
        160.5                  263     0.6%
        -70.0                  228     0.5%
        Other values (14630)   19866   43.4%
        (Missing)              7315    16.0%
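
A frequency table like the one above can be sketched with `Series.value_counts`. This is an illustration under assumptions, not the report's code: `s` is a toy stand-in for the column, and percentages are taken relative to all rows so that missing values get their own share.

```python
import pandas as pd

# Toy stand-in for the column, including two missing values
s = pd.Series([0.0, 0.0, 0.0, 35.66667, 35.66667, 168.0, None, None])

# dropna=False keeps missing values as their own row in the counts
table = s.value_counts(dropna=False).to_frame("Count")

# Frequency relative to all rows, rounded to one decimal place
table["Frequency (%)"] = (table["Count"] / len(s) * 100).round(1)
```

On the real data, `table.head(10)` would correspond to the ten most common values, with the remainder summed into an "Other values" row.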
        
             
         
 
        
        
            Minimum 5 values
            
    
    
        Value        Count   Frequency (%)
        -165.43333   9       0.0%
        -165.11667   17      0.0%
        -163.16667   1       0.0%
        -162.55      1       0.0%
        -157.86667   1       0.0%
        
             
         
 
            Maximum 5 values
            
    
    
        Value       Count   Frequency (%)
        175.13333   1       0.0%
        175.73028   1       0.0%
        178.08333   1       0.0%
        178.2       1       0.0%
        354.47333   1       0.0%
        
             
         
 
        
    
    
        source
            Constant
        
    
    This variable is constant and should be ignored for analysis
    
        
            Constant value 
            NASA 
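
Acting on this warning could look like the sketch below, which finds columns with at most one distinct value and drops them. The small frame reuses a few values from the sample rows; `constant_cols` and `df_clean` are names introduced here for illustration.

```python
import pandas as pd

# Small frame built from values in the sample rows, for illustration only
df = pd.DataFrame({
    "name": ["Aachen", "Aarhus", "Abee"],
    "recclass": ["L5", "H6", "EH4"],
    "source": ["NASA", "NASA", "NASA"],
})

# A column is treated as constant when it holds at most one distinct value
# (dropna=False counts NaN as a value, so an all-NaN column is also constant)
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

# Drop the constant columns before further analysis
df_clean = df.drop(columns=constant_cols)
```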
         
    
    
        year
            Date
        
    
    
        
            
                
                    Distinct count 
                    246 
                 
                
                    Unique (%) 
                    0.5% 
                 
                
                    Missing (%) 
                    0.7% 
                 
                
                    Missing (n) 
                    312 
                 
                
                    Infinite (%) 
                    0.0% 
                 
                
                    Infinite (n) 
                    0 
                 
            
        
        
            
                
                    Minimum 
                    1688-01-01 00:00:00 
                 
                
                    Maximum 
                    2101-01-01 00:00:00 
                 
            
        
    
    
 
    
 
    
        Correlations
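
The correlation plots are not reproduced in this text. As a rough sketch of how highly correlated pairs such as `reclat` and `reclat_city` could be found by hand, the example below computes a Pearson correlation matrix and lists pairs above a cut-off. The numeric values are toy data, and `THRESHOLD = 0.9` is an assumed cut-off, not a value taken from the report.

```python
import pandas as pd

# Toy values for illustration only; column names follow the notebook
df = pd.DataFrame({
    "reclat": [1.0, 2.0, 3.0, 4.0, 5.0],
    "reclat_city": [2.1, 3.9, 6.2, 7.8, 10.1],
    "reclong": [5.0, 1.0, 4.0, 2.0, 3.0],
})

THRESHOLD = 0.9  # assumed cut-off, not taken from the report

# Pairwise Pearson correlation between the numeric columns
corr = df.corr(method="pearson")

# Collect each unordered pair once (upper triangle, diagonal excluded)
high_pairs = [
    (a, b, round(float(corr.loc[a, b]), 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > THRESHOLD
]
```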
    
    
    
    
  
    
        Sample
    
    
    
        
  
    
       
         name      id   nametype  recclass     mass (g)  fall  year        reclat     reclong     GeoLocation               source  boolean  mixed  reclat_city
      0  Aachen    1    Valid     L5           21.0      Fell  1880-01-01  50.77500   6.08333     (50.775000, 6.083330)     NASA    True     A      53.104124
      1  Aarhus    2    Valid     H6           720.0     Fell  1951-01-01  56.18333   10.23333    (56.183330, 10.233330)    NASA    True     1      58.838867
      2  Abee      6    Valid     EH4          107000.0  Fell  1952-01-01  54.21667   -113.00000  (54.216670, -113.000000)  NASA    True     1      59.307067
      3  Acapulco  10   Valid     Acapulcoite  1914.0    Fell  1976-01-01  16.88333   -99.90000   (16.883330, -99.900000)   NASA    False    A      23.087539
      4  Achiras   370  Valid     L6           780.0     Fell  1902-01-01  -33.16667  -64.95000   (-33.166670, -64.950000)  NASA    True     1      -34.589431
     
  
    
Content source: JosPolfliet/pandas-profiling