mjpdatascience
Data Science & Analytics
Where MJP posts assignments
mjpdatascience · 7 years ago
Week 4 DAT
Table of Contents:
Discussion
My Week 4 Output
Week 4 Program
1) Discussion
Is crater depth correlated with crater diameter? More energetic impacts are expected to leave wider craters, but do they leave deeper craters? If there *is* a relationship between crater depth and diameter, does that relationship change depending on whether the craters are in the northern or southern hemisphere of Mars?
Last week, crater diameter and depth were analyzed. The data were subset to exclude craters with zero or negative depth. A scatterplot of crater depth vs diameter with a least-squares linear fit shows a positive trend, but the points are widely scattered. Even so, the Pearson correlation gives r = 0.49 (r^2 = 0.24), with a p value so small that the software returns 0.0.
Thus, crater depth and diameter are significantly correlated. However, only 24% of the variation in depth is explained by variation in diameter.
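To see why the software prints a p value of 0.0, here is a rough sketch using the textbook t statistic for a correlation coefficient. The r and n values are copied from the output in this post; the t-statistic route is my own illustration and is not necessarily how scipy computes the p value internally.

```python
import math
import sys

# Values copied from the output reported in this post
r = 0.48599045388140061
n = 76804

# Textbook t statistic for testing r against zero, with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"t statistic: {t_stat:.1f}")

# Smallest positive normal 64-bit float; a p value far below this underflows to 0.0
print(f"smallest normal float64: {sys.float_info.min:.3e}")
```

A t statistic above 150 on tens of thousands of degrees of freedom corresponds to a tail probability far smaller than the smallest float shown, so the printed 0.0 means "too small to represent", not "exactly zero".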
This week, the data were subset for northern (LATITUDE_CIRCLE_IMAGE > 0) and southern (LATITUDE_CIRCLE_IMAGE < 0) craters. The plots for craters in both hemispheres look similar. The Pearson correlations are as follows:
Northern: r = 0.46 (r^2 = 0.21)
Southern: r = 0.50 (r^2 = 0.25)
Both have p values so small (and thus significant) that the software outputs zeroes.
Thus, the relationship between crater depth and diameter does not appear to be moderated by whether the crater is in the northern or southern hemisphere.
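The moderator check boils down to computing the same correlation within each subgroup. Here is a minimal sketch on made-up numbers (the toy values and the HEMISPHERE column are my own assumptions; only the three dataset column names come from the crater database):

```python
import pandas as pd
from scipy import stats

# Toy data, invented for illustration; not taken from the crater database
toy = pd.DataFrame({
    "LATITUDE_CIRCLE_IMAGE": [10.0, 25.0, 40.0, 60.0, -5.0, -20.0, -45.0, -70.0],
    "DIAM_CIRCLE_IMAGE": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "DEPTH_RIMFLOOR_TOPOG": [0.1, 0.2, 0.3, 0.5, 0.2, 0.4, 0.6, 0.8],
})

# Hypothetical label column; unlike the real program, latitude 0 would land in "south" here
toy["HEMISPHERE"] = ["north" if lat > 0 else "south" for lat in toy["LATITUDE_CIRCLE_IMAGE"]]

# Pearson r of diameter vs depth, computed separately for each hemisphere
r_by_hemisphere = {}
for name, group in toy.groupby("HEMISPHERE"):
    r, p = stats.pearsonr(group["DIAM_CIRCLE_IMAGE"], group["DEPTH_RIMFLOOR_TOPOG"])
    r_by_hemisphere[name] = r
    print(name, len(group), round(r, 3))
```

If the r values for the subgroups were very different (or had opposite signs), that would suggest the third variable moderates the relationship.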
2) My Week 4 Output
length of data: 384343
length of data after subset: 76804
association between crater diameter and depth
Out[13]: (0.48599045388140061, 0.0)
length of marsnorth after subset: 28623
length of marssouth after subset: 48180
association between crater diameter and depth for northern hemisphere
Out[18]: (0.45861085510523519, 0.0)
association between crater diameter and depth for souther hemisphere
Out[19]: (0.49619338821080183, 0.0)
3) Week 4 Program
# -*- coding: utf-8 -*-
"""
Created on Sat Feb 17 18:57:28 2018

@author: MJP
"""
import pandas
import numpy
import seaborn
import scipy.stats  # import the stats submodule explicitly so scipy.stats.pearsonr is available
import matplotlib.pyplot as plt

#Read the Mars Crater Database into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

#Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
#Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix (that I don't fully understand) "for display formats to avoid run time errors", or so our instructors tell us
pandas.set_option('display.float_format', lambda x:'%f'%x)

# variables of interest are already numeric, so no need to change

# check length of data
print("length of data:")
print(len(marsdata))

# subset data for craters with depth > 0 (i.e. no raised craters or depthless ones)
marssub1 = marsdata[marsdata['DEPTH_RIMFLOOR_TOPOG']>0]

#make a copy of my new subsetted data
marssub2 = marssub1.copy()

# check that data are properly subset
print('\n'"length of data after subset:")
print(len(marssub2))

#basic scatterplot:  Q->Q
scat1 = seaborn.regplot(x="DIAM_CIRCLE_IMAGE", y="DEPTH_RIMFLOOR_TOPOG", fit_reg=True, data=marssub2)
plt.xlabel('Crater Diameter')
plt.ylabel('Crater Depth')
plt.title('Scatterplot for the Association Between Crater Diameter and Depth')

# pearsonr returns (r, p value); shown as Out[...] at the IPython prompt, wrap in print() when run as a script
print('association between crater diameter and depth')
scipy.stats.pearsonr(marssub2['DIAM_CIRCLE_IMAGE'], marssub2['DEPTH_RIMFLOOR_TOPOG'])

# subset data for craters in northern hemisphere
marsnorth = marssub2[marssub2['LATITUDE_CIRCLE_IMAGE']>0]

# subset data for craters in southern hemisphere
marssouth = marssub2[marssub2['LATITUDE_CIRCLE_IMAGE']<0]

# check that data are properly subset
print('\n'"length of marsnorth after subset:")
print(len(marsnorth))

print('\n'"length of marssouth after subset:")
print(len(marssouth))

#basic scatterplot:  Q->Q (new figure so this plot is not drawn on top of the previous one)
plt.figure()
scat2 = seaborn.regplot(x="DIAM_CIRCLE_IMAGE", y="DEPTH_RIMFLOOR_TOPOG", fit_reg=True, data=marsnorth)
plt.xlabel('Crater Diameter')
plt.ylabel('Crater Depth')
plt.title('Scatterplot for the Northern Hemisphere Association Between Crater Diameter and Depth')

#basic scatterplot:  Q->Q
plt.figure()
scat3 = seaborn.regplot(x="DIAM_CIRCLE_IMAGE", y="DEPTH_RIMFLOOR_TOPOG", fit_reg=True, data=marssouth)
plt.xlabel('Crater Diameter')
plt.ylabel('Crater Depth')
plt.title('Scatterplot for the Southern Hemisphere Association Between Crater Diameter and Depth')

print('association between crater diameter and depth for northern hemisphere')
scipy.stats.pearsonr(marsnorth['DIAM_CIRCLE_IMAGE'], marsnorth['DEPTH_RIMFLOOR_TOPOG'])

print('association between crater diameter and depth for souther hemisphere')
scipy.stats.pearsonr(marssouth['DIAM_CIRCLE_IMAGE'], marssouth['DEPTH_RIMFLOOR_TOPOG'])
mjpdatascience · 7 years ago
Week 3 DAT
Table of Contents:
Discussion
My Week 3 Output
Week 3 Program
1) Discussion
Is crater depth correlated with crater diameter? More energetic impacts are expected to leave wider craters, but do they leave deeper craters? Crater diameter and depth were analyzed. The data were subset to exclude craters with zero or negative depth. A scatterplot of crater depth vs diameter with a least-squares linear fit shows a positive trend, but the points are widely scattered. Even so, the Pearson correlation gives r = 0.49 (r^2 = 0.24), with a p value so small that the software returns 0.0.
Thus, crater depth and diameter are significantly correlated. However, only 24% of the variation in depth is explained by variation in diameter.
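The "24% of the variation" figure is just r squared. A small sketch, first on toy numbers I made up, then on the r value copied from the output below:

```python
from scipy import stats

# Toy data (made up for illustration, not from the crater database)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# pearsonr returns the correlation coefficient and its p value
r, p = stats.pearsonr(x, y)
print(f"toy data: r = {r:.2f}, r^2 = {r ** 2:.2f}")

# The same arithmetic applied to the r reported for the crater data
reported_r = 0.48599045388140061
reported_r_squared = reported_r ** 2
print(f"craters: r^2 = {reported_r_squared:.2f}")
```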
2) My Week 3 Output
length of data: 384343
length of data after subset: 76804
association between crater diameter and depth
Out[7]: (0.48599045388140061, 0.0)
3) Week 3 Program
# -*- coding: utf-8 -*-
"""
Created on Sat Feb 10 18:57:28 2018

@author: MJP
"""
import pandas
import numpy
import seaborn
import scipy.stats  # import the stats submodule explicitly so scipy.stats.pearsonr is available
import matplotlib.pyplot as plt

#Read the Mars Crater Database into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

#Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
#Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix (that I don't fully understand) "for display formats to avoid run time errors", or so our instructors tell us
pandas.set_option('display.float_format', lambda x:'%f'%x)

# variables of interest are already numeric, so no need to change

# check length of data
print("length of data:")
print(len(marsdata))

# subset data for craters with depth > 0 (i.e. no raised craters or depthless ones)
marssub1 = marsdata[marsdata['DEPTH_RIMFLOOR_TOPOG']>0]

#make a copy of my new subsetted data
marssub2 = marssub1.copy()

# check that data are properly subset
print('\n'"length of data after subset:")
print(len(marssub2))

#basic scatterplot:  Q->Q
scat1 = seaborn.regplot(x="DIAM_CIRCLE_IMAGE", y="DEPTH_RIMFLOOR_TOPOG", fit_reg=True, data=marssub2)
plt.xlabel('Crater Diameter')
plt.ylabel('Crater Depth')
plt.title('Scatterplot for the Association Between Crater Diameter and Depth')

# pearsonr returns (r, p value); shown as Out[...] at the IPython prompt, wrap in print() when run as a script
print('association between crater diameter and depth')
scipy.stats.pearsonr(marssub2['DIAM_CIRCLE_IMAGE'], marssub2['DEPTH_RIMFLOOR_TOPOG'])
mjpdatascience · 7 years ago
Week 2 DAT
Table of Contents:
Discussion
My Week 2 Program
Week 2 Output
1) Discussion
The majority of the craters in the Mars dataset do not have characterization of their ejecta listed. Is this because they have no ejecta, or because ejecta simply were not characterized for most craters? I expect larger craters are more likely to be fully characterized than smaller craters. To test this hypothesis, I split the craters into two groups, those smaller (0) and larger (1) than 10 km in diameter.
The dataset uses a space character where no ejecta characterization is listed. I used numpy.where (which I looked up on SciPy.org because we haven't learned about it) to create a binary variable EJECTA_YESNO, where 1 indicates the presence of an ejecta characterization and 0 indicates no ejecta characterization.
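For anyone else who hasn't met numpy.where yet, here is a minimal sketch of the recode on a toy column (the five values are a made-up stand-in; the real program applies the same call to the full MORPHOLOGY_EJECTA_1 column):

```python
import numpy
import pandas

# Toy stand-in for the MORPHOLOGY_EJECTA_1 column; a single space means "not characterized"
toy = pandas.DataFrame({"MORPHOLOGY_EJECTA_1": [" ", "Rd", " ", "SLEPS", "DLERS"]})

# numpy.where(condition, value_if_true, value_if_false) works element by element
toy["EJECTA_YESNO"] = numpy.where(toy["MORPHOLOGY_EJECTA_1"] == " ", 0, 1)
print(toy)
```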
A contingency table and a table of column percentages show that small craters are actually characterized at a higher rate than larger craters (about 55% vs. 40%), so I was entirely off-base. But to complete the exercise, the null hypothesis is that small and large craters have ejecta characterized at the same rate. A chi-square test returns a chi-square value of over 1300 and a p-value of ~2E-286. This is more than sufficient to reject the null hypothesis, just in the opposite direction from what I had proposed.
Because this analysis was 2x2, no post hoc paired comparison is needed.
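The chi-square result can be reproduced from the contingency table alone. This is a sketch that assumes the observed counts printed in the Week 2 output; the variable names are my own.

```python
import numpy
import scipy.stats

# Observed counts copied from the contingency table in the Week 2 output
# rows: EJECTA_YESNO = 0, 1; columns: diameter (0, 10] km, (10, 512.75] km
observed = numpy.array([[24225, 13943],
                        [29165, 9471]])

# Column percentages: share of each diameter group without / with an ejecta entry
col_pct = observed / observed.sum(axis=0)
print(col_pct)

# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, expected = scipy.stats.chi2_contingency(observed)
print(chi2, p, dof)
```

With one degree of freedom there are only two groups to compare, which is why no post hoc test is required.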
2) My Week 2 Program
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 18 20:21:53 2017

@author: MJP
"""
#Import necessary libraries
import pandas
import numpy
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
import scipy.stats

#Read the Mars Crater Database into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

#Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
#Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix (that I don't fully understand) "for display formats to avoid run time errors", or so our instructors tell us
pandas.set_option('display.float_format', lambda x:'%f'%x)

# check length of data
print("length of data:")
print(len(marsdata))

# subset data for craters with depth > 0 (i.e. no raised craters or depthless ones)
marssub1 = marsdata[marsdata['DEPTH_RIMFLOOR_TOPOG']>0]

#make a copy of my new subsetted data
marssub2 = marssub1.copy()

# check that data are properly subset
print('\n'"length of data after subset:")
print(len(marssub2))

# group the Crater Diameter by size. We're not printing here because there are too many data
c1=marssub2.groupby('DIAM_CIRCLE_IMAGE').size()

# cut Crater Diameter data
marssub2['DIAM_CAT'] = pandas.cut(marssub2.DIAM_CIRCLE_IMAGE, [0, 10, 512.75])

# change format from numeric to categorical
marssub2['DIAM_CAT'] = marssub2['DIAM_CAT'].astype('category')

print('\n''describe DIAM_CAT')
desc3 = marssub2['DIAM_CAT'].describe()
print(desc3)

print('\n''crater diameter counts')
c11 = marssub2['DIAM_CAT'].value_counts(sort=False, dropna=True)
print(c11)

# make a copy and add a column that lists 1 if ejecta are categorized, else 0
marssub4 = marssub2.copy()
marssub4["EJECTA_YESNO"] = numpy.where(marssub4["MORPHOLOGY_EJECTA_1"] == " ", 0, 1)

# contingency table of observed counts
ct1=pandas.crosstab(marssub4["EJECTA_YESNO"], marssub4["DIAM_CAT"])
print(ct1)

# column percentages
colsum=ct1.sum(axis=0)
colpct=ct1/colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs1= scipy.stats.chi2_contingency(ct1)
print(cs1)

# recategorize small craters as 0 and large as 1
# (the Interval categories do not match a string, so compare the diameter itself)
marssub4["DIAM_BIGSMALL"] = numpy.where(marssub4["DIAM_CIRCLE_IMAGE"] > 10, 1, 0)
print(marssub4.dtypes)
# look at how one category value is stored
print(marssub4["DIAM_CAT"].iloc[0])

# using ols function for calculating the F-statistic and associated p value
model1 = smf.ols(formula='DEPTH_RIMFLOOR_TOPOG ~ C(DIAM_CAT)', data=marssub2)
results1 = model1.fit()
print(results1.summary())

marssub3 = marssub2[['DEPTH_RIMFLOOR_TOPOG', 'DIAM_CAT']].dropna()

print('\n''means for DEPTH_RIMFLOOR_TOPOG by crater diameter category')
m1= marssub3.groupby('DIAM_CAT').mean()
print(m1)

print('\n''standard deviations for DEPTH_RIMFLOOR_TOPOG by crater diameter category')
sd1 = marssub3.groupby('DIAM_CAT').std()
print(sd1)

#End of program
3) Week 2 Output
length of data: 384343
length of data after subset: 76804

describe DIAM_CAT
count           76804
unique              2
top       (0.0, 10.0]
freq            53390
Name: DIAM_CAT, dtype: object

crater diameter counts
(0.0, 10.0]       53390
(10.0, 512.75]    23414
Name: DIAM_CAT, dtype: int64

DIAM_CAT      (0.0, 10.0]  (10.0, 512.75]
EJECTA_YESNO
0                   24225           13943
1                   29165            9471

DIAM_CAT      (0.0, 10.0]  (10.0, 512.75]
EJECTA_YESNO
0                0.453737        0.595498
1                0.546263        0.404502

chi-square value, p value, expected counts
(1307.8495650262889, 2.225384735941522e-286, 1, array([[ 26532.33581584,  11635.66418416],
       [ 26857.66418416,  11778.33581584]]))

                             OLS Regression Results
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.193
Model:                              OLS   Adj. R-squared:                  0.193
Method:                   Least Squares   F-statistic:                 1.837e+04
Date:                  Sun, 04 Feb 2018   Prob (F-statistic):               0.00
Time:                          13:44:32   Log-Likelihood:                -22484.
No. Observations:                 76804   AIC:                         4.497e+04
Df Residuals:                     76802   BIC:                         4.499e+04
Df Model:                             1
Covariance Type:              nonrobust
=========================================================================================================================
                                                            coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------
Intercept                                                 0.2745      0.001    195.587      0.000       0.272       0.277
C(DIAM_CAT)[T.Interval(10.0, 512.75, closed='right')]     0.3445      0.003    135.546      0.000       0.340       0.350
==============================================================================
Omnibus:                    25528.287   Durbin-Watson:                   1.475
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           127882.759
Skew:                           1.535   Prob(JB):                         0.00
Kurtosis:                       8.526   Cond. No.                         2.42
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

means for DEPTH_RIMFLOOR_TOPOG by crater diameter category
                DEPTH_RIMFLOOR_TOPOG
DIAM_CAT
(0.0, 10.0]                 0.274485
(10.0, 512.75]              0.619009

standard deviations for DEPTH_RIMFLOOR_TOPOG by crater diameter category
                DEPTH_RIMFLOOR_TOPOG
DIAM_CAT
(0.0, 10.0]                 0.220973
(10.0, 512.75]              0.483306
mjpdatascience · 7 years ago
Week 1 DAT
Table of Contents:
My Week 1 Program
Week 1 Output
Discussion
1) My Week 1 Program
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 17 20:21:53 2017

@author: MJP
"""
#Import necessary libraries
import pandas
import numpy
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

#Read the Mars Crater Database into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

#Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
#Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix (that I don't fully understand) "for display formats to avoid run time errors", or so our instructors tell us
pandas.set_option('display.float_format', lambda x:'%f'%x)

# variables of interest are already numeric, so no need to change

# check length of data
print("length of data:")
print(len(marsdata))

# subset data for craters with depth > 0 (i.e. no raised craters or depthless ones)
marssub1 = marsdata[marsdata['DEPTH_RIMFLOOR_TOPOG']>0]

#make a copy of my new subsetted data
marssub2 = marssub1.copy()

# check that data are properly subset
print('\n'"length of data after subset:")
print(len(marssub2))

# group the Crater Diameter by size. We're not printing here because there are too many data
c1=marssub2.groupby('DIAM_CIRCLE_IMAGE').size()

# cut Crater Diameter data
marssub2['DIAM_CAT'] = pandas.cut(marssub2.DIAM_CIRCLE_IMAGE, [0, 10, 512.75])

# change format from numeric to categorical
marssub2['DIAM_CAT'] = marssub2['DIAM_CAT'].astype('category')

print('\n''describe DIAM_CAT')
desc3 = marssub2['DIAM_CAT'].describe()
print(desc3)

print('\n''crater diameter counts')
c11 = marssub2['DIAM_CAT'].value_counts(sort=False, dropna=True)
print(c11)

# using ols function for calculating the F-statistic and associated p value
model1 = smf.ols(formula='DEPTH_RIMFLOOR_TOPOG ~ C(DIAM_CAT)', data=marssub2)
results1 = model1.fit()
print(results1.summary())

marssub3 = marssub2[['DEPTH_RIMFLOOR_TOPOG', 'DIAM_CAT']].dropna()

print('\n''means for DEPTH_RIMFLOOR_TOPOG by crater diameter category')
m1= marssub3.groupby('DIAM_CAT').mean()
print(m1)

print('\n''standard deviations for DEPTH_RIMFLOOR_TOPOG by crater diameter category')
sd1 = marssub3.groupby('DIAM_CAT').std()
print(sd1)

#End of program
2) Week 1 Output
length of data: 384343
length of data after subset: 76804

describe DIAM_CAT
count           76804
unique              2
top       (0.0, 10.0]
freq            53390
Name: DIAM_CAT, dtype: object

crater diameter counts
(0.0, 10.0]       53390
(10.0, 512.75]    23414
Name: DIAM_CAT, dtype: int64

                             OLS Regression Results
================================================================================
Dep. Variable:     DEPTH_RIMFLOOR_TOPOG   R-squared:                       0.193
Model:                              OLS   Adj. R-squared:                  0.193
Method:                   Least Squares   F-statistic:                 1.837e+04
Date:                  Mon, 18 Dec 2017   Prob (F-statistic):               0.00
Time:                          22:11:54   Log-Likelihood:                -22484.
No. Observations:                 76804   AIC:                         4.497e+04
Df Residuals:                     76802   BIC:                         4.499e+04
Df Model:                             1
Covariance Type:              nonrobust
=========================================================================================================================
                                                            coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------
Intercept                                                 0.2745      0.001    195.587      0.000       0.272       0.277
C(DIAM_CAT)[T.Interval(10.0, 512.75, closed='right')]     0.3445      0.003    135.546      0.000       0.340       0.350
==============================================================================
Omnibus:                    25528.287   Durbin-Watson:                   1.475
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           127882.759
Skew:                           1.535   Prob(JB):                         0.00
Kurtosis:                       8.526   Cond. No.                         2.42
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

means for DEPTH_RIMFLOOR_TOPOG by crater diameter category
                DEPTH_RIMFLOOR_TOPOG
DIAM_CAT
(0.0, 10.0]                 0.274485
(10.0, 512.75]              0.619009

standard deviations for DEPTH_RIMFLOOR_TOPOG by crater diameter category
                DEPTH_RIMFLOOR_TOPOG
DIAM_CAT
(0.0, 10.0]                 0.220973
(10.0, 512.75]              0.483306
3) Discussion
I examined the relationship between crater diameter and crater depth. I first divided the data into two categories: craters with a diameter of at most 10 km and craters with a diameter greater than 10 km. The null hypothesis is that there is no difference in mean crater depth between big (>10 km) and small (<=10 km) craters.
Big craters have a mean depth of 0.62 km with a standard deviation of 0.48 km.
Small craters have a mean depth of 0.27 km with a standard deviation of 0.22 km.
The ANOVA returns an F statistic of over 1.8x10^4, with a p-value so small that the program returns a zero. Thus, we may reject the null hypothesis and conclude that big craters are deeper than small craters.
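As a sanity check, the F statistic can be rebuilt from the group summaries alone. This is a sketch that assumes the counts, means, and standard deviations printed in the Week 1 output above (rounded to six decimals), so it only approximately matches the statsmodels result.

```python
# Group sizes, means and standard deviations copied from the Week 1 output above
groups = {
    "small (0, 10] km": {"n": 53390, "mean": 0.274485, "sd": 0.220973},
    "big (10, 512.75] km": {"n": 23414, "mean": 0.619009, "sd": 0.483306},
}

n_total = sum(g["n"] for g in groups.values())
grand_mean = sum(g["n"] * g["mean"] for g in groups.values()) / n_total

# Between-group and within-group sums of squares
ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
ss_within = sum((g["n"] - 1) * g["sd"] ** 2 for g in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

# F is the ratio of the between-group to the within-group mean square
f_stat = (ss_between / df_between) / (ss_within / df_within)
r_squared = ss_between / (ss_between + ss_within)
print(f"F = {f_stat:.0f}, R-squared = {r_squared:.3f}")
```

The result should land close to the F-statistic (1.837e+04) and R-squared (0.193) in the OLS table.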
mjpdatascience · 7 years ago
Week 4 DS&A: Crater Depth as a Function of Diameter
Table of Contents:
My Week 4 Program
Week 4 Output
Discussion
1) My Week 4 Program:
# -*- coding: utf-8 -*-
"""
Created on Sun Nov 26 20:21:53 2017

@author: MJP
"""
#Import necessary libraries
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

#Read the Mars Crater Database into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

#Set PANDAS to show all columns in DataFrame
pandas.set_option('display.max_columns', None)
#Set PANDAS to show all rows in DataFrame
pandas.set_option('display.max_rows', None)

# bug fix (that I don't fully understand) "for display formats to avoid run time errors", or so our instructors tell us
pandas.set_option('display.float_format', lambda x:'%f'%x)

# variables of interest are already numeric, so no need to change

# check length of data
print("length of data:")
print(len(marsdata))

# subset data for craters with depth > 0 (i.e. no raised craters or depthless ones)
marssub1 = marsdata[marsdata['DEPTH_RIMFLOOR_TOPOG']>0]

#make a copy of my new subsetted data
marssub2 = marssub1.copy()

# check that data are properly subset
print('\n'"length of data after subset:")
print(len(marssub2))

# standard deviation and other descriptive statistics for quantitative variables
print('\n'"describe crater diameters")
desc1 = marssub2['DIAM_CIRCLE_IMAGE'].describe()
print(desc1)

# standard deviation and other descriptive statistics for quantitative variables
print('\n'"describe crater depths")
desc2 = marssub2['DEPTH_RIMFLOOR_TOPOG'].describe()
print(desc2)

# group the Crater Diameter by size. We're not printing here because there are too many data
c1=marssub2.groupby('DIAM_CIRCLE_IMAGE').size()

# cut Crater Diameter data
marssub2['DIAM_CAT'] = pandas.cut(marssub2.DIAM_CIRCLE_IMAGE, [0, 25, 50, 75, 100, 512.75])

# change format from numeric to categorical
marssub2['DIAM_CAT'] = marssub2['DIAM_CAT'].astype('category')

print('describe DIAM_CAT')
desc3 = marssub2['DIAM_CAT'].describe()
print(desc3)

print('crater diameter counts')
c11 = marssub2['DIAM_CAT'].value_counts(sort=False, dropna=True)
print(c11)

#Univariate histogram for categorical variable:
seaborn.countplot(x="DIAM_CAT", data=marssub2)
plt.xlabel('Crater Diameter')
plt.title('Distribution of Crater Diameters in the Mars Crater Study')

# group the Crater Depth by size. We're not printing here because there are too many data
c2=marssub2.groupby('DEPTH_RIMFLOOR_TOPOG').size()

# cut Crater Depth data
marssub2['DEPTH_CAT'] = pandas.cut(marssub2.DEPTH_RIMFLOOR_TOPOG, [0, 0.2, 0.4, 0.6, 0.8, 1.0, 4.95])

# change format from numeric to categorical
marssub2['DEPTH_CAT'] = marssub2['DEPTH_CAT'].astype('category')

print('describe DEPTH_CAT')
desc4 = marssub2['DEPTH_CAT'].describe()
print(desc4)

print('crater depth counts')
c22 = marssub2['DEPTH_CAT'].value_counts(sort=False, dropna=True)
print(c22)

#Univariate histogram for categorical variable (new figure so it is not drawn on top of the previous plot):
plt.figure()
seaborn.countplot(x="DEPTH_CAT", data=marssub2)
plt.xlabel('Crater Depth')
plt.title('Distribution of Crater Depth in the Mars Crater Study')

#basic scatterplot:  Q->Q
plt.figure()
scat1 = seaborn.regplot(x="DIAM_CIRCLE_IMAGE", y="DEPTH_RIMFLOOR_TOPOG", fit_reg=True, data=marssub2)
plt.xlabel('Crater Diameter')
plt.ylabel('Crater Depth')
plt.title('Scatterplot for the Association Between Crater Diameter and Depth')

#End of program
2) Week 4 Output
length of data: 384343
length of data after subset: 76804
describe crater diameters
count   76804.000000
mean       11.061201
std        15.619762
min         1.060000
25%         3.580000
50%         5.880000
75%        12.150000
max       512.750000
Name: DIAM_CIRCLE_IMAGE, dtype: float64

describe crater depths
count   76804.000000
mean        0.379514
std         0.360978
min         0.010000
25%         0.120000
50%         0.270000
75%         0.520000
max         4.950000
Name: DEPTH_RIMFLOOR_TOPOG, dtype: float64
describe DIAM_CAT
count           76804
unique              5
top       (0.0, 25.0]
freq            69166
Name: DIAM_CAT, dtype: object
crater diameter counts
(0.0, 25.0]        69166
(25.0, 50.0]        5637
(50.0, 75.0]        1310
(75.0, 100.0]        407
(100.0, 512.75]      284
Name: DIAM_CAT, dtype: int64
Out[35]: Text(0.5,1,'Distribution of Crater Diameters in the Mars Crater Study')
[Figure: count plot, "Distribution of Crater Diameters in the Mars Crater Study"]
describe DEPTH_CAT
count          76804
unique             6
top       (0.0, 0.2]
freq           31695
Name: DEPTH_CAT, dtype: object
crater depth counts
(0.0, 0.2]     31695
(0.2, 0.4]     18398
(0.4, 0.6]     11522
(0.6, 0.8]      6773
(0.8, 1.0]      3608
(1.0, 4.95]     4808
Name: DEPTH_CAT, dtype: int64
Out[38]: Text(0.5,1,'Distribution of Crater Depth in the Mars Crater Study')
[Figure: count plot, "Distribution of Crater Depth in the Mars Crater Study"]
[Figure: scatterplot with linear fit, "Scatterplot for the Association Between Crater Diameter and Depth"]
3) Discussion:
Is crater depth correlated with crater diameter? More energetic impacts are expected to leave wider craters, but do they leave deeper craters? Crater diameter and depth were analyzed. Univariate analysis shows that both diameter and depth are concentrated at the lower end. There is no central peak to describe; the distributions are not normal but strongly right-skewed, with the highest counts in the lowest bin. Small and shallow craters dominate, with frequency decreasing as diameter and depth increase.
Bivariate analysis of crater depth and diameter shows some positive correlation between the two variables. However, this correlation is weak: craters of a given diameter span a wide range of depths. The depth of a crater of a given diameter may depend on other factors, such as the type of rock at the impact site.
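The binning behind the univariate plots can be sketched on a handful of values. Four of the diameters below are the min, median, 75th percentile, and max from the describe output; 30.0 is a made-up value added so that a second bin is occupied.

```python
import pandas

# Toy diameters in km: four values from the describe output plus one made-up value (30.0)
diam = pandas.Series([1.06, 5.88, 12.15, 30.0, 512.75], name="DIAM_CIRCLE_IMAGE")

# Same bin edges as the program above; each interval is open on the left, closed on the right
diam_cat = pandas.cut(diam, [0, 25, 50, 75, 100, 512.75])

# sort=False keeps the bins in their natural order instead of sorting by frequency
counts = diam_cat.value_counts(sort=False)
print(counts)
```

Because the upper edge of the last bin equals the largest diameter, the biggest crater is still included; a value above the last edge would be recoded as missing.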
mjpdatascience · 7 years ago
Week 3 DS&A: Crater depth and morphology
Table of Contents:
My Week 3 Program
Week 3 Output
Descriptions
1) My Week 3 Program:
# -*- coding: utf-8 -*-
"""
Created on Tue Nov 14 20:21:53 2017

@author: MJP
"""
#Import necessary libraries
import pandas
import numpy

#Read the Mars Crater Database into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

# bug fix (that I don't fully understand) "for display formats to avoid run time errors", or so our instructors tell us
pandas.set_option('display.float_format', lambda x:'%f'%x)

# variables of interest are not numeric, so keep as strings

# check length of data
print("length of data:")
print(len(marsdata))

# subset data for craters with depth > 0 (i.e. no raised craters or depthless ones)
marssub1 = marsdata[marsdata['DEPTH_RIMFLOOR_TOPOG']>0]

#make a copy of my new subsetted data
marssub2 = marssub1.copy()

# check that data are properly subset
print('\n'"length of data after subset:")
print(len(marssub2))

# secondary variable for crater volume in cubic kilometers, which equals (pi*height/6)*(3*radius^2 + height^2)
# we won't use VOLUME today
marssub2['VOLUME']=(numpy.pi * marssub2['DEPTH_RIMFLOOR_TOPOG'] / 6) * (3 * (marssub2['DIAM_CIRCLE_IMAGE'] / 2)**2 + marssub2['DEPTH_RIMFLOOR_TOPOG']**2)

# replace empty cells for the MORPHOLOGY_EJECTA variables: recode to python missing (NaN)
marssub2['MORPHOLOGY_EJECTA_1']=marssub2['MORPHOLOGY_EJECTA_1'].replace(' ', numpy.nan)
marssub2['MORPHOLOGY_EJECTA_2']=marssub2['MORPHOLOGY_EJECTA_2'].replace(' ', numpy.nan)
marssub2['MORPHOLOGY_EJECTA_3']=marssub2['MORPHOLOGY_EJECTA_3'].replace(' ', numpy.nan)

#Display the counts and percentages for three variables, MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2, and MORPHOLOGY_EJECTA_3
print('\n'"Counts for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology")
c1 = marssub2['MORPHOLOGY_EJECTA_1'].value_counts(sort=False, dropna=False)
print(c1)

print('\n'"Percentages for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology")
p1 = marssub2['MORPHOLOGY_EJECTA_1'].value_counts(sort=False, normalize=True)
print(p1)

print('\n'"Counts for MORPHOLOGY_EJECTA_2 - the morphology of the layers")
c2 = marssub2['MORPHOLOGY_EJECTA_2'].value_counts(sort=False, dropna=False)
print(c2)

print('\n'"Percentages for MORPHOLOGY_EJECTA_2 - the morphology of the layers")
p2 = marssub2['MORPHOLOGY_EJECTA_2'].value_counts(sort=False, normalize=True)
print(p2)

print('\n'"Counts for MORPHOLOGY_EJECTA_3 - the overall texture and/or shape of the layers that are unique")
c3 = marssub2['MORPHOLOGY_EJECTA_3'].value_counts(sort=False, dropna=False)
print(c3)

print('\n'"Percentages for MORPHOLOGY_EJECTA_3 - the overall texture and/or shape of the layers that are unique")
p3 = marssub2['MORPHOLOGY_EJECTA_3'].value_counts(sort=False, normalize=True)
print(p3)

#End of program
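The VOLUME variable in the program treats each crater as a spherical cap. As a quick check of that formula, here is a standalone sketch (the function name cap_volume is my own; the formula is the one in the program's comment):

```python
import math

def cap_volume(depth_km, diameter_km):
    """Volume in km^3 of a spherical cap with height depth_km and base diameter diameter_km."""
    radius_km = diameter_km / 2
    return (math.pi * depth_km / 6) * (3 * radius_km ** 2 + depth_km ** 2)

# A bowl exactly as deep as its radius is a hemisphere, so this should equal (2/3) * pi * r^3
print(cap_volume(1.0, 2.0))
```

Real craters are much shallower than hemispheres (mean depth around 0.38 km against a mean diameter around 11 km in this subset), so the cap is only a rough idealization.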
2) Week 3 Output
length of data: 384343
length of data after subset: 76804
Counts for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology
NaN                     38168
Rd/MLERC/MLEPC/MLERS        1
SLERC/Rd                   59
SLEPC/Rd                   41
SLERCPd                     3
DLEPS/Rd/DLEPS              3
DLEPS/DLEPSPd               4
Rd/MLERS/Rd                 1
Rd                      20567
DLERS                    1149
Rd/DLEPS                  136
MLEPC                       2
Rd/SPERS                    1
DLEPC                     200
DLEPC/DLERS/Rd              2
DLERCPd                     3
MLERC/MLEPS/MLERS           2
MLERS/MLERS/Rd/MLEPS        1
SLERS/Rd/SLERS              1
Rd/DLERC                    9
MLEPC/MLEPS/MLEPS           6
Rd/SLEPC/Rd                 1
Rd/MLEPC/MLEPC/MLEPS        1
Rd/DLEPC/DLERC              3
DLERSRd                     2
DLEPd                       1
DLEPS/DLRES                 1
DLERC/Rd/DLERS              3
DLEPC/DLEPS/Rd              2
SLEPS/SLERS                 2
...
DLERC/DLEPS                92
MLERS/Rd                   10
Rd/SLERC                  123
MLERS                     476
Rd/MLERC/MLERS/MLEPC        1
Rd/MLEPS/MLERS/MLERS        1
SLEPSPd                    17
SLErS                       1
DLERS/Rd                   37
Rd/MLERC/MLERS/MLERS        2
DLERC/Rd                    5
SLERSRd                     4
DLERC/DLEPd                 1
DLERS/Rd/DLERS              4
Rd/DLEPC/DLEPS             52
DLEPC/Rd                    3
MLERS/MLERS/Rd/MLERS        1
Rd/SLEPS                  353
Rd/MLEPC                    2
Rd/DLEPC/DLEPSPd            1
Rd/SLEPCPd                  1
MLERC/MLERC/MLEPS           5
RD/SLEPC                    1
DLERC/DLEPCPd               1
MLEPC/MLERS/MLERS           2
MLERC/MLERS/MLERS/Rd        2
SLEPSRd                     3
DLEPC/DLEPd                 1
MLERSRd                     1
MLEPC/MLERS/MLEPS           4
Name: MORPHOLOGY_EJECTA_1, Length: 143, dtype: int64
Percentages for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology
Rd/MLERC/MLEPC/MLERS   0.000026
SLERC/Rd               0.001527
SLEPC/Rd               0.001061
SLERCPd                0.000078
DLEPS/Rd/DLEPS         0.000078
DLEPS/DLEPSPd          0.000104
Rd/MLERS/Rd            0.000026
Rd                     0.532327
DLERS                  0.029739
Rd/DLEPS               0.003520
MLEPC                  0.000052
Rd/SPERS               0.000026
DLEPC                  0.005177
DLEPC/DLERS/Rd         0.000052
DLERCPd                0.000078
MLERC/MLEPS/MLERS      0.000052
MLERS/MLERS/Rd/MLEPS   0.000026
SLERS/Rd/SLERS         0.000026
Rd/DLERC               0.000233
MLEPC/MLEPS/MLEPS      0.000155
Rd/SLEPC/Rd            0.000026
Rd/MLEPC/MLEPC/MLEPS   0.000026
Rd/DLEPC/DLERC         0.000078
DLERSRd                0.000052
DLEPd                  0.000026
DLEPS/DLRES            0.000026
DLERC/Rd/DLERS         0.000078
DLEPC/DLEPS/Rd         0.000052
SLEPS/SLERS            0.000052
Rd/DLERC/DLERS         0.000181
...
DLERC/DLEPS            0.002381
MLERS/Rd               0.000259
Rd/SLERC               0.003184
MLERS                  0.012320
Rd/MLERC/MLERS/MLEPC   0.000026
Rd/MLEPS/MLERS/MLERS   0.000026
SLEPSPd                0.000440
SLErS                  0.000026
DLERS/Rd               0.000958
Rd/MLERC/MLERS/MLERS   0.000052
DLERC/Rd               0.000129
SLERSRd                0.000104
DLERC/DLEPd            0.000026
DLERS/Rd/DLERS         0.000104
Rd/DLEPC/DLEPS         0.001346
DLEPC/Rd               0.000078
MLERS/MLERS/Rd/MLERS   0.000026
Rd/SLEPS               0.009137
Rd/MLEPC               0.000052
Rd/DLEPC/DLEPSPd       0.000026
Rd/SLEPCPd             0.000026
MLERC/MLERC/MLEPS      0.000129
RD/SLEPC               0.000026
DLERC/DLEPCPd          0.000026
MLEPC/MLERS/MLERS      0.000052
MLERC/MLERS/MLERS/Rd   0.000052
SLEPSRd                0.000078
DLEPC/DLEPd            0.000026
MLERSRd                0.000026
MLEPC/MLERS/MLEPS      0.000104
Name: MORPHOLOGY_EJECTA_1, Length: 142, dtype: float64
Counts for MORPHOLOGY_EJECTA_2 - the morphology of the layers NaN                    58838 Hu/Sm                      2 HuAm/SmAm                  5 HuSL/HuSL/SmBL             3 SmBL/SmAm                  1 HuSL/HuLS/HuSp             1 HuBL/SmSL                  1 Sm/SmBL/HuBL               2 SmSL/HuBL                  8 Sm/SmSL                    9 Hu/HuSp                    9 HuBL/SmBL/SmSL             1 SmSL                    2366 HuBL/SmBL                 11 HuSL/HuBL/SmSp             1 HuSL/HuSp                 22 HuSL/SmBL                 36 SmAm                     716 SmSL/SmBL                 19 HuBL/SmBL/HuBL             1 HuSL/Sm                    5 Sm                       642 HuAm/HuSL                  4 HuSL/HuSL/HuSp             4 HuSL/HuBL/HuBL/HuSp        1 Hu/HuBL/SmSp               1 Hu/HuSL/HuBL               3 HuSL                    5680 Sm/SmAm                    2 HuSp                      72   HuAM                       1 HuSL/SmBL/SmBL             1 Sm/HuSL                    3 Sm/HuSL/HuSL               1 Sm/HuBL                    2 SmSL/Sm                    5 Hu/HuBL/HuBL               3 HuSL/HuBL                196 SmSL/HuSL/HuSL             1 Sm/HuSL/SmSp               1 HuSL/HuBL/HuBL/HuBL        1 HuSL/HuSL/HuBL            13 SmBL                    1005 HUBL                       1 SmSL/SmSL/SmBL             1 SmSL/SmBL/SmBL             2 SmSL/SmSp                 11 HuAm/SmSp                  1 Hu/SmSL                   11 SmSp                      13 Hu/Hu/Sm                   1 SmBL/HuSL                  1 SmSL/HuSL                  7 Hu/HuBL                   40 HuAm                    1269 SmBL/HuBL                  1 Hu/HuSp/SmSp               1 HuSL/HuBL/SmBL             1 HuSL/HuBL/Sm               1 SmAm/SmBL                  1 Name: MORPHOLOGY_EJECTA_2, Length: 102, dtype: int64
Percentages for MORPHOLOGY_EJECTA_2 - the morphology of the layers Hu/Sm                 0.000111 HuAm/SmAm             0.000278 HuSL/HuSL/SmBL        0.000167 SmBL/SmAm             0.000056 HuSL/HuLS/HuSp        0.000056 HuBL/SmSL             0.000056 Sm/SmBL/HuBL          0.000111 SmSL/HuBL             0.000445 Sm/SmSL               0.000501 Hu/HuSp               0.000501 HuBL/SmBL/SmSL        0.000056 SmSL                  0.131693 HuBL/SmBL             0.000612 HuSL/HuBL/SmSp        0.000056 HuSL/HuSp             0.001225 HuSL/SmBL             0.002004 SmAm                  0.039853 SmSL/SmBL             0.001058 HuBL/SmBL/HuBL        0.000056 HuSL/Sm               0.000278 Sm                    0.035734 HuAm/HuSL             0.000223 HuSL/HuSL/HuSp        0.000223 HuSL/HuBL/HuBL/HuSp   0.000056 Hu/HuBL/SmSp          0.000056 Hu/HuSL/HuBL          0.000167 HuSL                  0.316153 Sm/SmAm               0.000111 HuSp                  0.004008 HuAm/HuBL             0.000056   HuAM                  0.000056 HuSL/SmBL/SmBL        0.000056 Sm/HuSL               0.000167 Sm/HuSL/HuSL          0.000056 Sm/HuBL               0.000111 SmSL/Sm               0.000278 Hu/HuBL/HuBL          0.000167 HuSL/HuBL             0.010909 SmSL/HuSL/HuSL        0.000056 Sm/HuSL/SmSp          0.000056 HuSL/HuBL/HuBL/HuBL   0.000056 HuSL/HuSL/HuBL        0.000724 SmBL                  0.055939 HUBL                  0.000056 SmSL/SmSL/SmBL        0.000056 SmSL/SmBL/SmBL        0.000111 SmSL/SmSp             0.000612 HuAm/SmSp             0.000056 Hu/SmSL               0.000612 SmSp                  0.000724 Hu/Hu/Sm              0.000056 SmBL/HuSL             0.000056 SmSL/HuSL             0.000390 Hu/HuBL               0.002226 HuAm                  0.070633 SmBL/HuBL             0.000056 Hu/HuSp/SmSp          0.000056 HuSL/HuBL/SmBL        0.000056 HuSL/HuBL/Sm          0.000056 SmAm/SmBL             0.000056 Name: MORPHOLOGY_EJECTA_2, Length: 101, dtype: float64
Counts for MORPHOLOGY_EJECTA_3 - the overall texture and/or shape of the layers that are unique NaN                               75580 Butterfly                            73 Outer is Butterfly                    4 Middle is Rectangular                 1 Pseudo-Butterfly                    111 Outer is Splash                      54 Pseudo-Rectangular                   23 Inner is Pseudo-Pin-Cushion           1 Inner is Pin-Cushion                 85 Outer is Rectangular                  1 Pin-Cushion                         337 Small-Crown / Pseudo-Butterfly        1 Outer is Pseudo-Butterfly             3 Pin-Cushion / Pseudo-Butterfly        1 Pin-Cushion / Butterfly               1 Inner is Pseudo-Butterfly             1 Small-Crown / Sandbar                 2 Bumblebee                            11 Sandbar                              48 Pseudo-Small-Crown                   54 Inner is Pseudo-Small-Crown           4 Small-Crown                         250 Pseudo-Pin-Cushion                    1 Inner is Butterfly                    2 Inner-most is Small-Crown             1 Pseduo-Butterfly                      1 Rectangular                          36 Inner is Small-Crown                 66 Splash                               51 Name: MORPHOLOGY_EJECTA_3, dtype: int64
Percentages for MORPHOLOGY_EJECTA_3 - the overall texture and/or shape of the layers that are unique Butterfly                        0.059641 Outer is Butterfly               0.003268 Middle is Rectangular            0.000817 Pseudo-Butterfly                 0.090686 Outer is Splash                  0.044118 Pseudo-Rectangular               0.018791 Inner is Pseudo-Pin-Cushion      0.000817 Inner is Pin-Cushion             0.069444 Outer is Rectangular             0.000817 Pin-Cushion                      0.275327 Small-Crown / Pseudo-Butterfly   0.000817 Outer is Pseudo-Butterfly        0.002451 Pin-Cushion / Pseudo-Butterfly   0.000817 Pin-Cushion / Butterfly          0.000817 Inner is Pseudo-Butterfly        0.000817 Small-Crown / Sandbar            0.001634 Bumblebee                        0.008987 Sandbar                          0.039216 Pseudo-Small-Crown               0.044118 Inner is Pseudo-Small-Crown      0.003268 Small-Crown                      0.204248 Pseudo-Pin-Cushion               0.000817 Inner is Butterfly               0.001634 Inner-most is Small-Crown        0.000817 Pseduo-Butterfly                 0.000817 Rectangular                      0.029412 Inner is Small-Crown             0.053922 Splash                           0.041667 Name: MORPHOLOGY_EJECTA_3, dtype: float64
3) Descriptions
I will be investigating a relationship between crater volume and ejecta morphology. Some craters are assigned a zero or negative depth, so the data were subset to include only craters with a depth greater than zero.
The vast majority of craters do not have any ejecta morphology labels assigned. It is not clear whether these data are missing (i.e. the labels are assigned manually and nobody has gotten around to these craters yet) or whether a blank means that no ejecta are present. Blank entries were recoded to Python's missing value (NaN).
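The two data-management steps described above can be sketched on a toy table. This is only an illustration, not the program used for the output: the depth column name DEPTH_RIMFLOOR_TOPOG is an assumption, and the toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crater table (made-up values).
# DEPTH_RIMFLOOR_TOPOG is an assumed name for the depth column.
toy = pd.DataFrame({
    "DEPTH_RIMFLOOR_TOPOG": [0.0, 0.22, 1.05, -0.1, 0.4],
    "MORPHOLOGY_EJECTA_1": [" ", "Rd", "", "SLEPS", "  "],
})

# Keep only craters with a depth greater than zero.
sub = toy[toy["DEPTH_RIMFLOOR_TOPOG"] > 0].copy()

# Recode empty or whitespace-only labels to NaN.
sub["MORPHOLOGY_EJECTA_1"] = sub["MORPHOLOGY_EJECTA_1"].replace(
    r"^\s*$", np.nan, regex=True
)

# dropna=False keeps the NaN row in the frequency table.
counts = sub["MORPHOLOGY_EJECTA_1"].value_counts(dropna=False)
print(counts)
```

Passing dropna=False is what makes the NaN row show up in the counts; without it, value_counts silently leaves missing values out.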
The program output shows us that the morphology data are missing for most craters: over 38,000, 58,000, and 75,000 craters are now labeled as NaN for MORPHOLOGY_EJECTA_1, _2, and _3, respectively, out of a data subset with only 76,804 entries. As noted last week, combining multiple morphology assignments per crater into a single label leads to an excessive number of distinct categories: over a hundred each in MORPHOLOGY_EJECTA_1 and _2 (the output reports lengths of 142 and 101 for the non-missing labels, of which only sixty are displayed) and nearly thirty in MORPHOLOGY_EJECTA_3. We will eventually need a way of separating out different morphology labels in order to simplify the data.
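One possible way of separating the labels, sketched here on a handful of label strings taken from the output, is to split each entry on the '/' character and count the individual components. The variable names are hypothetical and this is not part of the assignment program.

```python
import pandas as pd

# A few combined labels as they appear in the output, plus one missing value.
labels = pd.Series(
    ["Rd/DLEPC/DLEPS", "Rd/DLEPS", "DLEPS/DLEPC", "Rd/DLEPC", "DLEPC/Rd", "Rd", None],
    name="MORPHOLOGY_EJECTA_1",
)

# Split on '/', then give every individual label its own row.
parts = labels.dropna().str.split("/").explode().str.strip()

# Frequency of each individual morphology label.
layer_counts = parts.value_counts()
print(layer_counts)

# Six distinct combined labels collapse to three individual ones.
print(labels.nunique(), "->", parts.nunique())
```

Note that the individual counts no longer sum to the number of craters, because a crater with several labels contributes one row per label.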
mjpdatascience · 7 years ago
Week 2 DS&A: Counting Mars Crater Ejecta Morphologies and Layers
Table of Contents:
My Week 2 Program
Week 2 Output
Variable Descriptions
1) My Week 2 Program:
# -*- coding: utf-8 -*-
"""
Created on Tue Nov 14 20:21:53 2017

@author: MJP

This program loads the Mars Crater dataset, and then lists the counts and
percentages for three different variables.
"""

#Import necessary libraries
import pandas

#Read the Mars Crater dataset into memory
marsdata = pandas.read_csv("dab_marscrater_pds.csv", low_memory=False)

#Display the counts and percentages for three variables:
#MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2, and NUMBER_LAYERS
print("\nCounts for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology")
c1 = marsdata['MORPHOLOGY_EJECTA_1'].value_counts(sort=False)
print(c1)

print("\nPercentages for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology")
p1 = marsdata['MORPHOLOGY_EJECTA_1'].value_counts(sort=False, normalize=True)
print(p1)

print("\nCounts for MORPHOLOGY_EJECTA_2 - the morphology of the layers")
c2 = marsdata['MORPHOLOGY_EJECTA_2'].value_counts(sort=False)
print(c2)

print("\nPercentages for MORPHOLOGY_EJECTA_2 - the morphology of the layers")
p2 = marsdata['MORPHOLOGY_EJECTA_2'].value_counts(sort=False, normalize=True)
print(p2)

print("\nCounts for NUMBER_LAYERS - the maximum number of cohesive layers")
c3 = marsdata['NUMBER_LAYERS'].value_counts(sort=False)
print(c3)

print("\nPercentages for NUMBER_LAYERS - the maximum number of cohesive layers")
p3 = marsdata['NUMBER_LAYERS'].value_counts(sort=False, normalize=True)
print(p3)

#End of program
2) Week 2 Output
Counts for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology SLEPCPd                         74 Rd/DLEPC/DLEPSPd                 1 DLERC/DLRPS                      1 Rd/SLERS/Rd                      1 SLERSPd                         16 DLERS/Rd/DLEPS                   2 MLERSRd                          1 DLEPS/DLEPC                      5 DLEPC/DLERSRd                    1 Rd/DLEPC/DLEPS                  52 DLEPC/DLERS/Rd                   2 Rd/MLERC/MLERS/MLERS             2 MLEPC/MLERC/MLEPS                2 Rd/DLEPC                        32 Rd/SLERCPd                       1 DLERC/DLEPS                    106 DLERC/DLEPSPd                    1 MLEPC/MLERC/MSLEPS               1 DLERC/Rd/DLERS                   3 DLEPC/DLERS                     86 DLEPSPd                          2 DLERC/DLERS/Rd                   3 DLERCPd/DLEPCPd                  1 DLERS/DLERSRd                    2 Rd/SLEPSPd                       1 MLEPC/MLEPC/MLEPS                1 DLSPC                            1 MLEPS                           37 DLERS/DLEPS                     20 DLEPCPd/DLEPSPd                  1
Rd/DLEPC/DLERSRd                 1 DLEPC                          232 Rd/MLEPC                         2 MLERS/MLERS/Rd/MLEPS             1 SLErS                            1 SLEPSPd/Rd                       1 Rd/MLERS                       199 MLERS/MLEPC/MLERS                1 SLEPC/SLEPS                      3 MLERC                            3 SLEPS                         4949 MLEPC                            2 SLEPS/Rd                        47 DLEPC/DLEPd                      1 DLEPd                            1 SLEPSPd                         51 DLEPS                          534 Rd/DLEPS                       137 SLEPCPd/Rd                       1 DLERCPd                          5 DLEPC/DLEPCPd                    4 Rd/SLERS                       555 SLERS/Rd                       281 DLERC/Rd/SLERS                   1 Rd/MLERC                         1 DLERS/Rd/DLERS                   4 DLERC/DLEPC/Rd                   1 Rd/MLEPC/MLERS/MLERS/MLERS       2 DLEPC/DLEPSPd                    3 DLERS/Rd                        39 Name: MORPHOLOGY_EJECTA_1, Length: 156, dtype: int64
Percentages for MORPHOLOGY_EJECTA_1 - the classification of ejecta morphology SLEPCPd                       0.000193 Rd/DLEPC/DLEPSPd              0.000003 DLERC/DLRPS                   0.000003 Rd/SLERS/Rd                   0.000003 SLERSPd                       0.000042 DLERS/Rd/DLEPS                0.000005 MLERSRd                       0.000003 DLEPS/DLEPC                   0.000013 DLEPC/DLERSRd                 0.000003 Rd/DLEPC/DLEPS                0.000135 DLEPC/DLERS/Rd                0.000005 Rd/MLERC/MLERS/MLERS          0.000005 MLEPC/MLERC/MLEPS             0.000005 Rd/DLEPC                      0.000083 Rd/SLERCPd                    0.000003 DLERC/DLEPS                   0.000276 DLERC/DLEPSPd                 0.000003 MLEPC/MLERC/MSLEPS            0.000003 DLERC/Rd/DLERS                0.000008 DLEPC/DLERS                   0.000224 DLEPSPd                       0.000005 DLERC/DLERS/Rd                0.000008 DLERCPd/DLEPCPd               0.000003 DLERS/DLERSRd                 0.000005 Rd/SLEPSPd                    0.000003 MLEPC/MLEPC/MLEPS             0.000003 DLSPC                         0.000003 MLEPS                         0.000096 DLERS/DLEPS                   0.000052 DLEPCPd/DLEPSPd               0.000003   Rd/DLEPC/DLERSRd              0.000003 DLEPC                         0.000604 Rd/MLEPC                      0.000005 MLERS/MLERS/Rd/MLEPS          0.000003 SLErS                         0.000003 SLEPSPd/Rd                    0.000003 Rd/MLERS                      0.000518 MLERS/MLEPC/MLERS             0.000003 SLEPC/SLEPS                   0.000008 MLERC                         0.000008 SLEPS                         0.012877 MLEPC                         0.000005 SLEPS/Rd                      0.000122 DLEPC/DLEPd                   0.000003 DLEPd                         0.000003 SLEPSPd                       0.000133 DLEPS                         0.001389 Rd/DLEPS                      0.000356 SLEPCPd/Rd                    0.000003 DLERCPd                       0.000013 DLEPC/DLEPCPd                 0.000010 Rd/SLERS                      0.001444 SLERS/Rd                      0.000731 DLERC/Rd/SLERS                0.000003 Rd/MLERC                      0.000003 DLERS/Rd/DLERS                0.000010 DLERC/DLEPC/Rd                0.000003 Rd/MLEPC/MLERS/MLERS/MLERS    0.000005 DLEPC/DLEPSPd                 0.000008 DLERS/Rd                      0.000101 Name: MORPHOLOGY_EJECTA_1, Length: 156, dtype: float64
Counts for MORPHOLOGY_EJECTA_2 - the morphology of the layers SmSL/SmSL/HuBL       1 HuBL/SmSp            1 HuAm/HuSL            4 Hu                1307 HUSL                 2 Sm/SmSp              1 Sm/HuSL              3 SmBL/HuSL            1 HuAm              1356 HuSL/HuLS/HuSp       1 Hu/HuBL/SmSp         1 HuBL/HuBL/SmBL       1 SmSL/Sm              7 HuSL/HuAm           14 Hu/HuSp/SmSp         1 Hu/SmSL             12 SmBL/SmAm            1 Hu/HuSL/HuBL         3 Sm/SmBL/HuBL         2 Hu/Sm                2 HuSL/Sm              6 Hu/SmAm              2 HuBL/HuSL           10 Hu/HuAm              1 HuAm/HuSp            2 HuBL/SmBL           11 Hu/HuBL/HuBL         3 HUBL                 1 Hu/HuSL/HuSp         1 HuSL/SmBL/SmSL       1
Sm/SmBL             10 Hu/SmBL             15 SmSL/SmSp           11 SmBL/HuBL            1 HuBL/SmSL            1 SmSL/HuSL            7 Hu/HuBL             43 Sm/HuSL/SmSp         1 SmSL/SmSL/SmSp       7 Sm/HuSL/HuSL         1 HuSL/HuSL            2 Sm                 770 Hu/HuSp/HuSp         1 HuSL/SmBL/HuBL       1 SmBL/SmSp            1 HuBL/SmSL/SmSp       1 SmAm/SmSL            1 HuSL/SmBL           37 HuSL/HuSL/SmSp       3 HuAm/SmSL            2 HuSL/SmBL/SmBL       1 HuSL/HuSL/SmBL       3 SmsL                 2 SmSL              2713 HuSL/HuBL/HuSp       4 Hu/HuSL             63 HuSL/HuBL/HuBL      14 HuAm/SmSp            1 SmSL/HuBL            8 HuSL/HuSL/HuBL      13 Name: MORPHOLOGY_EJECTA_2, Length: 104, dtype: int64
Percentages for MORPHOLOGY_EJECTA_2 - the morphology of the layers SmSL/SmSL/HuBL    0.000003 HuBL/SmSp         0.000003 HuAm/HuSL         0.000010 Hu                0.003401 HUSL              0.000005 Sm/SmSp           0.000003 Sm/HuSL           0.000008 SmBL/HuSL         0.000003 HuAm              0.003528 HuSL/HuLS/HuSp    0.000003 Hu/HuBL/SmSp      0.000003 HuBL/HuBL/SmBL    0.000003 SmSL/Sm           0.000018 HuSL/HuAm         0.000036 Hu/HuSp/SmSp      0.000003 Hu/SmSL           0.000031 SmBL/SmAm         0.000003 Hu/HuSL/HuBL      0.000008 Sm/SmBL/HuBL      0.000005 Hu/Sm             0.000005 HuSL/Sm           0.000016 Hu/SmAm           0.000005 HuBL/HuSL         0.000026 Hu/HuAm           0.000003 HuAm/HuSp         0.000005 HuBL/SmBL         0.000029 Hu/HuBL/HuBL      0.000008 HUBL              0.000003 Hu/HuSL/HuSp      0.000003 HuSL/SmBL/SmSL    0.000003   Sm/SmBL           0.000026 Hu/SmBL           0.000039 SmSL/SmSp         0.000029 SmBL/HuBL         0.000003 HuBL/SmSL         0.000003 SmSL/HuSL         0.000018 Hu/HuBL           0.000112 Sm/HuSL/SmSp      0.000003 SmSL/SmSL/SmSp    0.000018 Sm/HuSL/HuSL      0.000003 HuSL/HuSL         0.000005 Sm                0.002003 Hu/HuSp/HuSp      0.000003 HuSL/SmBL/HuBL    0.000003 SmBL/SmSp         0.000003 HuBL/SmSL/SmSp    0.000003 SmAm/SmSL         0.000003 HuSL/SmBL         0.000096 HuSL/HuSL/SmSp    0.000008 HuAm/SmSL         0.000005 HuSL/SmBL/SmBL    0.000003 HuSL/HuSL/SmBL    0.000008 SmsL              0.000005 SmSL              0.007059 HuSL/HuBL/HuSp    0.000010 Hu/HuSL           0.000164 HuSL/HuBL/HuBL    0.000036 HuAm/SmSp         0.000003 SmSL/HuBL         0.000021 HuSL/HuSL/HuBL    0.000034 Name: MORPHOLOGY_EJECTA_2, Length: 104, dtype: float64
Counts for NUMBER_LAYERS - the maximum number of cohesive layers
0    364612
1     15467
2      3435
3       739
4        85
5         5
Name: NUMBER_LAYERS, dtype: int64

Percentages for NUMBER_LAYERS - the maximum number of cohesive layers
0    0.948663
1    0.040243
2    0.008937
3    0.001923
4    0.000221
5    0.000013
Name: NUMBER_LAYERS, dtype: float64
3) Variable Descriptions
The NUMBER_LAYERS variable shows layers ranging from zero to five, with a very clear decrease in frequency as the number of layers increases (zero is most common and five is least common). However, the two MORPHOLOGY variables are a mess! Each has well over a hundred unique categories (the output reports lengths of 156 and 104), of which only sixty each are displayed. A review of the codebook reminds us that multiple morphologies may be labeled, separated by a '/' character, listed inner to outer or top to bottom. Thus, we may need to learn a way of separating out different morphology labels in order to simplify the data. For example, among the displayed categories of MORPHOLOGY_EJECTA_1 alone there are seven that include "DLERC" as one of their labels. Some minimal data scrubbing could combine these.
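As a rough illustration of that scrubbing step, the sketch below works on a few counts copied from the displayed (truncated) output above and finds the categories that contain "DLERC" as one of their '/'-separated labels. The helper name has_component is hypothetical, and the full dataset may contain more such categories than the displayed rows do.

```python
# Counts copied from the displayed (truncated) Week 2 output; not the full table.
displayed_counts = {
    "DLERC/DLRPS": 1,
    "DLERC/DLEPS": 106,
    "DLERC/DLEPSPd": 1,
    "DLERC/Rd/DLERS": 3,
    "DLERC/DLERS/Rd": 3,
    "DLERC/Rd/SLERS": 1,
    "DLERC/DLEPC/Rd": 1,
    "DLERCPd": 5,
    "DLERCPd/DLEPCPd": 1,
    "DLEPC": 232,
    "Rd/DLEPS": 137,
}


def has_component(label, target):
    """Return True if target is one of the '/'-separated labels in label."""
    return target in label.split("/")


# Categories with DLERC as an exact component (DLERCPd is a different label).
dlerc_labels = [label for label in displayed_counts if has_component(label, "DLERC")]

# Number of craters those categories cover when combined.
dlerc_total = sum(displayed_counts[label] for label in dlerc_labels)

print(len(dlerc_labels), "categories,", dlerc_total, "craters")
```

Matching on exact components rather than substrings is a design choice: a plain substring search for "DLERC" would also pick up "DLERCPd", which the codebook may treat as a distinct morphology.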
mjpdatascience · 7 years ago
Week 1 DS&A: Mars Craters Study
Week 1 data set: Mars Craters Study
Research question: Is crater depth associated with crater diameter?
Impact craters are created when a high-velocity small body impacts a planet or moon surface, resulting in both compression and ejection of material at the impact site. I expect higher-energy impacts to create larger craters. However, does "larger" mean wider, deeper, or both? Some or even most of the ejected material may fall back into the crater, thus minimizing the depth of larger craters.
Literature search:
The search terms "impact crater depth" and "impact crater size" were entered into Google Scholar. Citations are posted at the bottom of this entry.
A brief search of the literature reveals several relevant publications. The first (1) is a purely theoretical investigation of how impact crater depth depends on diameter. Crater formation depends on the rheology of the rocks beneath the crater; thus, the relationship between crater diameter and depth in the Mars craters may vary with location.
Another publication (2) discusses the relationship between crater depth and diameter on Mars. The authors classify craters into three groups, one of which has a distinct depth/diameter distribution, which they attribute to water content in the impacted rocks.
A third publication (3) is another theoretical study that examines the effects of ice on crater formation. "Ice fraction, thickness, and depth" all contribute to how material is ejected upon impact, thus affecting crater morphology.
These three publications indicate that there is a relationship between impact crater diameter and depth. However, this relationship depends on the characteristics of the impact site. Thus, the depth/diameter relationship may reveal different geological regions of Mars.
(1) Numerical modelling of the impact crater depth–diameter dependence in an acoustically fluidized target. K. Wünnemann, B. A. Ivanov. Planetary and Space Science, Volume 51, Issue 13, November 2003, Pages 831–845. https://doi.org/10.1016/j.pss.2003.08.001
(2) Ancient oceans in the northern lowlands of Mars: Evidence from impact crater depth/diameter relationships. Joseph M. Boyce, Peter Mouginis-Mark, Harold Garbeil. Journal of Geophysical Research, Volume 110, Issue E3, March 2005. https://doi.org/10.1029/2004JE002328
(3) Impact crater formation in icy layered terrains on Mars. Laurel E. Senft, Sarah T. Stewart. Meteoritics & Planetary Science, Volume 43, Issue 12, December 2008, Pages 1993–2013. https://doi.org/10.1111/j.1945-5100.2008.tb00657.x