chandrakant-padhee-blog
Chandrakant Padhee
chandrakant-padhee-blog · 5 years ago
Mars Craters - Creating Graphs for Variables
Introduction to blog
The purpose of this blog is to post my assignment work for the course “Data Management and Visualization” offered by Wesleyan University through Coursera. This post is for the Week 4 assignment, which focuses on representing variables through univariate and bivariate graphs.
Area of research and Data processing
The area of research selected in previous weeks was the study of Mars craters. For Week 4, the main task was to build upon the code developed during the previous week and to represent the chosen variables graphically.
The selected variables and their data types are listed below.
Crater size (diameter): Quantitative data (converted to categorical during the previous week’s assignment)
Ejecta Perimeter Morphology: Categorical data (shape of perimeter)
Ejecta Surface Contour Morphology: Categorical data (types of profiles)
Number of layers: Quantitative data (actual number of layers)
A univariate chart was created for each variable, and a bivariate chart was created between crater size and number of layers to check for correlation. The code is available under the “SAS Code” section.
SAS Code
[Image: SAS code screenshot]
Program Output
Univariate Chart - Crater Size (Diameter) Quantitative
[Chart image]
Univariate Chart - Number of Layers Quantitative
[Chart image]
Univariate Chart - Morphology 1 Categorical (Ejecta Perimeter Shape)
[Chart image]
Univariate Chart - Morphology 2 Categorical (Ejecta - Hummocky & Smooth)
[Chart image]
Bivariate Chart - Relationship between Number of Layers and Crater Size
[Chart image]
Secondary Bivariate Charts (Crater Size vs Morphologies)
[Two chart images]
Inference:
Inference:
The univariate charts show a slight similarity between the distributions of number of ejecta layers and crater size (diameter): single-layer craters make up ~78% of the sample, and craters smaller than 10 km also make up ~78%.
Other inferences drawn from the charts are as follows:
1.      Most of the craters in the segregated data have one layer (78%) or two layers (17%). This suggests a slight correlation with crater size, as 78% of craters fall in the less-than-10 km category.
2.     Perimeter morphology is shared roughly equally across the sample, which suggests it is not directly correlated with crater size. Soil or surface type might be a cause, but we do not have soil property data to test this.
3.     Ejecta contour shows a slight correlation, with the majority being of the Hummocky type. This may be related to size, impact force or soil properties, but again we do not have the data needed to test this hypothesis.
The bivariate chart of crater size against layers shows that the minimum crater diameter in each category increases with the number of layers.
1 layer - min. diameter 1 km
2 layers - min. diameter 3 km
3 layers - min. diameter 5 km
4 layers - min. diameter 10 km
5 layers - min. diameter 20 km
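The minimum diameters listed above were read from the chart. A minimal sketch of how the same figures could be computed directly is given below in Python with pandas (the language used for the Week 2 post). The column names follow the data set, but the helper name `min_diameter_by_layers` and the sample values are made up for illustration and are not the real crater data.

```python
import pandas as pd

def min_diameter_by_layers(df):
    """Return the smallest crater diameter (km) observed for each layer count."""
    # Keep only craters with at least one ejecta layer, as done in the posts.
    layered = df[df["NUMBER_LAYERS"] > 0]
    return layered.groupby("NUMBER_LAYERS")["DIAM_CIRCLE_IMAGE"].min()

# Tiny made-up sample, for illustration only (not real crater data).
sample = pd.DataFrame({
    "NUMBER_LAYERS": [0, 1, 1, 2, 2, 3],
    "DIAM_CIRCLE_IMAGE": [0.5, 1.2, 8.0, 3.4, 15.0, 5.1],
})
result = min_diameter_by_layers(sample)
print(result)
```

Run against the full data set, this would give one minimum per layer count that can be compared with the values read off the chart.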
Summary
The purpose of the assignment is covered by the points below.
Writing programming code: SAS was used for coding and the code is presented under the section “SAS Code”.
Display of Univariate Charts for Variables: This is covered under the section “Program Output”.
Display of Bivariate Charts for Variables: This is covered under the section “Program Output”.
Description of Charts: This is covered under the Inference section above.
Labeling: Appropriate labels were inserted in the code as well as the charts for easy interpretation.
Mars Craters Data Management AND Working with Variables
Introduction to blog
The purpose of this blog is to post assignment work for the course “Data Management and Visualization” offered by Wesleyan University through Coursera. This post is for the Week 3 assignment, which focuses on data management decisions for the variables selected in the previous weeks’ assignments.
Area of research and Data processing
The area of research selected in Week 1 was the study of Mars craters. Initial programming was done and posted under the Week 2 assignment. For Week 3, the main task was to code out missing data, code in valid data, recode variables, create secondary variables and bin or group variables, as applicable to the variables selected.
From the global set of variables available for the Mars crater study, four variables were chosen and the operations below were performed as part of the Week 3 assignment. The SAS code is available in the next section.
1.      Segregated the data set by eliminating rows with Number of Layers = 0.
2.      Made the following modifications in the reduced data set:
Number of layers: No further categorization, as the original data only goes up to 5 layers.
Crater Diameter: Segregated into the 4 categories below
          1 – Less than 10 km diameter
          2 – Between 10 km and 20 km diameter
          3 – Between 20 km and 30 km diameter
          4 – Greater than 30 km diameter
Morphology 1 – Categorized into 7 simplified patterns
Morphology 2 – Categorized into 11 simplified patterns
3.      Generated frequency tables for the one original variable and the three derived variables.
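For comparison with the SAS code that follows, the diameter binning in step 2 could be sketched in Python with pandas as shown below. This is only an illustration: the helper name `crater_size_category` is hypothetical, the sample diameters are made up, and treating each boundary value (10, 20, 30 km) as belonging to the higher category is an assumption, since the category descriptions do not say which side the boundaries fall on.

```python
import pandas as pd

def crater_size_category(diameters):
    """Bin diameters (km) into categories 1-4 using the post's boundaries."""
    bins = [0, 10, 20, 30, float("inf")]
    # right=False makes each bin include its lower edge: [0,10), [10,20), ...
    return pd.cut(diameters, bins=bins, labels=[1, 2, 3, 4], right=False)

# Made-up diameters, chosen to include the boundary values.
sample = pd.Series([0.9, 10.0, 19.9, 30.0, 45.2])
categories = crater_size_category(sample)
print(categories.tolist())
```

Stating the bin edges explicitly in one place makes it easier to see that no diameter value is left without a category.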
SAS Code
/* BELOW CODE IMPORTS DATA AND LOADS THE SELECTED VARIABLES */
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
DATA new; set mydata.marscrater_pds; 
/*BELOW CODE LABELS EACH DATA FOR EASY UNDERSTANDING OF TERMS APPEARING IN FREQUENCY TABLE*/
LABEL DIAM_CIRCLE_IMAGE = "Crater Size (Diameter in Km)"
             MORPHOLOGY_EJECTA_1="Layers and peripheral morphology"
             MORPHOLOGY_EJECTA_2="Ejecta surface morphology"
             NUMBER_LAYERS="Number of Ejecta layers";
 /*BELOW STEP REMOVES ROWS WHICH ARE NOT USEFUL FOR SELECTED DATA ANALYSIS CASE*/            
IF NUMBER_LAYERS > 0;
/* NEW VARIABLES ARE CREATED BASED ON LOGICAL COLLAPSING OF AVAILABLE DATA UNDER SELECTED ORIGINAL VARIABLES */
/* DIAM_CIRCLE_IMAGE IS CLASSIFIED IN 4 CATEGORIES IN NEW VARIABLE CRATER_SIZE*/
IF DIAM_CIRCLE_IMAGE < 10 THEN CRATER_SIZE =  1;
ELSE IF DIAM_CIRCLE_IMAGE < 20 THEN CRATER_SIZE = 2;
ELSE IF DIAM_CIRCLE_IMAGE < 30 THEN CRATER_SIZE = 3;
ELSE IF DIAM_CIRCLE_IMAGE >= 30 THEN CRATER_SIZE = 4; /* >= SO THAT A DIAMETER OF EXACTLY 30 IS NOT LEFT UNCLASSIFIED */
 /*LABELLING VARIABLE NAME*/
LABEL CRATER_SIZE = "Crater Size Category (1 = Less than 10 Km, 2 = 10-20Km, 3= 20-30 Km, 4 = Greater than 30Km)";
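 /* MORPHOLOGY_EJECTA_1 IS CLASSIFIED IN 7 CATEGORIES IN NEW VARIABLE MORPH_1*/
 /* LENGTH IS DECLARED FIRST SO THAT LONGER VALUES ARE NOT TRUNCATED TO THE LENGTH OF THE FIRST ASSIGNED VALUE */
LENGTH MORPH_1 $ 20;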
IF find(MORPHOLOGY_EJECTA_1,"RS") THEN MORPH_1 = "Rampant Sinusal";
ELSE IF find(MORPHOLOGY_EJECTA_1,"RC") THEN MORPH_1 = "Rampant Circular";
ELSE IF find(MORPHOLOGY_EJECTA_1,"PS") THEN MORPH_1 = "Pancake Sinusal";
ELSE IF find(MORPHOLOGY_EJECTA_1,"PC") THEN MORPH_1 = "Pancake Circular";
ELSE IF find(MORPHOLOGY_EJECTA_1,"Pd") THEN MORPH_1 = "Radial Only";
ELSE IF find(MORPHOLOGY_EJECTA_1,"Rs") THEN MORPH_1 = "Rampant Sinusal";
ELSE IF find(MORPHOLOGY_EJECTA_1,"rS") THEN MORPH_1 = "Radial Sinusal";
 LABEL MORPH_1 = "Crater Ejecta perimeter Profile";
 /* MORPHOLOGY_EJECTA_2 IS CLASSIFIED IN 11 CATEGORIES IN NEW VARIABLE MORPH_2*/
IF find(MORPHOLOGY_EJECTA_2,"HuBL") THEN MORPH_2 = "Hummocky Broad lobe";
ELSE IF find(MORPHOLOGY_EJECTA_2,"HuSL") THEN MORPH_2 = "Hummocky Small lobe";
ELSE IF find(MORPHOLOGY_EJECTA_2,"HuAm") THEN MORPH_2 = "Hummocky Amorphous";
ELSE IF find(MORPHOLOGY_EJECTA_2,"HuSp") THEN MORPH_2 = "Hummocky Splash";
ELSE IF find(MORPHOLOGY_EJECTA_2,"SmBL") THEN MORPH_2 = "Smooth Broad lobe";
ELSE IF find(MORPHOLOGY_EJECTA_2,"SmSL") THEN MORPH_2 = "Smooth Small lobe";
ELSE IF find(MORPHOLOGY_EJECTA_2,"SmAm") THEN MORPH_2 = "Smooth Amorphous";
ELSE IF find(MORPHOLOGY_EJECTA_2,"SmSp") THEN MORPH_2 = "Smooth Splash";
ELSE IF find(MORPHOLOGY_EJECTA_2,"Sm") THEN MORPH_2 = "Smooth";
ELSE IF find(MORPHOLOGY_EJECTA_2,"Hu") THEN MORPH_2 = "Hummocky";
ELSE IF find(MORPHOLOGY_EJECTA_2," ") THEN MORPH_2 = "Profile Unavailable";
 /*LABELLING VARIABLE NAME*/
LABEL MORPH_2 = "Crater Ejecta Surface Contour Profile";
 PROC SORT; by CRATER_ID;
 /*FREQUENCY TABLES CONSISTING OF COUNTS AND PERCENTAGES OF ALL 4 SELECTED VARIABLES*/
PROC FREQ; TABLES  NUMBER_LAYERS CRATER_SIZE MORPH_1 MORPH_2;
RUN ;
SAS Output
[Two images: SAS frequency tables]
Inference
The frequency distributions in the tables above were generated after keeping only the rows for which morphology information was available; the remaining rows were removed from the data set. The distributions reveal the following:
1.      Most of the craters in the segregated data have one layer (78%) or two layers (17%). The small remainder is distributed among three-, four- and five-layer craters.
2.      The Morphology_Ejecta_1 data supports this, and additionally shows that most single- and double-layer craters are spread mainly across the Pancake Circular, Pancake Sinusal and Rampant Sinusal categories.
3.      Morphology_Ejecta_2 reveals that ejecta patterns are mostly of the hummocky type rather than smooth, in a proportion of roughly 70:30.
4.      Lastly, as far as crater size is concerned, most craters fall in the less-than-10 km category, with a 78% share.
The information above is consistent with the initial hypothesis of a correlation between layer-dependent morphology and crater size, but this can only be established after further analysis of the data.
Summary
The purpose of the assignment is covered by the points below.
Writing programming code: SAS was used for coding and the code is presented under the section “SAS Code”.
Display of variable frequency tables: This is covered under the section “SAS Output”.
Description of frequency distribution in a few sentences: This is covered in the “Inference” section.
Labelling: Appropriate labels were inserted in the code as well as the frequency table headings for easy interpretation.
Mars Craters - Data Aggregation and Frequency Distribution
Introduction to blog
The purpose of this blog is to post my assignment work for the course “Data Management and Visualization” offered by Wesleyan University through Coursera. This post is for the Week 2 assignment, which focuses on writing a program and performing data analysis, targeting frequency distribution and aggregation as applicable.
Area of research and Data processing
The area of research selected in Week 1 was the study of Mars craters. Programming was done in Python and the code is published in the next section under “Python Code”. The steps taken towards data aggregation are explained below.
1.      Loaded the initial raw data into a “pandas” data frame.
2.      Based on the hypothesis identified during the Week 1 assignment, the variables below were chosen and aggregated.
a.       Crater size – A new column was inserted in the data frame to categorize craters in multiples of 10 km. Example: Cat 1 = size < 10, Cat 2 = 10 ≤ size < 20 and so on.
b.      Morphology 1 – Categories were restricted to the first 5 characters of the code, which carry the main nomenclature.
c.       Morphology 2 – Categories were restricted to the Hummocky and Smooth types. Other secondary classifications were ignored as they only depict patterns.
d.      Number of Layers – Even though this is correlated with Morphology 1, it was included because it distinguishes up to 5 layers, whereas Morphology 1 treats 3 and above as multiple layers.
3.      Frequency distributions were generated using the code shown in the course. Findings are summarized in the Inference section of this post.
Python Code
# -*- coding: utf-8 -*-
"""
Created on Mon May 25 15:33:27 2020
 @author: Chandrakant Padhee
"""
#BELOW CODES IMPORT NECESSARY LIBRARIES - PANDAS AND NUMPY
import pandas #importing pandas library
import numpy #importing numpy library
 #BUG FIX TO REMOVE RUNTIME ERROR
pandas.set_option('display.float_format',lambda x:'%f'%x)
 #READING DATA FROM CSV SOURCE FILE AND IMPORT THEM TO DATAFRAME data_mars
data_mars = pandas.read_csv('marscrater_pds.csv',low_memory=False)
data_mars.columns = map(str.upper,data_mars.columns)
#BELOW CODE ADDS CATEGORIZATION OF CRATER SIZE IN MULTIPLES OF 10KM.
#EXAMPLE 1 REPRESENTS CRATER SIZE LESS THAN 10KM AND 2 REPRESENTS SIZE BETWEEN 10KM to 20KM AND SO ON.
data_mars['Crater_Size_Cat'] = data_mars['DIAM_CIRCLE_IMAGE']//10 + 1
 #BELOW CODE REDUCES MORPHOLOGY_EJECTA_2 DATA TO ITS FIRST TWO CHARACTERS (Hu = HUMMOCKY, Sm = SMOOTH)
data_mars['Morph_2'] = data_mars['MORPHOLOGY_EJECTA_2'].str[:2]
 #BELOW CODE REDUCES MORPHOLOGY_EJECTA_1 DATA TO ITS FIRST FIVE CHARACTERS (SIMPLE LAYERS NOMENCLATURE)
data_mars['Morph_1'] = data_mars['MORPHOLOGY_EJECTA_1'].str[:5]
#AS TARGET IS TO STUDY MORPHOLOGICAL DATA FROM GLOBAL DATASET,
#WE CREATE NEW DATA FRAME REMOVING ALL THE ROWS HAVING "NUMBER_LAYERS" = 0
#STORE NEW DATA UNDER NEW DATA FRAME data_mars_mod
data_mars_mod = data_mars[data_mars.NUMBER_LAYERS!= 0]
 #BELOW CODE IS TO CALCULATE FREQUENCY DISTRIBUTION OF "NUMBER OF LAYERS" IN TERMS OF COUNTS AND PERCENTAGES
c1 = data_mars_mod["NUMBER_LAYERS"].value_counts(sort=False)
p1 = data_mars_mod["NUMBER_LAYERS"].value_counts(sort=False, normalize=True)*100
 #BELOW CODE IS TO CALCULATE FREQUENCY DISTRIBUTION OF "MORPHOLOGY CHARACTERISTICS 1" IN TERMS OF COUNTS AND PERCENTAGES
c2 = data_mars_mod["Morph_1"].value_counts(sort=False)
p2 = data_mars_mod["Morph_1"].value_counts(sort=False, normalize=True)*100
 #BELOW CODE IS TO CALCULATE FREQUENCY DISTRIBUTION OF "MORPHOLOGY CHARACTERISTICS 2" IN TERMS OF COUNTS AND PERCENTAGES
c3 = data_mars_mod["Morph_2"].value_counts(sort=False)
p3 = data_mars_mod["Morph_2"].value_counts(sort=False, normalize=True)*100
 #BELOW CODE IS TO CALCULATE FREQUENCY DISTRIBUTION OF "AGGREGATED CRATER SIZES" IN TERMS OF COUNTS AND PERCENTAGES
c4 = data_mars_mod["Crater_Size_Cat"].value_counts(sort=False)
p4 = data_mars_mod["Crater_Size_Cat"].value_counts(sort=False, normalize=True)*100
 #BELOW CODE PRINTS OUT THE OUTPUT DISTRIBUTION OF NUMBER OF LAYERS AND EJECTA PROFILES
print('Number of counts of Craters with different number of layers are as below')
print(c1)
print('Percentages of Craters with different number of layers are as below ')
print(p1)
print('Number of counts with different Morphology ejecta 1 characteristics for craters are as below - Ex SLERS (Single Layer Ejecta / Rampant/Circular)')
print(c2)
print('Percentages of different Morphology ejecta 1 characteristics for craters are as below -  Ex SLERS (Single Layer Ejecta / Rampant/Circular)')
print(p2)
print('Number of counts with different Morphology ejecta 2 characteristics for craters are as below - H = Hummocky and S = Smooth')
print(c3)
#THE VALUES PRINTED NEXT (p3) ARE PERCENTAGES, SO THE MESSAGE SAYS SO
print('Percentages of different Morphology ejecta 2 characteristics for craters are as below - H = Hummocky and S = Smooth')
print(p3)
print('Counts of Crater size in multiples of 10KM are as below')
print(c4)
print('Percentages of Crater size in multiples of 10KM are as below')
print(p4)
 Output Frequency Tables
VARIABLE 1 – LAYERS OF CRATERS
Number of counts of Craters with different number of layers are as below
1   15467
2     3435
3      739
4       85
5        5
Percentages of Craters with different number of layers are as below  
1   78.389337
2   17.409153
3   3.745375
4   0.430794
5   0.025341
 VARIABLE 2 – MORPHOLOGY_EJECTA_1
Number of counts with different Morphology ejecta 1 characteristics for craters are as below - Ex SLERS (Single Layer Ejecta / Rampant/Circular)
SLErS       1
MLERC      24
SLERC    1290
DLSPC       1
DLEPC     505
Rd/SP       1
RD/SL       1
Rd/SL    1298
SLERS    5130
MLERS     492
MLEPS      43
Rd/DL     637
Rd/ML     240
SLEPS    5053
DLEPS     633
DLERS    1244
SLEPC    2678
DLERC     393
MLEPC      22
SLEPd      44
DLEPd       1
Percentages of different Morphology ejecta 1 characteristics for craters are as below -  Ex SLERS (Single Layer Ejecta / Rampant/Circular)
SLErS    0.005068
MLERC    0.121636
SLERC    6.537935
DLSPC    0.005068
DLEPC    2.559424
Rd/SP    0.005068
RD/SL    0.005068
Rd/SL    6.578481
SLERS   25.999696
MLERS    2.493538
MLEPS    0.217931
Rd/DL    3.228422
Rd/ML    1.216360
SLEPS   25.609447
DLEPS    3.208150
DLERS    6.304800
SLEPC   13.572551
DLERC    1.991790
MLEPC    0.111500
SLEPd    0.222999
DLEPd    0.005068
VARIABLE 3 – MORPHOLOGY_EJECTA_2
Number of counts with different Morphology ejecta 2 characteristics for craters are as below - H = Hummocky and S = Smooth
Sm     5561
Hu   13912
HU        3
Percentages of different Morphology ejecta 2 characteristics for craters are as below - H = Hummocky and S = Smooth
Sm   28.184076
Hu   70.508337
HU   0.015205
VARIABLE 4: CRATER SIZE (DIAMETER) IN MULTIPLES OF 10KM
Counts of Crater size in multiples of 10KM are as below
9.000000         1
4.000000       172
3.000000       618
2.000000      3404
1.000000     15463
6.000000        15
12.000000        1
8.000000         5
5.000000        46
7.000000         6
Percentages of Crater size in multiples of 10KM are as below
9.000000     0.005068
4.000000     0.871725
3.000000     3.132127
2.000000    17.252040
1.000000    78.369064
6.000000     0.076023
12.000000    0.005068
8.000000     0.025341
5.000000     0.233136
7.000000     0.030409
Inference:
The frequency distributions in the tables above were generated after keeping only the rows for which morphology information was available; the remaining rows were removed from the data frame. The distributions reveal the following:
1.      Most of the craters in the segregated data have one layer (78%) or two layers (17%). The small remainder is distributed among three-, four- and five-layer craters.
2.      The Morphology_Ejecta_1 data supports this, and additionally shows that most single- and double-layer craters are spread mainly across the Pancake Circular, Pancake Sinusal and Rampant Sinusal categories.
3.      Morphology_Ejecta_2 reveals that ejecta patterns are mostly of the hummocky type rather than smooth, in a proportion of roughly 70:30.
4.      Lastly, as far as crater size is concerned, most craters fall in the less-than-10 km category, with a 78% share.
The information above is consistent with the initial hypothesis of a correlation between layer-dependent morphology and crater size, but this can only be established after further analysis of the data.
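One possible next step for the further analysis mentioned above is a cross-tabulation of layer count against size category. The sketch below is only an illustration of that idea, not part of the submitted assignment: the helper name `layers_vs_size` is hypothetical, the sample rows are made up, and the column names are taken from the Python code in this post.

```python
import pandas as pd

def layers_vs_size(df):
    """Row-normalised cross-tabulation: % of each layer count per size category."""
    table = pd.crosstab(df["NUMBER_LAYERS"], df["Crater_Size_Cat"], normalize="index")
    return table * 100

# Made-up rows for illustration only (not real crater data).
sample = pd.DataFrame({
    "NUMBER_LAYERS":   [1, 1, 1, 1, 2, 2, 3],
    "Crater_Size_Cat": [1, 1, 1, 2, 1, 2, 3],
})
table = layers_vs_size(sample)
print(table.round(1))
```

Each row of the resulting table sums to 100, so it shows directly how the size distribution shifts as the number of layers increases.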
Summary
The purpose of this post is covered by the points below.
Writing programming code: Python was used to write the code, which is presented under the section “Python Code”.
Display of variable frequency tables: This is covered under the section “Output Frequency Tables”.
Description of frequency distribution: This is covered in the “Inference” section.
Relation between Martian Crater Morphology with size and location of Crater
Introduction to blog
The purpose of this blog is to post my assignment work for the course “Data Management and Visualization” offered by Wesleyan University through Coursera. I will use this space to post my assignments as I progress through the course. This post is for the Week 1 assignment, which focuses on choosing one of the research areas (data sets provided within the course), identifying specific topics of interest, preparing a personal code book and coming up with a hypothesis for further research.
Area of research and Topic finalization
As a first step in the course, we can either use one of the five areas/data sets provided with the course (which are publicly available) or any other public data of our choice. After going through the topics, provided data sets and research areas, I decided to go ahead with the Mars craters study. The data set comes from a 2011 study of the craters of Mars by Stuart James Robbins of the University of Colorado, Boulder.
Topic Scope
After reviewing the materials and data points provided in the original code book, I decided to analyze the portion of the data set where morphological information is provided for craters. The data set provided with the original code book includes information about 384,343 craters on Mars. However, as I have decided to take a closer look at crater morphology, I will use only 44,652 of these data points, as the rest do not have morphology information.
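A minimal sketch of how such a subset could be selected with pandas is shown below. It assumes that a crater without morphology information has a missing or blank MORPHOLOGY_EJECTA_1 value; that assumption, the helper name `with_morphology` and the sample rows are illustrative only, and the actual data set may mark missing morphology differently.

```python
import pandas as pd

def with_morphology(df, column="MORPHOLOGY_EJECTA_1"):
    """Keep rows whose morphology field is neither missing nor blank."""
    # Treat NaN/None and whitespace-only strings alike as "no information".
    values = df[column].fillna("").astype(str).str.strip()
    return df[values != ""]

# Made-up rows for illustration only (not real crater data).
sample = pd.DataFrame({
    "CRATER_ID": ["a", "b", "c", "d"],
    "MORPHOLOGY_EJECTA_1": ["SLERS", " ", None, "Rd/SL"],
})
subset = with_morphology(sample)
print(len(subset), "of", len(sample), "rows kept")
```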
Initial Hypothesis
I have decided to work on the primary and secondary hypotheses below.
1.      Primary topic: Does crater morphology have any correlation with the overall size of craters?
2.      Secondary topic: Does a particular morphology pattern also depend on the location of the crater? If so, this can help us associate morphology patterns with surface/soil characteristics.
(The assumption is that soil and surface type will not change drastically within a specified area.)
Accordingly, I will use all the features available in the data set, but will discard data points where relevant information is not provided.
Code Book:
A personal code book was created for this exercise, giving more details about the data points considered and their definitions. At a high level, the variables below are used.
CRATER_ID – Crater ID for internal use; it is a unique ID
LATITUDE_CIRCLE_IMAGE – Latitude based on reference datum. (Useful for identifying crater location in latitudinal direction)
LONGITUDE_CIRCLE_IMAGE – Longitude based on reference datum. (Useful for identifying crater location in longitudinal direction)
DIAM_CIRCLE_IMAGE – Diameter of crater (units are km)
DEPTH_RIMFLOOR_TOPOG – Depth of crater (units are km)
MORPHOLOGY_EJECTA_1 – Ejecta morphology classified.
MORPHOLOGY_EJECTA_2 – Morphology of the layer(s) itself/themselves.
MORPHOLOGY_EJECTA_3 – Overall texture and/or shape of some of the layer(s)/ejecta that are generally unique and deserve separate morphological classification.
NUMBER_LAYERS – The maximum number of cohesive layers
Existing Topic Research and Hypothesis
A search of Google Scholar and other publicly available resources on the internet indicated that there might be a correlation between the impact of an object on the surface and the size of the resulting crater, and that different levels of impact can be correlated with different morphology patterns. As we do not have actual impact data, a correlation between size (volume of crater) and morphology pattern can help in testing this hypothesis. Some study results are not directly available to the public on the internet, which is where we will rely on the findings from this analysis.
Summary
The purpose of this initial post is covered by the points below.
Selection of data set and indication of selection: The Mars crater study is selected and the data points are narrowed down to suit the area of research.
Research question and hypothesis: Correlation of morphology and crater size. The initial hypothesis is that they are correlated.
Literature review: The literature listed in the References section was reviewed to learn the search terms and the definitions of variables.
References used: The References section identifies the resources used to gain more knowledge on this topic.
Variables used: The variables used are identified in the sections above, and some initial findings and work done are summarized in the section “Existing Topic Research and Hypothesis”.
References
Standardizing the nomenclature of Martian impact crater ejecta morphology: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2000JE001258
Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database: http://about.sjrdesign.net/files/thesis/RobbinsThesis_LargeMB.pdf
In Search of Martian Craters: https://earthdata.nasa.gov/learn/sensing-our-planet/in-search-of-martian-craters
Impact Crater: https://en.wikipedia.org/wiki/Impact_crater
Martian impact crater ejecta morphologies as indicators of the distribution of subsurface volatiles: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2002JE002036