#linestyle
Text
Chrysalis anthro voted by patreon. Accept no substitutes.
#Love this bitch#my linestyle doesnt fit her well#bc straight hair looks odd on my style#but still#banger#love her sm#chrysalis#queen chrysalis#mlp#my little pony#pony posting#anthro#friendship is magic#Best villan by presentation alone
986 notes
Text
I see... You use underhanded tactics, too. Huh, Shuichi?
#Poisned.art#my art#kokichi#kokichi fanart#danganronpa#danganronpa fanart#kokichi ouma#ouma kokichi#danganronpa spoilers#danganronpa v3#danganronpa v3 spoilers#I spent a couple days on this for one simple reason#I fucking hate line art#Any attempt at any linestyle#even no lines at all#looked absolutely dreadful#so i just said fuck it and used the sketch#finished it in an afternoon my god I hate lineart-
66 notes
Text
0 notes
Text
So fun fact i started drawing this gojo and halfway through gege confirmed that he was more of a dog person. ... 💀💀💀 idk if i'll end up finishing this but i still liked the colors and cute kitty and linestyle i went with!!
#artists on tumblr#digital art#trans artist#small artist#drawing#sketch#my art#fanart#anime fanart#anime art#jujutsu kaisen art#jujutsu gojo#jjk gojo#gojo satoru#jjk satoru#jjk fanart
25 notes
Text
opinions for cel shade artstyle
7 notes
Text
`Yavanna´ - Traditional painting. - Pencil and watercolor paint. - The color version of Yavanna is ready now. Soon available in my shops:
(Music in Video by www.frametraxx.de)
#tolkienfanart#valar#yavanna#fantasyart#traditionalpainting#illustration#linestyleartwork#annajaegerhauer#aquarell#drawing#fantasyartist#fantasy#traditionalart#noai#humanartist#femaleartist#artist#jrrtolkien#elves#artistsoninstagram#art#tolkienelben#tolkienelves#lordoftherings#lotr#silmarillion#fee#elfe#watercolor#artistsontumblr
32 notes
Note
HOW DO GOU EVEN MAKE YOUR ART LOOK LIKE SOFT COOKIES WITH LOTS OF CHOCOLATE CHIPS,,,, HOW??????¿??
(btw i left yir server becuaz i realized your server waz 16+,, but il be bac on my birthday!!! im turning 16 on january 19!!!((
(ty for respecting the rules and my boundaries! i generally keep all my socials 16+ for my own comfort and im glad you can understand <3) but awa nonetheless thankyou for the compliments! for anyone curious about how i create the soft look in my art i think theres a few things that contribute to it;
- i rarely use fully opaque lineart, i draw all my lines using a transparency/pressure brush and i often use a thicker linestyle than others too, i think this helps lend itself to line weight and depth but also helps accentuate softer areas where the lines are really light and round!
- i use a soft-edged brush for shading, i turn the hardness down to pretty much 0% and paint soft shapes and gradients to map out the general lighting with blend modes
- i ALSO use a soft-edged brush for rendering-- after laying out the lights and darks i go in with a smaller but still soft brush to add details on a separate layer above the lineart, this can often result in me sometimes overlapping the lines a little and creating that fluffy look!
i think these things are maybe subtle but they help a lot to generally reduce hard edges in the piece and reduce line intensity- lineart is of course important to a piece but i dont treat it as the focus, its more of a guide to the edges and shadows more than anything, if that makes sense :3
#zilluask#art tips#maybe?#soft art#how to#art advice#im not sure how much this qualifies as advice but its an insight into my process atleast!!#if people are interested id be happy to do more in depth guides with visuals and stuff :3
16 notes
Text
feel like different material could maybe elevate my linestyle stuff but it already takes foreeeeever and im worried something slower would render it stale. like stagnant water
4 notes
Text
OOL Attacker - Running a Lasso Regression Analysis
For this project I used the Outlook on Life Surveys. "The purpose of the 2012 Outlook Surveys were to study political and social attitudes in the United States. The specific purpose of the survey is to consider the ways in which social class, ethnicity, marital status, feminism, religiosity, political orientation, and cultural beliefs or stereotypes influence opinion and behavior." - Outlook
Before calling the dropna() function, it was necessary to remove the text entries (kept as text for confidentiality purposes); the code does this by keeping only the numeric columns with select_dtypes().
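A small sketch of why the order matters, using a made-up three-row frame (the column names `PPAGE` and `COMMENT` are hypothetical, not taken from the Outlook on Life file): calling dropna() on the full frame discards every row whose text field is empty, while keeping only the numeric columns first preserves those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two numeric survey items and one free-text field
df = pd.DataFrame({
    "PPNET": [1, 0, 1],
    "PPAGE": [34, 51, 27],
    "COMMENT": ["redacted", np.nan, np.nan],  # free text, mostly missing
})

# dropna() on the full frame loses every row with an empty text field
print(len(df.dropna()))  # 1

# Keep numeric columns first, then drop missing values: all rows survive
numeric = df.select_dtypes(include=["number"]).dropna()
print(len(numeric), list(numeric.columns))  # 3 ['PPNET', 'PPAGE']
```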
This is my full code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing

# Load the dataset
data = pd.read_csv(r"PATHHHH")

# Upper-case all DataFrame column names
data.columns = data.columns.str.upper()

# Data management: keep numeric columns only, then drop rows with missing values
data_clean = data.select_dtypes(include=['number'])
data_clean = data_clean.dropna()
print(data_clean.describe())
print(data_clean.dtypes)

# Separate the predictors from the target (PPNET)
headers = list(data_clean.columns)
headers.remove("PPNET")
predvar = data_clean[headers]
target = data_clean.PPNET

# Standardize every predictor to mean 0 and standard deviation 1
predictors = predvar.copy()
for header in headers:
    predictors[header] = preprocessing.scale(predictors[header].astype('float64'))

# Split data into train and test sets
pred_train, pred_test, tar_train, tar_test = train_test_split(
    predictors, target, test_size=.3, random_state=123)

# Specify the lasso regression model
model = LassoLarsCV(cv=10, precompute=False).fit(pred_train, tar_train)

# Display every predictor with its absolute coefficient
pd.set_option('display.max_rows', None)
table_catimp = pd.DataFrame({'cat': predictors.columns, 'coef': abs(model.coef_)})
print(table_catimp)
non_zero_count = (table_catimp['coef'] != 0).sum()
zero_count = table_catimp.shape[0] - non_zero_count
print(f"Number of non-zero coefficients: {non_zero_count}")
print(f"Number of zero coefficients: {zero_count}")

# Display the top 10 predictors by absolute coefficient
top_10_rows = table_catimp.nlargest(10, 'coef')
print(top_10_rows.to_string(index=False))

# Plot coefficient progression
m_log_alphas = -np.log10(model.alphas_)
plt.figure()
plt.plot(m_log_alphas, model.coef_path_.T)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV')
plt.ylabel('Regression Coefficients')
plt.xlabel('-log(alpha)')
plt.title('Regression Coefficients Progression for Lasso Paths')

# Plot mean squared error for each fold
m_log_alphascv = -np.log10(model.cv_alphas_)
plt.figure()
plt.plot(m_log_alphascv, model.mse_path_, ':')
plt.plot(m_log_alphascv, model.mse_path_.mean(axis=-1), 'k',
         label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV')
plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean squared error')
plt.title('Mean squared error on each fold')

# MSE from training and test data
train_error = mean_squared_error(tar_train, model.predict(pred_train))
test_error = mean_squared_error(tar_test, model.predict(pred_test))
print('training data MSE')
print(train_error)
print('test data MSE')
print(test_error)

# R-square from training and test data
rsquared_train = model.score(pred_train, tar_train)
rsquared_test = model.score(pred_test, tar_test)
print('training data R-square')
print(rsquared_train)
print('test data R-square')
print(rsquared_test)

plt.show()
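Because the script above depends on the survey CSV (`PATHHHH`), here is a self-contained sketch of the same pipeline (standardize, 70/30 split, LassoLarsCV with 10 folds, coefficient counts, MSE and R-square) run on synthetic data from scikit-learn's `make_regression`. The synthetic data is an assumption standing in for the Outlook on Life variables; it is only meant for checking that the steps run end to end.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# Synthetic stand-in: 40 predictors, only 5 of which actually drive the target
X, y = make_regression(n_samples=500, n_features=40, n_informative=5,
                       noise=10.0, random_state=123)
predictors = pd.DataFrame(scale(X), columns=[f"X{i}" for i in range(X.shape[1])])

pred_train, pred_test, tar_train, tar_test = train_test_split(
    predictors, y, test_size=0.3, random_state=123)

model = LassoLarsCV(cv=10, precompute=False).fit(pred_train, tar_train)

# Absolute coefficients per predictor, as in table_catimp
coefs = pd.DataFrame({"cat": predictors.columns, "coef": np.abs(model.coef_)})
non_zero_count = int((coefs["coef"] != 0).sum())
print("non-zero:", non_zero_count, "zero:", len(coefs) - non_zero_count)
print(coefs.nlargest(10, "coef").to_string(index=False))

train_error = mean_squared_error(tar_train, model.predict(pred_train))
test_error = mean_squared_error(tar_test, model.predict(pred_test))
print("train MSE:", train_error, "test MSE:", test_error)
print("train R2:", model.score(pred_train, tar_train),
      "test R2:", model.score(pred_test, tar_test))
```

On data like this, where only a handful of predictors carry signal, the lasso should zero out most of the 35 noise columns, which is the same kind of selection reported below for the survey data.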
Output:
Number of non-zero coefficients: 54
Number of zero coefficients: 186
training data MSE: 0.11867468892072082
test data MSE: 0.1458371486851879
training data R-square: 0.29967753231880834
test data R-square: 0.18204209521525183
Top 10 coefs:
PPINCIMP      0.097772
W1_WEIGHT3    0.061791
W1_P21        0.048740
W1_E1         0.027003
W1_CASEID     0.026709
PPHHSIZE      0.026055
PPAGECT4      0.022809
W1_Q1_B       0.021630
W1_P16E       0.020672
W1_E63_C      0.020205
Conclusions:
While the test error is slightly higher than the training error, the difference is not extreme, which is a positive sign that the model generalizes reasonably well.
On another note, the drop in R-square between the training and test sets (from about 0.30 to 0.18) suggests that the model may be overfitting slightly or that the predictors do not fully explain the response variable's behavior.
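To put numbers on "slightly higher" and "the drop in R-square", the figures from the output above can be compared directly:

```python
# Figures copied from the output reported above
train_mse, test_mse = 0.11867468892072082, 0.1458371486851879
train_r2, test_r2 = 0.29967753231880834, 0.18204209521525183

mse_ratio = test_mse / train_mse   # how much larger the test error is
r2_drop = train_r2 - test_r2       # absolute loss in explained variance

print(f"test/train MSE ratio: {mse_ratio:.2f}")  # 1.23
print(f"R-square drop: {r2_drop:.3f}")           # 0.118
```

In other words, the test error is roughly 23% higher than the training error, and the model explains about 12 percentage points less of the variance on the held-out data.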
In the attachments I present the regression coefficients, which show which features are the most relevant for this model: PPINCIMP and W1_WEIGHT3 have the largest coefficients. The second attachment shows the mean squared error on each fold; the vertical dashed line marks the optimal value of alpha selected by cross-validation.
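The dashed line in the mean-squared-error plot is drawn at `model.alpha_`. As far as I understand scikit-learn's `LassoLarsCV`, that value is the alpha on the `cv_alphas_` grid with the lowest MSE averaged across the folds in `mse_path_`. A sketch that checks this on synthetic data (an assumption, not the survey data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsCV

# Synthetic data standing in for the survey predictors
X, y = make_regression(n_samples=300, n_features=20, n_informative=4,
                       noise=5.0, random_state=0)
model = LassoLarsCV(cv=10, precompute=False).fit(X, y)

# mse_path_ has one row per candidate alpha and one column per fold
mean_mse = model.mse_path_.mean(axis=-1)
best = model.cv_alphas_[np.argmin(mean_mse)]

print("alpha chosen by CV:", model.alpha_)
print("alpha with lowest mean fold MSE:", best)
```

This is the same quantity the black "Average across the folds" curve plots, so the dashed line should sit at that curve's minimum.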
The lasso retained 54 of the 240 candidate predictors (those with non-zero coefficients) as the ones most strongly associated with the response variable. The modest R-square values (about 0.30 on the training set and 0.18 on the test set) suggest room for improvement in the model's explanatory power. However, the relatively small gap between training and test MSE (0.119 vs. 0.146) indicates reasonable generalization.
0 notes
Text
Multiple Regression model
To successfully complete the assignment on testing a multiple regression model, you'll need to conduct a comprehensive analysis using Python, summarize your findings in a blog entry, and include necessary regression diagnostic plots.
Here's a structured example to guide you through the process:
### Example Code
```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import qqplot
from statsmodels.stats.outliers_influence import OLSInfluence

# Sample data creation (replace with your actual dataset loading)
np.random.seed(0)
n = 100
depression = np.random.choice(['Yes', 'No'], size=n)
age = np.random.randint(18, 65, size=n)
# More symptoms with depression and age
nicotine_symptoms = np.random.randint(0, 20, size=n) + (depression == 'Yes') * 10 + age * 0.5

data = {
    'MajorDepression': depression,
    'Age': age,
    'NicotineDependenceSymptoms': nicotine_symptoms
}
df = pd.DataFrame(data)

# Recode the categorical explanatory variable MajorDepression: 'Yes' -> 1, 'No' -> 0
df['MajorDepression'] = df['MajorDepression'].map({'Yes': 1, 'No': 0})

# Multiple regression model
X = df[['MajorDepression', 'Age']]
X = sm.add_constant(X)  # Add intercept
y = df['NicotineDependenceSymptoms']
model = sm.OLS(y, X).fit()

# Print regression results summary
print(model.summary())

# Regression diagnostic plots
# Q-Q plot
residuals = model.resid
fig, ax = plt.subplots(figsize=(8, 5))
qqplot(residuals, line='s', ax=ax)
ax.set_title('Q-Q Plot of Residuals')
plt.show()

# Standardized (internally studentized) residuals vs. fitted values
influence = OLSInfluence(model)
std_residuals = influence.resid_studentized_internal
plt.figure(figsize=(8, 5))
plt.scatter(model.fittedvalues, std_residuals, alpha=0.8)
plt.axhline(y=0, color='r', linestyle='-', linewidth=1)
plt.title('Standardized Residuals vs. Fitted Values')
plt.xlabel('Fitted values')
plt.ylabel('Standardized Residuals')
plt.grid(True)
plt.show()

# Leverage plot
fig, ax = plt.subplots(figsize=(8, 5))
sm.graphics.plot_leverage_resid2(model, ax=ax)
ax.set_title('Leverage-Residuals Plot')
plt.show()

# Blog entry summary. The {...} placeholders are filled from the fitted model
# by .format() below; the interpretive sentences are template text to check
# against your own results and plots before submitting.
summary = """### Summary of Multiple Regression Analysis

1. **Association between Explanatory Variables and Response Variable:** The results of the multiple regression analysis revealed significant associations:
   - Major Depression (Beta = {b_dep:.2f}, p = {p_dep:.4f}): Significant and positive association with Nicotine Dependence Symptoms.
   - Age (Beta = {b_age:.2f}, p = {p_age:.4f}): Older participants reported a greater number of Nicotine Dependence Symptoms.
2. **Hypothesis Testing:** The results supported the hypothesis that Major Depression is positively associated with Nicotine Dependence Symptoms.
3. **Confounding Variables:** Age was identified as a potential confounding variable. Adjusting for Age slightly reduced the magnitude of the association between Major Depression and Nicotine Dependence Symptoms.
4. **Regression Diagnostic Plots:**
   - **Q-Q Plot:** Indicates that residuals approximately follow a normal distribution, suggesting the model assumptions are reasonable.
   - **Standardized Residuals vs. Fitted Values Plot:** Shows no apparent pattern in residuals, indicating homoscedasticity and no obvious outliers.
   - **Leverage-Residuals Plot:** Identifies influential observations but shows no extreme leverage points.

### Output from Multiple Regression Model

~~~
{model_summary}
~~~

### Regression Diagnostic Plots

![Q-Q Plot of Residuals](insert_url_to_image_qq_plot)
![Standardized Residuals vs. Fitted Values](insert_url_to_image_std_resid_plot)
![Leverage-Residuals Plot](insert_url_to_image_leverage_plot)
""".format(
    b_dep=model.params['MajorDepression'],
    p_dep=model.pvalues['MajorDepression'],
    b_age=model.params['Age'],
    p_age=model.pvalues['Age'],
    model_summary=model.summary().as_text(),
)

# Generate and upload images of the plots to your blog,
# then print the summary for submission
print(summary)
```
### Explanation:
1. **Sample Data Creation**: Simulates a dataset with `MajorDepression` as a categorical explanatory variable, `Age` as a quantitative explanatory variable, and `NicotineDependenceSymptoms` as the response variable.
2. **Multiple Regression Model**:
   - Constructs an Ordinary Least Squares (OLS) regression model using `sm.OLS` from the statsmodels library.
   - Adds an intercept to the model using `sm.add_constant`.
   - Fits the model to predict `NicotineDependenceSymptoms` using `MajorDepression` and `Age` as predictors.
3. **Regression Diagnostic Plots**:
   - Q-Q Plot: Checks the normality assumption of residuals.
   - Standardized Residuals vs. Fitted Values: Examines homoscedasticity and identifies outliers.
   - Leverage-Residuals Plot: Detects influential observations that may affect model fit.
4. **Blog Entry Summary**: Provides a structured summary including results of regression analysis, hypothesis testing, discussion on confounding variables, and inclusion of regression diagnostic plots.
### Blog Entry Submission

Be sure to adapt the code and summary to your specific dataset and analysis. Upload the regression diagnostic plots as images to your blog entry and provide the URL to your completed assignment. This example should help you complete your Coursera assignment on testing a multiple regression model.
0 notes
Text
Lasso Regression Project
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoLarsCV
# NOTE: the data-loading step is not shown in this post; `predictors` (a DataFrame
# holding the 10 predictor columns) and `targets` (the response) must already exist,
# e.g. predictors = predvar.copy()
# standardize predictors to have mean=0 and sd=1
for col in ['CODPERING', 'CODPRIPER', 'CODULTPER', 'CICREL', 'CRDKAPRACU',
            'PPKAPRACU', 'CODPER5', 'RN', 'MODALIDADC', 'SEXOC']:
    predictors[col] = preprocessing.scale(predictors[col].astype('float64'))
# split data into train and test sets
pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, targets,
                                                              test_size=.3, random_state=123)
# specify the lasso regression model
model=LassoLarsCV(cv=10, precompute=False).fit(pred_train,tar_train)
# print variable names and regression coefficients
print(dict(zip(predictors.columns, model.coef_)))
# Output:
# {'CODPERING': 0.0057904161622687605, 'CODPRIPER': 0.6317522521193139, 'CODULTPER': -0.15191539575581153, 'CICREL': 0.07945048661923974, 'CRDKAPRACU': -0.3022282694810491, 'PPKAPRACU': 0.15702206999868978, 'CODPER5': -0.11697786485114721, 'RN': -0.03802582617592532, 'MODALIDADC': 0.017655346467683828, 'SEXOC': 0.10597063961395894}
# plot coefficient progression
m_log_alphas = -np.log10(model.alphas_)
ax = plt.gca()
plt.plot(m_log_alphas, model.coef_path_.T)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k',
            label='alpha CV')
plt.ylabel('Regression Coefficients')
plt.xlabel('-log(alpha)')
plt.title('Regression Coefficients Progression for Lasso Paths')
# plot mean square error for each fold
m_log_alphascv = -np.log10(model.cv_alphas_)
plt.figure()
plt.plot(m_log_alphascv, model.mse_path_, ':')
plt.plot(m_log_alphascv, model.mse_path_.mean(axis=-1), 'k',
         label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k',
            label='alpha CV')
plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean squared error')
plt.title('Mean squared error on each fold')
# MSE from training and test data
from sklearn.metrics import mean_squared_error
train_error = mean_squared_error(tar_train, model.predict(pred_train))
test_error = mean_squared_error(tar_test, model.predict(pred_test))
print('training data MSE')
print(train_error)
print('test data MSE')
print(test_error)
# Output:
# training data MSE
# 10.81377398206078
# test data MSE
# 10.82396461754075
# R-square from training and test data
rsquared_train=model.score(pred_train,tar_train)
rsquared_test=model.score(pred_test,tar_test)
print('training data R-square')
print(rsquared_train)
print('test data R-square')
print(rsquared_test)
# Output:
# training data R-square
# 0.041399684741574516
# test data R-square
# 0.04201223667290355
Results Explanation:
A lasso regression analysis was conducted to identify a subset of variables from a pool of 10 quantitative predictor variables that best predicted a quantitative response variable. All predictor variables were standardized to have a mean of zero and a standard deviation of one.
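Standardizing matters for the lasso because the same penalty is applied to every coefficient, so predictors measured on very different scales would otherwise be shrunk unevenly. A minimal sketch of what preprocessing.scale does to a single column (the values are hypothetical); note that it divides by the population standard deviation (ddof=0):

```python
import numpy as np

x = np.array([3.0, 7.0, 8.0, 10.0, 12.0])   # hypothetical raw predictor values

# Centre on the mean, then divide by the population standard deviation (ddof=0)
z = (x - x.mean()) / x.std(ddof=0)

print(z.round(3))
print(abs(z.mean()) < 1e-12, abs(z.std(ddof=0) - 1) < 1e-12)
```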
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. The least angle regression algorithm with k=10 fold cross validation was used to estimate the lasso regression model in the training set, and the model was validated using the test set. The change in the cross validation average (mean) squared error at each step was used to identify the best subset of predictor variables.
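The selection idea in the paragraph above (fit the lasso over a range of penalty values, score each one by the average test-fold MSE across the k folds, keep the best) can be sketched without scikit-learn. This is only an illustration under stated assumptions: it swaps in cyclic coordinate descent on a fixed grid of alphas in place of the least angle regression path that LassoLarsCV uses, it minimises the lasso objective (1/(2n))*||y - Xb||^2 + alpha*||b||_1, and the data are simulated with made-up coefficients, three of which are zero.

```python
import numpy as np

def soft_threshold(rho, lam):
    # Shrink rho towards zero by lam; anything inside [-lam, lam] becomes exactly 0
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + alpha*||b||_1.
    Assumes X and y are centred, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # residual y - X @ b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])   # keep the residual up to date
            b[j] = new_bj
    return b

def cv_lasso(X, y, alphas, k=10, seed=0):
    """Return the alpha with the lowest mean test-fold MSE, plus all mean MSEs."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    mse = np.zeros((len(alphas), k))
    for f, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        x_mean, y_mean = X[train_idx].mean(axis=0), y[train_idx].mean()
        for a, alpha in enumerate(alphas):
            b = lasso_cd(X[train_idx] - x_mean, y[train_idx] - y_mean, alpha)
            pred = y_mean + (X[test_idx] - x_mean) @ b
            mse[a, f] = np.mean((y[test_idx] - pred) ** 2)
    mean_mse = mse.mean(axis=1)
    return alphas[np.argmin(mean_mse)], mean_mse

# Simulated, standardized data; the true coefficients are hypothetical
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_b = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = X @ true_b + rng.normal(size=n)

alphas = np.logspace(1, -3, 12)             # from strong to weak penalty
best_alpha, mean_mse = cv_lasso(X, y, alphas)
b = lasso_cd(X - X.mean(axis=0), y - y.mean(), best_alpha)
print("alpha chosen by 10-fold CV:", best_alpha)
print("coefficients:", b.round(3))
```

With a large enough alpha every coefficient is exactly zero; as alpha shrinks, predictors enter the model one by one, which is the "subset of predictors" the cross-validated MSE chooses between.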
The MSE values for the training data (10.81) and the test data (10.82) are almost identical, indicating that the model performs consistently across both datasets. However, the R-square values are very low (about 0.041 for the training data and 0.042 for the test data), meaning the model explains only about 4% of the variance in the response variable.
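Both metrics are simple to compute by hand. A minimal numpy sketch with made-up numbers; R-square here is 1 minus the ratio of the residual sum of squares to the total sum of squares, which is what model.score() reports for scikit-learn regressors. By that definition, an R-square of about 0.04 means the residual sum of squares is still about 96% of the total.

```python
import numpy as np

def mse(y, y_pred):
    # Mean of the squared prediction errors
    return np.mean((y - y_pred) ** 2)

def r_squared(y, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])          # hypothetical observed values
y_pred = np.array([4.0, 5.0, 6.0, 9.0])     # hypothetical predictions
print(mse(y, y_pred))        # 0.5
print(r_squared(y, y_pred))  # 0.9
```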
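Although all 10 predictor variables have non-zero coefficients in the printed output, 6 stand out as the strongest predictors (largest absolute coefficients): CODPRIPER (0.63), CRDKAPRACU (-0.30), PPKAPRACU (0.16), CODULTPER (-0.15), CODPER5 (-0.12) and SEXOC (0.11). The remaining four (CICREL, RN, MODALIDADC and CODPERING) have coefficients below 0.08 in absolute value.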
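The coefficient dictionary printed earlier in the post can be checked directly to see how many predictors the lasso kept (non-zero coefficients) and which ones are largest. A small pure-Python sketch using exactly the printed values; because the predictors were standardized, the absolute sizes are comparable:

```python
# Coefficients as printed by dict(zip(predictors.columns, model.coef_)) in the post
coefs = {'CODPERING': 0.0057904161622687605, 'CODPRIPER': 0.6317522521193139,
         'CODULTPER': -0.15191539575581153, 'CICREL': 0.07945048661923974,
         'CRDKAPRACU': -0.3022282694810491, 'PPKAPRACU': 0.15702206999868978,
         'CODPER5': -0.11697786485114721, 'RN': -0.03802582617592532,
         'MODALIDADC': 0.017655346467683828, 'SEXOC': 0.10597063961395894}

# A predictor is "retained" by the lasso if its coefficient is not exactly zero
retained = [name for name, c in coefs.items() if c != 0]

# Rank predictors by the absolute size of their coefficient
ranked = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)

print(len(retained), "of", len(coefs), "predictors have non-zero coefficients")
for name in ranked:
    print(f"{name:>10}  {coefs[name]: .4f}")
```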
Linear Regression
So in my first proper post I'd like to talk a bit about Linear Regression.
In the Data Science course I am studying, we are given an example dataset (real_estate_price_size.csv) to play with, which contains one column with the house price and another with the house size. In one exercise, we were asked to create a simple linear regression using this dataset. Here is the code for it:
------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn
seaborn.set()
import unicodeit

data = pd.read_csv('real_estate_price_size.csv')
# using the data.describe() method gives us nice descriptive statistics!
print(data.describe())

# y is the dependent variable
y = data['price']
# x is the independent variable
x1 = data['size']

# add a constant
x = sm.add_constant(x1)
# fit the model, according to the OLS (ordinary least squares) method with a dependent variable y and independent x
results = sm.OLS(y, x).fit()
print(results.summary())

plt.scatter(x1, y)
# coefficients obtained from the OLS summary. In this case, the property size has the multiplier of 223.1787 and the intercept of 101900 (the constant)
yhat = 223.1787*x1 + 101900
fig = plt.plot(x1, yhat, lw=2, color='red', linestyle='dotted', label='Regression line')
plt.xlabel(unicodeit.replace('Size (m^2)'), fontsize=15)
plt.ylabel('Price', fontsize=15)
plt.title('House size and price', fontsize=20, loc='center')
plt.savefig('LinearRegression_PropertyPriceSize.png')
plt.show()
------------------------------------------------------------------------------
And here is the scatter graph of the data with a regression line!
So overall I'm very happy with this. After a bit more playing around I'd like to be able to sort out the y-axis so that the title isn't cut off!
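One common way to stop axis labels or titles from being clipped is to let matplotlib recompute the margins before saving, with fig.tight_layout() and/or bbox_inches="tight" in savefig. A minimal sketch (the x values are hypothetical; the line uses the coefficients from the post):

```python
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([500.0, 650.0, 800.0, 950.0])     # hypothetical sizes
yhat = 101900 + 223.1787 * x1                   # line from the post's coefficients

fig, ax = plt.subplots()
ax.plot(x1, yhat, lw=2, color="red", linestyle="dotted")
ax.set_xlabel("Size (m²)", fontsize=15)
ax.set_ylabel("Price", fontsize=15)
ax.set_title("House size and price", fontsize=20, loc="center")

# Recompute the margins so the axis labels and title fit inside the figure
fig.tight_layout()
# bbox_inches="tight" makes the saved image include everything that was drawn
fig.savefig("LinearRegression_PropertyPriceSize.png", bbox_inches="tight")
```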
But what does that code actually do?
A couple of things in the code that I'd like to explain:
1. The first 7 lines (i.e. import numpy etc.) import the relevant libraries (seaborn.set() also applies seaborn's default plot style). Writing something like 'import matplotlib.pyplot as plt' means that each time you want to call on matplotlib.pyplot, you only have to write 'plt'!
2. The next line of code, "data = pd.read_csv('real_estate_price_size.csv')", uses the pandas library to load in the real estate data (in comma-separated values [.csv] format).
3. By using "print(data.describe())" we can see a variety of descriptive statistics for each column: the count, mean, standard deviation, minimum, 25th, 50th and 75th percentiles, and maximum.
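For numeric columns, describe() returns eight rows of statistics. A minimal sketch with a small hypothetical DataFrame in the same two-column shape as the course dataset (the values are made up, not taken from real_estate_price_size.csv):

```python
import pandas as pd

# Hypothetical values, only to show the shape of the output
data = pd.DataFrame({"price": [200000.0, 250000.0, 300000.0, 350000.0],
                     "size": [500.0, 600.0, 700.0, 800.0]})

stats = data.describe()
print(stats)
print(list(stats.index))   # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```

Note that the std row is the sample standard deviation (ddof=1).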
4. After this, the next lines of code declare the dependent variable (y) and the independent variable (x1). As for the '1' after the 'x': it's a good habit to label them this way (we will see later that we can make our regression more sophisticated [or in some cases not!] by using more than one x term [i.e. x1, x2, x3, ..., xk]).
5. The next bit is slightly more complicated, so it's probably worth taking a step back. Remember that a straight line has the equation y = mx + c, where y = dependent variable, x = independent variable, m = slope and c = intercept (where the line cuts the y-axis). Now imagine that we're trying to fit this line to some data points: we want the model to estimate the constant term c, rather than forcing it to be zero (which would force the line through the origin).
A bit of a digression here, but hopefully this will make sense. Imagine that we are trying to predict someone's salary based on their years of experience. We collect data on both and we want to fit a line to this data to understand how salary changes with experience.
Even if someone has zero years of experience, they still might have a base salary.
So in regression analysis, we often include a constant term (or intercept) to account for this baseline value. Adding this constant term to the data means the regression model can capture this starting salary for someone with zero years of experience.
This constant term is like the starting point on the salary scale, just like the y-intercept is the starting point on the graph of a straight line.
x = sm.add_constant(x1) adds a column of constants (all equal to 1) next to our 'x1' column, which in this example is 'size'.
So this column of ones corresponds to x0 in the equation
y_hat = b0 * x0 + b1 * x1.
So this means that x0 is always going to be 1, which in turn yields
y_hat = b0 + b1*x1
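To see why the column of ones matters, here is a minimal numpy sketch of the salary example with made-up numbers. It builds the same kind of design matrix that sm.add_constant builds (by default the ones column comes first) and compares a least-squares fit with and without it:

```python
import numpy as np

# Hypothetical salary example: years of experience and salary (in thousands)
experience = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
salary = np.array([30.0, 32.0, 34.0, 36.0, 38.0])

# Design matrix: a column of ones (x0) next to x1, as sm.add_constant(x1) builds
X = np.column_stack([np.ones_like(experience), experience])

# Least-squares fit WITH the constant: recovers b0 (base salary) and b1
b_with, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Least-squares fit WITHOUT the constant: the line is forced through (0, 0)
b_without, *_ = np.linalg.lstsq(experience[:, None], salary, rcond=None)

print(b_with)     # close to [30, 2]: salary = 30 + 2 * experience
print(b_without)  # close to [12]: one slope has to do all the work
```

Without the constant, the fitted line has to pass through a salary of 0 at zero experience, so the slope becomes far too steep.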
So now that we've got that cleared up, it's on to the next bit. This is the following section of code:
results = sm.OLS(y, x).fit()
print(results.summary())
sm.OLS (Ordinary Least Squares) is a method for estimating the parameters of a linear regression model by minimising the sum of the squared residuals.
(y, x) are the dependent variable we're trying to predict and the independent variable(s) we believe influence y (here x also includes the column of ones added by add_constant).
.fit() is the method that fits the linear regression model to your data, estimating the coefficients that best describe the relationship between the variables.
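Under the hood, the least-squares coefficients are the solution of the "normal equations" (X'X)b = X'y. A minimal numpy sketch with made-up data (statsmodels uses its own, numerically more careful routine, so this is an illustration of the idea rather than its implementation):

```python
import numpy as np

def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y for the OLS coefficients.
    X must already contain the column of ones if an intercept is wanted."""
    return np.linalg.solve(X.T @ X, X.T @ y)

x1 = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical x values
y = np.array([3.0, 5.0, 4.0, 6.0])           # hypothetical y values
X = np.column_stack([np.ones_like(x1), x1])  # constant column + x1

b = ols_fit(X, y)
residuals = y - X @ b
print(b)   # intercept b0 close to 2.5, slope b1 close to 0.8
```

A handy property of the OLS solution is that the residuals sum to zero and are uncorrelated with x1 (X'r = 0).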
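results.summary() returns a table of results, including the estimated coefficient (coef) for each term with its standard error, t-statistic and p-value, as well as overall statistics such as R-squared.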
6. The next section is relatively straightforward! This is the bit where we actually plot the data on the graph!
So first of all we have plt.scatter(x1, y), which is fairly self-explanatory!
Next, we declare a new variable yhat. The coef column of the summary table gives us the values we need for the equation: 223.1787 (this is our b1, which multiplies x1) and 1.019e+05 (i.e. 101,900), which is our b0. I suppose we could have arranged it (so that it is in the same format as the equation above) as yhat = 101900 + 223.1787*x1
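Typing the coefficients in by hand works, but the summary rounds them (1.019e+05 is a rounded value). With statsmodels the exact numbers are available as results.params (indexed 'const' and 'size' here), so yhat = results.params['const'] + results.params['size'] * x1 avoids retyping them. To keep the sketch below runnable without statsmodels, it swaps in np.polyfit, which also returns the least-squares line; the data are hypothetical:

```python
import numpy as np

x1 = np.array([500.0, 600.0, 700.0, 800.0])               # hypothetical sizes
y = np.array([210000.0, 235000.0, 255000.0, 280000.0])    # hypothetical prices

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x1, y, 1)

# Build the fitted values from the estimated coefficients instead of retyping them
yhat = intercept + slope * x1
print(slope, intercept)   # close to 230 and 95500
```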
So the next bit draws the regression line itself:
fig = plt.plot(x1, yhat, lw=2, color='red', linestyle='dotted', label='Regression line')
The first two arguments are the values for the x and y axes (x1 and yhat); the rest are keyword arguments, so their order doesn't matter: lw sets the line width, color the colour, linestyle the style of the line, and label the text that appears if a legend is added.
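A quick way to check what those keyword arguments do is to inspect the line object that plt.plot returns. A minimal sketch (the x values are hypothetical; the line uses the post's coefficients), which also shows that swapping the keyword order gives the same result:

```python
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([500.0, 600.0, 700.0, 800.0])   # hypothetical sizes
yhat = 101900 + 223.1787 * x1                  # the post's fitted line

# x and y come first (positional); the keyword arguments can be in any order
lines = plt.plot(x1, yhat, lw=2, color='red', linestyle='dotted', label='Regression line')
same = plt.plot(x1, yhat, label='Regression line', linestyle='dotted', color='red', lw=2)

line = lines[0]                 # plt.plot returns a list of Line2D objects
print(line.get_linewidth(), line.get_color(), line.get_linestyle(), line.get_label())
plt.legend()                    # the label is only shown once a legend is drawn
```

Note that plt.plot returns a list of lines rather than a figure, so the name fig in the post is just a label for that list.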
The next 4 lines are quite straightforward: they label the x and y axes (specifying the font size; unicodeit.replace turns 'm^2' into 'm²'), give the chart a title, and use the plt.savefig() method to save the image! Finally, plt.show() displays the chart.