Case Study: Statistical Forecasting

Already solved; it just needs to be entered into an Excel spreadsheet and summarized.

Final part

Write a summary of the findings from the analysis in three paragraphs.
Provide three data-driven suggestions for further exploration.

Sources for the three data-driven suggestions for further exploration are as follows:

Data-Driven Decision-Making for Health Administrators

https://www.tableau.com/learn/articles/data-driven-decision-making

https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/three-keys-to-building-a-data-driven-strategy

Solved information plus the original question; the tables for each part are attached.

Part 1:

Megan is initiating some efforts at a preliminary analysis. She has seen 20 initial patients and made several observations about the skin disease. She wants to analyze this initial data before structuring and recommending a more encompassing study.

The signs and symptoms of this disorder usually affect multiple sections of the patient’s body. These signs and symptoms may include:

Pain, burning, numbness or tingling, but pain is always present.
Sensitivity to touch.
A red rash that begins a few days after the pain.
Fluid-filled blisters that break open and crust over.
Itching.

Some people also experience:

Fever.
Headache.
Sensitivity to light.
Fatigue.

Pain is always the first symptom of PR. For some, it can be intense. Depending on the location of the pain, it can sometimes be mistaken for a symptom of problems affecting the heart, lungs, or kidneys. Some people experience PR pain without ever developing the rash. The degree of pain that the individual experiences is seemingly proportional to the number of lesions.

Dr. Zobb is extremely concerned that this new variant is especially challenging to the younger population, who are active and like to be outdoors. She has asked you as an analyst and statistician for some assistance in analyzing her initial data. She is not a biostatistician, so she requests that you explain the process you use and your interpretation of the results for each task.

Initial Data Analysis

Dr. Zobb has accumulated some data on an initial set of 20 patients across multiple age groups. She believes that the data suggests younger individuals are affected more than others. She wants you to complete the tasks shown here based on the data below.

For each of the following, provide a detailed explanation of the process you used along with your interpretation of the results. Submit the response in a Word document and attach your Excel spreadsheet to show your calculations (where applicable). Be sure to number each response (e.g., 1.a, 1.b,…).

Develop an equation to model the data using a regression analysis approach and explain your calculation process in Excel.
Calculate the r-square statistic using Excel. Interpret the meaning of the r-square statistic in this case.
Determine three conclusions that address the initial observations and are supported by the regression analysis.

SOLUTION:

Regression Analysis Initial Data

a. Equation to Model the Data: To model the data using regression analysis, we use the number of lesions as the dependent variable and the age of the patient as the independent variable. The equation for the regression line is: y = β0 + β1x, where y is the number of lesions, x is the age of the patient, β0 is the y-intercept, and β1 is the slope of the line. To calculate the regression line in Excel, we use the LINEST function: =LINEST(y-range, x-range, constant, stats), where y-range is the range of cells containing the number of lesions, x-range is the range of cells containing the age of the patient, constant is a logical value that controls the intercept (TRUE or omitted calculates the intercept normally; FALSE forces the line through the origin), and stats is a logical value indicating whether to return additional regression statistics. For this analysis, set constant to TRUE so that β0 is estimated; LINEST then returns the slope β1 followed by the intercept β0.
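For readers who want to check the spreadsheet result outside Excel, the least-squares calculation behind LINEST can be sketched in Python. This is a minimal sketch, not the workbook itself: the function name fit_line and the four (age, lesions) pairs are hypothetical placeholders, because the patient table is attached separately and is not reproduced here.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL placeholder values, NOT Dr. Zobb's patient data.
ages = [20, 30, 40, 50]
lesions = [10, 14, 15, 21]

slope, intercept = fit_line(ages, lesions)
print(f"y = {intercept:.4f} + {slope:.4f}x")
```

Replacing the placeholder lists with the 20 patient rows should give the same slope and intercept that LINEST returns when the intercept is estimated.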

b. R-Square Statistic: The r-square statistic is a measure of the proportion of variance in the dependent variable that is explained by the independent variable. It ranges from 0 to 1, with a value of 1 indicating that all the variance in the dependent variable is explained by the independent variable. To calculate the r-square statistic in Excel, we will use the RSQ function. The formula for the r-square statistic in Excel is: =RSQ(y-range, x-range) where y-range is the range of cells containing the number of lesions and x-range is the range of cells containing the age of the patient.
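As a cross-check on RSQ, r-square can be computed as the square of the Pearson correlation coefficient. The sketch below assumes that definition; the function name r_squared and the sample values are hypothetical and are not taken from the patient table.

```python
import math


def r_squared(xs, ys):
    """Square of the Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * r


# HYPOTHETICAL placeholder values, NOT Dr. Zobb's patient data.
ages = [20, 30, 40, 50]
lesions = [10, 14, 15, 21]
print(f"r-square = {r_squared(ages, lesions):.3f}")
```

A value near 1 means the fitted line accounts for almost all of the variation in y; a value near 0 means it accounts for almost none.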

c. Interpretation of the R-Square Statistic: In this case, the r-square statistic is 0.388. This means that 38.8% of the variance in the number of lesions is explained by the age of the patient. This implies that there are other factors that influence the number of lesions, such as the amount of sunlight exposure, that should be considered in future studies.

d. Conclusions Based on Regression Analysis:

There is a positive relationship between the age of the patient and the number of lesions. As the age of the patient increases, the number of lesions increases.
The age of the patient explains 38.8% of the variance in the number of lesions, implying that there are other factors that influence the number of lesions.
Based on the regression analysis, we can predict the number of lesions for a given age of the patient. For example, if a patient is 30 years old, we can predict that they will have approximately 14 lesions.

Effects of Sunlight Analysis

a. Equation to Model the Data: To model the data using regression analysis, we will use the number of lesions as the dependent variable and the time of continuous exposure to direct sunlight as the independent variable. The equation for the regression line is: y = β0 + β1x where y is the number of lesions, x is the time of continuous exposure to direct sunlight, β0 is the y-intercept and β1 is the slope of the line. To calculate the regression line in Excel, we will use the LINEST function. The formula for the regression line in Excel is: =LINEST(y-range, x-range, constant, stats) where y-range is the range of cells containing the number of lesions, x-range is the range of cells containing the time of continuous exposure to direct sunlight, constant is a logical value indicating whether the regression line should be forced through the origin

How to input into Excel:

Develop an equation to model the data using a regression analysis approach and explain your calculation process in Excel.

To model the data using a regression analysis approach, we need to find the relationship between the two variables, age and number of lesions. We will use linear regression to model this relationship.

Step 1: Create a scatterplot of the data

In Excel, input the patient number, age, and number of lesions into three separate columns.
Select the data and insert a scatterplot.

Step 2: Add the regression line

Right-click on one of the data points and select “Add Trendline”
Select “Linear” as the type of trendline
Select “Display Equation on Chart” and “Display R-Squared Value on Chart”

Step 3: Interpret the results

The equation of the regression line represents the relationship between age and number of lesions. The equation can be used to predict the number of lesions based on the age of the patient.
The R-squared value represents the proportion of variability in the number of lesions that is explained by the age of the patient.

Calculate the r-square statistic using Excel. Interpret the meaning of the r-square statistic in this case.

The R-squared statistic can be calculated using Excel by following the steps outlined above in the regression analysis process. The R-squared value represents the proportion of variability in the number of lesions that is explained by the age of the patient.

A value of 1 means that all of the variability in the number of lesions is explained by the age of the patient. A value of 0 means that the age of the patient does not explain any of the variability in the number of lesions.

In this case, the R-squared value is 0.31, meaning that 31% of the variability in the number of lesions is explained by the age of the patient.

Determine three conclusions that address the initial observations and are supported by the regression analysis.
The age of the patient is positively associated with the number of lesions. This can be seen from the positive slope of the regression line.
The age of the patient explains 31% of the variability in the number of lesions. This can be seen from the R-squared value of 0.31.
There is a large amount of variability in the number of lesions that is not explained by the age of the patient. This can be seen from the low R-squared value of 0.31, meaning that 69% of the variability in the number of lesions is not explained by the age of the patient.

PART 2:

Effects of Sunlight Analysis

In her initial observations, Dr. Zobb notices that the number of lesions that appear on a patient seems to be dependent on the amount of direct sunlight exposure that the patient receives. She is uncertain at this point why this would be the case, but she is a good experimentalist and is trying to establish some observations that have statistical validity. She has taken a limited amount of data on 8 patients and wants you to complete the appropriate analysis based on the data below (be sure to show your work):

Develop an equation to model the data using a regression analysis approach and explain your calculation process, using Excel.
Megan has a small group of three additional patients that are the same age that she wants to examine for lesions. She knows the number of minutes of continuous exposure to direct sunlight that each has experienced. Predict the number of lesions that each of these patients will have based on the regression analysis that you completed in your initial data analysis:
- Patient 9 – 193 minutes.
- Patient 10 – 219 minutes.
- Patient 11 – 84 minutes.
Determine three conclusions based on the correlation of the number of lesions to minutes of sunlight exposure, using regression analysis.

SOLUTION:

Sunlight Exposure Regression Analysis:

Developing an equation to model the data using regression analysis: We will use linear regression to model the relationship between the number of lesions (y) and the time of continuous exposure to direct sunlight (x). First, we calculate the mean of both variables and the covariance between them.
Mean of Time of Continuous Exposure to Direct Sunlight (x) = 190.375 minutes
Mean of Number of Lesions (y) = 19.375
Covariance (cov(x,y)) = 136.0625
Next, we calculate the standard deviation of both variables.
Standard deviation of Time of Continuous Exposure to Direct Sunlight (x) = 33.4878 minutes
Standard deviation of Number of Lesions (y) = 4.9207
Then we calculate the correlation coefficient (r), which must lie between -1 and 1:
r = cov(x,y) / (SD(x) * SD(y)) = 136.0625 / (33.4878 * 4.9207) ≈ 0.8257
The regression equation is y = b0 + b1x, where b0 is the y-intercept and b1 is the slope:
b1 = r * (SD(y) / SD(x)) = 0.8257 * (4.9207 / 33.4878) ≈ 0.1213
b0 = mean(y) - b1 * mean(x) = 19.375 - 0.1213 * 190.375 ≈ -3.72
Therefore, the equation that follows from these summary statistics is: y = -3.72 + 0.1213x
Predicting the number of lesions for new patients: Using this equation, we can predict the number of lesions for each of the new patients based on their time of continuous exposure to direct sunlight.
Patient 9 (193 minutes): y = -3.72 + 0.1213 * 193 ≈ 19.7 lesions
Patient 10 (219 minutes): y = -3.72 + 0.1213 * 219 ≈ 22.8 lesions
Patient 11 (84 minutes): y = -3.72 + 0.1213 * 84 ≈ 6.5 lesions
Conclusions:
There is a positive relationship between the number of lesions and the time of continuous exposure to direct sunlight. As the time of exposure increases, the number of lesions also increases; each additional minute of exposure is associated with about 0.12 more lesions.
The regression equation is a useful tool for predicting the number of lesions for new patients based on their time of exposure to sunlight.
The correlation coefficient (r ≈ 0.83) indicates a strong positive correlation between the two variables. However, correlation does not necessarily imply causation, the sample contains only 8 patients, and further research is needed to establish the underlying cause of the relationship.
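The chain from summary statistics to predictions can be re-run in a few lines of Python as a check on the arithmetic. This is a sketch under one assumption: the means, covariance, and standard deviations quoted in the solution are taken as given, since the 8-patient table itself is not reproduced here. The variable and function names are illustrative only.

```python
# Summary statistics as quoted in the solution (taken as given).
mean_x = 190.375   # minutes of continuous sunlight exposure
mean_y = 19.375    # number of lesions
cov_xy = 136.0625
sd_x = 33.4878
sd_y = 4.9207

# Correlation coefficient; a valid r always lies between -1 and 1.
r = cov_xy / (sd_x * sd_y)

# Slope and intercept of the regression line y = b0 + b1*x.
slope = r * (sd_y / sd_x)
intercept = mean_y - slope * mean_x


def predict(minutes):
    """Predicted number of lesions for a given exposure time."""
    return intercept + slope * minutes


print(f"r = {r:.4f}, b1 = {slope:.4f}, b0 = {intercept:.4f}")
for patient, minutes in [(9, 193), (10, 219), (11, 84)]:
    print(f"Patient {patient}: {predict(minutes):.1f} lesions")
```

If the Excel trendline on the raw data gives different coefficients, the quoted summary statistics are the first thing to re-check.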

PART 3:

Over the Counter Medication Effectiveness Analysis

Dr. Zobb wants to test several over the counter lotions—that is, lotions available without a prescription—that can be applied directly to the lesions. She wants to determine whether there is a difference in the mean length of time it takes these three types of pain lotions to provide relief from the pain caused by these lesions. Megan is hoping that one of these lotions might be more promising than the others. Several sufferers (with roughly the same number of lesions) are randomly selected and given one of the three medications. Each sufferer records the time (in minutes) it takes the medication to begin working. The results are shown in the table below. She asks you to answer these questions (be sure to show your work).

State the null hypothesis and the alternative hypothesis for this situation.
At α = 0.01, can you conclude that the mean times are different? Assume that each population of relief times is normally distributed and that the population variances are equal. Hint: Use a one-way ANOVA to solve this problem. Be certain to show your calculations and describe the process you used to solve this problem.
Determine three conclusions on the effectiveness of the medication by addressing observations or hypotheses regarding these initial tests.

SOLUTION:

Hypotheses:
Null hypothesis (H0): The mean time to relief is the same for all three medications (μ1 = μ2 = μ3).
Alternative hypothesis (Ha): At least one medication has a mean time to relief that differs from the others.
ANOVA Calculation: The calculation uses the values listed for each medication; the 0 listed for Medication 3 is treated as a recorded observation.

Step 1: Calculate the sample mean for each medication and the grand mean:
Medication 1 (12, 15, 17, 12): n = 4, mean = 56/4 = 14
Medication 2 (16, 14, 21, 15, 19): n = 5, mean = 85/5 = 17
Medication 3 (14, 17, 20, 15, 0): n = 5, mean = 66/5 = 13.2
Grand mean = (56 + 85 + 66)/14 = 207/14 ≈ 14.786

Step 2: Calculate the within-group sum of squares (SSW) from each medication's deviations about its own mean:
Medication 1: (12-14)^2 + (15-14)^2 + (17-14)^2 + (12-14)^2 = 18
Medication 2: (16-17)^2 + (14-17)^2 + (21-17)^2 + (15-17)^2 + (19-17)^2 = 34
Medication 3: (14-13.2)^2 + (17-13.2)^2 + (20-13.2)^2 + (15-13.2)^2 + (0-13.2)^2 = 238.8
SSW = 18 + 34 + 238.8 = 290.8

Step 3: Calculate the between-group sum of squares (SSB) from each sample mean's deviation about the grand mean, weighted by sample size:
SSB = 4(14 - 14.786)^2 + 5(17 - 14.786)^2 + 5(13.2 - 14.786)^2 ≈ 2.47 + 24.52 + 12.57 = 39.56

Step 4: Calculate the mean squares, where k = 3 is the number of medications and N = 14 is the total number of observations:
MS between = SSB/(k - 1) = 39.56/2 ≈ 19.78
MS within = SSW/(N - k) = 290.8/11 ≈ 26.44

Step 5: Calculate the F-statistic: F = MS between / MS within = 19.78 / 26.44 ≈ 0.75

Step 6: Compare the F-statistic to the critical value: At α = 0.01 with 2 and 11 degrees of freedom, the critical value from the F-distribution table is approximately 7.21. Since F ≈ 0.75 < 7.21, we fail to reject the null hypothesis.
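The one-way ANOVA arithmetic can be checked with a short Python sketch. It assumes the relief times exactly as listed in Step 1, including the 0 recorded for Medication 3, and it takes the critical value 7.21 from a standard F table for α = 0.01 with 2 and 11 degrees of freedom rather than computing it. The function name one_way_anova is illustrative.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: sample means about the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group variation: observations about their own sample mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Relief times in minutes, as listed in Step 1 (0 kept as an observation).
medication_1 = [12, 15, 17, 12]
medication_2 = [16, 14, 21, 15, 19]
medication_3 = [14, 17, 20, 15, 0]

ssb, ssw, df_b, df_w, f_stat = one_way_anova(
    [medication_1, medication_2, medication_3])

F_CRITICAL = 7.21  # assumed table value: alpha = 0.01, df = (2, 11)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, df = ({df_b}, {df_w})")
print(f"F = {f_stat:.2f}; reject H0: {f_stat > F_CRITICAL}")
```

In Excel, the Analysis ToolPak's "Anova: Single Factor" tool produces the same sums of squares, degrees of freedom, and F-statistic from the three columns of relief times.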

Conclusions:
We cannot conclude that the mean times for the relief of each medication are different at α = 0.01.
There is not enough evidence to support that one of the medications is more effective than the others.
Further studies with larger sample sizes and different populations should be conducted to determine the effectiveness of these medications.