Results 3.3 Correlation Analysis of Supraspinatus, Infraspinatus and Subscapularis

2. Results 3.3 Correlation Analysis of Supraspinatus, Infraspinatus and Subscapularis#

This Python script performs an analysis to determine the correlation between the log-transformed origin areas and volumes of specific shoulder muscles. The analysis is conducted for the following muscles:

Supraspinatus
Infraspinatus
Subscapularis

The log-transformation is applied to the origin areas and volumes to normalize the data and improve the accuracy of the correlation analysis.

2.1. Imports#

Required packages for this analysis can be found in the requirements.txt file. Ensure that all dependencies are installed before running the script.

2.2. Input Setup#

Loads the Muscle_data.xlsx file containing all original data and log transforms origin area and experimental volume.

2.3. Filter data#

Remove Teres Major and Minor from the analysis to investigate them separately.

2.4. Tests for normality#

Performs a Shapiro-Wilk test and evaluates normality using Q-Q plots

Shapiro-Wilk test for muscle origin:  ShapiroResult(statistic=0.9892005715167651, pvalue=0.9907832024086947)
Shapiro-Wilk test for muscle volume:  ShapiroResult(statistic=0.9838490399269613, pvalue=0.9369937392036665)

../_images/2ef046b85835897e868ac0b81b8204bc0d8cc9fe1446a45f18971a8f09d888df.png

../_images/d0ad4e3b25483758dbed19d74718d3902cf59751580921dab6232003c0ed15cb.png

2.5. Pearson Correlation Analysis#

                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Log_Volume_exp   R-squared:                       0.911
Model:                            OLS   Adj. R-squared:                  0.907
Method:                 Least Squares   F-statistic:                     254.6
Date:                Wed, 06 Nov 2024   Prob (F-statistic):           1.29e-14
Time:                        16:29:16   Log-Likelihood:                -9.9869
No. Observations:                  27   AIC:                             23.97
Df Residuals:                      25   BIC:                             26.57
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const              -0.3170      0.302     -1.049      0.304      -0.939       0.305
Log_Origin_Area     1.2119      0.076     15.957      0.000       1.056       1.368
==============================================================================
Omnibus:                        0.619   Durbin-Watson:                   1.327
Prob(Omnibus):                  0.734   Jarque-Bera (JB):                0.691
Skew:                           0.186   Prob(JB):                        0.708
Kurtosis:                       2.311   Cond. No.                         18.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Show code cell source Hide code cell source

# determine results
# determine prediction intervals boundaries
prstd, iv_l, iv_u = wls_prediction_std(re)
# determine slope and intercept
intercept = re.params.iloc[0] # The first parameter is the intercept
slope = re.params.iloc[1]      # The remaining parameter is the slope

# determine y values for linear model
y_pred = slope * x + intercept

# Format the regression line equation
if intercept < 0:
    regression_eq = f'y = {slope:.2f}x - {abs(intercept):.2f}'
else:
    regression_eq = f'y = {slope:.2f}x + {intercept:.2f}'

# Reporting
print(f"\nThe Pearson correlation coefficient was found to be {bold_start}r = {r:.2f}{bold_end}, p = {p:.2e}, \nwith degrees of freedom df = {df} (n = {n}).\n")

print(f"95% Confidence interval for r: ({r_lower:.2f}, {r_upper:.2f})\n")
print(f"The Regression line equation is: {bold_start}{regression_eq}{bold_end}.\n")



# Figure settings Correlation
# Define the custom colors for each muscle
custom_colors = {
    'Supraspinatus': (0.467, 0.710, 0.367),
    'Infraspinatus': (0.7, 0.47, 0.82),
    'Subscapularis': (0.45, 0.56, 0.87),
    'Teres Minor': (0.741, 0.8, 0.384),
    'Teres Major': (0.871, 0.435, 0.427)
}


# Define unique markers for Genus
custom_markers = {
    'Hylobates': 'P',
    'Symphalangus': 'X',
    'Pongo': 'D',
    'Gorilla': '^',
    'Pan': 'v',
    'Homo': 's'
}


# Set figure size (convert cm to inches)
width_cm = 17.7
height_cm = 10  
width_in = width_cm / 2.54
height_in = height_cm / 2.54


# Create the figure and subplots
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(width_in, height_in), sharey=True)

# Scatter Plot all (left subplot)
ax1 = axes[0]

# Get unique muscle names and specimen names
muscle_names = data['Muscle_Name']
specimen_names = data['Genus']

# Plot each data point with the respective color and marker
for muscle, color in custom_colors.items():
    for specimen, marker in custom_markers.items():
        mask = (muscle_names == muscle) & (specimen_names == specimen)
        sns.scatterplot(x='Log_Origin_Area', y='Log_Volume_exp', ax=ax1, data=data[mask],
                        color=color, marker=marker)

ax1.set_title('Origin Area and Volume of all Muscles', fontsize=10)
ax1.set_xlabel('Log Muscle Origin Area (cm$^2$)', fontsize=9)
ax1.set_ylabel('Log Muscle Volume (cm$^3$)', fontsize=9)

ax1.tick_params(axis='both', which='major', labelsize=8)
for label in (ax1.get_xticklabels() + ax1.get_yticklabels()):
    label.set_size(8)

# Add annotation "a"
ax1.annotate('(a)', xy=(-0.1, 1.05), xycoords='axes fraction', fontsize=12)


# Correlation Supra, infra, subscap (right subplot)
ax2 = axes[1]

# Get unique muscle names and specimen names
muscle_names = filtered_data['Muscle_Name']
specimen_names = filtered_data['Genus']

# sort values before using the fill_between function
# Sort the values based on x (Log_Origin_Area)
sorted_indices = x.argsort()
x_sorted = x.iloc[sorted_indices]
iv_l_sorted = iv_l.iloc[sorted_indices]
iv_u_sorted = iv_u.iloc[sorted_indices]

# Plot the prediction intervals as shaded areas
ax2.fill_between(x_sorted, iv_l_sorted, iv_u_sorted, alpha=0.1, color='red', linewidth = 0) # 95% Prediction Interval

# Determine line for isometric scaling
Y = filtered_data['Log_Origin_Area'] + slope*min(filtered_data['Log_Origin_Area']) + intercept -min(filtered_data['Log_Origin_Area'])
ax2.plot(filtered_data['Log_Origin_Area'], Y, color='grey')

# Plot each data point with the respective color and marker
for muscle, color in custom_colors.items():
    for specimen, marker in custom_markers.items():
        mask = (muscle_names == muscle) & (specimen_names == specimen)
        sns.scatterplot(x='Log_Origin_Area', y='Log_Volume_exp', ax=ax2, data=filtered_data[mask],
                        color=color, marker=marker)

# Plot regression line
sns.lineplot(x=x, y=y_pred, color='red', ax=ax2)


# Apply custom fonts to the plot
ax2.set_title('Supra- and Infraspinatus and Subscapularis',fontsize=10)
ax2.set_xlabel('Log Muscle Origin Area (cm$^2$)', fontsize=9)
ax2.set_ylabel('Log Muscle Volume (cm$^3$)', fontsize=9)

ax2.tick_params(axis='both', which='major', labelsize=8)
for label in (ax2.get_xticklabels() + ax2.get_yticklabels()):
    label.set_size(8)
    
# Add annotation text
ax2.annotate(
    f'Regression line: {regression_eq}\nPearson r: {r:.2f}\n95% CI: ({r_lower:.2f}, {r_upper:.2f})\np-value: {p_tex}', 
    xy=(0.05, 0.95), xycoords='axes fraction', fontsize=9, 
    horizontalalignment='left', verticalalignment='top'
)

# Add annotation "b"
ax2.annotate('(b)', xy=(-0.1, 1.05), xycoords='axes fraction', fontsize=12)


# Create custom legend handles
handles_muscles = [plt.Line2D([0], [0], marker='o', color='w', label=muscle,
                              markerfacecolor=color, markersize=10) 
                   for muscle, color in custom_colors.items()]

# Create handles with italicized labels
handles_specimen = [
    plt.Line2D(
        [0], [0],
        marker=marker,
        color='w',
        label=f'${specimen}$',  # Italicize the label here
        markerfacecolor='gray',
        markersize=10
    )
    for specimen, marker in custom_markers.items()
]

# Add legends above and below the plots
fig.legend(handles=handles_specimen, title='Genus', loc='upper center', ncol=6)
fig.legend(handles=handles_muscles, title='Muscles', loc='lower center', ncol=6)

# Adjust layout to make room for legends
plt.tight_layout(rect=[0, 0.1, 1, 0.9])  # Leave space for legends


plt.show()

The Pearson correlation coefficient was found to be r = 0.95, p = 1.29e-14, 
with degrees of freedom df = 25 (n = 27).

95% Confidence interval for r: (0.90, 0.98)

The Regression line equation is: y = 1.21x - 0.32.

../_images/d3ef6c0cd35ae392bf25b5fe5e05ce7520bf42b7d9ac39467c0903469270155f.png — Fig. 2.1 Manuscript Figure 4: Analysis of the correlation between muscle volume and origin area. (a) Scatter plot illustrating muscle volumes versus muscle origin areas for all muscles. (b) Pearson correlation of log-transformed data for the muscle origin areas and volumes of the supraspinatus, infraspinatus, and subscapularis muscles. The red line shows the best-fit linear regression between muscle origin area and volume. The red shaded area around the red line indicates the prediction intervals. The grey line represents the prediction assuming isometric scaling with a slope of one.#