Effective Stiffness of Fiber Composite


This example demonstrates the use of an homogenization model to predict the macroscopic stress in fiber-like structures. The structures emulate fiber-reinforced polymer samples. For a summary of homogenization theory and its use with effective stiffness properties please see the Effective Siffness of a Composite Material example. The example generates a series of random microstructures with various fiber lengths and volume fractions. The structures are used to calibrate and test the homegnization model against the simulated effective stress values.

import warnings
import os

os.environ["OMP_NUM_THREADS"] = "1"

import dask.array as da
import numpy as np
import pandas
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from dask_ml.decomposition import IncrementalPCA
from dask_ml.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from dask_ml.model_selection import train_test_split
from dask_ml.model_selection import GridSearchCV
from mpl_toolkits.mplot3d import Axes3D

from pymks import (

%matplotlib inline
%load_ext autoreload
%autoreload 2

Data Generation

The following generates four different types of microstructures each with 200 samples with spatial dimensions of 31 x 31. The grains have a high aspect ratio to simulate fiber-like structures.


tmp = [
        shape=(100, 31, 31),
    for (grain_size, volume_fraction) in zip(
        [(30, 2), (7, 2), (2, 30), (2, 7)],
        [(0.7, 0.3), (0.6, 0.4), (0.3, 0.7), (0.4, 0.6)]

x_data = da.concatenate(tmp).persist()
y_stress = solve_fe(x_data,
                    elastic_modulus=(1.3, 2.5),
                    poissons_ratio=(0.42, 0.35),
                    macro_strain=0.001)['stress'][..., 0].persist()

y_data = da.average(y_stress.reshape(y_stress.shape[0], -1), axis=1).persist()

The arrays have been computed by calling the persist method.


Array Chunk
Bytes 3.08 MB 384.40 kB
Shape (400, 31, 31) (50, 31, 31)
Count 8 Tasks 8 Chunks
Type int64 numpy.ndarray
31 31 400

Array Chunk
Bytes 3.20 kB 400 B
Shape (400,) (50,)
Count 8 Tasks 8 Chunks
Type float64 numpy.ndarray
400 1

View Microstructures

plot_microstructures(*x_data[::100], cmap='gray', colorbar=False)
(400, 31, 31)

array([0.00307852, 0.00316098, 0.00324742, 0.0031705 ])

The Model

The following creates an homogenization pipeline and cross-validates the model. For more details about the two-points statistics view the Checkerboard Microstructure Example.

Note (issue or bug)

There are currently two issues with the pipeline.

  • The svd_solver='full' argument is required in the pipeline as the results are not correct without it. This might be an issue with Dask’s PCA algorithms and needs further investigation.
  • Dask’s LinearRegression module does not seem to solve this problem correctly and gives very different results to Sklearn’s. Needs further investigation. Using Sklearn’s instead.
pca_steps = [
    ("reshape", GenericTransformer(
        lambda x: x.reshape(x.shape[0], x_data.shape[1], x_data.shape[2])
    ("discritize",PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0)),
    ("correlations",TwoPointCorrelation(periodic_boundary=True, cutoff=31, correlations=[(0, 1), (1, 1)])),
    ('flatten', GenericTransformer(lambda x: x.reshape(x.shape[0], -1))),
    ('pca', IncrementalPCA(n_components=3, svd_solver='full'))

pca_pipeline = Pipeline(steps=pca_steps)

pipeline = Pipeline(steps=pca_steps + [
    ('poly', PolynomialFeatures()),
    ('regressor', LinearRegression())

Training and testing data sets

Split the data into testing and training segments for computations in the rest of the notebook

x_train, x_test, y_train, y_test = train_test_split(
    x_data.reshape(x_data.shape[0], -1),

Visualize the task graph

The pipeline results in an entirely lazy calulation. The task graph can be viewed to check that it remains parallel and doesn’t gather all the data. The calculation has not been evaluated yet, since the graph is generated with only a small part of the data.

Note (issue or bug)

Note that for some reason the y_train passed into fit is required to be a Numpy array and doens’t work with a correctly chunked Dask array, see the github issue.


pca_pipeline.fit(x_train, np.array(y_train)).transform(x_train).visualize()

Cross Validation

The following creates an homogenization pipeline and cross-validates the model. For more details about the two-points statistics view the Checkerboard Microstructure Example. Use GridSearchCV to optimize the n_components and degree hyper-parameters for the model. Search over the range of 1st order to 3rd order degree polynomials and 2 to 7 principal components.

params_to_tune = {'pca__n_components': np.arange(1, 7),'poly__degree': np.arange(1, 4)}

grid_search = GridSearchCV(pipeline, param_grid=params_to_tune).fit(x_train, y_train)

Display the best model from the grid search and check that it is accurate.


print('Order of Polynomial',grid_search.best_params_.get('poly__degree'))
print('Number of Components', grid_search.best_params_.get('pca__n_components'))
print('R-squared Value', np.allclose(grid_search.score(x_test, y_test), 1.0, rtol=2e-2))
Order of Polynomial 2
Number of Components 6
R-squared Value True
assert (1 <= grid_search.best_params_.get('poly__degree') <= 3), grid_search.best_params_.get('poly__degree')
assert (3 <= grid_search.best_params_.get('pca__n_components') <= 6), grid_search.best_params_.get('pca__n_components')
assert np.allclose(grid_search.score(x_test, y_test), 1.0, rtol=2e-2), grid_search.score(x_test, y_test)
def plot_line(x, mean, std_dev, color, label, linestyle):
    plt.fill_between(x, mean - std_dev, mean + std_dev, alpha=0.1, color=color)
    plt.plot(x, mean, 'o-', color=color, label=label, linewidth=2, linestyle=linestyle)

def plot(dfs):
        df_.mean_, df_.std_,
     for df_ in dfs]
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize=15)
    plt.ticklabel_format(style='sci', axis='y')
    plt.ylabel('R-Squared', fontsize=20)
    plt.xlabel('Number of Components', fontsize=20)
    plt.ylim(0.0, 1.05)

def add_labels(df):
    return df.assign(
        label=df.degree.map(lambda n: {1: '1st Order', 2: '2nd Order', 3: '3rd Order'}[n]),
        color=df.degree.map(lambda n: {1: '#1a9850', 2: '#f46d43', 3: '#1f78b4'}[n]),
        linestyle=df.degree.map(lambda n: {1: 'solid', 2: 'dotted', 3: 'dashed'}[n])

def rename(df):
    return df.rename(

def groupby(df, col):
    gb = df.groupby(col)
    return [gb.get_group(x) for x in gb.groups]

df = groupby(


The best model has degree equal to 3 and n_components equal to 5.

model = grid_search.best_estimator_

Structures in PCA space

Plot the samples in 3D principle component space using a pipeline that omits both the polynomial features and the linear regression. This pipeline is a transform only pipeline, not a predicting pipeline.


Apply the pca_pipeline to the training dataset and testing dataset to obtain their principle component values.

x_trans_train = pca_pipeline.fit(x_train).transform(x_train).compute()
x_trans_test = pca_pipeline.transform(x_test).compute()

Visualize the principle component information in the 3D plot.

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('Component 1', fontsize=12)
ax.set_ylabel('Component 2', fontsize=12)
ax.set_zlabel('Component 3', fontsize=12)
ax.plot(*x_trans_train.T, 'o', label='Training Data', color='#1a9850')
ax.plot(*x_trans_test.T, 'o', label='Testing Data', color='#f46d43')
plt.title('Low Dimensional Representation', fontsize=15)
plt.legend(loc=1, borderaxespad=0., fontsize=15)
plt.legend(bbox_to_anchor=(1.05, 1.0), loc=2, borderaxespad=0., fontsize=15)

There is agreement between the testing and the training data. There is also some separation between fiber types in PC space.

Parity Plot

The calculated average stress is plotted against the stress generate by the model.

y_train_predict = model.predict(x_train)
y_predict = model.predict(x_test)
fit_data = np.array([y_train.compute(), y_train_predict])
pred_data = np.array([y_test.compute(), y_predict])

y_total = np.concatenate((fit_data, pred_data), axis=-1)
line = np.min(y_total), np.max(y_total)

plt.plot(fit_data[0], fit_data[1], 'o', color='#1a9850', label='Training Data')
plt.plot(pred_data[0], pred_data[1], 'o', color='#f46d43', label='Testing Data')
plt.plot(line, line, '-', linewidth=3, color='#000000')
plt.title('Goodness of Fit', fontsize=20)
plt.xlabel('Actual', fontsize=18)
plt.ylabel('Predicted', fontsize=18)
plt.legend(loc=2, fontsize=15)

There is a good correlation between the FE results and those predicted by the linkage.