API

Table

coeff_to_real – Convert the coefficients to real space
correlations_multiple(data, correlations[, …]) – Calculate 2-point stats for multiple auto/cross correlations
generate_checkerboard(size[, square_shape]) – Generate a 2-phase checkerboard microstructure
generate_delta – Generate a delta microstructure
generate_multiphase – Construct microstructures for an arbitrary number of phases given the domain size and relative grain size
graph_descriptors([data, delta_x, …]) – Compute graph descriptors for multiple samples
paircorr_from_twopoint(x_data[, cutoff_r, …]) – Compute the pair correlations from 2-point statistics
plot_microstructures(*arrs[, titles, cmap, …]) – Plot a set of microstructures side-by-side
solve_cahn_hilliard(x_data[, n_steps, …]) – Solve the Cahn-Hilliard equation
solve_fe – Solve the elasticity problem
test(*args) – Run all the module tests
two_point_stats – Calculate the 2-point stats for two arrays
FlattenTransformer – Reshape data ready for a PCA
GenericTransformer(func) – Make a generic transformer based on a function
GraphDescriptors([delta_x, periodic_boundary]) – Calculate graph descriptors as part of a Sklearn pipeline
LegendreTransformer([n_state, min_, max_, …]) – Legendre transformer for Sklearn pipelines
LocalizationRegressor([redundancy_func]) – Perform the localization in Sklearn pipelines
PrimitiveTransformer([n_state, min_, max_, …]) – Primitive transformer for Sklearn pipelines
ReshapeTransformer(shape) – Reshape data ready for the LocalizationRegressor
TwoPointCorrelation([correlations, …]) – Calculate the 2-point stats for two arrays as part of a Scikit-learn pipeline

Functions

pymks.coeff_to_real()

Convert the coefficients to real space

Convert the pymks.LocalizationRegressor coefficients to real space. The coefficients are calculated in Fourier space, but are best viewed in real space. If the Fourier coefficients are defined as \(\beta\left[l, k\right]\) then the real space coefficients are calculated using,

\[\alpha \left[l, r\right] = \frac{1}{N} \sum_{k=0}^{N-1} \beta\left[l, k\right] e^{i \frac{2 \pi}{N} k r} e^{i \pi k}\]

where \(l\) is the local state and \(r\) is the spatial index from \(0\) to \(N-1\). The \(e^{i \pi k}\) term shifts the coefficients so that the zeroth coefficient sits at the center of the domain for viewing purposes.

Parameters:
  • coeff (array) – the localization coefficients in Fourier space as a Dask array (n_x, n_y, n_state)
  • new_shape (tuple) – shape of the output to either shorten or pad with zeros
Returns:

the coefficients in real space

A spike at \(k=1\) should result in a cosine function on the real axis.

>>> import numpy as np
>>> import dask.array as da
>>> N = 100
>>> fcoeff = np.zeros((N, 1))
>>> fcoeff[1] = N
>>> x = np.linspace(0, 1, N + 1)[:-1]
>>> assert np.allclose(
...     coeff_to_real(da.from_array(fcoeff)).real.compute(),
...     np.cos(2 * np.pi * x + np.pi)[:, None]
... )
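Because the shift multiplies the \(k\)-th coefficient by \((-1)^k\), the transform is equivalent to an inverse FFT followed by an fftshift along the spatial axis (for even \(N\)). A small consistency check of that identity, not a call into the library:

>>> beta = np.random.random((8, 2))
>>> phase = np.exp(1j * np.pi * np.arange(8))[:, None]  # the (-1)**k shift factor
>>> assert np.allclose(
...     np.fft.ifft(beta * phase, axis=0),
...     np.fft.fftshift(np.fft.ifft(beta, axis=0), axes=0)
... )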
pymks.correlations_multiple(data, correlations, periodic_boundary=True, cutoff=None)

Calculate 2-point stats for multiple auto/cross correlations

The discretized two point statistics are given by

\[f[r \; \vert \; l, l'] = \frac{1}{S} \sum_s m[s, l] m[s + r, l']\]

where \(f[r \; \vert \; l, l']\) is the conditional probability of finding the local states \(l\) and \(l'\) at a distance and orientation away from each other defined by the vector \(r\). See this paper for more details on the notation.

The correlations are calculated based on the pairs given in correlations for each sample.

To calculate a single correlation for two arrays, see two_point_stats().

To use correlations_multiple as part of a Scikit-learn pipeline, see TwoPointCorrelation.

Parameters:
  • data – the discretized data with shape (n_samples, n_x, n_y, n_state)
  • correlations – the correlation pairs, [[i0, j0], [i1, j1], ...]
  • periodic_boundary – whether to assume a periodic boundary (default is true)
  • cutoff – the subarray of the 2 point stats to keep
Returns:

the 2-point stats array

If data is a Numpy array then correlations_multiple will return a Numpy array.

>>> import numpy as np
>>> import dask.array as da
>>> data = np.arange(18).reshape(1, 3, 3, 2)
>>> out_np = correlations_multiple(data, [[0, 1], [1, 1]])
>>> out_np.shape
(1, 3, 3, 2)
>>> answer = np.array([[[58, 62, 58], [94, 98, 94], [58, 62, 58]]]) + 2. / 3.
>>> assert np.allclose(out_np[..., 0], answer)

However, if data is a Dask array then a Dask array is returned.

>>> data = da.from_array(data, chunks=(1, 3, 3, 2))
>>> out = correlations_multiple(data, [[0, 1], [1, 1]])
>>> out.shape
(1, 3, 3, 2)
>>> out.chunks
((1,), (3,), (3,), (2,))
>>> assert np.allclose(out[..., 0], answer)
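For a periodic domain the sum above is a circular correlation and can be evaluated with FFTs. A minimal sketch for a single 1D auto-correlation (illustrative only, not the library's internals):

>>> m = np.array([1., 0., 0., 1., 1.])  # a 1D microstructure for a single state
>>> fm = np.fft.fft(m)
>>> f = np.fft.ifft(np.conj(fm) * fm).real / len(m)
>>> assert np.allclose(f[0], 0.6)  # the volume fraction of the state
>>> assert np.allclose(f[1], 0.4)  # probability of the state at both s and s + 1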
pymks.generate_checkerboard(size, square_shape=(1, ))

Generate a 2-phase checkerboard microstructure

Parameters:
  • size (tuple) – the size of the domain (n_x, n_y)
  • square_shape (tuple) – the shape of each subdomain (n_x, n_y)
Returns:

a microstructure of shape (1,) + size (an extra sample axis is prepended)

>>> print(generate_checkerboard((4,)).compute())
[[0 1 0 1]]
>>> print(generate_checkerboard((3, 3)).compute())
[[[0 1 0]
  [1 0 1]
  [0 1 0]]]
>>> print(generate_checkerboard((3, 3), (2,)).compute())
[[[0 0 1]
  [0 0 1]
  [1 1 0]]]
>>> print(generate_checkerboard((5, 8), (2, 3)).compute())
[[[0 0 0 1 1 1 0 0]
  [0 0 0 1 1 1 0 0]
  [1 1 1 0 0 0 1 1]
  [1 1 1 0 0 0 1 1]
  [0 0 0 1 1 1 0 0]]]
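The pattern is consistent with labelling each pixel by \((\lfloor i / s_x \rfloor + \lfloor j / s_y \rfloor) \bmod 2\) for a square_shape of (s_x, s_y). A consistency check against the last output (an illustration, not the generating code):

>>> import numpy as np
>>> i, j = np.indices((5, 8))
>>> assert np.allclose(
...     generate_checkerboard((5, 8), (2, 3)).compute()[0],
...     (i // 2 + j // 3) % 2
... )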
pymks.generate_delta()

Generate a delta microstructure

A delta microstructure has a 1 at the center and 0 everywhere else for each phase. This is used to calibrate linear elasticity models that only require delta microstructures for calibration.

Parameters:
  • n_phases (int) – number of phases
  • shape (tuple) – the shape of the microstructure, (n_x, n_y)
  • chunks (tuple) – how to chunk the sample axis (n_chunk,)
Returns:

a dask array of delta microstructures

If n_phases=5 for example, this requires 20 microstructures as each phase pairing requires 2 microstructure arrays.

>>> arr = generate_delta(5, (3, 4), chunks=(5,))
>>> arr.shape
(20, 3, 4)
>>> arr.chunks
((5, 5, 5, 5), (3,), (4,))
>>> print(arr[0].compute())
[[0 0 0 0]
 [0 0 1 0]
 [0 0 0 0]]

generate_delta requires at least 2 phases

>>> arr = generate_delta(2, (3, 3))
>>> arr.shape
(2, 3, 3)
>>> print(arr[0].compute())
[[0 0 0]
 [0 1 0]
 [0 0 0]]
pymks.generate_multiphase()

Construct microstructures for an arbitrary number of phases, given the size of the domain and the relative grain size.

Parameters:
  • shape (tuple) – shape of the domain (n_sample, n_x, n_y)
  • grain_size (tuple) – typical expected grain size (n_x, n_y)
  • volume_fraction (tuple) – the volume fraction of each phase, which must sum to 1
  • chunks (int) – the chunk size along the sample axis
  • percent_variance (float) – the percent variance allowed for each value of volume_fraction
  • seed (int) – set the seed value, default is no seed
Returns:

An array of random multiphase microstructures with the shape given by shape.

Example:

>>> import numpy as np
>>> x_expected = np.array([[[0, 0, 0],
...                         [0, 1, 0],
...                         [1, 1, 1]]])
>>> x_actual = generate_multiphase(
...     shape=(1, 3, 3),
...     grain_size=(1, 1),
...     volume_fraction=(0.5, 0.5),
...     seed=10
... )
>>> print(x_actual.shape)
(1, 3, 3)
>>> assert np.allclose(x_actual, x_expected)

If chunks is not set a Numpy array is returned.

>>> type(x_actual)
<class 'numpy.ndarray'>

If chunks is defined a Dask array is returned.

>>> x = generate_multiphase(
...     shape=(2, 3, 3),
...     grain_size=(1, 1),
...     volume_fraction=(0.5, 0.5),
...     chunks=1
... )
>>> print(x.chunks)
((1, 1), (3,), (3,))
pymks.graph_descriptors(data, delta_x=1.0, periodic_boundary=True)

Compute graph descriptors for multiple samples

Parameters:
  • data – array of phases (n_samples, n_x, n_y), values must be 0 or 1
  • delta_x – pixel size
  • periodic_boundary – whether the boundaries are periodic
Returns:

A Pandas data frame with samples along rows and descriptors along columns

Compute graph descriptors for multiple samples using the GraSPI sub-package. See the installation instructions to install PyMKS with GraSPI enabled.

GraSPI is focused on characterizing photovoltaic devices and so the descriptors must be understood in this context. Future releases will have more generic descriptors. See Wodo et al. for more details. Note that the current implementation only works for two phase data.

This function returns a Pandas DataFrame with the descriptors as columns and the samples as rows. In the context of a photovoltaic device, the top of the domain (in the y-direction) represents an anode and the bottom represents a cathode. Phase 0 represents the donor material while phase 1 represents the acceptor material. Many of these descriptors characterize the morphology in terms of electron-hole pair generation and transport leading to charge extraction from the device.

To use graph_descriptors as part of a Sklearn pipeline, see GraphDescriptors.

The column descriptors are as follows.

Column Name – Description
n_vertices – The number of vertices in the constructed graph; equal to the number of pixels.
n_edges – The number of edges in the constructed graph.
n_phase{i} – The number of vertices for phase {i}.
n_phase{i}_connect – The number of connected components for phase {i}.
n_phase{i}_connect_top – The number of phase {i} connected components connected to the top of the domain (y-direction).
n_phase{i}_connect_bottom – The number of phase {i} connected components connected to the bottom of the domain (y-direction).
w_frac_phase{i} – Weighted fraction of phase {i} vertices.
frac_phase{i} – Fraction of phase {i} vertices.
w_frac_phase{i}_{j}_dist – Weighted fraction of phase {i} vertices within {j} nodes of an interface.
frac_phase{i}_{j}_dist – Fraction of phase {i} vertices within {j} nodes of an interface.
frac_useful – Fraction of useful vertices connected to the top or bottom of the domain.
inter_frac_bottom_top – Fraction of interface with complementary paths to the bottom or top of the domain.
frac_phase{i}_top – Fraction of phase {i} interface vertices with a path to the top.
frac_phase{i}_bottom – Fraction of phase {i} interface vertices with a path to the bottom.
n_inter_paths – Number of interface edges with complementary paths.
n_phase{i}_inter_top – Number of phase {i} interface vertices with a path to the top.
n_phase{i}_inter_bottom – Number of phase {i} interface vertices with a path to the bottom.
frac_phase{i}_rising – Fraction of phase {i} with rising paths.

Example with 3 samples, each a (3, 3) array

Read in the expected data.

>>> from io import StringIO
>>> import numpy as np
>>> import pandas
>>> expected = pandas.read_csv(StringIO('''
... n_vertices,n_edges,n_phase0,n_phase1,n_phase0_connect,n_phase1_connect,n_phase0_connect_top,n_phase1_connect_bottom,w_frac_phase0,frac_phase0,w_frac_phase0_10_dist,fraction_phase0_10_dist,inter_frac_bottom_and_top,frac_phase0_top,frac_phase1_bottom,n_inter_paths,n_phase0_inter_top,n_phase1_inter_bottom,frac_phase0_rising,frac_phase1_rising,n_phase0_connect_anode,n_phase1_connect_cathode
... 9,7,3,6,2,1,1,1,0.3256601095199585,0.3333333432674408,0.9624541997909546,1.0,0.4285714328289032,0.3333333432674408,1.0,3,1,6,1.0,0.6666666865348816,2,2
... 9,6,3,6,1,1,1,1,0.3267437815666199,0.3333333432674408,0.9624541997909546,1.0,1.0,1.0,1.0,6,3,6,1.0,1.0,2,3
... 9,6,6,3,2,1,1,0,0.6534984707832336,0.6666666865348816,0.9624541997909546,1.0,0.0,0.5,0.0,0,3,0,1.0,0.0,4,1
... '''))

Construct the 3 samples each with 3x3 voxels

>>> data = np.array([[[0, 1, 0],
...                   [0, 1, 1],
...                   [1, 1, 1]],
...                  [[1, 1, 1],
...                   [0, 0, 0],
...                   [1, 1, 1]],
...                  [[0, 1, 0],
...                   [0, 1, 0],
...                   [0, 1, 0]]])
>>> actual = graph_descriptors(data)

graph_descriptors returns a data frame.

>>> actual
   n_vertices  n_edges  ...  n_phase0_connect_anode  n_phase1_connect_cathode
0           9        7  ...                       2                         2
1           9        6  ...                       2                         3
2           9        6  ...                       4                         1
<BLANKLINE>
[3 rows x 22 columns]

Check that the actual values are equal to the expected values.

>>> assert np.allclose(actual, expected)

graph_descriptors works with Dask arrays as well; when using Dask, a Dask DataFrame is returned.

>>> import dask.array as da
>>> out = graph_descriptors(da.from_array(data, chunks=(2, 3, 3)))
>>> out.get_partition(0).compute()
   n_vertices  n_edges  ...  n_phase0_connect_anode  n_phase1_connect_cathode
0           9        7  ...                       2                         2
1           9        6  ...                       2                         3
<BLANKLINE>
[2 rows x 22 columns]

On examining the data for this simple test case there are a few obvious checks. Each sample has 9 vertices since there are 9 pixels in each sample.

>>> actual.n_vertices
0    9
1    9
2    9
Name: n_vertices, dtype: int64

Notice that each sample has a single connected phase 1 region (for the second sample the two phase 1 slabs join across the periodic boundary), so n_phase1_connect is 1 for all three samples.

>>> actual.n_phase1_connect
0    1
1    1
2    1
Name: n_phase1_connect, dtype: int64

The first sample has 3 interface edges with complementary paths to the top and bottom, and the second sample has 6. In the third sample all such paths are blocked.

>>> actual.n_inter_paths
0    3
1    6
2    0
Name: n_inter_paths, dtype: int64
pymks.paircorr_from_twopoint(x_data, cutoff_r=None, interpolate_n=None)

Computes the pair correlations from 2-point statistics.

The pair correlations are the radial average of the 2-point stats. The grid spacing is assumed to be one unit. Linear interpolation is used if interpolate_n is specified. If another interpolation method is desired, leave this parameter unset and perform the desired interpolation on the output.

The discretized two point statistics are given by

\[f[r \; \vert \; l, l'] = \frac{1}{S} \sum_s m[s, l] m[s + r, l']\]

where \(f[r \; \vert \; l, l']\) is the conditional probability of finding the local states \(l\) and \(l'\) at a distance and orientation away from each other defined by the vector \(r\). See this paper for more details on the notation.

The pair correlation is defined as the conditional probability for the case of the magnitude vector, \(||r||_2\), defined by \(g[d]\). \(g\) is related to \(f\) via the following transformation. Consider the set, \(I[d] := \{ f[r] \; \vert \; ||r||_2 = d \}\) then

\[g[d] = \frac{1}{ | I[ d ] | } \sum_{f \in I[ d ]} f\]

The \(d\) are radii from the center pixel of the domain. They are calculated automatically if interpolate_n is None.

It’s assumed that x_data is a valid set of two point statistics calculated from the PyMKS correlations module.

Parameters:
  • x_data – array of centered 2-point statistics. (n_samples, n_x, n_y, …)
  • cutoff_r – the radius cut off. Values less than 1 are assumed to be a proportion while values greater than 1 are an exact radius cutoff
  • interpolate_n – the number of equally spaced radii that the probabilities will be interpolated to
Returns:

A tuple of the pair correlation array and the radii used for averaging or interpolation. The pair correlations have shape (n_samples, n_radii), while the radii have shape (n_radii,). n_radii equals interpolate_n when interpolate_n is specified. The probabilities are chunked on the sample axis in the same way as x_data. The radii are returned as a Numpy array.

Test with only 2 samples of 3x3

>>> import numpy as np
>>> import dask.array as da
>>> x_data = np.array([
...     [
...         [0.2, 0.4, 0.3],
...         [0.4, 0.5, 0.5],
...         [0.2, 0.5, 0.3]
...     ],
...     [
...         [0.1, 0.2, 0.3],
...         [0.2, 0.6, 0.4],
...         [0.1, 0.4, 0.3]
...     ]
... ])

Most basic test

>>> probs, radii = paircorr_from_twopoint(x_data)
>>> assert np.allclose(probs,
...     [[0.5, 0.45, 0.25],
...      [0.6, 0.3, 0.2]])
>>> assert np.allclose(radii, [0, 1, np.sqrt(2)])
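The radial averaging above can be reproduced directly by binning the pixels of the first sample by their distance from the center pixel (a sketch of the averaging step only, not the library's implementation):

>>> stats = x_data[0]
>>> ix = np.indices(stats.shape) - np.array(stats.shape)[:, None, None] // 2
>>> dist = np.sqrt((ix ** 2).sum(axis=0))
>>> radii_ = np.unique(dist)
>>> g = [stats[np.isclose(dist, d)].mean() for d in radii_]
>>> assert np.allclose(g, [0.5, 0.45, 0.25])
>>> assert np.allclose(radii_, [0, 1, np.sqrt(2)])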

Test with cutoff_r greater than 1

>>> probs, radii = paircorr_from_twopoint(x_data, cutoff_r=1.01)
>>> assert np.allclose(probs,
...     [[0.5, 0.45],
...      [0.6, 0.3]])
>>> assert np.allclose(radii, [0, 1])

Test with cutoff_r less than 1

>>> probs, radii = paircorr_from_twopoint(x_data, cutoff_r=0.99)
>>> assert np.allclose(probs,
...     [[0.5, 0.45],
...      [0.6, 0.3]])
>>> assert np.allclose(radii, [0, 1])

Test with a linear interpolation

>>> probs, radii = paircorr_from_twopoint(x_data, interpolate_n=2)
>>> assert np.allclose(probs,
...     [[0.5, 0.25],
...      [0.6, 0.2]])
>>> assert np.allclose(radii, [0, np.sqrt(2)])

Test with Dask. The chunks along the sample axis are preserved.

>>> arr = da.from_array(np.random.random((10, 4, 3, 3)), chunks=(2, 4, 3, 3))
>>> probs, radii = paircorr_from_twopoint(arr)
>>> probs.shape
(10, 7)
>>> probs.chunks
((2, 2, 2, 2, 2), (7,))
>>> assert np.allclose(radii, np.sqrt([0, 1, 2, 3, 4, 5, 6]))
pymks.plot_microstructures(*arrs, titles=(), cmap=None, colorbar=True, showticks=False, figsize_weight=4)

Plot a set of microstructures side-by-side

Parameters:
  • arrs – any number of 2D arrays to plot
  • titles – a sequence of titles, one per array
  • cmap – any matplotlib colormap
  • colorbar – whether to display a colorbar
  • showticks – whether to display axis ticks
  • figsize_weight – scale factor for the figure size
>>> import numpy as np
>>> np.random.seed(1)
>>> x_data = np.random.random((2, 10, 10))
>>> fig = plot_microstructures(
...     x_data[0],
...     x_data[1],
...     titles=['array 0', 'array 1'],
...     cmap='twilight'
... )
>>> fig.show()  #doctest: +SKIP
[image: plot_microstructures.png]
pymks.solve_cahn_hilliard(x_data, n_steps=1, delta_x=0.25, delta_t=0.001, gamma=1.0)

Solve the Cahn-Hilliard equation.

Solve the Cahn-Hilliard equation for multiple samples in arbitrary dimensions. The concentration varies from -1 to 1. The equation is given by

\[\dot{\phi} = \nabla^2 \left( \phi^3 - \phi \right) - \gamma \nabla^4 \phi\]

The discretization scheme used here is from Chang and Rutenberg. It is semi-implicit in time and is given by

\[\phi_{t+\Delta t} + \left(1 - a_1\right) \Delta t \nabla^2 \phi_{t+\Delta t} + \left(1 - a_2\right) \Delta t \gamma \nabla^4 \phi_{t+\Delta t} = \phi_t - \Delta t \nabla^2 \left(a_1 \phi_t + a_2 \gamma \nabla^2 \phi_t - \phi_t^3 \right)\]

where \(a_1=3\) and \(a_2=0\).

Parameters:
  • x_data – dask array chunked along the sample axis (n_sample, n_x, n_y)
  • n_steps – number of time steps used
  • delta_x – the grid spacing, \(\Delta x\)
  • delta_t – the time step size, \(\Delta t\)
  • gamma – Cahn-Hilliard parameter, \(\gamma\)
>>> import dask.array as da
>>> da.random.seed(99)
>>> x_data = 2 * da.random.random((1, 100, 100), chunks=(1, 100, 100)) - 1
>>> y_data = solve_cahn_hilliard(x_data)
>>> y_data.chunks
((1,), (100,), (100,))
>>> y_data = solve_cahn_hilliard(x_data, n_steps=10000)  #doctest: +SKIP
>>> from pymks import plot_microstructures
>>> fig = plot_microstructures(x_data[0], y_data[0])
>>> fig.show()  #doctest: +SKIP
[image: cahn-hilliard.png]
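To make the time stepping concrete, here is a minimal sketch of the semi-implicit update above for a 1D periodic domain with \(a_2=0\) (an illustration of the scheme, not PyMKS's actual solver, which works on chunked multi-sample arrays):

>>> import numpy as np
>>> def ch_step(phi, delta_x=0.25, delta_t=0.001, gamma=1.0, a1=3.0):
...     """One semi-implicit step of the scheme above with a2 = 0."""
...     ksq = (2 * np.pi * np.fft.fftfreq(len(phi), d=delta_x)) ** 2
...     fphi = np.fft.fft(phi)
...     rhs = fphi + delta_t * ksq * (a1 * fphi - np.fft.fft(phi ** 3))
...     lhs = 1 - (1 - a1) * delta_t * ksq + delta_t * gamma * ksq ** 2
...     return np.fft.ifft(rhs / lhs).real
>>> phi = 2 * np.random.random(100) - 1
>>> ch_step(phi).shape
(100,)
>>> assert np.isclose(ch_step(phi).mean(), phi.mean())  # the k = 0 mode is untouched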
pymks.solve_fe()

Solve the elasticity problem

Use Sfepy to solve a linear strain problem in 2D with a varying microstructure on a rectangular grid. The rectangle (cube in 3D) is held fixed at the negative x edge (plane) and displaced by 1 on the positive x edge (plane). Periodic boundary conditions are applied to the other boundaries.

The boundary conditions on the rectangle (or cube) are given by

\[u(L, y) = L \left(1 + \bar{\varepsilon}_{xx}\right)\]
\[u(0, L) = u(0, 0) = 0\]
\[u(x, 0) = u(x, L)\]

where \(\bar{\varepsilon}_{xx}\) is the macro_strain, \(u\) is the displacement in the \(x\) direction, and \(L\) is the length of the domain. More details about these boundary conditions can be found in Landi et al.

See the elasticity notebook for a full set of equations.

x_data should contain integer values that represent the phase of the material at each pixel. The integer values index into the elastic_modulus and poissons_ratio sequences, so elastic_modulus and poissons_ratio must have the same length.

Parameters:
  • x_data – microstructures with shape, (n_samples, n_x, ...)
  • elastic_modulus – the elastic modulus in each phase, (e0, e1, ...)
  • poissons_ratio – the poissons ratio for each phase, (p0, p1, ...)
  • macro_strain – the macro strain, \(\bar{\varepsilon}_{xx}\)
  • delta_x – the grid spacing
Returns:

a dictionary of strain, displacement and stress with stress and strain of shape (n_samples, n_x, ..., 3) and displacement shape of (n_samples, n_x + 1, ..., 2)

>>> import numpy as np
>>> x_data = np.zeros((1, 11, 11), dtype=int)
>>> x_data[0, :, 1] = 1

x_data has values of 0 and 1, so elastic_modulus and poissons_ratio must each have 2 entries, one for phase 0 and one for phase 1.

>>> strain = solve_fe(
...     x_data,
...     elastic_modulus=(1.0, 10.0),
...     poissons_ratio=(0., 0.),
...     macro_strain=1.,
...     delta_x=1.
... )['strain']
>>> from pymks import plot_microstructures
>>> fig = plot_microstructures(strain[0, ..., 0], titles=r'$\varepsilon_{xx}$')
>>> fig.show()  #doctest: +SKIP
[image: strain.png]
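Per the Returns description above, the strain field for this 2D example should have shape (n_samples, n_x, n_y, 3):

>>> strain.shape
(1, 11, 11, 3)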
pymks.test(*args)

Run all the module tests.

Equivalent to running py.test pymks in the base of PyMKS. Allows an installed version of PyMKS to be tested.

Parameters:*args – add arguments to pytest

To test an installed version of PyMKS use

$ python -c "import pymks; pymks.test()"
pymks.two_point_stats()

Calculate the 2-point stats for two arrays

The discretized two point statistics are given by

\[f[r \; \vert \; l, l'] = \frac{1}{S} \sum_s m[s, l] m[s + r, l']\]

where \(f[r \; \vert \; l, l']\) is the conditional probability of finding the local states \(l\) and \(l'\) at a distance and orientation away from each other defined by the vector \(r\). See this paper for more details on the notation.

The array arr1[i] (state \(l\)) is correlated with arr2[i] (state \(l'\)) for each sample i. Both arrays must have the same number of samples, and may contain either nominal states (integer values) or continuous variables.

To calculate multiple different correlations for each sample, see correlations_multiple().

To use two_point_stats as part of a Scikit-learn pipeline, see TwoPointCorrelation.

Parameters:
  • arr1 – array used to calculate cross-correlations, shape (n_samples,n_x,n_y)
  • arr2 – array used to calculate cross-correlations, shape (n_samples,n_x,n_y)
  • periodic_boundary – whether to assume a periodic boundary (default is True)
  • cutoff – the subarray of the 2 point stats to keep
  • mask – array specifying confidence in the measurement at a pixel, shape (n_samples,n_x,n_y). In range [0,1].
Returns:

the snipped 2-point stats

If both arrays are Dask arrays then a Dask array is returned.

>>> import numpy as np
>>> import dask.array as da
>>> out = two_point_stats(
...     da.from_array(np.arange(10).reshape(2, 5), chunks=(2, 5)),
...     da.from_array(np.arange(10).reshape(2, 5), chunks=(2, 5)),
... )
>>> out.chunks
((2,), (5,))
>>> out.shape
(2, 5)

If either of the arrays are Numpy then a Numpy array is returned.

>>> two_point_stats(
...     np.arange(10).reshape(2, 5),
...     np.arange(10).reshape(2, 5),
... )
array([[ 3.,  4.,  6.,  4.,  3.],
       [48., 49., 51., 49., 48.]])

Test masking

>>> array = da.array([[[1, 0 ,0], [0, 1, 1], [1, 1, 0]]])
>>> mask = da.array([[[1, 1, 1], [1, 1, 1], [1, 0, 0]]])
>>> norm_mask = da.array([[[2, 4, 3], [4, 7, 4], [3, 4, 2]]])
>>> expected = da.array([[[1, 0, 1], [1, 4, 1], [1, 0, 1]]]) / norm_mask
>>> assert np.allclose(
...     two_point_stats(array, array, mask=mask, periodic_boundary=False)[:, 1:-1, 1:-1],
...     expected
... )
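The norm_mask above is the number of valid (unmasked) pixel pairs at each offset, i.e. the non-periodic autocorrelation of the mask. A quick consistency check using zero-padded FFTs (an illustration, not the library's normalization code):

>>> m = np.array([[1, 1, 1], [1, 1, 1], [1, 0, 0]])
>>> fm = np.fft.fftn(m, (5, 5))  # zero-pad to avoid periodic wrap-around
>>> auto = np.fft.ifftn(np.conj(fm) * fm).real
>>> assert np.allclose(np.fft.fftshift(auto)[1:-1, 1:-1], norm_mask)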

The mask must be in the range 0 to 1.

>>> array = da.array([[[1, 0], [0, 1]]])
>>> mask =  da.array([[[2, 0], [0, 1]]])
>>> two_point_stats(array, array, mask=mask)
Traceback (most recent call last):
...
RuntimeError: Mask must be in range [0,1]

Classes

class pymks.FlattenTransformer

Reshape data ready for a PCA.

Two point correlation data needs to be flattened before performing PCA. This class flattens the two point correlation data for use in a Sklearn pipeline.

>>> import numpy as np
>>> data = np.arange(50).reshape((2, 5, 5))
>>> FlattenTransformer().transform(data).shape
(2, 25)
fit(*_)

Only necessary to make pipelines work

static transform(x_data)

Transform the X data

Parameters:x_data – the data to be transformed
class pymks.GenericTransformer(func)

Make a generic transformer based on a function

>>> import numpy as np
>>> data = np.arange(4).reshape(2, 2)
>>> GenericTransformer(lambda x: x[:, 1:]).fit(data).transform(data).shape
(2, 1)

Instantiate a GenericTransformer

The function should take a multi-dimensional array and return an array with the same length along the sample (first) axis.

Parameters:func – transformer function
fit(*_)

Only necessary to make pipelines work

transform(data)

Transform the data

Parameters:data – the data to be transformed
Returns:the transformed data
class pymks.GraphDescriptors(delta_x=1.0, periodic_boundary=True)

Calculate graph descriptors as part of a Sklearn pipeline

Wraps the graph_descriptors() function

Test

>>> import numpy as np
>>> data = np.array([[[0, 1, 0],
...                   [0, 1, 1],
...                   [1, 1, 1]],
...                  [[1, 1, 1],
...                   [0, 0, 0],
...                   [1, 1, 1]],
...                  [[0, 1, 0],
...                   [0, 1, 0],
...                   [0, 1, 0]]])
>>> actual = GraphDescriptors().fit(data).transform(data)
>>> actual.shape
(3, 22)

See the graph_descriptors() function for more complete documentation.

Instantiate a GraphDescriptors transformer

Parameters:
  • delta_x – pixel size
  • periodic_boundary – whether the boundaries are periodic
  • columns – subset of columns to include
fit(*_)

Only necessary to make pipelines work

transform(data)

Transform the data

Parameters:data – the data to be transformed
Returns:the graph descriptors dataframe
class pymks.LegendreTransformer(n_state=2, min_=0.0, max_=1.0, chunks=None)

Legendre transformer for Sklearn pipelines

>>> from toolz import pipe
>>> import numpy as np
>>> import dask.array as da
>>> data = da.from_array(np.array([[0, 0.5, 1]]), chunks=(1, 3))
>>> pipe(
...     LegendreTransformer(),
...     lambda x: x.fit(None, None),
...     lambda x: x.transform(data).compute(),
... )
array([[[ 0.5, -1.5],
        [ 0.5,  0. ],
        [ 0.5,  1.5]]])
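The values above are consistent with scaled Legendre polynomials, \(m[l] = \frac{2l + 1}{2} P_l(\xi)\), where \(\xi\) rescales the local state from [min_, max_] to [-1, 1]. A consistency check inferred from the output rather than from the implementation:

>>> from numpy.polynomial.legendre import legval
>>> xi = 2 * np.array([0, 0.5, 1]) - 1  # rescale [0, 1] to [-1, 1]
>>> expected = np.stack(
...     [0.5 * legval(xi, [1]), 1.5 * legval(xi, [0, 1])], axis=-1
... )
>>> assert np.allclose(expected, [[0.5, -1.5], [0.5, 0.0], [0.5, 1.5]])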

Instantiate a LegendreTransformer

Parameters:
  • n_state – the number of local states
  • min_ – the minimum local state
  • max_ – the maximum local state
  • chunks – chunks size for state axis
class pymks.LocalizationRegressor(redundancy_func=<function LocalizationRegressor.<lambda>>)

Perform the localization in Sklearn pipelines

Allows the localization to be part of a Sklearn pipeline

>>> import numpy as np
>>> import dask.array as da
>>> from toolz import pipe
>>> make_data = lambda s, c: da.from_array(
...     np.arange(np.prod(s),
...               dtype=float).reshape(s),
...     chunks=c
... )
>>> X = make_data((6, 4, 4, 3), (2, 4, 4, 1))
>>> y = make_data((6, 4, 4), (2, 4, 4))
>>> y_out = LocalizationRegressor().fit(X, y).predict(X)
>>> assert np.allclose(y, y_out)
>>> print(
...     pipe(
...         LocalizationRegressor(),
...         lambda x: x.fit(X, y.reshape(6, 16)).predict(X).shape
...     )
... )
(6, 16)

Instantiate a LocalizationRegressor

Parameters:redundancy_func – function to remove redundant elements from the coefficient matrix
coeff_resize(shape)

Generate new model with larger coefficients

Parameters:shape – the shape of the new coefficients
Returns:a new model with larger influence coefficients
fit(x_data, y_data)

Fit the data

Parameters:
  • x_data – the X data to fit
  • y_data – the y data to fit
Returns:

the fitted LocalizationRegressor

predict(x_data)

Predict the data

Parameters:x_data – the X data to predict
Returns:The predicted y data
class pymks.PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0, chunks=None)

Primitive transformer for Sklearn pipelines

>>> from toolz import pipe
>>> import numpy as np
>>> assert pipe(
...     PrimitiveTransformer(),
...     lambda x: x.fit(None, None),
...     lambda x: x.transform(np.array([[0, 0.5, 1]])).compute(),
...     lambda x: np.allclose(x,
...         [[[1. , 0. ],
...           [0.5, 0.5],
...           [0. , 1. ]]])
... )

Instantiate a PrimitiveTransformer

Parameters:
  • n_state – the number of local states
  • min_ – the minimum local state
  • max_ – the maximum local state
  • chunks – chunks size for state axis
class pymks.ReshapeTransformer(shape)

Reshape data ready for the LocalizationRegressor

Sklearn likes flat image data, but MKS expects shaped data. This class reshapes flat data into image data shaped for MKS.

>>> import numpy as np
>>> data = np.arange(18).reshape((2, 9))
>>> ReshapeTransformer((None, 3, 3)).fit(None, None).transform(data).shape
(2, 3, 3)

Instantiate a ReshapeTransformer

Parameters:shape – the shape of the reshaped data (ignoring the first axis)
fit(*_)

Only necessary to make pipelines work

transform(x_data)

Transform the X data

Parameters:x_data – the data to be transformed
class pymks.TwoPointCorrelation(correlations=None, periodic_boundary=True, cutoff=None)

Calculate the 2-point stats for two arrays as part of Scikit-learn pipeline.

Wraps the correlations_multiple() function. See that for more complete documentation.

TwoPointCorrelation works with non-square arrays

>>> from sklearn.pipeline import Pipeline
>>> from pymks import PrimitiveTransformer
>>> import numpy as np
>>> data = np.random.randint(0, 2, size=10).reshape(1, 5, 2)
>>> Pipeline([
...     ('discretize', PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0)),
...     ('correlations', TwoPointCorrelation())
... ]).transform(data).compute().shape
(1, 5, 3, 2)
Parameters:
  • correlations – the correlation pairs
  • periodic_boundary – whether to assume a periodic boundary (default is true)
  • cutoff – the subarray of the 2 point stats to keep
fit(*_)

Only necessary to make pipelines work

transform(data)

Transform the data

Parameters:data – the data to be transformed
Returns:the 2-point stats array