
PyMC3 Examples: GLM with Custom Likelihood for Outlier Classification

Posted at — 8 Nov 2019

Preamble

I first created this content at the end of 2015 and submitted it to the examples documentation for the PyMC3 project, and presented a version at our inaugural Bayesian Mixer London meetup. The presentation wasn’t much more than an attempt to get the ball rolling, but it must have done something right, since the meetup is still going strong. A record of the evening appears to survive on Markus Gesmann’s blog.

The Notebook received an update in 2018 from Thomas Wiecki, reworking the specification of the Hogg ‘Signal vs Noise’ model to use a pm.Potential instead of the rather verbose custom likelihood written in Theano. That v2.0 of the Notebook is available in the project examples documentation.

I recently revisited this code and realised a lot has changed since 2015, so I’ve given it a thorough spring clean and pushed a v2.1 of the Notebook up on GitHub. This version is developed against Python 3.6 and PyMC3 v3.7, includes a more efficient StudentT model, uses the pm.Data object, and adds plate notation, updated arviz plots and explanations throughout.

It’s hard to find time for the study required to make fundamental contributions to the PyMC3 (and now PyMC4) projects, but if I can submit examples for how to use the library, then all the better. So this Notebook is re-hosted in a new personal repo I’ve created specifically for my pymc3_examples. It’s simple to reproduce the self-contained project setup using the README and the conda env and pip instructions, and the license is MIT. This new submission is currently pending a PR, and I’ll note here if/when it’s accepted.


GLM with Custom Likelihood for Outlier Classification (Abridged)

This content is condensed from a far more detailed Notebook on GitHub, which is designed to be fully reproducible in an MVP environment or standalone. The following content has been heavily abridged so we can get to the models; if something doesn’t make sense, please see the full Notebook.

The general concept is to filter out noisy / erroneous datapoints in a set of observations. We propose that a main set of ‘inliers’ is generated by a particular model that we specify, while a set of ‘outliers’ is better described by some alternative model.

This implementation is a copy of eqn 17 in Hogg et al. (2010) Data analysis recipes: Fitting a model to data, and is adapted specifically from Jake Vanderplas’ implementation for the AstroML book.

It’s interesting to note the domain specificity and weaknesses of this implementation, which are discussed in the full Notebook.

Contents

1. Basic EDA

1.1 Load Data

The dataset is available within AstroML and, since it’s very small, is hardcoded below.

# cut & pasted directly from the fetch_hogg2010test() function
# identical to the original dataset as hardcoded in the Hogg 2010 paper

dfhogg = pd.DataFrame(
                np.array([[1, 201, 592, 61, 9, -0.84],
                          [2, 244, 401, 25, 4, 0.31],
                          [3, 47, 583, 38, 11, 0.64],
                          [4, 287, 402, 15, 7, -0.27],
                          [5, 203, 495, 21, 5, -0.33],
                          [6, 58, 173, 15, 9, 0.67],
                          [7, 210, 479, 27, 4, -0.02],
                          [8, 202, 504, 14, 4, -0.05],
                          [9, 198, 510, 30, 11, -0.84],
                          [10, 158, 416, 16, 7, -0.69],
                          [11, 165, 393, 14, 5, 0.30],
                          [12, 201, 442, 25, 5, -0.46],
                          [13, 157, 317, 52, 5, -0.03],
                          [14, 131, 311, 16, 6, 0.50],
                          [15, 166, 400, 34, 6, 0.73],
                          [16, 160, 337, 31, 5, -0.52],
                          [17, 186, 423, 42, 9, 0.90],
                          [18, 125, 334, 26, 8, 0.40],
                          [19, 218, 533, 16, 6, -0.78],
                          [20, 146, 344, 22, 5, -0.56]]),
                columns=['id','x','y','sigma_y','sigma_x','rho_xy'])

dfhogg['id'] = dfhogg['id'].apply(lambda x: 'p{}'.format(int(x)))
dfhogg.set_index('id', inplace=True)
dfhogg.head()

x y sigma_y sigma_x rho_xy
id
p1 201.0 592.0 61.0 9.0 -0.84
p2 244.0 401.0 25.0 4.0 0.31
p3 47.0 583.0 38.0 11.0 0.64
p4 287.0 402.0 15.0 7.0 -0.27
p5 203.0 495.0 21.0 5.0 -0.33
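
As an aside, if you have astroML installed you can load the same data directly rather than hardcoding it. This is a minimal sketch, assuming the astroML package and its fetch_hogg2010test function (the one referenced in the comment above) return a record array with the same field names:

# alternative load, assuming astroML is installed
from astroML.datasets import fetch_hogg2010test

data = fetch_hogg2010test()        # record array with fields x, y, sigma_x, sigma_y, rho_xy
dfhogg_alt = pd.DataFrame(data)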

1.2 Exploratory Data Analysis (Trivial)

Just a quick plot to aid understanding

gd = sns.jointplot(x='x', y='y', data=dfhogg, kind='scatter', height=8,
        marginal_kws={'bins':12, 'kde':True, 'kde_kws':{'cut':1}},
        joint_kws={'edgecolor':'w', 'linewidth':1.2, 's':80})

_ = gd.ax_joint.errorbar('x', 'y', 'sigma_y', 'sigma_x', fmt='none', 
                            ecolor='#4878d0', data=dfhogg, zorder=10)
for idx, r in dfhogg.iterrows():
    _ = gd.ax_joint.annotate(s=idx, xy=(r['x'], r['y']), xycoords='data', 
                             xytext=(10, 10), textcoords='offset points', 
                             color='#999999', zorder=1)
    
_ = gd.annotate(stats.pearsonr, loc="upper left", fontsize=12)
_ = gd.fig.suptitle(('Original datapoints in Hogg 2010 dataset\n' + 
        'showing marginal distributions and errors sigma_x, sigma_y'), y=1.05)
Original Datapoints

Observe:


2. Basic Feature Engineering

Ordinarily we would run through more formalised steps to split into Train and Test sets (to later help evaluate model fit), but here I’ll just fit the model to the full dataset and stop at inference.

2.1 Transform and standardize dataset

It’s common practice to standardize the input values to a linear model, because this leads to coefficients sitting in the same range and being more directly comparable. Gelman notes this in a 2007 paper.

So, following Gelman’s paper above, we’ll divide by 2 s.d. here:
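
Spelled out (this simply restates the standardization code below, no new assumptions), the transform applied to x, y and the measurement errors is:

$$ x' = \frac{x - \bar{x}}{2 s_{x}}, \qquad y' = \frac{y - \bar{y}}{2 s_{y}}, \qquad \sigma_{x}' = \frac{\sigma_{x}}{2 s_{x}}, \qquad \sigma_{y}' = \frac{\sigma_{y}}{2 s_{y}} $$

where x-bar and s_x are the sample mean and standard deviation of x (and likewise for y).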

Additional note on scaling the output feature y and measurement error sigma_y:

Standardize (mean center and divide by 2 sd)
dfhoggs = ((dfhogg[['x', 'y']] - dfhogg[['x', 'y']].mean(0)) / 
           (2 * dfhogg[['x', 'y']].std(0)))
dfhoggs['sigma_x'] = dfhogg['sigma_x'] / ( 2 * dfhogg['x'].std())
dfhoggs['sigma_y'] = dfhogg['sigma_y'] / ( 2 * dfhogg['y'].std())

3. Simple Linear Model with no Outlier Correction

3.1 Specify Model

Before we get more advanced, I want to demo the fit of a simple linear model with a Normal likelihood function. The priors are also Normally distributed, so this behaves like a least-squares fit with Ridge regularisation (an L2 penalty on the coefficients).
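
As a brief aside (my note, not part of the original Notebook): the Ridge connection comes from the negative log posterior. With Normal(0, sigma_b) priors on the coefficients it is a weighted least-squares loss plus an L2 penalty,

$$ -\log p(\beta \mid y) \propto \sum_{i} \frac{(y_{i} - \beta^{T} \vec{x}_{i})^{2}}{2 \sigma_{i}^{2}} + \frac{\lVert \beta \rVert_{2}^{2}}{2 \sigma_{b}^{2}} + \text{const} $$

so the MAP estimate coincides with a Ridge solution whose penalty strength is set by the prior width sigma_b (here 10, i.e. very weak).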

Note: the dataset also has sigma_x and rho_xy available, but for this exercise I’ve chosen to use only sigma_y.

$$ \hat{y} \sim \mathcal{N}(\beta^{T} \vec{x}_{i}, \sigma_{i}) $$

where the mean is the linear model (intercept b0_intercept plus slope b1_slope times the standardized x), and sigma_i is the standardized measurement error sigma_y reported for datapoint i.

with pm.Model() as mdl_ols:
    
    ## Define weakly informative Normal priors to give Ridge regression
    b0 = pm.Normal('b0_intercept', mu=0, sigma=10)
    b1 = pm.Normal('b1_slope', mu=0, sigma=10)
 
    ## Define linear model
    y_est = b0 + b1 * dfhoggs['x']
    
    ## Define Normal likelihood
    likelihood = pm.Normal('likelihood', mu=y_est, sigma=dfhoggs['sigma_y'],
                           observed=dfhoggs['y'])
    
pm.model_to_graphviz(mdl_ols)
OLS Model: basic plate notation

3.2 Fit Model

3.2.1 Sample Posterior
with mdl_ols:
    trc_ols = pm.sample(tune=5000, draws=500, chains=3, cores=3, 
                    init='advi+adapt_diag', n_init=50000, progressbar=True)
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Average Loss = 150.96:  20%|█▉        | 9871/50000 [00:06<00:26, 1516.47it/s]
Convergence achieved at 10000
Interrupted at 9,999 [19%]: Average Loss = 315.49
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [b1_slope, b0_intercept]
Sampling 3 chains: 100%|██████████| 16500/16500 [00:11<00:00, 1430.19draws/s]
3.2.2 View Diagnostics

NOTE: We will illustrate the posterior OLS fit and compare to the datapoints in the final comparison plot

Traceplot
_ = az.plot_trace(trc_ols, combined=False, compact=False)
OLS Model: traceplots
View posterior joint distribution (since the model has only 2 parameters, we can easily view this as a 2D joint distribution)
df_trc_ols = pm.trace_to_dataframe(trc_ols)
gd = sns.jointplot(x='b0_intercept', y='b1_slope', data=df_trc_ols, height=6, 
                   marginal_kws={'kde':True, 'kde_kws':{'cut':1}},
                   joint_kws={'alpha':0.2})
gd.plot_joint(sns.kdeplot, zorder=0, cmap="Blues", n_levels=12)
_ = gd.fig.suptitle('Posterior joint distribution (mdl_ols)', y=1.02)
OLS Model: joint posterior distribution of linear params

4. Simple Linear Model with Robust Student-T Likelihood

I’ve added this brief section in order to directly compare against the Student-T based method demonstrated in Thomas Wiecki’s notebook in the PyMC3 documentation.

Instead of using a Normal distribution for the likelihood, we use a Student-T, which has fatter tails. In theory this allows outliers to have less influence on the likelihood estimation. This method does not produce inlier / outlier flags (it marginalizes over such a classification), but it’s simpler and faster to run than the Signal vs Noise model below, so a comparison seems worthwhile.
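
To make the ‘fatter tails’ point concrete, here’s a tiny aside (not from the Notebook) comparing tail probabilities of a standard Normal and a Student-T with low degrees of freedom, using scipy:

# tail mass beyond 4 'standard' units
from scipy import stats

print('Normal:          {:.1e}'.format(stats.norm.sf(4)))      # ~3e-05
print('StudentT (nu=2): {:.1e}'.format(stats.t.sf(4, df=2)))   # ~3e-02

The Student-T puts roughly a thousand times more probability mass in that tail, so a handful of extreme datapoints drags the fit around far less.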

4.1 Specify Model

In this modification, the single likelihood is more robust to outliers:

$$ \hat{y} \sim \text{StudentT}(\beta^{T} \vec{x}_{i}, \sigma_{i}, \nu) $$

where the mean and sigma_i are defined as in the OLS model above, and nu is the Student-T degrees-of-freedom parameter, which is given an InverseGamma prior below.

with pm.Model() as mdl_studentt:
    
    # define weakly informative Normal priors to give Ridge regression
    b0 = pm.Normal('b0_intercept', mu=0, sigma=10)
    b1 = pm.Normal('b1_slope', mu=0, sigma=10)
 
    # define linear model
    y_est = b0 + b1 * dfhoggs['x']
       
    # define prior for StudentT degrees of freedom
    # InverseGamma has nice properties: 
    # it's continuous and has support x ∈ (0, inf)
    nu = pm.InverseGamma('nu', alpha=1, beta=1)

    # define Student T likelihood
    likelihood = pm.StudentT('likelihood', mu=y_est, 
                             sigma=dfhoggs['sigma_y'], nu=nu, 
                             observed=dfhoggs['y'])

pm.model_to_graphviz(mdl_studentt)
Student-T Model: basic plate notation

4.2 Fit Model

4.2.1 Sample Posterior
with mdl_studentt:
    trc_studentt = pm.sample(tune=5000, draws=500, chains=3, cores=3, 
                    init='advi+adapt_diag', n_init=50000, progressbar=True)
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Average Loss = 19.398:  31%|███▏      | 15631/50000 [00:11<00:24, 1417.33it/s]
Convergence achieved at 15700
Interrupted at 15,699 [31%]: Average Loss = 28.621
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [nu, b1_slope, b0_intercept]
Sampling 3 chains: 100%|██████████| 16500/16500 [00:18<00:00, 905.18draws/s] 
4.2.2 View Diagnostics

NOTE: We will illustrate this StudentT fit and compare to the datapoints in the final comparison plot

Traceplot
_ = az.plot_trace(trc_studentt, combined=False, compact=False)
Student-T Model: traceplots
4.2.3 View the shift in posterior joint distributions from OLS to StudentT
fts = ['b0_intercept', 'b1_slope']
df_trc_studentt = pm.trace_to_dataframe(trc_studentt)
df_trc = pd.concat((df_trc_ols[fts], df_trc_studentt[fts]), sort=False)
df_trc['model'] =  pd.Categorical(
                        np.repeat(['ols', 'studentt'], len(df_trc_ols)),
                        categories=['hogg_inlier', 'studentt', 'ols'], 
                        ordered=True) 

gd = sns.JointGrid(x='b0_intercept', y='b1_slope', data=df_trc, height=8)
_ = gd.fig.suptitle(('Posterior joint distributions' + 
            '\n(showing general movement from OLS to StudentT)'), y=1.05)

_, x_bin_edges = np.histogram(df_trc['b0_intercept'], 60)
_, y_bin_edges = np.histogram(df_trc['b1_slope'], 60)

for idx, grp in df_trc.groupby('model'):
    _ = sns.scatterplot(grp['b0_intercept'], grp['b1_slope'], 
                        ax=gd.ax_joint, alpha=0.2, label=idx)
    _ = sns.kdeplot(grp['b0_intercept'], grp['b1_slope'], 
                        ax=gd.ax_joint, zorder=2, n_levels=7)
    _ = sns.distplot(grp['b0_intercept'], ax=gd.ax_marg_x, kde_kws={'cut':1}, 
                         bins=x_bin_edges, axlabel=False)
    _ = sns.distplot(grp['b1_slope'], ax=gd.ax_marg_y, kde_kws={'cut':1}, 
                         bins=y_bin_edges, vertical=True, axlabel=False)
_ = gd.ax_joint.legend()
OLS vs Student-T Models: joint posterior distribution of params

Observe:

5. Linear Model with Custom Likelihood to Distinguish Outliers: Hogg Method

Please read the paper (Hogg 2010) and Jake Vanderplas' code for more complete information about the modelling technique.

As noted above, the general idea is to create a ‘mixture’ model whereby datapoints can be described by either:

  1. the proposed (linear) model (thus a datapoint is an inlier), or
  2. a second model, which for convenience we also propose to be linear, but allow it to have a different mean and variance (thus a datapoint is an outlier)

5.1 Specify Model

The likelihood is evaluated over a mixture of two likelihoods, one for ‘inliers’, one for ‘outliers’. A Bernoulli distribution is used to assign each of the N datapoints to either the inlier or the outlier group, and we sample the model as usual to infer robust model parameters and inlier / outlier flags:

$$ \log\mathcal{L} = \sum_{i=1}^{N} (1 - B_{i}) \log \left[ \frac{1}{\sqrt{2 \pi \sigma_{in}^{2}}} \exp \left( - \frac{(y_{i} - \mu_{in})^{2}}{2\sigma_{in}^{2}} \right) \right] + \sum_{i=1}^{N} B_{i} \log \left[ \frac{1}{\sqrt{2 \pi (\sigma_{in}^{2} + \sigma_{out}^{2})}} \exp \left( - \frac{(y_{i} - \mu_{out})^{2}}{2(\sigma_{in}^{2} + \sigma_{out}^{2})} \right) \right] $$

where B_i is the Bernoulli flag classing datapoint i as an outlier (1) or inlier (0), mu_in is the linear model b0_intercept + b1_slope * x_i with sigma_in the measurement error sigma_y, and mu_out (y_est_out) and sigma_out (sigma_y_out) are a single proposed mean and an additional variance shared by all outliers.

with pm.Model() as mdl_hogg:
    
    # get into the practice of stating input data as Theano shared vars
    tsv_x = pm.Data('tsv_x', dfhoggs['x'])                                        # (n, )
    tsv_y = pm.Data('tsv_y', dfhoggs['y'])                                        # (n, )
    tsv_sigma_y = pm.Data('tsv_sigma_y', dfhoggs['sigma_y'])                      # (n, )
    
    # weakly informative Normal priors (L2 ridge reg) for inliers
    b0 = pm.Normal('b0_intercept', mu=0, sigma=10, testval=pm.floatX(0.1))
    b1 = pm.Normal('b1_slope', mu=0, sigma=10, testval=pm.floatX(1.))
 
    # linear model for mean for inliers
    y_est_in = b0 + b1 * tsv_x                                                    # (n, )

    # proposed mean for all outliers 
    y_est_out = pm.Normal('y_est_out', mu=0, sigma=10, testval=pm.floatX(1.))     # (1, )
    
    # weakly informative prior for additional variance for outliers
    sigma_y_out = pm.HalfNormal('sigma_y_out', sigma=1, testval=pm.floatX(1.))    # (1, )
   
    # create in/outlier distributions to get a logp evaluated on the observed y
    # this is not strictly a pymc3 likelihood, but behaves like one when we
    # evaluate it within a Potential (which simply adds this term to the model logp)
    inlier_logp = pm.Normal.dist(mu=y_est_in, 
                                 sigma=tsv_sigma_y).logp(tsv_y)

    outlier_logp = pm.Normal.dist(mu=y_est_out, 
                                  sigma=tsv_sigma_y + sigma_y_out).logp(tsv_y)
    
    # Bernoulli in/outlier classification constrained to [0, .5] for symmetry
    frac_outliers = pm.Uniform('frac_outliers', lower=0.0, upper=.5)
    is_outlier = pm.Bernoulli('is_outlier', p=frac_outliers, 
                      shape=tsv_x.eval().shape[0], 
                      testval=(np.random.rand(tsv_x.eval().shape[0]) < 0.4) * 1)  # (n, )

    # non-sampled Potential evaluates the Normal.dist.logp's
    potential = pm.Potential('obs', ((1-is_outlier) * inlier_logp).sum() + 
                             (is_outlier * outlier_logp).sum())

5.2 Fit Model

5.2.1 Sample Posterior

Note that pm.sample conveniently and automatically creates the compound sampling process to:

  1. sample a Bernoulli variable (the class is_outlier) using a discrete sampler
  2. sample the continuous variables using a continuous sampler
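
For illustration only (this is a sketch of what pm.sample assembles automatically here, not code from the Notebook), the compound step could be written out by hand roughly as:

# manual equivalent of the automatic compound step (sketch)
with mdl_hogg:
    step_cont = pm.NUTS([b0, b1, y_est_out, sigma_y_out, frac_outliers],
                        target_accept=0.9)
    step_disc = pm.BinaryGibbsMetropolis([is_outlier])
    trc_manual = pm.sample(tune=6000, draws=500, chains=3, cores=3,
                           step=[step_cont, step_disc])

In practice, letting pm.sample pick the steps (as below) is simpler and yields the same CompoundStep of NUTS plus BinaryGibbsMetropolis.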

Further note:

 

with mdl_hogg:
    trc_hogg = pm.sample(tune=6000, draws=500, chains=3, cores=3, 
                         init='jitter+adapt_diag', nuts={'target_accept':0.9})
Multiprocess sampling (3 chains in 3 jobs)
CompoundStep
>NUTS: [frac_outliers, sigma_y_out, y_est_out, b1_slope, b0_intercept]
>BinaryGibbsMetropolis: [is_outlier]
Sampling 3 chains: 100%|██████████| 19500/19500 [00:43<00:00, 449.14draws/s]
5.2.2 View Diagnostics

NOTE: We will illustrate this model fit and compare to the datapoints in the final comparison plot

Traceplot
rvs = ['b0_intercept', 'b1_slope', 'y_est_out', 'sigma_y_out', 'frac_outliers']
_ = az.plot_trace(trc_hogg, var_names=rvs, combined=False)
Hogg Model: traceplots

Observe:

Simple trace summary inc rhat
pm.summary(trc_hogg, var_names=rvs)

mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
b0_intercept 0.017191 0.031101 0.001042 -0.047390 0.074615 793.464520 0.999149
b1_slope 1.240391 0.064382 0.001987 1.110171 1.360046 1147.170130 1.000458
y_est_out 0.104516 0.291202 0.010213 -0.498045 0.670796 760.256224 1.005919
sigma_y_out 0.423364 0.273111 0.008358 0.063661 0.929918 855.002935 0.999450
frac_outliers 0.270572 0.104870 0.003415 0.080593 0.465110 830.279459 1.000730

 

5.2.3 View the shift in posterior joint distributions from OLS to StudentT to Hogg

fts = ['b0_intercept', 'b1_slope']
df_trc_hogg = pm.trace_to_dataframe(trc_hogg)
df_trc = pd.concat((df_trc_ols[fts], df_trc_studentt[fts], df_trc_hogg[fts]), sort=False)
df_trc['model'] = pd.Categorical(
                    np.repeat(['ols', 'studentt', 'hogg_inlier'], len(df_trc_ols)),
                    categories=['hogg_inlier', 'studentt', 'ols'], ordered=True) 

gd = sns.JointGrid(x='b0_intercept', y='b1_slope', data=df_trc, height=8)
_ = gd.fig.suptitle(('Posterior joint distributions' + 
            '\nOLS, StudentT, and Hogg (inliers)'), y=1.05)

_, x_bin_edges = np.histogram(df_trc['b0_intercept'], 60)
_, y_bin_edges = np.histogram(df_trc['b1_slope'], 60)

for idx, grp in df_trc.groupby('model'):
    _ = sns.scatterplot(grp['b0_intercept'], grp['b1_slope'], 
                        ax=gd.ax_joint, alpha=0.2, label=idx)
    _ = sns.kdeplot(grp['b0_intercept'], grp['b1_slope'], 
                        ax=gd.ax_joint, zorder=2, n_levels=7)
    _ = sns.distplot(grp['b0_intercept'], ax=gd.ax_marg_x, kde_kws={'cut':1}, 
                         bins=x_bin_edges, axlabel=False)
    _ = sns.distplot(grp['b1_slope'], ax=gd.ax_marg_y, kde_kws={'cut':1}, 
                         bins=y_bin_edges, vertical=True, axlabel=False)
_ = gd.ax_joint.legend()
OLS vs Student-T vs Hogg Models: joint posterior distributions of linear params

Observe:

5.3 Declare Outliers

5.3.1 View ranges for inliers / outlier predictions

At each step of the trace, each datapoint may be either an inlier or an outlier. We hope that the datapoints spend an unequal amount of time in one state or the other, so let’s take a look at the simple count of states for each of the 20 datapoints.

df_outlier_results = pd.DataFrame.from_records(trc_hogg['is_outlier'], 
                                               columns=dfhoggs.index)
dfm_outlier_results = pd.melt(df_outlier_results, 
                              var_name='datapoint_id', value_name='is_outlier')

gd = sns.catplot(y='datapoint_id', x='is_outlier', data=dfm_outlier_results, 
                 kind='point', join=False, height=6, aspect=2)
_ = gd.fig.axes[0].set(xlim=(-0.05,1.05), xticks=np.arange(0, 1.1, 0.1))
_ = gd.fig.axes[0].axvline(x=0, color='b', linestyle=':')
_ = gd.fig.axes[0].axvline(x=1, color='r', linestyle=':')
_ = gd.fig.axes[0].yaxis.grid(True, linestyle='-', which='major', 
                              color='w', alpha=0.4)
_ = gd.fig.suptitle(('For each datapoint, distribution of outlier classification '+
                    'over the trace'), y=1.04, fontsize=16)
For each datapoint, proportion of the trace spent as an inlier or outlier

Observe:

The 95% cutoff we choose is subjective and arbitrary, but I prefer it for now, so let’s declare these 3 to be outliers and see how it looks compared to Jake Vanderplas’ outliers, which were declared in a slightly different way: as points whose mean posterior outlier probability exceeds 0.68.
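
As a quick cross-check (my aside, not in the Notebook), that Vanderplas-style rule can be computed directly from the same trace by thresholding each datapoint’s mean posterior outlier probability:

# Vanderplas-style rule: mean posterior outlier probability above 0.68
mean_outlier_prob = trc_hogg['is_outlier'].mean(axis=0)   # shape (n,)
print(dfhoggs.index[mean_outlier_prob > 0.68].tolist())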

5.3.2 Declare outliers

Note:

# declare as outlier any datapoint whose 5th-percentile value of is_outlier is 1,
# i.e. it was flagged as an outlier in at least 95% of posterior samples
cutoff = .05
dfhoggs['classed_as_outlier'] = np.quantile(trc_hogg['is_outlier'], cutoff, axis=0) == 1
dfhoggs['classed_as_outlier'].value_counts()
False    17
True      3
Name: classed_as_outlier, dtype: int64

Also add a flag for points to be investigated; we’ll use this to annotate the final plot.

# flag any datapoint whose 75th-percentile value of is_outlier is 1,
# i.e. it was flagged as an outlier in at least 25% of posterior samples
dfhoggs['annotate_for_investigation'] = np.quantile(trc_hogg['is_outlier'], 
                                                    0.75, axis=0) == 1
dfhoggs['annotate_for_investigation'].value_counts()
False    16
True      4
Name: annotate_for_investigation, dtype: int64

5.4 Posterior Prediction Plots for OLS vs StudentT vs Hogg “Signal vs Noise”

gd = sns.FacetGrid(dfhoggs, height=10, hue='classed_as_outlier', 
                   hue_order=[True, False], palette='Set1', legend_out=False)

# plot hogg outlier posterior distribution
outlier_mean = lambda x, s: s['y_est_out'] * x ** 0
pm.plot_posterior_predictive_glm(trc_hogg, lm=outlier_mean,
            eval=np.linspace(-3, 3, 10), samples=400, 
            color='#CC4444', alpha=.2, zorder=1)

# plot the 3 model (inlier) posterior distributions
lm = lambda x,s: s['b0_intercept'] + s['b1_slope'] * x

pm.plot_posterior_predictive_glm(trc_ols, lm=lm,
            eval=np.linspace(-3, 3, 10), samples=200, 
            color='#22CC00', alpha=.3, zorder=2)

pm.plot_posterior_predictive_glm(trc_studentt, lm=lm,
            eval=np.linspace(-3, 3, 10), samples=200, 
            color='#FFA500', alpha=.5, zorder=3)

pm.plot_posterior_predictive_glm(trc_hogg, lm=lm,
            eval=np.linspace(-3, 3, 10), samples=200, 
            color='#357EC7', alpha=.5, zorder=4.)
_ = plt.title(None)

line_legend = plt.legend(
    [Line2D([0], [0], color='#357EC7'), Line2D([0], [0], color='#CC4444'),
    Line2D([0], [0], color='#FFA500'), Line2D([0], [0], color='#22CC00')],
    ['Hogg Inlier', 'Hogg Outlier', 'Student-T', 'OLS'], loc='lower right', 
    title='Posterior Predictive')
_ = gd.fig.get_axes()[0].add_artist(line_legend)

# plot points
_ = gd.map(plt.errorbar, 'x', 'y', 'sigma_y', 'sigma_x', marker="o", 
          ls='', markeredgecolor='w', markersize=10, zorder=5).add_legend()
_ = gd.ax.legend(loc='upper left', title='Outlier Classification')

# annotate the potential outliers
for idx, r in dfhoggs.loc[dfhoggs['annotate_for_investigation']].iterrows():
    _ = gd.axes.ravel()[0].annotate(s=idx, xy=(r['x'], r['y']), xycoords='data', 
            xytext=(7, 7), textcoords='offset points', color='#999999', zorder=4)

## create xlims ylims for plotting
x_ptp = np.ptp(dfhoggs['x'].values) / 3.3
y_ptp = np.ptp(dfhoggs['y'].values) / 3.3
xlims = (dfhoggs['x'].min()-x_ptp, dfhoggs['x'].max()+x_ptp)
ylims = (dfhoggs['y'].min()-y_ptp, dfhoggs['y'].max()+y_ptp)
_ = gd.axes.ravel()[0].set(ylim=ylims, xlim=xlims)
_ = gd.fig.suptitle(('Standardized datapoints in Hogg 2010 dataset, showing ' + 
        'posterior predictive fit for 3 models:\nOLS, StudentT and Hogg ' +
        '"Signal vs Noise" (inlier vs outlier, custom likelihood)'), 
                    y=1.04, fontsize=16)
Posterior Predictive Plots: OLS vs Student-T vs Hogg Signal vs Noise, showing inlier vs outlier classifications

Observe:

The posterior predictive fit for:
We see that the Hogg Signal vs Noise model also yields specific estimates of which datapoints are outliers:
Overall:

Postscript

If you made it this far, great! This is an attempt to mix a blogpost with a technical notebook in the style of a mini case study. The code is valid and extracted from the full Notebook, but these are excerpts; to fully reproduce the analysis, please use the Notebook.

There’s a back catalogue of this material that I plan to overhaul and release over the coming weeks: some to the pymc3_examples repo, some to project-specific repos, and some contributed to the PyMC3 project docs where possible.

Further Note

Very nice to see Dan Foreman-Mackey chime in via Twitter to note his alternative marginalised likelihood in pymc3, using the same setup and dataset! I plan to do a followup Notebook using an explicit GMM class via pm.Mixture and pm.NormalMixture, so will compare there too. I also appreciate the link to Dan’s earlier implementation of this Hogg model using his emcee package, which was one of the first MCMC packages I worked with on a client project way back in 2014. Nice one Dan (and thanks Thomas for the initial tweet). The power of the internet, eh?

