<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-153560552369058570</id><updated>2011-06-16T12:58:53.251-07:00</updated><title type='text'>DefaultBlogTitle</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://reynwar.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/153560552369058570/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://reynwar.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Ben</name><uri>http://www.blogger.com/profile/17273765997519157250</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-153560552369058570.post-7091501216404472894</id><published>2011-05-28T23:28:00.001-07:00</published><updated>2011-05-28T23:39:43.195-07:00</updated><title type='text'>Some Statistics with R examples</title><content type='html'>&lt;p&gt;These are notes I made while working through the book &amp;quot;Statistics Explained&amp;quot;.  They're rather terse but include R examples so I thought they might possibly be useful.  If the maths doesn't look right it's because your browser doesn't support MathML.  
Currently only Firefox does.&lt;/p&gt;
&lt;div class="section" id="statistics-summary"&gt;
&lt;h1&gt;Statistics Summary&lt;/h1&gt;
&lt;table class="docutils field-list" frame="void" rules="none"&gt;
&lt;col class="field-name" /&gt;
&lt;col class="field-body" /&gt;
&lt;tbody valign="top"&gt;
&lt;tr class="field"&gt;&lt;th class="field-name"&gt;Author:&lt;/th&gt;&lt;td class="field-body"&gt;Ben Reynwar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="field"&gt;&lt;th class="field-name"&gt;Copyright:&lt;/th&gt;&lt;td class="field-body"&gt;Whatever wikipedia uses since there are probably bits cut and pasted.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="field"&gt;&lt;th class="field-name"&gt;Created:&lt;/th&gt;&lt;td class="field-body"&gt;2011 April 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="field"&gt;&lt;th class="field-name"&gt;Last Edited:&lt;/th&gt;&lt;td class="field-body"&gt;2011 May 28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Type I error
- finding an effect that isn't real (a false positive).&lt;/p&gt;
&lt;p&gt;Type II error
- failing to find an effect that is real (a false negative).&lt;/p&gt;
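&lt;p&gt;Both error rates can be estimated by simulation.  A minimal sketch, assuming two groups of 10 values with standard deviation 2, a 0.05 significance level and, for Type II errors, a true difference in means of 1 (all arbitrary choices); power.t.test from the stats package gives the theoretical Type II rate for comparison.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(10)
# Type I errors: both samples come from the same population, so every
# significant result (p-value at most 0.05) is a false positive.
p_null = replicate(5000, t.test(rnorm(10, 10, 2), rnorm(10, 10, 2), var.equal = TRUE)$p.value)
type1 = ecdf(p_null)(0.05)   # fraction of p-values at most 0.05, should be near 0.05
# Type II errors: the population means really differ by 1, so every
# non-significant result is a missed effect.
p_alt = replicate(5000, t.test(rnorm(10, 10, 2), rnorm(10, 11, 2), var.equal = TRUE)$p.value)
type2 = 1 - ecdf(p_alt)(0.05)
# Theoretical Type II error rate for the same design, for comparison.
type2_theory = 1 - power.t.test(n = 10, delta = 1, sd = 2, sig.level = 0.05)$power
c(type1 = type1, type2 = type2, type2_theory = type2_theory)
```
&lt;/pre&gt;&lt;/div&gt;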
&lt;div class="section" id="some-useful-distributions"&gt;
&lt;h2&gt;Some Useful Distributions&lt;/h2&gt;
&lt;div class="section" id="normal-distribution"&gt;
&lt;h3&gt;Normal Distribution&lt;/h3&gt;
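&lt;p&gt;R has d (density), p (cumulative distribution), q (quantile) and r (random sample) functions for each distribution.  A short sketch for the normal distribution; mean 0 and sd 1 are the defaults of these functions, and the mean 25 and sd 8 in the last line are arbitrary.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
# Density of the standard normal at its mean, 1/sqrt(2*pi).
dnorm(0)
# Cumulative probability below the mean: 0.5.
pnorm(0)
# Value below which 97.5% of the distribution lies (about 1.96).
z = qnorm(0.975)
z
# Probability of lying within z standard deviations of the mean (0.95).
pnorm(z) - pnorm(-z)
# Ten random draws from a normal with mean 25 and standard deviation 8.
x = rnorm(10, mean = 25, sd = 8)
```
&lt;/pre&gt;&lt;/div&gt;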
&lt;/div&gt;
&lt;div class="section" id="chi-square-distribution"&gt;
&lt;h3&gt;Chi-square Distribution&lt;/h3&gt;
&lt;p&gt;The chi-square distribution with k degrees of freedom is the distribution of the sum of squares
of k independent standard normal variables.&lt;/p&gt;
&lt;p&gt;It is a special case of the gamma distribution.&lt;/p&gt;
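&lt;p&gt;A quick simulation check of both statements, assuming k = 4 and 10000 replicates (arbitrary choices): the simulated sums of squares should match the chi-square mean and quantiles, and pchisq should agree with pgamma using shape k/2 and rate 1/2.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(1)
k = 4
# Sum of squares of k independent standard normal variables, simulated many times.
sums = replicate(10000, sum(rnorm(k)^2))
# The mean of a chi-square with k degrees of freedom is k.
mean(sums)
# Simulated vs theoretical 95th percentile.
quantile(sums, 0.95)
qchisq(0.95, df = k)
# Chi-square with k dof is a gamma distribution with shape k/2 and rate 1/2.
pchisq(3, df = k)
pgamma(3, shape = k / 2, rate = 1 / 2)
```
&lt;/pre&gt;&lt;/div&gt;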
&lt;p&gt;Useful for estimating the variance of a population.  Suppose we have n observations
from a normal population &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;μ&lt;/mi&gt;&lt;mo&gt;,&lt;/mo&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;/math&gt; and &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;S&lt;/mi&gt;&lt;/mrow&gt;&lt;/math&gt; is their standard
deviation then &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;msup&gt;&lt;mi&gt;S&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/math&gt; has a chi squared distribution
with n-1 degrees of freedom.  Since &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;σ&lt;/mi&gt;&lt;/mrow&gt;&lt;/math&gt; is the only unknown, one can
get confidence intervals for the population variance.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Number of observations in sample.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; n &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Set alpha for 95% confidence interval (two-sided)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; alpha &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;0.05&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Get a sample.  Pretend we don&amp;#39;t know mean or standard deviation.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; sample &lt;span class="o"&gt;=&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;n&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; top &lt;span class="o"&gt;=&lt;/span&gt; var&lt;span class="p"&gt;(&lt;/span&gt;sample&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;n&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Get bounds for confidence interval for the standard deviation.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; lower &lt;span class="o"&gt;=&lt;/span&gt; sqrt&lt;span class="p"&gt;(&lt;/span&gt;top&lt;span class="o"&gt;/&lt;/span&gt;qchisq&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;alpha&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; n&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; upper &lt;span class="o"&gt;=&lt;/span&gt; sqrt&lt;span class="p"&gt;(&lt;/span&gt;top&lt;span class="o"&gt;/&lt;/span&gt;qchisq&lt;span class="p"&gt;(&lt;/span&gt;alpha&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; n&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; c&lt;span class="p"&gt;(&lt;/span&gt;lower&lt;span class="p"&gt;,&lt;/span&gt; upper&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;3.531557&lt;/span&gt; &lt;span class="m"&gt;9.373243&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="f-distribution"&gt;
&lt;h3&gt;F-Distribution&lt;/h3&gt;
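&lt;p&gt;The ratio of two independent chi-squared random variables, each divided by its
degrees of freedom, has an F-distribution.  For example, the ratio of the variances of two
independent samples of sizes n1 and n2 from normal populations with equal variances has an
F-distribution with n1-1 and n2-1 degrees of freedom.&lt;/p&gt;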
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# We take two samples from a normal population and take the ratio of variances.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; n1 &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; n2 &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# A 95% confidence interval for our result is:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; alpha &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;0.05&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; c&lt;span class="p"&gt;(&lt;/span&gt;qf&lt;span class="p"&gt;(&lt;/span&gt;alpha&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; n1&lt;span class="p"&gt;,&lt;/span&gt; n2&lt;span class="p"&gt;),&lt;/span&gt; qf&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;alpha&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; n1&lt;span class="p"&gt;,&lt;/span&gt; n2&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.06622087&lt;/span&gt; &lt;span class="m"&gt;9.97919853&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; mu &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; sigma &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; sample1 &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;n1&lt;span class="p"&gt;,&lt;/span&gt; mu&lt;span class="p"&gt;,&lt;/span&gt; sigma&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; sample2 &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;n2&lt;span class="p"&gt;,&lt;/span&gt; mu&lt;span class="p"&gt;,&lt;/span&gt; sigma&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ratio &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; var&lt;span class="p"&gt;(&lt;/span&gt;sample1&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;var&lt;span class="p"&gt;(&lt;/span&gt;sample2&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Calculated ratio is:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ratio
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.2974744&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
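&lt;p&gt;A sketch that checks the definition directly, with 3 and 4 degrees of freedom and 10000 replicates (arbitrary choices): it builds F variables from chi-square draws, then checks that about 95% of simulated variance ratios fall inside the interval given by qf.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(2)
d1 = 3; d2 = 4
# Ratio of independent chi-square variables, each divided by its degrees of freedom.
f = (rchisq(10000, d1) / d1) / (rchisq(10000, d2) / d2)
# Simulated and theoretical medians should be close.
median(f)
qf(0.5, d1, d2)
# Variance ratios of normal samples of sizes d1 + 1 and d2 + 1 follow F(d1, d2).
ratios = replicate(10000, var(rnorm(d1 + 1, 10, 5)) / var(rnorm(d2 + 1, 10, 5)))
# Fraction of ratios inside the central 95% interval; should be close to 0.95.
inside = mean(findInterval(ratios, qf(c(0.025, 0.975), d1, d2)) == 1)
inside
```
&lt;/pre&gt;&lt;/div&gt;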
&lt;/div&gt;
&lt;div class="section" id="students-s-t-distribution"&gt;
&lt;h3&gt;Student's t-distribution&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The distribution of a standardized sample mean, (sample mean - μ)/(s/√n), when the standard deviation s
is estimated from the sample.  It has heavier tails than a normal distribution because it takes into account
the uncertainty in the estimated standard deviation.&lt;/li&gt;
&lt;li&gt;Depends on the number of degrees of freedom of the estimated standard deviation (n-1 for a single sample of size n).&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Calculate the value of x for which the cumulative distribution of the&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# t-distribution is 0.05, for degrees of freedom = 1000.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; qt&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;-1.646379&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Now for degrees for freedom = 3.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; qt&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;-2.353363&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Get the density of a t-distribution.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; dt&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.3983438&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Get the cumultive distribution&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; pt&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.8412238&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
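&lt;p&gt;The effect of estimating the standard deviation can be seen by simulation.  A sketch assuming samples of size 4 (so 3 degrees of freedom) from a normal population with mean 10 and standard deviation 2 (arbitrary choices); the standardized means follow the t-distribution rather than the normal.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(3)
n = 4; mu = 10; sigma = 2
# Standardized mean using the sample standard deviation s instead of sigma.
tstats = replicate(10000, {
  x = rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})
# The simulated 5% quantile is close to qt(0.05, n - 1), not to qnorm(0.05).
quantile(tstats, 0.05)
qt(0.05, df = n - 1)
qnorm(0.05)
# As the degrees of freedom grow, qt approaches qnorm.
qt(0.05, df = c(3, 10, 30, 1000))
```
&lt;/pre&gt;&lt;/div&gt;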
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="student-s-t-test"&gt;
&lt;h2&gt;Student's t-test&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Assesses the statistical significance of the difference between two sample
means.&lt;/li&gt;
&lt;li&gt;Can have paired or unpaired samples (related or independent).&lt;/li&gt;
&lt;li&gt;The classic (Student's) version assumes the two populations have equal variances.  In practice it is
often used as long as one sample variance is no more than about three times the other.&lt;/li&gt;
&lt;li&gt;Welch's t-test is an extension that does not assume equal variances (it is the default in R's t.test).&lt;/li&gt;
&lt;/ul&gt;
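&lt;p&gt;A sketch comparing the two versions on hypothetical data (the names g1 and g2 and the spreads 1 and 4 are illustrative): var.equal = TRUE gives Student's pooled test, while the default of t.test is Welch's test.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(4)
# Hypothetical samples: same mean, very different spreads.
g1 = rnorm(10, 10, 1)
g2 = rnorm(10, 10, 4)
student = t.test(g1, g2, var.equal = TRUE)   # pooled variance, 18 dof
welch = t.test(g1, g2)                       # Welch's test, R's default
c(student = unname(student$parameter), welch = unname(welch$parameter))
c(student = student$p.value, welch = welch$p.value)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With equal sample sizes the two t statistics are identical; Welch's test only lowers the degrees of freedom, which gives a larger p-value.&lt;/p&gt;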
&lt;div class="section" id="independent-example"&gt;
&lt;h3&gt;Independent Example&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Generate a random vector containing 10 values from a normal distibution&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# with mean 10 and standard deviation 2.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; x &lt;span class="o"&gt;=&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; x
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="m"&gt;7.288907&lt;/span&gt; &lt;span class="m"&gt;10.679290&lt;/span&gt;  &lt;span class="m"&gt;7.021853&lt;/span&gt; &lt;span class="m"&gt;12.356291&lt;/span&gt; &lt;span class="m"&gt;11.973576&lt;/span&gt; &lt;span class="m"&gt;13.452286&lt;/span&gt; &lt;span class="m"&gt;10.765847&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;12.815315&lt;/span&gt; &lt;span class="m"&gt;10.776181&lt;/span&gt; &lt;span class="m"&gt;11.478132&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Generate another similar random vector.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; y &lt;span class="o"&gt;=&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; y
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;10.228506&lt;/span&gt;  &lt;span class="m"&gt;9.307230&lt;/span&gt;  &lt;span class="m"&gt;9.007056&lt;/span&gt; &lt;span class="m"&gt;11.699393&lt;/span&gt; &lt;span class="m"&gt;12.524014&lt;/span&gt; &lt;span class="m"&gt;15.061729&lt;/span&gt;  &lt;span class="m"&gt;9.449236&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;14.378682&lt;/span&gt; &lt;span class="m"&gt;10.251995&lt;/span&gt;  &lt;span class="m"&gt;9.862443&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Perform the t-test on them.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# The p-value is the chance that the difference between the means would be&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# this large if the null hypothesis were true.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; t.test&lt;span class="p"&gt;(&lt;/span&gt;x&lt;span class="p"&gt;,&lt;/span&gt; y&lt;span class="p"&gt;,&lt;/span&gt; var.equal&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    Two Sample t&lt;span class="o"&gt;-&lt;/span&gt;test

data:  x and y
t &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;-0.3273&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; df &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; p&lt;span class="o"&gt;-&lt;/span&gt;value &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.7472&lt;/span&gt;
alternative hypothesis: true difference in means is not equal to &lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="m"&gt;95&lt;/span&gt; percent confidence interval:
 &lt;span class="m"&gt;-2.346419&lt;/span&gt;  &lt;span class="m"&gt;1.713897&lt;/span&gt;
sample estimates:
mean of x mean of y
 &lt;span class="m"&gt;10.86077&lt;/span&gt;  &lt;span class="m"&gt;11.17703&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
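&lt;p&gt;The pooled-variance t statistic can also be computed by hand.  A sketch on fresh simulated data (hypothetical names g1 and g2, so x and y above are left untouched); the manual statistic and two-sided p-value should match t.test with var.equal = TRUE.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(5)
g1 = rnorm(10, 10, 2)
g2 = rnorm(10, 10, 2)
n1 = length(g1); n2 = length(g2)
# Pooled estimate of the common variance, with n1 + n2 - 2 degrees of freedom.
sp2 = ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
# t statistic: difference in means over its standard error.
tstat = (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
# Two-sided p-value from the t-distribution.
pval = 2 * pt(-abs(tstat), df = n1 + n2 - 2)
res = t.test(g1, g2, var.equal = TRUE)
c(manual = tstat, t.test = unname(res$statistic))
c(manual = pval, t.test = res$p.value)
```
&lt;/pre&gt;&lt;/div&gt;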
&lt;/div&gt;
&lt;div class="section" id="related-example"&gt;
&lt;h3&gt;Related Example&lt;/h3&gt;
&lt;p&gt;(Continues from the example above.)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Generate the differences between two sets (mean=1, stdev=2).&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; e &lt;span class="o"&gt;=&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; z &lt;span class="o"&gt;=&lt;/span&gt; x &lt;span class="o"&gt;+&lt;/span&gt; e
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; t.test&lt;span class="p"&gt;(&lt;/span&gt;x&lt;span class="p"&gt;,&lt;/span&gt; z&lt;span class="p"&gt;,&lt;/span&gt; paired&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; var.equal&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    Paired t&lt;span class="o"&gt;-&lt;/span&gt;test

data:  x and z
t &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.1081&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; df &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; p&lt;span class="o"&gt;-&lt;/span&gt;value &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.9163&lt;/span&gt;
alternative hypothesis: true difference in means is not equal to &lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="m"&gt;95&lt;/span&gt; percent confidence interval:
 &lt;span class="m"&gt;-1.687779&lt;/span&gt;  &lt;span class="m"&gt;1.857237&lt;/span&gt;
sample estimates:
mean of the differences
             &lt;span class="m"&gt;0.08472894&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
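&lt;p&gt;A paired t-test is just a one-sample t-test on the differences.  A sketch on simulated data (hypothetical names before and after) showing that the manual statistic, the paired test and the one-sample test on the differences agree.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(6)
before = rnorm(10, 10, 2)
after = before + rnorm(10, 1, 2)
d = before - after
# Manual paired t statistic: mean difference over its standard error, n - 1 dof.
tstat = mean(d) / (sd(d) / sqrt(length(d)))
paired = t.test(before, after, paired = TRUE)
onesample = t.test(d)
c(manual = tstat, paired = unname(paired$statistic), one.sample = unname(onesample$statistic))
```
&lt;/pre&gt;&lt;/div&gt;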
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="analysis-of-variance"&gt;
&lt;h2&gt;Analysis of Variance&lt;/h2&gt;
&lt;p&gt;Variance Ratio(F) = (Between conditions variance)/(Error variance)&lt;/p&gt;
&lt;p&gt;ANOVA assumes that the samples come from normally distributed populations with equal variances.&lt;/p&gt;
&lt;p&gt;The distribution of the F statistic depends on the degrees of freedom (dof) of the two variances.
These should always be reported with the F value:&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;F&lt;/mi&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;f&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;b&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mo&gt;.&lt;/mo&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;,&lt;/mo&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;f&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;/math&gt; = calculated value&lt;/p&gt;
&lt;p&gt;The p-value is then found from a table or calculation.&lt;/p&gt;
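&lt;p&gt;In R the calculation is done with pf and qf.  As an example, the values F = 24.829 with 3 and 12 degrees of freedom are taken from the one-factor ANOVA table in the next section.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
# p-value: probability that an F(3, 12) variable exceeds the calculated value 24.829.
p = pf(24.829, df1 = 3, df2 = 12, lower.tail = FALSE)
p
# Critical value: the F that must be exceeded for significance at the 5% level.
fcrit = qf(0.95, df1 = 3, df2 = 12)
fcrit
```
&lt;/pre&gt;&lt;/div&gt;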
&lt;/div&gt;
&lt;div class="section" id="one-factor-independent-measures-anova"&gt;
&lt;h2&gt;One factor independent measures ANOVA&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;also called completely randomised design ANOVA&lt;/li&gt;
&lt;li&gt;We have &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;/mrow&gt;&lt;/math&gt; conditions, with &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt; samples in the ith condition, and &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;/mrow&gt;&lt;/math&gt; samples overall.&lt;/li&gt;
&lt;li&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;/mover&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt; denotes the sample mean in the ith condition.&lt;/li&gt;
&lt;li&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;/mover&gt;&lt;/mrow&gt;&lt;/math&gt; denotes the overall mean of the data.&lt;/li&gt;
&lt;li&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt; is the jth observation in the ith condition.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Between conditions variance =
&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/munder&gt;&lt;msub&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;/mover&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;/mover&gt;&lt;msup&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;mo&gt;/&lt;/mo&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;/math&gt;
&lt;/p&gt;
&lt;p&gt;Error variance =
&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;/munder&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;msub&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;/mover&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;msup&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;mo&gt;/&lt;/mo&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
&lt;p&gt;For two conditions it is mathematically equivalent to the equal-variance t-test: F is the square of t and the p-values are identical.&lt;/p&gt;
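&lt;p&gt;A sketch that checks this on simulated data with two conditions of five values each (arbitrary choices): the F value from aov should equal the square of the equal-variance t statistic.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(7)
value = c(rnorm(5, 10, 2), rnorm(5, 12, 2))
group = factor(rep(c("a", "b"), each = 5))
# F value from a one-factor ANOVA with two conditions.
fval = summary(aov(value ~ group))[[1]][["F value"]][1]
# Equal-variance t statistic on the same data.
tval = unname(t.test(value ~ group, var.equal = TRUE)$statistic)
c(F = fval, t.squared = tval^2)
```
&lt;/pre&gt;&lt;/div&gt;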
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Number of points in each data set.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ni &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Generate four sets of data with different means.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; a &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; b &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; c &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; d &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Merge them all together into a data frame.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; values &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; c&lt;span class="p"&gt;(&lt;/span&gt;a&lt;span class="p"&gt;,&lt;/span&gt; b&lt;span class="p"&gt;,&lt;/span&gt; c&lt;span class="p"&gt;,&lt;/span&gt; d&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; letters &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; c&lt;span class="p"&gt;(&lt;/span&gt;rep&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; ni&lt;span class="p"&gt;),&lt;/span&gt; rep&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; ni&lt;span class="p"&gt;),&lt;/span&gt; rep&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;c&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; ni&lt;span class="p"&gt;),&lt;/span&gt; rep&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;d&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; ni&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; df &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; data.frame&lt;span class="p"&gt;(&lt;/span&gt;letter&lt;span class="o"&gt;=&lt;/span&gt;letters&lt;span class="p"&gt;,&lt;/span&gt; value&lt;span class="o"&gt;=&lt;/span&gt;values&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Perform an anova analysis.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; fit &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; aov&lt;span class="p"&gt;(&lt;/span&gt;value ~ letter&lt;span class="p"&gt;,&lt;/span&gt; data&lt;span class="o"&gt;=&lt;/span&gt;df&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; summary&lt;span class="p"&gt;(&lt;/span&gt;fit&lt;span class="p"&gt;)&lt;/span&gt;
            Df Sum Sq Mean Sq &lt;span class="k-Variable"&gt;F&lt;/span&gt; value    Pr&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="k-Variable"&gt;F&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
letter       &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="m"&gt;91.902&lt;/span&gt; &lt;span class="m"&gt;30.6342&lt;/span&gt;  &lt;span class="m"&gt;24.829&lt;/span&gt; &lt;span class="m"&gt;1.964&lt;/span&gt;e&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;05&lt;/span&gt; &lt;span class="o"&gt;***&lt;/span&gt;
Residuals   &lt;span class="m"&gt;12&lt;/span&gt; &lt;span class="m"&gt;14.805&lt;/span&gt;  &lt;span class="m"&gt;1.2338&lt;/span&gt;
&lt;span class="o"&gt;---&lt;/span&gt;
Signif. codes:  &lt;span class="m"&gt;0&lt;/span&gt; ‘&lt;span class="o"&gt;***&lt;/span&gt;’ &lt;span class="m"&gt;0.001&lt;/span&gt; ‘&lt;span class="o"&gt;**&lt;/span&gt;’ &lt;span class="m"&gt;0.01&lt;/span&gt; ‘&lt;span class="o"&gt;*&lt;/span&gt;’ &lt;span class="m"&gt;0.05&lt;/span&gt; ‘&lt;span class="m"&gt;.&lt;/span&gt;’ &lt;span class="m"&gt;0.1&lt;/span&gt; ‘ ’ &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
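&lt;p&gt;The two variances can also be computed directly from the formulas above.  A sketch assuming four conditions of four values each, as in the example, but with its own seed, so the numbers will differ from the table above; the results should match the Mean Sq and F value columns of summary(aov(...)).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(8)
ni = 4
groups = list(a = rnorm(ni, 10, 2), b = rnorm(ni, 12, 2),
              c = rnorm(ni, 14, 2), d = rnorm(ni, 16, 2))
K = length(groups)
N = K * ni
means = sapply(groups, mean)
grand = mean(unlist(groups))
# Between conditions variance: sum_i n_i (Ybar_i - Ybar)^2 / (K - 1).
var_between = sum(ni * (means - grand)^2) / (K - 1)
# Error variance: sum_ij (Y_ij - Ybar_i)^2 / (N - K).
var_error = sum(sapply(groups, function(g) sum((g - mean(g))^2))) / (N - K)
Fval = var_between / var_error
pval = pf(Fval, K - 1, N - K, lower.tail = FALSE)
c(between = var_between, error = var_error, F = Fval, p = pval)
# The same numbers appear in the Mean Sq, F value and p-value columns of aov().
dat = data.frame(letter = rep(names(groups), each = ni), value = unlist(groups))
summary(aov(value ~ letter, data = dat))
```
&lt;/pre&gt;&lt;/div&gt;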
&lt;/div&gt;
&lt;div class="section" id="post-hoc-tests"&gt;
&lt;h2&gt;Post-hoc Tests&lt;/h2&gt;
&lt;p&gt;If we find an effect with the F-test then we need to work out where it is
coming from.  We do this with post-hoc tests.  Because they make many pairwise
comparisons, the risk is an increased chance of a Type I error.&lt;/p&gt;
&lt;dl class="docutils"&gt;
&lt;dt&gt;Least Significant Difference Test&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last simple"&gt;
&lt;li&gt;takes no account of the number of comparisons being made.&lt;/li&gt;
&lt;li&gt;increased risk of Type I error is simply accepted.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Newman-Keuls Test&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last simple"&gt;
&lt;li&gt;??&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Duncan Test&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last simple"&gt;
&lt;li&gt;??&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Tukey Test&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last simple"&gt;
&lt;li&gt;We have K conditions, with n samples in each. N=nK.&lt;/li&gt;
&lt;li&gt;The Studentized Range is the range of the condition means (largest minus smallest) divided by an
estimate of the standard error of a mean.&lt;/li&gt;
&lt;li&gt;We find the value of the Studentized Range that, if there are no real differences between the conditions,
is exceeded with only some defined probability (e.g. 5%).&lt;/li&gt;
&lt;li&gt;If the means of any two conditions differ by more than this value (times the standard error of a mean),
the difference is significant.&lt;/li&gt;
&lt;li&gt;Depends on the number of conditions and on the dof of the standard deviation estimate.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
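&lt;p&gt;The critical value of the Studentized Range is available from qtukey.  A sketch assuming K = 4 conditions with n = 4 samples each, as in the ANOVA example, that simulates the Studentized Range when there is no real effect and compares its 95th percentile with qtukey(0.95, K, K*(n-1)).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```r
set.seed(9)
K = 4; n = 4
# Studentized range for K conditions with n samples each and no real effect:
# range of the condition means divided by the estimated standard error of a mean.
srange = replicate(10000, {
  y = matrix(rnorm(K * n), nrow = n)    # one column per condition
  s = sqrt(mean(apply(y, 2, var)))      # pooled sd, K * (n - 1) dof
  diff(range(colMeans(y))) / (s / sqrt(n))
})
# Simulated 95th percentile vs the value given by qtukey.
quantile(srange, 0.95)
qtukey(0.95, nmeans = K, df = K * (n - 1))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With equal group sizes, this critical value multiplied by sqrt(vare/n), where vare is the residual variance computed below, is the half-width of the TukeyHSD intervals (about 2.33 in the output below).&lt;/p&gt;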
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; th &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; TukeyHSD&lt;span class="p"&gt;(&lt;/span&gt;fit&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; th
  Tukey multiple comparisons of means
    &lt;span class="m"&gt;95&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; family&lt;span class="o"&gt;-&lt;/span&gt;wise confidence level

Fit: aov&lt;span class="p"&gt;(&lt;/span&gt;formula &lt;span class="o"&gt;=&lt;/span&gt; value ~ letter&lt;span class="p"&gt;,&lt;/span&gt; data &lt;span class="o"&gt;=&lt;/span&gt; df&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;$&lt;/span&gt;letter
         diff       lwr      upr     p adj
b&lt;span class="o"&gt;-&lt;/span&gt;a &lt;span class="m"&gt;0.7224116&lt;/span&gt; &lt;span class="m"&gt;-1.609443&lt;/span&gt; &lt;span class="m"&gt;3.054266&lt;/span&gt; &lt;span class="m"&gt;0.7949923&lt;/span&gt;
c&lt;span class="o"&gt;-&lt;/span&gt;a &lt;span class="m"&gt;4.8536226&lt;/span&gt;  &lt;span class="m"&gt;2.521768&lt;/span&gt; &lt;span class="m"&gt;7.185477&lt;/span&gt; &lt;span class="m"&gt;0.0002394&lt;/span&gt;
d&lt;span class="o"&gt;-&lt;/span&gt;a &lt;span class="m"&gt;5.3724864&lt;/span&gt;  &lt;span class="m"&gt;3.040632&lt;/span&gt; &lt;span class="m"&gt;7.704341&lt;/span&gt; &lt;span class="m"&gt;0.0000919&lt;/span&gt;
c&lt;span class="o"&gt;-&lt;/span&gt;b &lt;span class="m"&gt;4.1312110&lt;/span&gt;  &lt;span class="m"&gt;1.799356&lt;/span&gt; &lt;span class="m"&gt;6.463066&lt;/span&gt; &lt;span class="m"&gt;0.0009963&lt;/span&gt;
d&lt;span class="o"&gt;-&lt;/span&gt;b &lt;span class="m"&gt;4.6500748&lt;/span&gt;  &lt;span class="m"&gt;2.318220&lt;/span&gt; &lt;span class="m"&gt;6.981929&lt;/span&gt; &lt;span class="m"&gt;0.0003539&lt;/span&gt;
d&lt;span class="o"&gt;-&lt;/span&gt;c &lt;span class="m"&gt;0.5188638&lt;/span&gt; &lt;span class="m"&gt;-1.812991&lt;/span&gt; &lt;span class="m"&gt;2.850718&lt;/span&gt; &lt;span class="m"&gt;0.9097766&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Now try to do the same thing but more manually for (b-a) comparison.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Variance of residual errors.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; vare &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; mean&lt;span class="p"&gt;(&lt;/span&gt;c&lt;span class="p"&gt;(&lt;/span&gt;var&lt;span class="p"&gt;(&lt;/span&gt;a&lt;span class="p"&gt;),&lt;/span&gt; var&lt;span class="p"&gt;(&lt;/span&gt;b&lt;span class="p"&gt;),&lt;/span&gt; var&lt;span class="p"&gt;(&lt;/span&gt;c&lt;span class="p"&gt;),&lt;/span&gt; var&lt;span class="p"&gt;(&lt;/span&gt;d&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Normalize the difference between the means the estimate of the standard deviation&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# of the means.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; st_range &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;mean&lt;span class="p"&gt;(&lt;/span&gt;d&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;mean&lt;span class="p"&gt;(&lt;/span&gt;a&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;sqrt&lt;span class="p"&gt;(&lt;/span&gt;vare&lt;span class="o"&gt;/&lt;/span&gt;ni&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# ptukey: ptukey(q, nmeans, df, nranges = 1, lower.tail = TRUE, log.p = FALSE)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#  q - a given studentized range&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#  nmeans - the number of samples (i.e. number of means in this case)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#  df - degrees of freedom in the calculation of the stdev.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# It returns the probability that the studentized range of the sample of means is&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# less than the given value of q.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# For our example nmeans is clearly 4 (a,b,c and d)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# And df is 4*(ni-1) because we calculated the variance from 4 sets of samples each of&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# which had (ni-1) degrees of freedom.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; manual &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; ptukey&lt;span class="p"&gt;(&lt;/span&gt;st_range&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; manual
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;9.19243&lt;/span&gt;e&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;05&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; th&lt;span class="p"&gt;$&lt;/span&gt;letter&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;d-a&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;quot;p adj&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; manual &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;e&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="kc"&gt;TRUE&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
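&lt;p&gt;The critical value described above can also be computed directly with qtukey().  The sketch below is a self-contained version with simulated data (the seed, group means and group size are arbitrary assumptions, not the data used above); it computes the "honest significant difference" and checks that it matches the half-width of the intervals TukeyHSD() reports when all groups have the same size.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;# Simulated data: 4 conditions with 5 samples each (arbitrary illustrative values).
set.seed(1)
K &amp;lt;- 4
n &amp;lt;- 5
dat &amp;lt;- data.frame(letter = factor(rep(c("a", "b", "c", "d"), each = n)),
                  value = rnorm(K * n, mean = rep(c(10, 10, 14, 15), each = n)))
fit &amp;lt;- aov(value ~ letter, data = dat)
# Pooled within-group variance (residual mean square) with N - K degrees of freedom.
mse &amp;lt;- sum(residuals(fit)^2) / df.residual(fit)
# Studentized range that the range of K means stays below 95% of the time under H0.
q.crit &amp;lt;- qtukey(0.95, nmeans = K, df = K * (n - 1))
# Smallest difference between two means that counts as significant.
hsd &amp;lt;- q.crit * sqrt(mse / n)
means &amp;lt;- tapply(dat$value, dat$letter, mean)
abs(outer(means, means, "-")) &amp;gt; hsd   # TRUE where a pair differs significantly
# With equal group sizes every TukeyHSD interval has half-width hsd.
th &amp;lt;- TukeyHSD(fit)
half.width &amp;lt;- unname(th$letter[, "upr"] - th$letter[, "diff"])
all.equal(half.width, rep(hsd, choose(K, 2)))
&lt;/pre&gt;&lt;/div&gt;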
&lt;dl class="docutils"&gt;
&lt;dt&gt;Scheffé Test&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last"&gt;
&lt;li&gt;&lt;p class="first"&gt;Very similar to Turkey method except we do not limit to pairwise comparisons.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Let μ1, ..., μr be the means of some variable in r disjoint populations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Begin cut and paste from wikipedia:
An arbitrary contrast is defined by&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/munderover&gt;&lt;msub&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;msub&gt;&lt;mi&gt;μ&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
&lt;p&gt;where&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/munderover&gt;&lt;msub&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;mo&gt;.&lt;/mo&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
&lt;p&gt;If μ1, ..., μr are all equal to each other, then all contrasts among them
are 0. Otherwise, some contrasts differ from 0.&lt;/p&gt;
&lt;p&gt;Technically there are infinitely many contrasts. The simultaneous
confidence coefficient is exactly 1 − α, whether the factor level
sample sizes are equal or unequal. (Usually only a finite number of
comparisons are of interest. In this case, Scheffé's method is
typically quite conservative, and the experimental error rate will
generally be much smaller than α.)&lt;/p&gt;
&lt;p&gt;We estimate C by&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;^&lt;/mo&gt;&lt;/mover&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/munderover&gt;&lt;msub&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;msub&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;/mover&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
&lt;p&gt;for which the estimated variance is&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mrow&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;^&lt;/mo&gt;&lt;/mover&gt;&lt;/mrow&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msubsup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;msubsup&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;σ&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;^&lt;/mo&gt;&lt;/mover&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msubsup&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/munderover&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;mo&gt;.&lt;/mo&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
&lt;p&gt;It can be shown that the probability is 1 − α that all confidence limits of the following type are simultaneously correct:&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;^&lt;/mo&gt;&lt;/mover&gt;&lt;mo&gt;±&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mover&gt;&lt;mrow&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/mrow&gt;&lt;mo&gt;^&lt;/mo&gt;&lt;/mover&gt;&lt;/msub&gt;&lt;msqrt&gt;&lt;mrow&gt;&lt;mfenced open="(" close=")"&gt;&lt;mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;/mfenced&gt;&lt;msub&gt;&lt;mi&gt;F&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;α&lt;/mi&gt;&lt;mo&gt;;&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;;&lt;/mo&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/msqrt&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Perform a Scheffe test to see if the average of samples a and b is significantly&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# different from the average of c and d.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; cs &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; c&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;-0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;-0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; means &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; tapply&lt;span class="p"&gt;(&lt;/span&gt;df&lt;span class="p"&gt;$&lt;/span&gt;value&lt;span class="p"&gt;,&lt;/span&gt; df&lt;span class="p"&gt;$&lt;/span&gt;letter&lt;span class="p"&gt;,&lt;/span&gt; mean&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; c.hat &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; sum&lt;span class="p"&gt;(&lt;/span&gt;cs&lt;span class="o"&gt;*&lt;/span&gt;means&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s2 &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; vare&lt;span class="o"&gt;/&lt;/span&gt;ni &lt;span class="o"&gt;*&lt;/span&gt; sum&lt;span class="p"&gt;(&lt;/span&gt;cs&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; fval &lt;span class="o"&gt;=&lt;/span&gt; c.hat&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;s2&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; fval
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;6.100458&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# The chance of any linear combination deviating by this much if all samples&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# were from the same normal population.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; pf&lt;span class="p"&gt;(&lt;/span&gt;fval&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;ni&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.009188318&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
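&lt;p&gt;The confidence-limit form of the Scheffé test given above can be sketched in the same way.  This is a self-contained example on simulated data (the seed, group means and group size are arbitrary assumptions); it builds the simultaneous 95% interval for the same contrast, (c + d)/2 - (a + b)/2, and checks that the interval excludes 0 exactly when the F statistic exceeds the critical F value.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;# Simulated data: r = 4 groups of n = 6 (arbitrary illustrative values).
set.seed(2)
r &amp;lt;- 4
n &amp;lt;- 6
N &amp;lt;- r * n
dat &amp;lt;- data.frame(letter = factor(rep(c("a", "b", "c", "d"), each = n)),
                  value = rnorm(N, mean = rep(c(10, 10, 13, 13), each = n)))
fit &amp;lt;- aov(value ~ letter, data = dat)
mse &amp;lt;- sum(residuals(fit)^2) / (N - r)       # estimate of sigma_e^2
cs &amp;lt;- c(-0.5, -0.5, 0.5, 0.5)                # contrast coefficients, summing to 0
means &amp;lt;- tapply(dat$value, dat$letter, mean)
c.hat &amp;lt;- sum(cs * means)
s.c &amp;lt;- sqrt(mse * sum(cs^2 / n))             # s for C-hat from the variance formula
f.crit &amp;lt;- qf(0.95, r - 1, N - r)             # F_{alpha; r-1; N-r} with alpha = 0.05
half &amp;lt;- s.c * sqrt((r - 1) * f.crit)
c(lower = c.hat - half, estimate = c.hat, upper = c.hat + half)
# Equivalent test: compare C-hat^2 / ((r - 1) * s^2) with the critical F value.
f.stat &amp;lt;- c.hat^2 / ((r - 1) * s.c^2)
(abs(c.hat) &amp;gt; half) == (f.stat &amp;gt; f.crit)
&lt;/pre&gt;&lt;/div&gt;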
&lt;/div&gt;
&lt;div class="section" id="analyzing-frequency-data"&gt;
&lt;h2&gt;Analyzing Frequency Data&lt;/h2&gt;
&lt;p&gt;A population can be divided into c categories.  The chance of an observation being
of category i is &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt;.  We make N observations and find &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt; in category
i.&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/munder&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;msup&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;mrow&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/mrow&gt;&lt;/math&gt; can be approximated by a chi-squared
distribution with c-1 degrees of freedom as long as &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;/math&gt; is greater than 5 for
all categories.&lt;/p&gt;
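&lt;p&gt;As a sketch of the statistic above, the following computes it by hand for simulated counts and compares the result with R's chisq.test().  The category probabilities, sample size and seed are arbitrary assumptions chosen so that every expected count is well above 5.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;set.seed(3)
p &amp;lt;- c(0.2, 0.3, 0.5)                     # hypothesised p_i (illustrative)
N &amp;lt;- 200
counts &amp;lt;- as.vector(rmultinom(1, N, p))    # observed n_i
expected &amp;lt;- N * p                          # N p_i: 40, 60 and 100
chi2 &amp;lt;- sum((counts - expected)^2 / expected)
p.value &amp;lt;- pchisq(chi2, df = length(p) - 1, lower.tail = FALSE)
res &amp;lt;- chisq.test(counts, p = p)           # same test, c - 1 degrees of freedom
c(manual = chi2, chisq.test = unname(res$statistic))
c(manual = p.value, chisq.test = res$p.value)
&lt;/pre&gt;&lt;/div&gt;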
&lt;p&gt;Possible Uses:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;A goodness-of-fit test. To see whether a sample is consistent with, say, a normal
distribution, bin the observations and compare the observed frequencies with
those expected. Each parameter estimated from the sample (e.g. the mean and standard
deviation) removes one further degree of freedom.&lt;/li&gt;
&lt;li&gt;A test of independence.  If each observation is classified by two categorical
variables, we can check whether the two classifications are independent by seeing how
far the observed frequencies deviate from those expected under independence
(row total × column total / N).  For a table with R rows and C columns the statistic
has (R-1)(C-1) degrees of freedom.&lt;/li&gt;
&lt;/ul&gt;
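&lt;p&gt;A sketch of the test of independence on a small contingency table.  The counts are invented purely for illustration; the expected frequencies come from the row and column totals, and the result is checked against chisq.test().&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;# Invented 2 x 3 table of counts (rows: one classification, columns: the other).
tab &amp;lt;- matrix(c(20, 30, 50,
                30, 25, 45), nrow = 2, byrow = TRUE)
N &amp;lt;- sum(tab)
# Expected count in each cell if the classifications were independent.
expected &amp;lt;- outer(rowSums(tab), colSums(tab)) / N
chi2 &amp;lt;- sum((tab - expected)^2 / expected)
dof &amp;lt;- (nrow(tab) - 1) * (ncol(tab) - 1)
p.value &amp;lt;- pchisq(chi2, dof, lower.tail = FALSE)
# correct = FALSE: the continuity correction would only apply to 2 x 2 tables.
res &amp;lt;- chisq.test(tab, correct = FALSE)
c(manual = p.value, chisq.test = res$p.value)
&lt;/pre&gt;&lt;/div&gt;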
&lt;/div&gt;
&lt;div class="section" id="linear-correlation"&gt;
&lt;h2&gt;Linear Correlation&lt;/h2&gt;
&lt;p&gt;We have a sample of n (x, y) pairs.
The Pearson correlation coefficient is:&lt;/p&gt;
&lt;p&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/munder&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mover&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;¯&lt;/mo&gt;&lt;/mover&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mover&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mo&gt;¯&lt;/mo&gt;&lt;/mover&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;msqrt&gt;&lt;mrow&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/munder&gt;&lt;msup&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mover&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;¯&lt;/mo&gt;&lt;/mover&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/munder&gt;&lt;msup&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mover&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mo&gt;¯&lt;/mo&gt;&lt;/mover&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;/msqrt&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/math&gt;&lt;/p&gt;
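&lt;p&gt;r=1 is perfect positive correlation, r=0 means no linear correlation, and r=-1 is perfect
negative correlation.  r is also the slope of the least-squares line through the
reduced (standardized) variables, i.e. x and y after subtracting their means and dividing
by their standard deviations.&lt;/p&gt;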
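&lt;p&gt;A short sketch checking the formula for r, and the slope interpretation, against R's cor().  The simulated x and y (seed, slope and noise level) are arbitrary assumptions.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;set.seed(4)
x &amp;lt;- rnorm(20)
y &amp;lt;- 2 * x + rnorm(20)
dx &amp;lt;- x - mean(x)
dy &amp;lt;- y - mean(y)
# Sum of products of deviations over sqrt(sum of squared x deviations * same for y).
r.manual &amp;lt;- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
c(manual = r.manual, cor = cor(x, y))
# Slope of the least-squares line through the reduced (standardized) variables.
zx &amp;lt;- dx / sd(x)
zy &amp;lt;- dy / sd(y)
slope &amp;lt;- unname(coef(lm(zy ~ zx))["zx"])
c(slope = slope, cor = cor(x, y))
&lt;/pre&gt;&lt;/div&gt;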
&lt;p&gt;r is not itself t-distributed, but if the true correlation is zero (and the data are
bivariate normal) the variable t (&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;msqrt&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/mrow&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;-&lt;/mo&gt;&lt;msup&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/msqrt&gt;&lt;/mrow&gt;&lt;/math&gt;) follows Student's
t-distribution with n-2 degrees of freedom.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Create some correlated data.&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; n &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; a &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;n&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; b &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;a &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; rnorm&lt;span class="p"&gt;(&lt;/span&gt;n&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; r &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; cor&lt;span class="p"&gt;(&lt;/span&gt;a&lt;span class="p"&gt;,&lt;/span&gt; b&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; r
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.5985606&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; t &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; r &lt;span class="o"&gt;*&lt;/span&gt; sqrt&lt;span class="p"&gt;((&lt;/span&gt;n&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;r&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; t
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;2.113384&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# The probability of a correlation being this far from 0 by chance is:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; pt&lt;span class="p"&gt;(&lt;/span&gt;t&lt;span class="p"&gt;,&lt;/span&gt; n&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="m"&gt;0.06751678&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
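&lt;p&gt;R's cor.test() carries out the same t test in one call.  The sketch below regenerates data in the same way as above but with a fixed seed (an arbitrary choice, so the numbers will differ from those shown); it uses abs(t) so that the two-sided p-value is also correct when the correlation is negative.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;set.seed(5)
n &amp;lt;- 10
a &amp;lt;- rnorm(n, 12, 1)
b &amp;lt;- 3*a + 6 + rnorm(n, 0, 3)
r &amp;lt;- cor(a, b)
t.manual &amp;lt;- r * sqrt((n - 2) / (1 - r^2))
p.manual &amp;lt;- 2 * pt(abs(t.manual), n - 2, lower.tail = FALSE)
ct &amp;lt;- cor.test(a, b)    # Pearson, two-sided by default
c(manual = t.manual, cor.test = unname(ct$statistic))
c(manual = p.manual, cor.test = ct$p.value)
&lt;/pre&gt;&lt;/div&gt;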
&lt;/div&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/153560552369058570-7091501216404472894?l=reynwar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://reynwar.blogspot.com/feeds/7091501216404472894/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://reynwar.blogspot.com/2011/05/some-statistics-with-r-examples.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/153560552369058570/posts/default/7091501216404472894'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/153560552369058570/posts/default/7091501216404472894'/><link rel='alternate' type='text/html' href='http://reynwar.blogspot.com/2011/05/some-statistics-with-r-examples.html' title='Some Statistics with R examples'/><author><name>Ben</name><uri>http://www.blogger.com/profile/17273765997519157250</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-153560552369058570.post-691433907209891191</id><published>2011-05-28T23:17:00.000-07:00</published><updated>2011-05-28T23:17:07.394-07:00</updated><title type='text'>Conversion of reStructuredText to html.</title><content type='html'>&lt;p&gt;I've recently decided to use reStructuredText for making notes and needed a method to convert them into html.
The reStructuredText contains code snippets as well as mathematical notation so the conversion process needed to
be able to handle that.&lt;/p&gt;
&lt;p&gt;docutils is the obvious candidate for the conversion, but it doesn't do syntax highlighting or MathML out of the box,
so I needed to find extensions that could.&lt;/p&gt;
&lt;p&gt;I decided to use Pygments for the syntax highlighting of the code snippets.  The Pygments package comes with a file
rst-directive.py that creates a directive called 'sourcecode' that can then be used to define code snippets.&lt;/p&gt;
&lt;p&gt;For the maths I found rst2mathml, which adds a directive that converts TeX math notation in a reStructuredText file to
MathML in the HTML output.&lt;/p&gt;
&lt;p&gt;So the list of steps to get this working was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;Install some stuff needed for rst.&lt;/p&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;sudo &lt;span class="pre"&gt;apt-get&lt;/span&gt; install &lt;span class="pre"&gt;python-docutils&lt;/span&gt;&lt;/tt&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Download &lt;a class="reference external" href="http://docutils.sourceforge.net/sandbox/jensj/latex_math/tools/rst2mathml.py"&gt;rst2mathml.py&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Found rst-directive.py in the Pygments constellation and made a copy renamed to rst_directive.py in the same directory as
rst2mathml.py.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Add the following line to the top of rst2mathml.py so that the sourcecode directive can be used.&lt;/p&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;import rst_directive&lt;/tt&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Create a stylesheet for syntax highlighting.&lt;/p&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;pygmentize &lt;span class="pre"&gt;-S&lt;/span&gt; default &lt;span class="pre"&gt;-f&lt;/span&gt; html &lt;span class="pre"&gt;-a&lt;/span&gt; .highlight &amp;gt; style.css&lt;/tt&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Create script to do conversion.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;convert.sh&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="nb"&gt;echo &lt;/span&gt;input file name is &lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;span class="nv"&gt;stem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="p"&gt;%.rst&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
python /home/ben/Documents/Notes/rst2mathml.py --stylesheet-path&lt;span class="o"&gt;=&lt;/span&gt;/home/ben/Documents/Notes/style.css &lt;span class="nv"&gt;$1&lt;/span&gt; &amp;gt; &lt;span class="nv"&gt;$stem&lt;/span&gt;.xhtml
&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/153560552369058570-691433907209891191?l=reynwar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://reynwar.blogspot.com/feeds/691433907209891191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://reynwar.blogspot.com/2011/05/conversion-of-restructuredtext-to-html.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/153560552369058570/posts/default/691433907209891191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/153560552369058570/posts/default/691433907209891191'/><link rel='alternate' type='text/html' href='http://reynwar.blogspot.com/2011/05/conversion-of-restructuredtext-to-html.html' title='Conversion of reStructuredText to html.'/><author><name>Ben</name><uri>http://www.blogger.com/profile/17273765997519157250</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
