Interpreting Non-Significant Results
There are many theories and stories to account for the use of P = 0.05 to denote statistical significance.
The independent t-test, also called the two-sample t-test, independent-samples t-test, or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups.
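The two-sample comparison described above can be sketched in a few lines of code. This is a minimal illustration, not part of the original text; it assumes equal population variances (Student's pooled-variance form) and uses made-up numbers.

```python
import math
import statistics

def unpaired_t(sample_a, sample_b):
    """Student's two-sample t statistic with a pooled variance estimate.

    Assumes equal population variances; the statistic has
    len(sample_a) + len(sample_b) - 2 degrees of freedom.
    """
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Pooled estimate of the common variance.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical measurements from two unrelated groups.
t, df = unpaired_t([7, 12, 54, 103], [20, 25, 30, 35])
```

When the equal-variance assumption is doubtful, Welch's version (separate variances, adjusted degrees of freedom) is the usual alternative.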
The basis for many nonparametric tests involves discarding the actual numbers in the dataset and replacing them with numerical rankings from lowest to highest. Thus, the dataset 7, 12, 54, 103 would be replaced with 1, 2, 3, and 4, respectively. This may sound odd, but the general method, referred to as a rank test, is well grounded. In the case of the Mann-Whitney test, which is used to compare two unpaired groups, data from both groups are combined and ranked numerically (1, 2, 3, …). Then the rank numbers are sorted back into their respective starting groups, and a rank sum is tallied for each group. If both groups were sampled from populations with identical means (the null hypothesis), then there should be relatively little difference in their mean ranks, although chance sampling will lead to some differences. Put another way, high- and low-ranking values should be more or less evenly distributed between the two groups. Thus for the Mann-Whitney test, the P-value will answer the following question: based on the mean ranks of the two groups, what is the probability that they are derived from populations with identical means? As for parametric tests, a P-value ≤ 0.05 is traditionally accepted as statistically significant.
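The pool-rank-and-tally step at the heart of the Mann-Whitney test can be sketched as follows. This is an illustrative fragment with invented data, and it deliberately skips tie handling (real implementations assign tied values their mean rank).

```python
def rank_sums(group1, group2):
    """Combine two samples, rank them from lowest to highest (1, 2, 3, ...),
    then return the rank sum tallied for each group."""
    pooled = sorted((value, label)
                    for label, group in ((1, group1), (2, group2))
                    for value in group)
    sums = {1: 0, 2: 0}
    for rank, (_, label) in enumerate(pooled, start=1):
        sums[label] += rank
    return sums[1], sums[2]

# Under the null hypothesis the two rank sums should be similar;
# a lopsided split suggests the groups come from different populations.
r1, r2 = rank_sums([7, 54], [12, 103])
```

Here the four pooled values receive ranks 1-4, and each group's rank sum is just the total of the ranks its values received.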
For claims about a population mean from a normally distributed population, or for any sample with a large sample size (for which the sample mean will follow a normal distribution by the central limit theorem) with unknown standard deviation, the appropriate significance test is known as the one-sample t-test, where the test statistic is defined as t = (x̄ − μ₀)/(s/√n). The test statistic follows the t distribution with n − 1 degrees of freedom.
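As a quick sketch of that formula in code (an illustration with hypothetical numbers, not an excerpt from the text):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """One-sample t statistic: t = (xbar - mu0) / (s / sqrt(n)),
    with n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (n - 1 in the denominator)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Does this sample differ from a hypothesized population mean of 10.0?
t, df = one_sample_t([9.8, 10.2, 10.4, 9.9, 10.1], 10.0)
```

The resulting t would then be compared against the t distribution with df degrees of freedom to obtain a P-value.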
It is now time to discuss SD in another context that is central to the understanding of statistics. We do this with a thought experiment. Imagine that we determine the brood size for six animals that are randomly selected from a larger population. We could then use these data to calculate a sample mean, as well as a sample SD, which would be based on a sample size of n = 6. Not being satisfied with our efforts, we repeat this approach every day for 10 days, each day obtaining a new mean and new SD. At the end of 10 days, having obtained ten different means, we can now use each sample mean as though it were a single data point to calculate a new mean, which we can call the mean of the means. In addition, we can calculate the SD of these ten mean values, which we can refer to for now as the SD of the means. We can then pose the following question: will the SD calculated using the ten means generally turn out to be a larger or smaller value (on average) than the SD calculated from each sample of six random individuals? This is not merely an idiosyncratic question posed for intellectual curiosity. The notion of the SD of the means is critical to statistical inference. Read on.
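The thought experiment is easy to simulate. The sketch below assumes a hypothetical brood-size population (mean 250, SD 50); the specific numbers are invented for illustration. Sample means cluster much more tightly than individual animals do, which is the point of the exercise.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

def brood_size():
    # Hypothetical population of brood sizes: mean 250, SD 50.
    return random.gauss(250, 50)

# Ten daily samples of six animals each, as in the thought experiment.
daily_means, daily_sds = [], []
for _ in range(10):
    sample = [brood_size() for _ in range(6)]
    daily_means.append(statistics.mean(sample))
    daily_sds.append(statistics.stdev(sample))

sd_of_means = statistics.stdev(daily_means)     # spread of the ten means
typical_sample_sd = statistics.mean(daily_sds)  # typical within-sample SD
# The means vary less than the individuals: roughly SD / sqrt(6).
```

Answering the question posed above: the SD of the means comes out smaller, by roughly a factor of √6 here, foreshadowing the standard error of the mean.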
The paired t-test is a powerful way to detect differences in two sample means, provided that your experiment has been designed to take advantage of this approach. In our example of embryonic GFP expression, the two samples were independent in that the expression within any individual embryo was not linked to the expression in any other embryo. For situations involving independent samples, the paired t-test is not applicable; we carried out an unpaired t-test instead. For the paired method to be valid, data points must be linked in a meaningful way. If you remember from our first example, worms that have a mutation in the gene of interest show lower expression of the ::GFP reporter. In this example of a paired t-test, consider a strain that carries a construct encoding a hairpin dsRNA corresponding to that gene. Using a specific promoter and the appropriate genetic background, the dsRNA will be expressed only in the rightmost cell of one particular neuronal pair, where it is expected to inhibit the expression of the gene via the RNAi response. In contrast, the neuron on the left should be unaffected. In addition, this strain carries the same ::GFP reporter described above, and it is known that this reporter is expressed in both the left and right neurons at identical levels in wild type. The experimental hypothesis is therefore that, analogous to what was observed in embryos, fluorescence of the ::GFP reporter will be weaker in the right neuron, where the gene has been inhibited.
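The paired design amounts to a one-sample t-test on the per-animal differences. The sketch below uses invented left/right fluorescence values for five hypothetical animals, purely to show the mechanics.

```python
import math
import statistics

def paired_t(group1, group2):
    """Paired t statistic: a one-sample t-test of the per-pair
    differences (group2 - group1) against a hypothesized mean of zero."""
    diffs = [b - a for a, b in zip(group1, group2)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical fluorescence in the left (control) and right (RNAi)
# neuron of the same five animals; each row is one linked pair.
left  = [104, 98, 110, 101, 95]
right = [ 96, 93, 104, 97, 90]
t, df = paired_t(left, right)
```

Because each difference is computed within a single animal, animal-to-animal variation cancels out, which is exactly what gives the paired design its power.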
Nevertheless, there are certain kinds of common experiments, such as qRT-PCR, where a sample size of three is quite typical. Of course, by three we do not mean three worms. For each sample in a qRT-PCR experiment, many thousands of worms may have been used to generate a single mRNA extract. Here, three refers to the number of biological replicates. In such cases, it is generally understood that worms for the three extracts may have been grown in parallel but were processed for mRNA isolation and cDNA synthesis separately. Better yet, the templates for each biological replicate may have been grown and processed at different times. In addition, qRT-PCR experiments typically require technical replicates. Here, three or more equal-sized aliquots of cDNA from the same biological replicate are used as the template in individual PCR reactions. Of course, the data from technical replicates will nearly always show less variation than data from true biological replicates. In the case of qRT-PCR, the former are only informative as to the variation introduced by the pipetting or amplification process. As such, technical replicates should be averaged, and this value treated as a single data point.
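The averaging rule described above can be made concrete. The values below are invented stand-ins for normalized expression measurements; the point is only the structure: collapse technical replicates first, then summarize across biological replicates.

```python
import statistics

# Hypothetical normalized expression values: three biological replicates,
# each measured with three technical replicates (same cDNA, separate wells).
technical = {
    "bio1": [1.02, 0.98, 1.01],
    "bio2": [1.35, 1.31, 1.33],
    "bio3": [0.88, 0.91, 0.90],
}

# Collapse each biological replicate to a single data point first...
biological = [statistics.mean(values) for values in technical.values()]
# ...then compute summary statistics across biological replicates (n = 3).
mean_expr = statistics.mean(biological)
```

Treating all nine wells as independent data points would be a mistake: it would understate the true biological variation and inflate the apparent sample size.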

Still, why should the value 0.05 be adopted as the universally accepted value for statistical significance?

The difference between the regression coefficients, though relatively large, cannot be regarded as significant.

while in Fisher [19xx, p. 516] he is willing to pay attention to a value not much different.
There is, however, a problem in using the one-sample approach, which is not statistical but experimental. Namely, there is always the possibility that something about the growth conditions, experimental execution, or alignment of the planets could result in a value for wild type that is different from that of the established norm. If so, these effects would likely conspire to produce a value for the mutant that is different from the traditional wild-type value, even if no real difference exists. This could then lead to a false conclusion of a difference between wild type and mutant. In other words, the statistical test, though valid, would be carried out using flawed data. For this reason, one doesn't often see one-sample tests in the worm literature. Rather, researchers tend to carry out parallel experiments on both populations to avoid being misled. Typically, this is only a minor inconvenience and provides much greater assurance that any conclusions will be legitimate. Along these lines, historical controls, including those carried out by the same lab but at different times, should typically be avoided.
The key is to understand that the t-test is based on a theoretical distribution of differences in sample means, as are many other statistical parameters including 95% CIs of the mean. Thus, for the t-test to be valid, the shape of the actual distribution of differences in sample means must come reasonably close to approximating a normal curve. But how can we know what this distribution would look like without repeating our experiment hundreds or thousands of times? To address this question, we have generated a complementary distribution using a computational resampling method known as bootstrapping (discussed further below). It is a histogram of the differences in means obtained by carrying out 1,000 repeats of our experiment. Importantly, because this histogram was generated using our actual sample data, it automatically takes skewing effects into account. Notice that the data from this histogram closely approximate a normal curve and that the values obtained for the mean and SDs are virtually identical to those obtained using the theoretical distribution. What this tells us is that even though the sample data were indeed somewhat skewed, a t-test will still give a legitimate result. Moreover, from this exercise we can see that with a sufficient sample size, the t-test is quite robust to some degree of non-normality in the underlying population distributions. Issues related to normality are also discussed further below.
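The bootstrapping procedure described above is straightforward to sketch. The two samples below are invented for illustration; the essential move is resampling each group with replacement and recording the difference in resampled means many times.

```python
import random
import statistics

random.seed(1)  # fixed seed for reproducibility

def bootstrap_mean_diffs(sample_a, sample_b, n_boot=1000):
    """Resample each group with replacement and record the difference in
    resampled means; the collected differences approximate the sampling
    distribution of the difference in means."""
    diffs = []
    for _ in range(n_boot):
        resample_a = random.choices(sample_a, k=len(sample_a))
        resample_b = random.choices(sample_b, k=len(sample_b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    return diffs

# Hypothetical expression measurements for two groups of six animals.
a = [10.1, 11.3, 9.8, 12.0, 10.6, 11.1]
b = [ 9.2, 10.0, 8.7,  9.9,  9.4, 10.2]
diffs = bootstrap_mean_diffs(a, b)
center = statistics.mean(diffs)  # close to the observed difference (1.25)
```

A histogram of diffs plays the role of the bootstrap distribution discussed in the text: its center sits near the observed difference in means, and its shape reveals how close to normal the sampling distribution really is.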
One aspect of the t-test that tends to agitate users is the obligation to choose either the one- or two-tailed version of the test. That the term “tails” is not particularly informative only exacerbates the matter. The key difference between the one- and two-tailed versions comes down to the formal statistical question being posed. Namely, the difference lies in the wording of the research question. To illustrate this point, we will start by applying a two-tailed t-test to our example of embryonic GFP expression. In this situation, our typical goal as scientists would be to detect a difference between the two means. This aspiration can be more formally stated in the form of a research or alternative hypothesis. Namely, that the average expression levels of ::GFP in wild type and in the mutant are different. The null hypothesis must convey the opposite sentiment. For the two-tailed t-test, the null hypothesis is simply that the expression of ::GFP in the wild-type and mutant backgrounds is the same. Alternatively, one could state that the difference in expression levels between wild type and mutant is zero.
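The one- versus two-tailed distinction is easiest to see with an exact permutation test, where the P-value is a literal count. The sketch below uses invented data and assumes the observed difference is positive (i.e., the one-tailed alternative is that group A's mean is larger); it is a didactic stand-in for the t-test, not the t-test itself.

```python
import itertools
import statistics

def permutation_p_values(sample_a, sample_b):
    """Exact permutation test: enumerate every way of splitting the pooled
    data into two groups of the original sizes, and count how often the
    difference in means is as extreme as the observed one."""
    observed = statistics.mean(sample_a) - statistics.mean(sample_b)
    pooled = sample_a + sample_b
    n_a = len(sample_a)
    count_two = count_one = total = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = statistics.mean(group_a) - statistics.mean(group_b)
        total += 1
        if abs(diff) >= abs(observed):
            count_two += 1  # two-tailed: extreme in either direction
        if diff >= observed:
            count_one += 1  # one-tailed: extreme in the stated direction only
    return count_two / total, count_one / total

p_two, p_one = permutation_p_values([12, 14, 15], [8, 9, 11])
```

With these numbers the two-tailed P-value (0.10) counts extreme splits in both directions, while the one-tailed P-value (0.05) counts only one direction, which is exactly why a one-tailed test reaches significance more easily and why the choice must be made before looking at the data.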