# KeepNotes blog

Stay hungry, Stay Foolish.

0%

``````library(ggplot2)
library(dplyr)

data <- read.csv(file = "labelfree.csv", sep = ",", header = T, row.names = 1)
rm <- apply(data, 1, function(x){
sum(x == 0) > 3
})
df <- data[!rm,]
> nrow(df)
[1] 5325``````

``````Mean <- apply(df, 2, mean)
SD <- apply(df, 2, sd)
> Mean
A1         A2         A3         B1         B2         B3
2366243926 2351546722 2165457844 2175212680 2708233812 2290120209
> SD
A1          A2          A3          B1          B2          B3
13098498790 14392930144 14179115278 11151997873 12092333164 10344859562``````

T检验或者方差分析之类的有参检验需要数据符合方差齐性，一般随着实验的进行，由于仪器或者人工操作，误差有可能会越来越大，会导致数据波动越来越大。这个时候可以用对数变换把方差变大的趋势给调回来

#### T检验

``````fvalue <- apply(df, 1, function(x){
a <- factor(c(rep("A",3), rep("B",3)))
var.test(x~a)
})
>sum(unlist(lapply(fvalue, function(x){
x\$p.value > 0.05
})))
[1] 4605``````

``````pvalue <- apply(df, 1, function(x){
a <- factor(c(rep("A",3), rep("B",3)))
fvalue <- var.test(x~a)
if (fvalue\$p.value > 0.05){
t.test(x~a, var.equal = T)
}else{
t.test(x~a, var.equal = F)
}
})

result_ttest <- data.frame(ID=row.names(df),
Pvalue = as.numeric(unlist(lapply(pvalue, function(x) x\$p.value))),
log2FC = log2(as.numeric(unlist(lapply(pvalue, function(x) x\$estimate[1]/x\$estimate[2]))))
)
> sum(result_ttest\$Pvalue < 0.05 & abs(result_ttest\$log2FC) > 1)
[1] 704``````

#### Limma

Limma就比较熟悉了，如果就测试数据两组比较而言，limma需要先做log2转化，然后设定分组矩阵，最后用线性模型拟合再用Empirical Bayes test求出检验的P值

``````library(limma)

df1 <- log2(df + 1)
group_list <- factor(c(rep("A",3), rep("B",3)))
design <- model.matrix(~group_list)
colnames(design) <- levels(group_list)
rownames(design) <- colnames(df1)

fit <- lmFit(df1, design)
fit <- eBayes(fit, trend=TRUE)
result_limma <- topTable(fit, coef=2,n=Inf)
>sum(result_limma\$P.Value < 0.05 & abs(result_limma\$logFC) > 1)
[1] 952``````

Limma的差异蛋白数目就相比以上就略多了，满足阈值的有952个