Mutant-allele tumor heterogeneity (MATH)

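I've been a bit restless lately. A short trip to clear my head didn't help much, and it has been more than a month since I last studied anything systematically. Apart from work, I'm not even sure what I've been busy with.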

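When I had time I read some papers. A friend had recommended a clinical research paper published in 2017 in Breast Cancer Res Treat, Clinical and molecular relevance of mutant-allele tumor heterogeneity in breast cancer. It uses the Mutant-allele tumor heterogeneity (MATH) algorithm to assess tumor heterogeneity and examines how MATH correlates with several clinical indicators and omics data. The idea is simple and the results are fairly ordinary, with no major breakthrough, but the MATH algorithm itself is worth a look.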

The MATH algorithm traces back to a 2013 paper in Oral Oncol, MATH, a novel measure of intratumor genetic heterogeneity, is high in poor-outcome classes of head and neck squamous cell carcinoma. The same author later published a paper in Cancer on head and neck squamous cell carcinoma, High intratumor genetic heterogeneity is related to worse outcome in patients with head and neck squamous cell carcinoma, which again supported the usefulness of MATH: among other findings, patients with high MATH had lower overall survival.

Combined with a blog post from abroad, MATH and Tumors, this gave me a rough understanding of how MATH works. Overall it is fairly simple.

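First, what is tumor heterogeneity? It can be divided into inter-tumor and intra-tumor heterogeneity, but unless stated otherwise, tumor heterogeneity here means intra-tumor heterogeneity (ITH). As cancer cells keep growing, the daughter cells produced by division come to differ from cells of the same generation or from their parent cell, so they diverge considerably in many respects. This ultimately leads to differences in tumor growth, invasion, prognosis, and so on. For a rough summary of recent research on tumor heterogeneity, see 【盘点】浅谈肿瘤异质性 (roughly, "Roundup: a brief discussion of tumor heterogeneity").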

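Given this, the author of the 2013 paper wanted to use the MATH measure to see whether patients with high tumor heterogeneity have a worse prognosis. The overall approach of the two papers above is to compute a MATH value for each patient, divide the patients into low, intermediate, and high groups by MATH value, and then assess how these three groups relate to clinical indicators and to mutations and other omics data. So we need to know how the MATH value is calculated. First, the original text from the Cancer paper: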

The MATH value for each tumor was based on the distribution of mutant-allele fractions among tumor-specific mutated loci, calculated as the percentage ratio of the width (median absolute deviation, MAD, scaled by a constant factor so that the expected MAD of a sample from a normal distribution equals the standard deviation) to the center (median) of its distribution:
MATH=100 * MAD/median

Next, the description in the 2017 paper mentioned above:

the steps to determine the MATH value can be summarized as follows: (1) calculating the mutant-allele fraction (MAF) for each locus as the ratio of mutant reads to total reads; (2) obtaining the absolute differences of each MAF from the median MAF value, multiplying the median of these absolute differences by a factor of 1.4826, thus the median absolute deviation (MAD) was generated; (3) calculating MATH as the percentage ratio of the MAD to the median of the MAFs among the tumor’s mutated genomic loci, presented as MATH = 100 * MAD/median.

And the earlier 2013 paper:

Each tumor’s MATH value was calculated from the median absolute deviation (MAD) and the median of its mutant-allele fractions at tumor-specific mutated loci:MATH=100 * MAD/median. Calculation of MAD followed the default in R, with values scaled by a constant factor (1.4826) so that the expected MAD of a sample from a normal distribution equals the standard deviation.

My understanding is as follows:

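  1. First, use the sequencing data to compute the MAF (mutant-allele fraction) of each mutated locus in each sample, i.e. mutant reads divided by total reads. Most software output already includes this value.
  2. Next, compute the MAD (median absolute deviation) from the MAFs: take the absolute difference between each MAF and the median MAF, then multiply the median of these absolute differences by a constant (1.4826). Why the constant? It makes the MAD comparable to a standard deviation. The papers themselves do not explain where the number comes from, but it is the usual consistency constant 1/Φ⁻¹(0.75) ≈ 1.4826, with Φ⁻¹ the standard normal quantile function; for normally distributed data the scaled MAD then estimates the standard deviation, and it is the default in R's mad().
  3. Finally, divide the MAD by the median MAF and multiply by 100.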
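
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration of my reading of the papers, not the authors' code: the function name math_score, the constant name MAD_SCALE, and the read counts are all made up for the example.

```python
from statistics import NormalDist, median

# Consistency constant 1 / Phi^-1(0.75), about 1.4826 (the default in R's mad()).
MAD_SCALE = 1 / NormalDist().inv_cdf(0.75)


def math_score(mafs, scale=MAD_SCALE):
    """Return MATH = 100 * MAD / median for one tumor's mutant-allele fractions.

    Hypothetical helper written for this post, not taken from the papers.
    """
    mafs = list(mafs)
    if not mafs:
        raise ValueError("need at least one mutant-allele fraction")
    center = median(mafs)
    if center <= 0:
        raise ValueError("median MAF must be positive")
    # Step 2: scaled median of the absolute deviations from the median MAF.
    mad = scale * median(abs(x - center) for x in mafs)
    # Step 3: percentage ratio of the MAD to the median MAF.
    return 100 * mad / center


# Step 1: MAF = mutant reads / total reads at each tumor-specific mutated locus.
# The (mutant_reads, total_reads) pairs below are invented for illustration.
read_counts = [(12, 100), (30, 120), (45, 150), (20, 80), (50, 100)]
mafs = [mutant / total for mutant, total in read_counts]

print(round(MAD_SCALE, 4))
print(round(math_score(mafs), 2))
```

In practice the MAFs would come from the variant caller's output for one tumor, restricted to tumor-specific (somatic) mutated loci, and the function would be called once per sample.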

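As for what MATH means: the author regards MATH as an effective measure of the spread of the MAF distribution across tumor-specific mutated loci. In other words, it describes how far the MAFs deviate from the center of that sample's overall MAF distribution (somewhat like a standard deviation). Naturally, the larger the MATH value, the higher the tumor heterogeneity.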
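
To make that intuition concrete, here is a small toy comparison. Both MAF lists are invented and math_score is a hypothetical helper, not anything from the papers: one list has all mutations clustered around a single fraction (one dominant clone), the other adds mutations at lower fractions (subclones).

```python
from statistics import median


def math_score(mafs, scale=1.4826):
    """MATH = 100 * MAD / median (hypothetical helper for this illustration)."""
    center = median(mafs)
    mad = scale * median(abs(x - center) for x in mafs)
    return 100 * mad / center


# Invented MAFs: a single dominant clone, all mutations near the same fraction.
clonal = [0.38, 0.40, 0.41, 0.39, 0.42, 0.40, 0.37, 0.43]
# Invented MAFs: a clonal peak plus subclonal mutations at lower fractions.
mixed = [0.40, 0.41, 0.39, 0.22, 0.20, 0.18, 0.08, 0.10]

print("clonal:", round(math_score(clonal), 1))
print("mixed:", round(math_score(mixed), 1))
```

The second list yields a much larger MATH value than the first, which matches the idea that a wider MAF distribution relative to its median reflects more heterogeneity.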

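That's a quick filler post for now. I need to get back to reading papers regularly; reading anything is still reading, after all.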

This post is from http://www.bioinfo-scrounger.com. Please credit the source when reposting.