MethylationToActivity (M2A)

Authors: Justin Williams, Beisi Xu, Daniel Putnam, Andrew Thrasher, Xiang Chen
Publication: MethylationToActivity: a deep-learning framework that reveals promoter activity landscapes from DNA methylomes in individual tumors
Technical Support: Contact Us

Overview

MethylationToActivity (M2A) is a machine learning framework that uses convolutional neural networks (CNNs) to infer histone modification (HM) enrichment at promoters from whole genome bisulfite sequencing (WGBS) data. M2A currently supports prediction of H3K27ac and H3K4me3 enrichment from WGBS M-values supplied as a tab-delimited text file. Optionally, M2A also supports transfer learning for users who have matching H3K27ac or H3K4me3 ChIP-seq data, with appropriate controls, in addition to WGBS data.

Inputs

Name | Type | Description | Example
Sample HM bigwig file (only if using M2A with Transfer) | Input file | HM ChIP-seq experiment bigwig track. | SampleName_H3K27ac.bw OR SampleName_H3K4me3.bw
Sample HM control (Input) bigwig (only if using M2A with Transfer) | Input file | ChIP-seq experiment control (Input) bigwig track. | SampleName_Input.bw
WGBS data file | Input file | M-values by chromosome and position (non-standard format, see below). | *.txt (tab-delimited)
Promoter region definition file (provided, or user defined) | Input file | File describing promoter regions to be predicted. Provided regions include both hg19 and GRCh38 definitions (non-standard format, see below). | *.txt (tab-delimited)
App-provided model inputs:
Model weights (.h5) file: 1) H3K27ac or 2) H3K4me3

Input file configuration

Promoter region definition file (if user defined)

Column | Description
EnsmblID_T | Ensembl transcript ID (unique)
EnsmblID_G | Ensembl gene ID (not unique)
Gene | Human-readable gene name (abbreviated, not unique)
Strand | +, -
Chr | chr1, chr2, ... chr22, etc.
Start | Beginning of transcript definition
End | End of transcript definition
RStart | TSS - 1000 bp
REnd | TSS + 1000 bp
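
For a user-defined promoter file, the RStart and REnd columns can be derived from the transcript coordinates. The following is a minimal sketch, not the M2A implementation. It assumes that the TSS is Start for + strand transcripts and End for - strand transcripts, which the table above does not state explicitly. The function name `promoter_window` and the example IDs are hypothetical.

```python
def promoter_window(strand, start, end, flank=1000):
    """Return (RStart, REnd) as (TSS - flank, TSS + flank)."""
    # Assumption: the TSS is Start on the + strand and End on the - strand.
    if strand == "+":
        tss = start
    elif strand == "-":
        tss = end
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return tss - flank, tss + flank


def promoter_row(transcript_id, gene_id, gene, strand, chrom, start, end):
    """Build one tab-delimited row in the column order listed above."""
    rstart, rend = promoter_window(strand, start, end)
    fields = [transcript_id, gene_id, gene, strand, chrom, start, end, rstart, rend]
    return "\t".join(str(f) for f in fields)


# Hypothetical transcripts, for illustration only.
plus_row = promoter_row("TX_EXAMPLE_1", "GENE_EXAMPLE_1", "GENEA", "+", "chr1", 5000, 9000)
minus_row = promoter_row("TX_EXAMPLE_2", "GENE_EXAMPLE_2", "GENEB", "-", "chr2", 5000, 9000)
print(plus_row)
print(minus_row)
```

The window is symmetric around the TSS, so each region spans 2000 bp regardless of strand.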

WGBS data file

Column | Description
chrom | Chromosome ID, e.g. 1, 2, 3 ... 22
pos | Position of the 5' cytosine of a CpG on the positive strand
mval | Calculated M-value of a given CpG, typically M-value = log2(Beta / (1 - Beta))
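
If your methylation caller reports Beta values rather than M-values, the conversion can be sketched as below. This is an illustrative example only: the clamping constant `eps`, the function names, the header row, and the example CpG records are assumptions and are not part of the M2A specification.

```python
import math


def beta_to_mvalue(beta, eps=1e-6):
    """Convert a Beta value in [0, 1] to an M-value, log2(Beta / (1 - Beta))."""
    # Clamp away from 0 and 1 so the logarithm stays finite (eps is an assumption).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def format_wgbs_rows(records):
    """Build tab-delimited lines (chrom, pos, mval) from (chrom, pos, beta) tuples."""
    # Assumption: the file starts with a header row naming the three columns.
    lines = ["chrom\tpos\tmval"]
    for chrom, pos, beta in records:
        lines.append(f"{chrom}\t{pos}\t{beta_to_mvalue(beta):.4f}")
    return lines


# Hypothetical CpGs: (chromosome ID, position, Beta value).
example = [("1", 10469, 0.5), ("1", 10471, 0.8), ("2", 20500, 0.0)]
wgbs_lines = format_wgbs_rows(example)
print("\n".join(wgbs_lines))
```

A Beta value of 0.5 maps to an M-value of 0, and fully methylated or unmethylated sites map to large positive or negative values whose magnitude depends on the chosen clamp.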

Outputs

Name | Description
Predictions file | The promoter region definition file with an additional Predicted_log2_ChipDivInput_"YOUR HM MARK HERE" column (tab-delimited).
Transfer model | The updated weights to the HM model (a .hdf5 file; only if using M2A with Transfer)

Preparing to run M2A

Before you can run one of our workflows, you must first create a workspace in DNAnexus for the run. Refer to the general workflow guide to learn how to create a DNAnexus workspace for each workflow run.

Refer to the general workflow guide to learn how to upload input files to the workspace you just created.

Refer to the general workflow guide to learn how to launch the workflow, hook up input files, adjust parameters, start a run, and monitor run progress.

Analysis of Results

The M2A pipeline does not currently produce an interactive visualization. If M2A with Transfer was run, the simplest measures of training prediction accuracy are Pearson's R² or the root mean square error (RMSE) between the measured and M2A-predicted values. Comparing sample-to-sample consistency within the same or a similar cancer type (as measured by Pearson's R²) is also a good starting point for understanding the predictions in context.
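
The two accuracy measures mentioned above can be computed with a few lines of standard-library Python. This is a minimal sketch, assuming the measured and predicted enrichment values have already been loaded into two equal-length lists in the same promoter order. The function names and the example numbers are hypothetical.

```python
import math


def pearson_r2(measured, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return (cov * cov) / (var_m * var_p)


def rmse(measured, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


# Hypothetical values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r2 = pearson_r2(measured, predicted)
error = rmse(measured, predicted)
print(f"R2={r2:.3f} RMSE={error:.3f}")
```

The example shows why both measures are worth reporting: the predictions are perfectly correlated with the measurements, yet the RMSE is well above zero because they are off by a constant factor.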

Refer to the general workflow guide to learn how to access raw results files.

Interpreting results

Every M2A pipeline run outputs one tab-delimited predictions text file per sample. The predicted values, reported in the Predicted_log2_ChipDivInput_ column, represent promoter-region enrichment of the selected HM (either H3K27ac or H3K4me3).
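
As a starting point for working with the predictions file, the sketch below locates the prediction column by its documented prefix and converts each log2 value to a linear ChIP/Input ratio. It assumes the file has a header row, that the column suffix is the mark name (for example H3K27ac), and that the promoter definition columns keep the names listed earlier. The function name and the example rows are hypothetical.

```python
import csv
import io

PREFIX = "Predicted_log2_ChipDivInput_"


def read_predictions(handle):
    """Return (mark, [(transcript_id, log2_enrichment, fold_enrichment), ...])."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Locate the single prediction column by its documented prefix.
    pred_cols = [c for c in (reader.fieldnames or []) if c.startswith(PREFIX)]
    if len(pred_cols) != 1:
        raise ValueError(f"expected one prediction column, found {pred_cols}")
    col = pred_cols[0]
    mark = col[len(PREFIX):]
    rows = []
    for row in reader:
        log2_value = float(row[col])
        # A log2 ratio converts to a linear ChIP/Input ratio as 2 ** value.
        rows.append((row["EnsmblID_T"], log2_value, 2.0 ** log2_value))
    return mark, rows


# Hypothetical file contents, for illustration only.
header = ["EnsmblID_T", "EnsmblID_G", "Gene", "Strand", "Chr",
          "Start", "End", "RStart", "REnd", PREFIX + "H3K27ac"]
data = [
    ["TX_EXAMPLE_1", "GENE_EXAMPLE_1", "GENEA", "+", "chr1", "5000", "9000", "4000", "6000", "1.0"],
    ["TX_EXAMPLE_2", "GENE_EXAMPLE_2", "GENEB", "-", "chr2", "5000", "9000", "8000", "10000", "-1.0"],
]
text = "\n".join("\t".join(fields) for fields in [header] + data) + "\n"
mark, predictions = read_predictions(io.StringIO(text))
print(mark, predictions)
```

To read a real results file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`.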

Frequently asked questions

None yet!

If you have any questions not covered here, feel free to email us at support@stjude.cloud.
