ComutR
Preparing data for comut plots
The main input to comut is a data frame with the columns found in a typical MAF file. comut requires three of them:
- Tumor_Sample_Barcode
- Hugo_Symbol
- Variant_Classification
For more information on the MAF format and its column names, see the MAF format specification.
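Because comut depends on these three columns, it can be useful to check for them before plotting. The following is a minimal base R sketch; `check_maf_columns` and `required_maf_columns` are hypothetical names introduced here for illustration and are not part of comutR.

```r
# Hypothetical helper (not part of comutR): stop early if a required
# MAF column is missing from the input data frame.
required_maf_columns <- c(
  "Tumor_Sample_Barcode",
  "Hugo_Symbol",
  "Variant_Classification"
)

check_maf_columns <- function(maf, required = required_maf_columns) {
  missing_cols <- setdiff(required, colnames(maf))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# A one-row example that passes the check
example_maf <- data.frame(
  Tumor_Sample_Barcode = "1",
  Hugo_Symbol = "gene_A",
  Variant_Classification = "Missense_Mutation"
)
check_maf_columns(example_maf)
```

Extra columns are ignored by this check, so a full MAF with many additional fields passes as long as the three required names are present.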
Generating mock data
library(magrittr) # provides the %>% pipe used below
sample_ids <- as.character(1:10)
gene_names <- c("A", "B", "C", "D", "E", "F", "G", "H")
gene_names <- paste0("gene_", gene_names)
# Repeated entries make those classifications more likely to be sampled
variant_types <- c(
  "Missense_Mutation", "Nonsense_Mutation", "In_Frame_Del", "In_Frame_Del",
  "Missense_Mutation", "Nonsense_Mutation", "Nonsense_Mutation"
)
maf <- expand.grid(
"Tumor_Sample_Barcode" = sample_ids,
"Hugo_Symbol" = gene_names
)
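# Assign a random variant classification to every sample/gene pair
maf$Variant_Classification <- sample(variant_types,
  size = nrow(maf), replace = TRUE)
# Keep a random 35% of the pairs so that not every gene is altered in every sample
input_maf <- maf %>%
  dplyr::sample_frac(0.35)
input_maf %>% head()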
##   Tumor_Sample_Barcode Hugo_Symbol Variant_Classification
## 1                    8      gene_B           In_Frame_Del
## 2                    8      gene_D           In_Frame_Del
## 3                    3      gene_G      Nonsense_Mutation
## 4                    7      gene_D           In_Frame_Del
## 5                    6      gene_C      Nonsense_Mutation
## 6                    3      gene_C           In_Frame_Del
Basic Comut plots
comut(data = input_maf)
By default, comutR provides a color scheme for the variant classifications, but this can be altered using the variant_colors and variant_scheme arguments. comut also orders the rows by frequency of alterations.
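To preview which genes are most frequently altered before plotting, the frequencies can be tabulated in base R. This is only a sketch on toy data: counting each gene once per sample is an assumption made here, and comut's internal ordering may count alterations differently.

```r
# Toy input with the three required MAF columns
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("1", "2", "3", "1", "2", "1"),
  Hugo_Symbol = c("gene_A", "gene_A", "gene_A", "gene_B", "gene_B", "gene_C"),
  Variant_Classification = "Missense_Mutation"
)

# Count each gene once per sample, so repeated hits in one sample
# do not inflate the frequency (an assumption about how to count)
altered <- unique(toy_maf[, c("Tumor_Sample_Barcode", "Hugo_Symbol")])
n_samples <- length(unique(toy_maf$Tumor_Sample_Barcode))

# Fraction of samples in which each gene is altered, most frequent first
gene_frequency <- sort(table(altered$Hugo_Symbol), decreasing = TRUE) / n_samples
gene_frequency
```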
Adding metadata
Metadata can be added through the metadata argument of comut. This expects a data frame with a Tumor_Sample_Barcode column followed by one additional column per metadata variable.
Along with metadata, you must pass the col_maps argument: a named list of color maps, one for each metadata column.
Here we have sex and biopsy site information for each sample, and we define color maps for the variables accordingly:
meta <- data.frame("Tumor_Sample_Barcode" = sample_ids)
meta$biopsy <- sample(c("Primary", "Metastasis"), size = nrow(meta), replace = TRUE)
meta$sex <- sample(c("Male", "Female"), size = nrow(meta), replace = TRUE)
# Define color mapping
sex_colors <- c(
"Male" = "lightblue",
"Female" = "pink"
)
biopsy_type_colors <- c(
"Primary" = "lightgreen",
"Metastasis" = "maroon"
)
color_maps <- list(
"sex" = sex_colors,
"biopsy" = biopsy_type_colors
)
Note that patients without metadata will be colored grey.
comut(data = input_maf, metadata = meta, col_maps = color_maps)
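Before plotting, it may be worth checking which samples will fall back to grey and whether every metadata value has a color. The sketch below uses toy data and hypothetical variable names (`toy_sample_ids`, `toy_meta`, `toy_col_maps`); it relies only on base R and does not call comut.

```r
# Toy inputs standing in for the sample IDs, metadata, and color maps
toy_sample_ids <- c("1", "2", "3", "4")
toy_meta <- data.frame(
  Tumor_Sample_Barcode = c("1", "2", "3"),
  sex = c("Male", "Female", "Female")
)
toy_col_maps <- list(
  sex = c("Male" = "lightblue", "Female" = "pink")
)

# Samples that appear in the mutation data but have no metadata row
samples_without_metadata <- setdiff(toy_sample_ids, toy_meta$Tumor_Sample_Barcode)

# Metadata values that have no entry in the corresponding color map
unmapped_values <- lapply(names(toy_col_maps), function(column) {
  setdiff(unique(toy_meta[[column]]), names(toy_col_maps[[column]]))
})
names(unmapped_values) <- names(toy_col_maps)

samples_without_metadata
unmapped_values
```

Here sample "4" has no metadata row, and every value of sex has a color, so `unmapped_values$sex` is empty.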
Adding Barplots
Barplot data is formatted in a similar manner to metadata. A column for Tumor_Sample_Barcode is required, and then columns for the barplot values. If a stacked barplot is desired (for mutation signatures, for example), then there should be multiple columns corresponding to the proportion of each value.
Barplots are defined within a named list of named lists. The name of each sublist is used as the y-axis label for its barplot. Each sublist should contain three elements: “data”, the barplot data; “colors”, a color map for the barplot; and “legend”, a logical value indicating whether a legend should be created.
Note: The order of the barplot data list is the reverse of the order in which they are displayed in the plot!
# Stacked barplot data: two signature proportions that sum to 1 per sample
signatures <- data.frame("Tumor_Sample_Barcode" = sample_ids)
signatures$Aging <- runif(nrow(signatures), min = 0, max = 1)
signatures$Other <- 1 - signatures$Aging
# Single-value barplot data: number of SNPs per sample
total_snp_data <- data.frame("Tumor_Sample_Barcode" = sample_ids)
total_snp_data$n_snps <- sample(1:20, size = nrow(total_snp_data), replace = TRUE)
signature_colors <- c(
"Aging" = "coral1",
"Other" = "antiquewhite"
)
# Barplot specifications
barplot_data <- list(
"Mutation Signatures" = list(
"data" = signatures,
"colors" = signature_colors,
"legend" = TRUE
),
"N SNPs" = list(
"data" = total_snp_data,
"colors" = c("n_snps" = "cadetblue3"),
"legend" = FALSE
)
)
comut(
data = input_maf,
barplot_data = barplot_data
)
## Warning: Following annotation names are duplicated: value
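Two small checks can help when preparing barplot data. The sketch below, on toy data with hypothetical variable names, verifies that stacked proportions sum to 1 for each sample, and uses `rev()` so the list can be written in display order and reversed just before it is passed to comut. It assumes only the reversed-order behavior described in the note above.

```r
# Toy stacked-barplot data: one row per sample, one column per signature
toy_signatures <- data.frame(
  Tumor_Sample_Barcode = c("1", "2", "3"),
  Aging = c(0.2, 0.5, 0.9)
)
toy_signatures$Other <- 1 - toy_signatures$Aging

toy_snps <- data.frame(
  Tumor_Sample_Barcode = c("1", "2", "3"),
  n_snps = c(4, 12, 7)
)

# Proportions for a stacked barplot should sum to 1 within each sample
proportion_totals <- unname(rowSums(toy_signatures[, c("Aging", "Other")]))
stopifnot(isTRUE(all.equal(proportion_totals, rep(1, 3))))

# Write the list in the order the barplots should be displayed ...
display_order <- list(
  "N SNPs" = list(
    "data" = toy_snps,
    "colors" = c("n_snps" = "cadetblue3"),
    "legend" = FALSE
  ),
  "Mutation Signatures" = list(
    "data" = toy_signatures,
    "colors" = c("Aging" = "coral1", "Other" = "antiquewhite"),
    "legend" = TRUE
  )
)

# ... then reverse it, since the list order is the reverse of the display order
barplot_data_for_comut <- rev(display_order)
names(barplot_data_for_comut)
```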
Putting all of the components together, we get something like this:
comut(
data = input_maf,
metadata = meta,
col_maps = color_maps,
barplot_data = barplot_data
)
## Warning: Following annotation names are duplicated: value
Text Annotations
ComutR allows the user to add arbitrary text annotations to the cells of the comut plot. This can be specified as a column name that appears in the input data. Here we show adding allele fraction annotations to each mutation:
input_maf$Allele_Fraction <- round(runif(nrow(input_maf), min = 0.1, max = 1), 1)
comut(
data = input_maf,
text_annotation = "Allele_Fraction"
)
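Since the annotation is read from a column of the input data, a label can be prepared in advance. The sketch below builds a percentage label with `sprintf()`; the column name `AF_Label` is hypothetical, and whether comut renders character columns the same way as numeric ones is an assumption, so the plotting call is left commented out.

```r
# Toy input with an allele fraction for each mutation
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("1", "2"),
  Hugo_Symbol = c("gene_A", "gene_B"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation"),
  Allele_Fraction = c(0.25, 0.8)
)

# Hypothetical label column: allele fraction as a whole-number percentage
toy_maf$AF_Label <- sprintf("%.0f%%", 100 * toy_maf$Allele_Fraction)
toy_maf$AF_Label

# Assumed usage, untested here:
# comut(data = toy_maf, text_annotation = "AF_Label")
```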
Limiting samples or features and filling missing information
If only a subset of the samples or features in the input data is desired, the user can specify the ids or features_of_interest arguments, respectively. comut will also fill in any ids or features that are not present in the input data. This can be especially useful when one wants to highlight a feature that is not altered in any samples.
comut(
data = input_maf,
ids = c("1", "3", "4", "7", "missing_sample"),
features_of_interest = c("gene_A", "gene_B", "gene_E", "gene_H", "gene_missing")
)
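Conceptually, filling in missing ids and features means completing the sample-by-feature grid and leaving unaltered cells empty. The base R sketch below illustrates that idea on toy data; it is not comut's internal implementation, and the variable names are hypothetical.

```r
# Toy input: two mutations in two samples
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("1", "3"),
  Hugo_Symbol = c("gene_A", "gene_B"),
  Variant_Classification = c("Missense_Mutation", "In_Frame_Del")
)
ids <- c("1", "3", "missing_sample")
features_of_interest <- c("gene_A", "gene_B", "gene_missing")

# Every requested sample/feature pair, whether or not it is altered
full_grid <- expand.grid(
  Tumor_Sample_Barcode = ids,
  Hugo_Symbol = features_of_interest,
  stringsAsFactors = FALSE
)

# Left join: pairs with no mutation get NA in Variant_Classification
filled <- merge(full_grid, toy_maf, all.x = TRUE)
filled
```

The result has one row per requested pair, so "missing_sample" and "gene_missing" still occupy a column and a row of the grid even though they carry no alterations.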
comut also provides many arguments for the plot size, font size, and other options. It also passes additional arguments through to the call to Heatmap, so users can specify existing options like row_names_rot, etc.
comut(
data = input_maf,
metadata = meta,
col_maps = color_maps,
barplot_data = barplot_data,
text_annotation = "Allele_Fraction",
cell_width = 0.5,
cell_height = 0.3,
anno_fontsize = 12,
legend_fontsize = 12,
body_border = TRUE,
add_borders = TRUE,
row_names_rot = 45,
column_names_rot = 0
)
## Warning: Following annotation names are duplicated: value