| Title: | Simple Transcriptome Meta-Analysis for Identifying Stress-Responsive Genes |
|---|---|
| Description: | Stress Response score (SRscore) is a stress responsiveness measure for transcriptome datasets and is based on the vote-counting method. The SRscore is determined to evaluate and score genes on the basis of the consistency of the direction of their regulation (Up-regulation, Down-regulation, or No change) under stress conditions across multiple analyzed research projects. This package is based on the HN-score (score based on the ratio of gene expression between hypoxic and normoxic conditions) proposed by Tamura and Bono (2022) <doi:10.3390/life12071079>, and can calculate both the original method and an extended calculation method described in Fukuda et al. (2025) <doi:10.1093/plphys/kiaf105>. |
| Authors: | Yusuke Fukuda [aut, cre], Atsushi Fukushima [aut] |
| Maintainer: | Yusuke Fukuda <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-28 14:57:12 UTC |
| Source: | https://github.com/fusk-kpu/srscore |
This function computes the Stress Response ratio (SR ratio) for paired variables in a dataset. The function supports both log2-transformed and non-log2-transformed data and calculates the mean SRratio for grouped variables.
    calcSRratio(.data, var1, var2, pair, is.log2 = NA)
| Argument | Description |
|---|---|
| .data | A data frame containing expression values for a series of arrays, with rows corresponding to genes and columns to samples. |
| var1 | A character vector containing column names of control samples. |
| var2 | A character vector containing column names of treatment samples. |
| pair | A data frame with control samples and treatment samples. |
| is.log2 | A logical value (TRUE or FALSE) or NA indicating whether the data in .data is log2-transformed. |
A data frame containing:
Character columns from the original .data.
Mean SRratio values for each unique target variable.
    var1 <- "control_sample"
    var2 <- "treated_sample"
    grp <- "Series"
    ebg <- expand_by_group(MetadataABA, grp, var1, var2)
    SRratio <- calcSRratio(TranscriptomeABA, var1, var2, ebg, is.log2 = TRUE)
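The arithmetic behind an SR ratio can be sketched in base R without the package. This is a minimal illustration, not the package's implementation: it assumes that the ratio is treatment minus control on log2 data (and log2(treatment / control) on raw data), and that the ratios are averaged per treatment sample. The toy data and the helper name `sketch_ratio` are hypothetical.

```r
# Hypothetical toy data: 3 genes, 2 control and 2 treatment samples (log2 scale).
expr <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  c1 = c(5, 8, 3), c2 = c(5, 8, 5),
  t1 = c(7, 6, 4), t2 = c(8, 8, 4),
  stringsAsFactors = FALSE
)

# Control/treatment pairs, one row per comparison.
pairs <- data.frame(
  control_sample = c("c1", "c2", "c1", "c2"),
  treated_sample = c("t1", "t1", "t2", "t2"),
  stringsAsFactors = FALSE
)

# Assumption: log2 data -> difference; raw data -> log2 of the quotient.
sketch_ratio <- function(expr, pairs, is_log2 = TRUE) {
  # One column of per-gene ratios for every control/treatment pair.
  per_pair <- vapply(seq_len(nrow(pairs)), function(i) {
    ctrl <- expr[[pairs$control_sample[i]]]
    trt <- expr[[pairs$treated_sample[i]]]
    if (is_log2) trt - ctrl else log2(trt / ctrl)
  }, numeric(nrow(expr)))

  # Average the pairwise ratios that share the same treatment sample.
  treatments <- unique(pairs$treated_sample)
  means <- sapply(treatments, function(trt) {
    rowMeans(per_pair[, pairs$treated_sample == trt, drop = FALSE])
  })
  data.frame(gene_id = expr$gene_id, means, stringsAsFactors = FALSE)
}

ratio <- sketch_ratio(expr, pairs, is_log2 = TRUE)
print(ratio)
```

The result has one character column for the gene ID and one numeric column per treatment sample, which mirrors the shape described for the return value above.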
SRscore is a score assigned to each gene based on its expression profiles across different research projects. An SRratio data frame is required to calculate the SRscore.
    calcSRscore(srratio, threshold = c(-1, 1))
| Argument | Description |
|---|---|
| srratio | A data frame of SRratio. |
| threshold | A vector of length 2 (x, y) indicating threshold values. |
A data frame containing results.
    grp <- "Series"
    var1 <- "control_sample"
    var2 <- "treated_sample"
    ebg <- expand_by_group(MetadataABA, grp, var1, var2)
    SRratio <- calcSRratio(TranscriptomeABA, var1, var2, ebg, is.log2 = TRUE)
    head(calcSRscore(SRratio, threshold = c(-1, 1)))
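The vote-counting idea behind the score can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: a ratio above the upper threshold counts as one "up" vote, a ratio below the lower threshold counts as one "down" vote, and the score is the number of up votes minus the number of down votes. The strict inequalities, the output column names, and the helper name `sketch_score` are all assumptions.

```r
# Hypothetical SR ratios for 3 genes across 3 treatment samples.
ratio <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  t1 = c(2.0, -2.0, 0.0),
  t2 = c(3.0, 0.0, 0.0),
  t3 = c(1.5, -1.2, 0.4),
  stringsAsFactors = FALSE
)

sketch_score <- function(ratio, threshold = c(-1, 1)) {
  is_num <- vapply(ratio, is.numeric, logical(1))
  values <- as.matrix(ratio[is_num])

  # Count votes per gene (assumption: boundaries are exclusive).
  up <- rowSums(values > threshold[2], na.rm = TRUE)
  down <- rowSums(values < threshold[1], na.rm = TRUE)
  total <- rowSums(!is.na(values))

  data.frame(
    ratio[!is_num],            # keep the character columns (gene IDs)
    up = up,
    down = down,
    unchanged = total - up - down,
    max_score = total,         # largest score a gene could reach
    score = up - down,
    stringsAsFactors = FALSE
  )
}

scores <- sketch_score(ratio, threshold = c(-1, 1))
print(scores)
```

In this toy input, g1 is up in every sample, g2 is down in two of three, and g3 never crosses either threshold, so the scores separate consistent responders from unresponsive genes.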
The SRscore calculation process is divided into three major processes, and functions are provided for each process (see the respective function documents for details).
directly_calcSRscore() aggregates the results of the three functions into a single list.
    directly_calcSRscore(.data1, grp, var1, var2, .data2, is.log2 = NA, threshold = c(-1, 1))
| Argument | Description |
|---|---|
| .data1 | A data frame containing the two variables you want to compare, as well as the variable identifying the group to which they belong. |
| grp | Column name of groups. |
| var1 | Column name of first variable. |
| var2 | Column name of second variable. |
| .data2 | A data frame containing expression values for a series of arrays, with rows corresponding to genes and columns to samples. |
| is.log2 | A logical specifying whether .data2 is log2-transformed. |
| threshold | A vector of length 2 (x, y) indicating threshold values. |
A list containing the results of the three functions.
    grp <- "Series"
    var1 <- "control_sample"
    var2 <- "treated_sample"
    ls <- directly_calcSRscore(MetadataABA, grp, var1, var2, TranscriptomeABA, is.log2 = TRUE, threshold = c(-1, 1))
    lapply(ls, head)
expand_by_group() generates all combinations (Cartesian product) of two specified variables within each group in your dataframe.
    expand_by_group(.data, grp, var1, var2)
| Argument | Description |
|---|---|
| .data | A data frame. |
| grp | A column name indicating the group. |
| var1 | A column name indicating the control. |
| var2 | A column name indicating the treatment. |
Returns a data frame containing all combinations of the specified variables for each group. The structure of the returned data frame includes:
All combinations of var1 and var2 within each group.
The group column (grp).
Rows with NA values removed.
    grp <- "Series"
    var1 <- "control_sample"
    var2 <- "treated_sample"
    ebg <- expand_by_group(MetadataABA, grp, var1, var2)
    unique_series <- unique(MetadataABA$Series)
    lapply(unique_series, function(x) subset(ebg, Series == x))
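A within-group Cartesian product of this kind can be sketched with base R's `split()` and `expand.grid()`. This is a minimal sketch, not the package's implementation: the toy metadata and the helper name `sketch_expand` are hypothetical, and dropping missing values before expanding is an assumption about how NA handling might be done.

```r
# Hypothetical metadata: two controls and two treatments in S1, one pair in S2.
meta <- data.frame(
  Series = c("S1", "S1", "S2"),
  control_sample = c("c1", "c2", "c3"),
  treated_sample = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)

sketch_expand <- function(meta, grp, var1, var2) {
  # Unique, non-missing values of a column.
  vals <- function(x) unique(x[!is.na(x)])

  pieces <- lapply(split(meta, meta[[grp]]), function(d) {
    # All control x treatment combinations inside one group.
    combos <- expand.grid(vals(d[[var1]]), vals(d[[var2]]),
                          stringsAsFactors = FALSE)
    names(combos) <- c(var1, var2)
    combos[[grp]] <- rep(d[[grp]][1], nrow(combos))
    combos[c(grp, var1, var2)]
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

pairs <- sketch_expand(meta, "Series", "control_sample", "treated_sample")
print(pairs)
```

Group S1 contributes four rows (2 controls x 2 treatments) and S2 contributes one, so samples are never paired across research projects.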
Find the expression ratio for each experimental sample for the specified gene.
    find_diffexp(genes, srratio, srscore, metadata)
| Argument | Description |
|---|---|
| genes | A character vector of gene IDs. |
| srratio | A data frame of SRratio. |
| srscore | A data frame of SRscore. |
| metadata | A data frame of metadata. |
A data frame of metadata with the SRratio for the specified gene ID appended at the end.
    vr1 <- "control_sample"
    vr2 <- "treated_sample"
    grp <- "Series"
    ebg <- expand_by_group(MetadataABA, grp, vr1, vr2)
    SRratio <- calcSRratio(TranscriptomeABA, vr1, vr2, ebg, is.log2 = TRUE)
    SRscore <- calcSRscore(SRratio)
    set.seed(1)
    find_diffexp(sample(SRratio$ensembl_gene_id, 1), SRratio, SRscore, MetadataABA)
The HN-score is a scoring metric derived from the HN-ratio, which represents the gene expression ratio between hypoxic and normoxic conditions, and was originally proposed by Tamura and Bono (2022) https://doi.org/10.3390/life12071079.
It is publicly available on figshare https://doi.org/10.6084/m9.figshare.20055086.
HNscore is provided as a data frame containing HN-scores calculated from logHNratioHypoxia and is implemented as test data in the SRscore package.
To reduce data size, HNscore includes HN-scores for a subset of 1,000 genes extracted from the original dataset.
    HNscore
A data frame with 1000 rows and 11 variables:
Transcript ID in Arabidopsis thaliana
Total number of times HNratio exceeds 2
Total number of times HNratio is below 0.5
Total number of times HNratio is between 0.5 and 2
Maximum possible HNscore
HN-score
Gene name in Arabidopsis thaliana
Gene description in Arabidopsis thaliana
Transcript ID in Homo sapiens
Gene name in Homo sapiens
Gene name in Homo sapiens
Tamura, Keita, and Hidemasa Bono. 2022. “Meta-Analysis of RNA Sequencing Data of Arabidopsis and Rice Under Hypoxia.” Life 12 (7).
The HN-ratio, which quantifies gene expression changes between hypoxic and normoxic conditions across multiple experiments, was originally proposed by Tamura and Bono (2022) https://doi.org/10.3390/life12071079.
It is publicly available on figshare https://doi.org/10.6084/m9.figshare.20055086.
In the SRscore package, the HN-ratio is introduced solely as an intermediate quantity required to compute HN-scores.
logHNratioHypoxia is a data frame containing log2-transformed HN-ratios.
To reduce data size, logHNratioHypoxia includes HN-ratios for a subset of 1,000 genes extracted from the original dataset.
    logHNratioHypoxia
An object of class data.frame with 1000 rows and 30 columns.
Column components :
Ensembl gene id + 29 treatment sample id
Tamura, Keita, and Hidemasa Bono. 2022. “Meta-Analysis of RNA Sequencing Data of Arabidopsis and Rice Under Hypoxia.” Life 12 (7).
MetadataABA is the metadata for the experimental dataset of Arabidopsis thaliana under ABA stress conditions. The metadata are used to define the pairs of control and treatment samples to be compared.
    MetadataABA
A data frame with 19 rows and 4 variables:
Research project ID
control sample ID
treatment sample ID
treatment condition
tissue name
This is metadata of RNA-Seq data that is used in the study by Tamura and Bono.
    MetadataHypoxia
A data frame with 29 rows and 4 variables:
Research project ID
RNA-Seq run accession ID
RNA-Seq run accession ID
treatment condition
This function visualizes the distribution of SRscore values using a barplot. Values equal to 0 are excluded from the plot by design because they typically represent genes without detectable stress response activity.
    plot_SRscore_distr(srscore, log = FALSE)
| Argument | Description |
|---|---|
| srscore | A data.frame containing at least one column named score. |
| log | Logical (default: FALSE). If TRUE, the plot is drawn on a logarithmic y-axis. |
The function provides both a linear-scale plot and a log-scale version, which is particularly useful when the frequency of SRscore values spans a wide range.
The function performs the following steps:
Validates that srscore is a data.frame and contains a score column.
Removes SRscore values equal to 0.
Produces a barplot of the frequency of SRscore values.
Optionally draws the plot on a logarithmic y-axis.
This function returns NULL invisibly and produces a barplot as a side effect.
    # Example SRscore data
    df <- data.frame(score = c(-5, -3, -3, 1, 2, 2, 2, 4, 5, 5, 0))
    # Linear-scale plot
    plot_SRscore_distr(df)
    # Log-scale plot
    plot_SRscore_distr(df, log = TRUE)
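The steps listed for this function can be approximated with base graphics alone. The following is a rough sketch of the same idea, not the function's actual source; the axis labels are illustrative choices.

```r
# Example SRscore data (same values as in the example above).
scores <- data.frame(score = c(-5, -3, -3, 1, 2, 2, 2, 4, 5, 5, 0))

# Validate the input: a data.frame with a 'score' column.
stopifnot(is.data.frame(scores), "score" %in% names(scores))

# Drop zeros, then count how many genes have each score value.
nonzero <- scores$score[scores$score != 0]
freq <- table(nonzero)

# Linear-scale barplot of the frequencies.
barplot(freq, xlab = "SRscore", ylab = "Number of genes")

# Same frequencies on a logarithmic y-axis.
barplot(freq, log = "y", xlab = "SRscore", ylab = "Number of genes (log scale)")
```

A logarithmic axis helps when a few score values are far more frequent than the rest, because the rare extreme scores stay visible next to the common ones.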
This function visualizes SRscore values sorted in descending order and colors each point based on user-defined thresholds. Genes with SRscore above the upper threshold are colored red (up-regulated), those below the lower threshold are colored blue (down-regulated), and values within the range are shown in black.
    plot_SRscore_rank(srscore, threshold = c(1, -1))
| Argument | Description |
|---|---|
| srscore | A data.frame containing at least a column named score. |
| threshold | A numeric vector of length 2 specifying the upper and lower thresholds used for coloring (default: c(1, -1)). |
The function performs the following:
Validates input data.
Sorts SRscore values in descending order.
Colors each point based on whether its value is:
greater than or equal to the upper threshold (red)
less than or equal to the lower threshold (blue)
between the thresholds (black)
Produces a rank plot with a legend explaining the color mapping.
Invisibly returns the sorted SRscore vector. The function produces a scatter plot as a side effect.
    df <- data.frame(
      gene = paste0("Gene", 1:10),
      score = c(-5, -3, -1, 0, 0.5, 1.2, 2, 3, 4, 5)
    )
    # Basic usage
    plot_SRscore_rank(df)
    # Custom thresholds
    plot_SRscore_rank(df, threshold = c(2, -2))
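The sorting and color assignment described above can be sketched with base graphics. This is an illustrative approximation, not the function's source; the legend labels and plotting symbols are assumptions.

```r
# Same toy scores as in the example above.
df <- data.frame(
  gene = paste0("Gene", 1:10),
  score = c(-5, -3, -1, 0, 0.5, 1.2, 2, 3, 4, 5)
)
threshold <- c(1, -1)  # upper threshold first, lower threshold second

# Sort scores from highest to lowest.
sorted <- sort(df$score, decreasing = TRUE)

# Red at or above the upper threshold, blue at or below the lower one,
# black for everything in between.
cols <- ifelse(sorted >= threshold[1], "red",
               ifelse(sorted <= threshold[2], "blue", "black"))

plot(seq_along(sorted), sorted, col = cols, pch = 16,
     xlab = "Rank", ylab = "SRscore")
legend("topright",
       legend = c("Up-regulated", "Down-regulated", "Within thresholds"),
       col = c("red", "blue", "black"), pch = 16)
```

With the default thresholds, five of the ten toy genes are colored red, three blue, and two black.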
This function selects the top top_n genes with the largest absolute
SRscore values and visualizes their SRscores using a barplot.
The function is useful for quickly identifying genes with the strongest
positive or negative stress responses.
    plot_SRscore_top(srscore, top_n = 20)
| Argument | Description |
|---|---|
| srscore | A data.frame containing at least a column named score. |
| top_n | Integer (default: 20). The number of top genes to plot, ranked by absolute SRscore. |
The function performs the following steps:
Validates the input data structure.
Computes absolute SRscore via abs(score).
Selects the top top_n genes with the largest absolute score.
Re-sorts the selected genes by actual SRscore (to separate up/down).
Produces a barplot in which gene names (character columns) are used as labels.
The barplot displays:
Positive SRscore (upregulated genes) as upward bars.
Negative SRscore (downregulated genes) as downward bars.
Genes ordered from lowest to highest SRscore for visual clarity.
Graphical parameters are temporarily modified, and restored automatically
using on.exit() to avoid affecting the user's plotting environment.
Invisibly returns the data.frame of selected top genes (after sorting). A barplot is produced as a side effect.
    # Example data.frame of SRscore
    df <- data.frame(
      gene = paste0("Gene", 1:10),
      score = c(-12, -6, -3, 1, 2, 3, 5, 8, 10, 11)
    )
    # Plot top 5 genes by |SRscore|
    plot_SRscore_top(df, top_n = 5)
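The selection logic described above (rank by absolute score, keep the top genes, then re-sort by signed score) can be sketched in base R. This is an approximation for illustration, not the function's source; the plotting options are assumptions.

```r
# Same toy scores as in the example above.
df <- data.frame(
  gene = paste0("Gene", 1:10),
  score = c(-12, -6, -3, 1, 2, 3, 5, 8, 10, 11),
  stringsAsFactors = FALSE
)
top_n <- 5

# Rank genes by absolute score and keep the top_n strongest responders.
by_abs <- df[order(abs(df$score), decreasing = TRUE), ]
top <- by_abs[seq_len(min(top_n, nrow(by_abs))), ]

# Re-sort by signed score so down-regulated genes come first.
top <- top[order(top$score), ]

# Restore graphical parameters when done, as the function is described to do.
old_par <- par(mar = c(6, 4, 2, 1))
barplot(top$score, names.arg = top$gene, las = 2, ylab = "SRscore")
par(old_par)

print(top)
```

For the toy data, the five selected genes are Gene1, Gene2, Gene8, Gene9, and Gene10, ordered from the most negative to the most positive score.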
Test data to create data frames from all combinations between two specified variables within each group using sample data
    sample_pair_test
A data frame with 71 rows and 2 variables:
Control Sample ID
Treated Sample ID
SRGA is a reference test dataset that integrates standardized SRscores across 11 stress conditions as reported in Fukuda et al. (2025) https://doi.org/10.1093/plphys/kiaf105. Because SRscore scales differ by stress type, SRscores were standardized using z-scores. This dataset is provided solely for demonstrating and testing template matching (Pavlidis and Noble, 2001) https://doi.org/10.1186/gb-2001-2-10-research0042 workflows implemented in the SRscore package and is not intended to introduce a new analysis method. To reduce file size, the dataset includes SRscores for a subset of 1,000 genes.
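The two ideas mentioned here, z-score standardization and template matching, can be illustrated on toy data in base R. This sketch makes several assumptions that are not confirmed by the package documentation: standardization is done per stress condition (per column), and template matching is taken to be the Pearson correlation between each gene's standardized profile and a user-defined template. The toy scores, the column names, and the template are hypothetical.

```r
# Hypothetical SRscores for 3 genes under 3 stress conditions.
scores <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  ABA = c(10, 0, -4),
  cold = c(1, 0, 6),
  heat = c(0, 1, -5),
  stringsAsFactors = FALSE
)

# Standardize each condition to mean 0 and standard deviation 1,
# so that conditions with different score ranges become comparable.
is_num <- vapply(scores, is.numeric, logical(1))
z <- scores
z[is_num] <- lapply(scores[is_num], function(x) (x - mean(x)) / sd(x))

# Template: a gene that responds to ABA but not to cold or heat.
template <- c(ABA = 1, cold = 0, heat = 0)

# Correlate every gene's standardized profile with the template.
match_r <- apply(as.matrix(z[names(template)]), 1, function(profile) {
  cor(profile, template)
})

result <- data.frame(gene_id = scores$gene_id, similarity = match_r)
print(result[order(result$similarity, decreasing = TRUE), ])
```

Genes whose profile correlates most strongly with the template are the best candidates for the response pattern the template encodes.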
    SRGA
A data frame with 1000 rows and 13 variables:
Ensembl gene ID
SRscore derived from ABA dataset
SRscore derived from cold dataset
SRscore derived from DC3000 dataset
SRscore derived from drought dataset
SRscore derived from heat dataset
SRscore derived from highlight dataset
SRscore derived from hypoxia dataset
SRscore derived from osmotic dataset
SRscore derived from oxidation dataset
SRscore derived from salt dataset
SRscore derived from wound dataset
Gene symbol
A dataframe containing SRratio calculated from TranscriptomeABA
    SRratio_test
An object of class data.frame with 1000 rows and 20 columns.
Column components :
Ensembl gene id + 19 treatment sample id
A dataframe containing SRscore calculated from SRratio_test
    SRscore_test
A data frame with 1000 rows and 6 variables:
Ensembl gene ID
Total number of times SRratio exceeds 2
Total number of times SRratio is below -2
Total number of times SRratio is between -2 and 2
Maximum possible SRscore
SRscore with absolute value 2 as threshold
This is RNA-Seq data that is used in the study by Tamura and Bono (2022) https://doi.org/10.3390/life12071079. The quantitative RNA-Seq data, which were calculated as transcripts per million (TPM), are available at figshare https://doi.org/10.6084/m9.figshare.20055086.
    TPMHypoxia
An object of class data.frame with 1000 rows and 59 columns.
Column components :
Ensembl gene id + 58 sample id (control : 29, treatment : 29)
This is a gene expression matrix for Arabidopsis under ABA stress conditions. The first column is the gene ID column, all others are sample ID columns. The expression data are read as raw data (CEL files) and summarized and normalized by Robust Multi-array Average (RMA). To keep the file size small, the data is limited to 1,000 genes.
    TranscriptomeABA
An object of class data.frame with 1000 rows and 39 columns.
Column components :
Ensembl gene id + 38 sample id (control : 19, treatment : 19)