Create an object for pulse-change count data

PulseData(counts, conditions, formulas, formulaIndexes = NULL,
  spikeins = NULL, groups = NULL)

Arguments

counts	a matrix; column names correspond to sample names. The columns in `counts` correspond to the rows in the `conditions` argument.
conditions	a data.frame; the first column corresponds to the conditions given in `formulas`. The order of rows corresponds to the columns (samples) in the `counts` argument.
formulas	a list, created by `MeanFormulas`
formulaIndexes	a list of lists (or of vectors); defines indexes of formulas used for calculation of the expected read number.
spikeins	NULL (default) or a list of two items: `refGroup`, a character, which defines the group that should be treated as the reference for normalisation; and `spikeLists`, a list of character vectors with the spike-in names and the same structure as `formulaIndexes`.
groups	NULL (default), a vector, or a formula, e.g. `~ fraction + time`. If the normalisation factors must be recovered during fitting, `groups` defines the sets of samples that share the same normalisation factors. Hence `groups` is relevant only if no spike-ins are provided. In this case, one may assume that samples of the same nature, i.e. the same fraction and labelling time, have the same purification efficiency, which reduces the number of parameters to fit. It is also possible to treat every sample as an individual group, so that no parameters are shared. If `groups` is NULL, the groups are derived from the first column of `conditions`. If a vector is provided, its elements correspond to the rows (samples) in `conditions`. If, for example, there are 3 pull-down (2hr) samples purified with protocol "A" and 3 pull-down (2hr) samples purified with protocol "B", one may assume different efficiencies for these protocols and reflect this in the `groups` argument by introducing an additional column `protocol` in the `conditions` data.frame, which results in `groups = ~ fraction + time + protocol`. Alternatively, one may manually create a vector like `c("pull_down.2hr.A", "pull_down.2hr.B", ...)`, ordered to match the sample order in `conditions`.
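
The two ways of specifying `groups` described above can be sketched in base R. This is only an illustration under assumptions: the `conditions` data.frame below is made up, and the label format (`fraction.time.protocol`) simply mimics the example vector in the argument description; it is not taken from the package internals.

```r
# Hypothetical conditions: 3 pull-down (2hr) samples per purification protocol
conditions <- data.frame(
  fraction = rep("pull_down", 6),
  time     = rep(2, 6),
  protocol = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE)

# Option 1: a formula naming the columns that define a group
groupsFormula <- ~ fraction + time + protocol
# every variable in the formula should be a column of `conditions`
stopifnot(all(all.vars(groupsFormula) %in% names(conditions)))

# Option 2: a manually built vector, one label per row (sample) of `conditions`
groupsVector <- paste(conditions$fraction,
                      paste0(conditions$time, "hr"),
                      conditions$protocol,
                      sep = ".")

# Treating every sample as its own group: one unique label per sample
groupsIndividual <- paste0("sample_", seq_len(nrow(conditions)))
```

Either `groupsFormula` or `groupsVector` would then be passed as the `groups` argument; with `groupsIndividual` no normalisation factors are shared between samples.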

Value

an object of class "PulseData"

It is a list with the following slots:

user_conditions, user_formulas, counts are the values of arguments conditions, formulas and counts, provided to the call of PulseData
rawFormulas is a list of initial formulas, evaluated at the corresponding conditions (e.g. time in formulas is substituted with its values in the conditions$time).
formulas is a list of the compiled rawFormulas
formulaIndexes is a list of integers (or vectors) with indexes of formulas used in estimation of the expression level in a given sample. The order of list items corresponds to the order of the samples in the conditions data.frame. See also addKnownToFormulas.
groups is a vector with the names of the sample groups, which is used to calculate normalisation factors.
depthNormalisation is a list of normalisation factors of the same structure as formulaIndexes. If no spike-ins are used, these values correspond to sequencing depth within a given group of samples according to the groups vector, for example the depth normalisation within the group "pull_down.2hr" of the pull-down samples after 2 hr of labelling. The relation between different groups, i.e. "total_fraction", "pull_down.2hr" etc., is not known and must be recovered during fitting as normFactors values. If spike-ins are provided, the relation between different fractions is recovered during the initialisation of the PulseData object and the values are written to the depthNormalisation slot.
interSampleCoeffs is a list whose structure is used as a skeleton for the normalisation factors if no spike-ins were provided. For every group in groups, there is a corresponding list item (a number or a numeric vector).
interSampleIndexes describes which normalisation factors to use during calculation of the mean read count in every sample (an index in the unlist(interSampleCoeffs))
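
The relation between a skeleton list and indexes into its unlisted form can be illustrated with a small base R sketch. The group names, values and indexes below are hypothetical placeholders chosen for illustration; they are not the values the package computes.

```r
# Hypothetical skeleton: one item per group (a number or a numeric vector)
interSampleCoeffs <- list(
  total_fraction = 1,
  pull_down.2hr  = c(1, 0.1))

# Flattened view, as referred to by unlist(interSampleCoeffs)
flat <- unlist(interSampleCoeffs)

# Hypothetical indexes: for each of three samples, which entries of `flat` apply
interSampleIndexes <- list(1, 2:3, 2:3)
perSample <- lapply(interSampleIndexes, function(i) unname(flat[i]))

# A flat vector of new values can be put back into the skeleton's shape
refilled <- utils::relist(c(1, 2, 0.5), skeleton = interSampleCoeffs)
```

Here `perSample[[2]]` holds the two coefficients of the "pull_down.2hr" group, and `refilled` has the same list structure as `interSampleCoeffs`.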

Details

The conditions argument may include additional columns, which provide values for known parameters, such as time. Their names must match the names used in formulas. For example, if a formula is defined as mu * exp(-d * time), where time is the time point of the experiment, the conditions data.frame must contain a column named time; otherwise time is treated as a parameter to fit!
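
The effect of a missing time column can be seen with plain R expressions. This is a base R sketch of the idea only, with a made-up `conditions` data.frame; it does not use or reproduce the package's own formula handling.

```r
# The formula from the text, as an unevaluated R expression
f <- quote(mu * exp(-d * time))

# Hypothetical conditions with a `time` column
conditions <- data.frame(
  condition = c("total_fraction", "pull_down", "pull_down"),
  time      = c(0, 2, 4),
  stringsAsFactors = FALSE)

# Names in the formula that are not columns of `conditions` remain free parameters
freeParams <- setdiff(all.vars(f), names(conditions))

# Without the `time` column, `time` is left among the free parameters
freeParamsNoTime <- setdiff(all.vars(f), names(conditions["condition"]))

# Substituting the known value for the second sample
fSample2 <- do.call(substitute, list(f, list(time = conditions$time[2])))
```

With the `time` column present, only `mu` and `d` are left to fit; without it, `time` joins them.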

Examples




formulaIndexes <- list(
  total_fraction = 'total',
  flow_through   = c('unlabelled', 'labelled'),
  pull_down      = c('labelled', 'unlabelled'))

# Spike-ins definition for object creation
refGroup <- "total_fraction"

labelled <- c("spike1", "spike2")
unlabelled <- c("spike3", "spike4")

spikeLists <- list(
# total samples are normalised using all spike-ins
  total_fraction = list(c(unlabelled, labelled)),
# for every item in formulaIndexes we have a set of spike-ins:
  flow_through   = list(unlabelled, labelled),
  pull_down      = list(labelled, unlabelled))

# argument for the function:
spikeins <- list(refGroup = refGroup,
                 spikeLists = spikeLists)
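
Because `spikeLists` is required to have the same structure as `formulaIndexes`, a quick consistency check before calling `PulseData` may be useful. The following is a sketch in base R; the helper name `sameStructure` is hypothetical and not part of the package.

```r
formulaIndexes <- list(
  total_fraction = 'total',
  flow_through   = c('unlabelled', 'labelled'),
  pull_down      = c('labelled', 'unlabelled'))

labelled   <- c("spike1", "spike2")
unlabelled <- c("spike3", "spike4")
spikeLists <- list(
  total_fraction = list(c(unlabelled, labelled)),
  flow_through   = list(unlabelled, labelled),
  pull_down      = list(labelled, unlabelled))

# Hypothetical helper: same group names, same number of items per group
sameStructure <- function(a, b) {
  identical(names(a), names(b)) && identical(lengths(a), lengths(b))
}

stopifnot(sameStructure(formulaIndexes, spikeLists))
```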
