Create an object for pulse-chase count data

PulseData(counts, conditions, formulas, formulaIndexes = NULL,
  spikeins = NULL, groups = NULL)

Arguments

counts

a matrix; column names correspond to sample names. The columns of counts correspond to the rows of the conditions argument.

conditions

a data.frame; the first column corresponds to the conditions given in formulas. The order of rows corresponds to the columns (samples) in the counts argument.
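The correspondence between counts and conditions can be checked before creating the object. The following is a minimal sketch in base R with made-up toy data; the sample names, gene names and the use of row names on conditions are assumptions for illustration only (the documentation requires only that the order matches).

```r
# Hypothetical toy data: 3 genes x 4 samples
counts <- matrix(
  c(10, 20, 30, 40,
     5, 15, 25, 35,
     0,  2,  4,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("total_1", "total_2", "pd_1", "pd_2")))

# The first column holds the condition names used in formulas;
# row names are set here only to make the order easy to verify
conditions <- data.frame(
  condition = c("total_fraction", "total_fraction", "pull_down", "pull_down"),
  time = c(2, 2, 2, 2),
  row.names = c("total_1", "total_2", "pd_1", "pd_2"),
  stringsAsFactors = FALSE)

# Row i of conditions must describe column i of counts
sameLength <- ncol(counts) == nrow(conditions)
aligned <- identical(colnames(counts), rownames(conditions))
```

If aligned is FALSE, reordering the rows with conditions[colnames(counts), , drop = FALSE] is one way to restore the expected order.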

formulas

a list, created by MeanFormulas

formulaIndexes

a list of lists (or of vectors); defines indexes of formulas used for calculation of the expected read number.

spikeins

NULL (default) or a list of two items:

  • refGroup, a character string naming the group that should be treated as the reference for normalisation

  • spikeLists, a list of character vectors with the spike-in names; it has the same structure as formulaIndexes.
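Because spikeLists must mirror formulaIndexes, a quick structural check can catch mistakes early. The sketch below uses the values from the Examples section; sameStructure is a hypothetical helper written for this illustration and is not part of the package.

```r
formulaIndexes <- list(
  total_fraction = "total",
  flow_through = c("unlabelled", "labelled"),
  pull_down = c("labelled", "unlabelled"))

labelled <- c("spike1", "spike2")
unlabelled <- c("spike3", "spike4")
spikeLists <- list(
  total_fraction = list(c(unlabelled, labelled)),
  flow_through = list(unlabelled, labelled),
  pull_down = list(labelled, unlabelled))
spikeins <- list(refGroup = "total_fraction", spikeLists = spikeLists)

# Hypothetical helper: same condition names and the same number of
# entries per condition in both lists
sameStructure <- function(a, b) {
  identical(names(a), names(b)) && all(lengths(a) == lengths(b))
}

structureOk <- sameStructure(formulaIndexes, spikeins$spikeLists)
# The reference group should be one of the listed conditions
refGroupOk <- spikeins$refGroup %in% names(spikeins$spikeLists)
```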

groups

NULL (default), a vector or a formula, e.g. ~ fraction + time. If the normalisation factors must be recovered during fitting, groups defines the sets of samples that share the same normalisation factors. Hence groups is relevant only if no spike-ins are provided. In this case, one may assume that samples of the same nature, i.e. the same fraction and labelling time, have the same purification efficiency, which reduces the number of parameters to fit. Alternatively, every sample can be treated as an individual group, in which case no parameters are shared.

If it is NULL, groups are derived from the first column of conditions. If a vector is provided, its elements correspond to the rows (samples) in conditions. If, for example, there are 3 pull-down (2hr) samples purified with protocol "A" and 3 pull-down (2hr) samples purified with protocol "B", one may assume that the two protocols differ in efficiency and reflect this in the groups argument by adding a protocol column to the conditions data.frame, which results in groups = ~ fraction + time + protocol. Alternatively, one may manually create a vector like c("pull_down.2hr.A", "pull_down.2hr.B", ...) whose order corresponds to the sample order in conditions.
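The manual vector form of groups can be built from the columns of conditions. The following base R sketch assumes a toy conditions data.frame with fraction, time and protocol columns; whether the formula interface produces exactly the same labels internally is an assumption, so the sketch only shows how to construct the vector variant by hand.

```r
# Hypothetical conditions: 6 pull-down samples, two purification protocols
conditions <- data.frame(
  fraction = rep("pull_down", 6),
  time = rep("2hr", 6),
  protocol = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE)

# Take the column names from a formula and paste the columns row-wise
groupFormula <- ~ fraction + time + protocol
groups <- do.call(paste, c(conditions[all.vars(groupFormula)], sep = "."))
groupNames <- unique(groups)

# Every sample as its own group: no normalisation factors are shared
individual <- paste0("sample_", seq_len(nrow(conditions)))
```

Here groupNames contains one label per protocol, so samples purified with the same protocol share their normalisation factors, while individual gives each sample its own group.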

Value

an object of class "PulseData"

It is a list with the following slots:

  • user_conditions, user_formulas, counts are the values of arguments conditions, formulas and counts, provided to the call of PulseData

  • rawFormulas is a list of the initial formulas, evaluated at the corresponding conditions (e.g. time in formulas is substituted with its values from conditions$time).

  • formulas is a list of the compiled rawFormulas

  • formulaIndexes is a list of integers (or vectors) with indexes of formulas used in estimation of the expression level in a given sample. The order of list items corresponds to the order of the samples in the conditions data.frame. See also addKnownToFormulas.

  • groups is a vector with the names of the sample groups, which is used to calculate normalisation factors.

  • depthNormalisation is a list of normalisation factors with the same structure as formulaIndexes. If no spike-ins are used, these values correspond to the sequencing depth within a given group of samples according to the groups vector, for example the depth normalisation within the group "pull_down.2hr" of pull-down samples after 2 hr of labelling. The relation between different groups, e.g. "total_fraction", "pull_down.2hr" etc., is not known and must be recovered during fitting as normFactors values. If spike-ins are provided, the relation between different fractions is recovered during the initialisation of the PulseData object and the values are written to the depthNormalisation slot.

  • interSampleCoeffs is a list whose structure is used as a skeleton for the normalisation factors if no spike-ins were provided. For every group in groups, there is a corresponding list item (a number or a numeric vector).

  • interSampleIndexes describes which normalisation factors to use during calculation of the mean read count in every sample (an index in the unlist(interSampleCoeffs))
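The idea of a skeleton list addressed through indexes into its flattened form can be illustrated in base R. This is only a sketch under assumed values: the group names, the numbers and the exact content of the two slots are hypothetical, and the real slots are built by PulseData itself.

```r
# Hypothetical skeleton: one item per group, a number or a numeric vector
interSampleCoeffs <- list(total_fraction = 1, pull_down.2hr = c(1, 1))

# Flat parameter vector, as an optimiser would see it
flat <- unlist(interSampleCoeffs)

# Hypothetical per-sample indexes into the flat vector:
# two total samples followed by two pull-down samples
interSampleIndexes <- list(1, 1, c(2, 3), c(2, 3))
factorsForSample3 <- unname(flat[interSampleIndexes[[3]]])

# relist() puts a flat vector of fitted values back into the skeleton shape
fitted <- relist(c(1, 0.2, 0.7), skeleton = interSampleCoeffs)
```

Keeping the parameters in a flat vector is convenient for fitting, while the skeleton preserves which values belong to which group.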

Details

The conditions argument may include additional columns, which provide values for known parameters, such as time. Their names must match the names used in formulas. For example, if a formula is defined as mu * exp(-d * time), where time is the time point of the experiment, the conditions data.frame must contain a column named time; otherwise time is treated as a parameter to fit!
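The effect of a known column can be mimicked with base R expression tools. The sketch below is not the package's implementation; it only shows, for an assumed two-sample conditions data.frame, how substituting time leaves mu and d as the remaining free symbols, and how time stays free when no value is supplied.

```r
meanFormula <- quote(mu * exp(-d * time))

# Hypothetical conditions with a known labelling time per sample
conditions <- data.frame(
  condition = c("pull_down", "pull_down"),
  time = c(2, 4),
  stringsAsFactors = FALSE)

# Substitute the known time of every sample into the formula
evaluated <- lapply(conditions$time, function(tp) {
  do.call(substitute, list(meanFormula, list(time = tp)))
})

# Symbols left to fit once time is known
freeWithTime <- all.vars(evaluated[[1]])
# Without a time column, time would remain a free symbol as well
freeWithoutTime <- all.vars(meanFormula)

# The substituted expression can be evaluated for given parameter values
value <- eval(evaluated[[1]], list(mu = 10, d = 0.5))
```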

Examples

formulaIndexes <- list(
  total_fraction = 'total',
  flow_through = c('unlabelled', 'labelled'),
  pull_down = c('labelled', 'unlabelled'))

# Spike-ins definition for object creation
refGroup <- "total_fraction"
labelled <- c("spike1", "spike2")
unlabelled <- c("spike3", "spike4")

spikeLists <- list(
  # total samples are normalised using all spike-ins
  total_fraction = list(c(unlabelled, labelled)),
  # for every item in formulaIndexes we have a set of spike-ins:
  flow_through = list(unlabelled, labelled),
  pull_down = list(labelled, unlabelled))

# argument for the function:
spikeins <- list(refGroup = refGroup, spikeLists = spikeLists)