==============================================================================
=                                                                            =
=                       SPRINT BETA 0.3.0 RELEASE NOTES                      =
=                                                                            =
==============================================================================

Latest release SPRINT Beta 0.3.0 - 20.05.2011
Previous release SPRINT Beta 0.2.0 - 22.06.2010


-------
Content
-------

    1. Scope
    2. What's new in SPRINT Beta 0.3.0


--------
1. Scope
--------

The main change in SPRINT Beta 0.3.0 is the addition of a clustering function 
to the SPRINT library of parallelized statistical R functions. It performs a 
Parallel Partitioning Around Medoids (PPAM) and is based on the pam() 
function from the cluster R package 
(http://cran.r-project.org/web/packages/cluster/index.html). 

In addition, the SPRINT pcor() function, which computes the Pearson 
correlation in parallel, has been extended to allow the correlation between 
two matrices.


----------------------------------
2. What's new in SPRINT Beta 0.3.0
----------------------------------

1. A new clustering function, ppam(), has been added to the SPRINT 
   library. It performs a Parallel Partitioning Around Medoids (PPAM) and is 
   based on the pam() function from the cluster R package 
   (http://cran.r-project.org/web/packages/cluster/index.html). 
   
   The interface and parameters of the parallel function ppam() are similar 
   to those of the serial function pam(), but not identical. ppam() requires 
   a distance matrix as its input. Although ppam() cannot calculate the 
   distance matrix itself, this can easily be done using the SPRINT pcor() 
   function with the 'distance' parameter set to TRUE. 

   ppam (x, k, medoids = NULL, is_dist = inherits(x, "dist"),
         cluster.only = FALSE, do.swap = TRUE, trace.lev = 0)

   where:
       - 'x' is the input distance matrix or dissimilarity matrix, depending 
         on the value of the 'is_dist' argument. This can either be a matrix 
         or an ff object,
       - 'k' is a positive integer indicating the number of clusters. It must
         be less than the number of observations,
       - 'medoids' is either a vector specifying the initial 'k' medoids or 
         the default value NULL which indicates that the initial medoids will 
         be selected by the algorithm, 
       - 'is_dist' is a boolean indicating whether the input matrix is a 
         distance or dissimilarity matrix (TRUE) or a symmetric matrix 
         (FALSE),
       - 'cluster.only' is a boolean. When set to TRUE, only the clustering 
         is computed and returned. The default value is FALSE,
       - 'do.swap' is a boolean indicating if the swap phase of the algorithm 
         should take place. The default is TRUE. The swap phase is 
         computationally intensive and can be skipped by setting the 
         'do.swap' option to FALSE,
       - 'trace.lev' is an integer specifying the trace level for printing
         diagnostics during the build and swap phases of the algorithm. The 
         default value is 0, which does not produce any output. Higher 
         values print increasingly detailed information.

   Examples of valid calls to ppam():

    # Pre-processing step using pcor() to return an ff object containing a
    # distance matrix.
    mcor <- pcor(matrix(rnorm(10000), ncol=100), distance = TRUE)

    p1m <- ppam(mcor, 4)
    # 'medoids' must list exactly 'k' initial medoids
    p2m <- ppam(mcor, 2, medoids = c(1,16))
    p3m <- ppam(mcor, 3, trace.lev = 2)
    # 'x' is assumed to be a numeric data matrix with more than 12 rows
    p4m <- ppam(dist(x), 12)
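
   Because ppam() is based on pam(), the meaning of 'k', 'do.swap' and
   'cluster.only' can be explored serially before running under SPRINT. The
   following is a minimal sketch that uses only the serial pam() function
   from the cluster package, not ppam() itself. The data set, the variable
   names and the choice of three clusters are assumptions made for
   illustration.

```r
library(cluster)  # provides the serial pam() that ppam() is based on

set.seed(1)
# Hypothetical data: 30 observations in 2 dimensions, 10 around each of
# three well-separated centres.
centres <- rbind(c(0, 0), c(10, 10), c(-10, 10))
x <- centres[rep(1:3, each = 10), ] + matrix(rnorm(60, sd = 0.5), ncol = 2)

d <- dist(x)                       # object of class "dist"
fit <- pam(d, k = 3, diss = TRUE)  # build phase followed by swap phase

fit$id.med      # indices of the observations chosen as medoids
fit$clustering  # cluster label assigned to each observation

# Skip the swap phase and return only the clustering vector.
labels_only <- pam(d, k = 3, diss = TRUE,
                   do.swap = FALSE, cluster.only = TRUE)
```

   With 'cluster.only = TRUE' the serial function returns a plain integer
   vector of cluster labels instead of a full result object.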
                                
2. The pcor() function has been extended to allow the correlation between 
   two matrices. When two 2D arrays are given as input, pcor() correlates the 
   columns of the first matrix with the columns of the second matrix. When 
   only one 2D array is given as input, pcor() correlates each row with every 
   other row of the matrix.
   
   The interface to pcor() has changed from:

        pcor(data, distance = FALSE, caching_ = "mmeachflush", 
             filename_ = NULL)
    to:
        pcor(data_x, data_y, distance = FALSE, caching_ = "mmeachflush", 
             filename_ = NULL)
        
    where:
        - 'data_x' is the first input matrix,
        - 'data_y' is the second input matrix. It can be omitted, in which 
          case only 'data_x' is correlated,
        - 'distance' is a boolean indicating whether the output is to be a 
          distance matrix rather than the correlation coefficient matrix,
        - 'caching_' is the caching scheme for the backend, currently either
          "mmnoflush" or "mmeachflush" (flush mmpages at each swap). If no 
          scheme is specified, the default value is "mmeachflush",
        - 'filename_' is an optional string. It specifies the name of a file 
          where the results will be saved. By default, the results are saved 
          to a temporary file that is deleted after exiting from SPRINT.

    Examples of valid calls to pcor():
        - ff_obj <- pcor(t(inData))
        - ff_obj <- pcor(t(inData_x), t(inData_y))
        - ff_obj <- pcor(t(inData), filename_="output.dat")
        - ff_obj <- pcor(t(inData), distance=TRUE, filename_="output.dat")
        - ff_obj <- pcor(data, caching_="mmeachflush", filename_="output.dat")
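
    The shape of a two-matrix correlation can be checked serially with the 
    base R cor() function. The sketch below does not call pcor() and makes no 
    claim about how pcor() orients its input (the calls above pass transposed 
    data). The matrices 'x' and 'y' and their sizes are assumptions made for 
    illustration.

```r
set.seed(1)
x <- matrix(rnorm(50 * 4), ncol = 4)  # 50 observations of 4 variables
y <- matrix(rnorm(50 * 3), ncol = 3)  # 50 observations of 3 variables

# Two inputs: column i of x is correlated with column j of y (4 x 3 result).
r_xy <- cor(x, y)

# One input: every column of x is correlated with every other (4 x 4 result).
r_xx <- cor(x)

dim(r_xy)
dim(r_xx)
```

    Each entry r_xy[i, j] equals cor(x[, i], y[, j]), which makes the serial 
    result a convenient reference when validating a parallel run on a small 
    data set.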


==============================================================================
SPRINT Team

email: sprint@ed.ac.uk
http://www.r-sprint.org

Copyright (c) 2011 The University of Edinburgh.
==============================================================================
