% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/computeLm.R
\name{computeLm}
\alias{computeLm}
\title{Fit Linear Model and return its coefficients.}
\usage{
computeLm(channel, tableName, formula, tableInfo = NULL, categories = NULL,
  sampleSize = 1000, where = NULL, test = FALSE)
}
\arguments{
\item{channel}{connection object as returned by \code{\link{odbcConnect}}}

\item{tableName}{Aster table name}

\item{formula}{an object of class \code{"formula"} (or one that can be coerced to that class): 
a symbolic description of the model to be fitted. The details of model 
specification are given under \sQuote{Details}.}

\item{tableInfo}{pre-built table summary with data types}

\item{categories}{vector of names of columns that contain categorical data. Listing
character columns is optional, because they are automatically treated as categorical
predictors. A numeric column that holds categorical data, however, must be listed here
for the model to treat it as categorical. Take care not to declare columns with many
distinct values (more than approximately 10) as categorical, because each value adds a
dummy predictor variable to the model.}

\item{sampleSize}{the function always computes the regression model coefficients on all
data in the table, but it computes predictions and builds the returned object of
\code{\link{class}} "lm" from a sample of the data. \code{sampleSize} is the absolute
number of rows in that sample. Do not overestimate it, because all sampled rows are
loaded into memory. The special value \code{"all"} or \code{"ALL"} includes all data in
the computation.}

\item{where}{criteria that table rows must satisfy to be included in the
computation, expressed as SQL predicates (the contents of a \code{WHERE}
clause, without the \code{WHERE} keyword).}

\item{test}{logical: if \code{TRUE}, only show what would be done, without executing it
(similar to the \code{test} argument of \link{RODBC} functions such as \link{sqlSave}).}
}
\value{
\code{computeLm} returns an object of \code{\link{class}} \code{c("toalm", "lm")}.

The function \code{summary} .....

For backward compatibility, the coefficients are also output as a data frame
containing 3 columns:
\describe{
  \item{coefficient_name}{name of the predictor table column; the zeroth (intercept) coefficient is named \code{"0"}}
  \item{coefficient_index}{index of the predictor table column, starting at 0 for the intercept}
  \item{value}{coefficient value}
}
}
\description{
Fits a linear model to an Aster table and outputs its coefficients. The model
is specified by an R formula whose response and predictor variables are column
names, written as in the \code{\link{lm}} function (although fewer formula
features are supported). The zeroth coefficient is the intercept.
}
\details{
Models for \code{computeLm} are specified symbolically. A typical model has the form 
\code{response ~ terms}, where \code{response} is the (numeric) response column and
\code{terms} is a series of column terms that specifies a linear predictor for
\code{response}. A terms specification of the form \code{first + second} indicates all
the terms in \code{first} together with all the terms in \code{second}, with duplicates
removed. Specifications of the form \code{first:second} and \code{first*second}
(interactions) are not supported yet.
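
The duplicate-removal rule is the one base R applies when parsing a formula, and
\code{computeLm} follows \code{\link{lm}} formula syntax, so the resulting term list
can presumably be previewed locally with \code{\link{terms}}. This is a base R
illustration only, not a call into \code{computeLm}:
\preformatted{
attr(terms(ba ~ rbi + bb + rbi), "term.labels")
# [1] "rbi" "bb"
}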
}
\examples{
if(interactive()){
# initialize connection to Lahman baseball database in Aster
library(RODBC)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# batting average explained by runs batted in (rbi), walks (bb) and strikeouts (so)
lm1 = computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so)
summary(lm1)

# with categorical predictor league (lgid) and explicit sample size,
# restricted to AL and NL records with more than 30 at bats (ab)
lm2 = computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so + lgid,
                sampleSize=10000, where="lgid in ('AL','NL') and ab > 30")
summary(lm2)
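
# with test=TRUE the function only shows what would be done, without
# fitting the model in Aster (compare the test argument of RODBC's sqlSave);
# the exact form of what is shown is not specified in this page
computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so,
          test=TRUE)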
}
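
# Local illustration in base R only (no Aster connection required) of how a
# categorical predictor expands into dummy variables. The toy data frame is
# hypothetical, and computeLm is assumed (not confirmed here) to encode
# categories the way base R's model.matrix() does with treatment contrasts,
# where a factor with k levels adds k - 1 dummy columns next to the intercept.
toy = data.frame(
  ba   = c(0.250, 0.310, 0.275, 0.290, 0.240, 0.300),
  rbi  = c(40, 85, 60, 72, 35, 90),
  lgid = factor(c("AL", "NL", "AL", "NL", "AL", "NL"))
)
mm = model.matrix(ba ~ rbi + lgid, data = toy)
colnames(mm)  # "(Intercept)" "rbi" "lgidNL"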

}

