% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data-ucla_textbooks_f18.R
\docType{data}
\name{ucla_textbooks_f18}
\alias{ucla_textbooks_f18}
\title{Sample of UCLA course textbooks for Fall 2018}
\format{
A data frame with 201 observations on the following 20 variables.
\describe{
  \item{year}{Year the course was offered}
  \item{term}{Term the course was offered}
  \item{subject}{Subject}
  \item{subject_abbr}{Subject abbreviation, if any}
  \item{course}{Course name}
  \item{course_num}{Course number, complete}
  \item{course_numeric}{Course number, numeric only}
  \item{seminar}{Boolean indicating whether this is a seminar course}
  \item{ind_study}{Boolean indicating whether this is some form of independent study}
  \item{apprenticeship}{Boolean indicating whether this is an apprenticeship}
  \item{internship}{Boolean indicating whether this is an internship}
  \item{honors_contracts}{Boolean indicating whether this is an honors contracts course}
  \item{laboratory}{Boolean indicating whether this is a lab}
  \item{special_topic}{Boolean indicating whether this is any of the special types of courses listed}
  \item{textbook_isbn}{Textbook ISBN}
  \item{bookstore_new}{New price at the UCLA bookstore}
  \item{bookstore_used}{Used price at the UCLA bookstore}
  \item{amazon_new}{New price sold by Amazon}
  \item{amazon_used}{Used price sold by Amazon}
  \item{notes}{Any relevant notes}
}
}
\source{
\url{https://sa.ucla.edu/ro/public/soc}

\url{https://ucla.verbacompare.com}

\url{https://www.amazon.com}
}
\usage{
ucla_textbooks_f18
}
\description{
A sample of UCLA courses offered in Fall 2018 was collected, and the
corresponding textbook prices were collected from the UCLA bookstore and
from Amazon.
}
\details{
An earlier data set was collected from UCLA courses in Spring 2010, and
Amazon prices at that time were found to be almost uniformly lower than
those of the UCLA bookstore.  In this 2018 sample, the UCLA bookstore is
about even with Amazon on the vast majority of titles, and there is no
statistically significant difference in the sample data.

When a course required more than one book, the most expensive required book
was generally the one recorded.

We advocate using raw price differences rather than percent differences
because a 20\% savings on a $10 book is minor relative to a 20\% savings on
a $100 book.  With percent differences, a small and largely insignificant
price difference on low-priced books would numerically (but not practically)
balance a moderate but important price difference on more expensive books.
So while raw differences tend to be a bit less sensitive in detecting
\emph{some} effect, we believe the absolute difference compares prices in a
more meaningful way.

Used prices include the shipping cost but do not include tax.  The used
prices are a more nuanced comparison, since these are all from third-party
sellers.  Amazon is often more a marketplace than a retail site at this
point, and many people now buy from third-party sellers on Amazon without
realizing it.  The relationship Amazon has with third-party sellers is also
challenging.  Given the frequently changing dynamics in this space, we don't
think any analysis here will be very reliable for long-term insights, since
products from these sellers change frequently in quantity and price.  For
this reason, we focus only on new books sold directly by Amazon in our
comparison.  In a future round of data collection, it may be interesting to
explore whether the dynamics have changed in the used market.
}
\examples{

library(ggplot2)
library(dplyr)

ggplot(ucla_textbooks_f18, aes(x = bookstore_new, y = amazon_new)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "orange") +
  labs(x = "UCLA Bookstore price", y = "Amazon price",
       title = "Amazon vs. UCLA Bookstore prices of new textbooks",
       subtitle = "Orange line represents y = x")

# Price difference for new books (bookstore minus Amazon);
# the outliers listed below were double-checked for accuracy
ucla_textbooks_f18_with_diff <- ucla_textbooks_f18 \%>\%
  mutate(diff = bookstore_new - amazon_new)

ucla_textbooks_f18_with_diff \%>\%
  filter(diff > 20 | diff < -20)

# Distribution of price differences
ggplot(ucla_textbooks_f18_with_diff, aes(x = diff)) +
  geom_histogram(binwidth = 5)
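
# A hedged sketch, not part of the original analysis: percent differences
# relative to the bookstore price, for contrast with the raw differences
# discussed in Details. The names pct_diff and ucla_textbooks_f18_with_pct
# are assumptions introduced here for illustration only.
ucla_textbooks_f18_with_pct <- mutate(
  ucla_textbooks_f18_with_diff,
  pct_diff = 100 * diff / bookstore_new
)

# Low-priced books can show large percent differences from small raw gaps
ggplot(ucla_textbooks_f18_with_pct, aes(x = pct_diff)) +
  geom_histogram(binwidth = 5)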

# t-test of price differences
t.test(ucla_textbooks_f18_with_diff$diff)
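
# A minimal sketch of an equivalent formulation: a paired t-test on the two
# new-price columns gives the same result as the one-sample t-test on their
# differences above, since incomplete pairs are dropped in both cases.
t.test(ucla_textbooks_f18$bookstore_new, ucla_textbooks_f18$amazon_new,
       paired = TRUE)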

}
\seealso{
\code{\link{textbooks}}, \code{\link{ucla_f18}}
}
\keyword{datasets}
