Statement of Business Task:

Analyze smart device usage data to identify trends in how consumers use these devices, then apply those trends to one of Bellabeat’s products to gain actionable insights that can guide data-driven marketing strategies and unlock Bellabeat’s growth potential.

Key stakeholders:

Urška Sršen - Bellabeat’s Cofounder & CCO

Sando Mur - Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

Bellabeat marketing analytics team - A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Questions for the analysis:

What are some trends in smart device usage?

How could these trends apply to Bellabeat customers?

How could these trends help influence Bellabeat marketing strategy?

Installing and Loading Packages
install.packages("tidyverse", quiet = TRUE)
## package 'tidyverse' successfully unpacked and MD5 sums checked
install.packages("skimr", quiet = TRUE)
## package 'skimr' successfully unpacked and MD5 sums checked
install.packages("janitor", quiet = TRUE)
## package 'janitor' successfully unpacked and MD5 sums checked
install.packages("dplyr", quiet = TRUE)
## package 'dplyr' successfully unpacked and MD5 sums checked
install.packages("sqldf", quiet = TRUE)
## package 'sqldf' successfully unpacked and MD5 sums checked
install.packages("reshape2", quiet = TRUE)
## package 'reshape2' successfully unpacked and MD5 sums checked
library("tidyverse", quietly = TRUE)  # also attaches dplyr, readr, tidyr, and ggplot2
library("skimr", quietly = TRUE)
library("janitor", quietly = TRUE)
library("dplyr", quietly = TRUE)
library("sqldf", quietly = TRUE)
library("reshape2", quietly = TRUE)

Importing Datasets

setwd("/Users/Jeong Park/OneDrive/Documents/Fitabase Data 2016.04.12 - 2016.05.12")
daily_Activity <- read_csv("dailyActivity_merged.csv", show_col_types = FALSE)
daily_Calories <- read_csv("dailyCalories_merged.csv", show_col_types = FALSE)
daily_Intensities <- read_csv("dailyIntensities_merged.csv", show_col_types = FALSE)
daily_Steps <- read_csv("dailySteps_merged.csv", show_col_types = FALSE)
sleep_Day <- read_csv("sleepDay_merged.csv", show_col_types = FALSE)
weight_Loginfo <- read_csv("weightLogInfo_merged.csv", show_col_types = FALSE)

Observing the data

head(daily_Activity)
## # A tibble: 6 x 15
##        Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie~
##     <dbl> <chr>             <dbl>         <dbl>           <dbl>            <dbl>
## 1  1.50e9 4/12/2016         13162          8.5             8.5                 0
## 2  1.50e9 4/13/2016         10735          6.97            6.97                0
## 3  1.50e9 4/14/2016         10460          6.74            6.74                0
## 4  1.50e9 4/15/2016          9762          6.28            6.28                0
## 5  1.50e9 4/16/2016         12669          8.16            8.16                0
## 6  1.50e9 4/17/2016          9705          6.48            6.48                0
## # ... with 9 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>
glimpse(daily_Activity)
## Rows: 940
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate             <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ TotalSteps               <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes        <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes      <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes     <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes         <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories                 <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~
head(daily_Calories)
## # A tibble: 6 x 3
##           Id ActivityDay Calories
##        <dbl> <chr>          <dbl>
## 1 1503960366 4/12/2016       1985
## 2 1503960366 4/13/2016       1797
## 3 1503960366 4/14/2016       1776
## 4 1503960366 4/15/2016       1745
## 5 1503960366 4/16/2016       1863
## 6 1503960366 4/17/2016       1728
glimpse(daily_Calories)
## Rows: 940
## Columns: 3
## $ Id          <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366~
## $ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/2016", "4/16/~
## $ Calories    <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775~
head(daily_Intensities) 
## # A tibble: 6 x 10
##           Id ActivityDay SedentaryMinutes LightlyActiveMinutes FairlyActiveMinu~
##        <dbl> <chr>                  <dbl>                <dbl>             <dbl>
## 1 1503960366 4/12/2016                728                  328                13
## 2 1503960366 4/13/2016                776                  217                19
## 3 1503960366 4/14/2016               1218                  181                11
## 4 1503960366 4/15/2016                726                  209                34
## 5 1503960366 4/16/2016                773                  221                10
## 6 1503960366 4/17/2016                539                  164                20
## # ... with 5 more variables: VeryActiveMinutes <dbl>,
## #   SedentaryActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, VeryActiveDistance <dbl>
glimpse(daily_Intensities)
## Rows: 940
## Columns: 10
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDay              <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ SedentaryMinutes         <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ LightlyActiveMinutes     <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ FairlyActiveMinutes      <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ VeryActiveMinutes        <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
head(daily_Steps)
## # A tibble: 6 x 3
##           Id ActivityDay StepTotal
##        <dbl> <chr>           <dbl>
## 1 1503960366 4/12/2016       13162
## 2 1503960366 4/13/2016       10735
## 3 1503960366 4/14/2016       10460
## 4 1503960366 4/15/2016        9762
## 5 1503960366 4/16/2016       12669
## 6 1503960366 4/17/2016        9705
glimpse(daily_Steps)
## Rows: 940
## Columns: 3
## $ Id          <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366~
## $ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/2016", "4/16/~
## $ StepTotal   <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 1054~
head(sleep_Day)
## # A tibble: 6 x 5
##           Id SleepDay  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>                 <dbl>              <dbl>          <dbl>
## 1 1503960366 4/12/2016                 1                327            346
## 2 1503960366 4/13/2016                 2                384            407
## 3 1503960366 4/15/2016                 1                412            442
## 4 1503960366 4/16/2016                 2                340            367
## 5 1503960366 4/17/2016                 1                700            712
## 6 1503960366 4/19/2016                 1                304            320
glimpse(sleep_Day)
## Rows: 413
## Columns: 5
## $ Id                 <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
## $ SleepDay           <chr> "4/12/2016", "4/13/2016", "4/15/2016", "4/16/2016",~
## $ TotalSleepRecords  <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
## $ TotalTimeInBed     <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~
head(weight_Loginfo)
## # A tibble: 6 x 8
##           Id Date      WeightKg WeightPounds   Fat   BMI IsManualReport    LogId
##        <dbl> <chr>        <dbl>        <dbl> <dbl> <dbl> <lgl>             <dbl>
## 1 1503960366 5/2/2016      52.6         116.    22  22.6 TRUE            1.46e12
## 2 1503960366 5/3/2016      52.6         116.    NA  22.6 TRUE            1.46e12
## 3 1927972279 4/13/2016    134.          294.    NA  47.5 FALSE           1.46e12
## 4 2873212765 4/21/2016     56.7         125.    NA  21.5 TRUE            1.46e12
## 5 2873212765 5/12/2016     57.3         126.    NA  21.7 TRUE            1.46e12
## 6 4319703577 4/17/2016     72.4         160.    25  27.5 TRUE            1.46e12
glimpse(weight_Loginfo)
## Rows: 67
## Columns: 8
## $ Id             <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212~
## $ Date           <chr> "5/2/2016", "5/3/2016", "4/13/2016", "4/21/2016", "5/12~
## $ WeightKg       <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, ~
## $ WeightPounds   <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6~
## $ Fat            <dbl> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ BMI            <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,~
## $ IsManualReport <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, ~
## $ LogId          <dbl> 1.46223e+12, 1.46232e+12, 1.46051e+12, 1.46128e+12, 1.4~
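The glimpse() output shows that every date column (ActivityDate, ActivityDay, SleepDay, and Date) was read as <chr>. Text dates join correctly as long as both sides use the same format, but they do not sort or compare as calendar dates. The sketch below uses a few hypothetical date strings, not the real files, to show the problem and one base R fix: as.Date() with an explicit month/day/year format.

```r
# Hypothetical date strings in the Fitabase month/day/year style
date_chr <- c("5/2/2016", "5/10/2016", "4/30/2016")

# Compared as text (C-locale radix sort), "5/10/2016" lands before "5/2/2016"
sorted_text <- sort(date_chr, method = "radix")
print(sorted_text)  # "4/30/2016" "5/10/2016" "5/2/2016"

# Parsing with an explicit format gives Date values that sort by calendar day
date_parsed <- as.Date(date_chr, format = "%m/%d/%Y")
print(sort(date_parsed))  # "2016-04-30" "2016-05-02" "2016-05-10"
```

On the real data, the same as.Date() call could be applied to a column such as daily_Activity$ActivityDate (lubridate::mdy() is an alternative). Both sides of any later join would need to be converted the same way.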

Cleaning the Data

Most of these data frames are redundant: daily_Activity already contains the calorie, intensity, and step columns stored separately in daily_Calories, daily_Intensities, and daily_Steps. The only new information comes from sleep_Day, so daily_Activity only needs to be merged with it on the shared Id and Date keys. Matching on keys rather than on measurement columns such as Calories or TotalSteps avoids pairing unrelated days that happen to share a value, and removing exact duplicate rows from sleep_Day first prevents a repeated sleep record from doubling a day.
daily_Activity <- daily_Activity %>%
  rename(Date = ActivityDate)

# Drop exact duplicate rows so a repeated sleep record cannot double a user-day
sleep_Day <- sleep_Day %>%
  rename(Date = SleepDay) %>%
  distinct()

# merge() performs an inner join by default (all = FALSE) and returns a data frame
daily_userData <- merge(daily_Activity, sleep_Day, by = c("Id", "Date"))

head(daily_userData)
##           Id      Date TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016      13162          8.50            8.50
## 2 1503960366 4/13/2016      10735          6.97            6.97
## 3 1503960366 4/15/2016       9762          6.28            6.28
## 4 1503960366 4/16/2016      12669          8.16            8.16
## 5 1503960366 4/17/2016       9705          6.48            6.48
## 6 1503960366 4/19/2016      15506          9.88            9.88
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
##   TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1                 1                327            346
## 2                 2                384            407
## 3                 1                412            442
## 4                 2                340            367
## 5                 1                700            712
## 6                 1                304            320
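The choice of join keys is what keeps daily_userData at one row per user-day. The sketch below uses small hypothetical tibbles (activity, calories, and sleep are made up for illustration) to show how joining on a measurement column multiplies rows, while joining on Id and Date does not. It assumes dplyr 1.1.0 or later, which provides the relationship argument.

```r
library(dplyr)

# Hypothetical toy data: one user, three days; two days share a calorie total
activity <- tibble(
  Id = 1,
  Date = c("4/12/2016", "4/13/2016", "4/14/2016"),
  Calories = c(1800, 1800, 2000)
)
calories <- activity %>% select(Id, ActivityDay = Date, Calories)

# Joining on a measurement column pairs every day with every day of equal value
by_value <- inner_join(activity, calories, by = c("Id", "Calories"),
                       relationship = "many-to-many")
nrow(by_value)    # 5: 2 x 2 rows for the 1800-calorie days, plus 1 for 2000

# Joining on the user-day keys keeps exactly one row per day
by_key <- inner_join(activity, calories, by = c("Id", "Date" = "ActivityDay"))
nrow(by_key)      # 3

# A repeated sleep record would still double a day unless duplicates are removed
sleep <- tibble(
  Id = 1,
  Date = c("4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(420, 420, 380)
)
with_sleep <- inner_join(activity, distinct(sleep), by = c("Id", "Date"))
nrow(with_sleep)  # 2
```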

Check for NA values to see whether they will affect the analysis.

cbind(lapply(lapply(daily_userData, is.na),sum))
##                          [,1]
## Id                       0   
## Date                     0   
## TotalSteps               0   
## TotalDistance            0   
## TrackerDistance          0   
## LoggedActivitiesDistance 0   
## VeryActiveDistance       0   
## ModeratelyActiveDistance 0   
## LightActiveDistance      0   
## SedentaryActiveDistance  0   
## VeryActiveMinutes        0   
## FairlyActiveMinutes      0   
## LightlyActiveMinutes     0   
## SedentaryMinutes         0   
## Calories                 0   
## TotalSleepRecords        0   
## TotalMinutesAsleep       0   
## TotalTimeInBed           0
sum(is.na(weight_Loginfo))
## [1] 65
n_distinct(weight_Loginfo$Id)
## [1] 8
nrow(weight_Loginfo)
## [1] 67
weight_Loginfo has 65 missing values, all in the Fat column, so only 2 of its 67 rows record body fat. It also covers only 8 participants. With so little data, any recommendations drawn from this data frame would be unreliable, so it is excluded from the analysis.
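sum(is.na(weight_Loginfo)) totals the missing values over the entire data frame; counting them per column shows where they sit. A minimal sketch on a hypothetical four-row data frame shaped like weight_Loginfo:

```r
# Hypothetical toy data shaped like weight_Loginfo: Fat is mostly unrecorded
weight <- data.frame(
  Id = c(1, 1, 2, 3),
  WeightKg = c(52.6, 52.6, 133.5, 56.7),
  Fat = c(22, NA, NA, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(weight))
na_counts                  # Id = 0, WeightKg = 0, Fat = 3

# Share of rows with a Fat value, and the number of distinct participants
mean(!is.na(weight$Fat))   # 0.25
length(unique(weight$Id))  # 3
```

On the real data, colSums(is.na(weight_Loginfo)) would give the same per-column breakdown for all eight columns.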

Analysis

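Summarizing the data for analysis
daily_userData %>%
  select(TotalSteps, TotalDistance, VeryActiveMinutes, FairlyActiveMinutes,
         LightlyActiveMinutes, SedentaryMinutes, Calories,
         TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()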
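summarize() only returns the statistics it is asked for, which is why calling it with no expressions produces a data frame with no columns. A minimal sketch on hypothetical toy data of how it is normally paired with named expressions and group_by() to summarize each participant:

```r
library(dplyr)

# Hypothetical toy data: two users, two days each
toy <- tibble(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(10000, 8000, 3000, 5000),
  Calories = c(2000, 1900, 1600, 1700)
)

# With no summary expressions, summarize() has nothing to compute
empty <- summarize(toy)
dim(empty)  # 1 row, 0 columns

# Naming each statistic, per user, gives a usable summary table
per_user <- toy %>%
  group_by(Id) %>%
  summarize(
    days = n(),
    mean_steps = mean(TotalSteps),
    mean_calories = mean(Calories)
  )
per_user    # Id 1: 9000 steps, 1950 kcal; Id 2: 4000 steps, 1650 kcal
```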

Visualizations

The scatter plot below depicts the relationship between the time (in hours) spent at each activity level and total daily calories burned. Very active time and calories burned show a clear positive relationship, suggesting that participants who did more vigorous physical activity burned more calories. Fairly active time shows the same trend, but noticeably weaker. By contrast, the slopes for lightly active and sedentary time are close to zero: time spent at these levels explains little of the variation in calories burned, which more likely reflects other factors such as each participant’s metabolism.
daily_userData <- daily_userData %>% mutate(TimeVeryActive = VeryActiveMinutes/60,
                                            TimeFairlyActive = FairlyActiveMinutes/60,
                                            TimeLightlyActive = LightlyActiveMinutes/60,
                                            TimeSedentary = SedentaryMinutes/60)

Activitytime.gathered <- daily_userData %>%
  gather(key = 'variables', value = 'ActivityLevel', TimeVeryActive:TimeSedentary)

head(Activitytime.gathered)
##           Id      Date TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016      13162          8.50            8.50
## 2 1503960366 4/13/2016      10735          6.97            6.97
## 3 1503960366 4/15/2016       9762          6.28            6.28
## 4 1503960366 4/16/2016      12669          8.16            8.16
## 5 1503960366 4/17/2016       9705          6.48            6.48
## 6 1503960366 4/19/2016      15506          9.88            9.88
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
##   TotalSleepRecords TotalMinutesAsleep TotalTimeInBed      variables
## 1                 1                327            346 TimeVeryActive
## 2                 2                384            407 TimeVeryActive
## 3                 1                412            442 TimeVeryActive
## 4                 2                340            367 TimeVeryActive
## 5                 1                700            712 TimeVeryActive
## 6                 1                304            320 TimeVeryActive
##   ActivityLevel
## 1     0.4166667
## 2     0.3500000
## 3     0.4833333
## 4     0.6000000
## 5     0.6333333
## 6     0.8333333
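gather() is superseded in current tidyr; pivot_longer() performs the same wide-to-long reshape. A minimal sketch on a hypothetical two-row data frame with only two of the Time columns, comparing the two calls:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two user-days with hours in two activity levels
wide <- data.frame(
  Id = c(1, 2),
  Calories = c(2000, 1700),
  TimeVeryActive = c(0.5, 0.1),
  TimeSedentary = c(12, 15)
)

# pivot_longer() stacks the Time* columns into a name column and a value column
long <- wide %>%
  pivot_longer(
    cols = c(TimeVeryActive, TimeSedentary),
    names_to = "variables",
    values_to = "ActivityLevel"
  )

# The superseded gather() call produces the same rows in a different order
gathered <- gather(wide, key = "variables", value = "ActivityLevel",
                   TimeVeryActive, TimeSedentary)
```

The two results differ only in row order: pivot_longer() keeps each user-day's values together, while gather() stacks one variable after another, which is why head(Activitytime.gathered) shows only TimeVeryActive rows.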
Activitytime.gathered <- Activitytime.gathered %>%
  mutate(variables = factor(variables, levels = c('TimeVeryActive', 'TimeFairlyActive', 'TimeLightlyActive', 'TimeSedentary')))
ggplot(Activitytime.gathered, aes(x = ActivityLevel, y = Calories, color = ActivityLevel)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ x) +
  facet_wrap(~variables, scales = 'free') +
  scale_color_gradient(low = "blue", high = "red") +
  labs(title = "Relationship Between Activity Level and Calories Burned",
       x = "Hours per day", y = "Calories burned")
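The fitted lines drawn by stat_smooth(method = lm) are ordinary least-squares fits, one per panel, so their slopes can also be read off numerically. A minimal sketch on hypothetical toy data in the same long format (variables, ActivityLevel, Calories) that fits one lm() per activity level:

```r
# Hypothetical toy data in the long format: hours per level and daily calories
toy_long <- data.frame(
  variables = rep(c("TimeVeryActive", "TimeSedentary"), each = 4),
  ActivityLevel = c(0, 0.5, 1, 1.5, 10, 12, 14, 16),
  Calories = c(1800, 2000, 2200, 2400, 1900, 2100, 2100, 1900)
)

# Fit Calories ~ ActivityLevel separately for each level and keep the slope
slopes <- sapply(
  split(toy_long, toy_long$variables),
  function(d) coef(lm(Calories ~ ActivityLevel, data = d))[["ActivityLevel"]]
)
slopes  # calories per extra hour: about 400 for TimeVeryActive, about 0 for TimeSedentary
```

Applied to Activitytime.gathered, the same split() and lm() pattern would put numbers on the steep and near-zero slopes described for each panel.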

This figure shows the relationship between total steps taken and calories burned. There appears to be a positive relationship, consistent with the expectation that participants who take more steps burn more calories.
ggplot(daily_userData, aes(x = TotalSteps, y = Calories, color = TotalSteps)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ x) +
  scale_color_gradient(low = "red", high = "purple") +
  labs(title = 'Relationship between Total Steps and Calories Burned') +
  theme(legend.position = "none")
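The strength of a linear relationship like the one between steps and calories can be summarized with Pearson's correlation coefficient from base R's cor(). A minimal sketch on four hypothetical user-days:

```r
# Hypothetical toy values for four user-days
steps <- c(2000, 6000, 10000, 14000)
calories <- c(1600, 1900, 2100, 2500)

# Pearson's r: +1 is a perfect increasing line, 0 means no linear relationship
r <- cor(steps, calories)
round(r, 3)  # 0.992
```

On the real data this would be cor(daily_userData$TotalSteps, daily_userData$Calories); cor.test() adds a confidence interval and p-value.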

The scatter plot below shows the relationship between total sleep and the time (in hours) spent at each activity level. For very, fairly, and lightly active time the slopes are close to zero, which suggests that time spent active has little direct relationship with how long participants sleep. The slope for sedentary time, however, is negative: participants who spent more of the day sedentary tended to sleep less. Part of this pattern may be mechanical, since sleep and the activity categories share the same 24 hours, so a longer night can leave fewer minutes to be logged as sedentary.
daily_userData <- daily_userData %>% mutate(TotalSleep = TotalMinutesAsleep/60)

Sleepquality.gathered <- daily_userData %>%
  gather(key = 'variables', value = 'ActivityLevel', TimeVeryActive:TimeSedentary)

head(Sleepquality.gathered)
##           Id      Date TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016      13162          8.50            8.50
## 2 1503960366 4/13/2016      10735          6.97            6.97
## 3 1503960366 4/15/2016       9762          6.28            6.28
## 4 1503960366 4/16/2016      12669          8.16            8.16
## 5 1503960366 4/17/2016       9705          6.48            6.48
## 6 1503960366 4/19/2016      15506          9.88            9.88
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
##   TotalSleepRecords TotalMinutesAsleep TotalTimeInBed TotalSleep      variables
## 1                 1                327            346   5.450000 TimeVeryActive
## 2                 2                384            407   6.400000 TimeVeryActive
## 3                 1                412            442   6.866667 TimeVeryActive
## 4                 2                340            367   5.666667 TimeVeryActive
## 5                 1                700            712  11.666667 TimeVeryActive
## 6                 1                304            320   5.066667 TimeVeryActive
##   ActivityLevel
## 1     0.4166667
## 2     0.3500000
## 3     0.4833333
## 4     0.6000000
## 5     0.6333333
## 6     0.8333333
Sleepquality.gathered <- Sleepquality.gathered %>%
  mutate(variables = factor(variables, levels = c('TimeVeryActive', 'TimeFairlyActive', 'TimeLightlyActive', 'TimeSedentary')))

ggplot(Sleepquality.gathered, aes(x = ActivityLevel, y = TotalSleep, color = TotalSleep)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ x) +
  facet_wrap(~variables, scales = 'free') +
  scale_color_gradient(low = "black", high = "yellow") +
  labs(title = "Relationship Between Total Sleep and Activity Level",
       x = "Hours per day", y = "Hours asleep")
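In both faceted plots, converting variables to a factor is what controls the panel order: facet_wrap() lays panels out in factor-level order, and a plain character column would be ordered alphabetically. A minimal sketch with a hypothetical vector of labels:

```r
# Hypothetical labels in the order they might appear in the data
facet_labels <- c("TimeSedentary", "TimeVeryActive", "TimeLightlyActive", "TimeFairlyActive")

# A plain factor orders its levels alphabetically
default_levels <- levels(factor(facet_labels))
default_levels  # "TimeFairlyActive" "TimeLightlyActive" "TimeSedentary" "TimeVeryActive"

# Explicit levels fix the order, from most to least intense
ordered_levels <- c("TimeVeryActive", "TimeFairlyActive", "TimeLightlyActive", "TimeSedentary")
custom_levels <- levels(factor(facet_labels, levels = ordered_levels))
custom_levels   # matches ordered_levels
```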

Conclusion

The analysis revealed trends that could inform Bellabeat’s marketing strategy in the global smart device market.

These insights were:

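  1. Time spent very active, and to a lesser extent fairly active, was positively associated with calories burned, as was the total number of steps taken.

  2. Participants who spent more time sedentary tended to sleep less, while time spent lightly, fairly, or very active showed little direct relationship with sleep duration.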

Recommendations

  1. Encourage users to set a daily step goal. Enable notifications that nudge them toward the goal and, once they achieve it, prompt them to set a higher goal when they feel ready.

  2. Add a feature to the Bellabeat app that prompts users to aim for at least 30 minutes of moderate activity when their data shows they are often sedentary during the day.

  3. Have the app send encouraging, motivating messages, especially after users have been sedentary for an extended period.

  4. Enhance the app to alert users to habits that may disrupt sleep, such as irregular sleep schedules or long sedentary stretches during the day.
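Recommendation 2 depends on deciding when a user is "often sedentary". One possible rule, sketched below on hypothetical per-day data, flags users who log fewer than 30 fairly-plus-very active minutes on at least half of their recorded days. Treating FairlyActiveMinutes plus VeryActiveMinutes as "moderate activity" and using a 50% threshold are both assumptions for illustration, not Bellabeat definitions.

```r
library(dplyr)

# Hypothetical per-day data for two users
days <- tibble(
  Id = c(1, 1, 1, 2, 2, 2),
  FairlyActiveMinutes = c(10, 5, 40, 20, 15, 25),
  VeryActiveMinutes = c(5, 0, 10, 30, 20, 10)
)

# Hypothetical thresholds: 30 active minutes per day, low on half of all days
min_active_minutes <- 30
min_share_low_days <- 0.5

flagged <- days %>%
  # A day counts as low when fairly + very active minutes fall short
  mutate(low_activity = FairlyActiveMinutes + VeryActiveMinutes < min_active_minutes) %>%
  group_by(Id) %>%
  summarize(share_low_days = mean(low_activity)) %>%
  # Alert users whose share of low-activity days reaches the threshold
  mutate(send_alert = share_low_days >= min_share_low_days)
flagged  # user 1: 2 of 3 days low, alert; user 2: no low days, no alert
```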