> ### 2.13 Applying dplyr
>
> #### 2.13.1 Applied data preprocessing 1
>
> P1 = Audi %>%
+   mutate(year_G = ifelse(year < 2000, 1990,
+                          ifelse(year < 2010, 2000, 2010))) %>%
+   group_by(year_G, transmission) %>%
+   summarise(Count = n(),
+             Mean_Price = mean(price),
+             Median_Price = median(price)) %>%
+   mutate(Perc = Count / sum(Count)) %>%
+   arrange(year_G, -Mean_Price)
`summarise()` has grouped output by 'year_G'. You can override using the `.groups` argument.
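
The message above says that `summarise()` dropped only the last grouping variable (`transmission`), so the result is still grouped by `year_G`. That is why the following `mutate()` computes `Perc` as a share within each year group. The sketch below states this behaviour explicitly with `.groups = "drop_last"`, which should also silence the message. It uses hypothetical toy data (`toy`, `P1_toy`) rather than the Audi file.

```r
library(dplyr)

# Hypothetical toy data standing in for the Audi data frame
toy <- data.frame(
  year_G       = c(2000, 2000, 2000, 2010, 2010),
  transmission = c("Manual", "Manual", "Automatic", "Manual", "Automatic"),
  price        = c(4000, 5000, 9000, 16000, 28000)
)

P1_toy <- toy %>%
  group_by(year_G, transmission) %>%
  # "drop_last" keeps the year_G grouping and makes the choice explicit
  summarise(Count = n(),
            Mean_Price = mean(price),
            .groups = "drop_last") %>%
  # still grouped by year_G, so Perc sums to 1 inside each year group
  mutate(Perc = Count / sum(Count)) %>%
  # drop the remaining grouping once it is no longer needed
  ungroup()

P1_toy
```

With `.groups = "drop"` instead, `sum(Count)` would run over all rows and `Perc` would become a share of the whole table, so the choice of value changes the result and not only the message.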
>
>
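
The nested `ifelse()` calls that build `year_G` can also be written with `dplyr::case_when()`, which tends to read more easily as the number of bins grows. The sketch below uses a hypothetical data frame `years`, with the same cut points as the code above.

```r
library(dplyr)

# Hypothetical input: a few registration years
years <- data.frame(year = c(1997, 2003, 2009, 2010, 2019))

binned <- years %>%
  mutate(year_G = case_when(
    year < 2000 ~ 1990,   # before 2000
    year < 2010 ~ 2000,   # 2000 to 2009
    TRUE        ~ 2010    # 2010 and later
  ))

binned$year_G
# 1990 2000 2000 2010 2010
```

Conditions are checked from top to bottom and the first match wins, which mirrors the order of the nested `ifelse()` calls.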
> #### 2.13.2 Applied data preprocessing 2
>
> P2 = Audi %>%
+   filter(price > quantile(price, probs = c(0.9))) %>%
+   group_by(model) %>%
+   summarise(Mean_Price = mean(price),
+             Mean_Mileage = mean(mileage),
+             Mean_Tax = mean(tax))
>
> P2
# A tibble: 21 x 4
   model Mean_Price Mean_Mileage Mean_Tax
   <chr>      <dbl>        <dbl>    <dbl>
 1 " A3"     36990         3059      145
 2 " A4"     43232.        4631.     145.
 3 " A5"     38833.        5642.     145.
 4 " A6"     43602.        6010.     146.
 5 " A7"     43217         6928.     151.
 6 " A8"     42699.        5402.     151.
 7 " Q2"     38495         5950      148.
 8 " Q3"     38571.        3495      146.
 9 " Q5"     42151.        6504.     145.
10 " Q7"     51328.       11930.     152.
# ... with 11 more rows
>
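
The `filter()` step keeps only rows whose price lies above the 90th percentile, and the printed `model` values carry a leading space (`" A3"`). The sketch below shows the same pattern on hypothetical toy data (`toy`, `cutoff`, `top_toy`). The `trimws()` call is an addition that assumes the leading space is unintended; it is not part of the original code.

```r
library(dplyr)

# Hypothetical toy data; the leading space in model mimics the output above
toy <- data.frame(
  model = c(" A3", " A3", " A4", " A4", " Q7", " Q7", " Q7", " A1", " A1", " A1"),
  price = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100) * 1000,
  stringsAsFactors = FALSE
)

# 90th percentile of price (R's default interpolation, type 7)
cutoff <- quantile(toy$price, probs = 0.9)

top_toy <- toy %>%
  filter(price > cutoff) %>%              # keep only the most expensive rows
  mutate(model = trimws(model)) %>%       # assumption: the space is unwanted
  group_by(model) %>%
  summarise(Mean_Price = mean(price), .groups = "drop")

top_toy
```

Because `quantile()` interpolates between observations by default, the cutoff need not equal any observed price.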
>
> #### 2.13.3 Applied data preprocessing 3
>
> DIR = "F:/1_Study/1_BigData/12_R/02_Practical-R/Data/"
> ListFiles = list.files(DIR)
> List_Length = length(ListFiles)
>
> # audi, bmw, cclass
> Brand_Data = data.frame()
>
> for(k in 1:List_Length){
+
+   DF = read.csv(paste0(DIR, ListFiles[k]), stringsAsFactors = FALSE)
+
+   DF2 = DF %>%
+     mutate(year_G = ifelse(year < 2000, 1990,
+                            ifelse(year < 2010, 2000, 2010))) %>%
+     group_by(year_G, transmission) %>%
+     summarise(Count = n(),
+               Mean_Price = mean(price),
+               Median_Price = median(price)) %>%
+     mutate(Perc = Count / sum(Count)) %>%
+     arrange(year_G, -Mean_Price) %>%
+     # fixed = TRUE matches ".csv" literally instead of as a regular expression
+     mutate(Brand = gsub(".csv", "", ListFiles[k], fixed = TRUE))
+
+   Brand_Data = Brand_Data %>%
+     bind_rows(DF2)
+
+ }
`summarise()` has grouped output by 'year_G'. You can override using the `.groups` argument.
`summarise()` has grouped output by 'year_G'. You can override using the `.groups` argument.
`summarise()` has grouped output by 'year_G'. You can override using the `.groups` argument.
> Brand_Data
   year_G transmission Count Mean_Price Median_Price         Perc  Brand
 1   1990    Automatic     2   4824.500       4824.5 1.0000000000   audi
 2   2000    Automatic    26   8746.962       5945.0 0.2921348315   audi
 3   2000    Semi-Auto     2   5995.000       5995.0 0.0224719101   audi
 4   2000       Manual    61   4716.377       3995.0 0.6853932584   audi
 5   2010    Automatic  2680  28410.968      26700.0 0.2533799754   audi
 6   2010    Semi-Auto  3589  27173.577      24250.0 0.3393211686   audi
 7   2010       Manual  4308  16262.237      15817.5 0.4072988560   audi
 8   1990       Manual     5   3968.000       3950.0 0.7142857143    bmw
 9   1990    Automatic     2   3597.500       3597.5 0.2857142857    bmw
10   2000    Semi-Auto     3   9228.333       9495.0 0.0263157895    bmw
11   2000    Automatic    50   6335.780       5720.0 0.4385964912    bmw
12   2000       Manual    61   5834.885       4195.0 0.5350877193    bmw
13   2010    Semi-Auto  4663  27371.413      24990.0 0.4374296435    bmw
14   2010    Automatic  3536  22657.091      19500.0 0.3317073171    bmw
15   2010       Manual  2461  14877.315      13750.0 0.2308630394    bmw
16   1990    Automatic     3   3496.667       4450.0 1.0000000000 cclass
17   2000    Semi-Auto     1   5995.000       5995.0 0.0263157895 cclass
18   2000    Automatic    36   4733.778       3997.0 0.9473684211 cclass
19   2000       Manual     1   1495.000       1495.0 0.0263157895 cclass
20   2010    Semi-Auto  2070  25405.171      24992.0 0.5365474339 cclass
21   2010    Automatic  1589  23070.021      21544.0 0.4118714360 cclass
22   2010       Manual   198  14437.808      13998.5 0.0513219285 cclass
23   2010        Other     1  11995.000      11995.0 0.0002592017 cclass
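
The loop above grows `Brand_Data` one file at a time. An alternative is to read all files into a named list and bind them once, letting `bind_rows(.id = ...)` create the `Brand` column from the list names. The sketch below assumes nothing about the original `Data/` folder: it writes two tiny hypothetical CSV files to a temporary directory, and `tmp_dir`, `files`, and `brand_toy` are illustrative names. The per-file summary step is left out to keep it short.

```r
library(dplyr)

# Write two tiny hypothetical CSV files to a temporary folder
tmp_dir <- file.path(tempdir(), "brand_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(year = c(2005, 2015), price = c(5000, 20000)),
          file.path(tmp_dir, "audi.csv"), row.names = FALSE)
write.csv(data.frame(year = c(2012, 2018), price = c(15000, 30000)),
          file.path(tmp_dir, "bmw.csv"), row.names = FALSE)

# Full paths of the CSV files only
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path after its file (extension removed); lapply() keeps the
# names, and bind_rows() turns them into the Brand column
names(files) <- tools::file_path_sans_ext(basename(files))

brand_toy <- files %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows(.id = "Brand")

brand_toy
```

To reproduce the summary table, the `mutate()` / `group_by()` / `summarise()` pipeline from the loop body could be applied inside the `lapply()` call before binding.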
Source: 실무 프로젝트로 배우는 데이터 분석 with R