[ADP] Density-based Clustering

버섯도리 2022. 1. 16. 11:55

> # 13. Density-based Clustering
>
>
> # Run density-based clustering on the iris dataset with the dbscan() function from the fpc package.
> library(fpc)
>
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> iris1 <- iris[-5]
> head(iris1)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4
>
> df <- dbscan(iris1, eps = 0.42, MinPts = 5)
> table(df$cluster, iris$Species)

    setosa versicolor virginica
  0      2         10        17
  1     48          0         0
  2      0         37         0
  3      0          3        33
> # The table above compares the cluster labels (rows; 0 = noise) with the iris species (columns).
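The cross-tabulation above can also be summarized numerically. The following is a minimal sketch in base R that re-enters the printed counts by hand and computes the share of noise points and how strongly each cluster is dominated by a single species. The names `tab`, `noise_count`, `noise_share`, and `purity` are introduced here for illustration only and are not part of the original session.

```r
# Counts copied from the table printed above (rows: cluster label, 0 = noise).
tab <- matrix(
  c( 2, 10, 17,
    48,  0,  0,
     0, 37,  0,
     0,  3, 33),
  nrow = 4, byrow = TRUE,
  dimnames = list(cluster = c("0", "1", "2", "3"),
                  species = c("setosa", "versicolor", "virginica"))
)

# Number and share of observations labelled as noise (cluster 0).
noise_count <- sum(tab["0", ])
noise_share <- noise_count / sum(tab)

# For each real cluster, the fraction of its points from its most common species.
clusters <- tab[rownames(tab) != "0", , drop = FALSE]
purity <- apply(clusters, 1, max) / rowSums(clusters)

noise_count            # 29
round(noise_share, 3)  # 0.193
round(purity, 3)       # cluster 1: 1.000, cluster 2: 1.000, cluster 3: 0.917
```

With eps = 0.42 and MinPts = 5, roughly a fifth of the observations end up as noise, and 27 of those 29 are versicolor or virginica.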
>
> plot(df, iris1)

>
> plotcluster(iris1, df$cluster)

> # In the plots above, noise points that belong to no cluster are shown as black points or as '0'.
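To make the idea of a noise point concrete, here is a minimal sketch in base R that labels each observation as a core, border, or noise point straight from the usual DBSCAN definitions. It is an illustration only, not the fpc implementation: counting the point itself toward MinPts is an assumption about the convention, the variable names are introduced here, and the resulting counts are not guaranteed to match the dbscan() output above.

```r
# Same inputs as the dbscan() call above.
iris1  <- iris[-5]
eps    <- 0.42
MinPts <- 5

# Pairwise Euclidean distances between all observations.
d <- as.matrix(dist(iris1))

# Core point: at least MinPts observations within eps.
# Assumption: the point itself is included in the count.
n_within_eps <- rowSums(d <= eps)
core <- n_within_eps >= MinPts

# Border point: not a core point, but within eps of at least one core point.
near_core <- rowSums(d[, core, drop = FALSE] <= eps) > 0
border <- !core & near_core

# Noise point: neither core nor within eps of any core point.
noise <- !core & !near_core

# Cross-tabulate the point type against the species.
table(type = ifelse(core, "core", ifelse(border, "border", "noise")),
      species = iris$Species)
```

Under these definitions the noise set does not depend on the order of the rows; only the cluster a border point joins can.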
>
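> # Advantages of DBSCAN
> # 1) Unlike k-means clustering, the number of clusters does not have to be fixed in advance.
> # 2) It can find clusters of arbitrary shape.
> # 3) It identifies noise points explicitly and is therefore not sensitive to outliers.
> # 4) It requires only two parameters (eps and MinPts) and is largely insensitive to the order of the records in the database.
> #
> # Disadvantages of DBSCAN
> # 1) A border point can be reachable from two clusters, so which cluster it joins may depend on the processing order.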
>
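For contrast with the first and third advantages listed above, the following is a minimal sketch of k-means on the same data using `kmeans()` from base R's stats package. The choice of `centers = 3`, the seed, and `nstart = 10` are assumptions made here for illustration and do not come from the original post.

```r
iris1 <- iris[-5]

# k-means needs the number of clusters (centers) up front.
# The value 3 and the seed are choices made here for illustration.
set.seed(1)
km <- kmeans(iris1, centers = 3, nstart = 10)

# Every observation receives a label from 1 to 3; there is no noise label 0.
table(km$cluster, iris$Species)
sort(unique(km$cluster))
```

Unlike the dbscan() result, this table has no row 0: k-means assigns every observation, including outlying ones, to one of the requested clusters.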

Source: 2020 데이터 분석 전문가 ADP 필기 한 권으로 끝내기