tidyr:: (3) Applying it to the iris data (+pivot_longer)

Aside --------------------------------------------------------------

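I wasn't originally planning to go as far as a worked application,
but tidyr's spread function just wouldn't run on my own data...
While googling why, I found a question that works through the iris data.
I followed along and figured I should write it down, so here it is.
Enough preamble, let's start.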

---------------------------------------------------------------------------

Reference page: https://stackoverflow.com/questions/60083062/tidyrspread-error-each-row-of-output-must-be-identified-by-a-unique-combina

WANT TO GET :

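PLAN: put the iris data's Sepal and Petal into a Part column, and keep just Length / Width as the measurement columns.

1) Reorder the columns for convenience

library(tidyverse)  # loads tidyr and dplyr (gather, separate, spread, pivot_longer, %>%)
iris_re <- iris[,c(5,1,2,3,4)]  # move Species to the front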
iris_re %>% head()

output:

  Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1  setosa          5.1         3.5          1.4         0.2
2  setosa          4.9         3.0          1.4         0.2
3  setosa          4.7         3.2          1.3         0.2
4  setosa          4.6         3.1          1.5         0.2
5  setosa          5.0         3.6          1.4         0.2
6  setosa          5.4         3.9          1.7         0.4

2) Convert the Sepal.Length~Petal.Width columns to long form with gather

iris.wide = iris_re %>% gather(-Species, key="name", value="measurement")
iris.wide %>% head()

output:

  Species         name measurement
1  setosa Sepal.Length        5.1
2  setosa Sepal.Length        4.9
3  setosa Sepal.Length        4.7
4  setosa Sepal.Length        4.6
5  setosa Sepal.Length        5.0
6  setosa Sepal.Length        5.4

[NOTE]

Species is not something we want to gather!
So write gather(-Species) to leave that column as it is!
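
As a small check of the note above, the sketch below gathers the same data two ways: once by excluding Species with a minus sign, and once by naming the four measurement columns as a range. The object names long_a and long_b are made up for this illustration.

```r
library(tidyr)

iris_re <- iris[, c(5, 1, 2, 3, 4)]

# Exclude Species with a minus sign ...
long_a <- gather(iris_re, key = "name", value = "measurement", -Species)

# ... or select the four measurement columns as a range instead.
long_b <- gather(iris_re, key = "name", value = "measurement",
                 Sepal.Length:Petal.Width)

# Both calls should describe the same long table.
all.equal(long_a, long_b)

# 150 flowers x 4 measurement columns = 600 rows; Species, name, measurement = 3 columns
dim(long_a)
```

Either spelling works; the minus form is shorter when only one column has to stay put.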

3) Split the name column into part / method with separate

iris.wide = iris.wide %>% separate(name, into = c("part", "method"), sep="[.]")
iris.wide %>% head()

output:

  Species  part method measurement
1  setosa Sepal Length         5.1
2  setosa Sepal Length         4.9
3  setosa Sepal Length         4.7
4  setosa Sepal Length         4.6
5  setosa Sepal Length         5.0
6  setosa Sepal Length         5.4

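[NOTE]

sep="." does not work! separate treats sep as a regular expression, and "." matches any character, not just a literal dot.
On the other hand, running the code without the sep argument at all works fine.
That is because the default is sep = "[^[:alnum:]]+", which splits on any run of non-alphanumeric characters.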
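
To see the regex behaviour on its own, here is a minimal sketch. It uses base R's strsplit purely to illustrate the pattern matching (separate is not implemented this way as far as this post knows), and then compares separate with an explicit "[.]" against the default sep. The data frame df is a toy example made up for this check.

```r
library(tidyr)

x <- "Sepal.Length"

# As a regular expression, "." matches every character,
# so every character is treated as a delimiter and only empty pieces remain.
strsplit(x, ".")[[1]]

# "[.]" (or "\\.") matches only a literal dot.
strsplit(x, "[.]")[[1]]

# With separate(): an explicit "[.]" and the default sep give the same split here.
df <- data.frame(name = c("Sepal.Length", "Petal.Width"))
with_sep    <- separate(df, name, into = c("part", "method"), sep = "[.]")
default_sep <- separate(df, name, into = c("part", "method"))

all.equal(with_sep, default_sep)
```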

4) Error !

iris.wide %>% spread(method, measurement)

output:

error. # ah, lol;

error: Each row of output must be identified by a unique combination of keys.

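Why?
In the head() output, rows 1~6 all share the same key, 'setosa-Sepal-Length'.
spread needs every combination of keys to be unique, with no duplicate rows; here it cannot tell which Length belongs with which Width.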
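
One way to confirm the duplication is to count how many rows share each key combination. This is only a sketch; iris_long and key_counts are names made up for the check.

```r
library(tidyr)
library(dplyr)

iris_long <- iris[, c(5, 1, 2, 3, 4)] %>%
  gather(key = "name", value = "measurement", -Species) %>%
  separate(name, into = c("part", "method"), sep = "[.]")

# Number of rows sharing each (Species, part, method) combination
key_counts <- iris_long %>% count(Species, part, method)
key_counts
```

There are 3 species x 2 parts x 2 methods = 12 combinations, and each one covers 50 rows, so the keys are far from unique.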

5) Solution

step 1:

iris_re %>% 
  gather(key = "name", value = "measurement", -Species) %>%
  separate(name, into = c("Part","Method")) %>%
  group_by(Species, Part, Method) %>%
  mutate(rn = row_number())

output:

# A tibble: 600 x 5
# Groups:   Species, Part, Method [12]
   Species Part  Method measurement    rn
 1 setosa  Sepal Length         5.1     1
 2 setosa  Sepal Length         4.9     2
 3 setosa  Sepal Length         4.7     3
 4 setosa  Sepal Length         4.6     4
 5 setosa  Sepal Length         5       5
 6 setosa  Sepal Length         5.4     6
 7 setosa  Sepal Length         4.6     7
 8 setosa  Sepal Length         5       8
 9 setosa  Sepal Length         4.4     9
10 setosa  Sepal Length         4.9    10
# ... with 590 more rows

[Reading the code]
Group the data, then generate a row_number within each group.
→ With rn added, every row is uniquely identified even though Species, Part, Method repeat!
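
A quick sanity check of that claim, as a sketch: after adding rn, no combination of Species, Part, Method, rn should repeat. The name with_rn is made up here.

```r
library(tidyr)
library(dplyr)

with_rn <- iris[, c(5, 1, 2, 3, 4)] %>%
  gather(key = "name", value = "measurement", -Species) %>%
  separate(name, into = c("Part", "Method")) %>%
  group_by(Species, Part, Method) %>%
  mutate(rn = row_number()) %>%
  ungroup()

# anyDuplicated() returns 0 when no row of the key columns repeats
anyDuplicated(with_rn[, c("Species", "Part", "Method", "rn")])

# rn restarts in every group, so it runs from 1 to 50
range(with_rn$rn)
```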

step 2:

result = iris_re %>% 
  gather(key = "flower_att", value = "measurement",
         -Species) %>%
  separate(flower_att, into = c("Part","Method")) %>%
  group_by(Species, Part, Method) %>%
  mutate(rn = row_number()) %>% 
  ungroup() %>%
  spread(Method, measurement)

head(result)

output:

# A tibble: 6 x 5
  Species Part     rn Length Width
1 setosa  Petal     1    1.4   0.2
2 setosa  Petal     2    1.4   0.2
3 setosa  Petal     3    1.3   0.2
4 setosa  Petal     4    1.5   0.2
5 setosa  Petal     5    1.4   0.2
6 setosa  Petal     6    1.7   0.4

Solved!

+6) Another Solution

iris_re %>%
  pivot_longer(cols = -Species, names_to = c("Part", ".value"),
               names_sep= "[.]")

output:

# A tibble: 300 x 4
   Species Part  Length Width
 1 setosa  Sepal    5.1   3.5
 2 setosa  Petal    1.4   0.2
 3 setosa  Sepal    4.9   3  
 4 setosa  Petal    1.4   0.2
 5 setosa  Sepal    4.7   3.2
 6 setosa  Petal    1.3   0.2
 7 setosa  Sepal    4.6   3.1
 8 setosa  Petal    1.5   0.2
 9 setosa  Sepal    5     3.6
10 setosa  Petal    1.4   0.2
# ... with 290 more rows

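[pivot_longer]
Use pivot_longer from tidyr, which can also produce multiple value columns at once.
The special name ".value" in names_to means that this part of each column name (Length / Width) becomes an output column name.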
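
To check that the two solutions agree, the sketch below rebuilds both and compares them. The per-species rn in the pivot_longer version is added only so the rows can be lined up with the spread version; via_spread and via_pivot are names made up for this comparison.

```r
library(tidyr)
library(dplyr)

iris_re <- iris[, c(5, 1, 2, 3, 4)]

# Solution from step 2: gather + separate + rn + spread
via_spread <- iris_re %>%
  gather(key = "flower_att", value = "measurement", -Species) %>%
  separate(flower_att, into = c("Part", "Method")) %>%
  group_by(Species, Part, Method) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  spread(Method, measurement) %>%
  arrange(Species, Part, rn)

# pivot_longer version, with a per-species rn added only for the comparison
via_pivot <- iris_re %>%
  group_by(Species) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_longer(cols = -c(Species, rn), names_to = c("Part", ".value"),
               names_sep = "[.]") %>%
  select(Species, Part, rn, Length, Width) %>%
  arrange(Species, Part, rn)

# Compare as plain data frames, ignoring tibble-specific attributes
all.equal(as.data.frame(via_spread), as.data.frame(via_pivot))
```

Without the helper rn, pivot_longer needs no extra key at all, because each original row already carries its own Length and Width together.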

My full code:

library(tidyverse)  # loads tidyr and dplyr

head(iris)

iris_re <- iris[,c(5,1,2,3,4)]
iris_re %>% head()

iris.wide = iris_re %>% gather(-Species, key="name", value="measurement")
iris.wide %>% head()
a = iris.wide %>% separate(name, into = c("part", "method"))
a %>% head()
 
b = iris.wide %>% separate(name, into = c("part", "method"), sep=".") # wrong: sep is a regex, so "." matches any character
 
b = iris.wide %>% separate(name, into = c("part", "method"), sep="[.]") #correct!
b %>% head()

b %>% spread(method, measurement) #error



# solution -----------------------------------------
iris_re %>% 
  gather(key = "name", value = "measurement",
         -Species) %>%
  separate(name, into = c("Part","Method")) %>%
  group_by(Species, Part, Method) %>%
  mutate(rn = row_number()) 
# reading the result: group the data, then generate a row_number within each group
# => with rn added, every row is uniquely identified even though Species, Part, Method repeat!


result = iris_re %>% 
  gather(key = "flower_att", value = "measurement",
         -Species) %>%
  separate(flower_att, into = c("Part","Method")) %>%
  group_by(Species, Part, Method) %>%
  mutate(rn = row_number()) %>% 
  ungroup() %>%
  spread(Method, measurement)

# View(result)
head(result)



# goooood & simple solution ------------------------

final = iris_re %>%
  pivot_longer(cols = -Species, names_to = c("Part", ".value"),
               names_sep= "[.]")

head(final)
