How to Filter Several Dataframes Using Two Master Dataframes with Purrr's Pmap and Dplyr's Semi_Join


 library(tidyverse)

Using the sample datasets below (three for 201603 and three for 201602), I'm trying to use two master datasets to filter several other datasets with tidyverse tools such as dplyr::semi_join, purrr::map2, and purrr::pmap. I would like the output to be a single list of the filtered data. However, I've become stuck trying to use pmap() to do all of this in one piece of code.

This example is a bit complicated, so I'll try to break it down. First, I used this code to put the two "201603" datasets that need to be filtered into a list called "List1". I then used purrr::map to rename the ID column to Id, matching the master dataset, so that I can use semi_join afterwards.

List1<-list(One201603,Two201603)%>%
map(~rename(.x,"Id"="ID"))

Next, I used purrr::map2 to iterate over List1 and a one-element list containing MainData201603, the master dataset for 201603 that filters the other two datasets. This outputs a list, and I then set the list element names with purrr::set_names.

Df<-map2(List1,list(MainData201603),semi_join,by="Id")%>%
set_names(c("One201603","Two201603"))
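
To make the recycling in that map2() call explicit, here is a minimal sketch on made-up toy tibbles. The names `master`, `a`, `b`, and `filtered` are hypothetical and are not part of the sample data in this post. It assumes only that map2() recycles a length-1 list and that semi_join() keeps the rows of its first argument that have a match in the second.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data, not the sample datasets from this post
master <- tibble(Id = c(1, 2, 3))
a <- tibble(Id = c(1, 4), Person = c("A1", "A4"))
b <- tibble(Id = c(2, 3, 5), Person = c("B2", "B3", "B5"))

# list(master) has length 1, so map2() reuses it for every element
# of the first list; names come from the first list
filtered <- map2(list(a = a, b = b), list(master), semi_join, by = "Id")

filtered$a$Id  # Ids of `a` that also appear in `master`
filtered$b$Id  # Ids of `b` that also appear in `master`
```

Naming the elements of the first list up front would make the later set_names() step unnecessary.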

Everything works up to here, but now I have another master dataset called "MainData201602", as well as two other datasets that need to be filtered, "One201602" and "Two201602". I could simply follow the same steps as above, but I don't want to repeat code, and I feel there's a more elegant way to do it all in one step using purrr tools such as pmap(). I'm thinking of something like the code below, which doesn't work. I tried several other variations of pmap and lists, but couldn't figure it out. Help would be appreciated!

List2<-list(One201602,Two201602)%>%
map(~rename(.x,"Id"="ID"))

# Attempt that does not work
Df<-pmap(list(List1,list(MainData201603)),list(List2,list(MainData201602)),semi_join,by="Id")%>%
map(~set_names(.x,c("One201603","Two201603","One201602","Two201602")))



# Sample data (tibble() replaces the deprecated data_frame())
Id<-c(6666,3333,1111,9999,8888,5555,2222,4444)
Animal<-c("Rabbit", "Moose", "Dog", "Cat", "Squirrel", "Raccoon", "Fish", "Elephant")
MainData201603<-tibble(Id,Animal)

ID<-c(6666,3333,4545,6767,3322,1111,8888,8876,9990,1234,7775,3445)
Person<-c("Chris","Yuki","Mike","Darren","Katrina","Camilla","Dreanna","Nathan","Aisha","Sra","Pierre","Luigi")
One201603<-tibble(ID,Person)

ID<-c(6666,8888,4453,1243,1111,5567,4543,8898,4444,7665,7889,5554)
Person<-c("Mr.K","Ms.S","Mr.P","Mr.B","Mrs.N","Mrs.W","Mr.D","Ms.A","Ms.M","Mr.X","Mrs.Z","Ms.T")
Two201603<-tibble(ID,Person)

# The 201602 datasets are copies of the 201603 ones
MainData201602<-MainData201603

One201602<-One201603

Two201602<-Two201603

1 Answer

Here is a possible solution. The idea is to create List1 and List2 with proper names, then build a list called target containing List1 and List2, and a list called master containing list(MainData201603) and list(MainData201602). After that, an outer map2 walks over target and master in parallel and applies the inner map2 call you already developed (the semi_join by "Id") to each pair.

List1 <- list(One201603,Two201603)%>%
  map(~rename(.x,"Id"="ID")) %>%
  set_names(c("One201603","Two201603"))

List2 <- list(One201602,Two201602)%>%
  map(~rename(.x,"Id"="ID")) %>%
  set_names(c("One201602","Two201602"))

target <- list(List1, List2)
master <- list(list(MainData201603), list(MainData201602))

map2(target, master, ~map2(.x, .y, semi_join, by = "Id"))
[[1]]
[[1]]$One201603
# A tibble: 4 x 2
     Id Person 
  <dbl> <chr>  
1  6666 Chris  
2  3333 Yuki   
3  1111 Camilla
4  8888 Dreanna

[[1]]$Two201603
# A tibble: 4 x 2
     Id Person
  <dbl> <chr> 
1  6666 Mr.K  
2  8888 Ms.S  
3  1111 Mrs.N 
4  4444 Ms.M  


[[2]]
[[2]]$One201602
# A tibble: 4 x 2
     Id Person 
  <dbl> <chr>  
1  6666 Chris  
2  3333 Yuki   
3  1111 Camilla
4  8888 Dreanna

[[2]]$Two201602
# A tibble: 4 x 2
     Id Person
  <dbl> <chr> 
1  6666 Mr.K  
2  8888 Ms.S  
3  1111 Mrs.N 
4  4444 Ms.M  
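
If a single flat list of the four filtered datasets is preferred over the nested result above, one possible variation is sketched below. It assumes the ID column is already named Id and drops the Animal column for brevity; the names `targets` and `masters` are introduced here for illustration only. The point it tries to show is that pmap() takes one list of parallel argument lists as its first argument, followed by the function.

```r
library(dplyr)
library(purrr)

# Sample data from the question, with ID already renamed to Id
# (Animal column omitted; semi_join only needs the key)
MainData201603 <- tibble(Id = c(6666, 3333, 1111, 9999, 8888, 5555, 2222, 4444))
One201603 <- tibble(
  Id = c(6666, 3333, 4545, 6767, 3322, 1111, 8888, 8876, 9990, 1234, 7775, 3445),
  Person = c("Chris", "Yuki", "Mike", "Darren", "Katrina", "Camilla",
             "Dreanna", "Nathan", "Aisha", "Sra", "Pierre", "Luigi")
)
Two201603 <- tibble(
  Id = c(6666, 8888, 4453, 1243, 1111, 5567, 4543, 8898, 4444, 7665, 7889, 5554),
  Person = c("Mr.K", "Ms.S", "Mr.P", "Mr.B", "Mrs.N", "Mrs.W",
             "Mr.D", "Ms.A", "Ms.M", "Mr.X", "Mrs.Z", "Ms.T")
)
MainData201602 <- MainData201603
One201602 <- One201603
Two201602 <- Two201603

# Hypothetical helper lists: one flat, named list of datasets to filter
# and a parallel list holding the master for each of them
targets <- list(One201603 = One201603, Two201603 = Two201603,
                One201602 = One201602, Two201602 = Two201602)
masters <- list(MainData201603, MainData201603, MainData201602, MainData201602)

# pmap() receives ONE list of argument lists; naming them x and y matches
# semi_join(x, y, ...), and by = "Id" is passed through to every call
Df <- pmap(list(x = targets, y = masters), semi_join, by = "Id")

names(Df)
```

With only two parallel lists, map2(targets, masters, semi_join, by = "Id") should be equivalent. To keep the nested map2 result instead, unlist(result, recursive = FALSE) should collapse it by one level into a single named list.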
