R: How can I generate an inter-organizational network from individual career-trajectory data?

Posted by 6rqinv9w on 2023-06-03 in Other

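I want to visualize an inter-organizational network based on a single data frame (https://drive.google.com/file/d/1FNfUjySsodqxHGyH1bX-ZDvJzd4GBYqQ/view?usp=sharing). Companies A and B are connected when an individual worked at company A before working at company B (career trajectory). Can you help me?
First, I want to create 3 networks for 3 different time periods (one subset of the dataset per period):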

subdata_period1 <- org[, c("Name", paste0("Carrier.", 1:45), paste0("Year.", 1:45))]

subdata_period2 <- org[, c("Name", paste0("Carrier.", 1:45), paste0("Year.", 1:45))]

subdata_period3 <- org[, c("Name", paste0("Carrier.", 1:45), paste0("Year.", 1:45))]

I filter the subsets based on the year values:

# Keep a person if any of their Year.* values falls in the period (missing years ignored)
year_cols <- paste0("Year.", 1:45)

subdata_period1 <- subdata_period1[apply(subdata_period1[, year_cols], 1, function(x) any(x >= 1958 & x <= 1978, na.rm = TRUE)), ]

subdata_period2 <- subdata_period2[apply(subdata_period2[, year_cols], 1, function(x) any(x >= 1979 & x <= 1988, na.rm = TRUE)), ]

subdata_period3 <- subdata_period3[apply(subdata_period3[, year_cols], 1, function(x) any(x >= 1989 & x <= 2010, na.rm = TRUE)), ]
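The period filter keeps a person when at least one of their Year.1 ... Year.45 values falls inside the period, with missing years ignored. As a language-neutral illustration, here is a minimal C++ sketch of that row test. It assumes each person's year columns have already been read into a vector, with std::nullopt standing in for R's NA; YearRow, inPeriod and rowsInPeriod are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One person's Year.1 ... Year.45 values; std::nullopt plays the role of NA.
using YearRow = std::vector<std::optional<int>>;

// Same test as any(x >= first & x <= last, na.rm = TRUE):
// true if at least one recorded year lies in the closed interval [first, last].
bool inPeriod(const YearRow& years, int first, int last)
{
    assert(first <= last);  // a period must not be empty
    for (const auto& y : years)
        if (y && *y >= first && *y <= last)
            return true;
    return false;
}

// Indices of the people (rows) whose careers touch the period; row order is kept.
std::vector<std::size_t> rowsInPeriod(const std::vector<YearRow>& rows, int first, int last)
{
    std::vector<std::size_t> kept;
    for (std::size_t r = 0; r < rows.size(); ++r)
        if (inPeriod(rows[r], first, last))
            kept.push_back(r);
    return kept;
}
```

For example, a person with years {1975, 1983, NA} is kept for the first two periods and dropped from the third. Because one year in range is enough, a long career can place the same person in several periods.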

I'll use only the first period to illustrate the problem:

names1 <- na.omit(subdata_period1$Name)

carriers1 <- subdata_period1[, grep("^Carrier.", names(subdata_period1))]

First, I try to create an empty adjacency matrix:

adj_matrix1 <- matrix(0, nrow = length(carriers1), ncol = length(carriers1))

Second, I iterate over each pair of nodes and check whether they share a carrier:

for (i in 1:(length(carriers1) - 1)) {
  for (j in (i + 1):length(carriers1)) {
    # Check if the nodes share at least one carrier
    if (sum(names1[i] %in% names1[j]) > 0) {
      adj_matrix1[i, j] <- 1
      adj_matrix1[j, i] <- 1
    }
  }
}
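In the loop above, length(carriers1) is the number of Carrier columns (the length of a data frame is its number of columns), and the test compares entries of names1, i.e. people's names, so the matrix rows never correspond to organizations. A sketch of the intended structure, in C++ for illustration: every distinct organization gets one row and column, and two organizations are linked when some person moved directly from one to the other. Treating only consecutive moves as links is an assumption; linking each employer to every later one is an equally valid reading of "worked at A before B". OrgMatrix and buildOrgMatrix are hypothetical names, and empty strings stand for missing Carrier values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Nodes are organizations, not column positions.
struct OrgMatrix {
    std::map<std::string, std::size_t> index;  // organization name -> row/column number
    std::vector<std::vector<int>> adj;         // symmetric 0/1 adjacency matrix
};

// Each career lists one person's Carrier values in order; "" marks a missing value.
// Two organizations are linked when someone moved directly from one to the other.
OrgMatrix buildOrgMatrix(const std::vector<std::vector<std::string>>& careers)
{
    OrgMatrix m;
    // First pass: give every organization that appears its own index.
    for (const auto& career : careers)
        for (const auto& org : career)
            if (!org.empty() && m.index.count(org) == 0) {
                const std::size_t id = m.index.size();
                m.index[org] = id;
            }

    const std::size_t n = m.index.size();
    m.adj.assign(n, std::vector<int>(n, 0));

    // Second pass: mark every consecutive move in both directions (undirected graph).
    for (const auto& career : careers) {
        std::vector<std::string> firms;
        for (const auto& org : career)
            if (!org.empty())
                firms.push_back(org);
        for (std::size_t k = 1; k < firms.size(); ++k) {
            if (firms[k - 1] == firms[k])
                continue;  // staying in the same organization is not a link
            const std::size_t a = m.index.at(firms[k - 1]);
            const std::size_t b = m.index.at(firms[k]);
            assert(a != b);  // distinct names always have distinct indices
            m.adj[a][b] = 1;
            m.adj[b][a] = 1;
        }
    }
    return m;
}
```

The important design point is the name-to-index map: the matrix is sized by the number of distinct organizations, and row and column labels come from that map rather than from column positions.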

Then, I try to create the graph from the adjacency matrix:

library(igraph)
graph1 <- graph_from_adjacency_matrix(adj_matrix1, mode = "undirected")
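graph_from_adjacency_matrix() expects a square matrix; with mode = "undirected", a symmetric 0/1 matrix with a zero diagonal gives one undirected edge per linked pair. The C++ sketch below shows that conversion (reading only the upper triangle) and derives each node's degree, a common choice for node size; in igraph, V(graph1)$size <- degree(graph1) would be one way to store it as a vertex attribute. undirectedEdges and degrees are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;
using Edge = std::pair<std::size_t, std::size_t>;

// For a symmetric 0/1 matrix, each pair i < j with a 1 becomes one undirected edge.
// Only the upper triangle is read, so every edge is listed once.
std::vector<Edge> undirectedEdges(const Matrix& adj)
{
    const std::size_t n = adj.size();
    for (const auto& row : adj)
        assert(row.size() == n);  // the matrix must be square

    std::vector<Edge> edges;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            assert(adj[i][j] == adj[j][i]);  // and symmetric
            if (adj[i][j] != 0)
                edges.emplace_back(i, j);
        }
    return edges;
}

// Degree of each node: the number of undirected edges touching it.
std::vector<int> degrees(const Matrix& adj)
{
    std::vector<int> deg(adj.size(), 0);
    for (const auto& [i, j] : undirectedEdges(adj)) {
        ++deg[i];
        ++deg[j];
    }
    return deg;
}
```

For the path matrix {{0,1,0},{1,0,1},{0,1,0}} this yields the edges (0,1) and (1,2) and the degrees {1, 2, 1}.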

I try to cluster my network:

cluster1 = c("Natixis", "CIC","HSBC France", "BPCE group", "Banque Hervet", "Credit Lyonnais", "BNP Paribas", "Societe General", "Investment banks", "Other banks")

cluster2 = c("Retail", "Media", "Technology", "Heavy industry", "energy sector", "real estate sector", "consulting", "pharmaceutical industry", "other sectors")

# V(graph1)$name holds organization names only if the adjacency matrix has row/column names
V(graph1)$cluster <- ifelse(V(graph1)$name %in% cluster1, "Banks", ifelse(V(graph1)$name %in% cluster2, "Private sector", "Public administration"))

colors = c("Banks"= "red", "Private sector" = "blue", "Public administration" = "gold")

V(graph1)$color <- colors[V(graph1)$cluster]
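The clustering step is two lookups: organization name to group (the nested ifelse()), then group to colour. A C++ sketch of the same logic, assuming the vertices carry organization names; clusterOf and colorOf are hypothetical names, and the two lists are copied from cluster1 and cluster2.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Copied from cluster1 and cluster2; names must match the data exactly.
const std::set<std::string> kBanks{
    "Natixis", "CIC", "HSBC France", "BPCE group", "Banque Hervet",
    "Credit Lyonnais", "BNP Paribas", "Societe General", "Investment banks", "Other banks"};
const std::set<std::string> kPrivate{
    "Retail", "Media", "Technology", "Heavy industry", "energy sector",
    "real estate sector", "consulting", "pharmaceutical industry", "other sectors"};

// Like the nested ifelse(): banks first, then private sector, else public administration.
std::string clusterOf(const std::string& org)
{
    if (kBanks.count(org)) return "Banks";
    if (kPrivate.count(org)) return "Private sector";
    return "Public administration";
}

// Like colors[V(graph1)$cluster].
std::string colorOf(const std::string& cluster)
{
    static const std::map<std::string, std::string> colors{
        {"Banks", "red"}, {"Private sector", "blue"}, {"Public administration", "gold"}};
    const auto it = colors.find(cluster);
    assert(it != colors.end());  // every cluster label must have a colour
    return it->second;
}
```

Matching is exact and case-sensitive, in C++ as with %in% in R. The answer's output contains "Real estate sector" with a capital R, which would not match "real estate sector" and would silently fall through to "Public administration"; accented spellings (Crédit vs Credit) have the same problem, so normalising case and accents before classifying may be worthwhile.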

Finally, I use ggraph to visualize the first graph:

library(ggraph)

p1 <- ggraph(graph1, layout = "mds") +
  # edges first, so the nodes are drawn on top of them
  geom_edge_arc(strength = 0.2, width = 0.5, alpha = 0.15) +
  geom_node_point(aes(colour = cluster), size = 3) +
  # labels need a text geom; geom_node_point has no label aesthetic
  geom_node_text(aes(label = name), size = 2, repel = TRUE) +
  scale_colour_manual(values = colors) +
  theme_void() +
  theme(legend.position = "none") +
  labs(title = "First period: 1958-1978")

p1

But this doesn't work. Many thanks for your help!

ukxgm1gy

ukxgm1gy1#

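You haven't answered my question, so I assume that if a career contains two or more entries with the same name, those entries are merged into one.
Algorithm: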

LOOP over lines in input
    CLEAR career vector
    PARSE line into tokens separated by ","
    LOOP over names in tokens
         APPEND unique names to career vector
    LOOP N1 over names in career vector except last
        LOOP N2 over names in career vector that follow N1
             STORE LINK N1->N2
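The per-person part of this algorithm can be sketched in C++ as follows (careerLinks is a hypothetical name). It follows the pseudocode literally: each employer is kept once, in order of first appearance, then linked to every employer that comes after it; empty tokens are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Link = std::pair<std::string, std::string>;

// Links for one person: "earlier employer -> later employer" for every ordered pair.
std::set<Link> careerLinks(const std::vector<std::string>& tokens)
{
    std::vector<std::string> career;
    for (const auto& name : tokens)
        if (!name.empty() && std::find(career.begin(), career.end(), name) == career.end())
            career.push_back(name);  // APPEND unique names

    std::set<Link> links;
    for (std::size_t i1 = 0; i1 + 1 < career.size(); ++i1)
        for (std::size_t i2 = i1 + 1; i2 < career.size(); ++i2) {
            assert(career[i1] != career[i2]);  // guaranteed by the uniqueness step
            links.emplace(career[i1], career[i2]);  // STORE LINK N1 -> N2
        }
    return links;
}
```

With the career order inferred from the ANSART lines of the output (Ministry of Economy, Cabinets, Heavy industry, Retail), careerLinks returns the same six links; std::set sorts them alphabetically, which is why the printed output is not in chronological order. One difference from the C++ program further down: that program does not de-duplicate the career first, so a career A, B, A yields both A -> B and B -> A there, while this sketch yields only A -> B.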

Here are the first few lines of the output:

ACHARD Pierre Claude Paul
no links

ANSART Bernard Pierre Marie
Cabinets -> Heavy industry
Cabinets -> Retail
Heavy industry -> Retail
Ministry of Economy -> Cabinets
Ministry of Economy -> Heavy industry
Ministry of Economy -> Retail

ARMAND Loic Marie

ARSAC Francois Paul Marie Joseph
 Caisse des dépôts et consignations -> BPCE group
 Caisse des dépôts et consignations -> Crédit Lyonnais
 Caisse des dépôts et consignations -> HSBC France
"the Ministry of Labour, Employment and Population" ->  Caisse des dépôts et consignations
"the Ministry of Labour, Employment and Population" -> BPCE group
"the Ministry of Labour, Employment and Population" -> Crédit Lyonnais
"the Ministry of Labour, Employment and Population" -> HSBC France
BPCE group -> Crédit Lyonnais
HSBC France ->  Caisse des dépôts et consignations
HSBC France -> BPCE group
HSBC France -> Crédit Lyonnais

ASSELINEAU Francois Didier
Natixis -> Ministry of Economy

AUBE-MARTIN Philippe Marc André
Investment banks -> Real estate sector
Ministry of Economy -> Investment banks
Ministry of Economy -> Real estate sector

You can find the full output at https://github.com/JamesBremner/org-graph/blob/main/bin/links.txt
I don't know R, but it may help to look at the C++ code that implements this:

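#include <cstddef>
#include <fstream>
#include <iostream>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits one CSV line into fields. Defined in the full application (linked below);
// the signature here is inferred from how it is used.
std::vector<std::string> ParseCSV(const std::string& line);

// For each person (one CSV line), print every "earlier firm -> later firm" link
void read(const std::string& fname )
{
    std::ifstream ifs( fname );
    if( ! ifs.is_open() )
        throw std::runtime_error( "Cannot open input" );
    std::string line;
    getline(ifs,line );                 // skip the header line
    while( getline(ifs,line ))
    {
        auto v = ParseCSV(line);
        if( v.empty() )
            continue;
        std::vector<std::string> career;
        std::cout << v[0] << "\n";      // first field: the person's name
        // Firm names start in column 1; stepping by 2 assumes each firm
        // column is followed by its year column
        for( std::size_t col = 1; col < v.size(); col+= 2) {
            const std::string& firm = v[col];
            if( firm.empty() || firm == "N / A")
                continue;               // no employer recorded
            career.push_back( firm );
        }
        if( career.size() < 2 ) {
            std::cout << "no links\n\n";
            continue;
        }
        // The set removes duplicate pairs and keeps them sorted for printing
        std::set<std::pair<std::string,std::string>> setlink;
        for( std::size_t i1 = 0; i1 < career.size()-1; i1++ )
            for( std::size_t i2 = i1+1; i2 < career.size(); i2++ )
            {
                if( career[i1] != career[i2])
                    setlink.insert(std::make_pair(career[i1],career[i2]));
            }
        for( auto& l : setlink )
            std::cout << l.first <<" -> "<< l.second << "\n";
        std::cout << "\n";
    }
}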
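The program prints one set of links per person; the inter-organizational network still needs those sets merged. A minimal sketch, assuming the per-person sets (like setlink) are collected into a vector instead of printed: count how many people share each link, which gives an edge weight that could drive edge width. aggregateLinks and undirected are hypothetical helper names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Link = std::pair<std::string, std::string>;

// Merges per-person link sets into one organization-level edge list.
// The weight of A -> B is the number of people whose careers contain that link.
std::map<Link, int> aggregateLinks(const std::vector<std::set<Link>>& perPerson)
{
    std::map<Link, int> weight;
    for (const auto& links : perPerson)
        for (const auto& l : links)
            ++weight[l];
    return weight;
}

// Optional: ignore direction by storing each pair with its names in sorted order.
std::map<Link, int> undirected(const std::map<Link, int>& directed)
{
    std::map<Link, int> out;
    for (const auto& [link, w] : directed) {
        const Link key = link.first < link.second ? link : Link{link.second, link.first};
        assert(key.first <= key.second);
        out[key] += w;
    }
    return out;
}
```

Keeping the directed counts preserves the "A before B" direction; the undirected variant matches the mode = "undirected" graph used in the question.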

The complete application code can be found at https://github.com/JamesBremner/org-graph

smdncfj3

smdncfj32#

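Sorry for not replying earlier; I tried different functions without any result... Thank you very much for your answer and your code. I will try it and come back if I have another question. Thanks again!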

0qx6xfy6

0qx6xfy63#

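In the end, I found a solution (manually!), but my problem now is the matrix code. I create a matrix (values = 0) and try to recode the 0s as 1s in the loop below. This code doesn't work... Can you help me?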

# Get unique organization names from both the Data_Name and Name columns,
# so that every name used as an index below exists in the matrix

organizations <- unique(c(as.character(all_connected_names1$Data_Name),
                          as.character(all_connected_names1$Name)))
organizations <- organizations[!is.na(organizations)]

# Create an empty adjacency matrix with dimensions based on the number of organizations

adjacency_matrix <- matrix(0, nrow = length(organizations), ncol = length(organizations))
rownames(adjacency_matrix) <- organizations
colnames(adjacency_matrix) <- organizations

# Iterate over the rows (not the columns) of the combined dataset

for (i in seq_len(nrow(all_connected_names1))) {
  from <- as.character(all_connected_names1$Data_Name[i])
  to <- as.character(all_connected_names1$Name[i])

  # Each row holds one pair of organizations: set both matrix elements to 1,
  # skipping missing values and self-links
  if (!is.na(from) && !is.na(to) && from != to) {
    adjacency_matrix[from, to] <- 1
    adjacency_matrix[to, from] <- 1
  }
}
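The intended row-by-row logic, one link per row between its Data_Name and Name values with the node list taken from both columns, can be sketched in C++ as below. This assumes each row of all_connected_names1 describes exactly one link; NamedMatrix and fromEdgeList are hypothetical names, and empty strings stand in for NA.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Adjacency matrix with its row/column labels.
struct NamedMatrix {
    std::vector<std::string> names;     // label of row/column k is names[k]
    std::vector<std::vector<int>> adj;  // symmetric 0/1 matrix
};

// Each pair is one row of the connection table: (Data_Name, Name).
// Rows with an empty (missing) value are skipped.
NamedMatrix fromEdgeList(const std::vector<std::pair<std::string, std::string>>& rows)
{
    NamedMatrix m;
    std::map<std::string, std::size_t> position;  // label -> row/column number
    auto addName = [&](const std::string& s) {
        if (position.count(s) == 0) {
            position[s] = m.names.size();
            m.names.push_back(s);
        }
    };

    // Build the node list from BOTH columns, so every label used below exists.
    for (const auto& [from, to] : rows)
        if (!from.empty() && !to.empty()) {
            addName(from);
            addName(to);
        }

    const std::size_t n = m.names.size();
    m.adj.assign(n, std::vector<int>(n, 0));

    // One link per row: set both directions and skip self-links.
    for (const auto& [from, to] : rows) {
        if (from.empty() || to.empty() || from == to)
            continue;
        const std::size_t a = position.at(from);
        const std::size_t b = position.at(to);
        assert(a < n && b < n);
        m.adj[a][b] = 1;
        m.adj[b][a] = 1;
    }
    return m;
}
```

Because each row contributes a single pair, there is no need for inner loops: one pass over the rows fills the matrix.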

Thanks!

相关问题