pandas: generating a fake news dataset with Python

rjjhvcjd, posted on 2023-02-02 in Python

I have the following details for generating a dataset for Facebook fake news detection:

Profile parameters (fake and legitimate): total amount of information related to multiple Facebook profiles
Total profiles: 5026
Total news collected: 15328
Total number of posts: 42256893
Total number of followers: 3685643
Total number of followings: 2354782
Total number of likes by the user: 16236669
Total listed count: 67976
Total URLs shared: 2609

I want to generate a dataset. My Python code is:

import numpy as np
import pandas as pd

dataset = pd.DataFrame(
columns=['profileid', 'profilename', 'dateofjoin', 'allfriends', 'profilepicture', 'numberofgroupjoins',
         'numberofpagelikes', 'newspost', 'profilewithphotoguard', 'numberofsharedstories', 'numberoffollowers',
         'numberofevents', 'numberofsharedposts (image, text, video)', 'numberofurlshared', 'numberoftags',
         'numberofhashtag', 'numberofnewlyaddedfriends', 'recentpostlikedorshared', 'currentlocation',
         'messageswithspamwords', 'source', 'headline', 'bodytext', 'text', 'images (with text or with hyperlink)',
         'videos', 'linguisticsbased (chapter, word, sentence, document, quoted word, external link, etc.)',
         'StatisticalFeatures (count, ImageRatio, MultiImageRatio, HotImageRatio, ShortImageRatio)',
         'Images (ClaritySource, Coherence, SimilarityDistribution, DiversitySource, ClusteringScore)', 'PostDate'])

# Populate the dataframe with random values

for i in range(15000):
    dataset.loc[i] = [np.random.randint(0, high=5000),  # profile id
                      'User' + str(np.random.randint(0, high=5000)),  # profile name
                      np.random.randint(0, high=999999999),  # date of join
                      np.random.randint(0, high=5000),  # all friends
                      np.random.randint(0, high=2),  # profile picture
                      np.random.randint(0, high=1000),  # number of group joins
                      np.random.randint(0, high=1000),  # number of page likes
                      np.random.randint(0, high=1000),  # news post
                      np.random.randint(0, high=2),  # profile with photo guard
                      np.random.randint(0, high=1000),  # number of stories shared
                      np.random.randint(0, high=1000),  # number of followers
                      np.random.randint(0, high=1000),  # number of events
                      np.random.randint(0, high=1000),  # number of shared posts (image, text, video)
                      np.random.randint(0, high=1000),  # number of URL shared
                      np.random.randint(0, high=1000),  # number of tags
                      np.random.randint(0, high=1000),  # number of hashtags
                      np.random.randint(0, high=1000),  # number of newly added friends
                      np.random.randint(0, high=2),  # recent post liked or shared
                      np.random.randint(0, high=2),  # current location
                      np.random.randint(0, high=1000),  # messages with spam words
                      np.random.randint(0, high=2),  # source
                      np.random.randint(0, high=1000),  # headline
                      np.random.randint(0, high=1000),  # body text
                      np.random.randint(0, high=1),  # text
                      np.random.randint(0, high=1),  # images (with text or with hyperlink)
                      np.random.randint(0, high=1),  # videos
                      np.random.randint(0, high=999999999),  # linguistics based (chapter, word, sentence, document, quoted word, external link, etc.)
                      np.random.randint(0, high=999999999),  # StatisticalFeatures (count, ImageRatio, MultiImageRatio, HotImageRatio, ShortImageRatio)
                      np.random.randint(0, high=1),  # Images (ClaritySource, Coherence, SimilarityDistribution, DiversitySource, ClusteringScore)
                      np.random.randint(0, high=999999999)]  # PostDate

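1. I don't know how to put dates within a given range (1970 to 2023) into the date-of-join and post-date features. Can anyone help me?
2. Another question: do the number ranges I used for all the features seem right or not? Any ideas?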

mwngjboj #1

I think you can do this with random numbers and the date functions.

import random
from datetime import date, timedelta

start_date = date(1970...)
end_date = date(2023...)

days_delta = (end_date - start_date).days
random_days_to_append = random.randrange(days_delta)

random_date = start_date + timedelta(random_days_to_append)
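
The idea above can be sketched as a small reusable helper. This is only an illustration under stated assumptions: the bounds 1970-01-01 and 2023-12-31 are guesses (the question only says "1970 to 2023"), the names `random_date`, `START`, `END`, and `n_rows` are hypothetical, and the rule that a post date should not precede the join date is an added assumption, not something the question states.

```python
import random
from datetime import date, timedelta


def random_date(start: date, end: date, rng: random.Random) -> date:
    """Return a uniformly random date between start and end, both inclusive."""
    days_delta = (end - start).days
    # randrange excludes its upper bound, so add 1 to make `end` reachable.
    return start + timedelta(days=rng.randrange(days_delta + 1))


# Assumed bounds: the question only says "1970 to 2023".
START = date(1970, 1, 1)
END = date(2023, 12, 31)

rng = random.Random(42)  # seeded only so the sketch is reproducible

n_rows = 5  # small sample for illustration
join_dates = [random_date(START, END, rng) for _ in range(n_rows)]
# Assumption: a post cannot predate the account, so draw it from [join date, END].
post_dates = [random_date(joined, END, rng) for joined in join_dates]

for joined, posted in zip(join_dates, post_dates):
    print(joined.isoformat(), posted.isoformat())
```

In the question's loop, a call such as `random_date(START, END, rng)` could take the place of the `np.random.randint(0, high=999999999)` placeholders used for `dateofjoin` and `PostDate`. On the second question, one thing worth checking is that `np.random.randint` excludes its `high` argument: `high=2` yields 0 or 1, while `high=1` always yields 0, so the columns generated with `high=1` will be constant.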
