I have these details for generating a dataset for Facebook fake news detection:
Profile parameters (fake and legitimate): total amount of information
related to multiple profiles (Facebook)
Total profiles: 5026
Total news collected: 15328
Total number of posts: 42256893
Total number of followers: 3685643
Total number of followings: 2354782
Total number of likes by the user: 16236669
Total number of listed count: 67 976
Total URLs shared: 2609
I want to generate a dataset. My Python code is:
import numpy as np
import pandas as pd

dataset = pd.DataFrame(
    columns=['profileid', 'profilename', 'dateofjoin', 'allfriends', 'profilepicture', 'numberofgroupjoins',
             'numberofpagelikes', 'newspost', 'profilewithphotoguard', 'numberofsharedstories', 'numberoffollowers',
             'numberofevents', 'numberofsharedposts (image, text, video)', 'numberofurlshared', 'numberoftags',
             'numberofhashtag', 'numberofnewlyaddedfriends', 'recentpostlikedorshared', 'currentlocation',
             'messageswithspamwords', 'source', 'headline', 'bodytext', 'text', 'images (with text or with hyperlink)',
             'videos', 'linguisticsbased (chapter, word, sentence, document, quoted word, external link, etc.)',
             'StatisticalFeatures (count, ImageRatio, MultiImageRatio, HotImageRatio, ShortImageRatio)',
             'Images (ClaritySource, Coherence, SimilarityDistribution, DiversitySource, ClusteringScore)', 'PostDate'])
# Populate the dataframe with random values
for i in range(15000):
    dataset.loc[i] = [np.random.randint(0, high=5000),  # profile id
                      'User' + str(np.random.randint(0, high=5000)),  # profile name
                      np.random.randint(0, high=999999999),  # date of join
                      np.random.randint(0, high=5000),  # all friends
                      np.random.randint(0, high=2),  # profile picture
                      np.random.randint(0, high=1000),  # number of group joins
                      np.random.randint(0, high=1000),  # number of page likes
                      np.random.randint(0, high=1000),  # news post
                      np.random.randint(0, high=2),  # profile with photo guard
                      np.random.randint(0, high=1000),  # number of stories shared
                      np.random.randint(0, high=1000),  # number of followers
                      np.random.randint(0, high=1000),  # number of events
                      np.random.randint(0, high=1000),  # number of shared posts (image, text, video)
                      np.random.randint(0, high=1000),  # number of URL shared
                      np.random.randint(0, high=1000),  # number of tags
                      np.random.randint(0, high=1000),  # number of hashtags
                      np.random.randint(0, high=1000),  # number of newly added friends
                      np.random.randint(0, high=2),  # recent post liked or shared
                      np.random.randint(0, high=2),  # current location
                      np.random.randint(0, high=1000),  # messages with spam words
                      np.random.randint(0, high=2),  # source
                      np.random.randint(0, high=1000),  # headline
                      np.random.randint(0, high=1000),  # body text
                      np.random.randint(0, high=1),  # text
                      np.random.randint(0, high=1),  # images (with text or with hyperlink)
                      np.random.randint(0, high=1),  # videos
                      np.random.randint(0, high=999999999),  # linguistics based (chapter, word, sentence, document, quoted word, external link, etc.)
                      np.random.randint(0, high=999999999),  # StatisticalFeatures (count, ImageRatio, MultiImageRatio, HotImageRatio, ShortImageRatio)
                      np.random.randint(0, high=1),  # Images (ClaritySource, Coherence, SimilarityDistribution, DiversitySource, ClusteringScore)
                      np.random.randint(0, high=999999999)]  # PostDate
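1- I don't know how to put a date from a given range, between 1970 and 2023, into the dateofjoin and PostDate features. Can anyone help me?
2- Another question: do the number ranges for all the features seem correct or not? Any ideas?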
1 Answer
mwngjboj #1
I think you can do it with random numbers and date functions.
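A minimal sketch of that idea, not a definitive implementation: draw a random number of seconds inside the 1970 to 2023 window and add it to the start date. The names `start`, `end`, `dates` and `profilepicture`, the seed, and the rule that PostDate never precedes dateofjoin are my own assumptions and are not part of the question. On the second question, one thing that does look off: `np.random.randint` excludes its upper bound, so a call with `high=1` can only ever return 0, and a 0/1 flag needs `high=2`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only for reproducibility (assumption)
n_rows = 15000

start = pd.Timestamp("1970-01-01")
end = pd.Timestamp("2024-01-01")  # exclusive bound, so all of 2023 is included
span_seconds = int((end - start).total_seconds())

# dateofjoin: a uniform random offset in seconds from the start of the window.
join_offsets = rng.integers(0, span_seconds, size=n_rows)
dateofjoin = start + pd.to_timedelta(join_offsets, unit="s")

# PostDate: assumed here to fall between the join date and the end of the window.
remaining = span_seconds - join_offsets
post_offsets = join_offsets + (rng.random(n_rows) * remaining).astype(np.int64)
post_date = start + pd.to_timedelta(post_offsets, unit="s")

dates = pd.DataFrame({"dateofjoin": dateofjoin, "PostDate": post_date})

# A 0/1 flag needs an exclusive upper bound of 2; an upper bound of 1 yields only 0.
profilepicture = rng.integers(0, 2, size=n_rows)

print(dates.head())
```

Building whole columns at once like this also avoids the row-by-row `dataset.loc[i] = ...` loop, which gets slow for 15000 rows. The two date columns can then be assigned to the corresponding columns of `dataset`.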