I have 6 large TSV files that I'm reading into DataFrames in Google Colab. However, the files are so large that Colab can't handle them.
import pandas as pd

#Crew data
downloaded = drive.CreateFile({'id':'16'})
downloaded.GetContentFile('title.crew.tsv')
df_crew = pd.read_csv('title.crew.tsv',header=None,sep='\t',dtype='unicode')
#Ratings data
downloaded = drive.CreateFile({'id':'15'})
downloaded.GetContentFile('title.ratings.tsv')
df_ratings = pd.read_csv('title.ratings.tsv',header=None,sep='\t',dtype='unicode')
#Episode data
downloaded = drive.CreateFile({'id':'14'})
downloaded.GetContentFile('title.episode.tsv')
df_episode = pd.read_csv('title.episode.tsv',header=None,sep='\t',dtype='unicode')
#Name Basics data
downloaded = drive.CreateFile({'id':'13'})
downloaded.GetContentFile('name.basics.tsv')
df_name = pd.read_csv('name.basics.tsv',header=None,sep='\t',dtype='unicode')
#Principals data
downloaded = drive.CreateFile({'id':'12'})
downloaded.GetContentFile('title.principals.tsv')
df_principals = pd.read_csv('title.principals.tsv',header=None,sep='\t',dtype='unicode')
#Title Basics data
downloaded = drive.CreateFile({'id':'11'})
downloaded.GetContentFile('title.basics.tsv')
df_title = pd.read_csv('title.basics.tsv',header=None,sep='\t',dtype='unicode')
**Error: Your session crashed after using all available RAM. The runtime logs show:** [runtime log image]
How can I get Google Colab to handle memory better? All my TSV files together add up to 2,800 MB. Please advise!
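One plausible reason 2,800 MB of TSV text can exhaust Colab's RAM is that string columns held as Python objects take considerably more memory than the raw text. The sketch below is only an illustration under assumptions: the column name `titleType` and its values are made-up sample data, not read from the question's files. It measures memory with `DataFrame.memory_usage(deep=True)` and, as one possible mitigation not mentioned in the question or answers, compares it with a `category` dtype.

```python
import io

import pandas as pd

# Made-up sample: one column with a few distinct values repeated many times.
tsv_text = "titleType\n" + "\n".join(["movie", "short", "tvSeries"] * 10_000)

# Store the column as plain Python string objects.
df_obj = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype=object)

# Store the same column as a categorical (small integer codes + unique labels).
df_cat = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype={"titleType": "category"})

# deep=True counts the string objects themselves, not just the pointers to them.
obj_bytes = int(df_obj.memory_usage(deep=True).sum())
cat_bytes = int(df_cat.memory_usage(deep=True).sum())

print("raw text bytes:    ", len(tsv_text.encode()))
print("object dtype bytes:", obj_bytes)
print("category bytes:    ", cat_bytes)
```

Checking `memory_usage(deep=True)` on each DataFrame after loading is a quick way to see which of the six files dominates RAM use.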
3 Answers
ttp71kqs1#
The simplest approach is to load data only when you actually need it and delete it from memory once you're done. You can force memory to be released by deleting the references and triggering the garbage collector (see this thread: https://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python).
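A minimal sketch of the "load, use, delete" pattern, assuming each file can be reduced to a small result before moving on. The helper `mean_rating` and the tiny in-memory TSV (IMDb-style column names with made-up values) are hypothetical stand-ins for one of the real files. Note that `gc.collect()` makes unreachable objects collectable, but the freed memory is not always returned to the operating system immediately.

```python
import gc
import io

import pandas as pd

# Hypothetical stand-in for one large file (made-up values).
RATINGS_TSV = "tconst\taverageRating\tnumVotes\ntt0000001\t5.7\t1900\ntt0000002\t5.8\t260\n"


def mean_rating(source):
    """Load one file, keep only a small result, then release the DataFrame."""
    df = pd.read_csv(source, sep="\t")
    result = float(df["averageRating"].mean())
    # Drop the only reference to the DataFrame, then run the garbage
    # collector so unreachable objects can be reclaimed.
    del df
    gc.collect()
    return result


print(mean_rating(io.StringIO(RATINGS_TSV)))
```

In the question's setting, the same idea would mean processing one TSV at a time instead of keeping all six DataFrames alive at once.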
If you want more RAM in Colab, there used to be a hack: you deliberately ran the session out of RAM, and Colab would then offer you a high-RAM runtime. With Colab Pro you can also select this option under Runtime > Change runtime type. At $10 a month, Colab Pro may well be a good option for you.
I saw this hack here; in short, you just keep appending something to a list in a while loop until the RAM is exhausted.
azpvetkf2#
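If you are using a neural network model: without a Colab Pro account, Google Colab provides about 12 GB of RAM. Some neural models need more than that, which can crash the session. You can reduce the size of the training and test datasets and check whether the model runs; it may work fine.
You can shuffle the dataset and use only a subset of the original.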
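A minimal sketch of shuffling and subsampling with pandas, assuming the data is already in a DataFrame. The 1,000-row DataFrame, the 20% fraction, and the 80/20 train/test split are illustrative assumptions; `DataFrame.sample` without replacement returns the rows in shuffled order.

```python
import pandas as pd

# Hypothetical dataset standing in for the real training data.
df = pd.DataFrame({"feature": range(1000), "label": [i % 2 for i in range(1000)]})

# Shuffle and keep 20% of the rows; random_state makes the subset reproducible.
subset = df.sample(frac=0.2, random_state=42)

# Split the reduced data into train (80%) and test (20%) parts.
train = subset.sample(frac=0.8, random_state=42)
test = subset.drop(train.index)

print(len(subset), len(train), len(test))
```

If the model then trains without crashing, the fraction can be raised step by step to find how much data fits in the available RAM.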
js5cn81o3#
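Google Colab normally gives you 12 GB of RAM for free, but you can increase it without paying Google anything.
With just three lines of code, you can increase the RAM from 12 GB to 25 GB: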
a = []
while True:
    a.append('1')
Try this; it might help.