PostgreSQL: dynamic query to import from an S3 bucket in pgAdmin

gorkyyrv  posted on 2022-12-12  in PostgreSQL

I am writing a stored procedure in pgAdmin (RDS instance on AWS) to dynamically import CSV data from an S3 bucket. I have all the necessary permissions and extensions created. I am passing the path within the S3 bucket as a parameter to the stored procedure, but the query treats the parameter as text because it is enclosed in ''.
A sample of the procedure is below:

CREATE OR REPLACE PROCEDURE public.sp_import(
    _folderpath character varying,
    _dbname character varying)
LANGUAGE 'plpgsql'
AS $BODY$
declare 
__path varchar(500);
__sql varchar(65000);
__region varchar(100);
__s3bucket varchar(100);
__tbltoload varchar(100);
__delimit varchar(100);
__query varchar (65000);

BEGIN
__tbltoload:='''public.table_name''';
__delimit := '(DELIMITER ''|'', NULL '''')';
__s3bucket = 'samplebucket';
__region := 'sample-region-1';
__path := _folderpath||'/'||_dbname||'/'||'folder1'||'/'||'000';
--__fullpath := '(||__s3bucket||','||__path||','||__region||')'||'';
__sql:='SELECT aws_s3.table_import_from_s3';
__query := __sql || (__tbltoload,'''''','(DELIMITER ''''|'''', NULL '''''''')',(__s3bucket,__path,__region));
execute __query;

END;
$BODY$;

The error I am receiving is:
ERROR: column "(DELIMITER ''|'', NULL '''')" does not exist
Below is how __query is being formed:
SELECT aws_s3.table_import_from_s3('public.table1','',"(DELIMITER ''|'', NULL '''')","(samplebucket,_folderpath/_dbname/folder1/000,sample-region-1)");
Expected __query is :
SELECT aws_s3.table_import_from_s3('public.table1','','(DELIMITER ''|'', NULL '''')','(samplebucket,_folderpath/_dbname/folder1/000,sample-region-1)');
However, I am able to import the data when the path is hard coded. The double quotes "" prevent the query from executing because the quoted values are treated as column names; the "" need to be replaced with '' to form the query that should actually run.
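For context, in PostgreSQL double quotes delimit identifiers (such as column names) while single quotes delimit string literals, which is why the error complains about a missing column. A minimal illustration, not tied to any real table:

SELECT "some_text";   -- ERROR: column "some_text" does not exist
SELECT 'some_text';   -- returns the string literal some_text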

lnlaulya1#

Your fourth parameter to table_import_from_s3 ends up as a single literal string containing the whole (bucket, path, region) tuple. I'm not sure which overload of table_import_from_s3 you want to call, but you either need to supply an actual S3 URI, or you should supply the bucket name, path, and region separately.
So assuming __s3bucket and __region both contain valid values, do either

aws_s3.table_import_from_s3(
   'public.table',
   '',
   '(DELIMITER ''|'', NULL '''')',
   aws_commons.create_s3_uri(__s3bucket, __path, __region)
)

or

aws_s3.table_import_from_s3(
   'public.table',
   '',
   '(DELIMITER ''|'', NULL '''')',
   __s3bucket,
   __path,
   __region
)

See https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Procedural.Importing.html
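Since every varying piece here is a plain value, a minimal sketch of calling the second form directly from PL/pgSQL, without building the query as a string at all, could look like this (assuming __s3bucket, __path, and __region hold valid values, as declared in your procedure):

-- PERFORM discards the status text that aws_s3.table_import_from_s3 returns
PERFORM aws_s3.table_import_from_s3(
    'public.table_name',              -- target table, hard coded as in the question
    '',                               -- column list; empty imports all columns
    '(DELIMITER ''|'', NULL '''')',   -- COPY options
    __s3bucket,
    __path,
    __region
);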
For the formatting issue, I would suggest you use a function like FORMAT to make things a bit clearer, as well as using dollar quotes in order to not go crazy with all the double/quadruple quoting. Just showing the bits from your function I changed:

-- added more double quotes here
__delimit := '(DELIMITER ''''|'''', NULL '''''''')';
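-- note: this assumes __tbltoload now holds just public.table_name, without the
-- embedded quotes from the original, since the format string below adds them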
-- by using dollar quoting here, everything looks much more normal
__query := FORMAT(
    $query$SELECT aws_s3.table_import_from_s3('%s', '', '%s', '%s', '%s', '%s')$query$,
    __tbltoload, __delimit, __s3bucket, __path, __region
);

And a RAISE NOTICE on __query returns this:

SELECT aws_s3.table_import_from_s3('public.table_name', '', '(DELIMITER ''|'', NULL '''')', 'samplebucket', 'myfolder/mydb/folder1/000', 'sample-region-1')
Note that I put the last 3 values as separate parameters rather than wrapping them in () because I think that's the overload you want, but you can always put them back into the format string.
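If you do want the single-argument form instead, one sketch (same variables as above; the only new piece is aws_commons.create_s3_uri, which saves you from quoting the composite by hand) would be:

__query := FORMAT(
    $query$SELECT aws_s3.table_import_from_s3('%s', '', '%s', aws_commons.create_s3_uri('%s', '%s', '%s'))$query$,
    __tbltoload, __delimit, __s3bucket, __path, __region
);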
