SQL Server How to loop through specific column names to generate a frequency table in SQL?

k4ymrczo  于 9个月前  发布在  其他
关注(0)|答案(2)|浏览(70)

I am looking for a way to run frequencies for a number of different columns. I'm used to working in SAS and SQL is still new. In SAS I'd run a proc freq; table var1 var2 var3; run; command; however, it appears as though this requires a bit more code in SQL. Because of the number of variables/columns I'm obtaining frequencies for, I really don't want to have to write out the same code over and over again. Eg:

select
    language,
    count(*) as freq
from #pop
group by language
order by language

I've been looking at cursor commands to loop through column names, so I start with generating a new temporary table with all the column names that I'd like to loop through. Here's what I've been able to put together:

drop table if exists #dem_table;
create table #dem_table (dem_vars varchar(50) not null);
insert into #dem_table values ('age_cat'),('race'),('ethnicity'),('language'),('sex'),('orientation'),('income'),('urban'),('poverty'),('children'),('marital_status'),('state');

declare @column varchar(50);

declare cursor_dem cursor for
    select * from #dem_table;

open cursor_dem;
    fetch next from cursor_dem into @column;
        while @@fetch_status = 0
            begin
                select @column, count(*) as freq
                from #pop
                group by @column
                order by @column;
    fetch next from cursor_dem into @column;
    end;
close cursor_dem;
deallocate cursor_dem;

It works until I reach the "group by" and "order by" syntax, and then I receive the following errors:

Each GROUP BY expression must contain at least one column that is not an outer reference. The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.

Wondering if dynamic sql might be helpful? But I can't get it to iterate through columns that way either.

9o685dep

9o685dep1#

This could probably be pared down a bit, or adjusted to accept a specific list of columns if you are interested only in a subset (For instance, you probably don't care about the distribution of a surrogate key, so you could exclude that. Or maybe you want to exclude float columns), but here is a snippet I use when I want to find the distinct values in a table, and the counts thereof.

-- Field-by-Field with loop
drop table if exists #DistVals
create table #DistVals
(
    TableName nvarchar(384),
    OrdinalPosition int,
    Field nvarchar(128),
    Type nvarchar(128),
    Value nvarchar(max),
    Ct int
)
go

declare 
    @TableName nvarchar(128), -- table here
    @SchemaName nvarchar(128) -- schema here
declare @sql nvarchar(max)
declare c cursor local fast_forward for
    select concat('insert into #DistVals (TableName, OrdinalPosition, Field, Type, Value, Ct) select TableName = ', quotename(concat(table_catalog, '.', table_schema, '.', table_name), ''''), ', OrdinalPosition = ', ordinal_position, ', Field = ', quotename(column_name, ''''), ', Type = ', quotename(data_type, ''''), ', Value = convert(nvarchar(max), ', quotename(column_name), '), Ct = count(1) from ', quotename(table_catalog), '.', quotename(table_schema), '.', quotename(table_name), ' group by ', quotename(column_name))
    from information_schema.columns
    where table_name = @TableName
        and table_schema = @SchemaName
        -- and column_name <> 'MySurrogateKey'
open c

fetch next from c into @sql

while @@fetch_status = 0
begin

    exec sp_executesql @sql
    
    fetch next from c into @sql

end

select
    TableName,
    OrdinalPosition,
    Field,
    Type,
    Value,
    Ct,
    TotalValueCt = sum(ct) over (partition by TableName, Field)
from #DistVals
order by TableName, OrdinalPosition, Ct desc

I find this approach the easiest to reason about, and since I'm typically running this in an ad-hoc fashion, I usually don't care that I have to re-hit the table n times for each column.

EDIT If you really want as many result sets as you have columns to analyze, you could replace the insert into #DistVals with a simple select. I don't know your exact use case, but to me, having a single dataset containing all columns is more useful.

That said, if you want to get everything in one go, You could also do that by doing a dynamic unpivot .

-- all at once with dynamic unpivot
declare 
    @TableName nvarchar(128),
    @SchemaName nvarchar(128)

declare 
    @Columns nvarchar(max),
    @ConvertedColumns nvarchar(max),
    @SQL nvarchar(max)

select 
    @Columns = string_agg(quotename(name), ','),
    @ConvertedColumns = string_agg(concat(quotename(name), ' = convert(nvarchar(max), ', quotename(name), ')'), ',')
from sys.columns
where object_id = object_id(@SchemaName + '.' + @TableName)

select @SQL = concat
('
    select
        Name,
        ValueCt = count(1),
        DistinctValueCt = count(distinct Value)
    from
    (
        select ', @ConvertedColumns, '
        from ', quotename(@SchemaName), '.', quotename(@TableName), '
    ) s
    unpivot(Value for Name in(', @Columns, ')) u
    group by name
')

exec sp_executesql @SQL
wi3ka0sx

wi3ka0sx2#

Here is another option leveraging the sys.columns to build the dynamic sql for this. It is a bit simpler than using a dynamic unpivot for something like this.

If you want the distinct counts in a single row per column it would be something like this.

declare @sql nvarchar(max) = 'select '
    , @TableName sysname = 'YourTableName'

select @sql = @sql + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + '), '
from sys.columns c
where object_id = object_id(@TableName)

set @sql = left(@sql, len(@sql) - 1) + ' from ' + QUOTENAME(@TableName)

exec sp_executesql @sql

If however you want individual rows you could do something similar like this.

declare @sql nvarchar(max) = ''
    , @TableName sysname = 'YourTableName'

select @sql = @sql + 'select Name = ''' + c.name + ''', ValueCount = count(' + quotename(c.name) + '), ' + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + ') from ' + quotename(@TableName) + ' union all '
from sys.columns c
where object_id = object_id(@TableName)

set @sql = left(@sql, len(@sql) - 10)

exec sp_executesql @sql

And last but not least if you really do want a new result set for each value as you indicated you can modify to something like this.

declare @sql nvarchar(max) = ''
    , @TableName sysname = 'YourTableName'

select @sql = @sql + 'select Name = ''' + c.name + ''', ValueCount = count(' + quotename(c.name) + '), ' + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + ') from ' + quotename(@TableName) + ';'
from sys.columns c
where object_id = object_id(@TableName)

exec sp_executesql @sql

相关问题