I am looking for a way to run frequencies for a number of different columns. I'm used to working in SAS and SQL is still new. In SAS I'd run a proc freq; table var1 var2 var3; run;
command; however, it appears as though this requires a bit more code in SQL. Because of the number of variables/columns I'm obtaining frequencies for, I really don't want to have to write out the same code over and over again. Eg:
select
language,
count(*) as freq
from #pop
group by language
order by language
I've been looking at cursor commands to loop through column names, so I start with generating a new temporary table with all the column names that I'd like to loop through. Here's what I've been able to put together:
drop table if exists #dem_table;
create table #dem_table (dem_vars varchar(50) not null);
insert into #dem_table values ('age_cat'),('race'),('ethnicity'),('language'),('sex'),('orientation'),('income'),('urban'),('poverty'),('children'),('marital_status'),('state');
declare @column varchar(50);
declare cursor_dem cursor for
select * from #dem_table;
open cursor_dem;
fetch next from cursor_dem into @column;
while @@fetch_status = 0
begin
select @column, count(*) as freq
from #pop
group by @column
order by @column;
fetch next from cursor_dem into @column;
end;
close cursor_dem;
deallocate cursor_dem;
It works until I reach the "group by" and "order by" syntax, and then I receive the following errors:
Each GROUP BY expression must contain at least one column that is not an outer reference. The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.
Wondering if dynamic sql might be helpful? But I can't get it to iterate through columns that way either.
2条答案
按热度按时间9o685dep1#
This could probably be pared down a bit, or adjusted to accept a specific list of columns if you are interested only in a subset (For instance, you probably don't care about the distribution of a surrogate key, so you could exclude that. Or maybe you want to exclude
float
columns), but here is a snippet I use when I want to find the distinct values in a table, and the counts thereof.I find this approach the easiest to reason about, and since I'm typically running this in an ad-hoc fashion, I usually don't care that I have to re-hit the table n times for each column.
EDIT If you really want as many result sets as you have columns to analyze, you could replace the insert into
#DistVals
with a simple select. I don't know your exact use case, but to me, having a single dataset containing all columns is more useful.That said, if you want to get everything in one go, You could also do that by doing a dynamic
unpivot
.wi3ka0sx2#
Here is another option leveraging the sys.columns to build the dynamic sql for this. It is a bit simpler than using a dynamic unpivot for something like this.
If you want the distinct counts in a single row per column it would be something like this.
If however you want individual rows you could do something similar like this.
And last but not least if you really do want a new result set for each value as you indicated you can modify to something like this.