BigQuery SQLの勘所 Ⅳ サブクエリ

2023年10月30日 2023年10月30日 14分9秒

サブクエリを使えばSQLの自由度が一段、いや二段上がる。サブクエリで一連のクエリに関わる操作を行い、親クエリはそれを利用し必要な結果を取得する。

基本的な構造

select
user_id,

sum(quantity) as qty_by_user,

(select sum(quantity) from data.pod ) as total_qty,
*サブクエリは（）太文字内にselect from 構造がある。asで別名設定可

round(sum(quantity)/(select sum(quantity) from data.pod ) *100,1 )as wariai
*（）内太文字はサブクエリ。

from data.pos
group by user_id
*イタリック部分のsum() は親クエリで処理。 group by user_id の範囲でuser_id別にデータが加算される。

サブクエリ例

１
select
order_id,
quantity,
(select avg(quantity) from data.pod ) as average,

from data.pod
where quantity > (select avg(quantity) from data.pod )
group by order_id,quantity
order by quantity desc;
*avg(quantity) は　sum(quantity)/ count(*)　でも同じ。
*whereでも使用できる。

２
order_id,
user_id,
product_id,
quantity,
(select sum(quantity) as goukei , from data.pod) as goukei,
from
data.pod
order by order_id
*サブクエリ内の集計関数を親クエリは感知しない。

３
select
date_trunc(date,month) as year_month,
sum(sales_amount) sales_by_month,
(select sum(sales_amount) from data.purchases) tt_amount,
round(sum(sales_amount) / (select sum(sales_amount) from data.purchases)*100 ,0) as share
from
data.purchases
group by year_month
order by year_month
*date_trunc(date,month)でdateを月ベースにしている。
* サブクエリ内の集計関数を親クエリは感知しない。

４
select
user_id,
concat(last_name,' ',first_name) as full_name,
birthday,
(select round (avg(date_diff('2020-12-31', birthday, year) )) from data.customers )as avg_age,
(select round (stddev_pop(date_diff('2020-12-31', birthday, year) )) from data.customers)as std_age,
date_diff('2020-12-31', birthday, year) as age,from
data.customers
group by user_id,full_name,birthday,age
order by age desc

*stddev_pop()。母集団標準偏差取得関数。

５
select
(select count(user_id) fromdata.customers) as zentai,
count(user_id) as gyousuu,
count(user_id) / (select count(user_id) from data.customers) as wariai
from
data.customers

*設定されている列（zentai、gyousuu、wariai）が全て集計関数で計算された数字。group by 不要。1行でデータが並ぶ。

６
select
user_id,
sum(sales_amount) as goukei
from
data.shop_purchases
where user_id in
(select user_id from data.customers where gender= 2)
group by user_id
order by 1

*親クエリでの絞り込み：where user_id in
* user_id 指定。サブクエリ内も user_id。
*サブクエリでの絞り込み： where gender= 2

７
select
user_id,
concat(last_name,' ',first_name) as fullname,
prefecture,
birthday
from
data.customers
where user_id in (select user_id from data.shop_purchases where product_id = 10)
and prefecture = 'Osaka'
and gender = 1
order by birthday asc

*サブクエリ内　shop_purchase デーブル　where で購入商品特定。

８
select
avg(product_num) as las_num
from
(select count(distinct product_id) as product_num from data.pod
group by date)

*サブクエリ内 group by date 指定されてるも、列にはdateが無い。このような設定も有り。可視化されていないが、date 別で product_num が表示されている。それを親クエリで平均している。 product_num ÷ date数。

９
select
max(pv) as_mpv,
min(pv) as_minpv,
avg(pv) as_avpv,
stddev_pop(pv) as_st,
from(select sum(pageview) as pv from `data.web` group by date_trunc(timestamp,month))

*サブクエリ内 date_trunc(timestamp,month)) 指定。月単位でtime_stampをまとめている。pvは月別にまめられた状態。親クエリではpvの最大、最少、平均、母集団平均偏差各値を列表示している。

１０
shop_id,
avg(sa_day) avg_sa_day
from
(select date,shop_id, sum(sales_amount) as sa_day from data.shop_purchases group by date,shop_id)
group by shop_id
order by avg_sa_day desc

*サブクエリ内 date,shop_id 列は group by date,shop_id で対応。
*サブクエリ内にはsa_day （日別店別売上）。　
* 親クエリではのshop_id別　（ group by shop_id ）　で売上平均（店別売上合計÷日数）を表示している。

１１
select
*
from
(
select
user_id,
order_id,
quantity,
rank()
over
(
partition by user_id
order by quantity desc
) as rank
from
data.pod
)
where rank <= 5
order by user_id,rank

* サブクエリで親クエリのfrom以下を形成している。rank関数を使用し user_id 別で quantity の多い順に「順番」の列を作成。最終的にはサブクエリで作成したテーブル全列、5位以内 user_id,rank 昇順でデータ表示させている。
集計関数は使われていない。

１２
select
sm.category as ct,
sum(id_q) as kosuu
from
(select product_id,
sum (quantity) as id_q from data.pod
group by product_id
having sum(quantity) > 12) as id_qt
left join data.shohin_master as sm
using(product_id)
group by sm.category

*サブクエリで作成したテーブル　 id_qt 　と　既存テーブル　 data.shohin_master （別名 sm）をleft join させている。
*サブクエリ内集合関数 　 sum(quantity) 　で絞り込みでhaving を使用。
*サブクエリ内 product_id 別で quantity 集計後（別名　 id_q ）、今度はカテゴリー別集計（別名　 kosuu ）している。 group by はカテゴリ。

１３
select
*
from
(
select
cu.prefecture,
pu.purchase_id,
pu.sales_amount,
rank()
over(
partition by cu.prefecture
order by pu.sales_amount desc
) as rank
from
data.shop_purchases as pu
left join
data.customers as cu
using(user_id)
)
where prefecture is not null and rank <=5
order by prefecture,rank

*サブクエリ内でテーブル結合を行っている。各テーブルから必要な列をセット。
集計関数は使われていない。

１４
select
*
from(
select
prefecture,
product_id,
ttl_sa,
rank()
over
(
partition by prefecture
order by ttl_sa desc
) as rank
from (
select
cu.prefecture,
pu.product_id,
sum(pu.sales_amount) as ttl_sa,
from data.shop_purchases as pu
left join data.customers as cu
using(user_id)
group by prefecture,product_id
)
)
where prefecture is not null and rank <= 5
order by 1,4

*サブクエリ内が複雑になってきた。select from 構造が2つある。
先に下記列を持つデーブルを作成。
cu.prefecture,
pu.product_id,
sum(pu.sales_amount) as ttl_sa,
続いて下記列を持つテーブルを作成し、最終的にデータ表示させている。
prefecture,
product_id,
ttl_sa,
rank
rank()関数は
partition by prefecture　
にフォーカスして
order by ttl_sa desc
で順位を設定している。
集計関数はサブクエリのみ。

１５
with egg as (
select
date_trunc(date,month) as month,
shop_id,

--月・shop id別sum
sum(sales_amount) as tt_am

from data.shop_purchases
group by month,shop_id
)
select
egg.month,
sm.chief_name,

--cheif別sum
sum(tt_am) as c_tt_am

from data.shops_master as sm
join egg
using(shop_id)
group by month,chief_name
order by chief_name,sum(tt_am) desc

*サブクエリの変形。with 別名　as ()が使われている。
サブクエリ内で 月・shop id別の　売上sum　が集計されている。
*SQL内で　--　から記述するとコメントとみなされる。
*親クエリでは既存テーブル　 data.shops_master as sm 　とサブクエリ　　*egg とのテーブルが連結されている。
最終的には　 cheif別　売上sum で集計されている。

参考：BigQuery で学ぶ非エンジニアのための SQL データ分析入門

Google アナリティクス4 プロパティー (GA4)基礎講座

 誰もが知るあのサイトのGA4設定、GTM設定、BigQuery設定はアジアスターズがやっているかも？

無料相談お問合せ