Skip to content

boslbosl/bigquery_style_guide

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

7 Commits
ย 
ย 

Repository files navigation

BigQuery Style Guide

โ–  Written by Jaeseok Park (jaeseok.park@gmail.com)

๋ชฉ์ฐจ (Table of Contents)


์ผ๋ฐ˜ ๊ฐ€์ด๋“œ (General Guide)

์ด ๋ฌธ์„œ์—์„œ๋Š” ๋น…๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ์—”์ง€๋‹ˆ์–ด(data engineer), ๋ฐ์ดํ„ฐ ๋ถ„์„๊ฐ€(data analyst)์™€ ๋ฐ์ดํ„ฐ ๊ณผํ•™์ž(data scientist)๊ฐ€ BigQuery ์–ธ์–ด๋กœ ์ฝ”๋“œ ์ž‘์„ฑ์‹œ์˜ ์ง€์นจ(guide)๊ณผ ์Šคํƒ€์ผ(style)์„ ์ œ์‹œํ•˜๊ณ  ์žˆ๋‹ค.

ํ•ต์‹ฌ์›์น™ (core principle)

  • ์ฝ๊ธฐ ์‰ฝ๊ณ  ์œ ์ง€๋ณด์ˆ˜๊ฐ€ ์šฉ์ดํ•œ ์ฝ”๋“œ๋ฅผ ์ตœ์šฐ์„ ์œผ๋กœ ํ•œ๋‹ค.
  • ๋ฌธ์žฅ ๊ตฌ์กฐ์™€ ์ฝ”๋“œ ์Šคํƒ€์ผ์˜ ์ผ๊ด€์„ฑ์„ ์œ ์ง€ํ•˜์—ฌ ๊ฐ€๋…์„ฑ์ด ๋†’์€ ์ฝ”๋“œ๊ฐ€ ๋˜๋„๋ก ํ•œ๋‹ค.
  • ์ค‘์–ธ๋ถ€์–ธ์ด๋‚˜ ๊ตฐ๋”๋”๊ธฐ ์—†๋Š” ๊ฐ„๊ฒฐํ•œ ์ฝ”๋“œ๊ฐ€ ๋˜๋„๋ก ํž˜์“ด๋‹ค. (reduce duplication and redundancy to make your code succinct)
  • ์ฝ”๋“œ๋งŒ์œผ๋กœ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•˜๋Š” ๋งฅ๋ฝ์€ ์ฃผ์„์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ดํ•ด๋ฅผ ๋•๋Š”๋‹ค.

๊ณตํ†ต ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ (common style guide)

  • ์ค„๋ฐ”๊ฟˆ๊ณผ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ์ ์ ˆํžˆ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌธ์žฅ์ด ์˜๋ฏธ ๋‹จ์œ„๋กœ ์ฝํž ์ˆ˜ ์žˆ๋„๋ก ๋งŒ๋“ ๋‹ค.
  • ๋“ค์—ฌ์“ฐ๊ธฐ๋Š” ํƒญ์ด ์•„๋‹Œ ๊ณต๋ฐฑ์„ ์ด์šฉํ•˜๋ฉฐ 2์นธ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.
  • ๋ฌธ์žฅ(statement)์˜ ๊ฐ ์ ˆ(clause)์€ ์ƒˆ๋กœ์šด ๋ผ์ธ์—์„œ ์‹œ์ž‘ํ•˜๋ฉฐ ๊ฐ ์ ˆ์˜ ์‹œ์ž‘ ํ‚ค์›Œ๋“œ๊ฐ€ ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ์ด ๋˜๋„๋ก ํ•œ๋‹ค.
    • DDL(Data Definition Language) ๋ฌธ์€ ์˜ˆ์™ธ์ ์œผ๋กœ ์™ผ์ชฝ์œผ๋กœ ์ •๋ ฌ์‹œํ‚จ๋‹ค.
  • ์ค„๋ฐ”๊ฟˆ์ด ์ง€๋‚˜์ณ ์ข๊ณ  ๊ธธ์ญ‰ํ•œ (skinny) ๊ตฌ์กฐ๊ฐ€ ๋˜๋Š” ๊ฒฝ์šฐ ์—ฐ๊ด€๋œ ์ฝ”๋“œ ๋ญ‰์น˜๋ฅผ ํ•˜๋‚˜์˜ ๋ผ์ธ์œผ๋กœ ํ•ฉ์น  ์ˆ˜ ์žˆ๋‹ค.
  • ๋ฐ˜๋Œ€๋กœ ํ•œ ๋ผ์ธ์˜ ๊ธธ์ด๊ฐ€ 80 ๊ธ€์ž ๋‚ด์™ธ์˜ ์‹œ์•ผ ๋ฒ”์œ„๋ฅผ ๋„˜์–ด์„œ๋Š” ๊ฒฝ์šฐ ๊ฐ€๋กœ๋กœ ์ง€๋‚˜์น˜๊ฒŒ ๊ธธ์–ด์ง€์ง€ (flat and wide) ์•Š๋„๋ก ์ค„๋ฐ”๊ฟˆ์„ ํ†ตํ•ด ์ ์ ˆํ•œ ํญ์„ ์œ ์ง€ํ•œ๋‹ค.
  • ์–ธ์–ด์—์„œ ์ œ๊ณตํ•˜๋Š” ์žฅ์น˜๋“ค์„ ์ด์šฉํ•˜์—ฌ ๋ฐ˜๋ณต์„ ์ตœ์†Œํ™”ํ•œ๋‹ค. ๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹ CTE(Common Table Expression)๊ณผ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ UDF(User Defined Function) ๋“ฑ์€ ์ฝ”๋“œ์˜ ๋ชจ๋“ˆํ™”๋ฅผ ๋•๋Š” ์žฅ์น˜๋“ค๋กœ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•˜๋Š”๋ฐ ์œ ์šฉํ•˜๋‹ค.

์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ์ ์šฉํ•œ ์˜ˆ์‹œ (examples)

SQL ๋ฌธ์žฅ์€ ์•„๋ž˜์™€ ๊ฐ™์ด ์„ธ๋ถ„ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ฐ ๋ฌธ์žฅ์— ๋Œ€ํ•ด ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ์ ์šฉํ•œ ์˜ˆ์‹œ๋“ค์„ ๋ช‡๊ฐ€์ง€ ์‚ดํŽด๋ณด๋„๋ก ํ•˜์ž.

  • SEL statement - SELECT
  • DDL(Data Definition Language) statement - CREATE, ALTER, DROP
  • DML(Data Manipulation Language) statement - INSERT, UPDATE, DELETE, TRUNCATE, MERGE
  • DCL(Data Constraint Language) statement - BigQuery์—์„œ๋Š” Data Constraint Language ๊ตฌ๋ฌธ๋ฅผ ์ง€์›ํ•˜์ง€ ์•Š๋Š”๋‹ค.

SELECT๋Š” DML์˜ ํ•œ ์ข…๋ฅ˜์ด์ง€๋งŒ ๋‹ค๋ฅธ DML ๋ฌธ๊ณผ ๋‹ฌ๋ฆฌ ํ…Œ์ด๋ธ”์˜ ๋‚ด์šฉ์„ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ์ ์—์„œ ๊ตฌ๋ถ„๋˜๊ธฐ๋„ ํ•œ๋‹ค.

  • SELECT statement ์˜ˆ์‹œ
/* ๊ฐ ์ ˆ์˜ ์‹œ์ž‘ ํ‚ค์›Œ๋“œ์ธ SELECT, FROM, LEFT, WHERE, GROUP ๋“ฑ์„ ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ */
SELECT station_id,
       name,
       status,
       latitude, longitude,  -- ์—ฐ๊ด€ ์ปฌ๋Ÿผ๋“ค์„ ๊ฐ™์€ ๋ผ์ธ์— ๋‚˜์—ดํ•˜๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅ
       ST_DISTANCE(
         ST_GEOGPOINT(longitude, latitude), ST_GEOGPOINT(-0.118092, 51.509865)
       ) AS distance_from_city_centre,
       COUNT(1) AS cnt,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations` s
  LEFT JOIN `bigquery-public-data.austin_bikeshare.bikeshare_trips` t
    ON s.station_id = t.start_station_id
 WHERE s.status IN ('active')
   AND s.latitude > 15.0
 GROUP BY 1, 2, 3, 4, 5
HAVING cnt > 1
 LIMIT 1000
;
  • DDL statement ์˜ˆ์‹œ

DDL๋ฌธ์€ SEL ๋ฌธ์žฅ๊ณผ ๋‹ฌ๋ฆฌ ์‹œ์ž‘ ํ‚ค์›Œ๋“œ๋ฅผ ์™ผ์ชฝ์œผ๋กœ ์ •๋ ฌ์„ ์‹œํ‚จ๋‹ค.

-- https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#creating_a_new_table

CREATE TABLE my_dataset.new_table (
  x INT64 OPTIONS (description = 'An optional INTEGER field'),
  y STRUCT<
    a ARRAY<STRING> OPTIONS (description = 'A repeated STRING field'),
    b BOOL
  >,
)
PARTITION BY _PARTITIONDATE
OPTIONS (
  expiration_timestamp = TIMESTAMP '2025-01-01 00:00:00 UTC',
  partition_expiration_days = 1,
  description = 'a table that expires in 2025, with each partition living for 24 hours',
  labels=[("org_unit", "development")]
)
;
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#examples_6

ALTER TABLE mydataset.mytable
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description = "Table that expires seven days from now"
)
;
  • DML statement ์˜ˆ์‹œ
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#insert_using_explicit_values

INSERT dataset.Inventory (
         product,
         quantity
       )
VALUES ('top load washer', 10),
       ('front load washer', 20),
       ('dryer', 30),
       ('refrigerator', 10),
       ('microwave', 20),
       ('dishwasher', 30),
       ('oven', 5)
;
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_using_joins

UPDATE dataset.Inventory
   SET quantity = quantity + 
                  (SELECT quantity 
                     FROM dataset.NewArrivals
                    WHERE Inventory.product = NewArrivals.product),
       supply_constrained = false
 WHERE product IN (SELECT product FROM dataset.NewArrivals)
;

UPDATE dataset.Inventory i
   SET quantity = i.quantity + n.quantity,
       supply_constrained = false
  FROM dataset.NewArrivals n
 WHERE i.product = n.product
;
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples

MERGE dataset.DetailedInventory T
USING dataset.Inventory S
   ON T.product = S.product
 WHEN NOT MATCHED AND quantity < 20 THEN 
      INSERT (product, quantity, supply_constrained, comments)
      VALUES (product, quantity, true, 
              ARRAY<STRUCT<created DATE, comment STRING>>[
                (DATE('2016-01-01'), 'comment1')
              ]
      )
 WHEN NOT MATCHED THEN
      INSERT (product, quantity, supply_constrained)
      VALUES (product, quantity, false)

์ด๋ฆ„์ง“๊ธฐ์™€ ์‚ฌ์šฉํ•˜๊ธฐ (Naming Conventions)

์ด ์žฅ์—์„œ๋Š” ๋ฌธ์žฅ์„ ์ด๋ฃจ๋Š” ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ๋‹จ์œ„์ธ ํ‚ค์›Œ๋“œ์™€ ์‹๋ณ„์ž ์ด๋ฆ„์— ๋Œ€ํ•œ ๋‚ด์šฉ์„ ๋‹ค๋ฃฌ๋‹ค.

๋ช…๋ช… ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

์‹๋ณ„์ž(identifier)๋Š” ๋ฌธ์žฅ๋‚ด์—์„œ ์œ ์ผํ•˜๊ฒŒ ๊ตฌ๋ณ„๋˜์–ด ์ธ์‹๋˜๋Š” ์ด๋ฆ„์œผ๋กœ ์•„๋ž˜ ํ•ญ๋ชฉ๋“ค์„ ํฌํ•จํ•œ๋‹ค.

  1. ์–ธ์–ด์˜ ๋ฌธ๋ฒ•๊ตฌ์กฐ๋ฅผ ๊ธฐ์ˆ ํ•˜๋Š” ํ† ํฐ(token) - ex. SELECT, JOIN, FROM, ...
  2. ํ…Œ์ด๋ธ”, ์ปฌ๋Ÿผ์˜ ๋ช…์นญ์ด๋‚˜ ๋ณ„์นญ(alias)
  3. ๊ธฐ๋ณธ ์ œ๊ณต ํ•จ์ˆ˜ ๋˜๋Š” ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ ์ด๋ฆ„ - ex. SUM(), STRPOS(), ...

ํ‚ค์›Œ๋“œ(keyword) ๋Š” ์˜ˆ์•ฝ๋œ ์‹๋ณ„์ž(reserved identifier) ๋กœ์„œ SQL ๋ช…์„ธ์— ์˜ํ•ด ๊ทธ ์“ฐ์ž„์ด ์ด๋ฏธ ์ •ํ•ด์ ธ ์žˆ๋Š” ์ด๋ฆ„์„ ๋งํ•œ๋‹ค.

๋ณธ ๊ฐ€์ด๋“œ์—์„œ๋Š” ์•„๋ž˜ ์„œ์ˆ ๋œ ๊ด€๋ก€์— ๋”ฐ๋ผ ์ด๋ฆ„์„ ์ •์˜ํ•˜๊ณ  ํ‘œ๊ธฐํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•œ๋‹ค.

  • ํ‚ค์›Œ๋“œ๋Š” ๋Œ€๋ฌธ์ž๋กœ ๊ทธ ์™ธ์˜ ์‹๋ณ„์ž๋Š” ์†Œ๋ฌธ์ž๋กœ ์ž‘์„ฑํ•˜๋„๋ก ํ•œ๋‹ค.
  • ๊ธฐ๋ณธ ์ œ๊ณต ํ•จ์ˆ˜ ํ˜น์€ ๋‚ด์žฅ ํ•จ์ˆ˜(built-in function)๋Š” ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•˜์ง€ ์•Š์œผ๋‚˜ ๋Œ€๋ฌธ์ž๋กœ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.
  • ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜(UDF, user-defined function)๋Š” ์†Œ๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•จ์ˆ˜ ์ด๋ฆ„์„ ์ •์˜ํ•˜๋„๋ก ํ•œ๋‹ค. UDF ์ด๋ฆ„์€ ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•œ๋‹ค.
  • ์—ฌ๋Ÿฌ ๋‹จ์–ด๋กœ ์ด๋ฃจ์–ด์ง„ ์‹๋ณ„์ž๋Š” camelCase๊ฐ€ ์•„๋‹Œ snake_case์˜ ํ˜•ํƒœ๋กœ ์ž‘์„ฑํ† ๋ก ํ•œ๋‹ค.

์œ„์˜ ๊ด€๋ก€์— ๋”ฐ๋ฅธ ์˜ˆ์‹œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT first_name, last_name FROM my_dataset.my_table

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

select firstName, lastName from myDataset.myTable -- camel case ์‚ฌ์šฉ
-- OR
SELECT FIRST_NAME, LAST_NAME FROM MY_DATASET.MY_TABLE

๋‚ด์žฅ ํ•จ์ˆ˜์™€ ๋‹ฌ๋ฆฌ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜์˜ ์ด๋ฆ„์— ์†Œ๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜๋ฅผ ์˜์† ํ•จ์ˆ˜ (persistent function) ๋กœ ์ •์˜ํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐ์ดํ„ฐ์…‹ ์ด๋ฆ„์„ ํฌํ•จํ•œ ํ•œ์ •๋œ ํ•จ์ˆ˜ ์ด๋ฆ„ (qualified function name)์„ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค. ์ด๋•Œ ํ•จ์ˆ˜๋ฅผ ํ•œ์ •์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋Š” ํ”„๋กœ์ ํŠธ ์ด๋ฆ„์ด๋‚˜ ๋ฐ์ดํ„ฐ์…‹์˜ ์ด๋ฆ„์ด ์†Œ๋ฌธ์ž๋กœ ์ž‘์„ฑ๋˜๊ธฐ ๋•Œ๋ฌธ์— ํ•˜๋‚˜์˜ ์˜๋ฏธ ๋‹จ์œ„๋‚ด์—์„œ ๋Œ€์†Œ๋ฌธ์ž๊ฐ€ ํ˜ผ์šฉ๋˜์ง€ ์•Š๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด์„œ์ด๋‹ค.

์ด๋ฅผ ํ†ตํ•ด ์•„๋ž˜์— ์„ค๋ช…ํ•  FQTN (Fully Qualified Table Name) ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๊ทœ์น™๊ณผ์˜ ์ผ๊ด€์„ฑ๋„ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ๋‹ค.

-- ์™„์ „ ํ•œ์ •๋œ ํ•จ์ˆ˜ ์ด๋ฆ„์˜ ์˜ˆ - fully qualified function name
SELECT bqutils.fn.last_day('2021-03-07')

๋Œ€์†Œ๋ฌธ์ž์˜ ๊ตฌ๋ถ„์€ ๋ฌธ์žฅ๋‚ด์—์„œ ๊ฐ ์ด๋ฆ„์ด ์ œ ์—ญํ• ์„ ๋“œ๋Ÿฌ๋‚ด๋„๋ก ํ•˜๋Š”๋ฐ ๋„์›€์„ ์ค€๋‹ค.

๋Œ€๋ฌธ์ž SELECT, FROM ๋“ฑ์˜ ํ‚ค์›Œ๋“œ๋Š” ์ฃผ๋กœ ๋ฌธ์žฅ ๊ตฌ์กฐ์˜ ํ‘œํ˜„์— ์‚ฌ์šฉ๋˜๋Š”๋ฐ ์†Œ๋ฌธ์ž๋กœ ํ‘œ๊ธฐ๋˜๋Š” ๊ฐœ์ฒด(entity)๋‚˜ ์†์„ฑ(attribute)์˜ ์‹๋ณ„์ž ์ด๋ฆ„๊ณผ ์‹œ๊ฐ์ ์œผ๋กœ ๊ตฌ๋ถ„์ด ๋œ๋‹ค. ์ด๋Š” ์ „์ฒด์ ์ธ ๋ฌธ์žฅ ๊ตฌ์กฐ์˜ ํŒŒ์•…์„ ์šฉ์ดํ•˜๊ฒŒ ํ•œ๋‹ค.

๊ธฐํƒ€ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ

  • ํ‚ค์›Œ๋“œ๊ฐ€ ๋ถ€๋“์ดํ•˜๊ฒŒ ์ปฌ๋Ÿผ์˜ ์ด๋ฆ„์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ backtick(`)์„ ์ด์šฉํ•˜์—ฌ ๊ฐ์‹ธ์ค€๋‹ค (quoted identifiers).
-- from ์˜ˆ์•ฝ์–ด๊ฐ€ ์ปฌ๋Ÿผ๋ช…์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ
SELECT 'asia-northeast3' AS `from` FROM ...
  • ๋ฌธ์ž์—ด ๋ฆฌํ„ฐ๋Ÿด(string literal)์€ ๋‹จ์ผ๋”ฐ์˜ดํ‘œ(')๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.
  • ์ƒ์ˆ˜(constant)๊ฐ’์„ ๊ฐ–๋Š” ๋ณ€์ˆ˜ ์ด๋ฆ„์€ ๋Œ€๋ฌธ์ž๋กœ ํ•œ๋‹ค.
DECLARE START_DATE DATE DEFAULT '2021-02-19'

์ฐธ๊ณ  (references)


์ปฌ๋Ÿผ ์ด๋ฆ„ ์ง“๊ธฐ์™€ ์‚ฌ์šฉํ•˜๊ธฐ (Column Naming)

์ปฌ๋Ÿผ ์ด๋ฆ„ ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

์ปฌ๋Ÿผ์ด๋ฆ„์€ camelCase๊ฐ€ ์•„๋‹Œ ์†Œ๋ฌธ์ž๋ฅผ ์ด์šฉํ•˜์—ฌ snake_case ๋กœ ํ‘œํ˜„ํ•˜๋ฉฐ ํ—๊ฐ€๋ฆฌ์–ธ ํ‘œ๊ธฐ ๋‚˜ ์˜๋ฏธ๋ฅผ ์œ ์ถ”ํ•˜๊ธฐ ํž˜๋“  ์ง€๋‚˜์น˜๊ฒŒ ์ถ•์•ฝ๋œ ์ ‘๋‘์–ด(prefix), ์ ‘๋ฏธ์–ด(suffix)์˜ ์‚ฌ์šฉ์€ ์ž์ œํ•œ๋‹ค.

๊ธฐ์ค€์ •๋ณด์™€ ๊ฐ™์ด ํŒ€๋‚ด์—์„œ ํ‘œ์ค€์œผ๋กœ ๊ด€๋ฆฌ๋˜๋Š” ์†์„ฑ๋“ค์ด ์žˆ๋Š” ๊ฒฝ์šฐ์—๋Š” ํ•ด๋‹น ํ‘œ์ค€์—์„œ ์ •์˜๋œ ๋ช…๋ช…๊ทœ์น™์„ ์šฐ์„ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๊ตญ๊ฐ€์ฝ”๋“œ๋“ฑ ๊ธฐ์ค€์ •๋ณด๋กœ ์‚ฌ์šฉ๋˜๋Š” ์ปฌ๋Ÿผ์˜ ์ด๋ฆ„์€ ํ‘œ์ค€์—์„œ ์ •์˜๋œ ๋ช…๋ช… ๊ทœ์น™์„ ์ฐจ์šฉํ•ด์„œ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค.

SELECT cnty_cd, mcc, mnc, csc, FROM ...

๋ฐฐ์—ดํ˜•์ด๋‚˜ ๋ณต์ˆ˜์˜ ์˜๋ฏธ๋ฅผ ์ง€๋‹ˆ๋Š” ์ปฌ๋Ÿผ์ธ ๊ฒฝ์šฐ๋Š” ์ด๋ฆ„์„ ๋ณต์ˆ˜ํ˜•์œผ๋กœ ํ•  ์ˆ˜ ์žˆ๋‹ค.

SELECT hits,          -- ARRAY<STRUCT<>> ํ˜•
       totals.visits, -- INT64ํ˜•, ์ด ์„ธ์…˜ ํšŸ์ˆ˜
       totals.hits,   -- INT64ํ˜•, ์„ธ์…˜๋‚ด ์กฐํšŒ์ˆ˜
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` 
;

์ ‘๋‘์–ด์™€ ์ ‘๋ฏธ์–ด (prefix, suffix)

์ ‘๋‘์–ด๋‚˜ ์ ‘๋ฏธ์–ด๋Š” ์ด๋ฆ„์— ๋ถ€๊ฐ€์ ์ธ ์˜๋ฏธ(semantics)๋ฅผ ๋ถ€์—ฌํ•˜๊ธฐ ์œ„ํ•œ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‚˜ ์ง€๋‚˜์น˜๊ฒŒ ์ถ•์•ฝ๋œ ์•ฝ์–ด๋‚˜ ๊ณผ๋„ํ•œ ์‚ฌ์šฉ์€ ์˜คํžˆ๋ ค ๊ฐ€๋…์„ฑ์„ ๋–จ์–ด๋œจ๋ฆด ์ˆ˜ ์žˆ์–ด ์‚ฌ์šฉ์— ์‹ ์ค‘ํ•ด์•ผ ํ•œ๋‹ค.

์•„๋ž˜๋Š” ๋ช‡๊ฐ€์ง€ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์˜ˆ์‹œ์ด๋‹ค.

  • Boolean ํ˜•์˜ ๊ฒฝ์šฐ is_, has_์™€ ๊ฐ™์€ ์ ‘๋‘์–ด๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
  • Dateํ˜•์€ _dt ์ ‘๋ฏธ์–ด๋ฅผ ๋ถ™์ผ ์ˆ˜ ์žˆ๋‹ค.
  • DateTime์ด๋‚˜ Timestampํ˜•์€ _at ์ ‘๋ฏธ์–ด๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค.
  • user_id, device_id ์ฒ˜๋Ÿผ ์‹๋ณ„์ž(id) ์ ‘๋ฏธ์–ด์˜ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
SELECT is_active,
       registered_dt,
       updated_at,
  FROM ...       

์ปฌ๋Ÿผ ๋ณ„์นญ (column_alias)

์ปฌ๋Ÿผ์˜ ๋ณ„์นญ ์ด๋ฆ„์€ ์ปฌ๋Ÿผ ์ด๋ฆ„ ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ์— ์ค€ํ•˜์—ฌ ์ž‘์„ฑํ•œ๋‹ค. ๋ณ„์นญ ์•ž์˜ AS ํ‚ค์›Œ๋“œ ๋Š” ์ƒ๋žต ๊ฐ€๋Šฅํ•˜๋‚˜ ๋ณธ ๊ฐ€์ด๋“œ์—์„œ๋Š” ๋ช…์‹œ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.

๋ณ„์นญ์€ ๋ชฉ์ ์— ๋”ฐ๋ผ์„œ ์ปฌ๋Ÿผ ์ด๋ฆ„๋ณด๋‹ค ์ƒ์„ธํžˆ ์ž‘์„ฑ๋˜๋Š” ๊ฒฝ์šฐ๋„ ์žˆ๊ณ  ๋ฐ˜๋Œ€๋กœ ๊ฐ„๋žตํ•˜๊ฒŒ ์ถ•์•ฝ๋œ ํ˜•ํƒœ๋กœ ์‚ฌ์šฉํ•˜๊ธฐ๋„ ํ•œ๋‹ค.

์•„๋ž˜์™€ ๊ฐ™์ด ์ปฌ๋Ÿผ์˜ ์˜๋ฏธ๋ฅผ ๋ช…ํ™•ํ•˜ ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๊ธด ์ด๋ฆ„์˜ ๋ณ„์นญ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

SELECT fname AS first_name,
       lname AS last_name,  
  FROM ...

๋ฐ˜๋ฉด์— ์ปฌ๋Ÿผ ์ด๋ฆ„์ด ์œ ๋„๋œ ์ปฌ๋Ÿผ(derived column) ๋˜๋Š” ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ(calculated column)์„ ๋งŒ๋“ค์–ด ๋‚ด๋Š” ์—ฐ์‚ฐ์‹์—์„œ ๋ฐ˜๋ณต์ ์œผ๋กœ ์‚ฌ์šฉ๋˜์–ด์ง€๋Š” ๊ฒฝ์šฐ๋Š” ๊ฐ„๊ฒฐํ•จ์„ ์œ„ํ•ด์„œ ์ถ•์•ฝ๋œ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ๋„ ์žˆ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

WITH locations AS (
  SELECT x_position,
         y_position,
         city_centre_x_position,
         city_center_y_position,
    FROM ...
)
SELECT SQRT(
         (city_center_x_position - x_position) * (city_center_x_position - x_position) +
         (city_center_y_position - y_position) * (city_center_y_position - y_position)
       ) AS distince,  -- calculated(or derived) column
  FROM locations

์œ„์˜ ์˜ˆ์‹œ๋Š” ํ…Œ์ด๋ธ”์ด ๊ฐ€์ง€๋Š” ๊ธฐ๋ณธ ์ปฌ๋Ÿผ (base column) ์œผ๋กœ๋ถ€ํ„ฐ ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ํŒŒ์ƒ์‹œํ‚ค๋Š” ์—ฐ์‚ฐ์‹์— ์ปฌ๋Ÿผ ์ด๋ฆ„์ด ๋ฐ˜๋ณต์ ์œผ๋กœ ๋“ฑ์žฅํ•˜์—ฌ ๋ผ์ธ์ด ๊ธธ์–ด์ง€๊ณ  ์žˆ๋‹ค. ์ด๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ์ถ•์•ฝ๋œ ๋ณ„์นญ์„ ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜๋ฉด ์—ฐ์‚ฐ์‹์ด ๊ฐ„๊ฒฐํ•˜๊ฒŒ ํ‘œํ˜„๋œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

WITH location AS (
  SELECT x_position AS x,
         y_position AS y,
         city_centre_x_position AS cx,
         city_center_y_position AS cy,
    FROM ...
)
SELECT SQRT((cx - x) * (cx - x) + (cy - y) * (cy - y)) AS distince,
  FROM location

ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์ง“๊ธฐ์™€ ์‚ฌ์šฉํ•˜๊ธฐ (Table Naming)

ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ ์†Œ๋ฌธ์ž ์‚ฌ์šฉ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•˜๋˜ ์˜๊ตฌ ํ…Œ์ด๋ธ”(permanent table)์— ๋Œ€ํ•œ ํŒ€๋‚ด ๋ช…๋ช… ๊ทœ์น™์ด ๋ณ„๋„๋กœ ์ •ํ•ด์ ธ ์žˆ๋Š” ๊ฒฝ์šฐ ๊ทธ๊ฒƒ์„ ์šฐ์„ ํ•œ๋‹ค.

๋‹ค๋งŒ, ์ž„์‹œํ…Œ์ด๋ธ”(temporary table)์ด๋‚˜ ๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹(CTEs) ์ด๋ฆ„์˜ ๊ฒฝ์šฐ๋Š” ์†Œ๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค. ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•œ๋‹ค.

  • ์—ฌ๋ž˜ ๊ฐœ์˜ ๋‹จ์–ด๋ฅผ ์กฐํ•ฉํ•˜์—ฌ ์ด๋ฆ„๋ฅผ ์ง€์„ ๊ฒฝ์šฐ dash(-)๊ฐ€ ์•„๋‹Œ underscore(_)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ snake_case ์ด๋ฆ„์ด ๋˜๋„๋ก ํ•œ๋‹ค.
  • ๊ณผ์ œ๋ณ„ ๋ช…๋ช…๊ทœ์น™์— ์˜ํ•ด ์ด๋ฏธ ์ƒ์„ฑ๋˜์–ด ์šด์˜๋˜๊ณ  ์žˆ๋Š” ๋ฐ์ดํ„ฐ์…‹๊ณผ ํ…Œ์ด๋ธ” ์ด๋ฆ„๋“ค์€ ๊ฐ€์ด๋“œ์˜ ๊ถŒ๊ณ ๋ฅผ ๋”ฐ๋ฅด์ง€ ์•Š๋”๋ผ๋„ As-Is ์ด๋ฆ„์„ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ† ๋ก ํ•œ๋‹ค.

โ–  Fully Qualified Table Name (FQTN)

'์™„์ „ํ•˜๊ฒŒ ํ•œ์ •๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๋˜๋Š” ์ •๊ทœํ™”๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„'์˜ ์˜๋ฏธ๋Š” ๊ทธ ์ด๋ฆ„์„ ๊ฐ€์ง€๊ณ  ์‹คํ–‰์ปจํ…์ŠคํŠธ์™€ ์ƒ๊ด€์—†์ด ์œ ์ผํ•˜๊ฒŒ ํ…Œ์ด๋ธ”๋ฅผ ํŠน์ •ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๋ง์ด๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, customer.devices ๋ผ๋Š” ์ผ๋ถ€ ํ•œ์ •๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰์‹œํ‚ค๋Š” ํ”„๋กœ์ ํŠธ์— ๋”ฐ๋ผ์„œ ๊ฐ๊ธฐ ๋‹ค๋ฅธ customer.devices ํ…Œ์ด๋ธ”์— ์ ‘๊ทผ์ด ๋  ์ˆ˜ ์žˆ๋‹ค.

BigQuery๋กœ DWH(Data Ware House)๋ฅผ ๊ตฌ์ถ•ํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐ์ดํ„ฐ๊ฐ€ ์ €์žฅ๋œ ํ”„๋กœ์ ํŠธ์™€ BigQuery Job์„ ์‹คํ–‰์‹œ์ผœ์ฃผ๋Š” ํ”„๋กœ์ ํŠธ๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‹คํ–‰์ปจํ…์ŠคํŠธ์˜ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๋„๋ก ์ •๊ทœํ™”๋œ ํ˜•ํƒœ์˜ ํ…Œ์ด๋ธ” ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค. ํ…Œ์ด๋ธ” ์ด๋ฆ„์— ํ”„๋กœ์ ํŠธ ๋ช…์„ ์ƒ๋žตํ•˜๋Š” ๊ฒฝ์šฐ BigQuery Job ์ด ์ˆ˜ํ–‰๋˜๊ณ  ์žˆ๋Š” ํ”„๋กœ์ ํŠธ ์ด๋ฆ„์„ ์•”๋ฌต์ ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค.

์ด์™ธ์—, ๋‹จ์ผ ๋ฌธ์žฅ(single statement)์˜ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ์•„๋‹Œ BigQuery ์Šคํฌ๋ฆฝํŠธ(script)๋กœ ๋ณต์ˆ˜์˜ ๋ฌธ์žฅ(multiple statements)์„ ์‹คํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” GCP Project ์ด๋ฆ„์— ํฌํ•จ๋œ dash(-)๊ฐ€ ๋บ„์…ˆ ๋“ฑ์˜ ๋‹ค๋ฅธ ์˜๋ฏธ๋กœ ํ•ด์„์ด ๋˜์ง€ ์•Š๋„๋ก ์ •๊ทœํ™”๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์„ backtick(`)์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ์‹ธ์ค€๋‹ค.

์ด๋•Œ dash๊ฐ€ ํฌํ•จ๋œ ํ”„๋กœ์ ํŠธ ์ด๋ฆ„๋งŒ backtick์œผ๋กœ ๊ฐ์‹ธ์ฃผ์–ด๋„ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€๋Š” ์•Š์œผ๋‚˜ FQTN ์ „์ฒด๋ฅผ backtick์„ ๊ฐ์‹ธ์ฃผ์–ด ์˜๋ฏธ์ ์œผ๋กœ ํ•˜๋‚˜์˜ ๋‹จ์œ„์ž„์„ ๋ช…์‹œ์ ์œผ๋กœ ๋‚˜ํƒ€๋‚ด๋„๋ก ํ•œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT * FROM `bigquery-public-data`.austin_bikeshare.bikeshare_stations

๋ณ„ํ‘œ(*)๋ฅผ ํฌํ•จํ•˜๋Š” ์™€์ผ๋“œ์นด๋“œ(wildcard) ํ…Œ์ด๋ธ” ์ด๋ฆ„๋„ backtick(`)์œผ๋กœ ๊ฐ์‹ธ์ฃผ์–ด์•ผ ํ•œ๋‹ค.

/* Valid standard SQL query with wildcard table name*/
SELECT max
  FROM `bigquery-public-data.noaa_gsod.gsod*`
 WHERE max != 9999.9 # code for missing data
   AND _TABLE_SUFFIX = '1929'
 ORDER BY max DESC
;

๋ฌธ์žฅ ๊ตฌ์กฐํ™” ํ•˜๊ธฐ (Statement Structure)

SQL ๋ฌธ๋ฒ•์— ์ •์˜๋œ ํ‚ค์›Œ๋“œ๋“ค๊ณผ ์•ž์„œ์˜ ๋ช…๋ช… ๊ทœ์น™์— ์˜ํ•ด์„œ ์ง€์–ด์ง„ ์ด๋ฆ„๋“ค์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๊ตฌ์ ˆ์„ ๊ธฐ์ˆ ํ•˜๊ณ  ์ด๋ฅผ ๋ธ”๋Ÿญ์ฒ˜๋Ÿผ ์Œ“์•„ ๋ฌธ์žฅ์œผ๋กœ ๊ตฌ์กฐํ™”ํ•ด ๋ณด๋„๋ก ํ•œ๋‹ค.

SELECT ์ปฌ๋Ÿผ ๋ฆฌ์ŠคํŠธ

SQL์˜ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ SELECT ๋ฌธ์žฅ์˜ ์ปฌ๋Ÿผ ๋ฆฌ์ŠคํŠธ๋Š” ์•„๋ž˜์˜ ์ง€์นจ์— ๋”ฐ๋ผ ์ž‘์„ฑํ•œ๋‹ค.

  • SELECT * ๋กœ ์ „์ฒด ์ปฌ๋Ÿผ์„ ๊ฐ€์ ธ์˜ค๊ธฐ ๋ณด๋‹ค๋Š” ํ•„์š”ํ•œ ์ปฌ๋Ÿผ๋งŒ์„ SELECT ๋ฆฌ์ŠคํŠธ์— ์—ด๊ฑฐํ•˜๋„๋ก ํ•œ๋‹ค.
  • SELECT ๋ฆฌ์ŠคํŠธ์— ๋“ฑ์žฅํ•˜๋Š” ์ปฌ๋Ÿผ์ด ๋ณต์ˆ˜๊ฐœ์ผ ๊ฒฝ์šฐ ํ•œ ๋ผ์ธ์— ํ•˜๋‚˜์”ฉ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.
SELECT station_id,
       name,
       status,
       latitude,
       longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  • ์˜๋ฏธ์ ์œผ๋กœ ๊ด€๋ จ์ด ์žˆ๊ฑฐ๋‚˜ ์‹œ์•ผ ๋ฒ”์œ„๋‚ด์—์„œ ํ•œ ๋ˆˆ์— ์ฝํžˆ๊ณ  ์ดํ•ด๋  ์ˆ˜ ์žˆ๋‹ค๋ฉด 80์ž๋ฅผ ๋„˜์ง€ ์•Š๋Š” ์„ ์—์„œ ํ•˜๋‚˜์˜ ๋ผ์ธ์— ๋ณต์ˆ˜์˜ ์ปฌ๋Ÿผ๋“ค์„ ๊ธฐ์ˆ ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ์ฃผ์„์„ ์‚ฌ์šฉํ•˜์—ฌ ์ปฌ๋Ÿผ์˜ ์˜๋ฏธ๋ฅผ ๋ถ€์—ฐ ์„ค๋ช…ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ๋ช…์˜ ๋’ค์— ์ฝค๋งˆ(,)๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค.
SELECT station_id,
       name,
       status,
       latitude, longitude,  -- ํ•˜๋‚˜์˜ ์ปฌ๋Ÿผ ๋ญ‰์น˜(Column Family)๋กœ ์˜๋ฏธ์ ์œผ๋กœ ๋ฌถ์ž„
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
;
-- ๋˜๋Š”
SELECT station_id, name, status, latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
;
  • ์ปฌ๋Ÿผ ๋ณ„์นญ(alias)์„ ํฌํ•จํ•˜๋Š” ๊ฒฝ์šฐ์—๋„ ์œ„์˜ ๋‚ด์šฉ๋“ค์ด ๋™์ผํ•˜๊ฒŒ ์ ์šฉ๋œ๋‹ค.
  • ์ปฌ๋Ÿผ ๋ณ„์นญ์€ AS ํ‚ค์›Œ๋“œ ๋‹ค์Œ์— ์œ„์น˜์‹œํ‚ค๋ฉฐ ๋ณ„์นญ์˜ ์‹œ์ž‘ ์œ„์น˜๋Š” ๋”ฐ๋กœ ์ •๋ ฌ์‹œํ‚ค์ง€ ์•Š๋Š”๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT station_id AS station_id,
       name AS station_name,
       status AS bike_status,
       latitude,
       longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT station_id         AS station_id,
       name               AS station_name,
       status             AS bike_status,
       latitude,
       longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

์•ž์„  ์‰ผํ‘œ์™€ ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ (leading comma vs. trailing comma)

BigQuery์—์„œ๋Š” SELECT ๋ฆฌ์ŠคํŠธ์˜ ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ ๋’ค์— ์ฝค๋งˆ(,)๋ฅผ ํ—ˆ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ณธ ๊ฐ€์ด๋“œ์—์„œ๋Š” ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ recommend

SELECT station_id,
       name,
       status,
       latitude,
       longitude,  -- ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ ๋’ค ์ฝค๋งˆ(,) ์ถ”๊ฐ€
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๐Ÿ˜ž ๊ธฐํ”ผ avoid

SELECT station_id
     , name
     , status
     , latitude
     , longitude
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์ง€์›ํ•˜์ง€ ์•Š๋Š” SQL ์–ธ์–ด์—์„œ๋Š” ์•ž์„  ์‰ผํ‘œ๋ฅผ ์‚ฌ์šฉํ•ด์„œ SELECT ๋ฆฌ์ŠคํŠธ์— ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด๋„ ์ง์ „ ๋ผ์ธ์— ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š๋„๋ก ํ•˜์˜€์œผ๋‚˜ BigQuery์—์„œ๋Š” ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์ง€์›ํ•˜๋ฏ€๋กœ ์ด๋ฅผ ๊ธฐ๋ณธ ์Šคํƒ€์ผ๋กœ ์‚ผ๋Š”๋‹ค.

-- ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ ์‚ฌ์šฉ
SELECT station_id,
       name
  FROM ...

-- ์ปฌ๋Ÿผ ์ถ”๊ฐ€์‹œ
SELECT station_id,
       name,    -- ์•„๋ž˜ ๋•Œ๋ฌธ์— comma๊ฐ€ ์ถ”๊ฐ€๋จ.
       status   -- ์‹ ๊ทœ์ปฌ๋Ÿผ ์ถ”๊ฐ€
  FROM ...
-- ์•ž์„  ์‰ผํ‘œ ์‚ฌ์šฉ
SELECT station_id
     , name
  FROM ...

-- ์ปฌ๋Ÿผ ์ถ”๊ฐ€์‹œ
SELECT station_id
     , name     -- ์•„๋ž˜ ์ปฌ๋Ÿผ ์ถ”๊ฐ€๋กœ ๋ณ€๊ฒฝ๋˜๋Š” ๋‚ด์šฉ์ด ์—†์Œ
     , status   -- ์‹ ๊ทœ์ปฌ๋Ÿผ ์ถ”๊ฐ€
  FROM ...

์ด์™ธ์— ๋ฒ”์šฉ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด๋‚˜ ๋™์ (dynamic) SQL์„ ์‚ฌ์šฉํ•˜์—ฌ SELECT ๋ฌธ์˜ ์ปฌ๋Ÿผ ๋ฆฌ์ŠคํŠธ๋ฅผ ์ž๋™ ์ƒ์„ฑํ•  ๋•Œ ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ๋ฆฌ์ŠคํŠธ์˜ ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ์— ์‰ผํ‘œ๊ฐ€ ์ƒ๋žต๋˜๋„๋ก ์˜ˆ์™ธ ์ฒ˜๋ฆฌํ•  ํ•„์š”๊ฐ€ ์—†์–ด ๊ตฌํ˜„ ๋กœ์ง์ด ๋‹จ์ˆœํ•ด์ง€๋Š” ์žฅ์ ์ด ์žˆ๋‹ค.

์ปฌ๋Ÿผ ์ˆœ์„œ ๊ด€๋ก€ (column order conventions)

  • ๊ธฐ๋ณธํ‚ค(PK, Primary Key)์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ๋ช…์„ ๋จผ์ € ์œ„์น˜์‹œํ‚จ๋‹ค.
  • (optional) ํŒŒํ‹ฐ์…˜๋œ ํ…Œ์ด๋ธ” ์‚ฌ์šฉ์‹œ ํŒŒํ‹ฐ์…˜ ์ปฌ๋Ÿผ๋ช…์€ ๋งˆ์ง€๋ง‰์— ์œ„์น˜์‹œํ‚จ๋‹ค. ์šด์˜์ค‘์— ์Šคํ‚ค๋งˆ ๋ณ€๊ฒฝ์œผ๋กœ ์ธํ•œ ์ปฌ๋Ÿผ ์ถ”๊ฐ€์‹œ๋Š” ์˜ˆ์™ธ๊ฐ€ ๋œ๋‹ค.
  • GROUP BY ๋ฅผ ๋™๋ฐ˜ํ•œ ์ง‘๊ณ„ ์ฟผ๋ฆฌ์—์„œ๋Š” ์ฐจ์›(dimension)์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ๋ช…์„ ๋จผ์ € ๋‚˜์—ดํ•œ ํ›„ ์ง€ํ‘œ(metric)์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ๋ช…์„ ๊ธฐ์ˆ ํ•œ๋‹ค.

FROM ์ ˆ

FROM ์ ˆ์—๋Š” ํ…Œ์ด๋ธ” ๋˜๋Š” ์ด์— ์ƒ์‘ํ•˜๋Š” ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ์œ„์น˜ํ•˜๊ฒŒ ๋œ๋‹ค. ์„œ๋ธŒ์ฟผ๋ฆฌ์— ๋Œ€ํ•œ ๋Œ€ํ•œ ์Šคํƒ€์ผ์€ ๋‹ค์Œ ์žฅ์—์„œ ๋‹ค์‹œ ์–ธ๊ธ‰ํ•˜๋„๋ก ํ•œ๋‹ค.

  • FROM ์ ˆ์€ SELECT ๋ฆฌ์ŠคํŠธ์˜ ๋‹ค์Œ ๋ผ์ธ์— ์œ„์น˜ํ•˜๋ฉฐ ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ FROM ํ‚ค์›Œ๋“œ ๋‹ค์Œ์— ์ด์–ด์„œ ๊ธฐ์ˆ ํ•œ๋‹ค.
  • FROM ํ‚ค์›Œ๋“œ๋Š” SELECT ํ‚ค์›Œ๋“œ์— ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ(right-aligned)์ด ๋˜๋„๋ก ์œ„์น˜์‹œํ‚จ๋‹ค.

์˜๊ตฌ ํ…Œ์ด๋ธ” ์ด๋ฆ„๊ณผ ์ž„์‹œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ (permanent and temporary table name)

  • FROM ํ‚ค์›Œ๋“œ ๋‹ค์Œ์— ๋“ฑ์žฅํ•˜๋Š” ํ…Œ์ด๋ธ” ์ด๋ฆ„์œผ๋กœ๋Š” ์˜๊ตฌ ํ…Œ์ด๋ธ”(permanent table)๊ณผ ์ž„์‹œ ํ…Œ์ด๋ธ”(temporary table) ์ด๋ฆ„ ๋˜๋Š” CTE(common table expression) ๊ตฌ๋ฌธ์— ์˜ํ•ด ์ •์˜๋œ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ์˜๊ตฌ ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ FQTN ํ˜•์‹์„ ์‚ฌ์šฉํ•˜๋ฉฐ ์ž„์‹œ ํ…Œ์ด๋ธ”์ด๋‚˜ CTE ์ด๋ฆ„์€ snake_case ํ˜•์‹์„ ์‚ฌ์šฉํ•œ๋‹ค.
  • ์ž„์‹œ ํ…Œ์ด๋ธ”์ด๋‚˜ CTE ์ด๋ฆ„์€ ํ…Œ์ด๋ธ”์˜ ์‚ฌ์šฉ๋ชฉ์ ์ด ๋ช…ํ™•ํžˆ ๋“ค์–ด๋‚˜๋„๋ก ์ƒ์„ธํ•˜๊ฒŒ ์ž‘์„ฑํ•œ๋‹ค.
-- Permanent Table 
SELECT * 
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`;
-- Temporara Table (ex. CTAS ๊ตฌ๋ฌธ์œผ๋กœ ์ž„์‹œํ…Œ์ด๋ธ” ์ƒ์„ฑ)
CREATE TEMP TABLE tmp_table AS SELECT ... ;

SELECT * FROM tmp_table;
-- CTEs - named subquery
WITH cte_name AS ( -- snake_case ๊ทœ์น™์— ๋”ฐ๋ผ ๊ธฐ์ˆ 
  SELECT ... -- ์™ผ์ชฝ ๋‘์นธ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ํ•œ๋‹ค.
    FROM ... 
)
SELECT * FROM cte_name;

์„œ๋ธŒ์ฟผ๋ฆฌ (subqueries)

ํ…Œ์ด๋ธ” ์ด๋ฆ„์ด๋‚˜ CTE ์ด๋ฆ„์ด ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ FROM ์ ˆ์„ ๊ตฌ์„ฑํ•  ๊ฒฝ์šฐ์— ์•„๋ž˜์™€ ๊ฐ™์ด ๋‹ค์–‘ํ•œ ์Šคํƒ€์ผ์ด ๊ฐ€๋Šฅํ•˜๋‹ค. ์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” from_style_#2 ๋ฅผ ์‚ฌ์šฉํ•œ ๊ตฌ์„ฑ์„ ๊ถŒ์žฅํ•œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #1 (from_style_#1)
SELECT *
  FROM (SELECT station_id,
               name,
               status
          FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`)

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #2 (from_style_#2)
SELECT *
  FROM (
    SELECT station_id, -- FROM ํ‚ค์›Œ๋“œ๋กœ๋ถ€ํ„ฐ ๋‘ ์นธ ๋“ค์—ฌ์“ฐ๊ธฐ
           name,
           status,
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  )

๋ณธ ๊ฐ€์ด๋“œ์—์„œ from_style_#2 ๋ฅผ ์„ ํ˜ธํ•˜๋Š” ์ด์œ ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค. ์šฐ์„  ์œ„์˜ ์ฟผ๋ฆฌ๋ฅผ JOIN์„ ์ด์šฉํ•˜์—ฌ ํ™•์žฅํ•ด ๋ณด๋„๋ก ํ•˜์ž.

/* JOIN์„ ์ด์šฉํ•˜์—ฌ ์ฟผ๋ฆฌ๋ฅผ ํ™•์žฅํ•œ ์˜ˆ์‹œ */
-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #1 (from_style_#1)
SELECT *
  FROM (SELECT station_id,
               name,
               status
          FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`) AS a
  LEFT JOIN (SELECT station_id,
                    name,
                    status
               FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`) AS b
    ON a.station_id = b.station_id
;
-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #2 (from_style_#2)
SELECT *
  FROM (
    SELECT station_id,
           name,
           status,
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  ) AS a
  LEFT JOIN (
    SELECT station_id,
           name,
           status,
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  ) AS b
    ON a.station_id = b.station_id
;

from_style_#1 ์€ ๋™์ผํ•œ ์ˆ˜์ค€(level or depth)์— ์œ„์น˜ํ•œ ๋‘ ์„œ๋ธŒ์ฟผ๋ฆฌ a, b์˜ ๋“ค์—ฌ์“ฐ๊ธฐ๊ฐ€ ์„œ๋กœ ๋‹ฌ๋ผ์ ธ ์žˆ๋‹ค. ๋ฐ˜๋ฉด์— from_style_#2 ์€ ๋‘ ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ๋™์ผํ•œ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ ๋™์‹œ์— ํ…Œ์ด๋ธ” ๋ณ„์นญ(alias)์„ ๋ผ์ธ์˜ ๋์ด ์•„๋‹Œ ์•ž์ชฝ์— ์œ„์น˜ํ•จ์œผ๋กœ์จ ON์ ˆ์—์„œ ์‹œ์„ ์„ ํฌ๊ฒŒ ์ด๋™์‹œํ‚ค์ง€ ์•Š๊ณ ์„œ๋„ ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค.

WHERE ์ ˆ

WHERE ์ ˆ์—๋Š” ๊ฒฐ๊ณผ์…‹(result set)์— ๋‚จ๊ฒจ์งˆ ์—ด(row)๋“ค์„ ์„ ํƒํ•  ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•œ๋‹ค. WHERE ์ ˆ์˜ ์กฐ๊ฑด์€ ์•„๋ž˜์˜ ๋‚ด์šฉ์— ๋”ฐ๋ผ ์ž‘์„ฑํ•œ๋‹ค.

  • ๋น„๊ต ์—ฐ์‚ฐ์ž์˜ ์•ž ๋’ค๋กœ ๊ณต๋ฐฑ์„ ๋‘๋„๋ก ํ•œ๋‹ค.
  • ๋น„๋™๋“ฑ์—ฐ์‚ฐ์ž๋Š” !=์™€ <> ๋ชจ๋‘ ์‚ฌ์šฉ๊ฐ€๋Šฅํ•˜๋‹ค. SQL-1999์—์„œ๋Š” <>๋ฅผ ํ‘œ์ค€์œผ๋กœ ์ •์˜ํ•˜๊ณ  ์žˆ๋‹ค.
  • ์กฐ๊ฑด์ ˆ์€ AND์™€ OR ๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž์— ์˜ํ•ด ๋ณต์ˆ˜๊ฐœ๊ฐ€ ์—ฐ๊ฒฐ(chaining)๋  ์ˆ˜ ์žˆ๋‹ค.
  • ๋ณต์ˆ˜๊ฐœ์˜ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•˜๋Š” ๊ฒฝ์šฐ๋Š” ํ•˜๋‚˜์˜ ๋ผ์ธ์— ํ•˜๋‚˜์˜ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•˜๋˜ ์—ฐ๊ด€๋œ ์กฐ๊ฑด์˜ ๊ฒฝ์šฐ์—๋Š” 80์ž๋ฅผ ๋„˜์ง€ ์•Š๋Š” ์„ ์—์„œ ํ•˜๋‚˜์˜ ๋ผ์ธ์— ๊ฐ™์ด ์ž‘์„ฑ ๊ฐ€๋Šฅํ•˜๋‹ค.
  • AND์™€ OR ๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž๋Š” WHERE ํ‚ค์›Œ๋“œ์— ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ์ด ๋˜๋„๋ก ํ•œ๋‹ค.
  • BETWEEN์ด๋‚˜ IN ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ณต์ˆ˜๊ฐœ์˜ ์กฐ๊ฑด์„ ๋ฌถ์„ ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ๋Š” ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์กฐ๊ฑด์„ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ํ‘œํ˜„ํ•œ๋‹ค.
SELECT station_id,
       name,
       status,
       latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
   AND latitude > 135.0   -- AND ๋กœ ์กฐ๊ฑด ์—ฐ๊ฒฐ 
;

๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž์˜ ํ˜ผ์šฉ (mixed logical operators)

ํ…Œ์ด๋ธ” ์กฐํšŒ์‹œ AND์™€ OR๋ฅผ ํ˜ผ์šฉํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์•„๋ž˜์˜ ๊ฐ€์ด๋“œ์— ๋”ฐ๋ผ ์ฝ”๋“œ๋ฅผ ๊ตฌ์กฐํ™”ํ•œ๋‹ค.

  1. ์ฃผ์–ด์ง„ ํ…Œ์ด๋ธ”์—์„œ AND ์กฐ๊ฑด์œผ๋กœ ๊ฒฐ๊ณผ์…‹(result set)์„ ์ค„์—ฌ๋‚˜๊ฐ€๋Š”์ง€ ๋ฐ˜๋Œ€๋กœ OR ์กฐ๊ฑด์œผ๋กœ ๋Œ€์ƒ์„ ๋Š˜๋ ค๋‚˜๊ฐ€๋Š”์ง€๋ฅผ ๋ณด๊ณ  ์ตœ์ƒ์œ„ ์กฐ๊ฑด์ ˆ์˜ ๊ตฌ์กฐ๋ฅผ ๋จผ์ € ํŒŒ์•…ํ•œ๋‹ค.
  2. ๊ฐ ์ƒ์œ„ ์กฐ๊ฑด๋“ค์ด ๋‹ค์‹œ ์„ธ๋ถ€์ ์ธ ์กฐ๊ฑด์œผ๋กœ ๋‚˜๋‰˜๋Š” ๊ฒฝ์šฐ์— ๊ด„ํ˜ธ๋ฅผ ์ด์šฉํ•˜์—ฌ ํ•ด๋‹น ์„ธ๋ถ€์กฐ๊ฑด์„ ๋ฌถ๋Š”๋‹ค. ๊ด„ํ˜ธ๋ฅผ ํ†ตํ•ด์„œ ์—ฐ์‚ฐ์ž ์šฐ์„ ์ˆœ์œ„์— ๋”ฐ๋ฅธ ๋ถ€์ž‘์šฉ(side effect)์„ ์—†์•ค๋‹ค.
  3. ์„ธ๋ถ€ ์กฐ๊ฑด์— ๋“ฑ์žฅํ•˜๋Š” AND ๋˜๋Š” OR ์—ฐ์‚ฐ์ž๋Š” ๋ผ์ธ์˜ ๋์— ์œ„์น˜์‹œํ‚ค๋„๋ก ํ•œ๋‹ค.
  4. 3 ๋‹จ๊ณ„ ์ด์ƒ์˜ ๊นŠ์ด์—์„œ ๋…ผ๋ฆฌ์—ฐ์‚ฐ์ด ์ด๋ฃจ์–ด์ง€๋Š” ๊ฒฝ์šฐ ๋ถ„๋ฐฐ ๋ฒ•์น™์„ ์ด์šฉํ•˜์—ฌ ๊นŠ์ด๋ฅผ ๋‚ฎ์ถœ ์ˆ˜ ์žˆ๋‹ค.
-- ์ตœ์ƒ์œ„ ์กฐ๊ฑด๋“ค์„ AND๋กœ ๋ฌถ๊ณ  ์„ธ๋ถ€์กฐ๊ฑด์ด OR๋กœ ๋ฌถ์ด๋Š” ๊ฒฝ์šฐ
SELECT station_id,
       name,
       status,
       latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
   AND (latitude > 135.0 OR
        longitude > 80.0) 
   AND ...
;
-- ๋˜๋Š”, ์ตœ์ƒ์œ„ ์กฐ๊ฑด๋“ค์„ OR๋กœ ๋ฌถ๊ณ  ์„ธ๋ถ€์กฐ๊ฑด์„ AND๋กœ ๋ฌถ๋Š” ๊ฒฝ์šฐ
SELECT station_id,
       name,
       status,
       latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
    OR (latitude > 135.0 AND longitude > 80.0) -- ์˜๋ฏธ๋‹จ์œ„๋กœ ๋ฌถ์–ด์„œ ํ•œ ์ค„์— ํ‘œํ˜„
    OR ...    
SELECT *
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
    OR (latitude > 135.0 AND (longitude = 80.0 OR longitude = 90.0))
    OR ...
-- ์œ„์—์„œ์™€ ๊ฐ™์ด 3 ๋‹จ๊ณ„ ์ด์ƒ์˜ ๊นŠ์ด์—์„œ ๋…ผ๋ฆฌ์—ฐ์‚ฐ ์ด๋ฃจ์–ด์ง€๋Š” ๊ฒฝ์šฐ ๋ถ„๋ฐฐ๋ฒ•์น™์„ ์ด์šฉํ•˜์—ฌ ์•„๋ž˜์™€ ๊ฐ™์ด ๊ฒฐ๊ณผ์ ์œผ๋กœ ๋™์น˜์ธ ๋ฌธ์žฅ์œผ๋กœ ํ’€์–ด์“ธ ์ˆ˜ ์žˆ๋‹ค. a ยท (b + c) = (a ยท b) + (a ยท c)
 WHERE status = 'active'
    OR (latitude > 135.0 AND longitude = 80.0)
    OR (latitude > 135.0 AND longitude = 90.0)
    OR ...

GROUP BY ์ ˆ

  • ์ง‘๊ณ„ํ•จ์ˆ˜์— ์˜ํ•ด์„œ ๋งŒ๋“ค์–ด์ง€๋Š” ์ง€ํ‘œ(metric) ์ปฌ๋Ÿผ์€ ๋ณ„์นญ(alias)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ต๋ช…(anonymous)์ด ์•„๋‹Œ ์ด๋ฆ„์„ ๋ถ€์—ฌํ•œ๋‹ค.
  • GROUP BY ๋ฆฌ์ŠคํŠธ์—๋Š” ์ปฌ๋Ÿผ์˜ ๋ณ„์นญ์ด๋‚˜ ์ปฌ๋Ÿผ ์ˆœ์„œ(ordinal position)๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์šฐ์„ ํ•ด์„œ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค.
  • ํŠนํžˆ, ๊ธฐ๋ณธ์ปฌ๋Ÿผ์—์„œ CASE๋ฌธ ๋“ฑ์œผ๋กœ ์œ ๋„๋œ ์ปฌ๋Ÿผ(derived column)์ด ์ฐจ์›์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ์—๋Š” ๋ฐ˜๋“œ์‹œ ๋ณ„์นญ์ด๋‚˜ ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฝ”๋“œ๊ฐ€ ๋ฐ˜๋ณต๋˜์–ด ์‚ฌ์šฉ๋˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค.
  • HAVING ์ ˆ์—์„œ๋Š” ์ปฌ๋Ÿผ ๋ณ„์นญ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
SELECT station_id,
       name AS station_name,
       status,
       CASE
         WHEN ST_DISTANCE (
                ST_GEOGPOINT(longitude, latitude), 
                ST_GEOGPOINT(-0.118092, 51.509865)
              ) < 7000000 THEN 'downtown'
         ELSE 'outskirt'
       END AS downtown_or_outskirt,
       COUNT(1) AS cnt,    -- ์ง‘๊ณ„ํ•จ์ˆ˜์— ์˜ํ•ด ์œ ๋„๋œ metric ์ปฌ๋Ÿผ์— ๋ณ„์นญ ๋ถ€์—ฌ
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE s.status IN ('active')
   AND s.latitude > 135.0
 GROUP BY 1, 2, 3
HAVING cnt > 1
;

์•„๋ž˜ ์˜ˆ์‹œ์ฒ˜๋Ÿผ SELECT ๋ฆฌ์ŠคํŠธ์— ์œ ๋„๋œ ์ปฌ๋Ÿผ์„ ์ •์˜ํ•˜๊ณ  ํ•ด๋‹น ์ปฌ๋Ÿผ์„ ์ฐจ์›์œผ๋กœ GROUP BY ์ง‘๊ณ„๋ฅผ ํ•˜๋Š” ๊ฒฝ์šฐ GROUP BY ๋ฆฌ์ŠคํŠธ์—์„œ ์ฝ”๋“œ๊ฐ€ ๋ฐ˜๋ณต๋˜์–ด ๋ฌธ์žฅ์ด ๊ธธ์–ด์ง€๊ณ  ํ–ฅํ›„ ์œ ์ง€๋ณด์ˆ˜๊ฐ€ ์–ด๋ ค์›Œ์ง€๊ฒŒ ๋œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT CASE
         WHEN is_male = TRUE THEN 'Male'
         WHEN is_male = FALSE THEN 'Female'
       END AS gender,
       COUNT(1) AS cnt,
  FROM `bigquery-public-data.samples.natality`
 GROUP BY 1
HAVING cnt > 1000
; 

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT CASE
         WHEN is_male = TRUE THEN 'Male'
         WHEN is_male = FALSE THEN 'Female'
       END,
       COUNT(1) AS cnt,
  FROM `bigquery-public-data.samples.natality`
 GROUP BY CASE
            WHEN is_male = TRUE THEN 'Male'
            WHEN is_male = FALSE THEN 'Female'
          END
;

๋˜ํ•œ, ์œ ๋„๋œ ์ปฌ๋Ÿผ์„ ์ง€ํ‘œ๋กœ ์‚ฌ์šฉํ•˜์—ฌ ์ง‘๊ณ„๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์— ์ปฌ๋Ÿผ ์ด๋ฆ„๊ณผ ๋ณ„์นญ์„ ํ˜ผ๋™ํ•˜์—ฌ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ์•„๋ž˜์™€ ๊ฐ™์ด ์›์น˜ ์•Š๋Š” ๊ฒฐ๊ณผ๊ฐ€ ๋งŒ๋“ค์–ด์งˆ ์ˆ˜ ์žˆ์œผ๋‹ˆ ์ฃผ์˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

DECLARE data ARRAY<STRUCT<region STRING,  amount INT64>> DEFAULT [
    ('', 1), ('NULL', 2), ('KOR', 2), ('USA', 3), ('KOR', 4), ('CHN', 1)
];

SELECT IF(d.region IN ('', 'NULL'), 'UNKNOWN', d.region) AS rgn,
       SUM(d.amount) AS totals,
  FROM UNNEST(data) d GROUP BY region; -- ๋ณ„์นญ์ด ์•„๋‹Œ ์ปฌ๋Ÿผ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ทธ๋ฃนํ•‘
ํ–‰ rgn totals
1 UNKNOWN 1
2 UNKNOWN 2
3 KOR 6
4 USA 3
5 CHN 1

๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹ Common Table Expressions, CTEs

Common Table Expression ์€ ์„œ๋ธŒ ์ฟผ๋ฆฌ ์ˆ˜ํ–‰์œผ๋กœ ๋งŒ๋“ค์–ด์ง€๋Š” ์ž„์‹œ ๊ฒฐ๊ณผ์…‹(result set)์— ์ด๋ฆ„์„ ๋ถ€์—ฌํ•˜์—ฌ ๋‹ค๋ฅธ ๊ณณ์—์„œ ์ฐธ์กฐํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ํ‘œํ˜„๋ฐฉ์‹์ด๋‹ค. ์„œ๋ธŒ์ฟผ๋ฆฌ์˜ ๊ฒฐ๊ณผ์…‹์— ์ด๋ฆ„์„ ๋ถ€์—ฌํ•œ Named Subquery ์ธ ์…ˆ์ด๋‹ค. CTE๋Š” ๋ณต์žกํ•œ ์ฟผ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ์žฌ๊ตฌ์„ฑํ•˜๋Š”๋ฐ ํ•ต์‹ฌ์ ์ธ ์—ญํ• ์„ ํ•œ๋‹ค.

CTE ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

์ฝ”๋“œ ๋‚ด์— 2๋‹จ๊ณ„๋ฅผ ๋„˜์–ด ์ค‘์ฒฉ๋˜๋Š” ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ CTE ๊ตฌ๋ฌธ์œผ๋กœ ๋Œ€์ฒดํ•˜์—ฌ ์ „์ฒด ์ฟผ๋ฆฌ ๊ตฌ์กฐ๊ฐ€ ์ ์ ˆํ•œ ๊นŠ์ด(2 depth)๋ฅผ ์œ ์ง€ ํ•˜๋„๋ก ํ•œ๋‹ค.

CTE๋Š” ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ(table subquery)์™ธ์—๋„ ์Šค์นผ๋ผ ์„œ๋ธŒ์ฟผ๋ฆฌ(scalar subquery)๋ฅผ ๋Œ€์ฒดํ•˜์—ฌ ์ฝ”๋“œ๋ฅผ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ๋งŒ๋“œ๋Š”๋ฐ๋„ ์œ ์šฉํ•˜๋‹ค.

์•„๋ž˜์˜ ์˜ˆ์‹œ์—์„œ๋Š” 2๋‹จ๊ณ„ ์ด์ƒ ์ค‘์ฒฉ๋œ ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ CTE ๊ตฌ๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ฝ”๋“œ์˜ ๊นŠ์ด๊ฐ€ ํ•œ ๋‹จ๊ณ„๊ฐ€ ๋˜๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค. ์ดํ›„ ๋‹จ๊ณ„๋ณ„ ์ค‘๊ฐ„ ํ…Œ์ด๋ธ” ์—ญ์‹œ CTE๋กœ ์ •์˜ํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋ฉด์„œ ์ตœ์ข…์ ์œผ๋กœ๋Š” ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋งŒ๋“ค์–ด๋‚ด๊ฒŒ ๋œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ avoid

SELECT *
  FROM (
    SELECT * 
      FROM (
        SELECT * 
          FROM base_table
      ) AS first_subquery
  ) AS second_subquery

๐Ÿ˜„ ๊ถŒ์žฅ recommend

WITH first_subquery (  -- ๊ฐ€์žฅ ์•ˆ์ชฝ์˜ ์„œ๋ธŒ์ฟผ๋ฆฌ
  SELECT *
    FROM base_table
),
second_subquery ( -- ์ƒ์œ„ ์„œ๋ธŒ์ฟผ๋ฆฌ
  SELECT *
    FROM first_subquery
)
SELECT *
  FROM second_subquery
;

FROM ์ ˆ์— ์ง์ ‘ ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์•„๋ž˜์ชฝ์— ์œ„์น˜ํ•œ ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ๋จผ์ € ์ฝ๊ณ  ๋‹ค์‹œ ์œ„์ชฝ์˜ ์ฟผ๋ฆฌ๋ฅผ ์ฝ์–ด์•ผ ํ•˜๋Š” ๋ฐ˜๋ฉด, CTE๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ๋Š” ์œ„์—์„œ๋ถ€ํ„ฐ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ์•„๋ž˜๋กœ ์ฝ”๋“œ ์ฝ๊ธฐ๊ฐ€ ๊ฐ€๋Šฅํ•œ ์žฅ์ ์ด ์žˆ๋‹ค.

์•„๋ž˜๋Š” CTEs๋ฅผ ์ด์šฉํ•˜์—ฌ ์œ ๋„๋œ ์ปฌ๋Ÿผ์ด ๋ฐ˜๋ณต๋˜์ง€ ์•Š๋„๋ก ๊ตฌ์กฐ๋ฅผ ๋ณ€๊ฒฝํ•œ ์˜ˆ์‹œ์ด๋‹ค. ์ค‘๋ณต์€ ์ฝ”๋“œ ๊ธธ์ด๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๊ณ  ์œ ์ง€๋ณด์ˆ˜๋ฅผ ์–ด๋ ต๊ฒŒ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์„œ๋ธŒ์ฟผ๋ฆฌ๋‚˜ CTE๋ฅผ ์ด์šฉํ•˜์—ฌ ์ฝ”๋“œ๊ฐ€ ๋ฐ˜๋ณต๋˜์ง€ ์•Š๋„๋ก ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT name AS station_name,
       ST_DISTINCE (
         ST_GEOPOINT(longitude, latitude), ST_GEOPOINT(-0.118092, 51.509865)
       ) AS distance_from_city_centre_m
  FROM `bigquery-public-data.london_bicycles.cycle_stations`
 WHERE ST_DISTINCE (
         ST_GEOPOINT(longitude, latitude),
         ST_GEOPOINT(-0.118092, 51.509865)
       ) <= 500
 ORDER BY distance_from_city_centre_m

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

WITH stations (
  SELECT name AS station_name,
         ST_DISTINCE (
           ST_GEOPOINT(longitude, latitude), ST_GEOPOINT(-0.118092, 51.509865)
         ) AS distance_from_city_centre_m
    FROM `bigquery-public-data.london_bicycles.cycle_stations`
)
SELECT * 
  FROM stations
 WHERE distance_from_city_centre_m <= 500
 ORDER BY distance_from_city_centre_m

JOIN ์ ˆ

JOIN ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

  • INNER์™€ OUTER ํ‚ค์›Œ๋“œ๋Š” ์ƒ๋žต๋˜๋”๋ผ๋„ ์˜๋ฏธ์ ์œผ๋กœ ํ˜ผ๋™์„ ์ฃผ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ์ƒ๋žตํ•œ๋‹ค. ๋‚ด๋ถ€์กฐ์ธ์€ JOIN, ์™ธ๋ถ€์กฐ์ธ์€ LEFT JOIN, OUTER JOIN์„ ์‚ฌ์šฉํ•œ๋‹ค.
  • Cartesian Product๋Š” CROSS JOIN ์‚ฌ์šฉํ•œ ์กฐ์ธ ํ˜•์‹์„ ์ฝค๋งˆ(,)๋ฅผ ์‚ฌ์šฉํ•œ ์˜ค๋ผํด ์กฐ์ธ ํ˜•์‹ ๋ณด๋‹ค ์šฐ์„ ํ•œ๋‹ค. ๋‹ค๋งŒ, ์•„๋ž˜์—์„œ ์–ธ๊ธ‰ํ•  ์ƒ๊ด€ ํฌ๋กœ์Šค ์กฐ์ธ (correlated cross-join)์˜ ๊ฒฝ์šฐ๋Š” ์ฝค๋งˆ ์กฐ์ธ ํ˜•์‹์„ ์šฐ์„ ํ•œ๋‹ค.
  • ON ์ ˆ์€ JOIN์ ˆ ๋ฐ”๋กœ ์•„๋ž˜์— ์ž‘์„ฑํ•˜๋ฉฐ ๊ฒฐํ•ฉ์กฐ๊ฑด์ด ์—ฌ๋Ÿฌ ๊ฐœ ๋“ฑ์žฅํ•˜๋Š” ๊ฒฝ์šฐ๋Š” WHERE ์ ˆ์„ ์ฐธ๊ณ ํ•˜์—ฌ ๋ณต์ˆ˜๊ฐœ์˜ ๊ฒฐํ•ฉ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•œ๋‹ค.
  • ํ…Œ์ด๋ธ” ๋ณ„์นญ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ธด ํ…Œ์ด๋ธ” ์ด๋ฆ„์ด ๋ฐ˜๋ณต๋˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT m.*,
       d.*,
  FROM masters
  LEFT OUTER JOIN details
    ON masters.mid = details.did

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT m.*,
       d.*,
  FROM masters AS m
  LEFT JOIN details AS d
    ON m.mid = d.did
   AND m.status = d.sts

์ƒ๊ด€ ํฌ๋กœ์Šค ์กฐ์ธ (correlated cross-join)

BigQuery์˜ ๋ฐฐ์—ด์„ ํ‰ํƒ„ํ™”(flattening)ํ•  ๋•Œ ์‚ฌ์šฉ๋˜๋Š” CROSS JOIN ์„ ์ƒ๊ด€ ํฌ๋กœ์Šค ์กฐ์ธ์ด๋ผํ•˜๊ณ  ์—ฌ๊ธฐ์„œ์˜ ์ƒ๊ด€(correleated)์€ ํฌ๋กœ์Šค ์กฐ์ธ์˜ ๋ฒ”์œ„๊ฐ€ ํ…Œ์ด๋ธ” ์ „์ฒด ํ–‰(row)์ด ์•„๋‹Œ ๋ฐฐ์—ด์„ ๋ณด์œ ํ•˜๊ณ  ์žˆ๋Š” ํ–‰(row)์— ์—ฎ์—ฌ ๊ตญํ•œ๋œ๋‹ค๋Š” ์˜๋ฏธ์ด๋‹ค.

์•„๋ž˜ ์˜ˆ์‹œ์˜ ๊ฒฝ์šฐ data ํ…Œ์ด๋ธ”๊ณผ y ๋ฐฐ์—ด์ด ์˜ค๋ผํด ํ˜•์‹์˜ ํฌ๋กœ์Šค ์กฐ์ธ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ์žˆ์œผ๋‚˜, ์ผ๋ฐ˜์ ์ธ ํฌ๋กœ์Šค ์กฐ์ธ ์ฒ˜๋Ÿผ y ๋ฐฐ์—ด์˜ ์š”์†Œ๋“ค์ด data ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ํ–‰(row) ๊ณผ ๊ฒฐํ•ฉ๋˜์–ด m X n์˜ ์กฐํ•ฉ์„ ๋งŒ๋“ค์ง€ ์•Š๊ณ  y๋ฐฐ์—ด์„ ํฌํ•จํ•œ ํ•˜๋‚˜์˜ ํ–‰๊ณผ 1 X n(y๋ฐฐ์—ดํฌ๊ธฐ) ์˜ ํ•œ์ •๋œ ์กฐ์ธ ๊ฒฐ๊ณผ๋ฅผ ๋งŒ๋“ค์–ด ๋‚ธ๋‹ค.

-- Correlated Cross-Join
WITH data AS (
  SELECT 1 AS w, 'a' AS x, ['P', 'Q'] AS y
   UNION ALL
  SELECT 2 AS w, 'b' AS x, ['R', 'S', 'T'] AS y
)
-- Comma Join --> data CROSS JOIN data.y ๋™์ผ
SELECT w, x, y_ FROM data, data.y y_  
; 
-- ์œ„์—์„œ ๋ฉ”์ธ `SELECT` ๋ฌธ์žฅ์€ ์•„๋ž˜ ์ฝ”๋“œ๊ฐ€ ์ถ•์•ฝ๋œ ํ˜•ํƒœ์ด๋‹ค.
-- UNNEST()๋ฅผ ์ƒ๋žตํ•˜๊ณ  y ์ปฌ๋Ÿผ์ด๋ฆ„์„ data.y๋กœ ํ•œ์ •์‹œํ‚ด.
SELECT w, x, y_ FROM data, UNNEST(y) y_  

Window Function

์œˆ๋„์šฐ ํ•จ์ˆ˜ ๊ธฐ๋ณธ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ

๋ถ„์„ ํ•จ์ˆ˜ ๋˜๋Š” ์œˆ๋„์šฐ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ ์ปฌ๋Ÿผ์˜ ๊ตฌ์กฐ๋Š” ๋Œ€๋žต ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  • OVER ํ‚ค์›Œ๋“œ๋ฅผ ์ค‘์‹ฌ์œผ๋กœ ์™ผ์ชฝ์— ๋ถ„์„ ํ•จ์ˆ˜๊ฐ€ ๋†“์ด๊ณ  ์˜ค๋ฅธ์ชฝ์—๋Š” ๋ถ„์„ ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•  ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„ (window frame)์„ ์ •์˜ํ•˜๋Š” ๊ตฌ๋ฌธ์ด ๋‚˜์˜จ๋‹ค.
  • ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„(window frame) ์ •์˜ ๊ตฌ๋ฌธ์€ ๋‹ค์‹œ ์•„๋ž˜์˜ 3๊ฐœ์˜ ๊ตฌ๋ฌธ์œผ๋กœ ์„ธ๋ถ„ํ™” ๋œ๋‹ค. 3๊ฐ€์ง€ ๊ตฌ๋ฌธ์€ ๋ชจ๋‘ ์„ ํƒ์ ์œผ๋กœ(optional)๋กœ ์‚ฌ์šฉ๋œ๋‹ค.
    • PARTITION BY๋กœ ํŒŒํ‹ฐ์…˜(partition)์„ ์ •์˜
    • ORDER BY๋กœ ๊ฐ ํŒŒํ‹ฐ์…˜์„ ์–ด๋–ค ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ• ์ง€ ์ •์˜
    • ROWS ๋‚˜ RANGE ๋กœ ๊ฐ ํŒŒํ‹ฐ์…˜์•ˆ์—์„œ ๋ฌผ๋ฆฌ์  ๋˜๋Š” ๋…ผ๋ฆฌ์ ์ธ ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„(window frame)์„ ์–ด๋–ป๊ฒŒ ์ ์šฉํ• ์ง€ ์ •์˜

์ด๋Ÿฌํ•œ ๊ตฌ์กฐ์— ๋Œ€ํ•œ ์ฝ”๋”ฉ ์Šคํƒ€์ผ์€ ๋‹ค์Œ ์ง€์นจ์— ๋”ฐ๋ผ ์ž‘์„ฑํ† ๋ก ํ•œ๋‹ค.

  • ๋ถ„์„ ํ•จ์ˆ˜์™€ OVER ํ‚ค์›Œ๋“œ๋Š” ์ปฌ๋Ÿผ์˜ ์ฒซ๋ฒˆ์งธ ๋ผ์ธ์— ์œ„์น˜์‹œํ‚จ๋‹ค.
  • OVER () ๊ตฌ๋ฌธ๋‚ด์— ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„์„ ์ •์˜ํ•˜๋Š” ๊ตฌ๋ฌธ์ด ๊ธธ์–ด์ง€๋Š” ๊ฒฝ์šฐ ์ค„๋ฐ”๊ฟˆ์„ ํ•œ ํ›„ ๊ฐ ๊ตฌ๋ฌธ์„ ํ•˜๋‚˜์˜ ๋ผ์ธ์— ๊ธฐ์ˆ ํ•œ๋‹ค.
  • ์ค„๋ฐ”๊ฟˆ๋œ ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„ ์ •์˜ ๊ด€๋ จ ํ‚ค์›Œ๋“œ๋“ค์€ ์œˆ๋„์šฐ ํ•จ์ˆ˜ ์ด๋ฆ„์„ ๊ธฐ์ค€์œผ๋กœ ๋‘ ์นธ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ํ•œ ํ›„ ์™ผ์ชฝ ์ •๋ ฌ์‹œํ‚จ๋‹ค.

์œ„์˜ ์Šคํƒ€์ผ์„ ์ ์šฉํ•˜์—ฌ ์ž‘์„ฑํ•œ ์ฝ”๋“œ ์˜ˆ์‹œ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  • ์ฒซ๋ฒˆ์งธ ์˜ˆ์‹œ
SELECT item, purchases, category,
       SUM(purchases) OVER ( 
         PARTITION BY category  -- ๋‘ ์นธ ๋“ค์—ฌ์“ฐ๊ธฐ ๋ฐ 
         ORDER BY purchases     -- PARTITION, ORDER ํ‚ค์›Œ๋“œ ์™ผ์ชฝ ์ •๋ ฌ
       ) AS total_purchases,    -- ๋ถ„์„ํ•จ์ˆ˜ ์ปฌ๋Ÿผ์— ๋Œ€ํ•œ ๋ณ„์นญ(alias)

       SUM(purchases) OVER ( 
         PARTITION BY category 
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_purchases,

       LAST_VALUE(item) OVER ( 
         PARTITION BY category ORDER BY purchases
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS most_popular,
  FROM Produce
;

SELECT animal, population, category,
       COUNT(*) OVER (
         ORDER BY population
         RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
       ) AS similar_population,
  FROM Farm
;
  • ๋‘ ๋ฒˆ์งธ ์˜ˆ์‹œ
SELECT name, 
       division,
       FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
       FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
       TIMESTAMP_DIFF(finish_time, fastest_time, SECOND) AS delta_in_seconds
  FROM (
    SELECT name, finish_time, division,
           FIRST_VALUE(finish_time) OVER (
             PARTITION BY division ORDER BY finish_time ASC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS fastest_time
      FROM finishers
);

User Defined Function

BigQuery์˜ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜๋Š” SQL๊ณผ JavaScript ๋‘ ๊ฐ€์ง€ ์–ธ์–ด๋กœ ์ž‘์„ฑ ๊ฐ€๋Šฅํ•˜๋‹ค.

  • ํ•จ์ˆ˜ ๋ณธ๋ฌธ(function body)์„ ์ž‘์„ฑํ•  ๋•Œ๋Š” ๊ฐ ์–ธ์–ด์˜ ๊ธฐ๋ณธ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ๋”ฐ๋ฅธ๋‹ค.
  • SQL ์–ธ์–ด๋กœ ์ž‘์„ฑ์‹œ๋Š” ์ด ๋ฌธ์„œ์˜ ๋‚ด์šฉ์„ ์ฐธ๊ณ ํ•œ๋‹ค.
  • JavaScript ์–ธ์–ด๋กœ ์ž‘์„ฑ์‹œ๋Š” ํŒ€ ๋‚ด ์‚ฌ์šฉ์ค‘์ธ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ๋”ฐ๋ฅธ๋‹ค. ์—†์„ ๊ฒฝ์šฐ์—๋Š” ๋‹ค์Œ์„ ์ฐธ์กฐํ•œ๋‹ค.

์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ ์Šคํƒ€์ผ ์˜ˆ์‹œ

  • JavaScript UDF
CREATE TEMP FUNCTION multiply_inputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS 
"""
  return x * y;
""";
SELECT multiplyInputs(3, 5) as product;
  • SQL UDF
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64) AS (
  x * y
);
SELECT multiplyInputs(3, 5) as product
CREATE TEMPORARY FUNCTION dayOfWeek(x TIMESTAMP) AS ( 
  ['Sun','Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
  [ORDINAL(EXTRACT(DAYOFWEEK from x))]
);
CREATE TEMPORARY FUNCTION getDate(x TIMESTAMP) AS (
  EXTRACT(DATE FROM x) 
);
WITH overnight_trips AS (
  SELECT duration, dayOfWeek(start_date) AS start_day
    FROM `bigquery-public-data.london_bicycles.cycle_hire`
   WHERE getDate(start_date) != getDate(end_date)
)
SELECT * FROM overnight_trips LIMIT 100;

SQL UDF ์—์„œ SELECT ๋ฌธ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์ผ๋ฐ˜์ ์ธ ์Šค์นผ๋ผ ์„œ๋ธŒ์ฟผ๋ฆฌ(scalar subquery)์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๊ด„ํ˜ธ๋กœ ๊ฐ์‹ธ์ค˜์•ผ ํ•œ๋‹ค. ์ด ๋•Œ ๊ด„ํ˜ธ๋Š” ์•„๋ž˜์ฒ˜๋Ÿผ ํ•จ์ˆ˜ ๋ณธ๋ฌธ (_functon body)์˜ ๊ด„ํ˜ธ์™€ ๊ฐ™์ด ์‚ฌ์šฉํ•˜์ด ์ด์ค‘ ๊ด„ํ˜ธ ํ˜•ํƒœ๊ฐ€ ๋˜๋„๋ก ํ•˜์—ฌ UDF ๋ณธ๋ฌธ์ด ๊ด„ํ˜ธ๊ฐ€ ์•„๋‹Œ SELECT๋กœ ์‹œ์ž‘๋˜๋„๋ก ๋งŒ๋“ ๋‹ค.

CREATE OR REPLACE FUNCTION `jaeseok-park.fn.median` (arr ANY TYPE) AS ((
  SELECT IF(MOD(ARRAY_LENGTH(arr), 2) = 0, 
            (arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] +
             arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))] ) / 2,
             
            arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
         ) AS median
   FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
));

BigQuery Scripting

TBD


BigQuery ์Šคํƒ€์ผ ์˜ˆ์‹œ

์•„๋ž˜๋Š” ์ง€๊ธˆ๊นŒ์ง€์˜ ๊ฐ€์ด๋“œ์— ๋”ฐ๋ผ ์‹ค์ œ ๋ถ„์„์šฉ ์ฟผ๋ฆฌ๋ฅผ ๊ตฌ์กฐํ™”ํ•œ ์˜ˆ์‹œ์ด๋‹ค. ์ฟผ๋ฆฌ๋Š” ์•„๋ž˜ ๋งํฌ์˜ ๋‚ด์šฉ์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค.

#standardSQL
WITH variants AS (
  -- Retrieve the variants in this cohort, flattening by alternate bases and
  -- counting affected alleles.
  SELECT REPLACE(reference_name, 'chr', '') as reference_name,
         start_position,
         end_position,
         reference_bases,
         alternate_bases.alt AS alt,
         (SELECT COUNTIF(gt = alt_offset + 1) FROM v.call, call.genotype gt) AS num_variant_alleles,
         (SELECT COUNTIF(gt >= 0) FROM v.call, call.genotype gt) AS total_num_alleles,
    FROM `bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823`,
         UNNEST(alternate_bases) alternate_bases WITH OFFSET alt_offset
),
intervals AS (
  -- Define an inline table that uses five rows
  -- selected from silver-wall-555.TuteTable.hg19.
  SELECT * 
    FROM UNNEST([
           STRUCT<Gene STRING, Chr STRING, 
                  gene_start INT64, gene_end INT64,
                  region_start INT64, region_end INT64>
           ('PRCC',  '1', 156736274, 156771607, 156636274, 156871607),
           ('NTRK1', '1', 156785541, 156852640, 156685541, 156952640),
           ('PAX8',  '2', 113972574, 114037496, 113872574, 114137496),
           ('FHIT',  '3',  59734036,  61238131,  59634036,  61338131),
           ('PPARG', '3',  12328349,  12476853,  12228349,  12576853)
         ])
),
gene_variants AS (
  --
  -- JOIN the variants with the genomic intervals overlapping
  -- the genes of interest.
  --
  -- The JOIN criteria is complicated because the task is to see if
  -- an SNP overlaps an interval.  With standard SQL you can use complex
  -- JOIN predicates, including arbitrary expressions.
  SELECT reference_name,
         start_position,
         reference_bases,
         alt,
         num_variant_alleles,
         total_num_alleles,
    FROM variants v
    JOIN intervals i
      ON v.reference_name = i.Chr
     AND i.region_start <= v.start_position
     AND i.region_end >= v.end_position 
)
--
-- And finally JOIN the variants in the regions of interest
-- with annotations for rare variants.
SELECT DISTINCT 
       Chr,
       annots.Start AS Start,
       Ref,
       annots.Alt,
       Func,
       Gene,
       PopFreqMax,
       ExonicFunc,
       num_variant_alleles,
       total_num_alleles,
  FROM `silver-wall-555.TuteTable.hg19` AS annots
  JOIN gene_variants AS vars
    ON vars.reference_name = annots.Chr
   AND vars.start_position = annots.Start
   AND vars.reference_bases = annots.Ref
   AND vars.alt = annots.Alt
 WHERE PopFreqMax <= 0.01 -- Retrieve annotations for rare variants only.
 ORDER BY Chr, Start
;

์ฐธ๊ณ  (references)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors