DeskPrep
SQLintermediate~60 min

Order-Book Feature Extraction

Turn raw top-of-book snapshots and the trade tape into the microstructure features signal models consume - mid, spread, imbalance, microprice, signed flow and rolling VWAP - in clean, windowed SQL with a pandas cross-check.

The scenario

You're sitting with the microstructure research team at an electronic market maker. They hand you raw level-2 order-book snapshots and the trade tape for a single name across one morning, and ask you to turn them into the features the signal models actually consume. The work is mostly SQL — clean, windowed, reproducible — with a short Python cross-check.

Where this shows up

Turning raw book and trade data into microstructure features is a staple SQL/data take-home for HFT and electronic-trading research roles at firms of this type.

high-frequency market makerselectronic trading desksmicrostructure research

Firms such as Citadel Securities, Hudson River Trading, Jump.

DeskPrep is not affiliated with, endorsed by, or sponsored by any named firm. Firm names are used for illustrative, educational purposes only and do not imply that these materials are official assessments of, or are connected with, those firms.

Task brief

README.md
# Order-Book Feature Extraction

**Role relevance:** Microstructure / data take-home for HFT and electronic-trading research desks.
**Estimated time:** 45-60 minutes
**Difficulty:** Intermediate
**Format:** SQL (.sql) + sample CSVs, with a pandas cross-check

## What you are given
- `book_snapshots.csv` - top-of-book snapshots (ts, bid/ask price & size)
- `trades.csv` - the trade tape (ts, price, size)
- `order_book_features_starter.sql` - table definitions + task stubs

## What you must deliver
1. Mid-price and quoted spread per snapshot
2. Depth imbalance and the microprice
3. Tick-rule trade signs and a rolling 3-trade VWAP
4. Features aggregated into 1-second buckets

## Constraints
- Use set-based SQL (window functions), not row-by-row loops.
- Load the CSVs into SQLite - no database server required.

## Submission note
Complete the starter SQL, then check your output against the mark scheme and the pandas reference.

Your tasks

  1. 01Load the provided snapshots and trades into SQLite and reconstruct best bid, best ask, mid-price and quoted spread per snapshot.
  2. 02Compute level-1 depth imbalance and the microprice (size-weighted mid).
  3. 03Sign each trade as buyer- or seller-initiated using the tick rule, then build a rolling VWAP from the tape.
  4. 04Aggregate the features into fixed time buckets (e.g. 1-second bars) and reconcile your SQL output against the Python reference.

How you're assessed

Correct best bid/ask, mid and spread2 pts
Depth imbalance and microprice2 pts
Trade signing and rolling VWAP3 pts
Correct time-bucket aggregation / windowing2 pts
Clean, readable SQL1 pt
Total10 pts

The full points-based mark scheme is included with the pack.

What you'll learn

  • How a limit order book is structured and what mid, spread, imbalance and microprice each tell you.
  • Why order-book imbalance and the microprice carry short-horizon predictive signal.
  • How to express time-series features with SQL window functions instead of loops.

Study alongside