Order-Book Feature Extraction
Turn raw top-of-book snapshots and the trade tape into the microstructure features signal models consume - mid, spread, imbalance, microprice, signed flow and rolling VWAP - in clean, windowed SQL with a pandas cross-check.
The scenario
You're sitting with the microstructure research team at an electronic market maker. They hand you raw level-2 order-book snapshots and the trade tape for a single name across one morning, and ask you to turn them into the features the signal models actually consume. The work is mostly SQL — clean, windowed, reproducible — with a short Python cross-check.
Where this shows up
Turning raw book and trade data into microstructure features is a staple SQL/data take-home for HFT and electronic-trading research roles at firms of this type.
Firms such as Citadel Securities, Hudson River Trading, Jump.
DeskPrep is not affiliated with, endorsed by, or sponsored by any named firm. Firm names are used for illustrative, educational purposes only and do not imply that these materials are official assessments of, or are connected with, those firms.
Task brief
# Order-Book Feature Extraction **Role relevance:** Microstructure / data take-home for HFT and electronic-trading research desks. **Estimated time:** 45-60 minutes **Difficulty:** Intermediate **Format:** SQL (.sql) + sample CSVs, with a pandas cross-check ## What you are given - `book_snapshots.csv` - top-of-book snapshots (ts, bid/ask price & size) - `trades.csv` - the trade tape (ts, price, size) - `order_book_features_starter.sql` - table definitions + task stubs ## What you must deliver 1. Mid-price and quoted spread per snapshot 2. Depth imbalance and the microprice 3. Tick-rule trade signs and a rolling 3-trade VWAP 4. Features aggregated into 1-second buckets ## Constraints - Use set-based SQL (window functions), not row-by-row loops. - Load the CSVs into SQLite - no database server required. ## Submission note Complete the starter SQL, then check your output against the mark scheme and the pandas reference.
Your tasks
- 01Load the provided snapshots and trades into SQLite and reconstruct best bid, best ask, mid-price and quoted spread per snapshot.
- 02Compute level-1 depth imbalance and the microprice (size-weighted mid).
- 03Sign each trade as buyer- or seller-initiated using the tick rule, then build a rolling VWAP from the tape.
- 04Aggregate the features into fixed time buckets (e.g. 1-second bars) and reconcile your SQL output against the Python reference.
How you're assessed
The full points-based mark scheme is included with the pack.
What you'll learn
- How a limit order book is structured and what mid, spread, imbalance and microprice each tell you.
- Why order-book imbalance and the microprice carry short-horizon predictive signal.
- How to express time-series features with SQL window functions instead of loops.