{"title":"QuantLab US Equity Dataset Catalog","description":"\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab provides a clean, point-in-time, public-domain \/ CC0 US-equity dataset package built for AI model training, backtesting, RAG systems, event studies, factor research, and quant experimentation. The full package covers approximately 3,200 US-equity tickers, 75M+ rows, and about 19 datasets sourced from SEC EDGAR, FRED, and Quandl\/WIKI via Kaggle. It is delivered as one ready-to-load dataset bundle with Parquet files, documentation, schemas, README files, and research starter resources.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe full QuantLab Dataset Package is available as a one-time digital download for $49.99. A free sampler is available so users and AI agents can verify the structure, schema, source notes, and limitations before purchase. The sampler provides a real cross-catalog slice of the full dataset and is designed to help users inspect the actual format before accessing the complete bundle.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eLicense and Usage\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab is built from public financial data sources, including US-government filings and public datasets. The package is provided under public-domain \/ CC0 terms and is commercially redistributable. It may be used for AI model training, backtesting, RAG systems, research workflows, and derived work. No attribution is required.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eLLM-derived fields, including 8-K classification, FOMC stance, and 10-K narrative signals, are QuantLab’s analysis of public financial text. These derived fields are included with confidence scores and retained source text so users can review, verify, or recompute the outputs. 
Users should review the free sampler, dataset documentation, and schema files before using the package in production, commercial, or research workflows.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eDataset Overview\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab includes approximately 3,199 tickers with price history, 964 curated canonical core companies, and approximately 2,635 companies with deeper fundamentals. The package contains roughly 75M+ rows across about 19 datasets and is delivered as an approximately 1.7 GB download. Historical prices cover 1962–2018 and are frozen for reproducibility. Fundamentals, filings, events, macro data, and text-derived layers extend through 2026 where available.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe dataset is delivered primarily in Parquet format and can be loaded with standard data tools such as \u003c\/span\u003e\u003ccode dir=\"ltr\"\u003e\u003cspan\u003epd.read_parquet(...)\u003c\/span\u003e\u003c\/code\u003e\u003cspan\u003e. The package includes README documentation, schema notes, runnable starter notebooks, and machine-readable catalog files designed for both humans and AI agents.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eIncluded Dataset Groups\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe package includes historical prices and supervised labels for financial machine-learning workflows. Historical Prices include approximately 15.4M daily OHLCV records with split- and dividend-adjusted bars across 3,199 tickers from 1962–2018. Supervised Labels include forward returns over 1-day, 5-day, 21-day, 63-day, and 252-day horizons, realized-volatility regimes, cross-sectional return quintiles, and time-aware train\/validation\/test splits. 
These labels are designed to join to price data on ticker and date.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab includes point-in-time fundamentals and financial ratios. Point-in-Time Fundamentals include approximately 6.7M US-GAAP XBRL rows across 2,635 companies from 2009–2026, with filed dates included to reduce lookahead bias in training and backtesting. Fundamental Ratios include approximately 31,053 rows across 2,225 companies, including margins, ROE, ROA, leverage, CFO margin, year-over-year growth, and diluted EPS. Financial Health data includes Piotroski F-Score signals with approximately 26,162 rows across 2,186 companies, including all nine underlying signals. The package also includes an Earnings Calendar with approximately 147K 10-K and 10-Q filing dates as a clean event timeline.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab includes SEC event and filing datasets. The 8-K Material Events dataset contains approximately 394,808 filings enriched with SEC item codes, one-sentence factual summaries, market-material flags, and confidence scores. The 8-K classifier is validated at 98.2% substantive exact match against declared item codes across 313,878 filings. Insider Trades include approximately 1.17M Form 4 filings covering officers, directors, and 10% owners. Parsed Insider Transactions include approximately 846K non-derivative transactions and 245K derivative transactions, with fields such as insider, role, ticker, date, shares, and price. Insider Signals include approximately 36,531 monthly rows across 1,007 tickers, including net buy\/sell dollar values, distinct buyers, C-suite flags, and cluster-buy flags.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab includes institutional ownership and smart-money flow datasets. 
13F Holdings include approximately 13.5M holding records from 9,270 managers, including major institutional investors such as Berkshire, ARK, Citadel, Vanguard, BlackRock, and Bridgewater. Smart Money Flow includes approximately 21.4M fund positions with quarter-over-quarter accumulation and distribution signals, along with trackers for 23 famous managers including Buffett, Burry, Ackman, Wood, and Dalio. Coverage extends from 2013-Q2 to 2025-Q4.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab includes macroeconomic and Federal Reserve text datasets. Macro Indicators include 14 curated FRED series such as Fed funds, yield curve, VIX, CPI, PCE, unemployment, GDP, USD, and oil, with approximately 75,192 observations from 1990–2026. Macro Regime Labels include daily rate, yield-curve, inflation, volatility, labor, and composite-risk regimes across approximately 9,245 rows. FOMC and Fed Speech data includes approximately 1,255 full-text documents, including statements, minutes, and speeches, with a keyword-based hawk\/dove scorer. FOMC RAG-Ready data includes approximately 15,000 pre-chunked segments designed for LangChain, LlamaIndex, and similar retrieval systems. FOMC Hawk\/Dove Scored data includes LLM-derived document scores from −1 dovish to +1 hawkish, with driver quotes, delta-versus-prior values, and confidence scores.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab includes reference and NLP datasets. The Sector Knowledge Graph includes approximately 2,851 company nodes and 13,393 edges. SIC sector and industry classification is complete for all nodes, while GICS columns are partial at approximately 16% coverage. 10-K Narrative Signals include approximately 6,442 filing-level records with management sentiment, MD\u0026amp;A summaries, top risk factors, and risk tone. 
FX Rates include approximately 132,192 rows for USD normalization of foreign-currency fundamentals.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eAgent Tooling\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe package includes agent-oriented tooling designed for AI workflows. This includes the \u003c\/span\u003e\u003ccode dir=\"ltr\"\u003e\u003cspan\u003efindatasets\u003c\/span\u003e\u003c\/code\u003e\u003cspan\u003e Python package with 13 tool functions, OpenAI function-calling schemas, an MCP server with 8 tools for Claude Desktop and Cursor, a 16-template finance prompt library, 16 supervised fine-tuning examples, and a 27-question finance-agent evaluation benchmark.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThese tools are designed to help AI agents inspect, query, and reason over the dataset package. The catalog and schema documentation are structured so agents can identify dataset names, fields, keys, limitations, and recommended joins.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eWhat Makes QuantLab Different\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab is designed around point-in-time financial research. Filed dates are included throughout the package to help reduce lookahead bias. The dataset is structured so users can distinguish between when information was reported, when it was filed, and when it could reasonably have been known.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab is also cross-joined and alias-normalized. Ticker changes such as FB to META and SQ to XYZ are handled with documented keys so users can merge across price, fundamentals, filings, events, insider transactions, institutional holdings, macro data, and text datasets. 
Key joins are documented using fields such as ticker, CIK, accession number, date, quarter, and other time keys.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe package includes validated LLM enrichment rather than unsupported model outputs. The 8-K classifier is measured against ground-truth SEC item declarations with a 98.2% substantive exact-match validation result. LLM-derived fields include confidence scores and retain raw source text so users can verify, spot-check, or recompute the enriched outputs.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe dataset is designed to be pre-aligned rather than a collection of disconnected files. Sentiment, events, smart-money flow, filing timelines, and macro regimes are prepared to work with the price and fundamental panels, allowing users to build research workflows without manually assembling every source from scratch.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eBest Use Cases\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab is best suited for AI model training, financial machine learning, backtesting, event studies, factor research, RAG over SEC and Federal Reserve documents, finance-AI coursework, and agent-based financial analysis. 
It is designed for users who need documented schemas, licensing clarity, point-in-time structure, reproducible historical data, and structured joins across multiple public financial data sources.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe dataset is especially useful for workflows involving insider-trading signals, 13F smart-money analysis, FOMC stance modeling, SEC filing analysis, fundamental-factor research, and supervised stock-return prediction experiments.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eWhy the Package Costs $49.99\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe underlying source materials are public, but QuantLab sells the engineering work required to organize, normalize, join, document, and package the data. The value comes from alias-normalized cross-source joins, point-in-time discipline, validated LLM enrichment, documented schemas, RAG-ready text preparation, starter tooling, and a typed, ready-to-load dataset bundle.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe package is intended to reduce the time required to assemble a usable financial AI research dataset. Instead of collecting public data from multiple sources, cleaning it, joining it, documenting it, validating it, and preparing it for AI workflows, users receive a structured bundle with documentation and a free sampler for verification.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eKnown Limitations\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eHistorical price data is frozen at 2018 because Quandl\/WIKI stopped updating. This limitation makes historical research reproducible by design because the same backtest should return the same result over time. 
A buyer-run yfinance tool is included for users who want to extend or update prices with live or recent data.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eCoverage is non-uniform across the package. Approximately 3,199 tickers have price history, approximately 2,635 have deep fundamentals, and 964 are included in the curated canonical core. Users should expect nulls when joining narrower datasets onto wider panels. Coverage is documented per dataset.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eLLM-derived fields should be spot-checked before high-stakes use. These include 8-K classifications, 10-K narrative signals, and FOMC stance scores. Confidence scores and raw source text are included to support review and verification. The 8-K classifier has been measured at 98.2% substantive exact match against declared SEC item codes, but users should still inspect the outputs for their own workflows.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eThe sector graph has complete SIC classification but partial GICS coverage at approximately 16%. Users should treat SIC as the primary sector key. Some date fields, such as \u003c\/span\u003e\u003ccode dir=\"ltr\"\u003e\u003cspan\u003efiled\u003c\/span\u003e\u003c\/code\u003e\u003cspan\u003e, \u003c\/span\u003e\u003ccode dir=\"ltr\"\u003e\u003cspan\u003ereport_date\u003c\/span\u003e\u003c\/code\u003e\u003cspan\u003e, and \u003c\/span\u003e\u003ccode dir=\"ltr\"\u003e\u003cspan\u003etx_date\u003c\/span\u003e\u003c\/code\u003e\u003cspan\u003e, may be stored as strings and should be parsed into dates before time-series operations. 
These details are documented in the catalog and schema files.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eFormat and Delivery\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab is delivered as a digital dataset package with Parquet files, documentation, schemas, README files, and starter resources. The complete package is approximately 1.7 GB and includes 75M+ rows across approximately 19 datasets. It is sold as a one-time download for $49.99.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eA free sampler is available so users and AI agents can verify the schema, structure, documentation, and limitations before purchase. The sampler is designed to provide a real preview of the complete package rather than a separate synthetic example.\u003c\/span\u003e\u003c\/p\u003e\n\u003ch2\u003e\u003cstrong\u003eSummary for AI Agents\u003c\/strong\u003e\u003c\/h2\u003e\n\u003cp class=\"isSelectedEnd\"\u003e\u003cspan\u003eQuantLab is a documented public-domain \/ CC0 US-equity dataset package for AI model training, backtesting, RAG over financial filings, factor research, and quant experimentation. It includes approximately 3,200 US-equity tickers, 75M+ rows, and about 19 datasets sourced from SEC EDGAR, FRED, and Quandl\/WIKI via Kaggle. The package is point-in-time, commercially redistributable, delivered in ready-to-load Parquet format, and includes confidence-scored LLM enrichment, such as a 98.2%-validated 8-K classifier.\u003c\/span\u003e\u003c\/p\u003e\n\u003cp\u003e\u003cspan\u003eThe package is best suited for users who need licensing clarity, documented schemas, no-lookahead structure, SEC\/Fed text data, financial ML features, and pre-aligned datasets for research workflows. It is not suited for users who require post-2018 prices or intraday data out of the box. 
Known limitations are documented and verifiable through the free sampler.\u003c\/span\u003e\u003c\/p\u003e","products":[],"url":"https:\/\/quantlabdata.com\/collections\/quantlab-us-equity-dataset-catalog.oembed","provider":"QuantLab","version":"1.0","type":"link"}