Authors: Subin Park, Jiseong Jo, Changyong Park, Nayeon Kwan
This work presents an end‑to‑end system that learns market context from financial news and uses it to drive portfolio decisions. We pretrain a JEPA‑style text encoder on daily news windows and then train a reinforcement learning agent to allocate across multiple assets under transaction costs.
We investigate whether news‑conditioned representations improve portfolio management. A Joint Embedding Predictive Architecture (JEPA) is pretrained on mainstream financial news to produce compact, predictive daily embeddings. These embeddings are fused with price and portfolio state in an RL environment. A Soft Actor‑Critic (SAC) or Q‑learning agent outputs position targets subject to trading frictions. We backtest on held‑out dates and report returns, risk, Sharpe, and correlation diagnostics between embeddings and financial targets.
Numerical features alone often miss regime changes and the cross‑sectional narratives captured by news. To address this, we combine textual context with temporal modeling and reinforcement learning to produce allocations that adapt to evolving market states. Our contributions are: a market‑wide news embedding mechanism that aggregates headlines across tickers per day with configurable pooling (sum, mean, or weighted sum); a JEPA objective that operates in masking or causal mode with an EMA target encoder and a lightweight causal decoder; and a practical portfolio environment that incorporates trading costs, weight constraints, and a cash asset. We also add correlation‑aware validation that reports how learned embeddings relate to multi‑horizon returns, volatility, momentum, mean‑reversion, and acceleration under lead/lag analysis. Finally, we expose memory‑aware training controls that subsample or cap daily news so that large corpora fit in memory without changing the overall pipeline.
Self‑supervised pretraining methods like JEPA learn predictive features without labels, while reinforcement learning for trading focuses on decision policies under uncertainty. We integrate a text‑driven predictive encoder into a sequential, multi‑asset portfolio environment and evaluate using standard risk/return metrics, avoiding direct price prediction and instead leveraging contextual embeddings.
The system operates on mainstream news and daily prices per ticker. A helper script in subin/src/data_prep/news.py can download and merge data via the Tiingo API, using a secret.yaml that provides a TIINGO_TOKEN. Users may supply a combined Parquet through --dataset_path or separate Parquets through --news_path and --price_path, which the loader merges at runtime. Dates are timezone‑stripped and normalized, tickers are uppercased, and a price column is inferred from common names such as adjClose or close. For JEPA, we build daily sequences of headlines either per ticker or across the entire market. For portfolio RL, we assemble [time × ticker] tables of daily titles and prices and encode them into [time, tickers, embed_dim] tensors.
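As a concrete illustration of this loading step, the following pandas sketch performs the normalization and merge described above; the function and column names are assumptions for illustration, not the loader's exact API:

```python
# Illustrative sketch of the loader's merge step: dates are made
# timezone-naive and normalized to midnight, tickers are uppercased, and a
# price column is inferred from common candidate names.
import pandas as pd

PRICE_CANDIDATES = ("adjClose", "close", "adj_close", "Close")

def merge_news_and_prices(news: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    for df in (news, prices):
        # utc=True handles both tz-aware and tz-naive inputs uniformly,
        # then tz_localize(None) strips the timezone again
        df["date"] = pd.to_datetime(df["date"], utc=True).dt.tz_localize(None).dt.normalize()
        df["ticker"] = df["ticker"].str.upper()
    price_col = next(c for c in PRICE_CANDIDATES if c in prices.columns)
    prices = prices.rename(columns={price_col: "price"})[["date", "ticker", "price"]]
    # left-join so every headline keeps its same-day price for that ticker
    return news.merge(prices, on=["date", "ticker"], how="left")
```

Normalizing both sides before the join is what makes joins between separately supplied news and price Parquets succeed.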
In the pretraining stage, a Sentence‑Transformers encoder produces daily embeddings that are optionally projected to a target dimension. Daily aggregation can occur per ticker by summing headline embeddings, or across the market by pooling all same‑day headlines using sum, mean, or weighted sum. A causal decoder with positional encoding and a mask token predicts either masked positions (masking mode) or the next step (causal mode). The target network is an exponential moving average of the online encoder to stabilize training. During validation, the system computes correlations between predicted embeddings and financial indicators including multi‑period returns (1d, 2d, 3d, 5d), rolling volatility, momentum, mean‑reversion z‑scores, and price acceleration, each examined across positive and negative lags.
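The pooling and EMA mechanics can be sketched in a few lines of numpy; the names are illustrative, and the repo implements these inside encoders.py and the pretrainer with tensors rather than arrays:

```python
# Sketch of (1) pooling all same-day headline embeddings into one
# market-wide vector and (2) the EMA update of the target encoder.
import numpy as np

def pool_daily(embeddings, mode="sum", weights=None):
    """Pool [num_headlines, embed_dim] into a single [embed_dim] vector."""
    if mode == "sum":
        return embeddings.sum(axis=0)
    if mode == "mean":
        return embeddings.mean(axis=0)
    if mode == "weighted_sum":
        w = weights / weights.sum()               # normalize headline weights
        return (w[:, None] * embeddings).sum(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

def ema_update(target, online, momentum=0.996):
    """Target-encoder update: target <- m * target + (1 - m) * online."""
    return momentum * target + (1.0 - momentum) * online
```

With momentum 0.996, the target encoder tracks the online encoder slowly, which is what stabilizes the JEPA objective.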
In the portfolio stage, the observation comprises the most recent seq_len news embeddings per ticker, current prices, and the current portfolio vector of weights plus cash. The policy outputs a continuous vector that is transformed into buy and sell targets while enforcing minimum and maximum position sizes and applying a transaction cost. Cash and holdings are updated sequentially, and the reward is the clipped daily portfolio return. Equal‑weight Buy‑and‑Hold is computed as a benchmark. The policy backbone is a transformer over the [time, ticker, embed_dim] tensor, with portfolio and price features fused before policy and value heads. We implement SAC with optional temperature auto‑tuning and a DQN‑style Q‑learning variant, both available in subin/src/train/portfolio_rl.py.
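A simplified sketch of a single environment step under these rules follows; the actual environment in portfolio_rl.py updates cash and holdings sequentially per ticker, whereas here the rebalance is collapsed into one netted trade:

```python
# Illustrative one-step portfolio transition: charge a transaction cost
# proportional to turnover, credit asset returns, clip the daily reward.
import numpy as np

def step_portfolio(weights, new_weights, asset_returns,
                   cost_rate=0.001, clip=0.05):
    """Return (reward, next_weights). The last slot of weights is cash."""
    turnover = np.abs(new_weights - weights).sum()        # fraction traded
    cost = cost_rate * turnover                           # transaction cost
    gross = float(np.dot(new_weights[:-1], asset_returns))  # cash earns 0
    reward = float(np.clip(gross - cost, -clip, clip))    # clipped daily return
    return reward, new_weights
```

Moving fully from 50% cash into a single asset that returns 2% on the day, at a 0.1% cost rate, nets a reward of 0.019 under this sketch.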
Experiments use chronological splits by default. For fine‑tuning, the test window commonly spans from 2025‑01‑01 to 2025‑08‑31, while the remaining data is divided into training and validation by a configurable ratio. JEPA typically uses a projected embedding dimension of 64, a sequence length such as 32, and a causal objective trained with a cosine schedule and an EMA momentum of 0.996. The portfolio agent uses a transformer backbone with a model dimension of 256, several layers and heads, and SAC as the default algorithm, though Q‑learning is also supported. Replay capacity, batch sizes, the discount factor, daily return clipping, and transaction cost are parameterized in FineTuneConfig. To deal with large text corpora, we rely on news_sampling_ratio, max_news_per_day, and text_batch_size, and we use gradient accumulation to reduce memory pressure.
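The chronological protocol above can be sketched with an illustrative helper (the repo derives the split from FineTuneConfig fields rather than a standalone function):

```python
# Dates at or after test_start form the held-out test window; the rest is
# split into train and validation by a configurable ratio, preserving order.
def chronological_split(dates, test_start, train_ratio=0.8):
    dates = sorted(dates)
    test = [d for d in dates if d >= test_start]
    rest = [d for d in dates if d < test_start]
    cut = int(len(rest) * train_ratio)
    return rest[:cut], rest[cut:], test   # train, val, test
```

Because the split is purely by date order, no future information leaks from the test window into training.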
Pretraining yields checkpoints that include the online and target encoders and the temporal decoder. Validation reports mean squared error for the masking or causal objective together with comprehensive correlation diagnostics that relate embeddings to returns, volatility, momentum, mean‑reversion, and acceleration under different lags. Portfolio training evaluates multiple episodes on the test window, reports total return, annualized volatility, and Sharpe ratio, and saves plots that compare the agent to an equal‑weight Buy‑and‑Hold baseline. A metrics JSON is written to the run’s save directory, making it straightforward to compare different configurations.
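The reported metrics and the lagged correlation check can be sketched as follows; annualization by 252 trading days is an assumed convention, and the repo's exact choices may differ:

```python
# Minimal sketch of the evaluation metrics and the lead/lag diagnostic.
import numpy as np

def portfolio_metrics(daily_returns):
    """Total return, annualized volatility, and Sharpe from daily returns."""
    total_return = float(np.prod(1.0 + daily_returns) - 1.0)
    daily_std = daily_returns.std(ddof=1)
    ann_vol = float(daily_std * np.sqrt(252))
    sharpe = float(daily_returns.mean() / daily_std * np.sqrt(252))
    return {"total_return": total_return, "ann_vol": ann_vol, "sharpe": sharpe}

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]; lag may be negative."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return float(np.corrcoef(x, y)[0, 1])
```

Positive lags ask whether today's embedding anticipates future targets; negative lags check whether it merely reflects the past.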
Ablations can be performed by changing configuration only. Switching among sum, mean, and weighted sum for market‑wide pooling highlights how daily news aggregation affects performance. Comparing market‑wide against per‑ticker embeddings helps isolate the value of cross‑sectional context. Choosing masking versus causal pretraining illuminates the role of temporal prediction. Varying the sequence length and embedding dimension probes representational capacity. When memory becomes a bottleneck, reducing the fraction of daily headlines or capping their count provides a simple way to balance efficiency and accuracy.
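Since every ablation above is a configuration change, a sweep can be expressed as a plain grid over config fields; the field names mirror the flags described earlier, and the driver loop that consumes each config is left out:

```python
# Enumerate ablation configurations as plain dicts; each dict could be
# splatted into PreTrainConfig(**cfg) from train_config.py.
import itertools

grid = {
    "embedding_aggregation": ["sum", "mean", "weighted_sum"],
    "use_market_wide_embedding": [True, False],
    "seq_len": [16, 32],
}

configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
# 3 * 2 * 2 = 12 runs, differing only in configuration
```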
The environment models trading costs and allows cash, but does not fully capture market microstructure effects such as slippage and liquidity constraints. Representation quality may drift as news regimes change; while chronological splits and correlation‑aware validation help, they cannot remove the risk. Care is needed to prevent leakage between train and test windows, especially when headlines overlap. The ticker universe should be fixed prior to evaluation to avoid survivorship or selection biases. Finally, hyperparameters tuned on validation might not generalize, so consistent protocols and out‑of‑sample reporting remain essential.
Reproducibility is supported through explicit seeding and device selection. Seeds can be set for both stages, and deterministic flags are enabled where appropriate. Devices are chosen automatically from CUDA, Apple silicon (MPS), or CPU. Weights & Biases logging is optional and controlled via flags for the project and run name, which makes it easy to replicate and compare experiments.
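A hedged sketch of the seeding and device-selection logic described above (the repo's actual helpers may differ in details such as which deterministic flags are set):

```python
# Seed Python, numpy, and (when present) torch; pick the best device
# available in the order CUDA -> Apple silicon (MPS) -> CPU.
import random
import numpy as np

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)  # deterministic flags can be enabled here too
    except ImportError:
        pass  # torch-free environments still get Python/numpy determinism

def pick_device():
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"
```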
Run commands from the subin/ directory so that src is importable. Data can be prepared with the Tiingo helper script in src/data_prep/news.py using a secret.yaml that contains TIINGO_TOKEN, or by supplying your own Parquet files either combined via --dataset_path or separate via --news_path and --price_path. After data preparation, launch JEPA pretraining with python -m src.train.run_pretrainer, specifying the dataset location, aggregation mode, embedding dimension, sequence length, batch sizes, number of epochs, and the output directory, as in the following example:
python -m src.train.run_pretrainer \
--dataset_path data/multi_ticker_price_and_mainstream_news.parquet \
--use_market_wide_embedding=True \
--embedding_dim=128 \
--embedding_aggregation=sum \
--seq_len=16 \
--batch_size=4 \
--num_epochs=10 \
--save_dir outputs/jepa_news_market_wide

Once a checkpoint directory exists, train the portfolio agent with python -m src.train.run_train, providing the dataset path, JEPA checkpoint, sequence length, algorithm, output directory, and a test window to ensure out‑of‑sample evaluation, for example:
python -m src.train.run_train \
--dataset_path data/multi_ticker_price_and_mainstream_news.parquet \
--jepa_checkpoint_path outputs/jepa_news_market_wide \
--seq_len=16 \
--rl_algo=sac \
--save_dir outputs/rl_trader_market_wide \
--test_start_date 2025-01-01 --test_end_date 2025-08-31

Asset universes can be narrowed with --portfolio_assets or by setting --max_assets_in_portfolio. Weights & Biases can be disabled with --use_wandb False. When relying on separate news and price files, add --news_path and --price_path to the relevant stage. Some older scripts may mention run_fine_tune; the correct RL entry point here is subin/src/train/run_train.py.
Core components live under subin/src/train/. The JEPA pretraining loop is implemented in pretrainer.py with its command‑line interface in run_pretrainer.py. Portfolio reinforcement learning is implemented in portfolio_rl.py with its entry point in run_train.py. The portfolio_data.py module converts news and price tables into aligned tensors suitable for the RL environment. Sentence‑Transformers encoder construction and daily or market‑wide aggregation live in encoders.py, while the temporal decoder used for JEPA is defined in models.py. Configuration dataclasses for both stages are in train_config.py, and correlation diagnostics used during validation are implemented in financial_metrics.py. Memory guidance is documented in subin/MEMORY_OPTIMIZATION.md, and SLURM batch examples are provided in subin/job-scripts/.
Pretraining (Python):
from src.train.train_config import PreTrainConfig
cfg = PreTrainConfig(
    dataset_path='data/multi_ticker_price_and_mainstream_news.parquet',
    use_market_wide_embedding=True,
    embedding_dim=128,
    embedding_aggregation='sum',
    seq_len=16,
    news_sampling_ratio=0.5,
    max_news_per_day=50,
    save_dir='outputs/jepa_news_market_wide',
)

Fine‑tuning (Python):
from src.train.train_config import FineTuneConfig
cfg = FineTuneConfig(
    dataset_path='data/multi_ticker_price_and_mainstream_news.parquet',
    jepa_checkpoint_path='outputs/jepa_news_market_wide',
    rl_algo='sac',
    seq_len=16,
    save_dir='outputs/rl_trader_market_wide',
    test_start_date='2025-01-01',
    test_end_date='2025-08-31',
)

If module imports fail when running commands, verify that your working directory is subin/ so that the src package is on the path. If pretraining runs out of memory, shorten the sequence length, cap the number of headlines per day, or lower the sampling ratio so that fewer headlines are processed. If no training windows are created, check that the dataset contains enough unique dates for the chosen seq_len and that min_titles_per_day is not too restrictive. When using separate news and price files, ensure ticker casing is consistent and that date columns are parsed to timezone‑naive dates so that joins succeed.