Technical Documentation · Financial Risk Engine

News Stress Scorer

FinBERT Sentiment Analysis | Real-Time News Ingestion | Weighted Aggregation
A pipeline that reads a company from your database, finds its news, and returns a financial stress score between 0 and 1.
Given any company identifier (a code such as ALEPPL or a name such as Alpha Ecoplast Private Limited), the system pulls the company's profile from MongoDB, searches Google News for recent coverage, and classifies each snippet with FinBERT (ProsusAI/finbert), a BERT model fine-tuned on financial text. It then aggregates the per-snippet sentiment into a single weighted stress score.
// Example Output
{
  "company_code": "ALEPPL",
  "stress_score": 0.5202,
  "confidence": "medium",
  "key_drivers": [
    "CRISIL rating action…",
    "Starlinger PET line…"
  ],
  "news_used": 3,
  "articles": []
}
// 01
How It Works

The system runs four sequential steps, each feeding the next. If the news fetch returns nothing, scoring still runs on the MongoDB profile and CRISIL heading; only a failed company lookup stops the run with an error.

Step 1: MongoDB Lookup. Fetch the company profile by code or name from your financial knowledge graph.
Step 2: News Fetch. Query Google News RSS using the company name as the search term.
Step 3: FinBERT Scoring. Classify each snippet with FinBERT (positive / neutral / negative) and aggregate into a stress score.
Step 4: Output. Return the normalised stress score, confidence, drivers and summary.
Design choice

The company name is used directly as the news search query; there is no keyword matching or feed filtering. This favours articles that mention the company by name over loosely related sector news.

// 02
Data Sources

The scorer draws from two sources per company. Both are queried on every run — there is no caching.

🗄
MongoDB — finincial_kg.companies
Your existing database of 2,818 companies. Contains CRISIL ratings, MCA registration data, industry codes, listing status, capital structure and shareholding patterns. The CRISIL heading is always fed to FinBERT as an extra text snippet — even when no news is found, the model can score from the rating action text alone.
📰
Google News RSS
A free, unauthenticated RSS feed queried by company name. Returns up to 10 recent articles with title, source, publication date and summary snippet. Requires a browser-style User-Agent header — without it, Google returns an empty response.

The script tries two query strategies against Google News in order, stopping as soon as it gets results. If neither returns anything, it scores from MongoDB alone:

Primary: exact quoted name, e.g. "Alpha Ecoplast Private Limited"
Fallback: short name with legal suffix stripped, e.g. "Alpha Ecoplast"
No news: score entirely from the MongoDB profile and CRISIL heading
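
The query order above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the suffix list, the URL parameters (hl, gl, ceid) and the exact User-Agent string are all assumptions.

```python
import re
from urllib.parse import urlencode

# Assumed suffix list; the script's real list is not documented here.
_LEGAL_SUFFIX = re.compile(
    r"\s+(private\s+limited|pvt\.?\s+ltd\.?|limited|ltd\.?)\s*$", re.IGNORECASE
)


def build_queries(company_name):
    """Return search queries in the order they are tried."""
    queries = [f'"{company_name}"']  # primary: exact quoted name
    short = _LEGAL_SUFFIX.sub("", company_name).strip()
    if short and short != company_name:
        queries.append(short)  # fallback: legal suffix stripped
    return queries


def rss_url(query):
    """Build a Google News RSS search URL (edition parameters are assumed)."""
    params = {"q": query, "hl": "en-IN", "gl": "IN", "ceid": "IN:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)


def fetch_news(company_name, limit=10):
    """Try each query in turn and stop at the first one that returns entries."""
    import feedparser  # third-party, imported lazily
    import requests

    headers = {"User-Agent": "Mozilla/5.0"}  # browser-style header
    for query in build_queries(company_name):
        resp = requests.get(rss_url(query), headers=headers, timeout=10)
        entries = feedparser.parse(resp.content).entries[:limit]
        if entries:
            return [
                {
                    "title": e.get("title", ""),
                    "link": e.get("link", ""),
                    "source": e.get("source", {}).get("title", ""),
                }
                for e in entries
            ]
    return []  # no news: the caller scores from the MongoDB profile only
```

Returning an empty list rather than raising keeps the "No news" path a normal outcome, which matches the behaviour described above.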
// 03
Company Lookup Logic

You can identify a company in three ways. The system tries each field in order until it finds a match:

companyCode: exact match, the fastest path. e.g. ALEPPL
crisilName: case-insensitive regex match. e.g. alpha ecoplast
mcaName: case-insensitive regex match against the MCA registered name
Fallback: raises a clear error showing 10 sample codes from the collection
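
A minimal sketch of this lookup order, assuming a PyMongo-style collection object. The function names and the choice of LookupError are hypothetical; only the field names and the order come from the description above.

```python
import re


def lookup_filters(identifier):
    """Return MongoDB filters in the order they are tried."""
    ident = identifier.strip()
    pattern = re.escape(ident)  # treat the user input as literal text
    return [
        {"companyCode": ident},
        {"crisilName": {"$regex": pattern, "$options": "i"}},
        {"mcaName": {"$regex": pattern, "$options": "i"}},
    ]


def find_company(collection, identifier):
    """Return the first matching document, or raise with sample codes."""
    for query in lookup_filters(identifier):
        doc = collection.find_one(query)
        if doc is not None:
            return doc
    # Nothing matched: show a few valid codes to help the caller.
    cursor = collection.find({}, {"companyCode": 1}).limit(10)
    samples = [d.get("companyCode") for d in cursor]
    raise LookupError(f"No company matches {identifier!r}. Sample codes: {samples}")
```

Escaping the identifier is a design choice of this sketch: it prevents characters such as "(" or "." in a company name from being interpreted as regex syntax.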

The following fields from the matched document are used for building text snippets and enriching the final output:

crisilName / mcaName: official company name
listingStatus: Listed or Unlisted
mcaClass: Private / Public
mcaCompanyStatus: Active / Struck off etc.
industryName + nicCode: sector classification
mcaAuthorizedCapital: authorised capital in ₹
mcaPaidupCapital: paid-up capital, a key size signal
crisilHeading: most recent CRISIL rating line; the single most valuable field when no news exists
// 04
The Stress Score

The score is a single float between 0.0 (no stress) and 1.0 (severe distress), produced by running each text snippet through FinBERT (ProsusAI/finbert). FinBERT returns three probabilities per snippet: positive, neutral, and negative. Each snippet is mapped to a stress value using: stress = negative + 0.5 × neutral. A snippet classified as fully neutral therefore scores 0.5, and a fully negative one scores 1.0. The per-snippet scores are then aggregated via a weighted average in which news headlines get weight 2× and CRISIL/baseline snippets get 1×.
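
The formula and weighting can be illustrated with a short sketch. The function names, the (probs, is_news) input shape and the rounding to four decimals are assumptions made for illustration; the formula and the 2×/1× weights come from the text above.

```python
NEWS_WEIGHT = 2.0      # news headlines
BASELINE_WEIGHT = 1.0  # CRISIL / baseline snippets


def snippet_stress(probs):
    """Map FinBERT probabilities to stress: negative + 0.5 * neutral."""
    return probs["negative"] + 0.5 * probs["neutral"]


def aggregate(snippets):
    """Weighted average over (probs, is_news) pairs."""
    if not snippets:
        raise ValueError("at least one snippet is required")
    total = 0.0
    weight_sum = 0.0
    for probs, is_news in snippets:
        weight = NEWS_WEIGHT if is_news else BASELINE_WEIGHT
        total += weight * snippet_stress(probs)
        weight_sum += weight
    return round(total / weight_sum, 4)


# One headline (stress 0.8, weight 2) and one CRISIL line (stress 0.5, weight 1):
# (2 * 0.8 + 1 * 0.5) / 3 = 0.7
headline = {"positive": 0.1, "neutral": 0.2, "negative": 0.7}
crisil = {"positive": 0.2, "neutral": 0.6, "negative": 0.2}
score = aggregate([(headline, True), (crisil, False)])
```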

Band      Score Range  What it means
None      0.0 – 0.2    No stress. Positive news, routine operations, stable ratings.
Low       0.2 – 0.4    Minor operational friction. Small delays, management reshuffles at large companies.
Moderate  0.4 – 0.6    Noticeable issues. Leadership change at a small firm, downgrade watch, sector headwinds.
High      0.6 – 0.8    Significant concern. Rating downgrade, auditor resignation, payment delays, legal action.
Severe    0.8 – 1.0    Critical distress. Default, NCLT filing, insolvency proceedings, fraud investigation.
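
A small sketch of how a score might be mapped to its band. The table does not say which band owns a boundary value such as 0.2, so this sketch assumes the lower bound is inclusive; the function name is hypothetical.

```python
# (exclusive upper bound, label); boundary handling is an assumption.
BANDS = [
    (0.2, "None"),
    (0.4, "Low"),
    (0.6, "Moderate"),
    (0.8, "High"),
]


def band_for(score):
    """Return the band label for a stress score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for upper, label in BANDS:
        if score < upper:
            return label
    return "Severe"  # 0.8 up to and including 1.0
```

With this mapping, the example score 0.5202 falls in the Moderate band.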

The output also includes these supporting fields:

confidence: low / medium / high, based on article volume. ≥5 articles → high, ≥2 → medium, otherwise low.
key_drivers: text previews of the snippets with the highest stress values (stress > 0.4).
summary: auto-generated sentence with the band label, score, confidence, and article count.
news_used: integer count of articles analysed. Zero means the score is profile-only.
articles: array of objects with title, link, and source for each news article consumed.
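
The confidence rule and driver selection can be sketched as below. The thresholds come from the field descriptions above; the function names, the preview length and the cap on the number of drivers are assumptions.

```python
def confidence_for(news_used):
    """>=5 articles is high, >=2 is medium, otherwise low."""
    if news_used >= 5:
        return "high"
    if news_used >= 2:
        return "medium"
    return "low"


def key_drivers(scored_snippets, threshold=0.4, preview_chars=80, limit=3):
    """Pick previews of the most stressed snippets.

    scored_snippets is a list of (text, stress) pairs. preview_chars and
    limit are illustrative defaults, not documented values.
    """
    hits = [(stress, text) for text, stress in scored_snippets if stress > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # highest stress first
    return [text[:preview_chars] for _, text in hits[:limit]]
```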
// 05
How FinBERT Aggregation Works

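Each text snippet is independently classified by FinBERT. The per-snippet results are then combined into one final score by weighted average: news headlines carry weight 2 and CRISIL/baseline snippets carry weight 1. When no news is found, the score comes from the profile and CRISIL snippets alone.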
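
A minimal sketch of per-snippet classification using the Hugging Face transformers pipeline API. The helper names are hypothetical, and loading the model through pipeline() is an assumption; the script may load the tokenizer and model differently.

```python
def to_probs(label_scores):
    """Convert a list of {"label", "score"} dicts into one probability dict."""
    probs = {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    for item in label_scores:
        probs[item["label"].lower()] = float(item["score"])
    return probs


def classify(snippets):
    """Classify each snippet independently and return one dict per snippet."""
    from transformers import pipeline  # third-party, imported lazily

    # top_k=None asks the pipeline for scores of all three labels.
    clf = pipeline("text-classification", model="ProsusAI/finbert", top_k=None)
    results = clf(list(snippets), truncation=True)
    return [to_probs(result) for result in results]
```

Each returned dict carries the positive, neutral and negative probabilities that the stress formula in section 04 consumes.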

// 06
Full Output Structure

The script prints and returns a Python dictionary (serialisable to JSON) with the following shape:

// Full output schema
{
  "company_code": "ALEPPL",
  "company_name": "Alpha Ecoplast Private Limited",
  "listing_status": "Unlisted",
  "industry": "Other Textile Products",
  "stress_score": 0.5202,
  "confidence": "medium",
  "key_drivers": ["CRISIL rating action: …downgraded…", "Starlinger PET bottle-to-bottle…"],
  "summary": "Moderate financial stress detected for Alpha Ecoplast… (FinBERT score 0.52, medium confidence, 3 article(s) analysed).",
  "news_used": 3,
  "articles": [
    { "title": "…", "link": "https://…", "source": "…" }
  ],
  "scored_at": "2026-03-04T20:17:37+00:00"
}
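
One way such a dictionary might be assembled, shown as a sketch. The function name, its parameters, and the choice to prefer crisilName over mcaName for company_name are assumptions; the output keys and the timestamp format follow the schema above.

```python
import json
from datetime import datetime, timezone


def build_output(company, score, confidence, drivers, summary, articles):
    """Assemble the result dict from a company document and scoring results."""
    return {
        "company_code": company.get("companyCode"),
        # Assumption: prefer the CRISIL name, fall back to the MCA name.
        "company_name": company.get("crisilName") or company.get("mcaName"),
        "listing_status": company.get("listingStatus"),
        "industry": company.get("industryName"),
        "stress_score": round(score, 4),
        "confidence": confidence,
        "key_drivers": list(drivers),
        "summary": summary,
        "news_used": len(articles),
        "articles": [
            {"title": a["title"], "link": a["link"], "source": a["source"]}
            for a in articles
        ],
        # ISO 8601 in UTC, e.g. 2026-03-04T20:17:37+00:00
        "scored_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

Because every value is a plain string, number or list, the result can be passed straight to json.dumps.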
// 07
Running the Script

Install the dependencies, set your environment variables, then call the script with a company identifier as the first argument:

// Install dependencies
pip install pymongo feedparser requests transformers torch numpy

// Run with a company code
MONGO_URI="mongodb://host:27017" \
python3 news_data_fetcher_stress_mapper.py ALEPPL

// Or with a partial name
python3 news_data_fetcher_stress_mapper.py "Alpha Ecoplast"

All configuration can also be placed in a .env file and loaded with set -a && . .env && set +a before running. The script reads: MONGO_URI, DB_NAME, and COLLECTION_NAME. No LLM endpoint or API key is required.
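
Reading these three variables might look like the sketch below. The function name is hypothetical, and the fallback values for DB_NAME and COLLECTION_NAME are assumptions taken from the finincial_kg.companies collection named in section 02; the script may instead require all three to be set.

```python
import os


def load_config(env=None):
    """Read connection settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    uri = env.get("MONGO_URI")
    if not uri:
        raise RuntimeError("MONGO_URI is not set")
    return {
        "mongo_uri": uri,
        # Assumed defaults, based on the collection named in section 02.
        "db_name": env.get("DB_NAME", "finincial_kg"),
        "collection_name": env.get("COLLECTION_NAME", "companies"),
    }
```

Accepting a mapping as an argument keeps the function easy to exercise without touching the real environment.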

FinBERT — Offline Model

The scorer uses ProsusAI/finbert, a BERT model fine-tuned on financial text. The model weights (~438 MB) are downloaded automatically from Hugging Face on first run and cached locally. Scoring itself calls no external API: once the weights are cached, the only outbound internet request is the Google News RSS fetch, which can be removed for fully air-gapped operation.
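
For air-gapped runs, the cached weights can be loaded without any network access. The sketch below assumes the model was downloaded once on a connected machine; the function name and the offline flag are hypothetical, while local_files_only and the HF_HUB_OFFLINE environment variable are standard Hugging Face options.

```python
import os

MODEL_NAME = "ProsusAI/finbert"


def load_finbert(offline=False):
    """Load tokenizer and model, optionally from the local cache only."""
    if offline:
        # Tell the Hugging Face hub client not to attempt network calls.
        os.environ["HF_HUB_OFFLINE"] = "1"
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, local_files_only=offline)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, local_files_only=offline
    )
    model.eval()  # inference only
    return tokenizer, model
```

With offline=True, loading fails fast if the weights are not already cached instead of trying to download them.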