<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://shibaprasadb.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://shibaprasadb.github.io/" rel="alternate" type="text/html" /><updated>2026-04-14T06:17:21+00:00</updated><id>https://shibaprasadb.github.io/feed.xml</id><title type="html">Shibaprasad Bhattacharya</title><entry><title type="html">GenAI in Product : Means &amp;amp; Ends</title><link href="https://shibaprasadb.github.io/2026/03/25/genai-in-product-means-ends.html" rel="alternate" type="text/html" title="GenAI in Product : Means &amp;amp; Ends" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2026/03/25/genai-in-product-means-ends</id><content type="html" xml:base="https://shibaprasadb.github.io/2026/03/25/genai-in-product-means-ends.html"><![CDATA[<p>Most productivity debates around GenAI miss a distinction I think matters quite a bit. Using GenAI to write code is not the same as building a product where GenAI is the core feature. The gap is manageable at the prototype stage. Production is where it becomes a different problem entirely.</p>

<p><a href="https://ordinaryanalysis.substack.com/p/genai-in-product-means-ends">Read on Ordinary Analysis</a></p>]]></content><author><name></name></author><category term="technical" /><summary type="html"><![CDATA[The productivity debate has a blind spot. Not all gains are the same.]]></summary></entry><entry><title type="html">Reading the Numbers: How India Reads</title><link href="https://shibaprasadb.github.io/2026/02/18/how-india-reads.html" rel="alternate" type="text/html" title="Reading the Numbers: How India Reads" /><published>2026-02-18T00:00:00+00:00</published><updated>2026-02-18T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2026/02/18/how-india-reads</id><content type="html" xml:base="https://shibaprasadb.github.io/2026/02/18/how-india-reads.html"><![CDATA[<p>Guardian published an article on 9th Feb 2025. The post titled <a href="https://www.theguardian.com/global-development/2026/feb/09/books-india-literature-festivals-readers">“Most Indians don’t read for pleasure - so why does the country have 100 literature festivals?”</a> garnered a lot of attention (the title was changed later). <a href="https://www.outlookindia.com/culture-society/the-guardian-cant-question-profusion-of-lit-fests-india-reads-writes-and-celebrates-words">Outlook India</a> refuted some of the claims in their own way. And Anurag Minus Verma penned down a brilliant essay on <a href="https://www.theculturecafe.in/p/why-dont-indians-read-for-pleasure">why don’t Indians read for pleasure</a>.</p>

<p>I related a lot to Anurag Verma’s essay. He argued that most of our reading is utility-driven, i.e. we read because we have to.</p>

<p>Among all these debates and discussions, I felt we were missing one point: how is India different from other nations?</p>

<p>I don’t have a lot of foreign associates. But from what little I know, it didn’t seem that others were drowning themselves in books while India alone wasn’t reading. It is a contemporary issue that transcends borders, regions and seas.</p>

<p>So the question arose:</p>

<h2 id="how-do-indians-read-compared-to-other-countries">How do Indians read compared to other countries?</h2>

<p>And the only way we can answer this is through data.</p>

<p>I looked at this from two angles:</p>

<ul>
  <li>What is the buying pattern of different countries with respect to genre?</li>
  <li>How much, on average, is an Indian spending on trade books?</li>
</ul>

<p>Let’s dissect the first one:</p>

<p><img src="/images/posts/2026-02-18-how-india-reads/ranked_genres.png" alt="Revenue share for different genres" /></p>

<p>In terms of revenue share, Indians are spending far more on educational books than readers in the US, UK or Europe. So Anurag’s point is not completely wrong. And intuitively, that makes sense.</p>

<p>But now, let’s look at it from another angle: as a percentage of GDP, how much are Indians spending on trade books (i.e. excluding educational books)?</p>

<p><img src="/images/posts/2026-02-18-how-india-reads/trade_book_spending.png" alt="Trade book spending as a % of GDP" /></p>

<p>As a share of GDP, an average Indian is actually spending more on trade books than an average US or European citizen - but less than a UK citizen.</p>

<p>This shows that when it comes to buying non-educational books, Indians are more intentional compared to more developed nations. When you are spending more of your hard-earned money, especially in a country with a lower per-capita GDP, you need to be more intentional.</p>

<p>One methodological note: this analysis uses nominal GDP per capita as the denominator. Book prices vary across markets. A $10 US paperback might be ₹400-600 in India, depending on the publisher and format - but books are globally traded goods with some price convergence, unlike purely local services. Using nominal figures keeps the comparison straightforward, though a full <a href="https://en.wikipedia.org/wiki/Purchasing_power_parity">PPP adjustment</a> could be explored in future work.</p>

<p>As usual, reality is far from a clean black-and-white thing. It is much more nuanced. On average, Indians tend to read more for utility, but we also tend to spend more on non-utility books.</p>

<p>This also raises an interesting question: what happens when India’s per capita GDP rises? Will more surplus income lead to more spending on reading for pleasure?</p>

<hr />

<h3 id="notes--sources">Notes &amp; Sources</h3>

<p><strong>Trade books</strong> are commercially published books sold to the general public, excluding educational textbooks, academic journals, and professional reference materials. In this analysis, trade books = Fiction + Non-Fiction.</p>

<p><strong>Data note:</strong> All book market figures reflect 2024 actuals from industry sources (Horizon Databook, FEP, Nielsen). GDP per capita figures are from 2024-2025; US figures use 2025 IMF projections. Year-on-year differences are negligible (&lt;2%) and do not materially affect the conclusions.</p>

<p><strong>Calculation methodology:</strong> Trade book spending per capita was derived by multiplying total market size by the trade book revenue share (Fiction + Non-Fiction %), then dividing by population. This was then expressed as a percentage of nominal GDP per capita.</p>
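<p>To make the calculation concrete, here is a minimal Python sketch of the methodology above. The population figure is my own assumption (roughly 1.45 billion; the post does not state one), so treat the output as illustrative:</p>

```python
def trade_spend_pct_of_gdp(market_size_usd, trade_share, population, gdp_per_capita_usd):
    """Trade-book spending per capita, expressed as a % of nominal GDP per capita."""
    per_capita = market_size_usd * trade_share / population
    return per_capita, 100 * per_capita / gdp_per_capita_usd

# India, lower bound of the trade-share range (29%); population is an ASSUMPTION
per_capita, pct = trade_spend_pct_of_gdp(10.37e9, 0.29, 1.45e9, 2730)
print(f"${per_capita:.2f} per head, {pct:.2f}% of GDP per capita")
```

<p>Repeating this with each country’s upper and lower trade-share bounds reproduces the ranges in the table below.</p>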

<table>
  <thead>
    <tr>
      <th>Country</th>
      <th>Market Size</th>
      <th>Trade Share</th>
      <th>Trade Per Capita</th>
      <th>GDP Per Capita</th>
      <th>% of GDP</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>India</td>
      <td>$10.37B</td>
      <td>29–51%</td>
      <td>$2.05–$3.56</td>
      <td>$2,730</td>
      <td>0.08%–0.13%</td>
    </tr>
    <tr>
      <td>US</td>
      <td>$40.44B</td>
      <td>52–56%</td>
      <td>$61.76–$66.47</td>
      <td>$85,000</td>
      <td>0.07%–0.08%</td>
    </tr>
    <tr>
      <td>UK</td>
      <td>$8.94B</td>
      <td>65–75%</td>
      <td>$85.44–$98.53</td>
      <td>$56,000</td>
      <td>0.15%–0.18%</td>
    </tr>
    <tr>
      <td>Europe</td>
      <td>$29.03B</td>
      <td>51%</td>
      <td>$32.89</td>
      <td>$49,000</td>
      <td>0.07%</td>
    </tr>
  </tbody>
</table>

<p>The plot uses the mean of each range as the representative value.</p>

<hr />

<p><strong>India</strong></p>
<ul>
  <li>Market size: $10.37B (Horizon Databook) [1]</li>
  <li>Educational: 60.2% mean revenue share (Horizon [1], IBEF [2])</li>
  <li>Fiction: 17.5% revenue share, +30.7% YoY growth [1][2]</li>
  <li>Children’s: 9.8% revenue share [1]</li>
</ul>

<p><strong>United States</strong></p>
<ul>
  <li>Market size: $40.44B (Horizon Databook) [3]</li>
  <li>Fiction: 32.8% ($3.26B), +12.6% YoY (AAP) [4]</li>
  <li>Children’s: 24.7% [3]</li>
  <li>Educational: 19.8% [3]</li>
  <li>Non-Fiction: 19.2% ($2.88B), +1.3% YoY (AAP) [4]</li>
  <li>Religious/Professional: 3.5% [3]</li>
</ul>

<p><strong>United Kingdom</strong></p>
<ul>
  <li>Market size: £1.82B physical (Nielsen) [5]</li>
  <li>Fiction: 42.5% mean revenue share, record high, +18% YoY [6]</li>
  <li>Non-Fiction: 27.5% [7]</li>
  <li>Educational: 17.4% (Horizon) [8]</li>
  <li>Children’s: 12.6% [5][8]</li>
</ul>

<p><strong>Europe</strong></p>
<ul>
  <li>Market size: €24.9B (FEP) [9]</li>
  <li>Fiction: 27.5% [9]</li>
  <li>Educational: 23.3% mean (FEP [9], Horizon [10])</li>
  <li>Non-Fiction: 22.5% [9]</li>
  <li>Academic/Professional: 16.7% [9]</li>
  <li>Children’s: 14.6% [9]</li>
</ul>

<hr />

<p><strong>Citations</strong></p>

<p>[1] Grand View Research/Horizon Databook. India Books Market Size &amp; Outlook, 2025–2033. https://www.grandviewresearch.com/horizon/outlook/books-market/india</p>

<p>[2] IBEF. India’s Meteoric Rise as a Publishing Hub. https://www.ibef.org/blogs/india-s-meteoric-rise-as-a-publishing-hub</p>

<p>[3] Grand View Research/Horizon Databook. US Books Market Size &amp; Outlook, 2025–2033. https://www.grandviewresearch.com/horizon/outlook/books-market/united-states</p>

<p>[4] AAP via Publishing Perspectives. December StatShot: US Book Market Up 6.5% Year-to-Date. https://publishingperspectives.com/2025/03/aaps-december-statshot-us-market-up-6-5-percent-year-to-date/</p>

<p>[5] Nielsen BookData. Bestsellers &amp; trends in the UK &amp; Ireland in 2024. https://nielseniq.com/global/en/insights/commentary/2025/bestsellers-trends-in-the-uk-ireland-in-2024/</p>

<p>[6] Friedman, Jane. Book sales update: UK market. https://janefriedman.com/book-sales-update-uk-market/</p>

<p>[7] LoveReading. Fiction Book Sales in 2024 Top The Lot. https://www.lovereading.co.uk/blog/fiction-book-sales-in-2024-top-the-lot-while-non-fiction-lags-behind-9226</p>

<p>[8] Grand View Research/Horizon Databook. UK Books Market Size &amp; Outlook, 2025–2033. https://www.grandviewresearch.com/horizon/outlook/books-market/uk</p>

<p>[9] Federation of European Publishers (FEP). European Book Publishing Statistics 2024. https://www.fep-fee.eu/European-Book-Publishing-Statistics-2024</p>

<p>[10] Grand View Research/Horizon Databook. Europe Books Market Size &amp; Outlook, 2025–2033. https://www.grandviewresearch.com/horizon/outlook/books-market/europe</p>]]></content><author><name></name></author><category term="data-stories" /><summary type="html"><![CDATA[Utility, trade books, and what the numbers actually say about Indian reading habits.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shibaprasadb.github.io/images/posts/2026-02-18-how-india-reads/trade_book_spending.png" /><media:content medium="image" url="https://shibaprasadb.github.io/images/posts/2026-02-18-how-india-reads/trade_book_spending.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Tale of Two Contes</title><link href="https://shibaprasadb.github.io/2026/02/04/tale-of-two-contes.html" rel="alternate" type="text/html" title="The Tale of Two Contes" /><published>2026-02-04T00:00:00+00:00</published><updated>2026-02-04T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2026/02/04/tale-of-two-contes</id><content type="html" xml:base="https://shibaprasadb.github.io/2026/02/04/tale-of-two-contes.html"><![CDATA[<p>Antonio Conte has just finished his 2025/26 Champions League run with Napoli, and yet again, the “Scudetto King” looked like a “European Novice.” For those who follow football, it’s well known that he has an illustrious record when it comes to league matches, but falls short when it comes to Europe. Or at least there is a strong belief among many that he falls short.</p>

<p>As analytics professionals, it is our job not to blindly accept whatever the prevailing belief says. Rather, we should investigate, and then accept, reject, or refine it.</p>

<p>So I thought: what if we try to see what is actually happening here? In this blogpost, I will walk through my approach.</p>

<p>But before asking any questions, let’s have a look at the data. What is the gap between the Conte of the domestic leagues and the Conte of Europe?</p>

<p>For our analysis, we have also included Pep and Klopp (his Liverpool &amp; Dortmund spells), so that we can see how Conte compares with the other two and put things in more context.</p>

<p><img src="/images/posts/2026-02-04-tale-of-two-contes/tactical_fingerprints.png" alt="Winning percentage for the three managers" /></p>

<p>As we can see, there is a sizeable gap in win % for Conte compared to Klopp &amp; Pep.</p>

<p>So the question arises:</p>

<h2 id="is-this-a-fluke-or-statistically-significant">Is this a fluke or statistically significant?</h2>

<p>A simple two-proportion Z-test can tell us whether the Europe vs domestic league gap is just a fluke or statistically significant.</p>

<p>Turns out - it is highly significant!</p>

<p>The difference is significant even for Pep. If you think about it, that is expected: the UCL is a tougher competition, so we would expect some reduction in the winning percentage.</p>

<p>For Klopp, though, it is not significant.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────┬───────────────┬──────────────────┬─────────────┬──────────┬──────────────────────────────┐
│  Manager  │ UCL Win Rate  │ League Win Rate  │ Z-statistic │ P-value  │       Interpretation         │
├───────────┼───────────────┼──────────────────┼─────────────┼──────────┼──────────────────────────────┤
│   Conte   │     34.0%     │      60.5%       │    3.661    │ 0.000251 │ Highly significant difference│
├───────────┼───────────────┼──────────────────┼─────────────┼──────────┼──────────────────────────────┤
│ Guardiola │     62.4%     │      72.2%       │    2.540    │ 0.011086 │   Significant difference     │
├───────────┼───────────────┼──────────────────┼─────────────┼──────────┼──────────────────────────────┤
│   Klopp   │     56.9%     │      62.9%       │    1.161    │ 0.245760 │  No significant difference   │
└───────────┴───────────────┴──────────────────┴─────────────┴──────────┴──────────────────────────────┘
</code></pre></div></div>
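<p>For transparency, the test behind this table can be sketched in a few lines of Python. The win counts are back-computed from the rates and match totals quoted in this post (34.0% of 50 UCL games ≈ 17 wins; 60.5% of 618 league games ≈ 374 wins), so tiny rounding differences from the table are expected:</p>

```python
from math import sqrt, erfc

def two_prop_ztest(wins1, n1, wins2, n2):
    """Two-proportion Z-test with a pooled standard error; two-sided p-value."""
    p1, p2 = wins1 / n1, wins2 / n2
    pooled = (wins1 + wins2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, erfc(abs(z) / sqrt(2))  # two-sided p-value

# Conte: UCL (17 wins / 50 games) vs domestic league (374 wins / 618 games)
z, p = two_prop_ztest(17, 50, 374, 618)
```

<p>Running this for Conte gives z ≈ 3.66, in line with the table.</p>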

<h2 id="how-different-are-the-tactical-profiles">How different are the tactical profiles?</h2>

<p>In the first section, we looked at winning % only. Now, let’s examine the complete tactical profile - how the full distribution of wins, draws, and losses shifts between competitions.</p>

<p>The Chi-square test helps us understand if the entire result pattern changes, not just the win rate. And interestingly, even Klopp shows a significant difference here despite his Z-test being non-significant. This suggests his teams play differently in Europe (fewer draws, more decisive results), even though his overall win rate stays similar.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────┬───────────┬──────────┬────────────┬──────────────────────────────┐
│  Manager  │    χ²     │ P-value  │ Cramér's V │       Interpretation         │
├───────────┼───────────┼──────────┼────────────┼──────────────────────────────┤
│   Conte   │   14.136  │ 0.000852 │   0.145    │ Highly significant difference│
├───────────┼───────────┼──────────┼────────────┼──────────────────────────────┤
│ Guardiola │    6.441  │ 0.039928 │   0.092    │   Significant difference     │
├───────────┼───────────┼──────────┼────────────┼──────────────────────────────┤
│   Klopp   │   10.336  │ 0.005696 │   0.127    │   Significant difference     │
└───────────┴───────────┴──────────┴────────────┴──────────────────────────────┘
</code></pre></div></div>
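<p>The mechanics of this test are easy to reproduce. In the sketch below only Conte’s win counts come from this post; the draw/loss splits are hypothetical placeholders to show the shape of the calculation. A 2×3 competition-by-result table has df = 2, for which the chi-square survival function is simply exp(−χ²/2):</p>

```python
from math import exp, sqrt

def chi2_wdl(league_wdl, ucl_wdl):
    """Chi-square test of independence on a 2x3 (competition x W/D/L) table."""
    table = [league_wdl, ucl_wdl]
    n = sum(map(sum, table))
    col = [sum(row[j] for row in table) for j in range(3)]
    expected = [[sum(row) * col[j] / n for j in range(3)] for row in table]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(3))
    p_value = exp(-chi2 / 2)          # df = (2-1) * (3-1) = 2
    cramers_v = sqrt(chi2 / n)        # min(rows-1, cols-1) = 1
    return chi2, p_value, cramers_v

# Wins taken from the post; draws/losses are HYPOTHETICAL illustrative splits
chi2, p, v = chi2_wdl([374, 120, 124], [17, 12, 21])
```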

<p>So far, all three managers show some form of difference between their league and UCL performances. But here’s the key question: <strong>Is the difference abnormal, or is it just the “Elite Tax” of playing in Europe?</strong></p>

<h2 id="the-elite-tax-accounting-for-expected-difficulty">The Elite Tax: Accounting for Expected Difficulty</h2>

<p>Now, let’s switch from a frequentist lens to a Bayesian one.</p>

<p>Here’s the fundamental insight: UCL is harder than domestic leagues. You can’t really treat playing Everton and Real Madrid at the same level. So we should expect some performance drop when managers face Europe’s elite. The question isn’t “is there a difference?” but rather “is the difference larger than expected?”</p>

<h3 id="the-methodology-the-10-rope">The Methodology: The 10% ROPE</h3>
<p>I introduced a <strong>Region of Practical Equivalence (ROPE)</strong>, which I call the “Elite Tax.” I am granting every manager a “pardon” for a <strong>10% drop</strong> in win rate. If their win rate drops by 10% or less, we consider that a normal byproduct of elite competition.</p>

<p>I then ran 100,000 simulations for each manager to calculate the probability that their performance drop is <strong>abnormal</strong> (greater than 10%).</p>

<p>If a manager’s drop is within the 10% zone → <strong>Normal</strong> (performing as expected given increased difficulty)<br />
If a manager’s drop exceeds 10% → <strong>Abnormal</strong> (genuine UCL problem)</p>
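<p>The simulation itself is straightforward. Here is a minimal sketch for Conte using Python’s standard library; I assume uniform Beta(1, 1) priors (the post does not state which priors were used) and back-compute the win counts from the quoted rates and totals:</p>

```python
import random

def p_abnormal_drop(ucl_wins, ucl_n, lg_wins, lg_n, rope=0.10, sims=100_000, seed=42):
    """Posterior probability that the league-to-UCL win-rate drop exceeds the ROPE.
    Uses uniform Beta(1, 1) priors - an assumption, not stated in the post."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(sims):
        ucl_rate = rng.betavariate(ucl_wins + 1, ucl_n - ucl_wins + 1)
        lg_rate = rng.betavariate(lg_wins + 1, lg_n - lg_wins + 1)
        if lg_rate - ucl_rate > rope:
            exceed += 1
    return exceed / sims

# Conte: 17/50 in the UCL vs 374/618 in the league
p_abnormal_drop(17, 50, 374, 618)
```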

<h3 id="the-results">The Results</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────┬───────────┬────────────────────┬──────────────────────────────────┐
│  Manager  │ Mean Drop │ P(Exceeds 10% Tax) │         Interpretation           │
├───────────┼───────────┼────────────────────┼──────────────────────────────────┤
│   Conte   │   25.9%   │      98.76%        │ Abnormal UCL underperformance    │
├───────────┼───────────┼────────────────────┼──────────────────────────────────┤
│ Guardiola │    9.9%   │      48.36%        │ Normal (within expected range)   │
├───────────┼───────────┼────────────────────┼──────────────────────────────────┤
│   Klopp   │    6.1%   │      23.18%        │ Normal (well within range)       │
└───────────┴───────────┴────────────────────┴──────────────────────────────────┘
</code></pre></div></div>

<p><img src="/images/posts/2026-02-04-tale-of-two-contes/gap_distribution_chart.png" alt="Gap Analysis" /></p>

<h3 id="what-this-means">What This Means</h3>

<p><strong>Pep Guardiola: The Benchmark.</strong> 
Pep’s distribution is centered almost perfectly on our 10% “Elite Tax” line. With a 48% chance of exceeding that threshold, whether his drop crosses the line is essentially a coin flip. Mathematically, he is performing exactly how an elite manager should in a tougher competition.</p>

<p><strong>Jürgen Klopp: The Outlier (The Good Kind).</strong> 
Klopp’s entire distribution is tucked safely to the left of the 10% line. There is only a 23% chance that his drop-off is abnormal. But here’s the interesting part: his Z-test showed no significant win rate difference, yet his Chi-square test was significant. Why? Because his <em>tactical approach</em> shifts in Europe - fewer draws, more decisive results - even though his overall win rate stays similar. He doesn’t win less in the Champions League; he just plays differently. More high-stakes, all-or-nothing football. If anything, Klopp is competition-proof; his tactical identity survives the jump from domestic leagues to Europe better than Pep or Conte.</p>

<p><strong>Antonio Conte: The “Statistical Glitch.”</strong> 
Conte lives in a different zip code. With 98.7% certainty, the model confirms that his European drop-off is abnormal. Even after accounting for his smaller UCL sample size (which makes his curve wider), there is almost zero overlap between his reality and that of Pep or Klopp.</p>

<p><strong>How to read the plot:</strong> Don’t just look at the peaks of the curves. Look at where they don’t overlap. Pep and Klopp’s performance distributions live in the same neighborhood. Conte’s distribution doesn’t even have a view of their street.</p>

<h2 id="the-verdict">The Verdict</h2>

<p>Statistics confirm what the eye test suggested: <strong>Only Conte has a genuine UCL problem.</strong></p>

<p>While all three managers show some performance difference between competitions (as they should - the UCL is harder), only Conte’s gap is abnormal after accounting for the increased difficulty. Pep and Klopp are performing within the expected range for elite managers facing elite opposition.</p>

<p>The “Tale of Two Contes” is a real, quantifiable pattern that has persisted across 50 Champions League matches and 618 league games. Whether it’s tactical inflexibility in knockout rounds, squad depth issues, or something else entirely - the numbers make one thing clear: domestic dominance doesn’t automatically translate to European success.</p>

<hr />

<p>Link to the notebook: https://github.com/shibaprasadb/datasignal/blob/main/conte_record/conte_record.ipynb</p>]]></content><author><name></name></author><category term="data-stories" /><summary type="html"><![CDATA[Antonio Conte has just finished his 2025/26 Champions League run with Napoli, and yet again, the “Scudetto King” looked like a “European Novice.”]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shibaprasadb.github.io/images/posts/2026-02-04-tale-of-two-contes/gap_distribution_chart.png" /><media:content medium="image" url="https://shibaprasadb.github.io/images/posts/2026-02-04-tale-of-two-contes/gap_distribution_chart.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">From 75% to 99.6%: The Math of LLM Ensembles</title><link href="https://shibaprasadb.github.io/2026/01/20/llm-ensemble.html" rel="alternate" type="text/html" title="From 75% to 99.6%: The Math of LLM Ensembles" /><published>2026-01-20T00:00:00+00:00</published><updated>2026-01-20T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2026/01/20/llm-ensemble</id><content type="html" xml:base="https://shibaprasadb.github.io/2026/01/20/llm-ensemble.html"><![CDATA[<p>The last project I worked on involved a lot of LLM API calls. One subtask seemed simple: count elements from a specific list. Straightforward, right? Not quite.</p>

<p>This needed production-level accuracy. But the simple API approach wasn’t cutting it. After testing 50 cases, I was only hitting around 75% accuracy (37 out of 50). For production, that’s a non-starter.</p>

<h2 id="the-problem-with-single-api-calls">The Problem with Single API Calls</h2>

<p>The LLM was doing the task correctly for some instances but missing elements in others. Sometimes it would catch all 10 items, other times only 7 or 8. The pattern was clear: when it failed, it undercounted. It never hallucinated extra elements or went above the true count. It just missed things.</p>

<p>This directional bias turned out to be the key insight.</p>

<h2 id="so-i-random-forest-it">So I “Random Forest” It</h2>

<p>I decided to apply the “wisdom of crowds” principle. The same concept that makes Random Forest work. Instead of relying on a single API call, use multiple calls and aggregate intelligently.</p>

<p>The evaluation rule was simple: <strong>Max(API_call_1, API_call_2, …, API_call_n)</strong></p>

<p>Example: If there are 10 elements and three API calls return [7, 10, 3], the final output is 10.</p>

<p>Why this works: The undercounting errors get filtered out. The max function naturally finds the correct answer as long as at least one call succeeds. Since the LLM never overcounts, the highest value is almost always the right one.</p>

<p>Here’s how the two approaches compare:</p>

<p><img src="https://shibaprasadb.github.io/images/posts/2026-01-20-llm-ensemble/flow_LLM_Ensemble.jpg" alt="LLM Ensemble Flow" /></p>
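<p>In code, the ensemble is only a few lines. The sketch below uses a simulated <code>call_llm</code> - a hypothetical stand-in, not the real API - that succeeds about 75% of the time and undercounts otherwise, mimicking the failure mode described above:</p>

```python
import random

def call_llm(prompt: str, rng: random.Random) -> int:
    # HYPOTHETICAL stand-in for the real API call: returns the true count (10)
    # ~75% of the time, otherwise undercounts - it never overcounts.
    return 10 if rng.random() < 0.75 else rng.randint(6, 9)

def ensemble_count(prompt: str, n: int = 4, seed: int = 0) -> int:
    """Make n independent calls and keep the max, filtering out undercounts."""
    rng = random.Random(seed)
    return max(call_llm(prompt, rng) for _ in range(n))
```

<p>With n = 4, a run only fails when all four calls undercount at once.</p>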

<h2 id="the-math-behind-it">The Math Behind It</h2>

<p>With a single API call, the question is: What’s the probability of success?</p>

<p>With ensemble, it becomes: What’s the probability of at least one success?</p>

<p>The math changes drastically:</p>

<p><strong>P(at least one correct) = 1 - P(all calls wrong)</strong></p>

<p>For n=3 calls with p=0.75 success rate:</p>
<ul>
  <li>P(all wrong) = (1-p)ⁿ = 0.25³ = 0.015625</li>
  <li>P(at least one correct) = 1 - 0.015625 = <strong>98.4%</strong></li>
</ul>

<p>Going from 75% to 98.4% with just 3 calls? Not bad at all.</p>
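<p>The same arithmetic as a two-line function, evaluated for n = 1 through 5:</p>

```python
def ensemble_accuracy(p: float, n: int) -> float:
    # P(at least one correct) = 1 - P(all n calls wrong)
    return 1 - (1 - p) ** n

# Per-call success rate of 0.75, for 1 to 5 calls
for n in range(1, 6):
    print(f"n={n}: {ensemble_accuracy(0.75, n):.1%}")
```
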

<h2 id="finding-the-sweet-spot">Finding the Sweet Spot</h2>

<p>But I couldn’t just pick any number. Each API call costs money and adds latency. I needed to balance accuracy against cost.</p>

<p>Here’s how the numbers break down:</p>

<table>
  <thead>
    <tr>
      <th>n calls</th>
      <th>Accuracy</th>
      <th>Cost Multiplier</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>75.0%</td>
      <td>1x</td>
    </tr>
    <tr>
      <td>2</td>
      <td>93.8%</td>
      <td>2x</td>
    </tr>
    <tr>
      <td>3</td>
      <td>98.4%</td>
      <td>3x</td>
    </tr>
    <tr>
      <td>4</td>
      <td>99.6%</td>
      <td>4x</td>
    </tr>
    <tr>
      <td>5</td>
      <td>99.9%</td>
      <td>5x</td>
    </tr>
  </tbody>
</table>

<p><img src="https://shibaprasadb.github.io/images/posts/2026-01-20-llm-ensemble/APICall_success_rate.png" alt="Success Rate" /></p>

<p><img src="https://shibaprasadb.github.io/images/posts/2026-01-20-llm-ensemble/APICall_cost_accuracy.png" alt="Cost vs Accuracy" /></p>

<p>The diminishing returns kick in hard after n=3. Going from 98.4% to 99.6% costs an entire extra API call for just 1.2 percentage points. But for production-level reliability, I decided that extra margin was worth it.</p>

<p><strong>I settled on n=4: 99.6% accuracy at 4x the cost.</strong></p>

<h2 id="when-this-breaks">When This Breaks</h2>

<p>This approach only works because my LLM had a directional bias (undercounting, in this case). The evaluation function must match your error pattern:</p>

<ul>
  <li><strong>Undercounting errors</strong> → Use Max()</li>
  <li><strong>Overcounting errors</strong> → Use Min()</li>
  <li><strong>Random errors</strong> (sometimes high, sometimes low) → Use majority voting with odd n (3, 5, 7) to avoid ties</li>
</ul>
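<p>The three strategies above can be captured in one dispatch function (a sketch — the error-pattern labels are just illustrative names, not a standard API):</p>

```python
from statistics import mode

def aggregate(results: list[int], error_pattern: str) -> int:
    # Match the aggregator to the observed failure direction.
    if error_pattern == "undercount":
        return max(results)  # one-sided low errors: take the max
    if error_pattern == "overcount":
        return min(results)  # one-sided high errors: take the min
    # Random errors: majority vote; use an odd n to avoid ties.
    return mode(results)
```

<p>For example, <code>aggregate([7, 10, 3], "undercount")</code> returns 10, while the same results under a majority vote would be a three-way tie — which is exactly why the aggregator has to match the failure mode.</p>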

<p>The key is understanding <em>how</em> your model fails, not just <em>that</em> it fails. Directional biases can take many forms - summarization models that are consistently too brief, classifiers that favor certain categories, extractors that miss edge cases. Each needs its own aggregation strategy.</p>

<p>If you don’t understand your failure mode, you’re just burning money on redundant calls.</p>

<p>(Now that I think about it, designing eval functions deserves a separate blog post.)</p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>Sometimes the best solution isn’t a better prompt or a bigger model. It’s understanding your failure mode and exploiting it mathematically.</p>

<p>A single API call gave me 75% accuracy. Four calls with a simple Max() aggregator got me to 99.6%. Same model, same prompt. Just a smarter approach.</p>

<p>The real lesson? When you can’t improve the model’s performance, improve how you use it. In a constrained space, solving a problem becomes more interesting.</p>]]></content><author><name></name></author><category term="technical" /><summary type="html"><![CDATA[The last project I worked on involved a lot of LLM API calls. One subtask seemed simple: count elements from a specific list. Straightforward, right? Not quite. This needed production-level accuracy. But the simple API approach wasn’t cutting it. After testing 50 cases, I was only hitting around ~75% accuracy (37 out of 50). For production, that’s a non-starter. The Problem with Single API Calls The LLM was doing the task correctly for some instances but missing elements in others. Sometimes it would catch all 10 items, other times only 7 or 8. The pattern was clear: when it failed, it undercounted. It never hallucinated extra elements or went above the true count. It just missed things. This directional bias turned out to be the key insight. So I “Random Forest” It I decided to apply the “wisdom of crowds” principle. The same concept that makes Random Forest work. Instead of relying on a single API call, use multiple calls and aggregate intelligently. The evaluation rule was simple: Max(API_call_1, API_call_2, …, API_call_n) Example: If there are 10 elements and three API calls return [7, 10, 3], the final output is 10. Why this works: The undercounting errors get filtered out. The max function naturally finds the correct answer as long as at least one call succeeds. Since the LLM never overcounts, the highest value is almost always the right one. Here’s how the two approaches compare: The Math Behind It With a single API call, the question is: What’s the probability of success? With ensemble, it becomes: What’s the probability of at least one success? 
The math changes drastically: P(at least one correct) = 1 - P(all calls wrong) For n=3 calls with p=0.75 success rate: P(all wrong) = (1-p)ⁿ = 0.25³ = 0.015625 P(at least one correct) = 1 - 0.015625 = 98.4% Going from 75% to 98.4% with just 3 calls? Not bad at all. Finding the Sweet Spot But I couldn’t just pick any number. Each API call costs money and adds latency. I needed to balance accuracy against cost. Here’s how the numbers break down: n calls Accuracy Cost Multiplier 1 75.0% 1x 2 93.8% 2x 3 98.4% 3x 4 99.6% 4x 5 99.9% 5x The diminishing returns kick in hard after n=3. Going from 98.4% to 99.6% costs an entire extra API call for just 1.2 percentage points. But for production-level reliability, I decided that extra margin was worth it. I settled on n=4: 99.6% accuracy at 4x the cost. When This Breaks This approach only works because my LLM had a directional bias (like undercounting - in this case). The evaluation function must match your error pattern: Undercounting errors → Use Max() Overcounting errors → Use Min() Random errors (sometimes high, sometimes low) → Use majority voting with odd n (3, 5, 7) to avoid ties The key is understanding how your model fails, not just that it fails. Directional biases can take many forms - summarization models that are consistently too brief, classifiers that favor certain categories, extractors that miss edge cases. Each needs its own aggregation strategy. If you don’t understand your failure mode, you’re just burning money on redundant calls. (Now that I think about it, we can dedicate a separate blog post on designing Eval functions) The Takeaway Sometimes the best solution isn’t a better prompt or a bigger model. It’s understanding your failure mode and exploiting it mathematically. A single API call gave me 75% accuracy. Four calls with a simple Max() aggregator got me to 99.6%. Same model, same prompt. Just a smarter approach. The real lesson? When you can’t improve the model’s performance, improve how you use it. 
In a constrained space, solving a problem becomes more interesting.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shibaprasadb.github.io/images/posts/2026-01-20-llm-ensemble/flow_LLM_Ensemble.jpg" /><media:content medium="image" url="https://shibaprasadb.github.io/images/posts/2026-01-20-llm-ensemble/flow_LLM_Ensemble.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Curious Case of West Bengal’s Disappearing Class 12 Students</title><link href="https://shibaprasadb.github.io/2025/10/04/class12-dropouts.html" rel="alternate" type="text/html" title="The Curious Case of West Bengal’s Disappearing Class 12 Students" /><published>2025-10-04T00:00:00+00:00</published><updated>2025-10-04T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2025/10/04/class12-dropouts</id><content type="html" xml:base="https://shibaprasadb.github.io/2025/10/04/class12-dropouts.html"><![CDATA[<p>Recently, I came across some surprising statistics. In West Bengal, almost <strong>3 lakh fewer students</strong> appeared for this year’s (2025) Class 12 final exam compared to previous years. The figure sounded shocking to me. The <a href="https://timesofindia.indiatimes.com/city/kolkata/hs-numbers-drop-by-1/3rd-from-last-year/articleshow/118586088.cms">board secretary</a> attributed this to a rule introduced in 2017 that mandated students be at least 10 years old in Class 5.</p>

<p>Still, the 3 lakh number is huge - especially when we are far from falling below replacement-level TFR. In any case, I got curious and wanted to explore what is happening in other states.</p>

<hr />

<h2 id="state-selection">State Selection</h2>

<p>Taking a look at 28 states would increase the noise more than the signal. And I don’t intend to publish a comprehensive comparative study of all the states. So, I cherry-picked a few, just to see how they’re doing:</p>

<ul>
  <li><strong>Tamil Nadu</strong>: Often cited as a model state for educational outcomes. Good to have them as a benchmark.</li>
  <li><strong>Bihar</strong>: Eastern Indian state, West Bengal’s neighbor. Most populous state in the East. The lowest per capita income in the country.</li>
  <li><strong>Maharashtra</strong>: Western Indian state. Highest state GDP. Mix of urban and rural demographics.</li>
  <li><strong>Haryana</strong>: Northern Indian state. 7th richest in terms of per capita income.</li>
  <li><strong>Uttar Pradesh</strong>: Bihar’s neighbor. 2nd most populous state overall. Similar socio-economic profile to Bihar in many ways.</li>
</ul>

<p>This selection gives us a mix of geographical regions, economic profiles, and educational performance levels - enough to see if West Bengal’s trend is an outlier or part of a broader pattern.</p>

<p>Let’s have a look at the absolute numbers - and their trends.</p>

<p><em>(Note: the numbers are quite scattered, so I had to collate them from several sources. Directionally, they should be quite accurate.)</em></p>

<hr />

<h2 id="absolute-numbers-trends-from-20212025">Absolute Numbers: Trends from 2021–2025</h2>

<p><img src="/images/posts/2025-10-04-class12-droput/students_by_state_facets.png" alt="Students by State" /></p>

<p><strong>What stands out:</strong></p>

<p>For Haryana and West Bengal, there has been a decline for 2 consecutive years, while the others have stayed broadly consistent. For UP, the pattern is interesting - there’s some volatility, but it is still stable. No sharp decline like West Bengal.</p>

<p>The West Bengal drop is particularly dramatic: from approximately 800K students in 2021 to around 470K in 2025. That’s a <strong>41% decline</strong>. Haryana shows a similar downward trend, though less severe - from about 230K to 195K (a 15% drop).</p>

<p>Maharashtra, Tamil Nadu, Bihar, and Uttar Pradesh remain relatively stable, with only minor fluctuations around their baseline numbers.</p>

<hr />

<h2 id="summary-year-on-year-changes">Summary: Year-on-Year Changes</h2>

<table>
  <thead>
    <tr>
      <th>State/Board</th>
      <th>2021</th>
      <th>2022</th>
      <th>2023</th>
      <th>2024</th>
      <th>2025</th>
      <th>Change (2021–2025)</th>
      <th>% Change</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Uttar Pradesh (UPMSP)</td>
      <td>26,10,000</td>
      <td>24,10,971</td>
      <td>27,69,000</td>
      <td>24,53,000</td>
      <td>26,91,000</td>
      <td>+81,000</td>
      <td>+3.1%</td>
    </tr>
    <tr>
      <td>Maharashtra (MSBSHSE)</td>
      <td>15,75,752</td>
      <td>15,68,977</td>
      <td>15,29,096</td>
      <td>15,49,326</td>
      <td>15,98,553</td>
      <td>+22,801</td>
      <td>+1.4%</td>
    </tr>
    <tr>
      <td>Bihar (BSEB)</td>
      <td>13,40,000</td>
      <td>13,56,000</td>
      <td>13,04,000</td>
      <td>12,91,000</td>
      <td>12,92,000</td>
      <td>-48,000</td>
      <td>-3.6%</td>
    </tr>
    <tr>
      <td>Tamil Nadu (TNBSE)</td>
      <td>8,18,000</td>
      <td>8,06,000</td>
      <td>8,00,000</td>
      <td>7,61,000</td>
      <td>7,92,000</td>
      <td>-26,000</td>
      <td>-3.2%</td>
    </tr>
    <tr>
      <td>West Bengal (WBCHSE)</td>
      <td>8,00,000</td>
      <td>7,21,000</td>
      <td>8,25,000</td>
      <td>7,55,000</td>
      <td>4,74,000</td>
      <td>-3,26,000</td>
      <td>-40.8%</td>
    </tr>
    <tr>
      <td>Haryana (HBSE)</td>
      <td>2,28,000</td>
      <td>2,46,000</td>
      <td>2,63,000</td>
      <td>2,14,000</td>
      <td>1,94,000</td>
      <td>-34,000</td>
      <td>-14.9%</td>
    </tr>
  </tbody>
</table>

<p>The table makes it clear: <strong>West Bengal’s decline is not just steep, it’s an outlier.</strong> No other state comes close to this magnitude of drop.</p>
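<p>As a quick sanity check, the last two columns can be recomputed from the 2021 and 2025 figures (numbers copied straight from the table, converted from Indian to Western digit grouping):</p>

```python
# (2021, 2025) Class 12 exam takers by state board, from the table above
students_2021_2025 = {
    "Uttar Pradesh (UPMSP)": (2_610_000, 2_691_000),
    "Maharashtra (MSBSHSE)": (1_575_752, 1_598_553),
    "Bihar (BSEB)": (1_340_000, 1_292_000),
    "Tamil Nadu (TNBSE)": (818_000, 792_000),
    "West Bengal (WBCHSE)": (800_000, 474_000),
    "Haryana (HBSE)": (228_000, 194_000),
}

for state, (y2021, y2025) in students_2021_2025.items():
    change = y2025 - y2021
    pct = 100 * change / y2021
    print(f"{state}: {change:+,} ({pct:+.1f}%)")
```
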

<hr />

<h2 id="normalizing-by-population">Normalizing by Population</h2>

<p>Let’s take a look at the number of exam takers per 1,000 population:</p>

<p><img src="/images/posts/2025-10-04-class12-droput/12students_per1k.png" alt="Students per 1,000 Population" /></p>

<p><strong>Data Signals:</strong></p>

<p>Maharashtra maintains the highest rate throughout (around 12–13 students per 1,000 population), suggesting either better retention rates or favorable demographics. West Bengal’s rate drops dramatically from about 8 to under 5 per 1,000 population - confirming that this isn’t just a population effect, but a real decline in participation rates.</p>

<p>This chart reveals interesting patterns, but comes with a caveat. This has an implicit assumption: the proportion of the eligible age group (16-18 year-olds) within the total population is uniform across states. Which, obviously, is not true. States with younger populations will have proportionally more children under 10 and fewer teenagers, while states further along in demographic transition will have a higher share of the 16-18 cohort. But we don’t have more granular data readily available. At least, I couldn’t find one. Ideally, we should be looking at the base of 16-20 or 15-19 age cohorts. That would give us a better idea about the “eligible” group.</p>

<hr />

<h2 id="another-proxy-youth-population-base">Another Proxy: Youth Population Base</h2>

<p>The closest we can get to the ideal metric is the population of the 0–14 age group in 2021, which was published by <a href="https://www.dataforindia.com/age-distribution-states/#:~:text=As%20of%202021%2C%20Kerala%20was,Age-groups%20in%20Indian%20states">Data for India</a>. This can serve as a reasonable proxy for the eligible age group in 2025.</p>

<p>What if we try to see the number of students appearing per 1,000 youth?</p>

<p><img src="/images/posts/2025-10-04-class12-droput/12students_1000youth.png" alt="Students per 1,000 Youth Population" /></p>

<p>What this reveals:</p>

<p>It looks far worse for West Bengal. But this is a guesstimate at best. Here, ‘Youth’ is proxied by the 0–14 population share in 2021. By 2025, this group spans roughly ages 4–18. It’s not a perfect match to the Class 12 eligible cohort (≈17–18 years), but directionally it reflects the size of the feeder base.</p>

<p>Maharashtra leads with 58.0 students per 1,000 youth, followed by Tamil Nadu (50.0) and Uttar Pradesh (37.7). West Bengal sits at just 22.6 - trailing behind Haryana (26.7).</p>

<p>This stark difference (58.0 for Maharashtra vs 22.6 for West Bengal) suggests one of two things:</p>

<ul>
  <li>Massive dropout rates between lower and upper secondary levels in West Bengal</li>
  <li>Demographic differences - West Bengal might have a younger age structure with proportionally more children in the 0–10 range than the 10–14 range</li>
</ul>

<p>Most likely, it’s a combination of both.</p>

<hr />

<h2 id="what-could-be-happening">What Could Be Happening?</h2>

<h3 id="the-2017-policy-and-the-overage-bulge">The 2017 Policy and the Overage Bulge</h3>

<p>If the age mandate was introduced in 2017, requiring students to be 10 years old in Class 5, those students would hit Class 12 around 2024–2025. The timing checks out perfectly.</p>

<p>But here’s a critical question: Was there a bulge of overage students in earlier years that’s now correcting?</p>

<p>In many Indian states, particularly in rural areas, it’s not uncommon for children to start school late or repeat grades. If West Bengal had a significant population of unqualified students in the system pre-2017, the 2021–2023 cohorts might have represented this bulge working its way through. The 2024–2025 drop could then be the system “normalizing” to age-appropriate enrollments.</p>

<p>This would explain why the drop is so dramatic - it’s not just one year’s worth of students, but potentially 2–3 years’ worth of overage students who would have been in the system under the old regime.</p>

<hr />

<h3 id="why-haryana">Why Haryana?</h3>

<p>Haryana’s decline is notable but less discussed. I haven’t found evidence of a similar age-mandate policy there. This warrants investigation. Possible factors could include:</p>

<ul>
  <li>Migration patterns (families moving for work)</li>
  <li>Shift to private schooling or alternative examination boards (CBSE, ICSE)</li>
  <li>Economic factors affecting school retention</li>
</ul>

<hr />

<h2 id="future-work">Future Work</h2>

<p>This exploratory analysis raises more questions than it answers (which has been my goal for this newsletter anyway):</p>

<ul>
  <li><strong>Granular age-cohort data</strong>: Getting actual 15–19 or 16–20 population data by state would dramatically improve the accuracy of per-capita calculations.</li>
  <li><strong>Covid’s shadow</strong>: Does the pandemic have something to do with this? The 2024-2025 cohort would have been in Classes 9-10 during 2020-2021 (peak Covid years). If the pandemic disproportionately affected rural schooling (due to a lack of digital infrastructure), we might see this reflected in the numbers. Breaking down the data by urban-rural divide would help determine if Covid-induced dropouts are part of the story.</li>
  <li><strong>Pre-2017 enrollment patterns</strong>: Analyzing the age distribution of students in West Bengal’s secondary schools from 2015–2020 would reveal if there was indeed an overage bulge.</li>
  <li><strong>Dropout analysis</strong>: Where exactly are students dropping out? Between Class 8–10? Or 10–12? State-level progression ratios would be illuminating.</li>
  <li><strong>Cross-board comparison</strong>: Many students in urban areas take CBSE/ICSE boards instead of state boards. Are West Bengal’s numbers declining while CBSE enrollment is rising?</li>
  <li><strong>Haryana deep-dive</strong>: Understanding what’s driving Haryana’s decline could reveal factors beyond policy changes - economic trends, migration patterns, or shifts in educational preferences.</li>
  <li><strong>Long-term tracking</strong>: Will West Bengal’s numbers stabilize at this new lower level, or continue to decline? Data from 2026–2027 will be crucial.</li>
</ul>

<hr />

<h2 id="conclusion">Conclusion</h2>

<p>West Bengal’s ~37% decline in Class 12 exam takers from 2024 to 2025 is unprecedented among major Indian states. While the 2017 age-mandate policy provides a plausible explanation - a correction after years of unqualified students in the system - the sheer magnitude demands deeper investigation.</p>

<p>As someone from West Bengal, these numbers are both surprising and worrying. They point to deeper structural issues in educational access, retention, or migration that go beyond a single policy change. Whether this is a one-time correction or the beginning of a longer-term trend will become clearer in the years ahead.</p>

<hr />

<p><em>If you found this worth your time, please subscribe to <strong>The Data Signal</strong> - it’s free. I explore data, AI, analytics, and strategy, tackling interesting questions that don’t have obvious answers. It would mean the world to me, knowing that someone is finding value in my work.</em></p>]]></content><author><name></name></author><category term="data-stories" /><summary type="html"><![CDATA[Recently, I came across some surprising statistics. In West Bengal, for the final exam of Class 12, this year (2025) almost 3 lakhs fewer students appeared compared to previous years. The figure sounded mind-bogglingly shocking to me. The board secretary highlighted that this happened because they brought in a rule in 2017 - that mandated students need to be 10 years old in Class 5. Still, the 3 lakh number is huge. Especially when we are far from hitting below the replacement level TFR. In any case, I got curious and wanted to explore what is happening with other states. State Selection Taking a look at 28 states would increase the noise more than the signal. And I don’t intend to publish a comprehensive comparative study of all the states. So, I cherry-picked a few, just to see how they’re doing: Tamil Nadu: Often cited as a model state for educational outcomes. Good to have them as a benchmark. Bihar: Eastern Indian state, West Bengal’s neighbor. Most populous state in the East. The lowest per capita income in the country. Maharashtra: Western Indian state. Highest state GDP. Mix of urban and rural demographics. Haryana: Northern Indian state. 7th richest in terms of per capita income. Uttar Pradesh: Bihar’s neighbor. 2nd most populous state overall. Similar socio-economic profile to Bihar in many ways. This selection gives us a mix of geographical regions, economic profiles, and educational performance levels - enough to see if West Bengal’s trend is an outlier or part of a broader pattern. Let’s have a look at the absolute numbers - and their trends. 
(Note: the numbers are quite decentralized. So I had to collate them from several sources. Directionally, this should be quite accurate.) Absolute Numbers: Trends from 2021–2025 What stands out: For Haryana and West Bengal, there has been a decline for 2 consecutive years. But others have shown a consistent pattern overall. For UP, the pattern is interesting - there’s some volatility, but it is still stable. No sharp decline like West Bengal. The West Bengal drop is particularly dramatic: from approximately 800K students in 2021 to around 470K in 2025. That’s a 41% decline. Haryana shows a similar downward trend, though less severe - from about 230K to 195K (a 15% drop). Maharashtra, Tamil Nadu, Bihar, and Uttar Pradesh remain relatively stable, with only minor fluctuations around their baseline numbers. Summary: Year-on-Year Changes State/Board 2021 2022 2023 2024 2025 Change (2021–2025) % Change Uttar Pradesh (UPMSP) 26,10,000 24,10,971 27,69,000 24,53,000 26,91,000 +81,000 +3.1% Maharashtra (MSBSHSE) 15,75,752 15,68,977 15,29,096 15,49,326 15,98,553 +22,801 +1.4% Bihar (BSEB) 13,40,000 13,56,000 13,04,000 12,91,000 12,92,000 -48,000 -3.6% Tamil Nadu (TNBSE) 8,18,000 8,06,000 8,00,000 7,61,000 7,92,000 -26,000 -3.2% West Bengal (WBCHSE) 8,00,000 7,21,000 8,25,000 7,55,000 4,74,000 -3,26,000 -40.8% Haryana (HBSE) 2,28,000 2,46,000 2,63,000 2,14,000 1,94,000 -34,000 -14.9% The table makes it clear: West Bengal’s decline is not just steep, it’s an outlier. No other state comes close to this magnitude of drop. Normalizing by Population Let’s take a look at the number of exam takers per 1,000 population: Data Signals: Maharashtra maintains the highest rate throughout (around 12–13 students per 1,000 population), suggesting either better retention rates or favorable demographics. West Bengal’s rate drops dramatically from about 8 to under 5 per 1,000 population - confirming that this isn’t just a population effect, but a real decline in participation rates. 
This chart reveals interesting patterns, but comes with a caveat. This has an implicit assumption: the proportion of the eligible age group (16-18 year-olds) within the total population is uniform across states. Which, obviously, is not true. States with younger populations will have proportionally more children under 10 and fewer teenagers, while states further along in demographic transition will have a higher share of the 16-18 cohort. But we don’t have more granular data readily available. At least, I couldn’t find one. Ideally, we should be looking at the base of 16-20 or 15-19 age cohorts. That would give us a better idea about the “eligible” group. Another Proxy: Youth Population Base The closest we can get to the ideal metric is the population of the 0–14 age group in 2021, which was published by Data for India. This can serve as a reasonable proxy for the eligible age group in 2025. What if we try to see the number of students appearing per 1,000 youth? What this reveals: It looks far worse for West Bengal. But this is a total guesstimation - at best. Here, ‘Youth’ is proxied by the 0–14 population share in 2021. By 2025, this group spans roughly ages 4–18. It’s not a perfect match to the Class 12 eligible cohort (≈17–18 years), but directionally it reflects the size of the feeder base. Maharashtra leads with 58.0 students per 1,000 youth, followed by Tamil Nadu (50.0) and Uttar Pradesh (37.7). West Bengal sits at just 22.6 - trailing behind Haryana (26.7). This stark difference (58.0 for Maharashtra vs 22.6 for West Bengal) suggests one of two things: Massive dropout rates between lower and upper secondary levels in West Bengal Demographic differences - West Bengal might have a younger age structure with proportionally more children in the 0–10 range than the 10–14 range Most likely, it’s a combination of both. What Could Be Happening? 
The 2017 Policy and the Overage Bulge If the age mandate was introduced in 2017, requiring students to be 10 years old in Class 5, those students would hit Class 12 around 2024–2025. The timing checks out perfectly. But here’s a critical question: Was there a bulge of overage students in earlier years that’s now correcting? In many Indian states, particularly in rural areas, it’s not uncommon for children to start school late or repeat grades. If West Bengal had a significant population of unqualified students in the system pre-2017, the 2021–2023 cohorts might have represented this bulge working its way through. The 2024–2025 drop could then be the system “normalizing” to age-appropriate enrollments. This would explain why the drop is so dramatic - it’s not just one year’s worth of students, but potentially 2–3 years’ worth of overage students who would have been in the system under the old regime. Why Haryana? Haryana’s decline is notable but less discussed. I haven’t found evidence of a similar age-mandate policy there. This warrants investigation. Possible factors could include: Migration patterns (families moving for work) Shift to private schooling or alternative examination boards (CBSE, ICSE) Economic factors affecting school retention Future Work This exploratory analysis raises more questions than it answers (which has been my goal anyways for this newsletter): Granular age-cohort data: Getting actual 15–19 or 16–20 population data by state would dramatically improve the accuracy of per-capita calculations. Covid’s shadow: Does the pandemic have something to do with this? The 2024-2025 cohort would have been in Classes 9-10 during 2020-2021 (peak Covid years). If the pandemic disproportionately affected rural schooling (due to a lack of digital infrastructure), we might see this reflected in the numbers. Breaking down the data by urban-rural divide would help determine if Covid-induced dropouts are part of the story. 
Pre-2017 enrollment patterns: Analyzing the age distribution of students in West Bengal’s secondary schools from 2015–2020 would reveal if there was indeed an overage bulge. Dropout analysis: Where exactly are students dropping out? Between Class 8–10? Or 10–12? State-level progression ratios would be illuminating. Cross-board comparison: Many students in urban areas take CBSE/ICSE boards instead of state boards. Are West Bengal’s numbers declining while CBSE enrollment is rising? Haryana deep-dive: Understanding what’s driving Haryana’s decline could reveal factors beyond policy changes - economic trends, migration patterns, or shifts in educational preferences. Long-term tracking: Will West Bengal’s numbers stabilize at this new lower level, or continue to decline? Data from 2026–2027 will be crucial. Conclusion West Bengal’s ~37% decline in Class 12 exam takers from 2024 to 2025 is unprecedented among major Indian states. While the 2017 age-mandate policy provides a plausible explanation - a correction after years of unqualified students in the system - the sheer magnitude demands deeper investigation. As someone from West Bengal, these numbers are both surprising and worrying. They point to deeper structural issues in educational access, retention, or migration that go beyond a single policy change. Whether this is a one-time correction or the beginning of a longer-term trend will become clearer in the years ahead. If you found this worth your time, please subscribe to The Data Signal - it’s free. I explore data, AI, analytics, and strategy, tackling interesting questions that don’t have obvious answers. 
It would mean the world to me, knowing that someone is finding value in my work.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shibaprasadb.github.io/images/posts/2025-10-04-class12-dropout/students_by_state_facets.png" /><media:content medium="image" url="https://shibaprasadb.github.io/images/posts/2025-10-04-class12-dropout/students_by_state_facets.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Trap of Copying the USP</title><link href="https://shibaprasadb.github.io/2025/09/28/catenaccio-biryani-usp.html" rel="alternate" type="text/html" title="The Trap of Copying the USP" /><published>2025-09-28T00:00:00+00:00</published><updated>2025-09-28T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2025/09/28/catenaccio-biryani-usp</id><content type="html" xml:base="https://shibaprasadb.github.io/2025/09/28/catenaccio-biryani-usp.html"><![CDATA[<p>My current organization has a hybrid setup. That means I need to be in the office a few days a week and work from home for the rest.</p>

<p>On the days I am at the office, I prefer to have lunch in the office cafeteria. It is generally either chicken kebabs and roti or a chicken salad - both good, high-protein options.</p>

<p>Last week, I saw something interesting. There was a biryani counter on an unusual day (we generally have biryani on Wednesdays). The sign said, “Calcutta style Biryani — biryani with potatoes (alu)”.</p>

<p>I got excited and went for it, instead of my usual lunch. It was a complete disaster. They had just added potatoes to some form of biryani. The biryani was spicier than a typical Andhra-style biryani, and the overall taste was quite bad. I regretted taking in approximately 700 calories for nothing.</p>

<p>Later, while thinking about the biryani, I suddenly remembered <a href="https://en.wikipedia.org/wiki/Catenaccio">Catenaccio</a>, the football formation made famous by Helenio Herrera when he was at Inter. The legendary Inter team, playing that formation, went on to play three European finals back-to-back and won two.</p>

<p>There have been enough write-ups about the famous formation and style of play, but to give a brief summary: it became famous as a defensive formation. But it had another element to it: the quick attack. Herrera apparently gave strong instructions to his players that when they got the ball, they had to move it as quickly as possible into the opponents’ half with as few touches as possible.</p>

<p>After Herrera, many tried to replicate the system but failed miserably. Herrera himself once explained why people failed to replicate it:</p>

<blockquote>
  <p>The problem is that most of the people who copied me copied me wrongly. They forgot to include the attacking principles that my Catenaccio included.</p>
</blockquote>

<p>This is essentially what happened with the “Calcutta Biryani” at my office cafeteria. They were busy replicating the USP (adding potatoes) but miserably failed to replicate the other things that make Calcutta biryani what it is: soft, fluffy rice, light spices, and diverse ingredients.</p>

<p>This is more common than we realize, especially in the business world and in daily life. Organizations try to mimic others’ success by replicating only the USP, and they miss the nuances that made the original business what it is.</p>

<p>For example, the <a href="https://ceres.shop/blog/a-slice-of-history-the-saga-of-mcdonalds-illfated-mcpizza">McPizza from McDonald’s</a>, the <a href="https://inspireip.com/microsoft-zune-failure-case-study/">Zune from Microsoft</a>, among many other cases.</p>

<p>When copying success, we gravitate toward the obvious differentiator: the potatoes, the defensive setup, the standout feature. But we ignore the invisible supporting structures that actually make it work. Copy the whole system, not just the part everyone notices. Call it the Catenaccio Principle.</p>

<hr />

<p><em>This piece was first written for my ‘Ordinary Analysis’ newsletter. <a href="https://ordinaryanalysis.substack.com/p/catenaccio-calcutta-biryani-and-the">Read it there</a>.</em></p>]]></content><author><name></name></author><category term="product" /><category term="strategy" /><summary type="html"><![CDATA[Catenaccio, Kolkata biryani, and why copying someone's best feature rarely works.]]></summary></entry><entry><title type="html">Notes on Vibe Coding</title><link href="https://shibaprasadb.github.io/2025/09/05/vibe-coding.html" rel="alternate" type="text/html" title="Notes on Vibe Coding" /><published>2025-09-05T00:00:00+00:00</published><updated>2025-09-05T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2025/09/05/vibe-coding</id><content type="html" xml:base="https://shibaprasadb.github.io/2025/09/05/vibe-coding.html"><![CDATA[<p>I have been using LLMs (Claude and ChatGPT) for coding a lot, especially for my secondary languages (like Python).</p>

<p>I am fairly comfortable using them as a co-pilot: I design everything, solve smaller tasks with them, and then patch everything together.</p>

<p>Over the weekend, I did some vibe-coding, coupled with my old style of using LLMs, to create my professional website. I had tried doing something like this long back, but my lack of knowledge in HTML/CSS was holding me back. So I had a very basic WordPress page. Thanks to LLMs, I could easily create my website.</p>

<p>Here are some notes and observations:</p>

<h3>Importance of software engineering</h3>

<p>I am not a software engineer in any way, shape, or form. I just have a basic understanding from my work, and that proved crucial in this whole exercise. The biggest hurdle I found while vibe-coding was that the generated code often wasn’t modular enough. I had to give constant nudges like this:</p>

<p><a href="/images/posts/2025-09-05-notes-vibecoding/vibe_nudge.png"><img src="/images/posts/2025-09-05-notes-vibecoding/vibe_nudge.png" alt="Don't just vibe code." /></a></p>

<p>And once prompted, it did a good job of restructuring the code.</p>

<h3>Smaller chunks, small tasks</h3>

<p>Be it vibe-coding or simple prompting, I never try to do multiple things in one go. I break down the bigger task into smaller buckets, then ask the LLM to perform those smaller tasks. In that way, things don’t break. It is also quite easy to write prompts, and the room for ambiguity reduces to a huge extent.</p>

<h3>Commit frequently</h3>

<p>I don’t know if I have some kind of insecurity about it, or if it is just a generally good practice. Whenever I am building anything with the help of LLMs, I commit my code changes very frequently. That way, I don’t have to worry about ‘losing’ anything. Even if a new update is badly designed, I can just ignore it and stay with my older one.</p>

<h3>Start over</h3>

<p>It is not a very common thing, but it has happened to me when the output produced was quite subpar (both in vibe-coding and in general prompting). The model will often just not get it at all! The best thing at that point is to start over. If the project is small, you can just upload the files directly in a new chat. Or in the previous chat, just ask for a detailed documentation of what you have done, then paste it in a new chat, and start again.</p>

<p>If you have too many files, then probably something like Cursor or Codex might help more.</p>

<hr />

<p>These have been my observations and experience so far. I’ll be experimenting more and plan to update this in 2–3 months. Curious to know: how has your experience been when coding with LLMs?</p>]]></content><author><name></name></author><category term="reflections" /><summary type="html"><![CDATA[I have been using LLMs (Claude and ChatGPT) for coding a lot, especially for my secondary languages (like Python). I am fairly comfortable using them as a co-pilot: I design everything, solve smaller tasks with them, and then patch everything together. Over the weekend, I did some vibe-coding, coupled with my old style of using LLMs, to create my professional website. I had tried doing something like this long back, but my lack of knowledge in HTML/CSS was holding me back. So I had a very basic WordPress page. Thanks to LLMs, I could easily create my website. Here are some notes and observations: Importance of software engineering I am not a software engineer in any shape or form. I just have a basic understanding because of my work, and that proved to be quite crucial in this whole exercise. The biggest hurdle I found while doing vibe-coding was that the code wasn’t modular enough most of the time. I had to give constant nudges like this: And once prompted, it did a good job of modifying. Smaller chunks, small tasks Be it vibe-coding or simple prompting, I never try to do multiple things in one go. I break down the bigger task into smaller buckets, then ask the LLM to perform those smaller tasks. In that way, things don’t break. It is also quite easy to write prompts, and the room for ambiguity reduces to a huge extent. Commit frequently I don’t know if I have some kind of insecurity about it, or if it is just a generally good practice. Whenever I am building anything with the help of LLMs, I commit my code changes very frequently. That way, I don’t have to worry about ‘losing’ anything. 
Even if a new update is badly designed, I can just ignore it and stay with my older one. Start over It is not a very common thing, but it has happened to me when the output produced was quite subpar (both in vibe-coding and in general prompting). The model will often just not get it at all! The best thing at that point is to start over. If the project is small, you can just upload the files directly in a new chat. Or in the previous chat, just ask for a detailed documentation of what you have done, then paste it in a new chat, and start again. If you have too many files, then probably something like Cursor or Codex might help more. These have been my observations and experience so far. I’ll be experimenting more and plan to update this in 2–3 months. Curious to know: how has your experience been when coding with LLMs?]]></summary></entry><entry><title type="html">Why I’m Cross-Posting Beyond Substack</title><link href="https://shibaprasadb.github.io/2025/09/02/BeyondSubstack.html" rel="alternate" type="text/html" title="Why I’m Cross-Posting Beyond Substack" /><published>2025-09-02T00:00:00+00:00</published><updated>2025-09-02T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2025/09/02/BeyondSubstack</id><content type="html" xml:base="https://shibaprasadb.github.io/2025/09/02/BeyondSubstack.html"><![CDATA[<p>I have been using Substack for some time now.</p>

<p>I have two blog-newsletters: one where I write about <a href="https://ordinaryanalysis.substack.com/">my reflections</a>, and another that is <a href="https://datasignal.substack.com/">data-tech related</a>.</p>

<p>As a platform to distribute your writings, Substack works really well. So, I will continue to publish my personal blog there (or if I like the new setup, then I might start here more regularly too).</p>

<p>But for more technical content, Substack might not be the best platform.</p>

<p>One issue is that Substack posts perform rather poorly when shared on other platforms. This may not be entirely Substack’s fault, but it does a disservice to the people sharing their work.</p>

<p>Another is that owning your content matters. On a personal website, you own everything, and it can serve as a repository for all the work you do.<br />
And let’s face it, most Substack blogs look the same. Personal websites, on the other hand, reflect the author far better.</p>

<p>Keeping this in mind, I will start sharing my tech-related explorations here, and maybe add a link or two for Substack.</p>

<p>Let’s see how it goes.</p>

<p>I created this website partly with vibe coding and partly with more deliberate, LLM-assisted (non-vibe) coding. My next post will be about that. I enjoyed my first vibe coding explorations, but there are some pitfalls.</p>

<hr />

<p>This is my first post on this site. Going forward, I’ll be using this space for my <strong>technical explorations</strong> — topics around data science, analytics, product thinking, and experiments with new tools and workflows.</p>

<p>If you’d like to keep up with my more reflective, personal essays, you can subscribe to <a href="https://ordinaryanalysis.substack.com/">Ordinary Analysis</a>. For data-tech content in a newsletter format, you’ll still find me on <a href="https://datasignal.substack.com/">Data Signal</a>.</p>

<p>Thanks for reading, and welcome!</p>]]></content><author><name></name></author><category term="reflections" /><summary type="html"><![CDATA[I have been using Substack for some time now. I have two blog-newsletters: one where I write about my reflections, and another that is data-tech related. As a platform to distribute your writings, Substack works really well. So, I will continue to publish my personal blog there (or if I like the new setup, then I might start here more regularly too). But for more technical content, Substack might not be the best platform. One thing is that it performs rather poorly on different platforms where you would like to share your work. Again, this is not Substack’s fault, but it does a great disservice to the people sharing their work. Another is that owning your content is a great thing. On a personal website, you will be the one owning everything, and this can be used as a repository for all the work that you do. And let’s face it, all Substack blogs, mostly, look the same. Personal websites, OTOH, reflect the author in a better way. Keeping this in mind, I will start sharing my tech-related explorations here, and maybe add a link or two for Substack. Let’s see how it goes. I created this website with some vibe coding and through some LLM-driven (non-vibe-coding) help. My next post will be related to that. I enjoyed my first vibe coding explorations, but there are some pitfalls. This is my first post on this site. Going forward, I’ll be using this space for my technical explorations — topics around data science, analytics, product thinking, and experiments with new tools and workflows. If you’d like to keep up with my more reflective, personal essays, you can subscribe to Ordinary Analysis. For data-tech content in a newsletter format, you’ll still find me on Data Signal. 
Thanks for reading, and welcome!]]></summary></entry><entry><title type="html">When Product Patches Operational Cracks</title><link href="https://shibaprasadb.github.io/2025/07/02/when-product-patches-operational-cracks.html" rel="alternate" type="text/html" title="When Product Patches Operational Cracks" /><published>2025-07-02T00:00:00+00:00</published><updated>2025-07-02T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2025/07/02/when-product-patches-operational-cracks</id><content type="html" xml:base="https://shibaprasadb.github.io/2025/07/02/when-product-patches-operational-cracks.html"><![CDATA[<p>Ever notice how tech companies love slapping on fancy features instead of fixing what’s actually broken?<br />
It’s like installing a smart doorbell on a house with no locks. Here’s a perfect example from India’s cab scene and why this kind of thinking is completely backwards.</p>

<p>Have you ever taken an auto or cab ride from one of India’s leading cab service providers? Especially after 11 PM? Or during “late-night” hours?</p>

<p>You might have noticed something then. After your ride is completed, you receive multiple calls from an auto-generated voice. It usually prompts something like: <em>“Have you reached your destination safely? Dial 1…”</em></p>

<p>Often, they “spam” the customer 4–5 times. Sometimes it stops after just one call. On the surface, this feels like a great feature. But we may need to double-click on this to understand what’s really happening and why.</p>

<hr />

<p>First things first: I started wondering, why does this feature even exist?<br />
<strong>It feels like a reactive measure at best, and a highly inefficient one at worst.</strong></p>

<p>To understand this better, I’d recommend listening to this episode from <em>The Ken</em>, a podcast report that digs into this issue, among others.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/46PLVdNWcuo" frameborder="0" allowfullscreen=""></iframe>

<p>This episode outlines the poor safety protocols followed by the company. According to the report, there are barely any proper checks while onboarding new drivers. Competitors, on the other hand, typically conduct police verifications and background checks. That step is simply missing here.</p>

<p>So what do they do instead?</p>

<p>They call you after the ride ends. Sometimes once. Sometimes five times. Basically, it’s a product-level fix for an operational inefficiency. And honestly, that might not be enough.</p>

<hr />

<h2 id="so-what-could-be-done-better">So what could be done better?</h2>

<h3 id="the-basics-first">The basics first</h3>
<ul>
  <li>Introduce police verification. Like other companies already do.</li>
</ul>

<h3 id="operational-logic-over-product-patchwork">Operational logic over product patchwork</h3>
<ul>
  <li>Give drivers the option to skip verification, but flag them clearly.</li>
  <li>Don’t assign unverified drivers to women passengers during late hours.</li>
  <li>Or maybe don’t assign any late-night rides to them at all (after 10 PM?).</li>
</ul>

<h3 id="data-driven-interventions">Data-driven interventions</h3>
<ul>
  <li>What % of total rides actually result in complaints? <em>(My hunch: &lt;1%. That small fraction should get laser focus.)</em></li>
  <li>Is there a pattern in routes or timings when incidents happen?</li>
  <li>For “red zones” or flagged routes, enable active monitoring. <em>(It might be costly but this is passenger safety we’re talking about.)</em></li>
  <li>Build a credit-profile-like system for drivers. Star ratings aren’t enough. Use ride history, complaint count, and behavioural flags to score and prioritise safer drivers. <em>(One more constraint in the optimization problem.)</em></li>
</ul>
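
<p>To make that last idea concrete, here is a toy sketch of what a credit-profile-like driver score could look like. The field names, weights, and eligibility threshold are all invented for illustration; a real system would calibrate them against historical incident data:</p>

```python
from dataclasses import dataclass

@dataclass
class DriverProfile:
    rides: int            # total completed rides
    complaints: int       # rider complaints on record
    behaviour_flags: int  # e.g. route deviations, harsh cancellations
    police_verified: bool

def safety_score(d: DriverProfile) -> float:
    """Score in [0, 1]; higher means safer. Weights are illustrative."""
    if d.rides == 0:
        return 0.0  # no history: treat as unproven, not as safe
    complaint_rate = d.complaints / d.rides
    flag_rate = d.behaviour_flags / d.rides
    score = 1.0 - 5.0 * complaint_rate - 2.0 * flag_rate
    if not d.police_verified:
        score -= 0.3  # unverified drivers are penalised outright
    return max(0.0, min(1.0, score))

def eligible_for_late_night(d: DriverProfile, threshold: float = 0.8) -> bool:
    # Only verified drivers above the threshold get rides after 10 PM.
    return d.police_verified and safety_score(d) >= threshold

veteran = DriverProfile(rides=2000, complaints=2, behaviour_flags=10, police_verified=True)
rookie = DriverProfile(rides=50, complaints=3, behaviour_flags=5, police_verified=False)
print(eligible_for_late_night(veteran), eligible_for_late_night(rookie))
```

<p>The point isn’t these particular numbers; it’s that the score becomes one more constraint in the matching problem, instead of an after-the-fact phone call.</p>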

<hr />

<p>These are just a few ways I think the problem could be addressed more meaningfully.</p>

<p>Because, let’s be honest, the current system doesn’t inspire confidence. It feels like a band-aid. And reactive measures like these? They might make the news once in a while, but in the long run they’ll remain futile attempts at patching deeper operational gaps.</p>

<p>The real essence of safety isn’t a notification or a call. It’s a system that doesn’t need either.</p>

<blockquote>
  <p><em>(Note: The Ken episode names a specific company, but this isn’t just about one app. These patterns show up across platforms that patch problems instead of solving them.)</em></p>
</blockquote>]]></content><author><name></name></author><category term="product" /><summary type="html"><![CDATA[Ever notice how tech companies love slapping on fancy features instead of fixing what’s actually broken? It’s like installing a smart doorbell on a house with no locks. Here’s a perfect example from India’s cab scene and why this kind of thinking is completely backwards. Have you ever taken an auto or cab ride from one of India’s leading cab service providers? Especially after 11 PM? Or during “late-night” hours? You might have noticed something then. After your ride is completed, you receive multiple calls from an auto-generated voice. It usually prompts something like: “Have you reached your destination safely? Dial 1…” Often, they “spam” the customer 4–5 times. Sometimes it stops after just one call. On the surface, this feels like a great feature. But we may need to double-click on this to understand what’s really happening and why. First things first: I started wondering, why does this feature even exist? It feels like a reactive measure at best, and a highly inefficient one at worst. To understand this better, I’d recommend listening to one episode from The Ken (a podcast report that digs into this issue among others). This episode outlines the poor safety protocols followed by the company. According to the report, there are barely any proper checks while onboarding new drivers. Competitors, on the other hand, typically conduct police verifications and background checks. That step is simply missing here. So what do they do instead? They call you after the ride ends. Sometimes once. Sometimes five times. Basically, it’s a product-level fix for an operational inefficiency. And honestly, that might not be enough. So what could be done better? The basics first Introduce police verification. Like other companies already do. Operational logic over product patchwork Give drivers the option to skip verification, but flag them clearly. 
Don’t assign unverified drivers to women passengers during late hours. Or maybe don’t assign any late-night rides to them at all (after 10 PM?). Data-driven interventions What % of total rides actually result in complaints? (My hunch: &lt;1%. That 1% should get laser focus.) Is there a pattern in routes or timings when incidents happen? For “red zones” or flagged routes, enable active monitoring. (It might be costly but this is passenger safety we’re talking about.) Build a credit-profile-like system for drivers. Star ratings aren’t enough. Use ride history, complaint count, and behavioural flags to score and prioritise safer drivers. (One more constraint in the optimization problem.) These are just a few ways I think the problem could be addressed more meaningfully. Because, let’s be honest, the current system doesn’t inspire confidence. It feels like a bandaid. And reactive measures like these? They might make the news once in a while, but in the long run, they’ll remain futile attempts at patching deeper operational gaps. The real essence of safety isn’t a notification or a call. It’s a system that doesn’t need either. (Note: The Ken episode names a specific company, but this isn’t just about one app. These patterns show up across platforms that patch problems instead of solving them.)]]></summary></entry><entry><title type="html">LinearLeap: Towards More Intelligent Machine Learning Tools</title><link href="https://shibaprasadb.github.io/2025/05/21/linearleap.html" rel="alternate" type="text/html" title="LinearLeap: Towards More Intelligent Machine Learning Tools" /><published>2025-05-21T00:00:00+00:00</published><updated>2025-05-21T00:00:00+00:00</updated><id>https://shibaprasadb.github.io/2025/05/21/linearleap</id><content type="html" xml:base="https://shibaprasadb.github.io/2025/05/21/linearleap.html"><![CDATA[<p>Upload data, run regression, get recommendations - powered by LLMs and built with analysts in mind.</p>

<p>LLMs are doing a fantastic job of automating repetitive, mundane day-to-day tasks. Where they can truly add value, however, is in performing “intelligent” tasks.</p>

<p>With this in mind, I started exploring how I could create “intelligent” ML models. When I say intelligent, I mean models that can provide crisp, actionable recommendations to analysts and stakeholders - not just <a href="https://www.explainxkcd.com/wiki/index.php/1838:_Machine_Learning">stir through piles of data</a>, try 10 different models, and then say “here’s everything, select whatever you like.”</p>

<p>To address this need, I developed an intelligent Linear Regression assistant. It’s currently hosted on Streamlit Cloud and accessible via this link: <a href="https://linearleap.streamlit.app">LinearLeap</a></p>

<p>This web application leverages multimodal LLMs (currently configured to use Gemini). While I’m using a free API for demonstration purposes, users can enter their own API keys to use the tool as extensively as they wish.</p>

<p>The Multiple Linear Regression model is still a work in progress that I plan to enhance later. Nevertheless, I’m moderately satisfied with the Linear Regression tool’s current capabilities.</p>

<p>If you’re an analyst, data scientist, or stakeholder with some ML knowledge, I invite you to try it out and share your feedback. Your input will be valuable as I continue to improve the application.</p>

<h2 id="features">Features</h2>

<ul>
  <li>Upload and analyze your datasets with ease</li>
  <li>Perform simple and multiple linear regression analysis</li>
  <li>Visualize relationships between variables</li>
  <li>Get detailed statistical insights and predictions</li>
  <li>Receive tailored recommendations based on your data (GenAI generated)</li>
</ul>

<h2 id="resources">Resources</h2>

<p><strong>GitHub repository:</strong><br />
<a href="https://github.com/shibaprasadb/linear-leap">LinearLeap - Github</a></p>

<p><strong>Demo video:</strong></p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/d78AHXw-7TI?start=4" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<p>(Please excuse my presentation - recording yourself is a humbling experience!!)</p>

<h2 id="future-enhancements-planned">Future enhancements planned</h2>

<ul>
  <li>Fully integrating Multiple Linear Regression</li>
  <li>Better support for categorical variables</li>
  <li>Enhanced visualizations and export options</li>
  <li>More robust handling of multicollinearity</li>
  <li>Even smarter GenAI-generated recommendations</li>
</ul>]]></content><author><name></name></author><category term="technical" /><summary type="html"><![CDATA[Upload data, run regression, get recommendations - powered by LLMs and built with analysts in mind. LLMs are doing a fantastic job in automating repetitive and mundane day-to-day tasks. However, where they can truly add value is in performing “intelligent” tasks. With this in mind, I started exploring how I could create “intelligent” ML models. When I say intelligent, I mean models that can provide crisp, actionable recommendations to analysts and stakeholders - not just stir through piles of data, try 10 different models, and then say “here’s everything, select whatever you like.” To address this need, I developed an intelligent Linear Regression assistant. It’s currently hosted on Streamlit Cloud and accessible via this link: LinearLeap This web application leverages multimodal LLMs (currently configured to use Gemini). While I’m using a free API for demonstration purposes, users can enter their own API keys to use the tool as extensively as they wish. The Multiple Linear Regression model is still a work in progress, which I plan to enhance later. Nevertheless, I’m moderately satisfied with the Linear Regression tool’s current capabilities. If you’re an analyst, data scientist, or stakeholder with some ML knowledge, I invite you to try it out and share your feedback. Your input will be valuable as I continue to improve the application. Features Upload and analyze your datasets with ease Perform linear and multilinear regression analysis Visualize relationships between variables Get detailed statistical insights and predictions Receive tailored recommendations based on your data (GenAI generated) Resources GitHub repository: LinearLeap - Github Demo video: (Please excuse my presentation - recording yourself is a humbling experience!!) 
Future enhancements planned Fully integrating Multiple Linear Regression Better support for categorical variables Enhanced visualizations and export options More robust handling of multicollinearity Even smarter GenAI-generated recommendations]]></summary></entry></feed>