
Shrey Jaradi

Data Scientist & Engineer

Transforming data into actionable insights through innovative solutions

Impact Metrics

22% Accuracy Improvement
NLP Document Classification

18% Processing Time Reduction
AWS EC2 Multi-instance Processing

16% Query Resolution Time Reduction
RAG Pipeline with LLMs

100K+ Records Analyzed
Biomedical NLP Models

Work Experience

Data Scientist - NLP

Global Action Alliance, Chicago, USA

Feb 2025 – Present

  • Developed NLP models to analyze 100K+ biomedical records, improving document classification accuracy by 22%
  • Designed and implemented multi-instance data processing on AWS EC2 for large-scale dataset extraction and cleaning, accelerating processing time by 18% (see the chunk-and-distribute sketch below)
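
A minimal, single-node sketch of the chunk-and-distribute pattern behind the multi-instance processing work. The file name, chunk size, and cleaning step are illustrative placeholders; in the actual setup each chunk was dispatched to a separate EC2 instance rather than a local worker pool.

# Chunked parallel cleaning (illustrative sketch; file names and helpers are placeholders)
from concurrent.futures import ProcessPoolExecutor
import pandas as pd

CHUNK_SIZE = 10_000  # records per worker; tuned to instance memory in practice

def clean_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    # Placeholder cleaning step: drop empty records and normalize text fields
    chunk = chunk.dropna(subset=["text"])
    chunk["text"] = chunk["text"].str.strip().str.lower()
    return chunk

def process_file(path: str) -> pd.DataFrame:
    reader = pd.read_csv(path, chunksize=CHUNK_SIZE)
    with ProcessPoolExecutor() as pool:          # stand-in for separate EC2 workers
        cleaned = list(pool.map(clean_chunk, reader))
    return pd.concat(cleaned, ignore_index=True)

if __name__ == "__main__":
    df = process_file("biomedical_records.csv")  # hypothetical input file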

Data Scientist

Pure Data Consulting Inc, Chicago, USA

Dec 2022 – Feb 2025

  • Developed Retrieval-Augmented Generation (RAG) pipeline using pre-trained Large Language Models (LLMs), improving response accuracy for knowledge-based systems and reducing query resolution time by 16%
  • Automated data preprocessing workflows using Python and SQL, improving data readiness for machine learning models
  • Delivered actionable insights through statistical analyses and Power BI visualizations, including A/B testing of new model features, helping stakeholders make data-driven decisions and improve operational efficiency (see the significance-test sketch below)
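
A minimal sketch of the kind of significance check used when A/B testing model-feature variants; the metric values and threshold here are hypothetical, not results from the actual experiments.

# A/B test of two feature variants (illustrative sketch; metric values are hypothetical)
import numpy as np
from scipy import stats

# Per-session quality scores for the control (A) and new-feature (B) variants
variant_a = np.array([0.71, 0.69, 0.74, 0.68, 0.72, 0.70])
variant_b = np.array([0.75, 0.78, 0.73, 0.77, 0.76, 0.74])

# Welch's t-test: does variant B outperform A beyond noise?
t_stat, p_value = stats.ttest_ind(variant_b, variant_a, equal_var=False)
print(f"t={t_stat:.2f}, p={p_value:.3f}")
if p_value < 0.05:
    print("Ship variant B: the improvement is statistically significant")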

Data Engineer

Accenture Solutions Private Limited (Client: Microsoft), India

June 2021 – Aug 2022

  • Built and optimized scalable data pipelines in Azure Data Factory and Synapse to enable efficient real-time data processing and predictive analytics
  • Reduced decision-making time by 8% for leadership teams by creating 5+ interactive Power BI dashboards
  • Boosted reporting efficiency by 12% through SQL optimization and 4 Azure Analysis Services tabular models

Data Scientist

Softtek India Private Limited, India

April 2018 – June 2021

  • Reduced manual ticketing workload by 9% by developing a ServiceNow automation app using Google Dialogflow APIs with Speech-to-Text models
  • Developed a predictive classification model using Python, NLP, and FastText to categorize tickets and streamline IT support issue-resolution workflows (see the FastText sketch below)
  • Automated internal processes and cut manual effort by 8% for Nova University by leading a team to design an account termination solution using Python and Selenium
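
A minimal FastText sketch of the ticket-classification approach; the training file, labels, and hyperparameters are illustrative placeholders, not the production configuration.

# FastText ticket classifier (illustrative sketch; file names and hyperparameters are placeholders)
import fasttext

# Training file uses FastText's __label__ convention, e.g. "__label__network vpn keeps disconnecting"
model = fasttext.train_supervised(
    input="tickets.train",  # hypothetical labeled ticket export
    epoch=25,
    wordNgrams=2,
)

labels, probs = model.predict("password reset not working on laptop", k=1)
print(labels[0], probs[0])  # predicted ticket category and its confidence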

Projects

Evaluate Cryptocurrency potential using Predictive Analysis

Jan 2023 – May 2023

Designed a predictive analytics pipeline combining Random Forest, CatBoost, and Hugging Face BERT models, achieving 67% prediction accuracy and enabling data-driven investment insights with streamlined model deployment workflows. A/B testing of alternative feature sets improved prediction accuracy by a further 5%. A simplified sketch of the ensemble step follows the diagram below.

Python · Pandas · NumPy · Scikit-learn · TensorFlow · BERT · Random Forest · CatBoost
graph LR
    A["Market Data"] --> B["Technical Analysis"]
    C["News Data"] --> D["Sentiment Analysis"]
    D --> E["BERT Model"]
    B --> F["Feature Engineering"]
    F --> G["Model Training"]
    E --> G
    G --> H1["Random Forest"]
    G --> H2["CatBoost"]
    H1 --> I["Prediction"]
    H2 --> I
    I --> J["Investment Insights"]
    I --> K["A/B Testing"]
    K --> L["5% Accuracy Improvement"]
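
A simplified sketch of the ensemble step shown in the diagram: Random Forest and CatBoost probabilities are averaged over engineered market and sentiment features. The feature matrix and labels below are synthetic placeholders, not the project data.

# Random Forest + CatBoost soft-voting ensemble (illustrative sketch; data is synthetic)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier

# Placeholder feature matrix: technical indicators plus a BERT sentiment score column
X = np.random.rand(500, 12)
y = (np.random.rand(500) > 0.5).astype(int)  # 1 = price up, 0 = price down
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)
cb = CatBoostClassifier(iterations=300, verbose=0, random_state=42).fit(X_train, y_train)

# Average the two models' class probabilities and take the most likely class
proba = (rf.predict_proba(X_test) + cb.predict_proba(X_test)) / 2
preds = proba.argmax(axis=1)
print("ensemble accuracy:", (preds == y_test).mean())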


Skills

Programming

Python 95%
SQL 90%
R 85%
Power BI 88%

Machine Learning

NLP 92%
PyTorch 88%
TensorFlow 85%
RAG & LLMs 90%
Predictive Modelling · CNNs · Model Deployment · LangChain · Amazon Bedrock · SageMaker

Cloud & Big Data

Azure · ADF · Synapse · Data Explorer · Data Lake · SQL Server · MongoDB

Others & Methodology

Git · Agile · Databricks · REST APIs · Workflow Automation · DevOps · CI/CD

Education

Master of Science in Data Science

Illinois Institute of Technology, Chicago, IL

Aug 2022 – May 2024

Graduate Teaching Assistant: Supported 3 graduate-level courses; guided 100+ students in Machine Learning, Data Analysis, and Applied Statistics concepts and labs.

Certifications & Achievements

Microsoft Azure Fundamentals

Demonstrates expertise in cloud-based data pipelines and real-time data solutions

ACM ICPC Regional

Represented my college at the ACM International Collegiate Programming Contest (ACM ICPC), Amritapuri Regional

Co-founded WebStories

Co-founded a web development firm (WebStories), developing bespoke websites for diverse clients

Open-source Contribution

PyCon 2024, Pittsburgh: Documented the datetime64("now") feature in the NumPy library
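
For context, a quick illustration of the documented behaviour: datetime64("now") evaluates to the current UTC time at second resolution.

# datetime64("now"): current UTC time at second resolution
import numpy as np

ts = np.datetime64("now")
print(ts)        # e.g. 2024-05-17T14:03:22
print(ts.dtype)  # datetime64[s]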

Recommendations

"Shrey demonstrated exceptional expertise in developing NLP models and optimizing data processing pipelines. His work on biomedical document classification significantly improved our system's accuracy."

Global Action Alliance Team Lead

"Shrey's implementation of RAG pipelines with LLMs was outstanding. He consistently delivered high-quality solutions and showed strong problem-solving abilities."

Pure Data Consulting Senior Data Scientist

Code Highlights

# NLP Model for Document Classification
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class DocumentClassifier:
    def __init__(self, model_name='bert-base-uncased', num_labels=2):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=num_labels
        )
        self.model.eval()

    def classify(self, text):
        # Tokenize, run a forward pass without gradients, and return the predicted label
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True)
        with torch.no_grad():
            logits = self.model(**inputs).logits
        return logits.argmax(dim=-1).item()

# Achieved 22% accuracy improvement
classifier = DocumentClassifier()
result = classifier.classify("Biomedical document text...")

-- Optimized SQL Query for Data Pipeline (top 100 documents per category, last 30 days)
WITH ranked_data AS (
    SELECT
        id,
        category,
        score,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY score DESC
        ) AS rn  -- "rank" is a reserved keyword in T-SQL, so use a plain alias
    FROM documents
    WHERE processed_date >= DATEADD(day, -30, GETDATE())
)
SELECT
    category,
    AVG(score) AS avg_score,
    COUNT(*) AS total_count
FROM ranked_data
WHERE rn <= 100
GROUP BY category
ORDER BY avg_score DESC;
-- Reduced query time by 12%

# RAG Pipeline Implementation
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RAGModel(nn.Module):
    def __init__(self, encoder_name='sentence-transformers/all-MiniLM-L6-v2'):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.retriever = nn.Linear(384, 128)     # project 384-dim MiniLM embeddings into a retrieval space
        self.generator = nn.Linear(128, 50257)   # map retrieved features to a GPT-2-sized vocabulary

    def encode(self, text):
        # Tokenize, then mean-pool token embeddings into a single sentence vector
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True)
        hidden = self.encoder(**inputs).last_hidden_state
        return hidden.mean(dim=1)

    def forward(self, query, context):
        retrieved = self.retriever(self.encode(query) + self.encode(context))
        return self.generator(retrieved)

# Reduced query resolution time by 16%

Get in Touch

Feel free to reach out for opportunities or collaborations

(312) 934-6334

jaradishrey@gmail.com

Chicago, IL