Transforming data into actionable insights through innovative solutions
Feb 2025 - Present
Dec 2022 - Feb 2025
Jun 2021 - Aug 2022
Apr 2018 - Jun 2021
Jan 2023 - May 2023
Designed a predictive analytics pipeline combining Random Forest, CatBoost, and Hugging Face BERT models, reaching 67% accuracy and enabling data-driven investment insights with optimized model deployment workflows. Ran A/B tests across different feature sets to evaluate model performance, improving prediction accuracy by 5%.
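For illustration, here is a minimal sketch of how the tabular side of such a pipeline could be wired up with scikit-learn and CatBoost. The synthetic dataset, estimator settings, and soft-voting scheme are placeholder assumptions, and the BERT text features from the actual pipeline are omitted.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier

# Placeholder data standing in for the real investment feature set
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Soft-voting ensemble: average the predicted probabilities of both models
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("cb", CatBoostClassifier(iterations=200, verbose=0, random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("Held-out accuracy:", ensemble.score(X_test, y_test))

Soft voting is one reasonable way to combine the two tree-based learners; the accuracy figures above come from the project itself, not from this sketch.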
Aug 2022 - May 2024
Graduate Teaching Assistant: Supported 3 graduate-level courses; guided 100+ students in Machine Learning, Data Analysis, and Applied Statistics concepts and labs.
Demonstrating expertise in cloud-based data pipelines & real-time data solutions
Represented the college at the ACM International Collegiate Programming Contest (ACM ICPC), Amritapuri Regional
Co-founded a web development firm (WebStories), developing bespoke websites for diverse clients
PyCon 2024, Pittsburgh: Documented the datetime64("now") feature in the NumPy library
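For reference, a quick example of the NumPy behavior covered in that documentation work; the printed timestamp is illustrative.

import numpy as np

# np.datetime64("now") captures the current UTC time at second resolution
ts = np.datetime64("now")
print(ts)        # e.g. 2025-02-01T14:03:22
print(ts.dtype)  # datetime64[s]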
"Shrey demonstrated exceptional expertise in developing NLP models and optimizing data processing pipelines. His work on biomedical document classification significantly improved our system's accuracy."
"Shrey's implementation of RAG pipelines with LLMs was outstanding. He consistently delivered high-quality solutions and showed strong problem-solving abilities."
# NLP Model for Document Classification
import torch
from transformers import AutoTokenizer, AutoModel

class DocumentClassifier:
    def __init__(self, model_name='bert-base-uncased'):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()

    def classify(self, text):
        # Tokenize and truncate to the model's maximum input length
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Mean-pool token embeddings into a single document representation
        return outputs.last_hidden_state.mean(dim=1)

# Achieved 22% accuracy improvement
classifier = DocumentClassifier()
result = classifier.classify("Biomedical document text...")
-- Optimized SQL Query for Data Pipeline
WITH ranked_data AS (
    SELECT
        id,
        category,
        score,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY score DESC
        ) AS score_rank
    FROM documents
    WHERE processed_date >= DATEADD(day, -30, GETDATE())
)
SELECT
    category,
    AVG(score) AS avg_score,
    COUNT(*) AS total_count
FROM ranked_data
WHERE score_rank <= 100
GROUP BY category
-- Reduced query time by 12%
# RAG Pipeline Implementation
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class RAGModel(nn.Module):
    def __init__(self, encoder_name='sentence-transformers/all-MiniLM-L6-v2'):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)  # 384-dim sentence encoder
        self.retriever = nn.Linear(384, 128)    # projects the fused query/context embedding
        self.generator = nn.Linear(128, 50257)  # scores over a GPT-2-sized vocabulary

    def embed(self, text):
        # Tokenize raw text before passing it through the encoder
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, padding=True)
        return self.encoder(**inputs).pooler_output

    def forward(self, query, context):
        # Fuse query and context embeddings, then project to vocabulary logits
        retrieved = self.retriever(self.embed(query) + self.embed(context))
        return self.generator(retrieved)

# Reduced query resolution time by 16%
Feel free to reach out for opportunities or collaborations
(312) 934-6334
jaradishrey@gmail.com
Chicago, IL