Hongsup Shin
Categories
All (28)
Bayesian (2)
GenAI (3)
LLM (2)
Learning-to-rank (2)
ML (22)
ML Ops (3)
collaboration (2)
conference (6)
criminal justice (2)
data (3)
ethics (4)
fairness (5)
leadership (1)
paper (6)
responsible AI (5)
verification (3)
visualization (3)
volunteering (2)

Essential qualities of ML tech leads

ML
leadership
Good ML tech leadership is undoubtedly challenging. I’ve learned (sometimes the hard way) that great ML tech leadership requires a unique blend of technical vision, people skills, and operational excellence. While no one perfectly embodies all of these qualities, I think understanding them is crucial for developing good tech leads in the ML community, myself included.
Feb 23, 2025
Hongsup Shin

Ranking metrics: pitfalls and best practices

ML
Learning-to-rank
Positional discount in ranking metrics creates subtle complexities. Parameters like K values and relevance levels significantly impact model training and evaluation, requiring careful consideration. This post explores how metrics like MRR and NDCG handle position-based discounting and examines common pitfalls in their practical implementation.
Feb 8, 2025
Hongsup Shin

Learning-to-rank for hardware bug discovery

ML
verification
Learning-to-rank
Mainstream hardware research on bug discovery uses reinforcement learning (RL) algorithms, but productionizing RL applications still faces many challenges. As an alternative, I propose a deployment-friendly approach using a learning-to-rank algorithm that improves the bug discovery rate.
Feb 1, 2025
Hongsup Shin

Model tuning with Weights & Biases, Ray Tune, and LightGBM

ML
ML Ops
visualization
ML model debugging often requires thorough inspection of model hyperparameters. By combining Weights & Biases’ detailed visualization with Ray Tune’s flexible tuning machinery, model inspection becomes much easier. This post demonstrates how to integrate both tools when tuning LightGBM models, and how to share the tuning results.
Jan 25, 2025
Hongsup Shin

Thompson sampling in practice: modifications and limitations

ML
verification
Bayesian
Thompson sampling is a simple multi-armed bandit method popular among industry practitioners. In this post, I discuss its implementation, practical challenges and limitations, and how hardware verification can utilize the algorithm.
Jan 18, 2025
Hongsup Shin

Building effective ML teams: lessons from industry

ML
collaboration
Although ML has become ubiquitous, truly effective ML teams remain rare. With ML tools and expertise becoming increasingly democratized, the factors differentiating teams will be clear: research excellence, strong technical vision, open culture, and thoughtful technical standards.
Jan 12, 2025
Hongsup Shin

Building datasets for model benchmarking in production

ML
ML Ops
data
Benchmarking is critical to developing and evaluating ML models in both research and development settings. In production, we can use benchmarking to evaluate new features on historic data, similar to A/B testing. In this post, I discuss practical considerations when building custom benchmark datasets for ML models in production.
Oct 5, 2024
Hongsup Shin

Modeling tabular data using conditional GAN

paper
GenAI
ML
Tabular data synthesis (data augmentation) is an under-studied area compared to unstructured data. This paper uses a GAN to model unique properties of tabular data such as mixed data types and class imbalance. The technique has great potential for model improvement and privacy, and it is currently available in the Synthetic Data Vault library in Python.
May 30, 2024
Hongsup Shin

Dissecting racial bias in an algorithm used to manage the health of populations

paper
ML
ethics
fairness
Amidst the LLM hype, algorithmic bias in a critical domain such as healthcare continues to be overlooked. This algorithm-audit paper found racial bias in a widely used healthcare algorithm and discussed the problem of using the wrong target variable. The paper is a few years old but its message is still relevant, and we discussed what has happened since then.
Mar 28, 2024
Hongsup Shin

Algorithmic decision-making and fairness (Stanford Tech Ethics course, Week 1)

ethics
fairness
criminal justice
This week, we covered algorithmic decision-making, algorithmic accountability, and different definitions of fairness. We used the COMPAS case as an example to explore the challenges and problems of automated decision-making systems driven by algorithms.
Oct 11, 2023
Hongsup Shin

Moral complicity and moral disengagement (Stanford Tech Ethics course, Week 1)

ethics
I started taking “Ethics, Technology + Public Policy for Practitioners”, a Stanford online course. In Week 1, we read Ursula K. Le Guin’s “The Ones Who Walk Away from Omelas”, and discussed moral complicity. This post is my reflection on the story, the cohort, and my thoughts on moral complicity and moral disengagement.
Oct 4, 2023
Hongsup Shin

Constitutional AI: harmlessness from AI feedback

paper
GenAI
LLM
ML
There is an arms race of large language models (LLMs) in industry, with companies using different approaches and techniques. Anthropic claims to take a more cautious approach than its competitors, one that minimizes harm from LLMs. Let’s look into constitutional AI, the core algorithm behind their LLM, to understand how this harm mitigation works.
Aug 31, 2023
Hongsup Shin

Visualization in Bayesian workflow

paper
Bayesian
visualization
ML
This paper summarizes types of data visualization that we can use in Bayesian modeling and inference. It also provides a good overview of how to do Bayesian data analysis properly, including model validation such as prior and posterior predictive checks.
Mar 29, 2023
Hongsup Shin

Interoperability testing for hyperparameter tuning: MLflow, LightGBM, sklearn, and dask-ml

ML
ML Ops
MLflow autologging allows monitoring LightGBM training loss during model training, but this behavior does not always work as expected when scikit-learn and dask are used to tune LightGBM models. This notebook describes how the unexpected behavior manifests and explains some gotchas when using these tools together.
Mar 17, 2023
Hongsup Shin

Comparing type inference methods for mixed data arrays

ML
data
Pandas has two type inference methods. Let’s compare them by inferring data types for mixed-type arrays.
Mar 2, 2023
Hongsup Shin

Zero-shot text-to-image generation

paper
GenAI
There is a lot of hype around generative AI, but how does it actually work? We discuss OpenAI’s DALL-E paper to understand the model architecture and, more importantly, whether their model validation is solid and reasonable.
Dec 15, 2022
Hongsup Shin

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

paper
data
responsible AI
Garbage in, garbage out. It seems like a lot of people in the ML community still don’t understand this logic. We discuss poor data-handling practices and their critical ramifications.
Oct 27, 2022
Hongsup Shin

Tech volunteering tips for nonprofits

volunteering
ML
Lessons I’ve learned from working with various nonprofit organizations such as DataKind and Texas Justice Initiative.
May 25, 2021
Hongsup Shin

Police shooting in Texas 2016-2019

criminal justice
visualization
volunteering
A Jupyter notebook analyzing police shootings in Texas from 2016 to 2019 (done in collaboration with Texas Justice Initiative).
May 24, 2021
Hongsup Shin

 

FAccT 2021. Journalism, data leverage, education, and language models

conference
LLM
responsible AI
ML
Summary of Day 3 at FAccT 2021. Julia Angwin’s The Markup, language models, measurements, and data leverage.
Mar 10, 2021
Hongsup Shin

FAccT 2021. Automated decision-making, causal accountability, and robustness

conference
responsible AI
ML
Summary of Day 2 at FAccT 2021. Automated decision-making, accountability and recourse, and model robustness
Mar 9, 2021
Hongsup Shin

FAccT 2021. AI audit, governance, and trustworthiness

conference
responsible AI
ML
Summary of Day 1 at FAccT 2021. Algorithm audit, impact assessment, data governance, trust in AI, and explainable AI
Mar 8, 2021
Hongsup Shin

 

Tutorials at FAccT 2021

conference
fairness
responsible AI
ML
FAccT 2021 (virtual) tutorial summary. Causal analysis, XAI, and algorithmic impact
Mar 4, 2021
Hongsup Shin

Markdown and GitHub for scientific writing

collaboration
How to use GitHub to publish and review academic manuscripts for better tracking, communication, and transparency
Nov 24, 2020
Hongsup Shin

Efficient bug discovery with ML for hardware verification

ML
verification
My Arm Research blog post about using ML in hardware engineering to make verification more compute-efficient
Sep 22, 2020
Hongsup Shin

Critiquing and rethinking at ACM FAT 2020

conference
fairness
ML
Summary of FAT 2020 in Barcelona, Spain. “Critiquing and rethinking” was their new attempt to open up a discussion among multidisciplinary stakeholders.
Mar 1, 2020
Hongsup Shin

Reflection on USF tech policy and data ethics workshop

ethics
ML
A reflection piece about the USF tech policy and data ethics workshop, written from my own perspective as a tech worker.
Jan 30, 2020
Hongsup Shin

Fairness in ML at ACM FAT 2019

conference
fairness
ML
Several key moments and my thoughts from the FAT (Fairness, Accountability, and Transparency in ML) conference.
Feb 1, 2019
Hongsup Shin