Ben Wortman | Data Scientist

About me

I'm a lead data scientist at FINRA with a strong background in natural language processing (NLP). With a deep-rooted interest in data and its transformative potential, I have channeled my expertise into deciphering the complex world of financial markets and regulatory compliance through data-driven insights. I have a master's in Informatics from Penn State University, where my research topics included deep learning (DL) computer vision for emotion recognition and the use of NLP to inform predictions on in-the-wild emotion recognition tasks.

I love coding, and I always tell people I think it's one of the freest forms of expression: if you can dream it, you can program it. For my DL projects I prefer PyTorch, but I'm also proficient in TensorFlow/Keras. Within machine learning, other topics I'm interested in include explainable AI (XAI), uncertainty quantification, and parallel/big data computing.

In addition to programming personal projects, I have quite a few hobbies. Lately I've been spending most of my free time writing music, exploring state parks, and playing DnD with my friends.

View Resume

Select Projects

Hidden Adversarial Patch Attack for Optical Flow

Adversarial patches have been of interest to researchers in recent years because they are easy to deploy in real-world attacks. In this paper I expand upon previous research by demonstrating a new hidden patch attack on optical flow. By altering the patch's transparency during training, I generate patches that are invariant to their background, meaning they can be inconspicuously applied to any number of objects using a transparent film.
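As a rough illustration of the transparency idea (not the code from the paper), the sketch below alpha-blends a patch over a background with a randomly drawn opacity, which is one plausible reading of "altering the transparency during training". The function names, the toy 4x4 image, and the 0.2 to 0.8 opacity range are all assumptions made for this example.

```python
import random


def blend_pixel(patch_value, background_value, alpha):
    """Alpha-composite one patch value over one background value (both in [0, 1])."""
    return alpha * patch_value + (1.0 - alpha) * background_value


def apply_patch(image, patch, top, left, alpha):
    """Return a copy of `image` (2D list of floats) with `patch` blended in at (top, left)."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for j, patch_value in enumerate(patch_row):
            out[top + i][left + j] = blend_pixel(
                patch_value, image[top + i][left + j], alpha
            )
    return out


def sample_alpha(rng, low=0.2, high=0.8):
    """Draw an opacity for one training step (the range is an assumed placeholder)."""
    return rng.uniform(low, high)


rng = random.Random(0)
image = [[0.5] * 4 for _ in range(4)]   # toy grayscale background
patch = [[1.0, 0.0], [0.0, 1.0]]        # toy 2x2 patch
patched = apply_patch(image, patch, top=1, left=1, alpha=sample_alpha(rng))
```

Because the opacity changes from step to step, a patch optimized this way cannot rely on fully covering whatever is behind it, which is the intuition behind background invariance.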

Manuscript Source Code

Image Defencing GAN

An end-to-end model for the simultaneous detection and inpainting of images obstructed by fences. This was achieved by overlaying raw images with existing fence masks and training to minimize SSIM, L2, and adversarial losses. Despite training exclusively on synthetic data, this method was able to generalize to unseen, real-world data and effectively inpaint ~90% of the test images.

Manuscript Source Code

Uncertainty Quantification in Autonomous Vehicles

During my summer as an intern at Carnegie Mellon SEI, I prototyped an interface for autonomous drone controllers. This included training a DL Bayesian object detector, modifying a DL depth estimator with MC dropout to give uncertainty estimates on distance, and finally localizing objects using the camera's intrinsic projection matrix and GPS coordinates to give users a bird's-eye view of detected objects in the field.
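The sketch below illustrates two of the steps in simplified form, and is not the prototype's code. It summarizes repeated stochastic depth predictions (as MC dropout would produce) into a mean and a spread, then back-projects a pixel into camera coordinates with a pinhole model. The sample values and the intrinsics `fx`, `fy`, `cx`, `cy` are made-up numbers, and the GPS step that would place the point on a map is left out.

```python
import statistics


def summarize_mc_samples(depth_samples):
    """Mean and sample standard deviation over repeated stochastic forward passes."""
    return statistics.fmean(depth_samples), statistics.stdev(depth_samples)


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole model: pixel (u, v) at a given depth -> camera-frame (x, y, z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


# Hypothetical depth predictions (meters) for one detection across five dropout passes.
samples = [9.8, 10.1, 10.0, 10.3, 9.8]
mean_depth, depth_std = summarize_mc_samples(samples)

# Hypothetical intrinsics for a 1280x720 camera.
point = backproject(u=700, v=360, depth=mean_depth,
                    fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```

The standard deviation is what an interface could surface as the uncertainty on distance, while the back-projected point is the starting position for a top-down view.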

View Storyboard

HICEM: High-Coverage Emotion Modeling for Artificial Emotional Intelligence

Using NLP word embeddings from Facebook’s fastText model, I was able to produce an updated emotion model which provides a 5% increase in performance using only 46% of the labels. For annotation, this increases the amount of information each label carries while nearly halving the cost of an equivalent dataset.

Manuscript

In-the-wild Emotion Recognition with Semantic Loss

Since emotion labels are not completely independent, I am using semantic embeddings as targets to encode additional information into the model's training. In addition, I have trained a complementary model using transfer learning from a pretrained ResNet to make predictions on Laban Movement Analysis (LMA) components, which have been shown to be correlated with emotion. I have also crowdsourced an additional dataset annotated for human interaction to be processed in a separate channel.
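To show why embedding targets carry more information than independent labels, here is a minimal sketch that scores a prediction by cosine distance to the label's embedding. The three-dimensional vectors are invented for the example, and cosine distance is an assumed choice of loss; the project may use a different formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_loss(prediction, target_embedding):
    """Cosine distance: 0 when the prediction points the same way as the target."""
    return 1.0 - cosine_similarity(prediction, target_embedding)


# Toy embeddings (made up); real word embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "joy": [0.9, 0.1, 0.0],
    "happiness": [0.8, 0.2, 0.1],
    "anger": [-0.7, 0.6, 0.1],
}

prediction = EMBEDDINGS["happiness"]
near = semantic_loss(prediction, EMBEDDINGS["joy"])    # semantically close label
far = semantic_loss(prediction, EMBEDDINGS["anger"])   # semantically distant label
```

With one-hot labels, predicting "happiness" for a "joy" sample is as wrong as predicting "anger". With embedding targets, the near miss is penalized far less than the distant one.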

Coming Soon!

Probabilistic Attribution for Walled Garden Ad Impressions

While working as an intern at Impact Radius, I matched walled garden ad impressions to customer conversion data in order to give customers insight into how their ad campaigns were performing. To do this, I created features from browser metadata and census demographic data local to the conversion IP before using a fuzzed decision tree algorithm for classification. I also developed a preprocessing step that flagged 30% of conversions as fraudulent, TV promo code-linked conversions. Filtering these out with an SVM dramatically improved both the predictive performance and the explainability of the model (Shapley values).

Other Fun Things!

Tarot Chat

A fun personal project. I built a Streamlit chatbot powered by GPT-3.5 that provides custom tarot card readings.

Chat

SkyThoughts

Another fun personal project. I built a Dash app powered by GPT-4 that generates mind maps for brainstorming new ideas.

Brainstorm!

Deep Fried Startups

Keeping with the theme of brainstorming, I built a startup idea generator powered by GPT-4o-mini.

Changing Seasons

For fun, I built a web crawler to gather ~10k images online and then trained a GAN to switch the seasons in the images.

Source Code

Collage Builder

To get some practice with interfaces, I programmed a simple collage builder application for my younger brother's Etsy page.

Store Source Code

Conway's Game of Life

One of my first projects. I explored interface design using Conway's Game of Life. I included tunable parameters so users can explore how changing parameters affects the simulation.
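For readers who haven't seen the simulation, here is a minimal sketch of one update step with the rule exposed as parameters. The `birth` and `survive` arguments and the wrap-around edges are assumptions chosen for this example; the project's actual tunable parameters may differ.

```python
def count_neighbors(grid, row, col):
    """Count live cells among the eight neighbors, wrapping around the edges."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[(row + dr) % rows][(col + dc) % cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )


def step(grid, birth=(3,), survive=(2, 3)):
    """Advance one generation; the defaults are Conway's standard rule."""
    new_grid = []
    for r, row in enumerate(grid):
        new_row = []
        for c, alive in enumerate(row):
            n = count_neighbors(grid, r, c)
            rule = survive if alive else birth
            new_row.append(1 if n in rule else 0)
        new_grid.append(new_row)
    return new_grid


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
next_generation = step(blinker)
```

Passing different tuples, for example `step(grid, birth=(3, 6))`, changes the rule and shows how sensitive the patterns are to small changes.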

Demo Source Code

Music

During my free time I love to make music. I play guitar, piano, and drums. On weekends you can usually catch me playing out with my band.

Spotify Webstore
