Machine Learning Engineering

Palo Alto, CA | Engineering

Job Description

About Us

We believe in a world where all information can seamlessly flow between your devices, services, and applications, and you’re never directed to examine a webpage to get an answer to your question. This requires building a new kind of search, one that can see the entire web as structured information rather than as documents.

At Diffbot, we apply computer vision and natural language processing to the problem of structuring information. Located a block from the Stanford campus, Diffbot is the first startup incubated by Stanford StartX and is funded by Sun Microsystems co-founder Andy Bechtolsheim and EarthLink founder Sky Dayton. We’re a small but growing team of world-class machine learning, natural language processing, and web search pioneers. Our APIs currently power many of the world’s largest internet sites.

Quick Facts

  • Team of 8, with a mix of recent grads, serial entrepreneurs, and web veterans

  • Machine learning at web-scale: it’s not just a part of what we do, it *is* what we do

  • Massive datasets (both supervised and unsupervised) and real-time loads, with many classifiers that perform above human-level accuracy

  • Many proprietary and exotic technologies for visual rendering, statistical modeling, and web search

  • Sustainable revenue and growth plan

  • Well-funded with excellent pay and benefits

  • Beautiful environment within walking distance of the Stanford campus and restaurants

The Machine Vision Engineering Role

Machine vision engineers at Diffbot are a resourceful bunch, always looking to squeeze every drop of signal out of a dataset. Unlike machine learning roles at other companies, our goal is to extract the unequivocal truth from a source document, not a subjective ranking, sentiment, or score. Because of this higher standard for accuracy, we’ve had to create new systems for handling training data and invent novel, performance-optimized algorithms.

  • Mix of object classification, scene understanding, and document analysis in a novel setting

  • Derive features by combining signals from disparate sources

  • Invent and test new ML techniques that can generalize to the web

  • Leverage near-infinite amounts of unsupervised training data on fast machines (40 cores, 120 GB RAM, SSD, GPU)

To apply, send an e-mail to jobs(at) introducing yourself to the team.  Let’s create the future of the web together.
