Data Engineer at Hive (open until filled)

About Hive

Hive is a full-stack deep learning platform helping to bring companies into the AI era. We take complex visual challenges and build custom machine learning models to solve them. For AI to work, companies need large volumes of high-quality training data. We generate this data through Hive Data, our proprietary data labeling platform, whose more than 500,000 globally distributed workers generate millions of high-quality pieces of data per day. We then use this training data to build machine learning models for verticals such as Media, Autonomous Driving, Security, and Retail. Today, we work with some of the largest companies in the world to redefine how they think about unstructured visual data. Together, we build solutions that incorporate AI into their businesses to completely transform industries.

We are fortunate that investors like Peter Thiel (Founders Fund), General Catalyst, 8VC, and others see Hive’s potential to be groundbreaking in AI business solutions. We have over 100 rock stars globally in our San Francisco and Delhi offices. Please reach out if you are interested in joining the AI revolution!

If you’re interested in applying, please reach out to Kristine at kristine@thehive.ai with a copy of your resume.

Data Engineer Role

To execute our vision, we need to grow our team of best-in-class data engineers. We are looking for developers who follow impeccable data practices and build high-quality data infrastructure. We value hard workers who are comfortable improvising solutions to a stream of big data challenges while building a system that stands the test of time. Our ideal candidate has experience building data infrastructure from the ground up, contributes innovative ideas and ingenious implementations to the team, and is capable of planning out scalable, maintainable data pipelines.

As a data engineer, you would at first work primarily on our Hive Media product, taking real-time data from hundreds of television streams and turning it into a combination of real-time and scheduled outputs, especially our signature ads feed. Your work would improve the quality of our results while reducing computational cost and latency. Expect truly novel challenges.

Responsibilities

    • Writing scheduled Spark pipelines that execute sophisticated query plans across the entirety of our datasets
    • Writing real-time pipelines that execute complex operations on incoming data
    • Synchronizing large amounts of data between unstructured and structured formats on various data sources
    • Creating testing and alerting for data pipelines
    • Building out our data infrastructure and managing dependencies between data pipelines
    • Determining and implementing metrics that provide visibility into our data quality

Requirements

    • You have an undergraduate and / or graduate degree in computer science or a similar technical field, with a sound understanding of statistics
    • You have 1-2 years of industry experience as a data engineer
    • You have hands-on experience doing ETL and have written data pipelines in either Spark or MapReduce
    • You have a sound understanding of SQL or CQL
    • You have worked with data lakes such as S3 or HDFS
    • You have worked with various databases, such as Postgres, Cassandra, or Redshift, and understand their pros and cons
    • You have a working knowledge of the following technologies, or are not afraid of picking them up on the fly: Mesos, Chronos/cron, Marathon, Jenkins
    • You are fluent in at least one scripting language (preferably JavaScript on Node.js or Python) and one compiled language (such as Scala, Java, or C)
    • You have great communication skills and the ability to work with others
    • You are a strong team player, with a do-whatever-it-takes attitude

What We Offer You

We are a group of ambitious individuals who are passionate about creating a revolutionary machine learning company. At Hive, you will have a significant career development opportunity and a chance to contribute to one of the fastest growing AI startups in San Francisco. The work you will do here will have a noticeable and direct impact on the development of Hive.

Our benefits include competitive pay, equity, health / vision / dental insurance, catered lunch and dinner, and a corporate gym membership.

Thank you for your interest in Hive.
