ML/AI Engineer
Description:
Our customer is a U.S.-based data center operations company that runs a DC monitoring platform for tracking energy efficiency, uptime compliance, and anomaly detection across infrastructure. The team is building an agentic AI layer on top of their existing data pipeline — and we're looking for an ML/AI Engineer to own this stream end-to-end.
Requirements:
● 5+ years of commercial experience as an ML or AI Engineer
● strong Python 3.11 skills — production ML serving code, async handlers
● experience with scikit-learn: Isolation Forest, model serialization, scoring threshold selection
● hands-on experience with time-series analysis: stationarity, rolling statistics, establishing anomaly baselines
● experience building agentic workflows with LangGraph: stateful agent graph design, tool node definition, memory integration
● experience with Anthropic API (Claude) or similar LLM providers: tool_use message format, system prompt engineering, retry handling
● experience with Azure Functions: writing Python handlers, cold-start optimization
● familiarity with Snowflake SQL API: parameterized query execution from Python
● experience with MLflow or equivalent: experiment tracking, model registry, artifact versioning
● prompt engineering skills: system prompt design, tool description optimization, human-in-the-loop flow design
● English level: Upper Intermediate or higher
Would be a plus:
● experience with predictive maintenance in industrial or data center environments
● prior production deployment of LLM-based agents with tool use and human oversight controls
● familiarity with alternative anomaly detection approaches: LSTM autoencoders, statistical process control
Responsibilities:
● train and deploy an Isolation Forest anomaly detection model on UPS load and CRAC delta-T time series data
● deploy model inference as an Azure Function (Python 3.11, scikit-learn) with latency SLA compliance
● version all model artifacts in Azure Blob Storage with rollback support
● design and implement a LangGraph multi-tool agent with tools for querying Snowflake, sending ops alerts, generating reports, and retrieving baselines
● implement short-term (in-process) and long-term (Snowflake memory table) memory for the agent
● build a human-in-the-loop approval step for all agent write actions with audit logging
● develop scheduled agents (cron): daily PUE optimization report, weekly Uptime Institute compliance check, monthly energy summary
● set up event-triggered agents: anomaly webhook, alarm escalation chain, root cause analysis report
● version all agent prompts, tool definitions, and LangGraph workflow graphs in Azure DevOps
● communicate directly with the client to clarify requirements and present progress
Why Rolique?
● we believe in fairness, transparency and helpfulness in everyday work
● your personal development is important to us — we promote internal knowledge transfer and strengthen your "zone of genius"
● 20 days of paid vacation and 5 days of sick leave
● personal budget for courses, training, and certifications
● health support and sports compensation
● accounting support

