BATUHAN DURMUS
Backend + LLM Systems Engineer · Journalism Technologist
Remote · EU/UK/US timezones | batu@batu0.com | https://batu0.com | https://github.com/avakado0 | https://linkedin.com/in/batu0
Open to remote contract, full-time, and fractional engagements.

SUMMARY
───────
Backend developer and systems builder with 3+ years of production experience delivering complete systems from scratch — architecture, scraping infrastructure, data pipelines, REST APIs, AI agent workflows, and cloud deployment in Python and Node.js. Scopes, builds, and hands off production-ready multi-service backends on GCP and AWS. Background includes 8+ years of investigative journalism (2015-2023) with OSINT research and data-driven reporting expertise. 5.0/5.0 rating on Upwork.

TECHNICAL SKILLS
────────────────
Languages: Python, JavaScript, Node.js, TypeScript, Solidity
Frameworks: Django REST Framework, FastAPI, Flask, Express.js, React 18
AI, Machine Learning, and Agent Development: Machine Learning, Natural Language Processing (NLP), LangChain, LangGraph, OpenAI API, Gemini API (incl. Google Generative AI SDK), Claude API, Model Context Protocol (MCP), Vertex AI (image generation, Chirp TTS), Retrieval-Augmented Generation (RAG), FAISS vector search, Sentence Transformers, Hugging Face Transformers, custom multi-agent orchestration, PyTorch, TensorFlow
Blockchain and Smart-Contract Auditing: Solidity (0.8+), Hardhat, Foundry, OpenZeppelin, Uniswap SDK, Halmos formal verification, Solodit and BlockThreat intel integration; built a custom Python auditing pipeline; multi-protocol audit workflows (staking, lending, stablecoin protocols)
APIs and Protocols: REST API design, WebSocket, OAuth, OpenAPI, Swagger, JSON, gRPC-style internal contracts, webhooks
Architecture: Microservices, event-driven systems, caching strategies, multi-layer storage, idempotent execution design, deterministic state machines
Web Scraping and Automation: Scrapy, BeautifulSoup, lxml, Selenium, Puppeteer, Requests, XPath, CSS selectors, anti-bot bypass, proxy management
Databases: PostgreSQL, MySQL, MongoDB, Redis, Neo4j, SQLite, SQL design and optimization, NoSQL
Data Processing and NLP: Pandas, NumPy, scikit-learn, SpaCy, NLTK, Pydantic, topic modeling, text classification, named entity recognition, PDF extraction, OCR (PyPDF2, pdfplumber, pytesseract)
Testing: pytest, unit testing, integration testing
Media Processing: FFmpeg, MoviePy, Whisper (speech-to-text), OpenVoice/EmotiVoice (TTS), Stable Diffusion, Stable Video Diffusion, Blender, Pillow, NVIDIA CUDA GPU acceleration
Cloud and Containerization: Docker, Docker Compose, Google Cloud Run, Google Artifact Registry, Google Cloud Scheduler, Google Cloud Storage, Google DocumentAI, GCP Secret Manager, Firebase Admin SDK, Azure Microsoft Authentication Library (MSAL), AWS EC2, AWS S3, AWS Lambda, Stripe, GPU passthrough
CI/CD and Orchestration: CI/CD pipelines, Git, GitHub, Make, Cloud Build, serverless deployments, deployment and component diagramming
Security and Operations: Linux, Bash scripting, API authentication, OAuth, Redis token-bucket rate limiting, API Gateway IAM, operational security for sensitive datasets

PROFESSIONAL EXPERIENCE
───────────────────────
Freelance Backend Developer and Systems Builder
Upwork and Private Clients | Dec 2023 - Present | Remote
5.0/5.0 Upwork rating with $40K+ earned on the platform across both freelance engagements below.

VLM Training-Data Pipeline (VNGRS) — Dec 2025 - Jan 2026
Config-driven Python pipeline for vision-language-model training data; ~295 commits.
- Architected ingestion from HuggingFace, Kaggle, GitHub, and scraped PDFs into a unified training-corpus format.
- Built a resumable state machine handling long-running fetches with checkpointing across AWS EC2/S3.
- Made the pipeline single-config-file driven so the data team can spin up new corpus targets without code changes.

Bitcoin Analytics Platform (My Crypto Canvas) — Dec 2024 - Dec 2025 (delivered)
Built the entire backend from first commit to production for a real-time Bitcoin data platform.
- Architected and deployed 15+ microservices on Google Cloud Run — scraping layer, API connectors, data ingestion pipelines, caching, WebSocket streaming, authentication, and AI-powered analytics.
- Built scrapers and API connectors ingesting 80+ live data points from blockchain explorers, financial APIs, and web sources using Scrapy, Puppeteer, and custom REST integrations.
- Designed a multi-layer storage architecture with MySQL, MongoDB, and Redis; implemented Redis-based caching for live metrics with MongoDB fallback.
- Built a Node.js WebSocket service for real-time price, volume, and RSI streaming with shared caching across REST APIs.
- Integrated GCP API Gateway for REST API authentication, quotas, and IAM roles; developed Redis-backed WebSocket authentication with single-connection enforcement.
- Engineered LangGraph reflection workflows (TextEnricher -> EnrichmentReflection -> ReflectionRealization with MemorySaver checkpointer) for daily and weekly Bitcoin, ETF, vibe-meter, and asset-comparison summaries — schema-gated regeneration, bounded iterations, and LangChain SQLiteCache-backed prompt/response caching.
- Routed 5 LLM providers behind a unified retry-and-backoff layer (LLMClient.post_with_retries, exponential backoff, asyncio.Semaphore concurrency caps): Gemini 2.5, OpenAI GPT-4, Vertex AI for embeddings / image generation / TTS, and Perplexity Sonar.
- Built statistical analysis endpoints including ATH/ATL calculations and time-based reaggregation.

AI Newsroom Automation System (My Crypto Canvas Phase 4) — Oct 2025 - Dec 2025 (delivered)
Designed, built, and deployed the full pipeline from architectural spec to production.
- Delivered a 3-microservice cloud-native AI news automation system on Google Cloud Run using FastAPI and MongoDB Atlas.
- News Fetching Service: dual discovery engine (Gemini Grounded Search + Perplexity Sonar) with batch normalization and two-layer deduplication — SHA256 URL+title fingerprints in Redis (24h TTL, beat-namespaced), then Vertex AI embedding cosine similarity with a deterministic hash-embedding fallback.
- Newsroom Service: 5-persona LangGraph state machine (Ava / Orion / Lex / Knox / Sato, beats externalized to JSON) reusing the shared reflection-loop architecture, gated by a JSON contract (tldr <= 400 chars, >= 4 body paragraphs, 2-5 cited URLs, image prompt) with Pydantic validate_article before persistence — bad generations never reach the blog DB.
- News Studio Service: Vertex AI ImageGenerationModel (imagegeneration@006, 16:9, ukiyo-e template) for article imagery and Google Cloud Text-to-Speech (texttospeech_v1, Neural2 voices) for voice digests, uploaded to Google Cloud Storage with public URL generation.
- Automated scheduling via Cloud Scheduler for 6-hourly news fetches and daily audio digest generation.
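The two-layer deduplication in the News Fetching Service can be sketched roughly as follows. This is an illustrative sketch, not the production code: an in-memory dict stands in for Redis (production uses a 24h TTL, e.g. via SETEX), the hash-embedding fallback stands in for Vertex AI embeddings, and all names here (fingerprint, is_duplicate, the 0.95 threshold) are assumptions for the example.

```python
import hashlib
import math
import time

TTL_SECONDS = 24 * 3600           # production: Redis key with 24h TTL

_seen: dict[str, float] = {}      # fingerprint -> expiry time (stand-in for Redis)
_embeddings: list[list[float]] = []  # embeddings of accepted articles

def fingerprint(url: str, title: str) -> str:
    """Layer 1: exact-duplicate key derived from URL + title."""
    return hashlib.sha256(f"{url}|{title}".encode()).hexdigest()

def hash_embedding(text: str, dim: int = 64) -> list[float]:
    """Deterministic fallback embedding used when the real embedding API is unavailable."""
    digest = hashlib.sha256(text.encode()).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_duplicate(url: str, title: str, body: str, threshold: float = 0.95) -> bool:
    # Layer 1: cheap exact check on the URL+title fingerprint.
    key = fingerprint(url, title)
    now = time.time()
    if _seen.get(key, 0.0) > now:
        return True
    _seen[key] = now + TTL_SECONDS
    # Layer 2: near-duplicate check via embedding cosine similarity.
    emb = hash_embedding(body)
    if any(cosine(emb, prior) >= threshold for prior in _embeddings):
        return True
    _embeddings.append(emb)
    return False
```

The layering matters for cost: the fingerprint check rejects resubmitted stories without ever touching an embedding API, and the deterministic fallback keeps the second layer functional when the API is down.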
Backend Architecture and System Design — Icebreaker (Dolph) — Jan 2026 - Feb 2026 (delivered)
Designed and delivered the full system architecture from scratch — from architectural spec through implementation, testing, and multi-phase delivery. ~177 commits on the backend.
- Architected the backend for an AI-powered Chrome extension using Gemini Live for real-time information capture.
- Designed and implemented a deterministic state machine with an approval-first trust model, immutable action contracts, and idempotent execution guarantees.
- Built the end-to-end backend loop: text-based intent parsing, deterministic action composition, session-based queuing, explicit human approval, lawful state transitions, and gated execution.
- Architected the Chrome extension as a pure sensing surface (Phase 2): content-script DOM extraction with confidence scoring, background service-worker transport, and a read-only verification panel.
- Established frozen architectural contracts with versioned change requirements and comprehensive test coverage.

Auth Token Extraction — Jul 2024
- Solved a complex authentication challenge for a California SOS portal that had stumped the client's development team. Extracted and automated auth token capture using browser automation.

Freelance Web-Scraping, Data Extraction and Automation Developer
Upwork | Jan 2022 - Dec 2023 | Remote
Entry into paid engineering work — scraping, crawling, and browser-automation projects for commercial, academic, and investigative clients. Established the 5.0/5.0 Upwork rating that the backend engagements above continue to uphold.
- Wrote production scrapers spanning real-estate portals, academic resources, cryptocurrency trackers, government institutional and service websites, and social-media platforms using Scrapy, Requests, Selenium, lxml, and BeautifulSoup (Python) plus Puppeteer (Node.js).
- Built Twitter applications via the Twitter API — parsers, processors, and bots capable of human-user tasks as well as basic cognitive tasks via NLP libraries (NLTK, SpaCy).
- Designed rule-based link crawlers, request-prioritization queues, and scraping pipelines (output formats, database connections, duplicate filtering) with rotating proxy / user-agent pools and randomized delays for bot-detection bypass.
- Ticketweb / Ticketmaster / Vividseats daily scrapers — Scrapy-based daily runs for Upwork clients, pushing updated scraping stats to a Google Sheets URL after each run.
- Flight Archive Scraper — historic flight archives of Ankara and other TAV-operated airports, used to compute airline-delay averages.
- News Archive Scraper — crawled news sites backward in time into a database for archival and search.
- Court Files Parser — extracted official hearing records and indictments from a free-journalism site that had run 4 years of trial observation, exporting crimes, judges, and prosecutors to CSV.
- Literature Research Scraper — JSTOR / Web of Science article-abstract extraction.
- HSBC Bank Statement Calculator — PDF parser collapsing same-day rows (Camelot, PDFPlumber, PyPDF2).

Selected Technical Projects (Independent)

YouTube Production System (YPS2) — Solo Full-Stack Build
Designed and built an end-to-end automated YouTube video production platform from scratch as sole developer and system architect.
- 15+ containerized microservices orchestrated via Docker Compose with GPU passthrough (NVIDIA CUDA).
- Django REST Framework backend serving a React 18 single-page application.
- LLM-powered content pipeline using LangChain and OpenAI for topic research, narrative planning, and script generation; custom AI agent framework for autonomous multi-step content workflows.
- RAG system with FAISS vector search for knowledge-grounded script generation.
- Automated video assembly engine (MoviePy) with dynamic panning, transitions, and image alignment.
- TTS pipeline with voice conversion (OpenVoice/EmotiVoice); speech-to-text (Whisper); Stable Video Diffusion and Blender 3D rendering integration.
- Multi-source research engine aggregating DuckDuckGo, Google APIs, Wikipedia, and Tavily; Telegram integration for content ingestion.
- Go-based YouTube upload service alongside a Flask API gateway.
- Full pipeline from research to publish with minimal human intervention.

Legal Data Mining and NLP Pipelines
- Scraped and processed 8M+ Turkish Court of Cassation and 12K Constitutional Court decisions. Built parsing and analysis pipelines using NLP techniques including categorization, topic modeling, and text enrichment for investigative use.

BitzeOOP Judicial Appointments Analytics
- Built a Python data-processing and analytics pipeline for 60K+ judicial appointments over 13 years, transforming PDF and CSV source data into normalized person-level career histories.
- Designed data-quality workflows for duplicate detection, problematic-row handling, and manual review of ambiguous records to improve reliability on messy public-record datasets.
- Modeled people, posts, appointments, and locations with Neo4j-backed graph relationships and developed transition analysis, anomaly detection, and visualization workflows for judicial career movements.
- Produced statistical analyses of judicial assignments that informed 2 investigative reports for EuroMed Rights.

Historical Content Enrichment System
- Built a 7-microservice architecture integrating Wikipedia APIs and RAG from multiple books to produce enriched historical narratives.

Turkish Parliament MPs Performance-Stats Scraper / Parliamentary Data Analytics Platform (Dec 2021)
- The first coding project after the journalism pivot.
Scraper extracting question proposals, research proposals, parliamentary questions, and MP resumes from the parliament website (item pipelines aggregating parallel pages), plus an analytics layer calculating performance metrics for MPs and political parties. Later commercialized.

TBMM Observation Engine (tbmm_observation_engine)
- Turkish Grand National Assembly observation pipeline reading the daily parliamentary record (tutanak). Scrapy with Puppeteer middleware; ~68 commits. Successor to the earlier MPs performance-stats scraper — same target, larger scope.

Journalist-Facing Legal Parser
- Developed a parsing tool for court decisions and records enabling reporters to quickly access case details.

EARLIER EXPERIENCE (JOURNALISM AND NGO)
───────────────────────────────────────
Subject Matter Expert
EuroMed Rights (Contract) | Nov 2023 - May 2024 | Copenhagen, Denmark — Remote
Contract that grew out of my personal BitzeOOP judicial-appointments project. After roughly 700 hours of solo data engineering on the pipeline, EuroMed Rights picked it up and commissioned deliverables consumed by their research output on judicial independence and human-rights conditions in the Euro-Mediterranean region. The BitzeOOP work itself is the bridge between the journalism track and the backend-engineering track; this contract was its first external pickup.

Investigative Journalist and OSINT Researcher
Freelance and NGOs | 2015 - 2023
- Trial Observation Programme Officer — Amnesty International Turkey (Oct 2019 - Sep 2023): Attended and reported on court hearings involving free speech, press freedom, and impunity. Produced detailed reports, liaised with lawyers and rights holders, and published explanatory articles and internal-magazine pieces on court architecture and the principle of publicity.
- Investigative Research Scholarship — P24 Independent Journalism Foundation (Jun 2019 - Feb 2021): 1 of 4 scholarship winners.
Investigated Istanbul's taxi industry through 50+ information requests to 8 state institutions. Findings published in T24 (May 2021) and featured in a YouTube documentary.
- Freelance News Producer (Fixer) — Various Outlets (Nov 2015 - Oct 2019): Produced news projects for The Guardian, The Financial Times, The Daily Telegraph, Trouw, NRC Handelsblad, NOS Radio, Der Tagesspiegel, RTL News, Middle East Eye, and Il Manifesto.
- Turkey and Middle East Consultant and News Producer — The Yomiuri Shimbun (Apr 2017 - Dec 2018): Supplied political, economic, social, and pop-culture reports from Turkey to Cairo HQ. Arranged interviews under coup, post-terror, and state-of-emergency conditions — including detainment experiences. Collaborated with the Cairo, Jerusalem, Tehran, and Rome offices.
- Freelance News Producer — The Yomiuri Shimbun (Jun 2015 - Apr 2017): Created news frameworks, scheduled interviews, and conducted research for correspondents.
- Contributor — "De autoritaire verleiding" (The Authoritarian Temptation) by Casper Thomas (Oct 2018): Supported investigative research for a book on authoritarian regimes in Turkey, Russia, Hungary, India, and the US.

NGO and Civil Society Work
- Project Manager — Zero Discrimination Foundation (Mar - Jun 2016): Coordinated the launch of an EU-funded project on Turkish Roma groups with multiple NGOs nationwide.
- Lobbyist, Organizer, and Translator — DurDe Movement (Oct 2013 - Jan 2016): Organized anti-racist NGO events and commemorations with Armenian diaspora groups.
- Member and Organizer — 3H Movement (the first liberal youth association in Turkey): Organized events at the Friedrich Naumann Foundation; gave a lecture on comparative religious sociology ("Marketplace of Religion Theory in the US").
- Youth Ambassador — Anne Frank House, Amsterdam.

EDUCATION AND TRAINING
──────────────────────
- B.A. in Sociology — Bogazici University, 2019.
Ranked 77th of ~2M on the national university entrance exam; financed studies via the Top 100s' Scholarship (government) and the TÜBİTAK 2205 License Scholarship Programme.
- Nilufer Science High School, Bursa — Full scholarship (ranked 2,300 of 1.2M, 2007 OKS). First programming exposure (C).
- Investigative Journalism Scholarship — P24 Independent Journalism Foundation, 2019.
- 140+ Technical Courses — Udemy and Coursera: Python, Data Engineering, NLP, Deep Learning (5-course specialization), LangGraph/LLM Agents, Web Scraping, API Development, Software Architecture, Intelligence Analysis, Knowledge Graphs.

LANGUAGES
─────────
- English — Professional Working Proficiency (consecutive and simultaneous translation experience)
- Turkish — Native
- Latin — A1
- Ottoman Turkish — Low-level reading (for classical philosophy and archival work)