🥈 2nd Place: TalentCLEF 2026 Task B

TalentCLEF 2026 Job-Skill Retrieval

📅 2026 👤 Co-First Author & Corresponding 🏛️ CLEF 2026 Working Notes 🏆 Codabench: TalentCLEF Task B

Overview

TalentCLEF 2026 Task B is a retrieval problem: given a free-text job-title query, rank the full ESCO skill corpus by graded relevance, scored with graded nDCG. It is hard because queries are short and lexically distant from the structured skill records, and the corpus holds thousands of near-synonym aliases the ranker must disambiguate. Our system is a four-stage pipeline that confines fine-tuning to a single bi-encoder stage and otherwise relies on zero-shot LLM inference.
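To make the metric concrete, here is a minimal sketch of a graded nDCG computation. It assumes linear gains and a log2 rank discount, which is the common textbook form; the official TalentCLEF scorer may use a different gain mapping or cutoff. The names `dcg`, `graded_ndcg`, and the toy relevance grades are illustrative, not taken from the task.

```python
import math

def dcg(gains):
    # Discounted cumulative gain: gain at 0-based rank r is divided by log2(r + 2).
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def graded_ndcg(ranked_ids, relevance):
    # relevance maps skill id -> graded label (assumed: higher means more relevant).
    # Skills missing from the mapping count as grade 0.
    gains = [relevance.get(skill_id, 0.0) for skill_id in ranked_ids]
    ideal_dcg = dcg(sorted(relevance.values(), reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example with hypothetical grades.
relevance = {"python": 2, "statistics": 1}
perfect = graded_ndcg(["python", "statistics", "welding"], relevance)
reversed_order = graded_ndcg(["welding", "statistics", "python"], relevance)
```

A perfect ordering scores 1.0, and pushing the highest-graded skill down the list lowers the score, which is why the top of the ranking matters most.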

๐Ÿ† Achievement: Reached 0.7913 graded nDCG on the official test set โ€” 2nd place on the Task B leaderboard. The full pipeline runs end-to-end on a single consumer GPU.

Pipeline

A GIST-fine-tuned bi-encoder ranker produces the initial ranking; two-sided test-time augmentation enriches both sides of the encoder; a two-stage LLM-reranker cascade then refines the top of the ranking with pointwise scoring and a pairwise tournament.

   ┌──────────────────────────────────────────────┐
   │  ①  Job-title query   "data science intern"  │
   └───────────────────────┬──────────────────────┘
                           ▼
   ┌──────────────────────────────────────────────┐
   │  ②  JobBERT ranker (GIST fine-tuned)         │
   │     siamese bi-encoder · scores all          │
   │     9,052 ESCO skills (no top-K cutoff)      │
   └───────────────────────┬──────────────────────┘
                           ▼
   ┌──────────────────────────────────────────────┐
   │  ③  Two-sided test-time augmentation         │
   │     doc-side: alias-explode (max-cosine)     │
   │     query-side: multi-style HyDE             │
   └───────────────────────┬──────────────────────┘
                           ▼
   ┌──────────────────────────────────────────────┐
   │  ④  Pointwise LLM rerank (top 500)           │
   │     Qwen 0–9 relevance · z-score fusion      │
   │     s = 0.3·z(s_be) + 0.7·z(s_llm)           │
   └───────────────────────┬──────────────────────┘
                           ▼
   ┌──────────────────────────────────────────────┐
   │  ⑤  Pairwise tournament (top 150)            │
   │     A/B comparisons · Bradley-Terry win count│
   └───────────────────────┬──────────────────────┘
                           ▼
              ┌──────────────────────────┐
              │  Ranked ESCO skills      │
              └──────────────────────────┘

Stage 2: JobBERT ranker with GIST fine-tuning

Stage 3: Two-sided test-time augmentation

This stage delivers the single largest gain in the pipeline (+0.116 graded nDCG), and it comes purely from indexing and output design, with no model change.

📄 Doc-side: alias-explode

Rather than concatenating all aliases of a skill into one document, each alias is encoded as an independent document. A skill's score is the max cosine over its alias embeddings, capturing whichever alias view best matches the query.
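The max-over-aliases scoring can be sketched with NumPy as follows. This is an illustrative sketch, not the system's actual code: the function name `alias_explode_scores`, the `alias_to_skill` index array, and the toy 2-d vectors are assumptions, and in practice the embeddings would come from the fine-tuned bi-encoder.

```python
import numpy as np

def alias_explode_scores(query_vec, alias_vecs, alias_to_skill, n_skills):
    # query_vec: (d,) query embedding; alias_vecs: (n_aliases, d), one row per alias.
    # alias_to_skill[i] is the index of the skill that alias i belongs to.
    q = query_vec / np.linalg.norm(query_vec)
    a = alias_vecs / np.linalg.norm(alias_vecs, axis=1, keepdims=True)
    cos = a @ q                                  # cosine similarity per alias
    scores = np.full(n_skills, -np.inf)          # skills with no alias stay at -inf
    np.maximum.at(scores, alias_to_skill, cos)   # max-pool aliases into their skill
    return scores

# Toy example: skill 0 has two aliases, skill 1 has one.
query = np.array([1.0, 0.0])
aliases = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
owner = np.array([0, 0, 1])
skill_scores = alias_explode_scores(query, aliases, owner, n_skills=2)
```

In the toy example, skill 0 is scored by its best-matching alias rather than by a blend of both, which is the effect that concatenating aliases into one document would lose.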

🔮 Query-side: multi-style HyDE

For each query, Qwen generates three hypothetical skill descriptions in different styles (long paragraph, one-sentence, keyword list). Each is encoded with the same bi-encoder; the skill score is the max cosine over the original query and the three HyDE views. Max-pooling over all three styles beats any single style.
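A minimal sketch of the query-side max-pooling, assuming the original query and the three generated descriptions have already been embedded with the same encoder. The LLM generation step is omitted; `multi_view_scores` and the toy vectors are hypothetical names and data used only for illustration.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def multi_view_scores(view_vecs, doc_vecs):
    # view_vecs: (n_views, d), the original query plus its HyDE views.
    # doc_vecs:  (n_docs, d), document-side embeddings.
    sims = l2_normalize(doc_vecs) @ l2_normalize(view_vecs).T  # (n_docs, n_views)
    return sims.max(axis=1)                                    # best view per document

# Toy example: two query views, three documents.
views = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
scores = multi_view_scores(views, docs)
```

Because both augmentations pool with a max, the document-side and query-side steps compose naturally: if `doc_vecs` holds alias embeddings, the result can be max-pooled again per skill.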

Stage 4–5: LLM-reranker cascade

Qwen 2.5-7B-Instruct-AWQ is applied zero-shot as a reranker in two complementary modes on top of the bi-encoder output: pointwise relevance scoring followed by a pairwise tournament. The LLM cascade adds +0.043 on the test set.
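The two modes can be sketched as below. The fusion weights (0.3 and 0.7) come from the pipeline diagram; everything else is an assumption made for illustration: z-scores are computed per query over the candidate list, the tournament is a full round-robin, ranking uses raw win counts, and `prefer` is a hypothetical stand-in for the LLM's A/B judgment. The source does not say how pairs are scheduled, and a full round-robin over 150 candidates would need 11,175 comparisons per query.

```python
import numpy as np
from itertools import combinations

def zscore(x):
    # Standardize scores; a constant vector maps to all zeros.
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def fuse(s_be, s_llm, w_be=0.3, w_llm=0.7):
    # s = 0.3 * z(s_be) + 0.7 * z(s_llm), with both vectors over the same candidates.
    return w_be * zscore(s_be) + w_llm * zscore(s_llm)

def tournament_rank(candidates, prefer):
    # prefer(a, b) returns True when a is judged more relevant than b (assumed interface).
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[a if prefer(a, b) else b] += 1
    # Stable sort: ties keep the incoming (pointwise) order.
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

# Toy example: bi-encoder cosines and 0-9 LLM grades for three candidates.
fused = fuse([0.9, 0.5, 0.1], [3, 9, 0])
order = [int(i) for i in np.argsort(-fused)]

toy_relevance = {"a": 1, "b": 3, "c": 2}
final = tournament_rank(["a", "b", "c"], lambda x, y: toy_relevance[x] > toy_relevance[y])
```

Standardizing before mixing puts the cosine scores and the 0–9 LLM grades on a comparable scale, so the weights express relative trust in each signal rather than compensating for their different ranges.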

Tech Stack

Python · PyTorch · Sentence-Transformers · JobBERT-v2 · GIST Loss · HyDE · Qwen 2.5-7B · LLM Reranker · ESCO · Information Retrieval