Experimental Results & Model Performance

Repository & Resources

Model Performance Summary

Key Findings

  1. Model Size Impact: Models under 7B parameters fail to grasp the task and struggle to emit correctly structured output. The team's consensus is to use models no smaller than 7B parameters.
  2. Prompt Engineering: Removing CSV format references from prompts improves model performance; the format reference was confusing some models.
  3. Conversation Clustering Optimization: The human-labeled data contains one topic per conversation. Prompts should be adjusted to account for this when optimizing for benchmarks.
  4. Data Processing Improvements:
  5. Hardware Requirements: More GPU capacity is needed to enable parallel experimentation. The current setup, a single low-end GPU, makes parallel testing impossible.

Technical Improvements

Script Enhancements

  1. Prompt Preprocessing:
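One way the prompt preprocessing step could implement Finding 2 (dropping CSV format references) is to filter out sentences that mention the format. This is a hypothetical sketch; the function name and sentence-splitting heuristic are assumptions, not the project's actual script:

```python
import re

def strip_csv_references(prompt: str) -> str:
    """Drop sentences that mention CSV from a prompt.

    Hypothetical preprocessing helper: splits on sentence-ending
    periods and keeps only sentences without a CSV mention.
    """
    sentences = re.split(r"(?<=\.)\s+", prompt)
    kept = [s for s in sentences if "csv" not in s.lower()]
    return " ".join(kept)

prompt = (
    "Label each conversation with its topic. "
    "Return the result in CSV format with a header row. "
    "Use one label per conversation."
)
print(strip_csv_references(prompt))
```

Running the preprocessed and original prompts side by side would make the performance effect of the format reference measurable per model.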