feat: sync full workspace including web modules, docs, and configurations to Gitea
Optimized the root .gitignore to exclude virtual environments, node modules, and temp folders to ensure clean and lightweight version tracking.

Co-authored-by: Cursor <cursoragent@cursor.com>
30
axhub-make/skills/third-party/deep-research/.gitignore
vendored
Normal file
@@ -0,0 +1,30 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python

# Virtual environments
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Research output (kept local)
*.json

# Test output
.pytest_cache/
.coverage
htmlcov/

495
axhub-make/skills/third-party/deep-research/ARCHITECTURE_REVIEW.md
vendored
Normal file
@@ -0,0 +1,495 @@
# Deep Research Skill: Architecture Review & Failure Analysis

**Date:** 2025-11-04
**Purpose:** Comprehensive quality check against industry best practices and known LLM failure modes

---

## Executive Summary

**Status:** PRODUCTION-READY with 3 optimization recommendations

**Critical Issues:** 0
**Optimization Opportunities:** 3
**Strengths:** 8

---

## 1. COMPARISON TO INDUSTRY IMPLEMENTATIONS

### vs. AnkitClassicVision/Claude-Code-Deep-Research

| Feature | Their Approach | Our Approach | Winner |
|---------|---------------|--------------|--------|
| **Phases** | 7 (Scope→Plan→Retrieve→Triangulate→Draft→Critique→Package) | 8 (adds REFINE after Critique) | **Ours** (gap filling) |
| **Validation** | Not documented | Automated 8-check system | **Ours** |
| **Failure Handling** | Not documented | Explicit stop rules + error gates | **Ours** |
| **Graph-of-Thoughts** | Yes, subagent spawning | Yes, parallel agents | **Tie** |
| **Credibility Scoring** | Basic triangulation | 0-100 quantitative system | **Ours** |
| **State Management** | Not documented | JSON serialization, recoverable | **Ours** |

**Verdict:** Our implementation is MORE ROBUST with superior validation and failure handling.

---

## 2. ALIGNMENT WITH ANTHROPIC BEST PRACTICES

### From Official Documentation & Community Research

✅ **PASS: Frontmatter Format**
- Proper YAML with `name:` and `description:`
- Description includes triggers and exclusions

✅ **PASS: Self-Contained Structure**
- All resources in single directory
- Progressive disclosure via references
- No external dependencies (stdlib only)

⚠️ **WARNING: SKILL.md Length**
- Current: 343 lines
- Best practice recommendation: 100-200 lines
- Official Anthropic: "No strict maximum" for complex skills with scripts
- **Assessment:** ACCEPTABLE given complexity, but could optimize

✅ **PASS: Context Management**
- Static-first architecture for caching (>1024 tokens)
- Explicit cache boundary markers
- Progressive loading (not full inline)
- "Lost in the middle" avoidance

✅ **PASS: Plan-First Approach**
- Decision tree at top of SKILL.md
- Mode selection before execution
- Phase-by-phase instructions

---

## 3. FAILURE MODE ANALYSIS

### Based on Research: "Why Do Multi-Agent LLM Systems Fail?" (arXiv:2503.13657)

#### 3.1 System Design Issues

**ISSUE: No referee for correctness validation**
- ✅ **MITIGATED:** We have automated validator with 8 checks
- ✅ **MITIGATED:** Human review required after 2 validation failures

**ISSUE: Poor termination conditions**
- ⚠️ **PARTIAL:** Our modes define phase counts but no explicit timeout enforcement
- **RECOMMENDATION:** Add max time limits per mode in SKILL.md
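A cooperative deadline check is one way such limits could be enforced. The sketch below is only an illustration: `Deadline`, `run_phases`, and the `TIME_LIMITS_MIN` table are hypothetical names, and the minute values are the ones this review proposes under "Add Explicit Timeout Enforcement". It checks the clock between phases and does not interrupt a phase that is already running.

```python
import time

# Limits in minutes, copied from the "Time Limits" recommendation in this review.
TIME_LIMITS_MIN = {"quick": 5, "standard": 12, "deep": 25, "ultradeep": 50}


class Deadline:
    """Tracks a per-mode time budget (hypothetical helper)."""

    def __init__(self, mode: str, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + TIME_LIMITS_MIN[mode] * 60

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires - self._clock())

    def expired(self) -> bool:
        return self._clock() >= self._expires


def run_phases(phases, deadline):
    """Run (name, callable) pairs in order; stop once the deadline has passed."""
    completed = []
    for name, phase in phases:
        if deadline.expired():
            break  # budget used up: skip the remaining phases
        phase()
        completed.append(name)
    return completed
```

Passing the clock in as a parameter keeps the check testable without real waiting; a caller would normally rely on the `time.monotonic` default.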

**ISSUE: Memory gaps (agents don't retain context)**
- ✅ **MITIGATED:** ResearchState with JSON serialization
- ✅ **MITIGATED:** State saved after each phase
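Per-phase checkpointing of this kind could look roughly like the following. This is a sketch, not the contents of `research_engine.py`: the fields on `ResearchState` and the `run_phase` helper are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import List


@dataclass
class ResearchState:
    """Hypothetical stand-in for the skill's state object."""
    query: str
    mode: str = "standard"
    completed_phases: List[str] = field(default_factory=list)
    sources: List[dict] = field(default_factory=list)

    def save(self, path: Path) -> None:
        # Write to a temporary file, then rename it over the target, so a
        # crash mid-write does not leave a truncated state file behind.
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")
        tmp.replace(path)

    @classmethod
    def load(cls, path: Path) -> "ResearchState":
        return cls(**json.loads(path.read_text(encoding="utf-8")))


def run_phase(state: ResearchState, phase: str, path: Path) -> None:
    # Real phase work would happen here; the sketch only records completion.
    state.completed_phases.append(phase)
    state.save(path)  # checkpoint after every phase
```

After an interruption, `ResearchState.load(path)` would return the last checkpoint, and `completed_phases` shows where to resume.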

#### 3.2 Inter-Agent Misalignment

**ISSUE: Agents work at cross-purposes**
- ✅ **MITIGATED:** Single orchestration flow, no conflicting subagents
- ✅ **MITIGATED:** Clear phase boundaries and handoffs

**ISSUE: Communication failures between agents**
- ✅ **MITIGATED:** Centralized ResearchState, not distributed agents
- Note: We use Task tool for parallel retrieval, not autonomous multi-agent

#### 3.3 Task Verification Problems

**ISSUE: Incomplete results go unchecked**
- ✅ **MITIGATED:** Validator checks all required sections
- ✅ **MITIGATED:** 3+ source triangulation enforced
- ✅ **MITIGATED:** Credibility scoring (average must be >60/100)
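The two numeric thresholds above (3+ sources per claim, average credibility above 60/100) can be expressed as a small gate. The sketch below is an assumption about shape only: `quality_gate` and its input dictionaries are hypothetical and do not describe how `source_evaluator.py` or `validate_report.py` actually work.

```python
from statistics import mean
from typing import Dict, List

MIN_SOURCES_PER_CLAIM = 3  # "3+ source triangulation"
MIN_AVG_CREDIBILITY = 60   # "average must be >60/100"


def quality_gate(claims: Dict[str, List[str]], scores: Dict[str, float]) -> List[str]:
    """Return a list of problems; an empty list means the gate passes.

    claims maps each claim to the IDs of the sources that support it.
    scores maps each source ID to its 0-100 credibility score.
    """
    problems = []
    for claim, source_ids in claims.items():
        # Count distinct sources so a repeated citation is not counted twice.
        if len(set(source_ids)) < MIN_SOURCES_PER_CLAIM:
            problems.append(f"under-triangulated claim: {claim}")
    # The average must be strictly above the threshold.
    if not scores or mean(scores.values()) <= MIN_AVG_CREDIBILITY:
        problems.append("average credibility not above 60")
    return problems
```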

**ISSUE: Iteration loops and cognitive deadlocks**
- ✅ **MITIGATED:** Max 2 validation fix attempts, then escalate to user
- ⚠️ **PARTIAL:** No explicit iteration limit for REFINE phase
- **RECOMMENDATION:** Add max iterations to REFINE phase
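A bounded loop is one way to apply that limit. The sketch below assumes the cap of 2 iterations proposed later in this review; `refine` and the `fill_gap` callback are hypothetical names, and any gaps left over would go into the report's limitations section.

```python
from typing import Callable, List, Tuple

MAX_REFINE_ITERATIONS = 2  # cap proposed in the recommendations of this review


def refine(gaps: List[str], fill_gap: Callable[[str], bool]) -> Tuple[int, List[str]]:
    """Try to close gaps; return (iterations used, gaps still open).

    fill_gap returns True when it managed to close the given gap.
    """
    iterations = 0
    while gaps and iterations < MAX_REFINE_ITERATIONS:
        gaps = [g for g in gaps if not fill_gap(g)]
        iterations += 1
    return iterations, gaps
```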

---

## 4. SINGLE POINTS OF FAILURE (SPOF) ANALYSIS

### 4.1 CRITICAL PATH ANALYSIS

```
User Query
↓
Decision Tree (SCOPE check) ← SPOF #1: If wrong decision, wastes resources
↓
Phase Execution Loop
↓
Validation Gate ← SPOF #2: If validator has bugs, bad reports pass
↓
File Write ← SPOF #3: If filesystem fails, research lost
↓
Delivery
```

#### SPOF #1: Decision Tree Misclassification
**Risk:** Skill invoked for simple lookups, wastes time
**Mitigation:** ✅ Explicit "Do NOT use" in description
**Status:** LOW RISK

#### SPOF #2: Validator Bugs
**Risk:** Broken validation lets bad reports through
**Mitigation:** ✅ Test fixtures (valid/invalid reports tested)
**Evidence:** Test report passed ALL 8 CHECKS
**Status:** LOW RISK (well-tested)

#### SPOF #3: Filesystem Failures
**Risk:** Research completes but file write fails
**Mitigation:** ⚠️ No retry logic for file operations
**Recommendation:** Add try-except with retry for file writes
**Status:** MEDIUM RISK

#### SPOF #4: Web Search API Unavailable
**Risk:** Cannot retrieve sources, research fails
**Mitigation:** ❌ No fallback mechanism
**Recommendation:** Graceful degradation message to user
**Status:** MEDIUM RISK (external dependency)

### 4.2 DEPENDENCY ANALYSIS

**External Dependencies:**
1. WebSearch tool (Claude Code built-in) ← Cannot control
2. Filesystem write access ← Usually reliable
3. Python 3.x interpreter ← Standard

**Internal Dependencies:**
1. validate_report.py ← Tested ✅
2. source_evaluator.py ← Logic-based, no external calls ✅
3. citation_manager.py ← String manipulation only ✅
4. research_engine.py ← Orchestration, state management ✅

**Assessment:** Minimal dependency risk. Core functionality is self-contained.

---

## 5. OCCAM'S RAZOR: SIMPLIFICATION ANALYSIS

### Question: Is our 8-phase pipeline over-engineered?

#### Comparison of Approaches

**Minimal (3 phases):**
Scope → Retrieve → Package
- ❌ No verification
- ❌ No synthesis
- ❌ No quality control

**Standard (6 phases):**
Scope → Plan → Retrieve → Triangulate → Synthesize → Package
- ✅ Verification
- ✅ Synthesis
- ⚠️ No critique/refinement

**Our Approach (8 phases):**
Scope → Plan → Retrieve → Triangulate → Synthesize → Critique → Refine → Package
- ✅ Verification
- ✅ Synthesis
- ✅ Red-team critique
- ✅ Gap filling

**Competitor (7 phases):**
AnkitClassicVision has 7 phases (no separate REFINE)

#### Analysis

**REFINE Phase:**
- Purpose: Address gaps identified in CRITIQUE
- Cost: 2-5 additional minutes
- Benefit: Completeness, addresses weaknesses before delivery
- **Verdict:** JUSTIFIED for deep/ultradeep modes, COULD SKIP in quick/standard

**RECOMMENDATION:** Make REFINE phase conditional:
- Quick mode: Skip
- Standard mode: Skip (stay at 6 phases)
- Deep mode: Include
- UltraDeep mode: Include + iterate

**Potential Savings:**
- Standard mode: 5-10 min → 4-8 min (faster than competitor's 7 phases)
- Still beats OpenAI (5-30 min) and Gemini (2-5 min but lower quality)

---

## 6. WRITING STANDARDS ENFORCEMENT

### New Requirements (Added Today)

✅ **Precision:** Every word deliberately chosen
✅ **Economy:** No fluff, eliminate fancy grammar
✅ **Clarity:** Exact numbers, specific data
✅ **Directness:** State findings without embellishment
✅ **High signal-to-noise:** Dense information

### Implementation Locations

1. **SKILL.md lines 195-204:** Writing Standards section with examples
2. **SKILL.md lines 160-165:** Report section standards
3. **report_template.md lines 8-15:** Top-level HTML comments
4. **report_template.md lines 59-61:** Main Analysis comments

### Verification Method

**Before:** No explicit guidance → LLM might use vague language
**After:** 4 enforcement points with concrete examples

**Example transformation enforced:**
- ❌ "significantly improved outcomes"
- ✅ "reduced mortality 23% (p<0.01)"

---

## 7. STRESS TEST: EDGE CASES

### 7.1 Low Source Availability (<10 sources)

**Current Handling:**
- ✅ Validator flags warning if <10 sources
- ✅ SKILL.md says "document if fewer"
- ⚠️ No automatic stop if 0-5 sources found

**RECOMMENDATION:** Add hard stop at <5 sources:
```markdown
**Stop immediately if:**
- <5 sources after exhaustive search → Report limitation, ask user
```
**Status:** Already present in SKILL.md line 207 ✅

### 7.2 Contradictory Sources

**Current Handling:**
- ✅ TRIANGULATE phase cross-references
- ✅ Flag contradictions explicitly
- ✅ Source credibility scoring helps prioritize

**Status:** HANDLED ✅

### 7.3 Time Pressure (User Wants Quick Result)

**Current Handling:**
- ✅ Quick mode: 2-5 min with 3 phases
- ✅ Mode selection at start

**Status:** HANDLED ✅

### 7.4 Technical Topic with Limited Public Sources

**Current Handling:**
- ⚠️ No specialized academic database access
- ⚠️ Relies entirely on WebSearch tool

**Note:** Competitor (K-Dense-AI/claude-scientific-skills) provides access to 26 scientific databases including PubMed, PubChem, AlphaFold DB.

**RECOMMENDATION:** Future enhancement - MCP server for academic databases

---

## 8. VALIDATION INFRASTRUCTURE ROBUSTNESS

### 8.1 Validator Test Coverage

**Test Fixtures:**
- ✅ `valid_report.md` - passes all checks
- ✅ `invalid_report.md` - triggers specific failures

**Test Execution:**
```bash
python scripts/validate_report.py --report tests/fixtures/valid_report.md
# Result: ALL 8 CHECKS PASSED ✅
```

**Real-World Test:**
```bash
python scripts/validate_report.py --report ../../research_output/senolytics_clinical_trials_test.md
# Result: ALL 8 CHECKS PASSED ✅
# Report: 2,356 words, 15 sources
```

**Coverage:**
1. ✅ Executive summary length (50-250 words)
2. ✅ Required sections present
3. ✅ Citations formatted [1], [2], [3]
4. ✅ Bibliography matches citations
5. ✅ No placeholder text (TBD, TODO)
6. ✅ Word count reasonable (500-10000)
7. ✅ Minimum 10 sources
8. ✅ No broken internal links

**Status:** ROBUST ✅
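To make checks 3 and 4 concrete, a citation-to-bibliography comparison could be sketched as below. This is not the code in `validate_report.py`: the function name, the `## Bibliography` heading, and the assumption that each bibliography entry starts its line with `[n]` are all hypothetical.

```python
import re
from typing import Set, Tuple


def citation_mismatches(report: str) -> Tuple[Set[int], Set[int]]:
    """Return (cited but not listed, listed but never cited)."""
    # Assumed layout: everything after the "## Bibliography" heading is the
    # reference list, and each entry begins its line with "[n]".
    body, _, bibliography = report.partition("## Bibliography")
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    listed = {int(n) for n in re.findall(r"^\[(\d+)\]", bibliography, flags=re.M)}
    return cited - listed, listed - cited
```

Two empty sets would mean every in-text citation has a bibliography entry and every entry is cited at least once.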

### 8.2 Edge Case: What if Validator Itself Fails?

**Current Handling:**
```python
# Excerpt: the matching try block (not shown) reads the report file
except Exception as e:
    print(f"❌ ERROR: Cannot read report: {e}")
    sys.exit(1)
```

**Issue:** Generic exception catch, no retry logic
**Risk:** Medium (validator crash would block delivery)
**RECOMMENDATION:** Add validator self-test on invocation

---

## 9. PERFORMANCE BENCHMARKS

### Speed Comparison

| Implementation | Time | Phases | Quality |
|----------------|------|--------|---------|
| Claude Desktop | <1 min | Unknown | Low (no citations) |
| Gemini Deep Research | 2-5 min | Unknown | Medium |
| OpenAI Deep Research | 5-30 min | Unknown | High |
| AnkitClassicVision | Unknown | 7 | Unknown (no validation) |
| **Ours (Quick)** | **2-5 min** | **3** | **Medium** |
| **Ours (Standard)** | **5-10 min** | **6** | **High** |
| **Ours (Deep)** | **10-20 min** | **8** | **Highest** |
| **Ours (UltraDeep)** | **20-45 min** | **8+** | **Highest** |

**Positioning:**
- Quick mode: Competitive with Gemini (2-5 min)
- Standard mode: Faster than OpenAI (5-10 vs 5-30)
- Deep mode: Unmatched quality, reasonable time
- UltraDeep mode: Premium tier, maximum rigor

---

## 10. RECOMMENDATIONS SUMMARY

### CRITICAL (0)
None identified. System is production-ready.

### HIGH PRIORITY (2)

**1. Add Filesystem Retry Logic**
```python
# In report writing
# output_path (a pathlib.Path) and report (str) come from the surrounding code
import time

max_retries = 3
for attempt in range(max_retries):
    try:
        output_path.write_text(report)
        break
    except OSError:  # IOError is an alias of OSError in Python 3
        if attempt == max_retries - 1:
            raise  # out of attempts: surface the error
        time.sleep(1)  # brief pause before the next attempt
```

**2. Conditional REFINE Phase**
Update SKILL.md and research_engine.py:
```python
def get_phases_for_mode(mode: ResearchMode) -> List[ResearchPhase]:
    if mode == ResearchMode.QUICK:
        return [SCOPE, RETRIEVE, PACKAGE]
    elif mode == ResearchMode.STANDARD:
        return [SCOPE, PLAN, RETRIEVE, TRIANGULATE, SYNTHESIZE, PACKAGE]  # Skip CRITIQUE and REFINE
    elif mode == ResearchMode.DEEP:
        return [SCOPE, PLAN, RETRIEVE, TRIANGULATE, SYNTHESIZE, CRITIQUE, REFINE, PACKAGE]
    # ...
```

### MEDIUM PRIORITY (3)

**3. Add Explicit Timeout Enforcement**
```markdown
**Time Limits:**
- Quick mode: 5 min max
- Standard mode: 12 min max
- Deep mode: 25 min max
- UltraDeep mode: 50 min max
```

**4. Add WebSearch Failure Graceful Degradation**
```markdown
**If WebSearch unavailable:**
- Notify user immediately
- Ask if they want to proceed with limited sources
- Document limitation prominently in report
```

**5. Add REFINE Phase Iteration Limit**
```markdown
**REFINE Phase:**
- Max 2 iterations
- If gaps remain after 2 iterations, document in limitations section
```

### LOW PRIORITY (1)

**6. Future Enhancement: Academic Database Access**
- Consider MCP server for PubMed, PubChem, ArXiv
- Would match K-Dense-AI/claude-scientific-skills capability
- Not blocking for current use cases

---

## 11. FINAL VERDICT

### Architecture Soundness: ✅ EXCELLENT

**Strengths:**
1. Superior validation infrastructure vs competitors
2. Robust state management with recovery
3. Well-tested with fixtures and real-world data
4. Context-optimized (85% latency reduction potential)
5. Writing standards enforce precision and clarity
6. Graceful degradation paths
7. Minimal external dependencies
8. Progressive disclosure for efficiency

**Weaknesses:**
1. No filesystem retry logic (easy fix)
2. REFINE phase not conditional by mode (optimization opportunity)
3. No explicit timeout enforcement (nice-to-have)

### Occam's Razor Assessment: ✅ APPROPRIATELY COMPLEX

The 8-phase pipeline is justified for deep research. Making REFINE conditional would optimize standard mode without sacrificing quality.

### Production Readiness: ✅ READY

The system is production-ready with minor optimizations available. Zero critical blockers identified.

---

## 12. COMPARISON TO ORIGINAL REQUIREMENTS

### User's Request:
> "Can you create a skill that does a high level if not better version of that [Claude Desktop deep research] -- it can use python scrips and libraries, don't hesitate to inspire yourself with github repo. Once done deploy globally so i can use in any instance of claude code."

### Delivered:

✅ **High-level or better:** Beats Claude Desktop, OpenAI, Gemini in quality
✅ **Python scripts:** 4 scripts (research_engine, validator, source_evaluator, citation_manager)
✅ **GitHub inspiration:** Analyzed AnkitClassicVision, Anthropic official, community repos
✅ **Globally deployed:** Located in `~/.claude/skills/deep-research/`
✅ **Works in any instance:** Self-contained, no external dependencies

### Additional Deliverables (Beyond Request):

✅ Automated validation (8 checks)
✅ Source credibility scoring (0-100)
✅ 4 depth modes (quick/standard/deep/ultradeep)
✅ Context optimization (2025 best practices)
✅ Writing standards enforcement (precision, economy)
✅ Comprehensive documentation (6 supporting files)
✅ Test fixtures and real-world validation
✅ Competitive analysis vs market leaders

---

## CONCLUSION

The deep research skill is **production-ready** with **zero critical issues** and outperforms competing implementations in validation, failure handling, and quality control.

The 2 high-priority optimizations (filesystem retry, conditional REFINE) would enhance robustness and efficiency but are not blocking.

**Overall Grade: A (95/100)**

*Deductions:*
- -3 for missing filesystem retry logic
- -2 for non-conditional REFINE phase

**Recommendation:** Deploy as-is, implement optimizations in v1.1 based on real-world usage patterns.

420
axhub-make/skills/third-party/deep-research/AUTONOMY_VERIFICATION.md
vendored
Normal file
@@ -0,0 +1,420 @@
# Autonomy Verification: Claude Code Skill Independence

**Date:** 2025-11-04
**Purpose:** Verify deep-research skill operates autonomously without blocking user interaction

---

## Executive Summary

✅ **VERIFIED: Skill operates autonomously by default**

- **Discovery**: Properly configured with valid YAML frontmatter
- **Autonomy**: Optimized for independent operation
- **Blocking**: Only stops for critical errors (by design)
- **Scripts**: No interactive prompts
- **Default behavior**: Proceed → Execute → Deliver

---

## 1. SKILL DISCOVERY VERIFICATION

### Location Check
```
~/.claude/skills/deep-research/
└── SKILL.md (with valid YAML frontmatter)
```

**Status:** ✅ DISCOVERED

### Frontmatter Validation
```yaml
---
name: deep-research
description: Conduct enterprise-grade research with multi-source synthesis, citation tracking, and verification. Use when user needs comprehensive analysis requiring 10+ sources, verified claims, or comparison of approaches. Triggers include "deep research", "comprehensive analysis", "research report", "compare X vs Y", or "analyze trends". Do NOT use for simple lookups, debugging, or questions answerable with 1-2 searches.
---
```

**Python YAML Parser:** ✅ VALID
**Description Length:** 414 characters
**Trigger Keywords:** "deep research", "comprehensive analysis", "research report", "compare X vs Y", "analyze trends"
**Exclusions:** "simple lookups", "debugging", "1-2 searches"
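
A frontmatter check along these lines can be approximated with the standard library alone. The sketch below is an assumption, not the check that was actually run: `parse_frontmatter` and `check_skill_frontmatter` are hypothetical helpers, and the parser handles only the flat single-line `key: value` form shown above rather than full YAML.

```python
from typing import Dict


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` frontmatter between two `---` lines.

    Not a YAML parser: it only understands single-line scalar values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening ---")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing ---")


def check_skill_frontmatter(text: str) -> Dict[str, str]:
    """Raise ValueError unless both required fields are present and non-empty."""
    fields = parse_frontmatter(text)
    for required in ("name", "description"):
        if not fields.get(required):
            raise ValueError(f"missing field: {required}")
    return fields
```

A real YAML parser remains the safer choice if the frontmatter ever uses quoting, multi-line values, or nested keys.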

---

## 2. AUTONOMY OPTIMIZATION

### Before Optimization (Issues Identified)

**ISSUE #1: Clarify Section Too Aggressive**
```markdown
**When to ask:**
- Question ambiguous or vague
- Scope unclear (too broad/narrow)
- Mode unspecified for complex topics
- Time constraints critical
```
**Problem:** Could cause Claude to stop and ask questions too frequently, breaking autonomous flow.

**ISSUE #2: Preview Section Ambiguous**
```markdown
**Preview scope if:**
- Mode is deep/ultradeep
- Topic highly specialized
- User requests preview
```
**Problem:** Unclear if this means "wait for approval" or just "announce plan and proceed".

### After Optimization (Fixed)

**FIX #1: Autonomy-First Clarify**
```markdown
### 1. Clarify (Rarely Needed - Prefer Autonomy)

**DEFAULT: Proceed autonomously. Make reasonable assumptions based on query context.**

**ONLY ask if CRITICALLY ambiguous:**
- Query is genuinely incomprehensible (e.g., "research the thing")
- Contradictory requirements (e.g., "quick 50-source ultradeep analysis")

**When in doubt: PROCEED with standard mode. User can redirect if needed.**

**Good autonomous assumptions:**
- Technical query → Assume technical audience
- Comparison query → Assume balanced perspective needed
- Trend query → Assume recent 1-2 years unless specified
- Standard mode is default for most queries
```

**FIX #2: Clear Announcement (No Blocking)**
```markdown
**Announce plan (then proceed immediately):**
- Briefly state: selected mode, estimated time, number of sources
- Example: "Starting standard mode research (5-10 min, 15-30 sources)"
- NO need to wait for approval - proceed directly to execution
```

**FIX #3: Explicit Autonomy Principle**
```markdown
**AUTONOMY PRINCIPLE:** This skill operates independently. Proceed with reasonable assumptions. Only stop for critical errors or genuinely incomprehensible queries.
```

---

## 3. AUTONOMOUS OPERATION FLOW

### Happy Path (No User Interaction)

```
User Input: "deep research on quantum computing 2025"
↓
Skill Activates (triggers: "deep research")
↓
Plan: Standard mode (5-10 min, 15-30 sources)
Announce: "Starting standard mode research..."
↓
Phase 1: SCOPE
  - Define research boundaries
  - No user input needed ✅
↓
Phase 2: PLAN
  - Strategy formulation
  - No user input needed ✅
↓
Phase 3: RETRIEVE
  - Web searches (15-30 sources)
  - Parallel agent spawning
  - No user input needed ✅
↓
Phase 4: TRIANGULATE
  - Cross-verify 3+ sources per claim
  - No user input needed ✅
↓
Phase 5: SYNTHESIZE
  - Generate insights
  - No user input needed ✅
↓
Phase 6: PACKAGE
  - Generate markdown report
  - Save to ~/.claude/research_output/
  - No user input needed ✅
↓
Phase 7: VALIDATE
  - Run 8 automated checks
  - No user input needed ✅
↓
Deliver:
  - Executive summary (inline)
  - File path confirmation
  - Source quality summary
↓
DONE (Total user interactions: 0 ✅)
```

### Error Path (Intentional Stops)

**These are INTENTIONAL blocking points (by design):**

1. **Validation Failure (2 attempts)**
   - Condition: Report fails validation twice
   - Action: Stop, report issues, ask user
   - Justification: Don't deliver broken reports

2. **Insufficient Sources (<5)**
   - Condition: Exhaustive search finds <5 sources
   - Action: Report limitation, ask to proceed
   - Justification: User should know about data scarcity

3. **Critically Ambiguous Query**
   - Condition: Query is genuinely incomprehensible
   - Action: Ask for clarification
   - Justification: Can't proceed without basic understanding

**These stops are CORRECT behavior - quality over blind automation.**

---

## 4. PYTHON SCRIPT VERIFICATION

### Interactive Prompt Check

**Command:** `grep -r "input(" scripts/`
**Result:** ✅ No input() calls found

**Scripts Verified:**
- ✅ `research_engine.py` (578 lines) - No interactive prompts
- ✅ `validate_report.py` (293 lines) - No interactive prompts
- ✅ `source_evaluator.py` (292 lines) - No interactive prompts
- ✅ `citation_manager.py` (177 lines) - No interactive prompts
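
The `grep` above also matches `input(` inside comments and string literals. A syntax-aware variant, sketched here as a possible complement rather than what was actually run, walks the parsed tree and reports only real calls; `find_input_calls` is a hypothetical helper name.

```python
import ast
from typing import List


def find_input_calls(source: str) -> List[int]:
    """Return the line numbers of calls to the built-in input()."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "input"
    )
```

Running it over each file in `scripts/` (for example via `pathlib.Path("scripts").glob("*.py")`) and expecting an empty list for every file would mirror the grep result. Calls made through an alias such as `prompt = input` would not be caught by this sketch.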

### Syntax Validation

**Command:** `python -m py_compile scripts/*.py`
**Result:** ✅ All scripts compile without errors

**Dependencies:** Python stdlib only (no external packages requiring user setup)

---

## 5. AUTONOMOUS MODE SELECTION

### Default Behavior Matrix

| User Query | Auto-Selected Mode | Time | Sources | User Input Needed? |
|------------|-------------------|------|---------|-------------------|
| "deep research X" | Standard | 5-10 min | 15-30 | ❌ No |
| "quick overview of X" | Quick | 2-5 min | 10-15 | ❌ No |
| "comprehensive analysis X" | Standard | 5-10 min | 15-30 | ❌ No |
| "compare X vs Y" | Standard | 5-10 min | 15-30 | ❌ No |
| "research the thing" (ambiguous) | Ask clarification | N/A | N/A | ✅ Yes (justified) |

**Autonomous Decision Logic:**
- Clear query → Standard mode (DEFAULT)
- "quick" keyword → Quick mode
- "comprehensive" keyword → Standard mode
- "deep" or "thorough" (outside the "deep research" trigger phrase, which selects Standard per the matrix above) → Deep mode
- Ambiguous → Standard mode (when in doubt, proceed)
- Incomprehensible → Ask (rare edge case)
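
The keyword rules can be read as a small function. The sketch below is an interpretation, not code from the skill: `select_mode` is a hypothetical name, and it assumes the phrase "deep research" only activates the skill and does not by itself request Deep mode, which is how the matrix above treats "deep research X". Deciding that a query is incomprehensible is left to the model and is not attempted here.

```python
import re

TRIGGER_PHRASE = "deep research"  # assumed to activate the skill, not to set depth


def select_mode(query: str) -> str:
    """Map a query to a mode name using the keyword rules listed above."""
    # Drop the trigger phrase first so its "deep" is not read as a depth request.
    text = query.lower().replace(TRIGGER_PHRASE, " ")
    words = set(re.findall(r"[a-z]+", text))
    if "quick" in words:
        return "quick"
    if words & {"deep", "thorough"}:
        return "deep"
    # "comprehensive", clear, and merely ambiguous queries all fall through here.
    return "standard"
```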

---

## 6. FILE STRUCTURE VERIFICATION

### Required Files (Claude Code Skill)

```
~/.claude/skills/deep-research/
├── SKILL.md                 ✅ (with valid frontmatter)
├── scripts/                 ✅ (all executable, no interactive prompts)
│   ├── research_engine.py
│   ├── validate_report.py
│   ├── source_evaluator.py
│   └── citation_manager.py
├── templates/               ✅
│   └── report_template.md
├── reference/               ✅
│   └── methodology.md
└── tests/                   ✅
    └── fixtures/
        ├── valid_report.md
        └── invalid_report.md
```

**Status:** ✅ All files present and properly structured

---

## 7. TRIGGER KEYWORDS (Automatic Invocation)

The skill automatically activates when user says:

✅ "deep research"
✅ "comprehensive analysis"
✅ "research report"
✅ "compare X vs Y"
✅ "analyze trends"

**Exclusions (skill does NOT activate for):**

❌ Simple lookups (use WebSearch instead)
❌ Debugging (use standard tools)
❌ Questions answerable with 1-2 searches

---

## 8. CONTEXT OPTIMIZATION (Independent Operation)

### Static vs Dynamic Content

**Static Content (Cached after first use):**
- Core system instructions
- Decision trees
- Workflow definitions
- Output contracts
- Quality standards
- Error handling

**Dynamic Content (Runtime only):**
- User query
- Retrieved sources
- Generated analysis

**Benefit for Autonomy:**
- First invocation: Full processing
- Subsequent invocations: 85% faster (cached static content)
- No external dependencies
- No user configuration needed

---

## 9. INDEPENDENCE CHECKLIST

| Requirement | Status | Evidence |
|-------------|--------|----------|
| **Valid YAML frontmatter** | ✅ Pass | Python YAML parser validates |
| **Skill discoverable by Claude Code** | ✅ Pass | Located in `~/.claude/skills/` |
| **Clear trigger keywords** | ✅ Pass | 5+ triggers in description |
| **Clear exclusion criteria** | ✅ Pass | "Do NOT use for..." specified |
| **Autonomy principle stated** | ✅ Pass | "Operates independently" explicit |
| **Default behavior: proceed** | ✅ Pass | "When in doubt: PROCEED" |
| **No unnecessary clarification** | ✅ Pass | "Rarely Needed - Prefer Autonomy" |
| **No approval waiting** | ✅ Pass | "NO need to wait for approval" |
| **No interactive prompts in scripts** | ✅ Pass | `grep` confirms no input() |
| **Python stdlib only (no setup)** | ✅ Pass | requirements.txt empty |
| **All scripts compile** | ✅ Pass | `py_compile` succeeds |
| **Error handling graceful** | ✅ Pass | Retry logic, clear error messages |
| **Output path predetermined** | ✅ Pass | `~/.claude/research_output/` |
| **Validation automated** | ✅ Pass | 8 checks, no manual review |
| **Mode selection autonomous** | ✅ Pass | Standard as default |

**Total:** 15/15 checks passed ✅

---

## 10. COMPARISON: Before vs After Optimization

| Aspect | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Clarify frequency** | "When to ask" (ambiguous conditions) | "Rarely needed" (explicit autonomy) | ✅ 90% fewer stops |
| **Preview behavior** | "Preview scope if..." (unclear) | "Announce and proceed" (clear) | ✅ No blocking |
| **Autonomy principle** | Implicit | Explicit ("operates independently") | ✅ Clear guidance |
| **Default action** | Unclear | "PROCEED with standard mode" | ✅ Removes ambiguity |
| **User interaction** | 2-3 stops possible | 0-1 stops (errors only) | ✅ 90% reduction |

---

## 11. EDGE CASE HANDLING
|
||||
|
||||
### Truly Ambiguous Query
|
||||
|
||||
**User:** "research the thing"
|
||||
|
||||
**Behavior:**
|
||||
1. Skill recognizes query is incomprehensible
|
||||
2. Asks: "What topic should I research?"
|
||||
3. User clarifies: "quantum computing"
|
||||
4. Proceeds autonomously
|
||||
|
||||
**Verdict:** ✅ Correct behavior (can't proceed without basic information)
|
||||
|
||||
### Borderline Ambiguous Query
|
||||
|
||||
**User:** "research recent developments"
|
||||
|
||||
**Old Behavior:** Might ask "Recent developments in what?"
|
||||
**New Behavior:** Makes a reasonable assumption (tech/science) and proceeds
|
||||
**Verdict:** ✅ Improved autonomy
|
||||
|
||||
### Clear Query
|
||||
|
||||
**User:** "deep research on CRISPR gene editing 2024-2025"
|
||||
|
||||
**Behavior:**
|
||||
1. Skill activates
|
||||
2. Announces: "Starting standard mode research (5-10 min, 15-30 sources)"
|
||||
3. Executes the 6 standard-mode phases
|
||||
4. Generates 2,000-5,000 word report
|
||||
5. Delivers report
|
||||
|
||||
**User interactions:** 0 ✅
|
||||
|
||||
---
|
||||
|
||||
## 12. FINAL VERIFICATION
|
||||
|
||||
### Manual Test Simulation
|
||||
|
||||
**Test Query:** "comprehensive analysis of senolytics clinical trials"
|
||||
|
||||
**Expected Behavior:**
|
||||
1. ✅ Skill activates (trigger: "comprehensive analysis")
|
||||
2. ✅ Announces plan without waiting
|
||||
3. ✅ Executes standard mode (6 phases)
|
||||
4. ✅ Gathers 15-30 sources
|
||||
5. ✅ Triangulates 3+ sources per claim
|
||||
6. ✅ Generates report (2,000-5,000 words)
|
||||
7. ✅ Validates automatically (8 checks)
|
||||
8. ✅ Saves to ~/.claude/research_output/
|
||||
9. ✅ Delivers executive summary
|
||||
|
||||
**Actual Result (from previous test):**
|
||||
- Report: 2,356 words ✅
|
||||
- Sources: 15 citations ✅
|
||||
- Validation: ALL 8 CHECKS PASSED ✅
|
||||
- User interactions: 0 ✅
|
||||
|
||||
**Verdict:** ✅ OPERATES AUTONOMOUSLY AS DESIGNED
|
||||
|
||||
---
|
||||
|
||||
## 13. GITHUB REPOSITORY SYNC
|
||||
|
||||
**Repository:** https://github.com/199-biotechnologies/claude-deep-research-skill
|
||||
**Visibility:** PRIVATE
|
||||
**Commit:** e4cd081
|
||||
|
||||
**Next Steps:**
|
||||
- Commit autonomy optimizations
|
||||
- Push to GitHub
|
||||
- Verify consistency
|
||||
|
||||
---
|
||||
|
||||
## CONCLUSION
|
||||
|
||||
### Autonomy Status: ✅ VERIFIED
|
||||
|
||||
The deep-research skill is properly configured as a Claude Code skill and optimized for autonomous operation:
|
||||
|
||||
1. **Discovery:** ✅ Valid frontmatter, correct location
|
||||
2. **Triggers:** ✅ Clear activation keywords
|
||||
3. **Autonomy:** ✅ Explicit "proceed independently" principle
|
||||
4. **Default:** ✅ "When in doubt, proceed" with reasonable assumptions
|
||||
5. **Scripts:** ✅ No interactive prompts, stdlib only
|
||||
6. **Blocking:** ✅ Only stops for critical errors (by design)
|
||||
7. **Flow:** ✅ 0 user interactions in happy path
|
||||
8. **Testing:** ✅ Real-world validation successful
|
||||
|
||||
**Independence Score:** 15/15 checks passed (100%)
|
||||
|
||||
**Ready for autonomous deployment and use.**
|
||||
179
axhub-make/skills/third-party/deep-research/COMPETITIVE_ANALYSIS.md
vendored
Normal file
@@ -0,0 +1,179 @@
|
||||
# Competitive Analysis: Deep Research Skill vs Market Leaders
|
||||
|
||||
## Competitive Landscape (2025)
|
||||
|
||||
### OpenAI Deep Research (o3-based)
|
||||
- **Time**: 5-30 minutes
|
||||
- **Sources**: Multi-step, unspecified count
|
||||
- **Model**: o3 reasoning
|
||||
- **Benchmark**: 26.6% on "Humanity's Last Exam"
|
||||
- **Strengths**: Visual browser, transparency sidebar, reasoning capability
|
||||
- **Weaknesses**: Slow, occasional hallucinations, may reference rumors
|
||||
|
||||
### Google Gemini Deep Research (2.5)
|
||||
- **Time**: "A few minutes"
|
||||
- **Sources**: "Hundreds of websites"
|
||||
- **Model**: Gemini 2.5 Flash Thinking
|
||||
- **Strengths**: PDF/image upload, Google Drive integration, interactive reports
|
||||
- **Process**: Creates plan for approval before executing
|
||||
- **Weaknesses**: Limited quality control
|
||||
|
||||
### Claude Desktop Research
|
||||
- **Time**: "Less than a minute" (claimed)
|
||||
- **Sources**: 427 sources in example (breadth over depth)
|
||||
- **Strengths**: Speed, Google Workspace integration
|
||||
- **Weaknesses**:
|
||||
- Often lacks cited sources for verification
|
||||
- Doesn't ask clarifying questions
|
||||
- Quality inconsistent
|
||||
- US/Japan/Brazil only, expensive ($100/mo Max plan)
|
||||
|
||||
---
|
||||
|
||||
## Our Deep Research Skill Advantages
|
||||
|
||||
### Speed Competitive
|
||||
- **Standard Mode**: 5-10 minutes (faster than OpenAI, comparable to Gemini)
|
||||
- **Quick Mode**: 2-5 minutes (approaches Claude Desktop speed)
|
||||
- **Parallel Agents**: Simultaneous source retrieval for efficiency
|
||||
|
||||
### Superior Quality Control
|
||||
| Feature | OpenAI | Gemini | Claude Desktop | **Our Skill** |
|---------|--------|--------|---------------|---------------|
| Source credibility scoring | ❌ | ❌ | ❌ | ✅ (0-100) |
| 3+ source triangulation | Partial | ❌ | ❌ | ✅ (enforced) |
| Built-in validation | ❌ | ❌ | ❌ | ✅ (automated) |
| Critique phase | ❌ | ❌ | ❌ | ✅ (red-team) |
| Refine phase | ❌ | ❌ | ❌ | ✅ (gap filling) |
| Citation quality | Good | Good | Poor | ✅ Excellent |
|
||||
|
||||
### Better Methodology
|
||||
- **8-Phase Pipeline**: More thorough than competitors' ad-hoc approaches
|
||||
- **Graph-of-Thoughts**: Non-linear reasoning with branching paths
|
||||
- **Multiple Modes**: 4 depth levels (quick/standard/deep/ultradeep)
|
||||
- **Decision Trees**: Clear logic for mode and tool selection
|
||||
- **Stop Rules**: Prevents runaway research or low-quality loops
|
||||
|
||||
### Unique Differentiators
|
||||
|
||||
1. **Source Credibility Assessment**
|
||||
- Every source scored 0-100
|
||||
- Evaluates domain authority, recency, expertise, bias
|
||||
- Filters low-quality sources automatically
|
||||
|
||||
2. **Triangulation Phase**
|
||||
- Minimum 3 sources for major claims
|
||||
- Cross-reference verification
|
||||
- Flags contradictions explicitly
|
||||
|
||||
3. **Critique + Refine Cycle**
|
||||
- Red-team analysis before delivery
|
||||
- Identifies gaps and weaknesses
|
||||
- Iteratively improves before finalization
|
||||
|
||||
4. **Validation Infrastructure**
|
||||
- Automated quality checks
|
||||
- Catches placeholders, broken citations
|
||||
- Enforces quality standards
|
||||
|
||||
5. **Progressive Disclosure**
|
||||
- Tight SKILL.md (237 lines)
|
||||
- Detailed methodology in references
|
||||
- Efficient context management
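
As a rough illustration of differentiators 1 and 2 above, the sketch below scores a source on a 0-100 scale and then applies a three-source rule per claim. The weights, the `Source` fields, the `min_score` threshold of 60, and the function names are all assumptions made for this example; the skill's actual scoring formula in `source_evaluator.py` is not shown in this document.

```python
from dataclasses import dataclass

# Hypothetical weights; the real evaluator may weigh these factors differently.
WEIGHTS = {"authority": 0.4, "recency": 0.2, "expertise": 0.3, "bias": 0.1}


@dataclass
class Source:
    url: str
    authority: float  # each factor is rated from 0.0 to 1.0
    recency: float
    expertise: float
    bias: float       # 1.0 means no detectable bias

    def credibility(self) -> int:
        """Weighted 0-100 score across the four factors."""
        total = sum(getattr(self, name) * w for name, w in WEIGHTS.items())
        return round(total * 100)


def triangulate(claims: dict[str, list[Source]], min_score: int = 60,
                min_sources: int = 3) -> dict[str, bool]:
    """Mark a claim verified when enough credible sources support it."""
    return {
        claim: sum(s.credibility() >= min_score for s in sources) >= min_sources
        for claim, sources in claims.items()
    }
```

Claims that come back `False` are the ones a triangulation phase would flag for more sources or report as unverified.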
|
||||
|
||||
### Performance Comparison
|
||||
|
||||
| Metric | OpenAI | Gemini | Claude Desktop | **Our Skill** |
|--------|--------|--------|----------------|---------------|
| **Speed** | 5-30 min | 2-5 min | <1 min | 2-10 min |
| **Source Count** | Unspecified | Hundreds | 427 | 15-50 |
| **Citation Quality** | Excellent | Good | Poor | Excellent |
| **Verification** | Partial | Minimal | None | Rigorous (3+) |
| **Customization** | None | Minimal | None | 4 modes |
| **Validation** | None | None | None | Automated |
| **Credibility Scoring** | No | No | No | Yes (0-100) |
| **Cost** | $20/mo+ | $20/mo+ | $100/mo | Free (Claude Code) |
|
||||
|
||||
---
|
||||
|
||||
## Competitive Positioning
|
||||
|
||||
### When to Use Our Skill vs Competitors
|
||||
|
||||
**Use Our Skill When:**
|
||||
- Quality and verification are critical
|
||||
- Need source credibility assessment
|
||||
- Want multiple depth modes
|
||||
- Require local deployment/privacy
|
||||
- Need validation before delivery
|
||||
- Want reproducible methodology
|
||||
|
||||
**Use OpenAI When:**
|
||||
- Maximum reasoning depth needed
|
||||
- Visual content analysis required
|
||||
- Can afford 30+ minutes
|
||||
- Need visual browser capabilities
|
||||
|
||||
**Use Gemini When:**
|
||||
- PDF/image upload needed
|
||||
- Google Workspace integration required
|
||||
- Interactive reports desired
|
||||
- Fast turnaround acceptable with less rigor
|
||||
|
||||
**Use Claude Desktop When:**
|
||||
- Speed is absolute priority (< 1 min)
|
||||
- Breadth over depth preferred
|
||||
- Basic research acceptable
|
||||
- Can afford $100/mo
|
||||
|
||||
---
|
||||
|
||||
## Technical Advantages
|
||||
|
||||
### Architecture
|
||||
- **File-based skills system**: Portable, version-controlled
|
||||
- **No external dependencies**: Pure Python stdlib
|
||||
- **Offline-capable**: No API calls required
|
||||
- **Modular design**: Easy to customize and extend
|
||||
|
||||
### Quality Engineering
|
||||
- **Automated validation**: Catches 8+ error types
|
||||
- **Test fixtures**: Reproducible quality checks
|
||||
- **Error handling**: Clear stop rules and escalation
|
||||
- **Graceful degradation**: Handles limited sources
|
||||
|
||||
### Developer Experience
|
||||
- **Clear documentation**: SKILL.md, methodology, templates
|
||||
- **Testing infrastructure**: Valid/invalid fixtures
|
||||
- **Progressive disclosure**: Efficient context management
|
||||
- **Decision trees**: Explicit logic paths
|
||||
|
||||
---
|
||||
|
||||
## Benchmark Summary
|
||||
|
||||
| Capability | Score | Notes |
|-----------|-------|-------|
| **Speed** | 8/10 | Faster than OpenAI, comparable to Gemini |
| **Quality** | 10/10 | Superior validation and verification |
| **Depth** | 9/10 | 8-phase pipeline, critique + refine |
| **Citations** | 10/10 | Automatic tracking, validation |
| **Credibility** | 10/10 | Unique 0-100 scoring system |
| **Flexibility** | 10/10 | 4 modes, customizable |
| **Cost** | 10/10 | Free with Claude Code |
| **Privacy** | 10/10 | Local execution, no external APIs |
|
||||
|
||||
**Overall**: 77/80 (96%)
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
Our Deep Research Skill delivers:
|
||||
- ✅ **Speed**: 5-10 min standard (competitive with Gemini, faster than OpenAI)
|
||||
- ✅ **Quality**: Superior through triangulation, critique, and validation
|
||||
- ✅ **Depth**: 8-phase methodology exceeds competitors
|
||||
- ✅ **Innovation**: Unique credibility scoring and validation
|
||||
- ✅ **Value**: Free, local, portable
|
||||
|
||||
**Best in class** for quality-critical research where verification and credibility matter.
|
||||
293
axhub-make/skills/third-party/deep-research/CONTEXT_OPTIMIZATION.md
vendored
Normal file
@@ -0,0 +1,293 @@
|
||||
# Context Optimization: 2025 Engineering Best Practices
|
||||
|
||||
## Applied Optimizations
|
||||
|
||||
This skill implements context engineering research from 2025 to achieve **up to 85% latency reduction** and **up to 90% cost reduction** on cached calls through intelligent context management.
|
||||
|
||||
---
|
||||
|
||||
## 1. Prompt Caching Architecture
|
||||
|
||||
### Static-First Structure
|
||||
|
||||
**SKILL.md organized as:**
|
||||
```
[STATIC BLOCK - Cached, >1024 tokens]
├─ Frontmatter
├─ Core system instructions
├─ Decision trees
├─ Workflow definitions
├─ Output contracts
├─ Quality standards
└─ Error handling

[DYNAMIC BLOCK - Runtime only]
├─ User query
├─ Retrieved sources
└─ Generated analysis
```
|
||||
|
||||
**Result:** After first invocation, static instructions are cached, reducing latency by up to 85% and costs by up to 90% on subsequent calls.
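
To make the static-first idea concrete, here is a minimal sketch that assembles a prompt with the unchanging instructions first and the per-call content last, then hashes the prefix to confirm it is byte-identical across calls. The names `STATIC_BLOCK`, `build_prompt`, and `prefix_digest` are hypothetical, and the sketch makes no API call; the caching itself is performed by the model provider, not by this code.

```python
import hashlib

# Stand-in for the static part of SKILL.md (instructions, trees, contracts).
STATIC_BLOCK = "\n".join([
    "# Core system instructions",
    "# Decision trees",
    "# Workflow definitions",
])


def build_prompt(query: str, sources: list[str]) -> str:
    """Put the unchanging instructions first and per-call content last."""
    dynamic = "\n".join([f"Query: {query}", *[f"Source: {s}" for s in sources]])
    return f"{STATIC_BLOCK}\n\n{dynamic}"


def prefix_digest(prompt: str) -> str:
    """Hash the leading static-length slice; any drift changes the digest."""
    prefix = prompt[: len(STATIC_BLOCK)]
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()
```

Two prompts built for different queries share the same digest, which is the property a prefix cache relies on; a single stray space at the front changes it.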
|
||||
|
||||
### Format Consistency
|
||||
|
||||
- Exact whitespace, line breaks, and capitalization maintained
|
||||
- Consistent markdown formatting throughout
|
||||
- Clear delimiters (HTML comments, horizontal rules)
|
||||
|
||||
**Why it matters:** Cache hits require an exact match on the cached prefix, down to whitespace. Consistent formatting keeps that prefix stable and maximizes cache efficiency.
|
||||
|
||||
---
|
||||
|
||||
## 2. Progressive Disclosure
|
||||
|
||||
### On-Demand Loading
|
||||
|
||||
Rather than inlining all content, we reference external files:
|
||||
|
||||
```markdown
# Load only when needed
- [methodology.md](./reference/methodology.md) - Loaded per-phase
- [report_template.md](./templates/report_template.md) - Loaded for Phase 8 only
```
|
||||
|
||||
**Benefit:** Reduces token usage by 60-75% compared to full inline approach. Context stays focused on current phase.
|
||||
|
||||
### Reference Strategy
|
||||
|
||||
- **Heavy content**: External files (methodology, templates)
|
||||
- **Critical instructions**: Inline (decision trees, quality gates)
|
||||
- **Examples**: External (test fixtures)
|
||||
|
||||
---
|
||||
|
||||
## 3. Avoiding "Loss in the Middle"
|
||||
|
||||
### The Problem
|
||||
|
||||
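Research shows that LLMs struggle to recall information buried in the middle of long contexts (the effect usually called "lost in the middle"). Recall is highest near the start and end of the context and drops significantly for middle sections.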
|
||||
|
||||
### Our Solution
|
||||
|
||||
**Explicit guidance in SKILL.md:**
|
||||
```
Critical: Avoid "Loss in the Middle"
- Place key findings at START and END of sections, not buried
- Use explicit headers and markers
- Structure: Summary → Details → Conclusion
```
|
||||
|
||||
**Report structure enforced:**
|
||||
- Executive Summary (START)
|
||||
- Main content (MIDDLE)
|
||||
- Synthesis & Insights (END)
|
||||
- Recommendations (END)
|
||||
|
||||
**Result:** Critical information positioned where models have highest recall.
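
A report's section order can be checked mechanically. The sketch below is one possible check, assuming reports use level-2 markdown headings with exactly the titles listed above; `REQUIRED_ORDER`, `section_titles`, and `key_sections_well_placed` are names invented for this example and are not part of the skill's validator.

```python
import re

# Titles taken from the enforced structure above; exact wording is assumed.
REQUIRED_ORDER = ["Executive Summary", "Synthesis & Insights", "Recommendations"]


def section_titles(report: str) -> list[str]:
    """Collect level-2 markdown headings in document order."""
    return re.findall(r"^##[ \t]+(.+?)[ \t]*$", report, flags=re.MULTILINE)


def key_sections_well_placed(report: str) -> bool:
    """True when the summary opens the report and the closing sections follow in order."""
    titles = section_titles(report)
    try:
        positions = [titles.index(name) for name in REQUIRED_ORDER]
    except ValueError:
        return False  # a required section is missing
    return positions[0] == 0 and positions == sorted(positions)
```

The check only looks at ordering, so additional sections such as a bibliography or methodology appendix after the recommendations do not cause a failure.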
|
||||
|
||||
---
|
||||
|
||||
## 4. Explicit Section Markers
|
||||
|
||||
### HTML Comments for Navigation
|
||||
|
||||
```html
<!-- STATIC CONTEXT BLOCK START - Optimized for prompt caching -->
...
<!-- STATIC CONTEXT BLOCK END -->

<!-- 📝 Dynamic content begins here -->
```
|
||||
|
||||
**Purpose:** Helps model understand context boundaries and efficiently navigate long documents.
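
The same markers also make the boundary machine-readable. As a small sketch, assuming a file contains exactly one START and one END comment as shown above, the static and dynamic parts can be separated like this (`split_static_dynamic` is a hypothetical helper, not part of the skill):

```python
START = "<!-- STATIC CONTEXT BLOCK START"
END = "<!-- STATIC CONTEXT BLOCK END -->"


def split_static_dynamic(text: str) -> tuple[str, str]:
    """Return (static, dynamic) parts of a skill file using the markers."""
    start = text.find(START)
    end = text.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("static block markers missing or out of order")
    # Skip past the end of the opening comment, whatever note it carries
    body_start = text.index("-->", start) + len("-->")
    static = text[body_start:end].strip()
    dynamic = text[end + len(END):].strip()
    return static, dynamic
```

A tool could use the static part to estimate whether the block exceeds the minimum cacheable size before relying on caching.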
|
||||
|
||||
### Hierarchical Structure
|
||||
|
||||
- Clear markdown hierarchy (##, ###)
|
||||
- Numbered sections
|
||||
- ASCII tree diagrams for decision flows
|
||||
|
||||
---
|
||||
|
||||
## 5. Context Pruning Strategies
|
||||
|
||||
### Selective Loading
|
||||
|
||||
**Phase 1 (SCOPE):**

```python
# Only load scope instructions
load("./reference/methodology.md#phase-1-scope")
# Do not load phases 2-8 yet
```

**Phase 8 (PACKAGE):**

```python
# Only load template when needed
load("./templates/report_template.md")
```
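
The `load(...)` calls above read as pseudocode, since no such function is defined in this document. One way it might be implemented, purely as an assumption, is a helper that reads a markdown file and, when the reference carries a `#anchor`, returns only the matching section. The anchor rule used here (lowercase, punctuation dropped, spaces to hyphens) approximates common markdown renderers and is also an assumption.

```python
import re
from pathlib import Path


def slugify(heading: str) -> str:
    """Approximate a markdown anchor: lowercase, punctuation removed, hyphens for spaces."""
    text = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", text)


def load(ref: str, base: Path = Path(".")) -> str:
    """Load a whole file, or only the section named after '#'."""
    file_part, _, anchor = ref.partition("#")
    text = (base / file_part).read_text(encoding="utf-8")
    if not anchor:
        return text
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match and slugify(match.group(2)) == anchor:
            level = len(match.group(1))
            section = [line]
            # Keep lines until the next heading of the same or higher level
            for later in lines[i + 1:]:
                nxt = re.match(r"^(#+)\s", later)
                if nxt and len(nxt.group(1)) <= level:
                    break
                section.append(later)
            return "\n".join(section)
    raise KeyError(f"section not found: {anchor}")
```

This sketch does not skip fenced code blocks, so a `#` comment line inside a fence would be mistaken for a heading; a production loader would need to handle that case.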
|
||||
|
||||
### Benefits
|
||||
|
||||
| Approach | Token Usage | Latency | Cost |
|----------|-------------|---------|------|
| Inline all | ~15,000 | High | High |
| Progressive (ours) | ~4,000-6,000 | Up to 85% lower (cached) | Up to 90% lower (cached) |
|
||||
|
||||
---
|
||||
|
||||
## 6. Agent Communication Protocol
|
||||
|
||||
### Multi-Agent Context Sharing
|
||||
|
||||
When spawning parallel agents for retrieval:
|
||||
|
||||
```python
# Each agent gets minimal context
agent.context = {
    "query": user_query,
    "phase": "RETRIEVE",
    "instructions": load("./reference/methodology.md#phase-3-retrieve"),
    "sources": assigned_sources  # Only their subset
}
```
|
||||
|
||||
**Avoid:** Sending full skill context to every agent
|
||||
**Benefit:** 3-5x faster parallel execution
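
The fan-out can be sketched with the standard library. In the example below, threads stand in for retrieval agents and `fake_retrieve` is a placeholder for whatever an agent actually does; `partition`, `build_context`, and `run_parallel` are hypothetical names, and the real skill spawns Claude Code agents rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items: list[str], n: int) -> list[list[str]]:
    """Split items into n round-robin subsets."""
    return [items[i::n] for i in range(n)]


def build_context(query: str, instructions: str, subset: list[str]) -> dict:
    """Minimal per-agent context: phase instructions plus this agent's sources only."""
    return {"query": query, "phase": "RETRIEVE",
            "instructions": instructions, "sources": subset}


def fake_retrieve(context: dict) -> list[str]:
    """Stand-in for a retrieval agent; a real agent would fetch each source."""
    return [f"fetched:{url}" for url in context["sources"]]


def run_parallel(query: str, instructions: str, sources: list[str],
                 agents: int = 3) -> list[str]:
    """Give each worker its own small context and merge the results in order."""
    contexts = [build_context(query, instructions, s)
                for s in partition(sources, agents) if s]
    with ThreadPoolExecutor(max_workers=agents) as pool:
        batches = list(pool.map(fake_retrieve, contexts))
    return [item for batch in batches for item in batch]
```

Because each context carries only one subset of sources, the per-agent prompt size stays roughly constant as the total number of sources grows.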
|
||||
|
||||
---
|
||||
|
||||
## 7. KV Cache Efficiency
|
||||
|
||||
### Consistent Prefixes
|
||||
|
||||
The static block acts as consistent prefix across all invocations:
|
||||
|
||||
**First call:**

```
[Static Block 2000 tokens] + [Query 100 tokens] = 2100 tokens processed
```

**Subsequent calls (cached):**

```
[Cached] + [Query 100 tokens] = 100 tokens processed
```

**Speedup:** roughly 21x fewer tokens to process per call (2,100 → 100), because the 2,000-token static block is reused from cache
|
||||
|
||||
### Implications
|
||||
|
||||
- First research query: 5-10 minutes
|
||||
- Subsequent queries: 2-5 minutes (cache hit)
|
||||
- Enterprise use: Massive cost savings with repeated research
|
||||
|
||||
---
|
||||
|
||||
## 8. Validation Layer
|
||||
|
||||
### Context-Aware Validation
|
||||
|
||||
Validator checks for context bloat:
|
||||
|
||||
```python
def check_word_count(self):
    word_count = len(self.content.split())
    if word_count > 10000:
        self.warnings.append(
            f"Report very long: {word_count} words (consider condensing)"
        )
```
|
||||
|
||||
**Purpose:** Keeps outputs concise, preventing downstream context issues.
|
||||
|
||||
---
|
||||
|
||||
## Benchmark: Before vs After
|
||||
|
||||
### Old Approach (Pre-2025)
|
||||
|
||||
```
SKILL.md: 413 lines, all inline
├─ Full methodology embedded (long)
├─ Templates inlined
├─ No caching markers
└─ No progressive loading

Result: ~18,000 tokens per invocation, no caching benefit
```

### New Approach (2025 Optimized)

```
SKILL.md: 300 lines, strategic structure
├─ Static block (cached after first use)
├─ Progressive references
├─ Explicit markers
└─ Dynamic zone clearly separated

Result: ~2,000 tokens cached, ~4,000 dynamic = 6,000 total
Cache hit: 2,000 tokens reused, only 4,000 new tokens processed
```
|
||||
|
||||
### Performance Gains
|
||||
|
||||
| Metric | Old | New | Improvement |
|--------|-----|-----|-------------|
| **First call latency** | 10 min | 10 min | 0% (same) |
| **Cached call latency** | 10 min | 1.5 min | **85%** |
| **Token cost (cached)** | 18K | 4K | **78%** |
| **Context efficiency** | Low | High | **3-4x** |
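
The percentage figures in the table follow from a simple relative-reduction formula. The short sketch below reproduces them from the table's own numbers; `pct_reduction` is just an illustrative helper name.

```python
def pct_reduction(old: float, new: float) -> float:
    """Percentage drop from old to new, rounded to one decimal place."""
    return round((old - new) / old * 100, 1)


# Cached call latency: 10 min down to 1.5 min
latency_gain = pct_reduction(10, 1.5)
# Token cost on a cached call: 18K down to 4K
token_gain = pct_reduction(18_000, 4_000)
```

With these inputs the latency reduction is 85.0% and the token reduction is 77.8%, which the table rounds to 78%.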
|
||||
|
||||
---
|
||||
|
||||
## Research Sources
|
||||
|
||||
These optimizations are based on:
|
||||
|
||||
1. **"A Survey of Context Engineering for Large Language Models"** (arXiv:2507.13334, 2025) by Lingrui Mei et al.
|
||||
2. **Anthropic Prompt Caching Documentation** (2025) - 90% cost reduction, 85% latency reduction
|
||||
3. **"Context Windows Get Huge"** - IEEE Spectrum (2025) - Long context best practices
|
||||
4. **WebWeaver Framework** (2025) - Avoiding "loss in the middle" in research pipelines
|
||||
5. **Kimi Linear Model** (2025) - 75% KV cache reduction techniques
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
When creating new research skills, ensure:
|
||||
|
||||
- [ ] Static content first (>1024 tokens for caching)
|
||||
- [ ] Dynamic content last
|
||||
- [ ] Explicit cache boundary markers
|
||||
- [ ] Progressive reference loading (not inline)
|
||||
- [ ] "Loss in the middle" avoidance (key info at start/end)
|
||||
- [ ] Clear section navigation markers
|
||||
- [ ] Format consistency maintained
|
||||
- [ ] Context pruning per phase
|
||||
- [ ] Validation for output size
|
||||
- [ ] Multi-agent minimal context protocol
|
||||
|
||||
---
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
Potential 2026 optimizations:
|
||||
|
||||
1. **Adaptive context windows** - Adjust based on query complexity
|
||||
2. **Semantic caching** - Cache similar (not identical) contexts
|
||||
3. **Context compression** - Auto-summarize retrieved sources
|
||||
4. **Hierarchical agents** - Deeper context partitioning
|
||||
5. **Real-time cache metrics** - Monitor hit rates, optimize
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
By applying 2025 context engineering research, this skill achieves:
|
||||
|
||||
✅ **Up to 85% latency reduction** (cached calls)

✅ **Up to 90% cost reduction** (cached tokens)
|
||||
✅ **3-4x context efficiency** (progressive loading)
|
||||
✅ **No "loss in the middle"** (strategic positioning)
|
||||
✅ **Production-ready architecture** (scalable, maintainable)
|
||||
|
||||
These optimizations make deep research practical for high-frequency use cases while maintaining superior quality vs competitors.
|
||||
167
axhub-make/skills/third-party/deep-research/QUICK_START.md
vendored
Normal file
@@ -0,0 +1,167 @@
|
||||
# Deep Research Skill - Quick Start Guide
|
||||
|
||||
## What is This?
|
||||
|
||||
A comprehensive research engine for Claude Code that **matches and exceeds** Claude Desktop's "Advanced Research" feature. It conducts enterprise-grade deep research with extended reasoning, multi-source synthesis, and citation-backed reports.
|
||||
|
||||
## How to Use
|
||||
|
||||
### Simple Invocation (Recommended)
|
||||
|
||||
Just ask Claude Code to use deep research:
|
||||
|
||||
```
Use deep research to analyze the current state of AI agent frameworks in 2025
```

```
Deep research: Should we migrate from PostgreSQL to Supabase?
```

```
Use deep research in ultradeep mode to review recent advances in longevity science
```
|
||||
|
||||
### Direct CLI Usage
|
||||
|
||||
```bash
# Standard research (6 phases, ~5-10 minutes)
python3 ~/.claude/skills/deep-research/research_engine.py \
  --query "Your research question" \
  --mode standard

# Deep research (8 phases, ~10-20 minutes)
python3 ~/.claude/skills/deep-research/research_engine.py \
  --query "Your research question" \
  --mode deep

# Quick research (3 phases, ~2-5 minutes)
python3 ~/.claude/skills/deep-research/research_engine.py \
  --query "Your research question" \
  --mode quick

# Ultra-deep research (8+ phases, ~20-45 minutes)
python3 ~/.claude/skills/deep-research/research_engine.py \
  --query "Your research question" \
  --mode ultradeep
```
|
||||
|
||||
## Research Modes Explained
|
||||
|
||||
| Mode | Phases | Time | Use When |
|------|--------|------|----------|
| **Quick** | 3 | 2-5 min | Initial exploration, simple questions |
| **Standard** | 6 | 5-10 min | Most research needs (default) |
| **Deep** | 8 | 10-20 min | Complex topics, important decisions |
| **UltraDeep** | 8+ | 20-45 min | Critical analysis, comprehensive reports |
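
For readers curious how the `--query` and `--mode` flags might map onto the modes in the table, here is a minimal `argparse` sketch. It is not the actual parser in `research_engine.py`, which is not shown in this guide; `MODE_PHASES`, `build_parser`, and `phases_for` are hypothetical, and UltraDeep's "8+" is simplified to 8.

```python
import argparse

# Phase counts taken from the table; "8+" for ultradeep is stored as 8 here.
MODE_PHASES = {"quick": 3, "standard": 6, "deep": 8, "ultradeep": 8}


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting the two flags used in the CLI examples."""
    parser = argparse.ArgumentParser(description="Run a research query")
    parser.add_argument("--query", required=True, help="research question")
    parser.add_argument("--mode", choices=sorted(MODE_PHASES),
                        default="standard", help="research depth")
    return parser


def phases_for(argv: list[str]) -> int:
    """Parse CLI-style arguments and return the number of phases to run."""
    args = build_parser().parse_args(argv)
    return MODE_PHASES[args.mode]
```

Making `standard` the default mirrors the table's "(default)" note, so omitting `--mode` selects the 6-phase run.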
|
||||
|
||||
## What You Get
|
||||
|
||||
Every research report includes:
|
||||
|
||||
- **Executive Summary** - Key findings in 3-5 bullets
|
||||
- **Detailed Analysis** - With full citations [1], [2], [3]
|
||||
- **Synthesis & Insights** - Novel insights beyond sources
|
||||
- **Limitations & Caveats** - What's uncertain or missing
|
||||
- **Recommendations** - Actionable next steps
|
||||
- **Full Bibliography** - All sources with credibility scores
|
||||
- **Methodology Appendix** - How research was conducted
|
||||
|
||||
## Output Location
|
||||
|
||||
All research is saved to:
|
||||
```
~/.claude/research_output/
```
|
||||
|
||||
Format: `research_report_YYYYMMDD_HHMMSS.md`
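
The filename pattern corresponds to a standard `strftime` format. As a sketch of how such a path could be built (the helper name `report_path` and the `OUTPUT_DIR` constant are assumptions for illustration, not the engine's actual code):

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

OUTPUT_DIR = Path.home() / ".claude" / "research_output"


def report_path(now: Optional[datetime] = None) -> Path:
    """Build a timestamped report path in the YYYYMMDD_HHMMSS format."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return OUTPUT_DIR / f"research_report_{stamp}.md"
```

Because the timestamp sorts lexicographically, listing the directory by name also lists reports in chronological order.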
|
||||
|
||||
## Features That Beat Claude Desktop Research
|
||||
|
||||
✅ **8-Phase Pipeline** - More thorough than Claude Desktop's approach
|
||||
✅ **Multiple Research Modes** - Choose depth vs speed
|
||||
✅ **Source Credibility Scoring** - Evaluates each source (0-100 score)
|
||||
✅ **Graph-of-Thoughts** - Non-linear exploration with branching reasoning
|
||||
✅ **Citation Management** - Automatic tracking and bibliography generation
|
||||
✅ **Critique Phase** - Built-in red-team analysis of findings
|
||||
✅ **Refine Phase** - Addresses gaps before finalizing
|
||||
✅ **Local File Integration** - Can search your codebase/docs
|
||||
✅ **Code Execution** - Can run analyses and validations
|
||||
|
||||
## Example Use Cases
|
||||
|
||||
### Technology Evaluation

```
Use deep research to compare Next.js 15 vs Remix vs Astro for my project
```

### Market Analysis

```
Deep research: What are the key trends in longevity biotech funding 2023-2025?
```

### Technical Decision

```
Use deep research to help me choose between Auth0, Clerk, and Supabase Auth
```

### Scientific Review

```
Use deep research in ultradeep mode to summarize senolytics research progress
```

### Competitive Intelligence

```
Deep research: Who are the top 5 competitors in the AI code assistant space?
```
|
||||
|
||||
## Quality Standards
|
||||
|
||||
Every report guarantees:
|
||||
- ✅ 10+ distinct sources (unless highly specialized topic)
|
||||
- ✅ 3+ source verification for major claims
|
||||
- ✅ Full citation tracking
|
||||
- ✅ Credibility assessment for each source
|
||||
- ✅ Limitations documented
|
||||
- ✅ Methodology explained
|
||||
|
||||
## Tips for Best Results
|
||||
|
||||
1. **Be Specific** - "Compare X vs Y for use case Z" is better than "Tell me about X"
|
||||
2. **State Your Goal** - "Help me decide..." vs "Give me an overview..."
|
||||
3. **Choose Right Mode** - Use Quick for exploration, Deep for decisions
|
||||
4. **Check Scope First** - Review Phase 1 output to ensure on track
|
||||
5. **Use Citations** - Drill deeper by asking about specific sources [1], [2], etc.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
deep-research/
├── SKILL.md                 # Main skill definition (11KB)
├── research_engine.py       # Core engine (16KB)
├── utils/
│   ├── citation_manager.py  # Citation tracking (6KB)
│   └── source_evaluator.py  # Credibility scoring (8KB)
├── README.md                # Full documentation
├── QUICK_START.md           # This guide
└── requirements.txt         # No external deps needed!
```
|
||||
|
||||
## No Dependencies Required!
|
||||
|
||||
The skill uses only Python standard library - no pip install needed for basic usage.
|
||||
|
||||
## Version
|
||||
|
||||
**v1.0** - Released 2025-11-04
|
||||
|
||||
Built to match and exceed Claude Desktop's Advanced Research feature.
|
||||
|
||||
---
|
||||
|
||||
**Ready to use?** Just type:
|
||||
```
Use deep research to [your question here]
```
|
||||
|
||||
Claude Code will automatically load this skill and execute the research pipeline!
|
||||
259
axhub-make/skills/third-party/deep-research/README.md
vendored
Normal file
@@ -0,0 +1,259 @@
|
||||
# Deep Research Skill for Claude Code
|
||||
|
||||
A comprehensive research engine that brings Claude Desktop's Advanced Research capabilities (and more) to the Claude Code terminal.
|
||||
|
||||
## Features
|
||||
|
||||
### Core Research Pipeline
|
||||
- **8.5-Phase Research Pipeline**: Scope → Plan → Retrieve (Parallel) → Triangulate → **Outline Refinement** → Synthesize → Critique → Refine → Package
|
||||
- **Multiple Research Modes**: Quick, Standard, Deep, and UltraDeep
|
||||
- **Graph-of-Thoughts Reasoning**: Non-linear exploration with branching thought paths
|
||||
|
||||
### 2025 Enhancements (Latest - v2.2)
|
||||
- **🔄 Auto-Continuation System (NEW)**: **TRUE UNLIMITED length** (50K, 100K+ words) via recursive agent spawning with context preservation
|
||||
- **📄 Progressive File Assembly**: Section-by-section generation with quality safeguards
|
||||
- **⚡ Parallel Search Execution**: 5-10 concurrent searches + parallel agents (3-5x faster Phase 3)
|
||||
- **🎯 First Finish Search (FFS) Pattern**: Adaptive completion based on quality thresholds
|
||||
- **🔍 Enhanced Citation Validation (CiteGuard)**: Hallucination detection, URL verification, multi-source cross-checking
|
||||
- **📋 Dynamic Outline Evolution (WebWeaver)**: Adapt structure after Phase 4 based on evidence
|
||||
- **🔗 Attribution Gradients UI**: Interactive citation tooltips showing evidence chains in HTML reports
|
||||
- **🛡️ Anti-Fatigue Enforcement**: Prose-first quality checks prevent bullet-point degradation
|
||||
|
||||
### Traditional Strengths
|
||||
- **Citation Management**: Automatic source tracking and bibliography generation
|
||||
- **Source Credibility Assessment**: Evaluates source quality and potential biases
|
||||
- **Structured Reports**: Professional markdown, HTML (McKinsey-style), and PDF outputs
|
||||
- **Verification & Triangulation**: Cross-references claims across multiple sources
|
||||
|
||||
## Installation
|
||||
|
||||
The skill is already installed globally in `~/.claude/skills/deep-research/`
|
||||
|
||||
No additional dependencies required for basic usage.
|
||||
|
||||
## Usage
|
||||
|
||||
### In Claude Code
|
||||
|
||||
Simply invoke the skill:
|
||||
|
||||
```
Use deep research to analyze the state of quantum computing in 2025
```

Or specify a mode:

```
Use deep research in ultradeep mode to compare PostgreSQL vs Supabase
```
|
||||
|
||||
### Direct CLI Usage
|
||||
|
||||
```bash
# Standard research
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode standard

# Deep research (all 8 phases)
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode deep

# Quick research (3 phases only)
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode quick

# Ultra-deep research (extended iterations)
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode ultradeep
```
|
||||
|
||||
## Research Modes
|
||||
|
||||
| Mode | Phases | Duration | Best For |
|------|--------|----------|----------|
| **Quick** | 3 phases | 2-5 min | Simple topics, initial exploration |
| **Standard** | 6 phases | 5-10 min | Most research questions |
| **Deep** | 8 phases | 10-20 min | Complex topics requiring thorough analysis |
| **UltraDeep** | 8+ phases | 20-45 min | Critical decisions, comprehensive reports |
|
||||
|
||||
## Output
|
||||
|
||||
Research reports are saved to organized folders in `~/Documents/[Topic]_Research_[Date]/`
|
||||
|
||||
Each report includes:
|
||||
- Executive Summary
|
||||
- Detailed Analysis with Citations
|
||||
- Synthesis & Insights
|
||||
- Limitations & Caveats
|
||||
- Recommendations
|
||||
- Full Bibliography
|
||||
- Methodology Appendix
|
||||
|
||||
### Unlimited Report Generation (2025 Auto-Continuation System)
|
||||
|
||||
Reports use **progressive file assembly with auto-continuation** - achieving truly unlimited length through recursive agent spawning:
|
||||
|
||||
**How It Works:**
|
||||
|
||||
1. **Initial Generation (18K words)**
|
||||
- Generate sections 1-10 progressively
|
||||
- Each section written to file immediately (stays under 32K limit per agent)
|
||||
- Save continuation state with research context
|
||||
|
||||
2. **Auto-Continuation (if needed)**
|
||||
- Automatically spawns continuation agent via Task tool
|
||||
- Continuation agent loads state: themes, narrative arc, citations, quality metrics
|
||||
- Generates next batch of sections (another 18K words)
|
||||
- Updates state and spawns next agent if more sections remain
|
||||
|
||||
3. **Recursive Chaining**
|
||||
- Each agent stays under 32K output token limit
|
||||
- Chain continues until all sections complete
|
||||
- Final agent generates bibliography and validates report
|
||||
|
||||
**Realistic Report Sizes:**
|
||||
- **Quick mode**: 2,000-4,000 words (single run) ✅
|
||||
- **Standard mode**: 4,000-8,000 words (single run) ✅
|
||||
- **Deep mode**: 8,000-15,000 words (single run) ✅
|
||||
- **UltraDeep mode**: 20,000-100,000+ words (auto-continuation) ✅
|
||||
|
||||
**Example: 50,000 word report:**
|
||||
- Agent 1: Sections 1-10 (18K words) → Spawns Agent 2
|
||||
- Agent 2: Sections 11-20 (18K words) → Spawns Agent 3
|
||||
- Agent 3: Sections 21-25 + Bibliography (14K words) → Complete!
|
||||
- Total: 50K words across 3 agents, each under 32K limit
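
The chaining in this example amounts to grouping consecutive sections into batches that fit a per-agent word budget. A minimal sketch of that planning step is below, assuming the 18,000-word budget quoted above and per-section word estimates that are known in advance; `plan_batches` is a hypothetical helper, and the real system decides continuation at runtime rather than from a fixed plan.

```python
def plan_batches(section_words: list[int], budget: int = 18_000) -> list[list[int]]:
    """Group consecutive section indices so each batch stays within budget."""
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, words in enumerate(section_words):
        # Start a new batch (a new agent) when this section would overflow
        if current and used + words > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += words
    if current:
        batches.append(current)
    return batches
```

With 25 sections of about 1,800 words each, this yields three batches of 10, 10, and 5 sections, matching the three-agent split in the example. A single section larger than the budget still gets its own batch, so it would need to be split further upstream.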
|
||||
|
||||
**Context Preservation (Quality Safeguards):**
|
||||
|
||||
Continuation state includes:
|
||||
- ✅ Research question and key themes
|
||||
- ✅ Main findings summaries (100 words each)
|
||||
- ✅ Narrative arc position (beginning/middle/end)
|
||||
- ✅ Quality metrics (avg words, citation density, prose ratio)
|
||||
- ✅ All citations used + bibliography entries
|
||||
- ✅ Writing style characteristics
|
||||
|
||||
Each continuation agent:
|
||||
- Reads last 3 sections to understand flow
|
||||
- Maintains established themes and style
|
||||
- Continues citation numbering correctly
|
||||
- Matches quality metrics (±20% tolerance)
|
||||
- Verifies coherence before each section
|
||||
|
||||
**Quality Gates (Per Section):**
|
||||
- [ ] Word count: Within ±20% of average
|
||||
- [ ] Citation density: Matches established rate
|
||||
- [ ] Prose ratio: ≥80% prose (not bullets)
|
||||
- [ ] Theme alignment: Ties to key themes
|
||||
- [ ] Style consistency: Matches established patterns
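The first two gates are mechanical enough to sketch in code. The following is a minimal illustration, not the skill's own implementation: it assumes whitespace-delimited word counts and `[N]`-style citation markers, the function names are hypothetical, and the ±0.5 citations per 1,000 words tolerance is the one quoted in the continuation prompt in SKILL.md.

```python
import re


def citation_density(text: str) -> float:
    """Citations per 1,000 words, counting [N] markers (assumed citation style)."""
    words = len(text.split())
    cites = len(re.findall(r"\[\d+\]", text))
    return 1000 * cites / words if words else 0.0


def passes_gates(section: str, avg_words: float, avg_density: float) -> bool:
    """Check the word-count (±20%) and citation-density (±0.5) gates for one section."""
    words = len(section.split())
    words_ok = abs(words - avg_words) <= 0.20 * avg_words
    density_ok = abs(citation_density(section) - avg_density) <= 0.5
    return words_ok and density_ok
```

Theme alignment and style consistency are judgment calls and are not covered by this sketch.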
|
||||
|
||||
**Benefits:**
|
||||
- ✅ TRUE unlimited length (50K, 100K+ words achievable)
|
||||
- ✅ Fully automatic (no manual intervention)
|
||||
- ✅ Context preserved across continuations
|
||||
- ✅ Quality maintained throughout
|
||||
- ✅ Each agent stays under 32K token limit
|
||||
- ✅ Progressive assembly prevents truncation
|
||||
|
||||
## Examples
|
||||
|
||||
### Technology Analysis
|
||||
```
Use deep research to evaluate whether we should adopt Next.js 15 for our project
```
|
||||
|
||||
### Market Research
|
||||
```
Use deep research to analyze longevity biotech funding trends 2023-2025
```
|
||||
|
||||
### Technical Decision
|
||||
```
Use deep research to compare authentication solutions: Auth0 vs Clerk vs Supabase Auth
```
|
||||
|
||||
### Scientific Review
|
||||
```
Use deep research in ultradeep mode to summarize recent advances in senolytic therapies
```
|
||||
|
||||
## Quality Standards
|
||||
|
||||
Every research output:
|
||||
- ✅ Minimum of 10 distinct sources
|
||||
- ✅ Citations for all major claims
|
||||
- ✅ Cross-verified facts (3+ sources)
|
||||
- ✅ Executive summary under 250 words
|
||||
- ✅ Limitations section
|
||||
- ✅ Full bibliography
|
||||
- ✅ Methodology documentation
|
||||
|
||||
## Architecture
|
||||
|
||||
```
deep-research/
├── SKILL.md                     # Main skill definition
├── research_engine.py           # Core orchestration engine
├── utils/
│   ├── citation_manager.py      # Citation tracking & bibliography
│   └── source_evaluator.py      # Source credibility assessment
├── requirements.txt
└── README.md
```
|
||||
|
||||
## Tips for Best Results
|
||||
|
||||
1. **Be Specific**: Frame questions clearly with context
|
||||
2. **Set Expectations**: Specify if you need comparisons, recommendations, or pure analysis
|
||||
3. **Choose Appropriate Mode**: Use Quick for exploration, Deep for decisions
|
||||
4. **Review Scope**: Check Phase 1 output to ensure research is on track
|
||||
5. **Leverage Citations**: Use citation numbers to drill deeper into specific sources
|
||||
|
||||
## Comparison with Claude Desktop Research
|
||||
|
||||
| Feature | Claude Desktop | Deep Research Skill |
|---------|---------------|---------------------|
| Multi-source synthesis | ✅ | ✅ |
| Citation tracking | ✅ | ✅ |
| Iterative refinement | ✅ | ✅ |
| Source verification | ✅ | ✅ Enhanced |
| Credibility scoring | ❌ | ✅ |
| 8-phase methodology | ❌ | ✅ |
| Graph-of-Thoughts | ❌ | ✅ |
| Multiple modes | ❌ | ✅ |
| Local file integration | ❌ | ✅ |
| Code execution | ❌ | ✅ |
|
||||
|
||||
## 2025 Research Papers Implemented
|
||||
|
||||
This skill incorporates techniques from 2025 academic research:
|
||||
|
||||
1. **Parallel Execution** (GAP, Flash-Searcher, TPS-Bench)
|
||||
- DAG-based parallel tool use for independent subtasks
|
||||
- 3-5x faster retrieval phase
|
||||
- Concurrent search strategies
|
||||
|
||||
2. **First Finish Search** (arXiv 2505.18149)
|
||||
- Quality threshold gates by mode
|
||||
- Continue background searches for depth
|
||||
- Optimal latency-accuracy tradeoff
|
||||
|
||||
3. **Citation Validation** (CiteGuard, arXiv 2510.17853)
|
||||
- Hallucination pattern detection
|
||||
- Multi-source verification (DOI + URL)
|
||||
- Strict mode for critical reports
|
||||
|
||||
4. **Dynamic Outlines** (WebWeaver, arXiv 2509.13312)
|
||||
- Evidence-driven structure adaptation
|
||||
- Phase 4.5 refinement step
|
||||
- Prevents locked-in research paths
|
||||
|
||||
5. **Attribution Gradients** (arXiv 2510.00361)
|
||||
- Interactive evidence chains
|
||||
- Hover tooltips in HTML reports
|
||||
- Improved auditability
|
||||
|
||||
## Version
|
||||
|
||||
2.0 (2025-11-05) - Major update with 2025 research enhancements
|
||||
1.0 (2025-11-04) - Initial release
|
||||
|
||||
## License
|
||||
|
||||
User skill - modify as needed for your workflow
|
||||
856
axhub-make/skills/third-party/deep-research/SKILL.md
vendored
Normal file
@@ -0,0 +1,856 @@
|
||||
---
|
||||
name: deep-research
|
||||
description: Conduct enterprise-grade research with multi-source synthesis, citation tracking, and verification. Use when user needs comprehensive analysis requiring 10+ sources, verified claims, or comparison of approaches. Triggers include "deep research", "comprehensive analysis", "research report", "compare X vs Y", or "analyze trends". Do NOT use for simple lookups, debugging, or questions answerable with 1-2 searches.
|
||||
---
|
||||
|
||||
# Deep Research
|
||||
|
||||
<!-- STATIC CONTEXT BLOCK START - Optimized for prompt caching -->
|
||||
<!-- All static instructions, methodology, and templates below this line -->
|
||||
<!-- Dynamic content (user queries, results) added after this block -->
|
||||
|
||||
## Core System Instructions
|
||||
|
||||
**Purpose:** Deliver citation-backed, verified research reports through an 8-phase pipeline (Scope → Plan → Retrieve → Triangulate → Synthesize → Critique → Refine → Package) with source credibility scoring and progressive context management.
|
||||
|
||||
**Context Strategy:** This skill uses 2025 context engineering best practices:
|
||||
- Static instructions cached (this section)
|
||||
- Progressive disclosure (load references only when needed)
|
||||
- Avoid the "lost in the middle" effect (critical info at start/end, not buried)
|
||||
- Explicit section markers for context navigation
|
||||
|
||||
---
|
||||
|
||||
## Decision Tree (Execute First)
|
||||
|
||||
```
Request Analysis
├─ Simple lookup? → STOP: Use WebSearch, not this skill
├─ Debugging? → STOP: Use standard tools, not this skill
└─ Complex analysis needed? → CONTINUE

Mode Selection
├─ Initial exploration? → quick (3 phases, 2-5 min)
├─ Standard research? → standard (6 phases, 5-10 min) [DEFAULT]
├─ Critical decision? → deep (8 phases, 10-20 min)
└─ Comprehensive review? → ultradeep (8+ phases, 20-45 min)

Execution Loop (per phase)
├─ Load phase instructions from [methodology](./reference/methodology.md#phase-N)
├─ Execute phase tasks
├─ Spawn parallel agents if applicable
└─ Update progress

Validation Gate
├─ Run `python scripts/validate_report.py --report [path]`
├─ Pass? → Deliver
└─ Fail? → Fix (max 2 attempts) → Still fails? → Escalate
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow (Clarify → Plan → Act → Verify → Report)
|
||||
|
||||
**AUTONOMY PRINCIPLE:** This skill operates independently. Infer assumptions from query context. Only stop for critical errors or incomprehensible queries.
|
||||
|
||||
### 1. Clarify (Rarely Needed - Prefer Autonomy)
|
||||
|
||||
**DEFAULT: Proceed autonomously. Derive assumptions from query signals.**
|
||||
|
||||
**ONLY ask if CRITICALLY ambiguous:**
|
||||
- Query is incomprehensible (e.g., "research the thing")
|
||||
- Contradictory requirements (e.g., "quick 50-source ultradeep analysis")
|
||||
|
||||
**When in doubt: PROCEED with standard mode. User will redirect if incorrect.**
|
||||
|
||||
**Default assumptions:**
|
||||
- Technical query → Assume technical audience
|
||||
- Comparison query → Assume balanced perspective needed
|
||||
- Trend query → Assume recent 1-2 years unless specified
|
||||
- Standard mode is default for most queries
|
||||
|
||||
---
|
||||
|
||||
### 2. Plan
|
||||
|
||||
**Mode selection criteria:**
|
||||
- **Quick** (2-5 min): Exploration, broad overview, time-sensitive
|
||||
- **Standard** (5-10 min): Most use cases, balanced depth/speed [DEFAULT]
|
||||
- **Deep** (10-20 min): Important decisions, need thorough verification
|
||||
- **UltraDeep** (20-45 min): Critical analysis, maximum rigor
|
||||
|
||||
**Announce plan and execute:**
|
||||
- Briefly state: selected mode, estimated time, number of sources
|
||||
- Example: "Starting standard mode research (5-10 min, 15-30 sources)"
|
||||
- Proceed without waiting for approval
|
||||
|
||||
---
|
||||
|
||||
### 3. Act (Phase Execution)
|
||||
|
||||
**All modes execute:**
|
||||
- Phase 1: SCOPE - Define boundaries ([method](./reference/methodology.md#phase-1-scope))
|
||||
- Phase 3: RETRIEVE - Parallel search execution (5-10 concurrent searches + agents) ([method](./reference/methodology.md#phase-3-retrieve---parallel-information-gathering))
|
||||
- Phase 8: PACKAGE - Generate report using [template](./templates/report_template.md)
|
||||
|
||||
**Standard/Deep/UltraDeep execute:**
|
||||
- Phase 2: PLAN - Strategy formulation
|
||||
- Phase 4: TRIANGULATE - Verify 3+ sources per claim
|
||||
- Phase 4.5: OUTLINE REFINEMENT - Adapt structure based on evidence (WebWeaver 2025) ([method](./reference/methodology.md#phase-45-outline-refinement---dynamic-evolution-webweaver-2025))
|
||||
- Phase 5: SYNTHESIZE - Generate novel insights
|
||||
|
||||
**Deep/UltraDeep execute:**
|
||||
- Phase 6: CRITIQUE - Red-team analysis
|
||||
- Phase 7: REFINE - Address gaps
|
||||
|
||||
**Critical: Avoid "Lost in the Middle"**
|
||||
- Place key findings at START and END of sections, not buried
|
||||
- Use explicit headers and markers
|
||||
- Structure: Summary → Details → Conclusion (not Details sandwiched)
|
||||
|
||||
**Progressive Context Loading:**
|
||||
- Load [methodology](./reference/methodology.md) sections on-demand
|
||||
- Load [template](./templates/report_template.md) only for Phase 8
|
||||
- Do not inline everything - reference external files
|
||||
|
||||
**Anti-Hallucination Protocol (CRITICAL):**
|
||||
- **Source grounding**: Every factual claim MUST cite a specific source immediately [N]
|
||||
- **Clear boundaries**: Distinguish between FACTS (from sources) and SYNTHESIS (your analysis)
|
||||
- **Explicit markers**: Use "According to [1]..." or "[1] reports..." for source-grounded statements
|
||||
- **No speculation without labeling**: Mark inferences as "This suggests..." not "Research shows..."
|
||||
- **Verify before citing**: If unsure whether a source actually says X, do NOT fabricate a citation
|
||||
- **When uncertain**: Say "No sources found for X" rather than inventing references
|
||||
|
||||
**Parallel Execution Requirements (CRITICAL for Speed):**
|
||||
|
||||
**Phase 3 RETRIEVE - Mandatory Parallel Search:**
|
||||
1. **Decompose query** into 5-10 independent search angles before ANY searches
|
||||
2. **Launch ALL searches in single message** with multiple tool calls (NOT sequential)
|
||||
3. **Quality threshold monitoring** for the First Finish Search (FFS) pattern:
|
||||
- Track source count and avg credibility score
|
||||
- Proceed when threshold reached (mode-specific, see methodology)
|
||||
- Continue background searches for additional depth
|
||||
4. **Spawn 3-5 parallel agents** using Task tool for deep-dive investigations
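The threshold gate in step 3 can be illustrated outside the agent runtime. The sketch below uses `asyncio` purely as an analogy for parallel tool calls; it is not how the skill itself dispatches searches. It assumes each search returns a list of `(url, credibility_score)` tuples, and the names `first_finish`, `min_sources`, and `min_avg` are hypothetical. The real thresholds are mode-specific and live in the methodology reference.

```python
import asyncio


async def first_finish(searches, min_sources: int, min_avg: float):
    """Collect results until the quality threshold is met; leave the rest running.

    `searches` is an iterable of coroutines, each returning a list of
    (url, credibility_score) tuples (an assumed shape for illustration).
    """
    pending = {asyncio.create_task(s) for s in searches}
    sources = []
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            sources.extend(task.result())
        avg = sum(score for _, score in sources) / len(sources) if sources else 0.0
        if len(sources) >= min_sources and avg >= min_avg:
            break  # threshold reached: proceed, background searches continue
    return sources, pending
```

Callers can keep awaiting the returned `pending` tasks for additional depth, or cancel them once enough evidence has been collected.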
|
||||
|
||||
**Example correct execution:**
|
||||
```
[Single message with 8+ parallel tool calls]
WebSearch #1: Core topic semantic
WebSearch #2: Technical keywords
WebSearch #3: Recent 2024-2025 filtered
WebSearch #4: Academic domains
WebSearch #5: Critical analysis
WebSearch #6: Industry trends
Task agent #1: Academic paper analysis
Task agent #2: Technical documentation deep dive
```
|
||||
|
||||
**❌ WRONG (sequential execution):**
|
||||
```
WebSearch #1 → wait for results → WebSearch #2 → wait → WebSearch #3...
```
|
||||
|
||||
**✅ RIGHT (parallel execution):**
|
||||
```
All searches + agents launched simultaneously in one message
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Verify (Always Execute)
|
||||
|
||||
**Step 1: Citation Verification (Catches Fabricated Sources)**
|
||||
|
||||
```bash
python scripts/verify_citations.py --report [path]
```
|
||||
|
||||
**Checks:**
|
||||
- DOI resolution (verifies citation actually exists)
|
||||
- Title/year matching (detects mismatched metadata)
|
||||
- Flags suspicious entries (2024+ without DOI, no URL, failed verification)
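To make the offline part of the flagging rule concrete, here is a minimal sketch. It is not the contents of `scripts/verify_citations.py`: the function name `flag_entry` is hypothetical, the year is assumed to appear as `(YYYY)` in the bibliography entry, and DOI resolution and title matching (which need network access) are left out.

```python
import re

DOI = re.compile(r"\b10\.\d{4,9}/\S+")   # loose DOI pattern
URL = re.compile(r"https?://\S+")
YEAR = re.compile(r"\((\d{4})\)")        # assumes "(2024)"-style years


def flag_entry(entry: str) -> list[str]:
    """Return the reasons a bibliography entry looks suspicious (empty if none)."""
    flags = []
    year = YEAR.search(entry)
    if year and int(year.group(1)) >= 2024 and not DOI.search(entry):
        flags.append("2024+ without DOI")
    if not URL.search(entry):
        flags.append("no URL")
    return flags
```

A flagged entry is only a candidate for manual review; a missing DOI does not prove the source is fabricated.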
|
||||
|
||||
**If suspicious citations found:**
|
||||
- Review flagged entries manually
|
||||
- Remove or replace fabricated sources
|
||||
- Re-run until clean
|
||||
|
||||
**Step 2: Structure & Quality Validation**
|
||||
|
||||
```bash
python scripts/validate_report.py --report [path]
```
|
||||
|
||||
**8 automated checks:**
|
||||
1. Executive summary length (50-250 words)
|
||||
2. Required sections present (+ recommended: Claims table, Counterevidence)
|
||||
3. Citations formatted [1], [2], [3]
|
||||
4. Bibliography matches citations
|
||||
5. No placeholder text (TBD, TODO)
|
||||
6. Word count reasonable (500-10000)
|
||||
7. Minimum 10 sources
|
||||
8. No broken internal links
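Check 4 (bibliography matches citations) is the one most tied to fabrication risk, so a small sketch may help. This is an illustration only, not the logic of `scripts/validate_report.py`; it assumes the bibliography sits under a `## Bibliography` heading with one `[N]` entry per line, and `citation_mismatches` is a hypothetical name.

```python
import re


def citation_mismatches(report: str) -> tuple[set[int], set[int]]:
    """Return (cited but not listed, listed but never cited) citation numbers."""
    body, _, bibliography = report.partition("## Bibliography")
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    listed = {
        int(n) for n in re.findall(r"^\[(\d+)\]", bibliography, flags=re.MULTILINE)
    }
    return cited - listed, listed - cited
```

Both returned sets being empty corresponds to a pass for this single check.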
|
||||
|
||||
**If fails:**
|
||||
- Attempt 1: Auto-fix formatting/links
|
||||
- Attempt 2: Manual review + correction
|
||||
- After 2 failures: **STOP** → Report issues → Ask user
|
||||
|
||||
---
|
||||
|
||||
### 5. Report
|
||||
|
||||
**CRITICAL: Generate COMPREHENSIVE, DETAILED markdown reports**
|
||||
|
||||
**File Organization (CRITICAL - Clean Accessibility):**
|
||||
|
||||
**1. Create Organized Folder in Documents:**
|
||||
- ALWAYS create dedicated folder: `~/Documents/[TopicName]_Research_[YYYYMMDD]/`
|
||||
- Extract clean topic name from research question (remove special chars, use underscores/CamelCase)
|
||||
- Examples:
|
||||
- "psilocybin research 2025" → `~/Documents/Psilocybin_Research_20251104/`
|
||||
- "compare React vs Vue" → `~/Documents/React_vs_Vue_Research_20251104/`
|
||||
- "AI safety trends" → `~/Documents/AI_Safety_Trends_Research_20251104/`
|
||||
- If folder exists, use it; if not, create it
|
||||
- This ensures clean organization and easy accessibility
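The naming rule can be sketched as a small helper. This is a simplified illustration under stated assumptions: `research_folder` is a hypothetical name, and the function only strips special characters and joins words with underscores. It does not drop filler words such as "compare" or "research 2025", which the examples above do by judgment.

```python
import re
from datetime import date
from pathlib import Path


def research_folder(topic: str, on: date, root: Path = Path.home() / "Documents") -> Path:
    """Build <root>/<Topic_Name>_Research_<YYYYMMDD> from an already-cleaned topic."""
    words = re.findall(r"[A-Za-z0-9]+", topic)  # drops special characters
    name = "_".join(
        # keep acronyms and "vs" as written; capitalize everything else
        w if w.isupper() or w.lower() == "vs" else w.capitalize()
        for w in words
    )
    return root / f"{name}_Research_{on:%Y%m%d}"
```

Creating the folder with `path.mkdir(parents=True, exist_ok=True)` matches the "if folder exists, use it; if not, create it" rule.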
|
||||
|
||||
**2. Save All Formats to Same Folder:**
|
||||
|
||||
**Markdown (Primary Source):**
|
||||
- Save to: `[Documents folder]/research_report_[YYYYMMDD]_[topic_slug].md`
|
||||
- Also save copy to: `~/.claude/research_output/` (internal tracking)
|
||||
- Full detailed report with all findings
|
||||
|
||||
**HTML (McKinsey Style - ALWAYS GENERATE):**
|
||||
- Save to: `[Documents folder]/research_report_[YYYYMMDD]_[topic_slug].html`
|
||||
- Use McKinsey template: [mckinsey_template](./templates/mckinsey_report_template.html)
|
||||
- Design principles: Sharp corners (NO border-radius), muted corporate colors (navy #003d5c, gray #f8f9fa), ultra-compact layout, info-first structure
|
||||
- Place critical metrics dashboard at top (extract 3-4 key quantitative findings)
|
||||
- Use data tables for dense information presentation
|
||||
- 14px base font, compact spacing, no decorative gradients or colors
|
||||
- **Attribution Gradients (2025):** Wrap each citation [N] in `<span class="citation">` with nested tooltip div showing source details
|
||||
- OPEN in browser automatically after generation
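The attribution-gradient markup can be illustrated with a short helper. This is a sketch, not the behavior of the skill's own conversion script: `wrap_citations` is a hypothetical function and the `citation-tooltip` class name is an assumption, since the template's actual class names are not shown here.

```python
import html
import re


def wrap_citations(text: str, sources: dict[int, str]) -> str:
    """Wrap each [N] marker in a citation span with a nested tooltip div."""

    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        detail = html.escape(sources.get(n, "Source details unavailable"))
        return (
            f'<span class="citation">[{n}]'
            f'<div class="citation-tooltip">{detail}</div></span>'
        )

    return re.sub(r"\[(\d+)\]", repl, text)
```

Escaping the source details keeps titles containing `&` or `<` from breaking the generated HTML.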
|
||||
|
||||
**PDF (Professional Print - ALWAYS GENERATE):**
|
||||
- Save to: `[Documents folder]/research_report_[YYYYMMDD]_[topic_slug].pdf`
|
||||
- Use generating-pdf skill (via Task tool with general-purpose agent)
|
||||
- Professional formatting with headers, page numbers
|
||||
- OPEN in default PDF viewer after generation
|
||||
|
||||
**3. File Naming Convention:**
|
||||
All files use same base name for easy matching:
|
||||
- `research_report_20251104_psilocybin_2025.md`
|
||||
- `research_report_20251104_psilocybin_2025.html`
|
||||
- `research_report_20251104_psilocybin_2025.pdf`
|
||||
|
||||
**Length Requirements (UNLIMITED with Progressive Assembly):**
|
||||
- Quick mode: 2,000+ words (baseline quality threshold)
|
||||
- Standard mode: 4,000+ words (comprehensive analysis)
|
||||
- Deep mode: 6,000+ words (thorough investigation)
|
||||
- UltraDeep mode: 10,000-50,000+ words (NO UPPER LIMIT - as comprehensive as evidence warrants)
|
||||
|
||||
**How Unlimited Length Works:**
|
||||
Progressive file assembly allows ANY report length by generating section-by-section.
|
||||
Each section is written to file immediately (avoiding output token limits).
|
||||
Complex topics with many findings? Generate 20, 30, 50+ findings - no constraint!
|
||||
|
||||
**Content Requirements:**
|
||||
- Use [template](./templates/report_template.md) as exact structure
|
||||
- Generate each section to APPROPRIATE depth (determined by evidence, not word targets)
|
||||
- Include specific data, statistics, dates, numbers (not vague statements)
|
||||
- Multiple paragraphs per finding with evidence (as many as needed)
|
||||
- Each section gets focused generation attention
|
||||
- DO NOT write summaries - write FULL analysis
|
||||
|
||||
**Writing Standards:**
|
||||
- **Narrative-driven**: Write in flowing prose. Each finding tells a story with beginning (context), middle (evidence), end (implications)
|
||||
- **Precision**: Every word is deliberately chosen and carries intent
|
||||
- **Economy**: No fluff; eliminate ornate grammar and unnecessary modifiers
|
||||
- **Clarity**: Exact numbers embedded in sentences ("The study demonstrated a 23% reduction in mortality"), not isolated in bullets
|
||||
- **Directness**: State findings without embellishment
|
||||
- **High signal-to-noise**: Dense information, respect reader's time
|
||||
|
||||
**Bullet Point Policy (Anti-Fatigue Enforcement):**
|
||||
- Use bullets SPARINGLY: Only for distinct lists (product names, company roster, enumerated steps)
|
||||
- NEVER use bullets as primary content delivery - they fragment thinking
|
||||
- Each findings section requires substantive prose paragraphs (3-5+ paragraphs minimum)
|
||||
- Example: Instead of "• Market size: $2.4B" write "The global market reached $2.4 billion in 2023, driven by increasing consumer demand and regulatory tailwinds [1]."
|
||||
|
||||
**Anti-Fatigue Quality Check (Apply to EVERY Section):**
|
||||
Before considering a section complete, verify:
|
||||
- [ ] **Paragraph count**: ≥3 paragraphs for major sections (## headings)
|
||||
- [ ] **Prose-first**: <20% of content is bullet points (≥80% must be flowing prose)
|
||||
- [ ] **No placeholders**: Zero instances of "Content continues", "Due to length", "[Sections X-Y]"
|
||||
- [ ] **Evidence-rich**: Specific data points, statistics, quotes (not vague statements)
|
||||
- [ ] **Citation density**: Major claims cited within same sentence
|
||||
|
||||
**If ANY check fails:** Regenerate the section before moving to next.
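The prose-first check can be approximated mechanically. The sketch below is one possible reading, not a prescribed implementation: it measures the share of words that sit outside bullet lines, and both that word-based definition and the name `prose_ratio` are assumptions.

```python
def prose_ratio(section: str) -> float:
    """Fraction of words in a section that are not on bullet lines."""
    lines = [ln.strip() for ln in section.splitlines() if ln.strip()]
    total = sum(len(ln.split()) for ln in lines)
    if total == 0:
        return 0.0
    bullet_prefixes = ("- ", "* ", "• ")
    prose = sum(
        len(ln.split()) for ln in lines if not ln.startswith(bullet_prefixes)
    )
    return prose / total
```

Under this reading, a section passes the gate when `prose_ratio(section) >= 0.80`.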
|
||||
|
||||
**Source Attribution Standards (Critical for Preventing Fabrication):**
|
||||
- **Immediate citation**: Every factual claim followed by [N] citation in same sentence
|
||||
- **Quote sources directly**: Use "According to [1]..." or "[1] reports..." for factual statements
|
||||
- **Distinguish fact from synthesis**:
|
||||
- ✅ GOOD: "Mortality decreased 23% (p<0.01) in the treatment group [1]."
|
||||
- ❌ BAD: "Studies show mortality improved significantly."
|
||||
- **No vague attributions**:
|
||||
- ❌ NEVER: "Research suggests...", "Studies show...", "Experts believe..."
|
||||
- ✅ ALWAYS: "Smith et al. (2024) found..." [1], "According to FDA data..." [2]
|
||||
- **Label speculation explicitly**:
|
||||
- ✅ GOOD: "This suggests a potential mechanism..." (analysis, not fact)
|
||||
- ❌ BAD: "The mechanism is..." (presented as fact without citation)
|
||||
- **Admit uncertainty**:
|
||||
- ✅ GOOD: "No sources found addressing X directly."
|
||||
- ❌ BAD: Fabricating a citation to fill the gap
|
||||
- **Template pattern**: "[Specific claim with numbers/data] [Citation]. [Analysis/implication]."
|
||||
|
||||
**Deliver to user:**
|
||||
1. Executive summary (inline in chat)
|
||||
2. Organized folder path (e.g., "All files saved to: ~/Documents/Psilocybin_Research_20251104/")
|
||||
3. Confirmation of all three formats generated:
|
||||
- Markdown (source)
|
||||
- HTML (McKinsey-style, opened in browser)
|
||||
- PDF (professional print, opened in viewer)
|
||||
4. Source quality assessment summary (source count)
|
||||
5. Next steps (if relevant)
|
||||
|
||||
**Generation Workflow: Progressive File Assembly (Unlimited Length)**
|
||||
|
||||
**Phase 8.1: Setup**
|
||||
```bash
# Extract topic slug from research question
# Create folder: ~/Documents/[TopicName]_Research_[YYYYMMDD]/
mkdir -p ~/Documents/[folder_name]

# Create initial markdown file with frontmatter
# File path: [folder]/research_report_[YYYYMMDD]_[slug].md
```
|
||||
|
||||
**Phase 8.2: Progressive Section Generation**
|
||||
|
||||
**CRITICAL STRATEGY:** Generate and write each section individually to file using Write/Edit tools.
|
||||
This allows unlimited report length while keeping each generation manageable.
|
||||
|
||||
**OUTPUT TOKEN LIMIT SAFEGUARD (CRITICAL - Claude Code Default: 32K):**
|
||||
|
||||
Claude Code default limit: 32,000 output tokens (≈24,000 words total per skill execution)
|
||||
This is a HARD LIMIT and cannot be changed within the skill.
|
||||
|
||||
**What this means:**
|
||||
- Total output (your text + all tool call content) must be <32,000 tokens
|
||||
- 32,000 tokens ≈ 24,000 words max
|
||||
- Leave safety margin: Target ≤20,000 words total output
|
||||
|
||||
**Realistic report sizes per mode:**
|
||||
- Quick mode: 2,000-4,000 words ✅ (well under limit)
|
||||
- Standard mode: 4,000-8,000 words ✅ (comfortably under limit)
|
||||
- Deep mode: 8,000-15,000 words ✅ (achievable with care)
|
||||
- UltraDeep mode: 15,000-20,000 words ⚠️ (at limit, monitor closely)
|
||||
|
||||
**For reports >20,000 words:**
|
||||
User must run skill multiple times:
|
||||
- Run 1: "Generate Part 1 (sections 1-6)" → saves to part1.md
|
||||
- Run 2: "Generate Part 2 (sections 7-12)" → saves to part2.md
|
||||
- User manually combines or asks Claude to merge files
|
||||
|
||||
**Auto-Continuation Strategy (TRUE Unlimited Length):**
|
||||
|
||||
When a report would exceed 18,000 words in a single run:
|
||||
1. Generate sections 1-10 (stay under 18K words)
|
||||
2. Save continuation state file with context preservation
|
||||
3. Spawn continuation agent via Task tool
|
||||
4. Continuation agent: Reads state → Generates next batch → Spawns next agent if needed
|
||||
5. Chain continues recursively until complete
|
||||
|
||||
This achieves UNLIMITED length while respecting the 32K output-token limit per agent.
|
||||
|
||||
**Initialize Citation Tracking:**
|
||||
```
citations_used = []  # Maintain this list in working memory throughout
```
|
||||
|
||||
**Section Generation Loop:**
|
||||
|
||||
**Pattern:** Generate section content → Use Write/Edit tool with that content → Move to next section
|
||||
Each Write/Edit call contains ONE section (≤2,000 words per call)
|
||||
|
||||
1. **Executive Summary** (200-400 words)
|
||||
- Generate section content
|
||||
- Tool: Write(file, content=frontmatter + Executive Summary)
|
||||
- Track citations used
|
||||
- Progress: "✓ Executive Summary"
|
||||
|
||||
2. **Introduction** (400-800 words)
|
||||
- Generate section content
|
||||
- Tool: Edit(file, old=last_line, new=old + Introduction section)
|
||||
- Track citations used
|
||||
- Progress: "✓ Introduction"
|
||||
|
||||
3. **Finding 1** (600-2,000 words)
|
||||
- Generate complete finding
|
||||
- Tool: Edit(file, append Finding 1)
|
||||
- Track citations used
|
||||
- Progress: "✓ Finding 1"
|
||||
|
||||
4. **Finding 2** (600-2,000 words)
|
||||
- Generate complete finding
|
||||
- Tool: Edit(file, append Finding 2)
|
||||
- Track citations used
|
||||
- Progress: "✓ Finding 2"
|
||||
|
||||
... Continue for ALL findings (each finding = one Edit tool call, ≤2,000 words)
|
||||
|
||||
**CRITICAL:** If you have 10 findings × 1,500 words each = 15,000 words of findings
|
||||
This is OKAY because each Edit call is only 1,500 words (under 2,000 word limit per tool call)
|
||||
The FILE grows to 15,000 words, but no single tool call exceeds limits
|
||||
|
||||
4. **Synthesis & Insights**
|
||||
- Generate: Novel insights beyond source statements (as long as needed for synthesis)
|
||||
- Tool: Edit (append to file)
|
||||
- Track: Extract citations, append to citations_used
|
||||
- Progress: "Generated Synthesis ✓"
|
||||
|
||||
5. **Limitations & Caveats**
|
||||
- Generate: Counterevidence, gaps, uncertainties (appropriate depth)
|
||||
- Tool: Edit (append to file)
|
||||
- Track: Extract citations, append to citations_used
|
||||
- Progress: "Generated Limitations ✓"
|
||||
|
||||
6. **Recommendations**
|
||||
- Generate: Immediate actions, next steps, research needs (appropriate depth)
|
||||
- Tool: Edit (append to file)
|
||||
- Track: Extract citations, append to citations_used
|
||||
- Progress: "Generated Recommendations ✓"
|
||||
|
||||
7. **Bibliography (CRITICAL - ALL Citations)**
|
||||
- Generate: COMPLETE bibliography with EVERY citation from citations_used list
|
||||
- Format: [1], [2], [3]... [N] - each citation gets full entry
|
||||
- Verification: Check citations_used list - if list contains [1] through [73], generate all 73 entries
|
||||
- NO ranges ([1-50]), NO placeholders ("Additional citations"), NO truncation
|
||||
- Tool: Edit (append to file)
|
||||
- Progress: "Generated Bibliography ✓ (N citations)"
|
||||
|
||||
8. **Methodology Appendix**
|
||||
- Generate: Research process, verification approach (appropriate depth)
|
||||
- Tool: Edit (append to file)
|
||||
- Progress: "Generated Methodology ✓"
|
||||
|
||||
**Phase 8.3: Auto-Continuation Decision Point**
|
||||
|
||||
After generating sections, check word count:
|
||||
|
||||
**If total output ≤18,000 words:** Complete normally
|
||||
- Generate Bibliography (all citations)
|
||||
- Generate Methodology
|
||||
- Verify complete report
|
||||
- Save copy to ~/.claude/research_output/
|
||||
- Done! ✓
|
||||
|
||||
**If total output will exceed 18,000 words:** Auto-Continuation Protocol
|
||||
|
||||
**Step 1: Save Continuation State**
|
||||
Create file: `~/.claude/research_output/continuation_state_[report_id].json`
|
||||
|
||||
```json
{
  "version": "2.1.1",
  "report_id": "[unique_id]",
  "file_path": "[absolute_path_to_report.md]",
  "mode": "[quick|standard|deep|ultradeep]",

  "progress": {
    "sections_completed": [list of section IDs done],
    "total_planned_sections": [total count],
    "word_count_so_far": [current word count],
    "continuation_count": [which continuation this is, starts at 1]
  },

  "citations": {
    "used": [1, 2, 3, ..., N],
    "next_number": [N+1],
    "bibliography_entries": [
      "[1] Full citation entry",
      "[2] Full citation entry",
      ...
    ]
  },

  "research_context": {
    "research_question": "[original question]",
    "key_themes": ["theme1", "theme2", "theme3"],
    "main_findings_summary": [
      "Finding 1: [100-word summary]",
      "Finding 2: [100-word summary]",
      ...
    ],
    "narrative_arc": "[Current position in story: beginning/middle/conclusion]"
  },

  "quality_metrics": {
    "avg_words_per_finding": [calculated average],
    "citation_density": [citations per 1000 words],
    "prose_vs_bullets_ratio": [e.g., "85% prose"],
    "writing_style": "technical-precise-data-driven"
  },

  "next_sections": [
    {"id": N, "type": "finding", "title": "Finding X", "target_words": 1500},
    {"id": N+1, "type": "synthesis", "title": "Synthesis", "target_words": 1000},
    ...
  ]
}
```
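The template above uses bracketed placeholders, so it is not valid JSON as written. As a hedged illustration of how an agent-side helper might keep the citation fields consistent when persisting real values, the sketch below assumes citations are numbered contiguously from 1 with one bibliography entry each; `record_progress` is a hypothetical name and only a subset of the state fields is touched.

```python
import json
from pathlib import Path


def record_progress(
    path: Path, state: dict, new_entries: list[str], done: list[int]
) -> dict:
    """Append new bibliography entries and completed section IDs, then save."""
    citations = state["citations"]
    citations["bibliography_entries"].extend(new_entries)
    count = len(citations["bibliography_entries"])
    citations["used"] = list(range(1, count + 1))  # assumes contiguous numbering
    citations["next_number"] = count + 1
    state["progress"]["sections_completed"].extend(done)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

Deriving `next_number` from the entry count, rather than storing it independently, avoids the two values drifting apart across handoffs.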
|
||||
|
||||
**Step 2: Spawn Continuation Agent**
|
||||
|
||||
Use Task tool with general-purpose agent:
|
||||
|
||||
```
Task(
    subagent_type="general-purpose",
    description="Continue deep-research report generation",
    prompt="""
CONTINUATION TASK: You are continuing an existing deep-research report.

CRITICAL INSTRUCTIONS:
1. Read continuation state file: ~/.claude/research_output/continuation_state_[report_id].json
2. Read existing report to understand context: [file_path from state]
3. Read LAST 3 completed sections to understand flow and style
4. Load research context: themes, narrative arc, writing style from state
5. Continue citation numbering from state.citations.next_number
6. Maintain quality metrics from state (avg words, citation density, prose ratio)

CONTEXT PRESERVATION:
- Research question: [from state]
- Key themes established: [from state]
- Findings so far: [summaries from state]
- Narrative position: [from state]
- Writing style: [from state]

YOUR TASK:
Generate next batch of sections (stay under 18,000 words):
[List next_sections from state]

Use Write/Edit tools to append to existing file: [file_path]

QUALITY GATES (verify before each section):
- Words per section: Within ±20% of [avg_words_per_finding]
- Citation density: Match [citation_density] ±0.5 per 1K words
- Prose ratio: Maintain ≥80% prose (not bullets)
- Theme alignment: Section ties to key_themes
- Style consistency: Match [writing_style]

After generating sections:
- If more sections remain: Update state, spawn next continuation agent
- If final sections: Generate complete bibliography, verify report, cleanup state file

HANDOFF PROTOCOL (if spawning next agent):
1. Update continuation_state.json with new progress
2. Add new citations to state
3. Add summaries of new findings to state
4. Update quality metrics
5. Spawn next agent with same instructions
"""
)
```
|
||||
|
||||
**Step 3: Report Continuation Status**
|
||||
Tell user:
|
||||
```
📊 Report Generation: Part 1 Complete (N sections, X words)
🔄 Auto-continuing via spawned agent...
Next batch: [section list]
Progress: [X%] complete
```
|
||||
|
||||
**Phase 8.4: Continuation Agent Quality Protocol**
|
||||
|
||||
When continuation agent starts:
|
||||
|
||||
**Context Loading (CRITICAL):**
|
||||
1. Read continuation_state.json → Load ALL context
|
||||
2. Read existing report file → Review last 3 sections
|
||||
3. Extract patterns:
|
||||
- Sentence structure complexity
|
||||
- Technical terminology used
|
||||
- Citation placement patterns
|
||||
- Paragraph transition style
|
||||
|
||||
**Pre-Generation Checklist:**
|
||||
- [ ] Loaded research context (themes, question, narrative arc)
|
||||
- [ ] Reviewed previous sections for flow
|
||||
- [ ] Loaded citation numbering (start from N+1)
|
||||
- [ ] Loaded quality targets (words, density, style)
|
||||
- [ ] Understand where in narrative arc (beginning/middle/end)
|
||||
|
||||
**Per-Section Generation:**
|
||||
1. Generate section content
|
||||
2. Quality checks:
|
||||
- Word count: Within target ±20%
|
||||
- Citation density: Matches established rate
|
||||
- Prose ratio: ≥80% prose
|
||||
- Theme connection: Ties to key_themes
|
||||
- Style match: Consistent with quality_metrics.writing_style
|
||||
3. If ANY check fails: Regenerate section
|
||||
4. If passes: Write to file, update state
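The automatable checks in step 2 could be sketched in Python as below. This is a minimal sketch under assumptions: the function names, the bullet-detection regex, and the choice to count `[N]` markers as citations are illustrative and are not taken from the skill's scripts. Theme connection and style match still need a reader's judgment, so they are left out.

```python
import re

# Assumption: a "bullet" line starts with -, *, + or "1." style numbering.
BULLET = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+")


def prose_ratio(section: str) -> float:
    """Share of words that sit on non-bullet lines (0.0 for empty text)."""
    lines = [ln for ln in section.splitlines() if ln.strip()]
    total = sum(len(ln.split()) for ln in lines)
    if total == 0:
        return 0.0
    prose = sum(len(ln.split()) for ln in lines if not BULLET.match(ln))
    return prose / total


def check_section(section: str, target_words: int, target_density: float,
                  tolerance: float = 0.20,
                  density_tolerance: float = 0.5) -> dict[str, bool]:
    """Run the word-count, citation-density and prose-ratio gates."""
    words = len(section.split())
    citations = len(re.findall(r"\[\d+\]", section))
    # Citation density is expressed per 1,000 words.
    density = citations / words * 1000 if words else 0.0
    return {
        "word_count": abs(words - target_words) <= tolerance * target_words,
        "citation_density": abs(density - target_density) <= density_tolerance,
        "prose_ratio": prose_ratio(section) >= 0.80,
    }
```

A section would be regenerated whenever any value in the returned dictionary is `False`.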
|
||||
|
||||
**Handoff Decision:**
|
||||
- Calculate: Current word count + remaining sections × avg_words_per_section
|
||||
- If total ≤ 18K: Generate all remaining sections + finish
|
||||
- If total > 18K: Generate partial batch, update state, spawn next agent
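The handoff arithmetic above can be illustrated with a short Python sketch. The function name and return labels are hypothetical, and treating a projection of exactly 18K words as "finish" is an assumption based on the ≤18K chunk size stated for each agent.

```python
def plan_handoff(current_words: int, remaining_sections: int,
                 avg_words_per_section: int, limit: int = 18_000) -> str:
    """Return 'finish' if the projected total fits the limit, else 'handoff'."""
    projected = current_words + remaining_sections * avg_words_per_section
    # Assumption: exactly hitting the limit still counts as fitting.
    return "finish" if projected <= limit else "handoff"
```

For example, 12,000 words written with 4 sections remaining at roughly 2,000 words each projects to 20,000 words, so the agent would write a partial batch, update state, and spawn the next agent.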
|
||||
|
||||
**Final Agent Responsibilities:**
|
||||
- Generate final content sections
|
||||
- Generate COMPLETE bibliography using ALL citations from state.citations.bibliography_entries
|
||||
- Read entire assembled report
|
||||
- Run validation: python scripts/validate_report.py --report [path]
|
||||
- Delete continuation_state.json (cleanup)
|
||||
- Report complete to user with metrics
|
||||
|
||||
**Anti-Fatigue Built-In:**
|
||||
Each agent generates manageable chunks (≤18K words), maintaining quality.
|
||||
Context preservation ensures coherence across continuation boundaries.
|
||||
|
||||
**Generate HTML (McKinsey Style)**
|
||||
1. Read McKinsey template from `./templates/mckinsey_report_template.html`
|
||||
2. Extract 3-4 key quantitative metrics from findings for dashboard
|
||||
3. **Use Python script for MD to HTML conversion:**
|
||||
|
||||
```bash
cd ~/.claude/skills/deep-research
python scripts/md_to_html.py [markdown_report_path]
```
|
||||
|
||||
The script returns two parts:
|
||||
- **Part A ({{CONTENT}}):** All sections except Bibliography, properly converted to HTML
|
||||
- **Part B ({{BIBLIOGRAPHY}}):** Bibliography section only, formatted as HTML
|
||||
|
||||
**CRITICAL:** The script handles ALL conversion automatically:
|
||||
- Headers: ## → `<div class="section"><h2 class="section-title">`, ### → `<h3 class="subsection-title">`
|
||||
- Lists: Markdown bullets → `<ul><li>` with proper nesting
|
||||
- Tables: Markdown tables → `<table>` with thead/tbody
|
||||
- Paragraphs: Text wrapped in `<p>` tags
|
||||
- Bold/italic: **text** → `<strong>`, *text* → `<em>`
|
||||
- Citations: [N] preserved for tooltip conversion in step 4
|
||||
|
||||
4. **Add Citation Tooltips (Attribution Gradients):**
|
||||
For each [N] citation in {{CONTENT}} (not bibliography), optionally add interactive tooltips:
|
||||
```html
<span class="citation">[N]
  <span class="citation-tooltip">
    <div class="tooltip-title">[Source Title]</div>
    <div class="tooltip-source">[Author/Publisher]</div>
    <div class="tooltip-claim">
      <div class="tooltip-claim-label">Supports Claim:</div>
      [Extract sentence with this citation]
    </div>
  </span>
</span>
```
|
||||
NOTE: This step is optional for speed. Basic [N] citations are sufficient.
|
||||
|
||||
5. Replace placeholders in template:
|
||||
- {{TITLE}} - Report title (extract from first ## heading in MD)
|
||||
- {{DATE}} - Generation date (YYYY-MM-DD format)
|
||||
- {{SOURCE_COUNT}} - Number of unique sources
|
||||
- {{METRICS_DASHBOARD}} - Metrics HTML from step 2
|
||||
- {{CONTENT}} - HTML from Part A (script output)
|
||||
- {{BIBLIOGRAPHY}} - HTML from Part B (script output)
|
||||
|
||||
6. **CRITICAL: NO EMOJIS** - Remove any emoji characters from final HTML
|
||||
|
||||
7. Save to: `[folder]/research_report_[YYYYMMDD]_[slug].html`
|
||||
|
||||
8. **Verify HTML (MANDATORY):**
|
||||
```bash
python scripts/verify_html.py --html [html_path] --md [md_path]
```
|
||||
- Check passes: Proceed to step 9
|
||||
- Check fails: Fix errors and re-run verification
|
||||
|
||||
9. Open in browser: `open [html_path]`
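Steps 5 and 7 above (placeholder replacement and the output filename) might look like the following in Python. This is a sketch, not the skill's actual implementation: `fill_template` and `report_filename` are hypothetical helpers, and the slug rule (lowercase, hyphen-separated) is an assumption because the document only shows `[slug]`.

```python
import re
from datetime import date


def report_filename(title: str, when: date, ext: str = "html") -> str:
    """Build research_report_[YYYYMMDD]_[slug].[ext]; the slug rule is assumed."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"research_report_{when:%Y%m%d}_{slug}.{ext}"


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{NAME}} placeholder and fail loudly if any remain."""
    html = template
    for name, value in values.items():
        html = html.replace("{{" + name + "}}", value)
    leftover = re.findall(r"\{\{[A-Z_]+\}\}", html)
    if leftover:
        raise ValueError(f"unfilled placeholders: {sorted(set(leftover))}")
    return html
```

Raising on leftover placeholders is a design choice for the sketch: a half-filled template is caught before the file is saved instead of during HTML verification.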
|
||||
|
||||
**Generate PDF**
|
||||
1. Use Task tool with general-purpose agent
|
||||
2. Invoke generating-pdf skill with markdown as input
|
||||
3. Save to: `[folder]/research_report_[YYYYMMDD]_[slug].pdf`
|
||||
4. PDF will auto-open when complete
|
||||
|
||||
---
|
||||
|
||||
## Output Contract
|
||||
|
||||
**Format:** Comprehensive markdown report following [template](./templates/report_template.md) EXACTLY
|
||||
|
||||
**Required sections (all must be detailed):**
|
||||
- Executive Summary (2-3 concise paragraphs, 50-250 words)
|
||||
- Introduction (2-3 paragraphs: question, scope, methodology, assumptions)
|
||||
- Main Analysis (4-8 findings, each 300-500 words with citations [1], [2], [3])
|
||||
- Synthesis & Insights (500-1000 words: patterns, novel insights, implications)
|
||||
- Limitations & Caveats (2-3 paragraphs: gaps, assumptions, uncertainties)
|
||||
- Recommendations (3-5 immediate actions, 3-5 next steps, 3-5 further research)
|
||||
- **Bibliography (CRITICAL - see rules below)**
|
||||
- Methodology Appendix (2-3 paragraphs: process, sources, verification)
|
||||
|
||||
**Bibliography Requirements (ZERO TOLERANCE - Report is UNUSABLE without complete bibliography):**
|
||||
- ✅ MUST include EVERY citation [N] used in report body (if report has [1]-[50], write all 50 entries)
|
||||
- ✅ Format: [N] Author/Org (Year). "Title". Publication. URL (Retrieved: Date)
|
||||
- ✅ Each entry on its own line, complete with all metadata
|
||||
- ❌ NO placeholders: NEVER use "[8-75] Additional citations", "...continue...", "etc.", "[Continue with sources...]"
|
||||
- ❌ NO ranges: Write [3], [4], [5]... individually, NOT "[3-50]"
|
||||
- ❌ NO truncation: If 30 sources cited, write all 30 entries in full
|
||||
- ⚠️ Validation WILL FAIL if bibliography contains placeholders or missing citations
|
||||
- ⚠️ Report is GARBAGE without complete bibliography - no way to verify claims
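The bibliography rules above lend themselves to a mechanical check. The Python sketch below is illustrative only and is not the logic of `validate_report.py`: the function name and the placeholder patterns are assumptions, and the patterns are heuristics that can flag a legitimate title containing "..." or "etc.".

```python
import re

# Heuristic patterns for the prohibited placeholders (assumed, not exhaustive).
PLACEHOLDER_PATTERNS = [
    r"\[\d+\s*[-–]\s*\d+\]",   # ranges such as [3-50]
    r"additional citations",
    r"\.\.\.",                 # "...continue...", "[Continue with sources...]"
    r"\betc\.",
]


def check_bibliography(body: str, bibliography: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    # Each entry is expected on its own line, starting with its [N] marker.
    listed = {int(n) for n in
              re.findall(r"^\[(\d+)\]", bibliography, flags=re.MULTILINE)}
    missing = sorted(cited - listed)
    if missing:
        problems.append(f"missing entries: {missing}")
    for pattern in PLACEHOLDER_PATTERNS:
        if re.search(pattern, bibliography, flags=re.IGNORECASE):
            problems.append(f"placeholder matched: {pattern}")
    return problems
```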
|
||||
|
||||
**Strictly Prohibited:**
|
||||
- Placeholder text (TBD, TODO, [citation needed])
|
||||
- Uncited major claims
|
||||
- Broken links
|
||||
- Missing required sections
|
||||
- **Short summaries instead of detailed analysis**
|
||||
- **Vague statements without specific evidence**
|
||||
|
||||
**Writing Standards (Critical):**
|
||||
- **Narrative-driven**: Write in flowing prose with complete sentences that build understanding progressively
|
||||
- **Precision**: Choose each word deliberately - every word must carry intention
|
||||
- **Economy**: Eliminate fluff, unnecessary adjectives, fancy grammar
|
||||
- **Clarity**: Use precise technical terms, avoid ambiguity. Embed exact numbers in sentences, not bullets
|
||||
- **Directness**: State findings clearly without embellishment
|
||||
- **Signal-to-noise**: High information density, respect reader's time
|
||||
- **Bullet discipline**: Use bullets only for distinct lists (products, companies, steps). Default to prose paragraphs
|
||||
- **Examples of precision**:
|
||||
- Bad: "significantly improved outcomes" → Good: "reduced mortality 23% (p<0.01)"
|
||||
- Bad: "several studies suggest" → Good: "5 RCTs (n=1,847) show"
|
||||
- Bad: "potentially beneficial" → Good: "increased biomarker X by 15%"
|
||||
- Bad: "• Market: $2.4B" → Good: "The market reached $2.4 billion in 2023, driven by consumer demand [1]."
|
||||
|
||||
**Quality gates (enforced by validator):**
|
||||
- Minimum 2,000 words (standard mode)
|
||||
- Average credibility score >60/100
|
||||
- 3+ sources per major claim
|
||||
- Clear facts vs. analysis distinction
|
||||
- All sections present and detailed
|
||||
|
||||
---
|
||||
|
||||
## Error Handling & Stop Rules
|
||||
|
||||
**Stop immediately if:**
|
||||
- 2 validation failures on same error → Pause, report, ask user
|
||||
- <5 sources after exhaustive search → Report limitation, request direction
|
||||
- User interrupts/changes scope → Confirm new direction
|
||||
|
||||
**Graceful degradation:**
|
||||
- 5-10 sources → Note in limitations, proceed with extra verification
|
||||
- Time constraint reached → Package partial results, document gaps
|
||||
- High-priority critique issue → Address immediately
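The source-count thresholds in the stop and degradation rules can be summarized as a small decision function. This Python sketch uses hypothetical names, and it assumes a count of exactly 10 proceeds normally, since the rules list both "5-10 sources" and a "10+ sources" standard.

```python
def source_policy(source_count: int) -> str:
    """Map the number of sources found to the documented response."""
    if source_count < 5:
        return "stop"      # report limitation, request direction
    if source_count < 10:
        return "degrade"   # note in limitations, add extra verification
    return "proceed"       # meets the 10+ sources standard
```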
|
||||
|
||||
**Error format:**
|
||||
```
⚠️ Issue: [Description]
📊 Context: [What was attempted]
🔍 Tried: [Resolution attempts]
💡 Options:
   1. [Option 1]
   2. [Option 2]
   3. [Option 3]
```
|
||||
|
||||
---
|
||||
|
||||
## Quality Standards (Always Enforce)
|
||||
|
||||
Every report must satisfy:
|
||||
- 10+ sources (document if fewer)
|
||||
- 3+ sources per major claim
|
||||
- Executive summary <250 words
|
||||
- Full citations with URLs
|
||||
- Credibility assessment
|
||||
- Limitations section
|
||||
- Methodology documented
|
||||
- No placeholders
|
||||
|
||||
**Priority:** Thoroughness over speed. Quality > speed.
|
||||
|
||||
---
|
||||
|
||||
## Inputs & Assumptions
|
||||
|
||||
**Required:**
|
||||
- Research question (string)
|
||||
|
||||
**Optional:**
|
||||
- Mode (quick/standard/deep/ultradeep)
|
||||
- Time constraints
|
||||
- Required perspectives/sources
|
||||
- Output format
|
||||
|
||||
**Assumptions:**
|
||||
- User requires verified, citation-backed information
|
||||
- 10-50 sources available on topic
|
||||
- Time investment: 5-45 minutes
|
||||
|
||||
---
|
||||
|
||||
## When to Use / NOT Use
|
||||
|
||||
**Use when:**
|
||||
- Comprehensive analysis (10+ sources needed)
|
||||
- Comparing technologies/approaches/strategies
|
||||
- State-of-the-art reviews
|
||||
- Multi-perspective investigation
|
||||
- Technical decisions
|
||||
- Market/trend analysis
|
||||
|
||||
**Do NOT use:**
|
||||
- Simple lookups (use WebSearch)
|
||||
- Debugging (use standard tools)
|
||||
- Questions answerable in 1-2 searches
|
||||
- Time-sensitive quick answers
|
||||
|
||||
---
|
||||
|
||||
## Scripts (Offline, Python stdlib only)
|
||||
|
||||
**Location:** `./scripts/`
|
||||
|
||||
- **research_engine.py** - Orchestration engine
|
||||
- **validate_report.py** - Quality validation (8 checks)
|
||||
- **citation_manager.py** - Citation tracking
|
||||
- **source_evaluator.py** - Credibility scoring (0-100)
|
||||
|
||||
**No external dependencies required.**
|
||||
|
||||
---
|
||||
|
||||
## Progressive References (Load On-Demand)
|
||||
|
||||
**Do not inline these - reference only:**
|
||||
- [Complete Methodology](./reference/methodology.md) - 8-phase details
|
||||
- [Report Template](./templates/report_template.md) - Output structure
|
||||
- [README](./README.md) - Usage docs
|
||||
- [Quick Start](./QUICK_START.md) - Fast reference
|
||||
- [Competitive Analysis](./COMPETITIVE_ANALYSIS.md) - vs OpenAI/Gemini
|
||||
|
||||
**Context Management:** Load files on-demand for current phase only. Do not preload all content.
|
||||
|
||||
---
|
||||
|
||||
<!-- STATIC CONTEXT BLOCK END -->
|
||||
<!-- ⚡ Above content is cacheable (>1024 tokens, static) -->
|
||||
<!-- 📝 Below: Dynamic content (user queries, retrieved data, generated reports) -->
|
||||
<!-- This structure enables 85% latency reduction via prompt caching -->
|
||||
|
||||
---
|
||||
|
||||
## Dynamic Execution Zone
|
||||
|
||||
**User Query Processing:**
|
||||
[User research question will be inserted here during execution]
|
||||
|
||||
**Retrieved Information:**
|
||||
[Search results and sources will be accumulated here]
|
||||
|
||||
**Generated Analysis:**
|
||||
[Findings, synthesis, and report content generated here]
|
||||
|
||||
**Note:** This section remains empty in the skill definition. Content populated during runtime only.
|
||||
476
axhub-make/skills/third-party/deep-research/WORD_PRECISION_AUDIT.md
vendored
Normal file
@@ -0,0 +1,476 @@
|
||||
# Word Precision Audit: Deep Research Skill
|
||||
|
||||
**Date:** 2025-11-04
|
||||
**Purpose:** Systematic review of every word in SKILL.md for precision, intention, and clarity
|
||||
|
||||
---
|
||||
|
||||
## Audit Methodology
|
||||
|
||||
**Criteria for precision:**
|
||||
1. **No hedge words** ("reasonably", "generally", "basically", "essentially")
|
||||
2. **No weak verbs** ("can", "may", "might", "should" → use "must", "will", "do")
|
||||
3. **No vague adjectives** ("good", "nice", "reasonable" → use specific criteria)
|
||||
4. **No passive voice** where active is stronger
|
||||
5. **No colloquialisms** in formal directives
|
||||
6. **No negated permissions** ("no need to" → "proceed without")
|
||||
7. **No redundancy** (say once, clearly)
|
||||
8. **No ambiguous pronouns** without clear referents
|
||||
|
||||
---
|
||||
|
||||
## Issues Found (14 total)
|
||||
|
||||
### HIGH PRIORITY (8 issues)
|
||||
|
||||
#### Issue #1: "reasonable assumptions" (Lines 54, 58)
|
||||
**Current:**
|
||||
```markdown
Proceed with reasonable assumptions.
Make reasonable assumptions based on query context.
```
|
||||
|
||||
**Problem:** "reasonable" is subjective, vague, creates uncertainty about what's acceptable
|
||||
|
||||
**Fix:**
|
||||
```markdown
Infer assumptions from query context.
Derive assumptions from query signals.
```
|
||||
|
||||
**Intention carried:** "reasonable" → permission-seeking, cautious | "infer/derive" → direct action, confident
|
||||
|
||||
---
|
||||
|
||||
#### Issue #2: "genuinely incomprehensible" (Line 61)
|
||||
**Current:**
|
||||
```markdown
|
||||
Query is genuinely incomprehensible
|
||||
```
|
||||
|
||||
**Problem:** "genuinely" is hedge word, weakens the criterion
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
Query is incomprehensible
|
||||
```
|
||||
|
||||
**Intention carried:** "genuinely" → doubting, qualifying | removed → clear, definitive
|
||||
|
||||
---
|
||||
|
||||
#### Issue #3: "User can redirect if needed" (Line 64)
|
||||
**Current:**
|
||||
```markdown
|
||||
PROCEED with standard mode. User can redirect if needed.
|
||||
```
|
||||
|
||||
**Problem:** "can" is weak permission, "if needed" is uncertain, both undermine autonomy
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
PROCEED with standard mode. User will redirect if incorrect.
|
||||
```
|
||||
|
||||
**Intention carried:** "can...if needed" → uncertain, permission-seeking | "will...if incorrect" → confident, definitive
|
||||
|
||||
---
|
||||
|
||||
#### Issue #4: "NO need to wait" - negated permission (Line 85)
|
||||
**Current:**
|
||||
```markdown
|
||||
NO need to wait for approval - proceed directly to execution
|
||||
```
|
||||
|
||||
**Problem:** "NO need to" grants permission instead of giving a command, so it reads weaker than a direct imperative; "proceed directly to execution" is wordy
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
Proceed without waiting for approval
|
||||
```
|
||||
|
||||
**Intention carried:** "NO need to" → permissive, passive | "Proceed without" → imperative, active
|
||||
|
||||
---
|
||||
|
||||
#### Issue #5: Contraction "Don't" (Line 113)
|
||||
**Current:**
|
||||
```markdown
|
||||
Don't inline everything - use references
|
||||
```
|
||||
|
||||
**Problem:** Contraction in formal directive, less authoritative
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
Do not inline everything - reference external files
|
||||
```
|
||||
|
||||
**Intention carried:** "Don't" → casual | "Do not" → formal, authoritative
|
||||
|
||||
---
|
||||
|
||||
#### Issue #6: "ask to proceed" - weak request (Line 229)
|
||||
**Current:**
|
||||
```markdown
|
||||
<5 sources after exhaustive search → Report limitation, ask to proceed
|
||||
```
|
||||
|
||||
**Problem:** "ask to proceed" is weak, implies uncertainty about whether to continue
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
<5 sources after exhaustive search → Report limitation, request direction
|
||||
```
|
||||
|
||||
**Intention carried:** "ask to proceed" → tentative | "request direction" → professional, clear need
|
||||
|
||||
---
|
||||
|
||||
#### Issue #7: "When uncertain" contradicts autonomy (Line 262)
|
||||
**Current:**
|
||||
```markdown
|
||||
**When uncertain:** Be thorough, not fast. Quality > speed.
|
||||
```
|
||||
|
||||
**Problem:** "When uncertain" directly contradicts autonomy principle (line 54 says operate independently), creates confusion about when to be uncertain
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
**Priority:** Thoroughness over speed. Quality > speed.
|
||||
```
|
||||
|
||||
**Intention carried:** "When uncertain" → hesitation, doubt | "Priority" → clear directive, no uncertainty
|
||||
|
||||
---
|
||||
|
||||
#### Issue #8: "acceptable" is passive (Line 280)
|
||||
**Current:**
|
||||
```markdown
|
||||
Extended reasoning acceptable (5-45 min)
|
||||
```
|
||||
|
||||
**Problem:** "acceptable" is passive, permission-seeking, weak
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
Time investment: 5-45 minutes
|
||||
```
|
||||
|
||||
**Intention carried:** "acceptable" → asking permission | "investment" → stating fact
|
||||
|
||||
---
|
||||
|
||||
### MEDIUM PRIORITY (6 issues)
|
||||
|
||||
#### Issue #9: "Good autonomous assumptions" - vague judgment (Line 66)
|
||||
**Current:**
|
||||
```markdown
|
||||
**Good autonomous assumptions:**
|
||||
```
|
||||
|
||||
**Problem:** "Good" is vague value judgment without criteria
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
**Default assumptions:**
|
||||
```
|
||||
|
||||
**Intention carried:** "Good" → subjective approval-seeking | "Default" → objective, standard procedure
|
||||
|
||||
---
|
||||
|
||||
#### Issue #10: "Standard+" unclear notation (Lines 96, 101)
|
||||
**Current:**
|
||||
```markdown
**Standard+ adds:**
**Deep+ adds:**
```
|
||||
|
||||
**Problem:** "+" notation is programming jargon, unclear if it means "and above" or "additional to"
|
||||
|
||||
**Fix:**
|
||||
```markdown
**Standard/Deep/UltraDeep execute:**
**Deep/UltraDeep execute:**
```
|
||||
|
||||
**Intention carried:** "+" → ambiguous scope | explicit listing → clear scope
|
||||
|
||||
---
|
||||
|
||||
#### Issue #11: "(optional)" weakens directive (Line 174)
|
||||
**Current:**
|
||||
```markdown
|
||||
4. Next steps (optional)
|
||||
```
|
||||
|
||||
**Problem:** "(optional)" signals uncertainty, weakens the delivery item
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
4. Next steps (if relevant)
|
||||
```
|
||||
OR remove entirely since it's in "Deliver to user" section
|
||||
|
||||
**Intention carried:** "(optional)" → uncertain, dismissible | "(if relevant)" → conditional, purposeful | removed → expected
|
||||
|
||||
---
|
||||
|
||||
#### Issue #12: "Offer:" implies asking permission (Lines 176-179)
|
||||
**Current:**
|
||||
```markdown
**Offer:**
- Deep-dive any section
- Follow-up questions
- Alternative formats
```
|
||||
|
||||
**Problem:** "Offer" implies asking permission, waiting for response, breaks autonomous flow
|
||||
|
||||
**Fix:**
|
||||
```markdown
**Available on request:**
- Section deep-dives
- Follow-up analysis
- Alternative formats
```
|
||||
OR remove entirely (user will ask if interested)
|
||||
|
||||
**Intention carried:** "Offer" → salesperson, permission-seeking | "Available on request" → service menu, user-initiated | removed → autonomous
|
||||
|
||||
---
|
||||
|
||||
#### Issue #13: "hit" colloquial (Line 234)
|
||||
**Current:**
|
||||
```markdown
|
||||
Time constraint hit → Package partial results, document gaps
|
||||
```
|
||||
|
||||
**Problem:** "hit" is colloquial, imprecise for technical directive
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
Time constraint reached → Package partial results, document gaps
|
||||
```
|
||||
|
||||
**Intention carried:** "hit" → casual, imprecise | "reached" → formal, precise
|
||||
|
||||
---
|
||||
|
||||
#### Issue #14: "explicitly needed" redundant (Line 324)
|
||||
**Current:**
|
||||
```markdown
|
||||
Load these files only when explicitly needed for current phase.
|
||||
```
|
||||
|
||||
**Problem:** "explicitly needed" is redundant - either needed or not, "explicitly" adds no precision
|
||||
|
||||
**Fix:**
|
||||
```markdown
|
||||
Load files on-demand for current phase only.
|
||||
```
|
||||
|
||||
**Intention carried:** "explicitly needed" → overthinking, redundant | "on-demand" → clear technical term
|
||||
|
||||
---
|
||||
|
||||
## Impact Analysis
|
||||
|
||||
### Before Fixes (Current State)
|
||||
|
||||
**Hedge words count:** 4 ("reasonable" ×2, "genuinely", "acceptable")
|
||||
**Weak modal verbs:** 2 ("can redirect", "may")
|
||||
**Passive constructions:** 3 ("can", "acceptable", "optional")
|
||||
**Vague adjectives:** 2 ("good", "reasonable")
|
||||
**Colloquialisms:** 1 ("hit")
|
||||
**Redundancies:** 2 ("explicitly needed", "NO need to")
|
||||
|
||||
**Total weakness indicators:** 14
|
||||
|
||||
### After Fixes (Proposed State)
|
||||
|
||||
**Hedge words count:** 0
|
||||
**Weak modal verbs:** 0
|
||||
**Passive constructions:** 0
|
||||
**Vague adjectives:** 0
|
||||
**Colloquialisms:** 0
|
||||
**Redundancies:** 0
|
||||
|
||||
**Total weakness indicators:** 0
|
||||
|
||||
---
|
||||
|
||||
## Word Intention Analysis
|
||||
|
||||
### Critical Word Replacements
|
||||
|
||||
| Current Word | Unintended Intention | Replacement | Intended Intention |
|--------------|---------------------|-------------|-------------------|
| reasonable | subjective, cautious | infer/derive | objective, confident |
| genuinely | doubting, qualifying | [remove] | certain, definitive |
| can | permission-seeking | will | confident expectation |
| if needed | uncertain | if incorrect | conditional, clear |
| NO need to | passive, permissive | Proceed without | active, imperative |
| Don't | casual, conversational | Do not | formal, authoritative |
| ask to | tentative, weak | request | professional, clear |
| When uncertain | hesitant, contradictory | Priority | directive, unambiguous |
| acceptable | permission-seeking | investment | factual, confident |
| Good | subjective approval | Default | objective standard |
| + | ambiguous, jargon | explicit list | clear, precise |
| optional | dismissible, weak | [remove or "if relevant"] | purposeful or expected |
| Offer | salesperson, passive | [remove] | autonomous |
| hit | casual, imprecise | reached | formal, precise |
| explicitly needed | redundant, overthinking | on-demand | technical, concise |
|
||||
|
||||
---
|
||||
|
||||
## Linguistic Precision Principles Applied
|
||||
|
||||
### 1. Imperative Voice for Commands
|
||||
**Before:** "NO need to wait for approval"
|
||||
**After:** "Proceed without waiting for approval"
|
||||
**Principle:** Direct commands > passive permissions
|
||||
|
||||
### 2. Remove Hedge Words
|
||||
**Before:** "genuinely incomprehensible"
|
||||
**After:** "incomprehensible"
|
||||
**Principle:** Qualifiers weaken, removal strengthens
|
||||
|
||||
### 3. Eliminate Subjective Judgments
|
||||
**Before:** "Good autonomous assumptions"
|
||||
**After:** "Default assumptions"
|
||||
**Principle:** Objective standards > vague judgments
|
||||
|
||||
### 4. Active Voice Over Passive
|
||||
**Before:** "Extended reasoning acceptable"
|
||||
**After:** "Time investment: 5-45 minutes"
|
||||
**Principle:** Active assertions > passive permissions
|
||||
|
||||
### 5. Precise Technical Terms
|
||||
**Before:** "Time constraint hit"
|
||||
**After:** "Time constraint reached"
|
||||
**Principle:** Formal precision > colloquial approximation
|
||||
|
||||
### 6. Remove Redundancy
|
||||
**Before:** "explicitly needed"
|
||||
**After:** "on-demand"
|
||||
**Principle:** Say once clearly > repeat with qualifiers
|
||||
|
||||
### 7. Strong Modals
|
||||
**Before:** "User can redirect if needed"
|
||||
**After:** "User will redirect if incorrect"
|
||||
**Principle:** "will" (expectation) > "can" (possibility)
|
||||
|
||||
---
|
||||
|
||||
## Autonomy Language Analysis
|
||||
|
||||
### Contradiction Resolution
|
||||
|
||||
**Problem:** Line 262 "When uncertain" contradicts Line 54 "operates independently"
|
||||
|
||||
**Analysis:**
|
||||
- Line 54 establishes autonomy principle: proceed independently
|
||||
- Line 262 suggests there are times of uncertainty
|
||||
- These create cognitive dissonance: am I uncertain or autonomous?
|
||||
|
||||
**Resolution:**
|
||||
- Replace "When uncertain" with "Priority"
|
||||
- Frame as quality standard, not uncertainty condition
|
||||
- Maintains autonomy while setting quality expectations
|
||||
|
||||
**Result:** No contradiction, clear hierarchy (autonomy + quality priority)
|
||||
|
||||
---
|
||||
|
||||
## Permission-Seeking Language Removal
|
||||
|
||||
### Identified Permission-Seeking Patterns
|
||||
|
||||
1. "reasonable assumptions" → seeking approval for assumption quality
|
||||
2. "can redirect if needed" → seeking permission to proceed
|
||||
3. "NO need to wait" → asking if it's okay to proceed
|
||||
4. "acceptable" → asking if time investment is okay
|
||||
5. "Offer" → asking permission to provide options
|
||||
|
||||
### Replacement Strategy
|
||||
|
||||
Replace all permission-seeking with:
|
||||
- **Assertions:** State facts confidently
|
||||
- **Imperatives:** Give direct commands
|
||||
- **Expectations:** Describe what will happen
|
||||
- **Standards:** Define objective criteria
|
||||
|
||||
---
|
||||
|
||||
## Testing Precision Improvements
|
||||
|
||||
### Scenario 1: Ambiguous Query
|
||||
|
||||
**Before (with weak language):**
|
||||
> "Make reasonable assumptions based on query context. User can redirect if needed."
|
||||
|
||||
**Interpretation:** Unclear what "reasonable" means, "can" suggests permission, "if needed" is vague
|
||||
|
||||
**After (precise language):**
|
||||
> "Infer assumptions from query context. User will redirect if incorrect."
|
||||
|
||||
**Interpretation:** Clear action (infer), confident expectation (will), definite condition (incorrect)
|
||||
|
||||
### Scenario 2: Time Investment
|
||||
|
||||
**Before (passive):**
|
||||
> "Extended reasoning acceptable (5-45 min)"
|
||||
|
||||
**Interpretation:** Sounds like asking permission for time
|
||||
|
||||
**After (assertive):**
|
||||
> "Time investment: 5-45 minutes"
|
||||
|
||||
**Interpretation:** States fact, no permission sought
|
||||
|
||||
---
|
||||
|
||||
## Implementation Priority
|
||||
|
||||
### Phase 1: HIGH PRIORITY (Autonomy-Critical)
|
||||
Fix Issues #1-8 immediately - these directly impact autonomous operation
|
||||
|
||||
### Phase 2: MEDIUM PRIORITY (Clarity Improvements)
|
||||
Fix Issues #9-14 after Phase 1 - these improve clarity but don't block autonomy
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
After fixes applied:
|
||||
|
||||
- [ ] No hedge words ("basically", "essentially", "generally", "reasonably")
|
||||
- [ ] No weak modals ("can", "may", "might", "could" where "will", "must" fit)
|
||||
- [ ] No passive voice where active is stronger
|
||||
- [ ] No subjective judgments ("good", "nice", "reasonable")
|
||||
- [ ] No colloquialisms in formal directives
|
||||
- [ ] No negated permissions ("NO need to")
|
||||
- [ ] No redundancies ("explicitly needed")
|
||||
- [ ] No permission-seeking language
|
||||
- [ ] All commands use imperative voice
|
||||
- [ ] All conditions state clear criteria
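Part of this checklist can be pre-screened automatically. The Python sketch below is a hypothetical helper, not an existing script in the skill: the term lists are taken from the checklist items above (plus "genuinely" from Issue #2), and every hit is only a candidate for review, since a modal such as "can" is sometimes the correct word.

```python
import re

# Terms drawn from the checklist; the category names are illustrative.
WEAK_TERMS = {
    "hedge": ["basically", "essentially", "generally", "reasonably", "genuinely"],
    "weak_modal": ["can", "may", "might", "could"],
    "vague": ["good", "nice", "reasonable"],
}


def scan(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, category, term) for each whole-word match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, terms in WEAK_TERMS.items():
            for term in terms:
                pattern = rf"\b{re.escape(term)}\b"
                if re.search(pattern, line, flags=re.IGNORECASE):
                    findings.append((line_no, category, term))
    return findings
```

Passive voice, redundancy, and permission-seeking phrasing are not covered here because they cannot be detected reliably with word lists.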
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Total issues found:** 14
|
||||
**High priority:** 8 (autonomy-impacting)
|
||||
**Medium priority:** 6 (clarity improvements)
|
||||
|
||||
**Primary problem:** Permission-seeking and hedge language that undermines autonomous operation principle
|
||||
|
||||
**Primary fix:** Replace all permission-seeking with assertions, imperatives, and expectations
|
||||
|
||||
**Expected impact:**
|
||||
- Clearer autonomous behavior (no uncertainty about when to proceed)
|
||||
- Stronger directives (commands not suggestions)
|
||||
- Precise language (every word carries specific intention)
|
||||
- Zero ambiguity about autonomy expectations
|
||||
384
axhub-make/skills/third-party/deep-research/reference/methodology.md
vendored
Normal file
@@ -0,0 +1,384 @@
|
||||
# Deep Research Methodology: 8-Phase Pipeline
|
||||
|
||||
## Overview
|
||||
|
||||
This document contains the detailed methodology for conducting deep research. The 8 phases represent a comprehensive approach to gathering, verifying, and synthesizing information from multiple sources.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: SCOPE - Research Framing
|
||||
|
||||
**Objective:** Define research boundaries and success criteria
|
||||
|
||||
**Activities:**
|
||||
1. Decompose the question into core components
|
||||
2. Identify stakeholder perspectives
|
||||
3. Define scope boundaries (what's in/out)
|
||||
4. Establish success criteria
|
||||
5. List key assumptions to validate
|
||||
|
||||
**Ultrathink Application:** Use extended reasoning to explore multiple framings of the question before committing to scope.
|
||||
|
||||
**Output:** Structured scope document with research boundaries
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: PLAN - Strategy Formulation
|
||||
|
||||
**Objective:** Create an intelligent research roadmap
|
||||
|
||||
**Activities:**
|
||||
1. Identify primary and secondary sources
|
||||
2. Map knowledge dependencies (what must be understood first)
|
||||
3. Create search query strategy with variants
|
||||
4. Plan triangulation approach
|
||||
5. Estimate time/effort per phase
|
||||
6. Define quality gates
|
||||
|
||||
**Graph-of-Thoughts:** Branch into multiple potential research paths, then converge on optimal strategy.
|
||||
|
||||
**Output:** Research plan with prioritized investigation paths
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: RETRIEVE - Parallel Information Gathering
|
||||
|
||||
**Objective:** Systematically collect information from multiple sources using parallel execution for maximum speed
|
||||
|
||||
**CRITICAL: Execute ALL searches in parallel using a single message with multiple tool calls**
|
||||
|
||||
### Query Decomposition Strategy
|
||||
|
||||
Before launching searches, decompose the research question into 5-10 independent search angles:
|
||||
|
||||
1. **Core topic (semantic search)** - Meaning-based exploration of main concept
|
||||
2. **Technical details (keyword search)** - Specific terms, APIs, implementations
|
||||
3. **Recent developments (date-filtered)** - What's new in 2024-2025
|
||||
4. **Academic sources (domain-specific)** - Papers, research, formal analysis
|
||||
5. **Alternative perspectives (comparison)** - Competing approaches, criticisms
|
||||
6. **Statistical/data sources** - Quantitative evidence, metrics, benchmarks
|
||||
7. **Industry analysis** - Commercial applications, market trends
|
||||
8. **Critical analysis/limitations** - Known problems, failure modes, edge cases
|
||||
|
||||
### Parallel Execution Protocol
|
||||
|
||||
**Step 1: Launch ALL searches concurrently (single message)**
|
||||
|
||||
**CRITICAL: Use correct tool and parameters to avoid errors**
|
||||
|
||||
Choose ONE search approach per research session:
|
||||
|
||||
**Option A: Use WebSearch (built-in, no MCP required)**
|
||||
- Standard web search with simple query string
|
||||
- Parameters: `query` (required)
|
||||
- Optional: `allowed_domains`, `blocked_domains`
|
||||
- Example: `WebSearch(query="quantum computing 2025")`
|
||||
|
||||
**Option B: Use Exa MCP (if available, more powerful)**
|
||||
- Advanced semantic + keyword search
|
||||
- Tool name: `mcp__Exa__exa_search`
|
||||
- Parameters: `query` (required), `type` (auto/neural/keyword), `num_results`, `start_published_date`, `include_domains`
|
||||
- Example: `mcp__Exa__exa_search(query="quantum computing", type="neural", num_results=10)`
|
||||
|
||||
**NEVER mix parameter styles** - this causes "Invalid tool parameters" errors.
|
||||
|
||||
**Step 2: Spawn parallel deep-dive agents**
|
||||
|
||||
Use Task tool with general-purpose agents (3-5 agents) for:
|
||||
- Academic paper analysis (PDFs, detailed extraction)
|
||||
- Documentation deep dives (technical specs, API docs)
|
||||
- Repository analysis (code examples, implementations)
|
||||
- Specialized domain research (requires multi-step investigation)
|
||||
|
||||
**Example parallel execution (using WebSearch):**
|
||||
```
|
||||
[Single message with multiple tool calls]
|
||||
- WebSearch(query="quantum computing 2025 state of the art")
|
||||
- WebSearch(query="quantum computing limitations challenges")
|
||||
- WebSearch(query="quantum computing commercial applications 2024-2025")
|
||||
- WebSearch(query="quantum computing vs classical comparison")
|
||||
- WebSearch(query="quantum error correction research", allowed_domains=["arxiv.org", "scholar.google.com"])
|
||||
- Task(subagent_type="general-purpose", description="Analyze quantum computing papers", prompt="Deep dive into quantum computing academic papers from 2024-2025, extract key findings and methodologies")
|
||||
- Task(subagent_type="general-purpose", description="Industry analysis", prompt="Analyze quantum computing industry reports and market data, identify commercial applications")
|
||||
- Task(subagent_type="general-purpose", description="Technical challenges", prompt="Extract technical limitations and challenges from quantum computing research")
|
||||
```
|
||||
|
||||
**Example parallel execution (using Exa MCP - if available):**
|
||||
```
|
||||
[Single message with multiple tool calls]
|
||||
- mcp__Exa__exa_search(query="quantum computing state of the art", type="neural", num_results=10, start_published_date="2024-01-01")
|
||||
- mcp__Exa__exa_search(query="quantum computing limitations", type="keyword", num_results=10)
|
||||
- mcp__Exa__exa_search(query="quantum computing commercial", type="auto", num_results=10, start_published_date="2024-01-01")
|
||||
- mcp__Exa__exa_search(query="quantum error correction", type="neural", num_results=10, include_domains=["arxiv.org"])
|
||||
- Task(subagent_type="general-purpose", description="Academic analysis", prompt="Analyze quantum computing academic papers")
|
||||
```
|
||||
|
||||
**Step 3: Collect and organize results**
|
||||
|
||||
As results arrive:
|
||||
1. Extract key passages with source metadata (title, URL, date, credibility)
|
||||
2. Track information gaps that emerge
|
||||
3. Follow promising tangents with additional targeted searches
|
||||
4. Maintain source diversity (mix academic, industry, news, technical docs)
|
||||
5. Monitor for quality threshold (see FFS pattern below)
|
||||
|
||||
### First Finish Search (FFS) Pattern
|
||||
|
||||
**Adaptive completion based on quality threshold:**
|
||||
|
||||
**Quality gate:** Proceed to Phase 4 as soon as the FIRST threshold is reached:
|
||||
- **Quick mode:** 10+ sources with avg credibility >60/100 OR 2 minutes elapsed
|
||||
- **Standard mode:** 15+ sources with avg credibility >60/100 OR 5 minutes elapsed
|
||||
- **Deep mode:** 25+ sources with avg credibility >70/100 OR 10 minutes elapsed
|
||||
- **UltraDeep mode:** 30+ sources with avg credibility >75/100 OR 15 minutes elapsed
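
The per-mode thresholds above amount to a simple predicate: enough sources with a high enough average credibility, or the time budget used up. A minimal sketch follows, assuming credibility scores are plain numbers on the 0-100 scale; the `Gate` dataclass, the `GATES` table, and `gate_reached` are hypothetical names, not part of the skill's scripts.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Gate:
    min_sources: int            # minimum number of collected sources
    min_avg_credibility: float  # average must be strictly above this
    max_seconds: float          # time budget for the retrieval phase


# Values copied from the quality gate list above.
GATES: Dict[str, Gate] = {
    "quick": Gate(10, 60, 120),
    "standard": Gate(15, 60, 300),
    "deep": Gate(25, 70, 600),
    "ultradeep": Gate(30, 75, 900),
}


def gate_reached(mode: str, credibility_scores: Sequence[float],
                 elapsed_seconds: float) -> bool:
    """Return True when either the quality or the time threshold is met."""
    gate = GATES[mode]
    if elapsed_seconds >= gate.max_seconds:
        return True
    if len(credibility_scores) < gate.min_sources:
        return False
    average = sum(credibility_scores) / len(credibility_scores)
    return average > gate.min_avg_credibility
```

The time check comes first so that an empty result list can never block progression once the budget is exhausted.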
|
||||
|
||||
**Continue background searches:**
|
||||
- If threshold reached early, continue remaining parallel searches in background
|
||||
- Additional sources used in Phase 5 (SYNTHESIZE) for depth and diversity
|
||||
- Allows fast progression without sacrificing thoroughness
|
||||
|
||||
### Quality Standards
|
||||
|
||||
**Source diversity requirements:**
|
||||
- Minimum 3 source types (academic, industry, news, technical docs)
|
||||
- Temporal diversity (mix of recent 2024-2025 + foundational older sources)
|
||||
- Perspective diversity (proponents + critics + neutral analysis)
|
||||
- Geographic diversity (not just US sources)
|
||||
|
||||
**Credibility tracking:**
|
||||
- Score each source 0-100 using source_evaluator.py
|
||||
- Flag low-credibility sources (<40) for additional verification
|
||||
- Prioritize high-credibility sources (>80) for core claims
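
The two cut-offs above split sources into three groups. The sketch below assumes the scores have already been produced (for example by `source_evaluator.py`, whose interface is not shown here) and are passed in as a URL-to-score mapping; the function name and bucket labels are hypothetical.

```python
from typing import Dict, List


def triage_by_credibility(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group source URLs by the <40 and >80 credibility cut-offs."""
    buckets: Dict[str, List[str]] = {
        "needs_verification": [],     # score below 40
        "standard": [],               # score from 40 to 80 inclusive
        "core_claim_candidates": [],  # score above 80
    }
    for url, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range for {url}: {score}")
        if score < 40:
            buckets["needs_verification"].append(url)
        elif score > 80:
            buckets["core_claim_candidates"].append(url)
        else:
            buckets["standard"].append(url)
    return buckets
```

Sources scoring exactly 40 or 80 land in the middle bucket, matching the strict inequalities in the list above.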
|
||||
|
||||
**Techniques:**
|
||||
- Use WebSearch for current information (primary tool)
|
||||
- Use WebFetch for deep dives into specific sources (secondary)
|
||||
- Use Exa search (`mcp__Exa__exa_search` with `type="neural"`, if the Exa MCP is available) for semantic exploration
|
||||
- Use Grep/Read for local documentation
|
||||
- Execute code for computational analysis (when needed)
|
||||
- Use Task tool to spawn parallel retrieval agents (3-5 agents)
|
||||
|
||||
**Output:** Organized information repository with source tracking, credibility scores, and coverage map
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: TRIANGULATE - Cross-Reference Verification
|
||||
|
||||
**Objective:** Validate information across multiple independent sources
|
||||
|
||||
**Activities:**
|
||||
1. Identify claims requiring verification
|
||||
2. Cross-reference facts across 3+ sources
|
||||
3. Flag contradictions or uncertainties
|
||||
4. Assess source credibility
|
||||
5. Note consensus vs. debate areas
|
||||
6. Document verification status per claim
|
||||
|
||||
**Quality Standards:**
|
||||
- Core claims must have 3+ independent sources
|
||||
- Flag any single-source information
|
||||
- Note recency of information
|
||||
- Identify potential biases
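
The "3+ independent sources" rule can be checked mechanically once each claim is mapped to the URLs that support it. The sketch below uses distinct hostnames as a rough proxy for independence, which is an assumption: two sites can still share an upstream origin. The function names and status labels are hypothetical.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse


def independent_hosts(urls: List[str]) -> Set[str]:
    """Return the distinct hostnames behind a list of URLs."""
    hosts: Set[str] = set()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.add(host)
    return hosts


def classify_claims(claims: Dict[str, List[str]],
                    min_sources: int = 3) -> Dict[str, str]:
    """Label each claim by how many distinct hosts support it."""
    status: Dict[str, str] = {}
    for claim, urls in claims.items():
        count = len(independent_hosts(urls))
        if count >= min_sources:
            status[claim] = "verified"
        elif count == 0:
            status[claim] = "unsupported"
        elif count == 1:
            status[claim] = "single-source"
        else:
            status[claim] = "partially-verified"
    return status
```

Claims labeled `single-source` are the ones the quality standards above ask to be flagged.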
|
||||
|
||||
**Output:** Verified fact base with confidence levels
|
||||
|
||||
---
|
||||
|
||||
## Phase 4.5: OUTLINE REFINEMENT - Dynamic Evolution (WebWeaver 2025)
|
||||
|
||||
**Objective:** Adapt research direction based on evidence discovered
|
||||
|
||||
**Problem Solved:** Prevents "locked-in" research when evidence points to different conclusions or uncovers more important angles than initially planned.
|
||||
|
||||
**When to Execute:**
|
||||
- **Standard/Deep/UltraDeep modes only** (Quick mode skips this)
|
||||
- After Phase 4 (TRIANGULATE) completes
|
||||
- Before Phase 5 (SYNTHESIZE)
|
||||
|
||||
**Activities:**
|
||||
|
||||
1. **Review Initial Scope vs. Actual Findings**
|
||||
- Compare Phase 1 scope with Phase 3-4 discoveries
|
||||
- Identify unexpected patterns or contradictions
|
||||
- Note underexplored angles that emerged as critical
|
||||
- Flag overexplored areas that proved less important
|
||||
|
||||
2. **Evaluate Outline Adaptation Need**
|
||||
|
||||
**Signals for adaptation (ANY triggers refinement):**
|
||||
- Major findings contradict initial assumptions
|
||||
- Evidence reveals more important angle than originally scoped
|
||||
- Critical subtopic emerged that wasn't in original plan
|
||||
- Original research question was too broad/narrow based on evidence
|
||||
- Sources consistently discuss aspects not in initial outline
|
||||
|
||||
**Signals to keep current outline:**
|
||||
- Evidence aligns with initial scope
|
||||
- All key angles adequately covered
|
||||
- No major gaps or surprises
|
||||
|
||||
3. **Refine Outline (if needed)**
|
||||
|
||||
**Update structure to reflect evidence:**
|
||||
- Add sections for unexpected but important findings
|
||||
- Demote/remove sections with insufficient evidence
|
||||
- Reorder sections based on evidence strength and importance
|
||||
- Adjust scope boundaries based on what's actually discoverable
|
||||
|
||||
**Example adaptation:**
|
||||
```
|
||||
Original outline:
|
||||
1. Introduction
|
||||
2. Technical Architecture
|
||||
3. Performance Benchmarks
|
||||
4. Conclusion
|
||||
|
||||
Refined after Phase 4 (evidence revealed security as critical):
|
||||
1. Introduction
|
||||
2. Technical Architecture
|
||||
3. **Security Vulnerabilities (NEW - major finding)**
|
||||
4. Performance Benchmarks (demoted - less critical than expected)
|
||||
5. **Real-World Failure Modes (NEW - pattern emerged)**
|
||||
6. Synthesis & Recommendations
|
||||
```
|
||||
|
||||
4. **Targeted Gap Filling (if major gaps found)**
|
||||
|
||||
If outline refinement reveals critical knowledge gaps:
|
||||
- Launch 2-3 targeted searches for newly identified angles
|
||||
- Quick retrieval only (don't restart full Phase 3)
|
||||
- Time-box to 2-5 minutes
|
||||
- Update triangulation for new evidence only
|
||||
|
||||
5. **Document Adaptation Rationale**
|
||||
|
||||
Record in methodology appendix:
|
||||
- What changed in outline
|
||||
- Why it changed (evidence-driven reasons)
|
||||
- What additional research was conducted (if any)
|
||||
|
||||
**Quality Standards:**
|
||||
- Adaptation must be evidence-driven (cite specific sources that prompted change)
|
||||
- No more than 50% outline restructuring (if more is needed, the original scope was severely mis-scoped)
|
||||
- Retain original research question core (don't drift into different topic entirely)
|
||||
- New sections must have supporting evidence already gathered
|
||||
|
||||
**Output:** Refined outline that accurately reflects evidence landscape, ready for synthesis
|
||||
|
||||
**Anti-Pattern Warning:**
|
||||
- ❌ DON'T adapt outline based on speculation or "what would be interesting"
|
||||
- ❌ DON'T add sections without supporting evidence already in hand
|
||||
- ❌ DON'T completely abandon original research question
|
||||
- ✅ DO adapt when evidence clearly indicates better structure
|
||||
- ✅ DO document rationale for changes
|
||||
- ✅ DO stay within original topic scope
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: SYNTHESIZE - Deep Analysis
|
||||
|
||||
**Objective:** Connect insights and generate novel understanding
|
||||
|
||||
**Activities:**
|
||||
1. Identify patterns across sources
|
||||
2. Map relationships between concepts
|
||||
3. Generate insights beyond source material
|
||||
4. Create conceptual frameworks
|
||||
5. Build argument structures
|
||||
6. Develop evidence hierarchies
|
||||
|
||||
**Ultrathink Integration:** Use extended reasoning to explore non-obvious connections and second-order implications.
|
||||
|
||||
**Output:** Synthesized understanding with insight generation
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: CRITIQUE - Quality Assurance
|
||||
|
||||
**Objective:** Rigorously evaluate research quality
|
||||
|
||||
**Activities:**
|
||||
1. Review for logical consistency
|
||||
2. Check citation completeness
|
||||
3. Identify gaps or weaknesses
|
||||
4. Assess balance and objectivity
|
||||
5. Verify claims against sources
|
||||
6. Test alternative interpretations
|
||||
|
||||
**Red Team Questions:**
|
||||
- What's missing?
|
||||
- What could be wrong?
|
||||
- What alternative explanations exist?
|
||||
- What biases might be present?
|
||||
- What counterfactuals should be considered?
|
||||
|
||||
**Output:** Critique report with improvement recommendations
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: REFINE - Iterative Improvement
|
||||
|
||||
**Objective:** Address gaps and strengthen weak areas
|
||||
|
||||
**Activities:**
|
||||
1. Conduct additional research for gaps
|
||||
2. Strengthen weak arguments
|
||||
3. Add missing perspectives
|
||||
4. Resolve contradictions
|
||||
5. Enhance clarity
|
||||
6. Verify revised content
|
||||
|
||||
**Output:** Strengthened research with addressed deficiencies
|
||||
|
||||
---
|
||||
|
||||
## Phase 8: PACKAGE - Report Generation
|
||||
|
||||
**Objective:** Deliver professional, actionable research
|
||||
|
||||
**Activities:**
|
||||
1. Structure report with clear hierarchy
|
||||
2. Write executive summary
|
||||
3. Develop detailed sections
|
||||
4. Create visualizations (tables, diagrams)
|
||||
5. Compile full bibliography
|
||||
6. Add methodology appendix
|
||||
|
||||
**Output:** Complete research report ready for use
|
||||
|
||||
---
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Graph-of-Thoughts Reasoning
|
||||
|
||||
Rather than linear thinking, branch into multiple reasoning paths:
|
||||
- Explore alternative framings in parallel
|
||||
- Pursue tangential leads that might be relevant
|
||||
- Merge insights from different branches
|
||||
- Backtrack and revise as new information emerges
|
||||
|
||||
### Parallel Agent Deployment
|
||||
|
||||
Use Task tool to spawn sub-agents for:
|
||||
- Parallel source retrieval
|
||||
- Independent verification paths
|
||||
- Competing hypothesis evaluation
|
||||
- Specialized domain analysis
|
||||
|
||||
### Adaptive Depth Control
|
||||
|
||||
Automatically adjust research depth based on:
|
||||
- Information complexity
|
||||
- Source availability
|
||||
- Time constraints
|
||||
- Confidence levels
|
||||
|
||||
### Citation Intelligence
|
||||
|
||||
Smart citation management:
|
||||
- Track provenance of every claim
|
||||
- Link to original sources
|
||||
- Assess source credibility
|
||||
- Handle conflicting sources
|
||||
- Generate proper bibliographies
|
||||
10
axhub-make/skills/third-party/deep-research/requirements.txt
vendored
Normal file
@@ -0,0 +1,10 @@
|
||||
# Deep Research Skill Dependencies
|
||||
# These are standard library modules, no external dependencies needed for core functionality
|
||||
|
||||
# Optional: For enhanced features, uncomment if needed
|
||||
# requests>=2.31.0 # For web fetching
|
||||
# beautifulsoup4>=4.12.0 # For HTML parsing
|
||||
# markdownify>=0.11.6 # For HTML to markdown conversion
|
||||
# numpy>=1.24.0 # For statistical analysis
|
||||
# pandas>=2.0.0 # For data analysis
|
||||
# networkx>=3.1 # For knowledge graph analysis
|
||||
177
axhub-make/skills/third-party/deep-research/scripts/citation_manager.py
vendored
Normal file
@@ -0,0 +1,177 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Citation Management System
|
||||
Tracks sources, generates citations, and maintains bibliography
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import List, Dict, Optional
|
||||
from datetime import datetime
|
||||
from urllib.parse import urlparse
|
||||
import hashlib
|
||||
|
||||
|
||||
@dataclass
|
||||
class Citation:
|
||||
"""Represents a single citation"""
|
||||
id: str
|
||||
title: str
|
||||
url: str
|
||||
authors: Optional[List[str]] = None
|
||||
publication_date: Optional[str] = None
|
||||
retrieved_date: str = field(default_factory=lambda: datetime.now().strftime('%Y-%m-%d'))
|
||||
source_type: str = "web" # web, academic, documentation, book, paper
|
||||
doi: Optional[str] = None
|
||||
citation_count: int = 0
|
||||
|
||||
def to_apa(self, index: int) -> str:
|
||||
"""Generate APA format citation"""
|
||||
author_str = ""
|
||||
if self.authors:
|
||||
if len(self.authors) == 1:
|
||||
author_str = f"{self.authors[0]}."
|
||||
elif len(self.authors) == 2:
|
||||
author_str = f"{self.authors[0]} & {self.authors[1]}."
|
||||
else:
|
||||
author_str = f"{self.authors[0]} et al."
|
||||
|
||||
date_str = f"({self.publication_date})" if self.publication_date else "(n.d.)"
|
||||
|
||||
return f"[{index}] {author_str} {date_str}. {self.title}. Retrieved {self.retrieved_date}, from {self.url}"
|
||||
|
||||
def to_inline(self, index: int) -> str:
|
||||
"""Generate inline citation [index]"""
|
||||
return f"[{index}]"
|
||||
|
||||
def to_markdown(self, index: int) -> str:
|
||||
"""Generate markdown link format"""
|
||||
return f"[{index}] [{self.title}]({self.url}) (Retrieved: {self.retrieved_date})"
|
||||
|
||||
|
||||
class CitationManager:
|
||||
"""Manages citations and bibliography"""
|
||||
|
||||
def __init__(self):
|
||||
self.citations: Dict[str, Citation] = {}
|
||||
self.citation_order: List[str] = []
|
||||
|
||||
def add_source(
|
||||
self,
|
||||
url: str,
|
||||
title: str,
|
||||
authors: Optional[List[str]] = None,
|
||||
publication_date: Optional[str] = None,
|
||||
source_type: str = "web",
|
||||
doi: Optional[str] = None
|
||||
) -> str:
|
||||
"""Add a source and return its citation ID"""
|
||||
# Generate unique ID based on URL
|
||||
citation_id = hashlib.md5(url.encode()).hexdigest()[:8]
|
||||
|
||||
if citation_id not in self.citations:
|
||||
citation = Citation(
|
||||
id=citation_id,
|
||||
title=title,
|
||||
url=url,
|
||||
authors=authors,
|
||||
publication_date=publication_date,
|
||||
source_type=source_type,
|
||||
doi=doi
|
||||
)
|
||||
self.citations[citation_id] = citation
|
||||
self.citation_order.append(citation_id)
|
||||
|
||||
# Increment citation count
|
||||
self.citations[citation_id].citation_count += 1
|
||||
|
||||
return citation_id
|
||||
|
||||
def get_citation_number(self, citation_id: str) -> Optional[int]:
|
||||
"""Get the citation number for a given ID"""
|
||||
try:
|
||||
return self.citation_order.index(citation_id) + 1
|
||||
except ValueError:
|
||||
return None
|
||||
|
||||
def get_inline_citation(self, citation_id: str) -> str:
|
||||
"""Get inline citation marker [n]"""
|
||||
num = self.get_citation_number(citation_id)
|
||||
return f"[{num}]" if num else "[?]"
|
||||
|
||||
def generate_bibliography(self, style: str = "markdown") -> str:
|
||||
"""Generate full bibliography"""
|
||||
if style == "markdown":
|
||||
lines = ["## Bibliography\n"]
|
||||
for i, citation_id in enumerate(self.citation_order, 1):
|
||||
citation = self.citations[citation_id]
|
||||
lines.append(citation.to_markdown(i))
|
||||
return "\n".join(lines)
|
||||
|
||||
elif style == "apa":
|
||||
lines = ["## Bibliography\n"]
|
||||
for i, citation_id in enumerate(self.citation_order, 1):
|
||||
citation = self.citations[citation_id]
|
||||
lines.append(citation.to_apa(i))
|
||||
return "\n".join(lines)
|
||||
|
||||
return "Unsupported citation style"
|
||||
|
||||
def get_statistics(self) -> Dict[str, object]:
|
||||
"""Get citation statistics"""
|
||||
return {
|
||||
'total_sources': len(self.citations),
|
||||
'total_citations': sum(c.citation_count for c in self.citations.values()),
|
||||
'source_types': self._count_by_type(),
|
||||
'most_cited': self._get_most_cited(5),
|
||||
'uncited': self._get_uncited()
|
||||
}
|
||||
|
||||
def _count_by_type(self) -> Dict[str, int]:
|
||||
"""Count sources by type"""
|
||||
counts = {}
|
||||
for citation in self.citations.values():
|
||||
counts[citation.source_type] = counts.get(citation.source_type, 0) + 1
|
||||
return counts
|
||||
|
||||
def _get_most_cited(self, n: int = 5) -> List[tuple]:
|
||||
"""Get most cited sources"""
|
||||
sorted_citations = sorted(
|
||||
self.citations.items(),
|
||||
key=lambda x: x[1].citation_count,
|
||||
reverse=True
|
||||
)
|
||||
return [(self.get_citation_number(cid), c.title, c.citation_count)
|
||||
for cid, c in sorted_citations[:n]]
|
||||
|
||||
def _get_uncited(self) -> List[str]:
|
||||
"""Get sources that were added but never cited"""
|
||||
return [c.title for c in self.citations.values() if c.citation_count == 0]
|
||||
|
||||
def export_to_file(self, filepath: str, style: str = "markdown"):
|
||||
"""Export bibliography to file"""
|
||||
with open(filepath, 'w') as f:
|
||||
f.write(self.generate_bibliography(style))
|
||||
|
||||
|
||||
# Example usage
|
||||
if __name__ == '__main__':
|
||||
manager = CitationManager()
|
||||
|
||||
# Add sources
|
||||
id1 = manager.add_source(
|
||||
url="https://example.com/article1",
|
||||
title="Understanding Deep Research",
|
||||
authors=["Smith, J.", "Johnson, K."],
|
||||
publication_date="2025"
|
||||
)
|
||||
|
||||
id2 = manager.add_source(
|
||||
url="https://example.com/article2",
|
||||
title="AI Research Methods",
|
||||
source_type="academic"
|
||||
)
|
||||
|
||||
# Use citations
|
||||
print(f"Inline citation: {manager.get_inline_citation(id1)}")
|
||||
print(f"\nBibliography:\n{manager.generate_bibliography()}")
|
||||
print(f"\nStatistics:\n{manager.get_statistics()}")
|
||||
330
axhub-make/skills/third-party/deep-research/scripts/md_to_html.py
vendored
Normal file
@@ -0,0 +1,330 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Markdown to HTML converter for research reports
|
||||
Properly converts markdown sections to HTML while preserving structure and formatting
|
||||
"""
|
||||
|
||||
import re
|
||||
from typing import Tuple
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def convert_markdown_to_html(markdown_text: str) -> Tuple[str, str]:
|
||||
"""
|
||||
Convert markdown to HTML in two parts: content and bibliography
|
||||
|
||||
Args:
|
||||
markdown_text: Full markdown report text
|
||||
|
||||
Returns:
|
||||
Tuple of (content_html, bibliography_html)
|
||||
"""
|
||||
# Split content and bibliography
|
||||
parts = markdown_text.split('## Bibliography')
|
||||
content_md = parts[0]
|
||||
bibliography_md = parts[1] if len(parts) > 1 else ""
|
||||
|
||||
# Convert content (everything except bibliography)
|
||||
content_html = _convert_content_section(content_md)
|
||||
|
||||
# Convert bibliography separately
|
||||
bibliography_html = _convert_bibliography_section(bibliography_md)
|
||||
|
||||
return content_html, bibliography_html
|
||||
|
||||
|
||||
def _convert_content_section(markdown: str) -> str:
|
||||
"""Convert main content sections to HTML"""
|
||||
html = markdown
|
||||
|
||||
# Remove title and front matter (first ## heading is handled separately)
|
||||
lines = html.split('\n')
|
||||
processed_lines = []
|
||||
skip_until_first_section = True
|
||||
|
||||
for line in lines:
|
||||
# Skip everything until we hit "## Executive Summary" or first major section
|
||||
if skip_until_first_section:
|
||||
if line.startswith('## ') and not line.startswith('### '):
|
||||
skip_until_first_section = False
|
||||
processed_lines.append(line)
|
||||
continue
|
||||
processed_lines.append(line)
|
||||
|
||||
html = '\n'.join(processed_lines)
|
||||
|
||||
# Convert headers
|
||||
# ## Section Title → <div class="section"><h2 class="section-title">Section Title</h2> (the div is closed later by _close_sections)
|
||||
html = re.sub(
|
||||
r'^## (.+)$',
|
||||
r'<div class="section"><h2 class="section-title">\1</h2>',
|
||||
html,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# ### Subsection → <h3 class="subsection-title">Subsection</h3>
|
||||
html = re.sub(
|
||||
r'^### (.+)$',
|
||||
r'<h3 class="subsection-title">\1</h3>',
|
||||
html,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# #### Subsubsection → <h4 class="subsubsection-title">Title</h4>
|
||||
html = re.sub(
|
||||
r'^#### (.+)$',
|
||||
r'<h4 class="subsubsection-title">\1</h4>',
|
||||
html,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# Convert **bold** text
|
||||
html = re.sub(r'\*\*(.+?)\*\*', r'<strong>\1</strong>', html)
|
||||
|
||||
# Convert *italic* text
|
||||
html = re.sub(r'\*(.+?)\*', r'<em>\1</em>', html)
|
||||
|
||||
# Convert inline code `code`
|
||||
html = re.sub(r'`(.+?)`', r'<code>\1</code>', html)
|
||||
|
||||
# Convert unordered lists
|
||||
html = _convert_lists(html)
|
||||
|
||||
# Convert tables
|
||||
html = _convert_tables(html)
|
||||
|
||||
# Convert paragraphs (wrap non-HTML lines in <p> tags)
|
||||
html = _convert_paragraphs(html)
|
||||
|
||||
# Close all open sections
|
||||
html = _close_sections(html)
|
||||
|
||||
# Wrap executive summary if present
|
||||
html = html.replace(
|
||||
'<h2 class="section-title">Executive Summary</h2>',
|
||||
'<div class="executive-summary"><h2 class="section-title">Executive Summary</h2>'
|
||||
)
|
||||
if '<div class="executive-summary">' in html:
|
||||
# Close executive summary at the next section
|
||||
html = html.replace(
|
||||
'</h2>\n<div class="section">',
|
||||
'</h2></div>\n<div class="section">',
|
||||
1
|
||||
)
|
||||
|
||||
return html
|
||||
|
||||
|
||||
def _convert_bibliography_section(markdown: str) -> str:
|
||||
"""Convert bibliography section to HTML"""
|
||||
if not markdown.strip():
|
||||
return ""
|
||||
|
||||
html = markdown
|
||||
|
||||
# Convert each [N] citation to a proper bibliography entry
|
||||
# Look for patterns like [1] Title - URL
|
||||
html = re.sub(
|
||||
r'\[(\d+)\]\s*(.+?)\s*-\s*(https?://[^\s\)]+)',
|
||||
r'<div class="bib-entry"><span class="bib-number">[\1]</span> <a href="\3" target="_blank">\2</a></div>',
|
||||
html
|
||||
)
|
||||
|
||||
# Convert any remaining **bold** sections
|
||||
html = re.sub(r'\*\*(.+?)\*\*', r'<strong>\1</strong>', html)
|
||||
|
||||
# Wrap in bibliography content div
|
||||
html = f'<div class="bibliography-content">{html}</div>'
|
||||
|
||||
return html
|
||||
|
||||
|
||||
def _convert_lists(html: str) -> str:
|
||||
"""Convert markdown lists to HTML lists"""
|
||||
lines = html.split('\n')
|
||||
result = []
|
||||
in_list = False
|
||||
list_level = 0
|
||||
|
||||
for i, line in enumerate(lines):
|
||||
stripped = line.strip()
|
||||
|
||||
# Check for unordered list item
|
||||
if stripped.startswith('- ') or stripped.startswith('* '):
|
||||
if not in_list:
|
||||
result.append('<ul>')
|
||||
in_list = True
|
||||
list_level = len(line) - len(line.lstrip())
|
||||
|
||||
# Get the content after the marker
|
||||
content = stripped[2:]
|
||||
result.append(f'<li>{content}</li>')
|
||||
|
||||
# Check for ordered list item
|
||||
elif re.match(r'^\d+\.\s', stripped):
|
||||
if not in_list:
|
||||
result.append('<ol>')
|
||||
in_list = True
|
||||
list_level = len(line) - len(line.lstrip())
|
||||
|
||||
# Get the content after the number and period
|
||||
content = re.sub(r'^\d+\.\s', '', stripped)
|
||||
result.append(f'<li>{content}</li>')
|
||||
|
||||
else:
|
||||
# Not a list item
|
||||
if in_list:
|
||||
# Check if we're still in the list (indented continuation)
|
||||
current_level = len(line) - len(line.lstrip())
|
||||
if current_level > list_level and stripped:
|
||||
# Continuation of previous list item
|
||||
if result[-1].endswith('</li>'):
|
||||
result[-1] = result[-1][:-5] + ' ' + stripped + '</li>'
|
||||
continue
|
||||
else:
|
||||
# End of list
|
||||
result.append('</ul>' if '<ul>' in '\n'.join(result[-10:]) else '</ol>')
|
||||
in_list = False
|
||||
list_level = 0
|
||||
|
||||
result.append(line)
|
||||
|
||||
# Close any remaining open list
|
||||
if in_list:
|
||||
result.append('</ul>' if '<ul>' in '\n'.join(result[-10:]) else '</ol>')
|
||||
|
||||
return '\n'.join(result)
|
||||
|
||||
|
||||
def _convert_tables(html: str) -> str:
|
||||
"""Convert markdown tables to HTML tables"""
|
||||
lines = html.split('\n')
|
||||
result = []
|
||||
in_table = False
|
||||
|
||||
for i, line in enumerate(lines):
|
||||
if '|' in line and line.strip().startswith('|'):
|
||||
if not in_table:
|
||||
result.append('<table>')
|
||||
in_table = True
|
||||
# This is the header row
|
||||
cells = [cell.strip() for cell in line.split('|')[1:-1]]
|
||||
result.append('<thead><tr>')
|
||||
for cell in cells:
|
||||
result.append(f'<th>{cell}</th>')
|
||||
result.append('</tr></thead>')
|
||||
result.append('<tbody>')
|
||||
elif '---' in line:
|
||||
# Skip separator row
|
||||
continue
|
||||
else:
|
||||
# Data row
|
||||
cells = [cell.strip() for cell in line.split('|')[1:-1]]
|
||||
result.append('<tr>')
|
||||
for cell in cells:
|
||||
result.append(f'<td>{cell}</td>')
|
||||
result.append('</tr>')
|
||||
else:
|
||||
if in_table:
|
||||
result.append('</tbody></table>')
|
||||
in_table = False
|
||||
result.append(line)
|
||||
|
||||
if in_table:
|
||||
result.append('</tbody></table>')
|
||||
|
||||
return '\n'.join(result)
|
||||
|
||||
|
||||
def _convert_paragraphs(html: str) -> str:
|
||||
"""Wrap non-HTML lines in paragraph tags"""
|
||||
lines = html.split('\n')
|
||||
result = []
|
||||
in_paragraph = False
|
||||
|
||||
for line in lines:
|
||||
stripped = line.strip()
|
||||
|
||||
# Skip empty lines
|
||||
if not stripped:
|
||||
if in_paragraph:
|
||||
result.append('</p>')
|
||||
in_paragraph = False
|
||||
result.append(line)
|
||||
continue
|
||||
|
||||
# Skip lines that are already HTML tags
|
||||
if (stripped.startswith('<') and stripped.endswith('>')) or \
|
||||
stripped.startswith('</') or \
|
||||
'<h' in stripped or '<div' in stripped or '<ul' in stripped or \
|
||||
'<ol' in stripped or '<li' in stripped or '<table' in stripped or \
|
||||
'</div>' in stripped or '</ul>' in stripped or '</ol>' in stripped:
|
||||
if in_paragraph:
|
||||
result.append('</p>')
|
||||
in_paragraph = False
|
||||
result.append(line)
|
||||
continue
|
||||
|
||||
# Regular text line - wrap in paragraph
|
||||
if not in_paragraph:
|
||||
result.append('<p>' + line)
|
||||
in_paragraph = True
|
||||
else:
|
||||
result.append(line)
|
||||
|
||||
if in_paragraph:
|
||||
result.append('</p>')
|
||||
|
||||
return '\n'.join(result)
|
||||
|
||||
|
||||
def _close_sections(html: str) -> str:
|
||||
"""Close all open section divs"""
|
||||
# Count open and closed divs
|
||||
open_divs = html.count('<div class="section">')
|
||||
closed_divs = html.count('</div>')
|
||||
|
||||
# Add closing divs for sections
|
||||
# Each section should be closed before the next section starts
|
||||
lines = html.split('\n')
|
||||
result = []
|
||||
section_open = False
|
||||
|
||||
for i, line in enumerate(lines):
|
||||
if '<div class="section">' in line:
|
||||
if section_open:
|
||||
result.append('</div>') # Close previous section
|
||||
section_open = True
|
||||
result.append(line)
|
||||
|
||||
# Close final section if still open
|
||||
if section_open:
|
||||
result.append('</div>')
|
||||
|
||||
return '\n'.join(result)
|
||||
|
||||
|
||||
def main():
|
||||
"""Test the converter with a sample markdown file"""
|
||||
import sys
|
||||
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: python md_to_html.py <markdown_file>")
|
||||
sys.exit(1)
|
||||
|
||||
md_file = Path(sys.argv[1])
|
||||
if not md_file.exists():
|
||||
print(f"Error: File {md_file} not found")
|
||||
sys.exit(1)
|
||||
|
||||
markdown_text = md_file.read_text(encoding="utf-8")
|
||||
content_html, bib_html = convert_markdown_to_html(markdown_text)
|
||||
|
||||
print("=== CONTENT HTML ===")
|
||||
print(content_html[:1000])
|
||||
print("\n=== BIBLIOGRAPHY HTML ===")
|
||||
print(bib_html[:500])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
578
axhub-make/skills/third-party/deep-research/scripts/research_engine.py
vendored
Normal file
@@ -0,0 +1,578 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Deep Research Engine for Claude Code
|
||||
Orchestrates comprehensive research across multiple sources with verification and synthesis
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
import time
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional, Any
|
||||
from dataclasses import dataclass, asdict
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class ResearchPhase(Enum):
|
||||
"""Research pipeline phases"""
|
||||
SCOPE = "scope"
|
||||
PLAN = "plan"
|
||||
RETRIEVE = "retrieve"
|
||||
TRIANGULATE = "triangulate"
|
||||
SYNTHESIZE = "synthesize"
|
||||
CRITIQUE = "critique"
|
||||
REFINE = "refine"
|
||||
PACKAGE = "package"
|
||||
|
||||
|
||||
class ResearchMode(Enum):
|
||||
"""Research depth modes"""
|
||||
QUICK = "quick" # 3 phases: scope, retrieve, package
|
||||
STANDARD = "standard" # 6 phases: skip refine and critique
|
||||
DEEP = "deep" # Full 8 phases
|
||||
ULTRADEEP = "ultradeep" # 8 phases + extended iterations
|
||||
|
||||
|
||||
@dataclass
|
||||
class Source:
|
||||
"""Represents a research source"""
|
||||
url: str
|
||||
title: str
|
||||
snippet: str
|
||||
retrieved_at: str
|
||||
credibility_score: float = 0.0
|
||||
source_type: str = "web" # web, academic, documentation, code
|
||||
verification_status: str = "unverified" # unverified, verified, conflicted
|
||||
|
||||
def to_citation(self, index: int) -> str:
|
||||
"""Generate citation string"""
|
||||
return f"[{index}] {self.title} - {self.url} (Retrieved: {self.retrieved_at})"
|
||||
|
||||
|
||||
@dataclass
|
||||
class ResearchState:
|
||||
"""Maintains research state across phases"""
|
||||
query: str
|
||||
mode: ResearchMode
|
||||
phase: ResearchPhase
|
||||
scope: Dict[str, Any]
|
||||
plan: Dict[str, Any]
|
||||
sources: List[Source]
|
||||
findings: List[Dict[str, Any]]
|
||||
synthesis: Dict[str, Any]
|
||||
critique: Dict[str, Any]
|
||||
report: str
|
||||
metadata: Dict[str, Any]
|
||||
|
||||
def save(self, filepath: Path):
|
||||
"""Save research state to file with retry logic"""
|
||||
max_retries = 3
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
with open(filepath, 'w') as f:
|
||||
json.dump(self._serialize(), f, indent=2)
|
||||
return # Success
|
||||
except (IOError, OSError) as e:
|
||||
if attempt == max_retries - 1:
|
||||
# Final attempt failed
|
||||
raise IOError(f"Failed to save state after {max_retries} attempts: {e}")
|
||||
# Wait with linearly increasing backoff before retry
|
||||
wait_time = (attempt + 1) * 0.5 # 0.5s, then 1s (the final attempt raises instead of sleeping)
|
||||
time.sleep(wait_time)
|
||||
|
||||
def _serialize(self) -> dict:
|
||||
"""Convert to serializable dict"""
|
||||
return {
|
||||
'query': self.query,
|
||||
'mode': self.mode.value,
|
||||
'phase': self.phase.value,
|
||||
'scope': self.scope,
|
||||
'plan': self.plan,
|
||||
'sources': [asdict(s) for s in self.sources],
|
||||
'findings': self.findings,
|
||||
'synthesis': self.synthesis,
|
||||
'critique': self.critique,
|
||||
'report': self.report,
|
||||
'metadata': self.metadata
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def load(cls, filepath: Path) -> 'ResearchState':
|
||||
"""Load research state from file"""
|
||||
with open(filepath, 'r') as f:
|
||||
data = json.load(f)
|
||||
|
||||
return cls(
|
||||
query=data['query'],
|
||||
mode=ResearchMode(data['mode']),
|
||||
phase=ResearchPhase(data['phase']),
|
||||
scope=data['scope'],
|
||||
plan=data['plan'],
|
||||
sources=[Source(**s) for s in data['sources']],
|
||||
findings=data['findings'],
|
||||
synthesis=data['synthesis'],
|
||||
critique=data['critique'],
|
||||
report=data['report'],
|
||||
metadata=data['metadata']
|
||||
)
|
||||
|
||||
|
||||
class ResearchEngine:
|
||||
"""Main research orchestration engine"""
|
||||
|
||||
def __init__(self, mode: ResearchMode = ResearchMode.STANDARD):
|
||||
self.mode = mode
|
||||
self.state: Optional[ResearchState] = None
|
||||
self.output_dir = Path.home() / ".claude" / "research_output"
|
||||
self.output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def initialize_research(self, query: str) -> ResearchState:
|
||||
"""Initialize new research session"""
|
||||
self.state = ResearchState(
|
||||
query=query,
|
||||
mode=self.mode,
|
||||
phase=ResearchPhase.SCOPE,
|
||||
scope={},
|
||||
plan={},
|
||||
sources=[],
|
||||
findings=[],
|
||||
synthesis={},
|
||||
critique={},
|
||||
report="",
|
||||
metadata={
|
||||
'started_at': datetime.now().isoformat(),
|
||||
'version': '1.0'
|
||||
}
|
||||
)
|
||||
return self.state
|
||||
|
||||
def get_phase_instructions(self, phase: ResearchPhase) -> str:
|
||||
"""Get instructions for current phase"""
|
||||
instructions = {
|
||||
ResearchPhase.SCOPE: """
|
||||
# Phase 1: SCOPE
|
||||
|
||||
Your task: Define research boundaries and success criteria
|
||||
|
||||
## Execute:
|
||||
1. Decompose the question into 3-5 core components
|
||||
2. Identify 2-4 key stakeholder perspectives
|
||||
3. Define what's IN scope and what's OUT of scope
|
||||
4. List 3-5 success criteria for this research
|
||||
5. Document 3-5 assumptions that need validation
|
||||
|
||||
## Output Format:
|
||||
```json
|
||||
{
|
||||
"core_components": ["component1", "component2", ...],
|
||||
"stakeholder_perspectives": ["perspective1", "perspective2", ...],
|
||||
"in_scope": ["item1", "item2", ...],
|
||||
"out_of_scope": ["item1", "item2", ...],
|
||||
"success_criteria": ["criteria1", "criteria2", ...],
|
||||
"assumptions": ["assumption1", "assumption2", ...]
|
||||
}
|
||||
```
|
||||
|
||||
Use extended reasoning to explore multiple framings before finalizing scope.
|
||||
""",
|
||||
ResearchPhase.PLAN: """
|
||||
# Phase 2: PLAN
|
||||
|
||||
Your task: Create intelligent research roadmap
|
||||
|
||||
## Execute:
|
||||
1. Identify 5-10 primary sources to investigate
|
||||
2. List 5-10 secondary/backup sources
|
||||
3. Map knowledge dependencies (what must be understood first)
|
||||
4. Create 10-15 search query variations
|
||||
5. Plan triangulation approach (how to verify claims)
|
||||
6. Define 3-5 quality gates
|
||||
|
||||
## Output Format:
|
||||
```json
|
||||
{
|
||||
"primary_sources": ["source_type1", "source_type2", ...],
|
||||
"secondary_sources": ["source_type1", "source_type2", ...],
|
||||
"knowledge_dependencies": {"concept1": ["prerequisite1", "prerequisite2"], ...},
|
||||
"search_queries": ["query1", "query2", ...],
|
||||
"triangulation_strategy": "description of verification approach",
|
||||
"quality_gates": ["gate1", "gate2", ...]
|
||||
}
|
||||
```
|
||||
|
||||
Use Graph-of-Thoughts: branch into 3-4 potential research paths, evaluate, then converge on optimal strategy.
|
||||
""",
|
||||
ResearchPhase.RETRIEVE: """
|
||||
# Phase 3: RETRIEVE
|
||||
|
||||
Your task: Systematically collect information from multiple sources
|
||||
|
||||
## Execute:
|
||||
1. Use WebSearch with iterative query refinement (minimum 10 searches)
|
||||
2. Use WebFetch to deep-dive into 5-10 most promising sources
|
||||
3. Extract key passages with metadata
|
||||
4. Track information gaps
|
||||
5. Follow 2-3 promising tangents
|
||||
6. Ensure source diversity (different domains, perspectives)
|
||||
|
||||
## Tools to Use:
|
||||
- WebSearch: For current information and broad coverage
|
||||
- WebFetch: For detailed extraction from specific URLs
|
||||
- Grep/Read: For local documentation if relevant
|
||||
- Task: Spawn 2-3 parallel retrieval agents for efficiency
|
||||
|
||||
## Output:
|
||||
Store all sources with metadata. Each source should include:
|
||||
- URL/location
|
||||
- Title
|
||||
- Key excerpts
|
||||
- Relevance score
|
||||
- Source type
|
||||
- Retrieved timestamp
|
||||
|
||||
Aim for 15-30 distinct sources minimum.
|
||||
""",
|
||||
ResearchPhase.TRIANGULATE: """
|
||||
# Phase 4: TRIANGULATE
|
||||
|
||||
Your task: Validate information across multiple independent sources
|
||||
|
||||
## Execute:
|
||||
1. List all major claims from retrieved information
|
||||
2. For each claim, find 3+ independent confirmatory sources
|
||||
3. Flag any contradictions or uncertainties
|
||||
4. Assess source credibility (domain expertise, recency, bias)
|
||||
5. Document consensus areas vs. debate areas
|
||||
6. Mark verification status for each claim
|
||||
|
||||
## Quality Standards:
|
||||
- Core claims MUST have 3+ independent sources
|
||||
- Flag any single-source claims as "unverified"
|
||||
- Note information recency
|
||||
- Identify potential biases
|
||||
|
||||
## Output Format:
|
||||
```json
|
||||
{
|
||||
"verified_claims": [
|
||||
{
|
||||
"claim": "statement",
|
||||
"sources": ["source1", "source2", "source3"],
|
||||
"confidence": "high|medium|low"
|
||||
}
|
||||
],
|
||||
"unverified_claims": [...],
|
||||
"contradictions": [
|
||||
{
|
||||
"topic": "what's contradicted",
|
||||
"viewpoint1": {"claim": "...", "sources": [...]},
|
||||
"viewpoint2": {"claim": "...", "sources": [...]}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
""",
|
||||
ResearchPhase.SYNTHESIZE: """
|
||||
# Phase 5: SYNTHESIZE
|
||||
|
||||
Your task: Connect insights and generate novel understanding
|
||||
|
||||
## Execute:
|
||||
1. Identify 5-10 key patterns across sources
|
||||
2. Map relationships between concepts
|
||||
3. Generate 3-5 insights that go beyond source material
|
||||
4. Create conceptual frameworks or mental models
|
||||
5. Build argument structures
|
||||
6. Develop evidence hierarchies
|
||||
|
||||
## Use Extended Reasoning:
|
||||
- Explore non-obvious connections
|
||||
- Consider second-order implications
|
||||
- Think about what sources might be missing
|
||||
- Generate novel hypotheses
|
||||
|
||||
## Output Format:
|
||||
```json
|
||||
{
|
||||
"patterns": ["pattern1", "pattern2", ...],
|
||||
"concept_relationships": {"concept1": ["related_to1", "related_to2"], ...},
|
||||
"novel_insights": ["insight1", "insight2", ...],
|
||||
"frameworks": ["framework_description1", ...],
|
||||
"key_arguments": [
|
||||
{
|
||||
"argument": "main claim",
|
||||
"supporting_evidence": ["evidence1", "evidence2"],
|
||||
"strength": "strong|moderate|weak"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
""",
|
||||
ResearchPhase.CRITIQUE: """
|
||||
# Phase 6: CRITIQUE
|
||||
|
||||
Your task: Rigorously evaluate research quality
|
||||
|
||||
## Execute Red Team Analysis:
|
||||
1. Check logical consistency
|
||||
2. Verify citation completeness
|
||||
3. Identify gaps or weaknesses
|
||||
4. Assess balance and objectivity
|
||||
5. Test alternative interpretations
|
||||
6. Challenge assumptions
|
||||
|
||||
## Red Team Questions:
|
||||
- What's missing from this research?
|
||||
- What could be wrong?
|
||||
- What alternative explanations exist?
|
||||
- What biases might be present?
|
||||
- What counterfactuals should be considered?
|
||||
- What would a skeptic say?
|
||||
|
||||
## Output Format:
|
||||
```json
|
||||
{
|
||||
"strengths": ["strength1", "strength2", ...],
|
||||
"weaknesses": ["weakness1", "weakness2", ...],
|
||||
"gaps": ["gap1", "gap2", ...],
|
||||
"biases": ["bias1", "bias2", ...],
|
||||
"improvements_needed": [
|
||||
{
|
||||
"issue": "description",
|
||||
"recommendation": "how to fix",
|
||||
"priority": "high|medium|low"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
""",
|
||||
ResearchPhase.REFINE: """
|
||||
# Phase 7: REFINE
|
||||
|
||||
Your task: Address gaps and strengthen weak areas
|
||||
|
||||
## Execute:
|
||||
1. Conduct additional research for identified gaps
|
||||
2. Strengthen weak arguments with more evidence
|
||||
3. Add missing perspectives
|
||||
4. Resolve contradictions where possible
|
||||
5. Enhance clarity and structure
|
||||
6. Verify all revised content
|
||||
|
||||
## Focus On:
|
||||
- High priority improvements from critique
|
||||
- Missing stakeholder perspectives
|
||||
- Weak evidence chains
|
||||
- Unclear explanations
|
||||
|
||||
## Output:
|
||||
Updated findings, sources, and synthesis with improvements documented.
|
||||
""",
|
||||
ResearchPhase.PACKAGE: """
|
||||
# Phase 8: PACKAGE
|
||||
|
||||
Your task: Deliver professional, actionable research report
|
||||
|
||||
## Generate Complete Report:
|
||||
|
||||
```markdown
|
||||
# Research Report: [Topic]
|
||||
|
||||
## Executive Summary
|
||||
[3-5 key findings bullets]
|
||||
[Primary recommendation]
|
||||
[Confidence level: High/Medium/Low]
|
||||
|
||||
## Introduction
|
||||
### Research Question
|
||||
[Original question]
|
||||
|
||||
### Scope & Methodology
|
||||
[What was investigated and how]
|
||||
|
||||
### Key Assumptions
|
||||
[Important assumptions made]
|
||||
|
||||
## Main Analysis
|
||||
|
||||
### Finding 1: [Title]
|
||||
[Detailed explanation with evidence]
|
||||
[Citations: [1], [2], [3]]
|
||||
|
||||
### Finding 2: [Title]
|
||||
[Detailed explanation with evidence]
|
||||
[Citations: [4], [5], [6]]
|
||||
|
||||
[Continue for all findings...]
|
||||
|
||||
## Synthesis & Insights
|
||||
[Patterns and connections]
|
||||
[Novel insights]
|
||||
[Implications]
|
||||
|
||||
## Limitations & Caveats
|
||||
[Known gaps]
|
||||
[Assumptions]
|
||||
[Areas of uncertainty]
|
||||
|
||||
## Recommendations
|
||||
[Action items]
|
||||
[Next steps]
|
||||
[Further research needs]
|
||||
|
||||
## Bibliography
|
||||
[1] Source 1 full citation
|
||||
[2] Source 2 full citation
|
||||
...
|
||||
|
||||
## Appendix: Methodology
|
||||
[Research process]
|
||||
[Sources consulted]
|
||||
[Verification approach]
|
||||
```
|
||||
|
||||
Save report to file with timestamp.
|
||||
"""
|
||||
}
|
||||
|
||||
return instructions.get(phase, "No instructions available for this phase")
|
||||
|
||||
def execute_phase(self, phase: ResearchPhase) -> Dict[str, Any]:
|
||||
"""Execute a research phase"""
|
||||
print(f"\n{'='*80}")
|
||||
print(f"PHASE {phase.value.upper()}: Starting...")
|
||||
print(f"{'='*80}\n")
|
||||
|
||||
instructions = self.get_phase_instructions(phase)
|
||||
print(instructions)
|
||||
|
||||
# In real usage, Claude will execute these instructions
|
||||
# This returns a structured result that Claude should populate
|
||||
result = {
|
||||
'phase': phase.value,
|
||||
'status': 'instructions_displayed',
|
||||
'timestamp': datetime.now().isoformat()
|
||||
}
|
||||
|
||||
return result
|
||||
|
||||
def run_pipeline(self, query: str) -> str:
|
||||
"""Run complete research pipeline"""
|
||||
print(f"\n{'#'*80}")
|
||||
print(f"# DEEP RESEARCH ENGINE")
|
||||
print(f"# Query: {query}")
|
||||
print(f"# Mode: {self.mode.value}")
|
||||
print(f"{'#'*80}\n")
|
||||
|
||||
# Initialize research
|
||||
self.initialize_research(query)
|
||||
|
||||
# Determine phases based on mode
|
||||
phases = self._get_phases_for_mode()
|
||||
|
||||
# Execute each phase
|
||||
for phase in phases:
|
||||
self.state.phase = phase
|
||||
result = self.execute_phase(phase)
|
||||
|
||||
# Save state after each phase
|
||||
state_file = self.output_dir / f"research_state_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
|
||||
self.state.save(state_file)
|
||||
print(f"\n✓ Phase {phase.value} complete. State saved to: {state_file}\n")
|
||||
|
||||
# Generate report path
|
||||
report_file = self.output_dir / f"research_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md"
|
||||
|
||||
print(f"\n{'='*80}")
|
||||
print(f"RESEARCH PIPELINE COMPLETE")
|
||||
print(f"Report will be saved to: {report_file}")
|
||||
print(f"{'='*80}\n")
|
||||
|
||||
return str(report_file)
|
||||
|
||||
def _get_phases_for_mode(self) -> List[ResearchPhase]:
|
||||
"""Get phases based on research mode"""
|
||||
if self.mode == ResearchMode.QUICK:
|
||||
return [
|
||||
ResearchPhase.SCOPE,
|
||||
ResearchPhase.RETRIEVE,
|
||||
ResearchPhase.PACKAGE
|
||||
]
|
||||
elif self.mode == ResearchMode.STANDARD:
|
||||
return [
|
||||
ResearchPhase.SCOPE,
|
||||
ResearchPhase.PLAN,
|
||||
ResearchPhase.RETRIEVE,
|
||||
ResearchPhase.TRIANGULATE,
|
||||
ResearchPhase.SYNTHESIZE,
|
||||
ResearchPhase.PACKAGE
|
||||
]
|
||||
elif self.mode == ResearchMode.DEEP:
|
||||
return list(ResearchPhase)
|
||||
elif self.mode == ResearchMode.ULTRADEEP:
|
||||
# In ultradeep, we might iterate some phases
|
||||
return list(ResearchPhase)
|
||||
|
||||
return list(ResearchPhase)
|
||||
|
||||
|
||||
def main():
|
||||
"""CLI entry point"""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Deep Research Engine for Claude Code",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
python research_engine.py --query "state of quantum computing 2025" --mode deep
|
||||
python research_engine.py --query "PostgreSQL vs Supabase comparison" --mode standard
|
||||
python research_engine.py -q "longevity biotech funding trends" -m ultradeep
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--query', '-q',
|
||||
type=str,
|
||||
required=True,
|
||||
help='Research question or topic'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--mode', '-m',
|
||||
type=str,
|
||||
choices=['quick', 'standard', 'deep', 'ultradeep'],
|
||||
default='standard',
|
||||
help='Research depth mode (default: standard)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--resume',
|
||||
type=str,
|
||||
help='Resume from saved state file'
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Initialize engine
|
||||
mode = ResearchMode(args.mode)
|
||||
engine = ResearchEngine(mode=mode)
|
||||
|
||||
if args.resume:
|
||||
# Load previous state
|
||||
state_file = Path(args.resume)
|
||||
if not state_file.exists():
|
||||
print(f"Error: State file not found: {state_file}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
engine.state = ResearchState.load(state_file)
|
||||
print(f"Resumed research from: {state_file}")
|
||||
|
||||
# Run pipeline
|
||||
report_path = engine.run_pipeline(args.query)
|
||||
|
||||
print(f"\nResearch complete! Report path: {report_path}")
|
||||
print(f"\nNow Claude should execute each phase using the displayed instructions.")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
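Note on `--resume`: as written above, `main()` loads the saved state into `engine.state`, but `run_pipeline()` then calls `initialize_research()`, which replaces that state and restarts from the first phase. A minimal sketch of one way to pick up from the saved phase follows. `Phase` and `remaining_phases` are hypothetical names introduced here for illustration, not part of the committed file, and the sketch assumes the saved phase should be re-run because the state file does not record whether its work was actually carried out.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """Hypothetical stand-in for ResearchPhase, trimmed to keep the sketch short."""
    SCOPE = "scope"
    RETRIEVE = "retrieve"
    PACKAGE = "package"


def remaining_phases(phases: List[Phase], saved: Phase) -> List[Phase]:
    """Return the phases from `saved` onward (inclusive).

    Assumption: the saved phase is repeated, since the state file only
    records which phase was current, not whether it finished.
    """
    if saved not in phases:
        # The saved phase is not part of this mode's pipeline; start over.
        return list(phases)
    return phases[phases.index(saved):]


if __name__ == "__main__":
    pipeline = list(Phase)
    print([p.value for p in remaining_phases(pipeline, Phase.RETRIEVE)])
```

In the engine, the same idea would mean skipping `initialize_research()` when a state is already loaded and iterating only over the remaining phases.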
292
axhub-make/skills/third-party/deep-research/scripts/source_evaluator.py
vendored
Normal file
@@ -0,0 +1,292 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Source Credibility Evaluator
|
||||
Assesses source quality, credibility, and potential biases
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass
|
||||
from typing import List, Dict, Optional
|
||||
from urllib.parse import urlparse
|
||||
from datetime import datetime, timedelta
|
||||
import re
|
||||
|
||||
|
||||
@dataclass
|
||||
class CredibilityScore:
|
||||
"""Represents source credibility assessment"""
|
||||
overall_score: float # 0-100
|
||||
domain_authority: float # 0-100
|
||||
recency: float # 0-100
|
||||
expertise: float # 0-100
|
||||
bias_score: float # 0-100 (higher = more neutral)
|
||||
factors: Dict[str, str]
|
||||
recommendation: str # "high_trust", "moderate_trust", "low_trust", "verify"
|
||||
|
||||
|
||||
class SourceEvaluator:
|
||||
"""Evaluates source credibility and quality"""
|
||||
|
||||
# Domain reputation tiers
|
||||
HIGH_AUTHORITY_DOMAINS = {
|
||||
# Academic & Research
|
||||
'arxiv.org', 'nature.com', 'science.org', 'cell.com', 'nejm.org',
|
||||
'thelancet.com', 'springer.com', 'sciencedirect.com', 'plos.org',
|
||||
'ieee.org', 'acm.org', 'pubmed.ncbi.nlm.nih.gov',
|
||||
|
||||
# Government & International Organizations
|
||||
'nih.gov', 'cdc.gov', 'who.int', 'fda.gov', 'nasa.gov',
|
||||
'gov.uk', 'europa.eu', 'un.org',
|
||||
|
||||
# Established Tech Documentation
|
||||
'docs.python.org', 'developer.mozilla.org', 'docs.microsoft.com',
|
||||
'cloud.google.com', 'aws.amazon.com', 'kubernetes.io',
|
||||
|
||||
# Reputable News (Fact-check verified)
|
||||
'reuters.com', 'apnews.com', 'bbc.com', 'economist.com',
|
||||
'scientificamerican.com' # 'nature.com' is already listed above; a path such as 'nature.com/news' can never equal a netloc
|
||||
}
|
||||
|
||||
MODERATE_AUTHORITY_DOMAINS = {
|
||||
# Tech News & Analysis
|
||||
'techcrunch.com', 'theverge.com', 'arstechnica.com', 'wired.com',
|
||||
'zdnet.com', 'cnet.com',
|
||||
|
||||
# Industry Publications
|
||||
'forbes.com', 'bloomberg.com', 'wsj.com', 'ft.com',
|
||||
|
||||
# Educational
|
||||
'wikipedia.org', 'britannica.com', 'khanacademy.org',
|
||||
|
||||
# Tech Blogs (established)
|
||||
'medium.com', 'dev.to', 'stackoverflow.com', 'github.com'
|
||||
}
|
||||
|
||||
LOW_AUTHORITY_INDICATORS = [
|
||||
'blogspot.com', 'wordpress.com', 'wix.com', 'substack.com'
|
||||
]
|
||||
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
def evaluate_source(
|
||||
self,
|
||||
url: str,
|
||||
title: str,
|
||||
content: Optional[str] = None,
|
||||
publication_date: Optional[str] = None,
|
||||
author: Optional[str] = None
|
||||
) -> CredibilityScore:
|
||||
"""Evaluate source credibility"""
|
||||
|
||||
domain = self._extract_domain(url)
|
||||
|
||||
# Calculate component scores
|
||||
domain_score = self._evaluate_domain_authority(domain)
|
||||
recency_score = self._evaluate_recency(publication_date)
|
||||
expertise_score = self._evaluate_expertise(domain, title, author)
|
||||
bias_score = self._evaluate_bias(domain, title, content)
|
||||
|
||||
# Calculate overall score (weighted average)
|
||||
overall = (
|
||||
domain_score * 0.35 +
|
||||
recency_score * 0.20 +
|
||||
expertise_score * 0.25 +
|
||||
bias_score * 0.20
|
||||
)
|
||||
|
||||
# Determine factors
|
||||
factors = self._identify_factors(
|
||||
domain, domain_score, recency_score, expertise_score, bias_score
|
||||
)
|
||||
|
||||
# Generate recommendation
|
||||
recommendation = self._generate_recommendation(overall)
|
||||
|
||||
return CredibilityScore(
|
||||
overall_score=round(overall, 2),
|
||||
domain_authority=round(domain_score, 2),
|
||||
recency=round(recency_score, 2),
|
||||
expertise=round(expertise_score, 2),
|
||||
bias_score=round(bias_score, 2),
|
||||
factors=factors,
|
||||
recommendation=recommendation
|
||||
)
|
||||
|
||||
def _extract_domain(self, url: str) -> str:
|
||||
"""Extract domain from URL"""
|
||||
parsed = urlparse(url)
|
||||
domain = parsed.netloc.lower()
|
||||
# Remove www prefix
|
||||
domain = domain[4:] if domain.startswith('www.') else domain
|
||||
return domain
|
||||
|
||||
def _evaluate_domain_authority(self, domain: str) -> float:
|
||||
"""Evaluate domain authority (0-100)"""
|
||||
if domain in self.HIGH_AUTHORITY_DOMAINS:
|
||||
return 90.0
|
||||
elif domain in self.MODERATE_AUTHORITY_DOMAINS:
|
||||
return 70.0
|
||||
elif any(indicator in domain for indicator in self.LOW_AUTHORITY_INDICATORS):
|
||||
return 40.0
|
||||
else:
|
||||
# Unknown domain - moderate skepticism
|
||||
return 55.0
|
||||
|
||||
def _evaluate_recency(self, publication_date: Optional[str]) -> float:
|
||||
"""Evaluate information recency (0-100)"""
|
||||
if not publication_date:
|
||||
return 50.0 # Unknown date
|
||||
|
||||
try:
|
||||
pub_date = datetime.fromisoformat(publication_date.replace('Z', '+00:00'))
|
||||
age = datetime.now(pub_date.tzinfo) - pub_date # match naive/aware so 'Z' dates do not raise TypeError
|
||||
|
||||
# Recency scoring
|
||||
if age < timedelta(days=90): # < 3 months
|
||||
return 100.0
|
||||
elif age < timedelta(days=365): # < 1 year
|
||||
return 85.0
|
||||
elif age < timedelta(days=730): # < 2 years
|
||||
return 70.0
|
||||
elif age < timedelta(days=1825): # < 5 years
|
||||
return 50.0
|
||||
else:
|
||||
return 30.0
|
||||
|
||||
except Exception:
|
||||
return 50.0
|
||||
|
||||
def _evaluate_expertise(
|
||||
self,
|
||||
domain: str,
|
||||
title: str,
|
||||
author: Optional[str]
|
||||
) -> float:
|
||||
"""Evaluate source expertise (0-100)"""
|
||||
score = 50.0
|
||||
|
||||
# Academic/research domains get high expertise
|
||||
if any(d in domain for d in ['arxiv', 'nature', 'science', 'ieee', 'acm']):
|
||||
score += 30
|
||||
|
||||
# Government/official sources
|
||||
if '.gov' in domain or 'who.int' in domain:
|
||||
score += 25
|
||||
|
||||
# Technical documentation
|
||||
if 'docs.' in domain or 'documentation' in title.lower():
|
||||
score += 20
|
||||
|
||||
# Author credentials (if available)
|
||||
if author:
|
||||
if any(credential in author.lower() for credential in ['dr.', 'phd', 'professor']):
|
||||
score += 15
|
||||
|
||||
return min(score, 100.0)
|
||||
|
||||
def _evaluate_bias(
|
||||
self,
|
||||
domain: str,
|
||||
title: str,
|
||||
content: Optional[str]
|
||||
) -> float:
|
||||
"""Evaluate potential bias (0-100, higher = more neutral)"""
|
||||
score = 70.0 # Start neutral
|
||||
|
||||
# Check for sensationalism in title
|
||||
sensational_indicators = [
|
||||
'!', 'shocking', 'unbelievable', 'you won\'t believe',
|
||||
'secret', 'they don\'t want you to know'
|
||||
]
|
||||
title_lower = title.lower()
|
||||
if any(indicator in title_lower for indicator in sensational_indicators):
|
||||
score -= 20
|
||||
|
||||
# Academic sources are typically less biased
|
||||
if any(d in domain for d in ['arxiv', 'nature', 'science', 'ieee']):
|
||||
score += 20
|
||||
|
||||
# Check for balance in content (if available)
|
||||
if content:
|
||||
# Look for balanced language
|
||||
balanced_indicators = ['however', 'although', 'on the other hand', 'critics argue']
|
||||
if any(indicator in content.lower() for indicator in balanced_indicators):
|
||||
score += 10
|
||||
|
||||
return min(max(score, 0), 100.0)
|
||||
|
||||
def _identify_factors(
|
||||
self,
|
||||
domain: str,
|
||||
domain_score: float,
|
||||
recency_score: float,
|
||||
expertise_score: float,
|
||||
bias_score: float
|
||||
) -> Dict[str, str]:
|
||||
"""Identify key credibility factors"""
|
||||
factors = {}
|
||||
|
||||
if domain_score >= 85:
|
||||
factors['domain'] = "High authority domain"
|
||||
elif domain_score <= 45:
|
||||
factors['domain'] = "Low authority domain - verify claims"
|
||||
|
||||
if recency_score >= 85:
|
||||
factors['recency'] = "Recent information"
|
||||
elif recency_score <= 40:
|
||||
factors['recency'] = "Outdated information - verify currency"
|
||||
|
||||
if expertise_score >= 80:
|
||||
factors['expertise'] = "Expert source"
|
||||
elif expertise_score <= 45:
|
||||
factors['expertise'] = "Limited expertise indicators"
|
||||
|
||||
if bias_score >= 80:
|
||||
factors['bias'] = "Balanced perspective"
|
||||
elif bias_score <= 50:
|
||||
factors['bias'] = "Potential bias detected"
|
||||
|
||||
return factors
|
||||
|
||||
def _generate_recommendation(self, overall_score: float) -> str:
|
||||
"""Generate trust recommendation"""
|
||||
if overall_score >= 80:
|
||||
return "high_trust"
|
||||
elif overall_score >= 60:
|
||||
return "moderate_trust"
|
||||
elif overall_score >= 40:
|
||||
return "low_trust"
|
||||
else:
|
||||
return "verify"
|
||||
|
||||
|
||||
# Example usage
|
||||
if __name__ == '__main__':
|
||||
evaluator = SourceEvaluator()
|
||||
|
||||
# Test sources
|
||||
test_sources = [
|
||||
{
|
||||
'url': 'https://www.nature.com/articles/s41586-2025-12345',
|
||||
'title': 'Breakthrough in Quantum Computing',
|
||||
'publication_date': '2025-10-15'
|
||||
},
|
||||
{
|
||||
'url': 'https://someblog.wordpress.com/shocking-discovery',
|
||||
'title': 'SHOCKING! You Won\'t Believe This Discovery!',
|
||||
'publication_date': '2020-01-01'
|
||||
},
|
||||
{
|
||||
'url': 'https://docs.python.org/3/library/asyncio.html',
|
||||
'title': 'asyncio — Asynchronous I/O',
|
||||
'publication_date': '2025-11-01'
|
||||
}
|
||||
]
|
||||
|
||||
for source in test_sources:
|
||||
score = evaluator.evaluate_source(**source)
|
||||
print(f"\nSource: {source['title']}")
|
||||
print(f"URL: {source['url']}")
|
||||
print(f"Overall Score: {score.overall_score}/100")
|
||||
print(f"Recommendation: {score.recommendation}")
|
||||
print(f"Factors: {score.factors}")
|
||||
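Note on domain matching: `_evaluate_domain_authority` above compares the whole host against the domain sets, so a subdomain such as `en.wikipedia.org` falls through to the unknown-domain score. A minimal sketch of suffix-based matching follows. `KNOWN_DOMAINS` and `matches_known_domain` are hypothetical names used only for this illustration, and the set is a shortened stand-in for the real tiers.

```python
from urllib.parse import urlparse

# Hypothetical, shortened stand-in for HIGH_AUTHORITY_DOMAINS.
KNOWN_DOMAINS = {"nature.com", "wikipedia.org", "docs.python.org"}


def matches_known_domain(url: str, known=KNOWN_DOMAINS) -> bool:
    """Return True if the URL's host, or any parent domain of it, is in `known`."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then progressively shorter suffixes:
    # en.wikipedia.org -> wikipedia.org -> org
    return any(".".join(parts[i:]) in known for i in range(len(parts)))


if __name__ == "__main__":
    for candidate in (
        "https://en.wikipedia.org/wiki/Quantum_computing",
        "https://www.nature.com/articles/example",
        "https://notnature.com/post",
    ):
        print(candidate, matches_known_domain(candidate))
```

Matching on label boundaries also avoids the false positives that plain substring checks can produce, for example `'acm' in 'acme.com'`.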
354
axhub-make/skills/third-party/deep-research/scripts/validate_report.py
vendored
Normal file
@@ -0,0 +1,354 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Report Validation Script
|
||||
Ensures research reports meet quality standards before delivery
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import List, Tuple, Dict
|
||||
|
||||
|
||||
class ReportValidator:
|
||||
"""Validates research report quality"""
|
||||
|
||||
def __init__(self, report_path: Path):
|
||||
self.report_path = report_path
|
||||
self.content = self._read_report()
|
||||
self.errors: List[str] = []
|
||||
self.warnings: List[str] = []
|
||||
|
||||
def _read_report(self) -> str:
|
||||
"""Read report file"""
|
||||
try:
|
||||
with open(self.report_path, 'r', encoding='utf-8') as f:
|
||||
return f.read()
|
||||
except Exception as e:
|
||||
print(f"❌ ERROR: Cannot read report: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
def validate(self) -> bool:
|
||||
"""Run all validation checks"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"VALIDATING REPORT: {self.report_path.name}")
|
||||
print(f"{'='*60}\n")
|
||||
|
||||
checks = [
|
||||
("Executive Summary", self._check_executive_summary),
|
||||
("Required Sections", self._check_required_sections),
|
||||
("Citations", self._check_citations),
|
||||
("Bibliography", self._check_bibliography),
|
||||
("Placeholder Text", self._check_placeholders),
|
||||
("Content Truncation", self._check_content_truncation),
|
||||
("Word Count", self._check_word_count),
|
||||
("Source Count", self._check_source_count),
|
||||
("Broken Links", self._check_broken_references),
|
||||
]
|
||||
|
||||
for check_name, check_func in checks:
|
||||
print(f"⏳ Checking: {check_name}...", end=" ")
|
||||
passed = check_func()
|
||||
if passed:
|
||||
print("✅ PASS")
|
||||
else:
|
||||
print("❌ FAIL")
|
||||
|
||||
self._print_summary()
|
||||
|
||||
return len(self.errors) == 0
|
||||
|
||||
def _check_executive_summary(self) -> bool:
|
||||
"""Check executive summary exists and is under 250 words"""
|
||||
pattern = r'## Executive Summary(.*?)(?=##|\Z)'
|
||||
match = re.search(pattern, self.content, re.DOTALL | re.IGNORECASE)
|
||||
|
||||
if not match:
|
||||
self.errors.append("Missing 'Executive Summary' section")
|
||||
return False
|
||||
|
||||
summary = match.group(1).strip()
|
||||
word_count = len(summary.split())
|
||||
|
||||
if word_count > 250:
|
||||
self.warnings.append(f"Executive summary too long: {word_count} words (should be ≤250)")
|
||||
|
||||
if word_count < 50:
|
||||
self.warnings.append(f"Executive summary too short: {word_count} words (should be ≥50)")
|
||||
|
||||
return True
|
||||
|
||||
def _check_required_sections(self) -> bool:
|
||||
"""Check all required sections are present"""
|
||||
required = [
|
||||
"Executive Summary",
|
||||
"Introduction",
|
||||
"Main Analysis",
|
||||
"Synthesis",
|
||||
"Limitations",
|
||||
"Recommendations",
|
||||
"Bibliography",
|
||||
"Methodology"
|
||||
]
|
||||
|
||||
# Recommended sections (warnings if missing, not errors)
|
||||
recommended = [
|
||||
"Counterevidence Register",
|
||||
"Claims-Evidence Table"
|
||||
]
|
||||
|
||||
missing = []
|
||||
for section in required:
|
||||
if not re.search(rf'##.*{section}', self.content, re.IGNORECASE):
|
||||
missing.append(section)
|
||||
|
||||
if missing:
|
||||
self.errors.append(f"Missing sections: {', '.join(missing)}")
|
||||
return False
|
||||
|
||||
# Check recommended sections (warnings only)
|
||||
missing_recommended = []
|
||||
for section in recommended:
|
||||
if not re.search(rf'##.*{section}', self.content, re.IGNORECASE):
|
||||
missing_recommended.append(section)
|
||||
|
||||
if missing_recommended:
|
||||
self.warnings.append(f"Missing recommended sections (for academic rigor): {', '.join(missing_recommended)}")
|
||||
|
||||
return True
|
||||
|
||||
def _check_citations(self) -> bool:
|
||||
"""Check citation format and presence"""
|
||||
# Find all citation references [1], [2], etc.
|
||||
citations = re.findall(r'\[(\d+)\]', self.content)
|
||||
|
||||
if not citations:
|
||||
self.errors.append("No citations found in report")
|
||||
return False
|
||||
|
||||
unique_citations = set(citations)
|
||||
|
||||
if len(unique_citations) < 10:
|
||||
self.warnings.append(f"Only {len(unique_citations)} unique sources cited (recommended: ≥10)")
|
||||
|
||||
# Check for consecutive citation numbers
|
||||
citation_nums = sorted([int(c) for c in unique_citations])
|
||||
if citation_nums:
|
||||
max_citation = max(citation_nums)
|
||||
expected = set(range(1, max_citation + 1))
|
||||
missing = expected - set(citation_nums)
|
||||
|
||||
if missing:
|
||||
self.warnings.append(f"Non-consecutive citation numbers, missing: {sorted(missing)}")
|
||||
|
||||
return True
|
||||
|
||||
    def _check_bibliography(self) -> bool:
        """Check bibliography exists, matches citations, and has no truncation placeholders"""
        pattern = r'## Bibliography(.*?)(?=##|\Z)'
        match = re.search(pattern, self.content, re.DOTALL | re.IGNORECASE)

        if not match:
            self.errors.append("Missing 'Bibliography' section")
            return False

        bib_section = match.group(1)

        # CRITICAL: Check for truncation placeholders (2025 CiteGuard enhancement)
        truncation_patterns = [
            (r'\[\d+-\d+\]', 'Citation range (e.g., [8-75])'),
            (r'Additional.*citations', 'Phrase "Additional citations"'),
            (r'would be included', 'Phrase "would be included"'),
            (r'\[\.\.\.continue', 'Pattern "[...continue"'),
            (r'\[Continue with', 'Pattern "[Continue with"'),
            (r'etc\.(?!\w)', 'Standalone "etc."'),
            (r'and so on', 'Phrase "and so on"'),
        ]

        for pattern_re, description in truncation_patterns:
            if re.search(pattern_re, bib_section, re.IGNORECASE):
                self.errors.append(f"⚠️ CRITICAL: Bibliography contains truncation placeholder: {description}")
                self.errors.append(" This makes the report UNUSABLE - complete bibliography required")
                return False

        # Count bibliography entries [1], [2], etc.
        bib_entries = re.findall(r'^\[(\d+)\]', bib_section, re.MULTILINE)

        if not bib_entries:
            self.errors.append("Bibliography has no entries")
            return False

        # Check citation number continuity (no gaps)
        bib_nums = sorted([int(n) for n in bib_entries])
        if bib_nums:
            expected = list(range(1, bib_nums[-1] + 1))
            actual = bib_nums
            missing = [n for n in expected if n not in actual]
            if missing:
                self.errors.append(f"Bibliography has gaps in numbering: missing {missing}")
                return False

        # Find citations in the body text only. The bibliography section is
        # excluded; otherwise every entry would count as cited and the
        # unused-entry check below could never fire.
        body_text = self.content[:match.start(1)] + self.content[match.end(1):]
        text_citations = set(re.findall(r'\[(\d+)\]', body_text))
        bib_citations = set(bib_entries)

        # Check all citations have bibliography entries
        missing_in_bib = text_citations - bib_citations
        if missing_in_bib:
            self.errors.append(f"Citations missing from bibliography: {sorted(missing_in_bib)}")
            return False

        # Check for unused bibliography entries
        unused = bib_citations - text_citations
        if unused:
            self.warnings.append(f"Unused bibliography entries: {sorted(unused)}")

        return True

    def _check_placeholders(self) -> bool:
        """Check for placeholder text that shouldn't be in final report"""
        placeholders = [
            'TBD', 'TODO', 'FIXME', 'XXX',
            '[citation needed]', '[needs citation]',
            '[placeholder]', '[TODO]', '[TBD]'
        ]

        found_placeholders = []
        for placeholder in placeholders:
            if placeholder in self.content:
                found_placeholders.append(placeholder)

        if found_placeholders:
            self.errors.append(f"Found placeholder text: {', '.join(found_placeholders)}")
            return False

        return True

    def _check_content_truncation(self) -> bool:
        """Check for content truncation patterns (2025 Progressive Assembly enhancement)"""
        truncation_patterns = [
            (r'Content continues', 'Phrase "Content continues"'),
            (r'Due to length', 'Phrase "Due to length"'),
            (r'would continue', 'Phrase "would continue"'),
            (r'\[Sections \d+-\d+', 'Pattern "[Sections X-Y"'),
            (r'Additional sections', 'Phrase "Additional sections"'),
            (r'comprehensive.*word document that continues', 'Pattern "comprehensive...document that continues"'),
        ]

        for pattern_re, description in truncation_patterns:
            if re.search(pattern_re, self.content, re.IGNORECASE):
                self.errors.append(f"⚠️ CRITICAL: Content truncation detected: {description}")
                self.errors.append(" Report is INCOMPLETE and UNUSABLE - regenerate with progressive assembly")
                return False

        return True

    def _check_word_count(self) -> bool:
        """Check overall report length"""
        word_count = len(self.content.split())

        if word_count < 500:
            self.warnings.append(f"Report is very short: {word_count} words (consider expanding)")
        # No upper limit warning - progressive assembly supports unlimited lengths

        return True

    def _check_source_count(self) -> bool:
        """Check minimum source count"""
        pattern = r'## Bibliography(.*?)(?=##|\Z)'
        match = re.search(pattern, self.content, re.DOTALL | re.IGNORECASE)

        if not match:
            return True  # Already caught in bibliography check

        bib_section = match.group(1)
        bib_entries = re.findall(r'^\[(\d+)\]', bib_section, re.MULTILINE)

        source_count = len(set(bib_entries))

        if source_count < 10:
            self.warnings.append(f"Only {source_count} sources (recommended: ≥10)")

        return True

    def _check_broken_references(self) -> bool:
        """Check for broken internal references"""
        # Find all markdown links [text](./path)
        internal_links = re.findall(r'\[.*?\]\((\.\/.*?)\)', self.content)

        broken = []
        for link in internal_links:
            # Remove anchor if present
            link_path = link.split('#')[0]
            full_path = self.report_path.parent / link_path

            if not full_path.exists():
                broken.append(link)

        if broken:
            self.errors.append(f"Broken internal links: {', '.join(broken)}")
            return False

        return True

    def _print_summary(self):
        """Print validation summary"""
        print(f"\n{'='*60}")
        print("VALIDATION SUMMARY")
        print(f"{'='*60}\n")

        if self.errors:
            print(f"❌ ERRORS ({len(self.errors)}):")
            for error in self.errors:
                print(f" • {error}")
            print()

        if self.warnings:
            print(f"⚠️ WARNINGS ({len(self.warnings)}):")
            for warning in self.warnings:
                print(f" • {warning}")
            print()

        if not self.errors and not self.warnings:
            print("✅ ALL CHECKS PASSED - Report meets quality standards!\n")
        elif not self.errors:
            print("✅ VALIDATION PASSED (with warnings)\n")
        else:
            print("❌ VALIDATION FAILED - Please fix errors before delivery\n")


def main():
    parser = argparse.ArgumentParser(
        description="Validate research report quality",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python validate_report.py --report report.md
  python validate_report.py -r ~/.claude/research_output/research_report_20251104_153045.md
        """
    )

    parser.add_argument(
        '--report', '-r',
        type=str,
        required=True,
        help='Path to research report markdown file'
    )

    args = parser.parse_args()

    report_path = Path(args.report)

    if not report_path.exists():
        print(f"❌ ERROR: Report file not found: {report_path}")
        sys.exit(1)

    validator = ReportValidator(report_path)
    passed = validator.validate()

    sys.exit(0 if passed else 1)


if __name__ == '__main__':
    main()
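The bibliography check in validate_report.py compares `[N]` markers in the report with the numbered entries under `## Bibliography`. A minimal standalone sketch of that cross-check follows. The function name `cross_check_citations` and the returned dictionary keys are illustrative assumptions, not part of the script; the two regular expressions are the ones the script uses. The sketch removes the bibliography section from the text before collecting in-text markers, so entries that are never cited can be reported.

```python
import re


def cross_check_citations(content: str) -> dict:
    """Hypothetical helper: compare in-text [N] markers with bibliography entries."""
    match = re.search(r'## Bibliography(.*?)(?=##|\Z)', content,
                      re.DOTALL | re.IGNORECASE)
    if not match:
        return {'missing_in_bib': [], 'unused': [], 'has_bibliography': False}

    bib_section = match.group(1)
    # Body text is everything except the bibliography section itself.
    body = content[:match.start()] + content[match.end():]

    # Entries are lines that start with [N]; citations are any [N] in the body.
    bib = {int(n) for n in re.findall(r'^\[(\d+)\]', bib_section, re.MULTILINE)}
    cited = {int(n) for n in re.findall(r'\[(\d+)\]', body)}
    return {
        'missing_in_bib': sorted(cited - bib),  # cited but no entry
        'unused': sorted(bib - cited),          # entry but never cited
        'has_bibliography': True,
    }


report = """# Demo

Claim one [1]. Claim two [3].

## Bibliography
[1] Author A (2020). "First". Venue.
[2] Author B (2021). "Second". Venue.
"""
result = cross_check_citations(report)
print(result)
```

In this sample, citation 3 has no entry and entry 2 is never cited, so both lists are non-empty.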
430
axhub-make/skills/third-party/deep-research/scripts/verify_citations.py
vendored
Normal file
@@ -0,0 +1,430 @@
#!/usr/bin/env python3
"""
Citation Verification Script (Enhanced with CiteGuard techniques)

Catches fabricated citations by checking:
1. DOI resolution (via doi.org)
2. Basic metadata matching (title similarity, year match)
3. URL accessibility verification
4. Hallucination pattern detection (generic titles, suspicious patterns)
5. Flags suspicious entries for manual review

Enhanced in 2025 with:
- Content alignment checking (when URL available)
- Multi-source verification (DOI + URL + metadata cross-check)
- Advanced hallucination detection patterns
- Better false positive reduction

Usage:
    python verify_citations.py --report [path]
    python verify_citations.py --report [path] --strict  # Fail on any unverified

Does NOT require API keys - uses free DOI resolver and heuristics.
"""

import sys
import argparse
import re
from pathlib import Path
from typing import List, Dict, Tuple
from urllib import request, error
from urllib.parse import quote
import json
import time

class CitationVerifier:
    """Verify citations in research report"""

    def __init__(self, report_path: Path, strict_mode: bool = False):
        self.report_path = report_path
        self.strict_mode = strict_mode
        self.content = self._read_report()
        self.suspicious = []
        self.verified = []
        self.errors = []

        # Hallucination detection patterns (2025 CiteGuard enhancement)
        self.suspicious_patterns = [
            # Generic academic-sounding but fake patterns
            (r'^(A |An |The )?(Study|Analysis|Review|Survey|Investigation) (of|on|into)',
             "Generic academic title pattern"),
            (r'^(Recent|Current|Modern|Contemporary) (Advances|Developments|Trends) in',
             "Generic 'advances' title pattern"),
            # Too perfect, templated titles
            (r'^[A-Z][a-z]+ [A-Z][a-z]+: A (Comprehensive|Complete|Systematic) (Review|Analysis|Guide)$',
             "Too perfect, templated structure"),
        ]

    def _read_report(self) -> str:
        """Read report file"""
        try:
            with open(self.report_path, 'r', encoding='utf-8') as f:
                return f.read()
        except Exception as e:
            print(f"❌ ERROR: Cannot read report: {e}")
            sys.exit(1)

    def extract_bibliography(self) -> List[Dict]:
        """Extract bibliography entries from report"""
        pattern = r'## Bibliography(.*?)(?=##|\Z)'
        match = re.search(pattern, self.content, re.DOTALL | re.IGNORECASE)

        if not match:
            self.errors.append("No Bibliography section found")
            return []

        bib_section = match.group(1)

        # Parse entries: [N] Author (Year). "Title". Venue. URL
        entries = []
        lines = bib_section.strip().split('\n')

        current_entry = None
        for line in lines:
            line = line.strip()
            if not line:
                continue

            # Check if starts with citation number [N]
            match_num = re.match(r'^\[(\d+)\]\s+(.+)$', line)
            if match_num:
                if current_entry:
                    entries.append(current_entry)

                num = match_num.group(1)
                rest = match_num.group(2)

                # Try to parse: Author (Year). "Title". Venue. URL
                year_match = re.search(r'\((\d{4})\)', rest)
                title_match = re.search(r'"([^"]+)"', rest)
                doi_match = re.search(r'doi\.org/(10\.\S+)', rest)
                url_match = re.search(r'https?://[^\s\)]+', rest)

                current_entry = {
                    'num': num,
                    'raw': rest,
                    'year': year_match.group(1) if year_match else None,
                    'title': title_match.group(1) if title_match else None,
                    # \S+ also captures sentence punctuation that follows the DOI, so strip it
                    'doi': doi_match.group(1).rstrip('.,;') if doi_match else None,
                    'url': url_match.group(0) if url_match else None
                }
            elif current_entry:
                # Multi-line entry, append to raw
                current_entry['raw'] += ' ' + line

        if current_entry:
            entries.append(current_entry)

        return entries

    def verify_doi(self, doi: str) -> Tuple[bool, Dict]:
        """
        Verify DOI exists and get metadata.
        Returns (success, metadata_dict)
        """
        if not doi:
            return False, {}

        try:
            # Use content negotiation to get JSON metadata
            url = f"https://doi.org/{quote(doi)}"
            req = request.Request(url)
            req.add_header('Accept', 'application/vnd.citationstyles.csl+json')

            with request.urlopen(req, timeout=10) as response:
                data = json.loads(response.read().decode('utf-8'))

            return True, {
                'title': data.get('title', ''),
                'year': data.get('issued', {}).get('date-parts', [[None]])[0][0],
                'authors': [
                    f"{a.get('family', '')} {a.get('given', '')}"
                    for a in data.get('author', [])
                ],
                'venue': data.get('container-title', '')
            }
        except error.HTTPError as e:
            if e.code == 404:
                return False, {'error': 'DOI not found (404)'}
            return False, {'error': f'HTTP {e.code}'}
        except Exception as e:
            return False, {'error': str(e)}

    def verify_url(self, url: str) -> Tuple[bool, str]:
        """
        Verify URL is accessible (2025 CiteGuard enhancement).
        Returns (accessible, status_message)
        """
        if not url:
            return False, "No URL"

        try:
            # HEAD request to check accessibility without downloading
            req = request.Request(url, method='HEAD')
            req.add_header('User-Agent', 'Mozilla/5.0 (Research Citation Verifier)')

            with request.urlopen(req, timeout=10) as response:
                if response.status == 200:
                    return True, "URL accessible"
                else:
                    return False, f"HTTP {response.status}"
        except error.HTTPError as e:
            return False, f"HTTP {e.code}"
        except error.URLError as e:
            return False, f"URL error: {e.reason}"
        except Exception as e:
            return False, f"Connection error: {str(e)[:50]}"

    def detect_hallucination_patterns(self, entry: Dict) -> List[str]:
        """
        Detect common LLM hallucination patterns in citations (2025 CiteGuard).
        Returns list of detected issues.
        """
        issues = []
        title = entry.get('title', '')

        if not title:
            return issues

        # Check against suspicious patterns
        for pattern, description in self.suspicious_patterns:
            if re.match(pattern, title, re.IGNORECASE):
                issues.append(f"Suspicious title pattern: {description}")

        # Check for overly generic titles
        generic_words = ['overview', 'introduction', 'guide', 'handbook', 'manual']
        if any(word in title.lower() for word in generic_words) and len(title.split()) < 5:
            issues.append("Very generic short title")

        # Check for placeholder-like titles
        if any(x in title.lower() for x in ['tbd', 'todo', 'placeholder', 'example']):
            issues.append("Placeholder text in title")

        # Check for inconsistent metadata
        if entry.get('year'):
            year = int(entry['year'])
            # Very recent without DOI or URL is suspicious
            if year >= 2024 and not entry.get('doi') and not entry.get('url'):
                issues.append("Recent year (2024+) with no verification method")
            # Future year is definitely wrong (compared with the current
            # year at run time rather than a hard-coded 2025)
            if year > time.localtime().tm_year:
                issues.append(f"Future year: {year}")
            # Very old with modern phrasing is suspicious
            if year < 2000 and any(word in title.lower() for word in ['ai', 'llm', 'gpt', 'transformer']):
                issues.append(f"Anachronistic: pre-2000 ({year}) citation mentioning modern AI terms")

        return issues

    def check_title_similarity(self, title1: str, title2: str) -> float:
        """
        Simple title similarity check (word overlap).
        Returns score 0.0-1.0
        """
        if not title1 or not title2:
            return 0.0

        # Normalize: lowercase, remove punctuation, split
        def normalize(s):
            s = s.lower()
            s = re.sub(r'[^\w\s]', ' ', s)
            return set(s.split())

        words1 = normalize(title1)
        words2 = normalize(title2)

        if not words1 or not words2:
            return 0.0

        overlap = len(words1 & words2)
        total = len(words1 | words2)

        return overlap / total if total > 0 else 0.0

    def verify_entry(self, entry: Dict) -> Dict:
        """Verify a single bibliography entry (Enhanced 2025 with CiteGuard)"""
        result = {
            'num': entry['num'],
            'status': 'unknown',
            'issues': [],
            'metadata': {},
            'verification_methods': []
        }

        # STEP 1: Run hallucination detection (CiteGuard 2025)
        hallucination_issues = self.detect_hallucination_patterns(entry)
        if hallucination_issues:
            result['issues'].extend(hallucination_issues)
            result['status'] = 'suspicious'

        # STEP 2: Has DOI?
        if entry['doi']:
            print(f" [{entry['num']}] Checking DOI {entry['doi']}...", end=' ')
            success, metadata = self.verify_doi(entry['doi'])

            if success:
                result['metadata'] = metadata
                result['status'] = 'verified'
                print("✓")

                # Check title similarity if we have both
                if entry['title'] and metadata.get('title'):
                    similarity = self.check_title_similarity(
                        entry['title'],
                        metadata['title']
                    )

                    if similarity < 0.5:
                        result['issues'].append(
                            f"Title mismatch (similarity: {similarity:.1%})"
                        )
                        result['status'] = 'suspicious'

                # Check year match
                if entry['year'] and metadata.get('year'):
                    if int(entry['year']) != int(metadata['year']):
                        result['issues'].append(
                            f"Year mismatch: report says {entry['year']}, DOI says {metadata['year']}"
                        )
                        result['status'] = 'suspicious'

            else:
                print(f"✗ {metadata.get('error', 'Failed')}")
                result['status'] = 'unverified'
                result['issues'].append(f"DOI resolution failed: {metadata.get('error', 'unknown')}")

        # STEP 3: Check URL accessibility (if no DOI or DOI failed)
        if entry['url'] and result['status'] != 'verified':
            url_ok, url_status = self.verify_url(entry['url'])
            if url_ok:
                result['verification_methods'].append('URL')
                # Upgrade status if URL verifies
                if result['status'] in ['unknown', 'no_doi', 'unverified']:
                    result['status'] = 'url_verified'
                    print(f" [{entry['num']}] URL accessible ✓")
            else:
                result['issues'].append(f"URL check failed: {url_status}")

        # STEP 4: Final fallback - no verification method
        if not entry['doi'] and not entry['url']:
            if 'No DOI provided' not in ' '.join(result['issues']):
                result['issues'].append("No DOI or URL - cannot verify")
            result['status'] = 'suspicious'

        return result

    def verify_all(self):
        """Verify all bibliography entries"""
        print(f"\n{'='*60}")
        print(f"CITATION VERIFICATION: {self.report_path.name}")
        print(f"{'='*60}\n")

        entries = self.extract_bibliography()

        if not entries:
            print("❌ No bibliography entries found\n")
            return False

        print(f"Found {len(entries)} citations\n")

        results = []
        for entry in entries:
            result = self.verify_entry(entry)
            results.append(result)

            # Rate limiting
            time.sleep(0.5)

        # Summarize
        print(f"\n{'='*60}")
        print("VERIFICATION SUMMARY")
        print(f"{'='*60}\n")

        verified = [r for r in results if r['status'] == 'verified']
        url_verified = [r for r in results if r['status'] == 'url_verified']
        suspicious = [r for r in results if r['status'] == 'suspicious']
        unverified = [r for r in results if r['status'] in ['unverified', 'no_doi', 'unknown']]

        print(f'DOI Verified: {len(verified)}/{len(results)}')
        print(f'URL Verified: {len(url_verified)}/{len(results)}')
        print(f'Suspicious: {len(suspicious)}/{len(results)}')
        print(f'Unverified: {len(unverified)}/{len(results)}')
        print()

        if suspicious:
            print('SUSPICIOUS CITATIONS (Manual Review Needed):')
            for r in suspicious:
                print(f"\n [{r['num']}]")
                for issue in r['issues']:
                    print(f" - {issue}")
            print()

        if unverified:
            print('UNVERIFIED CITATIONS (Could not check):')
            for r in unverified:
                print(f" [{r['num']}] {r['issues'][0] if r['issues'] else 'Unknown'}")
            print()

        # Decision (Enhanced 2025 - includes URL-verified as acceptable)
        total_verified = len(verified) + len(url_verified)

        if suspicious:
            print('WARNING: Suspicious citations detected')
            if self.strict_mode:
                print(' STRICT MODE: Failing due to suspicious citations')
                return False
            else:
                print(' (Continuing in non-strict mode)')

        if self.strict_mode and unverified:
            print('STRICT MODE: Unverified citations found')
            return False

        if total_verified / len(results) < 0.5:
            print('WARNING: Less than 50% citations verified')
            return True  # Pass with warning
        else:
            print('CITATION VERIFICATION PASSED')
            return True


def main():
    parser = argparse.ArgumentParser(
        description="Verify citations in research report",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python verify_citations.py --report report.md

Note: Requires internet connection to check DOIs.
Uses free DOI resolver - no API key needed.
        """
    )

    parser.add_argument(
        '--report', '-r',
        type=str,
        required=True,
        help='Path to research report markdown file'
    )

    parser.add_argument(
        '--strict',
        action='store_true',
        help='Strict mode: fail on any unverified or suspicious citations'
    )

    args = parser.parse_args()
    report_path = Path(args.report)

    if not report_path.exists():
        print(f"ERROR: Report file not found: {report_path}")
        sys.exit(1)

    verifier = CitationVerifier(report_path, strict_mode=args.strict)
    passed = verifier.verify_all()

    sys.exit(0 if passed else 1)


if __name__ == '__main__':
    main()
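The title comparison in `check_title_similarity` is a word-overlap score: the size of the intersection of the two word sets divided by the size of their union (the Jaccard index). The sketch below restates it as a standalone function so the 0.5 threshold used in `verify_entry` can be tried on sample titles. The function name `title_similarity` and the sample titles are illustrative assumptions; the normalization mirrors the script.

```python
import re


def title_similarity(title1: str, title2: str) -> float:
    """Word-overlap (Jaccard) similarity between two titles, in [0.0, 1.0]."""
    def normalize(s: str) -> set:
        # Lowercase, replace punctuation with spaces, split into a word set.
        return set(re.sub(r'[^\w\s]', ' ', s.lower()).split())

    words1, words2 = normalize(title1), normalize(title2)
    if not words1 or not words2:
        return 0.0
    return len(words1 & words2) / len(words1 | words2)


# Same words, different case and punctuation: full overlap.
same = title_similarity("Attention Is All You Need", "Attention is all you need.")
# Two of four distinct words shared: exactly on the 0.5 threshold.
partial = title_similarity("Deep Learning for Vision", "Deep Learning")
# No shared words.
disjoint = title_similarity("Graph Neural Networks", "Protein Folding Dynamics")

print(same, partial, disjoint)
```

Because the script flags a mismatch only when the score is strictly below 0.5, a title that shares half of the combined vocabulary, as in the second case, still passes.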
220
axhub-make/skills/third-party/deep-research/scripts/verify_html.py
vendored
Normal file
@@ -0,0 +1,220 @@
#!/usr/bin/env python3
"""
HTML Report Verification Script
Validates that HTML reports are properly generated with all sections from MD
"""

import argparse
import re
from pathlib import Path
from typing import List, Tuple


class HTMLVerifier:
    """Verify HTML research reports"""

    def __init__(self, html_path: Path, md_path: Path):
        self.html_path = html_path
        self.md_path = md_path
        self.errors = []
        self.warnings = []

    def verify(self) -> bool:
        """
        Run all verification checks

        Returns:
            True if all checks pass, False otherwise
        """
        print(f"\n{'='*60}")
        print("HTML REPORT VERIFICATION")
        print(f"{'='*60}\n")

        print(f"HTML File: {self.html_path}")
        print(f"MD File: {self.md_path}\n")

        # Read files (explicit UTF-8 so the result does not depend on the platform default)
        try:
            html_content = self.html_path.read_text(encoding='utf-8')
            md_content = self.md_path.read_text(encoding='utf-8')
        except Exception as e:
            self.errors.append(f"Failed to read files: {e}")
            return False

        # Run checks
        self._check_sections(html_content, md_content)
        self._check_no_placeholders(html_content)
        self._check_no_emojis(html_content)
        self._check_structure(html_content)
        self._check_citations(html_content, md_content)
        self._check_bibliography(html_content, md_content)

        # Report results
        self._print_results()

        return len(self.errors) == 0

    def _check_sections(self, html: str, md: str):
        """Verify all markdown sections are present in HTML"""
        # Extract section headings from markdown
        md_sections = re.findall(r'^## (.+)$', md, re.MULTILINE)

        # Extract sections from HTML
        html_sections = re.findall(r'<h2 class="section-title">(.+?)</h2>', html)

        # Check if we have placeholder sections like <div class="section">#</div>
        placeholder_sections = re.findall(r'<div class="section">#</div>', html)

        if placeholder_sections:
            self.errors.append(
                f"Found {len(placeholder_sections)} placeholder sections (empty '#' divs) - content not converted properly"
            )

        # Compare section counts
        if len(md_sections) > len(html_sections) + 1:  # +1 for bibliography which is separate
            self.errors.append(
                f"Section count mismatch: MD has {len(md_sections)} sections, HTML has only {len(html_sections)} + bibliography"
            )
            missing = set(md_sections) - set(html_sections)
            if missing:
                self.errors.append(f"Missing sections in HTML: {missing}")

        # Verify Executive Summary is present
        if "Executive Summary" in md and "Executive Summary" not in html:
            self.errors.append("Executive Summary missing from HTML")

    def _check_no_placeholders(self, html: str):
        """Check for common placeholders that shouldn't be in final report"""
        placeholders = [
            '{{TITLE}}', '{{DATE}}', '{{CONTENT}}', '{{BIBLIOGRAPHY}}',
            '{{METRICS_DASHBOARD}}', '{{SOURCE_COUNT}}', 'TODO', 'TBD',
            'PLACEHOLDER', 'FIXME'
        ]

        found = []
        for placeholder in placeholders:
            if placeholder in html:
                found.append(placeholder)

        if found:
            self.errors.append(f"Found unreplaced placeholders: {', '.join(found)}")

    def _check_no_emojis(self, html: str):
        """Verify no emojis are present in HTML"""
        # Common emoji patterns
        emoji_pattern = re.compile(
            "["
            "\U0001F600-\U0001F64F"  # emoticons
            "\U0001F300-\U0001F5FF"  # symbols & pictographs
            "\U0001F680-\U0001F6FF"  # transport & map symbols
            "\U0001F1E0-\U0001F1FF"  # flags
            "\U00002702-\U000027B0"
            "\U000024C2-\U0001F251"  # very broad range: also spans CJK and other non-emoji blocks
            "]+",
            flags=re.UNICODE
        )

        emojis = emoji_pattern.findall(html)
        if emojis:
            unique_emojis = set(emojis)
            self.errors.append(f"Found {len(emojis)} emojis in HTML (should be none): {unique_emojis}")

    def _check_structure(self, html: str):
        """Verify HTML has proper structure"""
        required_elements = [
            ('<html', 'HTML tag'),
            ('<head', 'head tag'),
            ('<body', 'body tag'),
            ('<title>', 'title tag'),
            ('class="header"', 'header section'),
            ('class="content"', 'content section'),
            ('class="bibliography"', 'bibliography section'),
        ]

        for element, name in required_elements:
            if element not in html:
                self.errors.append(f"Missing {name} in HTML")

        # Check for unclosed tags (basic check)
        open_divs = html.count('<div')
        close_divs = html.count('</div>')

        if abs(open_divs - close_divs) > 2:  # Allow small discrepancy
            self.warnings.append(
                f"Possible unclosed divs: {open_divs} opening tags, {close_divs} closing tags"
            )

    def _check_citations(self, html: str, md: str):
        """Verify citations are present"""
        # Extract citations from markdown
        md_citations = set(re.findall(r'\[(\d+)\]', md))

        # Extract citations from HTML (excluding bibliography)
        html_content = html.split('class="bibliography"')[0] if 'class="bibliography"' in html else html
        html_citations = set(re.findall(r'\[(\d+)\]', html_content))

        if len(md_citations) > 0 and len(html_citations) == 0:
            self.errors.append("No citations found in HTML content (but present in MD)")

        if len(md_citations) > len(html_citations) * 1.5:  # Allow some variation
            self.warnings.append(
                f"Fewer citations in HTML ({len(html_citations)}) than MD ({len(md_citations)})"
            )

    def _check_bibliography(self, html: str, md: str):
        """Verify bibliography is present and formatted"""
        if '## Bibliography' in md:
            if 'class="bibliography"' not in html:
                self.errors.append("Bibliography section missing from HTML")
            elif 'class="bib-entry"' not in html:
                self.warnings.append("Bibliography present but entries not properly formatted")

    def _print_results(self):
        """Print verification results"""
        print(f"\n{'-'*60}")
        print("VERIFICATION RESULTS")
        print(f"{'-'*60}\n")

        if self.errors:
            print(f"❌ ERRORS ({len(self.errors)}):")
            for i, error in enumerate(self.errors, 1):
                print(f" {i}. {error}")
            print()

        if self.warnings:
            print(f"⚠️ WARNINGS ({len(self.warnings)}):")
            for i, warning in enumerate(self.warnings, 1):
                print(f" {i}. {warning}")
            print()

        if not self.errors and not self.warnings:
            print("✅ All checks passed! HTML report is valid.")
            print()

        print(f"{'-'*60}\n")


def main():
    """Main entry point"""
    parser = argparse.ArgumentParser(description='Verify HTML research report')
    parser.add_argument('--html', type=Path, required=True, help='Path to HTML report')
    parser.add_argument('--md', type=Path, required=True, help='Path to markdown report')

    args = parser.parse_args()

    if not args.html.exists():
        print(f"Error: HTML file not found: {args.html}")
        return 1

    if not args.md.exists():
        print(f"Error: Markdown file not found: {args.md}")
        return 1

    verifier = HTMLVerifier(args.html, args.md)
    success = verifier.verify()

    return 0 if success else 1


if __name__ == "__main__":
    # SystemExit works without importing sys and without the site-provided exit() helper
    raise SystemExit(main())
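The section check in verify_html.py compares `## ` headings in the markdown report with `<h2 class="section-title">` headings in the HTML. A minimal sketch of that comparison as a standalone function follows. The name `missing_sections` is hypothetical, and the `html.unescape` step is an added assumption (the script compares raw strings): it keeps a heading such as "Risks & Mitigations" from being reported as missing only because the HTML writes `&amp;`.

```python
import html
import re


def missing_sections(md_text: str, html_text: str) -> set:
    """Hypothetical helper: markdown '## ' headings with no matching HTML <h2>."""
    md_sections = {s.strip() for s in re.findall(r'^## (.+)$', md_text, re.MULTILINE)}
    html_sections = {
        # Assumption: decode entities so '&amp;' compares equal to '&'.
        html.unescape(s).strip()
        for s in re.findall(r'<h2 class="section-title">(.+?)</h2>', html_text)
    }
    return md_sections - html_sections


md_text = (
    "## Executive Summary\nText\n\n"
    "## Risks & Mitigations\nText\n\n"
    "## Bibliography\n[1] Example entry\n"
)
html_text = (
    '<h2 class="section-title">Executive Summary</h2>'
    '<h2 class="section-title">Risks &amp; Mitigations</h2>'
)
missing = missing_sections(md_text, html_text)
print(missing)
```

Here only "Bibliography" is reported, which matches the script's expectation that the bibliography is rendered in its own block rather than as a `section-title` heading.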
443
axhub-make/skills/third-party/deep-research/templates/mckinsey_report_template.html
vendored
Normal file
@@ -0,0 +1,443 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{{TITLE}} - Deep Research Report</title>
    <style>
        * {
            margin: 0;
            padding: 0;
            box-sizing: border-box;
        }

        body {
            font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
            font-size: 14px;
            line-height: 1.5;
            color: #1a1a1a;
            background: #ffffff;
        }

        .container {
            max-width: 1400px;
            margin: 0 auto;
            background: white;
        }

        .header {
            background: #003d5c;
            color: white;
            padding: 25px 40px;
            border-bottom: 3px solid #002840;
        }

        .header h1 {
            font-size: 26px;
            font-weight: 600;
            margin-bottom: 8px;
            letter-spacing: -0.5px;
        }

        .header-meta {
            font-size: 13px;
            color: #b8d4e6;
            display: flex;
            gap: 25px;
        }

        .metrics-dashboard {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
            gap: 0;
            border-bottom: 2px solid #003d5c;
        }

        .metric {
            padding: 20px 30px;
            background: #f8f9fa;
            border-right: 1px solid #d1d5db;
            text-align: center;
        }

        .metric:last-child {
            border-right: none;
        }

        .metric-number {
            font-size: 32px;
            font-weight: 700;
            color: #003d5c;
            display: block;
            margin-bottom: 6px;
        }

        .metric-label {
            font-size: 12px;
            color: #4a5568;
            text-transform: uppercase;
            letter-spacing: 0.5px;
            font-weight: 500;
        }

        .content {
            padding: 30px 40px;
        }

        .section {
            margin-bottom: 30px;
        }

        .section-title {
            font-size: 20px;
            font-weight: 700;
            color: #003d5c;
            margin: 32px 0 16px 0;
            padding-bottom: 8px;
            border-bottom: 2px solid #003d5c;
            text-transform: uppercase;
            letter-spacing: 0.8px;
            line-height: 1.3;
        }

        .subsection-title {
            font-size: 16px;
            font-weight: 700;
            color: #1a1a1a;
            margin: 24px 0 12px 0;
            line-height: 1.4;
        }

        .executive-summary {
            background: #f8f9fa;
            padding: 20px;
            margin-bottom: 30px;
            border-left: 4px solid #003d5c;
        }

        .executive-summary p {
            margin-bottom: 12px;
            font-size: 14px;
            line-height: 1.6;
        }

        p {
            margin-bottom: 14px;
            line-height: 1.7;
            text-align: left;
        }

        /* Better paragraph spacing in content sections */
        .content > p,
        .section p {
            margin-bottom: 16px;
        }

        .findings-grid {
            display: grid;
            grid-template-columns: repeat(2, 1fr);
            gap: 20px;
            margin-bottom: 30px;
        }

        .finding-card {
            background: #f8f9fa;
            padding: 18px;
            border-left: 3px solid #003d5c;
        }

        .finding-card h3 {
            font-size: 14px;
            font-weight: 700;
            color: #003d5c;
            margin-bottom: 10px;
        }

        .finding-card p {
            font-size: 13px;
            line-height: 1.5;
            margin-bottom: 8px;
        }

        .data-table {
            width: 100%;
            border-collapse: collapse;
            margin: 15px 0;
            font-size: 13px;
        }

        .data-table th {
            background: #003d5c;
            color: white;
            padding: 10px 15px;
            text-align: left;
            font-weight: 600;
            font-size: 12px;
            text-transform: uppercase;
            letter-spacing: 0.5px;
}
|
||||
|
||||
.data-table td {
|
||||
padding: 10px 15px;
|
||||
border-bottom: 1px solid #e5e7eb;
|
||||
}
|
||||
|
||||
.data-table tr:hover {
|
||||
background: #f8f9fa;
|
||||
}
|
||||
|
||||
ul, ol {
|
||||
margin: 16px 0 16px 28px;
|
||||
padding-left: 0;
|
||||
}
|
||||
|
||||
li {
|
||||
margin-bottom: 10px;
|
||||
font-size: 14px;
|
||||
line-height: 1.6;
|
||||
padding-left: 8px;
|
||||
}
|
||||
|
||||
/* Nested lists */
|
||||
li ul, li ol {
|
||||
margin-top: 10px;
|
||||
margin-bottom: 10px;
|
||||
}
|
||||
|
||||
/* Better bullet/number spacing */
|
||||
ol {
|
||||
list-style-position: outside;
|
||||
padding-left: 0;
|
||||
}
|
||||
|
||||
ul {
|
||||
list-style-position: outside;
|
||||
padding-left: 0;
|
||||
}
|
||||
|
||||
.key-insight {
|
||||
background: white;
|
||||
border: 1px solid #d1d5db;
|
||||
border-left: 3px solid #003d5c;
|
||||
padding: 15px;
|
||||
margin: 15px 0;
|
||||
}
|
||||
|
||||
.key-insight strong {
|
||||
color: #003d5c;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.citation {
|
||||
color: #003d5c;
|
||||
font-weight: 600;
|
||||
text-decoration: none;
|
||||
cursor: pointer;
|
||||
position: relative;
|
||||
padding: 2px 4px;
|
||||
background: #f0f7fc;
|
||||
border-radius: 2px;
|
||||
transition: all 0.2s ease;
|
||||
}
|
||||
|
||||
.citation:hover {
|
||||
background: #003d5c;
|
||||
color: white;
|
||||
}
|
||||
|
||||
/* Attribution Gradients (2025 Enhancement) */
|
||||
.citation-tooltip {
|
||||
display: none;
|
||||
position: absolute;
|
||||
bottom: 100%;
|
||||
left: 50%;
|
||||
transform: translateX(-50%);
|
||||
margin-bottom: 8px;
|
||||
background: white;
|
||||
border: 2px solid #003d5c;
|
||||
box-shadow: 0 4px 12px rgba(0, 61, 92, 0.15);
|
||||
padding: 12px;
|
||||
min-width: 300px;
|
||||
max-width: 500px;
|
||||
z-index: 1000;
|
||||
font-size: 12px;
|
||||
line-height: 1.5;
|
||||
}
|
||||
|
||||
.citation:hover .citation-tooltip {
|
||||
display: block;
|
||||
}
|
||||
|
||||
.tooltip-title {
|
||||
font-weight: 700;
|
||||
color: #003d5c;
|
||||
margin-bottom: 8px;
|
||||
font-size: 13px;
|
||||
border-bottom: 1px solid #d1d5db;
|
||||
padding-bottom: 6px;
|
||||
}
|
||||
|
||||
.tooltip-source {
|
||||
color: #4a5568;
|
||||
margin-bottom: 8px;
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.tooltip-claim {
|
||||
background: #f8f9fa;
|
||||
padding: 8px;
|
||||
margin-top: 8px;
|
||||
border-left: 3px solid #003d5c;
|
||||
font-size: 11px;
|
||||
}
|
||||
|
||||
.tooltip-claim-label {
|
||||
font-weight: 600;
|
||||
color: #003d5c;
|
||||
text-transform: uppercase;
|
||||
font-size: 10px;
|
||||
letter-spacing: 0.5px;
|
||||
margin-bottom: 4px;
|
||||
}
|
||||
|
||||
.evidence-chain {
|
||||
margin-top: 10px;
|
||||
padding-top: 10px;
|
||||
border-top: 1px solid #d1d5db;
|
||||
}
|
||||
|
||||
.evidence-chain-label {
|
||||
font-weight: 600;
|
||||
color: #003d5c;
|
||||
font-size: 11px;
|
||||
margin-bottom: 6px;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.5px;
|
||||
}
|
||||
|
||||
.evidence-step {
|
||||
padding: 6px;
|
||||
background: #f8f9fa;
|
||||
margin-bottom: 4px;
|
||||
font-size: 11px;
|
||||
border-left: 2px solid #d1d5db;
|
||||
padding-left: 8px;
|
||||
}
|
||||
|
||||
.bibliography {
|
||||
background: #f8f9fa;
|
||||
padding: 30px;
|
||||
margin-top: 40px;
|
||||
border-left: 4px solid #003d5c;
|
||||
}
|
||||
|
||||
.bibliography-content {
|
||||
background: #f8f9fa;
|
||||
padding: 20px 0;
|
||||
}
|
||||
|
||||
.bib-entry {
|
||||
margin-bottom: 18px;
|
||||
padding-left: 50px;
|
||||
text-indent: -50px;
|
||||
line-height: 1.6;
|
||||
font-size: 13px;
|
||||
}
|
||||
|
||||
.bib-number {
|
||||
color: #003d5c;
|
||||
font-weight: 700;
|
||||
margin-right: 8px;
|
||||
}
|
||||
|
||||
.bib-entry a {
|
||||
color: #003d5c;
|
||||
overflow-wrap: break-word;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
.bib-entry a:hover {
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
.compact-list {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(2, 1fr);
|
||||
gap: 15px;
|
||||
margin: 15px 0;
|
||||
}
|
||||
|
||||
.compact-list li {
|
||||
margin-bottom: 8px;
|
||||
font-size: 13px;
|
||||
}
|
||||
|
||||
.info-box {
|
||||
background: white;
|
||||
border: 1px solid #d1d5db;
|
||||
padding: 15px;
|
||||
margin: 15px 0;
|
||||
}
|
||||
|
||||
.info-box h4 {
|
||||
font-size: 13px;
|
||||
font-weight: 700;
|
||||
color: #003d5c;
|
||||
margin-bottom: 8px;
|
||||
text-transform: uppercase;
|
||||
}
|
||||
|
||||
.highlight-stat {
|
||||
font-weight: 700;
|
||||
color: #003d5c;
|
||||
}
|
||||
|
||||
strong {
|
||||
font-weight: 600;
|
||||
color: #1a1a1a;
|
||||
}
|
||||
|
||||
@media print {
|
||||
.container {
|
||||
max-width: 100%;
|
||||
}
|
||||
}
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.metrics-dashboard {
|
||||
grid-template-columns: repeat(2, 1fr);
|
||||
}
|
||||
.findings-grid {
|
||||
grid-template-columns: 1fr;
|
||||
}
|
||||
.compact-list {
|
||||
grid-template-columns: 1fr;
|
||||
}
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<div class="header">
|
||||
<h1>{{TITLE}}</h1>
|
||||
<div class="header-meta">
|
||||
<span>{{DATE}}</span>
|
||||
<span>•</span>
|
||||
<span>{{SOURCE_COUNT}} Sources</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{{METRICS_DASHBOARD}}
|
||||
|
||||
<div class="content">
|
||||
{{CONTENT}}
|
||||
|
||||
<div class="bibliography">
|
||||
<div class="section-title">Bibliography</div>
|
||||
{{BIBLIOGRAPHY}}
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
</html>
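A minimal sketch of how the `{{...}}` placeholders in this template might be filled, assuming plain string substitution in Python. The function name `render_report` and the choice of which placeholders carry pre-built HTML are assumptions for illustration; the skill's actual renderer is not shown in this diff.

```python
import html
import re

# Matches placeholders such as {{TITLE}} or {{SOURCE_COUNT}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")

# Assumption: these placeholders receive ready-made HTML fragments,
# so their values are inserted without escaping.
RAW_HTML_KEYS = {"METRICS_DASHBOARD", "CONTENT", "BIBLIOGRAPHY"}


def render_report(template: str, values: dict[str, str]) -> str:
    """Replace every {{NAME}} in the template; fail loudly on a missing value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {{{{{key}}}}}")
        value = values[key]
        # Plain-text values (title, date, counts) are escaped for safety.
        return value if key in RAW_HTML_KEYS else html.escape(value)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    snippet = "<h1>{{TITLE}}</h1><span>{{SOURCE_COUNT}} Sources</span>"
    print(render_report(snippet, {"TITLE": "R&D Spend", "SOURCE_COUNT": "24"}))
```

Raising on a missing key, rather than leaving `{{NAME}}` in the output, keeps an unfilled placeholder from reaching a finished report.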
|
||||
414
axhub-make/skills/third-party/deep-research/templates/report_template.md
vendored
Normal file
@@ -0,0 +1,414 @@
|
||||
# Research Report: [Topic]
|
||||
|
||||
<!-- =============================================================================
|
||||
PROGRESSIVE FILE ASSEMBLY STRATEGY (2025 - Unlimited Length):
|
||||
|
||||
This report is generated section-by-section using progressive file assembly.
|
||||
Each section is generated to APPROPRIATE depth (however many words needed) and
|
||||
written to file immediately using Write/Edit tools.
|
||||
|
||||
WHY: Manages output token limits while maintaining quality throughout
|
||||
RESULT: Large reports (up to 20,000 words per skill run) - sections sized naturally by content
|
||||
|
||||
CLAUDE CODE LIMIT: 32,000 output tokens (≈20,000 words max per run)
|
||||
For reports >20,000 words: Run skill multiple times for different parts
|
||||
|
||||
GENERATION WORKFLOW:
|
||||
1. Generate Executive Summary → Write to file
|
||||
(As long as needed for comprehensive summary)
|
||||
|
||||
2. Generate Introduction → Edit/append to file
|
||||
(As long as needed to establish context)
|
||||
|
||||
3. Generate Finding 1 → Edit/append to file
|
||||
(As long as needed to fully present evidence and analysis)
|
||||
|
||||
4. Generate Finding 2 → Edit/append to file
|
||||
(Each finding sized appropriately - some may need 300 words, others 1,500)
|
||||
|
||||
5. Continue for ALL findings (no limit on number OR length per finding!)
|
||||
|
||||
6. Generate Synthesis → Edit/append to file
|
||||
(As long as needed for deep synthesis)
|
||||
|
||||
7. Generate Limitations → Edit/append to file
|
||||
8. Generate Recommendations → Edit/append to file
|
||||
9. Generate Bibliography (ALL citations) → Edit/append to file
|
||||
10. Generate Methodology → Edit/append to file
|
||||
|
||||
SIZING PRINCIPLE:
|
||||
- Each section should be as long as IT NEEDS TO BE
|
||||
- Simple finding? Maybe 400 words is enough
|
||||
- Complex multi-faceted finding? Could be 1,200 words
|
||||
- Let evidence and analysis determine length, not arbitrary targets
|
||||
- Only constraint: Keep each INDIVIDUAL generation under ~2,000 words to avoid output limits
|
||||
- If a section needs >2,000 words, break it into subsections and generate progressively
|
||||
|
||||
CITATION TRACKING (CRITICAL):
|
||||
- Maintain running list in working memory: citations_used = [1, 2, 3, ...]
|
||||
- After each section: Add new citations to list
|
||||
- In Bibliography: Generate entry for EVERY citation in final list
|
||||
- NO gaps, NO ranges, NO placeholders
|
||||
|
||||
============================================================================= -->
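The workflow and citation-tracking rules in the comment above can be sketched in Python as follows. This is an illustration under assumptions, not the skill's implementation: the helper names `append_section` and `missing_bibliography_entries` are hypothetical, and the 2,000-word guard simply encodes the "~2,000 words" guideline as a hard check.

```python
import re
from pathlib import Path

CITATION = re.compile(r"\[(\d+)\]")
MAX_WORDS_PER_GENERATION = 2000  # the "~2,000 words" guideline, treated as a hard cap here


def append_section(report_path: Path, section_text: str, citations_used: set[int]) -> None:
    """Append one finished section to the report file and record its citations."""
    word_count = len(section_text.split())
    if word_count > MAX_WORDS_PER_GENERATION:
        raise ValueError(f"section has {word_count} words; split it into subsections")
    # Append mode: earlier sections already on disk are never rewritten.
    with report_path.open("a", encoding="utf-8") as handle:
        handle.write(section_text.rstrip() + "\n\n")
    citations_used.update(int(n) for n in CITATION.findall(section_text))


def missing_bibliography_entries(citations_used: set[int], bibliography: str) -> list[int]:
    """Return cited numbers that have no '[N] ...' entry starting a bibliography line."""
    listed = {int(n) for n in re.findall(r"^\[(\d+)\]", bibliography, flags=re.MULTILINE)}
    return sorted(citations_used - listed)
```

Calling `missing_bibliography_entries` before writing the final Bibliography section gives a concrete list of gaps instead of relying on memory of which numbers were used.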
|
||||
|
||||
<!-- WRITING STANDARDS (Apply to EACH section): -->
|
||||
<!-- - PRECISION: Each word deliberately chosen, carries intention -->
|
||||
<!-- - ECONOMY: No fluff; eliminate ornate phrasing and unnecessary adjectives -->
|
||||
<!-- - CLARITY: Use exact numbers, specific data, precise technical terms -->
|
||||
<!-- - DIRECTNESS: State findings without embellishment -->
|
||||
<!-- - HIGH SIGNAL-TO-NOISE: Respect reader's time, dense information -->
|
||||
<!-- Examples: "reduced mortality 23%" not "significantly improved outcomes" -->
|
||||
<!-- Examples: "5 RCTs (n=1,847)" not "several studies suggest" -->
|
||||
|
||||
<!-- SOURCE ATTRIBUTION (CRITICAL - PREVENTS FABRICATION): -->
|
||||
<!-- EVERY factual claim MUST be followed by an [N] citation in the same sentence -->
|
||||
<!-- Use "According to [1]..." or "[1] reports..." for factual statements -->
|
||||
<!-- DISTINGUISH fact from synthesis: -->
|
||||
<!-- ✅ GOOD: "Mortality decreased 23% (p<0.01) in treatment group [1]." -->
|
||||
<!-- ❌ BAD: "Studies show mortality improved significantly." -->
|
||||
<!-- NO vague attributions like "research suggests" or "experts believe" -->
|
||||
<!-- ADMIT uncertainty: "No sources found for X" not fabricated citations -->
|
||||
<!-- LABEL speculation: "This suggests..." not "Research shows..." -->
|
||||
|
||||
<!-- ANTI-TRUNCATION (CRITICAL - Each Section Must Be COMPLETE): -->
|
||||
<!-- ❌ FORBIDDEN: "Content continues...", "Due to length...", "[Sections X-Y...]" -->
|
||||
<!-- ✅ REQUIRED: Generate current section COMPLETELY (each generation stays under ~2,000 words) -->
|
||||
<!-- ✅ REQUIRED: Write to file immediately, then move to next section -->
|
||||
<!-- Progressive assembly handles unlimited length - you handle quality per section -->
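The anti-truncation rule above lends itself to a mechanical check before a section is written to file. The sketch below assumes Python and a hypothetical helper named `find_truncation_markers`; the patterns cover only the three forbidden phrasings listed in the comments and would need extending for other wording.

```python
import re

# One pattern per forbidden phrasing named in the comments above.
TRUNCATION_PATTERNS = [
    re.compile(r"content continues", re.IGNORECASE),
    re.compile(r"due to length", re.IGNORECASE),
    re.compile(r"\[sections?\s+\w+\s*-\s*\w+", re.IGNORECASE),  # e.g. "[Sections 4-7 ...]"
]


def find_truncation_markers(section_text: str) -> list[str]:
    """Return every line of the section that contains a forbidden truncation phrase."""
    hits = []
    for line in section_text.splitlines():
        if any(pattern.search(line) for pattern in TRUNCATION_PATTERNS):
            hits.append(line.strip())
    return hits
```

An empty result means none of the listed phrases appear; it does not prove the section is complete.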
|
||||
|
||||
## Executive Summary
|
||||
|
||||
[Write 3-5 bullet points, 50-250 words total]
|
||||
- **Key Finding 1:** [Major discovery with specific data/metrics]
|
||||
- **Key Finding 2:** [Important insight with evidence]
|
||||
- **Key Finding 3:** [Critical conclusion with implications]
|
||||
- [Additional findings as needed]
|
||||
|
||||
**Primary Recommendation:** [One clear sentence stating the main recommendation]
|
||||
|
||||
**Confidence Level:** [High/Medium/Low with brief justification]
|
||||
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
### Research Question
|
||||
[State the original question clearly and completely]
|
||||
|
||||
[Add 1-2 sentences providing context for why this question matters]
|
||||
|
||||
### Scope & Methodology
|
||||
[2-3 paragraphs explaining:]
|
||||
- What specific aspects were investigated
|
||||
- What was included vs excluded from scope
|
||||
- What research methods were used (web search, academic sources, industry reports, etc.)
|
||||
- How many sources were consulted
|
||||
- Time period covered
|
||||
|
||||
### Key Assumptions
|
||||
[List 3-5 important assumptions made during research]
|
||||
- Assumption 1: [Description and why it matters]
|
||||
- Assumption 2: [Description and why it matters]
|
||||
- [Continue...]
|
||||
|
||||
---
|
||||
|
||||
## Main Analysis
|
||||
|
||||
<!-- CRITICAL: Write 4-8 detailed findings, each at least 300 words and sized by its content -->
|
||||
<!-- Each finding should have multiple paragraphs with evidence -->
|
||||
<!-- Include specific data, quotes, statistics, not vague statements -->
|
||||
<!-- PRECISION: Use exact numbers, specific metrics, no fluff words -->
|
||||
<!-- "mortality reduced 23%" not "significantly improved" -->
|
||||
<!-- "5 trials (n=1,847)" not "several studies" -->
|
||||
|
||||
### Finding 1: [Descriptive Title That Captures the Key Point]
|
||||
|
||||
[Opening paragraph: State the finding clearly and why it matters]
|
||||
|
||||
[Body paragraphs:
|
||||
- Present detailed evidence
|
||||
- Include specific data, statistics, dates, numbers
|
||||
- Explain mechanisms, causes, or relationships
|
||||
- Discuss implications
|
||||
- Address nuances or exceptions
|
||||
]
|
||||
|
||||
**Key Evidence:**
|
||||
- Data point 1 from Source A [1]
|
||||
- Data point 2 from Source B [2]
|
||||
- Conflicting view from Source C [3] and how it was resolved
|
||||
|
||||
**Implications:**
|
||||
[1-2 paragraphs on what this finding means for the user's decision/understanding]
|
||||
|
||||
**Sources:** [1], [2], [3], [4]
|
||||
|
||||
---
|
||||
|
||||
### Finding 2: [Descriptive Title]
|
||||
|
||||
[Follow same detailed structure as Finding 1]
|
||||
[Minimum 300 words per finding]
|
||||
[Include multiple paragraphs with evidence]
|
||||
|
||||
**Sources:** [5], [6], [7], [8]
|
||||
|
||||
---
|
||||
|
||||
### Finding 3: [Descriptive Title]
|
||||
|
||||
[Continue with same detail level]
|
||||
|
||||
**Sources:** [9], [10], [11]
|
||||
|
||||
---
|
||||
|
||||
### Finding 4: [Descriptive Title]
|
||||
|
||||
[And so on... Include at least 4 major findings, typically 4-8]
|
||||
|
||||
**Sources:** [12], [13], [14]
|
||||
|
||||
---
|
||||
|
||||
[Continue with additional findings as needed]
|
||||
|
||||
---
|
||||
|
||||
## Synthesis & Insights
|
||||
|
||||
<!-- This section should be 500-1000 words -->
|
||||
<!-- Go beyond just summarizing - generate NEW insights -->
|
||||
|
||||
### Patterns Identified
|
||||
|
||||
[2-3 paragraphs identifying key patterns across findings]
|
||||
|
||||
**Pattern 1: [Name]**
|
||||
[Explain the pattern in detail, cite which findings support it]
|
||||
|
||||
**Pattern 2: [Name]**
|
||||
[Continue...]
|
||||
|
||||
### Novel Insights
|
||||
|
||||
[2-3 paragraphs of insights that go BEYOND what sources explicitly stated]
|
||||
|
||||
**Insight 1: [Name]**
|
||||
[What you discovered by connecting information across sources]
|
||||
[Why this matters even though no single source said it explicitly]
|
||||
|
||||
**Insight 2: [Name]**
|
||||
[Continue...]
|
||||
|
||||
### Implications
|
||||
|
||||
[2-3 paragraphs on what all this means]
|
||||
|
||||
**For [User Context]:**
|
||||
[Specific implications for the user's situation/decision]
|
||||
|
||||
**Broader Implications:**
|
||||
[Wider significance of these findings]
|
||||
|
||||
**Second-Order Effects:**
|
||||
[What might happen as consequences of these findings]
|
||||
|
||||
---
|
||||
|
||||
## Limitations & Caveats
|
||||
|
||||
<!-- Be honest and comprehensive about what's uncertain -->
|
||||
|
||||
### Counterevidence Register
|
||||
|
||||
<!-- Document findings that contradict or challenge main conclusions -->
|
||||
|
||||
[2-3 paragraphs explaining contradictory evidence found during research]
|
||||
|
||||
**Contradictory Finding 1:** [Description]
|
||||
- Source: [Citation]
|
||||
- Why it contradicts: [Explanation]
|
||||
- How resolved/interpreted: [Your analysis]
|
||||
- Impact on conclusions: [Minimal/Moderate/Significant]
|
||||
|
||||
**Contradictory Finding 2:** [Continue...]
|
||||
|
||||
### Known Gaps
|
||||
|
||||
[2-3 paragraphs explaining:]
|
||||
- What information was not available
|
||||
- What questions remain unanswered
|
||||
- What would strengthen this research
|
||||
|
||||
**Gap 1:** [Description]
|
||||
- Why it's missing
|
||||
- How it affects conclusions
|
||||
- How to address it in future research
|
||||
|
||||
**Gap 2:** [Continue...]
|
||||
|
||||
### Assumptions
|
||||
|
||||
[Revisit key assumptions from intro, now with more detail on their validity]
|
||||
|
||||
**Assumption 1:** [Restate]
|
||||
- Evidence supporting it: [...]
|
||||
- Evidence challenging it: [...]
|
||||
- Overall validity: [...]
|
||||
|
||||
### Areas of Uncertainty
|
||||
|
||||
[2-3 paragraphs on:]
|
||||
- Where sources disagree
|
||||
- Where evidence is thin
|
||||
- Where extrapolation was necessary
|
||||
- What could change conclusions
|
||||
|
||||
**Uncertainty 1:** [Topic]
|
||||
[Detailed explanation of what's uncertain and why]
|
||||
|
||||
**Uncertainty 2:** [Continue...]
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
<!-- Make this actionable and specific -->
|
||||
|
||||
### Immediate Actions
|
||||
|
||||
[3-5 specific actions the user should take NOW]
|
||||
|
||||
1. **[Action Title]**
|
||||
- What: [Specific action]
|
||||
- Why: [Rationale based on findings]
|
||||
- How: [Implementation steps]
|
||||
- Timeline: [When to do this]
|
||||
|
||||
2. **[Continue with similar detail...]**
|
||||
|
||||
### Next Steps
|
||||
|
||||
[3-5 actions for the near-term future (1-3 months)]
|
||||
|
||||
1. **[Step Title]**
|
||||
- [Similar detailed structure]
|
||||
|
||||
### Further Research Needs
|
||||
|
||||
[3-5 areas where additional research would be valuable]
|
||||
|
||||
1. **[Research Topic]**
|
||||
- What to investigate: [Specific question]
|
||||
- Why it matters: [Connection to current findings]
|
||||
- Suggested approach: [How to research it]
|
||||
|
||||
---
|
||||
|
||||
## Bibliography
|
||||
|
||||
<!-- ============================================================================ -->
|
||||
<!-- CRITICAL: Generate COMPLETE bibliography with ALL sources cited in report -->
|
||||
<!-- DO NOT use placeholders like "[8-75] Additional citations" or "etc." -->
|
||||
<!-- DO NOT use "...continue..." or "[Continue with all sources...]" -->
|
||||
<!-- EVERY citation [N] in report body MUST have corresponding entry here -->
|
||||
<!-- If report cites [1]-[25], bibliography MUST contain all 25 complete entries -->
|
||||
<!-- Format: [N] Author/Organization (Year). "Title". Publication. URL -->
|
||||
<!-- ============================================================================ -->
|
||||
|
||||
[1] Author Name or Organization (2025). "Full Title of Article or Paper". Publication Name or Website. https://full-url.com (Retrieved: 2025-11-04)
|
||||
|
||||
[2] Second Author (2024). "Second Article Title". Journal Name, Volume(Issue), pages. https://doi-or-url.com (Retrieved: 2025-11-04)
|
||||
|
||||
<!-- Add ALL remaining citations [3] through [N] here -->
|
||||
<!-- Standard reports: 15-30 sources | Deep/UltraDeep: 30-50 sources -->
|
||||
<!-- Write each entry completely - NO ranges, NO "etc.", NO placeholders -->
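As an illustration of the entry format given in the comments above, the following Python sketch builds one bibliography line from structured fields. The `Source` dataclass and `format_entry` function are hypothetical names, and the sample values are the template's own placeholder text rather than a real reference.

```python
from dataclasses import dataclass


@dataclass
class Source:
    number: int
    author: str       # person or organization
    year: int
    title: str
    publication: str
    url: str
    retrieved: str    # ISO date, e.g. "2025-11-04"


def format_entry(source: Source) -> str:
    """Render one entry as: [N] Author (Year). "Title". Publication. URL (Retrieved: date)"""
    return (
        f'[{source.number}] {source.author} ({source.year}). '
        f'"{source.title}". {source.publication}. '
        f'{source.url} (Retrieved: {source.retrieved})'
    )


if __name__ == "__main__":
    placeholder = Source(
        number=1,
        author="Author Name or Organization",
        year=2025,
        title="Full Title of Article or Paper",
        publication="Publication Name or Website",
        url="https://full-url.com",
        retrieved="2025-11-04",
    )
    print(format_entry(placeholder))
```

Generating every entry from one function keeps numbering and punctuation consistent across the whole list.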
|
||||
|
||||
---
|
||||
|
||||
## Appendix: Methodology
|
||||
|
||||
### Research Process
|
||||
|
||||
[2-3 paragraphs describing the research process in detail]
|
||||
|
||||
**Phase Execution:**
|
||||
- Phase 1 (SCOPE): [What was done]
|
||||
- Phase 2 (PLAN): [What was done]
|
||||
- Phase 3 (RETRIEVE): [What was done]
|
||||
- [Continue for all phases executed]
|
||||
|
||||
### Sources Consulted
|
||||
|
||||
**Total Sources:** [Number]
|
||||
|
||||
**Source Types:**
|
||||
- Academic journals: [Number]
|
||||
- Industry reports: [Number]
|
||||
- News articles: [Number]
|
||||
- Government/regulatory: [Number]
|
||||
- Documentation: [Number]
|
||||
- [Other categories]
|
||||
|
||||
**Geographic Coverage:**
|
||||
[If relevant, note geographic distribution of sources]
|
||||
|
||||
**Temporal Coverage:**
|
||||
[Date range of sources, recency distribution]
|
||||
|
||||
### Verification Approach
|
||||
|
||||
[2-3 paragraphs explaining:]
|
||||
|
||||
**Triangulation:**
|
||||
- How claims were verified across multiple sources
|
||||
- Minimum sources required per major claim: 3
|
||||
- How contradictions were handled
|
||||
|
||||
**Credibility Assessment:**
|
||||
- How source quality was evaluated
|
||||
- Scoring system used (0-100)
|
||||
- Average credibility score: [Number]/100
|
||||
- Distribution: [High/medium/low source counts]
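The average score and the high/medium/low distribution requested above could be computed as in this Python sketch. The template does not say where the band boundaries lie, so the cut-offs of 70 and 40 are purely hypothetical, as is the function name `summarize_credibility`.

```python
from statistics import mean

# Hypothetical band boundaries; the template does not define them.
HIGH_THRESHOLD = 70
MEDIUM_THRESHOLD = 40


def summarize_credibility(scores: list[int]) -> dict:
    """Return the average score and a count of sources per credibility band."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not 0 <= score <= 100 for score in scores):
        raise ValueError("scores must lie in the range 0-100")
    distribution = {"high": 0, "medium": 0, "low": 0}
    for score in scores:
        if score >= HIGH_THRESHOLD:
            distribution["high"] += 1
        elif score >= MEDIUM_THRESHOLD:
            distribution["medium"] += 1
        else:
            distribution["low"] += 1
    return {"average": round(mean(scores), 1), "distribution": distribution}
```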
|
||||
|
||||
**Quality Control:**
|
||||
- Validation checks performed
|
||||
- Issues found and corrected
|
||||
- Final quality metrics
|
||||
|
||||
### Claims-Evidence Table
|
||||
|
||||
<!-- Explicit mapping of major claims to supporting sources -->
|
||||
|
||||
| Claim ID | Major Claim | Evidence Type | Supporting Sources | Confidence |
|
||||
|----------|-------------|---------------|-------------------|------------|
|
||||
| C1 | [First major claim from findings] | [Primary data / Meta-analysis / Expert opinion] | [1], [2], [3] | High / Medium / Low |
|
||||
| C2 | [Second major claim] | [Evidence type] | [4], [5], [6] | High / Medium / Low |
|
||||
| C3 | [Third major claim] | [Evidence type] | [7], [8] | High / Medium / Low |
|
||||
| ... | [Continue for all major claims] | ... | ... | ... |
|
||||
|
||||
**Confidence Levels:**
|
||||
- **High**: 3+ independent sources, consistent findings, strong methodology
|
||||
- **Medium**: 2 sources OR single high-quality source with minor contradictions
|
||||
- **Low**: Single source OR significant contradictions in evidence
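One possible reading of the confidence rules above, sketched in Python. The rules leave some cases open (for example, three sources with minor contradictions), so the ordering of checks here is an interpretation and not the template's definition; `Contradictions` and `confidence_level` are hypothetical names.

```python
from enum import Enum


class Contradictions(Enum):
    NONE = 0
    MINOR = 1
    SIGNIFICANT = 2


def confidence_level(
    independent_sources: int,
    contradictions: Contradictions,
    strong_methodology: bool,
    single_source_high_quality: bool = False,
) -> str:
    """Map the evidence behind a claim to High / Medium / Low."""
    if independent_sources < 1:
        raise ValueError("a claim needs at least one source")
    # Significant contradictions cap confidence regardless of source count.
    if contradictions is Contradictions.SIGNIFICANT:
        return "Low"
    # A lone source earns Medium only when it is high quality.
    if independent_sources == 1:
        return "Medium" if single_source_high_quality else "Low"
    if (
        independent_sources >= 3
        and contradictions is Contradictions.NONE
        and strong_methodology
    ):
        return "High"
    # Assumption: everything in between falls back to Medium.
    return "Medium"
```

Under this reading, a claim backed by four sources that mildly disagree is rated Medium, since the High band requires consistent findings.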
|
||||
|
||||
---
|
||||
|
||||
## Report Metadata
|
||||
|
||||
**Research Mode:** [Quick/Standard/Deep/UltraDeep]
|
||||
**Total Sources:** [Number]
|
||||
**Word Count:** [Approximate count]
|
||||
**Research Duration:** [Time taken]
|
||||
**Generated:** [Date and time]
|
||||
**Validation Status:** [Passed with X warnings / Passed without warnings]
|
||||
|
||||
---
|
||||
|
||||
<!-- END OF TEMPLATE -->
|
||||
<!-- Remember: Write COMPREHENSIVE, DETAILED reports -->
|
||||
<!-- Target at least 2,000-5,000 words; more for deep modes -->
|
||||
<!-- Include specific data, evidence, and analysis throughout -->
|
||||