Free software and open source tools for investigative journalism and journalistic research

Free software for journalists: tutorials, bookmarks and open source tools for journalistic research, investigations and privacy, plus other digital tools for investigative journalism and data-driven journalism (data journalism):

Independent media tools for journalists and investigative reporting

With free open source software, you can run research tools for sensitive documents or data on your own computer or server instead of handing them to cloud services that can spy on your data.

Tutorials and tips: How to use open source research tools for investigative journalism

Toolbox: Free software, open source tools and resources

Free software and open source discovery and research tools for journalists:

Search engines for fulltext search and discovery

Research methods, techniques and technology: Fulltext search, Information retrieval, Desktop Search, Enterprise Search and faceted search

Tutorials:

Open source search tools:

Search libraries and APIs

If you want to write code yourself, you can use these powerful engines as a base:

Databases, digital archives, data management systems, document management systems and content management systems

Methods: Archive, database, forms, categories (tagging), classification, metadata, repository, document management (DMS), content management (CMS) or enterprise content management (ECM), knowledge management, knowledge base, bookmarks

  • Zotero: Bookmark database and citations manager with tagging and annotation features
  • Docear: Bookmark database and citations manager with mindmap, tagging and annotation features
  • LibreOffice Calc: Open source spreadsheet program
  • DocumentCloud: Document management system for paper-based documents such as scans or PDFs
  • Semantic MediaWiki: Extends MediaWiki into a semantic database
  • Drupal CMS: The CMS's fields module provides an easy-to-use UI for creating your own content types, data fields and forms
  • Agorum: Automated extraction of monetary amounts from bills as structured data

Tagging and annotation

Methods: Annotation, Tagging, Social Tagging, Folksonomies

Tutorial: Tagging and annotation for collaborative investigative journalism

  • Zotero: Bookmark database and citations manager with tagging and annotation features
  • Docear: Bookmark database and citations manager with mindmap, tagging and annotation features
  • DocumentCloud: Tagging and annotation for paper-based documents such as scans or PDF documents
  • Neonion: Collaborative annotations within text
  • Pundit: Annotations within text and within images
  • Hypothesis
  • Annotator.js

Text mining, text analysis and document mining

Methods: Text mining, Natural Language Processing (NLP), Named entity extraction

More:

Reconciliation and merging

Methods: Compare, merge, reconcile, link, clustering

  • Fuzzy search with lists: Checks whether there are search results for each list entry
  • OpenRefine
  • DocDiff: Shows and visualizes the differences between two versions of a text
  • FSlint: Compares two directories and finds identical files that exist in both
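
The idea behind a fuzzy search with lists can be sketched in a few lines of Python. This is only a minimal illustration using the standard library's difflib, not how any of the tools above work internally. The function name check_list, the example names and the similarity cutoff of 0.8 are assumptions made for this example.

```python
from difflib import get_close_matches


def check_list(entries, candidates, cutoff=0.8):
    """For each list entry, return the candidates that approximately match it."""
    # Compare case-insensitively, but report the original spelling.
    by_lower = {}
    for candidate in candidates:
        by_lower.setdefault(candidate.lower(), candidate)

    results = {}
    for entry in entries:
        close = get_close_matches(entry.lower(), list(by_lower), n=3, cutoff=cutoff)
        results[entry] = [by_lower[match] for match in close]
    return results


# Hypothetical data: a watchlist and names found in a document collection.
watchlist = ["Jon Smith", "Acme Holdings", "Maria Gonzales"]
found_in_documents = ["John Smith", "ACME Holdings Ltd", "Peter Miller"]

hits = check_list(watchlist, found_in_documents)
for entry, matches in hits.items():
    print(entry, "->", matches if matches else "no match")
```

A lower cutoff finds more spelling variants but also produces more false matches, so every hit still needs to be checked by hand.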

Graphs and social network analysis (SNA)

Tools to analyze and visualize connections and relations:

  • Network analysis tutorial: How to visualize connections & relations in documents with Open Semantic Search
  • Gephi: Desktop tool for analysis and data visualization of networks, connections and graphs
  • Cytoscape.js: JavaScript library for data visualization of networks, connections and graphs
  • Semantic MediaWiki: Very flexible CMS for linked data
  • Detective: CMS for connections, based on Python/Django and the Neo4j graph database

Privacy, security, safety and encryption


Digital security: Protect your research, sources and whistleblowers with privacy tools and encryption tools:

Methods: Encryption (PGP, OTR) and anonymization

Tutorials:

Open source tools:

  • Tails (The Amnesic Incognito Live System): Linux-based operating system for encryption and anonymous access to the internet
  • TrueCrypt: Hard disk encryption for Windows
  • GnuPG: OpenPGP-based email encryption
  • Enigmail: Encryption plugin for the Thunderbird email client
  • Tor project: Anonymity online
  • OTR: Encryption for chats and instant messaging
  • TextSecure: Encrypted messenger (like WhatsApp, but built for privacy)
  • Jitsi: Encrypted communicator (like Skype, but open source and with safer end-to-end encryption)
  • RedPhone: Encrypted voice over IP communicator for smartphones
  • SecureDrop: Upload platform for whistleblowers
  • GlobaLeaks: Another upload platform for whistleblowing

Media monitoring, news filtering, news pipes and alerts

Open source software for media monitoring, news processing, news filtering and alerting:

  • Open Semantic Search rules for news pipes and alerts: Filters and alerts for news from different news sources and data sources. Has a very powerful filter and search query language (Apache Lucene based), for example supporting fuzzy search. Supports many file formats and data sources because you can use all standard connectors for Solr.
  • Mozilla Thunderbird: Desktop software for reading, filtering and auto-tagging RSS feeds
  • Streamtools: Visual news pipes for stream processing from the New York Times Lab
  • Huginn: Ruby on Rails and SQL-based agents

Extract data or convert data

Methods: Data integration, extraction, data converter, data migration, ETL (Extract, Transform, Load), Scraping

Extract text or structured data from documents

Extract text from images (OCR)

Extract text from sound files (speech recognition)

  • CMU Sphinx: Open source speech recognition toolkit

Extract structured data from websites (Scraping)

  • Scrapy: Extract structured data from websites

Extract, transform, load (ETL) frameworks for importing and transforming or converting data

Data visualization

Method: data visualization

Tools for data visualization:

Charts and diagrams

Maps and mapping (spatial data)

Create interactive maps and visualize spatial data (geodata) with open source software for mapping:

  • CartoDB: Open source web application and mapping tool for interactive maps
  • QGIS: Open source desktop tool for maps
  • Leaflet: JavaScript library for interactive maps
  • OpenLayers: Powerful JavaScript library for maps
  • OpenStreetMap: Open source and open data for maps
  • Serving tiles: How to run your own map server with open source software

Visualize events on a timeline

Create timelines with open source timeline tools and visualize events on interactive multimedia timelines:

Graphs, networks, connections and relations

  • Network analysis tutorial: How to visualize connections & relations in documents with Open Semantic Search
  • Gephi: Desktop tool for analysis and data visualization of networks, connections and graphs
  • Cytoscape.js: JavaScript library for data visualization of networks, connections and graphs
  • Sigma.js: JavaScript library for data visualization of networks, connections and graphs

Redact documents and delete metadata

Clean sensitive documents and delete metadata stored invisibly inside document files or photos, such as hardware serial numbers (e.g. of your camera), software details or user names:

Statistics and analytics

Method: Data analysis, statistics, chart, diagram, data visualization

Universal open source toolset

The ultimate universal open source toolset is a Linux distribution like Debian GNU/Linux or Ubuntu Linux, which comes with thousands of packages of free software and open source tools, software libraries and programming languages.

You don't have to remove your existing operating system: With open source virtualization software like VirtualBox for Windows or Mac you can run a Linux distribution within a window in your existing operating system environment.

You may want to start with Linux inside your existing system environment by using the preconfigured, Debian-based virtual machine (VM) Open Semantic Desktop Search, which provides a preselected and preconfigured collection of tools for investigative journalists.