AGI -- Artificial General Intelligence

A messy and incomplete list of open source (and some notable closed-source) Artificial General Intelligence projects.

A good overview is given by Pei Wang's AGI Intro.

Project plans

Some all-encompassing plans for do-it-all systems, typically joining together natural language processing, semantic reasoning, and 3D/virtual robotic interfaces.
OpenCog

Novamente's general cognition/reasoning system. Includes an NLP subsystem, reasoning, a 3D virtual avatar, and robotics interfaces. Open source, GPL license.

Pei Wang's NARS project

NARS, the Non-Axiomatic Reasoning System, is a general-purpose reasoning system. Several white papers. Inspired OpenCog (OpenCog claims to overcome certain limitations in NARS). OpenNARS is Pei Wang's implementation. Released under GPLv2.

Nutcracker

Performs textual entailment using a first-order-logic (FOL) theorem prover and an FOL model builder. Written in Prolog(!) Non-free license, bars commercial use.

TexAI

Aims to couple common-sense knowledge-base systems to natural language text processing. Open source project.

Stan Franklin's LIDA

An intelligent agent, communicating by email. Built for the US Navy. Based on Baars' Global Workspace Theory. Answers only one question: "What do I do next?" See the Tutorial.

MultiNet

The MultiNet paradigm - Knowledge Representation with Multilayered Extended Semantic Networks by Hermann Helbig. Wires up NLP processing to hard-wired upper ontology, and adds reasoning. No source code available.

John Weng's SAIL architecture

Seems primarily aimed at robots.

Nick Cassimatis's PolyScheme

No code available.

Jeff Hawkins's Numenta

Commercialized "Hierarchical Temporal Memory".

SNePS

SNePS is a knowledge representation, reasoning, and acting (KRRA) system. See also the Wikipedia page and a paper by Shapiro, part of the SNePS group.

OntoSem - Ontological Semantics

Developed by Hakia Labs, proprietary, commercial software for taking NLP input and generating ontological frames/expressions from it. See also ontologicalsemantics.com.

Open source projects

Some smaller, less-encompassing projects or pieces/parts:
MicroPsi
Study of emotional agents. Simple virtual robotic agents that roam a 3D world and interact in various psychologically motivated (needs & wants) ways. Humboldt University of Berlin. Java/Eclipse infrastructure.
AGISim
GPL. AGISim is a framework for the creation of virtual worlds for artificial intelligence research, allowing AI and human-controlled agents to interact in real time within sensory-rich contexts. AGISim is built on the Crystal Space 3D game engine. Some parts of AGISim are closely related to OpenCog.
A.L.I.C.E.
Chatterbot, AIML
Hypergraph DB
Database for storing hypergraphs. Pretty Cool. Java based, C++ coming. Strange BSD-like license, but requires source code! Compatibility of license with GPL is unclear.

Ontologies, Knowledge Bases and Reasoning Engines

A giant list is at Peter Clark's Some Ongoing KBS/Ontology Projects and Groups page. Problems with ontologies are reviewed in Ontology Development Pitfalls.

Big ones include

ConceptNet3

Common-sense knowledgebase. Large. GPL license. Users can edit data online, at http://torg.media.mit.edu:3000/

Open Mind Common Sense

A collection of English-language sentences, rather than a strict upper ontology. This is actually quite convenient if you have a good NLP input system, as it helps avoid the strictures of pre-designed ontologies, and instead gets you to deal with the structure of your NLP-to-KR layer. From MIT. Large: 700K sentences.

YAGO

YAGO is a huge semantic knowledge base, consisting primarily of information about entities. Contains 2M entities and 20M facts about them. The YAGO-NAGA project also includes SOFIE, a system for automatically extending an ontology via NLP and reasoning.

WordNet

Semantic network.

See also WordNet::Similarity, a Perl module implementing various word similarity measures from WordNet data, i.e. thesaurus-like.

Historical Thesaurus of English

Licensing is unclear.

SUMO - Suggested Upper Merged Ontology

See the SUMO WP article. Includes the open source Sigma knowledge engineering environment, which includes a theorem prover. Sigma uses KIF.

"The largest formal public ontology in existence", available under GPL (although OpenCyc is arguably bigger, and is free). Has mappings to WordNet.

OpenCyc

Large KB under the Artistic License. Source for the engine is not available. KB seems messy and capricious. The upper ontology is not clear.

ThoughtTreasure

Common sense KB, available in CycL. GPL'ed

Conceptual Nets

A knowledge representation system. Conceptual Graph Interchange Format is an ISO standard. See also "Common Logic Interchange Format (CLIF)", which is more Lisp-like.

Ontolingua

Seems well-engineered. Actual KB is slim. Source not available. Might be a dead project??

GFO - General Formal Ontology

Provides a firm theoretical foundation for representing ontologies; no actual data. OWL version of GFO under a modified BSD license. Examples include the periodic table of elements, amino acids. See also WP article.

DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering
PSL - Process Specification Language
BFO - Basic Formal Ontology
Algernon - Rule-Based Programming
Java, on sourceforge. Recommended for small-to-medium systems. A frame-slot type system.
CLIPS - A Tool for Building Expert Systems
Originally from NASA, now public domain. See also Wikipedia page. Rather old, and primitive.
SOAR expert system
DAML+OIL
Obsoleted by OWL
KIF - Knowledge Interchange Format
Obsoleted by SUO-KIF (used in SUMO)

Reasoning engines/Inference engines

Reasoning engines only, without accompanying ontologies.

See also Datalog for a decent list of databases/reasoners that implement the Datalog query system.

See also Open Source Rule Engines in Java

PLN Probabilistic Logic Network

Uses a probabilistic analog of first-order logic, kind of. Ideal for uncertain inference. Beta available now. In the process of being ported to OpenCog. GNU Affero GPLv3 license.

XSB

Prolog engine, open source. Supports tabling/memoing and well-founded negation. This is one of the fastest inference engines out there, per the WWW Madrid 2009 Semantic Web OpenRuleBench results. Personally, I suspect that this is because of a strong grounding in inference and language design theory on the part of the developers.

Yap

Prolog engine. For performance, adds "demand-driven indexing". This is one of the fastest inference engines out there, per the WWW Madrid 2009 Semantic Web OpenRuleBench results. Personally, I suspect that this is because of a strong grounding in inference and language design theory on the part of the developers.

IRIS

Inference engine, bottom-up. Implements the Datalog query system. Has "Magic Set" optimization. Implemented in Java. Immature? LGPL license.

PowerLoom

PowerLoom uses a fully expressive, logic-based representation language (a variant of KIF). It uses a natural deduction inference engine that combines forward and backward chaining to derive what logically follows from the facts and rules asserted in the knowledge base. Has interfaces to Common Lisp, C++ and Java. GPL license.

PyKE

Inference engine, specifically tailored to work well with Python. Features:

PyIE - Python Inference Engine

No website. Spotty download. See the announcement for details. Appears to be a one-time-only code snapshot release.

The Scone Knowledge-Base Project

Sigma Knowledge Engineering Environment

Primarily an inference engine coupled to an ontology. GPL license.

DROOLS

Drools is a business rule management system (BRMS) and an enhanced rules engine implementation, ReteOO, based on Charles Forgy's Rete algorithm tailored for the Java language. Despite using Rete, this is possibly the slowest inference engine out there, as well as the least stable (per the WWW Madrid 2009 Semantic Web OpenRuleBench results).

Prova

Function symbols. Meant for event processing, not data processing ...
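
Bottom-up evaluation, as used by Datalog engines such as IRIS above, starts from the known facts and applies the rules repeatedly until nothing new can be derived. The sketch below is a naive illustration of that idea in Python. It is not the algorithm of any engine listed here, it omits optimizations such as Magic Sets, and the `parent`/`ancestor` predicates, the uppercase-variable convention, and all function names are assumptions made up for the example.

```python
# Naive bottom-up (forward-chaining) evaluation of Datalog-style rules.
# Facts and atoms are tuples: (predicate, arg1, arg2, ...).
# Assumed convention: a term starting with an uppercase letter is a variable.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None on failure."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # Bind the variable, or check it against its earlier binding.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def solve(body, facts, binding):
    """Yield every binding that satisfies all atoms in `body`."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)

def evaluate(facts, rules):
    """Apply the rules until a fixed point is reached; return all facts."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for b in solve(body, facts, {}):
                derived = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new

# Hypothetical example data: a two-step family tree.
facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
closure = evaluate(facts, rules)
```

Here `closure` contains the two original `parent` facts plus the derived `ancestor` facts, including the transitive one from `ann` to `cal`. A top-down (backward-chaining) engine such as a Prolog system would instead start from a query and work back toward the facts.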

Boolean SAT, SMT Propositional logic solvers

Use Boolean SAT solvers for traditional propositional logic; use SMT solvers for problems that include arithmetic expressions.
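
To make the SAT side concrete: a Boolean SAT solver takes a propositional formula, usually in conjunctive normal form, and searches for a truth assignment that satisfies every clause. The following is a toy DPLL-style sketch in Python, meant only as an illustration and not as a substitute for a real solver. Clauses are encoded as lists of signed integers (the convention used by the DIMACS CNF format); the function name `dpll` and the example formula are made up for this sketch.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable.

    Each clause is a list of nonzero ints: 3 means x3, -3 means NOT x3.
    Variables that never need a value may be absent from the result.
    """
    assignment = dict(assignment or {})
    simplified = []
    for clause in clauses:
        remaining = []
        satisfied = False
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var in assignment:
                if assignment[var] == want:
                    satisfied = True
                    break
            else:
                remaining.append(lit)
        if satisfied:
            continue
        if not remaining:
            return None  # every literal is false: conflict
        simplified.append(remaining)
    if not simplified:
        return assignment  # all clauses satisfied

    # Unit propagation: a one-literal clause forces that literal's value.
    unit = next((c[0] for c in simplified if len(c) == 1), None)
    lit = unit if unit is not None else simplified[0][0]
    choices = [lit > 0] if unit is not None else [lit > 0, not (lit > 0)]
    for value in choices:
        result = dpll(simplified, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
model = dpll(formula)
unsat = dpll([[1], [-1]])  # x1 AND NOT x1 has no model
```

An SMT solver layers theories such as integer or real arithmetic on top of this kind of propositional search, which is why it is the better fit once constraints like `x + y < 10` appear.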

NLP - Natural Language Processing

A wiki containing an extensive listing of software and other things is at ACLWeb, and in particular, at the Tools and Software page. A small list is at the NLP Resources wiki page at agiri.org.

A particularly important theory is Dick Hudson's Word Grammar.

Other NLP resources include:

Morphology software
A wiki of morphology s/w
VerbOcean
A set of semantic-like verb frames.
FrameNet
A set of semantic-like frames. Free for personal use, but has commercial license.
WordNet
Dictionary of synonyms, antonyms, etc.

See also http://www.singinst.org/research/researchareas

General tools

Freeling

Includes a shallow parser, a sentence splitter, entity detection, sense annotation (using WordNet senses), etc.

CWB

The IMS Open Corpus Workbench (CWB) is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP.

NLP Parsers

Another kind of useful linguistic resource is the NLP parser. Some free NLP parsers are:
Link Grammar Parser
From Carnegie Mellon. A parser for the English language, based on "link grammar", a novel theory of natural language syntax. Written in C, with a BSD license. English dictionary includes 90K words. Actively maintained.
RelEx Semantic Relationship Extractor
Built on top of the Carnegie Mellon link parser. Extracts semantic structures from link data. Can handle multi-sentence corpora, detect entities, and perform anaphora (pronoun) resolution via the Hobbs algorithm. Apache v2 license. Written in Java. Actively developed/maintained.
Fluid Construction Grammars
Idea from Luc Steels. There is a LISP implementation at http://www.emergent-languages.org/ and a Java implementation at TexAI.
Maltparser
Maltparser is a system for data-driven dependency parsing, which will learn a parsing model from treebank data, and can then be used to parse new data using the induced model. Java, BSD license.

Word Sense Disambiguation

Word sense disambiguation attempts to determine which of the multiple possible semantic senses of a word is used in a sentence. A good set of references and code is on Rada Mihalcea's senseval.org page. Code is under GPL license. See also:

Entity Extraction

Other NL tasks include entity extraction. Entity extraction refers to the recognition of names, dates, and places in a body of text. Related is the recognition of technical terms. Also popular are "frameworks", which will graph things, provide user interfaces, etc.; these are useful for R&D.

Freeling
NLTK -- Natural Language Toolkit
Has a book, multiple articles. Integration with WordNet. Written in Python. Not clear whether it has an actual parser. Seems to do some sort of entity extraction, esp. for biomedical terms.
GATE - General Architecture for Text Engineering
Java, GPL'ed. Big. GATE is supplied with an Information Extraction system called ANNIE, which seems to be focused on "entity extraction". Also in use for dialogue processing and natural language generation.

Learning, data clustering

Linear classifiers, data dimension reduction, data clustering, principal component analysis, etc. An overview is given in The Impoverished Social Scientist's Guide to Free Statistical Software and Resources.
libSVM
Library that implements Support Vector Machines, one of many ways of building a linear classifier.
TiMBL
Fast, decision-tree-based implementation of k-nearest neighbor classification. Implements a half-dozen algorithms. GPL'ed. (Might not scale well for large problems?)
SimCluster
Assumes data is located on a simplex, and uses that fact in its algorithms. Includes an algorithm for PCA, a partition clustering algorithm, and agglomerative hierarchical clustering using the Aitchison distance. Command-line interface. Written in C. (No library interfaces currently defined.) GPL license.
Mfuzz
Mfuzz clustering. Aimed at genetic expression time-series data, claimed to be robust against noise. Uses R language. GPLv2 license.
Rattle
R-based data mining. GPL.
OpenBioMind
Performs clustering using genetic programming techniques (i.e. attempts to find small algorithmic expressions that will cluster the data). Omniclust is an n-ary agglomerative search algorithm. For details, see: Looks M, Goertzel B, de Souza Coelho L, Mudado M, Pennachin C. "Clustering gene expression data via mining ensembles of classification rules evolved using MOSES." Genetic and Evolutionary Computation Conference (GECCO 2007): 407-414. Java codebase.
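
The Aitchison distance mentioned under SimCluster measures dissimilarity between compositions, i.e. points on a simplex whose strictly positive parts sum to a constant. A common way to compute it is as the Euclidean distance between centered log-ratio (clr) transforms of the two points. The sketch below assumes all components are strictly positive; the function names are illustrative and are not taken from SimCluster.

```python
import math

def clr(x):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Two hypothetical three-part compositions.
a = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.3]
d = aitchison_distance(a, b)

# Rescaling a composition does not move it: only ratios between parts matter.
d_scaled = aitchison_distance(a, [10 * v for v in a])
```

Because only ratios between parts matter, `d_scaled` is zero up to floating-point error, which is the property that makes this distance suitable for data constrained to a simplex.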

Narrow AI

RapidMiner (YALE) Java data mining
OntoWiki and Powl
Semantic web development. Screenshots show business-type apps: address book, calendar, etc. Powl seems to be a class and GUI designer. GPL license.
OWL API
Java interface for the W3C Web Ontology Language OWL. LGPL license.
Siafu: an Open Source Context Simulator
Simulate individual agents
Jamocha - one engine for all your rules.
Rule engine

Robotics

MOAST
The Mobility Open Architecture Simulation and Tools (MOAST) framework aids in the development of autonomous robots. It includes an architecture, control modules, interface specs, and data sets and is fully integrated with the USARSim simulation system.
OpenJaus
Robotics messaging. Military standard.

Big computers

National Science Foundation: Google+IBM: Cluster Exploratory -- grants for large cluster science

Misc links

MapReduce
Distributed processing using key-value generation and reducing primitives. See Hadoop for an open-source implementation.
Shard databases
Shard overview describes an alternative to centralized, normalized databases.
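
The MapReduce entry above can be illustrated with a single-process sketch: a map step emits key-value pairs, the pairs are grouped by key, and a reduce step combines each group. This is plain Python and does not use Hadoop's API; the function names and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict

def map_words(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)

def map_reduce(inputs, mapper, reducer):
    """Run mapper over all inputs, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reducer(key, values) for key, values in groups.items())

docs = ["the cat sat", "the dog sat down"]
counts = map_reduce(docs, map_words, reduce_counts)
```

In a real cluster the map calls run on many machines in parallel and the grouping step moves data across the network; only the two small user-supplied functions stay the same.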