Asadai Project

"All-Singing, All-Dancing Access to Information"

© Steven O. Kimbrough

kimbrough@wharton.upenn.edu


Principal Documents

  1. "A Note on Ingredient-Product Modeling, Competitive Intelligence and Text Mining," Steven O. Kimbrough, DRAFT: 2006-12-30. .xls (Excel) file.
  2. Knowledge@Wharton article, July 2, 2003, "In Search of Serendipity: Bridging the Gap That Separates Technologies and New Markets". http://knowledge.wharton.upenn.edu/articles.cfm?catid=14&articleid=812.
  3. Briefing slides: "Information from Text: Decision Support for Product Matching" PDF. (April 2003).
  4. Briefing slides: "Information from Text" PDF. (May 2003).
  5. Briefing slides: "Practical Reasoning's Core of Discovery" PDF
  6. "Executive Briefing: Capabilities of Practical Reasoning's Core of Discovery" HTML
  7. "Overview of Practical Reasoning's Core of Discovery" HTML
  8. Dworman, Garett O., Kimbrough, Steven O., and Patch, Chuck. (2000) "On Pattern-Direct Search of Archives and Collections," Journal of the American Society for Information Science 51, no. 1. MS Word

Other Documents

Open Source Sources

  1. dmoz.org/Computers/Software/Information_Retrieval/.

External Projects and Resources

  1. Manning & Schütze website for statistical natural language processing. The current site is http://www-nlp.stanford.edu/fsnlp/.

    See also: http://www-nlp.stanford.edu/links/statnlp.html.

  2. http://www.gutenberg.net/. Project Gutenberg: electronic texts now past copyright.
  3. UNITED STATES NATIONAL INFORMATION INFRASTRUCTURE VIRTUAL LIBRARY.
  4. Hearst, Marti A. (1999) "Untangling Text Data Mining," Proceedings of ACL'99: the 37th Annual Meeting of the Association for Computational Linguistics. Paper
  5. Arrowsmith Home Page
  6. Center for Intelligent Information Retrieval at U Mass.
  7. UTS page on Knowledge Management.
  8. ATT Worldnet:Reference|Knowledge Management.
  9. Information Retrieval by Keith van Rijsbergen.
  10. Knowledge Management at Brookings.
  11. "Dear Steve, Here is the URL where I found the hierarchy data. The files I downloaded were the ones that end with lcco_*.wpd." (Annapurna)

Newspaper and Trade Press Documents

  1. "Document Reading Made Easy: A new software might help journalists sort through reams of documents in minutes," by Rebecca Fairley Raney, posted 2002-07-02. "For the reporters who have devoted themselves to documents, spending hours reading school board agendas backwards to catch every detail, Murray Craig's invention holds a definite appeal."

    Licensed by eNeuralNet.

  2. "A Scholar Recants on His 'Shakespeare' Discovery" The New York Times 6/20/02, by William S. Niederkorn. "intertextual analysis" and "stylometrics" (discourse analysis)

    Also locally in plain text.

  3. Andrew Warzecha, December 23, 2000. "Differentiating Content Management, Document Management, and Portals. As the definitions of these three product categories become more blurred, it's important to learn the distinct characteristics of each." The Meta Group: www.metagroup.com. HTML
  4. "Taxonomies for Enterprise Knowledge", Jan. 2001, Knowledge Management Magazine.

Trade Press

  1. Knowledge Management Magazine.
  2. SearchTools, reviews of search engines.

Interesting Companies and Products

Text Mining Companies and Products

  1. ClearForest.
  2. eNeuralNet. (2002-7-6). "eNeuralNet has created the next generation of Business Intelligence tools... The Knowledge Management (KM) world has been waiting for the next breakthrough technology. eNeuralNet's offering to the marketplace is so unique and powerful that it will be the ASP of choice for every consultant and hardware vendor that has truthfully evaluated the alternatives and desires to transform their clients into 'knowledge powered enterprises.' eNeuralNet has the ability to take vast, unorganized streams of linear data, either paper or electronic, and translate that data into a lightning fast, single multi-dimensional Knowledge Cube that provides unprecedented access and context to your corporate knowledge created (or recovered) from your own data!"
  3. Megaputer (2002-7-4)

General

  1. http://www.srdnet.com/nora.htm, Detecting patterns is the key to fraud prevention. Systems Research & Development (SRD) has a collusion detection product called Non-Obvious Relationship Awareness or NORA.
  2. http://www.kdnuggets.com/, KDNuggets, Data Mining, Knowledge Discovery, Genomic Mining, Web Mining
  3. http://www.leximancer.com/kdnuggets.html, Leximancer - Practical Text Mining and Concept Mapping
  4. http://www.textanalysis.info/, Text Analysis Info is a free information source for information that deals with the analysis of content of human communication, mostly but not limited to text. Several programs support the coding of audio, video, or even chatroom sources.
  5. Intelligent Results (5/14/02).
  6. InMentia.
  7. Top Yield (3/5/02) from Tate.
  8. Stratify (formerly PurpleYogi). (2/24/02).
  9. Intelliseek
  10. Mohomine
  11. in-q-tel.com
  12. Knowledge-Base, Gordon Freedman & Co.
  13. Stellent. A knowledge management company.
  14. LivingText.
  15. Convera.
  16. Cognisphere.
  17. Vantage (accessed 1/2/01)
  18. Quantum Leap Innovations.
  19. Ejemoni. (Garett Dworman)
    H5 Technologies.
  20. Intraspect Software.
  21. Autonomy.
  22. NextPage, maker of Folio Views.
  23. Groove and GrooveNetworks.
  24. 80-20 Software, 425-739-6767.
  25. Mohomine, 858-362-3000.
  26. Smartlogik, (44) 020-793-069-00.
  27. Microsoft's Tahoe project (see Google and "Microsoft Tahoe")
    At Microsoft.
    Review of SharePoint (Microsoft Tahoe)

    SharePoint home page.

    See also Microsoft's Okapi search engine (e.g., TREC7).

  28. Content magazine. ContentWorld.
  29. Medline.
  30. North American Industry Classification System (NAICS).
  31. MARC 21 format info from The Library Corporation.
  32. Systems Planning, Inc., Stephen Toney and MARC View.
  33. TREC Home Page.
  34. U.S. Patents (Inc.)

Text Data Sources

  1. SEC: www.sec.gov
  2. Free Edgar
  3. Edgar On Line.

Computational Linguistics Etc.

  1. The Association for Computational Linguistics.
  2. Computational Linguistics - Resources and Institutions.
  3. Computational Linguistics (MIT).
  4. Linguistics Meta-index.

Useful services/utilities

  1. PDF conversion

Classification Schemes

  1. Yaki TECHNOLOGIES, ICD-9
  2. ICD-9 Organization
  3. ICD-9 codes
  4. MeSH home
  5. Unified Medical Language System (UMLS)

Search Engines

  1. Teoma.

Demos

  1. CoD on Bad.
  2. Asadai on Tip-It

Sources for Information

  1. Patents: US Patent and Trademark Office. Note: impact resistance.
  2. SEC: Securities and Exchange Commission

Text Processing and Parsing

  1. Charming Python: Text processing in Python at IBM by David Mertz.
  2. Charming Python: Parsing with the SimpleParse module at IBM by David Mertz, January 2002.

Useful Sites

  1. Christopher Browne's Web Pages. "Text/Document Databases." EDMS-Electronic Document Management Systems. Text Database Systems.
  2. Yahoo's Industry Summary page.
  3. Delphion, "Intellectual Asset Management".

Demos

  1. TextTell on Stella.

Research and Organizations

  1. Association for Literary and Linguistic Computing.
  2. An Overview of Public Domain Language Engineering Generic Tools.
  3. Stanford Natural Language Processing Group.
  4. Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze.
  5. Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources.
  6. PennTools: Computational Linguistics Resources At Penn.
  7. Steven Bird's CIS 530 home page.
  8. NLTK: Natural Language Tool Kit.
  9. Do-It-Yourself Corpus Linguistics. Language Discovery Tools for Teachers, Translators, and Writers.
  10. Concordance (concordance software): "Software for text analysis gives you better insight into electronic texts."
  11. Concordancers: MonoConc. MonoConc commercial site.
  12. Tutorial: Concordances and Corpora by Catherine N. Ball.
  13. Dave Ness at Comcast.
  14. PHYLIP. PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executables.

Open Directories

  1. dmoz.org, the Open Directory Project. Lots of links to open (freely available) documents and software.

Specifications, Requirements, Rules

  1. http://dodssp.daps.mil/, DODSSP, The Department of Defense Single Stock Point for Military Specifications, Standards and Related Publications.
DuPont Report.