Knowledge Bases

The Central Information system is envisioned as a repository of information and experience, ranging from library books for general reading, to a computer database to map topics and indices of the materials available, to interactive computer programs and instructional video “how-to's” to relate what we have learned, in sufficient detail that the system can almost become self-instructive.
User avatar
LoneBear
Legatus Legionis
Legatus Legionis
Posts: 3578
Joined: Thu Jul 22, 2004 12:38 am
Location: Utah
Contact:

Knowledge Bases

Post by LoneBear » Sat Sep 24, 2011 3:20 pm

We've accumulated a lot of knowledge on the Antiquatis site, and one of the things I want to include for the Monastery is a knowledge base (KB) that not only contains religious and mythological information, but can also correlate it with scientific data. But representing such information is somewhat of a challenge. I'm starting this topic to throw out some ideas on how I would like to be able to store and associate information.

When dealing with historical data, a couple of things happen: names change over time and original concepts get buried in related concepts. This is nothing new; it's the way the human mind prefers to represent information--I believe it is called "chunking." It makes information more manageable by reducing details to general concepts. Such a knowledge base should account for that, but with the use of a computer, should be able to retain all the details with a grouping or summary function surrounding them. The chronological changes need to be accounted for, so you can backtrack to the source.

There are also cultural differences, particularly in religions. Take, for example, the Norse warrior gods, the Æsir. The Vedas contain a very similar group, the Asura, which may or may not refer to the same bunch of gods. But they are related by generic patterns of behavior. So knowledge must also have some kind of "source" tag or microtheory to identify the general mythology it originated from, as well as a behavioral relationship (warrior gods) to associate the concepts.

Most mythology relates gods by offspring and a type of military ranking, to determine their placement in the pantheon. In my discussions with Gopi (my Norse and his Vedic), we found a remarkable number of similarities, based on phonetic association, behavioral association, and similar stories told of them. I consider this far more important than who begat whom, as it starts to build a correlation between different religions, showing that there may be a common factor.

What I would like in a KB is the ability to somehow quantify these associative attributes, like virtues and vices, and define how these motifs act as a behavioral basis (like zodiac influence) and interact between entities. It would be interesting to get the behavioral system designed to the point where one could predict the interaction, then verify it against the stories told. That would give some interesting clues to how ethics applies at an individual level.

It would be fascinating to punch in all the info from religions around the world, defining the gods, stories and defining moments, then let the computer run a comparison analysis to see if there is a "root" story behind it all. Can you imagine what that would do to the spiritual side of people, to be able to prove that we're actually all talking about the same thing?

Re: Knowledge Bases

Post by LoneBear » Sun Sep 25, 2011 4:26 pm

Some other features I would like, since I accidentally deleted some data today and had to spend a couple hours sifting through backups...

All content and associations should support revisions, roll back and roll forward. This is part of that chronological system: if you screw something up, it would be nice to reset it to what it was (roll back) or, after a reset, to return to what you had done (roll forward). This would include things like user and profile information (which should just be another chunk of data in the DB) as well as associations--if I had a good start at entering the hierarchy of Norse gods, then blew an update, I would like to roll back that relation to my last "save point" (revision) and start again, rather than have to spend the time figuring out what damage I did and attempting to correct it.

This could be done two ways, sort of like the auto-save on an editor and a manual save. Anytime something is updated, it should make a new revision, but it would be nice to set a tag as to where you left off, on a set of updates. (Same features in a CMS--code management system).
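As a rough illustration of the auto-save plus manual save-point idea, here is a minimal Python sketch. The class name `RevisionedRecord`, its methods and the sample data are all hypothetical, and it assumes a simple linear history in which a new edit made after a roll back discards the revisions that were rolled past.

```python
class RevisionedRecord:
    """Keeps every saved value so edits can be rolled back or forward."""

    def __init__(self, initial):
        self._revisions = [initial]   # full history, oldest first
        self._current = 0             # index of the active revision
        self._tags = {}               # save-point name -> revision index

    @property
    def value(self):
        return self._revisions[self._current]

    def update(self, new_value):
        # Assumption: a new edit discards any "future" left by a roll back.
        del self._revisions[self._current + 1:]
        self._tags = {n: i for n, i in self._tags.items() if i <= self._current}
        self._revisions.append(new_value)   # every update is a new revision
        self._current += 1

    def tag(self, name):
        # Manual save point: remember where this set of updates left off.
        self._tags[name] = self._current

    def roll_back(self, to_tag=None):
        if to_tag is not None:
            self._current = self._tags[to_tag]
        elif self._current > 0:
            self._current -= 1

    def roll_forward(self):
        if self._current < len(self._revisions) - 1:
            self._current += 1


pantheon = RevisionedRecord({"Odin": []})
pantheon.update({"Odin": ["Thor"]})
pantheon.tag("good-start")
pantheon.update({"Odin": ["Thor", "Loki"]})   # the mistaken update
pantheon.roll_back(to_tag="good-start")
print(pantheon.value)    # the state at the save point
pantheon.roll_forward()
print(pantheon.value)    # the later edit again
```

A real system would store the revisions in the database rather than in memory, but the bookkeeping (a history list, a current pointer and named tags) would look much the same.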

Also, any deletions should be put in a "trash bin" (something nice about Windows that Linux doesn't do with ext3), so anything deleted can be recovered for at least a predefined period. Something that's been in the trash for say, a month, would automatically be deleted so there is a safety, along with the system being self-cleaning.

And with all the hassles I've gone through with the Drupal 7 upgrade, the system also needs a way to export/import the stored information in a code-independent fashion--not just a dump of the database, which is highly code-dependent, requiring specific names and ordering. The dump should be human-readable with URIs, rather than internal numbering schemes.
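One way to picture such an export is sketched below in Python. The `http://example.org/ci/` namespace, the record layout and the field names are assumptions made up for the example; the point is only that internal row numbers are swapped for URIs before anything is written out.

```python
import json

BASE_URI = "http://example.org/ci/"   # hypothetical namespace

def export_records(records):
    """Replace internal numeric ids with URIs and emit readable JSON."""
    uri_of = {r["id"]: BASE_URI + r["slug"] for r in records}
    exported = []
    for r in records:
        exported.append({
            "uri": uri_of[r["id"]],
            "label": r["label"],
            # Relations point at URIs, never at internal row numbers.
            "related": [uri_of[i] for i in r["related_ids"]],
        })
    return json.dumps(exported, indent=2, ensure_ascii=False)

records = [
    {"id": 17, "slug": "norse/aesir", "label": "Æsir", "related_ids": [42]},
    {"id": 42, "slug": "vedic/asura", "label": "Asura", "related_ids": [17]},
]
dump = export_records(records)
print(dump)
```

Because the dump carries no internal ids, an importer on a different code base can assign whatever numbering it likes and rebuild the relations from the URIs.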

The last thing I learned from Drupal was that all the code modules need to be backwards compatible for at least 1 major version (Drupal policy is NO backwards compatibility). The killer on the D7 upgrade was that half the user-written modules I was using (like playing audio and video) have not been ported over to version 7, so I'm just out of luck unless I find some way to convert them for use with another module, which usually means keeping the old version around and recreating all those pages from scratch--a very time-consuming process (that I'm now going through with rstheory.org). If they had 1-version compatibility, modules designed for D6 would continue to work in D7 (but not in D8). That gives you some room to maneuver when upgrading.

Modules and hooks

Post by LoneBear » Tue Sep 27, 2011 12:35 pm

The feature I like most about Drupal is at the programming level, where it uses "modules" and "hooks" to add functionality.

In other software, like phpBB (the forum software), upgrades are a pain because after you update the phpBB code, you have to manually go in and edit the files to add code for other features, like the "recent users" list down the bottom. Of course, they change things and sometimes it is hard to find where to put your code and what you have to do to make it work again.

With the hook system, Drupal automatically checks for functions, based on a naming convention, that should be called before, during or after an operation. Since it is automatic, all you have to do to add features is to add an appropriately named function to your module and it's there. No editing of the core code (as in phpBB).
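The naming-convention idea can be sketched in a few lines of Python. This is not Drupal's implementation; the `invoke_hooks` helper, the stand-in modules and the hook names `page_footer` and `cron` are assumptions chosen for illustration.

```python
import types

def invoke_hooks(modules, hook):
    """Call <module name>_<hook>() on every module that defines it."""
    results = []
    for module in modules:
        func = getattr(module, f"{module.__name__}_{hook}", None)
        if callable(func):          # modules without the hook are skipped
            results.append(func())
    return results

# Two stand-in "modules" built in memory; real ones would be imported files.
audio = types.ModuleType("audio")
audio.audio_page_footer = lambda: "audio player"

recent = types.ModuleType("recent")
recent.recent_page_footer = lambda: "recent users list"
recent.recent_cron = lambda: "pruned stale sessions"

print(invoke_hooks([audio, recent], "page_footer"))
print(invoke_hooks([audio, recent], "cron"))
```

The core only ever calls `invoke_hooks`; adding a feature means adding one correctly named function to a module, with no edits to the core code.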

For example, when you install a Drupal module, you provide functions for install(), enable(), disable() and uninstall(). When the admin turns the module on, install() is called so your code can do any setup it needs, then enable() is called to tell it to start processing. You can disable a module, which keeps the installed stuff there (like database tables, etc.) but inhibits any of its functions from being called for processing. Uninstall is the reverse of install and deletes all traces of the module.

It is a very clean way to handle features.

Security

Post by LoneBear » Fri Sep 30, 2011 2:19 pm

All pages on CI must have a fairly fine-grained access control system. It should be possible to limit access to "objectionable" material, such as Christian references in Islamic countries, by tagging pages with a taxonomy of terms and looking up the visitor's IP address to provide suitable content to the net surfer / lurker. If a person is interested enough, they would create an account and access to additional material would become available.

To wit, the blind-door security method is one of the better ones, where you just don't see what you don't have access to (unlike ghosting menus). It is also better at discouraging hacking. That would include removing items from menus, entire menus if no content is accessible, as well as pages, text and graphics. It may be possible to set a "context" in a user profile to determine what kind of material they are interested in, so if they visit a page that is outside that context, a warning can be displayed, like a parental warning for adult material, to get their consent first. That should take care of a lot of problems regarding people getting content "they didn't want."
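A minimal Python sketch of the blind-door approach for menus follows. The `visible_menus` function, the menu layout and the `required_tag` field are hypothetical; the behavior shown is simply that inaccessible items are omitted and a menu with nothing left disappears entirely.

```python
def visible_menus(menus, user_tags):
    """Return only the menus and items the user may see; drop empty menus."""
    result = {}
    for title, items in menus.items():
        allowed = [item["label"] for item in items
                   if item["required_tag"] is None
                   or item["required_tag"] in user_tags]
        if allowed:                 # a menu with no accessible items vanishes
            result[title] = allowed
    return result

menus = {
    "Library": [
        {"label": "General reading", "required_tag": None},
        {"label": "Monastic blogs", "required_tag": "monastic"},
    ],
    "Administration": [
        {"label": "User accounts", "required_tag": "admin"},
    ],
}
print(visible_menus(menus, user_tags=set()))          # anonymous visitor
print(visible_menus(menus, user_tags={"monastic"}))   # monastery member
```

The same filter would have to be applied to pages, text and graphics, since hiding a menu entry alone does not stop someone from requesting the page directly.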

Re: Knowledge Bases

Post by LoneBear » Sun Oct 02, 2011 2:58 pm

Something else that goes hand-in-hand with security is data integrity. In the many years I've worked with computers, I have found that it is better to ensure data integrity in the database--in other words, don't let bad data get in there (code-injected stuff, bad relations, invalid characters, etc.). That's one of Drupal's messy spots. It has poor documentation and a lot of user-contributed code whose authors never read what documentation there was, so a lot of the coding rules were not followed. Their attempt to fix it was through the use of "filters", which process the information coming out of the database and selectively filter out stuff, such as limiting which HTML tags can be sent to the browser. Problem is--anything can go in; it is only cleaned up for output.

I would like to see good coding standards, a transactional database system (like InnoDB) so if an operation is interrupted, the transaction can be rolled back and no partial relationships will exist in the database, and a good filter system on INPUT to inhibit the use of code injection and defective data. Graphical interfaces are much nicer in this respect, as it is drag-and-drop, versus typing or selecting from a list.
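To show both halves (validation on input and all-or-nothing transactions), here is a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for a transactional engine such as InnoDB; the table layout, the `add_relation` function and the name pattern are assumptions for the example.

```python
import re
import sqlite3

# Letters only, with single spaces, apostrophes or hyphens between words.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '-][^\W\d_]+)*")

def add_relation(conn, parent, child):
    """Insert both entities and their link, or nothing at all."""
    for name in (parent, child):
        if not NAME_PATTERN.fullmatch(name):     # filter on INPUT
            raise ValueError(f"rejected on input: {name!r}")
    with conn:   # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO entity(name) VALUES (?)", (child,))
        conn.execute("INSERT INTO entity(name) VALUES (?)", (parent,))
        conn.execute("INSERT INTO relation(parent, child) VALUES (?, ?)",
                     (parent, child))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity(name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE relation(parent TEXT, child TEXT)")
conn.commit()

add_relation(conn, "Odin", "Thor")
try:
    # "Baldr" goes in, then the duplicate "Odin" fails: both are rolled back.
    add_relation(conn, "Odin", "Baldr")
except sqlite3.IntegrityError:
    pass
try:
    add_relation(conn, "Thor", "<script>alert(1)</script>")
except ValueError as err:
    print(err)

names = sorted(row[0] for row in conn.execute("SELECT name FROM entity"))
print(names)   # no partial "Baldr" row survives the failed transaction
```

The queries use `?` placeholders rather than string pasting, which is the usual guard against SQL injection; the pattern check is a separate, deliberately strict rule about what counts as a valid name.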

Calendars, dates and times

Post by LoneBear » Mon Oct 03, 2011 9:58 am

We also need a standardized way to define calendars, dates and times, as well as time zones (geographically adjusted time of day). Unix timestamps count from 1970, so earlier dates need negative values that not all software handles. Gregorian and Julian dates, which diverged in 1582, now differ by 13 days. And there are many other calendars.
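One common way to put different calendars on a single footing is the astronomers' Julian Day Number (JDN), a plain count of days. The Python sketch below uses the standard integer-arithmetic conversion formulas; the function names are my own, and years follow astronomical numbering (year 0 is 1 BC, year -1 is 2 BC).

```python
def _shifted(year, month):
    """Shift the year so it starts in March, as the formulas require."""
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3

def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a Gregorian calendar date."""
    y, m = _shifted(year, month)
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

def julian_to_jdn(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

# The day after Julian 4 October 1582 was Gregorian 15 October 1582.
print(gregorian_to_jdn(1582, 10, 15), julian_to_jdn(1582, 10, 5))

# The same calendar label now falls 13 days later in the Julian calendar.
print(julian_to_jdn(2011, 10, 3) - gregorian_to_jdn(2011, 10, 3))
```

Storing the day number and converting to a named calendar only for display would let other calendars be added later as further conversion functions.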

I'm thinking that with the accuracy of computers and astronomical measurement, a better system of dating can be used, with a granularity to it, as "Coordinated Universal Time" (UTC) is anything but "Universal." It would be nice to have an astronomical clock, such that the locations of constellations and planets could be determined for any period in history or the future, as well as "local times" for any planet in the solar system, and for Earth, a standard time that can be adjusted in user preferences to a local time based on the time zone system.

User avatar
Tulan
Cellarius
Cellarius
Posts: 453
Joined: Wed Aug 31, 2005 8:04 pm
Location: Austin, TX
Contact:

Re: Knowledge Bases

Post by Tulan » Tue Oct 04, 2011 12:23 pm

> What I would like in a KB is the ability to somehow quantify these associative attributes, like virtues and vices, and define how these motifs act as a behavioral basis (like zodiac influence) and interact between entities. It would be interesting to get the behavioral system designed to the point where one could predict the interaction, then verify it against the stories told. That would give some interesting clues to how ethics applies at an individual level.

RDF/OWL graph databases and reasoners abound with this type of functionality. The time-consuming part (it's not even really difficult) will be building the RDF vocabularies. Luckily, various groups have put a lot of effort into creating a generalized vocabulary of concepts from which more fine-grained relationships, taxonomies, vocabularies, etc. can be constructed. UMBEL is the most notable: it takes the concept maps of OpenCYC and a few other big projects and makes them general, friendly, and standardized.
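For readers who have not met graph stores, the underlying data model is just subject-predicate-object triples. The following is a plain-Python sketch of that model, not the API of any RDF library or of AllegroGraph; the predicate names `source`, `behaves_as` and `sounds_like` are invented for the example.

```python
triples = {
    ("Aesir", "source", "Norse"),
    ("Asura", "source", "Vedic"),
    ("Aesir", "behaves_as", "warrior gods"),
    ("Asura", "behaves_as", "warrior gods"),
    ("Aesir", "sounds_like", "Asura"),
}

def match(pattern):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if all(p is None or p == v for p, v in zip(pattern, t)))

def shared_behaviour(a, b):
    """Behaviours recorded for both entities, whatever their mythology."""
    def behaviours(entity):
        return {obj for _, _, obj in match((entity, "behaves_as", None))}
    return behaviours(a) & behaviours(b)

print(match((None, "source", None)))
print(shared_behaviour("Aesir", "Asura"))
```

A real reasoner adds inference rules on top of this kind of pattern matching, but the "source" tag and the behavioral association from the first post map directly onto triples like these.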

> All content and associations should support revisions, roll back and roll forward.

This is a no-brainer. In all of my web applications these days I have an abstraction layer on top of my database models that lets me give a model a simple attribute like "__history_meta__". That attribute activates a post-commit hook on any UPDATE/DELETE operation, which INSERTs the record as it was before modification, along with a version number and an "iteration timestamp".

I also make heavy use of Mercurial (distributed version control) to version the database binaries themselves on a daily basis, so I have an entire history of the DB.

The cool thing about an RDF graph database like AllegroGraph is that it's ACID compliant (supports Commit, Rollback, and Checkpointing) like InnoDB, so models can wrap their work in database session transactions (and roll back on database errors, or even application errors).

What I do for my current company is run two different databases: a relational DB for the standard users/profiles/sessions/ACL/roles/etc. type of data (where domain models are abundant), and Riak (a horizontally scalable, distributed key-value store) for data that requires less structure. Riak is primarily used for a product that produces enormous amounts of data, something we can MapReduce on.

I would do something similar for CI - use PostgreSQL (relational) for the standard user and app related data then use AllegroGraph for the actual knowledge representation and reasoning systems.

> Also, any deletions should be put in a "trash bin".

In all of my web applications my domain models are built with a "removed" column, and the models automatically query the data with a constraint for "removed=False". This way data can be "removed" without being "deleted", and you can then run a cron job once a month that prunes out data that has been "removed" for more than a month. It's really effective and works well with versioned backups of the DB too (that way your operating data set is kept manageable, but if you have some need to retrieve all of that old data, you can still dip into backups).
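A compact Python sketch of the "removed" column idea follows, using the standard-library `sqlite3` module. The `page` table, the extra `removed_at` column and the helper names are assumptions for illustration; the scheduled pruning job is represented by an ordinary function call.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page(title TEXT, removed INTEGER DEFAULT 0, removed_at TEXT)")
conn.executemany("INSERT INTO page(title) VALUES (?)",
                 [("Aesir",), ("Asura",), ("Draft",)])

def remove(title, when):
    """Flag the row instead of deleting it."""
    conn.execute("UPDATE page SET removed = 1, removed_at = ? WHERE title = ?",
                 (when.isoformat(), title))

def live_titles():
    """What the application normally sees: removed rows are filtered out."""
    rows = conn.execute("SELECT title FROM page WHERE removed = 0 ORDER BY title")
    return [row[0] for row in rows]

def prune(now, keep_days=30):
    """The periodic job: really delete rows removed more than keep_days ago."""
    cutoff = (now - timedelta(days=keep_days)).isoformat()
    cursor = conn.execute(
        "DELETE FROM page WHERE removed = 1 AND removed_at < ?", (cutoff,))
    return cursor.rowcount

now = datetime(2011, 10, 4, 12, 0)
remove("Draft", now - timedelta(days=45))   # long gone
remove("Asura", now - timedelta(days=2))    # recent mistake
print(live_titles())
print(prune(now))                           # number of rows purged
conn.execute("UPDATE page SET removed = 0, removed_at = NULL WHERE title = 'Asura'")
print(live_titles())                        # the recent removal is recovered
```

Recording when a row was removed, not just that it was, is what lets the pruning job honor the "at least a predefined period" guarantee.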

> The feature I like most about Drupal is at the programming level, where it uses "modules" and "hooks" to add functionality.

In most modern web application frameworks this is also standard practice. I use Python (the language) and Pyramid (a Python web application framework). Pyramid has a very powerful hook, extension, and meta-programming substrate (it really is a substrate: the entire framework is built on that system, and it uses C-extension interfaces to make the object lookup super fast). It also makes use of the Zope Component Architecture to make it even more flexible, not to mention great use of object traversal.

PHP is now falling behind Ruby and Python, both as a language and in the number of libraries and tools available.

> All pages on CI must have a fairly well grained access control system.

This is also taken care of on the web application side. Pyramid has a powerful (but general enough) ACL/Auth system baked in; you can define ACLs at the object level, which lets you inherit rules, extend rules, and use many other features I can't really articulate at this moment. I can only say it's fairly easy to do with Pyramid!

> Something else that goes hand-in-hand with security is data integrity. In the many years I've worked with computers, I have found that it is better to ensure data integrity in the database--in other words, don't let bad data get in there (code-injected stuff, bad relations, invalid characters, etc.).

This is a two-part problem. One part is poor programming standards when it comes to the database abstraction; this is taken care of by Pyramid + SQLAlchemy (which has stringent data modeling and custom types to ensure SQL injection, code injection, and bad character encoding don't and can't happen, since you [the programmer] never ever write raw SQL). The second part is just plain old stupid data inserted by the user. That's something we have to handle with stringent validation logic, which I've also spent the last 4 years building a lot of, so I generally consider that taken care of too.

There's a reason why I got away from PHP - most of these problems are now well known and solved in the Python and Ruby web programming communities, however it's something that seems to continue plaguing PHP apps (it's because PHP is a horrible language for anything beyond quick little 1-2 page web scripts and for some reason, PHP seems to attract both really good and really bad programmers; sadly there are far more bad PHP programmers than good that end up with their code being popularized or worshipped for some odd reason).

> I would like to see good coding standards, a transactional database system (like InnoDB) so if an operation is interrupted, the transaction can be rolled back and no partial relationships will exist in the database.

I definitely agree on coding standards. Central Information will probably be structured much like a rigorous open-source project, where standards have to be high for cohesion and code quality to be maintained. I would also say that systems like BitBucket (or, more likely, something self-hosted) and Mercurial (a distributed version control system) are necessities for bug tracking, new feature requests, ticketing, commit tracking, etc. It's the standard fare these days; the usual alternatives are GitHub and git.

> I'm thinking that with the accuracy of computers and astronomical measurement, a better system of dating can be used, with a granularity to it, as "Coordinated Universal Time" (UTC) is anything but "Universal." It would be nice to have an astronomical clock, such that the locations of constellations and planets could be determined for any period in history or the future, as well as "local times" for any planet in the solar system, and for Earth, a standard time that can be adjusted in user preferences to a local time based on the time zone system.

I really like this idea a lot!
Ah, you seek meaning? Then listen to the music, not the song. - Kosh Naranek

Re: Knowledge Bases

Post by LoneBear » Tue Oct 04, 2011 1:11 pm

Are you to the point where you can propose the tools to use, and for what parts of the system?

I would like to start taking a look at the various tools and checking for support on various operating systems.

I've always had a preference for "compiled" languages and strict data typing, as I grew up when "high-speed" meant 200 kHz CPU speeds, every cycle counted, memory was limited and there was not enough run-time checking to keep things from overwriting memory. I know that is not the case these days, but I consider programming to be more of an art than a science.

I would also prefer that all the code used was open source, GPL licensed. (Getting a little tired of Windoze these days. I miss AmigaDOS!)

BTW... impressed with how far you've come with computers. Looks like you've developed a lot of good, practical skills--something missing in the industry today.

Error Handling

Post by LoneBear » Tue Oct 04, 2011 1:15 pm

The system MUST have good error handling. I've been pulling out my hair debugging the Drupal 7 problems since the site upgrade. Nothing is more annoying than clicking "submit" and ending up with a blank screen. Or, as in the case of Ubercart, "Unable to retrieve shipping quotes." WHY??? Indicating something failed is insufficient--errors need to report sufficient information so the problem can be identified and corrected quickly and efficiently.

I prefer the "exception" system (try {} catch {} blocks) as it has good granularity and is modular. It makes it easy to roll back a failed database transaction in one shot and to selectively handle "expected" errors.
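As an illustration of an error that explains itself, here is a short Python sketch. It has nothing to do with Ubercart's actual code; `ShippingQuoteError`, `fetch_quote`, the carrier name and the tariff are all made up to show an exception carrying its cause and the request that triggered it.

```python
class ShippingQuoteError(Exception):
    """Carries enough context to diagnose the failure, not just announce it."""

    def __init__(self, carrier, reason, request):
        super().__init__(f"no quote from {carrier}: {reason} (request={request})")
        self.carrier = carrier
        self.reason = reason
        self.request = request

def fetch_quote(carrier, weight_kg, postcode):
    request = {"weight_kg": weight_kg, "postcode": postcode}
    if weight_kg <= 0:
        raise ShippingQuoteError(carrier, "weight must be positive", request)
    if not postcode:
        raise ShippingQuoteError(carrier, "destination postcode is missing", request)
    return 4.0 + 1.5 * weight_kg     # made-up flat tariff

def quote_or_message(carrier, weight_kg, postcode):
    try:
        return f"{fetch_quote(carrier, weight_kg, postcode):.2f}"
    except ShippingQuoteError as err:
        # An "expected" error: handled here and reported with its cause.
        return f"Unable to retrieve shipping quotes: {err}"

print(quote_or_message("ExampleCarrier", 2, "84101"))
print(quote_or_message("ExampleCarrier", 2, ""))
```

Only the expected exception type is caught; anything else still propagates, so genuine bugs are not silently turned into a friendly message.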

Re: Knowledge Bases

Post by Tulan » Tue Oct 04, 2011 2:12 pm

> Are you to the point where you can propose the tools to use, and for what parts of the system?

Yes, I'll put together a proposal document for you to review and comment on. I'll go from there.

> I would like to start taking a look at the various tools and checking for support on various operating systems.

I think if we stick with the web application idea for now (browsers are compatible on all OSes) it will be easier to get jump-started; building custom browser applications can be considered later on.

> I've always had a preference for "compiled" languages and strict data typing, as I grew up when "high-speed" meant 200 kHz CPU speeds, every cycle counted, memory was limited and there was not enough run-time checking to keep things from overwriting memory.

Yeah, these days (particularly for web applications) dynamic languages are great for the majority of tasks, and if you ever need raw speed you can dip down into something compiled. Python is great for that: you can use vanilla Python for pretty much anything, and if something is bottlenecking you, you can write it in C as a Python extension and use it from your Python code (while keeping the speed).

> I would also prefer that all the code used was open source, GPL licensed.

Everything I use and would propose to use is open-source. Closed-source software is a nightmare and I wouldn't dream of using it.

> The system MUST have good error handling. I've been pulling out my hair debugging the Drupal 7 problems since the site upgrade. Nothing is more annoying than clicking "submit" and ending up with a blank screen. Or, as in the case of Ubercart, "Unable to retrieve shipping quotes." WHY??? Indicating something failed is insufficient--errors need to report sufficient information so the problem can be identified and corrected quickly and efficiently.
>
> I prefer the "exception" system (try {} catch {} blocks) as it has good granularity and is modular. It makes it easy to roll back a failed database transaction in one shot and to selectively handle "expected" errors.

Yeah, you've been missing out. Python web frameworks provide some really cool error handling, logging, and interception middleware. Here's an example: if your code throws an exception (like a type error or an undefined variable error) while you are in debug mode for development, it will give you an interactive stack trace with the object context for each execution path, plus an actual view of the code.

In production, this system emails that exact stack trace as an HTML email to a specified address and shows the user a friendlier (and less revealing) page; it also logs to a file on the server. This makes development a no-brainer and going live very safe.

http://ixmat.us/data/dow ... ng

Role-based security

Post by LoneBear » Fri Oct 07, 2011 3:34 pm

One of the other features I like in Drupal is the role-based security, where users can be granted "roles" (like job descriptions) that grant privileges associated with those roles. Things like administration, editing, authenticated user, anonymous user, etc.

A couple of drawbacks that I've found...

Drupal starts with "deny all" and goes through the security system looking for "grant" access. It would also be nice to have "deny" access (like the OpenVMS identifier-based ACLs), but that would also require a logical ordering sequence (priority) for checking the access privs, so a high-order "deny" would take precedence over several low-order "grants".
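The ordering requirement can be sketched in Python as a list that is scanned top-down, with the first matching entry deciding the outcome. The `check_access` function, the role names and the first-match rule are assumptions for the example rather than a description of Drupal or OpenVMS internals.

```python
def check_access(acl, user_roles, permission):
    """First matching entry wins; entries are ordered highest priority first."""
    for effect, role, perm in acl:
        if role in user_roles and perm == permission:
            return effect == "grant"
    return False   # nothing matched: fall back to "deny all"

acl = [
    ("deny",  "suspended", "post"),   # high-order deny
    ("grant", "editor",    "post"),   # low-order grants
    ("grant", "member",    "post"),
    ("grant", "anonymous", "read"),
]
print(check_access(acl, {"member", "editor"}, "post"))
print(check_access(acl, {"member", "editor", "suspended"}, "post"))
print(check_access(acl, {"anonymous"}, "post"))
```

With this layout the priority is simply the position in the list, so a single "deny" placed above the grants overrides any number of them.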

Access permissions are definable by module, or by "node type" (what is being queued for display on a browser). Nodes are based on content, blocks, or menus. For example, R/W access can be given to a "forum node" so a registered user can post topics, or the "editor" role can be granted to a user to allow R/W access to forums, blogs, pages, stories, etc. A nice update would be to allow any content to carry tags that link to the roles, so a forum page could be tagged as a forum, or a monastic forum, etc. The posting user would determine which categories apply, and the roles would give appropriate access to those categories.

The last missing feature is a way to group roles in a hierarchy, sort of like a department has a number of job descriptions, but the department itself also has specific access features. I was thinking of the monastic aspects in this case, where, for example, all those involved in the monastic aspects could read the blogs of others in the monastery, but those blogs would not be available to the general public or other Institute sections. Sort of a "monastic read" access, rather than having to maintain read access to the different levels of study.
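A role hierarchy of this kind can be sketched in Python as a parent lookup that is walked upward, collecting grants along the way. The role names, group names and permission strings below are hypothetical.

```python
PARENT = {                      # hypothetical role -> group it belongs to
    "novice": "monastery",
    "adept": "monastery",
    "monastery": "institute",
    "physicist": "science",
    "science": "institute",
}
GRANTS = {
    "monastery": {"read monastic blogs"},   # the "monastic read" access
    "institute": {"read forums"},
}

def permissions(role):
    """Collect grants from the role and from every group above it."""
    found = set()
    while role is not None:
        found |= GRANTS.get(role, set())
        role = PARENT.get(role)
    return found

print(sorted(permissions("novice")))
print(sorted(permissions("physicist")))
```

The grant is attached once to the "monastery" group, so every level of study under it inherits read access without being maintained separately.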

Applications

Post by LoneBear » Tue Oct 11, 2011 9:09 pm

Normally, a website only has a single "registration" to get access. In the case of Antiquatis, with its scientific, religious and philosophic bases, it would be nice to be able to configure "applications" for different aspects of the Institute functions. Not everyone would be interested in the "religious caste," for example, so it would be nice to also have an application form to apply for that access, which can be reviewed so that appropriate settings can be made, based on the information in the application.

It would be a supplement to the existing user account, but more than just another security role... more like access to a subdomain or sub-site of the main site.

Re: Knowledge Bases

Post by Tulan » Tue Oct 11, 2011 10:10 pm

I really like that idea! We could have the primary profile registration page, and then for each sub-section of the site there could be individual registration pages with a simple amendment registration form for mottos, signatures, or whatever.

Searching

Post by LoneBear » Fri Oct 21, 2011 10:32 am

One of the things I find annoying with search engines is that most do not have a way to either qualify or sort by date. I'm not after the most POPULAR match, but the most RECENT match, as I do a lot of searches regarding problem fixing. I was looking for info on a problem with named (the BIND name server daemon) running out of memory, and most of the results that come up are years old, for outdated versions of the operating system. Because the problem occurred with the latest kernel update, people having this issue have not posted enough on it to make it popular enough to get to the top of a search engine.

In knowledge base terms, there are a LOT of assumptions that go into a query, so I was thinking a kind of "search profile" was needed, where one can define some search criteria, creating a "scope" for the search. For example, if I am making a religious query, it would be nice to get the results from the religions I am familiar with first. If I'm working with astronomy and type in "mercury", I'm not interested in the element mercury, nor the Ford car--I'm after the planet.

So I was thinking that all information stored in the CI database needs proper scoping, in sufficient detail to narrow down search, indexing and topic mapping to get a person to what they are looking for. Perhaps something like the Dewey Decimal system used at libraries could be a starting point for a subject-matter scope.

Another would be an author scope--not just who wrote it, but the author's attributes as well. Example: Father Brown of the Franciscan order of monks contains 3 scopes: Fr. Brown <-- Franciscan monks <-- monks (general).

And temporal scoping would be nice, not just when it was created or modified, but what period it applies to. That way one could search for references to Olivanders in 382 BC.
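The three kinds of scope, plus the "most recent first" ordering, can be combined in one small Python sketch. The `Entry` fields, the sample entries and the `search` function are invented for illustration, and years BC are written as negative numbers, glossing over the missing year zero.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    title: str
    subject: str            # subject-matter scope
    author_scopes: tuple    # most specific first
    period: tuple           # (first year, last year); negative = BC
    modified: int           # year last modified

ENTRIES = [
    Entry("Mercury, orbit and rotation", "astronomy",
          ("observatory staff",), (1600, 2011), 2011),
    Entry("Mercury, the metal", "chemistry",
          ("lab staff",), (-1500, 2011), 2009),
    Entry("Mercury in ancient sky lore", "astronomy",
          ("Fr. Brown", "Franciscan monks", "monks"), (-400, -300), 2005),
]

def search(term, subject=None, author=None, year=None):
    """Filter by whichever scopes are set, then list the most recent first."""
    hits = [e for e in ENTRIES
            if term.lower() in e.title.lower()
            and (subject is None or e.subject == subject)
            and (author is None or author in e.author_scopes)
            and (year is None or e.period[0] <= year <= e.period[1])]
    hits.sort(key=lambda e: e.modified, reverse=True)
    return [e.title for e in hits]

print(search("mercury", subject="astronomy"))
print(search("mercury", author="monks", year=-382))
```

A stored "search profile" would amount to default values for `subject`, `author` and `year`, so the same bare query returns different results for an astronomer and a chemist.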
