Docle coding and classification system browser
Oon Y K
The linnean Docle coding and classification system is the coding system most widely used by Australian primary health care providers. With the emerging role of the electronic medical record (EMR), expectations of greater utility from the EMR are heightened. There will be a need for hospital-to-hospital and GP-to-hospital transfer of medical data, and coding issues will arise in the context of this data interchange. A transportable electronic medical record without a clearly defined medical coding system is like a vehicle without wheels, and a transportable medical record system leveraging the natural expressivity of Docle is in fact afoot. But which coding system? There is a spectrum of choices, from natural language at one end to the representation of medical entities as numbers at the other. Docle is an all but natural alphabetic language of medicine suitable for machine processing. The Docle framework has equal facility for ‘lumpers’ and ‘splitters’.
Docle is over ten years old, but it did not make much of an impact until it went linnean in 1995. Within months, it was incorporated in Australia’s most widely used medical package. The Docle browser operates on over 16,000 medical objects that are related to one another in a linnean hierarchy, much like what Carolus Linnaeus did for biological classification. The secret of a good coding system is properties and not pointing, which also solves the ostension problem. The Docle system is mapped to the ICD codes and has a ‘less is more’ approach of recycling lexical patterns and the judicious use of operators to make powerful expressions. Docle may well emerge to fulfil the vision of UMLS without carrying the dead baggage of the old numeric codes. This Docle browser, in a nutshell, explodes the power of linnean medical classification.
1) Docle is a belief system modelled on the linnean biological classification system. The linnean biological framework has been stable for over 200 years. There is no coding and classification problem in biology. By modelling on this framework, the full potential of medical coding and classification is unleashed with the Docle framework. Medical entities are classified as species and placed in a hierarchy, much like a species such as Homo sapiens. Every one of the over ten thousand Docle medical objects is thus related in a congruous framework. For instance, the liver object at the level called ORDER knows all its associated diseases, symptoms and signs. The Docle health classification system has drawn the two strands of biology and medicine together with a common linnean model.
2) Backward compatibility and future potential.
The choice of coding system of the future for Australian medical practice must in a sense comprise at least the functions found in the three buttons one expects in a video recorder. The buttons are Play, Rewind and Fast Forward. The linnean Docle coding and classification system that is being played on the Medical Director software by over 6000 Australian general practitioners in 1999 has those three functions. Docle is in an advanced stage of mapping to ICD9 and ICD10; in that sense it has a rewind function. Docle has the expressivity of natural language, and yet the precision of numeric coding, and can be mapped and made backward and forward compatible to any coding system. The fast forward function of Docle is its intrinsic strength in knowledge representation. The embodiment of such a knowledge tool is the medical spreadsheet demonstrated at the HIC98 conference in Brisbane.
3) Simplicity of design framework, clarity of embodiment.
Docle is a linnean, hierarchical system with multiple inheritance. All that means is that Docle combines some of the ideas of Smalltalk and the linnean biological classification of Carolus Linnaeus to tackle the issues of medical coding and classification. The problem in medical coding and classification is comparable to the challenge of covering all the surface area in, say, the bathroom with band-aids with either no or controlled overlap. Certain surfaces in the bathroom, such as the taps and toilet bowls, are difficult to say the least. This metaphor describes the need to code for every symptom, sign, disease, investigation and investigation result in clinical medicine. Another way to look at the medical coding challenge is to imagine the work being similar to that of the biologist classifying thousands of problematic species like the platypus.
The linnean biological framework comprising phylum, class, order, family and species is an ingeniously suited framework, after modifications, for classifying medical objects. Docle is different in that it is totally alphabetic and uses primary, secondary and tertiary keys to access "objects" that hold the linnean properties of each medical object. All these objects are linked together in a congruent 'belief system'. No more mapping of diseases to meaningless numbers. Docle breaks free from the restrictions of multiaxial coding. Nothing stays the same in medicine; the Docle framework is designed for the expected constant change and improvement in our understanding of medical science. With Docle, the band-aids come in various shapes and sizes in order that everything is covered in a congruent manner. One can even imagine tiny band-aids that are arrayed in the manner of jigsaw puzzles to make up the complete area of a large irregularly shaped band-aid. Certain parts of the bathroom are covered by up to six or seven layers of band-aids that are meticulously cut, measured and catalogued. Each band-aid, no matter how small, knows its relationships to every other band-aid in the bathroom.
Embodiments of Docle:
The primary key for Colles fracture is fracture.radius@coll-es, the secondary key is frac.radi@colles (the Docle algorithm automatically generates a standard abbreviation), and the tertiary or alias key is collesFracture.
There is also fracture.radius@galleazi. Now these two entities have as their genus fracture.radius^ . The entry for Colles fracture was a late entry. The previous ‘best’ code had been fracture.radius (frac.radi), and this “band-aid” covered a lot of territory; fracture.radius@coll-es is a species form of the genus fracture.radius. A medical entity often has several genera (multiple inheritance). This is akin to a small band-aid overlaid by several large irregularly shaped band-aids. The other good name for genus is UMG or useful medical grouping; remember Docle is linnean, hence it looks better to use terms like genus, family and order where the hierarchy is already predetermined. With Colles fracture, if you had coded with fracture.radius, no matter. In future when you search for Colles fracture, the search engine would recruit all the species associated with the genus or UMG called "fracture.radius", as that is the UMG that fracture.radius@coll-es also belongs to. The issue of deprecation of codes is important. With Docle, the scheme used to manage codes that are no longer used is termed Graceful Deprecation (see below).
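The genus-based search described above can be sketched in a few lines of Python. This is only an illustration: the GENERA table and the recruit function are hypothetical names, and treating the older code fracture.radius as a member of the same genus is an assumption made for the example, not a statement about the Docle implementation.

```python
# Hypothetical genus table: each code lists the genera (UMGs) it belongs to.
# A code may belong to several genera (multiple inheritance), hence the sets.
GENERA = {
    "fracture.radius@coll-es": {"fracture.radius^"},
    "fracture.radius@galleazi": {"fracture.radius^"},
    # Assumption: records coded with the older, broader code share the genus.
    "fracture.radius": {"fracture.radius^"},
}


def recruit(code):
    """Return every code that shares at least one genus with `code`."""
    wanted = GENERA[code]
    return sorted(c for c, genera in GENERA.items() if genera & wanted)


print(recruit("fracture.radius@coll-es"))
```

A search for the Colles fracture species therefore also picks up records that were coded with the broader fracture.radius before the more specific entry existed.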
Updating Docle is easy. A hypothetical example is that of an Escherichia coli strain causing an outbreak of food poisoning in Melbourne. The microbiologists in Parkville have identified a new species they have christened the 'Melbourne' strain. Coding this in Docle is a doddle:
infection<escherichia@coli@melbourne -> infe<esch@coli@melb
as there is, by historical precedent, a whole series of infection<escherichia@coli@subspecies .
The operator characters are . for “located at”, @ for “associated with” and of course > and < meaning “leading to” and “due to”, plus other useful operators.
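As a rough illustration of how such expressions lend themselves to machine processing, the following Python sketch splits a Docle expression at the four operators named above and renders a crude English reading. The function names tokenize and gloss are hypothetical, and only the four operators listed in the text are modelled.

```python
import re

# Operator meanings as given in the text; other Docle operators are not modelled.
OPERATORS = {
    ".": "located at",
    "@": "associated with",
    ">": "leading to",
    "<": "due to",
}

# Character class matching any single operator, captured so that
# re.split keeps the operators in its result.
_SPLITTER = re.compile("([" + re.escape("".join(OPERATORS)) + "])")


def tokenize(expression):
    """Split a Docle expression into alternating words and operators."""
    return [tok for tok in _SPLITTER.split(expression) if tok]


def gloss(expression):
    """Render a Docle expression as a rough English reading."""
    return " ".join(OPERATORS.get(tok, tok) for tok in tokenize(expression))


print(tokenize("fracture.radius@coll-es"))
print(gloss("infection<escherichia@coli"))
```

The hyphen in coll-es is not an operator here, so the word stays in one piece.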
Now compare the above with the complexities of ICD9 and ICD10 below, where a lifetime of study still will not make you competent in coding. Life is too short for living with numeric coding systems with self inflicted complexity.
4) The KISS principle.
Docle is simple and efficient; it is totally alien to the look and feel of numeric coding systems such as ICD9 and ICD10.
Below is an example of the trauma of working with ICD-9 and ICD-10 coding:
910.0 Abrasion or friction burn of face, neck or scalp except eye, without mention of infection
910.1 Abrasion or friction burn of face, neck or scalp except eye, infected
both map forward historically to ICD-10-AM code
S00.01 Superficial injury of scalp, abrasion.
However, in the AR-DRG grouper software, code 910.0 groups to DRGs 492-494 (Trauma to the skin, subcutaneous tissue & breast) while code 910.1 groups to DRGs 489-491 (Cellulitis). The two ICD-9-CM codes group to different DRGs but map historically to the same ICD-10-AM code (S00.01). To maintain congruency, a patch is needed: the ICD-9-CM codes need to be logically mapped to maintain the DRG groupings. A logical map from 910.1 to ICD-10-AM code L08.9 (Local infection of skin and subcutaneous tissue, unspecified) is needed. The above examples show that when codes span multiple anatomical sites and pathophysiological processes and mix medical idioms, future incongruencies inevitably follow. Constructing medical belief systems for decision support with such a numeric scheme is a material challenge.
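The incongruency can be made concrete with a small Python check. The code values are copied from the example above; the table and function names are hypothetical, and keeping 910.0 mapped to S00.01 in the logical map is an assumption, since the text only specifies the patch for 910.1.

```python
# ICD-9-CM -> ICD-10-AM maps and DRG ranges, taken from the example in the text.
HISTORICAL_MAP = {"910.0": "S00.01", "910.1": "S00.01"}
# Assumption: 910.0 keeps its historical target; only 910.1 is patched.
LOGICAL_MAP = {"910.0": "S00.01", "910.1": "L08.9"}
DRG_GROUP = {"910.0": "492-494", "910.1": "489-491"}


def collisions(code_map, drg_group):
    """Return ICD-10-AM targets that merge ICD-9-CM codes from different DRG groups."""
    by_target = {}
    for icd9, icd10 in code_map.items():
        by_target.setdefault(icd10, set()).add(drg_group[icd9])
    return {target: sorted(groups) for target, groups in by_target.items() if len(groups) > 1}


print(collisions(HISTORICAL_MAP, DRG_GROUP))  # S00.01 merges two DRG groups
print(collisions(LOGICAL_MAP, DRG_GROUP))     # no collisions after the patch
```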
5) The Ostension problem - properties, not pointing
One significant difference between Docle and the traditional numeric approach to medical coding and classification involves the philosophical issue termed the ambiguity of ostension. The inflection point in the evolution of medical coding systems is the grappling with this issue. Ambiguity of ostension is a topic found in basic philosophy text books. The numeric coding schemes use 'pointing' while Docle uses the concept of 'properties' or behaviour. Pointing leads to a conundrum referred to in philosophy as the ambiguity of ostension or the fallacy of mere pointing. In the classic sense this happens when we try to extend the vocabulary of a young child. In the parable of the inquisitive child (as covered by Gareth Matthews in 'Philosophy and the Young Child'), the child asks the meaning of the French word 'la table'. He is not satisfied with the father merely pointing to a table. He asks 'How does one know that the pointing is not at the table top or the colour of the table?'. Looking up the meaning of a word in the dictionary is not helpful when it just points to a bigger unknown word. Looking up a key in a numeric coding scheme leads to an obscure numeric code. A grand scheme like UMLS is like looking up the meaning of a word only to be told the meaning of that word in Swedish, German and Swahili. Augustine in De Magistro resolved the problem by stating that the meaning of a word, say "bird-catching", is demonstrated by a bird-catcher doing his thing. Augustine goes on to say that an observer intelligent enough will eventually catch on to what bird-catching is and hence what bird-catching means. Meaning is derived from the properties or aspects of the behaviour of 'bird-catching'. In a sense, the Docle coding scheme comprises objects with behaviour. Each Docle object is evaluated by its 'behaviour'; its utility is derived by leveraging its relationships to other Docle objects.
It is obviously much easier for the software implementor to implement his decision support using a coding scheme based on properties and behaviour, rather than a pointing system. An alphabetic scheme like Docle is much easier to support. Over a weekend, Medical Director could request that 50 to 100 items be coded in Docle. Which coding scheme can give that sort of turnaround? Docle can. Remember the fallacy of ostension. The prayer for every medical informatician is “Give us each day more properties rather than more pointing in our coding systems and forgive us for the trouble we have caused with the codes we have deprecated.” At 16,000 items identified and classified over the past 10 years, the hack work for Docle is largely done.
6) Decision Support
The cogent reason for digitizing medical data is machine facilitated decision support. Docle has demonstrated its sheer suitability for decision support in the form of the PLUM MEDICAL SPREADSHEET as documented in the HIC98 CDROM.
7) Universal Transportable Medical Record.
Any universal transportable medical record initiative without a defined medical coding system is like a vehicle without wheels. Watch out for a newcomer that uses Docle; this project has been christened DocleScript.
Docle is not too big, and that is a plus. UMLS is big, maybe too big. This applies to the other numeric coding schemes. I personally think that big is not necessarily better. Who needs 60 terms to describe candida infection? Just two or three are enough. A big coding list will just slow the pick list. The optimal size should be large enough to fulfil our needs and no more. Remember properties and not pointing! When people boast about their 400,000 terms list, it is more hubris than logic. Resolution matters, not size - a poor resolution photograph if blown up will only show more fuzziness. Just like the photos captured on a Kodak box camera, no matter how you enlarge the print, the photo resolution is still a disappointment. Mapping four fuzzy numeric coding systems gives you a fuzzy numeric system four times bigger. There is so much hubris and hype about the existing numeric "official" codes that we need our eyes opened to see the emperor's clothes for what they are.
9) Splitters versus lumpers; differentiation versus integration.
The perennial conundrum faced by the WHO classification body has always been whether to lump codes together or to split them into more specific codes. With Docle, we can split and lump at the same time. We can differentiate the various species and subspecies of “DiabetesMellitus”, but the differentiated species know their memberships in the various genera. We need complexity but with complete integrity. This differentiation/integration design creates outward simplicity for the end users, who are only interested in real life medical entities and not some abstract artificial codes that span disjointed anatomical areas and pathophysiology as in the numeric code examples.
10) Variable versus computer constants
One of the biggest differences between Docle and the prior art of Read, ICD, SNOMED and ICPC is that Docle uses the computer variable concept while the rest of the field is implemented like computer constants. A variable is like a container. This container in Docle, called a Docle object, can be accessed via primary, secondary and tertiary keys. The three types of keys are equivalent in the sense that they all point to the same container with its stored methods and data. Inside the container is the belief system about the object. While the name of the variable is fixed, the contents of the variable may vary over time. Docle makes use of the concept of separation of the belief system data from the key code itself. This separation of the properties from the code key provides Docle with unparalleled flexibility to expand and mutate with the growth of medical knowledge. The keys, be they primary, secondary (Docle abbreviations) or tertiary (aliases), all lead to the same medical object with its stored behaviour. Medical advance will lead to gradual adjustments to the behaviour of the medical object. It is hard to envisage the need to change species names such as rheumatoidArthritis or diabetesMellitus.
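The container idea can be sketched in Python as follows. The class DocleObject, the REGISTRY dictionary and the register function are hypothetical names chosen for this illustration; the three keys are the Colles fracture keys quoted earlier in this paper.

```python
class DocleObject:
    """Container holding the belief system for one medical entity."""

    def __init__(self, primary, secondary, alias):
        self.keys = (primary, secondary, alias)
        # The contents may be revised as medical knowledge grows;
        # the keys that lead to the container stay fixed.
        self.properties = {}


# Hypothetical registry mapping every key to its container.
REGISTRY = {}


def register(obj):
    """Make the object reachable through each of its three keys."""
    for key in obj.keys:
        REGISTRY[key] = obj
    return obj


register(DocleObject("fracture.radius@coll-es", "frac.radi@colles", "collesFracture"))

# An update made through one key is visible through the others,
# because all three keys lead to the same container.
REGISTRY["collesFracture"].properties["genus"] = "fracture.radius^"
print(REGISTRY["frac.radi@colles"].properties)
```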
11) Number codes are not a viable belief system
The linnean system in biology is a viable belief system that is alive, moving on with the advancement of biological knowledge. It is a framework or road map to the realm of biology. The gaps in the linnean framework excite the imagination of biologists about missing links in their knowledge. It is a powerful method of cognating the knowledge that is being accumulated. This yearning for classifying and cognating medical knowledge was expressed in the preface of the ICD 9 manual, but it was just a yearning.
12) Granularity problem and the Genus chunking solution.
The granularity problem is familiar to anyone attempting to write a decision support program in medicine. An instance of this problem is the flagging of the disease/drug interaction between the beta-blockers and diabetes mellitus. It would be tedious, inefficient and prone to error to try to pick up every specific type of beta-blocker interacting with every variation of diabetes mellitus. An example of the beneficial effects of chunking to genus level is the case of diabetes mellitus. Chunking up the three variants of diabetesMellitus: diabetesMellitus@gestation, diabetesMellitus@insulinDependentDiabetesMellitus, diabetesMellitus@nonInsulinDependentDiabetesMellitus
into a genus called diabetesMellitus allows the common behaviour to be stored in the diabetesMellitus genus object. Likewise we can chunk up the therapeutic species of propranolol, atenolol and metoprolol into the medical genus betaBlocker. An adverse drug-disease interaction is flagged when the two genera of betaBlocker and diabetesMellitus are combined. A new beta-blocker will inherit this interaction behaviour as soon as it is tagged as belonging to the genus of betaBlocker in its container holding its belief system.
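A minimal Python sketch of genus chunking follows. The tables GENUS_OF and INTERACTIONS and the function interacts are hypothetical names, and the drug called newBetaBlocker is an invented placeholder used only to show inheritance of the interaction.

```python
# Hypothetical species -> genus table built from the examples in the text.
GENUS_OF = {
    "atenolol": "betaBlocker",
    "metoprolol": "betaBlocker",
    "diabetesMellitus@gestation": "diabetesMellitus",
    "diabetesMellitus@nonInsulinDependentDiabetesMellitus": "diabetesMellitus",
}

# The interaction is stored once, at genus level.
INTERACTIONS = {frozenset({"betaBlocker", "diabetesMellitus"})}


def interacts(drug, disease):
    """Flag a drug-disease interaction by lifting both species to their genera."""
    pair = frozenset({GENUS_OF.get(drug, drug), GENUS_OF.get(disease, disease)})
    return pair in INTERACTIONS


print(interacts("atenolol", "diabetesMellitus@gestation"))

# Tagging a new (placeholder) drug with the genus is enough for it to inherit the flag.
GENUS_OF["newBetaBlocker"] = "betaBlocker"
print(interacts("newBetaBlocker", "diabetesMellitus@nonInsulinDependentDiabetesMellitus"))
```

One genus-level rule replaces the full cross product of every beta-blocker with every variant of diabetes mellitus.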
13) Why choke on number codes when Docle is a feast in verse?
Numeric coding schemes look like this: T-2800 M-44060 E-2001 F-03003 for pulmonary tuberculosis with granuloma.
While some Docle codes for chest pains look like these:
where the / operator means increased or aggravated by and the % operator denotes quantification.
14) Docle carries with it a free and unified medical abbreviation standard.
The abbreviation is the computer generated secondary key. For example carcinoma located at thyroid is coded as carcinoma.thyroid and the secondary key/abbreviation is carc.thyr
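The actual abbreviation algorithm is not described in this paper. The Python sketch below is a guess that happens to reproduce the examples quoted here: each word between operators is cut to its first four letters. The treatment of a hyphenated word (kept whole, hyphen dropped) is an assumption inferred from the single example coll-es -> colles, and the function name abbreviate is hypothetical.

```python
import re

# Operator characters mentioned in this paper.
OPS = set(".@<>/%&")


def abbreviate(code, width=4):
    """Guess at a secondary key: truncate each word, keep operators as they are."""
    out = []
    for part in re.split(r"([.@<>/%&])", code):
        if part in OPS:
            out.append(part)                   # operators pass through unchanged
        elif "-" in part:
            out.append(part.replace("-", ""))  # assumption: hyphenated words stay whole
        else:
            out.append(part[:width])           # first four letters of an ordinary word
    return "".join(out)


print(abbreviate("carcinoma.thyroid"))
print(abbreviate("fracture.radius@coll-es"))
```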
15) Code shear technology.
Docle is built up of words joined by operators, much like an internet address. Coded entities are modified by aspects such as laterality, acute, chronic, simple, compound, complicated and male or female. The modifiers are added to the main code by clicking of buttons. The & character is the shear operator, an example is the code fracture.femur&rightHandSide&simple. During processing the substring &rightHandSide can be sheared off to return the basic code: fracture.femur .
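Shearing is simple string processing, as the following Python sketch suggests. The function names shear and drop_modifier are hypothetical; the code string is the example given above.

```python
def shear(code):
    """Split a code at the & operator into the basic code and its modifiers."""
    base, *modifiers = code.split("&")
    return base, modifiers


def drop_modifier(code, modifier):
    """Shear off one named modifier and keep the rest of the code."""
    base, modifiers = shear(code)
    kept = [m for m in modifiers if m != modifier]
    return "&".join([base] + kept)


print(shear("fracture.femur&rightHandSide&simple"))
print(drop_modifier("fracture.femur&rightHandSide&simple", "rightHandSide"))
```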
16) Best Practice by stealth
EBM can be encoded inside the Docle object: each disease Docle object can have a list of ranked recommended treatments and a list of ranked investigations. Adoption of a Docle type coding system will achieve Cochrane by stealth.
17) Anatomical belief system behind the Docle framework.
All Docle objects are linked together to form a viable and congruous belief system, even in the anatomical sense. The need to map every Docle object onto an anatomical framework has thrown up a previously unnamed body organ: Docle detected a gap in its anatomical hierarchy. The anatomical locations scrotum and testis had a missing superclass, and Docle has christened this organ the tistum. The Docle for tistum is tist, which has as its subclasses scrotum and testis. Docle is the first medical coding system with an official term for balls. That a disease entity involving a finger can trigger the message that the hand is involved is part of this anatomical belief system.
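The anatomical lookup can be sketched as a walk up a parent table. In this Python illustration the PARENT table and the function involved_sites are hypothetical, and representing both the finger/hand and the testis/tist relationships as one kind of parent link is a simplifying assumption.

```python
# Hypothetical parent links taken from the examples in the text.
PARENT = {
    "finger": "hand",
    "scrotum": "tist",
    "testis": "tist",
}


def involved_sites(site):
    """Return the site followed by every ancestor reachable through PARENT."""
    chain = [site]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(involved_sites("finger"))
print(involved_sites("testis"))
```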
18) Docle has been subjected to the duress of actual use. Four years ago Docle was introduced; it is undergoing stepwise refinements with constant usage. This is both in the medical community usage and in decision support project development.
19) Graceful Deprecation
All coding schemes get into the sticky situation of having to dispose of codes that are no longer wanted. There are a variety of schemes to deal with codes that are superseded. One way is to publish a list to say these codes are no longer valid. Another scheme is to use an autogenerated number to represent a code. This autogenerated number once used is never recycled. Problem is that your electronic medical record will be studded with all these autogenerated numbers. To make sense of these autogenerated numbers one will need a lookup table. Making sense of the relationships among these autogenerated numbers will require great ingenuity and effort. There is a softer and gentler approach to deprecated codes employed by Docle whereby the bottom line worst case scenario is being saddled with a human readable and understandable Docle code or expression that makes clinical sense. With future Docle revisions, a new scheme for handling deprecated codes termed Graceful Deprecation is introduced.
20) The Docle Browser.
The Docle concept is best explained by diving into the browser, just as it is hard to explain the world wide web in words. It is much easier to give someone an internet connection and a Netscape browser and let him roam.
The Docle browser is a multi-paned application that shows the Docle framework to advantage. Properties of each Docle object such as its phylum, class, order, family, genus and species are displayed on selection of a Docle entry. Associated species with membership of similar genera are also displayed on a listing pane. The main pick list contents can be altered by selecting the various levels of the linnean hierarchy. For example, the main pick list can be populated by anatomical entries by clicking on the linnean Order level. By selecting an anatomical object, all Docle species with reference to the anatomical object are listed in the main pick list itself. This ‘diving’ or drilling down capacity makes the browser a powerful tool for extending and enhancing the integrity of Docle itself. By listing all conditions associated with the common bile duct, one may find that a single medical condition has been inadvertently coded twice or overlaps with another code.
Oon, Y Kuang. HISA HIC Conferences Proceedings for 1996, 1997, 1998
www.docle.com.au contains links to articles about Docle.
Matthews, Gareth. Philosophy and the Young Child. Harvard University Press 1980. Page 96.