Sunday, January 18, 2009

Indian Kanoon - The road so far and the road ahead

I was quite pleased to find law information publicly available on the judis and
indiacode websites. However, it was very difficult to find anything on these
websites, so I started building tool sets to work with law data. At some point
I felt that integrating these small pieces of software would be very
interesting. I was still skeptical about whether search over law documents
would mean anything to common people who do not know legal jargon. In any case,
I integrated the tool sets into a search engine and was pleasantly surprised
when many of my common queries were answered well. So I deployed it as a
publicly available service, called it Indian Kanoon, and fortunately many
people have found it useful over time.

When actual people start using a service (whether free or fee-based), the
demand for correctness and usability increases significantly. The need to
understand the problems, think through the issues and fix them has kept me
fully occupied. Indian Kanoon was announced last January in a very crude form,
and a number of changes have gone in over the past year. So this post
highlights the work that has gone into Indian Kanoon in the last year, the
challenges involved and the features planned for the future.

Integrating more legal documents

Indian Kanoon started with only Supreme Court judgments and central laws.
Clearly this was not sufficient for the many people who wanted to search high
court judgments, law commission reports and law journals. Over the last year, a
number of other legal documents have been added. First, the law commission
reports and a law journal were added. The law journal "Central India Law
Quarterly" had been digitized and put up on the Internet by Devaranjan. The
only problem in integrating these documents was that many of them were images
scanned from books. So I used tesseract, a free OCR software supported by
Google, to extract text from these images. However, the text extraction quality
was only 90%, and I doubt that Google uses tesseract for its own Google Books
project.

Tarunabh pointed out that the Constituent Assembly debates were available and
could be integrated. He also pointed out two main problems in integrating them.
First, the article numbers in the debates were different from those in the
Constitution. Second, debates are cited in court judgments using the page
numbers of the official books. Neither of these numbers was available in the
digital copy provided by the government, so the only way out was to go back to
the actual books. We did not want to give up on the digital route yet, so we
went to books.google.com, which had a scanned copy of the debates. Tarunabh
emailed Google asking them to release those books into the public domain, as
the copyright on them had expired the previous year. Google replied that they
were not sure about the copyright expiration and would be conservative in
making books publicly available. Finally, I borrowed the books from a library,
manually copied the page numbers and the mapping between the article numbers
in the debates and those in the Constitution, and integrated the Constituent
Assembly debates.
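The mapping copied from the books amounts to a small lookup table from Constitution articles to the places where the corresponding draft articles are debated. A minimal Python sketch of that data structure follows, assuming a hypothetical `DebateReference` record and a hypothetical "CAD Vol. N, p. M" citation format; it carries no real data and is not the code Indian Kanoon uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebateReference:
    """Hypothetical record: where a draft article is discussed in the printed debates."""
    draft_article: str         # article number as used in the debates
    constitution_article: str  # corresponding article number in the Constitution
    volume: int                # volume of the official printed debates
    page: int                  # page number in that volume, as cited by courts


def build_index(rows):
    """Map each Constitution article number to every debate reference discussing it."""
    index = {}
    for row in rows:
        index.setdefault(row.constitution_article, []).append(row)
    return index


def cite(ref):
    """Format a citation in an assumed 'CAD Vol. N, p. M' style."""
    return f"CAD Vol. {ref.volume}, p. {ref.page} (Draft Article {ref.draft_article})"
```

With such an index, a judgment that cites an article of the Constitution can be linked to the debate pages on the corresponding draft article, and the page-based citation can be rendered the way courts use it.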

Indian Kanoon's coverage of high court judgments, and even of Supreme Court
judgments, was highly deficient, as Dilip pointed out earlier on my blog. So I
integrated the high court judgments and made Indian Kanoon more comprehensive.

Features

Besides making Indian Kanoon more comprehensive in terms of legal documents, I
have added a number of features to make searching easier. The most common
problem was the misspelling of Indian names, so the first feature I added was
the most critical one: spelling suggestions. The ability to search and order
documents by date was added next. The search pages and forums were redesigned
to be more aesthetically appealing. To provide notifications for new judgments,
an RSS feed for court judgments was recently added. Finally, people may want to
monitor documents related to certain words or phrases, so on Tarunabh's
suggestion I added an RSS feed for any arbitrary query.
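Spelling suggestions for names are commonly built on edit distance against the vocabulary of terms that actually occur in the corpus. The post does not say which technique Indian Kanoon uses, so the following is only a minimal sketch, assuming a plain Levenshtein distance and a hypothetical in-memory `vocabulary` list.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary, max_distance: int = 2, limit: int = 3):
    """Return up to `limit` vocabulary terms within `max_distance` edits of `word`.

    Exact matches are skipped (the word is already spelled correctly);
    closer terms come first and ties are broken alphabetically.
    """
    word = word.lower()
    scored = []
    for term in vocabulary:
        d = levenshtein(word, term.lower())
        if 0 < d <= max_distance:
            scored.append((d, term))
    scored.sort()
    return [term for _, term in scored[:limit]]
```

For example, `suggest("keshavananda", ["kesavananda", "maneka"])` proposes `kesavananda`, one deletion away. Scanning the whole vocabulary is fine for a sketch; a large index would need a smarter candidate lookup.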
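A feed for an arbitrary query can be produced by re-running the query, ordering the hits by date and serializing the newest few as RSS 2.0 items. The sketch below uses Python's standard `xml.etree.ElementTree`; the `Hit` record, the `search_url` base and the default item limit are hypothetical stand-ins, not Indian Kanoon's actual code or URLs.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from urllib.parse import quote_plus


@dataclass
class Hit:
    """One search result (hypothetical record produced by the search engine)."""
    title: str
    url: str
    judgment_date: date


def query_feed(query, hits, search_url="https://example.org/search/?q=", limit=20):
    """Serialize the newest `limit` hits for `query` as an RSS 2.0 document string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Results for: {query}"
    ET.SubElement(channel, "link").text = search_url + quote_plus(query)
    ET.SubElement(channel, "description").text = f"New documents matching {query}"
    # Newest documents first, so subscribers see fresh judgments at the top.
    newest = sorted(hits, key=lambda h: h.judgment_date, reverse=True)[:limit]
    for hit in newest:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit.title
        ET.SubElement(item, "link").text = hit.url
        ET.SubElement(item, "guid").text = hit.url  # stable per-document id
        ET.SubElement(item, "pubDate").text = hit.judgment_date.strftime(
            "%a, %d %b %Y 00:00:00 +0000")
    return ET.tostring(rss, encoding="unicode")
```

Using the document URL as the `guid` lets feed readers recognize items they have already shown when the same query is polled again.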

Contributing code back

Developing the Indian Kanoon software has been possible because of the large
amount of free software available, which I was able to modify and customize for
law search. Indian Kanoon uses a feature-rich open source database, PostgreSQL,
as its backend. When users submit a query, the matching documents are found and
ranked, and the top few are shown. For each document, the search engine also
displays a small text excerpt where the query terms appear, which lets people
quickly judge whether the document is relevant to the query.

The headline function developed for Indian Kanoon was contributed back to
Postgres and has been added to the Postgres CVS head. Besides that, a bug in
Postgres was fixed as well. I also sent the phrase search function to the
Postgres mailing list. However, Teodor Sigaev, who merged OpenFTS into
PostgreSQL, wants a generic operator that can check for an arbitrary distance
between lexemes. I have not yet had time to work on this operator.
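The excerpt idea can be illustrated without the database: pick the window of words that covers the most distinct query terms and highlight them. This is a simplified stand-in written for illustration only; it is not PostgreSQL's `ts_headline`, which works on parsed lexemes and has its own options. The window size and the `<b>` markup are assumptions.

```python
import re


def headline(text, query_terms, window=12):
    """Return the `window`-word excerpt of `text` covering the most distinct query terms.

    Matching is case-insensitive on whole words; matched words are wrapped in <b></b>.
    When several windows tie, the earliest one wins.
    """
    words = text.split()
    terms = {t.lower() for t in query_terms}
    # Strip punctuation so that "speech," still matches the term "speech".
    norm = [re.sub(r"\W+", "", w).lower() for w in words]
    best_start, best_score = 0, -1
    for start in range(max(1, len(words) - window + 1)):
        score = len(terms & set(norm[start:start + window]))
        if score > best_score:
            best_start, best_score = start, score
    excerpt = [
        f"<b>{w}</b>" if n in terms else w
        for w, n in zip(words[best_start:best_start + window],
                        norm[best_start:best_start + window])
    ]
    return " ".join(excerpt)
```

Scoring by distinct terms rather than raw hit counts favours a passage where all query words appear together over one that repeats a single word.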
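The generic operator Teodor asked for boils down to a positional check: given the sorted positions at which two lexemes occur in a document, does the second occur exactly `distance` positions after the first? Phrase search is then the special case `distance=1`. The sketch below is a standalone illustration in Python, not PostgreSQL's `tsvector` code; `followed_by` and `phrase_match` are hypothetical names.

```python
def followed_by(left_positions, right_positions, distance=1):
    """True if the right lexeme occurs exactly `distance` positions after the left one.

    Both position lists must be sorted ascending, as word positions in an index are.
    """
    i = j = 0
    while i < len(left_positions) and j < len(right_positions):
        gap = right_positions[j] - left_positions[i]
        if gap == distance:
            return True
        if gap < distance:
            j += 1  # right occurrence too early; try a later one
        else:
            i += 1  # right occurrence too far; advance the left one
    return False


def phrase_match(positions, phrase):
    """True if the words of `phrase` occur consecutively somewhere in the document.

    `positions` maps each word to the positions where it occurs. Candidate start
    positions are intersected word by word, so all words must line up in one place.
    """
    if not phrase:
        return False
    starts = set(positions.get(phrase[0], ()))
    for offset, word in enumerate(phrase[1:], start=1):
        starts &= {p - offset for p in positions.get(word, ())}
    return bool(starts)
```

The two-pointer walk in `followed_by` touches each position at most once, so the check is linear in the number of occurrences of the two lexemes.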

Besides the work on the database, the Indian Kanoon forum software has been
released as djangobb (Django Bulletin Board), which uses the Django web
application framework. The judis website recently moved to a heavily
obfuscated design in which judgments no longer had stable URLs. Prashant
Iyengar pointed out that we were not getting the live feed from judis, so I
reverse engineered the website and released the judis reverse engineering code.

Future work

Even after all this work, a number of things on Indian Kanoon still need to be
improved. Here is a list of changes that I think are required to make Indian
Kanoon more comprehensive, richer and better at search. Please feel free to
suggest more.

1. Reverse engineering the websites of different courts and tribunals so that
Indian Kanoon can provide a live feed of all Indian court and tribunal judgments.

2. Currently Indian Kanoon cannot answer questions like "list of judgments in
which a particular law section was held" or "search only in family law
judgments". The problem is that we do not have enough semantic information
about judgments, so I want to enable common users to start tagging documents.
There will be two kinds of tagging: first, categorizing court judgments and
laws into broad categories like family law, constitutional law, right to
equality, etc.; and second, tagging whether a judgment explains, bolsters or
overturns a given law or judgment. The tags generated by users will be
available to everyone under the Creative Commons Attribution-Share Alike 3.0
license.

3. Many people type natural language into the search box. For example, someone
will type "recent judgments from delhi high court". Even though we can answer
such questions, we currently match the query directly against the documents.
The above query could instead have been reduced to
"doctypes: delhi sortby: mostrecent". So what we need is a small natural
language processor that can automatically convert such natural language
queries into a more precise query that the engine can evaluate.

4. Currently I only support searching for a set of words in the documents. Roy
wanted a more sophisticated query language that supports boolean queries. This
would enable people to issue more complicated queries like
(freedom OR speech) AND (NOT expression).

5. With more data added over time, Indian Kanoon takes more than a second to
evaluate some queries. A number of software changes (or possibly a hardware
upgrade) are required to bring the evaluation time back under a second.
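Item 3 above could start with something much simpler than a full natural language processor: a handful of rewrite rules. The sketch below is hypothetical throughout; the court keywords, the doctype values, the recency words, the filler-word list and the comma syntax for several courts are assumptions, chosen so that the example query from item 3 comes out as "doctypes: delhi sortby: mostrecent".

```python
import re

# Hypothetical rule tables; a real system would need much larger, curated lists.
COURT_KEYWORDS = {"delhi": "delhi", "bombay": "bombay"}
RECENCY_WORDS = {"recent", "latest", "newest"}
FILLER_WORDS = {"judgments", "judgements", "judgment", "from", "of", "the", "in",
                "on", "about", "and", "high", "court"}


def rewrite(query: str) -> str:
    """Convert a natural-language query into a structured query string."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    terms, doctypes, most_recent = [], [], False
    for word in words:
        if word in COURT_KEYWORDS:
            doctypes.append(COURT_KEYWORDS[word])
        elif word in RECENCY_WORDS:
            most_recent = True
        elif word not in FILLER_WORDS:
            terms.append(word)  # a genuine search term; pass it through
    parts = list(terms)
    if doctypes:
        # Joining several courts with commas is an assumed syntax.
        parts.append("doctypes: " + ",".join(doctypes))
    if most_recent:
        parts.append("sortby: mostrecent")
    return " ".join(parts)
```

Rules like these misfire easily (a query that is genuinely about a "high court" loses those words), which is why item 3 calls for a real natural language processor; the sketch only shows the shape of the translation.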
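For item 4, a boolean query language needs a parser and an evaluator. Below is a minimal recursive-descent sketch, assuming a grammar in which NOT binds tightest, then AND, then OR, and in which each document is reduced to a set of lowercase words; the grammar and the `parse`/`matches` names are illustrative, not an existing Indian Kanoon syntax.

```python
import re

# Tokens are parentheses or runs of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(query):
    """Parse a boolean query into a nested-tuple syntax tree."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a word'}, got {tok!r}")
        pos += 1
        return tok

    def expr():  # expr := term (OR term)*
        node = term()
        while peek() == "OR":
            take("OR")
            node = ("or", node, term())
        return node

    def term():  # term := factor (AND factor)*
        node = factor()
        while peek() == "AND":
            take("AND")
            node = ("and", node, factor())
        return node

    def factor():  # factor := NOT factor | "(" expr ")" | word
        if peek() == "NOT":
            take("NOT")
            return ("not", factor())
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        tok = take()
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected {tok!r} where a word was expected")
        return ("word", tok.lower())

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree


def matches(node, doc_words):
    """Evaluate a parsed query against the set of lowercase words in one document."""
    kind = node[0]
    if kind == "word":
        return node[1] in doc_words
    if kind == "not":
        return not matches(node[1], doc_words)
    if kind == "and":
        return matches(node[1], doc_words) and matches(node[2], doc_words)
    return matches(node[1], doc_words) or matches(node[2], doc_words)
```

For instance, `parse("(freedom OR speech) AND (NOT expression)")` accepts a document containing "freedom" but rejects one that also contains "expression". Evaluating word by word over every document is only for illustration; a search engine would run the same tree against its inverted index.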

Comments:

RAMESH said...

Dear Sushant and all those involved in the making of Indian Kanoon.

I say this in all truth and from the depths of my heart - I've seen sooo many but Yours is the Ultimate Law Search Site!

My gratitude and Congratulations to you for bringing this site to us. You've done a great service to every Indian. Wonder what makes you tick !

Ramesh Bawri, Shillong, Meghalaya

Shrichand Nahar said...

Efforts taken by you to make available such a large data on Indian Kanoon portal are, to say the least, commendable.

Keep it up.

Consider adding a link on the main page for reporting errors or other problems during browsing the data as also for suggestions to improve.

If such a link can be added at the top of the document itself, along with the other links, that would be more helpful. This link may be added apart from the presently available Feedback option under Forum.

Thanks a lot.

Shrichand Nahar, Pune

The Gypsy said...

Dear Sushant,
I was completely blown away by your efforts on the website. It is a tremendous break through for the general public and the Indian judiciary. I wish you all success.

Anonymous said...

dear sushant
thank u so much for indian kanoon website. It helped me so much in preparation of my exams. feel really grateful to u for that. Thank u so much.

RAMESH said...

Hardly a day passes when I do not log in to Indian Kanoon and seek help from the site and it never fails me. I can only reiterate my gratitude to all the people who have developed this site.

In my view, it is the Best and most useful of all the law-related sites that I know of.

Thank you Indian Kanoon. Please keep it up!

Ramesh Bawri
Senior Member
State Consumer Disputes Redressal Commission, Meghalaya, SHILLONG

NitaPratap said...

Dear Sushant, this effort to provide a free service, is such a brilliant way to give back to the society. There are few people who figure out the best use of their talent to help people at large. Keep up the efforts!

Prashant Reddy said...

Dear Sushant,

Thanks a ton for the fantastic Indian Kanoon website. Superb work.

Cheers,
Prashant

Sushant Sinha said...

Dear Sushant,
your web site display of indian kanoon.org/doc/540625 is not fair, since it was a non reportable document of High Court . Request you to delete this.
Anonymous

Deleted it! Thanks for informing! -Sushant.

shivchakra said...

dear sushant,

wat an idea sirji..!!!

thanks a ton

kalyan52 said...

My sincerest gratitude for this wonderful web site "India Kanoon". I find it very informative and innovative. I would like to draw your kind attention to my submissions on 24 October and 25 November, 2013, about the forfeiture of my LTC Bill due to a misinterpretation of the CCS-LTC rule. I fervently hope that I shall be advised on how to get this gross injustice remedied. The whims of the controlling authority should not come in the way of a justified settlement of this case.
Thanks and best regards.
KALYAN GHOSH, LUCKNOW.

abhijit4law said...

Dear Sushant Sinha,

What compels you to delete the so-called "non-reportable" judgments when N.I.C., Ministry of IT, Govt. of India itself makes such judgments available for all at www.judis.nic.in , I wonder !!!!?

Your act of deletion is probably a misguided attack on the very noble aim & purpose of Indiankanoon itself. You are wise enough and hence, I request you to reconsider the ethical side of the said deletion.

I am a lawyer by profession and so far as my little knowledge goes, even a non-reportable Judgment of superior courts can not be surreptitiously hidden or concealed from people. Actually, the very word "Non-reportable judgment" is an absurdity like "wooden iron". A judgment being a product of some judicial mind of Superior Court, can not be kept away from people of India, if not barred under section 8 or section 9 of the Right To Information Act, 2005, though with the very phrase "Non-reportable", a judgment may be marked to be applicable for a very specific case only and not to be followed flagrantly as a precedent in every case. Be it mentioned also a "non-reportable judgment" faces no embargo to be applied in other cases, if relevant.

Anyway, thank you cordially for your effort with indiankanoon

Sincerely yours,
Abhijit Majumdar
Advocate
Barrackpore Bar Association, Kolkata-700120
abhijit4law@gmail.com

Vinayak E said...

Thanks a ton for providing this treasure trove of judgements !!

This is the REAL RIGHT TO INFORMATION ,,.... !!

I'm NOT an I.T. savvy person, but IF there is any other way I can help this NOBLE cause, please let me know

regards

Vinayak

 
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 India License.