Tuesday, September 16, 2014

Making Indian Kanoon the gold standard for legal research

A new release was rolled out on Sunday in an effort to make Indian Kanoon the gold standard for legal research in India. The release consists of many user-visible changes and includes all the changes that were planned in July. While some of these changes improve the infrastructure, most come from people's complaints and the problems they have faced while using Indian Kanoon. Here is the broad list of changes:

1. Removed duplicate judgments: Many court websites in India have a separate URL for each case number even when the cases are combined and only one judgment is delivered. The new release ensures duplicate judgments are filtered out.

2. Improved the judgment layout: The new release removes page numbers and case numbers that are sprinkled through the judgment text. New code has also been developed to identify paragraphs, quotes and tables so that judgments can be laid out in an appealing format.

3. Improved PDF copy: htmldoc has been enhanced to generate PDF output in the more readable Georgia font, using the approach here.

4. Consumer Court judgments added: Roughly 1 lakh (100K) judgments from the National and State Consumer Redressal Commissions have been added to the Indian Kanoon database. New cases from these commissions will also be added every day. For example, look at the consumer cases against Airtel here.

5. Improved titles for Bombay, Kolkata, Andhra and Kerala high courts: Since these courts do not provide meta information, the petitioner and other such details are extracted from the free text. Earlier there were a lot of errors in these extractions.

6. New Design: A new center-based design was rolled out on Sunday. Do let me know of any usability issues you encounter in the new design.

7. Software updated: The entire software stack was updated, including the kernel on the production host. For the first time this was achieved without any downtime or user experience issues. Also for the first time, production traffic was served on Sunday using a multi-node setup. This is not a user-facing change.
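
As an illustration of item 1, here is a minimal Python sketch of one way duplicate judgments could be filtered out when the same text is published under several URLs. This is not necessarily how Indian Kanoon does it: the function names and the idea of comparing a hash of the whitespace-normalized text are assumptions made for the example.

```python
import hashlib
import re


def fingerprint(text):
    """Hash the judgment text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(documents):
    """Keep the first (url, text) pair seen for each distinct fingerprint.

    `documents` is an iterable of (url, text) tuples; this shape is an
    assumption for the sketch, not the actual Indian Kanoon data model.
    """
    seen = set()
    unique = []
    for url, text in documents:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append((url, text))
    return unique
```

An exact hash only catches copies that are identical after normalization. Judgments that differ by a header or a case number would need a fuzzier comparison.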

Thursday, July 3, 2014

Planned Improvements for the next release

Here is a broad list of changes to Indian Kanoon that I am planning for the next release (hopefully out by the end of July 2014):

1. Remove duplicate judgments. Many high courts, such as Uttarakhand, Jharkhand, Orissa and Bombay, publish the same judgment on different URLs. Perhaps every case number gets a new URL even when cases are combined together in one judgment.

2. Improve the judgment layout. It will involve removing page numbers and case numbers sprinkled in between the text. It will also involve improving the layout and paragraph detection algorithm.

3. Improve the PDF output. Currently IK uses htmldoc for generating PDF output, but it has limitations on the fonts we can use. So the plan is to move to Pisa.

4. Add judgments from Consumer Courts at national and state levels. District consumer forums in Maharashtra, Gujarat and Karnataka use local languages with custom embedded fonts in the PDF. As the text is not in Unicode, it will require a lot more effort to reverse engineer the fonts for any text processing. So district consumer courts will not be added.

5. Improve titles for Bombay, Kolkata, Andhra and Kerala high courts. Since these courts do not provide meta information, the petitioner and other such details are extracted from the free text. Currently there are a lot of errors in these extractions and the plan is to improve them.
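
To make item 2 concrete, here is a minimal Python sketch of stripping page numbers that are sprinkled between the lines of a judgment. The marker formats it recognizes ("Page 3 of 12", "Page 3", "- 3 -" alone on a line) are assumptions for the example; real court PDFs vary and would need their own patterns.

```python
import re

# Hypothetical marker formats, each alone on a line:
# "Page 3 of 12", "Page 3", or "- 3 -".
PAGE_MARKER = re.compile(
    r"^\s*(?:page\s+\d+(?:\s+of\s+\d+)?|-\s*\d+\s*-)\s*$",
    re.IGNORECASE,
)


def strip_page_markers(text):
    """Drop lines that consist solely of a page marker."""
    kept = [line for line in text.splitlines() if not PAGE_MARKER.match(line)]
    return "\n".join(kept)
```

Because the pattern must match the whole line, a sentence that merely starts with "Page 2 of the record" is left alone.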

Tuesday, June 24, 2014

Indian Kanoon: Search Within Results

Some of my lawyer friends have told me in the past that they need a way to "Search Within Results". Consider an example where a person has filed a "child custody" case and he or she is an accused in a separate criminal trial of "attempt to murder". The lawyer wants to know whether the ongoing criminal trial will have a bearing on the "child custody" case or not. The lawyer searches for "child custody" and then wants to look only at those cases that have "attempt to murder" in them. One easy way would be to just search for "child custody attempt to murder". But a search for the entire phrase may not give very relevant results because the two phrases are unrelated. So it is better to filter the documents by "custody of child" and then look for the most relevant documents by the phrase "attempt to murder".

To support this use case, a new "filter" keyword has been added. Now you can search for "attempt to murder" in documents that have "custody of child" with the following search query: "attempt to murder filter: custody of child". In this case the ranking is decided only by the phrase "attempt to murder", giving better results.

There is also a new box at the bottom of the page titled "Search Within Results" that takes the existing search as the "filter" and then uses the new query to rank the matching documents.
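
The behaviour of the "filter" keyword can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Indian Kanoon implementation: the function names are hypothetical, documents are plain strings, the filter is a simple substring test, and counting occurrences of the phrase stands in for the real ranking.

```python
def split_filter_query(query):
    """Split 'RANK filter: FILTER' into (rank_phrase, filter_phrase).

    filter_phrase is None when the query has no "filter:" keyword.
    """
    rank_part, sep, filter_part = query.partition("filter:")
    if not sep:
        return query.strip(), None
    return rank_part.strip(), filter_part.strip()


def search_within_results(documents, query):
    """Filter documents by the filter phrase, then rank by the rank phrase only."""
    rank_phrase, filter_phrase = split_filter_query(query)
    if filter_phrase is not None:
        # The filter phrase only narrows the set; it does not affect ranking.
        documents = [d for d in documents if filter_phrase.lower() in d.lower()]
    # Stand-in ranking: number of occurrences of the rank phrase.
    scored = [(d.lower().count(rank_phrase.lower()), d) for d in documents]
    matching = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matching, key=lambda pair: pair[0], reverse=True)
    return [d for _, d in ranked]
```

With the query "attempt to murder filter: custody of child", only documents containing "custody of child" survive the first step, and their order depends solely on "attempt to murder".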