Monday, October 13, 2014

Tax Court Approves Predictive Coding for First-Pass Document Review

Invariably, the logical answer to coping with Big Data in eDiscovery is predictive coding. While definitions of predictive coding vary, a common form involves uploading electronic documents to a server and taking representative samples, from which ‘seed sets’ are created by attorneys familiar with the legal issues of the case. The attorneys review the seed sets and code each document for responsiveness or other attributes, such as privilege or confidentiality. Through an iterative process, the predictive coding software is then tweaked and adjusted to refine how it will analyze future documents.
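To make that workflow concrete, here is a minimal sketch in Python: a toy classifier is trained on an attorney-coded seed set and then scores an unreviewed document. The sample documents, labels, and the simple naive Bayes scoring are all illustrative assumptions on my part; commercial platforms use far richer features and proprietary algorithms.

```python
from collections import Counter
import math

# Hypothetical seed set: attorney-coded documents (text, responsive?).
seed_set = [
    ("merger agreement draft with counsel", True),
    ("board minutes discussing the merger", True),
    ("office holiday party schedule", False),
    ("cafeteria menu for the week", False),
]

def train(seed):
    """Count token frequencies per class from the coded seed set."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, label in seed:
        for tok in text.lower().split():
            counts[label][tok] += 1
            totals[label] += 1
    return counts, totals

def score(text, counts, totals):
    """Log-odds that a document is responsive (add-one smoothing)."""
    vocab = set(counts[True]) | set(counts[False])
    log_odds = 0.0
    for tok in text.lower().split():
        p_resp = (counts[True][tok] + 1) / (totals[True] + len(vocab))
        p_non = (counts[False][tok] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_resp / p_non)
    return log_odds

counts, totals = train(seed_set)
# Positive log-odds -> predicted responsive; attorneys would review a
# sample of predictions and feed corrections back into the next iteration.
print(score("draft merger agreement", counts, totals) > 0)  # True
```

The feedback loop at the end is the "re-iterative approach" described above: coded corrections become new training data on each pass.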
Recently, the U.S. Tax Court approved the use of predictive coding in Dynamo Holdings, Ltd. v. Commissioner, 143 T.C. No. 9 (September 17, 2014), permitting a taxpayer to use predictive coding as a first-pass review of a large set of documents, despite the respondent’s objection. The big idea, apparently, is to reduce costs. While the respondent asserted predictive coding to be an ‘unproven technology’, the court completely disagreed, citing several precedents along with expert testimony. Predictive coding involves two important metrics known as ‘Recall’ and ‘Precision’ – I have detailed these concepts in my earlier post. The court’s opinion is important for taxpayers faced with requests for substantial amounts of ESI, and has the potential to reduce costs that can easily run into millions of dollars.
This reaffirms one thing for sure – IT, once considered a necessary evil, is now evolving to form a symbiotic relationship with the legal industry, and with other industries alike. Manual document review is certainly going to be obsolete in the near future – if not already! Analytics, predictive coding, and machine learning products provide us with business intelligence (BI) to make informed decisions. For example, Microsoft’s newest products such as Delve, along with a host of BI tools, give meaning to your data, while the SharePoint eDiscovery Center adheres to regulatory compliance and standards. With this said, predictive coding technology is essentially replacing manual work, and tech-savvy attorneys seem to be having a ball with it!
The important aspect in this regard lies in determining the optimal values for ‘Recall’ and ‘Precision’ within the predictive coding software!

Saturday, May 31, 2014

The Power Of Cloud Computing: Multi-Tenant Database Architecture

Software as a Service (SaaS) denotes a novel and innovative paradigm in which companies do not have to purchase and maintain their own information and communications technology (ICT) infrastructure; instead, services are acquired from a third party. Multi-tenancy permits SaaS providers to deliver the same service to multiple customers (tenants), who transparently share physical and/or virtual resources.
Multi-tenant database architecture is essentially a design in which a single instance of the software runs on the service provider’s infrastructure and multiple tenants access that same instance. Simply put, “A multi-tenant application lets customers (tenants) share the same hardware resources, by offering them one shared application and database instance, while allowing them to configure the application to fit their needs as if it runs on a dedicated environment”. One of the most conspicuous features of multi-tenant architecture is that it allows multiple businesses to be consolidated onto the same operational platform or system. Multi-tenancy invariably takes place at the database layer of a service. As an analogy, think of a rental apartment building with numerous tenants, each with its own requirements for storage, space, and utilities.
Easier application deployment for service providers, improved rate of hardware utilization, and reduction in overall costs especially for SMEs are core benefits of Multi-tenant model. In traditional single-tenant software development, tenants usually have their own virtual server. This set-up is similar to the traditional Application Service Provider (ASP) model. However, in the SME segment, for instance, server utilization in such a model is low. By placing several tenants on the same server, the server utilization can be improved.
The different kinds of multi-tenant models that exist in database applications today are as follows:
1.   Separate application, separate database, and infrastructure (Isolated Tenancy)
2.   Separate application, separate database, shared infrastructure (Infrastructure Tenancy)
3.   Shared application, separate database, shared infrastructure (Application Tenancy)
4.   Shared application, shared database, shared infrastructure (Shared Tenancy)
The figure below illustrates a high-level architecture of multi-tenancy, with the multi-tenant approaches forming a continuum. The far left (Isolated Tenancy) depicts each tenant running its own application instance; as we move towards the right, the degree of sharing increases, ultimately reaching the far right side (Shared Tenancy).
Multi-tenancy application architecture
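As a sketch of the far-right Shared Tenancy model, the snippet below keeps every tenant's rows in one shared database table and scopes each query by a tenant_id column. The table, columns, and tenant names are hypothetical; real SaaS platforms add indexing, access control, and often row-level security on top of this idea.

```python
import sqlite3

# One shared database instance; rows are partitioned by a tenant_id column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (tenant_id TEXT, title TEXT)")
db.executemany("INSERT INTO documents VALUES (?, ?)", [
    ("acme", "Q1 contract"),
    ("acme", "NDA draft"),
    ("globex", "Patent filing"),
])

def docs_for(tenant):
    """Scope every query by tenant_id so tenants never see each other's rows."""
    rows = db.execute(
        "SELECT title FROM documents WHERE tenant_id = ? ORDER BY title",
        (tenant,),
    ).fetchall()
    return [title for (title,) in rows]

print(docs_for("acme"))    # ['NDA draft', 'Q1 contract']
print(docs_for("globex"))  # ['Patent filing']
```

In the Isolated or Infrastructure Tenancy models, by contrast, each tenant would get its own database (or server), and no such per-row scoping would be needed.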

e-Discovery | cloud computing
New Jersey, USA | Lahore, PAK | Dubai, UAE
www.claydesk.com
(855) 833-7775
(703) 646-3043

Friday, May 16, 2014

What’s Wrong with Outsourcing? Really?

A company’s existence is directly linked to its profit-making capabilities. This includes employing the most gifted workforce, running optimized operations, having excellent quality controls in place, just to name a few. There is an invisible force, however, constantly acting behind this entire process - the force of 'laws of economics' - principles of demand and supply.
The word ‘globalization’ is not a new buzzword anymore. However, its relation to economics is where the dilemma of outsourcing and offshoring lies. Gone are the days when corporations had loyal employees working for them; technological advancement has disrupted not only how we work but how we think – yes! We think Google, Facebook, LinkedIn, and Twitter, and for the most part have become dependent upon technology.
So, what impact does technology have on driving profits for a company? Look around you – things have changed, human behavior has changed, our thinking process has changed – we have become captive to this unstoppable monster. As the Greek philosopher Heraclitus rightly said, “There is nothing permanent except change”. As a result, companies that adapt to the changing environment remain at the forefront, while those that resist may bear the brunt. In any case, the objective remains to make profits for shareholders.
We all are aware of the exponential growth of technological innovations and big data. What should companies do to maximize their profits in this dynamic environment? Outsourcing seems to be the logical solution. The single biggest advantage is reduction in existing costs. Consider a simple scenario related to e-Discovery industry:
Company A is looking to hire a document review attorney for its e-Discovery project. What could possibly be the lowest per-hour rate for a first-pass review? How does 20 dollars per hour sound? In today’s economy, believe it or not, you will find qualified, experienced, and certified individuals willing to work at that rate. In the US, this rate is certainly peanuts for an attorney, but in India, Pakistan, the Philippines, and Bangladesh, for example, 20 dollars per hour would afford a luxury lifestyle.
With the advent of cloud computing, developing countries now have access to all the latest technologies, learning tools, methodologies, norms, and usages. The workforce has truly become global, and cloud computing is driving costs further down. As buyers flood the marketplace searching for low-priced, efficient technologies, sellers lower their prices to remain competitive. Consequently, companies may not be able to afford or attract highly paid workers. To bridge the gap, various outsourcing models fit the puzzle, providing the same services at a drastically reduced price. Companies now have access to an equally qualified workforce available in the cloud. To top it off, Ivy League universities now offer bachelor’s and master’s level degrees online. So, for example, I could obtain an MBA from an Ivy League business school while residing anywhere in the world, and provide expertise on a project via the cloud.
Having said that, profitability, the principles of demand and supply, and cloud computing technologies are all factors pressuring US companies to find alternative ways to increase profits. Microsoft and Amazon provide secure, state-of-the-art data centers, and their SaaS, PaaS, and IaaS offerings allow for data security. A good example is WordPress – the majority of its employees work virtually. Similarly, Microsoft’s Office 365 and allied products are evidently cloud based, and a qualified professional can administer, manage, and support Office 365 from anywhere in the world!


Thursday, May 8, 2014

7 Tips for Implementing E-Discovery Best Practices

E-Discovery best practices begin with making data management part of daily routine and business operations. Attorneys cannot achieve this objective without the help of the IT department, and IT personnel cannot properly maintain data without guidance from attorneys about what should be kept or destroyed. Knowing the Federal Rules of Civil Procedure related to e-Discovery and keeping up with changing law in the area is a good start; however, understanding how to put these lessons to work in practice is the key to implementing and conducting e-Discovery successfully.
Planning ahead plays a pivotal role, as it sets the standard for effective relationships between internal and external legal and technical resources. Below are a few tips for implementing effective best practices for both inside and outside counsel.
  1. Be proactive and have a formal document retention policy in place with rules for saving and destroying electronic documents.
  2. Increase company-wide awareness of litigation readiness, and train employees to organize documents in a consistent manner. Better yet, implement an effective document management solution such as M-Files, which includes an e-compliance module.
  3. Cater effectively to big data, and implement a strategy for archival, identification, and production in a timely fashion.
  4. Train IT personnel to act as deposition witnesses under Rule 30(b)(6).
  5. Preserve potential evidence when necessary, and train and involve key legal and IT personnel as soon as litigation is imminent.
  6. Maintain adequate knowledge of the client’s information systems and operations to effectively define e-Discovery parameters and ensure smooth dealings with opposing counsel. Try to minimize disruption of the client’s operations.
  7. When a document request is received, be a partner in the data retrieval process – not just a messenger.
While harmony, effective communication, and smooth functioning between attorneys and IT personnel can prove to be beneficial for the organization, keeping current with latest technology and how it can streamline the e-Discovery process is equally important. After all, the purpose of technology is to act as a tool to handle complex e-Discovery in a speedy and cost efficient manner.


Monday, April 28, 2014

E-Discovery Costs vs. Disseminating Justice – What’s Important?

In e-Discovery, courts, attorneys, e-Discovery consultants, and other industry veterans emphatically deliberate proportionality and predictive coding as major instruments for reducing e-Discovery costs. First, Rule 26 of the FRCP – “Duty to Disclose; General Provisions Governing Discovery” – encompasses, in its entirety, matters relating to initial disclosure, timing, scope and limits, pretrial disclosure, limitations, parties’ conference, sanctions, and so on. In other words, the legislative intent behind Rule 26 is to ensure and streamline e-Discovery governance.
Secondly, e-Discovery costs can easily escalate to millions of dollars. For instance, on average a gigabyte (GB) contains 15,000 documents. An average collection of 50 GB thus entails 750,000 documents, which need to be sifted through for details relevant to the specifics of the case for defensibility purposes. To give you an idea of the costs, reviewing those documents at $2 per document would run 1.5 million dollars! Even if 60% were culled using technology assisted review (TAR), costs would still be as high as $600,000! E-Discovery budget calculators can be found here.
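The arithmetic above can be sketched as a small calculator. The per-GB and per-document figures are the averages quoted in this post, not universal constants; actual rates vary widely by matter and vendor.

```python
# Back-of-the-envelope review costs using the averages quoted above:
# 15,000 documents per GB and $2 per document reviewed.
DOCS_PER_GB = 15_000
COST_PER_DOC = 2.00

def review_cost(gigabytes, cull_rate=0.0):
    """Estimated review cost after culling a fraction of the collection."""
    docs = gigabytes * DOCS_PER_GB
    return docs * (1 - cull_rate) * COST_PER_DOC

print(f"${review_cost(50):,.0f}")                 # $1,500,000 (full linear review)
print(f"${review_cost(50, cull_rate=0.6):,.0f}")  # $600,000 (after 60% TAR culling)
```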
Here’s the catch! These 750,000 documents are culled down in order to identify potentially relevant documents. The traditional e-Discovery approach is to process all data to TIFF or native format for full linear review, whereas the newer, more advanced method entails indexing, culling, legal first-pass review, and then processing the data for review. With the advent of Big Data, technology assisted review (TAR), or predictive coding, was introduced as a tool for handling e-Discovery in an efficient, cost-effective manner.
Statistics plays a pivotal role in TAR, and courts have endorsed the usage of TAR in one way or another. However, there may be pitfalls, as I explained in one of my earlier posts relating to the limitations of precision and recall in TAR.

Has our justice system become dependent on technology?

Technology is great; however, it must strictly be used as a tool in aid of the due process of law. As an attorney, I would argue against our justice system’s inclination towards dependence on technology. There are other ways to reduce costs, such as global talent acquisition, outsourcing, dual-shoring, and offshoring, and numerous law firms and corporations have adopted such business models, documenting an additional 60% reduction in e-Discovery costs. While reductions in e-Discovery costs are essential, the opportunity cost may undermine defensibility.



Tuesday, April 22, 2014

7 Things E-Discovery Auditors Must Do

The U.S. Federal Rules of Civil Procedure (FRCP) require organizations to be able to respond to discovery requests in a legally defensible manner. Moreover, as organizations expand globally, they need to be ready at all times to provide information that could be requested as evidence in a legal proceeding. Internal or external auditors are in the best position to recommend policies and best practices that can prepare organizations to respond to a data discovery request. The auditors must:

  1. Determine the effectiveness of the e-Discovery communication plan
  2. Document the IT environment
  3. Regularly review backup, retention, and data destruction policies
  4. Review compliance with document destruction procedures when a litigation hold is issued
  5. Document the steps that will be taken to respond to e-discovery requests
  6. During litigation, determine whether employees are preserving the integrity of relevant material
  7. Review existing backup controls, reports, and inventories of media stored off site

Failing to prepare for an e-Discovery request can result in sanctions. Organizations need to have a litigation readiness policy and plan in place to effectively deal with lawsuits. Auditors play a pivotal role in managing litigation risks and help organizations take a proactive approach to e-Discovery by recommending strategies that address key data preservation, storage, destruction, and recovery concerns. Microsoft SharePoint 2013 and M-Files, for instance, offer e-Discovery and content management solutions that cater to these needs.




Friday, April 18, 2014

Corporate Social Responsibility in E-Discovery Industry

Corporate Social Responsibility (CSR) is a management concept in which companies integrate social and environmental concerns into their business operations and interactions with their stakeholders. The basic definition, according to Wikipedia:

“Corporate social responsibility is a form of corporate self-regulation integrated into a business model. CSR policy functions as a built-in, self-regulating mechanism whereby a business monitors and ensures its active compliance with the spirit of the law, ethical standards, and international norms”

CSR is best incorporated with the “Triple Bottom Line” (TBL) approach, which is essentially an accounting framework incorporating three dimensions of performance: financial, social, and environmental.

A triple bottom line measures the company's economic value, its "people account" (the company's degree of social responsibility), and its "planet account" (the company's environmental responsibility). While awareness of CSR within the e-Discovery industry may be prevalent, only a handful of companies may actually have developed and adopted CSR programs.

Adopting the mindset of a good corporate citizen, we at ClayDesk have initiated a CSR program and are embedding CSR practices in our business. The foremost areas of focus for our CSR initiatives are the promotion of legal education, e-Discovery laws, pro bono legal work, sponsoring a student, and steps towards a paperless (go-green) environment. These steps will bring about positive change and improve the quality of life of members of society.

Some of the core CSR issues relate to environmental management, eco-efficiency, responsible sourcing, stakeholder engagement, labor standards and working conditions, employee and community relations, social equity, gender balance, human rights, good governance, and anti-corruption measures. Denmark, for instance, has a CSR law in place that mandates companies to report their CSR initiatives. Beyond providing charity and sponsorships, the CSR concept allows companies the opportunity to become socially and ethically responsible corporate citizens.

Friday, April 11, 2014

When Should E-Discovery Vendors Be Disqualified? Gordon v. Kaleida Health Case

Generally speaking, courts have inherent authority to disqualify parties, representatives, and consultants from participating in litigation.  Attorneys, expert witnesses, and litigation consultants may face disqualification motions in the event of a conflict of interest. With the rapid expansion of the eDiscovery industry, however, a new question has arisen: If an eDiscovery vendor has a potential conflict of interest, when should it be disqualified?  What standard should apply?

To put the problem in perspective, imagine that you manage discovery at a law firm representing the defendant in a contentious wage and hour dispute, and you recently hired an eDiscovery vendor to assist you in scanning and coding your client’s documents, at a cost of $50,000.  Two months later, you receive notice from your vendor that the plaintiff’s counsel has requested its services in connection with the same case.  How would you react?  Would you expect a court to disqualify the vendor if it accepted the engagement?  This scenario occurred in Gordon v. Kaleida Health, resulting in the first judicial order squarely addressing vendor disqualification.  The Kaleida Health court ultimately denied the defendant’s motion to disqualify, allowing the vendor to continue participating in the case.

Discussion of Gordon v. Kaleida Health

Kaleida Health arose out of a now commonplace dispute between a hospital and its hourly employees under the Fair Labor Standards Act (“FLSA”). The plaintiffs, a group of hourly employees, sued the defendant, Kaleida Health, a regional hospital system, claiming they were not paid for work time during meal breaks, shift preparation, and required training, in violation of FLSA.

Kaleida Health’s attorneys, Nixon Peabody, LLP (“Nixon”), hired D4 Discovery (“D4”), an eDiscovery vendor, to scan and code documents for use in the litigation. In connection with the work, Nixon and D4 executed a confidentiality agreement. D4 was to “objectively code” the documents using categories based on characteristics of the document, such as the author and the type of document. The coded documents would then be used by Nixon in preparing for upcoming depositions.

Two months later, plaintiffs’ counsel, Thomas & Solomon, LLP (“Thomas”), requested D4 to provide ESI consulting services to it in connection with the same case. D4 notified Nixon, who promptly objected based on the scanning and coding services D4 provided the defendant during the litigation. D4 then provided assurances that Kaleida Health’s documents would not be used in consulting the plaintiffs and that an entirely different group of employees would work with the plaintiffs’ counsel. Nixon, on behalf of Kaleida Health, persisted in its objection to D4 working for the plaintiffs and ultimately filed a motion to disqualify the vendor.

Magistrate Judge Foschio’s analysis began by outlining the standard governing the disqualification of experts and consultants.  According to the court, the entity sought to be disqualified must be an expert or a consultant, defined as a “‘source of information and opinions in technical, scientific, medical or other fields of knowledge’” or “one who gives professional advice or services” in that field. After the moving party makes this initial showing, it must meet two further requirements.  First, the party’s counsel must have had an “‘objectively reasonable’ belief that a confidential relationship existed with the expert or consultant.” Second, the moving party must also show “that . . . confidential information was ‘actually disclosed’ to the expert or consultant.”

Applying this standard, Judge Foschio ultimately found that because the scanning and objective coding services performed by D4 did not require specialized knowledge or skill and were of a “clerical nature,” D4 was not an “expert” or “consultant.” Further, the court determined that the defendant failed to prove that it provided confidential information to D4 because it did not show “any direct connection between the scanning and coding work . . . and Defendants’ production of [its] ESI.”
Rejecting Kaleida Health’s argument, the court declined to apply to D4 and other eDiscovery vendors the presumption of confidential communications, imputation of shared confidences, and vicarious disqualification applicable in the context of attorney disqualification when a party “switches sides.” The court— as an alternative basis to its finding that D4 did not act as an expert or consultant—held that disqualification was improper because no “prior confidential relationship” existed between Kaleida Health and D4.

Because Kaleida Health represents the first significant attempt at exploring the issues surrounding vendor disqualification, whether later courts should follow Kaleida Health’s lead in exclusively applying the disqualification rules for experts and consultants to vendors becomes the main issue in its wake.  To come to a conclusion on this point, one must first explore the different schemes that courts may apply when considering disqualification.

The above excerpt is part of an article originally written by Michael A. Cottone, a candidate for Doctor of Jurisprudence, The University of Tennessee College of Law, May 2014.


Monday, April 7, 2014

The trade-off between ‘Recall’ and ‘Precision’ in predictive coding (part 2 of 2)

This is the second part of a two-part series of posts relating to information retrieval using predictive coding analysis, detailing the trade-off between Recall and Precision. For part 1 of 2, click here.
To clarify further:
Precision (P) is the fraction of retrieved documents that are relevant: Precision = (number of relevant items retrieved / number of retrieved items) = P(relevant | retrieved)
Recall (R) is the fraction of relevant documents that are retrieved: Recall = (number of relevant items retrieved / number of relevant items) = P(retrieved | relevant)
Recall and Precision are inversely related. A solid criticism of these two metrics is the aspect of bias: a record that is relevant to one person may not be relevant to another.
So how do you achieve optimal values for Recall and Precision on a TAR platform?
Let’s consider a simple scenario:
• A database contains 80 records on a particular topic
• A search was conducted on that topic and 60 records were retrieved.
• Of the 60 records retrieved, 45 were relevant.
Calculate the precision and recall.
Solution:
Using the designations above:
• A = Number of relevant records retrieved,
• B = Number of relevant records not retrieved, and
• C = Number of irrelevant records retrieved.
In this example A = 45, B = 35 (80-45) and C = 15 (60-45).
Recall = (45 / (45 + 35)) * 100% = (45/80) * 100% ≈ 56%
Precision = (45 / (45 + 15)) * 100% = (45/60) * 100% = 75%
So, essentially, the optimal result – high Recall with high Precision – is difficult to achieve.
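The worked example above can be verified with a few lines of Python:

```python
def recall_precision(relevant_in_db, retrieved, relevant_retrieved):
    """Recall and precision as percentages, per the definitions above."""
    recall = relevant_retrieved / relevant_in_db * 100
    precision = relevant_retrieved / retrieved * 100
    return recall, precision

# 80 relevant records in the database; 60 retrieved; 45 of those relevant.
r, p = recall_precision(relevant_in_db=80, retrieved=60, relevant_retrieved=45)
print(f"Recall = {r:.2f}%, Precision = {p:.2f}%")  # Recall = 56.25%, Precision = 75.00%
```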
According to Cambridge University Press:
“The advantage of having the two numbers for precision and recall is that one is more important than the other in many circumstances. Typical web surfers would like every result on the first page to be relevant (high precision) but have not the slightest interest in knowing let alone looking at every document that is relevant. In contrast, various professional searchers such as paralegals and intelligence analysts are very concerned with trying to get as high recall as possible, and will tolerate fairly low precision results in order to get it. Individuals searching their hard disks are also often interested in high recall searches. Nevertheless, the two quantities clearly trade off against one another: you can always get a recall of 1 (but very low precision) by retrieving all documents for all queries! Recall is a non-decreasing function of the number of documents retrieved. On the other hand, in a good system, precision usually decreases as the number of documents retrieved is increased”

For part 1 of 2, click here.


Saturday, April 5, 2014

The trade-off between ‘Recall’ and ‘Precision’ in predictive coding (part 1 of 2)

This is the first of a two-part series of posts relating to information retrieval using predictive coding analysis, detailing the trade-off between Recall and Precision. Predictive coding – sometimes referred to as ‘technology assisted review’ (TAR) – is basically the integration of technology into the human document review process. The two-fold benefit of using TAR is speeding up the review process and reducing costs. Sophisticated algorithms are utilized to produce a relevant set of documents. The underlying process in TAR is based on statistical concepts. In TAR, a sample set of documents (the seed set) is coded by subject matter experts, acting as the primary reference data that teaches the TAR machine to recognize relevant patterns in the larger data set. In simple terms, a data sample is created based on a chosen sampling strategy such as random, stratified, or systematic sampling. Remember, it is critical to ensure that seed sets are prepared by subject matter experts. Based on the seed sets, the algorithm in the TAR platform starts assigning predictions to the documents in the database. Through an iterative process, adjustments can be made on the fly to reach the desired objectives. The two important metrics used to measure the efficacy of TAR are:
  1. Recall
  2. Precision
Recall is the fraction of the relevant documents that are successfully retrieved, whereas Precision is the fraction of retrieved documents that are relevant. If the computer, in trying to identify relevant documents, identifies a set of 100,000 documents, and after human review 75,000 of the 100,000 are found to be relevant, the precision of that set is 75%. In a given population of 200,000 documents, assume 30,000 documents are selected for review as the result of TAR. If 20,000 documents within the 30,000 are ultimately found to be responsive, the selected set has a precision of roughly 67% (20,000 / 30,000). If another 5,000 relevant documents are then found in the remaining 170,000 that were not selected for review, the set selected for review has a recall of 80% (20,000 / 25,000).
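A quick sketch in Python confirms these figures:

```python
# Population of 200,000; 30,000 selected by TAR; 20,000 of those are
# responsive; 5,000 more relevant documents sit in the unreviewed remainder.
selected = 30_000
responsive_in_selected = 20_000
relevant_elsewhere = 5_000

precision = responsive_in_selected / selected
recall = responsive_in_selected / (responsive_in_selected + relevant_elsewhere)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")  # precision = 66.7%, recall = 80.0%
```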

 To be continued....

 
 

Syed Raza
CEO, ClayDesk