Document Classifier

Well, it works 'pretty' well for now.

load_files loads text files, using the subfolder names as the category labels. It returns a dictionary-like object (a scikit-learn Bunch).

With shuffle=True it also shuffles the data for us :)
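A minimal sketch of what load_files does with a folder layout like ours. It assumes a tiny, made-up corpus in a temporary directory (the folder names mirror ours, but the file contents are placeholders). It shows that labels are indices into the alphabetically sorted folder names, and that file contents come back as raw bytes unless you pass an encoding.

```python
import os
import tempfile

from sklearn.datasets import load_files

# Hypothetical corpus: one subfolder per category, placeholder text in each file.
root = tempfile.mkdtemp()
samples = {
    "AoI": ["first AoI document", "second AoI document"],
    "MC": ["first MC document", "second MC document", "third MC document"],
}
for category, texts in samples.items():
    os.makedirs(os.path.join(root, category))
    for i, text in enumerate(texts):
        with open(os.path.join(root, category, f"{i}.txt"), "w", encoding="utf-8") as fh:
            fh.write(text)

# Default behaviour: contents are bytes, target holds integer indices
# into target_names (the sorted folder names).
bunch = load_files(root, shuffle=True, random_state=0)
print(bunch.target_names)   # ['AoI', 'MC']
print(type(bunch.data[0]))  # <class 'bytes'>

# Passing an encoding decodes every file to str while loading.
decoded = load_files(root, encoding="utf-8", shuffle=True, random_state=0)
print(type(decoded.data[0]))  # <class 'str'>
```

Shuffling reorders data, filenames and target together, so index i still refers to the same document in all three.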


In [1]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.datasets import load_files

In [2]:
# categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
# all_of_it = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=None)
# Raw string, so the backslashes in the Windows path are not read as escape sequences
all_of_it = load_files(r"D:\kaam\AdditionalParsed", shuffle=True, random_state=None)
# all_of_it = load_files("20_newsgroups", shuffle=True, random_state=None)
# all_of_it = load_files("20_newsgroups_full", shuffle=True, random_state=None)

Divide the data into training and test sets in a roughly 80-20 ratio (roughly, because int() truncates the cut point).

And show the classes available to us.


In [3]:
total = len(all_of_it.data)
num = int(0.8 * total)
# num7 = int(0.7 * total)
# num6 = int(0.6 * total)
# num5 = int(0.5 * total)
num4 = int(0.4 * total)
print("No. of Training data: " , num)
print("No. of Testing data: " , total - num)

train_data = all_of_it.data[:num]
test_data = all_of_it.data[num:]

all_of_it.target_names


No. of Training data:  1417
No. of Testing data:  355
Out[3]:
['AoI', 'MC']
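Slicing works here only because load_files already shuffled the data, and the cell above splits the texts but not the labels. A sketch of an alternative, assuming made-up stand-ins for all_of_it.data and all_of_it.target: scikit-learn's train_test_split splits texts and labels together, and stratify keeps the class proportions equal in both halves.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for all_of_it.data and all_of_it.target.
docs = [f"document {i}" for i in range(20)]
labels = [0] * 6 + [1] * 14

# test_size=0.2 reproduces the 80-20 ratio; stratify=labels keeps the
# 0/1 proportions the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.2, stratify=labels, random_state=0
)
print(len(X_train), len(X_test))  # 16 4
```

On the real corpus the call would be train_test_split(all_of_it.data, all_of_it.target, test_size=0.2, stratify=all_of_it.target).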

Different ratios of training and test sets.

These follow the first implementation, which was used to conclude the POC (proof of concept).


In [4]:
# train_data7 = all_of_it.data[:num7]
# test_data7 = all_of_it.data[num7:]

# train_data6 = all_of_it.data[:num6]
# test_data6 = all_of_it.data[num6:]

# train_data5 = all_of_it.data[:num5]
# test_data5 = all_of_it.data[num5:]

train_data4 = all_of_it.data[:num4]
test_data4 = all_of_it.data[num4:]
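Instead of one commented-out variable pair per ratio, the splits can be generated in a loop. This is a sketch with a hypothetical helper, split_at_ratio, that applies the same int(ratio * total) cut as the cells above; a list of integers stands in for all_of_it.data, and the same helper works on all_of_it.target.

```python
def split_at_ratio(items, ratio):
    """Hypothetical helper: first int(ratio * len(items)) items train, the rest test."""
    cut = int(ratio * len(items))  # truncation sends any fractional document to the test set
    return items[:cut], items[cut:]


total = 1772                 # corpus size reported by the dataset details cell
corpus = list(range(total))  # stand-in for all_of_it.data

splits = {}
for ratio in (0.8, 0.7, 0.6, 0.5, 0.4):
    train, test = split_at_ratio(corpus, ratio)
    splits[ratio] = (len(train), len(test))
    print(ratio, len(train), len(test))
```

The 0.8 row gives 1417 training and 355 test documents, matching the counts printed earlier.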

Some details about the dataset


In [5]:
print(dir(all_of_it))
print(all_of_it.target_names)
# print(type(all_of_it.DESCR))  # the Bunch exposes DESCR, not description
print(len(all_of_it.data))
print(all_of_it.target[:10])


['DESCR', 'data', 'filenames', 'target', 'target_names']
['AoI', 'MC']
1772
[1 0 1 0 1 1 1 1 1 0]
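Since the 80-20 slice is not stratified, it helps to know how many documents each class has. A small sketch with a hypothetical helper, class_counts, run here on the ten labels printed above; on the full corpus you would pass all_of_it.target and all_of_it.target_names instead.

```python
from collections import Counter


def class_counts(target, target_names):
    """Hypothetical helper: map each class name to the number of documents with that label."""
    counts = Counter(int(t) for t in target)
    return {name: counts.get(i, 0) for i, name in enumerate(target_names)}


# The first ten labels from the cell above.
print(class_counts([1, 0, 1, 0, 1, 1, 1, 1, 1, 0], ["AoI", "MC"]))
# {'AoI': 3, 'MC': 7}
```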

What the files look like:


In [6]:
all_of_it.filenames[:5]


Out[6]:
array(['D:\\kaam\\AdditionalParsed\\MC\\0001571996-17-000004-6.txt',
       'D:\\kaam\\AdditionalParsed\\AoI\\0000950123-07-004783-2.txt',
       'D:\\kaam\\AdditionalParsed\\MC\\0001213900-17-003784-6.txt',
       'D:\\kaam\\AdditionalParsed\\AoI\\0000732417-15-000016-3.txt',
       'D:\\kaam\\AdditionalParsed\\MC\\0001562762-17-000038-3.txt'], 
      dtype='<U56')
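The parent folder of each path is its category, so the filenames double as a sanity check on the labels. A sketch using the five paths above together with the first five labels from the dataset details cell; PureWindowsPath parses the Windows-style paths on any operating system.

```python
from pathlib import PureWindowsPath

# The five paths printed above and all_of_it.target[:5] from the earlier cell.
filenames = [
    r"D:\kaam\AdditionalParsed\MC\0001571996-17-000004-6.txt",
    r"D:\kaam\AdditionalParsed\AoI\0000950123-07-004783-2.txt",
    r"D:\kaam\AdditionalParsed\MC\0001213900-17-003784-6.txt",
    r"D:\kaam\AdditionalParsed\AoI\0000732417-15-000016-3.txt",
    r"D:\kaam\AdditionalParsed\MC\0001562762-17-000038-3.txt",
]
target_names = ["AoI", "MC"]
target = [1, 0, 1, 0, 1]

# The parent folder name of each file is its category.
folders = [PureWindowsPath(f).parent.name for f in filenames]
print(folders)  # ['MC', 'AoI', 'MC', 'AoI', 'MC']

# Each integer label should name the folder its file came from.
assert all(target_names[t] == folder for t, folder in zip(target, folders))
```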

In [7]:
print(all_of_it.data[0])


b' htm 69770 Material contracts 10 2017-03-31T08:00:45.420258-05:00 Filing 0001571996-17-000004 2017-03-31 10-K Annual  & Quarterly Reports 0001571996 Dell Technologies Inc 3571 Electronic Computers TEXAS DELAWARE DELAWARE Denali Holding Inc. false Non-accelerated Filer 001-37867 EXHIBIT 10.38 Exhibit Exhibit 6 exhibit1038_020317.htm EX-10.38 6 exhibit1038_020317.htm EXHIBIT 10.38 \n \n \n Exhibit 10.38 \n INDEMNIFICATION AGREEMENT \n This Indemnification Agreement is dated as of ______ (this " Agreement ") and is between Denali Holding Inc., a Delaware corporation (the " Company "), and __________ (" Indemnitee "). \nWHEREAS, Indemnitee is a director and/or officer of the Company and may also serve as a director, executive officer, employee, consultant, fiduciary or agent (collectively, the " Indemnifiable Positions") of other corporations, limited liability companies, partnerships, joint ventures, trusts, employee benefit plans or other enterprises controlled by the Company (collectively, th e " Controlled Entities "); \nWHEREAS, in order to induce Indemnitee to continue to serve as a director and/or executive officer of the Company and/or in other Indemnifiable Positions of the Controlled Entities, the Company wishes to provide for the indemnification of, and the advancement of Expenses (as defined herein) to, Indemnitee to the maximum extent permitted by law; \n WHEREAS, the certificate of incorporation of the Company (the " Charter") provides for the indemnification of the Company\'s directors and officers to the fullest extent permitted under the Delaware General Corporation Law (the " DGCL "); and \nWHEREAS, the Company and Indemnitee desire to enter into this Agreement to set forth their agreement regarding indemnification and the advancement of Expenses and to clarify the priority of the indemnification and advancement of Expenses with respect to certain Jointly Indemnifiable Claims (as defined herein). 
\nNOW, THEREFORE, in consideration of Indemnitee\'s service or continued service to the Company and/or the Controlled Entities and the covenants and agreements set forth below, and for other good and valuable consideration, the receipt and adequacy of which are hereby acknowledged, the parties hereto, intending to be legally bound, hereby agree as follows. \n Section 1.     Indemnification . \n To the fullest extent permitted by the DGCL: \n(a)    The Company shall indemnify Indemnitee if Indemnitee was or is made or is threatened to be made a party to, or is otherwise involved in, as a witness or otherwise, any threatened, pending or completed Action, Suit or Proceeding (brought in the right of the Company or otherwise), whether civil, criminal, administrative or investigative and whether form al or informal, including appeals. \n (b)    The indemnification provided by this   Section 1  shall be from and against all loss and liability suffered and Expenses (including attorneys\' fees), Judgments, Fines and Amounts Paid in Settlement actually and reasonably incurred by or on behalf of Indemnitee in connection with such Action, Suit or Proceeding, including any appeals. \n \n \n 1 \n \n \n \n Section 2.     Payment of Expenses. To the fullest extent permitted by the DGCL, Expenses (including attorneys\' fees) incurred by Indemnitee in appearing at, participating in or defending any Action, Suit or Procee ding or in connection with an enforcement action as contemplated by   Section 3 ( d), shall be paid by the Company in advance of the final disposition of such Action, Suit or Proceeding or such enforcement action within 15 days after receipt by the Company of a statement or statements from Indemnitee requesting such advance or advances from time to time. 
The Indemnitee hereby undertakes to repay any amounts advanced (without interest) to the extent that it is ultimately determined that Indemnitee is not entitled under this Agreement to be indemnified by the Company in respect of such Action, Suit or Proceeding or su ch enforcement action as contemplated by   Section 3 ( d). No other form of undertaking shall be required of Indemnitee other than the execution of this Agreement. This   Section 2   shall be subject to   Section 3 ( a ) and shall not apply to any claim made by Indemnitee for which indemnity is excluded pursuant to   Section 6 ( a ). \n Section 3.     Procedure for Indemnification; Notification and Defense of Claim . \n(a)    Promptly after receipt by Indemnitee of actual notice of the commencement of any Action, Suit or Proceeding, Indemnitee shall, if a claim in respect thereof is to be made or could be made against the Company hereunder, notify the Company in writing of the commencement thereof. The failure to promptly notify the Company of the commencement of the Action, Suit or Proceeding, or of Indemnitee\'s request for indemnification, will not relieve the Company from any liability that it may have to Indemnitee hereunder, except to the extent the Company is actually and materially prejudiced (through the forfeiture of substantive rights or defenses) in its defense of such Action, Suit or Proceeding as a result of such failure. 
With respect to any Action, Suit or Proceeding of which the Company is so notified as provided in this Agreement, the Company shall, subject to the last two sentences of  this paragraph and subject to the Company\'s prior determination pursuant to   Section 3 ( c) to grant Indemnitee\'s indemnification request with respect to such Action, Suit or Proceeding prior to the final disposition of such Action, Suit or Proceeding or such enforcement action, be entitled to assume the defense of such Action, Suit or Proceeding, with counsel reasonably acceptable to Indemnitee (which acceptance shall not be unreasonably withheld or delayed), upon the delivery to Indemnitee of written notice of its election to do so. After delivery of such notice, approval of such counsel by Indemnitee and the retention of such counsel by the Company, the Company will not be liable to Indemnitee under this Agreement for any subsequently-incurred fees of separate counsel engaged by or on behalf of Indemnitee with respect to the same Action, Suit or Proceeding unless the Company does not continue to retain such counsel to defend such Action, Suit or Proceeding. Notwithstanding the foregoing, if Indemnitee, based on the advice of his or her counsel, shall have reasonably concluded that, in the conduct of any such defense, there is or is reasonably likely to be a conflict of interest or position between the Company and Indemnitee with respect to a significant issue, then the Company will not be entitled, without the written consent of Indemnitee, to assume such defense. In addition, the Company will not be entitled, without the written consent of Indemnitee, to assume the defense of any claim brought by or i n the right of the Company. 
\n(b)    To obtain indemnification under this Agreement, upon final disposition of such Action, Suit or Proceeding or such enforcement action, in each case, contemplated in   Section 1 , Indemnitee shall submit to the Company a written request therefor including such documentation \n \n \n 2 \n \n \n \nand information as is reasonably available to Indemnitee and, to the extent available, such documentation and information as is reasonably necessary to enable the Company to determine whether and to what extent Indemnitee is entitled to indemnification. In addition, Indemnitee shall reasonably cooperate with the Company and shall give the Company such additi onal information as the Company may reasonably require. \n(c)    The determination whether to grant Indemnitee\'s indemnification request shall be made promptly and in any event within 30 days following the Company\'s receipt of a request f or indemnification in accordance with   Section 3 ( a ) (the " Indemnity Review Period"). Prior to final disposition of the Action, Suit or Proceeding or enforcement action in respect of which the indemnification request is made, if (i) the Company\'s determination of whether to grant Indemnitee\'s indemnification request shall not have been made within the Indemnity Review Period or (ii) the Company denies the Indemnitee\'s indemnification request during the Indemnity Review Period, then, in each case, the indemnification request shall be deemed to have been denied and the Indemnitee shall not be entitled to such indemnification during the pendency of such Action, Suit or Proceeding or enforcement action, without prejudice to any subsequent indemnification request made by the Indemnitee following final disposition of such Action, Suit or Proceeding or such enforcement action in respect of such Action, Suit or Proceeding or such enforcement action. 
Following final disposition of the Action, Suit or Proceeding or enforcement action in respect of which the indemnification request is made, if the Company\'s determination of whether to grant Indemnitee\'s indemnification request shall not have been made within the Indemnity Review Period, then the requisite determination of entitlement to indemnification shall, subject to   Section 6, nonetheless be deemed to have been made and Indemnitee shall be entitled to such indemnification, absent (i) a misstatement by Indemnitee of a material fact, or an omission of a material fact necessary to make Indemnitee\'s statement not materially misleading, in connection with the request for indemnification, or (ii) a prohibition of such indemnification under applicable law. If the Company determines that Indemnitee is entitled to such indemnification within the Indemnity Review Period, the Company will make payments to Indemnitee  of any indemnifiable amounts pursuant to   Section 1, in each case within 30 days following any payment request from Indemnitee (for any such payment request, the applicable " Payment Request Period "). 
\n (d)    In the event that (i) the Company determines in accordance with this   Section 3  that Indemnitee is not entitled to indemnification under this Agreement, (ii) the Company denies a request for indemnification, in whole or in part, or fails to respond or make a determination of entitlement to indemnification within the Indemnity Review Period, (iii) payment of any indemnifiable amounts pursuant to   Section 1  is not made by the Company within the applicable Payment Request Period, (iv) advancement of Expenses is not timely made in accordance with   Section 2, or (v) the Company or any other person takes or threatens to take any action to declare this Agreement void or unenforceable, or institutes any litigation or other action or proceeding designed to deny, or to recover from, the Indemnitee the benefits provided or intended to be provided to Indemnitee hereunder, Indemnitee shall be entitled to an adjudication in any court of competent jurisdiction of his or her entitlement to such indemnification or advancement of Expenses. To the extent not already advanced pursuant to   Section 2, Indemnitee\'s Expenses (including attorneys\' fees) incurred in connection with successfully establishing Indemnitee\'s right to indemnification or advancement of Expenses, in whole  or in part, in any such proceeding or otherwise shall also be \n \n \n 3 \n \n \n \n indemnified by the Company;   provided  that to the extent Indemnitee is successful in part and unsuccessful in part in establishing Indemnitee\'s right to indemnification or advancement of Expenses hereunder, Indemnitee  shall be entitled to partial indemnification of Expenses in accordance with   Section 20 . 
\n(e)    Indemnitee shall be presumed to be entitled to advancement of Expenses and, following final disposition of the Action, Suit or Proceeding or enforcement action in respect of which the indemnification request is made, to indemnification, in each case, under this Agreement upon submission of a request therefor in accordance with   Section 2   or   Section 3  of this Agreement, as the case may be. The Company shall have the burden of proof in overcoming such presumption, and such presumption shall be used as a basis for a determination of entitlement to indemnification and advancement of Expenses unless the Company overcomes such presumption by clear and convincing evidence. Neither the failure of the Company to have made a determination prior to the commencement of any action pursuant to this Agreement that indemnification is proper in the circumstances because Indemnitee has met the applicable standard of conduct, nor an actual determination by the Company that Indemnitee has not met such applicable standard of conduct, shall be a defense to the action or create a  presumption that Indemnitee has not met the applicable standard of conduct. \n Section 4.     Insurance and Subrogation . \n(a)    To the extent the Company maintains a policy or policies of insurance providing directors\' and officers\' liability insurance, Indemnitee shall be covered by such policy or policies, in accordance with its or their terms, to the maximum extent of the coverage provided to any other director or officer of the Company. If, at the time the Company receives from Indemnitee any notice of the commencement of an Action, Suit or Proceeding, the Company has such insurance in effect which would reasonably be expected to cover such Action, Suit or Proceeding, the Company shall give prompt notice of the commencement of such Action, Suit or Proceeding to the insurers in accordance with the procedures set forth in such policy or policies. 
The Company shall thereafter take all necessary or reasonably desirable action to cause such insurers to pay, on behalf of Indemnitee, all amounts payable as a  result of such Action, Suit or Proceeding in accordance with the terms of such policy or policies. \n (b)    Subject to   Section 9 ( b), in the event of any payment by the Company under this Agreement, the Company shall be subrogated to the extent of such payment to all of the rights of recovery of Indemnitee with respect to any insurance policy. Indemnitee shall execute all papers reasonably required and take all action reasonably necessary to secure such rights, including execution of such documents as are necessary to enable the Company to effectively bring suit to enforce such rights in accordance with the terms of such insurance policy. The Company shall pay or  reimburse all Expenses incurred by Indemnitee in connection with such subrogation. \n (c)    Subject to   Section 9 ( b), the Company shall not be liable under this Agreement to make any payment of amounts otherwise indemnifiable hereunder (including, but not limited to, Judgments, Fines and Amounts Paid in Settlement, and ERISA excise taxes or penalties) if and to the extent that Indemnitee has otherwise actually received such payment under this Agreement or any insurance p olicy, contract, agreement or otherwise. \n \n \n 4 \n \n \n \n Section 5.     Certain Definitions . 
For purposes of this Agreement, the following definitions shall apply: \n (a)    The term " Action ,   Suit or Proceeding" shall be broadly construed and shall include, without limitation, the investigation, preparation, prosecution, defense, settlement, arbitration and appeal of, and the giving of testimony in, any threatened, pending or completed claim, action, suit, arbitration, investigation, inquiry, alternative dispute mechanism or proceeding, whether civil (including intentional and unintentional tort claims), criminal, administrative or investigative, in each case, by reason of the service of Indemnitee as a director and/or officer of the Company and/or in other Indemnifiable Positions of the Controlled Entities, or by reason of any action alleged to have been taken or omitted in any such capacity, and whether pursuant to any alleged breach of any fiduciary duty owed by, or failure to meet any standard of care applicable to, any such Indemnitee in respect of the Company or any Controlled Entity, or  otherwise. \n (b)    The term " Expenses" shall include all out-of-pocket costs of any type or nature whatsoever (including, without limitation, all attorneys\' fees and related disbursements), in each case, actually and reasonably incurred by or on behalf of Indemnitee in connection with either the investigation, defense or appeal of an Action, Suit or Proceeding or establishing or enforcing a right to indemnification under this Agreement or otherwise incurred in connection with a claim that is indemnifiable hereunder. \n (c)    The term " Judgments ,   Fines and Amounts Paid in Settlement" shall be broadly construed and shall mean any direct or indirect payments of any type or nature whatsoever owing or paid in connection with an Action, Suit or Proceeding, including without limitation, all judgments, awards, fines, penalties and amounts in settlement, as well as any penalties or excise taxes assessed on a person with respect to an employee  benefit plan. 
\n Section 6.     Limitation on Indemnification. Notwithstanding any other provision herein to the contrary, the Company shall not be obligated pursuant to this Agreement: \n (a)     Claims Initiated by Indemnitee. To indemnify or advance Expenses to Indemnitee with respect to any threatened, pending or completed claim, action, suit, arbitration, investigation, inquiry, alternative dispute mechanism or proceeding, whether civil (including intentional and unintentional tort claims), criminal, administrative or investigative, however denominated, initiated or brought voluntarily by Indemnitee whether by way of defense, counterclaim or cross claim or otherwise, other than (i) an action brought to establish or enforce a right to indemnification or  advancement of Expenses under this Agreement (which shall be governed by the provisions of   Section 6 ( b) of this Agreement), a claim, action, suit, arbitration, investigation, inquiry, alternative dispute mechanism or proceeding that was authorized or consented to by the Board of Directors of the Company, it being understood and agreed that such authorization or consent shall not be unreasonably withheld in connection with any compulsory counterclaim brought by Indemnitee in response to an Action, Suit or Proceeding otherwise indemnifiable under this Agreement or (ii) as otherwise required under the DGCL. \n (b)     Action for Indemnification. 
To indemnify Indemnitee for any Expenses incurred by Indemnitee with respect to an action instituted by Indemnitee to enforce or interpret this Agreement \n \n \n 5 \n \n \n \nif Indemnitee is not successful in such enforcement action in establishing Indemnitee\'s right, in whole or in part, to indemnification or advancement of Expenses hereunder;   provided  that to the extent Indemnitee is successful in part and unsuccessful in part in establishing Indemnitee\'s right to indemnification or advancement of Expenses hereunder, Indemnitee  shall be entitled to partial indemnification of Expenses in accordance with   Section 20 . \n (c)     Section 16 ( b )   Matters. To indemnify Indemnitee on account of any Action, Suit or Proceeding in which judgment is rendered against Indemnitee for disgorgement of profits made from the purchase or sale by Indemnitee of securities of the Company pursuant to the provisions of Section 16(b) of the Securities Exchange Act of 1934, as amended (excluding any purchase or sale deemed to be made in connection with any merger, consolidation, reorganization or other transaction undertaken by the Company). \n (d)     Fraud or Willful Misconduct. To indemnify Indemnitee on account of conduct by Indemnitee where such conduct has been determined to have been knowingly fraudulent or constitute willful misconduct by a final (not interlocutory) judgment or other adjudication of a court or arbitration or administrative body of competent jurisdiction as to which there is no further right or option of appeal or the time within which an appeal must be filed has expired without such filing. 
For the avoidance of doubt, each of the Company and Indemnitee acknowledge and agree that any actions or omissions by Indemnitee that are permitted by Section 6.2 of that Certain Sponsor Stockholders Agreement, dated October, 29, 2013, by and among the Company and the investors signatory thereto, and that do not otherwise constitute willful misconduct shall not be considered willful misconduct for purposes of this Agreement. \n (e)     Prohibited by Law. To indemnify Indemnitee in any circumstance where such indemnification has been determined to be prohibited by law by a final (not interlocutory) judgment or other adjudication of a court or arbitration or administrative body of competent jurisdiction as to which there is no further right or option of appeal or the time within which an appeal must be filed  has expired without such filing. \n (f)     Unauthorized Settlement. To indemnify Indemnitee for any amounts paid in settlement of any Action, Suit or Proceeding without the Company\'s prior written consent;   provided   that the Company will not unreasonably withhold or delay its consent to any proposed settlement. \n Section 7.     Certain Settlement Provisions. The Company shall be permitted to settle any Action, Suit or Proceeding, except that it shall not settle any Action, Suit or Proceeding in any manner that would impose any penalty (unless the only penalty imposed is a monetary payment that will be paid in full by the Company (or its insurers)) or limitations or constitute any admission of wrongdoing or which may compromise, or may adversely affect, the defense of the Indemnitee in any other Action, Suit or Proceeding, whether civil or criminal, without Indemnitee\'s prior written consent. Indemnitee will not unreasonably withhold or delay his or, her consent to any proposed settlement. \n Section 8.     Savings Clause. 
If any provision or provisions (or portion thereof) of this Agreement shall be invalidated on any ground by any court of competent jurisdiction, then the Company shall neverthele ss indemnify Indemnitee if Indemnitee was or is made or is threatened to be made a party \n \n \n 6 \n \n \n \nor is otherwise involved in any threatened, pending or completed Action, Suit or Proceeding (brought in the right of the Company or otherwise), whether civil, criminal, administrative or investigative and whether formal or informal, including appeals, from and against all loss and liability suffered and Expenses (including attorneys\' fees), Judgments, Fines and Amounts Paid in Settlement actually and reasonably incurred by or on behalf of Indemnitee in connection with such Action, Suit or Proceeding, including any appeals, to the fullest extent permitted by any applicable portion of this Agreement that shall not have been invalidated. \n Section 9.     Contribution/Jointly Indemnifiable Claims . \n(a)    In order to provide for just and equitable contribution in circumstances in which the indemnification provided for herein is held by a court of competent jurisdiction to be unavailable to Indemnitee in whole or in part, it is agreed that, in such event, the Company shall, to the fullest extent permitted by law, contribute to the payment of all of Indemnitee\'s loss and liability suffered and Expenses (including attorneys\' fees), Judgments, Fines and Amounts Paid in Settlement actually and reasonably incurred by or on behalf of Indemnitee in connection with any Action, Suit or Proceeding, including any appeals, in an amount that is just and equitable in the circumstances;   provided, that, without limiting the generality of the foregoing, such contribution shall not be required where such holding by the court is due to any limitation on indemnification set fo rth in   Section 4 ( c ), 6, or 7 hereof. 
\n(b)    Given that certain Jointly Indemnifiable Claims by reason of the service of Indemnitee as a director of the Company and/or in other Indemnifiable Positions of the Controlled Entities, or by reason of any action alleged to have been taken or omitted in any such capacity, the Company acknowledges and agrees that the Company shall, and to the extent applicable shall cause the Controlled Entities to, be fully and primarily responsible for the payment to the Indemnitee in respect of indemnification or advancement of Expenses in connection with any such Jointly Indemnifiable Claim, pursuant to and in accordance with (as applicable) the terms of (i) the DGCL, (ii) the Charter, (iii) this Agreement, (iv) any other agreement between the Company or any Controlled Entity and the Indemnitee pursuant to which the Indemnitee is indemnified, (v) the laws of the jurisdiction of incorporation or organization of any Controlled Entity and/or (vi) the certificate of incorporation, certificate of organization, bylaws, partnership agreement, operating agreement, certificate of formation, certificate of limited partnership or other organizational or governing documents of any Controlled Entity ((i) through (vii) collectively, the " Indemnification Sources"), irrespective of any right of recovery the Indemnitee may have from the Indemnitee-Related Entities. Under no circumstance shall the Company or any Controlled Entity be entitled to any right of subrogation or contribution by the Indemnitee-Related Entities and no right of advancement or recovery the Indemnitee may have from the Indemnitee-Related Entities shall reduce or otherwise alter the rights of the Indemnitee or the obligations of the Company or any Controlled Entity under the Indemnification Sources. 
In the event that any of the Indemnitee-Related Entities shall make any payment to the Indemnitee in respect of indemnification or advancement of Expenses with respect to any Jointly Indemnifiable Claim, (i) the Company shall, and to the extent applicable shall cause the Controlled Entities to, reimburse the Indemnitee-Related Entity making such payment to the extent of such payment promptly upon written demand from such Indemnitee-Related Entity, (ii) to the extent not previously and fully reimbursed by the Company and/or any Controlled Entity \n \n \n 7 \n \n \n \npursuant to clause (i), the Indemnitee-Related Entity making such payment shall be subrogated to the extent of the outstanding balance of such payment to all of the rights of recovery of the Indemnitee against the Company and/or any Controlled Entity or under any insurance policy, as applicable, and (iii) Indemnitee and the Company and, as applicable, any Controlled Entity shall execute all papers reasonably required and shall do all things that may be reasonably necessary to secure such rights, including the execution of such documents as may be necessary to enable the Indemnitee-Related Entities effectively to bring suit to enforce such rights. The Company and Indemnitee agree that each of the Indemnitee-Rela ted Entities shall be third-party beneficiaries with respect to this   Section 9 ( b ),   entitled to enforce this   Section 9 ( b) as though each such Indemnitee-Related Entity were a party to this Agreement. The Company shall cause each of the Controlled Entities to perform the terms and obligations of this   Section 9 ( b ) as though each such Controlled Entity was a party to this Agreement. 
For purposes of this   Section 9 ( b ), the following terms shall have the following meanings: \n (i)    The term " Indemnitee-Related Entities" means any company, corporation, limited liability company, partnership, joint venture, trust, employee benefit plan or other enterprise (other than the Company, any Controlled Entity or the insurer under and pursuant to an insurance policy of the Company or any Controlled Entity) from whom an Indemnitee may be entitled to indemnification or advancement of Expenses with respect to which, in whole or in part, the Company or any Controlled Entity may also have an indemnification or advancement obligation. \n (ii)    The term " Jointly Indemnifiable Claims" shall be broadly construed and shall include, without limitation, any Action, Suit or Proceeding for which the Indemnitee shall be entitled to indemnification or advancement of Expenses from both (i) the Company and/or any Controlled Entity pursuant to the Indemnification Sources, on the one hand, and (ii) any Indemnitee-Related Entity pursuant to any other agreement between any Indemnitee-Related Entity and the Indemnitee pursuant to which the Indemnitee is indemnified, the laws of the jurisdiction of incorporation or organization of any Indemnitee-Related Entity and/or the certificate of incorporation, certificate of organization, bylaws, partnership agreement, operating agreement, certificate of formation, certificate of limited partnership or other organizational or governing documents of any Indemnitee-Related Entity, on the other hand. \n Section 10.     Form and Delivery of Communications. 
All notices, requests, demands and other communications under this Agreement shall be in writing and shall be deemed to have been duly given if (a) delivered by hand, upon receipt by the party to whom said notice or other communication shall have been directed, (b)mailed by certified or registered mail with postage prepaid, on the third business day after the date on which it is so mailed, (c) mailed by reputable overnight courier, one day after deposit with such courier and with written verification of receipt or (d) sent by email or facsimile transmission, with receipt of oral confirmation that such transmission has been received. Addresses for notice to either party are shown on the signature page of this  Agreement, or as subsequently modified by written notice. \n \n \n 8 \n \n \n \n Section 11.     Nonexclusivity. The provisions for indemnification and advancement of Expenses set forth in this Agreement shall not be deemed exclusive of any other rights which Indemnitee may have under any provision of law, in any court in which a proceeding is brought, other agreements or otherwise, and Indemnitee\'s rights hereunder shall inure to the benefit of the heirs, executors and administrators of Indemnitee. No amendment or alteration of the Charter or the Company\'s bylaws or any other agreement shall adversely affect the rights provided to Indemnitee  under this Agreement. \n Section 12.     No Construction as Employment Agreement; Duration of Agreement. Nothing contained herein shall be construed as giving Indemnitee any right to be retained as a director and/or officer of the Company or in other Indemnifiable Positions of the Controlled Entities or in the employ of the Company or any of the Controlled Entities. 
For the avoidance of doubt, the indemnification and advancement of Expenses provided under this Agreement shall continue as to the Indemnitee even though he may have ceased to be a director and/or officer of the Company and/or in other Indemnifiable Positions of the Control led Entities. \n Section 13.     Interpretation of Agreement. It is understood that the parties hereto intend this Agreement to be interpreted and enforced so as to provide indemnification to Indemnitee to the fullest extent now or hereafter permitted by the DGCL notwithstanding that such indemnification may not be specifically authorized by the Charter or the Company\'s bylaws, or by statute as of the date hereof. In the event of any change after the date of this Agreement in any applicable law, statute or rule which expands the right of a Delaware corporation to indemnify a member of its board of directors or an officer, employee, consultant, fiduciary or agent, it is the intent of the parties hereto that Indemnitee shall enjoy by this Agreement the greater benefits afforded by such change. In the event of any change in any applicable law, statute or rule which narrows the right of a Delaware corporation to indemnify a member of its board of directors or an officer, employee, consultant, fiduciary or agent, such change, to the extent not otherwise required by such law, statute or rule to be applied to this Agreement, shal l have no effect on this Agreement or the parties\' rights and obligations hereunder. \n Section 14.     Entire Agreement. Without limiting any of the rights of Indemnitee under the Charter and/or the Company\'s bylaws, this Agreement and the documents expressly referred to herein constitute the entire agreement between the parties hereto with respect to the matters covered hereby, and any other prior or contemporaneous oral or written understandings or agreements with respect  to the matters covered hereby are expressly superseded by this Agreement. \n Section 15.     
Modification and Waiver. No supplement, modification, waiver or amendment of this Agreement shall be binding unless executed in writing by both of the parties hereto. No waiver of any of the provisions of this Agreement shall be deemed or shall constitute a waiver of any other provision hereof (whether or not similar) nor shall such waiver constitute a continuing waiver. For the avoidance of doubt, this Agreement may not be terminated by the Company without Indemnitee\'s prior written consent. \n Section 16.     Successor and Assigns. All of the terms and provisions of this Agreement shall be binding upon, shall inure to the benefit of and shall be enforceable by the parties hereto and their respective successors, assigns, spouses, heirs, executors, administrators and legal representatives. The Company shall require and cause any direct or indirect successor (whether by purchase, merger , \n \n \n 9 \n \n \n \nconsolidation or otherwise) to all or substantially all of the business or assets of the Company, by written agreement in form and substance reasonably satisfactory to Indemnitee, expressly to assume and agree to perform this Agreement in the same manner and to the same extent that the Company would be required to perform if no such succession had taken plac e. \n Section 17.     Service of Process and Venue .  
The Company and Indemnitee hereby irrevocably and unconditionally (i) agree that any action or proceeding arising out of or in connection with this Agreement shall be brought only  in the Chancery Court of the State of Delaware (the " Delaware Court"), and not in any other state or federal court in the United States of America or any court in any other country, (ii) consent to submit to the exclusive jurisdiction of the Delaware Court for purposes of any action or proceeding arising out of or in connection with this Agreement, (iii) waive any objection to the laying of venue of any such action or proceeding in the Delaware Court, and (iv) waive, and agree not to plead or to make, any claim that any such action or proceeding brought in the Delaware Court has been brought in an im proper or inconvenient forum. \n Section 18.     Governing Law .  This Agreement shall be governed by and construed in accordance with the laws of the State of Delaware. If a court of competent jurisdiction shall make a final determination that the provisions of the law of any state other than Delaware govern indemnification by the Company of Indemnitee, then the indemnification provided under this Agreement shall in all instances be enforceable to the fullest extent permitted under such law, notwithstanding any provision of this Agreement to the contrary. \n Section 19.     Injunctive Relief. The parties hereto agree that each party hereto may enforce this Agreement by seeking specific performance hereof, without any necessity of showing irreparable harm or posting a bond, which requirements are hereby waived, and that by seeking specific performance, Indemnitee shall not be precluded from seeking or obtaining any other relief to which he or sh e may be entitled. \n Section 20.     Partial Indemnification. 
If Indemnitee is entitled under any provision of this Agreement to indemnification by the Company for some or a portion of loss and liability suffered and Expenses (including attorneys\' fees), Judgments, Fines and Amounts Paid in Settlement actually and reasonably incurred by or on behalf of Indemnitee in connection with an Action, Suit or Proceeding, including any appeals, but not, however, for the total amount thereof, the Company shall nevertheless indemnify Indemnitee for the portion of such amounts otherwise payable hereunder. \n Section 21.     Mutual Acknowledgement. Both the Company and Indemnitee acknowledge that in certain instances, federal law or applicable public policy may prohibit the Company from indemnifying its directors, officers, employees, consultants, fiduciaries or agents under this Agreement or otherwise. Indemnitee understands and acknowledges that the Company may be required to submit the question of indemnification to a court in certain circumstances for a determination of the Company\'s right, under public policy, to indemnify Indemnitee. \n Section 22.     Counterparts. This Agreement may be executed in two or more counterparts, each of which shall be deemed to be an original and all of which together shall be deemed to be one and \n \n \n 10 \n \n \n \nthe same instrument, notwithstanding that both parties are not signatories to the original or same counterpart. \n Section 23.     Headings. The section and subsection headings contained in this Agreement are for reference purposes only and shall not affect in any way the meaning or interpretation of this Agreement. \nThis Indemnification Agreement has been duly executed and delivered to be effective as of the date stated above. 
\n DELL TECHNOLOGIES \n \n \n \n By:                          \n Name: \n Title: \n \n Address: One Dell Way, Round Rock, Texas 78682 \n \n \n INDEMNITEE: \n \n \n                          \n   \n \n Address: \n \n \n \n \n \n \n \n \n \n \n [Indemnification Agreement] \n \n \n 11 \n \n'

In [8]:
from sklearn.feature_extraction.text import TfidfVectorizer
vect = TfidfVectorizer()
X_train_tf = vect.fit_transform(train_data)
print(X_train_tf.shape)
vect.vocabulary_


(1417, 45528)
Out[8]:
{'htm': 24744,
 '69770': 8327,
 'material': 29061,
 'contracts': 16490,
 '10': 2145,
 '2017': 4286,
 '03': 1694,
 '31t08': 5547,
 '00': 0,
 '45': 6777,
 '420258': 6604,
 '05': 1812,
 'filing': 22230,
 '0001571996': 881,
 '17': 3549,
 '000004': 35,
 '31': 5464,
 'annual': 11650,
 'quarterly': 34864,
 'reports': 36179,
 'dell': 17850,
 'technologies': 40524,
 'inc': 25425,
 '3571': 5974,
 'electronic': 19839,
 'computers': 16014,
 'texas': 40758,
 'delaware': 17794,
 'denali': 17905,
 'holding': 24570,
 'false': 21891,
 'non': 30979,
 'accelerated': 10278,
 'filer': 22223,
 '001': 1072,
 '37867': 6218,
 'exhibit': 21409,
 '38': 6241,
 'exhibit1038_020317': 21437,
 'ex': 21010,
 'indemnification': 25587,
 'agreement': 10974,
 'this': 40913,
 'is': 26485,
 'dated': 17456,
 'as': 12240,
 'of': 31674,
 '______': 9905,
 'and': 11516,
 'between': 13431,
 'corporation': 16696,
 'the': 40785,
 'company': 15860,
 '__________': 9909,
 'indemnitee': 25598,
 'whereas': 43939,
 'director': 18421,
 'or': 32133,
 'officer': 31714,
 'may': 29135,
 'also': 11299,
 'serve': 38090,
 'executive': 21329,
 'employee': 20039,
 'consultant': 16371,
 'fiduciary': 22193,
 'agent': 10930,
 'collectively': 15680,
 'indemnifiable': 25581,
 'positions': 33751,
 'other': 32364,
 'corporations': 16700,
 'limited': 28174,
 'liability': 28036,
 'companies': 15858,
 'partnerships': 32841,
 'joint': 26919,
 'ventures': 43051,
 'trusts': 41682,
 'benefit': 13293,
 'plans': 33500,
 'enterprises': 20323,
 'controlled': 16535,
 'by': 14233,
 'th': 40769,
 'entities': 20351,
 'in': 25383,
 'order': 32165,
 'to': 41174,
 'induce': 25684,
 'continue': 16467,
 'wishes': 44121,
 'provide': 34526,
 'for': 22539,
 'advancement': 10739,
 'expenses': 21581,
 'defined': 17758,
 'herein': 24327,
 'maximum': 29126,
 'extent': 21677,
 'permitted': 33196,
 'law': 27686,
 'certificate': 14878,
 'incorporation': 25519,
 'charter': 15042,
 'provides': 34532,
 'directors': 18424,
 'officers': 31716,
 'fullest': 22922,
 'under': 42164,
 'general': 23266,
 'dgcl': 18242,
 'desire': 18098,
 'enter': 20315,
 'into': 26232,
 'set': 38117,
 'forth': 22666,
 'their': 40804,
 'regarding': 35712,
 'clarify': 15382,
 'priority': 34184,
 'with': 44125,
 'respect': 36391,
 'certain': 14866,
 'jointly': 26920,
 'claims': 15370,
 'now': 31241,
 'therefore': 40842,
 'consideration': 16279,
 'service': 38098,
 'continued': 16468,
 'covenants': 16841,
 'agreements': 10976,
 'below': 13263,
 'good': 23612,
 'valuable': 42925,
 'receipt': 35342,
 'adequacy': 10599,
 'which': 43960,
 'are': 12072,
 'hereby': 24318,
 'acknowledged': 10428,
 'parties': 32823,
 'hereto': 24341,
 'intending': 26088,
 'be': 13110,
 'legally': 27866,
 'bound': 13870,
 'agree': 10965,
 'follows': 22517,
 'section': 37886,
 'shall': 38227,
 'indemnify': 25593,
 'if': 25051,
 'was': 43708,
 'made': 28738,
 'threatened': 40952,
 'party': 32844,
 'otherwise': 32375,
 'involved': 26351,
 'witness': 44161,
 'any': 11780,
 'pending': 33066,
 'completed': 15939,
 'action': 10507,
 'suit': 39938,
 'proceeding': 34236,
 'brought': 14061,
 'right': 36790,
 'whether': 43956,
 'civil': 15337,
 'criminal': 16966,
 'administrative': 10677,
 'investigative': 26314,
 'form': 22621,
 'al': 11111,
 'informal': 25756,
 'including': 25473,
 'appeals': 11844,
 'provided': 34527,
 'from': 22836,
 'against': 10908,
 'all': 11214,
 'loss': 28515,
 'suffered': 39911,
 'attorneys': 12534,
 'fees': 22036,
 'judgments': 26996,
 'fines': 22300,
 'amounts': 11450,
 'paid': 32653,
 'settlement': 38137,
 'actually': 10533,
 'reasonably': 35275,
 'incurred': 25555,
 'on': 31910,
 'behalf': 13208,
 'connection': 16218,
 'such': 39895,
 'payment': 32956,
 'appearing': 11851,
 'at': 12420,
 'participating': 32811,
 'defending': 17732,
 'procee': 34231,
 'ding': 18390,
 'an': 11472,
 'enforcement': 20205,
 'contemplated': 16428,
 'advance': 10735,
 'final': 22249,
 'disposition': 18655,
 'within': 44148,
 '15': 3235,
 'days': 17492,
 'after': 10895,
 'statement': 39326,
 'statements': 39327,
 'requesting': 36263,
 'advances': 10741,
 'time': 41060,
 'undertakes': 42211,
 'repay': 36133,
 'advanced': 10736,
 'without': 44153,
 'interest': 26128,
 'that': 40781,
 'it': 26597,
 'ultimately': 42003,
 'determined': 18161,
 'not': 31150,
 'entitled': 20354,
 'indemnified': 25591,
 'su': 39669,
 'ch': 14941,
 'no': 30932,
 'undertaking': 42212,
 'required': 36268,
 'than': 40774,
 'execution': 21326,
 'subject': 39707,
 'apply': 11888,
 'claim': 15364,
 'indemnity': 25603,
 'excluded': 21296,
 'pursuant': 34717,
 'procedure': 34229,
 'notification': 31192,
 'defense': 17734,
 'promptly': 34365,
 'actual': 10531,
 'notice': 31184,
 'commencement': 15756,
 'thereof': 40849,
 'could': 16776,
 'hereunder': 24347,
 'notify': 31198,
 'writing': 44307,
 'failure': 21862,
 'request': 36260,
 'will': 44036,
 'relieve': 35931,
 'have': 24147,
 'except': 21256,
 'materially': 29063,
 'prejudiced': 33983,
 'through': 40965,
 'forfeiture': 22603,
 'substantive': 39841,
 'rights': 36794,
 'defenses': 17735,
 'its': 26654,
 'result': 36504,
 'so': 38785,
 'notified': 31195,
 'last': 27630,
 'two': 41813,
 'sentences': 38028,
 'paragraph': 32725,
 'prior': 34177,
 'determination': 18156,
 'grant': 23684,
 'assume': 12388,
 'counsel': 16782,
 'acceptable': 10288,
 'acceptance': 10290,
 'unreasonably': 42430,
 'withheld': 44139,
 'delayed': 17796,
 'upon': 42584,
 'delivery': 17849,
 'written': 44313,
 'election': 19819,
 'do': 18875,
 'approval': 11954,
 'retention': 36549,
 'liable': 28039,
 'subsequently': 39802,
 'separate': 38043,
 'engaged': 20217,
 'same': 37509,
 'unless': 42373,
 'does': 18915,
 'retain': 36529,
 'defend': 17726,
 'notwithstanding': 31216,
 'foregoing': 22577,
 'based': 13032,
 'advice': 10767,
 'his': 24492,
 'her': 24302,
 'concluded': 16058,
 'conduct': 16121,
 'there': 40833,
 'likely': 28154,
 'conflict': 16175,
 'position': 33748,
 'significant': 38494,
 'issue': 26570,
 'then': 40812,
 'consent': 16249,
 'addition': 10576,
 'obtain': 31580,
 'each': 19311,
 'case': 14592,
 'submit': 39747,
 'therefor': 40841,
 'documentation': 18899,
 'information': 25762,
 'available': 12749,
 'necessary': 30606,
 'enable': 20072,
 'determine': 18160,
 'what': 43922,
 'cooperate': 16622,
 'give': 23479,
 'additi': 10574,
 'onal': 31915,
 'require': 36267,
 'event': 20954,
 '30': 5372,
 'following': 22515,
 'accordance': 10340,
 'review': 36633,
 'period': 33160,
 'been': 13166,
 'ii': 25135,
 'denies': 17916,
 'during': 19236,
 'deemed': 17684,
 'denied': 17915,
 'pendency': 33062,
 'prejudice': 33982,
 'subsequent': 39801,
 'requisite': 36278,
 'entitlement': 20356,
 'nonetheless': 31038,
 'absent': 10220,
 'misstatement': 29761,
 'fact': 21835,
 'omission': 31866,
 'make': 28802,
 'misleading': 29743,
 'prohibition': 34324,
 'applicable': 11876,
 'determines': 18163,
 'payments': 32958,
 'whole': 43986,
 'part': 32787,
 'fails': 21858,
 'respond': 36405,
 'iii': 25136,
 'iv': 26671,
 'timely': 41071,
 'person': 33219,
 'takes': 40349,
 'threatens': 40954,
 'take': 40343,
 'declare': 17624,
 'void': 43428,
 'unenforceable': 42263,
 'institutes': 25995,
 'litigation': 28284,
 'designed': 18087,
 'deny': 17945,
 'recover': 35500,
 'benefits': 13298,
 'intended': 26087,
 'adjudication': 10637,
 'court': 16828,
 'competent': 15910,
 'jurisdiction': 27036,
 'already': 11297,
 'successfully': 39887,
 'establishing': 20805,
 'successful': 39886,
 'unsuccessful': 42479,
 'partial': 32794,
 '20': 4087,
 'presumed': 34089,
 'submission': 39745,
 'burden': 14164,
 'proof': 34382,
 'overcoming': 32503,
 'presumption': 34092,
 'used': 42694,
 'basis': 13044,
 'overcomes': 32502,
 'clear': 15434,
 'convincing': 16603,
 'evidence': 20983,
 'neither': 30671,
 'proper': 34394,
 'circumstances': 15302,
 'because': 13139,
 'has': 24120,
 'met': 29467,
 'standard': 39291,
 'nor': 31104,
 'create': 16909,
 'insurance': 26047,
 'subrogation': 39778,
 'maintains': 28786,
 'policy': 33653,
 'policies': 33651,
 'providing': 34535,
 'covered': 16846,
 'terms': 40710,
 'coverage': 16843,
 'receives': 35356,
 'effect': 19606,
 'would': 44276,
 'expected': 21552,
 'cover': 16842,
 'prompt': 34362,
 'insurers': 26054,
 'procedures': 34230,
 'thereafter': 40837,
 'desirable': 18097,
 'cause': 14656,
 'pay': 32934,
 'payable': 32938,
 'subrogated': 39776,
 'recovery': 35506,
 'execute': 21320,
 'papers': 32712,
 'secure': 37901,
 'documents': 18904,
 'effectively': 19613,
 'bring': 14004,
 'enforce': 20195,
 'reimburse': 35817,
 'but': 14211,
 'erisa': 20602,
 'excise': 21285,
 'taxes': 40466,
 'penalties': 33056,
 'received': 35350,
 'olicy': 31817,
 'contract': 16482,
 'definitions': 17769,
 'purposes': 34711,
 'term': 40684,
 'broadly': 14031,
 'construed': 16363,
 'include': 25465,
 'limitation': 28171,
 'investigation': 26311,
 'preparation': 34012,
 'prosecution': 34463,
 'arbitration': 12036,
 'appeal': 11840,
 'giving': 23484,
 'testimony': 40750,
 'inquiry': 25913,
 'alternative': 11321,
 'dispute': 18667,
 'mechanism': 29274,
 'intentional': 26096,
 'unintentional': 42332,
 'tort': 41298,
 'reason': 35268,
 'alleged': 11220,
 'taken': 40346,
 'omitted': 31871,
 'capacity': 14467,
 'breach': 13947,
 'duty': 19248,
 'owed': 32568,
 'meet': 29313,
 'care': 14514,
 'entity': 20361,
 'out': 32427,
 'pocket': 33627,
 'costs': 16764,
 'type': 41824,
 'nature': 30455,
 'whatsoever': 43925,
 'related': 35878,
 'disbursements': 18483,
 'either': 19778,
 'enforcing': 20212,
 'mean': 29247,
 'direct': 18406,
 'indirect': 25652,
 'owing': 32577,
 'awards': 12803,
 'well': 43846,
 'assessed': 12331,
 'plan': 33489,
 'provision': 34545,
 'contrary': 16504,
 'obligated': 31535,
 'initiated': 25841,
 'however': 24701,
 'denominated': 17925,
 'voluntarily': 43459,
 'way': 43747,
 'counterclaim': 16789,
 'cross': 16986,
 'establish': 20800,
 'governed': 23637,
 'provisions': 34549,
 'authorized': 12704,
 'consented': 16251,
 'board': 13734,
 'being': 13224,
 'understood': 42205,
 'agreed': 10967,
 'authorization': 12701,
 'compulsory': 16000,
 'response': 36418,
 'instituted': 25994,
 'interpret': 26190,
 '16': 3402,
 'matters': 29088,
 'account': 10348,
 'judgment': 26995,
 'rendered': 36044,
 'disgorgement': 18579,
 'profits': 34300,
 'purchase': 34683,
 'sale': 37482,
 'securities': 37910,
 'exchange': 21276,
 'act': 10493,
 '1934': 3832,
 'amended': 11387,
 'excluding': 21299,
 'merger': 29432,
 'consolidation': 16312,
 'reorganization': 36110,
 'transaction': 41439,
 'undertaken': 42210,
 'fraud': 22772,
 'willful': 44041,
 'misconduct': 29734,
 'where': 43937,
 'knowingly': 27318,
 'fraudulent': 22776,
 'constitute': 16338,
 'interlocutory': 26153,
 'body': 13751,
 'further': 22985,
 'option': 32113,
 'must': 30232,
 'filed': 22221,
 'expired': 21602,
 'avoidance': 12786,
 'doubt': 19007,
 'acknowledge': 10427,
 'actions': 10510,
 'omissions': 31867,
 'sponsor': 39098,
 'stockholders': 39505,
 'october': 31647,
 '29': 5198,
 '2013': 4138,
 'among': 11437,
 'investors': 26328,
 'signatory': 38477,
 'thereto': 40856,
 'considered': 16282,
 'prohibited': 34321,
 'circumstance': 15301,
 'unauthorized': 42082,
 'withhold': 44142,
 'delay': 17795,
 'proposed': 34427,
 'settle': 38132,
 'manner': 28891,
 'impose': 25330,
 'penalty': 33057,
 'only': 31962,
 'imposed': 25331,
 'monetary': 29934,
 'full': 22920,
 'limitations': 28172,
 'admission': 10689,
 'wrongdoing': 44318,
 'compromise': 15992,
 'adversely': 10755,
 'affect': 10818,
 'savings': 37628,
 'clause': 15410,
 'portion': 33728,
 'invalidated': 26272,
 'ground': 23787,
 'neverthele': 30738,
 'ss': 39169,
 'formal': 22624,
 'contribution': 16525,
 'just': 27041,
 'equitable': 20495,
 'held': 24259,
 'unavailable': 42085,
 'contribute': 16520,
 'amount': 11447,
 'limiting': 28177,
 'generality': 23270,
 'due': 19179,
 'fo': 22487,
 'rth': 37233,
 'hereof': 24337,
 'given': 23480,
 'acknowledges': 10431,
 'agrees': 10978,
 'fully': 22923,
 'primarily': 34144,
 'responsible': 36427,
 'laws': 27703,
 'organization': 32211,
 'vi': 43268,
 'bylaws': 14240,
 'partnership': 32840,
 'operating': 32059,
 'formation': 22635,
 'organizational': 32213,
 'governing': 23639,
 'vii': 43316,
 'sources': 38931,
 'irrespective': 26460,
 'reduce': 35587,
 'alter': 11306,
 'obligations': 31543,
 'making': 28809,
 'demand': 17858,
 'previously': 34122,
 'reimbursed': 35818,
 'outstanding': 32475,
 'balance': 12908,
 'things': 40897,
 'rela': 35873,
 'ted': 40528,
 'third': 40901,
 'beneficiaries': 13291,
 'though': 40939,
 'were': 43869,
 'perform': 33137,
 'meanings': 29253,
 'means': 29254,
 'venture': 43048,
 'trust': 41675,
 'enterprise': 20322,
 'insurer': 26053,
 'whom': 43996,
 'obligation': 31540,
 'both': 13855,
 'one': 31936,
 'hand': 24012,
 'communications': 15842,
 'notices': 31188,
 'requests': 36264,
 'demands': 17863,
 'duly': 19194,
 'delivered': 17845,
 'said': 37461,
 'communication': 15841,
 'directed': 18407,
 'mailed': 28767,
 'certified': 14887,
 'registered': 35746,
 'mail': 28766,
 'postage': 33774,
 'prepaid': 34010,
 'business': 14199,
 'day': 17487,
 'date': 17453,
 'reputable': 36247,
 'overnight': 32523,
 'courier': 16822,
 'deposit': 17984,
 'verification': 43127,
 'sent': 38019,
 'email': 19934,
 'facsimile': 21833,
 'transmission': 41504,
 'oral': 32136,
 'confirmation': 16165,
 'addresses': 10591,
 'shown': 38400,
 'signature': 38480,
 'page': 32647,
 'modified': 29895,
 '11': 2447,
 'nonexclusivity': 31040,
 'exclusive': 21303,
 'inure': 26263,
 'heirs': 24254,
 'executors': 21338,
 'administrators': 10681,
 'amendment': 11399,
 'alteration': 11308,
 '12': 2664,
 'construction': 16356,
 'employment': 20049,
 'duration': 19224,
 'nothing': 31180,
 'contained': 16407,
 'retained': 36534,
 'employ': 20036,
 'even': 20952,
 'he': 24185,
 'ceased': 14778,
 'control': 16532,
 'led': 27838,
 '13': 2907,
 'interpretation': 26191,
 'intend': 26085,
 'interpreted': 26194,
 'enforced': 20202,
 'hereafter': 24314,
 'specifically': 39014,
 'statute': 39356,
 'change': 14988,
 'rule': 37301,
 'expands': 21537,
 'member': 29352,
 'intent': 26093,
 'enjoy': 20254,
 'greater': 23721,
 'afforded': 10867,
 'narrows': 30417,
 'applied': 11886,
 'shal': 38224,
 '14': 3062,
 'entire': 20344,
 'expressly': 21656,
 'referred': 35641,
 'contemporaneous': 16434,
 'understandings': 42200,
 'superseded': 40022,
 'modification': 29892,
 'waiver': 43610,
 'supplement': 40042,
 'binding': 13553,
 'executed': 21321,
 'similar': 38529,
 'continuing': 16473,
 'terminated': 40694,
 'successor': 39892,
 'assigns': 12361,
 'enforceable': 20201,
 'respective': 36395,
 'successors': 39893,
 'spouses': 39115,
 'legal': 27858,
 'representatives': 36203,
 'substantially': 39836,
 'assets': 12339,
 'substance': 39829,
 'satisfactory': 37600,
 'succession': 39888,
 'had': 23960,
 'plac': 33469,
 'process': 34240,
 'venue': 43053,
 'irrevocably': 26468,
 'unconditionally': 42133,
 'arising': 12106,
 'chancery': 14983,
 'state': 39320,
 'federal': 22022,
 'united': 42350,
 'states': 39330,
 'america': 11411,
 'country': 16813,
 'waive': 43608,
 'objection': 31517,
 'laying': 27714,
 'plead': 33537,
 'im': 25194,
 'inconvenient': 25506,
 'forum': 22680,
 '18': 3655,
 'govern': 23634,
 'instances': 25983,
 '19': 3782,
 'injunctive': 25860,
 'relief': 35929,
 'seeking': 37931,
 'specific': 39011,
 'performance': 33141,
 'necessity': 30611,
 'showing': 38399,
 'irreparable': 26457,
 'harm': 24093,
 'posting': 33781,
 'bond': 13785,
 'requirements': 36273,
 'waived': 43609,
 'precluded': 33920,
 'obtaining': 31586,
 'sh': 38205,
 'some': 38878,
 'total': 41307,
 'nevertheless': 30739,
 '21': 4435,
 'mutual': 30248,
 'acknowledgement': 10429,
 'public': 34638,
 'prohibit': 34319,
 'indemnifying': 25595,
 'employees': 20040,
 'consultants': 16372,
 'fiduciaries': 22192,
 'agents': 10932,
 'understands': 42201,
 'question': 34888,
 '22': 4550,
 'counterparts': 16798,
 'more': 29989,
 'original': 32230,
 'together': 41202,
 'instrument': 26016,
 'signatories': 38476,
 'counterpart': 16796,
 '23': 4645,
 'headings': 24190,
 'subsection': 39796,
 'reference': 35631,
 'meaning': 29250,
 'effective': 19612,
 'stated': 39321,
 'above': 10195,
 'name': 30368,
 'title': 41120,
 'address': 10586,
 'round': 37092,
 'rock': 36956,
 '78682': 8796,
 '81335': 8976,
 'articles': 12205,
 '2016': 4194,
 '07': 1921,
 '21t08': 4543,
 '47': 6871,
 '53': 7344,
 '961255': 9724,
 '0000950123': 346,
 '004783': 1293,
 '2007': 4109,
 '0001344705': 620,
 'crystal': 17014,
 'river': 36847,
 'capital': 14476,
 '6798': 8248,
 'real': 35232,
 'estate': 20811,
 'investment': 26324,
 'new': 30741,
 'york': 44550,
 'maryland': 29021,
 'true': 41667,
 'small': 38716,
 'reporting': 36178,
 '32958': 5650,
 'amendments': 11400,
 'restatement': 36442,
 'y30866exv3w1': 44476,
 'first': 22338,
 'desires': 18100,
 'amend': 11383,
 'restate': 36438,
 'currently': 17145,
 'hereinafter': 24333,
 'amende': 11386,
 'second': 37858,
 'article': 12197,
 'called': 14374,
 'purpose': 34708,
 'formed': 22639,
 'engage': 20216,
 'lawful': 27693,
 'activity': 10523,
 'engaging': 20221,
 'reit': 35861,
 '856': 9165,
 'internal': 26166,
 'revenue': 36616,
 'code': 15578,
 '1986': 3918,
 'organized': 32216,
 'force': 22550,
 'enumerated': 20396,
 'objects': 31525,
 'restricted': 36476,
 'inference': 25726,
 'regarded': 35709,
 'independent': 25615,
 'they': 40885,
 'powers': 33825,
 'resident': 36336,
 'principal': 34158,
 'office': 31712,
 'incorporated': 25514,
 '300': 5373,
 'east': 19367,
 'lombard': 28458,
 'street': 39588,
 'baltimore': 12933,
 '21202': 4462,
 'defining': 17763,
 'regulating': 35788,
 'number': 31393,
 'affairs': 10815,
 'managed': 28841,
 'direction': 18412,
 'four': 22701,
 'increased': 25534,
 'decreased': 17650,
 'never': 30737,
 'less': 27953,
 'minimum': 29674,
 'mgcl': 29524,
 'increase': 25533,
 'fill': 22234,
 'vacancy': 42883,
 'resulting': 36510,
 'beginning': 13194,
 'initial': 25832,
 'elected': 19814,
 'solely': 38843,
 'holders': 24567,
 'classes': 15395,
 'series': 38079,
 'preferred': 33971,
 'stock': 39494,
 'classified': 15402,
 'severally': 38164,
 'hold': 24557,
 'three': 40956,
 'class': 15392,
 'initially': 25837,
 'expiring': 21606,
 'me': 29238,
 'eting': 20872,
 '2006': 4105,
 'another': 11675,
 'meeting': 29316,
 '2008': 4112,
 'members': 29354,
 'until': 42519,
 'qualify': 34843,
 'whose': 43999,
 'expires': 21604,
 'year': 44499,
 'names': 30375,
 'who': 43983,
 'hall': 23985,
 'harald': 24061,
 'hansen': 24051,
 'leo': 27936,
 'walsh': 43644,
 'clifford': 15465,
 'lai': 27522,
 'rodman': 36967,
 'drake': 19065,
 'elects': 19843,
 'becomes': 13147,
 'eligible': 19877,
 '802': 8912,
 'setting': 38129,
 'vacancies': 42882,
 'filled': 22235,
 'affirmative': 10852,
 'vote': 43522,
 'majority': 28799,
 'remaining': 35964,
 'quorum': 34924,
 'remainder': 35960,
 'directorship': 18427,
 'occurred': 31619,
 'extraordinary': 21705,
 'relating': 35883,
 'removal': 36019,
 'permitting': 33197,
 'requiring': 36275,
 'approved': 11957,
 'shares': 38266,
 ...}
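With default settings, `TfidfVectorizer` does the following:

- It lowercases the text and keeps tokens of two or more word characters (token pattern `(?u)\b\w\w+\b`). This is why entries such as `31t08` and `exhibit1038_020317` appear above. It is also why fragments like `shal`, `plac` and `neverthele` show up: they come from words that are split in the source text.
- It assigns column indices in alphabetical order.
- It weights each raw count by a smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1.
- It L2-normalises each row.

At `transform` time, terms that were not seen during `fit` are ignored. That is what happens to the test documents in `vect.transform(test_data)`.

The pure-Python sketch below reproduces those defaults on a toy corpus. It is an illustration under those assumptions, not scikit-learn's implementation. It ignores options such as `sublinear_tf`, `max_df` and stop words.

```python
import math
import re
from collections import Counter

# Default TfidfVectorizer token pattern: runs of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase, then extract tokens the way the default analyzer does."""
    return TOKEN_RE.findall(text.lower())


def fit_tfidf(docs):
    """Return (vocabulary, idf) fitted on a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    terms = sorted({t for toks in tokenized for t in toks})
    vocabulary = {term: i for i, term in enumerate(terms)}  # alphabetical column order
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    # Smoothed IDF (smooth_idf=True): ln((1 + n) / (1 + df)) + 1
    idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in terms]
    return vocabulary, idf


def transform_tfidf(docs, vocabulary, idf):
    """Map each document to a dense, L2-normalised TF-IDF row."""
    rows = []
    for doc in docs:
        # Terms missing from the fitted vocabulary are dropped, as in vect.transform.
        counts = Counter(t for t in tokenize(doc) if t in vocabulary)
        row = [0.0] * len(vocabulary)
        for term, c in counts.items():
            j = vocabulary[term]
            row[j] = c * idf[j]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm else row)
    return rows


train = ["The Company shall indemnify the Indemnitee.",
         "The Board of Directors shall consist of four directors."]
vocab, idf = fit_tfidf(train)
X = transform_tfidf(train, vocab, idf)
X_unseen = transform_tfidf(["zebra"], vocab, idf)  # no known terms: an all-zero row
```

Terms that occur in every training document (`the`, `shall`) get the minimum weight of 1.0. Rarer terms get larger weights.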

In [9]:
X_train_tf.toarray()


Out[9]:
array([[ 0.00201421,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.00291319,  0.0204684 ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.01976081,  0.02497682,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       ..., 
       [ 0.01228805,  0.11511633,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.02259326,  0.01511836,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.00637793,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]])
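`toarray()` turns the sparse matrix returned by `fit_transform` into a dense NumPy array, which stores every zero explicitly. The arithmetic below estimates the cost for the 1417 x 45528 float64 matrix.

The CSR estimate rests on two assumptions. It assumes 8-byte values and 4-byte (int32) column indices and row pointers, which is typical for SciPy but not confirmed here. The notebook also does not report the real number of non-zeros, so `hypothetical_nnz` is an illustrative value only.

```python
def dense_bytes(rows, cols, itemsize=8):
    """Bytes needed to hold every cell of a dense float64 array."""
    return rows * cols * itemsize


def csr_bytes(nnz, rows, value_size=8, index_size=4):
    """Approximate CSR storage: values + column indices + row pointers."""
    return nnz * (value_size + index_size) + (rows + 1) * index_size


rows, cols = 1417, 45528  # shape printed for X_train_tf
dense = dense_bytes(rows, cols)
print(f"dense: {dense / 2**20:.1f} MiB")  # roughly 492 MiB

hypothetical_nnz = 500_000  # illustrative only; X_train_tf.nnz gives the real value
print(f"CSR:   {csr_bytes(hypothetical_nnz, rows) / 2**20:.1f} MiB")
```

For inspection, `X_train_tf[:5].toarray()` or `X_train_tf.nnz` avoids building the full dense array.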

In [10]:
X_test_tf = vect.transform(test_data)
print(X_test_tf.shape)


(355, 45528)

In [11]:
# from sklearn.feature_extraction.text import CountVectorizer
# count_vect = CountVectorizer(stop_words='english')
# X = count_vect.fit_transform(train_data)

In [12]:
# print(X.shape)
# count_vect.vocabulary_

In [13]:
# X.toarray()

In [14]:
# from sklearn.feature_extraction.text import TfidfTransformer
# tf = TfidfTransformer()
# X_train_tf = tf.fit_transform(X)
# X_train_tf.shape

In [15]:
# X_train_tf.toarray()

In [16]:
# docs_test = ["God is great", "Retina scan gives early diagnosis about diabetes"]
# X_test = count_vect.transform(docs_test)
# X_tf_test = tf.transform(X_test)

# X_test = count_vect.transform(test_data)
# X_test_tf = tf.transform(X_test)
# X_test_tf.shape

In [17]:
# from sklearn.naive_bayes import MultinomialNB
# clf = MultinomialNB()
# clf.fit(X_train_tf,all_of_it.target[:num])

In [18]:
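from sklearn import svm
# RBF-kernel SVM trained on the first `num` (shuffled) documents.
# decision_function_shape is ignored for binary problems such as 'AoI' vs 'MC'.
# The very large C penalises training errors heavily (close to a hard-margin SVM).
clf = svm.SVC(decision_function_shape="ovo", C=10000.0, kernel='rbf', gamma=0.6)
clf.fit(X_train_tf, all_of_it.target[:num])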


Out[18]:
SVC(C=10000.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovo', degree=3, gamma=0.6, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
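With `kernel='rbf'`, the SVM compares two documents through K(x, z) = exp(-gamma * ||x - z||^2). `TfidfVectorizer` L2-normalises each row by default, so ||x - z||^2 = 2 - 2 cos(x, z). The kernel therefore depends only on cosine similarity: K = exp(-2 * gamma * (1 - cos)).

TF-IDF weights are non-negative, so cos lies in [0, 1]. With `gamma = 0.6`, K therefore ranges from exp(-1.2) ≈ 0.301 for documents that share no terms up to 1 for identical rows. The pure-Python sketch below checks this identity on toy vectors; it is a worked illustration, not scikit-learn's kernel code.

```python
import math


def rbf(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) on plain Python lists."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def cosine(x, z):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in z)))


def l2_normalise(v):
    """Scale a vector to unit length, as TfidfVectorizer does per row."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


gamma = 0.6
x = l2_normalise([3.0, 1.0, 0.0, 2.0])  # toy non-negative "TF-IDF" rows
z = l2_normalise([1.0, 0.0, 4.0, 2.0])

direct = rbf(x, z, gamma)
via_cosine = math.exp(-2 * gamma * (1 - cosine(x, z)))
print(direct, via_cosine)  # equal up to floating-point rounding
```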

In [19]:
print(clf.score(X_test_tf, all_of_it.target[num:]))
# predict = clf.predict(X_test_tf)
# for text, category in zip(docs_test,predict):
#     print(text + "\tbelongs to:\t" + all_of_it.target_names[category])


0.997183098592
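`clf.score` returns the mean accuracy on the held-out 20%. With 355 test documents, 0.997183098592 corresponds to 354 correct predictions, i.e. a single error. With so few errors the estimate is coarse.

A Wilson score interval, a standard binomial confidence interval, gives a rough sense of the uncertainty. It is added here as an illustration and was not computed in the original notebook. The sketch below is pure Python, and `z = 1.96` assumes a 95% level.

```python
import math


def errors_from_accuracy(accuracy, n):
    """Recover the number of misclassified samples from an accuracy score."""
    return n - round(accuracy * n)


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion correct / n."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


n_test = 355
errors = errors_from_accuracy(0.997183098592, n_test)
low, high = wilson_interval(n_test - errors, n_test)
print(errors, f"{low:.4f}-{high:.4f}")
```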

In [20]:
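# all_of_it.target_names[int(clf.predict(tf.fit_transform(count_vect.transform(f.read()))))]
import os

HOME_DIR = r"D:\kaam\AdditionalParsedTest"  # raw string: backslashes are not treated as escapes
print("File:\tClassified as:")
for home, subdirs, files in os.walk(HOME_DIR):
    for file in files:
        # Join with `home` (the directory os.walk is visiting), not HOME_DIR, so subfolders work.
        # Read bytes, as load_files did for the training data; the vectorizer decodes them.
        with open(os.path.join(home, file), "rb") as f:
            # print(file + "\t" + all_of_it.target_names[int(clf.predict(tf.transform(count_vect.transform([f.read()]))))])
            # predict() returns a one-element array; [0] takes the class index.
            print(file + "\t" + all_of_it.target_names[clf.predict(vect.transform([f.read()]))[0]])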


File:	Classified as:
AOI1.txt	AoI
AOI2.txt	AoI
AOI3.txt	AoI
AOI4.txt	AoI
AOI5.txt	AoI
MC1.txt	MC
MC2.txt	MC
MC3.txt	MC
MC4.txt	MC
MC5.txt	MC
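The loop above vectorizes and predicts one file at a time. An alternative is to collect the files first and call `transform`/`predict` once on the whole batch.

The helper below is a hypothetical sketch. `classify_directory` and its `predict_batch` argument are names introduced here. In the notebook, `predict_batch` would be something like `lambda texts: clf.predict(vect.transform(texts))`. The helper reads files as bytes, mirroring `load_files`, which by default returns bytes that `TfidfVectorizer` decodes as UTF-8. The usage at the end swaps in an arbitrary keyword rule for the trained SVM so that it runs without the data.

```python
import os
import tempfile


def classify_directory(root, predict_batch, target_names):
    """Walk `root`, read every file as bytes and classify them in one batch.

    Returns a list of (relative_path, label) pairs sorted by path.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))  # join with the current dirpath
    paths.sort()

    contents = []
    for path in paths:
        with open(path, "rb") as f:
            contents.append(f.read())

    predictions = predict_batch(contents) if contents else []
    return [(os.path.relpath(p, root), target_names[int(k)])
            for p, k in zip(paths, predictions)]


def toy_predict(texts):
    """Arbitrary stand-in classifier: class 0 if b"articles" appears, else class 1."""
    return [0 if b"articles" in t.lower() else 1 for t in texts]


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel, body in [("AOI1.txt", b"Articles of incorporation"),
                      (os.path.join("sub", "MC1.txt"), b"Indemnification agreement")]:
        with open(os.path.join(tmp, rel), "wb") as f:
            f.write(body)
    results = classify_directory(tmp, toy_predict, ["AoI", "MC"])

for rel_path, label in results:
    print(rel_path + "\t" + label)
```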

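New implementation

Using different training/test ratios. Only the 40-60 split (40% training, 60% testing) is active below; the 50-50, 60-40 and 70-30 variants are commented out.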
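The split sizes come from `int(ratio * total)` applied to the 1772 shuffled documents. Because `int` truncates, the training side is rounded down.

The loop below is a compact sketch of how the `num4` to `num7` variables could be generated from a single list of ratios instead of separate commented-out cells. The ratios are the ones used in the notebook; the loop is a suggested refactor, not the original code. Each size pair would then feed `all_of_it.data[:n_train]` and `all_of_it.data[n_train:]`. scikit-learn's `sklearn.model_selection.train_test_split` offers a similar split, with an optional `stratify` argument.

```python
TOTAL = 1772  # len(all_of_it.data) in this notebook


def split_sizes(total, train_ratio):
    """Training/test sizes as computed in the notebook: int() truncates."""
    n_train = int(train_ratio * total)
    return n_train, total - n_train


for ratio in (0.4, 0.5, 0.6, 0.7, 0.8):
    n_train, n_test = split_sizes(TOTAL, ratio)
    print(f"{ratio:.0%} train: {n_train} training / {n_test} testing")
```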


In [21]:
vect4 = TfidfVectorizer()
# vect5 = TfidfVectorizer()
# vect6 = TfidfVectorizer()
# vect7 = TfidfVectorizer()


X_train_tf4 = vect4.fit_transform(train_data4)
# X_train_tf5 = vect5.fit_transform(train_data5)
# X_train_tf6 = vect6.fit_transform(train_data6)
# X_train_tf7 = vect7.fit_transform(train_data7)

X_test_tf4 = vect4.transform(test_data4)
# X_test_tf5 = vect5.transform(test_data5)
# X_test_tf6 = vect6.transform(test_data6)
# X_test_tf7 = vect7.transform(test_data7)

In [22]:
clf4 = svm.SVC(decision_function_shape="ovo", C = 10000.0, kernel='rbf', gamma = 0.6)
clf4.fit(X_train_tf4, all_of_it.target[:num4])

# clf5 = svm.SVC(decision_function_shape="ovo", C = 10000.0, kernel='rbf', gamma = 0.6)
# clf5.fit(X_train_tf5, all_of_it.target[:num5])

# clf6 = svm.SVC(decision_function_shape="ovo", C = 10000.0, kernel='rbf', gamma = 0.6)
# clf6.fit(X_train_tf6, all_of_it.target[:num6])

# clf7 = svm.SVC(decision_function_shape="ovo", C = 10000.0, kernel='rbf', gamma = 0.6)
# clf7.fit(X_train_tf7, all_of_it.target[:num7])


Out[22]:
SVC(C=10000.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovo', degree=3, gamma=0.6, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [23]:
print(clf4.score(X_test_tf4, all_of_it.target[num4:]))
# print(clf5.score(X_test_tf5, all_of_it.target[num5:]))
# print(clf6.score(X_test_tf6, all_of_it.target[num6:]))
# print(clf7.score(X_test_tf7, all_of_it.target[num7:]))


0.993421052632

In [ ]: