This is a sample solution for the CSV and data frames exercise. It is not the only way to solve the exercise: as with most programming tasks, and also most data analysis tasks, there are multiple valid solutions for the same problem.
Loading data from CSV
Your first step is always accessing the data. Often, the data is stored in a database or in files. One common and generic exchange format is the Comma Separated Value (CSV) file. The first line of such a file contains the names of the features; each following line contains a single instance.
First, download the bankruptcy data set we prepared for you and upload it to your Jupyter notebook. Please note that we slightly modified the data from the original available in the UCI archive for this exercise, e.g., to include missing values.
Use the cell below to load the data from the CSV file. The data should be loaded into a data frame. Data frames are available in Python through the pandas library. In comparison to matrices or similar types, they allow columns of different types, are usually easier to manipulate, e.g., by adding or removing rows/columns, and their rows and columns can be named.
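As a small illustration of these properties (the data here is made up and not part of the exercise), a data frame can hold columns of different types and supports named rows and columns:

```python
import pandas as pd

# a small data frame with columns of different types
df = pd.DataFrame(
    {'name': ['A', 'B'],         # strings
     'value': [1.5, 2.0]},       # floats
    index=['first', 'second'])   # named rows

# columns are accessed by name, rows e.g. via loc
print(df.loc['first', 'name'])
```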
Once you have done this, print some information about the data:
- number of instances
- number of features
- names of the features
You should have 55 instances with 7 features.
Remove features
If you load all data from a file, you often also load features that are irrelevant for a task. In the case of the data you just loaded, this feature is called Company. It is an ID feature for the instances in the data. Such data must often be removed before further analysis of the data. Your second task is to remove this feature from the data.
import pandas as pd
# load the data directly from the URL. This is supported since pandas 0.19.2
data = pd.read_csv(
'http://user.informatik.uni-goettingen.de/~sherbold/analcatdata_bankruptcy.csv')
# since jupyter notebooks show the return value of the last statement, this shows the data frame
# jupyter also provides a nice HTML rendering for the data
data
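The summary information requested above can be read off the data frame's attributes. A sketch (using a minimal stand-in frame here so the snippet is self-contained; in the exercise, `data` is the frame loaded above):

```python
import pandas as pd

# stand-in for the loaded bankruptcy data
data = pd.DataFrame({'WC/TA': [1.0, 2.0], 'RE/TA': [3.0, 4.0]})

print('Number of instances:', len(data))          # number of rows
print('Number of features:', len(data.columns))   # number of columns
print('Names of the features:', list(data.columns))
```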
# Now we drop the column 'Company'
# inplace means that the current data frame is modified
# without inplace, a copy is created that we would have to assign again
data.drop(labels='Company', axis='columns', inplace=True)
data
Remove instances with missing values
Real-life data is often not clean, i.e., the data has many problems which must be addressed before it can be used for analysis. One common problem is missing features, i.e., not all features are available for all instances. This is also the case for the data you just loaded. All missing values are marked as NA in the CSV file.
Your third task is to remove all instances from the data that have any missing values and store the remaining instances in a new data frame. If this works correctly, five instances should be removed. You can check this, e.g., by comparing the sizes of the data frames or by printing the instances that were removed.
print('Number of instances before dropping instances with missing values:', len(data))
data.dropna(inplace=True)
print('Number of instances after dropping instances with missing values:', len(data))
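The task also suggests printing the instances that were removed. One way to do that is a boolean mask with `isna`, sketched here on a toy frame (the real solution above uses `dropna(inplace=True)`, which discards those rows directly):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, None, 3.0], 'b': [4.0, 5.0, None]})

# rows that have at least one missing value
removed = df[df.isna().any(axis=1)]
# rows without any missing value (same result as df.dropna())
cleaned = df.dropna()

print(len(removed), len(cleaned))
```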
Computing with features
Your next task (2.4) is to create new features by computing with the existing columns.
# columns can be accessed by their name
# new columns can be added by assigning to a new name
data['WC/TA+RE/TA'] = data['WC/TA']+data['RE/TA']
# the above way to access the columns is shorthand for using loc (location)
# : selects all rows
data.loc[:, 'EBIT/TA*S/TA'] = data.loc[:, 'EBIT/TA']*data.loc[:, 'S/TA']
data
Merging data frames
The next task of this exercise is to merge data frames. For this, load the data from the same CSV file as above again. Then merge the data frame with the result from task 2.4, such that:
- the dropped feature from task 2.2 is part of the merged data frame; and
- the removed instances from task 2.3 are still gone; and
- the computed features from task 2.4 are part of the merged data frame.
data2 = pd.read_csv(
'http://user.informatik.uni-goettingen.de/~sherbold/analcatdata_bankruptcy.csv')
# merge combines two data frames with a join operation
# by default, an "inner join" on all shared columns is performed, i.e., only instances
# whose values match in all shared columns are joined
merged_data = data.merge(data2)
merged_data
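To see the join semantics in isolation, here is a toy example (the column names are invented for illustration): an inner join keeps only rows whose values in the shared columns match.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'x': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'y': ['B', 'C', 'D']})

# inner join on the shared column 'key';
# keys 1 and 4 have no partner and are dropped
merged = left.merge(right)
print(list(merged['key']))
```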
Selecting subsets
Based on the data frame from task 2.5, create new data frames according to the following criteria.
- A data frame with only the rows 10 to 20 and all columns.
- A data frame with only the columns 1 to 4 and all rows.
- A data frame with only the columns WC/TA and EBIT/TA and all rows.
- A data frame with all rows that have the value RE/TA less than -20 and all columns.
- A data frame with all rows that have the value RE/TA less than -20 and bankrupt equal to 0 and all columns.
- A data frame with all rows that have the value RE/TA less than -20 and bankrupt equal to 0 and only the columns WC/TA and EBIT/TA.
# integer based indexing fetches rows, in this case from 10 (inclusive) to 21 (exclusive)
merged_data[10:21]
# in general, we can use iloc for integer based locations
merged_data.iloc[:, 1:5]
# with lists of strings we get columns
merged_data[['WC/TA', 'EBIT/TA']]
# we can also use boolean formulas to filter the rows of data frames
merged_data[merged_data['RE/TA'] < -20]
# when we combine multiple conditions, we must use () because the bitwise operators
# bind more tightly than the comparisons
# we also need the bitwise operators & and | instead of the keywords 'and' and 'or',
# because these operate element-wise on the boolean masks
merged_data[(merged_data['RE/TA'] < -20) & (merged_data['Bankrupt'] == 0)]
# we can use the conditions also with loc and also select the columns we want
merged_data.loc[(merged_data['RE/TA'] < -20) &
(merged_data['Bankrupt'] == 0), ['WC/TA', 'EBIT/TA']]
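As a side note, `DataFrame.query` is an alternative way to write such filters; column names containing special characters like `/` must be quoted with backticks, which newer pandas versions support. A sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({'RE/TA': [-25.0, -10.0, -30.0], 'Bankrupt': [0, 1, 0]})

# same kind of filter as above, written as a query string
subset = df.query('`RE/TA` < -20 and Bankrupt == 0')
print(len(subset))
```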