导入pandas和numpy
import pandas as pd
import numpy as np
1.导入数据并查看属性
data = pd.read_csv("data/Online_Retail_Fake.csv") # 导入数据
data.shape # 查看shape属性
outputs:
(541910, 8)
data.size # 查看size属性
outputs:
4335280
data # 查看数据
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | NaN | 17850.0 | United Kingdom |
2 | 536365 | 84406B | NaN | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
data.info() # 查看元数据信息
output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541910 entries, 0 to 541909
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 InvoiceNo 541910 non-null object
1 StockCode 541909 non-null object
2 Description 540454 non-null object
3 Quantity 541910 non-null int64
4 InvoiceDate 541910 non-null object
5 UnitPrice 541907 non-null float64
6 CustomerID 406828 non-null float64
7 Country 541909 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 33.1+ MB
2.异常数据处理
2.1空值
data.isnull().sum() # 查看是否有空值
InvoiceNo 0
StockCode 1
Description 1456
Quantity 0
InvoiceDate 0
UnitPrice 3
CustomerID 135082
Country 1
dtype: int64
可以看到各个属性列的空值个数
2.2 重复数据
data.duplicated()
output:
0 False
1 False
2 False
3 False
4 False
...
541905 False
541906 False
541907 False
541908 False
541909 False
Length: 541910, dtype: bool
data.duplicated().sum()
outputs:
5268
可以看出有较多重复数据
data[data.duplicated()].head()
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
517 | 536409 | 21866 | UNION JACK FLAG LUGGAGE TAG | 1 | 2010/12/1 11:45 | 1.25 | 17908.0 | United Kingdom |
527 | 536409 | 22866 | HAND WARMER SCOTTY DOG DESIGN | 1 | 2010/12/1 11:45 | 2.10 | 17908.0 | United Kingdom |
537 | 536409 | 22900 | SET 2 TEA TOWELS I LOVE LONDON | 1 | 2010/12/1 11:45 | 2.95 | 17908.0 | United Kingdom |
539 | 536409 | 22111 | SCOTTIE DOG HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 4.95 | 17908.0 | United Kingdom |
555 | 536412 | 22327 | ROUND SNACK BOXES SET OF 4 SKULLS | 1 | 2010/12/1 11:49 | 2.95 | 17920.0 | United Kingdom |
可以看出InvoiceNo
为536409的信息较多,需要单独分析
查看InvoiceNo为536409的数据
data[data["InvoiceNo"] == "536409"]
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
483 | 536409 | 90199C | 5 STRAND GLASS NECKLACE CRYSTAL | 3 | 2010/12/1 11:45 | 6.35 | 17908.0 | United Kingdom |
484 | 536409 | 21479 | WHITE SKULL HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 3.75 | 17908.0 | United Kingdom |
485 | 536409 | 22111 | SCOTTIE DOG HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 4.95 | 17908.0 | United Kingdom |
486 | 536409 | 22785 | SQUARECUSHION COVER PINK UNION FLAG | 1 | 2010/12/1 11:45 | 6.75 | 17908.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | |
537 | 536409 | 22900 | SET 2 TEA TOWELS I LOVE LONDON | 1 | 2010/12/1 11:45 | 2.95 | 17908.0 | United Kingdom |
538 | 536409 | 21980 | PACK OF 12 RED RETROSPOT TISSUES | 12 | 2010/12/1 11:45 | 0.29 | 17908.0 | United Kingdom |
539 | 536409 | 22111 | SCOTTIE DOG HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 4.95 | 17908.0 | United Kingdom |
540 | 536409 | 22112 | CHOCOLATE HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 4.95 | 17908.0 | United Kingdom |
data[data.duplicated(keep=False)][data[data.duplicated(keep=False)]["InvoiceNo"] == "536409"]
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
485 | 536409 | 22111 | SCOTTIE DOG HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 4.95 | 17908.0 | United Kingdom |
489 | 536409 | 22866 | HAND WARMER SCOTTY DOG DESIGN | 1 | 2010/12/1 11:45 | 2.10 | 17908.0 | United Kingdom |
494 | 536409 | 21866 | UNION JACK FLAG LUGGAGE TAG | 1 | 2010/12/1 11:45 | 1.25 | 17908.0 | United Kingdom |
517 | 536409 | 21866 | UNION JACK FLAG LUGGAGE TAG | 1 | 2010/12/1 11:45 | 1.25 | 17908.0 | United Kingdom |
521 | 536409 | 22900 | SET 2 TEA TOWELS I LOVE LONDON | 1 | 2010/12/1 11:45 | 2.95 | 17908.0 | United Kingdom |
527 | 536409 | 22866 | HAND WARMER SCOTTY DOG DESIGN | 1 | 2010/12/1 11:45 | 2.10 | 17908.0 | United Kingdom |
537 | 536409 | 22900 | SET 2 TEA TOWELS I LOVE LONDON | 1 | 2010/12/1 11:45 | 2.95 | 17908.0 | United Kingdom |
539 | 536409 | 22111 | SCOTTIE DOG HOT WATER BOTTLE | 1 | 2010/12/1 11:45 | 4.95 | 17908.0 | United Kingdom |
可直接获取如上重复数据
2.3 异常处理
2.3.1 空值处理
data.dropna()
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
5 | 536365 | 22752 | SET 7 BABUSHKA NESTING BOXES | 2 | 2010/12/1 8:26 | 7.65 | 17850.0 | United Kingdom |
6 | 536365 | 21730 | GLASS STAR FROSTED T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 4.25 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
406825 rows × 8 columns
如上实现某一行有一个空值则去掉
data.dropna(how="all")
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | NaN | 17850.0 | United Kingdom |
2 | 536365 | 84406B | NaN | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
如上实现某一行全为空值才去掉
data.dropna(axis=1)
outputs:
InvoiceNo | Quantity | InvoiceDate | |
---|---|---|---|
0 | 536365 | 6 | 2010/12/1 8:26 |
1 | 536365 | 6 | 2010/12/1 8:26 |
2 | 536365 | 8 | 2010/12/1 8:26 |
3 | 536365 | 6 | 2010/12/1 8:26 |
4 | 536365 | 6 | 2010/12/1 8:26 |
... | ... | ... | ... |
541905 | 581587 | 6 | 2011/12/9 12:50 |
541906 | 581587 | 4 | 2011/12/9 12:50 |
541907 | 581587 | 4 | 2011/12/9 12:50 |
541908 | 581587 | 3 | 2011/12/9 12:50 |
541909 | 581587 | 3 | 2011/12/9 12:50 |
541910 rows × 3 columns
如上实现按列去除
data.dropna(axis=1, how="all")
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | NaN | 17850.0 | United Kingdom |
2 | 536365 | 84406B | NaN | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
如上实现按列去除全为空值的列
data.fillna(value=0)
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | 0.00 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | 0 | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
如上实现空值填充为0
data.fillna(method="ffill")
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | WHITE METAL LANTERN | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
如上实现空值填充为上面的数据
data.fillna(method="bfill")
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | KNITTED UNION FLAG HOT WATER BOTTLE | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
如上实现空值填充为下面的数据
2.3.2 重复值处理
data.drop_duplicates()
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | NaN | 17850.0 | United Kingdom |
2 | 536365 | 84406B | NaN | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
去除重复数据
data
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | NaN | 17850.0 | United Kingdom |
2 | 536365 | 84406B | NaN | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
data = data.fillna(method="bfill")
data
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | KNITTED UNION FLAG HOT WATER BOTTLE | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541909 | 581587 | 22138 | Wrong booking | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
541910 rows × 8 columns
data.drop_duplicates(["InvoiceNo", "UnitPrice"])
outputs:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
5 | 536365 | 22752 | SET 7 BABUSHKA NESTING BOXES | 2 | 2010/12/1 8:26 | 7.65 | 17850.0 | United Kingdom |
6 | 536365 | 21730 | GLASS STAR FROSTED T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 4.25 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541897 | 581587 | 22728 | ALARM CLOCK BAKELIKE PINK | 4 | 2011/12/9 12:50 | 3.75 | 12680.0 | France |
541903 | 581587 | 23256 | CHILDRENS CUTLERY SPACEBOY | 4 | 2011/12/9 12:50 | 4.15 | 12680.0 | France |
541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 2011/12/9 12:50 | 0.85 | 12680.0 | France |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011/12/9 12:50 | 2.10 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011/12/9 12:50 | 4.95 | 12680.0 | France |
204727 rows × 8 columns
如上实现去除 InvoiceNo
和UnitPrice
相同的数据