Pandas arrays
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system.
Pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
array (data, dtype, numpy.dtype, …) | Create an array. |
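As a minimal sketch of the array() function described above (the sample values and the names arr, ser, idx, and df are illustrative assumptions, not part of the reference), an extension array can be created once and then placed in each container:

```python
import pandas as pd

# Create an extension array; the explicit dtype selects a DatetimeArray.
arr = pd.array(["2019-01-01", "2019-01-02"], dtype="datetime64[ns]")

# The same array can be stored in a Series, an Index,
# or as a column in a DataFrame.
ser = pd.Series(arr)
idx = pd.Index(arr)
df = pd.DataFrame({"when": arr})

print(type(arr).__name__)
print(ser.dtype, idx.dtype, df["when"].dtype)
```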
Datetime data
NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.
Timestamp | Pandas replacement for python datetime.datetime object. |
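A short sketch of the two flavours of Timestamp mentioned above. The dates and variable names are made up for illustration:

```python
import datetime
import pandas as pd

# A timezone-naive Timestamp carries no tzinfo.
naive = pd.Timestamp("2019-01-01 12:00")

# A timezone-aware Timestamp is created by passing tz.
aware = pd.Timestamp("2019-01-01 12:00", tz="UTC")

# Timestamp subclasses datetime.datetime, so it works wherever
# a standard-library datetime is expected.
is_datetime = isinstance(naive, datetime.datetime)

print(is_datetime, naive.tzinfo, aware.tzinfo)
```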
Properties
Methods
Timestamp.astimezone (self, tz) | Convert tz-aware Timestamp to another time zone. |
Timestamp.ceil (self, freq[, ambiguous, …]) | Return a new Timestamp ceiled to this resolution. |
Timestamp.combine (date, time) | date, time -> datetime with same date and time fields |
Timestamp.ctime () | Return ctime() style string. |
Timestamp.date () | Return date object with same year, month and day. |
Timestamp.day_name (self[, locale]) | Return the day name of the Timestamp with specified locale. |
Timestamp.dst () | Return self.tzinfo.dst(self). |
Timestamp.floor (self, freq[, ambiguous, …]) | Return a new Timestamp floored to this resolution. |
Timestamp.freq | |
Timestamp.freqstr | Return the frequency of the Timestamp as a string. |
Timestamp.fromordinal (ordinal[, freq, tz]) | Convert a proleptic Gregorian ordinal to a Timestamp. Note: by definition the ordinal itself cannot carry any tz info. |
Timestamp.fromtimestamp (ts) | timestamp[, tz] -> tz’s local time from POSIX timestamp. |
Timestamp.isocalendar () | Return a 3-tuple containing ISO year, week number, and weekday. |
Timestamp.isoformat (self[, sep]) | |
Timestamp.isoweekday () | Return the day of the week represented by the date. |
Timestamp.month_name (self[, locale]) | Return the month name of the Timestamp with specified locale. |
Timestamp.normalize (self) | Normalize Timestamp to midnight, preserving tz information. |
Timestamp.now ([tz]) | Return new Timestamp object representing current time local to tz. |
Timestamp.replace (self[, year, month, day, …]) | Implements datetime.replace and handles nanoseconds. |
Timestamp.round (self, freq[, ambiguous, …]) | Round the Timestamp to the specified resolution |
Timestamp.strftime () | format -> strftime() style string. |
Timestamp.strptime (string, format) | Function is not implemented. |
Timestamp.time () | Return time object with same time but with tzinfo=None. |
Timestamp.timestamp () | Return POSIX timestamp as float. |
Timestamp.timetuple () | Return time tuple, compatible with time.localtime(). |
Timestamp.timetz () | Return time object with same time and tzinfo. |
Timestamp.to_datetime64 () | Return a numpy.datetime64 object with ‘ns’ precision. |
Timestamp.to_numpy () | Convert the Timestamp to a NumPy datetime64. |
Timestamp.to_julian_date (self) | Convert Timestamp to a Julian Date. |
Timestamp.to_period (self[, freq]) | Return a period of which this timestamp is an observation. |
Timestamp.to_pydatetime () | Convert a Timestamp object to a native Python datetime object. |
Timestamp.today (cls[, tz]) | Return the current time in the local timezone. |
Timestamp.toordinal () | Return proleptic Gregorian ordinal. |
Timestamp.tz_convert (self, tz) | Convert tz-aware Timestamp to another time zone. |
Timestamp.tz_localize (self, tz[, ambiguous, …]) | Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp. |
Timestamp.tzname () | Return self.tzinfo.tzname(self). |
Timestamp.utcfromtimestamp (ts) | Construct a naive UTC datetime from a POSIX timestamp. |
Timestamp.utcnow () | Return a new Timestamp representing UTC day and time. |
Timestamp.utcoffset () | Return self.tzinfo.utcoffset(self). |
Timestamp.utctimetuple () | Return UTC time tuple, compatible with time.localtime(). |
Timestamp.weekday () | Return the day of the week represented by the date. |
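The following sketch exercises a handful of the methods listed above. The timestamp value, the "min" frequency alias, and the "Europe/Berlin" zone are illustrative choices; which frequency aliases are accepted may vary between pandas versions.

```python
import pandas as pd

ts = pd.Timestamp("2019-03-05 14:37:21")

# Resolution-changing methods return new Timestamps.
floored = ts.floor("min")
ceiled = ts.ceil("min")
rounded = ts.round("min")

# normalize() moves the time to midnight.
midnight = ts.normalize()
weekday = ts.day_name()

# tz_localize attaches a zone to a naive Timestamp;
# tz_convert moves an aware Timestamp to another zone.
utc = ts.tz_localize("UTC")
berlin = utc.tz_convert("Europe/Berlin")

# Passing None removes the zone and keeps the local wall time.
back_to_naive = berlin.tz_localize(None)

print(floored, ceiled, rounded, midnight, weekday)
print(utc, berlin, back_to_naive)
```

Note that utc and berlin compare equal because they denote the same instant; only the wall-clock representation differs.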
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are tz-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray (values[, dtype, freq, copy]) | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
DatetimeTZDtype ([unit, tz]) | An ExtensionDtype for timezone-aware datetime data. |
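To make the dtype distinction concrete, here is a small sketch. It assumes that taking .array from a DatetimeIndex built with date_range is an acceptable way to obtain a DatetimeArray; the dates are arbitrary.

```python
import pandas as pd

# Timezone-aware values: every element shares the same zone.
aware = pd.date_range("2019-01-01", periods=3, freq="D", tz="UTC").array

# Timezone-naive values are backed by a NumPy datetime64 dtype.
naive = pd.date_range("2019-01-01", periods=3, freq="D").array

print(type(aware).__name__, aware.dtype)
print(type(naive).__name__, naive.dtype)

# The aware dtype is a DatetimeTZDtype and exposes its zone.
print(isinstance(aware.dtype, pd.DatetimeTZDtype), aware.dtype.tz)
```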
Timedelta data
NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.
Timedelta | Represents a duration, the difference between two dates or times. |
Properties
Methods
A collection of timedeltas may be stored in a TimedeltaArray.
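A brief sketch of how a Timedelta typically arises and how several of them can be collected. The values are invented, and obtaining the array through to_timedelta(...).array is one possible route rather than the only one.

```python
import datetime
import pandas as pd

# Subtracting two Timestamps yields a Timedelta.
delta = pd.Timestamp("2019-01-02 06:00") - pd.Timestamp("2019-01-01")

# Timedelta is also a datetime.timedelta, mirroring Timestamp/datetime.
is_timedelta = isinstance(delta, datetime.timedelta)

# Several durations stored together in a TimedeltaArray.
tds = pd.to_timedelta(["1 days", "12 hours"]).array

print(delta, is_timedelta)
print(type(tds).__name__, tds.dtype)
```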
Timespan data
Pandas represents spans of time as Period objects.
Period
Period | Represents a period of time |
Properties
Methods
Period.asfreq () | Convert Period to desired frequency, either at the start or end of the interval |
Period.now () | |
Period.strftime () | Returns the string representation of the Period , depending on the selected fmt . |
Period.to_timestamp () | Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period |
A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.
arrays.PeriodArray (values[, freq, dtype, copy]) | Pandas ExtensionArray for storing Period data. |
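The sketch below illustrates a Period, two of the methods listed above, and a PeriodArray whose elements share one freq. The monthly frequency and the dates are assumptions chosen for the example.

```python
import pandas as pd

# A Period covers a whole span, here the month of January 2019.
jan = pd.Period("2019-01", freq="M")

# asfreq converts to another frequency, anchored at the start or end.
first_day = jan.asfreq("D", how="start")

# to_timestamp returns a point in time at the chosen end of the span.
start = jan.to_timestamp(how="start")

# All periods in a PeriodArray must have the same freq.
periods = pd.array(
    [pd.Period("2019-01", freq="M"), pd.Period("2019-02", freq="M")]
)

print(jan, first_day, start)
print(type(periods).__name__, periods.dtype)
```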
Interval data
Arbitrary intervals can be represented as Interval objects.
Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
Properties
A collection of intervals may be stored in an arrays.IntervalArray.
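As an illustration, the following sketch builds a single Interval and an IntervalArray. The bounds are arbitrary, and from_breaks is just one of the available constructors.

```python
import pandas as pd

# A right-closed interval (0, 5]: excludes 0, includes 5.
iv = pd.Interval(0, 5, closed="right")
contains_left = 0 in iv
contains_right = 5 in iv

# Build consecutive intervals from a list of break points.
intervals = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])

print(iv, iv.length, contains_left, contains_right)
print(intervals)
```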
Nullable integer
numpy.ndarray cannot natively represent integer data with missing values. Pandas provides this through arrays.IntegerArray.
Int8Dtype | An ExtensionDtype for int8 integer data. |
Int16Dtype | An ExtensionDtype for int16 integer data. |
Int32Dtype | An ExtensionDtype for int32 integer data. |
Int64Dtype | An ExtensionDtype for int64 integer data. |
UInt8Dtype | An ExtensionDtype for uint8 integer data. |
UInt16Dtype | An ExtensionDtype for uint16 integer data. |
UInt32Dtype | An ExtensionDtype for uint32 integer data. |
UInt64Dtype | An ExtensionDtype for uint64 integer data. |
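The difference is easiest to see side by side. This sketch uses made-up values and compares the default NumPy-backed result with one of the dtypes listed above:

```python
import pandas as pd

# Without an extension dtype, a missing value forces a float result.
as_float = pd.Series([1, None, 3])

# With a nullable integer dtype, the integers stay integers
# and the gap is recorded as a missing value.
nullable = pd.Series([1, None, 3], dtype=pd.Int64Dtype())

# Reductions skip the missing value by default.
total = nullable.sum()

print(as_float.dtype, nullable.dtype)
print(type(nullable.array).__name__, total)
```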
Categorical data
Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.
CategoricalDtype ([categories]) | Type for categorical data with the categories and orderedness. |
Categorical data can be stored in a pandas.Categorical.
Categorical (values[, categories, ordered, …]) | Represent a categorical variable in classic R / S-plus fashion. |
The alternative Categorical.from_codes() constructor can be used when you already have the categories and integer codes.
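A minimal sketch of from_codes, assuming three hypothetical categories. Each code is a position in the categories list:

```python
import pandas as pd

# Codes index into the categories: 0 -> "low", 1 -> "mid", 2 -> "high".
cat = pd.Categorical.from_codes(
    [0, 1, 0, 2], categories=["low", "mid", "high"], ordered=True
)

values = list(cat)
print(values)
print(cat.codes, cat.ordered)
```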
The dtype information is available on the Categorical.
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!
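The loss of information can be sketched as follows (the category labels are illustrative). The dtype of the Categorical records the categories and ordering, while the converted NumPy array holds only the values:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

# The dtype carries the categories and the ordered flag.
dtype = cat.dtype
print(dtype.categories, dtype.ordered)

# Converting to NumPy keeps the values only; the unused
# category "c" and the ordering are gone.
plain = np.asarray(cat)
print(type(plain).__name__, plain.tolist())
```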
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(…, dtype=dtype), where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
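Both spellings of dtype can be sketched like this. The Series contents and the chosen category order are assumptions for the example:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "a"])

# Option 1: the string 'category' infers the categories from the data.
by_string = s.astype("category")

# Option 2: a CategoricalDtype instance fixes categories and ordering.
dtype = CategoricalDtype(categories=["b", "a"], ordered=True)
by_dtype = s.astype(dtype)

# The same dtype can be passed when constructing the Series.
at_construction = pd.Series(["a", "b", "a"], dtype=dtype)

print(by_string.dtype)
print(by_dtype.dtype)
print(at_construction.dtype)
```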
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
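A small sketch of the Series.cat accessor, using made-up labels. The methods shown return new Series and leave the original unchanged:

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

# Inspect the integer codes behind the values.
codes = s.cat.codes.tolist()

# Rename the existing categories position by position.
renamed = s.cat.rename_categories(["x", "y"])

# Add a category that no value uses yet.
extended = s.cat.add_categories(["c"])

print(codes)
print(renamed.tolist())
print(list(extended.cat.categories))
```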
Sparse data
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.
SparseArray (data[, sparse_index, index, …]) | An ExtensionArray for storing sparse data. |
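The idea can be sketched as below. This assumes a pandas version that exposes the class as pd.arrays.SparseArray, and the data are invented; only the values that differ from fill_value are physically stored.

```python
import pandas as pd

# Zeros are the repeated value, so they are not stored individually.
sparse = pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0)

# sp_values holds only the entries that differ from fill_value.
stored = sparse.sp_values.tolist()

# density is the fraction of entries that are actually stored.
density = sparse.density

print(stored, sparse.fill_value, density)
```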
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor for more.
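A brief sketch of the accessor on a Series holding sparse values. It assumes pd.arrays.SparseArray is available in the installed pandas version, and the data are illustrative:

```python
import pandas as pd

s = pd.Series(pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0))

# Sparse-specific attributes live under the .sparse accessor.
fill = s.sparse.fill_value
stored_points = s.sparse.npoints

# to_dense() materialises every value again.
dense = s.sparse.to_dense()

print(s.dtype, fill, stored_points)
print(dense.tolist())
```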